Linear Algebra II

Peter Philip∗

Lecture Notes Originally Created for the Class of Spring Semester 2019 at LMU Munich Includes Subsequent Corrections and Revisions†

September 19, 2021

Contents

1 Affine Subspaces and Geometry
 1.1 Affine Subspaces
 1.2 Affine Hull and Affine Independence
 1.3 Affine Bases
 1.4 Barycentric Coordinates and Convex Sets
 1.5 Affine Maps
 1.6 Affine Geometry

2 Duality
 2.1 Linear Forms and Dual Spaces
 2.2 Annihilators
 2.3 Hyperplanes and Linear Systems
 2.4 Dual Maps

3 Symmetric Groups

∗E-Mail: [email protected] †Resources used in the preparation of this text include [Bos13, For17, Lan05, Str08].

4 Multilinear Maps and Determinants
 4.1 Multilinear Maps
 4.2 Alternating Multilinear Maps and Determinants
 4.3 Determinants of Matrices and Linear Maps

5 Direct Sums and Projections

6 Eigenvalues

7 Commutative Rings, Polynomials

8 Characteristic Polynomial, Minimal Polynomial

9 Jordan Normal Form

10 Vector Spaces with Inner Products
 10.1 Definition, Examples
 10.2 Preserving Norm, Metric, Inner Product
 10.3 Orthogonality
 10.4 The Adjoint Map
 10.5 Hermitian, Unitary, and Normal Maps

11 Definiteness of Quadratic Matrices over K

A Multilinear Maps

B Polynomials in Several Variables

C Quotient Rings

D Algebraic Field Extensions
 D.1 Basic Definitions and Properties
 D.2 Algebraic Closure

References

1 Affine Subspaces and Geometry

1.1 Affine Subspaces

Definition 1.1. Let V be a vector space over the field F. Then M ⊆ V is called an affine subspace of V if, and only if, there exists a vector v ∈ V and a (vector) subspace U ⊆ V such that M = v + U. We define dim M := dim U to be the dimension of M (this notion of dimension is well-defined by the following Lem. 1.2(a)). —

Thus, the affine subspaces of a vector space V are precisely the translations of vector subspaces U of V, i.e. the cosets of subspaces U, i.e. the elements of quotient spaces V/U.

Lemma 1.2. Let V be a vector space over the field F.

(a) If M is an affine subspace of V, then the vector subspace corresponding to M is unique, i.e. if M = v1 + U1 = v2 + U2 with v1, v2 ∈ V and vector subspaces U1, U2 ⊆ V, then

U1 = U2 = {u − v : u, v ∈ M}.    (1.1)

(b) If M = v + U is an affine subspace of V, then the vector v in this representation is unique if, and only if, U = {0}.

Proof. (a): Let M = v1 + U1 with v1 ∈ V and vector subspace U1 ⊆ V. Moreover, let U := {u − v : u, v ∈ M}. It suffices to show U1 = U. Suppose u1 ∈ U1. Since v1 ∈ M and v1 + u1 ∈ M, we have u1 = v1 + u1 − v1 ∈ U, showing U1 ⊆ U. If a ∈ U, then there are u1, u2 ∈ U1 such that a = v1 + u1 − (v1 + u2) = u1 − u2 ∈ U1, showing U ⊆ U1, as desired.

(b): If U = {0}, then M = {v} and v is unique. If M = v + U with 0 ≠ u ∈ U, then M = v + U = (v + u) + U with v + u ≠ v. □

Definition 1.3. In the situation of Def. 1.1, we call affine subspaces of dimension 0 points, of dimension 1 lines, and of dimension 2 planes – in R² and R³, such objects are easily visualized and they then coincide with the points, lines, and planes with which one is already familiar. —

Affine spaces and vector spaces share many structural properties. In consequence, one can develop a theory of affine spaces that is in many respects analogous to the theory of vector spaces, as will be illustrated by some of the notions and results presented in the following. We start by defining so-called affine combinations, which are, for affine spaces, what linear combinations are for vector spaces:

Definition 1.4. Let V be a vector space over the field F with v1, ..., vn ∈ V and λ1, ..., λn ∈ F, n ∈ N. Then ∑_{i=1}^n λi vi is called an affine combination of v1, ..., vn if, and only if, ∑_{i=1}^n λi = 1.

Theorem 1.5. Let V be a vector space over the field F, ∅ ≠ M ⊆ V. Then M is an affine subspace of V if, and only if, M is closed under affine combinations. More precisely, the following statements are equivalent:

(i) M is an affine subspace of V .

(ii) If n ∈ N, v1, ..., vn ∈ M, and λ1, ..., λn ∈ F with ∑_{i=1}^n λi = 1, then ∑_{i=1}^n λi vi ∈ M.

If char F ≠ 2, then (i) and (ii) are also equivalent to¹:

(iii) If v1, v2 ∈ M and λ1, λ2 ∈ F with λ1 + λ2 = 1, then λ1 v1 + λ2 v2 ∈ M.

Proof. Exercise. □

The following Th. 1.6 is the analogon of [Phi19, Th. 5.7] for affine spaces: Theorem 1.6. Let V be a vector space over the field F .

(a) Let I ≠ ∅ be an index set and (Mi)_{i∈I} a family of affine subspaces of V. Then the intersection M := ⋂_{i∈I} Mi is either empty or it is an affine subspace of V.

(b) In contrast to intersections, unions of affine subspaces are almost never affine subspaces. More precisely, if M1 and M2 are affine subspaces of V and char F ≠ 2 (i.e. 1 ≠ −1 in F), then

M1 ∪ M2 is an affine subspace of V  ⇔  (M1 ⊆ M2 ∨ M2 ⊆ M1)    (1.2)

(where "⇐" also holds for char F = 2, but cf. Ex. 1.7 below).

¹For char F = 2, (iii) does not imply (i) and (ii): Let F := Z2 = {0, 1}. Let V be a vector space over F with #V ≥ 4 (e.g. V = F²). Let p, q, r ∈ V be distinct, M := {p, q, r} (i.e. #M = 3). If λ1, λ2 ∈ F with λ1 + λ2 = 1, then (λ1, λ2) ∈ {(0, 1), (1, 0)} and (iii) is trivially true. On the other hand, v := p + q + r is an affine combination of p, q, r, since 1 + 1 + 1 = 1 in F; but v ∉ M: v = p + q + r = p implies q = −r = r, and v = q as well as v = r likewise lead to contradictions (this counterexample was pointed out by Robin Mader).

Proof. (a): Let M ≠ ∅. We use the characterization of Th. 1.5(ii) to show M is an affine subspace: If n ∈ N, v1, ..., vn ∈ M, and λ1, ..., λn ∈ F with ∑_{k=1}^n λk = 1, then v := ∑_{k=1}^n λk vk ∈ Mi for each i ∈ I, implying v ∈ M. Thus, M is an affine subspace of V.

(b): If M1 ⊆ M2, then M1 ∪ M2 = M2, which is an affine subspace of V. If M2 ⊆ M1, then M1 ∪ M2 = M1, which is an affine subspace of V. For the converse, we now assume char F ≠ 2, M1 ⊈ M2, and M1 ∪ M2 is an affine subspace of V. We have to show M2 ⊆ M1. Let m1 ∈ M1 \ M2 and m2 ∈ M2. Since M1 ∪ M2 is an affine subspace, m2 + m2 − m1 ∈ M1 ∪ M2 by Th. 1.5(ii). If m2 + m2 − m1 ∈ M2, then m1 = m2 + m2 − (m2 + m2 − m1) ∈ M2, in contradiction to m1 ∉ M2. Thus, m2 + m2 − m1 ∈ M1. Since char F ≠ 2, we have 2 := 1 + 1 ≠ 0 in F, implying m2 = 2⁻¹(m2 + m2 − m1) + 2⁻¹ m1 ∈ M1, i.e. M2 ⊆ M1. □

Example 1.7. Consider F := Z2 = {0, 1} and the vector space V := F² over F. Then M1 := U1 := {(0, 0), (1, 0)} = ⟨{(1, 0)}⟩ is a vector subspace and, in particular, an affine subspace of V. The set M2 := (0, 1) + U1 = {(0, 1), (1, 1)} is also an affine subspace. Then M1 ∪ M2 = V is an affine subspace, even though neither M1 ⊆ M2 nor M2 ⊆ M1.

1.2 Affine Hull and Affine Independence

Next, we will define the affine hull of a subset A of a vector space, which is the affine analogon to the linear notion of the span of A (which is sometimes also called the linear hull of A):

Definition 1.8. Let V be a vector space over the field F, ∅ ≠ A ⊆ V, and

ℳ := {M ∈ 𝒫(V) : A ⊆ M ∧ M is an affine subspace of V},

where we recall that 𝒫(V) denotes the power set of V. Then the set

aff A := ⋂_{M∈ℳ} M

is called the affine hull of A. We call A a generating set of aff A. —

The following Prop. 1.9 is the analogon of [Phi19, Prop. 5.9] for affine spaces:

Proposition 1.9. Let V be a vector space over the field F and ∅ ≠ A ⊆ V.

(a) aff A is an affine subspace of V, namely the smallest affine subspace of V containing A.

(b) aff A is the set of all affine combinations of elements from A, i.e.

aff A = { ∑_{i=1}^n λi ai : n ∈ N ∧ λ1, ..., λn ∈ F ∧ a1, ..., an ∈ A ∧ ∑_{i=1}^n λi = 1 }.    (1.3)

(c) If A ⊆ B ⊆ V, then aff A ⊆ aff B.

(d) A = aff A if, and only if, A is an affine subspace of V.

(e) aff aff A = aff A.

Proof. (a): Since A ⊆ aff A implies aff A ≠ ∅, (a) is immediate from Th. 1.6(a).

(b): Let W denote the right-hand side of (1.3). If M is an affine subspace of V and A ⊆ M, then W ⊆ M, since M is closed under affine combinations, showing W ⊆ aff A. On the other hand, suppose N, n1, ..., nN ∈ N, a1^k, ..., a_{nk}^k ∈ A for each k ∈ {1, ..., N}, λ1^k, ..., λ_{nk}^k ∈ F for each k ∈ {1, ..., N}, and α1, ..., αN ∈ F such that

∀ k ∈ {1, ..., N}:  ∑_{i=1}^{nk} λi^k = ∑_{i=1}^N αi = 1.

Then

∑_{k=1}^N ∑_{i=1}^{nk} αk λi^k ai^k ∈ W,

since

∑_{k=1}^N ∑_{i=1}^{nk} αk λi^k = ∑_{k=1}^N αk = 1,

showing W to be an affine subspace of V by Th. 1.5(ii). Thus, aff A ⊆ W, completing the proof of aff A = W.

(c) is immediate from (b).

(d): If A = aff A, then A is an affine subspace by (a). For the converse, while it is clear that A ⊆ aff A always holds, if A is an affine subspace, then A ∈ ℳ, where ℳ is as in Def. 1.8, implying aff A ⊆ A.

(e) now follows by combining (d) with (a). □

Proposition 1.10. Let V be a vector space over the field F, A ⊆ V, M = v + U with v ∈ A and U a vector subspace of V. Then the following statements are equivalent:

(i) aff A = M.

(ii) ⟨−v + A⟩ = U.

Proof. Exercise. □

We will now define the notions of affine dependence/independence, which are, for affine spaces, what linear dependence/independence are for vector spaces:

Definition 1.11. Let V be a vector space over the field F .

(a) A vector v ∈ V is called affinely dependent on a subset U of V (or on the vectors in U) if, and only if, there exist n ∈ N and u1, ..., un ∈ U such that v is an affine combination of u1, ..., un. Otherwise, v is called affinely independent of U.

(b) A subset U of V is called affinely independent if, and only if, whenever 0 ∈ V is written as a linear combination of distinct elements of U such that the coefficients have sum 0, then all coefficients must be 0 ∈ F, i.e. if, and only if,

( n ∈ N ∧ W ⊆ U ∧ #W = n ∧ ∑_{u∈W} λu u = 0 ∧ ∀ u ∈ W: λu ∈ F ∧ ∑_{u∈W} λu = 0 )  ⇒  ∀ u ∈ W: λu = 0.    (1.4)

Sets that are not affinely independent are called affinely dependent.

As a caveat, it is underlined that, in Def. 1.11(b) above, one does not consider affine combinations of the vectors u ∈ U, but special linear combinations (this is related to the fact that 0 is an affine combination of vectors in U only if aff U is a vector subspace of V).

Remark 1.12. It is immediate from Def. 1.11 that, if v ∈ V is linearly independent of U ⊆ V, then it is also affinely independent of U, and, if U ⊆ V is linearly independent, then U is also affinely independent. However, the converse is, in general, not true (cf. Ex. 1.13(b),(c) below).

Example 1.13. Let V be a vector space over the field F .

(a) ∅ is affinely independent: Indeed, if U = ∅, then the left side of the implication in (1.4) is always false (since W ⊆ U means #W = 0), i.e. the implication is true.

(b) Every singleton set {v}, v ∈ V, is affinely independent, since λ1 = ∑_{i=1}^1 λi = 0 means λ1 = 0 (if v = 0, then {v} is not linearly independent, cf. [Phi19, Ex. 5.13(b)]).

(c) Every set {v, w} with two distinct vectors v, w ∈ V is affinely independent (but not linearly independent for w = αv with some α ∈ F): 0 = λv − λw = λ(v − w) implies λ = 0 or v = w.

There is a close relationship between affine independence and linear independence:

Proposition 1.14. Let V be a vector space over the field F and U ⊆ V. Then the following statements are equivalent:

(i) U is affinely independent.

(ii) If u0 ∈ U, then U0 := {u − u0 : u ∈ U \ {u0}} is linearly independent.

(iii) The set X := {(u, 1) ∈ V × F : u ∈ U} is a linearly independent subset of the vector space V × F.

Proof. Exercise. □

The following Prop. 1.15 is the analogon of [Phi19, Prop. 5.14(a)-(c)] for affine spaces:

Proposition 1.15. Let V be a vector space over the field F and U ⊆ V.

(a) U is affinely dependent if, and only if, there exists u0 ∈ U such that u0 is affinely dependent on U \ {u0}.

(b) If U is affinely dependent and U ⊆ M ⊆ V, then M is affinely dependent as well.

(c) If U is affinely independent and M ⊆ U, then M is affinely independent as well.

Proof. (a): Suppose U is affinely dependent. Then there exists W ⊆ U, #W = n ∈ N, such that ∑_{u∈W} λu u = 0 with λu ∈ F, ∑_{u∈W} λu = 0, and there exists u0 ∈ W with λ_{u0} ≠ 0. Then

u0 = −λ_{u0}⁻¹ ∑_{u∈W\{u0}} λu u = ∑_{u∈W\{u0}} (−λ_{u0}⁻¹ λu) u,   ∑_{u∈W\{u0}} (−λ_{u0}⁻¹ λu) = (−λ_{u0}⁻¹) · (−λ_{u0}) = 1,

showing u0 to be affinely dependent on U \ {u0}. Conversely, if u0 ∈ U is affinely dependent on U \ {u0}, then there exist n ∈ N, distinct u1, ..., un ∈ U \ {u0}, and λ1, ..., λn ∈ F with ∑_{i=1}^n λi = 1 such that

u0 = ∑_{i=1}^n λi ui  ⇒  −u0 + ∑_{i=1}^n λi ui = 0,

showing U to be affinely dependent, since the coefficient of u0 is −1 ≠ 0 and −1 + ∑_{i=1}^n λi = 0.

(b) and (c) are now both immediate from (a). □

1.3 Affine Bases

Definition 1.16. Let V be a vector space over the field F, let M ⊆ V be an affine subspace, and B ⊆ V. Then B is called an affine basis of M if, and only if, B is a generating set for M (i.e. M = aff B) that is also affinely independent. —

There is a close relationship between affine bases and vector space bases:

Proposition 1.17. Let V be a vector space over the field F, let M ⊆ V be an affine subspace, and let B ⊆ M with v ∈ B. Then the following statements are equivalent:

(i) B is an affine basis of M.

(ii) B0 := {b − v : b ∈ B \ {v}} is a vector space basis of the vector space U := {v1 − v2 : v1, v2 ∈ M}.

Proof. As a consequence of Lem. 1.2(a), we know U to be a vector subspace of V and M = a + U for each a ∈ M. Moreover, v ∈ B ⊆ M implies B0 ⊆ U. According to Prop. 1.14, B is affinely independent if, and only if, B0 is linearly independent. According to Prop. 1.10, aff B = M holds if, and only if, ⟨−v + B⟩ = U, which, since B0 = (−v + B) \ {0}, holds if, and only if, ⟨B0⟩ = U. □

The following Th. 1.18 is the analogon of [Phi19, Th. 5.17] for affine spaces:

Theorem 1.18. Let V be a vector space over the field F, let M ⊆ V be an affine subspace, and let ∅ ≠ B ⊆ V. Then the following statements (i) – (iii) are equivalent:

(i) B is an affine basis of M.

(ii) B is a maximal affinely independent subset of M, i.e. B is affinely independent and each set A ⊆ M with B ⊊ A is affinely dependent.

(iii) B is a minimal generating set for M, i.e. aff B = M and aff A ⊊ M for each A ⊊ B.

Proof. Let v ∈ B, and let B0 and U be as in Prop. 1.17 above. Then, due to Prop. 1.14, B is a maximal affinely independent subset of M if, and only if, B0 is a maximal linearly independent subset of U. Moreover, due to Prop. 1.10, B is a minimal (affine) generating set for M if, and only if, B0 is a minimal (linear) generating set for U. Thus, the equivalences of Th. 1.18 follow by combining Prop. 1.17 with [Phi19, Th. 5.17]. □

The following Th. 1.19 is the analogon of [Phi19, Th. 5.23] for affine spaces:

Theorem 1.19. Let V be a vector space over the field F and let M ⊆ V be an affine subspace.

(a) If S ⊆ M is affinely independent, then there exists an affine basis of M that contains S.

(b) M has an affine basis B ⊆ M.

(c) Affine bases of M have a unique cardinality, i.e. if B ⊆ M and B̃ ⊆ M are both affine bases of M, then there exists a bijective map φ : B → B̃.

(d) If B is an affine basis of M and S ⊆ M is affinely independent, then there exists C ⊆ B such that B̃ := S ∪̇ C is an affine basis of M.

Proof. Let v ∈ V and let U be a vector subspace of V such that M = v + U. Then v ∈ M and U = {v1 − v2 : v1, v2 ∈ M} according to Lem. 1.2(a).

(a): It suffices to consider the case S ≠ ∅. Thus, let v ∈ S. According to Prop. 1.14(ii), S0 := {x − v : x ∈ S \ {v}} is a linearly independent subset of U. According to [Phi19, Th. 5.23(a)], U has a vector space basis B0 with S0 ⊆ B0 ⊆ U. Then, by Prop. 1.17, (v + B0) ∪̇ {v} is an affine basis of M containing S.

(b) is immediate from (a).

(c): Let B ⊆ M and B̃ ⊆ M be affine bases of M. Moreover, let b ∈ B and b̃ ∈ B̃. Then, by Prop. 1.17, B0 := {x − b : x ∈ B \ {b}} and B̃0 := {x − b̃ : x ∈ B̃ \ {b̃}} are both vector space bases of U. Thus, by [Phi19, Th. 5.23(c)], there exists a bijective map ψ : B0 → B̃0. Then, clearly, the map

φ : B → B̃,  φ(x) := b̃ for x = b,  φ(x) := b̃ + ψ(x − b) for x ≠ b,

is well-defined and bijective, thereby proving (c).

(d): If B ⊆ aff S, then, according to Prop. 1.9(c),(e), M = aff B ⊆ aff aff S = aff S, i.e. aff S = M, as M is an affine subspace containing S. Thus, S is itself an affine basis of M and the statement holds with C := ∅. It remains to consider the case where there exists b ∈ B \ S such that S ∪ {b} is affinely independent. Then, by Prop. 1.17, B0 := {x − b : x ∈ B \ {b}} is a vector space basis of U and, by Prop. 1.14(ii), S0 := {x − b : x ∈ S} is a linearly independent subset of U. Thus, by [Phi19, Th. 5.23(d)], there exists C0 ⊆ B0 such that B̃0 := S0 ∪̇ C0 is a vector space basis of U, and, then, using Prop. 1.17 once again, (b + B̃0) ∪̇ {b} = S ∪̇ C with C := (b + C0) ∪̇ {b} ⊆ B is an affine basis of M. □

1.4 Barycentric Coordinates and Convex Sets

The following Th. 1.20 is the analogon of [Phi19, Th. 5.19] for affine spaces:

Theorem 1.20. Let V be a vector space over the field F and assume M ⊆ V is an affine subspace with affine basis B of M. Then each vector v ∈ M has unique barycentric coordinates with respect to the affine basis B, i.e., for each v ∈ M, there exists a unique finite subset Bv of B and a unique map c : Bv → F \ {0} such that

v = ∑_{b∈Bv} c(b) b  ∧  ∑_{b∈Bv} c(b) = 1.    (1.5)

Proof. The existence of Bv and the map c follows from the fact that the affine basis B is an affine generating set, aff B = M. For the uniqueness proof, consider finite sets Bv, B̃v ⊆ B and maps c : Bv → F \ {0}, c̃ : B̃v → F \ {0} such that

v = ∑_{b∈Bv} c(b) b = ∑_{b∈B̃v} c̃(b) b  ∧  ∑_{b∈Bv} c(b) = ∑_{b∈B̃v} c̃(b) = 1.

Extend both c and c̃ to A := Bv ∪ B̃v by letting c(b) := 0 for b ∈ B̃v \ Bv and c̃(b) := 0 for b ∈ Bv \ B̃v. Then

0 = ∑_{b∈A} (c(b) − c̃(b)) b,

such that the affine independence of A implies c(b) = c̃(b) for each b ∈ A, which, in turn, implies Bv = B̃v and c = c̃. □

Example 1.21. With respect to the affine basis {0, 1} of R over R, the barycentric coordinates of 1/3 are 2/3 and 1/3, whereas the barycentric coordinates of 5 are −4 and 5.
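For readers who wish to experiment numerically: the barycentric coordinates of (1.5) can be computed by solving a linear system, since the conditions v = ∑_{b∈Bv} c(b) b and ∑_{b∈Bv} c(b) = 1 are linear in the coefficients c(b). The following Python sketch (the helper name is ad hoc; NumPy is assumed) illustrates this for affine bases of Rⁿ:

```python
import numpy as np

# Sketch: barycentric coordinates of v with respect to an affine basis
# b_0, ..., b_n of R^n, obtained by solving the square (n+1)x(n+1) system
#   sum_i c_i b_i = v  and  sum_i c_i = 1.
def barycentric_coords(basis, v):
    A = np.vstack([np.array(basis, dtype=float).T, np.ones(len(basis))])
    rhs = np.append(np.array(v, dtype=float), 1.0)
    return np.linalg.solve(A, rhs)

# Example 1.21 revisited, with the affine basis {0, 1} of R over R:
print(barycentric_coords([[0.0], [1.0]], [1.0 / 3.0]))  # [0.6667 0.3333], i.e. 2/3 and 1/3
print(barycentric_coords([[0.0], [1.0]], [5.0]))        # [-4.  5.]
```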

Remark 1.22. Let V be a vector space over the field F and assume M ⊆ V is an affine subspace with affine basis B of M.

(a) Caveat: In the literature, one also finds the notion of affine coordinates; however, this notion of affine coordinates is usually (but not always, so one has to use care) defined differently from the notion of barycentric coordinates as defined in Th. 1.20 above: For the affine coordinates, one designates one point x0 ∈ B to be the origin of M. Let v ∈ M and let c : Bv → F \ {0} be the map yielding the barycentric coordinates according to Th. 1.20. We write {x0} ∪ Bv = {x0, x1, ..., xn} with distinct elements x1, ..., xn ∈ M (if any) and we set c(x0) := 0 in case x0 ∉ Bv. Then

v = ∑_{i=0}^n c(xi) xi  ∧  ∑_{i=0}^n c(xi) = 1,

which, since 1 − ∑_{i=1}^n c(xi) = c(x0), is equivalent to

v = x0 + ∑_{i=1}^n c(xi)(xi − x0)  ∧  ∑_{i=0}^n c(xi) = 1.

One calls the c(x1), ..., c(xn), given by the map ca := c↾_{Bv\{x0}}, the affine coordinates of v with respect to the affine coordinate system {x0} ∪ (−x0 + B) (for v = x0, ca turns out to be the empty map).

(b) If x1, ..., xn ∈ M are distinct points that are affinely independent and n := n · 1 ≠ 0 in F, then one sometimes calls

v := (1/n) ∑_{i=1}^n xi ∈ M

the barycenter of x1, ..., xn.

Definition and Remark 1.23. Let V be a vector space over R (we restrict ourselves to vector spaces over R, since, for a scalar λ, we will need to know what it means for λ to be positive, i.e. λ > 0 needs to be well-defined). Let v1, ..., vn ∈ V and λ1, ..., λn ∈ R, n ∈ N. Then we call the affine combination ∑_{i=1}^n λi vi of v1, ..., vn a convex combination of v1, ..., vn if, and only if, in addition to ∑_{i=1}^n λi = 1, one has λi ≥ 0 for each i ∈ {1, ..., n}. Moreover, we call C ⊆ V convex if, and only if, C is closed under convex combinations, i.e. if, and only if, n ∈ N, v1, ..., vn ∈ C, and λ1, ..., λn ∈ R₀⁺ with ∑_{i=1}^n λi = 1, implies ∑_{i=1}^n λi vi ∈ C (analogous to Th. 1.5, C ⊆ V is then convex if, and only if, each convex combination of merely two elements of C is again in C). Note that, in contrast to affine subspaces, we allow convex sets to be empty. Clearly, the convex subsets of R are precisely the intervals (open, closed, half-open, bounded or unbounded). Convex subsets of R² include triangles and disks. Analogous to the proof of Th. 1.6(a), one can show that arbitrary intersections of convex sets are always convex, and, analogous to the definition of the affine hull in Def. 1.8, one defines the convex hull conv A of a set A ⊆ V by letting

𝒞 := {C ∈ 𝒫(V) : A ⊆ C ∧ C is a convex subset of V},

conv A := ⋂_{C∈𝒞} C.

Then Prop. 1.9 and its proof still work completely analogously in the convex situation and one obtains conv A to be the smallest convex subset of V containing A, where conv A consists precisely of all convex combinations of elements from A; A = conv A holds if, and only if, A is convex; conv conv A = conv A; and conv A ⊆ conv B for each A ⊆ B ⊆ V. If n ∈ N₀ and A = {x0, x1, ..., xn} ⊆ V is an affinely independent set, consisting of the n + 1 distinct points x0, x1, ..., xn, then conv A is called an n-dimensional simplex (or simply an n-simplex) with vertices x0, x1, ..., xn – 0-simplices are called points, 1-simplices line segments, 2-simplices triangles, and 3-simplices tetrahedra. If e1, ..., ed denotes the standard basis of R^d, d ∈ N, then conv{e1, ..., e_{n+1}}, 0 ≤ n < d, is an n-simplex.
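As a small numerical illustration connecting simplices with barycentric coordinates (the code and its helper name are ad hoc; NumPy is assumed): a point v lies in the n-simplex conv{x0, x1, ..., xn} if, and only if, its barycentric coordinates with respect to x0, ..., xn are all nonnegative.

```python
import numpy as np

# Sketch (assumption: the vertices are n+1 affinely independent points of R^n,
# so the barycentric system below is square and invertible): v lies in the
# n-simplex conv{x_0, ..., x_n} iff its barycentric coordinates are all >= 0.
def in_simplex(vertices, v, tol=1e-12):
    A = np.vstack([np.array(vertices, dtype=float).T, np.ones(len(vertices))])
    c = np.linalg.solve(A, np.append(np.array(v, dtype=float), 1.0))
    return bool(np.all(c >= -tol))

triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # a 2-simplex in R^2
print(in_simplex(triangle, [0.25, 0.25]))  # True: inside the triangle
print(in_simplex(triangle, [0.75, 0.75]))  # False: the coordinate of x_0 would be -0.5
```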

1.5 Affine Maps

We first study a special type of affine map, namely so-called translations.

Definition 1.24. Let V be a vector space over the field F. If v ∈ V, then the map

Tv : V → V,  Tv(x) := x + v,

is called a translation, namely, the translation by v or the translation with translation vector v. Let 𝒯(V) := {Tv : v ∈ V} denote the set of translations on V.

Proposition 1.25. Let V be a vector space over the field F.

(a) If v ∈ V and A, B ⊆ V, then Tv(A + B) = v + A + B. In particular, translations map affine subspaces of V into affine subspaces of V.

(b) If v ∈ V, then Tv is bijective with (Tv)⁻¹ = T₋v. In particular, 𝒯(V) ⊆ SV, where SV denotes the symmetric group on V according to [Phi19, Ex. 4.9(b)].

(c) Nontrivial translations are not linear: More precisely, Tv with v ∈ V is linear if, and only if, v = 0 (i.e. Tv = Id).

(d) If v, w ∈ V, then Tv ∘ Tw = T_{v+w} = Tw ∘ Tv.

(e) (𝒯(V), ∘) is a commutative subgroup of (SV, ∘). Moreover, (𝒯(V), ∘) ≅ (V, +), where

I : (V, +) → (𝒯(V), ∘),  I(v) := Tv,

constitutes a group isomorphism.

Proof. Exercise. 

We will now define affine maps, which are, for affine spaces, what linear maps are for vector spaces:

Definition 1.26. Let V and W be vector spaces over the field F. A map A : V → W is called affine if, and only if, there exists a linear map L ∈ ℒ(V, W) and w ∈ W such that

∀ x ∈ V:  A(x) = (Tw ∘ L)(x) = w + L(x)    (1.6)

(i.e. the affine maps are precisely the compositions of linear maps with translations). We denote the set of all affine maps from V into W by 𝒜(V, W).

Proposition 1.27. Let V, W, X be vector spaces over the field F.

(a) If L ∈ ℒ(V, W) and v ∈ V, then L ∘ Tv = T_{Lv} ∘ L ∈ 𝒜(V, W).

(b) If A ∈ 𝒜(V, W), L ∈ ℒ(V, W), and w ∈ W, then A = Tw ∘ L if, and only if, T₋w ∘ A = L. In particular, A = Tw ∘ L is injective (resp. surjective, resp. bijective) if, and only if, L is injective (resp. surjective, resp. bijective).

(c) If A : V → W is affine and bijective, then A⁻¹ is also affine.

(d) If A : V → W and B : W → X are affine, then so is B ∘ A.

(e) Define GA(V) := {A ∈ 𝒜(V, V) : A bijective}. Then (GA(V), ∘) forms a subgroup of the symmetric group (SV, ∘) (and, clearly, GL(V) forms a subgroup of GA(V), cf. [Phi19, Cor. 6.23]).

Proof. (a): If L ∈ ℒ(V, W) and v, x ∈ V, then

(L ∘ Tv)(x) = L(v + x) = Lv + Lx = (T_{Lv} ∘ L)(x),

proving L ∘ Tv = T_{Lv} ∘ L.

(b) is due to the bijectivity of Tw: One has, since T₋w ∘ Tw = Id,

A = Tw ∘ L  ⇔  T₋w ∘ A = T₋w ∘ Tw ∘ L = Id ∘ L = L.

Moreover, for each x, y ∈ V and z ∈ W, one has

Ax = w + Lx = w + Ly = Ay  ⇔  Lx = Ly,
z + w = Ax = w + Lx  ⇔  z = Lx,
z − w = Lx  ⇔  z = w + Lx = Ax,

proving A = Tw ∘ L is injective (resp. surjective, resp. bijective) if, and only if, L is injective (resp. surjective, resp. bijective).

(c): If A = Tw ∘ L with L ∈ ℒ(V, W) and w ∈ W is affine and bijective, then, by (b), L is bijective. Thus, A⁻¹ = L⁻¹ ∘ (Tw)⁻¹ = L⁻¹ ∘ T₋w, which is affine by (a).

(d): If A = Tw ∘ L, B = Tx ∘ K with L ∈ ℒ(V, W), w ∈ W, K ∈ ℒ(W, X), x ∈ X, then

∀ a ∈ V:  (B ∘ A)(a) = B(w + La) = x + Kw + (K ∘ L)(a) = (T_{Kw+x} ∘ (K ∘ L))(a),

showing B ∘ A to be affine.

(e) is an immediate consequence of (c) and (d). □
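Prop. 1.27(d) and its proof suggest a simple computational representation: an affine map A = Tw ∘ L on Rⁿ can be stored as the pair (L, w), and composition then follows the formula B ∘ A = T_{x+Kw} ∘ (K ∘ L). A minimal Python sketch (function and variable names are ad hoc):

```python
import numpy as np

# Sketch: an affine map A = T_w \circ L on R^n is stored as the pair (L, w);
# by Prop. 1.27(d), composition yields B \circ A = T_{x + Kw} \circ (K L)
# for B = T_x \circ K.
def compose_affine(B, A):
    K, x = B
    L, w = A
    return K @ L, x + K @ w

L, w = np.array([[2.0, 0.0], [0.0, 1.0]]), np.array([1.0, 0.0])
K, x = np.array([[0.0, -1.0], [1.0, 0.0]]), np.array([0.0, 3.0])
M, u = compose_affine((K, x), (L, w))
a = np.array([1.0, 2.0])
print(np.allclose(M @ a + u, K @ (L @ a + w) + x))  # True
```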

Proposition 1.28. Let V and W be vector spaces over the field F .

(a) Let v ∈ V, w ∈ W, L ∈ ℒ(V, W), and let U be a vector subspace of V. Then

(Tw ∘ L)(v + U) = w + Lv + L(U)

(in particular, each affine image of an affine subspace is an affine subspace). Moreover, if A := Tw ∘ L and S ⊆ V such that M := v + U = aff S, then A(M) = w + Lv + L(U) = aff(A(S)).

(b) Let y ∈ W, L ∈ ℒ(V, W), and let U be a vector subspace of W. Then L⁻¹(U) is a vector subspace of V and

∀ v ∈ L⁻¹{y}:  L⁻¹(y + U) = v + L⁻¹(U)

(in particular, each linear preimage of an affine subspace is either empty or an affine subspace).

(c) If M ⊆ W is an affine subspace of W and A ∈ 𝒜(V, W), then A⁻¹(M) is either empty or an affine subspace of V.

Proof. Exercise. 

The following Prop. 1.29 is the analogon of [Phi19, Prop. 6.5(a),(b)] for affine spaces (but cf. Caveat 1.30 below):

Proposition 1.29. Let V and W be vector spaces over the field F, and let A : V → W be affine.

(a) If A is injective, then, for each affinely independent subset S of V, A(S) is an affinely independent subset of W.

(b) A is surjective if, and only if, for each subset S of V with V = aff S, one has W = aff(A(S)).

Proof. Let w ∈ W and L ∈ ℒ(V, W) be such that A = Tw ∘ L.

(a): If A is injective, S ⊆ V is affinely independent, and λ1, ..., λn ∈ F; s1, ..., sn ∈ S distinct; n ∈ N; such that ∑_{i=1}^n λi = 0 and

0 = ∑_{i=1}^n λi A(si) = ∑_{i=1}^n λi (w + L si) = (∑_{i=1}^n λi) w + L(∑_{i=1}^n λi si) = L(∑_{i=1}^n λi si),

then ∑_{i=1}^n λi si = 0 by [Phi19, Prop. 6.3(d)], implying λ1 = ··· = λn = 0 and, thus, showing that A(S) is also affinely independent.

(b): If A is not surjective, then aff(A(V)) = A(V) ≠ W, since A(V) is an affine subspace of W by Prop. 1.28(a). Conversely, if A is surjective, S ⊆ V, and aff(S) = V, then, by Prop. 1.28(a),

W = A(V) = A(aff S) = aff(A(S)),

thereby establishing the case. □

Caveat 1.30. Unlike in [Phi19, Prop. 6.5(a)], the converse of Prop. 1.29(a) is, in general, not true: If dim V ≥ 1 and A ≡ w ∈ W is constant, then A is affine, not injective, but it maps every nonempty affinely independent subset of V (in fact, every nonempty subset of V) onto the affinely independent set {w}.

Corollary 1.31. Let V and W be vector spaces over the field F, and let A : V → W be affine and injective. If M ⊆ V is an affine subspace and B is an affine basis of M, then A(B) is an affine basis of A(M) (Caveat 1.30 above shows that the converse is, in general, not true).

Proof. Since B is affinely independent, A(B) is affinely independent by Prop. 1.29(a). On the other hand, A(M) = aff(A(B)) by Prop. 1.28(a).  1 AFFINE SUBSPACES AND GEOMETRY 18

The following Prop. 1.32 shows that affine subspaces are precisely the images of vector subspaces under translations and also precisely the sets of solutions to linear systems with nonempty sets of solutions:

Proposition 1.32. Let V be a vector space over the field F and M ⊆ V. Then the following statements are equivalent:

(i) M is an affine subspace of V .

(ii) There exist v ∈ V and a vector subspace U ⊆ V such that M = Tv(U).

(iii) There exist a linear map L ∈ ℒ(V, V) and a vector b ∈ V such that ∅ ≠ M = L⁻¹{b} = {x ∈ V : Lx = b} (if V is finite-dimensional, then L⁻¹{b} = ℒ(L|b), where ℒ(L|b) denotes the set of solutions to the linear system Lx = b according to [Phi19, Rem. 8.3]).

Proof. "(i) ⇔ (ii)": By the definition of affine subspaces, (i) is equivalent to the existence of v ∈ V and a vector subspace U ⊆ V such that M = v + U = Tv(U), which is (ii).

"(iii) ⇒ (i)": Let L ∈ ℒ(V, V) and b ∈ V such that ∅ ≠ M = L⁻¹{b}. Let x0 ∈ M. Then, by [Phi19, Th. 4.20(f)], M = x0 + ker L, showing M to be an affine subspace.

"(i) ⇒ (iii)": Now suppose M = v + U with v ∈ V and U a vector subspace of V. According to [Phi19, Th. 5.27(c)], there exists a subspace W of V such that V = U ⊕ W. Then, clearly, L : V → V, L(u + w) := w (where u ∈ U, w ∈ W), defines a linear map. Let b := Lv. Then M = L⁻¹{b}: Indeed, if u ∈ U, then L(v + u) = Lv + 0 = Lv = b, showing M ⊆ L⁻¹{b}; if L(u + w) = w = b = Lv, then u + w = v + (u + w − v) ∈ v + U = M (since L(u + w − v) = Lw − Lv = w − Lv = 0 implies u + w − v ∈ U), showing L⁻¹{b} ⊆ M. □

The following Th. 1.33 is the analogon of [Phi19, Th. 6.9] for affine spaces:

Theorem 1.33. Let V and W be vector spaces over the field F. Moreover, let MV = v + UV ⊆ V and MW = w + UW ⊆ W be affine subspaces of V and W, respectively, where v ∈ V, w ∈ W, UV is a vector subspace of V and UW is a vector subspace of W. Let BV be an affine basis of MV and let BW be an affine basis of MW. Then the following statements are equivalent:

(i) There exists a linear isomorphism L : UV → UW such that

MW = (Tw ∘ L ∘ T₋v)(MV).

(ii) UV and UW are linearly isomorphic.

(iii) dim MV = dim MW .

(iv) #BV = #BW (i.e. there exists a bijective map from BV onto BW).

Proof. "(i) ⇒ (ii)" is trivially true.

"(ii) ⇒ (i)" holds, since the restricted translations T₋v : MV → UV and Tw : UW → MW are, clearly, bijective.

"(ii) ⇔ (iii)": By Def. 1.1, (iii) is equivalent to dim UV = dim UW, which, according to [Phi19, Th. 6.9], is equivalent to (ii).

"(iii) ⇔ (iv)": Let x ∈ BV and y ∈ BW. Then, by Prop. 1.17, SV := {b − x : b ∈ BV \ {x}} is a vector space basis of UV and SW := {b − y : b ∈ BW \ {y}} is a vector space basis of UW, where the restricted translations T₋x : BV \ {x} → SV and T₋y : BW \ {y} → SW are, clearly, bijective. Thus, if dim MV = dim MW, then there exists a bijective map φ : SV → SW, implying (Ty ∘ φ ∘ T₋x) : BV \ {x} → BW \ {y} to be bijective as well. Conversely, if ψ : BV \ {x} → BW \ {y} is bijective, so is (T₋y ∘ ψ ∘ Tx) : SV → SW, implying dim MV = dim MW. □

Analogous to [Phi19, Def. 6.17], we now consider, for vector spaces V, W over the field F, 𝒜(V, W) with pointwise addition and scalar multiplication, letting, for each A, B ∈ 𝒜(V, W), λ ∈ F,

(A + B) : V → W,  (A + B)(x) := A(x) + B(x),
(λ · A) : V → W,  (λ · A)(x) := λ · A(x).

The following Th. 1.34 corresponds to [Phi19, Th. 6.18] and [Phi19, Th. 6.21] for linear maps.

Theorem 1.34. Let V and W be vector spaces over the field F. Addition and scalar multiplication on 𝒜(V, W), given by the pointwise definitions above, are well-defined in the sense that, if A, B ∈ 𝒜(V, W) and λ ∈ F, then A + B ∈ 𝒜(V, W) and λA ∈ 𝒜(V, W). Moreover, with these pointwise defined operations, 𝒜(V, W) forms a vector space over F.

Proof. According to [Phi19, Ex. 5.2(c)], it only remains to show that 𝒜(V, W) is a vector subspace of ℱ(V, W) = W^V. To this end, let A, B ∈ 𝒜(V, W) with A = T_{w1} ∘ L1, B = T_{w2} ∘ L2, where w1, w2 ∈ W, L1, L2 ∈ ℒ(V, W), and let λ ∈ F. If v ∈ V, then

(A + B)(v) = w1 + L1 v + w2 + L2 v = w1 + w2 + (L1 + L2) v,
(λA)(v) = λ w1 + λ L1 v,

proving A + B = T_{w1+w2} ∘ (L1 + L2) ∈ 𝒜(V, W) and λA = T_{λw1} ∘ (λL1) ∈ 𝒜(V, W), as desired. □

1.6 Affine Geometry

The subject of affine geometry is concerned with the relationships between affine subspaces, in particular, with the way they are contained in each other.

Definition 1.35. Let V be a vector space over the field F and let M, N ⊆ V be affine subspaces.

(a) We define the incidence M I N by

M I N  :⇔  (M ⊆ N ∨ N ⊆ M).

If M I N holds, then we call M, N incident or M incident with N or N incident with M.

(b) If M = v + UM and N = w + UN with v, w ∈ V and UM, UN vector subspaces of V, then we call M, N parallel (denoted M ∥ N) if, and only if, UM I UN.

Proposition 1.36. Let V be a vector space over the field F and let M, N ⊆ V be affine subspaces.

(a) If M ∥ N, then M I N or M ∩ N = ∅.

(b) If n ∈ N₀ and 𝒜n denotes the set of affine subspaces of V of dimension n, then the parallelity relation of Def. 1.35(b) constitutes an equivalence relation on 𝒜n.

(c) If 𝒜 denotes the set of all affine subspaces of V, then, for dim V ≥ 2, the parallelity relation of Def. 1.35(b) is not transitive (in particular, not an equivalence relation) on 𝒜.

Proof. (a): Let M = v + UM and N = w + UN with v, w ∈ V and UM, UN vector subspaces of V. Without loss of generality, assume UM ⊆ UN. Assume there exists x ∈ M ∩ N. Then, if y ∈ M, then y − x ∈ UM ⊆ UN, implying y = x + (y − x) ∈ N and M ⊆ N.

(b): It is immediate from Def. 1.35 that ∥ is reflexive and symmetric. It remains to show ∥ is transitive on 𝒜n. Thus, suppose M = v + UM, N = w + UN, P = z + UP with v, w, z ∈ V and UM, UN, UP vector subspaces of V of dimension n. If M ∥ N, then UM I UN and dim UM = dim UN = n implies UM = UN by [Phi19, Th. 5.27(d)]. In the same way, N ∥ P implies UN = UP. But then UM = UP and M ∥ P, proving transitivity of ∥.

(c): Let u, w ∈ V be linearly independent, U := ⟨{u}⟩, W := ⟨{w}⟩. Then U ∥ V and W ∥ V, but U ∦ W (e.g., due to (a)). □

Caveat 1.37. The statement of Prop. 1.36(b) becomes false if n ∈ N₀ is replaced by an infinite cardinality: In an adaptation of the proof of Prop. 1.36(c), suppose V is a vector space over the field F, where the distinct vectors v1, v2, ... are linearly independent, and define B := {vi : i ∈ N}, U := ⟨B \ {v1}⟩, W := ⟨B \ {v2}⟩, X := ⟨B⟩. Then, clearly, U ∥ X and W ∥ X, but U ∦ W (e.g., due to Prop. 1.36(a)).

Proposition 1.38. Let V be a vector space over the field F.

(a) If x, y ∈ V with x ≠ y, then there exists a unique line l ⊆ V (i.e. a unique affine subspace l of V with dim l = 1) such that x, y ∈ l. Moreover, this affine subspace is given by

l = x + ⟨{x − y}⟩.    (1.7)

(b) If x, y, z ∈ V and there does not exist a line l ⊆ V such that x, y, z ∈ l, then there exists a unique plane p ⊆ V (i.e. a unique affine subspace p of V with dim p = 2) such that x, y, z ∈ p. Moreover, this affine subspace is given by

p = x + ⟨{y − x, z − x}⟩.    (1.8)

(c) If v1, ..., vn ∈ V, n ∈ N, then aff{v1, ..., vn} = v1 + ⟨{v2 − v1, ..., vn − v1}⟩.

Proof. Exercise. □
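Prop. 1.38(c) also yields a convenient way to compute the dimension of an affine hull numerically: it is the rank of the matrix of difference vectors v2 − v1, ..., vn − v1. A short Python sketch (the helper name is ad hoc):

```python
import numpy as np

# Sketch of Prop. 1.38(c): aff{v_1, ..., v_n} = v_1 + <{v_2 - v_1, ..., v_n - v_1}>,
# so dim aff{v_1, ..., v_n} is the rank of the matrix of difference vectors.
def affine_hull_dim(points):
    pts = np.array(points, dtype=float)
    return int(np.linalg.matrix_rank(pts[1:] - pts[0])) if len(pts) > 1 else 0

print(affine_hull_dim([[0, 0, 0], [1, 1, 0], [2, 2, 0]]))  # 1: three collinear points span a line
print(affine_hull_dim([[0, 0, 0], [1, 0, 0], [0, 1, 0]]))  # 2: a plane
```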

Proposition 1.39. Let V, W be vector spaces over the field F and let M, N ⊆ V be affine subspaces.

(a) If A ∈ 𝒜(V, W), then M I N implies A(M) I A(N), and M ∥ N implies A(M) ∥ A(N).

(b) If v ∈ V, then Tv(M) ∥ M.

Proof. (a): Let A ∈ 𝒜(V, W). Then M I N implies A(M) I A(N), since M ⊆ N implies A(M) ⊆ A(N), and A(M), A(N) are affine subspaces of W due to Prop. 1.28(a). Moreover, if M = v + UM, N = w + UN with v, w ∈ V and UM, UN vector subspaces of V, and A = Tx ∘ L with x ∈ W and L ∈ ℒ(V, W), then A(M) = x + Lv + L(UM) and A(N) = x + Lw + L(UN), such that M ∥ N implies A(M) ∥ A(N), since UM ⊆ UN implies L(UM) ⊆ L(UN).

(b) is immediate from Tv(M) = v + w + U for M = w + U with w ∈ V and U a vector subspace of V. □

2 Duality

2.1 Linear Forms and Dual Spaces

If V is a vector space over the field F , then maps from V into F are often of particular interest and importance. Such maps are sometimes called functionals or forms. Here, we will mostly be concerned with linear forms: Let us briefly review some examples of linear forms that we already encountered in [Phi19]: Example 2.1. Let V be a vector space over the field F .

(a) Let B be a basis of V. If cv : Bv → F \ {0}, Bv ⊆ B, are the corresponding coordinate maps (i.e. v = ∑_{b∈Bv} cv(b) b for each v ∈ V), then, for each b ∈ B, the projection onto the coordinate with respect to b,

πb : V → F,  πb(v) := cv(b) for b ∈ Bv,  πb(v) := 0 for b ∉ Bv,

is a linear form (cf. [Phi19, Ex. 6.7(b)]).

(b) Let I be a nonempty set, V := ℱ(I, F) = F^I (i.e. the vector space of functions from I into F). Then, for each i ∈ I, the projection onto the ith coordinate,

πi : V → F,  πi(f) := f(i),

is a linear form (cf. [Phi19, Ex. 6.7(c)]).

(c) Let F := K, where, as in [Phi19], we write K if K may stand for R or C. Let V be the set of convergent sequences in K. Then

A : V → K,  A((zn)_{n∈N}) := lim_{n→∞} zn,

is a linear form (cf. [Phi19, Ex. 6.7(e)(i)]).

(d) Let a, b ∈ R, a ≤ b, I := [a, b], and let V := ℛ(I, K) be the set of all K-valued Riemann integrable functions on I. Then

J : V → K,  J(f) := ∫_I f,

is a linear form (cf. [Phi19, Ex. 6.7(e)(iii)]).

Definition 2.2. Let V be a vector space over the field F.

(a) The functions from V into F (i.e. the elements of ℱ(V, F) = F^V) are called functionals or forms on V. In particular, the elements of ℒ(V, F) are called linear functionals or linear forms on V.

(b) The set

V′ := ℒ(V, F)    (2.1)

is called the (linear²) dual space (or just the dual) of V (in the literature, one often also finds the notation V* instead of V′). We already know from [Phi19, Th. 6.18] that V′ constitutes a vector space over F.

Corollary 2.3. Let V be a vector space over the field F. Then each linear form α : V → F is uniquely determined by its values on a basis of V. More precisely, if B is a basis of V, (λb)_{b∈B} is a family in F, and, for each v ∈ V, cv : Bv → F \ {0}, Bv ⊆ B, is the corresponding coordinate map (i.e. v = ∑_{b∈Bv} cv(b) b for each v ∈ V), then

α : V → F,  α(v) = α(∑_{b∈Bv} cv(b) b) := ∑_{b∈Bv} cv(b) λb,    (2.2)

is linear, and α̃ ∈ V′ with

∀ b ∈ B:  α̃(b) = λb

implies α = α̃.

Proof. Corollary 2.3 constitutes a special case of [Phi19, Th. 6.6]. 

Corollary 2.4. Let V be a vector space over the field F and let B be a basis of V. Using Cor. 2.3, define linear forms αb ∈ V′ by letting

∀ (b, a) ∈ B × B:  αb(a) := δba (i.e. αb(a) = 1 for a = b and αb(a) = 0 for a ≠ b).    (2.3)

Define

B′ := {αb : b ∈ B}.    (2.4)

(a) Then B′ is linearly independent.

²In functional analysis, where the vector space V over K is endowed with the additional structure of a topology (e.g., V might be the normed space Kⁿ), one defines the (topological) dual V′_top of V (there usually also just denoted as V′ or V*) to consist of all linear functionals on V that are also continuous with respect to the topology on V (cf. [Phi17b, Ex. 3.1]). Depending on the topology on V, V′_top can be much smaller than V′ – V′_top tends to be much more useful in an analysis context.

(b) If V is finite-dimensional, dim V = n ∈ N, then B′ constitutes a basis of V′ (in particular, dim V = dim V′). In this case, B′ is called the dual basis of B (and B the dual basis of B′).

(c) If dim V = ∞, then ⟨B′⟩ ⊊ V′ and, in particular, B′ is not a basis of V′ (in fact, in this case, one has dim V′ > dim V, see [Jac75, pp. 244–248]).

Proof. Cor. 2.4(a),(b),(c) constitute special cases of the corresponding cases of [Phi19, Th. 6.19]. 

Definition 2.5. If V is a vector space over the field F with dim V = n ∈ N and B := (b1, ..., bn) is an ordered basis of V, then we call B′ := (α1, ..., αn), where

∀ i ∈ {1, ..., n}:  αi ∈ V′ ∧ αi(bj) = δij,    (2.5)

the ordered dual basis of B (and B the ordered dual basis of B′) – according to Cor. 2.4(b), B′ is, indeed, an ordered basis of V′.

Example 2.6. Consider V := R². If b1 := (1, 0), b2 := (1, 1), then B := (b1, b2) is an ordered basis of V. The ordered dual basis B′ = (α1, α2) of V′ consists of the maps α1, α2 ∈ V′ with α1(b1) = α2(b2) = 1, α1(b2) = α2(b1) = 0, i.e. with, for each (v1, v2) ∈ V,

α1(v1, v2) = α1((v1 − v2) b1 + v2 b2) = v1 − v2,
α2(v1, v2) = α2((v1 − v2) b1 + v2 b2) = v2.

Notation 2.7. Let V be a vector space over the field F, dim V = n ∈ N, with ordered basis B = (b1, ..., bn). Moreover, let B′ = (α1, ..., αn) be the corresponding ordered dual basis of V′. If one then denotes the coordinates of v ∈ V with respect to B as the column vector

v = (v1, ..., vn)ᵗ,

then one typically denotes the coordinates of γ ∈ V′ with respect to B′ as the row vector

γ = (γ1 ... γn)

(this has the advantage that one then can express γ(v) as a matrix product, cf. Rem. 2.8(a) below).
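As a numerical illustration of Example 2.6 (the computation below is an ad hoc sketch using NumPy): if the columns of a matrix C hold the basis vectors b1, ..., bn with respect to the standard basis, then the rows of C⁻¹ hold the row-vector coordinates of the dual basis forms α1, ..., αn, since C⁻¹C = Id encodes precisely αi(bj) = δij.

```python
import numpy as np

# Sketch: columns of C are the basis vectors of Example 2.6; rows of C^{-1}
# are the coefficients of the dual basis forms in the standard dual basis.
C = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # columns: b_1 = (1, 0), b_2 = (1, 1)
D = np.linalg.inv(C)
print(D)       # rows: alpha_1 = (1, -1), alpha_2 = (0, 1), i.e. alpha_1(v) = v1 - v2, alpha_2(v) = v2
print(D @ C)   # identity matrix: alpha_i(b_j) = delta_ij
```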

Remark 2.8. We remain in the situation of Not. 2.7 above.

(a) We obtain

γ(v) = (∑_{k=1}^n γk αk)(∑_{l=1}^n vl bl) = ∑_{l=1}^n ∑_{k=1}^n γk vl αk(bl) = ∑_{l=1}^n ∑_{k=1}^n γk vl δkl = ∑_{k=1}^n γk vk = (γ1 ... γn)(v1, ..., vn)ᵗ.

(b) Let B̃V := (ṽ1, ..., ṽn) be another ordered basis of V and (cji) ∈ GLn(F) such that

∀ i ∈ {1, ..., n}:  ṽi = ∑_{j=1}^n cji vj.

If B̃V′ := (α̃1, ..., α̃n) denotes the ordered dual basis corresponding to B̃V and (dji) := (cji)⁻¹, then

∀ i ∈ {1, ..., n}:  α̃i = ∑_{j=1}^n dᵗji αj = ∑_{j=1}^n dij αj,

where (dᵗji) denotes the transpose of (dji), i.e. (dᵗji) ∈ GLn(F) with dᵗji := dij for each (j, i) ∈ {1, ..., n} × {1, ..., n}: Indeed, for each k, l ∈ {1, ..., n}, we obtain

(∑_{j=1}^n dkj αj)(ṽl) = (∑_{j=1}^n dkj αj)(∑_{i=1}^n cil vi) = ∑_{j=1}^n ∑_{i=1}^n dkj cil δji = ∑_{j=1}^n dkj cjl = δkl.

Proposition 2.9. Let V be a vector space over the field F. If U is a vector subspace of V and v ∈ V \ U, then there exists α ∈ V′, satisfying

α(v) = 1  ∧  ∀ u ∈ U: α(u) = 0.

Proof. Let BU be a basis of U. Then Bv := {v} ∪̇ BU is linearly independent and, according to [Phi19, Th. 5.23(a)], there exists a basis B of V such that Bv ⊆ B. According to Cor. 2.3,

α : V → F,  α(b) := 1 for b = v,  α(b) := 0 for b ≠ v  (b ∈ B),

defines an element of V′, which, clearly, satisfies the required conditions. □

Definition 2.10. Let V be a vector space over the field F .

(a) The map

⟨·,·⟩ : V × V′ → F,  ⟨v, α⟩ := α(v),    (2.6)

is called the dual pairing corresponding to V.

(b) The dual of V ′ is called the bidual or the second dual of V . One writes V ′′ := (V ′)′.

(c) The map

Φ : V → V′′,  v ↦ Φv,  where  Φv : V′ → F,  (Φv)(α) := α(v),    (2.7)

is called the canonical embedding of V into V′′ (cf. Th. 2.11 below).

Theorem 2.11. Let V be a vector space over the field F .

(a) The canonical embedding Φ : V → V′′ of (2.7) is a linear monomorphism (i.e. a linear isomorphism Φ : V ≅ Im Φ ⊆ V′′).

(b) If dim V = n ∈ N, then Φ is a linear isomorphism Φ : V ≅ V′′ (in fact, the converse is also true, i.e., if Φ is an isomorphism, then dim V < ∞, cf. the remark in Cor. 2.4(c)).

Proof. (a): Exercise. (b): According to Cor. 2.4(b), n = dim V = dim V ′ = dim V ′′. Thus, by [Phi19, Th. 6.10], the linear monomorphism Φ is also an epimorphism. 

Corollary 2.12. Let V be a vector space over the field F, dim V = n ∈ N. If B′ = {α1, ..., αn} is a basis of V′, then there exists a basis B of V such that B and B′ are dual.

Proof. According to Th. 2.11(b), the canonical embedding Φ : V → V′′ of (2.7) constitutes a linear isomorphism. Let B′′ = {f1, ..., fn} be the basis of V′′ that is dual to B′ and, for each i ∈ {1, ..., n}, bi := Φ⁻¹(fi). Then, as Φ is a linear isomorphism, B := {b1, ..., bn} is a basis of V. Moreover, B and B′ are dual:

∀ i, j ∈ {1, ..., n}:  αi(bj) = (Φbj)(αi) = fj(αi) = δij,

where we used that B′ and B′′ are dual. □

2.2 Annihilators

Definition 2.13. Let V be a vector space over the field F, M ⊆ V, S ⊆ V′. Moreover, let Φ : V → V′′ denote the canonical embedding of (2.7). Then

M⊥ := {α ∈ V′ : ∀ v ∈ M: α(v) = 0}  ( = V′ for M = ∅;  = ⋂_{v∈M} ker(Φv) for M ≠ ∅ )

is called the (forward) annihilator of M in V′,

S⊤ := {v ∈ V : ∀ α ∈ S: α(v) = 0}  ( = V for S = ∅;  = ⋂_{α∈S} ker α for S ≠ ∅ )

is called the (backward) annihilator of S in V. In view of Rem. 2.15 and Ex. 2.16(b) below, one also calls v ∈ V and α ∈ V′ such that

α(v) = ⟨v, α⟩ = 0  (cf. (2.6))

perpendicular or orthogonal and, in consequence, the sets M⊥ and S⊤ are sometimes called M perp and S perp, respectively.

Lemma 2.14. Let V be a vector space over the field F, M ⊆ V, S ⊆ V′. Then M⊥ is a subspace of V′ and S⊤ is a subspace of V. Moreover,

M⊥ = ⟨M⟩⊥,  S⊤ = ⟨S⟩⊤.    (2.8)

Proof. Since M⊥ and S⊤ are both intersections of kernels of linear maps, they are subspaces: kernels are subspaces by [Phi19, Prop. 6.3(c)] and intersections of subspaces are subspaces by [Phi19, Th. 5.7(a)]. Moreover, it is immediate from Def. 2.13 that M⊥ ⊇ ⟨M⟩⊥ and S⊤ ⊇ ⟨S⟩⊤. On the other hand, consider α ∈ M⊥ and v ∈ S⊤. Let λ1, ..., λn ∈ F, n ∈ N. If v1, ..., vn ∈ M, then, since α ∈ M⊥,

α(∑_{i=1}^n λi vi) = ∑_{i=1}^n λi α(vi) = 0,

showing α ∈ ⟨M⟩⊥ and M⊥ ⊆ ⟨M⟩⊥. Analogously, if α1, ..., αn ∈ S, then, since v ∈ S⊤,

(∑_{i=1}^n λi αi)(v) = ∑_{i=1}^n λi αi(v) = 0,

showing v ∈ ⟨S⟩⊤ and S⊤ ⊆ ⟨S⟩⊤. □

Remark 2.15. On real vector spaces V, one can study so-called inner products (also called scalar products), ⟨·,·⟩ : V × V → R, (v, w) ↦ ⟨v, w⟩ ∈ R, which, as part of their definition, have the requirement of being bilinear forms, i.e., for each v ∈ V, ⟨v, ·⟩ : V → R is a linear form and, for each w ∈ V, ⟨·, w⟩ : V → R is a linear form (we will come back to vector spaces with inner products in Sec. 10 below). One then calls vectors v, w ∈ V perpendicular or orthogonal with respect to ⟨·,·⟩ if, and only if, ⟨v, w⟩ = 0, so that the notions of Def. 2.13 can be seen as generalizing orthogonality with respect to inner products (also cf. Ex. 2.16(b) below).

Example 2.16. (a) Let V be a vector space over the field F and let U be a subspace of V with BU being a basis of U. Then, according to [Phi19, Th. 5.23(a)], there exists a basis B of V such that BU ⊆ B. Then Cor. 2.3 implies

∀ α ∈ V′:  ( α ∈ U⊥  ⇔  ∀ b ∈ BU: α(b) = 0 ).

(b) Consider the real vector space R² and let

⟨·,·⟩ : R² × R² → R,  ⟨(v1, v2), (w1, w2)⟩ := v1 w1 + v2 w2,

denote the so-called Euclidean inner product on R². Then, clearly, for each w = (w1, w2) ∈ R²,

αw : R² → R,  αw(v) := ⟨v, w⟩ = v1 w1 + v2 w2,

defines a linear form on R². Let v := (1, 2). Then the span of v, i.e. lv := {(λ, 2λ) : λ ∈ R}, represents the line through v. Moreover, for each w = (w1, w2) ∈ R²,

αw ∈ {v}⊥ = lv⊥  ⇔  αw(v) = w1 + 2w2 = 0  ⇔  w1 = −2w2  ⇔  w ∈ l⊥ := {(−2λ, λ) : λ ∈ R}.

Thus, l⊥ is spanned by (−2, 1) and we see that lv⊥ consists precisely of the linear forms αw that are given by vectors w that are perpendicular to v in the Euclidean geometrical sense (i.e. in the sense usually taught in high school geometry). —
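Numerically, the annihilator in Ex. 2.16(b) can be computed as a null space (the sketch below uses SciPy's null_space as one possible tool; representing forms as row vectors follows Not. 2.7):

```python
import numpy as np
from scipy.linalg import null_space

# Sketch of Ex. 2.16(b): representing linear forms on R^2 as row vectors,
# the annihilator of U = span{(1, 2)} consists of the rows g with g @ u = 0
# for all u in U, i.e. the null space of U^t.
U = np.array([[1.0, 2.0]])   # rows span U
perp = null_space(U)         # columns span the annihilator (row-vector coefficients)
print(perp.ravel())          # a multiple of (-2, 1), matching l_perp = <{(-2, 1)}>
print(U @ perp)              # [[0.]]: every form in U^perp vanishes on U
```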

The following notions, defined for linear forms in connection with subspaces, can sometimes be useful when studying annihilators:

Definition 2.17. Let V be a vector space over the field F and let U be a subspace of V. Then

R : V′ → U′,  Rf := f↾U,

is called the restriction operator from V to U;

I : (V/U)′ → V′,  (Ig)(v) := g(v + U),

is called the inflation operator from V/U to V.

Theorem 2.18. Let V be a vector space over the field F and let U be a subspace of V with the restriction operator R and the inflation operator I defined as in Def. 2.17.

(a) R : V′ → U′ is a linear epimorphism with ker R = U⊥. Moreover,

dim U⊥ + dim U′ = dim V′    (2.9)

and

U′ ≅ V′/U⊥    (2.10)

(see [Phi19, Th. 6.8(a)] for the precise meaning of (2.9) in case at least one of the occurring cardinalities is infinite). If dim V = n ∈ N, then one also has

n = dim V = dim U⊥ + dim U.    (2.11)

(b) I is a linear isomorphism I : (V/U)′ ≅ U⊥.

Proof. (a): Let α, β ∈ V′ and λ, µ ∈ F. Then, for each u ∈ U,

R(λα + µβ)(u) = λα(u) + µβ(u) = λ(Rα)(u) + µ(Rβ)(u) = (λ(Rα) + µ(Rβ))(u),

showing R to be linear. Moreover, for each α ∈ V′, one has

α ∈ ker R  ⇔  ∀ u ∈ U: α(u) = 0  ⇔  α ∈ U⊥,

proving ker R = U⊥. Let BU be a basis of U. Then, according to [Phi19, Th. 5.23(a)], there exists C ⊆ V such that BU ∪̇ C is a basis of V. Consider α ∈ U′. Using Cor. 2.3, define β ∈ V′ by setting

β(b) := α(b) for b ∈ BU,  β(b) := 0 for b ∈ C.

Then, clearly, Rβ = α, showing R to be surjective. Thus, we have, by [Phi19, Th. 6.8(a)],

dim V′ = dim ker R + dim Im R = dim U⊥ + dim U′,

thereby proving (2.9). Next, applying the isomorphism theorem of [Phi19, Th. 6.16(a)] yields

U′ = Im R ≅ V′/ker R = V′/U⊥,

which is (2.10). Finally, if dim V = n ∈ N, then

n = dim V = dim V′ = dim U⊥ + dim U′ = dim U⊥ + dim U,

where the second equality holds by Cor. 2.4(b), the third by (2.9), and the fourth by Cor. 2.4(b) applied to U, proving (2.11).

(b): Exercise. □

Theorem 2.19. Let V be a vector space over the field F.

(a) If U is a subspace of V , then (U ⊥)⊤ = U.

(b) If S is a subspace of V′, then S ⊆ (S⊤)⊥. If dim V = n ∈ N, then one even has (S⊤)⊥ = S.

(c) If U1, U2 are subspaces of V , then

(U1 + U2)⊥ = U1⊥ ∩ U2⊥,  (U1 ∩ U2)⊥ = U1⊥ + U2⊥.

(d) If S1, S2 are subspaces of V′, then

(S1 + S2)⊤ = S1⊤ ∩ S2⊤,  (S1 ∩ S2)⊤ ⊇ S1⊤ + S2⊤.

If dim V = n ∈ N, then one also has

(S1 ∩ S2)⊤ = S1⊤ + S2⊤.

Proof. (a): Exercise.

(b): According to Def. 2.13, we have

(S⊤)⊥ = {α ∈ V′ : ∀ v ∈ S⊤: α(v) = 0},

showing S ⊆ (S⊤)⊥. Now assume dim V = n ∈ N and suppose there exists α ∈ (S⊤)⊥ \ S. Then, according to Prop. 2.9, there exists f ∈ V′′ satisfying

f(α) = 1  ∧  ∀ β ∈ S: f(β) = 0.

Since dim V = n ∈ N, we may employ Th. 2.11(b) to conclude that the canonical embedding Φ : V → V′′ is a linear isomorphism, in particular, surjective. Thus, there exists v ∈ V such that f = Φv, i.e. f(γ) = γ(v) for each γ ∈ V′. Since f ∈ S⊥, we have β(v) = f(β) = 0 for each β ∈ S, showing v ∈ S⊤. Thus, α ∈ (S⊤)⊥ implies the contradiction 0 = α(v) = f(α) = 1. In consequence, (S⊤)⊥ \ S = ∅, proving (b).

(c): Let α ∈ (U1 + U2)⊥. Then U1 ⊆ U1 + U2 implies α ∈ U1⊥, and U2 ⊆ U1 + U2 implies α ∈ U2⊥, showing α ∈ U1⊥ ∩ U2⊥ and (U1 + U2)⊥ ⊆ U1⊥ ∩ U2⊥. Conversely, if α ∈ U1⊥ ∩ U2⊥ and u1 ∈ U1, u2 ∈ U2, then α(u1 + u2) = α(u1) + α(u2) = 0, showing α ∈ (U1 + U2)⊥ and U1⊥ ∩ U2⊥ ⊆ (U1 + U2)⊥. To prove the second equality in (c), first, let α ∈ U1⊥ + U2⊥, i.e. α = α1 + α2 with α1 ∈ U1⊥, α2 ∈ U2⊥. Then, if v ∈ U1 ∩ U2, one has α(v) = α1(v) + α2(v) = 0, showing α ∈ (U1 ∩ U2)⊥ and U1⊥ + U2⊥ ⊆ (U1 ∩ U2)⊥. Conversely, let α ∈ (U1 ∩ U2)⊥. We now proceed similarly to the proof of [Phi19, Th. 5.30(c)]: We choose bases B∩, BU1, BU2 of U1 ∩ U2, U1, and U2, respectively, such that B∩ ⊆ BU1 and B∩ ⊆ BU2, defining B1 := BU1 \ B∩, B2 := BU2 \ B∩. Then it was shown in the proof of [Phi19, Th. 5.30(c)] that

B+ := BU1 ∪ BU2 = B1 ∪̇ B2 ∪̇ B∩

is a basis of U1 + U2 and we may choose C ⊆ V such that B := B+ ∪̇ C is a basis of V. Using Cor. 2.3, we now define α1, α2 ∈ V′ by setting, for each b ∈ B,

α1(b) := α(b) for b ∈ B \ B1,  α1(b) := 0 for b ∈ B1;
α2(b) := α(b) for b ∈ B1,   α2(b) := 0 for b ∈ B \ B1.

Since α↾B∩ ≡ 0, we obtain α1↾BU1 = α1↾(B1 ∪̇ B∩) ≡ 0 and α2↾BU2 = α2↾(B2 ∪̇ B∩) ≡ 0, showing α1 ∈ U1⊥ and α2 ∈ U2⊥. On the other hand, we have, for each b ∈ B, α(b) = α1(b) + α2(b), showing α = α1 + α2, α ∈ U1⊥ + U2⊥, and (U1 ∩ U2)⊥ ⊆ U1⊥ + U2⊥, thereby completing the proof of (c).

(d): Exercise. □

2.3 Hyperplanes and Linear Systems

In the present section, we combine duality with the theory of affine spaces of Sec. 1 and with the theory of linear systems of [Phi19, Sec. 8].

Definition 2.20. Let V be a vector space over the field F. If α ∈ V′ \ {0} and r ∈ F, then the set

Hα,r := α⁻¹{r} = {v ∈ V : α(v) = r} ⊆ V

is called a hyperplane in V.

Notation 2.21. Let V be a vector space over the field F, v ∈ V, and α ∈ V′. We then write

v⊥ := {v}⊥,  α⊤ := {α}⊤.

Theorem 2.22. Let V be a vector space over the field F.

(a) Each hyperplane H in V is an affine subspace of V, where dim V = 1 + dim H, i.e. dim V = dim H if V is infinite-dimensional, and dim H = n − 1 if dim V = n ∈ N. More precisely, if 0 ≠ α ∈ V′ and r ∈ F, then

∀ w ∈ V with α(w) ≠ 0:  Hα,r = r α(w)⁻¹ w + α⊤ = r α(w)⁻¹ w + ker α.    (2.12)

(b) If dim V = n ∈ N and M is an affine subspace of V with dim M = n − 1, then M is a hyperplane in V, i.e. there exist 0 ≠ α ∈ V′ and r ∈ F such that M = Hα,r.

(c) Let α, β ∈ V′ \ {0} and r, s ∈ F. Then

Hα,r = Hβ,s  ⇔  ∃ 0 ≠ λ ∈ F: (β = λα ∧ s = λr).

Moreover, if α = β, then Hα,r and Hβ,s are parallel.

Proof. (a): Let 0 ≠ α ∈ V′ and r ∈ F, and let w ∈ V be such that α(w) ≠ 0. Define v := r α(w)⁻¹ w. Then α(v) = r, i.e. v ∈ Hα,r = α⁻¹{r}. Thus, by [Phi19, Th. 4.20(f)], we have

Hα,r = v + ker α = v + {x ∈ V : α(x) = 0} = v + α⊤,

proving (2.12). In particular, we have dim Hα,r = dim ker α and, by [Phi19, Th. 6.8(a)],

dim V = dim ker α + dim Im α = dim Hα,r + dim F = dim Hα,r + 1,

thereby completing the proof of (a). (b),(c): Exercise. 

Proposition 2.23. Let V be a vector space over the field F, dim V = n ∈ N. If M ⊆ V is an affine subspace of V with dim M = m ∈ N₀, m < n, then there exist hyperplanes H1, ..., H_{n−m} in V such that M = ⋂_{i=1}^{n−m} Hi.

Proof. Let M = v + U with v ∈ V and a vector subspace U ⊆ V, dim U = m. By (2.11), dim U⊥ = n − m. Choose a basis {α1, ..., α_{n−m}} of U⊥ (in particular, each αi ≠ 0) and define

∀ i ∈ {1, ..., n−m}:  ri := αi(v).

We claim M = N := ⋂_{i=1}^{n−m} H_{αi,ri}: Indeed, if x = v + u with u ∈ U, then, since αi ∈ U⊥,

∀ i ∈ {1, ..., n−m}:  αi(x) = αi(v) + αi(u) = ri + 0 = ri,

showing x ∈ N and M ⊆ N. Conversely, let x ∈ N. Then

∀ i ∈ {1, ..., n−m}:  αi(x − v) = ri − ri = 0,  i.e.  x − v ∈ αi⊤,

implying

x − v ∈ ⟨{α1, ..., α_{n−m}}⟩⊤ = (U⊥)⊤ = U  (using Th. 2.19(a)),

showing x ∈ v + U = M and N ⊆ M, as claimed. □
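The construction in the proof of Prop. 2.23 can be carried out numerically (an ad hoc sketch, assuming NumPy/SciPy): with forms as row vectors, a basis of U⊥ spans the null space of the transpose of a spanning matrix of U, and the ri are obtained by evaluating at v.

```python
import numpy as np
from scipy.linalg import null_space

# Sketch of Prop. 2.23: write the line M = v + <{u}> in R^3 as an
# intersection of 3 - 1 = 2 hyperplanes H_{alpha_i, r_i}.
v = np.array([1.0, 2.0, 3.0])
u = np.array([1.0, 1.0, 1.0])
alphas = null_space(u[None, :]).T   # rows: a basis alpha_1, alpha_2 of U^perp
rs = alphas @ v                     # r_i := alpha_i(v)
# Check: the points v + t*u of M satisfy alpha_i(x) = r_i for both hyperplanes.
for t in (-1.0, 0.0, 2.5):
    x = v + t * u
    print(np.allclose(alphas @ x, rs))  # True, True, True
```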

n

∀ j ∈ {1, ..., m}:  ∑_{k=1}^n ajk xk = bj,    (2.13)

where m, n ∈ N, b1, ..., bm ∈ F, and aji ∈ F for j ∈ {1, ..., m}, i ∈ {1, ..., n}. We know that we can also write (2.13) in matrix form as Ax = b with A = (aji) ∈ ℳ(m, n, F) and b = (b1, ..., bm)ᵗ ∈ ℳ(m, 1, F) ≅ F^m. The set of solutions to (2.13) is

ℒ(A|b) = {x ∈ F^n : Ax = b}.

If we now define the linear forms

∀ j ∈ {1, ..., m}:  αj : F^n → F,  αj((v1, ..., vn)ᵗ) := (aj1 ... ajn)(v1, ..., vn)ᵗ = ∑_{k=1}^n ajk vk,

then we can rewrite (2.13) as

( ∀ j ∈ {1, ..., m}:  x = (x1, ..., xn)ᵗ ∈ Hαj,bj )  ⇔  x ∈ ⋂_{j=1}^m Hαj,bj.    (2.14)

Thus, we have ℒ(A|b) = ⋂_{j=1}^m Hαj,bj and we can view (2.14) as a geometric interpretation of (2.13), namely that the solution vectors x are required to lie in the intersection of the m hyperplanes Hα1,b1, ..., Hαm,bm. Even though we know from [Phi19, Th. 8.15] that the elementary row operations of [Phi19, Def. 8.13] do not change the set of solutions ℒ(A|b), it might be instructive to reexamine this fact in terms of linear forms and hyperplanes: The elementary row operation of row switching merely corresponds to changing the order of the Hαj,bj in the intersection yielding ℒ(A|b). The elementary row operation of row multiplication rj ↦ λrj (0 ≠ λ ∈ F) does not change ℒ(A|b) due to Hαj,bj = Hλαj,λbj according to Th. 2.22(c). The elementary row operation of row addition rj ↦ rj + λri

(λ ∈ F, i ≠ j) replaces Hαj,bj by Hαj+λαi,bj+λbi. We verify, once again, what we already know from [Phi19, Th. 8.15], namely

ℒ(A|b) = ⋂_{k=1}^m Hαk,bk = M := ( ⋂_{k=1, k≠j}^m Hαk,bk ) ∩ Hαj+λαi,bj+λbi:

If x ∈ ℒ(A|b), then (αj + λαi)(x) = bj + λbi, showing x ∈ Hαj+λαi,bj+λbi and x ∈ M. Conversely, if x ∈ M, then αj(x) = (αj + λαi)(x) − λαi(x) = bj + λbi − λbi = bj, showing x ∈ Hαj,bj and x ∈ ℒ(A|b).
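A quick numerical illustration of the row-addition argument above (the example system is ad hoc):

```python
import numpy as np

# Sketch: the row operation r_2 -> r_2 + 2*r_1 replaces H_{alpha_2, b_2} by
# H_{alpha_2 + 2*alpha_1, b_2 + 2*b_1} without changing the solution set L(A|b).
A = np.array([[1.0, 1.0], [1.0, -1.0]])
b = np.array([3.0, 1.0])
x = np.linalg.solve(A, b)           # unique solution (2, 1)
A2, b2 = A.copy(), b.copy()
A2[1] += 2 * A2[0]                  # alpha_2 + 2*alpha_1
b2[1] += 2 * b2[0]                  # b_2 + 2*b_1
print(x, np.linalg.solve(A2, b2))   # both give [2. 1.]
```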

2.4 Dual Maps

Theorem 2.25. Let V, W be vector spaces over the field F. If A ∈ ℒ(V, W), then there exists a unique map A′ : W′ → V′ such that (using the notation of (2.6))

∀ v ∈ V ∀ β ∈ W′:  (A′β)(v) = ⟨v, A′β⟩ = ⟨Av, β⟩ = β(Av).    (2.15)

Moreover, this map turns out to be linear, i.e. A′ ∈ ℒ(W′, V′).

Proof. Clearly, given A ∈ ℒ(V, W), (2.15) uniquely defines a map A′ : W′ → V′ (for each β ∈ W′, (2.15) defines the map A′β = β ∘ A ∈ ℒ(V, F) = V′). It merely remains to check that A′ is linear. To this end, let β, β1, β2 ∈ W′, λ ∈ F, and v ∈ V. Then

(A′(β1 + β2))(v) = (β1 + β2)(Av) = β1(Av) + β2(Av) = (A′β1)(v) + (A′β2)(v) = (A′β1 + A′β2)(v),
(A′(λβ))(v) = (λβ)(Av) = λ(A′β)(v),

showing A′(β1 + β2) = A′β1 + A′β2, A′(λβ) = λA′(β), and the linearity of A′. □

Definition 2.26. Let V, W be vector spaces over the field F, A ∈ ℒ(V, W). Then the map A′ ∈ ℒ(W′, V′) given by Th. 2.25 is called the dual map corresponding to A (or the transpose of A).

Theorem 2.27. Let F be a field, m, n ∈ N. Let V, W be finite-dimensional vector spaces over F, dim V = n, dim W = m, where BV := (v1, ..., vn) is an ordered basis of V and BW := (w1, ..., wm) is an ordered basis of W. Moreover, let BV′ = (α1, ..., αn) and BW′ = (β1, ..., βm) be the corresponding (ordered) dual bases of V′ and W′, respectively.

If A ∈ ℒ(V, W) and (aji) ∈ ℳ(m, n, F) is the matrix corresponding to A with respect to BV and BW, then the transpose of (aji), i.e.

(aᵗji)_{(j,i)∈{1,...,n}×{1,...,m}} ∈ ℳ(n, m, F),  where  aᵗji := aij for each (j, i) ∈ {1, ..., n} × {1, ..., m},

is the matrix corresponding to A′ with respect to BW′ and BV′.

Proof. If (aji) is the matrix corresponding to A with respect to BV and BW, then (cf. [Phi19, Th. 7.10(b)])

∀ i ∈ {1, ..., n}:  A vi = ∑_{j=1}^m aji wj,    (2.16)

and we have to show

∀ j ∈ {1, ..., m}:  A′βj = ∑_{i=1}^n aᵗij αi = ∑_{i=1}^n aji αi.    (2.17)

Indeed, one computes, for each j ∈ {1, ..., m} and for each k ∈ {1, ..., n},

(A′βj)(vk) = βj(A vk) = βj(∑_{l=1}^m alk wl) = ∑_{l=1}^m alk βj(wl) = ∑_{l=1}^m alk δjl = ajk = ∑_{i=1}^n aji δik = ∑_{i=1}^n aji αi(vk) = (∑_{i=1}^n aji αi)(vk),

thereby proving (2.17). □

Remark 2.28. As in Th. 2.27, let m,n N, let V,W be finite-dimensional vector spaces ∈ over the field F , dim V = n, dim W = m, where BV := (v1,...,vn) is an ordered basis of V and BW := (w1,...,wm) is an ordered basis of W with corresponding (ordered) ′ ′ dual bases BV ′ = (α1,...,αn) of V and BW ′ = (β1,...,βm) of W . Let A (V,W ) with dual map A′ (W ′,V ′). ∈ L ∈L

(a) If (aji) (m,n,F ) is the matrix corresponding to A with respect to BV and BW and if one∈M represents elements of the duals as column vectors, then, according to Th. 2.27, one obtains, for γ = m γ β W ′ and ǫ := A′(γ) = n ǫ α V ′ i=1 i i ∈ i=1 i i ∈ with γ1,...,γm,ǫ1,...,ǫn F , ∈ P P

ǫ1 γ1 . t .  .  =(aji)  .  . ǫn γm         2 DUALITY 36

However, if one adopts the convention of Not. 2.7 to represent elements of the duals as row vectors, then one applies transposes in the above equation to obtain

ǫ1 ... ǫn = γ1 ... γm (aji), showing that this notation allows A and A′ to be represented by the same matrix (aji). (b) As in [Phi19, Th. 7.14] and Rem. 2.8(b) above, we now consider basis transitions n m

v˜i = cjivj, w˜i = fjiwj,, i∈{1∀,...,n} i∈{1∀,...,m} j=1 j=1 X X B˜V := (˜v1,..., v˜n), B˜W := (˜w1,..., w˜m). We then know from [Phi19, Th. 7.14] that −1 the matrix representing A with respect to B˜V and B˜W is (fji) (aji)(cji). Thus, ac- ′ ˜′ cording to Th. 2.26, the matrix representing A with respect to the dual bases BV = ˜′ ˜ ˜ −1 t t t −1 t (˜α1,..., α˜n) and BW = (β1,..., βm) is (fji) (aji)(cji) = (cji) (aji) ((fji) ) . Of course, we can, alternatively, observe that, by Rem. 2.8(b), the basis transi- ˜′ −1 t  tion from BV ′ to BV is given by ((cji) ) and the basis transition from BW ′ to ˜′ −1 t ′ BW is given by ((fji) ) and compute the matrix representing A with respect ˜′ ˜′ t t −1 t to BV and BW via Th. 2.26 and [Phi19, Th. 7.14] to obtain (cji) (aji) ((fji) ) , as before. If γ = m γ β˜ W ′ and ǫ := A′(γ) = n ǫ α˜ V ′ with i=1 i i ∈ i=1 i i ∈ γ1,...,γm,ǫ1,...,ǫn F , then this yields ∈P P −1 ǫ1 ... ǫn = γ1 ... γm (fji) (aji)(cji).

(c) Comparing with [Phi19, Rem. 7.24], we observe that the dual map A′ (W ′,V ′) is precisely the transpose map At of the map A considered in [Phi19,∈L Rem. 7.24]. Moreover, as a consequence of Th. 2.26, the rows of the matrix (aji), representing ′ A, span Im A in the same way that the columns of (aji) span Im A. Theorem 2.29. Let V,W be vector spaces over the field F .

(a) The duality map ′ : (V,W ) (W ′,V ′), A A′, is linear. L −→ L 7→ (b) If X is another vector space over F , A (V,W ) and B (W, X), then ∈L ∈L (BA)′ = A′B′.

Proof. (a): Let A, B (V,W ), λ F , β W ′, and v V . Then we compute ∈L ∈ ∈ ∈ (A + B)′(β)(v)= β (A + B)(v) = β(Av + Bv)= β(Av)+ β(Bv) ′ ′ ′ ′ =(A β)(v)+(Bβ)(v)=(A + B )(β)(v), (λA)′(β)(v)= β (λA)(v) = λβ(Av)=(λA′)(β)(v),  2 DUALITY 37

showing (A + B)′ = A′ + B′, (λA)′ = λA′, and the linearity of ′. (b): If γ X′, then ∈ (B A)′(γ)= γ (B A)=(γ B) A = A′(γ B)= A′(B′γ)=(A′ B′)(γ), ◦ ◦ ◦ ◦ ◦ ◦ ◦ showing (B A)′ = A′ B′.  ◦ ◦ Theorem 2.30. Let V,W be vector spaces over the field F and A (V,W ). ∈L (a) ker A′ = (Im A)⊥.

(b) ker A = (Im A′)⊤.

(c) A is an epimorphism if, and only if, A′ is a monomorphism.

(d) If A′ is an epimorphism, then A is a monomorphism. If A is a monomorphism and dim V = n N, then A′ is an epimorphism. ∈ (e) If A′ is an isomorphism, then A is a isomorphism. If A is a isomorphism and dim V = n N, then A′ is an isomorphism. ∈ Proof. (a) is due to the equivalence

β ker A′ β(Av)=(A′β)(v)=0 β (Im A)⊥. ∈ ⇔ v∈∀V ⇔ ∈   (b): Exercise. (c): If A is an epimorphism, then Im A = W , implying

(a) ker A′ = (Im A)⊥ = W ⊥ = 0 , { } showing A′ to be a monomorphism. Conversely, if A′ is a monomorphism, then ker A′ = 0 , implying { } Th. 2.19(a) ⊤ (a) Im A = (Im A)⊥ = (ker A′)⊤ = 0 ⊤ = W, { } showing A to be an epimorphism.  (d): Exercise. (e) is now immediate from combining (c) and (d).  3 SYMMETRIC GROUPS 38

Theorem 2.31. Let V,W be vector spaces over the field F with canonical embeddings ′′ ′′ ΦV : V V and ΦW : W W according to Def. 2.10(c). Let A (V,W ) and A′′ := (A−→′)′ (V ′′,W ′′). Then−→ we have ∈ L ∈L Φ A = A′′ Φ . (2.18) W ◦ ◦ V Proof. If v V and β W ′, then ∈ ∈ (Φ A)(v) (β)= β(Av)=(A′β)(v)=(Φ v) A′β = (Φ v) A′ (β) W ◦ V V ◦ = A′′(Φ v) (β)= (A′′ Φ )(v) (β)  V ◦ V   proves (2.18).   

3 Symmetric Groups

In preparation for the introduction of the notion of determinant (which we will find to be a useful tool to further study linear endomorphisms between finite-dimensional vector spaces), we revisit the symmetric group Sn of [Phi19, Ex. 4.9(b)].

Definition 3.1. Let k,n N, k n. A permutation π Sn is called a k-cycle if, and only if, there exist k distinct∈ numbers≤ i ,...,i 1,...,n∈ such that 1 k ∈ { } i if i = i , j 1,...,k 1 , j+1 j ∈ { − } π(i)= i1 if i = ik, (3.1) i if i / i1,...,ik . ∈ { } A 2-cycle is also known as a transposition .

Notation 3.2. Let n N, π S . ∈ ∈ n (a) One writes i i ... i π = 1 2 n , π(i ) π(i ) ... π(i )  1 2 n  where i ,...,i = 1,...,n . { 1 n} { } (b) If π is a k-cycle as in (3.1), k N, then one also writes ∈

π =(i1 i2 ... ik). (3.2) 3 SYMMETRIC GROUPS 39

Example 3.3. (a) Consider π S , ∈ 5 54321 π = =(315). 34125   Then π(1) = 5, π(2) = 2, π(3) = 1, π(4) = 4, π(5) = 3.

(b) Letting π S5 be as in (a), we have (recalling that the composition on Sn is merely the usual composition∈ of maps)

54321 12345 π = = (3 5)(3 1). 41253 23451   

Lemma 3.4. Let α,k,n N, α k n, and consider distinct numbers i1,...,ik 1,...,n . Then, in S , the∈ following≤ statements≤ hold true: ∈ { } n (a) One has (i1 i2 ... ik)=(iα iα+1 ... ik i1 ... iα−1).

(b) Let n 2. If 1 <α

(i1 i2 ... ik)(i1 iα)=(i1 iα+1 ... ik)(i2 ... iα);

and, moreover,

(i1 i2 ... ik)(i1 ik)=(i2 ... ik)(i1).

(c) Let n 2. Given β,l N, β l n, and distinct numbers j1, . . . , jl 1,...,n i ,...,i≥ , one has ∈ ≤ ≤ ∈ { } \ { 1 k}

(i1 ... ik)(j1 . . . jl)(i1 j1)=(j1 i2 i3 ... ik i1 j2 j3 . . . jl).

Proof. Exercise. 

Notation 3.5. Let M,N be sets. Define

S(M,N) := (f : M N): f bijective . { −→ } Proposition 3.6. Let M,N be sets with #M =#N = n N , S := S(M,N) (cf. Not. ∈ 0 3.5). Then #S = n!; in particular #SM = n!. 3 SYMMETRIC GROUPS 40

Proof. We conduct the proof via induction: If n = 0, then S contains precisely the empty map (i.e. the empty set) and #S = 1 = 0! is true. If n = 1 and M = a , N = b , then S contains precisely the map f : M N, f(a)= b, and #S =1=1!{ } is true.{ } For the induction step, fix n N and assume−→ #M =#N = n +1. Let a M and ∈ ∈ := S M a ,N b . (3.3) A \ { } \ { } b[∈N   Since the union in (3.3) is finite and disjoint, one has

ind.hyp. # = #S M a ,N b = (n!)=(n + 1) n!=(n + 1)!. A \ { } \ { } · Xb∈N   Xb∈N Thus, it suffices to show

φ : S , φ(f): M a N f(a) , φ(f) := f ↾ , −→ A \ { } −→ \ { } M\{a}

is well-defined and bijective. If f : M N is bijective, then f ↾M\{a}: M a N f(a) is bijective as well, i.e. φ is−→ well-defined. Suppose f,g S with \f {=}g −→. If f(a\) {= g(a}), then φ(f) = φ(g), as they have different ranges. If f(a)=∈ g(a), then6 there exists6 x M a with6 f(x) = g(x), implying φ(f)(x) = f(x) = g(x) = φ(g)(x), i.e., once again,∈ φ(\f) {=}φ(g). Thus,6 φ is injective. Now let h S M 6 a ,N b for some b N. Letting 6 ∈ \ { } \ { } ∈  b for x = a, f : M N, f(x) := −→ h(x) for x = a, ( 6 we have φ(f)= h, showing φ to be surjective as well. 

Theorem 3.7. Let n N. ∈ (a) Each permutation can be decomposed into finitely many disjoint cycles: For each π Sn, there exists a decomposition of 1,...,n into disjoint sets A1,...,AN , N ∈ N, i.e. { } ∈ N 1,...,n = A and A A = for i = j, (3.4) { } i i ∩ j ∅ 6 i=1 [ such that Ai consists of the distinct elements ai1,...,ai,Ni and

π =(a ... a ) (a ... a ). (3.5) N1 N,NN ··· 11 1,N1 The decomposition (3.5) is unique up to the order of the cycles. 3 SYMMETRIC GROUPS 41

(b) If n 2, then every permutation π Sn is the composition of finitely many trans- positions,≥ where each transposition permutes∈ two juxtaposed elements, i.e.

π = τN τ1, (3.6) π∈∀Sn N∃∈N τ1,...,τ∃N ∈T ◦···◦

where T := (i i +1) : i 1,...,n 1 . ∈ { − }  Proof. (a): We prove the statement by induction on n. For n = 1, there is nothing to prove. Let n> 1 and choose i 1,...,n . We claim that ∈ { }

πk(i)= i πl(i) = i . (3.7) k∈∃N ∧l∈{1,...,k ∀ −1} 6   k Indeed, since 1,...,n is finite, there must be a smallest k N such that π (i) A1 := i,π(i),...,πk{−1(i) . Since} π is bijective, it must be πk(i)=∈ i and (iπ(i) ...,π∈k−1(i)) is{ a k-cycle. We are} already done in case k = n. If k

i1 for i = ik,

(i1 i2)(i2 i3) (ik−1 ik)(i)= il+1 for i = il, l 1,...,k 1 , i∈{1∀,...,n} ···  ∈ { − } i for i / i1,...,ik , ∈ { }  3 SYMMETRIC GROUPS 42

proving (3.8). To finish the proof of (b), we observe that every transposition is a composition of finitely many elements of T : If i, j 1,...,n , i < j, then ∈ { } (i j)=(i i + 1) (j 2 j 1)(j 1 j) (i +1 i + 2)(i i +1): (3.9) ··· − − − ··· Indeed, (i i + 1) (j 2 j 1)(j 1 j) (i +1 i + 2)(i i + 1)(k) k∈{1∀,...,n} ··· − − − ··· j for k = i, i for k = j, =  k for i

π = hi (3.10) i=1 Y 3 SYMMETRIC GROUPS 43

with transpositions h1,...,hk Sn and π is decomposed into N cycles according to Th. 3.7(a), then ∈ k n N (mod 2), (3.11) ≡ − i.e. the parity of k is uniquely determined by π. (b) The map 1 for n =1, sgn : Sn 1, 1 , sgn(π) := (a) −→ {− } (( 1)k = ( 1)n−N (mod 2) for n 2, − − ≥ where, for n 2, k = k(π) and N = N(π) are as in (a), constitutes a group epimorphism (here,≥ 1, 1 is considered as a multiplicative subgroup of R (or Q) {− } – as we know all groups with two elements to be isomorphic, we have ( 1, 1 , ) = {− } · ∼ (Z2, +)). One calls sgn(π) the sign, signature, or signum of the permutation π.

Proof. (a): We conduct the proof via induction on k: For k = 0, the product in (3.10) is empty, i.e. π = Id = (1) (n), ··· yielding N = n, showing (3.11) to hold. If k = 1, then π = h1 is a transposition and, thus, has N = n 1 cycles, showing (3.11) to hold once again. Now assume (3.11) for − k 1 by induction and consider π = k+1 h = k h h . Thus, π = π h , ≥ i=1 i i=1 i k+1 k k+1 k where πk := i=1 hi. If πk has Nk cycles,Q then, by induction,Q 

Q k n Nk (mod 2). (3.12) ≡ − Moreover, from Rem. 3.8 we know N = Nk +1 or N = Nk 1. In both cases, (3.12) implies k +1 n N (mod 2), completing the induction. − ≡ − (b): For n = 1, there is nothing to prove. For n 2, we first note sgn to be well-defined, ≥ as the number of cycles N(π) is uniquely determined by π Sn (and each π Sn can be written as a product of transpositions by Th. 3.7(b)).∈ Next, we note sgn∈ to be surjective, since, for the identity, we can choose k = 0, i.e. sgn(Id) = ( 1)0 = 1, and, for each transposition τ S (as n 2, S contains at least the transposition− τ = (1 2)), ∈ n ≥ n we can choose k = 1, i.e. sgn(τ)=( 1)1 = 1. To verify sgn to be a homomorphism, − − let π,σ Sn. By Th. 3.7(b), there are transpositions τ1,...,τkπ ,h1,...,hkσ Sn such that ∈ ∈ kπ kσ kπ kσ π = τ , σ = h πσ = τ h , i i ⇒ i i i=1 i=1 i=1 ! i=1 ! Y Y Y Y implying sgn(πσ)=( 1)kπ+kσ =( 1)kπ ( 1)kσ = sgn(π) sgn(σ), − − − thus, completing the proof that sgn constitutes a homomorphism.  3 SYMMETRIC GROUPS 44

Proposition 3.12. Let n N. ∈ (a) One has π(i) π(j) sgn(π)= − . (3.13) π∈∀Sn i j 1≤i

where, for each i 1,...,N , γi := (ai1 ... ai,Ni ) is a cycle of length Ni N. Then ∈ { } ∈ PN (N −1) sgn(π)=( 1) i=1 i . (3.15) − Proof. (a): For each π S , let σ(π) denote the value given by the right-hand side of ∈ n (3.13). If n = 1, then Sn = Id and σ(Id) = 1 = sgn(Id), since the product in (3.13) is empty (and, thus, equal to 1).{ } For n 2, we first show ≥

σ(π1 π2)= σ(π1)σ(π2): π1,π∀2∈Sn ◦ For each π ,π S , one computes 1 2 ∈ n π (π (i)) π (π (j)) σ(π π ) = 1 2 − 1 2 1 ◦ 2 i j 1≤i

thereby establishing the case. Next, if τ Sn is a transposition, then there exist elements i, j 1,...,n such that i < j and τ =(∈ i j). Thus, ∈ { } τ(i) τ(j) j i σ(τ)= − = − = 1 i j i j − − − holds for each transposition τ. In consequence, if π Sn is the composition of k N transpositions, then ∈ ∈ Th. 3.11(b) σ(π)=( 1)k = sgn(π), − proving (a). 3 SYMMETRIC GROUPS 45

(b): Using (3.14) together with the homomorphism property of sgn given by Th. 3.11(b), if suffices to show that sgn(γ)=( 1)k−1 (3.16) − holds for each cycle γ := (i ... i ) S , where i ,...,i are distinct elements of 1 k ∈ n 1 k 1,...,n , k N. According to (3.8), we have { } ∈ γ =(i ... i )=(i i )(i i ) (i i ), 1 k 1 2 2 3 ··· k−1 k showing γ to be the product of k 1 transpositions, thereby proving (3.16) and the proposition. − 

Definition 3.13. Let n N, n 2, and let sgn : Sn 1, 1 be the group homomorphism defined in∈ Th. 3.11(b)≥ above. We call π −→S {even− }if, and only if, ∈ n sgn(π) = 1; we call π odd if, and only if, sgn(π) = 1. The property of being even or odd is called the parity of the permutation π. Moreover,− we call A := kersgn = π S : sgn(π)=1 n { ∈ n } the alternating group on 1,...,n . { } Proposition 3.14. Let n N, n 2. ∈ ≥

(a) An is a normal subgroup of Sn and one has S /A = Imsgn = 1, 1 = Z (3.17) n n ∼ { − } ∼ 2 (where 1, 1 is considered with multiplication and Z2 = 0, 1 is considered with addition{ modulo− } 2). { }

(b) For each transposition τ Sn, one has τ / An, Sn = (Anτ) ˙ An, where we recall that ˙ denotes a disjoint union∈ and A τ denotes∈ the coset πτ :∪π A . Moreover, ∪ n { ∈ n} #An = #(Anτ)=(n!)/2.

Proof. (a): As the kernel of a homomorphism, An is a normal subgroup by [Phi19, Ex. 4.25(a)] and, thus, (3.17) is immediate from the isomorphism theorem [Phi19, Th. 4.27(b)]. (b): Let τ S be a transposition. Since sgn(τ)=( 1)1 = 1, τ / A . Moreover, if ∈ n − − ∈ n π An, then sgn(πτ) = sgn(π)sgn(τ)=1 ( 1) = 1, showing (Anτ) An = . Let π ∈ S A . Then · − − ∩ ∅ ∈ n \ n sgn(π)= 1 sgn(πτ)=1 πτ A π = πττ A τ, − ⇒ ⇒ ∈ n ⇒ ∈ n showing Sn =(Anτ) An. To prove #An = #(Anτ), we note the maps φ : An Anτ, φ(π) := πτ, φ−1 : A∪τ A , φ−1(π) := πτ, to be inverses of each other and,−→ thus, n −→ n bijective. Moreover, as we know #Sn = n! from Prop. 3.6, we have n!=#Sn = #A + #(A τ)=2 (#A ), thereby completing the proof.  n n · n 4 MULTILINEAR MAPS AND DETERMINANTS 46

4 Multilinear Maps and Determinants

Employing the preparations of the previous section, we are now in a position to formulate an ad hoc definition of the notion of determinant. However, it seems more instructive to first study some more general related notions that embed determinants into a context that is also of independent interest and importance. Determinants are actually rather versatile objects: We will see below that, given a finite-dimensional vector space V over a field F , they can be viewed as maps det : (V,V ) F , assigning numbers to linear endomorphisms. However, they can also beL viewed as−→ functions det : (n,F ) F , assigning numbers to quadratic matrices, and also as polynomial functionsM det : F−→n2 F of degree n in n2 variables. But the point of view, we will focus on first, is determinants−→ as (alternating) multilinear forms (i.e. F -valued maps) det : V n F . −→ We start with a general introduction to multilinear maps, which have many important applications, not only in Algebra, but also in Analysis, where they, e.g., occur in the form of higher order total derivatives of maps from Rn to Rm (cf. [Phi16b, Sec. 4.6]) and when studying the integration of so-called differential forms (see, e.g., [For17, 19] § and [K¨on04, Sec. 13.1]).

4.1 Multilinear Maps

Definition 4.1. Let V and W be vector spaces over the field F , α N. We call a map ∈ L : V α W (4.1) −→ multilinear (more precisely, α times linear, bilinear for α = 2) if, and only if, it is linear in each component, i.e., for each x1,...,xi−1,xi+1,...,xα,v,w V , i 1,...,α and each λ,µ F : ∈ ∈ { } ∈ L(x1,...,xi−1,λv + µw,xi+1,...,xα) = λL(x1,...,xi−1,v,xi+1,...,xα)+ µL(x1,...,xi−1,w,xi+1,...,xα), (4.2) where we note that, here, and in the following, the superscripts merely denote upper indices and not exponentiation. We denote the set of all α times linear maps from V α into W by α(V,W ). We also set 0(V,W ) := W . In extension of Def. 2.2(a), we also call L αL(V,F ) a multilinear formL, a bilinear form for α = 2. ∈L α Remark 4.2. In the situation of Def. 4.1, each (V,W ), α N0, constitutes a vector space over F : It is a subspace of the vector spaceL over F of all functions∈ from V α into W , since, clearly, if K,L : V α W are both α times linear and λ,µ F , then λK + µL −→ ∈ is also α times linear. — 4 MULTILINEAR MAPS AND DETERMINANTS 47

The following Th. 4.3 is in generalization of [Phi19, Th. 6.6].

Theorem 4.3. Let V and W be vector spaces over the field F , α N. Moreover, let B be a basis of V . Then each α times linear map L α(V,W ) is uniquely∈ determined α ∈L by its values on B : More precisely, if (w ) α is a family in W , and, for each v V , b b∈B ∈ Bv and cv : Bv F 0 are as in [Phi19, Th. 5.19] (providing the coordinates of v with respect to B−→in the\ usual{ } way), then the map

α 1 α 1 1 α α L : V W, L(v ,...,v )= L c 1 (b ) b ,..., c α (b ) b −→  v v  b1∈B bα∈B α Xv1 Xv  1 α  := c 1 (b ) c α (b ) w 1 α , (4.3) v ··· v (b ,...,b ) (b1,...,bα)∈B ×···×B α Xv1 v is α times linear, and L˜ α(V,W ) with ∈L

L˜(b)= wb, (4.4) b∈∀Bα

implies L = L˜.

Proof. Exercise: Apart from the more elaborate notation, everything works as in the proof of [Phi19, Th. 6.6]. 

The following Th. 4.4 is in generalization of [Phi19, Th. 6.19].

Theorem 4.4. Let V and W be vector spaces over the field F , let BV and BW be bases 1 α of V and W , respectively, let α N. Given b ,...,b BV and b BW , and using Th. α ∈ ∈ ∈ 4.3, define maps L 1 α (V,W ) by letting b ,...,b ,b ∈L ˜1 ˜α 1 α ˜1 ˜α b for (b ,..., b )=(b ,...,b ), Lb1,...,bα,b(b ,..., b ) := (4.5) (0 otherwise.

1 α α Let := L 1 α : (b ,...,b ) (B ) , b B . B { b ,...,b ,b ∈ V ∈ W } (a) is linearly independent. B 1 n (b) If V is finite-dimensional, dim V = n N, BV = b ,...,b , then constitutes a α ∈ { } 1 B m basis for (V,W ). If, in addition, dim W = m N, BW = w ,...,w , then we can writeL ∈ { } dim α(V,W ) = (dim V )α dim W = nα m. (4.6) L · · 4 MULTILINEAR MAPS AND DETERMINANTS 48

(c) If dim V = and dim W 1, then ( α(V,W ) and, in particular, is not a basis of α(∞V,W ). ≥ hBi L B L Proof. (a): We verify that the elements of are linearly independent: Let M,N N. 1 α 1 α α B 1 M ∈ Let (b1,...,b1 ),..., (bN ,...,bN ) (BV ) be distinct and let w ,...,w BW be distinct as well. Assume λ F to∈ be such that ∈ lk ∈ M N

L := λlkL 1 α l =0. bk,...,bk ,w Xl=1 Xk=1 Let k¯ 1,...,N . Then ∈ { } M N M 1 α 1 α l 0= L(b¯,...,b¯ )= λlkL 1 α l (b¯,...,b¯ )= λ ¯ w k k bk,...,bk ,w k k lk Xl=1 Xk=1 Xl=1 l implies λ ¯ = = λ ¯ = 0 due to the linear independence of the w B . As this 1k ··· Mk ∈ W holds for each k¯ 1,...,N , we have established the linear independence of . ∈ { } B (b): According to (a), it remains to show = α(V,W ). Let L α(V,W ) α hBi L ∈ L and (i1,...,iα) 1,...,n . Then there exists a finite set B(i1,...,iα) BW such i1 i∈α { } ⊆ that L(b ,...,b ) = w∈B λww with λw F . Now let w1,...,wM , M N, (i1,...,iα) ∈ ∈ be an enumeration of the finite set α B . Then there exist λ F , P I∈{1,...,n} I j,(i1,...,iα) ∈ (j, (i ,...,i )) 1,...,M 1,...,n α, such that 1 α ∈ { } × { S } M i1 iα L(b ,...,b )= λj,(i ,...,i )wj. α 1 α (i1,...,iα)∈{∀ 1,...,n} j=1 X M Letting L˜ := α λ L i iα , we claim L˜ = L. Indeed, j=1 (i1,...,iα)∈{1,...,n} j,(i1,...,iα) b 1 ,...,b ,wj P P M ˜ j1 jα j1 jα L(b ,...,b )= λj,(i ,...,i )L i1 iα (b ,...,b ) α 1 α b ,...,b ,wj (j1,...,jα)∀∈{1,...,n} j=1 α X (i1,...,iαX)∈{1,...,n} M

= λj,(i1,...,iα)δ(i1,...,iα),(j1,...,jα)wj j=1 α X (i1,...,iαX)∈{1,...,n} M j1 jα = λj,(j1,...,jα)wj = L(b ,...,b ), j=1 X proving L˜ = L by Th. 4.3. Since L , the proof of (b) is complete. ∈ hBi α (c): As dim W 1, there exists w BW . If L , then b (BV ) : L(b) = 0 is finite. Thus,≥ if B is infinite, then∈ the map L∈ hBiα(V,W ){ with∈ L(b) := w for6 each} V ∈ L b (B )α is not in , proving (c).  ∈ V hBi 4 MULTILINEAR MAPS AND DETERMINANTS 49

4.2 Alternating Multilinear Maps and Determinants

Definition 4.5. Let V and W be vector spaces over the field F , α N. Then A α(V,W ) is called alternating if, and only if, ∈ ∈ L

vi = vj A(v1,...,vα)=0. (4.7) (v1,...,v∀α)∈V α i,j∈{1,...,α∃ },i6=j ⇒   Moreover, define the sets

Alt0(V,W ) := W, Altα(V,W ) := A α(V,W ): A alternating ∈L (note that this immediately yields Alt1(V,W)= (V,W )). L Remark 4.6. Let V and W be vector spaces over the field F , α N. Then Altα(V,W ) is a vector subspace of α(V,W ): Indeed, 0 Altα(V,W ) and,∈ if λ F and A, B α(V,W ) satisfy (4.7), thenL λA and A + B satisfy∈ (4.7) as well. ∈ ∈ L Notation 4.7. Let V,W be a sets and α N. Then, for each f (V α,W ) = W V α and each permutation π S , define ∈ ∈ F ∈ α (πf): V α W, (πf)(v ,...,v ) := f(v ,...,v ). −→ 1 α π(1) π(α) Lemma 4.8. Let V,W be a sets and α N. Then ∈

(π1π2)f = π1(π2f). (4.8) π1,π∀2∈Sα f∈F(∀V α,W )

α α Proof. For each (v1,...,vα) V , let (w1,...,wα) := (vπ1(1),...,vπ1(α)) V . Then, for each i 1,...,α , we have∈ w = v and w = v . Thus, we∈ compute ∈ { } i π1(i) π2(i) π1(π2(i))

(π1π2)f (v1,...,vα)= f(v(π1π2)(1),...,v(π1π2)(α))= f(vπ1(π2(1)),...,vπ1(π2(α))) = f(w ,...,w )=(π f)(w ,...,w )  π2(1) π2(α) 2 1 α

=(π2f)(vπ1(1),...,vπ1(α))= π1(π2f) (v1,...,vα), thereby establishing (4.8).  

Proposition 4.9. Let V and W be vector spaces over the field F , α N. Then, given A α(V,W ), the following statements are equivalent for char F =2, where∈ “(i) (ii)” also∈L holds for char F =2. 6 ⇒

(i) A is alternating. 4 MULTILINEAR MAPS AND DETERMINANTS 50

(ii) For each permutation π S , one has ∈ α A(vπ(1),...,vπ(α)) = sgn(π) A(v1,...,vα). (4.9) (v1,...,v∀α)∈V α

Proof. “(i) (ii)”: We first prove (4.9) for transpositions π =(ii+1) with i 1,...,α 1 : Let vk ⇒V , k 1,...,α i,i +1 , and define ∈ { − } ∈ ∈ { } \ { } B : V V W, B(v,w) := A(v1,...,vi−1,v,w,vi+2,...,vα). × −→ Then, as A is α times linear and alternating, B is bilinear and alternating. Thus, for each v,w V , ∈ 0= B(v + w,v + w)= B(v,v)+ B(v,w)+ B(w,v)+ B(w,w)= B(v,w)+ B(w,v), implying B(v,w) = B(w,v), proving (4.9) for this case. For general π S , (4.9) − ∈ α now follows from Th. 3.7(b), Th. 3.11(b), and Lem. 4.8: Let T := (ii + 1) Sα : i 1,...,α 1 . Then, given π S , Th. 3.7(b) implies the existence of π ,...,π∈ T∈, { − } ∈ α  1 N ∈ N N, such that π = π π . Thus, for each (v1,...,vα) V α, ∈ 1 ··· N ∈ π(1) π(α) 1 α Lem. 4.8 1 α A(v ,...,v )=(πA)(v ,...,v ) = π1 ... (πN A) (v ,...,v )

Th. 3.11(b)  =( 1)N A(v1,...,vα) = =sgn(π) A (v1,...,vα), − proving (4.9). “(ii) (i)”: Let char F = 2. Let (v1,...,vα) V α and suppose i, j 1,...,α are such⇒ that i = j as well as6 vi = vj. Then ∈ ∈ { } 6 (ii) A(v1,...,vα) = sgn(ij)A(v1,...,vα)= A(v1,...,vα). − Thus, 2A(v1,...,vα)=0and2 = 0 implies A(v1,...,vα)=0.  6 The following Ex. 4.10 shows that “(i) (ii)” can not be expected to hold in Prop. 4.9 for char F = 2: ⇐

2 Example 4.10. Let F := Z2 = 0, 1 , V := F . Consider the bilinear map A : V F , A(λ,µ) := λµ. Then A is not{ alternating,} since A(1, 1) = 1 1=1 = 0. However,−→ (4.9) does hold for A (due to 1 = 1 F ): A(1, 1) = 1 =· 1 = 6 A(1, 1) (and 0= A(0, 0) = A(0, 0) = A(0, 1) = −A(1,∈0)). − − − − Proposition 4.11. Let V and W be vector spaces over the field F , α N. Then, given A α(V,W ), the following statements are equivalent: ∈ ∈L 4 MULTILINEAR MAPS AND DETERMINANTS 51

(i) A is alternating.

(ii) The implication in (4.7) holds whenever j = i +1 (i.e. whenever two juxtaposed arguments are identical).

(iii) If the family (v1,...,vα) in V is linearly dependent, then A(v1,...,vα)=0.

Proof. Exercise. 

Definition 4.12. Let F be a field and V := F α, α N. Then det Altα(V,F ) is called ∈ ∈ a determinant if, and only if, det(e1,...,eα)=1, where e1,...,eα denote the standard basis vectors of V . Definition 4.13. Let V be a vector space over the field F , α N. We define the map Λα : (V ′)α Altα(V,F ), called outer product or wedge product∈ of linear forms, as follows: −→ Λα(ω ,...,ω )(v1,...,vα) := sgn(π) ω (v1) ω (vα) (4.10) 1 α π(1) ··· π(α) π∈S Xα ′ (cf. Th. 4.15 below). Given linear forms ω1,...,ωα V , it is common to also use the notation ∈ ω ω := Λα(ω ,...,ω ). 1 ∧···∧ α 1 α Lemma 4.14. Let V be a vector space over the field F , α N. Moreover, let Λα denote the wedge product of Def. 4.13. Then one has ∈

α 1 α π(1) π(α) Λ (ω1,...,ωα)(v ,...,v ) := sgn(π) ω1(v ) ωα(v ). ′ ω1,...,ω∀α∈V v1,...,v∀α∈V ··· π∈Sα X (4.11)

Proof. Using the commutativity of multiplication in F , the bijectivity of permutations, as well as sgn(π) = sgn(π−1), we obtain

Λα(ω ,...,ω )(v1,...,vα)= sgn(π) ω (v1) ω (vα) 1 α π(1) ··· π(α) π∈Sα X −1 −1 = sgn(π) ω (vπ (1)) ω (vπ (α)) 1 ··· α π∈S Xα = sgn(π) ω (vπ(1)) ω (vπ(α)), 1 ··· α π∈S Xα proving (4.11).  4 MULTILINEAR MAPS AND DETERMINANTS 52

Theorem 4.15. Let V be a vector space over the field F , α N. Moreover, let Λα denote the wedge product of Def. 4.13. ∈

(a) Λα is well-defined, i.e. it does, indeed, map (V ′)α into Altα(V,F ). (b) Λα Altα V ′, Altα(V,F ) . ∈  ′ α Proof. (a): Let ω1,...,ωα V and A := Λ (ω1,...,ωα). First, we show A to be α times linear: To this end, let∈ λ,µ F and x,v1,...,vα V . Then ∈ ∈ A(v1,...,vi−1,λvi + µx,vi+1,...,vα) = sgn(π) ω (v1) ω (vi−1) λω (vi)+ µω (x) ω (vi+1) ...ω (vα) π(1) ··· π(i−1) π(i) π(i) π(i+1) π(α) π∈Sα X  = λ sgn(π) ω (v1) ω (vi−1)ω (vi)ω (vi+1) ...ω (vα) π(1) ··· π(i−1) π(i) π(i+1) π(α) π∈S Xα + µ sgn(π) ω (v1) ω (vi−1)ω (x)ω (vi+1) ...ω (vα) π(1) ··· π(i−1) π(i) π(i+1) π(α) π∈S Xα = λA(v1,...,vi−1,vi,vi+1,...,vα)+ µA(v1,...,vi−1,x,vi+1,...,vα), proving A α(V,F ). It remains to show A is alternating. To this end, let i, j 1,...,α with∈ L i < j. Then, according to Prop. 3.14(b). τ := (i j) / A and S =∈ { } ∈ α α (Aατ) ˙ Aα. If k 1,...,α i, j , then, for each π Sα, (πτ)(k) = π(k). Thus, if (v1,...,v∪ α) V α ∈is { such that} \vi {= v}j, then ∈ ∈ (4.11) A(v1,...,vα) = sgn(π) ω (vπ(1)) ω (vπ(α)) 1 ··· α π∈S Xα = sgn(π) ω (vπ(1)) ω (vπ(α)) + sgn(πτ) ω (v(πτ)(1)) ω (v(πτ)(α)) 1 ··· α 1 ··· α π∈A Xα   = sgn(π) ω (vπ(1)) ω (vπ(α)) sgn(π) ω (vπ(1)) ω (vπ(α)) =0, 1 ··· α − 1 ··· α π∈A Xα   showing A to be alternating. (b) follows from (a): According to (a), Λα maps (V ′′)α into Altα(V ′,F ). Thus, if Φ : V V ′′ is the canonical embedding, then, for each v1,...,vα V α and each ω ,...,ω−→ V ′, ∈ 1 α ∈ Λα(Φv1,..., Φvα)(ω ,...,ω ) = sgn(π) (Φvπ(1))(ω ) (Φvπ(α))(ω ) 1 α 1 ··· α π∈S Xα = sgn(π) ω (vπ(1)) ω (vπ(α)) 1 ··· α π∈S Xα (4.11) α 1 α = Λ (ω1,...,ωα)(v ,...,v ), 4 MULTILINEAR MAPS AND DETERMINANTS 53

thereby completing the proof.  Corollary 4.16. Let F be a field and V := F α, α N. Given the standard basis α ∈ 1 α e1,...,eα of V , we define a map det : V F as follows: For each (v ,...,v ) {V α such that,} for each j 1,...,α , vj = −→α a e with (a ) (α,F ), let ∈ ∈ { } i=1 ji i ji ∈M det(v1,...,vα) := Psgn(π) a a . (4.12) 1π(1) ··· απ(α) π∈S Xα Then det is a determinant according to Def. 4.12. Moreover,

det(v1,...,vα)= sgn(π) a a (4.13) π(1)1 ··· π(α)α π∈S Xα also holds.

Proof. For each i 1,...,α , let ωi := πei : V F be the projection onto the ∈ { } −→ j coordinate with respect to ei according to Ex. 2.1(a). Then, if v is as above, we have j ωi(v )= aji. Thus,

(4.10) (4.12) Λα(ω ,...,ω )(v1,...,vα) = sgn(π) ω (v1) ω (vα) = det(v1,...,vα), 1 α π(1) ··· π(α) π∈S Xα showing det Altα(V,F ) by Th. 4.15(a). Then we also have ∈ (4.11) det(v1,...,vα)=Λα(ω ,...,ω )(v1,...,vα) = sgn(π) ω (vπ(1)) ω (vπ(α)) 1 α 1 ··· α π∈S Xα = sgn(π) a a , π(1)1 ··· π(α)α π∈S Xα 1 α proving (4.13). Moreover, for (v ,...,v )=(e1,...,eα), (aji)=(δji) holds. Thus,

det(e ,...,e )= sgn(π) δ δ = sgn(Id) = 1, 1 α 1π(1) ··· απ(α) π∈S Xα which completes the proof. 

According to Th. 4.17(b) below, the map defined by (4.12) is the only determinant in Altα(F α,F ).

Theorem 4.17. Let V and W be vector spaces over the field F , let BV and BW be bases ′ of V and W , respectively, let α N. Moreover, as in Cor. 2.4, let B := ωb : b BV , where ∈ { ∈ } ′ ωb V , ωb(a) := δba . (4.14) (b,a)∈B∀V ×BV ∈   4 MULTILINEAR MAPS AND DETERMINANTS 54

In addition, assume < to be a strict total order on B (for dim V = , the order on V ∞ BV exists due to the axiom of choice, cf. [Phi19, Th. A.52(iv)]) and define (B )α := (b1,...,bα) (B )α : b1 < nB). If, in B ∅ addition, dim W = m N, then we can write ∈ n m for α n, dim Altα(V,W )= α · ≤ (4.16) (0  for α>n. In particular, dim Altα(F α,F )=1, showing that the map det Altα(F α,F ) is uniquely determined by Def. 4.12 and given by (4.12). ∈ (c) If V is infinite-dimensional and dim W 1, then is not a basis for Altα(V,W ). ≥ B Proof. (a): We verify that the elements of are linearly independent: Note that B π(1) π(α) α (b ,...,b ) / (BV ) . (4.17) 1 α α ord (b ,...,b )∀∈(BV )ord Id6=π∀∈Sα ∈ Thus, if (b1,...,bα), (c1,...,cα) (B )α , ∈ V ord then

α 1 α 1 α Λ (ω 1 ,...,ω α )(c ,...,c ) = sgn(π) ω π(1) (c ) ω π(α) (c ) b b b ··· b π∈S Xα 1 α 1 α (4.17),(4.14) 1 for(b ,...,b )=(c ,...,c ), = (4.18) (0 otherwise.

1 α 1 α α Now let M,N N, let (b1,...,b1 ),..., (bN ,...,bN ) (BV )ord be distinct, and let y ,...,y B ∈be distinct as well. Assume λ F to∈ be such that 1 M ∈ W lk ∈ M N

A := λlkA 1 α =0. bk,...,bk ,yl Xl=1 Xk=1 4 MULTILINEAR MAPS AND DETERMINANTS 55

Let k¯ 1,...,N . Then ∈ { } M N M 1 α α 1 α (4.18) 0= A(b¯,...,b¯ )= λlkΛ (ω 1 ,...,ωbα )(b¯,...,b¯ ) yl = λ ¯ yl k k bk k k k lk Xl=1 Xk=1 Xl=1

implies λ1k¯ = = λMk¯ = 0 due to the linear independence of the yl BW . As this holds for each ···k¯ 1,...,N , we have established the linear independence∈ of . ∈ { } B (b): Note that

(4.17) [Phi16a, Prop. 5.18(b)] n #(B )α = # I B : #I = α = . V ord ⊆ V α    According to (a), it remains to show = Altα(V,W ). Let A Altα(V,W ) and i1 iα α hBi ∈ (b ,...,b ) (BV )ord. Then there exists a finite set B(i1,...,iα) BW such that i1 iα ∈ ⊆ A(b ,...,b ) = y∈B λyy with λy F . Now let y1,...,yM , M N, be an (i1,...,iα) ∈ ∈ enumeration of the finite set P

B(i1,...,iα). (i ,...,i )∈{1,...,n}α:(bi1 ,...,biα )∈(B )α 1 α [ V ord

i1 iα α Then, for each (j, (b ,...,b )) 1,...,M (BV )ord, there exists λj,(i1,...,iα) F , such that ∈ { }× ∈ M i1 iα A(b ,...,b )= λj,(i1,...,iα)yj. i i α (b 1 ,...,b α∀)∈(BV ) ord j=1 X M ˜ i ˜ Letting A := i1 iα α λj,(i1,...,iα)Ab 1 ,...,biα ,y , we claim A = A. Indeed, for j=1 (b ,...,b )∈(BV )ord j each (bj1 ,...,bjα ) (B )α , P ∈P V ord M j1 jα α j1 jα ˜ i i A(b ,...,b ) = λj,(i1,...,iα)Λ (ωb 1 ,...,ωb α )(b ,...,b ) yj j=1 (bi1 ,...,biα )∈(B )α X X V ord M (4.18) = λj,(i1,...,iα)δ(i1,...,iα),(j1,...,jα) yj j=1 (bi1 ,...,biα )∈(B )α X X V ord M j1 jα = λj,(j1,...,jα) yj = A(b ,...,b ), j=1 X proving A˜ = A by (4.9) and Th. 4.3. Since A˜ , the proof of (b) is complete. ∈ hBi (c): Exercise.  4 MULTILINEAR MAPS AND DETERMINANTS 56

The following Th. 4.18 compiles some additional rules of importance for alternating multilinear maps:

Theorem 4.18. Let V and W be vector spaces over the field F , α N. The following rules hold for each A Altα(V,W ): ∈ ∈ (a) The value of A remains unchanged if one argument is replaced by the sum of that argument and a linear combination of the other arguments, i.e., if λ1,...,λα F , v1,...,vα V , and i 1,...,α , then ∈ ∈ ∈ { }

α 1 α 1 i−1 i j i+1 α A(v ,...,v )= A v ,...,v , v + λj v , v ,...,v  . j=1  Xj6=i      (b) If v1,...,vα V and a F are such that ∈ ji ∈ α j i w = ajiv , j∈{1∀,...,α} i=1 X then

A(w1,...,wα)= sgn(π) a a A(v1,...,vα) 1π(1) ··· απ(α) π∈S ! Xα = det(x1,...,xα) A(v1,...,vα), (4.19)

where α j α x := ajiei F j∈{1∀,...,α} ∈ i=1 X and e ,...,e is the standard basis of F α. { 1 α} (c) Suppose the family (v1,...,vα) in V is linearly independent and such that there exist w1,...,wα v1,...,vα with A(w1,...,wα) = 0. Then A(v1,...,vα) = 0 as well. ∈ h{ }i 6 6 4 MULTILINEAR MAPS AND DETERMINANTS 57

Proof. (a): One computes, for each i, j 1,...,α with i = j and λ = 0: ∈ { } 6 j 6 1 i j α A(v ,...,v + λj v ,...,v ) α A∈L (V,W ) −1 1 i j j α = λj A(v ,...,v + λj v ,...,λj v ,...,v ) α A∈L (V,W ) −1 1 i j α = λj A(v ,...,v ,...,λj v ,...,v ) −1 1 j j α +λj A(v ,...,λj v ,...,λj v ,...,v ) (4.7) −1 1 i j α = λj A(v ,...,v ,...,λj v ,...,v )+0 A∈Lα(V,W ) = A(v1,...,vα).

The general case of (a) then follows via a simple induction. (b): We calculate

α α A∈Lα(V,W ) A(w1,...,wα) = a a A(vi1 ,...,viα ) 1i1 ··· αiα i =1 i =1 X1 Xα (4.7) = a a A(vπ(1),...,vπ(α)) 1π(1) ··· απ(α) π∈S Xα (4.9) = sgn(π) a a A(v1,...,vα), 1π(1) ··· απ(α) π∈S ! Xα thereby proving the first equality in (4.19). The second equality in (4.19) is now an immediate consequence of Cor. 4.16. (c): Exercise. 

4.3 Determinants of Matrices and Linear Maps

In Def. 4.12, we defined a determinant as an alternating multilinear form on F α. In the following, we will see that determinants can be particularly useful when they are considered to be maps on quadratic matrices or maps on linear endomorphisms on vector spaces of finite dimension. We begin by defining determinants on quadratic matrices:

Definition 4.19. Let F be a field and n N. Then the map ∈ det : (n,F ) F, det(a ) := sgn(π) a a , (4.20) M −→ ji 1π(1) ··· nπ(n) π∈S Xn is called the determinant on (n,F ). M 4 MULTILINEAR MAPS AND DETERMINANTS 58

Notation 4.20. If F is a field, n N, and det : (n,F ) F is the determinant, then, for (a ) (n,F ), one commonly∈ uses the notationM −→ ji ∈M a11 ... a1n ...... := det(aji). (4.21)

an1 ... ann

Notation 4.21. Let F be a field and n N. As in [Phi19, Rem. 7.4(b)], we denote the columns and rows of a matrix A := (a )∈ (n,F ) as follows: ji ∈M a1i A . A ci := ci := . , ri := ri := ai1 ... ain . i∈{1∀,...,n}   ani      Remark 4.22. Let F be a field and n N. If we consider the rows r1,...,rn of the matrix (a ) (n,F ) as elements of F∈n, then, by Cor. 4.16, ji ∈M det(aji) = det(r1,...,rn), where the second det is the map det : (F n)n F defined as in Cor. 4.16. As a further consequence of Cor. 4.16, in combination with−→ Th. 4.17(b), we see that det of Def. 4.19 is the unique form on (n,F ) that is multilinear and alternating in the rows of the M matrix and that assigns the value 1 to the identity matrix. Example 4.23. Let F be a field. We evaluate (4.20) explicitly for n =1, 2, 3:

(a) For (a ) (1,F ), we have 11 ∈M a = det(a )= sgn(π) a = sgn(Id) a = a . | 11| 11 1π(1) 11 11 π∈S X1 (b) For (a ) (2,F ), we have ji ∈M a11 a12 = det(aji)= sgn(π) a1π(1)a2π(2) a21 a22 π∈S2 X

= sgn(Id) a a + sgn(12) a a = a a a a , 11 22 12 21 11 22 − 12 21 i.e. the determinant is the product of the elements of the main diagonal minus the product of the elements of the other diagonal: + − a11 a12 a21 a22 + − 4 MULTILINEAR MAPS AND DETERMINANTS 59

(c) For (a ) (3,F ), we have ji ∈M

a11 a12 a13 a a a = det(a )= sgn(π) a a a 21 22 23 ji 1π(1) 2π(2) 3π(3) a a a π∈S3 31 32 33 X

= sgn(Id) a11a 22a33 + sgn(12) a12a21a33 + sgn(13) a13a22a31 + sgn(23) a11a23a32 + sgn(123) a12a23a31 + sgn(132) a13a21a32 = a a a + a a a + a a a a a a + a a a + a a a . 11 22 33 12 23 31 13 21 32 − 12 21 33 13 22 31 11 23 32 To remember this formula, one can use the following tableau: 

+ + + − − − a11 a12 a13 a11 a12 a21 a22 a23 a21 a22 a31 a32 a33 a31 a32 + + + − − − One writes the first two columns of the matrix once again on the right and then takes the product of the first three diagonals in the direction of the main diagonal with a positive sign and the first three diagonals in the other direction with a negative sign.

We can now translate some of the results of Sec. 4.2 into results on determinants of matrices:

Corollary 4.24. Let F be a field, n N, and A := (a ) (n,F ). Let r ,...,r ∈ ji ∈ M 1 n denote the rows of A and let c1,...,cn denote the columns of A. Then the following rules hold for the determinant:

(a) det(Id )=1, where Id denotes the identity matrix in (n,F ). n n M (b) det(At) = det(A).

(c) det is multilinear with regard to matrix rows as well as multilinear with regard to matrix columns, i.e., for each v (1,n,F ), w (n, 1,F ), i 1,...,n , and ∈M ∈M ∈ { } 4 MULTILINEAR MAPS AND DETERMINANTS 60

λ,µ F : ∈ r1 r1 . .  .   .  ri−1 ri−1     det λri + µv = λ det(A)+ µ det  v  ,      ri+1  ri+1  .   .   .   .       r   r   n   n     

det c1 ... ci−1 λci + µw ci+1 ... cn = λdet(A)+ µ det c1 ... ci−1 w ci+1  ... cn .  (d) If λ F , then det(λA)= λn det(A). ∈ (e) For each permutation π S , one has ∈ n

rπ(1) . det  .  = det cπ(1) ... cπ(n) = sgn(π) det(A). rπ(n)      In particular, switching rows i and j or columns i and j, where i, j 1,...,n , i = j, changes the sign of the determinant, i.e. ∈ { } 6

r1 r1 . .  .   .  ri rj  .   .  det  .  = det  .  ,   −       rj  ri   .   .   .   .      r  r   n  n     det c ... c ... c ... c = det c ... c ... c ... c . 1 i j n − 1 j i n   (f) The following statements are equivalent:

(i) det A =0 (ii) The rows of A are linearly dependent. (iii) The columns of A are linearly dependent. 4 MULTILINEAR MAPS AND DETERMINANTS 61

(g) Multiplication Rule: If B := (b ) (n,F ), then det(AB) = det(A)det(B). ji ∈M (h) det(A)=0 if, and only if, A is singular. If A is invertible, then

−1 det(A−1)= det(A) .  (i) The value of the determinant remains the same if one row of a matrix is replaced by the sum of that row and a scalar multiple of another row. More generally, the determinant remains the same if one row of a matrix is replaced by the sum of that row and a linear combination of the other rows. The statement also remains true if the word “row” is replaced by “column”. Thus, if λ1,...,λn F and i 1,...,n , then ∈ ∈ { } r1 .  .  r1 ri−1 .  n  det(A) = det . = det ri + j=1 λj rj ,    j6=i    rn  Pri+1     .     .     r   n    n

det(A) = det(c1,...,cn) = det c1,...,ci−1,ci + λj cj,ci+1,...,cn . j=1  Xj6=i      Proof. As already observed in Rem. 4.22, det : (n,F ) F can be viewed as the unique map det Altn(F n,F ) with det(e ,...,eM) = 1, where−→ one considers ∈ 1 n

r1 . n n A = . (F ) :   ∈ rn     Since, for each j 1,...,n , ∈ { } n

rj = aj1 ... ajn = ajiei, i=1  X a1j n . cj =  .  = aijei, i=1 anj   X   4 MULTILINEAR MAPS AND DETERMINANTS 62

we have

(4.20) (4.12) det(A) = sgn(π) a a = det(r ,...,r ) 1π(1) ··· nπ(n) 1 n π∈S Xn (4.13) (4.12) = sgn(π) a a = det(c ,...,c ). (4.22) π(1)1 ··· π(n)n 1 n π∈S Xn

(a): det(Idn) = det(e1,...,en)=1. (b) is immediate from (4.22). (c) is due to det Altn(F n,F ) and (4.22). ∈ (d) is a consequence of (c). (e) is due to det Altn(F n,F ), (4.22), and (4.9). ∈ n n (f): Again, we use det Alt (F ,F ) and (4.22): If det(A) = 0, then det(Idn)=1 and Th. 4.18(c) imply (ii) and∈ (iii). Conversely, if (ii) or (iii) holds, then det(A) = 0 follows from Prop. 4.11(iii).

(g): Let C := (cji) := AB. Using Not. 4.21, we have

n n C A B A rj = rj B = ajiri , rj = ajiei j∈{1∀,...,n} i=1 i=1 ! X X and, thus,

(4.22) C C (4.19) A A B B (4.22) det(AB) = det(r1 ,...,rn ) = det(r1 ,...,rn ) det(r1 ,...,rn ) = det(A)det(B).

(h): As a consequence of [Phi19, Th. 7.17(a)], A is singular if, and only if, the columns of A are linearly dependent. Thus, det(A) = 0 if, and only if, A is singular due to (f). Moreover, if A is invertible, then

−1 (g) −1 1 = det(Idn) = det(AA ) = det(A) det(A ).

(i) is due to det Altn(F n,F ), (4.22), and Th. 4.18(a).  ∈ Theorem 4.25 (Block Matrices). Let F be a field. The determinant of so-called block matrices over F , where one block is a zero matrix (all entries 0), can be computed as the product of the determinants of the corresponding blocks. More precisely, if n,m N, ∈ 4 MULTILINEAR MAPS AND DETERMINANTS 63

then a11 ... a1n ...... ∗ an1 ... ann = det(aji)det(bji). (4.23) 0 ... 0 b11 ... b1m ......

0 ... 0 b ... b m1 mm

Proof. Suppose (a ) (n + m,F ), where ji ∈M

bi−n,j−n for i,j >n, aji = (4.24) 0 j>n and i n. ( ≤ Via obvious embeddings, we can consider S S and T := S S . n ⊆ m+n m {n+1,...,n+m} ⊆ m+n If π Sm+n, then there are precisely two possibilities: Either there exist ω Sn and σ T∈ such that π = ωσ, or there exists j n+1,...,m+n such that i := ∈π(j) n, ∈ m ∈ { } ≤ in which case ajπ(j) = 0. Thus,

det(a )= sgn(π) a a ji 1π(1) ··· n+m,π(n+m) π∈S Xn+m = sgn(ω)sgn(σ) a a a a 1ω(1) ··· n,ω(n) n+1,σ(n+1) ··· n+m,σ(n+m) (ω,σ)X∈Sn×Tm = sgn(ω) a a sgn(σ) a a 1ω(1) ··· nω(n) n+1,σ(n+1) ··· n+m,σ(n+m) ω∈S ! σ∈T ! Xn Xm = sgn(π) a a sgn(π) b b 1π(1) ··· nπ(n) 1π(1) ··· mπ(m) π∈S ! π∈S ! Xn Xm = det(aji)↾{1,...,n}×{1,...,n} det(bji), proving (4.23). 

Corollary 4.26. Let F be a field and n N. If (aji) (n,F ) is upper triangular or lower triangular, then ∈ ∈M n

det(aji)= akk. kY=1

Proof. For (aji) upper triangular, the statement follows from Th. 4.25 via an obvious induction on n N. If(a ) is lower triangular, then the transpose (a )t is upper ∈ ji ji triangular and the statement the follows from Cor. 4.24(b).  4 MULTILINEAR MAPS AND DETERMINANTS 64

Definition 4.27. Let F be a field, n N, n 2, A = (aji) (n,F ). For each j,i 1,...,n , let M (n 1,F ) be∈ the (n≥ 1) (n 1) submatrix∈ M of A obtained ∈ { } ji ∈M − − × − by deleting the jth row and the ith column of A – the Mji are called the minor matrices of A; define A := ( 1)i+j det(M ), (4.25) ji − ji where the Aji are called cofactors of A and the det(Mji) are called the minors of A. Let t A˜ := (Aji) denote the transpose of the matrix of cofactors of A, called the adjugate matrix of A.

Lemma 4.28. Let F be a field, n N, n 2, A = (aji) (n,F ). For each j,i 1,...,n , let R(j,i) (n,F∈) be the≥ matrix obtained from∈ MA by replacing the j-th∈ row { with the} standard (row)∈ M basis vector e , and let C(j,i) (n,F ) be the matrix i ∈M obtained from A by replacing the i-th column with the standard (column) basis vector ej, i.e.

a11 ... a1,i−1 a1i a1,i+1 ... a1n . . . . .  . . . . .  aj−1,1 ... aj−1,i−1 aj−1,i aj−1,i+1 ... aj−1,n   R(j,i) :=  0 ... 0 1 0 ... 0  ,   a ... a a a ... a   j+1,1 j+1,i−1 j+1,i j+1,i+1 j+1,n  . . . . .   . . . . .     an1 ... an,i−1 ani an,i+1 ... ann      a11 ... a1,i−1 0 a1,i+1 ... a1n . . . . .  . . . . .  aj−1,1 ... aj−1,i−1 0 aj−1,i+1 ... aj−1,n   C(j,i) :=  aj1 ... aj,i−1 1 aj,i+1 ... aj,n  .   aj+1,1 ... aj+1,i−1 0 aj+1,i+1 ... aj+1,n  . . . . .   . . . . .     a ... a 0 a ... a   n1 n,i−1 n,i+1 nn    Then, we have Aji = det R(j,i) = det C(j,i) . (4.26) j,i∈{1∀,...,n}   Proof. Let j,i 1,...,n , and let Mji denote the corresponding minor matrix of A according to Def.∈ { 4.27. Then}

Cor. 4.24(e) 1 0 Cor. 4.24(e) 1 det R(j,i) = ( 1)i+j = det C(j,i) = ( 1)j+i ∗ − Mji − 0 Mji ∗  (4.23) (4.25)  = ( 1)i+j det(M ) = A , − ji ji 4 MULTILINEAR MAPS AND DETERMINANTS 65

thereby proving (4.26). 

Theorem 4.29. Let F be a field, n N, n 2, A = (aji) (n,F ). Moreover, t ∈ ≥ ∈ M let A˜ := (Aji) be the adjugate matrix of A according to Def. 4.27. Then the following holds:

(a) AA˜ = AA˜ = (det A) Idn. (b) If det A =0, then det A˜ = (det A)n−1. 6 (c) If det A =0, then A−1 = (det A)−1 A˜. 6 n (d) Laplace Expansion by Rows: det A = i=1 ajiAji (expansion with respect to the jth row). P n (e) Laplace Expansion by Columns: det A = j=1 ajiAji (expansion with respect to the ith column). P

Proof. (a): Let C := (cji) := AA˜, D := (dji) := AA˜ . Also let R(j,i) and C(j,i) be as in Lem. 4.28. Then, we compute, for each j,i 1,...,n , ∈ { } A r1 .  .  A n n ri−1 t (4.26) Cor. 4.24(c)  A  det(A) for i = j, cji = ajk A = ajk det R(i, k) = det  r  = ki  j  0 for i = j, k=1 k=1  A  ( X X  ri+1 6  .   .     rA   n   

n n t (4.26) dji = Ajkaki = aki det C(k, j) k=1 k=1 X X  Cor. 4.24(c) A A A A A det(A) for i = j, = det c1 ... cj−1 ci cj+1 ... cn = (0 for i = j,  6 proving (a). (b): We obtain

(a) Cor. 4.24(d) n det(A)det(A˜) = det(AA˜) = det (det A) Idn = (det A) ,  4 MULTILINEAR MAPS AND DETERMINANTS 66

which, for det(A) = 0, implies det A˜ = (det A)n−1. 6 (c) is immediate from (a). (d): From (a), we obtain

n n t det(A)= aji Aij = aji Aji, i=1 i=1 X X proving (d). (e): From (a), we obtain

n n t det(A)= Aijaji = ajiAji, j=1 j=1 X X proving (e).  Example 4.30. (a) We use Ex. 4.23(b) and Th. 4.29(d) to compute

1 2 D := : 1 3 4

From Ex. 4.23(b), we obtain D =1 4 2 3= 2, 1 · − · − which we also obtain when expanding with respect to the first row according to Th. 4.29(d). Expanding with respect to the second row, we obtain

D = 3 2+4 1= 2. 1 − · · − (b) We use Ex. 4.23(c) and Th. 4.29(e) to compute

1 2 3 D := 4 5 6 : 2 7 8 9

From Ex. 4.23(c), we obtain D =1 5 9+2 6 7+3 4 8 3 5 7 1 6 8 2 4 9 = 45+84+96 105 48 72=0. 2 · · · · · · − · · − · · − · · − − − Expanding with respect to the third column according to Th. 4.29(d), we obtain

4 5 1 2 1 2 D =3 6 +9 =3 ( 3) 6 ( 6)+9 ( 3) = 9+36 27=0. 2 · 7 8 − · 7 8 · 4 5 · − − · − · − − −

4 MULTILINEAR MAPS AND DETERMINANTS 67

Theorem 4.31 (Cramer’s Rule). Let F be a field and A := (aji) (n,F ), n N, n 2. If b ,...,b F and det(A) =0, then the linear system ∈ M ∈ ≥ 1 n ∈ 6 n

ajk xk = bj, j∈{1∀,...,n} Xk=1 has a unique solution x F n, which is given by ∈ n −1 xj = (det A) Akj bk, (4.27) j∈{1∀,...,n} Xk=1 where Akj denote the cofactors of A according to Def. 4.27.

Proof. In matrix form, the linear system reads Ax = b, which, for det(A) = 0, is −1 −1 −1 6 t equivalent to x = A b, where A = (det A) A˜ by Th. 4.29(c). Since A˜ := (Aji) , we have n n −1 t −1 xj = (det A) Ajk bk = (det A) Akj bk, j∈{1∀,...,n} Xk=1 Xk=1 proving (4.27).  Definition 4.32. Let n N. An element p = (p ,...,p ) (N )n is called a multi- ∈ 1 n ∈ 0 index; p := p1 + + pn is called the degree of the multi-index. Let R be a ring with unity. If| |x =(x ,...,x··· ) Rn and p =(p ,...,p ) is a multi-index, then we define 1 n ∈ 1 n xp := xp1 xp2 xpn . (4.28) 1 2 ··· n Each function from Rn into R, x xp, is called a monomial function (in n variables). A function P from Rn into R is called7→ a polynomial function (in n variables) if, and only if, it is a linear combination of monomial functions, i.e. if, and only if P has the form P : Rn R, P (x)= a xp, k N , a R (4.29) −→ p ∈ 0 p ∈ |Xp|≤k (if R is commutative, then our present definition of polynomial function is a special case of the one given in Th. B.23 of the Appendix, also cf. Rem. B.24). If F := R is an infinite field, then, as a consequence of Th. B.23(c), the representation of P in (4.29) in terms of the monomial functions x xp is unique and, in this case, we also define, for each polynomial function given in7→ the form of (4.29), its degree3, denoted deg(P ),

3For example, if R = Z = 0, 1 and f,g : R R, f(x) := x, g(x) := x2, then f(0) = g(0) = 0, 2 { } −→ f(1) = g(1) = 1, showing f = g and the nonuniqueness of the representation in (4.29) for R = Z2. However, it is still possible to generalize our degree definition for polynomial functions to situations, where the representation in (4.29) is not unique: If P : Rn R is a polynomial function, then for each representation of P in the form (4.29), one can define the−→ degree (of the representation) as in the case, where R is an infinite field, then defining deg P to be the minimum of the representation degrees. 4 MULTILINEAR MAPS AND DETERMINANTS 68

to be the largest number d k such that there is p (N )n with p = d and a = 0. ≤ ∈ 0 | | p 6 If all ap = 0, i.e. if P 0, then P is the zero polynomial function and its degree is defined to be (some≡ authors use 1 instead); in particular, d is then the degree of each monomial−∞ function x xp with−p = d. If F is a field (not necessarily infinite), then we also define a rational7→ function| |as a quotient of two polynomial functions: If P,Q : F n F are polynomials, then −→ P (x) (P/Q): F n Q−1 0 F, (P/Q)(x) := , (4.30) \ { } −→ Q(x) is called a rational function (in n variables). Remark 4.33. Let F be a field and n N. Comparing Def. 4.19 with Def. 4.32, we observe that ∈

2 det : F n F, det(a ) := sgn(π) a a , −→ ji 1π(1) ··· nπ(n) π∈S Xn is a polynomial function of degree n (and in n2 variables). According to Th. 4.29(c) and (4.25), we have

i+j −1 ( 1) det(Mij) inv : GLn(F ) GLn(F ), inv(aji):=(aji) = − , −→ det(aji) 

where the Mji (n 1,F ) denote the minor matrices of (aji). Thus, for each (k,l) 1,...,n∈2, M the component− function ∈ { } k+l ( 1) det(Mlk) invkl : GLn(F ) F, invkl(aji)= − , −→ det(aji) is a rational function inv = P / det, where P is a polynomial of degree n 1. kl kl kl − Remark 4.34 (Computation of Determinants). For n 3, one may well use the for- mulas of Ex. 4.23 to compute determinants. However,≤ the larger n becomes, the less advisable is the use of formula (4.20) to compute det(aji), as this requires n n! multi- plications (note that n! grows faster for n than nk for each k N). Sometimes· Th. 4.29(d),(e) can help, if one can expand→ with ∞ respect to a row or∈ column, where many entries are 0. However, for a generic large n n matrix A, it is usually the best strategy4 to transform A to triangular form (e.g. by× using Gaussian elimination accord- ing to [Phi19, Alg. 8.17]) and then use Cor. 4.26 (the number of multiplications then

4If A has a special structure (e.g. many zeros) and/or the field F has a special structure (e.g. F R, C ), then there might well be more efficient methods to compute det(A), studied in the fields Numerical∈ { } Analysis and Numerical Linear Algebra. 4 MULTILINEAR MAPS AND DETERMINANTS 69

merely grows proportional to n3): We know from Cor. 4.24(e),(i) that the operations of the Gaussian elimination algorithm do not change the determinant, except for sign changes, when switching rows. For the same reasons, one should use Gaussian elimi- nation (or even more efficient algorithms adapted to special situations) rather than the neat-looking explicit formula of Cramer’s rule of Th. 4.31 to solve large linear systems and rather than Th. 4.29(c) to compute inverses of large matrices.

Theorem 4.35 (Vandermonde Determinant). Let F be a field and λ0,λ1,...,λn F , n N. Moreover, let ∈ ∈ n 1 λ0 ... λ0 n 1 λ1 ... λ1 V := . .  (n +1,F ), (4.31) . . ∈M  n 1 λn ... λ   n which is known as the corresponding Vandermonde matrix. Then its determinant, the so-called Vandermonde determinant, is given by n det(V )= (λ λ ). (4.32) k − l k,l=0 Yk>l

Proof. The proof can be conducted by induction with respect to n: For n = 1, we have 1 1 λ0 det(V )= = λ1 λ0 = (λk λl), 1 λ1 − − k,l=0 Yk>l

showing (4.32) holds for n = 1. Now let n> 1. Using Cor. 4.24(i), we add the ( λ0)-fold of the nth column to the (n +1)st column, we obtain in the (n + 1)st column− 0 n n−1 λ1 λ1 λ0  − .  . .  n n−1  λ λ λ0  n − n  Next, one adds the ( λ )-fold of the (n 1)st column to the nth column, and, succes- − 0 − sively, the ( λ0)-fold of the mth column to the (m + 1)st column. One finishes, in the nth step, by− adding the ( λ )-fold of the first column to the second column, obtaining − 0 n 1 λ0 ... λ0 10 0 ... 0 1 λ ... λn 1 λ λ λ2 λ λ ... λn λn−1λ 1 1 1 0 1 1 0 1 1 0 det(V )= . . = . −. −. . − ...... n 2 n n−1 1 λn ... λ 1 λn λ0 λ λnλ0 ... λ λ λ0 n − n − n − n

4 MULTILINEAR MAPS AND DETERMINANTS 70

Applying (4.23), then yields

2 n n−1 λ1 λ0 λ1 λ1λ0 ... λ1 λ1 λ0 −. −. . − . det(V )=1 . . . . . · 2 n n−1 λn λ0 λ λnλ0 ... λ λ λ0 − n − n − n

We now use multilinearity to factor out, for each k 1,...,n , (λk λ0) from the kth row, arriving at ∈ { } − n−1 n 1 λ1 ... λ1 . . . . det(V )= (λk λ0) . . . . , − k=1 n−1 1 λn ... λ Y n

which is precisely the Vandermonde determinant of the n 1 numbers λ1,...,λn. Using the induction hypothesis, we obtain −

n n n det(V )= (λ λ ) (λ λ )= (λ λ ), k − 0 k − l k − l k=1 k,l=1 k,l=0 Y Yk>l Yk>l completing the induction proof of (4.32). 

Remark and Definition 4.36 (Determinant on Linear Endomorphisms). Let V be a finite-dimensional vector space over the field F , n N, dim V = n, and A (V,V ). ∈ ∈ L Moreover, let B =(v ,...,v ) and B =(w ,...,w ) be ordered bases of V . If(a(1)) 1 1 n 2 1 n ji ∈ (n,F ) is the matrix corresponding to A with respect to B and (a(2)) (n,F ) is M 1 ji ∈ M the matrix corresponding to A with respect to B2, then we know from [Phi19, Th. 7.14] that there exists (c ) GL (F ) such that ji ∈ n (2) −1 (1) (aji )=(cji) (aji )(cji)

(namely (c ) such that, for each i 1,...,n , w = n c v ). Thus, ji ∈ { } i j=1 ji j (2) −1 (1) P (1) det(aji )= det(cji) det(aji ) det(cji) = det(aji )

and, in consequence, 

det : (V,V ) F, det(A) := det(a(1)), L −→ ji is well-defined.

Corollary 4.37. Let V be a finite-dimensional vector space over the field F , n N, ∈ dim V = n. 5 DIRECT SUMS AND PROJECTIONS 71

(a) If A, B (V,V ), then det(AB) = det(A)det(B). ∈L (b) If A (V,V ), then det(A)=0 if, and only if, A is not bijective. If A is bijective, then ∈Ldet(A−1) = (det(A))−1.

(c) If A (V,V ) and λ F , then det(λA)= λn det(A). ∈L ∈

Proof. Let BV be an ordered basis of V . (a): Let (a ), (b ) (n,F ) be the matrices corresponding to A, B with respect to ji ji ∈ M BV . Then

[Phi19, Th. 7.10(a)] Cor. 4.24(g) det(AB) = det (aji)(bji) = det(aji) det(bji) = det(A) det(B).  (b): Since (aji) (with (aji) as before) is singular if, and only if, A is not bijective, (b) is immediate from Cor. 4.24(h).

(c): If (aji) is as before, then

Cor. 4.24(d) n n det(λA) = det λ (aji) = λ det(aji)= λ det(A),

thereby completing the proof.  

5 Direct Sums and Projections

In [Phi19, Def. 5.10], we defined sums of arbitrary (finite or infinite) families of subspaces. In [Phi19, Def. 5.28], we defined the direct sum for two subspaces. We will now extend the notion of direct sum to arbitrary families of subspaces:

Definition 5.1. Let V be a vector space over the field F , let I be an index set and let (Ui)i∈I be a family of subspaces of V . We say that V is the direct sum of the family of subspaces (Ui)i∈I if, and only if, the following two conditions hold:

(i) V = i∈I Ui.

(ii) For eachP finite J I and each family (uj)j∈J in V such that uj Uj for each j J, one has ⊆ ∈ ∈ 0= uj uj =0. ⇒j∈ ∀J j∈J X

If V is the direct sum of the Ui, then we write V = i∈I Ui. L 5 DIRECT SUMS AND PROJECTIONS 72

Proposition 5.2. Let V be a vector space over the field F , let I be an index set and let (Ui)i∈I be a family of subspaces of V . Then the following statements are equivalent:

(i) V = i∈I Ui.

(ii) For eachL v V , there exists a unique finite subset Jv of I and a unique map σ : J V∈ 0 , j u (v) := σ (j), such that v v −→ \ { } 7→ j v

v = σv(j)= uj(v) uj(v) Uj. (5.1) ∧j∈ ∀Jv ∈ j∈J j∈J Xv Xv

(iii) V = i∈I Ui and Uj Ui = 0 . (5.2) P j∀∈I ∩ { } i∈XI\{j} ′ (iv) V = i∈I Ui and, letting I := i I : Ui = 0 , each family (ui)i∈I′ in V with ′ { ∈ 6 { }} ui Ui 0 for each i I is linearly independent. ∈ P\ { } ∈ Proof. “(i) (ii)”: According to the definition of V = U , the existence of J and ⇔ i∈I i v σv such that (5.1) holds is equivalent to V = Ui. If (5.1) holds and Iv I is finite i∈I P ⊆ such that v = τv(j) with τv : Iv V 0 and τv(j) Uj for each j Iv, then j∈Iv −→ P\ { } ∈ ∈ define σv(j) := 0 for each j Iv Jv and τv(j) := 0 for each j Jv Iv. Then P ∈ \ ∈ \ 0= v v = σ (j) τ (j) − v − v j∈Jv∪Iv X  and Def. 5.1(ii) implies σ (j)= τ (j) for each j J I as well as J = I . Conversely, v v ∈ v ∪ v v v assume there exists J I finite and u U (j J) such that ⊆ j ∈ j ∈

0= uj uj0 =0. ∧j0 ∃∈J 6 j∈J X Then u = ( u ) j0 − j j∈XJ\{j0}

shows (ii) does not hold (as v := uj0 has two different representations). “(i) (iii)”: If (i) holds and v U U for some j I, then there exists a ⇒ ∈ j ∩ i∈I\{j} i ∈ finite J I j such that v = u with u U for each i J. Since v U and ⊆ \ { } i∈J i P i ∈ i ∈ − ∈ j P 0= v + u , − i i∈J X 5 DIRECT SUMS AND PROJECTIONS 73

Def. 5.1(ii) implies v =0= v, i.e. (iii). − “(iii) (i)”: Let J I be finite such that 0 = i∈J ui with ui Ui for each i J. Then ⇒ ⊆ ∈ ∈ P uj = uj Uj Ui j∈∀J − ∈ ∩ i∈XJ\{j} i∈XI\{j} i.e. (iii) implies uj = 0 and Def. 5.1(ii). ′ “(iv) (i)”: If J I is finite and uj Uj 0 for each j J with (uj)j∈J linearly independent,⇒ then⊆ u = 0, implying∈ Def.∈ \{ 5.1(ii)} via contraposition.∈ j∈J j 6 ′ “(i) (iv)”: SupposeP (ui)i∈I′ is a family in V such that ui Ui 0 for each i I . If J ⇒I′ is finite and 0 = λ u for some λ K, then Def.∈ 5.1(ii)\ { } implies λ∈u = 0 ⊆ j∈J j j j ∈ j j and, thus, λj = 0, for each j J, yielding the linear independence of (ui)i∈I′ .  P ∈ Proposition 5.3. Let V be a vector space over the field F .

(a) Let B be a basis of V with a decomposition B = ˙ B . If, for each i I, i∈I i ∈ Ui := Bi , then V = Ui. In particular, V = b . h i i∈I bS∈Bh{ }i

(b) If (Ui)i∈I is a family ofL subspaces of V such that Bi isL a basis of Ui and V = i∈I Ui, then the B are pairwise disjoint and B := ˙ B forms a basis of V . i i∈I i L Proof. Exercise. S 

Example 5.4. Consider the vector space V := R2 over R and let U := (1, 0) , 1 h{ }i U2 := (0, 1) , U3 := (1, 1) . Then V = Ui + Uj and Ui Uj = 0 for each i, j 1h{, 2, 3 }iwith i = j.h{ In particular,}i the sum V = U + U + U∩ is not a{ direct} sum, ∈ { } 6 1 2 3 showing Prop. 5.2(iii) can, in general, not be replaced by the condition Ui Uj = 0 for each i, j I with i = j. ∩ { } ∈ 6 Definition 5.5. Let S be a set. Then P : S S is called a projection if, and only if, P 2 := P P = P . −→ ◦ Remark 5.6. For each set S, Id: S S is a projection. Moreover, for each x S, the constant map f : S S, f −→x, is a projection. If V := S is a vector space∈ x −→ x ≡ over a field F , then fx is linear, if and only if, x = 0. While this shows that not every projection on a vector space is linear, here, we are interested in linear projections. We will see in Th. 5.8 below that there is a close relationship between linear projections and direct sums.

Example 5.7. (a) Let V be a vector space over the field F and let B be a basis of V. If c_v : B_v → F \ {0}, B_v ⊆ B finite, is the coordinate map for v ∈ V, then, clearly, for each b ∈ B,

P_b : V → V, P_b(v) := c_v(b) b for b ∈ B_v, P_b(v) := 0 otherwise,

is a linear projection.

(b) Let I be a nonempty set and consider the vector space V := F(I, F) = F^I over the field F. If

e_i : I → F, e_i(j) := δ_ij, for each i ∈ I,

then, clearly, for each i ∈ I,

P_i : V → V, P_i(f) := f(i) e_i,

is a linear projection.

(c) Let n ∈ N and let V be the vector space over R consisting of all functions f : R → R such that the n-th derivative f^(n) exists. Then, clearly,

P : V → V, P(f)(x) := Σ_{k=0}^{n} (f^(k)(0)/k!) x^k,

is a linear projection, where the image of P is the subspace of all polynomial functions of degree at most n.

Theorem 5.8. Let V be a vector space over the field F.

(a) Let (U_i)_{i∈I} be a family of subspaces of V such that V = ⊕_{i∈I} U_i. According to Prop. 5.2(ii), for each v ∈ V, there exists a unique finite subset J_v of I and a unique map σ_v : J_v → V \ {0}, j ↦ u_j(v) := σ_v(j), such that

v = Σ_{j∈J_v} σ_v(j) = Σ_{j∈J_v} u_j(v), where u_j(v) ∈ U_j for each j ∈ J_v.

Thus, we can define, for each j ∈ I,

P_j : V → V, P_j(v) := σ_v(j) for j ∈ J_v, P_j(v) := 0 otherwise. (5.3)

Then each P_j is a linear projection with Im P_j = U_j and ker P_j = ⊕_{i∈I\{j}} U_i. Moreover, if i ≠ j, then P_i P_j ≡ 0. Defining

Σ_{i∈I} P_i : V → V, (Σ_{i∈I} P_i)(v) := Σ_{i∈J_v} P_i(v), (5.4)

we have Id = Σ_{i∈I} P_i.

(b) Let (P_i)_{i∈I} be a family of projections in L(V,V) such that, for each v ∈ V, the set J_v := {i ∈ I : P_i(v) ≠ 0} is finite. If Id = Σ_{i∈I} P_i (where Σ_{i∈I} P_i is defined as in (5.4)) and P_i P_j ≡ 0 for each i, j ∈ I with i ≠ j, then

V = ⊕_{i∈I} Im P_i.

(c) If P ∈ L(V,V) is a projection, then V = ker P ⊕ Im P, Im P = ker(Id − P), ker P = Im(Id − P).

Proof. (a): If v, w ∈ V and λ ∈ F, then, extending σ_v and σ_w by 0 to J_v ∪ J_w, we have, for each j ∈ J_v ∪ J_w,

σ_{v+w}(j) = σ_v(j) + σ_w(j) ∈ U_j and σ_{λv}(j) = λ σ_v(j) ∈ U_j,

showing, for each i ∈ I, the linearity of P_i as well as Im P_i = U_i and P_i² = P_i. Clearly, if j ∈ I, then ⊕_{i∈I\{j}} U_i ⊆ ker P_j. On the other hand, if v ∈ ker P_j, then j ∉ J_v, showing v ∈ ⊕_{i∈I\{j}} U_i. If i ≠ j and v ∈ V, then P_j v ∈ U_j ⊆ ker P_i, showing P_i P_j ≡ 0. Finally, for each v ∈ V, we compute

(Σ_{i∈I} P_i)(v) = Σ_{i∈J_v} P_i(v) = Σ_{i∈J_v} σ_v(i) = v,

thereby completing the proof of (a).

(b): For each v ∈ V, we have

v = Id v = Σ_{i∈J_v} P_i(v) ∈ Σ_{i∈J_v} Im P_i,

proving V = Σ_{i∈I} Im P_i. Now let J ⊆ I be finite such that 0 = Σ_{j∈J} u_j with u_j ∈ Im P_j for each j ∈ J. Then there exist v_j ∈ V such that u_j = P_j v_j. Thus, we obtain, for each i ∈ J,

0 = P_i(0) = P_i(Σ_{j∈J} u_j) = Σ_{j∈J} P_i P_j v_j = P_i P_i v_i = P_i v_i = u_i,

showing Def. 5.1(ii) to hold and proving (b).

(c): We have

(Id − P)² = (Id − P)(Id − P) = Id − P − P + P² = Id − P,

showing Id − P to be a projection. On the other hand, P(Id − P) = (Id − P)P = P − P² = P − P = 0, i.e.

V = Im P ⊕ Im(Id − P) (5.5)

according to (b). We show ker P = Im(Id − P) next: Let v ∈ V and x := (Id − P)v ∈ Im(Id − P). Then Px = Pv − P²v = Pv − Pv = 0, showing x ∈ ker P and Im(Id − P) ⊆ ker P. Conversely, let x ∈ ker P. By (5.5), we write x = v₁ + v₂ with v₁ ∈ Im P and v₂ ∈ Im(Id − P) ⊆ ker P. Then v₁ = x − v₂ ∈ ker P as well, i.e. v₁ ∈ Im P ∩ ker P. Then there exists w ∈ V such that v₁ = Pw and we compute

v₁ = Pw = P²w = Pv₁ = 0,

as v₁ ∈ ker P. Thus, x = v₂ ∈ Im(Id − P), showing ker P ⊆ Im(Id − P) as desired. From (5.5), we then also have V = Im P ⊕ ker P. Since we have seen Id − P to be a projection as well, we also obtain ker(Id − P) = Im(Id − (Id − P)) = Im P. □
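As an illustration of Th. 5.8(c), the following minimal sketch (Python with NumPy; the concrete matrix is a hypothetical example) checks P² = P for a non-orthogonal projection and splits a vector according to V = Im P ⊕ ker P via v = Pv + (Id − P)v:

    import numpy as np

    # A (non-orthogonal) projection of R^2 onto the x-axis along the direction (1, 1):
    P = np.array([[1.0, -1.0],
                  [0.0,  0.0]])
    assert np.allclose(P @ P, P)           # P is a projection: P^2 = P

    v = np.array([3.0, 5.0])
    v_im  = P @ v                          # component in Im P
    v_ker = v - v_im                       # component in Im(Id - P) = ker P
    assert np.allclose(P @ v_ker, 0.0)     # v_ker indeed lies in ker P
    assert np.allclose(v_im + v_ker, v)    # v = Pv + (Id - P)v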

6 Eigenvalues

Definition 6.1. Let V be a vector space over the field F and A ∈ L(V,V).

(a) We call λ ∈ F an eigenvalue of A if, and only if, there exists 0 ≠ v ∈ V such that

Av = λv. (6.1)

Then each 0 ≠ v ∈ V such that (6.1) holds is called an eigenvector of A for the eigenvalue λ; the set

E_A(λ) := ker(λ Id − A) = {v ∈ V : Av = λv} (6.2)

is then called the eigenspace of A with respect to the eigenvalue λ. The set

σ(A) := {λ ∈ F : λ eigenvalue of A} (6.3)

is called the spectrum of A.

(b) We call A diagonalizable if, and only if, there exists a basis B of V such that each v ∈ B is an eigenvector of A.

Remark 6.2. Let V be a finite-dimensional vector space over the field F, dim V = n ∈ N, and assume A ∈ L(V,V) to be diagonalizable. Then there exists a basis B = {v₁, ..., v_n} of V, consisting of eigenvectors of A, i.e. Av_i = λ_i v_i, λ_i ∈ σ(A), for each i ∈ {1, ..., n}. Thus, with respect to B, A is represented by the diagonal matrix diag(λ₁, ..., λ_n), having λ₁, ..., λ_n on its diagonal and all other entries 0, which explains the term diagonalizable. —

It will be a main goal of the present and the following sections to investigate under which conditions, given A ∈ L(V,V) with dim V < ∞, V has a basis B such that, with respect to B, A is represented by a diagonal matrix. While such a basis does not always exist, we will see that there always exist bases such that the representing matrix has a particularly simple structure, a so-called normal form.

Theorem 6.3. Let V be a vector space over the field F and A ∈ L(V,V).

(a) λ ∈ F is an eigenvalue of A if, and only if, ker(λ Id − A) ≠ {0}, i.e. if, and only if, λ Id − A is not injective.

(b) For each eigenvalue λ of A, the eigenspace E_A(λ) constitutes a subspace of V.

(c) Let (v_λ)_{λ∈σ(A)} be a family in V such that, for each λ ∈ σ(A), v_λ is an eigenvector for λ. Then (v_λ)_{λ∈σ(A)} is linearly independent (in particular, #σ(A) ≤ dim V).

(d) A is diagonalizable if, and only if, V is the direct sum of the eigenspaces of A, i.e. if, and only if,

V = ⊕_{λ∈σ(A)} E_A(λ). (6.4)

(e) Let A be diagonalizable and, for each λ ∈ σ(A), let P_λ : V → V be the projection with

Im P_λ = E_A(λ) and ker P_λ = ⊕_{μ∈σ(A)\{λ}} E_A(μ),

given by (d) in combination with Th. 5.8(a). Then

A = Σ_{λ∈σ(A)} λ P_λ (6.5a)

and

A P_λ = P_λ A for each λ ∈ σ(A), (6.5b)

where (6.5a) is known as the spectral decomposition of A.

Proof. (a) holds, as, for each λ ∈ F and each v ∈ V,

Av = λv ⇔ λv − Av = 0 ⇔ (λ Id − A)v = 0 ⇔ v ∈ ker(λ Id − A).

(b) holds, as E_A(λ) is the kernel of a linear map.

(c): Seeking a contradiction, assume (v_λ)_{λ∈σ(A)} to be linearly dependent. Then there exists a minimal family of vectors (v_{λ₁}, ..., v_{λ_k}), k ∈ N, such that λ₁, ..., λ_k ∈ σ(A) and there exist c₁, ..., c_k ∈ F \ {0} with

0 = Σ_{i=1}^{k} c_i v_{λ_i}.

We compute

0 = 0 − 0 = A(Σ_{i=1}^{k} c_i v_{λ_i}) − λ_k Σ_{i=1}^{k} c_i v_{λ_i} = Σ_{i=1}^{k} c_i λ_i v_{λ_i} − λ_k Σ_{i=1}^{k} c_i v_{λ_i} = Σ_{i=1}^{k−1} c_i (λ_i − λ_k) v_{λ_i}.

As we had chosen the family (v_{λ₁}, ..., v_{λ_k}) to be minimal, we obtain c_i (λ_i − λ_k) = 0 for each i ∈ {1, ..., k−1}, which is a contradiction, since c_i ≠ 0 as well as λ_i ≠ λ_k.

(d): If A is diagonalizable, V has a basis B, consisting of eigenvectors of A. Letting, for each λ ∈ σ(A), B_λ := {b ∈ B : Ab = λb}, we have that B_λ is a basis of E_A(λ). Since we have B = ∪̇_{λ∈σ(A)} B_λ, (6.4) now follows from Prop. 5.3(a). Conversely, if (6.4) holds, then V has a basis of eigenvectors of A by means of Prop. 5.3(b).

(e): Exercise. □

Corollary 6.4. Let V be a vector space over the field F and A ∈ L(V,V). If dim V = n ∈ N and A has n distinct eigenvalues λ₁, ..., λ_n ∈ F, then A is diagonalizable.

Proof. Due to Th. 6.3(b),(c), we must have

V = ⊕_{i=1}^{n} E_A(λ_i),

showing A to be diagonalizable by Th. 6.3(d). □

The following examples illustrate the dependence of diagonalizability, and even of the mere existence of eigenvalues, on the structure of the field F .

Example 6.5. (a) Let K ∈ {R, C} and let V be a vector space over K with dim V = 2 and ordered basis B := (v₁, v₂). Consider A ∈ L(V,V) such that

Av₁ = v₂, Av₂ = −v₁.

With respect to B, A is then given by the matrix M := [0 −1; 1 0]. In consequence,

M² = [0 −1; 1 0][0 −1; 1 0] = [−1 0; 0 −1],

showing A² = −Id as well. Suppose λ ∈ σ(A) and v ∈ E_A(λ). Then

−v = A²v = λ²v ⇒ λ² = −1.

Thus, for K = R, A has no eigenvalues, σ(A) = ∅. For K = C, we obtain

A(v₁ + iv₂) = v₂ − iv₁ = −i(v₁ + iv₂),
A(v₁ − iv₂) = v₂ + iv₁ = i(v₁ − iv₂),

showing A to be diagonalizable with σ(A) = {i, −i} and {v₁ + iv₂, v₁ − iv₂} being a basis of V of eigenvectors of A.
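The dichotomy in (a) can be observed numerically: NumPy computes eigenvalues over C and yields ±i for the rotation matrix M, which has no real eigenvalues. A minimal sketch (Python with NumPy):

    import numpy as np

    M = np.array([[0.0, -1.0],
                  [1.0,  0.0]])            # rotation by 90 degrees
    eigvals, eigvecs = np.linalg.eig(M)    # NumPy works over C
    print(eigvals)                         # approximately [ 1j, -1j ]: no real eigenvalues

    # The columns of eigvecs are eigenvectors for these complex eigenvalues:
    for lam, v in zip(eigvals, eigvecs.T):
        assert np.allclose(M @ v, lam * v)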

(b) Over R, consider the vector spaces

V₁ := {(f : R → R) : f polynomial function},
V₂ := {(exp_a : R → R) : a ∈ R},

where, for each a ∈ R,

exp_a : R → R, exp_a(x) := e^{ax}.

For i ∈ {1, 2}, consider the linear map

D : V_i → V_i, D(f) := f′.

If P ∈ V₁ \ {0}, then deg(DP) < deg(P). If P ∈ V₁ is constant, then DP = 0 · P = 0. In consequence, 0 ∈ R is the only eigenvalue of D : V₁ → V₁. On the other hand, for each a ∈ R, D(exp_a) = a exp_a, showing σ(D) = R for D : V₂ → V₂. In this case, D is even diagonalizable, since B := {exp_a : a ∈ R} is a basis of V₂ of eigenvectors of D.

(c) Let V be a vector space over the field F and assume char F ≠ 2. Moreover, let A ∈ L(V,V) such that A² = Id (for dim V = 2, a nontrivial example is given by each A represented by the matrix [0 1; 1 0] with respect to some ordered basis of V). We claim A to be diagonalizable with σ(A) ⊆ {−1, 1} and

V = E_A(−1) ⊕ E_A(1):

Indeed, if x ∈ Im(A + Id), then there exists v ∈ V such that x = (A + Id)v, implying

Ax = A(A + Id)v = (A² + A)v = (Id + A)v = x,

showing Im(A + Id) ⊆ E_A(1). If x ∈ Im(Id − A), then there exists v ∈ V such that x = (Id − A)v, implying

Ax = A(Id − A)v = (A − A²)v = (A − Id)v = −x,

showing Im(Id − A) ⊆ E_A(−1). Now let v ∈ V be arbitrary. We use 2 ≠ 0 in F to obtain

v = ½(v + Av) + ½(v − Av) ∈ Im(Id + A) + Im(Id − A) ⊆ E_A(1) + E_A(−1),

showing V = E_A(1) + E_A(−1).

(d) Let V be a vector space over the field F with dim V = 2 and ordered basis B := (v₁, v₂). Consider A ∈ L(V,V) such that

Av₁ = v₁, Av₂ = v₁ + v₂.

With respect to B, A is then given by the matrix [1 1; 0 1]. Due to Av₁ = v₁, we have 1 ∈ σ(A). Let v ∈ V. Then there exist c₁, c₂ ∈ F such that v = c₁v₁ + c₂v₂. If λ ∈ σ(A) and 0 ≠ v ∈ E_A(λ), then

λ(c₁v₁ + c₂v₂) = λv = Av = c₁v₁ + c₂v₁ + c₂v₂.

As the coordinates with respect to the basis B are unique, we conclude λc₁ = c₁ + c₂ and λc₂ = c₂. If c₂ ≠ 0, then the second equation yields λ = 1. If c₂ = 0, then c₁ ≠ 0 and the first equation yields λ = 1. Altogether, we obtain σ(A) = {1} and, since A ≠ Id, A is not diagonalizable.

Definition 6.6. Let S be a set and A : S → S. Then U ⊆ S is called A-invariant if, and only if, A(U) ⊆ U.

Proposition 6.7. Let V be a vector space over the field F and let U ⊆ V be a subspace. If A ∈ L(V,V) is diagonalizable and U is A-invariant, then A↾_U is diagonalizable as well.

Proof. Let A be diagonalizable, let U be A-invariant, and set A_U := A↾_U. Clearly, for each λ ∈ σ(A),

U ∩ E_A(λ) = E_{A_U}(λ) for λ ∈ σ(A_U), U ∩ E_A(λ) = {0} for λ ∉ σ(A_U).

As A is diagonalizable, from Th. 6.3(d), we know

V = ⊕_{λ∈σ(A)} E_A(λ). (6.6)

It suffices to show

U = W := Σ_{λ∈σ(A)} (U ∩ E_A(λ)),

since, then,

U = ⊕_{λ∈σ(A_U)} E_{A_U}(λ)

due to Th. 6.3(d) and Prop. 5.2(iv). Thus, seeking a contradiction, let u ∈ U \ W (note u ≠ 0). Then there exist distinct λ₁, ..., λ_n ∈ σ(A), n ∈ N, such that u = Σ_{i=1}^{n} v_i with v_i ∈ E_A(λ_i) \ {0} for each i ∈ {1, ..., n}, where we may choose u ∈ U \ W such that n ∈ N is minimal. Since U is A-invariant, we know

Au = Σ_{i=1}^{n} Av_i = Σ_{i=1}^{n} λ_i v_i ∈ U.

As λ_n u ∈ U as well, we conclude

Au − λ_n u = Σ_{i=1}^{n−1} (λ_i − λ_n) v_i ∈ U

as well. Since u ∈ U \ W was chosen such that n is minimal, we must have Au − λ_n u ∈ U ∩ W. Thus, there exists a finite set σ_u ⊆ σ(A) such that

Au − λ_n u = Σ_{i=1}^{n−1} (λ_i − λ_n) v_i = Σ_{λ∈σ_u} w_λ, where 0 ≠ w_λ ∈ U ∩ E_A(λ) for each λ ∈ σ_u. (6.7)

Since the sum in (6.6) is direct, (6.7) and Prop. 5.2(ii) imply σ_u = {λ₁, ..., λ_{n−1}} and, for each i ∈ {1, ..., n−1},

w_{λ_i} = (λ_i − λ_n) v_i ⇒ v_i ∈ U ∩ E_A(λ_i) ⊆ W.

On the other hand, this then implies

v_n = u − Σ_{i=1}^{n−1} v_i ∈ U, which, together with v_n ∈ E_A(λ_n), yields v_n ∈ W,

yielding the contradiction u = v_n + Σ_{i=1}^{n−1} v_i ∈ W. Thus, the assumption that there exists u ∈ U \ W was false, proving U = W as desired. □

We will now use Prop. 6.7 to prove a result regarding the simultaneous diagonalizability of linear endomorphisms:

Theorem 6.8. Let V be a vector space over the field F and let A₁, ..., A_n ∈ L(V,V), n ∈ N, be diagonalizable linear endomorphisms. Then the A₁, ..., A_n are simultaneously diagonalizable (i.e. there exists a basis B of V, consisting of eigenvectors of A_i for each i ∈ {1, ..., n}) if, and only if,

A_i A_j = A_j A_i for each i, j ∈ {1, ..., n}. (6.8)

Proof. Suppose B is a basis of V such that

A_i b = λ_{i,b} b for each i ∈ {1, ..., n} and each b ∈ B, with suitable λ_{i,b} ∈ σ(A_i). (6.9)

Then, for each i, j ∈ {1, ..., n} and each b ∈ B,

A_i A_j b = λ_{i,b} λ_{j,b} b = A_j A_i b,

proving (6.8). Conversely, assume (6.8) to hold. We prove (6.9) via induction on n ∈ N. For technical reasons, we actually prove (6.9) via induction on n ∈ N in the following, clearly equivalent, form: There exists a family (V_k)_{k∈K} of subspaces of V such that

(i) V = ⊕_{k∈K} V_k.

(ii) For each k ∈ K, V_k has a basis B_k, consisting of eigenvectors of A_i for each i ∈ {1, ..., n}.

(iii) For each k ∈ K and each i ∈ {1, ..., n}, V_k is contained in some eigenspace of A_i, i.e. there exists λ_{ik} ∈ σ(A_i) such that

V_k ⊆ E_{A_i}(λ_{ik}), i.e. A_i v = λ_{ik} v for each v ∈ V_k.

(iv) For each k, l ∈ K with k ≠ l, there exists i ∈ {1, ..., n} such that V_k and V_l are not contained in the same eigenspace of A_i, i.e.

k ≠ l ⇒ there exists i ∈ {1, ..., n} with λ_{ik} ≠ λ_{il}.

For n = 1, we can simply use K := σ(A₁) and, for each λ ∈ K, V_λ := E_{A₁}(λ). Thus, consider n > 1. By induction, assume (i) – (iv) to hold with n replaced by n − 1. It suffices to show that the spaces V_k, k ∈ K, are all A_n-invariant, i.e. A_n(V_k) ⊆ V_k: Then, according to Prop. 6.7, A_{nk} := A_n↾_{V_k} is diagonalizable, i.e.

V_k = ⊕_{λ∈σ(A_{nk})} E_{A_{nk}}(λ).

Now each V_{kλ} := E_{A_{nk}}(λ) has a basis B_{kλ}, consisting of eigenvectors of A_n. Since, for each i ∈ {1, ..., n−1}, B_{kλ} ⊆ V_k ⊆ E_{A_i}(λ_{ik}), B_{kλ} consists of eigenvectors of A_i for each i ∈ {1, ..., n}. Letting K_n := {(k, λ) : k ∈ K, λ ∈ σ(A_{nk})}, (i) – (iv) then hold with K replaced by K_n. Thus, it remains to show A_n(V_k) ⊆ V_k for each k ∈ K: Fix v ∈ V_k, k ∈ K. We have, for each j ∈ {1, ..., n−1},

A_j(A_n v) = A_n(A_j v) = A_n(λ_{jk} v) (using (6.8)).

Moreover, there exists a finite set K_v ⊆ K such that

A_n v = Σ_{l∈K_v} v_l, where v_l ∈ V_l \ {0} for each l ∈ K_v.

Then, for each j ∈ {1, ..., n−1},

Σ_{l∈K_v} λ_{jk} v_l = λ_{jk}(A_n v) = A_n(λ_{jk} v) = A_j(A_n v) = Σ_{l∈K_v} A_j v_l = Σ_{l∈K_v} λ_{jl} v_l.

As the sum in (i) is direct, Prop. 5.2(ii) implies λ_{jk} v_l = λ_{jl} v_l for each l ∈ K_v. For each l ∈ K_v, we have v_l ≠ 0, implying λ_{jk} = λ_{jl} for each j ∈ {1, ..., n−1}. Thus, by (iv), k = l, i.e. K_v = {k} and A_n v = v_k ∈ V_k as desired. □

In general, computing eigenvalues is a difficult task (we will say more about this issue later in Rem. 8.3(e) below). The following results can sometimes help, where Th. 6.9(a) is most useful for dim V small:

Theorem 6.9. Let V be a vector space over the field F, dim V = n ∈ N. Let A ∈ L(V,V).

(a) λ ∈ F is an eigenvalue of A if, and only if,

det(λ Id − A) = 0.

(b) If there exists a basis B of V such that the matrix (a_ji) ∈ M(n,F) of A with respect to B is upper or lower triangular, then the diagonal elements a_ii are precisely the eigenvalues of A, i.e. σ(A) = {a_ii : i ∈ {1, ..., n}}.

Proof. (a): According to Th. 6.3(a), λ ∈ σ(A) is equivalent to λ Id − A not being injective, which (as V is finite-dimensional) is equivalent to det(λ Id − A) = 0 by Cor. 4.37(b).

(b): For each λ ∈ F, we have

det(λ Id − A) = det(λ Id_n − (a_ji)) = Π_{i=1}^{n} (λ − a_ii).

Thus, by (a), σ(A) = {a_ii : i ∈ {1, ..., n}}. □

Example 6.10. Consider the vector space V := R² over R and, with respect to the standard basis, let A ∈ L(V,V) be given by the matrix M := [3 −2; 1 0]. Then, for each λ ∈ R,

det(λ Id − A) = det [λ−3  2; −1  λ] = (λ − 3) · λ + 2 = λ² − 3λ + 2 = (λ − 1)(λ − 2),

i.e. σ(A) = {1, 2} by Th. 6.9(a). Since

M [v₁; v₂] = [v₁; v₂] ⇒ v₁ = v₂ and M [v₁; v₂] = [2v₁; 2v₂] ⇒ v₁ = 2v₂,

B := {(1, 1), (2, 1)} is a basis of eigenvectors of A.

Remark 6.11. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V,V). Moreover, let λ ∈ σ(A). Clearly, one has

{0} ⊆ ker(A − λ Id) ⊆ ker(A − λ Id)² ⊆ ...

and the inclusion can be strict at most n times. Let

r(λ) := min{k ∈ N : ker(A − λ Id)^k = ker(A − λ Id)^{k+1}}. (6.10)

Then, for each k ∈ N,

ker(A − λ Id)^{r(λ)} = ker(A − λ Id)^{r(λ)+k}: (6.11)

Indeed, otherwise, let k₀ := min{k ∈ N : ker(A − λ Id)^{r(λ)} ⊊ ker(A − λ Id)^{r(λ)+k}}. Then there exists v ∈ V such that (A − λ Id)^{r(λ)+k₀} v = 0, but (A − λ Id)^{r(λ)+k₀−1} v ≠ 0. However, that means w := (A − λ Id)^{k₀−1} v ∈ ker(A − λ Id)^{r(λ)+1}, but w ∉ ker(A − λ Id)^{r(λ)}, in contradiction to the definition of r(λ).
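Example 6.10 can also be checked numerically; a minimal sketch (Python with NumPy):

    import numpy as np

    M = np.array([[3.0, -2.0],
                  [1.0,  0.0]])
    eigvals = np.linalg.eigvals(M)
    print(sorted(eigvals.real))            # [1.0, 2.0], i.e. sigma(A) = {1, 2}

    # Check the eigenvector basis B = {(1, 1), (2, 1)} found above:
    assert np.allclose(M @ np.array([1.0, 1.0]), 1.0 * np.array([1.0, 1.0]))
    assert np.allclose(M @ np.array([2.0, 1.0]), 2.0 * np.array([2.0, 1.0]))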

Definition 6.12. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V,V). Moreover, let λ ∈ σ(A). The number

m_a(λ) := dim ker(A − λ Id)^{r(λ)} ∈ {1, ..., n}, (6.12)

where r(λ) is given by (6.10), is called the algebraic multiplicity of the eigenvalue λ, whereas

m_g(λ) := dim ker(A − λ Id) ∈ {1, ..., n} (6.13)

is called its geometric multiplicity. We call λ simple if, and only if, m_g(λ) = m_a(λ) = 1; we call λ semisimple if, and only if, m_g(λ) = m_a(λ). For each k ∈ {1, ..., r(λ)}, the space

E_A^k(λ) := ker(A − λ Id)^k

is called the generalized eigenspace of rank k of A, corresponding to the eigenvalue λ; each v ∈ E_A^k(λ) \ E_A^{k−1}(λ), k ≥ 2, is called a generalized eigenvector of rank k corresponding to the eigenvalue λ (an eigenvector v ∈ E_A(λ) is sometimes called a generalized eigenvector of rank 1).

Proposition 6.13. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V,V).

(a) If λ ∈ σ(A) and r(λ) is given by (6.10), then

1 ≤ r(λ) ≤ n. (6.14)

If k ∈ N is such that 1 ≤ k < k+1 ≤ r(λ), then

1 ≤ m_g(λ) = dim E_A^1(λ) ≤ dim E_A^k(λ) < dim E_A^{k+1}(λ) ≤ dim E_A^{r(λ)}(λ) = m_a(λ) ≤ n, (6.15)

which implies, in particular,

r(λ) ≤ m_a(λ), (6.16a)
1 ≤ m_g(λ) ≤ m_a(λ) ≤ n, (6.16b)
0 ≤ m_a(λ) − m_g(λ) ≤ n − 1. (6.16c)

(b) If λ ∈ σ(A), then, for each k ∈ {1, ..., r(λ)}, the generalized eigenspace E_A^k(λ) is A-invariant, i.e.

A(E_A^k(λ)) ⊆ E_A^k(λ).

(c) If A is diagonalizable, then m_g(λ) = m_a(λ) holds for each λ ∈ σ(A) (but cf. Ex. 6.14 below).

Proof. (a): Both (6.14) and (6.15) are immediate from Rem. 6.11 together with the definitions of r(λ), m_g(λ) and m_a(λ). Then (6.16a) follows from (6.15), since dim E_A^{k+1}(λ) − dim E_A^k(λ) ≥ 1; (6.16b) is immediate from (6.15); (6.16c) is immediate from (6.16b).

(b): Due to A(A − λ Id) = (A − λ Id)A, one has, for each k ∈ N₀,

A(ker(A − λ Id)^k) ⊆ ker(A − λ Id)^k:

Indeed, if v ∈ ker(A − λ Id)^k, then

(A − λ Id)^k (Av) = A(A − λ Id)^k v = 0.

(c): Exercise. □

Example 6.14. Let V be a vector space over the field F, dim V = n ∈ N, n ≥ 2. Let λ ∈ F. We show that there always exists a map A ∈ L(V,V) such that λ ∈ σ(A) and such that the difference between m_a(λ) and m_g(λ) is maximal, namely

m_a(λ) − m_g(λ) = n − 1:

Let B = (v₁, ..., v_n) be an ordered basis of V, and let A ∈ L(V,V) be such that

Av₁ = λv₁ and Av_i = λv_i + v_{i−1} for each i ∈ {2, ..., n}.

Then, with respect to B, A is represented by the n×n matrix M with λ on the diagonal, 1 on the superdiagonal, and all other entries 0. We use an induction over k ∈ {1, ..., n} to show

(A − λ Id)^k v_i = 0 for 1 ≤ i ≤ k and (A − λ Id)^k v_i = v_{i−k} for k < i ≤ n: (6.17)

For k = 1, we have

(A − λ Id)v₁ = λv₁ − λv₁ = 0 and (A − λ Id)v_i = λv_i + v_{i−1} − λv_i = v_{i−1} for 1 < i ≤ n.

For k > 1, we have, by the induction hypothesis,

(A − λ Id)^k v_i = (A − λ Id)((A − λ Id)^{k−1} v_i) = 0 for 1 ≤ i < k,
(A − λ Id)^k v_k = (A − λ Id)v₁ = 0,
(A − λ Id)^k v_i = (A − λ Id)v_{i−k+1} = v_{i−k} for k < i ≤ n,

proving (6.17). In consequence, ker(A − λ Id)^k = ⟨{v₁, ..., v_k}⟩ for each k ∈ {1, ..., n}, yielding r(λ) = n, m_a(λ) = n, m_g(λ) = 1, and, thus, m_a(λ) − m_g(λ) = n − 1, as desired.
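For the matrix M of Example 6.14, the strictly increasing chain of kernels from Rem. 6.11 can be observed numerically via dim ker(A − λ Id)^k = n − rank((A − λ Id)^k). A minimal sketch (Python with NumPy, for the hypothetical values n = 4, λ = 2):

    import numpy as np

    n, lam = 4, 2.0
    M = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)  # lambda on diagonal, 1 above
    N = M - lam * np.eye(n)                             # N = A - lambda*Id

    # dim ker N^k = n - rank(N^k) grows by one until k = r(lambda) = n:
    dims = [n - np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
            for k in range(1, n + 1)]
    print(dims)        # [1, 2, 3, 4]: m_g(lambda) = 1, m_a(lambda) = 4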

Definition 6.15. Let n ∈ N. Consider the vector space V := F^n over the field F. All of the notions we introduced in this section for linear endomorphisms A ∈ L(V,V) (e.g. eigenvalue, eigenvector, eigenspace, multiplicity of an eigenvalue, diagonalizability, etc.), one also defines for quadratic matrices M ∈ M(n,F): The notions are then meant with respect to the linear map A_M that M represents with respect to the standard basis of F^n.

Example 6.16. Let F be a field and n ∈ N. If M ∈ M(n,F) is diagonalizable, then, according to Def. 6.15, there exists a regular matrix T ∈ GL_n(F) and a diagonal matrix D ∈ M(n,F) such that D = T^{−1}MT. A simple induction then shows

M^k = T D^k T^{−1} for each k ∈ N₀.

Clearly, if one knows T and T^{−1}, this can tremendously simplify the computation of M^k, especially if k is large and M is fully populated. However, computing T and T^{−1} can also be difficult, and it depends on the situation if pursuing this route is a good option.
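A minimal sketch of the computation M^k = T D^k T^{−1} (Python with NumPy; the matrix reuses Example 6.10):

    import numpy as np

    M = np.array([[3.0, -2.0],
                  [1.0,  0.0]])
    eigvals, T = np.linalg.eig(M)          # columns of T are eigenvectors
    D = np.diag(eigvals)
    assert np.allclose(M, T @ D @ np.linalg.inv(T))

    k = 10
    # Computing D^k only requires k-th powers of the diagonal entries:
    Mk = T @ np.diag(eigvals ** k) @ np.linalg.inv(T)
    assert np.allclose(Mk, np.linalg.matrix_power(M, k))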

7 Commutative Rings, Polynomials

We have already seen in the previous section that the eigenvalues of a quadratic matrix are precisely the zeros of its characteristic polynomial function. In order to further study the relation between certain polynomials and the structure of a matrix (and the structure of corresponding linear maps), we will need to investigate some of the general theory of polynomials. We will take this opportunity to also learn more about the general theory of commutative rings, which is of algebraic interest beyond our current interest in matrix-related polynomials.

Definition 7.1. Let R be a commutative ring with unity. We call

R[X] := R^{N₀}_{fin} := {(f : N₀ → R) : #f^{−1}(R \ {0}) < ∞} (7.1)

the set of polynomials over R (i.e. a polynomial over R is a sequence (a_i)_{i∈N₀} in R such that all, but finitely many, of the entries a_i are 0, cf. [Phi19, Ex. 5.16(c)]). We then have the pointwise-defined addition and scalar multiplication on R[X], which it inherits from R^{N₀}:

(f + g) : N₀ → R, (f + g)(i) := f(i) + g(i), for each f, g ∈ R[X],
(λ · f) : N₀ → R, (λ · f)(i) := λ f(i), for each f ∈ R[X] and λ ∈ R, (7.2)

where we know from [Phi19, Ex. 5.16(c)] that, with these compositions, R[X] forms a vector space over R, provided R is a field, and, then, B = {e_i : i ∈ N₀}, where

e_i : N₀ → R, e_i(j) := δ_ij, for each i ∈ N₀,

provides the standard basis of the vector space R[X]. In the current context, we will now write X^i := e_i and we will call these polynomials monomials. Furthermore, we define a multiplication on R[X] by letting

((a_i)_{i∈N₀}, (b_i)_{i∈N₀}) ↦ (c_i)_{i∈N₀} := (a_i)_{i∈N₀} · (b_i)_{i∈N₀},
c_i := Σ_{k+l=i} a_k b_l := Σ_{(k,l)∈(N₀)²: k+l=i} a_k b_l = Σ_{k=0}^{i} a_k b_{i−k}. (7.3)

If f := (a_i)_{i∈N₀} ∈ R[X], then we call the a_i ∈ R the coefficients of f, and we define the degree of f by

deg f := −∞ for f ≡ 0, deg f := max{i ∈ N₀ : a_i ≠ 0} for f ≢ 0 (7.4)

(defining deg(0) = −∞ instead of deg(0) = −1 has the advantage that formulas (7.5a), (7.5b) below then also hold for the zero polynomial). If deg f = n ∈ N₀ and a_n = 1, then the polynomial f is called monic.

Remark 7.2. In the situation of Def. 7.1, using the notation X^i = e_i, we can write addition, scalar multiplication, and multiplication in the following, perhaps more familiar-looking, forms: If λ ∈ R, f = Σ_{i=0}^{n} f_i X^i, g = Σ_{i=0}^{n} g_i X^i, n ∈ N₀, f₀, ..., f_n, g₀, ..., g_n ∈ R, then

f + g = Σ_{i=0}^{n} (f_i + g_i) X^i,
λf = Σ_{i=0}^{n} (λf_i) X^i,
fg = Σ_{i=0}^{2n} (Σ_{k+l=i} f_k g_l) X^i.

—
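The multiplication rule (7.3) is precisely the convolution of the finitely supported coefficient sequences. A minimal sketch (Python; polynomials represented as coefficient lists [f_0, f_1, ...]):

    def poly_mul(f, g):
        """Multiply two polynomials given as coefficient lists, cf. (7.3):
        the i-th coefficient of fg is the sum of f[k] * g[l] over k + l = i."""
        c = [0] * (len(f) + len(g) - 1)
        for k, fk in enumerate(f):
            for l, gl in enumerate(g):
                c[k + l] += fk * gl
        return c

    # (1 + X)^2 = 1 + 2X + X^2:
    assert poly_mul([1, 1], [1, 1]) == [1, 2, 1]

    # Over Z_4 (cf. Ex. 7.8 below): (2X + 1)^2 = 4X^2 + 4X + 1 = 1 in Z_4[X]:
    assert [x % 4 for x in poly_mul([1, 2], [1, 2])] == [1, 0, 0]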

Recall from [Phi19, Def. and Rem. 4.41] that an element x in a ring with unity R is called invertible if, and only if, there exists x̄ ∈ R such that xx̄ = x̄x = 1, and that (R*, ·) denotes the group of invertible elements of R.

Lemma 7.3. Let R be a ring with unity and x ∈ R*.

(a) x is not a zero divisor.

(b) If S is also a ring with unity and φ : R → S is a unital ring homomorphism, then φ(x) ∈ S*.

Proof. (a): Let x ∈ R* and x̄ ∈ R such that xx̄ = x̄x = 1. If y ∈ R such that xy = 0, then y = 1 · y = x̄xy = 0; if y ∈ R such that yx = 0, then y = y · 1 = yxx̄ = 0. In consequence, x is not a zero divisor.

(b): Let x̄ ∈ R such that xx̄ = x̄x = 1. Then

1 = φ(1) = φ(xx̄) = φ(x̄x) = φ(x)φ(x̄) = φ(x̄)φ(x),

proving φ(x) ∈ S*. □

Theorem 7.4. Let R be a commutative ring with unity.

(a) If f, g ∈ R[X] with f = (f_i)_{i∈N₀}, g = (g_i)_{i∈N₀}, then

deg(f + g) = −∞ ≤ max{deg f, deg g} if f = −g,
deg(f + g) = max{i ∈ N₀ : f_i ≠ −g_i} ≤ max{deg f, deg g} otherwise, (7.5a)

deg(fg) ≤ deg f + deg g. (7.5b)

If the highest nonzero coefficient of f or of g is not a zero divisor (e.g. if this coefficient is an invertible element, cf. Lem. 7.3), then one even has

deg(fg) = deg f + deg g. (7.5c)

(b) (R[X], +, ·) forms a commutative ring with unity, where 1 = X⁰ is the neutral element of multiplication.

Proof. (a): If f ≡ 0, then f + g = g and fg ≡ 0, i.e. the degree formulas hold if f ≡ 0 or g ≡ 0. It is also immediate from (7.2) that (7.5a) holds in the remaining case. If deg f = n ∈ N₀, deg g = m ∈ N₀, then, for each i ∈ N₀ with i > m + n, we have, for k, l ∈ N₀ with k + l = i, that k > n or l > m, showing (fg)_i = Σ_{k+l=i} f_k g_l = 0, proving (7.5b). If f_n is not a zero divisor, then (fg)_{m+n} = f_n g_m ≠ 0, proving (7.5c).

(b): We already know from [Phi19, Ex. 4.9(e)] that (R[X], +) forms a commutative group. To verify associativity of multiplication, let a, b, c, d, f, g, h ∈ R[X],

a := (a_i)_{i∈N₀}, b := (b_i)_{i∈N₀}, c := (c_i)_{i∈N₀}, d := (d_i)_{i∈N₀},
f := (f_i)_{i∈N₀}, g := (g_i)_{i∈N₀}, h := (h_i)_{i∈N₀},

such that d := ab, f := bc, g := (ab)c, h := a(bc). Then, for each i ∈ N₀,

g_i = Σ_{k+l=i} d_k c_l = Σ_{k+l=i} Σ_{m+n=k} a_m b_n c_l = Σ_{m+n+l=i} a_m b_n c_l = Σ_{m+k=i} Σ_{n+l=k} a_m b_n c_l = Σ_{m+k=i} a_m f_k = h_i,

proving g = h, as desired. To verify distributivity, let a, b, c, d, f, g ∈ R[X] be as before, but this time such that d := ab, f := ac, and g := a(b + c). Then, for each i ∈ N₀,

g_i = Σ_{k+l=i} a_k (b_l + c_l) = Σ_{k+l=i} a_k b_l + Σ_{k+l=i} a_k c_l = d_i + f_i,

proving g = d + f, as desired. To verify commutativity of multiplication, let a, b, c, d ∈ R[X] be as before, but this time such that c := ab, d := ba. Then, for each i ∈ N₀,

c_i = Σ_{k+l=i} a_k b_l = Σ_{k+l=i} b_l a_k = d_i,

proving c = d, as desired. Finally, if b := X⁰, then b₀ = 1 and b_i = 0 for i > 0, yielding, for c := ab and each i ∈ N₀,

c_i = Σ_{k+l=i} a_k b_l = Σ_{k+0=i} a_k b₀ = a_i,

showing X⁰ to be neutral and completing the proof. □

Definition 7.5. Let R, R′ be rings (with unity). We call R′ a ring extension of R if, and only if, there exists a (unital) ring monomorphism ι : R → R′ (if R′ is a ring extension of R, then one might even identify the elements of R and ι(R) and consider R to be a subset of R′). If R, R′ are fields, then one also calls R′ a field extension of R.

Example 7.6. (a) If R is a commutative ring with unity, then R[X] is a ring extension of R via the unital ring monomorphism

ι : R → R[X], ι(r) := rX⁰:

Indeed, ι is unital, since ι(1) = X⁰; ι is a ring homomorphism, since, for each r, s ∈ R, ι(r + s) = (r + s)X⁰ = rX⁰ + sX⁰ = ι(r) + ι(s) and ι(rs) = rsX⁰ = rX⁰ · sX⁰ = ι(r) ι(s); ι is injective, since, for r ≠ 0, ι(r) = rX⁰ ≢ 0.

(b) If R is a ring (with unity) and n ∈ N, then the matrix ring M(n,R) (cf. [Phi19, Ex. 7.7(c)]) is a ring extension of R via the (unital) ring monomorphism

ι : R → M(n,R), ι(r) := diag(r, ..., r):

Indeed, ι is a ring homomorphism, since, for each r, s ∈ R,

ι(r + s) = diag(r+s, ..., r+s) = diag(r, ..., r) + diag(s, ..., s) = ι(r) + ι(s),
ι(rs) = diag(r, ..., r) diag(s, ..., s) = ι(r) ι(s) (cf. [Phi19, (7.28)]);

ι is injective, since, for r ≠ 0, ι(r) = diag(r, ..., r) ≠ 0. If R is a ring with unity, then ι is unital, since ι(1) = Id_n.

Proposition 7.7. Let R be a commutative ring with unity without nonzero zero divisors. Then R[X] has no nonzero zero divisors and (R[X])* = R*.

Proof. Since R has no nonzero zero divisors, (7.5c) holds, i.e., if f, g ∈ R[X] with f, g ≠ 0, then deg(fg) = deg f + deg g ≥ 0, showing fg ≠ 0, such that f, g can not be zero divisors. First note that R* ⊆ (R[X])* always holds according to Lem. 7.3(b). If f ∈ R[X] \ R, then deg f ≥ 1 and (7.5c) implies deg(fg) ≥ 1 for each 0 ≠ g ∈ R[X], i.e. fg ≠ 1. This shows f ∉ (R[X])* and also that each g ∈ R \ R* is not in (R[X])*, thereby proving R* = (R[X])*. □

Example 7.8. Let R := Z₄ = Z/(4Z). Then (due to 4 = 0 in Z₄)

(2X¹ + 1X⁰)(2X¹ + 1X⁰) = 0X² + 0X¹ + 1X⁰ = X⁰,

showing 2X¹ + 1X⁰ ∈ (R[X])* \ R*, i.e. (R[X])* ≠ R* can occur if R has nonzero zero divisors. This also provides an example where the degree formula (7.5c) does not hold. —

Next, we will prove a remainder theorem for polynomials, which can be seen as an analogon of the remainder theorem for integers (cf. [Phi19, Th. D.1]):

Theorem 7.9 (Remainder Theorem). Let R be a commutative ring with unity. Let g = Σ_{i=0}^{d} g_i X^i ∈ R[X], deg g = d, where g_d ∈ R*. Then, for each f ∈ R[X], there exist unique polynomials q, r ∈ R[X] such that

f = qg + r and deg r < d. (7.6)

Proof. Uniqueness: Suppose f = qg + r = q′g + r′ with q, q′, r, r′ ∈ R[X], deg r < d and deg r′ < d. Then

0 = (q − q′)g + (r − r′) ⇒ deg(q − q′) + deg g = deg(r − r′) (by (7.5a), (7.5c)).

However, since deg(r − r′) < d and deg g = d, this can only hold for q = q′, which, in turn, implies r = r′ as well.

Existence: We prove the existence of q, r ∈ R[X], satisfying (7.6), via induction on n := deg f ∈ N₀: If deg f < d, then q := 0 and r := f satisfy (7.6); in particular, this settles the base case. If n ≥ d, then f₀ := f − f_n g_d^{−1} X^{n−d} g satisfies deg f₀ < n, since the coefficient of X^n in f_n g_d^{−1} X^{n−d} g is f_n. By induction, there exist q₀, r ∈ R[X] with f₀ = q₀g + r and deg r < d, implying f = (q₀ + f_n g_d^{−1} X^{n−d}) g + r, which completes the induction. □

Definition and Remark 7.10. Let R be a commutative ring with unity and let R′ be a ring extension of R. Moreover, let x ∈ R′ be such that x commutes with every element of R, i.e. such that

rx = xr for each r ∈ R. (7.7)

Then the map

ǫ_x : R[X] → R′, f ↦ ǫ_x(f) = ǫ_x(Σ_{i=0}^{deg f} f_i X^i) := Σ_{i=0}^{deg f} f_i x^i, (7.8)

is called the substitution homomorphism or evaluation homomorphism corresponding to x (a typical example, where one wants to use a proper ring extension rather than R, is the substitution of matrices from M(n,R), n ∈ N, for X): Indeed, if x ∈ R′ satisfies (7.7), then ǫ_x is unital, since ǫ_x(X⁰) = x⁰ = 1; ǫ_x is a ring homomorphism, since, for each f = Σ_{i=0}^{deg f} f_i X^i ∈ R[X] and g = Σ_{i=0}^{deg g} g_i X^i ∈ R[X], one has

ǫ_x(f + g) = Σ_{i=0}^{deg(f+g)} (f_i + g_i) x^i = Σ_{i=0}^{deg f} f_i x^i + Σ_{i=0}^{deg g} g_i x^i = ǫ_x(f) + ǫ_x(g),

ǫ_x(fg) = Σ_{i=0}^{deg(fg)} (Σ_{k+l=i} f_k g_l) x^{k+l} = Σ_{k=0}^{deg f} Σ_{l=0}^{deg g} f_k g_l x^{k+l} = (Σ_{k=0}^{deg f} f_k x^k)(Σ_{l=0}^{deg g} g_l x^l) = ǫ_x(f) ǫ_x(g),

where (7.7) is used for the last but one equality. Moreover, ǫ_x is linear, since, for each λ ∈ R,

ǫ_x(λf) = Σ_{i=0}^{deg f} (λf_i) x^i = λ Σ_{i=0}^{deg f} f_i x^i = λ ǫ_x(f).

We call x ∈ R′ a zero or a root of f ∈ R[X] if, and only if, ǫ_x(f) = 0.
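A minimal sketch of the substitution of a matrix for X (Python with NumPy; the helper eval_poly_at_matrix and the example matrix are illustrations, evaluated via Horner's scheme):

    import numpy as np

    def eval_poly_at_matrix(coeffs, A):
        """epsilon_A(f) for f given by coeffs = [f_0, f_1, ..., f_n],
        i.e. f(A) = f_0*Id + f_1*A + ... + f_n*A^n, via Horner's scheme."""
        n = A.shape[0]
        result = np.zeros((n, n))
        for c in reversed(coeffs):
            result = result @ A + c * np.eye(n)
        return result

    A = np.array([[3.0, -2.0],
                  [1.0,  0.0]])
    # f = X^2 - 3X + 2 = (X - 1)(X - 2), cf. Example 6.10; substituting A for X
    # yields the zero matrix (an instance of the Cayley-Hamilton theorem):
    print(eval_poly_at_matrix([2.0, -3.0, 1.0], A))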

Definition 7.11. Let F be a field. We call F algebraically closed if, and only if, for each f ∈ F[X] with deg f ≥ 1, there exists λ ∈ F such that ǫ_λ(f) = 0, i.e. such that λ is a zero of f, as defined in Def. and Rem. 7.10 (cf. Th. 7.38). It is an important result of Algebra that every field F is contained in an algebraically closed field (a proof is provided in Th. D.16 of the Appendix – some additional required material from nonlinear algebra, not covered in the main text, is also included in the Appendix).

Notation 7.12. Let R be a commutative ring with unity. One commonly uses the simplified notation X := X¹ ∈ R[X] and s := sX⁰ ∈ R[X] for each s ∈ R.

Proposition 7.13. Let R be a commutative ring with unity. For each f ∈ R[X] with deg f = n ∈ N and each s ∈ R, there exists q ∈ R[X] with deg q = n − 1 such that

f = ǫ_s(f) + (X − s) q = ǫ_s(f) X⁰ + (X¹ − sX⁰) q, (7.9a)

where ǫ_s is the substitution homomorphism according to Def. and Rem. 7.10. In particular, if s is a zero of f, then

f = (X − s) q. (7.9b)

Proof. According to Th. 7.9, there exist q, r ∈ R[X] such that f = q(X − s) + r with deg r < deg(X − s) = 1. Thus, r ∈ R and deg q = n − 1 by (7.5c) (which holds, as X − s is monic). Applying ǫ_s to f = q(X − s) + r then yields ǫ_s(f) = ǫ_s(q)(s − s) + r = r, proving (7.9). □

Corollary 7.14. Let F be a field. If f ∈ F[X] with deg f = n ∈ N₀, then f has at most n zeros. Moreover, there exist k ∈ {0, ..., n} and q ∈ F[X] with deg q = n − k and such that

f = q Π_{j=1}^{k} (X − λ_j), (7.10a)

where q does not have any zeros in F and N := {λ₁, ..., λ_k} = {λ ∈ F : ǫ_λ(f) = 0} is the set of zeros of f (N = ∅ and f = q is possible, and it can also occur that all λ_j in (7.10a) are identical). We can rewrite (7.10a) as

f = q Π_{j=1}^{l} (X − μ_j)^{m_j}, (7.10b)

where μ₁, ..., μ_l ∈ F, l ∈ {0, ..., k}, are the distinct zeros of f, and m_j ∈ N with Σ_{j=1}^{l} m_j = k. Then m_j is called the multiplicity of the zero μ_j of f. If F is algebraically closed, then k = n and q ∈ F.

Proof. For f ∈ F[X] with deg f = n ∈ N₀, the representation (7.10a) follows from (7.9b) combined with a straightforward induction, and (7.10b) is immediate from (7.10a). From (7.9b) combined with the degree formula (7.5c), we then also know k ≤ n, i.e. f ∈ F[X] with deg f = n ∈ N₀ can have at most n zeros. If F is algebraically closed and q has no zeros, then deg q = 0, implying k = n. □

Example 7.15. (a) Since C is algebraically closed by [Phi16a, Th. 8.32], for each f ∈ C[X] with n := deg f ∈ N, there exist numbers c, λ₁, ..., λ_n ∈ C such that

f = c Π_{j=1}^{n} (X − λ_j) (7.11)

(the λ₁, ..., λ_n are precisely all the zeros of f, some or all of which might be identical).

(b) For each f ∈ R[X] with n := deg f ∈ N, there exist numbers n₁, n₂ ∈ N₀ and c, ξ₁, ..., ξ_{n₁}, α₁, ..., α_{n₂}, β₁, ..., β_{n₂} ∈ R such that

n = n₁ + 2n₂ (7.12a)

and

f = c Π_{j=1}^{n₁} (X − ξ_j) Π_{j=1}^{n₂} (X² + α_j X + β_j): (7.12b)

Indeed, if f has only real coefficients, then we can take complex conjugates to obtain, for each λ ∈ C,

ǫ_λ(f) = 0 ⇒ ǫ_λ̄(f) = (ǫ_λ(f))‾ = 0,

showing that the nonreal zeros of f (if any) must occur in conjugate pairs. Moreover,

(X − λ)(X − λ̄) = X² − (λ + λ̄)X + λλ̄ = X² − 2X Re λ + |λ|²,

showing that (7.11) implies (7.12).

Remark 7.16. Let R be a commutative ring with unity. In Def. 4.32, we defined polynomial functions (of several variables). The following Th. 7.17 illuminates the relation between the polynomials of Def. 7.1 and the polynomial functions (of one variable) of Def. 4.32 (for a generalization to polynomials of several variables, see Th. B.23 and Rem. B.24 of the Appendix). Let Pol(R) denote the set of polynomial functions from R into R. Clearly, Pol(R) is a subring with unity of R^R (cf. [Phi19, Ex. 4.42(a)]). If R is a field, then Pol(R) also is a vector subspace of R^R.

Theorem 7.17. Let R be a commutative ring with unity and consider the map

φ : R[X] → Pol(R), f ↦ φ(f), φ(f)(x) := ǫ_x(f). (7.13)

(a) φ is a unital ring epimorphism. If R is a field, then φ is also a linear epimorphism.

(b) If R is finite, then φ is not a monomorphism.

(c) If F := R is an infinite field, then φ is an isomorphism.

Proof. (a): If f = X⁰, then φ(f) ≡ 1. We also know from Def. and Rem. 7.10 that, for each x ∈ R, ǫ_x is a linear ring homomorphism. Thus, if f, g ∈ R[X] and λ ∈ R, then, for each x ∈ R,

φ(f + g)(x) = ǫ_x(f + g) = ǫ_x(f) + ǫ_x(g) = (φ(f) + φ(g))(x),
φ(fg)(x) = ǫ_x(fg) = ǫ_x(f) ǫ_x(g) = (φ(f)φ(g))(x),
φ(λf)(x) = ǫ_x(λf) = λ ǫ_x(f) = (λφ(f))(x).

Moreover, φ is an epimorphism since, if P ∈ Pol(R) with P(x) = Σ_{i=0}^{n} f_i x^i, where f₀, ..., f_n ∈ R, n ∈ N₀, then P = φ(f) with f = Σ_{i=0}^{n} f_i X^i.

(b): If R is finite, then R^R and Pol(R) ⊆ R^R are finite, whereas R[X] = R^{N₀}_{fin} is infinite (also cf. Ex. 7.18 below).

(c): If F is an infinite field and f ∈ F[X] is such that P := φ(f) ≡ 0, then each λ ∈ F is a zero of f, showing f to have infinitely many zeros. Thus, according to Cor. 7.14, deg f ∉ N, implying f = 0. Thus, ker φ = {0} and φ is a monomorphism. □

Example 7.18. If R is a finite commutative ring with unity, then f := Π_{λ∈R} (X¹ − λ) ∈ R[X] \ {0}, but, using φ of (7.13), φ(f) ≡ 0. For a concrete example, consider the field with two elements, R := Z₂ = {0, 1}. Then 0 ≠ f := X² + X = X(X + 1) ∈ R[X], but, for each x ∈ R, φ(f)(x) = x(x + 1) = 0.

Remark 7.19. If F is a field and P ∈ Pol(F) can be written as P(x) = Σ_{i=0}^{n} a_i x^i with a_i ∈ F, n ∈ N₀, then the representation is unique if, and only if, F has at least n + 1 elements (in particular, the monomial functions x ↦ x^i, i ∈ {0, ..., n}, are linearly independent if, and only if, F has at least n + 1 elements): Indeed, if F has less than n + 1 elements, then, as in Ex. 7.18 (and again using φ of (7.13)), φ(f) = 0, where f := Π_{λ∈F} (X − λ) ∈ F[X] \ {0} and 1 ≤ deg f ≤ n. If g := Σ_{i=0}^{n} a_i X^i, then P = φ(g) = φ(g + f). Since g + f ≠ g and deg(g + f) ≤ n, if we write g + f = Σ_{i=0}^{n} b_i X^i with b_i ∈ F, then P(x) = Σ_{i=0}^{n} b_i x^i, showing the nonuniqueness of the representation of P. Conversely, assume F has at least n + 1 elements. If P(x) = Σ_{i=0}^{n} b_i x^i with b_i ∈ F is also a representation of P, then φ(h) = 0, where h := Σ_{i=0}^{n} (b_i − a_i) X^i. Thus, h has at least n + 1 zeros and deg h ≤ n together with Cor. 7.14 implies h = 0, i.e. a_i = b_i for each i ∈ {0, ..., n}. In consequence, the representation of P is unique.

Definition 7.20. Let R be a commutative ring with unity and 1 ≠ 0.

(a) We call R integral or an integral domain if, and only if, R does not have any nonzero zero divisors.

(b) We call R Euclidean if, and only if, R is an integral domain and there exists a map deg : R \ {0} → N₀ such that, for each f, g ∈ R with g ≠ 0, there exist q, r ∈ R, satisfying

f = qg + r, where deg r < deg g or r = 0. (7.14)

The map deg is then called a degree map or a Euclidean map of R.

Example 7.21. (a) Every field F is a Euclidean ring, where we can choose deg : F* = F \ {0} → N₀, deg ≡ 0, as the degree map: If g ∈ F*, then, given f ∈ F, we choose q := fg^{−1} and r := 0. Then, clearly, (7.14) holds.

(b) Z is a Euclidean ring, where we can choose deg : Z \ {0} → N₀, deg(k) := |k|, as the degree map: According to [Phi19, Th. D.1], for each f, g ∈ N, there exist q, r ∈ N₀ such that f = qg + r and 0 ≤ r < g; from this, one obtains q, r, satisfying (7.14), for arbitrary f, g ∈ Z with g ≠ 0, by adjusting signs.

(c) If F is a field, then F[X] is a Euclidean ring, where we can choose the degree map deg given by (7.4), restricted to F[X] \ {0}: This is immediate from the remainder Th. 7.9.

Definition 7.22. Let R be a commutative ring with unity.

(a) a ⊆ R is called an ideal in R if, and only if, the following two conditions hold:

(i) (a, +) is a subgroup of (R, +).

(ii) For each x ∈ R and each a ∈ a, one has ax ∈ a (which, as 1 ∈ R, is equivalent to aR = a).

(b) An ideal a ⊆ R is called principal if, and only if, there exists a ∈ R such that a = (a) := aR.

(c) R is called principal if, and only if, every ideal in R is principal.

(d) R is called a principal ideal domain if, and only if, R is both principal and integral.

Remark 7.23. Let R be a commutative ring with unity and let a ⊆ R be an ideal. Since (a, +) is a subgroup of (R, +) and a, b ∈ a implies ab ∈ a, a is always a subring of R. However, as 1 ∈ a implies a = R, (0) ≠ a is a subring with unity if, and only if, a = R.

Proposition 7.24. Let R be a commutative ring with unity.

(a) {0} = (0) and (1) = R are principal ideals of R.

(b) If S is a ring and φ : R → S is a ring homomorphism, then ker φ = φ^{−1}{0} is an ideal in R.

(c) Let S be a commutative ring with unity and let φ : R → S be a ring homomorphism. If b ⊆ S is an ideal in S, then φ^{−1}(b) is an ideal in R. If φ is even an epimorphism and a ⊆ R is an ideal in R, then φ(a) is an ideal in S (however, cf. Ex. 7.27(c) below).

(d) If a and b are ideals in R, then a + b is an ideal in R.

(e) If (a_i)_{i∈I} is a family of ideals in R, I ≠ ∅, then a := ∩_{i∈I} a_i is an ideal in R as well.

(f) If I ≠ ∅ is an index set, partially ordered by ≤ in a way such that, for each i, j ∈ I, there exists k ∈ I with i, j ≤ k (if I is totally ordered by ≤, then one can use k := max{i, j}), and (a_i)_{i∈I} is an increasing family of ideals in R (i.e., for each i, j ∈ I with i ≤ j, one has a_i ⊆ a_j), then a := ∪_{i∈I} a_i is an ideal in R as well.

Proof. (a) is clear.

(b): If x ∈ R and a ∈ ker φ, then φ(ax) = φ(a)φ(x) = 0 · φ(x) = 0, showing ax ∈ ker φ. We also know (ker φ, +) to be a subgroup of (R, +).

(c): According to [Phi19, Th. 4.20(a)], (φ^{−1}(b), +) is a subgroup of (R, +). Moreover, if x ∈ R and a ∈ φ^{−1}(b), then φ(a) ∈ b and, thus, φ(ax) = φ(a)φ(x) ∈ b, since b is an ideal in S. Thus, ax ∈ φ^{−1}(b), showing φ^{−1}(b) to be an ideal in R. According to [Phi19, Th. 4.20(d)], (φ(a), +) is a subgroup of (S, +). Moreover, if y ∈ S and b ∈ φ(a), then there exist x ∈ R and a ∈ a such that y = φ(x) and b = φ(a). Then

by = φ(a)φ(x) = φ(ax),

showing by ∈ φ(a), since ax ∈ a (using a being an ideal in R). Thus, φ(a) is an ideal in S.

(d): If x ∈ R, a ∈ a, and b ∈ b, then x(a + b) = xa + xb ∈ a + b. Moreover, if a₁, a₂ ∈ a and b₁, b₂ ∈ b, then (a₁ + b₁) + (a₂ + b₂) = (a₁ + a₂) + (b₁ + b₂) ∈ a + b as well as −(a₁ + b₁) = −a₁ − b₁ ∈ a + b, showing (a + b, +) to be a subgroup of (R, +).

(e): Let x ∈ R and a ∈ a. Then, for each i ∈ I, a ∈ a_i, and, thus, xa ∈ a_i, showing xa ∈ a. We also know (a, +) to be a subgroup of (R, +) by [Phi19, Th. 4.18(d)].

(f): Let x ∈ R and a ∈ a. Then there exists i ∈ I such that a ∈ a_i, implying xa ∈ a_i ⊆ a. We also know (a, +) to be a subgroup of (R, +) by [Phi19, Th. 4.18(f)]. □

The following Prop. 7.25 is the ideal analogue of [Phi19, Prop. 5.9] for vector spaces.

Proposition 7.25. Let R be a commutative ring with unity, A ⊆ R, and

S := {a ⊆ R : A ⊆ a and a is ideal in R}.

Then the set

(A) := ∩_{a∈S} a (7.15)

is called the ideal generated by A (the notation (a) for a principal ideal with a ∈ R can then be seen as a short form of ({a})). Moreover, A is called a generating set of the ideal b in R if, and only if, (A) = b.

(a) (A) is an ideal in R, namely the smallest ideal in R containing A.

(b) If A = ∅, then (A) = {0}; if A ≠ ∅, then

(A) = {Σ_{i=1}^{n} r_i a_i : n ∈ N, r₁, ..., r_n ∈ R, a₁, ..., a_n ∈ A}. (7.16)

(c) If A ⊆ B ⊆ R, then (A) ⊆ (B).

(d) A = (A) if, and only if, A is an ideal in R.

(e) ((A))=(A).

Proof. Exercise. □

Theorem 7.26. If R is a Euclidean ring, then R is a principal ideal domain.

Proof. Let R be a Euclidean ring with degree map deg : R \ {0} → N₀. Moreover, let a ⊆ R be an ideal, a ≠ (0). Let a ∈ a \ {0} be such that

deg(a) = min{deg(x) : 0 ≠ x ∈ a}.

Then a = (a): Indeed, let f ∈ a. According to (7.14), f = qa + r with q, r ∈ R and deg(r) < deg(a) or r = 0. Then r = f − qa ∈ a and the choice of a implies r = 0 and f = qa ∈ (a), showing a ⊆ (a). As (a) ⊆ a also holds (since a is an ideal), we have a = (a), as desired. □

Example 7.27. (a) If F is a field, then (0) and F = (1) are the only ideals in F (in particular, each field is a principal ideal domain): Indeed, if a is an ideal in F, 0 ≠ a ∈ a, and x ∈ F, then x = xa^{−1}a ∈ a.

(b) Z and F [X] (where F is a field) are principal ideal domains according to Th. 7.26, since we know from Ex. 7.21(b),(c) that both rings are Euclidean rings.

(c) According to Rem. 7.23, a proper subring with unity S of the commutative ring with unity R can never be an ideal in R (and then the unital ring monomorphism ι : S → R, ι(x) := x, shows that the subring S = Im ι does not need to be an ideal). For example, Z is a subring of Q, but not an ideal in Q; Q is a subring of R, but not an ideal in R.

(d) The ring Z₄ = {0, 1, 2, 3} is principal: (2) = {0, 2} and, if a is an ideal in Z₄ with 3 ∈ a, then 3 + 3 = 2 ∈ a and 2 + 3 = 1 ∈ a, showing a = Z₄. However, since 2 · 2 = 0, Z₄ is not a principal ideal domain.

(e) The set A := 2Z ∪ 3Z ⊆ Z satisfies Def. 7.22(a)(ii): If k, l ∈ Z, then kl · 2 ∈ 2Z ⊆ A and kl · 3 ∈ 3Z ⊆ A, but A is not an ideal in Z, since 2 + 3 ∉ A. This example also shows that unions of ideals need not be ideals.

(f) The ring Z[X] is not principal: Let

a := {(f_i)_{i∈N₀} ∈ Z[X] : f₀ is even}.

Then, clearly, a is an ideal in Z[X]. Moreover, 2X⁰ ∈ a and X¹ ∈ a. However, if f ∈ a is such that 2 = 2X⁰ ∈ (f), then f ∈ {−2, 2}, showing X¹ ∉ (f). Thus, the ideal a is not principal.

Proposition 7.28. Let F be a field and let R ≠ {0} be a ring with unity. Then every unital ring homomorphism φ : F → R is injective (in particular, every unital ring homomorphism between fields is injective).

Proof. According to Prop. 7.24(b), ker φ is an ideal in F. Thus, from Ex. 7.27(a), ker φ = {0} or ker φ = F. As φ is unital, φ(1) = 1 ≠ 0, showing ker φ = {0}, i.e. φ is injective. □

We now want to show that the analogue of the fundamental theorem of arithmetic [Phi19, Th. D.6] holds in every Euclidean ring (in particular, in F [X], if F is a field) and even in every principal ideal domain. We begin with some preparations:

Definition 7.29. Let R be an integral domain.

(a) We call x, y ∈ R associated if, and only if, there exists a ∈ R* such that x = ay.

(b) Let x, y ∈ R. We define x to be a divisor⁵ of y (and also say that x divides y, denoted x | y) if, and only if, there exists c ∈ R such that y = cx. If x is no divisor of y, then we write x ∤ y.

(c) Let ∅ ≠ M ⊆ R. We call d ∈ R a greatest common divisor of the elements of M if, and only if,

d | x for each x ∈ M, and, for each r ∈ R: (r | x for each x ∈ M) ⇒ r | d. (7.17)

(d) 0 ≠ p ∈ R \ R* is called irreducible if, and only if, for each x, y ∈ R,

p = xy ⇒ (x ∈ R* or y ∈ R*). (7.18)

Otherwise, p is called reducible.

(e) 0 ≠ p ∈ R \ R* is called prime if, and only if, for each x, y ∈ R,

p | xy ⇒ (p | x or p | y). (7.19)

—

Before looking at some examples, we prove two propositions:

Proposition 7.30. Let R be an integral domain.

(a) Cancellation Law: If a, x, y ∈ R such that a ≠ 0 and ax = ay, then x = y.

(b) If a | b and b | a, then a, b are associated.

(c) If (a) = (b), then a, b are associated.

(d) Let ∅ ≠ M ⊆ R. If r, d ∈ R are both greatest common divisors of the elements of M, then r, d are associated.

(e) If 0 ≠ p ∈ R \ R* is prime, then p is irreducible.

(f) If a₁, ..., a_n ∈ R \ {0}, n ∈ N, and d ∈ R is such that we have the equality of ideals

(a₁) + ⋯ + (a_n) = (d), (7.20)

then d is a greatest common divisor of a₁, ..., a_n.

Proof. (a): If ax = ay, then a(x − y) = ax − ay = 0. Since R has no nonzero zero divisors and a ≠ 0, this means x = y.

(b): If a | b and b | a, then there exist x, y ∈ R such that b = xa and a = yb. Thus, b = xa = xyb and (a) yields 1 = xy, showing x, y ∈ R* and a, b being associated.

(c): If (a) = (b), then a ∈ (b) and b ∈ (a), i.e. there exist x, y ∈ R such that a = xb and b = ya, i.e. b | a and a | b. Thus, a, b are associated by (b).

(d): If r, d ∈ R are both greatest common divisors of the elements of M, then r | d and d | r, i.e. r, d are associated by (b).

(e): Let 0 ≠ p ∈ R \ R* be prime and assume p = xy = 1 · xy. Then p | xy and, as p is prime, p | x or p | y. By possibly renaming x, y, we may assume p | x, i.e. there exists c ∈ R with x = cp, implying p = xy = cpy and 1 = cy by (a). Thus, c, y ∈ R*, i.e. p is irreducible.

(f): As a₁, ..., a_n ∈ (d), there exist c₁, ..., c_n ∈ R such that a_i = c_i d for each i ∈ {1, ..., n}, showing d | a_i for each i ∈ {1, ..., n}. Now suppose r ∈ R is such that r | a_i for each i ∈ {1, ..., n}, i.e. there exist x₁, ..., x_n ∈ R with a_i = x_i r for each i ∈ {1, ..., n}. On the other hand, d ∈ (a₁) + ⋯ + (a_n) implies the existence of s₁, ..., s_n ∈ R with d = Σ_{i=1}^{n} s_i a_i. Then

d = Σ_{i=1}^{n} s_i a_i = Σ_{i=1}^{n} s_i x_i r = (Σ_{i=1}^{n} s_i x_i) r,

showing r | d and proving (f). □

⁵One has to be especially cautious with divisors of 0, as there is an inconsistency between the present definition and the definition of zero divisor in [Phi19, Def. and Rem. 4.32] (both definitions are the ones commonly used in the literature and there does not seem to be a good way to avoid this issue): According to the present definition, every x ∈ R is a divisor of 0; however, according to [Phi19, Def. and Rem. 4.32], a zero divisor x ∈ R must satisfy 0 = cx with c ≠ 0. Thus, if one encounters a zero divisor, one needs to determine from the context which of the two definitions is the relevant one.

Proposition 7.31. Let R be a principal ideal domain.

(a) Bézout's Lemma, cf. [Phi19, Th. D.4]: If a₁, ..., a_n ∈ R \ {0}, n ∈ N, and d ∈ R is a greatest common divisor of a₁, ..., a_n, then (7.20) holds. In particular, there exist x₁, ..., x_n ∈ R such that

x₁a₁ + ⋯ + x_n a_n = d, (7.21)

which is known as Bézout's identity (usually for n = 2). An important special case is that, if 1 is a greatest common divisor of a₁, ..., a_n, then there exist x₁, ..., x_n ∈ R such that

x₁a₁ + ⋯ + x_n a_n = 1. (7.22)

(b) Let 0 ≠ p ∈ R \ R*. Then p is prime if, and only if, p is irreducible.

Proof. (a): Let d be a greatest common divisor of a₁, ..., a_n. Since R is a principal ideal domain and using Prop. 7.30(f), (7.20) must hold with some greatest common divisor d₁ of a₁, ..., a_n. Then, by Prop. 7.30(d), there exists r ∈ R* such that d = rd₁, implying (d) = (d₁), proving (a).

(b): Due to Prop. 7.30(e), it only remains to prove that p is prime if it is irreducible. Thus, assume p to be irreducible and let x, y ∈ R such that p | xy, i.e. xy = ap with a ∈ R. If p ∤ x, then 1 is a greatest common divisor of p, x and, according to (7.22), there exist r, s ∈ R such that rp + sx = 1. Then

y = y · 1 = y(rp + sx) = yrp + sxy = yrp + sap = (yr + sa)p,

showing p | y and p prime. □
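In the Euclidean rings Z and F[X], a greatest common divisor together with Bézout coefficients as in (7.21) can be computed by iterating the division with remainder (7.14) (the extended Euclidean algorithm). A minimal sketch for R = Z (Python):

    def extended_gcd(a, b):
        """Return (d, x, y) with d a greatest common divisor of a, b
        and x*a + y*b = d (Bezout's identity (7.21))."""
        x0, y0, x1, y1 = 1, 0, 0, 1
        while b != 0:
            q, r = divmod(a, b)            # division with remainder, cf. (7.14)
            a, b = b, r
            x0, x1 = x1, x0 - q * x1
            y0, y1 = y1, y0 - q * y1
        return a, x0, y0

    d, x, y = extended_gcd(12, 20)
    assert d == 4 and x * 12 + y * 20 == 4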

Example 7.32. Let R be an integral domain.

(a) For each x ∈ R, due to x = 1 · x, 1 | x and x | x.

(b) If F := R is a field, then F \ F* = {0}, i.e. F has neither irreducible elements nor prime elements.

(c) If R = Z, then R* = {−1, 1}, i.e. p ∈ Z is irreducible if, and only if, |p| is a prime number in N (and, since Z is a principal ideal domain by Ex. 7.27(b), p ∈ Z is irreducible if, and only if, it is prime).

(d) For each λ ∈ R, X − λ = X¹ − λX⁰ ∈ R[X] is irreducible due to (7.5c). For R = R, X² + 1 is irreducible: Otherwise, there exist λ, μ ∈ R with X² + 1 = (X + λ)(X + μ), yielding the contradiction 0 = ǫ_{−λ}((X + λ)(X + μ)) = ǫ_{−λ}(X² + 1) = λ² + 1. On the other hand, Ex. 7.15(b) shows that, if f ∈ R[X] is irreducible, then deg f ∈ {1, 2}.

(e) Suppose

R := Q + X¹ R[X] = {(f_i)_{i∈N₀} ∈ R[X] : f₀ ∈ Q}.

Clearly, R is a subring of R[X]. Then X = X¹ ∈ R is irreducible, but X is not prime, since X | 2X² = (√2 X)(√2 X), but X ∤ √2 X, since √2 ∉ Q. Then, as a consequence of Prop. 7.31(b), R can not be a principal ideal domain. Indeed, the ideal a := (X) + (√2 X) is not principal in R: Clearly, X and √2 X are not common multiples of any noninvertible f ∈ R.

Lemma 7.33. Let R be a principal ideal domain. Let I ≠ ∅ be an index set totally ordered by ≤, let (a_i)_{i∈I} be an increasing family of ideals in R. According to Prop. 7.24(f), we can form the ideal a := ∪_{i∈I} a_i. Then there exists i₀ ∈ I such that a = a_{i₀}.

Proof. As R is principal, there exists a ∈ R such that (a) = a. Since a ∈ a, there exists i₀ ∈ I such that a ∈ a_{i₀}, implying (a) ⊆ a_{i₀} ⊆ a = (a) and establishing the case. □

Theorem 7.34 (Existence of Prime Factorization). Let R be a principal ideal domain. If 0 ≠ a ∈ R \ R*, then there exist prime elements p₁, ..., p_n ∈ R, n ∈ N, such that

a = p₁ ⋯ p_n. (7.23)

Proof. Let S be the set of all ideals (a) in R that are generated by elements 0 ≠ a ∈ R \ R* that do not have a prime factorization as in (7.23). We need to prove S = ∅. Seeking a contradiction, assume S ≠ ∅ and note that set inclusion ⊆ provides a partial order on S. If C ≠ ∅ is a totally ordered subset of S, then, by Prop. 7.24(f), a := ∪_{c∈C} c is an ideal in R and, by Lem. 7.33, there exists c ∈ C such that a = c, showing a ∈ S to provide an upper bound for C. Thus, Zorn's lemma [Phi19, Th. 5.22] applies, yielding a maximal element m ∈ S (i.e. maximal in S with respect to ⊆). Then there exists a ∈ R \ R* such that m = (a) and a does not have a prime factorization. In particular, a is not prime, i.e. a must be reducible by Prop. 7.31(b). Thus, there exist a₁, a₂ ∈ R \ (R* ∪ {0}) such that a = a₁a₂. Then (a) ⊊ (a₁) and (a) ⊊ (a₂): Indeed, if a₁ = ra = ra₁a₂ with r ∈ R, then Prop. 7.30(a) yields 1 = ra₂ and a₂ ∈ R* (and analogously for (a) = (a₂)). Due to the maximality of m = (a) in S, we conclude (a₁), (a₂) ∉ S. Thus, a₁, a₂ both must have prime factorizations, yielding the desired contradiction that a = a₁a₂ must have a prime factorization as well. □

Remark 7.35. In particular, we obtain from Th. 7.34 that each k ∈ Z \ {−1, 0, 1} and each f ∈ F[X] with F being a field and deg f ≥ 1 has a prime factorization. However, for R = Z and R = F[X], we can prove the existence of a prime factorization for each 0 ≠ a ∈ R \ R* in a simpler way and without making use of Zorn's lemma: Let deg : R \ {0} → N₀ be the degree map as in Ex. 7.21(b),(c). We conduct the proof via induction on deg(a) ∈ N: If a itself is prime, then there is nothing to prove, and this, in particular, takes care of the base case of the induction. If a is not prime, then it is reducible, i.e. a = a₁a₂ with a₁, a₂ ∈ R \ (R* ∪ {0}). In particular, 1 ≤ deg a₁, deg a₂ < deg a. Thus, by induction, a₁, a₂ both have prime factorizations, implying a to have a prime factorization as well.
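For R = Z and R = Q[X], the factorizations of Th. 7.34 can be computed symbolically; a minimal sketch (Python with SymPy):

    import sympy as sp

    # Prime factorization in Z:
    assert sp.factorint(360) == {2: 3, 3: 2, 5: 1}    # 360 = 2^3 * 3^2 * 5

    # Factorization into irreducibles (= primes, by Prop. 7.31(b)) in Q[X]:
    X = sp.symbols('X')
    print(sp.factor(X**4 - 1))             # (X - 1)*(X + 1)*(X**2 + 1)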

Theorem 7.36 (Uniqueness of Prime Factorization). Let R be an integral domain and x ∈ R. Suppose

x = p₁ ⋯ p_n = a r₁ ⋯ r_m, (7.24)

where a ∈ R*, p₁, ..., p_n ∈ R, n ∈ N, are prime and r₁, ..., r_m ∈ R, m ∈ N, are irreducible. Then m = n and there exists a permutation π ∈ S_n such that, for each i ∈ {1, ..., n}, r_i and p_{π(i)} are associated (cf. Def. 7.29(a)).

Proof. The proof is conducted via induction on n ∈ N. Since p₁ is prime and p₁ | r₁ ⋯ r_m, there exists i ∈ {1, ..., m} such that p₁ | r_i. Since r_i is irreducible, we must have r_i = a₁ p₁ with a₁ ∈ R*. For n = 1 (the induction base case), this yields m = 1 and p₁, r₁ associated, as desired. For n > 1, we have

p₂ ⋯ p_n = a a₁ r₁ ⋯ r_{i−1} r_{i+1} ⋯ r_m

and we employ the induction hypothesis to obtain n − 1 = m − 1 (i.e. n = m) and a bijective map σ : {1, ..., n} \ {i} → {2, ..., n} such that, for each j ∈ {1, ..., n} \ {i}, r_j and p_{σ(j)} are associated. Then

π : {1, ..., n} → {1, ..., n}, π(k) := 1 for k = i, π(k) := σ(k) for k ≠ i,

defines a permutation π ∈ S_n such that, for each j ∈ {1, ..., n}, r_j and p_{π(j)} are associated. □

Corollary 7.37. If R is a principal ideal domain (e.g. R = Z or R = F[X], where F is a field), then each 0 ≠ a ∈ R \ R* admits a factorization into prime elements, which is unique up to the order of the primes and up to association (rings R with this property are called factorial; factorial integral domains are called unique factorization domains).

Proof. One merely combines Th. 7.34 with Th. 7.36. □

Theorem 7.38. Let F be a field. Then the following statements are equivalent:

(i) F is algebraically closed.

(ii) For each f ∈ F[X] with deg f = n ∈ N, there exist c ∈ F and λ₁, ..., λ_n ∈ F (not necessarily distinct), such that

f = c Π_{i=1}^{n} (X¹ − λ_i). (7.25)

(iii) f ∈ F[X] is irreducible if, and only if, deg f = 1.

Proof. "(i) ⇔ (ii)": If F is algebraically closed, then (ii) is given by Cor. 7.14. That (ii) implies (i) is immediate.

"(i) ⇔ (iii)": We already noted in Ex. 7.32(d) that each X − λ with λ ∈ F is irreducible, i.e. each sX − λ with s ∈ F \ {0} is irreducible as well. If F is algebraically closed and deg f > 1, then (7.9b) shows f to be reducible. Conversely, if (iii) holds, then an induction over n = deg f ∈ N shows each f ∈ F[X] with deg f ∈ N to have a zero: Indeed, f = aX + b with a, b ∈ F, a ≠ 0, has −ba^{−1} as a zero, and, if deg f > 1, then f is reducible, i.e. there exist g, h ∈ F[X] with 1 ≤ deg g, deg h < deg f such that f = gh. By induction, g and h must have a zero, i.e. f must have a zero as well. □

In [Phi19, Ex. 4.39], we saw how to obtain the field of rational numbers Q from the ring of integers Z. The same construction actually still works if Z is replaced by an arbitrary integral domain R, resulting in the so-called field of fractions of R (in the following section, we will use the field of fractions of F[X] in the definition of the characteristic polynomial of A ∈ L(V,V), where V is a vector space over F). This gives rise to the following Th. 7.39.

Theorem 7.39. Let R be an integral domain. One defines the field of fractions⁶ F of R as the quotient set F := (R × (R \ {0}))/∼ with respect to the following equivalence relation on R × (R \ {0}), where the relation ∼ on R × (R \ {0}) is defined by

(a, b) ∼ (c, d) :⇔ ad = bc, (7.26)

where, as usual, we will write

a/b := [(a, b)] (7.27)

for the equivalence class of (a, b) with respect to ∼. Addition on F is defined by

+ : F × F → F, (a/b, c/d) ↦ a/b + c/d := (ad + bc)/(bd). (7.28)

Multiplication on F is defined by

· : F × F → F, (a/b, c/d) ↦ (a/b) · (c/d) := (ac)/(bd). (7.29)

Then (F, +, ·) does, indeed, form a field, where 0/1 and 1/1 are the neutral elements with respect to addition and multiplication, respectively, (−a)/b is the additive inverse to a/b, whereas b/a is the multiplicative inverse to a/b with a ≠ 0. The map

ι : R → F, ι(k) := k/1, (7.30)

is a unital ring monomorphism and it is customary to identify R with ι(R), just writing k instead of k/1.

⁶Caveat: The field of fractions of R should not be confused with the quotient field or factor field of R with respect to a maximal ideal m in R (cf. Th. C.8(c)) – this is a different construction, leading to different objects (e.g., for R = Z, to the finite fields Z_p, where p ∈ N is prime, cf. Ex. C.11 and [Phi19, Ex. 4.38]).

Proof. Exercise.  Example 7.40. (a) Q is the field of fractions of Z.

(b) If R is an integral domain, then we know from Prop. B.11 that R[X] is an integral domain as well. The field of fractions of R[X] is denoted by R(X) and is called the field of rational fractions over R.

Definition and Remark 7.41. Let R be an integral domain. We show that the field of fractions of R (as defined in Th. 7.39) is the smallest field containing R: Let L be some arbitrary field extension of R. Define

S := {F ⊆ L : R ⊆ F and F is subfield of L} (7.31)

and

K := ∩_{F∈S} F. (7.32)

According to [Phi19, Ex. 4.36(d)], K is a field, namely the smallest subfield of L containing R. If F(R) denotes the field of fractions of R, then

φ : F(R) → K, φ(a/b) := ab^{−1}, (7.33)

constitutes an isomorphism: Indeed, φ is well-defined, since

a/b = c/d ⇒ ad = bc ⇒ φ(a/b) = ab^{−1} = cd^{−1} = φ(c/d),

and since the definition of S guarantees ab^{−1} ∈ F for each a, b ∈ R with b ≠ 0 and each F ∈ S; φ is a homomorphism, since, for each a, b, c, d ∈ R with b, d ≠ 0,

φ(a/b) + φ(c/d) = ab^{−1} + cd^{−1} = (ad + bc)b^{−1}d^{−1} = φ((ad + bc)/(bd)) = φ(a/b + c/d),
φ(a/b) · φ(c/d) = ab^{−1}cd^{−1} = ac(bd)^{−1} = φ((ac)/(bd)) = φ((a/b) · (c/d));

φ is injective, since a, b ∈ R \ {0} implies φ(a/b) = ab^{−1} ≠ 0; φ is surjective, since Im φ ⊆ K is itself a subfield of L that contains R, implying K ⊆ Im φ and Im φ = K.
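Python's fractions.Fraction implements the field of fractions of Z, i.e. Q, with exactly the arithmetic of (7.26), (7.28), (7.29); a minimal sketch:

    from fractions import Fraction

    a, b = Fraction(1, 2), Fraction(3, 4)
    assert a + b == Fraction(1 * 4 + 2 * 3, 2 * 4)   # (7.28): a/b + c/d = (ad+bc)/(bd)
    assert a * b == Fraction(1 * 3, 2 * 4)           # (7.29): (a/b)(c/d) = ac/(bd)
    assert Fraction(2, 4) == Fraction(1, 2)          # (7.26): (a,b) ~ (c,d) iff ad = bc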

8 Characteristic Polynomial, Minimal Polynomial

We will now apply the theory of polynomials to further study linear endomorphisms on finite-dimensional vector spaces. The starting point is Th. 6.9(a), which states that, if $V$ is a vector space over the field $F$, then the eigenvalues of $A \in \mathcal{L}(V,V)$ are precisely the zeros of the polynomial function
\[
p_A : F \longrightarrow F, \quad p_A(t) := \det(t \operatorname{Id} - A).
\]
In order to make the results of the previous section available, instead of associating a polynomial function with $A \in \mathcal{L}(V,V)$, we will associate an actual polynomial (this also avoids issues related to the fact that, in the case of finite fields, different polynomials can give rise to the same polynomial function according to Th. 7.17(b)). The idea is to replace $t \mapsto \det(t \operatorname{Id} - A)$ with $\det(X \operatorname{Id}_n - M_A)$, where $M_A$ is the matrix of $A$ with respect to an ordered basis of $V$. If $V$ is a vector space over the field $F$, then the entries of the matrix $X \operatorname{Id}_n - M_A$ are elements of the ring $F[X]$. However, we defined determinants only for matrices with entries in fields. Thus, to make the following definition consistent with our definition of determinants, we consider the elements of $X \operatorname{Id}_n - M_A$ to be elements of $F(X)$, the field of rational fractions over $F$ (cf. Ex. 7.40(b)):

Definition 8.1. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V,V)$. Moreover, let $B$ be an ordered basis of $V$ and let $M_A \in \mathcal{M}(n,F)$ be the matrix of $A$ with respect to $B$. Since $F(X)$ is a field extension of $F$, we may consider $M_A$ as an element of $\mathcal{M}(n, F(X))$. We define
\[
\chi_A := \det(X \operatorname{Id}_n - M_A) \in F[X]
\]
to be the characteristic polynomial of $A$.

Proposition 8.2. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V,V)$.

(a) The characteristic polynomial $\chi_A$ is well-defined by Def. 8.1, i.e. if $B_1$ and $B_2$ are ordered bases of $V$ and $M_1, M_2$ are the matrices of $A$ with respect to $B_1, B_2$, respectively, then
\[
\chi_1 := \det(X \operatorname{Id}_n - M_1) = \chi_2 := \det(X \operatorname{Id}_n - M_2).
\]

(b) The spectrum $\sigma(A)$ is precisely the set of zeros of $\chi_A$.

Proof. (a): Let $T \in \operatorname{GL}_n(F)$ be such that $M_2 = T^{-1} M_1 T$. Then
\[
\chi_2 = \det(X \operatorname{Id}_n - T^{-1} M_1 T) = \det\big(T^{-1}(X \operatorname{Id}_n - M_1)T\big) = (\det T^{-1})\, \chi_1\, (\det T) = \chi_1,
\]
proving (a).

(b): If $\lambda \in F$, then we have
\[
\lambda \in \sigma(A) \ \overset{\text{Th. 6.9(a)}}{\Longleftrightarrow} \ \epsilon_\lambda(\chi_A) = \det(\lambda \operatorname{Id} - A) = 0,
\]
thereby establishing the case. $\square$
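The basis-independence asserted by Prop. 8.2(a) can be observed numerically. The following sympy sketch (our illustration; the matrices $M_1$ and $T$ are arbitrary examples) conjugates a matrix by a change of basis and compares the two characteristic polynomials:

```python
import sympy as sp

X = sp.symbols('X')

M1 = sp.Matrix([[2, 1], [0, 3]])   # matrix of A with respect to some basis B1
T = sp.Matrix([[1, 1], [1, 2]])    # a transition matrix in GL_2(Q)
M2 = T.inv() * M1 * T              # matrix of A with respect to another basis B2

chi1 = (X * sp.eye(2) - M1).det().expand()
chi2 = (X * sp.eye(2) - M2).det().expand()
print(chi1, chi2, chi1 == chi2)    # X**2 - 5*X + 6 twice: chi_A is well-defined
```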

Remark 8.3. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V,V)$.

(a) If $B$ is an ordered basis of $V$, the matrix $(a_{ji}) \in \mathcal{M}(n,F)$ represents $A$ with respect to $B$, and we let $(c_{ji}) := (X \operatorname{Id}_n - (a_{ji}))$, then
\[
\chi_A = \det\big(X \operatorname{Id}_n - (a_{ji})\big) = \prod_{i=1}^{n} (X - a_{ii}) + \sum_{\pi \in S_n \setminus \{\operatorname{Id}\}} \operatorname{sgn}(\pi) \prod_{i=1}^{n} c_{i\pi(i)} \tag{8.1}
\]
shows $\chi_A$ to be monic (i.e. the coefficient of $X^n$ is 1) with $\deg \chi_A = n$: Indeed, clearly, the degree of the first summand is $n$ and, for each $\pi \in S_n \setminus \{\operatorname{Id}\}$, the degree of the corresponding summand is at most $n - 2$.

(b) Some authors prefer to define the characteristic polynomial of $A$ as the polynomial

\[
\tilde{\chi}_A := \det(A - X \operatorname{Id}) = (-1)^n \chi_A.
\]

While $\tilde{\chi}_A$ still has the property that $\sigma(A)$ is precisely the set of zeros of $\tilde{\chi}_A$, $\tilde{\chi}_A$ is monic only for $n$ even. On the other hand, $\tilde{\chi}_A$ has the advantage that $\epsilon_0(\tilde{\chi}_A) = \det(A)$.

(c) According to Prop. 8.2(b), the task of finding the eigenvalues of $A$ is the same as the task of finding the zeros of the characteristic polynomial $\chi_A$. So one might hope that only particularly simple polynomials can occur as characteristic polynomials. However, this is not the case: Indeed, every monic polynomial of degree $n$ occurs as a characteristic polynomial: Let $a_1, \dots, a_n \in F$, and
\[
f := X^n + \sum_{i=1}^{n} a_i\, X^{n-i}.
\]
We define the companion matrix of $f$ to be

\[
M(f) := \begin{pmatrix}
-a_1 & -a_2 & -a_3 & \dots & -a_{n-1} & -a_n \\
1 & 0 & & & & \\
& 1 & 0 & & & \\
& & \ddots & \ddots & & \\
& & & 1 & 0 & \\
& & & & 1 & 0
\end{pmatrix}
\]
and claim $\chi_A = f$, if $A \in \mathcal{L}(F^n, F^n)$ is the linear map represented by $M(f)$ with respect to the standard basis of $F^n$: Indeed, using Laplace expansion with respect to the first row, we obtain
\[
\chi_A = \det \begin{pmatrix}
X + a_1 & a_2 & a_3 & \dots & a_{n-1} & a_n \\
-1 & X & & & & \\
& -1 & X & & & \\
& & \ddots & \ddots & & \\
& & & -1 & X & \\
& & & & -1 & X
\end{pmatrix}
\]
\begin{align*}
&= (-1)^{n+1} a_n (-1)^{n-1} + (-1)^{n} a_{n-1} (-1)^{n-2}\, X + \dots + (-1)^3 a_2 (-1)^1\, X^{n-2} + (-1)^2 (X + a_1)\, X^{n-1} \\
&= \sum_{i=0}^{n-2} (-1)^{2(n-i)} a_{n-i}\, X^i + (-1)^2 (X + a_1)\, X^{n-1}
= X^n + \sum_{i=0}^{n-1} a_{n-i}\, X^i = X^n + \sum_{i=1}^{n} a_i\, X^{n-i} = f.
\end{align*}

(d) In Ex. 6.5(a), we saw that the considered linear endomorphism $A$ had eigenvalues for $F = \mathbb{C}$, but no eigenvalues for $F = \mathbb{R}$, which we can now relate to the fact that $\chi_A = X^2 + 1$ has no zeros over $\mathbb{R}$, but $\chi_A = (X - i)(X + i)$ with zeros $\pm i$ over $\mathbb{C}$.

(e) Given that eigenvalues are precisely the zeros of the characteristic polynomial, and given that, according to (c), every monic polynomial of degree $n$ can occur as the characteristic polynomial of a matrix, it is not surprising that computing eigenvalues is, in general, a difficult task, even if $F$ is algebraically closed, guaranteeing the eigenvalues' existence. It is a result of Algebra that, for a generic polynomial of degree at least 5, it is not possible to obtain its zeros using so-called radicals (which are, roughly, zeros of polynomials of the form $X^k - \lambda$, $k \in \mathbb{N}$, $\lambda \in F$, see, e.g., [Bos13, Def. 6.1.1] for a precise definition) in finitely many steps (cf., e.g., [Bos13, Cor. 6.1.7]). In practice, one often has to make use of approximative numerical methods (see, e.g., [Phi21, Sec. 7]). Having said that, let us note that the problem of computing eigenvalues is, indeed, typically easier than the general problem of computing zeros of polynomials. This is due to the fact that the difficulty of computing the zeros of a polynomial depends tremendously on the form in which the polynomial is given: It is typically hard if the polynomial is expanded into the form $f = \sum_{i=0}^{n} a_i X^i$, but it is easy (trivial, in fact) if the polynomial is given in a factored form $f = c \prod_{i=1}^{n} (X - \lambda_i)$. If the characteristic polynomial is given implicitly by a matrix, one is, in general, somewhere between the two extremes. In particular, for a large matrix, it usually makes no sense to compute the characteristic polynomial in its expanded form (this is an expensive task in itself and, in the process, one even loses the additional structure given by the matrix). It makes much more sense to use methods tailored to the computation of eigenvalues, and, if available, one should make use of additional structure a matrix might have.
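To illustrate (c) concretely, here is a small sympy sketch (our choice of tool, not part of the text) that builds the companion matrix of a given monic polynomial and confirms that its characteristic polynomial recovers $f$; the helper name `companion` is ours:

```python
import sympy as sp

X = sp.symbols('X')

def companion(coeffs):
    """Companion matrix M(f) of f = X^n + a_1 X^(n-1) + ... + a_n,
    with coeffs = [a_1, ..., a_n] as in Rem. 8.3(c)."""
    n = len(coeffs)
    M = sp.zeros(n, n)
    for j, a in enumerate(coeffs):
        M[0, j] = -a       # first row: -a_1, ..., -a_n
    for i in range(1, n):
        M[i, i - 1] = 1    # ones on the subdiagonal
    return M

# f = X^3 - 2X^2 + 5X - 7, i.e. a_1 = -2, a_2 = 5, a_3 = -7:
M = companion([-2, 5, -7])
chi = (X * sp.eye(3) - M).det().expand()
print(chi)  # X**3 - 2*X**2 + 5*X - 7, i.e. chi_A = f
```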

Theorem 8.4. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V,V)$. Then there exists an ordered basis $B$ of $V$ such that the matrix $M$ of $A$ with respect to $B$ is triangular if, and only if, there exist distinct $\lambda_1, \dots, \lambda_l \in F$, $l \in \mathbb{N}$, and $n_1, \dots, n_l \in \mathbb{N}$ with
\[
\sum_{i=1}^{l} n_i = n \quad \wedge \quad \chi_A = \prod_{i=1}^{l} (X - \lambda_i)^{n_i}. \tag{8.2}
\]
In this case, $\sigma(A) = \{\lambda_1, \dots, \lambda_l\}$,
\[
\forall_{i \in \{1,\dots,l\}} \quad n_i = m_a(\lambda_i) \tag{8.3}
\]
(i.e. the algebraic multiplicity of $\lambda_i$ is precisely the multiplicity of $\lambda_i$ as a zero of $\chi_A$), and one can choose $B$ such that $M$ has the upper triangular form
\[
M = \begin{pmatrix}
\lambda_1 & & & & \ast \\
& \ddots & & & \\
& & \lambda_1 & & \\
& & & \ddots & \\
& & & & \lambda_l \\
& 0 & & & & \ddots \\
& & & & & & \lambda_l
\end{pmatrix}, \tag{8.4}
\]
where each $\lambda_i$ occurs precisely $n_i$ times on the diagonal. Moreover, one then has
\[
\det A = \det M = \prod_{\lambda \in \sigma(A)} \lambda^{m_a(\lambda)} = \prod_{i=1}^{l} \lambda_i^{n_i}. \tag{8.5}
\]

Proof. If there exists a basis $B$ of $V$ such that the matrix $M = (m_{ji})$ of $A$ with respect to $B$ is triangular, then
\[
\chi_A \overset{\text{Def. 8.1}}{=} \det(X \operatorname{Id}_n - M) \overset{\text{Cor. 4.26}}{=} \prod_{i=1}^{n} (X - m_{ii}).
\]
Combining factors, where the $m_{ii}$ are equal, yields (8.2). For the converse, we assume (8.2) and prove the existence of the basis $B$ such that $M$ has the form of (8.4) via

induction on $n$. For $n = 1$, there is nothing to prove. Thus, let $n > 1$. Then $\lambda_1$ must be an eigenvalue of $A$ with some eigenvector $0 \neq v_1 \in V$. Then, if $B_1 := (v_1, \dots, v_n)$ is an ordered basis of $V$, the matrix $M_1$ of $A$ with respect to $B_1$ has the block form
\[
M_1 = \begin{pmatrix} \lambda_1 & \ast \\ 0 & N \end{pmatrix}, \quad N \in \mathcal{M}(n-1, F).
\]
According to Th. 4.25, we obtain
\[
\prod_{i=1}^{l} (X - \lambda_i)^{n_i} = \chi_A = (X - \lambda_1)\, \chi_N \quad \Rightarrow \quad \chi_N = (X - \lambda_1)^{n_1 - 1} \prod_{i=2}^{l} (X - \lambda_i)^{n_i}.
\]
Let $U := \langle\{v_1\}\rangle$ and $W := V/\langle\{v_1\}\rangle$. Then $\dim W = n - 1$ and, by [Phi19, Cor. 6.13(a)], $B_W := (v_2 + U, \dots, v_n + U)$ is an ordered basis of $W$. Let $A_1 \in \mathcal{L}(W,W)$ be such that, with respect to $B_W$, $A_1$ has the matrix $N$. Then, by induction hypothesis, there exists an ordered basis $C_W = (w_2 + U, \dots, w_n + U)$ of $W$ ($w_2, \dots, w_n \in V$), such that, with respect to $C_W$, the matrix $N_1 \in \mathcal{M}(n-1, F)$ of $A_1$ has the form (8.4), except that $\lambda_1$ occurs precisely $n_1 - 1$ times on the diagonal. That $N_1$ is the matrix of $A_1$ means, for $N_1 = (\nu_{ji})_{(j,i) \in \{2,\dots,n\}^2}$, that
\[
\forall_{i \in \{2,\dots,n\}} \quad A_1(w_i + U) = \sum_{j=2}^{n} \nu_{ji} (w_j + U).
\]
Then, by [Phi19, Cor. 6.13(b)], $B := (v_1, w_2, \dots, w_n)$ is an ordered basis of $V$ and, with respect to $B$, the matrix $M$ of $A$ has the form (8.4): According to [Phi19, Th. 7.14], there exists an $(n-1) \times (n-1)$ transition matrix $T_1 = (t_{ji})_{(j,i) \in \{2,\dots,n\}^2}$ such that $N_1 = T_1^{-1} N T_1$ and
\[
\forall_{i \in \{2,\dots,n\}} \quad w_i = \sum_{j=2}^{n} t_{ji} v_j, \qquad w_i + U = \sum_{j=2}^{n} t_{ji} (v_j + U),
\]
implying
\[
M = \begin{pmatrix} 1 & 0 \\ 0 & T_1^{-1} \end{pmatrix} \begin{pmatrix} \lambda_1 & \ast \\ 0 & N \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & T_1 \end{pmatrix} = \begin{pmatrix} \lambda_1 & \ast \\ 0 & N_1 \end{pmatrix}.
\]
It remains to verify (8.3). Letting $w_1 := v_1$, we have $B = (w_1, \dots, w_n)$. For each $k \in \{1, \dots, n_1\}$ and the standard column basis vector $e_k$, we obtain
\[
(M - \lambda_1 \operatorname{Id}_n)\, e_k = \big((m_{ji}) - \lambda_1 (\delta_{ji})\big)\, e_k = (m_{1k}, \dots, m_{k-1,k}, 0, \dots, 0)^{\mathrm{t}} \quad \Rightarrow \quad (A - \lambda_1 \operatorname{Id})\, w_k = \sum_{\alpha=1}^{k-1} m_{\alpha k}\, w_\alpha,
\]
showing $(A - \lambda_1 \operatorname{Id})\, w_k \in \langle\{w_1, \dots, w_{k-1}\}\rangle$ and $w_1, \dots, w_{n_1} \in \ker(A - \lambda_1 \operatorname{Id})^{n_1}$. On the other hand, for each $k \in \{n_1 + 1, \dots, n\}$, we obtain
\[
(M - \lambda_1 \operatorname{Id}_n)\, e_k = (m_{1k}, \dots, m_{k-1,k}, \lambda - \lambda_1, 0, \dots, 0)^{\mathrm{t}} \quad \Rightarrow \quad (A - \lambda_1 \operatorname{Id})\, w_k = (\lambda - \lambda_1)\, w_k + \sum_{\alpha=1}^{k-1} m_{\alpha k}\, w_\alpha,
\]
where $\lambda \in \{\lambda_2, \dots, \lambda_l\}$, showing $w_k \notin \ker(A - \lambda_1 \operatorname{Id})^{N}$ for each $N \in \mathbb{N}$. Thus,
\[
n_1 = \dim\ker(A - \lambda_1 \operatorname{Id})^{n_1} = \dim\ker(A - \lambda_1 \operatorname{Id})^{r(\lambda_1)} = m_a(\lambda_1),
\]
where $r(\lambda_1)$ is as defined in Rem. 6.11. Now note that $\lambda_1$ was chosen arbitrarily in the above argument. The same argument shows the existence of a basis $B'$ such that $\lambda_i$, $i \in \{1, \dots, l\}$, appears in the upper left block of $M$. In particular, we obtain $n_i = m_a(\lambda_i)$ for each $i \in \{1, \dots, l\}$.

Finally, if (8.4) holds, then (8.5) is immediate from Cor. 4.26. $\square$
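As a quick illustration of the splitting condition (8.2), the following sympy snippet (ours, for illustration) factors $\chi_A = X^2 + 1$ of Rem. 8.3(d) over $\mathbb{Q}$, where it is irreducible (so no triangular form exists over $\mathbb{Q}$ or $\mathbb{R}$), and over $\mathbb{Q}(i)$, where it splits:

```python
import sympy as sp

X = sp.symbols('X')
M = sp.Matrix([[0, -1], [1, 0]])           # rotation by 90 degrees, cf. Rem. 8.3(d)
chi = (X * sp.eye(2) - M).det().expand()   # X**2 + 1

print(sp.factor(chi))                      # X**2 + 1: irreducible over Q
print(sp.factor(chi, extension=sp.I))      # (X - I)*(X + I): splits over Q(i)
```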

Corollary 8.5. Let $V$ be a vector space over the algebraically closed field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V,V)$. Then there exists an ordered basis $B$ of $V$ such that the matrix $M$ of $A$ with respect to $B$ is triangular. Moreover, (8.5) then holds, i.e. $\det A$ is the product of the eigenvalues of $A$, where each eigenvalue is multiplied according to its algebraic multiplicity.

Proof. This is immediate from combining Th. 8.4 with Th. 7.38(ii). 

Theorem 8.6. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V,V)$. There exists a unique monic polynomial $0 \neq \mu_A \in F[X]$ (called the minimal polynomial of $A$), satisfying the following two conditions:

(i) $\epsilon_A(\mu_A) = 0$, where $\epsilon_A : F[X] \longrightarrow \mathcal{L}(V,V)$ is the substitution homomorphism according to Def. and Rem. 7.10 (noting $\mathcal{L}(V,V)$ to be a ring extension of $F$ via the unital ring monomorphism $\iota : F \longrightarrow \mathcal{L}(V,V)$, $\iota(a) := a \operatorname{Id}$).

(ii) For each $f \in F[X]$ such that $\epsilon_A(f) = 0$, $\mu_A$ is a divisor of $f$, i.e. $\mu_A | f$.

Proof. Let $\mathfrak{a} := \{f \in F[X] : \epsilon_A(f) = 0\}$. Clearly, $\mathfrak{a}$ is an ideal in $F[X]$, and, thus, as $F[X]$ is a principal ideal domain, there exists $g \in F[X]$ such that $\mathfrak{a} = (g)$. Clearly, $g$ satisfies both (i) and (ii). We need to show $g \neq 0$, i.e. $\mathfrak{a} \neq (0) = \{0\}$. To this end, note that, since $\dim \mathcal{L}(V,V) = n^2$, the $n^2 + 1$ maps $\operatorname{Id}, A, A^2, \dots, A^{n^2} \in \mathcal{L}(V,V)$ must be linearly dependent, i.e. there exist $c_0, \dots, c_{n^2} \in F$, not all 0, such that
\[
0 = \sum_{i=0}^{n^2} c_i\, A^i,
\]
showing $0 \neq f := \sum_{i=0}^{n^2} c_i\, X^i \in \mathfrak{a}$. If $h \in F[X]$ also satisfies (i) and (ii), then $h | g$ and $g | h$, implying $g, h$ to be associated. In consequence, $\mu_A$ is the unique monic such element of $F[X]$. $\square$

Remark 8.7. Let $F$ be a field.

(a) We extend Def. 6.15 to the characteristic polynomial and to the minimal polynomial: Let $n \in \mathbb{N}$. Consider the vector space $V := F^n$ over the field $F$. If $M \in \mathcal{M}(n,F)$, then $\chi_M$ and $\mu_M$ denote the characteristic polynomial and the minimal polynomial of the linear map $A_M$ that $M$ represents with respect to the standard basis of $F^n$.

(b) Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$. As $\mathcal{M}(n,F)$ is a ring extension of $F$, we can plug $M \in \mathcal{M}(n,F)$ into elements of $F[X]$. Moreover, if $f \in F[X]$, $A \in \mathcal{L}(V,V)$ and $M \in \mathcal{M}(n,F)$ represents $A$ with respect to a basis $B$ of $V$, then, due to [Phi19, Th. 7.10(a)], $\epsilon_M(f)$ represents $\epsilon_A(f)$ with respect to $B$.

Example 8.8. Let $F$ be a field. Consider
\[
M := \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \in \mathcal{M}(3, F).
\]
We claim that the minimal polynomial is $\mu_M = X^2$: Indeed, $M^2 = 0$ implies $\epsilon_M(X^2) = 0$, and, if $f = \sum_{i=0}^{n} f_i X^i \in F[X]$, $n \in \mathbb{N}$, then $\epsilon_M(f) = f_0 \operatorname{Id}_3 + f_1 M$. Thus, if $\epsilon_M(f) = 0$, then $f_0 = f_1 = 0$, implying $X^2 | f$ and showing $\mu_M = X^2$.

Theorem 8.9 (Cayley-Hamilton). Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V,V)$. If $\chi_A$ and $\mu_A$ denote the characteristic and the minimal polynomial of $A$, respectively, then the following statements hold true:

(a) $\chi_A(A) := \epsilon_A(\chi_A) = 0$.

(b) $\mu_A | \chi_A$ and, in particular, $\deg \mu_A \leq \deg \chi_A = n$.

(c) $\lambda \in \sigma(A)$ if, and only if, $\mu_A(\lambda) := \epsilon_\lambda(\mu_A) = 0$, i.e. the eigenvalues of $A$ are precisely the zeros of the minimal polynomial $\mu_A$.

(d) If #σ(A)= n (i.e. if A has n distinct eigenvalues), then χA = µA.

Proof. (a): Let $B$ be an ordered basis of $V$ and let $(m_{ji}) := M_A$ be the matrix of $A$ with respect to $B$. Moreover, let $N$ be the adjugate matrix of $X \operatorname{Id}_n - M_A$, i.e., up to factors of $\pm 1$, $N$ contains the determinants of the $(n-1) \times (n-1)$ submatrices of $X \operatorname{Id}_n - M_A$. According to Th. 4.29(a), we then have
\[
\chi_A \operatorname{Id}_n = \det(X \operatorname{Id}_n - M_A) \operatorname{Id}_n = N\, (X \operatorname{Id}_n - M_A). \tag{8.6}
\]
Since $X \operatorname{Id}_n - M_A$ contains only entries of degree at most 1 ($\deg(X - m_{ii}) = 1$, all other entries having degree 0 or degree $-\infty$), each entry $n_{ji}$ of $N$ has degree at most $n - 1$, i.e.
\[
\forall_{(j,i) \in \{1,\dots,n\}^2} \ \exists_{b_{0,j,i}, \dots, b_{n-1,j,i} \in F} \quad n_{ji} = \sum_{k=0}^{n-1} b_{k,j,i}\, X^k.
\]
If, for each $k \in \{0, \dots, n-1\}$, we let $B_k := (b_{k,j,i}) \in \mathcal{M}(n,F)$, then $N = \sum_{k=0}^{n-1} B_k X^k$. Plugging this into (8.6) yields
\begin{align*}
\chi_A \operatorname{Id}_n &= (B_0 + B_1 X + \dots + B_{n-1} X^{n-1})(X \operatorname{Id}_n - M_A) \\
&= -B_0 M_A + (B_0 - B_1 M_A)\, X + (B_1 - B_2 M_A)\, X^2 + \dots + (B_{n-2} - B_{n-1} M_A)\, X^{n-1} + B_{n-1}\, X^n. \tag{8.7}
\end{align*}
Writing $\chi_A = X^n + \sum_{i=0}^{n-1} a_i X^i$ with $a_0, \dots, a_{n-1} \in F$, the coefficients in front of each $X^i$ in (8.7) must agree: Indeed, in each entry of the respective matrix, we have an element of $F[X]$ and, in each entry, the coefficients of $X^i$ must agree (due to the linear independence of the $X^i$) – hence, the matrix coefficients of $X^i$ in (8.7) must agree as well. This yields
\begin{align*}
a_0 \operatorname{Id}_n &= -B_0 M_A, \\
a_1 \operatorname{Id}_n &= B_0 - B_1 M_A, \\
&\ \ \vdots \\
a_{n-1} \operatorname{Id}_n &= B_{n-2} - B_{n-1} M_A, \\
\operatorname{Id}_n &= B_{n-1}.
\end{align*}

Thus, $\epsilon_{M_A}(\chi_A)$ turns out to be the telescoping sum
\begin{align*}
\epsilon_{M_A}(\chi_A) &= (M_A)^n + \sum_{i=0}^{n-1} a_i (M_A)^i = \sum_{i=0}^{n-1} a_i \operatorname{Id}_n (M_A)^i + \operatorname{Id}_n (M_A)^n \\
&= -B_0 M_A + \sum_{i=1}^{n-1} (B_{i-1} - B_i M_A)(M_A)^i + B_{n-1} (M_A)^n = 0.
\end{align*}

As $\phi : \mathcal{L}(V,V) \longrightarrow \mathcal{M}(n,F)$, $\phi(A) := M_A$, is a ring isomorphism by [Phi19, Th. 7.10(a)], we also obtain $\epsilon_A(\chi_A) = \phi^{-1}(\epsilon_{M_A}(\chi_A)) = 0$, thereby proving (a).

(b) is an immediate consequence of (a) in combination with Th. 8.6(ii).

(c): Suppose $\lambda \in \sigma(A)$ and let $0 \neq v \in V$ be a corresponding eigenvector. Also let $m_0, \dots, m_l \in F$ be such that $\mu_A = \sum_{i=0}^{l} m_i X^i$, $l \in \mathbb{N}$. Then we compute
\[
0 = \epsilon_A(\mu_A)\, v = \Big(\sum_{i=0}^{l} m_i A^i\Big) v = \sum_{i=0}^{l} m_i (A^i v) = \sum_{i=0}^{l} m_i (\lambda^i v) = \Big(\sum_{i=0}^{l} m_i \lambda^i\Big) v = \epsilon_\lambda(\mu_A)\, v,
\]
showing $\epsilon_\lambda(\mu_A) = 0$. Conversely, if $\lambda \in F$ is such that $\epsilon_\lambda(\mu_A) = 0$, then (b) implies $\epsilon_\lambda(\chi_A) = 0$, i.e. $\lambda \in \sigma(A)$ by Prop. 8.2(b).

(d): If $\#\sigma(A) = n$, then $\mu_A$ has $n$ distinct zeros by (c), implying $\deg \mu_A = n$. Since $\mu_A$ is monic and $\mu_A | \chi_A$ by (b), we have $\mu_A = \chi_A$ as claimed. $\square$

Example 8.10. Let $F$ be a field. If $M := \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \in \mathcal{M}(2, F)$, then $\chi_M = X^2$. Since $\mu_M | \chi_M$ and $\epsilon_M(X) = M \neq 0$, we must have $\mu_M = \chi_M$. On the other hand, if $N := \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \in \mathcal{M}(2, F)$, then $\chi_N = X^2$ and $\mu_N = X$. Since $N$ is diagonalizable, whereas $M$ is not diagonalizable (cf. Ex. 6.5(d) and Ex. 6.14), but $\chi_M = \chi_N$, we can, in general, not decide diagonalizability merely by looking at the characteristic polynomial. However, we will see in Th. 8.14 below that the minimal polynomial does allow one to decide diagonalizability.

Caveat 8.11. One has to use care when substituting matrices and endomorphisms into polynomials: For example, one must be aware that in the expression $X \operatorname{Id}_n - M$, the polynomial $X$ is a scalar. Thus, when substituting a matrix $B \in \mathcal{M}(n,F)$ for $X$, one must not use matrix multiplication between $B$ and $\operatorname{Id}_n$: For example, for $n = 2$, $B := \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, and $M := \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$, we obtain $B^2 = B$ and
\[
\epsilon_B\big(\det(X \operatorname{Id}_n - M)\big) = \epsilon_B(X^2) = B^2 = B \neq 0 = \det B = \det(B - M).
\]

—
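The following sympy sketch (ours, not from the text) checks Cayley-Hamilton for the matrix of Example 8.8 and computes $\mu_M$ by the idea behind the proof of Th. 8.6: stack the vectorized powers $\operatorname{Id}, M, M^2, \dots$ and look for the first monic linear dependence. The helper names `eval_poly` and `minpoly` are ours:

```python
import sympy as sp

X = sp.symbols('X')
M = sp.Matrix([[0, 0, 1], [0, 0, 0], [0, 0, 0]])  # the matrix of Example 8.8

def eval_poly(p, M):
    """Substitute the matrix M for X in p, reading the constant term c as
    c*Id (cf. Rem. 8.7(b))."""
    n = M.shape[0]
    coeffs = sp.Poly(p, X).all_coeffs()[::-1]  # a_0, a_1, ..., a_deg
    return sum((c * M**i for i, c in enumerate(coeffs)), sp.zeros(n, n))

chi = (X * sp.eye(3) - M).det().expand()   # chi_M = X**3
print(eval_poly(chi, M))                   # zero matrix: Th. 8.9(a)

def minpoly(M):
    """mu_M via the first monic linear dependence among Id, M, M**2, ...,
    mirroring the dependence argument in the proof of Th. 8.6."""
    n = M.shape[0]
    powers = [sp.eye(n)]
    for d in range(1, n * n + 2):
        powers.append(powers[-1] * M)
        A = sp.Matrix([[P[k] for P in powers[:-1]] for k in range(n * n)])
        b = sp.Matrix([powers[-1][k] for k in range(n * n)])
        try:
            sol, _ = A.gauss_jordan_solve(b)
        except ValueError:
            continue  # M**d independent of lower powers; raise the degree
        sol = sol.subs({t: 0 for t in sol.free_symbols})  # particular solution
        return sp.expand(X**d - sum(sol[k] * X**k for k in range(d)))

print(minpoly(M))  # X**2, as claimed in Example 8.8
```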

The following result further clarifies the relation between $\chi_A$ and $\mu_A$:

Proposition 8.12. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V,V)$.

(a) One has $\chi_A | (\mu_A)^n$ and, in particular, each irreducible factor of $\chi_A$ must be an irreducible factor of $\mu_A$.

(b) There exists an ordered basis $B$ of $V$ such that the matrix $M$ of $A$ with respect to $B$ is triangular if, and only if, there exist $\lambda_1, \dots, \lambda_l \in F$, $l \in \mathbb{N}$, and $n_1, \dots, n_l \in \mathbb{N}$ with
\[
\mu_A = \prod_{i=1}^{l} (X - \lambda_i)^{n_i}.
\]

Proof. (a): Let $M \in \mathcal{M}(n,F)$ be a matrix representing $A$ with respect to some ordered basis of $V$. Let $G$ be an algebraically closed field with $F \subseteq G$ (cf. Def. 7.11). We can consider $M$ as an element of $\mathcal{M}(n, G)$ and, then, $\sigma(M)$ is precisely the set of zeros of both $\chi_M = \chi_A$ and $\mu_A$ in $G$. As $G$ is algebraically closed, for each $\lambda \in \sigma(M)$, there exist $m_\lambda, n_\lambda \in \mathbb{N}$ such that $m_\lambda \leq n_\lambda \leq n$ and
\[
\mu_A = \prod_{\lambda \in \sigma(M)} (X - \lambda)^{m_\lambda}, \qquad \chi_A = \prod_{\lambda \in \sigma(M)} (X - \lambda)^{n_\lambda}.
\]
Letting $q := \prod_{\lambda \in \sigma(M)} (X - \lambda)^{n m_\lambda - n_\lambda}$, we have $q \in G[X]$ as well as $q = (\mu_A)^n (\chi_A)^{-1}$, i.e. $q \chi_A = (\mu_A)^n$, proving $\chi_A | (\mu_A)^n$ (in both $G[X]$ and $F[X]$, since $q = (\mu_A)^n (\chi_A)^{-1} \in F(X) \cap G[X] = F[X]$).

(b) follows by combining (a) with Th. 8.4. $\square$

Theorem 8.13. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V,V)$. Suppose the minimal polynomial $\mu_A$ can be written in the form $\mu_A = g_1 \cdots g_l$, $l \in \mathbb{N}$, where $g_1, \dots, g_l \in F[X]$ are such that, whenever $i \neq j$, then 1 is a greatest common divisor of $g_i$ and $g_j$. Then
\[
V = \bigoplus_{i=1}^{l} \ker g_i(A) = \bigoplus_{i=1}^{l} \ker \epsilon_A(g_i).
\]

Proof. Define
\[
\forall_{i \in \{1,\dots,l\}} \quad h_i := \prod_{k=1,\, k \neq i}^{l} g_k.
\]
Then 1 is a greatest common divisor of $h_1, \dots, h_l$: Indeed, as 1 is a greatest common divisor of $g_i$ and $g_j$ for $i \neq j$, the sets of prime factors of the $g_i$ must all be disjoint. Thus, if $f \in F[X]$ is a divisor of $h_i$, $i \in \{1, \dots, l\}$, then $f$ does not share a prime factor with $g_i$. If $f | h_i$ holds for each $i \in \{1, \dots, l\}$, then $f$ does not share a prime factor

with any $g_i$, implying $f \in F \setminus \{0\}$, i.e. 1 is a greatest common divisor of $h_1, \dots, h_l$. In consequence, (7.22) implies
\[
\exists_{f_1, \dots, f_l \in F[X]} \quad 1 = \sum_{i=1}^{l} f_i h_i
\]
and
\[
\forall_{v \in V} \quad v = \operatorname{Id} v = \epsilon_A(1)\, v = \sum_{i=1}^{l} \epsilon_A(f_i)\, \epsilon_A(h_i)\, v. \tag{8.8}
\]
We verify that, for each $i \in \{1, \dots, l\}$, $\epsilon_A(f_i)\, \epsilon_A(h_i)\, v \in \ker \epsilon_A(g_i)$: Indeed, since $g_i h_i = \mu_A$, one has
\[
\epsilon_A(g_i)\, \epsilon_A(f_i)\, \epsilon_A(h_i)\, v = \epsilon_A(f_i)\, \epsilon_A(\mu_A)\, v = 0.
\]
Thus, (8.8) proves $V = \sum_{i=1}^{l} \ker \epsilon_A(g_i)$. According to Prop. 5.2(iii), it remains to show
\[
\forall_{i \in \{1,\dots,l\}} \quad U := \ker \epsilon_A(g_i) \cap \sum_{j \in \{1,\dots,l\} \setminus \{i\}} \ker \epsilon_A(g_j) = \{0\}.
\]
To this end, fix $i \in \{1, \dots, l\}$ and note $\epsilon_A(g_i)(U) = \{0\} = \epsilon_A(h_i)(U)$. On the other hand, 1 is a greatest common divisor of $g_i, h_i$, i.e. (7.22) provides $r_i, s_i \in F[X]$ such that $1 = r_i g_i + s_i h_i$, yielding
\[
\{0\} = \big(\epsilon_A(r_i)\epsilon_A(g_i) + \epsilon_A(s_i)\epsilon_A(h_i)\big)(U) = \epsilon_A(1)(U) = \operatorname{Id}(U) = U,
\]
thereby completing the proof. $\square$

Theorem 8.14. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V,V)$. Then $A$ is diagonalizable if, and only if, there exist distinct $\lambda_1, \dots, \lambda_l \in F$, $l \in \mathbb{N}$, such that $\mu_A = \prod_{i=1}^{l} (X - \lambda_i)$.

Proof. Suppose $A$ is diagonalizable and let $B$ be a basis of $V$, consisting of eigenvectors of $A$. Define $g := \prod_{\lambda \in \sigma(A)} (X - \lambda)$. For each $b \in B$, there exists $\lambda \in \sigma(A)$ such that $Ab = \lambda b$. Thus,
\[
\epsilon_A(g)(b) = \prod_{\lambda \in \sigma(A)} (A - \lambda)\, b = 0.
\]
According to Th. 8.6(ii), we have $\mu_A | g$. Since, by Th. 8.9(c), each $\lambda \in \sigma(A)$ is a zero of $\mu_A$, we have $\deg \mu_A = \deg g$. As both $\mu_A$ and $g$ are monic, this means $\mu_A = g$.

Conversely, suppose $\mu_A = \prod_{i=1}^{l} (X - \lambda_i)$ with distinct $\lambda_1, \dots, \lambda_l \in F$, $l \in \mathbb{N}$. Then, by Th. 8.13,
\[
V = \bigoplus_{i=1}^{l} \ker \epsilon_A(X - \lambda_i) = \bigoplus_{i=1}^{l} \ker(A - \lambda_i \operatorname{Id}) = \bigoplus_{i=1}^{l} E_A(\lambda_i),
\]
proving $A$ to be diagonalizable by Th. 6.3(d). $\square$
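Th. 8.14 suggests a computational test: over $\mathbb{C}$, $A$ is diagonalizable if, and only if, $\mu_A$ has no repeated roots, which (in characteristic 0) is equivalent to $\gcd(\mu_A, \mu_A') = 1$. A hedged sympy sketch (our own; the helper name `is_diagonalizable_over_C` is hypothetical, and we pass $\mu_A$ explicitly rather than computing it here):

```python
import sympy as sp

X = sp.symbols('X')

def is_diagonalizable_over_C(mu):
    """Th. 8.14 over C: diagonalizable iff mu_A splits into distinct linear
    factors, i.e. (char. 0) iff mu_A is squarefree: gcd(mu_A, mu_A') = 1."""
    return sp.gcd(mu, sp.diff(mu, X)) == 1

M = sp.Matrix([[0, 1], [0, 0]])   # mu_M = X**2 (Example 8.10)
N = sp.zeros(2, 2)                # mu_N = X
print(is_diagonalizable_over_C(X**2))  # False
print(is_diagonalizable_over_C(X))     # True
print(M.is_diagonalizable(), N.is_diagonalizable())  # sympy's built-in agrees
```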

Example 8.15. (a) Let $V$ be a vector space over $\mathbb{C}$, $\dim V = n \in \mathbb{N}$. If $A \in \mathcal{L}(V,V)$ is such that there exists $m \in \mathbb{N}$ with $A^m = \operatorname{Id}$, then $A^m - \operatorname{Id} = 0$ and $\mu_A | (X^m - 1) = \prod_{k=1}^{m} (X - \zeta_k)$, $\zeta_k := e^{k 2\pi i/m}$. As the roots of unity $\zeta_k$ are all distinct (cf. [Phi16a, Cor. 8.31]), $A$ is diagonalizable by Th. 8.14.

(b) Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and let $P \in \mathcal{L}(V,V)$ be a projection, i.e. $P^2 = P$. Then $P^2 - P = 0$ and $\mu_P | (X^2 - X) = X(X - 1)$. Thus, we obtain the three cases
\[
\mu_P = \begin{cases} X & \text{for } P = 0, \\ X - 1 & \text{for } P = \operatorname{Id}, \\ X(X - 1) & \text{otherwise}. \end{cases}
\]

(c) Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and let $A \in \mathcal{L}(V,V)$ be a so-called involution, i.e. $A^2 = \operatorname{Id}$. Then $A^2 - \operatorname{Id} = 0$ and $\mu_A | (X^2 - 1) = (X + 1)(X - 1)$. Thus, we obtain the three cases
\[
\mu_A = \begin{cases} X - 1 & \text{for } A = \operatorname{Id}, \\ X + 1 & \text{for } A = -\operatorname{Id}, \\ (X + 1)(X - 1) & \text{otherwise}. \end{cases}
\]
If $A \neq \pm\operatorname{Id}$, then, according to Th. 8.14, $A$ is diagonalizable if, and only if, $1 \neq -1$, i.e. if, and only if, $\operatorname{char} F \neq 2$. Even though $A \neq \pm\operatorname{Id}$ is not diagonalizable for $\operatorname{char} F = 2$, there still exists an ordered basis $B$ of $V$ such that the matrix $M$ of $A$ with respect to $B$ is triangular (all diagonal elements being 1), due to Prop. 8.12(b).

Proposition 8.16. Let $V$ be a vector space over the real numbers $\mathbb{R}$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V,V)$.

(a) There exists a vector subspace $U$ of $V$ such that $\dim U \in \{1, 2\}$ and $U$ is $A$-invariant (i.e. $A(U) \subseteq U$).

(b) There exist an ordered basis $B$ of $V$ and matrices $M_1, \dots, M_l$, $l \in \mathbb{N}$, such that each $M_i$ is either $1 \times 1$ or $2 \times 2$ over $\mathbb{R}$, and such that the matrix $M$ of $A$ with respect to $B$ has the block triangular form
\[
M = \begin{pmatrix} M_1 & \ast & \ast \\ & \ddots & \ast \\ & & M_l \end{pmatrix}.
\]

Proof. Exercise. $\square$

9 Jordan Normal Form

If $V$ is a vector space over the algebraically closed field $F$, $\dim V = n \in \mathbb{N}$, $A \in \mathcal{L}(V,V)$, then one can always find an ordered basis $B$ of $V$ such that the corresponding matrix $M$ of $A$ is in so-called Jordan normal form, which is an especially simple (upper) triangular form, where the eigenvalues are found on the diagonal, the value 1 can, possibly, occur directly above the diagonal, and all other entries are 0. However, we will also see below that, if $F$ is not algebraically closed, then one can still obtain a normal form for $M$, albeit, in general, it is more complicated than the Jordan normal form.

Definition 9.1. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, $A \in \mathcal{L}(V,V)$.

(a) A vector subspace $U$ of $V$ is called $A$-cyclic if, and only if, $U$ is $A$-invariant (i.e. $A(U) \subseteq U$) and
\[
\exists_{v \in V} \quad U = \big\langle \{A^i v : i \in \mathbb{N}_0\} \big\rangle.
\]

(b) $V$ is called $A$-irreducible if, and only if, $V = U_1 \oplus U_2$ with $U_1, U_2$ both $A$-invariant vector subspaces of $V$, implies $U_1 = V$ or $U_2 = V$.

Proposition 9.2. Let $V$ be a finite-dimensional vector space over the field $F$, $r \in \mathbb{N}$, $A \in \mathcal{L}(V,V)$. Suppose $V$ is $A$-cyclic,
\[
V = \big\langle \{A^i v : i \in \mathbb{N}_0\} \big\rangle, \quad v \in V,
\]
and
\[
\exists_{a_0, \dots, a_r \in F} \quad \mu_A = \sum_{i=0}^{r} a_i X^i, \quad \deg \mu_A = r.
\]
Then $\chi_A = \mu_A$, $\dim V = r$, $B := (v, Av, \dots, A^{r-1} v)$ is an ordered basis of $V$, and, with respect to $B$, $A$ has the matrix
\[
M = \begin{pmatrix}
0 & 0 & \dots & \dots & -a_0 \\
1 & 0 & & & -a_1 \\
0 & 1 & \ddots & & \vdots \\
\vdots & & \ddots & 0 & -a_{r-2} \\
0 & \dots & & 1 & -a_{r-1}
\end{pmatrix} \in \mathcal{M}(r, F). \tag{9.1}
\]

Proof. To show that $v, Av, \dots, A^{r-1} v$ are linearly independent, let $\lambda_0, \dots, \lambda_{r-1} \in F$ be such that
\[
0 = \sum_{i=0}^{r-1} \lambda_i A^i v. \tag{9.2}
\]
Define $g := \sum_{i=0}^{r-1} \lambda_i X^i \in F[X]$. We need to show $g = 0$. Indeed, we have
\[
\forall_{i \in \mathbb{N}_0} \quad \epsilon_A(g)\, A^i v = A^i\, \epsilon_A(g)\, v \overset{(9.2)}{=} 0,
\]
which, as the $A^i v$ generate $V$, implies $\epsilon_A(g) = 0$. Since $\deg g < \deg \mu_A$, this yields $g = 0$ and the linear independence of $v, Av, \dots, A^{r-1} v$. We show $\langle B \rangle = V$ next: Seeking a contradiction, assume $A^m v \notin \langle B \rangle$, where we choose $m \in \mathbb{N}$ to be minimal. Then $m \geq r$. Then there exist $\lambda_0, \dots, \lambda_{r-1} \in F$ such that $A^{m-1} v = \sum_{i=0}^{r-1} \lambda_i A^i v$. Then $m > r$ yields the contradiction
\[
A^m v = \sum_{i=1}^{r} \lambda_{i-1} A^i v \in \langle B \rangle.
\]
However, $m = r$ also yields a contradiction due to
\[
0 = \epsilon_A(\mu_A)\, v = A^r v + \sum_{i=0}^{r-1} a_i A^i v \quad \Rightarrow \quad A^r v \in \langle B \rangle.
\]
Thus, $B$ is an ordered basis of $V$. Then $\deg \chi_A = r$, implying $\chi_A = \mu_A$. Finally, if $e_k$ is the $k$th standard column basis vector of $F^r$, $k \in \{1, \dots, r\}$, then $M e_k = e_{k+1}$ for $k \in \{1, \dots, r-1\}$ and $M e_r = -\sum_{i=0}^{r-1} a_i e_{i+1}$, showing $M$ to be the matrix of $A$ with respect to $B$, since $A(A^{r-1} v) = A^r v = -\sum_{i=0}^{r-1} a_i A^i v$. $\square$

Proposition 9.3. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, $A \in \mathcal{L}(V,V)$. Moreover, let $U$ be an $A$-invariant subspace of $V$, $1 \leq l := \dim U < n$, and define

\begin{align*}
A_U &: U \longrightarrow U, & A_U &:= A{\upharpoonright}_U, \\
A_{V/U} &: V/U \longrightarrow V/U, & A_{V/U}(v + U) &:= Av + U.
\end{align*}
Then the following holds:

(a) Let $B := (v_1, \dots, v_n)$ be an ordered basis of $V$ such that $B_U := (v_1, \dots, v_l)$ is an ordered basis of $U$. Then the matrix $M$ of $A$ with respect to $B$ has the block form
\[
M = (m_{ji}) = \begin{pmatrix} M_U & \ast \\ 0 & M_{V/U} \end{pmatrix}, \quad M_U \in \mathcal{M}(l, F), \quad M_{V/U} \in \mathcal{M}(n-l, F),
\]
where $M_U$ is the matrix of $A_U$ with respect to $B_U$ and $M_{V/U}$ is the matrix of $A_{V/U}$ with respect to the ordered basis $B_{V/U} := (v_{l+1} + U, \dots, v_n + U)$ of $V/U$.

(b) $\chi_A = \chi_{A_U}\, \chi_{A_{V/U}}$.

(c) $\mu_{A_U} | \mu_A$ and $\mu_{A_{V/U}} | \mu_A$.

Proof. $A_U$ is well-defined, since $U$ is $A$-invariant, and linear as $A$ is linear. Moreover, $A_{V/U}$ is well-defined, since, for each $v, w \in V$ with $v - w \in U$, $Av + U = Av + A(w - v) + U = Aw + U$; $A_{V/U}$ is linear, since, for each $v, w \in V$ and $\lambda \in F$,

\begin{align*}
A_{V/U}(v + U + w + U) &= A(v + w) + U = Av + U + Aw + U = A_{V/U}(v + U) + A_{V/U}(w + U), \\
A_{V/U}(\lambda v + U) &= A(\lambda v) + U = \lambda(Av + U) = \lambda\, A_{V/U}(v + U).
\end{align*}

(a): Since $U$ is $A$-invariant, we have
\[
\forall_{i \in \{1,\dots,l\}} \ \exists_{m_{1i},\dots,m_{li} \in F} \quad A v_i = \sum_{j=1}^{l} m_{ji} v_j,
\]
showing $M$ to have the claimed form with $M_U$ being the matrix of $A_U$ with respect to $B_U$. Moreover, by [Phi19, Cor. 6.13(a)], $B_{V/U}$ is, indeed, an ordered basis of $V/U$ and
\[
\forall_{i \in \{l+1,\dots,n\}} \quad A_{V/U}(v_i + U) = A v_i + U = \sum_{j=1}^{n} m_{ji} v_j + U = \sum_{j=l+1}^{n} m_{ji} v_j + U,
\]
proving $M_{V/U}$ to be the matrix of $A_{V/U}$ with respect to $B_{V/U}$.

(b): We compute
\[
\chi_A \overset{\text{(a), Th. 4.25}}{=} \det(X \operatorname{Id}_n - M) = \det(X \operatorname{Id}_l - M_U)\, \det(X \operatorname{Id}_{n-l} - M_{V/U}) \overset{\text{(a)}}{=} \chi_{A_U}\, \chi_{A_{V/U}}.
\]

(c): Since $\epsilon_A(\mu_A) v = 0$ for each $v \in V$, $\epsilon_{A_U}(\mu_A) v = 0$ for each $v \in U$, proving $\epsilon_{A_U}(\mu_A) = 0$ and $\mu_{A_U} | \mu_A$. Similarly, $\epsilon_{A_{V/U}}(\mu_A)(v + U) = \epsilon_A(\mu_A) v + U = 0$ for each $v \in V$, proving $\epsilon_{A_{V/U}}(\mu_A) = 0$ and $\mu_{A_{V/U}} | \mu_A$. $\square$

Comparing Prop. 9.3(b),(c) above, one might wonder if the analogon of Prop. 9.3(b) also holds for the minimal polynomials. The following Ex. 9.4 shows that, in general, it does not:

Example 9.4. Let $F$ be a field and $V := F^2$. Then, for $A := \operatorname{Id} \in \mathcal{L}(V,V)$, $\mu_A = X - 1$. If $U$ is an arbitrary 1-dimensional subspace of $V$, then, using the notation of Prop. 9.3, $\mu_{A_U} = X - 1 = \mu_{A_{V/U}}$, i.e. $\mu_{A_U}\, \mu_{A_{V/U}} = (X - 1)^2 \neq \mu_A$.

Lemma 9.5. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, $A \in \mathcal{L}(V,V)$. Suppose $V$ is $A$-cyclic.

(a) If $\mu_A = gh$ with $g, h \in F[X]$, then we have:

(i) $\dim \ker \epsilon_A(h) = \deg h$.

(ii) $\ker \epsilon_A(h) = \operatorname{Im} \epsilon_A(g)$.

(b) If $\lambda \in \sigma(A)$, then $\dim E_A(\lambda) = 1$, i.e. every eigenspace of $A$ has dimension 1.

Proof. (a): To prove (i), let $U := \operatorname{Im} h(A) = \epsilon_A(h)(V)$ and define $A_U, A_{V/U}$ as in Prop. 9.3. As $V$ is $A$-cyclic (say, generated by $v \in V$), $U$ is $A_U$-cyclic (generated by $\epsilon_A(h) v$) and $V/U$ is $A_{V/U}$-cyclic (generated by $v + U$). Thus, Prop. 9.2 yields
\[
\chi_A = \mu_A, \quad \chi_{A_U} = \mu_{A_U}, \quad \chi_{A_{V/U}} = \mu_{A_{V/U}},
\]
implying
\[
gh = \mu_A = \chi_A \overset{\text{Prop. 9.3(b)}}{=} \chi_{A_U}\, \chi_{A_{V/U}} = \mu_{A_U}\, \mu_{A_{V/U}}. \tag{9.3}
\]
If $v \in V$, then $\epsilon_{A_U}(g)\, \epsilon_A(h)\, v = \epsilon_A(\mu_A)\, v = 0$, showing $\epsilon_{A_U}(g) = 0$ and $\mu_{A_U} | g$, $\deg \mu_{A_U} \leq \deg g$. Similarly, $\epsilon_{A_{V/U}}(h)(v + U) = \epsilon_A(h) v + U = 0$ (since $\epsilon_A(h) v \in U$), proving $\epsilon_{A_{V/U}}(h) = 0$ and $\mu_{A_{V/U}} | h$, $\deg \mu_{A_{V/U}} \leq \deg h$. Since we also have $\deg g + \deg h = \deg \mu_{A_U} + \deg \mu_{A_{V/U}}$ by (9.3), we must have $\deg g = \deg \mu_{A_U}$ and $\deg h = \deg \mu_{A_{V/U}}$. Thus,
\[
\dim U = \deg \chi_{A_U} = \deg \mu_{A_U} = \deg g = \deg \mu_A - \deg h.
\]
According to the isomorphism theorem [Phi19, Th. 6.16(a)], we know
\[
V/\ker \epsilon_A(h) \cong \operatorname{Im} \epsilon_A(h) = U,
\]
implying
\[
\deg h = \deg \mu_A - \dim U = \dim V - \dim U = \dim V - \dim V/\ker \epsilon_A(h) = \dim V - \big(\dim V - \dim \ker \epsilon_A(h)\big) = \dim \ker \epsilon_A(h),
\]
thereby proving (i). We proceed to prove (ii): Since $\epsilon_A(h)(\operatorname{Im} \epsilon_A(g)) = \{0\}$, we have $\operatorname{Im} \epsilon_A(g) \subseteq \ker \epsilon_A(h)$. To prove equality, we note that (i) must also hold with $g$ instead of $h$ and compute
\[
\dim \operatorname{Im} \epsilon_A(g) = \dim V - \dim \ker \epsilon_A(g) \overset{\text{(i)}}{=} \deg \chi_A - \deg g = \deg \mu_A - \deg g = \deg h \overset{\text{(i)}}{=} \dim \ker \epsilon_A(h),
\]
completing the proof of (ii).

(b): Let $\lambda \in \sigma(A)$. Then $\lambda$ is a zero of $\mu_A$ and, by Prop. 7.13, there exists $q \in F[X]$ such that $\mu_A = (X - \lambda)\, q$. Hence,
\[
\dim E_A(\lambda) = \dim \ker(A - \lambda \operatorname{Id}) \overset{\text{(a)(i)}}{=} \deg(X - \lambda) = 1,
\]
proving (b). $\square$

Lemma 9.6. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, $A \in \mathcal{L}(V,V)$. Suppose $\mu_A = g^r$, where $g \in F[X]$ is irreducible and $r \in \mathbb{N}$. If $U$ is an $A$-cyclic subspace of $V$ such that $U$ has maximal dimension, then $\mu_{A{\upharpoonright}_U} = \mu_A$ and there exists an $A$-invariant subspace $W$ such that $V = U \oplus W$.

Proof. Letting $A_U := A{\upharpoonright}_U$, we show $\mu_{A_U} = \mu_A$ first: According to Prop. 9.3(c), we have $\mu_{A_U} | \mu_A$, i.e. there exists $1 \leq r_1 \leq r$ such that $\mu_{A_U} = g^{r_1}$. According to Prop. 9.2, $\chi_{A_U} = \mu_{A_U}$, implying $\dim U = \deg \mu_{A_U} = r_1 \deg g$. Let $0 \neq v \in V$ and define $U_1 := \langle\{A^i v : i \in \mathbb{N}_0\}\rangle$. Then $U_1$ is an $A$-cyclic subspace of $V$ and the maximality of $U$ implies $\mu_{A{\upharpoonright}_{U_1}} | g^{r_1}$, implying $\epsilon_A(g^{r_1})(U_1) = \{0\}$. As $0 \neq v \in V$ was arbitrary, this shows $\epsilon_A(g^{r_1}) = 0$, implying $\mu_A | g^{r_1} = \mu_{A_U}$, showing $\mu_{A_U} = \mu_A$ (and $r_1 = r$) as claimed.

The proof of the existence of $W$ is now conducted via induction on $n \in \mathbb{N}$. For $n = 1$, we have $U = V$ and there is nothing to prove ($W = \{0\}$). Thus, let $n > 1$. If $U = V$, then, as for $n = 1$, we can merely set $W := \{0\}$. Thus, assume $U \neq V$. First, consider the case that $V/U$ is $A_{V/U}$-reducible, where $A_{V/U}$ is as in Prop. 9.3. Then there exist $A$-invariant subspaces $\{0\} \subsetneq V_1, V_2 \subsetneq V$ such that $V/U = (V_1/U) \oplus (V_2/U)$, i.e. $V_1 + V_2 + U = V$ and $V_1 \cap V_2 \subseteq U$ (since $v \in V_1 \cap V_2$ implies $v + U \in (V_1/U) \cap (V_2/U) = U$). Replacing $V_1, V_2$ by $V_1 + U$, $V_2 + U$, respectively, we may also assume $V_1 \cap V_2 = U$. As $U \subseteq V_1, V_2$ and $\dim V_1, \dim V_2 < n$,

which, when combined with (9.4), yields some $u_v \in U$ with
\[
\epsilon_A(g^s)\, u_v = \epsilon_A(g^s)\, v. \tag{9.5}
\]

This, finally, allows us to define

\[
w_0 := v - u_v, \qquad W := \big\langle \{A^i w_0 : i \in \mathbb{N}_0\} \big\rangle.
\]
Clearly, $W$ is $A$-invariant. If $x \in V$, then $x + U \in V/U$ and, since $\dim(V/U) < n$, there exist $u \in U$ and $\lambda_0, \dots, \lambda_n \in F$ such that
\[
x = u + \sum_{i=0}^{n} \lambda_i A^i v = u + \sum_{i=0}^{n} \lambda_i A^i (w_0 + u_v) = u + \sum_{i=0}^{n} \lambda_i A^i u_v + \sum_{i=0}^{n} \lambda_i A^i w_0,
\]
proving $V = U + W$. Since
\[
\epsilon_A(g^s)\, w_0 \overset{(9.5)}{=} \epsilon_A(g^s)\, v - \epsilon_A(g^s)\, u_v = 0,
\]
we have $\dim W = \deg \mu_{A_W} \leq \deg(g^s) = \deg \mu_{A_{V/U}} = \dim(V/U)$, implying (as $V = U + W$) $\dim W = \dim(V/U)$ and $V = U \oplus W$. $\square$

Theorem 9.7. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, $A \in \mathcal{L}(V,V)$.

(a) If $V$ is $A$-irreducible, then $V$ is $A$-cyclic.

(b) Suppose $\mu_A = g^r$, where $g \in F[X]$ is irreducible and $r \in \mathbb{N}$. Then $V$ is $A$-irreducible if, and only if, $V$ is $A$-cyclic.

(c) There exist subspaces $U_1, \dots, U_l$ of $V$, $l \in \mathbb{N}$, such that
\[
V = \bigoplus_{i=1}^{l} U_i
\]
and each $U_i$ is both $A$-irreducible and $A$-cyclic.

Proof. (a): We must have $\mu_A = g^r$ with $r \in \mathbb{N}$ and $g \in F[X]$ irreducible, since, otherwise, $V$ is $A$-reducible by Th. 8.13. In consequence, $V$ is $A$-cyclic by Lem. 9.6.

(b): According to (a), it only remains to show that $V$ being $A$-cyclic implies $V$ to be $A$-irreducible. Thus, let $V$ be $A$-cyclic and $V = V_1 \oplus V_2$ with $A$-invariant subspaces $V_1, V_2 \subseteq V$. Then, by Prop. 9.3(c), there exist $1 \leq r_1, r_2 \leq r$ such that $\mu_{A_{V_1}} = g^{r_1}$ and $\mu_{A_{V_2}} = g^{r_2}$, where we choose $V_1$ such that $r_2 \leq r_1$. Then $\epsilon_A(\mu_{A_{V_1}})(V_1) = \epsilon_A(\mu_{A_{V_1}})(V_2) = \{0\}$, showing $\mu_{A_{V_1}} = \mu_A$. As $\chi_{A_{V_1}} = \mu_{A_{V_1}}$ by Prop. 9.2, we have $\dim V_1 \geq \deg \mu_{A_{V_1}} = \deg \chi_A = \dim V$, showing $V_1 = V$ as desired.

(c): The proof is conducted via induction on $n \in \mathbb{N}$. If $V$ is $A$-irreducible, then, by (a), $V$ is also $A$-cyclic and the statement holds (in particular, this yields the base case $n = 1$). If $V$ is $A$-reducible, then there exist $A$-invariant subspaces $V_1, V_2$ of $V$ such that $V = V_1 \oplus V_2$ with $\dim V_1, \dim V_2 < n$. Thus, by induction hypothesis, both $V_1$ and $V_2$ can be written as a direct sum of subspaces that are both $A$-irreducible and $A$-cyclic, proving the same for $V$. $\square$

We now have all preparations in place to prove the existence of normal forms having matrices with block diagonal form, where the blocks all look like the matrix of (9.1). However, before we state and prove the corresponding theorem, we still provide a proposition that will help to address the uniqueness of such normal forms:

Proposition 9.8. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, $A \in \mathcal{L}(V,V)$. Moreover, suppose we have a decomposition
\[
V = \bigoplus_{i=1}^{l} U_i, \quad l \in \mathbb{N}, \tag{9.6}
\]
where the $U_1, \dots, U_l$ all are $A$-invariant and $A$-irreducible (and, thus, $A$-cyclic by Th. 9.7(a)) subspaces of $V$.

(a) If $\mu_A = g_1^{r_1} \cdots g_m^{r_m}$ is the prime factorization of $\mu_A$ (i.e. each $g_i \in F[X]$ is irreducible, $r_1, \dots, r_m \in \mathbb{N}$, $m \in \mathbb{N}$), then
\[
V = \bigoplus_{i=1}^{m} \ker \epsilon_A(g_i^{r_i}) \tag{9.7}
\]
and the decomposition of (9.6) is a refinement of (9.7) in the sense that each $U_i$ is contained in some $\ker \epsilon_A(g_j^{r_j})$.

(b) If $\mu_A = g^r$ with $g \in F[X]$ irreducible, $r \in \mathbb{N}$, then, for each $i \in \{1, \dots, l\}$, we have $\mu_{A_{U_i}} = g^{r_i}$, $\dim U_i = r_i \deg g$, with $1 \leq r_i \leq r$. If
\[
\forall_{k \in \mathbb{N}_0} \quad l_k := \#\big\{ i \in \{1, \dots, l\} : \mu_{A_{U_i}} = g^k \big\}
\]
(i.e. $l_k$ is the number of summands $U_i$ with $\mu_{A_{U_i}} = g^k$), then
\[
l = \sum_{k=1}^{r} l_k, \qquad \dim V = (\deg g) \sum_{k=1}^{r} k\, l_k, \tag{9.8}
\]
and
\[
\forall_{s \in \{0,\dots,r\}} \quad \dim \operatorname{Im} \epsilon_A(g^s) = (\deg g) \sum_{k=s}^{r} l_k\, (k - s). \tag{9.9}
\]
In consequence, the numbers $l_k$ are uniquely determined by $A$.

Proof. (a): That (9.7) holds is an immediate consequence of Th. 8.13. Moreover, if $U$ is an $A$-invariant and $A$-irreducible subspace of $V$, then $\mu_{A_U} | \mu_A$ and Th. 8.13 imply $\mu_{A_U} = g_j^s$ for some $j \in \{1, \dots, m\}$ and $1 \leq s \leq r_j$. In consequence, $U \subseteq \ker \epsilon_A(g_j^{r_j})$ as claimed.

(b): The proof of (a) already showed that, for each $i \in \{1, \dots, l\}$, $\mu_{A_{U_i}} = g^{r_i}$, with $1 \leq r_i \leq r$, and then $\dim U_i = r_i \deg g$ by Prop. 9.2. From the definitions of $r$ and $l_k$, it is immediate that $l = \sum_{k=1}^{r} l_k$. As the sum in (9.6) is direct, we obtain
\[
\dim V = \sum_{i=1}^{l} \dim U_i = \sum_{k=1}^{r} l_k\, k \deg g = (\deg g) \sum_{k=1}^{r} k\, l_k.
\]
To prove (9.9), fix $s \in \{0, \dots, r\}$ and set $h := g^s$. For each $v \in V$, there exist $u_1, \dots, u_l \in V$ such that $u_i \in U_i$ and $v = \sum_{i=1}^{l} u_i$. Then
\[
\epsilon_A(h)\, v = \sum_{i=1}^{l} \epsilon_A(h)\, u_i,
\]
implying
\[
\ker \epsilon_A(h) = \bigoplus_{i=1}^{l} \ker \epsilon_{A_{U_i}}(h), \tag{9.10}
\]
due to the fact that $\epsilon_A(h) v = 0$ if, and only if, $\epsilon_A(h) u_i = 0$ for each $i \in \{1, \dots, l\}$. In the case, where $\mu_{A_{U_i}} = g^{r_i}$ with $r_i \leq s$, we have $\ker \epsilon_{A_{U_i}}(h) = U_i$, i.e.
\[
\dim \ker \epsilon_{A_{U_i}}(h) = \dim U_i = r_i \deg g, \tag{9.11a}
\]
while, in the case, where $\mu_{A_{U_i}} = g^{r_i}$ with $s < r_i$, Lem. 9.5(a)(i) yields
\[
\dim \ker \epsilon_{A_{U_i}}(h) = \deg(g^s) = s \deg g. \tag{9.11b}
\]
Putting everything together yields
\[
\dim \operatorname{Im} \epsilon_A(g^s) = \dim \operatorname{Im} \epsilon_A(h) = \dim V - \dim \ker \epsilon_A(h) \overset{(9.8),(9.10)}{=} (\deg g) \sum_{k=1}^{r} k\, l_k - \sum_{i=1}^{l} \dim \ker \epsilon_{A_{U_i}}(h) \overset{(9.11)}{=} (\deg g) \sum_{k=s}^{r} l_k\, (k - s),
\]
proving (9.9). To see that the $l_k$ are uniquely determined by $A$, observe $l_k = 0$ for $k > r$ and $k = 0$, and, for $1 \leq k \leq r$, (9.9) implies the recursion
\begin{align}
l_r &= (\deg g)^{-1} \dim \operatorname{Im} \epsilon_A(g^{r-1}), \tag{9.12a} \\
\forall_{s \in \{1,\dots,r-1\}} \quad l_s &= (\deg g)^{-1} \dim \operatorname{Im} \epsilon_A(g^{s-1}) - \sum_{k=s+1}^{r} l_k\, \big(k - (s-1)\big), \tag{9.12b}
\end{align}
thereby completing the proof. $\square$

Theorem 9.9 (General Normal Form). Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, $A \in \mathcal{L}(V,V)$.

(a) There exist subspaces $U_1, \dots, U_l$ of $V$, $l \in \mathbb{N}$, such that each $U_i$ is $A$-cyclic and $A$-irreducible, satisfying
\[
V = \bigoplus_{i=1}^{l} U_i.
\]
Moreover, for each $i \in \{1, \dots, l\}$, there exists $v_i \in U_i$ such that
\[
B_i := \{v_i, A v_i, \dots, A^{r_i - 1} v_i\}, \quad r_i := \dim U_i,
\]
is a basis of $U_i$. Then $\mu_{A_{U_i}} = \sum_{k=0}^{r_i} a_k^{(i)} X^k$ with $a_0^{(i)}, \dots, a_{r_i}^{(i)} \in F$, $A_{U_i} := A{\upharpoonright}_{U_i}$, and, with respect to the ordered basis
\[
B := (v_1, \dots, A^{r_1 - 1} v_1, \dots, v_l, \dots, A^{r_l - 1} v_l),
\]
$A$ has the block diagonal matrix
\[
M := \begin{pmatrix}
M_1 & 0 & \dots & 0 \\
0 & M_2 & \dots & 0 \\
\vdots & & \ddots & \\
0 & \dots & 0 & M_l
\end{pmatrix},
\]
each block having the form of (9.1), namely
\[
\forall_{i \in \{1,\dots,l\}} \quad M_i = \begin{pmatrix}
0 & 0 & \dots & \dots & -a_0^{(i)} \\
1 & 0 & & & -a_1^{(i)} \\
0 & 1 & \ddots & & \vdots \\
\vdots & & \ddots & 0 & -a_{r_i-2}^{(i)} \\
0 & \dots & & 1 & -a_{r_i-1}^{(i)}
\end{pmatrix}.
\]

(b) If
\[
V = \bigoplus_{i=1}^{m} W_i
\]
is another decomposition of $V$ into $A$-invariant and $A$-irreducible subspaces $W_1, \dots, W_m$ of $V$, $m \in \mathbb{N}$, then $m = l$ and there exist a permutation $\pi \in S_l$ and $T \in \operatorname{GL}(V)$ such that
\[
T A = A T \quad \wedge \quad \forall_{i \in \{1,\dots,l\}} \quad T(U_i) = W_{\pi(i)}. \tag{9.13}
\]

Proof. (a): The existence of the claimed decomposition was already shown in Th. 9.7(c) and the remaining statements are then provided by Prop. 9.2, where the $A$-invariance of the $U_i$ yields the block diagonal structure of $M$.

(b): We divide the proof into three steps:

Step 1: Assume $U$ and $W$ to be $A$-cyclic subspaces of $V$ such that $1 \leq s := \dim U = \dim W \leq n$ and let $u \in U$, $w \in W$ be such that $B_U := \{u, \dots, A^{s-1} u\}$, $B_W := \{w, \dots, A^{s-1} w\}$ are bases of $U$, $W$, respectively. Define $S \in \mathcal{L}(U, W)$ by letting
\[
\forall_{i \in \{0,\dots,s-1\}} \quad S(A^i u) := A^i w
\]

(then $S$ is invertible, as it maps the basis $B_U$ onto the basis $B_W$). We verify $SA = AS$: Indeed,
\[
\forall_{i \in \{0,\dots,s-2\}} \quad S A(A^i u) = S(A^{i+1} u) = A^{i+1} w = A(A^i w) = A S(A^i u)
\]
and, letting $a_0, \dots, a_s \in F$ be such that $\mu_{A_U} = \mu_{A_W} = \sum_{i=0}^{s} a_i X^i$ (cf. Prop. 9.2),
\[
S A(A^{s-1} u) = S(A^s u) = S\Big(-\sum_{i=0}^{s-1} a_i A^i u\Big) = -\sum_{i=0}^{s-1} a_i A^i w = A^s w = A(A^{s-1} w) = A S(A^{s-1} u).
\]

Step 2: Assume $\mu_A = g^s$, where $g \in F[X]$ is irreducible and $s \in \mathbb{N}$. Letting
\[
\forall_{k \in \mathbb{N}_0} \quad I(U, k) := \big\{ i \in \{1,\dots,l\} : \mu_{A_{U_i}} = g^k \big\}, \qquad I(W, k) := \big\{ i \in \{1,\dots,m\} : \mu_{A_{W_i}} = g^k \big\},
\]
the uniqueness of the numbers $l_k$ in Prop. 9.8(b) shows
\[
\forall_{k \in \mathbb{N}_0} \quad l_k = \# I(U, k) = \# I(W, k)
\]
and, in particular, $m = l$, and the existence of $\pi \in S_l$ such that, for each $k \in \mathbb{N}$ and each $i \in I(U, k)$, one has $\pi(i) \in I(W, k)$. Thus, by Step 1, for each $i \in \{1, \dots, l\}$, there exists an invertible $S_i \in \mathcal{L}(U_i, W_{\pi(i)})$ such that $S_i A = A S_i$. Define $T \in \operatorname{GL}(V)$ by letting, for each $v \in V$ such that $v = \sum_{i=1}^{l} u_i$ with $u_i \in U_i$, $Tv := \sum_{i=1}^{l} S_i u_i$. Then, clearly, $T$ satisfies (9.13).

Step 3: We now consider the general situation of (b). Let $\mu_A = g_1^{r_1} \cdots g_s^{r_s}$ be the prime factorization of $\mu_A$ (i.e. each $g_i \in F[X]$ is irreducible, $r_1, \dots, r_s \in \mathbb{N}$, $s \in \mathbb{N}$). According to Prop. 9.8(a), there exist sets $I_1, \dots, I_s \subseteq \{1, \dots, l\}$ and $J_1, \dots, J_s \subseteq \{1, \dots, m\}$ such that
\[
\forall_{k \in \{1,\dots,s\}} \quad \ker \epsilon_A(g_k^{r_k}) = \bigoplus_{i \in I_k} U_i = \bigoplus_{i \in J_k} W_i.
\]
Then, by Step 2, we have $\# I_k = \# J_k$ for each $k \in \{1, \dots, s\}$, implying
\[
l = \sum_{k=1}^{s} \# I_k = \sum_{k=1}^{s} \# J_k = m
\]
and the existence of a permutation $\pi \in S_l$ such that $\pi(I_k) = J_k$ for each $k \in \{1, \dots, s\}$. Again using Step 2, we can now, in addition, choose $\pi \in S_l$ such that
\[
\forall_{i \in \{1,\dots,l\}} \ \exists_{T_i \in \mathcal{L}(U_i, W_{\pi(i)})} \quad \big( T_i \text{ invertible} \ \wedge \ T_i A = A T_i \big).
\]
Define $T \in \operatorname{GL}(V)$ by letting, for each $v \in V$ such that $v = \sum_{i=1}^{l} u_i$ with $u_i \in U_i$, $Tv := \sum_{i=1}^{l} T_i u_i$. Then, clearly, $T$ satisfies (9.13). $\square$

The following Prop. 9.10 can sometimes be helpful in actually finding the $U_i$ of Th. 9.9(a):

Proposition 9.10. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, $A \in \mathcal{L}(V,V)$, and let $\mu_A = g_1^{r_1} \cdots g_m^{r_m}$ be the prime factorization of $\mu_A$ (i.e. each $g_i \in F[X]$ is irreducible, $r_1, \dots, r_m \in \mathbb{N}$, $m \in \mathbb{N}$). For each $i \in \{1, \dots, m\}$, let $V_i := \ker \epsilon_A(g_i^{r_i})$.

(a) For each $i \in \{1, \dots, m\}$, we have $\mu_{A_{V_i}} = g_i^{r_i}$ (recall $V = \bigoplus_{i=1}^{m} V_i$).

(b) For each $i \in \{1, \dots, m\}$, in the decomposition $V = \bigoplus_{k=1}^{l} U_k$ of Th. 9.9(a), there exists at least one $U_k$ with $U_k \subseteq V_i$ and $\dim U_k = \deg(g_i^{r_i})$.

(c) As in (b), let $V = \bigoplus_{k=1}^{l} U_k$ be the decomposition of Th. 9.9(a). Then, for each $i \in \{1, \dots, m\}$ and each $k \in \{1, \dots, l\}$ such that $U_k \subseteq V_i$, one has $\dim U_k \leq \deg(g_i^{r_i})$.

Proof. (a): Let $i \in \{1, \dots, m\}$. According to Prop. 9.3(c), we have $\mu_{A_{V_i}} = g_i^{s}$ with $1 \leq s \leq r_i$. For each $v \in V$, there are $v_1, \dots, v_m \in V$ such that $v = \sum_{i=1}^{m} v_i$ and $v_i \in V_i$, implying
\[
\epsilon_A\big(g_1^{r_1} \cdots g_{i-1}^{r_{i-1}}\, g_i^{s}\, g_{i+1}^{r_{i+1}} \cdots g_m^{r_m}\big)\, v = 0
\]
and $r_i \leq s$, i.e. $r_i = s$.

(b): Let $i \in \{1, \dots, m\}$. According to (a), we have $\mu_{A_{V_i}} = g_i^{r_i}$. Using the uniqueness of the decomposition $V = \bigoplus_{k=1}^{l} U_k$ in the sense of Th. 9.9(b) together with Lem. 9.6, there exists $U_k$ with $U_k \subseteq V_i$ such that $U_k$ is an $A$-cyclic subspace of $V_i$ of maximal dimension. Then Lem. 9.6 also yields $\mu_{A_{U_k}} = \mu_{A_{V_i}} = g_i^{r_i}$, which, as $U_k$ is $A$-cyclic, implies $\dim U_k = \deg(g_i^{r_i})$.

(c): Let $i \in \{1, \dots, m\}$. Again, we know $\mu_{A_{V_i}} = g_i^{r_i}$ by (a). If $k \in \{1, \dots, l\}$ is such that $U_k \subseteq V_i$, then Prop. 9.3(c) yields $\mu_{A_{U_k}} | \mu_{A_{V_i}} = g_i^{r_i}$, showing $\dim U_k \leq \deg(g_i^{r_i})$. $\square$

Remark 9.11. In general, in the situation of Prop. 9.10, the knowledge of $\mu_A$ and $\chi_A$ does not suffice to uniquely determine the normal form of Th. 9.9(a). In general, for each $g_i$ and each $s \in \{1, \dots, r_i\}$, one needs to determine $\dim \operatorname{Im} \epsilon_A(g_i^s)$, which then determine the numbers $l_k$ of (9.9), i.e. the number of subspaces $U_j$ with $\mu_{A_{U_j}} = g_i^k$. This then determines the matrix $M$ of Th. 9.9(a) (up to the order of the diagonal blocks), since one obtains precisely $l_k$ many blocks of size $k \deg g_i$ and the entries of these blocks are given by the coefficients of $g_i^k$.

Example 9.12. (a) Let $F$ be a field and $V := F^6$. Assume $A \in \mathcal{L}(V,V)$ has
\[
\chi_A = (X - 2)^2 (X - 3)^4, \qquad \mu_A = (X - 2)^2 (X - 3)^3.
\]
We want to determine the decomposition $V = \bigoplus_{i=1}^{l} U_i$ of Th. 9.9(a) and the matrix $M$ with respect to the corresponding basis of $V$ given in Th. 9.9(a): We know from Prop. 9.10(b) that we can choose $U_1 \subseteq \ker(A - 2\operatorname{Id})^2$ with $\dim U_1 = 2$ and $U_2 \subseteq \ker(A - 3\operatorname{Id})^3$ with $\dim U_2 = 3$. As $\dim V = 6$, this then yields $V = U_1 \oplus U_2 \oplus U_3$ with $\dim U_3 = 1$. We also know $\sigma(A) = \{2, 3\}$ and, according to Th. 8.4, the algebraic multiplicities are $m_a(2) = 2$, $m_a(3) = 4$. Moreover,
\[
4 = m_a(3) = \dim \ker(A - 3\operatorname{Id})^4 = \dim \ker \epsilon_A\big((X - 3)^4\big),
\]
implying $U_3 \subseteq \ker(A - 3\operatorname{Id})^4$. As $(X - 2)^2 = X^2 - 4X + 4$ and $(X - 3)^3 = X^3 - 9X^2 + 27X - 27$, $M$ has the form
\[
M = \begin{pmatrix}
0 & -4 & & & & \\
1 & 4 & & & & \\
& & 0 & 0 & 27 & \\
& & 1 & 0 & -27 & \\
& & 0 & 1 & 9 & \\
& & & & & 3
\end{pmatrix}.
\]

(b) Let $F := \mathbb{Z}_2 = \{0, 1\}$ and $V := F^3$. Assume, with respect to some basis of $V$, $A \in \mathcal{L}(V,V)$ has the matrix
\[
N := \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.
\]
We compute (using $-1 = 1$ and $0 = 2$ in $F$)
\[
\chi_A = \det(X \operatorname{Id}_3 - N) = \det \begin{pmatrix} X - 1 & -1 & -1 \\ -1 & X & -1 \\ -1 & 0 & X \end{pmatrix}
= X^2(X - 1) - 1 - X - X = X^3 - X^2 - 2X - 1 = X^3 + X^2 + 1.
\]
Since $\chi_A = X(X^2 + X) + 1$ and $\chi_A = (X + 1) X^2 + 1$, $\chi_A$ has no zero in $F$ and is, thus, irreducible. Hence $\chi_A = \mu_A$, $V$ is $A$-irreducible and $A$-cyclic, and the matrix of $A$ with respect to the basis of Th. 9.9(a) is (again making use of $-1 = 1$ in $F$)
\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}.
\]

—
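To double-check Example 9.12(a), one can assemble $M$ from its companion blocks and confirm $\chi_M$ as well as the annihilation property of $\mu_A$ with sympy (a sketch under the stated data; the helper name `ev` is ours):

```python
import sympy as sp

X = sp.symbols('X')

# Blocks of Ex. 9.12(a): companion matrices of (X-2)^2 and (X-3)^3, and (3):
B1 = sp.Matrix([[0, -4], [1, 4]])                     # (X-2)^2 = X^2 - 4X + 4
B2 = sp.Matrix([[0, 0, 27], [1, 0, -27], [0, 1, 9]])  # (X-3)^3
B3 = sp.Matrix([[3]])
M = sp.diag(B1, B2, B3)

print(sp.factor((X * sp.eye(6) - M).det()))  # (X - 2)**2*(X - 3)**4

def ev(p):  # substitute M for X in p
    cs = sp.Poly(p, X).all_coeffs()[::-1]
    return sum((c * M**i for i, c in enumerate(cs)), sp.zeros(6, 6))

print(ev((X - 2)**2 * (X - 3)**3).is_zero_matrix)  # True: mu_A annihilates M
print(ev((X - 2)**2 * (X - 3)**2).is_zero_matrix)  # False: no smaller power works
```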

For fields that are algebraically closed, we can improve the normal form of Th. 9.9 to the so-called Jordan normal form:

Theorem 9.13 (Jordan Normal Form). Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, $A \in \mathcal{L}(V,V)$. Assume there exist distinct $\lambda_1, \dots, \lambda_m \in F$ such that
\[
\mu_A = \prod_{i=1}^{m} (X - \lambda_i)^{r_i}, \quad \sigma(A) = \{\lambda_1, \dots, \lambda_m\}, \quad m, r_1, \dots, r_m \in \mathbb{N} \tag{9.14}
\]
(if $F$ is algebraically closed, then (9.14) always holds).

(a) There exist subspaces $U_1, \dots, U_l$ of $V$, $l \in \mathbb{N}$, such that each $U_i$ is $A$-cyclic and $A$-irreducible, satisfying
\[
V = \bigoplus_{k=1}^{l} U_k.
\]
Moreover, for each $k \in \{1, \dots, l\}$, there exist $v_k \in U_k$ and $i = i(k) \in \{1, \dots, m\}$ such that
\[
U_k \subseteq \ker(A - \lambda_i \operatorname{Id})^{r_i}
\]

and

\[
J_k := \big\{ v_k, (A - \lambda_i \operatorname{Id}) v_k, \dots, (A - \lambda_i \operatorname{Id})^{s_k - 1} v_k \big\}, \quad s_k := \dim U_k \leq r_i \leq \dim \ker(A - \lambda_i \operatorname{Id})^{r_i},
\]
is a basis of $U_k$ (note that, in general, the same $i$ will correspond to many distinct subspaces $U_k$). Then, with respect to the ordered basis
\[
J := \big( (A - \lambda_{i(1)} \operatorname{Id})^{s_1 - 1} v_1, \dots, v_1, \dots, (A - \lambda_{i(l)} \operatorname{Id})^{s_l - 1} v_l, \dots, v_l \big),
\]
$A$ has a matrix in Jordan normal form, i.e. the block diagonal matrix
\[
N := \begin{pmatrix}
N_1 & 0 & \dots & 0 \\
0 & N_2 & \dots & 0 \\
\vdots & & \ddots & \\
0 & \dots & 0 & N_l
\end{pmatrix},
\]
each block (called a Jordan block) having the form $N_k = (\lambda_{i(k)}) \in \mathcal{M}(1, F)$ for $s_k = 1$,
\[
N_k = \begin{pmatrix}
\lambda_{i(k)} & 1 & & & 0 \\
& \lambda_{i(k)} & 1 & & \\
& & \ddots & \ddots & \\
& & & \lambda_{i(k)} & 1 \\
0 & & & & \lambda_{i(k)}
\end{pmatrix} \in \mathcal{M}(s_k, F) \quad \text{for } s_k > 1.
\]

(b) In the situation of (a) and recalling from Def. 6.12 that, for each $\lambda \in \sigma(A)$, $r(\lambda)$ is such that $m_a(\lambda) = \dim \ker(A - \lambda \operatorname{Id})^{r(\lambda)}$ and, for each $s \in \{1, \dots, r(\lambda)\}$,
\[
E_A^s(\lambda) = \ker(A - \lambda \operatorname{Id})^s
\]
is called the corresponding generalized eigenspace of rank $s$ of $A$, each $v \in E_A^s(\lambda) \setminus E_A^{s-1}(\lambda)$, $s \geq 2$, is called a generalized eigenvector of rank $s$, we obtain
\[
\forall_{i \in \{1,\dots,m\}} \quad r_i = r(\lambda_i), \qquad \forall_{k \in \{1,\dots,l\}} \quad U_k \subseteq E_A^{s_k}(\lambda_{i(k)}) \subseteq E_A^{r_{i(k)}}(\lambda_{i(k)}).
\]
Moreover, for each $k \in \{1, \dots, l\}$, $v_k$ is a generalized eigenvector of rank $s_k$ and the basis $J_k$ consists of generalized eigenvectors, containing precisely one generalized eigenvector of rank $s$ for each $s \in \{1, \dots, s_k\}$. Define
\[
\forall_{i \in \{1,\dots,m\}} \ \forall_{s \in \mathbb{N}} \quad l(i,s) := \#\big\{ k \in \{1,\dots,l\} : U_k \subseteq \ker(A - \lambda_i \operatorname{Id})^{r_i} \ \wedge \ \dim U_k = s \big\}
\]
(thus, $l(i,s)$ is the number of Jordan blocks of size $s$ corresponding to the eigenvalue $\lambda_i$ – apart from the slightly different notation used here, the $l(i,s)$ are precisely the numbers called $l_k$ in Prop. 9.8(b)). Then, for each $i \in \{1, \dots, m\}$,
\begin{align}
l(i, r_i) &= \dim \ker(A - \lambda_i \operatorname{Id})^{r_i} - \dim \ker(A - \lambda_i \operatorname{Id})^{r_i - 1}, \tag{9.15a} \\
\forall_{s \in \{1,\dots,r_i-1\}} \quad l(i, s) &= \dim \ker(A - \lambda_i \operatorname{Id})^{r_i} - \dim \ker(A - \lambda_i \operatorname{Id})^{s-1} - \sum_{j=s+1}^{r_i} l(i,j)\, \big(j - (s-1)\big). \tag{9.15b}
\end{align}
Thus, in general, one needs to determine, for each $i \in \{1, \dots, m\}$ and each $s \in \{1, \dots, r_i\}$, $\dim \ker(A - \lambda_i \operatorname{Id})^s$ to know the precise structure of $N$.

(c) For the sake of completeness and convenience, we restate Th. 9.9(b): If

\[
V = \bigoplus_{i=1}^{L} W_i
\]
is another decomposition of $V$ into $A$-invariant and $A$-irreducible subspaces $W_1, \dots, W_L$ of $V$, $L \in \mathbb{N}$, then $L = l$ and there exist a permutation $\pi \in S_l$ and $T \in \operatorname{GL}(V)$ such that
\[
T A = A T \quad \wedge \quad \forall_{i \in \{1,\dots,l\}} \quad T(U_i) = W_{\pi(i)}.
\]

Proof. (a),(b): The $A$-cyclic and $A$-irreducible subspaces $U_1, \dots, U_l$ are given by Th. 9.9(a). As in Th. 9.9(a), for each $k \in \{1, \dots, l\}$, let $v_k \in U_k$ be such that
\[
B_k := \{v_k, A v_k, \dots, A^{s_k - 1} v_k\}, \quad s_k := \dim U_k,
\]
is a basis of $U_k$. By Prop. 9.8(a), there exists $i := i(k) \in \{1, \dots, m\}$ such that $U_k \subseteq \ker(A - \lambda_i \operatorname{Id})^{r_i}$. For $s_k = 1$, we have $J_k = B_k = \{v_k\}$. For $s_k > 1$, letting, for each $j \in \{0, \dots, s_k - 1\}$, $w_j := (A - \lambda_i \operatorname{Id})^j v_k$, we obtain
\[
\forall_{j \in \{0,\dots,s_k-2\}} \quad A w_j = A (A - \lambda_i \operatorname{Id})^j v_k = (A - \lambda_i \operatorname{Id} + \lambda_i \operatorname{Id})(A - \lambda_i \operatorname{Id})^j v_k = w_{j+1} + \lambda_i w_j \tag{9.16a}
\]
and, using $(A - \lambda_i \operatorname{Id})^{s_k} v_k = 0$ due to $\epsilon_{A_{U_k}}(\mu_{A_{U_k}}) = 0$,
\[
A w_{s_k - 1} = A (A - \lambda_i \operatorname{Id})^{s_k - 1} v_k = (A - \lambda_i \operatorname{Id})^{s_k} v_k + \lambda_i w_{s_k - 1} = \lambda_i w_{s_k - 1}. \tag{9.16b}
\]
Thus, with respect to the ordered basis $(w_{s_k - 1}, \dots, w_0)$, $A_{U_k}$ has the matrix $N_k$. As each $U_k$ is $A$-invariant, this also proves $N$ to be the matrix of $A$ with respect to $J$. Proposition 9.10(a),(c) yields $s_k \leq r_i \leq \dim \ker(A - \lambda_i \operatorname{Id})^{r_i}$. Moreover, (9.16) shows that $J_k$ contains precisely one generalized eigenvector of rank $s$ for each $s \in \{1, \dots, s_k\}$. Finally, (9.15), i.e. the formulas for the $l(i,s)$, are given by the recursion (9.12) (which was inferred from (9.9)), using that, in the current situation, $g = X - \lambda_i$, $\deg g = 1$, and (with $V_i := \ker(A - \lambda_i \operatorname{Id})^{r_i}$),
\[
\forall_{s \in \mathbb{N}_0} \quad \dim \operatorname{Im} \epsilon_{A_{V_i}}(g^s) = \dim \ker(A - \lambda_i \operatorname{Id})^{r_i} - \dim \ker(A - \lambda_i \operatorname{Id})^s.
\]

(c) was already proved as it is merely a restatement of Th. 9.9(b). $\square$

Example 9.14. (a) In Ex. 9.12(a), we considered $V := F^6$ ($F$ some field) and $A \in \mathcal{L}(V,V)$ with
\[
\chi_A = (X - 2)^2 (X - 3)^4, \qquad \mu_A = (X - 2)^2 (X - 3)^3.
\]
We obtained $V = \bigoplus_{i=1}^{3} U_i$ with $U_1 \subseteq \ker(A - 2\operatorname{Id})^2$, $\dim U_1 = 2$, and $U_2, U_3 \subseteq \ker(A - 3\operatorname{Id})^3$ with $\dim U_2 = 3$, $\dim U_3 = 1$. Thus, the corresponding matrix in Jordan normal form is
\[
N = \begin{pmatrix}
2 & 1 & & & & \\
& 2 & & & & \\
& & 3 & 1 & & \\
& & & 3 & 1 & \\
& & & & 3 & \\
& & & & & 3
\end{pmatrix}.
\]

(b) Consider the matrices
\[
N_1 := \begin{pmatrix}
2 & 1 & & & & & & \\
& 2 & 1 & & & & & \\
& & 2 & & & & & \\
& & & 2 & 1 & & & \\
& & & & 2 & 1 & & \\
& & & & & 2 & & \\
& & & & & & 2 & \\
& & & & & & & 2
\end{pmatrix}, \qquad
N_2 := \begin{pmatrix}
2 & 1 & & & & & & \\
& 2 & 1 & & & & & \\
& & 2 & & & & & \\
& & & 2 & 1 & & & \\
& & & & 2 & & & \\
& & & & & 2 & 1 & \\
& & & & & & 2 & \\
& & & & & & & 2
\end{pmatrix}
\]
over some field $F$. Both matrices are in Jordan normal form with
\[
\chi_{N_1} = \chi_{N_2} = (X - 2)^8, \qquad \mu_{N_1} = \mu_{N_2} = (X - 2)^3.
\]
Both have the same total number of Jordan blocks, namely 4, which corresponds to
\[
\dim \ker(N_1 - 2\operatorname{Id}_8) = \dim \ker(N_2 - 2\operatorname{Id}_8) = 4.
\]
The differences appear in the generalized eigenspaces of higher rank: $N_1$ has two linearly independent generalized eigenvectors of rank 2, whereas $N_2$ has three linearly independent generalized eigenvectors of rank 2, yielding
\begin{align*}
\dim \ker(N_1 - 2\operatorname{Id}_8)^2 - \dim \ker(N_1 - 2\operatorname{Id}_8) &= 2, \ \text{i.e.} \ \dim \ker(N_1 - 2\operatorname{Id}_8)^2 = 6, \\
\dim \ker(N_2 - 2\operatorname{Id}_8)^2 - \dim \ker(N_2 - 2\operatorname{Id}_8) &= 3, \ \text{i.e.} \ \dim \ker(N_2 - 2\operatorname{Id}_8)^2 = 7.
\end{align*}
Next, $N_1$ has two linearly independent generalized eigenvectors of rank 3, whereas $N_2$ has one linearly independent generalized eigenvector of rank 3, yielding
\begin{align*}
\dim \ker(N_1 - 2\operatorname{Id}_8)^3 - \dim \ker(N_1 - 2\operatorname{Id}_8)^2 &= 2, \ \text{i.e.} \ \dim \ker(N_1 - 2\operatorname{Id}_8)^3 = 8, \\
\dim \ker(N_2 - 2\operatorname{Id}_8)^3 - \dim \ker(N_2 - 2\operatorname{Id}_8)^2 &= 1, \ \text{i.e.} \ \dim \ker(N_2 - 2\operatorname{Id}_8)^3 = 8.
\end{align*}
From (9.15a), we obtain (with $i = 1$)
\[
l_{N_1}(1, 3) = 2, \qquad l_{N_2}(1, 3) = 1,
\]
corresponding to $N_1$ having two blocks of size 3 and $N_2$ having one block of size 3. From (9.15b), we obtain (with $i = 1$)
\[
l_{N_1}(1, 2) = 8 - 4 - 2(3 - 1) = 0, \qquad l_{N_2}(1, 2) = 8 - 4 - 1(3 - 1) = 2,
\]
corresponding to $N_1$ having no blocks of size 2 and $N_2$ having two blocks of size 2. To check consistency, we use (9.15b) again to obtain
\[
l_{N_1}(1, 1) = 8 - 0 - 0(2 - 0) - 2(3 - 0) = 2, \qquad l_{N_2}(1, 1) = 8 - 0 - 2(2 - 0) - 1(3 - 0) = 1,
\]
corresponding to $N_1$ having two blocks of size 1 and $N_2$ having one block of size 1.

Remark 9.15. In the situation of Th. 9.13, we saw that, in order to find a matrix $N$ in Jordan normal form for $A$, according to Th. 9.13(b), in general, for each $i \in \{1, \dots, m\}$ and each $s \in \{1, \dots, r_i\}$, one has to know $\dim \ker(A - \lambda_i \operatorname{Id})^s$. On the other hand, given a matrix $M$ of $A$, one might also want to find the transition matrix $T \in \operatorname{GL}_n(F)$ such that $M = T N T^{-1}$. As it turns out, if one has already determined generalized eigenvectors forming bases of the $\ker(A - \lambda_i \operatorname{Id})^s$, one may use these same vectors for the columns of $T$: Indeed, if $M = T N T^{-1}$ and $t_1, \dots, t_n$ denote the columns of $T$, then, if $j \in \{1, \dots, n\}$ corresponds to a column of $N$ with $\lambda_i$ being the only nonzero entry, then
\[
M t_j = T N T^{-1} t_j = T N e_j = T \lambda_i e_j = \lambda_i t_j,
\]
showing $t_j$ to be a corresponding eigenvector. If $j \in \{1, \dots, n\}$ corresponds to a column of $N$ having nonzero entries $\lambda_i$ and 1 (above the $\lambda_i$), then
\[
(M - \lambda_i \operatorname{Id}_n)\, t_j = (T N T^{-1} - \lambda_i \operatorname{Id}_n)\, t_j = T N e_j - \lambda_i t_j = T(e_{j-1} + \lambda_i e_j) - \lambda_i t_j = t_{j-1} + \lambda_i t_j - \lambda_i t_j = t_{j-1},
\]
showing $t_j$ to be a generalized eigenvector of rank $\geq 2$ corresponding to the Jordan block containing the index $j$.
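The block counts in Example 9.14(b) can be checked mechanically. The following sympy sketch (our illustration) builds $N_1$, computes $\dim \ker(N_1 - 2\operatorname{Id})^s$ for $s = 1, 2, 3$, and recovers the numbers $l(1, s)$ via the recursion (9.15):

```python
import sympy as sp

# N_1 of Example 9.14(b): Jordan blocks of sizes 3, 3, 1, 1 for the eigenvalue 2.
N1 = sp.diag(sp.jordan_cell(2, 3), sp.jordan_cell(2, 3),
             sp.Matrix([[2]]), sp.Matrix([[2]]))

r = 3                                   # mu_{N_1} = (X - 2)**3
d = {0: 0}                              # d[s] = dim ker (N_1 - 2 Id)^s
for s in range(1, r + 1):
    d[s] = len(((N1 - 2 * sp.eye(8)) ** s).nullspace())
print([d[s] for s in (1, 2, 3)])        # [4, 6, 8], as computed above

l = {r: d[r] - d[r - 1]}                # (9.15a)
for s in range(r - 1, 0, -1):           # (9.15b)
    l[s] = d[r] - d[s - 1] - sum(l[j] * (j - (s - 1)) for j in range(s + 1, r + 1))
print(l)  # {3: 2, 2: 0, 1: 2}: two blocks of size 3, none of size 2, two of size 1
```

For a given matrix, sympy can also produce a transition matrix and the Jordan form directly via `P, J = M.jordan_form()`.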

10 Vector Spaces with Inner Products

In this section, the field F will always be R or C. As before, we write K if K may stand for R or C.

10.1 Definition, Examples

Definition 10.1. Let $X$ be a vector space over $\mathbb{K}$. A function $\langle \cdot, \cdot \rangle : X \times X \longrightarrow \mathbb{K}$ is called an inner product or a scalar product on $X$ if, and only if, the following three conditions are satisfied:

(i) $\langle x, x \rangle \in \mathbb{R}^+$ for each $0 \neq x \in X$ (i.e. an inner product is positive definite).

(ii) $\langle \lambda x + \mu y, z \rangle = \lambda \langle x, z \rangle + \mu \langle y, z \rangle$ for each $x, y, z \in X$ and each $\lambda, \mu \in \mathbb{K}$ (i.e. an inner product is $\mathbb{K}$-linear in its first argument).

(iii) $\langle x, y \rangle = \overline{\langle y, x \rangle}$ for each $x, y \in X$ (i.e. an inner product is conjugate-symmetric, even symmetric for $\mathbb{K} = \mathbb{R}$).

Lemma 10.2. For each inner product $\langle \cdot, \cdot \rangle$ on a vector space $X$ over $\mathbb{K}$, the following formulas are valid:

(a) $\langle x, \lambda y + \mu z \rangle = \bar{\lambda} \langle x, y \rangle + \bar{\mu} \langle x, z \rangle$ for each $x, y, z \in X$ and each $\lambda, \mu \in \mathbb{K}$, i.e. $\langle \cdot, \cdot \rangle$ is conjugate-linear in its second argument, even linear for $\mathbb{K} = \mathbb{R}$. Together with Def. 10.1(ii), this means that $\langle \cdot, \cdot \rangle$ is a sesquilinear form, even a bilinear form for $\mathbb{K} = \mathbb{R}$.

(b) $\langle 0, x \rangle = \langle x, 0 \rangle = 0$ for each $x \in X$.

Proof. (a): One computes, for each $x, y, z \in X$ and each $\lambda, \mu \in \mathbb{K}$,
\[
\langle x, \lambda y + \mu z \rangle \overset{\text{Def. 10.1(iii)}}{=} \overline{\langle \lambda y + \mu z, x \rangle} \overset{\text{Def. 10.1(ii)}}{=} \overline{\lambda \langle y, x \rangle + \mu \langle z, x \rangle} = \bar{\lambda}\, \overline{\langle y, x \rangle} + \bar{\mu}\, \overline{\langle z, x \rangle} \overset{\text{Def. 10.1(iii)}}{=} \bar{\lambda} \langle x, y \rangle + \bar{\mu} \langle x, z \rangle.
\]

(b): One computes, for each $x \in X$,
\[
\langle x, 0 \rangle \overset{\text{Def. 10.1(iii)}}{=} \overline{\langle 0, x \rangle} = \overline{\langle 0 x, x \rangle} \overset{\text{Def. 10.1(ii)}}{=} \overline{0\, \langle x, x \rangle} = 0,
\]
thereby completing the proof of the lemma. $\square$

Remark 10.3. If $X$ is a vector space over $\mathbb{K}$ with an inner product $\langle \cdot, \cdot \rangle$, then the map
\[
\|\cdot\| : X \longrightarrow \mathbb{R}_0^+, \quad \|x\| := \sqrt{\langle x, x \rangle},
\]
defines a norm on $X$ (cf. [Phi16b, Prop. 1.65]). One calls this the norm induced by the inner product.

Definition 10.4. Let $X$ be a vector space over $\mathbb{K}$. If $\langle \cdot, \cdot \rangle$ is an inner product on $X$, then $(X, \langle \cdot, \cdot \rangle)$ is called an inner product space or a pre-Hilbert space. An inner product space is called a Hilbert space if, and only if, $(X, \|\cdot\|)$ is a Banach space, where $\|\cdot\|$ is the induced norm, i.e. $\|x\| := \sqrt{\langle x, x \rangle}$. Frequently, the inner product on $X$ is understood and $X$ itself is referred to as an inner product space or Hilbert space.

Example 10.5. (a) On the space $\mathbb{K}^n$, $n \in \mathbb{N}$, we define an inner product by letting, for each $z = (z_1, \dots, z_n) \in \mathbb{K}^n$, $w = (w_1, \dots, w_n) \in \mathbb{K}^n$:
\[
z \cdot w := \sum_{j=1}^{n} z_j \bar{w}_j \tag{10.1}
\]
(called the standard inner product on $\mathbb{K}^n$, also the Euclidean inner product for $\mathbb{K} = \mathbb{R}$). Let us verify that (10.1), indeed, defines an inner product in the sense of Def. 10.1: If $z \neq 0$, then there is $j_0 \in \{1, \dots, n\}$ such that $z_{j_0} \neq 0$. Thus, $z \cdot z = \sum_{j=1}^{n} |z_j|^2 \geq |z_{j_0}|^2 > 0$, i.e. Def. 10.1(i) is satisfied. Next, let $z, w, u \in \mathbb{K}^n$ and $\lambda, \mu \in \mathbb{K}$. One computes
\[
(\lambda z + \mu w) \cdot u = \sum_{j=1}^{n} (\lambda z_j + \mu w_j) \bar{u}_j = \lambda \sum_{j=1}^{n} z_j \bar{u}_j + \mu \sum_{j=1}^{n} w_j \bar{u}_j = \lambda (z \cdot u) + \mu (w \cdot u),
\]
i.e. Def. 10.1(ii) is satisfied. For Def. 10.1(iii), merely note that
\[
z \cdot w = \sum_{j=1}^{n} z_j \bar{w}_j = \overline{\sum_{j=1}^{n} w_j \bar{z}_j} = \overline{w \cdot z}.
\]
Hence, we have shown that (10.1) defines an inner product according to Def. 10.1. Due to [Phi16b, Prop. 1.59(b)], the induced norm is complete, i.e. $\mathbb{K}^n$ with the inner product of (10.1) is a Hilbert space.

(b) Let $a, b \in \mathbb{R}$, $a < b$, and let $X := C([a,b], \mathbb{K})$ be the vector space of continuous $\mathbb{K}$-valued functions on $[a,b]$. We define
\[
\langle \cdot, \cdot \rangle : X \times X \longrightarrow \mathbb{K}, \quad \langle f, g \rangle := \int_a^b f \bar{g}. \tag{10.2}
\]
We verify that (10.2), indeed, defines an inner product: If $f \in X$, $f \not\equiv 0$, then there exists $t \in [a,b]$ such that $|f(t)| = \alpha \in \mathbb{R}^+$. As $f$ is continuous, there exists $\delta > 0$ such that
\[
\forall_{s \in [a,b] \cap [t-\delta, t+\delta]} \quad |f(s)| > \frac{\alpha}{2},
\]
implying
\[
\langle f, f \rangle = \int_a^b |f|^2 \geq \int_{[a,b] \cap [t-\delta, t+\delta]} |f|^2 > 0.
\]
Next, let $f, g, h \in X$ and $\lambda, \mu \in \mathbb{K}$. One computes
\[
\langle \lambda f + \mu g, h \rangle = \int_a^b (\lambda f + \mu g)\, \bar{h} = \lambda \int_a^b f \bar{h} + \mu \int_a^b g \bar{h} = \lambda \langle f, h \rangle + \mu \langle g, h \rangle.
\]
Moreover,
\[
\forall_{f,g \in X} \quad \langle f, g \rangle = \int_a^b f \bar{g} = \overline{\int_a^b \bar{f} g} = \overline{\langle g, f \rangle},
\]
showing $\langle \cdot, \cdot \rangle$ to be an inner product on $X$. As it turns out, $(X, \|\cdot\|)$ is an example of an inner product space that is not a Hilbert space: With respect to the norm induced by $\langle \cdot, \cdot \rangle$ (usually called the 2-norm, denoted $\|\cdot\|_2$), $X$ is dense in $L^2[a,b]$ (the space of square-integrable functions with respect to Lebesgue (or Borel) measure on $[a,b]$, cf., e.g., [Phi17a, Th. 2.49(a)]), which is the completion of $X$ with respect to $\|\cdot\|_2$.

10.2 Preserving Norm, Metric, Inner Product

The following Th. 10.6 provides important relations between the inner product and its induced norm and metric:

Theorem 10.6. Let $(X, \langle \cdot, \cdot \rangle)$ be an inner product space over $\mathbb{K}$ with induced norm $\|\cdot\|$. Then the following assertions hold true:

(a) Cauchy-Schwarz Inequality:
\[
\forall_{x,y \in X} \quad |\langle x, y \rangle| \leq \|x\|\, \|y\|.
\]

(b) Parallelogram Law:

\[
\forall_{x,y \in X} \quad \|x + y\|^2 + \|x - y\|^2 = 2\big(\|x\|^2 + \|y\|^2\big). \tag{10.3}
\]

(c) Metric Expressed via Norm and Inner Product:

\[
\forall_{x,y \in X} \quad \|x - y\|^2 = \|x\|^2 - \langle x, y \rangle - \langle y, x \rangle + \|y\|^2. \tag{10.4}
\]

(d) If K = R, then

\[
\forall_{x,y \in X} \quad \langle x, y \rangle = \frac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2\big) \overset{(10.3)}{=} \frac{1}{2}\big(\|x\|^2 + \|y\|^2 - \|x - y\|^2\big). \tag{10.5}
\]
If $\mathbb{K} = \mathbb{C}$, then
\[
\forall_{x,y \in X} \quad \langle x, y \rangle = \frac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2 + i \|x + iy\|^2 - i \|x - iy\|^2\big).
\]

Proof. (a) was proved as [Phi16b, Th. 1.64].

(b),(c): The computation
\[
\|x - y\|^2 = \langle x - y, x - y \rangle = \|x\|^2 - \langle x, y \rangle - \langle y, x \rangle + \|y\|^2
\]
proves (c), whereas
\[
\|x + y\|^2 + \|x - y\|^2 = \|x\|^2 + \langle x, y \rangle + \langle y, x \rangle + \|y\|^2 + \|x\|^2 - \langle x, y \rangle - \langle y, x \rangle + \|y\|^2 = 2\big(\|x\|^2 + \|y\|^2\big)
\]
proves (10.3).

(d): If $\mathbb{K} = \mathbb{R}$, then
\[
\|x + y\|^2 - \|x - y\|^2 = 4 \langle x, y \rangle.
\]
If $\mathbb{K} = \mathbb{C}$, then
\[
\|x + y\|^2 - \|x - y\|^2 + i \|x + iy\|^2 - i \|x - iy\|^2 = 4 \operatorname{Re} \langle x, y \rangle + 4i \operatorname{Re} \langle x, iy \rangle = 4 \operatorname{Re} \langle x, y \rangle + 4i \operatorname{Im} \langle x, y \rangle = 4 \langle x, y \rangle,
\]
proving (d). $\square$
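Both the parallelogram law (10.3) and the complex polarization identity of (d) are easy to verify numerically for the standard inner product (10.1); a short numpy sketch (our illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

ip = lambda u, v: np.sum(u * np.conj(v))   # standard inner product (10.1)
nsq = lambda u: ip(u, u).real              # ||u||**2

# Complex polarization identity of Th. 10.6(d):
rhs = (nsq(x + y) - nsq(x - y) + 1j * nsq(x + 1j * y) - 1j * nsq(x - 1j * y)) / 4
print(np.isclose(ip(x, y), rhs))           # True

# Parallelogram law (10.3):
print(np.isclose(nsq(x + y) + nsq(x - y), 2 * (nsq(x) + nsq(y))))  # True
```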

One can actually also show (with more effort) that a normed space that satisfies (10.3) must be an inner product space, see, e.g., [Wer11, Th. V.1.7].

Definition 10.7. Let $(X, \|\cdot\|)$, $(Y, \|\cdot\|)$ be normed vector spaces over $\mathbb{K}$, $f : X \longrightarrow Y$.

(a) One calls $f$ norm-preserving if, and only if,
\[
\forall_{v \in X} \quad \|f(v)\| = \|v\|.
\]

(b) One calls f isometric if, and only if,

\[
\forall_{u,v \in X} \quad \|f(u) - f(v)\| = \|u - v\|
\]
(i.e. if, and only if, $f$ preserves the metric induced by the norm on $X$).

(c) If the norms on $X$ and $Y$ are induced via inner products $\langle \cdot, \cdot \rangle$ on $X$ and $Y$, respectively, $\|v\|^2 = \langle v, v \rangle$, then one calls $f$ inner product-preserving if, and only if,
\[
\forall_{u,v \in X} \quad \langle f(u), f(v) \rangle = \langle u, v \rangle.
\]

—

While, in general, neither norm-preserving nor isometric implies any of the other properties defined in Def. 10.7 (cf. Ex. 10.9(a),(b) below), there exist simple as well as subtle relationships between these notions and also relationships with linearity:

Theorem 10.8. Let $(X, \langle \cdot, \cdot \rangle)$, $(Y, \langle \cdot, \cdot \rangle)$ be inner product spaces over $\mathbb{K}$, $f : X \longrightarrow Y$. We consider $X, Y$ with the respective induced norms and metrics, $\|v\|^2 = \langle v, v \rangle$, $d(u,v) = \|u - v\|$.

(a) If $f$ is inner product-preserving, then $f$ is norm-preserving, isometric, and linear.

(b) If $\mathbb{K} = \mathbb{R}$ and $f$ is isometric as well as norm-preserving, then $f$ is inner product-preserving (this does, in general, not hold for $\mathbb{K} = \mathbb{C}$, see Ex. 10.9(c)).

(c) If $f$ is isometric with $f(0) = 0$, then $f$ is norm-preserving and, for $\mathbb{K} = \mathbb{R}$, also inner product-preserving (however, once again, cf. Ex. 10.9(c) for $\mathbb{K} = \mathbb{C}$).

(d) If $\mathbb{K} = \mathbb{R}$ and $f$ is isometric, then $f$ is affine (i.e. $x \mapsto f(x) - f(0)$ is linear, cf. Def. 1.26)$^7$. As in (b) and (c), Ex. 10.9(c) below shows that the result does not extend to $\mathbb{K} = \mathbb{C}$.

(e) If f is linear, then the following statements are equivalent (where the equivalence between (i) and (ii) even holds in arbitrary normed vector spaces X,Y over K):

(i) f is isometric.

7This result holds in more general situations: If X and Y are arbitrary normed vector spaces over R and f : X Y is isometric, then f must be linear, provided Y is strictly convex (cf. [FJ03, Th. 1.3.8] and see−→ Ex. 10.9(e) for a definition of strictly convex spaces) or provided f is surjective (this is the Mazur-Ulam Theorem, cf. [FJ03, Th. 1.3.5]). However, there exist nonlinear isometries into spaces that are not strictly convex, cf. Ex. 10.9(e). 10 VECTOR SPACES WITH INNER PRODUCTS 141

(ii) f is norm-preserving. (iii) f is inner product-preserving.

Proof. (a): Assume f to be inner product-preserving and let u,v X, λ K. Then, ∈ ∈ f(v) 2 = f(v),f(v) = v,v = v 2, k k h i h i k k showing f to be norm-preserving. In consequence,

(10.4) f(u) f(v) 2 = f(u) 2 f(u),f(v) f(v),f(u) + f(v) 2 k − k k k −h i−h i k k = u 2 u,v v,u + v 2, k k −h i−h i k k showing f to be isometric. Moreover,

f(λu) λf(u) 2 = f(λu) λf(u), f(λu) λf(u) k − k h − − i = f(λu), f(λu) f(λu), λf(u) λf(u), f(λu) + λf(u), λf(u) h i−h i−h i h i = λu, λu λ f(λu), f(u) λ f(u), f(λu) + λλ f(u), f(u) h i− h i− h i h i = λλ u, u λλ u, u λλ u, u + λλ u, u =0, h i− h i− h i h i proving f(λu)= λf(u), and

f(u + v) (f(u)+ f(v)) 2 = f(u + v) (f(u)+ f(v)), f(u + v) (f(u)+ f(v)) k − k h − − i = f(u + v), f(u + v) f(u + v), (f(u)+ f(v)) h i−h i (f(u)+ f(v)), f(u + v) + (f(u)+ f(v)), (f(u)+ f(v)) −h i h i = u + v, u + v f(u + v), f(u) f(u + v), f(v) h i−h i−h i f(u), f(u + v) f(v), f(u + v) −h i−h i + f(u), f(u) + f(u), f(v) + f(v), f(u) + f(v), f(v) h i h i h i h i = u, u + u, v + v, u + v, v h i h i h i h i u, u v, u u, v v, v −h i−h i−h i−h i u, u u, v v, u v, v −h i−h i−h i−h i + u, u + u, v + v, u + v, v =0, h i h i h i h i proving f(u + v)= f(u)+ f(v). (b) is immediate from (10.5). (c): If f is isometric with f(0) = 0, then f is always norm-preserving due to

f(x) = f(x) 0 = f(x) f(0) = x 0 = x x∈∀X k k k − k k − k k − k k k 10 VECTOR SPACES WITH INNER PRODUCTS 142

(this works in arbitrary normed vector spaces X,Y over R or C). According to (b), f is then also inner product-preserving. (d): Let f be isometric. To show f is affine, it suffices to show

F : X Y, F (x) := f(x) f(0), −→ − is linear. Due to

F (u) F (v) = f(u) f(v) = u v , u,v∀∈X k − k k − k k − k F is isometric as well. Thus, by (c), F is inner product-preserving, which, by (a), implies F to be linear, as desired. (e): (iii) (i),(ii) holds due to (a). Next, we show (i) (ii): As f is linear, we have ⇒ ⇔ (ii) f(u) = u ⇔u∈ ∀X k k k k f(u) f(v) = f(u v) = u v (i). ⇔u,v ∀∈X − − k − k ⇔

It remains to prove (ii) (iii). To this end, let u,v X. Then (ii) and the linearity of f imply ⇒ ∈

f(u),f(u) + f(u),f(v) + f(v),f(u) + f(v),f(v) = f(u)+ f(v),f(u)+ f(v) h i h i h i h i h i = f(u + v),f(u + v) = f(u + v) 2 = u + v 2 h i k k k k = u + v,u + v = u,u + u,v + v,u + v,v h i h i h i h i h i and, thus, using (ii) again,

f(u),f(v) + f(v),f(u) = u,v + v,u . h i h i h i h i Similarly,

f(u),f(u) i f(u),f(v) + i f(v),f(u) + f(v),f(v) = U(u + iv), U(u + iv) h i− h i h i h i h i = u + iv,u + iv = u,u i u,v + i v,u + v,v h i h i− h i h i h i and, thus, f(u),f(v) f(v),f(u) = u,v v,u . h i−h i h i−h i Adding both results and dividing by 2 then yields (iii). 

Example 10.9. Let (X, ), (Y, ) be normed vector spaces over K. k · k k · k 10 VECTOR SPACES WITH INNER PRODUCTS 143

(a) If 0 = a X, then the translation 6 ∈ f : X X, f(x) := x + a, −→ is isometric due to

f(u) f(v) = u + a (v + a) = u v . u,v∀∈X k − k k − k k − k However, f is not norm-preserving due to

f(0) = a =0, k k k k 6 and, if , is an inner product on X, then f is not inner product-preserving due to h· ·i f(0),f(0) = a,a =0. h i h i 6 (b) The following maps g and h are norm-preserving, but neither continuous (with respect to the induced metric) nor linear. Thus, according to [Phi16b, Lem. 2.32(b)], the maps are not isometric and, using Th. 10.8(a), they are not inner product- preserving (if , is an inner product on X). To define g, let 0 = a X and set h· ·i 6 ∈ x for x = a, g : X X, g(x) := 6 −→ a for x = a. (− Clearly, g is norm-preserving, discontinuous in a, and nonlinear (e.g. 2a = g(2a) = 2a =2g(a)). The map 6 − x for Re x Q and Im x Q, h : K K, h(x) := ∈ ∈ −→ x otherwise, (− is, clearly, norm-preserving, nowhere continuous, except in 0, and nonlinear (e.g. 1 √2= h(1 + √2) =1 √2= h(1) + h(√2)). − − 6 − (c) The map f : C C, f(z) := z, −→ satisfies f(0) = 0, is surjective, and it is norm-preserving as well as isometric due to z = z and z w = z w , but neither inner-product preserving nor C-linear | | | | | − | | − | 2 (it is conjugate linear due to f(wz) = wf(z)). If we consider (C , 2), then the map k · k g : C2 C2, g(z,w):=(z, w), −→ 10 VECTOR SPACES WITH INNER PRODUCTS 144

2 satisfies g(0) = 0, is surjective, norm-preserving (due to g(z,w) 2 = zz + ww = (z,w) 2), and isometric (due to k k k k2 g(z,w) g(u,v) 2 = (z u, w v) 2 = z u 2 + w v 2 = z u 2 + w v 2 k − k2 k − − k2 | − | | − | | − | | − | = (z,w) (u,v) 2), k − k2 but neither C-linear nor conjugate linear. (d) Let (X, )=(R, ) and (Y, )=(R2, ). Then the map k · k |·| k · k k · k∞ f : R R2, f(x):=(x, sin x), −→ satisfies f(0) = (0, 0), is isometric (due to

f(x) f(y) = max x y , sin x sin y = x y , k − k∞ {| − | | − |} | − | recalling sin to be 1-Lipschitz), but not linear. (e) The (Y, ) over R is called strictly convex if, and only if, there do not exist vectors u,vk ·Y k with u = v and such that ∈ 6 αu + (1 α) v =1 (10.6) α∈∀[0,1] −

(if is induced by an inner product on Y , then (Y, ) is strictly convex; whereas nk·k k·k (R , ∞) is not strictly convex for n> 1). If (Y, ) is not strictly convex and u,v kY · kwith u = v are such that (10.6) holds, thenk · k ∈ 6 su for s 1, f : R Y, f(s) := ≤ −→ u +(s 1) v for s> 1, ( − satisfies f(0) = (0, 0) and is isometric: To prove isometry, note

su tu = s t for s,t 1, f(s) f(t) = k − k | − | ≤ k − k u +(s 1)v u (t 1)v = s t for 1 < s,t, (k − − − − k | − | and, for s 1

− since 0 t−1 , 1−s 1 and t−1 + 1− s = 1. However, f is not linear, e.g., since ≤ t−s t−s ≤ t−s t− s u + v = f(2) = f(1+1) = f(1) + f(1) = 2u. 6 10 VECTOR SPACES WITH INNER PRODUCTS 145

10.3 Orthogonality

Definition 10.10. Let X, , be an inner product space over K. h· ·i (a) x,y X are called orthogonal or perpendicular (denoted x y) if, and only if, x,y∈= 0. ⊥ h i (b) Let E X. Define the perpendicular space E⊥ to E (called E perp) by ⊆ E⊥ := y X : x,y =0 . (10.7) ∈ x∈∀E h i   Caveat: As is common, we use the same symbol to denote the perpendicular space that we used to denote the forward annihilator in Def. 2.13, even though these objects are not the same: The perpendicular space is a subset of X, whereas the forward annihilator is a subset of X′. In the following, when dealing with inner product spaces, E⊥ will always mean the perpendicular space. (c) If X = V V with subspaces V ,V of X, then we call X the orthogonal sum of 1 ⊕ 2 1 2 V1,V2 if, and only if, v1 v2 for each v1 V1, v2 V2. In this case, we also write X = V V . ⊥ ∈ ∈ 1 ⊥ 2 (d) Let S X. Then S is an orthogonal system if, and only if, x y for each x,y S with x⊆= y.A unit vector is x X such that x = 1⊥ (with respect to∈ the induced6 norm on X). Then S is called∈ an orthonormalk k system if, and only if, S is an orthogonal system consisting entirely of unit vectors. Finally, S is called an orthonormal basis if, and only if, it is a maximal orthonormal system in the sense that, if S T X and T is an orthonormal system, then S = T (caveat: if X is an infinite-dimensional⊆ ⊆ Hilbert space, then an orthonormal basis of X is not(!) a vector space basis of X). Lemma 10.11. Let X, , be an inner product space over K, E X. h· ·i ⊆ (a) E E⊥ 0 .  ∩ ⊆ { } (b) E⊥ is a subspace of X. (c) X⊥ = 0 and 0 ⊥ = X. { } { } Proof. (a): If x E E⊥, then x,x = 0, implying x = 0. ∈ ∩ h i (b): We have 0 E⊥ and ∈

x,λy1 + µy2 = λ x,y1 + µ x,y2 =0, ⊥ λ,µ∀∈K y1,y2∀∈E x∈∀E h i h i h i 10 VECTOR SPACES WITH INNER PRODUCTS 146

showing λy + µy E⊥, i.e. E⊥ is a subspace of X. 1 2 ∈ (c): Is x X⊥, then 0= x,x , implying x = 0. On the other hand, x, 0 = 0 for each x X by∈ Lem. 10.2(b). h i h i  ∈ Proposition 10.12. Let X, , be an inner product space over K and let S X be an orthogonal system. h· ·i ⊆  (a) S 0 is linearly independent. \ { } (b) Pythagoras’ Theorem: If s ,...,s S are distinct, n N, then 1 n ∈ ∈ n 2 n 2 si = si , k k i=1 i=1 X X where is the induced norm on X. k · k

Proof. (a): Suppose n N and λ1,...,λn K together with s1,...,sn S 0 distinct are such that n λ s∈= 0. Then, as s ,s∈ = 0 for each k = j, we obtain∈ \{ } i=1 i i h k ji 6 P n n 0= 0,sj = λisi,sj = λi si,sj = λj sj,sj , j∈{1∀,...,n} h i h i h i * i=1 + i=1 X X which yields λj = 0 by Def. 10.1(i). Thus, we have shown that λj = 0 for each j 1,...,n , which establishes the linear independence of S 0 . ∈ { } \ { } (b): We compute

n 2 n n n n n n 2 si = si, sj = si,sj = si,si = si , * + h i h i k k i=1 i=1 j=1 i=1 j=1 i=1 i=1 X X X X X X X thereby establishing the case. 

To obtain orthogonal systems and orthonormal systems in inner product spaces, one can apply the algorithm provided by the following Th. 10.13: Theorem 10.13 (Gram-Schmidt Orthogonalization). Let X, , be an inner product space over K with induced norm . Let x ,x ,... be a finiteh· ·i or infinite sequence of k · k 0 1  vectors in X. Define v0,v1,... recursively as follows:

n−1 xn,vk v0 := x0, vn := xn h 2i vk (10.8) − vk k=0, k k vXk6=0 10 VECTOR SPACES WITH INNER PRODUCTS 147

for each n N, additionally assuming that n is less than or equal to the max index of ∈ the sequence x0,x1,... if the sequence is finite. Then the set v0,v1,... constitutes an orthogonal system. Of course, by omitting the v = 0 and by{ dividing} each v = 0 by k k 6 its norm, one can also obtain an orthonormal system (nonempty if at least one vk =0). Moreover, v =0 if, and only if, x span x ,...,x . In particular, if the x ,x6 ,... n n ∈ { 0 n−1} 0 1 are all linearly independent, then so are the v0,v1,... .

Proof. We show by induction on n N , that, for each 0 m 0 and 0 m

Example 10.14. In the space C[ 1, 1] with the inner product according to Ex. 10.5(b), consider − i xi : [ 1, 1] K, xi(x) := x . i∈∀N0 − −→ We check that the first four orthogonal polynomials resulting from applying (10.8) to x0,x1,... are given by 1 3 v (x)=1, v (x)= x, v (x)= x2 , v (x)= x3 x. 0 1 2 − 3 3 − 5 One has v = x 1 and, then, obtains successively from (10.8): 0 0 ≡ 1 x1,v0 x1,v0 −1 x dx v1(x)= x1(x) h 2iv0(x)= x h 2i = x 1 = x, − v0 − v0 − dx k k k k R −1 R 10 VECTOR SPACES WITH INNER PRODUCTS 148

1 2 1 3 x2,v0 x2,v1 2 −1 x dx −1 x dx v2(x)= x2(x) h 2iv0(x) h 2iv1(x)= x 1 x − v0 − v1 − 2 − x2 dx k k k k R R−1 1 = x2 . R − 3 x ,v x ,v x ,v v (x)= x (x) h 3 0iv (x) h 3 1iv (x) h 3 2iv (x) 3 3 − v 2 0 − v 2 1 − v 2 2 k 0k k 1k k 2k 1 1 1 x3 dx x4 dx x3 x2 1 dx 1 = x3 −1 −1 x −1 − 3 x2 1 1 1 2 − R 2 − R x2 dx − R x2 dx − 3 −1 −1 − 3   2 3 5 3 3R R  = x 2 x = x x. − 3 − 5 Definition 10.15. Let X, , and Y, , be inner product spaces over K. We call X and Y isometrically isomorphich· ·i if, andh· ·i only if, there exists an isometric linear   isomorphism A (X,Y ) (i.e., by Th. 10.8(e), a linear isomorphism A, satisfying Au, Av = u,v ∈for L each u,v X). h i h i ∈ Theorem 10.16. Let X, , be a finite-dimensional inner product space over K. h· ·i  (a) An orthonormal system S X is an orthonormal basis of X if, and only if, S is a vector space basis of X. ⊆

(b) X has an orthonormal basis.

(c) If Y, , is an inner product space over K, then X and Y are isometrically isomorphich· ·i if, and only if, dim X = dim Y .  (d) If U is a subspace of X, then the following holds:

(i) X = U U ⊥. ⊥ (ii) dim U ⊥ = dim X dim U. − (iii) (U ⊥)⊥ = U.

Proof. (a): Let S be an orthonormal system. If S is an orthonormal basis, then, by Prop. 10.12(a) and Th. 10.13, S is a maximal linearly independent subset of X, showing S to be a vector space basis of X. Conversely, if S is a vector space basis of X, then, again using Prop. 10.12(a), S must be a maximal orthonormal system, i.e. an orthonormal basis of X. (b): According to Th. 10.13, applying Gram-Schmidt orthogonalization to a vector space basis of X yields an orthonormal basis of X. 10 VECTOR SPACES WITH INNER PRODUCTS 149

(c): We already know that the existence of a linear isomorphism between X and Y implies dim X = dim Y . Conversely, assume n := dim X = dim Y N, and let BX = x ,...,x , B = y ,...,y be orthonormal bases of X,Y , respectively.∈ Define { 1 n} Y { 1 n} A (X,Y ) by letting Axi = yi for each i 1,...,n . As A(BX )= BY , A is a linear isomorphism.∈L Moreover, if λ ,...,λ ,µ ,...,µ∈ { K, then} 1 n 1 n ∈ n n n n n n A λ x ,A µ x = λ µ y ,y = λ µ δ i i j j i jh i ji i j ij * i=1 ! j=1 !+ i=1 j=1 i=1 j=1 X X X X X X n n n n = λ µ x ,x = λ x , µ x , i jh i ji i i j j i=1 j=1 * i=1 j=1 + X X X X showing A to be isometric. (d): Let x ,...,x be a basis of X such that x ,...,x , 1 m n, is a basis { 1 n} { 1 m} ≤ ≤ of U, dim X = n, dim U = m. In this case, if v1,...,vn are given by Gram-Schmidt orthogonalization according to Th. 10.13, then Th. 10.13 yields v1,...,vn to be a { ⊥} basis of X and v1,...,vm to be a basis of U. Since vm+1,...,vn U , we have X = U + U ⊥.{ As we also} know U U ⊥ = 0 by Lem.{ 10.11(a),} ⊆we have shown ∩ { } X = U U ⊥, then yielding dim U ⊥ = dim X dim U as well. As a consequence of (ii), we have⊥ dim(U ⊥)⊥ = dim U, which, together− with U (U ⊥)⊥, yields (U ⊥)⊥ = U.  ⊆ Using Zorn’s lemma, one can extend Th. 10.16(b) to infinite-dimensional spaces (cf. [Phi17b, Th. 4.31(a)]). However, as remarked before, an orthonormal basis of a Hilbert space X is a vector space basis of X if, and only if, dim X < (cf. [Phi17b, Rem. 4.33]). Moreover, Th. 10.16(c) also extends to infinite-dimensional∞ Hilbert spaces, which are isometrically isomorphic if, and only if, they have orthonormal bases of the same cardinality (cf. [Phi17b, Th. 4.31(c)]). If one adds the assumption that the subspace U be closed (with respect to the induced norm), then Th. 10.16(c) extends to infinite- dimensional Hilbert spaces as well (cf. [Phi17b, Th. 4.20(e),(f)]). If X is a finite-dimensional inner product space, then all linear forms on X (i.e. all elements of the dual X′) are given by means of the inner product on X:

Theorem 10.17. Let X, , be a finite-dimensional inner product space over K. Then the map h· ·i  ψ : X X′, ψ(y) := α , (10.9) −→ y where α : X K, α (a) := a,y , (10.10) y −→ y h i is bijective and conjugate-linear (in particular, each α X′ can be represented by y X, and, if K = R, then ψ is a linear isomorphism). ∈ ∈ 10 VECTOR SPACES WITH INNER PRODUCTS 150

Proof. According to Def. 10.1(ii), for each y X, αy is linear, i.e. ψ is well-defined. Moreover, ∈ ψ(λy + µy )(a)= a,λy + µy = λ a,y + µ a,y 1 2 h 1 2i h 1i h 2i y ,y ∈X a∈X λ,µ∀∈K 1 ∀2 ∀ =(λψ(y1)+ µψ(y2))(a), showing ψ to be conjugate-linear. It remains to show ψ is bijective. Let x ,...,x { 1 n} be a basis of X, dim X = n. It suffices to show that B := αx1 ,...,αxn is a basis of X′. As dim X = dim X′, it even suffices to show that B is linearly{ independent.} To this end, let λ ,...,λ K be such that 0 = n λ α . Then, for each v X, 1 n ∈ i=1 i xi ∈ n n P n n 0= λ α v = λ α v = λ v,x = v, λ x , i xi i xi i h ii i i i=1 ! i=1 i=1 * i=1 + X X X X showing n λ x X⊥ = 0 . As the x are linearly independent, this yields λ = i=1 i i ∈ { } i 1 = λn = 0 and the desired linear independence of B.  ··· P Remark 10.18. The above Th. 10.17 is a finite-dimensional version of the Riesz rep- resentation theorem for Hilbert spaces, which states that Th. 10.17 remains true if X ′ ′ is an infinite-dimensional Hilbert space and X is replaced by the topological dual Xtop of X, consisting of all linear forms on X that are also continuous with respect to the induced norm on X, cf. [Phi17b, Ex. 3.1] (recall that, if X is finite-dimensional, then all linear forms on X are automatically continuous, cf. [Phi16b, Ex. 2.16]). Proposition 10.19. (a) If M =(m ) (n, C), n N, is such that kl ∈M ∈ x∗Mx =0, (10.11) x∈∀Cn then M =0. (b) If X, , is a finite-dimensional inner product space over C and A (X,X) is such thath· ·i ∈L  Ax, x =0, (10.12) x∈∀X h i then A =0. Caveat: This result does not extend to finite-dimensional vector spaces over R (cf. Ex. 10.20 below).

Proof. (a): From matrix multiplication, we have

n l=1 m1lxl n n x∗Mx = x ... x . = m x x . 1 n P .  kl k l n k=1 l=1 mnlxl   l=1  X X   P 10 VECTOR SPACES WITH INNER PRODUCTS 151

Choosing x to be the standard basis vector e , α 1,...,n , we obtain α ∈ { } ∗ 0= eαMeα = mαα.

Now let α, β 1,...,n with α = β and assume mαβ = a + bi, mβα = c + di with a,b,c,d R. If∈x { := (s +}ti)e + e 6 , then ∈ α β n n 0= m x x = m x x + m x x =(a + bi)(s ti)+(c + di)(s + ti) kl k l αβ α β βα β α − Xk=1 Xl=1 = as + bt + cs dt +(bs at + ct + ds)i − − =(a + c)s +(b d)t + (b + d)s +(c a)t i, − − implying  (a + c)s +(b d)t =0 (b + d)s +(c a)t =0. − ∧ − Choosing s = 0 and t = c a yields c = a; choosing s = a + c and t = 0 then yields a + c =2a = 0, implying a =0=− c. Likewise, choosing s = 0 and t = b d yields b = d; then choosing s = b + d and t = 0 yields b + d = 2b = 0, implying b −=0= d. Thus, mαβ = mβα = 0, completing the proof that M = 0. (b): Let n := dim X N. According to Th. 10.16(c), there exists a linear isomorphism I : X Cn such that∈ −→ x,y = Ix,Iy 2, x,y∀∈X h i h i n n where , 2 denotes the standard inner product on C (i.e. u,v 2 = k=1 ukvk). Thus, if we leth· B·i := I A I−1, then B (Cn, Cn) and h i ◦ ◦ ∈L P −1 −1 −1 −1 Bu,u 2 = (I A I )u, (I I )u = (A(I u),I u =0. u∈∀Cn h i ◦ ◦ ◦ 2

Now, if M (n, C) represents B with respect to the standard basis of Cn, then, for n∈M∗ each u C , u Mu = Mu,u 2 = Bu,u 2 = 0, such that M = 0 by (a). Thus, B = 0, also implying∈ A = I−1 h B Ii= 0.h i  ◦ ◦ Example 10.20. Consider Rn, n N, n 2, with the standard inner product (i.e. n ∈ ≥ u,v = k=1 ukvk), and the standard basis e1,...,en . Suppose the linear map A : h n i n { } R R is defined by Ae1 := e2, Ae2 := e1, Aek := 0 for each k 1,...,n 1, 2 . Then−→A =P 0, but − ∈ { }\{ } 6

Ax, x = x2e1 x1e2,x = x2x1 x1x2 =0. n x=(x1,...,x∀ n)∈R h i h − i − Proposition 10.35 below will provide more thorough information on the real case in comparison with Prop. 10.19. 10 VECTOR SPACES WITH INNER PRODUCTS 152

10.4 The Adjoint Map

Definition 10.21. Let X1, , , X2, , be a finite-dimensional inner product h· ·i h· ·i ′ ′ ′ space over K. Moreover, let A (X1,X2), let A (X2,X1) be the dual map ∈ L ′ ∈ L ′ according to Def. 2.26, and let ψ1 : X1 X1, ψ2 : X2 X2 be the maps given by the Th. 10.17. Then the map −→ −→

A∗ : X X ,A∗ := ψ−1 A′ ψ , (10.13) 2 −→ 1 1 ◦ ◦ 2 is called the adjoint map of A.

Theorem 10.22. Let X1, , , X2, , be finite-dimensional inner product spaces over K. Let A (X ,X ).h· ·i h· ·i ∈L 1 2   (a) One has A∗ (X ,X ), and A∗ is the unique map X X such that ∈L 2 1 2 −→ 1 Ax, y = x, A∗y . (10.14) x∈∀X1 y∈∀X2 h i h i

(b) One has A∗∗ = A.

(c) One has that A A∗ is a conjugate-linear bijection of (X ,X ) onto (X ,X ). 7→ L 1 2 L 2 1 ∗ (d) (IdX1 ) = IdX1 .

(e) If X3, , is another finite-dimensional inner product space over K and B (X ,Xh·),·i then ∈ 2 3  L (B A)∗ = A∗ B∗. ◦ ◦ (f) ker(A∗) = (Im A)⊥ and Im(A∗) = (ker A)⊥.

(g) A is a monomorphism if, and only if, A∗ is an epimorphism.

(h) A is a an epimorphism if, and only if, A∗ is a monomorphism.

(i) A−1 (X ,X ) exists if, and only if, (A∗)−1 (X ,X ) exists, and, in that case, ∈L 2 1 ∈L 1 2 (A∗)−1 =(A−1)∗.

(j) A is isometric if, and only if, A∗ = A−1. 10 VECTOR SPACES WITH INNER PRODUCTS 153

∗ ′ −1 Proof. (a): Let A (X1,X2). Then A (X2,X1), since A is linear, and ψ1 and ∈ L ∈ L ′ ′ ψ2 are both conjugate-linear. Moreover, we know A is the unique map on X2 such that

A′(β)(x)= β(A(x)). ′ β∈∀X2 x∈∀X1

Thus, for each x X and each y X , ∈ 1 ∈ 2 x, A∗y = x, (ψ−1 A′ ψ )(y) = ψ (ψ−1 A′ ψ )(y) (x) h i h 1 ◦ ◦ 2 i 1 1 ◦ ◦ 2 = A′(ψ (y))(x)= ψ (y)(Ax)= Ax, y , 2 2 h i 

proving (10.14). For each y X2 and each x X1, we have Ax, y = (ψ2(y))(Ax) = ∈ ∈ ∗ h−1 i (ψ2(y)) A (x). Then Th. 10.17 and (10.14) imply A (y)= ψ1 (ψ2(y)) A , showing A∗ to be◦ uniquely determined by (10.14). ◦   (b): According to (a), A∗∗ is the unique map X X such that 1 −→ 2 A∗y,x = y, A∗∗x . x∈∀X1 y∈∀X2 h i h i Comparing with (10.14) yields A = A∗∗. (c): If A, B (X ,X ) and λ K, then, for each y X , ∈L 1 2 ∈ ∈ 2 (A + B)∗(y)=(ψ−1 (A + B)′ ψ )(y)=(ψ−1 (A′ + B′) ψ )(y) 1 ◦ ◦ 2 1 ◦ ◦ 2 =(ψ−1 A′ ψ )(y)+(ψ−1 B′ ψ )(y)=(A∗ + B∗)(y) 1 ◦ ◦ 2 1 ◦ ◦ 2 and (λA)∗(y)=(ψ−1 (λA)′ ψ )(y)= λ(ψ−1 A′ ψ )(y)=(λA∗)(y), 1 ◦ ◦ 2 1 ◦ ◦ 2 showing A A∗ to be conjugate-linear. Moreover, A A∗ is bijective due to (b). 7→ 7→ ∗ −1 ′ −1 (d): One has (IdX ) = ψ (IdX ) ψ1 = ψ IdX′ ψ1 = IdX . 1 1 ◦ 1 ◦ 1 ◦ 1 ◦ 1 (e): Let ψ : X X′ be given by Th. 10.17. Then 3 3 −→ 3 A∗ B∗ = ψ−1 A′ ψ ψ−1 B′ ψ = ψ−1 (B A)′ ψ =(B A)∗. ◦ 1 ◦ ◦ 2 ◦ 2 ◦ ◦ 3 1 ◦ ◦ ◦ 3 ◦ (f): We have

y ker(A∗) x, A∗y =0 Ax, y =0 y (Im A)⊥. ∈ ⇔x∈ ∀X1 h i ⇔x∈ ∀X1 h i ⇔ ∈ Applying the first part with A replaced by A∗ yields

ker A = ker A∗∗ = (Im(A∗))⊥ 10 VECTOR SPACES WITH INNER PRODUCTS 154

and, thus, using Th. 10.16(d)(iii),

⊥ Im(A∗)= (Im(A∗))⊥ = (ker A)⊥.  (g): According to (f), we have ker A = 0 Im(A∗) = (ker A)⊥ = 0 ⊥ = X . { } ⇔ { } 1 (h): According to (f), we have ker(A∗)= 0 Im A = (ker(A∗))⊥ = 0 ⊥ = X . { } ⇔ { } 2 (i): One has A−1 (X ,X ) exists (A−1)′ =(A′)−1 (X′ ,X′ ) exists ∈L 2 1 ⇔ ∈L 1 2 (A∗)−1 =(ψ−1 A′ ψ )−1 (X ,X ) exists. ⇔ 1 ◦ ◦ 2 ∈L 1 2 Moreover, if A−1 (X ,X ) exists, then ∈L 2 1 (A∗)−1 = ψ−1 (A−1)′ ψ =(A−1)∗, 2 ◦ ◦ 1 completing the proof. (j): If A is isometric, then Ax, y = Ax, AA−1y = x, A−1y , x∈∀X1 y∈∀X2 h i h i h i such that (a) implies A∗ = A−1. Conversely, if A∗ = A−1, then Au, Av = u, A∗Av = u,v , u,v∀∈X1 h i h i h i proving A to be isometric. 

For extensions of Def. 10.21 and Th. 10.22 to infinite-dimensional Hilbert spaces, see [Phi17b, Def. 4.34], [Phi17b, Cor. 4.35].

Definition 10.23. Let m,n N and let M := (mkl) (m,n, K) be an m n matrix over K. We call ∈ ∈M × M := (m ) (m,n, K) kl ∈M the complex conjugate matrix of M and

t M ∗ := M = (M t) (n,m, K) ∈M the adjoint matrix of M (thus, forK = R, the adjoint matrix is the same as the transpose matrix). 10 VECTOR SPACES WITH INNER PRODUCTS 155

Theorem 10.24. Let X, , , Y, , be finite-dimensional inner product spaces over K. Let A (X,Y ). h· ·i h· ·i ∈L  

(a) Let BX := (x1,...,xn) and BY := (y1,...,ym) be ordered orthonormal bases of X and Y , respectively. If M =(mkl) (m,n, K) is the matrix of A with respect to B and B , then the adjoint matrix∈MM ∗ represents the adjoint map A∗ (Y,X) X Y ∈L with respect to BY and BX .

∗ n k (b) Let X = Y , A (X,X). Then det(A ) = det A. If χA = k=0 akX is the ∈ L n k characteristic polynomial of A, then χA∗ = akX and k=0 P σ(A∗)= λ : λP σ(A) . ∈ Proof. (a): Suppose  m n ∗ Axl = mkl yk A yk = nkl xl l∈{1∀,...,n} ∧k∈{1 ∀,...,m} Xk=1 Xl=1 with n K. Then, for each (k,l) 1,...,m 1,...,n , kl ∈ ∈ { } × { } m = Ax ,y = x ,A∗y = n , kl h l ki h l ki kl ∗ ∗ showing M to be the matrix of A with respect to BY and BX . (b): For M as in (a), it is

t (∗) det(A∗) = det(M ∗) = det M = det M = det M = det A,   where ( ) holds, as the forming of complex  conjugates  commutes with the forming of sums and∗ products of complex numbers. Next, using this fact again together with the linearity of forming the transpose of a matrix, we compute

∗ t t χ ∗ = det(X Id M ) = det X Id M = det X Id M A n − n − n − n       = det X Id M = a Xk. n − k k=0  X Thus, n ∗ k λ σ(A ) ǫ (χ ∗ )= a λ =0 ∈ ⇔ λ A k k=0 Xn k ǫ (χ )= a λ =0 λ σ(A), ⇔ λ A k ⇔ ∈ k=0 X  thereby proving (b).  10 VECTOR SPACES WITH INNER PRODUCTS 156

Definition 10.25. Let X, , be an inner product space over K and let U be a h· ·i ⊥ subspace of X such that X = U U . Then the linear projection PU : X U,  ⊥ ⊥ −→ PU (u + v) := u for u + v X with u U and v U is called the orthogonal projection from X onto U. ∈ ∈ ∈

Theorem 10.26. Let X, , be an inner product space over K, let U be a subspace of X such that X = U Uh·⊥,·i and let P : X U be the orthogonal projection onto  U U. Moreover, let denote⊥ the induced norm−→ on X. k · k (a) One has

u = PU (x) PU (x) x < u x , x∈∀X u∈∀U 6 ⇒ k − k k − k   i.e. P (x) is the strict minimum of the function u u x from U into R+ and U 7→ k − k 0 PU (x) can be viewed as the best approximation to x in U. (b) If B := u ,...,u is an orthonormal basis of U, dim U = n N, then U { 1 n} ∈ n

PU (x)= x,ui ui. x∈∀X h i i=1 X

Proof. (a): Let x X and u U with u = PU (x). Then PU (x) u U and x PU (x) ⊥ ∈ ∈ 6 − ∈ − ∈ ker PU = U . Thus,

Prop. 10.12(b) u x 2 = P (x) x 2 + u P (x) 2 > P (x) x 2, k − k k U − k k − U k k U − k thereby proving (a). (b): Let x X. As B is a basis of U, there exist λ ,...,λ K such that ∈ U 1 n ∈ n

PU (x)= λi ui. i=1 X Recalling x P (x) ker P = U ⊥, we obtain − U ∈ U

x,ui PU (x),ui = x PU (x), ui =0 i∈{1∀,...,n} h i−h i h − i

and, thus, x,ui = PU (x),ui = λi, i∈{1∀,...,n} h i h i thereby proving (b).  10 VECTOR SPACES WITH INNER PRODUCTS 157

Proposition 10.27. Let X, , be an inner product space over K with induced norm . Let P (X,X) beh· a·i projection. Then P is an orthogonal projection onto  Uk ·:= k Im P (i.e.∈X L= Im P ker P ) if, and only if P = P ∗. ⊥

Proof. First, assume X = Im P ker P . Then, for each x = ux + vx and y = uy + vy with u ,u Im P and v ,v ker⊥ P , we have x y ∈ x y ∈ Px,y = u ,u + u ,v = u ,u +0= u ,u + v ,u = x,Py , h i h x yi h x yi h x yi h x yi h x yi h i proving P = P ∗. Conversely, assume P = P ∗. Then, for each x X, ∈ (∗) Px 2 = Px,Px = P 2x,x = Px,x Px x , k k h i h i h i ≤ k k k k where ( ) holds due to the Cauchy-Schwarz inequality [Phi16b, (1.41)]. In consequence, ∗ Px x . x∈∀X k k ≤ k k Now let u Im P and 0 = v ker P . Define ∈ 6 ∈ u,v y := u h i v. − v 2 k k Then y,v = 0, implying h i 2 2 2 Prop. 10.12(b) u,v u,v u,v y 2 Py 2 = u 2 = u h i v + h i v = y 2 + |h i| . k k ≥ k k k k − v 2 v 2 k k v 2

k k k k k k As this yields u,v = 0, we have X = Im P ker P , as desired.  h i ⊥ Example 10.28. Let n N and define ∈ 0 n ikt n := (f : [ π,π] C): f(t)= γk e γ−n,...,γn C T ( − −→ ∧ ∈ ) kX=−n (due to Euler’s formula, relating the exponential function to sine and cosine, the elements of n are known as trigonometric polynomials). Clearly, U := n is a subspace of the spaceT X := C[ π,π] of Ex. 10.5(b). As it turns out, we even haveT X = U U ⊥ (we will not prove this here,− but it follows from [Phi17b, Th. 4.20(e)], since the finite-dimensional⊥ subspace U is automatically a closed subspace, cf. [Phi17b, Th. 1.16(b)]). Thus, one has an orthogonal projection PU from X onto U, yielding the best approximation of a continuous function by a trigonometric polynomial. Moreover, if one has an orthonormal 10 VECTOR SPACES WITH INNER PRODUCTS 158

basis of U, then one can use Th. 10.26(b) to compute PU (x) for each function x X. We verify that an orthonormal basis is given by the functions ∈

eikt uk : [ π,π] C, uk(t) := (k n,...,n ): − −→ √2π ∈ {− } One computes, for each k,l n,...,n , ∈ {− } π π 1 i(k−l)t 1 for k = l, u u = e dt = i(k−l)t k l 2π 1 [ e ]π =0 for k = l. Z−π Z−π ( 2π i(k−l) −π 6 Thus, for each x X, the orthogonal projection is ∈ n π 1 −ikt PU (x)= uk x(t) e dt. √2π −π kX=−n Z For example, let x(t)= t and n = 1. Then, since

π teit π π eit teit dt = dt =2πi 0=2πi, −π i −π − −π i − Z π   Z t dt =0, −π π Z te−it dt = 2πi, − Z−π P (x)=2πi +0 2πi = 0. U −

10.5 Hermitian, Unitary, and Normal Maps

Definition 10.29. Let X, , be a finite-dimensional inner product space over K and A (X,X). Moreover,h· let·i M (n, K), n N. ∈L  ∈M ∈ (a) We call A normal if, and only if, AA∗ = A∗A; likewise, we call M normal if, and only if, MM ∗ = M ∗M. We define

Nor(X) := A (X,X): A is normal , ∈L Nor (K) := M (n, K): M is normal . n  ∈M  (b) We call A Hermitian or self-adjoint if, and only if, A = A∗; likewise, we call M Hermitian or self-adjoint if, and only if, M = M ∗ (thus, for K = R, Hermitian is 10 VECTOR SPACES WITH INNER PRODUCTS 159

the same as symmetric, a notion we previously defined in [Phi19, Def. 7.25(c)] for quadratic matrices M with M = M t and that is now extended to Hermitian maps A for K = R). We call A skew-Hermitian if, and only if, A = A∗; likewise, we call M skew-Hermitian if, and only if, M = M ∗ (thus, for K−= R, skew-Hermitian is the same as skew-symmetric, a notion− we previously defined in [Phi19, Def. 7.30(a)] for quadratic matrices M with M = M t and that is now extended to skew-Hermitian − 1 ∗ maps A for K = R). Moreover, we call AHer := 2 (A + A ) the Hermitian part 1 ∗ 1 ∗ of A, MHer := 2 (M + M ) the Hermitian part of M, AskHer := 2 (A A ) the skew-Hermitian part of A, and A := 1 (M M ∗) the skew-Hermitian− part of M skHer 2 − (thus, for K = R, Asym := AHer and Msym := MHer are the same as the symmetric parts of A and M, respectively; Askew := AskHer and Mskew := MskHer are the same as the skew-symmetric parts of A and M, respectively; notions previously defined in [Phi19, Def. 7.30(b)] for quadratic matrices M, now extended to maps A for K = R). We define

Herm(X) := A (X,X): A is Hermitian , ∈L Herm (K) := M (n, K): M is Hermitian , n  ∈M skHerm(X) := A (X,X): A is skew-Hermitian ,  ∈L skHerm (K) := M (n, K): M is skew-Hermitian , n  ∈M and, for K = R, 

Sym(X) := A (X,X): A is symmetric , ∈L Sym (R) := M (n, R): M is symmetric , n  ∈M Skew(X) := A (X,X): A is skew-symmetric ,  ∈L Skew (R) := M (n, R): M is skew-symmetric . n  ∈M

 −1 ∗ (c) We call A GL(X) unitary if, and only if, A = A ; likewise, we call M GLn(K) unitary if,∈ and only if, M −1 = M ∗. If K = R, then we also call unitary∈ maps and matrices orthogonal. We define

U(X) := A (X,X): A is unitary , ∈L U (K) := M (n, K): M is unitary , n  ∈M and, for K = R, 

O(X) := A (X,X): A is orthogonal , ∈L O (R) := M (n, R): M is orthogonal . n  ∈M  10 VECTOR SPACES WITH INNER PRODUCTS 160

Proposition 10.30. Let X, , be a finite-dimensional inner product space over K, n N. We have Herm(Xh·) ·i Nor(X), Herm (K) Nor (K), U(X) Nor(X),  n n U (K)∈ Nor (K), ⊆ ⊆ ⊆ n ⊆ n Proof. That Hermitian implies normal is immediate. If A U(X), then AA∗ = A∗A = Id, i.e. A Nor(X). ∈  ∈ Proposition 10.31. Let X, , be a finite-dimensional inner product space over K, n N. h· ·i ∈ 

(a) U(X) is a subgroup of GL(X); Un(K) is a subgroup of GLn(K).

(b) If A U(X) and M Un(K), then det A = det M = 1 (specializing to K = R then∈ yields that, for A,∈ M orthogonal,| one has| det| A, det| M 1, 1 ). ∈ {− } (c) Let A, B Herm(X). Then A+B Herm(X) (if λ R, then also λA Herm(X), showing Herm(∈ X) to be a vector subspace∈ of (X,X∈) for K = R). If A∈ GL(X), then A−1 Herm(X). If AB = BA, then ABL Herm(X). The analogous∈ results also hold for∈ Hermitian matrices. ∈

Proof. (a): If A, B U(X), then (AB)−1 = B−1A−1 = B∗A∗ = (AB)∗, showing AB U(X). Also A∈= (A−1)−1 = (A∗)∗, showing A−1 U(X) and establishing the ∈ ∈ case. (b): It suffices to consider A U(X). Then AA∗ = Id, implying ∈ Th. 10.24(b) det A 2 = det A det A = det A det A∗ = detId = 1. | | · · (c): If A, B Herm(X), then (A+B)∗ = A∗ +B∗ = A+B, showing A+B Herm(X). If λ R, then∈ (λA)∗ = λA∗ = λA, showing λA Herm(X). If A Herm(X∈) GL(X), ∈ ∈ ∈ ∩ then (A−1)∗ = (A∗)−1 = A−1, proving A−1 Herm(X) GL(X). If AB = BA, then (AB)∗ = B∗A∗ = BA = AB, showing AB ∈Herm(X). ∩  ∈ In general, for normal maps and normal matrices, neither the sum nor the product is normal. However, one can show that, if A, B are normal with AB = BA, then A + B and AB are normal – this makes use of the diagonalizability of normal maps over C and is not quite as easy as one might think, cf. Prop. 10.43 below.

Proposition 10.32. Let X, , be a finite-dimensional inner product space over K, n N. h· ·i ∈  10 VECTOR SPACES WITH INNER PRODUCTS 161

(a) Let A, B skHerm(X). Then A + B skHerm(X) (if λ R, then λA skHerm(X∈), showing skHerm(X) to be a vector∈ subspace of (X,X∈ ) for K = R).∈ If A GL(X), then A−1 skHerm(X)8. The analogous resultsL also hold for Her- mitian∈ matrices. ∈

(b) Herm(X) skHerm(X)= 0 and Herm (K) skHerm (K)= 0 . ∩ { } n ∩ n { } Proof. (a): If A, B skHerm(X), then (A + B)∗ = A∗ + B∗ = A B = (A + B), showing A + B skHerm(∈ X). If λ R, then (λA)∗ = λA∗ =− λA− , showing− λA skHerm(X). If A∈ skHerm(X) GL(∈X), then (A−1)∗ = (A∗)−1−= ( A)−1 = A−1∈, proving A−1 skHerm(∈ X) GL(∩X). − − ∈ ∩ (b): Let A Herm(X) skHerm(X). Then A = A∗ = A, i.e. 2A = 0 and A = 0.  ∈ ∩ − Proposition 10.33. Let X, , be a finite-dimensional inner product space over K, A (X,X). h· ·i ∈L 

1 ∗ (a) The Hermitian part AHer = 2 (A + A ) of A is Hermitian, the skew-Hermitian part A = 1 (A A∗) of A is skew-Hermitian. skHer 2 − (b) A can be uniquely decomposed into its Hermitian and skew-Hermitian parts, i.e. A = A + A and, if A = S + B with S Herm(X) and B skHerm(X), then Her skHer ∈ ∈ S = AHer and B = AskHer.

(c) A is Hermitian if, and only if, A = AHer; A is skew-Hermitian if, and only if, A = AskHer.

The analogous results also hold for matrices, M (n, K), n N. ∈M ∈

Proof. (a): AHer is Hermitian due to 1 1 A∗ = (A + A∗)∗ = (A∗ + A)= A . Her 2 2 Her

AskHer is skew-Hermitian due to 1 1 A∗ = (A A∗)∗ = (A∗ A)= A . skHer 2 − 2 − − skHer 8Note that, for K = R, A skHerm(X) GL(X) can only exist if 1 d := dim X is even: If A skHerm(X), then det A =∈det(A∗) = det(∩ A)=( 1)ddet A, where we≤ have used Th. 10.24(b) and∈ Cor. 4.37(c). Thus, for d odd, Re(det A) =− 0 (implying− A / GL(X) for K = R) and, for d even, Im(det A) = 0 (i.e. det A R). ∈ ∈ 10 VECTOR SPACES WITH INNER PRODUCTS 162

1 1 ∗ 1 1 ∗ (b): While AHer + AskHer = 2 A + 2 A + 2 A 2 A = A is immediate; if A = S + B with S Herm(X) and B skHerm(X), then − ∈ ∈ A + A = A = S + B A S = B A , Her skHer ⇒ Her − − skHer where A S Herm(X) by Prop. 10.31(c) and B A skHerm(X) by Prop. Her − ∈ − skHer ∈ 10.32(a), showing AHer = S and AskHer = B by Prop. 10.32(b).

(c): If A = AHer, then A is Hermitian by (a); if A = AskHer, then A is skew-Hermitian by (b). If A is Hermitian, then A = A A Herm(X) skHerm(X)= 0 , showing skHer − Her ∈ ∩ { } A = AHer. If A is skew-Hermitian, then AHer = A AskHer Herm(X) skHerm(X)= 0 , showing A = A . − ∈ ∩  { } skHer Proposition 10.34. Let X, , be an inner product space over C, dim X = n N, h· ·i ∈ with ordered orthonormal basis B := (x1,...,xn). Let denote the induced norm.  k · k ∗ ∗ Moreover, let H (X,X), and let M =(mkl) (n, C), M =(mkl) (n, C) be the respective matrices∈L of H and H∗ with respect∈M to B. Then the following∈M statements are equivalent:

(i) H and M are Hermitian.

(ii) x,Hx R for each x X. h i∈ ∈ (iii) x∗Mx R for each x Cn. ∈ ∈ Caveat: This result does not extend to finite-dimensional inner product spaces over R: Over R, (ii) and (iii) hold for every H (X,X), M (n, C), even though, for n 2, not every such H,M is symmetric∈ (cf. L Ex. 10.20).∈ M ≥ Proof. “(i) (ii)”: If H is Hermitian and x X, then ⇒ ∈ (10.14) x,Hx = Hx,x = x,H∗x = x,Hx , h i h i h i h i showing x,Hx R. h i∈ “(ii) (i)”: We have, for each x X, ⇒ ∈ (ii) (H∗ H)x,x = H∗x,x Hx,x = x,Hx x,Hx = x,Hx x,Hx =0. h − i h i−h i h i− h i h i−h i Thus, by Prop. 10.19(b), we have H∗ H = 0 and H∗ = H, proving (i). − Now consider X := Cn with the standard inner product , and let H (X,X) such that M (n, C) represents H with respect to the standardh· ·i basis.∈L Then, for each ∈ M x Cn, x∗Mx = Mx,x = x,Mx , showing the equivalence between (ii) and (iii).  ∈ h i h i 10 VECTOR SPACES WITH INNER PRODUCTS 163

The following Prop. 10.35 is related to Prop. 10.19, highlighting important differences between the complex and the real situation.

Proposition 10.35. Let X, , be a finite-dimensional inner product space over R, n N. h· ·i ∈  (a) If M =(m ) Sym (R) is such that kl ∈ n x∗Mx =0, (10.15) x∈∀Rn then M =0.

(b) If A Sym(X) is such that ∈ Ax, x =0, (10.16) x∈∀X h i then A =0.

(c) M (n, R) is skew-symmetric if, and only if, (10.15) holds true. ∈M (d) A (X,X) is skew-symmetric if, and only if, (10.16) holds true. ∈L Proof. (a): As in the proof of Prop. 10.19, we obtain from matrix multiplication

n n t x Mx = mklxkxl, Xk=1 Xl=1 t where choosing x := eα, α 1,...,n , yields 0 = eαMeα = mαα; whereas choosing x := e + e for α = β with∈α, { β 1,...,n} yields 0 = m + m = 2m , showing α β 6 ∈ { } αβ βα αβ mαβ = mβα = 0 and M = 0. (b): Let n := dim X N. According to Th. 10.16(c), there exists a linear isomorphism I : X Rn such that∈ −→ x,y = Ix,Iy 2, x,y∀∈X h i h i n where , 2 denotes the standard inner product on R . Moreover, we then also know It = I−h·1 from·i Th. 10.22(j). Thus, if we let B := I A I−1, then B Sym(Rn), since Bt =(I−1)t At It = B. Moreover, ◦ ◦ ∈ ◦ ◦ −1 −1 −1 −1 Bu,u 2 = (I A I )u, (I I )u = (A(I u),I u =0. u∈∀Rn h i ◦ ◦ ◦ 2

Now, if M (n, R) represents B with respect to the standard basis of Rn, then M is symmetric∈ and, M for each u Rn, utMu = Mu,u = Bu,u = 0, such that M =0 ∈ h i2 h i2 by (a). Thus, B = 0, also implying A = I−1 B I = 0. ◦ ◦ 10 VECTOR SPACES WITH INNER PRODUCTS 164

(d): If A is skew-symmetric, then, for each x X, ∈ Ax, x = x, Atx = x, Ax = Ax, x , h i h i −h i −h i

proving 2 Ax, x = 0 and (10.16). Conversely, if (10.16), then, since AskHer is skew- symmetric,h i

0= Ax, x = (AHer + AskHer)x,x = AHerx,x + AskHerx,x = AHerx,x . x∈∀X h i h i h i h i h i

As AHer is symmetric, (b) implies AHer = 0, i.e. A = AskHer is skew-symmetric. (c): If M is skew-symmetric, then, with respect to the standard basis of Rn (which is orthonormal with respect to the standard inner product on Rn), M represents a skew-symmetric map A (Rn, Rn). As A satisfies (10.16) by (d), M satisfies (10.15). Conversely, if M satisfies∈L (10.15) and A is as before, then A satisfies (10.16) and is, thus, skew-symmetric by (d), implying M to be skew-symmetric as well. 

Proposition 10.36. Let X, , be an inner product space over K, dim X = n N, h· ·i ∈ with ordered orthonormal basis B := (x1,...,xn). Let denote the induced norm.  k · k ∗ ∗ Moreover, let U (X,X), and let M =(mkl) (n, K), M =(mkl) (n, K) be the respective matrices∈L of U and U ∗ with respect∈M to B. Then the following∈M statements are equivalent:

(i) U and M are unitary.

(ii) U ∗ and M ∗ are unitary.

(iii) The columns of M form an orthonormal basis of Kn with respect to the standard inner product on Kn.

(iv) The rows of M form an orthonormal basis of Kn with respect to the standard inner product on Kn.

(v) M t is unitary.

(vi) M is unitary.

(vii) Ux,Uy = x,y holds for each x,y X. h i h i ∈ (viii) Ux = x for each x X. k k k k ∈ Proof. “(i) (ii)”: U −1 = U ∗ is equivalent to Id = UU ∗, which is equivalent to (U ∗)−1 = U =(U ∗)∗.⇔ 10 VECTOR SPACES WITH INNER PRODUCTS 165

“(i) (iii)”: M −1 = M ∗ implies ⇔

m1k m1j n n . . ∗ 0 for k = j, . . = mlk mlj = mjl mlk = 6 (10.17)   ·   l=1 l=1 (1 for k = j, mnk mnj     X X     showing that the columns of M form an orthonormal basis of Kn with respect to the standard inner product on Kn. Conversely, if the columns of M form an orthonormal basis of Kn with respect to the standard inner product, then they satisfy (10.17), which implies M ∗M = Id. “(i) (iv)”: M −1 = M ∗ implies ⇔

mk1 mj1 n n . . ∗ 0 for k = j, . . = mkl mjl = mkl mlj = 6 (10.18)   ·   l=1 l=1 (1 for k = j, mkn mjn     X X     showing that the rows of M form an orthonormal basis of Kn with respect to the standard inner product on Kn. Conversely, if the rows of M form an orthonormal basis of Kn with respect to the standard inner product, then they satisfy (10.18), which implies M ∗M = Id. “(i) (v)”: Since the rows of M are the columns of M t, the equivalence of (i) and (v) is immediate⇔ from (iii) and (iv). “(i) (vi)”: Since M =(M ∗)t, the equivalence of (i) and (vi) is immediate from (ii) and (v).⇔ “(i) (vii)” is a special case of what was shown in Th. 10.22(i). ⇔ “(vii) (viii)” holds due to Th. 10.8(e).  ⇔ Corollary 10.37. Let X, , be a finite-dimensional inner product space over R let f : X X. Then f is anh· isometry·i (e.g. a Euclidean isometry of Rn) if, and only if,  f = L −→+ a with an orthogonal map L O(X) and a X. ∈ ∈

Proof. We know each translation Tv : X X, T (x) := x + v to be an isometry by Ex. 10.9(a) and, clearly, compositions of isometries−→ are isometries.

“ ”: If f = L + a with L O(X) and a X, then f = Ta L, where L is an isometric linear⇐ isomorphism by Th.∈ 10.36(vii) (cf. Def.∈ 10.15). Thus, ◦f must be isometric as well. “ ”: If f is an isometry, then f is affine by Th. 10.8(d), i.e. f = L+a with L (X,X) and⇒ a X. Then L = f a = T f, showing L to be isometric. Thus, ∈LL must be ∈ − −a ◦ orthogonal by Th. 10.36.  10 VECTOR SPACES WITH INNER PRODUCTS 166

In preparation for results on the diagonalizability of normal, Hermitian, and unitary maps, we prove the following proposition: Proposition 10.38. Let X, , be an inner product space over K, dim X = n N, and A Nor(X). h· ·i ∈ ∈  (a) If 0 = x X is an eigenvector to the eigenvalue λ K of A, then x is also an eigenvector6 ∈ to the eigenvalue λ of A∗. ∈ (b) If U is an A-invariant subspace of X, then A(U ⊥) U ⊥ and A∗(U) U. ⊆ ⊆ Proof. (a): It suffices to show A∗x λx, A∗x λx = 0. To this end, we use Ax = λx to compute h − − i A∗x λx, A∗x λx = A∗x, A∗x λ x, A∗x λ A∗x,x + λ λ x,x h − − i h i− h i− h i h i = AA∗x,x λ Ax, x λ x, Ax + λ λ x,x h i− h i− h i h i = A∗Ax, x λ λ x,x λ λ x,x + λ λ x,x h i− h i− h i h i = Ax, Ax λ λ x,x =0, h i− h i thereby establishing the case.

(b): Let dim U = m N and let BU := u1,...,um be an orthonormal basis of U. As U is A-invariant, there∈ exist a K such{ that } kl ∈ m

Aul = akl uk. l∈{1,...,m∀ } Xk=1 Define m ∗ xl := A ul alk uk. l∈{1,...,m∀ } − k=1 ∗ X To show the A -invariance of U, it suffices to show that, for each l 1,...,m , xl = 0, i.e. x ,x = 0. To this end, for each l 1,...,m , we compute ∈ { } h l li ∈ { } m m m m x ,x = A∗u ,A∗u a A∗u ,u a u ,A∗u + a a u ,u h l li h l li− lk h l ki− lk h k li lj lk h k ji k=1 k=1 j=1 k=1 Xm mX m X X = A∗Au ,u a u , Au a Au ,u + a 2 h l li− lk h l ki− lk h k li | lk| k=1 k=1 k=1 m mXm X m m X m = a 2 a a u ,u a a u ,u + a 2 | kl| − jk lk h l ji− jk lk h j li | lk| k=1 k=1 j=1 k=1 j=1 k=1 Xm Xm X m Xm X m mX = a 2 a 2 a 2 + a 2 = a 2 a 2, | kl| − | lk| − | lk| | lk| | kl| − | lk| Xk=1 Xk=1 Xk=1 Xk=1 Xk=1 Xk=1 10 VECTOR SPACES WITH INNER PRODUCTS 167

implying m x ,x =0. h l li Xl=1 As xl,xl 0 for each l 1,...,m , this implies the desired xl,xl = 0 for each l h1,...,mi ≥ , thereby proving∈ { A∗(U) } U. We will now make useh of thisi result to also show∈ { A(U ⊥) } U ⊥: Let u U and x ⊆U ⊥. Then ⊆ ∈ ∈ ∗ u, Ax = A∗u,x A =u∈U 0, h i h i proving Ax U ⊥.  ∈ Theorem 10.39. Let X, , be an inner product space over C, dim X = n N. h· ·i ∈  (a) The following statements are equivalent for A (X,X): ∈L (i) A Nor(X). ∈ (ii) There exists an orthonormal basis B of X, consisting of eigenvectors of A. (iii) There exists f C[X], deg f n 1, such that A∗ = ǫ (f), where, as before, ∈ ≤ − A ǫA : C[X] (X,X) denotes the substitution homomorphism introduced in Def. and Rem.−→ L7.10.

In particular, each A Nor(X) is diagonalizable. ∈ (b) The following statements are equivalent for M (n, C): ∈M (i) M Nor (C). ∈ n −1 (ii) There exists a unitary matrix U Un(C) such that D = U MU is a diagonal matrix. ∈

Proof. (a): “(i) (ii)”: We prove the existence of the orthonormal basis of eigenvectors ⇒ via induction on n N. For n = 1, there is nothing to prove. Thus, let n > 1. As C is algebraically closed,∈ there exists λ σ(A). Let 0 = v X be a corresponding eigenvector and U := span v . According∈ Prop. 10.38(b),6 ∈ both U and U ⊥ are A- { } ∗ invariant. Moreover, A ↾U ⊥ is normal, since, if A and A commute on X, they also commute on the subspace U ⊥. Thus, by induction hypothesis, U ⊥ has an orthonormal basis B′, consisting of eigenvectors of A. Thus, X also has an orthonormal basis, consisting of eigenvectors of A.

“(ii) (iii)”: Let v1,...,vn be an orthonormal basis of X, consisting of eigenvectors of A, where⇒ Av = λ{ v , i.e. λ },...,λ C are the corresponding eigenvalues of A. Using j j j 1 n ∈ 10 VECTOR SPACES WITH INNER PRODUCTS 168

Lagrange interpolation according to [Phi21, Th. 3.4], let f := n−1 f Xk C[X] with k=0 k ∈ f0,...,fn−1 C be a polynomial of degree at most n 1, satisfying ∈ − P

ǫλj (f)= λj, (10.19) j∈{1∀,...,n}

and define B := ǫA(f) (if all eigenvalues are distinct, then f is uniquely determined by (10.19), otherwise, there exist infinitely many such polynomials, all resulting in the ∗ n same map B = A , see below). Then, for each y := j=1 βjvj with β1,...,βn C, we obtain ∈ P n−1 n n n−1 n n−1 k k k By = fkB βjvj = βj fkB vj = βj fkλj vj k=0 ! j=1 ! j=1 k=0 ! j=1 k=0 ! nX X X X X X (10.19) = βjλjvj. j=1 X As we also have, for each x := n α v with α ,...,α C, that Ax = n α λ v , j=1 j j 1 n ∈ j=1 j j j we conclude P P n n n n n x,By = α v , β λ v = α λ β = α λ v , β v = Ax, y , h i j j k k k j j j j j j k k h i * j=1 + j=1 * j=1 + X Xk=1 X X Xk=1 proving B = A∗ by Th. 10.22(a). n−1 k “(iii) (i)”: According to (iii), there exists f := k=0 fkX C[X] with f0,...,fn−1 ⇒ ∗ ∈ ∈ C such that A = ǫA(f). Thus, P n−1 n−1 ∗ k k ∗ AA = A fkA = fkA A = A A, v∈∀X ! Xk=0 Xk=0 proving A Nor(X). ∈ n (b): “(i) (ii)”: Let M Norn(C). We consider M as a normal map on C with the standard⇒ inner product ∈, , where the standard basis of Cn constitutes an orthonormal h· ·i basis. Then we know there exists an ordered orthonormal basis B := (x1,...,xn) of Cn such that, with respect to B, M has the diagonal matrix D. Thus, there exists U =(u ) GL (C) such that D = U −1MU and kl ∈ n n

xl = ukl ek. l∈{1∀,...,n} Xk=1 Then n n n

ukl ukj = ukl umj ek,em = xl,xj = δlj, l,j∈{∀1,...,n} h i h i m=1 Xk=1 Xk=1 X 10 VECTOR SPACES WITH INNER PRODUCTS 169

showing the columns of U to be orthonormal and U to be unitary. −1 “(ii) (i)”: Assume there exists a unitary matrix U Un(C) such that D = U MU is a diagonal⇒ matrix. Then M = UDU −1 and ∈

∗ −1 −1 ∗ ∗ ∗ ∗ ∗ ∗ ∗ MM = UDU (U ) D U = UD Idn D U = UD Idn DU =(U −1)∗D∗U ∗UDU −1 = M ∗M, proving M Nor (C).  ∈ n Example 10.40. Consider R2 with the standard inner product. We already know from 0 1 Ex. 6.5(a) that M := has no real eigenvalues (the characteristic polynomial is 1− 0 2   χM = X +1). On the other hand, M is unitary (in particular, normal), showing, one can not expect Th. 10.39 to hold with C replaced by R. Theorem 10.41. Let X, , be an inner product space over K, dim X = n N, and A Herm(X). Then h·σ(·iA) R and there exists an orthonormal basis B of∈ X,  consisting∈ of eigenvectors of A. In⊆ particular, A is diagonalizable. Moreover, if M −1 ∈ Hermn(C), then there exists a unitary matrix U Un(C) such that D = U MU is a diagonal matrix. Also, in particular, for K = R,∈ each A Sym(X) is diagonalizable, ∈ and, if M Symn(R), then there exists an orthogonal matrix U On(R) such that D = U −1MU∈ is a diagonal matrix. ∈

Proof. Let λ σ(A) and let 0 = x X be a corresponding eigenvector. Then, according to Prop. 10.38(a),∈ λx = Ax =6 A∗x∈= λx, showing λ = λ and λ R. As A Herm(X) implies A to be normal, the case K = C is now immediate from∈ Th. 10.39(a)(ii).∈ It remains to consider K = R. First, consider n = 2. If B0 := (x1,x2) is an ordered orthonormal basis of X, then the matrix M (2, R) of A with respect to B0 must have the form ∈ M a b M = b c   with a,b,c R. Thus, the characteristic polynomial is ∈ χ =(X a)(X c) b2 = X2 (a + c)C + ac b2 A − − − − − with the zeros

a + c (a + c)2 a + c (a c)2 +4b2 λ = ac + b2 = − R, 2 ± r 4 − 2 ± r 4 ∈ showing A to be diagonalizable in this case. The rest of the proof is now conducted analogous to the proof of implication “(i) (ii)” of Th. 10.39(a): We prove the existence ⇒ 10 VECTOR SPACES WITH INNER PRODUCTS 170

of the orthonormal basis of eigenvectors via induction on n N. For n = 1, there is nothing to prove. Thus, let n > 1. According to Prop. 8.16(a),∈ there exists an A- invariant subspace W of X such that dim W 1, 2 . If dim W = 1, then A has an eigenvalue λ and a corresponding eigenvector 0∈= { v } X. If dim W = 2, then A↾ is, 6 ∈ W clearly, also Hermitian, and the above-considered case n = 2 yields that A↾W (and, thus, A) has an eigenvalue λ and a corresponding eigenvector 0 = v X. Let U := span v . ⊥ 6 ∈ { } According Prop. 10.38(b), both U and U are A-invariant. Moreover, A ↾U ⊥ is also Hermitian and, by induction hypothesis, U ⊥ has an orthonormal basis B′, consisting of eigenvectors of A. Thus, X also has an orthonormal basis, consisting of eigenvectors n of A. Now let M Symn(R). We consider M as a symmetric map on R with the standard inner product∈ , , where the standard basis of Rn constitutes an orthonormal h· ·i basis. Then we know there exists an ordered orthonormal basis B := (x1,...,xn) of Rn such that, with respect to B, M has the diagonal matrix D. Thus, there exists U =(u ) GL (R) such that D = U −1MU and kl ∈ n n

xl = ukl ek. l∈{1∀,...,n} Xk=1 Then n n n

ukl ukj = ukl umj ek,em = xl,xj = δlj, l,j∈{∀1,...,n} h i h i m=1 Xk=1 Xk=1 X showing the columns of U to be orthonormal and U to be orthogonal. 

The following commutation theorem extends to continuous linear operators on infinite- dimensional Hilbert spaces (see, e.g., [Rud73, Th. 12.16]).

Theorem 10.42 (Fuglede). Let X, , be a finite-dimensional inner product space over K. If A (X,X) and N Nor(h·X·i), then AN = NA implies AN ∗ = N ∗A. The ∈ L ∈  analogous result also holds for matrices A (n, K), N Nor (K), n N. ∈M ∈ n ∈ Proof. First, we consider the case K = C: According to Th. 10.39(a)(iii), there exists a ∗ polynomial f C[X] such that N = ǫN (f). Clearly, AN = NA implies A to commute ∈ ∗ with powers of N and, thus, with N = ǫN (f). If A (n, C), N Norn(C), then, with n ∈M n ∈ n respect to the standard basis of C , A represents a map fA (C , C ) and N represents n ∈L a map fN Nor(C ). Then AN = NA implies fAfN = fN fA, which, as already shown, ∈ ∗ ∗ ∗ ∗ ∗ implies fA(fN ) =(fN ) fA, which, in turn, implies AN = N A (since AN represents ∗ ∗ ∗ fA(fN ) and N A represents (fN ) fA). For matrices, the case K = R is an immediate special case of the case K = C. Now, if K = R, A (X,X) and N Nor(X), ∈ L ∈ then there exists n N and an ordered orthonormal basis B := (v1,...,vn) of X such that, with respect to∈ B, A is represented by M (n, R) and N is represented by A ∈ M 10 VECTOR SPACES WITH INNER PRODUCTS 171

MN Norn(C). Then AN = NA implies MAMN = MN MA, which, as already shown, ∈ ∗ ∗ ∗ ∗ ∗ implies MA(MN ) = (MN ) MA, which, in turn, implies AN = N A (since MA(MN ) ∗ ∗ ∗ represents AN and (MN ) MA represents N A).  Proposition 10.43. Let X, , be a finite-dimensional inner product space over K. If A Nor(X) GL(X), thenh·A−·i1 Nor(X) GL(X). Let A, B Nor(X) and λ K.  Then∈λA Nor(∩ X). If AB = BA∈, then A +∩ B Nor(X) and∈AB Nor(X).∈ The analogous∈ results also hold for normal matrices. ∈ ∈

Proof. If A Nor(X) GL(X), then ∈ ∩ A−1(A−1)∗ = A−1(A∗)−1 =(A∗A)−1 =(AA∗)−1 =(A−1)∗A−1, proving A−1 Nor(X) GL(X). Now let A, B Nor(X) and λ K. Then ∈ ∩ ∈ ∈ λA(λA)∗ = λAλA∗ = λA∗λA =(λA)∗λA shows λA Nor(X). If AB = BA, then ∈ AB(AB)∗ = AB(BA)∗ = ABA∗B∗ Th.= 10.42 AA∗BB∗ = A∗AB∗B Th.= 10.42 A∗B∗AB =(BA)∗AB =(AB)∗AB, (A + B)(A + B)∗ = (A + B)(A∗ + B∗)= AA∗ + AB∗ + BA∗ + BB∗ Th.= 10.42 A∗A + B∗A + A∗B + B∗B = (A∗ + B∗)(A + B)=(A + B)∗(A + B), showing AB Nor(X) and A + B Nor(X). Let us provide another proof of the normality of AB∈ and A + B that does∈ not make use of Fuglede’s Th. 10.42, but uses the diagonalizability of A, B more directly (for K = C, this works directly; for K = R, one can, e.g., extend X to a vector space Y over C, also extending A, B to Y in a natural way): Since AB = BA, we know from Th. 6.8 that A, B are simultaneously diagonalizable, i.e. there exists a basis v1,...,vn of X (n N) such that there exist α ,...,α , β ,...,β C with { } ∈ 1 n 1 n ∈

Avj = αjvj Bvj = βjvj . j∈{1∀,...,n} ∧   Moreover, according to Prop. 10.38(a), we then also have

∗ ∗ A vj = αjvj B vj = βjvj . j∈{1∀,...,n} ∧   Thus,

∗ ∗ (AB)(AB) vj = αjβjβjαjvj = βjαjαjβjvj =(AB) (AB)vj, j∈{1∀,...,n} 11 DEFINITENESS OF QUADRATIC MATRICES OVER K 172

proving AB Nor(X). In the same way, one also sees B∗A = AB∗ and A∗B = BA∗ such that the∈ computation from above, once again, shows (A + B)(A∗ + B∗)=(A + B)∗(A + B). 

11 Definiteness of Quadratic Matrices over K

Definition 11.1. Let n N and let A =(a ) (n, K). ∈ kl ∈M (a) A is called positive semidefinite if, and only if, x∗Ax R+ for each x Kn. ∈ 0 ∈ (b) A is called positive definite if, and only if, A is positive semidefinite and x∗Ax = 0 x = 0, i.e. if, and only if, x∗Ax > 0 for each 0 = x Kn. ⇔ 6 ∈ (c) A is called negative semidefinite if, and only if, x∗Ax R− for each x Kn. ∈ 0 ∈ (d) A is called negative definite if, and only if, A is negative semidefinite and x∗Ax = 0 x = 0, i.e. if, and only if, x∗Ax < 0 for each 0 = x Kn. ⇔ 6 ∈ (e) A is called indefinite if, and only if, A is neither positive semidefinite nor negative semidefinite, i.e. if, and only if,

x∗Ax / R x∗Ax R+ y∗Ay R− . x∈∃Kn ∈ ∨ x,y∃∈Kn ∈ ∧ ∈      Lemma 11.2. Let n N and let A =(a ) (n, K). ∈ kl ∈M (a) A is positive definite (positive semidefinite) if, and only if, A is negative definite (negative semidefinite). −

(b) A is indefinite if, and only if, A is indefinite. − Proof. The equivalences are immediate from Def. 11.1, since, for each x Kn, x∗Ax > 0 x∗( A)x< 0, x∗Ax =0 x∗( A)x = 0, x∗Ax R x∗( A)x∈ R.  ⇔ − ⇔ − ∈ ⇔ − ∈ Theorem 11.3. Let n N and let A =(a ) (n, C). ∈ kl ∈M (a) The following statements are equivalent:

(i) A is positive semidefinite (resp. positive definite). (ii) A is Hermitian and all eigenvalues of A are nonnegative (resp. positive) real numbers. 11 DEFINITENESS OF QUADRATIC MATRICES OVER K 173

Moreover, if A is positive semidefinite (resp. positive definite), then det A 0 (resp. det A> 0). ≥ (b) The following statements are equivalent: (i) A is negative semidefinite (resp. negative definite). (ii) A is Hermitian and all eigenvalues of A are nonpositive (resp. negative) real numbers. Moreover, if A is negative semidefinite (resp. negative definite), then det A 0 (resp. det A> 0) for n even and det A 0 (resp. det A< 0) for n odd. ≥ ≤ (c) The following statements are equivalent: (i) A is indefinite. (ii) A is not Hermitian, or A is Hermitian and A has at least one positive and one negative eigenvalue.

Proof. (a): If A is positive semidefinite, then A is Hermitian by Prop. 10.34 and, thus, by Th. 10.41, all eigenvalues of A are real. If λ R is an eigenvalue of A and x Cn 0 is a corresponding eigenvector, then ∈ ∈ \{ } 0 x∗Ax = x∗λx = λ x 2, ≤ k k2 2 + where the inequality is strict in the case where A is positive definite. As x 2 R , + + k k ∈ λ R0 and even λ R if A is positive definite. Then det A 0 (resp. det A> 0) also follows,∈ as det A is the∈ product of the eigenvalues of A (cf. Cor.≥ 8.5). Now assume A to be Hermitian with only nonnegative (resp. positive) eigenvalues λ ,...,λ R and let 1 n ∈ v1,...,vn be a corresponding orthonormal basis of eigenvectors (i.e. Avj = λjvj and {∗ } n n vj vk = δjk). Then, for each x C , there exist α1,...,αn C such that x = j=1 αjvj, implying ∈ ∈ P n ∗ n n n ∗ ∗ ∗ x Ax = αkvk A αjvj = αkvk αjλjvj ! j=1 ! ! j=1 ! Xk=1 X Xk=1 X n n = α∗α λ = α 2 λ , k k k | k| k Xk=1 Xk=1 showing x∗Ax R+ with x∗Ax R+, for λ ,...,λ R+ and x = 0. Thus, A is ∈ 0 ∈ 1 n ∈ 6 positive semidefinite and even positive definite if all λj are positive. (b) follows by combining (a) with Lem. 11.2. (c) is an immediate consequence of (a) and (b).  11 DEFINITENESS OF QUADRATIC MATRICES OVER K 174

Theorem 11.4. Let n N and let A =(a ) (n, R). ∈ kl ∈M (a) The following statements are equivalent: (i) A is positive semidefinite (resp. positive definite).

(ii) The symmetric part Asym of A is positive semidefinite (resp. positive definite).

(iii) All eigenvalues of Asym are nonnegative (resp. positive) real numbers. Moreover, if A is positive semidefinite (resp. positive definite), then det A 0 sym ≥ (resp. det Asym > 0). (b) The following statements are equivalent: (i) A is negative semidefinite (resp. negative definite).

(ii) The symmetric part Asym of A is negative semidefinite (resp. negative definite).

(iii) All eigenvalues of Asym are nonpositive (resp. negative) real numbers.

Moreover, if A is negative semidefinite (resp. negative definite), then det Asym 0 (resp. det A > 0) for n even and det A 0 (resp. det A < 0) for n odd.≥ sym sym ≤ sym (c) The following statements are equivalent: (i) A is indefinite.

(ii) Asym is indefinite.

(iii) Asym has at least one positive and one negative eigenvalue.

Proof. (a): Since $A = A_{\mathrm{Her}} + A_{\mathrm{skHer}} = A_{\mathrm{sym}} + A_{\mathrm{skew}}$ by Prop. 10.33(b), "(i) $\Leftrightarrow$ (ii)" holds, as Prop. 10.35(c) yields $x^* A_{\mathrm{skew}} x = 0$ and $x^*Ax = x^* A_{\mathrm{sym}} x$ for each $x \in \mathbb{R}^n$. Since $A_{\mathrm{sym}}$ is Hermitian, "(ii) $\Leftrightarrow$ (iii)" is due to Th. 11.3(a).
(b) follows by combining (a) with Lem. 11.2.
(c) is an immediate consequence of (a) and (b). $\square$

Notation 11.5. If $A = (a_{ij})$ is an $n \times n$ matrix, $n \in \mathbb{N}$, then, for $1 \leq k \leq l \leq n$, let
$$A^{kl} := \begin{pmatrix} a_{kk} & \dots & a_{kl} \\ \vdots & \ddots & \vdots \\ a_{lk} & \dots & a_{ll} \end{pmatrix} \qquad \text{(11.1)}$$
denote the $(1+l-k) \times (1+l-k)$ principal submatrix of $A$, i.e.
$$\forall_{\substack{k,l \in \{1,\dots,n\},\\ 1 \leq k \leq l \leq n}} \ \forall_{i,j \in \{1,\dots,1+l-k\}} \quad a^{kl}_{ij} := a_{i+k-1,\,j+k-1}. \qquad \text{(11.2)}$$

Proposition 11.6. Let $A = (a_{\alpha\beta}) \in \mathcal{M}(n,K)$, $n \in \mathbb{N}$. Then $A$ is positive (semi-)definite if, and only if, every principal submatrix $A^{kl}$ of $A$, $1 \leq k \leq l \leq n$, is positive (semi-)definite.

Proof. If all principal submatrices of $A$ are positive (semi-)definite, then, as $A = A^{1n}$, $A$ is positive (semi-)definite. Now assume $A$ to be positive (semi-)definite and fix $k,l \in \{1,\dots,n\}$ with $1 \leq k \leq l \leq n$. Let $x = (x_k,\dots,x_l) \in K^{1+l-k} \setminus \{0\}$ and extend $x$ to $K^n$ by $0$, calling the extended vector $y$:
$$y = (y_1,\dots,y_n) \in K^n \setminus \{0\}, \qquad y_\alpha = \begin{cases} x_\alpha & \text{for } k \leq \alpha \leq l, \\ 0 & \text{otherwise.} \end{cases} \qquad \text{(11.3)}$$

We now consider x and y as column vectors and compute

$$x^* A^{kl} x = \sum_{\alpha,\beta=k}^{l} a_{\alpha\beta}\, \overline{x_\alpha}\, x_\beta = \sum_{\alpha,\beta=1}^{n} a_{\alpha\beta}\, \overline{y_\alpha}\, y_\beta \overset{(\geq)}{>} 0, \qquad \text{(11.4)}$$
showing $A^{kl}$ to be positive (semi-)definite. $\square$
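Prop. 11.6 suggests a direct numerical check via submatrix slicing. A minimal sketch, again assuming numpy; principal_submatrix is an illustrative helper, not a library function:

    import numpy as np

    def principal_submatrix(A, k, l):
        """Return A^{kl} as in (11.1); k, l are 1-based with k <= l."""
        return A[k-1:l, k-1:l]

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    # By Prop. 11.6, positive definiteness of A forces it on every A^{kl}:
    for k in range(1, 4):
        for l in range(k, 4):
            ev = np.linalg.eigvalsh(principal_submatrix(A, k, l))
            assert np.all(ev > 0)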

Theorem 11.7. Let $m,n \in \mathbb{N}$ and $A \in \mathcal{M}(m,n,\mathbb{C})$. Then $A^*A \in \mathcal{M}(n,\mathbb{C})$ is Hermitian and positive semidefinite. For $m = n$ and $\det A \neq 0$, $A^*A$ is even positive definite.

Proof. That $A^*A$ is Hermitian is due to
$$(A^*A)^* = A^*(A^*)^* = A^*A.$$
Moreover, if $x \in K^n$, then $x^* A^* A x = (Ax)^*(Ax) = \|Ax\|_2^2 \in \mathbb{R}_0^+$, showing $A^*A$ to be positive semidefinite. If $m = n$ and $\det A \neq 0$, then, by Th. 10.24(b), $\det(A^*A) = \overline{\det A} \cdot \det A = |\det A|^2 \in \mathbb{R}^+$. Thus, $0$ is not an eigenvalue of $A^*A$ and $A^*A$ must be positive definite according to Th. 11.3(a)(ii). $\square$
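Th. 11.7 can also be observed numerically; the following hedged sketch (not from the notes) assumes numpy and uses a random complex matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

    G = A.conj().T @ A                    # A*A, the Gram matrix of the columns of A
    assert np.allclose(G, G.conj().T)     # Hermitian
    assert np.all(np.linalg.eigvalsh(G) >= -1e-12)   # positive semidefinite
    if abs(np.linalg.det(A)) > 1e-12:     # det A != 0 implies positive definite
        assert np.all(np.linalg.eigvalsh(G) > 0)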

A Multilinear Maps

Theorem A.1. Let $V$ and $W$ be vector spaces over the field $F$, $\alpha \in \mathbb{N}$. Then, as vector spaces over $F$, $\mathcal{L}(V, \mathcal{L}^{\alpha-1}(V,W))$ and $\mathcal{L}^\alpha(V,W)$ are isomorphic via the isomorphism
$$\Phi : \mathcal{L}(V, \mathcal{L}^{\alpha-1}(V,W)) \longrightarrow \mathcal{L}^\alpha(V,W), \quad \Phi(L)(x_1,\dots,x_\alpha) := L(x_1)(x_2,\dots,x_\alpha). \qquad \text{(A.1)}$$

Proof. Since $L$ is linear and $L(x_1)$ is $(\alpha-1)$ times linear, $\Phi(L)$ is, indeed, an element of $\mathcal{L}^\alpha(V,W)$, showing that $\Phi$ is well-defined by (A.1). Next, we verify $\Phi$ to be linear: If $\lambda \in F$ and $K, L \in \mathcal{L}(V, \mathcal{L}^{\alpha-1}(V,W))$, then
$$\Phi(\lambda L)(x_1,\dots,x_\alpha) = (\lambda L)(x_1)(x_2,\dots,x_\alpha) = \lambda\bigl(L(x_1)(x_2,\dots,x_\alpha)\bigr) = \lambda\,\Phi(L)(x_1,\dots,x_\alpha)$$
and
$$\Phi(K+L)(x_1,\dots,x_\alpha) = (K+L)(x_1)(x_2,\dots,x_\alpha) = \bigl(K(x_1) + L(x_1)\bigr)(x_2,\dots,x_\alpha) = K(x_1)(x_2,\dots,x_\alpha) + L(x_1)(x_2,\dots,x_\alpha) = \Phi(K)(x_1,\dots,x_\alpha) + \Phi(L)(x_1,\dots,x_\alpha) = (\Phi(K)+\Phi(L))(x_1,\dots,x_\alpha),$$
proving $\Phi$ to be linear. Now we show $\Phi$ to be injective. To this end, we show that, if $L \neq 0$, then $\Phi(L) \neq 0$. If $L \neq 0$, then there exist $x_1,\dots,x_\alpha \in V$ such that $L(x_1)(x_2,\dots,x_\alpha) \neq 0$, showing that $\Phi(L) \neq 0$ as needed. To verify $\Phi$ is also surjective, let $K \in \mathcal{L}^\alpha(V,W)$. Define $L : V \longrightarrow \mathcal{L}^{\alpha-1}(V,W)$ by letting
$$L(x_1)(x_2,\dots,x_\alpha) := K(x_1,\dots,x_\alpha). \qquad \text{(A.2)}$$
Then, clearly, for each $x_1 \in V$, $L(x_1) \in \mathcal{L}^{\alpha-1}(V,W)$. Moreover, $L$ is linear, i.e. $L \in \mathcal{L}(V, \mathcal{L}^{\alpha-1}(V,W))$. Comparing (A.2) with (A.1) shows $\Phi(L) = K$, i.e. $\Phi$ is surjective, completing the proof. $\square$

B Polynomials in Several Variables

Let $(R,+,\cdot)$ be a commutative ring with unity. According to Def. B.4, polynomials in one variable over $R$ are precisely the linear combinations of monomials $X^0, X^1, X^2, \dots$ Similarly, we now want polynomials in two variables over $R$ to be the linear combinations of monomials of the form $X_1^k X_2^l = X_2^l X_1^k$. The generalization to finitely many variables $X_1,\dots,X_n$ and even to infinitely many variables $(X_i)_{i \in I}$ (where $I$ is an arbitrary infinite set and the monomials are finite products of the $X_i$) is then straightforward. We will actually present a construction of polynomials that is even more general, namely the construction of $M$-polynomials, where $M$ is a commutative monoid (cf. Def. B.1 below): Knowing this general construction is useful if one wants to pursue the study of Algebra further, it comes at virtually no extra difficulty, and it elegantly includes all types of polynomials mentioned above. We will define $M$-polynomials in Def. B.4 and we will see how polynomials in finitely many variables as well as in infinitely many variables arise as special cases in Ex. B.8(a)-(c).

Definition B.1. (a) A semigroup $(M,\circ)$ is called a monoid if, and only if, there exists a neutral element $e \in M$ (thus, a magma $(M,\circ)$ is a monoid if, and only if, $\circ$ is associative and $M$ contains a neutral element).

(b) Let $(M,\circ)$ be a monoid, $\emptyset \neq U \subseteq M$. We call $U$ a submonoid of $M$ if, and only if, $(U,\circ)$ forms a monoid, where the composition on $U$ is the restriction of the composition on $M$, i.e. $\circ\!\upharpoonright_{U \times U}$.

Lemma B.2. Let $(M,\circ)$ be a monoid, $\emptyset \neq U \subseteq M$. Then $U$ is a submonoid of $M$ if, and only if, (i) and (ii) hold, where

(i) For each $u,v \in U$, one has $u \circ v \in U$.

(ii) $e \in U$, where $e$ denotes the neutral element of $M$.

Proof. If $(U,\circ)$ is a monoid, then, clearly, it must satisfy (i) and (ii). Thus, we merely need to show that (i) and (ii) are sufficient for $(U,\circ)$ to be a monoid. According to (i), $\circ$ maps $U \times U$ into $U$. As $\circ$ is associative on $M$, it is also associative on $U$ and, thus, (ii) yields $(U,\circ)$ to be a monoid. $\square$

Example B.3. (a) $(\mathbb{N}_0,+)$ constitutes a commutative monoid.

(b) Let $(M,\cdot)$ be a monoid with neutral element $e \in M$ and let $I$ be a set. Then, by [Phi19, Th. 4.9(e)], $\mathcal{F}(I,M) = M^I$ becomes a monoid, if $\cdot$ is defined pointwise on $M^I$, where $\cdot$ is also commutative on $M^I$ if it is commutative on $M$. A submonoid of $(M^I,\cdot)$ is given by $(M^I_{\mathrm{fin}},\cdot)$, where, as in [Phi19, Ex. 5.16(c)], $M^I_{\mathrm{fin}}$ denotes the set of functions $f : I \longrightarrow M$ such that there exists a finite set $I_f \subseteq I$, satisfying
$$f(i) = e \quad \text{for each } i \in I \setminus I_f, \qquad \text{(B.1a)}$$
$$f(i) \neq e \quad \text{for each } i \in I_f. \qquad \text{(B.1b)}$$
Indeed, $M^I_{\mathrm{fin}}$ is a submonoid of $M^I$: If $f,g \in M^I_{\mathrm{fin}}$, then $I_{fg} \subseteq I_f \cup I_g$, showing $fg \in M^I_{\mathrm{fin}}$; $f_e \in M^I_{\mathrm{fin}}$ for $f_e \equiv e$.

(c) Let $I$ be a set and $n \in \mathbb{N}$. By combining (a) and (b), we obtain the commutative monoids $((\mathbb{N}_0)^n,+)$, $((\mathbb{N}_0)^I,+)$, $((\mathbb{N}_0)^I_{\mathrm{fin}},+)$.

Definition B.4. Let $(R,+,\cdot)$ be a commutative ring with unity and let $(M,+)$ be a commutative monoid. We call
$$R[M] := R^M_{\mathrm{fin}} = \bigl\{(f : M \longrightarrow R) : \#f^{-1}(R \setminus \{0\}) < \infty\bigr\} \qquad \text{(B.2)}$$

the set of $M$-polynomials over $R$. We then have the pointwise-defined addition and scalar multiplication on $R[M]$, which it inherits from $R^M$:
$$\forall_{f,g \in R[M]} \quad (f+g) : M \longrightarrow R, \quad (f+g)(i) := f(i) + g(i), \qquad \text{(B.3)}$$
$$\forall_{f \in R[M]} \ \forall_{\lambda \in R} \quad (\lambda \cdot f) : M \longrightarrow R, \quad (\lambda \cdot f)(i) := \lambda f(i),$$
where we know from [Phi19, Ex. 5.16(c)] that, with these compositions, $R[M]$ forms a vector space over $R$, provided $R$ is a field, and, then, $B = \{e_i : i \in M\}$, where
$$\forall_{i \in M} \quad e_i : M \longrightarrow R, \quad e_i(j) := \delta_{ij},$$
provides the standard basis of the vector space $R[M]$. In the current context, we will now write $X^i := e_i$ and we will call these polynomials monomials. Furthermore, we define a multiplication on $R[M]$ by letting
$$\bigl((a_i)_{i \in M}, (b_i)_{i \in M}\bigr) \mapsto (c_i)_{i \in M} := (a_i)_{i \in M} \cdot (b_i)_{i \in M}, \qquad c_i := \sum_{k+l=i} a_k b_l := \sum_{(k,l) \in M^2 :\, k+l=i} a_k b_l, \qquad \text{(B.4)}$$
where we note that, due to (B.2), only finitely many of the summands in the sum are nonzero. If $f := (a_i)_{i \in M} \in R[M]$, then we call the $a_i \in R$ the coefficients of $f$.

Remark B.5. In the situation of Def. B.4, using the notation $X^i = e_i$, we can write addition, scalar multiplication, and multiplication in the following, perhaps, more familiar-looking forms: If $\lambda \in R$, $f = \sum_{i \in M} f_i X^i$, $g = \sum_{i \in M} g_i X^i$ (each $f_i, g_i \in R$), then
$$f + g = \sum_{i \in M} (f_i + g_i)\, X^i, \qquad \lambda f = \sum_{i \in M} (\lambda f_i)\, X^i, \qquad fg = \sum_{i \in M} \Bigl(\sum_{k+l=i} f_k g_l\Bigr) X^i$$
(as in (B.4), due to (B.2), only finitely many of the summands in each sum are nonzero).
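The multiplication (B.4) is a convolution over the monoid $M$, which one can mirror directly in code. A minimal sketch (illustrative, not from the notes) in Python, representing an element of $R[M]$ as a finitely supported dictionary and passing the monoid operation in as a parameter:

    from collections import defaultdict

    def poly_mul(f, g, monoid_add):
        """Multiply M-polynomials per (B.4); f, g map monoid elements to
        coefficients, and c_i sums a_k * b_l over all k + l = i."""
        c = defaultdict(int)
        for k, a in f.items():
            for l, b in g.items():
                c[monoid_add(k, l)] += a * b
        return {i: v for i, v in c.items() if v != 0}

    # M = N_0 recovers ordinary polynomials: (1 + X)*(1 + X) = 1 + 2X + X^2
    f = {0: 1, 1: 1}
    print(poly_mul(f, f, lambda k, l: k + l))   # {0: 1, 1: 2, 2: 1}

    # M = (N_0)^2 gives polynomials in two variables; exponents are tuples:
    g = {(1, 0): 1, (0, 1): 1}                  # X_1 + X_2
    add2 = lambda k, l: (k[0] + l[0], k[1] + l[1])
    print(poly_mul(g, g, add2))                 # {(2,0): 1, (1,1): 2, (0,2): 1}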

Theorem B.6. Let $(R,+,\cdot)$ be a commutative ring with unity and let $(M,+)$ be a commutative monoid. Then $(R[M],+,\cdot)$ forms a commutative ring with unity, where $1 = X^0$ is the neutral element of multiplication.

Proof. We already know from [Phi19, Ex. 4.9(e)] that $(R[M],+)$ forms a commutative group. To verify associativity of multiplication, let $a,b,c,d,f,g,h \in R[M]$,
$$a := (a_i)_{i \in M}, \quad b := (b_i)_{i \in M}, \quad c := (c_i)_{i \in M}, \quad d := (d_i)_{i \in M},$$
$$f := (f_i)_{i \in M}, \quad g := (g_i)_{i \in M}, \quad h := (h_i)_{i \in M},$$
such that $d := ab$, $f := bc$, $g := (ab)c$, $h := a(bc)$. Then, for each $i \in M$,
$$g_i = \sum_{k+l=i} d_k c_l = \sum_{k+l=i} \sum_{m+n=k} a_m b_n c_l = \sum_{m+n+l=i} a_m b_n c_l = \sum_{m+k=i} a_m \sum_{n+l=k} b_n c_l = \sum_{m+k=i} a_m f_k = h_i,$$
proving $g = h$, as desired. To verify distributivity, let $a,b,c,d,f,g \in R[M]$ be as before, but this time such that $d := ab$, $f := ac$, and $g := a(b+c)$. Then, for each $i \in M$,
$$g_i = \sum_{k+l=i} a_k (b_l + c_l) = \sum_{k+l=i} a_k b_l + \sum_{k+l=i} a_k c_l = d_i + f_i,$$
proving $g = d + f$, as desired. To verify commutativity of multiplication, let $a,b,c,d \in R[M]$ be as before, but this time such that $c := ab$, $d := ba$. Then, for each $i \in M$,
$$c_i = \sum_{k+l=i} a_k b_l = \sum_{k+l=i} b_l a_k = d_i,$$
proving $c = d$, as desired. Finally, if $b := X^0$, then $b_0 = 1$ and $b_i = 0$ for each $i \in M \setminus \{0\}$, yielding, for $c := ab$ and each $i \in M$,
$$c_i = \sum_{k+l=i} a_k b_l = \sum_{k+0=i} a_k b_0 = a_i,$$
showing $X^0$ to be neutral and completing the proof. $\square$

Proposition B.7. If $R$ is a commutative ring with unity and $(M,+)$ is a commutative monoid, then $R[M]$ is a ring extension of $R$ via the unital ring monomorphism
$$\iota : R \longrightarrow R[M], \quad \iota(r) := rX^0.$$

Proof. The map $\iota$ is unital, since $\iota(1) = X^0$; $\iota$ is a ring homomorphism, since, for each $r,s \in R$, $\iota(r+s) = (r+s)X^0 = rX^0 + sX^0 = \iota(r) + \iota(s)$ and $\iota(rs) = rsX^0 = rX^0 \cdot sX^0 = \iota(r)\,\iota(s)$; $\iota$ is injective, since, for $r \neq 0$, $\iota(r) = rX^0 \not\equiv 0$. $\square$

Example B.8. Let $(R,+,\cdot)$ be a commutative ring with unity and let $(M,+)$ be a commutative monoid.

(a) $\mathbb{N}_0$-polynomials over $R$ are polynomials in one variable over $R$ as defined in Def. 7.1: For $(M,+) = (\mathbb{N}_0,+)$, the definition of an $M$-polynomial over $R$ according to Def. B.4 is precisely the same as that of a polynomial over $R$ according to Def. 7.1, i.e. $R[X] = R[\mathbb{N}_0]$.

(b) $(\mathbb{N}_0)^n$-polynomials are polynomials in $n$ variables: If $(M,+) = ((\mathbb{N}_0)^n,+)$, then one can interpret the $X$ occurring in the monomial $X^{(i_1,\dots,i_n)}$ with $(i_1,\dots,i_n) \in (\mathbb{N}_0)^n$ as the $n$-tuple of variables $X = (X_1,\dots,X_n)$ such that the monomial becomes $X^{(i_1,\dots,i_n)} = X_1^{i_1} \cdots X_n^{i_n}$. In consequence, it is also common to introduce the notation $R[X_1,\dots,X_n] := R[(\mathbb{N}_0)^n]$.

(c) $(\mathbb{N}_0)^I_{\mathrm{fin}}$-polynomials, where $I$ is an arbitrary set, are polynomials in the variables $(X_i)_{i \in I}$ (possibly infinitely many): If $\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}$, then, consistently with the notation of Ex. B.3(b), we let $I_\nu := \nu^{-1}(\mathbb{N})$, i.e. $I_\nu$ is the finite set satisfying
$$\nu(i) = 0 \quad \text{for each } i \in I \setminus I_\nu, \qquad \nu(i) \neq 0 \quad \text{for each } i \in I_\nu.$$
If $(M,+) = ((\mathbb{N}_0)^I_{\mathrm{fin}},+)$, then one can interpret the $X$ occurring in the monomial $X^\nu$ with $\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}$ as the $(\#I_\nu)$-tuple of variables $X = (X_i)_{i \in I_\nu}$ (with $X^\nu = X^0 = 1$ if $I_\nu = \emptyset$), such that the monomial becomes $X^\nu = \prod_{i \in I_\nu} X_i^{\nu(i)}$. In consequence, it is also common to introduce the notation
$$R[(X_i)_{i \in I}] := R[(\mathbb{N}_0)^I_{\mathrm{fin}}].$$
If $J \subseteq I$ is a finite subset of $I$, then the polynomial ring in finitely many variables $R[(\mathbb{N}_0)^J]$ is, clearly, isomorphic to the ring $R[X_1,\dots,X_{\#J}]$ of (b). Moreover, we can view $R[(\mathbb{N}_0)^J]$ as a subring of $R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ via the ring extension given by the unital ring monomorphism
$$\iota : R[(\mathbb{N}_0)^J] \longrightarrow R[(\mathbb{N}_0)^I_{\mathrm{fin}}], \qquad \iota\Bigl(\sum_{\nu \in (\mathbb{N}_0)^J} f_\nu X^\nu\Bigr) := \sum_{\nu \in (\mathbb{N}_0)^J} f_\nu X^{\tilde\nu} = \sum_{\nu \in (\mathbb{N}_0)^J} f_\nu \prod_{i \in J} X_i^{\nu(i)},$$
where, for each $\nu : J \longrightarrow \mathbb{N}_0$, we define
$$\tilde\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}, \qquad \tilde\nu(i) := \begin{cases} \nu(i) & \text{for } i \in J, \\ 0 & \text{for } i \in I \setminus J. \end{cases}$$
Next, for each $f = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu X^\nu \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$, we define the finite set
$$f(I) := \bigcup_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}} :\, f_\nu \neq 0} I_\nu. \qquad \text{(B.5)}$$
By definition of $f(I)$, we have
$$f = \sum_{\nu \in (\mathbb{N}_0)^{f(I)}} f_\nu X^\nu \in R[(\mathbb{N}_0)^{f(I)}],$$
showing
$$R[(\mathbb{N}_0)^I_{\mathrm{fin}}] = \bigcup_{J \subseteq I :\, \#J < \infty} R[(\mathbb{N}_0)^J].$$

Theorem B.9. Let $R$ be a commutative ring with unity and let $M$ be a commutative monoid. Moreover, let $S$ be another commutative ring with unity and assume we have a unital ring homomorphism $\phi : R \longrightarrow S$ as well as a homomorphism $\mu : (M,+) \longrightarrow (S,\cdot)$ with $\mu(0) = 1$. Then the map

$$\Phi : R[M] \longrightarrow S, \quad \Phi\Bigl(\sum_{i \in M} f_i X^i\Bigr) := \sum_{i \in M} \phi(f_i)\,\mu(i), \qquad \text{(B.6)}$$
constitutes the unique ring homomorphism $\Phi : R[M] \longrightarrow S$ that satisfies $\Phi\!\upharpoonright_R = \phi$ (considering $R$ as a subset of $R[M]$ due to Prop. B.7; in particular, $\Phi$ is also unital) as well as
$$\forall_{i \in M} \quad \Phi(X^i) = \mu(i) \qquad \text{(B.7)}$$
(one calls this kind of property a universal property of the polynomial ring $R[M]$, as one can show it uniquely determines $R[M]$ up to a canonical isomorphism, cf. [Bos13, Ch. 2.5]).

Proof. If $\Phi$ is defined by (B.6), then, for each $f = \sum_{i \in M} f_i X^i$, $g = \sum_{i \in M} g_i X^i \in R[M]$, one computes (recalling that all occurring sums are, actually, finite)
$$\Phi(f+g) = \sum_{i \in M} \phi(f_i + g_i)\,\mu(i) = \sum_{i \in M} \bigl(\phi(f_i) + \phi(g_i)\bigr)\mu(i) = \sum_{i \in M} \phi(f_i)\,\mu(i) + \sum_{i \in M} \phi(g_i)\,\mu(i) = \Phi(f) + \Phi(g)$$
as well as
$$\Phi(fg) = \sum_{i \in M} \phi\Bigl(\sum_{k+l=i} f_k g_l\Bigr) \mu(i) = \sum_{i \in M} \Bigl(\sum_{k+l=i} \phi(f_k)\,\phi(g_l)\Bigr) \mu(i) \overset{0 \in M}{=} \sum_{k,l \in M} \phi(f_k)\,\phi(g_l)\,\mu(k+l)$$
$$= \sum_{k,l \in M} \phi(f_k)\,\phi(g_l)\,\mu(k)\,\mu(l) = \Bigl(\sum_{k \in M} \phi(f_k)\,\mu(k)\Bigr)\Bigl(\sum_{l \in M} \phi(g_l)\,\mu(l)\Bigr) = \Phi(f)\,\Phi(g),$$
showing $\Phi$ to be a ring homomorphism. Next, for each $r \in R$, we have
$$\Phi(rX^0) = \phi(r)\,\mu(0) = \phi(r),$$
showing $\Phi\!\upharpoonright_R = \phi$. Since $\phi$ is unital,
$$\forall_{i \in M} \quad \Phi(X^i) = \Phi(1 \cdot X^i) = \phi(1)\,\mu(i) = 1 \cdot \mu(i) = \mu(i),$$
proving (B.7). To prove uniqueness, let $\Psi : R[M] \longrightarrow S$ be an arbitrary ring homomorphism such that $\Psi\!\upharpoonright_R = \phi$ and $\Psi(X^i) = \mu(i)$ for each $i \in M$. Then, for each $f = \sum_{i \in M} f_i X^i \in R[M]$,
$$\Psi\Bigl(\sum_{i \in M} f_i X^i\Bigr) = \sum_{i \in M} \Psi(f_i)\,\Psi(X^i) = \sum_{i \in M} \phi(f_i)\,\mu(i) = \Phi\Bigl(\sum_{i \in M} f_i X^i\Bigr),$$
establishing $\Psi = \Phi$, as desired. $\square$

From now on, we will restrict ourselves to the cases of Ex. B.8, where, actually, Ex. B.8(a) is a special case of Ex. B.8(b), and Ex. B.8(b), in turn, is a special case of Ex. B.8(c). Thus, we restrict ourselves to $M$-polynomials, where $(M,+) = ((\mathbb{N}_0)^I_{\mathrm{fin}},+)$ and $I$ may be an arbitrary set. The ring isomorphisms provided by the following Cor. B.10 sometimes allow one to establish properties for polynomial rings in finitely many variables via induction proofs.

Corollary B.10. Let $(R,+,\cdot)$ be a commutative ring with unity and $n \in \mathbb{N}$, $n \geq 2$. Then $R[X_1,\dots,X_n]$ and $(R[X_1,\dots,X_{n-1}])[X_n]$ are isomorphic via the ring isomorphism
$$\Phi : (R[X_1,\dots,X_{n-1}])[X_n] \cong R[X_1,\dots,X_n],$$
$$\Phi\Bigl(\sum_{k=0}^{N} \Bigl(\sum_{\nu \in (\mathbb{N}_0)^{n-1}} f_{\nu,k} X^\nu\Bigr) X_n^k\Bigr) := \sum_{\alpha \in (\mathbb{N}_0)^n} f_\alpha X^\alpha,$$
where
$$f_\alpha := \begin{cases} f_{\nu,k} & \text{if } \alpha(n) = k \text{ and } \alpha\!\upharpoonright_{\{1,\dots,n-1\}} = \nu, \\ 0 & \text{otherwise.} \end{cases}$$
Moreover, noting that we have the unital ring monomorphism (the map called $\iota$ in Ex. B.8(c))
$$\phi : R[X_1,\dots,X_{n-1}] \longrightarrow R[X_1,\dots,X_n], \quad \phi\Bigl(\sum_{\nu \in (\mathbb{N}_0)^{n-1}} f_\nu X^\nu\Bigr) := \sum_{\nu \in (\mathbb{N}_0)^{n-1}} f_\nu X^\nu X_n^0,$$
and the homomorphism
$$\mu : (\mathbb{N}_0,+) \longrightarrow \bigl(R[X_1,\dots,X_n],\cdot\bigr), \quad \mu(k) := X_n^k,$$
$\Phi$ is the unique ring homomorphism from $(R[X_1,\dots,X_{n-1}])[X_n]$ into $R[X_1,\dots,X_n]$ with
$$\Phi\!\upharpoonright_{R[X_1,\dots,X_{n-1}]} = \phi \qquad \text{and} \qquad \forall_{k \in \mathbb{N}_0} \quad \Phi(X_n^k) = X_n^k.$$

Proof. We apply Th. B.9 with $R$ replaced by $R[X_1,\dots,X_{n-1}]$, $M := \mathbb{N}_0$, and $S := R[X_1,\dots,X_n]$. We call the ring homomorphism given by (B.6) $\Psi$ and show $\Psi = \Phi$: If $\sum_{k=0}^{N} \bigl(\sum_{\nu \in (\mathbb{N}_0)^{n-1}} f_{\nu,k} X^\nu\bigr) X_n^k \in (R[X_1,\dots,X_{n-1}])[X_n]$, then
$$\Psi\Bigl(\sum_{k=0}^{N} \Bigl(\sum_{\nu} f_{\nu,k} X^\nu\Bigr) X_n^k\Bigr) = \sum_{k=0}^{N} \phi\Bigl(\sum_{\nu} f_{\nu,k} X^\nu\Bigr) \mu(k) = \sum_{k=0}^{N} \Bigl(\sum_{\nu} f_{\nu,k} X^\nu\Bigr) X_n^k = \Phi\Bigl(\sum_{k=0}^{N} \Bigl(\sum_{\nu} f_{\nu,k} X^\nu\Bigr) X_n^k\Bigr),$$
proving $\Psi = \Phi$. Thus, it merely remains to show $\Phi$ is bijective. To this end, let $f := \sum_{k=0}^{N} \bigl(\sum_{\nu \in (\mathbb{N}_0)^{n-1}} f_{\nu,k} X^\nu\bigr) X_n^k \in (R[X_1,\dots,X_{n-1}])[X_n]$. If $f \neq 0$, then there exist $k \in \mathbb{N}_0$ and $\nu \in (\mathbb{N}_0)^{n-1}$ such that $f_{\nu,k} \neq 0$. Then, if $\alpha \in (\mathbb{N}_0)^n$ is such that $\alpha\!\upharpoonright_{\{1,\dots,n-1\}} = \nu$ and $\alpha(n) = k$, then $f_\alpha = f_{\nu,k} \neq 0$, showing $\Phi(f) \neq 0$ and injectivity of $\Phi$. If $f = \sum_{\alpha \in (\mathbb{N}_0)^n} f_\alpha X^\alpha \in R[X_1,\dots,X_n]$, then
$$\Phi\Bigl(\sum_{k \in \mathbb{N}_0} \Bigl(\sum_{\alpha \in (\mathbb{N}_0)^n :\, \alpha(n)=k} f_\alpha X^{\alpha\upharpoonright_{\{1,\dots,n-1\}}}\Bigr) X_n^k\Bigr) = f,$$
showing surjectivity of $\Phi$, thereby completing the proof. $\square$
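The isomorphism of Cor. B.10 amounts to regrouping the coefficient of each monomial by the power of $X_n$ it carries. A small illustrative sketch in the dictionary representation used earlier (the helper name is hypothetical):

    from collections import defaultdict

    def group_by_last_variable(f):
        """Regroup a polynomial in n variables as a polynomial in X_n with
        coefficients in R[X_1, ..., X_{n-1}] (the direction Phi^{-1} of
        Cor. B.10); f maps exponent tuples to coefficients."""
        g = defaultdict(dict)
        for alpha, coeff in f.items():
            g[alpha[-1]][alpha[:-1]] = coeff    # key: power of X_n
        return dict(g)

    # X_1^2 X_2 + 3 X_1 + X_2, regrouped by powers of X_2:
    f = {(2, 1): 1, (1, 0): 3, (0, 1): 1}
    print(group_by_last_variable(f))
    # {1: {(2,): 1, (0,): 1}, 0: {(1,): 3}}, i.e. (X_1^2 + 1) X_2 + 3 X_1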

Proposition B.11. Let $R$ be a commutative ring with unity and let $I$ be a set. If $R$ is an integral domain (i.e. if it has no nonzero zero divisors), then $R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ is an integral domain as well and, moreover, $\bigl(R[(\mathbb{N}_0)^I_{\mathrm{fin}}]\bigr)^* = R^*$.

Proof. To apply Cor. B.10, note that the result was proved for the polynomial ring in one variable (i.e. for $R[X]$) in Prop. 7.7. Now let $n \in \mathbb{N}$, $n \geq 2$, and, by induction hypothesis, assume $R[X_1,\dots,X_{n-1}]$ has no nonzero zero divisors and $\bigl(R[X_1,\dots,X_{n-1}]\bigr)^* = R^*$. Then Prop. 7.7 yields that $(R[X_1,\dots,X_{n-1}])[X_n]$ has no nonzero zero divisors and $\bigl((R[X_1,\dots,X_{n-1}])[X_n]\bigr)^* = R^*$. An application of Cor. B.10, in turn, provides that $R[X_1,\dots,X_n]$ has no nonzero zero divisors and $\bigl(R[X_1,\dots,X_n]\bigr)^* = R^*$, completing the induction proof of the proposition's assertion for polynomial rings in finitely many variables. Proceeding to the case of a general, possibly infinite, set $I$: if $f,g \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}] \setminus \{0\}$, then, using the notation introduced in (B.5), we have
$$f \in R[(\mathbb{N}_0)^{f(I)}], \quad g \in R[(\mathbb{N}_0)^{g(I)}], \quad fg \in R[(\mathbb{N}_0)^{(fg)(I)}] \subseteq R[(\mathbb{N}_0)^{f(I) \cup g(I)}].$$
Since $R[(\mathbb{N}_0)^{f(I) \cup g(I)}]$ is a polynomial ring in finitely many variables, we conclude $fg \neq 0$, showing $R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ has no nonzero zero divisors. Similarly, $f \in \bigl(R[(\mathbb{N}_0)^I_{\mathrm{fin}}]\bigr)^*$ implies $f \in R^*$ due to $f \in R[(\mathbb{N}_0)^{f(I)}]$ being an element of a polynomial ring in finitely many variables. $\square$

Notation B.12. If $I$ is a set and $\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}$, then define
$$|\nu| := \sum_{i \in I} \nu(i).$$

Lemma B.13. If $I$ is a set, then
$$\forall_{\nu_1,\nu_2 \in (\mathbb{N}_0)^I_{\mathrm{fin}}} \quad |\nu_1 + \nu_2| = |\nu_1| + |\nu_2|. \qquad \text{(B.8)}$$

Proof. Noting that the following sums are, actually, finite, we compute, for each $\nu_1,\nu_2 \in (\mathbb{N}_0)^I_{\mathrm{fin}}$,
$$|\nu_1 + \nu_2| = \sum_{i \in I} (\nu_1 + \nu_2)(i) = \sum_{i \in I} \bigl(\nu_1(i) + \nu_2(i)\bigr) = \sum_{i \in I} \nu_1(i) + \sum_{i \in I} \nu_2(i) = |\nu_1| + |\nu_2|,$$
thereby establishing the case. $\square$

Definition B.14. Let $(R,+,\cdot)$ be a commutative ring with unity and let $I$ be a set.

(a) If $f = (f_\nu)_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu X^\nu \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$, then we define the degree of $f$ by
$$\deg f := \begin{cases} -\infty & \text{for } f \equiv 0, \\ \max\{|\nu| : f_\nu \neq 0\} & \text{for } f \not\equiv 0. \end{cases} \qquad \text{(B.9)}$$

(b) We call $0 \neq f = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu X^\nu \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ homogeneous of degree $d \in \mathbb{N}_0$ if, and only if,
$$\forall_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} \quad \bigl(f_\nu \neq 0 \;\Rightarrow\; |\nu| = d\bigr).$$
Moreover, we also define the zero polynomial to be homogeneous of every degree $d \in \mathbb{N}_0$.

(c) For $f = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu X^\nu \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ and $d \in \mathbb{N}_0$, we call
$$h_d(f) := \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}} :\, |\nu|=d} f_\nu X^\nu$$
the homogeneous component of degree $d$ of $f$ (clearly, $\deg h_d(f) = d$ or $h_d(f) = 0$).

Remark B.15. In the situation of Def. B.14(c), for $f = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu X^\nu \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$, the definition of the $h_d(f)$ immediately yields
$$f = \sum_{d=0}^{\infty} h_d(f) = \sum_{d=0}^{\deg f} h_d(f). \qquad \text{(B.10)}$$

Lemma B.16. Let $R$ be a commutative ring with unity and let $I$ be a set. If $f,g \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ are homogeneous of degree $d_f$ and $d_g$, respectively ($d_f,d_g \in \mathbb{N}_0$), then $fg$ is homogeneous of degree $d_f + d_g$ (note that, according to Def. B.14(b), this does not exclude the possibility $fg = 0$).

Proof. If $f = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu X^\nu$ and $g = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} g_\nu X^\nu$, then
$$fg = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} \Bigl(\sum_{\nu_1+\nu_2=\nu} f_{\nu_1} g_{\nu_2}\Bigr) X^\nu.$$
Since $f$ is homogeneous of degree $d_f$ and $g$ is homogeneous of degree $d_g$, we have
$$f_{\nu_1} g_{\nu_2} \neq 0 \;\Rightarrow\; \bigl(f_{\nu_1} \neq 0 \wedge g_{\nu_2} \neq 0\bigr) \;\Rightarrow\; \bigl(|\nu_1| = d_f \wedge |\nu_2| = d_g\bigr) \;\overset{\text{(B.8)}}{\Rightarrow}\; |\nu| = |\nu_1 + \nu_2| = |\nu_1| + |\nu_2| = d_f + d_g,$$
proving $fg$ to be homogeneous of degree $d_f + d_g$. $\square$
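The decomposition (B.10) into homogeneous components is likewise straightforward to compute; a brief illustrative sketch in the same dictionary representation as above:

    from collections import defaultdict

    def homogeneous_components(f):
        """Split a polynomial dict {exponent tuple: coefficient} into the
        components h_d(f) of Def. B.14(c), keyed by total degree d = |nu|."""
        h = defaultdict(dict)
        for nu, coeff in f.items():
            h[sum(nu)][nu] = coeff
        return dict(h)

    # f = X_1^2 + X_1 X_2 + X_1 + 1 has components of degrees 2, 1, 0:
    f = {(2, 0): 1, (1, 1): 1, (1, 0): 1, (0, 0): 1}
    print(homogeneous_components(f))
    # {2: {(2,0): 1, (1,1): 1}, 1: {(1,0): 1}, 0: {(0,0): 1}}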

Theorem B.17. Let $R$ be a commutative ring with unity and let $I$ be a set. If $f,g \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ with $f = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu X^\nu$, $g = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} g_\nu X^\nu$, then
$$\deg(f+g) = \begin{cases} -\infty \leq \max\{\deg f, \deg g\} & \text{if } f = -g, \\ \max\{|\nu| : f_\nu \neq -g_\nu\} \leq \max\{\deg f, \deg g\} & \text{otherwise,} \end{cases} \qquad \text{(B.11a)}$$
$$\deg(fg) \leq \deg f + \deg g. \qquad \text{(B.11b)}$$
If $R$ is an integral domain, then one even has
$$\deg(fg) = \deg f + \deg g. \qquad \text{(B.11c)}$$

Proof. If $f \equiv 0$, then $f + g = g$ and $fg \equiv 0$, i.e. the degree formulas hold if $f \equiv 0$ or $g \equiv 0$. It is also immediate from (B.3) that (B.11a) holds in the remaining case. Using Rem. B.15, we obtain
$$f = \sum_{d_1=0}^{\deg f} h_{d_1}(f), \quad g = \sum_{d_2=0}^{\deg g} h_{d_2}(g) \quad\Rightarrow\quad fg = \sum_{d_1=0}^{\deg f} \sum_{d_2=0}^{\deg g} h_{d_1}(f)\, h_{d_2}(g).$$
According to Lem. B.16, each product $h_{d_1}(f)\,h_{d_2}(g)$ is homogeneous of degree $d_1 + d_2$ (where, in general, $h_{d_1}(f)\,h_{d_2}(g) = 0$ is not excluded), proving (B.11b). If $R$ has no nonzero zero divisors, then, by Prop. B.11, $R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ has no nonzero zero divisors, implying $h_{\deg f}(f)\, h_{\deg g}(g) \neq 0$ and, thus, (B.11c). $\square$

Definition and Remark B.18. Let $R$ be a commutative ring with unity, let $I$ be a set, and let $R'$ be a commutative ring extension of $R$, where $\iota : R \longrightarrow R'$ is a unital ring monomorphism. For each $x := (x_i)_{i \in I} \in (R')^I$, the map

$$\epsilon_x : R[(\mathbb{N}_0)^I_{\mathrm{fin}}] \longrightarrow R', \quad f \mapsto \epsilon_x(f) = \epsilon_x\Bigl(\sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu X^\nu\Bigr) := \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu x^\nu, \qquad \text{(B.12)}$$
where
$$\forall_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} \ \forall_{x=(x_i)_{i \in I} \in (R')^I} \quad x^\nu := \prod_{i \in I} x_i^{\nu(i)} \qquad \text{(B.13)}$$
(the product is, actually, always finite and, thus, well-defined due to the commutativity of $R'$) is called the substitution homomorphism or evaluation homomorphism corresponding to $x$: Indeed, if $x \in (R')^I$, then we may apply Th. B.9 with $S := R'$, $\phi := \iota$, $M := (\mathbb{N}_0)^I_{\mathrm{fin}}$, and $\mu : (M,+) \longrightarrow (R',\cdot)$ defined by
$$\mu(\nu) := x^\nu = \prod_{i \in I} x_i^{\nu(i)}. \qquad \text{(B.14)}$$
Then $\mu(0) = \prod_{i \in I} x_i^0 = 1 \in R'$ and
$$\mu(\nu_1 + \nu_2) = \prod_{i \in I} x_i^{\nu_1(i)+\nu_2(i)} = \prod_{i \in I} \bigl(x_i^{\nu_1(i)} x_i^{\nu_2(i)}\bigr) = \Bigl(\prod_{i \in I} x_i^{\nu_1(i)}\Bigr)\Bigl(\prod_{i \in I} x_i^{\nu_2(i)}\Bigr) = \mu(\nu_1)\,\mu(\nu_2)$$
shows $\mu$ to be a homomorphism. Moreover, for each $f = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu X^\nu \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$,
$$\epsilon_x(f) = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu x^\nu \overset{\text{(B.6)}}{=} \Phi\Bigl(\sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu X^\nu\Bigr) = \Phi(f),$$
identifying $\epsilon_x$ to be the unital ring homomorphism given by (B.6) of Th. B.9. The map $\epsilon_x$ is linear, since, for each $\lambda \in R$ and $f$ as before,
$$\epsilon_x(\lambda f) = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} (\lambda f_\nu)\, x^\nu = \lambda \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu x^\nu = \lambda\,\epsilon_x(f).$$
We call $x \in (R')^I$ a zero or a root of $f \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ if, and only if, $\epsilon_x(f) = 0$.

Remark B.19. While Def. and Rem. B.18 is the analogon to Def. and Rem. 7.10 for polynomials in one variable, we note that condition (7.7) has been replaced by the stronger assumption that $R'$ be commutative. This was used in the proof that $\mu : (M,+) \longrightarrow (R',\cdot)$ is a homomorphism. It would suffice to replace (7.7) by the assumption that, given $x := (x_i)_{i \in I} \in (R')^I$, $ab = ba$ holds for all $a,b \in R \cup \{x_i : i \in I\}$, but this still means that, for polynomials in several variables, one can, in general, no longer use rings of matrices over $R$ for $R'$ (one can no longer substitute matrices over $R$ for the variables of the polynomial).

Lemma B.20. Let $(R,+,\cdot)$ be a commutative ring with unity and $n \in \mathbb{N}$, $n \geq 2$. If $\Phi : (R[X_1,\dots,X_{n-1}])[X_n] \cong R[X_1,\dots,X_n]$ is the ring isomorphism given by Cor. B.10 and $R'$ is a commutative ring extension of $R$, then, for each $x = (x_1,\dots,x_n) \in (R')^n$ and each $(f_0,\dots,f_N) \in (R[X_1,\dots,X_{n-1}])^{N+1}$, $N \in \mathbb{N}_0$:
$$\epsilon_x\Bigl(\Phi\Bigl(\sum_{k=0}^{N} f_k X_n^k\Bigr)\Bigr) = \epsilon_{x_n}\Bigl(\sum_{k=0}^{N} \epsilon_{(x_1,\dots,x_{n-1})}(f_k)\, X_n^k\Bigr).$$

Proof. Suppose, for each $k \in \{0,\dots,N\}$, we have $f_k = \sum_{\nu \in (\mathbb{N}_0)^{n-1}} f_{\nu,k} X^\nu \in R[X_1,\dots,X_{n-1}]$. From Cor. B.10, we recall
$$\Phi\Bigl(\sum_{k=0}^{N} \Bigl(\sum_{\nu \in (\mathbb{N}_0)^{n-1}} f_{\nu,k} X^\nu\Bigr) X_n^k\Bigr) = \sum_{\alpha \in (\mathbb{N}_0)^n} f_\alpha X^\alpha,$$
where
$$f_\alpha := \begin{cases} f_{\nu,k} & \text{if } \alpha(n) = k \text{ and } \alpha\!\upharpoonright_{\{1,\dots,n-1\}} = \nu, \\ 0 & \text{otherwise.} \end{cases}$$
Thus, using the notation of (B.13), for each $x = (x_1,\dots,x_n) \in (R')^n$,
$$\epsilon_x\Bigl(\Phi\Bigl(\sum_{k=0}^{N} \Bigl(\sum_{\nu} f_{\nu,k} X^\nu\Bigr) X_n^k\Bigr)\Bigr) = \sum_{\alpha \in (\mathbb{N}_0)^n} f_\alpha x^\alpha = \sum_{k=0}^{N} \sum_{\nu \in (\mathbb{N}_0)^{n-1}} f_{\nu,k}\, x_n^k \prod_{i=1}^{n-1} x_i^{\nu(i)} = \sum_{k=0}^{N} \Bigl(\sum_{\nu \in (\mathbb{N}_0)^{n-1}} f_{\nu,k} \prod_{i=1}^{n-1} x_i^{\nu(i)}\Bigr) x_n^k = \epsilon_{x_n}\Bigl(\sum_{k=0}^{N} \epsilon_{(x_1,\dots,x_{n-1})}\Bigl(\sum_{\nu} f_{\nu,k} X^\nu\Bigr) X_n^k\Bigr),$$
thereby establishing the case. $\square$

Notation B.21. Let $R$ be a commutative ring with unity, let $I$ be a set, and let $R'$ be a commutative ring extension of $R$. Moreover, let $x := (x_i)_{i \in I} \in (R')^I$ and let $\epsilon_x : R[(\mathbb{N}_0)^I_{\mathrm{fin}}] \longrightarrow R'$ be the substitution homomorphism of Def. and Rem. B.18. We introduce the notation
$$R[x] := R\bigl[(x_i)_{i \in I}\bigr] := \operatorname{Im}\epsilon_x \subseteq R'.$$
If $R$ is an integral domain and $L := R'$ is a field, then we use $R(x) := R\bigl((x_i)_{i \in I}\bigr)$ to denote the field of fractions of $R[x]$, which, using the isomorphism of Def. and Rem. 7.41, we consider to be a subset of $L$, i.e. $R \subseteq R[x] \subseteq R(x) \subseteq L$. If $n \in \mathbb{N}$ and $x_1,\dots,x_n \in R'$, then we also use the simplified notation $R[x_1,\dots,x_n]$ and $R(x_1,\dots,x_n)$, respectively.

Proposition B.22. In the situation of Not. B.21, the following holds:

(a)

$$R[x] = \bigcap_{S \in \mathcal{S}} S, \qquad \mathcal{S} := \bigl\{S \subseteq R' : R \cup \{x_i : i \in I\} \subseteq S \wedge S \text{ is subring of } R'\bigr\},$$
i.e. $R[x]$ is the smallest subring of $R'$ containing $R$ as well as all $x_i$, $i \in I$. Moreover, it also holds that
$$R[x] = \bigcup_{J \subseteq I :\, \#J < \infty} R\bigl[(x_i)_{i \in J}\bigr].$$

(b) If R is an integral domain and L := R′ is a field, then

$$R(x) = \bigcap_{F \in \mathcal{F}} F, \qquad \mathcal{F} := \bigl\{F \subseteq L : R \cup \{x_i : i \in I\} \subseteq F \wedge F \text{ is subfield of } L\bigr\},$$
i.e. $R(x)$ is the smallest subfield of $L$ containing $R$ as well as all $x_i$, $i \in I$. Moreover, it also holds that
$$R(x) = \bigcup_{J \subseteq I :\, \#J < \infty} R\bigl((x_i)_{i \in J}\bigr).$$

Proof. (a): According to [Phi19, Ex. 4.36(c)], $R[x] = \operatorname{Im}\epsilon_x$ is a subring of $R'$. Since $\epsilon_x(r) = r \in \operatorname{Im}\epsilon_x$ for each $r \in R$ as well as $\epsilon_x(X_i) = x_i$ for each $i \in I$, we have $R \cup \{x_i : i \in I\} \subseteq R[x]$, i.e. $R[x] \in \mathcal{S}$ and $\bigcap_{S \in \mathcal{S}} S \subseteq R[x]$. To prove the remaining inclusion, let $S$ be a subring of $R'$ such that $R \cup \{x_i : i \in I\} \subseteq S$. Since $S$ is then closed under sums and products, if $f \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$, then $\epsilon_x(f) \in S$, showing $R[x] = \operatorname{Im}\epsilon_x \subseteq S$ and $R[x] \subseteq \bigcap_{S \in \mathcal{S}} S$. To prove the remaining representation of $R[x]$, let $U := \bigcup_{J \subseteq I :\, \#J < \infty} R[(x_i)_{i \in J}]$. Then $U$ is a subring of $R'$ by [Phi19, Ex. 4.36(f)], since the set $\mathcal{M}$ of finite subsets of $I$ is partially ordered by inclusion with $J_1, J_2 \subseteq J_1 \cup J_2$ for each $J_1, J_2 \in \mathcal{M}$. Clearly, $R \cup \{x_i : i \in I\} \subseteq U$, i.e. $U \in \mathcal{S}$ and $R[x] = \bigcap_{S \in \mathcal{S}} S \subseteq U$. To prove the remaining inclusion, note that $J \subseteq I$ directly implies $R[(x_i)_{i \in J}] \subseteq R[x]$. Thus, $U \subseteq R[x]$, completing the proof.
(b) follows directly from (a) by applying Def. and Rem. 7.41 with $R$ replaced by $R[x]$. The remaining representation of $R(x)$ is proved precisely by the same argument as the analogous representation of $R[x]$ in (a). $\square$

Theorem B.23. Let $R$ be a commutative ring with unity, let $I$ be a set, and consider the map
$$\phi : R[(\mathbb{N}_0)^I_{\mathrm{fin}}] \longrightarrow R^{(R^I)}, \quad f \mapsto \phi(f), \quad \phi(f)(x) := \epsilon_x(f). \qquad \text{(B.15)}$$
We define
$$\operatorname{Pol}\bigl(R, (x_i)_{i \in I}\bigr) := R[(x_i)_{i \in I}] := \phi\bigl(R[(\mathbb{N}_0)^I_{\mathrm{fin}}]\bigr)$$
and call the elements of $\operatorname{Pol}(R, (x_i)_{i \in I})$ polynomial functions (in $n$ variables for $\#I = n \in \mathbb{N}$; in infinitely many variables for $I$ being infinite).

(a) $\phi$ is a unital ring homomorphism (in particular, $\operatorname{Pol}(R, (x_i)_{i \in I})$ is a subring of $R^{(R^I)}$ and $\phi$ is a unital ring epimorphism onto $\operatorname{Pol}(R, (x_i)_{i \in I})$). If $R$ is a field, then $\phi$ is also linear (in particular, $\operatorname{Pol}(R, (x_i)_{i \in I})$ is then a vector subspace of the vector space $R^{(R^I)}$ over $R$ and $\phi$ is then a linear epimorphism onto $\operatorname{Pol}(R, (x_i)_{i \in I})$).

(b) If $R$ is finite and $I$ is nonempty, then $\phi$ is not a monomorphism.

(c) If $F := R$ is an infinite field, then $\phi : R[(\mathbb{N}_0)^I_{\mathrm{fin}}] \longrightarrow \operatorname{Pol}(R, (x_i)_{i \in I})$ is an isomorphism.

Proof. We first recall that we know $R^{(R^I)}$ to be a commutative ring with unity from [Phi19, Ex. 4.42(a)]. Similarly, if $R$ is a field, then $R^{(R^I)}$ is a vector space over $R$ according to [Phi19, Ex. 5.2(a)].
(a): If $f = X^0$, then $\phi(f) \equiv 1$. We also know from Def. and Rem. B.18 that, for each $x \in R^I$, $\epsilon_x$ is a linear ring homomorphism. Thus, if $f,g \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ and $\lambda \in R$, then, for each $x \in R^I$,
$$\phi(f+g)(x) = \epsilon_x(f+g) = \epsilon_x(f) + \epsilon_x(g) = \bigl(\phi(f) + \phi(g)\bigr)(x),$$
$$\phi(fg)(x) = \epsilon_x(fg) = \epsilon_x(f)\,\epsilon_x(g) = \bigl(\phi(f)\,\phi(g)\bigr)(x),$$
$$\phi(\lambda f)(x) = \epsilon_x(\lambda f) = \lambda\,\epsilon_x(f) = \bigl(\lambda\,\phi(f)\bigr)(x).$$
Thus, $\phi$ is a unital linear ring epimorphism onto $\operatorname{Pol}(R, (x_i)_{i \in I})$ (directly from the definition of $\operatorname{Pol}(R, (x_i)_{i \in I})$). In consequence, $\operatorname{Pol}(R, (x_i)_{i \in I})$ is a commutative ring with unity by [Phi19, Prop. 4.37] and, if $R$ is a field, then $\operatorname{Pol}(R, (x_i)_{i \in I})$ is a vector space over $R$ by [Phi19, Prop. 6.3(c)].
(b): If $R$ and $I$ are both finite, then $R^{(R^I)}$ and $\operatorname{Pol}(R, (x_i)_{i \in I}) \subseteq R^{(R^I)}$ are finite as well, whereas $R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ is infinite (if $I \neq \emptyset$). In particular, $\phi$ can not be injective. Now let $I$ be infinite and let $J \subseteq I$ be nonempty and finite. We recall the unital ring monomorphism $\iota : R[(\mathbb{N}_0)^J] \longrightarrow R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ introduced in Ex. B.8(c) and also consider
$$\phi_J : R[(\mathbb{N}_0)^J] \longrightarrow R^{(R^J)}, \quad f \mapsto \phi_J(f), \quad \phi_J(f)(x) := \epsilon_x(f),$$
where we already know $\phi_J$ is not injective. If we define
$$\iota_J : R^{(R^J)} \longrightarrow R^{(R^I)}, \quad \iota_J(P)\bigl((x_i)_{i \in I}\bigr) := P\bigl((x_i)_{i \in J}\bigr),$$
then
$$\phi\!\upharpoonright_{\iota(R[(\mathbb{N}_0)^J])} = \iota_J \circ \phi_J \circ \iota^{-1}: \qquad \text{(B.16)}$$
Indeed, if $f = \sum_{\nu \in (\mathbb{N}_0)^J} f_\nu X^\nu \in R[(\mathbb{N}_0)^J]$, $x := (x_i)_{i \in I} \in R^I$ and $x_J := (x_i)_{i \in J}$, then
$$\phi(\iota(f))(x) = \phi\Bigl(\sum_{\nu \in (\mathbb{N}_0)^J} f_\nu \prod_{i \in J} X_i^{\nu(i)}\Bigr)(x) = \sum_{\nu \in (\mathbb{N}_0)^J} f_\nu \prod_{i \in J} x_i^{\nu(i)} = \phi_J\Bigl(\sum_{\nu \in (\mathbb{N}_0)^J} f_\nu \prod_{i \in J} X_i^{\nu(i)}\Bigr)(x_J) = \phi_J(f)(x_J) = \iota_J\bigl(\phi_J(f)\bigr)(x),$$
thereby proving (B.16). Since $\iota : R[(\mathbb{N}_0)^J] \longrightarrow \iota(R[(\mathbb{N}_0)^J])$ is bijective and $\phi_J$ is not injective, (B.16) shows $\phi$ restricted to $\iota(R[(\mathbb{N}_0)^J])$ is not injective, i.e. $\phi$ is not injective.
(c): It remains to show $\phi$ is a monomorphism, i.e. $\ker\phi = \{0\}$. For polynomials in one variable over an infinite field $F$, this was shown in Th. 7.17(c). Next, we use induction to extend this result to polynomials in finitely many variables: Let $F$ be an infinite field, $n \geq 2$, and let
$$\Phi : (F[X_1,\dots,X_{n-1}])[X_n] \cong F[X_1,\dots,X_n]$$
denote the ring isomorphism given by Cor. B.10. Suppose $f \in F[X_1,\dots,X_n]$ with $\phi(f) = 0 \in F^{(F^n)}$. By induction, we have the isomorphisms
$$\psi : F[X_1,\dots,X_{n-1}] \longrightarrow \operatorname{Pol}\bigl(F, (x_1,\dots,x_{n-1})\bigr), \quad g \mapsto \psi(g), \quad \psi(g)(x_1,\dots,x_{n-1}) := \epsilon_{(x_1,\dots,x_{n-1})}(g)$$
and
$$\phi_n : F[X_n] \longrightarrow \operatorname{Pol}(F), \quad h \mapsto \phi_n(h), \quad \phi_n(h)(x_n) := \epsilon_{x_n}(h).$$
Now let $(f_0,\dots,f_N) \in (F[X_1,\dots,X_{n-1}])^{N+1}$, $N \in \mathbb{N}_0$, such that $\Phi^{-1}(f) = \sum_{k=0}^{N} f_k X_n^k$. Then we know from Lem. B.20 that, for each $x = (x_1,\dots,x_n) \in F^n$,
$$0 = \phi(f)(x) = \epsilon_x(f) = \epsilon_x\Bigl(\Phi\Bigl(\sum_{k=0}^{N} f_k X_n^k\Bigr)\Bigr) = \epsilon_{x_n}\Bigl(\sum_{k=0}^{N} \epsilon_{(x_1,\dots,x_{n-1})}(f_k)\, X_n^k\Bigr),$$
showing $\sum_{k=0}^{N} \epsilon_{(x_1,\dots,x_{n-1})}(f_k)\, X_n^k \in \ker\phi_n$, i.e. $\sum_{k=0}^{N} \epsilon_{(x_1,\dots,x_{n-1})}(f_k)\, X_n^k = 0$, since $\phi_n$ is injective. Thus, $f_0,\dots,f_N \in \ker\psi$, implying $f_0 = \dots = f_N = 0$, since $\psi$ is injective. In consequence, we have shown $\Phi^{-1}(f) = 0$ and $f = 0$ as well, proving $\phi$ to be injective, as desired. This concludes the induction and the proof for the case of polynomials in finitely many variables. It merely remains to consider the case where $\phi(f) = 0$ and $f \in F[(\mathbb{N}_0)^I_{\mathrm{fin}}]$, $I$ being infinite and $F$ being an infinite field. According to Ex. B.8(c), $f \in F[(\mathbb{N}_0)^{f(I)}] \subseteq F[(\mathbb{N}_0)^I_{\mathrm{fin}}]$, where $f(I) \subseteq I$ is the finite set defined in Ex. B.8(c). Thus, $f$ is, actually, a polynomial in only finitely many variables and, since we already know $\phi\!\upharpoonright_{F[(\mathbb{N}_0)^{f(I)}]}$ to be injective, $\phi(f) = 0$ implies $f = 0$, proving $\phi$ to be injective. $\square$

Remark B.24. Let $R$ be a commutative ring with unity, let $I$ be a set. If $P : R^I \longrightarrow R$ is a polynomial function as defined in Th. B.23, then there exists $f = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu X^\nu \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ with $P = \phi(f)$. Thus, for each $x = (x_i)_{i \in I} \in R^I$,
$$P(x) = \epsilon_x(f) = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu\, x^\nu = \sum_{\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}} f_\nu \prod_{i \in I} x_i^{\nu(i)} \qquad \text{(B.17)}$$

(all sums and products being, actually, finite). Thus, the polynomial functions are precisely the linear combinations of the monomial functions $x \mapsto x^\nu$, $\nu \in (\mathbb{N}_0)^I_{\mathrm{fin}}$. Caveat: In general, the representation of $P$ given by (B.17) is not unique: For example, if $R$ is finite and $I$ is nonempty, then it is not unique due to Th. B.23(b) (also cf. Ex. 7.18 and Rem. 7.19).

Remark B.25. In Cor. 7.37, we concluded that the ring $S := F[X]$ is factorial for each field $F$ (i.e. each $0 \neq a \in S \setminus S^*$ admits a factorization into prime elements, which is unique up to the order of the primes and up to association), as a consequence of $F[X]$ being a principal ideal domain. One can actually show, much more generally, that, if $R$ is a factorial ring, then $R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ is a factorial ring as well (proofs for the case of polynomial rings in finitely many variables can be found, e.g., in [Bos13, Sec. 2.7] and [Lan05, Ch. 4 §2] – the case where $I$ is infinite can then be treated by the method we used several times above, using that each element $f \in R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ is, actually, a polynomial in only finitely many variables). The reason the general result does not follow as easily as the one in Cor. 7.37 lies in the fact that, in general, $R[(\mathbb{N}_0)^I_{\mathrm{fin}}]$ is no longer a principal ideal domain.
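The evaluation (B.17) of a polynomial function in finitely many variables can be sketched as follows (an illustrative helper, assuming the dictionary representation from the earlier sketches):

    def evaluate(f, x):
        """Evaluation homomorphism per (B.17): f maps exponent tuples to
        coefficients, x is the point (x_1, ..., x_n)."""
        total = 0
        for nu, coeff in f.items():
            term = coeff
            for xi, e in zip(x, nu):
                term *= xi ** e
            total += term
        return total

    # P(x_1, x_2) = x_1^2 x_2 + 3 x_1 + 1 at (2, 5):
    f = {(2, 1): 1, (1, 0): 3, (0, 0): 1}
    print(evaluate(f, (2, 5)))   # 20 + 6 + 1 = 27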

C Quotient Rings

In [Phi19, Def. 4.26], we defined the quotient group $G/N$ of a group $G$ with respect to a normal subgroup $N$; in [Phi19, Sec. 6.2], we defined the quotient space $V/U$ of a vector space $V$ over a field $F$ with respect to a subspace $U$. For a ring $R$, there exists an analogous notion, where ideals $\mathfrak{a} \subseteq R$ now play the role of the normal subgroup. For simplicity, we will restrict ourselves to the case of a commutative ring $R$ with unity. According to Def. 7.22(a), every ideal $\mathfrak{a} \subseteq R$ is an additive subgroup of $R$ and we can form the quotient group $R/\mathfrak{a}$. We will see below that we can even make $R/\mathfrak{a}$ into a commutative ring with unity, called the quotient ring or factor ring of $R$ with respect to $\mathfrak{a}$, where $\mathfrak{a}$ being an ideal guarantees the well-definedness of the multiplication on $R/\mathfrak{a}$. As we write $(R,+)$ as an additive group, we write the respective cosets (i.e. the elements of $R/\mathfrak{a}$) as $x + \mathfrak{a}$, $x \in R$.

Theorem C.1. Let $R$ be a commutative ring with unity and let $\mathfrak{a} \subseteq R$ be an ideal in $R$.

(a) The compositions
$$+ : R/\mathfrak{a} \times R/\mathfrak{a} \longrightarrow R/\mathfrak{a}, \quad (x+\mathfrak{a}) + (y+\mathfrak{a}) := x + y + \mathfrak{a}, \qquad \text{(C.1a)}$$
$$\cdot : R/\mathfrak{a} \times R/\mathfrak{a} \longrightarrow R/\mathfrak{a}, \quad (x+\mathfrak{a}) \cdot (y+\mathfrak{a}) := xy + \mathfrak{a}, \qquad \text{(C.1b)}$$
are well-defined, i.e. the results do not depend on the chosen representatives of the respective cosets.

(b) The natural (group) epimorphism of [Phi19, Th. 4.27(a)],
$$\phi_{\mathfrak{a}} : R \longrightarrow R/\mathfrak{a}, \quad \phi_{\mathfrak{a}}(x) := x + \mathfrak{a}, \qquad \text{(C.2)}$$
satisfies
$$\forall_{x,y \in R} \quad \phi_{\mathfrak{a}}(x+y) = \phi_{\mathfrak{a}}(x) + \phi_{\mathfrak{a}}(y), \qquad \text{(C.3a)}$$
$$\forall_{x,y \in R} \quad \phi_{\mathfrak{a}}(xy) = \phi_{\mathfrak{a}}(x) \cdot \phi_{\mathfrak{a}}(y). \qquad \text{(C.3b)}$$

(c) $R/\mathfrak{a}$ with the compositions of (a) forms a commutative ring with unity and $\phi_{\mathfrak{a}}$ of (b) constitutes a unital ring epimorphism.

Proof. (a): The composition $+$ is well-defined by [Phi19, Th. 4.27(a)]. To verify that $\cdot$ is well-defined as well, suppose $x,y,x',y' \in R$ are such that $x + \mathfrak{a} = x' + \mathfrak{a}$ and $y + \mathfrak{a} = y' + \mathfrak{a}$. We need to show $xy + \mathfrak{a} = x'y' + \mathfrak{a}$. There exist $a_x, a_y \in \mathfrak{a}$ such that $x = x' + a_x$, $y = y' + a_y$. Then we obtain
$$xy + \mathfrak{a} = (x' + a_x)(y' + a_y) + \mathfrak{a} = x'y' + x'a_y + a_x y' + a_x a_y + \mathfrak{a} = x'y' + \mathfrak{a},$$
as needed, where we used that $\mathfrak{a}$ is an additive subgroup of $R$ and that $az = za \in \mathfrak{a}$ for each $a \in \mathfrak{a}$, $z \in R$.
(b): (C.3a) holds, as $\phi_{\mathfrak{a}}$ is a homomorphism with respect to $+$ by [Phi19, Th. 4.27(a)]. To verify (C.3b), let $x,y \in R$. Then
$$\phi_{\mathfrak{a}}(xy) = xy + \mathfrak{a} \overset{\text{(C.1b)}}{=} (x+\mathfrak{a}) \cdot (y+\mathfrak{a}) = \phi_{\mathfrak{a}}(x) \cdot \phi_{\mathfrak{a}}(y),$$
thereby establishing the case.
(c): In view of (b), $R/\mathfrak{a}$ is a ring with unity by [Phi19, Prop. 4.37] and $\cdot$ is commutative by [Phi19, Prop. 4.11(c)]. $\square$
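For $R = \mathbb{Z}$ and $\mathfrak{a} = n\mathbb{Z}$, Th. C.1 yields the familiar ring of integers modulo $n$. The following minimal Python model (illustrative, not from the notes) demonstrates, in particular, that the product (C.1b) does not depend on the chosen representatives:

    class Zmod:
        """A minimal model of the quotient ring Z/nZ (Th. C.1 with R = Z,
        a = nZ); elements are cosets x + nZ, stored via a representative."""
        def __init__(self, x, n):
            self.n, self.x = n, x % n     # reduce to the canonical representative

        def __add__(self, other):         # (x + a) + (y + a) := x + y + a, (C.1a)
            return Zmod(self.x + other.x, self.n)

        def __mul__(self, other):         # (x + a) * (y + a) := x*y + a, (C.1b)
            return Zmod(self.x * other.x, self.n)

        def __repr__(self):
            return f"{self.x} + {self.n}Z"

    # Well-definedness: representatives 2 and 14 of the same coset in Z/12Z
    # give the same product coset:
    print(Zmod(2, 12) * Zmod(5, 12))    # 10 + 12Z
    print(Zmod(14, 12) * Zmod(5, 12))   # 10 + 12Z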

Lemma C.2. Let $(G,\cdot)$ be a group (not necessarily commutative) and let $N \subseteq G$ be a normal subgroup of $G$. Consider the map
$$\Phi_N : \mathcal{P}(G) \longrightarrow \mathcal{P}(G/N), \quad \Phi_N(A) := \phi_N(A), \qquad \text{(C.4)}$$
where $\phi_N$ is the natural epimorphism of [Phi19, (4.25)].

(a) $U$ is a subgroup of $G$ if, and only if, $\Phi_N(U)$ is a subgroup of $G/N$.

(b) If one lets
$$\mathcal{A} := \bigl\{U \in \mathcal{P}(G) : U \text{ is subgroup of } G \text{ and } N \subseteq U \subseteq G\bigr\},$$
$$\mathcal{B} := \bigl\{V \in \mathcal{P}(G/N) : V \text{ is subgroup of } G/N\bigr\},$$
then $\Phi_N : \mathcal{A} \longrightarrow \mathcal{B}$ is bijective, where
$$\Phi_N^{-1} : \mathcal{B} \longrightarrow \mathcal{A}, \quad \Phi_N^{-1}(B) := \phi_N^{-1}(B). \qquad \text{(C.5)}$$

Proof. (a) is merely a special case of [Phi19, Th. 4.20(a),(d)].
(b): According to (a), $\Phi_N$ does map $\mathcal{A}$ into $\mathcal{B}$. Since $N = \ker\phi_N$, (a) also shows $\Phi_N(\mathcal{A}) = \mathcal{B}$. It remains to show $\Phi_N$ is injective. To this end, suppose $U, V \in \mathcal{A}$ with $U \subseteq V$, $U \neq V$. If $x \in G$ is such that $\phi_N(x) = xN \in \phi_N(U)$, then there exist $u \in U$ and $n_u, n_x \in N$ such that $xn_x = un_u$, i.e. $x = un_u n_x^{-1} \in U$, since $U$ is a subgroup of $G$ with $N \subseteq U$. The contraposition to what we have just shown says that $v \in V \setminus U$ implies $\phi_N(v) \notin \phi_N(U)$, i.e. $\phi_N(v) \in \Phi_N(V) \setminus \Phi_N(U)$, proving $\Phi_N : \mathcal{A} \longrightarrow \mathcal{B}$ to be injective. Now that we know $\Phi_N : \mathcal{A} \longrightarrow \mathcal{B}$ to be bijective, (C.5) is clear from (C.4). $\square$

Next, we formulate and prove a version of the previous lemma for commutative rings with unity (where ideals now replace subgroups).

Lemma C.3. Let $R$ be a commutative ring with unity and let $\mathfrak{a} \subseteq R$ be an ideal in $R$. Consider the map of (C.4), which we here denote as
$$\Phi_{\mathfrak{a}} : \mathcal{P}(R) \longrightarrow \mathcal{P}(R/\mathfrak{a}), \quad \Phi_{\mathfrak{a}}(A) := \phi_{\mathfrak{a}}(A).$$

(a) $\mathfrak{b}$ is an ideal in $R$ if, and only if, $\Phi_{\mathfrak{a}}(\mathfrak{b})$ is an ideal in $R/\mathfrak{a}$.

(b) If one lets
$$\mathcal{A}_R := \bigl\{\mathfrak{b} \in \mathcal{P}(R) : \mathfrak{b} \text{ is ideal in } R \text{ and } \mathfrak{a} \subseteq \mathfrak{b} \subseteq R\bigr\},$$
$$\mathcal{B}_R := \bigl\{B \in \mathcal{P}(R/\mathfrak{a}) : B \text{ is ideal in } R/\mathfrak{a}\bigr\},$$
then $\Phi_{\mathfrak{a}} : \mathcal{A}_R \longrightarrow \mathcal{B}_R$ is bijective, where
$$\Phi_{\mathfrak{a}}^{-1} : \mathcal{B}_R \longrightarrow \mathcal{A}_R, \quad \Phi_{\mathfrak{a}}^{-1}(B) := \phi_{\mathfrak{a}}^{-1}(B).$$

Proof. (a) is merely a special case of Prop. 7.24(c).
(b): According to (a), $\Phi_{\mathfrak{a}}$ does map $\mathcal{A}_R$ into $\mathcal{B}_R$. Since $\mathfrak{a} = \ker\phi_{\mathfrak{a}}$, (a) also shows $\Phi_{\mathfrak{a}}(\mathcal{A}_R) = \mathcal{B}_R$. If $\mathcal{A}$ is the set of Lem. C.2(b), then $\mathcal{A}_R \subseteq \mathcal{A}$ and, thus, $\Phi_{\mathfrak{a}} : \mathcal{A}_R \longrightarrow \mathcal{B}_R$ is injective by Lem. C.2(b). The representation of $\Phi_{\mathfrak{a}}^{-1}$ is then also clear from (C.5). $\square$

In [Phi19, Th. 4.27(b)], we proved the isomorphism theorem for groups: If $G$ and $H$ are groups and $\phi : G \longrightarrow H$ is a homomorphism, then $G/\ker\phi \cong \operatorname{Im}\phi$. Now let $R$ be a commutative ring with unity and let $S$ be another ring. Since $R$ and $S$ are, in particular, additive groups, if $\phi : R \longrightarrow S$ is a ring homomorphism, then $(R/\ker\phi, +) \cong (\operatorname{Im}\phi, +)$ and, since $\ker\phi = \phi^{-1}\{0\}$ is an ideal in $R$ by Prop. 7.24(b), it is natural to ask whether $R/\ker\phi$ and $\operatorname{Im}\phi$ are isomorphic as rings. In Th. C.4 below we see this, indeed, to be the case.

Theorem C.4 (Isomorphism Theorem). Let $R$ be a commutative ring with unity and let $S$ be another ring. If $\phi : R \longrightarrow S$ is a ring homomorphism, then
$$(R/\ker\phi, +, \cdot) \cong (\operatorname{Im}\phi, +, \cdot). \qquad \text{(C.6)}$$
More precisely, the map
$$f : R/\ker\phi \longrightarrow \operatorname{Im}\phi, \quad f(x + \ker\phi) := \phi(x), \qquad \text{(C.7)}$$
is well-defined and constitutes a ring isomorphism. If $f_e : R \longrightarrow R/\ker\phi$ denotes the natural epimorphism and $\iota : \operatorname{Im}\phi \longrightarrow S$, $\iota(x) := x$, denotes the embedding, then $f_m : R/\ker\phi \longrightarrow S$, $f_m := \iota \circ f$, is a ring monomorphism such that
$$\phi = f_m \circ f_e. \qquad \text{(C.8)}$$

Proof. All assertions, except that $f$, $f_e$, and $f_m$ are multiplicative homomorphisms, were already proved in [Phi19, Th. 4.27(b)]. Moreover, $f_e$ is a multiplicative homomorphism by (C.3b) and, thus, so is $f_m$ (by [Phi19, Prop. 4.11(a)]), once we have shown $f$ to be a multiplicative homomorphism. Thus, it merely remains to show
$$f\bigl((x + \ker\phi)(y + \ker\phi)\bigr) = f(x + \ker\phi)\, f(y + \ker\phi)$$
for each $x,y \in R$. Indeed, if $x,y \in R$, then
$$f\bigl((x+\ker\phi)(y+\ker\phi)\bigr) = f(xy + \ker\phi) = \phi(xy) = \phi(x)\,\phi(y) = f(x+\ker\phi)\, f(y+\ker\phi),$$
as desired. $\square$

Definition C.5. Let $R$ be a commutative ring with unity and let $\mathfrak{a}$ be an ideal in $R$.

(a) $\mathfrak{a}$ is called a proper ideal if, and only if, $\mathfrak{a} \neq R$.

(b) $\mathfrak{a}$ is called a prime ideal if, and only if, $\mathfrak{a}$ is proper and
$$\forall_{x,y \in R} \quad \bigl(xy \in \mathfrak{a} \;\Rightarrow\; x \in \mathfrak{a} \vee y \in \mathfrak{a}\bigr).$$

(c) $\mathfrak{a}$ is called a maximal ideal if, and only if, $\mathfrak{a}$ is proper and
$$\forall_{\mathfrak{b} \subseteq R,\ \mathfrak{b} \text{ ideal}} \quad \bigl(\mathfrak{a} \subseteq \mathfrak{b} \;\Rightarrow\; \mathfrak{b} = \mathfrak{a} \vee \mathfrak{b} = R\bigr).$$

Lemma C.6. Let $R$ be an integral domain, $0 \neq p \in R$. Then the following statements are equivalent:

(i) p is prime.

(ii) (p) is prime.

Proof. Suppose $p$ is prime. Then $0 \neq p \in R \setminus R^*$, showing $(p)$ to be proper. If $x,y \in R$ with $xy \in (p)$, then $xy = ap$ with $a \in R$, showing $p \mid xy$. As $p$ is prime, this means $p \mid x \vee p \mid y$, showing $x \in (p)$ or $y \in (p)$, i.e. $(p)$ is prime. Conversely, assume $(p)$ to be prime. Then $p \in R \setminus R^*$, since $(p)$ is proper. If $p \mid xy$, then there exists $a \in R$ such that $pa = xy$, i.e. $xy \in (p)$. Thus, as $(p)$ is prime, $x \in (p)$ or $y \in (p)$. If $x \in (p)$, then $x = a_x p$ with $a_x \in R$, showing $p \mid x$; if $y \in (p)$, then $y = a_y p$ with $a_y \in R$, showing $p \mid y$. In consequence, $p$ is prime. $\square$

Lemma C.7. Let R be a commutative ring with unity. Then the following statements are equivalent:

(i) R is a field.

(ii) The ideal (0) is maximal in R.

Proof. If $R$ is a field, then we know from Ex. 7.27(a) that $(0)$ and $R$ are the only ideals in $R$, proving $(0)$ to be maximal. Conversely, assume $(0)$ to be a maximal ideal in $R$. If $0 \neq x \in R$, then the ideal $(x)$ must be all of $R$ (since $(x) \neq (0)$ and $(0)$ is maximal). Since $(x) = R$ and $1 \in R$, there exists $y \in R$ such that $xy = 1$, showing $R \setminus \{0\} = R^*$ (every nonzero element of $R$ is invertible), i.e. $R$ is a field. $\square$

Theorem C.8. Let $R$ be a commutative ring with unity and let $\mathfrak{a}$ be an ideal in $R$.

(a) The following statements are equivalent:

(i) $\mathfrak{a}$ is proper.

(ii) $R/\mathfrak{a} \neq \{0\}$ (i.e. the quotient ring contains more than one element).

(b) The following statements are equivalent:

(i) $\mathfrak{a}$ is prime.

(ii) $R/\mathfrak{a}$ is an integral domain.

(c) The following statements are equivalent:

(i) $\mathfrak{a}$ is maximal.

(ii) $(0)$ is a maximal ideal in $R/\mathfrak{a}$.

(iii) $R/\mathfrak{a}$ is a field (called the quotient field or factor field of $R$ with respect to $\mathfrak{a}$).

Proof. (a): If $\mathfrak{a}$ is not proper, then $\mathfrak{a} = R$ and $R/\mathfrak{a} = \{0\}$. If $\mathfrak{a}$ is proper, then there exists $x \in R \setminus \mathfrak{a}$, i.e. $x + \mathfrak{a} \neq \mathfrak{a}$ (since $\mathfrak{a}$ is an additive subgroup of $R$). Thus, $R/\mathfrak{a}$ contains at least two elements.
(b): If $\mathfrak{a}$ is prime, then it is proper and, by (a), $0 \neq 1$ in $R/\mathfrak{a}$. Moreover, if $x,y \in R$ are such that
$$\mathfrak{a} = (x+\mathfrak{a})(y+\mathfrak{a}) = xy + \mathfrak{a}, \qquad \text{(C.9)}$$
then $xy \in \mathfrak{a}$ and $x \in \mathfrak{a}$ or $y \in \mathfrak{a}$ (as $\mathfrak{a}$ is prime). Thus, $x + \mathfrak{a} = \mathfrak{a}$ or $y + \mathfrak{a} = \mathfrak{a}$, showing $R/\mathfrak{a}$ to be an integral domain. Conversely, if $R/\mathfrak{a}$ is an integral domain, then $0 \neq 1$ in $R/\mathfrak{a}$ and $\mathfrak{a}$ is proper by (a). Moreover, if $x,y \in R$ are such that $xy \in \mathfrak{a}$, then (C.9) holds, implying $x + \mathfrak{a} = \mathfrak{a}$ or $y + \mathfrak{a} = \mathfrak{a}$ (as the integral domain $R/\mathfrak{a}$ has no nonzero zero divisors). Thus, $x \in \mathfrak{a}$ or $y \in \mathfrak{a}$, proving $\mathfrak{a}$ to be prime.
(c): Letting $\mathcal{A}_R$ and $\mathcal{B}_R$ be as in Lem. C.3(b), we have the equivalences
$$\text{(i)} \;\Leftrightarrow\; \#\mathcal{A}_R = 2 \;\overset{\text{Lem. C.3(b)}}{\Leftrightarrow}\; \#\mathcal{B}_R = 2 \;\Leftrightarrow\; \text{(ii)}.$$
The equivalence "(ii) $\Leftrightarrow$ (iii)" is provided by Lem. C.7. $\square$

Corollary C.9. Let $R$ be a commutative ring with unity and let $\mathfrak{a}$ be an ideal in $R$. If $\mathfrak{a}$ is maximal, then $\mathfrak{a}$ is prime.

Proof. If $\mathfrak{a}$ is maximal, then (by Th. C.8(c)) $R/\mathfrak{a}$ is a field, implying, in particular, that $R/\mathfrak{a}$ is an integral domain. Thus, by Th. C.8(b), $\mathfrak{a}$ is prime. $\square$

Theorem C.10. Let $R$ be an integral domain and $0 \neq a \in R \setminus R^*$. Consider the following statements:

(i) (a) is a maximal ideal in R.

(ii) a is prime.

(iii) $a$ is irreducible.

Then (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii) always holds. Moreover, if $R$ is a principal ideal domain, then the three statements are even equivalent.

Proof. "(i) $\Rightarrow$ (ii)": If $(a)$ is maximal, then, according to Cor. C.9, $(a)$ is prime. Then $a$ is prime by Lem. C.6.
"(ii) $\Rightarrow$ (iii)" was already shown in Prop. 7.30(e).
Now assume $R$ to be a principal ideal domain. It only remains to show "(iii) $\Rightarrow$ (i)". To this end, suppose $a$ is irreducible and $b \in R$ is such that $(a) \subseteq (b) \subseteq R$. Then $a \in (b)$, i.e. there exists $c \in R$ such that $a = cb$. As $a$ is irreducible, this implies $c \in R^*$ or $b \in R^*$. If $b \in R^*$, then $(b) = R$. If $c \in R^*$, then $b = ac^{-1}$, proving $(b) \subseteq (a)$ and $(a) = (b)$. Thus, $(a)$ is maximal in $R$. $\square$

Example C.11. We already know from Ex. 7.27(b) that $\mathbb{Z}$ is a principal ideal domain. Thus, the ideals in $\mathbb{Z}$ are precisely the sets $(n) = n\mathbb{Z}$ with $n \in \mathbb{N}_0$ (where $(n) = (-n)$). We also already know from [Phi19, Ex. 4.38] that the quotient ring $\mathbb{Z}_n := \mathbb{Z}/n\mathbb{Z}$ is a field if, and only if, $n \in \mathbb{N}$ is prime (in [Phi19, Ex. 4.38], we still avoided using the term quotient ring). We can now concisely recover and summarize these previous results using the notions and results of the present section by stating that the following assertions are equivalent for $n \in \mathbb{N}$:

(i) $n$ is irreducible.

(ii) n is prime.

(iii) (n)= nZ is a prime ideal.

(iv) (n)= nZ is a maximal ideal.

(v) Zn is an integral domain.

(vi) Zn is a field.

Indeed, as $\mathbb{Z}$ is a principal ideal domain, (i) $\Leftrightarrow$ (ii) $\Leftrightarrow$ (iv) by Th. C.10. Moreover, (iii) $\Leftrightarrow$ (v) by Th. C.8(b) and (iv) $\Leftrightarrow$ (vi) by Th. C.8(c). The still missing implication (v) $\Rightarrow$ (vi) was, actually, also already shown in [Phi19, Ex. 4.38] (it was shown that, if $n$ is not prime, then $\mathbb{Z}_n$ has nonzero zero divisors). Since $\mathbb{Z}$ has no nonzero zero divisors, $(0)$ is a prime ideal in $\mathbb{Z}$ – it is the only prime ideal in $\mathbb{Z}$ that is not a maximal ideal. —
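The equivalences of Ex. C.11 can be confirmed by brute force for small $n$; a hedged sketch (the helper names are illustrative):

    def zero_divisors(n):
        """Nonzero zero divisors of Z_n = Z/nZ."""
        return [x for x in range(1, n) if any((x * y) % n == 0 for y in range(1, n))]

    def units(n):
        """Invertible elements of Z_n."""
        return [x for x in range(1, n) if any((x * y) % n == 1 for y in range(n))]

    for n in range(2, 13):
        is_domain = not zero_divisors(n)      # (v): Z_n is an integral domain
        is_field = len(units(n)) == n - 1     # (vi): every nonzero element invertible
        print(n, is_domain, is_field)         # both hold exactly for n prime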

In the following Th. C.12, we use Zorn’s lemma [Phi19, Th. 5.22] to prove the existence of maximal ideals. In Th. D.16 below, this result will then be used to establish the existence of algebraic closures.

Theorem C.12. Let $R$ be a commutative ring with unity and let $\mathfrak{a}$ be a proper ideal in $R$. Then there exists a maximal ideal $\mathfrak{m}$ in $R$ such that $\mathfrak{a} \subseteq \mathfrak{m}$ (in particular, each commutative ring with unity contains at least one maximal ideal).

Proof. Let $\mathcal{S}$ be the set of all proper ideals in $R$ that contain $\mathfrak{a}$. Note that set inclusion $\subseteq$ provides a partial order on $\mathcal{S}$. If $C \neq \emptyset$ is a totally ordered subset of $\mathcal{S}$, then, by Prop. 7.24(f), $\mathfrak{s} := \bigcup_{\mathfrak{c} \in C} \mathfrak{c}$ is an ideal in $R$. If $\mathfrak{c} \in C$, then $1 \notin \mathfrak{c}$, since $\mathfrak{c}$ is proper. Thus, $1 \notin \mathfrak{s}$, showing $\mathfrak{s}$ to be proper, i.e. $\mathfrak{s} \in \mathcal{S}$ provides an upper bound for $C$. Thus, Zorn's lemma [Phi19, Th. 5.22] applies, yielding a maximal element $\mathfrak{m} \in \mathcal{S}$ (i.e. maximal in $\mathcal{S}$ with respect to $\subseteq$), which is, thus, a maximal ideal in $R$ that contains $\mathfrak{a}$. $\square$

D Algebraic Field Extensions

D.1 Basic Definitions and Properties

Definition D.1. Let $F \subseteq L$ be fields, i.e. let $L$ be a field extension of $F$.

(a) We know from [Phi19, Ex. 5.2(b)] that $L$ is a vector space over $F$. The dimension of $L$ as a vector space over $F$ is denoted $[L : F]$ and is called the degree of $L$ over $F$. The field extension is called finite for $[L : F] < \infty$ and infinite for $[L : F] = \infty$.

(b) $\lambda \in L$ is called algebraic over $F$ if, and only if, $\ker\epsilon_\lambda \neq \{0\}$, where $\epsilon_\lambda : F[X] \longrightarrow L$ is the substitution homomorphism of Def. and Rem. 7.10 (i.e. if, and only if, $\epsilon_\lambda(f) = 0$ for some nonzero polynomial $f$ with coefficients in $F$); $\lambda \in L$ is called transcendental over $F$ if, and only if, $\lambda$ is not algebraic over $F$.

(c) The field extension $L$ is called algebraic over $F$ if, and only if, each $\lambda \in L$ is algebraic over $F$.

Example D.2. (a) If $F$ is a field and $\lambda \in F$, then $\lambda$ is algebraic over $F$ due to $\epsilon_\lambda(X - \lambda) = 0$.

(b) Consider the field extension $\mathbb{C}$ of $\mathbb{R}$. Then $[\mathbb{C} : \mathbb{R}] = 2$ and $\mathbb{C}$ is algebraic over $\mathbb{R}$, since each $z \in \mathbb{C}$ satisfies $\epsilon_z\bigl(X^2 - (2\operatorname{Re} z)\,X + |z|^2\bigr) = 0$: Indeed,
$$X^2 - 2\operatorname{Re} z\, X + |z|^2 = X^2 - (z + \bar z)X + z\bar z = (X - z)(X - \bar z).$$

(c) Consider the field extension $\mathbb{R}$ of $\mathbb{Q}$. For each $q \in \mathbb{Q}_0^+$ and each $n \in \mathbb{N}$, $\lambda := \sqrt[n]{q} \in \mathbb{R}$ is algebraic over $\mathbb{Q}$, since $\epsilon_\lambda(X^n - q) = 0$. The real numbers $e, \pi \in \mathbb{R}$ are transcendental over $\mathbb{Q}$; however, the proof is fairly involved (see, e.g., [Lan05, App. 1]). We can conclude $[\mathbb{R} : \mathbb{Q}] = \infty$ and that $\mathbb{R}$ is not algebraic over $\mathbb{Q}$ using simple cardinality considerations: Since $\mathbb{Q}^n$ is countable for each $n \in \mathbb{N}$ and $\mathbb{R}$ is uncountable, we see $[\mathbb{R} : \mathbb{Q}] = \infty$. Since, for each $n \in \mathbb{N}_0$, the set
$$\mathbb{Q}[X]_n := \operatorname{span}\{1, X, \dots, X^n\} = \{f \in \mathbb{Q}[X] : \deg f \leq n\}$$
forms an $(n+1)$-dimensional vector subspace of the vector space $\mathbb{Q}[X]$ over $\mathbb{Q}$, each $\mathbb{Q}[X]_n \cong \mathbb{Q}^{n+1}$ is countable, and $\mathbb{Q}[X] = \bigcup_{n \in \mathbb{N}_0} \mathbb{Q}[X]_n$ is countable as well. According to Cor. 7.14, $0 \neq f \in \mathbb{Q}[X]_n \subseteq \mathbb{R}[X]$ has at most $n$ zeros (in $\mathbb{Q}$ and in $\mathbb{R}$), showing that
$$S := \{x \in \mathbb{R} : x \text{ algebraic over } \mathbb{Q}\} = \bigcup_{n \in \mathbb{N}} \{x \in \mathbb{R} : \epsilon_x(f) = 0 \text{ for some } f \in \mathbb{Q}[X]_n\}$$
is countable, i.e. $S \subsetneq \mathbb{R}$ and, thus, $\mathbb{R}$ is not algebraic over $\mathbb{Q}$.

Theorem D.3 (Degree Theorem). Consider fields $F, K, L$ such that $F \subseteq K \subseteq L$. Then
$$[L : F] = [L : K] \cdot [K : F], \qquad \text{(D.1)}$$
where the equation holds in $\overline{\mathbb{N}} := \mathbb{N} \cup \{\infty\}$ if one uses the convention that $n \cdot \infty = \infty \cdot n = \infty$ for each $n \in \overline{\mathbb{N}}$.

Proof. Let $B_K$ be a basis of $K$ as a vector space over $F$ and let $B_L$ be a basis of $L$ as a vector space over $K$. It suffices to show that
$$B := \{\kappa\lambda : \kappa \in B_K \wedge \lambda \in B_L\}$$
is a basis of $L$ as a vector space over $F$. We first show $B$ to be linearly independent over $F$. Suppose
$$\sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij}\,\kappa_i \lambda_j = 0,$$
where $\kappa_1,\dots,\kappa_n$ are distinct elements from $B_K$ ($n \in \mathbb{N}$), $\lambda_1,\dots,\lambda_m$ are distinct elements from $B_L$ ($m \in \mathbb{N}$), and $c_{ij} \in F$. Since
$$0 = \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij}\,\kappa_i \lambda_j = \sum_{j=1}^{m} \Bigl(\sum_{i=1}^{n} c_{ij}\,\kappa_i\Bigr) \lambda_j,$$
each $\sum_{i=1}^{n} c_{ij}\,\kappa_i \in K$, and $\lambda_1,\dots,\lambda_m$ are linearly independent over $K$, we obtain
$$\forall_{j \in \{1,\dots,m\}} \quad \sum_{i=1}^{n} c_{ij}\,\kappa_i = 0,$$
implying $c_{ij} = 0$ for all $(i,j) \in \{1,\dots,n\} \times \{1,\dots,m\}$, due to the linear independence of $\kappa_1,\dots,\kappa_n$ over $F$. This completes the proof that $B$ is linearly independent over $F$. It remains to show $B$ is a generating set for $L$ over $F$. To this end, let $\alpha \in L$. Then there exist $m \in \mathbb{N}$ as well as $\lambda_1,\dots,\lambda_m \in B_L$ and $\beta_1,\dots,\beta_m \in K$ such that $\alpha = \sum_{j=1}^{m} \beta_j \lambda_j$. Next, there exist $n \in \mathbb{N}$ as well as $\kappa_1,\dots,\kappa_n \in B_K$ and $c_{ij} \in F$ such that
$$\forall_{j \in \{1,\dots,m\}} \quad \beta_j = \sum_{i=1}^{n} c_{ij}\,\kappa_i$$
(by using $c_{ij} := 0$ if necessary, we can use the same $\kappa_1,\dots,\kappa_n$ for each $\beta_j$). Thus, we obtain
$$\alpha = \sum_{j=1}^{m} \beta_j \lambda_j = \sum_{j=1}^{m} \sum_{i=1}^{n} c_{ij}\,\kappa_i \lambda_j,$$
proving $B$ to be a generating set for $L$ over $F$. $\square$
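The multiplicativity (D.1) can be illustrated on the concrete tower $\mathbb{Q} \subseteq \mathbb{Q}(\sqrt{2}) \subseteq \mathbb{Q}(2^{1/6})$. The following sketch (not from the notes) assumes the sympy library; by Th. D.5(b) below, the degree of a simple extension equals the degree of the generator's minimal polynomial (sympy returns an integer-primitive multiple of the monic minimal polynomial):

    from sympy import Rational, Symbol, degree, minimal_polynomial, sqrt

    x = Symbol('x')
    alpha = Rational(2) ** Rational(1, 6)              # 2**(1/6)

    # [Q(2**(1/6)) : Q] = deg(x**6 - 2) = 6:
    print(degree(minimal_polynomial(alpha, x), x))     # 6
    # [Q(sqrt(2)) : Q] = deg(x**2 - 2) = 2:
    print(degree(minimal_polynomial(sqrt(2), x), x))   # 2
    # Over the intermediate field Q(sqrt(2)), alpha satisfies x**3 - sqrt(2),
    # so [Q(alpha) : Q(sqrt(2))] = 3, consistent with (D.1): 6 = 3 * 2.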

Theorem D.4. Let $F, L$ be fields such that $L$ is a finite field extension of $F$ (i.e. $F \subseteq L$ and $[L : F] < \infty$). Then $L$ is an algebraic field extension of $F$ (however, there also exist infinite algebraic field extensions, see Ex. D.12 below).

Proof. If $[L : F] = n \in \mathbb{N}$ and $\lambda \in L$, then the $n+1$ elements $\lambda^0,\dots,\lambda^n \in L$ are linearly dependent over $F$, i.e. there exist $c_0,\dots,c_n \in F$ such that
$$\epsilon_\lambda\Bigl(\sum_{i=0}^{n} c_i X^i\Bigr) = \sum_{i=0}^{n} c_i \lambda^i = 0$$
and $\sum_{i=0}^{n} c_i X^i \neq 0 \in F[X]$, showing $\lambda$ to be algebraic over $F$. Since $\lambda \in L$ was arbitrary, $L$ is algebraic over $F$. $\square$

Theorem D.5. Let $F, L$ be fields with $F \subseteq L$, let $\alpha \in L$ be algebraic over $F$, and let $\epsilon_\alpha : F[X] \longrightarrow L$ be the corresponding substitution homomorphism.

(a) There exists a unique monic polynomial $\mu_\alpha \in F[X]$ such that $\ker\epsilon_\alpha = (\mu_\alpha)$. Moreover, this polynomial $\mu_\alpha$ is both prime and irreducible, and it is the unique monic polynomial $f \in F[X]$ such that $\epsilon_\alpha(f) = 0$ and such that $f$ is of minimal degree in $\ker\epsilon_\alpha$. One calls $\mu_\alpha$ the minimal polynomial or the irreducible polynomial of the algebraic element $\alpha$ over $F$.

(b) If $\mu_\alpha$ is the minimal polynomial of $\alpha$ over $F$ as defined in (a), then one has
$$F[X]/\ker\epsilon_\alpha = F[X]/(\mu_\alpha) \cong \operatorname{Im}\epsilon_\alpha = F[\alpha] = F(\alpha) \subseteq L, \qquad \text{(D.2)}$$
where $F[\alpha]$ and $F(\alpha)$ are according to Not. B.21. In particular, $F[\alpha]$ is a field extension of $F$. The degree of this field extension is $\bigl[F[\alpha] : F\bigr] = \deg\mu_\alpha$ and, in consequence, $F[\alpha] \cong F[X]/\ker\epsilon_\alpha$ is algebraic over $F$.

Proof. (a): $F[X]$ is a principal ideal domain according to Ex. 7.27(b). Thus, there exists a monic $\mu_\alpha \in F[X]$ such that $\ker\epsilon_\alpha = (\mu_\alpha)$. If $f \in F[X]$ is another monic polynomial such that $\ker\epsilon_\alpha = (f)$, then, by Prop. 7.30(c), $f$ and $\mu_\alpha$ are associated and, by Prop. 7.7, there exists $r \in R^*$ such that $f = r\mu_\alpha$, implying $f = \mu_\alpha$. To see $\mu_\alpha$ is the unique monic polynomial of minimal degree in $\ker\epsilon_\alpha$, let $g \in F[X]$. Then $g \in F$ or $\deg(g\mu_\alpha) = \deg g + \deg\mu_\alpha > \deg\mu_\alpha$ by (7.5c). According to the isomorphism Th. C.4, $F[X]/\ker\epsilon_\alpha = F[X]/(\mu_\alpha) \cong \operatorname{Im}\epsilon_\alpha \subseteq L$. As a subring of the field $L$, $\operatorname{Im}\epsilon_\alpha$ is an integral domain. Thus, $(\mu_\alpha)$ is prime by Th. C.8(b), implying $\mu_\alpha$ to be prime by Lem. C.6 and irreducible by Th. C.10.
(b): In the proof of (a), we already noticed (D.2) to hold due to the isomorphism Th. C.4, except that we still need to verify $F[\alpha] = F(\alpha)$, i.e. we need to show $F[\alpha]$ is a field. However, by (a), $\mu_\alpha$ is irreducible and, thus, $(\mu_\alpha)$ is maximal by Th. C.10. In consequence, $F[\alpha] \cong F[X]/(\mu_\alpha)$ is a field by Th. C.8(c). Next, we show $\bigl[F[\alpha] : F\bigr] = \deg\mu_\alpha$: Let
$$\phi : F[X] \longrightarrow F[X]/(\mu_\alpha), \quad f \mapsto \bar f := \phi(f) = f + (\mu_\alpha),$$
be the canonical epimorphism. Suppose $n := \deg\mu_\alpha$. If $f \in F[X]$, then, according to the remainder Th. 7.9, there exist unique polynomials $q, r \in F[X]$ such that
$$f = q\mu_\alpha + r \quad \wedge \quad \deg r < n,$$
implying $\bar f = \bar r \in \operatorname{span}\{\bar X^0,\dots,\bar X^{n-1}\}$ and showing $\bar X^0,\dots,\bar X^{n-1}$ to be a generating set for $F[X]/(\mu_\alpha)$ as a vector space over $F$. To verify the set is also linearly independent, suppose $a_0,\dots,a_{n-1} \in F$ are such that
$$\sum_{i=0}^{n-1} a_i \bar X^i = \overline{\sum_{i=0}^{n-1} a_i X^i} = (\mu_\alpha) = 0 \in F[X]/(\mu_\alpha).$$
Then $\sum_{i=0}^{n-1} a_i X^i \in \ker\epsilon_\alpha$, implying $a_0 = \dots = a_{n-1} = 0$, since $\deg\bigl(\sum_{i=0}^{n-1} a_i X^i\bigr) < n$. Thus $\{\bar X^0,\dots,\bar X^{n-1}\}$ is a basis of $F[X]/(\mu_\alpha)$ and, using the isomorphism of (D.2), $\{\alpha^0,\dots,\alpha^{n-1}\}$ is a basis of $F[\alpha]$, proving $\bigl[F[\alpha] : F\bigr] = n = \deg\mu_\alpha$. Finally, as $\bigl[F[\alpha] : F\bigr] = \deg\mu_\alpha < \infty$, $F(\alpha) \cong F[X]/\ker\epsilon_\alpha$ is algebraic over $F$ by Th. D.4. $\square$

Definition D.6. Let $F, L$ be fields with $F \subseteq L$ and recall Not. B.21 as well as Prop. B.22(b).

(a) $L$ is called a simple field extension of $F$ if, and only if, there exists $\lambda \in L$ such that $L = F(\lambda)$. In this case, $[F(\lambda) : F]$ is called the degree of $\lambda$ over $F$.

(b) $L$ is called a finitely generated field extension of $F$ if, and only if, there exist $\lambda_1,\dots,\lambda_n \in L$, $n \in \mathbb{N}$, such that $L = F(\lambda_1,\dots,\lambda_n)$.

Example D.7. Let $F, L$ be fields with $F \subseteq L$. If $\tau \in L$ is transcendental over $F$, then the field extension $F(\tau)$ is finitely generated, but neither algebraic nor finite: Indeed, since $\tau \in F(\tau)$, $F(\tau)$ is not algebraic; $F(\tau)$ is not finite, since $\{\tau^n : n \in \mathbb{N}_0\}$ is linearly independent over $F$: If
$$0 = \sum_{n=0}^{N} c_n \tau^n = \epsilon_\tau\Bigl(\sum_{n=0}^{N} c_n X^n\Bigr)$$
with $c_0,\dots,c_N \in F$, $N \in \mathbb{N}_0$, then $\sum_{n=0}^{N} c_n X^n \in \ker\epsilon_\tau$, implying $c_0 = \dots = c_N = 0$, since $\tau$ is transcendental over $F$. In combination with Cor. D.9 below, this example shows that, for $\lambda_1,\dots,\lambda_n \in L$, $n \in \mathbb{N}$, $F(\lambda_1,\dots,\lambda_n)$ is a finite field extension of $F$ if, and only if, the elements $\lambda_1,\dots,\lambda_n$ are all algebraic over $F$.

Theorem D.8. Let $F, L$ be fields with $F \subseteq L$. Suppose $n \in \mathbb{N}$ and $\alpha_1,\dots,\alpha_n \in L$ are algebraic over $F$ such that $L = F(\alpha_1,\dots,\alpha_n)$. Then the following holds true:

(a) $L = F(\alpha_1,\dots,\alpha_n) = F[\alpha_1,\dots,\alpha_n]$.

(b) $L$ is a finite (and, thus, algebraic) field extension of $F$.

Proof. We carry out the proof via induction on $n \in \mathbb{N}$, where the base case ($n = 1$) was already done in Th. D.5(b). Now let $n > 1$. By induction, we know $K := F(\alpha_1,\dots,\alpha_{n-1}) = F[\alpha_1,\dots,\alpha_{n-1}]$ to be a finite field extension of $F$. Since $\alpha_n$ is algebraic over $K$ (as $\alpha_n$ is algebraic over $F \subseteq K$), Th. D.5(b) yields $K[\alpha_n] = K(\alpha_n)$. Thus, $F[\alpha_1,\dots,\alpha_n] = K[\alpha_n]$ is a field, proving $L = F(\alpha_1,\dots,\alpha_n) = F[\alpha_1,\dots,\alpha_n]$, as desired. Moreover, since $[K : F] < \infty$ and $[L : K] < \infty$, Th. D.3 yields
$$[L : F] = [L : K] \cdot [K : F] < \infty,$$
thereby completing the induction and the proof. $\square$

Corollary D.9. Let $F, L$ be fields with $F \subseteq L$. Then the following statements are equivalent:

(i) $L$ is a finite field extension of $F$.

(ii) $L$ is generated over $F$ by finitely many algebraic elements.

(iii) $L$ is a finitely generated algebraic field extension of $F$.

Proof. (ii) implies (i) and (iii) by Th. D.8. If (iii), then $L = F(\alpha_1,\dots,\alpha_n)$ with $\alpha_1,\dots,\alpha_n \in L$. However, as $L$ is algebraic over $F$, the elements $\alpha_1,\dots,\alpha_n \in L$ are all algebraic over $F$, showing (iii) implies (ii). Finally, assume (i), i.e. there exist $n \in \mathbb{N}$ and $B := \{\alpha_1,\dots,\alpha_n\} \subseteq L$ such that $B$ forms a basis of $L$ over $F$. Then $L = F(\alpha_1,\dots,\alpha_n)$ and $\alpha_1,\dots,\alpha_n$ are algebraic over $F$ by Th. D.4, showing (i) implies (ii). $\square$

Corollary D.10. Let $F, L$ be fields with $F \subseteq L$. Then the following statements are equivalent:

(i) L is algebraic over F .

(ii) There exists a family A := (αi)i∈I in L of algebraic elements over F such that L = F (A).

Proof. It is immediate that (i) implies (ii) (as one can choose A := L). Conversely, if L = F (A), where A := (αi)i∈I is a family of algebraic elements over F , then, by Prop.

B.22(b), $L = F(A) = \bigcup_{J \subseteq I :\, \#J < \infty} F\bigl((\alpha_i)_{i \in J}\bigr)$, where each $F\bigl((\alpha_i)_{i \in J}\bigr)$ with finite $J$ is algebraic over $F$ by Th. D.8. Thus, each $x \in L$ is algebraic over $F$, proving (i). $\square$

Theorem D.11. Let $F, K, L$ be fields such that $F \subseteq K \subseteq L$ and let $\alpha \in L$.

(a) If $K$ is algebraic over $F$ and $\alpha$ is algebraic over $K$, then $\alpha$ is algebraic over $F$.

(b) $L$ is algebraic over $F$ if, and only if, $K$ is algebraic over $F$ and $L$ is algebraic over $K$.

Proof. (a): Since $\alpha$ is algebraic over $K$, there exist $n \in \mathbb{N}$ and $\kappa_0,\dots,\kappa_n \in K$, not all equal to $0$, such that $\sum_{i=0}^{n} \kappa_i \alpha^i = 0$. This shows that $\alpha$ is also already algebraic over $K_0 := F(\kappa_0,\dots,\kappa_n) \subseteq K$ and $[K_0[\alpha] : K_0] < \infty$ by Th. D.5(b). Moreover, $[K_0 : F] < \infty$ by Th. D.8 and, thus, $[K_0[\alpha] : F] = [K_0[\alpha] : K_0] \cdot [K_0 : F] < \infty$ by Th. D.3. Thus, $K_0[\alpha]$ is algebraic over $F$ and, in particular, $\alpha$ is algebraic over $F$.
(b): If $K$ is algebraic over $F$ and $L$ is algebraic over $K$, then $L$ is algebraic over $F$ by (a). Conversely, if $L$ is algebraic over $F$, then $K \subseteq L$ is algebraic over $F$, and $L$ is algebraic over $K$, as $F \subseteq K$. $\square$

Example D.12. Consider the field of algebraic numbers, defined by
$$\mathbb{A} := \{\alpha \in \mathbb{C} : \alpha \text{ is algebraic over } \mathbb{Q}\}.$$
Indeed, $\mathbb{A}$ is a field, since, if $\alpha, \beta \in \mathbb{A}$, then $\mathbb{A}$ contains the field $\mathbb{Q}(\alpha,\beta)$ by Th. D.8, i.e., in particular, $\alpha\beta, \alpha+\beta, -\alpha \in \mathbb{A}$ and, for $0 \neq \alpha$, $\alpha^{-1} \in \mathbb{A}$. Thus, $\mathbb{A}$ is an algebraic field extension of $\mathbb{Q}$, where an argument completely analogous to the one in Ex. D.2(c) shows $\mathbb{A}$ to be countable and, in particular, $\mathbb{A} \subsetneq \mathbb{C}$. One can show that $\mathbb{A}$ is not a finite field extension of $\mathbb{Q}$: Since $(\sqrt[n]{p})^n - p = 0$ for each $n, p \in \mathbb{N}$, we have $\sqrt[n]{p} \in \mathbb{A}$ for each $n, p \in \mathbb{N}$. Since, for $p$ prime, $X^n - p \in \mathbb{Q}[X]$ is the minimal polynomial of $\sqrt[n]{p}$ and $\deg(X^n - p) = n$, one obtains $[\mathbb{A} : \mathbb{Q}] \geq [\mathbb{Q}(\sqrt[n]{p}) : \mathbb{Q}] = n$, showing $[\mathbb{A} : \mathbb{Q}] = \infty$. However, to actually prove $X^n - p \in \mathbb{Q}[X]$ to be the minimal polynomial of $\sqrt[n]{p}$ for $p$ prime, one needs to show $X^n - p$ is irreducible. This turns out to be somewhat tricky and is usually obtained by using Eisenstein's irreducibility criterion (cf., e.g., [Bos13, Th. 2.8.1]).
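Minimal polynomials of concrete elements of $\mathbb{A}$ can be computed symbolically; a brief sketch (not from the notes) assuming sympy, which returns an integer-primitive multiple of the monic $\mu_\alpha$ of Th. D.5(a):

    from sympy import I, Rational, Symbol, minimal_polynomial, sqrt

    x = Symbol('x')
    print(minimal_polynomial(Rational(2) ** Rational(1, 3), x))  # x**3 - 2
    print(minimal_polynomial(sqrt(2) + sqrt(3), x))              # x**4 - 10*x**2 + 1
    print(minimal_polynomial(1 + I, x))                          # x**2 - 2*x + 2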

D.2 Algebraic Closure

Definition D.13. If $F, L$ are fields such that $L$ is a field extension of $F$, then $L$ is called an algebraic closure of $F$ if, and only if, $L$ is both algebraically closed and algebraic over $F$. In this case, one often writes $\overline{F}$ for $L$. —

The goal of the present section is to show that every field is contained in an algebraic closure (Th. D.16) and that all algebraic closures of a field are isomorphic (Cor. D.20). Both results are based on suitable applications of Zorn's lemma. In preparation for Th. D.16, we first show in the following Th. D.14 that one can always extend a field $F$ to a field $L$ such that a particular given polynomial over $F$ has a zero in $L$.

Theorem D.14. Let $F$ be a field and $f \in F[X]$ such that $\deg f \geq 1$. Then there exists an algebraic field extension $L$ of $F$ such that $f$ has a zero in $L$ (i.e. such that $\epsilon_\alpha(f) = 0$ for some $\alpha \in L$). Moreover, if $f$ is irreducible over $F$, then one may choose $L := F[X]/(f)$.

Proof. It suffices to consider the case, where f is irreducible over F : If f is not irre- ducible, then, by Cor. 7.37, one writes f = f f with irreducible f ,...,f F [X], 1 ··· n 1 n ∈ n N, showing f to have a zero in L if an irreducible factor fi has a zero in L. Thus, we∈ now assume f to be irreducible. Then, according to Th. C.10, the ideal (f) is maximal in F [X], i.e. L := F [X]/(f) is a field by Th. C.8(c). In the usual way, we consider F F [X], i.e. F [X] as a ring extension of F , and we consider the canonical epimorphism⊆ φ : F [X] L = F [X]/(f), φ(g)= g +(f). −→ Then φ↾F : F L is a unital ring homomorphism between fields and, thus, injective, by Prop. 7.28.−→ Thus, L is a field extension of F , where we can consider F L, if we identify F with φ(F ). We claim that φ(X) L is the desired zero of f:⊆ Indeed, if c ,...,c F (n N) are such that f = n ∈c Xi, then 0 n ∈ ∈ i=0 i n n i P ǫ (f)= c φ(X) = φ c Xi = φ(f)=0 L, φ(X) i i ∈ i=0 i=0 ! X  X where, in the notation, we did not distinguish between ci F and φ(ci) L. We note that L is algebraic over F by Th. D.5(b), since f is irreducible∈ and, thus,∈ after multiplication with c F 0 , the minimal polynomial of its zero φ(X) over F .  ∈ \ { } Corollary D.15. For a field F , the following statements are equivalent:

Corollary D.15. For a field F, the following statements are equivalent:

(i) F is algebraically closed.

(ii) If L is an algebraic field extension of F, then F = L.

Proof. "(i) ⇒ (ii)": Suppose F is algebraically closed and let L be an algebraic field extension of F. If α ∈ L, then α is algebraic over F. If µ_α ∈ F[X] is the corresponding minimal polynomial, then, by Cor. 7.14 (and using that F is algebraically closed and that µ_α is monic),

µ_α = ∏_{j=1}^{n} (X − α_j),   n = deg µ_α ∈ N,   {α_1, ..., α_n} ⊆ F.

As µ_α is irreducible, this implies n = 1 and α_1 = α ∈ F. As α ∈ L was arbitrary, this yields F = L.

"(ii) ⇒ (i)": Assuming (ii), we show F to be algebraically closed: To this end, let f ∈ F[X] with deg f ≥ 1. According to Th. D.14, there exists an algebraic field extension L of F such that f has a zero in L. However, according to (ii), we have F = L, showing f has a zero in F, i.e. F is algebraically closed. □
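The model case for (i) is F = C: by the fundamental theorem of algebra, every nonconstant f ∈ C[X] splits into linear factors over C. Numerically this is easy to observe; the following sketch (using numpy, an assumption of this illustration) computes the five complex zeros of f = X⁵ − X + 1, a polynomial without rational zeros.

```python
# The zeros of f = X^5 - X + 1 in C, computed numerically with numpy;
# over C the polynomial splits completely into linear factors.
import numpy as np

coeffs = [1, 0, 0, 0, -1, 1]          # X^5 - X + 1, highest degree first
zeros = np.roots(coeffs)
print(zeros)                                       # five complex zeros
print(np.allclose(np.polyval(coeffs, zeros), 0))   # True (up to rounding)
```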

Theorem D.16. Every field F is contained in an algebraic closure F̄.

Proof. Consider the set

I := {f ∈ F[X] : deg f ≥ 1}.

In a first step, we construct an algebraic field extension L_1 of F such that each f ∈ I has a zero in L_1: The method of construction is basically the same as in the proof of Th. D.14, however, somewhat complicated by the fact that the set I is infinite. In consequence, we now consider the polynomial ring in infinitely many variables F[(X_f)_{f∈I}] (cf. Ex. B.8(c)). As F[(X_f)_{f∈I}] is a ring extension of F, we have, for each g ∈ I, the substitution homomorphism

ε_{X_g} : F[X] → F[(X_f)_{f∈I}].

For each g ∈ I, we let g(X_g) := ε_{X_g}(g) (i.e. one may think of g(X_g) as the element of F[(X_f)_{f∈I}] one obtains by replacing the "variable" X in g ∈ I by the "variable" X_g). Using Prop. 7.25, we may let a be the smallest ideal in F[(X_f)_{f∈I}] containing all g(X_g) with g ∈ I, i.e.

a := ({g(X_g) : g ∈ I}).

We show a to be a proper ideal in F[(X_f)_{f∈I}]: If a is not proper in F[(X_f)_{f∈I}], then, using Prop. 7.25(b), there exist n ∈ N and h_1, ..., h_n ∈ F[(X_f)_{f∈I}] as well as g_1, ..., g_n ∈ I such that

∑_{i=1}^{n} h_i g_i(X_{g_i}) = 1.   (D.3)

After applying Th. D.14 n times, we obtain a field extension K of F such that each of the g_i has a zero α_i ∈ K. Define x := (x_f)_{f∈I} by letting

x_f := α_i for f = g_i, and x_f := 0 otherwise,

and consider the corresponding substitution homomorphism ε_x : F[(X_f)_{f∈I}] → K. Then

ε_x(∑_{i=1}^{n} h_i g_i(X_{g_i})) = ∑_{i=1}^{n} ε_x(h_i) ε_{α_i}(g_i) = 0,

in contradiction to (D.3). Thus, a is proper in F[(X_f)_{f∈I}], as desired. By Th. C.12, there exists a maximal ideal m in F[(X_f)_{f∈I}] such that a ⊆ m. We can now define L_1 := F[(X_f)_{f∈I}]/m and know L_1 to be a field by Th. C.8(c). As before, we consider F ⊆ F[(X_f)_{f∈I}], i.e. F[(X_f)_{f∈I}] as a ring extension of F, and we consider the canonical epimorphism

φ : F[(X_f)_{f∈I}] → L_1 = F[(X_f)_{f∈I}]/m,   φ(h) = h + m.

Then φ↾_F : F → L_1 is a unital ring homomorphism between fields and, thus, injective, by Prop. 7.28. Thus, L_1 is a field extension of F, where we can consider F ⊆ L_1, if we identify F with φ(F). We proceed analogously to the proof of Th. D.14 and show φ(X_f) ∈ L_1 to be the desired zero of f ∈ I: Indeed, if c_0, ..., c_n ∈ F (n ∈ N) are such that f = ∑_{i=0}^{n} c_i X^i ∈ I, then, using f(X_f) ∈ m,

ε_{φ(X_f)}(f) = ∑_{i=0}^{n} c_i (φ(X_f))^i = φ(∑_{i=0}^{n} c_i (X_f)^i) = φ(f(X_f)) = 0 ∈ L_1,

where, in the notation, we did not distinguish between c_i ∈ F and φ(c_i) ∈ L_1. Next, we show L_1 to be algebraic over F: With α := (φ(X_f))_{f∈I} and ε_α : F[(X_f)_{f∈I}] → L_1, we have F[α] = Im ε_α as a subring of L_1. According to the isomorphism Th. C.4,

F[(X_f)_{f∈I}] / ker ε_α ≅ Im ε_α = F[α] ⊆ L_1.

Moreover, according to Cor. D.10, F(α) ⊆ L_1 is algebraic over F and, using Th. D.8(a), we have

F(α) = ∪_{J⊆I, #J<∞} F((φ(X_f))_{f∈J}) = ∪_{J⊆I, #J<∞} F[(φ(X_f))_{f∈J}] = F[α].

Thus, F[α] is a field and ker ε_α is a maximal ideal in F[(X_f)_{f∈I}]. Since a ⊆ ker ε_α, this shows m = ker ε_α and L_1 = F(α). In particular, L_1 is algebraic over F.

One can now inductively iterate the above construction to obtain a sequence (L_k)_{k∈N} of fields such that

F ⊆ L_1 ⊆ L_2 ⊆ ...,

where, for each k ∈ N, L_{k+1} is an algebraic field extension of L_k and each f ∈ L_k[X] with deg f ≥ 1 has a zero in L_{k+1}. Then, as a consequence of Th. D.11(b), each L_k is also algebraic over F. If we now let

F̄ := ∪_{k∈N} L_k,

then F̄ is a field according to [Phi19, Ex. 4.36(f)] (actually, one first needs to extend + and · to F̄, which is straightforward, since for a, b ∈ F̄, there exists k ∈ N such that a, b ∈ L_k, and, thus, a + b and a · b are already defined in L_k – as the L_k are nested, this yields a well-defined + and · on F̄). Now F̄ is an algebraic closure of F: Indeed, if α ∈ F̄, then α ∈ L_k for some k ∈ N and, thus, α is algebraic over F, proving F̄ to be algebraic over F. If f ∈ F̄[X] with deg f ≥ 1, then, as f has only finitely many coefficients, there exists k ∈ N such that f ∈ L_k[X]. Then f has a zero α ∈ L_{k+1} ⊆ F̄, showing F̄ to be algebraically closed. □

It is remarked in [Bos13], after the proof of [Bos13, Th. 3.4.4], that one can show by different means that, in the situation of the proof of Th. D.16 above, one actually has L_1 = F̄, i.e. one does, in fact, obtain an algebraic closure of F in the first step of the construction.

In preparation for showing that two algebraic closures of a field F are necessarily isomorphic, we briefly study extensions of homomorphisms between fields (a topic that is also of algebraic interest beyond its application here).

Lemma D.17. Let F, L be fields and let σ : F → L be a unital homomorphism. Then σ extends to a map

σ : F[X] → L[X],   f = ∑_{i=0}^{n} f_i X^i ↦ f^σ := ∑_{i=0}^{n} σ(f_i) X^i.   (D.4)

(a) σ : F[X] → L[X] is still a unital homomorphism.

(b) Let f ∈ F[X], x ∈ F. If ε_x(f) = 0, then ε_{σ(x)}(f^σ) = 0.

Proof. (a): If f, g ∈ F[X], where f = ∑_{i=0}^{n} f_i X^i, g = ∑_{i=0}^{n} g_i X^i, then

(f + g)^σ = ∑_{i=0}^{n} σ(f_i + g_i) X^i = ∑_{i=0}^{n} (σ(f_i) + σ(g_i)) X^i = ∑_{i=0}^{n} σ(f_i) X^i + ∑_{i=0}^{n} σ(g_i) X^i = f^σ + g^σ,

(fg)^σ = ∑_{i=0}^{2n} σ(∑_{k+l=i} f_k g_l) X^i = ∑_{i=0}^{2n} (∑_{k+l=i} σ(f_k) σ(g_l)) X^i = f^σ g^σ,

proving σ : F[X] → L[X] to be a homomorphism.

(b): If f = ∑_{i=0}^{n} f_i X^i ∈ F[X], then

ε_{σ(x)}(f^σ) = ∑_{i=0}^{n} σ(f_i) (σ(x))^i = σ(∑_{i=0}^{n} f_i x^i) = σ(ε_x(f)) = σ(0) = 0,

as claimed. □
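Part (b) is easy to watch in action for F = L = C with σ given by complex conjugation (a unital field homomorphism on C). The following sketch (an illustration only; the polynomial and names are ad hoc) conjugates the coefficients of f and checks that the conjugate of a zero of f is a zero of f^σ.

```python
# Lem. D.17(b) for sigma = complex conjugation on C: if f(x) = 0,
# then f^sigma(sigma(x)) = 0, where f^sigma conjugates the coefficients.

def apply_sigma(coeffs):
    """f = sum_i coeffs[i] * X**i  |->  f^sigma (coefficient-wise sigma)."""
    return [c.conjugate() for c in coeffs]

def evaluate(coeffs, x):
    """Horner evaluation of sum_i coeffs[i] * x**i."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

f = [-6 + 3j, -(1 + 5j), 1]          # f = (X - (1+2j)) * (X - 3j), expanded
x = 1 + 2j                           # a zero of f

print(evaluate(f, x))                           # 0j
print(evaluate(apply_sigma(f), x.conjugate()))  # 0j, as predicted by (b)
```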

Proposition D.18. Let F, K be fields such that K = F(α) is a simple algebraic field extension of F, α ∈ K. Let L be another field and let σ : F → L be a unital homomorphism. Let µ_α ∈ F[X] denote the minimal polynomial of α over F and let µ_α^σ ∈ L[X] denote its image under σ according to (D.4).

(a) If τ : K → L is a homomorphism extending σ (i.e. τ↾_F = σ), then ε_{τ(α)}(µ_α^σ) = 0.

(b) For each zero λ ∈ L of µ_α^σ, there exists a unique homomorphism τ : K → L such that τ extends σ and τ(α) = λ.

(c) One has

#{τ : K → L : τ↾_F = σ and τ is a homomorphism} = #{λ ∈ L : ε_λ(µ_α^σ) = 0} ≤ deg µ_α^σ ≤ deg µ_α.

Proof. (a) is immediate from Lem. D.17(b), since ε_α(µ_α) = 0 and µ_α^τ = µ_α^σ.

(b): By definition, K = F(α) is the field of fractions of F[α] = Im ε_α, ε_α : F[X] → K. If τ_1, τ_2 : K → L are homomorphisms extending σ such that τ_1(α) = τ_2(α) and f, g ∈ F[X] are such that α is not a zero of g, then

τ_1(ε_α(f)/ε_α(g)) = ε_{τ_1(α)}(f)/ε_{τ_1(α)}(g) = ε_{τ_2(α)}(f)/ε_{τ_2(α)}(g) = τ_2(ε_α(f)/ε_α(g)),

showing τ_1 = τ_2, proving the uniqueness statement (one could even have omitted the denominators in the previous computation, as we know K = F(α) = F[α] from Th. D.5(b)). To prove the existence statement, let λ ∈ L be such that ε_λ(µ_α^σ) = 0. Consider the homomorphisms

ε_α : F[X] → F(α) = K,   ψ := ε_λ ∘ σ : F[X] → L.

Then we know ker ε_α = (µ_α) from Th. D.5(b). We also know (µ_α) ⊆ ker ψ, since, for each f ∈ F[X],

ψ(f µ_α) = ε_λ(f^σ µ_α^σ) = ε_λ(f^σ) ε_λ(µ_α^σ) = ε_λ(f^σ) · 0 = 0.

If

φ : F[X] → F[X]/(µ_α),   φ(f) = f + (µ_α),

is the canonical epimorphism, then, by the isomorphism Th. C.4, we can write ε_α = φ_α ∘ φ with a monomorphism φ_α : F[X]/(µ_α) → K and we can write ψ = φ_ψ ∘ φ with a monomorphism φ_ψ : F[X]/(µ_α) → L. As mentioned above, we know K = F(α) = F[α], implying ε_α to be surjective, i.e. φ_α is also an epimorphism and, thus, an isomorphism. We claim

τ : K → L,   τ := φ_ψ ∘ φ_α^{−1},

to be the desired extension of σ with τ(α) = λ (as a composition of homomorphisms, τ is a homomorphism). Indeed, for each x ∈ F ⊆ F[X],

φ_α(x + (µ_α)) = (φ_α ∘ φ)(x) = ε_α(x) = x ∈ K,

implying

τ(x) = (φ_ψ ∘ φ_α^{−1})(x) = φ_ψ(x + (µ_α)) = (φ_ψ ∘ φ)(x) = ψ(x) = σ(x) ∈ L,

proving τ↾_F = σ. Analogously, for X ∈ F[X],

φ_α(X + (µ_α)) = (φ_α ∘ φ)(X) = ε_α(X) = α ∈ K,

implying

τ(α) = (φ_ψ ∘ φ_α^{−1})(α) = φ_ψ(X + (µ_α)) = (φ_ψ ∘ φ)(X) = ψ(X) = ε_λ(X) = λ,

thereby establishing the case. □

(c): The equality of cardinalities is immediate from combining (a) and (b). Then the first estimate is due to Cor. 7.14 and the second estimate is due to (D.4). □
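A standard example for (b) and (c): take F = Q, K = Q(2^{1/3}), L = C, σ the inclusion Q → C, and µ_α = X³ − 2. The sketch below (numerical only, using the standard library's cmath) lists the three zeros of X³ − 2 in C; by (b), each zero λ determines exactly one extension τ with τ(2^{1/3}) = λ, so there are exactly deg µ_α = 3 such homomorphisms.

```python
# The three zeros of mu = X^3 - 2 in C; each corresponds to exactly one
# homomorphism tau : Q(2^(1/3)) -> C extending the inclusion Q -> C.
import cmath

n = 3
zeros = [2 ** (1 / n) * cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

for z in zeros:
    print(z, abs(z ** n - 2) < 1e-9)   # True: z is a zero of X^3 - 2
```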

Theorem D.19. Let F, K be fields such that K is an algebraic field extension of F. Let L be another field and let σ : F → L be a unital homomorphism.

(a) If L is algebraically closed, then there exists a homomorphism τ : K → L such that τ↾_F = σ.

(b) If both K and L are algebraically closed, L is algebraic over σ(F), and τ : K → L is a homomorphism such that τ↾_F = σ, then τ is an isomorphism.

Proof. (a): The proof is basically a combination of Prop. D.18(b) with Zorn's lemma: To apply Zorn's lemma of [Phi19, Th. A.52(iii)], we define a partial order on the set

M := {(H, φ) : F ⊆ H ⊆ K, H is a field, φ : H → L is a homomorphism with φ↾_F = σ}

by letting

(H_1, φ_1) ≤ (H_2, φ_2)  :⇔  H_1 ⊆ H_2 ∧ φ_2↾_{H_1} = φ_1.

Then (F, σ) ∈ M, i.e. M ≠ ∅. Every chain C ⊆ M has an upper bound, namely (H_C, φ_C) with H_C := ∪_{(H,φ)∈C} H and φ_C(x) := φ(x), where (H, φ) ∈ C is chosen such that x ∈ H (since C is a chain, the value of φ_C(x) does not actually depend on the choice of (H, φ) ∈ C and is, thus, well-defined). Clearly, F ⊆ H_C ⊆ K, H_C is a field by [Phi19, Ex. 4.36(f)], and φ_C is a homomorphism with φ_C↾_F = σ. Thus, Zorn's lemma applies, yielding a maximal element (H_max, φ_max) ∈ M. We claim that H_max = K: Indeed, if there exists α ∈ K \ H_max, then, by assumption, α is algebraic over F, and we may consider the minimal polynomial µ_α ∈ F[X] of α over F. If µ_α^σ ∈ L[X] denotes the image of µ_α under σ according to (D.4), then µ_α^σ has a zero λ ∈ L, since L is algebraically closed. Thus, by Prop. D.18(b), we can extend φ_max to H_max(α), where H_max ⊊ H_max(α) ⊆ K, in contradiction to the maximality of (H_max, φ_max). In consequence, τ := φ_max : K → L is the desired extension of σ.

(b): Under the hypotheses of (b), τ is injective by Prop. 7.28 and, thus, an isomorphism between the fields K and τ(K), as τ(K) must be a field by [Phi19, Prop. 4.37]. In consequence, as K is algebraically closed, so is τ(K) (e.g. due to Lem. D.17(b)). Since L is algebraic over σ(F), L is algebraic over τ(K) ⊇ σ(F), and Cor. D.15(ii) yields τ(K) = L, showing τ to be an isomorphism between K and L, as claimed. □

Corollary D.20. Let F be a field. If L_1 and L_2 are both algebraic closures of F, then there exists an isomorphism φ : L_1 → L_2 such that φ↾_F = Id_F (however, this existence result is nonconstructive, as it is based on an application of Zorn's lemma).

Proof. We have Id : F → L_i, i ∈ {1, 2}. According to Th. D.19(a), there exists a homomorphism φ : L_1 → L_2 such that φ↾_F = Id. Since L_2 is algebraic over F, it is also algebraic over φ(L_1) ⊇ F, i.e. φ is an isomorphism by Th. D.19(b). □

References

[Bos13] Siegfried Bosch. Algebra, 8th ed. Springer-Verlag, Berlin, 2013 (German).

[FJ03] Richard J. Fleming and James E. Jamison. Isometries on Banach Spaces: Function Spaces. Monographs and Surveys in Pure and Applied Mathematics, Vol. 129, CRC Press, Boca Raton, USA, 2003.

[For17] Otto Forster. Analysis 3, 8th ed. Springer Spektrum, Wiesbaden, Germany, 2017 (German).

[Jac75] Nathan Jacobson. Lectures in Abstract Algebra II. Linear Algebra. Graduate Texts in Mathematics, Springer, New York, 1975.

[Kön04] Konrad Königsberger. Analysis 2, 5th ed. Springer-Verlag, Berlin, 2004 (German).

[Lan05] Serge Lang. Algebra, revised 3rd ed. Graduate Texts in Mathematics, Vol. 211, Springer, New York, 2005.

[Phi16a] P. Philip. Analysis I: Calculus of One Real Variable. Lecture Notes, LMU Munich, 2015/2016, AMS Open Math Notes Ref. # OMN:202109.111306, available in PDF format at https://www.ams.org/open-math-notes/omn-view-listing?listingId=111306.

[Phi16b] P. Philip. Analysis II: Topology and Differential Calculus of Several Variables. Lecture Notes, LMU Munich, 2016, AMS Open Math Notes Ref. # OMN:202109.111307, available in PDF format at https://www.ams.org/open-math-notes/omn-view-listing?listingId=111307.

[Phi17a] P. Philip. Analysis III: Measure and Integration Theory of Several Variables. Lecture Notes, LMU Munich, 2016/2017, AMS Open Math Notes Ref. # OMN:202109.111308, available in PDF format at https://www.ams.org/open-math-notes/omn-view-listing?listingId=111308.

[Phi17b] P. Philip. Functional Analysis. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2017, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_FunctionalAnalysis.pdf.

[Phi19] P. Philip. Linear Algebra I. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2018/2019, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_LinearAlgebra1.pdf.

[Phi21] P. Philip. Numerical Mathematics I. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2020/2021, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_NumericalMathematics1.pdf.

[Rud73] W. Rudin. Functional Analysis. McGraw-Hill Book Company, New York, 1973.

[Str08] Gernot Stroth. Lineare Algebra, 2nd ed. Berliner Studienreihe zur Mathematik, Vol. 7, Heldermann Verlag, Lemgo, Germany, 2008 (German).

[Wer11] D. Werner. Funktionalanalysis, 7th ed. Springer-Verlag, Berlin, 2011 (German).