Linear Algebra As an Introduction to Abstract Mathematics

Lecture Notes for MAT67 University of California, Davis written Fall 2007, last updated February 10, 2012

Isaiah Lankham Bruno Nachtergaele Anne Schilling

Copyright c 2007 by the authors. These lecture notes may be reproduced in their entirety for non-commercial purposes. Contents

1 What is Linear Algebra? 1 1.1 IntroductiontoMAT67 ...... 1 1.2 WhatisLinearAlgebra? ...... 2 1.3 Systemsoflinearequations...... 4 1.3.1 Linearequations ...... 4 1.3.2 Non-linearequations ...... 5 1.3.3 Lineartransformations ...... 6 1.3.4 Applications of linear equations ...... 7 Exercises...... 9

2 Introduction to Complex Numbers 11 2.1 Deﬁnitionofcomplexnumbers...... 11 2.2 Operationsoncomplexnumbers...... 12 2.2.1 Addition and subtraction of complex numbers ...... 12 2.2.2 Multiplication and division of complex numbers ...... 13 2.2.3 Complexconjugation...... 15 2.2.4 The modulus (a.k.a. norm, length, or magnitude) ...... 16 2.2.5 Complex numbers as vectors in R2 ...... 18 2.3 Polar form and geometric interpretation for C ...... 19 2.3.1 Polarformforcomplexnumbers...... 19 2.3.2 Geometric multiplication for complex numbers ...... 20 2.3.3 Exponentiation and root extraction ...... 21 2.3.4 Some complex elementary functions ...... 22 Exercises...... 24

ii 3 The Fundamental Theorem of Algebra and Factoring Polynomials 26 3.1 TheFundamentalTheoremofAlgebra ...... 26 3.2 Factoringpolynomials ...... 30 Exercises...... 34

4 Vector Spaces 36 4.1 Deﬁnitionofvectorspaces ...... 36 4.2 Elementarypropertiesofvectorspaces ...... 38 4.3 Subspaces ...... 40 4.4 Sumsanddirectsums ...... 42 Exercises...... 46

5 Span and Bases 48 5.1 Linearspan ...... 48 5.2 Linearindependence ...... 50 5.3 Bases...... 55 5.4 Dimension...... 57 Exercises...... 61

6 Linear Maps 64 6.1 Deﬁnition and elementary properties ...... 64 6.2 Nullspaces ...... 67 6.3 Ranges...... 69 6.4 Homomorphisms ...... 70 6.5 Thedimensionformula...... 71 6.6 Thematrixofalinearmap...... 73 6.7 Invertibility ...... 78 Exercises...... 82

7 Eigenvalues and Eigenvectors 85 7.1 Invariantsubspaces ...... 85 7.2 Eigenvalues ...... 86 7.3 Diagonalmatrices...... 89 7.4 Existenceofeigenvalues ...... 90

iii 7.5 Uppertriangularmatrices ...... 91 7.6 Diagonalization of 2 2 matrices and Applications ...... 96 × Exercises...... 98

8 Permutations and the Determinant of a Square Matrix 102 8.1 Permutations ...... 102 8.1.1 Deﬁnitionofpermutations ...... 102 8.1.2 Compositionofpermutations ...... 106 8.1.3 Inversions and the sign of a permutation ...... 108 8.2 Determinants ...... 110 8.2.1 Summations indexed by the set of all permutations ...... 110 8.2.2 Propertiesofthedeterminant ...... 112 8.2.3 Further properties and applications ...... 115 8.2.4 Computing determinants with cofactor expansions ...... 116 Exercises...... 118

9 Inner Product Spaces 120 9.1 Innerproduct ...... 120 9.2 Norms ...... 122 9.3 Orthogonality...... 123 9.4 Orthonormalbases ...... 127 9.5 The Gram-Schmidt orthogonalization procedure ...... 129 9.6 Orthogonal projections and minimization problems ...... 131 Exercises...... 136

10 Change of Bases 139 10.1Coordinatevectors ...... 139 10.2 Changeofbasistransformation ...... 141 Exercises...... 145

11 The Spectral Theorem for Normal Linear Maps 147 11.1 Self-adjoint or hermitian operators ...... 147 11.2Normaloperators ...... 149 11.3 Normal operators and the spectral decomposition ...... 151

iv 11.4 Applications of the Spectral Theorem: diagonalization ...... 153 11.5Positiveoperators...... 157 11.6Polardecomposition ...... 158 11.7 Singular-value decomposition ...... 159 Exercises...... 161

Supplementary Notes on Matrices and Linear Systems 164 12.1 From linear systems to matrix equations ...... 164 12.1.1 Deﬁnition of and notation for matrices ...... 165 12.1.2 Using matrices to encode linear systems ...... 168 12.2Matrixarithmetic...... 171 12.2.1 Addition and scalar multiplication ...... 171 12.2.2 Multiplying and matrices ...... 175 12.2.3 Invertibility of square matrices ...... 179 12.3 Solving linear systems by factoring the coeﬃcient matrix ...... 181 12.3.1 Factorizing matrices using Gaussian elimination ...... 181 12.3.2 Solving homogenous linear systems ...... 191 12.3.3 Solving inhomogeneous linear systems ...... 194 12.3.4 Solving linear systems with LU-factorization ...... 198 12.4Matricesandlinearmaps...... 203 12.4.1 The canonical matrix of a linear map ...... 203 12.4.2 Using linear maps to solve linear systems ...... 204 12.5 Specialoperationsonmatrices ...... 210 12.5.1 Transposeandconjugatetranspose ...... 210 12.5.2 Thetraceofasquarematrix...... 211 Exercises...... 213

List of Appendices

A The Language of Sets and Functions 217 A.1 Sets ...... 217 A.2 Subset, union, intersection, and Cartesian product ...... 219 A.3 Relations...... 221

v A.4 Functions ...... 222

B Summary of Algebraic Structures Encountered 225 B.1 Binary operations and scaling operations ...... 225 B.2 Groups,ﬁelds,andvectorspaces...... 228 B.3 Ringsandalgebras ...... 233

C Some Common Math Symbols & Abbreviations 235

D Summary of Notation Used 242

E Movie Scripts 245

vi Chapter 1

What is Linear Algebra?

1.1 Introduction to MAT 67

This class may well be one of your ﬁrst mathematics classes that bridges the gap between the mainly computation-oriented lower division classes and the abstract mathematics encountered in more advanced mathematics courses. The goal of this class is threefold:

1. You will learn Linear Algebra, which is one of the most widely used mathematical theories around. Linear Algebra finds applications in virtually every area of mathematics, including Multivariate Calculus, Differential Equations, and Probability Theory. It is also widely applied in fields like physics, chemistry, economics, psychology, and engineering. You are even relying on methods from Linear Algebra every time you use an Internet search like Google, the Global Positioning System (GPS), or a cellphone.

2. You will acquire computational skills to solve linear systems of equations, perform operations on matrices, calculate eigenvalues, and ﬁnd determinants of matrices.

3. In the setting of Linear Algebra, you will be introduced to abstraction. We will develop the theory of Linear Algebra together, and you will learn to write proofs.

The lectures will mainly develop the theory of Linear Algebra, and the discussion sessions will focus on the computational aspects. The lectures and the discussion sections go hand in hand, and it is important that you attend both. The exercises for each Chapter are divided into more computation-oriented exercises and exercises that focus on proof-writing. There

1 2 CHAPTER 1. WHAT IS LINEAR ALGEBRA? are also some very short webwork homework sets to make sure you have some basic skills. You can already try the ﬁrst one that introduces some logical concepts by clicking below:

Background Webwork Problems

1.2 What is Linear Algebra?

Linear Algebra is the branch of mathematics aimed at solving systems of linear equations with a ﬁnite number of unknowns. In particular, one would like to obtain answers to the following questions:

Characterization of solutions: Are there solutions to a given system of linear • equations? How many solutions are there?

Finding solutions: How does the solution set look? What are the solutions? • Linear Algebra is a systematic theory regarding the solutions of systems of linear equations.

Example 1.2.1. Let us take the following system of two linear equations in the two unknowns x1 and x2: 2x + x =0 1 2 . x1 x2 =1 − This system has a unique solution for x , x R, namely x = 1 and x = 2 . 1 2 ∈ 1 3 2 − 3 This solution can be found in several diﬀerent ways. One approach is to ﬁrst solve for one of the unknowns in one of the equations and then to substitute the result into the other equation. Here, for example, we might solve to obtain

x1 =1+ x2

from the second equation. Then, substituting this in place of x1 in the ﬁrst equation, we have

2(1 + x2)+ x2 =0.

From this, x = 2/3. Then, by further substitution, 2 − 2 1 x =1+ = . 1 −3 3 1.2. WHAT IS LINEAR ALGEBRA? 3

Alternatively, we can take a more systematic approach in eliminating variables. Here, for example, we can subtract 2 times the second equation from the ﬁrst equation in order to 2 obtain 3x2 = 2. It is then immediate that x2 = 3 and, by substituting this value for x2 − 1 − in the ﬁrst equation, that x1 = 3 .

Example 1.2.2. Take the following system of two linear equations in the two unknowns x1 and x2: x + x =1 1 2 . 2x1 +2x2 =1 Here, we can eliminate variables by adding 2 times the ﬁrst equation to the second equation, − which results in 0 = 1. This is obviously a contradiction, and hence this system of equations − has no solution. Example 1.2.3. Let us take the following system of one linear equation in the two unknowns x1 and x2: x 3x =0. 1 − 2 In this case, there are inﬁnitely many solutions given by the set x = 1 x x R . You { 2 3 1 | 1 ∈ } can think of this solution set as a line in the Euclidean plane R2:

x2 1 1 x2 = 3 x1

x1 3 2 1 1 2 3 − − − 1 −

In general, a system of m linear equations in n unknowns x1, x2,...,xn is a collection of equations of the form

a x + a x + + a x = b 11 1 12 2 1n n 1 a21x1 + a22x2 + + a2nxn = b2  . .  , (1.1) . .   am1x1 + am2x2 + + amnxn = bm    where the aij’s are the coeﬃcients (usually real or complex numbers) in front of the unknowns xj, and the bi’s are also ﬁxed real or complex numbers. A solution is a set of numbers 4 CHAPTER 1. WHAT IS LINEAR ALGEBRA?

s1,s2,...,sn such that, substituting x1 = s1, x2 = s2,...,xn = sn for the unknowns, all of the equations in System (1.1) hold. Linear Algebra is a theory that concerns the solutions and the structure of solutions for linear equations. As this course progresses, you will see that there is a lot of subtlety in fully understanding the solutions for such equations.

1.3 Systems of linear equations

1.3.1 Linear equations

Before going on, let us reformulate the notion of a system of linear equations into the language of functions. This will also help us understand the adjective “linear” a bit better. A function f is a map f : X Y (1.2) → from a set X to a set Y . The set X is called the domain of the function, and the set Y is called the target space or codomain of the function. An equation is

f(x)= y, (1.3)

where x X and y Y . (If you are not familiar with the abstract notions of sets and ∈ ∈ functions, then please consult Appendix A.)

Example 1.3.1. Let f : R R be the function f(x)= x3 x. Then f(x)= x3 x = 1 is → − − an equation. The domain and target space are both the set of real numbers R in this case.

In this setting, a system of equations is just another kind of equation.

Example 1.3.2. Let X = Y = R2 = R R be the Cartesian product of the set of real × numbers. Then deﬁne the function f : R2 R2 as →

f(x , x )=(2x + x , x x ), (1.4) 1 2 1 2 1 − 2 and set y = (0, 1). Then the equation f(x) = y, where x = (x , x ) R2, describes the 1 2 ∈ system of linear equations of Example 1.2.1.

The next question we need to answer is, “what is a linear equation?” Building on the deﬁnition of an equation, a linear equation is any equation deﬁned by a “linear” function 1.3. SYSTEMS OF LINEAR EQUATIONS 5

f that is defined on a “linear” space (a.k.a. a vector space as defined in Section 4.1). We will elaborate on all of this in future lectures, but let us demonstrate the main features of a “linear” space in terms of the example R2. Take x = (x , x ),y = (y ,y ) R2. There are 1 2 1 2 ∈ two “linear” operations defined on R2, namely addition and scalar multiplication:

x + y := (x1 + y1, x2 + y2) (vector addition) (1.5)

cx := (cx1, cx2) (scalar multiplication). (1.6)

A “linear” function on R2 is then a function f that interacts with these operations in the following way:

f(cx)= cf(x) (1.7) f(x + y)= f(x)+ f(y). (1.8)

You should check for yourself that the function f in Example 1.3.2 has these two properties.

1.3.2 Non-linear equations

(Systems of) Linear equations are a very important class of (systems of) equations. You will learn techniques in this class that can be used to solve any systems of linear equations. Non-linear equations, on the other hand, are signiﬁcantly harder to solve. An example is a quadratic equation such as x2 + x 2=0, (1.9) − which, for no completely obvious reason, has exactly two solutions x = 2 and x = 1. − Contrast this with the equation x2 + x +2=0, (1.10)

which has no solutions within the set R of real numbers. Instead, it is has two complex solutions 1 ( 1 i√7) C, where i = √ 1. (Complex numbers are discussed in more detail 2 − ± ∈ − in Chapter 2.) In general, recall that the quadratic equation x2 + bx + c = 0 has the two solutions b b2 x = c. −2 ± 4 − 6 CHAPTER 1. WHAT IS LINEAR ALGEBRA?

1.3.3 Linear transformations

The set R2 can be viewed as the Euclidean plane. In this context, linear functions of the form f : R2 R or f : R2 R2 can be interpreted geometrically as “motions” in the plane → → and are called linear transformations. Example 1.3.3. Recall the following linear system from Example 1.2.1:

2x + x =0 1 2 . x1 x2 =1 −

Each equation can be interpreted as a straight line in the plane, with solutions (x1, x2) to the linear system given by the set of all points that simultaneously lie on both lines. In this case, the two lines meet in only one location, which corresponds to the unique solution to the linear system as illustrated in the following ﬁgure: y

1 y = x 1 − x 1 1 2 − 1 − y = 2x −

Example 1.3.4. The linear map f(x , x )=(x , x ) describes the “motion” of reﬂecting 1 2 1 − 2 a vector across the x-axis, as illustrated in the following ﬁgure: y

1 (x1, x2)

x 1 2 1 (x1, x2) − −

Example 1.3.5. The linear map f(x , x )=( x , x ) describes the “motion” of rotating a 1 2 − 2 1 vector by 900 counterclockwise, as illustrated in the following ﬁgure: 1.3. SYSTEMS OF LINEAR EQUATIONS 7

( x2, x1) 2 −

1 (x1, x2)

x 1 1 2 −

This example can easily be generalized to rotation by any arbitrary angle using Lemma 2.3.2. In particular, when points in R2 are viewed as complex numbers, then we can employ the so-called polar form for complex numbers in order to model the “motion” of rotation. (Cf. Proof-Writing Exercise 5 on page 25.)

1.3.4 Applications of linear equations

Linear equations pop up in many diﬀerent contexts. For example, you can view the derivative df (x) of a diﬀerentiable function f : R R as a linear approximation of f. This becomes dx → apparent when you look at the Taylor series of the function f(x) centered around the point x = a (as seen in a course like MAT 21C):

df f(x)= f(a)+ (a)(x a)+ . (1.11) dx −

In particular, we can graph the linear part of the Taylor series versus the original function, as in the following ﬁgure: f(x) f(x) 3 f(a)+ df (a)(x a) dx − 2

x 1 2 3

Since f(a) and df (a) are merely real numbers, f(a)+ df (a)(x a) is a linear function in the dx dx − single variable x. 8 CHAPTER 1. WHAT IS LINEAR ALGEBRA?

Similarly, if f : Rn Rm is a multivariate function, then one can still view the derivative → of f as a form of a linear approximation for f (as seen in a course like MAT 21D).

What if there are infinitely many variables x1, x2,...? In this case, the system of equations has the form a x + a x + = y 11 1 12 2 1 a x + a x + = y . 21 1 22 2 2   Hence, the sums in each equation are infinite, and so we would have to deal with infinite series. This, in particular, means that questions of convergence arise, where convergence depends upon the infinite sequence x = (x1, x2,...) of variables. These questions will not occur in this course since we are only interested in finite systems of linear equations in a finite number of variables. Other subjects in which these questions do arise, though, include

Diﬀerential Equations (as in a course like MAT 22B or MAT 118AB); • Fourier Analysis (as in a course like MAT 129); • Real and Complex Analysis (as in a course like MAT 125AB, MAT 185AB, MAT • 201ABC, or MAT 202).

In courses like MAT 150ABC and MAT 250ABC, Linear Algebra is also seen to arise in the study of such things as symmetries, linear transformations, and Lie Algebra theory. 1.3. SYSTEMS OF LINEAR EQUATIONS 9 Exercises for Chapter 1

Calculational Exercises

1. Try some webwork problems:

Linear Systems Webwork Problems

2. Solve the following systems of linear equations and characterize their solution sets. (I.e., determine whether there is a unique solution, no solution, etc.) Also, write each system of linear equations as an equation for a single function f : Rn Rm for → appropriate choices of m, n Z . ∈ + (a) System of 3 equations in the unknowns x,y,z,w:

x +2y 2z +3w =2 − 2x +4y 3z +4w =5 . −  5x + 10y 8z + 11w = 12  −  (b) System of 4 equations in the unknowns x, y, z:

x +2y 3z =4 − x +3y + z = 11   . 2x +5y 4z = 13  −  2x +6y +2z = 22   (c) System of 3 equations in the unknowns x, y, z: 

x +2y 3z = 1 − − 3x y +2z =7 . −  5x +3y 4z =2  − 

Gaussian Elimination 10 CHAPTER 1. WHAT IS LINEAR ALGEBRA?

3. Find all pairs of real numbers x1 and x2 that satisfy the system of equations

x1 +3x2 =2, (1.12) x x =1. (1.13) 1 − 2

Proof-Writing Exercises

1. Let a, b, c, and d be real numbers, and consider the system of equations given by

ax1 + bx2 =0, (1.14)

cx1 + dx2 =0. (1.15)

Note that x1 = x2 = 0 is a solution for any choice of a, b, c, and d. Prove that if ad bc = 0, then x = x = 0 is the only solution. − 1 2 2. Watch the video above and then prove that for every augmented matrix there is a unique augmented matrix in reduced row echelon form. Chapter 2

Introduction to Complex Numbers

Let R denote the set of real numbers, which should be a familiar collection of numbers to anyone who has studied Calculus. In this chapter, we use R to build the equally important set of so-called complex numbers.

2.1 Deﬁnition of complex numbers

We begin with the following deﬁnition.

Deﬁnition 2.1.1. The set of complex numbers C is deﬁned as

C = (x, y) x, y R . { | ∈ }

Given a complex number z = (x, y), we call Re(z) = x the real part of z and Im(z) = y the imaginary part of z.

In other words, we are deﬁning a new collection of numbers z by taking every possible ordered pair (x, y) of real numbers x, y R, and x is called the real part of the ordered pair ∈ (x, y) in order to imply that the set R of real numbers should be identiﬁed with the subset (x, 0) x R C. It is also common to use the term purely imaginary for any complex { | ∈ } ⊂ number of the form (0,y), where y R. In particular, the complex number i = (0, 1) is ∈ special, and it is called the imaginary unit. (The use of i is standard when denoting this complex number, though j is sometimes used if i means something else. E.g., i is used to denote electric current in Electrical Engineering.)

11 12 CHAPTER 2. INTRODUCTION TO COMPLEX NUMBERS

Note that if we write 1 = (1, 0), then we can express z =(x, y) C as ∈

z =(x, y)= x(1, 0)+ y(0, 1) = x1+ yi = x + yi.

It is often signiﬁcantly easier to perform arithmetic operations on complex numbers when written in this form, as we illustrate in the next section.

2.2 Operations on complex numbers

Even though we have formally deﬁned C as the set of all ordered pairs of real numbers, we can nonetheless extend the usual arithmetic operations on R so that they also make sense on C. We discuss such extensions in this section, along with several other important operations on complex numbers.

2.2.1 Addition and subtraction of complex numbers

Addition of complex numbers is performed component-wise, meaning that the real and imaginary parts are simply combined.

Deﬁnition 2.2.1. Given two complex numbers (x ,y ), (x ,y ) C, we deﬁne their (com- 1 1 2 2 ∈ plex) sum to be

(x1,y1)+(x2,y2)=(x1 + x2,y1 + y2).

Example 2.2.2. (3, 2)+ (17, 4.5)=(3+17, 2 4.5) = (20, 2.5). − − − As with the real numbers, subtraction is deﬁned as addition with the so-called additive inverse, where the additive inverse of z =(x, y) is deﬁned as z =( x, y). − − − Example 2.2.3. (π, √2) (π/2, √19) = (π, √2)+( π/2, √19), where − − −

(π, √2)+( π/2, √19) = (π π/2, √2 √19) = (π/2, √2 √19). − − − − −

The addition of complex numbers shares many of the same properties as the addition of real numbers, including associativity, commutativity, the existence and uniqueness of an additive identity, and the existence and uniqueness of additive inverses. We summarize these properties in the following theorem, which you should prove for your own practice. 2.2. OPERATIONS ON COMPLEX NUMBERS 13

Theorem 2.2.4. Let z , z , z C be any three complex numbers. Then the following state- 1 2 3 ∈ ments are true.

1. (Associativity) (z1 + z2)+ z3 = z1 +(z2 + z3).

2. (Commutativity) z1 + z2 = z2 + z1.

3. (Additive Identity) There is a unique complex number, denoted 0, such that, given any complex number z C, 0+ z = z. Moreover, 0=(0, 0). ∈ 4. (Additive Inverses) Given any complex number z C, there is a unique complex num- ∈ ber, denoted z, such that z +( z)=0. Moreover, if z = (x, y) with x, y R, then − − ∈ z =( x, y). − − − The proof of this theorem is straightforward and relies solely on the deﬁnition of complex addition along with the familiar properties of addition for real numbers. For example, to check commutativity, let z =(x ,y ) and z =(x ,y ) be complex numbers with x , x ,y ,y R. 1 1 1 2 2 2 1 2 1 2 ∈ Then

z1 + z2 =(x1 + x2,y1 + y2)=(x2 + x1,y2 + y1)= z2 + z1.

2.2.2 Multiplication and division of complex numbers

The deﬁnition of multiplication for two complex numbers is at ﬁrst glance somewhat less straightforward than that of addition.

Deﬁnition 2.2.5. Given two complex numbers (x ,y ), (x ,y ) C, we deﬁne their (com- 1 1 2 2 ∈ plex) product to be

(x ,y )(x ,y )=(x x y y , x y + x y ). 1 1 2 2 1 2 − 1 2 1 2 2 1

According to this deﬁnition, i2 = 1. In other words, i is a solution of the polynomial − equation z2 + 1 = 0, which does not have solutions in R. Solving such otherwise unsolvable equations was largely the main motivation behind the introduction of complex numbers. Note that the relation i2 = 1 and the assumption that complex numbers can be multiplied − like real numbers is suﬃcient to arrive at the general rule for multiplication of complex 14 CHAPTER 2. INTRODUCTION TO COMPLEX NUMBERS

numbers:

2 (x1 + y1i)(x2 + y2i)= x1x2 + x1y2i + x2y1i + y1y2i = x x + x y i + x y i y y 1 2 1 2 2 1 − 1 2 = x x y y +(x y + x y )i. 1 2 − 1 2 1 2 2 1

As with addition, the basic properties of complex multiplication are easy enough to prove using the deﬁnition. We summarize these properties in the following theorem, which you should also prove for your own practice.

Theorem 2.2.6. Let z , z , z C be any three complex numbers. Then the following state- 1 2 3 ∈ ments are true.

1. (Associativity) (z1z2)z3 = z1(z2z3).

2. (Commutativity) z1z2 = z2z1.

3. (Multiplicative Identity) There is a unique complex number, denoted 1, such that, given any z C, 1z = z. Moreover, 1=(1, 0). ∈

4. (Distributivity of Multiplication over Addition) z1(z2 + z3)= z1z2 + z1z3.

Just as is the case for real numbers, any non-zero complex number z has a unique mul- 1 tiplicative inverse, which we may denote by either z− or 1/z.

Theorem 2.2.6 (continued).

5. (Multiplicative Inverses) Given z C with z = 0, there is a unique complex number, ∈ 1 1 denoted z− , such that zz− =1. Moreover, if z =(x, y) with x, y R, then ∈

1 x y z− = , − . x2 + y2 x2 + y2

Proof. (Uniqueness.) A complex number w is an inverse of z if zw = 1 (by the commutativity of complex multiplication this is equivalent to wz = 1). We will ﬁrst prove that if w and v are two complex numbers such that zw = 1 and zv = 1, then we necessarily have w = v. 2.2. OPERATIONS ON COMPLEX NUMBERS 15

This will then imply that any z C can have at most one inverse. To see this, we start ∈ from zv = 1. Multiplying both sides by w, we obtain wzv = w1. Using the fact that 1 is the multiplicative unit, that the product is commutative, and the assumption that w is an inverse, we get zwv = v = w. (Existence.) Now assume z C with z = 0, and write z = x + yi for x, y R. Since ∈ ∈ z = 0, at least one of x or y is not zero, and so x2 + y2 > 0. Therefore, we can deﬁne x y w = , − , x2 + y2 x2 + y2 and one can check that zw = 1.

Now, we can deﬁne the division of a complex number z1 by a non-zero complex number 1 z2 as the product of z1 and z2− . Explicitly, for two complex numbers z1 = x1 + iy1 and

z2 = x2 + iy2, we have that their (complex) quotient is

z1 x1x2 + y1y2 +(x2y1 x1y2) i = 2 2 − . z2 x2 + y2

Example 2.2.7. We illustrate the above deﬁnition with the following example:

(1, 2) 1 3+2 4 3 2 1 4 3+8 6 4 11 2 = , − = , − = , . (3, 4) 32 +42 32 +42 9+16 9+16 25 25 2.2.3 Complex conjugation

Complex conjugation is an operation on C that will turn out to be very useful because it allows us to manipulate only the imaginary part of a complex number. In particular, when combined with the notion of modulus (as deﬁned in the next section), it is one of the most fundamental operations on C. The deﬁnition and most basic properties of complex conjugation are as follows. (As in the previous sections, you should provide a proof of the theorem below for your own practice.)

Deﬁnition 2.2.8. Given a complex number z = (x, y) C with x, y R, we deﬁne the ∈ ∈ (complex) conjugate of z to be the complex number

z¯ =(x, y). − 16 CHAPTER 2. INTRODUCTION TO COMPLEX NUMBERS

Theorem 2.2.9. Given two complex numbers z , z C, 1 2 ∈

1. z1 + z2 = z1 + z2.

2. z1z2 = z1 z2.

3. 1/z =1/z , for all z =0. 1 1 1

4. z1 = z1 if and only if Im(z1)=0.

5. z1 = z1.

6. the real and imaginary parts of z1 can be expressed as

1 1 Re(z )= (z + z ) and Im(z )= (z z ). 1 2 1 1 1 2i 1 − 1

2.2.4 The modulus (a.k.a. norm, length, or magnitude)

In this section, we introduce yet another operation on complex numbers, this time based upon a generalization of the notion of absolute value of a real number. To motivate the definition, it is useful to view the set of complex numbers as the two-dimensional Euclidean plane, i.e., to think of C = R2 being equal as sets. The modulus, or length, of z C is ∈ then defined as the Euclidean distance between z, as a point in the plane, and the origin 0=(0, 0). This is the content of the following definition.

Deﬁnition 2.2.10. Given a complex number z = (x, y) C with x, y R, the modulus ∈ ∈ of z is deﬁned to be z = x2 + y2. | | In particular, given x R, note that ∈

(x, 0) = √x2 +0= x | | | | under the convention that the square root function takes on its principal positive value.

Example 2.2.11. Using the above deﬁnition, we see that the modulus of the complex number (3, 4) is (3, 4) = √32 +42 = √9+16= √25=5. | | 2.2. OPERATIONS ON COMPLEX NUMBERS 17

To see this geometrically, construct a ﬁgure in the Euclidean plane, such as

5 (3, 4) 4 3 • 2 1 0 x 012345 and apply the Pythagorean theorem to the resulting right triangle in order to ﬁnd the distance from the origin to the point (3, 4).

The following theorem lists the fundamental properties of the modulus, and especially as it relates to complex conjugation. You should provide a proof for your own practice.

Theorem 2.2.12. Given two complex numbers z , z C, 1 2 ∈

1. z z = z z . | 1 2| | 1|| 2| z z 2. 1 = | 1|, assuming that z =0. z z 2 2 2 | | 3. z1 = z1 . | | | | 4. Re(z ) z and Im(z ) z . | 1 |≤| 1| | 1 |≤| 1| 5. (Triangle Inequality) z + z z + z . | 1 2|≤| 1| | 2| 6. (Another Triangle Inequality) z z z z . | 1 − 2|≥|| 1|−| 2|| 7. (Formula for Multiplicative Inverse) z z = z 2, from which 1 1 | 1|

1 z1 z− = 1 z 2 | 1| when we assume that z =0. 1 18 CHAPTER 2. INTRODUCTION TO COMPLEX NUMBERS

2.2.5 Complex numbers as vectors in R2

When complex numbers are viewed as points in the Euclidean plane R2, several of the operations defined in Section 2.2 can be directly visualized as if they were operations on vectors. For the purposes of this chapter, we think of vectors as directed line segments that start at the origin and end at a specified point in the Euclidean plane. These line segments may also be moved around in space as long as the direction (which we will call the argument in Section 2.3.1 below) and the length (a.k.a. the modulus) are preserved. As such, the distinction between points in the plane and vectors is merely a matter of convention as long as we at least implicitly think of each vector as having been translated so that it starts at the origin. As we saw in Example 2.2.11 above, the modulus of a complex number can be viewed as the length of the hypotenuse of a certain right triangle. The sum and difference of two vectors can also each be represented geometrically as the lengths of specific diagonals within a particular parallelogram that is formed by copying and appropriately translating the two vectors being combined.

Example 2.2.13. We illustrate the sum (3, 2)+(1, 3)= (4, 5) as the main, dashed diagonal of the parallelogram in the left-most figure below. The difference (3, 2) (1, 3) = (2, 1) can − − also be viewed as the shorter diagonal of the same parallelogram, though we would properly need to insist that this shorter diagonal be translated so that it starts at the origin. The latter is illustrated in the right-most figure below. y y

5 (4, 5) 5 (4, 5) • • 4 (1, 3) 4 (1, 3) 3 3 2 • (3, 2) 2 • (3, 2) 1 • 1 • 0 x 0 x 012345 012345

Vector Sum as Main Diagonal Vector Diﬀerence as Minor Diagonal of Parallelogram of Parallelogram 2.3. POLAR FORM AND GEOMETRIC INTERPRETATION FOR C 19 2.3 Polar form and geometric interpretation for C

As mentioned above, C coincides with the plane R2 when viewed as a set of ordered pairs of real numbers. Therefore, we can use polar coordinates as an alternate way to uniquely identify a complex number. This gives rise to the so-called polar form for a complex number, which often turns out to be a convenient representation for complex numbers.

2.3.1 Polar form for complex numbers

The following diagram summarizes the relations between cartesian and polar coordinates in R2: y

z • r   y = r sin(θ)   θ  x   x = r cos(θ)  We call the ordered pair (x, y) therectangular coordinates for the complex number z. We also call the ordered pair (r, θ) the polar coordinates for the complex number z. The radius r = z is called the modulus of z (as deﬁned in Section 2.2.4 above), and the | | angle θ = Arg(z) is called the argument of z. Since the argument of a complex number describes an angle that is measured relative to the x-axis, it is important to note that θ is only well-deﬁned up to adding multiples of 2π. As such, we restrict θ [0, 2π) and add or ∈ subtract multiples of 2π as needed (e.g., when multiplying two complex numbers so that their arguments are added together) in order to keep the argument within this range of values. It is straightforward to transform polar coordinates into rectangular coordinates using the equations x = r cos(θ) and y = r sin(θ). (2.1)

In order to transform rectangular coordinates into polar coordinates, we ﬁrst note that r = x2 + y2 is just the complex modulus. Then, θ must be chosen so that it satisﬁes the 20 CHAPTER 2. INTRODUCTION TO COMPLEX NUMBERS

bounds 0 θ < 2π in addition to the simultaneous equations (2.1) where we are assuming ≤ that z = 0. Summarizing:

z = x + yi = r cos(θ)+ r sin(θ)i = r(cos(θ) + sin(θ)i).

Part of the utility of this expression is that the size r = z of z is explicitly part of the very | | deﬁnition since it is easy to check that cos(θ) + sin(θ)i = 1 for any choice of θ R. | | ∈ Closely related is the exponential form for complex numbers, which does nothing more than replace the expression cos(θ) + sin(θ)i with eiθ. The real power of this deﬁnition is that this exponential notation turns out to be completely consistent with the usual usage of exponential notation for real numbers.

Example 2.3.1. The complex number i in polar coordinates is expressed as eiπ/2, whereas the number 1 is given by eiπ. −

2.3.2 Geometric multiplication for complex numbers

As discussed in Section 2.3.1 above, the general exponential form for a complex number z is an expression of the form reiθ where r is a non-negative real number and θ [0, 2π). The ∈ utility of this notation is immediately observed when multiplying two complex numbers:

Lemma 2.3.2. Let z = r eiθ1 , z = r eiθ2 C be complex numbers in exponential form. 1 1 2 2 ∈ Then i(θ1+θ2) z1z2 = r1r2e .

Proof. By direct computation,

iθ1 iθ2 iθ1 iθ2 z1z2 =(r1e )(r2e )= r1r2e e

= r1r2(cos θ1 + i sin θ1)(cos θ2 + i sin θ2) = r r (cos θ cos θ sin θ sin θ )+ i(sin θ cos θ + cos θ sin θ ) 1 2 1 2 − 1 2 1 2 1 2 i(θ1+θ2) = r1r2cos(θ1 + θ2)+ i sin(θ1 + θ2) = r1r2e , where we have used the usual formulas for the sine and cosine of the sum of two angles. 2.3. POLAR FORM AND GEOMETRIC INTERPRETATION FOR C 21

In particular, Lemma 2.3.2 shows that the modulus z z of the product is the product | 1 2| of the moduli r1 and r2 and that the argument Arg(z1z2) of the product is the sum of the arguments θ1 + θ2.

2.3.3 Exponentiation and root extraction

Another important use for the polar form of a complex number is in exponentiation. The simplest possible situation here involves the use of a positive integer as a power, in which case exponentiation is nothing more than repeated multiplication. Given the observations in Section 2.3.2 above and using some trigonometric identities, one quickly obtains the following fundamental result.

Theorem 2.3.3 (de Moivre’s Formula). Let z = r(cos(θ) + sin(θ)i) be a complex number in polar form and n Z be a positive integer. Then ∈ +

1. the exponentiation zn = rn(cos(nθ) + sin(nθ)i) and

2. the nth roots of z are given by the n complex numbers

θ 2πk θ 2πk i z = r1/n cos + + sin + i = r1/ne n (θ+2πk), k n n n n where k =0, 1, 2,...,n 1. −

Note, in particular, that we are not only always guaranteed the existence of an nth root for any complex number, but that we are also always guaranteed to have exactly n of them. This level of completeness in root extraction contrasts very sharply with the delicate care that must be taken when one wishes to extract roots of real numbers without the aid of complex numbers. An important special case of de Moivre’s Formula yields an inﬁnite family of well-studied numbers called the roots of unity. By unity, we just mean the complex number 1 = 1+0i, 22 CHAPTER 2. INTRODUCTION TO COMPLEX NUMBERS

and by the nth roots of unity, we mean the n numbers

0 2πk 0 2πk z = 11/n cos + + sin + i k n n n n 2πk 2πk = cos + sin i n n = e2πi(k/n),

where k =0, 1, 2,...,n 1. These numbers have many interesting properties and important − applications despite how simple they might appear to be.

Example 2.3.4. To ﬁnd all solutions of the equation z3 +8=0 for z C, we may write ∈ z = reiθ in polar form with r > 0 and θ [0, 2π). Then the equation z3 + 8 = 0 becomes ∈ z3 = r3ei3θ = 8=8eiπ so that r = 2 and 3θ = π +2πk for k = 0, 1, 2. This means that − there are three distinct solutions when θ [0, 2π), namely θ = π , θ = π, and θ = 5π . ∈ 3 3

2.3.4 Some complex elementary functions

We conclude these notes by defining three of the basic elementary functions that take complex arguments. In this context, “elementary function” is used as a technical term and essentially means something like “one of the most common forms of functions encountered when beginning to learn Calculus.” The most basic elementary functions include the familiar polynomial and algebraic functions, such as the nth root function, in addition to the somewhat more sophisticated exponential function, the trigonometric functions, and the logarithmic function. For the purposes of these notes, we will now define the complex exponential function and two complex trigonometric functions. Definitions for the remaining basic elementary functions can be found in any book on Complex Analysis. The basic groundwork for defining the complex exponential function was already put into place in Sections 2.3.1 and 2.3.2 above. In particular, we have already defined the expression eiθ to mean the sum cos(θ) + sin(θ)i for any real number θ. Historically, this equivalence is a special case of the more general Euler’s formula

ex+yi = ex(cos(y) + sin(y)i), which we here take as our deﬁnition of the complex exponential function applied to any 2.3. POLAR FORM AND GEOMETRIC INTERPRETATION FOR C 23

complex number x + yi for x, y R. ∈ Given this exponential function, one can then deﬁne the complex sine function and the complex cosine function as

iz iz iz iz e e− e + e− sin(z)= − and cos(z)= . 2i 2

Remarkably, these functions retain many of their familiar properties, which should be taken as a sign that the deﬁnitions — however abstract — have been well thought-out. We summarize a few of these properties as follows.

Theorem 2.3.5. Given z , z C, 1 2 ∈ 1. ez1+z2 = ez1 ez2 and ez =0 for any choice of z C. ∈ 2 2 2. sin (z1) + cos (z1)=1.

3. sin(z + z ) = sin(z ) cos(z ) + cos(z ) sin(z ). 1 2 1 2 1 2 4. cos(z + z ) = cos(z ) cos(z ) sin(z ) sin(z ). 1 2 1 2 − 1 2 24 CHAPTER 2. INTRODUCTION TO COMPLEX NUMBERS Exercises for Chapter 2

Calculational Exercises

1. Express the following complex numbers in the form x + yi for x, y R: ∈ (a) (2+3i)+(4+ i) (b) (2+3i)2(4 + i) 2+3i (c) 4+ i 1 3 (d) + i 1+ i 1 (e) ( i)− − (f) ( 1+ i√3)3 − 2. Compute the real and imaginary parts of the following expressions, where z is the complex number x + yi and x, y R: ∈ 1 (a) z2 1 (b) 3z +2 z +1 (c) 2z 5 − (d) z3

3. Find r> 0 and θ [0, 2π) such that (1 i)/√2= reiθ. ∈ − 4. Solve the following equations for z a complex number:

(a) z5 2=0 − (b) z4 + i =0 (c) z6 +8=0 (d) z3 4i =0 − 5. Calculate the

(a) complex conjugate of the fraction (3 + 8i)4/(1 + i)10. 2.3. POLAR FORM AND GEOMETRIC INTERPRETATION FOR C 25

(b) complex conjugate of the fraction (8 2i)10/(4+6i)5. − (c) complex modulus of the fraction i(2+3i)(5 2i)/( 2 i). − − − (d) complex modulus of the fraction (2 3i)2/(8+6i)2. − 6. Compute the real and imaginary parts:

(a) e2+i (b) sin(1 + i)

3 i (c) e − (d) cos(2+3i)

z 7. Compute the real and imaginary part of ee for z C. ∈

Proof-Writing Exercises

1. Let a R and z,w C. Prove that ∈ ∈ (a) Re(az)= aRe(z) and Im(az)= aIm(z). (b) Re(z + w) = Re(z)+Re(w) and Im(z + w) = Im(z) + Im(w).

2. Let z C. Prove that Im(z) = 0 if and only if Re(z)= z. ∈ 3. Let z,w C. Prove the parallelogram law z w 2 + z + w 2 = 2( z 2 + w 2). ∈ | − | | | | | | | z w 4. Let z,w C with zw = 1 such that either z =1or w = 1. Prove that − =1. ∈ | | | | 1 zw − 5. For an angle θ [0, 2π), ﬁnd the linear map f : R2 R2, which describes the rotation ∈ θ → by the angle θ in the counterclockwise direction. Hint: For a given angle θ, ﬁnd a,b,c,d R such that f (x , x )=(ax +bx , cx +dx ). ∈ θ 1 2 1 2 1 2 Chapter 3

The Fundamental Theorem of Algebra and Factoring Polynomials

The similarities and differences between R and C can be described as elegant and intrigu- ing, but why are complex numbers important? One possible answer to this question is the Fundamental Theorem of Algebra. It states that every polynomial equation in one variable with complex coefficients has at least one complex solution. In other words, polynomial equations formed over C can always be solved over C. This amazing result has several equivalent formulations in addition to a myriad of different proofs, one of the first of which was given by the eminent mathematician Carl Gauss in his doctoral thesis.

3.1 The Fundamental Theorem of Algebra

The aim of this section is to provide a proof of the Fundamental Theorem of Algebra using concepts that should be familiar to you from your study of Calculus, and so we begin by providing an explicit formulation.

Theorem 3.1.1 (Fundamental Theorem of Algebra). Given any positive integer n Z ∈ + and any choice of complex numbers a , a ,...,a C with a =0, the polynomial equation 0 1 n ∈ n

a zn + + a z + a = 0 (3.1) n 1 0 has at least one solution z C. ∈ 26 3.1. THE FUNDAMENTAL THEOREM OF ALGEBRA 27

This is a remarkable statement. No analogous result holds for guaranteeing that a real solution exists to Equation (3.1) if we restrict the coefficients a0, a1,...,an to be real numbers. E.g., there does not exist a real number x satisfying an equation as simple as πx2 + e = 0. Similarly, the consideration of polynomial equations having integer (resp. rational) coefficients quickly forces us to consider solutions that cannot possibly be integers (resp. rational numbers). Thus, the complex numbers are special in this respect. The statement of the Fundamental Theorem of Algebra can also be read as follows: Any non-constant complex polynomial function defined on the complex plane C (when thought of as R2) has at least one root, i.e., vanishes in at least one place. It is in this form that we will provide a proof for Theorem 3.1.1. Given how long the Fundamental Theorem of Algebra has been around, you should not be surprised that there are many proofs of it. There have even been entire books devoted solely to exploring the mathematics behind various distinct proofs. Different proofs arise from attempting to understand the statement of the theorem from the viewpoint of different branches of mathematics. This quickly leads to many non-trivial interactions with such fields of mathematics as Real and Complex Analysis, Topology, and (Modern) Abstract Algebra. The diversity of proof techniques available is yet another indication of how fundamental and deep the Fundamental Theorem of Algebra really is. To prove the Fundamental Theorem of Algebra using Differential Calculus, we will need the Extreme Value Theorem for real-valued functions of two real variables, which we state without proof. In particular, we formulate this theorem in the restricted case of functions defined on the closed disk D of radius R> 0 and centered at the origin, i.e.,

D = (x , x ) R2 x2 + x2 R2 . { 1 2 ∈ | 1 2 ≤ }

Theorem 3.1.2 (Extreme Value Theorem). Let f : D R be a continuous function on the → closed disk D R2. Then f is bounded and attains its minimum and maximum values on ⊂ D. In other words, there exist points x , x D such that m M ∈

f(x ) f(x) f(x ) m ≤ ≤ M for every possible choice of point x D. ∈ If we deﬁne a polynomial function f : C C by setting f(z)= a zn + +a z +a as in → n 1 0 28 CHAPTER 3. THE FUNDAMENTAL THEOREM OF ALGEBRA

Equation (3.1), then note that we can regard (x, y) f(x + iy) as a function R2 R. By → | | → a mild abuse of notation, we denote this function by f( ) or f . As it is a composition of | | | | continuous functions (polynomials and the square root), we see that f is also continuous. | | Lemma 3.1.3. Let f : C C be any polynomial function. Then there exists a point z C → 0 ∈ where the function f attains its minimum value in R. | | Proof. If f is a constant polynomial function, then the statement of the Lemma is trivially true since f attains its minimum value at every point in C. So choose, e.g., z = 0. | | 0 If f is not constant, then the degree of the polynomial deﬁning f is at least one. In this case, we can denote f explicitly as in Equation (3.1). That is, we set

f(z)= a zn + + a z + a n 1 0

with an = 0. Now, assume z = 0, and set A = max a0 ,..., an 1 . We can obtain a lower {| | | − |} bound for f(z) as follows: | |

n an 1 1 a0 1 − f(z) = an z 1+ + + n | | | || | an z an z ∞ n A 1 n A 1 an z 1 k = an z 1 . ≥ | || | − an z | || | − an z 1 | | k=1 | | | | | | − For all z C such that z 2, we can further simplify this expression and obtain ∈ | | ≥ 2A f(z) a z n 1 . | | ≥ | n|| | − a z | n|| | It follows from this inequality that there is an R> 0 such that f(z) > f(0) , for all z C | | | | ∈ satisfying z > R. Let D R2 be the disk of radius R centered at 0, and deﬁne a function | | ⊂ g : D R, by → g(x, y)= f(x + iy) . | | Since g is continuous, we can apply Theorem 3.1.2 in order to obtain a point (x ,y ) D 0 0 ∈ such that g attains its minimum at (x ,y ). By the choice of R we have that for z C D, 0 0 ∈ \ f(z) > g(0, 0) g(x ,y ) . Therefore, f attains its minimum at z = x + iy . | | | |≥| 0 0 | | | 0 0 We now prove the Fundamental Theorem of Algebra. 3.1. THE FUNDAMENTAL THEOREM OF ALGEBRA 29

Proof of Theorem 3.1.1. For our argument, we rely on the fact that the function f attains | | its minimum value by Lemma 3.1.3. Let z C be a point where the minimum is attained. 0 ∈ We will show that if f(z ) = 0, then z is not a minimum, thus proving by contraposition 0 0 that the minimum value of f(z) is zero. Therefore, f(z ) = 0. | | 0 If f(z ) = 0, then we can deﬁne a new function g : C C by setting 0 → f(z + z ) g(z)= 0 , for all z C. f(z0) ∈

Note that g is a polynomial of degree n, and that the minimum of f is attained at z if and | | 0 only if the minimum of g is attained at z = 0. Moreover, it is clear that g(0) = 1. | | More explicitly, g is given by a polynomial of the form

g(z)= b zn + + b zk +1, n k

with n 1 and b = 0, for some 1 k n. Let b = b eiθ, and consider z of the form ≥ k ≤ ≤ k | k|

1/k i(π θ)/k z = r b − e − , (3.2) | k|

with r > 0. For z of this form we have

g(z)=1 rk + rk+1h(r), −

where h is a polynomial. Then, for r< 1, we have by the triangle inequality that

g(z) 1 rk + rk+1 h(r) . | | ≤ − | |

For r > 0 suﬃciently small we have r h(r) < 1, by the continuity of the function rh(r) and | | the fact that it vanishes in r = 0. Hence

g(z) 1 rk(1 r h(r) ) < 1, | | ≤ − − | |

for some z having the form in Equation (3.2) with r (0,r ) and r > 0 suﬃciently small. ∈ 0 0 But then the minimum of the function g : C R cannot possibly be equal to 1. | | → 30 CHAPTER 3. THE FUNDAMENTAL THEOREM OF ALGEBRA 3.2 Factoring polynomials

In this section, we present several fundamental facts about polynomials, including an equivalent form of the Fundamental Theorem of Algebra. While these facts should be familiar to you, they nonetheless require careful formulation and proof. Before stating these results, though, we ﬁrst present a review of the main concepts needed in order to more carefully work with polynomials. Let n Z 0 be a non-negative integer, and let a , a ,...,a C be complex ∈ + ∪ { } 0 1 n ∈ numbers. Then we call the expression

p(z)= a zn + + a z + a n 1 0

a polynomial in the variable z with coeﬃcients a , a ,...,a . If a = 0, then we say 0 1 n n that p(z) has degree n (denoted deg(p(z)) = n), and we call an the leading term of p(z).

Moreover, if an = 1, then we call p(z) a monic polynomial. If, however, n = a0 = 0, then we call p(z) = 0 the zero polynomial and set deg(0) = . −∞ Finally, by a root (a.k.a. zero) of a polynomial p(z), we mean a complex number z0 such that, upon setting z = z0, we obtain the zero polynomial p(z0) = 0. Note, in particular, that every complex number is a root of the zero polynomial. Convention dictates that

a degree zero polynomial be called a constant polynomial, • a degree one polynomial be called a linear polynomial, • a degree two polynomial be called a quadratic polynomial, • a degree three polynomial be called a cubic polynomial, • a degree four polynomial be called a quadric polynomial, • a degree ﬁve polynomial be called a quintic polynomial, • and so on. • Addition and multiplication of polynomials is a direct generalization of the addition and multiplication of real numbers, and degree interacts with these operations as follows: 3.2. FACTORING POLYNOMIALS 31

Lemma 3.2.1. Let p(z) and q(z) be non-zero polynomials. Then

1. deg (p(z) q(z)) max deg(p(z)), deg(q(z)) ± ≤ { } 2. deg (p(z)q(z)) = deg(p(z)) + deg(q(z)).

Theorem 3.2.2. Given a positive integer n Z and any choice of a , a ,...,a C with ∈ + 0 1 n ∈ a =0, deﬁne the function f : C C by setting n →

f(z)= a zn + + a z + a , z C. n 1 0 ∀ ∈

In other words, f is a polynomial function of degree n. Then

1. given any complex number w C, we have that f(w)=0 if and only if there exists a ∈ polynomial function g : C C of degree n 1 such that → −

f(z)=(z w)g(z), z C. − ∀ ∈

2. there are at most n distinct complex numbers w for which f(w)=0. In other words, f has at most n distinct roots.

3. (Fundamental Theorem of Algebra, restated) there exist exactly n +1 complex numbers w ,w ,...,w C (not necessarily distinct) such that 0 1 n ∈

f(z)= w (z w )(z w ) (z w ), z C. 0 − 1 − 2 − n ∀ ∈

In other words, every polynomial function with coeﬃcients over C can be factored into linear factors over C.

Proof.

1. Let w C be a complex number. ∈ (“= ”) Suppose that f(w) = 0. Then, in particular, we have that ⇒

a wn + + a w + a =0. n 1 0 32 CHAPTER 3. THE FUNDAMENTAL THEOREM OF ALGEBRA

Since this equation is equal to zero, it follows that, given any z C, ∈

f(z)= a zn + + a z + a (a wn + + a w + a ) n 1 0 − n 1 0 n n n 1 n 1 = an(z w )+ an 1(z − w − )+ + a1(z w) − − − − n 1 n 2 − − k n 1 k k n 2 k = an(z w) z w − − + an 1(z w) z w − − + + a1(z w) − − − − k=0 k=0 n m 1 − k m 1 k =(z w) a z w − − . − m m=1 k=0 Thus, upon setting

n m 1 − k m 1 k g(z)= a z w − − , z C, m ∀ ∈ m=1 k=0 we have constructed a degree n 1 polynomial function g such that −

f(z)=(z w)g(z), z C. − ∀ ∈

(“ =”) Suppose that there exists a polynomial function g : C C of degree n 1 ⇐ → − such that f(z)=(z w)g(z), z C. − ∀ ∈ Then it follows that f(w)=(w w)g(w) = 0, as desired. − 2. We use induction on the degree n of f.

If n = 1, then f(z)= a1z + a0 is a linear function, and the equation a1z + a0 = 0 has the unique solution z = a /a . Thus, the result holds for n = 1. − 0 1 Now, suppose that the result holds for n 1. In other words, assume that every − polynomial function of degree n 1 has at most n 1 roots. Using the Fundamental − − Theorem of Algebra (Theorem 3.1.1), we know that there exists a complex number w C such that f(w) = 0. Moreover, from Part 1 above, we know that there exists a ∈ polynomial function g of degree n 1 such that −

f(z)=(z w)g(z), z C. − ∀ ∈ 3.2. FACTORING POLYNOMIALS 33

It then follows by the induction hypothesis that g has at most n 1 distinct roots, and − so f must have at most n distinct roots.

3. This part follows from an induction argument on n that is virtually identical to that of Part 2, and so the proof is left as an exercise to the reader. 34 CHAPTER 3. THE FUNDAMENTAL THEOREM OF ALGEBRA Exercises for Chapter 3

Calculational Exercises

1. Let n Z be a positive integer, let w ,w ,...,w C be distinct complex numbers, ∈ + 0 1 n ∈ and let z , z ,...,z C be any complex numbers. Then one can prove that there is 0 1 n ∈ a unique polynomial p(z) of degree at most n such that, for each k 0, 1,...,n , ∈ { } p(wk)= zk.

(a) Find the unique polynomial of degree at most 2 that satisﬁes p(0) = 0, p(1) = 1, and p(2) = 2.

(b) Can your result in Part (a) be easily generalized to ﬁnd the unique polynomial of degree at most n satisfying p(0) = 0, p(1) = 1, . . . , p(n)= n?

2. Given any complex number α C, show that the coeﬃcients of the polynomial ∈

(z α)(z α) − −

are real numbers.

Proof-Writing Exercises

1. Let m, n Z be positive integers with m n. Prove that there is a degree n ∈ + ≤ polynomial p(z) with complex coeﬃcients such that p(z) has exactly m distinct roots.

2. Given a polynomial p(z)= a zn + + a z + a with complex coeﬃcients, deﬁne the n 1 0 conjugate of p(z) to be the new polynomial

p(z)= a zn + + a z + a . n 1 0

(a) Prove that p(z)= p(z).

(b) Prove that p(z) has real coeﬃcients if and only if p(z)= p(z).

(c) Given polynomials p(z), q(z), and r(z) such that p(z) = q(z)r(z), prove that p(z)= q(z)r(z). 3.2. FACTORING POLYNOMIALS 35

3. Let p(z) be a polynomial with real coeﬃcients, and let α C be a complex number. ∈ Prove that p(α) = 0 if and only p(α) = 0. Chapter 4

Vector Spaces

Now that important background has been developed, we are finally ready to begin the study of Linear Algebra by introducing vector spaces. To get a sense of how important vector spaces are, try flipping to a random page in these notes. There is very little chance that you will flip to a page that does not have at least one vector space on it.

4.1 Deﬁnition of vector spaces

As we have seen in Chapter 1, a vector space is a set V with two operations defined upon it: addition of vectors and multiplication by scalars. These operations must satisfy certain properties, which we are about to discuss in more detail. The scalars are taken from a field F, where for the remainder of these notes F stands either for the real numbers R or for the complex numbers C. The sets R and C are examples of fields. The abstract definition of a field along with further examples can be found in Appendix B. Vector addition can be thought of as a function + : V V V that maps two vectors × → u, v V to their sum u + v V . Scalar multiplication can similarly be described as a ∈ ∈ function F V V that maps a scalar a F and a vector v V to a new vector av V . × → ∈ ∈ ∈ (More information on these kinds of functions, also known as binary operations, can be found in Appendix B.) It is when we place the right conditions on these operations that we turn V into a vector space.

Deﬁnition 4.1.1. A vector space over F is a set V together with the operations of addition V V V and scalar multiplication F V V satisfying each of the following properties. × → × → 36 4.1. DEFINITION OF VECTOR SPACES 37

1. Commutativity: u + v = v + u for all u, v V ; ∈ 2. Associativity: (u + v)+ w = u +(v + w) and (ab)v = a(bv) for all u,v,w V and ∈ a, b F; ∈ 3. Additive identity: There exists an element 0 V such that 0 + v = v for all v V ; ∈ ∈ 4. Additive inverse: For every v V , there exists an element w V such that v+w = 0; ∈ ∈ 5. Multiplicative identity: 1v = v for all v V ; ∈ 6. Distributivity: a(u + v) = au + av and (a + b)u = au + bu for all u, v V and ∈ a, b F. ∈ A vector space over R is usually called a real vector space, and a vector space over C is similarly called a complex vector space. The elements v V of a vector space are ∈ called vectors. Even though Deﬁnition 4.1.1 may appear to be an extremely abstract deﬁnition, vector spaces are fundamental objects in mathematics because there are countless examples of them. You should expect to see many examples of vector spaces throughout your mathematical life.

Example 4.1.2. Consider the set Fn of all n-tuples with elements in F. This is a vector space with addition and scalar multiplication deﬁned componentwise. That is, for u = (u ,u ,...,u ), v =(v , v ,...,v ) Fn and a F, we deﬁne 1 2 n 1 2 n ∈ ∈

u + v =(u1 + v1,u2 + v2,...,un + vn),

au =(au1, au2,...,aun).

It is easy to check that each property of Deﬁnition 4.1.1 is satisﬁed. In particular, the additive identity 0 = (0, 0,..., 0), and the additive inverse of u is u =( u , u ,..., u ). − − 1 − 2 − n An important case of Example 4.1.2 is Rn, especially when n = 2 or n = 3. We have already seen in Chapter 1 that there is a geometric interpretation for elements of R2 and R3 as points in the Euclidean plane and Euclidean space, respectively.

Example 4.1.3. Let F∞ be the set of all sequences over F, i.e.,

F∞ = (u ,u ,...) u F for j =1, 2,... . { 1 2 | j ∈ } 38 CHAPTER 4. VECTOR SPACES

Addition and scalar multiplication are deﬁned as expected, namely,

(u1,u2,...)+(v1, v2,...)=(u1 + v1,u2 + v2,...),

a(u1,u2,...)=(au1, au2,...).

You should verify that F∞ becomes a vector space under these operations.

Example 4.1.4. Verify that V = 0 is a vector space! (Here, 0 denotes the zero vector in { } any vector space.)

Example 4.1.5. Let F[z] be the set of all polynomial functions p : F F with coeﬃcients → in F. As discussed in Chapter 3, p(z) is a polynomial if there exist a , a ,...,a F such 0 1 n ∈ that n n 1 p(z)= anz + an 1z − + + a1z + a0. (4.1) − Addition and scalar multiplication on F[z] are deﬁned pointwise as

(p + q)(z)= p(z)+ q(z), (ap)(z)= ap(z),

where p, q F[z] and a F. For example, if p(z)=5z + 1 and q(z)=2z2 + z + 1, then ∈ ∈ (p + q)(z)=2z2 +6z +2 and (2p)(z)=10z + 2. It can be easily veriﬁed that, under these operations, F[z] forms a vector space over F. The additive identity in this case is the zero polynomial, for which all coeﬃcients are equal to zero, n n 1 and the additive inverse of p(z) in Equation (4.1) is p(z)= anz an 1z − a1z a0. − − − − −− − Example 4.1.6. Extending Example 4.1.5, let D R be a subset of R, and let (D) denote ⊂ C the set of all continuous functions with domain D and codomain R. Then, under the same operations of pointwise addition and scalar multiplication, one can show that (D) also C forms a vector space.

4.2 Elementary properties of vector spaces

We are going to prove several important, yet simple, properties of vector spaces. From now on, V will denote a vector space over F. 4.2. ELEMENTARY PROPERTIES OF VECTOR SPACES 39

Proposition 4.2.1. Every vector space has a unique additive identity.

Proof. Suppose there are two additive identities 0 and 0′. Then

0′ =0+0′ =0,

where the ﬁrst equality holds since 0 is an identity and the second equality holds since 0′ is an identity. Hence 0 = 0′, proving that the additive identity is unique.

Proposition 4.2.2. Every v V has a unique additive inverse. ∈

Proof. Suppose w and w′ are additive inverses of v so that v + w = 0 and v + w′ = 0. Then

w = w +0= w +(v + w′)=(w + v)+ w′ =0+ w′ = w′.

Hence w = w′, as desired.

Since the additive inverse of v is unique, as we have just shown, it will from now on be denoted by v. We also deﬁne w v to mean w +( v). We will, in fact, show in − − − Proposition 4.2.5 below that v = 1v. − − Proposition 4.2.3. 0v =0 for all v V . ∈ Note that the 0 on the left-hand side in Proposition 4.2.3 is a scalar, whereas the 0 on the right-hand side is a vector.

Proof. For v V , we have by distributivity that ∈

0v = (0+0)v =0v +0v.

Adding the additive inverse of 0v to both sides, we obtain

0=0v 0v = (0v +0v) 0v =0v. − −

Proposition 4.2.4. a0=0 for every a F. ∈ 40 CHAPTER 4. VECTOR SPACES

Proof. As in the proof of Proposition 4.2.3, if a F, then ∈

a0= a(0+0) = a0+ a0.

Adding the additive inverse of a0 to both sides, we obtain 0 = a0, as desired.

Proposition 4.2.5. ( 1)v = v for every v V . − − ∈ Proof. For v V , we have ∈

v +( 1)v =1v +( 1)v =(1+( 1))v =0v =0, − − −

which shows that ( 1)v is the additive inverse v of v. − −

4.3 Subspaces

As mentioned in the last section, there are countless examples of vector spaces. One particularly important source of new vector spaces comes from looking at subsets of a set that is already known to be a vector space.

Deﬁnition 4.3.1. Let V be a vector space over F, and let U V be a subset of V . Then ⊂ we call U a subspace of V if U is a vector space over F under the same operations that make V into a vector space over F.

To check that a subset U V is a subspace, it suﬃces to check only a few of the ⊂ conditions of a vector space.

Lemma 4.3.2. Let U V be a subset of a vector space V over F. Then U is a subspace of ⊂ V if and only if the following three conditions hold.

1. additive identity: 0 U; ∈ 2. closure under addition: u, v U implies u + v U; ∈ ∈ 3. closure under scalar multiplication: a F, u U implies that au U. ∈ ∈ ∈ 4.3. SUBSPACES 41

Proof. Condition 1 implies that the additive identity exists. Condition 2 implies that vector addition is well-deﬁned and, Condition 3 ensures that scalar multiplication is well-deﬁned. All other conditions for a vector space are inherited from V since addition and scalar multiplication for elements in U are the same when viewed as elements in either U or V .

Remark 4.3.3. Note that if we require U V to be a nonempty subset of V , then condition ⊂ 1 of Lemma 4.3.2 already follows from condition 3 since 0u = 0 for u U. ∈ Example 4.3.4. In every vector space V , the subsets 0 and V are easily veriﬁed to be { } subspaces. We call these the trivial subspaces of V .

Example 4.3.5. (x , 0) x R is a subspace of R2. { 1 | 1 ∈ } Example 4.3.6. U = (x , x , x ) F3 x +2x =0 is a subspace of F3. To see this, we { 1 2 3 ∈ | 1 2 } need to check the three conditions of Lemma 4.3.2. The zero vector (0, 0, 0) F3 is in U since it satisﬁes the condition x +2x = 0. To ∈ 1 2 show that U is closed under addition, take two vectors v = (v1, v2, v3) and u = (u1,u2,u3).

Then, by the definition of U, we have v1 +2v2 = 0 and u1 +2u2 = 0. Adding these two equations, it is not hard to see that the vector v + u = (v1 + u1, v2 + u2, v3 + u3) satisfies (v + u ) +2(v + u ) = 0. Hence v + u U. Similarly, to show closure under scalar 1 1 2 2 ∈ multiplication, take u = (u ,u ,u ) U and a F. Then au = (au , au , au ) satisfies the 1 2 3 ∈ ∈ 1 2 3 equation au +2au = a(u +2u )=0, and so au U. 1 2 1 2 ∈ Example 4.3.7. U = p F[z] p(3) = 0 is a subspace of F[z]. Again, to check this, we { ∈ | } need to verify the three conditions of Lemma 4.3.2. n n 1 Certainly the zero polynomial p(z)=0z +0z − + +0z+0 is in U since p(z) evaluated at 3 is 0. If f(z),g(z) U, then f(3) = g(3) = 0 so that (f +g)(3) = f(3)+g(3) = 0+0 = 0. ∈ Hence f + g U, which proves closure under addition. Similarly, (af)(3) = af(3) = a0=0 ∈ for any a F, which proves closure under scalar multiplication. ∈

Example 4.3.8. As in Example 4.1.6, let D R be a subset of R, and let ∞(D) denote the ⊂ C set of all smooth (a.k.a. continuously diﬀerentiable) functions with domain D and codomain R. Then, under the same operations of pointwise addition and scalar multiplication, one can

show that ∞(D) is a subspace of (D). C C 42 CHAPTER 4. VECTOR SPACES

0 z

−2 4 2 0 −4 −2 4 −4 2 x 0 −2 y −4

Figure 4.1: The intersection U U ′ of two subspaces is a subspace ∩

Example 4.3.9. The subspaces of R2 consist of 0 , all lines through the origin, and R2 { } itself. The subspaces of R3 are 0 , all lines through the origin, all planes through the origin, { } and R3. In fact, these exhaust all subspaces of R2 and R3, respectively. To prove this, we will need further tools such as the notion of bases and dimensions to be discussed soon. In particular, this shows that lines and planes that do not pass through the origin are not subspaces (which is not so hard to show!).

Note that if U and U ′ are subspaces of V , then their intersection U U ′ is also a ∩ subspace (see Proof-writing Exercise 2 on page 47 and Figure 4.1). However, the union of two subspaces is not necessarily a subspace. Think, for example, of the union of two lines in R2, as in Figure 4.2.

4.4 Sums and direct sums

Throughout this section, V is a vector space over F, and U , U V denote subspaces. 1 2 ⊂ Deﬁnition 4.4.1. Let U , U V be subspaces of V . Deﬁne the (subspace) sum of U 1 2 ⊂ 1 4.4. SUMS AND DIRECT SUMS 43

u + v / U U ′ ∈ ∪ u

U ′ U

Figure 4.2: The union U U ′ of two subspaces is not necessarily a subspace ∪ and U2 to be the set U + U = u + u u U ,u U . 1 2 { 1 2 | 1 ∈ 1 2 ∈ 2}

Check as an exercise that U1 + U2 is a subspace of V . In fact, U1 + U2 is the smallest subspace of V that contains both U1 and U2. Example 4.4.2. Let

U = (x, 0, 0) F3 x F , 1 { ∈ | ∈ } U = (0, y, 0) F3 y F . 2 { ∈ | ∈ } Then U + U = (x, y, 0) F3 x, y F . (4.2) 1 2 { ∈ | ∈ } If, alternatively, U = (y, y, 0) F3 y F , then Equation (4.2) still holds. 2 { ∈ | ∈ } If U = U +U , then, for any u U, there exist u U and u U such that u = u +u . 1 2 ∈ 1 ∈ 1 2 ∈ 2 1 2 If it so happens that u can be uniquely written as u1 + u2, then U is called the direct sum of U1 and U2. Deﬁnition 4.4.3. Suppose every u U can be uniquely written as u = u + u for u U ∈ 1 2 1 ∈ 1 and u U . Then we use 2 ∈ 2 U = U U 1 ⊕ 2 to denote the direct sum of U1 and U2. 44 CHAPTER 4. VECTOR SPACES

Example 4.4.4. Let

U = (x, y, 0) R3 x, y R , 1 { ∈ | ∈ } U = (0, 0, z) R3 z R . 2 { ∈ | ∈ } Then R3 = U U . However, if instead 1 ⊕ 2

U = (0,w,z) w, z R , 2 { | ∈ }

3 then R = U1 + U2 but is not the direct sum of U1 and U2.

Example 4.4.5. Let

U = p F[z] p(z)= a + a z2 + + a z2m , 1 { ∈ | 0 2 2m } U = p F[z] p(z)= a z + a z3 + + a z2m+1 . 2 { ∈ | 1 3 2m+1 } Then F[z]= U U . 1 ⊕ 2 Proposition 4.4.6. Let U , U V be subspaces. Then V = U U if and only if the 1 2 ⊂ 1 ⊕ 2 following two conditions hold:

1. V = U1 + U2;

2. If 0= u + u with u U and u U , then u = u =0. 1 2 1 ∈ 1 2 ∈ 2 1 2 Proof. (“= ”) Suppose V = U U . Then Condition 1 holds by deﬁnition. Certainly 0 = 0 + 0, ⇒ 1 ⊕ 2 and, since by uniqueness this is the only way to write 0 V , we have u = u = 0. ∈ 1 2 (“ =”) Suppose Conditions 1 and 2 hold. By Condition 1, we have that, for all v V , ⇐ ∈ there exist u U and u U such that v = u + u . Suppose v = w + w with w U 1 ∈ 1 2 ∈ 2 1 2 1 2 1 ∈ 1 and w U . Subtracting the two equations, we obtain 2 ∈ 2

0=(u w )+(u w ), 1 − 1 2 − 2

where u w U and u w U . By Condition 2, this implies u w = 0 and 1 − 1 ∈ 1 2 − 2 ∈ 2 1 − 1 u w = 0, or equivalently u = w and u = w , as desired. 2 − 2 1 1 2 2 4.4. SUMS AND DIRECT SUMS 45

Proposition 4.4.7. Let U , U V be subspaces. Then V = U U if and only if the 1 2 ⊂ 1 ⊕ 2 following two conditions hold:

1. V = U1 + U2;

2. U U = 0 . 1 ∩ 2 { } Proof. (“= ”) Suppose V = U U . Then Condition 1 holds by deﬁnition. If u U U , then ⇒ 1 ⊕ 2 ∈ 1 ∩ 2 0 = u +( u) with u U and u U (why?). By Proposition 4.4.6, we have u = 0 and − ∈ 1 − ∈ 2 u = 0 so that U U = 0 . − 1 ∩ 2 { } (“ =”) Suppose Conditions 1 and 2 hold. To prove that V = U U holds, suppose that ⇐ 1 ⊕ 2

0= u + u , where u U and u U . (4.3) 1 2 1 ∈ 1 2 ∈ 2

By Proposition 4.4.6, it suﬃces to show that u1 = u2 = 0. Equation (4.3) implies that u = u U . Hence u U U , which in turn implies that u = 0. It then follows that 1 − 2 ∈ 2 1 ∈ 1 ∩ 2 1 u2 = 0 as well.

Everything in this section can be generalized to m subspaces U1, U2,...Um, with the notable exception of Proposition 4.4.7. To see, this consider the following example.

Example 4.4.8. Let

U = (x, y, 0) F3 x, y F , 1 { ∈ | ∈ } U = (0, 0, z) F3 z F , 2 { ∈ | ∈ } U = (0,y,y) F3 y F . 3 { ∈ | ∈ } Then certainly F3 = U + U + U , but F3 = U U U since, for example, 1 2 3 1 ⊕ 2 ⊕ 3

(0, 0, 0) = (0, 1, 0)+(0, 0, 1)+(0, 1, 1). − −

But U U = U U = U U = 0 so that the analog of Proposition 4.4.7 does not 1 ∩ 2 1 ∩ 3 2 ∩ 3 { } hold. 46 CHAPTER 4. VECTOR SPACES Exercises for Chapter 4

Calculational Exercises

1. For each of the following sets, either show that the set is a vector space or explain why it is not a vector space.

(a) The set R of real numbers under the usual operations of addition and multiplication. (b) The set (x, 0) x R under the usual operations of addition and multiplication { | ∈ } on R2. (c) The set (x, 1) x R under the usual operations of addition and multiplication { | ∈ } on R2. (d) The set (x, 0) x R, x 0 under the usual operations of addition and { | ∈ ≥ } multiplication on R2. (e) The set (x, 1) x R, x 0 under the usual operations of addition and { | ∈ ≥ } multiplication on R2. a a + b (f) The set a, b R under the usual operations of addition and a + b a | ∈ 2 2 multiplication on R × . a a + b +1 (g) The set a, b R under the usual operations of addition a + b a | ∈ 2 2 and multiplication on R × .

2. Show that the space V = (x , x , x ) F3 x +2x +2x =0 forms a vector space. { 1 2 3 ∈ | 1 2 3 } 3. For each of the following sets, either show that the set is a subspace of (R) or explain C why it is not a subspace.

(a) The set f (R) f(x) 0, x R . { ∈ C | ≤ ∀ ∈ } (b) The set f (R) f(0) = 0 . { ∈ C | } (c) The set f (R) f(0) = 2 . { ∈ C | } (d) The set of all constant functions. 4.4. SUMS AND DIRECT SUMS 47

(e) The set α + β sin(x) α, β R . { | ∈ } 4. Give an example of a nonempty subset U R2 such that U is closed under scalar ⊂ multiplication but is not a subspace of R2.

5. Let F[z] denote the vector space of all polynomials having coeﬃcient over F, and deﬁne U to be the subspace of F[z] given by

U = az2 + bz5 a, b F . { | ∈ }

Find a subspace W of F[z] such that F[z]= U W . ⊕

Proof-Writing Exercises

1. Let V be a vector space over F. Then, given a F and v V such that av = 0, prove ∈ ∈ that either a =0 or v = 0.

2. Let V be a vector space over F, and suppose that W1 and W2 are subspaces of V . Prove that their intersection W W is also a subspace of V . 1 ∩ 2 3. Prove or give a counterexample to the following claim:

Claim. Let V be a vector space over F, and suppose that W1, W2, and W3 are subspaces

of V such that W1 + W3 = W2 + W3. Then W1 = W2.

4. Prove or give a counterexample to the following claim:

Claim. Let V be a vector space over F, and suppose that W1, W2, and W3 are subspaces of V such that W W = W W . Then W = W . 1 ⊕ 3 2 ⊕ 3 1 2 Chapter 5

Span and Bases

Intuition probably tells you that the plane R2 is of dimension two and that the space we live in R3 is of dimension three. You have probably also learned in physics that space-time has dimension four and that string theories are models that can live in ten dimensions. In this chapter we will give a mathematical deﬁnition of the dimension of a vector space. For this we will ﬁrst need the notions of linear span, linear independence, and the basis of a vector space.

5.1 Linear span

As before, let V denote a vector space over F. Given vectors v , v ,...,v V , a vector 1 2 m ∈ v V is a linear combination of (v ,...,v ) if there exist scalars a ,...,a F such ∈ 1 m 1 m ∈ that v = a v + a v + + a v . 1 1 2 2 m m

Deﬁnition 5.1.1. The linear span (or simply span) of (v1,...,vm) is deﬁned as

span(v ,...,v ) := a v + + a v a ,...,a F . 1 m { 1 1 m m | 1 m ∈ }

Lemma 5.1.2. Let V be a vector space and v , v ,...,v V . Then 1 2 m ∈ 1. v span(v , v ,...,v ). j ∈ 1 2 m

2. span(v1, v2,...,vm) is a subspace of V .

48 5.1. LINEAR SPAN 49

3. If U V is a subspace such that v , v ,...v U, then span(v , v ,...,v ) U. ⊂ 1 2 m ∈ 1 2 m ⊂ Proof. Property 1 is obvious. For Property 2, note that 0 span(v , v ,...,v ) and that ∈ 1 2 m span(v1, v2,...,vm) is closed under addition and scalar multiplication. For Property 3, note that a subspace U of a vector space V is closed under addition and scalar multiplication. Hence, if v ,...,v U, then any linear combination a v + + a v must also be an 1 m ∈ 1 1 m m element of U.

Lemma 5.1.2 implies that span(v1, v2,...,vm) is the smallest subspace of V containing each of v1, v2,...,vm.

Definition 5.1.3. If span(v1,...,vm) = V , then we say that (v1,...,vm) spans V and we call V finite-dimensional. A vector space that is not finite-dimensional is called infinite- dimensional.

Example 5.1.4. The vectors e1 = (1, 0,..., 0), e2 = (0, 1, 0,..., 0),...,en = (0,..., 0, 1) span Fn. Hence Fn is ﬁnite-dimensional. Example 5.1.5. The vectors v = (1, 1, 0) and v = (1, 1, 0) span a subspace of R3. More 1 2 − 3 precisely, if we write the vectors in R as 3-tuples of the form (x, y, z), then span(v1, v2) is the xy-plane in R3.

m m 1 Example 5.1.6. Recall that if p(z)= amz +am 1z − + +a1z+a0 F[z] is a polynomial − ∈ with coeﬃcients in F such that a = 0, then we say that p(z) has degree m. By convention, m the degree of the zero polynomial p(z) = 0 is . We denote the degree of p(z) by deg(p(z)). −∞ Deﬁne

Fm[z] = set of all polynomials in F[z] of degree at most m.

Then F [z] F[z] is a subspace since F [z] contains the zero polynomial and is closed m ⊂ m under addition and scalar multiplication. In fact, Fm[z] is a ﬁnite-dimensional subspace of F[z] since 2 m Fm[z] = span(1,z,z ,...,z ).

At the same time, though, note that F[z] itself is inﬁnite-dimensional. To see this, assume the contrary, namely that

F[z] = span(p1(z),...,pk(z))

for a ﬁnite set of k polynomials p1(z),...,pk(z). Let m = max(deg p1(z),..., deg pk(z)). Then zm+1 F[z], but zm+1 / span(p (z),...,p (z)). ∈ ∈ 1 k 50 CHAPTER 5. SPAN AND BASES 5.2 Linear independence

We are now going to deﬁne the notion of linear independence of a list of vectors. This concept will be extremely important in the sections that follow, and especially when we introduce bases and the dimension of a vector space.

Deﬁnition 5.2.1. A list of vectors (v1,...,vm) is called linearly independent if the only solution for a ,...,a F to the equation 1 m ∈

a v + + a v =0 1 1 m m is a = = a = 0. In other words, the zero vector can only trivially be written as a linear 1 m combination of (v1,...,vm).

Deﬁnition 5.2.2. A list of vectors (v1,...,vm) is called linearly dependent if it is not linearly independent. That is, (v ,...,v ) is linear dependent if there exist a ,...,a F, 1 m 1 m ∈ not all zero, such that a v + + a v =0. 1 1 m m

Example 5.2.3. The vectors (e1,...,em) of Example 5.1.4 are linearly independent. To see this, note that the only solution to the vector equation

0= a e + + a e =(a ,...,a ) 1 1 m m 1 m is a = = a = 0. Alternatively, we can reinterpret this vector equation as the homoge- 1 m neous linear system

a1 = 0 a2 = 0  . . .  , .. . .   am = 0   which clearly has only the trivial solution. (See Section 12.3.2 for the appropriate deﬁnitions.)

Example 5.2.4. The vectors v = (1, 1, 1), v = (0, 1, 1), and v = (1, 2, 0) are linearly 1 2 − 3 5.2. LINEAR INDEPENDENCE 51

dependent. To see this, we need to consider the vector equation

a v + a v + a v = a (1, 1, 1)+ a (0, 1, 1) + a (1, 2, 0) 1 1 2 2 3 3 1 2 − 3 =(a + a , a + a +2a , a a )=(0, 0, 0). 1 3 1 2 3 1 − 2

Solving for a , a , and a , we see, for example, that (a , a , a ) = (1, 1, 1) is a nonzero 1 2 3 1 2 3 − solution. Alternatively, we can reinterpret this vector equation as the homogeneous linear system

a1 + a3 = 0 a1 + a2 + 2a3 = 0  . a a = 0  1 − 2 Using the techniques of Section 12.3, we see that solving this linear system is equivalent to solving the following linear system:

a + a = 0 1 3 . a2 + a3 = 0

Note that this new linear system clearly has inﬁnitely many solutions. In particular, the set of all solutions is given by

N = (a , a , a ) Fn a = a = a = span((1, 1, 1)). { 1 2 3 ∈ | 1 2 − 3} −

m Example 5.2.5. The vectors (1,z,...,z ) in the vector space Fm[z] are linearly independent. Requiring that a 1+ a z + + a zm =0 0 1 m means that the polynomial on the left should be zero for all z F. This is only possible for ∈ a = a = = a = 0. 0 1 m An important consequence of the notion of linear independence is the fact that any vector in the span of a given list of linearly independent vectors can be uniquely written as a linear combination.

Lemma 5.2.6. The list of vectors (v1,...,vm) is linearly independent if and only if every v span(v ,...,v ) can be uniquely written as a linear combination of (v ,...,v ). ∈ 1 m 1 m 52 CHAPTER 5. SPAN AND BASES

Proof.

(“= ”) Assume that (v ,...,v ) is a linearly independent list of vectors. Suppose there are ⇒ 1 m two ways of writing v span(v ,...,v ) as a linear combination of the v : ∈ 1 m i v = a v + a v , 1 1 m m v = a′ v + a′ v . 1 1 m m

Subtracting the two equations yields 0 = (a a′ )v + +(a a′ )v . Since (v ,...,v ) 1 − 1 1 m − m m 1 m is linearly independent, the only solution to this equation is a a′ = 0,...,a a′ = 0, 1 − 1 m − m or equivalently a1 = a1′ ,...,am = am′ . (“ =”) Now assume that, for every v span(v ,...,v ), there are unique a ,...,a F ⇐ ∈ 1 m 1 m ∈ such that v = a v + + a v . 1 1 m m This implies, in particular, that the only way the zero vector v = 0 can be written as a linear combination of v ,...,v is with a = = a = 0. This shows that (v ,...,v ) are 1 m 1 m 1 m linearly independent.

It is clear that if (v1,...,vm) is a list of linearly independent vectors, then the list

(v1,...,vm 1) is also linearly independent. − For the next lemma, we introduce the following notation: If we want to drop a vector vj

from a given list (v1,...,vm) of vectors, then we indicate the dropped vector by a hat. I.e., we write

(v1,..., vˆj,...,vm)=(v1,...,vj 1, vj+1,...,vm). −

Lemma 5.2.7 (Linear Dependence Lemma). If (v1,...,vm) is linearly dependent and v =0, then there exists an index j 2,...,m such that the following two conditions hold. 1 ∈{ }

1. vj span(v1,...,vj 1). ∈ −

2. If vj is removed from (v1,...,vm), then span(v1,..., vˆj,...,vm) = span(v1,...,vm).

Proof. Since (v ,...,v ) is linearly dependent there exist a ,...,a F not all zero such 1 m 1 m ∈ that a v + + a v = 0. Since by assumption v = 0, not all of a ,...,a can be zero. 1 1 m m 1 2 m 5.2. LINEAR INDEPENDENCE 53

Let j 2,...,m be largest such that a = 0. Then we have ∈{ } j

a1 aj 1 vj = v1 − vj 1, (5.1) −aj −− aj − which implies Part 1. Let v span(v ,...,v ). This means, by deﬁnition, that there exist scalars b ,...,b F ∈ 1 m 1 m ∈ such that v = b v + + b v . 1 1 m m

The vector vj that we determined in Part 1 can be replaced by Equation (5.1) so that v

is written as a linear combination of (v1,..., vˆj,...,vm). Hence, span(v1,..., vˆj,...,vm) =

span(v1,...,vm).

2 Example 5.2.8. The list (v1, v2, v3) = ((1, 1), (1, 2), (1, 0)) of vectors spans R . To see this, take any vector v = (x, y) R2. We want to show that v can be written as a linear ∈ combination of (1, 1), (1, 2), (1, 0), i.e., that there exist scalars a , a , a F such that 1 2 3 ∈

v = a1(1, 1) + a2(1, 2)+ a3(1, 0), or equivalently that

(x, y)=(a1 + a2 + a3, a1 +2a2).

Clearly a = y, a = 0, and a = x y form a solution for any choice of x, y R, and so 1 2 3 − ∈ R2 = span((1, 1), (1, 2), (1, 0)). However, note that

2(1, 1) (1, 2) (1, 0) = (0, 0), (5.2) − − which shows that the list ((1, 1), (1, 2), (1, 0)) is linearly dependent. The Linear Dependence Lemma 5.2.7 thus states that one of the vectors can be dropped from ((1, 1), (1, 2), (1, 0)) and that the resulting list of vectors will still span R2. Indeed, by Equation (5.2),

v = (1, 0) = 2(1, 1) (1, 2)=2v v , 3 − 1 − 2 and so span((1, 1), (1, 2), (1, 0)) = span((1, 1), (1, 2)).

The next result shows that linearly independent lists of vectors that span a ﬁnite- 54 CHAPTER 5. SPAN AND BASES

dimensional vector space are the smallest possible spanning sets.

Theorem 5.2.9. Let V be a ﬁnite-dimensional vector space. Suppose that (v1,...,vm) is a linearly independent list of vectors that spans V , and let (w1,...,wn) be any list that spans V . Then m n. ≤ Proof. The proof uses the following iterative procedure: start with an arbitrary list of vectors = (w ,...,w ) such that V = span( ). At the kth step of the procedure, we construct S0 1 n S0 a new list by replacing some vector w by the vector v such that still spans V . Sk jk k Sk Repeating this for all v then produces a new list of length n that contains each of k Sm v ,...,v , which then proves that m n. Let us now discuss each step in this procedure in 1 m ≤ detail.

Step 1. Since (w1,...,wn) spans V , adding a new vector to the list makes the new list

linearly dependent. Hence (v1,w1,...,wn) is linearly dependent. By Lemma 5.2.7, there

exists an index j1 such that

wj1 span(v1,w1,...,wj1 1). ∈ −

Hence = (v ,w ,..., wˆ ,...,w ) spans V . In this step, we added the vector v and S1 1 1 j1 n 1 removed the vector w from . j1 S0 Step k. Suppose that we already added v1,...,vk 1 to our spanning list and removed the − vectors wj1 ,...,wjk−1 . It is impossible that we have reached the situation where all of the vectors w ,...,w have been removed from the spanning list at this step if k m because 1 n ≤ then we would have V = span(v1,...,vk 1) which would allow vk to be expressed as a linear − combination of v1,...,vk 1 (in contradiction with the assumption of linear independence of − v1,...,vn).

Now, call the list reached at this step k 1, and note that V = span( k 1). Add the S − S − vector vk to k 1. By the same arguments as before, adjoining the extra vector vk to the S − spanning list k 1 yields a list of linearly dependent vectors. Hence, by Lemma 5.2.7, there S − exists an index jk such that k 1 with vk added and wjk removed still spans V . The fact S − that (v1,...,vk) is linearly independent ensures that the vector removed is indeed among the w . Call the new list , and note that V = span( ). j Sk Sk The ﬁnal list is but with each v ,...,v added and each w ,...,w m removed. Sm S0 1 m j1 j Moreover, note that has length n and still spans V . It follows that m n. Sm ≤ 5.3. BASES 55 5.3 Bases

A basis of a ﬁnite-dimensional vector space is a spanning list that is also linearly independent. We will see that all bases for ﬁnite-dimensional vector spaces have the same length. This length will then be called the dimension of our vector space.

Deﬁnition 5.3.1. A list of vectors (v1,...,vm) is a basis for the ﬁnite-dimensional vector space V if (v1,...,vm) is linearly independent and V = span(v1,...,vm).

If (v ,...,v ) forms a basis of V , then, by Lemma 5.2.6, every vector v V can be 1 m ∈ uniquely written as a linear combination of (v1,...,vm).

n Example 5.3.2. (e1,...,en) is a basis of F . There are, of course, other bases. For example, ((1, 2), (1, 1)) is a basis of F2. Note that the list ((1, 1)) is also linearly independent, but it does not span F2 and hence is not a basis.

2 m Example 5.3.3. (1,z,z ,...,z ) is a basis of Fm[z].

Theorem 5.3.4 (Basis Reduction Theorem). If V = span(v1,...,vm), then either

(v1,...,vm) is a basis of V or some vi can be removed to obtain a basis of V .

Proof. Suppose V = span(v ,...,v ). We start with the list =(v ,...,v ) and iteratively 1 m S 1 m run through all vectors vk for k =1, 2,...,m to determine whether to keep or remove them from : S Step 1. If v = 0, then remove v from . Otherwise, leave unchanged. 1 1 S S

Step k. If vk span(v1,...,vk 1), then remove vk from . Otherwise, leave unchanged. ∈ − S S The ﬁnal list still spans V since, at each step, a vector was only discarded if it was already S in the span of the previous vectors. The process also ensures that no vector is in the span of the previous vectors. Hence, by the Linear Dependence Lemma 5.2.7, the ﬁnal list is S linearly independent. It follows that is a basis of V . S Example 5.3.5. To see how Basis Reduction Theorem 5.3.4 works, consider the list of vectors = ((1, 1, 0), (2, 2, 0), ( 1, 0, 1), (0, 1, 1), (0, 1, 0)). S − − − − 56 CHAPTER 5. SPAN AND BASES

This list does not form a basis for R3 as it is not linearly independent. However, it is clear that R3 = span( ) since any arbitrary vector v = (x, y, z) R3 can be written as the S ∈ following linear combination over : S

v =(x + z)(1, 1, 0) + 0(2, 2, 0)+(z)( 1, 0, 1) + 0(0, 1, 1)+(x + y + z)(0, 1, 0). − − − −

In fact, since the coeﬃcients of (2, 2, 0) and (0, 1, 1) in this linear combination are both − − zero, it suggests that they add nothing to the span of the subset

= ((1, 1, 0), ( 1, 0, 1), (0, 1, 0)) B − −

of . Moreover, one can show that is a basis for R3, and it is exactly the basis produced S B by applying the process from the proof of Theorem 5.3.4 (as you should be able to verify).

Corollary 5.3.6. Every ﬁnite-dimensional vector space has a basis.

Proof. By deﬁnition, a ﬁnite-dimensional vector space has a spanning list. By the Basis Reduction Theorem 5.3.4, any spanning list can be reduced to a basis.

Theorem 5.3.7 (Basis Extension Theorem). Every linearly independent list of vectors in a ﬁnite-dimensional vector space V can be extended to a basis of V .

Proof. Suppose V is ﬁnite-dimensional and that (v1,...,vm) is linearly independent. Since

V is ﬁnite-dimensional, there exists a list (w1,...,wn) of vectors that spans V . We wish to adjoin some of the wk to (v1,...,vm) in order to create a basis of V . Step 1. If w span(v ,...,v ), then let =(v ,...,v ). Otherwise, =(v ,...,v ,w ). 1 ∈ 1 m S 1 m S 1 m 1 Step k. If w span( ), then leave unchanged. Otherwise, adjoin w to . k ∈ S S k S After each step, the list is still linearly independent since we only adjoined w if w was S k k not in the span of the previous vectors. After n steps, w span( ) for all k = 1, 2,...,n. k ∈ S Since (w ,...,w ) was a spanning list, spans V so that is indeed a basis of V . 1 n S S

4 Example 5.3.8. Take the two vectors v1 = (1, 1, 0, 0) and v2 = (1, 0, 1, 0) in R . One may easily check that these two vectors are linearly independent, but they do not form a basis 4 4 of R . We know that (e1, e2, e3, e4) spans R . (In fact, it is even a basis.) Following the 5.4. DIMENSION 57

algorithm outlined in the proof of the Basis Extension Theorem, we see that e span(v , v ). 1 ∈ 1 2 Hence, we adjoin e to obtain =(v , v , e ). Note that now 1 S 1 2 1

e = (0, 1, 0, 0)=1v +0v +( 1)e 2 1 2 − 1

so that e span(v , v , e ), and so we leave unchanged. Similarly, 2 ∈ 1 2 1 S

e = (0, 0, 1, 0)=0v +1v +( 1)e , 3 1 2 − 1 and hence e span(v , v , e ), which means that we again leave unchanged. Finally, 3 ∈ 1 2 1 S e span(v , v , e ), and so we adjoin it to obtain a basis (v , v , e , e ) of R4. 4 ∈ 1 2 1 1 2 1 4

5.4 Dimension

We now come to the important definition of the dimension of a finite-dimensional vector space. Intuitively, we know that R2 has dimension 2, that R3 has dimension 3, and, more generally, that Rn has dimension n. This is precisely the length of every basis for each of these vector spaces, which prompts the following definition.

Deﬁnition 5.4.1. We call the length of any basis for V (which is well-deﬁned by Theo- rem 5.4.2 below) the dimension of V , and we denote this by dim(V ).

Note that Deﬁnition 5.4.1 only makes sense if, in fact, every basis for a given ﬁnite- dimensional vector space has the same length. This is true by the following theorem.

Theorem 5.4.2. Let V be a ﬁnite-dimensional vector space. Then any two bases of V have the same length.

Proof. Let (v1,...,vm) and (w1,...,wn) be two bases of V . Both span V . By Theorem 5.2.9, we have m n since (v ,...,v ) is linearly independent. By the same theorem, we also ≤ 1 m have n m since (w ,...,w ) is linearly independent. Hence n = m, as asserted. ≤ 1 n n n Example 5.4.3. dim(F ) = n and dim(Fm[z]) = m + 1. Note that dim(C ) = n as a complex vector space, whereas dim(Cn)=2n as an real vector space. This comes from the fact that we can view C itself as an real vector space of dimension 2 with basis (1, i). 58 CHAPTER 5. SPAN AND BASES

Theorem 5.4.4. Let V be a ﬁnite-dimensional vector space with dim(V )= n. Then:

1. If U V is a subspace of V , then dim(U) dim(V ). ⊂ ≤

2. If V = span(v1,...,vn), then (v1,...,vn) is a basis of V .

3. If (v1,...,vn) is linearly independent in V , then (v1,...,vn) is a basis of V .

Point 1 implies, in particular, that every subspace of a ﬁnite-dimensional vector space is ﬁnite-dimensional. Points 2 and 3 show that if the dimension of a vector space is known to be n, then, to check that a list of n vectors is a basis, it is enough to check whether it spans V (resp. is linearly independent).

Proof. To prove Point 1, first note that U is necessarily finite-dimensional (otherwise we could find a list of linearly independent vectors longer than dim(V )). Therefore, by Corollary 5.3.6,

U has a basis (u1,...,um) (say). This list is linearly independent in both U and V . By the

Basis Extension Theorem 5.3.7, we can extend (u1,...,um) to a basis for V , which is of length n since dim(V )= n. This implies that m n, as desired. ≤ To prove Point 2, suppose that (v1,...,vn) spans V . Then, by the Basis Reduction Theorem 5.3.4, this list can be reduced to a basis. However, every basis of V has length n;

hence, no vector needs to be removed from (v1,...,vn). It follows that (v1,...,vn) is already a basis of V .

Point 3 is proven in a very similar fashion. Suppose (v1,...,vn) is linearly independent. By the Basis Extension Theorem 5.3.7, this list can be extended to a basis. However, every

basis has length n; hence, no vector needs to be added to (v1,...,vn). It follows that

(v1,...,vn) is already a basis of V .

We conclude this chapter with some additional interesting results on bases and dimensions. The ﬁrst one combines the concepts of basis and direct sum.

Theorem 5.4.5. Let U V be a subspace of a ﬁnite-dimensional vector space V . Then ⊂ there exists a subspace W V such that V = U W . ⊂ ⊕ Proof. Let (u ,...,u ) be a basis of U. By Theorem 5.4.4(1), we know that m dim(V ). 1 m ≤ Hence, by the Basis Extension Theorem 5.3.7, (u1,...,um) can be extended to a basis

(u1,...,um,w1,...,wn) of V . Let W = span(w1,...,wn). 5.4. DIMENSION 59

To show that V = U W , we need to show that V = U + W and U W = 0 . Since ⊕ ∩ { } V = span(u1,...,um,w1,...,wn) where (u1,...,um) spans U and (w1,...,wn) spans W , it is clear that V = U + W . To show that U W = 0 , let v U W . Then there exist scalars a ,...,a , b ,...,b ∩ { } ∈ ∩ 1 m 1 n ∈ F such that v = a u + + a u = b w + + b w , 1 1 m m 1 1 n n or equivalently that

a u + + a u b w b w =0. 1 1 m m − 1 1 −− n n

Since (u1,...,um,w1,...,wn) forms a basis of V and hence is linearly independent, the only solution to this equation is a = = a = b = = b = 0. Hence v = 0, proving that 1 m 1 n indeed U W = 0 . ∩ { } Theorem 5.4.6. If U, W V are subspaces of a ﬁnite-dimensional vector space, then ⊂

dim(U + W ) = dim(U) + dim(W ) dim(U W ). − ∩

Proof. Let (v ,...,v ) be a basis of U W . By the Basis Extension Theorem 5.3.7, there 1 n ∩ exist (u1,...,uk) and (w1,...,wℓ) such that (v1,...,vn,u1,...,uk) is a basis of U and

(v1,...,vn,w1,...,wℓ) is a basis of W . It suﬃces to show that

=(v ,...,v ,u ,...,u ,w ,...,w ) B 1 n 1 k 1 ℓ is a basis of U + W since then

dim(U + W )= n + k + ℓ =(n + k)+(n + ℓ) n = dim(U) + dim(W ) dim(U W ). − − ∩

Clearly span(v1,...,vn,u1,...,uk,w1,...,wℓ) contains U and W , and hence U + W . To show that is a basis, it remains to show that is linearly independent. Suppose B B

a v + + a v + b u + + b u + c w + + c w =0, (5.3) 1 1 n n 1 1 k k 1 1 ℓ ℓ

and let u = a v + + a v + b u + + b u U. Then, by Equation (5.3), we also 1 1 n n 1 1 k k ∈ have that u = c w c w W , which implies that u U W . Hence, there − 1 1 −− ℓ ℓ ∈ ∈ ∩ 60 CHAPTER 5. SPAN AND BASES

exist scalars a′ ,...,a′ F such that u = a′ v + + a′ v . Since there is a unique linear 1 n ∈ 1 1 n n combination of the linearly independent vectors (v1,...,vn,u1,...,uk) that describes u, we must have b = = b = 0 and a = a′ ,...,a = a′ . Since (v ,...,v ,w ,...,w ) is also 1 k 1 1 n n 1 n 1 ℓ linearly independent, it further follows that a = = a = c = = c = 0. Hence, 1 n 1 ℓ Equation (5.3) only has the trivial solution, which implies that is a basis. B 5.4. DIMENSION 61 Exercises for Chapter 5

Calculational Exercises

1. Show that the vectors v = (1, 1, 1), v = (1, 2, 3), and v = (2, 1, 1) are linearly 1 2 3 − independent in R3. Write v = (1, 2, 5) as a linear combination of v , v , and v . − 1 2 3 3 2. Consider the complex vector space V = C and the list (v1, v2, v3) of vectors in V , where v =(i, 0, 0), v =(i, 1, 0), v =(i, i, 1) . 1 2 3 −

(a) Prove that span(v1, v2, v3)= V .

(b) Prove or disprove: (v1, v2, v3) is a basis for V .

3. Determine the dimension of each of the following subspaces of F4.

(a) (x , x , x , x ) F4 x =0 . { 1 2 3 4 ∈ | 4 } (b) (x , x , x , x ) F4 x = x + x . { 1 2 3 4 ∈ | 4 1 2} (c) (x , x , x , x ) F4 x = x + x , x = x x . { 1 2 3 4 ∈ | 4 1 2 3 1 − 2} (d) (x , x , x , x ) F4 x = x + x , x = x x , x + x =2x . { 1 2 3 4 ∈ | 4 1 2 3 1 − 2 3 4 1} (e) (x , x , x , x ) F4 x = x = x = x . { 1 2 3 4 ∈ | 1 2 3 4} 4. Determine the value of λ R for which each list of vectors is linear dependent. ∈ (a) ((λ, 1, 1), ( 1, λ, 1), ( 1, 1,λ)) as a subset of R3. − − − − − − (b) sin2(x), cos(2x),λ as a subset of (R). C 5. Consider the real vector space V = R4. For each of the following ﬁve statements, provide either a proof or a counterexample.

(a) dim V = 4. (b) span((1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1)) = V . (c) The list ((1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1), ( 1, 0, 0, 1)) is linearly independent. − − − − (d) Every list of four vectors v ,...,v V , such that span(v ,...,v )= V , is linearly 1 4 ∈ 1 4 independent. 62 CHAPTER 5. SPAN AND BASES

(e) Let v1 and v2 be two linearly independent vectors in V . Then, there exist vectors u,w V , such that (v , v ,u,w) is a basis for V . ∈ 1 2

Proof-Writing Exercises

1. Let V be a vector space over F and U := span(u ,u ,...,u ) where each u V . Now 1 2 n i ∈ suppose v U. Prove ∈ U = span(v,u1,u2,...,un) .

2. Let V be a vector space over F, and suppose that the list (v1, v2,...,vn) of vectors spans V , where each v V . Prove that the list i ∈

(v1 v2, v2 v3, v3 v4,...,vn 2 vn 1, vn 1 vn, vn) − − − − − − − −

also spans V .

3. Let V be a vector space over F, and suppose that (v1, v2,...,vn) is a linearly independent list of vectors in V . Given any w V such that ∈

(v1 + w, v2 + w,...,vn + w)

is a linearly dependent list of vectors in V , prove that w span(v , v ,...,v ). ∈ 1 2 n 4. Let V be a ﬁnite-dimensional vector space over F with dim(V )= n for some n Z . ∈ + Prove that there are n one-dimensional subspaces U1, U2,...,Un of V such that

V = U U U . 1 ⊕ 2 ⊕⊕ n

5. Let V be a ﬁnite-dimensional vector space over F, and suppose that U is a subspace of V for which dim(U) = dim(V ). Prove that U = V .

6. Let Fm[z] denote the vector space of all polynomials with degree less than or equal to m Z and having coeﬃcient over F, and suppose that p ,p ,...,p F [z] satisfy ∈ + 0 1 m ∈ m pj(2) = 0. Prove that (p0,p1,...,pm) is a linearly dependent list of vectors in Fm[z].

7. Let U and V be ﬁve-dimensional subspaces of R9. Prove that U V = 0 . ∩ { } 5.4. DIMENSION 63

8. Let V be a ﬁnite-dimensional vector space over F, and suppose that U1, U2,...,Um are any m subspaces of V . Prove that

dim(U + U + + U ) dim(U ) + dim(U )+ + dim(U ). 1 2 m ≤ 1 2 m Chapter 6

Linear Maps

As discussed in Chapter 1, one of the main goals of Linear Algebra is the characterization

of solutions to a system of m linear equations in n unknowns x1,...,xn,

a11x1 + + a1nxn = b1 ......  ,  a x + + a x = b  m1 1 mn n m  where each of the coeﬃcients aij and bi is in F. Linear maps and their properties give us insight into the characteristics of solutions to linear systems.

6.1 Deﬁnition and elementary properties

Throughout this chapter, V and W denote vector spaces over F. We are going to study functions from V into W that have the special properties given in the following deﬁnition. Deﬁnition 6.1.1. A function T : V W is called linear if →

T (u + v)= T (u)+ T (v), for all u, v V , (6.1) ∈ T (av)= aT (v), for all a F and v V . (6.2) ∈ ∈

The set of all linear maps from V to W is denoted by (V, W ). We also write T v for T (v). L Moreover, if V = W , then we write (V,V ) = (V ) and call T (V ) a linear L L ∈ L operator on V .

64 6.1. DEFINITION AND ELEMENTARY PROPERTIES 65

Example 6.1.2.

1. The zero map 0 : V W mapping every element v V to 0 W is linear. → ∈ ∈ 2. The identity map I : V V deﬁned as Iv = v is linear. →

3. Let T : F[z] F[z] be the diﬀerentiation map deﬁned as Tp(z) = p′(z). Then, for → two polynomials p(z), q(z) F[z], we have ∈

T (p(z)+ q(z))=(p(z)+ q(z))′ = p′(z)+ q′(z)= T (p(z)) + T (q(z)).

Similarly, for a polynomial p(z) F[z] and a scalar a F, we have ∈ ∈

T (ap(z))=(ap(z))′ = ap′(z)= aT (p(z)).

Hence T is linear.

4. Let T : R2 R2 be the map given by T (x, y)= (x 2y, 3x + y). Then, for → − 2 (x, y), (x′,y′) R , we have ∈

T ((x, y)+(x′,y′)) = T (x + x′,y + y′)=(x + x′ 2(y + y′), 3(x + x′)+ y + y′) − =(x 2y, 3x + y)+(x′ 2y′, 3x′ + y′)= T (x, y)+ T (x′,y′). − − Similarly, for (x, y) R2 and a F, we have ∈ ∈

T (a(x, y)) = T (ax, ay)=(ax 2ay, 3ax + ay)= a(x 2y, 3x + y)= aT (x, y). − −

Hence T is linear. More generally, any map T : Fn Fm deﬁned by →

T (x ,...,x )=(a x + + a x ,...,a x + + a x ) 1 n 11 1 1n n m1 1 mn n

with a F is linear. ij ∈ 5. Not all functions are linear! For example, the exponential function f(x) = ex is not linear since e2x = 2ex in general. Also, the function f : F F given by f(x)= x 1 → − is not linear since f(x + y)=(x + y) 1 =(x 1)+(y 1) = f(x)+ f(y). − − − 66 CHAPTER 6. LINEAR MAPS

An important result is that linear maps are already completely determined if their values on basis vectors are speciﬁed.

Theorem 6.1.3. Let (v1,...,vn) be a basis of V and (w1,...,wn) be an arbitrary list of vectors in W . Then there exists a unique linear map

T : V W such that T (v )= w , i =1, 2,...,n. → i i ∀

Proof. First we verify that there is at most one linear map T with T (vi) = wi. Take any v V . Since (v ,...,v ) is a basis of V there are unique scalars a ,...,a F such that ∈ 1 n 1 n ∈ v = a v + + a v . By linearity, we have 1 1 n n

T (v)= T (a v + + a v )= a T (v )+ + a T (v )= a w + + a w , (6.3) 1 1 n n 1 1 n n 1 1 n n

and hence T (v) is completely determined. To show existence, use Equation (6.3) to deﬁne

T . It remains to show that this T is linear and that T (vi) = wi. These two conditions are not hard to show and are left to the reader.

The set of linear maps (V, W ) is itself a vector space. For S, T (V, W ) addition is L ∈ L deﬁned as (S + T )v = Sv + T v, for all v V . ∈ For a F and T (V, W ), scalar multiplication is deﬁned as ∈ ∈L

(aT )(v)= a(T v), for all v V . ∈

You should verify that S + T and aT are indeed linear maps and that all properties of a vector space are satisfied. In addition to the operations of vector addition and scalar multiplication, we can also define the composition of linear maps. Let V, U, W be vector spaces over F. Then, for S (U, V ) and T (V, W ), we define T S (U, W ) by ∈L ∈L ◦ ∈L

(T S)(u)= T (S(u)), for all u U. ◦ ∈

The map T S is often also called the product of T and S denoted by TS. It has the ◦ following properties: 6.2. NULL SPACES 67

1. Associativity: (T T )T = T (T T ), for all T (V ,V ), T (V ,V ) and T 1 2 3 1 2 3 1 ∈ L 1 0 2 ∈ L 2 1 3 ∈ (V ,V ). L 3 2 2. Identity: TI = IT = T , where T (V, W ) and where I in TI is the identity map ∈ L in (V,V ) whereas the I in IT is the identity map in (W, W ). L L

3. Distributive property: (T1 + T2)S = T1S + T2S and T (S1 + S2)= TS1 + TS2, where S,S ,S (U, V ) and T, T , T (V, W ). 1 2 ∈L 1 2 ∈L Note that the product of linear maps is not always commutative. For example, if we take

T (F[z], F[z]) to be the diﬀerentiation map Tp(z)= p′(z) and S (F[z], F[z]) to be the ∈L ∈L map Sp(z)= z2p(z), then

2 2 (ST )p(z)= z p′(z) but (TS)p(z)= z p′(z)+2zp(z).

6.2 Null spaces

Deﬁnition 6.2.1. Let T : V W be a linear map. Then the null space (a.k.a. kernel) → of T is the set of all vectors in V that are mapped to zero by T . I.e.,

null (T )= v V T v =0 . { ∈ | }

Example 6.2.2. Let T (F[z], F[z]) be the diﬀerentiation map Tp(z)= p′(z). Then ∈L

null (T )= p F[z] p(z) is constant . { ∈ | }

Example 6.2.3. Consider the linear map T (x, y)=(x 2y, 3x + y) of Example 6.1.2. To − determine the null space, we need to solve T (x, y)=(0, 0), which is equivalent to the system of linear equations x 2y =0 − . 3x + y =0 We see that the only solution is (x, y)=(0, 0) so that null (T )= (0, 0) . { } Proposition 6.2.4. Let T : V W be a linear map. Then null (T ) is a subspace of V . → 68 CHAPTER 6. LINEAR MAPS

Proof. We need to show that 0 null (T ) and that null (T ) is closed under addition and ∈ scalar multiplication. By linearity, we have

T (0) = T (0+0) = T (0) + T (0)

so that T (0) = 0. Hence 0 null (T ). For closure under addition, let u, v null (T ). Then ∈ ∈

T (u + v)= T (u)+ T (v)=0+0=0,

and hence u + v null (T ). Similarly, for closure under scalar multiplication, let u null (T ) ∈ ∈ and a F. Then ∈ T (au)= aT (u)= a0=0,

and so au null (T ). ∈ Definition 6.2.5. The linear map T : V W is called injective if, for all u, v V , the → ∈ condition Tu = T v implies that u = v. In other words, different vectors in V are mapped to different vectors in W .

Proposition 6.2.6. Let T : V W be a linear map. Then T is injective if and only if → null (T )= 0 . { } Proof. (“= ”) Suppose that T is injective. Since null (T ) is a subspace of V , we know that 0 ⇒ ∈ null (T ). Assume that there is another vector v V that is in the kernel. Then T (v)=0= ∈ T (0). Since T is injective, this implies that v = 0, proving that null (T )= 0 . { } (“ =”) Assume that null (T ) = 0 , and let u, v V be such that Tu = T v. Then ⇐ { } ∈ 0= Tu T v = T (u v) so that u v null (T ). Hence u v = 0, or, equivalently, u = v. − − − ∈ − This shows that T is indeed injective.

Example 6.2.7.

1. The diﬀerentiation map p(z) p′(z) is not injective since p′(z) = q′(z) implies that → p(z)= q(z)+ c, where c F is a constant. ∈ 2. The identity map I : V V is injective. → 6.3. RANGES 69

3. The linear map T : F[z] F[z] given by T (p(z)) = z2p(z) is injective since it is easy → to verify that null (T )= 0 . { } 4. The linear map T (x, y)=(x 2y, 3x + y) is injective since null (T ) = (0, 0) , as we − { } calculated in Example 6.2.3.

6.3 Ranges

Deﬁnition 6.3.1. Let T : V W be a linear map. The range of T , denoted by range (T ), → is the subset of vectors in W that are in the image of T . I.e.,

range(T )= T v v V = w W there exists v V such that T v = w . { | ∈ } { ∈ | ∈ }

Example 6.3.2. The range of the diﬀerentiation map T : F[z] F[z] is range (T ) = F[z] → since, for every polynomial q F[z], there is a p F[z] such that p′ = q. ∈ ∈ Example 6.3.3. The range of the linear map T (x, y)=(x 2y, 3x + y) is R2 since, for any − (z , z ) R2, we have T (x, y)=(z , z ) if (x, y)= 1 (z +2z , 3z + z ). 1 2 ∈ 1 2 7 1 2 − 1 2 Proposition 6.3.4. Let T : V W be a linear map. Then range(T ) is a subspace of W . → Proof. We need to show that 0 range (T ) and that range(T ) is closed under addition and ∈ scalar multiplication. We already showed that T 0=0 so that 0 range (T ). ∈ For closure under addition, let w ,w range (T ). Then there exist v , v V such that 1 2 ∈ 1 2 ∈ T v1 = w1 and T v2 = w2. Hence

T (v1 + v2)= T v1 + T v2 = w1 + w2, and so w + w range(T ). 1 2 ∈ For closure under scalar multiplication, let w range (T ) and a F. Then there exists ∈ ∈ a v V such that T v = w. Thus ∈

T (av)= aT v = aw, and so aw range(T ). ∈ 70 CHAPTER 6. LINEAR MAPS

Deﬁnition 6.3.5. A linear map T : V W is called surjective if range (T )= W . A linear → map T : V W is called bijective if T is both injective and surjective. → Example 6.3.6.

1. The diﬀerentiation map T : F[z] F[z] is surjective since range (T )= F[z]. However, → if we restrict ourselves to polynomials of degree at most m, then the diﬀerentiation map T : F [z] F [z] is not surjective since polynomials of degree m are not in the m → m range of T .

2. The identity map I : V V is surjective. → 3. The linear map T : F[z] F[z] given by T (p(z)) = z2p(z) is not surjective since, for → example, there are no linear polynomials in the range of T .

4. The linear map T (x, y)=(x 2y, 3x + y) is surjective since range (T ) = R2, as we − calculated in Example 6.3.3.

6.4 Homomorphisms

It should be mentioned that linear maps between vector spaces are also called vector space homomorphisms. Instead of the notation (V, W ), one often sees the convention L

Hom (V, W )= T : V W T is linear . F { → | }

A homomorphism T : V W is also often called → Monomorphism iff T is injective; • Epimorphism iff T is surjective; • Isomorphism iff T is bijective; • Endomorphism iff V = W ; • Automorphism iff V = W and T is bijective. • 6.5. THE DIMENSION FORMULA 71 6.5 The dimension formula

The next theorem is the key result of this chapter. It relates the dimension of the kernel and range of a linear map.

Theorem 6.5.1. Let V be a ﬁnite-dimensional vector space and T : V W be a linear → map. Then range (T ) is a ﬁnite-dimensional subspace of W and

dim(V ) = dim(null (T )) + dim(range(T )). (6.4)

Proof. Let V be a ﬁnite-dimensional vector space and T (V, W ). Since null (T ) is a sub- ∈L space of V , we know that null (T ) has a basis (u1,...,um). This implies that dim(null (T )) = m. By the Basis Extension Theorem, it follows that (u1,...,um) can be extended to a basis of V , say (u1,...,um, v1,...,vn), so that dim(V )= m + n.

The theorem will follow by showing that (T v1,...,Tvn) is a basis of range (T ) since this would imply that range (T ) is ﬁnite-dimensional and dim(range (T )) = n, proving Equa- tion (6.4). Since (u ,...,u , v ,...,v ) spans V , every v V can be written as a linear combination 1 m 1 n ∈ of these vectors; i.e.,

v = a u + + a u + b v + + b v , 1 1 m m 1 1 n n where a , b F. Applying T to v, we obtain i j ∈

T v = b T v + + b T v , 1 1 n n where the terms Tu disappeared since u null (T ). This shows that (T v ,...,Tv ) indeed i i ∈ 1 n spans range (T ).

To show that (T v1,...,Tvn) is a basis of range (T ), it remains to show that this list is linearly independent. Assume that c ,...,c F are such that 1 n ∈

c T v + + c T v =0. 1 1 n n 72 CHAPTER 6. LINEAR MAPS

By linearity of T , this implies that

T (c v + + c v )=0, 1 1 n n

and so c v + + c v null (T ). Since (u ,...,u ) is a basis of null (T ), there must exist 1 1 n n ∈ 1 m scalars d ,...,d F such that 1 m ∈

c v + + c v = d u + + d u . 1 1 n n 1 1 m m

However, by the linear independence of (u1,...,um, v1,...,vn), this implies that all coeﬃ- cients c = = c = d = = d = 0. Thus, (T v ,...,Tv ) is linearly independent, and 1 n 1 m 1 n we are done.

Example 6.5.2. Recall that the linear map T : R2 R2 deﬁned by T (x, y)=(x 2y, 3x+y) → − has null (T )= 0 and range(T )= R2. It follows that { }

dim(R2) = 2 = 0 + 2 = dim(null (T )) + dim(range(T )).

Corollary 6.5.3. Let T (V, W ). ∈L 1. If dim(V ) > dim(W ), then T is not injective.

2. If dim(V ) < dim(W ), then T is not surjective.

Proof. By Theorem 6.5.1, we have that

dim(null (T )) = dim(V ) dim(range (T )) − dim(V ) dim(W ) > 0. ≥ − Since T is injective if and only if dim(null (T )) = 0, T cannot be injective. Similarly,

dim(range (T )) = dim(V ) dim(null (T )) − dim(V ) < dim(W ), ≤ and so range(T ) cannot be equal to W . Hence, T cannot be surjective. 6.6. THE MATRIX OF A LINEAR MAP 73 6.6 The matrix of a linear map

Now we will see that every linear map T (V, W ), with V and W finite-dimensional vector ∈L spaces, can be encoded by a matrix, and, vice versa, every matrix defines such a linear map. Let V and W be finite-dimensional vector spaces, and let T : V W be a linear map. → Suppose that (v1,...,vn) is a basis of V and that (w1,...,wm) is a basis for W . We have seen in Theorem 6.1.3 that T is uniquely determined by specifying the vectors T v ,...,Tv W . 1 n ∈ Since (w ,...,w ) is a basis of W , there exist unique scalars a F such that 1 m ij ∈

T v = a w + + a w for 1 j n. (6.5) j 1j 1 mj m ≤ ≤

We can arrange these scalars in an m n matrix as follows: ×

a11 ... a1n . . M(T )=  . .  .

am1 ... amn     Often, this is also written as A =(aij)1 i m,1 j n. As in Section 12.1.1, the set of all m n ≤ ≤ ≤ ≤ × m n matrices with entries in F is denoted by F × .

Remark 6.6.1. It is important to remember that M(T ) not only depends on the linear map

T but also on the choice of the basis (v1,...,vn) for V and the choice of basis (w1,...,wm) th th for W . The j column of M(T ) contains the coeﬃcients of the j basis vector vj when

expanded in terms of the basis (w1,...,wm), as in Equation (6.5).

Example 6.6.2. Let T : R2 R2 be the linear map given by T (x, y)=(ax+by, cx+dy) for → some a,b,c,d R. Then, with respect to the canonical basis of R2 given by ((1, 0), (0, 1)), ∈ the corresponding matrix is a b M(T )= c d since T (1, 0)=(a, c) gives the ﬁrst column and T (0, 1)=(b, d) gives the second column. More generally, suppose that V = Fn and W = Fm, and denote the standard basis for

V by (e1,...,en) and the standard basis for W by (f1,...,fm). Here, ei (resp. fi) is the n-tuple (resp. m-tuple) with a one in position i and zeroes everywhere else. Then the matrix 74 CHAPTER 6. LINEAR MAPS

M(T )=(aij) is given by

aij =(T ej)i,

th where (T ej)i denotes the i component of the vector T ej. Example 6.6.3. Let T : R2 R3 be the linear map deﬁned by T (x, y)=(y, x +2y, x + y). → Then, with respect to the standard basis, we have T (1, 0) = (0, 1, 1) and T (0, 1) = (1, 2, 1) so that 0 1 M(T )= 1 2 . 1 1     However, if alternatively we take the bases ((1, 2), (0, 1)) for R2 and ((1, 0, 0), (0, 1, 0), (0, 0, 1)) for R3, then T (1, 2) = (2, 5, 3) and T (0, 1) = (1, 2, 1) so that

2 1 M(T )= 5 2 . 3 1     Example 6.6.4. Let S : R2 R2 be the linear map S(x, y)=(y, x). With respect to the → basis ((1, 2), (0, 1)) for R2, we have

S(1, 2) = (2, 1) = 2(1, 2) 3(0, 1) and S(0, 1) = (1, 0) = 1(1, 2) 2(0, 1), − −

and so 2 1 M(S)= . 3 2 − − Given vector spaces V and W of dimensions n and m, respectively, and given a fixed choice of bases, note that there is a one-to-one correspondence between linear maps in (V, W ) and L m n matrices in F × . If we start with the linear map T , then the matrix M(T ) = A =(aij) is m n defined via Equation (6.5). Conversely, given the matrix A =(a ) F × , we can define a ij ∈ linear map T : V W by setting → m

T vj = aijwi. i=1 Recall that the set of linear maps (V, W ) is a vector space. Since we have a one-to-one L 6.6. THE MATRIX OF A LINEAR MAP 75

correspondence between linear maps and matrices, we can also make the set of matrices m n m n F × into a vector space. Given two matrices A =(aij) and B =(bij) in F × and given a scalar α F, we deﬁne the matrix addition and scalar multiplication componentwise: ∈

A + B =(aij + bij),

αA =(αaij).

Next, we show that the composition of linear maps imposes a product on matrices, also called matrix multiplication. Suppose U, V, W are vector spaces over F with bases (u ,...,u ), (v ,...,v ) and (w ,...,w ), respectively. Let S : U V and T : V W be 1 p 1 n 1 m → → linear maps. Then the product is a linear map T S : U W . ◦ → Each linear map has its corresponding matrix M(T ) = A, M(S) = B and M(TS) = C. The question is whether C is determined by A and B. We have, for each j 1, 2,...p , ∈ { } that

(T S)u = T (b v + + b v )= b T v + + b T v ◦ j 1j 1 nj n 1j 1 nj n n n m

= bkjT vk = bkj aikwi k=1 k=1 i=1 m n

= aikbkj wi. i=1 k=1

Hence, the matrix C =(cij) is given by

cij = aikbkj. (6.6) k=1 Equation (6.6) can be used to deﬁne the m p matrix C as the product of a m n matrix × × A and a n p matrix B, i.e., × C = AB. (6.7)

Our derivation implies that the correspondence between linear maps and matrices respects the product structure. Proposition 6.6.5. Let S : U V and T : V W be linear maps. Then → →

M(TS)= M(T )M(S). 76 CHAPTER 6. LINEAR MAPS

Example 6.6.6. With notation as in Examples 6.6.3 and 6.6.4, you should be able to verify that 2 1 1 0 2 1 M(TS)= M(T )M(S)= 5 2 = 4 1 . 3 2 3 1 − − 3 1         Given a vector v V , we can also associate a matrix M(v) to v as follows. Let (v ,...,v ) ∈ 1 n be a basis of V . Then there are unique scalars b1,...,bn such that

v = b v + b v . 1 1 n n

The matrix of v is then deﬁned to be the n 1 matrix ×

b1 . M(v)=  .  .

bn     Example 6.6.7. The matrix of a vector x = (x ,...,x ) Fn in the standard basis 1 n ∈ (e ,...,e ) is the column vector or n 1 matrix 1 n ×

x1 . M(x)=  . 

xn     since x =(x ,...,x )= x e + + x e . 1 n 1 1 n n

The next result shows how the notion of a matrix of a linear map T : V W and the → matrix of a vector v V ﬁt together. ∈

Proposition 6.6.8. Let T : V W be a linear map. Then, for every v V , → ∈

M(T v)= M(T )M(v).

Proof. Let (v1,...,vn) be a basis of V and (w1,...,wm) be a basis for W . Suppose that, with respect to these bases, the matrix of T is M(T )=(aij)1 i m,1 j n. This means that, ≤ ≤ ≤ ≤ 6.6. THE MATRIX OF A LINEAR MAP 77 for all j 1, 2,...,n , ∈{ } m

T vj = akjwk. k=1 The vector v V can be written uniquely as a linear combination of the basis vectors as ∈

v = b v + + b v . 1 1 n n

Hence,

T v = b T v + + b T v 1 1 n n m m = b a w + + b a w 1 k1 k n kn k k=1 k=1 m = (a b + + a b )w . k1 1 kn n k k=1 This shows that M(T v) is the m 1 matrix ×

a11b1 + + a1nbn . M(T v)=  .  .

am1b1 + + amnbn     It is not hard to check, using the formula for matrix multiplication, that M(T )M(v) gives the same result.

Example 6.6.9. Take the linear map S from Example 6.6.4 with basis ((1, 2), (0, 1)) of R2. To determine the action on the vector v = (1, 4) R2, note that v = (1, 4) = 1(1, 2)+2(0, 1). ∈ Hence, 2 1 1 4 M(Sv)= M(S)M(v)= = . 3 2 2 7 − − − This means that Sv = 4(1, 2) 7(0, 1) = (4, 1), − which is indeed true. 78 CHAPTER 6. LINEAR MAPS 6.7 Invertibility

Deﬁnition 6.7.1. A map T : V W is called invertible if there exists a map S : W V → → such that

TS = IW and ST = IV , where I : V V is the identity map on V and I : W W is the identity map on W . V → W → We say that S is an inverse of T .

Note that if the map T is invertible, then the inverse is unique. Suppose S and R are inverses of T . Then

ST = IV = RT,

TS = IW = TR.

Hence, S = S(T R)=(ST )R = R.

1 We denote the unique inverse of an invertible map T by T − .

Proposition 6.7.2. A map T : V W is invertible if and only if T is injective and −→ surjective.

Proof. (“= ”) Suppose T is invertible. ⇒ To show that T is injective, suppose that u, v V are such that Tu = T v. Apply the ∈ 1 1 1 inverse T − of T to obtain T − Tu = T − T v so that u = v. Hence T is injective. To show that T is surjective, we need to show that, for every w W , there is a v V ∈ ∈ 1 1 such that T v = w. Take v = T − w V . Then T (T − w)= w. Hence T is surjective. ∈ (“ =”) Suppose that T is injective and surjective. We need to show that T is invertible. ⇐ We deﬁne a map S (W, V ) as follows. Since T is surjective, we know that, for every ∈ L w W , there exists a v V such that T v = w. Moreover, since T is injective, this v is ∈ ∈ uniquely determined. Hence, deﬁne Sw = v. We claim that S is the inverse of T . Note that, for all w W , we have TSw = T v = w ∈ so that TS = I . Similarly, for all v V , we have ST v = Sw = v so that ST = I . W ∈ V Now we specialize to invertible linear maps. 6.7. INVERTIBILITY 79

1 Proposition 6.7.3. Let T (V, W ) be invertible. Then T − (W, V ). ∈L ∈L Proof. 1 1 Certainly T − : W V so we only need show that T − is a linear map. For all w ,w W , −→ 1 2 ∈ we have 1 1 1 1 T (T − w1 + T − w2)= T (T − w1)+ T (T − w2)= w1 + w2,

1 1 and so T − w1 + T − w2 is the unique vector v in V such that T v = w1 + w2 = w. Hence,

1 1 1 1 T − w1 + T − w2 = v = T − w = T − (w1 + w2).

1 1 The proof that T − (aw)= aT − w is similar. For w W and a F, we have ∈ ∈

1 1 T (aT − w)= aT (T − w)= aw

1 1 1 so that aT − w is the unique vector in V that maps to aw. Hence, T − (aw)= aT − w.

Example 6.7.4. The linear map T (x, y)=(x 2y, 3x+y) is both injective, since null (T )= − 0 , and surjective, since range (T )= R2. Hence, T is invertible by Proposition 6.7.2. { } Deﬁnition 6.7.5. Two vector spaces V and W are called isomorphic if there exists an invertible linear map T (V, W ). ∈L Theorem 6.7.6. Two ﬁnite-dimensional vector spaces V and W over F are isomorphic if and only if dim(V ) = dim(W ).

Proof. (“= ”) Suppose V and W are isomorphic. Then there exists an invertible linear map ⇒ T (V, W ). Since T is invertible, it is injective and surjective, and so null (T )= 0 and ∈L { } range (T )= W . Using the Dimension Formula, this implies that

dim(V ) = dim(null (T )) + dim(range(T )) = dim(W ).

(“ =”) Suppose that dim(V ) = dim(W ). Let (v ,...,v ) be a basis of V and (w ,...,w ) ⇐ 1 n 1 n be a basis of W . Deﬁne the linear map T : V W as →

T (a v + + a v )= a w + + a w . 1 1 n n 1 1 n n 80 CHAPTER 6. LINEAR MAPS

Since the scalars a ,...,a F are arbitrary and (w ,...,w ) spans W , this means that 1 n ∈ 1 n range (T ) = W and T is surjective. Also, since (w1,...,wn) is linearly independent, T is injective (since a w + + a w = 0 implies that all a = = a = 0 and hence only the 1 1 n n 1 n zero vector is mapped to zero). It follows that T is both injective and surjective; hence, by Proposition 6.7.2, T is invertible. Therefore, V and W are isomorphic.

We close this chapter by considering the case of linear maps having equal domain and codomain. As in Definition 6.1.1, a linear map T (V,V ) is called a linear operator on ∈L V . As the following remarkable theorem shows, the notions of injectivity, surjectivity, and invertibility of a linear operator T are the same — as long as V is finite-dimensional. A similar result does not hold for infinite-dimensional vector spaces. For example, the set of all polynomials F[z] is an infinite-dimensional vector space, and we saw that the differentiation map on F[z] is surjective but not injective.

Theorem 6.7.7. Let V be a ﬁnite-dimensional vector space and T : V V be a linear map. → Then the following are equivalent:

1. T is invertible.

2. T is injective.

3. T is surjective.

Proof. By Proposition 6.7.2, Part 1 implies Part 2. Next we show that Part 2 implies Part 3. If T is injective, then we know that null (T )= 0 . Hence, by the Dimension Formula, we have { }

dim(range (T )) = dim(V ) dim(null (T )) = dim(V ). −

Since range (T ) V is a subspace of V , this implies that range (T ) = V , and so T is ⊂ surjective. Finally, we show that Part 3 implies Part 1. Since T is surjective by assumption, we have range (T )= V . Thus, by again using the Dimension Formula,

dim(null (T )) = dim(V ) dim(range (T ))=0, − 6.7. INVERTIBILITY 81 and so null (T ) = 0 , from which T is injective. By Proposition 6.7.2, an injective and { } surjective linear map is invertible. 82 CHAPTER 6. LINEAR MAPS Exercises for Chapter 6

Calculational Exercises

1. Define the map T : R2 R2 by T (x, y)=(x + y, x). → (a) Show that T is linear. (b) Show that T is surjective. (c) Find dim (null (T )). (d) Find the matrix for T with respect to the canonical basis of R2. (e) Find the matrix for T with respect to the canonical basis for the domain R2 and the basis ((1, 1), (1, 1)) for the target space R2. − (f) Show that the map F : R2 R2 given by F (x, y)=(x + y, x + 1) is not linear. → 2. Let T (R2) be defined by ∈L x y x T = , for all R2 . y x y ∈ − (a) Show that T is surjective. (b) Find dim (null (T )). (c) Find the matrix for T with respect to the canonical basis of R2. (d) Show that the map F : R2 R2 given by F (x, y)=(x + y, x + 1) is not linear. → 3. Consider the complex vector spaces C2 and C3 with their canonical bases, and define S (C3, C2) be the linear map defined by S(v)= Av, v C3, where A is the matrix ∈L ∀ ∈ i 1 1 A = M(S)= . 2i 1 1 − − Find a basis for null(S).

4. Give an example of a function f : R2 R having the property that →

a R, v R2, f(av)= af(v) ∀ ∈ ∀ ∈ 6.7. INVERTIBILITY 83

but such that f is not a linear map.

5. Show that the linear map T : F4 F2 is surjective if →

null(T )= (x , x , x , x ) F4 x =5x , x =7x . { 1 2 3 4 ∈ | 1 2 3 4}

6. Show that no linear map T : F5 F2 can have as its null space the set →

(x , x , x , x , x ) F5 x =3x , x = x = x . { 1 2 3 4 5 ∈ | 1 2 3 4 5}

7. Describe the set of solutions x =(x , x , x ) R3 of the system of equations 1 2 3 ∈ x x + x =0 1 − 2 3 x1 +2x2 + x3 =0  .  2x1 + x2 +2x3 =0 

 Proof-Writing Exercises

1. Let V and W be vector spaces over F with V ﬁnite-dimensional, and let U be any subspace of V . Given a linear map S (U, W ), prove that there exists a linear map ∈L T (V, W ) such that, for every u U, S(u)= T (u). ∈L ∈ 2. Let V and W be vector spaces over F, and suppose that T (V, W ) is injec- ∈ L tive. Given a linearly independent list (v1,...,vn) of vectors in V , prove that the

list (T (v1),...,T (vn)) is linearly independent in W .

3. Let U, V , and W be vector spaces over F, and suppose that the linear maps S (U, V ) ∈L and T (V, W ) are both injective. Prove that the composition map T S is injective. ∈L ◦ 4. Let V and W be vector spaces over F, and suppose that T (V, W ) is surjective. ∈ L Given a spanning list (v1,...,vn) for V , prove that span(T (v1),...,T (vn)) = W .

5. Let V and W be vector spaces over F with V ﬁnite-dimensional. Given T (V, W ), ∈L prove that there is a subspace U of V such that

U null(T )= 0 and range(T )= T (u) u U . ∩ { } { | ∈ } 84 CHAPTER 6. LINEAR MAPS

6. Let V be a vector spaces over F, and suppose that there is a linear map T (V,V ) ∈L such that both null(T ) and range(T ) are ﬁnite-dimensional subspaces of V . Prove that V must also be ﬁnite-dimensional.

7. Let U, V , and W be ﬁnite-dimensional vector spaces over F with S (U, V ) and ∈ L T (V, W ). Prove that ∈L

dim(null(T S)) dim(null(T )) + dim(null(S)). ◦ ≤

8. Let V be a ﬁnite-dimensional vector space over F with S, T (V,V ). Prove that ∈ L T S is invertible if and only if both S and T are invertible. ◦ 9. Let V be a ﬁnite-dimensional vector space over F with S, T (V,V ), and denote by ∈L I the identity map on V . Prove that T S = I if and only if S T = I. ◦ ◦ Chapter 7

Eigenvalues and Eigenvectors

In this chapter we study linear operators T : V V on a ﬁnite-dimensional vector space → V . We are interested in understanding when there is a basis B for V such that the matrix M(T ) of T with respect to B has a particularly nice form. In particular, we would like M(T ) to be either upper triangular or diagonal. This quest leads us to the notions of eigenvalues and eigenvectors of a linear operator, which is one of the most important concepts in Linear Algebra and beyond. For example, quantum mechanics is largely based upon the study of eigenvalues and eigenvectors of operators on inﬁnite-dimensional vector spaces.

7.1 Invariant subspaces

To begin our study, we will look at subspaces U of V that have special properties under an operator T (V,V ). ∈L Deﬁnition 7.1.1. Let V be a ﬁnite-dimensional vector space over F with dim(V ) 1, and ≥ let T (V,V ) be an operator in V . Then a subspace U V is called an invariant ∈ L ⊂ subspace under T if Tu U for all u U. ∈ ∈ That is, U is invariant under T if the image of every vector in U under T remains within U. We denote this as T U = Tu u U U. { | ∈ } ⊂ Example 7.1.2. The subspaces null (T ) and range(T ) are invariant subspaces under T . To see this, let u null (T ). This means that Tu = 0. But, since 0 null (T ), this implies that ∈ ∈ 85 86 CHAPTER 7. EIGENVALUES AND EIGENVECTORS

Tu = 0 null (T ). Similarly, let u range (T ). Since T v range (T ) for all v V , we ∈ ∈ ∈ ∈ certainly also have that Tu range(T ). ∈ Example 7.1.3. Take the linear operator T : R3 R3 corresponding to the matrix → 1 2 0 1 1 0 0 0 2     with respect to the basis (e1, e2, e3). Then span(e1, e2) and span(e3) are both invariant subspaces under T .

An important special case is the case of Deﬁnition 7.1.1 involves one-dimensional invariant subspaces under an operator T (V,V ). If dim(U) = 1, then there exists a nonzero vector ∈L u V such that ∈ U = au a F . { | ∈ } In this case, we must have Tu = λu for some λ F. ∈ This motivates the deﬁnitions of eigenvectors and eigenvalues of a linear operator, as given in the next section.

7.2 Eigenvalues

Deﬁnition 7.2.1. Let T (V,V ). Then λ F is an eigenvalue of T if there exists a ∈ L ∈ nonzero vector u V such that ∈ Tu = λu.

The vector u is called an eigenvector of T corresponding to the eigenvalue λ.

Finding the eigenvalues and eigenvectors of a linear operator is one of the most important problems in Linear Algebra. We will see later that this so-called “eigen-information” has many uses and applications. (As an example, quantum mechanics is based upon understanding the eigenvalues and eigenvectors of operators on specifically defined vector spaces. These vector spaces are often infinite-dimensional, though, and so we do not consider them further in these notes.) 7.2. EIGENVALUES 87

Example 7.2.2.

1. Let T be the zero map deﬁned by T (v) = 0 for all v V . Then every vector u = 0 is ∈ an eigenvector of T with eigenvalue 0.

2. Let I be the identity map deﬁned by I(v)= v for all v V . Then every vector u =0 ∈ is an eigenvector of T with eigenvalue 1.

3. The projection map P : R3 R3 deﬁned by P (x, y, z)=(x, y, 0) has eigenvalues 0 → and 1. The vector (0, 0, 1) is an eigenvector with eigenvalue 0, and both (1, 0, 0) and (0, 1, 0) are eigenvectors with eigenvalue 1.

4. Take the operator R : F2 F2 defined by R(x, y)=( y, x). When F = R, R can be → − interpreted as counterclockwise rotation by 900. From this interpretation, it is clear that no non-zero vector in R2 is mapped to a scalar multiple of itself. Hence, for F = R, the operator R has no eigenvalues. For F = C, though, the situation is significantly different! In this case, λ C is an ∈ eigenvalue of R if R(x, y)=( y, x)= λ(x, y) − so that y = λx and x = λy. This implies that y = λ2y, i.e., that λ2 = 1. − − − The solutions are hence λ = i. One can check that (1, i) is an eigenvector with ± − eigenvalue i and that (1, i) is an eigenvector with eigenvalue i. − Eigenspaces are important examples of invariant subspaces. Let T (V,V ), and let ∈ L λ F be an eigenvalue of T . Then ∈

V = v V T v = λv λ { ∈ | }

is called an eigenspace of T . Equivalently,

V = null (T λI). λ −

Note that V = 0 since λ is an eigenvalue if and only if there exists a nonzero vector u V λ { } ∈ such that Tu = λu. We can reformulate this as follows:

λ F is an eigenvalue of T if and only if the operator T λI is not injective. • ∈ − 88 CHAPTER 7. EIGENVALUES AND EIGENVECTORS

Since the notion of injectivity, surjectivity, and invertibility are equivalent for operators on a ﬁnite-dimensional vector space, we can equivalently say either of the following:

λ F is an eigenvalue of T if and only if the operator T λI is not surjective. • ∈ − λ F is an eigenvalue of T if and only if the operator T λI is not invertible. • ∈ − We close this section with two fundamental facts about eigenvalues and eigenvectors.

Theorem 7.2.3. Let T (V,V ), and let λ ,...,λ F be m distinct eigenvalues of T with ∈L 1 m ∈ corresponding nonzero eigenvectors v1,...,vm. Then (v1,...,vm) is linearly independent.

Proof. Suppose that (v1,...,vm) is linearly dependent. Then, by the Linear Dependence Lemma, there exists an index k 2,...,m such that ∈{ }

vk span(v1,...,vk 1) ∈ − and such that (v1,...,vk 1) is linearly independent. This means that there exist scalars − a1,...,ak 1 F such that − ∈ vk = a1v1 + + ak 1vk 1. (7.1) − −

Applying T to both sides yields, using the fact that vj is an eigenvector with eigenvalue λj,

λkvk = a1λ1v1 + + ak 1λk 1vk 1. − − −

Subtracting λk times Equation (7.1) from this, we obtain

0=(λk λ1)a1v1 + +(λk λk 1)ak 1vk 1. − − − − −

Since (v1,...,vk 1) is linearly independent, we must have (λk λj)aj = 0 for all j = − − 1, 2,...,k 1. By assumption, all eigenvalues are distinct, so λ λ = 0, which im- − k − j plies that a = 0 for all j = 1, 2,...,k 1. But then, by Equation (7.1), v = 0, which j − k contradicts the assumption that all eigenvectors are nonzero. Hence (v1,...,vm) is linearly independent.

Corollary 7.2.4. Any operator T (V,V ) has at most dim(V ) distinct eigenvalues. ∈L 7.3. DIAGONAL MATRICES 89

Proof. Let λ1,...,λm be distinct eigenvalues of T , and let v1,...,vm be corresponding

nonzero eigenvectors. By Theorem 7.2.3, the list (v1,...,vm) is linearly independent. Hence m dim(V ). ≤

7.3 Diagonal matrices

Note that if T has n = dim(V ) distinct eigenvalues, then there exists a basis (v1,...,vn) of V such that

T vj = λjvj, for all j =1, 2,...,n.

Then any v V can be written as a linear combination v = a v + + a v of v ,...,v . ∈ 1 1 n n 1 n Applying T to this, we obtain

T v = λ a v + + λ a v . 1 1 1 n n n

Hence the vector a1 . M(v)=  . 

an     is mapped to

λ1a1 . M(T v)=  .  .

λnan     This means that the matrix M(T ) for T with respect to the basis of eigenvectors (v1,...,vn) is diagonal, and so we call T diagonalizable:

λ1 0 . M(T )=  ..  .

 0 λn     We summarize the results of the above discussion in the following Proposition.

Proposition 7.3.1. If T (V,V ) has dim(V ) distinct eigenvalues, then M(T ) is diagonal ∈L with respect to some basis of V . Moreover, V has a basis consisting of eigenvectors of T . 90 CHAPTER 7. EIGENVALUES AND EIGENVECTORS 7.4 Existence of eigenvalues

In what follows, we want to study the question of when eigenvalues exist for a given operator T . To answer this question, we will use polynomials p(z) F[z] evaluated on operators ∈ n n T (V,V ) (or, equivalently, on square matrices A F × ). More explicitly, given a ∈ L ∈ polynomial p(z)= a + a z + + a zk 0 1 k we can associate the operator

p(T )= a I + a T + + a T k. 0 V 1 k

Note that, for p(z), q(z) F[z], we have ∈

(pq)(T )= p(T )q(T )= q(T )p(T ).

The results of this section will be for complex vector spaces. This is because the proof of the existence of eigenvalues relies on the Fundamental Theorem of Algebra from Chapter 3, which makes a statement about the existence of zeroes of polynomials over C.

Theorem 7.4.1. Let V = 0 be a ﬁnite-dimensional vector space over C, and let T { } ∈ (V,V ). Then T has at least one eigenvalue. L Proof. Let v V with v = 0, and consider the list of vectors ∈

(v,Tv,T 2v,...,T nv), where n = dim(V ). Since the list contains n + 1 vectors, it must be linearly dependent. Hence, there exist scalars a , a ,...,a C, not all zero, such that 0 1 n ∈

0= a v + a T v + a T 2v + + a T nv. 0 1 2 n

Let m be largest index for which a = 0. Since v = 0, we must have m > 0 (but possibly m m = n). Consider the polynomial

p(z)= a + a z + + a zm. 0 1 m 7.5. UPPER TRIANGULAR MATRICES 91

By Theorem 3.2.2 (3) it can be factored as

p(z)= c(z λ ) (z λ ), − 1 − m

where c,λ ,...,λ C and c = 0. 1 m ∈ Therefore,

0= a v + a T v + a T 2v + + a T nv = p(T )v 0 1 2 n = c(T λ I)(T λ I) (T λ I)v, − 1 − 2 − m and so at least one of the factors T λ I must be noninjective. In other words, this λ is − j j an eigenvalue of T .

Note that the proof of Theorem 7.4.1 only uses basic concepts about linear maps, which is the same approach as in a popular textbook called Linear Algebra Done Right by Sheldon Axler. Many other textbooks rely on signiﬁcantly more diﬃcult proofs using concepts like the determinant and characteristic polynomial of a matrix. At the same time, it is often prefer- able to use the characteristic polynomial of a matrix in order to compute eigen-information of an operator; we discuss this approach in Chapter 8. Note also that Theorem 7.4.1 does not hold for real vector spaces. E.g., as we saw in Example 7.2.2, the rotation operator R on R2 has no eigenvalues.

7.5 Upper triangular matrices

As before, let V be a complex vector space. Let T (V,V ) and (v ,...,v ) be a basis for V . Recall that we can associate a matrix ∈L 1 n n n M(T ) C × to the operator T . By Theorem 7.4.1, we know that T has at least one ∈ eigenvalue, say λ C. Let v = 0 be an eigenvector corresponding to λ. By the Basis ∈ 1 Extension Theorem, we can extend the list (v1) to a basis of V . Since T v1 = λv1, the ﬁrst 92 CHAPTER 7. EIGENVALUES AND EIGENVECTORS

column of M(T ) with respect to this basis is

λ 0   . .  .    0     What we will show next is that we can ﬁnd a basis of V such that the matrix M(T ) is upper triangular.

n n Deﬁnition 7.5.1. A matrix A = (a ) F × is called upper triangular if a = 0 for ij ∈ ij i > j.

Schematically, an upper triangular matrix has the form

∗. ∗  ..  , 0   ∗   where the entries can be anything and every entry below the main diagonal is zero. ∗ Some of the reasons why upper triangular matrices are so fantastic are that

1. the eigenvalues are on the diagonal (as we will see later);

2. it is easy to solve the corresponding system of linear equations by back substitution (as discussed in Section 12.3).

The next proposition tells us what upper triangularity means in terms of linear operators and invariant subspaces.

Proposition 7.5.2. Suppose T (V,V ) and that (v ,...,v ) is a basis of V . Then the ∈ L 1 n following statements are equivalent:

1. the matrix M(T ) with respect to the basis (v1,...,vn) is upper triangular;

2. T v span(v ,...,v ) for each k =1, 2,...,n; k ∈ 1 k

3. span(v1,...,vk) is invariant under T for each k =1, 2,...,n. 7.5. UPPER TRIANGULAR MATRICES 93

Proof. The equivalence of Condition 1 and Condition 2 follows easily from the deﬁnition since Condition 2 implies that the matrix elements below the diagonal are zero. Obviously, Condition 3 implies Condition 2. To show that Condition 2 implies Condi- tion 3, note that any vector v span(v ,...,v ) can be written as v = a v + + a v . ∈ 1 k 1 1 k k Applying T , we obtain

T v = a T v + + a T v span(v ,...,v ) 1 1 k k ∈ 1 k

since, by Condition 2, each T v span(v ,...,v ) span(v ,...,v ) for j =1, 2,...,k and j ∈ 1 j ⊂ 1 k since the span is a subspace of V .

The next theorem shows that complex vector spaces indeed have some basis for which the matrix of a given operator is upper triangular.

Theorem 7.5.3. Let V be a ﬁnite-dimensional vector space over C and T (V,V ). Then ∈L there exists a basis B for V such that M(T ) is upper triangular with respect to B.

Proof. We proceed by induction on dim(V ). If dim(V ) = 1, then there is nothing to prove. Hence, assume that dim(V )= n> 1 and that we have proven the result of the theorem for all T (W, W ), where W is a complex vector space with dim(W ) n 1. By ∈ L ≤ − Theorem 7.4.1, T has at least one eigenvalue λ. Deﬁne

U = range(T λI), − and note that

1. dim(U) < dim(V )= n since λ is an eigenvalue of T and hence T λI is not surjective; − 2. U is an invariant subspace of T since, for all u U, we have ∈

Tu =(T λI)u + λu, −

which implies that Tu U since (T λI)u range (T λI)= U and λu U. ∈ − ∈ − ∈ Therefore, we may consider the operator S = T , which is the operator obtained by re- |U stricting T to the subspace U. By the induction hypothesis, there exists a basis (u1,...,um) 94 CHAPTER 7. EIGENVALUES AND EIGENVECTORS

of U with m n 1 such that M(S) is upper triangular with respect to (u ,...,u ). This ≤ − 1 m means that Tu = Su span(u ,...,u ), for all j =1, 2,...,m. j j ∈ 1 j

Extend this to a basis (u1,...,um, v1,...,vk) of V . Then

T v =(T λI)v + λv , for all j =1, 2,...,k. j − j j

Since (T λI)v range (T λI)= U = span(u ,...,u ), we have that − j ∈ − 1 m

T v span(u ,...,u , v ,...,v ), for all j =1, 2,...,k. j ∈ 1 m 1 j

Hence, T is upper triangular with respect to the basis (u1,...,um, v1,...,vk).

The following are two very important facts about upper triangular matrices and their associated operators. Proposition 7.5.4. Suppose T (V,V ) is a linear operator and that M(T ) is upper ∈ L triangular with respect to some basis of V . Then 1. T is invertible if and only if all entries on the diagonal of M(T ) are nonzero.

2. The eigenvalues of T are precisely the diagonal elements of M(T ).

Proof of Proposition 7.5.4, Part 1. Let (v1,...,vn) be a basis of V such that

λ1 . ∗ M(T )=  .. 

 0 λn     is upper triangular. The claim is that T is invertible if and only if λ = 0 for all k = k 1, 2,...,n. Equivalently, this can be reformulated as follows: T is not invertible if and only if λ = 0 for at least one k 1, 2,...,n . k ∈{ } Suppose λk = 0. We will show that this implies the non-invertibility of T . If k = 1, this is obvious since then T v = 0, which implies that v null (T ) so that T is not injective and 1 1 ∈ hence not invertible. So assume that k > 1. Then

T vj span(v1,...,vk 1), for all j k, ∈ − ≤ 7.5. UPPER TRIANGULAR MATRICES 95

since T is upper triangular and λ = 0. Hence, we may deﬁne S = T to be the k |span(v1,...,vk) restriction of T to the subspace span(v1,...,vk) so that

S : span(v1,...,vk) span(v1,...,vk 1). → −

The linear map S is not injective since the dimension of the domain is larger than the dimension of its codomain, i.e.,

dim(span(v1,...,vk)) = k>k 1 = dim(span(v1,...,vk 1)). − −

Hence, there exists a vector 0 = v span(v ,...,v ) such that Sv = T v = 0. This implies ∈ 1 k that T is also not injective and therefore also not invertible.

Now suppose that T is not invertible. We need to show that at least one λk = 0. The linear map T not being invertible implies that T is not injective. Hence, there exists a vector 0 = v V such that T v = 0, and we can write ∈

v = a v + + a v 1 1 k k

for some k, where a = 0. Then k

0= T v =(a1T v1 + + ak 1T vk 1)+ akT vk. (7.2) − −

Since T is upper triangular with respect to the basis (v ,...,v ), we know that a T v + + 1 n 1 1 ak 1T vk 1 span(v1,...,vk 1). Hence, Equation (7.2) shows that T vk span(v1,...,vk 1), − − ∈ − ∈ − which implies that λk = 0.

Proof of Proposition 7.5.4, Part 2. Recall that λ F is an eigenvalue of T if and only if ∈ the operator T λI is not invertible. Let (v ,...,v ) be a basis such that M(T ) is upper − 1 n triangular. Then

λ1 λ − . ∗ M(T λI)=  ..  . −  0 λn λ  −    Hence, by Proposition 7.5.4(1), T λI is not invertible if and only if λ = λ for some k. − k 96 CHAPTER 7. EIGENVALUES AND EIGENVECTORS 7.6 Diagonalization of 2 2 matrices and Applications ×

a b 2 2 2 2 Let A = F × , and recall that we can define a linear operator T (F ) on F by c d ∈ ∈L v setting T (v)= Av for each v = 1 F2. v2 ∈ One method for finding the eigen-information of T is to analyze the solutions of the matrix equation Av = λv for λ F and v F2. In particular, using the definition of ∈ ∈ eigenvector and eigenvalue, v is an eigenvector associated to the eigenvalue λ if and only if Av = T (v)= λv. A simpler method involves the equivalent matrix equation (A λI)v = 0, where I denotes − the identity map on F2. In particular, 0 = v F2 is an eigenvector for T associated to the ∈ eigenvalue λ F if and only if the system of linear equations ∈ (a λ)v + bv = 0 − 1 2 (7.3) cv1 + (d λ)v2 = 0 − has a non-trivial solution. Moreover, System (7.3) has a non-trivial solution if and only if the polynomial p(λ)=(a λ)(d λ) bc evaluates to zero. (See see Proof-writing Exercise 12 − − − on page 101.) In other words, the eigenvalues for T are exactly the λ F for which p(λ) = 0, and the ∈ v eigenvectors for T associated to an eigenvalue λ are exactly the non-zero vectors v = 1 v2 ∈ F2 that satisfy System (7.3).

2 1 Example 7.6.1. Let A = − − . Then p(λ)=( 2 λ)(2 λ) ( 1)(5) = λ2 + 1, 5 2 − − − − − which is equal to zero exactly when λ = i. Moreover, if λ = i, then the System (7.3) ± becomes ( 2 i)v v = 0 − − 1 − 2 , 5v1 + (2 i)v2 = 0 −

v1 2 which is satisﬁed by any vector v = C such that v2 = ( 2 i)v1. Similarly, if v2 ∈ − − 7.6. DIAGONALIZATION OF 2 2 MATRICES AND APPLICATIONS 97 × λ = i, then the System (7.3) becomes − ( 2+ i)v v = 0 − 1 − 2 , 5v1 + (2+ i)v2 = 0

v1 2 which is satisﬁed by any vector v = C such that v2 =( 2+ i)v1. v2 ∈ − 2 1 It follows that, given A = − − , the linear operator on C2 deﬁned by T (v) = Av 5 2 has eigenvalues λ = i, with associated eigenvectors as described above. ± Example 7.6.2. Take the rotation R : R2 R2 by an angle θ [0, 2π) given by the matrix θ → ∈ cos θ sin θ Rθ = − sin θ cos θ

Then we obtain the eigenvalues by solving the polynomial equation

p(λ) = (cos θ λ)2 + sin2 θ − = λ2 2λ cos θ +1=0, − where we have used the fact that sin2 θ + cos2 θ = 1. Solving for λ in C, we obtain

2 2 iθ λ = cos θ √cos θ 1 = cos θ sin θ = cos θ i sin θ = e± . ± − ± − ± 2 We see that, as an operator over the real vector space R , the operator Rθ only has eigenvalues x when θ = 0 or θ = π. However, if we interpret the vector 1 R2 as a complex number x2 ∈ iθ z = x + ix , then z is an eigenvector if R : C C maps z λz = e± z. Moreover, from 1 2 θ → → iθ Section 2.3.2, we know that multiplication by e± corresponds to rotation by the angle θ. ± 98 CHAPTER 7. EIGENVALUES AND EIGENVECTORS Exercises for Chapter 7

Calculational Exercises

1. Let T (F2, F2) be deﬁned by ∈L

T (u, v)=(v,u)

for every u, v F. Compute the eigenvalues and associated eigenvectors for T . ∈ 2. Let T (F3, F3) be deﬁned by ∈L

T (u,v,w)=(2v, 0, 5w)

for every u,v,w F. Compute the eigenvalues and associated eigenvectors for T . ∈ 3. Let n Z be a positive integer and T (Fn, Fn) be deﬁned by ∈ + ∈L

T (x ,...,x )=(x + + x ,...,x + + x ) 1 n 1 n 1 n

for every x ,...,x F. Compute the eigenvalues and associated eigenvectors for T . 1 n ∈ 4. Find eigenvalues and associated eigenvectors for the linear operators on F2 deﬁned by each given 2 2 matrix. × 3 0 10 9 0 3 (a) (b) − (c) 8 1 4 2 4 0 − − 2 7 0 0 1 0 (d) − − (e) (f) 1 2 0 0 0 1

a b 2 2 Hint: Use the fact that, given a matrix A = F × , λ F is an eigenvalue c d ∈ ∈ for A if and only if (a λ)(d λ) bc = 0. − − − 5. For each matrix A below, ﬁnd eigenvalues for the induced linear operator T on Fn without performing any calculations. Then describe the eigenvectors v Fn associated ∈ to each eigenvalue λ by looking at solutions to the matrix equation (A λI)v = 0, − 7.6. DIAGONALIZATION OF 2 2 MATRICES AND APPLICATIONS 99 × where I denotes the identity map on Fn.

1 3 0 0 0 1 3 7 11 − 1 1 1 6  0 0 0   0 3 8  (a) − , (b) − 3 , (c) 2 0 5  0 010   000 4   1     0 00   000 2   2       

6. For each matrix A below, describe the invariant subspaces for the induced linear operator T on F2 that maps each v F2 to T (v)= Av. ∈

4 1 0 1 2 3 1 0 (a) − , (b) , (c) , (d) 2 1 1 0 0 2 0 0 −

7. Let T (R2) be deﬁned by ∈L x y x T = , for all R2 . y x + y y ∈

Deﬁne two real numbers λ+ and λ as follows: − 1+ √5 1 √5 λ+ = , λ = − . 2 − 2

(a) Find the matrix of T with respect to the canonical basis for R2 (both as the domain and the codomain of T ; call this matrix A).

(b) Verify that λ+ and λ are eigenvalues of T by showing that v+ and v are eigen- − − vectors, where 1 1 v+ = , v = . λ+ − λ −

2 (c) Show that (v+, v ) is a basis of R . − 2 (d) Find the matrix of T with respect to the basis (v+, v ) for R (both as the domain − and the codomain of T ; call this matrix B). 100 CHAPTER 7. EIGENVALUES AND EIGENVECTORS

Proof-Writing Exercises

1. Let V be a ﬁnite-dimensional vector space over F with T (V,V ), and let U ,...,U ∈L 1 m be subspaces of V that are invariant under T . Prove that U + + U must then also 1 m be an invariant subspace of V under T .

2. Let V be a ﬁnite-dimensional vector space over F with T (V,V ), and suppose that ∈L U and U are subspaces of V that are invariant under T . Prove that U U is also 1 2 1 ∩ 2 an invariant subspace of V under T .

3. Let V be a ﬁnite-dimensional vector space over F with T (V,V ) invertible and ∈ L 1 1 λ F 0 . Prove λ is an eigenvalue for T if and only if λ− is an eigenvalue for T − . ∈ \{ } 4. Let V be a ﬁnite-dimensional vector space over F, and suppose that T (V,V ) has ∈L the property that every v V is an eigenvector for T . Prove that T must then be a ∈ scalar multiple of the identity function on V .

5. Let V be a ﬁnite-dimensional vector space over F, and let S, T (V ) be linear ∈ L operators on V with S invertible. Given any polynomial p(z) F[z], prove that ∈

1 1 p(S T S− )= S p(T ) S− . ◦ ◦ ◦ ◦

6. Let V be a ﬁnite-dimensional vector space over C, T (V ) be a linear operator on ∈ L V , and p(z) C[z] be a polynomial. Prove that λ C is an eigenvalue of the linear ∈ ∈ operator p(T ) (V ) if and only if T has an eigenvalue C such that p()= λ. ∈L ∈ 7. Let V be a ﬁnite-dimensional vector space over C with T (V ) a linear operator ∈ L on V . Prove that, for each k =1,..., dim(V ), there is an invariant subspace Uk of V

under T such that dim(Uk)= k.

8. Prove or give a counterexample to the following claim:

Claim. Let V be a ﬁnite-dimensional vector space over F, and let T (V ) be a linear ∈L operator on V . If the matrix for T with respect to some basis on V has all zeros on the diagonal, then T is not invertible.

9. Prove or give a counterexample to the following claim: 7.6. DIAGONALIZATION OF 2 2 MATRICES AND APPLICATIONS 101 × Claim. Let V be a ﬁnite-dimensional vector space over F, and let T (V ) be a linear ∈L operator on V . If the matrix for T with respect to some basis on V has all non-zero elements on the diagonal, then T is invertible.

10. Let V be a ﬁnite-dimensional vector space over F, and let S, T (V ) be linear ∈ L operators on V . Suppose that T has dim(V ) distinct eigenvalues and that, given any eigenvector v V for T associated to some eigenvalue λ F, v is also an eigenvector ∈ ∈ for S associated to some (possibly distinct) eigenvalue F. Prove that T S = S T . ∈ ◦ ◦ 11. Let V be a ﬁnite-dimensional vector space over F, and suppose that the linear operator P (V ) has the property that P 2 = P . Prove that V = null(P ) range(P ). ∈L ⊕ 12. (a) Let a,b,c,d F and consider the system of equations given by ∈

ax1 + bx2 = 0 (7.4)

cx1 + dx2 =0. (7.5)

Note that x1 = x2 = 0 is a solution for any choice of a, b, c, and d. Prove that this system of equations has a non-trivial solution if and only if ad bc = 0. − a b 2 2 2 (b) Let A = F × , and recall that we can deﬁne a linear operator T (F ) c d ∈ ∈L v on F2 by setting T (v)= Av for each v = 1 F2. v2 ∈

Show that the eigenvalues for T are exactly the λ F for which p(λ) = 0, where ∈ p(z)=(a z)(d z) bc. − − −

Hint: Write the eigenvalue equation Av = λv as (A λI)v = 0 and use the ﬁrst − part. Chapter 8

Permutations and the Determinant of a Square Matrix

There are many operations that can be applied to a square matrix. This chapter is devoted to one particularly important operation called the determinant. In effect, the determinant can be thought of as a single number that is used to check for many of the different properties that a matrix might possess. In order to define the determinant operation, we will first need to define permutations.

8.1 Permutations

Permutations appear in many diﬀerent mathematical concepts, and so we give a general introduction to them in this section.

8.1.1 Deﬁnition of permutations

Given a positive integer n Z , a permutation of an (ordered) list of n distinct objects is ∈ + any reordering of this list. When describing the reorderings themselves, though, the nature of the objects involved is more or less irrelevant. E.g., we can imagine interchanging the second and third items in a list of five distinct objects — no matter what those items are — and this defines a particular permutation that can be applied to any list of five objects. Since the nature of the objects being rearranged (i.e., permuted) is immaterial, it is common to use the integers 1, 2,...,n as the standard list of n objects. Alternatively, one

102 8.1. PERMUTATIONS 103

can also think of these integers as labels for the items in any list of n distinct elements. This gives rise to the following deﬁnition.

Deﬁnition 8.1.1. A permutation π of n elements is a one-to-one and onto function having the set 1, 2,...,n as both its domain and codomain. { }

In other words, a permutation is a function π : 1, 2,...,n 1, 2,...,n such that, { } −→ { } for every integer i 1,...,n , there exists exactly one integer j 1,...,n for which ∈ { } ∈ { } π(j) = i. We will usually denote permutations by Greek letters such as π (pi), σ (sigma), and τ (tau). The set of all permutations of n elements is denoted by and is typically Sn referred to as the symmetric group of degree n. (In particular, the set forms a group Sn under function composition as discussed in Section 8.1.2.) Given a permutation π , there are several common notations used for specifying ∈ Sn how π permutes the integers 1, 2,...,n. The important thing to keep in mind when working with these different notations is that π is a function defined on the finite set 1, 2,...,n , { } with notation being used as a convenient short-hand for keeping track of how π permutes the elements in this set.

Deﬁnition 8.1.2. Given a permutation π , denote π = π(i) for each i 1,...,n . ∈ Sn i ∈{ } Then the two-line notation for π is given by the 2 n matrix × 1 2 n π = . π1 π2 πn In other words, given a permutation π and an integer i 1,...,n , we are denoting ∈ Sn ∈{ } the image of i under π by πi instead of using the more conventional function notation π(i). Then, in order to specify the image of each integer i 1,...,n under π, we list these ∈ { } images in a two-line array as shown above. (One can also use so-called one-line notation for π, which is given by simply ignoring the top row and writing π = π π π .) 1 2 n It is important to note that, although we represent permutations as 2 n matrices, you × should not think of permutations as linear transformations from an n-dimensional vector space into a two-dimensional vector space. Moreover, the composition operation on permutation that we describe in Section 8.1.2 below does not correspond to matrix multiplication. The use of matrix notation in denoting permutations is merely a matter of convenience. 104 CHAPTER 8. PERMUTATIONS AND DETERMINANTS

Example 8.1.3. Suppose that we have a set of five distinct objects and that we wish to describe the permutation that places the first item into the second position, the second item into the fifth position, the third item into the first position, the fourth item into the third position, and the fifth item into the fourth position. Then, using the notation developed above, we have the permutation π such that ∈ S5

π1 = π(1) = 3, π2 = π(2) = 1, π3 = π(3) = 4, π4 = π(4) = 5, π5 = π(5) = 2.

In two-line notation, we would write π as

12345 π = . 31452

It is relatively straightforward to ﬁnd the number of permutations of n elements, i.e., to determine cardinality of the set . To construct an arbitrary permutation of n elements, Sn we can proceed as follows: First, choose an integer i 1,...,n to put into the ﬁrst ∈ { } position. Clearly, we have exactly n possible choices. Next, choose the element to go in the second position. Since we have already chosen one element from the set 1,...,n , there are { } now exactly n 1 remaining choices. Proceeding in this way, we have n 2 choices when − − choosing the third element from the set 1,...,n , then n 3 choices when choosing the { } − fourth element, and so on until we are left with exactly one choice for the nth element. This proves the following theorem.

Theorem 8.1.4. The number of elements in the symmetric group is given by Sn

= n (n 1) (n 2) 3 2 1= n! |Sn| − −

We conclude this section with several examples, including a complete description of the one permutation in , the two permutations in , and the six permutations in . For S1 S2 S3 your own practice, you should (patiently) attempt to list the 4! = 24 permutations in . S4 Example 8.1.5.

1. Given any positive integer n Z , the identity function id : 1,...,n 1,...,n ∈ + { } −→{ } given by id(i) = i, i 1,...,n , is a permutation in . This function can be ∀ ∈ { } Sn 8.1. PERMUTATIONS 105

thought of as the trivial reordering that does not change the order at all, and so we call it the trivial or identity permutation.

2. If n = 1, then, by Theorem 8.1.4, = 1! = 1. Thus, contains only the identity |Sn| S1 permutation.

3. If n = 2, then, by Theorem 8.1.4, = 2! = 2 1 = 2. Thus, there is only one |Sn| non-trivial permutation π in , namely the transformation interchanging the ﬁrst and S2 the second elements in a list. As a function, π(1) = 2 and π(2) = 1, and, in two-line notation, 1 2 1 2 π = = . π1 π2 2 1

4. If n = 3, then, by Theorem 8.1.4, = 3! = 3 2 1 = 6. Thus, there are ﬁve |Sn| non-trivial permutation in . Using two-line notation, we have that S3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 3 = , , , , , S 1 2 3 1 3 2 2 1 3 2 3 1 3 1 2 3 2 1

Keep in mind the fact that each element in is simultaneously both a function and S3 a reordering operation. E.g., the permutation

1 2 3 1 2 3 π = = π1 π2 π3 2 3 1

can be read as defining the reordering that, with respect to the original list, places the second element in the first position, the third element in the second position, and the first element in the third position. This permutation could equally well have been identified by describing its action on the (ordered) list of letters a, b, c. In other words,

1 2 3 a b c = , 2 3 1 b c a

regardless of what the letters a, b, c might happen to represent. 106 CHAPTER 8. PERMUTATIONS AND DETERMINANTS

8.1.2 Composition of permutations

Let n Z be a positive integer and π, σ be permutations. Then, since π and σ ∈ + ∈ Sn are both functions from the set 1,...,n to itself, we can compose them to obtain a new { } function π σ (read as “pi after sigma”) that takes on the values ◦

(π σ)(1) = π(σ(1)), (π σ)(2) = π(σ(2)),... (π σ)(n)= π(σ(n)). ◦ ◦ ◦

In two-line notation, we can write π σ as ◦ 1 2 n 1 2 n 1 2 n or or . π(σ(1)) π(σ(2)) π(σ(n)) πσ(1) πσ(2) πσ(n) πσ1 πσ2 πσn

Example 8.1.6. From , suppose that we have the permutations π and σ given by S3

π(1) = 2, π(2) = 3, π(3)=1 and σ(1) = 1, σ(2) = 3, σ(3) = 2.

Then note that (π σ)(1) = π(σ(1)) = π(1) = 2, ◦ (π σ)(2) = π(σ(2)) = π(3) = 1, ◦ (π σ)(3) = π(σ(3)) = π(2) = 3. ◦ In other words,

1 2 3 1 2 3 1 2 3 1 2 3 = = . 2 3 1 ◦ 1 3 2 π(1) π(3) π(2) 2 1 3

Similar computations (which you should check for your own practice) yield compositions such as 1 2 3 1 2 3 1 2 3 1 2 3 = = , 1 3 2 ◦ 2 3 1 σ(2) σ(3) σ(1) 3 2 1

1 2 3 1 2 3 1 2 3 1 2 3 = = , 2 3 1 ◦ 1 2 3 σ(1) σ(2) σ(3) 2 3 1 8.1. PERMUTATIONS 107

and 1 2 3 1 2 3 1 2 3 1 2 3 = = . 1 2 3 ◦ 2 3 1 id(2) id(3) id(1) 2 3 1 In particular, note that the result of each composition above is a permutation, that composition is not a commutative operation, and that composition with id leaves a permutation unchanged. Moreover, since each permutation π is a bijection, one can always construct an 1 1 inverse permutation π− such that π π− = id. E.g., ◦ 1 2 3 1 2 3 1 2 3 1 2 3 = = . 2 3 1 ◦ 3 1 2 π(3) π(1) π(2) 1 2 3

We summarize the basic properties of composition on the symmetric group in the following theorem.

Theorem 8.1.7. Let n Z be a positive integer. Then the set has the following ∈ + Sn properties.

1. Given any two permutations π, σ , the composition π σ . ∈ Sn ◦ ∈ Sn 2. (Associativity of Composition) Given any three permutations π, σ, τ , ∈ Sn

(π σ) τ = π (σ τ). ◦ ◦ ◦ ◦

3. (Identity Element for Composition) Given any permutation π , ∈ Sn

π id = id π = π. ◦ ◦

4. (Inverse Elements for Composition) Given any permutation π , there exists a ∈ Sn 1 unique permutation π− such that ∈ Sn

1 1 π π− = π− π = id. ◦ ◦

In other words, the set forms a group under composition. Sn Note that the composition of permutations is not commutative in general. In particular, for n 3, it is easy to ﬁnd permutations π and σ such that π σ = σ π. ≥ ◦ ◦ 108 CHAPTER 8. PERMUTATIONS AND DETERMINANTS

8.1.3 Inversions and the sign of a permutation

Let n Z be a positive integer. Then, given a permutation π , it is natural to ask how ∈ + ∈ Sn “out of order” π is in comparison to the identity permutation. One method for quantifying this is to count the number of so-called inversion pairs in π as these describe pairs of objects that are out of order relative to each other.

Deﬁnition 8.1.8. Let π be a permutation. Then an inversion pair (i, j) of π is a ∈ Sn pair of positive integers i, j 1,...,n for which i < j but π(i) > π(j). ∈{ } Note, in particular, that the components of an inversion pair are the positions where the two “out of order” elements occur.

Example 8.1.9. We classify all inversion pairs for elements in : S3 1 2 3 id = has no inversion pairs since no elements are “out of order”. • 1 2 3

1 2 3 π = has the single inversion pair (2, 3) since π(2) = 3 > 2= π(3). • 1 3 2

1 2 3 π = has the single inversion pair (1, 2) since π(1) = 2 > 1= π(2). • 2 1 3

1 2 3 π = has the two inversion pairs (1, 3) and (2, 3) since we have that both • 2 3 1 π(1) = 2 > 1= π(3) and π(2) = 3 > 1= π(3).

1 2 3 π = has the two inversion pairs (1, 2) and (1, 3) since we have that both • 3 1 2 π(1) = 3 > 1= π(2) and π(1) = 3 > 2= π(3).

1 2 3 π = has the three inversion pairs (1, 2), (1, 3), and (2, 3), as you can check. • 3 2 1 Example 8.1.10. As another example, for each i, j 1,...,n with i < j, we deﬁne the ∈ { } transposition t by ij ∈ Sn 1 2 i j n tij = . 1 2 j i n 8.1. PERMUTATIONS 109

In other words, tij is the permutation that interchanges i and j while leaving all other integers ﬁxed in place. One can check that the number of inversions pairs in t is exactly 2(j i) 1. ij − − Thus, the number of inversions in a transposition is always odd. E.g.,

1234 t13 = 3214 has inversion pairs (1, 2), (1, 3), and (2, 3). For the purposes of using permutations in Linear Algebra, the significance of inversion pairs is mainly due to the following fundamental definition. Definition 8.1.11. Let π be a permutation. Then the sign of π, denoted by sign(π), ∈ Sn is defined by

+1, if the number of inversions in π is even sign(π)=( 1)# of inversion pairs in π = . −  1, if the number of inversions in π is odd − We call π an even permutation if sign(π) = +1, whereas π is called an odd permutation if sign(π)= 1. − Example 8.1.12. Based upon the computations in Example 8.1.9 above, we have that

1 2 3 1 2 3 1 2 3 sign = sign = sign = +1 1 2 3 2 3 1 3 1 2 and that 1 2 3 1 2 3 1 2 3 sign = sign = sign = 1. 1 3 2 2 1 3 3 2 1 − Similarly, from Example 8.1.10, it follows that any transposition is an odd permutation. We summarize some of the most basic properties of the sign operation on the symmetric group in the following theorem. Theorem 8.1.13. Let n Z be a positive integer. Then, ∈ + 1. for id the identity permutation, ∈ Sn

sign(id)=+1. 110 CHAPTER 8. PERMUTATIONS AND DETERMINANTS

2. for t a transposition with i, j 1,...,n and i < j, ij ∈ Sn ∈{ }

sign(t )= 1. (8.1) ij −

3. given any two permutations π, σ , ∈ Sn

sign(π σ) = sign(π) sign(σ), (8.2) ◦ 1 sign(π− ) = sign(π). (8.3)

4. the number of even permutations in , when n 2, is exactly 1 n!. Sn ≥ 2 5. the set A of even permutations in forms a group under composition. n Sn

8.2 Determinants

Now that we have developed the appropriate background material on permutations, we are ﬁnally ready to deﬁne the determinant and explore its many important properties.

8.2.1 Summations indexed by the set of all permutations

Given a positive integer n Z , we begin with the following definition: ∈ + n n Definition 8.2.1. Given a square matrix A = (a ) F × , the determinant of A is ij ∈ defined to be det(A)= sign(π)a a a , (8.4) 1,π(1) 2,π(2) n,π(n) π n ∈S where the sum is over all permutations of n elements (i.e., over the symmetric group).

Note that each permutation in the summand of (8.4) permutes the n columns of the n n × matrix.

2 2 Example 8.2.2. Suppose that A F × is the 2 2 matrix ∈ × a a A = 11 12 . a21 a22 8.2. DETERMINANTS 111

To calculate the determinant of A, we ﬁrst list the two permutations in S2:

1 2 1 2 id = and σ = . 1 2 2 1

The permutation id has sign 1, and the permutation σ has sign 1. Thus, the determinant − of A is given by det(A)= a a a a . 11 22 − 12 21

Were one to attempt to compute determinants directly using Equation (8.4), then one would need to sum up n! terms, where each summand is itself a product of n factors. This is an incredibly inefficient method for finding determinants since n! increases in size very rapidly as n increases. E.g., 10! = 3628800. Thus, even if you could compute one summand per second without stopping, it would still take you well over a month to compute the determinant of a 10 10 matrix using Equation (8.4). Fortunately, there are properties of × the determinant (as summarized in Section 8.2.2 below) that can be used to greatly reduce the size of such computations. These properties of the determinant follow from general properties that hold for any summation taken over the symmetric group, which are in turn themselves based upon properties of permutations and the fact that addition and multiplication are commutative operations in the field F (which, as usual, we take to be either R or C).

Let T : V be a function defined on the symmetric group that takes values in Sn → Sn some vector space V . E.g., T (π) could be the term corresponding to the permutation π in Equation 8.4. Then, since the sum T (π) π n ∈S is finite, we are free to reorder the summands. In other words, the sum is independent of the order in which the terms are added, and so we are free to permute the term order without affecting the value of the sum. Some commonly used reorderings of such sums are 112 CHAPTER 8. PERMUTATIONS AND DETERMINANTS

the following:

T (π) = T (σ π) (8.5) ◦ π n π n ∈S ∈S = T (π σ) (8.6) ◦ π n ∈S 1 = T (π− ), (8.7) π n ∈S where σ is a ﬁxed permutation. Equation (8.5) follows from the fact that, if π runs through each permutation in exactly Sn once, then σ π similarly runs through each permutation but in a potentially diﬀerent order. ◦ I.e., the action of σ upon something like Equation (8.4) is that σ merely permutes the permutations that index the terms. Put another way, there is a one-to-one correspondence between permutations in general and permutations composed with σ. Similar reasoning holds for Equations (8.6) and (8.7).

8.2.2 Properties of the determinant

We summarize some of the most basic properties of the determinant below. The proof of the following theorem uses properties of permutations, properties of the sign function on permutations, and properties of sums over the symmetric group as discussed in Section 8.2.1 above. In thinking about these properties, it is useful to keep in mind that, using Equation (8.4), the determinant of an n n matrix A is the sum over all possible ways of selecting n entries × of A, where exactly one element is selected from each row and from each column of A.

Theorem 8.2.3 (Properties of the Determinant). Let n Z be a positive integer, and ∈ + n n suppose that A =(a ) F × is an n n matrix. Then ij ∈ ×

1. det(0n n)=0 and det(In)=1, where 0n n denotes the n n zero matrix and In × × × denotes the n n identity matrix. × 2. det(AT ) = det(A), where AT denotes the transpose of A.

( ,1) ( ,2) ( ,n) n 3. denoting by A , A ,...,A F the columns of A, det(A) is a linear function ∈ 8.2. DETERMINANTS 113

( ,i) of column A , for each i 1,...,n . In other words, if we denote ∈{ }

( ,1) ( ,2) ( ,n) A = A A A | || then, given any scalar z F and any vectors a , a ,...,a ,c,b Fn, ∈ 1 2 n ∈

det [a1 ai 1 zai an] = z det [a1 ai 1 ai an] , || − | || || − | || det [a1 ai 1 b + c an] = det [a1 b an] + det [a1 c an] . || − | || || || || ||

4. det(A) is an antisymmetric function of the columns of A. In other words, given any ( ,1) ( ,2) ( ,n) positive integers 1 i < j n and denoting A = A A A , ≤ ≤ | || ( ,1) ( ,j) ( ,i) ( ,n) det(A)= det A A A A . − || || || 5. if A has two identical columns, det(A)=0.

6. if A has a column of zero’s, det(A)=0.

7. Properties 3–6 also hold when rows are used in place of columns.

n n 8. given any other matrix B F × , ∈

det(AB) = det(A) det(B).

9. if A is either upper triangular or lower triangular,

det(A)= a a a . 11 22 nn

Proof. First, note that Properties 1, 3, 6, and 9 follow directly from the sum given in Equation (8.4). Moreover, Property 5 follows directly from Property 4, and Property 7 follows directly from Property 2. Thus, we only need to prove Properties 2, 4, and 8. Proof of 2. Since the entries of AT are obtained from those of A by interchanging the 114 CHAPTER 8. PERMUTATIONS AND DETERMINANTS

row and column indices, it follows that det(AT ) is given by

det(AT )= sign(π) a a a . π(1),1 π(2),2 π(n),n π n ∈S Using the commutativity of the product in F and Equation (8.3), we see that

T 1 det(A )= sign(π− ) a −1 a −1 a −1 , 1,π (1) 2,π (2) n,π (n) π n ∈S which equals det(A) by Equation (8.7).

( ,1) ( ,j) ( ,i) ( ,n) Proof of 4. Let B = A A A A be the matrix obtained || || || from A by interchanging the ith and the jth column. Then note that

det(B)= sign(π) a a a a . 1,π(1) j,π(i) i,π(j) n,π(n) π n ∈S Deﬁneπ ˜ = π t , and note that π =π ˜ t . In particular, π(i)=˜π(j) and π(j)=˜π(i), from ◦ ij ◦ ij which det(B)= sign(˜π t ) a a a a . ◦ ij 1,π˜(1) i,π˜(i) j,π˜(j) n,π˜(n) π n ∈S It follows from Equations (8.2) and (8.1) that sign(˜π t )= sign (˜π). Thus, using Equa- ◦ ij − tion (8.6), we obtain det(B)= det(A). − Proof of 8. Using the standard expression for the matrix entries of the product AB in terms of the matrix entries of A =(aij) and B =(bij), we have that

n n

det(AB) = sign(π) a b a n b n 1,k1 k1,π(1) n,k k ,π(n) π n k1=1 kn=1 ∈S n n

= a a n sign (π)b b n . 1,k1 n,k k1,π(1) k ,π(n) k1=1 kn=1 π n ∈S

Note that, for ﬁxed k1,...,kn 1,...,n , the sum π n sign (π)bk1,π(1) bkn,π(n) is the ∈ { } ∈S determinant of a matrix composed of rows k1,...,knof B. Thus, by property 5, it follows that this expression vanishes unless the ki are pairwise distinct. In other words, the sum

over all choices of k1,...,kn can be restricted to those sets of indices σ(1),...,σ(n) that are 8.2. DETERMINANTS 115

labeled by a permutation σ . In other words, ∈ Sn

det(AB)= a a sign(π) b b . 1,σ(1) n,σ(n) σ(1),π(1) σ(n),π(n) σ n π n ∈S ∈S Now, proceeding with the same arguments as in the proof of Property 4 but with the role of tij replaced by an arbitrary permutation σ, we obtain

1 det(AB)= sign(σ) a1,σ(1) an,σ(n) sign(π σ− ) b1,π σ−1(1) bn,π σ−1(n) . ◦ ◦ ◦ σ n π n ∈S ∈S Using Equation (8.6), this last expression then becomes (det(A))(det(B)).

Note that Properties 3 and 4 of Theorem 8.2.3 eﬀectively summarize how multiplication by an Elementary Matrix interacts with the determinant operation. These Properties together with Property 9 facilitate numerical computation of determinants for very large matrices.

8.2.3 Further properties and applications

There are many applications of Theorem 8.2.3. We conclude these notes with a few conse- quences that are particularly useful when computing with matrices. In particular, we use the determinant to list several characterizations for matrix invertibility, and, as a corollary, give a method for using determinants to calculate eigenvalues. You should provide a proof of these results for your own beneﬁt.

n n Theorem 8.2.4. Let n Z and A F × . Then the following statements are equivalent: ∈ + ∈ 1. A is invertible.

x1 . 2. denoting x =  . , the matrix equation Ax =0 has only the trivial solution x =0.

xn     x1 b1 . . 3. denoting x =  . , the matrix equation Ax = b has a solution for all b =  .  Fn. ∈ xn bn         116 CHAPTER 8. PERMUTATIONS AND DETERMINANTS

4. A can be factored into a product of elementary matrices.

5. det(A) =0. 6. the rows (or columns) of A form a linearly independent set in Fn.

7. zero is not an eigenvalue of A.

8. the linear operator T : Fn Fn deﬁned by T (x)= Ax, for every x Fn, is bijective. → ∈

1 1 Moreover, should A be invertible, then det(A− )= . det(A)

n n Given a matrix A C × and a complex number λ C, the expression ∈ ∈

P (λ) = det(A λI ) − n

is called the characteristic polynomial of A. Note that P (λ) is a basis independent polynomial of degree n. Thus, as with the determinant, we can consider P (λ) to be associated with the linear map that has matrix A with respect to some basis. Since the eigenvalues of A are exactly those λ C such that A λI is not invertible, the following is then an ∈ − immediate corollary.

Corollary 8.2.5. The roots of the polynomial P (λ) = det(A λI) are exactly the eigenvalues − of A.

8.2.4 Computing determinants with cofactor expansions

As noted in Section 8.2.1, it is generally impractical to compute determinants directly with Equation (8.4). In this section, we brieﬂy describe the so-called cofactor expansions of a determinant. When properly applied, cofactor expansions are particularly useful for computing determinants by hand.

n n Deﬁnition 8.2.6. Let n Z and A F × . Then, for each i, j 1, 2,...,n , the ∈ + ∈ ∈ { } i-j minor of A, denoted Mij, is deﬁned to be the determinant of the matrix obtained by 8.2. DETERMINANTS 117

removing the ith row and jth column from A. Moreover, the i-j cofactor of A is deﬁned to be A =( 1)i+jM . ij − ij Cofactors themselves, though, are not terribly useful unless put together in the right way.

n n Deﬁnition 8.2.7. Let n Z+ and A =(aij) F × . Then, for each i, j 1, 2,...,n , the ∈ ∈ n ∈{ n } th th i row (resp. j column) cofactor expansion of A is the sum aijAij (resp. aijAij). j=1 i=1 n n Theorem 8.2.8. Let n Z and A F × . Then every row and column factor expansion ∈ + ∈ of A is equal to the determinant of A.

Since the determinant of a matrix is equal to every row or column cofactor expansion, one can compute the determinant using a convenient choice of expansions until the calculation is reduced to one or more 2 2 determinants. We close with an example. × Example 8.2.9. By ﬁrst expanding along the second column, we obtain

1 2 3 4 − 4 1 3 1 3 4 42 1 3 − − − =( 1)1+2(2) 3 0 3 +( 1)2+2(2) 3 0 3 . − − − − 30 0 3 − 2 2 3 2 2 3 2 0 2 3 − − − Then, each of the resulting 3 3 determinants can be computed by further expansion: × 4 1 3 − 1+2 3 3 3+2 4 3 3 0 3 =( 1) (1) − +( 1) ( 2) − = 15+6= 9. − − 2 3 − − 3 3 − − 2 2 3 − − 1 3 4 − 2+1 3 4 2+3 1 3 3 0 3 =( 1) (3) − +( 1) ( 3) − =3+12=15. − − 2 3 − − 2 2 2 2 3 − − − It follows that the original determinant is then equal to 2( 9)+2(15) = 48. − − 118 CHAPTER 8. PERMUTATIONS AND DETERMINANTS Exercises for Chapter 8

Calculational Exercises

3 3 1. Let A C × be given by ∈ 1 0 i A =  0 1 0  . i 0 1 − −    (a) Calculate det(A).

(b) Find det(A4).

2. (a) For each permutation π , compute the number of inversions in π, and classify ∈ S3 π as being either an even or an odd permutation.

(b) Use your result from Part (a) to construct a formula for the determinant of a 3 3 × matrix.

3. (a) For each permutation π , compute the number of inversions in π, and classify ∈ S4 π as being either an even or an odd permutation.

(b) Use your result from Part (a) to construct a formula for the determinant of a 4 4 × matrix.

4. Solve for the variable x in the following expression:

1 0 3 x 1 − det − = det 2 x 6  . 3 1 x − − 1 3 x 5  −    5. Prove that the following determinant does not depend upon the value of θ:

sin(θ) cos(θ) 0 det cos(θ) sin(θ) 0  −  sin(θ) cos(θ) sin(θ) + cos(θ) 1  −    8.2. DETERMINANTS 119

6. Given scalars α,β,γ F, prove that the following matrix is not invertible: ∈ sin2(α) sin2(β) sin2(γ) 2 2 2 cos (α) cos (β) cos (γ) 1 1 1     Hint: Compute the determinant.

Proof-Writing Exercises

1. Let a, b, c, d, e, f F be scalars, and suppose that A and B are the following matrices: ∈ a b d e A = and B = . 0 c 0 f

b a c Prove that AB = BA if and only if det − = 0. e d f − 2. Given a square matrix A, prove that A is invertible if and only if AT A is invertible.

n n 3. Prove or give a counterexample: For any n 1 and A, B R × , one has ≥ ∈

det(A + B) = det(A) + det(B).

n n 4. Prove or give a counterexample: For any r R, n 1 and A R × , one has ∈ ≥ ∈

det(rA)= r det(A). Chapter 9

Inner Product Spaces

The abstract deﬁnition of a vector space only takes into account algebraic properties for the addition and scalar multiplication of vectors. For vectors in Rn, for example, we also have geometric intuition involving the length of a vector or the angle formed by two vectors. In this chapter we discuss inner product spaces, which are vector spaces with an inner product deﬁned upon them. Inner products are what allow us to abstract notions such as the length of a vector. We will also abstract the concept of angle via a condition called orthogonality.

9.1 Inner product

In this section, V is a ﬁnite-dimensional, nonzero vector space over F.

Deﬁnition 9.1.1. An inner product on V is a map

, : V V F × → (u, v) u, v → with the following four properties.

1. Linearity in ﬁrst slot: u + v,w = u,w + v,w and au, v = a u, v for all u,v,w V and a F; ∈ ∈ 2. Positivity: v, v 0 for all v V ; ≥ ∈ 3. Positive deﬁniteness: v, v = 0 if and only if v = 0; 120 9.1. INNER PRODUCT 121

4. Conjugate symmetry: u, v = v,u for all u, v V . ∈ Remark 9.1.2. Recall that every real number x R equals its complex conjugate. Hence, ∈ for real vector spaces, conjugate symmetry of an inner product becomes actual symmetry.

Deﬁnition 9.1.3. An inner product space is a vector space over F together with an inner product , . Example 9.1.4. Let V = Fn and u = (u ,...,u ), v = (v ,...,v ) Fn. Then we can 1 n 1 n ∈ deﬁne an inner product on V by setting

n u, v = u v . i i i=1 For F = R, this reduces to the usual dot product, i.e.,

u v = u v + + u v . 1 1 n n

Example 9.1.5. Let V = F[z] be the space of polynomials with coeﬃcients in F. Given f,g F[z], we can deﬁne their inner product to be ∈ 1 f,g = f(z)g(z)dz, 0 where g(z) is the complex conjugate of the polynomial g(z).

For a fixed vector w V , one can define a map T : V F by setting T v = v,w . ∈ → Note that T is linear by Condition 1 of Definition 9.1.1. This implies, in particular, that 0,w = 0 for every w V . By conjugate symmetry, we also have w, 0 = 0. ∈ Lemma 9.1.6. The inner product is anti-linear in the second slot, that is, u, v + w = u, v + u,w and u, av = a u, v for all u,v,w V and a F. ∈ ∈ Proof. For additivity, note that

u, v + w = v + w,u = v,u + w,u = v,u + w,u = u, v + u,w . 122 CHAPTER 9. INNER PRODUCT SPACES

Similarly, for anti-homogeneity, note that

u, av = av,u = a v,u = a v,u = a u, v .

We close this section by noting that the convention in physics is often the exact opposite of what we have deﬁned above. In other words, an inner product in physics is traditionally linear in the second slot and anti-linear in the ﬁrst slot.

9.2 Norms

The norm of a vector in an arbitrary inner product space is the analog of the length or magnitude of a vector in Rn. We formally deﬁne this concept as follows.

Deﬁnition 9.2.1. Let V be a vector space over F. A map

: V R → v v → is a norm on V if the following three conditions are satisﬁed.

1. Positive deﬁniteness: v = 0 if and only if v = 0; 2. Positive Homogeneity: av = a v for all a F and v V ; | | ∈ ∈ 3. Triangle inequality: v + w v + w for all v,w V . ≤ ∈ Remark 9.2.2. Note that, in fact, v 0 for each v V since ≥ ∈

0= v v v + v =2 v . − ≤ −

Next we want to show that a norm can always be deﬁned from an inner product , via the formula v = v, v for all v V. (9.1) ∈ Properties 1 and 2 follow easily from Conditions 1 and 3 of Deﬁnition 9.1.1. The triangle inequality requires more careful proof, though, which we give in Theorem 9.3.4 below. 9.3. ORTHOGONALITY 123

v x3

Figure 9.1: The length of a vector in R3 via Equation 9.2

If we take V = Rn, then the norm deﬁned by the usual dot product is related to the usual notion of length of a vector. Namely, for v =(x ,...,x ) Rn, we have 1 n ∈

v = x2 + + x2 . (9.2) 1 n We illustrate this for the case of R3 in Figure 9.1. While it is always possible to start with an inner product and use it to define a norm, the converse requires more care. In particular, one can prove that a norm can be used to define an inner product via the Equation (9.1) if and only if the norm satisfies the Parallelogram Law (Theorem 9.3.6).

9.3 Orthogonality

Using the inner product, we can now deﬁne the notion of orthogonality, prove that the Pythagorean theorem holds in any inner product space, and use the Cauchy-Schwarz inequality to prove the triangle inequality. In particular, this will show that v = v, v does indeed deﬁne a norm. 124 CHAPTER 9. INNER PRODUCT SPACES

Deﬁnition 9.3.1. Two vectors u, v V are orthogonal (denoted u v) if u, v = 0. ∈ ⊥ Note that the zero vector is the only vector that is orthogonal to itself. In fact, the zero vector is orthogonal to every vector v V . ∈ Theorem 9.3.2 (Pythagorean Theorem). If u, v V , an inner product space, with u v, ∈ ⊥ then deﬁned by v := v, v obeys u + v 2 = u 2 + v 2.

Proof. Suppose u, v V such that u v. Then ∈ ⊥ u + v 2 = u + v,u + v = u 2 + v 2 + u, v + v,u = u 2 + v 2.

Note that the converse of the Pythagorean Theorem holds for real vector spaces since, in that case, u, v + v,u = 2Re u, v = 0. Given two vectors u, v V with v = 0, we can uniquely decompose u into two pieces: ∈ one piece parallel to v and one piece orthogonal to v. This is called an orthogonal decomposition. More precisely, we have

u = u1 + u2,

where u = av and u v for some scalar a F. To obtain such a decomposition, write 1 2⊥ ∈ u = u u = u av. Then, for u to be orthogonal to v, we need 2 − 1 − 2

0= u av, v = u, v a v 2. − −

Solving for a yields a = u, v / v 2 so that u, v u, v u = v + u v . (9.3) v 2 − v 2 This decomposition is particularly useful since it allows us to provide a simple proof for the Cauchy-Schwarz inequality. 9.3. ORTHOGONALITY 125

Theorem 9.3.3 (Cauchy-Schwarz Inequality). Given any u, v V , we have ∈

u, v u v . | |≤

Furthermore, equality holds if and only if u and v are linearly dependent, i.e., are scalar multiples of each other.

Proof. If v = 0, then both sides of the inequality are zero. Hence, assume that v = 0, and consider the orthogonal decomposition

u, v u = v + w v 2 where w v. By the Pythagorean theorem, we have ⊥ u, v 2 u, v 2 u, v 2 u 2 = v + w 2 = | | + w 2 | | . v 2 v 2 ≥ v 2 Multiplying both sides by v 2 and taking the square root then yields the Cauchy-Schwarz inequality. Note that we get equality in the above arguments if and only if w = 0. But, by Equa- tion (9.3), this means that u and v are linearly dependent.

The Cauchy-Schwarz inequality has many diﬀerent proofs. Here is another one.

Alternate proof of Theorem 9.3.3. Given u, v V , consider the norm square of the vector ∈ u + reiθv: 0 u + reiθv 2 = u 2 + r2 v 2 + 2Re(reiθ u, v ). ≤ Since u, v is a complex number, one can choose θ so that eiθ u, v is real. Hence, the right hand side is a parabola ar2 + br + c with real coeﬃcients. It will lie above the real axis, i.e. ar2 + br + c 0, if it does not have any real solutions for r. This is the case when the ≥ discriminant satisﬁes b2 4ac 0. In our case this means − ≤

4 u, v 2 4 u 2 v 2 0. | | − ≤

Moreover, equality only holds if r can be chosen such that u + reiθv = 0, which means that u and v are scalar multiples. 126 CHAPTER 9. INNER PRODUCT SPACES

u+v

v u+v` v` u

Figure 9.2: The triangle inequality in R2

Now that we have proven the Cauchy-Schwarz inequality, we are finally able to verify the triangle inequality. This is the final step in showing that v = v, v does indeed define a norm. We illustrate the triangle inequality in Figure 9.2. Theorem 9.3.4 (Triangle Inequality). For all u, v V we have ∈

u + v u + v . ≤

Proof. By a straightforward calculation, we obtain

u + v 2 = u + v,u + v = u,u + v, v + u, v + v,u = u,u + v, v + u, v + u, v = u 2 + v 2 + 2Re u, v . Note that Re u, v u, v so that, using the Cauchy-Schwarz inequality, we obtain ≤| |

u + v 2 u 2 + v 2 +2 u v =( u + v )2. ≤

Taking the square root of both sides now gives the triangle inequality.

Remark 9.3.5. Note that equality holds for the triangle inequality if and only if v = ru or u = rv for some r 0. Namely, equality in the proof happens only if u, v = u v , which ≥ is equivalent to u and v being scalar multiples of one another. 9.4. ORTHONORMAL BASES 127

u−v u+v

u g

Figure 9.3: The parallelogram law in R2

Theorem 9.3.6 (Parallelogram Law). Given any u, v V , we have ∈

u + v 2 + u v 2 = 2( u 2 + v 2). −

Proof. By direct calculation,

u + v 2 + u v 2 = u + v,u + v + u v,u v − − − = u 2 + v 2 + u, v + v,u + u 2 + v 2 u, v v,u − − = 2( u 2 + v 2).

Remark 9.3.7. We illustrate the parallelogram law in Figure 9.3.

9.4 Orthonormal bases

We now deﬁne the notions of orthogonal basis and orthonormal basis for an inner product space. As we will see later, orthonormal bases have many special properties that allow us to simplify various calculations. 128 CHAPTER 9. INNER PRODUCT SPACES

Deﬁnition 9.4.1. Let V be an inner product space with inner product , . A list of nonzero vectors (e1,...,em) in V is called orthogonal if

e , e =0, for all 1 i = j m. i j ≤ ≤

The list (e1,...,em) is called orthonormal if

e , e = δ , for all i, j =1,...,m, i j i,j

where δij is the Kronecker delta symbol. I.e., δij = 1 if i = j and is zero otherwise. Proposition 9.4.2. Every orthogonal list of nonzero vectors in V is linearly independent. Proof. Let (e ,...,e ) be an orthogonal list of vectors in V , and suppose that a ,...,a F 1 m 1 m ∈ are such that a e + + a e =0. 1 1 m m Then 0= a e + + a e 2 = a 2 e 2 + + a 2 e 2 1 1 m m | 1| 1 | m| m Note that e > 0, for all k =1,...,m, since every e is a nonzero vector. Also, a 2 0. k k | k| ≥ Hence, the only solution to a e + + a e = 0 is a = = a = 0. 1 1 m m 1 m Definition 9.4.3. An orthonormal basis of a finite-dimensional inner product space V is a list of orthonormal vectors that is basis for V . Clearly, any orthonormal list of length dim(V ) is an orthonormal basis for V . Example 9.4.4. The canonical basis for Fn is an orthonormal basis. Example 9.4.5. The list (( 1 , 1 ), ( 1 , 1 )) is an orthonormal basis for R2. √2 √2 √2 − √2 The next theorem allows us to use inner products to find the coefficients of a vector v V ∈ in terms of an orthonormal basis. This result highlights how much easier it is to compute with an orthonormal basis. Theorem 9.4.6. Let (e ,...,e ) be an orthonormal basis for V . Then, for all v V , we 1 n ∈ have v = v, e e + + v, e e 1 1 n n and v 2 = n v, e 2. k=1 | k| 9.5. THE GRAM-SCHMIDT ORTHOGONALIZATION PROCEDURE 129

Proof. Let v V . Since (e ,...,e ) is a basis for V , there exist unique scalars a ,...,a F ∈ 1 n 1 n ∈ such that v = a e + + a e . 1 1 n n Taking the inner product of both sides with respect to e then yields v, e = a . k k k

9.5 The Gram-Schmidt orthogonalization procedure

We now come to a fundamentally important algorithm, which is called the Gram-Schmidt orthogonalization procedure. This algorithm makes it possible to construct, for each list of linearly independent vectors (resp. basis), a corresponding orthonormal list (resp. orthonormal basis).

Theorem 9.5.1. If (v1,...,vm) is a list of linearly independent vectors in V , then there exists an orthonormal list (e1,...,em) such that

span(v1,...,vk) = span(e1,...,ek), for all k =1,...,m. (9.4)

Proof. The proof is constructive, that is, we will actually construct vectors e1,...,em having the desired properties. Since (v ,...,v ) is linearly independent, v = 0 for each k = 1 m k v1 1, 2,...,m. Set e1 = v1 . Then e1 is a vector of norm 1 and satisﬁes Equation (9.4) for k = 1. Next, set v v , e e e = 2 − 2 1 1 . 2 v v , e e 2 − 2 1 1 This is, in fact, the normalized version of the orthogonal decomposition Equation (9.3). I.e.,

w = v v , e e , 2 − 2 1 1 where w e . Note that e = 1 and span(e , e ) = span(v , v ). ⊥ 1 2 1 2 1 2 Now, suppose that e1,...,ek 1 have been constructed such that (e1,...,ek 1) is an or- − − thonormal list and span(v1,...,vk 1) = span(e1,...,ek 1). Then deﬁne − −

vk vk, e1 e1 vk, e2 e2 vk, ek 1 ek 1 ek = − − −− − − . vk vk, e1 e1 vk, e2 e2 vk, ek 1 ek 1 − − −− − −

Since (v1,...,vk) is linearly independent, we know that vk span(v1,...,vk 1). Hence, we ∈ − 130 CHAPTER 9. INNER PRODUCT SPACES

also know that vk span(e1,...,ek 1). It follows that the norm in the deﬁnition of ek is not ∈ − zero, and so ek is well-deﬁned (i.e., we are not dividing by zero). Note that a vector divided by its norm has norm 1 so that e = 1. Furthermore, k

vk vk, e1 e1 vk, e2 e2 vk, ek 1 ek 1 ek, ei = − − −− − − , ei vk vk, e1 e1 vk, e2 e2 vk, ek 1 ek 1 − − −− − − v , e v , e = k i− k i =0, vk vk, e1 e1 vk, e2 e2 vk, ek 1 ek 1 − − −− − − for each 1 i

3 Example 9.5.2. Take v1 = (1, 1, 0) and v2 = (2, 1, 1) in R . The list (v1, v2) is linearly independent (as you should verify!). To illustrate the Gram-Schmidt procedure, we begin by setting v1 1 e1 = = (1, 1, 0). v1 √2 Next, set v v , e e e = 2 − 2 1 1 . 2 v v , e e 2 − 2 1 1 The inner product v , e = 1 (1, 1, 0), (2, 1, 1) = 3 , so 2 1 √2 √2 3 1 u = v v , e e = (2, 1, 1) (1, 1, 0) = (1, 1, 2). 2 2 − 2 1 1 − 2 2 −

Calculating the norm of u , we obtain u = 1 (1+1+4)= √6 . Hence, normalizing this 2 2 4 2 vector, we obtain u2 1 e2 = = (1, 1, 2). u2 √6 − The list (e1, e2) is therefore orthonormal and has the same span as (v1, v2). Corollary 9.5.3. Every ﬁnite-dimensional inner product space has an orthonormal basis.

Proof. Let (v1,...,vn) be any basis for V . This list is linearly independent and spans V .

Apply the Gram-Schmidt procedure to this list to obtain an orthonormal list (e1,...,en), 9.6. ORTHOGONAL PROJECTIONS AND MINIMIZATION PROBLEMS 131

which still spans V by construction. By Proposition 9.4.2, this list is linearly independent and hence a basis of V .

Corollary 9.5.4. Every orthonormal list of vectors in V can be extended to an orthonormal basis of V .

Proof. Let (e1,...,em) be an orthonormal list of vectors in V . By Proposition 9.4.2, this list

is linearly independent and hence can be extended to a basis (e1,...,em, v1,...,vk) of V by the Basis Extension Theorem. Now apply the Gram-Schmidt procedure to obtain a new or-

thonormal basis (e1,...,em, f1,...,fk). The ﬁrst m vectors do not change since they already are orthonormal. The list still spans V and is linearly independent by Proposition 9.4.2 and therefore forms a basis.

Recall Theorem 7.5.3: given an operator T (V,V ) on a complex vector space V , there ∈L exists a basis B for V such that the matrix M(T ) of T with respect to B is upper triangular. We would like to extend this result to require the additional property of orthonormality.

Corollary 9.5.5. Let V be an inner product space over F and T (V,V ). If T is upper- ∈L triangular with respect to some basis, then T is upper-triangular with respect to some orthonormal basis.

Proof. Let (v1,...,vn) be a basis of V with respect to which T is upper-triangular. Apply

the Gram-Schmidt procedure to obtain an orthonormal basis (e1,...,en), and note that

span(e ,...,e ) = span(v ,...,v ), for all 1 k n. 1 k 1 k ≤ ≤

We proved before that T is upper-triangular with respect to a basis (v1,...,vn) if and only if span(v ,...,v ) is invariant under T for each 1 k n. Since these spans are unchanged by 1 k ≤ ≤ the Gram-Schmidt procedure, T is still upper triangular for the corresponding orthonormal basis.

9.6 Orthogonal projections and minimization problems

Definition 9.6.1. Let V be a finite-dimensional inner product space and U V be a subset ⊂ (but not necessarily a subspace) of V . Then the orthogonal complement of U is defined 132 CHAPTER 9. INNER PRODUCT SPACES to be the set

U ⊥ = v V u, v = 0 for all u U . { ∈ | ∈ } Note that, in fact, U ⊥ is always a subspace of V (as you should check!) and that

0 ⊥ = V and V ⊥ = 0 . { } { }

In addition, if U and U are subsets of V satisfying U U , then U ⊥ U ⊥. 1 2 1 ⊂ 2 2 ⊂ 1 Remarkably, if U V is a subspace of V , then we can say quite a bit more about U ⊥. ⊂ Theorem 9.6.2. If U V is a subspace of V , then V = U U ⊥. ⊂ ⊕ Proof. We need to show two things:

1. V = U + U ⊥.

2. U U ⊥ = 0 . ∩ { } To show Condition 1 holds, let (e1,...,em) be an orthonormal basis of U. Then, for all v V , we can write ∈

v = v, e e + + v, e e + v v, e e v, e e . (9.5) 1 1 m m − 1 1 −− m m u w

The vector u U, and ∈

w, e = v, e v, e =0, for all j =1, 2, . . . , m, j j− j

since (e ,...,e ) is an orthonormal list of vectors. Hence, w U ⊥. This implies that 1 m ∈ V = U + U ⊥.

To prove that Condition 2 also holds, let v U U ⊥. Then v has to be orthogonal to ∈ ∩ every vector in U, including to itself, and so v, v = 0. However, this implies v = 0 so that U U ⊥ = 0 . ∩ { } Example 9.6.3. R2 is the direct sum of any two orthogonal lines, and R3 is the direct sum of any plane and any line orthogonal to the plane (as illustrated in Figure 9.4). For example,

R2 = (x, 0) x R (0,y) y R , { | ∈ }⊕{ | ∈ } R3 = (x, y, 0) x, y R (0, 0, z) z R . { | ∈ }⊕{ | ∈ } 9.6. ORTHOGONAL PROJECTIONS AND MINIMIZATION PROBLEMS 133

Figure 9.4: R3 as a direct sum of a plane and a line, as in Example 9.6.3

Another fundamental fact about the orthogonal complement of a subspace is as follows.

Theorem 9.6.4. If U V is a subspace of V , then U =(U ⊥)⊥. ⊂ Proof. First we show that U (U ⊥)⊥. Let u U. Then, for all v U ⊥, we have u, v = 0. ⊂ ∈ ∈ Hence, u (U ⊥)⊥ by the deﬁnition of (U ⊥)⊥. ∈ Next we show that (U ⊥)⊥ U. Suppose 0 = v (U ⊥)⊥ such that v U, and decompose ⊂ ∈ ∈ v according to Theorem 9.6.2, i.e., as

v = u + u U U ⊥ 1 2 ∈ ⊕

with u U and u U ⊥. Then u = 0 since v U. Furthermore, u , v = u ,u = 0. 1 ∈ 2 ∈ 2 ∈ 2 2 2 But then v is not in (U ⊥)⊥, which contradicts our initial assumption. Hence, we must have

that (U ⊥)⊥ U. ⊂

By Theorem 9.6.2, we have the decomposition V = U U ⊥ for every subspace U V . ⊕ ⊂ This allows us to define the orthogonal projection PU of V onto U. Definition 9.6.5. Let U V be a subspace of a finite-dimensional inner product space. ⊂ Every v V can be uniquely written as v = u + w where u U and w U ⊥. Define ∈ ∈ ∈ P : V V, U → v u. → 134 CHAPTER 9. INNER PRODUCT SPACES

2 Note that PU is called a projection operator since it satisﬁes PU = PU . Further, since we also have

range (PU )= U,

null (PU )= U ⊥,

it follows that range (P ) null (P ). Therefore, P is called an orthogonal projection. U ⊥ U U The decomposition of a vector v V as given in Equation (9.5) yields the formula ∈

P v = v, e e + + v, e e , (9.6) U 1 1 m m

where (e1,...,em) is any orthonormal basis of U. Equation (9.6) is a particularly useful tool

for computing such things as the matrix of PU with respect to the basis (e1,...,em). Let us now apply the inner product to the following minimization problem: Given a subspace U V and a vector v V , ﬁnd the vector u U that is closest to the vector v. ⊂ ∈ ∈ In other words, we want to make v u as small as possible. The next proposition shows − that PU v is the closest point in U to the vector v and that this minimum is, in fact, unique.

Proposition 9.6.6. Let U V be a subspace of V and v V . Then ⊂ ∈

v P v v u for every u U. − U ≤ − ∈

Furthermore, equality holds if and only if u = PU v.

Proof. Let u U and set P := P for short. Then ∈ U v P v 2 v P v 2 + P v u 2 − ≤ − − = (v P v)+(P v u) 2 = v u 2, − − −

where the second line follows from the Pythagorean Theorem 9.3.2 since v P v U ⊥ and − ∈ P v u U. Furthermore, equality holds only if P v u 2 = 0, which is equivalent to − ∈ − P v = u.

Example 9.6.7. Consider the plane U R3 through 0 and perpendicular to the vector ⊂ u = (1, 1, 1). Using the standard norm on R3, we can calculate the distance of the point v = (1, 2, 3) to U using Proposition 9.6.6. In particular, the distance d between v and U 9.6. ORTHOGONAL PROJECTIONS AND MINIMIZATION PROBLEMS 135 is given by d = v P v . Let ( 1 u,u ,u ) be a basis for R3 such that (u ,u ) is an − U √3 1 2 1 2 orthonormal basis of U. Then, by Equation (9.6), we have

1 v P v =( v,u u + v,u u + v,u u ) ( v,u u + v,u u ) − U 3 1 1 2 2 − 1 1 2 2 1 = v,u u 3 1 = (1, 2, 3), (1, 1, 1) (1, 1, 1) 3 = (2, 2, 2).

Hence, d = (2, 2, 2) =2√3. 136 CHAPTER 9. INNER PRODUCT SPACES Exercises for Chapter 9

Calculational Exercises

3 1. Let (e1, e2, e3) be the canonical basis of R , and deﬁne

f1 = e1 + e2 + e3

f2 = e2 + e3

f3 = e3.

(a) Apply the Gram-Schmidt process to the basis (f1, f2, f3).

(b) What do you obtain if you instead applied the Gram-Schmidt process to the basis

(f3, f2, f1)?

2. Let [ π, π] = f :[ π, π] R f is continuous denote the inner product space C − { − → | } of continuous real-valued functions deﬁned on the interval [ π, π] R, with inner − ⊂ product given by

π f,g = f(x)g(x)dx, for every f,g [ π, π]. π ∈ C − − Then, given any positive integer n Z , verify that the set of vectors ∈ + 1 sin(x) sin(2x) sin(nx) cos(x) cos(2x) cos(nx) , , ,..., , , ,..., √ π √π √π √π √π √π √π 2 is orthonormal.

3. Let R2[x] denote the inner product space of polynomials over R having degree at most two, with inner product given by

1 f,g = f(x)g(x)dx, for every f,g R [x]. ∈ 2 0 Apply the Gram-Schmidt procedure to the standard basis 1, x, x2 for R [x] in order { } 2 to produce an orthonormal basis for R2[x]. 9.6. ORTHOGONAL PROJECTIONS AND MINIMIZATION PROBLEMS 137

4. Let v , v , v R3 be given by v = (1, 2, 1), v = (1, 2, 1), and v = (1, 2, 1). 1 2 3 ∈ 1 2 − 3 − 3 Apply the Gram-Schmidt procedure to the basis (v1, v2, v3) of R , and call the resulting

orthonormal basis (u1,u2,u3).

5. Let P R3 be the plane containing 0 perpendicular to the vector (1, 1, 1). Using the ⊂ standard norm, calculate the distance of the point (1, 2, 3) to P .

6. Give an orthonormal basis for null(T ), where T (C4) is the map with canonical ∈ L matrix 1111 1111 . 1111   1111     Proof-Writing Exercises

1. Let V be a ﬁnite-dimensional inner product space over F. Given any vectors u, v V , ∈ prove that the following two statements are equivalent:

(a) u, v =0 (b) u u + αv for every α F. ≤ ∈ 2. Let n Z be a positive integer, and let a ,...,a , b ,...,b R be any collection of ∈ + 1 n 1 n ∈ 2n real numbers. Prove that

2 n n n 2 2 bk akbk kak ≤ k k=1 k=1 k=1

3. Prove or disprove the following claim:

Claim. There is an inner product , on R2 whose associated norm is given by the formula (x , x ) = x + x 1 2 | 1| | 2| for every vector (x , x ) R2, where denotes the absolute value function on R. 1 2 ∈ || 138 CHAPTER 9. INNER PRODUCT SPACES

4. Let V be a ﬁnite-dimensional inner product space over R. Given u, v V , prove that ∈ u + v 2 u v 2 u, v = − − . 4

5. Let V be a ﬁnite-dimensional inner product space over C. Given u, v V , prove that ∈ u + v 2 u v 2 u + iv 2 u iv 2 u, v = − − + − − i. 4 4

6. Let V be a ﬁnite-dimensional inner product space over F, and let U be a subspace of

V . Prove that the orthogonal complement U ⊥ of U with respect to the inner product , on V satisﬁes dim(U ⊥) = dim(V ) dim(U). − 7. Let V be a ﬁnite-dimensional inner product space over F, and let U be a subspace of

V . Prove that U = V if and only if the orthogonal complement U ⊥ of U with respect

to the inner product , on V satisﬁes U ⊥ = 0 . { } 8. Let V be a ﬁnite-dimensional inner product space over F, and suppose that P (V ) ∈L is a linear operator on V having the following two properties:

(a) Given any vector v V , P (P (v)) = P (v). I.e., P 2 = P . ∈ (b) Given any vector u null(P ) and any vector v range(P ), u, v = 0. ∈ ∈ Prove that P is an orthogonal projection.

n n 9. Prove or give a counterexample: For any n 1 and A C × , one has ≥ ∈

null(A) = (range(A))⊥.

10. Prove or give a counterexample: The Gram-Schmidt process applied to an an orthonormal list of vectors reproduces that list unchanged. Chapter 10

Change of Bases

In Section 6.6, we saw that linear operators on an n-dimensional vector space are in one- to-one correspondence with n n matrices. This correspondence, however, depends upon × the choice of basis for the vector space. In this chapter we address the question of how the matrix for a linear operator changes if we change from one orthonormal basis to another.

10.1 Coordinate vectors

Let V be a ﬁnite-dimensional inner product space with inner product , and dimension dim(V ) = n. Then V has an orthonormal basis e = (e1,...,en), and, according to Theo- rem 9.4.6, every v V can be written as ∈ n v = v, e e . i i i=1 This induces a map

[ ] : V Fn e → v, e1 . v  .  , →  v, en      139 140 CHAPTER 10. CHANGE OF BASES

which maps the vector v V to the n 1 column vector of its coordinates with respect to ∈ × the basis e. The column vector [v]e is called the coordinate vector of v with respect to the basis e.

Example 10.1.1. Recall that the vector space R1[x] of polynomials over R of degree at most 1 is an inner product space with inner product deﬁned by

1 f,g = f(x)g(x)dx. 0 Then e = (1, √3( 1+2x)) forms an orthonormal basis for R [x]. The coordinate vector of − 1 the polynomial p(x)=3x +2 R [x] is, e.g., ∈ 1 1 7 [p(x)]e = . 2 √3

Note also that the map [ ] is an isomorphism (meaning that it is an injective and e surjective linear map) and that it is also inner product preserving. Denote the usual inner product on Fn by n x, y n = x y . F k k k=1 Then v,w = [v] , [w] n , for all v,w V , V e eF ∈ since

n n v,w = v, e e , w, e e = v, e w, e e , e V i i j j i j i j i,j=1 i,j=1 n n = v, e w, e δ = v, e w, e = [v] , [w] n . i j ij i i e eF i,j=1 i=1 It is important to remember that the map [ ] depends on the choice of basis e =(e ,...,e ). e 1 n 10.2. CHANGE OF BASIS TRANSFORMATION 141 10.2 Change of basis transformation

n n Recall that we can associate a matrix A F × to every operator T (V,V ). More ∈ ∈ L th precisely, the j column of the matrix A = M(T ) with respect to a basis e =(e1,...,en) is obtained by expanding T ej in terms of the basis e. If the basis e is orthonormal, then the coeﬃcient of ei is just the inner product of the vector with ei. Hence,

M(T )=( T ej, ei )1 i,j n, ≤ ≤ where i is the row index and j is the column index of the matrix.

n n Conversely, if A F × is a matrix, then we can associate a linear operator T (V,V ) ∈ ∈L to A by setting

n n n T v = v, e T e = T e , e v, e e j j j i j i j=1 j=1 i=1 n n = a v, e e = (A[v] ) e , ij j i e i i i=1 j=1 i=1 th where (A[v]e)i denotes the i component of the column vector A[v]e. With this construction, we have M(T )= A. The coeﬃcients of T v in the basis (e1,...,en) are recorded by the column vector obtained by multiplying the n n matrix A with the n 1 column vector [v] whose × × e components ([v] ) = v, e . e j j

Example 10.2.1. Given 1 i A = − , i 1 we can deﬁne T (V,V ) with respect to the canonical basis as follows: ∈L z 1 i z z iz T 1 = − 1 = 1 − 2 . z2 i 1 z2 iz1 + z2

Suppose that we want to use another orthonormal basis f =(f1,...,fn) for V . Then, as 142 CHAPTER 10. CHANGE OF BASES

before, we have v = n v, f f . Comparing this with v = n v, e e , we ﬁnd that i=1 i i j=1 j j

n n n v = v, e e , f f = e , f v, e f . j j i i j i j i i,j=1 i=1 j=1 Hence,

[v]f = S[v]e,

where S =(s )n with s = e , f . ij i,j=1 ij j i th The j column of S is given by the coeﬃcients of the expansion of ej in terms of the basis f =(f ,...,f ). The matrix S describes a linear map in (Fn), which is called the change 1 n L of basis transformation.

We may also interchange the role of bases e and f. In this case, we obtain the matrix n R =(rij)i,j=1, where r = f , e . ij j i Then, by the uniqueness of the expansion in a basis, we obtain

[v]e = R[v]f

so that RS[v] = [v] , for all v V . e e ∈ n 1 Since this equation is true for all [v] F , it follows that either RS = I or R = S− . In e ∈ particular, S and R are invertible. We can also check this explicitly by using the properties of orthonormal bases. Namely,

n n (RS) = r s = f , e e , f ij ik kj k i j k k=1 k=1 n = e , f e , f = [e ] , [e ] n = δ . j k i k j f i f F ij k=1 Matrix S (and similarly also R) has the interesting property that its columns are orthonormal to one another. This follows from the fact that the columns are the coordinates of 10.2. CHANGE OF BASIS TRANSFORMATION 143

orthonormal vectors with respect to another orthonormal basis. A similar statement holds for the rows of S (and similarly also R).

2 Example 10.2.2. Let V = C , and choose the orthonormal bases e = (e1, e2) and f =

(f1, f2) with

1 0 e1 = , e2 = , 0 1 1 1 1 1 f1 = , f2 = − . √2 1 √2 1

Then e , f e , f 1 1 1 S = 1 1 2 1 = e1, f2 e2, f2 √2 1 1 − and f , e f , e 1 1 1 R = 1 1 2 1 = − . f1, e2 f2, e2 √2 1 1 One can then check explicitly that indeed

1 1 1 1 1 1 0 RS = − = = I. 2 1 1 1 1 0 1 − So far we have only discussed how the coordinate vector of a given vector v V changes ∈ under the change of basis from e to f. The next question we can ask is how the matrix M(T ) of an operator T (V ) changes if we change the basis. Let A be the matrix of T ∈ L with respect to the basis e =(e1,...,en), and let B be the matrix for T with respect to the

basis f =(f1,...,fn). How do we determine B from A? Note that

[T v]e = A[v]e

so that 1 [T v]f = S[T v]e = SA[v]e = SAR[v]f = SAS− [v]f .

This implies that 1 B = SAS− . 144 CHAPTER 10. CHANGE OF BASES

Example 10.2.3. Continuing Example 10.2.2, let

1 1 A = 1 1 be the matrix of a linear operator with respect to the basis e. Then the matrix B with respect to the basis f is given by

1 1 1 1 1 1 1 1 1 1 1 2 0 2 0 B = SAS− = − = = . 2 1 1 1 1 1 1 2 1 1 2 0 0 0 − − 10.2. CHANGE OF BASIS TRANSFORMATION 145 Exercises for Chapter 10

Calculational Exercises

3 1. Consider R with two orthonormal bases: the canonical basis e = (e1, e2, e3) and the

basis f =(f1, f2, f3), where

1 1 1 f1 = (1, 1, 1), f2 = (1, 2, 1), f3 = (1, 0, 1) . √3 √6 − √2 −

Find the matrix, S, of the change of basis transformation such that

[v] = S[v] , for all v R3 , f e ∈

where [v]b denotes the column vector of v with respect to the basis b.

2. Let v C4 be the vector given by v = (1, i, 1, i). Find the matrix (with respect to ∈ − − the canonical basis on C4) of the orthogonal projection P (C4) such that ∈L

null(P )= v ⊥ . { }

3. Let U be the subspace of R3 that coincides with the plane through the origin that is perpendicular to the vector n = (1, 1, 1) R3. ∈ (a) Find an orthonormal basis for U. (b) Find the matrix (with respect to the canonical basis on R3) of the orthogonal projection P (R3) onto U, i.e., such that range(P )= U. ∈L 4. Let V = C4 with its standard inner product. For θ R, let ∈ 1  eiθ  v = C4. θ 2iθ ∈ e   3iθ e      Find the canonical matrix of the orthogonal projection onto the subspace v ⊥. { θ} 146 CHAPTER 10. CHANGE OF BASES

Proof-Writing Exercises

1. Let V be a ﬁnite-dimensional vector space over F with dimension n Z , and sup- ∈ + pose that b = (v1, v2,...,vn) is a basis for V . Prove that the coordinate vectors n [v1]b, [v2]b,..., [vn]b with respect to b form a basis for F .

2. Let V be a ﬁnite-dimensional vector space over F, and suppose that T (V ) is a ∈ L linear operator having the following property: Given any two bases b and c for V , the matrix M(T, b) for T with respect to b is the same as the matrix M(T,c) for T with respect to c. Prove that there exists a scalar α F such that T = αI , where I ∈ V V denotes the identity map on V . Chapter 11

The Spectral Theorem for Normal Linear Maps

In this chapter we come back to the question of when a linear operator on an inner product space V is diagonalizable. We first introduce the notion of the adjoint (a.k.a. hermitian conjugate) of an operator, and we then use this to define so-called normal operators. The main result of this chapter is the Spectral Theorem, which states that normal operators are diagonal with respect to an orthonormal basis. We use this to show that normal operators are “unitarily diagonalizable” and generalize this notion to finding the singular-value decomposition of an operator.

11.1 Self-adjoint or hermitian operators

Let V be a ﬁnite-dimensional inner product space over C with inner product , . A linear operator T (V ) is uniquely determined by the values of ∈L

Tv,w , for all v,w V . ∈

This means, in particular, that if T,S (V ) and ∈L

Tv,w = Sv,w for all v,w V , ∈

then T = S. To see this, take w to be the elements of an orthonormal basis of V .

147 148 CHAPTER 11. THE SPECTRAL THEOREM FOR NORMAL LINEAR MAPS

Deﬁnition 11.1.1. Given T (V ), the adjoint (a.k.a. hermitian conjugate) of T is ∈ L deﬁned to be the operator T ∗ (V ) for which ∈L

Tv,w = v, T ∗w , for all v,w V ∈

Moreover, we call T self-adjoint (a.k.a. hermitian) if T = T ∗.

The uniqueness of T ∗ is clear by the previous observation.

Example 11.1.2. Let V = C3, and let T (C3) be deﬁned by T (z , z , z ) = (2z + ∈ L 1 2 3 2 iz3, iz1, z2). Then

(y ,y ,y ), T ∗(z , z , z ) = T (y ,y ,y ), (z , z , z ) 1 2 3 1 2 3 1 2 3 1 2 3 = (2y + iy , iy ,y ), (z , z , z ) 2 3 1 2 1 2 3 =2y2z1 + iy3z1 + iy1z2 + y2z3 = (y ,y ,y ), ( iz , 2z + z , iz ) 1 2 3 − 2 1 3 − 1 so that T ∗(z , z , z )=( iz , 2z + z , iz ). Writing the matrix for T in terms of the 1 2 3 − 2 1 3 − 1 canonical basis, we see that

0 2 i 0 i 0 − M(T )= i 0 0 and M(T ∗)=  2 0 1 . 0 1 0 i 0 0   −      Note that M(T ∗) can be obtained from M(T ) by taking the complex conjugate of each element and then transposing. This operation is called the conjugate transpose of M(T ), and we denote it by (M(T ))∗.

We collect several elementary properties of the adjoint operation into the following proposition. You should provide a proof of these results for your own practice.

Proposition 11.1.3. Let S, T (V ) and a F. ∈L ∈

1. (S + T )∗ = S∗ + T ∗.

2. (aT )∗ = aT ∗. 11.2. NORMAL OPERATORS 149

3. (T ∗)∗ = T .

4. I∗ = I.

5. (ST )∗ = T ∗S∗.

6. M(T ∗)= M(T )∗.

When n = 1, note that the conjugate transpose of a 1 1 matrix A is just the complex × conjugate of its single entry. Hence, requiring A to be self-adjoint (A = A∗) amounts to saying that this sole entry is real. Because of the transpose, though, reality is not the same as self-adjointness when n> 1, but the analogy does nonetheless carry over to the eigenvalues of self-adjoint operators.

Proposition 11.1.4. Every eigenvalue of a self-adjoint operator is real.

Proof. Suppose λ C is an eigenvalue of T and that 0 = v V is a corresponding eigenvector ∈ ∈ such that T v = λv. Then

2 λ v = λv,v = Tv,v = v, T ∗v = v,Tv = v,λv = λ v, v = λ v 2. This implies that λ = λ. 2 1+ i Example 11.1.5. The operator T (V ) deﬁned by T (v)= v is self-adjoint, ∈L 1 i 3 − and it can be checked (e.g., using the characteristic polynomial) that the eigenvalues of T are λ =1, 4.

11.2 Normal operators

Normal operators are those that commute with their own adjoint. As we will see, this includes many important examples of operations.

Deﬁnition 11.2.1. We call T (V ) normal if T T ∗ = T ∗T . ∈L

Given an arbitrary operator T (V ), we have that T T ∗ = T ∗T in general. However, ∈ L both T T ∗ and T ∗T are self-adjoint, and any self-adjoint operator T is normal. We now give a diﬀerent characterization for normal operators in terms of norms. 150 CHAPTER 11. THE SPECTRAL THEOREM FOR NORMAL LINEAR MAPS

Proposition 11.2.2. Let V be a complex inner product space, and suppose that T (V ) ∈ L satisﬁes Tv,v =0, for all v V . ∈ Then T =0.

Proof. You should be able to verify that

1 Tu,w = T (u + w),u + w T (u w),u w 4 { − − − +i T (u + iw),u + iw i T (u iw),u iw . − − − } Since each term on the right-hand side is of the form Tv,v , we obtain 0 for each u,w V . ∈ Hence T = 0.

Proposition 11.2.3. Let T (V ). Then T is normal if and only if ∈L

T v = T ∗v , for all v V . ∈

Proof. Note that

T is normal T ∗T T T ∗ =0 ⇐⇒ − (T ∗T T T ∗)v, v =0, for all v V ⇐⇒ − ∈ T T ∗v, v = T ∗Tv,v , for all v V ⇐⇒ ∈ 2 2 T v = T ∗v , for all v V . ⇐⇒ ∈

Corollary 11.2.4. Let T (V ) be a normal operator. ∈L

1. null (T ) = null (T ∗).

2. If λ C is an eigenvalue of T , then λ is an eigenvalue of T ∗ with the same eigenvector. ∈ 3. If λ, C are distinct eigenvalues of T with associated eigenvectors v,w V , respec- ∈ ∈ tively, then v,w =0. Proof. Note that Part 1 follows from Proposition 11.2.3 and the positive deﬁniteness of the norm. 11.3. NORMAL OPERATORS AND THE SPECTRAL DECOMPOSITION 151

To prove Part 2, ﬁrst verify that if T is normal, then T λI is also normal with (T λI)∗ = − − T ∗ λI. Therefore, by Proposition 11.2.3, we have −

0= (T λI)v = (T λI)∗v = (T ∗ λI)v , − − − and so v is an eigenvector of T ∗ with eigenvalue λ. Using Part 2, note that

(λ ) v,w = λv,w v, w = Tv,w v, T ∗w =0. − − −

Since λ = 0 it follows that v,w = 0, proving Part 3. −

11.3 Normal operators and the spectral decomposition

Recall that an operator T (V ) is diagonalizable if there exists a basis B for V such ∈ L that B consists entirely of eigenvectors for T . The nicest operators on V are those that are diagonalizable with respect to some orthonormal basis for V . In other words, these are the operators for which we can ﬁnd an orthonormal basis for V that consists of eigenvectors for T . The Spectral Theorem for ﬁnite-dimensional complex inner product spaces states that this can be done precisely for normal operators.

Theorem 11.3.1 (Spectral Theorem). Let V be a ﬁnite-dimensional inner product space over C and T (V ). Then T is normal if and only if there exists an orthonormal basis for ∈L V consisting of eigenvectors for T .

Proof. (“= ”) Suppose that T is normal. Combining Theorem 7.5.3 and Corollary 9.5.5, there ⇒ exists an orthonormal basis e =(e1,...,en) for which the matrix M(T ) is upper triangular, i.e.,

a11 a1n . . M(T )=  .. .  .

 0 ann     We will show that M(T ) is, in fact, diagonal, which implies that the basis elements e1,...,en are eigenvectors of T . 152 CHAPTER 11. THE SPECTRAL THEOREM FOR NORMAL LINEAR MAPS

n Since M(T )=(aij)i,j=1 with aij = 0 for i > j, we have T e1 = a11e1 and T ∗e1 = n k=1 a1kek. Thus, by the Pythagorean Theorem and Proposition 11.2.3,

n n 2 2 2 2 2 2 a = a e = T e = T ∗e = a e = a , | 11| 11 1 1 1 1k k | 1k| k=1 k=1 from which it follows that a = = a = 0. Repeating this argument, T e 2 = a 2 | 12| | 1n| j | jj| 2 n 2 and T ∗e = a so that a = 0 for all 2 i < j n. Hence, T is diagonal with j k=j | jk| ij ≤ ≤ respect to the basis e, and e1,...,en are eigenvectors of T . (“ =”) Suppose there exists an orthonormal basis (e ,...,e ) for V that consists of eigen- ⇐ 1 n vectors for T . Then the matrix M(T ) with respect to this basis is diagonal. Moreover,

M(T ∗) = M(T )∗ with respect to this basis must also be a diagonal matrix. It follows that

T T ∗ = T ∗T since their corresponding matrices commute:

M(T T ∗)= M(T )M(T ∗)= M(T ∗)M(T )= M(T ∗T ).

The following corollary is the best possible decomposition of a complex vector space V into subspaces that are invariant under a normal operator T . On each subspace null (T λ I), − i the operator T acts just like multiplication by scalar λi. In other words,

T null (T λiI) = λiInull (T λiI). | − −

Corollary 11.3.2. Let T (V ) be a normal operator, and denote by λ ,...,λ the distinct ∈L 1 m eigenvalues of T .

1. V = null (T λ I) null (T λ I). − 1 ⊕⊕ − m

2. If i = j, then null (T λ I) null (T λ I). − i ⊥ − j

As we will see in the next section, we can use Corollary 11.3.2 to decompose the canonical matrix for a normal operator into a so-called “unitary diagonalization”. 11.4. APPLICATIONS OF THE SPECTRAL THEOREM: DIAGONALIZATION 153 11.4 Applications of the Spectral Theorem: diagonalization

Let e = (e ,...,e ) be a basis for an n-dimensional vector space V , and let T (V ). In 1 n ∈ L this section we denote the matrix M(T ) of T with respect to basis e by [T ]e. This is done to emphasize the dependency on the basis e. In other words, we have that

[T v] = [T ] [v] , for all v V, e e e ∈

where v1 . [v]e =  . 

vn     is the coordinate vector for v = v e + + v e with v F. 1 1 n n i ∈ The operator T is diagonalizable if there exists a basis e such that [T ]e is diagonal, i.e., if there exist λ ,...,λ F such that 1 n ∈

λ1 0 . [T ]e =  ..  .

 0 λn     The scalars λ1,...,λn are necessarily eigenvalues of T , and e1,...,en are the corresponding eigenvectors. We summarize this in the following proposition.

Proposition 11.4.1. T (V ) is diagonalizable if and only if there exists a basis (e ,...,e ) ∈L 1 n consisting entirely of eigenvectors of T .

We can reformulate this proposition using the change of basis transformations as follows.

Suppose that e and f are bases of V such that [T ]e is diagonal, and let S be the change of 1 basis transformation such that [v]e = S[v]f . Then S[T ]f S− = [T ]e is diagonal.

Proposition 11.4.2. T (V ) is diagonalizable if and only if there exists an invertible ∈ L 154 CHAPTER 11. THE SPECTRAL THEOREM FOR NORMAL LINEAR MAPS

n n matrix S F × such that ∈ λ1 0 1 . S[T ]f S− =  ..  ,

 0 λn     where [T ]f is the matrix for T with respect to a given arbitrary basis f =(f1,...,fn).

On the other hand, the Spectral Theorem tells us that T is diagonalizable with respect to an orthonormal basis if and only if T is normal. Recall that

[T ∗]f = [T ]f∗ for any orthonormal basis f of V . As before,

n n A∗ =(aji)ij=1, for A =(aij)i,j=1,

T is the conjugate transpose of the matrix A. When F = R, note that A∗ = A is just the T n transpose of the matrix, where A =(aji)i,j=1. The change of basis transformation between two orthonormal bases is called unitary in the complex case and orthogonal in the real case. Let e =(e1,...,en) and f =(f1,...,fn) be two orthonormal bases of V , and let U be the change of basis matrix such that [v]f = U[v]e, for all v V . Then ∈ e , e = δ = f , f = Ue , Ue . i j ij i j i j Since this holds for the basis e, it follows that U is unitary if and only if

Uv,Uw = v,w for all v,w V. (11.1) ∈

This means that unitary matrices preserve the inner product. Operators that preserve the inner product are often also called isometries. Orthogonal matrices also deﬁne isometries.

By the deﬁnition of the adjoint, Uv,Uw = v, U ∗Uw , and so Equation 11.1 implies that isometries are characterized by the property

U ∗U = I, for the unitary case, OT O = I, for the orthogonal case. 11.4. APPLICATIONS OF THE SPECTRAL THEOREM: DIAGONALIZATION 155

1 The equation U ∗U = I implies that U − = U ∗. For ﬁnite-dimensional inner product spaces, the left inverse of an operator is also the right inverse, and so

UU ∗ = I if and only if U ∗U = I, (11.2) OOT = I if and only if OT O = I.

It is easy to see that the columns of a unitary matrix are the coeﬃcients of the elements of an orthonormal basis with respect to another orthonormal basis. Therefore, the columns are orthonormal vectors in Cn (or in Rn in the real case). By Condition (11.2), this is also true for the rows of the matrix. The Spectral Theorem tells us that T (V ) is normal if and only if [T ] is diagonal ∈ L e with respect to an orthonormal basis e for V , i.e., if there exists a unitary matrix U such that λ1 0 . UT U ∗ =  ..  .

 0 λn     Conversely, if a unitary matrix U exists such that UT U ∗ = D is diagonal, then

T T ∗ T ∗T = U ∗(DD DD)U =0 − − since diagonal matrices commute, and hence T is normal. Let us summarize some of the deﬁnitions that we have seen in this section.

n n Deﬁnition 11.4.3. Given a square matrix A F × , we call ∈

1. symmetric if A = AT .

2. Hermitian if A = A∗.

3. orthogonal if AAT = I.

4. unitary if AA∗ = I.

Note that every type of matrix in Deﬁnition 11.4.3 is an example of a normal operator. 156 CHAPTER 11. THE SPECTRAL THEOREM FOR NORMAL LINEAR MAPS

An example of a normal operator N that is neither Hermitian nor unitary is

1 1 N = i − − . 1 1 −

You can easily verify that NN ∗ = N ∗N and that iN is symmetric (not Hermitian).

Example 11.4.4. Consider the matrix

2 1+ i A = 1 i 3 − from Example 11.1.5. To unitarily diagonalize A, we need to find a unitary matrix U and a 1 2 diagonal matrix D such that A = UDU − . To do this, we need to first find a basis for C that consists entirely of orthonormal eigenvectors for the linear map T (C2) defined by ∈ L T v = Av, for all v C2. ∈ To find such an orthonormal basis, we start by finding the eigenspaces of T . We already 1 0 determined that the eigenvalues of T are λ1 = 1 and λ2 = 4, so D = . It follows that 0 4

C2 = null (T I) null (T 4I) − ⊕ − = span(( 1 i, 1)) span((1 + i, 2)). − − ⊕ Now apply the Gram-Schmidt procedure to each eigenspace in order to obtain the columns of U. Here,

1 1 i 1+i 1 i 1+i − − − − − 1 √3 √6 1 0 √3 √6 A = UDU − = 1 2 0 4 1 2 √3 √6 √3 √6 1 i 1+i 1+i 1 − − − √3 √6 1 0 √3 √3 = 1 2 1 i 2 . 0 4 − √3 √6 √6 √6

As an application, note that such diagonal decomposition allows us to easily compute 1 powers and the exponential of matrices. Namely, if A = UDU − with D diagonal, then we 11.5. POSITIVE OPERATORS 157

have

n 1 n n 1 A =(UDU − ) = UD U − ,

∞ ∞ 1 k 1 k 1 1 exp(A)= A = U D U − = U exp(D)U − . k! k! k=0 k=0 Example 11.4.5. Continuing Example 11.4.4,

2 1 2 2 1 1 0 6 5+5i A =(UDU − ) = UD U − = U U ∗ = , 0 16 5 5i 11 −

2 n 1 1+i 2n n 1 n n 1 1 0 3 (1+2 − ) 3 ( 1+2 ) A =(UDU − ) = UD U − = U U ∗ = − , 2n 1 i 2n 1 2n+1 0 2 − ( 1+2 ) (1+2 ) 3 − 3

4 4 4 1 e 0 1 1 2e + e e e + i(e e) exp(A)= U exp(D)U − = U U − = − − . 0 e4 3 e4 e + i(e e4) e +2e4 − − 11.5 Positive operators

Recall that self-adjoint operators are the operator analog for real numbers. Let us now deﬁne the operator analog for positive (or, more precisely, nonnegative) real numbers.

Deﬁnition 11.5.1. An operator T (V ) is called positive (denoted T 0) if T = T ∗ ∈ L ≥ and Tv,v 0 for all v V . ≥ ∈ (If V is a complex vector space, then the condition of self-adjointness follows from the condition Tv,v 0 and hence can be dropped). ≥

Example 11.5.2. Note that, for all T (V ), we have T ∗T 0 since T ∗T is self-adjoint ∈ L ≥ and T ∗Tv,v = Tv,Tv 0. ≥ Example 11.5.3. Let U V be a subspace of V and P be the orthogonal projection onto ⊂ U U. Then P 0. To see this, write V = U U ⊥ and v = u + u⊥ for each v V , where U ≥ ⊕ v v ∈ u U and u⊥ U ⊥. Then P v,w = u ,u + u⊥ = u ,u = u + u⊥,u = v,P w v ∈ v ∈ U v w w v w v v w U so that P ∗ = P . Also, setting v = w in the above string of equations, we obtain P v, v = U U U u ,u 0, for all v V . Hence, P 0. v v ≥ ∈ U ≥ 158 CHAPTER 11. THE SPECTRAL THEOREM FOR NORMAL LINEAR MAPS

If λ is an eigenvalue of a positive operator T and v V is an associated eigenvector, then ∈ Tv,v = λv,v = λ v, v 0. Since v, v 0 for all vectors v V , it follows that λ 0. ≥ ≥ ∈ ≥ This fact can be used to deﬁne √T by setting

√T ei = λiei, where λi are the eigenvalues of T with respect to the orthonormal basis e =(e1,...,en). We know that these exist by the Spectral Theorem.

11.6 Polar decomposition

Continuing the analogy between C and (V ), recall the polar form of a complex number L z = z eiθ, where z is the absolute value or modulus of z and eiθ lies on the unit circle in | | | | R2. In terms of an operator T (V ), where V is a complex inner product space, a unitary ∈L operator U takes the role of eiθ, and T takes the role of the modulus. As in Section 11.5, | | T ∗T 0 so that T := √T T exists and satisﬁes T 0 as well. ≥ | | ∗ | | ≥

Theorem 11.6.1. For each T (V ), there exists a unitary U such that ∈L

T = U T . | |

This is called the polar decomposition of T .

Sketch of proof. We start by noting that

T v 2 = T v 2, | |

since Tv,Tv = v, T ∗T v = √T T v, √T T v . This implies that null (T ) = null ( T ). By ∗ ∗ | | the Dimension Formula, this also means that dim(range (T )) = dim(range( T )). Moreover, | | we can deﬁne an isometry S : range( T ) range (T ) by setting | | →

S( T v)= T v. | |

The trick is now to deﬁne a unitary operator U on all of V such that the restriction of U 11.7. SINGULAR-VALUE DECOMPOSITION 159

onto the range of T is S, i.e., | | U range ( T ) = S. | | | Note that null ( T ) range( T ), i.e., for v null ( T ) and w = T u range( T ), | | ⊥ | | ∈ | | | | ∈ | |

w, v = T u, v = u, T v = u, 0 =0 | | | | since T is self-adjoint. | | Pick an orthonormal basis e = (e ,...,e ) of null ( T ) and an orthonormal basis f = 1 m | | (f ,...,f ) of (range(T ))⊥. Set Se˜ = f , and extend S˜ to all of null ( T ) by linearity. Since 1 m i i | | null ( T ) range ( T ), any v V can be uniquely written as v = v +v , where v null ( T ) | | ⊥ | | ∈ 1 2 1 ∈ | | and v range ( T ). Now deﬁne U : V V by setting Uv = Sv˜ + Sv . Then U is an 2 ∈ | | → 1 2 isometry. Moreover, U is also unitary, as shown by the following calculation application of the Pythagorean theorem:

Uv 2 = Sv˜ + Sv 2 = Sv˜ 2 + Sv 2 1 2 1 2 = v 2 + v 2 = v 2. 1 2

Also, note that U T = T by construction since U null ( T ) is irrelevant. | | | | |

11.7 Singular-value decomposition

The singular-value decomposition generalizes the notion of diagonalization. To unitarily diagonalize T (V ) means to ﬁnd an orthonormal basis e such that T is diagonal with ∈ L respect to this basis, i.e.,

λ1 0 . M(T ; e, e) = [T ]e =  ..  ,

 0 λn     where the notation M(T ; e, e) indicates that the basis e is used both for the domain and codomain of T . The Spectral Theorem tells us that unitary diagonalization can only be 160 CHAPTER 11. THE SPECTRAL THEOREM FOR NORMAL LINEAR MAPS done for normal operators. In general, we can ﬁnd two orthonormal bases e and f such that

s1 0 . M(T ; e, f)=  ..  ,

 0 sn     which means that T ei = sifi even if T is not normal. The scalars si are called singular values of T . If T is diagonalizable, then these are the absolute values of the eigenvalues.

Theorem 11.7.1. All T (V ) have a singular-value decomposition. That is, there exist ∈ L orthonormal bases e =(e1,...,en) and f =(f1,...,fn) such that

T v = s v, e f + + s v, e f , 1 1 1 n n n where si are the singular values of T .

Proof. Since T 0, it is also also self-adjoint. Thus, by the Spectral Theorem, there is an | | ≥ orthonormal basis e =(e ,...,e ) for V such that T e = s e . Let U be the unitary matrix 1 n | | i i i in the polar decomposition of T . Since e is orthonormal, we can write any vector v V as ∈

v = v, e e + + v, e e , 1 1 n n and hence T v = U T v = s v, e Ue + + s v, e Ue . | | 1 1 1 n n n Now set f = Ue for all 1 i n. Since U is unitary, (f ,...,f ) is also an orthonormal i i ≤ ≤ 1 n basis, proving the theorem. 11.7. SINGULAR-VALUE DECOMPOSITION 161 Exercises for Chapter 11

Calculational Exercises

3 1. Consider R with two orthonormal bases: the canonical basis e = (e1, e2, e3) and the

basis f =(f1, f2, f3), where

1 1 1 f1 = (1, 1, 1), f2 = (1, 2, 1), f3 = (1, 0, 1) . √3 √6 − √2 −

Find the canonical matrix, A, of the linear map T (R3) with eigenvectors f , f , f ∈L 1 2 3 and eigenvalues 1, 1/2, 1/2, respectively. −

2. For each of the following matrices, verify that A is Hermitian by showing that A = A∗, 1 ﬁnd a unitary matrix U such that U − AU is a diagonal matrix, and compute exp(A).

4 1 i 3 i 6 2+2i (a) A = − (b) A = − (c) A = 1+ i 5 i 3 2 2i 4 − i i 2 − 50 0 √2 √2 0 3+ i i  − 2 0  (d) A = (e) A =  0 1 1+ i  (f) A = √2 3 i 3 − − − − 0 1 i 0  i 0 2   − −   √2        3. For each of the following matrices, either ﬁnd a matrix P (not necessarily unitary) 1 such that P − AP is a diagonal matrix, or show why no such matrix exists.

19 9 6 1 4 2 5 0 0 − − − − (a) A = 25 11 9 (b) A = 34 0 (c) A = 1 5 0  − −   −    17 9 4 31 3 0 1 5  − −   −          0 0 0 i 1 1 0 0 i − (d) A = 0 0 0 (e) A = i 1 1 (f) A = 4 0 i    −    3 0 1 i 1 1 0 0 i    −          162 CHAPTER 11. THE SPECTRAL THEOREM FOR NORMAL LINEAR MAPS

4. Let r R and let T (C2) be the linear map with canonical matrix ∈ ∈L 1 1 T = − . 1 r − (a) Find the eigenvalues of T . (b) Find an orthonormal basis of C2 consisting of eigenvectors of T .

5. Let A be the complex matrix given by:

50 0 A = 0 1 1+ i  − −  0 1 i 0  − −    (a) Find the eigenvalues of A. (b) Find an orthonormal basis of eigenvectors of A. (c) Calculate A = √A A. | | ∗ (d) Calculate eA.

6. Let θ R, and let T (C2) have canonical matrix ∈ ∈L 1 eiθ M(T )= . iθ e− 1 − (a) Find the eigenvalues of T . (b) Find an orthonormal basis for C2 that consists of eigenvectors for T .

Proof-Writing Exercises

1. Prove or give a counterexample: The product of any two self-adjoint operators on a ﬁnite-dimensional vector space is self-adjoint.

2. Prove or give a counterexample: Every unitary matrix is invertible. 11.7. SINGULAR-VALUE DECOMPOSITION 163

3. Let V be a ﬁnite-dimensional vector space over F, and suppose that T (V ) satisﬁes ∈L T 2 = T . Prove that T is an orthogonal projection if and only if T is self-adjoint.

4. Let V be a ﬁnite-dimensional inner product space over C, and suppose that T (V ) ∈L has the property that T ∗ = T . (We call T a skew Hermitian operator on V .) − (a) Prove that the operator iT (V ) deﬁned by (iT )(v)= i(T (v)), for each v V , ∈L ∈ is Hermitian. (b) Prove that the canonical matrix for T can be unitarily diagonalized. (c) Prove that T has purely imaginary eigenvalues.

5. Let V be a ﬁnite-dimensional vector space over F, and suppose that S, T (V ) are ∈ L positive operators on V . Prove that S + T is also a positive operator on T .

6. Let V be a ﬁnite-dimensional vector space over F, and let T (V ) be any operator ∈L on V . Prove that T is invertible if and only if 0 is not a singular value of T . Supplementary Notes on Matrices and Linear Systems

As discussed in Chapter 1, there are many ways in which you might try to solve a system of linear equation involving a finite number of variables. These supplementary notes are intended to illustrate the use of Linear Algebra in solving such systems. In particular, any arbitrary number of equations in any number of unknowns — as long as both are finite — can be encoded as a single matrix equation. As you will see, this has many computational advantages, but, perhaps more importantly, it also allows us to better understand linear systems abstractly. Specifically, by exploiting the deep connection between matrices and so- called linear maps, one can completely determine all possible solutions to any linear system. These notes are also intended to provide a self-contained introduction to matrices and important matrix operations. As you read the sections below, remember that a matrix is, in general, nothing more than a rectangular array of real or complex numbers. Matrices are not linear maps. Instead, a matrix can (and will often) be used to define a linear map.

12.1 From linear systems to matrix equations

We begin this section by reviewing the definition of and notation for matrices. We then review several different conventions for denoting and studying systems of linear equations, the most fundamental being as a single matrix equation. This point of view has a long history of exploration, and numerous computational devices — including several computer programming languages — have been developed and optimized specifically for analyzing matrix equations.

164 12.1. FROM LINEAR SYSTEMS TO MATRIX EQUATIONS 165

12.1.1 Deﬁnition of and notation for matrices

Let m, n Z be positive integers, and, as usual, let F denote either R or C. Then we begin ∈ + by deﬁning an m n matrix A to be a rectangular array of numbers ×

a a 11 1n m,n (i,j) m,n . .. . A = (aij)i,j=1 = (A )i,j=1 =  . . .  m numbers  am1 amn      n numbers  where each element a F in the array is called an entry of A (speciﬁcally, a is called ij ∈ ij the “i, j entry”). We say that i indexes the rows of A as it ranges over the set 1,...,m { } and that j indexes the columns of A as it ranges over the set 1,...,n . We also say that { } the matrix A has size m n and note that it is a (ﬁnite) sequence of doubly-subscripted × numbers for which the two subscripts in no way depend upon each other.

m n Deﬁnition 12.1.1. Given positive integers m, n Z , we use F × to denote the set of all ∈ + m n matrices having entries over F ×

1 02 2 3 2 3 Example 12.1.2. The matrix A = C × , but A / R × since the “2, 3” entry 1 3 i ∈ ∈ − of A is not in R.

Given the ubiquity of matrices in both abstract and applied mathematics, a rich vocabulary has been developed for describing various properties and features of matrices. In addition, there is also a rich set of equivalent notations. For the purposes of these notes, we will use the above notation unless the size of the matrix is understood from context or is unimportant. In this case, we will drop much of this notation and denote a matrix simply as

A =(aij) or A =(aij)m n. × To get a sense of the essential vocabulary, suppose that we have an m n matrix A =(a ) × ij with m = n. Then we call A a square matrix. The elements a11, a22,...,ann in a square

matrix form the main diagonal of A, and the elements a1n, a2,n 1,...,an1 form what is − sometimes called the skew main diagonal of A. Entries not on the main diagonal are 166 CHAPTER 12. SUPPLEMENTARY NOTES also often called oﬀ-diagonal entries, and a matrix whose oﬀ-diagonal entries are all zero is called a diagonal matrix. It is common to call a12, a23,...,an 1,n the superdiagonal of A − and a21, a32,...,an,n 1 the subdiagonal of A. The motivation for this terminology should − be clear if you create a sample square matrix and trace the entries within these particular subsequences of the matrix. Square matrices are important because they are fundamental to applications of Linear Algebra. In particular, virtually every use of Linear Algebra either involves square matrices directly or employs them in some indirect manner. In addition, virtually every usage also involves the notion of vector, where here we mean either an m 1 matrix (a.k.a. a row × vector)ora1 n matrix (a.k.a. a column vector). ×

Example 12.1.3. Suppose that A = (aij), B = (bij), C = (cij), D = (dij), and E = (eij) are the following matrices over F:

3 1 5 2 6 1 3 4 1 A =  1 , B = − , C = 1, 4, 2 ,D =  1 0 1 , E =  1 1 2 . − 0 2 − − 1 3 2 4 4 1 3             Then we say that A is a 3 1 matrix (a.k.a. a column vector), B is a 2 2 square matrix, × × C is a 1 3 matrix (a.k.a. a row vector), and both D and E are square 3 3 matrices. × × Moreover, only B is an upper-triangular matrix (as deﬁned below), and none of the matrices in this example are diagonal matrices. We can discuss individual entries in each matrix. E.g.,

the 2th row of D is d = 1, d = 0, and d = 1. • 21 − 22 23 the main diagonal of D is the sequence d =1,d =0,d = 4. • 11 22 33 the skew main diagonal of D is the sequence d =2,d =0,d = 3. • 13 22 31 the oﬀ-diagonal entries of D are (by row) d , d , d , d , d , and d . • 12 13 21 23 31 32 the 2th column of E is e = e = e = 1. • 12 22 32 the superdiagonal of E is the sequence e =1, e = 2. • 12 23 the subdiagonal of E is the sequence e = 1, e = 1. • 21 − 32 12.1. FROM LINEAR SYSTEMS TO MATRIX EQUATIONS 167

n n A square matrix A =(a ) F × is called upper triangular (resp. lower triangular) ij ∈ if a = 0 for each pair of integers i, j 1,...,n such that i > j (resp. i < j). In other ij ∈ { } words, A is triangular if it has the form

a a a a a 0 0 0 11 12 13 1n 11  0 a22 a23 a2n a21 a22 0 0   0 0 a33 a3n or a31 a32 a33 0  .  . . . . .   . . . . .   ......   ......   . . . .   . . . .       0 0 0 ann an1 an2 an3 ann         Note that a diagonal matrix is simultaneously both an upper triangular matrix and a lower triangular matrix. Two particularly important examples of diagonal matrices are deﬁned as follows: Given any positive integer n Z , we can construct the identity matrix I and the zero matrix ∈ + n 0n n by setting ×

1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0     0 0 1 0 0 0 0 0 0 0     In = . . . . . . and 0n n = . . . . . . , ......  × ......          0 0 0 1 0 0 0 0 0 0     0 0 0 0 1 0 0 0 0 0         where each of these matrices is understood to be a square matrix of size n n. The zero × matrix 0m n is analogously deﬁned for any m, n Z+ and has size m n. I.e., × ∈ ×

0 0 0 0 0 0 0 0 0 0   0 0 0 0 0    0m n = . . . . . . m rows × ......      0 0 0 0 0   0 0 0 0 0      n columns  168 CHAPTER 12. SUPPLEMENTARY NOTES

12.1.2 Using matrices to encode linear systems

Let m, n Z be positive integers. Then a system of m linear equations in n unknowns ∈ + x1,...,xn looks like

a x + a x + a x + + a x = b 11 1 12 2 13 3 1n n 1 a21x1 + a22x2 + a23x3 + + a2nxn = b2   a31x1 + a32x2 + a33x3 + + a3nxn = b3  , (12.3)  .  .  a x + a x + a x + + a x = b  m1 1 m2 2 m3 3 mn n m   where each a , b F is a scalar for i =1, 2,...,m and j =1, 2,...,n . In other words, each ij i ∈ scalar b ,...,b F is being written as a linear combination of the unknowns x ,...,x 1 m ∈ 1 n using coefficients from the field F. To solve System (12.3) means to describe the set of all possible values for x1,...,xn (when thought of as scalars in F) such that each of the m equations in System (12.3) is satisfied simultaneously. Rather than dealing directly with a given linear system, it is often convenient to first encode the system using less cumbersome notation. Specifically, System (12.3) can be summarized using exactly three matrices. First, we collect the coefficients from each equation m n into the m n matrix A =(a ) F × , which we call the coefficient matrix for the linear × ij ∈ system. Similarly, we assemble the unknowns x , x ,...,x into an n 1 column vector 1 2 n × x = (x ) Fn, and the right-hand sides b , b ,...,b of the equation are used to form an i ∈ 1 2 m m 1 column vector b =(b ) Fm. In other words, × i ∈ a a a x b 11 12 1n 1 1 a21 a22 a2n x2 b2 A =   , x =   , and b =   ......  . . . .   .   .        a a a  x  b   m1 m2 mn  n  m       Then the left-hand side of the ith equation in System (12.3) can be recovered by taking the dot product (a.k.a. Euclidean inner product) of x with the ith row in A:

n a a a x = a x = a x + a x + a x + + a x . i1 i2 in ij j i1 1 i2 2 i3 3 in n j=1 12.1. FROM LINEAR SYSTEMS TO MATRIX EQUATIONS 169

In general, we can extend the dot product between two vectors in order to form the product of any two matrices (as in Section 12.2.2). For the purposes of this section, though, m n n it suffices to simply define the product of the matrix A F × and the vector x F to be ∈ ∈ a a a x a x + a x + + a x 11 12 1n 1 11 1 12 2 1n n  a21 a22 a2n  x2   a21x1 + a22x2 + + a2nxn  Ax = = . (12.4) ......  . . . .   .   .        a a a  x  a x + a x + + a x   m1 m2 mn  n  m1 1 m2 2 mn n       Then, since each entry in the resulting m 1 column vector Ax Fm corresponds exactly to × ∈ the left-hand side of each equation in System 12.3, we have effectively encoded System (12.3) as the single matrix equation

a11x1 + a12x2 + + a1nxn b1  a21x1 + a22x2 + + a2nxn  . Ax = = . = b. (12.5) .  .   .    bm a x + a x + + a x     m1 1 m2 2 mn n     Example 12.1.4. The linear system

x + 6x + 4x 2x = 14 1 2 5 − 6 x + 3x + x = 3 . 3 5 6 −   x4 + 5x5 + 2x6 = 11 

 has three equations and involves the six variables x1, x2,...,x6. One can check that possible solutions to this system include

x1 14 x1 6

x2  0  x2  1  x3  3 x3  9   = −  and   = −  . x4  11  x4  5       −          x6  0  x6  2          x6  0  x6  3                  Note that, in describing these solutions, we have used the six unknowns x1, x2,...,x6 to 170 CHAPTER 12. SUPPLEMENTARY NOTES form the 6 1 column vector x = (x ) F6. We can similarly form the coeﬃcient matrix × i ∈ 3 6 3 A F × and the 3 1 column vector b F , where ∈ × ∈ 16004 2 b 14 − 1 A = 00103 1 and b = 3 .    2 −  00015 2 b 11    3         You should check that, given these matrices, each of the solutions given above satisﬁes Equation (12.5).

We close this section by mentioning another common conventions for encoding linear systems. Specifically, rather than attempt to solve Equation (12.3) directly, one can instead look at the equivalent problem of describing all coefficients x ,...,x F for which the 1 n ∈ following vector equation is satisfied:

a11 a12 a13 a1n b1

 a21   a22   a23   a2n   b2  x1  a31  + x2  a32  + x3  a33  + + xn  a3n  =  b3  . (12.6)  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .            am1 am2 am3 amn bm                     ( ,j) This approach emphasizes analysis of the so-called column vectors A (j =1,...,n) of the coeﬃcient matrix A in the matrix equation Ax = b. (See in Section 12.2 for more details about how Equation (12.6) is formed). Conversely, it is also common to directly encounter Equation (12.6) when studying certain questions about vectors in Fn.

It is important to note that System (12.3) diﬀers from Equations (12.5) and (12.6) only in terms of notation. The common aspect of these diﬀerent representations is that the left-hand side of each equation in System (12.3) is a linear sum. Because of this, it is also common to rewrite System (12.3) using more compact notation such as

n n n n

a1kxk = b1, a2kxk = b2, a3kxk = b3,..., amkxk = bm. k=1 k=1 k=1 k=1 12.2. MATRIX ARITHMETIC 171 12.2 Matrix arithmetic

m n In this section, we examine algebraic properties of the set F × (where m, n Z ). Specifi- ∈ + m n cally, F × forms a vector space under the operations of component-wise addition and scalar multiplication, and it is isomorphic to Fmn as a vector space. We also define a multiplication operation between matrices of compatible size and show m n that this multiplication operation interacts with the vector space structure on F × in a n n natural way. In particular, F × forms an algebra over F with respect to these operations. (See Section B.3 for the definition of an algebra.)

12.2.1 Addition and scalar multiplication

Let A = (a ) and B = (b ) be m n matrices over F (where m, n Z ), and let α F. ij ij × ∈ + ∈ Then matrix addition A+B = ((a+b)ij)m n and scalar multiplication αA = ((αa)ij)m n × × are both deﬁned component-wise, meaning

(a + b)ij = aij + bij and (αa)ij = αaij.

Equivalently, A + B is the m n matrix given by ×

a11 a1n b11 b1n a11 + b11 a1n + b1n . . . . . . . . .  . .. .  +  . .. .  =  . .. .  ,

am1 amn bm1 bmn am1 + bm1 amn + bmn             and αA is the m n matrix given by ×

a11 a1n αa11 αa1n . . . . . . α  . .. .  =  . .. .  .

am1 amn αam1 αamn         Example 12.2.1. With notation as in Example 12.1.3,

7 6 5 D + E = 2 1 3 ,  −  7 3 7     172 CHAPTER 12. SUPPLEMENTARY NOTES and no two other matrices from Example 12.1.3 can be added since their sizes are not compatible. Similarly, we can make calculations like

5 4 1 0 0 0 − − D E = D +( 1)E = 0 1 1 and 0D =0E = 0 0 0 =03 3. − −  − −    × 1 1 1 0 0 0  −       

It is important to note that, while these are not the only ways of deﬁning addition m n and scalar multiplication operations on F × , the above operations have the advantage of m n m n endowing F × with a reasonably natural vector space structure. As a vector space, F × is seen to have dimension mn since we can build the standard basis matrices

E11, E12,...,E1n, E21, E22,...,E2n,...,Em1, Em2,...,Emn

mn (k,ℓ) by analogy to the standard basis for F . That is, each Ekℓ = ((e )ij) satisﬁes

(k,ℓ) 1, if i = k and j = ℓ (e )ij = . 0, otherwise 

m n mn This allows us to build a vector space isomorphism F × F using a bijection that → m n simply “lays each matrix out ﬂat”. In other words, given A =(a ) F × , ij ∈ a a 11 1n . . . mn  . .. .  (a11, a12,...,a1n, a21, a22,...,a2n,...,am1, am2,...,amn) F . → ∈ am1 amn    

2 3 Example 12.2.2. The vector space R × of 2 3 matrices over R has standard basis × 1 0 0 0 1 0 0 0 1 E11 = , E12 = , E13 = , 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 E21 = , E22 = , E23 = , 1 0 0 0 1 0 0 0 1 12.2. MATRIX ARITHMETIC 173 which is seen to naturally correspond with the standard basis e ,...,e for R , where { 1 6} 6

e1 = (1, 0, 0, 0, 0, 0), e2 = (0, 1, 0, 0, 0, 0),...,e6 = (0, 0, 0, 0, 0, 1).

m n Of course, it is not enough to just assert that F × is a vector space since we have yet to verify that the above deﬁned operations of addition and scalar multiplication satisfy the vector space axioms. The proof of the following theorem is straightforward and something that you should work through for practice with matrix notation.

Theorem 12.2.3. Given positive integers m, n Z and the operations of matrix addition ∈ + m n and scalar multiplication as deﬁned above, the set F × of all m n matrices satisﬁes each × of the following properties.

m n 1. (associativity of matrix addition) Given any three matrices A,B,C F × , ∈

(A + B)+ C = A +(B + C).

m n 2. (additive identity for matrix addition) Given any matrix A F × , ∈

A +0m n =0m n + A = A. × ×

m n 3. (additive inverses for matrix addition) Given any matrix A F × , there exists a ∈ m n matrix A F × such that − ∈

A +( A)=( A)+ A =0m n. − − ×

m n 4. (commutativity of matrix addition) Given any two matrices A, B F × , ∈

A + B = B + A.

m n 5. (associativity of scalar multiplication) Given any matrix A F × and any two scalars ∈ α, β F, ∈ (αβ)A = α(βA). 174 CHAPTER 12. SUPPLEMENTARY NOTES

m n 6. (multiplicative identity for scalar multiplication) Given any matrix A F × and ∈ denoting by 1 the multiplicative identity of F,

1A = A.

m n 7. (distributivity of scalar multiplication) Given any two matrices A, B F × and any ∈ two scalars α, β F, ∈

(α + β)A = αA + βA and α(A + B)= αA + αB.

m n In other words, F × forms a vector space under the operations of matrix addition and scalar multiplication.

As a consequence of Theorem 12.2.3, every property that holds for an arbitrary vector m n space can be taken as a property of F × speciﬁcally. We highlight some of these properties in the following corollary to Theorem 12.2.3.

Corollary 12.2.4. Given positive integers m, n Z and the operations of matrix addition ∈ + m n and scalar multiplication as deﬁned above, the set F × of all m n matrices satisﬁes each × of the following properties:

m n 1. Given any matrix A F × , given any scalar α F, and denoting by 0 the additive ∈ ∈ identity of F,

0A = A and α0m n =0m n. × ×

m n 2. Given any matrix A F × and any scalar α F, ∈ ∈

αA =0 = either α =0 or A =0m n. ⇒ ×

m n 3. Given any matrix A F × and any scalar α F, ∈ ∈

(αA)=( α)A = α( A). − − −

In particular, the additive inverse A of A is given by A =( 1)A, where 1 denoted − − − − the additive inverse for the additivity identity of F. 12.2. MATRIX ARITHMETIC 175

While one could prove Corollary 12.2.4 directly from deﬁnitions, the point of recognizing m n F × as a vector space is that you get to use these results without worrying about their m n m n proof. Moreover, there is no need to separate prove for each of R × and C × .

12.2.2 Multiplying and matrices

r s s t Let r, s, t Z be positive integers, A =(a ) F × be an r s matrix, and B =(b ) F × ∈ + ij ∈ × ij ∈ be an s t matrix. Then matrix multiplication AB = ((ab)ij)r t is deﬁned by × × s

(ab)ij = aikbkj. k=1 In particular, note that the “i, j entry” of the matrix product AB involves a summation over the positive integer k =1,...,s, where s is both the number of columns in A and the number of rows in B. Thus, this multiplication is only deﬁned when the “middle” dimension of each matrix is the same:

a11 a1s b11 b1t . . . . . . (aij)r s(bij)s t = r  . .. .  . .. .  s × ×   ar1 arsbs1 bst        s t 

s a b s a b k=1 1k k1 k=1 1k kt . .. . =  . . .  r s s   arkbk1 arkbkt  k=1 k=1     t  Alternatively, if we let n Z be a positive integer, then another way of viewing matrix ∈ + n 1 n n 1 multiplication is through the use of the standard inner product on F = F × = F × . In particular, we deﬁne the dot product (a.k.a. Euclidean inner product) of the row vector 176 CHAPTER 12. SUPPLEMENTARY NOTES

1 n n 1 x =(x ) F × and the column vector y =(y ) F × to be 1j ∈ i1 ∈

y11 n . x y = x11, , x1n  .  = x1kyk1 F. ∈ k=1 yn1     We can then decompose matrices A = (aij)r s and B = (bij)s t into their constituent row × × vectors by ﬁxing a positive integer k Z and setting ∈ +

(k, ) 1 s (k, ) 1 t A = a , , a F × and B = b , , b F × . k1 ks ∈ k1 kt ∈ Similarly, ﬁxing ℓ Z , we can also decompose A and B into the column vectors ∈ +

a1ℓ b1ℓ ( ,ℓ) . r 1 ( ,ℓ) . s 1 A =  .  F × and B =  .  F × . ∈ ∈ arℓ bsℓ         It follows that the product AB is the following matrix of dot products:

(1, ) ( ,1) (1, ) ( ,t) A B A B . . . r t AB =  . .. .  F × . ∈ (r, ) ( ,1) (r, ) ( ,t) A B A B     

Example 12.2.5. With notation as in Example 12.1.3, you should sit down and use the above deﬁnitions in order to verify that the following matrix products hold.

3 3 12 6 3 3 AC = 1 1, 4, 2 = 1 4 2 F × ,  −   − − −  ∈ 1 1 4 2         3 CA = 1, 4, 2 1 =3 4+2=1 F,  −  − ∈ 1     12.2. MATRIX ARITHMETIC 177

2 4 1 4 1 16 6 2 2 B = BB = − − = − F × , 0 2 0 2 0 4 ∈

6 1 3 1 3 CE = 1, 4, 2 1 1 2 = 10, 7, 17 F × , and  −  ∈ 4 1 3     1 5 2 3 0 3 1 DA = 1 0 1 1 = 2 F × .  −   −   −  ∈ 3 2 4 1 11             Note, though, that B cannot be multiplied by any of the other matrices, nor does it make sense to try to form the products AD, AE, DC, and EC due to the inherent size mismatches. As illustrated in Example 12.2.5 above, matrix multiplication is not a commutative op- 3 3 1 1 eration (since, e.g., AC F × while CA F × ). Nonetheless, despite the complexity of its ∈ ∈ deﬁnition, the matrix product otherwise satisﬁes many familiar properties of a multiplication operation. We summarize the most basic of these properties in the following theorem. Theorem 12.2.6. Let r,s,t,u Z be positive integers. ∈ + r s s t t u 1. (associativity of matrix multiplication) Given A F × , B F × , and C F × , ∈ ∈ ∈

A(BC)=(AB)C.

r s s t t u 2. (distributivity of matrix multiplication) Given A F × , B,C F × , and D F × , ∈ ∈ ∈

A(B + C)= AB + AC and (B + C)D = BD + CD.

r s s t 3. (compatibility with scalar multiplication) Given A F × , B F × , and α F, ∈ ∈ ∈

α(AB)=(αA)B = A(αB).

n n Moreover, given any positive integer n Z , F × is an algebra over F. ∈ + As with Theorem 12.2.3, you should work through a proof of each part of Theorem 12.2.6 (and especially of the ﬁrst part) in order to practice manipulating the indices of entries correctly. We state and prove a useful followup to Theorems 12.2.3 and 12.2.6 as an illustration. 178 CHAPTER 12. SUPPLEMENTARY NOTES

n n Theorem 12.2.7. Let A, B F × be upper triangular matrices and c F be any scalar. ∈ ∈ Then each of the following properties hold:

1. cA is upper triangular.

2. A + B is upper triangular.

3. AB is upper triangular.

In other words, the set of all m n upper triangular matrices forms an algebra over F. × Moreover, each of the above statements still holds when upper triangular is replaced by lower triangular.

Proof. The proofs of Parts 1 and 2 are straightforward and follow directly form the appropriate deﬁnitions. Moreover, the proof of the case for lower triangular matrices follows from the fact that a matrix A is upper triangular if and only if AT is lower triangular, where AT denotes the transpose of A. (See Section 12.5.1 for the deﬁnition of transpose.)

To prove Part 3, we start from the deﬁnition of the matrix product. Denoting A =(aij) and B =(b ), note that AB = ((ab) ) is an n n matrix having “i-j entry” given by ij ij × n

(ab)ij = aikbkj. k=1

Since A and B are upper triangular, we have that aik = 0 when i>k and that bkj = 0 when k > j. Thus, to obtain a non-zero summand a b = 0, we must have both a = 0, ik kj ik which implies that i k, and b = 0, which implies that k j. In particular, these two ≤ kj ≤ conditions are simultaneously satisﬁable only when i j. Therefore, (ab) = 0 when i > j, ≤ ij from which AB is upper triangular.

At the same time, you should be careful not to blithely perform operations on matrices as you would with numbers. The fact that matrix multiplication is not a commutative operation should make it clear that signiﬁcantly more care is required with matrix arithmetic. As n n another example, given a positive integer n Z , the set F × has what are called zero ∈ + n n divisors. That is, there exist non-zero matrices A, B F × such that AB =0n n: ∈ × 2 0 1 0 1 0 1 0 0 = = =02 2. 0 0 0 0 0 0 0 0 × 12.2. MATRIX ARITHMETIC 179

n n Moreover, note that there exist matrices A,B,C F × such that AB = AC but B = C: ∈ 0 1 1 0 0 1 0 1 =02 2 = . 0 0 0 0 × 0 0 0 0

n n As a result, we say that the set F × fails to have the so-called cancellation property. n n This failure is a direct result of the fact that there are non-zero matrices in F × that have no multiplicative inverse. We discuss matrix invertibility at length in the next section and n n deﬁne a special subset GL(n, F) F × upon which the cancellation property does hold. ⊂

12.2.3 Invertibility of square matrices

In this section, we explore the opposite of matrix multiplication. More speciﬁcally, we characterize square matrices for which multiplicative inverses exist.

n n Deﬁnition 12.2.8. Given a positive integer n Z , we say that the square matrix A F × ∈ + ∈ n n is invertible (a.k.a. nonsingular) if there exists a square matrix B F × such that ∈

AB = BA = In.

We use GL(n, F) to denote the set of all invertible n n matrices having entries from F. × One can prove that, if the multiplicative inverse of a matrix exists, then the inverse is unique. As such, we will usually denote the so-called inverse matrix of A GL(n, F) by ∈ 1 A− . Even though this notation is analogous to the notation for the multiplicative inverse of a scalar, you should not take this to mean that it is possible to “divide” by a matrix.

Moreover, note that the zero matrix 0n n / GL(n, F). This means that GL(n, F) is not a × ∈ n n vector subspace of F × . Since matrix multiplication is not a commutative operation, care must be taken when working with the multiplicative inverses of invertible matrices. In particular, many of the algebraic properties for multiplicative inverses of scalars, when properly modiﬁed, continue to hold. We summarize the most basic of these properties in the following theorem.

Theorem 12.2.9. Let n Z be a positive integer and A, B GL(n, F). Then ∈ + ∈

1 1 1 1. the inverse matrix A− GL(n, F) and satisﬁes (A− )− = A. ∈ 180 CHAPTER 12. SUPPLEMENTARY NOTES

m m 1 1 m 2. the matrix power A GL(n, F) and satisﬁes (A )− = (A− ) , where m Z is ∈ ∈ + any positive integer.

1 1 1 3. the matrix αA GL(n, F) and satisﬁes (αA)− = α− A− , where α F is any non-zero ∈ ∈ scalar.

4. the product AB GL(n, F) and has inverse given by the formula ∈

1 1 1 (AB)− = B− A− .

Moreover, GL(n, F) has the cancellation property. In other words, given any three matrices A,B,C GL(n, F), if AB = AC, then B = C. ∈ At the same time, it is important to note that the zero matrix is not the only non- invertible matrix. As an illustration of the subtlety involved in understanding invertibility, we give the following theorem for the 2 2 case. ×

a11 a12 2 2 Theorem 12.2.10. Let A = F × . Then A is invertible if and only if A a21 a22 ∈ satisﬁes a a a a =0. 11 22 − 12 21 Moreover, if A is invertible, then

a a 22 − 12 a11a22 a12a21 a11a22 a12a21 1 − − A− =   . a21 a11  −  a11a22 a12a21 a11a22 a12a21   − −    A more general theorem holds for larger matrices, but its statement requires substantially more machinery than could reasonably be included here. We nonetheless state this result for completeness and refer the reader to Chapter 8 for the deﬁnition of the determinant.

n n Theorem 12.2.11. Let n Z be a positive integer, and let A =(a ) F × be an n n ∈ + ij ∈ × matrix. Then A is invertible if and only if det(A) =0. Moreover, if A is invertible, then the 1 i+j “i, j entry” of A− is A / det(A). Here, A = ( 1) M , and M is the determinant of ji ij − ij ij the matrix obtained when both the ith row and jth column are removed from A. 12.3. SOLVING LINEAR SYSTEMS BY FACTORING THE COEFFICIENT MATRIX181

We close this section by noting that the set GL(n, F) of all invertible n n matrices × over F is often called the general linear group. This set has so many important uses in n mathematics that there are many equivalent notations for it, including GLn(F) and GL(F ), and sometimes simply GL(n) or GLn if it is not important to emphasis the dependence on F. Note, moreover, that the usage of the term “group” in the name “general linear group” is highly technical. This is because GL(n, F) forms a nonabelian group under matrix multiplication. (See Section B.2 for the deﬁnition of a group.)

12.3 Solving linear systems by factoring the coeﬃcient matrix

There are many ways in which you might try to solve a given system of linear equations. This section is primarily devoted to describing two particularly popular techniques, both of which involve factoring the coeﬃcient matrix for the system into a product of simpler matrices. These techniques are also at the heart of many frequently used numerical (i.e., computer-assisted) applications of Linear Algebra. You should note that the factorization of complicated objects into simpler components is an extremely common problem solving technique in mathematics. E.g., we will often factor a given polynomial into several polynomials of lower degree, and one can similarly use the prime factorization for an integer in order to simplify certain numerical computations.

12.3.1 Factorizing matrices using Gaussian elimination

In this section, we discuss a particularly signiﬁcant factorization for matrices known as Gaussian elimination (a.k.a. Gauss-Jordan elimination). Gaussian elimination can be used to express any matrix as a product involving one matrix in so-called reduced row-echelon form and one or more so-called elementary matrices. Then, once such a factorization has been found, we can immediately solve any linear system that has the factorized matrix as its coeﬃcient matrix. Moreover, the underlying technique for arriving at such a factorization is essentially an extension of the techniques already familiar to you for solving small systems of linear equations by hand. m n Let m, n Z denote positive integers, and suppose that A F × is an m n matrix ∈ + ∈ × 182 CHAPTER 12. SUPPLEMENTARY NOTES

(i, ) ( ,j) over F. Then, following Section 12.2.2, we will make extensive use of A and A to denote the row vectors and column vectors of A, respectively.

m n Deﬁnition 12.3.1. Let A F × be an m n matrix over F. Then we say that A is in ∈ × row-echelon form (abbreviated REF) if the rows of A satisfy the following conditions:

(1, ) (1, ) (1) either A is the zero vector or the ﬁrst non-zero entry in A (when read from left to right) is a one.

(i, ) (2) for i = 1,...,m, if any row vector A is the zero vector, then each subsequent row (i+1, ) (m, ) vector A ,...,A is also the zero vector.

(i, ) (3) for i =2,...,m, if some A is not the zero vector, then the ﬁrst non-zero entry (when (i 1, ) read from left to right) is a one and occurs to the right of the initial one in A − .

The initial leading one in each non-zero row is called a pivot. We furthermore say that A is in reduced row-echelon form (abbreviated RREF) if

( ,j) (4) for each column vector A containing a pivot (j = 2,...,n), the pivot is the only ( ,j) non-zero element in A .

The motivation behind Deﬁnition 12.3.1 is that matrix equations having their coeﬃcient matrix in RREF (and, in some sense, also REF) are particularly easy to solve. Note, in particular, that the only square matrix in RREF without zero rows is the identity matrix.

Example 12.3.2. The following matrices are all in REF:

1111 1110 1101 1100 A1 = 0111 , A2 = 0110 , A3 = 0110 , A4 = 0010 , 0011 0010 0001 0001                 1010 0010 0001 0000 A5 = 0001 , A6 = 0001 , A7 = 0000 , A8 = 0000 . 0000 0000 0000 0000                 However, only A4 through A8 are in RREF, as you should verify. Moreover, if we take the T T transpose of each of these matrices (as deﬁned in Section 12.5.1), then only A6 , A7 , and T A8 are in RREF. 12.3. SOLVING LINEAR SYSTEMS BY FACTORING THE COEFFICIENT MATRIX183

Example 12.3.3.

1. Consider the following matrix in RREF:

1000 0100 A = . 0010   0001     Given any vector b =(b ) F4, the matrix equation Ax = b corresponds to the system i ∈ of equations

x1 = b1

x2 = b2   . x3 = b3   x4 = b4  Since A is in RREF (in fact, A = I is the 4 4 identity matrix), we can immediately 4 ×  conclude that the matrix equation Ax = b has the solution x = b for any choice of b. Moreover, as we will see in Section 12.4.2, x = b is the only solution to this system.

2. Consider the following matrix in RREF:

16004 2 − 00103 1 A =   . 00015 2    00000 0      Given any vector b =(b ) F4, the matrix equation Ax = b corresponds to the system i ∈ of equations

x + 6x + 4x 2x = b 1 2 5 − 6 1 x3 + 3x5 + x6 = b2   . x4 + 5x5 + 2x6 = b3   0 = b4   Since A is in RREF, we can immediately conclude a number of facts about solutions 184 CHAPTER 12. SUPPLEMENTARY NOTES

to this system. First of all, solutions exist if and only if b4 = 0. Moreover, by “solving for the pivots”, we see that the system reduces to

x = b 6x 4x + 2x 1 1 − 2 − 5 6 x = b 3x x , 3 2 − 5 − 6  x = b 5x 2x  4 3 − 5 − 6  and so there is only enough information to specify values for x1, x3, and x4 in terms

of the otherwise arbitrary values for x2, x5, and x6.

In this context, x1, x3, and x4 are called leading variables since these are the variable

corresponding to the pivots in A. We similarly call x2, x5, and x6 free variables since the leading variables have been expressed in terms of these remaining variable. In particular, given any scalars α,β,γ F, it follows that the vector ∈ x b 6α 4β +2γ b 6α 4β 2γ 1 1 − − 1 − − x2  α   0   α   0   0  x3  b2 3β γ  b2  0   3β  γ  x =   =  − −  =   +   + −  +  −  x4  b3 5β 2γ  b3  0   5β  2γ    − −      −  −              x5  β   0   0   β   0              x6  γ   0   0   0   γ                          must satisfy the matrix equation Ax = b. One can also verify that every solution to the matrix equation must be of this form. It then follows that the set of all solutions should somehow be “three dimensional”.

As the above examples illustrate, a matrix equation having coeﬃcient matrix in RREF corresponds to a system of equations that can be solved with only a small amount of computation. Somewhat amazingly, any matrix can be factored into a product that involves exactly one matrix in RREF and one or more of the matrices deﬁned as follows.

m m Deﬁnition 12.3.4. A square matrix E F × is called an elementary matrix if it has ∈ one of the following forms:

1. (row exchange, a.k.a. “row swap”, matrix) E is obtained from the identity matrix Im (r, ) (s, ) by interchanging the row vectors Im and Im for some particular choice of positive 12.3. SOLVING LINEAR SYSTEMS BY FACTORING THE COEFFICIENT MATRIX185

integers r, s 1, 2,...,m . I.e., in the case that r

2. (row scaling matrix) E is obtained from the identity matrix Im by replacing the row (r, ) (r, ) vector Im with αIm for some choice of non-zero scalar 0 = α F and some choice ∈ of positive integer r 1, 2,...,m . I.e., ∈{ }

1 0 0 0 0 0 0 1 0 0 0 0 . . . . . . . . ......    0 0 1 0 0 0 E = I +(α 1)E =   m rr   th − 0 0 0 α 0 0 r row,   ←   0 0 0 0 1 0 . . . . . . . . ......      0 0 0 0 0 1     where Err is the matrix having “r, r entry” equal to one and all other entries equal to

zero. (Recall that Err was deﬁned in Section 12.2.1 as a standard basis vector for the m m vector space F × .) 186 CHAPTER 12. SUPPLEMENTARY NOTES

3. (row combination, a.k.a. “row sum”, matrix) E is obtained from the identity matrix (r, ) (r, ) (s, ) I by replacing the row vector Im with Im + αIm for some choice of scalar α F m ∈ and some choice of positive integers r, s 1, 2,...,m . I.e., in the case that r

where Ers is the matrix having “r, s entry” equal to one and all other entries equal to m m zero. (Ers was also deﬁned in Section 12.2.1 as a standard basis vector for F × .)

The “elementary” in the name “elementary matrix” comes from the correspondence between these matrices and so-called “elementary operations” on systems of equations. In particular, each of the elementary matrices is clearly invertible (in the sense deﬁned in Sec- tion 12.2.3), just as each “elementary operation” is itself completely reversible. We illustrate this correspondence in the following example.

Example 12.3.5. Deﬁne A, x, and b by

2 5 3 x1 4 A = 1 2 3 , x = x2 , and b = 5 . 1 0 8 x 9    3         12.3. SOLVING LINEAR SYSTEMS BY FACTORING THE COEFFICIENT MATRIX187

We illustrate the correspondence between elementary matrices and “elementary” operations on the system of linear equations corresponding to the matrix equation Ax = b, as follows.