
Quick Review of Matrix and Real Linear Algebra

KC Border Subject to constant revision Last major revision: December 12, 2016 v. 2016.12.30::16.10

Contents

1 Scalar fields ...... 1
2 Vector spaces ...... 1
2.1 The vector space Rn ...... 2
2.2 Other examples of vector spaces ...... 3
3 Some elementary consequences ...... 4
3.1 Linear combinations and subspaces ...... 4
3.2 Linear independence ...... 5
3.3 Bases and dimension ...... 5
3.4 Linear transformations ...... 6
3.5 The coordinate mapping ...... 7
4 Inner product ...... 7
4.1 The Parallelogram Law ...... 9
4.2 Orthogonality ...... 10
4.3 Orthogonal projection and orthogonal complements ...... 11
4.4 Orthogonality and alternatives ...... 13
4.5 The geometry of the Euclidean inner product ...... 14
5 The dual of an inner product space ...... 14
6 ⋆ Linear functionals on infinite dimensional spaces ...... 17
7 Linear transformations ...... 18
7.1 Inverse of a linear transformation ...... 19
7.2 Adjoint of a transformation ...... 20
7.3 Orthogonal transformations ...... 22
7.4 Symmetric transformations ...... 23
8 Eigenvalues and eigenvectors ...... 24
9 Matrices ...... 27
9.1 Matrix operations ...... 27
9.2 Systems of linear equations ...... 28
9.3 Matrix representation of a linear transformation ...... 30
9.4 Gershgorin’s Theorem ...... 32
9.5 Matrix representation of a composition ...... 34
9.6 Change of basis ...... 34
9.7 The Principal Axis Theorem ...... 36
9.8 Simultaneous diagonalization ...... 36
9.9 Trace ...... 38
9.10 Matrices and orthogonal projection ...... 40
10 Quadratic forms ...... 41
10.1 Diagonalization of quadratic forms ...... 42
11 Determinants ...... 43
11.1 Determinants as multilinear forms ...... 44
11.2 Some simple consequences ...... 47
11.3 Minors and cofactors ...... 48
11.4 Characteristic polynomials ...... 52
11.5 The determinant as an “oriented volume” ...... 53
11.6 Computing inverses and determinants by Gauss’ method ...... 54

Foreword

These notes have accreted piecemeal from courses in econometrics and microeconomics I have taught over the last thirty-something years, so the notation, hyphenation, and terminology may vary from section to section. They were originally compiled to be a centralized personal reference for finite-dimensional real vector spaces, although I occasionally mention complex or infinite-dimensional spaces. If you too find them useful, so much the better. There is a sketchy index, which I think is better than none. For a thorough course on linear algebra I now recommend Axler [7]. My favorite reference for infinite-dimensional vector spaces is the Hitchhiker’s Guide [2], but it needn’t be your favorite.


1 Scalar fields

A scalar field is a collection of mathematical entities that we shall call scalars. There are two binary operations defined on scalars, addition and multiplication. We denote the sum of α and β by α + β, and the product by α · β, or more simply αβ. These operations satisfy the following familiar properties:

F.1 (Commutativity of Addition and Multiplication) α + β = β + α, αβ = βα.

F.2 (Associativity of Addition and Multiplication) (α + β) + γ = α + (β + γ), (αβ)γ = α(βγ).

F.3 (Distributive Law) α(β + γ) = (αβ) + (αγ).

F.4 (Existence of Additive and Multiplicative Identities) There are two distinct scalars, denoted 0 and 1, such that for every scalar α, we have α + 0 = α and 1α = α.

F.5 (Existence of Additive Inverse) For each scalar α there exists a scalar β such that α + β = 0. We often write −α for such a β.

F.6 (Existence of Multiplicative Inverse) For each nonzero scalar α there exists a scalar β such that αβ = 1. We often write α−1 for such a β.

There are many elementary theorems establishing facts that you probably take for granted. For instance,

• −α and α−1 are unique. This justifies the notation by the way.

• −α = (−1) · α. (Here −1 is the scalar β that satisfies 1 + β = 0.)

• α + γ = β + γ ⇐⇒ α = β.

• 0 · α = 0.

• αβ = 0 =⇒ [α = 0 or β = 0].

There are many more. See, for instance, Apostol [5, p. 18].

The most important scalar fields are the real numbers R, the rational numbers Q, and the field of complex numbers C. Computer scientists, e.g., Klein [16], are fond of the {0, 1} field, sometimes known as GF(2) or the Boolean field. These notes are mostly concerned with the field of real numbers. This is not because the other fields are unimportant—it’s because I myself have limited use for the others, which is probably a personal shortcoming.
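As a minimal computational illustration of a finite scalar field, here is a small Python sketch (my own, using only the standard library) that checks a few of the axioms F.1–F.3 for GF(2), where addition and multiplication are taken mod 2:

# A minimal sketch of arithmetic in GF(2): the scalars are {0, 1},
# addition and multiplication are taken mod 2.
from itertools import product

def add(a, b):
    return (a + b) % 2

def mul(a, b):
    return (a * b) % 2

scalars = [0, 1]

# F.1: commutativity of addition and multiplication
assert all(add(a, b) == add(b, a) and mul(a, b) == mul(b, a)
           for a, b in product(scalars, repeat=2))

# F.2: associativity
assert all(add(add(a, b), c) == add(a, add(b, c)) and
           mul(mul(a, b), c) == mul(a, mul(b, c))
           for a, b, c in product(scalars, repeat=3))

# F.3: distributive law
assert all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
           for a, b, c in product(scalars, repeat=3))

print("GF(2) satisfies F.1-F.3 on {0, 1}")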

2 Vector spaces

Let K be a field of scalars—usually either the real numbers R or the complex numbers C, or occasionally the rationals Q.


1 Definition A vector space over K is a nonempty set V of vectors equipped with two operations, vector addition (x, y) ↦ x + y, and scalar multiplication (α, x) ↦ αx, where x, y ∈ V and α ∈ K. The operations satisfy:

V.1 (Commutativity of Vector Addition) x + y = y + x
V.2 (Associativity of Vector Addition) (x + y) + z = x + (y + z)
V.3 (Existence of Zero Vector) There is a vector 0V , often denoted simply 0, satisfying x + 0 = x for every vector x.
V.4 (Additive Inverse) x + (−1)x = 0
V.5 (Associativity of Scalar Multiplication) α(βx) = (αβ)x
V.6 (Scalar Multiplication by 1) 1x = x
V.7 (Distributive Law) α(x + y) = (αx) + (αy)
V.8 (Distributive Law) (α + β)x = (αx) + (βx)

The term real vector space refers to a vector space over the field of real numbers, and a complex vector space is a vector space over the field of complex numbers. The term linear space is a synonym for vector space.

I shall try to adhere to the following notational conventions that are used by Gale [13]. Vectors are typically denoted by lower case Latin letters, and scalars are typically denoted by lower case Greek letters. When we have occasion to refer to the components of a vector in Rm, I will try to use the Greek letter corresponding to the Latin letter of the vector. For instance, x = (ξ1, . . . , ξm), y = (η1, . . . , ηm), z = (ζ1, . . . , ζm). Upper case Latin letters tend to denote matrices, and their entries will be denoted by the corresponding lower case Greek letters.a That said, these notes come from many different courses, and I may not have standardized everything. I’m sorry.

aThe correspondence between Greek and Latin letters is somewhat arbitrary. For instance, one could make the case that Latin y corresponds to Greek υ, and not to Greek η. I should write out a guide.

2.1 The vector space Rn

By far the most important example of a vector space, and indeed the mother of all vector spaces, is the space Rn of ordered lists of n real numbers. Given ordered lists x = (ξ1, . . . , ξn) and y = (η1, . . . , ηn), the vector sum x + y is the ordered list (ξ1 + η1, . . . , ξn + ηn). The scalar multiple αx is the ordered list (αξ1, . . . , αξn). The zero of this vector space is the ordered list 0 = (0, . . . , 0). It is a trivial matter to verify that the axioms above are satisfied by these operations. As a special case, the set of real numbers R is a real vector space.
In Rn we identify several special vectors, the unit coordinate vectors. These are the ordered lists ei that consist of zeroes except for a single 1 in the ith position. We use the notation ei to denote this vector regardless of the length n of the list.
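A minimal NumPy sketch of these coordinatewise operations and of the unit coordinate vectors (the particular numbers are arbitrary):

import numpy as np

x = np.array([1.0, 2.0, 3.0])   # x = (ξ1, ξ2, ξ3)
y = np.array([4.0, 5.0, 6.0])   # y = (η1, η2, η3)

# Vector addition and scalar multiplication are coordinatewise.
print(x + y)        # (ξ1 + η1, ξ2 + η2, ξ3 + η3) -> [5. 7. 9.]
print(2.5 * x)      # (αξ1, αξ2, αξ3)             -> [2.5 5.  7.5]

# The unit coordinate vectors e1, e2, e3 are the rows of the identity
# matrix; every x is the sum of ξi * ei.
e = np.eye(3)
assert np.allclose(x, sum(x[i] * e[i] for i in range(3)))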


2.2 Other examples of vector spaces

The following are also vector spaces. The definition of the vector operations is usually obvious.

• {0} is a vector space, called the trivial vector space. A nontrivial vector space contains at least one nonzero vector.

• The field K is a vector space over itself.

• The set L1(I) of integrable real-valued functions on an interval I of the real line,

{ f : I → R : ∫_I |f(t)| dt < ∞ },

is a real vector space under the pointwise operations: (f + g)(t) = f(t) + g(t) and (αf)(t) = αf(t) for all t ∈ I.¹

• The set L2(P ) of square-integrable random variables (those X with E X² < ∞) on the probability space (S, E, P ) is a real vector space.

• The set of solutions to a homogeneous linear differential equation, e.g., f ′′ +αf ′ +βf = 0, is a real vector space.

• The spaces ℓp, 0 < p < ∞, and ℓ∞, defined by

ℓp = { x = (ξ1, ξ2, . . . ) ∈ R^N : ∑_{n=1}^∞ |ξn|^p < ∞ },

ℓ∞ = { x = (ξ1, ξ2, . . . ) : sup_n |ξn| < ∞ },

are all real vector spaces.

• The set M(m, n) of m × n real matrices is a real vector space.

• The set of linear transformations from one vector space into another is a linear space. See Proposition 12 below.

• We can even consider the set R of real numbers as an (infinite-dimensional, see below) vector space over the field Q of rational numbers. This leads to some very interesting (counter-)examples.

¹ See how quickly I forgot the convention that scalars are denoted by Greek letters. I used t here instead of τ.


3 Some elementary consequences

Here are some simple consequences of the axioms that we shall use repeatedly without further comment.

1. The zero vector 0 is unique: if 0₁ and 0₂ both satisfy V.3, then 0₂ = 0₂ + 0₁ = 0₁ + 0₂ = 0₁, using V.3, V.1, and V.3 again.

2. −x = (−1)x is the unique vector z satisfying x + z = 0: if x + y = 0, then adding −x = (−1)x to both sides gives (−x + x) + y = −x, so 0 + y = −x, that is, y = −x.

3. 0K x = 0V : 0V = 0K x + (−1)(0K x) = (0K − 0K)x = 0K x.

4. x + ··· + x (n terms) = nx: x + x = 1x + 1x = (1 + 1)x = 2x, x + x + x = (x + x) + x = 2x + x = 2x + 1x = (2 + 1)x = 3x, and so on.

3.1 Linear combinations and subspaces

A linear combination of the vectors x1, . . . , xm is any sum of scalar multiples of the form α1x1 + ··· + αmxm, αi ∈ K, xi ∈ V . A linear subspace M of V is a nonempty subset of V that is closed under linear combinations. A linear subspace of a vector space is a vector space in its own right. A linear subspace may also be called a vector subspace.
Let E ⊂ V . The span of E, denoted span E, is the set of all linear combinations from E. That is,

span E = { ∑_{i=1}^m αi xi : αi ∈ K, xi ∈ E, m ∈ N }.

2 Exercise Prove the following.

1. {0} is a linear subspace.

2. If M is a linear subspace, then 0 ∈ M.

3. The intersection of a family of linear subspaces is a linear subspace.

4. The set span E is the smallest (with respect to inclusion) linear subspace that includes E. □
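As a computational aside (a sketch of mine assuming NumPy, not part of the development above), membership of a vector z in span E can be tested numerically: z lies in the span exactly when the least-squares residual from solving for the coefficients is zero.

import numpy as np

# Columns of E span a subspace of R^3.
E = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])          # E = {(1,1,0), (0,1,1)}

def in_span(E, z, tol=1e-10):
    # Find coefficients alpha minimizing ||E @ alpha - z||; z is in the
    # span iff the minimized residual is (numerically) zero.
    alpha, *_ = np.linalg.lstsq(E, z, rcond=None)
    return np.linalg.norm(E @ alpha - z) < tol

print(in_span(E, np.array([2.0, 3.0, 1.0])))   # True:  2*(1,1,0) + 1*(0,1,1)
print(in_span(E, np.array([1.0, 0.0, 0.0])))   # False: not a combination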


3.2 Linear independence

3 Definition A set E of vectors is linearly dependent if there are distinct vectors x1, . . . , xm belonging to E, and nonzero scalars α1, . . . , αm, such that α1x1 + ··· + αmxm = 0. A set of vectors is linearly independent if it is not dependent. That is, E is independent if for every set x1, . . . , xm of distinct vectors in E, ∑_{i=1}^m αi xi = 0 implies α1 = ··· = αm = 0. We also say that the vectors x1, . . . , xm are independent instead of saying that the set {x1, . . . , xm} is independent.

4 Proposition (Uniqueness of linear combinations) If E is a linearly independent set of vectors and z belongs to span E, then z is a unique linear combination of elements of E.

Proof: If z is zero, the conclusion follows by definition of independence. If z is nonzero, suppose

z = ∑_{i=1}^m αi xi = ∑_{j=1}^n βj yj,

where the xi's are distinct elements of E, the yj's are distinct elements of E (but may overlap with the xi's), and αi, βj ≠ 0 for i = 1, . . . , m and j = 1, . . . , n. Enumerate A = {xi : i = 1, . . . , m} ∪ {yj : j = 1, . . . , n} as A = {zk : k = 1, . . . , p}. (If xi = yj for some i and j, then p is strictly less than m + n.) Then we can rewrite z = ∑_{k=1}^p α̂k zk = ∑_{k=1}^p β̂k zk, where

α̂k = αi if zk = xi and α̂k = 0 otherwise, and β̂k = βj if zk = yj and β̂k = 0 otherwise.

Then

0 = z − z = ∑_{k=1}^p (α̂k − β̂k) zk =⇒ α̂k − β̂k = 0, k = 1, . . . , p,

since E is independent. Therefore α̂k = β̂k, k = 1, . . . , p, which in turn implies m = n = p and {xi : i = 1, . . . , m} = {yj : j = 1, . . . , n}, and the proposition is proved.

The coefficients of this linear combination are called the coordinates of z with respect to the set E.

5 Exercise Prove the following.

1. The empty set is independent.

2. If 0 ∈ E, then E is dependent. □
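A standard numerical test for independence (again a NumPy sketch of mine): stack the vectors as the columns of a matrix; they are independent exactly when the rank of that matrix equals the number of vectors.

import numpy as np

def independent(vectors):
    # vectors: list of 1-D arrays of equal length.
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 1.0])

print(independent([x1, x2]))            # True
print(independent([x1, x2, x1 + x2]))   # False: the third is a combination
print(independent([x1, np.zeros(3)]))   # False: any set containing 0 is dependent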

3.3 Bases and dimension

6 Definition A Hamel basis, or more succinctly, a basis for the linear space V is a linearly independent set B such that span B = V . The plural of basis is bases.

The next result is immediate from Proposition 4.


7 Proposition Every element of the vector space V is a unique linear combination of basis vectors.

8 Example (The standard basis for Rm) The set of unit coordinate vectors e1, . . . , em in Rm is a basis for Rm, called the standard basis. Observe that the vector x = (ξ1, . . . , ξm) can be written uniquely as ∑_{j=1}^m ξj ej. □

The fundamental facts about bases are these. For a proof of the first assertion, see the Hitchhiker’s Guide [2, Theorem 1.8, p. 15].

9 Fact Every nontrivial vector space has a basis. Any two bases have the same cardinality, called the dimension of V . (Exercise: prove the second assertion for the finite-dimensional case.)

Mostly these notes deal with finite-dimensional vector spaces. The next result summarizes Theorems 1.5, 1.6, and 1.7 in Apostol [6, pp. 10–12].

10 Theorem In an n-dimensional space, every set of more than n vectors is dependent. Consequently, any independent set of n vectors is a basis.

3.4 Linear transformations

11 Definition Let V, W be vector spaces. A function T : V → W is a linear transformation or linear operator or homomorphism if

T (αx + βy) = αT (x) + βT (y).

The set of linear transformations from the vector space V into the vector space W is denoted

L(V,W ).

If T is a linear transformation from a vector space into the reals R, then it is customary to call T a linear functional. The set L(V, R) of linear functionals on V is called the dual of V , and is denoted V ′. When there is a notion of continuity, we shall let V ∗ denote the vector space of continuous linear functionals on V . The set V ′ is often called the algebraic dual of V and V ∗ is the topological dual of V .

Note that if T is linear, then T 0 = 0. We shall use this fact without any special mention. It is traditional to write a linear transformation without parentheses, that is, to write T x rather than T (x).

12 Proposition L(V,W ) is itself a vector space under the usual pointwise addition and scalar multiplication of functions.


3.5 The coordinate mapping

In an n-dimensional space V , if we fix a basis in some particular order, we have an ordered basis or frame. Given this ordered basis the coordinates of a vector comprise an ordered list, and so correspond to an element in Rn. This mapping from vectors to their coordinates is called the coordinate mapping for the ordered basis. The coordinate mapping preserves all the vector operations. The mathematicians’ term for this is isomorphism. If two vector spaces are isomorphic, then for many purposes we can think of them as being the same space, the only difference being that the names of the points have been changed.

13 Definition An isomorphism between vector spaces is a bijective linear transformation. That is, a function φ: V → W between vector spaces is an isomorphism if φ is one-to-one and onto, and φ(x + y) = φ(x) + φ(y) and φ(αx) = αφ(x). In this case we say that V and W are isomorphic. We have just argued the following.

14 Proposition Given an ordered basis x1, . . . , xn for an n-dimensional vector space V , the coordinate mapping is an isomorphism from V to Rn.

If every n-dimensional vector space is isomorphic to Rn, is there any reason to consider n-dimensional vector spaces other than Rn? The answer, I believe, is yes. Sometimes there is no “natural” ordered basis for a vector space, so there is no “natural” way to treat it as Rn. For instance, let V be the set of vectors x = (ξ1, ξ2, ξ3) in R3, such that ξ1 + ξ2 + ξ3 = 0. This is a 2-dimensional vector space, and there are infinitely many isomorphisms onto R2, but I assert that there is no unique obvious, natural way to identify points in V with points in R2.
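For concreteness, here is a NumPy sketch of the coordinate mapping on this subspace V for one arbitrarily chosen ordered basis (the basis vectors b1, b2 below are my own choice, not canonical):

import numpy as np

# One (arbitrary) ordered basis for V = {x in R^3 : x1 + x2 + x3 = 0}.
b1 = np.array([1.0, -1.0, 0.0])
b2 = np.array([0.0, 1.0, -1.0])
B = np.column_stack([b1, b2])          # 3 x 2 matrix of basis vectors

x = np.array([2.0, -5.0, 3.0])         # a vector in V (its entries sum to 0)

# The coordinates alpha solve B @ alpha = x; use least squares since B is not square.
alpha, *_ = np.linalg.lstsq(B, x, rcond=None)
print(alpha)                           # the coordinate vector in R^2, here (2, -3)
assert np.allclose(B @ alpha, x)       # the coordinate mapping is one-to-one onto V

# A different choice of basis gives different coordinates for the same x,
# which is the sense in which there is no single "natural" identification with R^2.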

4 Inner product

15 Definition A real linear space V has an inner product if for each pair of vectors x and y there is a real number, traditionally denoted (x, y), satisfying the following properties.

IP.1 (x, y) = (y, x).
IP.2 (x, y + z) = (x, y) + (x, z).
IP.3 α(x, y) = (αx, y) = (x, αy).
IP.4 (x, x) > 0 if x ≠ 0.

A vector space V equipped with an inner product is called an inner product space. For a complex vector space, the inner product is complex-valued, property IP.1 is replaced by (x, y) = \overline{(y, x)}, where the bar denotes complex conjugation, and IP.3 is replaced by α(x, y) = (αx, y) = (x, \overline{α}y).

16 Proposition In a real inner product space, the inner product is bilinear. That is,

(αx + βy, z) = α(x, z) + β(y, z) and (z, αx + βy) = α(z, x) + β(z, y)


Another simple consequence for real inner product spaces is the following identity that we shall use often:

(x + y, x + y) = (x, x) + 2(x, y) + (y, y).

For complex inner product spaces, this becomes

(x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y) = (x, x) + (x, y) + \overline{(x, y)} + (y, y).

17 Example The dot product of two vectors x = (ξ1, . . . , ξn) and y = (η1, . . . , ηn) in Rn is defined by

x · y = ∑_{i=1}^n ξi ηi.

The dot product is an inner product, and Euclidean n-space is Rn equipped with this inner product. The dot product of two vectors is zero if they meet at a right angle. □

18 Lemma If (x, z) = (y, z) for all z, then x = y.

Proof: This implies ((x − y), z) = 0 for all z, so setting z = x − y yields ((x − y), (x − y)) = 0, which by IP.4 implies x − y = 0.

19 Cauchy–Schwartz Inequality In a real inner product space,

(x, y)² ⩽ (x, x)(y, y)  (1)

with equality if and only if x and y are dependent.

Proof: (Cf. MacLane and Birkhoff [18, p. 353]) If x or y is zero, then we have equality, so assume x, y are nonzero. Define the quadratic polynomial Q: R → R by

Q(λ) = (λx + y, λx + y) = (x, x)λ² + 2(x, y)λ + (y, y).

By Property IP.4 of inner products (Definition 15), Q(λ) ⩾ 0 for each λ ∈ R. Therefore the discriminant of Q is nonpositive,² that is, 4(x, y)² − 4(x, x)(y, y) ⩽ 0, or (x, y)² ⩽ (x, x)(y, y).
Equality in (1) can occur only if the discriminant is zero, in which case Q has a real root. That is, there is some λ for which Q(λ) = (λx + y, λx + y) = 0. But this implies that λx + y = 0, which means the vectors x and y are linearly dependent.

20 Definition A norm ∥x∥ is a real function on a vector space that satisfies: N.1 ∥0∥ = 0

N.2 ∥x∥ > 0 if x ≠ 0

N.3 ∥αx∥ = |α| ∥x∥.

² In case you have forgotten how you derived the quadratic formula in Algebra I, rewrite the polynomial as f(z) = αz² + βz + γ = (1/α)(αz + β/2)² − (β² − 4αγ)/(4α), and note that the only way to guarantee that f(z) ⩾ 0 for all z is to have α > 0 and β² − 4αγ ⩽ 0.


N.4 ∥x + y∥ ⩽ ∥x∥ + ∥y∥ with equality if and only if x = 0 or y = 0 or y = αx, α > 0.

21 Proposition If (·, ·) is an inner product, then ∥x∥ = (x, x)^{1/2} is a norm.

Proof: The only nontrivial part is N.4. Now ∥x + y∥ ⩽ ∥x∥ + ∥y∥ if and only if ∥x + y∥² ⩽ (∥x∥ + ∥y∥)², which is equivalent to

(x + y, x + y) ⩽ (x, x) + 2√((x, x)(y, y)) + (y, y).

Since (x + y, x + y) = (x, x) + 2(x, y) + (y, y), this reduces to

(x, y) ⩽ √((x, x)(y, y)).

But this is just the square root of the Cauchy–Schwartz Inequality.

If the norm induced by the inner product gives rise to a complete metric space,3 then the inner product space is called a Hilbert space or complete inner product space.

4.1 The Parallelogram Law

The next result asserts that in an inner product space, the sum of the squares of the lengths of the diagonals of a parallelogram is equal to the sum of the squares of the lengths of the sides. Consider the parallelogram with vertices 0, x, y, x + y. Its diagonals are the segments [0, x + y] and [x, y], and their lengths are ∥x + y∥ and ∥x − y∥. It has two sides of length ∥x∥ and two of length ∥y∥. So the claim is:

22 The parallelogram law In an inner product space,

∥x + y∥2 + ∥x − y∥2 = 2∥x∥2 + 2∥y∥2.

Proof: Note that

((x + y), (x + y)) = (x, x) + 2(x, y) + (y, y)
((x − y), (x − y)) = (x, x) − 2(x, y) + (y, y).

Add these two to get

((x + y), (x + y)) + ((x − y), (x − y)) = 2(x, x) + 2(y, y),

which is the desired result restated in terms of norms.

On a related note, we have the following.

23 Proposition In an inner product space,

∥x + y∥² − ∥x − y∥² = 4(x, y).

3 A metric space is complete if every Cauchy sequence has a limit point in the space. A sequence x1, x2,... is a Cauchy sequence if lim ∥xm − xn∥ = 0 as n, m → ∞.


Proof: In the proof above, instead of adding the two equations, subtract them:

∥x + y∥² − ∥x − y∥² = ((x + y), (x + y)) − ((x − y), (x − y))
= ((x, x) + 2(x, y) + (y, y)) − ((x, x) − 2(x, y) + (y, y))
= 4(x, y).

As an aside, a norm on a vector space is induced by an inner product if and only if it satisfies the parallelogram law; see for instance [3, Problem 32.10, p. 303].
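A quick numerical check of the parallelogram law (Theorem 22) and of Proposition 23 for the Euclidean inner product (a NumPy sketch of mine with arbitrary random vectors):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

norm2 = lambda v: np.dot(v, v)          # squared norm (v, v)

# Parallelogram law: ||x+y||^2 + ||x-y||^2 = 2||x||^2 + 2||y||^2
assert np.isclose(norm2(x + y) + norm2(x - y), 2 * norm2(x) + 2 * norm2(y))

# Proposition 23: ||x+y||^2 - ||x-y||^2 = 4 (x, y)
assert np.isclose(norm2(x + y) - norm2(x - y), 4 * np.dot(x, y))

print("parallelogram law and polarization identity check out")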

4.2 Orthogonality

24 Definition Vectors x and y are orthogonal if (x, y) = 0, written

x ⊥ y.

A set of vectors E ⊂ V is orthogonal if it is pairwise orthogonal. That is, for all x, y ∈ E with x ≠ y we have (x, y) = 0. A set E is orthonormal if E is orthogonal and (x, x) = 1 for all x ∈ E.

25 Lemma If a set of nonzero vectors is pairwise orthogonal, then the set is independent.

Proof: Suppose ∑_{i=1}^m αi xi = 0, where the xi's are pairwise orthogonal. Then for each k,

0 = (xk, 0) = (xk, ∑_{i=1}^m αi xi) = ∑_{i=1}^m αi(xk, xi) = αk(xk, xk).

Since each xk ≠ 0, we have each αk = 0. Thus the vectors xi are linearly independent.

26 Definition The orthogonal complement of M in V is the set {x ∈ V : (∀y ∈ M)[ x ⊥ y ]}, denoted M⊥.

27 Lemma If x ⊥ y and x + y = 0, then x = y = 0.

Proof : Now (x + y, x + y) = (x, x) + 2(x, y) + (y, y), so (x, y) = 0 implies (x, x) + (y, y) = 0. Since (x, x) ⩾ 0 and (y, y) ⩾ 0 we have x = y = 0.

The next lemma is left as an exercise.

28 Lemma For any set M, the set M⊥ is a linear subspace. If M is a subspace, then

M ∩ M⊥ = {0}.

The fact that M⊥ is a subspace is not by itself very useful. It might be the trivial subspace {0}. We shall see below in Corollary 33 that for finite dimensional spaces, the dimensions of M and M⊥ add up to the dimension of the entire space.


4.3 Orthogonal projection and orthogonal complements

The next result may be found, for instance, in Apostol [6, Theorem 1.14, p. 24], and the proof is known as the Gram–Schmidt procedure.

29 Proposition Let x1, x2,... be a sequence (finite or infinite) of vectors in an inner product space. Then there exists a sequence y1, y2,... such that for each m, the span of y1, . . . , ym is the same as the span of x1, . . . , xm, and the yis are orthogonal.

Proof : Set y1 = x1. For m > 1 recursively define

ym = xm − ((xm, ym−1)/(ym−1, ym−1)) ym−1 − ((xm, ym−2)/(ym−2, ym−2)) ym−2 − · · · − ((xm, y1)/(y1, y1)) y1.

Use induction on m to prove that the vectors y1, . . . , ym are orthogonal and span the same space as the xis. Observe that y2 is orthogonal to y1 = x1:

(y2, y1) = (y2, x1) = (x2, x1) − ((x2, x1)/(x1, x1)) (x1, x1) = 0.

Furthermore any linear combination of x1 and x2 can be replicated with y1 and y2. For m > 2, suppose that y1, . . . , ym−1 are orthogonal and span the same space as x1, . . . , xm−1. Now compute (ym, yk) for k < m:

(ym, yk) = (xm, yk) − ( ∑_{i=1}^{m−1} ((xm, yi)/(yi, yi)) yi , yk )
= (xm, yk) − ∑_{i=1}^{m−1} ((xm, yi)/(yi, yi)) (yi, yk)
= (xm, yk) − ((xm, yk)/(yk, yk)) (yk, yk)
= 0,

so ym is orthogonal to every y1, . . . , ym−1. As an exercise verify that the span of y1, . . . , ym is the same as the span of x1, . . . , xm.

30 Corollary Every nontrivial finite-dimensional subspace of an inner product space has an orthonormal basis.

Proof: Apply the Gram–Schmidt procedure to a basis. To obtain an orthonormal basis, simply normalize each yk by dividing by its norm (yk, yk)^{1/2}.
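The Gram–Schmidt recursion translates almost line for line into code. The following NumPy sketch of mine (with the normalization of Corollary 30 added) assumes the input vectors are linearly independent:

import numpy as np

def gram_schmidt(xs):
    # Orthonormalize a list of linearly independent vectors, following
    # y_m = x_m - sum_i ((x_m, y_i)/(y_i, y_i)) y_i, then normalizing.
    ys = []
    for x in xs:
        y = x.astype(float)
        for yi in ys:
            y = y - (np.dot(x, yi) / np.dot(yi, yi)) * yi
        ys.append(y)
    return [y / np.linalg.norm(y) for y in ys]

xs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
ys = gram_schmidt(xs)

# The result is orthonormal: Y'Y is the identity.
Y = np.column_stack(ys)
assert np.allclose(Y.T @ Y, np.eye(3))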

31 Orthogonal Projection Theorem Let M be a linear subspace of the finite-dimensional real inner product space V . For each x ∈ V we can write x in a unique way as x = xM + x⊥, where xM ∈ M and x⊥ ∈ M⊥.


Proof: Let y1, . . . , ym be an orthonormal basis for M. Put zi = (x, yi)yi for i = 1, . . . , m. Put xM = ∑_{i=1}^m zi, and x⊥ = x − xM. Let y ∈ M, y = ∑_{i=1}^m αi yi. Then

(y, x⊥) = ( ∑_{i=1}^m αi yi , x − ∑_{j=1}^m (x, yj)yj )
= ∑_{i=1}^m αi ( yi , x − ∑_{j=1}^m (x, yj)yj )
= ∑_{i=1}^m αi { (yi, x) − ( yi , ∑_{j=1}^m (x, yj)yj ) }
= ∑_{i=1}^m αi { (yi, x) − ∑_{j=1}^m (x, yj)(yi, yj) }
= ∑_{i=1}^m αi { (yi, x) − (x, yi) }
= 0.

Uniqueness: Let x = xM + x⊥ = zM + z⊥, where xM, zM ∈ M and x⊥, z⊥ ∈ M⊥. Then 0 = (xM − zM) + (x⊥ − z⊥). Rewriting this yields xM − zM = 0 − (x⊥ − z⊥), so xM − zM also lies in M⊥. Similarly x⊥ − z⊥ also lies in M. Thus xM − zM ∈ M ∩ M⊥ = {0} and x⊥ − z⊥ ∈ M ∩ M⊥ = {0}.

32 Definition The vector xM given by the theorem above is called the orthogonal projection of x on M.

33 Corollary For a finite-dimensional space V and any subspace M of V , dim M +dim M⊥ = dim V . There is another important characterization of orthogonal projection.

34 Proposition (Orthogonal projection minimizes the norm of the “residual”) Let M be a linear subspace of the real inner product space V . Let y ∈ V . Then

∥y − yM ∥ ⩽ ∥y − x∥ for all x ∈ M.

Proof : Let x ∈ M and note

∥y − x∥² = ∥(y − yM ) + (yM − x)∥²
= ( (y − yM ) + (yM − x), (y − yM ) + (yM − x) )
= (y − yM , y − yM ) + 2(y − yM , yM − x) + (yM − x, yM − x).

But y − yM = y⊥ ⊥ M and yM − x ∈ M, so (y − yM , yM − x) = 0, and therefore

∥y − x∥² = ∥y − yM ∥² + ∥yM − x∥² ⩾ ∥y − yM ∥².


That is, x = yM minimizes ∥y − x∥ over M.
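A NumPy sketch of mine illustrating Theorem 31 and Proposition 34 for a subspace M spanned by the columns of a matrix A (random data; least squares is used to compute the projection):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))        # columns of A span the subspace M of R^5
y = rng.standard_normal(5)

# Orthogonal projection of y on M = span of the columns of A, via least squares.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_M = A @ coef
y_perp = y - y_M

# y_perp is orthogonal to M (i.e., to every column of A).
assert np.allclose(A.T @ y_perp, 0)

# Proposition 34: ||y - y_M|| <= ||y - x|| for any other x in M.
for _ in range(100):
    x = A @ rng.standard_normal(2)     # a random point of M
    assert np.linalg.norm(y - y_M) <= np.linalg.norm(y - x) + 1e-12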

35 Proposition (Linearity of Projection) The orthogonal projection satisfies

(x + z)M = xM + zM and (αx)M = αxM .

Proof: Let y1, . . . , yk be an orthonormal basis for M, so that xM = ∑_{j=1}^k (x, yj)yj and zM = ∑_{j=1}^k (z, yj)yj. Then

(x + z)M = ∑_{j=1}^k (x + z, yj)yj.

Use linearity of (·, ·).

4.4 Orthogonality and alternatives

We now present a lemma about linear functions that is true in quite general linear spaces, see the Hitchhiker’s Guide [2, Theorem 5.91, p. 212], but we will prove it using some of the special properties of inner products.

36 Lemma Let V be an inner product space. Then y is a linear combination of x1, . . . , xm if and only if

∩_{i=1}^m {xi}⊥ ⊂ {y}⊥.  (2)

Proof: If y is a linear combination of x1, . . . , xm, say y = ∑_{i=1}^m αi xi, then

(z, y) = ∑_{i=1}^m αi(z, xi),

so (2) holds.
For the converse, suppose (2) holds. Let M = span{x1, . . . , xm} and orthogonally project y on M to get y = yM + y⊥, where yM ∈ M and y⊥ ⊥ M. In particular, (xi, y⊥) = 0, i = 1, . . . , m. Consequently, by hypothesis, (y, y⊥) = 0 too. But

0 = (y, y⊥) = (yM , y⊥) + (y⊥, y⊥) = 0 + (y⊥, y⊥).

Thus y⊥ = 0, so y = yM ∈ M. That is, y is a linear combination of x1, . . . , xm.

We can rephrase the above result as an alternative.

37 Corollary (Fredholm alternative) Either there exist real numbers α1, . . . , αm such that

y = ∑_{i=1}^m αi xi

or else there exists a vector z satisfying

(z, xi) = 0, i = 1, . . . , m and (z, y) = 1.


4.5 The geometry of the Euclidean inner product

For nonzero vectors x and y in a Euclidean space,

x · y = ∥x∥ ∥y∥ cos θ, where θ is the angle between x and y. To see this, orthogonally project y on the space spanned by x. That is, write y = αx + z where z · x = 0. Thus

z · x = (y − αx) · x = y · x − αx · x = 0 =⇒ α = x · y/x · x.

Referring to Figure 1 we see that

cos θ = α∥x∥/∥y∥ = x · y/ (∥x∥ ∥y∥) .

Thus in an inner product space it makes sense to talk about the angle between nonzero vectors.


Figure 1. Dot product and angles: cos θ = α∥x∥/∥y∥ = x · y/ (∥x∥ ∥y∥).

38 Definition In a real inner product space, the angle ∠ xy between two nonzero vectors x and y is defined to be

∠ xy = arccos( (x, y) / √((x, x)(y, y)) ).
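In code, Definition 38 reads as follows (a small sketch of mine assuming NumPy):

import numpy as np

def angle(x, y):
    # Definition 38: the angle between nonzero x and y, in radians.
    return np.arccos(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))

print(angle(np.array([1.0, 0.0]), np.array([0.0, 2.0])))   # pi/2: orthogonal vectors
print(angle(np.array([1.0, 1.0]), np.array([2.0, 2.0])))   # 0.0: same direction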

5 The dual of an inner product space

Recall that the topological dual V ∗ of a vector space V is the vector subspace of L(V, R) = V ′ consisting of continuous real linear functionals on V . When V is an inner product space, V ∗ has a particularly nice structure. It is clear from the bilinearity of the inner product (Proposition 16) that for every y in the real inner product space V , the function ℓ on V defined by

ℓ(x) = (y, x)

is linear. Moreover, it is continuous with respect to the norm induced by the inner product.

39 Proposition The inner product is jointly norm-continuous. That is, if ∥yn − y∥ → 0 and ∥xn − x∥ → 0, then (yn, xn) → (y, x).


Proof : By bilinearity and the Cauchy–Schwartz Inequality 19,

(yn, xn) − (y, x) = (yn − y + y, xn − x + x) − (y, x)
= (yn − y, xn − x) + (yn − y, x) + (y, xn − x) + (y, x) − (y, x)
⩽ ∥yn − y∥ ∥xn − x∥ + ∥yn − y∥ ∥x∥ + ∥y∥ ∥xn − x∥
→ 0 as n → ∞.

The converse is true if the inner product space is complete as a metric space, that is, if it is a Hilbert space. Every continuous linear functional on a real Hilbert space has the form of an inner product with some vector. The question is, which vector? If ℓ(x) = (y, x), then Section 4.5 suggests that ℓ is maximized on the unit sphere when the angle between x and y maximizes the cosine. The maximum value of the cosine is one, which occurs when the angle is zero, that is, when x is a positive multiple of y. Thus to find y given ℓ, we need to find the maximizer of ℓ on the unit ball. But first we need to know that such a maximizer exists.

40 Theorem Let V be a Hilbert space, and let U be its unit ball:

U = {x ∈ V :(x, x) ⩽ 1}.

Let ℓ be a continuous linear functional on V . Then ℓ has a maximizer in U. If ℓ is nonzero, the maximizer is unique and has norm 1. Note that if V is finite dimensional, then U is compact, and the result follows from the Weierstrass Theorem. When V is infinite dimensional, the unit ball is not compact, so another technique is needed.

Proof: (Cf. Murray [20, pp. 12–13]) The case where ℓ is the zero functional is obvious, so restrict attention to the case where ℓ is not identically zero.
First observe that if ℓ is continuous, then it is bounded on U. To see this, consider the standard ε-δ definition of continuity at 0, with ε = 1. Then there is some δ > 0 such that if ∥x∥ = ∥x − 0∥ < δ, then |ℓ(x)| = |ℓ(x) − ℓ(0)| < 1. Thus ℓ is bounded by 1/δ on U.
So let µ = sup_{x∈U} ℓ(x), and note that 0 < µ < ∞. Since x ∈ U implies −x ∈ U, we also have −µ = inf_{x∈U} ℓ(x). The scalar µ is called the operator norm of ℓ. It has the property that for any x ∈ V ,

|ℓ(x)| ⩽ µ∥x∥.  (3)

To see this, observe that x/∥x∥ ∈ U, so |ℓ(x/∥x∥)| ⩽ µ, and the result follows by multiplying both sides by ∥x∥.
Pick a sequence xn in U with ℓ(xn) approaching the supremum µ that satisfies

ℓ(xn) ⩾ ((n−1)/n) µ.  (4)


We shall show that the sequence xn is a Cauchy sequence, and so has a limit. This limit is the maximizer.
To see that we have a Cauchy sequence, we use the Parallelogram Law 22 to write

∥xn − xm∥² = 2∥xn∥² + 2∥xm∥² − ∥xn + xm∥².  (5)

Now observe that by (3) and (4),

µ∥xn∥ ⩾ |ℓ(xn)| = ℓ(xn) ⩾ ((n−1)/n) µ.

Dividing by µ > 0 and recalling that 1 ⩾ ∥xn∥, we see that ∥xn∥ → 1 as n → ∞. (This makes sense. If a nonzero linear functional is going to achieve a maximum over the unit ball, it should happen on the boundary.) Similarly we have

µ∥xn + xm∥ ⩾ |ℓ(xn + xm)| = ℓ(xn) + ℓ(xm) ⩾ ((n−1)/n + (m−1)/m) µ.

Dividing by µ > 0 gives

∥xn + xm∥ ⩾ (n−1)/n + (m−1)/m.

Square both sides and substitute in (5) to get

∥xn − xm∥² ⩽ 2∥xn∥² + 2∥xm∥² − ((n−1)/n + (m−1)/m)².

Since ∥xn∥, ∥xm∥ → 1, the right-hand side tends to 2 + 2 − (1 + 1)² = 0 as n, m → ∞, so the sequence is indeed a Cauchy sequence. Thus there is a limit point x ∈ U, with ∥x∥ = 1, which by continuity satisfies

ℓ(x) = µ = max_{x∈U} ℓ(x).

To see that the maximizer is unique, suppose ℓ(x) = ℓ(y) = µ > 0. (Note that this rules out y = −x.) Then by the Cauchy–Schwartz Inequality ∥(x + y)/2∥ < 1 (unless y and x are dependent, which in this case means y = x), so ℓ( (x + y)/∥x + y∥ ) > µ, a contradiction.

41 Theorem Let V be a Hilbert space. For every continuous linear functional ℓ on V , there is a vector y in V such that for every x ∈ V ,

ℓ(x) = (y, x).

The correspondence ℓ ↔ y is a homomorphism between V ∗ and V .

Proof: (Cf. Murray [20, p. 13]) If ℓ is the zero functional, let y = 0. Otherwise, let ŷ be the unique maximizer of ℓ over the unit ball (so ∥ŷ∥ = 1), and let µ = ℓ(ŷ). Set

y = µŷ.

Then (y, ŷ) = (µŷ, ŷ) = µ(ŷ, ŷ) = µ = ℓ(ŷ), so we are off to a good start. We need to show that for every x, we have (y, x) = ℓ(x).


We start by showing that if ℓ(x) = 0, then (y, x) = 0. So assume that ℓ(x) = 0. Then ℓ(ŷ ± λx) = ℓ(ŷ) = µ for every λ. But by (3) above,

µ = ℓ(ŷ ± λx) ⩽ µ∥ŷ ± λx∥,

so ∥ŷ ± λx∥ ⩾ 1 = ∥ŷ∥. Squaring both sides gives

∥ŷ∥² ± 2λ(ŷ, x) + λ²∥x∥² ⩾ ∥ŷ∥², or λ∥x∥² ⩾ ∓2(ŷ, x) for all λ > 0.

Letting λ → 0 implies (ŷ, x) = 0, so (y, x) = 0.
For an arbitrary x, let x′ = x − ℓ(x)(ŷ/µ). Then ℓ(x′) = ℓ(x) − ℓ(x)ℓ(ŷ/µ) = 0. So (y, x′) = 0. Now observe that

(y, x) = (y, x′ + ℓ(x)(ŷ/µ)) = (y, x′) + ℓ(x)(y, ŷ/µ) = 0 + ℓ(x)(µŷ, ŷ/µ) = ℓ(x).

This completes the proof.

6 ⋆ Linear functionals on infinite dimensional spaces

If ℓ is a linear functional on the finite dimensional vector space Rn, then by Theorem 41, there is a vector y = (η1, . . . , ηn) such that for any x = (ξ1, . . . , ξn) we have

ℓ(x) = ∑_{i=1}^n ηi ξi,

so that ℓ is a dot product. However, for infinite dimensional normed spaces there are always linear functionals that are discontinuous! Lots of them, in fact.

42 Proposition Every infinite dimensional normed space has a discontinuous linear functional.

Proof: If X is an infinite dimensional normed space, then it has an infinite Hamel basis B. We may normalize each basis vector to have norm one. Let S = {x1, x2, . . . } be a countable subset of the basis B. Define the function ℓ on the basis B by ℓ(xn) = n for xn ∈ S, and ℓ(v) = 0 for v ∈ B \ S. Every x ∈ X has a unique representation as

x = ∑_{v∈B} αv v,


where only finitely many αv are nonzero. Extend ℓ from B to X by

ℓ(x) = ∑_{v∈B} αv ℓ(v).

Then ℓ is a linear functional, but it is not bounded on the unit ball (as ℓ(xn) = n). The same argument used in the proof of Theorem 40 shows that in order for ℓ to be continuous, it must be bounded on the unit ball, so it is not continuous.

7 Linear transformations

Recall that a linear transformation is a function T : V → W between vector spaces satisfying T (αx + βy) = αT x + βT y.

There are three important classes of linear transformations of a space into itself. The first is just a rescaling along various dimensions. The identity function is of this class, where the scale factor is one in each dimension. Another important example is orthogonal projection onto a linear subspace. And another important class is rotation about an axis. These do not cover all the linear transformations, since for instance the composition of two of these kinds of transformations need not belong to any of these classes.

43 Definition For a linear transformation T : V → W , the null space or kernel of T is the inverse image of 0, that is, {x ∈ V : T x = 0}.

44 Proposition The null space of T , {x : T x = 0}, is a linear subspace of V , and the range of T , {T x : x ∈ V }, is a linear subspace of W .

45 Proposition A linear transformation T is one-to-one if and only if its null space is {0}. In other words, T is one-to-one if and only if T x = 0 implies x = 0. The dimension of the range of T is called the of T . The dimension of the null space of T is called the nullity of T . The next result may be found in [6, Theorem 2.3, p. 34]. You can prove it using Corollary 33 applied to the null space of T .

46 Nullity Plus Rank Theorem Let T : V → W , where V and W are finite-dimensional inner product spaces. Then rank T + nullity T = dim V.

Proof: Let N denote the null space of T . For x in V , decompose it orthogonally as x = xN + x⊥, where xN ∈ N and x⊥ ∈ N⊥. Then T x = T x⊥, so the range of T is just T (N⊥). Now let x1, . . . , xk be a basis for N⊥. I claim that T x1, . . . , T xk are independent and therefore constitute a basis for the range of T . So suppose some linear combination ∑_{i=1}^k αi T xi is zero. By linearity of T we have T ( ∑_{i=1}^k αi xi ) = 0, which implies ∑_{i=1}^k αi xi belongs to the null space N. But this combination also lies in N⊥, so it must be zero. Since the xi's are independent, it follows that α1 = ··· = αk = 0. Thus rank T = k = dim N⊥ = dim V − nullity T by Corollary 33.
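A numerical illustration of the Nullity Plus Rank Theorem (a sketch of mine; it assumes NumPy and SciPy are available, and uses scipy.linalg.null_space, which returns an orthonormal basis of the null space):

import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
# A 4 x 6 matrix whose last two columns are combinations of the first four,
# so the transformation x -> A x from R^6 to R^4 has a nontrivial null space.
A = np.hstack([B, B[:, :1] + B[:, 1:2], B[:, 2:3] - B[:, 3:4]])

rank = np.linalg.matrix_rank(A)          # dimension of the range
kernel = null_space(A)                   # orthonormal basis of {x : A x = 0}
nullity = kernel.shape[1]

assert np.allclose(A @ kernel, 0)        # the basis really lies in the null space
print(rank, nullity, A.shape[1])         # here 4 + 2 = 6 = dim V
assert rank + nullity == A.shape[1]      # Nullity Plus Rank Theorem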

47 Corollary Let T : V → W be a linear transformation between m-dimensional spaces. Then T is one-to-one if and only if T is onto.


Proof: ( =⇒ ) Assume T is one-to-one. If T x = 0 = T 0, then x = 0 by one-to-oneness. In other words, the null space of T is just {0}. Then by the Nullity Plus Rank Theorem 46 the rank of T is m, but this means the image under T of V is all of W .
( ⇐= ) Assume T maps V onto W . Then the rank of T is m, so by the Nullity Plus Rank Theorem 46 the null space of T contains only 0. Suppose T x = T y. Then T (x − y) = 0, so x − y = 0, which implies T is one-to-one.

7.1 Inverse of a linear transformation

Let T : V → W be a linear transformation between two vector spaces. A left inverse for T is a function L: range T → V such that for all x ∈ V , we have LT x = x. A right inverse for T is a function R: range T → V such that for all y ∈ range T , we have T Ry = y.

48 Lemma (Left vs. right inverses) If T has a left inverse, then it is unique, and also a right inverse.

Proof : (Cf. Apostol [6, Theorem 2.8, pp. 39–40].) Let L and M be left inverses for T . Let y ∈ range T , say y = T x. Then

Ly = LT x = x = MT x = My.

That is L = M on range T . Now we show that L is also a right inverse. Let

y = T x

belong to range T . Then

T Ly = T LT x = T (LT )x = T x = y.

That is, L is a right inverse.

Theorem 117 below describes how to compute the inverse of a matrix.

49 Proposition Let T : V → W be linear, one-to-one, and onto, where V and W are finite- dimensional inner product spaces. Then dim V = dim W , T has full rank, the inverse T −1 exists and is linear.

Proof: First we show that T⁻¹ is linear. Let xi = T⁻¹(ui), i = 1, 2. Now

T (αx1 + βx2) = αT x1 + βT x2 = αu1 + βu2.

So taking T⁻¹ of both sides gives

αT⁻¹(u1) + βT⁻¹(u2) = T⁻¹(αu1 + βu2).


Next we show that dim W ⩽ dim V : Let y1, . . . , yn be a basis for V . Then T y1, . . . , T yn span W since T is linear and onto. Therefore dim W ⩽ n = dim V .
On the other hand, T y1, . . . , T yn are linearly independent, so dim W ⩾ dim V . To see this, suppose

0 = ∑_{i=1}^n ξi T yi = T ( ∑_{i=1}^n ξi yi ).

Since T is one-to-one and T 0 = 0, we see that ∑_{i=1}^n ξi yi = 0. But this implies ξ1 = ··· = ξn = 0.
This shows that dim V = dim W , and since T is onto it has rank dim W .

7.2 Adjoint of a transformation

50 Definition Let T : V → W be linear, where V and W are Hilbert spaces. The transpose, or adjoint,4 of T , denoted T ′, is the linear transformation T ′ : W → V such that for every x in V and every y in W ,

(T ′y, x) = (y, T x). (6)

A natural question is, does such a linear transformation T ′ exist?

51 Proposition The adjoint of T exists, and is uniquely determined by (6).

Proof: By bilinearity of the inner product (Proposition 16) on W , for each y ∈ W , the mapping ℓy : z ↦ (y, z) from W to R is linear, so the composition ℓy ◦ T : V → R is linear. By Theorem 41 on the representation of linear functionals on Hilbert spaces, there is some vector, call it T ′y, in V so that

(y, T x) = (ℓy ◦ T )(x) = (T ′y, x)

holds for each x ∈ V . The correspondence y ↦ T ′y defines the transformation T ′ : W → V . Let y and z belong to W . Then again by bilinearity,

ℓαy+βz ◦ T = αℓy ◦ T + βℓz ◦ T.

That is, the mapping T ′ is linear. Finally, Lemma 18 implies that T ′y is uniquely determined by (6).

52 Proposition Let S, T : V → W and U : W → X. Then

(UT )′ = T ′U ′

(αS + βT )′ = αS′ + βT ′

(T ′)′ = T

4When dealing with complex vector spaces, the definition of the adjoint is modified to (y, T x) = (T ′y, x).


53 Theorem Let T : V → W , where V and W are inner product spaces, so T ′ : W → V . Then T ′y = 0 ⇐⇒ y ⊥ range T . In other words,

null space T ′ = (range T )⊥.

Proof : ( =⇒ ) If T ′y = 0, then (T ′y, x) = (0, x) = 0 for all x in V . But (T ′y, x) = (y, T x), so (y, T x) = 0 for all x in V . That is, y ⊥ range T . (⇐=) If y ⊥ range T , then (y, T x) = 0 for all x in V . Therefore (T ′y, x) = 0 for all x in V , so T ′y = 0.

54 Corollary Let T : V → W , where V and W are inner product spaces. Then range T ′T = range T ′. Consequently, rank T ′ = rank T ′T .

Proof: Clearly range T ′T ⊂ range T ′. Let x belong to the range of T ′, so x = T ′y for some y in W . Let M denote the range of T and consider the orthogonal decomposition y = yM + y⊥. Then T ′y = T ′yM + T ′y⊥, but T ′y⊥ = 0 by Theorem 53. Now yM = T z for some z ∈ V . Then x = T ′T z, so x belongs to the range of T ′T .

55 Corollary Let T : V → W , where V and W are finite-dimensional inner product spaces. Then rank T ′ = rank T

′ Proof : By Theorem 53, null space T = (range T )⊥ ⊂ W . Therefore

′ ′ dim W − rank T = nullity T = dim(range T )⊥ = dim W − rank T, where the first equality follows from the Nullity Plus Rank Theorem 46, the second from Theorem 53, and the third form Corollary 33. Therefore, rank T = rank T ′.

56 Corollary Let T : Rn → Rm. Then null space T = null space T ′T .

Proof : Clearly null space T ⊂ null space T ′T , since T x = 0 =⇒ T ′T x = T ′0 = 0. Now suppose T ′T x = 0. Then (x, T ′T x) = (x, 0) = 0. But

(x, T ′T x) = (T ′T x, x) = (T x, T x),

where the first equality is IP.1 and the second is the definition of T ′, so T x = 0.

We can even get a version of the Fredholm Alternative 37 as a corollary. I leave it to you to unravel why I think this a version of the Fredholm Alternative.

57 Corollary (Fredholm Alternative II) Let T : V → W , where V and W are inner product spaces, and let z belong to W . Then either

there exists x ∈ V with z = T x, or else there exists y ∈ W with (y, z) ≠ 0 & T ′y = 0.


Proof: The first alternative states that z ∈ range T , or equivalently (by Theorem 53), that z ∈ (null space T ′)⊥. If this is not the case, that is, if z ∉ (null space T ′)⊥, then there must be some y in null space T ′ that is not orthogonal to z. Such a y satisfies T ′y = 0 and (y, z) ≠ 0, which is the second alternative.
To see that the two alternatives are inconsistent, suppose that x and y satisfy the alternatives. Then 0 ≠ (y, z) = (y, T x) = (T ′y, x) = (0, x) = 0, a contradiction. (The middle equality is just the definition of the transpose.)

58 Proposition (Summary) For a linear transformation T between finite-dimensional spaces, range T ′T = range T ′ and range TT ′ = range T , so

rank T = rank T ′ = rank T ′T = rank TT ′.
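In matrix terms this summary is easy to check numerically (a NumPy sketch of mine, with a random matrix standing in for T ):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
rk = np.linalg.matrix_rank

# rank T = rank T' = rank T'T = rank TT'
print(rk(A), rk(A.T), rk(A.T @ A), rk(A @ A.T))   # all equal (here, 3)
assert rk(A) == rk(A.T) == rk(A.T @ A) == rk(A @ A.T)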

7.3 Orthogonal transformations

59 Definition Let V be a real inner product space, and let T : V → V be a linear transformation of V into itself. We say that T is an orthogonal transformation if its adjoint is its inverse, T ′ = T⁻¹.

60 Proposition For a linear transformation T : V → V on an inner product space, the following are equivalent.

1. T is orthogonal. That is, T ′ = T⁻¹.

2. T preserves norms. That is, for all x,

(T x, T x) = (x, x). (7)

3. T preserves inner products, that is, for every x, y ∈ V ,

(T x, T y) = (x, y).

Proof: (1) =⇒ (2) Assume T is orthogonal. Fix x and let y = T x. By the definition of T ′ we have (T ′y, x) = (y, T x), so

(x, x) = (T ′T x, x) = (T ′y, x) = (y, T x) = (T x, T x).

(2) =⇒ (3) Assume T preserves norms. By Proposition 23,

(T x, T y) = ( ∥T x + T y∥² − ∥T x − T y∥² ) / 4
= ( ∥T (x + y)∥² − ∥T (x − y)∥² ) / 4
= ( ∥x + y∥² − ∥x − y∥² ) / 4
= (x, y).


(3) =⇒ (1) Assume T preserves inner products. By the definition of T ′, for all x, y, (T ′y, x) = (y, T x). Taking y = T z, we have

(T ′T z, x) = (T ′y, x) = (y, T x) = (T z, T x) = (z, x),

so by Lemma 18, T ′T z − z = 0 for every z, which is equivalent to T ′ = T⁻¹.

A norm preserving mapping is also called an isometry. Since the composition of norm-preserving mappings preserves norms we have the following.

61 Corollary The composition of orthogonal transformations is an orthogonal transformation.

An orthogonal transformation preserves angles (since it preserves inner products) and distances between vectors. Reflection and rotation are the basic orthogonal transformations in a finite-dimensional Euclidean space.
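For instance, a rotation of R2 is an orthogonal transformation. The following NumPy sketch of mine checks the equivalent conditions of Proposition 60 and Corollary 61 for rotation matrices (arbitrary angles):

import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

Q = rotation(0.7)
R = rotation(-1.3)
x, y = np.array([3.0, -1.0]), np.array([0.5, 2.0])

assert np.allclose(Q.T @ Q, np.eye(2))                        # Q' is the inverse of Q
assert np.isclose(np.dot(Q @ x, Q @ y), np.dot(x, y))         # inner products preserved
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))   # norms preserved
assert np.allclose((Q @ R).T @ (Q @ R), np.eye(2))            # composition is orthogonal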

7.4 Symmetric transformations

62 Definition A transformation T : Rm → Rm is symmetric or self-adjoint if T ′ = T . A transformation T is skew-symmetric if T ′ = −T .

63 Proposition Let πM : Rm → Rm be the orthogonal projection on M. Then πM is symmetric.

Proof : To show (x, πM z) = (πM x, z). Observe that

(x, zM ) = (xM + x⊥, zM ) = (xM , zM ) + (x⊥, zM ) = (xM , zM ) + 0. Similarly, (xM , z) = (xM , zM + z⊥) = (xM , zM ) + (xM , z⊥) = (xM , zM ) + 0.

Therefore (x, zM ) = (xM , z) = (xM , zM ).

In light of the following exercise, a linear transformation is an orthogonal projection if and only if it is symmetric and idempotent. A transformation T is idempotent if T²x = T T x = T x for all x.

64 Exercise Let P : Rm → Rm be a linear transformation satisfying P ′ = P and P²x = P x for all x ∈ Rm. That is, P is idempotent and symmetric. Set M = I − P (where I is the identity on Rm). Prove the following.

1. For any x, x = Mx + P x.

2. M²x = Mx for all x and M ′ = M.

3. null space P = (range P )⊥ = range M and null space M = (range M)⊥ = range P .

4. P is the orthogonal projection onto its range. Likewise for M. □
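The following NumPy sketch of mine builds P = Y Y ′ from a matrix Y with orthonormal columns (this is the orthogonal projection onto the column span of Y , as in the proof of Theorem 31) and checks the properties listed in the exercise:

import numpy as np

# An orthonormal basis for a 2-dimensional subspace of R^4 (via QR).
rng = np.random.default_rng(4)
Y, _ = np.linalg.qr(rng.standard_normal((4, 2)))   # columns are orthonormal

P = Y @ Y.T                    # orthogonal projection onto the column span of Y
M = np.eye(4) - P              # projection onto the orthogonal complement

assert np.allclose(P, P.T)     # symmetric
assert np.allclose(P @ P, P)   # idempotent
assert np.allclose(M @ M, M) and np.allclose(M, M.T)

x = rng.standard_normal(4)
assert np.allclose(x, M @ x + P @ x)        # x = Mx + Px
assert np.isclose(np.dot(P @ x, M @ x), 0)  # the two pieces are orthogonal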


8 Eigenvalues and eigenvectors

65 Definition Let V be a real vector space, and let T be a linear transformation of V into itself, T : V → V . A real number λ is an eigenvalue of T if there is a nonzero vector x in V such that T x = λx. The vector x is called an eigenvector of T associated with λ. Note that the vector 0 is by definition not an eigenvector of T .5

If T has an eigenvalue λ with eigenvector x, the transformation “stretches” the space by a factor λ in the direction x. While the vector 0 is never an eigenvector, the scalar 0 may be an eigenvalue. Indeed 0 is the eigenvalue associated with any nonzero vector in the null space of T .

There are linear transformations with no (real) eigenvalues. For instance, consider the rotation of R2 by ninety degrees. This is given by the transformation (x, y) ↦ (−y, x). In order to satisfy T (x, y) = λ(x, y) we must have λx = −y and λy = x. This cannot happen for any nonzero real vector (x, y) and real λ. On the other hand, the identity transformation has an eigenvalue 1, associated with every nonzero vector.

Observe that there is a unique eigenvalue associated with each eigenvector: If T x = λx and T x = αx, then αx = λx, so α = λ, since by definition x is nonzero. On the other hand, one eigenvalue must be associated with many eigenvectors, for if x is an eigenvector associated with λ, so is any nonzero scalar multiple of x. More generally, a linear combination of eigenvectors corresponding to an eigenvalue is also an eigenvector corresponding to the same eigenvalue (provided the linear combination does not equal the zero vector). The span of the set of eigenvectors associated with the eigenvalue λ is called the eigenspace of T corresponding to λ. Every nonzero vector in the eigenspace is an eigenvector associated with λ. The dimension of the eigenspace is called the multiplicity of λ.

66 Proposition If T : V → V is idempotent, then each of its eigenvalues is either 0 or 1.

Proof: Suppose T x = λx with x ≠ 0. Since T is idempotent, we have λx = T x = T²x = λ²x. Since x ≠ 0, this implies λ = λ², so λ = 0 or λ = 1.

For distinct eigenvalues we have the following, taken from [6, Theorem 4.1, p. 100].

67 Theorem Let x1, . . . , xn be eigenvectors associated with distinct eigenvalues λ1, . . . , λn. Then the vectors x1, . . . , xn are independent.

Proof: The proof is by induction on n. The case n = 1 is trivial, since by definition eigenvectors are nonzero. Now consider n > 1 and suppose that the result is true for n − 1. Let

∑_{i=1}^n αi xi = 0.  (8)

5For a linear transformation of a complex vector space, eigenvalues may be complex, but I shall only deal with real vector spaces here.


Applying the transformation T to both sides gives

∑_{i=1}^n αi λi xi = 0.  (9)

Let us eliminate xn from (9) by multiplying (8) by λn and subtracting, to get

∑_{i=1}^n αi(λi − λn)xi = ∑_{i=1}^{n−1} αi(λi − λn)xi = 0.

But x1, . . . , xn−1 are independent by the induction hypothesis, so αi(λi − λn) = 0 for i = 1, . . . , n − 1. Since the eigenvalues are distinct, this implies each αi = 0 for i = 1, . . . , n − 1. Thus (8) reduces to αn xn = 0, which implies αn = 0. This shows that x1, . . . , xn are independent.

68 Corollary A linear transformation on an n-dimensional space has at most n distinct eigenvalues. If it has n, then the space has a basis made up of eigenvectors.

When T is a symmetric transformation of an inner product space into itself, not only are eigenvectors associated with distinct eigenvalues independent, they are orthogonal.

69 Proposition Let V be a real inner product space, and let T be a symmetric linear transformation of V into itself. Let x and y be eigenvectors of T corresponding to eigenvalues α and λ with α ≠ λ. Then x ⊥ y.

Proof : We are given T x = αx and T y = λy. Thus (T x, y) = (λx, y) = λ(x, y) and (x, T y) = (x, αy) = α(x, y). Since T is symmetric, (T x, y) = (x, T y), so α(x, y) = λ(x, y). Since λ ≠ α we must then have (x, y) = 0.

Also if T is symmetric, we are guaranteed that it has plenty of eigenvectors.

70 Theorem Let T : V → V be symmetric, where V is an n-dimensional inner product space. Then V has an orthonormal basis consisting of eigenvectors of T .

Proof: This proof uses some well known results from topology and calculus that are beyond the scope of these notes. Cf. Anderson [4, pp. 273–275], Carathéodory [10, § 195], Franklin [12, Section 6.2, pp. 141–145], Rao [21, 1f.2.iii, p. 62]. Let S = {x ∈ V : (x, x) = 1} denote the unit sphere in V . Set M0 = {0} and define S0 = S ∩ M0⊥. Define the quadratic form Q: V → R by Q(x) = (x, T x).

It is easy to see that Q is continuous, so it has a maximizer on S0, which is compact. (This maximizer cannot be unique, since Q(−x) = Q(x), and indeed if T is the identity, then Q is constant on S.) Fix a maximizer x1 of Q over S0. Proceed recursively for k = 1, . . . , n − 1. Let Mk denote the span of x1, . . . , xk, and set Sk = S ∩ Mk⊥. Let xk+1 maximize Q over Sk. By construction, xk+1 ∈ Mk⊥, so the xk’s are orthogonal, indeed orthonormal.


Since x1 maximizes Q on S = S0, it maximizes Q subject to the constraint 1 − (x, x) = 0. Now Q(x) = (x, T x) is continuously differentiable and Q′(x) = 2T x, and the gradient of the constraint function is −2x, which is clearly nonzero (hence linearly independent) on S. It is a nuisance to have these 2s popping up, so let us agree to maximize (1/2)(x, T x) subject to (1/2)(1 − (x, x)) = 0 instead. Therefore by the well known Lagrange Multiplier Theorem, there exists λ1 satisfying

T x1 − λ1x1 = 0.

This obviously implies that the Lagrange multiplier λ1 is an eigenvalue of T and x1 is a corresponding eigenvector. Further, it is the value of the maximum:

Q(x1) = (x1, T x1) = (x1, λ1x1) = λ1,

since (x1, x1) = 1. Let x1, . . . , xn be defined as above and assume that for i = 1, . . . , k < n, each xi is an eigenvector of T and that λi = Q(xi) is its corresponding eigenvalue. We wish to show that xk+1 is an eigenvector of T and λk+1 = Q(xk+1) is its corresponding eigenvalue. 1 By construction, xk+1 maximizes 2 Q(x) subject to the k + 1 constraints ( ) 1 − 2 1 (x, x) = 0, (x, x1) = 0,... (x, xk) = 0.

The gradients of these constraint functions are −x and x1, . . . , xk respectively. By construction, x1, . . . , xk+1 are orthonormal, so at x = xk+1 the gradients of the constraints are linearly independent. Therefore there exist Lagrange multipliers λk+1 and µ1, . . . , µk satisfying

T xk+1 − λk+1xk+1 + µ1x1 + ··· + µkxk = 0. (10)

Therefore

Q(xk+1) = (xk+1, T xk+1) = λk+1(xk+1, xk+1) − µ1(xk+1, x1) − · · · − µk(xk+1, xk) = λk+1, since x1, . . . , xk+1 are orthonormal. That is, the multiplier λk+1 is the maximum value of Q over Sk. By hypothesis, T xi = λixi for i = 1, . . . , k. Then since T is symmetric,

(xi, T xk+1) = (xk+1, T xi) = (xk+1, λixi) = 0, i = 1, . . . , k.

That is, T xk+1 ∈ Mk⊥, so T xk+1 − λk+1xk+1 ∈ Mk⊥, while µ1x1 + ··· + µkxk ∈ Mk, so by Lemma 27 equation (10) implies

T xk+1 − λk+1xk+1 = 0 and µ1x1 + ··· + µkxk = 0.

We conclude therefore that T xk+1 = λk+1xk+1, so that xk+1 is an eigenvector of T and λk+1 is the corresponding eigenvalue. Since V is n-dimensional, x1, . . . , xn is an orthonormal basis of eigenvectors, as we desire.
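Theorem 70 is what numpy.linalg.eigh computes for a symmetric matrix: real eigenvalues together with an orthonormal basis of eigenvectors. A quick sketch of mine:

import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # a symmetric matrix

lam, V = np.linalg.eigh(A)             # eigenvalues and orthonormal eigenvectors

assert np.allclose(V.T @ V, np.eye(4))                       # orthonormal basis
for i in range(4):
    assert np.allclose(A @ V[:, i], lam[i] * V[:, i])        # A v = lambda v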


9 Matrices

A matrix is merely a rectangular array of numbers, or equivalently, a doubly indexed ordered list of real numbers. An m × n matrix has m rows and n columns. The entries in a matrix are doubly indexed, with the first index denoting its row and the second its column. Here is a generic matrix:

    ⎡ α1,1 · · · α1,n ⎤
A = ⎢   ⋮          ⋮  ⎥
    ⎣ αm,1 · · · αm,n ⎦

The generally accepted plural of matrix is matrices, although matrixes is okay too.

Matrices are of interest for two separate but hopelessly intertwined reasons. One is their relation to systems of linear equations and inequalities, and the other is their connection to linear transformations between finite-dimensional vector spaces. The set of m × n matrices is denoted

M(m, n).

Given a matrix A, let Ai denote the ith row of A and let Aʲ denote the jth column. The ith row and jth column entry is generally denoted by a lower case Greek letter, e.g., αi,j. We can identify the rows or columns of a matrix with singly indexed lists of real numbers, that is, elements of some Rk. If A is m × n, the column space of A is the subspace of Rm spanned by the n columns of A. Its row space is the subspace of Rn spanned by its m rows.

9.1 Matrix operations

If both A and B are m × n matrices, the sum A + B is the m × n matrix C defined by

γi,j = αi,j + βi,j.

The scalar multiple αA of a matrix A by a scalar α is the matrix M defined by mi,j = ααi,j. Under these operations the set of m × n matrices becomes an mn-dimensional vector space.

71 Proposition The set M(m, n) is a vector space under the operations of matrix addition and scalar multiplication. It has dimension mn.

If A is m × p and B is p × n, the product of A and B is the m × n matrix whose ith row, jth column entry is the dot product Ai · Bʲ of the ith row of A with the jth column of B:

(AB)i,j = Ai · Bʲ.

The reason for this peculiar definition is explained in the next section.
Vectors in Rn may also be considered matrices as either a column or a row vector. Let x be a row m-vector (a 1 × m matrix), y be a column n-vector (an n × 1 matrix), and let A be an m × n matrix. Then the matrix product xA considered as a vector in Rn belongs to the row space of A, and Ay as a vector in Rm belongs to the column space of A:

xA = ∑_{i=1}^m ξi Ai and Ay = ∑_{j=1}^n ηj Aʲ.


Note that the ith row of AB is given by

(AB)i = (Ai)B

and the jth column of AB is given by

(AB)ʲ = A(Bʲ).

The main diagonal of a matrix A = [αi,j] is the set of entries αi,j with i = j. A matrix is called a diagonal matrix if it is square and all its nonzero entries are on the main diagonal. A square matrix is upper triangular if the only nonzero elements are on or above the main diagonal, that is, if i > j implies αi,j = 0. A square matrix is lower triangular if i < j implies αi,j = 0.
The n × n diagonal matrix I whose diagonal entries are all 1 and off-diagonal entries are all 0 has the property that AI = IA = A for any n × n matrix A. The matrix I is called the n × n identity matrix. The zero matrix has all its entries zero, and satisfies A + 0 = A.

72 Fact (Summary) Direct computation reveals the following facts.

    (AB)C = A(BC)
    AB ≠ BA (in general)
    A(B + C) = (AB) + (AC)
    (A + B)C = AC + BC

73 Exercise Verify the following.

1. The product of upper triangular matrices is upper triangular.

2. The product of lower triangular matrices is lower triangular.

3. The product of diagonal matrices is diagonal.

4. The inverse of an upper triangular matrix is upper triangular (if it exists).

5. The inverse of a lower triangular matrix is lower triangular (if it exists).

6. The inverse of a diagonal matrix is diagonal (if it exists). □

9.2 Systems of linear equations

One of the main uses of matrices is to simplify the representation of a system of linear equations. For instance, consider the following system of equations.

3x1 + 2x2 = 8

2x1 + 3x2 = 7


It has a unique solution, x1 = 2 and x2 = 1. One way to solve this is to take the second equation and solve for x1 = 7/2 − (3/2)x2, and substitute this into the first equation to get 3(7/2 − (3/2)x2) + 2x2 = 8, so x2 = 1 and x1 = 7/2 − 3/2 = 2. However Gauss formalized a computationally efficient way to attack these problems using elementary row operations. The first step is to write down the so-called augmented coefficient matrix of the system, which is the 2 × 3 matrix of just the numbers above:

    [ 3  2  8 ]
    [ 2  3  7 ].

There are three elementary row operations, and they correspond to steps used to solve a system of equations. One of these operations is to interchange two rows. We won't use that here. Another is to multiply a row by a nonzero scalar. This does not change the solution. The third operation is to add one row to another. These last two operations can be combined, and we can think of adding a scalar multiple of one row to another as an elementary row operation. We apply these operations until we get a matrix of the form

    [ 1  0  a ]
    [ 0  1  b ]

which is the augmented coefficient matrix of the system

x1 = a

x2 = b

and the system is solved. There is a simple algorithm for deciding which elementary row operations to apply, namely, the Gaussian elimination algorithm. In a section below, we shall go into this algorithm in detail, but let me just give you a hint here. First we multiply the first row by 1/3, to get a leading 1:

    [ 1  2/3  8/3 ]
    [ 2   3    7  ]

We want to eliminate x1 from the second equation, so we add an appropriate multiple of the first row to the second. In this case the multiple is −2, and the result is:

    [     1          2/3           8/3      ]   [ 1  2/3  8/3 ]
    [ 2 − 2·1   3 − 2·(2/3)   7 − 2·(8/3)   ] = [ 0  5/3  5/3 ].

Now multiply the second row by 3/5 to get

    [ 1  2/3  8/3 ]
    [ 0   1    1  ].

Finally, to eliminate x2 from the first row we add −2/3 times the second row to the first and get

    [ 1 − (2/3)·0   2/3 − (2/3)·1   8/3 − (2/3)·1 ]   [ 1  0  2 ]
    [      0              1               1       ] = [ 0  1  1 ],

which accords with our earlier result.
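As a quick check, the following Python/numpy sketch (purely illustrative) replays the row operations just performed on the augmented matrix and recovers x1 = 2, x2 = 1.

    import numpy as np

    # Augmented coefficient matrix of 3x1 + 2x2 = 8, 2x1 + 3x2 = 7
    M = np.array([[3.0, 2.0, 8.0],
                  [2.0, 3.0, 7.0]])

    M[0] /= 3.0                   # multiply row 1 by 1/3 to get a leading 1
    M[1] -= 2.0 * M[0]            # add -2 times row 1 to row 2
    M[1] *= 3.0 / 5.0             # multiply row 2 by 3/5
    M[0] -= (2.0 / 3.0) * M[1]    # add -2/3 times row 2 to row 1

    assert np.allclose(M, [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
    # The same answer comes from a library solver:
    assert np.allclose(np.linalg.solve([[3.0, 2.0], [2.0, 3.0]], [8.0, 7.0]), [2.0, 1.0])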


9.3 Matrix representation of a linear transformation

Let T be a linear transformation from the n-dimensional space V into the m-dimensional space W. Let x1, . . . , xn be an ordered basis for V and y1, . . . , ym be an ordered basis for W. Let

    T x1 = Σ_{i=1}^m αi,1 yi,
    T x2 = Σ_{i=1}^m αi,2 yi,
      . . .
    T xn = Σ_{i=1}^m αi,n yi.

The m × n array of numbers

            [ α1,1  . . .  α1,n ]
    M(T ) = [   .           .   ]
            [ αm,1  . . .  αm,n ]

is the matrix representation of T with respect to the ordered bases (xj), (yi). Note that the jth column of this matrix is the coordinate vector of T xj with respect to the ordered basis y1, . . . , ym. This representation provides an isomorphism between matrices and linear transformations. The proof is left as an exercise.

74 Proposition Let V be an n-dimensional vector space, and let W be an m-dimensional vector space. Fix an ordered basis for each, and let M(T ) be the matrix representation of a linear transformation T : V → W . Then the mapping

T → M(T ) is a linear isomorphism from L(V,W ) to M(m, n).

Given the coordinate vector of an element x of V , we can use it to compute the coordinates of T x. Let x = Σ_{j=1}^n ξj xj. Then

    T x = Σ_{j=1}^n ξj T xj
        = Σ_{j=1}^n ξj Σ_{i=1}^m αi,j yi
        = Σ_{i=1}^m ( Σ_{j=1}^n αi,j ξj ) yi


The coordinates of T x with respect to (yi) are given by

    (T x)i = Σ_{j=1}^n αi,j ξj.

That is, the coordinate vector of T x is M(T ) times the column vector of coordinates of x.

The matrix representation can be thought of in the following terms. Let X denote the coordinate mapping of V onto Rn with respect to the ordered basis x1, . . . , xn. Similarly Y denotes the coordinate mapping from W onto Rm. In order to compute the coordinates of T z, we first find the coordinate vector Xz, and then multiply by the matrix M(T ), as the following "commutative diagram" shows.

               T
        V ----------→ W
        |             |
      X |             | Y
        ↓             ↓
        Rn ---------→ Rm
             M(T )

75 Example (The matrix of the coordinate mapping) Consider the case V = Rm. That is, elements of V are already thought of as ordered lists of real numbers. Let x1, . . . , xm be an ordered basis, and let X denote the coordinate mapping from Rm with this basis to Rm with the standard basis. Then M(X) is simply the matrix whose jth column is xj. □

76 Example (Matrices as linear transformations) Let A = [αi,j] be an m × n matrix and let

        [ x1 ]
    x = [  .  ]
        [ xn ]

be an n × 1 matrix. The matrix product Ax is an m × 1 matrix whose ith row is

    Σ_{j=1}^n αi,j xj,   i = 1, . . . , m.

Then T : x 7→ Ax defines a linear transformation from Rn to Rm. The matrix M(T ) of this transformation with respect to the standard ordered bases (the bases of unit coordinate vectors) is just A. □
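To make the recipe concrete, here is a small numpy sketch (with made-up bases and map; the helper name coords is ad hoc) that builds M(T) column by column from the coordinates of T xj and checks that the coordinate vector of T z is M(T) times the coordinate vector of z.

    import numpy as np

    # A linear map T on R^2, given in standard coordinates by a matrix
    T_std = np.array([[2.0, 1.0],
                      [0.0, 3.0]])

    # Ordered bases (as columns) for the domain and the codomain
    X = np.array([[1.0, 1.0],
                  [0.0, 1.0]])      # x1, x2
    Y = np.array([[2.0, 0.0],
                  [1.0, 1.0]])      # y1, y2

    # Coordinates of v with respect to the basis whose columns form P are P^{-1} v
    coords = lambda P, v: np.linalg.solve(P, v)

    # jth column of M(T) = coordinates of T x_j with respect to y1, y2
    M_T = np.column_stack([coords(Y, T_std @ X[:, j]) for j in range(2)])

    # Check: coordinate vector of T z equals M(T) times the coordinate vector of z
    z = np.array([3.0, -2.0])
    assert np.allclose(coords(Y, T_std @ z), M_T @ coords(X, z))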

77 Definition The rank of a matrix is the largest number of linearly independent columns. It follows from Proposition 49 that:

78 Proposition An m × m matrix has an inverse if and only if it has rank m.


79 Example (The identity matrix) What is the matrix representation for the identity mapping I : Rm → Rm?

           [ 1        0 ]
    M(I) = [    ⋱       ]
           [ 0        1 ]

is the m × m identity matrix I. If the transformation T is invertible, so that TT −1 = I, then M(T )M(T −1) = I. The matrix M(T −1) is naturally referred to as the inverse of the matrix M(T ). In general, if A defines an invertible transformation from Rm onto Rm, the matrix A−1 satisfies AA−1 = A−1A = I. □

80 Example (The zero matrix) The matrix for the zero transformation 0 : Rn → Rm is

    [ 0  · · ·  0 ]
    [ .         . ]
    [ 0  · · ·  0 ],

the m × n zero matrix. □

The transpose of a matrix is the matrix formed by interchanging rows and columns. This definition is justified by the following lemma.

81 Lemma M(T ′)i,j = M(T )j,i.

Proof :
    M(T ′)i,j = (ei, T ′ej) = (T ei, ej) = (ej, T ei) = M(T )j,i.

9.4 Gershgorin's Theorem

For a diagonal matrix, the diagonal elements of the matrix are the eigenvalues of the corresponding linear transformation. Even if the matrix is not diagonal, its eigenvalues are "close" to the diagonal elements.

82 Gershgorin's Theorem Let A = [αi,j] be an m × m real matrix. If λ is an eigenvalue of A, that is, if Ax = λx for some nonzero x, then for some i,

    |λ − αi,i| ⩽ Σ_{j : j≠i} |αi,j|.

Proof : (Cf. Franklin [12, p. 162].) Let x be an eigenvector of A belonging to the eigenvalue λ. Choose i so that |xi| = max{|x1|,..., |xm|},


and note that since x ≠ 0, we have |xi| > 0. Then by the definition of eigenvalue and eigenvector,

(λI − A)x = 0.

Now the ith component of the vector (λI − A)x is just (λ − αi,i)xi − Σ_{j : j≠i} αi,j xj = 0, so

    (λ − αi,i)xi = Σ_{j : j≠i} αi,j xj,

so taking absolute values,

    |λ − αi,i| |xi| = | Σ_{j : j≠i} αi,j xj | ⩽ Σ_{j : j≠i} |αi,j| |xj|,

so dividing by |xi| > 0,

    |λ − αi,i| ⩽ Σ_{j : j≠i} |αi,j| (|xj| / |xi|) ⩽ Σ_{j : j≠i} |αi,j|.
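A quick numerical illustration (a random matrix, purely for checking): every eigenvalue of A, real or complex, lies in at least one of the Gershgorin discs described by the theorem.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 5))

    eigenvalues = np.linalg.eigvals(A)                      # possibly complex
    radii = np.abs(A).sum(axis=1) - np.abs(np.diag(A))      # off-diagonal row sums

    for lam in eigenvalues:
        # lambda lies in at least one disc |lambda - a_ii| <= sum_{j != i} |a_ij|
        assert any(abs(lam - A[i, i]) <= radii[i] + 1e-12 for i in range(A.shape[0]))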

A square matrix A has a dominant diagonal if for each i,

    |αi,i| > Σ_{j : j≠i} |αi,j|.

Note that any diagonal matrix with nonzero diagonal elements has a dominant diagonal. This leads to the following corollary of Gershgorin’s Theorem.

83 Corollary If A is a nonnegative square matrix with a dominant diagonal, then every eigen- value of A is strictly positive.

Proof : Since A is nonnegative, if λ is an eigenvalue, then Gershgorin's theorem implies that for some i,

    αi,i − λ ⩽ |λ − αi,i| ⩽ Σ_{j : j≠i} αi,j,

so

    0 < αi,i − Σ_{j : j≠i} αi,j ⩽ λ,

where the first inequality is the dominant diagonal property.

For application of this corollary and related results, see McKenzie [19].


9.5 Matrix representation of a composition

Let S take Rp → Rn be linear with matrix M(S) = [βj,k] (j = 1, . . . , n; k = 1, . . . , p). Let T take Rn → Rm be linear with matrix M(T ) = [αi,j] (i = 1, . . . , m; j = 1, . . . , n). Then T ◦ S : Rp → Rm is linear. What is M(TS)?

Let x = (x1, . . . , xp)′ be a p × 1 column vector. Then

    Sx = Σ_{j=1}^n ( Σ_{k=1}^p βj,k xk ) ej,

    T (Sx) = Σ_{i=1}^m Σ_{j=1}^n αi,j ( Σ_{k=1}^p βj,k xk ) ei.

Set

    γi,k = Σ_{j=1}^n αi,j βj,k.

Then

    T (Sx) = Σ_{i=1}^m ( Σ_{k=1}^p γi,k xk ) ei.

Thus M(TS) = [γi,k] (i = 1, . . . , m; k = 1, . . . , p). This proves the following theorem.

84 Theorem M(TS) = M(T )M(S). Thus multiplication of matrices corresponds to composition of linear transformations. Let S, T : Rn → Rm be linear transformations with matrices A, B. Then S + T is linear. The matrix for S + T is A + B where (A + B)i,j = αi,j + βi,j.

9.6 Change of basis

A linear transformation may have different matrix representations for different bases. Is there some way to tell if two matrices represent the same transformation? We shall answer this question for the important case where T maps an m-dimensional vector space V into itself, and we use the same basis for V as both the domain and the range.

Let A = [αi,j] represent T with respect to the ordered basis x1, . . . , xm, and let B = [βi,j] represent T with respect to the ordered basis y1, . . . , ym. That is,

    T xi = Σ_{k=1}^m αk,i xk   and   T yi = Σ_{k=1}^m βk,i yk.

Let X be the coordinate map from V into Rm with respect to the ordered basis x1, . . . , xm, and let Y be the coordinate map for the ordered basis y1, . . . , ym. Then X and Y have full rank and so have inverses. Consider the following commutative diagram.


               B
        Rm ----------→ Rm
        ↑              ↑
      Y |              | Y
        |      T       |
        V -----------→ V
        |              |
      X |              | X
        ↓              ↓
        Rm ----------→ Rm
               A

The mapping XY −1 from Rm onto Rm followed by A from Rm into Rm, which maps the upper left Rm into the lower right Rm, is the same as B followed by XY −1. Let C be the matrix representation of XY −1 with respect to the standard ordered basis of unit coordinate vectors. Then A = CBC−1 and B = C−1AC. (⋆)

85 Definition Two square matrices A and B are called similar if there is some nonsingular C such that (⋆) holds. So we have already proved half of the following. The second half is left for you. (See Apostol [6, Theorem 4.7, p. 110] if you get stuck.)

86 Theorem Two matrices are similar if and only if they represent the same linear transformation.

The following are corollaries, but have simple direct proofs.

87 Proposition If A, B are similar with A = CBC−1, then λ is an eigenvalue of A if it is an eigenvalue of B. If x is an eigenvector of A, C−1x is an eigenvector of B.

Proof : Suppose x is an eigenvector of A, Ax = λx. Let y = C−1x. Since A = CBC−1,

λx = Ax = CBC−1x = CBy.

Premultiplying by C−1, λy = λC−1x = C−1CBy = By.
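A small numpy sketch (illustrative, with random matrices) checking Proposition 87: similar matrices have the same eigenvalues, and if x is an eigenvector of A then C−1x is an eigenvector of B.

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.normal(size=(4, 4))
    C = rng.normal(size=(4, 4))            # almost surely nonsingular
    A = C @ B @ np.linalg.inv(C)

    eig_A, vec_A = np.linalg.eig(A)
    eig_B = np.linalg.eigvals(B)

    # Every eigenvalue of A is (numerically) an eigenvalue of B
    for lam in eig_A:
        assert np.min(np.abs(eig_B - lam)) < 1e-8

    # If x is an eigenvector of A for eigenvalue lam, then C^{-1} x is an eigenvector of B
    lam, x = eig_A[0], vec_A[:, 0]
    y = np.linalg.solve(C, x)
    assert np.allclose(B @ y, lam * y)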

88 Proposition If A and B are similar, then rank A = rank B.


Proof : We prove rank B ⩾ rank A. Symmetry completes the argument. Let z1, . . . , zk be a basis for the range of A, with zi = Ayi. Put wi = C−1yi. Then the Bwi are independent. To see this suppose

    0 = Σ_{i=1}^k αi(Bwi) = Σ_{i=1}^k αi C−1ACC−1yi = Σ_{i=1}^k αi C−1zi = C−1( Σ_{i=1}^k αi zi ).

Since C−1 is nonsingular, this implies Σ_{i=1}^k αi zi = 0, which in turn implies αi = 0, i = 1, . . . , k.

9.7 The Principal Axis Theorem

The next result describes the diagonalization of a symmetric matrix.

89 Definition A square matrix X is orthogonal if X′X = I, or equivalently X′ = X−1.

90 Principal Axis Theorem Let A : Rm → Rm be a symmetric matrix. Let x1, . . . , xm be an orthonormal basis for Rm made up of eigenvectors of A, with corresponding eigenvalues λ1, . . . , λm. Set

        [ λ1        0  ]
    Λ = [     ⋱        ] ,
        [ 0        λm  ]

and set X = [x1, . . . , xm]. Then A = XΛX−1, Λ = X−1AX, and X is orthogonal, that is, X−1 = X′.

Proof : Now X′X = I by orthonormality, so X′ = X−1. Pick any z and set y = X−1z, so z = Xy = Σ_{j=1}^m ηj xj. Then

    Az = Σ_{j=1}^m ηj Axj = Σ_{j=1}^m ηj (λj xj) = XΛy = XΛX−1z.

Since z is arbitrary A = XΛX−1.

This result is called the principal axis theorem because in the case where A is positive definite (see Definition 101 below), the columns of X are the principal axes of the ellipsoid {x : x · Ax = 1}. See Franklin [12, § 4.6, pp. 80–83].
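Here is a short numerical check of the Principal Axis Theorem; numpy's eigh conveniently returns an orthonormal basis of eigenvectors for a symmetric matrix (the example matrix below is made up).

    import numpy as np

    rng = np.random.default_rng(2)
    S = rng.normal(size=(4, 4))
    A = (S + S.T) / 2.0                 # a symmetric matrix

    lam, X = np.linalg.eigh(A)          # eigenvalues and orthonormal eigenvectors
    Lam = np.diag(lam)

    assert np.allclose(X.T @ X, np.eye(4))        # X is orthogonal: X'X = I
    assert np.allclose(A, X @ Lam @ X.T)          # A = X Lambda X' = X Lambda X^{-1}
    assert np.allclose(Lam, X.T @ A @ X)          # Lambda = X^{-1} A X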


9.8 Simultaneous diagonalization

91 Theorem (Simultaneous Diagonalization) Let A, B be symmetric m × m matrices. Then AB = BA if and only if there exists an orthonormal basis consisting of vectors that are eigenvectors of both A and B. In that case, letting X be the orthogonal matrix whose columns are these basis vectors, we have

    A = X ΛA X−1,
    B = X ΛB X−1,

where ΛA and ΛB are diagonal matrices of eigenvalues of A and B respectively.

Partial proof : (⇐=)

    AB = XΛAX−1 XΛBX−1 = XΛAΛBX−1 = XΛBΛAX−1 = XΛBX−1 XΛAX−1 = BA,

since diagonal matrices commute.

( =⇒ ) We shall prove the result for the special case where A has distinct eigenvalues. In this case, the eigenvectors associated with any eigenvalue are unique up to scalar multiplication. Let x be an eigenvector of A corresponding to eigenvalue λ. Suppose A and B commute. Then A(Bx) = BAx = B(λx) = λ(Bx). This means that Bx too is an eigenvector of A corresponding to λ, provided Bx ≠ 0. But as remarked above, this implies that Bx is a scalar multiple of x, so x is an eigenvector of B too. So let X be a matrix whose columns are a basis of orthonormal eigenvectors for both A and B. Then it follows that A = XΛAX−1, where ΛA is the diagonal matrix of eigenvalues, and similarly B = XΛBX−1.

We now present a sketch of the proof for the general case. The crux of the proof is that in general, the eigenspace M associated with λ may be more than one-dimensional, so it is harder to conclude that the eigenvectors of A and B are the same. To get around this problem observe that B2A = BBA = BAB = ABB = AB2, and in general, BnA = ABn, so that if Ax = λx, then A(Bnx) = BnAx = Bnλx = λ(Bnx). That is, for every n, the vector Bnx is also an eigenvector of A corresponding to λ (provided it is nonzero). Since the eigenspace M associated with λ is finite-dimensional, for some minimal k, the vectors x, Bx, . . . , Bkx are dependent. That is, there are α0, . . . , αk, not all zero, with

    α0 x + α1 Bx + α2 B2x + · · · + αk Bkx = 0.

Let µ1, . . . , µk be the roots of the polynomial α0 + α1y + α2y2 + · · · + αkyk. Then

    [(B − µ1I)(B − µ2I) · · · (B − µkI)] x = 0.

Set z = [(B − µ2I)(B − µ3I) · · · (B − µkI)]x. If k is minimal, then z ≠ 0. (This holds even if some of the µi are complex: independence over the real field implies independence over the complex field; just look at real and imaginary parts.) Therefore (B − µ1I)z = 0.

Claim: µ1 and z are real.


Proof of claim: Let µ1 = α + iβ and z = x + iy, so that B(x + iy) = (α + iβ)(x + iy). Therefore Bx = αx − βy and By = βx + αy (equate real and imaginary parts). Thus y′Bx = αy′x − βy′y, and by symmetry y′Bx = x′By = βx′x + αx′y. Now,

    α(y′x) − β(y′y) = β(x′x) + α(x′y),

so β = 0 or z = 0, i.e., x = 0 and y = 0. But z ≠ 0, so β = 0 and µ1 is real. Similarly each µi is real. □

Therefore z is a real linear combination of the vectors Bnx (all of which are eigenvectors of A) satisfying (B − µ1I)z = 0. In other words, Bz = µ1z, so z is an eigenvector of both B and A!

We now consider the orthogonal complement of z in the eigenspace M to recursively construct a basis for M composed of eigenvectors for both A and B. We must do this for each eigenvalue λ of A. More details are in Rao [21, Result (iii), pp. 41–42].

9.9 Trace

92 Definition Let A be an m × m matrix. The trace of A, denoted tr A, is defined by

    tr A = Σ_{i=1}^m αi,i.

The trace is a linear functional on the set of m × m matrices.

93 Lemma Let A and B be m × m matrices. Then

    tr(αA + βB) = α tr A + β tr B,    (11)
    tr(AB) = tr(BA).                  (12)

Proof : The proof of linearity (11) is straightforward. For the proof of (12), observe that

    tr AB = Σ_{i=1}^m Σ_{j=1}^m αi,j βj,i = Σ_{j=1}^m Σ_{i=1}^m βj,i αi,j = tr BA.
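As a sanity check (illustrative only), the following numpy sketch verifies the linearity of the trace and the identity tr(AB) = tr(BA), even though AB ≠ BA in general.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.normal(size=(4, 4))
    B = rng.normal(size=(4, 4))
    alpha, beta = 2.5, -1.0

    assert np.isclose(np.trace(alpha * A + beta * B),
                      alpha * np.trace(A) + beta * np.trace(B))   # equation (11)
    assert np.isclose(np.trace(A @ B), np.trace(B @ A))           # equation (12)
    assert not np.allclose(A @ B, B @ A)                          # even though AB != BA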

Equation (11) says that the trace is a linear functional on the vector space M(m, m) of m × m matrices. The next result says that up to a scalar multiple it is the only linear functional to satisfy (12).

94 Proposition If ℓ is a linear functional on M(m, m) satisfying ℓ(AB) = ℓ(BA) for all A, B ∈ M(m, m), then there is a constant α such that for all A ∈ M(m, m), ℓ(A) = α tr A.


Proof : To show uniqueness, let ℓ: M(m, m) → R be a linear functional satisfying

ℓ(AB) = ℓ(BA) for all A, B ∈ M(m, m).

Now ℓ belongs to the space L(M(m, m), R). The ordered basis on M(m, m) induces a matrix representation for ℓ as an mm × 1 matrix, call it L, so that

    ℓ(A) = Σ_{i=1}^m Σ_{j=1}^m Lij Aij.

We also know that ℓ(AB − BA) = 0 for every A, B ∈ M(m, m). That is,

    Σ_{i=1}^m Σ_{j=1}^m Lij (AB − BA)ij = Σ_{i=1}^m Σ_{j=1}^m Σ_{k=1}^m Lij { Aik Bkj − Bik Akj } = 0.    (13)

Now consider the matrix A with all its entries zero except for the ith row, which is all ones, and the matrix B with all its entries zero except for the jth column, which is all ones. If i ≠ j, then the i, j entry of AB is m and the rest are zero, whereas all the entries of BA are zero. In this case, (13) implies Lij = 0.

Next consider the matrices A and B where the nonzero entries of A are rows i and j, which consist of ones; and the nonzero entries of B are column i, which consists of ones, and column j, which is a column of minus ones. Then BA = 0 and AB is zero except for (AB)ii = (AB)ji = m and (AB)ij = (AB)jj = −m. Since Lij = 0 whenever i ≠ j, equation (13) reduces to Lii m + Ljj(−m) = 0, which implies Lii = Ljj. Let α be the common value of the diagonal elements Lii. We have just shown that ℓ(A) = α tr A.

The trace of a matrix depends only on the linear transformation of Rm into Rm that it represents. (The trace is also equal to the sum of the characteristic roots.)

95 Proposition The trace of a matrix depends only on the linear transformation of Rm into Rm that it represents. In other words, if A and B are similar matrices, that is, if B = C−1AC, then tr B = tr A.

Proof : Theorem 86 asserts that two matrices represent the same transformation if and only if they are similar. By Lemma 93, equation (12),

tr B = tr C−1AC = tr ACC−1 = tr A.

96 Corollary Let V be an m-dimensional inner product space. There is a unique linear functional tr on L(V,V ) satisfying

tr ST = tr TS for all S, T ∈ L(V,V ), and tr I = m.


97 Theorem If A is symmetric and idempotent, then tr A = rank A.

Proof : Since A is symmetric, A = XBX−1 where X = [x1, . . . , xm] is an orthogonal matrix whose columns are eigenvectors of A, and B is a diagonal matrix whose diagonal elements are the eigenvalues of A, which are either 0 or 1 (idempotence implies λ2 = λ for each eigenvalue λ). Thus tr B is the number of nonzero eigenvalues of A. Also rank B is the number of nonzero diagonal elements. Thus tr B = rank B, but since A and B are similar, tr A = tr B = rank B = rank A.

It also follows that on the space of symmetric matrices, the trace can be used to define an inner product.

98 Proposition Let X denote the linear space of symmetric m × m real matrices. Then

(A, B) = tr AB is an inner product.

Proof : Lemma 93, equation (12) shows that (A, B) = (B,A) so IP.1 is satisfied. Moreover equation (11) implies

    (A, B + C) = tr A(B + C) = tr(AB + AC) = tr AB + tr AC = (A, B) + (A, C),
    (αA, B) = tr αAB = α tr AB = α(A, B),
    (A, αB) = tr AαB = α tr AB = α(A, B),

so IP.2 and IP.3 are satisfied. To see that IP.4 is satisfied, observe that if A is a symmetric matrix, then (AA)ii is just the inner product of the ith row of A with itself, which is ⩾ 0 and equals zero only if the row is zero. Thus (A, A) = tr AA = Σ_i (AA)ii ⩾ 0, and (A, A) = 0 only if every row of A is zero, that is, if A itself is zero.

99 Exercise What does it mean for symmetric matrices A and B to be orthogonal under this inner product? □

9.10 Matrices and orthogonal projection

Section 4.3 discussed orthogonal projection as a linear transformation. In this section we discuss matrix representations for orthogonal projection. Let M be a k-dimensional subspace of Rm, and let b1, . . . , bk be a basis for M. Given a vector y, the orthogonal projection yM of y is the vector in M that minimizes the distance to y (Proposition 34). The difference y⊥ = y − yM is orthogonal to M (Theorem 31). The next result is crucial to the statistical analysis of linear regression models.

100 Least squares regression Let B be the m × k matrix with columns b1, . . . , bk, where {b1, . . . , bk} is a basis for the k-dimensional subspace M. Given a vector y in Rm, the orthogonal projection yM of y onto M satisfies

    yM = B(B′B)−1B′y.


Proof : The first thing to note is that since {b1, . . . , bk} is a basis for M, the matrix B has rank k, so by Corollary 54, the k × k matrix B′B has rank k, so by Proposition 78 it is invertible. Next note that the m × 1 column vector B(B′B)−1B′y belongs to M. In fact, setting

a = (B′B)−1B′y

(a k × 1 column matrix), we see that it is the linear combination

    Ba = Σ_{j=1}^k αj bj

of basis vectors. Thus by the Orthogonal Projection Theorem 31, to show that yM = Ba, it suffices to show that y − Ba is orthogonal to M. This in turn is equivalent to y − Ba being orthogonal to each basis vector bj. Now for any x, the k × 1 column matrix B′x has as its jth (row) entry the dot product bj · x. Thus all we need do is show that B′(y − Ba) = 0. To this end, compute

    B′(y − Ba) = B′y − B′B(B′B)−1B′y = B′y − B′y = 0.

A perhaps more familiar way to restate this result is that the vector a of coefficients that minimizes the sum of squared residuals, (y − Ba) · (y − Ba), is given by a = (B′B)−1B′y.
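The formula is easy to try out. The following numpy sketch (with random data, purely illustrative) computes a = (B′B)−1B′y, checks that the residual y − Ba is orthogonal to every column of B, compares with numpy's least-squares routine, and confirms that the projection matrix B(B′B)−1B′ is symmetric and idempotent with trace k (cf. Theorem 97).

    import numpy as np

    rng = np.random.default_rng(4)
    m, k = 6, 3
    B = rng.normal(size=(m, k))          # columns form a basis for a k-dimensional subspace
    y = rng.normal(size=m)

    a = np.linalg.solve(B.T @ B, B.T @ y)    # a = (B'B)^{-1} B'y
    y_M = B @ a                              # orthogonal projection of y onto col(B)

    assert np.allclose(B.T @ (y - y_M), 0.0)          # residual is orthogonal to M
    a_lstsq, *_ = np.linalg.lstsq(B, y, rcond=None)   # same coefficients from least squares
    assert np.allclose(a, a_lstsq)

    # The projection matrix P = B (B'B)^{-1} B' is symmetric and idempotent, and tr P = k
    P = B @ np.linalg.solve(B.T @ B, B.T)
    assert np.allclose(P, P.T) and np.allclose(P @ P, P)
    assert np.isclose(np.trace(P), k)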

10 Quadratic forms

We introduced quadratic forms in the proof of Theorem 70. We go a little deeper here. If you want to know even more, I recommend my on-line notes [9]. Let A be an n × n symmetric matrix, and let x be an n-vector. Then x · Ax is a scalar, and

    x · Ax = Σ_{i=1}^n Σ_{j=1}^n αi,j ξi ξj.    (14)

(We may also write this as x′Ax in matrix notation.) The mapping Q : x ↦ x · Ax is the quadratic form defined by A.⁶

101 Definition A symmetric matrix A (or its associated quadratic form) is called

• positive definite if x · Ax > 0 for all nonzero x.

6 For decades I was baffled by the term form. I once asked Tom Apostol at a faculty cocktail party what it meant. He professed not to know (it was a cocktail party, so that is excusable), but suggested that I should ask John Todd. He hypothesized that mathematicians don't know the difference between form and function, a clever reference to modern architectural philosophy. I was too intimidated by Todd to ask, but I subsequently learned (where, I can't recall) that form refers to a polynomial function in several variables where each term in the polynomial has the same degree. (The degree of the term is the sum of the exponents. For example, in the expression xyz + x2y + xz + z, the first two terms have degree three, the third term has degree two, and the last one has degree one. It is thus not a form.) This is most often encountered in the phrases linear form (each term has degree one) or quadratic form (each term has degree two).


• negative definite if x · Ax < 0 for all nonzero x.

• positive semidefinite if x · Ax ⩾ 0 for all x.

• negative semidefinite if x · Ax ⩽ 0 for all x.

We want all our (semi)definite matrices to be symmetric so that their eigenvectors generate an orthonormal basis for Rn. If A is not symmetric, then (A + A′)/2 is symmetric and x · Ax = x · ((A + A′)/2)x for any x, so we do not lose much applicability by this assumption. Some authors use the term quasi (semi)definite when they do not wish to impose symmetry.

10.1 Diagonalization of quadratic forms

By the Principal Axis Theorem 90 we may write

A = XΛX′,

where X is an orthogonal matrix with columns that are eigenvectors of A, and Λ is a diagonal matrix of eigenvalues of A. Then the quadratic form can be written in terms of the diagonal matrix Λ:

    x · Ax = x′Ax = x′XΛX′x = y′Λy = Σ_{i=1}^n λi yi²,

where y = X′x.

102 Proposition (Eigenvalues and definiteness) The symmetric matrix A is

1. positive definite if and only if all its eigenvalues are strictly positive.

2. negative definite if and only if all its eigenvalues are strictly negative.

3. positive semidefinite if and only if all its eigenvalues are nonnegative.

4. negative semidefinite if and only if all its eigenvalues are nonpositive.

Proof : As above, write

    x′Ax = Σ_{i=1}^n λi yi²,

where the λi are the eigenvalues of A. All the statements above follow from this equation, the fact that yi² ⩾ 0 for all i, and the observation that y = X′x ranges over all of Rn as x does.
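For a concrete check (the example matrix below is a familiar positive definite one), the following sketch classifies definiteness by the signs of the eigenvalues and compares with the definition evaluated at random vectors.

    import numpy as np

    A = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])    # symmetric, and in fact positive definite

    lam = np.linalg.eigvalsh(A)
    assert np.all(lam > 0)                # all eigenvalues strictly positive

    rng = np.random.default_rng(5)
    for _ in range(100):
        x = rng.normal(size=3)
        if np.linalg.norm(x) > 0:
            assert x @ A @ x > 0          # x . Ax > 0 for nonzero x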

103 Proposition (Definiteness of the inverse) If A is positive definite (negative definite), then A−1 exists and is also positive definite (negative definite).


Proof : First off, how do we know the inverse of A exists? Suppose Ax = 0. Then x · Ax = x · 0 = 0. Since A is positive definite, we see that x = 0. Therefore A is invertible.

Here are two proofs of the proposition.

First proof. Since (Ax = λx) =⇒ (x = λA−1x) =⇒ (A−1x = (1/λ)x), the eigenvalues of A and A−1 are reciprocals, so they must have the same sign. Apply Proposition 102.

Second proof. x · A−1x = y · Ay, where y = A−1x.

11 Determinants

The main quick references here are Apostol [6, Chapter 3] and Dieudonné [11, Appendix A.6]. The main things to remember are:

• The determinant assigns a number to each square matrix A, denoted either det A or |A|. A matrix is singular if its determinant is zero, otherwise it is nonsingular. For an n×n identity matrix, det I = 1.

• det (AB) = det A · det B.

• Thus a matrix has an inverse if and only if its determinant is nonzero.

• Multiplying a row or a column by a scalar multiplies the determinant by the same amount. Consequently for an n × n matrix A,

det (−A) = (−1)n det A.

• Also consequently, the determinant of a diagonal matrix is the product of its diagonal elements.

• Adding a multiple of one row to another does not change the determinant.

• Consequently, the determinant of an upper (or lower) triangular matrix is the product of its diagonal elements.

• Moreover if a square matrix A is block upper triangular, that is, of the form

        [ B  C ]
    A = [ 0  D ],

where B and D are square, then det A = det B ·det D. Likewise for block lower triangular matrices.

• det A = det A′.

• The determinant can be defined recursively in terms of minors (determinants of submatrices).


• The inverse of a matrix can be computed in terms of these minors. The inverse of A is the transpose of its cofactor matrix divided by det A.

• Cramer's Rule: If Ax = b for a nonsingular matrix A, then

    xi = det(A1, . . . , Ai−1, b, Ai+1, . . . , An) / det A.

• The determinant is the oriented volume of the n-dimensional "cube" formed by its columns.

• The determinant det (λI − A), where A and I are n × n, is an nth degree polynomial in λ, called the characteristic polynomial of A.

• A root (real or complex) of the characteristic polynomial of A is called a characteristic root of A. Characteristic roots that are real are also eigenvalues. If nonzero x belongs to the null space of λI − A, then it is an eigenvector corresponding to the eigenvalue λ.

• The determinant of A is the product of its characteristic roots.

• If A is symmetric, then det A is the product of its eigenvalues.

• If A has rank k then every minor submatrix of size greater than k has zero determinant and there is at least one minor of order k with nonzero determinant.

• The determinant of an orthogonal matrix is ±1.
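Several of these bullet points are easy to confirm numerically; the following numpy sketch (illustrative spot checks only) exercises a few of them.

    import numpy as np

    rng = np.random.default_rng(8)
    n = 4
    A = rng.normal(size=(n, n))
    B = rng.normal(size=(n, n))

    assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
    assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))
    assert np.isclose(np.linalg.det(-A), (-1) ** n * np.linalg.det(A))

    # Adding a multiple of one row to another leaves the determinant unchanged
    A2 = A.copy()
    A2[1] += 2.5 * A2[0]
    assert np.isclose(np.linalg.det(A2), np.linalg.det(A))

    # The determinant of a triangular matrix is the product of its diagonal
    U = np.triu(A)
    assert np.isclose(np.linalg.det(U), np.prod(np.diag(U)))

    # An orthogonal matrix has determinant +1 or -1
    Q, _ = np.linalg.qr(A)
    assert np.isclose(abs(np.linalg.det(Q)), 1.0)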

11.1 Determinants as multilinear forms

There are several ways to think about determinants. Perhaps the most useful is as an alternating multilinear n-form: A function φ : Rn × · · · × Rn → R (n copies) is multilinear if it is linear in each variable separately. That is, for each i = 1, . . . , n,

φ(x1, . . . , xi−1, αxi + βyi, xi+1, . . . , xn)

= αφ(x1, . . . , xi−1, xi, xi+1, . . . , xn) + βφ(x1, . . . , xi−1, yi, xi+1, . . . , xn).

A consequence of this is that if any xi is zero, then so is φ. That is,

φ(x1, . . . , xi−1, 0, xi+1, . . . , xn) = 0.

The multilinear function φ is alternating if whenever xi = xj = z for distinct i and j, then

φ(x1,...,z,...,z,...,xn) = 0. The reason for this terminology is the following lemma.

104 Lemma The multilinear function φ is alternating if and only if interchanging xi and xj changes the sign of φ, that is,

φ(x1, . . . , xi, . . . , xj, . . . , xn) = −φ(x1, . . . , xj, . . . , xi, . . . , xn).


Proof : Suppose first that φ is alternating. Then

0 = φ(x1, . . . , xi + xj, . . . , xj + xi, . . . , xn)

= φ(x1, . . . , xi, . . . , xj + xi, . . . , xn) + φ(x1, . . . , xj, . . . , xj + xi, . . . , xn)

= φ(x1, . . . , xi, . . . , xj, . . . , xn) + φ(x1, . . . , xi, . . . , xi, . . . , xn)
    + φ(x1, . . . , xj, . . . , xj, . . . , xn) + φ(x1, . . . , xj, . . . , xi, . . . , xn)
  (the second and third terms are zero since φ is alternating)

= φ(x1, . . . , xi, . . . , xj, . . . , xn) + φ(x1, . . . , xj, . . . , xi, . . . , xn).

So φ(x1, . . . , xi, . . . , xj, . . . , xn) = −φ(x1, . . . , xj, . . . , xi, . . . , xn). Now suppose interchanging xi and xj changes the sign of φ. Then if xi = xj = z,

φ(x1,...,z,...,z,...,xn) = −φ(x1,...,z,...,z,...,xn),

which implies φ(x1,...,z,...,z,...,xn) = 0, so φ is alternating.

There is an obvious identification of n×n square matrices with the elements of Rn ×· · ·×Rn. Namely A ↔ (A1,...,An), where you will recall Aj denotes the jth column of A interpreted as a vector in Rn. (We could have used rows just as well.) Henceforth, for a multilinear form φ and n × n matrix A, we write φ(A) for φ(A1,...,An). The next fact is rather remarkable, so pay close attention.

105 Proposition For every n × n matrix A, there is a number det A with the property that for any alternating multilinear n-form φ,

φ(A) = det A · φ(I). (⋆)

Proof : Before we demonstrate the proposition in general, let us start with a special case, n = 2. Let

    A = [ α1,1  α1,2 ]
        [ α2,1  α2,2 ].

Then

φ(A) = φ(A1,A2)

= φ(α1,1e1 + α2,1e2, α1,2e1 + α2,2e2)

= α1,1φ(e1, α1,2e1 + α2,2e2) + α2,1φ(e2, α1,2e1 + α2,2e2)

= α1,1α1,2φ(e1, e1) + α1,1α2,2φ(e1, e2) + α2,1α1,2φ(e2, e1) + α2,1α2,2φ(e2, e2)

= 0 + α1,1α2,2φ(e1, e2) + α2,1α1,2φ(e2, e1) + 0

= α1,1α2,2φ(e1, e2) − α2,1α1,2φ(e1, e2)

= (α1,1α2,2 − α2,1α1,2)φ(I),

so we see that det A = α1,1α2,2 − α2,1α1,2. Thus the whole of φ is determined by the single number φ(I).


This is true more generally. Write

    φ(A) = φ( Σ_{i1=1}^n αi1,1 ei1 , Σ_{i2=1}^n αi2,2 ei2 , . . . , Σ_{ij=1}^n αij,j eij , . . . , Σ_{in=1}^n αin,n ein ).

Now expand this using linearity in the first component:

    φ(A) = Σ_{i1=1}^n αi1,1 φ( ei1 , Σ_{i2=1}^n αi2,2 ei2 , . . . , Σ_{ij=1}^n αij,j eij , . . . , Σ_{in=1}^n αin,n ein ).

Repeating this for the other components leads to

    φ(A) = Σ_{i1=1}^n Σ_{i2=1}^n · · · Σ_{in=1}^n αi1,1 αi2,2 · · · αin,n φ(ei1 , ei2 , . . . , ein ).

Now consider φ(ei1 , ei2 , . . . , ein ). Since φ is alternating, this term is zero unless i1, i2, . . . , in are distinct. When these are distinct, then by switching pairs we get to ±φ(e1, e2, . . . , en), where the sign depends on whether we need an odd or an even number of switches.

It now pays to introduce some new terminology and notation. A permutation i is an ordered list i = (i1, . . . , in) of the numbers 1, 2, . . . , n. The signature sgn(i) of i is 1 if i can be put in numerical order by switching terms an even number of times and is −1 if it requires an odd number. (How do we know that sgn(i) is well defined?) It follows then that defining

    det A = Σ_i sgn(i) · αi1,1 αi2,2 · · · αin,n,    (15)

where the sum runs over all permutations i, satisfies the conclusion of the proposition.
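Formula (15) can be implemented directly, though at a cost of n! terms. The following Python sketch (the function names are ad hoc) computes the permutation sum and compares it with numpy's determinant.

    import numpy as np
    from itertools import permutations

    def sign(perm):
        """Signature of a permutation given as a tuple of 0-based indices."""
        perm = list(perm)
        sgn, n = 1, len(perm)
        for a in range(n):                  # put each value in place, counting swaps
            b = perm.index(a, a)
            if b != a:
                perm[a], perm[b] = perm[b], perm[a]
                sgn = -sgn
        return sgn

    def det_by_permutations(A):
        n = A.shape[0]
        # det A = sum over permutations i of sgn(i) * a_{i1,1} a_{i2,2} ... a_{in,n}
        return sum(sign(p) * np.prod([A[p[j], j] for j in range(n)])
                   for p in permutations(range(n)))

    rng = np.random.default_rng(6)
    A = rng.normal(size=(4, 4))
    assert np.isclose(det_by_permutations(A), np.linalg.det(A))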

This result still leaves the following question: Are there any alternating multilinear n-forms at all? The reason the result above does not settle this is that it would be vacuously true if there were none. Fortunately, it is not hard to verify that det itself is such an n-form.

106 Proposition The function A ↦ det A as defined in (15) is an alternating multilinear n-form.

Proof : Observe that in each product sgn(i) · αi1,1 αi2,2 · · · αin,n in the sum in (15) there is exactly one element from each row and each column of A. This makes it obvious that det(A1, . . . , αAj, . . . , An) = α det A, and it is straightforward to verify that det(A1, . . . , x + y, . . . , An) = det(A1, . . . , x, . . . , An) + det(A1, . . . , y, . . . , An).

To see that det is alternating, suppose column j of A equals column k of A with j ≠ k. Then for any permutation i we have ij ≠ ik, and there is exactly one permutation i′ satisfying i′p = ip for p ∉ {j, k}, i′j = ik, and i′k = ij. Now observe that sgn(i) = − sgn(i′), as it requires an odd number of interchanges to swap two elements in a list (why?). Thus we can rewrite (15) as

    det A = Σ_{i : sgn(i)=1} ( αi1,1 αi2,2 · · · αin,n − αi′1,1 αi′2,2 · · · αi′n,n ),

but each αi1,1 αi2,2 · · · αin,n − αi′1,1 αi′2,2 · · · αi′n,n = 0. (Why?) Therefore det A = 0, so det is alternating.


To sum things up we have:

107 Corollary An alternating multilinear n-form φ is identically zero if and only if φ(I) = 0. The determinant is the unique alternating multilinear n-form φ that satisfies φ(I) = 1. Any other alternating multilinear n-form φ is of the form φ = φ(I) · det.

11.2 Some simple consequences

108 Proposition If A′ is the transpose of A, then det A = det A′. (Write out a proof.)

109 Proposition Adding a scalar multiple of one column of A to a different column leaves the determinant unchanged. Likewise for rows.

Proof : By multilinearity, det(A1, . . . , Aj + αAk, . . . , Ak, . . . , An) = det(A1, . . . , Aj, . . . , Ak, . . . , An) + α det(A1, . . . , Ak, . . . , Ak, . . . , An), but det(A1, . . . , Ak, . . . , Ak, . . . , An) = 0 since det is alternating. The conclusion for rows follows from that for columns and Proposition 108 on transposes.

110 Proposition The determinant of an upper triangular matrix is the product of the diagonal entries.

Proof : Recall that an upper triangular matrix is one for which i > j implies αi,j = 0. (Diagonal matrices are also upper triangular.) Now examine equation (15). The only summand that is nonzero comes from the permutation (1, 2, 3, . . . , n), since for any other permutation there is some j satisfying ij > j. (Why?)

By the way, the result also holds for lower triangular matrices.

111 Lemma If A is n × n and φ is an alternating multilinear n-form, so is the form φA defined by φA(x1, . . . , xn) = φ(Ax1, . . . , Axn). Furthermore

    φA(I) = φ(A) = det A · φ(I).    (16)

Proof : That φA is an alternating multilinear n-form is straightforward. Therefore φA is proportional to φ. To see that the coefficient of proportionality is det A, consider φA(I). Direct computation shows that φA(I) = φ(A).

As an aside, I mention that we could have defined the determinant directly for linear transformations as follows. For a linear transformation T : Rn → Rn it follows that φT defined by φT (x1, . . . , xn) = φ(T x1, . . . , T xn) is an alternating n-form whenever φ is. It also follows that we could use (16) to define det T to be the scalar satisfying φT (x1, . . . , xn) = det T · φ(x1, . . . , xn). This is precisely Dieudonné's approach. It has the drawback that minors and cofactors are awkward to describe in his framework.

112 Theorem Let A and B be n × n matrices. Then

det AB = det A · det B.


Proof : Let φ be an alternating multilinear n-form. Applying (16) and (⋆), we see

φAB(I) = φ(AB) = det AB · φ(I).

On the other hand

φAB(I) = φA(B) = det B · φA(I) = det B · det A · φ(I).

Therefore det AB = det A · det B.

113 Corollary If det A = 0, then A has no inverse.

Proof : Observe that if A has an inverse, then

1 = det I = det A · det A−1 so det A ≠ 0.

114 Corollary The determinant of an orthogonal matrix is ±1.

Proof : Recall that A is orthogonal if A′A = I (Definition 89). By Theorem 112, det(A′) det(A) = 1, but by Proposition 108, det(A′) = det A, so (det A)2 = 1.

11.3 Minors and cofactors

Different authors assign different meanings to the term minor. Given a square n × n matrix

        [ α1,1  . . .  α1,n ]
    A = [   .           .   ]
        [ αn,1  . . .  αn,n ],

Apostol [6, p. 87] defines the i, j minor of A to be the (n−1) × (n−1) submatrix obtained from A by deleting the ith row and jth column, and denotes it Ai,j. Gantmacher [14, p. 2] defines a minor of a (not necessarily square) matrix A to be the determinant of a square submatrix of A, and uses the following notation for minors in terms of the remaining rows and columns:

    A( i1, . . . , ip ; j1, . . . , jp ) = | αi1,j1  . . .  αi1,jp |
                                           |   .               .   |
                                           | αip,j1  . . .  αip,jp |

(the first list gives the retained rows and the second the retained columns).

Here we require that 1 ⩽ i1 < i2 < · · · < ip ⩽ n and 1 ⩽ j1 < j2 < · · · < jp ⩽ n. If the deleted (and hence remaining) rows and columns are the same, that is, if i1 = j1, i2 = j2, …,

ip = jp, then A( i1, . . . , ip ; j1, . . . , jp ) is called a principal minor of order p. I think the following hybrid terminology is useful: a minor submatrix is any square submatrix of A (regardless of whether A is square) and a minor of A is the determinant of a minor submatrix (same as


Gantmacher). To be on the safe side, I may use the redundant term minor determinant to mean minor.

Given a square n × n matrix

        [ α1,1  . . .  α1,n ]
    A = [   .           .   ]
        [ αn,1  . . .  αn,n ],

the cofactor cof αi,j of αi,j is the determinant obtained by replacing the jth column of A with the ith unit coordinate vector ei. That is,

    cof αi,j = det(A^1, . . . , A^{j−1}, ei, A^{j+1}, . . . , A^n).

By multilinearity we have for any column j,

    det A = Σ_{i=1}^n αi,j cof αi,j.

115 Lemma (Cofactors and minors) For any square matrix A,

    cof αi,j = (−1)^{i+j} det Ai,j,

th th where Ai,j is the minor submatrix obtained by deleting the i row and j column from A. Consequently ∑n i+j det A = (−1) αi,j det Ai,j. i=1 Similarly (interchanging the roles of rows and columns), for any row i,

    det A = Σ_{j=1}^n (−1)^{i+j} αi,j det Ai,j.

Proof : Cf. Apostol [6, Theorem 3.9, p. 87]. By definition

                | α1,1    . . .  α1,j−1    0  α1,j+1    . . .  α1,n   |
                |   .                      .                     .    |
                | αi−1,1  . . .  αi−1,j−1  0  αi−1,j+1  . . .  αi−1,n |
    cof αi,j =  | αi,1    . . .  αi,j−1    1  αi,j+1    . . .  αi,n   | .
                | αi+1,1  . . .  αi+1,j−1  0  αi+1,j+1  . . .  αi+1,n |
                |   .                      .                     .    |
                | αn,1    . . .  αn,j−1    0  αn,j+1    . . .  αn,n   |


Adding −αi,kei to column k does not change the determinant. Doing this for all k ≠ j yields

                | α1,1    . . .  α1,j−1    0  α1,j+1    . . .  α1,n   |
                |   .                      .                     .    |
                | αi−1,1  . . .  αi−1,j−1  0  αi−1,j+1  . . .  αi−1,n |
    cof αi,j =  | 0       . . .  0         1  0         . . .  0      | .
                | αi+1,1  . . .  αi+1,j−1  0  αi+1,j+1  . . .  αi+1,n |
                |   .                      .                     .    |
                | αn,1    . . .  αn,j−1    0  αn,j+1    . . .  αn,n   |

Now by repeatedly interchanging columns a total of j − 1 times we obtain

                             | 0  α1,1    . . .  α1,j−1    α1,j+1    . . .  α1,n   |
                             | .    .                                          .   |
                             | 0  αi−1,1  . . .  αi−1,j−1  αi−1,j+1  . . .  αi−1,n |
    cof αi,j = (−1)^{j−1}    | 1  0       . . .  0         0         . . .  0      | .
                             | 0  αi+1,1  . . .  αi+1,j−1  αi+1,j+1  . . .  αi+1,n |
                             | .    .                                          .   |
                             | 0  αn,1    . . .  αn,j−1    αn,j+1    . . .  αn,n   |

Interchanging rows i − 1 times yields

                             | 1  0       . . .  0         0         . . .  0      |
                             | 0  α1,1    . . .  α1,j−1    α1,j+1    . . .  α1,n   |
                             | .    .                                          .   |
    cof αi,j = (−1)^{i+j}    | 0  αi−1,1  . . .  αi−1,j−1  αi−1,j+1  . . .  αi−1,n | .
                             | 0  αi+1,1  . . .  αi+1,j−1  αi+1,j+1  . . .  αi+1,n |
                             | .    .                                          .   |
                             | 0  αn,1    . . .  αn,j−1    αn,j+1    . . .  αn,n   |

(Recall that (−1)^{j−1+i−1} = (−1)^{i+j}.) This last determinant is block diagonal, so we see that it is just |Ai,j|, which completes the proof. The conclusion for rows follows from that for columns and Proposition 108 on transposes.

By repeatedly applying this result, we can express det A in terms of 1 × 1 determinants. If we take the cofactors from an alien column or row, we have:

116 Lemma (Expansion by alien cofactors) Let A be a square n × n matrix. For any column j and any k ≠ j,

    Σ_{i=1}^n αi,j cof αi,k = 0.

Likewise for any row i and any k ≠ i,

    Σ_{j=1}^n αi,j cof αk,j = 0.


Proof : Consider the matrix A1 = [α̃i,j] obtained from A by replacing the kth column with another copy of column j. Then the i, k cofactors, i = 1, . . . , n, of A and A1 are the same. (The cofactors don't depend on the column they belong to, since it is replaced by a unit coordinate vector.) So by Lemma 115, |A1| = Σ_{i=1}^n α̃i,k cof αi,k = Σ_{i=1}^n αi,j cof αi,k. But |A1| = 0 since it has two identical columns. The conclusion for rows follows from that for columns and Proposition 108 on transposes.

The transpose of the cofactor matrix is also called the adjugate matrix of A. Combining the previous two lemmas yields the following on the adjugate.

117 Theorem (Cofactors and the inverse matrix) For a square matrix

        [ α1,1  . . .  α1,n ]
    A = [   .           .   ] ,
        [ αn,1  . . .  αn,n ]

we have

    [ α1,1  . . .  α1,n ]   [ cof α1,1  . . .  cof αn,1 ]   [ |A|         0  ]
    [   .           .   ]   [    .                 .    ] = [      ⋱         ] .
    [ αn,1  . . .  αn,n ]   [ cof α1,n  . . .  cof αn,n ]   [ 0         |A|  ]

That is, the product of A and its adjugate is (det A)In. Consequently, if det A ≠ 0, then

                        [ cof α1,1  . . .  cof αn,1 ]
    A−1 = (1/det A) ·   [    .                 .    ] .
                        [ cof α1,n  . . .  cof αn,n ]
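The cofactor machinery is easy to test numerically. The sketch below (illustrative; the function cofactor_matrix is an ad hoc helper) builds the cofactor matrix of a small random matrix, verifies a cofactor expansion and an alien expansion, and checks that A times its adjugate is (det A)I.

    import numpy as np

    def cofactor_matrix(A):
        n = A.shape[0]
        C = np.empty_like(A, dtype=float)
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)  # delete row i, column j
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)       # cof a_{i,j}
        return C

    rng = np.random.default_rng(7)
    A = rng.normal(size=(4, 4))
    C = cofactor_matrix(A)
    dA = np.linalg.det(A)

    # Expansion along column 0 gives det A; an alien expansion gives 0 (Lemma 116)
    assert np.isclose(A[:, 0] @ C[:, 0], dA)
    assert np.isclose(A[:, 0] @ C[:, 1], 0.0)

    adjugate = C.T                                      # transpose of the cofactor matrix
    assert np.allclose(A @ adjugate, dA * np.eye(4))    # Theorem 117
    assert np.allclose(np.linalg.inv(A), adjugate / dA)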

Combining this with Corollary 113 yields the following.

118 Corollary (Determinants and invertibility) A square matrix is invertible if and only if its determinant is nonzero.

Similar reasoning leads to the following theorem due to Jacobi. It expresses a pth order minor of the adjugate in terms of the corresponding complementary minor of A. The complement of the pth-order minor A( i1, . . . , ip ; j1, . . . , jp ) is the (n − p)th-order minor obtained by deleting rows i1, . . . , ip and columns j1, . . . , jp from A.

119 Theorem (Jacobi) For a square matrix

        [ α1,1  . . .  α1,n ]
    A = [   .           .   ] ,
        [ αn,1  . . .  αn,n ]

we have for any 1 ⩽ p ⩽ n,

           | cof α1,1  . . .  cof αp,1 |             | αp+1,p+1  . . .  αp+1,n |
    |A| ·  |    .                 .    |  =  |A|^p · |     .               .   | .
           | cof α1,p  . . .  cof αp,p |             | αn,p+1    . . .  αn,n   |


Proof : Observe, recalling Lemma 116 on alien cofactors, that

    [ α1,1    . . .  α1,p    α1,p+1    . . .  α1,n   ]   [ cof α1,1    . . .  cof αp,1    0  . . .  0 ]
    [   .              .        .                .   ]   [     .                  .       .         . ]
    [ αp,1    . . .  αp,p    αp,p+1    . . .  αp,n   ]   [ cof α1,p    . . .  cof αp,p    0  . . .  0 ]
    [ αp+1,1  . . .  αp+1,p  αp+1,p+1  . . .  αp+1,n ] · [ cof α1,p+1  . . .  cof αp,p+1  1       0   ]
    [   .              .        .                .   ]   [     .                  .          ⋱       ]
    [ αn,1    . . .  αn,p    αn,p+1    . . .  αn,n   ]   [ cof α1,n    . . .  cof αp,n    0       1   ]

        [ |A|        0    α1,p+1    . . .  α1,n   ]
        [      ⋱          .                  .    ]
        [ 0        |A|    αp,p+1    . . .  αp,n   ]
    =   [ 0  . . .  0     αp+1,p+1  . . .  αp+1,n ]
        [ .         .         .                .  ]
        [ 0  . . .  0     αn,p+1    . . .  αn,n   ]

and take determinants on both sides.

11.4 Characteristic polynomials

The characteristic polynomial f of a square matrix A is defined by f(λ) = det(λI − A). Roots of this polynomial are called characteristic roots of A.

120 Lemma Every eigenvalue of a matrix is a characteristic root, and every real characteristic root is an eigenvalue.

Proof : To see this note that if λ is an eigenvalue with eigenvector x ≠ 0, then (λI − A)x = λx − Ax = 0, so (λI − A) is singular, so det (λI − A) = 0. That is, λ is a characteristic root of A. Conversely, if λ is real and det(λI − A) = 0, then there is some nonzero x with (λI − A)x = 0, or Ax = λx.

121 Lemma The determinant of a square matrix is the product of its characteristic roots.

Proof : (Cf. [6, p. 106]) Let A be an n×n square matrix and let f be its characteristic polynomial. Then f(0) = det (−A) = (−1)n det A. On the other hand, we can factor f as

f(λ) = (λ − λ1) ··· (λ − λn)

where λ1, . . . , λn are its characteristic roots. Thus f(0) = (−1)n λ1 · · · λn.

122 Corollary The determinant of a symmetric matrix is the product of its eigenvalues.


11.5 The determinant as an “oriented volume”

There is another interpretation of the determinant det(x1, . . . , xn). Namely it is the n-dimensional volume of the parallelotope defined by the vectors x1, . . . , xn, perhaps multiplied by −1 depending on the "orientation" of the vectors. I don't want to go into detail on the notion of orientation or even to prove this assertion in any generality, but I shall present enough so that you can get a glimmer of what the assertion is.

First, what is a parallelotope? It is a polytope with parallel faces, for example, a cube. The parallelotope generated by the vectors x1, . . . , xn is the convex hull of zero and the vectors of the form xi1 + xi2 + · · · + xik, where 1 ⩽ k ⩽ n and i1, . . . , ik are distinct. For instance, in R2 the parallelotope generated by x and y is the plane parallelogram with vertexes 0, x, y, and x + y.

The notion of n-dimensional volume is straightforward. For n = 1, it is length, for n = 2, it is area, etc. Oriented volume is more complicated. It distinguishes parallelotopes based on the order that the vectors are presented. For instance, the angle swept out from x to y counterclockwise is positive, while from y to x it is clockwise. The oriented volume is positive in the first case and negative in the second.

Let's start simply in R2, with vectors a = (ax, ay) and b = (bx, by). See Figure 2. (The figure shows the parallelogram P spanned by a and b, together with the surrounding regions R1, R2, and R3.) The area


Figure 2. The area of the parallelotope determined by a and b.

(two-dimensional volume, if you will) of the parallelotope P generated by the origin, a, b, and a + b is the area of the large square determined by the origin and a + b less the area of the six regions labeled R1,...,R3. Thus

    area P = (ax + bx)(ay + by) − 2 area R1 − 2 area R2 − 2 area R3

= (ax + bx)(ay + by) − axay − bxby − 2aybx

    = axby − aybx.

This is just the determinant det(a, b), or the matrix determinant

    det A = det [ ax  bx ]
                [ ay  by ] .

I'll leave it to you to verify that if we interchange a and b, the "oriented" area changes sign. What determines the orientation? The matrix A defines a linear transformation from R2 into


itself. It maps the first standard basis element e1 into the first column of A, and the second basis element e2 into the second column of A. The basis elements define a "cube" of volume 1. The image of this cube under A is the parallelotope defined by the columns of A. The vector e2 is the vector e1 rotated 1/4 turn counterclockwise. If the second column of A is a counterclockwise rotation of the first column, then the columns of A have the same "orientation" as the basis vectors, and the determinant is positive.

Perhaps the simplest way to convince yourself that the determinant is the oriented volume of the parallelotope is to realize that oriented volume is an alternating multilinear form. It is clear that multiplying one of the defining vectors by a positive scalar multiplies the volume by the same factor. It is also clear that if two of the vectors are the same, then the parallelotope is degenerate and has zero volume. The additivity in the defining vectors can be appreciated, at least in two dimensions, by considering Figure 3.


Figure 3. The area of the parallelogram defined by x and y is additive in y.

For more details see, for instance, Franklin [12, § 6.6, pp. 153–157].

11.6 Computing inverses and determinants by Gauss' method

This section describes how to use the method of Gaussian elimination to find the inverse of a matrix and to compute its determinant. Let A be an invertible n × n matrix, so that

AA−1 = I.

To find the jth column x = (A−1)^j of A−1, recall from the definition of matrix multiplication that x satisfies Ax = I^j, where I^j is the jth column of the identity matrix. Indeed if A is invertible, then x is the unique vector with this property. Thus we can form the n × (n + 1) augmented coefficient matrix (A|I^j) and use the elementary row operations of Gaussian elimination to transform it to (I|x) = (I|(A−1)^j). The same sequence of row operations is employed to transform A into I regardless of which column I^j we use to augment with. So we can solve for all the columns simultaneously by augmenting with all the columns of I. That is, form the n × 2n augmented matrix (A|I^1, I^2, . . . , I^n) = (A|I). If we can transform this by elementary row operations into (I|X), it must be that X = A−1. Furthermore, if we cannot make the transformation, then A is not invertible.


You may ask, how can we tell if we cannot transform A into I? Perhaps it is possible, but we are not clever enough. To attack this question we must write down a specific algorithm. So here is one version of an algorithm for inverting a matrix by Gaussian elimination, or else showing that it is not invertible.

Let A^0 = A. At each stage t we apply an elementary row operation to transform A^{t−1} into A^t in such a way that

    det A^t ≠ 0 ⇐⇒ det A ≠ 0.

The rules for selecting the elementary row operation are described below. By stage t = n, either A^n = I or else we shall have shown that det A = 0, so A is not invertible.

Stage t: At stage t, assume that all the columns j = 1, . . . , t − 1 of A^{t−1} have been transformed into the corresponding columns of I, and that det A ≠ 0 ⇐⇒ det A^{t−1} ≠ 0.

  Step 1: Normalize the diagonal to 1.

    Case 1: a^{t−1}_{t,t} is nonzero. Divide row t by a^{t−1}_{t,t} to set a^t_{t,t} = 1. This has the side effect of dividing the determinant: det A^t = (1/a^{t−1}_{t,t}) det A^{t−1}.

    Case 2: a^{t−1}_{t,t} = 0, but there is a row i with i > t for which a^{t−1}_{i,t} ≠ 0. Add (1/a^{t−1}_{i,t}) times row i to row t. This sets a^t_{t,t} = 1, and leaves det A^t = det A^{t−1}.

    Case 3: a^{t−1}_{t,t} = 0, and there is no row i with i > t for which a^{t−1}_{i,t} ≠ 0. In this case, the first t columns of A^{t−1} must be dependent. This is because there are t column vectors whose only nonzero components are in the first t − 1 rows. This implies det A^{t−1} = 0, and hence det A = 0. This shows that A is not invertible.

  If Case 3 occurs, stop. We already know that A is not invertible. In Cases 1 and 2 proceed to:

  Step 2: Eliminate the off-diagonal elements. Since a^t_{t,t} = 1, for each i ≠ t multiply row t by a_{i,t} and subtract it from row i. This sets a^t_{i,t} = 0 for i ≠ t, and does not change det A^t.

This completes the construction of A^t from A^{t−1}. Proceed to stage t + 1, and note that all the columns j = 1, . . . , t of A^t have been transformed into the corresponding columns of I, and that det A ≠ 0 ⇐⇒ det A^t ≠ 0.

Now observe how this also can be used to calculate the determinant. Suppose the process runs to completion so that A^n = I. Every time a row was divided by its diagonal element a^{t−1}_{t,t} to normalize it, we had det A^t = (1/a^{t−1}_{t,t}) det A^{t−1}. Thus we have

    1 = det I = det A^n = ( ∏_{t : a^{t−1}_{t,t} ≠ 0} 1/a^{t−1}_{t,t} ) det A,

or

    det A = ∏_{t : a^{t−1}_{t,t} ≠ 0} a^{t−1}_{t,t}.
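The algorithm above is short enough to code directly. Here is one possible Python rendering (a sketch with ad hoc names; a production implementation would add pivoting safeguards); it returns the inverse and the determinant, or reports failure when the matrix is singular.

    import numpy as np

    def invert_by_gauss(A, tol=1e-12):
        """Gauss-Jordan inversion with determinant tracking, in the spirit of the algorithm above."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.eye(n)])          # augmented matrix (A | I)
        det = 1.0
        for t in range(n):
            if abs(M[t, t]) > tol:             # Case 1: divide row t by its pivot
                det *= M[t, t]
                M[t] /= M[t, t]
            else:                              # look for a lower row with a nonzero entry in column t
                rows = [i for i in range(t + 1, n) if abs(M[i, t]) > tol]
                if not rows:                   # Case 3: columns are dependent, A is singular
                    return None, 0.0
                i = rows[0]                    # Case 2: add a multiple of row i to row t
                M[t] += M[i] / M[i, t]
            for i in range(n):                 # Step 2: eliminate off-diagonal entries in column t
                if i != t:
                    M[i] -= M[i, t] * M[t]
        return M[:, n:], det

    A = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
    A_inv, det = invert_by_gauss(A)
    assert np.allclose(A_inv, np.linalg.inv(A)) and np.isclose(det, -1.0)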


One of the virtues of this approach is that it is extremely easy to program, and it is reasonably efficient. Each elementary row operation takes at most 2n multiplications and 2n additions, and at most n² elementary row operations are required, so the number of steps grows no faster than 4n³. Here are two illustrative examples, which, by the way, were produced (text and all) by a 475-line program written in perl of all things.

123 Example (Matrix inversion by Gaussian elimination) Invert the following 3 × 3 matrix using Gaussian elimination and compute its determinant as a byproduct.

        [ 0  0  1 ]
    A = [ 0  1  0 ]
        [ 1  0  0 ]

We use only two elementary row operations: dividing a row by a scalar, which also divides the determinant, and adding a multiple of one row to a different row, which leaves the determinant unchanged. Thus the determinant of A is just the product of all the scalars used to divide the rows of A to normalize its diagonal elements to ones. We shall keep track of this product in the variable µ as we go along. At any given stage, µ times the determinant of the left hand block is equal to the determinant of A. Before each step, we identify the target entry to be transformed. α1,1 is zero,

    µ = 1    [ 0  0  1 | 1  0  0 ]
             [ 0  1  0 | 0  1  0 ]
             [ 1  0  0 | 0  0  1 ]

so add row 3 to row 1. To eliminate α3,1 = 1,

    µ = 1    [ 1  0  1 | 1  0  1 ]
             [ 0  1  0 | 0  1  0 ]
             [ 1  0  0 | 0  0  1 ]

subtract row 1 from row 3. To normalize α3,3 = −1,

    µ = 1    [ 1  0  1 |  1  0  1 ]
             [ 0  1  0 |  0  1  0 ]
             [ 0  0 −1 | −1  0  0 ]

divide row 3 (and multiply µ) by −1. To eliminate α1,3 = 1,

    µ = −1   [ 1  0  1 | 1  0  1 ]
             [ 0  1  0 | 0  1  0 ]
             [ 0  0  1 | 1  0  0 ]

subtract row 3 from row 1. This gives us

    µ = −1   [ 1  0  0 | 0  0  1 ]
             [ 0  1  0 | 0  1  0 ]
             [ 0  0  1 | 1  0  0 ]

To summarize:

        [ 0  0  1 ]          [ 0  0  1 ]
    A = [ 0  1  0 ]   A−1 =  [ 0  1  0 ]
        [ 1  0  0 ]          [ 1  0  0 ]

And the determinant of A is −1. We could have gotten this result faster by interchanging rows 1 and 3, but it is hard (for me) to program an algorithm to recognize when to do this. □

124 Example (Gaussian elimination on a singular matrix) Consider the following 3 × 3 matrix.

        [ 1  1  2 ]
    A = [ 2  4  4 ]
        [ 3  6  6 ]

We shall use the method of Gaussian elimination to attempt to transform the augmented block matrix (A|I) into (I|A−1). To eliminate α2,1 = 2,

    µ = 1    [ 1  1  2 | 1  0  0 ]
             [ 2  4  4 | 0  1  0 ]
             [ 3  6  6 | 0  0  1 ]

multiply row 1 by 2 and subtract it from row 2. To eliminate α3,1 = 3,

    µ = 1    [ 1  1  2 |  1  0  0 ]
             [ 0  2  0 | −2  1  0 ]
             [ 3  6  6 |  0  0  1 ]

multiply row 1 by 3 and subtract it from row 3. To normalize α2,2 = 2,

    µ = 1    [ 1  1  2 |  1  0  0 ]
             [ 0  2  0 | −2  1  0 ]
             [ 0  3  0 | −3  0  1 ]

divide row 2 (and multiply µ) by 2. To eliminate α1,2 = 1,

    µ = 2    [ 1  1  2 |  1   0   0 ]
             [ 0  1  0 | −1  1/2  0 ]
             [ 0  3  0 | −3   0   1 ]


subtract row 2 from row 1. To eliminate α3,2 = 3,

    µ = 2    [ 1  0  2 |  2  −1/2  0 ]
             [ 0  1  0 | −1   1/2  0 ]
             [ 0  3  0 | −3    0   1 ]

multiply row 2 by 3 and subtract it from row 3. Now notice that the first 3 columns of the left-hand block are dependent, as each column has at most its first 2 entries nonzero, and any 3 vectors in a 2-dimensional space are dependent.

    µ = 2    [ 1  0  2 |  2  −1/2   0 ]
             [ 0  1  0 | −1   1/2   0 ]
             [ 0  0  0 |  0  −3/2   1 ]

Therefore A has no inverse, and the determinant of A is zero. (If we had interchanged columns 1 and 3, we would have arrived at this result sooner, but hindsight is hard to program.) □

125 Exercise Using the language of your choice, write a program to accept a square matrix and invert it by Gaussian elimination, or stop when you find it is not invertible. □


References

[1] A. C. Aitken. 1954. Determinants and matrices, 8th. ed. New York: Interscience.

[2] C. D. Aliprantis and K. C. Border. 2006. Infinite dimensional analysis: A hitchhiker’s guide, 3d. ed. Berlin: Springer–Verlag.

[3] C. D. Aliprantis and O. Burkinshaw. 1999. Problems in real analysis, 2d. ed. San Diego: Academic Press. Date of publication, 1998. Copyright 1999.

[4] T. W. Anderson. 1958. An introduction to multivariate statistical analysis. A Wiley Publication in Mathematical Statistics. New York: Wiley.

[5] T. M. Apostol. 1967. Calculus, 2d. ed., volume 1. Waltham, Massachusetts: Blaisdell.

[6] T. M. Apostol. 1969. Calculus, 2d. ed., volume 2. Waltham, Massachusetts: Blaisdell.

[7] S. Axler. 1997. Linear algebra done right, 2d. ed. Undergraduate Texts in Mathematics. New York: Springer.

[8] A. Berman and A. Ben-Israel. 1971. More on linear inequalities with applications to matrix theory. Journal of Mathematical Analysis and Applications 33(3):482–496. DOI: http://dx.doi.org/10.1016/0022-247X(71)90072-2

[9] K. C. Border. 2001. More than you wanted to know about quadratic forms. On-line note. http://www.hss.caltech.edu/~kcb/Notes/QuadraticForms.pdf

[10] C. Carathéodory. 1982. Calculus of variations, 2d. ed. New York: Chelsea. This was originally published in 1935 in two volumes by B. G. Teubner in Berlin as Variationsrechnung und Partielle Differentialgleichungen erster Ordnung. In 1956 the first volume was edited and updated by E. Hölder. The revised work was translated by Robert B. Dean and Julius J. Brandstatter and published in two volumes as Calculus of variations and partial differential equations of the first order by Holden-Day in 1965–66. The Chelsea second edition combines and revises the 1967 edition.

[11] J. Dieudonné. 1969. Foundations of modern analysis. Number 10-I in Pure and Applied Mathematics. New York: Academic Press. Volume 1 of Treatise on Analysis.

[12] J. Franklin. 2000. Matrix theory. Mineola, New York: Dover. Reprint of 1968 edition published by Prentice-Hall, Englewood Cliffs, N.J.

[13] D. Gale. 1989. Theory of linear economic models. Chicago: University of Chicago Press. Reprint of the 1960 edition published by McGraw-Hill.

[14] F. R. Gantmacher. 1959. Matrix theory, volume 1. New York: Chelsea.

[15] P. R. Halmos. 1974. Finite dimensional vector spaces. New York: Springer–Verlag. Reprint of the edition published by Van Nostrand, 1958.

[16] P. N. Klein. 2013. Coding the matrix: Linear algebra through computer science applications. Newton, Massachusetts: Newtonian Press.


[17] L. H. Loomis and S. Sternberg. 1968. Advanced calculus. Reading, Massachusetts: Addison–Wesley.

[18] S. MacLane and G. Birkhoff. 1993. Algebra, 3d. ed. New York: Chelsea.

[19] L. W. McKenzie. 1960. Matrices with dominant diagonals and economic theory. In K. J. Arrow, S. Karlin, and P. Suppes, eds., Mathematical Methods in the Social Sciences, 1959, number 4 in Stanford Mathematical Studies in the Social Sciences, chapter 4, pages 47–62. Stanford, California: Stanford University Press.

[20] F. J. Murray. 1941. An introduction to linear transformations in Hilbert space. Number 4 in Annals of Mathematics Studies. Princeton, New Jersey: Princeton University Press. Reprinted 1965 by Kraus Reprint Corporation, New York.

[21] C. R. Rao. 1973. Linear statistical inference and its applications, 2d. ed. Wiley Series in Probability and Mathematical Statistics. New York: Wiley.

[22] H. Theil. 1971. Principles of econometrics. New York: Wiley.

Index

∠ xy, 14

addition
    of scalars, 1
adjoint, 20
    existence, 20
adjugate, 50
algebraic dual, 6
alien cofactor, 50
angle, 14
Apostol, Tom, 41n

basis, 5
    Hamel, 5
    ordered, 7
    standard, 6
bilinear, 7
Boolean field, 1

Cauchy sequence, 9n, 16
characteristic polynomial, 43, 52
characteristic root, 44, 52
    v. eigenvalue, 52
cofactor, 48
column space, 27
complete inner product space, 9
complex numbers, 1
complex vector space, 2
coordinate mapping, 7
coordinates, 5

dependence, linear, 5
determinant, 45, 46
    = product of characteristic roots, 52
    and invertibility, 48, 51
    and row operations, 47
    as oriented volume, 53
    expansion by alien cofactors, 50
    Jacobi’s Theorem, 51
    of a linear transformation, 47
    of a product, 47
    of a transpose, 47
    of a triangular matrix, 47
    of an orthogonal matrix, 48
    recursive computation, 49
diagonal matrix, 28
dimension, 6
dominant diagonal matrix, 33
dot product, 8
dual space, 6
    algebraic, 6
    topological, 6, 14

eigenspace, 24
eigenvalue, 24
    multiplicity of, 24
    v. characteristic root, 52
eigenvector, 24
elementary row operations, 29
Euclidean n-space, 8

field, 1
    scalar, 1
form, 41
frame, 7

Gaussian elimination algorithm, 29
    for computing determinant, 55–58
    for matrix inversion, 55–58
Gershgorin’s Theorem, 32
GF(2), 1
Gram–Schmidt procedure, 11

Hilbert space, 9, 15, 16, 20
homomorphism, 6

identity matrix, 28
independence, linear, 5
inner product, 7, 40
inner product space
    complete = Hilbert, 9
inverse, 19
isometry, 23
isomorphism, 7

Jacobi’s theorem on determinants, 51

kernel, 18

L(V, W), 6
ℓp, 3
left inverse, 19
linear combination, 4
linear dependence, 5
linear functional, 6
    discontinuous, 17
linear independence, 5
linear operator, 6
linear space = vector space, 2
linear subspace, 4
linear transformation, 6
    idempotent, 23
    matrix representation of, 30

matrix, 27
    characteristic polynomial, 43
    determinant of, 45
    diagonal, 28
    identity, 28
    inverse, 32
    main diagonal, 28
    nonsingular, 43
    orthogonal, 36
    product, 27
    representation of a linear transformation, 30
    scalar multiple of, 27
    similarity of, 35, 39
    singular, 43
    sum, 27
    trace, 38
    transpose, 32
    triangular, 28
    zero, 28
minor, 48
minor determinant, 48
minor submatrix, 48
M(m, n), 3, 27
multilinear form, 44
    alternating, 44
multiplication
    of scalars, 1

norm, 8
null space, 18
nullity, 18

operator norm, 15
ordered basis, 7
orthogonal complement, 10
orthogonal matrix, 36
    determinant of, 48
orthogonal projection, 12
orthogonal transformation, 22
    is norm-preserving, 22
    preserves inner products, 22
orthogonality, 10
orthonormality, 10

parallelogram law, 9
permutation, 46
    signature, 46
Principal Axis Theorem, 36
principal minor of order p, 48

quadratic form, 41
    and eigenvalues, 42
    definite, semidefinite, 41

Rn, 2
rank, 18
rational numbers, 1
real numbers, 1
real vector space, 2
right inverse, 19
row space, 27

scalar, 1
scalar addition, 1
scalars, 1
self-adjoint transformation, 23
signature of a permutation, 46
similar matrices, 35, 39
skew-symmetric transformation, 23
span, 4
standard basis, 6
sum, 1
symmetric transformation, 23

Todd, John, 41n
topological dual, 6
trace, 38
transpose, 20

unit coordinate vectors, 2

V′, 6
V∗, 6
vector, 2
vector space, 2
    trivial, 3
vector subspace, 4

v. 2016.12.30::16.10