
Engineering 1–Summer 2012

The commercial message. Linearity is a key concept in mathematics and its applications. Linear objects are usually nice, smooth, clean, easy to handle. Nonlinear ones are slimy, temperamental, and full of dark corners in which monsters can hide. The world, unfortunately, has a strong tendency toward nonlinearity, but there is no way that we can understand nonlinear phenomena without first having a good understanding of linear ones. In fact, one of the most common first approaches to a nonlinear problem is to approximate it by a linear one.

1 Welcome to the world of linear algebra: Vector Spaces

Vector spaces, also known as linear spaces, come in two flavors, real and complex. The main difference between them is what is meant by a scalar. When working with real vector spaces, a scalar is a real number. When working with complex vector spaces, a scalar is a complex number. The important thing is not to mix the two flavors. You either work exclusively with real vector spaces, or exclusively with complex ones. Well . . . nothing is that definite. There are times when one wants to go from one flavor to the other, but that should be done with care. One thing to remember, however, is that real numbers are also part of the complex universe; a real number is just a complex number with zero imaginary part. So when working with complex vector spaces, real numbers are also scalars because they are also complex numbers.

So what is a vector space? We could give a vague definition saying it is an environment that is linear algebra friendly. But we need to be more precise. Linear operations are very basic ones, so as a better definition we can say that a vector space is any set of objects in which we have defined two operations: 1. An addition, so that if we have any two objects in that set, we know what their sum is. And that sum should also be an element of the set. 2. A scalar multiplication; it should be possible to multiply objects of the set by scalars to get (usually) new objects in the set. If by scalars we mean real numbers, then we have a real vector space; if we mean complex numbers, we have a complex vector space.

There is some freedom in how these operations are defined, but only a little bit. The operations should have the usual properties we associate with sums and products. Before giving examples, we need to be a bit more precise.

In general, I will use lower case boldface letters to stand for vectors, a vector being simply an object in a set that has been identified as a vector space, and regular (non-boldface) lower case letters for scalars (real or complex numbers). If V is a vector space and v, w are elements of V (in other words, vectors), then their sum is denoted by (written as) v + w, which also has to be in V . If a is a scalar and v an element of V , then av denotes the scalar product of a times v. It should be an element of V .

Summary of the information so far. By scalar we mean a complex or a real number. We should be aware that if our scalars are real numbers, they should always be real; if complex, always complex. A vector (or linear) space is any set of objects in which we have defined a sum and a scalar product satisfying some (soon to be specified) properties. In this context, the elements of this set are called vectors. So we can say that a vector space is a bunch of vectors closed under addition and scalar multiplication, and a vector is an element of a vector space.

Finally! Here are the basic properties these operations have to satisfy. For a set V of objects to deserve to be called a vector space it is, in the first place, necessary (as mentioned already several times) that if v, w ∈ V , then we have defined v + w ∈ V , and if a is a scalar and v ∈ V , we have defined av ∈ V . These operations must satisfy:

1. (Associativity of the sum) If v, w, z ∈ V , then (v + w) + z = v + (w + z). This allows us to write the sum of any number of vectors without using parentheses: If v, w, y, z ∈ V , it makes sense to write v + w + y + z. One person may compute this sum by adding first v + w, then y + z, finally adding these two together. Another person may first compute w + y, then add v in front to get v + (w + y), finally add z to get (v + (w + y)) + z. The result is the same.

2. (Commutativity of the sum) If v, w ∈ V , then v + w = w + v.

3. (Existence of 0) There exists a unique element of V , usually denoted by 0, such that v + 0 = v for all v ∈ V .

4. (Existence of an additive inverse) If v ∈ V , there exists a unique element −v ∈ V such that v + (−v) = 0. This allows us to define subtraction; one writes v − w for v + (−w).

5. (Distributivity 1) If v, w ∈ V and a is a scalar, then a(v + w) = av + aw.

6. (Distributivity 2) If a, b are scalars and v ∈ V , then (a + b)v = av + bv.

7. (Associativity 2) If a, b are scalars and v ∈ V , then a(bv) = (ab)v = b(av).

8. (One is one) 1v = v for all v ∈ V . (Here 1 is the scalar number 1.)

These properties have consequences. The main consequence is that one operates with vectors as with numbers, as long as things make sense. So, for example, in general one can't multiply a vector by a vector (there will be particular exceptions). But otherwise, what seems right is usually right. Here are some examples of what is true in any vector space V . Any scalar times the 0 vector is the zero vector; in symbols: a0 = 0. It is not one of the eight properties listed above, but it follows from them. It is also true that the scalar 0 times any vector is the zero vector: 0v = 0. One good thing is that all the linear spaces we will consider will be quite concrete, and all the properties of the operations quite evident. At least, so I hope. What this abstract part does is to provide a common framework for all these concrete sets of objects.
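As an illustration of how these consequences follow from the eight properties, here is one possible derivation (not in the original list) of the fact that a0 = 0 for every scalar a:

a0 = a(0 + 0) = a0 + a0    (using property 3 and then Distributivity 1).

Adding −(a0) to both sides and using properties 4, 1, and 3 again:

0 = a0 + (−(a0)) = (a0 + a0) + (−(a0)) = a0 + (a0 + (−(a0))) = a0 + 0 = a0.

The argument for 0v = 0 is similar, using Distributivity 2 instead.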

Here are a number of examples. It might be a good idea to check that in every case the 8 properties hold or, at the very least, convince yourself that they hold.

• Example 1. The stupid vector space. (But, since we are scientists and engineers, we cannot use the word stupid. We must be dignified. It is usually called the trivial vector space.) It is a silly example, but it needs to be seen. It is the absolute simplest case of a linear space. The space has a single vector, the 0 vector. Addition is quite easy; all you need to know is that 0 + 0 = 0. So is scalar multiplication: If a is a scalar, then a0 = 0. And −0 = 0.

The trivial vector space can be either real or complex. The next set of examples consists of real vector spaces.

• Example 2. The next vector space, just one degree above the previous one in complexity, is the set R of real numbers. Here the real numbers are forced to play a double role, have something like a double personality: On the one hand they are the numbers they always are, the scalars. But they also are the vectors. If a, b ∈ R (I write them in boldface to emphasize that now they are acting as vectors), one defines a + b as usual. And if a ∈ R and c is a scalar, then ca is just the usual product ca of c times a, but now interpreted as a vector.

• Example 3. Our main examples of real vector spaces are the spaces known as Rn, where n is a positive integer. We already saw R1; it is just R. Now we will meet the next one: the space R2 consists of all pairs of real numbers; in symbols: R2 = {(a, b) : a, b are real numbers}. As we all know, we can identify the elements of R2 with points in the plane; once we fix a system of cartesian coordinates, we identify the pair (a, b) with the point of cartesian coordinates (a, b). Operations are defined in a more or less expected way:

(a, b) + (c, d) = (a + c, b + d),        (1)
a(c, d) = (ac, ad).        (2)

Verifying associativity, commutativity, and distributivity reduces to the usual associative, commutative, and distributive properties of the operations for real numbers. The 0 element is 0 = (0, 0); obviously

(a, b) + 0 = (a, b) + (0, 0) = (a + 0, b + 0) = (a, b).

The additive inverse is also easy to identify, −(a, b) = (−a, −b):

(a, b) + (−a, −b) = (a + (−a), b + (−b)) = (0, 0) = 0.
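To connect rules (1) and (2) with something computable, here is a minimal Python sketch (not part of the original notes; the helper names are just illustrative) of the R2 operations done componentwise.

# Illustrative sketch of the R2 operations; helper names are made up for this example.
def add(v, w):
    # (a, b) + (c, d) = (a + c, b + d)
    return (v[0] + w[0], v[1] + w[1])

def scale(a, v):
    # a(c, d) = (ac, ad)
    return (a * v[0], a * v[1])

a = (5, 2)
b = (3, 7)
print(add(a, b))             # (8, 9), the same sum the parallelogram picture gives
print(scale(2, b))           # (6, 14)
print(add(a, scale(-1, a)))  # (0, 0), that is, a + (-a) = 0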

These operations have geometric interpretations. I said that R2 can be identified with points of the plane. But another interpretation is to think of the elements of R2 as being vectors: we represent a pair (a, b) as an arrow beginning at (0, 0) and ending at the point of coordinates (a, b). Then we can add the vectors by the parallelogram rule: Complete the parallelogram having two sides determined by the vectors you want to add. The diagonal of the parallelogram from the origin is the sum of the vectors. The picture shows how to add graphically a = (5, 2) and b = (3, 7) to get a + b = (8, 9).

In many applications one needs free vectors, vectors that do not have their origin at the origin. We can then think of the pair (a, b) as an arrow that can start at any point we wish, but once we fix the starting point, the end-point, the tip of the arrow, is a units in the x-direction, b-units in the y-direction, from the starting point. This gives us an alternative way of adding vectors: To add a and b, place the origin of b at the end of a. Then a + b is the vector starting where a starts, ending where b ends. Here is a picture of the sum of the same two vectors as before done by placing the beginning of one at the end of the other. In black it is a + b, in red b + a. The picture shows that the end result is the same.

It is also easy to see what b − a should be graphically. It should be a vector such that when it follows a we get b. In the parallelogram construction, it is the other diagonal of the parallelogram.

What happens if a and b are parallel? Drawing a parallelogram can be a bit of a problem, but following one vector by the other is no problem. For example if a = (a1, a2) and b = (−a1, −a2), then a, b have the same length and point in exactly opposite directions. If you place b starting where a ends, you cancel out the effect of a. The sum is a vector starting and ending at the same point, the 0 vector. Of course, analytically, a + b = (a1 − a1, a2 − a2) = (0, 0) = 0.

The next picture is a graphic illustration of the associative property of the sum of vectors.

There is also a graphic interpretation of scalar multiplication. Thinking of these vectors as arrows, one frequently reads that a vector is an object that has a magnitude and a sense of direction. So does a weather vane and a lot of animals; I mention this to point out how vague this definition is. But it is a useful way of thinking about vectors. A vector is a magnitude or length, pointing in some direction. Multiplying by a positive scalar (number) keeps the direction the same, but multiplies the length of the vector by the scalar. If the scalar is negative, the length gets multiplied by the absolute value of the scalar, and the vector gets turned around so it points in the exact opposite direction from where it was pointing before. If the scalar is 0, you get the zero vector. Incidentally, the magnitude or length of the vector of components (a, b) is |(a, b)| = √(a² + b²) (as Pythagoras has decreed!). It is good to keep the graphic interpretation in mind for the applications, but when it comes to doing computations, seeing vectors in the plane as simply pairs of numbers, as R2, makes for a better, more precise, more efficient way of proceeding.
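Since the length formula comes up again and again, here is a small hedged Python illustration (only a sketch, not part of the notes) of the length of a vector and of how scalar multiplication affects it.

import math

def length(v):
    # |(a, b)| = sqrt(a^2 + b^2), as Pythagoras decrees
    return math.sqrt(v[0] ** 2 + v[1] ** 2)

v = (3, 4)
print(length(v))                       # 5.0
print(length((-2 * v[0], -2 * v[1])))  # 10.0; multiplying by -2 doubles the length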

• Example 3. The space R3. This is the space of all triples of real numbers; in symbols:

R3 = {(a, b, c) : a, b, c are real numbers}.

It is similar to R2, just one additional component. The operations are

(a, b, c) + (d, e, f) = (a + d, b + e, c + f),        (3)
r(a, b, c) = (ra, rb, rc).        (4)

The 0 element is 0 = (0, 0, 0); obviously

(a, b, c) + 0 = (a, b, c) + (0, 0, 0) = (a + 0, b + 0, c + 0) = (a, b, c).

The additive inverse is also easy to identify, −(a, b, c) = (−a, −b, −c):

(a, b, c) + (−a, −b, −c) = (a + (−a), b + (−b), c + (−c)) = (0, 0, 0) = 0.

We can think of these vectors as points in 3-space (after a system of cartesian coordinates has been set up) or as free “vectors” in 3-space. In this second interpretation, v = (a, b, c) is an “arrow” that we can start from any point we wish, as long as the end-point is a units in the x-direction, b in the y-direction, c in the z-direction, away from its beginning. To add two vectors graphically we can still follow one vector by the other one. Or we can start them both from the same point. If they are not parallel they will determine a plane, and on this plane we can use the rule of the parallelogram to add them. If they are parallel, work on any of the infinite number of planes that contain the two vectors. The length or magnitude of v = (a, b, c) is again given by Pythagoras: |v| = √(a² + b² + c²).

• Example 4. Why stop at 3? If n is a positive integer, we denote by Rn the set of all n-tuples of real numbers. A typical element of Rn will be denoted by a = (a1, . . . , an). With a bit of imagination we might be able to imagine spaces of any number of dimensions; then the elements of Rn can be thought of as points in an n-dimensional space. Isn’t this exciting? The operations are

(a1, . . . , an) + (b1, . . . , bn) = (a1 + b1, a2 + b2, . . . , an + bn), (5)

c(a1, . . . , an) = (ca1, ca2, . . . , can). (6)

The 0 element is 0 = (0, . . . , 0) (n positions); obviously, if a = (a1, . . . , an), then

a + 0 = (a1, . . . , an) + (0,..., 0) = (a1, . . . , an) = a.

The additive inverse is also easy to identify, −(a1, . . . , an) = (−a1,..., −an).
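The same componentwise recipe works for every n. The following Python sketch (the helper names are illustrative assumptions, not from the notes) spells out the operations (5) and (6) for general n-tuples.

def add_n(a, b):
    # (a1, ..., an) + (b1, ..., bn) = (a1 + b1, ..., an + bn)
    assert len(a) == len(b), "tuples must have the same number of components"
    return tuple(ai + bi for ai, bi in zip(a, b))

def scale_n(c, a):
    # c(a1, ..., an) = (c a1, ..., c an)
    return tuple(c * ai for ai in a)

a = (1, 2, 3, 4)
print(add_n(a, (10, 20, 30, 40)))  # (11, 22, 33, 44)
print(scale_n(-1, a))              # (-1, -2, -3, -4), the additive inverse of a
print(add_n(a, scale_n(-1, a)))    # (0, 0, 0, 0)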

• Example 5. What if we try something a bit more exotic? Suppose we again take R2, pairs of real numbers, but define (out of perversity)

(a, b) + (c, d) = (ac, bd), r(a, b) = (ra, rb).

Well, it won’t work. Commutativity and associativity still hold. There even is something acting like a zero element, namely (1, 1). In this crazy definition, (a, b) + (1, 1) = (a · 1, b · 1) = (a, b). Most elements even have “additive” inverses; in this definition (a, b) + (1/a, 1/b) = (a(1/a), b(1/b)) = (1, 1), which is the zero element. But “most” is not enough! It has to be ALL. And any pair in which at least one component is 0 has no “additive” inverse. For example, (3, 0) + (c, d) in this strange bad addition works out to (3c, 0) and can never be (1, 1), no matter what (c, d) is. This is not a vector space.

• Example 6. Matrices. A matrix is a rectangular array of numbers. In this arrangement, the horizontal levels are called rows, each vertical grouping is a column. If the matrix has m rows and n columns, we say it is an m × n matrix. Here is a picture copied (stolen?) from Wikipedia, illustrating some basics.

The entries of a matrix are usually denoted by double subindices; the first subindex denotes the row, the second the column in which the entry has been placed. The Wikipedia matrix seems to have no end to the right or downwards. If I were to present an example of an abstract m × n matrix (as I am about to do), I’d write it up as follows:

\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix}

One surrounds the array of numbers with parentheses (or square brackets), so as to make sure they stay in place. I also dispense with commas between the subindices in the abstract notation; I write a12 rather than a1,2 for example. Wherever this could cause confusion, I would add commas. For example, in the unlikely event that we have to deal with a matrix in which m or n (or both) are greater than 10, talking of an element a123 is confusing. It could be a1,23 or a12,3. Commas are indicated.

We make a vector space out of m × n matrices, which I’ll denote by Mm,n, by defining addition and scalar multiplication in what could be said to be the obvious way, elementwise:

\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ b_{m1} & b_{m2} & b_{m3} & \cdots & b_{mn} \end{pmatrix} = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} & \cdots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23} & \cdots & a_{2n}+b_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & a_{m3}+b_{m3} & \cdots & a_{mn}+b_{mn} \end{pmatrix}

for the sum, and

c \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix} = \begin{pmatrix} ca_{11} & ca_{12} & ca_{13} & \cdots & ca_{1n} \\ ca_{21} & ca_{22} & ca_{23} & \cdots & ca_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ ca_{m1} & ca_{m2} & ca_{m3} & \cdots & ca_{mn} \end{pmatrix}

for the product by the scalar c. The same definitions, written in a more compact way, are: If A = (aij) and B = (bij) are two m × n matrices, then

A + B = (aij + bij),        cA = (caij).

It is quite easy to check that with these operations Mm,n is a vector space. The 0 vector in this space is the zero matrix, the matrix all of whose entries are 0. The additive inverse of (aij) is (−aij). Just to make sure we are on the same page, here are a few examples. These could be typical beginning linear algebra exercises.

1. Let

A = \begin{pmatrix} 1 & -3 & 5 \\ 2 & 0 & 1 \\ -7 & 7 & 4 \\ 0 & 5 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} -5 & 2 & 4 \\ 6 & 7 & 8 \\ -1 & -1 & 0 \\ 4 & 4 & 4 \end{pmatrix}, \quad C = \begin{pmatrix} 0 & 1 & 2 \\ -6 & 8 & 11 \\ 2 & 4 & 3 \\ 0 & 0 & 1 \end{pmatrix}.

Evaluate 2A − 3B + 5C.

Solution.

2A − 3B + 5C = \begin{pmatrix} 17 & -7 & 8 \\ -44 & 19 & 33 \\ -1 & 37 & 23 \\ -12 & -2 & -7 \end{pmatrix}

2. Let

A = \begin{pmatrix} 1 & 2 & 0 & 2 \\ 3 & 0 & -4 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} -3 & 1 & 4 & 0 \\ 1 & 1 & 2 & 2 \end{pmatrix}.

Solve the matrix equation A + 5X = B.

Solution.

X = \frac{1}{5}(B - A) = \begin{pmatrix} -4/5 & -1/5 & 4/5 & -2/5 \\ -2/5 & 1/5 & 6/5 & 0 \end{pmatrix}.

3. Evaluate A + B if

A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad and \quad B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

Solution. IT CANNOT BE DONE! IMPOSSIBLE! These matrices belong to different worlds, and different worlds cannot collide. Later on, we’ll see that these two matrices can be multiplied (once we define the matrix product); more precisely, AB makes sense, while BA doesn’t. But for now let’s keep in mind that matrices of different types canNOT be added (or subtracted).
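If one wants to check computations like the ones above by machine, a library such as NumPy performs matrix addition and scalar multiplication entry by entry; here is a hedged sketch (not part of the original notes) that redoes exercise 2, solving A + 5X = B.

import numpy as np

A = np.array([[1, 2, 0, 2],
              [3, 0, -4, 2]], dtype=float)
B = np.array([[-3, 1, 4, 0],
              [1, 1, 2, 2]], dtype=float)

# A + 5X = B  =>  X = (1/5)(B - A)
X = (B - A) / 5
print(X)                          # matches the solution given above
print(np.allclose(A + 5 * X, B))  # True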

• Example 7. Functions. I’ll just consider one case. Let I be an interval (open, closed, bounded, or not) in the real line and let V be the set of all (real-valued) functions of domain I. If f, g are functions we define the function f + g as the function whose value at x ∈ I is the sum of the values of f at x and of g at x; in symbols (f + g)(x) = f(x) + g(x). We define the scalar product cf (if c is a real number, f a function on I) as the function whose value at x ∈ I is c times the value of f at x; in symbols,

(cf)(x) = cf(x).

It is easy to see that we have a vector space in which 0 is the constant function that is identically 0, and if f ∈ V , then −f is the function whose value at every x ∈ I is −f(x).
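As a hedged illustration of the function-space operations (only a sketch; the helper names are invented for this example), the sum f + g and the scalar product cf can be built as new functions:

import math

def f_plus_g(f, g):
    # (f + g)(x) = f(x) + g(x)
    return lambda x: f(x) + g(x)

def c_times_f(c, f):
    # (cf)(x) = c * f(x)
    return lambda x: c * f(x)

h = f_plus_g(math.sin, math.cos)
k = c_times_f(3.0, math.sin)
print(h(0.0))          # sin(0) + cos(0) = 1.0
print(k(math.pi / 2))  # 3 * sin(pi/2) = 3.0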

Every one of the examples has its complex counterpart; all we need to do is allow complex numbers as our scalars. Here are examples 2, 3, 4, 6, and 7, redone as complex vector spaces. In each case the real vector space is a subset of the complex one.

• Examples 2’, 3’, 4’. The vector space Cn consists of all n-tuples of complex numbers. The operations are defined by

(a1, . . . , an) + (b1, . . . , bn) = (a1 + b1, a2 + b2, . . . , an + bn), (7)

c(a1, . . . , an) = (ca1, ca2, . . . , can). (8)

The 0 element is 0 = (0, . . . , 0) (n positions); obviously, if a = (a1, . . . , an), then

a + 0 = (a1, . . . , an) + (0,..., 0) = (a1, . . . , an) = a.

The additive inverse is, as in the real case, −(a1, . . . , an) = (−a1, . . . , −an).

• Example 6’. A complex matrix is a rectangular array of complex numbers. We will denote the set of all m × n complex matrices by Mm,n(C). Addition and scalar multiplication are defined as in the real case, except that we allow complex scalars. As an example (one of a zillion trillion possible ones), let A, B ∈ M3,2(C) be defined by

A = \begin{pmatrix} 1 + i & \sqrt{3} \\ 2 - 5i & 0 \\ 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 5 & -8 \\ -2 & 1 \\ 0 & 1 \end{pmatrix};

then (see if you get the same result!)

\sqrt{3}\,A - iB = \begin{pmatrix} \sqrt{3} + (\sqrt{3} - 5)i & 3 + 8i \\ 2\sqrt{3} + (2 - 5\sqrt{3})i & -i \\ \sqrt{3} & -i \end{pmatrix}

• Example 7’. Let I be an interval in the real line and consider all complex valued functions on I. In other words, we consider all functions of the form f(t) = u(t) + iv(t), where u(t), v(t) are real valued functions of domain I, and i = √−1. If f(t) = u1(t) + iv1(t) and g(t) = u2(t) + iv2(t), where u1, u2, v1, v2 are real valued functions of domain I, we define f + g in the natural way:

(f + g)(t) = u1(t) + u2(t) + i(v1(t) + v2(t)).

If f(t) = u(t) + iv(t) and c = a + ib is a complex number (a scalar), where a, b are real numbers and u, v are real valued functions of domain I, we define cf to be the function

(cf)(t) = c(f(t)) = (a + ib)(u(t) + iv(t)) = au(t) − bv(t) + i(av(t) + bu(t)).

We have again a vector space, a complex one this time.

1.1 Exercises

1. Which of the following sets V, with the operations as defined, is a (real) vector space? In each case either verify all properties (and this includes identifying the zero element), or determine at least one property that fails to hold.

(a) V is the set of all upper triangular n × n matrices, with addition and scalar product defined in the usual way. A square matrix is upper triangular if and only if all entries beneath the main diagonal are 0.

(b) V = {(x, y, z) ∈ R3 : 2x − 3y + 4z = 1}, with the same operations as elements of R3.

(c) V = {(x, y, z) ∈ R3 : 2x − 3y + 4z = 0}, with the same operations as elements of R3.

(d) V = {(x, y) : x, y ∈ R, x > 0} with operations defined as follows:

(x, y) + (x′, y′) = (xx′, y + y′),        r(x, y) = (x^r, ry),

if (x, y), (x′, y′) ∈ V , and r is a real scalar.

(e) I = (a, b) is an interval and V is the set of all (real valued) differentiable functions of domain I. That is, f ∈ V if and only if f(x) is defined for all x ∈ I and the derivative f′(x) exists at each x ∈ I. The operations are defined as usual for functions; that is, if f, g are functions on I, then the function f + g on I is the function whose value at any point x is f(x) + g(x); that is, (f + g)(x) = f(x) + g(x). And if f is a function, c a scalar, then cf is defined by (cf)(x) = cf(x).

(f) V is the set of all polynomials (as functions); a polynomial being an expression of the form

p(x) = a_n x^n + ··· + a_1 x + a_0,

where a0, . . . , an are real numbers. Operations (addition and scalar multiplication) as usual.

(g) V is the set of all polynomials of degree higher than 2, plus the 0 polynomial.

2. Let y^{(n)} + a_{n−1}(t) y^{(n−1)} + ··· + a_1(t) y′ + a_0(t) y = 0 be a linear homogeneous differential equation of order n with coefficients a0(t), . . . , an−1(t) continuous functions in some open interval I of R. Show that the set of solutions of this equation in I is a real vector space. Operations on solutions are defined as usual (see Exercise 1e).

3. Show that the set of complex numbers is a real vector space. That is, if we forget about multiplying two complex non-real numbers and only consider products of the form cz, where c is real and z is complex, then C is a real vector space.

2 What came first, the chicken or the egg?

In his book “God and Golem, Inc. A Comment on Certain Points Where Cybernetics Impinges on Religion,” Norbert Wiener (a famous mathematician of the first half of the twentieth century, who among many other things coined the word cybernetics from the Greek word for steersman) reexamines the age-old precedence question. He concludes that the question is meaningless, since both carry the same information. While we usually see an egg as the device by which a chicken produces another chicken, Wiener argues that we may as well consider a chicken as the means by which an egg produces another egg. If two objects carry the same information, they are the same. The point of this, when applied to vector spaces, is that SO FAR, the information carried by an m × n matrix is the same as that carried by a row of mn numbers, or of mn numbers written out in any way we please.

Here is something one can do with m × n matrices. I can take an m × n matrix and write out the elements in a row (or column) of mn elements in a variety of ways. For example, I can write out first the elements of the first row, followed by those of the second row, and so forth. To illustrate I’ll use the case of m = 2, n = 3, but a similar thing holds for all m, n. Suppose

A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & -2 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 6 & 7 \\ 2 & 0 & -1 \end{pmatrix}

and we want to compute 2A + 3B. Instead of doing it the usual way, we could proceed as follows. First we write both A, B as rows of numbers by the procedure I mentioned above. We call these new objects A′, B′:

A′ = (1, 2, 3, 0, −2, 4),        B′ = (0, 6, 7, 2, 0, −1).

A′, B′ are vectors in R6. Operating on them by the rules of R6 we get 2A′ + 3B′ = (2, 22, 27, 6, −4, 5).

Now we can reverse the process that we used to get A′, B′ and get

2A + 3B = \begin{pmatrix} 2 & 22 & 27 \\ 6 & -4 & 5 \end{pmatrix}

The difference between A and A′ (and B and B′) is just one of convenience. As vector spaces, Mm,n and Rmn are identical. The same is true of Mm,n(C) and Cmn. Later on we’ll see that there is a good reason for writing numbers out as arrays, rather than as rows or columns. And that frequently we want to write the elements of Rn or Cn in the form of columns, rather than rows.
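The identification of Mm,n with Rmn described above can be made concrete with a reshaping operation. Here is a NumPy sketch (illustrative only, not the notes' own method) that redoes the 2A + 3B computation by a detour through R6.

import numpy as np

A = np.array([[1, 2, 3],
              [0, -2, 4]], dtype=float)
B = np.array([[0, 6, 7],
              [2, 0, -1]], dtype=float)

A_row = A.reshape(-1)   # A' = (1, 2, 3, 0, -2, 4)
B_row = B.reshape(-1)   # B' = (0, 6, 7, 2, 0, -1)

result_row = 2 * A_row + 3 * B_row   # operate in R^6
result = result_row.reshape(2, 3)    # go back to a 2 x 3 matrix
print(result)                              # [[ 2. 22. 27.] [ 6. -4.  5.]]
print(np.allclose(result, 2 * A + 3 * B))  # True: same answer either way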

3 Subspaces

Given a vector space, inside of it there are usually sets that are themselves vector spaces with the same vector operations. For example, in R3 consider the set V = {(x, y, 0) : x, y real numbers}.

Is it a vector space? The only thing that could possibly go wrong is that the vector operations can take us out of V . Or that some element of V might not have its additive inverse (the thing you get putting a minus sign in front) in V . Or maybe there is no 0 in V . Or maybe the scalar product takes us out of V . But none of these things happens. 0 = (0, 0, 0) is in V ; it is the case x = y = 0. The description of V in words is: V is the set of all triples of real numbers in which the last component is 0. Well, adding two such triples produces another such triple, since 0 + 0 = 0. Given such a triple, its additive inverse is of the same type, since −0 = 0. Multiplying such a triple by a scalar produces another one with third component 0, since c · 0 = 0 for all scalars. So yes, V is a vector space. We say it is a subspace of R3. Here is a precise definition: Assume V is a vector space. A subset W of V is a subspace of V if (and only if)

1. 0 ∈ W .
2. If v, w ∈ W , then v + w ∈ W . (W is closed under addition.)
3. If v ∈ W and c is a scalar, then cv ∈ W . (W is closed under scalar multiplication.)

With these properties we can work in W without ever having to leave it (except if we want to leave it). We did not ask for W to contain additive inverses of elements because we get that for free. In fact, if W is a subspace, if v ∈ W , then −v = (−1)v; since −1 is a scalar (both in the real and complex case), (−1)v must be in W , meaning −v ∈ W . Here are more examples.

E1. Let W = {(x, y, z, w) : x, y, z, w are real numbers and x + 2y − 3z + w = 0}.

This is a subset of R4. Is it a subspace of R4? To answer this question, we have to check the three properties mentioned above. Is 0 ∈ W ? For R4, 0 = (0, 0, 0, 0), so x = 0, y = 0, z = 0, w = 0 and, of course, 0 + 2 · 0 − 3 · 0 + 0 = 0. Yes, 0 ∈ W . Next, we have to check that if v, w ∈ W , then v + w ∈ W . For this we assume v = (x, y, z, w), w = (x′, y′, z′, w′) where x, y, z, w, x′, y′, z′, w′ satisfy x + 2y − 3z + w = 0 and x′ + 2y′ − 3z′ + w′ = 0. Then v + w = (x + x′, y + y′, z + z′, w + w′)

and the question is whether (x + x′) + 2(y + y′) − 3(z + z′) + (w + w′) = 0. Using a bit of high school mathematics we see that

(x + x′) + 2(y + y′) − 3(z + z′) + (w + w′) = x + x′ + 2y + 2y′ − 3z − 3z′ + w + w′ = (x + 2y − 3z + w) + (x′ + 2y′ − 3z′ + w′) = 0 + 0 = 0.

We conclude that v + w ∈ W . Finally, suppose v ∈ W , say v = (x, y, z, w) where x + 2y − 3z + w = 0. Suppose c is a scalar (a real number since we are working with a real vector space). Then cv = (cx, cy, cz, cw) and cx + 2cy − 3cz + cw = c(x + 2y − 3z + w) = c · 0 = 0. It follows that cv ∈ W . The conclusion is that W is indeed a subspace of R4.

E2. Since real numbers are just complex numbers with the imaginary part equal to 0, it is (or should be) clear that for any integer n > 0, Rn is a subset of Cn. Is it a subspace? At first it seems to be. The zero element, the n-tuple with all components 0, is in Rn. Adding two vectors of Rn results in a vector of Rn. But our scalars are now complex numbers; multiplying a vector of Rn by a scalar will almost always take us out of Rn. Here is an example with n = 3. Take, for example, (1, 2, 3) ∈ R3. Take for scalar the imaginary unit i. Then i(1, 2, 3) = (i, 2i, 3i) ∉ R3.

Rn is a subset but not a subspace of Cn. E3. (The largest and smallest subspaces.) Suppose V is a vector space. Then V satisfies all the properties of being a subspace of V . Every vector space is a subspace of itself. V is, of course, the largest possible subspace of V . For the smallest one, consider the set W = {0}, the set containing only the zero element of V . Since it contains the zero element, it satisfies the first property of being a subspace. If v, w ∈ W , well that can only be if both are the zero element and then so is their sum; the second property holds. And, since any scalar times the zero element is the zero element, the third property holds also. This is necessarily the smallest possible subspace, known as the trivial subspace.
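Returning for a moment to E1, the three defining properties of a subspace can also be checked numerically; the following Python sketch (illustrative only; the random sampling and the helper names are assumptions of this example, not part of the notes) tests the closure properties of W = {(x, y, z, w) : x + 2y − 3z + w = 0}.

import numpy as np

def in_W(v, tol=1e-12):
    # W = {(x, y, z, w) : x + 2y - 3z + w = 0}
    x, y, z, w = v
    return abs(x + 2 * y - 3 * z + w) < tol

def random_in_W(rng):
    # choose y, z, w freely and solve x + 2y - 3z + w = 0 for x
    y, z, w = rng.normal(size=3)
    return np.array([-2 * y + 3 * z - w, y, z, w])

rng = np.random.default_rng(0)
v, u = random_in_W(rng), random_in_W(rng)
c = rng.normal()
print(in_W(np.zeros(4)))  # True: 0 is in W
print(in_W(v + u))        # True: W is closed under addition
print(in_W(c * v))        # True: W is closed under scalar multiplication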

E4. Think of R3 as points of 3-space. That is, by setting up a system of cartesian coordinates, we can identify each vector of R3 with a point of space. One can then show that precisely the following subsets of R3 are subspaces:

• The origin as set of a single point; the trivial subspace.
• All straight lines through the origin.
• All planes that go through the origin.
• R3.

E5. If A is an m × n matrix, then the transpose of A, denoted by A^T, is the n × m matrix whose rows are the columns of A (and, therefore, its columns are the rows of A). The formal description is: If A = (aij)1≤i≤m,1≤j≤n, then A^T = (bij)1≤i≤n,1≤j≤m with bij = aji for all i, j. Briefly: If A = (aij), then A^T = (aji). For example, if

A = \begin{pmatrix} 1 & 0 & 2 - 3i & -5 \\ 1 & 2 & 3 & 4 \\ -\sqrt{5} & -2 + 7i & 8 & 9 \end{pmatrix},

then

A^T = \begin{pmatrix} 1 & 1 & -\sqrt{5} \\ 0 & 2 & -2 + 7i \\ 2 - 3i & 3 & 8 \\ -5 & 4 & 9 \end{pmatrix}

It is an easy exercise to show that if A, B are m × n matrices, then (A + B)^T = A^T + B^T; if A is an m × n matrix, then (cA)^T = c A^T for all scalars c. If m = n, that is, when dealing with square matrices, it is possible to have A^T = A. Matrices verifying this property are called symmetric. Let n be a positive integer and let V = Mn,n or Mn,n(C) (either the real or the complex space of all square n × n matrices). Let W = {A ∈ V : A = A^T}. Then W is a subspace of V . I leave the simple checking of the properties as an exercise.

E6. Let b be a real number. Let W = {(x, y) : x, y are real numbers and x + y = b}. There is exactly one value of b for which W so defined is a subspace of R2. Find that value and explain why it is the only value that works.

E7. Let I be an interval in the real line and let V be the set of all real valued functions of domain I. As seen earlier, this is a vector space with the usual operations. Let W be the set of all continuous functions of domain I. Because the function that is identically 0 is continuous (about as continuous as can be), it is in W . Since it is also the zero element of V , W contains the 0 element of V . In calculus we learn (and one hopes remember) that the sum of continuous functions is continuous, and that when we multiply a continuous function by a scalar (real number) the result is again continuous. We conclude W is a subspace of V .

A few concluding remarks (for now) about subspaces. I hope you realize that we don't always have to call them W . We could even call the vector space W and the subspace V . Or by any other convenient name; one should try never to get hung up on notation. Example E4 shows a typical fact; there is a certain hierarchy of levels among subspaces. For example, we have lines and planes; nothing in between. This is due to the fact that vector spaces have a dimension, as we will see in a while.

3.1 Exercises

1. Which of the following are subspaces of R3? Justify your answer.

(a) All vectors of the form (a, 0, 0).
(b) All vectors of the form (a, 1, 1).
(c) All vectors of the form (a, b, c), where a = b + c.
(d) All vectors of the form (a, b, c), where a = b + c + 1.

2. Which of the following are subspaces of M2,2? Justify.

(a) All 2 × 2 matrices with integer entries.
(b) All matrices \begin{pmatrix} a & b \\ c & d \end{pmatrix} where a + b + c + d = 0.
(c)
(d) All matrices \begin{pmatrix} a & b \\ -b & a \end{pmatrix}.

3. Let I = (a, b) be an interval in R and let V be the vector space of all real valued functions on I; as seen in the notes, it is a vector space with addition and scalar multiplication defined as usual. Let c, d be points in I: a < c < d < b. Determine which of the following are subspaces of V .

(a) The set of all continuous bounded functions on I; that is, the set of all continuous f on I such that there exists some number M (depending on f) such that |f(x)| ≤ M for all x ∈ I.
(b) The set of all continuous functions on I such that ∫_c^d f(x) dx = 0.
(c) The set of all continuous functions on I such that ∫_c^d f(x) dx = 1.
(d) The set of all continuous functions on I such that ∫_c^x f(t) dt = f(x) for all x ∈ I.
(e) All solutions of the differential equation

a_n(t) \frac{d^n y}{dt^n} + a_{n-1}(t) \frac{d^{n-1} y}{dt^{n-1}} + \cdots + a_1(t) \frac{dy}{dt} + a_0(t) y = 0

where a0, a1, . . . , an are continuous functions on I.

4 More on matrices

As we saw, the set of m × n matrices is a vector space; real if we restrict ourselves to real entries; complex if we allow complex numbers. But there is more to matrices than just being a vector space. Matrices can be multiplied. Well, not always; sometimes. We can form the product AB of the matrix A times the matrix B if and only if the number of columns of A equals the number of rows of B. Here is the basic definition. It is best done using the summation symbol Σ.

Suppose A is an m × n matrix and B is an n × p matrix, say

A = (aij)1≤i≤m,1≤j≤n,B = (bjk)1≤j≤n,1≤k≤p.

Then AB = (cik)1≤i≤m,1≤k≤p, where

c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk} = a_{i1} b_{1k} + a_{i2} b_{2k} + \cdots + a_{in} b_{nk}.

In words: The element of AB in position (i, k) is obtained as follows. Only the i-th row of A and the k-th column of B are involved. We multiply the first entry of the i-th row of A by the first entry of the k-th column of B, add to this the product of the second entry of the i-th row of A by the second entry of the k-th column of B, add to this the product of the third entry of the i-th row of A by the third entry of the k-th column of B, and so forth. Since each row of A has n entries, and each column of B has n entries, it all works out, and we end by adding the product of the last entries of the i-th row of A and the k-th column of B. Here are a number of examples and exercises. Please verify that all examples are correct!
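Before the worked examples, here is a direct translation of the definition into code, as a sketch (illustrative only; a real program would normally call a library routine): the (i, k) entry of AB is the sum over j of aij bjk.

def mat_mul(A, B):
    # A is m x n and B is n x p, both given as lists of rows.
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "columns of A must equal rows of B"
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for k in range(p):
            # c_ik = a_i1 b_1k + a_i2 b_2k + ... + a_in b_nk
            C[i][k] = sum(A[i][j] * B[j][k] for j in range(n))
    return C

A = [[1, 2, 3], [4, 5, 6]]
B = [[0, 1, 2, 4], [1, -1, 1, 1], [2, -1, 0, 7]]
print(mat_mul(A, B))  # [[8, -4, 4, 27], [17, -7, 13, 63]], the first example below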

• Example.

\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} 0 & 1 & 2 & 4 \\ 1 & -1 & 1 & 1 \\ 2 & -1 & 0 & 7 \end{pmatrix} = \begin{pmatrix} 8 & -4 & 4 & 27 \\ 17 & -7 & 13 & 63 \end{pmatrix}

 −3   4  • Example. Let A = (1 − 3 4 0), B =  .  5  7 That is A is 1 × 4, B is 4 × 1. Then AB will be 1 × 1. We identify a 1 × 1 matrix with its single entry; that is, we don’t enclose it in parentheses.

AB = 1(−3) + (−3)(4) + (4)(5) + (0)(7) = 5,

 −3 9 −12 0   4 −12 16 0  BA =   .  5 −15 20 0  7 −21 28 0

• Exercise. Let A be a 7 × 8 matrix and B an 8 × 4 matrix. Suppose all entries in the third row of A are zero. Explain why all entries in the third row of AB will be 0.

Matrix product behaves a lot like an ordinary product of numbers; that is, WITHIN REASON, and with an important exception (commutativity, discussed below): given an equation involving sums and products that is true for numbers, it will usually also be true when one substitutes matrices for the numbers, as long as all the operations make sense. Specifically the following properties hold. • (Associativity of the product) Briefly (AB)C = A(BC). But the products have to make sense! In a more detailed way: If A is an m × n matrix, B is n × p and C is p × q, then (AB)C = A(BC). In this case AB is an m × p matrix and it can be multiplied by C, which is p × q, to produce an m × q matrix (AB)C. On the other hand, BC is an n × q matrix; we can multiply on the left by A to get an m × q matrix A(BC). These two m × q matrices are one and the same. This property is not hard to verify, but it can get messy.

• (Distributivity), Briefly written as

A(B + C) = AB + AC (A + B)C = AC + BC.

Once again, all operations must make sense. For the first equality, B,C must be of the same type, say n × p. If A is m × n, then A(B + C), AB, AC, AB + AC, are all defined; the property now makes sense. The second equality assumes implicitly that A, B are of type m × n, and C of type n × p. This property is easy to verify.

• If c is a scalar, A of type m × n, B of type n × p, then

A(cB) = cAB = (cA)B.

Very easy to verify. Is it true that AB = BA? This question doesn’t even make sense in most cases. For example if A is 3 × 5 and B is 5 × 2, then AB is defined (it is 3 × 2), but BA is not. Even if both AB and BA are defined, the answer is obviously no. For example if A is 3 × 4 and B is 4 × 3, then both AB and BA are defined, but AB is 3 × 3 and BA is 4 × 4; they are most definitely not equal. The question becomes more interesting for square matrices; if A, B are square matrices of the same type, say both are n × n, then AB, BA are both defined, both n × n and potentially equal. But usually they are not. Here are examples:

1.

\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}

\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 23 & 34 \\ 31 & 46 \end{pmatrix}

2.

\begin{pmatrix} 1 & 0 & -1 \\ 0 & 3 & 4 \\ -2 & 2 & 0 \end{pmatrix} \begin{pmatrix} 0 & 3 & 4 \\ 1 & 0 & -1 \\ -2 & 2 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 4 \\ -5 & 8 & -3 \\ 2 & -6 & -10 \end{pmatrix}

\begin{pmatrix} 0 & 3 & 4 \\ 1 & 0 & -1 \\ -2 & 2 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & -1 \\ 0 & 3 & 4 \\ -2 & 2 & 0 \end{pmatrix} = \begin{pmatrix} -8 & 17 & 12 \\ 3 & -2 & -1 \\ -2 & 6 & 10 \end{pmatrix}

3.

\begin{pmatrix} 1 & -1 & 2 \\ -2 & 1 & -1 \\ 1 & -2 & 1 \end{pmatrix} \begin{pmatrix} 2 & -3 & 3 \\ -3 & 2 & -3 \\ 3 & -3 & 2 \end{pmatrix} = \begin{pmatrix} 11 & -11 & 10 \\ -10 & 11 & -11 \\ 11 & -10 & 11 \end{pmatrix} = \begin{pmatrix} 2 & -3 & 3 \\ -3 & 2 & -3 \\ 3 & -3 & 2 \end{pmatrix} \begin{pmatrix} 1 & -1 & 2 \\ -2 & 1 & -1 \\ 1 & -2 & 1 \end{pmatrix}

In general AB ≠ BA but, as the third example shows, there are exceptions. For example, every square matrix naturally commutes with itself: If A = B, then AB = AA = BA. If A is a square matrix, we write A² for AA, A³ for AAA = AA² = A²A, etc. There are also square matrices that commute with every square matrix. The two most notable examples are the zero matrix and the identity matrix (about to be defined). Suppose 0 is the n × n zero matrix. Then, for every square n × n matrix A we have A0 = 0 = 0A; the square zero matrix commutes with all square matrices. The identity matrix is usually denoted by I, or by In if its type is to be emphasized. It is the square matrix having ones in the main diagonal, all other entries equal to 0. If n = 2, then

I = I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix};

if n = 3, then

I = I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix};

etc. The general definition is I = (δij), 1 ≤ i, j ≤ n, where

δ_{ij} = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j. \end{cases}

It is called the identity matrix because, as is immediately verified, if A is any square matrix of the same type as I, then IA = A = AI. Please verify this on your own; convince yourself it holds and try to understand why it holds. In particular, I commutes with all square matrices of its same type. That is, In commutes with all n × n matrices. But there is a bit more. Let A be an m × n matrix. Then

ImA = A = AIn. (Verify this property.)

Exercise. Let M be a square n × n matrix. Show that MA = AM for all n × n matrices A if and only if M = cI, where c is a scalar and I = In is the n × n identity matrix. (If c = 0, then M = 0; if c = 1, then M = I. In general M would have to be a matrix having all off-diagonal entries equal to 0, all diagonal entries the same.) As a hint, showing that if M = cI, then MA = AM for all n × n matrices A should be very easy. For the converse, experiment with different matrices A. For example, what does the equation MA = AM tell you assuming that A has all entries but one equal to zero? For example, if A = (aij) with aij = 0 unless i = 1 and j = 2, and a1,2 = 1, and if M = (mij), then one can see that the first row of AM is equal to the second row of M, while all other rows have only zero entries. On the other hand, in MA one sees that the second column of MA is the first column of M, all other columns are zero columns:

\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} m_{11} & m_{12} & m_{13} & \cdots & m_{1n} \\ m_{21} & m_{22} & m_{23} & \cdots & m_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ m_{n1} & m_{n2} & m_{n3} & \cdots & m_{nn} \end{pmatrix} = \begin{pmatrix} m_{21} & m_{22} & m_{23} & \cdots & m_{2n} \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix},

\begin{pmatrix} m_{11} & m_{12} & m_{13} & \cdots & m_{1n} \\ m_{21} & m_{22} & m_{23} & \cdots & m_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ m_{n1} & m_{n2} & m_{n3} & \cdots & m_{nn} \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix} = \begin{pmatrix} 0 & m_{11} & 0 & \cdots & 0 \\ 0 & m_{21} & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & m_{n1} & 0 & \cdots & 0 \end{pmatrix}

Because M is supposed to commute with all matrices, it will commute with this selected one. Equating the two product results, we see that m21, m23, . . . , m2n, m31, m41, . . . , mn1 must all be 0, while m11 = m22. We are well on our way.

From now on, instead of writing Mn,n I will write simply Mn. Thus Mn is the set of all n × n matrices with real entries and Mn(C) is the set of all n × n matrices with complex entries. The space Mn (as well as Mn(C)) has some very nice properties. Under addition and scalar multiplication it is a vector space; but that is true of all sets of matrices of the same type. But it is also closed under matrix multiplication: If A, B ∈ Mn then AB, BA are defined and in Mn. The same holds for Mn(C). So in Mn (and in Mn(C)) we can add and multiply any two matrices and never leave the set. We can, of course, also subtract; A − B is the same as A + (−1)B. Can we also divide? This is a very natural question; if A, B ∈ Mn(C), what, if anything, should A/B be? We could say that it should be a square n × n matrix such that when multiplied by B we get A back. Here we run into a first problem, multiplication not being commutative. Should we ask for (A/B)B = A, for B(A/B) = A, or for both? As it turns out this approach has more problems than one might think at first. Here is a very simple example. Suppose

A = 0 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.

One could say that obviously A/B = 0/B = 0. In fact if you multiply 0 by B (either on the right or on the left), you get A = 0. But here is a quaint fact. Notice that B2 = 0, the 2 × 2 zero matrix. Multiplying B by B itself from the left or from the right will also result in A. Should 0/B = B?

The problem, in general, is that given square matrices A, B there could be more than one matrix that can act like A/B; more than one matrix C such that CB = BC = A. Or there could be none. For example, if in the previous example we replace A by the identity matrix I, leaving B as it is, then I/B is undefined; there is no matrix C such that CB = BC = I. Can you prove this? Due to this one does things in a different way. We have an identity I; we could start trying to define I/B and then define A/B as either A(I/B) or (I/B)A. Well this also has its problems, but they are solvable.

Problem 1. There are many matrices B ∈ Mn(C) for which I/B won’t make sense. That is, matrices B for which there is no C such that CB = BC = I. Solution. Define I/B only for the matrices for which it makes sense. Problem 2. Even if I/B makes sense, there is no guarantee that for any given matrix A we will have (I/B)A = A(I/B). So what should A/B be?

Solution. Forget about A/B, just think of dividing by B on the left and dividing by B on the right.

Let’s begin to be precise. Let A ∈ Mn(C). We say A is invertible if (and only if) there exists B ∈ Mn(C) such that AB = I = BA. In this case one can show that B is unique and one denotes it by A−1. That “B is unique” means: “for a given A there may or there may not exist such a B, but there cannot exist more than one such matrix.” This uniqueness is actually very easy to prove, and here is a proof. Those of you who hate proofs and prefer to accept the instructor’s word for everything, please ignore it. Proof that there can only be one matrix deserving to be called A−1. Suppose there is more than one such matrix, so there are at least 2 matrices, call them B,C, such that

AB = BA = I and AC = CA = I.

Then B = BI = B(AC) = (BA)C = IC = C. End of proof.

What is however amazing, and harder to prove, is that if A, C ∈ Mn(C), and AC = I, then this suffices to make A invertible and C = A−1. That is, in this world of matrices where commutativity is so rare, it suffices to have AC = I to conclude that we also have CA = I and C = A−1. So in verifying that a matrix C is the inverse of A, we don’t have to check that both CA and AC are the identity. If AC = I, then we are done; CA will also be I and C = A−1. Of course, the same is true if CA = I; then AC = I and C = A−1.

Deciding whether a general n × n matrix is invertible and, if it is, finding its inverse is actually a matter of looking at n² equations, and solving them, or showing they can’t be solved. If we were to try to do this at our current level of knowledge, it would be a boring, difficult, messy exercise. We will develop methods to do this efficiently. However, to give you an idea of what can be involved if one attacks the problem without any more tools at hand, let us try to decide if the matrix

A = \begin{pmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{pmatrix}

is invertible, and find its inverse. We are asking, in other words, whether there is a 3 × 3 matrix B such that

\begin{pmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{pmatrix} B = I.

It is enough if there is a matrix B that works on the right; as mentioned, it will also work from the left. Well, B will look like

B = \begin{pmatrix} r & s & t \\ u & v & w \\ x & y & z \end{pmatrix}

and the question becomes: can we find r, s, t, u, v, w, x, y, z solving

\begin{pmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} r & s & t \\ u & v & w \\ x & y & z \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

If we can find such r, s, etc., then we have our inverse. If there is some reason why this is impossible, then there won’t exist an inverse. In the last matrix we can perform the product on the left and the question now becomes: Can we find r, s, t, u, v, w, x, y, z solving

\begin{pmatrix} r + 2u + 2x & s + 2v + 2y & t + 2w + 2z \\ 2r + 3u + x & 2s + 3v + y & 2t + 3w + z \\ r + x & s + y & t + z \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

This is equivalent to solving 9 = 3² equations for r, s, t, u, v, w, x, y, z:

r + 2u + 2x = 1        s + 2v + 2y = 0        t + 2w + 2z = 0
2r + 3u + x = 0        2s + 3v + y = 1        2t + 3w + z = 0
r + x = 0              s + y = 0              t + z = 1

These equations are not as hard as one might think; still, solving the system, or verifying that there is no solution, is work. But let’s do it! Maybe you want to do it on your own; it might make you appreciate more the methods we’ll develop later on. So solve the system, then come back to compare your solution with mine. From the equation r + x = 0, we get x = −r. Using this in the equation 2r + 3u + x = 0 and solving for u, we get u = −r/3. Using these values for x, u in the first equation we get r = −3/5. We now have our possible first column:

r = −3/5,    u = 1/5,    x = 3/5.

Similarly we find the other columns. From s + y = 0, we get y = −s. From 2s + 3v + y = 1 we get v = (1 − s)/3; from s + 2v + 2y = 0 we get s = 2/5 so that

s = 2/5,    v = 1/5,    y = −2/5.

Finally, from the equations t + z = 1, 2t + 3w + z = 0, t + 2w + 2z = 0 we get

t = 4/5,    w = −3/5,    z = 1/5.

If our calculations are correct, the matrix A is invertible and

A^{-1} = \begin{pmatrix} -3/5 & 2/5 & 4/5 \\ 1/5 & 1/5 & -3/5 \\ 3/5 & -2/5 & 1/5 \end{pmatrix} = \frac{1}{5} \begin{pmatrix} -3 & 2 & 4 \\ 1 & 1 & -3 \\ 3 & -2 & 1 \end{pmatrix}

But to be absolutely sure, one should multiply A by its supposed inverse (on either side) and see that one gets the identity matrix. One does. Here is an easier exercise. Show that the matrix

A = \begin{pmatrix} -1 & 2 & 0 \\ -3 & 0 & 2 \\ 0 & 1 & 2 \end{pmatrix}

is invertible and that

A^{-1} = \frac{1}{14} \begin{pmatrix} -2 & -4 & 4 \\ 6 & -2 & 2 \\ -3 & 1 & 6 \end{pmatrix}

Do we have to go through the same process as before, finding the inverse of A and then seeing it works out to the given value? Only if we are gluttons for pain! The obvious thing to do is to multiply A by its assumed inverse and see that we get the identity. We see that

\begin{pmatrix} -1 & 2 & 0 \\ -3 & 0 & 2 \\ 0 & 1 & 2 \end{pmatrix} \cdot \frac{1}{14} \begin{pmatrix} -2 & -4 & 4 \\ 6 & -2 & 2 \\ -3 & 1 & 6 \end{pmatrix} = \frac{1}{14} \begin{pmatrix} -1 & 2 & 0 \\ -3 & 0 & 2 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} -2 & -4 & 4 \\ 6 & -2 & 2 \\ -3 & 1 & 6 \end{pmatrix} = \frac{1}{14} \begin{pmatrix} 14 & 0 & 0 \\ 0 & 14 & 0 \\ 0 & 0 & 14 \end{pmatrix} = I.

And we are done. The matrix given as A−1 is indeed the inverse of A, and since A has an inverse, it is invertible.
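For checking work of this kind, NumPy can compute and verify inverses directly. Here is a hedged sketch (not the method of the notes) that recomputes the inverse of the first 3 × 3 matrix A of the worked example and confirms that both products give the identity.

import numpy as np

A = np.array([[1, 2, 2],
              [2, 3, 1],
              [1, 0, 1]], dtype=float)

A_inv = np.linalg.inv(A)
print(np.round(5 * A_inv))                # should match [[-3, 2, 4], [1, 1, -3], [3, -2, 1]]
print(np.allclose(A @ A_inv, np.eye(3)))  # True
print(np.allclose(A_inv @ A, np.eye(3)))  # True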

4.1 Exercises

1. Show that if A is invertible, then A^{-1} is invertible and (A^{-1})^{-1} = A.

2. Show that if A ∈ Mn(C) is invertible, so is A^T and (A^T)^{-1} = (A^{-1})^T.

3. Show that if A, B ∈ Mn(C) are invertible, so is AB and (AB)^{-1} = B^{-1} A^{-1}.

4. Let A ∈ Mn(C) and assume A² = 0. Prove that I + A is invertible and that (I + A)^{-1} = I − A. Note: It is possible for A² to be the zero matrix without A being the zero matrix. For example, for n = 3, the following two matrices have square equal to 0. Neither is the zero matrix. There are, of course, many others in the same category.

\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & -2 \\ 1 & 1 & -2 \\ 1 & 1 & -2 \end{pmatrix}

5 Systems of linear equations.

We now come to one of the many reasons linear algebra was invented, to solve systems of linear equations. In this section I will try to summarize most of what you need to know about systems of equations of the form:

a11 x1 + a12 x2 + ··· + a1n xn = b1
a21 x1 + a22 x2 + ··· + a2n xn = b2
    ⋮                                        (9)
am1 x1 + am2 x2 + ··· + amn xn = bm

This is a system of m equations in n unknowns. The unknowns are usually denoted by x1, . . . , xn, but the notation can change depending on the circumstances. For example, if n = 2; that is, if there are only two unknowns, then one frequently writes x for x1 and y for x2. If there are only three unknowns, one frequently denotes them by x, y, z rather than x1, x2, x3. Less frequently, if one has four unknowns, one denotes them by x, y, z, w. For five or more unknowns one usually uses subindices, since one can too easily run out of letters.

The coefficients aij are given numbers; they can be real or complex. The same holds true for the right hand side entries b1, . . . , bm. Most of my examples will be in the real case, but all that I say is valid also in the complex case (except if it obviously is not valid!). Once a single complex non-real number enters into the picture, one is in complex mode.

Solving a system like (9) consists in finding an n-tuple of numbers x1, x2, . . . , xn such that when plugging it into the equations, all equations are satisfied. For example, consider the system of equations

3x1 + 2x2 − x3 = 1
−x1 + x2 + x3 = 5        (10)

Here m = 2, n = 3. Then x1 = 0, x2 = 2, x3 = 3 is a solution. In fact,

3 · 0 + 2 · 2 − 3 = 1
−0 + 2 + 3 = 5

But that isn’t all the story. There are more solutions, many more. For example, as one can verify,

x1 = 9, x2 = −4, x3 = 18 also is a solution. And so is x1 = −6/5, x2 = 14/5, x3 = 1. And many more. We will rarely be content with finding a single solution; in most cases we will want to find ALL solutions. I anticipate here that the following, and only the following, can happen for a system of linear equations:

• The system has exactly one solution.
• The system has NO solutions.
• The system has an infinity of solutions.

So if someone tries to sell you a system that has exactly two solutions, don't buy it! Once it has more than one solution, it has an infinity of solutions.

To attack systems in a rational, efficient way, we want to develop some notation and terminology. Let us return to the system (9). The matrix whose entries are the coefficients of the system, that is, the matrix

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}

is called the system matrix (or matrix of the system). For example, for the system in (10), the system matrix is

A = \begin{pmatrix} 3 & 2 & -1 \\ -1 & 1 & 1 \end{pmatrix}

It is an m × n matrix. It will also be convenient to write n-tuples (and m-tuples, and other tuples) vertically, as column vectors. So we think of C2 as consisting of all elements of the form \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, where a1, a2 are complex numbers; C3 as consisting of all elements of the form \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, where a1, a2, a3 are complex numbers; and so forth. In general

C^n = \left\{ \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} : a_1, a_2, \ldots, a_n ∈ C \right\}.

In other words, we are identifying Cn with the vector space of n × 1 matrices; Cn = Mn,1(C). If working exclusively with real numbers, one can replace C by R in all of this. We call matrices that consist of a single column column matrices or, more frequently, column vectors. Returning to our system, the m-tuple b1, . . . , bm of numbers on the right hand side of the equations will give rise to a vector

b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} ∈ C^m = M_{m,1}(C).

We also introduce the unknown/solution vector

x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.

Then the left hand side of the system is precisely Ax and the system (9) can be written in a nice and compact way as Ax = b.

A solution of the system is now a vector x ∈ Cn such that Ax = b. Our objective is to develop an efficient method for finding all solutions to such a system. Two m × n systems of linear equations are said to be equivalent if they have exactly the same solutions. Solving a system is usually done (whether one realizes it or not) by replacing the system by a sequence of systems, each equivalent to the preceding one, until one gets a system so simple that it actually solves itself. The solutions of the final, very simple system are the same as those of the original system. The way one gets an equivalent system is by performing any of the following “operations.”

1. Interchange two equations. For example, if in the system (10) we interchange equations 1 and 2 we get

−x1 + x2 + x3 = 5
3x1 + 2x2 − x3 = 1

Nothing essential has changed; the order in which the equations are presented does not affect the solutions.

2. Multiply one equation by a non-zero constant. Nothing changes; we can go back to the original system by dividing out the constant.

3. Add to an equation another equation multiplied by a constant. Again, nothing changes; if we now subtract from the new equation the same equation we previously added, multiplied by the same constant, we are back where we were before.

These are the three basic “operations” by which any system can be reduced to an immediately solvable one, one that can be solved by inspection. For example, here is how these operations affect system (10):

1. Interchange the first and second equation. The system becomes

−x1 + x2 + x3 = 5
3x1 + 2x2 − x3 = 1

2. Multiply the first equation by −1:

x1 − x2 − x3 = −5
3x1 + 2x2 − x3 = 1

3. Add, to the second equation, −3 times the first equation:

x1 − x2 − x3 = −5
5x2 + 2x3 = 16

4. Multiply the second equation by 1/5:

x1 − x2 − x3 = −5
x2 + (2/5) x3 = 16/5

The system is now in Gauss reduced form, easy to solve, but I'll go one step further (Gauss-Jordan reduction).

5. To the first equation, add the second equation:

x1 − (3/5) x3 = −9/5
x2 + (2/5) x3 = 16/5        (11)

The last system of equations is equivalent to the first one; everything that we did can be undone. But it is also solved. We see from it that every choice of x1, x2, x3 such that

x1 = −9/5 + (3/5) x3,    x2 = 16/5 − (2/5) x3

is a solution; indicating that we can select x3 arbitrarily, and then use the formulas for x1, x2. If we write the solutions in (column) vector form, we found that

x = \begin{pmatrix} -9/5 + (3/5) x_3 \\ 16/5 - (2/5) x_3 \\ x_3 \end{pmatrix}

Using our knowledge of vector operations, we can break the solution up as follows

x = \begin{pmatrix} -9/5 \\ 16/5 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} 3/5 \\ -2/5 \\ 1 \end{pmatrix}

But why have an x3 when we don’t have an x1 nor x2 written out explicitly anymore? Another way of writing the solution of (10) is as follows:

 9   3  − 5 5      16   2  x =   + c  − , where c is arbitrary. (12)  5   5      0 1

 0   9  Taking c = 3, we get the solution  2 , taking c = 18 we get  −4 . These are the two first solutions 3 18 mentioned earlier. Since there is an infinite number of choices of c, we have an infinity of solutions.

If we consider what we did here, we may realize that the only job done by the unknowns, by x1, x2, x3 in our case, was to act as placeholders. The same is true about the equal sign. If we carefully keep the coefficients in place, never mix a coefficient of x1 with one of x2, for example, we don't really need the variables. That is, we can remove them from the original form of the system, and then bring them back at the very end. To be more precise we need to introduce the augmented matrix of the system. This is the system matrix augmented (increased) by adding the b vector as last column. We draw a line or dotted line to indicate that we have an augmented matrix. The augmented matrix of (9) is the matrix

(A|b) = \left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right)

Suppose we carry out the three operations on equations we mentioned above. The effect on the augmented matrix of each one of these operations is, respectively:

1. Interchange two rows. We will call the operation of interchanging rows i and j of a matrix operation I(i,j).
2. Multiply a row by a non-zero constant. We will call the operation of multiplying the i-th row by the constant c ≠ 0 operation IIc(i).
3. Add to a row another row multiplied by a constant. Adding to row i the j-th row multiplied by c (i ≠ j) will be denoted by III(i)+c(j).

These operations are the row operations. The idea is to use them to simplify the augmented matrix. Simplifying the matrix by applying row operations is known as row reduction.
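If you want to experiment with these operations on a computer, here is a minimal sketch in Python. The matrix is stored as a list of rows, and the function names (swap_rows, scale_row, add_multiple) are my own labels, not standard library names.

def swap_rows(M, i, j):            # operation I(i,j)
    M[i], M[j] = M[j], M[i]

def scale_row(M, i, c):            # operation IIc(i); c must be non-zero
    M[i] = [c * x for x in M[i]]

def add_multiple(M, i, j, c):      # operation III(i)+c(j): add c times row j to row i
    M[i] = [x + c * y for x, y in zip(M[i], M[j])]

A = [[3, 2, -1, 1],                # augmented matrix of system (10)
     [-1, 1, 1, 5]]
swap_rows(A, 0, 1)                 # I(1,2)  (Python rows are 0-indexed)
scale_row(A, 0, -1)                # II(-1)(1)
add_multiple(A, 1, 0, -3)          # III(2)-3(1)
print(A)                           # [[1, -1, -1, -5], [0, 5, 2, 16]]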

Here is how we would solve (10) by working on the matrix. It is essentially what we did before, but it is less messy, and easy to program. I usually start with the augmented matrix, and then perform row operations in sequence. In some cases the order of performance doesn't matter, so I may perform more than one simultaneously. I write arrows joining one matrix to the next. Occasionally (and for now) I will write the operations performed on top of the arrow. So here is the same old system (10) solved by row reduction; the first matrix is the augmented matrix of the system.

[  3  2 −1 |  1 ]   I(1,2), II(−1)(1)   [ 1 −1 −1 | −5 ]   III(2)−3(1)   [ 1 −1 −1 | −5 ]
[ −1  1  1 |  5 ]   −→                  [ 3  2 −1 |  1 ]   −→            [ 0  5  2 | 16 ]

   II1/5(2)   [ 1 −1  −1 |   −5 ]   III(1)+(2)   [ 1 0 −3/5 | −9/5 ]
   −→         [ 0  1 2/5 | 16/5 ]   −→           [ 0 1  2/5 | 16/5 ]

The last matrix is in reduced canonical form (explained below), which means we are done. We can now write out the system that has this last matrix as augmented matrix. We get

x1 − (3/5)x3 = −9/5
x2 + (2/5)x3 = 16/5

which is exactly (11). From here we continue as before to get the solution (12).
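The whole reduction can also be checked by machine. A quick sketch assuming the SymPy library is available: Matrix.rref() returns the reduced form together with the pivot columns, and here it reproduces (11) from the augmented matrix of (10).

from sympy import Matrix

aug = Matrix([[3, 2, -1, 1],
              [-1, 1, 1, 5]])
R, pivots = aug.rref()
print(R)        # Matrix([[1, 0, -3/5, -9/5], [0, 1, 2/5, 16/5]])
print(pivots)   # (0, 1): columns 1 and 2 carry the leading 1's, so x3 is free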

We need to know when to stop row reducing, when the simplest possible level has been achieved. This is called row reduced echelon form or some such name; I’ll abbreviate it to RRE form. A matrix is in RRE form if

1. In each row, the first non-zero coefficient is a 1; we call it the leading 1 of that row. A row with no non-zero entries consists exclusively of 0's.
2. All zero rows come after the non-zero rows.
3. All entries above and below the leading 1 of any given row are equal to 0.

4. If i < j, and neither row i nor row j is a zero row, then the leading 1 of row i must be in a column preceding the column containing the leading 1 of row j. Two facts are important here: • Every matrix can be brought to RRE form by row operations.

• While there is more than one way to achieve RRE form by row operations, the end result is always the same for any given matrix. It is called the RRE form of the matrix.

Maybe a few examples can clear things up.

Examples

1. Find all solutions of

x1 − x2 + x3 − x4 + x5 = 1

3x1 + 2x2 − x4 + 9x5 = 0

7x1 + 10x2 + 3x3 + 6x4 − 9x5 = −7

The augmented matrix of the system is

[ 1 −1 1 −1  1 |  1 ]
[ 3  2 0 −1  9 |  0 ]
[ 7 10 3  6 −9 | −7 ]

The first objective on our way to RRE form is, by row operations, to get a 1 into the (1, 1) position. In our case, the matrix already has a 1 in the correct place, so we need to do nothing. The only reason one will not be able to get a 1 into the (1, 1) position is if the first column contains only zeros (unlikely!, but possible). Then one moves over to the second column, tries to get a 1 into position (1, 2); if this fails, into position (1, 3); etc. Total failure means one has the zero matrix, which is, of course, in RRE form. If there is any non-zero entry in the first column (or in any other column) one can get it to be in row 1 (if it isn't there already) by interchanging row 1 with the row containing the non-zero entry; if we then multiply row 1 by the reciprocal of that non-zero entry, we have a 1 in row 1. So, at most two row operations place a 1 in row 1, if the column has at least one non-zero entry. But, as mentioned, we already have a 1 where it should be to start.

Once we have the 1 in place, we use operations of the third type to get every entry below this 1 to be 0. For example, if the first entry in row i, i > 1, is c, we perform the row operation III(i)−c(1). Applied to our matrix, it works as follows

[ 1 −1 1 −1  1 |  1 ]   III(2)−3(1), III(3)−7(1)   [ 1 −1  1 −1   1 |   1 ]
[ 3  2 0 −1  9 |  0 ]   −→                          [ 0  5 −3  2   6 |  −3 ]
[ 7 10 3  6 −9 | −7 ]                               [ 0 17 −4 13 −16 | −14 ]

The first column has been processed. Once a column has been processed, here is how one processes the next column.

(a) Suppose i is the row in which the last leading 1 appeared (in one of the columns preceding the one we are about to process). Presumably that 1 is in the column just preceding the one we are about to process, but it could be before that. Is there a non-zero entry in the column to be processed, in some row strictly below i? If no, the column has been processed; move on to the next column. If none remain, you are done. If yes, get it into row i + 1 (if not already there) by interchanging the row containing it with row i + 1. Then multiply row i + 1 by the reciprocal of that non-zero entry so as to get a 1 in position i + 1.

(b) Using operations of type III, get every entry above and below this 1 to be 0. Then move on to the next column. If none remain, you are done.

Applying all this to our current matrix, here is how we finish the process.

[ 1 −1  1 −1   1 |   1 ]   II1/5(2)   [ 1 −1    1   −1    1 |    1 ]
[ 0  5 −3  2   6 |  −3 ]   −→         [ 0  1 −3/5  2/5  6/5 | −3/5 ]
[ 0 17 −4 13 −16 | −14 ]              [ 0 17   −4   13  −16 |  −14 ]

 2 3 11 2   2 3 11 2  1 0 5 − 5 5 5 1 0 5 − 5 5 5 II III ,III   5 (3)   (1)+(2) (3)−17(2)  3 2 6 3  31  3 2 6 3  −→  0 1 − −  −→  0 1 − −   5 5 5 5   5 5 5 5      31 31 182 19 182 19 0 0 5 5 − 5 − 5 0 0 1 1 − 31 − 31  141 20  1 0 0 −1 31 31

−→ III(1)−2/5(3), III(2)+3/5(3)

[ 1 0 0 −1  141/31 |  20/31 ]
[ 0 1 0  1  −72/31 | −30/31 ]
[ 0 0 1  1 −182/31 | −19/31 ]

The matrix is now in RRE form, and in the best possible way for the case m ≤ n (in our case m = 3, n = 5): the first m columns constitute the m × m identity matrix. The system is equivalent to

x1 − x4 + (141/31)x5 = 20/31
x2 + x4 − (72/31)x5 = −30/31
x3 + x4 − (182/31)x5 = −19/31

Variables that do not correspond to the columns containing the leading 1's (columns 4, 5 in our case, thus x4, x5) can be chosen freely. All solutions of the system are thus given by

x1 = x4 − (141/31)x5 + 20/31
x2 = −x4 + (72/31)x5 − 30/31
x3 = −x4 + (182/31)x5 − 19/31

for arbitrary values of x4, x5. That is, for every choice of values of x4, x5 we get a solution; we have again an infinity of solutions. We can write the whole thing in vector notation as follows: the solutions of the system are given by

x = (x4 − (141/31)x5 + 20/31, −x4 + (72/31)x5 − 30/31, −x4 + (182/31)x5 − 19/31, x4, x5)^T
  = x4 (1, −1, −1, 1, 0)^T + x5 (−141/31, 72/31, 182/31, 0, 1)^T + (20/31, −30/31, −19/31, 0, 0)^T

A number of other cosmetic changes can be made. For example, why keep the names x4, x5 when x1, x2, x3 no longer appear? We could relabel them c1, c2. Moreover, we can try to get rid of some denominators; if x5 = c2 is arbitrary, so is x5/31. We can also write the solution in the slightly nicer way

x = c1 (1, −1, −1, 1, 0)^T + c2 (−141, 72, 182, 0, 31)^T + (1/31)(20, −30, −19, 0, 0)^T

for arbitrary values of c1, c2.

2. For our next example, consider

x + 3y + 2w = 5
5x + 15y + z + 14w = −1
x + 3y − z − 2w = 2

The augmented matrix is

[ 1  3  0  2 |  5 ]
[ 5 15  1 14 | −1 ]
[ 1  3 −1 −2 |  2 ]

We proceed to row reduce.

[ 1  3  0  2 |  5 ]   III(2)−5(1), III(3)−(1)   [ 1 3  0  2 |   5 ]   III(3)+(2)   [ 1 3 0 2 |   5 ]
[ 5 15  1 14 | −1 ]   −→                        [ 0 0  1  4 | −26 ]   −→           [ 0 0 1 4 | −26 ]
[ 1  3 −1 −2 |  2 ]                             [ 0 0 −1 −4 |  −3 ]                [ 0 0 0 0 | −29 ]

and we are done. The system has NO solutions. It has no solutions because it is equivalent to a system in which the third equation is 0x + 0y + 0z + 0w = −29 and since the left hand side is 0 regardless of what x, y, z, w might be, it can never equal -29. 3. We saw in the previous example a system without solutions. Can it have solutions if we change the right hand sides? That is, let us try to determine all values, if any, of b1, b2, b3 for which the system

x + 3y + 2w = b1

5x + 15y + z + 14w = b2

x + 3y − z − 2w = b3

has solutions. We write up the augmented matrix and row reduce.

[ 1  3  0  2 | b1 ]   III(2)−5(1), III(3)−(1)   [ 1 3  0  2 |       b1 ]   III(3)+(2)   [ 1 3 0 2 |             b1 ]
[ 5 15  1 14 | b2 ]   −→                        [ 0 0  1  4 | b2 − 5b1 ]   −→           [ 0 0 1 4 |       b2 − 5b1 ]
[ 1  3 −1 −2 | b3 ]                             [ 0 0 −1 −4 |  b3 − b1 ]                [ 0 0 0 0 | −6b1 + b2 + b3 ]

The last equation makes sense if and only if −6b1 + b2 + b3 = 0, or b3 = 6b1 − b2. In this case the RRE form is

[ 1 3 0 2 |       b1 ]
[ 0 0 1 4 | b2 − 5b1 ]
[ 0 0 0 0 |        0 ]

Notice also that the columns (among the first four) not containing leading 1's are columns 2 and 4; thus y, w can be selected freely. The solution could be given as

x = −3y − 2w + b1

z = −4w + b2 − 5b1

or in vector form

x = y (−3, 1, 0, 0)^T + w (−2, 0, −4, 1)^T + (b1, 0, b2 − 5b1, 0)^T,

with y, w arbitrary.

4. We now find all solutions of

x1 − 2x2 + 2x3 = 1

x1 + x2 + 5x3 = 0

2x1 + 3x2 + x3 = −1

NOTE: If the number of equations is less than the number of unknowns (m < n) then there usually (but not always!) is more than one solution. More precisely, there either is no solution or an infinity of solutions. If the number of equations is more than the number of unknowns (m > n), there is a good chance of not having any solutions. Actually, anything can happen, but the most likely outcome is no solutions because there are too many conditions (equations). If the number of equations equals the number of unknowns (m = n), as it does in our current example, one has a good chance of having a unique solution. Still, anything could happen.

Solving the system. We row reduce the augmented matrix.

[ 1 −2 2 |  1 ]   III(2)−(1), III(3)−2(1)   [ 1 −2  2 |  1 ]   II1/3(2)   [ 1 −2  2 |    1 ]
[ 1  1 5 |  0 ]   −→                        [ 0  3  3 | −1 ]   −→         [ 0  1  1 | −1/3 ]
[ 2  3 1 | −1 ]                             [ 0  7 −3 | −3 ]              [ 0  7 −3 |   −3 ]

 1 0 4 1/3   1 0 4 1/3   1 0 0 1/15  II− 1 (3) III(1)+2(2),III(3)−7(2) 10 III(1)−4(3),III(2)−(3) −→  0 1 1 −1/3  −→  0 1 1 −1/3  −→  0 1 0 −2/5  0 0 −10 −2/3 0 0 1 1/15 0 0 1 1/15 There is a unique solution:  1  15    2  x =  −   5    1 15

The following should be clear from all these examples: There exists a unique solution if and only if m ≥ n and the first n rows of the RRE form of the augmented matrix constitute the identity matrix. Since the augmented matrix has n + 1 columns, and as we row reduce it we are row reducing A, we see that this condition involves only A, not the b part. Let us state this as a theorem so we realize it is an important result:

Theorem 1 The m × n system of linear equations (9) has a unique solution for a given value of b ∈ Cm if and only if (a) m ≥ n. (b) The row reduced echelon form of the system matrix A is either the n × n identity matrix (case n = m) or the n × n identity matrix completed with m − n rows of zeros to an m × n matrix.

Moreover, since conditions (a) and (b) depend only on A, whether more than one solution is possible does not depend on b: if the system has a unique solution for some b ∈ Cm, then for every b ∈ Cm it has either a unique solution or no solution at all.

The alternative to having a unique solution for a given b ∈ Cm is having no solutions for that b or having an infinity of solutions.

5. Suppose we want to solve two or more systems of linear equations having the same system matrix. For example, say we want to solve

x1 − x2 + x3 = 1

−x1 + x2 + x3 = 2

x1 + x2 − x3 = −5

and

x1 − x2 + x3 = −1

−x1 + x2 + x3 = 4

x1 + x2 − x3 = 0

There is no need to duplicate efforts. You may have noticed that the system matrix carries the row reductions; once the system matrix is in RRE form, so is the augmented matrix. So we just doubly augment and row reduce

[  1 −1  1 |  1 −1 ]
[ −1  1  1 |  2  4 ]
[  1  1 −1 | −5  0 ]

Here we go:

 1 −1 1 1 −1   1 −1 1 1 −1   1 −1 1 1 −1  I(2,3),II 1 (2) III(2)+(1),III(3)−(1) 2  −1 1 1 2 4  −→  0 0 2 3 3  −→  0 1 −1 −3 1/2  1 1 −1 −5 0 0 2 −2 −6 1 0 0 2 3 3 5 SYSTEMS OF LINEAR EQUATIONS. 27

 1 0 0 −2 −1/2   1 0 0 −2 −1/2   1 0 0 −2 −1/2  II 1 (3) III(1)+(2) 2 III(2)+(3) −→  0 1 −1 −3 1/2  −→  0 1 −1 −3 1/2  −→  0 1 0 −3/2 2  0 0 2 3 3 0 0 1 3/2 3/2 0 0 1 3/2 3/2 The solution to the first system is  −2     3  x =  −  ,  2    3 2 the solution to the second system is  1  − 2     x =  2  .     3 2

5.1 Exercises

In Exercises 1-6 solve the systems by reducing the augmented matrix to reduced row echelon form.

1. x + y + 2z = 9 2x + 4y − 3z = 1 3x + 6y − 5z = 0

2. x1 + 3x2 − 2x3 + 2x5 = 0 2x1 + 6x2 − 5x3 − 2x4 + 4x5 − 3x6 = −1 5x3 + 10x4 + 15x6 = 5 2x1 + 6x2 + 8x4 + 4x5 + 18x6 = 6

3. x1 − 2x2 + x3 − 4x4 = 1 x1 + 3x2 + 7x3 + 2x4 = 2 x1 − 12x2 − 11x3 − 16x4 = 5

4. x1 + x2 + 2x3 = 8 −x1 − 2x2 + 3x3 = 1 3x1 − 7x2 + 4x3 = 10

5. 2x1 + 2x2 + 2x3 = 0 −2x1 + 5x2 + 2x3 = 1 8x1 + x2 + 4x3 = −1

6. − 2x2 + 3x3 = 1 3x1 + 6x2 − 3x3 = −2 6x1 + 6x2 + 3x3 = 5

7. For which values of a will the following system have no solutions? Exactly one solution? Infinitely many solutions?

x + 2y − 3z = 4
3x − y + 5z = 2
4x + y + (a^2 − 14)z = a + 2

6 Inverses Revisited

Suppose A is a square n × n matrix. It is invertible if and only if there exists a square n × n matrix X such that AX = I. In this case we'll also have XA = I, but this is of no concern right now. We can rephrase this in terms of existence of solutions to systems of linear equations. Suppose we denote the columns of this (that may or may not exist) matrix X by x^(1), ..., x^(n). That is, if X = (xij), 1 ≤ i, j ≤ n, then

x^(1) = (x11, x21, ..., xn1)^T,  x^(2) = (x12, x22, ..., xn2)^T,  ...,  x^(n) = (x1n, x2n, ..., xnn)^T.

It is then easy to see (I hope you agree) that the condition AX = I is equivalent to n systems of linear equations, all with system matrix A:

Ax^(1) = δ^(1),  Ax^(2) = δ^(2),  ...,  Ax^(n) = δ^(n)        (13)

where for 1 ≤ i ≤ n, δ^(i) is the column vector having all entries equal to 0, except the i-th one which is 1; that is, δ^(1), δ^(2), ..., δ^(n) are the columns of the identity matrix. We can solve all these systems simultaneously if we augment A by all these columns; in other words if we augment A by the identity matrix: (A|I).

What will happen as we row reduce? For a square matrix, the row reduced form either is the identity matrix, or there is at least one row of zeros in the RRE form. Think about it! You can figure out why this is so! Suppose we get a row of zeros in the row reduced form of A (not of the augmented matrix, but of A). The only way we can have solutions for all the n-systems is if we also have 0 for the full row in the augmented matrix. That means that a certain number of row reductions produced a 0 row in the identity matrix. This is impossible, and it is not hard to see why this is impossible. So if the RRE form of A contains a row of zeros, then A cannot be invertible; some of the equations for the columns of the inverse are unsolvable. The alternative is that the row reduced echelon form of A is the identity matrix. In this case the augmented columns contain the solutions to the equations (13); in other words the augmented part has become the inverse matrix. Let us state part of this as a theorem.

Theorem 2 A square n × n matrix is invertible if and only if its RRE form is the identity matrix.
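Here is a short sketch of the "augment by the identity and row reduce" procedure, assuming SymPy is available. The helper name inverse_by_rref is mine (SymPy's own A.inv() does the same job); the test matrix is the one from Example 1 below.

from sympy import Matrix, eye

def inverse_by_rref(A):
    n = A.rows
    R, _ = A.row_join(eye(n)).rref()    # row reduce (A | I)
    if R[:, :n] != eye(n):              # the RRE form of A is not the identity
        raise ValueError("matrix is not invertible")
    return R[:, n:]                     # the augmented half is now the inverse

A = Matrix([[1, 2, 2], [2, 3, 1], [1, 0, 1]])
print(inverse_by_rref(A))   # Matrix([[-3/5, 2/5, 4/5], [1/5, 1/5, -3/5], [3/5, -2/5, 1/5]])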

We go to examples.

Example 1. Let us compute the inverse of the matrix we already inverted in a previous section, namely

 1 2 2  A =  2 3 1  1 0 1

We augment and row reduce:

 1 2 2 1 0 0   1 2 2 1 0 0  III(2)−2(1),III(3)−(1)  2 3 1 0 1 0  −→  0 −1 −3 −2 1 0  1 0 1 0 0 1 0 −2 −1 −1 0 1

 1 0 −4 −3 2 0   1 0 −4 −3 2 0  II−(2),II 1 (3) III(1)+2(2),III(3)−2(2) 5 −→  0 −1 −3 −2 1 0  −→  0 1 3 2 −1 0  0 0 5 3 −2 1 0 0 1 3/5 −2/5 1/5  1 0 0 −3/5 2/5 4/5  III(1)+4(3),III(2)−3(3) −→  0 1 0 1/5 1/5 −3/5  0 0 1 3/5 −2/5 1/5 6 INVERSES REVISITED 29

As before, the inverse is

A^(−1) = [ −3/5  2/5  4/5 ]
         [  1/5  1/5 −3/5 ]
         [  3/5 −2/5  1/5 ]

Example 2. Find the inverse of

[ 0 3 −3  1 ]
[ 2 4 −4  0 ]
[ 2 1 −1  1 ]
[ 3 0  1 −3 ]

Solution. By row reduction. As usual, row operations are performed in the order they are written.

 0 3 −3 1 1 0 0 0   1 2 −2 0 0 1/2 0 0  I(1,2),II 1 (1)  2 4 −4 0 0 1 0 0  2  0 3 −3 1 1 0 0 0    −→    2 1 −1 1 0 0 1 0   2 1 −1 1 0 0 1 0  3 0 1 −3 0 0 0 1 3 0 1 −3 0 0 0 1

 1 2 −2 0 0 1/2 0 0   1 2 −2 0 0 1/2 0 0  III(3)−2(1),III(4)−3(1)  0 3 −3 1 1 0 0 0  III(3)+(2),III(4)+2(2)  0 3 −3 1 1 0 0 0  −→   −→    0 −3 3 1 0 −1 1 0   0 0 0 2 1 −1 1 0  0 −6 7 −3 0 −3/2 0 1 0 0 1 −1 2 −3/2 0 1  1 2 −2 0 0 1/2 0 0   1 0 0 −2/3 −2/3 1/2 0 0  II 1 (2) III 3  0 1 −1 1/3 1/3 0 0 0  (1)−2(2)  0 1 −1 1/3 1/3 0 0 0  −→   −→    0 0 0 2 1 −1 1 0   0 0 0 2 1 −1 1 0  0 0 1 −1 2 −3/2 0 1 0 0 1 −1 2 −3/2 0 1  1 0 0 −2/3 −2/3 1/2 0 0   1 0 0 −2/3 −2/3 1/2 0 0  III ,II I (2)+(3) 1 (4) (4,3)  0 1 −1 1/3 1/3 0 0 0  2  0 1 0 −2/3 7/3 −3/2 0 1  −→   −→    0 0 1 −1 2 −3/2 0 1   0 0 1 −1 2 −3/2 0 1  0 0 0 2 1 −1 1 0 0 0 0 1 1/2 −1/2 1/2 0  1 0 0 0 −1/3 1/6 1/3 0  III(1)+ 2 (4),III(2)+ 2 (4),III(3)+(4) 3 3  0 1 0 0 8/3 −11/6 1/3 1  −→    0 0 1 0 5/2 −2 1/2 1  0 0 0 1 1/2 −1/2 1/2 0

The answer is

A^(−1) = [ −1/3   1/6 1/3 0 ]
         [  8/3 −11/6 1/3 1 ]
         [  5/2    −2 1/2 1 ]
         [  1/2  −1/2 1/2 0 ]

Example 3. Invert

A = [ 1 2 3 ]
    [ 4 5 6 ]
    [ 7 8 9 ]

Solution. By row reduction,

 1 2 3 1 0 0   1 2 3 1 0 0  III(2)−4(1),III(3)−7(1)  4 5 6 0 1 0  −→  0 −3 −6 −4 1 0  7 8 9 0 0 1 0 −6 −12 −7 0 1 7 LINEAR DEPENDENCE, INDEPENDENCE AND BASES 30

 1 2 3 1 0 0  III(3)−2(2) −→  0 −3 −6 −4 1 0  0 0 0 1 −2 1 A row of zeros has developed in the row reduction of A; the systems of equations whose solutions are the columns of the inverse of A are not solvable. The matrix A is not invertible.  a b  Example 4. Show that a 2 × 2 matrix is invertible if and only if ad − bc 6= 0. c d Solution. We may have to consider two cases, a 6= 0, a = 0. Suppose first a 6= 0. We can then divide by a and row reduce as follows

[ a b | 1 0 ]   II1/a(1)   [ 1 b/a | 1/a 0 ]   III(2)−c(1)   [ 1      b/a |  1/a 0 ]
[ c d | 0 1 ]   −→         [ c   d |   0 1 ]   −→            [ 0 d − cb/a | −c/a 1 ]

If d − (cb/a) = 0, we are done: there is no inverse. Now d − (cb/a) = (ad − bc)/a ≠ 0 if and only if ad − bc ≠ 0. We see that if ad − bc = 0, there is no inverse. On the other hand, if ad − bc ≠ 0, we can divide by (ad − bc)/a:

[ 1      b/a |  1/a 0 ]   IIa/(ad−bc)(2)   [ 1 b/a |          1/a            0 ]
[ 0 d − cb/a | −c/a 1 ]   −→               [ 0   1 | −c/(ad − bc)  a/(ad − bc) ]

   III(1)−b/a(2)   [ 1 0 | 1/a + bc/(a(ad − bc))  −b/(ad − bc) ]   =   [ 1 0 |  d/(ad − bc) −b/(ad − bc) ]
   −→              [ 0 1 |          −c/(ad − bc)   a/(ad − bc) ]       [ 0 1 | −c/(ad − bc)  a/(ad − bc) ]

It follows that if a ≠ 0, then

A^(−1) = 1/(ad − bc) [  d −b ]
                     [ −c  a ]

Suppose now a = 0, so

A = [ 0 b ]
    [ c d ]

and ad − bc = −bc. If ad − bc = 0, that is (in this case) if b or c equals 0, then A has a zero row or a zero column; it clearly can't have an inverse. On the other hand, if bc ≠ 0, the inverse we found above makes sense; it is

A^(−1) = −1/(bc) [  d −b ]
                 [ −c  0 ]

Now

−1/(bc) [  d −b ] [ 0 b ]  =  −1/(bc) [ −bc   0 ]  =  I,
        [ −c  0 ] [ c d ]             [   0 −bc ]

proving A is invertible and A^(−1) is given by the same formula as for a ≠ 0.
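The 2 × 2 formula is easy to package as a tiny function. A minimal sketch in Python; the name inverse_2x2 is mine, and the floats are only for illustration.

def inverse_2x2(a, b, c, d):
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix has no inverse")
    return [[d / det, -b / det],
            [-c / det, a / det]]

print(inverse_2x2(1, 2, 3, 4))    # [[-2.0, 1.0], [1.5, -0.5]]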

6.1 Exercises

To come.

7 Linear dependence, independence and bases

We return to our study of vector spaces. In this section we develop some of the fundamental concepts related to a vector space. Remember that a scalar is either a complex number or a real number and that one either assumes that we are only going to allow real numbers as scalars, and our vector spaces are real vector spaces, or we will open the door to non-real numbers, and we are now dealing with complex vector spaces.

For all the definitions that follow, assume V is a vector space. It could be any of the examples we gave earlier, or anything else that qualifies as being a vector space.

A linear combination of vectors v1,..., vk ∈ V is any vector of the form

c1v1 + ··· + ckvk

where c1, . . . , ck are scalars. We do allow the case k = 1; a linear combination is then just cv1, c a scalar.

Example. In R3 consider the vectors v1 = (1, −3, 2)^T and v2 = (−1, 0, 1)^T. Show that the vectors 0 = (0, 0, 0)^T and v = (1, −2, 1)^T are linear combinations of v1, v2, while w = (1, 1, 1)^T is not.

Solution. Given any number of vectors, the zero vector is always a linear combination of these vectors; we just have to take the scalars involved equal to 0. In our case 0 = 0v1 + 0v2. To show v is a linear combination of v1, v2 reduces to showing that there exist scalars c1, c2 such that v = c1v1 + c2v2; writing this out componentwise, it works out to

(1, −2, 1)^T = c1 (1, −3, 2)^T + c2 (−1, 0, 1)^T = (c1 − c2, −3c1, 2c1 + c2)^T.

In other words, we have to show the system of equations

c1 − c2 = 1

−3c1 = −2

2c1 + c2 = 1 has a solution. Analyzing similarly the situation for w, we have to prove that the system

c1 − c2 = 1

−3c1 = 1

2c1 + c2 = 1

does not have a solution. We can do it all at once, trying to solve both systems simultaneously since they have the same system matrix. We row reduce

[  1 −1 |  1 1 ]
[ −3  0 | −2 1 ]
[  2  1 |  1 1 ]

Performing the following row operations III(2)+3(1), III(3)−2(1), and then III(3)+(2), one gets

[ 1 −1 | 1 1 ]
[ 0 −3 | 1 4 ]
[ 0  0 | 0 3 ]

This already shows that the system for which the vector b is the last column of the matrix cannot have a solution. In other words, this shows that w is not a linear combination of v1, v2. Dropping the last column, we can solve for c1, c2 for the case of v. We keep row reducing, performing II−1/3(2) followed by III(1)+(2), to get

[ 1 0 |  2/3 ]
[ 0 1 | −1/3 ]
[ 0 0 |    0 ]

We obtained a unique solution c1 = 2/3, c2 = −1/3. One can verify that, in fact,

(2/3)(1, −3, 2)^T − (1/3)(−1, 0, 1)^T = (2/3 + 1/3, −2, 4/3 − 1/3)^T = (1, −2, 1)^T = v.
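The question "is v in the span of v1, v2?" is exactly this solvability question, so it can be put to a computer. A sketch assuming SymPy is available; linsolve returns the set of coefficient tuples, and an empty set when there is no solution.

from sympy import Matrix, linsolve, symbols

c1, c2 = symbols("c1 c2")
v1 = Matrix([1, -3, 2]); v2 = Matrix([-1, 0, 1])
v = Matrix([1, -2, 1]); w = Matrix([1, 1, 1])

A = v1.row_join(v2)                  # columns are v1 and v2
print(linsolve((A, v), c1, c2))      # one solution: c1 = 2/3, c2 = -1/3
print(linsolve((A, w), c1, c2))      # empty set: w is not a linear combination of v1, v2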

 1   −2  Another example. Express v =   as a linear combination of the following five vectors v1, v2, v3, v4, v5,  3  4 or show it isn’t possible.

 1   0   1   1   1   2   2   0   −1   0  v1 =   , v2 =   , v3 =   , v4 =   , v5 =    0   0   0   1   1  0 3 −1 1 0

Solution. We have to find scalars c1, c2, c3, c4, c5 such that v = c1v1 + c2v2 + c3v3 + c4v4 + c5v5. Written out in terms of components, this means finding c1, c2, c3, c4, c5 such that

(1, −2, 3, 4)^T = (c1 + c3 + c4 + c5, 2c1 + 2c2 − c4, c4 + c5, 3c2 − c3 + c4)^T;

in other words, solving the system of equations

c1 + c3 + c4 + c5 = 1

2c1 + 2c2 − c4 = −2

c4 + c5 = 3

3c2 − c3 + c4 = 4

As usual, we’ll do this setting up the augmented matrix and row reducing. The augmented matrix is:

 1 0 1 1 1 1   2 2 0 −1 0 −2     0 0 0 1 1 3  0 3 −1 1 0 4

You will notice that the columns of the augmented matrix are the vectors v1, v2, v3, v4, v5, and v. Noticing this saves some time next time one has to do this. Let's proceed with the row reduction.

 1 0 1 1 1 1   1 0 1 1 1 1   2 2 0 −1 0 −2  III(2)−2(1)  0 2 −2 −3 −2 −4    −→    0 0 0 1 1 3   0 0 0 1 1 3  0 3 −1 1 0 4 0 3 −1 1 0 4

 1 0 1 1 1 1   1 0 1 1 1 1  II 1 (2) III 2  0 1 −1 −3/2 −1 −2  (4)−3(2)  0 1 −1 −3/2 −1 −2  −→   −→    0 0 0 1 1 3   0 0 0 1 1 3  0 3 −1 1 0 4 0 0 2 11/2 3 10  1 0 1 1 1 1   1 0 0 −7/4 −1/2 −4  II 1 (4) III ,III 2  0 1 −1 −3/2 −1 −2  (1)−(4) (2)+(4)  0 1 0 5/4 1/2 3  −→   −→    0 0 0 1 1 3   0 0 0 1 1 3  0 0 1 11/4 3/2 5 0 0 1 11/4 3/2 5  1 0 0 −7/4 −1/2 −4   1 0 0 0 5/4 5/4  III ,III ,III I (1)+ 7 (4) (2)− 5 (4) (3)− 11 (4) (3,4)  0 1 0 5/4 1/2 3  4 4 4  0 1 0 0 −3/4 −3/4  −→   −→    0 0 1 11/4 3/2 5   0 0 1 0 −5/4 −13/4  0 0 0 1 1 3 0 0 0 1 1 3 The solution is given by 5 5 3 3 5 13 c = − c + , c = c − , c = c − , c = −c + 3, 1 4 5 4 2 4 5 4 3 4 5 4 4 5 7 LINEAR DEPENDENCE, INDEPENDENCE AND BASES 33

with c5 being arbitrary. This gives us an infinite number of ways of expressing v as a linear combination of v1, v2, v3, v4, v5. For example, selecting c5 = 0, we get the representation

v = (5/4)v1 − (3/4)v2 − (13/4)v3 + 3v4.

You might want to take a minute or so and verify that this representation is correct. Or, we can take c5 = 1 to get

v = −2v3 + 2v4 + v5.

Again, take a minute or so to see this works. Or, one can take c5 = −1 and get

v = (5/2)v1 − (3/2)v2 − (9/2)v3 + 4v4 − v5.

And an infinity more.

Notice the difference between the two examples. In the first one there was a unique choice for the scalar coefficients (or no choice at all). In the second example, there is an infinity of choices. This is because the set of vectors in the first example were linearly independent, while those of the second example were not. Let us define what this means.

Assume again that we have a vector space V and vectors v1,..., vm in V . We say the vectors v1,..., vm are linearly dependent if there exist scalars c1, c2, . . . , cm, not all 0, such that

c1v1 + ··· + cmvm = 0.

The vectors are linearly independent if they are not linearly dependent.

In more words than symbols: Given a set of vectors v1,..., vm, the zero vector 0 can always be obtained as a linear combination of these vectors by taking c1 = 0, c2 = 0, . . . , cm = 0; all coefficients equal to 0. If this is the ONLY way one can get the zero vector, the vectors are said to be linearly independent. If there is another way, with at least one of the coefficients not zero, they are linearly dependent.

A few obvious things to notice: If a set of vectors contains the zero vector, they are automatically linearly dependent. In fact, say v1 = 0; then whatever v2,..., vn may be, we will have c1v1 + ··· + cnvn = 0 if we take c2 = 0, ..., cn = 0 and c1 = 1 (or any other non-zero number). Given two vectors v1, v2, they are linearly independent if and only if one is not equal to the other one times a scalar. In fact, if (say) v1 = cv2, then v1 − cv2 = 0 and since the coefficient of v1 is 1 ≠ 0, we clearly have linear dependence. If, on the other hand, we have c1v1 + c2v2 = 0 and not both c1, c2 are 0, then we can divide out by the non-zero coefficient and solve. For example, if c1 ≠ 0, we can divide by c1 and solve to get v1 = −(c2/c1)v2, so v1 is a scalar times v2. A bit less obvious, perhaps, is that vectors v1,..., vm are linearly dependent if and only if one of the vectors is a linear combination of the others. In fact, if we have c1v1 + ··· + cmvm = 0 and one of the cj's is not 0, we can divide it out and solve to get vj as a linear combination of the remaining vectors. For example, if cm ≠ 0 we get

vm = −(c1/cm)v1 + ··· + (−(cm−1/cm))vm−1.

Conversely, if one vector is a linear combination of the others, we can get at once a linear combination of all of them equal to 0 in which the coefficient of the vector that was a combination of the others is 1 (or −1). For example, if

v2 = c1v1 + c3v3 + ··· + cmvm, then c1v1 + (−1)v2 + c3v3 + ··· + cmvm = 0 and, of course, −1 ≠ 0. Maybe we should have this as a theorem for easy reference:

Theorem 3 Vectors v1,..., vm of a vector space V are linearly dependent if and only if one of the vectors is a linear combination of the others. Equivalently, the vectors v1,..., vm are linearly independent if and only if none of the vectors is a linear combination of the others.
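For vectors in Rn (or Cn) there is a practical machine test: the vectors are linearly independent exactly when the matrix having them as columns has as many leading 1's in its RRE form as there are vectors; NumPy's matrix_rank computes that number. A sketch assuming NumPy is available; the helper name independent is mine.

import numpy as np

def independent(*vectors):
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

v1 = [1, -3, 2]; v2 = [-1, 0, 1]
print(independent(v1, v2))                 # True
print(independent(v1, v2, [1, -2, 1]))     # False: the third vector is (2/3)v1 - (1/3)v2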

Here is another example. Example Let the vector space V be now the space of all (real valued) functions defined on the real line. Show that the following sets of vectors are linearly independent.

1. f1, f2 where f1(t) = sin t, f2(t) = cos t.

2. g1, g2, g3, g4 where g1(t) = e^t, g2(t) = e^(2t), g3(t) = e^(3t), g4(t) = e^(4t).

Solution. The first one is easy. Since we only have two vectors, we can ask whether one is a constant times the other. Is there a scalar c such that either f1 = cf2, which means sin t = c cos t FOR ALL t, or f2 = cf1, which means cos t = c sin t FOR ALL t? Of course not! If sin t = c cos t, we simply have to set t = π/2 to get the rather strange conclusion 1 = 0, so f1 = cf2 is out. So is f2 = cf1; we can't have cos t = c sin t for all t because for t = 0 we get 1 = 0.

The second one is a bit harder; we have to show that the only way we can get c1e^t + c2e^(2t) + c3e^(3t) + c4e^(4t) = 0 to hold for ALL t is by taking c1 = c2 = c3 = c4 = 0. We could set up a system of equations by judiciously giving values to t, but I'll postpone the somewhat simpler solution for later, once we have developed a few more techniques.

Here comes another definition. Let v1,..., vm be vectors in a vector space V . The span of v1, . . . , vm is the set of all linear combinations of these vectors. I will denote the span of v1,..., vm by sp(v1,..., vm). Vectors that will be for sure in the span of vectors v1,..., vm include: The zero vector; as mentioned, it is a linear combination of any bunch of vectors, just use 0 coefficients. The vectors v1,..., vm themselves; to get vj as a linear combination use cj = 1, all other coefficients equal to 0. And, in general, many more: v1 + 2v2, −v1, v1 + ··· + vm; etc., etc., ad infinitum. Here are a few examples. Examples

1. Let us start with the simplest example; we have only one vector and that vector is the zero vector. What can 0 span? Well, not much; multiplying by any scalar we always get 0. The span of the zero vector is the set consisting of the zero vector alone; a set of a single element. In symbols, sp(0) = {0}.

2. On the other hand, if V is a vector space, v ∈ V, v ≠ 0, then the span of v consists of all multiples of v; sp(v) = {cv : c a scalar}.

3. Remember the example we did above: In R3 consider the vectors v1 = (1, −3, 2)^T and v2 = (−1, 0, 1)^T. Show that the vectors 0 = (0, 0, 0)^T and v = (1, −2, 1)^T are linear combinations of v1, v2, while w = (1, 1, 1)^T is not. We could now rephrase it in terms of spans; in solving it we showed that with the vectors defined as in the example, 0, v ∈ sp(v1, v2), while w ∉ sp(v1, v2).

Here is a simple theorem.

Theorem 4 Assume v1,..., vm are vectors in a vector space V . Then sp(v1,..., vm) is a subspace of V .

The reason why this theorem is true is, I hope, clear. First of all, the zero vector is in the span of v1,..., vm. It is also clear (I hope once more) that adding two linear combinations of these vectors, or multiplying such a linear combination by a scalar, resolves into another linear combination of the same vectors. That's it. We will refer to the span sp(v1,..., vm) of vectors v1,..., vm also as the subspace of V spanned by v1,..., vm, and call {v1,..., vm} a spanning set for this subspace. A subspace can be spanned by different sets of vectors. In fact, except for the pathetically small subspace consisting of the zero vector by its lonesome, every other subspace of a vector space will have an infinity of different spanning sets. Consider, for example, one of the simplest cases: suppose v is a non-zero vector in a vector space V. Let W = sp(v). Then W = {cv : c a scalar}. But it is, or should be, clear that anything that is a multiple of v is also a multiple of any non-zero multiple of v. That is, suppose d is any non-zero scalar and we set w = dv. Any multiple of v is a multiple of w, and vice-versa: if x = cv, then x = (c/d)w; if x = cw, then x = (cd)v. Any non-zero multiple of v also spans W.

Generally speaking, if there is one spanning set, there is an infinity of them. But some spanning sets are better than others. They have less fat, fewer superfluous elements. Say we are in a vector space V and W is a subspace spanned by the vectors v1,..., vm. If one of these vectors happens to be a linear combination of the remaining ones, who needs it? For example, suppose

vm = a1v1 + ··· + am−1vm−1 for some scalars a1, . . . , am−1.

Then any linear combination involving all the vectors can be rewritten as one without vm:

c1v1 + ··· + cmvm = c1v1 + ··· + cm−1vm−1 + cm(a1v1 + ··· + am−1vm−1) = (c1 + cma1)v1 + ··· + (cm−1 + cmam−1)vm−1.

That is, if vm is a linear combination of v1,..., vm−1, then sp(v1,..., vm) = sp(v1,..., vm−1). Recalling Theorem 3, that one vector is a linear combination of the others is equivalent to linear dependence. If the spanning set is linearly dependent, we can find a vector that is a linear combination of the others (there usually is more than one choice), and throw it out. The remaining vectors still span the same subspace. We keep doing this. Can we run out of vectors? No, we can't, since an empty set cannot span a non-zero subspace (well, sort of) and we are always spanning the same subspace. But we only have a finite number of vectors to start with, so there must be a stopping point. The stopping point is a linearly independent set of vectors spanning the same space as before. Such a set is called a basis of the subspace. To put it in the form of a theorem:

Theorem 5 Let V be a vector space and let v1,..., vm be vectors in V . Let W = sp(v1,..., vm). There is a subset of {v1,..., vm} that still spans W and is linearly independent; in other words: every spanning set contains a basis of the spanned subspace.

We could now ask if this basis, obtained by discarding vectors from a spanning set, is always the same. Well, if the spanning set was already linearly independent, and there is nothing to discard, then yes. Even so, there are many other spanning sets that will span the same subspace. And since when discarding there is almost always more than one choice of what to discard at each stage, the general answer is no. Vector subspaces tend to have an infinity of different bases. What is, however, remarkable (I'd even dare say extremely remarkable) is that any two bases of a given subspace of a vector space will have the same number of elements. So if you have a subspace and found a basis of exactly seven vectors and someone tries to sell you a better, improved basis, of only six vectors, don't buy! There could be a better basis than the one you found, but it still will consist of seven vectors. If a subspace has a basis of m vectors, we say that its dimension is m. That is, an m-dimensional subspace of a vector space is one that has a basis of m elements (and hence ALL of its bases will have m elements).

Before we go any further, it may be good to make some additional definitions. A vector space V is a subspace of itself, so we can ask if it can be spanned by a finite number of vectors. If so, we say it is finite dimensional; if not, it is infinite dimensional. By what we saw, if it is finite dimensional, it has a basis, hence a dimension. Infinite dimensional vector spaces also have bases, but one has to redefine the concept a bit, and we won’t go into it. Example. Show that the vectors

e1 = (1, 0,..., 0), e2 = (0, 1, 0,..., 0),..., en = (0,..., 0, 1),

form a basis of Rn, hence Rn is a vector space of dimension n. In case there are too many dots, the vector ej is the n-tuple in which the j-th component is 1, all other components are 0.

Solution. A direct way to verify that a set of vectors is a basis of a given space (or subspace) is by performing two tasks (in any order).

1. Verify linear independence. In other words, show that the equation

c1e1 + ··· + cnen = 0

can only be solved by c1 = 0, c2 = 0, . . . , cn = 0. 2. Verify that for every vector w in the space (or subspace), the equation

c1e1 + ··· + cnen = w

has at least one solution c1, c2, . . . , cn. (Incidentally, if it happens to have more than one solution, then the first condition fails; if a vector is a linear combination of linearly independent vectors, there is only one choice for the coefficients.) If either condition fails, we don't have a basis.

Verifying the first condition, linear independence: Suppose c1e1 + ··· + cnen = 0 for some scalars c1, . . . , cn. If we write the vectors of Rn in the form of column vectors, then this equation becomes

(0, 0, ..., 0)^T = c1 (1, 0, ..., 0)^T + c2 (0, 1, ..., 0)^T + ··· + cn (0, ..., 0, 1)^T = (c1, c2, ..., cn)^T

The only way the first and last vectors can be equal is if c1 = c2 = ··· = cn = 0. Linear independence has been established.

Verifying the second condition, spanning: Let w be a vector of Rn, so w = (w1, w2, ..., wn)^T. We have to show that no matter what w1, . . . , wn are, we can always solve

(w1, w2, ..., wn)^T = c1 (1, 0, ..., 0)^T + c2 (0, 1, ..., 0)^T + ··· + cn (0, ..., 0, 1)^T.

Writing the right hand side as a single vector, the equation becomes

(w1, w2, ..., wn)^T = (c1, c2, ..., cn)^T.

There is a solution, namely c1 = w1, c2 = w2,..., cn = wn. Spanning has been established.

The basis {e1,..., en} of Rn is sometimes referred to as the canonical basis of Rn. You might notice that if we allow complex scalars, it is also a basis of Cn, so that Cn is a (complex) vector space of dimension n. Here is a theorem about bases; some of the properties mentioned in it are sort of obvious, others less so. I hope all are believable (given that they are true).

Theorem 6 Let V be a vector space of dimension n. Then

1. No set of more than n vectors can be linearly independent. 2. No set of less than n vectors can span V .

3. Any set of n vectors that spans V is also linearly independent and a basis; i.e., if V = sp(v1,..., vn), then v1,..., vn is a basis of V .

4. Any set of n vectors that is linearly independent will also span V and be a basis of V; i.e., if v1,..., vn are linearly independent, then V = sp(v1,..., vn), and v1,..., vn is a basis of V .

5. Let v1,..., vm be linearly independent (so, by the first property, we must have m ≤ n). If m = n it is a basis; if m < n it can be extended to a basis meaning we can find vectors vm+1,..., vn so that v1,..., vn is a basis of V .

6. Every subspace of V has a basis. If W is a subspace of V there exists a set of vectors {v1,..., vm} that is linearly independent and such that W = sp(v1,..., vm). Necessarily m ≤ n (m = n if and only if V = W ).

So once we have a finite dimensional space V , there is a hierarchy of subspaces: precisely one subspace of dimension n, namely V itself, a lot (an infinity) of subspaces of dimension n−1, all the way down to dimension 1. To keep the poor subspace consisting only of the zero vector happy by giving it a dimension, one says that the trivial subspace has dimension 0. From a geometric point of view, if we think of Rn as a sort of n-dimensional replica of our familiar 3-space (or as our 3-space if n = 3, the plane if n = 2, a line if n = 1), one dimensional subspaces are lines through the origin, two dimensional subspaces are planes containing the origin, and three dimensional subspaces, well, they are replicas of R3 containing the origin.

Maybe it is a good idea to work for a while in the familiar environment of 3-space. Setting up a system of orthogonal coordinates, we can think of points of 3-space as being vectors. The triple written as a column x = (x, y, z)^T can be interpreted as the point of coordinates x, y, z, or as the arrow from the origin to the point of coordinates x, y, z. Choose the one you like best, it makes no difference. (In applied situations it can make a difference, but here we are not in an applied situation.) Suppose we have a non-zero vector; a non-zero vector constitutes a very small linearly independent set of a single element. Say b = (b1, b2, b3)^T ≠ 0 (so at least one of b1, b2, b3 is not 0); then the subspace sp(b) is the set

sp(b) = {c (b1, b2, b3)^T : c a real number} = {(cb1, cb2, cb3)^T : c a real number}.

In other words, it consists of all points of coordinates x, y, z satisfying

x = cb1, y = cb2, z = cb3, −∞ < c < ∞.

These are the parametric equations of a line through the origin in the direction of the vector b. One usually uses t or s for the parameter instead of c, but that does not really matter.

What about the subspace spanned by two vectors a and b? If these vectors are linearly dependent, we are back to a line (assume that neither is 0 to avoid wasting time). Both vectors are then on the same line through the origin, and either one of them spans that line, and is a basis for the line. On the other hand, if the vectors are not collinear (i.e., they are linearly independent), then sp(a, b) is the plane determined by the two vectors (thinking of them as arrows from the origin). We might retake this example later on.

What about three vectors? Well, if they are linearly dependent and not all 0, they span a line or a plane through the origin. If linearly independent, they are a basis of R3 and span R3.

Since our main space may well be Rn (or Cn), it may be a good idea to have some algorithm to decide when vectors in Rn are linearly independent, and to know what they span. In the section on determinants we'll see how to do this with determinants, but for now we'll use our old and a bit neglected friend, row reduction. We were writing vectors of Rn as column vectors, but for the first algorithm I have in mind it will be more convenient to write them as rows; after it I'll give you a second algorithm that works with the columns directly. Well, let's try to be more or less consistent and keep writing the vectors as columns; in the first algorithm we'll just transpose to get rows. So assume given m vectors in Rn, say

v1 = (v11, v21, ..., vn1)^T,  v2 = (v12, v22, ..., vn2)^T,  ...,  vm = (v1m, v2m, ..., vnm)^T.

The problem to solve is: determine the dimension of sp(v1,..., vm) and find a basis for the subspace W = sp(v1,..., vm). For both algorithms we introduce the matrix I'll call M whose columns are the vectors; that is,

M = [ v11 v12 ··· v1m ]
    [ v21 v22 ··· v2m ]
    [  .   .        .  ]
    [ vn1 vn2 ··· vnm ]

Algorithm 1. Let N = M^T. The vectors v1,..., vm are the rows of N. Row reduce N to RRE form. The non-zero rows are then a basis for W; write them again as column vectors. Why does this work? It works because row operations do not change the span of the row vectors. That is fairly easy to see. So once you are in RRE form, the rows of the RRE form still span the same subspace W. But because the non-zero ones all start with a leading 1, and everything above and below that 1 is 0, it is easy to see that the non-zero rows are linearly independent, thus a basis. This basis, of course, might contain no vector from the original spanning set.

Algorithm 2. This one is a bit harder to explain, but one row reduces directly the matrix M, bringing it to RRE form. Now go to the original spanning set v1,..., vm and discard every vector that was in a column of M which now does NOT have a leading 1. That is, keep only the original vectors that were in a column that now has a leading 1. These remaining vectors form a basis for W. The advantage of this algorithm is that the basis is made up out of vectors from the original set.
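Both algorithms are easy to drive with SymPy's rref, assuming that library is available. The function names below are mine, and the small test set is a made-up example (it spans a 2-dimensional subspace of R3).

from sympy import Matrix

def basis_from_rows(vectors):                # Algorithm 1: row reduce N = M^T
    N = Matrix([list(v) for v in vectors])   # each vector becomes a row
    R, _ = N.rref()
    return [R.row(i).T for i in range(R.rows) if any(R.row(i))]

def basis_from_columns(vectors):             # Algorithm 2: keep the pivot columns of M
    M = Matrix([list(v) for v in vectors]).T      # vectors as columns
    _, pivots = M.rref()
    return [Matrix(list(vectors[j])) for j in pivots]

vs = [(1, 2, 0), (2, 4, 0), (0, 1, 1), (1, 3, 1)]
print(len(basis_from_rows(vs)), len(basis_from_columns(vs)))   # 2 2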

Let's illustrate this with a somewhat messy example. Messy examples can sometimes be the best. Sometimes they are the worst. NOTICE: This is only an example! No instructor would be sadistic enough to have you do a computation like this one by hand. I just thought it might be good to occasionally deal with larger systems, and understand in the process why computers are a great invention. Consider the following 8 vectors in R7:

v1 = (1, 2, 3, −1, −2, 4, −2)^T,   v2 = (3, 6, 9, −3, −6, 12, −6)^T,   v3 = (1, −3, −2, 4, 0, 1, 5)^T,
v4 = (1, 1, 1, 0, 5, −2, −1)^T,    v5 = (1, −8, −7, 9, 2, −2, 12)^T,   v6 = (1, 0, 1, 1, 1, 1, 1)^T,
v7 = (0, 2, 1, 0, 1, 1, −1)^T,     v8 = (3, −3, −1, 8, −5, 10, 8)^T.

We want to find the dimension of the subspace W = sp(v1,..., v8) they span, and a basis of this spanned subspace. If perchance the dimension is 7 (it won't be), then they span R7 and the basis we would get would be a basis of R7. The matrix M is

M = [  1  3  1  1  1 1  0  3 ]
    [  2  6 −3  1 −8 0  2 −3 ]
    [  3  9 −2  1 −7 1  1 −1 ]
    [ −1 −3  4  0  9 1  0  8 ]
    [ −2 −6  0  5  2 1  1 −5 ]
    [  4 12  1 −2 −2 1  1 10 ]
    [ −2 −6  5 −1 12 1 −1  8 ]

The transpose matrix is

N = M^T = [ 1  2  3 −1 −2  4 −2 ]
          [ 3  6  9 −3 −6 12 −6 ]
          [ 1 −3 −2  4  0  1  5 ]
          [ 1  1  1  0  5 −2 −1 ]
          [ 1 −8 −7  9  2 −2 12 ]
          [ 1  0  1  1  1  1  1 ]
          [ 0  2  1  0  1  1 −1 ]
          [ 3 −3 −1  8 −5 10  8 ]

To row reduce this matrix, I used Excel. Moreover, because at every row reduction the space spanned by the row vectors is always the same, it isn't really necessary to reach RRE form; it suffices to stop once one can see that the non-zero rows are linearly independent. With the help of Excel, I carried out the following operations on N, in the indicated order:

III(2)−3(1),III(3)−(1),III(4)−(1),III(5)−(1),III(6)−(1),III(8)−3(1),I(2,8),I(2,3),II−(2)

III(1)−2(2),III(3)+5(2),III(4)+10(2),III(5)+2(2),III(6)−2(2),III(7)+9(2),III(4)−2(3),I(4,7)

II5(4),III(4)−2(3),II5(5),III(5)+3(3),II5(6),III(6)−8(3),I(4,5),III(6)−(4),III(6)+2(5). At this point I got the following matrix:

 1 0 −1 1 12 −8 0   0 1 2 −1 −7 6 −1     0 0 5 0 −33 27 2     0 0 0 10 −24 26 11     0 0 0 0 11 −9 1     0 0 0 0 0 0 0     0 0 0 0 0 0 0  0 0 0 0 0 0 0

I doubt I could have done this without Excel! Too many possibilities of mistakes. There is no real need to continue. Vectors in Rn of the form

(a1, ∗, ∗, ..., ∗)^T,  (0, a2, ∗, ..., ∗)^T,  (0, 0, a3, ∗, ..., ∗)^T,  ...

where a1, a2,... are non-zero scalars and the entries marked with a ∗ could be anything (zero or non-zero), have to be linearly independent. Any linear combination of them with coefficients c1, c2, c3,... would result in a vector whose first component is c1a1. If it is the zero vector, then c1a1 = 0, hence c1 = 0. The second component of this linear combination is c1(∗) + c2a2 (c1 times the second component of the first vector plus c2a2). Since c1 = 0, we get c2a2 = 0, hence c2 = 0. And so forth.

Returning to our example, we proved that the set of vectors

 1   0   0   0   0   0   1   0   0   0             −1   2   5   0   0            w1 =  1  , w2 =  −1  , w3 =  0  , w4 =  10  , w5 =  0  .            12   −7   −33   −24   11             −8   6   27   26   −9  0 −1 2 11 1 spans the same space as the original set v1,..., v8; since they are linearly independent they are a basis of this subspace and its dimension is 5.

Solution by the second algorithm. In many ways, this is a better solution. We need to row reduce the matrix M; this time I will take it to RRE form. Using excel, of course. The following row operations put this matrix into RRE form. I’m not sure one can do it in less, but you may certainly try.

III(2)−(1),III(3)−3(1),III(4)+(1),III(5)+2(1),III(6)−4(1),III(7)+2(1),II 1 ,III(1)−(2),III(3)+5(2), − 5 (2)

III(4)−5(2),III(5)−2(2),III(6)+3(2),III(7)−7(2),II−(3),III 4 ,III 1 ,III 33 ,III 27 , (1)− 5 (3) (2)− 5 (3) (5)− 5 (3) (6)+ 5 (3)

III 2 ,I(5,7),II5(5),III 3 ,III 2 ,III 9 ,III 11 ,II 1 ,III(1)+7(6), (7)+ 5 (3) (1)− 5 (5) (2)− 5 (5) (6)+ 5 (5) (7)− 5 (5) 25 (5)

III(2)+5(6), III(3)−(6), III(4)−2(6), III(5)+11(6), III(7)+29(6), I(4,6). The RRE form is:

[ 1 3 0 0 −1 0 0  2 ]
[ 0 0 1 0  2 0 0  3 ]
[ 0 0 0 1  0 0 0  0 ]
[ 0 0 0 0  0 1 0 −2 ]
[ 0 0 0 0  0 0 1  1 ]
[ 0 0 0 0  0 0 0  0 ]
[ 0 0 0 0  0 0 0  0 ]

Looking at this matrix we notice that the columns that contain the leading 1's of the non-zero rows are columns 1, 3, 4, 6, and 7. That means that of the original vectors, v1, v3, v4, v6, and v7 constitute a basis of the subspace spanned by the original 8 vectors. Since this basis has 5 vectors, we again get that the dimension is 5, but now we have a basis consisting of a subset of the original vectors.
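A numerical cross-check of the dimension, assuming NumPy is available: matrix_rank counts the leading 1's that row reducing M would produce, so it should agree with what we found by hand (well, by Excel).

import numpy as np

M = np.array([[ 1,  3,  1,  1,  1, 1,  0,  3],
              [ 2,  6, -3,  1, -8, 0,  2, -3],
              [ 3,  9, -2,  1, -7, 1,  1, -1],
              [-1, -3,  4,  0,  9, 1,  0,  8],
              [-2, -6,  0,  5,  2, 1,  1, -5],
              [ 4, 12,  1, -2, -2, 1,  1, 10],
              [-2, -6,  5, -1, 12, 1, -1,  8]], dtype=float)
print(np.linalg.matrix_rank(M))    # 5, the dimension found above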

7.1 Exercises

1. Show that the vector space of all m × n matrices, that is Mm,n (or, in the complex case, Mm,n(C)), has dimension m · n. Describe a basis.

2. In one of the examples in this section it was shown that the vectors

 1   0   0   0   0   0   1   0   0   0             −1   2   5   0   0            w1 =  1  , w2 =  −1  , w3 =  0  , w4 =  10  , w5 =  0             12   −7   −33   −24   11             −8   6   27   26   −9  0 −1 2 11 1 8 DETERMINANTS 41

constitute a basis for the subspace of R7 spanned by the eight vectors

v1 = (1, 2, 3, −1, −2, 4, −2)^T,   v2 = (3, 6, 9, −3, −6, 12, −6)^T,   v3 = (1, −3, −2, 4, 0, 1, 5)^T,
v4 = (1, 1, 1, 0, 5, −2, −1)^T,    v5 = (1, −8, −7, 9, 2, −2, 12)^T,   v6 = (1, 0, 1, 1, 1, 1, 1)^T,
v7 = (0, 2, 1, 0, 1, 1, −1)^T,     v8 = (3, −3, −1, 8, −5, 10, 8)^T.

1. Express v1 as a linear combination of the vectors w1, w2, w3, w4, w5. Show there is only one way of doing this.

2. Express w1 as a linear combination of the vectors v1, v2, v3, v4, v5, v6, v7, v8 in three different ways.

8 Determinants

Every square matrix (and only square matrices) gets assigned a number called its determinant. To repeat, if M is a square matrix then the determinant of M, denoted usually by det(M), is a scalar (real if the matrix is real; otherwise complex). There are many (equivalent) ways of defining the determinant; I'll select what could be a favorite way of doing it because it also tells you how to compute it. At first glance the definition may look a bit complex, but I hope that the examples will clear it up. A bit of practice will clear it up even more. By the way, in this section I prove almost nothing; you'll have to take my word that all I tell you is true.

The definition I have in mind is a recursive definition; which becomes a recursive algorithm for computing determinants. Recursive procedures should be quite familiar to anybody who has done a bit of programming. Let us start with the simplest case, a 1 × 1 matrix. That’s just a scalar enclosed in parentheses; for example (5) or (1 + 3i). Mostly, one doesn’t even write the parentheses and identifies the set of 1 × 1 matrices with the set of scalars. We define, for a 1 × 1 matrix det(a) = a. So the determinant of a scalar is the scalar. This could be called our base case. The next part of the definition explains how to reduce computing the determinant of an n×n-matrix to computing determinants of (n−1)×(n−1) matrices. I’ll give it first in words; after the examples I’ll write out the formulas in a more precise way. Here is the full algorithm, which is usually known as Laplace’s expansion. Assume given an n × n square matrix.

Step 1. If n = 1, it is explained above what to do. Suppose from now on n ≥ 2. Step 2. Assign a sign (“+” or “−”) to each position of the matrix, beginning by placing a + in position (1, 1) and then alternating signs. This is totally independent of the entries in the matrix; the entry in + position may well be negative, the one in a − position can be positive. For example, here is how this signs assignment looks for 2 × 2, 3 × 3 and 4 × 4 matrices:

 + − + −   + − +   + −  − + − + , − + − ,   . − +    + − + −  + − +   − + − + 8 DETERMINANTS 42

Step 3. Select a row of the matrix. One usually selects the row with the largest number of zero entries; all things being equal one selects the first row. For each entry in that row compute the determinant of the (n − 1) × (n − 1) matrix obtained by crossing out the selected row and the column containing the entry. Multiply the determinant times the entry. If the entry is in a negative position, change the sign of this result. Step 4. Add up the results of all the computations done in step 3. That’s the determinant.
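Laplace's expansion translates directly into a recursive program. A minimal (and very inefficient) sketch in Python, expanding always along the first row; the matrix is a list of rows.

def det(M):
    n = len(M)
    if n == 1:                           # base case: a 1 x 1 matrix
        return M[0][0]
    total = 0
    for j in range(n):                   # expand along the first row
        minor = [row[:j] + row[j+1:] for row in M[1:]]   # cross out row 1 and column j+1
        sign = -1 if j % 2 else 1        # the +/- checkerboard restricted to the first row
        total += sign * M[0][j] * det(minor)
    return total

print(det([[1, 2, 3, 2],
           [2, -3, -4, 0],
           [1, -1, 2, -2],
           [1, 1, 2, 2]]))               # -20, the 4 x 4 example computed later in this section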

An important fact here is that the result of the computation does not depend on the choice of the row. That’s not so easy to justify with the tools at our disposal. Here is how we would compute the determinant of a 2 × 2 matrix

A = [ a b ]
    [ c d ]

Let us select the first row. The first entry is a; if we cross out the first row and the column containing a, we are left with d. We multiply by a, getting ad. Since a is in a positive position, we leave it as it is. The next (and last) entry of the first row is b. Crossing out the first row and the column containing b we are left with c. We multiply b times c and, since b is in a negative place, change the sign to −bc. Adding it all up gives ad − bc. Thus

det(A) = det [ a b ]  =  ad − bc
             [ c d ]

What would have happened if we had chosen the second row? We would get the same determinant in the form −cb + da.

A word on notation. Given a square n × n matrix, it is customary to write its determinant by replacing the parentheses that enclose the array of entries of the matrix by vertical lines. Thus

| a b |
| c d |  =  ad − bc.

So if the matrix A is given by

A = [ a11 a12 ··· a1n ]
    [ a21 a22 ··· a2n ]
    [  .   .        .  ]
    [ an1 an2 ··· ann ]

then

         | a11 a12 ··· a1n |
det(A) = | a21 a22 ··· a2n |
         |  .   .        .  |
         | an1 an2 ··· ann |

One slight problem with this notation is that the vertical lines look like absolute values, but determinants can be negative (or even non-real).

Let us compute now a 3 × 3 determinant. While you should memorize the formula for a 2 × 2 determinant, there is no need to memorize any further formulas (assuming you know how Laplace’s expansion works). Expansion by the first row:

| a b c |       | e f |       | d f |       | d e |
| d e f |  =  a | h i |  −  b | g i |  +  c | g h |
| g h i |

To complete the job we need to compute the three 2 × 2 determinants. Here is how one can compute a concrete 4 × 4 determinant. I will expand at first by the second row because it has a 0 term, and that cuts by one the number of 3 × 3 determinants to compute; all the 3 × 3 determinants are then expanded by the first row.

| 1  2  3  2 |          | 2 3  2 |          | 1 3  2 |         | 1  2  2 |
| 2 −3 −4  0 |  =  (−2) | −1 2 −2 |  + (−3) | 1 2 −2 |  +  4   | 1 −1 −2 |
| 1 −1  2 −2 |          | 1 2  2 |          | 1 2  2 |         | 1  1  2 |
| 1  1  2  2 |

  = (−2)[ 2(4 − (−4)) − 3(−2 − (−2)) + 2(−2 − 2) ]
    − 3[ (4 − (−4)) − 3(2 − (−2)) + 2(2 − 2) ]
    + 4[ ((−2) − (−2)) − 2(2 − (−2)) + 2(1 − (−1)) ]
  = (−2)(8) − 3(−4) + 4(−4) = −20.

Laplace's method is useful for calculating the determinants of small matrices (up to 3 × 3, maybe 4 × 4), and for matrices that have a lot of zeros. But it is not a very efficient method. I will list now the basic properties of determinants. Some of these properties will allow us to calculate determinants more efficiently. Maybe. Some of these properties are easy to verify, others are not so easy.

D1. If M is an n × n matrix and its rows are linearly dependent; equivalently, one row is a linear combination of the other rows, then det(M) = 0. In particular this holds if M has a zero row, or two equal rows.

D2. If M is n × n, then det(M^T) = det(M). The determinant of a matrix equals the determinant of its transpose. Because of this, it turns out that one can compute the determinant of a matrix expanding along a column rather than a row.

D3. If M is an n × n matrix and its columns are linearly dependent; equivalently, one column is a linear combination of the other columns, then det(M) = 0. In particular this holds if M has a zero column, or two equal columns. This property is, of course, an immediate consequence of properties D1, D2.

D4. The effect of row operations. Let M be an n × n matrix. If the matrix N is obtained from M by

1. interchanging two rows, that is applying I(i,j), i ≠ j, then det(N) = − det(M);

2. multiplying a row by a scalar c (operation IIc(i)), then det(N) = c det(M);

3. adding to row i the row j multiplied by a scalar (operation III(i)+c(j), i ≠ j), then det(N) = det(M).

D5. If A, B are n × n matrices, then det(AB) = det(BA) = det(A) × det(B).

With these properties we can compute determinants using row reduction. While computing 5 × 5 determinants is still a difficult thing to do without the aid of some calculating device (nice computer software, for example), it is a far better method than using the Laplace expansion. If programmed in some computer language, it is an algorithm that may be hard to beat for finding the determinant of medium sized matrices. To use it at maximum efficiency we need the following additional property, which is actually an easy consequence of the Laplace expansion. First a definition. A square matrix is said to be upper triangular if all entries below the main diagonal are 0. It is said to be lower triangular if all entries above the main diagonal are 0. The following matrix is upper triangular.

U = [ a11 a12 a13 ··· a1(n−1) a1n ]
    [   0 a22 a23 ··· a2(n−1) a2n ]
    [   0   0 a33 ··· a3(n−1) a3n ]
    [   .   .   .          .   .  ]
    [   0   0   0 ···       0 ann ]

If you transpose it, it becomes lower triangular. The property in question is:

D6. The determinant of an upper or a lower triangular matrix equals the product of its diagonal entries.

Examples: Calculate the determinants of the following matrices:
\[
a)\ A = \begin{pmatrix} 1 & -5 & 6 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{pmatrix}, \qquad
b)\ B = \begin{pmatrix} 2 & 0 & 0 & 0 \\ -3 & 5 & 0 & 0 \\ 4 & 4 & -3 & 0 \\ 1 & 2 & 5 & 6 \end{pmatrix}, \qquad
c)\ C = \begin{pmatrix} 1 & -5 & 6 & 7 & 8 \\ 0 & 2 & 3 & 4 & 5 \\ 0 & 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 7 \end{pmatrix}.
\]
Solution. a) det(A) = 1 · 2 · 2 = 4, b) det(B) = 2 · 5 · (−3) · 6 = −180, c) det(C) = 1 · 2 · 0 · 2 · 7 = 0.

The idea now is to row reduce the matrix to upper (or lower; upper is better) triangular form, keeping track of how the row operations affect the determinant. If we interchange rows, we compensate by multiplying the determinant by −1; if we multiply a row by a constant, we compensate by dividing the determinant by that constant; if we do an operation of type III, the determinant stays the same. I’ll illustrate this by computing again the determinant of the 4 × 4 matrix computed before. That is, I’ll compute

\[
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}.
\]

First we work on the first column. We perform the operations III(2)−2(1), III(3)−(1), III(4)−(1) on the matrix; the determinant is unchanged. We get

\[
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
= \begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & -7 & -10 & -4 \\ 0 & -3 & -1 & -4 \\ 0 & -1 & -1 & 0 \end{vmatrix}
\]
We can now exchange rows 2 and 4; this changes the sign of the determinant, so to compensate we multiply the new determinant by −1 and get

\[
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
= -\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & -1 & -1 & 0 \\ 0 & -3 & -1 & -4 \\ 0 & -7 & -10 & -4 \end{vmatrix}
\]
Next I multiply the second row by −1. To compensate I need to divide by −1, which of course is the same as changing the sign:

\[
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
= \begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & -3 & -1 & -4 \\ 0 & -7 & -10 & -4 \end{vmatrix}
\]

Next I perform operations III(3)+3(2), III(4)+7(2). This does not change the determinant.

\[
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
= \begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 2 & -4 \\ 0 & 0 & -3 & -4 \end{vmatrix}
\]
Trying to avoid working with fractions, or just for the fun of it, I’ll multiply the 3rd row by 3 and the 4th row by 2. I need to compensate by dividing the determinant by 2 × 3 = 6:

\[
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
= \frac{1}{6}\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 6 & -12 \\ 0 & 0 & -6 & -8 \end{vmatrix}
\]

Finally, I perform operation III(4)+(3), which does not change the determinant (and puts the matrix in upper triangular form) to get

\[
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
= \frac{1}{6}\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 6 & -12 \\ 0 & 0 & 0 & -20 \end{vmatrix}
= \frac{1}{6}\,(1 \cdot 1 \cdot 6 \cdot (-20)) = -20.
\]
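If you want to let a machine do the bookkeeping, here is a small Python sketch of the row-reduction method (my own illustration, not part of the original notes). To keep it simple it uses only row swaps, each of which flips the sign, and type III operations, which change nothing; at the end it multiplies the diagonal entries, as property D6 allows. The function name det_by_row_reduction is just an illustrative choice.

```python
from fractions import Fraction

def det_by_row_reduction(m):
    """Determinant via reduction to upper triangular form.

    Uses only row swaps (sign changes) and type III operations
    (which leave the determinant unchanged); exact arithmetic via Fraction.
    """
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    sign = 1
    for col in range(n):
        # find a row at or below `col` with a non-zero pivot
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot: the determinant is 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign                # a row swap changes the sign
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= a[i][i]               # product of the diagonal entries (D6)
    return result

M = [[1, 2, 3, 2], [2, -3, -4, 0], [1, -1, 2, -2], [1, 1, 2, 2]]
print(det_by_row_reduction(M))   # -20, as in the computation above
```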

What are determinants good for? Well, here is a first application.

Theorem 7 Let
\[
\mathbf{v}_1 = \begin{pmatrix} v_{11} \\ v_{21} \\ \vdots \\ v_{n1} \end{pmatrix}, \quad \ldots, \quad
\mathbf{v}_m = \begin{pmatrix} v_{1m} \\ v_{2m} \\ \vdots \\ v_{nm} \end{pmatrix}
\]
be m vectors in R^n. They are linearly independent if and only if the matrix whose columns are these vectors, that is, the matrix
\[
M = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1m} \\ v_{21} & v_{22} & \cdots & v_{2m} \\ \vdots & \vdots & & \vdots \\ v_{n1} & v_{n2} & \cdots & v_{nm} \end{pmatrix},
\]
has at least one m × m submatrix with a non-zero determinant. More generally, the dimension of the subspace spanned by v_1, . . . , v_m is k if and only if M contains a k × k submatrix with non-zero determinant, and every (k + 1) × (k + 1) submatrix has 0 determinant.

I’ll try to explain why this result must hold. First of all, by a submatrix of a matrix we mean either the matrix itself, or any matrix obtained by crossing out some rows and/or columns of the original matrix. For example, the 3 × 4 matrix
\[
\begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & \ell \end{pmatrix}
\]
has a total of 18 2 × 2 submatrices, of which a few are

\[
\begin{pmatrix} a & b \\ e & f \end{pmatrix}, \quad
\begin{pmatrix} a & c \\ e & g \end{pmatrix}, \quad
\begin{pmatrix} b & d \\ j & \ell \end{pmatrix}, \quad
\begin{pmatrix} g & h \\ k & \ell \end{pmatrix}.
\]
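As a quick check of the count 18 = C(3,2) · C(4,2) = 3 · 6 (this snippet is my addition, not part of the notes), one can enumerate the 2 × 2 submatrices by choosing 2 of the 3 rows and 2 of the 4 columns:

```python
from itertools import combinations

M = [["a", "b", "c", "d"],
     ["e", "f", "g", "h"],
     ["i", "j", "k", "l"]]

subs = [[[M[r][c] for c in cols] for r in rows]
        for rows in combinations(range(3), 2)
        for cols in combinations(range(4), 2)]

print(len(subs))   # 18
print(subs[0])     # [['a', 'b'], ['e', 'f']], the first submatrix displayed above
```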

Before I try to explain why this theorem is true (an explanation you may skip if you trust everything I tell you), I want to consider some cases. Suppose m > n. Then there is no way we can find an m × m submatrix with a non-zero determinant, for the simple reason that you can’t find an m × m submatrix when you have fewer than m rows. This simply reflects the fact that the dimension of R^n is n, and if m > n, a set of m vectors cannot be independent. So we may restrict considerations to the case m ≤ n. Suppose m = n. Then the theorem tells us that the n vectors are linearly independent, hence a basis of R^n, if and only if the determinant of M (in this case the one and only n × n submatrix) is different from 0.

Here is a reason for the theorem. To simplify, I replace the matrix M by its transpose N; N = M^T, so N is m × n and the vectors are the rows of N. Because the determinant of a matrix and its transpose are the same, nothing much changes. Now suppose I row reduce N. A bit of reflection shows that in doing this, every row operation on N acts as a row operation on the submatrices as well; the submatrices are being row reduced along with N. A row operation can change a determinant, but only by multiplication by a non-zero scalar. So, if we reduce N to RRE form, any submatrix of the original matrix N with a non-zero determinant ends up as a submatrix of the RRE form of N with a non-zero determinant. And conversely, zero determinants stay zero determinants. As we learned before, the dimension of the subspace spanned by the vectors v_1, . . . , v_m is the number of non-zero rows of the RRE form. A bit of reflection shows that if this number is k, then the largest non-zero determinant you can get in the RRE form is the k × k determinant obtained by crossing out all zero rows, and all columns not containing a leading 1.
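To see Theorem 7 in action numerically, here is a sketch of mine (not part of the notes) that compares the rank of a matrix with the size of the largest square submatrix having a non-zero determinant; the helper name largest_nonzero_minor is purely illustrative.

```python
import numpy as np
from itertools import combinations

def largest_nonzero_minor(M, tol=1e-9):
    """Largest k such that some k x k submatrix of M has non-zero determinant."""
    n_rows, n_cols = M.shape
    best = 0
    for k in range(1, min(n_rows, n_cols) + 1):
        for rows in combinations(range(n_rows), k):
            for cols in combinations(range(n_cols), k):
                if abs(np.linalg.det(M[np.ix_(rows, cols)])) > tol:
                    best = k
                    break
            else:
                continue
            break
    return best

# three vectors in R^4; the third column is the sum of the first two,
# so the spanned subspace has dimension 2
M = np.array([[1.0, 0.0, 1.0],
              [2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
print(np.linalg.matrix_rank(M), largest_nonzero_minor(M))   # 2 2
```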

A very important application is the following.

Theorem 8 Let A be a square n × n matrix. Then A is invertible if and only if det(A) ≠ 0.

I will try to at least give a reason why this theorem holds. Warning: Some of my arguments could be circular! But circles are, after all, consistent. Let us suppose first that A is a square invertible matrix. If we want to solve any system of linear equations having A as its matrix, a system of the form

\[
Ax = b,
\]
there is a theoretically simple way of doing it. I say “theoretically simple,” because it isn’t the best method to actually use in practice. The method is to multiply both sides by the inverse, on the left. Well, let’s be precise. Assume first there is a solution x. Then, because Ax = b, we have
\[
A^{-1}(Ax) = A^{-1}b, \quad (A^{-1}A)x = A^{-1}b, \quad Ix = A^{-1}b, \quad x = A^{-1}b.
\]
In other words, the only possible solution is x = A^{-1}b. Conversely, if we take x = A^{-1}b, then we verify at once that Ax = b. That is: If A is an invertible n × n square matrix, then the system of linear equations Ax = b has a unique solution for every b ∈ R^n. Fine. Now suppose A is a square n × n matrix such that the equation Ax = b has a unique solution for every b ∈ R^n. With e_1, . . . , e_n being the canonical basis of R^n, we solve the equations
\[
A\mathbf{x}^{(1)} = \mathbf{e}_1, \quad A\mathbf{x}^{(2)} = \mathbf{e}_2, \quad \ldots, \quad A\mathbf{x}^{(n)} = \mathbf{e}_n.
\]

If these solutions are
\[
\mathbf{x}^{(1)} = \begin{pmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{pmatrix}, \quad \ldots, \quad
\mathbf{x}^{(n)} = \begin{pmatrix} x_{1n} \\ x_{2n} \\ \vdots \\ x_{nn} \end{pmatrix}
\]
and we use them as columns of a matrix X, that is,
\[
X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nn} \end{pmatrix},
\]
then we see at once that AX = I, showing A is invertible. So we also have: If A is a square n × n matrix such that for every b ∈ R^n the system of equations Ax = b has a solution, then A is invertible, and the solution is unique. Putting it all together we see that a square n × n matrix is invertible if and only if the systems Ax = b have a unique solution for every choice of b ∈ R^n. Well, if we recall Theorem 1, we see that this is equivalent to the RRE form of the matrix A being the identity matrix. Let’s state this as a theorem.

Theorem 9 A square n × n matrix A is invertible if and only if its RRE form is the n × n identity matrix.

And now we can explain why Theorem 8 must be true. When we row reduce a matrix, the determinant might change sign (if we interchange rows) or get multiplied by a non-zero constant (if we multiply a row by one). But if it starts out different from 0, it ends different from 0, and, of course, vice versa. If A is invertible, its RRE form is I, and det(I) = 1 ≠ 0, thus det(A) ≠ 0. On the other hand, for a square matrix, the only way one can avoid getting I as the RRE form is to run into a zero row or a zero column; in either case, the determinant is 0. Thus the only square matrix in RRE form with non-zero determinant is I. So if det(A) ≠ 0, then because the determinant of its RRE form must also be ≠ 0, the RRE form must be I and A is invertible.

We conclude this section with a new method for inverting matrices. It is probably the best method to use for 2 × 2 matrices, acceptable for 3 × 3 matrices, not so good for higher orders. To explain it, and also to write out in a more precise way the computation of determinants by Laplace expansion, it is convenient to develop some additional jargon.
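But first, a quick numerical sanity check of Theorems 8 and 9 (this snippet is my addition, not part of the original notes): a square matrix with non-zero determinant has the identity as its RRE form, while one with zero determinant does not. I use sympy here because its rref() method is a convenient stand-in for our row reduction.

```python
from sympy import Matrix, eye

A = Matrix([[1, 2, 2],
            [2, 3, 1],
            [1, 0, 1]])   # the matrix inverted later in this section
B = Matrix([[1, 2, 3],
            [2, 4, 6],    # second row = 2 * first row, so det = 0
            [0, 1, 1]])

for M in (A, B):
    d = M.det()
    rre, _ = M.rref()               # reduced row echelon (RRE) form
    print(d, d != 0, rre == eye(3))
# A: -5  True  True   (det non-zero, RRE form is the identity, invertible)
# B:  0  False False  (det zero, RRE form is not the identity, not invertible)
```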

Let A = (a_{ij})_{1 ≤ i,j ≤ n} be a square n × n matrix (n ≥ 2). If (i, j) is a position in the matrix, I will denote by µ_{ij}(A) the (n − 1) × (n − 1) matrix obtained from A by eliminating the i-th row and the j-th column. An n × n matrix has n² such submatrices. For example, if

\[
A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix},
\]
then these nine matrices are
\[
\mu_{11}(A) = \begin{pmatrix} e & f \\ h & i \end{pmatrix}, \quad
\mu_{12}(A) = \begin{pmatrix} d & f \\ g & i \end{pmatrix}, \quad
\mu_{13}(A) = \begin{pmatrix} d & e \\ g & h \end{pmatrix}, \quad
\mu_{21}(A) = \begin{pmatrix} b & c \\ h & i \end{pmatrix}, \quad
\mu_{22}(A) = \begin{pmatrix} a & c \\ g & i \end{pmatrix},
\]
\[
\mu_{23}(A) = \begin{pmatrix} a & b \\ g & h \end{pmatrix}, \quad
\mu_{31}(A) = \begin{pmatrix} b & c \\ e & f \end{pmatrix}, \quad
\mu_{32}(A) = \begin{pmatrix} a & c \\ d & f \end{pmatrix}, \quad
\mu_{33}(A) = \begin{pmatrix} a & b \\ d & e \end{pmatrix}.
\]

The (i, j)-th minor of the matrix A is defined to be the determinant of µ_{ij}(A). The (i, j)-th cofactor is the same as the minor if (i, j) is a positive position, minus the minor otherwise. I will denote the (i, j)-th cofactor of A by c_{ij}(A); a convenient way of writing it is
\[
c_{ij}(A) = (-1)^{i+j} \det(\mu_{ij}(A)).
\]
The factor (−1)^{i+j} is 1 precisely if (i, j) is a positive position; −1 otherwise.

The adjunct matrix A† is defined as the transpose of the matrix of cofactors:

\[
A^\dagger = \bigl(c_{ij}(A)\bigr)^T.
\]

This matrix is interesting because of the following result:

Theorem 10 Let A be a square matrix. Then

AA† = A†A = (det(A))I.

In particular, if det A = 0 then AA† is the zero matrix; not a very interesting observation. What is more interesting is that if det A ≠ 0, we can divide by the determinant of A and get
\[
A\left(\frac{1}{\det A}\,A^\dagger\right) = \left(\frac{1}{\det A}\,A^\dagger\right)A = I.
\]
This provides an alternative reason for why Theorem 8 is true and shows:

Theorem 11 If det(A) ≠ 0, then
\[
A^{-1} = \frac{1}{\det A}\,A^\dagger.
\]
Let us use this method to try to find again the inverse of the matrix

 1 2 2  A =  2 3 1  1 0 1

First, we compute the determinant. No point in doing anything else if the determinant is 0. Expanding by the last row

\[
\det(A) = \begin{vmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{vmatrix}
= \begin{vmatrix} 2 & 2 \\ 3 & 1 \end{vmatrix} + \begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix}
= (2 - 6) + (3 - 4) = -5 \neq 0.
\]

The matrix is invertible. Next we compute the 9 cofactors.

\[
c_{11}(A) = +\begin{vmatrix} 3 & 1 \\ 0 & 1 \end{vmatrix} = 3, \qquad
c_{12}(A) = -\begin{vmatrix} 2 & 1 \\ 1 & 1 \end{vmatrix} = -1, \qquad
c_{13}(A) = +\begin{vmatrix} 2 & 3 \\ 1 & 0 \end{vmatrix} = -3,
\]

\[
c_{21}(A) = -\begin{vmatrix} 2 & 2 \\ 0 & 1 \end{vmatrix} = -2, \qquad
c_{22}(A) = +\begin{vmatrix} 1 & 2 \\ 1 & 1 \end{vmatrix} = -1, \qquad
c_{23}(A) = -\begin{vmatrix} 1 & 2 \\ 1 & 0 \end{vmatrix} = 2,
\]

\[
c_{31}(A) = +\begin{vmatrix} 2 & 2 \\ 3 & 1 \end{vmatrix} = -4, \qquad
c_{32}(A) = -\begin{vmatrix} 1 & 2 \\ 2 & 1 \end{vmatrix} = 3, \qquad
c_{33}(A) = +\begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix} = -1.
\]
The cofactor matrix is
\[
\begin{pmatrix} 3 & -1 & -3 \\ -2 & -1 & 2 \\ -4 & 3 & -1 \end{pmatrix}.
\]
Transposing we get the adjunct:
\[
A^\dagger = \begin{pmatrix} 3 & -2 & -4 \\ -1 & -1 & 3 \\ -3 & 2 & -1 \end{pmatrix}.
\]
The inverse is
\[
A^{-1} = -\frac{1}{5}\begin{pmatrix} 3 & -2 & -4 \\ -1 & -1 & 3 \\ -3 & 2 & -1 \end{pmatrix}
= \begin{pmatrix} -\tfrac{3}{5} & \tfrac{2}{5} & \tfrac{4}{5} \\ \tfrac{1}{5} & \tfrac{1}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & -\tfrac{2}{5} & \tfrac{1}{5} \end{pmatrix}.
\]
Notes. Definitions of determinants vary. Of course, the end product is always the same! At an elementary level, a common approach is to define the determinant of a square matrix recursively as the scalar you obtain when expanding by Laplace along the first row. Using our notation for submatrices this definition would look somewhat like this. Let A = (a_{ij})_{1 ≤ i,j ≤ n} be a square n × n matrix, say n ≥ 2 (one can start, as we did, from n = 1, but most texts will begin with n = 2).

1.) If n = 2, then det(A) = a_{11}a_{22} − a_{12}a_{21}. (Recursion base.)

2.) If n ≥ 3, define
\[
\det(A) = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det(\mu_{1j}(A)),
\]
where (as above) µ_{1j}(A) is the matrix obtained from A by crossing out the first row and the j-th column. (Reduction of the n case to the n − 1 case.)
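As an aside, here is a small Python sketch of the adjunct-matrix inversion just carried out (my own illustration, not part of the original notes; the helper names det2 and minor are arbitrary). It builds the cofactors c_ij(A) = (−1)^{i+j} det(µ_ij(A)), transposes them, and divides by det(A).

```python
from fractions import Fraction

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def minor(a, i, j):
    """Delete row i and column j (0-based) from the square matrix a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

A = [[1, 2, 2],
     [2, 3, 1],
     [1, 0, 1]]

# cofactors; for a 3 x 3 matrix every minor is a 2 x 2 determinant
C = [[(-1) ** (i + j) * det2(minor(A, i, j)) for j in range(3)] for i in range(3)]
adj = [[C[j][i] for j in range(3)] for i in range(3)]   # adjunct = transpose of C
detA = sum(A[0][j] * C[0][j] for j in range(3))         # Laplace expansion by row 1

print(C)     # [[3, -1, -3], [-2, -1, 2], [-4, 3, -1]]
print(detA)  # -5
for row in adj:
    print([str(Fraction(x, detA)) for x in row])
# ['-3/5', '2/5', '4/5']
# ['1/5', '1/5', '-3/5']
# ['3/5', '-2/5', '1/5']
```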

Returning to the recursive definition above: the problem with it as a definition, rather than a theorem, is that it is quite hard to use it to verify any properties of the determinant. In particular, it isn’t easy to show, beginning with this definition, that you can actually expand using any row, or even any column, not just the first row. I won’t go into this any further, except to write out now the Laplace expansion formulas for the determinant, since so far I only gave them in words. It holds, for every i, 1 ≤ i ≤ n, that

\[
\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(\mu_{ij}(A)).
\]
This is the expansion by rows. A similar result holds for columns. For each j such that 1 ≤ j ≤ n,

\[
\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(\mu_{ij}(A)).
\]
Laplace vs. Row Reduction. To compute a 2 × 2 determinant one has to perform 3 operations: 2 products and a difference. Using Laplace, computing a 3 × 3 determinant reduces to computing three 2 × 2 determinants; a 4 × 4 determinant “reduces” to computing four 3 × 3 determinants, thus 4 × 3 = 12 two by two determinants. And so forth. For an n × n matrix, there will be approximately n! = n(n − 1) · · · 3 · 2 operations involved in computing a determinant by Laplace’s method. This is computationally very, very bad. It’s OK for small matrices, but as the matrices get larger, it becomes a very bad method. Of course, if the matrix is what is called a sparse matrix, that is, has a lot of zero entries, then it could be a reasonable method.

Consider now row reducing the matrix to triangular form. To get the first column to have all entries under the first one equal to 0, you need to pivot (perhaps; i.e., get a non-zero entry into position (1, 1)), let’s call this one operation though it hardly takes time on a computer; divide each entry of the first row by the entry in position (1, 1) (n operations); and then, for each row below the first, multiply the first row by the first entry of the row in question and subtract the result from that row (n operations per row). That is, we have a maximum of 1 + n + (n − 1)n = n² + 1 operations. We have to repeat similar operations for the other columns; to simplify, let’s say we go all the way to RRE form, so that we have n² + 1 operations per column. That is a total of n(n² + 1) < (n + 1)³ operations. Now, for small values of n, there is little difference. For example, if n = 4, then 4! = 24 while (4 + 1)³ = 125. Row reduction seems worse (it actually isn’t). But suppose n = 10. Then

\[
10! = 3{,}628{,}800, \qquad 11^3 = 1331.
\]

Row reduction is a polynomial time algorithm, while Laplace is super-exponential.

Cramer’s rule. The expression of the inverse of a matrix in terms of the adjunct matrix gives rise to a very popular method for solving linear systems of equations in which the system matrix is square (as many equations as unknowns), called Cramer’s rule (so named for Gabriel Cramer, an 18th century Swiss mathematician who supposedly first stated it). In words it states that the solution
\[
\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
\]
of the n × n system Ax = b can be found as follows. The component x_j equals the determinant of the matrix obtained from A by replacing the j-th column by b, divided by the determinant of A. The advantage of this method, which only works if det A ≠ 0, is that you can compute one component of the solution without having to compute the others. The disadvantage is that a computer might need more time to find one component by this method than to find all of them by row reduction.
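Here is a small Python sketch of Cramer’s rule (my own illustration, not part of the original notes), applied to a 3 × 3 system built from the matrix inverted above; the function name cramer_solve is just an illustrative choice.

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule (requires det(A) != 0)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    if abs(d) < 1e-12:
        raise ValueError("Cramer's rule needs a non-zero determinant")
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b                  # replace the j-th column by b
        x[j] = np.linalg.det(Aj) / d
    return x

A = [[1, 2, 2],
     [2, 3, 1],
     [1, 0, 1]]                       # det(A) = -5
b = [1, 2, 3]
print(cramer_solve(A, b))             # approximately [ 2.6 -1.2  0.4]
print(np.linalg.solve(A, b))          # the same answer, obtained by LU/row reduction
```

Note that np.linalg.solve goes through an LU factorization, essentially row reduction, which is the faster route when you need the whole solution vector rather than a single component.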

8.1 Exercises

To come.