6. Scalar and vector triple product

6.1. The scalar triple product.

Definition 1.37 (Scalar triple product). Let a, b, c ∈ R3. The scalar triple product of the vectors a, b, and c (in that order) is the scalar (a × b) · c ∈ R.

Remark 1.38. We usually omit the brackets, since a × b · c can only mean (a × b) · c, because a × (b · c) is the cross product of a vector and a scalar, which is not defined.

The scalar triple product satisfies the properties
(i) provided we keep the vectors a, b, and c in the same order, we can exchange the cross product and the dot product, thus
(1.13) a × b · c = a · b × c,
(ii) the scalar triple product is unaltered provided we keep a, b, and c in the same cyclic order, namely
(1.14) a × b · c = b × c · a = c × a · b.
These properties follow by writing out the left and right hand sides; for instance, for the first identity

a × b · c = c1(a2b3 − a3b2) + c2(a3b1 − a1b3) + c3(a1b2 − a2b1)

a · b × c = a1(b2c3 − b3c2) + a2(b3c1 − b1c3) + a3(b1c2 − b2c1) and it is easy to check that these two quantities agree. For the second, we have

a × b · c = c1(a2b3 − a3b2) + c2(a3b1 − a1b3) + c3(a1b2 − a2b1)

b × c · a = a1(b2c3 − b3c2) + a2(b3c1 − b1c3) + a3(b1c2 − b2c1) and again we can check that these two quantities agree.
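These identities are easy to confirm numerically. The following Python sketch (not part of the notes; the vectors are arbitrary samples) implements the component formulas above and checks (1.13) and (1.14):

```python
# Unofficial numerical check of (1.13) and (1.14), using the
# component formulas for the cross and dot products.

def cross(a, b):
    # a x b = (a2 b3 - a3 b2, a3 b1 - a1 b3, a1 b2 - a2 b1)
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

a, b, c = (1, 2, -1), (3, 1, 0), (2, 2, 5)   # sample vectors

# (1.13): a x b . c = a . b x c
assert dot(cross(a, b), c) == dot(a, cross(b, c))

# (1.14): a x b . c = b x c . a = c x a . b
assert dot(cross(a, b), c) == dot(cross(b, c), a) == dot(cross(c, a), b)
```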

6.2. Geometric Interpretation of scalar triple product. The scalar triple product has a nice geometric interpretation.

Consider three vectors u, v, w ∈ R3. We can form a parallelepiped (a squashed rectangular box) by taking these vectors as adjacent edges.

The volume of a parallelepiped is given by the area of the base, times the perpendicular height. If we take the base to be the parallelogram with adjacent edges u and v, this parallelogram has area = |u × v|.

Now n = u × v is a vector perpendicular to the base, so the perpendicular height is given by the length of the projection of w on n. Thus the perpendicular height is given by

    |w · n| / |n| = |w · u × v| / |u × v|.

Combining these observations, we see that the volume is given by

    |u × v| · |w · u × v| / |u × v| = |w · u × v|.

In other words, we have

the volume of the parallelepiped with sides u, v, and w is given by |w · u × v|.

Remark 1.39. Had we chosen a different base, for example the parallelogram with adjacent edges u and w, we would have found the volume to be |u × w · v|. But from property (1.14), this is the same as |w · u × v|.

Example 1.40. Find the volume of the parallelepiped with adjacent edges OA, OB, OC where a = (1, 2, −1), b = (3, 1, 0), c = (2, 2, 5).

Solution:
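The notes leave space here for a worked solution. As an unofficial check, the computation can be done in Python using the formula volume = |a × b · c|:

```python
# Unofficial check for Example 1.40: volume = |a x b . c|.

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

a, b, c = (1, 2, -1), (3, 1, 0), (2, 2, 5)

volume = abs(dot(cross(a, b), c))
print(volume)  # 29
```

Here a × b = (1, −3, −5), so a × b · c = 2 − 6 − 25 = −29 and the volume is 29.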

6.3. Vector triple product. Given vectors a, b, c ∈ R3, the vector products (a × b) × c and a × (b × c) are called vector triple products. We have seen on an exercise that in general (a × b) × c ≠ a × (b × c). So the order in which we compute the cross products matters.

We now examine these vector triple products more closely. Let ℓ be the plane through a and b. Then n = a × b is perpendicular to this plane, and n × c = (a × b) × c is perpendicular to n again, so lies back in the plane ℓ. But any vector in ℓ can be written as a linear combination of a and b.

Therefore we expect (a × b) × c = sa + tb, for some scalars s and t.

Similarly, a × (b × c) lies in the plane containing b and c, and so we expect a × (b × c) = ub + vc, for some scalars u and v.

In fact, these vector triple products are given by

(1.15) (a × b) × c = (a · c)b − (b · c)a and

(1.16) a × (b × c) = (a · c)b − (a · b)c.

The identities (1.15) and (1.16) follow from a direct computation. For instance, we have (a × b) × c

= (a2b3 − a3b2, a3b1 − a1b3, a1b2 − a2b1) × (c1, c2, c3)
= ( (a3b1 − a1b3)c3 − (a1b2 − a2b1)c2, (a1b2 − a2b1)c1 − (a2b3 − a3b2)c3, (a2b3 − a3b2)c2 − (a3b1 − a1b3)c1 )
= ( (a · c)b1 − (b · c)a1, (a · c)b2 − (b · c)a2, (a · c)b3 − (b · c)a3 )
= (a · c)b − (b · c)a.

The second identity (1.16) follows from a similar computation.

Remark 1.41. The identities (1.15) and (1.16) are easy enough to learn if you remember that
(i) the vectors that appear in the expressions on the right hand side are the two vectors that are in brackets on the left hand side,
(ii) the middle vector on the left has a plus sign on the right, the other a minus sign,
(iii) the scalar coefficient of each vector on the right is the dot product of the remaining two vectors.
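The identities (1.15) and (1.16) can also be spot-checked numerically. A small Python sketch (not part of the notes; the vectors are arbitrary samples):

```python
# Unofficial numerical check of the identities (1.15) and (1.16).

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def scale(s, v):
    return tuple(s*x for x in v)

def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))

a, b, c = (1, 2, -1), (3, 1, 0), (2, 2, 5)   # sample vectors

# (1.15): (a x b) x c = (a.c)b - (b.c)a
assert cross(cross(a, b), c) == sub(scale(dot(a, c), b), scale(dot(b, c), a))

# (1.16): a x (b x c) = (a.c)b - (a.b)c
assert cross(a, cross(b, c)) == sub(scale(dot(a, c), b), scale(dot(a, b), c))
```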

Example 1.42. Find (a × b) × c, where a = (0, 1, 0), b = (0, 0, 1), and c = (2, 2, 3). Solution:

Example 1.43. Given |a| = 2 and a · b = −3, find (a × b) × a. Solution:

Example 1.44. Simplify ((a × b) × a) × b. Solution:

Summary of Chapter 1. Vectors

Sections 1-3
Background material: dot product, projections, lines (parametric and coordinate forms), planes.

Section 4

The cross product a × b = (a2b3 − a3b2, a3b1 − a1b3, a1b2 − a2b1) is ⊥ to a and b. The standard properties contained in Lemma 1.23: given vectors u, v, w ∈ R3 and a scalar a ∈ R
(1) u · (u × v) = 0 = v · (u × v),
(2) u × v = −v × u,
(3) (au) × v = u × (av) = a(u × v),
(4) (u + v) × w = u × w + v × w and u × (v + w) = u × v + u × w,
(5) u × u = v × v = 0.

Behaviour of the standard unit vectors: e1 × e2 = e3, e2 × e3 = e1, e3 × e1 = e2 [cyclic order]. Geometrical interpretation of the cross product: length: |a × b| = |a||b| sin θ; direction: perpendicular to a and b, given by the right hand rule. Also a × b = 0 ⇐⇒ a = tb for some t ∈ R ⇐⇒ either a and b are parallel, or one of them is trivial.

Section 5
Area of triangle ∆ABC = (1/2)|(b − a) × (c − a)|.
Area of parallelogram ABCD = |(b − a) × (c − a)|.
Distance of a point Q from the line p + tv is given by |(q − p) × v| / |v|.
The plane through three points A, B, and C has normal n = (b − a) × (c − a) and equation x · n = a · n.
Two lines p + ta and q + sb are parallel ⇐⇒ a × b = 0.
If a × b ≠ 0, then the perpendicular distance between the lines p + ta and q + sb is |(q − p) · (a × b)| / |a × b|.
Two lines p + ta and q + sb are called skew if a × b ≠ 0 and they do not intersect.

Section 6
The scalar triple product of a, b, c ∈ R3 is the number (a × b) · c ∈ R, written a × b · c. The scalar triple product satisfies the properties (1.13) and (1.14): a × b · c = a · b × c and a × b · c = b × c · a = c × a · b.

The volume of the parallelepiped with sides a, b, c ∈ R3 is given by |a × b · c|. Given a, b, c ∈ R3 we define the vector triple products to be the vectors (a × b) × c and a × (b × c). (Note that in general (a × b) × c ≠ a × (b × c).) The vector triple product satisfies the identities (1.15) and (1.16): (a × b) × c = (a · c)b − (b · c)a and a × (b × c) = (a · c)b − (a · b)c.


CHAPTER 2

MATRICES, LINEAR EQUATIONS AND

1. Matrices and matrix operations

1.1. Matrices. A matrix is a device for storing data. For example, if 4 teams A, B, C and D are playing in a rugby competition, then the number of tries, penalties and conversions each team scores in their first game could be recorded in the table below on the left.

        T  P  C
    A   2  3  1                [ 2 3 1 ]
    B   3  0  2          R =   [ 3 0 2 ]
    C   4  1  3                [ 4 1 3 ]
    D   0  3  0                [ 0 3 0 ]

However, once the format of the table is known, it is really only the array R of numbers on the right above that is important. The array R is called a matrix.

Definition 2.1 (Matrices). A matrix is a rectangular array of numbers. The numbers in the array are called the entries in the matrix.

Example 2.2. Some examples of matrices are

    [ 2 3 1 ]     [  1 2 ]                    [ √2  π   e ]
    [ 3 0 2 ]     [  3 0 ]     [ 2 −1 0 ]     [  3 1/2  0 ]     [ 3 ]
    [ 4 1 3 ]     [ −1 2 ]                    [  0  0   0 ]     [ 0 ]
    [ 0 3 0 ]

The size of a matrix is given by the number of rows (horizontal lines) and columns (vertical lines) it contains.

The first matrix in Example 2.2 has 4 rows and 3 columns, so its size is 4 by 3, denoted 4 × 3. The number of rows is always given first, followed by the number of columns. The other matrices in Example 2.2 are 3 × 2, 1 × 3, 3 × 3 and 2 × 1 matrices respectively.

We use capital letters A, B, C, ... for matrices, and small letters for their entries.

Often we will write

    A = [ a11 a12 a13 ]
        [ a21 a22 a23 ]

for a general 2 × 3 matrix, or

    A = [ a11 a12 . . . a1n ]
        [ a21 a22 . . . a2n ]
        [  ⋮            ⋮  ]
        [ am1 am2 . . . amn ]

for a general m × n matrix. In these matrices, the term aij denotes the (i, j) entry, that is, the entry in row i and column j of the matrix.

Notation: For compactness, we write A = [aij] when A is an m × n matrix with entries aij. Occasionally, we also use [A]ij to mean the (i, j) entry of the matrix A.

Example 2.3. If

    A = [ −1  8 2 ]
        [  3 −7 0 ],

then a12 = 8, a22 = −7 and a23 = 0.

Example 2.4. Let A = [aij] be a 2 × 3 matrix, where aij = i + j. Write down the matrix A and find

    Σ_{j=1}^{3} a1j,    Σ_{i=1}^{2} aii,    Σ_{j=1}^{3} j a2j.

Solution: We have

    A = [ 1+1 1+2 1+3 ]  =  [ 2 3 4 ]
        [ 2+1 2+2 2+3 ]     [ 3 4 5 ],

and so

    Σ_{j=1}^{3} a1j = a11 + a12 + a13 = sum of the entries in row 1 = 2 + 3 + 4 = 9

and

    Σ_{i=1}^{2} aii = a11 + a22 = 2 + 4 = 6

and

    Σ_{j=1}^{3} j a2j = 1 × a21 + 2 × a22 + 3 × a23 = 1 × 3 + 2 × 4 + 3 × 5 = 26.
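The sums in Example 2.4 can be checked in a few lines of Python (an unofficial sketch; note that Python indexes from 0, so the (i, j) entry aij is stored as A[i-1][j-1]):

```python
# Unofficial check of Example 2.4: build A with a_ij = i + j
# and evaluate the three sums.

A = [[i + j for j in range(1, 4)] for i in range(1, 3)]
assert A == [[2, 3, 4], [3, 4, 5]]

row1_sum = sum(A[0][j] for j in range(3))             # sum over j of a_1j
diag_sum = A[0][0] + A[1][1]                          # a_11 + a_22
weighted = sum((j + 1) * A[1][j] for j in range(3))   # sum over j of j * a_2j

assert (row1_sum, diag_sum, weighted) == (9, 6, 26)
```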

Matrices of certain sizes have special names.

Definition 2.5 (Row vector/column vector/square matrix/zero matrix). (i) A 1 × n matrix is often called a row matrix or row vector.

(ii) An m × 1 matrix is often called a column matrix or column vector.

(iii) An n × n matrix is said to be a square matrix.

(iv) The zero matrix of size m × n is the matrix with all entries 0 and is denoted by A = 0.

Two matrices of the same size with the same entries are regarded as equal.

Definition 2.6 (Equality for matrices). Two matrices A = [aij] and B = [bij] are equal if they have the same size (both m × n say) and their corresponding entries are equal. Thus aij = bij for all values of i and j.

Example 2.7. (1) A = [ 2 −3 5 ] is a 1 × 3 row matrix or row vector.

(2) Let

    B = [ 1 ]          C = [ 1 0 ]
        [ 2 ]    and       [ 2 0 ].

Then B is a 2 × 1 column matrix or column vector, and C is a 2 × 2 square matrix. Note that C ≠ B because C and B have different sizes.

(3) We have

    [  1  x ]     [  1  2 ]
    [ −3 −2 ]  =  [ −y −2 ]

precisely when x = 2 and y = 3.

1.2. Addition and subtraction. Initially, in Subsection 1.1, we considered a matrix R, whose columns gave the number of tries, penalties and conversions respectively for teams A, B, C and D in their first games. Suppose we have a similar matrix S giving the same data for the teams’ second games. Then to find the total number of tries, penalties and conversions scored by the teams in their first two games, obviously we must add the corresponding entries in the two arrays. This is how matrix addition is defined.

Definition 2.8 (Matrix addition). Let A = [aij] and B = [bij] be m × n matrices. The matrix sum of A and B is defined to be the m × n matrix

A + B = [aij + bij]. That is, A + B is the m × n matrix formed by adding the corresponding entries in A and B, so the (i, j) entry in A + B is aij + bij. Similarly, the difference of A and B is the m × n matrix

A − B = [aij − bij], so the (i, j) entry in A − B is the difference aij − bij.

It is important to note that two matrices A and B can only be added when they have the same size! In other words the matrix sum is only defined when A and B have the same number of columns, and the same number of rows.

Example 2.9. Let

    A = [ 2 −1 3 ]      B = [ 0  1 2 ]      C = [ 2 ]
        [ 0  5 2 ],         [ 2 −1 4 ],         [ 4 ].

Find (where defined) A + B, A − B, A + C, B − C.

Solution: We have

    A + B = [ 2+0  −1+1  3+2 ]  =  [ 2 0 5 ]
            [ 0+2   5−1  2+4 ]     [ 2 4 6 ],

and

    A − B = [ 2−0  −1−1  3−2 ]  =  [  2 −2  1 ]
            [ 0−2   5+1  2−4 ]     [ −2  6 −2 ].

Neither A + C nor B − C is defined, since the size of C is not the same as that of A or B.
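Matrix addition and the same-size restriction translate directly into code. In the sketch below (`mat_add` is a hypothetical helper, not notation from the notes), adding matrices of different sizes is rejected, mirroring the rule above:

```python
# Unofficial sketch of Example 2.9 with matrices as lists of rows.

def mat_add(A, B):
    # A + B is only defined when A and B have the same size.
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("matrices must have the same size")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[2, -1, 3], [0, 5, 2]]
B = [[0, 1, 2], [2, -1, 4]]
C = [[2], [4]]

assert mat_add(A, B) == [[2, 0, 5], [2, 4, 6]]

try:
    mat_add(A, C)          # sizes 2 x 3 and 2 x 1 differ
except ValueError:
    pass                   # A + C is not defined
```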

1.3. Scalar multiplication. Just as with vectors, we can always multiply (or scale) a matrix by a scalar λ ∈ R.

Definition 2.10 (Scalar multiplication). If A = [aij] is an m × n matrix and λ ∈ R is a scalar (i.e. a number), then we define

λA = [λaij]. That is, λA is the m × n matrix where each entry of A is multiplied by λ; thus the (i, j) entry in λA is λaij. Note, we usually write (−1)A as −A.

Example 2.11. For the matrices in Example 2.9, find (−1)A, 0B, (1/2)C, and 2A − 3B.

Solution: We have

    (−1)A = −A = [ −2  1 −3 ]       0B = [ 0 0 0 ] = 0,       (1/2)C = [ 1 ]
                 [  0 −5 −2 ],           [ 0 0 0 ]                     [ 2 ],

and

    2A − 3B = [ 4 −2 6 ]  −  [ 0  3  6 ]  =  [  4 −5  0 ]
              [ 0 10 4 ]     [ 6 −3 12 ]     [ −6 13 −8 ].

The rules for manipulating m × n matrices under the operations of addition and scalar multiplication are similar to those for numbers.

Lemma 2.12 (Properties of matrix addition and scalar multiplication). For all m × n matrices A, B, C, and 0 (the zero matrix), and for all scalars λ, µ ∈ R we have
(i) A + (B + C) = (A + B) + C [Associative law]
(ii) A + B = B + A [Commutative law]
(iii) (λ + µ)A = λA + µA [Scalar laws]
(iv) λ(A + B) = λA + λB
(v) λ(µA) = (λµ)A
(vi) A + 0 = A = 0 + A [Zero laws]
(vii) A − A = 0.

Proof. These are all quite obvious, because in performing any matrix operation we are just applying the same operation to the numbers in each of the (i, j) positions. For example, to show (i) is true, consider the (i, j) entry in A + (B + C), which is aij + (bij + cij). However aij + (bij + cij) = (aij + bij) + cij, since these are just numbers, for which the associative law of addition holds. But (aij + bij) + cij is the (i, j) entry in (A + B) + C. So A + (B + C) = (A + B) + C, since they have the same (i, j) entry for any i and j. □

Using these rules we can now do the same sort of standard algebra with matrices as we do with numbers.

Example 2.13. If A, B and X are m × n matrices with 3A + 2X − B = 5A − X, find X in terms of the matrices A and B.

Solution: Adding B − 3A to both sides of this equation (and using the rules), we get 2X = 2A + B − X. Now adding X to both sides, we get 3X = 2A + B, so X = (1/3)(2A + B).

1.4. Matrix multiplication. We add matrices by adding the corresponding entries, so you might expect to multiply matrices by multiplying the corresponding entries. But it turns out that this is not a useful operation. To understand the correct definition of matrix multiplication, we introduce the idea by considering again the matrix R in Subsection 1.1 representing the rugby competition. Let P be the column matrix giving the number of points for a try, penalty and conversion respectively.

        T  P  C
    A   2  3  1                  T  [ 5 ]
    B   3  0  2           P =    P  [ 3 ]
    C   4  1  3                  C  [ 2 ]
    D   0  3  0

Now the number of points team A scores = 2 × 5 + 3 × 3 + 1 × 2. This is how the matrix product of a row matrix by a column matrix is defined – as the sum of the products of corresponding pairs:

    AP = [ 2 3 1 ] [ 5 ]
                   [ 3 ]  =  [ 2 × 5 + 3 × 3 + 1 × 2 ]  =  [ 21 ].
                   [ 2 ]

To multiply R by P we multiply each row of R in turn by the column P to form a new column matrix.

The previous example suggests the following definition.

Definition 2.14 (Matrix multiplication). Let A = [aij] be an m × n matrix, and let B = [bij] be an n × p matrix. The product C = AB is the m × p matrix with (i, j) entry cij given by

    cij = ai1b1j + ai2b2j + · · · + ainbnj = Σ_{k=1}^{n} aikbkj.

Thus the entries cij of the matrix C are found by multiplying together the corresponding entries from row i of A and column j of B and then adding up the resulting products. Another way of thinking about this is that cij may be regarded as the dot product of row i of A, namely (ai1, ai2, . . . , ain), with column j of B, namely (b1j, b2j, . . . , bnj).

It is important to emphasise that the product AB of matrices A and B can only be formed if the number of columns in A equals the number of rows in B.

A convenient way to check whether the product C = AB is defined, and if so to find its size, is to write down the size of A first and then the size of B:

    A B = C
    (m × n)(n × p) = (m × p)

If the two inside numbers agree, then the product AB is well-defined. The two outside numbers give the size of C.
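Definition 2.14 translates directly into code. The sketch below (with a hypothetical helper `mat_mul`, not notation from the notes) computes cij as the sum over k of aik bkj, and reproduces the rugby product of R with the points column P from above:

```python
# Unofficial sketch of Definition 2.14: c_ij = sum over k of a_ik * b_kj.

def mat_mul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

# The rugby example: each row of R times the points column P.
R = [[2, 3, 1], [3, 0, 2], [4, 1, 3], [0, 3, 0]]
P = [[5], [3], [2]]

print(mat_mul(R, P))   # team A's entry is 2*5 + 3*3 + 1*2 = 21
```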

Example 2.15. If A is a 2 × 3 matrix, B a 3 × 1 and C a 3 × 2, find the sizes of AB, BA, AC and CA (where defined).

Solution: AB is a 2 × 1 matrix but BA is not defined. AC is a 2 × 2 matrix and CA is a 3 × 3 matrix. Notice this means AC 6= CA.

1 0  2 0 −1 Example 2.16. If A = and B = 2 1 , find AB and BA. 1 2 3   3 −1

Solution: The matrix C = AB is a 2 × 2 matrix:

    AB = [ 2 0 −1 ] [ 1  0 ]     [ −1  1 ]
         [ 1 2  3 ] [ 2  1 ]  =  [ 14 −1 ]  = C.
                    [ 3 −1 ]

The (1, 1) and (1, 2) entries of C are found by multiplying the first row of A by the first and second columns of B respectively. That is

c11 = 2 × 1 + 0 × 2 + (−1) × 3 = −1 and c12 = 2 × 0 + 0 × 1 + (−1) × (−1) = 1. Similarly, the (2, 1) and (2, 2) entries of C are found by multiplying the second row of A by the first and second columns of B respectively. That is

c21 = 1 × 1 + 2 × 2 + 3 × 3 = 14 and c22 = 1 × 0 + 2 × 1 + 3 × (−1) = −1. On the other hand, BA is a 3 × 3 matrix found by a similar calculation:

    BA = [ 1  0 ] [ 2 0 −1 ]     [ 2  0 −1 ]
         [ 2  1 ] [ 1 2  3 ]  =  [ 5  2  1 ]
         [ 3 −1 ]                [ 5 −2 −6 ].

Again notice AB ≠ BA.

Example 2.17. If

    A = [ 2 3 ]      and      B = [ −3  9 ]
        [ 4 6 ]                   [  2 −6 ],

find AB and BA.

Solution: A computation gives

    AB = [ 2 3 ] [ −3  9 ]  =  [ 0 0 ]  = 0,
         [ 4 6 ] [  2 −6 ]     [ 0 0 ]

so AB = 0 (the zero matrix) although A ≠ 0 and B ≠ 0, whereas

    BA = [ −3  9 ] [ 2 3 ]  =  [  30  45 ]  ≠ 0.
         [  2 −6 ] [ 4 6 ]     [ −20 −30 ]
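The Example 2.17 computation can be repeated in code (an unofficial check, reusing a hypothetical `mat_mul` helper): AB = 0 even though A, B ≠ 0, while BA ≠ 0.

```python
# Unofficial check of Example 2.17: AB = 0 with A != 0 and B != 0.

def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 3], [4, 6]]
B = [[-3, 9], [2, -6]]

assert mat_mul(A, B) == [[0, 0], [0, 0]]        # AB = 0
assert mat_mul(B, A) == [[30, 45], [-20, -30]]  # yet BA != 0
```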

Warning: The previous examples show that matrix multiplication differs from multiplication of numbers in certain ways.

(1) For real numbers a and b, we always have ab = ba (called the commutative law for multiplication). However for matrices A and B, AB need not be equal to BA (and we say matrix multiplication is not commutative). Equality can fail to hold for 3 reasons:
(a) AB may be defined whereas BA is not (Example 2.15).
(b) AB and BA may both be defined but of different sizes (Example 2.16).
(c) AB and BA may both be defined and of the same size but AB ≠ BA (Example 2.17).

(2) For real numbers a and b, if ab = 0 then either a = 0 or b = 0. However the matrix equation AB = 0 (the zero matrix of the appropriate size) does not necessarily imply A = 0 or B = 0 (Example 2.17).

Despite the fact that in general AB ≠ BA, a number of other familiar properties still hold.

Lemma 2.18 (Properties of matrix multiplication). For all matrices A, B, C for which the products are defined and for all scalars λ ∈ R we have
(2.1) A(BC) = (AB)C, [Associative Law]
(2.2) A(B + C) = AB + AC, [Distributive Laws]
(2.3) (A + B)C = AC + BC,
(2.4) (λA)B = λ(AB) = A(λB). [Scalar Law]

Proof. (Sketch.) To give some idea why these properties hold, we establish (2.2). The associative law (2.1) is a bit more complicated, so is omitted (but still follows from a similar computation). We assume A = [aij] is an m × n matrix and B = [bij], C = [cij] are both n × p matrices. We also denote the (i, j) element in any matrix X by [X]ij. We check that the (i, j) element in A(B + C) is the same as the (i, j) element in AB + AC:

    [A(B + C)]ij = Σ_{k=1}^{n} [A]ik [B + C]kj
                 = Σ_{k=1}^{n} aik(bkj + ckj)
                 = Σ_{k=1}^{n} (aikbkj + aikckj)               [using the distributive law for numbers]
                 = Σ_{k=1}^{n} aikbkj + Σ_{k=1}^{n} aikckj     [rule for sums]

= [AB]ij + [AC]ij. 

Remark 2.19. Although both matrix addition and multiplication were defined for pairs of matrices, the associative laws Lemma 2.12 and Lemma 2.18 enable us to denote sums and products of three matrices as A + B + C and ABC without inserting any brackets. For no matter how brackets are inserted, the associative laws guarantee the same end result will be obtained.

Remark 2.20. Because of the warning above, some of the familiar rules of factorization for numbers are no longer valid for matrices. For example, if a and b are numbers then (a + b)² = a² + 2ab + b². However if A and B are square matrices of the same size, then

    (A + B)² = (A + B)(A + B)
             = A(A + B) + B(A + B)     [using the distributive laws (2.2) and (2.3)]
             = A² + AB + BA + B²
             ≠ A² + 2AB + B² in general.     [because typically AB ≠ BA]
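A concrete check of this remark, with sample 2 × 2 matrices of my own choosing (any non-commuting pair works) and hypothetical helpers `mat_mul`, `mat_add`, `scale`:

```python
# Unofficial check: (A+B)^2 = A^2 + AB + BA + B^2, but not A^2 + 2AB + B^2.

def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(s, M):
    return [[s * x for x in row] for row in M]

A = [[1, 1], [0, 1]]   # sample non-commuting pair
B = [[1, 0], [1, 1]]

S = mat_add(A, B)
lhs = mat_mul(S, S)                                       # (A + B)^2
rhs_true = mat_add(mat_add(mat_mul(A, A),
                           mat_add(mat_mul(A, B), mat_mul(B, A))),
                   mat_mul(B, B))                         # A^2 + AB + BA + B^2
rhs_naive = mat_add(mat_add(mat_mul(A, A),
                            scale(2, mat_mul(A, B))),
                    mat_mul(B, B))                        # A^2 + 2AB + B^2

assert lhs == rhs_true
assert lhs != rhs_naive     # fails here, since AB != BA
```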

1.5. Transpose. The transpose of an m × n matrix A, denoted AT, is the n × m matrix obtained from A by interchanging its rows and columns. That is, the first column of AT is the first row of A, the second column of AT is the second row of A, etc. Thus the (i, j) element in AT is the (j, i) element of A.

Example 2.21. Find the transposes of

    A = [ 1 2 3 ]     and     B = [  2 3 −1 ]
                                  [ −1 2  0 ].

Solution: By definition, we have

    AT = [ 1 ]                BT = [  2 −1 ]
         [ 2 ]     and             [  3  2 ]
         [ 3 ],                    [ −1  0 ].

The transpose operation satisfies the following properties.

Lemma 2.22 (Properties of the transpose). For all matrices A and B (for which the operations are defined) and for all scalars λ ∈ R:
(2.5) (AT)T = A,
(2.6) (λA)T = λAT,
(2.7) (A + B)T = AT + BT,
(2.8) (AB)T = BTAT. (Note how the order is reversed!)

Proof. The properties (2.5), (2.6), and (2.7) are pretty obvious. For example, (2.7) says that adding and then interchanging rows and columns yields the same result as first interchanging rows and columns and then adding. We show (2.8): Suppose A is an m × n matrix and B is an n × p matrix. We check that the (i, j) element in (AB)T is the same as the (i, j) element in BTAT:

    [(AB)T]ij = [AB]ji = Σ_{k=1}^{n} ajkbki
              = Σ_{k=1}^{n} bkiajk = Σ_{k=1}^{n} [BT]ik [AT]kj
              = [BTAT]ij. □
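The reversal law (2.8) can be spot-checked in code, here with the matrices of Example 2.16 (an unofficial sketch with hypothetical helpers `transpose` and `mat_mul`):

```python
# Unofficial check of (2.8): (AB)^T = B^T A^T.

def transpose(M):
    # zip(*M) pairs up the columns of M, giving the rows of M^T.
    return [list(row) for row in zip(*M)]

def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 0, -1], [1, 2, 3]]      # 2 x 3, from Example 2.16
B = [[1, 0], [2, 1], [3, -1]]    # 3 x 2

# Note the reversed order on the right hand side.
assert transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))
```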


2. Systems of linear equations 2.1. Linear equations. We now put matrices to work solving equations of a special type.

Definition 2.23 (Linear equations). A linear equation in the n variables (unknowns) x1, x2, . . . , xn is one which can be expressed in the form

a1x1 + a2x2 + ··· + anxn = b, where a1, a2, . . . , an and b are constants.

When n = 2, the equation a1x1 + a2x2 = b is that of a line. Hence the word linear. Of course when n = 3, a1x1 + a2x2 + a3x3 = b represents a plane (as we saw in Chapter 1).

If we only have a few variables, we will frequently avoid subscripts and call the variables x, y, z, . . . instead of x1, x2, x3,... etc. For example, the following are linear equations:

x + 3y = 7, 2x1 − 3x2 + 4x3 = 6, x1 − 2x2 − 3x3 + x4 = 7, x + 3y + 2z + w = 1. A finite number of linear equations which we solve simultaneously is called a system of linear equations. A sequence of numbers s1, s2, . . . , sn is called a solution of the system if x1 = s1, x2 = s2, . . . , xn = sn is a solution of every equation in the system. For example, x1 = 1, x2 = 2, x3 = −1 is a solution of the system

4x1 − x2 + 3x3 = −1

3x1 + x2 + 9x3 = −4 since these values satisfy both equations, whereas x1 = 1, x2 = 8, x3 = 1 is not a solution since these values satisfy the first equation but not the second. The set of all possible solutions s1, s2, . . . sn to the system is called the solution set.

Because of the way matrix multiplication was defined in Subsection 1.4, we can write any system of equations as a single matrix equation.

Example 2.24. The system of 3 equations in 3 unknowns

    x + 2y + 3z = 4
    3x + y + 2z = 0
    2x + 3y + z = 11

can be written as a matrix equation of the form AX = B, where A is the 3 × 3 matrix of coefficients, X the 3 × 1 column vector of unknowns and B the 3 × 1 column vector of constants; thus

    A = [ 1 2 3 ]      X = [ x ]      B = [  4 ]
        [ 3 1 2 ],         [ y ],         [  0 ]
        [ 2 3 1 ]          [ z ]          [ 11 ].
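In code, checking a proposed solution of AX = B is a single matrix product. The sketch below uses Example 2.24; the candidate column X was found by elimination and is only an illustration (with a hypothetical `mat_mul` helper):

```python
# Unofficial check: multiplying A by a candidate column X reproduces
# the left-hand sides of the system, so verifying a proposed solution
# of AX = B is one matrix product.

def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3], [3, 1, 2], [2, 3, 1]]
B = [[4], [0], [11]]
X = [[-0.5], [4.5], [-1.5]]   # candidate solution, found by elimination

assert mat_mul(A, X) == B     # X satisfies all three equations
```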

This example can be generalised in an obvious way: Lemma 2.25 (Linear systems). The system of m linear equations in n unknowns

a11x1 + a12x2 + ··· + a1nxn = b1

a21x1 + a22x2 + ··· + a2nxn = b2 . . . .

am1x1 + am2x2 + · · · + amnxn = bm can be written as the matrix equation AX = B, where A = the m × n matrix of coefficients, X = the n × 1 column vector of unknowns, B = the m × 1 column vector of constants.

Proof. This is simply an application of the definition of matrix multiplication. □

Thus the problem of solving a system of linear equations reduces to solving the single matrix equation AX = B, where A and B are known and X is to be found.

2.2. Solving systems of equations. To solve a system of equations using the standard method of eliminating variables, we allow ourselves 3 natural operations:
Type I. Multiply any equation by a nonzero constant.
Type II. Add a multiple of one equation to another.
Type III. Interchange two equations.

Example 2.26. To solve the pair of simultaneous equations S

    x + 2y = 13     (1)
    2x + 7y = 41    (2)

we could eliminate x from (2), by subtracting twice Equation (1) from (2) (a type II operation), to obtain

    x + 2y = 13     (1)
    3y = 15         (3) = (2) − 2 × (1)

Now we can solve for y by multiplying Equation (3) by 1/3 (a type I operation), and solve for x by eliminating y from Equation (1) (by subtracting two thirds of Equation (3) – a type II operation):

    x = 3    (4) = (1) − (2/3)(3)
    y = 5    (5) = (1/3)(3).

This final pair of equations S′ represents the solution set. Here we didn’t use a type III operation, but it is useful in larger systems.

The idea used above was to replace the first system of equations S by another system S′, which is easier (immediate even) to solve. Note however the solution to each system is the same. Geometrically we have:

[Figure: two coordinate planes. On the left, the lines x + 2y = 13 and 2x + 7y = 41 of S intersect at (3, 5); on the right, the lines x = 3 and y = 5 of S′ intersect at the same point (3, 5).]

Definition 2.27 (Equivalent systems). Suppose S is a system of m equations in n unknowns. A system S′ of equations is equivalent to the system S, if S′ can be obtained from S by a finite number of operations of type I, II or III (above).

Just as in our example above, the fundamental result about equivalent systems is the following.

Theorem 2.28 (Solution sets for equivalent systems). If S and S′ are equivalent systems of equations, then S and S′ have the same solution set. (Any solution of system S is a solution of S′ and vice versa.)

Proof. (Sketch.) It is immediate that operations I and III don’t affect the solution set. A little thought shows that operation II doesn’t either. We indicate why below (but you may wish to skip this explanation on a first read through). If x1 = s1, x2 = s2, . . . , xn = sn is a solution to

    a1x1 + a2x2 + · · · + anxn = b        (1)
and
    a′1x1 + a′2x2 + · · · + a′nxn = b′,   (2)

then it satisfies

    (a1 + λa′1)x1 + (a2 + λa′2)x2 + · · · + (an + λa′n)xn = b + λb′,    (3) = (1) + λ(2)

where λ is any scalar. Further, the reverse is also true: any solution of (1) and (3) = (1) + λ(2) will satisfy (2). Therefore S and S′ have the same solution set. □

To solve a system of equations, rather than messing with the equations themselves, we can introduce matrices as in Lemma 2.25. This makes the working quicker and neater, particularly when we have more than two unknowns.

Example 2.29. If we write the system of equations of Example 2.26 in matrix form, then the first pair of equations (1) and (2) can be represented by the matrix equation

    AX = [ 1 2 ] [ x ]  =  [ 13 ]
         [ 2 7 ] [ y ]     [ 41 ],

the second pair (1) and (3) by the matrix equation

    [ 1 2 ] [ x ]  =  [ 13 ]
    [ 0 3 ] [ y ]     [ 15 ],

and the third pair (4) and (5) by

    CX = [ 1 0 ] [ x ]  =  [ 3 ]  = D.
         [ 0 1 ] [ y ]     [ 5 ]

Notice that at each step it is only the coefficient matrix and the right-hand side matrix which change. For this reason we just work with these two matrices joined together. That is, we start with the matrix

    [A|B] = [ 1 2 | 13 ]
            [ 2 7 | 41 ]

and do certain row operations on the matrix [A|B] until we finish up with the matrix

    [C|D] = [ 1 0 | 3 ]
            [ 0 1 | 5 ].

Definition 2.30 (Augmented matrix). The augmented matrix of the system of m equations in n un- knowns (expressed in matrix form) AX = B is the m × (n + 1) matrix [A|B] formed by adjoining the column B to the coefficient matrix A.

Thus the set of equations

    a11x1 + a12x2 + · · · + a1nxn = b1
    a21x1 + a22x2 + · · · + a2nxn = b2
    ⋮
    am1x1 + am2x2 + · · · + amnxn = bm

has augmented matrix

    [ a11 a12 . . . a1n | b1 ]
    [ a21 a22 . . . a2n | b2 ]
    [  ⋮                  ⋮ ]
    [ am1 am2 . . . amn | bm ].

The operations of type I, II, III performed on the system of equations then translate to the following elementary row operations on the augmented matrix:

Type I. Multiply a row by a non-zero constant. Type II. Add a multiple of one row to another. Type III. Interchange two rows.

Notation. (1) We say a matrix B is equivalent to a matrix A, and write A ∼ B, if B is obtained from A by a finite number of row operations. (Then of course B ∼ A also, since we can easily reverse each of the operations to get from B back to A.)
(2) In performing row operations it is important to record how we get from one matrix to the next. At each step we do this by specifying, at the end of each row in the new matrix, how it was obtained from the rows of the previous matrix. We use the Greek letter ρ (rho), denote the first row of the previous matrix by ρ1, the second row by ρ2, etc., and describe in terms of these how each row in the new matrix is formed. (Usually we will only label those rows that change from the previous matrix.)

Example 2.31. Let us apply row operations to the augmented matrix of the system S of Example 2.26, to reduce it to the augmented matrix of the equivalent system S′, and keep track of the operations performed. The augmented matrix of S is

    [ 1 2 | 13 ]
    [ 2 7 | 41 ].

We see

    [ 1 2 | 13 ]     [ 1 2 | 13 ]  ρ1
    [ 2 7 | 41 ]  ∼  [ 0 3 | 15 ]  −2ρ1 + ρ2

                  ∼  [ 1 2 | 13 ]  ρ1
                     [ 0 1 |  5 ]  (1/3)ρ2

                  ∼  [ 1 0 |  3 ]  −2ρ2 + ρ1
                     [ 0 1 |  5 ]  ρ2,

so the augmented matrix of S is transformed into

    [ 1 0 | 3 ]
    [ 0 1 | 5 ],

the augmented matrix of S′.

As in Example 2.31 above, we always aim to finish up with a simple matrix of the following type:

Definition 2.32 (Row echelon form). A matrix is in reduced row echelon form (or is a reduced row echelon matrix) if it satisfies the following properties:
(i) all the zero rows (if any) are at the bottom of the matrix,
(ii) the first nonzero entry in every nonzero row is a 1 (called the leading entry of its row),
(iii) the leading entries move progressively to the right going down the matrix,
(iv) the leading entry in a row is the only nonzero entry in that column.
A matrix which satisfies the properties (i), (ii) and (iii) is said to be a row echelon matrix (the word “echelon” means “step-like”).

For example, the matrices

1 0 0 2 1 2 0 0 3 0 1 3 0 A = 0 1 0 3 ,B = 0 0 1 0 3 ,C = 0 0 0 1 0 0 1 5 0 0 0 1 0 0 0 0 0

are each in reduced row echelon form. However, the matrices

    A = [ 1 3 1 0 ]      B = [ 1 0 3 1 ]      C = [ 0 1 0 0 ]      D = [ 1 1 0 ]
        [ 0 0 0 0 ],         [ 0 2 1 0 ],         [ 1 0 3 0 ],         [ 0 1 3 ]
        [ 0 0 0 1 ]          [ 0 0 1 5 ]          [ 0 0 1 0 ]          [ 0 0 0 ]

are not reduced row echelon matrices because they fail to satisfy properties (i), (ii), (iii), (iv) respectively. The matrices

    E = [ 1 −1  3 2 ]      F = [ 1 2 −1 −2 3 ]
        [ 0  1 −5 3 ],         [ 0 0  1 −6 2 ]
        [ 0  0  1 5 ]          [ 0 0  0  1 0 ]

are in row echelon (but not reduced row echelon) form.

2.3. Gauss reduction. Our aim now in solving a system of equations is to reduce the augmented matrix of the original system to a (reduced) row echelon matrix, from which it is easy to read off the solution. We can do this because:

Theorem 2.33. Every matrix is row equivalent to a unique matrix in reduced row echelon form.

The (reduced) row echelon matrix is produced by applying a sequence of row operations to the original matrix. The step-by-step algorithm for doing this is called Gauss reduction (or Gauss elimination) and is illustrated on the matrix below. Although we shall not prove it, it is important to note that no matter how you perform the reduction procedure, you always finish up with the same matrix in reduced row echelon form. That is, the reduced row echelon form of a matrix is unique. (Note that if you just aim for row echelon form, the final matrix is no longer unique. That is, starting from the same matrix, different people may get different row echelon forms.)

Example 2.34. Perform Gauss reduction on the matrix

    A = [ 0 2 4 6 ]
        [ 3 3 3 3 ]
        [ 2 5 7 9 ].

Step 1. Locate the leftmost column that does not consist entirely of zeros. If necessary, interchange the top row with another row to bring a nonzero entry, call it a, to the top of this column. When required, this involves an elementary row operation of type III. In our example, the first column has nonzero entries and we must interchange rows 1 and 2 to bring a nonzero entry to the top of column 1:

    A ∼ [ 3 3 3 3 ]  ρ2
        [ 0 2 4 6 ]  ρ1
        [ 2 5 7 9 ]  ρ3

Step 2. If a ≠ 1, multiply the first row by 1/a in order to make the first nonzero entry in row one equal to 1 (and so introduce a leading 1, as it is called). This involves an elementary row operation of type I. In our example a = 3, so we multiply row 1 by 1/3:

      ∼ [ 1 1 1 1 ]  (1/3)ρ1
        [ 0 2 4 6 ]  ρ2
        [ 2 5 7 9 ]  ρ3

Step 3. Add suitable multiples of the first row to the rows below, so that all entries in the column below the leading 1 become zeros. These are elementary row operations of type II. They leave the leading 1 as the only nonzero entry in its column. In our example we add −2 times row 1 to row 3:

      ∼ [ 1 1 1 1 ]  ρ1
        [ 0 2 4 6 ]  ρ2
        [ 0 3 5 7 ]  −2ρ1 + ρ3

Step 4. Now ignore row 1 (cover it up) and, with the (smaller) matrix that is left, begin again with steps 1 and 2 above. Then carry out step 3 on the entire matrix (with row 1 back in the picture), using multiples of row 2 so that the second leading 1 is the only nonzero entry in its column. In our example, we cover up row 1 and consider column 2 in step 1. Now there is no need for a row swap, since the second row already starts with a nonzero entry a = 2. We then carry out step 2 by multiplying row 2 by 1/2:

      ∼ [ 1 1 1 1 ]  ρ1
        [ 0 1 2 3 ]  (1/2)ρ2
        [ 0 3 5 7 ]  ρ3

Now carry out step 3 on the entire matrix. We add −1 times row 2 to row 1 and −3 times row 2 to row 3, to clear the entries above and below the leading 1 in row 2:

      [ 1 0 −1 −2 ]  −ρ2 + ρ1
    ∼ [ 0 1  2  3 ]  ρ2
      [ 0 0 −1 −2 ]  −3ρ2 + ρ3

Step 5: Repeat the previous step, covering up each row in turn, until either there are no rows of the matrix left or all the remaining rows consist entirely of zeros. Then the matrix we have constructed is a reduced row echelon matrix. In our example we cover up the first two rows of the matrix and then multiply row 3 by −1.

      [ 1 0 −1 −2 ]  ρ1
    ∼ [ 0 1  2  3 ]  ρ2
      [ 0 0  1  2 ]  −ρ3

Finally, clear the entries above the leading 1 in row 3 by carrying out the row operations indicated below:

      [ 1 0 0  0 ]  ρ3 + ρ1
    ∼ [ 0 1 0 −1 ]  −2ρ3 + ρ2
      [ 0 0 1  2 ]  ρ3

which is the unique reduced row echelon matrix derived from A.
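The five steps above can be sketched in code. This is an illustrative implementation (not taken from the notes), using exact rational arithmetic via Fraction so that no rounding creeps in:

```python
# An illustrative sketch of the Gauss reduction algorithm described above,
# using exact rational arithmetic via Fraction.
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form of `matrix` (a list of rows)."""
    A = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        # Step 1: find a nonzero entry in this column (type III swap if needed).
        pivot = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]
        # Step 2: scale the row to introduce a leading 1 (type I operation).
        a = A[pivot_row][col]
        A[pivot_row] = [v / a for v in A[pivot_row]]
        # Steps 3-5: clear all other entries in the column (type II operations).
        for r in range(rows):
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return A

A = [[0, 2, 4, 6], [3, 3, 3, 3], [2, 5, 7, 9]]
print([[int(v) for v in row] for row in rref(A)])
# → [[1, 0, 0, 0], [0, 1, 0, -1], [0, 0, 1, 2]]
```

Running this on the matrix of Example 2.34 reproduces the unique reduced row echelon form obtained by hand.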

Summary: Procedure for solving a system of linear equations.
1. Obtain the augmented matrix of the system.
2. Use Gauss reduction to obtain the equivalent reduced row echelon matrix.
3. Read off the solution from the reduced row echelon matrix.
4. Check by substitution that the solution works.

Example 2.35. Solve the system of equations

    2y + 4z = 6
    3x + 3y + 3z = 3
    2x + 5y + 7z = 9

Solution:
1. The augmented matrix of the system is

        [ 0 2 4 6 ]
    A = [ 3 3 3 3 ].
        [ 2 5 7 9 ]

2. From Example 2.34,

        [ 1 0 0  0 ]
    A ∼ [ 0 1 0 −1 ],
        [ 0 0 1  2 ]

a reduced row echelon matrix.
3. The system of equations represented by the reduced row echelon matrix above says x = 0, y = −1, z = 2.
4. We can easily check this solution works in each equation.

Note: In the Gauss reduction process, we may choose at step 4 (and subsequent steps) to clear only the entries in the column that are below the leading 1. In this case we finish up with a row echelon matrix. The system of equations that such a matrix represents is then in triangular form and may be solved by the method of back substitution described below.

Example 2.36. Solve the system of equations in Example 2.35 by reducing the augmented matrix of the system to row echelon form, and then using back substitution.

Solution. It is easily checked that if we just clear the entries below the leading 1s, the augmented matrix reduces to the row echelon matrix

    [ 1 1 1 1 ]
    [ 0 1 2 3 ].
    [ 0 0 1 2 ]

The equations represented by this matrix are

    x + y + z = 1    (1)
        y + 2z = 3   (2)
             z = 2   (3)
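A triangular system like this is exactly what back substitution handles: solve the last equation first, then substitute upward. A sketch (illustrative, assuming leading 1s and a unique solution):

```python
# Back substitution for a triangular system with leading 1s, given as an
# augmented matrix (an illustrative sketch, assuming a unique solution exists).
def back_substitute(aug):
    n = len(aug)
    x = [0] * n
    for i in range(n - 1, -1, -1):
        # Equation i reads: x_i + (known later terms) = right-hand side.
        x[i] = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
    return x

aug = [[1, 1, 1, 1],
       [0, 1, 2, 3],
       [0, 0, 1, 2]]
print(back_substitute(aug))  # → [0, -1, 2]
```

The output matches the solution x = 0, y = −1, z = 2 found in Example 2.35.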

We now use back substitution: From (3), z = 2; so from (2), y + 4 = 3, giving y = −1. Then from (1), x + (−1) + 2 = 1, so x = 0. We thus obtain the same solution as before.

Example 2.37. A person is ordered by his doctor to take 33 units of vitamin A, 50 units of vitamin B and 63 units of vitamin C each day. The person can choose from three brands of vitamin pills. Brand X contains 3 units of A, 2 units of B and 4 units of C; Brand Y contains 3 units of A, 4 units of B and 5 units of C; Brand Z contains 3 units of A, 6 units of B and 7 units of C. If the person meets his vitamin requirements exactly by taking x of brand X, y of brand Y and z of brand Z each day, write down equations which x, y and z satisfy, and solve these using Gauss reduction to bring the augmented matrix to reduced row echelon form.

Solution: We easily obtain that x, y and z must satisfy the equations

3x + 3y + 3z = 33 (by considering the vitamin A requirement) 2x + 4y + 6z = 50 (by considering the vitamin B requirement) 4x + 5y + 7z = 63 (by considering the vitamin C requirement)

1. Now the augmented matrix of this system is

        [ 3 3 3 33 ]
    A = [ 2 4 6 50 ].
        [ 4 5 7 63 ]

2. If we apply Gauss reduction to the augmented matrix we obtain

  1   1 1 1 11 3 ρ1 1 1 1 11 ρ1 A ∼ 2 4 6 50 ρ2 ∼ 0 2 4 28 −2ρ1 + ρ2 4 5 7 63 ρ3 0 1 3 19 −4ρ1 + ρ3     1 1 1 11 ρ1 1 0 −1 −3 −ρ2 + ρ1 1 ∼ 0 1 2 14 2 ρ2 ∼ 0 1 2 14  ρ2 0 1 3 19 ρ3 0 0 1 5 −ρ2 + ρ3   1 0 0 2 ρ3 + ρ1 ∼ 0 1 0 4 −2ρ3 + ρ2 0 0 1 5 ρ3

3. We can now read off the solution to the simplified system of equations, represented by the reduced row echelon matrix in step 2, to get x = 2, y = 4, z = 5.
4. Check that this solution works for each equation.
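Step 4 of the summary, checking by substitution, can be automated; here is a quick illustrative sketch for the vitamin equations of Example 2.37:

```python
# Verify x = 2, y = 4, z = 5 by substituting into the three vitamin equations.
coeffs = [[3, 3, 3],   # vitamin A units per pill of brands X, Y, Z
          [2, 4, 6],   # vitamin B
          [4, 5, 7]]   # vitamin C
rhs = [33, 50, 63]
x, y, z = 2, 4, 5
for row, b in zip(coeffs, rhs):
    assert row[0] * x + row[1] * y + row[2] * z == b
print("solution satisfies every equation")
```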

2.4. Inconsistent systems and multiple solutions. Recall that given a pair of simultaneous equations in two variables the corresponding lines may (i) meet at a point (ii) be coincident or (iii) be parallel.

[Figure: three configurations of two lines in the plane — meeting at a point, coincident, and parallel.]

That is, the pair of equations has either exactly one solution, infinitely many solutions or no solution. The same applies to any system of linear equations. We have already looked at the situation where there is a unique solution. We now consider the remaining two cases.

Example 2.38. Find the solutions of the system of equations

    x + y + z = 4      (1)
    x + 2y + 3z = 10   (2)

Solution. Let us take the augmented matrix of this system and bring it to reduced row echelon form using Gauss reduction:

    [ 1 1 1  4 ]   [ 1 1 1 4 ]              [ 1 0 −1 −2 ]  −ρ2 + ρ1
    [ 1 2 3 10 ] ∼ [ 0 1 2 6 ]  −ρ1 + ρ2  ∼ [ 0 1  2  6 ]  ρ2

The equations represented by this simplified matrix are

    x − z = −2
    y + 2z = 6.

So to solve this system we may choose z arbitrarily and solve for x and y in terms of z. To emphasize that z is arbitrary, we usually say let z = λ, for any λ ∈ R. Then x = −2 + λ and y = 6 − 2λ. That is,

    (x, y, z) = (−2 + λ, 6 − 2λ, λ)      (λ ∈ R)
              = (−2, 6, 0) + λ(1, −2, 1) (λ ∈ R).

We recognize this as the equation of a line in R3. This is not surprising, because geometrically (1) and (2) represent planes, and the solution found above is just the line of intersection of the two planes.

[Figure: two planes, plane (1) and plane (2), meeting in their line of intersection.]
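Since the solution set is a whole line, a quick spot check (an illustrative sketch, not from the notes) can confirm that sampled points of the line satisfy both original equations:

```python
# Spot-checking Example 2.38: points (x, y, z) = (-2, 6, 0) + lam*(1, -2, 1)
# should satisfy both original equations for every value of lam.
for lam in (-2, 0, 1, 10):
    x, y, z = -2 + lam, 6 - 2 * lam, lam
    assert x + y + z == 4           # equation (1)
    assert x + 2 * y + 3 * z == 10  # equation (2)
print("all sampled points satisfy both equations")
```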

Notation: In general the variables that correspond to the leading 1s are called the leading variables (x and y in Example 2.38). Any remaining variables (such as z in Example 2.38) are called free variables, because they may be chosen in an arbitrary way. This means that any system with one (or more) free variables has infinitely many solutions. When the solution set is expressed in terms of one or more free variables, it is usually called the general solution of the system.

Example 2.39. Find the general solution of the system of equations in x, y, z and w with augmented matrix

        [ 1 2 1 −1 3 ]
    A = [ 0 0 1 −1 2 ].
        [ 2 4 2 −2 6 ]

Solution: Just as in the previous example, we first find the reduced row echelon form of this matrix.

        [ 1 2 0  0 1 ]  −ρ2 + ρ1                              x + 2y = 1
    A ∼ [ 0 0 1 −1 2 ]  ρ2             which represents       z − w = 2.
        [ 0 0 0  0 0 ]  −2ρ1 + ρ3

Here x and z are the leading variables, so we may solve for these in terms of the free variables y and w. So let y = λ, w = µ (where λ, µ are arbitrary real numbers). Then x = 1 − 2λ and z = 2 + µ, and so for every λ, µ ∈ R we can write the solution as (x, y, z, w) = (1 − 2λ, λ, 2 + µ, µ).
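As a quick check, one can substitute the general solution back into the original equations for sample values of the free parameters; the helper name `solution` below is ours, not the notes':

```python
# Packaging the general solution of Example 2.39 and checking it against the
# three original equations (rows of the augmented matrix A).
def solution(lam, mu):
    # x = 1 - 2*lam, y = lam (free), z = 2 + mu, w = mu (free)
    return (1 - 2 * lam, lam, 2 + mu, mu)

for lam in (-1, 0, 2):
    for mu in (0, 3):
        x, y, z, w = solution(lam, mu)
        assert x + 2 * y + z - w == 3               # row 1 of A
        assert z - w == 2                           # row 2 of A
        assert 2 * x + 4 * y + 2 * z - 2 * w == 6   # row 3 of A
print("general solution satisfies all three equations")
```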

Example 2.40. Suppose the vitamin requirements and the number of units each brand contained in Example 2.37 were different, so that x, y and z satisfy a system of equations with augmented matrix:

        [ 1 1 2 10 ]
    A = [ 0 3 3  9 ].
        [ 1 4 5 19 ]

Find all possible solutions for x, y and z.

Solution. We have

        [ 1 1 2 10 ]              [ 1 0 1 7 ]  −(1/3)ρ2 + ρ1
    A ∼ [ 0 3 3  9 ]  ρ2        ∼ [ 0 1 1 3 ]  (1/3)ρ2
        [ 0 3 3  9 ]  −ρ1 + ρ3    [ 0 0 0 0 ]  −ρ2 + ρ3

(Notice one equation completely disappears here, because the original 3 equations were not independent. That is, the third equation was the sum of the first two, so conveyed no extra information.)

Here we may solve for the leading variables x and y in terms of the free variable z. So let z = λ (with λ ∈ R); then x = 7 − λ and y = 3 − λ. Thus for every λ ∈ R, we have the solution (x, y, z) = (7 − λ, 3 − λ, λ). Now since for this application x, y and z must be non-negative integers (they represent the number of pills to be taken), we need z = λ ≥ 0 and y = 3 − λ ≥ 0, so λ ≤ 3 (and x = 7 − λ ≥ 0, so λ ≤ 7). These conditions must all be satisfied simultaneously, so 0 ≤ λ ≤ 3, i.e. λ = 0, 1, 2 or 3. This gives all possible solutions as

    (x, y, z) = (7, 3, 0), when λ = 0,
              = (6, 2, 1), when λ = 1,
              = (5, 1, 2), when λ = 2,
              = (4, 0, 3), when λ = 3.
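The case analysis on λ is easy to reproduce in code; this illustrative sketch simply enumerates the feasible integer values:

```python
# Enumerating the feasible solutions of Example 2.40: z = lam is the free
# variable, and x = 7 - lam, y = 3 - lam must be non-negative integers.
solutions = []
for lam in range(0, 8):              # x = 7 - lam >= 0 forces lam <= 7
    x, y, z = 7 - lam, 3 - lam, lam
    if y >= 0:                       # y = 3 - lam >= 0 forces lam <= 3
        solutions.append((x, y, z))
print(solutions)  # → [(7, 3, 0), (6, 2, 1), (5, 1, 2), (4, 0, 3)]
```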

Note: Geometrically, the 3 equations in Example 2.40 represent 3 planes that meet in a line. The general solution is in fact just the equation of this line.

[Figure: three planes, plane (1), plane (2) and plane (3), meeting in a common line of intersection.]

So far any system we have considered has had a solution – either one solution or infinitely many solutions. We say such a system is consistent. We next examine a system which has no solution and is thus said to be inconsistent.

Example 2.41. Show that the system of equations in x, y and z with the augmented matrix

        [ 1 2 1 1 ]
    A = [ 2 5 2 2 ]
        [ 3 5 3 4 ]

is inconsistent.

Solution. We have

        [ 1  2 1 1 ]  ρ1            [ 1 0 1 1 ]  −2ρ2 + ρ1
    A ∼ [ 0  1 0 0 ]  −2ρ1 + ρ2   ∼ [ 0 1 0 0 ]  ρ2
        [ 0 −1 0 1 ]  −3ρ1 + ρ3     [ 0 0 0 1 ]  ρ2 + ρ3

Now the last equation represented by the matrix above is 0x + 0y + 0z = 1, which clearly has no solution for any values of x, y or z. Thus the system has no solution, and we therefore say the system is inconsistent.

Example 2.42. Describe geometrically the 3 planes represented by the 3 equations in Example 2.41.
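This test for inconsistency, a row whose entries are all zero except for the last, is easy to mechanize; a sketch (the helper name is ours, not from the notes):

```python
# A row [0, 0, ..., 0, c] with c != 0 in the reduced augmented matrix encodes
# the impossible equation 0 = c, so the system is inconsistent.
def is_inconsistent(aug):
    return any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in aug)

# Reduced form of the augmented matrix from Example 2.41:
print(is_inconsistent([[1, 0, 1, 1],
                       [0, 1, 0, 0],
                       [0, 0, 0, 1]]))  # → True
```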

Solution:

In this example, although each pair of planes meets in a line, there is no point common to all three planes, so there is no solution to the system of equations.

[Figure: three planes, plane (1), plane (2) and plane (3), with no common line of intersection.]

Of course there are other possible geometrical configurations, such as the planes being parallel, which would also result in the system having no solution.
