Algebra, geometry, combinatorics

Dr Mark V Lawson

October 24, 2014

Contents

1 The nature of mathematics
  1.1 What are algebra, geometry and combinatorics?
    1.1.1 Algebra
    1.1.2 Geometry
    1.1.3 Combinatorics
  1.2 The scope of mathematics
  1.3 Pure versus applied mathematics
  1.4 The antiquity of mathematics
  1.5 The modernity of mathematics
  1.6 The legacy of the Greeks
  1.7 The legacy of the Romans
  1.8 What they didn’t tell you in school
  1.9 Further reading and links

2 Proofs
  2.1 How do we know what we think is true is true?
  2.2 Three fundamental assumptions of logic
  2.3 Examples of proofs
    2.3.1 Proof 1
    2.3.2 Proof 2
    2.3.3 Proof 3
    2.3.4 Proof 4
    2.3.5 Proof 5
  2.4 Axioms
  2.5 Mathematics and the real world
  2.6 Proving something false
  2.7 Key points
  2.8 Mathematical creativity
  2.9 Set theory: the language of mathematics
  2.10 Proof by induction

3 High-school algebra revisited
  3.1 The rules of the game
    3.1.1 The axioms
    3.1.2 Indices
    3.1.3 Sigma notation
    3.1.4 Infinite sums
  3.2 Solving quadratic equations
  3.3 Order
  3.4 The real numbers

4 Number theory
  4.1 The remainder theorem
  4.2 Greatest common divisors
  4.3 The fundamental theorem of arithmetic
  4.4 Modular arithmetic
    4.4.1 Congruences
    4.4.2 Wilson’s theorem
  4.5 Continued fractions
    4.5.1 Fractions of fractions
    4.5.2 Rabbits and pentagons

5 Complex numbers
  5.1 Complex number arithmetic
  5.2 The fundamental theorem of algebra
    5.2.1 The remainder theorem
    5.2.2 Roots of polynomials
    5.2.3 The fundamental theorem of algebra
  5.3 Complex number geometry
    5.3.1 sin and cos
    5.3.2 The complex plane
    5.3.3 Arbitrary roots of complex numbers
    5.3.4 Euler’s formula
  5.4 Making sense of complex numbers
  5.5 Radical solutions
    5.5.1 Cubic equations
    5.5.2 Quartic equations
    5.5.3 Symmetries and particles
  5.6 Gaussian integers and factorizing primes

6 Rational functions
  6.1 Numerical partial fractions
  6.2 Analogies
  6.3 Partial fractions
  6.4 Integrating rational functions

7 Matrices I: linear equations
  7.1 Matrix arithmetic
    7.1.1 Basic matrix definitions
    7.1.2 Addition, subtraction, scalar multiplication and the transpose
    7.1.3 Matrix multiplication
    7.1.4 Special matrices
    7.1.5 Linear equations
    7.1.6 Conics and quadrics
    7.1.7 Graphs
  7.2 Matrix algebra
    7.2.1 Properties of matrix addition
    7.2.2 Properties of matrix multiplication
    7.2.3 Properties of scalar multiplication
    7.2.4 Properties of the transpose
    7.2.5 Some proofs
  7.3 Solving systems of linear equations
    7.3.1 Some theory
    7.3.2 Gaussian elimination
  7.4 Blankinship’s algorithm

8 Matrices II: inverses
  8.1 What is an inverse?
  8.2 Determinants
  8.3 When is a matrix invertible?
  8.4 Computing inverses
  8.5 The Cayley-Hamilton theorem
  8.6 Complex numbers via matrices

9 Vectors
  9.1 Vector algebra
    9.1.1 Addition and scalar multiplication of vectors
    9.1.2 Inner, scalar or dot products
    9.1.3 Vector or cross products
    9.1.4 Scalar triple products
  9.2 Vector arithmetic
    9.2.1 i’s, j’s and k’s
  9.3 Geometry with vectors
    9.3.1 Position vectors
    9.3.2 Linear combinations
    9.3.3 Lines
    9.3.4 Planes
    9.3.5 Determinants
  9.4 Summary of vectors
  9.5 *Two vector proofs*
  9.6 ...

Chapter 1

The nature of mathematics

This chapter is a guide to the mathematics described in this book.

1.1 What are algebra, geometry and combinatorics?

1.1.1 Algebra

Algebra started as the study of equations. The simplest kinds of equations are ones like 3x − 1 = 0 where there is only one unknown x and that unknown occurs to the power 1. This means we have x alone and not, say, $x^{1000}$. It is easy to solve this specific equation. Add 1 to both sides to get

$$3x = 1$$

and then divide both sides by 3 to get

$$x = \frac{1}{3}.$$

This is the solution to my original equation and, to make sure, we check our answer by calculating

$$3 \cdot \frac{1}{3} - 1$$

and observing that we really do get 0 as required. Even this simple example raises an important point: to carry out these calculations, I had to know what rules the numbers and symbols obeyed. You probably applied these rules unconsciously, but in this book it will be important to know explicitly what they are. The method used for the specific example above can be applied to any equation of the form

$$ax + b = 0$$

as long as a ≠ 0. Here a, b are specific numbers, probably real numbers, and x is the number I am trying to find. This equation is the most general example of a linear equation in one unknown. If x occurs to the power 2 then we get

$$ax^2 + bx + c = 0$$

where a ≠ 0. This is an example of a quadratic equation in one unknown. You will have learnt a formula to solve such equations. But there is no reason to stop at 2. If x occurs to the power 3 we get a cubic equation in one unknown

$$ax^3 + bx^2 + cx + d = 0$$

where a ≠ 0. Solving such equations is much harder than solving quadratics but there is also an algebraic formula for the roots. But there is no reason to stop at cubics. We could look at equations in which x occurs to the power 4, quartics, and once again there is a formula for finding the roots. The highest power of x that occurs in such an equation is called its degree. These results might lead you to expect that there are always algebraic formulae for finding the roots of any polynomial equation whatever its degree. There aren’t. For equations of degree 5, the quintics, and more, there are no algebraic formulae which enable you to solve the equations. I don’t mean that no formulae have yet been discovered; I mean that someone has proved that such a formula is impossible, that someone being the young French mathematician Évariste Galois (1811–1832), the James Dean of mathematics. Galois’s work meant the end of the view that algebra was about finding formulae to solve equations. We shall not study Galois’s work in this book but it has had a huge impact on algebra. It is one of the reasons why the algebra you study later in your university careers will look very different from the algebra you studied at school. In fact, one of my goals in writing this book is to help you navigate this transition.

I have talked about solving equations where there is one unknown but there is no reason to stop there. We can also study equations where there are any finite number of unknowns and those unknowns occur to any powers. The best place to start is where we have any number of unknowns but each unknown can occur only to the first power and no products of unknowns are allowed. This means we are studying linear equations like x + 2y + 3z = 4. Our goal is to find all the values of x, y and z that satisfy this equation. Thus the solutions are ordered triples (x, y, z). For example, both (0, 2, 0) and (2, 1, 0) are solutions whereas (1, 1, 1) is not a solution.

It is unusual to have just one linear equation to solve. Usually we have two or more such as x + 2y + 3z = 4 and x + y + z = 0. We then need to find all the triples (x, y, z) that satisfy both equations simultaneously. In fact, as you should check, all the triples (λ − 4, 4 − 2λ, λ) where λ is any number satisfy both equations. For this reason, we often speak about simultaneous linear equations. It turns out that solving systems of linear equations never becomes difficult however many unknowns there are. The modern way of studying systems of linear equations uses matrix theory.

That leaves studying equations where there are at least 2 unknowns and where there are no constraints on the powers of the unknowns and the extent to which they may be multiplied together. This is much more complicated. If you only allow squares such as $x^2$ or products of at most two unknowns, such as xy, then there are relatively simple methods for solving them. But, even here, strange things happen. For example, the solutions to $x^2 + y^2 = 1$ can be written (x, y) = (sin θ, cos θ). If you allow cubes or products of more than two unknowns then you enter the world of subjects like algebraic geometry and even connect with current research.
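
As a rough illustration of the check suggested for the linear equations x + 2y + 3z = 4 and x + y + z = 0, the following Python sketch substitutes a handful of values of λ into the triple (λ − 4, 4 − 2λ, λ). The function names are illustrative assumptions, and testing finitely many values is of course evidence, not a proof; the proof is the one-line substitution done by hand.

```python
from fractions import Fraction


def satisfies_first(x, y, z):
    """True when (x, y, z) solves x + 2y + 3z = 4."""
    return x + 2 * y + 3 * z == 4


def satisfies_both(x, y, z):
    """True when (x, y, z) solves x + 2y + 3z = 4 and x + y + z = 0."""
    return satisfies_first(x, y, z) and x + y + z == 0


def family(lam):
    """The triple (lam - 4, 4 - 2*lam, lam) quoted in the text."""
    return (lam - 4, 4 - 2 * lam, lam)


# Exact fractions avoid any floating-point rounding in the comparison.
samples = [Fraction(n, 3) for n in range(-9, 10)]
all_ok = all(satisfies_both(*family(lam)) for lam in samples)
print(all_ok)

# The single-equation examples: two solutions and one non-solution.
print(satisfies_first(0, 2, 0), satisfies_first(2, 1, 0), satisfies_first(1, 1, 1))
```

Using `Fraction` rather than floating-point numbers keeps every comparison exact, which matters when the test is an equality.
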
In this book, I shall introduce you to the theory of polynomial equations and also to the theory of linear equations. I shall also show you how to solve equations that look like this

$$ax^2 + bxy + cy^2 + dx + ey + f = 0.$$

So far, I have been talking about the algebra of numbers. But I shall also introduce you to the algebra of matrices, and the algebra of vectors, and the algebra of subsets of a set, amongst others. In fact, I think the first shock on encountering university mathematics can be summed up in the following statement.

There is not one algebra, but many different algebras, each designed for different purposes.

These different algebras are governed by different sets of rules. For this reason, it becomes crucial in university mathematics to make those rules explicit. In this book, I often call the algebra you studied at school high-school algebra, so that we know what we are talking about.

In my description of solving equations, I have left to one side something that probably seemed obvious: the nature of the solutions. These solutions are of course numbers but what do we mean by ‘numbers’? You might think that a number is a number but in mathematics this concept turns out to be much more interesting than it might first appear. The everyday idea of a number is essentially that of a real number. Informally, these are the numbers that can be expressed as positive or negative decimals, with possibly an infinite number of digits after the decimal point, such as

π = 3.14159265358...

where the dots indicate that this can be continued forever. Whilst such numbers are sufficient to solve linear equations in one unknown, they are not enough to solve quadratics, cubics, quartics etc. These require the introduction of complex numbers which involve such apparent ineffabilities as the square root of minus one. Because such numbers don’t occur in everyday life, there is a temptation to view them as somehow artificial or of purely theoretical interest. This is wrong with a capital w. All numbers are artificial, in that they are artefacts of our imaginations that help us to understand the world. Although you can see examples of two things you cannot see the number two. It is an idea, an abstraction. As for being of only theoretical interest, it is worth noting that quantum mechanics, the theory that explains the behaviour of atoms and their constituents, uses complex numbers in an essential way. In fact, for mathematicians the word ‘number’ usually means ‘complex number’ and mathematics is unthinkable without them.

But this is not the end of our excavations of what we mean by the word ‘number’. There are occasions when we want to restrict the solutions: we might want whole number solutions or solutions as fractions. It turns out that the usual high-school methods for solving equations don’t work in these cases. For example, consider the equation

2x + 4y = 3.

To find the real or complex solutions, we let x be any real or complex value and then we can solve the equation to work out the corresponding value of y. But suppose that we are only interested in whole number solutions? In fact, there are none. You can see why by noting that the left-hand side of the equation is exactly divisible by 2, whereas the right-hand side isn’t. When we are interested in solving equations, of whatever type, by means of whole numbers or fractions we say that we are studying Diophantine equations. The name comes from Diophantus of Alexandria who flourished around 250 CE, and who studied such equations in his book Arithmetica. It is ironic that solving Diophantine equations is often much harder than solving equations using real or complex numbers.
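
The divisibility argument for 2x + 4y = 3 can be tried out in a few lines of Python. The sketch below assumes the standard fact (not stated in this passage) that ax + by = c has whole number solutions exactly when the greatest common divisor of a and b divides c, and compares it with a naive search over a small box; the function names and the search bound are illustrative choices.

```python
from math import gcd


def has_integer_solution(a, b, c):
    """Divisibility test for a*x + b*y = c (assumes a and b are not both zero)."""
    return c % gcd(a, b) == 0


def brute_force(a, b, c, bound=50):
    """Search whole numbers with |x|, |y| <= bound; return a solution or None."""
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            if a * x + b * y == c:
                return (x, y)
    return None


# 2x + 4y = 3: the left-hand side is always even, so nothing is found.
print(has_integer_solution(2, 4, 3), brute_force(2, 4, 3))

# 2x + 4y = 6: now the test passes and the search finds a solution.
print(has_integer_solution(2, 4, 6), brute_force(2, 4, 6))
```

The brute-force search can only ever confirm that a solution exists; it is the divisibility test that explains why none exists for 2x + 4y = 3.
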

1.1.2 Geometry

If algebra is about manipulating symbols, then geometry is about pictures. The Ancient Greeks developed geometry to a very high level. Some of their achievements are recorded in Euclid’s book the Elements which I shall have more to say about later. It developed the whole of what became known as Euclidean geometry on the basis of a few rules known as axioms. This geometry gives every impression of being a faithful mathematical version of the geometry of actual space and for that reason you might expect that, unlike algebra, there is only one geometry and that’s that. In fact, it was discovered in the nineteenth century that there are other mathematical geometries such as spherical geometry and hyperbolic geometry. In the twentieth century, it became apparent that even the space we inhabit was much more complex than it appeared. First came the four dimensional geometry of special relativity and then the curved space-time of general relativity. Modern particle physics suggests that there may be many more dimensions in real space than we can see. So, in fact, we have the following.

There is not one geometry, but many different geometries, each designed for different purposes.

In this book, I will only talk about three-dimensional Euclidean geometry, but this is the gateway to all these other geometries. This, however, is not the end of the story. In fact, any book about algebra must also be about geometry. The two are indivisible but it was not always like that. Unlike geometry which began with a sort of Big Bang in Ancient Greece, algebra crystallized much more slowly over time and in different places. There is even some algebra, disguised, in the Elements. In the 17th century, René Descartes discovered the first connection between algebra and geometry which will be completely familiar to you. For example, $x^2 + y^2 = 1$ is an algebraic equation, but it also describes something geometric: a circle of unit radius centred on the origin. This connection between algebra and geometry will play an important role in our study of linear equations and vectors. But it is just a beginning.

If you are studying an algebra look for an accompanying geometry, and if you are studying a geometry find a companion algebra.

This is quite a fancy way of saying things, but it boils down to the fact that manipulating symbols is often helped by drawing pictures, and sometimes the pictures are too complex so it is helpful to replace them with symbols. It’s not a one-way street. I want to give you some idea of why the connection between algebra and geometry is so significant. Let me start with a problem that looks completely algebraic.

Problem: find all whole numbers a, b, c that satisfy the equation $a^2 + b^2 = c^2$.

I’ll write solutions that satisfy this equation as (a, b, c). Such numbers are called Pythagorean triples. Thus (0, 0, 0) is a solution and so is (3, 4, 5), and I can put in minus signs, since when squared they disappear, so (−3, 4, −5) is a solution. In addition, if (a, b, c) is a solution so is (λa, λb, λc) where λ is any whole number. I shall now show that this problem is equivalent to one in geometry. Suppose first that $a^2 + b^2 = c^2$. We exclude the case where c = 0 since then a = 0 and b = 0. We may therefore divide both sides by $c^2$ and get

$$\left(\frac{a}{c}\right)^2 + \left(\frac{b}{c}\right)^2 = 1.$$

Recall that a rational number is a real number that can be written in the form $\frac{u}{v}$ where u and v are whole numbers and v ≠ 0. It follows that

$$(x, y) = \left(\frac{a}{c}, \frac{b}{c}\right)$$

is a rational point on the unit circle; that is, a point with rational co-ordinates. On the other hand, if

$$(x, y) = \left(\frac{m}{n}, \frac{p}{q}\right)$$

is a rational point on the unit circle then

$$\frac{(mq)^2}{(nq)^2} + \frac{(np)^2}{(nq)^2} = 1.$$

Thus (mq, pn, nq) is a Pythagorean triple. We may therefore interpret our algebraic question as a geometric one: to find all Pythagorean triples, find all those points on the unit circle with centre the origin whose x and y co-ordinates are both rational. In fact, this can be used to get a very nice solution to the original algebraic problem as we shall show later.
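
The two directions of this correspondence can be sketched in Python using exact rational arithmetic. The function names below are illustrative assumptions; the formulas are exactly those in the text, namely (a, b, c) ↦ (a/c, b/c) and (m/n, p/q) ↦ (mq, pn, nq).

```python
from fractions import Fraction


def triple_to_point(a, b, c):
    """Map a Pythagorean triple with c != 0 to the point (a/c, b/c)."""
    return (Fraction(a, c), Fraction(b, c))


def point_to_triple(x, y):
    """Map a rational point (m/n, p/q) on the unit circle to (mq, pn, nq)."""
    m, n = x.numerator, x.denominator
    p, q = y.numerator, y.denominator
    return (m * q, p * n, n * q)


def on_unit_circle(x, y):
    """Exact test of x^2 + y^2 = 1 for rational x and y."""
    return x * x + y * y == 1


# From a triple to a rational point on the circle.
x, y = triple_to_point(3, 4, 5)
print(x, y, on_unit_circle(x, y))

# From a rational point on the circle back to a triple.
a, b, c = point_to_triple(Fraction(5, 13), Fraction(12, 13))
print((a, b, c), a * a + b * b == c * c)
```

Starting from the point (5/13, 12/13), the recipe produces (65, 156, 169), which is 13 times the familiar triple (5, 12, 13): the construction gives a Pythagorean triple, though not necessarily the smallest one.
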

1.1.3 Combinatorics

The term ‘combinatorics’ may not be familiar though the sorts of questions it deals with are. Combinatorics is the branch of mathematics that deals with arrangements and the counting of arrangements. The fact that it deals in counting makes it sound like this should be an easy subject. In fact, it is often very difficult. For example, counting lies behind probability theory, a subject that can often defy intuition. Let me give you a simple example. In a class of, say, 25 students, how likely do you think it is that two students will share the same birthday? By this I mean the same date and month, though not year. Unless you’ve seen this problem before, I think the instinct is to say ‘not very’. This is because we imagine in our mind’s eye those 25 students to be arranged across 365 days without any pair of students landing on the same date. In fact the answer, which you can calculate using the methods of this book, is just over a half. In other words, there is at least as good a chance of two students sharing the same birthday as there is of tossing a coin and getting heads. This little problem is often known as the birthday paradox. It is a good example of where maths can be used to correct our faulty intuition. But this is really a counting problem. To get the right answers to such problems, you need to think about what you are counting in the right way.
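
The birthday calculation is short enough to sketch in Python. The version below makes the usual simplifying assumptions, which the text does not spell out: all 365 days are equally likely and leap years are ignored. It counts the complementary event, that all the birthdays are different, and subtracts from 1; the function name is an illustrative choice.

```python
def shared_birthday_probability(students, days=365):
    """Probability that at least two of `students` share a birthday.

    Assumes every day is equally likely and ignores leap years.
    """
    all_different = 1.0
    for k in range(students):
        # The (k+1)-th student must avoid the k days already taken.
        all_different *= (days - k) / days
    return 1.0 - all_different


print(round(shared_birthday_probability(25), 4))
```

For 25 students this gives a probability of about 0.57, which is the ‘just over a half’ of the text. Counting the complement is the key idea: it is much easier to count the arrangements with no coincidences than to count those with at least one.
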

1.2 The scope of mathematics

The most common replies to the question ‘what is mathematics?’ addressed to a non-mathematician are usually the depressing ‘arithmetic’ or ‘accountancy’. Asked what they remember about school maths, they might be able to dredge up some more-or-less arcane words with challenging spellings: hypotenuse, isosceles, parallelogram. It either sounds a bit boring or a bit weird, but in any event is so obviously completely removed from real life that it can safely be ignored. Mathematics, therefore, has an image problem. I think part of the reason for this is the kind of maths that is taught in schools and the way it is taught. School mathematics suffers by being based on the narrow syllabuses prescribed by examining boards under political direction. As a result, it is more by luck than design if anyone at school gets an idea of what maths is actually about. In addition, teaching too often means teaching to the exam, which means working through past exam papers and learning tricks¹.

Let me begin by showing you just how vast a subject mathematics really is. The official Mathematics Subject Classification currently divides mathematics into 64 broad areas in any one of which a mathematician could work their entire professional life. You can see what they are in the box. By the way, the missing numbers are deliberate and not because I cannot count.

¹ I say teaching and not teachers. My criticism is directed at policy, not at those who are forced to carry out that policy, often under enormous pressures.

Mathematics Subject Classification 2010 (adapted)

00. General; 01. History and biography; 03. Logic and foundations; 05. Combinatorics; 06. Order theory; 08. General algebraic systems; 11. Number theory; 12. Field theory; 13. Commutative rings; 14. Algebraic geometry; 15. Linear and multilinear algebra; 16. Associative rings; 17. Non-associative rings; 18. Category theory; 19. K-theory; 20. Group theory and generalizations; 22. Topological groups; 26. Real functions; 28. Measure and integration; 30. Complex functions; 31. Potential theory; 32. Several complex variables; 33. Special functions; 34. Ordinary differential equations; 35. Partial differential equations; 37. Dynamical systems; 39. Difference equations; 40. Sequences, series, summability; 41. Approximations and expansions; 42. Harmonic analysis; 43. Abstract harmonic analysis; 44. Integral transforms; 45. Integral equations; 46. Functional analysis; 47. Operator theory; 49. Calculus of variations; 51. Geometry; 52. Convex geometry and discrete geometry; 53. Differential geometry; 54. General topology; 55. Algebraic topology; 57. Manifolds; 58. Global analysis; 60. Probability theory; 62. Statistics; 65. Numerical analysis; 68. Computer science; 70. Mechanics; 74. Mechanics of deformable solids; 76. Fluid mechanics; 78. Optics; 80. Classical thermodynamics; 81. Quantum theory; 82. Statistical mechanics; 83. Relativity; 85. Astronomy and astrophysics; 86. Geophysics; 90. Operations research; 91. Game theory; 92. Biology; 93. Systems theory; 94. Information and communication; 97. Mathematics education

Each of these broad areas is then subdivided into a large number of smaller areas, any one of which could be the subject of a PhD thesis. This is a little overwhelming, so to make it more manageable it can be summarized, very roughly, into the following ten areas:

  Algebra
  Number theory
  Calculus and analysis
  Probability and statistics
  Combinatorics
  Differential equations
  Geometry and topology
  Mathematical physics
  Logic
  Computing

Most undergraduate courses will fit under one of these headings. But it is important to remember that mathematics is one subject; dividing it up into smaller areas is done for convenience only. When solving a problem any and all of the above areas might be needed.

1.3 Pure versus applied mathematics

Sometimes a distinction is drawn between pure and applied mathematics. Pure maths is supposed to be maths done for its own sake with no thought to applications, whereas applied maths is maths used to solve some, presumably practical, problem. I think there is often an implicit moralistic undertone to this distinction with pure maths being viewed as perhaps rather self-indulgent and decorative, and applied maths as socially responsible grown-up maths that pays its way. Politicians prefer applied maths because they think it will make money. Evidence for this distinction is the following quote from the English mathematician G. H. Hardy (1877–1947) that is often used to prove the point: “I have never done anything ‘useful’. No discovery of mine has made, or is likely to make, directly or indirectly, for good or ill, the least difference to the amenity of the world.” Hardy was a truly great mathematician and a decent human being. As his dates show, he was of the generation that witnessed the First World War where science and technology were applied to the business of wholesale slaughter. His views on maths are therefore a not unnatural reaction on the part of someone who taught young people who then went to war never to return. Maths for him was perhaps a sanctuary².

In reality, the terms pure and applied are extremely fuzzy. A mathematician might start work on solving a real-life problem and then be led to develop new pure mathematics, or start in pure maths and develop an application. Calculus, for example, developed mainly out of the need to solve problems in physics and then was applied to pure maths. Complex numbers couldn’t have been more pure, introduced to provide the missing roots to polynomial equations, but are now the basis of quantum mechanics. In reality, there is just one mathematics.

² There was a similar reaction at the end of the Second World War amongst physicists who turned instead to biology as an alternative to building weapons.

The Banach-Tarski Paradox

The glory of mathematics is often to be found in its sheer weirdness. For a universe founded on logic, it can lead to some pretty confounding conclusions. For example, a solid the size of a pea may be cut into a finite number of pieces which may then be reassembled in such a way as to form another solid the size of the sun. This is known as the Banach-Tarski Paradox (1924). There’s no trickery involved here and no sleight of hand. This is clearly pure maths (give me a real pea and whatever I do it will remain resolutely pea-sized) but the ideas it uses involve such fundamental and seemingly straightforward notions as length, area and volume that have important applications in applied maths.

1.4 The antiquity of mathematics

The history of chemistry or astronomy is not hugely relevant, however interesting it may be, to modern theories of chemistry or astronomy. A few hundred years ago, chemistry was alchemy and astronomy was astrology: modern chemists are not searching for the philosopher’s stone and astronomers don’t cast horoscopes. Alchemists and astrologers are often the forebears they would prefer to forget.³ Maths is different, since what was mathematically true hundreds of years ago remains true today. Here is a famous example. Plimpton 322 is a small clay tablet kept in the George A. Plimpton Collection at Columbia University dating to about 1800 BCE. Impressed on the tablet are a number of columns of numbers written in cuneiform. The numbers are written not in base 10 but in base 60, the base that still lies behind the way we tell the time and measure angles. The meaning and purpose of this clay tablet is much disputed. But the second and third columns consist of the following numbers, where I have given the usual corrected numbers. I have given the first seven lines of the table; there are fifteen in the original.

        B        C
  1     119      169
  2     3367     4825
  3     4601     6649
  4     12709    18541
  5     65       97
  6     319      481
  7     2291     3541
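
As a quick sanity check on the numbers above, the following Python sketch computes C² − B² for each of the seven rows and tests whether the result is a perfect square. The name `third_side` is an illustrative choice, and `math.isqrt` assumes Python 3.8 or later.

```python
from math import isqrt

# (B, C) pairs copied from the first seven rows of the table.
rows = [(119, 169), (3367, 4825), (4601, 6649), (12709, 18541),
        (65, 97), (319, 481), (2291, 3541)]


def third_side(b, c):
    """Return D with D*D == c*c - b*b, or None if the difference is not a square."""
    diff = c * c - b * b
    d = isqrt(diff)  # integer square root, rounded down
    return d if d * d == diff else None


for b, c in rows:
    print(b, c, third_side(b, c))
```

Every row passes the test; the first, for example, gives 120, since 169² − 119² = 14400 = 120².
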

If you calculate $C^2 - B^2$ you will get a perfect square $D^2$. Thus (B, D, C) is a Pythagorean triple. How such large Pythagorean triples were computed is a mystery.

³ I am exaggerating a little here for rhetorical purposes. In fact, much fine work was carried out under the guise of alchemy and astrology.

This antiquity, combined with the fact that maths is a cumulative subject, meaning that you have to learn X before you can learn Y, has the unfortunate effect that most of the mathematics you learnt at school was invented before 1800. Here is a very rough chronology.

  BCE                                       CE
  2000  Solving quadratics                  1550  Solving cubics and quartics
  400   Existence of irrational numbers     1590  Logarithms
  300   Euclidean geometry                  1630  Analytic geometry
  200   Conics                              1675  Calculus
                                            1700  Probability
                                            1795  Complex numbers

Only matrices (1850) and vectors (1880) were introduced more recently. However, if you think of all the developments in physics since 1800 such as black holes, the big bang theory, parallel universes and quantum mechanics, then you might suspect that there have also been big developments in mathematics. There have, but you would be forgiven for not knowing about them because they are not promoted in the media or taught in school. I should add that like any other field of human endeavour, it is of course true that mathematical ideas go in and out of fashion, but crucially they don’t become wrong with time.

1.5 The modernity of mathematics

The fact that what’s taught in schools doesn’t seem to change much from generation to generation leads to one of the biggest misconceptions about mathematics: that it has already all been discovered. To try and bring you up to date, I am going to say a little about three mathematicians and their work: Alan Turing (1912–1954), Sir Andrew Wiles (b. 1953), and Terence Tao (b. 1975). I have chosen them to illustrate some additional points I want to make about maths.

Alan Turing

Alan Turing is the only mathematician I know who has had a West End play written about his life: the 1986 play Breaking the Code by Hugh Whitemore. Turing is best known as one of the leading members of Bletchley Park during the Second World War, for his role in the British development of computers during and after the War, and for the ultimately tragic nature of his early death. Here I want to return to Turing the mathematician. As a graduate student, he wrote a paper in 1936 entitled On computable numbers, with an application to the Entscheidungsproblem, where the long German word means decision problem and refers to a specific question in mathematical logic. It was as a result of solving this problem that Turing was led to formulate a precise mathematical blueprint for a computer, now called a Turing machine in his honour. This is the most extreme example I know of a problem in pure maths leading to new applied maths; in fact, it led to the whole field of computer science and the information age we now inhabit. Amongst computer scientists, Turing is regarded as the father of computer science. So, mathematicians invented the modern world.

Andrew Wiles

Mathematicians operate on a completely different timescale from everyone else. I have already talked about Pythagorean triples, those whole numbers (x, y, z) that satisfy the equation $x^2 + y^2 = z^2$. Here’s an idle thought. What happens if we try to find whole number solutions to $x^3 + y^3 = z^3$ or $x^4 + y^4 = z^4$ or, more generally, $x^n + y^n = z^n$ where n ≥ 3? Let’s exclude the trivial case where some of the numbers x, y or z are 0. So, here is the question: for n ≥ 3 find all whole number solutions to $x^n + y^n = z^n$ where xyz ≠ 0. Back in the 17th century, Pierre de Fermat (1601?–1665) wrote in the margin of a book, the Arithmetica of Diophantus, that he had found a proof that there were no such solutions but that sadly there wasn’t enough room for him to record it.
This became known as Fermat’s Last Theorem. In fact, since Fermat’s supposed proof was never found, it was really a conjecture. More to the point, it is highly unlikely that he ever had a proof since in the subsequent centuries many attempts were made to prove this result, all in vain, although substantial progress was made. This problem became one of mathematics’ many Mount Everests: the peak that everyone wanted to scale. Finally, on Monday 19th September, 1994, sitting at his desk, Andrew Wiles, building on over three centuries of work, and haunted by his premature announcement of his success the previous year, had a moment of inspiration, as the following quote from the Daily Telegraph dated 3rd May 1997 reveals: “Suddenly, totally unexpectedly, I had this incredible revelation. It was so indescribably beautiful, it was so simple and so elegant.” As a result Fermat’s Conjecture really is a theorem, but the proof required travelling through what can only be described as mathematical hyperspace.

Wiles’s reaction to his discovery is also a glimpse of the profound intellectual excitement that engages the emotions as well as the intellect when doing mathematics⁴.

Terence Tao

Tao won the 2006 Fields Medal. This is a mathematical honour comparable with a Nobel Prize though with the added twist that you have to be under 40 to get one. You can read his thoughts at his blog, as well as use it to find all manner of interesting things. So, what sorts of things does he do? Here is one example that is remarkably easy to explain though the proof is formidable. You know what primes are and, in any event, we shall talk about them later. They can be regarded as the atoms of numbers and their properties have inspired hard questions and deep results. One of the things that interests mathematicians is the sorts of patterns that can be found in primes. An arithmetic progression is a sequence of numbers of the form a + dk where a and d are fixed numbers and k = 0, 1, 2, .... Consider the arithmetic progression 3 + 2k. Observe that for the consecutive values of k = 0, 1, 2, the numbers 3, 5, 7 which arise are all prime. But when k = 3 we get 9 which is not prime. Our little example is an instance of an arithmetic progression with 3 terms all prime. Here is one with 10 terms: 199 + 210k where k = 0, 1, ..., 9. In 2004, Tao and his colleague Ben Green proved that there were arithmetic progressions of arbitrary length all of whose terms are prime. In other words, for any number n there is an arithmetic progression so that the first n terms are all prime.
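
The two progressions mentioned above are small enough to check by hand, but a few lines of Python do the job as well. In this sketch the helper names `is_prime` and `prime_run` are illustrative assumptions, the primality test is plain trial division (fine for numbers of this size, hopeless for large ones), and the common difference d is assumed to be at least 1.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_run(a, d):
    """Count the consecutive k = 0, 1, 2, ... for which a + d*k is prime (d >= 1)."""
    k = 0
    while is_prime(a + d * k):
        k += 1
    return k


print(prime_run(3, 2))      # 3, 5, 7 are prime; 9 is not
print(prime_run(199, 210))  # the ten-term example from the text
```

The second call reports a run of exactly ten primes: the next term, 199 + 210 · 10 = 2299, is divisible by 11. Checking examples like this says nothing about arbitrary lengths, which is what makes the Green and Tao theorem remarkable.
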

1.6 The legacy of the Greeks

The word ‘mathematics’ is Greek. In fact, many mathematical terms are Greek: lemma, theorem, hypotenuse, orthogonal, polygon, to name just a few. The Greek alphabet is used as a standard part of mathematical notation. The very concept of a mathematical proof is a Greek idea. All of this reflects the fact that Ancient Greece is the single most important historical influence on the development and content of mathematics. By Ancient Greek mathematics, I mean the mathematics developed in the wider Greek world around the Mediterranean in the thousand or more years between roughly 600 BCE and 600 CE. It begins with the work of semi-mythical figures, such as Thales of Miletus and Pythagoras of Samos, and is developed in the books of such mathematicians as Euclid, Archimedes, Apollonius of Perga, Diophantus and Pappus. Of all the Ancient Greek mathematicians the greatest was Archimedes. His work is sophisticated mathematics of the highest order. In particular, he developed methods that are close to those of integral calculus and used them to calculate areas and volumes of complicated curved shapes.

4 There is a BBC documentary directed by Simon Singh about Andrew Wiles made for the BBC’s Horizon series. It is an exemplary account of how to portray complex mathematics in an accessible way and cannot be too highly recommended.

1.7 The legacy of the Romans

For all their aqueducts, roads, baths and maintenance of public order, it has been said of the Romans that their only contribution to mathematics was when Cicero rediscovered the grave of Archimedes and had it restored5.

1.8 What they didn’t tell you in school

This book is written to help you make the transition from school maths to university maths. You might well still be in school, or you might have left school fifty years ago; it doesn’t matter. Maths as taught in school and maths as taught at university are very different, and the failure to understand those differences can cause problems. To be successful in university mathematics you have to think in new ways. University mathematics is not just school mathematics with harder sums and fancier notation: it is different, fundamentally different, from what you did at school.

In much of school mathematics, you learn methods for solving specific problems. Often, you just learn formulae.

A method for solving a problem that requires little thought in its application is called an algorithm. Computer programs are the supreme examples of algorithms, and it is certainly true that finding algorithms for solving specific problems is an important part of mathematics, but it is by no means the only part. Problems do not come neatly labelled with the methods needed for their solution. A new problem might be solvable using old methods or it might require you to adapt those methods. On the other hand, you may have to invent completely new methods to solve it. Such new methods require new ideas. In fact, what you might not have appreciated from school mathematics is the important role played in mathematics by ideas. An idea is a tool to help you think.

5 George Simmons, Calculus Gems, McGraw-Hill, Inc., New York, 1992, page 38.

Mathematics at school is often taught without reasons being given for why the methods work.

This is the fundamental difference between school mathematics and university mathematics. A reason why something works is called a proof. I shall say a lot more about proofs in Chapter 2.

The Millennium Problems

Mathematics is difficult but intellectually rewarding. Just how hard can be gauged by the following. The Millennium Problems is a list of seven outstanding problems posed by the Clay Mathematics Institute in the year 2000. A correct solution to any one of them carries a one million dollar prize. To date, only one has been solved: the Poincaré conjecture, by Grigori Perelman, whose proof appeared in 2002–2003. He was awarded the prize in 2010 but declined to take the money. The point is that no one offers a million dollars for something that is trivial. You can read more about these problems at

http://www.claymath.org/millennium-problems

1.9 Further reading and links

There is a wealth of material about mathematics available on the Web and I would encourage exploration. Here, I will point out some books and links that develop the themes of this chapter. A book that is in tune with the goals of this chapter is

P. Davis, R. Hersh, E. A. Marchisotto, The mathematical experience, Birkhäuser, 2012.

It’s one of those books that you can dip into and you will learn something interesting but, most importantly, it will expand your understanding of what mathematics is, as it did mine. A good source book for the history of mathematics, and again something that can be dipped into, is

C. B. Boyer, U. C. Merzbach, A history of mathematics, Jossey Bass, 3rd Edition, 2011.

The books above are about maths rather than doing maths. Let me now turn to some books that do maths in a readable way. There is a plethora of popular maths books now available, and if you pick up any books by Ian Stewart (though if the book appears to be rather more about volcanoes than is seemly in a maths book, you have Iain Stewart) and Peter Higgins then you will find something interesting. Sir (William) Timothy Gowers won a Fields Medal in 1998 and so can be assumed to know what he is talking about.

T. Gowers, Mathematics: A Very Short Introduction, Oxford University Press, 2002

It is worth checking out his homepage for some interesting links. He also has his own blog which is worth checking out. I think the Web is serving to humanize mathematicians: their ivory towers all have wi-fi. A classic book of this type is

R. Courant, H. Robbins, What is mathematics?, OUP, 1996.

This is also an introduction to university-level maths, and it has influenced my thinking on the subject. If you have never looked into Euclid’s book the Elements, then I would recommend you do6. There is an online version that you can access via David E. Joyce’s website at Clark University. A handsome printed version, edited by Dana Densmore, has been published by Green Lion Press, Santa Fe, New Mexico.

6 Whenever I refer to Euclid, it will always be to this book. It consists of thirteen chapters, themselves called ‘books’, which are numbered in the Roman fashion I–XIII.

Finally, let me mention the books of Martin Gardner. For a quarter of a century, he wrote a monthly column on recreational mathematics for the Scientific American which inspired amateurs and professionals alike. I would start with

M. Gardner, Hexaflexagons, probability paradoxes, and the Tower of Hanoi: Martin Gardner’s first book of mathematical puzzles and games, CUP, 2002

and follow your interests.

Chapter 2

Proofs

Part of the argument sketch, Monty Python

M = Man looking for an argument
A = Arguer

M: An argument isn’t just contradiction.
A: It can be.
M: No it can’t. An argument is a connected series of statements intended to establish a proposition.
A: No it isn’t.
M: Yes it is! It’s not just contradiction.
A: Look, if I argue with you, I must take up a contrary position.
M: Yes, but that’s not just saying ‘No it isn’t.’
A: Yes it is!
M: No it isn’t!
A: Yes it is!
M: Argument is an intellectual process. Contradiction is just the automatic gainsaying of any statement the other person makes.
(short pause)
A: No it isn’t.

The most fundamental difference between school and university mathematics lies in proofs. At school, you were probably told mathematical facts and given recipes that solved particular kinds of problems. But the chances are, you were not given any reasons to back up those facts or explanations as to why those recipes worked. University and professional mathematics is different. Reasons and explanations are essential and are called proofs. They are the essence of mathematics. Mathematical truth, and the notion of proof that supports it, is so different from what we encounter in everyday life that I shall need to begin by setting the scene.

2.1 How do we know what we think is true is true?

Human beings usually believe something first for emotional reasons, and then look for the evidence to back it up. The pitfalls of this are obvious. We shall therefore be interested in reasons that do not involve emotion. To be concrete, how would you verify the following claim: Mount Everest is between 8 and 9 km high?

The appeal to authority

In the past, claims such as this would be resolved by consulting an encyclopedia or atlas whereas today, of course, we would simply go online. If you do this, you will find that a height of about 8.8 km is quoted. For most purposes this would settle things. But it’s important to understand what this entails. We are, in effect, taking someone’s word for it. We assume that whoever posted this information knows what they are talking about. What we are doing, therefore, is appealing to authority. Most of what we take to be true is based on such appeals to authority: parents, teachers, politicians, religiosi etc. tell us things that they claim to be true and more often than not we believe them. There’s a small element of laziness involved on our part, but it is so convenient. The pitfalls of this are also obvious.

The appeal to experiment

But where did the figure of 8.8 km come from? It wasn’t just plucked from the sky. The height of Mount Everest was first measured as part of the great survey of India undertaken in the nineteenth century. The survey team consisted of expert surveyors who not only employed extremely precise instruments to take multiple measurements but who also tried to minimize the effect of factors influencing the accuracy of their measurements such as temperature and, amazingly, variations in gravity. Making measurements and taking great pains over those measurements, together with estimations of the error bounds, is such an important part of science that science itself would be impossible without it. Let’s call this the appeal to experiment.

This brings me to how we know statements are true in mathematics. The essential point is the following:

Neither of the above methods for ascertaining truth plays any role whatsoever in determining mathematical truth.

This is so important, I am going to say it again in a different way:

• Results are not true in maths because I say so or because someone important said they were true a long time ago.

• Results are not true in mathematics because I have carried out experiments and I always get the same answer.

• Results are not true in maths ‘just because they are’.

How then can we determine whether something in mathematics is true?

• Results are true in maths only because they have been proved to be true.

• A proof shows that a result is true.

• A proof is something that you yourself can follow and at the end you will see the truth of what has been proved.

• A result that has been proved to be true is called a theorem.

• The appeal to authority and the appeal to experiment are both fallible. The appeal to proof is never fallible. The only truths we know for certain are mathematical truths.

This is heady stuff. So what, then, is a proof? The remainder of this chapter is devoted to an introductory answer to this question.

2.2 Three fundamental assumptions of logic

In order to understand how mathematical proofs work, there are three simple, but fundamental, assumptions you have to understand.

I. Mathematics only deals in statements that are capable of being either true or false.

Mathematics does not deal in statements which are ‘sometimes true’ or ‘mostly false’. There are no approximations to the truth in mathematics and no grey areas. Either a statement is true or a statement is false, though we might not know which. This is quite different from everyday life, where we often say things which contain a grain of truth or where we say things for rhetorical reasons which we don’t entirely mean. Mathematics also doesn’t deal in statements that are neither true nor false like exclamations such as ‘Out damned spot!’ or with questions such as ‘To be or not to be?’.

II. If a statement is true then its negation is false, and if a statement is false then its negation is true.

In natural languages, negating a sentence is achieved in different ways. In English, the negation of ‘It is raining’ is ‘It is not raining’. In French, the negation of ‘Il pleut’ is obtained by wrapping the verb in ‘ne . . . pas’ to get ‘Il ne pleut pas’. To avoid grammatical idiosyncrasies, we can use the formal phrase ‘it is not the case that’ and place it in front of any sentence to negate it. So, ‘It is not the case that it is raining’ is the negation of ‘It is raining’. In some languages, and French is one of them, adding negatives is used for emphasis. This used to be the case in older forms of English and is often the case in informal English. In formal English, we are taught that two negatives make a positive, which is really rule (II) above, borrowed from mathematics, where it is true. In fact, negating negatives in natural languages is more complex than this. For example, if your partner says they are ‘not unhappy’ then this isn’t quite the same as being ‘happy’ and maybe you need to talk.

III. Mathematics is free of contradictions.

A contradiction is where both a statement and its negation are true. This is impossible by (II) above. This assumption will play a vital role in proofs as we shall see later.

2.3 Examples of proofs

Armed with the three assumptions above, I am going to take you through five proofs of five results, three of them being major theorems. This will enable me to show you examples of proofs but will also illustrate important issues about how proofs, and mathematics, work. Although proofs can be long or short, hard or easy they all tend to follow the same script. First, there will be a statement of what is going to be proved. This usually has the form: if a bunch of things are assumed true then something else is also true. If the things assumed true are lumped together as A, for assumptions, and the thing to be proved true is labelled C, for conclusion, then a statement to be proved usually has the shape ‘if A then C’ or ‘A implies C’ or, in notation, ‘A ⇒ C’. The proof itself should be thought of as a (rational) argument between two protagonists whom we shall call Alice and Bob. We assume that Alice wants to prove C. She can use any of the assumptions A, any previously proved theorems, the rules of logic, which I shall describe as we meet them, and definitions. Bob’s role is to act like an attorney and to demand that Alice justify each claim she makes. Thus Alice cannot just make assertions without justifying them, and she is limited in the sorts of things that count as justifications. At the end of this, Alice can say something like ‘ . . . and so C is proved’ and Bob will be forced to agree.

2.3.1 Proof 1

We shall prove the following statement.

The square of an even number is even, and the square of an odd number is odd.

In fact, this is really two statements: ‘If n is an even number then n² is even’ and ‘If n is an odd number then n² is odd.’ Before we can prove them, we need to understand what they are actually saying. The terms odd and even are only used of whole numbers such as

0, 1, 2, 3, 4,...

These numbers are called the natural numbers and they are the first kinds of numbers we learn about as children. Thus we are being asked to prove a statement about natural numbers. The terms ‘odd’ and ‘even’ might seem obvious, but we need to be clear about how they are used in maths. By definition, a number n is even if it is exactly divisible by 2, otherwise it is said to be odd. In maths, we usually just say divisible rather than exactly divisible. This definition of divisibility only makes sense when talking about whole numbers. For fractions, for example, it is pointless since one non-zero fraction will always divide another fraction. Notice that 0 is an even number because 0 = 2 × 0. In other words, 0 is exactly divisible by 2. However, remember, you cannot divide by 0 but you can certainly divide into 0. You might have been told that a number is even if its last digit is one of the digits 0, 2, 4, 6, 8. In fact, this is a consequence of our definition rather than a definition itself. I shall ask you to prove this result in the exercises. I shall say no more about the definition of even. What about the definition of odd? A number is odd if it is not even. This is not a very useful definition since it only tells us what an odd number is not. We want a more positive characterization. So we shall describe a better one. If you attempt to divide a number by 2 then there are two possibilities: either it goes exactly, in which case the number is even, or it goes so many times plus a remainder of 1, in which case the number is odd. It follows that a better way of defining an odd number n is as one that can be written n = 2m + 1 for some natural number m. So, the even numbers are those natural numbers that are divisible by 2, thus the numbers of the form 2n for some n, and the odd numbers are those that leave the remainder 1 when divided by 2, thus the numbers of the form 2n + 1 for some n. Every number is either odd or even but not both.
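The two cases just described, remainder 0 or remainder 1 on division by 2, can be sketched in a few lines of Python. The function name parity is a hypothetical choice for illustration; the built-in divmod returns the quotient and the remainder together.

```python
def parity(n: int) -> tuple[str, int]:
    """Classify a natural number n by dividing it by 2.

    Returns the label 'even' or 'odd' together with the number m
    for which n = 2m (even case) or n = 2m + 1 (odd case).
    """
    if n < 0:
        raise ValueError("odd and even are only defined here for natural numbers")
    m, remainder = divmod(n, 2)
    return ("even" if remainder == 0 else "odd", m)


print(parity(0))   # 0 = 2 * 0, so 0 is even
print(parity(12))  # 12 = 2 * 6
print(parity(7))   # 7 = 2 * 3 + 1
```

The function returns m as well as the label because the proofs that follow use the number m, not just the word ‘even’ or ‘odd’.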
There is a moral to be drawn from what I have just done, and I shall state it boldly because of its importance. It may seem obvious but experience shows that it is, in fact, not.

Every time you are asked to prove a statement, you must ensure that you understand what that statement is saying. This means, in particular, checking that you understand what all the words in the statement mean.

The next point is that we are making a claim about all even numbers. If you pick a few even numbers at random and square them then you will find in every case that the result is even but this does not prove our claim. Even if you checked a trillion even numbers and squared them and the results were all even it wouldn’t prove the claim. Maths, remember, is not an experimental science. There are plenty of examples in maths of statements that look true and are true for umpteen cases but are in fact bunkum. This means that, in effect, we have to prove an infinite number of statements: 0² is even, and 2² is even, and 4² is even . . . I cannot therefore prove my claim by picking a specific even number, like 12, and checking that its square is even. This simply verifies one of the infinitely many statements above. As a result, the starting point for my proof cannot be a specific even number. It has to be a general even number.

We are now in a position to prove our claims. First, we prove that the square of an even number is even.

1. Let n be an even number. This is the assumption that gets the ball rolling. Notice that n is not a specific even number. We want to prove something for all even numbers so we cannot argue with a specific one.

2. Then n = 2m for some natural number m. Here we are using the definition of what it means to be an even number.

3. Square both sides of the equation in (2) to get n² = 4m². To do this correctly, you need to follow the rules of high-school algebra.

4. Now rewrite this equation as n² = 2(2m²). This uses more basic high-school algebra.

5. Since 2m² is a natural number, it follows that n² is even using our definition of an even number. This proves our claim.

Second, we prove that the square of an odd number is odd. I’ll provide less commentary than in the previous case.

1. Let n be an odd number.

2. By definition n = 2m + 1 for some natural number m.

3. Square both sides of the equation in (2) to get n² = 4m² + 4m + 1.

4. Now rewrite the equation in (3) as n² = 2(2m² + 2m) + 1.

5. Since 2m² + 2m is a natural number, it follows that n² is odd using our definition of an odd number. This proves our claim.

We have therefore proved our two claims. I admit that they are not exciting but just bear with me.
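The algebra in step (4) of each proof can be sanity-checked by machine, and the same few lines can show why such checking is never a substitute for proof. The sketch below is only an illustration: the helper is_prime is a hypothetical name, and the expression n² + n + 41 is a standard cautionary example that does not come from this chapter.

```python
def is_prime(n: int) -> bool:
    """Trial division: adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# Check the rewriting used in the two proofs for m = 0, ..., 999.
# This verifies a thousand cases; it proves nothing about the rest.
for m in range(1000):
    assert (2 * m) ** 2 == 2 * (2 * m * m)                  # n = 2m
    assert (2 * m + 1) ** 2 == 2 * (2 * m * m + 2 * m) + 1  # n = 2m + 1

# A pattern that holds for many cases and then fails:
# n*n + n + 41 is prime for n = 0, ..., 39 but not for n = 40.
euler_values = [n * n + n + 41 for n in range(41)]
first_failure = next(n for n, v in enumerate(euler_values) if not is_prime(v))
print(first_failure, euler_values[first_failure])  # 40 and 1681 = 41 * 41
```

Forty successful cases in a row would convince most people, yet the claim ‘n² + n + 41 is always prime’ is false. The two proofs above, by contrast, cover every natural number at once.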

2.3.2 Proof 2

We shall prove the following statement.

If the square of a number is even then that number is even, and if the square of a number is odd then that number is odd.

In fact, this is really two statements: ‘If n² is even then n is even’ and ‘If n² is odd then n is odd’. At first reading, you might think that I am simply repeating what I proved above. But in Proof 1, I proved

‘if n is even then n² is even’

whereas now I want to prove

‘if n² is even then n is even’.

Our assumptions in each case are different and our conclusions in each case are different. It is therefore important to distinguish between A ⇒ B and B ⇒ A. The statement B ⇒ A is called the converse of the statement A ⇒ B. Experience shows that people are prone to swapping assumptions and conclusions without being aware of it. We prove the first claim.

1. Suppose that n² is even.

2. Now it is very tempting to try and use the definition of even here, just as we did in Proof 1, and write n² = 2m for some natural number m. But this turns out to be a dead-end. Just like playing a game such as chess, not every possible move is a good one. Choosing the right move comes with experience and sometimes just plain trial-and-error.

3. So we make a different move. We know that n is either odd or even. Our goal is to prove that it must be even.

4. Could n be odd? The answer is no because, as we showed in Proof 1, if n is odd then n² is odd, whereas we are assuming that n² is even.

5. Therefore n is not odd.

6. But a number that is not odd must be even. It follows that n is even.

We use a similar strategy to prove the second claim.

The proofs here were more subtle, and less direct, than in our first example and they employed the following important strategy: if there are two possibilities, exactly one of which is true, we rule out one of those possibilities and so deduce that the other possibility must be true.1 Here is a concrete example. There are two politicians, Alice and Bob. One of them always lies and the other always tells the truth. Suppose you ask Bob the question: is it true that 2 + 2 = 5? If he replies ‘yes’ then you know Bob is lying. Without further ado, you can deduce that Alice is that paragon of politicians and always tells the truth.

If A ⇒ B and B ⇒ A then we say that A if, and only if, B or A iff B or A ⇔ B. The use of the word iff is peculiar to mathematical English. If we combine Proofs 1 and 2, we have proved the following two statements for all natural numbers n: ‘n is even if, and only if, n² is even’ and ‘n is odd if, and only if, n² is odd’. It is important to remember that the statement ‘A if, and only if, B’ is in fact two statements in one. It means (1) ‘A implies B’ and (2) ‘B implies A’. So, to prove the statement ‘A if, and only if, B’ we have to prove TWO statements: we have to prove ‘A implies B’ and we have to prove ‘B implies A’.

The results of this example were trickier to prove than the previous ones, but not much more exciting. However, we have now laid the foundations for a truly remarkable result.
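That ‘A if, and only if, B’ is two implications in one can be checked case by case. The sketch below assumes the usual truth-table reading of ‘A implies B’ (false only when A is true and B is false), which the text has not formally introduced; the function name implies is hypothetical.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth-table reading of 'A implies B': false only when A is true and B is false."""
    return (not a) or b


rows = []
for a, b in product([True, False], repeat=2):
    both_directions = implies(a, b) and implies(b, a)
    same_truth_value = (a == b)
    rows.append((a, b, both_directions, same_truth_value))

for row in rows:
    print(row)  # the last two entries agree in every row
```

There are only four cases, so here checking every case really is a proof: A ⇔ B holds exactly when A and B are both true or both false.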

1 This might be called the Sherlock Holmes method. “How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?” The Sign of Four, 1890.

2.3.3 Proof 3

We shall now prove our first real theorem.

√2 cannot be written as an exact fraction.

If you square each of the fractions in turn

3/2, 7/5, 17/12, 41/29, ...

you will find that you get closer and closer to 2 and so each of these numbers is an approximation to the square root of 2. This raises the question: is it possible to find a fraction x/y whose square is exactly 2? In fact, it isn’t, but that isn’t proved just because my attempts above failed. Maybe I just haven’t looked hard enough. So, I have to prove that it is impossible. To prove that √2 is not an exact fraction, I am actually going to begin by trying to show you that it is.

1. Suppose that √2 = x/y where x and y are positive whole numbers (so, in particular, y ≠ 0).

2. We may assume that x/y is a fraction in its lowest terms so that the only natural number that divides both x and y is 1. Keep your eye on this assumption because it will come back to sting us later.

3. Square both sides of the equation in (1) to get 2 = x²/y².

4. Multiply both sides of the equation in (3) by y².

5. We therefore get the equation 2y² = x².

6. Since 2 divides the left-hand side of this equation, it must divide the right-hand side. This means that x² is even.

7. We now use Proof 2 to deduce that x is even.

8. We may therefore write x = 2u for some natural number u.

9. Substitute this value for x into the equation in (5) to get 2y² = 4u².

10. Divide both sides of the equation in (9) by 2 to get y² = 2u².

11. Since the right-hand side of the equation in (10) is even so is the left-hand side. Thus y² is even.

12. Since y² is even, it follows by Proof 2 that y is even.

13. If (1) is true then we are led to the following two conclusions. From (2), we have that the only natural number to divide both x and y is 1. From (7) and (12), 2 divides both x and y. This is a contradiction. Thus (1) cannot be true. Hence √2 cannot be written as an exact fraction.

This result is phenomenal. It says that no matter how much money you spend on a computer it will never be able to write down the exact value of √2 as a fraction, just a very, very good approximation. We now make a very important definition. A real number that can be written as a fraction of whole numbers is called rational, and a real number that is not rational is called irrational. We have therefore proved that √2 is irrational.
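The fractions 3/2, 7/5, 17/12, 41/29 quoted in Proof 3 can be generated and squared exactly using Python’s fractions module. The sketch below assumes the rule x/y → (x + 2y)/(x + y) for passing from one fraction to the next; the text does not state this rule, but it reproduces the four fractions given. The function name approximations is hypothetical.

```python
from fractions import Fraction


def approximations(count: int) -> list[Fraction]:
    """First `count` fractions 3/2, 7/5, 17/12, ... obtained from the
    assumed step x/y -> (x + 2y)/(x + y)."""
    x, y = 3, 2
    result = []
    for _ in range(count):
        result.append(Fraction(x, y))
        x, y = x + 2 * y, x + y
    return result


for f in approximations(6):
    # x*x - 2*y*y is the numerator of f*f - 2 over the denominator y*y.
    gap = f.numerator ** 2 - 2 * f.denominator ** 2
    print(f, float(f * f), gap)
```

In each line printed the gap x² − 2y² is either 1 or −1, so the squares get very close to 2 without ever being equal to it. This is consistent with the theorem, but of course it is the proof, not the computation, that rules out every fraction.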

2.3.4 Proof 4

We now prove our second real theorem.

The angles in a triangle add up to 180°.

This is a famous result that everyone knows. You might have learnt about it at school by drawing lots of triangles and measuring their angles but, as I said above, maths is not an experimental science and so this enterprise proves nothing. The proof I give is very old and occurs in Euclid’s book the Elements: specifically, Book I, Proposition 32. Draw a triangle and call its three angles α, β and γ respectively.

[Figure: a triangle with the angles α and γ at the base and the angle β at the top.]

Our goal is to prove that α + β + γ = 180°. In fact, we shall show that the three angles add up to a straight line which is the same thing. Draw a line through the point P parallel to the base of the triangle.

[Figure: the same triangle with its top vertex labelled P and a line drawn through P parallel to the base.]

Then extend the two sides of the triangle that meet at the point P as shown.

[Figure: the triangle with the two sides through P extended beyond P; together with the parallel line they form the angles γ′, β′ and α′ at P, with β′ opposite β.]

As a result, we get three angles that I have called α′, β′ and γ′. I now make the following claims.

• β′ = β because the angles are opposite each other in a pair of intersecting straight lines.

• α′ = α because these two angles are formed from a straight line cutting two parallel lines.

• γ′ = γ for the same reason as above.

But since α′ and β′ and γ′ add up to give a straight line, we have proved the claim. Now this is all well and good, but we have proved our result on the basis of three other results currently unproved:

1. That given a line l and a point P not on that line I may draw a line through the point P and parallel to l.

2. If two lines intersect, then opposite angles are equal.

3. If a line l cuts two parallel lines l₁ and l₂, the angle l makes with l₁ is the same as the angle it makes with l₂.

How do we know they are true? Result (2) can readily be proved. We shall use the diagram below.

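[Figure: two intersecting straight lines forming, in order around the point of intersection, the angles α, β, γ and δ.]

The proof that α = γ follows from the simple observation that α + β = β + γ, since each side is the angle on a straight line. This still leaves (1) and (3). I shall say more about them later when I talk about axioms.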

2.3.5 Proof 5

The most famous theorem of them all is the one attributed to Pythagoras and proved in Book I, Proposition 47 of Euclid. We begin with a right-angled triangle.

[Figure: a right-angled triangle with hypotenuse c and the other two sides a and b.]

We want to prove, of course, that

a² + b² = c².

Consider the shape below. It has been constructed from four copies of our triangle and two squares of areas a² and b², respectively. I claim that this shape is actually a square. First, the sides all have the same length a + b. Second, the angles at the corners are right angles by Proof 4.

[Figure: a square of side a + b made up of a square of area a², a square of area b² and four copies of the triangle.]

Now look at the following picture. This is also a square with sides a + b so it has the same area as the first square. Using Proof 4, the shape in the middle really is a square with area c².

[Figure: a square of side a + b containing four copies of the triangle arranged around a tilted central square of area c².]

If we subtract the four copies of the original triangle from both squares, the shapes that remain must have the same areas, that is, a² + b² = c², and we have proved the claim.
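The bookkeeping of areas in Proof 5 can be followed with actual numbers. The sketch below is an illustration only; the function name dissections_agree is hypothetical, and whole-number side lengths are assumed so that the arithmetic is exact.

```python
def dissections_agree(a: int, b: int, c: int) -> bool:
    """Compare the two pictures for a triangle with sides a, b and longest side c.

    Both large squares have side a + b, and the four triangles together have
    area 4 * (a * b / 2) = 2 * a * b. What is left over should be a*a + b*b
    in the first picture and c*c in the second.
    """
    big_square = (a + b) ** 2
    four_triangles = 2 * a * b
    left_over = big_square - four_triangles
    first_picture = a * a + b * b   # the two small squares
    second_picture = c * c          # the tilted central square
    return first_picture == left_over and second_picture == left_over


print(dissections_agree(3, 4, 5))    # True: 9 + 16 = 25
print(dissections_agree(5, 12, 13))  # True: 25 + 144 = 169
print(dissections_agree(2, 3, 4))    # False: 4 + 9 is not 16
```

The last example is not a right-angled triangle, so the second picture cannot actually be drawn with it, and the areas refuse to balance.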

Exercises 2.3

1. Raymond Smullyan is both a mathematician and a magician. Here are two of his puzzles. On an island there are two kinds of people: knights who always tell the truth and knaves who always lie. They are indistinguishable.

(a) You meet three such inhabitants A, B and C. You ask A whether he is a knight or knave. He replies so softly that you cannot make out what he said. You ask B what A said and they say ‘he said he is a knave’. At which point C interjects and says ‘that’s a lie!’. Was C a knight or a knave?

(b) You encounter three inhabitants: A, B and C. A says ‘exactly one of us is a knave’. B says ‘exactly two of us are knaves’. C says: ‘all of us are knaves’. What type is each?

2. There are five houses, from left to right, each of which is painted a different colour, their inhabitants are called W, C, O, S and M, but not necessarily in that order, who own different pets, drink different drinks and drive different cars.

(a) There are five houses.
(b) W lives in the red house.
(c) C owns the dog.
(d) Coffee is drunk in the green house.
(e) O drinks tea.
(f) The green house is immediately to the right (that is: your right) of the ivory house.
(g) The Oldsmobile driver owns snails.
(h) The Bentley owner lives in the yellow house.
(i) Milk is drunk in the middle house.
(j) S lives in the first house.
(k) The person who drives the Chevy lives in the house next to the man with the fox.
(l) The Bentley owner lives in a house next to the house where the horse is kept.
(m) The Lotus owner drinks orange juice.
(n) M drives the Porsche.
(o) S lives next to the blue house.

There are two questions: who drinks water and who owns the aardvark?

3. Prove that the sum of any two even numbers is even, that the sum of any two odd numbers is even, and that the sum of an odd and an even number is odd.

4. Prove that the sum of the interior angles in any quadrilateral is equal to 360°.

5.

(a) A rectangular box has sides of length 2, 3 and 7 units. What is the length of the longest diagonal?

(b) I draw a square. Without measuring any lengths, you now have to construct a square that has exactly twice the area.

(c) A right-angled triangle has sides with lengths x, y and hypotenuse z. Prove that if the area of the triangle is z²/4 then the triangle is isosceles.

6.

(a) Prove that the last digit in the square of a positive whole number must be one of 0, 1, 4, 5, 6, or 9. Is the converse true?

(b) Prove that a natural number is even if, and only if, its last digit is even.

(c) Prove that a natural number is exactly divisible by 9 if, and only if, the sum of its digits is divisible by 9.

7. Prove that √3 cannot be written as an exact fraction.

8. The goal of this question is to prove Ptolemy’s theorem2. This deals with cyclic quadrilaterals, that is those quadrilaterals whose vertices lie on a circle. With reference to the diagram below,

[Figure: a cyclic quadrilateral ABCD inscribed in a circle, with sides labelled a, b, c, d and diagonals labelled x and y.]

this theorem states that

xy = ac + bd.

Hint. Show that on the line BD there is a point X such that the angle XAD is equal to the angle BAC. Deduce that the triangles AXD and ABC are similar, and that the triangles AXB and ACD are similar. Let the distance between D and X be e. Show that

\[ \frac{e}{a} = \frac{c}{x} \quad \text{and that} \quad \frac{y - e}{d} = \frac{b}{x}. \]

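From this, the result follows by simple algebra. To help you show that the triangles are similar, you will need to use Proposition III.21 from Euclid (angles in the same segment of a circle are equal to one another), which is illustrated by the following diagram.

[Figure: illustration of Euclid’s Proposition III.21.]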

² Claudius Ptolemaeus was a Greek mathematician and astronomer who flourished around 150 CE in the city of Alexandria.

9. The goal of this question is to find all Pythagorean triples, that is, triples of natural numbers (a, b, c) such that a² + b² = c². We shall do this using geometry by finding all the rational points on the unit circle. We shall use the diagram below.

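[Figure: the unit circle centred at the origin, with the point A joined by a straight line to a point P on the circle.]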

We have drawn a unit circle centre the origin. From the point (−1, 0), called A, we draw a line to any other point P on the circle.

(a) Show that any non-vertical line passing through the point A has the equation y = t(x + 1) where t is any real number.
(b) Show that this line intersects the circle at some point P on the circle, different from A, when

\[ (x, y) = \left( \frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2} \right). \]

(c) Deduce that the rational points on the circle correspond to the values of t which are rational.
(d) Put t = p/q, in its lowest terms. Deduce that all Pythagorean triples are obtained as the following (r(q² − p²), 2pqr, r(p² + q²)) where p, q, r are any integers.

10. Take any positive natural number n; so n = 1, 2, 3, ... If n is even, divide it by 2 to get n/2; if n is odd, multiply it by 3 and add 1 to obtain 3n + 1. Now repeat this process and stop only if you get 1. For example, if n = 6 you get 6, 3, 10, 5, 16, 8, 4, 2, 1. What happens if n = 11? What about n = 27? Prove that no matter what number you start with, you will always eventually reach 1.
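The process in question 10 is easy to experiment with. Here is a minimal sketch, with a hypothetical function name `collatz`, that simply follows the rule as stated. Note that the loop only finishes if the sequence really does reach 1, so running it on examples is experimentation and not a proof.

```python
def collatz(n):
    """Return the sequence n, ..., 1 produced by the rule in question 10."""
    if n < 1:
        raise ValueError("n must be a positive natural number")
    sequence = [n]
    while n != 1:
        # Halve an even number; send an odd number n to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence

print(collatz(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```

Calling `collatz(11)` or `collatz(27)` lets you explore the other two cases mentioned in the question.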

2.4 Axioms

At this point, I need to confront some potential problems with the idea of proof I have been developing. Once this is done, I will then be able to complete the proof of Proof 4.

Suppose I am trying to prove the statement S. Then I am done if I can find a theorem S₁ so that S₁ ⇒ S. But this raises the question of how I know that S₁ is a theorem. This can only be because I can find a theorem S₂ such that S₂ ⇒ S₁. There are now three possibilities:

1. At some point I find a theorem Sₙ such that S ⇒ Sₙ. This is clearly a bad thing. In trying to prove S I have in fact used S and so haven’t proved anything at all. This is an example of circular reasoning and has to be avoided. I can do this by organizing what I know in a hierarchy, so that to prove a result I am only allowed to use those theorems already proved. In this way, I can avoid going around in circles.

2. Assuming I have avoided the above pitfall, the next nasty possibility is that I get an infinite sequence of implications:

... ⇒ Sₙ ⇒ Sₙ₋₁ ⇒ ... ⇒ S₁ ⇒ S.

I never actually know that S is a theorem because it is always proved in terms of something else without end. This is also clearly a bad thing. I establish relative truth (a statement is true if another is true) but not absolute truth. I clearly don’t want this to happen. But if not, then I am led inexorably to the third possibility.

3. To prove S, I have to prove only a finite number of implications

Sₙ ⇒ Sₙ₋₁ ⇒ ... ⇒ S₁ ⇒ S.

But, if Sₙ is supposed to be a theorem then how do I know it is true if not in terms of something else, contradicting the assumption that this was supposed to be a complete argument?

I shall now delve into case (3) above in more detail, since resolving it will lead to an important insight. Maths is supposed to be about proving theorems but the analysis above has led us to the uncomfortable possibility that some things have to be accepted as true ‘because they are’, which contradicts what I went to great trouble to rubbish earlier.

Before I explain the way out of this conundrum, let me first consider an example from an apparently completely different enterprise: playing a game. To be concrete, let’s take the game of chess. Most people have learnt chess at some point even if, like me, they are not very good at it. This game consists of a board and some pieces. The pieces are of different types (kings, queens, knights, bishops, castles, pawns), each of which can be moved in different ways. To play chess means to accept the rules of chess and to move the pieces in accordance with the rules. Whether one player wins or there is a draw is also described by the rules of chess. It’s meaningless to ask whether the rules of chess are true. But a move in chess is valid, which is another way of saying true, if it is made according to those rules.

This example provides a way of understanding how maths works. Maths should be viewed as a collection of different mathematical domains each described by its own ‘rules of the game’ which in maths are termed axioms. These axioms are the basic assumptions on which the theory is built and are the building blocks of all proofs within that mathematical domain. Our goal is to prove interesting theorems from those axioms.

As an example, consider Euclidean geometry. The Greeks attributed the discovery of geometry to the Ancient Egyptians who needed it in recalculating land boundaries for the purposes of tax assessment after the yearly flood of the Nile. Thus geometry probably first existed as a collection of geometrical methods that worked: the tax was calculated, the pyramids built and everyone was happy. But it was the Ancient Greeks themselves who elevated it into a mathematical science and a model of what could be achieved in mathematics. Euclid’s book the Elements codified what was known about geometry into a handful of axioms and then showed that all of geometry could be deduced from those axioms by the use of mathematical proof. The Elements is not only the single most important mathematics book ever written but one of the most important books, full stop. Here is a list of the key axioms.

1. Two distinct points determine a unique straight line.

2. A line segment can be extended infinitely in either direction.

3. Circles can be drawn with any centre and any radius.

4. Any two right angles are equal to each other.

5. Suppose that a straight line cuts two lines l₁ and l₂. If the interior angles on the same side add up to strictly less than 180°, then if l₁ and l₂ are extended on that side they will eventually meet.

The last axiom needs a picture to illustrate what is going on.

[Figure: a straight line cutting two lines l₁ and l₂.]

In principle, all of the results you learnt in school about triangles and circles can be proved from these axioms. I say ‘in principle’ since there were a few bugs which were later fixed by a number of mathematicians, most notably David Hilbert. But this shouldn’t detract from what an enormous achievement Euclid’s book was and is. We may now finish off Proof 4: claim (1) is proved in Book I, Proposition 31, and claim (3) is proved in Book I, Proposition 29.

One way of teaching maths at university would therefore be to start with a list of axioms and start proving things. But this approach has a number of disadvantages: it is time-consuming, laborious, sometimes, even, a bit tedious, and takes a very, very long time to reach the really interesting theorems. Therefore, in this book, I shall usually base each topic on quite high-level axioms so that we can get to the interesting theorems quickly, but I shall also give pointers to readers who want to see the full axiomatic development.

Exercises 2.4

1. Hofstadter’s MU-puzzle. A string is just an ordered sequence of symbols. In this puzzle, you will construct strings using the letters M, I, U. You are given the string MI which is your only axiom. You can make new strings only by using the following rules any number of times in succession in any order:

(I) If you have a string that ends in I then you can add a U on at the end.
(II) If you have a string Mx where x is a string then you may form Mxx.
(III) If III occurs in a string then you may make a new string with III replaced by U.
(IV) If UU occurs in a string then you may erase it.

I shall write x → y to mean that y is the string obtained from the string x by applying one of the above four rules. Here are some examples:

• By rule (I), MI → MIU.
• By rule (II), MIU → MIUIU.
• By rule (III), UMIIIMU → UMUMU.
• By rule (IV), MUUUII → MUII.
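The four rules are mechanical enough to be written as a short program. The sketch below is one possible encoding, with the hypothetical function name `successors`; it returns every string y such that x → y, and it reproduces the four examples above. It only applies the rules and does not settle the puzzle.

```python
def successors(s):
    """All strings obtainable from s by one application of rules (I)-(IV)."""
    out = set()
    if s.endswith("I"):                      # rule (I): xI -> xIU
        out.add(s + "U")
    if s.startswith("M"):                    # rule (II): Mx -> Mxx
        out.add(s + s[1:])
    for i in range(len(s) - 2):              # rule (III): replace one III by U
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):              # rule (IV): erase one UU
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out

print(sorted(successors("MI")))  # ['MII', 'MIU']
```

Applying `successors` repeatedly, starting from the axiom MI, generates the strings that can be made in one step, two steps, and so on.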

The question is: can you make MU?

2.5 Mathematics and the real world

Euclidean geometry appears to be about the real world. In fact, for thousands of years this was what mathematicians believed until they discovered other geometries with different properties. On the surface of a sphere, for example, the sum of the angles in a spherical triangle will actually be bigger than 180°, the exact amount being determined by the area of the triangle. This result played an important role in surveying. But our analysis above leads us to the following conclusion:

Mathematics is about logically consistent mathematical universes.

A mathematical truth is therefore something proved in one of those mathematical universes, and is not a truth about ‘out there’. Despite this, mathematical truths do help us to understand the actual physical universe we inhabit. For example, does the geometry of the universe follow the rules of Euclidean geometry? Here is what NASA says on the basis of the Wilkinson Microwave Anisotropy Probe (WMAP): “WMAP also confirms the predictions that the amplitude of the variations in the density of the universe on big scales should be slightly larger than smaller scales, and that the universe should obey the rules of Euclidean geometry so the sum of the interior angles of a triangle add to 180 degrees.”

http://map.gsfc.nasa.gov/news/index.html

2.6 Proving something false

‘Proving a statement true’ and ‘proving a statement false’ sound similar but it turns out that ‘proving a statement false’ requires a lot less work than ‘proving a statement true’. There is an asymmetry between them. To prove a statement false all you need do is find a counterexample. Here is an example. Consider the following statement: every odd number bigger than 1 is a prime. This is false. The reason is that 9 is odd, bigger than 1, and not prime. Thus 9 is a counterexample. The number 9 here can be regarded as a witness that shows the claim to be false. To prove a statement true, you have to work hard. To prove a statement false, you only have to find one counterexample and you are done. (Though in research mathematics finding a counterexample can be a Herculean task.)
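Hunting for a counterexample is something a computer can often help with. The following sketch searches the odd numbers bigger than 1 for the first one that is not prime; the helper `is_prime` and the search bound of 100 are assumptions made purely for illustration.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Claim under test: every odd number bigger than 1 is prime.
counterexample = next(n for n in range(3, 100, 2) if not is_prime(n))
print(counterexample)  # 9
```

One witness is enough to refute the claim; by contrast, no amount of successful checking of this kind would ever prove a statement about all numbers.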

2.7 Key points

• One of the goals of this book is to introduce you to proofs. This does not mean that you will afterwards be able to do proofs. That takes time and practice.

• Initially, you should aim to understand proofs. This means seeing why a proof is true. A good test of whether you really understand a proof is whether you can explain it to someone else. It is much easier to check that a proof is correct than it is to invent the proof in the first place. Nevertheless, be warned, it can also take a long time just to understand a proof.

• I shall ask you to find some proofs for yourself. But do not expect to find them in a few minutes. Constructing proofs takes time, trial and error and, yes, luck.

• If you don’t understand the words used in a statement that you are asked to prove then you are not going to be able to prove that statement. Definitions are vitally important in mathematics.

• Every statement that you make in a proof must be justified: if it is a definition, say that it is a definition; if it is a result known to be true, that is a theorem, say that it is known to be true; if it is one of the assumptions, say that it is one of the assumptions; if it is an axiom, say that it is an axiom.

• When starting out, it is probably best to write each statement of a proof on a separate line followed by its justification.

Finally, there are one or two pieces of terminology and notation that are worth mentioning here. The conclusion of a proof is marked using the symbol □. This replaces the older use of QED. If we believe something might be true but there isn’t yet a proof we say that it is a conjecture. The things we can prove fall, roughly, into the following categories: a theorem is a major result, worthy of note; a proposition is a result, and a lemma is an auxiliary result, a tool, useful in many different places; a corollary is a result we can deduce with little or no effort from a proposition or theorem.

2.8 Mathematical creativity

Everything I have said above is true, but does need to be placed in perspective. Where do proofs come from? More to the point, where do theorems come from? Music is a useful analogy. You can learn how to write music down, but that doesn’t make you a musician. In fact, there are some talented musicians who cannot even read music. Proofs keep us honest and ground what we are doing, but what makes maths fun is that it is creative, and for creativity there are no rules. For example, in dreaming up a theorem, experimentation may well play a role. Sometimes a theorem may evolve in tandem with a proof; at other times the theorem, or more accurately the conjecture, comes first and then there is the struggle to prove it, which may take place over many generations and centuries.

2.9 Set theory: the language of mathematics

Everyday English is good at everyday jobs, but can be hopelessly imprecise where accuracy is important. To get around this, special varieties of English, little dialects, have been constructed for particular purposes. In mathematics, we use precise versions of everyday language augmented with special symbols. Part of this special language is that of set theory, invented by Georg Cantor (1845–1918) in the last quarter of the nineteenth century. This section is mainly a phrasebook of the most important terms we shall need for most of this book. I shall develop this language further when I need to when studying combinatorics. The starting point of set theory is the following pair of deceptively simple definitions:

• A set is a collection of objects which we wish to regard as a whole. The members of a set are called its elements³.

• Two sets are equal precisely when they have the same elements.

³ Strictly speaking this definition is nonsense. Why?

We often use capital letters to name sets: such as A, B, or C or fancy capital letters such as N and Z. The elements of a set are usually denoted by lower case letters. If x is an element of the set A then we write

x ∈ A

and if x is not an element of the set A then we write

x ∉ A.

A set should be regarded as a bag of elements, and so the order of the elements within the set is not important. In addition, repetition of elements is ignored.⁴

Examples 2.9.1.

1. The following sets are all equal: {a, b}, {b, a}, {a, a, b}, {a, a, a, a, b, b, b, a} because the order of the elements within a set is not important and any repetitions are ignored. Despite this it is usual to write sets without repetitions to avoid confusion. We have that a ∈ {a, b} and b ∈ {a, b} but α ∉ {a, b}.

2. The set {} is empty and is called the empty set. It is given a special symbol ∅, which is derived from the letter Ø of the Danish and Norwegian alphabets. Remember that ∅ means the same thing as {}. Take careful note that ∅ ≠ {∅}. The reason is that the empty set contains no elements whereas the set {∅} contains one element. By the way, the symbol for the empty set is different from the Greek letter phi: φ or Φ.
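Many programming languages have a built-in notion of set that behaves in just this way. As an illustration only, here is how the two examples above look using Python's `set` and `frozenset` types; the variable names are hypothetical, and `frozenset` is used because in Python only an immutable set can itself be an element of another set.

```python
# Order and repetition do not affect which set is written down.
assert {"a", "b"} == {"b", "a"} == {"a", "a", "b"}
assert "a" in {"a", "b"} and "α" not in {"a", "b"}

# frozenset is used so that a set can itself be an element of a set.
empty = frozenset()
set_containing_empty = frozenset({empty})

print(len(empty), len(set_containing_empty))   # 0 1
print(empty == set_containing_empty)           # False
```

The last two lines confirm that ∅ and {∅} are different sets: the first has no elements and the second has exactly one.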

The number of elements in a set is called its cardinality. If X is a set then |X| denotes its cardinality. A set is finite if it only has a finite number of elements, otherwise it is infinite. If a set has only finitely many elements then we might be able to list them if there aren’t too many: this is done by putting them in ‘curly brackets’ { and }. We can sometimes define infinite sets by using curly brackets but then, because we can’t list all elements in an infinite set, we use ‘...’ to mean ‘and so on in the obvious way’. This can also be used to define finite sets where there is an obvious pattern. Often, we describe a set by saying what properties an element must have to belong to the set. Thus {x: P(x)} means ‘the set of all things x which satisfy the condition P’. Here are some examples of sets defined in various ways.

⁴ If you want to take account of repetitions you have to use multisets.

Examples 2.9.2.

1. D = { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday }, the set of the days of the week. This is a small finite set and so we can conveniently list its elements.

2. M = { January, February, March, . . . , November, December }, the set of the months of the year. This is a finite set but I didn’t want to write down all the elements so I wrote ‘. . . ’ to indicate that there were other elements of the set which I was too lazy to write down explicitly but which are, nevertheless, there.

3. A = {x: x is a prime number}. I define a set by describing the properties that the elements of the set must have. Here P(x) is the statement ‘x is a prime number’ and those natural numbers x are admitted membership to the set when they are indeed prime.
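The notation {x: P(x)} has a close analogue in Python's set comprehensions. The sketch below is an illustration under the obvious limitation that a program can only list finitely many elements, so it builds the primes below 30 rather than the whole of the set A; the function name `is_prime` and the bound 30 are my own choices.

```python
def is_prime(x):
    """P(x): 'x is a prime number' (trial division, fine for small x)."""
    return x >= 2 and all(x % d != 0 for d in range(2, int(x ** 0.5) + 1))

# {x : x < 30 and P(x)}, a finite window onto the infinite set A
primes_below_30 = {x for x in range(30) if is_prime(x)}
print(sorted(primes_below_30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The condition after `if` plays the role of the entry requirement P(x).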

In this book, the following sets of numbers will play a special role. We shall use this notation throughout and so it is worthwhile getting used to it.

Examples 2.9.3.

1. The set N = {0, 1, 2, 3,...} of all natural numbers.

2. The set Z = {..., −3, −2, −1, 0, 1, 2, 3,...} of all integers. The reason Z is used to designate this set is because ‘Z’ is the first letter of the word ‘Zahl’, the German for number.

3. The set Q of all rational numbers i.e. those numbers that can be written as fractions whether positive or negative.

4. The set R of all real numbers i.e. all numbers which can be represented by decimals with potentially infinitely many digits after the decimal point.

5. The set C of all complex numbers, which I shall introduce from scratch later on.

Given a set A, a new set B can be formed by choosing elements from A to put in B. We say that B is a subset of A, which is written B ⊆ A. In mathematics, the word ‘choose’ also includes the possibility of choosing nothing and the possibility of choosing everything. In addition, there doesn’t have to be any rhyme or reason to your choices: you can pick elements ‘at random’ if you want. If B ⊆ A and A ≠ B then we say that B is a proper subset of A.

Examples 2.9.4.

1. ∅ ⊆ A for every set A, where we choose no elements from A. It is a very common mistake to forget the empty set when listing subsets of a set.

2. A ⊆ A for every set A, where we choose all the elements from A. It is a very common mistake to forget the set itself when listing subsets of a set.

3. N ⊆ Z ⊆ Q ⊆ R ⊆ C.

4. E, the set of even natural numbers, is a subset of N.

5. O, the set of odd natural numbers, is a subset of N.

6. P = {2, 3, 5, 7, 11, 13, 17, 19, 23,...}, the set of primes, is a subset of N.

7. A = {x: x ∈ R and x² = 4} which is just the set {−2, 2}.

There is a particular kind of subset that will be convenient to define now. If A and B are sets we define the set A \ B to consist of those elements of A that are not in B. Thus, in particular, A \ B ⊆ A. The operation is called relative complement. For example, N \ E = O. The set R \ Q is precisely the set of irrational numbers.

When set theory is first encountered it doesn’t look very impressive. What could you possibly do with these very simple, if not simple-minded, definitions? In fact, all of mathematics can be developed using set theory. I am going to finish off this section with a first glimpse at the power of set theory.
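The subset relation and the relative complement defined above can be tried out on finite sets. In the following sketch the set `N_window` is a hypothetical finite stand-in for N (the first twenty natural numbers), since a program cannot hold the whole of N; Python writes ⊆ as `<=`, proper subset as `<` and relative complement as `-`.

```python
N_window = set(range(20))                      # finite stand-in for N
E = {n for n in N_window if n % 2 == 0}        # even numbers in the window
O = {n for n in N_window if n % 2 == 1}        # odd numbers in the window

print(E <= N_window, O <= N_window)            # True True  (subsets)
print(set() <= E, E <= E)                      # True True  (empty set, set itself)
print(E < N_window)                            # True       (proper subset)
print(N_window - E == O)                       # True       (relative complement)
```

The second line of output illustrates the two subsets that are most often forgotten: the empty set and the set itself.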

Consider the set {a, b}. I have explained above that order doesn’t matter and so this is the same set as {b, a}. But there are many occasions where we do want order to matter. For example, in the Olympics it is important to know who came first and second in the 100m sprint, not merely that the first two over the finishing line were X and Y in alphabetical order. So we need a new notion where order does matter. It is called an ordered pair and is written (a, b), where a is called the first component and b is called the second component. The key feature of this new object is that (a, b) = (c, d) if, and only if, a = c and b = d. So, order matters. For example, the ordered pair (1, 2) is different from the ordered pair (2, 1). Furthermore, (1, 1) does not mean the same as 1 on its own. The idea of an ordered pair is a familiar one from co-ordinate geometry. We use ordered pairs of real numbers (x, y) to specify points in the plane. At first blush, set theory seems inadequate to define ordered pairs. But in fact it can. I have put the details in a box and you don’t need to read them to understand the rest of the book.

Ordered Pairs I am going to show you how sets, which don’t encode order directly, can nevertheless be used to define ordered pairs. It is an idea due to Kuratowski (1896–1980). Define

(a, b) = {{a}, {a, b}}.

We have to prove, using only this definition, that we have (a, b) = (c, d) if, and only if, a = c and b = d. The proof is essentially an exercise in special cases. I shall prove the hard direction. Suppose that

{{a}, {a, b}} = {{c}, {c, d}}.

Since {a} is an element of the lefthand side it must be an element of the righthand side. So {a} ∈ {{c}, {c, d}}. There are now two possibilities. Either {a} = {c} or {a} = {c, d}. The first case gives us that a = c, and the second case gives us that a = c = d. Since {a, b} is an element of the lefthand side it must be an element of the righthand side. So {a, b} ∈ {{c}, {c, d}}. There are again two possibilities. Either {a, b} = {c} or {a, b} = {c, d}. The first case gives us that a = b = c, and the second case gives us that (a = c and b = d) or (a = d and b = c). We therefore have the following possibilities:

• a = b = c. But then {{a}, {a, b}} = {{a}}. It follows that c = d and so a = b = c = d and, in particular, a = c and b = d.

• a = c and b = d.

• In all remaining cases, a = b = c = d and so, in particular, a = c and b = d.

We can now build sets of ordered pairs. Let A and B be sets. Define A × B, the product of A and B, to be the set

A × B = {(a, b): a ∈ A and b ∈ B}.

Example 2.9.5. Let A = {1, 2, 3} and let B = {a, b}. Then

A × B = {(1, a), (1, b), (2, a), (2, b), (3, a), (3, b)}

and

B × A = {(a, 1), (b, 1), (a, 2), (b, 2), (a, 3), (b, 3)}.

So, in particular, A × B ≠ B × A, in general.

If A = B it is natural to abbreviate A × A as A². This now agrees with the notation R² which is the set of all ordered pairs of real numbers and, geometrically, can be regarded as the real plane.

We have defined ordered pairs but there is no reason to stop with just pairs. We may also define ordered triples. This can be done by defining (x, y, z) = ((x, y), z). The key property of ordered triples is that if (a, b, c) = (d, e, f) then a = d, b = e and c = f. Given three sets A, B and C we may define their product A × B × C to be the set of all ordered triples (a, b, c) where a ∈ A, b ∈ B and c ∈ C. A good example of an ordered triple in everyday life is a date that consists of a day, a month and a year. Thus the 16th June 1904 is really an ordered triple (16, June, 1904) where we specify day, month and year in that order. If A = B = C then we write A³ rather than A × A × A. Thus the set R³ consists of all Cartesian co-ordinates (x, y, z). In general, we may define ordered n-tuples, which look like this (x₁, ..., xₙ), and products of n sets A₁ × ... × Aₙ. And if A₁ = ... = Aₙ = A then we write Aⁿ for their n-fold product.
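Both constructions above, Kuratowski's ordered pair and the product of two sets, can be explored on small examples. The sketch below is an illustration and not a replacement for the proof: the function name `kpair` and the three-element universe `U` are assumptions of mine, `frozenset` stands in for a mathematical set, and the strings "a" and "b" stand in for the elements a and b of Example 2.9.5.

```python
from itertools import product

def kpair(a, b):
    """Kuratowski's ordered pair (a, b) = {{a}, {a, b}}, built from frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# Brute-force check of the key property on a small universe:
# kpair(a, b) == kpair(c, d) exactly when a == c and b == d.
U = range(3)
key_property_holds = all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(U, repeat=4)
)
print(key_property_holds)  # True

# The product A x B of Example 2.9.5, using Python tuples as ordered pairs.
A = {1, 2, 3}
B = {"a", "b"}
AxB = set(product(A, B))
BxA = set(product(B, A))
print(len(AxB), AxB == BxA)  # 6 False
```

Checking 81 cases by machine is reassuring, but only the case analysis in the box shows that the key property holds for all a, b, c and d.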

Russell’s Paradox

There is more to sets than meets the eye. I shall now describe a famous result in the history of mathematics called Russell’s Paradox. Define the following

R = {x: x ∉ x},

in other words: the set of all sets that do not contain themselves as an element. For example, ∅ ∈ R. We now ask the question: is R ∈ R? Before resolving this question, let’s back off a bit and ask what it means for X ∈ R. From the entry requirements, we would have to show that X ∉ X. Putting X = R we deduce that R ∈ R is true only if R ∉ R. Since this is an evident contradiction, we are inclined to deduce that R ∉ R. However, if R ∉ R then in fact R satisfies the entry requirements to be an element of R and so R ∈ R. Thus exactly one of R ∈ R and R ∉ R must be true but assuming one is true implies the other is true. We therefore have an honest-to-goodness contradiction. Our only way out is to conclude that, whatever R might be, it is not a set. But this in turn contradicts my definition of a set as a collection of objects since R is a collection of objects. If you want to understand how to escape this predicament, you will have to study set theory.

Disconcerting as this might be to you, imagine how much more so it was to the mathematician Gottlob Frege (1848–1925). He was working on a book which based the development of maths on sets when he received a letter from Russell describing this paradox and undermining what Frege was attempting to achieve.

Bertrand Russell himself was an Anglo-Welsh philosopher born in 1872, when Queen Victoria still had another thirty years on the throne as ‘Queen empress’, and who died in 1970 a few months after Neil Armstrong stepped onto the moon. As a young man he made important contributions to the foundations of mathematics but in the course of his extraordinary life he found time to stand for parliament, encouraged the philosopher Ludwig Wittgenstein, received two prison sentences, won the Nobel prize for literature, was the first president of CND, and campaigned against the Vietnam war. See Russell: a very short introduction by A. C. Grayling published by OUP, 2002, for a very short introduction.

I shall conclude this section by touching on a fundamental notion of mathematics: that of a function. I shall approach it by first defining something more general. Let A and B be any sets. By definition a subset X ⊆ A × B is called a relation from A to B. To motivate this definition, and new terminology, I shall consider an example.

Example 2.9.6. Let A be the set {A(dam), B(eth), C(ate), D(ave)} of people. Let B be the set {a(pples), b(ananas), o(ranges)} of fruit. Define X to be the following set of ordered pairs

{(A, a), (A, o), (B, b), (D, a), (D, b), (D, o)} which tells us who likes which fruit. Thus, for example, Adam likes apples and oranges (but not bananas) and Cate doesn’t like any of the fruit on offer. It is pretty irresistible to represent this information by means of a directed graph, such as the one below. Clearly, such graphs can be drawn to represent any relation. The term ‘relation’ is now explained by the fact that X tells us how the elements of A are related to the elements of B. In this case, the relation is ‘likes to eat’.

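[Figure: directed graph of the relation X, with the people A, B, C, D on the left, the fruit a, o, b on the right, and an arrow from each person to each fruit they like.]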

Let X be a relation from A to B. We say that X is a function if it satisfies two additional conditions: first, for each a ∈ A there is at least one b ∈ B such that (a, b) ∈ X; second, if (a, b), (a, c) ∈ X then b = c. If we think back to the graph in our example above, then the first condition says that every element in A is at the base of an arrow, and the second condition says that each element in A is never at the base of two, or more, arrows.

Slightly different notation is used when dealing with functions. Rather than thinking of ordered pairs, we think instead of inputs and outputs. Thus a function from A to B is determined when for each a ∈ A there is associated exactly one element b ∈ B. We think of a as the input and b as the corresponding, uniquely determined, output. If we denote our function by f then we write b = f(a) or a ↦ b. Thus the corresponding relation is the set of all ordered pairs (a, f(a)) where a ∈ A. We call the set A the domain of the function and the set B the codomain of the function. We write f : A → B or $A \xrightarrow{f} B$.
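The two conditions that turn a relation into a function can be tested directly when the sets are finite. The sketch below encodes the relation of Example 2.9.6 with full names in place of the single-letter labels; the function name `is_function` and the second relation `favourite` are hypothetical, introduced only for the illustration.

```python
A = {"Adam", "Beth", "Cate", "Dave"}
B = {"apples", "bananas", "oranges"}

# The relation 'likes to eat' from Example 2.9.6, as a set of ordered pairs.
X = {
    ("Adam", "apples"), ("Adam", "oranges"),
    ("Beth", "bananas"),
    ("Dave", "apples"), ("Dave", "bananas"), ("Dave", "oranges"),
}

def is_function(relation, domain):
    """True when every element of domain is the first component of exactly one pair."""
    return all(
        sum(1 for (a, _) in relation if a == x) == 1
        for x in domain
    )

print(is_function(X, A))  # False: Cate has no arrow, Adam and Dave have several

# A hypothetical relation that is a function: each person's favourite fruit.
favourite = {("Adam", "apples"), ("Beth", "bananas"),
             ("Cate", "apples"), ("Dave", "oranges")}
print(is_function(favourite, A))  # True
```

Requiring exactly one pair for each element of the domain combines the two conditions: at least one arrow, and never two or more.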

Example 2.9.7. Here is an example of a function f : A → B. Let A be the set of all students in the lecture theatre at this time. Let B be the set of natural numbers. Then f is defined when for each student a ∈ A we associate their age f(a). We can see why this is precisely a function and not merely a relation. First, everyone has an age and, assuming they don’t lie, they have exactly one age. On the other hand, if we kept A as it is and let B be the set of nationalities then we will no longer have a function in general. Some people might be stateless, but even if we include that as a possibility in the set B, we still won’t necessarily have a function since some people own more than one passport.

Exercises 2.9

1. Let A = {♣, ♦, ♥, ♠}, B = {♠, ♦, ♣, ♥} and C = {♠, ♦, ♣, ♥, ♣, ♦, ♥, ♠}. Is it true or false that A = B and B = C? Explain.

2. Let X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Write down the following subsets of X:

(a) The subset A of even elements of X.
(b) The subset B of odd elements of X.
(c) C = {x: x ∈ X and x ≥ 6}.
(d) D = {x: x ∈ X and x > 10}.
(e) E = {x: x ∈ X and x is prime}.
(f) F = {x: x ∈ X and (x ≤ 4 or x ≥ 7)}.

3.

(a) Find all subsets of {a, b}. How many are there? Write down also the number of subsets with respectively 0, 1 and 2 elements.
(b) Find all subsets of {a, b, c}. How many are there? Write down also the number of subsets with respectively 0, 1, 2 and 3 elements.
(c) Find all subsets of the set {a, b, c, d}. How many are there? Write down also the number of subsets with respectively 0, 1, 2, 3 and 4 elements.
(d) What patterns do you notice arising from these calculations?

4. If the set A has m elements and the set B has n elements how many elements does the set A × B have?

5. If A has m elements, how many elements does the set Aⁿ have?

6. Prove that two sets A and B are equal if, and only if, A ⊆ B and B ⊆ A.

2.10 Proof by induction

This is a method of proof that, although useful, does not always deliver much insight into why something is true. The basis of this method is the following:

Let X be a subset of N that satisfies the following two conditions: first, 0 ∈ X, and second, if n ∈ X then n + 1 ∈ X. Then X = N.

This fact is called the induction principle, and can be viewed as one of the basic axioms describing the natural numbers. We may use it as a proof technique in the following way. Suppose we have an infinite number of statements S₀, S₁, S₂, ... which we want to prove. By the induction principle, applied to the set X of all n for which Sₙ is true, it is enough to do two things:

1. Show that S₀ is true.

2. Show that if Sₙ is true then Sₙ₊₁ is also true.

It will then follow that Sᵢ is true for all natural numbers i. Proofs by induction have the following script:

Base step Show that the case n = 0 holds.

Induction hypothesis (IH) Assume that the case where n = k holds.

Proof bit Now use (IH) to show that the case where n = k + 1 holds.

Conclude that the result holds for all n by the induction principle.

Example 2.10.1. Prove by induction that n3 + 2n is exactly divisible by 3 for all natural numbers n ≥ 0. Base step: when n = 0, we have that 03 + 2 · 0 = 0 which is exactly divisible by 3. Induction hypothesis: assume result is true for n = k. We prove it for n = k + 1. We need to prove that (k + 1)3 + 2(k + 1) is exactly divisible by 3 assuming only that k3 + 2k is exactly divisible by 3. We first expand (k + 1)3 + 2(k + 1) to get

k3 + 3k2 + 3k + 1 + 2k + 2.

This is equal to (k3 + 2k) + 3(k2 + k + 1) which is exactly divisible by 3 using the induction hypothesis. In practice, some simple variants of this principle are used. Rather than the whole set N, we often work with a set of the form

≥k N = N \{0, 1, . . . , k − 1} where k ≥ 1. Our induction principle is modified accordingly: a subset X of N≥k that contains k and contains n + 1 whenever it contains n must be equal to the whole of N≥k. In our script above, the base step involves checking the case where n = k. What I described above I shall call basic induction. There is also some- thing called the strong induction principle which runs as follows:

Let X be a subset of N that satisfies the following two conditions: first, 0 ∈ X and second, if {0, 1, . . . , n} ⊆ X, where n ≥ 0, then {0, 1, . . . , n + 1} ⊆ X. Then X = N.

Finally, there is the well-ordering principle that states that every non-empty subset of the natural numbers has a smallest element. Induction, strong induction and well-ordering look very different from each other. In fact, they are equivalent and all useful in proving theorems.

Proposition 2.10.2. The following are equivalent.

1. The induction principle.

2. The strong induction principle.

3. The well-ordering principle.

Proof. (1)⇒(2). I shall assume that the induction principle holds and prove that the strong induction principle holds. Let X ⊆ N be such that 0 ∈ X and if {0, 1, . . . , n} ⊆ X, where n ≥ 0, then {0, 1, . . . , n + 1} ⊆ X. We shall use induction to prove that X = N. Let Y ⊆ N consist of all natural numbers n such that {0, 1, . . . , n} ⊆ X. We have that 0 ∈ Y and we have that n + 1 ∈ Y whenever n ∈ Y. By induction, we deduce that Y = N. It follows that X = N.

(2)⇒(3). I shall assume that the strong induction principle holds and prove that the well-ordering principle holds. Let X ⊆ N be a subset that has no smallest element. I shall prove that X must be empty. Put Y = N \ X. I claim that 0 ∈ Y. If not, then 0 ∈ X and that would obviously have to be the smallest element, which is a contradiction. Suppose that {0, 1, . . . , n} ⊆ Y. Then we must have that n + 1 ∈ Y because otherwise n + 1 would be the smallest element of X. We now invoke strong induction to deduce that Y = N and so X = ∅.

(3)⇒(1). I shall assume the well-ordering principle and prove the induction principle. Let X ⊆ N be a subset such that 0 ∈ X and whenever n ∈ X then n + 1 ∈ X. Suppose that N \ X is non-empty. Then it would have a smallest element k say. Since 0 ∈ X, we have k ≥ 1. But then k − 1 ∈ X and so, by assumption, k ∈ X, which is a contradiction. Thus N \ X is empty and so X = N.

Strong induction will be used in a few places in this book but I will discuss it in more detail when needed.

Exercises 2.10

1. Prove that for each natural number n ≥ 3, we have that

\[ n^2 > 2n + 1. \]

2. Prove that for each natural number n ≥ 5, we have that

\[ 2^n > n^2. \]

3. Prove that for each natural number n ≥ 1, the number $4^n + 2$ is divisible by 3.

4. Prove that

\[ 1 + 2 + 3 + \ldots + n = \frac{n(n + 1)}{2}. \]

5. Prove that 2 + 4 + 6 + . . . + 2n = n(n + 1).

6. Prove that

\[ 1^3 + 2^3 + 3^3 + \ldots + n^3 = \left( \frac{n(n + 1)}{2} \right)^2. \]

7. Prove that a set with n ≥ 0 elements has exactly $2^n$ subsets.

Chapter 3

High-school algebra revisited

In this chapter, I will review some of the basic constructions from high-school algebra from the perspective of this book.

3.1 The rules of the game

3.1.1 The axioms

Algebra deals with the manipulation of symbols. This means that symbols are altered and combined according to certain rules. In high-school, the algebra you studied was mainly based on the properties of the real numbers. This means that when you write x you mean an unknown or yet-to-be-determined real number. In this section, I shall describe the rules, or axioms, that you use for doing algebra with real numbers. The primary operations we are interested in are addition x + y and multiplication x × y. As usual, I shall abbreviate the operation of multiplication by concatenation, which simply means we write xy. Sometimes, it is helpful to denote multiplication as follows x · y. Of course, there are two other familiar operations: subtraction and division. We shall see that these should be treated in a different way: subtraction as the inverse of addition, and division as the inverse of multiplication.

Both addition and multiplication require two inputs and then deliver one output with the inputs and outputs all being taken from the same set. They are therefore examples of what are called binary operations and are the commonest kinds of operations in algebra. For example, as we shall see later, matrix addition and matrix multiplication are both binary operations, the vector product of two vectors is a binary operation, and the intersection and union of two sets are both binary operations. A binary operation on a set X is nothing other than a function from X × X to X. I shall use ∗ to mean any binary operation defined on some specified set X. We usually write binary operations between the inputs rather than using the usual functional notation.

[Figure: the inputs a and b entering a box labelled ∗, which outputs a ∗ b.]

The two most important properties a binary operation may have are commutativity and associativity. A binary operation is commutative if a ∗ b = b ∗ a in all cases. That is, the order in which you carry out the operation is not important. Addition and multiplication of real, and as we shall see later, complex numbers are commutative. But we shall also meet binary operations that are not commutative: both matrix multiplication and vector products are examples. Commutativity is therefore not automatic. A binary operation is associative if (a ∗ b) ∗ c = a ∗ (b ∗ c) in all cases. Remember that the brackets tell you how to work out the product. Thus (a ∗ b) ∗ c means first work out a ∗ b, let’s call it d, and then work out d ∗ c. Almost all the binary operations we shall meet in this book are associative, the one important exception being the vector product. In order to show that a binary operation ∗ is associative, we have to check that all possible products (a ∗ b) ∗ c and a ∗ (b ∗ c) are equal. To show that a binary operation is not associative, we simply have to find specific values for a, b and c so that (a ∗ b) ∗ c ≠ a ∗ (b ∗ c). Here are examples of both of these possibilities.

Example 3.1.1. Let’s take the set of real numbers R and investigate a new binary operation denoted by ◦ that is defined as follows a ◦ b = a + b + ab.

We shall prove that it is associative. First, we have to understand what it is we have to show. From the definition of associativity, we have to prove that (a ◦ b) ◦ c = a ◦ (b ◦ c) for all real numbers a, b and c. To do this, we calculate first the lefthand side and then the righthand side and then verify they are equal. Because we are trying to prove a result true for all real numbers, we cannot choose specific values of a, b and c. We first calculate (a ◦ b) ◦ c. Using the axioms for real numbers, we get that (a ◦ b) ◦ c = (a + b + ab) ◦ c = (a + b + ab) + c + (a + b + ab)c which is equal to a + b + c + ab + ac + bc + abc. Now we calculate a ◦ (b ◦ c). We get that a ◦ (b ◦ c) = a ◦ (b + c + bc) = a + (b + c + bc) + a(b + c + bc) which is equal to a + b + c + ab + ac + bc + abc. We now see that we get the same answers however we bracket the product and so we have proved that the binary operation ◦ is associative.
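The calculation above can be spot-checked with exact rational arithmetic. The sketch below tests a ◦ b = a + b + ab on a finite sample of rationals, and also evaluates the two bracketings of 1, 2, 3 under the operation a ⊕ b = a² + b² of the next example. The names `circ` and `oplus` are illustrative, and a finite sample can only support associativity, never prove it; a single failing triple, on the other hand, does disprove it.

```python
from fractions import Fraction
from itertools import product


def circ(a, b):
    """The operation a ◦ b = a + b + ab."""
    return a + b + a * b


def oplus(a, b):
    """The operation a ⊕ b = a^2 + b^2."""
    return a**2 + b**2


# A small sample of rationals; Fraction keeps the arithmetic exact.
samples = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]

circ_associative_on_samples = all(
    circ(circ(a, b), c) == circ(a, circ(b, c))
    for a, b, c in product(samples, repeat=3)
)

# One counterexample is enough to show that ⊕ is not associative.
left = oplus(oplus(1, 2), 3)
right = oplus(1, oplus(2, 3))

print(circ_associative_on_samples, left, right)
```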

Example 3.1.2. Let’s take the set N and define the binary operation ⊕ as follows $a ⊕ b = a^2 + b^2$. I shall show that this binary operation is not associative. Let’s calculate first (1 ⊕ 2) ⊕ 3. By definition this is computed as follows

\[ (1 ⊕ 2) ⊕ 3 = (1^2 + 2^2) ⊕ 3 = 5 ⊕ 3 = 5^2 + 3^2 = 25 + 9 = 34. \]

Now we calculate 1 ⊕ (2 ⊕ 3) as follows

\[ 1 ⊕ (2 ⊕ 3) = 1 ⊕ (2^2 + 3^2) = 1 ⊕ (4 + 9) = 1 ⊕ 13 = 1^2 + 13^2 = 1 + 169 = 170. \]

Therefore (1 ⊕ 2) ⊕ 3 ≠ 1 ⊕ (2 ⊕ 3). It follows that the binary operation ⊕ is not associative.

We are now ready to state the algebraic axioms that form the basis of high-school algebra. We shall split them up into three groups: those dealing only with addition, those dealing only with multiplication, and finally those that deal with both operations together.

Axioms for addition

(F1) Addition is associative. Let x, y and z be any real numbers. Then (x + y) + z = x + (y + z).

(F2) There is an additive identity. The number 0 (zero) is the additive identity. This means that for any real number x we have that x + 0 = x = 0 + x. Thus adding zero to a number leaves it unchanged.

(F3) Each element has a unique additive inverse. This means that for each number x there is another number, written −x, with the property that x+(−x) = 0 = (−x)+x. The number −x is called the additive inverse of the number x.

(F4) Addition is commutative. Let x and y be any real numbers. Then x + y = y + x. The word commutative means that the order in which you add the numbers does not matter.

The first thing to understand is that none of these axioms should be surprising. They should all agree with your intuition.

Axioms for multiplication

(F5) Multiplication is associative. Let x, y and z be any real numbers. Then (xy)z = x(yz).

(F6) There is a multiplicative identity. The number 1 is the multiplicative identity. This means that for any real number x we have that 1x = x = x1.

(F7) Each non-zero number has a unique multiplicative inverse. Let x ≠ 0. Then there is a unique real number written $x^{-1}$ with the property that $x^{-1}x = 1 = xx^{-1}$. The number $x^{-1}$ is called the multiplicative inverse of x. It is, of course, the number $\frac{1}{x}$. It is very important to observe that zero does not have a multiplicative inverse.

(F8) Multiplication is commutative. Let x and y be any real numbers. Then xy = yx. Once again the word commutative means that the order in which you carry out the operations doesn’t matter. In this case, the operation is multiplication.

The axioms for multiplication are very similar to those for addition. The only real difference between them is axiom (F7). This expresses the fact that you cannot divide by zero.

Linking axioms

(F9) 0 ≠ 1.

(F10) The additive identity is a multiplicative zero. This means that 0x = 0 = x0. If you multiply any real number by 0 then you get 0.

(F11) Multiplication distributes over addition on the left and the right. There are actually two distributive laws: the left distributive law

x(y + z) = xy + xz

and the right distributive law

(y + z)x = yx + zx.

Let me come back to the omission of subtraction and division. These are not viewed as binary operations in their own right. Instead, we define a − b to mean a + (−b). Thus to subtract b means the same thing as adding −b. Likewise, we define a ÷ b, when b ≠ 0, to mean $a \times b^{-1}$. Thus to divide by b is to multiply by $b^{-1}$.

We have missed out one further ingredient in algebra, and that is the properties of equality.

Properties of equality

(E1) If a = b then c + a = c + b.

(E2) If a = b then ca = cb.

Example 3.1.3. When I talked about algebra in Chapter 1, I mentioned that the usual way of solving a linear equation in one unknown depended on the properties of real numbers. Let me now show you how we use the above axioms to solve ax + b = 0 where a ≠ 0. Throughout, I use without comment the two properties of equality I have listed above.

\begin{align*}
ax + b &= 0 \\
(ax + b) + (-b) &= 0 + (-b) && \text{by (F3)} \\
ax + (b + (-b)) &= 0 + (-b) && \text{by (F1)} \\
ax + 0 &= 0 + (-b) && \text{by (F3)} \\
ax &= 0 + (-b) && \text{by (F2)} \\
ax &= -b && \text{by (F2)} \\
a^{-1}(ax) &= a^{-1}(-b) && \text{by (F7) since } a \neq 0 \\
(a^{-1}a)x &= a^{-1}(-b) && \text{by (F5)} \\
1x &= a^{-1}(-b) && \text{by (F7)} \\
x &= a^{-1}(-b) && \text{by (F6)}
\end{align*}

I don’t propose that you go into quite such gory detail when solving equations, but I wanted to show you what actually lay behind the rules that you might have been taught at school.

Example 3.1.4. We can use our axioms to prove that −1 × −1 = 1, something which is hard to understand in any other way. By definition, −1 is the additive inverse of 1. This means that 1 + (−1) = 0. Let us calculate (−1)(−1) − 1. We have that

\begin{align*}
(-1)(-1) - 1 &= (-1)(-1) + (-1) && \text{by definition of subtraction} \\
&= (-1)(-1) + (-1)1 && \text{since 1 is the multiplicative identity} \\
&= (-1)[(-1) + 1] && \text{by the left distributivity law} \\
&= (-1)0 && \text{by properties of additive inverses} \\
&= 0 && \text{by properties of zero}
\end{align*}

Hence (−1)(−1) = 1. In other words, the result follows from the usual rules of algebra.

My final example explains the reason for the prohibition about dividing by zero.

Example 3.1.5. The following fallacious ‘proof’ shows that 1 = 2.

1. Let a = b.

2. Then $a^2 = ab$ when we multiply both sides by a.

3. Now add $a^2$ to both sides to get $2a^2 = a^2 + ab$.

4. Subtract 2ab from both sides to get $2a^2 - 2ab = a^2 + ab - 2ab$.

5. Thus $2(a^2 - ab) = a^2 - ab$.

6. We deduce that 2 = 1 by cancelling.

The source of the problem is in passing from line (5) to line (6). Since a = b, the common factor $a^2 - ab$ is zero, so the cancellation is in fact a division by zero.
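The faulty step can be made concrete with a small numerical sketch. The value 7/3 chosen for a and b is arbitrary and purely illustrative; any common value behaves the same way.

```python
from fractions import Fraction

# Step 1 of the fallacious argument: a = b (the value itself is arbitrary).
a = b = Fraction(7, 3)

# Step 5: 2(a^2 - ab) = a^2 - ab. Both sides are perfectly well defined.
lhs = 2 * (a**2 - a * b)
rhs = a**2 - a * b

# The factor that step 6 tries to cancel.
common_factor = a**2 - a * b

# Attempting the cancellation means dividing by that factor.
try:
    lhs / common_factor
    cancellation_possible = True
except ZeroDivisionError:
    cancellation_possible = False

print(lhs == rhs, common_factor, cancellation_possible)
```

Line (5) is a true statement, namely 0 = 0; it is only the cancellation in line (6) that fails.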

3.1.2 Indices

We usually write $a^2$ rather than aa, and $a^3$ instead of aaa. In this section, I want to review the meaning of algebraic expressions such as $a^{r/s}$ where $\frac{r}{s}$ is any rational number. Our starting point is a result that I would encourage you to assume as an axiom at a first reading. I have included the proof to show you a more sophisticated example of proof by induction.

Lemma 3.1.6 (Generalized associativity). Let ∗ be any binary operation defined on a set X. If ∗ is associative then however you bracket a product such as x1 ∗ ... ∗ xn you will always get the same answer.

Proof. If x1, x2, ··· , xn are elements of the set X then one particular bracketing will play an important role in our proof

x1 ∗ (x2 ∗ (··· (xn−1 ∗ xn) ··· ))

which we write as [x1x2 . . . xn].

The proof is by strong induction on the length n of the product in question. The base case is where n = 3 and is just an application of the associative law. Assume that n ≥ 4 and that for all k < n, all bracketings of a sequence of k elements of X lead to the same answer. This is therefore the induction hypothesis for strong induction. Let X denote any properly bracketed expression obtained by inserting brackets into the sequence x1, x2, ··· , xn. Observe that the computation of such a bracketed product involves computing n − 1 products. This is because at each step we can only compute the product of adjacent letters xi ∗ xi+1. Thus at each step of our calculation we reduce the number of letters by one until there is only one letter left. However the expression may be bracketed, the final step in the computation will be of the form Y ∗ Z, where Y and Z will each have arisen from properly bracketed expressions. In the case of Y it will involve a bracketing of some sequence x1, x2, . . . , xr, and for Z the sequence xr+1, xr+2, . . . xn for some r such that 1 ≤ r ≤ n − 1. Since Y involves a product of length r < n, we may assume by the induction hypothesis that Y = [x1x2 . . . xr]. Observe that [x1x2 . . . xr] = x1 ∗ [x2 . . . xr]. Hence by associativity,

X = Y ∗ Z = (x1 ∗ [x2 . . . xr]) ∗ Z = x1 ∗ ([x2 . . . xr] ∗ Z).

But [x2 . . . xr] ∗ Z is a properly bracketed expression of length n − 1 in x2, ··· , xn and so using the induction hypothesis must equal [x2x3 . . . xn]. It follows that X = [x1x2 . . . xn]. We have therefore shown that all possible bracketings yield the same result in the presence of associativity. We illustrate a special case of the above proof in the example below.
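The lemma can also be spot-checked by brute force for short products. The sketch below enumerates every bracketing of a sequence by choosing the position of the final product Y ∗ Z, exactly as in the proof, and collects the set of resulting values. The helper name `all_bracketings`, the two sample operations `circ` (a + b + ab) and `oplus` (a² + b²), and the sample inputs are illustrative assumptions.

```python
from fractions import Fraction


def all_bracketings(xs, op):
    """Set of values of every bracketing of the sequence xs under op.

    The order of the elements is kept; only the brackets vary. The last
    product computed is Y * Z with Y built from xs[:r] and Z from xs[r:].
    """
    if len(xs) == 1:
        return {xs[0]}
    values = set()
    for r in range(1, len(xs)):
        for y in all_bracketings(xs[:r], op):
            for z in all_bracketings(xs[r:], op):
                values.add(op(y, z))
    return values


def circ(a, b):
    """Associative operation a + b + ab."""
    return a + b + a * b


def oplus(a, b):
    """Non-associative operation a^2 + b^2."""
    return a**2 + b**2


xs = (Fraction(1, 2), Fraction(-2), Fraction(3), Fraction(5, 7), Fraction(4))
circ_values = all_bracketings(xs, circ)          # 14 bracketings of 5 elements
oplus_values = all_bracketings((1, 2, 3), oplus)  # 2 bracketings of 3 elements

print(len(circ_values), sorted(oplus_values))
```

For the associative operation every bracketing collapses to a single value, while the non-associative one already produces two different values for three elements.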

Example 3.1.7. Take n = 5. Then the notation [x1x2x3x4x5] introduced in the above proof means x1 ∗ (x2 ∗ (x3 ∗ (x4 ∗ x5))). Consider the product ((x1 ∗ x2) ∗ x3) ∗ (x4 ∗ x5). Here we have Y = (x1 ∗ x2) ∗ x3 and Z = x4 ∗ x5. By associativity Y = x1 ∗ (x2 ∗ x3). Thus Y ∗ Z = (x1 ∗ (x2 ∗ x3)) ∗ (x4 ∗ x5). But this is equal to x1 ∗ ((x2 ∗ x3) ∗ (x4 ∗ x5)) again by associativity. By the induction hypothesis (x2 ∗ x3) ∗ (x4 ∗ x5) = x2 ∗ (x3 ∗ (x4 ∗ x5)), and so

((x1 ∗ x2) ∗ x3) ∗ (x4 ∗ x5) = x1 ∗ (x2 ∗ (x3 ∗ (x4 ∗ x5))),

as required.

If a binary operation is associative then the above lemma tells us that computing products of elements is straightforward because we never have to worry about how to evaluate them as long as we maintain the order of the elements.

We now consider a special case of this result. Let a be any real number. Define the nth power $a^n$ of a, where n is a natural number, as follows: $a^1 = a$ and $a^n = a a^{n-1}$ for any n ≥ 2. Generalized associativity tells us that $a^n$ can in fact be calculated in any way we like because we shall always obtain the same answer. The following result should be familiar. I shall ask you to prove it in the exercises.

Lemma 3.1.8 (Laws of exponents). Let m, n ≥ 1 be any natural numbers.

1. $a^{m+n} = a^m a^n$.

2. $(a^m)^n = a^{mn}$.

It follows from the above lemma that powers of the same element a commute with one another: $a^m a^n = a^n a^m$ as both products equal $a^{m+n}$. Our goal now is to define what $a^m$ means when m is an arbitrary rational number. We shall be guided by the requirement that the above laws of exponents should continue to hold. We may extend the laws of exponents to allow m or n to be 0. The only way to do this is to define $a^0 = 1$, where 1 is the identity and a ≠ 0.

An extreme case! What about $0^0$? This is a can of worms. For this book, it is probably best to define $0^0 = 1$.

We have explained what $a^n$ means when n is positive but what can we say when the exponent is negative? In other words, what does $a^{-n}$ mean? We assume that the rules above still apply. Thus whatever $a^{-n}$ means we should have that $a^{-n} a^n = a^0 = 1$. It follows that $a^{-n} = \frac{1}{a^n}$. With this interpretation we have defined $a^n$ for all integers n.

We now investigate what $a^{1/n}$ should mean. If the laws of exponents are to continue holding, then $(a^{1/n})^n = a^1 = a$. It follows that $a^{1/n} = \sqrt[n]{a}$. We may now calculate $a^{r/s}$: it is equal to

\[ a^{r/s} = \left( \sqrt[s]{a} \right)^r. \]

How do we calculate $(ab)^n$? This is just ab times itself n times. But the order in which we multiply a’s and b’s doesn’t matter and so we can arrange all the a’s to the front. Thus $(ab)^n = a^n b^n$. We also have similar results for addition. We define 2x = x + x and nx = x + ... + x where the x occurs n times. We have 1x = x and 0x = 0.

Let $\{a_1, \ldots, a_n\}$ be a set of n elements. If we write them all in some order $a_{i_1}, \ldots, a_{i_n}$ then we have what is called a permutation of the elements. The following lemma can be treated as an axiom and the proof omitted until later.

Lemma 3.1.9 (Generalized commutativity). Let ∗ be an associative and commutative binary operation on a set X. Let $a_1, \ldots, a_n$ be any n elements of X. Then

\[ a_1 ∗ \ldots ∗ a_n = a_{i_1} ∗ \ldots ∗ a_{i_n}. \]

Proof. First prove by induction the result that

\[ a_1 ∗ \ldots ∗ a_n ∗ b = b ∗ a_1 ∗ \ldots ∗ a_n. \]

Let $a_1, \ldots, a_n, a_{n+1}$ be n + 1 elements. Consider the product $a_{i_1} ∗ \ldots ∗ a_{i_n} ∗ a_{i_{n+1}}$. Suppose that $a_{n+1} = a_{i_r}$. Then

\[ a_{i_1} ∗ \ldots ∗ a_{i_r} ∗ \ldots ∗ a_{i_n} ∗ a_{i_{n+1}} = (a_{i_1} ∗ \ldots ∗ a_{i_{r-1}} ∗ a_{i_{r+1}} ∗ \ldots ∗ a_{i_{n+1}}) ∗ a_{n+1} \]

where the expression in the brackets is a product of some permutation of the elements $a_1, \ldots, a_n$. We have used here our result above. But by the induction hypothesis, we may write the bracketed product as $a_1 ∗ \ldots ∗ a_n$.
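A brute-force check of generalized commutativity for a handful of elements might look as follows. The operation `circ` (a + b + ab, which is both associative and commutative) and the sample elements are illustrative assumptions; subtraction is included only as a contrasting operation that is neither associative nor commutative.

```python
from fractions import Fraction
from functools import reduce
from itertools import permutations


def circ(a, b):
    """An associative and commutative operation: a + b + ab."""
    return a + b + a * b


elements = [Fraction(1, 2), Fraction(-3), Fraction(2, 5), Fraction(7)]

# Product of the elements in their given order, bracketed from the left.
reference = reduce(circ, elements)

# Every one of the 4! = 24 orderings should give the same value.
all_orders_agree = all(
    reduce(circ, ordering) == reference for ordering in permutations(elements)
)

# Contrast: left-to-right subtraction depends on the ordering.
subtraction_values = {
    reduce(lambda x, y: x - y, ordering) for ordering in permutations([1, 2, 3])
}

print(all_orders_agree, sorted(subtraction_values))
```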

3.1.3 Sigma notation

At this point, it is appropriate to introduce some useful notation. Let $a_1, a_2, \ldots, a_n$ be n numbers. Their sum is $a_1 + a_2 + \ldots + a_n$ and because of generalized associativity we don’t have to worry about brackets. We now abbreviate this as

\[ \sum_{i=1}^{n} a_i, \]

where Σ is Greek ‘S’ and stands for Sum. The letter i is called a subscript. The equality i = 1 tells us that we start the value of i at 1. The n above the Σ tells us that we end the value of i at n. Although I have started the sum at 1, I could, in other circumstances, have started at 0, or any other appropriate number. This notation is very useful and can be manipulated using the rules above. If 1 < s < n, then we can write

\[ \sum_{i=1}^{n} a_i = \sum_{i=1}^{s} a_i + \sum_{i=s+1}^{n} a_i. \]

If b is any number then

\[ b \left( \sum_{i=1}^{n} a_i \right) = \sum_{i=1}^{n} b a_i \]

is the generalized distributivity law that you are asked to prove in the exercises. These uses of sigma-notation shouldn’t cause any problems.
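The two manipulations above translate directly into code. In the sketch below the helper `sigma(lo, hi)` plays the role of the sum of $a_i$ for i from lo to hi; the helper name, the sample numbers $a_i = i/(i+1)$ and the multiplier b are all illustrative assumptions.

```python
from fractions import Fraction

# Sample numbers a_1, ..., a_8 (stored 0-based, so a_i is a[i - 1]).
a = [Fraction(k, k + 1) for k in range(1, 9)]
n = len(a)


def sigma(lo, hi):
    """Sum of a_i for i = lo, ..., hi (1-based, inclusive)."""
    return sum(a[i - 1] for i in range(lo, hi + 1))


# Splitting the sum at every s with 1 < s < n.
split_ok = all(sigma(1, n) == sigma(1, s) + sigma(s + 1, n) for s in range(2, n))

# Generalized distributivity: b times the sum equals the sum of the b * a_i.
b = Fraction(-5, 3)
distributive_ok = b * sigma(1, n) == sum(b * a[i - 1] for i in range(1, n + 1))

print(split_ok, distributive_ok)
```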

The most complicated use of Σ-notation arises when we have to sum up what is called an array of numbers $a_{ij}$ where 1 ≤ i ≤ m and 1 ≤ j ≤ n. This arises in matrix theory, for example. For concreteness, I shall give the example where m = 3 and n = 4. We can therefore think of the numbers $a_{ij}$ as being arranged in a 3 × 4 array as follows:

\[ \begin{array}{cccc}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{array} \]

Observe that the first subscript tells you the row and the second subscript tells you the column. Thus $a_{23}$ is the number in the second row and the third column. Now we can add these numbers up in two different ways getting the same answer in both cases. The first way is to add the numbers up along the rows. So, we calculate the following sums

\[ \sum_{j=1}^{4} a_{1j}, \quad \sum_{j=1}^{4} a_{2j}, \quad \sum_{j=1}^{4} a_{3j}. \]

We then add up these three numbers

\[ \sum_{j=1}^{4} a_{1j} + \sum_{j=1}^{4} a_{2j} + \sum_{j=1}^{4} a_{3j} = \sum_{i=1}^{3} \left( \sum_{j=1}^{4} a_{ij} \right). \]

The second way is to add the numbers up along the columns. So, we calculate the following sums

\[ \sum_{i=1}^{3} a_{i1}, \quad \sum_{i=1}^{3} a_{i2}, \quad \sum_{i=1}^{3} a_{i3}, \quad \sum_{i=1}^{3} a_{i4}. \]

We then add up these four numbers

\[ \sum_{i=1}^{3} a_{i1} + \sum_{i=1}^{3} a_{i2} + \sum_{i=1}^{3} a_{i3} + \sum_{i=1}^{3} a_{i4} = \sum_{j=1}^{4} \left( \sum_{i=1}^{3} a_{ij} \right). \]

The fact that

\[ \sum_{i=1}^{3} \left( \sum_{j=1}^{4} a_{ij} \right) = \sum_{j=1}^{4} \left( \sum_{i=1}^{3} a_{ij} \right) \]

is a consequence of the generalized commutativity law that you are asked to prove in the exercises. We therefore have in general that

\[ \sum_{i=1}^{m} \left( \sum_{j=1}^{n} a_{ij} \right) = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} a_{ij} \right). \]
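Summing an array by rows and by columns can be sketched in a few lines. The 3 × 4 array of integers below is an arbitrary illustration and is not taken from the text; indices in the code are 0-based, so `a[i][j]` corresponds to $a_{i+1,\,j+1}$.

```python
# An arbitrary 3 x 4 array (rows are the inner lists).
a = [
    [2, -1, 0, 5],
    [7, 3, -4, 1],
    [6, 6, 2, -8],
]
m, n = 3, 4

# First way: sum along each row, then add the three row sums.
row_sums = [sum(a[i][j] for j in range(n)) for i in range(m)]
by_rows = sum(row_sums)

# Second way: sum down each column, then add the four column sums.
col_sums = [sum(a[i][j] for i in range(m)) for j in range(n)]
by_cols = sum(col_sums)

print(row_sums, col_sums, by_rows, by_cols)
```

Both totals agree, as the interchange of the two summations predicts.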

3.1.4 Infinite sums

What I have defined so far are finite sums and form part of algebra. There are also infinite sums

\[ \sum_{i=1}^{\infty} a_i \]

which form part of analysis, the subject that provides the foundations for calculus. There is one place where we use infinite sums in everyday life, and that is in the decimal representations of numbers. Thus the fraction $\frac{1}{3}$ can be written as 0·3333 . . . and this is in fact an infinite sum: it means the infinite sum

\[ \sum_{i=1}^{\infty} \frac{3}{10^i}. \]

But in general infinite sums are problematic. For example, consider the infinite sum

\[ S = \sum_{i=1}^{\infty} (-1)^{i+1}. \]

So, this is just

S = 1 − 1 + 1 − 1 + . . .

What is S? Your first instinct might be to say 0 because

S = (1 − 1) + (1 − 1) + . . .

But it could equally well be 1 calculated as follows

S = 1 + (−1 + 1) + (−1 + 1) + . . .

In fact, it could even be $\frac{1}{2}$ since S + S = 1 and so $S = \frac{1}{2}$. There is clearly something seriously awry here, and it is that infinite sums have to be handled very carefully if they are to make sense. Just how is the business of analysis and won’t be an issue in this book.

Warning! ∞ is not a number. It simply tells us to keep adding on terms for increasing values of i without end so we never write $\frac{3}{10^{\infty}}$.
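The contrast between the two infinite sums above shows up already in their finite partial sums, which are ordinary algebra. The sketch below computes a few of them exactly; the function names are illustrative, and looking at partial sums is the standard approach of analysis rather than something developed in this book.

```python
from fractions import Fraction


def decimal_partial_sum(n):
    """3/10 + 3/100 + ... + 3/10^n, i.e. 0.33...3 with n threes."""
    return sum(Fraction(3, 10**i) for i in range(1, n + 1))


def alternating_partial_sum(n):
    """1 - 1 + 1 - ... with n terms."""
    return sum((-1) ** (i + 1) for i in range(1, n + 1))


# Distance from 1/3 shrinks by a factor of 10 with every extra term.
gaps = [Fraction(1, 3) - decimal_partial_sum(n) for n in range(1, 6)]

# The partial sums of 1 - 1 + 1 - ... never settle down.
alternating = [alternating_partial_sum(n) for n in range(1, 9)]

print(gaps)
print(alternating)
```

The first list of gaps tends to zero, which is the precise sense in which 0·3333 . . . equals one third; the second list keeps jumping between 1 and 0.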

Exercises 3.1

1. Prove the following identities using the axioms introduced.

(a) $(a + b)^2 = a^2 + 2ab + b^2$.
(b) $(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$.
(c) $a^2 - b^2 = (a + b)(a - b)$.
(d) $(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$.

2. Calculate the following.

(a) $2^3$.
(b) $2^{1/3}$.
(c) $2^{-4}$.
(d) $2^{-3/2}$.

3. Assume that $a_{ij}$ are assigned the following values

\[ \begin{array}{llll}
a_{11} = 1 & a_{12} = 2 & a_{13} = 3 & a_{14} = 4 \\
a_{21} = 5 & a_{22} = 6 & a_{23} = 7 & a_{24} = 8 \\
a_{31} = 9 & a_{32} = 10 & a_{33} = 11 & a_{34} = 12
\end{array} \]

Calculate the following sums.

(a) $\sum_{i=1}^{3} a_{i2}$.
(b) $\sum_{j=1}^{4} a_{3j}$.
(c) $\sum_{i=1}^{3} \left( \sum_{j=1}^{4} a_{ij}^2 \right)$.

4. Let a, b, c ∈ R. If ab = ac is it true that b = c? Explain.

5. Laws of exponents.

(a) Prove by induction that $a^{m+n} = a^m a^n$. To do this, fix m and then prove the result by induction on n. Deduce that it holds for all m.
(b) Prove by induction that $(a^m)^n = a^{mn}$. To do this, fix m and then prove the result by induction on n. Deduce that it holds for all m.

6. Prove by induction that the left generalized distributivity law holds

a(b1 + b2 + b3 + ... + bn) = ab1 + ab2 + ab3 + ... + abn,

for any n ≥ 2.

3.2 Solving quadratic equations

The previous section might have given the impression that algebraic calculations are routine. In fact, once you pass beyond linear equations, they usually require good ideas. The first place where a good idea is needed is in solving quadratic equations. Quadratic equations were solved by the Babylonians and the Egyptians and are dealt with in all school algebra courses. I have included them here because I want to show you that you don’t have to remember a formula to solve such equations; what you have to remember is a method.

Let’s recall some definitions. An expression of the form

\[ ax^2 + bx + c \]

where a, b, c are numbers and a ≠ 0 is called a quadratic polynomial or a polynomial of degree 2. The numbers a, b, c are called the coefficients of the quadratic. A quadratic where a = 1 is said to be monic. A number r such that

\[ ar^2 + br + c = 0 \]

is called a root of the polynomial. The problem of finding all the roots of a quadratic is called solving the quadratic. Usually this problem is stated in the form: ‘solve the quadratic equation $ax^2 + bx + c = 0$’. Equation because we have set the polynomial equal to zero. I shall now show you how to solve a quadratic equation without having to remember a formula. Observe first that if

\[ ax^2 + bx + c = 0 \]

then

\[ x^2 + \frac{b}{a}x + \frac{c}{a} = 0. \]

Thus it is enough to find the roots of monic quadratics. We shall solve this equation by trying to do the following: write $x^2 + \frac{b}{a}x$ as a perfect square plus a number. This will turn out to be the crux of solving the quadratic. We shall illustrate our construction by using some diagrams. First, we represent geometrically the expression $x^2 + \frac{b}{a}x$.

[Figure: a square of side x together with a red rectangle whose sides are labelled x and b/a.]

Now cut the red rectangle into two pieces along the dotted line and rearrange them as shown below.

[Figure: the rearranged pieces, with sides labelled x and b/2a, and a small dotted square in the corner.]

It is now geometrically obvious that if we add in the small dotted square, we get a new bigger square. This explains why the procedure is called completing the square. We now express in algebraic terms what these diagrams suggest.

\[ x^2 + \frac{b}{a}x = \left( x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} \right) - \frac{b^2}{4a^2} = \left( x + \frac{b}{2a} \right)^2 - \frac{b^2}{4a^2}. \]

We therefore have that

\[ x^2 + \frac{b}{a}x = \left( x + \frac{b}{2a} \right)^2 - \frac{b^2}{4a^2}. \]

Look carefully at what we have done here: we have rewritten the lefthand side as a perfect square (the first term on the righthand side) plus a number (the second term on the righthand side). It follows that

\[ x^2 + \frac{b}{a}x + \frac{c}{a} = \left( x + \frac{b}{2a} \right)^2 - \frac{b^2}{4a^2} + \frac{c}{a} = \left( x + \frac{b}{2a} \right)^2 + \frac{4ac - b^2}{4a^2}. \]

Setting the last expression equal to zero and rearranging, we get

\[ \left( x + \frac{b}{2a} \right)^2 = \frac{b^2 - 4ac}{4a^2}. \]

Now take square roots of both sides, remembering that a non-zero number has two square roots:

\[ x + \frac{b}{2a} = \pm \sqrt{\frac{b^2 - 4ac}{4a^2}} \]

which of course simplifies to

\[ x + \frac{b}{2a} = \pm \frac{\sqrt{b^2 - 4ac}}{2a}. \]

Thus

\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}, \]

the usual formula for finding the roots of a quadratic.

Example 3.2.1. Solve the quadratic equation

\[ 2x^2 - 5x + 1 = 0 \]

by completing the square. Divide through by 2 to make the quadratic monic giving

\[ x^2 - \frac{5}{2}x + \frac{1}{2} = 0. \]

We now want to write

\[ x^2 - \frac{5}{2}x \]

as a perfect square plus a number. We get

\[ x^2 - \frac{5}{2}x = \left( x - \frac{5}{4} \right)^2 - \frac{25}{16}. \]

Thus our quadratic becomes

\[ \left( x - \frac{5}{4} \right)^2 - \frac{25}{16} + \frac{1}{2} = 0. \]

Rearranging and taking roots gives us

\[ x = \frac{5}{4} \pm \frac{\sqrt{17}}{4} = \frac{5 \pm \sqrt{17}}{4}. \]

We now check our answer by substituting each of our two roots back into the original quadratic and ensuring that we get zero in both cases.

For the quadratic equation

\[ ax^2 + bx + c = 0 \]

the number $D = b^2 - 4ac$, called the discriminant of the quadratic, plays an important role.

• If D > 0 then the quadratic equation has two distinct real solutions.

• If D = 0 then the quadratic equation has one real root repeated. In this case, the monic quadratic $x^2 + \frac{b}{a}x + \frac{c}{a}$ is the perfect square $\left( x + \frac{b}{2a} \right)^2$.

• If D < 0 then we shall see that the quadratic equation has two complex roots which are complex conjugate to each other. This is called the irreducible case.
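The method of completing the square, together with the three cases for the discriminant, can be sketched as a short program. This is a minimal floating-point illustration rather than a robust solver: the function names are assumptions, `cmath.sqrt` is used so that the case D < 0 yields complex roots, and the test quadratics are arbitrary.

```python
import cmath


def discriminant(a, b, c):
    """D = b^2 - 4ac."""
    return b * b - 4 * a * c


def solve_by_completing_square(a, b, c):
    """Roots of ax^2 + bx + c = 0 (a != 0) from (x + b/2a)^2 = (b^2 - 4ac)/4a^2."""
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    shift = b / (2 * a)                        # the b/2a inside the square
    rhs = (b * b - 4 * a * c) / (4 * a * a)    # right-hand side after rearranging
    root = cmath.sqrt(rhs)                     # one of the two square roots
    return (-shift + root, -shift - root)


def evaluate(a, b, c, x):
    """Value of ax^2 + bx + c at x, used to check the answers."""
    return a * x * x + b * x + c


# D > 0: two distinct real roots (the quadratic 2x^2 - 5x + 1).
roots = solve_by_completing_square(2, -5, 1)
residuals = [abs(evaluate(2, -5, 1, x)) for x in roots]

# D < 0: two complex roots that are conjugate to each other (x^2 + x + 1).
r1, r2 = solve_by_completing_square(1, 1, 1)

print(discriminant(2, -5, 1), roots, residuals)
print(discriminant(1, 1, 1), r1, r2)
```

Substituting the computed roots back into the quadratic, as the residuals do, is the check recommended in the worked example.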

If we put y = ax2 + bx + c then we may draw the graph of this equation. The roots of the original quadratic therefore correspond to the points where this graph crosses the x-axis. The diagrams below illustrate the three cases that can arise.

[Figure: three graphs of $y = ax^2 + bx + c$, illustrating the cases D > 0, D = 0 and D < 0.]

Exercises 3.2

1. Calculate the discriminants of the following quadratics and so determine whether they have two distinct roots, or repeated roots, or no real roots.

(a) $x^2 + 6x + 5$.
(b) $x^2 - 4x + 4$.
(c) $x^2 - 2x + 5$.

2. Solve the following quadratic equations by completing the square. Check your answers.

(a) $x^2 + 10x + 16 = 0$.
(b) $x^2 + 4x + 2 = 0$.
(c) $2x^2 - x - 7 = 0$.

3. I am thinking of two numbers x and y. I tell you their sum a and their product b. What are x and y in terms of a and b?

4. Let $p(x) = x^2 + bx + c$ be a monic quadratic with roots $x_1$ and $x_2$. Express the discriminant of p(x) in terms of $x_1$ and $x_2$.

5. This question is an interpretation of part of Book X of Euclid. We shall be interested in numbers of the form $a + \sqrt{b}$ where a and b are rational and b > 0 where $\sqrt{b}$ is irrational. (Remember that irrational means not rational.)

(a) If $\sqrt{a} = b + \sqrt{c}$ where $\sqrt{c}$ is irrational then b = 0.
(b) If $a + \sqrt{b} = c + \sqrt{d}$ where a and c are rational and $\sqrt{b}$ and $\sqrt{d}$ are irrational then a = c and b = d.
(c) Prove that the square roots of $a + \sqrt{b}$ have the form $\pm(\sqrt{x} + \sqrt{y})$.

3.3 Order

In addition to algebraic operations, the real numbers are also ordered: we can always say of two real numbers whether they are equal or whether one of them is bigger than the other. I shall write down first the axioms for order that hold both for rational and real numbers. The following notation is important. If a ≤ b and a ≠ b then we write a < b and say that a is strictly less than b.

Axioms for order

(O1) For every element a ≤ a.

(O2) If a ≤ b and b ≤ a then a = b.

(O3) If a ≤ b and b ≤ c then a ≤ c.

(O4) Given any two elements a and b then either a ≤ b or b ≤ a or a = b.

If a > 0 then we say that it is positive and if a < 0 we say it is negative.

(O5) If a ≤ b and c ≤ d then a + c ≤ b + d.

(O6) If a ≤ b and c is positive then ac ≤ bc.

The only axiom that you really have to watch is (O6). Here is an example of a proof using these axioms.

Example 3.3.1. We prove that a ≤ b if, and only if, b − a is non-negative, that is, 0 ≤ b − a. Since this statement involves an ‘if, and only if’ there are, as usual, two statements to be proved. Suppose first that a ≤ b. By axiom (O5), we may add −a to both sides to get a + (−a) ≤ b + (−a). But a + (−a) = 0 and b + (−a) = b − a, by definition. It follows that 0 ≤ b − a and so b − a is non-negative. Now we prove the converse. Suppose that b − a is non-negative. Then by definition 0 ≤ b − a. Also by definition, b − a = b + (−a). Thus 0 ≤ b + (−a). By axiom (O5), we may add a to both sides to get 0 + a ≤ (b + (−a)) + a. But 0 + a = a and (b + (−a)) + a quickly simplifies to b. We have therefore proved that a ≤ b, as required.

Exercises 3.3

1. Prove that between any two distinct rational numbers there is another rational number.

2. Prove the following using the axioms.

(a) If a ≤ b then −b ≤ −a.
(b) $a^2$ is positive for all a ≠ 0.
(c) If 0 < a < b then $0 < b^{-1} < a^{-1}$.

3.4 The real numbers

The axioms I have introduced so far apply equally well to both the rational numbers Q and the real numbers R. But we have seen that√ although Q ⊆ R the two sets are not equal because we have proved that 2 ∈/ Q. In fact, we shall see later that there are many more irrational numbers than there are rational numbers. In this section, I shall explain the fundamental difference between rationals and reals. This material will not be needed in the rest of this book instead its rˆoleis to connect with the foundations of calculus, that is, with analysis. It is convenient to write K to mean either Q or R in what follows because I want to make the same definitions for both sets. A non-empty subset A ⊆ K is said to be bounded above if there is some number b ∈ K so that for all a ∈ A we have that a ≤ b. For example, the set A = {2n : n ≥ 0} is not bounded above since its elements getter bigger and bigger without limit. On 1 n the other hand, the set B = { 2 : n ≥ 0} is bounded above, for example by 1. A non-empty subset A as above is said to have a least upper bound if you can find a number a ∈ K with the following two properties: first of all, a but be an upper bound for A and second of all if b is any upper bound for 78 CHAPTER 3. HIGH-SCHOOL ALGEBRA REVISITED

A then a ≤ b. We shall now apply these definitions to a result we obtained earlier. Let 2 A = {a: a ∈ Q and a ≤ 2} and let 2 B = {a: a ∈ R and a ≤ 2}. 1 Then A ⊆ Q and B ⊆ R. Both sets are bounded above: the number 1 2 , for example, works in both case. However, I shall prove that the subset A does not have a least upper bound, whereas the subset B does. Let’s consider subset A first. Suppose that r were a least upper bound. 2 I claim that√r would have to equal 2 which is impossible because we have proved that 2 is irrational. 2 Suppose first that r < 2. Then I claim there is a rational number r1 such 2 that r < r1 and r1 < 2. Choose any rational number h such that 0 < h < 1 and 2 − r2 h < . 2r + 1 2 Put r1 = r + h. By construction r1 > r. We calculate r1 as follows

r_1^2 = r^2 + 2rh + h^2 = r^2 + (2r + h)h < r^2 + (2r + 1)h < r^2 + (2 − r^2) = 2.

Thus r_1^2 < 2 as claimed. But this contradicts the fact that r is an upper bound of the set A, since r_1 belongs to A and r_1 > r.

Suppose now that 2 < r^2. Then I claim that I can find a rational number r_1 such that r_1 < r and 2 < r_1^2. Put h = (r^2 − 2)/(2r) and define r_1 = r − h. Clearly, 0 < r_1 < r. We calculate r_1^2 as follows

r_1^2 = r^2 − 2rh + h^2 = r^2 − (r^2 − 2) + h^2 > r^2 − (r^2 − 2) = 2.

But this contradicts the fact that r is supposed to be a least upper bound, because we have found a smaller upper bound of A. We have therefore proved that if r is a least upper bound of A then r = √2. But this is impossible because we have proved that √2 is irrational. Thus the set A does not have a least upper bound in the rationals. However, by essentially the same reasoning the set B does have a least upper bound in the reals: the number √2.

This motivates the following definition. It is this axiom that is needed to develop calculus properly.

The completeness axiom for R
Every non-empty subset of the reals that is bounded above has a least upper bound.
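The two constructions used in the argument about the set A can be tried out with exact rational arithmetic. The following is only a minimal sketch in Python: the function names step_up and step_down are my own, and the particular choice of h in step_up is just one of many rationals satisfying the two conditions in the proof.

```python
from fractions import Fraction

def step_up(r):
    """Given a rational r >= 0 with r^2 < 2, return a rational r1 > r with r1^2 < 2."""
    if r < 0 or r * r >= 2:
        raise ValueError("need r >= 0 and r^2 < 2")
    bound = (2 - r * r) / (2 * r + 1)
    # Any rational h with 0 < h < 1 and h < bound works; this is one arbitrary choice.
    h = min(Fraction(1, 2), bound / 2)
    return r + h

def step_down(r):
    """Given a rational r > 0 with r^2 > 2, return a rational r1 with 0 < r1 < r and r1^2 > 2."""
    if r <= 0 or r * r <= 2:
        raise ValueError("need r > 0 and r^2 > 2")
    h = (r * r - 2) / (2 * r)
    return r - h

# Starting below the square root of 2, we can always move up and stay below it.
r = Fraction(1)
for _ in range(4):
    r = step_up(r)
    print(r, float(r))

# Starting above it, we can always move down and stay above it.
s = Fraction(3, 2)
for _ in range(4):
    s = step_down(s)
    print(s, float(s))
```

Neither sequence ever reaches a rational whose square is exactly 2, which is the point of the argument: no rational can serve as the least upper bound of A.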

The Peano Axioms

Set theory is supposed to be a framework in which all of mathematics can take place. Let me briefly sketch out how we can construct the real numbers using set theory. The starting point is the Peano axioms, studied by G. Peano (1858–1932). These deal with a set P and an operation on this set called the successor function which for each n ∈ P produces a unique element n+. The following four axioms should hold:

(P1) There is a distinguished element of P that we denote by 0.

(P2) There is no element n ∈ P such that n+ = 0.

(P3) If m, n ∈ P and m+ = n+ then m = n.

(P4) If X ⊆ P is such that 0 ∈ X and n+ ∈ X whenever n ∈ X, then X = P.

By using ideas from set theory, one shows that P is essentially the set of natural numbers together with its operations of addition and multiplication. The natural numbers are deficient in that it is not always possible to solve equations of the form a + x = b because of the lack of negative numbers. However, we can use set theory to construct Z from N by using ordered pairs. The idea is to regard (a, b) as meaning a − b. However, there are many names for the same number, so we should have (0, 1) and (2, 3) and (3, 4) all signifying the same number: namely, −1. To make this work, one uses another idea from set theory, that of equivalence relations, which we shall meet later. This gives rise to the set Z. Again using ideas from set theory, the usual operations can be constructed on Z.

But the integers are deficient because we cannot always solve equations of the form ax + b = 0 because of the lack of rational numbers. To construct them we use ordered pairs again. This time (a, b), where b ≠ 0, is interpreted as a/b. But again we have the problem of multiple names for what should be the same number. Thus (1, 2) should equal (−1, −2) should equal (2, 4) and so forth. Once again this problem is solved by using an equivalence relation, and once again, the set which arises, which is denoted by Q, is endowed with the usual operations.

As we have seen, the rationals are deficient in not containing numbers like √2. The intuitive idea behind the construction of the reals from the rationals is that we want to construct R as all the numbers that can be approximated arbitrarily closely by rational numbers. To do this, we form the set of all subsets X of Q which have the following characteristics: X ≠ ∅, X ≠ Q, if x ∈ X and y ≤ x then y ∈ X, and X doesn’t have a biggest element. These subsets are called Dedekind cuts and should be regarded as defining the real number r so that X consists of all the rational numbers less than r.

Chapter 4

Number theory

Number theory is one of the oldest branches of mathematics and deals, mainly, with the properties of the integers, the simplest kinds of numbers. It is a vast subject, and so this chapter can only be an introduction. The main result proved is that every natural number greater than one can be written as a product of powers of primes, a result known as the fundamental theorem of arithmetic. This shows that the primes are the building blocks, or atoms, from which all natural numbers are constructed. The primes are still the subject of intensive research and the source of many unanswered questions. It is ironic that the numbers we learn about first as children are the source of some of mathematics’ most difficult and interesting questions. The tool that enables this chapter to work is the remainder theorem so that is where we shall start.

4.1 The remainder theorem

We begin by stating a basic result that you may assume as an axiom but which I shall also set as a proof in one of the exercises.

Lemma 4.1.1 (Remainder Theorem). Let a and b be integers where b > 0. Then there are unique integers q and r such that

a = bq + r where 0 ≤ r < b.


The number q is called the quotient and the number r is called the remainder. For example, if we consider the pair of natural numbers 14 and 3 then 14 = 3 · 4 + 2 where 4 is the quotient and 2 is the remainder.

Your first reaction to this result is that it is obvious and you might conclude from this that it is therefore uninteresting. But this would be wrong. It is certainly not hard to understand but despite that it is important. The reason is that whenever we have a question that involves divisibility, it is very likely going to require the use of this result.

Example 4.1.2. From the remainder theorem, we know that every natural number n can be written as n = 10q + r where 0 ≤ r ≤ 9. The integer r is nothing other than the units digit in the usual base 10 representation of n. Thus, for example, 42 = 10 × 4 + 2. Similarly, it is the remainder theorem that tells us that odd numbers are precisely those that leave remainder 1 when divided by 2.
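The remainder theorem can be checked on a computer. The sketch below uses Python's built-in divmod, which for a positive divisor b returns a pair (q, r) with a = bq + r and 0 ≤ r < b, even when a is negative. The wrapper name quotient_remainder is my own and is used only for illustration.

```python
def quotient_remainder(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < b, assuming b > 0."""
    if b <= 0:
        raise ValueError("the remainder theorem is stated for b > 0")
    q, r = divmod(a, b)  # floor division keeps r in the range 0, ..., b-1
    return q, r

print(quotient_remainder(14, 3))   # (4, 2)
print(quotient_remainder(42, 10))  # (4, 2): the remainder is the units digit
print(quotient_remainder(-14, 3))  # (-5, 1), since -14 = 3*(-5) + 1
```

Note that for negative a the quotient is rounded down rather than towards zero; this is what keeps the remainder non-negative, as the theorem requires.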

Let a and b be integers where a ≠ 0. We say that a divides b or that b is divisible by a if there is an integer q such that b = aq. In other words, there is no remainder. We also say that a is a divisor or factor of b. We write a | b to mean the same thing as ‘a divides b’. It is very important to remember that a | b does not mean the same thing as a/b. The latter is a number, the former is a statement about two numbers.

As a very simple example of the remainder theorem, we shall look at how we write numbers down. I don’t think our hunter-gatherer ancestors worried too much about writing numbers down because there wasn’t any need: they didn’t have to fill in tax-returns and so didn’t need accountants. However, organizing cities does need accountants and so ways had to be found of writing numbers down. The simplest way of doing this is to use a mark like |, called a tally, for each thing being counted. So |||||||||| means 10 things. This system has advantages and disadvantages. The advantage is that you don’t have to go on a training course to learn it. The disadvantage is that even quite small numbers need a lot of space like

||||||||||||||||||||||||||||||||||||||

It’s also hard to tell whether

|||||||||||||||||||||||||||||||||||||||

is the same number or not. (It’s not.) It’s inevitable that people will introduce abbreviations to make the system easier to use. Perhaps it was in this way that the next development occurred. Both the ancient Egyptians and Romans used similar systems but I’ll describe the Roman system because it involves letters rather than pictures. First, you have a list of basic symbols:

number   1   5   10   50   100   500   1000
symbol   I   V   X    L    C     D     M

There are more symbols for bigger numbers. Numbers are then written according to the additive principle. Thus MMVIIII is 2009. Incidentally, I understand that the custom of also using a subtractive principle so that, for example, IX means 9 rather than using VIIII, is a more modern innovation. This system is clearly a great improvement on the tally-system. Even quite big numbers are written compactly and it is easy to compare numbers. On the other hand, there is more to learn. The other disadvantage is that we need separate symbols for different powers of 10 and their multiples by 5. This was probably not too inconvenient in the ancient world where it is likely that the numbers needed on a day-to-day basis were never going to be that big. A common criticism of this system is that it is hard to do multiplication in. However, that turns out to be a non-problem because, like us, the Romans used pocket calculators or, more accurately, a device called an abacus that could easily be carried under a toga. The real evidence for the usefulness of this system of writing numbers is that it survived for hundreds and hundreds of years.

The system used throughout the world today is quite different and is called the positional number system. It seems to have been in place by the ninth century in India but it was hundreds of years in development and the result of ideas from many different cultures: the invention of zero on its own is one of the great steps in human intellectual development. The genius of the system is that it requires only 10 symbols

0, 1, 2, 3, 4, 5, 6, 7, 8, 9

and every natural number can be written using a sequence of these symbols. The trick to making the system work is that we use the position on the page of a symbol to tell us what number it means. Thus 2009 means

10^3   10^2   10^1   10^0
2      0      0      9

In other words

2 × 10^3 + 0 × 10^2 + 0 × 10^1 + 9 × 10^0.

Notice the important rôle played by the symbol 0 which makes it clear to which column a symbol belongs; otherwise we couldn’t tell 29 from 209 from 2009. The disadvantage of this system is that you do have to go on a course to learn it because it is a highly sophisticated way of writing numbers. On the other hand, it has the enormous advantage that any number can be written down in a compact way. Once the basic system had been accepted it could be adapted to deal not only with positive whole numbers but also negative whole numbers, using the symbol −, and also fractions with the introduction of the decimal point. By the end of the sixteenth century, the full decimal system was in place.

Notation warning! In the UK, we use a raised decimal point like 0·123 and not a comma. Also we generally write the number 1 without a long hook at the top. If you do write it like that there is a danger that people will confuse it with the number 7 which is not always written in the UK with a line through it.

We shall now look in more detail at the way in which numbers can be written down using a positional notation. In order not to be biased, we shall not just work in base 10 but show how any base can be used. Our main tool is the remainder theorem. Let’s see how to represent numbers in base d where d ≥ 2. If d ≤ 10 then we represent numbers by sequences of symbols taken from the set

Z_d = {0, 1, 2, 3, ..., d − 1}

but if d > 10 then we need new symbols for 10, 11, 12 and so forth. It’s convenient to use A, B, C, .... For example, if we want to write numbers in base 12 we use the set of symbols

{0, 1, ..., 9, A, B}

whereas if we work in base 16 we use the set of symbols

{0, 1,..., 9, A, B, C, D, E, F }.

If x is a sequence of symbols then we write x_d to make it clear that we are to interpret this sequence as a number in base d. Thus BAD_16 is a number in base 16. The symbols in a sequence x_d, reading from right to left, tell us the contribution each power of d such as d^0, d^1, d^2, etc. makes to the number the sequence represents. Here are some examples.

Examples 4.1.3. Converting from base d to base 10.

1. 11A9_12 is a number in base 12. This represents the following number in base 10:

1 × 12^3 + 1 × 12^2 + A × 12^1 + 9 × 12^0,

which is just the number

12^3 + 12^2 + 10 × 12 + 9 = 2001.

2. BAD_16 represents a number in base 16. This represents the following number in base 10:

B × 16^2 + A × 16^1 + D × 16^0,

which is just the number

11 × 16^2 + 10 × 16 + 13 = 2989.

3. 5556_7 represents a number in base 7. This represents the following number in base 10:

5 × 7^3 + 5 × 7^2 + 5 × 7^1 + 6 × 7^0 = 2001.

These examples show how easy it is to convert from base d to base 10.
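The conversion from base d to base 10 can be sketched in a few lines of Python. The function name to_decimal and the digit string DIGITS are my own choices for illustration; the loop evaluates the sum of powers from left to right by repeatedly multiplying by the base and adding the next digit, which gives the same value as the sums written out above.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_decimal(numeral, base):
    """Interpret the string `numeral` as a number written in the given base (2 to 36)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    value = 0
    for symbol in numeral.upper():
        digit = DIGITS.index(symbol)  # raises ValueError for an unknown symbol
        if digit >= base:
            raise ValueError(f"{symbol!r} is not a digit in base {base}")
        value = value * base + digit  # shift one place to the left, then add the digit
    return value

print(to_decimal("11A9", 12))  # 2001
print(to_decimal("BAD", 16))   # 2989
print(to_decimal("5556", 7))   # 2001
```

Python's built-in int("11A9", 12) performs the same conversion and can be used as a cross-check.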

There are two ways to convert from base 10 to base d.

1. The first runs in outline as follows. Let n be the number in base 10 that we wish to write in base d. Look for the largest power d^m of d such that d^m ≤ n, and then the largest a < d such that a d^m ≤ n. Then repeat for n − a d^m. Continuing in this way, we write n as a sum of multiples of powers of d and so we can write n in base d.

2. The second makes use of the remainder theorem. The idea behind this method is as follows. Let

n = a_m ... a_1 a_0

in base d. We may think of this as

n = (a_m ... a_1)d + a_0

It follows that a_0 is the remainder when n is divided by d, and the quotient is n′ = a_m ... a_1. Thus we can generate the digits of n in base d from right to left by repeatedly finding the next quotient and next remainder by dividing the current quotient by d; the process starts with our input number as first quotient.
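The second method translates directly into a short program. This is a minimal sketch in Python, with the function name to_base and the digit string DIGITS chosen by me for illustration: each pass of the loop applies the remainder theorem once, and the remainders are the digits from right to left.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base(n, base):
    """Write the natural number n in the given base (2 to 36) as a string of symbols."""
    if n < 0 or not 2 <= base <= len(DIGITS):
        raise ValueError("need n >= 0 and 2 <= base <= 36")
    if n == 0:
        return "0"
    symbols = []
    quotient = n  # the input number is the first quotient
    while quotient > 0:
        quotient, remainder = divmod(quotient, base)
        symbols.append(DIGITS[remainder])  # digits appear from right to left
    return "".join(reversed(symbols))

print(to_base(2001, 7))   # 5556
print(to_base(2001, 12))  # 11A9
print(to_base(2001, 2))   # 11111010001
```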

Examples 4.1.4. Converting from base 10 to base d.

1. Write 2001 in base 7. I’ll solve this question in two different ways: the long but direct route and then the short but more thought-provoking route. We see that 7^4 > 2001. Thus we divide 2001 by 7^3. This goes 5 times plus a remainder. Thus 2001 = 5 × 7^3 + 286. We now repeat with 286. We divide it by 7^2. It goes 5 times again plus a remainder. Thus 286 = 5 × 7^2 + 41. We now repeat with 41. We get that 41 = 5 × 7 + 6. We have therefore shown that

2001 = 5 × 7^3 + 5 × 7^2 + 5 × 7 + 6.

Thus 2001 in base 7 is just 5556.

Now for the short method.

       quotient   remainder
7      2001
7      285        6
7      40         5
7      5          5
       0          5

Thus 2001 in base 7 is: 5556.

2. Write 2001 in base 12.

       quotient   remainder
12     2001
12     166        9
12     13         10 = A
12     1          1
       0          1

Thus 2001 in base 12 is: 11A9.

3. Write 2001 in base 2.

       quotient   remainder
2      2001
2      1000       1
2      500        0
2      250        0
2      125        0
2      62         1
2      31         0
2      15         1
2      7          1
2      3          1
2      1          1
       0          1

Thus 2001 in base 2 is (reading from bottom to top):

11111010001.

When converting from one base to another it is always wise to check your calculations by converting back.

Number bases have some special terminology associated with them which you might encounter:

Base 2 binary.

Base 8 octal.

Base 10 decimal.

Base 12 duodecimal.

Base 16 hexadecimal.

Base 20 vigesimal.

Base 60 sexagesimal.

Binary, octal and hexadecimal occur in computer science; there are remnants of a vigesimal system in French and the older Welsh system of counting; base 60 was used by astronomers in ancient Mesopotamia and is still the basis of time measurement (60 seconds = 1 minute, and 60 minutes = 1 hour) and angle measurement.

As a final example of the importance of the remainder theorem, we look at how we may write proper fractions as decimals. To see what’s involved, let’s calculate some decimal fractions.

Examples 4.1.5.

1. 1/20 = 0·05. This fraction has a finite decimal representation.

2. 1/7 = 0·142857142857142857142857142857.... This fraction has an infinite decimal representation, which consists of the same sequence of numbers repeated. We abbreviate this decimal to 0·\overline{142857}.

3. 37/84 = 0·44\overline{047619}. This fraction has an infinite decimal representation, which consists of a non-repeating part followed by a part which repeats.

I shall characterize those fractions which have a finite decimal representation once we have proved our main theorem. I want to focus here on the last two cases. Case (2) is said to be a purely periodic decimal whereas case (3), which is more general, is said to be ultimately periodic.

Proposition 4.1.6. An infinite decimal fraction represents a rational number if and only if it is ultimately periodic.

Proof. The key is in the remainders. Consider the ultimately periodic decimal number r = 0·a_1 ... a_s \overline{b_1 ... b_t}. We shall prove that r is rational. Observe that

10^s r = a_1 ... a_s · \overline{b_1 ... b_t}

and

10^{s+t} r = a_1 ... a_s b_1 ... b_t · \overline{b_1 ... b_t}.

From which we get that

10^{s+t} r − 10^s r = a_1 ... a_s b_1 ... b_t − a_1 ... a_s

where the righthand side is the decimal form of some integer that we shall call a. It follows that

r = a/(10^{s+t} − 10^s)

is a rational number.

The proof of the converse is based on the method we use to compute the decimal expansion of m/n. We carry out repeated divisions by n and at each step of the computation we use the remainder obtained to calculate the next digit. But there are only a finite number of possible remainders and our expansion is assumed infinite. Thus at some point a remainder must be repeated, and from then on the digits repeat as well.
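The second half of the proof is really an algorithm, and it can be sketched as follows. This is a minimal illustration in Python, assuming a proper fraction m/n; the function name decimal_expansion and the dictionary seen are my own. It records the position at which each remainder first occurs and stops as soon as a remainder repeats.

```python
def decimal_expansion(m, n):
    """Return (prefix, period) for the proper fraction m/n, where 0 <= m < n.

    The decimal is 0.prefix followed by period repeated for ever;
    period is the empty string when the decimal is finite.
    """
    if not 0 <= m < n:
        raise ValueError("need a proper fraction with 0 <= m < n")
    digits = []
    seen = {}  # remainder -> position at which it first appeared
    remainder = m
    while remainder != 0 and remainder not in seen:
        seen[remainder] = len(digits)
        digit, remainder = divmod(remainder * 10, n)  # one step of long division
        digits.append(str(digit))
    if remainder == 0:
        return "".join(digits), ""
    start = seen[remainder]  # the digits repeat from this position onwards
    return "".join(digits[:start]), "".join(digits[start:])

print(decimal_expansion(1, 20))   # ('05', '')
print(decimal_expansion(1, 7))    # ('', '142857')
print(decimal_expansion(37, 84))  # ('44', '047619')
```

Since the only possible non-zero remainders are 1, ..., n − 1, the loop runs at most n times, which is the counting argument used in the proof.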

Example 4.1.7. We shall write the ultimately periodic decimal 0·9\overline{4} as a proper fraction in its lowest terms. Put r = 0·9\overline{4}. Then

• r = 0·9\overline{4}.

• 10r = 9·444....

• 100r = 94·444....

Thus 100r − 10r = 94 − 9 = 85 and so r = 85/90. We can simplify this to r = 17/18. We can now easily check that this is correct.
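The calculation in this example follows the formula r = a/(10^{s+t} − 10^s) from the proof of Proposition 4.1.6, and it can be automated. The sketch below uses Python's standard fractions module, which reduces fractions to lowest terms; the function name periodic_to_fraction and its two string arguments are my own conventions.

```python
from fractions import Fraction

def periodic_to_fraction(prefix, period):
    """Return the value of 0.prefix period period ... as a fraction in lowest terms.

    prefix holds the s non-repeating digits (possibly empty),
    period holds the t repeating digits (non-empty).
    """
    if not period:
        raise ValueError("the repeating part must be non-empty")
    s, t = len(prefix), len(period)
    # a = (a_1 ... a_s b_1 ... b_t) - (a_1 ... a_s), read as integers
    a = int(prefix + period) - (int(prefix) if prefix else 0)
    return Fraction(a, 10 ** (s + t) - 10 ** s)

print(periodic_to_fraction("9", "4"))        # 17/18
print(periodic_to_fraction("", "142857"))    # 1/7
print(periodic_to_fraction("44", "047619"))  # 37/84
```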

Exercises 4.1

1. Find the quotients and remainders for each of the following pairs of numbers. Divide the smaller into the larger.

(a) 30 and 6. (b) 100 and 24. (c) 364 and 12.

2. Write the number 2009 in

(a) Base 5. (b) Base 12. (c) Base 16.

3. Write the following numbers in base 10.

(a) DAB_16.

(b) ABBA_12.

(c) 44332211_5.

4. Write the following decimals as fractions in their lowest terms.

(a) 0·534. (b) 0·2106. (c) 0·076923.

5. Prove the following properties of the division relation on Z.

(a) If a ≠ 0 then a | a.

(b) If a | b and b | a then a = ±b.

(c) If a | b and b | c then a | c.

(d) If a | b and a | c then a | (b + c).

6. This question develops a proof of the remainder theorem. Let a and b be integers with b > 0. Then there exists a unique pair of integers q and r such that a = qb + r where 0 ≤ r < b.

(a) Let X = {a − nb: n ∈ Z}. Show that this set contains non-negative elements.

(b) Let X+ be the subset of X consisting of non-negative elements. This subset is non-empty by the first step. Use the well-ordering principle to deduce that this set contains a minimum element r. Thus r = a − qb ≥ 0 for some q ∈ Z.

(c) Show that if r ≥ b then X+ in fact contains a smaller element, which is a contradiction.

(d) We therefore have that a = bq + r where 0 ≤ r < b. It remains to prove that q and r are unique with these properties. Assume therefore that a = bq′ + r′ where 0 ≤ r′ < b. Deduce that q = q′ and r = r′.

4.2 Greatest common divisors

Let a, b ∈ N. A number d which divides both a and b is called a common divisor of a and b. The largest number which divides both a and b is called the greatest common divisor of a and b and is denoted by gcd(a, b). A pair of natural numbers a and b is said to be coprime if gcd(a, b) = 1. For us gcd(0, 0) is undefined but if a ≠ 0 then gcd(a, 0) = a.

Example 4.2.1. Consider the numbers 12 and 16. The set of divisors of 12 is {1, 2, 3, 4, 6, 12}. The set of divisors of 16 is {1, 2, 4, 8, 16}. The set of common divisors is the set of numbers that belong to both of these two sets: namely, {1, 2, 4}. The greatest common divisor of 12 and 16 is therefore 4. Thus gcd(12, 16) = 4.

One application of greatest common divisors is in simplifying fractions. For example, the fraction 12/16 is equal to the fraction 3/4 because we can divide out by the common divisor of numerator and denominator. The fraction which results cannot be simplified further and is in its lowest terms.

Lemma 4.2.2. Let d = gcd(a, b). Then gcd(a/d, b/d) = 1.

Proof. Because d divides both a and b we may write a = a′d and b = b′d for some natural numbers a′ and b′. We therefore need to prove that gcd(a′, b′) = 1. Suppose that e | a′ and e | b′. Then a′ = ex and b′ = ey for some natural numbers x and y. Thus a = exd and b = eyd. Observe that ed | a and ed | b and so ed is a common divisor of both a and b. But d is the greatest common divisor and so e = 1, as required.

Let me paraphrase what the result above says since it is not surprising. If I divide two numbers by their greatest common divisor then the numbers that remain are coprime. This seems intuitively plausible and the proof ensures that our intuition is correct.

Example 4.2.3. Greatest common divisors arise naturally in solving linear equations where we require the solutions to be integers. Consider, for example, the linear equation 12x + 16y = 5. If we want our solutions (x, y) to have real number co-ordinates, then it is of course easy to solve this equation and find infinitely many solutions since the solutions form a line in the plane. But suppose now that we require (x, y) ∈ Z^2; that is, we want the solutions to be integers. In other words, we want to know whether the line contains any points with integer co-ordinates. We can see immediately that this is impossible. We have calculated that gcd(12, 16) = 4. Thus if x and y are integers, the number 4 divides the lefthand side of our equation. But clearly, 4 does not divide the righthand side of our equation. Thus the set

{(x, y): (x, y) ∈ Z^2 and 12x + 16y = 5}

is empty.

If the numbers a and b are large, then calculating their gcd in the way I did above would be time-consuming and error-prone. We want to find an efficient method of calculating the greatest common divisor. The following lemma is the basis of just such a method.

Lemma 4.2.4. Let a, b ∈ N, where b ≠ 0, and let a = bq + r where 0 ≤ r < b. Then gcd(a, b) = gcd(b, r).

Proof. Let d be a common divisor of a and b. Since a = bq + r we have that a − bq = r so that d is also a divisor of r. It follows that any divisor of a and b is also a divisor of b and r. Now let d be a common divisor of b and r. Since a = bq + r we have that d divides a. Thus any divisor of b and r is a divisor of a and b. It follows that the set of common divisors of a and b is the same as the set of common divisors of b and r. Thus gcd(a, b) = gcd(b, r).

The point of the above result is that b ≤ a and r < b. So calculating gcd(b, r) will be easier than calculating gcd(a, b) because the numbers involved are smaller. Compare the pair (a, b) with the pair (b, r) in the equation a = bq + r. The above result is the basis of an efficient algorithm for computing greatest common divisors. It was described in Propositions 1 and 2 of Book VII of Euclid.

Algorithm 4.2.5 (Euclid’s algorithm).
Input: a, b ∈ N such that a ≥ b and b ≠ 0.
Output: gcd(a, b).
Procedure: write a = bq + r where 0 ≤ r < b. Then gcd(a, b) = gcd(b, r). If r ≠ 0 then repeat this procedure with b and r and so on. The last non-zero remainder is gcd(a, b).

Example 4.2.6. Let’s calculate gcd(19, 7) using Euclid’s algorithm. I have highlighted the numbers that are involved at each stage.

19 = 7 · 2 + 5
7 = 5 · 1 + 2
5 = 2 · 2 + 1 ∗
2 = 1 · 2 + 0

By Lemma 4.2.4 we have that

gcd(19, 7) = gcd(7, 5) = gcd(5, 2) = gcd(2, 1) = gcd(1, 0).

The last non-zero remainder is 1 and so gcd(19, 7) = 1 and, in this case, the numbers are coprime.
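Euclid's algorithm is short enough to write out as a program. The following is a minimal sketch in Python; the function name gcd is my own choice (Python's standard library also provides math.gcd, which can be used to cross-check the results). Each pass of the loop replaces the pair (a, b) by the pair (b, r), exactly as in Lemma 4.2.4.

```python
def gcd(a, b):
    """Greatest common divisor of natural numbers a and b, not both zero."""
    if a < 0 or b < 0 or (a == 0 and b == 0):
        raise ValueError("need natural numbers, not both zero")
    while b != 0:
        a, b = b, a % b  # gcd(a, b) = gcd(b, r) where r is the remainder
    return a  # the last non-zero remainder (or the original b if it divides a)

print(gcd(19, 7))   # 1
print(gcd(12, 16))  # 4
```

The sketch does not insist that a ≥ b: if a < b, the first pass of the loop simply swaps the two numbers.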

There are occasions when we need to extract more information from Euclid’s algorithm as we shall discover later when we come to deal with prime numbers. The following provides what we need.

Theorem 4.2.7 (Bézout’s theorem). Let a and b be natural numbers. Then there are integers x and y such that

gcd(a, b) = xa + yb.

I shall prove this theorem by describing an algorithm that will compute the integers x and y above. This is achieved by running Euclid’s algorithm in reverse and is called the extended Euclidean algorithm. The procedure for doing so is outlined below but the details are explained in the example that follows it.

Algorithm 4.2.8 (Extended Euclidean algorithm).
Input: a, b ∈ N where a ≥ b and b ≠ 0.
Output: numbers x, y ∈ Z such that gcd(a, b) = xa + yb.
Procedure: apply Euclid’s algorithm to a and b; working from bottom to top rewrite each remainder in turn.

Example 4.2.9. This is a little involved so I have split the process up into steps. I shall apply the extended Euclidean algorithm to the example I calculated above. I have highlighted the non-zero remainders wherever they occur, and I have discarded the last equality where the remainder was zero. I have also marked the last non-zero remainder.

19 = 7 · 2 + 5
7 = 5 · 1 + 2
5 = 2 · 2 + 1 ∗

The first step is to rearrange each equation so that the non-zero remainder is alone on the lefthand side.

5 = 19 − 7 · 2
2 = 7 − 5 · 1
1 = 5 − 2 · 2

Next we reverse the order of the list

1 = 5 − 2 · 2
2 = 7 − 5 · 1
5 = 19 − 7 · 2

We now start with the first equation. The lefthand side is the gcd we are interested in. We treat all other remainders as algebraic quantities and systematically substitute them in order. Thus we begin with the first equation

1 = 5 − 2 · 2.

The next equation in our list is

2 = 7 − 5 · 1

so we replace 2 in our first equation by the expression on the right to get

1 = 5 − (7 − 5 · 1) · 2.

We now rearrange this equation by collecting up like terms treating the highlighted remainders as algebraic objects to get

1 = 3 · 5 − 2 · 7.

We can of course make a check at this point to ensure that our arithmetic is correct. The next equation in our list is

5 = 19 − 7 · 2

so we replace 5 in our new equation by the expression on the right to get

1 = 3 · (19 − 7 · 2) − 2 · 7.

Again we rearrange to get

1 = 3 · 19 − 8 · 7.

The algorithm now terminates and we can write

gcd(19, 7) = 3 · 19 + (−8) · 7,

as required. We can also, of course, easily check the answer!
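For comparison, here is a minimal Python sketch that computes the same integers x and y. It is not the back-substitution procedure of the example: instead of running Euclid's algorithm and then working backwards, it carries the coefficients forwards alongside the remainders, a standard iterative formulation. The function name extended_gcd and the variable names are my own.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g == x*a + y*b."""
    # Invariants: old_r == old_x*a + old_y*b and r == x*a + y*b.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r  # one step of Euclid's algorithm
        old_x, x = x, old_x - q * x  # update the coefficients in the same way
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

g, x, y = extended_gcd(19, 7)
print(g, x, y)           # 1 3 -8
print(x * 19 + y * 7)    # 1
```

For the pair 19 and 7 this reproduces the coefficients 3 and −8 found by hand above.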

I shall describe a much more efficient algorithm for implementing the extended Euclidean algorithm later in this book when I have discussed matrices. A very useful application of Bézout’s theorem is the following.

Lemma 4.2.10. Let a and b be natural numbers. Then a and b are coprime if, and only if, we may find integers x and y such that

1 = xa + yb.

Proof. Suppose first that a and b are coprime. Then by Bézout’s theorem

gcd(a, b) = xa + yb

for some integers x and y. But, by assumption, gcd(a, b) = 1. Conversely, suppose that 1 = xa + yb. Then any natural number that divides both a and b must divide 1. It follows that gcd(a, b) = 1.

The significance of the above lemma is that whenever you know that a and b are coprime, you can actually write down an expression 1 = xa + yb which means the same thing. This turns out to be enormously useful.

Exercises 4.2

1. Use Euclid’s algorithm to find the gcd’s of the following pairs of numbers.

(a) 35, 65. (b) 135, 144. (c) 17017, 18900.

2. Use the extended Euclidean algorithm to find integers x and y such that gcd(a, b) = ax+by for each of the following pairs of numbers. You should ensure that your answers for x and y have the correct signs.

(a) 112, 267. (b) 242, 1870.

3. We know how to find the greatest natural number that divides two numbers. Define now gcd(a, b, c) to be the greatest common divisor of a and b and c jointly. Prove that

gcd(a, b, c) = gcd(gcd(a, b), c).

Deduce that gcd(gcd(a, b), c) = gcd(a, gcd(b, c)). We may similarly define gcd(a, b, c, d) to be the greatest common divisor of a and b and c and d jointly. Calculate gcd(910, 780, 286, 195) and justify your calculations.

4. The following question is by Dubisch Amer. Math. Mon. 69. Define N∗ = N \{0}. A binary operation ◦ defined on N∗ is known to have the following properties:

(a) a ◦ b = b ◦ a. (b) a ◦ a = a. (c) a ◦ (a + b) = a ◦ b.

Prove that a ◦ b = gcd(a, b). Hint: the question is not asking you to prove that gcd(a, b) has these properties.

5. You have an unlimited supply of 3 cent stamps and an unlimited supply of 5 cent stamps. By combining stamps of different values you can make up other values: for example, three 3 cent stamps and two 5 cent stamps make the value 19 cents. What is the largest value you cannot make? Hint: you need to show that the question makes sense.

6. Let n ≥ 1. Define φ(n) to be the number of numbers less than or equal to n and coprime to n. This is the Euler totient function. Tabulate the values of φ(n) for 1 ≤ n ≤ 12.

4.3 The fundamental theorem of arithmetic

The goal of this section is to state and prove the most basic result about the natural numbers: each natural number, excluding 0 and 1, can be written as a product of powers of primes in essentially one way. The primes are therefore the ‘atoms’ from which all natural numbers can be built.

A proper divisor of a natural number n is a divisor that is neither 1 nor n. A natural number n is said to be prime if n ≥ 2 and the only divisors of n are 1 and n itself. A number bigger than or equal to 2 which is not prime is said to be composite. It is important to remember that the number 1 is not a prime. The only even prime is the number 2.

The properties of primes have exercised a great fascination ever since they were first studied and continue to pose questions that mathematicians have yet to solve. There are no nice formulae to tell us what the nth prime is but there are still some interesting results in this direction. The polynomial

p(n) = n^2 − n + 41

has the property that its value for n = 1, 2, 3, 4, ..., 40 is always prime. Of course, for n = 41 it is clearly not prime. In 1971, the mathematician Yuri Matijasevic found a polynomial in 26 variables of degree 25 with the property that when non-negative integers are substituted for the variables the positive values it takes are all and only the primes. However, this polynomial does not generate the primes in any particular order.

Lemma 4.3.1. Let n ≥ 2. Either n is prime or the smallest proper divisor of n is prime.

Proof. Suppose n is not prime. Let d be the smallest proper divisor of n. If d were not prime then d would have a smallest proper divisor and this divisor would in turn divide n, but this would contradict the fact that d was the smallest proper divisor of n. Thus d must itself be prime.

The following was also proved by Euclid: it is Proposition 20 of Book IX of Euclid.

Theorem 4.3.2. There are infinitely many primes.

Proof. Let p_1, ..., p_n be the first n primes. Put

N = (p_1 ... p_n) + 1.

If N is a prime, then N is a prime bigger than p_n. If N is composite, then N has a prime divisor p by Lemma 4.3.1. But p cannot equal any of the primes p_1, ..., p_n because N leaves remainder 1 when divided by each p_i. It follows that p is a prime bigger than p_n. Thus we can always find a bigger prime. It follows that there must be an infinite number of primes.

Example 4.3.3. It’s interesting to consider some specific cases of the numbers introduced in the above proof. The first few are already prime.

• 2 + 1 = 3 prime.

• 2 · 3 + 1 = 7 prime.

• 2 · 3 · 5 + 1 = 31 prime.

• 2 · 3 · 5 · 7 + 1 = 211 prime.

• 2 · 3 · 5 · 7 · 11 + 1 = 2,311 prime.

• 2 · 3 · 5 · 7 · 11 · 13 + 1 = 30,031 = 59 · 509.
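The numbers in this example are easy to generate by machine. The sketch below is a minimal illustration in Python with function names of my own choosing. It relies on Lemma 4.3.1: the smallest divisor of N that is at least 2 is automatically prime, so searching upwards from 2 finds a prime factor without testing any candidate for primality.

```python
def smallest_prime_factor(n):
    """Return the smallest divisor d >= 2 of n (n >= 2); by Lemma 4.3.1 it is prime."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to the square root, so n itself is prime

def euclid_numbers(primes):
    """Yield p_1 + 1, p_1*p_2 + 1, p_1*p_2*p_3 + 1, ... for the given list of primes."""
    product = 1
    for p in primes:
        product *= p
        yield product + 1

for N in euclid_numbers([2, 3, 5, 7, 11, 13]):
    p = smallest_prime_factor(N)
    print(N, "prime" if p == N else f"= {p} * {N // p}")
```

The last line printed shows the factorization of 30,031 given in the example.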

The Prime Number Theorem There are infinitely many primes but how are those primes distributed? For example, are they arranged fairly regularly, or do the gaps between them get bigger and bigger? There are no formulae which output the nth prime in a usable way, but if we adopt a statistical approach then we can obtain much more useful results. The idea is that for each natural number n we count the number of primes π(n) less than or equal to n. The graph of π(n) has a staircase shape — it certainly isn’t smooth — but as you zoom away it begins to look smoother and smoother. This raises the question of whether there is a smooth function that is a good approximation to π(n). In 1792, the young Carl Friedrich Gauss (1777–1855) observed that π(n) appeared to be close to the value of the n amazingly simple function ln(n) . But proving that this was always true, and not just an artefact of the comparatively small numbers he looked at, turned out to be difficult. Eventually, in 1896 two mathematicians, Jacques Hadamard (1865–1963) and the spectacularly named Charles Jean Gustave Nicolas Baron de la Vall´eePoussin (1866–1962), proved independently of each other that

$$\lim_{x \to \infty} \frac{\pi(x)}{x/\ln(x)} = 1,$$

a result known as the Prime Number Theorem. It was proved using complex analysis; that is, calculus using complex numbers. As an example, we have that $\pi(1{,}000{,}000) = 78{,}498$ whereas $\frac{10^6}{\ln 10^6} \approx 72{,}382$.
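You can repeat this comparison for yourself. The following Python sketch counts primes with the Sieve of Eratosthenes and sets the count beside n/ln(n); the function name `prime_count` and the choice of sample values of n are assumptions of mine, made only for illustration.

```python
import math


def prime_count(n):
    """Count the primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return 0
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # cross out the multiples of p, starting at p * p
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(is_prime)


for n in (10**2, 10**4, 10**6):
    count = prime_count(n)
    estimate = n / math.log(n)
    # the last column is the ratio appearing in the Prime Number Theorem
    print(n, count, round(estimate), round(count / estimate, 4))
```

The ratio in the last column approaches 1 only slowly, which is consistent with the gap between the two figures quoted above.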

Algorithm 4.3.4. To decide whether a number n is prime or composite. Check to see if any prime $p \le \sqrt{n}$ divides n. If none of them do, the number n is prime.

We shall now explain why this works. If a divides n then we can write n = ab for some number b. If $a < \sqrt{n}$ then $b > \sqrt{n}$, whilst if $a > \sqrt{n}$ then $b < \sqrt{n}$. Thus a composite number n always has a divisor a with $1 < a \le \sqrt{n}$, and so to decide if n is prime or not we need only carry out trial divisions by all numbers a with $2 \le a \le \sqrt{n}$. However, this is inefficient because if a divides n and a is not prime then a is divisible by some prime p which must therefore also divide n. It follows that we need only carry out trial divisions by the primes $p \le \sqrt{n}$.

Example 4.3.5. Determine whether 97 is prime using the above algorithm. We first calculate the largest whole number less than or equal to $\sqrt{97}$. This is 9. We now carry out trial divisions of 97 by each prime number p where 2 ≤ p ≤ 9; by the way, if you aren’t certain which of these numbers are prime, just try them all. You’ll get the right answer although not as efficiently. You might also want to remember that if m doesn’t divide a number neither can any multiple of m. In any event, in this case we carry out trial divisions by 2, 3, 5 and 7. None of them divides 97 exactly and so 97 is prime.
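Algorithm 4.3.4 translates into a few lines of Python. This sketch takes the lazy route mentioned in the example and divides by every number from 2 up to the whole-number part of √n rather than by primes only; the function name `is_prime` is my own.

```python
import math


def is_prime(n):
    """Decide primality by trial division up to floor(sqrt(n))."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False  # d is a proper divisor, so n is composite
    return True


print(is_prime(97))  # the number tested in Example 4.3.5
```

Trying composite values of d as well wastes a little time but cannot change the answer, for the reason given in the explanation of the algorithm.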

Cryptography

Prime numbers play an important role in exchanging secret information. In 1976, Whitfield Diffie and Martin Hellman wrote a paper on cryptography that can genuinely be called ground-breaking. In ‘New directions in cryptography’ IEEE Transactions on Information Theory 22 (1976), 644–654, they put forward the idea of a public-key cryptosystem which would enable

. . . a private conversation . . . [to] be held between any two individuals regardless of whether they have ever communicated before.

With considerable farsightedness, Diffie and Hellman foresaw that such cryptosystems would be essential if communication between computers was to reach its full potential. However, their paper did not describe a concrete way of doing this. It was R. L. Rivest, A. Shamir and L. Adleman (RSA) who found just such a concrete method, described in their paper ‘A method for obtaining digital signatures and public-key cryptosystems’ Communications of the ACM 21 (1978), 120–126. Their method is based on the following observation. Given two prime numbers it takes very little time to multiply them together, but if I give you a number that is a product of two primes and ask you to factorize it then it takes a lot of time. You might like to think about why in relation to the algorithm I gave for factorizing numbers above. After considerable experimentation, RSA showed how to use little more than undergraduate mathematics to put together a public-key cryptosystem that is an essential ingredient in e-commerce. Ironically, this secret code had in fact been invented in 1973 at GCHQ, who had kept it secret.

The following is the key property of primes we shall need to prove the fundamental theorem of arithmetic. It is the main reason why we needed Bézout’s theorem. It is Proposition 30 of Book VII of Euclid.

Lemma 4.3.6 (Euclid’s lemma).

1. Let p | ab where p is a prime. Then p | a or p | b.

2. Let p | a1 . . . an where p is a prime. Then p | ai for some i.

Proof. (1) Suppose that p does not divide a. We shall prove that p must then divide b. If p does not divide a, then a and p are coprime. By Lemma 4.2.10, there exist integers x and y such that 1 = px + ay. Thus b = bpx + bay. Now p | bpx, and p | bay because p | ab by assumption, and so p | b, as required.

(2) This is a typical application for proof by induction. We have proved the base case where n = 2. Assume that the result holds when n = k. We prove that it holds for n = k + 1. Suppose that p | (a1 . . . ak)ak+1. From the base case, either p | a1 . . . ak or p | ak+1. By the induction hypothesis, we may now deduce that p | ai for some 1 ≤ i ≤ k or p | ak+1. We have therefore proved the result.

Example 4.3.7. The above result is not true if p is not a prime. For example, 6 | 9 × 4 but 6 divides neither 9 nor 4.

Lemma 4.3.6 is so important, I want to spell out in words what it says: If a prime divides a product of numbers it must divide at least one of them.

There is a very nice application of Euclid’s lemma to proving that certain numbers are irrational. It generalizes our proof that $\sqrt{2}$ is irrational described in Chapter 2.

Theorem 4.3.8. The square root of every prime number is irrational.

Proof. We shall prove this by contradiction. Assume that we can write $\sqrt{p}$ as a rational. I shall show that this assumption leads to a contradiction and so must be false. We are assuming that $\sqrt{p} = \frac{a}{b}$. By cancelling the greatest common divisor of a and b we can in fact assume that gcd(a, b) = 1. This will be crucial to our argument. Squaring both sides of the equation $\sqrt{p} = \frac{a}{b}$ and multiplying the resulting equation by $b^2$ we get that $pb^2 = a^2$. This says that $a^2$ is divisible by p. But if a prime divides a product of two numbers it must divide at least one of those numbers by Euclid’s lemma. Thus p divides a. Thus we can write a = pc for some natural number c. Substituting this into our equation above we get that $pb^2 = p^2c^2$. Dividing both sides of this equation by p gives $b^2 = pc^2$. This tells us that $b^2$ is divisible by p and so in the same way as above p divides b. We have therefore shown that our assumption that $\sqrt{p}$ is rational leads to both a and b being divisible by p. But this contradicts the fact that gcd(a, b) = 1. Our assumption is therefore wrong, and so $\sqrt{p}$ is not a rational number.

We now come to the main theorem of this chapter.

Theorem 4.3.9 (Fundamental theorem of arithmetic). Every number n ≥ 2 can be written as a product of primes in exactly one way if we ignore the order in which the primes appear. By product we allow the possibility that there is only one prime.

Proof. Let n ≥ 2. If n is already a prime then there is nothing to prove, so we can suppose that n is composite. Let p1 be the smallest prime divisor of n. Then we can write n = p1n′ where n′ < n. Once again, n′ is either prime or composite. Continuing in this way, we can write n as a product of primes.

We now prove uniqueness. Suppose that

n = p1 . . . ps = q1 . . . qt

are two ways of writing n as a product of primes. Now p1 | n and so p1 | q1 . . . qt. By Euclid’s lemma, the prime p1 must divide one of the qi’s and, since they are themselves prime, it must actually equal one of the qi’s. By relabelling if necessary, we can assume that p1 = q1. Cancel p1 from both sides and repeat with p2. Continuing in this way, we see that every prime occurring on the lefthand side occurs on the righthand side. Changing sides, we see that every prime occurring on the righthand side occurs on the lefthand side. We deduce that the two prime decompositions are identical.

When we write a number as a product of primes we usually gather together the same primes into a prime power, and write the primes in increasing order which then gives a unique representation. This is illustrated in the example below.

Example 4.3.10. Let n = 999,999. Write n as a product of primes. There are a number of ways of doing this but in this case there is an obvious place to start. We have that

$$n = 3^2 \cdot 111{,}111 = 3^3 \cdot 37{,}037 = 3^3 \cdot 7 \cdot 5{,}291 = 3^3 \cdot 7 \cdot 11 \cdot 481 = 3^3 \cdot 7 \cdot 11 \cdot 13 \cdot 37.$$

Thus the prime factorisation of 999,999 is $999{,}999 = 3^3 \cdot 7 \cdot 11 \cdot 13 \cdot 37$.
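The existence half of the proof is really an algorithm: keep pulling out the smallest prime divisor. Here is a minimal Python sketch of that idea, under the assumption that plain trial division is fast enough for the numbers involved; the name `factorise` and the dictionary format for the answer are my own choices.

```python
def factorise(n):
    """Return the prime factorisation of n >= 2 as a dict {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:  # p is the smallest divisor left, hence prime
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:  # whatever remains has no divisor up to its square root
        factors[n] = factors.get(n, 0) + 1
    return factors


print(factorise(999_999))
```

Gathering equal primes into powers, as the dictionary does, is exactly the convention described before Example 4.3.10.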

Supernatural Numbers

There are natural numbers. Are there super natural numbers? It sounds like a joke but in fact there are, though to be honest they are only encountered in advanced work. But since they are easy to understand and I like the name, I have included a brief description. List the primes in order 2, 3, 5, 7, . . .. By the fundamental theorem of arithmetic, each natural number ≥ 2 may be expressed as a unique product of powers of primes. Let’s write each such natural number as a product of all primes.

This can be achieved by including those primes not needed by raising them to the power 0. For example,

$$10 = 2 \cdot 5 = 2^1 \cdot 3^0 \cdot 5^1 \cdot 7^0 \cdots$$

which we could write as

(1, 0, 1, 0, 0, 0 ...)

and

$$12 = 2^2 \cdot 3 = 2^2 \cdot 3^1 \cdot 5^0 \cdot 7^0 \cdots$$

which we could write as

(2, 1, 0, 0, 0, 0 ...)

Of course, for each natural number from some point on all the entries will be zero. Thus each natural number ≥ 2 is encoded by an infinite sequence of natural numbers that are zero from some point onwards. We now define a supernatural number to be any sequence

(a1, a2, a3,...)

where the ai are natural numbers. We define a natural number to be a supernatural number where the ai = 0 for all i ≥ m for some natural number m ≥ 1. This makes sense because each natural supernatural number can be regarded as the encoded version of a natural number in the non-super sense. I shall denote the set of supernatural numbers by S; this is not yet the complete list since I still have to add some special such numbers. I shall denote supernatural numbers by bold letters such as a. I shall also denote the ith component by ai. Let a and b be two supernatural numbers. We define their product as follows

(a · b)i = ai + bi. This makes sense because, for example,

10 · 12 = 120

and

(1, 0, 1, 0, 0, 0 ...) · (2, 1, 0, 0, 0, 0 ...) = (3, 1, 1, 0, 0, 0 ...)

which encodes $2^3 \cdot 3^1 \cdot 5^1 = 120$. I shall leave you to check that the multiplication is associative. If we define

1 = (0, 0, 0,...)

and allow it to be supernatural then we also have a multiplicative identity because 1 · a = a = a · 1. Now introduce a new symbol ∞ which satisfies a + ∞ = ∞ = ∞ + a. Then if we allow 0 = (∞, ∞, ∞, . . .) as a supernatural number then we also have a zero in the set of supernatural numbers since 0 · a = 0 = a · 0. Finally, allow ∞ to occur anywhere, any number of times, in the definition of a supernatural number. Then we have the full set of supernatural numbers. How do you think that we could define gcd(a, b) and lcm(a, b) of supernatural numbers?
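To experiment with these sequences, here is a small Python sketch. It makes two assumptions that are not in the text: the infinite sequences are truncated to the first six primes, and Python's `math.inf` stands in for the symbol ∞. The names `PRIMES`, `encode`, `multiply`, `ONE` and `ZERO` are hypothetical.

```python
import math

PRIMES = (2, 3, 5, 7, 11, 13)  # truncation to six primes is an assumption


def encode(n):
    """Exponent sequence of n >= 1 with respect to PRIMES."""
    if n < 1:
        raise ValueError("n must be at least 1")
    exponents = []
    for p in PRIMES:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exponents.append(e)
    if n != 1:
        raise ValueError("n has a prime factor outside PRIMES")
    return tuple(exponents)


def multiply(a, b):
    """Product of two sequences: add the exponents component by component."""
    return tuple(x + y for x, y in zip(a, b))


ONE = (0,) * len(PRIMES)          # the multiplicative identity
ZERO = (math.inf,) * len(PRIMES)  # inf + x == inf, so ZERO absorbs everything

print(multiply(encode(10), encode(12)))
```

The printed sequence can be compared with the encoding of 120 worked out above. The question about gcd and lcm is left untouched, as in the text.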

I shall now describe two simple applications of our main theorem. The greatest common divisor of two numbers a and b is the largest number that divides into both a and b. On the other hand, if a | c and b | c then we say that c is a common multiple of a and b. The smallest common multiple of a and b is called the least common multiple of a and b and is denoted by lcm(a, b). You might expect that to calculate the least common multiple we would need a new algorithm, but in fact we can use Euclid’s algorithm as the following result shows.

Proposition 4.3.11. Let a and b be natural numbers not both zero. Then

gcd(a, b) · lcm(a, b) = ab.

Proof. We begin with a special case to motivate the idea. Suppose that $a = p^r$ and $b = p^s$ where p is a prime. Then it is immediate from the properties of indices that

$$\gcd(a, b) = p^{\min(r,s)} \quad\text{and}\quad \operatorname{lcm}(a, b) = p^{\max(r,s)}$$

and so, in this special case, we have that gcd(a, b) · lcm(a, b) = ab. Next suppose that the prime factorizations of a and b are

$$a = p_1^{r_1} \cdots p_m^{r_m} \quad\text{and}\quad b = p_1^{s_1} \cdots p_m^{s_m}$$

where the $p_i$ are primes. We may easily determine the prime factorization of gcd(a, b) when we bear in mind the following points. The primes that occur in the prime factorization of gcd(a, b) must be from the set $\{p_1, \ldots, p_m\}$, and the number $p_i^{\min(r_i,s_i)}$ divides gcd(a, b) but no higher power does. It follows that

$$\gcd(a, b) = p_1^{\min(r_1,s_1)} \cdots p_m^{\min(r_m,s_m)}.$$

A similar kind of argument proves that

$$\operatorname{lcm}(a, b) = p_1^{\max(r_1,s_1)} \cdots p_m^{\max(r_m,s_m)}.$$

The proof of the fact that gcd(a, b)·lcm(a, b) = ab now follows by multiplying the above two prime factorizations together. In the above proof, we assumed that a and b had prime factorizations using the same set of primes. This need not be true in general, but by allowing zero powers of primes we can easily arrange for the same sets of primes to occur and the argument above remains valid.
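Proposition 4.3.11 is what lets Euclid’s algorithm do double duty. The following Python sketch computes the least common multiple of two positive numbers that way; the function names are mine, and the restriction to positive inputs is an assumption made to keep the sketch short.

```python
def gcd(a, b):
    """Greatest common divisor by Euclid's algorithm."""
    while b:
        a, b = b, a % b
    return a


def lcm(a, b):
    """Least common multiple of positive a and b, using gcd * lcm = ab."""
    return a * b // gcd(a, b)


print(gcd(12, 18), lcm(12, 18))
```

No prime factorization is needed here, which matters in practice because factorizing large numbers is slow whereas Euclid’s algorithm is fast.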

For our next result, we begin with an observation. Some fractions, such as $\frac{1}{2}$, can be written with only a finite number of digits after the decimal point, but others, such as $\frac{1}{3}$, require an infinite number of digits. We can now account for this using our main theorem.

Proposition 4.3.12. A proper rational number $\frac{a}{b}$ in its lowest terms has a finite decimal expansion if and only if $b = 2^m 5^n$ for some natural numbers m and n.

Proof. Let $\frac{a}{b}$ have the finite decimal representation $0.a_1 \ldots a_n$. This means

$$\frac{a}{b} = \frac{a_1}{10} + \frac{a_2}{10^2} + \ldots + \frac{a_n}{10^n}.$$

The righthand side is just the fraction

$$\frac{a_1 10^{n-1} + a_2 10^{n-2} + \ldots + a_n}{10^n}.$$

The denominator contains only the prime factors 2 and 5 and so the reduced form will also only contain at most the prime factors 2 and 5.

To prove the converse, consider the proper fraction

$$\frac{a}{2^{\alpha} 5^{\beta}}.$$

If α = β then the denominator is $10^{\alpha}$. If α ≠ β then multiply the numerator and denominator of the fraction by a suitable power of 2 or 5 as appropriate so that the resulting fraction has denominator a power of 10. But any fraction with denominator a power of 10 has a finite decimal expansion.
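The criterion in Proposition 4.3.12 is simple to test mechanically. The sketch below, with the hypothetical name `has_finite_decimal`, reduces the fraction to lowest terms and then strips every factor of 2 and 5 from the denominator.

```python
from math import gcd


def has_finite_decimal(a, b):
    """True if a/b (with b > 0) has a terminating decimal expansion."""
    b //= gcd(a, b)  # reduce to lowest terms; only the denominator matters
    for p in (2, 5):
        while b % p == 0:
            b //= p
    return b == 1  # nothing but 2s and 5s were present


print(has_finite_decimal(1, 2), has_finite_decimal(1, 3))
```

Note that the reduction to lowest terms is essential: 7/14 terminates even though 14 has the prime factor 7.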

Exercises 4.3

1. List the primes less than 100. Hint: use the Sieve of Eratosthenes¹ which can be used to construct a table of all primes up to the number N. List all numbers from 2 to N inclusive. Mark 2 as prime and then cross out from the table all numbers which are multiples of 2. The process now iterates as follows. Find the smallest number which is not marked as a prime and which has not been crossed out. Mark it as a prime and cross out all its multiples. If no such number can be found then you have found all primes less than or equal to N.

2. For each of the following numbers use Algorithm 4.3.4 to determine whether they are prime or composite. When they are composite find a prime factorization. Show all working.

(a) 131. (b) 689. (c) 5491.

3. Find the lowest common multiples of the following pairs of numbers.

(a) 22, 121. (b) 48, 72. (c) 25, 116.

¹ Eratosthenes of Cyrene who lived about 250 BCE. He is famous for using geometry and some simple observations to estimate the circumference of the earth.

4. Given $2^4 \cdot 3 \cdot 5^5 \cdot 11^2$ and $2^2 \cdot 5^6 \cdot 11^4$, calculate their greatest common divisor and least common multiple.

5. Use the fundamental theorem of arithmetic to show that we can always write $\sqrt{n}$, where n is a natural number, as a product of a natural number and a product of square roots of primes. Calculate the square roots of the following numbers exactly using the above method.

(a) 10. (b) 42. (c) 54.

6. Let a and b be coprime. Prove that if a | bc then a | c.

4.4 Modular arithmetic

From an early age, we are taught to think of numbers as being strung out along the number line

−3 −2 −1 0 1 2 3

But that is not the only way we count. We count the seasons in a cyclic manner

. . . autumn, winter, spring, summer . . .

and likewise the days of the week

. . . Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday . . .

Also the months of the year or the hours in a day, whether by means of the 12-hour clock or the 24-hour clock. The fact that we use words for these events obscures the fact that we really are counting. This is clearer in the names for the months since October, November and December were originally the eighth, ninth and tenth months, respectively, until Roman politics intervened and they were shifted. But the counting in all these cases is not linear but cyclic. Rather than using a number line to represent this type of counting, we use instead number circles, and rather than using the words above, I shall use numbers. Here is the number circle for the seasons with numbers replacing words.

[Figure: the number circle for the seasons, with 0 at the top followed by 1, 2 and 3 in clockwise order.]

Adding in these systems of arithmetic means stepping around in a clockwise direction, whereas subtracting means stepping around in an anticlockwise direction. Modular arithmetic is the name given to these different systems of cyclic counting. It was Gauss who realised that these different systems of counting were mathematically interesting.

4.4.1 Congruences

Let n ≥ 2 be a fixed natural number which in this context we call the modulus. If a, b ∈ Z we write a ≡ b if, and only if, a and b leave the same remainder when divided by n or, what amounts to the same thing, n | a − b. Here are a couple of simple examples. If n = 2, then a ≡ b if, and only if, a and b are either both odd or both even. On the other hand, if n = 10 then a ≡ b if, and only if, a and b have the same units digit. The symbol ≡ is a modification of the equality symbol =. If a ≡ b with respect to n we say that a is congruent to b modulo n. In fact, congruence behaves like a weakened form of equality as we now show.

Lemma 4.4.1. Let n ≥ 2 be a fixed modulus.

1. a ≡ a.

2. a ≡ b implies b ≡ a.

3. a ≡ b and b ≡ c implies that a ≡ c.

4. a ≡ b and c ≡ d implies that a + c ≡ b + d.

5. a ≡ b and c ≡ d implies that ac ≡ bd.

Here is a very simple application of modular arithmetic.

Lemma 4.4.2. A natural number n is divisible by 9 if, and only if, the sum of the digits of n is divisible by 9.

Proof. We shall work modulo 9. The proof hinges on the fact that 10 ≡ 1 modulo 9. By using Lemma 4.4.1, we quickly find that $10^r \equiv 1$ for all natural numbers r ≥ 1. We use this result now. Let

$$n = a_k 10^k + a_{k-1} 10^{k-1} + \ldots + a_1 10 + a_0$$

where $a_k, \ldots, a_0$ are the digits of n. Then $n \equiv a_k + \ldots + a_0$. Thus n and the sum of the digits of n leave the same remainder when divided by 9, and so n is divisible by 9 if, and only if, the sum of the digits of n is divisible by 9.

Solving a linear equation such as ax + by = c is very easy. For each possible real value of x we can compute the corresponding real value of y. But suppose now that a, b and c are integers and we only want to find solutions (x, y) whose co-ordinates are integers? This is an example of a Diophantine equation. We shall show how it may be solved with the help of modular arithmetic. First, we shall show that the problem of finding integer solutions is equivalent to solving a simple kind of linear equation in one unknown in modular arithmetic.

Lemma 4.4.3. Let a, b and c be integers. Then the following are equivalent.

1. The pair (x1, y1) is an integer solution to ax + by = c for some y1.

2. The integer x1 is a solution to the equation ax ≡ c (mod b).

Proof. (1) ⇒ (2). Suppose that ax1 + by1 = c. Then it is immediate that ax1 ≡ c (mod b).

(2) ⇒ (1). Suppose that ax1 ≡ c (mod b). Then by definition, ax1 − c = bz1 for some integer z1. Thus ax1 + b(−z1) = c. We may therefore put y1 = −z1.

We shall now describe how to solve all equations of the form

ax ≡ b (mod n).

Lemma 4.4.4. Consider the linear congruence ax ≡ b (mod n).

1. The linear congruence has a solution if, and only if, d = gcd(a, n) is such that d | b.

2. If the condition in part (1) holds and x0 is any solution, then all solutions have the form

$$x = x_0 + t\,\frac{n}{d}$$

where t ∈ Z.

Proof. (1) Suppose first that x1 is a solution to our linear congruence. Then by definition, ax1 − b = nq for some integer q. It follows that ax1 + n(−q) = b. By definition d | a and d | n and so d | b. We now prove the converse. By Bézout’s theorem, we may find integers u and v such that au + nv = d. By assumption, d | b and so b = dw for some integer w. It follows that auw + nvw = dw = b. Thus a(uw) ≡ b (mod n), and we have found a solution.

(2) Let x0 be any one solution to ax ≡ b (mod n). It is routine to check that $x = x_0 + t\frac{n}{d}$ is also a solution for any t ∈ Z. Let x1 be any solution to ax ≡ b (mod n). Then a(x1 − x0) ≡ 0 (mod n). Thus a(x1 − x0) = sn for some integer s. Dividing both sides by d, and using the fact that a/d and n/d are coprime, we deduce that n/d divides x1 − x0 (see Question 6 of Exercises 4.3). The result now follows.

There is a special case of the above result that is very important. Its proof is immediate.

Corollary 4.4.5. Let p be a prime. Then the linear congruence ax ≡ b (mod p), where a is not congruent to 0 modulo p, always has a solution, and all solutions are congruent modulo p.

Example 4.4.6. Let’s find all the points on the line 2x + 3y = 5 that have integer co-ordinates. Observe first that gcd(2, 3) = 1. Thus such points exist. In this case, by inspection, 1 = 2 · 2 + (−1)3. Thus 5 = 10 · 2 + (−5)3. It follows that (10, −5) is one point on the line with integer co-ordinates. Thus the set of integer solutions is

{(10 + 3t, −5 − 2t): t ∈ Z}.
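The proof of Lemma 4.4.4 is constructive, so it can be turned into a program. The Python sketch below is one way of doing so, assuming positive a and n; the function names `extended_gcd` and `solve_congruence` and the convention of returning the pair (x0, n/d) are my own choices rather than anything fixed by the text.

```python
def extended_gcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) and a*u + b*v = d (Bezout)."""
    if b == 0:
        return a, 1, 0
    d, u, v = extended_gcd(b, a % b)
    return d, v, u - (a // b) * v


def solve_congruence(a, b, n):
    """Solve a*x = b (mod n).

    Returns (x0, step) meaning x = x0 + t*step for every integer t,
    or None when gcd(a, n) does not divide b.
    """
    d, u, _ = extended_gcd(a, n)
    if b % d != 0:
        return None
    step = n // d
    x0 = (u * (b // d)) % step  # smallest non-negative solution
    return x0, step


# The line 2x + 3y = 5 of Example 4.4.6 corresponds to 2x = 5 (mod 3).
x0, step = solve_congruence(2, 5, 3)
y0 = (5 - 2 * x0) // 3
print((x0, y0), "then step x by", step)
```

The particular solution found this way need not be the point (10, −5) chosen in the example, but it lies in the same family, since any two solutions for x differ by a multiple of 3.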

4.4.2 Wilson’s theorem

I shall finish off this section with an application of congruences to primes. It is the first hint of hidden patterns in the primes. We need some notation first. For each natural number n define n!, pronounced n factorial, or if you are more extrovert n shriek, as follows: 0! = 1 and for n > 0 define n! = n · (n − 1)!. In other words, for n ≥ 1, n! is what you get when you multiply together all the positive integers less than or equal to n. For each natural number n, we shall be interested in the value of (n − 1)! modulo n. Observe that there is no point in studying n! (mod n) since the answer is always 0. It’s worth doing some numerical calculations first to see if you can spot a pattern.

Theorem 4.4.7 (Wilson’s Theorem). Let n ≥ 2 be a natural number. Then n is a prime if, and only if,

(n − 1)! ≡ n − 1 (mod n).

Since n − 1 ≡ −1 (mod n) this is usually expressed in the form (n − 1)! ≡ −1 (mod n).

Proof. The statement to be proved is an ‘if, and only if’ and so we have to prove two statements: (1) If n is prime then (n − 1)! ≡ n − 1 (mod n). (2) If (n − 1)! ≡ n − 1 (mod n) then n is prime.

We prove (1) first. Let n be a prime. The result is clearly true when n = 2 so we may assume n is an odd prime. For each 1 ≤ a ≤ n − 1 there is a unique number 1 ≤ b ≤ n − 1 such that ab ≡ 1 (mod n), by Corollary 4.4.5. If a = b then $a^2 \equiv 1$ (mod n) which means that n | (a − 1)(a + 1). Since n is a prime either n | a − 1 or n | a + 1. This can only occur if a = 1 or a = n − 1. The remaining numbers 2, . . . , n − 2 therefore fall into pairs a ≠ b with ab ≡ 1 (mod n), and so their product is congruent to 1. Thus (n − 1)! ≡ 1 · (n − 1) = n − 1 (mod n), as claimed.

We now prove (2). Suppose that (n − 1)! ≡ n − 1 (mod n). We prove that n is a prime by showing that the congruence fails whenever n is composite. When n = 4, we get that (4 − 1)! ≡ 2 (mod 4), which is not congruent to 3. Suppose that n > 4 is not prime. Then n = ab where 1 < a, b < n. If a ≠ b then ab occurs as a factor of (n − 1)! and so this is congruent to 0 modulo n. If a = b then a > 2, so that 2a < a · a = n; thus a occurs in (n − 1)! and so does 2a. Thus n is again a factor of (n − 1)!. In both cases (n − 1)! ≡ 0 (mod n), which is not congruent to n − 1.
This theorem is interesting for another reason. To show that a number is prime, we would usually apply the algorithm we described earlier which is just a systematic way of carrying out trial division. This theorem shows that a number is prime in a completely different way. Although it is not a practical test for deciding whether a number is prime or composite, since n! gets very big very quickly, it shows that there might be backdoor ways of showing that a number is prime. This is a very important question in the light of the rôle of prime numbers in cryptography.
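If you want to do the numerical experiments suggested earlier, the following Python sketch compares Wilson’s criterion with trial division for small n. The function names are hypothetical, and the range of values tested is an arbitrary choice of mine.

```python
import math


def wilson_is_prime(n):
    """Wilson's criterion for n >= 2: n is prime iff (n-1)! = n-1 (mod n)."""
    return math.factorial(n - 1) % n == n - 1


def trial_is_prime(n):
    """Primality by trial division, for comparison."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


for n in range(2, 20):
    print(n, math.factorial(n - 1) % n, wilson_is_prime(n), trial_is_prime(n))
```

Even for modest n the factorial has a great many digits, which illustrates why this is not a practical test.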

4.5 Continued fractions

The goal of this section is to show how some of the ideas we have introduced so far can interact with each other. The material we cover is not needed elsewhere in this book.

4.5.1 Fractions of fractions

We return to an earlier calculation. We used Euclid’s algorithm to calculate gcd(19, 7) as follows.

19 = 7 · 2 + 5
7 = 5 · 1 + 2
5 = 2 · 2 + 1
2 = 1 · 2 + 0

We first rewrite each line, except the last, as follows

$$\frac{19}{7} = 2 + \frac{5}{7}, \qquad \frac{7}{5} = 1 + \frac{2}{5}, \qquad \frac{5}{2} = 2 + \frac{1}{2}.$$

Take the first equality

$$\frac{19}{7} = 2 + \frac{5}{7}.$$

But $\frac{5}{7}$ is the reciprocal of $\frac{7}{5}$, and from the second equality

$$\frac{7}{5} = 1 + \frac{2}{5}.$$

If we combine them, we get

$$\frac{19}{7} = 2 + \cfrac{1}{1 + \cfrac{2}{5}}$$

however strange this may look. We may repeat the process to get

$$\frac{19}{7} = 2 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{2}}}.$$

Fractions like this are called continued fractions. Suppose I just gave you

$$2 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{2}}}.$$

You could work out what the usual rational expression was by working from the bottom up. First compute the part in bold below

$$2 + \cfrac{1}{1 + \cfrac{1}{\boldsymbol{2 + \cfrac{1}{2}}}}$$

to get

$$2 + \cfrac{1}{1 + \cfrac{1}{\frac{5}{2}}}$$

which simplifies to

$$2 + \cfrac{1}{1 + \cfrac{2}{5}}.$$

This process can now be repeated and we shall eventually obtain a standard fraction.

I am not going to develop the theory of continued fractions, but I shall show you one more application. Let r be a real number. We may write r as r = m1 + r1 where m1 is an integer and 0 ≤ r1 < 1. For example, π may be written as π = 3.14159265358 . . . where here m1 = 3 and r1 = 0.14159265358 . . .. Now since r1 < 1, and assuming that it is non-zero, we have $\frac{1}{r_1} > 1$. We may therefore repeat the above process and write $\frac{1}{r_1} = m_2 + r_2$ where once again r2 < 1. This begins to feel an awful lot like what we did above. In fact, we may write

$$r = m_1 + \cfrac{1}{m_2 + r_2},$$

and we can continue the above process with r2. It looks like we would obtain a continued fraction representation of r with the big difference that it could be infinite. Here is a concrete example.

Example 4.5.1. We apply the above process to $\sqrt{3}$. Clearly, $1 < \sqrt{3} < 2$. Thus

$$\sqrt{3} = 1 + (\sqrt{3} - 1)$$

where $\sqrt{3} - 1 < 1$. We now focus on

$$\frac{1}{\sqrt{3} - 1}.$$

To convert this into a more usable form we multiply top and bottom by $\sqrt{3} + 1$. We therefore get that

$$\frac{1}{\sqrt{3} - 1} = \frac{1}{2}(\sqrt{3} + 1).$$

It is clear that $1 < \frac{1}{2}(\sqrt{3} + 1) < 1\tfrac{1}{2}$. Thus

$$\frac{1}{\sqrt{3} - 1} = 1 + \frac{\sqrt{3} - 1}{2}.$$

We now focus on

$$\frac{2}{\sqrt{3} - 1}$$

which simplifies to $\sqrt{3} + 1$. Clearly $2 < \sqrt{3} + 1 < 3$. Thus $\sqrt{3} + 1 = 2 + (\sqrt{3} - 1)$. However, we have now gone full circle. Let’s see what we have obtained. We have that

$$\sqrt{3} = 1 + \cfrac{1}{1 + \cfrac{1}{2 + (\sqrt{3} - 1)}}.$$

However, we saw above that the pattern repeats at $\sqrt{3} - 1$, so what we actually have is

$$\sqrt{3} = 1 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1 + \cdots}}}.$$

Let’s see where we are by computing

$$1 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1}}}$$

which simplifies to $\frac{7}{4}$. You can check that this is an approximation to $\sqrt{3}$.
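Both directions of the calculation in this section can be sketched in Python using exact fractions. The function names `continued_fraction` and `evaluate`, and the decision to store a continued fraction as a plain list of whole numbers, are assumptions of mine made for illustration.

```python
from fractions import Fraction


def continued_fraction(a, b):
    """Whole-number parts of a/b, read off from Euclid's algorithm."""
    terms = []
    while b:
        q, r = divmod(a, b)  # one line of Euclid's algorithm: a = b*q + r
        terms.append(q)
        a, b = b, r
    return terms


def evaluate(terms):
    """Collapse a finite continued fraction, working from the bottom up."""
    value = Fraction(terms[-1])
    for t in reversed(terms[:-1]):
        value = t + 1 / value
    return value


print(continued_fraction(19, 7))  # the quotients from Euclid's algorithm
print(evaluate([2, 1, 2, 2]))     # the continued fraction for 19/7
print(evaluate([1, 1, 2, 1]))     # the truncation used in Example 4.5.1
```

The list passed to `evaluate` in the last line consists of the first four whole-number parts found for √3 in the example.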

4.5.2 Rabbits and pentagons

We now illustrate some of the ways that algebra and geometry may interact. We begin with an artificial looking question. In his book, Liber Abaci, Fibonacci raised the following little puzzle which I’ve taken from MacTutor:

“A certain man put a pair of rabbits in a place surrounded on all sides by a wall. How many pairs of rabbits can be produced from that pair in a year if it is supposed that every month each pair begets a new pair which from the second month on becomes productive?”

These are obviously mathematical rabbits rather than real ones so let me spell out the rules more explicitly:

Rule 1 The problem begins with one pair of immature rabbits.²

Rule 2 Each immature pair of rabbits takes one month to mature.

Rule 3 Each mature pair of rabbits produces a new immature pair at the end of a month.

Rule 4 The rabbits are immortal.

² Fibonacci himself seems to have assumed that the starting pair was already mature but we shan’t.

The important point is that we must solve the problem using the rules we have been given. To do this, I am going to draw a picture. I will represent an immature pair of rabbits by ◦ and a mature pair by •. Rule 2 will be represented by an arrow from ◦ to •, and Rule 3 will be represented by two arrows leaving •, one to • and one to ◦. Rule 1 tells us that we start with ◦. Applying the rules we obtain the following picture for the first 4 months.

[Figure: tree diagram of the rabbit population. Start: ◦ (1 pair). Month 1: • (1 pair). Month 2: • ◦ (2 pairs). Month 3: • ◦ • (3 pairs). Month 4: • ◦ • • ◦ (5 pairs).]

We start with 1 pair and at the end of the first month we still have 1 pair, at the end of the second month 2 pairs, at the end of the third month 3 pairs, and at the end of the fourth month 5 pairs. I shall write this F0 = 1, F1 = 1, F2 = 2, F3 = 3, F4 = 5, and so on. Thus the problem will be solved if we can compute F12. There is an apparent pattern in the sequence of numbers 1, 1, 2, 3, 5, . . .: after the first two terms in the sequence each number is the sum of the previous two. Let’s check that we are not just seeing things. Suppose that the number of immature pairs of rabbits at a given time t is $I_t$ and the number of mature pairs is $M_t$, so that $F_t = M_t + I_t$. Then using our rules at time t + 1 we have that $M_{t+1} = M_t + I_t$ and $I_{t+1} = M_t$. Thus

$$F_{t+1} = 2M_t + I_t.$$

Similarly

$$F_{t+2} = 3M_t + 2I_t.$$

It is now easy to check that

$$F_{t+2} = F_{t+1} + F_t.$$
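The four rules can also be simulated directly, which gives an independent check on the recurrence. In this Python sketch the function name `rabbit_pairs` is my own, and I assume that Rules 2 and 3 are applied simultaneously at the end of each month.

```python
def rabbit_pairs(months):
    """Total number of pairs after the given number of months."""
    immature, mature = 1, 0  # Rule 1: one immature pair at the start
    for _ in range(months):
        # Rule 2: every immature pair matures.
        # Rule 3: every mature pair produces one new immature pair.
        # Rule 4: nothing is ever removed.
        immature, mature = mature, mature + immature
    return immature + mature


print([rabbit_pairs(t) for t in range(5)])
print(rabbit_pairs(12))
```

The first line of output can be compared with the pair counts read off from the picture, and the second is the quantity F12 that the puzzle asks for.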

The sequence of numbers such that F0 = 1, F1 = 1 and satisfying the rule Ft+2 = Ft+1 + Ft is called the Fibonacci sequence. We have that

F0 = 1,F1 = 1,F2 = 2,F3 = 3,F4 = 5,F5 = 8,F6 = 13,F7 = 21,

F8 = 34, F9 = 55, F10 = 89, F11 = 144, F12 = 233.

The solution to the original question is therefore 233 pairs of rabbits.

Fibonacci numbers arise in the most diverse situations: famously, in phyllotaxis which is the study of how leaves and petals are arranged on plants. We shall now look for a formula that will enable us to calculate Fn directly. To begin, we’ll follow an idea due to the astronomer Johannes Kepler, and look at the behaviour of the fractions $\frac{F_{n+1}}{F_n}$ as n gets bigger and bigger. I have tabulated some calculations below.

Ratio   F1/F0   F2/F1   F3/F2   F4/F3   F5/F4   F6/F5   F7/F6   F14/F13
Value   1       2       1.5     1.667   1.6     1.625   1.615   1.6180

These ratios seem to be going somewhere; the question is: where? Notice that

$$\frac{F_{n+1}}{F_n} = \frac{F_n + F_{n-1}}{F_n} = 1 + \frac{F_{n-1}}{F_n} = 1 + \cfrac{1}{\frac{F_n}{F_{n-1}}}.$$

But for very large n we suspect that $\frac{F_{n+1}}{F_n}$ and $\frac{F_n}{F_{n-1}}$ will be almost the same. This suggests, but doesn’t prove, that we need to find the positive solution x to

$$x = 1 + \frac{1}{x}.$$

Thus x is a number that when you take its reciprocal and add 1 you get x back again. This problem is really a quadratic equation in disguise $x^2 = x + 1$ or more usually

$$x^2 - x - 1 = 0.$$

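This equation can be solved very simply to give us

$$x = \frac{1 \pm \sqrt{5}}{2}.$$

That is

$$\phi = \frac{1 + \sqrt{5}}{2} \quad\text{and}\quad \bar{\phi} = \frac{1 - \sqrt{5}}{2}.$$

The number φ is called the golden ratio, about which a deal of nonsense has been written. Let’s go back and see if this calculation makes sense. First we calculate φ and we get φ = 1.618033988 . . . I compute

$$\frac{F_{19}}{F_{18}} = \frac{6765}{4181} = 1.618033963$$

on my pocket calculator. This is pretty close. We can now get our formula for the Fibonacci numbers. Define

$$f_n = \frac{1}{\sqrt{5}}\left(\phi^{n+1} - \bar{\phi}^{n+1}\right).$$

I’m going to show you that Fn = fn. To do this, I’ll use the following identities which are straightforward to check

$$\phi - \bar{\phi} = \sqrt{5}, \qquad \phi^2 = \phi + 1 \quad\text{and}\quad \bar{\phi}^2 = \bar{\phi} + 1.$$

Let’s start with f0. We know that $\phi - \bar{\phi} = \sqrt{5}$ and so we really do have that f0 = 1. To calculate f1 we use the other formulae and again we get f1 = 1. We now calculate fn + fn+1 and we get

$$\begin{aligned}
f_n + f_{n+1} &= \frac{1}{\sqrt{5}}\left(\phi^{n+1} - \bar{\phi}^{n+1}\right) + \frac{1}{\sqrt{5}}\left(\phi^{n+2} - \bar{\phi}^{n+2}\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+1} + \phi^{n+2} - (\bar{\phi}^{n+1} + \bar{\phi}^{n+2})\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+1}(1 + \phi) - \bar{\phi}^{n+1}(1 + \bar{\phi})\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+1}\phi^{2} - \bar{\phi}^{n+1}\bar{\phi}^{2}\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+3} - \bar{\phi}^{n+3}\right) = f_{n+2}.
\end{aligned}$$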

Because fn and Fn start in the same place and satisfy the same rules, we have therefore proved that

$$F_n = \frac{1}{\sqrt{5}}\left(\phi^{n+1} - \bar{\phi}^{n+1}\right).$$
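It is reassuring to check the formula numerically. The Python sketch below compares it with the recurrence for the first few dozen terms; it uses floating-point arithmetic, so the formula is rounded to the nearest whole number before comparing, and the names `binet` and `fib` are my own labels.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2
PHI_BAR = (1 - sqrt(5)) / 2


def binet(n):
    """The closed formula, with the indexing F0 = F1 = 1 used in this section."""
    return (PHI ** (n + 1) - PHI_BAR ** (n + 1)) / sqrt(5)


def fib(n):
    """The same number computed from the rule F(t+2) = F(t+1) + F(t)."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a


for n in range(13):
    print(n, fib(n), round(binet(n)))
```

For very large n the rounding errors in floating-point arithmetic would eventually spoil the agreement, so this is a sanity check and not a proof.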

At this point, we can go back and verify our original idea that the fractions Fn+1 seem to get closer and closer to φ as n gets larger and larger. We have Fn that F φn+2 − φ¯n+2 n+1 = n+1 n+1 Fn φ − φ¯ φ 1 = − φ¯ n+1 1 ( φ )n+1 − 1 1 − ( φ ) φ¯ φ¯ φ¯ I have rewritten it like this so that we can see what happens as n gets larger φ¯ and larger. Observe that the absolute value of φ is less than 1. So as n gets larger and larger the first term above gets closer and closer to φ. Now look φ at the second term. The absolute value of the fraction φ¯ is strictly greater than 1. Thus as n gets larger and larger the denominator of the second term gets larger and larger and so the fraction as a whole gets smaller and smaller. Thus we have proved that Fn+1 really is close to φ when n is large. Fn So far, what we have been doing is algebra. I shall now show that there is geometry here as well. Below is a picture of a regular pentagon. I have assumed that the length of the sides is 1. I claim that the length of a diagonal, such as BE, is equal to φ.

[Figure: a regular pentagon with vertices A, B, C, D, E and side length 1; a diagonal is labelled φ.]

To prove this I am going to use Ptolemy’s theorem. We shall concentrate on the cyclic quadrilateral formed by the vertices ABDE.

[Figure: the regular pentagon ABCDE with the cyclic quadrilateral ABDE.]

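I’ll let the length of a diagonal be x. The quadrilateral ABDE has sides AB = DE = EA = 1 and BD = x, and its two diagonals AD and BE both have length x. Ptolemy’s theorem says that in a cyclic quadrilateral the product of the diagonals equals the sum of the products of the two pairs of opposite sides, and so we have that

\[ x^2 = 1 + x. \]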

But this is precisely the quadratic equation we solved above. Its positive solution is φ and so the length of a diagonal of a regular pentagon with side 1 is φ. This raises the question of whether we can somehow see the Fibonacci numbers in the regular pentagon. The answer is: almost. Consider the diagram below.

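[Figure: the regular pentagon ABCDE with all its diagonals drawn; the points where the diagonals cross are labelled a′, b′, c′, d′, e′.]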

I’ve drawn in all the diagonals. The shaded triangle BCD is similar to the shaded triangle Ac′E. This means that they have exactly the same shape, just different sizes. It follows that

\[ \frac{Ac'}{AE} = \frac{BC}{BD}. \]

But AE is a side of the pentagon and so has unit length, and BD is of length φ. Thus
\[ Ac' = \frac{1}{\phi}. \]
Now, Dc′ has the same length as BC which is a side of the pentagon. Thus Dc′ = 1. We now have
\[ \phi = DA = Dc' + c'A = 1 + \frac{1}{\phi}. \]
Thus, just from geometry, we get
\[ \phi = 1 + \frac{1}{\phi}. \]
This is a very odd equation because φ is mentioned on both sides. Let’s go with it and repeat:
\[ \phi = 1 + \cfrac{1}{1 + \cfrac{1}{\phi}} \]
and
\[ \phi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{\phi}}} \]
and
\[ \phi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{\phi}}}} \]
and so on. We therefore obtain a continued fraction. For each of these fractions cover up the term $\frac{1}{\phi}$ and then calculate what you see to get

\[ 1, \qquad 1 + \frac{1}{1} = 2, \qquad 1 + \cfrac{1}{1 + \cfrac{1}{1}} = \frac{3}{2}, \qquad 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1}}} = \frac{5}{3}, \quad \ldots \]
and the Fibonacci sequence reappears.

Chapter 5

Complex numbers

Why be one-dimensional when you can be two-dimensional?

[Figure: the real number line with the points −3, −2, −1, 0, 1, 2, 3 marked, and a question mark above and below the line.]

We begin by returning to the familiar number line. Where I have placed the question marks there appear to be no numbers. I shall rectify this by defining the complex numbers, which give us a number plane rather than just a number line. Complex numbers play a fundamental rôle in mathematics. For example, in this chapter, I shall use them to show how e and π, numbers of radically different origins, are in fact connected.

5.1 Complex number arithmetic

In the set of real numbers we can add, subtract, multiply and divide, but we cannot always extract square roots. For example, the real number 1 has the two real square roots 1 and −1, whereas the real number −1 has no real square roots, the reason being that the square of any real non-zero number is always positive. In this section, we shall repair this lack of square roots and, as we shall learn, we shall in fact have achieved much more than this. Complex numbers were first studied in the 1500s but were only fully accepted and used in the 1800s.

Warning! If r is a positive real number then $\sqrt{r}$ is usually interpreted to mean the positive square root. If I want to emphasize that both square roots need to be considered I shall write $\pm\sqrt{r}$.

When the discriminant of a quadratic equation is strictly less than zero, we know that it has no real roots. In this section, we shall show that in this case the equation has two complex roots. This will mean that quadratic equations will always have two roots. The key step is the following.

We introduce a new number, denoted by i, whose defining property is that $i^2 = -1$. We shall assume that in all other respects it satisfies the usual axioms of high-school algebra. This assumption will be justified later.

We shall now explore the consequences of this definition, which turns out to be a profound one for mathematics. The numbers i and −i are the two ‘missing’ square roots of −1. In all other respects, the number i will behave like a real number. Thus if b is any real number then bi is a number, and if a is any real number then a + bi is a number. We therefore formally define a complex number to be a number of the form a + bi where a, b ∈ R. We denote the set of complex numbers by C. Complex numbers are sometimes called imaginary numbers. This is not such a good term: they are not figments of our imagination like unicorns or dragons. Like all numbers they are, however, products of our imagination: no one has seen the complex number i but, then again, no one has seen the number 2.

If z = a + bi then we call a the real part of z, denoted Re(z), and b the complex or imaginary part of z, denoted Im(z). Two complex numbers a + bi and c + di are equal precisely when a = c and b = d. In other words, when their real parts are equal and when their complex parts are equal. We can think of every real number as being a special kind of complex number because if a is real then a = a + 0i. Thus R ⊆ C. Complex numbers of the form bi are said to be purely imaginary.

Now we show that we can add, subtract, multiply and divide complex numbers. Addition, subtraction and multiplication are all easy. Let a + bi, c + di ∈ C.
To add these numbers means to calculate (a + bi) + (c + di). We assume that the order in which we add complex numbers doesn’t matter and that we may bracket sums of complex numbers how we like and still get the same answer. We can therefore rewrite this as a + c + bi + di. Next we assume that multiplication of complex numbers distributes over addition of complex numbers to get (a + c) + (b + d)i. Thus

(a + bi) + (c + di) = (a + c) + (b + d)i.

The definition of subtraction is similar and justified in the same way:

(a + bi) − (c + di) = (a − c) + (b − d)i.

To multiply our numbers means to calculate (a + bi)(c + di). We first assume complex multiplication distributes over complex addition to get $(a + bi)(c + di) = ac + adi + bic + bidi$. Next we assume that the order in which we multiply complex numbers doesn’t matter to get $ac + adi + bic + bidi = ac + adi + bci + bdi^2$. Now we use the fact that $i^2 = -1$ to get $ac + adi + bci + bdi^2 = ac + adi + bci - bd$. We now rearrange the terms to get the following definition of multiplication:

(a + bi)(c + di) = (ac − bd) + (ad + bc)i.
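The two definitions are easy to try out. The sketch below represents the complex number a + bi by the pair (a, b); this representation and the names `add` and `mul` are choices made here for illustration, not part of the text. Python’s built-in complex type is used only as a cross-check.

```python
def add(z, w):
    """(a + bi) + (c + di) = (a + c) + (b + d)i."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

print(add((7, -1), (-6, 3)))   # (7 - i) + (-6 + 3i)
print(mul((2, 1), (1, 2)))     # (2 + i)(1 + 2i)

# Cross-check against the built-in complex numbers.
assert complex(*mul((2, 1), (1, 2))) == (2 + 1j) * (1 + 2j)
```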

Examples 5.1.1. Carry out the following calculations.

1. (7 − i) + (−6 + 3i). We add together the real parts to get 1; adding together −i and 3i we get 2i. Thus the solution is 1 + 2i.

2. (2 + i)(1 + 2i). First we multiply out the brackets as usual to get $2 + 4i + i + 2i^2$. We now use the fact that $i^2 = -1$ to get 2 + 4i + i − 2. Finally we simplify to get 0 + 5i = 5i.

3. $\left(\frac{1 - i}{\sqrt{2}}\right)^2$. Multiply out and simplify to get −i.

The final operation is division. We have to show that when a + ib ≠ 0 the reciprocal
\[ \frac{1}{a + ib} \]
is also a complex number. We use an idea that can also be applied in other situations called rationalizing the denominator. It is convenient first to define a new operation on complex numbers. Let z = a + bi ∈ C. Define
\[ \bar{z} = a - bi. \]

The number $\bar{z}$ is called the complex conjugate of z. Why is this operation useful? Let’s calculate $z\bar{z}$. We have

\[ z\bar{z} = (a + bi)(a - bi) = a^2 - abi + abi - b^2i^2 = a^2 + b^2. \]

Notice that $z\bar{z} = 0$ if and only if z = 0. Thus for non-zero complex numbers z, the number $z\bar{z}$ is a positive real number. Let’s see how we can use the complex conjugate to define division of complex numbers. Our goal is to calculate
\[ \frac{1}{a + bi} \]
where a + bi ≠ 0. The first step is to multiply top and bottom by the complex conjugate of a + bi. We therefore get
\[ \frac{1}{a + bi} = \frac{a - bi}{(a + bi)(a - bi)} = \frac{a - bi}{a^2 + b^2} = \frac{1}{a^2 + b^2}(a - bi). \]
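The same recipe can be written out step by step. This is a sketch only: complex numbers are again represented as pairs (a, b) of integers, the names `conj`, `mul` and `div` are hypothetical, and Python’s `Fraction` type is used so that the answers come out as exact fractions rather than decimals.

```python
from fractions import Fraction

def conj(z):
    """Complex conjugate of a + bi, returned as the pair (a, -b)."""
    a, b = z
    return (a, -b)

def mul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def div(z, w):
    """z / w: multiply top and bottom by the conjugate of w."""
    c, d = w
    n = c * c + d * d            # w times its conjugate: a positive real when w != 0
    if n == 0:
        raise ZeroDivisionError("division by the complex number 0")
    p, q = mul(z, conj(w))       # the new numerator
    return (Fraction(p, n), Fraction(q, n))

print(div((4, 3), (7, -1)))      # (4 + 3i) / (7 - i)
```

Here `div((4, 3), (7, -1))` gives the pair (1/2, 1/2), that is, the complex number (1 + i)/2.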

Examples 5.1.2. Carry out the following calculations.

1. $\frac{1+i}{i}$. The complex conjugate of i is −i. Multiply top and bottom of the fraction by −i to get $\frac{-i+1}{1} = 1 - i$.

2. $\frac{i}{1-i}$. The complex conjugate of 1 − i is 1 + i. Multiply top and bottom of the fraction by 1 + i to get $\frac{i(1+i)}{2} = \frac{i-1}{2}$.

3. $\frac{4+3i}{7-i}$. The complex conjugate of 7 − i is 7 + i. Multiply top and bottom of the fraction by 7 + i to get $\frac{(4+3i)(7+i)}{50} = \frac{1+i}{2}$.

We shall need the following properties of the complex conjugate later on.

Lemma 5.1.3.

1. $\overline{z_1 + \ldots + z_n} = \overline{z_1} + \ldots + \overline{z_n}$.

2. $\overline{z_1 \ldots z_n} = \overline{z_1} \ldots \overline{z_n}$.

3. z is real if and only if $\overline{z} = z$.

Proof. (1) We prove the case where n = 2. The general case can then be proved using induction. Let $z_1 = a + ib$ and $z_2 = c + id$. Then $z_1 + z_2 = (a + c) + i(b + d)$. Thus $\overline{z_1 + z_2} = (a + c) - i(b + d)$. But $\overline{z_1} = a - ib$ and $\overline{z_2} = c - id$ and so $\overline{z_1} + \overline{z_2} = (a - ib) + (c - id) = (a + c) - (b + d)i$. Thus $\overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}$.

(2) We prove the case where n = 2. The general case can then be proved using induction. Using the notation from part (1), we have that $z_1z_2 = (ac - bd) + (ad + bc)i$. Thus $\overline{z_1z_2} = (ac - bd) - (ad + bc)i$. On the other hand, $\overline{z_1}\,\overline{z_2} = (a - ib)(c - id) = (ac - bd) - i(ad + bc)$, as required.

(3) If z is real then it is immediate that $\overline{z} = z$. Suppose that $\overline{z} = z$ where z = a + ib. Then a + ib = a − ib. Hence b = −b and so b = 0. It follows that z is real.

We now introduce a way of thinking about complex numbers that enables us to visualize them. A complex number z = a + bi has two components: a and b. It is irresistible to plot these as a point in the plane. The plane used in this way is called the complex plane: the x-axis is the real axis and the y-axis is interpreted as the complex axis.

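[Figure: the complex number z = a + ib plotted in the complex plane, with a marked on the real axis and ib on the complex axis.]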

Although a complex number can be thought of as labelling a point in the complex plane, it can also be regarded as labelling the directed line segment from the origin to the point, and this turns out to be the more fruitful viewpoint. By Pythagoras’ theorem, the length of this line is $\sqrt{a^2 + b^2}$. We define
\[ |z| = \sqrt{a^2 + b^2} \]
where z = a + bi. This is called the modulus¹ of the complex number z. Observe that
\[ |z| = \sqrt{z\bar{z}}. \]
We shall use the following important property of moduli.

Lemma 5.1.4. |wz| = |w| |z|.

Proof. Let w = a + bi and z = c + di. Then wz = (ac − bd) + (ad + bc)i. Now $|wz| = \sqrt{(ac - bd)^2 + (ad + bc)^2}$ whereas $|w|\,|z| = \sqrt{(a^2 + b^2)(c^2 + d^2)}$. But
\[ (ac - bd)^2 + (ad + bc)^2 = (ac)^2 + (bd)^2 + (ad)^2 + (bc)^2 = (a^2 + b^2)(c^2 + d^2). \]
Thus the result follows.

The complex numbers were obtained from the reals by simply adjoining one new number, i, a square root of −1. Remarkably, every complex number has a square root: there is no need to invent any new numbers.

Theorem 5.1.5. Every nonzero complex number has exactly two square roots.

Proof. Let z = a + bi be a nonzero complex number. We want to find a complex number w so that $w^2 = z$. Let w = x + yi. Then we need to find real numbers x and y such that $(x + yi)^2 = a + bi$. Thus $(x^2 - y^2) + 2xyi = a + bi$, and so equating real and imaginary parts, we have to solve the following two equations
\[ x^2 - y^2 = a \quad\text{and}\quad 2xy = b. \]
Now we actually have enough information to solve our problem, but we can make life easier for ourselves by adding one extra equation. To get it, we use the modulus function. From $(x + yi)^2 = a + bi$ we get that $|x + yi|^2 = |a + bi|$. Now $|x + yi|^2 = x^2 + y^2$ and $|a + bi| = \sqrt{a^2 + b^2}$. We therefore have three equations
\[ x^2 - y^2 = a \quad\text{and}\quad 2xy = b \quad\text{and}\quad x^2 + y^2 = \sqrt{a^2 + b^2}. \]
If we add the first and third equation together we get
\[ x^2 = \frac{a}{2} + \frac{\sqrt{a^2 + b^2}}{2} = \frac{a + \sqrt{a^2 + b^2}}{2}. \]
We can now solve for x and therefore for y.

¹ Plural: moduli.

Example 5.1.6. Every negative real number has two square roots. We have that the square roots of −r, where r > 0, are $\pm i\sqrt{r}$.

Example 5.1.7. Find both square roots of 3 + 4i and check your answers. We assume that there is a complex number x + yi where both x and y are real such that $(x + yi)^2 = 3 + 4i$. Squaring and comparing real and imaginary parts we get that the following two equations must be satisfied by x and y

\[ x^2 - y^2 = 3 \quad\text{and}\quad 2xy = 4. \]

We also have a third equation by taking moduli

\[ x^2 + y^2 = 5. \]

Adding the first and third equation together we get $2x^2 = 8$ and so x = ±2. Thus y = 1 if x = 2 and y = −1 if x = −2. The roots we want are therefore 2 + i and −2 − i. Of course, one root will be minus the other. Now square either root to check your answer: $(2 + i)^2 = 4 + 4i - 1 = 3 + 4i$, as required.

Remark Notice that the two square roots of a non-zero complex number will have the form w and −w; in other words, one root will be −1 times the other.
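The method of Theorem 5.1.5 can be turned into a short routine. The following is a sketch under some assumptions of my own: it works in floating point, the name `square_roots` is illustrative, and the value of $y^2$ is obtained by subtracting the first equation from the third (a step the text leaves to the reader). The sign of y is then fixed by the equation 2xy = b.

```python
from math import sqrt, hypot

def square_roots(a, b):
    """Return the two square roots of a + bi as pairs (x, y), meaning x + yi."""
    r = hypot(a, b)                     # |a + bi| = sqrt(a^2 + b^2)
    x = sqrt(max(0.0, (r + a) / 2))     # from x^2 = (a + |z|) / 2
    y = sqrt(max(0.0, (r - a) / 2))     # from y^2 = (|z| - a) / 2
    if b < 0:
        y = -y                          # 2xy = b < 0 forces x and y to have opposite signs
    return (x, y), (-x, -y)

print(square_roots(3, 4))               # the two square roots of 3 + 4i
```

For 3 + 4i the routine returns the pairs (2.0, 1.0) and (−2.0, −1.0), matching the roots 2 + i and −2 − i found in Example 5.1.7.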

If we combine our method for solving quadratics with our method for determining the square roots of complex numbers, we have a method for finding the roots of quadratics with any coefficients, whether they be real or complex.

Example 5.1.8. Solve the quadratic equation

\[ 4z^2 + 4iz + (-13 - 16i) = 0. \]

The complex numbers obey the same algebraic laws as the reals and so we can solve this equation by completing the square or we can simply plug the numbers into the formula for the roots of a quadratic. Here I shall complete the square. First, we convert the equation into a monic one

\[ z^2 + iz + \frac{(-13 - 16i)}{4} = 0. \]

Next, we observe that

\[ \left(z + \frac{i}{2}\right)^2 = z^2 + iz - \frac{1}{4}. \]

Thus
\[ z^2 + iz = \left(z + \frac{i}{2}\right)^2 + \frac{1}{4}. \]
Our equation therefore becomes

\[ \left(z + \frac{i}{2}\right)^2 + \frac{1}{4} + \left(-\frac{13}{4} - 4i\right) = 0. \]

We therefore have
\[ \left(z + \frac{i}{2}\right)^2 = 3 + 4i. \]
Taking square roots of both sides using a previous calculation, we have that
\[ z + \frac{i}{2} = 2 + i \quad\text{or}\quad -2 - i. \]

It follows that $z = 2 + \frac{i}{2}$ or $-2 - \frac{3i}{2}$. Now check that these roots really do work.
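One way to carry out the suggested check is numerically. The sketch below takes the other route mentioned in the example, plugging the coefficients into the formula for the roots of a quadratic; it relies on Python’s built-in complex numbers and on `cmath.sqrt` from the standard library, and the function name `solve_quadratic` is illustrative only.

```python
import cmath

def solve_quadratic(a, b, c):
    """Roots of a z^2 + b z + c = 0 for complex a, b, c with a != 0."""
    d = cmath.sqrt(b * b - 4 * a * c)   # one square root of the discriminant
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

z1, z2 = solve_quadratic(4, 4j, -13 - 16j)
print(z1, z2)

# Substitute each root back into the equation; the results should be (close to) zero.
for z in (z1, z2):
    print(abs(4 * z * z + 4j * z + (-13 - 16j)))
```

Because the arithmetic is in floating point, the residuals printed at the end need only be very small, not exactly zero.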

Every quadratic equation ALWAYS has exactly two roots.

Exercises 5.1

1. Solve the following problems in complex number arithmetic. In each case, the answer should be in the form a + ib where a and b are real.

(a) (2 + 3i) + (4 + i).
(b) (2 + 3i)(4 + i).
(c) $(8 + 6i)^2$.
(d) $\frac{2+3i}{4+i}$.
(e) $\frac{1}{i} + \frac{3}{1+i}$.
(f) $\frac{3+4i}{3-4i} - \frac{3-4i}{4+4i}$.

2. Find the square roots of each of the following complex numbers and check your answers.

(a) −i.
(b) $-1 + i\sqrt{24}$.
(c) −13 − 84i.

3. Solve the following quadratic equations and check your answers.

(a) $x^2 + x + 1 = 0$.
(b) $2x^2 - 3x + 2 = 0$.
(c) $x^2 - (2 + 3i)x - 1 + 3i = 0$.

5.2 The fundamental theorem of algebra

We have proved that every quadratic equation has exactly two roots. The goal of this section is to generalize this result: I shall prove that every polynomial equation of degree n has exactly n roots. This result plays a key role in calculus where it is used (in its real version which I also describe) to prove that any rational function can be integrated using partial fractions. In this section, we shall work with arbitrary polynomials so I shall now recall some of the terminology needed to handle them. An expression

\[ a_nx^n + a_{n-1}x^{n-1} + \ldots + a_1x + a_0 \]
where the $a_i$ are complex numbers, called the coefficients, is called a polynomial. If all the coefficients are zero then the polynomial is identically zero and we shall call it the zero polynomial. Otherwise, we assume $a_n \neq 0$. The degree of this polynomial is n. We abbreviate this to deg. If $a_n = 1$ the polynomial is said to be monic. The term $a_0$ is called the constant term and the term $a_nx^n$ is called the leading term. Polynomials can be added, subtracted and multiplied. Two polynomials are equal if they have the same degree and the coefficients of terms of the same degree are equal.

• Polynomials of degree 1 are said to be linear.

• those of degree 2, quadratic.

• those of degree 3, cubic.

• those of degree 4, quartic.

• those of degree 5, quintic.

There are special terms for polynomials of degree higher than 5, if you want them.

Why are polynomials interesting? There are two answers to this question. First, they have widespread applications such as in helping to solve linear differential equations and in studying matrices. Second, a polynomial defines a function which is calculated in a very simple way using the operations of addition, subtraction and multiplication. However many, more complicated, functions can be usefully approximated by polynomial ones.

We denote by C[x] the set of polynomials with complex coefficients and by R[x], the set of polynomials with real coefficients. I will write F[x] to mean F = R or F = C.

5.2.1 The remainder theorem

The addition, subtraction and multiplication of polynomials is easy. We shall therefore concentrate in this section on division. Let f(x), g(x) ∈ F[x]. We say that g(x) divides f(x), denoted by

\[ g(x) \mid f(x), \]
if there is a polynomial q(x) ∈ F[x] such that f(x) = g(x)q(x). We say that g(x) is a factor of f(x). There are obvious similarities here with our work in Chapter 4.

Example 5.2.1. Let $f(x) = x^4 + 2x + 1$ and $g(x) = x + 1$. Then

\[ x + 1 \mid x^4 + 2x + 1 \]
since $x^4 + 2x + 1 = (x + 1)(x^3 - x^2 + x + 1)$.

In multiplying and dividing polynomials the following result is key.

Lemma 5.2.2. Let f(x), g(x) ∈ F[x] be non-zero polynomials. Then

\[ \deg f(x)g(x) = \deg f(x) + \deg g(x). \]

Proof. Let f(x) have leading term $a_mx^m$ and let g(x) have leading term $b_nx^n$. Then the leading term of f(x)g(x) is $a_mb_nx^{m+n}$. Now $a_mb_n \neq 0$ and so the degree of f(x)g(x) is m + n, as required.

The following result is analogous to the remainder theorem for integers, Lemma 4.1.1. I shall not prove it here.

Lemma 5.2.3 (Remainder theorem). Let f(x) and g(x) be polynomials in F[x] where deg f(x) ≥ deg g(x). Then either

\[ g(x) \mid f(x) \quad\text{or}\quad f(x) = g(x)q(x) + r(x) \]
where deg r(x) < deg g(x).

Example 5.2.4. Let $f(x) = x^3 + x + 3$ and $g(x) = x^2 + x$. Then $x^3 + x + 3 = (x - 1)(x^2 + x) + (2x + 3)$. Here x − 1 is the quotient and 2x + 3 is the remainder.

The following example is a reminder of how to carry out long division of polynomials. Remember that answers can always be checked by multiplying out.

Example 5.2.5. Divide $6x^4 + 5x^3 + 4x^2 + 3x + 2$ by $2x^2 + 4x + 5$ and so find the quotient and remainder. We set out the computation in the following form.
\[
\begin{array}{r|l}
2x^2 + 4x + 5 & 6x^4 + 5x^3 + 4x^2 + 3x + 2
\end{array}
\]
To get the term involving $6x^4$ we would have to multiply the lefthand side by $3x^2$. As a result we write down the following
\[
\begin{array}{r|l}
 & 3x^2 \\ \hline
2x^2 + 4x + 5 & 6x^4 + 5x^3 + 4x^2 + 3x + 2 \\
 & 6x^4 + 12x^3 + 15x^2
\end{array}
\]

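We now subtract the lower righthand side from the upper and we get
\[
\begin{array}{r|l}
 & 3x^2 \\ \hline
2x^2 + 4x + 5 & 6x^4 + 5x^3 + 4x^2 + 3x + 2 \\
 & 6x^4 + 12x^3 + 15x^2 \\ \hline
 & -7x^3 - 11x^2 + 3x + 2
\end{array}
\]

The procedure is now repeated with the new polynomial.

\[
\begin{array}{r|l}
 & 3x^2 - \frac{7}{2}x \\ \hline
2x^2 + 4x + 5 & 6x^4 + 5x^3 + 4x^2 + 3x + 2 \\
 & 6x^4 + 12x^3 + 15x^2 \\ \hline
 & -7x^3 - 11x^2 + 3x + 2 \\
 & -7x^3 - 14x^2 - \frac{35}{2}x \\ \hline
 & 3x^2 + \frac{41}{2}x + 2
\end{array}
\]
The procedure is repeated one more time with the new polynomial

\[
\begin{array}{r|ll}
 & 3x^2 - \frac{7}{2}x + \frac{3}{2} & \text{quotient} \\ \hline
2x^2 + 4x + 5 & 6x^4 + 5x^3 + 4x^2 + 3x + 2 & \\
 & 6x^4 + 12x^3 + 15x^2 & \\ \hline
 & -7x^3 - 11x^2 + 3x + 2 & \\
 & -7x^3 - 14x^2 - \frac{35}{2}x & \\ \hline
 & 3x^2 + \frac{41}{2}x + 2 & \\
 & 3x^2 + \frac{12}{2}x + \frac{15}{2} & \\ \hline
 & \frac{29}{2}x - \frac{11}{2} & \text{remainder}
\end{array}
\]
This is the end of the line because the new polynomial we obtain has degree strictly less than the polynomial we are dividing by. What we have shown is that
\[ 6x^4 + 5x^3 + 4x^2 + 3x + 2 = \left(2x^2 + 4x + 5\right)\left(3x^2 - \frac{7}{2}x + \frac{3}{2}\right) + \frac{29}{2}x - \frac{11}{2}. \]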

You can verify this is true by multiplying out the righthand side.
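The long division procedure is mechanical enough to be written as a short loop. The following is a sketch, not a library routine: a polynomial is represented as a list of coefficients with the highest degree first (a convention chosen here), the name `poly_divmod` is hypothetical, and `Fraction` is used so that coefficients such as 7/2 stay exact.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g (coefficient lists, highest degree first; g[0] != 0).

    Returns (quotient, remainder) as lists of Fractions.
    """
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    quotient = []
    while len(f) >= len(g):
        c = f[0] / g[0]              # next term of the quotient
        quotient.append(c)
        for i, gc in enumerate(g):   # subtract c * (shifted g) from f
            f[i] -= c * gc
        f.pop(0)                     # the leading term has been cancelled
    return quotient, f

q, r = poly_divmod([6, 5, 4, 3, 2], [2, 4, 5])
print(q)   # coefficients of the quotient
print(r)   # coefficients of the remainder
```

Applied to Example 5.2.5, the quotient list is [3, −7/2, 3/2] and the remainder list is [29/2, −11/2], in agreement with the computation by hand.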

5.2.2 Roots of polynomials

Let f(x) ∈ F[x]. A number r ∈ F is said to be a root or zero of f(x) if f(r) = 0. The roots of f(x) are the solutions of the equation f(x) = 0.

Example 5.2.6. The number 1 is a root of $x^{100} - 2x^{98} + 1$ because 1 − 2 + 1 = 0.

Checking whether a number is a root is easy, but finding a root in the first place is trickier. The next result tells us that when we find roots of polynomials we are in fact determining linear factors. It is crucial to everything we shall do.

Proposition 5.2.7. Let r ∈ F . Then r is a root of f(x) ∈ F [x] if and only if (x − r) | f(x).

Proof. Suppose that (x − r) | f(x). Then by definition f(x) = (x − r)q(x) for some polynomial q(x). If we now calculate f(r) we see immediately that it must be zero. We now prove the converse. Suppose that r is a root of f(x). By the remainder theorem, either (x − r) | f(x) or f(x) = q(x)(x − r) + r(x) where deg(r(x)) < deg(x − r) = 1. If the former then we are done. If the latter then it follows that r(x) is in fact a constant (that is, just a number). Call this number a. If we calculate f(r) we get a. It follows that in fact a = 0 and so (x − r) | f(x).

Example 5.2.8. We have seen that the number 1 is a root of $x^{100} - 2x^{98} + 1$. Thus by the above result $(x - 1) \mid x^{100} - 2x^{98} + 1$.

A root r of a polynomial f(x) is said to have multiplicity m if

\[ (x - r)^m \mid f(x) \]
but $(x - r)^{m+1}$ does not divide f(x). A root is always counted according to its multiplicity.

Example 5.2.9. The polynomial $x^2 + 2x + 1$ has −1 as a root and no other roots. However $(x + 1)^2 = x^2 + 2x + 1$ and so the root −1 occurs with multiplicity 2. Thus the polynomial has two roots counting multiplicities. This is the sense in which we can say that a quadratic equation always has two roots.

The following result is extremely useful. It provides an upper bound to the number of roots a polynomial may have.

Theorem 5.2.10. A non-constant polynomial of degree n has at most n roots.

Proof. Let f(x) be a non-zero polynomial of degree n > 0. Suppose that f(x) has a root a. Then $f(x) = (x - a)f_1(x)$ by Proposition 5.2.7 and the degree of $f_1(x)$ is n − 1. This argument can be repeated and we reach the desired conclusion.

5.2.3 The fundamental theorem of algebra

The big question I have so far not dealt with is whether a polynomial need have a root at all. This is answered by the following theorem whose name reflects its importance when first discovered, though not its significance in modern algebra. We shall not give a proof because that would require more advanced methods than are covered in this book. It was first proved by Gauss.

Theorem 5.2.11 (Fundamental theorem of algebra (FTA)). Every non-constant polynomial of degree n with complex coefficients has a root.

The fundamental theorem of algebra has the following important consequence using Theorem 5.2.10.

Corollary 5.2.12. Every non-constant polynomial with complex coefficients of degree n has exactly n complex roots (counting multiplicities). Thus every such polynomial can be written as a product of linear polynomials.

Proof. Let f(x) be a non-constant polynomial of degree n. By the FTA, this polynomial has a root $r_1$. Thus $f(x) = (x - r_1)f_1(x)$ where $f_1(x)$ is a polynomial of degree n − 1. This argument can be repeated and we eventually end up with $f(x) = a(x - r_1) \ldots (x - r_n)$ where a is the last quotient, necessarily a complex number.

Example 5.2.13. It can be checked that the quartic $x^4 - 5x^2 - 10x - 6$ has roots −1, 3, i − 1 and −1 − i. We can therefore write

\[ x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x + 1 + i)(x + 1 - i). \]
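The claim "it can be checked" can be taken literally: multiplying out the four linear factors should give back the quartic. The sketch below does this with Python’s built-in complex numbers; the coefficient-list convention (highest degree first) and the names `poly_mul` and `from_roots` are choices made here for illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def from_roots(roots):
    """Coefficients of the monic polynomial (x - r1)(x - r2)...(x - rn)."""
    p = [1]
    for r in roots:
        p = poly_mul(p, [1, -r])
    return p

print(from_roots([-1, 3, -1 + 1j, -1 - 1j]))
```

The imaginary parts cancel when the two conjugate factors are multiplied, leaving the real coefficients 1, 0, −5, −10, −6.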

In many practical examples, our polynomials will have real coefficients and we will want any factors of the polynomial to likewise be real. The result above doesn’t do that because it could produce complex factors. However, we can rectify this situation at a very small price. We shall use the notion of the complex conjugate of a complex number that we introduced earlier. We may now prove the following key lemma.

Lemma 5.2.14. Let f(x) be a polynomial with real coefficients. If the complex number z is a root then so too is $\bar{z}$.

Proof. Let
\[ f(x) = a_nx^n + a_{n-1}x^{n-1} + \ldots + a_1x + a_0 \]
where the $a_i$ are real numbers. Let z be a complex root. Then

\[ 0 = a_nz^n + a_{n-1}z^{n-1} + \ldots + a_1z + a_0. \]

Take the complex conjugate of both sides and use the properties of the complex conjugate (Lemma 5.1.3, together with the fact that each $a_i$ is real) to get

\[ 0 = a_n\bar{z}^n + a_{n-1}\bar{z}^{n-1} + \ldots + a_1\bar{z} + a_0 \]
and so $\bar{z}$ is also a root.

Example 5.2.15. We saw above that

\[ x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x + 1 + i)(x + 1 - i). \]

Observe that the complex roots −1 − i and −1 + i are complex conjugates of each other.

Lemma 5.2.16. Let z be a complex number which is not real. Then

\[ (x - z)(x - \bar{z}) \]
is an irreducible quadratic with real coefficients. On the other hand, if $x^2 + bx + c$ is an irreducible quadratic with real coefficients then its roots are complex conjugates of each other.

Proof. To prove the first claim, we multiply out to get

\[ (x - z)(x - \bar{z}) = x^2 - (z + \bar{z})x + z\bar{z}. \]

Observe that $z + \bar{z}$ and $z\bar{z}$ are both real numbers. The discriminant of this polynomial is $(z + \bar{z})^2 - 4z\bar{z} = (z - \bar{z})^2$. You can check that if z is complex and non-real then $z - \bar{z}$ is purely imaginary. It follows that its square is negative. We have therefore shown that our quadratic is irreducible. The proof of the second claim follows from the formula for the roots of a quadratic combined with the fact that the square root of a negative real will have the form ±αi where α is real.

Example 5.2.17. We saw above that

\[ x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x + 1 + i)(x + 1 - i). \]

Multiply out (x + 1 + i)(x + 1 − i) and we get $x^2 + 2x + 2$. Thus

\[ x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x^2 + 2x + 2) \]
with all the polynomials involved being real.

The following theorem is the one that we can use to help us solve problems involving real polynomials.

Theorem 5.2.18 (Fundamental theorem of algebra for real polynomials). Every non-constant polynomial with real coefficients can be written as a product of polynomials with real coefficients which are either linear or irreducible quadratic.

Proof. We can write the polynomial as a product of linear polynomials. Bring the real linear factors to the front. The remaining linear polynomials will have complex coefficients. They correspond to roots that come in complex conjugate pairs. Multiplying together those complex linear factors corresponding to complex conjugate roots we get real quadratics and the result is proved.

In fact, we can write any real polynomial as a real number times a product of monic linear and quadratic factors. This result is the basis of the method of partial fractions used in integrating rational functions in calculus. This is discussed in Chapter 6.

Finding the exact roots of a polynomial is difficult, in general. However, the following result tells us how to find the rational roots of polynomials with integer coefficients. It is a nice, and perhaps unexpected, application of the number theory we developed in Chapter 4.

Theorem 5.2.19 (Rational root theorem). Let

\[ f(x) = a_nx^n + a_{n-1}x^{n-1} + \ldots + a_1x + a_0 \]

be a polynomial with integer coefficients. If $\frac{r}{s}$ is a root with r and s coprime then $r \mid a_0$ and $s \mid a_n$. In particular, if the polynomial is monic then any rational roots must be integers and divide the constant term.

Proof. Substituting $\frac{r}{s}$ into f(x) we have, by assumption, that
\[ 0 = a_n\left(\frac{r}{s}\right)^n + a_{n-1}\left(\frac{r}{s}\right)^{n-1} + \ldots + a_1\left(\frac{r}{s}\right) + a_0. \]
Multiply through by $s^n$ to get

\[ 0 = a_nr^n + a_{n-1}sr^{n-1} + \ldots + a_1s^{n-1}r + a_0s^n. \]

We now make two observations. First, $r \mid a_0s^n$. I claim that r and $s^n$ are coprime. We may now deduce that $r \mid a_0$ from a previous exercise. It only remains to prove the claim. Let p be any prime that divides r and $s^n$. Then by Euclid’s lemma, p divides r and s which is a contradiction since r and s are coprime. It follows that $r \mid a_0$. Second, $s \mid a_nr^n$. By a similar argument to the previous case $s \mid a_n$.

Example 5.2.20. Find all the roots of the following polynomial

\[ x^4 - 8x^3 + 23x^2 - 28x + 12. \]

The polynomial is monic and so the only possible rational roots are integers and must divide 12. Thus the only possible rational roots are

±1, ±2, ±3, ±4, ±6, ±12.

We find immediately that 1 is a root and so (x − 1) must be a factor. Dividing out by this factor we get the quotient

\[ x^3 - 7x^2 + 16x - 12. \]

We check this polynomial for rational roots and find 2 works. Dividing out by (x − 2) we get the quotient

\[ x^2 - 5x + 6. \]

Once we get down to a quadratic we can solve it directly. In this case it factorizes as (x − 2)(x − 3). We therefore have that

\[ x^4 - 8x^3 + 23x^2 - 28x + 12 = (x - 1)(x - 2)^2(x - 3). \]

At this point, multiply out the righthand side and check that we really do have an equality. In this case, all roots are rational and are 1, 2, 2, 3.
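The search for rational roots in Example 5.2.20 can be automated directly from Theorem 5.2.19. The following is a sketch under stated assumptions: the polynomial has integer coefficients (highest degree first) with non-zero leading and constant terms, the helper names are hypothetical, and the polynomial is evaluated by Horner’s rule, which is a standard technique not discussed in the text. It reports each rational root once and does not detect multiplicities.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a non-zero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

def rational_roots(coeffs):
    """Distinct rational roots r/s, where r | a_0 and s | a_n."""
    a_n, a_0 = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * r, s)
                  for r in divisors(a_0)
                  for s in divisors(a_n)
                  for sign in (1, -1)}
    return sorted(x for x in candidates if evaluate(coeffs, x) == 0)

print(rational_roots([1, -8, 23, -28, 12]))
```

For the quartic of Example 5.2.20 the candidates are exactly ±1, ±2, ±3, ±4, ±6, ±12, and the ones that survive are 1, 2 and 3; dividing out, as in the example, then shows that 2 is a repeated root.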

Exercises 5.2

1. Find the quotient and remainder when the first polynomial is divided by the second.

(a) $x^3 - 7x - 1$ and $x - 2$.
(b) $x^4 - 2x^2 - 1$ and $x^2 + 3x - 1$.
(c) $2x^3 - 3x^2 + 1$ and $x$.

2. Find all roots using the information given.

(a) 4 is a root of $3x^3 - 20x^2 + 36x - 16$.
(b) −1, −2 are both roots of $x^4 + 2x^3 + x + 2$.

3. Find a cubic having roots 2, −3, 4.

4. Find a quartic having roots i, −i, 1 + i and 1 − i.

5. The cubic $x^3 + ax^2 + bx + c$ has roots α, β and γ. Show that a, b, c can each be written in terms of the roots.

6. $3 + i\sqrt{2}$ is a root of $x^4 + x^3 - 25x^2 + 41x + 66$. Find the remaining roots.

7. $1 - i\sqrt{5}$ is a root of $x^4 - 2x^3 + 4x^2 + 4x - 12$. Find the remaining roots.

8. Find all the roots of the following polynomials.

(a) $x^3 + x^2 + x + 1$.
(b) $x^3 - x^2 - 3x + 6$.
(c) $x^4 - x^3 + 5x^2 + x - 6$.

9. Write each of the following polynomials as a product of linear or quadratic real factors.

(a) $x^3 - 1$.
(b) $x^4 - 1$.
(c) $x^4 + 1$.

5.3 Complex number geometry

We have proved that every non-zero complex number has two square roots and from the fundamental theorem of algebra (FTA), we know that every non-zero complex number has three cube roots, and four fourth roots, and more generally n nth roots. However, we didn’t prove the FTA. The main goal of this section is to prove that every non-zero complex number has n nth-roots. To do this, we shall think about complex numbers in a geometric, rather than an algebraic, way. Throughout this section we shall not assume FTA. We shall only need Theorem 5.2.10: every polynomial of degree n has at most n roots.

5.3.1 sin and cos

We recall some well-known properties of the trigonometric functions sin and cos. First the addition formulae

\[ \sin(\alpha + \beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta \]
and
\[ \cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta. \]
These formulae were important historically because they enabled unknown values of sines and cosines to be calculated from known ones, and so they were useful in constructing trig tables in the days before calculators.

Angles are most naturally measured in radians rather than degrees; the system of angle measurement based on degrees is an historical accident. Why 360 degrees in a circle? You would have to ask the Ancient Babylonians. Recall that positive angles are measured in an anticlockwise direction. The sin and cos functions are periodic functions with period 2π. This means that for all angles θ

sin(θ + 2πn) = sin θ and cos(θ + 2πn) = cos θ for all n ∈ Z. This fact will be crucial in what follows.

5.3.2 The complex plane

In this section, we shall describe in more detail an alternative way of thinking about complex numbers which turns out to be very fruitful. Recall that a complex number z = a + bi has two components: a and b. We can plot these as a point in the plane. The plane used in this way is called the complex plane: the x-axis is the real axis and the y-axis is interpreted as the complex axis. Although a complex number can be thought of as labelling a point in the complex plane, it can more usefully be regarded as labelling the directed line segment from the origin to the point. This is how we shall regard it. Let z = a + bi be a non-zero complex number and let θ be the angle that it makes with the positive reals. The length of z as a directed line segment in the complex plane is |z|, and by basic trig a = |z| cos θ and b = |z| sin θ. It follows that
\[ z = |z|\,(\cos\theta + i\sin\theta). \]

[Figure: the complex number z drawn as a directed line segment from the origin, making angle θ with the positive real axis, with horizontal component |z| cos θ and vertical component i|z| sin θ.]

Observe that |z| is a non-negative real number. This way of writing complex numbers is called the polar form. At this point, I need to clarify the only feature of complex numbers that causes confusion. I have already mentioned that the functions sin and cos are periodic. For that reason, there is not just one number θ that yields the complex number z but infinitely many of them: namely, all the numbers θ + 2πk where k ∈ Z. For this reason, we define the argument of z, denoted by arg z, not merely to be the single angle θ but the set

\[ \arg z = \{\theta + 2\pi k : k \in \mathbb{Z}\}. \]
The angle θ is chosen so that 0 ≤ θ < 2π and is called, for convenience, the principal argument. But note that books vary on what they choose to call the principal argument. This feature of the argument plays a crucial role when we come to calculate nth roots.

Let $w = r(\cos\theta + i\sin\theta)$ and $z = s(\cos\phi + i\sin\phi)$ be two non-zero complex numbers. We shall calculate wz. We have that

\[ wz = rs(\cos\theta + i\sin\theta)(\cos\phi + i\sin\phi) = rs[(\cos\theta\cos\phi - \sin\theta\sin\phi) + (\sin\theta\cos\phi + \cos\theta\sin\phi)i] \]
but using the properties of the sin and cos functions this reduces to

wz = rs (cos(θ + φ) + i sin(θ + φ)) .

We thus have the following important result:

when two non-zero complex numbers are multiplied together their lengths are multiplied and their arguments are added.
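The rule can be tried out numerically. The following is only a minimal Python sketch (Python and the helper name `polar_product` are assumptions made for illustration, not part of this text). It relies on the standard `cmath` module, whose `polar` function returns an argument between −π and π rather than in the range 0 ≤ θ < 2π used above; this is one instance of books and software differing over the principal argument.

```python
import cmath

def polar_product(w: complex, z: complex) -> tuple[float, float]:
    """Return (length, angle) of w*z, computed only from the polar forms."""
    r, theta = cmath.polar(w)  # length and one argument of w
    s, phi = cmath.polar(z)    # length and one argument of z
    return r * s, theta + phi  # lengths multiply, arguments add

w, z = 1 + 1j, -2 + 3j
length, angle = polar_product(w, z)
# Rebuild the product from its polar data and compare with direct multiplication.
print(abs(cmath.rect(length, angle) - w * z) < 1e-12)
```

The sum of the two arguments need not itself be a principal argument, but it always belongs to the set arg(wz), which is all the rule claims.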

This result helps us to understand the meaning of i. Multiplication by i is the same as a rotation about the origin by a right angle. Multiplication by $i^2$ is therefore the same as a rotation about the origin by two right angles. But this is exactly the same as multiplication by −1.

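[Figure: the points 1, i, −1 and −i on the unit circle in the complex plane, each obtained from the previous one by a rotation through a right angle.]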

We may apply similar reasoning to explain geometrically why −1 × −1 = 1. We of course proved this algebraically in Chapter 3. Multiplication by −1 is interpreted as rotation about the origin by 180°. It follows that doing this twice takes us back to where we started and so is equivalent to multiplication by 1.

The proof of the next theorem follows by induction from the result we proved above. But it is important to note that it is the result above that is really fundamental.

Theorem 5.3.1 (De Moivre). Let n be a positive integer. If z = r (cos θ + i sin θ) then
\[ z^n = r^n (\cos n\theta + i \sin n\theta). \]

Example 5.3.2. Observe that the set of complex numbers
\[ S^1 = \{\cos\theta + i\sin\theta : \theta \in \mathbb{R}\} \]
can be interpreted geometrically as the unit circle with centre the origin in the complex plane. Thus every non-zero complex number is a real number times a complex number lying on the unit circle. The set $S^1$ has some interesting algebraic properties as well. Observe that if $u, v \in S^1$ then $uv \in S^1$, and that if $u \in S^1$ then $u^{-1} \in S^1$.

Our results above have nice applications in painlessly obtaining trigonometric identities.

Example 5.3.3. If you remember that when multiplying complex numbers in polar form you add their arguments, then you can easily reconstitute the identities we started with since
\[ (\cos\alpha + i\sin\alpha)(\cos\beta + i\sin\beta) = \cos(\alpha + \beta) + i\sin(\alpha + \beta). \]
This is helpful in getting both sines and signs right.

Example 5.3.4. Express cos 3θ in terms of cos θ and sin θ using De Moivre’s Theorem. We have that
\[ (\cos\theta + i\sin\theta)^3 = \cos 3\theta + i\sin 3\theta. \]
However, we can expand the lefthand side to get
\[ \cos^3\theta + 3i\cos^2\theta\sin\theta + 3\cos\theta(i\sin\theta)^2 + (i\sin\theta)^3 \]
which simplifies to
\[ (\cos^3\theta - 3\cos\theta\sin^2\theta) + i(3\cos^2\theta\sin\theta - \sin^3\theta) \]
where we use the fact that $i^2 = -1$ and $i^3 = -i$ and $i^4 = 1$. Equating real and imaginary parts we get
\[ \cos 3\theta = \cos^3\theta - 3\cos\theta\sin^2\theta. \]
We also get the formula
\[ \sin 3\theta = 3\cos^2\theta\sin\theta - \sin^3\theta \]
for free.
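The triple-angle formulas of Example 5.3.4, and De Moivre’s theorem itself, can be spot-checked numerically. This is a rough Python sketch, not a proof; the function names `triple_angle` and `de_moivre` are hypothetical and chosen only for this illustration.

```python
import math

def triple_angle(theta: float) -> tuple[float, float]:
    """cos(3*theta) and sin(3*theta) computed from cos(theta) and sin(theta) alone."""
    c, s = math.cos(theta), math.sin(theta)
    return c**3 - 3 * c * s**2, 3 * c**2 * s - s**3

def de_moivre(r: float, theta: float, n: int) -> complex:
    """r^n (cos n*theta + i sin n*theta), the right-hand side of the theorem."""
    return r**n * complex(math.cos(n * theta), math.sin(n * theta))

theta = 0.7
print(triple_angle(theta))
print(math.cos(3 * theta), math.sin(3 * theta))
```

The two printed lines should agree up to rounding error in the last few decimal places.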

5.3.3 Arbitrary roots of complex numbers

In this section, we shall prove that every non-zero complex number has n nth roots: thus it has three cube roots, and four fourth roots and so on. We begin with a special case that turns out to give us almost all the information we need to solve the general case. We shall also need the following important idea. The word radical simply means a square root, or a cube root, or a fourth root and so on. We regard the four basic operations of algebra (addition, subtraction, multiplication and division) together with the extraction of nth roots as purely algebraic operations. Although slightly failing as a precise definition, I shall say that a radical expression is an algebraic expression involving nth roots. For example, the formula for the roots of a quadratic describes the roots as radical expressions in terms of the coefficients of the quadratic. Thus a radical expression is supposed to be an explicit description of some real number. The following table gives some easy-to-find radical expressions for the sines and cosines of some well-known angles.

θ      sin θ    cos θ
0°     0        1
30°    1/2      √3/2
45°    1/√2     1/√2
60°    √3/2     1/2
90°    1        0

The nth roots of unity

We shall show that the number 1 has n nth roots; these are called the nth roots of unity. We know that the equation $z^n - 1 = 0$ has at most n roots, so all we need do is find n roots and we are home and dry. We begin with a motivating example.

Example 5.3.5. We find the three cube roots of 1. There are two ways of writing these roots: as trigonometric expressions and as radical expressions. Inscribe an equilateral triangle in the unit circle in the complex plane with 1 as one of its vertices. Then the other two roots are $\omega_1 = \cos 120^\circ + i\sin 120^\circ$, whose angle is obtained by dividing 2π by 3, and $\omega_2 = \cos 240^\circ + i\sin 240^\circ$, whose angle is twice $\frac{2\pi}{3}$. If we put $\omega = \omega_1$ then in fact $\omega_2 = \omega^2$. This is the trigonometric form of the roots.

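[Figure: the cube roots of unity 1, ω and ω² at the vertices of an equilateral triangle inscribed in the unit circle.]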

In this case, it is easy to write down the radical expressions for the roots as well since we already have radical expressions for sin 60° and cos 60°. We therefore have that
\[ \omega = \frac{1}{2}\left(-1 + i\sqrt{3}\right) \quad\text{and}\quad \omega^2 = -\frac{1}{2}\left(1 + i\sqrt{3}\right). \]
The general case is solved in a similar way to our example above using regular n-gons in the complex plane where one of the vertices is 1.

Theorem 5.3.6 (Roots of unity). The nth roots of unity are given by the following formula
\[ \cos\frac{2k\pi}{n} + i\sin\frac{2k\pi}{n} \]
for k = 1, 2, . . . , n. These complex numbers are arranged uniformly on the unit circle and form a regular polygon with n sides: the cube roots of unity form an equilateral triangle, the fourth roots form a square, the fifth roots form a pentagon, and so on.

There is one point here that is potentially confusing. It is always possible, and easy, to write down trigonometric expressions for the nth roots of unity. Using such an expression, we can then write down numerical values of the nth roots to any desired degree of accuracy. Thus, from a purely practical point of view, we can find the nth roots of unity. It is also always possible to write down the radical expressions of the nth roots of unity but this is far from easy in general. In fact, it forms part of the advanced subject known as Galois theory.

Example 5.3.7. Gauss proved the following result which is highly non-trivial. You can verify that it is true by using a calculator, at least up to the limits of your calculator. It is a good example of a radical expression where, on this occasion, the only radicals that occur are square roots; the theory Gauss developed showed that this implied that the 17-gon could be constructed using only a ruler and compass.
\[ 16\cos\frac{2\pi}{17} = -1 + \sqrt{17} + \sqrt{34 - 2\sqrt{17}} + \sqrt{68 + 12\sqrt{17} - 16\sqrt{34 + 2\sqrt{17}} - 2(1 - \sqrt{17})\sqrt{34 - 2\sqrt{17}}} \]
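In place of a calculator, Gauss’s expression for cos(2π/17) can be evaluated with a few lines of Python. This is only a floating-point sanity check under the assumption that double precision is accurate enough; the function name `gauss_cos_17` is made up for the purpose.

```python
import math

def gauss_cos_17() -> float:
    """Evaluate Gauss's nested square-root expression for cos(2*pi/17)."""
    s17 = math.sqrt(17)
    a = math.sqrt(34 - 2 * s17)
    b = math.sqrt(34 + 2 * s17)
    inner = 68 + 12 * s17 - 16 * b - 2 * (1 - s17) * a
    return (-1 + s17 + a + math.sqrt(inner)) / 16

# Compare the radical expression with the trigonometric value.
print(gauss_cos_17(), math.cos(2 * math.pi / 17))
```

Agreement to many decimal places is evidence, not proof: the proof belongs to the theory Gauss developed.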

Arbitrary nth roots

The nth roots of unity play an important role in finding arbitrary nth roots. We begin with an example to illustrate the idea.

Example 5.3.8. We find the three cube roots of 2. If you use your calculator you will simply find $\sqrt[3]{2}$, a real number. There should be two others: where are they? The explanation is that the other two cube roots are complex. Let ω be the complex cube root of 1 that we described above. Then the three cube roots of 2 are the following
\[ \sqrt[3]{2}, \quad \omega\sqrt[3]{2}, \quad \omega^2\sqrt[3]{2}. \]
The above example generalizes.

Theorem 5.3.9 (nth roots). Let z = r (cos θ + i sin θ) be a non-zero complex number. Put
\[ u = \sqrt[n]{r}\left(\cos\frac{\theta}{n} + i\sin\frac{\theta}{n}\right), \]
the obvious nth root, and put
\[ \omega = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}, \]
the first interesting nth root of unity. Then the nth roots of z are as follows
\[ u, \; u\omega, \; \ldots, \; u\omega^{n-1}. \]
It follows that the nth roots of z = r (cos θ + i sin θ) can be written in the form
\[ \sqrt[n]{r}\left[\cos\left(\frac{\theta}{n} + \frac{2k\pi}{n}\right) + i\sin\left(\frac{\theta}{n} + \frac{2k\pi}{n}\right)\right] \]
for k = 0, 1, 2, . . . , n − 1.

This is the reason why every non-zero complex number has two square roots that differ by a factor of −1: for example, the two square roots of 1 are 1 and −1.
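The recipe of Theorem 5.3.9 translates almost word for word into code. The sketch below is illustrative only: the name `nth_roots` is an assumption, and because `cmath.polar` takes the argument between −π and π, the "obvious" root u it produces may differ from the one obtained with 0 ≤ θ < 2π, although the full set of n roots is the same.

```python
import cmath

def nth_roots(z: complex, n: int) -> list[complex]:
    """Return the n nth roots of a non-zero complex number z as u, u*w, ..., u*w^(n-1)."""
    r, theta = cmath.polar(z)
    u = cmath.rect(r ** (1.0 / n), theta / n)   # the obvious nth root
    omega = cmath.rect(1.0, 2 * cmath.pi / n)   # the first interesting nth root of unity
    return [u * omega**k for k in range(n)]

# The three cube roots of 2: one real, two complex.
for w in nth_roots(2, 3):
    print(w, w**3)
```

Each printed cube should equal 2 up to rounding error.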

5.3.4 Euler’s formula

We have seen that every real number can be written as a whole number plus a possibly infinite decimal part. It turns out that many functions can also be written as a sort of decimal. I shall illustrate this by means of an example. Consider the function $e^x$. All you need to know about this function is that it is equal to its derivative and $e^0 = 1$. We would like to write

\[ e^x = a_0 + a_1x + a_2x^2 + a_3x^3 + \ldots \]

where the $a_i$ are real numbers that we have yet to determine. We can work out the value of $a_0$ easily by putting x = 0. This tells us that $a_0 = 1$. To get the value of $a_1$ we first differentiate our expression to get

\[ e^x = a_1 + 2a_2x + 3a_3x^2 + \ldots \]

Now put x = 0 again and this time we get that $a_1 = 1$. To get the value of $a_2$ we differentiate our expression again to get

\[ e^x = 2a_2 + 3 \cdot 2 \cdot a_3x + \ldots \]

Now put x = 0 and we get that $a_2 = \frac{1}{2}$. Continuing in this way we quickly spot the pattern for the values of the coefficient $a_n$. We find that $a_n = \frac{1}{n!}$ where $n! = n(n-1)(n-2)\ldots 2 \cdot 1$. What we have done for $e^x$ we can also do for sin x and cos x and we obtain the following series expansions of each of these functions.

• $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \ldots$

• $\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \ldots$

• $\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots$

There are interesting connections between these three series. We shall now show that complex numbers help to explain them. Without worrying about the validity of doing so, we calculate the infinite series expansion of $e^{i\theta}$. We have that
\[ e^{i\theta} = 1 + (i\theta) + \frac{1}{2!}(i\theta)^2 + \frac{1}{3!}(i\theta)^3 + \ldots \]
that is
\[ e^{i\theta} = 1 + i\theta - \frac{1}{2!}\theta^2 - \frac{1}{3!}\theta^3 i + \frac{1}{4!}\theta^4 + \ldots \]
By separating out real and imaginary parts, and using the infinite series we obtained above, we get Euler’s remarkable formula

\[ e^{i\theta} = \cos\theta + i\sin\theta. \]

Thus the complex numbers enable us to find the hidden connections between the three most important functions of calculus: the exponential function and the sine and cosine functions. It follows that every non-zero complex number can be written in the form $re^{i\theta}$. If we put θ = π in Euler’s formula, we get the following result, which is widely regarded as one of the most amazing in mathematics.

Theorem 5.3.10 (Euler’s identity).

\[ e^{\pi i} = -1. \]

This result shows us that the real numbers π, e and −1 are connected, but that to establish that connection we have to use the complex number i. This is one of the important roles of the complex numbers in mathematics in that they enable us to make connections between topics that look different: they form a mathematical hyperspace.
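Euler’s formula can be made plausible by summing the exponential series at a purely imaginary argument. The following Python sketch does this with a fixed number of terms; the function name `exp_series` and the choice of 30 terms are assumptions made for illustration, and the computation says nothing about the convergence questions set aside above.

```python
import cmath
import math

def exp_series(z: complex, terms: int = 30) -> complex:
    """Partial sum 1 + z + z^2/2! + ... of the exponential series."""
    total = 0j
    term = 1 + 0j                 # the current term z^n / n!, starting with n = 0
    for n in range(terms):
        total += term
        term *= z / (n + 1)       # turn z^n / n! into z^(n+1) / (n+1)!
    return total

theta = 0.7
print(exp_series(1j * theta), complex(math.cos(theta), math.sin(theta)))
print(exp_series(1j * math.pi))   # should be very close to -1
```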

Exercises 5.3

1. Express cos 5x and sin 5x in terms of cos x and sin x.

2. Prove the following where x is real.2

(a) $\sin x = \frac{1}{2i}\left(e^{ix} - e^{-ix}\right)$.

(b) $\cos x = \frac{1}{2}\left(e^{ix} + e^{-ix}\right)$.

Hence show that $\cos^4 x = \frac{1}{8}\left[\cos 4x + 4\cos 2x + 3\right]$.

3. Find the 4th roots of unity as radical expressions.

Footnote 2. Compare (a) and (b) with $\sinh x = \frac{1}{2}\left(e^{x} - e^{-x}\right)$ and $\cosh x = \frac{1}{2}\left(e^{x} + e^{-x}\right)$.

4. Find the 6th roots of unity as radical expressions.

5. Find the 8th roots of unity as radical expressions.

6. Solve $x^3 = -8i$.

7. Find radical expressions for the roots of $x^5 - 1$, and so show that
\[ \cos 72^\circ = \frac{\sqrt{5} - 1}{4} \quad\text{and}\quad \sin 72^\circ = \frac{\sqrt{10 + 2\sqrt{5}}}{4}. \]
To do this, consider the equation

\[ x^4 + x^3 + x^2 + x + 1 = 0. \]

Divide through by $x^2$ to get
\[ x^2 + \frac{1}{x^2} + x + \frac{1}{x} + 1 = 0. \]

Put $y = x + \frac{1}{x}$. Show that y satisfies the quadratic
\[ y^2 + y - 1 = 0. \]

You can now find all four values of x.

8. Determine all the values of $i^i$. What do you notice?

5.4 Making sense of complex numbers

In this chapter, I have assumed that complex numbers exist and that they obey the usual high-school rules of algebra. In this section, I shall sketch out a proof of this. We start with the set R × R whose elements are ordered pairs (a, b) where a and b are real numbers. It will be helpful to denote these ordered pairs by bold letters so $\mathbf{a} = (a_1, a_2)$. We define $\mathbf{0} = (0, 0)$, $\mathbf{1} = (1, 0)$ and $\mathbf{i} = (0, 1)$. We now define operations as follows

• If $\mathbf{a} = (a_1, a_2)$ and $\mathbf{b} = (b_1, b_2)$, define $\mathbf{a} + \mathbf{b} = (a_1 + b_1, a_2 + b_2)$.

• If $\mathbf{a} = (a_1, a_2)$ define $-\mathbf{a} = (-a_1, -a_2)$.

• If $\mathbf{a} = (a_1, a_2)$ and $\mathbf{b} = (b_1, b_2)$, define

\[ \mathbf{a}\mathbf{b} = (a_1b_1 - a_2b_2, \; a_1b_2 + a_2b_1). \]

• If $\mathbf{a} = (a_1, a_2) \neq \mathbf{0}$ define
\[ \mathbf{a}^{-1} = \left(\frac{a_1}{a_1^2 + a_2^2}, \; \frac{-a_2}{a_1^2 + a_2^2}\right). \]

It is now a long exercise to check that all the usual axioms of high-school algebra hold. Observe now that the element $(a_1, a_2)$ can be written

\[ (a_1, 0)\mathbf{1} + (a_2, 0)\mathbf{i} \]
and that
\[ \mathbf{i}^2 = (0, 1)(0, 1) = (-1, 0) = -\mathbf{1}. \]
The elements of the form (a, 0) can be identified with the real numbers. This proves that the complex numbers as I described them earlier in this chapter really do exist.
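The ordered-pair construction is concrete enough to be programmed directly. Here is a minimal Python sketch, using exact fractions as a stand-in for the real numbers; the names `Pair`, `add`, `mul`, `inv`, `ONE` and `I` are all hypothetical choices made for this illustration. The inverse divides each component by $a_1^2 + a_2^2$, which is what makes the product of a pair with its inverse equal to (1, 0).

```python
from fractions import Fraction

Pair = tuple[Fraction, Fraction]

def add(a: Pair, b: Pair) -> Pair:
    return (a[0] + b[0], a[1] + b[1])

def mul(a: Pair, b: Pair) -> Pair:
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def inv(a: Pair) -> Pair:
    norm = a[0] * a[0] + a[1] * a[1]   # a1^2 + a2^2, non-zero unless a = (0, 0)
    if norm == 0:
        raise ZeroDivisionError("(0, 0) has no inverse")
    return (a[0] / norm, -a[1] / norm)

ONE: Pair = (Fraction(1), Fraction(0))
I: Pair = (Fraction(0), Fraction(1))

a = (Fraction(3), Fraction(4))
print(mul(I, I))        # the pair representing i squared
print(mul(a, inv(a)))   # a pair times its inverse
```

Checking a handful of pairs in this way is of course no substitute for the long exercise of verifying the axioms in general.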

5.5 Radical solutions

There have been two great revolutions in the history of algebra. The first is the discovery that there are irrational numbers: this means that we have to learn to work with real numbers that are described by radical expressions such as $\sqrt{2}$. The second is Galois’s discovery that the roots of a polynomial need not be radical expressions of the coefficients of the polynomial; put simply, there is not always a formula for the roots of a polynomial equation. We begin by describing the way in which cubics and quartics can be solved purely algebraically.

5.5.1 Cubic equations

Let
\[ f(x) = a_3x^3 + a_2x^2 + a_1x + a_0 \]
where $a_3 \neq 0$. I shall assume all coefficients are real though the theory works in general. We shall find all the roots of f(x). This problem can be simplified in two ways. First, we may divide through by $a_3$ and so, without loss of generality, we may assume that f(x) is monic; that is, $a_3 = 1$. Second, by means of a substitution we may obtain a cubic in which the coefficient of the term in $x^2$ is zero. Put $x = y - \frac{a_2}{3}$. You should do this and check that you get a polynomial of the form

\[ g(y) = y^3 + py + q. \]

We say that such a cubic is reduced. It follows that without loss of generality, we need only solve the cubic

\[ g(x) = x^3 + px + q. \]

To do this needs what looks like a minor miracle. Let u and v be two complex variables. Let $\omega = \cos\frac{2\pi}{3} + i\sin\frac{2\pi}{3}$, one of the complex cube roots of unity. You should now check that the following cubic

\[ t(x) = x^3 - 3uvx - (u^3 + v^3) \]
has the roots
\[ u + v, \quad u\omega + v\omega^2, \quad u\omega^2 + v\omega. \]
Now we can solve $x^3 + px + q = 0$ if we can find u and v such that

\[ p = -3uv, \qquad q = -u^3 - v^3. \]

Now if we cube the first equation, we get the following two equations
\[ -\frac{p^3}{27} = u^3v^3, \qquad -q = u^3 + v^3. \]
If we regard $u^3$ and $v^3$ as the unknowns we know their sum and we know their product. This means that $u^3$ and $v^3$ are the roots of the quadratic equation
\[ x^2 + qx - \frac{p^3}{27} = 0. \]
We therefore have that
\[ u^3 = \frac{1}{2}\left(-q + \sqrt{\frac{27q^2 + 4p^3}{27}}\right) \]
and
\[ v^3 = \frac{1}{2}\left(-q - \sqrt{\frac{27q^2 + 4p^3}{27}}\right). \]
To find u we have to take a cube root of the number $u^3$ and there are three possible such roots. Choose one such value for u. We then choose the value of v so that p = −3uv.

Example 5.5.1. Find the roots of $x^3 + 9x - 2 = 0$. Here p = 9 and q = −2. The quadratic equation we have to solve is therefore

\[ x^2 - 2x - 27 = 0. \]
This has roots $1 \pm 2\sqrt{7}$. Put $u^3 = 1 + 2\sqrt{7}$. We may choose a real cube root in this case to get
\[ u = \sqrt[3]{1 + \sqrt{28}}. \]
We must then choose v to be

\[ v = \sqrt[3]{1 - \sqrt{28}}. \]

We may now write down the three roots of our original cubic.
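The whole procedure for a reduced cubic fits in a short function. What follows is a rough Python sketch under stated assumptions: the name `solve_reduced_cubic` is invented here, the coefficients are taken to be real floating-point numbers, and the cube root used is simply whichever one Python’s complex power returns. Any cube root of $u^3$ will do provided v is then chosen so that p = −3uv.

```python
import cmath

def solve_reduced_cubic(p: float, q: float) -> list[complex]:
    """Roots of x^3 + p*x + q = 0 via the associated quadratic for u^3 and v^3."""
    # u^3 and v^3 are the two roots of t^2 + q*t - p^3/27 = 0.
    root = cmath.sqrt(q * q + 4 * p**3 / 27)
    u_cubed = (-q + root) / 2
    if abs(u_cubed) < abs((-q - root) / 2):
        u_cubed = (-q - root) / 2   # prefer the larger root so that u is not tiny
    if u_cubed == 0:
        return [0j, 0j, 0j]         # only possible when p = q = 0
    u = u_cubed ** (1 / 3)          # one of the three cube roots of u^3
    v = -p / (3 * u)                # forces p = -3uv
    omega = cmath.exp(2j * cmath.pi / 3)
    return [u + v, u * omega + v * omega**2, u * omega**2 + v * omega]

for x in solve_reduced_cubic(9, -2):
    print(x, x**3 + 9 * x - 2)      # second value should be close to 0
```

Because the arithmetic is carried out in floating point, real roots may come back with a tiny spurious imaginary part.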

The following cubic equation was studied by Bombelli in 1572 and had an important influence on the development of complex numbers.

Example 5.5.2. Consider the cubic

\[ x^3 - 15x - 4 = 0. \]

The associated quadratic in this case is

\[ x^2 - 4x + 125 = 0. \]

This gives the two solutions that Bombelli would have written in a way equivalent to the following
\[ x = 2 \pm \sqrt{-121}. \]

We would write this as x = 2 ± 11i. Thus

\[ u^3 = 2 + 11i \quad\text{and}\quad v^3 = 2 - 11i. \]

There are three cube roots of 2 + 11i, all complex. Let’s press on regardless. Write $\sqrt[3]{2 + 11i}$ to represent one of those cube roots. Write $\sqrt[3]{2 - 11i}$ to be the corresponding cube root such that their product is 5. Thus at least symbolically we may write
\[ u + v = \sqrt[3]{2 + 11i} + \sqrt[3]{2 - 11i}. \]

What is surprising is that for some choice of these cube roots this value must be real. The reason is that our cubic has a real root, which can easily be checked to be 4. To see how this root arises, observe that

\[ (2 + i)^3 = 2 + 11i \quad\text{and}\quad (2 - i)^3 = 2 - 11i. \]

If we choose 2 + i as one of the cube roots of 2 + 11i then we have to choose 2 − i as the corresponding cube root of 2 − 11i. In this way, we get

\[ 4 = (2 + i) + (2 - i) \]
as a root. It was the fact that real roots arose in this way that provided the first inkling that there was a number system, the complex numbers, that extended the so-called real numbers, but had every bit as tangible an existence.
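The arithmetic behind Bombelli’s example is easy to confirm with Python’s built-in complex numbers, where `1j` plays the role of i. This is just a check of the specific numbers in the example; the variable names are arbitrary.

```python
u = 2 + 1j
v = 2 - 1j

print(u * u * u)   # cube of the chosen cube root of 2 + 11i
print(v * v * v)   # cube of the chosen cube root of 2 - 11i
print(u * v)       # the product of the two cube roots
print(u + v)       # the resulting root of the cubic

x = (u + v).real
print(x**3 - 15 * x - 4)   # value of the cubic at that root
```

All the numbers involved are small integers, so the floating-point arithmetic here is exact.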

5.5.2 Quartic equations

Let
\[ f(x) = a_4x^4 + a_3x^3 + a_2x^2 + a_1x + a_0. \]

As usual, we may assume that $a_4 = 1$. By means of a suitable substitution, which is left as an exercise, we may eliminate the cubed term. We therefore end up with a reduced quartic which it is convenient to write in the following way
\[ x^4 = ax^2 + bx + c. \]
Suppose that we could write the righthand side as a perfect square $(dx + e)^2$. Then our quartic could be written as the product of two quadratics

\[ \left(x^2 - (dx + e)\right)\left(x^2 + (dx + e)\right). \]

The roots of each of these two quadratics will be the four roots of our original quartic. It is not true that we can always do this, but by means of another miracle we can transform the equation into one with the same roots where we can. Let t be a new variable whose value will be determined later. We may write
\[ (x^2 + t)^2 = (a + 2t)x^2 + bx + (c + t^2). \]
We now want to choose a value of t so that the righthand side is a perfect square. This happens when the discriminant of the quadratic $(a + 2t)x^2 + bx + (c + t^2)$ is zero. That is when

\[ b^2 - 4(a + 2t)(c + t^2) = 0. \]

Now this is a cubic in t. We now use the method of the previous section to find a specific value of t, say $t_1$. We then get

\[ (x^2 + t_1)^2 = (a + 2t_1)\left(x + \frac{b}{2(a + 2t_1)}\right)^2. \]

It follows that the roots of the original quartic are the roots of the following two quadratics
\[ (x^2 + t_1) - \sqrt{a + 2t_1}\left(x + \frac{b}{2a + 4t_1}\right) = 0 \]
and
\[ (x^2 + t_1) + \sqrt{a + 2t_1}\left(x + \frac{b}{2a + 4t_1}\right) = 0. \]

Example 5.5.3. Solve the quartic

\[ x^4 = 1 - 4x. \]

We shall find a value of t below

\[ (x^2 + t)^2 = x^4 + 2x^2t + t^2 = 2x^2t - 4x + (1 + t^2) \]
which makes the righthand side a perfect square. This requires us to find a root of the cubic
\[ t^3 + t - 2 = 0. \]

Here t = 1 works. With t = 1 our quartic therefore becomes

\[ (x^2 + 1)^2 = 2(x - 1)^2. \]

Therefore the roots of our original quartic are the roots of the following two quadratics
\[ (x^2 + 1) - \sqrt{2}(x - 1) = 0 \quad\text{and}\quad (x^2 + 1) + \sqrt{2}(x - 1) = 0. \]

The roots of our original quartic are therefore
\[ \frac{1 \pm i\sqrt{\sqrt{8} + 1}}{\sqrt{2}} \quad\text{and}\quad \frac{-1 \pm \sqrt{\sqrt{8} - 1}}{\sqrt{2}}. \]
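The quartic $x^4 = 1 - 4x$ and its two quadratic factors, $x^2 - \sqrt{2}\,x + (1 + \sqrt{2})$ and $x^2 + \sqrt{2}\,x + (1 - \sqrt{2})$, can be checked numerically. The sketch below is an illustration only (the function name `example_quartic_roots` is hypothetical): it solves each quadratic with the usual formula and substitutes the results back into the quartic.

```python
import cmath
import math

def example_quartic_roots() -> list[complex]:
    """Roots of x^4 = 1 - 4x obtained from its two quadratic factors."""
    s2 = math.sqrt(2)
    roots = []
    # Each pair (b, c) stands for the monic quadratic x^2 + b*x + c.
    for b, c in ((-s2, 1 + s2), (s2, 1 - s2)):
        d = cmath.sqrt(b * b - 4 * c)
        roots += [(-b + d) / 2, (-b - d) / 2]
    return roots

for x in example_quartic_roots():
    print(x, x**4 - (1 - 4 * x))   # second value should be close to 0
```

Two of the roots are complex and two are real, matching the two radical expressions for the roots of this quartic.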

5.5.3 Symmetries and particles

Although quadratic equations had been solved in antiquity, it was not until the 16th century that cubics and quartics were first solved. This great leap forward in the development of algebra was centred on a group of Italian mathematicians, namely Scipione del Ferro (1465–1525), Niccolo Tartaglia (1500–1557), Girolamo Cardano (1501–1576), Ludovico Ferrari (1522–1562) and Rafael Bombelli (1526–1572), whose antics are worthy of an opera or Shakespeare comedy, but the importance of their work cannot be overemphasized. But two points arise. First, the solutions of quadratics, cubics and quartics seem to rely on mathematical miracles. Second, we appear to see a pattern: to solve cubics we need to solve an associated quadratic and to solve quartics we need to solve an associated cubic. These two points were investigated by a number of mathematicians in great depth: in particular, Lagrange (1736–1813), Ruffini (1765–1822) and Abel (1802–1829). The expectation was high that quintics should be solvable by using quartics in a way that continued the pattern. Then came the great surprise. Ruffini and Abel proved that the pattern does not continue and that one cannot always describe the roots of a quintic in radical form: there are, of course, five roots, but the point is that these roots cannot in general be written down using an algebraic formula. The question is why, and the answer to this question also explains the algebraic miracles we used above. It was discovered by Evariste Galois (1811–1832). I shall not go into the details of his biography (he was killed, for instance, in a duel) since you will find much more written about him elsewhere, some of it accurate; instead I shall focus on his mathematics.

By building on the work of Lagrange, he explained the miracles above and much more. His approach was new: to determine whether the roots of a polynomial could be expressed in algebraic terms as radical expressions, he studied the symmetries of the polynomial. Just what this means is explained in a subject known as Galois theory after its founder. Crucially, this is not a mere extrapolation of existing algebraic manipulation; instead it involves working at a higher level of abstraction. As so often happens in mathematics, a development in one area led to developments in other areas. Sophus Lie (1842–1899) realized that symmetries could also be used to help understand the tricks that were used to solve differential equations. It was in this way that symmetry came to play a fundamental role in physics. If you hear a particle physicist talking about symmetries, they are paying an unconscious tribute to Galois’ bold work in studying the nature of the roots of polynomial equations.

5.6 Gaussian integers and factorizing primes

Complex numbers may be used to factorize some primes. For example,

5 = (1 − 2i)(1 + 2i).

To develop this example further, we shall need some definitions. The integers Z are a subset of the reals R. We define the Gaussian integers, denoted by Z[i], to be all complex numbers of the form m + in where m and n are integers. What our example shows is that some primes can be factorized using Gaussian integers. The question is: which ones? Observe that $5 = 1^2 + 2^2$. In other words, it can be written as a sum of two squares. Another example of a prime that can be written as a sum of two squares is 13. We have that

\[ 13 = 9 + 4 = 3^2 + 2^2. \]

This prime can also be factorized using Gaussian integers

\[ 13 = (3 + 2i)(3 - 2i). \]

In fact, any prime p that can be written as a sum of two squares, $p = a^2 + b^2$, can also be factorized using Gaussian integers

\[ p = (a + ib)(a - ib). \]

This raises the question of exactly which primes can be written as a sum of two squares.

Lemma 5.6.1. Let p be an odd prime that can be written as a sum of two squares. Then p ≡ 1 (mod 4).

Proof. Let $p = a^2 + b^2$. Since p is assumed odd, we must have that one of $a^2$ and $b^2$ is even and the other odd. Without loss of generality, we may assume that $a^2$ is even and $b^2$ is odd. But from Chapter 2, this implies that a is even and b is odd. We may therefore write a = 2u and b = 2v + 1 for some natural numbers u and v. But then $p = 4u^2 + 4v^2 + 4v + 1$. It follows that p ≡ 1 (mod 4).

Lemma 5.6.2. Each odd prime p satisfies either p ≡ 1 (mod 4) or p ≡ 3 (mod 4).

Proof. The possible remainders when p is divided by 4 are 0, 1, 2, 3. Since p is an odd prime both 0 and 2 are impossible and the result follows.

The lemma above tells us that each odd prime belongs to exactly one of two camps. The obvious question is whether both of these camps are infinite.

Proposition 5.6.3.

1. There are infinitely many primes p such that p ≡ 3 (mod 4).

2. There are infinitely many primes p such that p ≡ 1 (mod 4).

We have proved that if an odd prime p can be written as a sum of two squares then p ≡ 1 (mod 4). The hard question is whether the converse is true.

Theorem 5.6.4 (Euler, 1754). An odd prime p can be written as a sum of two squares if, and only if, p ≡ 1 (mod 4).
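The statement of the theorem can be tested by brute force for small primes. The following Python sketch is only an experiment, not a proof, and the helper names `is_prime` and `two_squares` are invented for it; it searches for a representation $p = a^2 + b^2$ and compares the outcome with the remainder of p on division by 4.

```python
import math

def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def two_squares(p: int):
    """Return (a, b) with 1 <= a <= b and a*a + b*b == p, or None if none exists."""
    for a in range(1, math.isqrt(p) + 1):
        b = math.isqrt(p - a * a)
        if b >= a and a * a + b * b == p:
            return a, b
    return None

for p in range(3, 60, 2):
    if is_prime(p):
        print(p, p % 4, two_squares(p))
```

Whenever a pair (a, b) is found, the prime factorizes as (a + ib)(a − ib) in the Gaussian integers.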

We may deduce from this theorem that every odd prime p ≡ 1 (mod 4) can be factorized by means of Gaussian integers.

Chapter 6

Rational functions

A (real) rational function is simply a quotient $\frac{f(x)}{g(x)}$ where f(x) and g(x) are any polynomials with real coefficients, the polynomial g(x) of course not being equal to the zero polynomial. If deg f(x) < deg g(x), I shall say that the rational function is proper. Rational functions, the set of which is denoted R(x) (notice that I use round brackets, unlike the square brackets used for the set of real polynomials), can be added, subtracted, multiplied and divided. In fact, they satisfy all the algebraic laws of high-school algebra. Rational functions are enormously useful in mathematics. The goal of this chapter is to show that every rational function can be written as a sum of simpler rational functions. Once I have shown how to do this, I will outline its application to integration.

6.1 Numerical partial fractions

This section is intended as motivation for the partial fraction representation of rational functions described in Section 6.3, so it can be omitted at first reading. The idea is to show how a fraction can be written as a sum of other fractions having a particular form. Specifically, the goal of this section is to show how a proper fraction can be written as a sum of proper fractions over prime power denominators. This involves two steps which I shall describe by means of examples. The theory is an application of the fundamental theorem of arithmetic and the extended Euclidean algorithm.

In order to add two fractions together, we first have to ensure that both are expressed over the same denominator. For example, suppose we want to add $\frac{5}{7}$ and $\frac{8}{13}$. Since 7 × 13 = 91 we have the following
\[ \frac{5}{7} + \frac{8}{13} = \frac{65 + 56}{91} = \frac{121}{91}. \]

We shall now consider the reverse process, using the fraction $\frac{810}{1003}$ as an example. Observe that 1003 = 17 × 59 where 17 and 59 are coprime. Our goal is to write
\[ \frac{810}{1003} = \frac{a}{17} + \frac{b}{59} \]
for some natural numbers a and b. By the extended Euclidean algorithm, we can write 1 = 7 · 17 − 2 · 59. It follows that
\[ \frac{1}{1003} = \frac{7 \cdot 17 - 2 \cdot 59}{17 \cdot 59} = \frac{7}{59} - \frac{2}{17}. \]
Now multiply both sides by 810 to get
\[ \frac{810}{1003} = \frac{7 \cdot 810}{59} - \frac{2 \cdot 810}{17} = \left(96 + \frac{6}{59}\right) - \left(95 + \frac{5}{17}\right) = 1 + \frac{6}{59} - \frac{5}{17}. \]
Simplifying we get
\[ \frac{810}{1003} = \frac{6}{59} + \frac{12}{17} \]
as required.

We shall now do something different. Consider the fraction $\frac{10}{16}$. We have that $16 = 2^4$ and so we cannot write it as a product of coprime numbers. However, we can do something else. We can write $10 = 2 + 8 = 2^1 + 2^3$. Thus

\[ \frac{10}{16} = \frac{2^1 + 2^3}{2^4} = \frac{2^1}{2^4} + \frac{2^3}{2^4} = \frac{1}{2^3} + \frac{1}{2} \]
and so
\[ \frac{10}{16} = \frac{1}{2^1} + \frac{1}{2^3}. \]

Let’s now combine these two steps. Consider the fraction $\frac{41}{90}$. The prime factorisation of 90 is $2 \cdot 3^2 \cdot 5$. Our first goal is to write

\[ \frac{41}{90} = \frac{a}{2} + \frac{b}{3^2} + \frac{c}{5}. \]

Thus we have to find a, b, c such that

\[ 41 = 45a + 10b + 18c. \]

By trial and error, remembering that a, b, c have to be integers, we find that

\[ 41 = 45 \cdot 1 + 10 \cdot 5 + (-3) \cdot 18. \]

It follows that
\[ \frac{41}{90} = \frac{1}{2} + \frac{5}{3^2} - \frac{3}{5}. \]
We now want to write
\[ \frac{5}{3^2} = \frac{d}{3} + \frac{e}{3^2} \]
where |d|, |e| < 3. But 5 = 2 + 3 and so

\[ \frac{5}{3^2} = \frac{1}{3} + \frac{2}{3^2}. \]
It follows that
\[ \frac{41}{90} = \frac{1}{2} + \frac{1}{3} + \frac{2}{9} - \frac{3}{5}. \]
We may summarise what we have found in the following theorem.

Theorem 6.1.1.

(i) Let $\frac{a}{b}$ be a proper fraction, and let $b = p_1^{n_1} \ldots p_r^{n_r}$ be the prime factorisation of b. Then
\[ \frac{a}{b} = \sum_{i=1}^{r} \frac{c_i}{p_i^{n_i}} \]
for some integers $c_i$, where each of the fractions is proper.

(ii) Now let p be a prime and $\frac{c}{p^n}$ a proper fraction. Then
\[ \frac{c}{p^n} = \sum_{j=1}^{n} \frac{d_j}{p^j} \]
where each $d_j$ is such that $|d_j| < p$.

6.2 Analogies

There are parallels between the properties of the natural numbers and the properties of real polynomials. We have seen that there are remainder theorems for both natural numbers and polynomials. In the case of the natural numbers, we used the remainder theorem to develop Euclid’s algorithm and the Extended Euclidean algorithm for computing greatest common divisors. We can do the same thing for polynomials. We define the greatest common divisor of two real polynomials a(x) and b(x) to be a real polynomial of largest degree dividing both a(x) and b(x). Any two such gcd’s will be real number multiples of each other. We say that a(x) and b(x) are coprime if their greatest common divisor is a constant polynomial. Euclid’s algorithm and the Extended Euclidean algorithm can both be proved for real polynomials. As a consequence, if a(x) and b(x) are coprime real polynomials, then we can find real polynomials c(x) and d(x) such that

1 = a(x)c(x) + b(x)d(x).

If f(x) is any real polynomial, then we can multiply both sides of the above equation by f(x) to get

f(x) = a(x)[f(x)c(x)] + b(x)[f(x)d(x)].

Thus f(x) can be written in terms of a(x) and b(x) in a very simple way. There is a simple refinement of this result I shall use below. If deg f(x) < deg a(x) + deg b(x) then using the remainder theorem, we can in fact write

f(x) = B(x)a(x) + A(x)b(x)

where deg B(x) < deg b(x) and deg A(x) < deg a(x).

Every natural number can be written as a product of primes, where a prime is a number which cannot be factorised in a non-trivial way. The analogue of a prime number for real polynomials is the notion of an irreducible polynomial. The real polynomial f(x) is said to be irreducible if it cannot be factorised into real polynomials each having smaller degree than f(x). Unlike the case of prime numbers, we can characterise the real irreducible polynomials very easily. It is a consequence of the fundamental theorem for real polynomials that there are only two kinds of irreducible real polynomials: linear polynomials c(x − a) and irreducible quadratic polynomials $c(x^2 + ax + b)$, that is, quadratics having only non-real roots.

We now have the following analogue of the fundamental theorem of arithmetic for real polynomials: every real polynomial of degree at least 1 can be written as a product of a real number and powers of distinct monic linear polynomials or distinct monic irreducible quadratic polynomials in essentially one way.

6.3 Partial fractions

Let $\frac{f(x)}{g(x)}$ be a rational function. If deg f(x) ≥ deg g(x) then we may apply the Remainder Theorem and write
\[ \frac{f(x)}{g(x)} = q(x) + \frac{r(x)}{g(x)} \]
where deg r(x) < deg g(x). Thus without loss of generality, we may assume that deg f(x) < deg g(x) in what follows. I shall also assume that g(x) is monic; if it isn’t there will simply be a constant factor at the front. By the fundamental theorem for real polynomials, we may write g(x) as a product of distinct factors of the form $(x - a)^r$ or $(x^2 + ax + b)^s$. Using this decomposition of g(x), the rational function $\frac{f(x)}{g(x)}$ can now be written as a sum of simpler rational functions which have the following forms:

• For each factor of g(x) of the form $(x - a)^r$, we will have a sum of the form
\[ \frac{A_1}{x - a} + \ldots + \frac{A_{r-1}}{(x - a)^{r-1}} + \frac{A_r}{(x - a)^r}. \]

• For each factor of g(x) of the form $(x^2 + ax + b)^s$, we will have a sum of the form
\[ \frac{A_1x + B_1}{x^2 + ax + b} + \ldots + \frac{A_{s-1}x + B_{s-1}}{(x^2 + ax + b)^{s-1}} + \frac{A_sx + B_s}{(x^2 + ax + b)^s}. \]

This is called the partial fraction decomposition of $\frac{f(x)}{g(x)}$. The practical method for finding such decompositions is best illustrated by means of some examples.

Examples 6.3.1.

1. Write $\frac{5}{x^2 + x - 6}$ in partial fractions. We have that $x^2 + x - 6 = (x + 3)(x - 2)$, a product of two distinct linear factors. We expect a solution of the form
\[ \frac{5}{x^2 + x - 6} = \frac{A}{x + 3} + \frac{B}{x - 2} \]
where A and B are real numbers to be determined. The RHS is just

\[ \frac{A(x - 2) + B(x + 3)}{(x + 3)(x - 2)}. \]

Comparing the LHS with the RHS we get that

\[ 5 = A(x - 2) + B(x + 3) \]

which must hold for all values of x. Putting x = 2 we get B = 1 and putting x = −3 we get A = −1. Thus

\[ \frac{5}{x^2 + x - 6} = \frac{-1}{x + 3} + \frac{1}{x - 2}. \]

At this point, we check our solution.

2. Write $\frac{9}{(x - 1)(x + 2)^2}$ in partial fractions. Here we have a single linear factor and a square of a linear factor. We therefore expect an answer in the form
\[ \frac{9}{(x - 1)(x + 2)^2} = \frac{A}{x - 1} + \frac{B}{x + 2} + \frac{C}{(x + 2)^2}. \]
Carrying out the sum on the RHS, and comparing the LHS with the RHS we get that

\[ 9 = A(x + 2)^2 + B(x - 1)(x + 2) + C(x - 1). \]

Putting x = 1 we get that A = 1, putting x = −2, we get that C = −3 and putting x = −1 and using the values we have for A and C we get that B = −1. Thus

\[ \frac{9}{(x - 1)(x + 2)^2} = \frac{1}{x - 1} - \frac{1}{x + 2} - \frac{3}{(x + 2)^2}. \]

3. Write $\frac{16x}{x^4-16}$ in partial fractions. We have that $x^4-16 = (x-2)(x+2)(x^2+4)$, a product of two distinct linear factors and a quadratic factor. We expect a solution of the form
\[ \frac{16x}{x^4-16} = \frac{A}{x-2} + \frac{B}{x+2} + \frac{Cx+D}{x^2+4}. \]
This leads to
\[ 16x = A(x+2)(x^2+4) + B(x-2)(x^2+4) + (Cx+D)(x-2)(x+2). \]
Putting $x = 2$ gives $A = 1$ and putting $x = -2$ gives $B = 1$. Comparing the coefficients of $x^3$ gives $A + B + C = 0$, so $C = -2$, and putting $x = 0$ gives $D = 0$. Thus
\[ \frac{16x}{x^4-16} = \frac{1}{x-2} + \frac{1}{x+2} - \frac{2x}{x^2+4}. \]

4. Write $\frac{3x^2+2x+1}{(x+2)(x^2+x+1)^2}$ in partial fractions. We expect a solution in the form
\[ \frac{3x^2+2x+1}{(x+2)(x^2+x+1)^2} = \frac{A}{x+2} + \frac{Bx+C}{x^2+x+1} + \frac{Dx+E}{(x^2+x+1)^2}. \]
This leads to
\[ 3x^2+2x+1 = A(x^2+x+1)^2 + (Bx+C)(x+2)(x^2+x+1) + (Dx+E)(x+2). \]
Putting $x = -2$ yields $A = 1$. There are four unknowns left and so we need four equations. However, to avoid having to solve four equations in four unknowns we can vary our procedure. Putting $x = 0$ gives $0 = C + E$. On the LHS the highest power of $x$ occurring is $x^2$. On the RHS the highest power of $x$ occurring appears to be $x^4$ but that immediately implies that its coefficient must be zero. The coefficient of $x^4$ is $1 + B$ and so $B = -1$. Putting $x = 1$ gives $-1 = 3(B + C) + D + E$, which with $B = -1$ and $E = -C$ becomes $D = 2 - 2C$. Putting $x = 2$ gives $6 = 7C + 2D + E$. Substituting $E = -C$ and $D = 2 - 2C$ quickly leads to $C = 1$, and so $E = -1$ and $D = 0$. Thus
\[ \frac{3x^2+2x+1}{(x+2)(x^2+x+1)^2} = \frac{1}{x+2} + \frac{1-x}{x^2+x+1} + \frac{-1}{(x^2+x+1)^2}. \]
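The advice to check a solution can be followed mechanically. The sketch below is only a spot check, not a proof: it evaluates both sides of the first two decompositions at a few rational points using exact arithmetic. The helper name `agree` and the choice of sample points are assumptions made for this illustration.

```python
from fractions import Fraction

def agree(lhs, rhs, points):
    """True if lhs(x) == rhs(x) exactly at every sample point."""
    return all(lhs(x) == rhs(x) for x in map(Fraction, points))

# Sample points chosen to avoid the poles x = -3, -2, 1 and 2.
POINTS = [0, 3, 5, -7, Fraction(1, 2)]

# Example 1: 5/(x^2 + x - 6) = -1/(x + 3) + 1/(x - 2)
ex1 = agree(lambda x: 5 / (x**2 + x - 6),
            lambda x: -1 / (x + 3) + 1 / (x - 2),
            POINTS)

# Example 2: 9/((x - 1)(x + 2)^2) = 1/(x - 1) - 1/(x + 2) - 3/(x + 2)^2
ex2 = agree(lambda x: 9 / ((x - 1) * (x + 2)**2),
            lambda x: 1 / (x - 1) - 1 / (x + 2) - 3 / (x + 2)**2,
            POINTS)

print(ex1, ex2)
```

Exact rationals are used rather than floating point so that equality really means equality at the chosen points.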

Let me conclude this section by sketching out why the partial fraction decomposition of real rational functions is possible. Consider the proper rational function $\frac{f(x)}{a(x)b(x)}$ where $a(x)$ and $b(x)$ are coprime. Then we indicated above that we may write
\[ \frac{f(x)}{a(x)b(x)} = \frac{A(x)}{a(x)} + \frac{B(x)}{b(x)} \]
where the rational functions are all proper. This may be generalised as follows. Let $\frac{f(x)}{g(x)}$ be a proper rational function. Let $g(x) = a_1(x) \ldots a_m(x)$ be a product of pairwise coprime polynomials. Then we may write
\[ \frac{f(x)}{g(x)} = \sum_{i=1}^{m} \frac{A_i(x)}{a_i(x)}, \]
where the rational functions are all proper. We shall now assume that the $a_i(x)$ are either powers of linear factors or of quadratic factors and that these factors are distinct for different $i$.

Consider the proper rational function $\frac{h(x)}{(x-a)^r}$ where $r \geq 1$. Then we may write
\[ h(x) = a_0 + a_1(x-a) + \ldots + a_{r-1}(x-a)^{r-1} \]
for some real numbers $a_0, \ldots, a_{r-1}$ in a way analogous to writing a natural number in a number base. Thus
\[ \frac{h(x)}{(x-a)^r} = \frac{a_{r-1}}{x-a} + \ldots + \frac{a_1}{(x-a)^{r-1}} + \frac{a_0}{(x-a)^r}. \]
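The analogy with number bases can be made concrete: the "digits" $a_0, a_1, \ldots$ are the successive remainders on repeated division by $x - a$, just as the digits of a natural number are the successive remainders on division by the base. The following is a minimal sketch of that idea; the function names and the test polynomial are assumptions chosen for illustration.

```python
def divide_by_linear(coeffs, a):
    """Divide a polynomial by (x - a) using synthetic division.

    coeffs lists the coefficients from the highest power down to the
    constant term. Returns (quotient_coeffs, remainder).
    """
    acc = []
    carry = 0
    for c in coeffs:
        carry = carry * a + c
        acc.append(carry)
    return acc[:-1], acc[-1]

def expand_about(coeffs, a):
    """Return [a0, a1, ...] with h(x) = a0 + a1 (x - a) + a2 (x - a)^2 + ..."""
    digits = []
    while coeffs:
        coeffs, remainder = divide_by_linear(coeffs, a)
        digits.append(remainder)
    return digits

# h(x) = x^2 + 3x + 5 expanded in powers of (x - 2).
digits = expand_about([1, 3, 5], 2)
print(digits)
```

Here the result says that $x^2 + 3x + 5 = 15 + 7(x-2) + (x-2)^2$, so dividing by $(x-2)^3$ gives the three partial fractions with numerators 1, 7 and 15.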

Consider the proper rational function $\frac{h(x)}{(x^2+ax+b)^r}$ where $r \geq 1$. Then we may similarly write
\[ h(x) = (a_0x+b_0) + (a_1x+b_1)(x^2+ax+b) + \ldots + (a_{r-1}x+b_{r-1})(x^2+ax+b)^{r-1} \]
for some real numbers $a_0, \ldots, a_{r-1}$ and $b_0, \ldots, b_{r-1}$ in a way analogous to writing a natural number in a number base. Thus
\[ \frac{h(x)}{(x^2+ax+b)^r} = \frac{a_{r-1}x+b_{r-1}}{x^2+ax+b} + \ldots + \frac{a_1x+b_1}{(x^2+ax+b)^{r-1}} + \frac{a_0x+b_0}{(x^2+ax+b)^r}. \]
The existence of partial fraction decompositions of real rational functions now follows.

6.4 Integrating rational functions

In order to appreciate the significance of partial fractions it is essential to understand how they are used. The goal of this section is therefore to show you how to calculate
\[ \int \frac{f(x)}{g(x)}\,dx \]
exactly, when $f(x)$ and $g(x)$ are real polynomials. We need to know one key property of integration: namely, if $a_i$ are real numbers then
\[ \int \sum_{i=1}^{n} a_i f_i(x)\,dx = \sum_{i=1}^{n} a_i \int f_i(x)\,dx. \]
This property is known as linearity. I shall break my discussion up into a number of steps.

Step 1. Suppose that in $\frac{f(x)}{g(x)}$ we have that $\deg f(x) \geq \deg g(x)$. By the Remainder Theorem for polynomials we can write
\[ \frac{f(x)}{g(x)} = q(x) + \frac{r(x)}{g(x)} \]
where $\deg r(x) < \deg g(x)$. By the linearity of integration, we have that
\[ \int \frac{f(x)}{g(x)}\,dx = \int q(x)\,dx + \int \frac{r(x)}{g(x)}\,dx. \]
In other words, to integrate an arbitrary rational function it is enough to know how to integrate polynomials and proper rational functions.
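Step 1 rests on polynomial division with remainder. As a rough sketch of how $q(x)$ and $r(x)$ can be computed, the function below carries out long division on coefficient lists. The name `poly_divmod`, the list representation (highest power first) and the sample polynomials are assumptions for illustration, and the divisor is assumed to have a non-zero leading coefficient.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g, both given as coefficient lists, highest power first.

    Returns (q, r) with f = q*g + r and deg r < deg g. The remainder list
    may carry leading zeros.
    """
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    q = []
    while len(f) >= len(g):
        factor = f[0] / g[0]
        q.append(factor)
        # Subtract factor * g aligned with the leading term of f,
        # which cancels that leading term.
        for i, c in enumerate(g):
            f[i] -= factor * c
        f.pop(0)
    return q, f

# x^3 divided by x^2 + x - 6
q, r = poly_divmod([1, 0, 0, 0], [1, 1, -6])
print(q, r)
```

For this input the quotient is $x - 1$ and the remainder is $7x - 6$, so $\frac{x^3}{x^2+x-6} = x - 1 + \frac{7x-6}{x^2+x-6}$, and only the last term needs partial fractions.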

Step 2. By linearity of integration, integrating arbitrary polynomials can be reduced to integrating the following
\[ \int x^n\,dx \]
where $n \geq 0$.

Step 3. Let $\frac{f(x)}{g(x)}$ be a proper rational function, so that $\deg f(x) < \deg g(x)$. We may factorise $g(x)$ into a product of real linear polynomials and real irreducible quadratic polynomials and then write $\frac{f(x)}{g(x)}$ as a sum of rational functions of one of the following two forms
\[ \frac{a}{(x-d)^r}, \]
where $a$ and $d$ are real and $r \geq 1$, and
\[ \frac{px+q}{(x^2+bx+c)^s} \]
where $p, q, b, c$ are real and $s \geq 1$ and the quadratic has a pair of complex conjugate roots. By the linearity of integration, this reduces calculating
\[ \int \frac{f(x)}{g(x)}\,dx \]
to calculating integrals of the following two forms
\[ \int \frac{a}{(x-d)^r}\,dx \quad\text{and}\quad \int \frac{px+q}{(x^2+bx+c)^s}\,dx. \]
Again by linearity of integration, this reduces to being able to calculate the following three integrals
\[ \int \frac{1}{(x-d)^r}\,dx, \qquad \int \frac{x}{(x^2+bx+c)^s}\,dx, \qquad \int \frac{1}{(x^2+bx+c)^s}\,dx. \]

Step 4. We now concentrate on the two integrals involving quadratics. By completing the square, we can write
\[ x^2+bx+c = \left(x+\frac{b}{2}\right)^2 + \left(c-\frac{b^2}{4}\right). \]
By assumption $b^2 - 4c < 0$ (why?). Put $e^2 = \frac{4c-b^2}{4}$ (which makes sense). Thus
\[ x^2+bx+c = \left(x+\frac{b}{2}\right)^2 + e^2. \]
I shall now use a technique of calculus known as substitution and put $y = x + \frac{b}{2}$. Doing this, and returning to $x$ as my variable, we need to be able to integrate the following three integrals
\[ \int \frac{1}{(x-d)^r}\,dx, \qquad \int \frac{x}{(x^2+e^2)^s}\,dx, \qquad \int \frac{1}{(x^2+e^2)^s}\,dx. \]

Step 5. The second integral above can be converted into the first by means of the substitution x2 = u.

We have therefore proved the following.

Theorem 6.4.1. The integration of an arbitrary rational function can be reduced to integrals of the following three kinds:

1. $\int x^n\,dx$.

2. $\int \frac{1}{(x-d)^r}\,dx$.

3. $\int \frac{1}{(x^2+e^2)^s}\,dx$.

Chapter 7

Matrices I: linear equations

The term matrix was introduced by James Joseph Sylvester (1814–1897) in 1850, and the first paper on matrix algebra was published by Arthur Cayley (1821–1895) in 1858 [1]. Matrices were introduced initially as packaging for systems of linear equations, but then came to be investigated in their own right. The main goal of this chapter is to introduce the basics of the arithmetic and algebra of matrices. This chapter and the two that follow form the first steps in the subject known as linear algebra. It is hard to overemphasize the importance of this subject throughout mathematics and its applications.

7.1 Matrix arithmetic

In this section, I shall introduce matrices and three arithmetic operations defined on them. I shall also define an operation called the 'transpose of a matrix' that will be important in later work. This section forms the foundation for all that follows.

7.1.1 Basic matrix definitions

A matrix [2] is a rectangular array of numbers. In this course, the numbers will usually be real numbers but, on occasion, I shall also use complex numbers for variety.

[1] A memoir on matrices, Philosophical Transactions of the Royal Society of London 148 (1858), 17–37. This is well worth reading.
[2] Plural: matrices.

Example 7.1.1. The following are all matrices:
\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \quad \begin{pmatrix} 4 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & 4 \\ 1 & 1 & 3 \end{pmatrix}, \quad \begin{pmatrix} 6 \end{pmatrix}. \]

Usually the array of numbers that comprises a matrix is enclosed in round brackets. Occasionally books use square brackets with the same meaning. Later on, I shall introduce determinants and these are indicated by using straight brackets. In general, the kind of brackets you use is important and is not just a matter of taste. We usually denote matrices by capital Roman letters: $A, B, C$, etc. The size of a matrix is $m \times n$ if it has $m$ rows and $n$ columns. The entries in a matrix are often called the elements of the matrix and are usually denoted by lower case Roman letters. If $A$ is an $m \times n$ matrix, and $1 \leq i \leq m$ and $1 \leq j \leq n$, then the entry in the $i$th row and $j$th column of $A$ is often denoted $(A)_{ij}$. Thus $(\;)_{ij}$ means 'the element in the $i$th row and $j$th column'.

Examples 7.1.2.

1. Let
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}. \]
Then $A$ is a $2 \times 3$ matrix. We have that $(A)_{11} = 1$, $(A)_{12} = 2$, $(A)_{13} = 3$, $(A)_{21} = 4$, $(A)_{22} = 5$, $(A)_{23} = 6$.

2. Let
\[ B = \begin{pmatrix} 4 \\ 1 \end{pmatrix}. \]
Then $B$ is a $2 \times 1$ matrix. We have that $(B)_{11} = 4$, $(B)_{21} = 1$.

3. Let
\[ C = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & 4 \\ 1 & 1 & 3 \end{pmatrix}. \]
Then $C$ is a $3 \times 3$ matrix. We have that $(C)_{11} = 1$, $(C)_{12} = 1$, $(C)_{13} = -1$, $(C)_{21} = 0$, $(C)_{22} = 2$, $(C)_{23} = 4$, $(C)_{31} = 1$, $(C)_{32} = 1$, $(C)_{33} = 3$.

4. Let
\[ D = \begin{pmatrix} 6 \end{pmatrix}. \]
Then $D$ is a $1 \times 1$ matrix. We have that $(D)_{11} = 6$.

Matrices $A$ and $B$ are said to be equal, written $A = B$, if they have the same size and corresponding entries are equal: that is, $(A)_{ij} = (B)_{ij}$ for all allowable $i$ and $j$.

Example 7.1.3. Given that
\[ \begin{pmatrix} a & 2 & b \\ 4 & 5 & c \end{pmatrix} = \begin{pmatrix} 3 & x & -2 \\ y & z & 0 \end{pmatrix}, \]
find $a, b, c, x, y, z$. This example simply illustrates what it means for two matrices to be equal. By definition $a = 3$, $2 = x$, $b = -2$, $4 = y$, $5 = z$ and $c = 0$.

When we want to talk about an arbitrary matrix $A$ we usually denote its elements by $a_{ij}$ where $i$ tells you the row the element lives in and $j$ the column. For example, a typical $2 \times 3$ matrix $A$ would be written
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}. \]

7.1.2 Addition, subtraction, scalar multiplication and the transpose

We define first the operations that cause us no trouble.

Addition Let $A$ and $B$ be two matrices of the same size. Then their sum $A + B$ is the matrix defined by
\[ (A+B)_{ij} = (A)_{ij} + (B)_{ij}. \]
That is, corresponding entries of $A$ and $B$ are added. If $A$ and $B$ are not the same size then their sum is not defined.

Subtraction Let $A$ and $B$ be two matrices of the same size. Then their difference $A - B$ is the matrix defined by
\[ (A-B)_{ij} = (A)_{ij} - (B)_{ij}. \]
That is, corresponding entries of $A$ and $B$ are subtracted. If $A$ and $B$ are not the same size then their difference is not defined.

Scalar multiplication In matrix theory, numbers are often called scalars. For us scalars will usually be either real or complex. Let $A$ be any matrix and $\lambda$ any scalar. Then the matrix $\lambda A$ is defined as follows:
\[ (\lambda A)_{ij} = \lambda (A)_{ij}. \]
In other words, every element of $A$ is multiplied by $\lambda$.

Transpose of a matrix Let $A$ be an $m \times n$ matrix. Then the transpose of $A$, denoted $A^T$, is the $n \times m$ matrix defined by $(A^T)_{ij} = (A)_{ji}$. We therefore interchange rows and columns: the first row of $A$ becomes the first column of $A^T$, the second row of $A$ becomes the second column of $A^T$, and so on.

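Examples 7.1.4.

1.
\[ \begin{pmatrix} 1 & 2 & -1 \\ 3 & -4 & 6 \end{pmatrix} + \begin{pmatrix} 2 & 1 & 3 \\ -5 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 1+2 & 2+1 & -1+3 \\ 3+(-5) & -4+2 & 6+1 \end{pmatrix} \]
which gives
\[ \begin{pmatrix} 3 & 3 & 2 \\ -2 & -2 & 7 \end{pmatrix}. \]

2.
\[ \begin{pmatrix} 1 & 2 & -1 \\ 3 & -4 & 6 \end{pmatrix} - \begin{pmatrix} 2 & 1 & 3 \\ -5 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 1-2 & 2-1 & -1-3 \\ 3-(-5) & -4-2 & 6-1 \end{pmatrix} \]
which gives
\[ \begin{pmatrix} -1 & 1 & -4 \\ 8 & -6 & 5 \end{pmatrix}. \]

3.
\[ \begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix} - \begin{pmatrix} 3 & 3 & 2 \\ -2 & -2 & 7 \end{pmatrix} \]
is not defined since the matrices have different sizes.

4.
\[ 2\begin{pmatrix} 3 & 3 & 2 \\ -2 & -2 & 7 \end{pmatrix} = \begin{pmatrix} 6 & 6 & 4 \\ -4 & -4 & 14 \end{pmatrix}. \]

5. The transposes of the following matrices
\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \quad \begin{pmatrix} 4 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & 4 \\ 1 & 1 & 3 \end{pmatrix}, \quad \begin{pmatrix} 6 \end{pmatrix} \]
are, respectively,
\[ \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}, \quad \begin{pmatrix} 4 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 1 \\ 1 & 2 & 1 \\ -1 & 4 & 3 \end{pmatrix}, \quad \begin{pmatrix} 6 \end{pmatrix}. \]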

7.1.3 Matrix multiplication

This is more complicated than the other operations and, like them, is not always defined. To define this operation it is useful to work with two special classes of matrix. A row matrix or row vector is a matrix with one row (but any number of columns). A column matrix or column vector is a matrix with one column (but any number of rows). Row and column matrices are often denoted by bold lower case Roman letters $\mathbf{a}, \mathbf{b}, \mathbf{c}, \ldots$. The $i$th element of the row or column matrix $\mathbf{a}$ will be denoted by $a_i$.

Examples 7.1.5. The matrix
\[ \begin{pmatrix} 1 & 2 & 3 & 4 \end{pmatrix} \]
is a row matrix whilst
\[ \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \]
is a column matrix.

I shall build up to the definition of matrix multiplication in three stages.

Stage 1. Let $\mathbf{a}$ be a row matrix and $\mathbf{b}$ a column matrix, where
\[ \mathbf{a} = \begin{pmatrix} a_1 & a_2 & \ldots & a_m \end{pmatrix} \quad\text{and}\quad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}. \]
Then their product $\mathbf{a}\mathbf{b}$ is defined if, and only if, the number of columns of $\mathbf{a}$ is equal to the number of rows of $\mathbf{b}$, that is $m = n$, in which case their product is the $1 \times 1$ matrix
\[ \mathbf{a}\mathbf{b} = (a_1b_1 + a_2b_2 + \ldots + a_nb_n). \]
The number $a_1b_1 + a_2b_2 + \ldots + a_nb_n$ is called the inner product of $\mathbf{a}$ and $\mathbf{b}$ and is denoted by $\mathbf{a} \cdot \mathbf{b}$. Using this notation we have that $\mathbf{a}\mathbf{b} = (\mathbf{a} \cdot \mathbf{b})$.

Example 7.1.6. This odd way of multiplying is actually quite natural. Here's an example of where it arises in real life. If you buy $y$ items whose unit cost is $x$ then you spend $xy$. This can be generalized as follows when you buy a number of different kinds of items at different prices. Let $\mathbf{a}$ be the row matrix
\[ \begin{pmatrix} 0.6 & 1 & 0.2 \end{pmatrix} \]
where 0.6 is the price of a bottle of milk, 1 is the price of a loaf of bread, and 0.2 is the price of an egg. Let $\mathbf{b}$ be the column matrix
\[ \begin{pmatrix} 2 \\ 3 \\ 10 \end{pmatrix} \]
where 2 is the number of bottles of milk bought, 3 is the number of loaves of bread bought, and 10 is the number of eggs bought. Thus $\mathbf{a}$ is the price row matrix and $\mathbf{b}$ is the quantity column matrix. The total amount spent is therefore
\[ 0.6 \times 2 + 1 \times 3 + 0.2 \times 10: \]
namely, the sum over all the commodities bought of the price of each commodity times the number of items of that commodity purchased. This number is precisely the inner product $\mathbf{a} \cdot \mathbf{b}$: namely, 6.20.

Stage 2. Let $\mathbf{a}$ be a row matrix as above and let $B$ be a matrix. Thus $\mathbf{a}$ is a $1 \times m$ matrix and $B$ is a $p \times q$ matrix. Then their product $\mathbf{a}B$ is defined if, and only if, the number of columns of $\mathbf{a}$ is equal to the number of rows of $B$. Thus $m = p$. To calculate the product think of $B$ as consisting of $q$ column matrices $\mathbf{b}_1, \ldots, \mathbf{b}_q$. We calculate the $q$ numbers $\mathbf{a} \cdot \mathbf{b}_1, \ldots, \mathbf{a} \cdot \mathbf{b}_q$ as in Stage 1, and the $q$ numbers that result become the entries of $\mathbf{a}B$. Thus $\mathbf{a}B$ is a $1 \times q$ matrix whose $j$th entry is the number $\mathbf{a} \cdot \mathbf{b}_j$.

Example 7.1.7. Let $\mathbf{a}$ be the cost matrix of our previous example. Let $B$ be the $3 \times 5$ matrix whose columns tell me the quantity of commodities bought on each of the days of the week Monday to Friday:
\[ B = \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix}. \]
Thus on Tuesday and Thursday no purchases were made, whilst on Friday extra commodities were bought in preparation for the weekend. The matrix $\mathbf{a}B$ is a $1 \times 5$ matrix which tells us how much was spent on each day of the week. Thus
\[ \mathbf{a}B = \begin{pmatrix} 0.6 & 1 & 0.2 \end{pmatrix} \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix} \]
which is equal to
\[ \begin{pmatrix} 6.2 & 0 & 7.2 & 0 & 14.4 \end{pmatrix}. \]

Stage 3. Let $A$ be an $m \times n$ matrix and let $B$ be a $p \times q$ matrix. Their product $AB$ is defined if, and only if, the number of columns of $A$ is equal to the number of rows of $B$: that is $n = p$. If this is so then $AB$ is an $m \times q$ matrix. To define this product we think of $A$ as consisting of $m$ row matrices $\mathbf{a}_1, \ldots, \mathbf{a}_m$ and we think of $B$ as consisting of $q$ column matrices $\mathbf{b}_1, \ldots, \mathbf{b}_q$. As in Stage 2 above, we multiply the first row of $A$ into each of the columns of $B$ and this gives us the first row of $AB$; we then multiply the second row of $A$ into each of the columns of $B$ to get the second row of $AB$, and so on.

Example 7.1.8. Let $B$ be the $3 \times 5$ matrix of the previous example whose columns tell me the quantity of commodities bought on each of the days Monday to Friday
\[ B = \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix}. \]
Let $A$ be the $2 \times 3$ matrix whose first row tells me the cost of the commodities in shop 1 and whose second row tells me the cost of the commodities in shop 2.
\[ A = \begin{pmatrix} 0.6 & 1 & 0.2 \\ 0.65 & 1.05 & 0.30 \end{pmatrix}. \]
The first row of $AB$ tells me how much was spent on each day of the week in shop 1, and the second row of $AB$ tells me how much was spent on each day of the week in shop 2. Thus
\[ AB = \begin{pmatrix} 0.6 & 1 & 0.2 \\ 0.65 & 1.05 & 0.30 \end{pmatrix} \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix} \]
which is equal to
\[ \begin{pmatrix} 6.2 & 0 & 7.2 & 0 & 14.4 \\ 7.45 & 0 & 8.5 & 0 & 17 \end{pmatrix}. \]

Examples 7.1.9.

1.
\[ \begin{pmatrix} 1 & -1 & 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ 1 \\ -1 \\ 3 \end{pmatrix} = \begin{pmatrix} 0 \end{pmatrix}. \]

2. The product
\[ \begin{pmatrix} 1 & -1 & 2 \\ 3 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & -2 & 3 \\ 2 & 1 & -1 \end{pmatrix} \]
doesn't exist because the number of columns of the first matrix is not equal to the number of rows of the second matrix.

3. The product
\[ \begin{pmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{pmatrix} \begin{pmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{pmatrix} \]
exists because the first matrix is a $2 \times 3$ and the second is a $3 \times 4$. Thus the product will be a $2 \times 4$ matrix and is
\[ \begin{pmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{pmatrix}. \]

Summary of matrix multiplication

• Let $A$ be an $m \times n$ matrix and $B$ a $p \times q$ matrix. The product $AB$ is defined if, and only if, $n = p$ and the result will then be an $m \times q$ matrix. In other words:
\[ (m \times n)(n \times q) = (m \times q). \]

• $(AB)_{ij}$ is the inner product of the $i$th row of $A$ and the $j$th column of $B$.

• It follows that the inner product of the $i$th row of $A$ and each of the columns of $B$ in turn yields each of the elements of the $i$th row of $AB$ in turn.

If $\mathbf{a}_i$ are row matrices and $\mathbf{b}_j$ are column matrices then the product of two matrices can be written as follows
\[ \begin{pmatrix} \mathbf{a}_1 \\ \vdots \\ \mathbf{a}_m \end{pmatrix} \begin{pmatrix} \mathbf{b}_1 & \ldots & \mathbf{b}_n \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1 \cdot \mathbf{b}_1 & \ldots & \mathbf{a}_1 \cdot \mathbf{b}_n \\ \vdots & & \vdots \\ \mathbf{a}_m \cdot \mathbf{b}_1 & \ldots & \mathbf{a}_m \cdot \mathbf{b}_n \end{pmatrix}. \]
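The summary above can be sketched directly in code: each entry of the product is an inner product of a row of the first matrix with a column of the second. The list-of-rows representation and the names `inner` and `matmul` are assumptions for illustration. For the shopping example the prices are written in hundredths of the currency unit, which is a choice made here so that the arithmetic stays in exact integers.

```python
def inner(a, b):
    """Inner product of a row a and a column b, both given as flat sequences."""
    if len(a) != len(b):
        raise ValueError("inner product needs sequences of equal length")
    return sum(x * y for x, y in zip(a, b))

def matmul(A, B):
    """Product AB of an m x n matrix A and an n x q matrix B (lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    columns = list(zip(*B))  # the q columns of B
    return [[inner(row, col) for col in columns] for row in A]

# A 2 x 3 matrix times a 3 x 4 matrix gives a 2 x 4 matrix.
A = [[1, 2, 4], [2, 6, 0]]
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]
AB = matmul(A, B)

# Price row (in hundredths) times the Monday-to-Friday quantity matrix.
prices = [[60, 100, 20]]
quantities = [[2, 0, 2, 0, 4], [3, 0, 4, 0, 8], [10, 0, 10, 0, 20]]
spend = matmul(prices, quantities)
print(AB, spend)
```

Trying `matmul(B, A)` raises an error, since a $3 \times 4$ matrix cannot be multiplied into a $2 \times 3$ matrix.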

7.1.4 Special matrices

Matrices come in all shapes and sizes, but some of these are important enough to warrant their own terminology. A matrix all of whose elements are zero is called a zero matrix. The $m \times n$ zero matrix is denoted $O_{m,n}$ or just $O$ and we let the context determine the size of $O$. A square matrix is one in which the number of rows is equal to the number of columns. In a square matrix $A$ the elements $(A)_{11}, (A)_{22}, \ldots, (A)_{nn}$ are called the diagonal elements. All the other elements of $A$ are called the off-diagonal elements. A diagonal matrix is a square matrix in which all off-diagonal elements are zero. A scalar matrix is a diagonal matrix in which the diagonal elements are all the same. The $n \times n$ identity matrix is the scalar matrix in which all the diagonal elements are the number one. This is denoted by $I_n$ or just $I$ where we allow the context to determine the size of $I$. Thus scalar matrices are those of the form $\lambda I$ where $\lambda$ is any scalar. A matrix is real if all its elements are real numbers, and complex if all its elements are complex numbers. A matrix $A$ is said to be symmetric if $A^T = A$. In particular, symmetric matrices are always square.

Examples 7.1.10.

1. The matrix
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} \]
is a $3 \times 3$ diagonal matrix.

2. The matrix
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
is the $4 \times 4$ identity matrix.

3. The matrix
\[ \begin{pmatrix} 42 & 0 & 0 & 0 & 0 \\ 0 & 42 & 0 & 0 & 0 \\ 0 & 0 & 42 & 0 & 0 \\ 0 & 0 & 0 & 42 & 0 \\ 0 & 0 & 0 & 0 & 42 \end{pmatrix} \]
is a $5 \times 5$ scalar matrix.

4. The matrix
\[ \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \]
is a $6 \times 5$ zero matrix.

5. The matrix
\[ \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{pmatrix} \]
is a $3 \times 3$ symmetric matrix.

7.1.5 Linear equations

Matrices are extremely useful in helping us to solve systems of linear equations. For the time being, I shall simply show you how matrices provide a convenient notation for writing down such equations. A system of $m$ linear equations in $n$ unknowns is a list of equations of the following form
\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \ldots + a_{mn}x_n &= b_m \end{aligned} \]
If we have only a few unknowns then we often use $w, x, y, z$ rather than $x_1, x_2, x_3, x_4$. A solution is a set of values of $x_1, \ldots, x_n$ that satisfy all the equations. The set of all solutions is called the solution set or general solution. The equations above can be conveniently represented using matrices. Let $A$ be the $m \times n$ matrix $(A)_{ij} = a_{ij}$, let $\mathbf{b}$ be the $m \times 1$ matrix $(\mathbf{b})_i = b_i$, and let $\mathbf{x}$ be the $n \times 1$ matrix $(\mathbf{x})_j = x_j$. Then the system of linear equations above can be written in the form $A\mathbf{x} = \mathbf{b}$. The matrix $A$ is called the coefficient matrix. At the moment, we are just using matrices as packaging for the equations.

Example 7.1.11. The following system of linear equations
\[ \begin{aligned} 2x + 3y &= 1 \\ x + y &= 2 \end{aligned} \]
may be written in matrix form as follows.
\[ \begin{pmatrix} 2 & 3 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}. \]
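Writing the system as $A\mathbf{x} = \mathbf{b}$ means that testing a proposed solution is a single matrix-times-column computation. The sketch below does this for the system of Example 7.1.11. The candidate values $x = 5$, $y = -3$ are not taken from the text: they were found by elimination for this illustration, and the helper name `apply` is likewise an assumption.

```python
def apply(A, x):
    """Compute the column Ax, with the columns x and Ax given as flat lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 3], [1, 1]]   # coefficient matrix
b = [1, 2]             # right-hand sides
candidate = [5, -3]    # proposed values of x and y (assumed for this check)

is_solution = apply(A, candidate) == b
print(is_solution)
```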

7.1.6 Conics and quadrics

We have dealt with polynomial equations in one unknown and, in this chapter, we shall deal with linear equations in several unknowns. But what about equations in several unknowns where both products and powers of the unknowns can occur? The simplest class of such equations are the conics. These are equations of the form
\[ ax^2 + bxy + cy^2 + dx + ey + f = 0 \]
where $a, b, c, d, e, f$ are numbers of some kind. These are equations in two variables in which each term is either the constant term, a linear term, or a product of two variables such as $xy$ or $x^2$. In general, the roots or zeroes of such equations form curves in the plane such as circles, ellipses and hyperbolas. The term conic arises from the way that they were first defined by the Greeks as those curves that arise when you cut a double cone by means of a plane. These curves are important in astronomy since it can be proved that the orbits of satellites, planets, space-craft etc always follow conics. The reason for introducing them here is that they can be represented as matrix equations as follows.
\[ \mathbf{x}^T A \mathbf{x} + J^T \mathbf{x} + (f) = (0) \]
where
\[ A = \begin{pmatrix} a & \frac{1}{2}b \\ \frac{1}{2}b & c \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad J = \begin{pmatrix} d \\ e \end{pmatrix}. \]
This is not just a notational convenience. The fact that the matrix $A$ is symmetric means that powerful ideas from matrix theory, to be developed later in this book, can be brought to bear on studying such conics. If we replace the $\mathbf{x}$ above by the matrix
\[ \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \]
and $A$ by a $3 \times 3$ symmetric matrix and $J$ by a $3 \times 1$ matrix then we get the matrix equation of a quadric surface. Examples of such surfaces are the surface of a sphere or the surface described by a cooling tower. But even though we are dealing with three rather than two dimensions, the matrix algebra we shall develop applies just as well.

7.1.7 Graphs

The word 'graph' is used in two, completely different, ways in mathematics: to mean the graph of a function, and to mean a certain kind of diagram. It is in the second sense that we shall use it here. A graph consists of a set of vertices and a collection of edges, where this collection may contain repeats. By an edge we mean either a set of two vertices or a single vertex. Graphs are represented by means of diagrams: the vertices are represented by circles and the edges by means of lines joining the vertices. An edge joining a vertex to itself is called a loop. An example of a graph is given below.

[Figure: a graph with five vertices labelled 1, 2, 3, 4 and 5.]

This information can be represented by means of a $5 \times 5$ symmetric matrix $G$ given below. The entry $(G)_{ij}$ is just the number of edges connecting vertices $i$ and $j$.
\[ G = \begin{pmatrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{pmatrix} \]
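To see how such a matrix arises, the sketch below builds it from a list of edges. The edge list is read off from the matrix $G$ itself (each pair of vertices whose entry is 1), and the function name `adjacency_matrix` is an assumption made for this illustration.

```python
# Edges as pairs of vertex labels, read off from the matrix G in the text.
EDGES = [(1, 2), (1, 4), (2, 3), (3, 4), (3, 5), (4, 5)]

def adjacency_matrix(n, edges):
    """Return the n x n matrix whose (i, j) entry counts edges joining i and j."""
    G = [[0] * n for _ in range(n)]
    for u, v in edges:
        G[u - 1][v - 1] += 1          # vertex labels start at 1, lists at 0
        if u != v:                    # a loop is counted once, on the diagonal
            G[v - 1][u - 1] += 1
    return G

G = adjacency_matrix(5, EDGES)
is_symmetric = G == [list(column) for column in zip(*G)]
print(G, is_symmetric)
```

The matrix is symmetric because an edge joining $i$ to $j$ also joins $j$ to $i$; repeated edges would simply increase the corresponding entries.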

Exercises 7.1

1. Let $A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ -1 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 4 \\ -1 & 1 \\ 0 & 3 \end{pmatrix}$. Find $A + B$, $A - B$ and $-3B$.

2. Let $A = \begin{pmatrix} 0 & 4 & 2 \\ -1 & 1 & 3 \\ 2 & 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & -3 & 5 \\ 2 & 0 & -4 \\ 3 & 2 & 0 \end{pmatrix}$. Find the matrices $AB$ and $BA$.

 0 1   3 1   1 0 3  3. Let A = , B = −1 1 and C = . 0 −1   −1 1 1 3 1 Calculate BA, AA and CB. Can any other pairs of these matrices be multiplied ? Multiply those which can.

4. Calculate  1   2     1 2 3  3  4

 2 1   3 0   −1 2 3  5. If A = −1 0 , B = and C = . Calcu-   −2 1 4 0 1 2 3 late both (AB)C and A(BC) and check that you get the same answer.

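6. Calculate
\[ \begin{pmatrix} 2 & -1 & 2 \\ 1 & 2 & -4 \\ 3 & -1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]

7. Calculate
\[ \begin{pmatrix} 2+i & 1+2i \\ i & 3+i \end{pmatrix} \begin{pmatrix} 2i & 2+i \\ 1+i & 1+2i \end{pmatrix} \]
where $i$ is the complex number $i$.

8. Calculate
\[ \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix} \begin{pmatrix} d & 0 & 0 \\ 0 & e & 0 \\ 0 & 0 & f \end{pmatrix}. \]

9. Calculate

(a)
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \]

(b)
\[ \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \]

(c)
\[ \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

10. Find the transposes of each of the following matrices
\[ A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ -1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & -3 & 5 \\ 2 & 0 & -4 \\ 3 & 2 & 0 \end{pmatrix}, \quad C = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}. \]

11. This question deals with the following 4 matrices with complex entries and their negatives: $I, X, Y, Z$ where
\[ I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} \quad\text{and}\quad Z = \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}. \]
Show that the product of any two such matrices is again a matrix of this type by completing the following table for multiplication where the entry in row $A$ and column $B$ is $AB$ in that order.
\[ \begin{array}{c|cccccccc} & I & X & Y & Z & -I & -X & -Y & -Z \\ \hline I & & & & & & & & \\ X & & & & & & & & \\ Y & & & & & & & & \\ Z & & & & & & & & \\ -I & & & & & & & & \\ -X & & & & & & & & \\ -Y & & & & & & & & \\ -Z & & & & & & & & \end{array} \]

7.2 Matrix algebra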

In this section, we shall look at algebra where the variables are matrices. This algebra is similar to high-school algebra but also differs significantly in one or two places. For example, if A and B are matrices it is not true in general that AB = BA even if both products are defined. We will learn in this section which rules of high-school algebra apply to matrices and those which don’t.

7.2.1 Properties of matrix addition

In Chapter 3, I introduced the idea of a binary operation. Matrix addition and multiplication both have two inputs just as addition and multiplication of real numbers, but there is an added complication that not all pairs of matrices can be added and not all pairs of matrices can be multiplied. Despite this difference, I shall nevertheless use the same terminology I introduced in Chapter 3 but in this slightly different setting.

(MA1) $(A + B) + C = A + (B + C)$. This is the associative law for matrix addition.

(MA2) $A + O = A = O + A$. The zero matrix $O$, the same size as $A$, is the additive identity for matrices the same size as $A$.

(MA3) $A + (-A) = O = (-A) + A$. The matrix $-A$ is the unique additive inverse of $A$.

(MA4) $A + B = B + A$. Matrix addition is commutative.

Thus matrix addition has the same properties as the addition of real numbers, apart from the fact that the sum of two matrices is only defined when they have the same size. The role of zero is played by the zero matrix $O$ of the appropriate size.

Example 7.2.1. Calculate $2A - 3B + 6I$ where
\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}. \]
Because we are dealing with matrix addition and scalar multiplication the rules we apply are the same as those in high-school algebra. We get
\[ \begin{pmatrix} 8 & 1 \\ 0 & 11 \end{pmatrix}. \]

7.2.2 Properties of matrix multiplication

(MM1) $(AB)C = A(BC)$. This is the associative law for matrix multiplication.

(MM2) Let $A$ be an $m \times n$ matrix. Then $I_mA = A = AI_n$. The matrices $I_m$ and $I_n$ are the left and right multiplicative identities, respectively.

(MM3) $A(B + C) = AB + AC$ and $(B + C)A = BA + CA$. These are the left and right distributivity laws for matrix multiplication over matrix addition.

Thus matrix multiplication shares these properties with the multiplication of real numbers, apart from the fact that the product is not always defined. There are, however, the following three major differences.

1. Matrix multiplication is not commutative. Consider the matrices
\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}. \]
Then $AB \neq BA$. One consequence of the fact that matrix multiplication is not commutative is that
\[ (A + B)^2 \neq A^2 + 2AB + B^2, \]
in general (see below).

2. The product of two matrices can be a zero matrix without either matrix being a zero matrix. Consider the matrices
\[ A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} -2 & -6 \\ 1 & 3 \end{pmatrix}. \]
Then $AB = O$.

3. Cancellation of matrices is not allowed. Consider the matrices
\[ A = \begin{pmatrix} 0 & 2 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix} \quad\text{and}\quad C = \begin{pmatrix} -1 & 1 \\ 1 & 4 \end{pmatrix}. \]
Then $A \neq O$ and $AB = AC$ but $B \neq C$.
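The three differences above are easy to confirm by direct computation. The sketch below multiplies out the matrices given in each item. The compact `matmul` helper and the variable names are assumptions made for this illustration.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows (sizes assumed compatible)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# 1. AB and BA differ.
A1, B1 = [[1, 2], [3, 4]], [[1, 1], [-1, 1]]
not_commutative = matmul(A1, B1) != matmul(B1, A1)

# 2. A non-zero matrix times a non-zero matrix can be the zero matrix.
A2, B2 = [[1, 2], [2, 4]], [[-2, -6], [1, 3]]
zero_product = matmul(A2, B2) == [[0, 0], [0, 0]]

# 3. AB = AC with A non-zero does not force B = C.
A3, B3, C3 = [[0, 2], [0, 1]], [[2, 3], [1, 4]], [[-1, 1], [1, 4]]
no_cancellation = matmul(A3, B3) == matmul(A3, C3) and B3 != C3

print(not_commutative, zero_product, no_cancellation)
```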

Just how different matrix algebra is from high-school algebra is shown by the following example.

Example 7.2.2. Suppose that $X^2 = I$. Then $X^2 - I = O$ and so we may factorize to get $(X - I)(X + I) = O$. But we cannot conclude from this that $X = I$ or $X = -I$ because we cannot conclude from the fact that the product of two matrices is a zero matrix that one of the matrices must itself be a zero matrix. We have seen that this is false. We therefore cannot deduce that the identity matrix has two square roots. In fact, it has infinitely many as we now show. Let
\[ A = \begin{pmatrix} a & b \\ c & -a \end{pmatrix} \]
and suppose that $a^2 + bc = 1$. Check that $A^2 = I$. Examples of matrices satisfying these conditions are
\[ \begin{pmatrix} \sqrt{1+n^2} & -n \\ n & -\sqrt{1+n^2} \end{pmatrix} \]
where $n$ is any positive integer. Thus the $2 \times 2$ identity matrix has infinitely many square roots.
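The claim that $A^2 = I$ whenever $a^2 + bc = 1$ can be tried out numerically. To keep the arithmetic exact, the sketch below uses a different family from the one in the example: the integer matrices with $a = k$, $b = 1 - k^2$, $c = 1$, which also satisfy $a^2 + bc = 1$. This family and the function names are assumptions chosen for illustration.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows (sizes assumed compatible)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def root_of_identity(k):
    """A matrix (a b; c -a) with a = k, b = 1 - k^2, c = 1, so a^2 + bc = 1."""
    return [[k, 1 - k * k], [1, -k]]

IDENTITY = [[1, 0], [0, 1]]
roots = [root_of_identity(k) for k in range(5)]
all_square_to_identity = all(matmul(R, R) == IDENTITY for R in roots)
print(roots, all_square_to_identity)
```

Different values of $k$ give different matrices, so this family alone already supplies infinitely many square roots of the $2 \times 2$ identity matrix.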

7.2.3 Properties of scalar multiplication

(S1) $1A = A$.

(S2) $\lambda(A + B) = \lambda A + \lambda B$.

(S3) $(\lambda\mu)A = \lambda(\mu A)$.

(S4) $(\lambda + \mu)A = \lambda A + \mu A$.

(S5) $(\lambda A)B = A(\lambda B) = \lambda(AB)$.

7.2.4 Properties of the transpose

(T1) $(A^T)^T = A$.

(T2) $(A + B)^T = A^T + B^T$.

(T3) $(\alpha A)^T = \alpha A^T$.

(T4) $(AB)^T = B^TA^T$.

It is important to observe that the transpose of a product reverses the order of the matrices. There are some important consequences of the above properties:

• Because matrix addition is associative we can write sums without brackets.

• Because matrix multiplication is associative we can write matrix products without brackets.

• The left and right distributivity laws can be extended to arbitrary finite sums.

7.2.5 Some proofs

In this section, I shall prove that the algebraic properties of matrices stated really do hold. I shan't prove all of them: just a representative sample. I shall leave you the pleasure of proving the rest. It is important to observe that all the properties of matrix algebra are ultimately proved using the properties of real numbers. Let $A$ be an $m \times n$ matrix whose entry in the $i$th row and $j$th column is $a_{ij}$. Let $B$ be an $n \times p$ matrix whose entry in the $j$th row and $k$th column is $b_{jk}$. By definition $(AB)_{ik}$ is the number equal to the product of the $i$th row of $A$ times the $k$th column of $B$. This is just
\[ (AB)_{ik} = \sum_{j=1}^{n} a_{ij}b_{jk}. \]

Theorem 7.2.3.

1. $(A + B) + C = A + (B + C)$.

2. $A(BC) = (AB)C$.

3. $(\lambda + \mu)A = \lambda A + \mu A$.

Proof. (1) To show that $(A + B) + C = A + (B + C)$ we have to prove two things. First, the size of $(A + B) + C$ is the same as the size of $A + (B + C)$. Second, elements of $(A + B) + C$ and $A + (B + C)$ in corresponding positions are equal. To add $A$ and $B$ they have to be the same size and the result will be the same size as both of them. Thus $C$ is the same size as $A$ and $B$. It's clear that both sides of the equation really are the same size. We now compare corresponding elements:
\[ ((A + B) + C)_{ij} = (A + B)_{ij} + (C)_{ij} = ((A)_{ij} + (B)_{ij}) + (C)_{ij}. \]
But now we use the associativity of addition of real numbers to get
\[ ((A)_{ij} + (B)_{ij}) + (C)_{ij} = (A)_{ij} + ((B)_{ij} + (C)_{ij}) = (A)_{ij} + (B + C)_{ij} = (A + (B + C))_{ij}, \]
as required.

(2) Let $A$ be an $m \times n$ matrix with entries $a_{ij}$, let $B$ be an $n \times p$ matrix with entries $b_{jk}$, and let $C$ be a $p \times q$ matrix with entries $c_{kl}$. It's evident that $A(BC)$ and $(AB)C$ have the same size, so it remains to show that corresponding elements are the same. We shall prove that
\[ (A(BC))_{il} = ((AB)C)_{il}. \]
By definition
\[ (A(BC))_{il} = \sum_{t=1}^{n} a_{it}(BC)_{tl}, \quad\text{and}\quad (BC)_{tl} = \sum_{s=1}^{p} b_{ts}c_{sl}. \]
Thus
\[ (A(BC))_{il} = \sum_{t=1}^{n} a_{it}\left( \sum_{s=1}^{p} b_{ts}c_{sl} \right). \]
Using distributivity of multiplication over addition for real numbers this sum is just
\[ \sum_{t=1}^{n} \sum_{s=1}^{p} a_{it}b_{ts}c_{sl}. \]
Now change the order in which we add up these real numbers to get
\[ \sum_{s=1}^{p} \sum_{t=1}^{n} a_{it}b_{ts}c_{sl}. \]
Now use distributivity again
\[ \sum_{s=1}^{p} \left( \sum_{t=1}^{n} a_{it}b_{ts} \right) c_{sl}. \]
The sum within the brackets is just $(AB)_{is}$ and so the whole sum is
\[ \sum_{s=1}^{p} (AB)_{is}c_{sl} \]
which is precisely $((AB)C)_{il}$.

(3) Clearly $(\lambda + \mu)A$ and $\lambda A + \mu A$ have the same sizes. We show that corresponding elements are the same:
\[ ((\lambda + \mu)A)_{ij} = (\lambda + \mu)(A)_{ij} = \lambda(A)_{ij} + \mu(A)_{ij} = (\lambda A)_{ij} + (\mu A)_{ij} \]
which is just $(\lambda A + \mu A)_{ij}$, as required.

We now prove the properties satisfied by the transpose.

Theorem 7.2.4.

1. $(A^T)^T = A$.

2. $(A + B)^T = A^T + B^T$.

3. $(\alpha A)^T = \alpha A^T$.

4. $(AB)^T = B^TA^T$.

Proof. (1) We have that
\[ ((A^T)^T)_{ij} = (A^T)_{ji} = (A)_{ij}. \]

(2) We have that
\[ ((A + B)^T)_{ij} = (A + B)_{ji} = (A)_{ji} + (B)_{ji} = (A^T)_{ij} + (B^T)_{ij} \]
which is just $(A^T + B^T)_{ij}$.

(3) We have that
\[ ((\alpha A)^T)_{ij} = (\alpha A)_{ji} = \alpha(A)_{ji} = \alpha(A^T)_{ij} = (\alpha A^T)_{ij}. \]

(4) Let $A$ be an $m \times n$ matrix and $B$ an $n \times p$ matrix. Thus $AB$ is defined and is $m \times p$. Hence $(AB)^T$ is $p \times m$. Now $B^T$ is $p \times n$ and $A^T$ is $n \times m$. Thus $B^TA^T$ is defined and is $p \times m$. Hence $(AB)^T$ and $B^TA^T$ have the same size. We now show that corresponding elements are equal. By definition
\[ ((AB)^T)_{ij} = (AB)_{ji}. \]
This is equal to
\[ \sum_{s=1}^{n} (A)_{js}(B)_{si} = \sum_{s=1}^{n} (A^T)_{sj}(B^T)_{is}. \]
But real numbers commute under multiplication and so
\[ \sum_{s=1}^{n} (A^T)_{sj}(B^T)_{is} = \sum_{s=1}^{n} (B^T)_{is}(A^T)_{sj} = (B^TA^T)_{ij}, \]
as required.

Quantum Mechanics

Quantum mechanics is one of the fundamental theories of physics. At its heart are matrices. We have defined the transpose of a matrix but for matrices with complex entries there is another, related, operation. Given any complex matrix $A$ we define the matrix $A^\dagger$ to be the one obtained by transposing $A$ and then taking the complex conjugate of all entries. It is therefore the conjugate-transpose of $A$. A matrix $A$ is called Hermitian if $A^\dagger = A$. Observe that a real matrix is Hermitian precisely when it is symmetric. It turns out that quantum mechanics is based on Hermitian matrices and their generalizations. The fact that matrix multiplication is not commutative is one of the reasons that quantum mechanics is so different from classical mechanics. The theory of quantum computing makes heavy use of Hermitian matrices and their properties.

Exercises 7.2

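1. Calculate

\[ \begin{pmatrix} 2 & 0 \\ 7 & -1 \end{pmatrix} + \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} + \begin{pmatrix} 2 & 2 \\ 3 & 3 \end{pmatrix} \]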

2. Calculate  1   1     2  3 2 1  −1  3 1 5 3 −4

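3. Calculate

\[ \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} 5 & 4 \\ 4 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \]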

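4. If

\[ A = \begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix} \]

calculate $A^2$, $A^3$ and $A^4$.

5. Let $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ and $x = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. Calculate $Ax$, $A^2x$, $A^3x$, $A^4x$ and $A^5x$. What do you notice?

6. Calculate $A^2$ where

\[ A = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \]

7. Show that

\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \]

satisfies $A^2 - 5A - 2I = O$.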

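8. Let $A$ be the following $3 \times 3$ matrix

\[ \begin{pmatrix} 2 & 4 & 4 \\ 0 & 1 & -1 \\ 0 & 1 & 3 \end{pmatrix} \]

Calculate $A^3 - 6A^2 + 12A - 8I$ where $I$ is the $3 \times 3$ identity matrix.

9. Let

\[ A = \begin{pmatrix} 3 & 1 & -1 \\ 2 & 2 & -1 \\ 2 & 2 & 0 \end{pmatrix} \]

Calculate

\[ A^3 - 5A^2 + 8A - 4I \]

where $I$ is the $3 \times 3$ identity matrix.

10. If $3X + A = B$, find $X$ in terms of $A$ and $B$.

11. If $X + Y = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}$ and $X - Y = \begin{pmatrix} 2 & 2 \\ 1 & 1 \end{pmatrix}$, find $X$ and $Y$.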

12. If $AB = BA$ show that $A^2B = BA^2$.

13. Is it true that $AABB = ABAB$?

14. Show that $(A + B)^2 - (A - B)^2 = 2(AB + BA)$.

15. Let $A$ and $B$ be $n \times n$ matrices. Is it necessarily true that

\[ (A - B)(A + B) = A^2 - B^2? \]

If so, prove it. If not, find a counterexample.

16. Expand $(A + I)^4$ carefully.

17. A matrix $A$ is said to be symmetric if $A^T = A$.

(a) Show that a symmetric matrix must be square.

(b) Show that if $A$ is any matrix then $AA^T$ is defined and symmetric.

(c) Let $A$ and $B$ be symmetric matrices of the same size. Prove that $AB$ is symmetric if and only if $AB = BA$.

18. An $n \times n$ matrix $A$ is said to be skew-symmetric if $A^T = -A$.

(a) Show that the diagonal entries of a skew-symmetric matrix are all zero.

(b) If $B$ is any $n \times n$ matrix, show that $B + B^T$ is symmetric and that $B - B^T$ is skew-symmetric.

(c) Deduce that every square matrix can be expressed as the sum of a symmetric matrix and a skew-symmetric matrix.

19. Let A, B and C be square matrices of the same size. Define [A, B] = AB − BA. Calculate

[[A, B],C] + [[B,C],A] + [[C,A],B].

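20. Let $A$ be a $2 \times 2$ matrix such that $AB = BA$ for all $2 \times 2$ matrices $B$. Show that

\[ A = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} \]

for some scalar $\lambda$.

21. Let $A$ be a $2 \times 2$ matrix. The trace of $A$, denoted $\mathrm{tr}(A)$, is the sum of the diagonal elements.

(a) Show that $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$; $\mathrm{tr}(\lambda A) = \lambda\,\mathrm{tr}(A)$; $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.

(b) Let $A$ be a known matrix. Show that the equation $AX - XA = I$ cannot be solved for $X$.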

7.3 Solving systems of linear equations

The goal of this section is to use matrices to help us solve systems of linear equations. We begin by proving some general results on linear equations, and then we describe Gaussian elimination, an algorithm for solving systems of linear equations.

7.3.1 Some theory

A system of $m$ linear equations in $n$ unknowns is a list of equations of the following form

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &= b_m \end{aligned} \]

A solution is a sequence of values of $x_1, \dots, x_n$ that satisfy all the equations. The set of all solutions is called the solution set or general solution. The equations above can be conveniently represented using matrices. Let $A$ be the $m \times n$ matrix $(A)_{ij} = a_{ij}$, let $b$ be the $m \times 1$ matrix $(b)_{i1} = b_i$, and let $x$ be the $n \times 1$ matrix $(x)_{j1} = x_j$. Then the system of linear equations above can be written in the form

\[ Ax = b \]

If $b$ is a zero matrix, we say that the equations are homogeneous; otherwise they are said to be inhomogeneous. A system of linear equations that has no solution is said to be inconsistent; otherwise, it is said to be consistent. We begin with some results that tell us what to expect when solving systems of linear equations.

Proposition 7.3.1. Homogeneous equations $Ax = 0$ are always consistent, because $x = 0$ is always a solution. In addition, the sum of any two solutions is again a solution, and the scalar multiple of any solution is again a solution.

Proof. Let $Ax = 0$ be our homogeneous system of equations. Let $a$ and $b$ be solutions. That is, $Aa = 0$ and $Ab = 0$. We now calculate $A(a + b)$. To do this we use the fact that matrix multiplication satisfies the left distributivity law:

\[ A(a + b) = Aa + Ab = 0 + 0 = 0. \]

Now let $a$ be a solution and $\lambda$ any scalar. Then

\[ A(\lambda a) = \lambda Aa = \lambda 0 = 0. \]

Proposition 7.3.2. Let Ax = b be a consistent system of linear equations. Let p be any one solution. Then every solution of the equation is of the form p + h for some solution h of Ax = 0.

Proof. Let a be any solution to Ax = b. Let h = a − p. Then Ah = Aa − Ap = b − b = 0, so h is a solution of Ax = 0, and a = p + h. The result now follows.

Theorem 7.3.3 (Fundamental theorem of linear equations). We assume that the scalars are the rationals, the reals or the complexes. A system of linear equations Ax = b has either

• No solutions.

• Exactly one solution.

• Infinitely many solutions.

Proof. We prove that if we can find two different solutions we can in fact find infinitely many solutions. Let u and v be two distinct solutions to this equation, so that Au = b and Av = b. Consider now the column matrix w = u − v. Then

Aw = A(u − v) = Au − Av = 0

using the distributive law. Thus w is a non-zero column matrix that satisfies the equation Ax = 0. Consider now the column matrices of the form

u + λw

where λ is any scalar. Because w is non-zero, different values of λ give different column matrices, and so this is a set of infinitely many different column matrices. We calculate

A(u + λw) = Au + λAw = b

using the distributive law and properties of scalars. It follows that the infinitely many column matrices u + λw are solutions to the equation Ax = b.

7.3.2 Gaussian elimination

In this section, we shall develop an algorithm that will take as input a system of linear equations and produce as output the following: if the system has no solutions it will tell us; on the other hand, if it has solutions then it will determine them all. Our method is based on three simple ideas:

1. Certain systems of linear equations have a shape that makes them very easy to solve.

2. Certain operations can be carried out on systems of linear equations which simplify them but do not change the solutions.

3. Everything can be done using matrices.

Here are examples of each of these ideas.

Example 7.3.4. The system of equations

\[ \begin{aligned} 2x + 3y &= 1 \\ y &= -3 \end{aligned} \]

is very easy to solve. From the second equation we get $y = -3$. Substituting this value into the first equation gives us $x = 5$. We can check that this solution is correct by checking that these two values satisfy every equation.

Example 7.3.5. The system of equations

\[ \begin{aligned} 2x + 3y &= 1 \\ x + y &= 2 \end{aligned} \]

can be converted into a system with the same solutions but which is easier to solve. Multiply the second equation by 2. This gives us the new equations

\[ \begin{aligned} 2x + 3y &= 1 \\ 2x + 2y &= 4 \end{aligned} \]

which have the same solutions as the original equations. Next, subtract the first equation from the second equation to get

\[ \begin{aligned} 2x + 3y &= 1 \\ -y &= 3 \end{aligned} \]

Finally, multiply the last equation by $-1$. The resulting equations have the same solutions as the original equations, but they can now be easily solved as we showed above.

Example 7.3.6. The system of equations

\[ \begin{aligned} 2x + 3y &= 1 \\ x + y &= 2 \end{aligned} \]

can be written in matrix form as the matrix equation

\[ \begin{pmatrix} 2 & 3 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \]

For the purposes of our algorithm, we rewrite this equation in terms of what is called an augmented matrix

\[ \left( \begin{array}{cc|c} 2 & 3 & 1 \\ 1 & 1 & 2 \end{array} \right) \]

The operations carried out in the previous example can be applied directly to the augmented matrix.

\[ \left( \begin{array}{cc|c} 2 & 3 & 1 \\ 1 & 1 & 2 \end{array} \right) \Longrightarrow \left( \begin{array}{cc|c} 2 & 3 & 1 \\ 2 & 2 & 4 \end{array} \right) \Longrightarrow \left( \begin{array}{cc|c} 2 & 3 & 1 \\ 0 & -1 & 3 \end{array} \right) \Longrightarrow \left( \begin{array}{cc|c} 2 & 3 & 1 \\ 0 & 1 & -3 \end{array} \right) \]

This augmented matrix can then be converted back into the usual form and solved

\[ \begin{aligned} 2x + 3y &= 1 \\ y &= -3 \end{aligned} \]

We now formalize the above ideas. A matrix is called a row echelon matrix, or is said to be in row echelon form, if it satisfies the following three conditions:

1. Any zero rows are at the bottom of the matrix.

2. The first non-zero entry of each non-zero row is the number 1, called the leading 1.

3. In the column beneath a leading 1, the elements are all zero.

The following operations on a matrix are called elementary row operations:

1. Multiply row $i$ by a non-zero scalar $\lambda$. We notate this operation by $R_i \leftarrow \lambda R_i$.

2. Interchange rows $i$ and $j$. We notate this operation by $R_i \leftrightarrow R_j$.

3. Add a multiple $\lambda$ of row $i$ to another row $j$. We notate this operation by $R_j \leftarrow R_j + \lambda R_i$.

The following result is not hard to prove.

Proposition 7.3.7. Applying the elementary row operations to a system of linear equations does not change their solution set.

Given a system of linear equations

Ax = b the matrix (A|b) is called the augmented matrix.

Algorithm 7.3.8. (Gaussian elimination) This is an algorithm for solving systems of linear equations. In outline, the algorithm runs as follows:

1. Given a system of equations

Ax = b

form the augmented matrix

(A|b).

2. By using elementary row operations, convert

(A|b)

into an augmented matrix (A′|b′) which is a row echelon matrix.

3. Solve the equations obtained from

(A′|b′)

by back substitution.

Remarks

• The process in step (2) has to be carried out systematically to avoid going around in circles.

• Elementary row operations applied to a set of linear equations do not change the solution set. Thus the solution sets of

Ax = b and A′x = b′

are the same.

• Solving systems of linear equations where the associated augmented matrix is a row echelon matrix is easy and can be accomplished by back substitution.

Here is a more detailed description of step (2) of the algorithm. The input is a matrix B and the output is a matrix B′ which is a row echelon matrix:

1. Locate the leftmost column that does not consist entirely of zeros.

2. Interchange the top row with another row if necessary to bring a non-zero entry to the top of the column found in step 1.

3. If the entry now at the top of the column found in step 1 is $a$, then multiply the first row by $\frac{1}{a}$ in order to introduce a leading 1.

4. Add suitable multiples of the top row to the rows below so that all entries below the leading 1 become zeros.

5. Now cover up the top row, and begin again with step 1 applied to the matrix that remains. Continue in this way until the entire matrix is a row echelon matrix.
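The five steps above can be sketched in code. What follows is one possible rendering in Python rather than a definitive implementation: the function name `row_echelon` is an invention for this illustration, matrices are stored as lists of rows, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def row_echelon(rows):
    """Return a row echelon form (with leading 1s) of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    top = 0  # rows above index `top` have been "covered up"
    for col in range(n_cols):
        if top == n_rows:
            break
        # Steps 1 and 2: look in this column, at or below `top`, for a non-zero entry.
        pivot = next((r for r in range(top, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # nothing to do in this column; move one column to the right
        m[top], m[pivot] = m[pivot], m[top]
        # Step 3: multiply the top row by 1/a to introduce a leading 1.
        a = m[top][col]
        m[top] = [x / a for x in m[top]]
        # Step 4: subtract multiples of the top row to clear the entries below.
        for r in range(top + 1, n_rows):
            factor = m[r][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[top])]
        # Step 5: cover up the top row and continue with what remains.
        top += 1
    return m


augmented = [[1, 2, 3, 4], [2, 2, 4, 0], [3, 4, 5, 2]]
for row in row_echelon(augmented):
    print([str(x) for x in row])
```

The order in which the row operations are applied here may differ from a calculation done by hand, but the output is still a row echelon matrix whose system of equations has the same solution set as the input.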

The important thing to remember is to start at the top and work downwards.

We now look in more detail at the final part of the overall algorithm where our set of equations $A'x = b'$ is derived from an augmented matrix which is a row echelon matrix. Assume that there is more than one solution. The variables are divided into two groups: those variables corresponding to the columns of $A'$ containing leading 1's, called leading variables, and the rest, called free variables. We solve for the leading variables in terms of the free variables; the free variables can be assigned arbitrary values independently of each other.

Examples 7.3.9.

1. We shall show that the following system of equations is inconsistent.

\[ \begin{aligned} x + 2y - 3z &= -1 \\ 3x - y + 2z &= 7 \\ 5x + 3y - 4z &= 2 \end{aligned} \]

The first step is to write down the augmented matrix of the system. In this case, this is the matrix

\[ \left( \begin{array}{ccc|c} 1 & 2 & -3 & -1 \\ 3 & -1 & 2 & 7 \\ 5 & 3 & -4 & 2 \end{array} \right) \]

Carry out the elementary row operations $R_2 \leftarrow R_2 - 3R_1$ and $R_3 \leftarrow R_3 - 5R_1$. This gives us

\[ \left( \begin{array}{ccc|c} 1 & 2 & -3 & -1 \\ 0 & -7 & 11 & 10 \\ 0 & -7 & 11 & 7 \end{array} \right) \]

Now carry out the elementary row operation $R_3 \leftarrow R_3 - R_2$ which yields

\[ \left( \begin{array}{ccc|c} 1 & 2 & -3 & -1 \\ 0 & -7 & 11 & 10 \\ 0 & 0 & 0 & -3 \end{array} \right) \]

The equation corresponding to the last line of the augmented matrix is $0x + 0y + 0z = -3$. Clearly, this equation has no solutions because it is zero on the left and non-zero on the right. It follows that the original set of equations has no solutions.

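2. We shall show that the following system of equations has exactly one solution, and we shall also check it.

\[ \begin{aligned} x + 2y + 3z &= 4 \\ 2x + 2y + 4z &= 0 \\ 3x + 4y + 5z &= 2 \end{aligned} \]

We first write down the augmented matrix

\[ \left( \begin{array}{ccc|c} 1 & 2 & 3 & 4 \\ 2 & 2 & 4 & 0 \\ 3 & 4 & 5 & 2 \end{array} \right) \]

We then carry out the elementary row operations $R_2 \leftarrow R_2 - 2R_1$ and $R_3 \leftarrow R_3 - 3R_1$ to get

\[ \left( \begin{array}{ccc|c} 1 & 2 & 3 & 4 \\ 0 & -2 & -2 & -8 \\ 0 & -2 & -4 & -10 \end{array} \right) \]

Then carry out the elementary row operations $R_2 \leftarrow -\frac{1}{2} R_2$ and $R_3 \leftarrow -\frac{1}{2} R_3$ that yield

\[ \left( \begin{array}{ccc|c} 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 4 \\ 0 & 1 & 2 & 5 \end{array} \right) \]

Finally, carry out the elementary row operation $R_3 \leftarrow R_3 - R_2$

\[ \left( \begin{array}{ccc|c} 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 4 \\ 0 & 0 & 1 & 1 \end{array} \right) \]

This is now a row echelon matrix. Write down the corresponding set of equations

\[ \begin{aligned} x + 2y + 3z &= 4 \\ y + z &= 4 \\ z &= 1 \end{aligned} \]

Now solve by back substitution to get $x = -5$, $y = 3$ and $z = 1$. Finally, we check that

\[ \begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 4 \\ 3 & 4 & 5 \end{pmatrix} \begin{pmatrix} -5 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 0 \\ 2 \end{pmatrix} \]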

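3. We shall show that the following system of equations has infinitely many solutions, and we shall check them.

\[ \begin{aligned} x + 2y - 3z &= 6 \\ 2x - y + 4z &= 2 \\ 4x + 3y - 2z &= 14 \end{aligned} \]

The augmented matrix for this system is

\[ \left( \begin{array}{ccc|c} 1 & 2 & -3 & 6 \\ 2 & -1 & 4 & 2 \\ 4 & 3 & -2 & 14 \end{array} \right) \]

We transform this matrix into an echelon matrix by means of the following elementary row operations: $R_2 \leftarrow R_2 - 2R_1$, $R_3 \leftarrow R_3 - 4R_1$, $R_2 \leftarrow -\frac{1}{5} R_2$, $R_3 \leftarrow -\frac{1}{5} R_3$ and $R_3 \leftarrow R_3 - R_2$. This yields

\[ \left( \begin{array}{ccc|c} 1 & 2 & -3 & 6 \\ 0 & 1 & -2 & 2 \\ 0 & 0 & 0 & 0 \end{array} \right) \]

Because the bottom row consists entirely of zeros, this means that we have only two equations

\[ \begin{aligned} x + 2y - 3z &= 6 \\ y - 2z &= 2 \end{aligned} \]

By back substitution, both $x$ and $y$ can be expressed in terms of $z$, and $z$ may take any value we like. We say that $z$ is a free variable. Let $z = \lambda \in \mathbb{R}$. Then the set of solutions can be written in the form

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix} + \lambda \begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix} \]

We now check that these solutions work

\[ \begin{pmatrix} 1 & 2 & -3 \\ 2 & -1 & 4 \\ 4 & 3 & -2 \end{pmatrix} \begin{pmatrix} 2 - \lambda \\ 2 + 2\lambda \\ \lambda \end{pmatrix} = \begin{pmatrix} 6 \\ 2 \\ 14 \end{pmatrix} \]

as required.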
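A check of this kind can also be carried out mechanically for particular values of the free variable. The snippet below is a small sketch in Python for the third example; the helper names `apply` and `solution` and the sample values of λ are arbitrary choices made here for illustration.

```python
from fractions import Fraction

A = [[1, 2, -3], [2, -1, 4], [4, 3, -2]]
b = [6, 2, 14]


def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column given as a flat list."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def solution(lam):
    """The solution (2, 2, 0) + lam * (-1, 2, 1) found by back substitution."""
    return [2 - lam, 2 + 2 * lam, lam]


# Try a few sample values of the free variable, including a fraction.
checks = [apply(A, solution(lam)) == b for lam in (0, 1, -3, Fraction(1, 2))]
print(all(checks))
```

A finite number of spot checks is of course not a proof; the symbolic calculation above is what shows that every value of λ works.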

Exercises 7.3

1. In each case, determine whether the system of equations is consistent or not. When consistent, find all solutions and show that they work.

(a)

\[ \begin{aligned} 2x + y - z &= 1 \\ 3x + 3y - z &= 2 \\ 2x + 4y + 0z &= 2 \end{aligned} \]

(b)

\[ \begin{aligned} 2x + y - z &= 1 \\ 3x + 3y - z &= 2 \\ 2x + 4y + 0z &= 3 \end{aligned} \]

(c)

\[ \begin{aligned} 2x + y - 2z &= 10 \\ 3x + 2y + 2z &= 1 \\ 5x + 4y + 3z &= 4 \end{aligned} \]

(d)

\[ \begin{aligned} x + y + z + w &= 0 \\ 4x + 5y + 3z + 3w &= 1 \\ 2x + 3y + z + w &= 1 \\ 5x + 7y + 3z + 3w &= 2 \end{aligned} \]

7.4 Blankinship’s algorithm

The ideas of this chapter lead to an alternative, and better, procedure [3] for calculating the integers $x$ and $y$ such that $\gcd(a, b) = xa + yb$. To explain how it works, let’s go back to the basic step of Euclid’s algorithm. If $a \geq b$ then we divide $b$ into $a$ and write

\[ a = bq + r \]

where $0 \leq r < b$. The key point is that $\gcd(a, b) = \gcd(b, r)$. We shall now think of $(a, b)$ and $(b, r)$ as column matrices

\[ \begin{pmatrix} a \\ b \end{pmatrix}, \quad \begin{pmatrix} r \\ b \end{pmatrix}. \]

We want the $2 \times 2$ matrix that maps

\[ \begin{pmatrix} a \\ b \end{pmatrix} \quad \text{to} \quad \begin{pmatrix} r \\ b \end{pmatrix}. \]

This is the matrix

\[ \begin{pmatrix} 1 & -q \\ 0 & 1 \end{pmatrix}. \]

Thus

\[ \begin{pmatrix} 1 & -q \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} r \\ b \end{pmatrix}. \]

Finally, we can describe the process by the following matrix operation

\[ \left( \begin{array}{cc|c} 1 & 0 & a \\ 0 & 1 & b \end{array} \right) \rightarrow \left( \begin{array}{cc|c} 1 & -q & r \\ 0 & 1 & b \end{array} \right) \]

by carrying out an elementary row operation. This procedure can be iterated. It will terminate when one of the entries in the righthand column is 0. The non-zero entry will then be the greatest common divisor of $a$ and $b$, and the matrix on the lefthand side will tell you how to get to $0, \gcd(a, b)$ from $a, b$ and so will provide the information that the Euclidean algorithm provides. All of this is best illustrated by means of an example.

[3] It was described by W. A. Blankinship in his paper ‘A new version of the Euclidean algorithm’, American Mathematical Monthly 70 (1963), 742–745.
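Before working through an example by hand, here is how the procedure just described might be sketched in Python. This is an illustration under stated assumptions, not Blankinship's own formulation: the function name `blankinship` is made up here, the inputs are assumed to be positive integers, and each of the two rows of the augmented matrix is stored as a three-element list.

```python
def blankinship(a, b):
    """Return (g, x, y) with g = gcd(a, b) = x*a + y*b, for positive integers a and b."""
    top = [1, 0, a]      # first row of the augmented matrix (1 0 | a)
    bottom = [0, 1, b]   # second row of the augmented matrix (0 1 | b)
    # Stop as soon as a zero appears in the right-hand column.
    while top[2] != 0 and bottom[2] != 0:
        if top[2] >= bottom[2]:
            q = top[2] // bottom[2]
            # Row operation: top <- top - q * bottom
            top = [t - q * s for t, s in zip(top, bottom)]
        else:
            q = bottom[2] // top[2]
            # Row operation: bottom <- bottom - q * top
            bottom = [s - q * t for s, t in zip(bottom, top)]
    # The row whose last entry is non-zero records gcd(a, b) = x*a + y*b.
    x, y, g = top if top[2] != 0 else bottom
    return g, x, y


print(blankinship(2520, 154))
```

Every row `[x, y, c]` produced along the way satisfies `x*a + y*b == c`, because this holds for the two starting rows and is preserved by each row operation. That invariant is why the final non-zero row solves the extended Euclidean problem.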

Let’s calculate $x, y$ such that $\gcd(2520, 154) = x \cdot 2520 + y \cdot 154$. We start with the matrix

\[ \left( \begin{array}{cc|c} 1 & 0 & 2520 \\ 0 & 1 & 154 \end{array} \right) \]

If we divide 154 into 2520 it goes 16 times plus a remainder. Thus we subtract 16 times the second row from the first to get

\[ \left( \begin{array}{cc|c} 1 & -16 & 56 \\ 0 & 1 & 154 \end{array} \right) \]

We now repeat the process but, since the larger number, 154, is on the bottom, we have to subtract some multiple of the first row from the second. This time we subtract twice the first row from the second to get

\[ \left( \begin{array}{cc|c} 1 & -16 & 56 \\ -2 & 33 & 42 \end{array} \right) \]

Now repeat this procedure to get

\[ \left( \begin{array}{cc|c} 3 & -49 & 14 \\ -2 & 33 & 42 \end{array} \right) \]

And again

\[ \left( \begin{array}{cc|c} 3 & -49 & 14 \\ -11 & 180 & 0 \end{array} \right) \]

The process now terminates because we have a zero in the rightmost column. The non-zero entry in the rightmost column is $\gcd(2520, 154)$. We also know that

\[ \begin{pmatrix} 3 & -49 \\ -11 & 180 \end{pmatrix} \begin{pmatrix} 2520 \\ 154 \end{pmatrix} = \begin{pmatrix} 14 \\ 0 \end{pmatrix}. \]

Now this matrix equation corresponds to two equations. It is the one corresponding to the non-zero value that says

\[ 14 = 3 \times 2520 - 49 \times 154 \]

which is both true and solves the extended Euclidean problem.

Chapter 8

Matrices II: inverses

We have learnt how to add, subtract and multiply matrices, but we have not defined division. The reason is that it cannot always be defined. In this chapter, we shall explore when it can be. All matrices will be square and I will always denote an identity matrix of the appropriate size.

8.1 What is an inverse?

The simplest kind of linear equation is $ax = b$ where $a$ and $b$ are scalars. If $a \neq 0$ we can solve this by multiplying by $a^{-1}$ on both sides to get $a^{-1}(ax) = a^{-1}b$. We now use associativity to get $(a^{-1}a)x = a^{-1}b$. Finally, $a^{-1}a = 1$ and so $1x = a^{-1}b$ and this gives $x = a^{-1}b$. The number $a^{-1}$ is the multiplicative inverse of the non-zero number $a$. We now try to copy this approach for the matrix equation $Ax = b$. We suppose that there is a matrix $B$ such that $BA = I$.

• Multiply both sides of our equation $Ax = b$ on the left by $B$ to get $B(Ax) = Bb$. Because order matters when you multiply matrices, which side you multiply on also matters.

• Use associativity of matrix multiplication to get $(BA)x = Bb$.

• Now use our assumption that $BA = I$ to get $Ix = Bb$.

• Finally, we use the properties of the identity matrix to get $x = Bb$.


We appear to have solved our equation, but we need to check it. We calculate A(Bb). By associativity this is (AB)b. At this point we also have to assume that AB = I. This gives Ib = b, as required. We conclude that in order to copy the method for solving a linear equation in one unknown, our coefficient matrix A must have the property that there is a matrix B such that

AB = I = BA.

We take this as the basis of the following definition. A matrix $A$ is said to be invertible if we can find a matrix $B$ such that $AB = I = BA$. We call such a matrix $B$ an inverse of $A$. Observe that $A$ has to be square. A matrix that is not invertible is said to be singular.

Example 8.1.1. A real number $r$ regarded as a $1 \times 1$ matrix is invertible if and only if it is non-zero, in which case an inverse is its reciprocal.

It’s clear that if $A$ is a zero matrix, then it can’t be invertible, just as in the case of real numbers. However, the next example shows that even if $A$ is not a zero matrix, it need not be invertible.

Example 8.1.2. Let $A$ be the matrix

\[ \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \]

We shall show that there is no matrix $B$ such that $AB = I = BA$. Let $B$ be the matrix

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

The first row of $BA$ is $(a \;\; a)$, and so from $BA = I$ we get $a = 1$ and $a = 0$. It’s impossible to meet both these conditions at the same time and so $B$ doesn’t exist.

On the other hand, here is an example of a matrix that is invertible.

Example 8.1.3. Let

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & -2 & 5 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix} \]

Check that $AB = I = BA$. We deduce that $A$ is invertible with inverse $B$.

As always, in passing from numbers to matrices things become more complicated. Before going any further, I need to clarify one point which will at least make our lives a little simpler.

Lemma 8.1.4. Let $A$ be invertible and suppose that $B$ and $C$ are matrices such that $AB = I = BA$ and $AC = I = CA$. Then $B = C$.

Proof. Multiply both sides of $AB = I$ on the left by $C$. Then $C(AB) = CI$. Now $CI = C$, because $I$ is the identity matrix, and $C(AB) = (CA)B$ since matrix multiplication is associative. But $CA = I$, thus $(CA)B = IB = B$. It follows that $C = B$.

The above result tells us that if a matrix $A$ is invertible then there is only one matrix $B$ such that $AB = I = BA$. We call the matrix $B$ the inverse of $A$. It is usually denoted by $A^{-1}$. It is important to remember that we can only write $A^{-1}$ if we know that $A$ is invertible. In the following, we describe some important properties of the inverse of a matrix.

Lemma 8.1.5.

1. If $A$ is invertible then $A^{-1}$ is invertible and its inverse is $A$.

2. If $A$ and $B$ are both invertible and $AB$ is defined then $AB$ is invertible with inverse $B^{-1}A^{-1}$.

3. If $A_1, \dots, A_n$ are all invertible and $A_1 \cdots A_n$ is defined then $A_1 \cdots A_n$ is invertible and its inverse is $A_n^{-1} \cdots A_1^{-1}$.

Proof. (1) This is immediate from the equations $A^{-1}A = I = AA^{-1}$.

(2) Show that

\[ AB(B^{-1}A^{-1}) = I = (B^{-1}A^{-1})AB. \]

(3) This follows from (2) above and induction.

We shall deal with the practical computation of inverses later. Let me conclude this section by returning to my original motivation for introducing an inverse.

Theorem 8.1.6 (Matrix inverse method). A system of linear equations

\[ Ax = b \]

in which $A$ is invertible has unique solution

\[ x = A^{-1}b. \]

Proof. Observe that

\[ A(A^{-1}b) = (AA^{-1})b = Ib = b. \]

Thus $A^{-1}b$ is a solution. It is unique because if $x'$ is any solution then

\[ Ax' = b \]

giving

\[ A^{-1}(Ax') = A^{-1}b \]

and so

\[ x' = A^{-1}b. \]
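The theorem can be tried out numerically. The sketch below, in Python, reuses the matrix A of Example 8.1.3 together with its inverse B; the right-hand side `b` and the helper name `matmul` are arbitrary choices made here for illustration and are not taken from the text.

```python
def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][s] * b[s][j] for s in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 2, 3], [0, 1, 4], [0, 0, 1]]
A_inv = [[1, -2, 5], [0, 1, -4], [0, 0, 1]]   # the matrix B of Example 8.1.3
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Confirm that A_inv really is a two-sided inverse before using it.
assert matmul(A, A_inv) == identity and matmul(A_inv, A) == identity

b = [[1], [2], [3]]          # an illustrative right-hand side (column matrix)
x = matmul(A_inv, b)         # the unique solution x = A^{-1} b
print(x)                     # [[12], [-10], [3]]
print(matmul(A, x) == b)     # True
```

The multiplication by the inverse is on the left, exactly as in the proof; with column matrices the product in the other order is not even defined.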

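Example 8.1.7. We shall solve the following system of equations using the matrix inverse method

\[ \begin{aligned} x + 2y &= 1 \\ 3x + y &= 2 \end{aligned} \]

Write the equations in matrix form.

\[ \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \]

Determine the inverse of the coefficient matrix. In this case, you can check that this is the following

\[ A^{-1} = -\frac{1}{5} \begin{pmatrix} 1 & -2 \\ -3 & 1 \end{pmatrix} \]

Now we may solve the equations. From $Ax = b$ we get that $x = A^{-1}b$. Thus in this case

\[ x = -\frac{1}{5} \begin{pmatrix} 1 & -2 \\ -3 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} \frac{3}{5} \\ \frac{1}{5} \end{pmatrix} \]

Thus $x = \frac{3}{5}$ and $y = \frac{1}{5}$. Finally, it is always a good idea to check the solutions. There are two (equivalent) ways of doing this. The first is to check by direct substitution

\[ x + 2y = \frac{3}{5} + 2 \cdot \frac{1}{5} = 1 \]

and

\[ 3x + y = 3 \cdot \frac{3}{5} + \frac{1}{5} = 2 \]

Alternatively, you can check by matrix multiplication

\[ \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} \frac{3}{5} \\ \frac{1}{5} \end{pmatrix} \]

which gives

\[ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \]

You can see that both calculations are, in fact, identical.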

8.2 Determinants

The obvious questions that arise from the previous section are how do we decide whether a matrix is invertible or not and, if it is invertible, how do we compute its inverse? The material in this section is key to answering both of these questions. I shall define a number, called the determinant, that can be calculated from any square matrix. Unfortunately, the definition is completely unmotivated but it will justify itself by being useful. Let $A$ be a square matrix. We denote its determinant by $\det(A)$ or by replacing the round brackets of the matrix $A$ with straight brackets. It is defined inductively: this means that I define an $n \times n$ determinant in terms of $(n - 1) \times (n - 1)$ determinants.

• The determinant of the $1 \times 1$ matrix $(a)$ is $a$.

• The determinant of the $2 \times 2$ matrix

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

denoted

\[ \begin{vmatrix} a & b \\ c & d \end{vmatrix} \]

is the number $ad - bc$.

• The determinant of the $3 \times 3$ matrix

\[ \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \]

denoted

\[ \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} \]

is the number

\[ a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix} \]

We could in fact define the determinant of any square matrix of whatever size in much the same way. However, we shall limit ourselves to calculating the determinants of $3 \times 3$ matrices at most. It’s important to pay attention to the signs in the definition. You multiply alternately by plus one and minus one

\[ + \; - \; + \; - \; \dots \]

Examples 8.2.1.

1.

\[ \begin{vmatrix} 2 & 3 \\ 4 & 5 \end{vmatrix} = 2 \times 5 - 3 \times 4 = -2. \]

2.

\[ \begin{vmatrix} 2 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 1 & 1 \end{vmatrix} = 2 \begin{vmatrix} 0 & 2 \\ 1 & 1 \end{vmatrix} - 1 \begin{vmatrix} 1 & 2 \\ 0 & 1 \end{vmatrix} = -5 \]

3.

\[ \begin{vmatrix} 1 & 2 & 1 \\ 3 & 1 & 0 \\ 2 & 0 & 1 \end{vmatrix} = 1 \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} - 2 \begin{vmatrix} 3 & 0 \\ 2 & 1 \end{vmatrix} + \begin{vmatrix} 3 & 1 \\ 2 & 0 \end{vmatrix} = -7 \]

Determinants have many interesting properties, but as far as their connection with inverses is concerned, the following is the most important.

Theorem 8.2.2. Let $A$ and $B$ be square matrices having the same size. Then

det(AB) = det(A) det(B).

Proof. The result is true in general, but I shall only prove it for $2 \times 2$ matrices. Let

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} e & f \\ g & h \end{pmatrix} \]

We prove directly that $\det(AB) = \det(A)\det(B)$. First

\[ AB = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix} \]

Thus

\[ \det(AB) = (ae + bg)(cf + dh) - (af + bh)(ce + dg). \]

The first bracket multiplies out as

\[ acef + adeh + bcgf + bdgh \]

and the second as

\[ acef + adfg + bceh + bdgh. \]

Subtracting these two expressions we get

\[ adeh + bcgf - adfg - bceh. \]

Now we calculate $\det(A)\det(B)$. This is just

\[ (ad - bc)(eh - fg) \]

which multiplies out to give

\[ adeh + bcfg - adfg - bceh. \]

Thus the two sides are equal, and we have proved the result.
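Although the proof above only covers the 2 × 2 case, the statement can at least be spot-checked for larger matrices. The following Python sketch computes a determinant by expanding along the first row, in the manner of the definition; the function names `det` and `matmul` are choices made for this illustration, and the two matrices are the 3 × 3 ones whose determinants are computed in Examples 8.2.1.

```python
def det(m):
    """Determinant of a square matrix (list of rows), expanding along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        # Cross out the first row and column j to get the smaller matrix.
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        # The signs alternate + - + - ... along the first row.
        total += (-1) ** j * entry * det(sub)
    return total


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][s] * b[s][j] for s in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[2, 1, 0], [1, 0, 2], [0, 1, 1]]
B = [[1, 2, 1], [3, 1, 0], [2, 0, 1]]

print(det(A), det(B))                        # -5 -7
print(det(matmul(A, B)) == det(A) * det(B))  # True
```

This recursive expansion is fine for small matrices but the amount of work grows very quickly with the size of the matrix, so it is a sketch of the definition rather than a practical method.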

I shall mention one other property of determinants that we shall need when we come to study vectors and which will be useful in developing the theory of inverses. It can be proved in the 2 × 2 and 3 × 3 cases by direct verification.

Theorem 8.2.3. Let A be a square matrix and let B be obtained from A by interchanging any two columns. Then det(B) = − det(A).

There is a very important consequence of the above result.

Proposition 8.2.4. If two columns of a determinant are equal then the determinant is zero.

Proof. Let A be a matrix with two columns equal. Then if we swap those two columns the matrix remains unchanged. Thus by Theorem 8.2.3, we have that det A = − det A. It follows that det A = 0.

Exercises 8.2

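1. Compute the following determinants.

(a) $\begin{vmatrix} 1 & -1 \\ 2 & 3 \end{vmatrix}$

(b) $\begin{vmatrix} 3 & 2 \\ 6 & 4 \end{vmatrix}$

(c) $\begin{vmatrix} 1 & -1 & 1 \\ 2 & 3 & 4 \\ 0 & 0 & 1 \end{vmatrix}$

(d) $\begin{vmatrix} 1 & 2 & 0 \\ 0 & 1 & 1 \\ 2 & 3 & 1 \end{vmatrix}$

(e) $\begin{vmatrix} 2 & 2 & 2 \\ 1 & 0 & 5 \\ 100 & 200 & 300 \end{vmatrix}$

(f) $\begin{vmatrix} 1 & 3 & 5 \\ 102 & 303 & 504 \\ 1000 & 3005 & 4999 \end{vmatrix}$

(g) $\begin{vmatrix} 1 & 1 & 2 \\ 2 & 1 & 1 \\ 1 & 2 & 1 \end{vmatrix}$

(h) $\begin{vmatrix} 15 & 16 & 17 \\ 18 & 19 & 20 \\ 21 & 22 & 23 \end{vmatrix}$

2. Solve

\[ \begin{vmatrix} 1 - x & 4 \\ 2 & 3 - x \end{vmatrix} = 0. \]

3. Calculate

\[ \begin{vmatrix} x & \cos x & \sin x \\ 1 & -\sin x & \cos x \\ 0 & -\cos x & -\sin x \end{vmatrix} \]

4. Prove that

\[ \begin{vmatrix} a & b \\ c & d \end{vmatrix} = 0 \]

if, and only if, one column is a scalar multiple of the other. Hint: consider two cases, $ad = bc \neq 0$ and $ad = bc = 0$; for the second case you will need to consider various possibilities.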

8.3 When is a matrix invertible?

Recall from Theorem 8.2.2 that

det(AB) = det(A) det(B).

I use this property below to get a necessary condition for a matrix to be invertible.

Lemma 8.3.1. If $A$ is invertible then $\det(A) \neq 0$.

Proof. By assumption, there is a matrix $B$ such that $AB = I$. Take determinants of both sides of the equation $AB = I$ to get $\det(AB) = \det(I)$. By the key property of determinants recalled above, $\det(AB) = \det(A)\det(B)$ and so $\det(A)\det(B) = \det(I)$. But $\det(I) = 1$ and so $\det(A)\det(B) = 1$. In particular, $\det(A) \neq 0$.

Are there any other properties that a matrix must satisfy in order to have an inverse? The answer is, surprisingly, no. We shall prove that a square matrix $A$ is invertible if, and only if, $\det A \neq 0$. I shall only be able to sketch out the proof of this theorem below. The practical issue of actually computing inverses is dealt with in the next section. To motivate things, we start with a $2 \times 2$ matrix $A$ where we can prove everything. Let

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

We construct a new matrix as follows. Replace each entry $a_{ij}$ of $A$ by the element you get when you cross out the $i$th row and $j$th column. Thus we get

\[ \begin{pmatrix} d & c \\ b & a \end{pmatrix} \]

We now use the following matrix of signs

\[ \begin{pmatrix} + & - \\ - & + \end{pmatrix} \]

to get

\[ \begin{pmatrix} d & -c \\ -b & a \end{pmatrix} \]

We now take the transpose of this matrix to get the matrix we call the adjugate of $A$

\[ \mathrm{adj}(A) = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \]

The defining characteristic of the adjugate is that

\[ A\,\mathrm{adj}(A) = \det(A) I = \mathrm{adj}(A)\,A \]

which can easily be checked. We deduce from the defining characteristic of the adjugate that if $\det(A) \neq 0$ then

\[ A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \]

We have therefore proved the following.

Proposition 8.3.2. A $2 \times 2$ matrix is invertible if and only if its determinant is non-zero.

Example 8.3.3. Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix}$. Determine if $A$ is invertible and, if it is, find its inverse, and check the answer. We calculate $\det(A) = -5$. This is non-zero, and so $A$ is invertible. We now form the adjugate of $A$:

\[ \mathrm{adj}(A) = \begin{pmatrix} 1 & -2 \\ -3 & 1 \end{pmatrix} \]

Thus the inverse of $A$ is

\[ A^{-1} = -\frac{1}{5} \begin{pmatrix} 1 & -2 \\ -3 & 1 \end{pmatrix} \]

We now check that $AA^{-1} = I$ (to make sure that we haven’t made any mistakes).

We now consider the general case. Here I will simply sketch out the argument. Let $A$ be an $n \times n$ matrix with entries $a_{ij}$. We define its adjugate as the result of the following sequence of operations.

• Pick a particular row $i$ and column $j$. If we cross out this row and column we get an $(n - 1) \times (n - 1)$ matrix which I shall denote by $M(A)_{ij}$. It is called a submatrix of the original matrix $A$.

• The determinant $\det(M(A)_{ij})$ is called the minor of the element $a_{ij}$.

• Finally, if we multiply $\det(M(A)_{ij})$ by the corresponding sign we get the cofactor $c_{ij} = (-1)^{i+j} \det(M(A)_{ij})$ of the element $a_{ij}$.

• If we replace each element $a_{ij}$ by its cofactor, we get the matrix $C(A)$ of cofactors of $A$.

• The transpose of the matrix of cofactors $C(A)$, denoted $\mathrm{adj}(A)$, is called the adjugate [1] matrix of $A$. Thus the adjugate is the transpose of the matrix of signed minors.

The crucial property of the adjugate is described in the next result.

Theorem 8.3.4. For any square matrix $A$, we have that

\[ A(\mathrm{adj}(A)) = \det(A) I = (\mathrm{adj}(A))A. \]

Proof. We have verified the above result in the case of $2 \times 2$ matrices. I shall now prove it in the case of $3 \times 3$ matrices by means of an argument that generalizes. Let $A = (a_{ij})$ and write

\[ B = \mathrm{adj}(A) = \begin{pmatrix} c_{11} & c_{21} & c_{31} \\ c_{12} & c_{22} & c_{32} \\ c_{13} & c_{23} & c_{33} \end{pmatrix} \]

We shall compute $AB$. We have that

\[ (AB)_{11} = a_{11}c_{11} + a_{12}c_{12} + a_{13}c_{13} = \det A \]

by expanding the determinant along the top row. The next element is

\[ (AB)_{12} = a_{11}c_{21} + a_{12}c_{22} + a_{13}c_{23}. \]

But this is the determinant of the matrix

\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \]

which, having two rows equal, must be zero by the analogue of Proposition 8.2.4 for rows. This pattern now continues with all the off-diagonal entries being zero for similar reasons and the diagonal entries all being the determinant.

[1] This odd word comes from Latin and means ‘yoked together’.

We may now prove the main theorem of this section.

Theorem 8.3.5. Let $A$ be a square matrix. Then $A$ is invertible if and only if $\det(A) \neq 0$. When $A$ is invertible, its inverse is given by

\[ A^{-1} = \frac{1}{\det(A)} \mathrm{adj}(A). \]

Proof. Let $A$ be invertible. By our lemma above, $\det(A) \neq 0$ and so we can form the matrix

\[ \frac{1}{\det(A)} \mathrm{adj}(A). \]

We now calculate

\[ A \, \frac{1}{\det(A)} \mathrm{adj}(A) = \frac{1}{\det(A)} A \, \mathrm{adj}(A) = I \]

by our theorem above. Thus $A$ has the advertised inverse. Conversely, suppose that $\det(A) \neq 0$. Then again we can form the matrix

\[ \frac{1}{\det(A)} \mathrm{adj}(A) \]

and verify that this is the inverse of $A$, and so $A$ is invertible.

Example 8.3.6. Let

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ -1 & 1 & 2 \end{pmatrix} \]

We show that $A$ is invertible and calculate its inverse. First, $\det(A) = -5$ and so $A$ is invertible. The matrix of minors is

\[ \begin{pmatrix} -1 & 5 & 2 \\ 1 & 5 & 3 \\ 2 & -5 & -4 \end{pmatrix} \]

The matrix of cofactors is

\[ \begin{pmatrix} -1 & -5 & 2 \\ -1 & 5 & -3 \\ 2 & 5 & -4 \end{pmatrix} \]

The adjugate is the transpose of the matrix of cofactors

\[ \begin{pmatrix} -1 & -1 & 2 \\ -5 & 5 & 5 \\ 2 & -3 & -4 \end{pmatrix} \]

Thus the inverse of $A$ is the adjugate with each entry divided by the determinant of $A$

\[ A^{-1} = -\frac{1}{5} \begin{pmatrix} -1 & -1 & 2 \\ -5 & 5 & 5 \\ 2 & -3 & -4 \end{pmatrix} \]
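The sequence of operations defining the adjugate translates fairly directly into code. The following Python sketch is one way of doing so, offered as an illustration only: the function names `det`, `adjugate` and `inverse` are chosen here, matrices are lists of rows of integers, and the matrix is assumed to be at least 2 × 2.

```python
from fractions import Fraction


def det(m):
    """Determinant by expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def adjugate(m):
    """Transpose of the matrix of cofactors of an n x n matrix, n >= 2."""
    n = len(m)
    cofactors = [
        [(-1) ** (i + j)
         * det([row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i])
         for j in range(n)]
        for i in range(n)
    ]
    return [list(col) for col in zip(*cofactors)]


def inverse(m):
    """Inverse of an integer matrix via the adjugate, as exact fractions."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular: determinant is zero")
    return [[Fraction(entry, d) for entry in row] for row in adjugate(m)]


A = [[1, 2, 3], [2, 0, 1], [-1, 1, 2]]
print(det(A))        # -5
print(adjugate(A))   # [[-1, -1, 2], [-5, 5, 5], [2, -3, -4]]
```

Multiplying `A` by `inverse(A)` in either order should then give the identity matrix, which is the check recommended throughout this chapter.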

The Moore-Penrose Inverse

We have proved that a square matrix has an inverse if, and only if, it has a non-zero determinant. For rectangular matrices, the existence of an inverse doesn’t even come up for discussion. However, in later applications of matrix theory it is very convenient if every matrix has an ‘inverse’. Let $A$ be any matrix. We say that $A^+$ is its Moore-Penrose inverse if the following conditions hold:

1. $A = AA^+A$.

2. $A^+ = A^+AA^+$.

3. $(A^+A)^T = A^+A$.

4. $(AA^+)^T = AA^+$.

It is not obvious, but every matrix $A$ has a Moore-Penrose inverse $A^+$ and, in fact, such an inverse is uniquely determined by the above four conditions. In the case where $A$ is invertible in the vanilla sense, its Moore-Penrose inverse is just its inverse. But even singular matrices have Moore-Penrose inverses. You can check that the matrix defined below satisfies the four conditions above

\[ \begin{pmatrix} 1 & 2 \\ 3 & 6 \end{pmatrix}^{+} = \begin{pmatrix} 0.02 & 0.06 \\ 0.04 & 0.12 \end{pmatrix} \]

The Moore-Penrose inverse can be used to find approximate solutions to systems of linear equations that might otherwise have no solution.
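The four conditions can be checked by direct calculation. The Python sketch below does this for the singular matrix with rows (1, 2) and (3, 6), taking as candidate the matrix with entries 1/50, 3/50, 2/50 and 6/50 (that is, 0.02, 0.06, 0.04 and 0.12) written as exact fractions to avoid rounding. The helper names `matmul` and `transpose` are chosen here for illustration.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][s] * b[s][j] for s in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


A = [[1, 2], [3, 6]]
A_plus = [[Fraction(1, 50), Fraction(3, 50)],
          [Fraction(2, 50), Fraction(6, 50)]]

conditions = [
    matmul(matmul(A, A_plus), A) == A,                     # 1. A = A A+ A
    matmul(matmul(A_plus, A), A_plus) == A_plus,           # 2. A+ = A+ A A+
    transpose(matmul(A_plus, A)) == matmul(A_plus, A),     # 3. (A+ A)^T = A+ A
    transpose(matmul(A, A_plus)) == matmul(A, A_plus),     # 4. (A A+)^T = A A+
]
print(conditions)
```

Note that neither product of `A` with `A_plus` is the identity matrix here, which is to be expected since `A` has determinant zero and so is not invertible in the ordinary sense.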

Exercises 8.3

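1. Use the adjugate method to compute the inverses of the following matrices. In each case, check that your solution works.

(a) $\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$

(b) $\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$

(c) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}$

(d) $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ -1 & 1 & 2 \end{pmatrix}$

(e) $\begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 3 \\ 1 & 2 & 4 \end{pmatrix}$

(f) $\begin{pmatrix} 2 & 2 & 1 \\ -2 & 1 & 2 \\ 1 & -2 & 2 \end{pmatrix}$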

8.4 Computing inverses

The practical way to compute inverses is to use elementary row operations. I shall first describe the method and then I shall prove that it works. Let A be a square n × n matrix. We want to determine whether it is invertible and, if it is, we want to calculate its inverse. We shall do this at the same time and we shall not need to calculate a determinant. We write down a new kind of augmented matrix, this time of the form $B = (A \mid I)$ where I is the n × n identity matrix. The first part of the algorithm is to carry out elementary row operations on B guided by A. Our goal is to convert A into a row echelon matrix. This will have zeros below the leading diagonal. We are interested in what entries lie on the leading diagonal. If one of them is zero we stop and say that A is not invertible. If all of them are 1 then the algorithm continues. We now use the 1’s that lie on the leading diagonal to remove all elements above each 1. Our original matrix B now has the form $(I \mid A')$. I claim that $A' = A^{-1}$. I shall illustrate this method by means of an example.

Example 8.4.1. Let
$$A = \begin{pmatrix} 2 & -2 & 4 \\ 2 & 3 & 2 \\ -1 & 1 & -1 \end{pmatrix}.$$
We shall show that A is invertible and calculate its inverse. We first write down the augmented matrix
$$\left(\begin{array}{ccc|ccc} 2 & -2 & 4 & 1 & 0 & 0 \\ 2 & 3 & 2 & 0 & 1 & 0 \\ -1 & 1 & -1 & 0 & 0 & 1 \end{array}\right).$$
We now carry out a sequence of elementary row operations to get the following
$$\left(\begin{array}{ccc|ccc} 1 & -1 & 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 & \frac{1}{5} & \frac{2}{5} \\ 0 & 0 & 1 & \frac{1}{2} & 0 & 1 \end{array}\right).$$
The leading diagonal contains only 1’s and so our original matrix is invertible. We now use these 1’s to insert zeros above using elementary row operations.
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac{1}{2} & \frac{1}{5} & -\frac{8}{5} \\ 0 & 1 & 0 & 0 & \frac{1}{5} & \frac{2}{5} \\ 0 & 0 & 1 & \frac{1}{2} & 0 & 1 \end{array}\right).$$
It follows that the inverse of A is
$$A^{-1} = \begin{pmatrix} -\frac{1}{2} & \frac{1}{5} & -\frac{8}{5} \\ 0 & \frac{1}{5} & \frac{2}{5} \\ \frac{1}{2} & 0 & 1 \end{pmatrix}.$$

At this point, it is always advisable to check that $A^{-1}A = I$ in fact rather than just in theory.

We now need to explain why this method works. An n × n matrix E is called an elementary matrix if it is obtained from the n × n identity matrix by means of a single elementary row operation.

Example 8.4.2. Let’s find all the 2 × 2 elementary matrices. The first one is obtained by interchanging two rows and so is
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Next we obtain two matrices by multiplying each row by a non-zero scalar λ
$$\begin{pmatrix} \lambda & 0 \\ 0 & 1 \end{pmatrix} \qquad \begin{pmatrix} 1 & 0 \\ 0 & \lambda \end{pmatrix}.$$
Finally, we obtain two matrices by adding a scalar multiple of one row to another row
$$\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix} \qquad \begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix}.$$

There are now two key results we shall need.

Lemma 8.4.3.

1. Let B be obtained from A by means of a single elementary row operation ρ. Thus B = ρ(A). Let E = ρ(I). Then B = EA.

2. Each elementary matrix is invertible.

Proof. (1) This has to be verified for each of the three types of elementary row operation. I shall deal with the third class of such operations: $R_j \leftarrow R_j + \lambda R_i$. Apply this elementary row operation to the n × n identity matrix I to get the matrix E. This agrees with the identity matrix everywhere except the jth row. There it has a λ in the ith column and, of course, a 1 in the jth column. We now calculate the effect of E on any suitable matrix A. Then EA will be the same as A except in the jth row. This will consist of the jth row of A to which λ times the ith row of A has been added.

(2) Let E be the elementary matrix that arises from the elementary row operation ρ. Thus E = ρ(I). Let ρ′ be the elementary row operation that undoes the effect of ρ. Thus ρρ′ and ρ′ρ are both identity functions. Let E′ = ρ′(I). Then, by part (1), E′E = ρ′(E) = ρ′(ρ(I)) = I. Similarly, EE′ = I. It follows that E is invertible with inverse E′.

Example 8.4.4. We give an example using a 2 × 2 elementary matrix. Consider the elementary matrix
$$E = \begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix}$$
which is obtained from the 2 × 2 identity matrix by carrying out the elementary row operation $R_2 \leftarrow R_2 + \lambda R_1$. We now calculate the effect of this matrix when we multiply it into the following matrix
$$A = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix}$$
and we get
$$EA = \begin{pmatrix} a & b & c \\ \lambda a + d & \lambda b + e & \lambda c + f \end{pmatrix}.$$
But this matrix is what we would get if we applied the elementary row operation directly to the matrix A.

We may now prove that the elementary row operation method for calculating inverses, which we described above, really works.

Proposition 8.4.5. If (I | B) can be obtained from (A | I) by means of elementary row operations then A is invertible with inverse B.

Proof. Let the elementary row operations that transform (A | I) to (I | B) be $\rho_1, \ldots, \rho_n$ in this order. Thus
$$(\rho_n \cdots \rho_1)(A) = I \quad\text{and}\quad (\rho_n \cdots \rho_1)(I) = B.$$
Let $E_i$ be the elementary matrix corresponding to the elementary row operation $\rho_i$. Then
$$(E_n \cdots E_1)A = I \quad\text{and}\quad (E_n \cdots E_1)I = B.$$
Now the matrices $E_i$ are invertible and so
$$A = (E_n \cdots E_1)^{-1} \quad\text{and}\quad B = E_n \cdots E_1.$$
Thus B is the inverse of A as claimed.
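The row-operation method translates directly into a short program. The following is a minimal sketch in Python under some assumptions of my own: the function name is hypothetical, exact rational arithmetic is used via the standard `fractions` module, and, for brevity, entries above and below each leading 1 are cleared in a single pass rather than in the two separate phases described in the text. It is illustrated on the matrix of Example 8.3.6.

```python
from fractions import Fraction

def inverse_by_row_operations(a):
    """Reduce (A | I) to (I | B) by elementary row operations.

    Returns B, the inverse of A, or None if A is not invertible.
    """
    n = len(a)
    # Build the augmented matrix (A | I) with exact rational entries.
    b = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Look for a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if b[r][col] != 0), None)
        if pivot is None:
            return None                                # a zero on the diagonal cannot be avoided
        b[col], b[pivot] = b[pivot], b[col]            # interchange two rows
        p = b[col][col]
        b[col] = [x / p for x in b[col]]               # scale the row so the leading entry is 1
        for r in range(n):
            if r != col:
                f = b[r][col]
                # Subtract f times the pivot row from row r to clear the column.
                b[r] = [x - f * y for x, y in zip(b[r], b[col])]
    return [row[n:] for row in b]                      # the right-hand block is the inverse

A = [[1, 2, 3], [2, 0, 1], [-1, 1, 2]]
B = inverse_by_row_operations(A)
print(B == [[Fraction(1, 5), Fraction(1, 5), Fraction(-2, 5)],
            [1, -1, -1],
            [Fraction(-2, 5), Fraction(3, 5), Fraction(4, 5)]])   # True
```

Each of the three statements that modify `b` is one of the three types of elementary row operation, so the program is a concrete instance of the sequence $\rho_1, \ldots, \rho_n$ in the proof.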

Exercises 8.4

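1. Use elementary row operations to compute the inverses of the following matrices. In each case, check that your solution works.

(a) $\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$

(b) $\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$

(c) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}$

(d) $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ -1 & 1 & 2 \end{pmatrix}$

(e) $\begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 3 \\ 1 & 2 & 4 \end{pmatrix}$

(f) $\begin{pmatrix} 2 & 2 & 1 \\ -2 & 1 & 2 \\ 1 & -2 & 2 \end{pmatrix}$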

8.5 The Cayley-Hamilton theorem

The goal of this section is to prove a major theorem about square matrices. It is true in general, although I shall only prove it for 2 × 2 matrices. It provides a first indication of the importance of certain polynomials in studying matrices. Let A be a square matrix. We can therefore form the product AA which we write as $A^2$. When it comes to multiplying A by itself three times there are apparently two possibilities: A(AA) and (AA)A. However, matrix multiplication is associative and so these two products are equal. We write this as $A^3$. In general $A^{n+1} = AA^n = A^nA$. We define $A^0 = I$, the identity matrix the same size as A. The usual properties of exponents hold
$$A^mA^n = A^{m+n} \quad\text{and}\quad (A^m)^n = A^{mn}.$$
One important consequence is that powers of A commute so that
$$A^mA^n = A^nA^m.$$

We can form powers of matrices, multiply them by scalars and add them together. We can therefore form sums like
$$A^3 + 3A^2 + A + 4I.$$
In other words, we can substitute A in the polynomial
$$x^3 + 3x^2 + x + 4$$
remembering that $4 = 4x^0$ and so has to be replaced by 4I.

Example 8.5.1. Let $f(x) = x^2 + x + 2$ and let
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}.$$
We calculate f(A). Remember that $x^2 + x + 2$ is really $x^2 + x + 2x^0$. Replace x by A and so $x^0$ is replaced by $A^0$ which is I. We therefore get $A^2 + A + 2I$ and calculating gives
$$\begin{pmatrix} 5 & 2 \\ 2 & 3 \end{pmatrix}.$$

It is important to remember that when a square matrix A is substituted into a polynomial, you must replace the constant term of the polynomial by the constant term times the identity matrix. The identity matrix you use will have the same size as A.

We now come to an important extension of what we mean by a root. If f(x) is a polynomial and A is a square matrix, we say that A is a matrix root of f(x) if f(A) is the zero matrix. Let A be a square n × n matrix. Put

$$\chi_A(x) = \det(A - xI).$$
Observe that x is essentially a complex variable and so cannot be replaced by a matrix. Then $\chi_A(x)$ is a polynomial of degree n called the characteristic polynomial of A. It is worth observing that when x = 0 we get that $\chi_A(0) = \det(A)$, which is therefore the value of the constant term of the characteristic polynomial.

Theorem 8.5.2 (Cayley-Hamilton). Every square matrix is a root of its characteristic polynomial.

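Proof. I shall only prove this theorem in the 2 × 2 case, though it is true in general. Let
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
Then from the definition the characteristic polynomial is
$$\chi_A(x) = \begin{vmatrix} a - x & b \\ c & d - x \end{vmatrix}.$$
Thus
$$\chi_A(x) = x^2 - (a + d)x + (ad - bc).$$
We now calculate $\chi_A(A)$ which is just
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^2 - (a + d)\begin{pmatrix} a & b \\ c & d \end{pmatrix} + \begin{pmatrix} ad - bc & 0 \\ 0 & ad - bc \end{pmatrix}.$$
This simplifies to
$$\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$
which proves the theorem in this case. The general proof uses the adjugate matrix of A − xI.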
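Substituting a matrix into a polynomial, with the constant term multiplied by the identity matrix, is easy to carry out by machine. The sketch below is an illustration only (the function names are my own); it evaluates a polynomial at a square matrix by Horner's rule, and then checks both the value of f(A) from Example 8.5.1 and the Cayley-Hamilton theorem for the same 2 × 2 matrix.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def poly_at_matrix(coeffs, a):
    """Evaluate a polynomial at the square matrix a by Horner's rule.

    coeffs lists the coefficients from the highest power down to the
    constant term; the constant term is multiplied by the identity matrix.
    """
    n = len(a)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    result = [[0] * n for _ in range(n)]
    for coeff in coeffs:
        product = matmul(result, a)
        result = [[product[i][j] + coeff * identity[i][j] for j in range(n)]
                  for i in range(n)]
    return result

A = [[1, 1], [1, 0]]
print(poly_at_matrix([1, 1, 2], A))     # f(x) = x^2 + x + 2 gives [[5, 2], [2, 3]]

(a, b), (c, d) = A
chi = [1, -(a + d), a * d - b * c]       # x^2 - (a + d)x + (ad - bc)
print(poly_at_matrix(chi, A))           # [[0, 0], [0, 0]]
```

Of course, a check on one matrix is not a proof; the calculation with symbolic entries a, b, c, d above is what establishes the 2 × 2 case.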

There is one very nice application of this theorem.

Proposition 8.5.3. Let A be an invertible n × n matrix. Then the inverse of A can be written as a polynomial in A of degree n − 1.

Proof. We may write $\chi_A(x) = f(x) + \det(A)$ where f(x) is a polynomial with constant term zero. Thus f(x) = xg(x) for some polynomial g(x) of degree n − 1. By the Cayley-Hamilton theorem,
$$0 = Ag(A) + \det(A)I.$$
Thus $Ag(A) = -\det(A)I$. Put
$$B = -\frac{1}{\det(A)}\,g(A).$$
Then AB = I. But A and B commute since B is a polynomial in A. Thus BA = I. We have therefore proved that $A^{-1} = B$.

8.6 Complex numbers via matrices

Consider all matrices that have the following shape
$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$$
where a and b are arbitrary real numbers. You should show first that the sum, difference and product of any two matrices having this shape is also a matrix of this shape. Rather remarkably, matrix multiplication is commutative for matrices of this shape. Observe that the determinant of our matrix above is $a^2 + b^2$. It follows that every non-zero matrix of the above shape is invertible. The inverse of the above matrix in the non-zero case is
$$\frac{1}{a^2 + b^2}\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$$
and again has the same form. It follows that the set of all these matrices satisfies the axioms of high-school algebra. Define
$$\mathbf{1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad \mathbf{i} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
We may therefore write our matrices in the form
$$a\mathbf{1} + b\mathbf{i}.$$
Observe that $\mathbf{i}^2 = -\mathbf{1}$. It follows that our set of matrices can be regarded as the complex numbers in disguise.

Chapter 9

Vectors

Euclid’s book codified what was known about geometry into a handful of axioms and then showed that all of geometry could be deduced from those axioms by the use of mathematical proof. Impressive though Euclid’s achievement was, it does suffer one drawback in that it is not the easiest system to use. Even proving simple results, like Pythagoras’s theorem, takes dozens of intermediate results. So although it is a great theoretical achievement, it is not such a practical one. It was not until the nineteenth century that a practical tool for doing three-dimensional geometry was constructed. On the basis of the work carried out by Hamilton on quaternions (I say a little more about this later), the theory of vectors, the subject of this chapter, was developed by the American Josiah Willard Gibbs and promoted by the English electrical engineer Oliver Heaviside (whose formal schooling ended at the age of 16). In addition to setting up an algebraic system that will enable us to carry out geometrical calculations easily, I shall also touch on a deep connection with the work of the previous chapter. Each linear equation in three unknowns is in fact the equation of a plane in three-dimensional space. This means that the theory of linear equations in three unknowns has a geometrical interpretation. This may be generalized: the theory of matrices combined with a theory of vectors in arbitrary dimensions is known as linear algebra, one of the most important branches of algebra. I have not attempted to develop the subject in this chapter completely rigorously, so I often make appeals to geometric intuition in setting up the algebraic theory of vectors.

9.1 Vector algebra

I shall assume you are familiar with the following ideas from school:

• The notion of a point.

• The notion of a line and of a line segment.

• The notion of the length of a line segment and the angle between two line segments.

• The notion of parallel lines.

The notion of a pair of lines being parallel is fundamental to Euclidean geometry. We used it to prove that the angles in a triangle add up to two right angles.

9.1.1 Addition and scalar multiplication of vectors

Definition of a vector Two directed line segments which are parallel, have the same length, and point in the same direction are said to represent the same vector.

The word ‘vector’ means carrier in Latin and what a vector carries is information about length and direction and nothing else. Because vectors stay the same when they move parallel to themselves, they also preserve information about angles. Thus vectors have length and direction but no other properties. I shall denote vectors by bold letters a, b, ... If P and Q are points then the directed line segment from P to Q is written $\overrightarrow{PQ}$ or PQ. If P = Q then PQ is just a point. The zero vector 0 is represented by the degenerate line segment PP. Vectors are denoted by arrows: the vector starts at the base of the arrow (where the feathers would be), which we shall call the tail of the vector, and ends at the tip (where the arrowhead is), which we shall call the point of the vector.

Example 9.1.1. In the diagram below all the vectors shown are equal.

[Figure: several parallel arrows of the same length, all pointing in the same direction.]

The set of vectors in space can be equipped with two operations: vector addition and multiplication by a scalar. Let a and b be vectors. Then their sum is defined as follows: slide the vectors parallel to themselves so that the point of a touches the tail of b. The directed line segment from the tail of a to the point of b represents the vector a + b.

[Figure: the vectors a and b placed point to tail, with a + b running from the tail of a to the point of b.]

This definition does make sense though I will not justify that here. If a is a vector, then −a is defined to be the vector with the same length as a but pointing in the opposite direction.

[Figure: a vector a and the vector −a, of the same length but pointing in the opposite direction.]

Theorem 9.1.2 (Properties of vector addition).

(VA1) a + (b + c) = (a + b) + c. This is the associative law for vector addition.

(VA2) 0 + a = a = a + 0. The zero vector is the additive identity.

(VA3) a + (−a) = 0 = (−a) + a. The vector −a is the additive inverse of a.

(VA4) a + b = b + a. This is the commutative law for vector addition.

The proof of the commutativity of vector addition is illustrated below.

[Figure: a parallelogram with sides a and b; its diagonal represents both a + b and b + a.]

The proof of associativity is illustrated below.

We define a − b = a + (−b).

Advanced remark We have seen the above properties before: real numbers with respect to addition, and m × n matrices with respect to matrix addition. A set equipped with a binary operation that is associative, possesses an identity, possesses unique inverses and is commutative is called an abelian group.

Example 9.1.3. Consider the following square of vectors.

[Figure: a square whose four sides, taken in order around the square, are the vectors a, b, c and d.]

Then we have a + b + c + d = 0. Thus, in particular, d = −c − b − a.

Example 9.1.4. Consider the following diagram

[Figure: a diagram of directed line segments labelled a, b, c, d, e, f, g, h and k.]

(i) We may write c in terms of e, d and f. By following the arrows we get that c = d + ef

(ii) We may write g in terms of c, d, e and k. By following the arrows we get that g = −k + c + d − e.

(iii) We may solve x + b = f using similar methods to high-school algebra to get x = f − b which is just a.

(iv) We may solve x + h = d − e in a similar fashion to get x = d − e − h which is just g.

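If $\mathbf{a}$ is a vector then $\|\mathbf{a}\|$ is its length. If $\|\mathbf{a}\| = 1$ then $\mathbf{a}$ is called a unit vector. We have that $\|\mathbf{a}\| \geq 0$, and $\|\mathbf{a}\| = 0$ iff $\mathbf{a} = \mathbf{0}$. By results on triangles we have the triangle inequality
$$\|\mathbf{a} + \mathbf{b}\| \leq \|\mathbf{a}\| + \|\mathbf{b}\|.$$

We now define multiplication of a vector by a scalar. Let λ be a scalar and $\mathbf{a}$ a vector. If λ = 0 then $\lambda\mathbf{a} = \mathbf{0}$; if λ > 0 then $\lambda\mathbf{a}$ has the same direction as $\mathbf{a}$ and length $\lambda\|\mathbf{a}\|$; if λ < 0 then $\lambda\mathbf{a}$ has the opposite direction to $\mathbf{a}$ and length $(-\lambda)\|\mathbf{a}\|$. Observe that in all cases
$$\|\lambda\mathbf{a}\| = |\lambda|\,\|\mathbf{a}\|.$$
If $\mathbf{a}$ is non-zero then
$$\hat{\mathbf{a}} = \frac{\mathbf{a}}{\|\mathbf{a}\|}$$
is a unit vector in the same direction as $\mathbf{a}$. We call this process normalisation. Vectors that differ by a scalar multiple are said to be parallel.

Theorem 9.1.5 (Properties of scalar multiplication).

(i) $0\mathbf{a} = \mathbf{0}$.

(ii) $1\mathbf{a} = \mathbf{a}$.

(iii) $(-1)\mathbf{a} = -\mathbf{a}$.

(iv) $(\lambda + \mu)\mathbf{a} = \lambda\mathbf{a} + \mu\mathbf{a}$.

(v) $\lambda(\mathbf{a} + \mathbf{b}) = \lambda\mathbf{a} + \lambda\mathbf{b}$.

(vi) $\lambda(\mu\mathbf{a}) = (\lambda\mu)\mathbf{a}$.

We can use what we have introduced so far to prove simple geometric theorems.

Example 9.1.6. If the midpoints of the consecutive sides of any quadrilateral are joined by line segments, then the resulting quadrilateral is a parallelogram. We refer to the picture below.

We have that $\mathbf{a} + \mathbf{b} + \mathbf{c} + \mathbf{d} = \mathbf{0}$. Now
$$\overrightarrow{AB} = \tfrac{1}{2}\mathbf{a} + \tfrac{1}{2}\mathbf{b} \quad\text{and}\quad \overrightarrow{CD} = \tfrac{1}{2}\mathbf{c} + \tfrac{1}{2}\mathbf{d}.$$
But $\mathbf{a} + \mathbf{b} = -(\mathbf{c} + \mathbf{d})$. It follows that $\overrightarrow{AB} = -\overrightarrow{CD}$. Hence the line segment AB is parallel to the line segment CD and they have the same lengths. Similarly, BC is parallel to AD and has the same length.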

9.1.2 Inner, scalar or dot products

We now introduce a notion that will enable us to measure angles and lengths. It is a development of the idea of the perpendicular projection of a line onto another line. Let $\mathbf{a}$ and $\mathbf{b}$ be two vectors. If $\mathbf{a}, \mathbf{b} \neq \mathbf{0}$ then we define
$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\| \cos\theta$$
where θ is the angle between $\mathbf{a}$ and $\mathbf{b}$. Note that this angle is always chosen so that 0 ≤ θ ≤ π. If either $\mathbf{a}$ or $\mathbf{b}$ is zero then $\mathbf{a} \cdot \mathbf{b}$ is defined to be zero. We call $\mathbf{a} \cdot \mathbf{b}$ the inner product of $\mathbf{a}$ and $\mathbf{b}$. It is also sometimes called the scalar product and the dot product. It is important to remember that it is a scalar and not a vector. We say that non-zero vectors $\mathbf{a}$ and $\mathbf{b}$ are orthogonal to each other if the angle between them is ninety degrees. The key property of the inner product is that for non-zero $\mathbf{a}$ and $\mathbf{b}$ we have that

$\mathbf{a} \cdot \mathbf{b} = 0$ iff $\mathbf{a}$ and $\mathbf{b}$ are orthogonal.

Theorem 9.1.7 (Properties of the inner product).

(i) a · b = b · a.

(ii) $\mathbf{a} \cdot \mathbf{a} = \|\mathbf{a}\|^2$.

(iii) λ(a · b) = (λa) · b = a · (λb).

(iv) a · (b + c) = a · b + a · c.

Remarks

(i) The inner product $\mathbf{a} \cdot \mathbf{a}$ is often abbreviated $\mathbf{a}^2$.

(ii) Property (iv) says that the inner product of a sum is the sum of the inner products. It will be very important to us. It is the only property that takes a bit of work to prove. I give the proof later.

The inner product enables us to prove much more interesting theorems.

Example 9.1.8. The angle in a semicircle is a right angle. Draw a semicircle. Choose any point on the circumference of the semicircle and join it to the points at either end of the diameter of the semicircle. Then the claim is that the resulting triangle is right-angled.

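We are interested in the angle formed by AB and AC. Observe that $\overrightarrow{AB} = -(\mathbf{a} + \mathbf{b})$ and $\overrightarrow{AC} = \mathbf{a} - \mathbf{b}$. Thus
$$\overrightarrow{AB} \cdot \overrightarrow{AC} = -(\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) = -(\mathbf{a}^2 - \mathbf{a} \cdot \mathbf{b} + \mathbf{b} \cdot \mathbf{a} - \mathbf{b}^2) = -(\mathbf{a}^2 - \mathbf{b}^2) = 0$$
using the fact that $\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a}$ and $\|\mathbf{a}\| = \|\mathbf{b}\|$, because this is just the radius of the semicircle. It follows that the angle BAC is a right angle, as claimed.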

Example 9.1.9. Pythagoras’ theorem proved using vectors.

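We have that $\mathbf{a} + \mathbf{b} + \mathbf{c} = \mathbf{0}$ and so $\mathbf{a} + \mathbf{b} = -\mathbf{c}$. Now
$$(\mathbf{a} + \mathbf{b})^2 = (-\mathbf{c}) \cdot (-\mathbf{c}) = \|\mathbf{c}\|^2.$$
But
$$(\mathbf{a} + \mathbf{b})^2 = \|\mathbf{a}\|^2 + 2\,\mathbf{a} \cdot \mathbf{b} + \|\mathbf{b}\|^2$$
and this is equal to $\|\mathbf{a}\|^2 + \|\mathbf{b}\|^2$ because $\mathbf{a} \cdot \mathbf{b} = 0$. It follows that
$$\|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 = \|\mathbf{c}\|^2.$$

Remark The set of 3-dimensional vectors equipped with the operations of vector addition and scalar multiplication together with the inner product is called three dimensional Euclidean space $E^3$. This is precisely the space of Euclid’s geometry, but done in a modern way.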

9.1.3 Vector or cross products

In three dimensional space there is another operation available to us that is useful in many applications. Let $\mathbf{a}$ and $\mathbf{b}$ be non-zero vectors. We define a new vector
$$\mathbf{a} \times \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\| \sin\theta\,\mathbf{n}$$
where θ is the angle between $\mathbf{a}$ and $\mathbf{b}$, and $\mathbf{n}$ is a unit vector at right angles to the plane containing $\mathbf{a}$ and $\mathbf{b}$. This determines $\mathbf{n}$ up to sign: we choose the direction of $\mathbf{n}$ so that when rotating $\mathbf{a}$ to $\mathbf{b}$ in a clockwise direction through the angle θ we are looking in the direction of $\mathbf{n}$.

[Figure: the vectors a and b in a plane, with a × b at right angles to that plane.]

If $\mathbf{a}$ or $\mathbf{b}$ is zero then $\mathbf{a} \times \mathbf{b}$ is the zero vector. We call it the vector product of $\mathbf{a}$ and $\mathbf{b}$. It is sometimes called the cross product. It is important to remember that it is a vector. The key property of the vector product is that for non-zero vectors

a × b = 0 iff a and b are parallel.

Theorem 9.1.10 (Properties of the vector product). (i) a × b = −b × a.

(ii) λ(a × b) = (λa) × b = a × (λb).

(iii) a × (b + c) = a × b + a × c. Remark Property (iii) says that the vector product distributes over addition. This is the hardest property to prove; I give the proof later.

Warning! $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) \neq (\mathbf{a} \times \mathbf{b}) \times \mathbf{c}$ in general. In other words, the vector product is not associative.

Warning! Distinguish between the following:

• λa. This is a scalar λ times a vector a and the result is a vector.

• a · b. This is the inner product of two vectors and is a scalar.

• a × b. This is the vector product of two vectors and is a vector.

You must not interchange notation for these different products (unlike school algebra where you can).

Example 9.1.11. The area of the parallelogram determined by the vectors $\mathbf{a}$ and $\mathbf{b}$ is $\|\mathbf{a} \times \mathbf{b}\|$ as the following picture shows.

Example 9.1.12. We shall prove the law of sines for triangles using the vector product. With reference to the diagram below

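we have that
$$\frac{\sin A}{a} = \frac{\sin B}{b} = \frac{\sin C}{c}.$$
We choose vectors as shown so that
$$\|\mathbf{a}\| = a, \quad \|\mathbf{b}\| = b, \quad \|\mathbf{c}\| = c.$$
Then $\mathbf{a} + \mathbf{b} + \mathbf{c} = \mathbf{0}$. Hence $\mathbf{a} + \mathbf{b} = -\mathbf{c}$. Take the vector product of both sides of this equation, on the left, with $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ in turn. We get

1. $\mathbf{a} \times \mathbf{b} = \mathbf{c} \times \mathbf{a}$.

2. $\mathbf{b} \times \mathbf{a} = \mathbf{c} \times \mathbf{b}$.

3. $\mathbf{c} \times \mathbf{a} = \mathbf{b} \times \mathbf{c}$.

From (1), we get $\|\mathbf{a} \times \mathbf{b}\| = \|\mathbf{c} \times \mathbf{a}\|$. Thus $\|\mathbf{b}\| \sin C = \|\mathbf{c}\| \sin B$ which gives us the second equation in the statement of the result. The remaining results follow similarly.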

9.1.4 Scalar triple products

This product is nothing more than a combination of the previous two. However, it is included because, as we shall see, it has an important geometric interpretation. Let $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ be three vectors. Then $\mathbf{b} \times \mathbf{c}$ is a vector. Thus $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$ is a scalar. We define
$$[\mathbf{a}, \mathbf{b}, \mathbf{c}] = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}).$$
It is called the scalar triple product. Its properties are determined by the properties of the inner and vector products. What it means geometrically will be described later.

Exercises 9.1

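1. Consider the following diagram.

[Figure: points A, B and C in a row, with an arrow labelled a from A to B and an arrow labelled b from B to C; a downward arrow labelled c; points D, E and F in a row below.]

Now answer the following questions.

(i) Write the vector BD in terms of a and c.

(ii) Write the vector AE in terms of a and c.

(iii) What is the vector DE?

(iv) What is the vector CF?

(v) What is the vector AC?

(vi) What is the vector BF?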

2. If a, b, c and d represent the consecutive sides of a quadrilateral, show that the quadrilateral is a parallelogram if and only if a + c = 0.

3. In the regular pentagon ABCDE, let AB = a, BC = b, CD = c, and DE = d. Express EA, DA, DB, CA, EC, BE in terms of a, b, c, and d.

4. Let a and b represent adjacent sides of a regular hexagon so that the initial point of b is the terminal point of a. Represent the remaining sides by means of vectors expressed in terms of a and b.

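5. Prove that $\|\mathbf{a}\|\mathbf{b} + \|\mathbf{b}\|\mathbf{a}$ is orthogonal to $\|\mathbf{a}\|\mathbf{b} - \|\mathbf{b}\|\mathbf{a}$ for all vectors $\mathbf{a}$ and $\mathbf{b}$.

6. Let $\mathbf{a}$ and $\mathbf{b}$ be two non-zero vectors. Let
$$\mathbf{u} = \frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}\,\mathbf{a}.$$
Show that $\mathbf{b} - \mathbf{u}$ is orthogonal to $\mathbf{a}$.

7. Simplify $(\mathbf{u} + \mathbf{v}) \times (\mathbf{u} - \mathbf{v})$.

8. Let $\mathbf{a}$ and $\mathbf{b}$ be two unit vectors, the angle between them being $\frac{\pi}{3}$. Show that $2\mathbf{b} - \mathbf{a}$ and $\mathbf{a}$ are orthogonal.

9. Prove that
$$\|\mathbf{u} - \mathbf{v}\|^2 + \|\mathbf{u} + \mathbf{v}\|^2 = 2(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2).$$
Deduce that the sum of the squares of the diagonals of a parallelogram is equal to the sum of the squares of all four sides.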

9.2 Vector arithmetic

The theory I introduced in Section 9.1 is useful for proving general results about geometry, but what if we want to calculate with particular vectors: how do we describe them? To do this we need coordinates, and vectors viewed in terms of coordinates will occupy us for the remainder of this section.

9.2.1 i’s, j’s and k’s Set up a cartesian coordinate system consisting of x, y and z axes. We orient the system so that in rotating the x axis clockwise to the y axis, we are looking in the direction of the positive z axis. Let i, j and k be unit vectors parallel to the x, y and z axes respectively (pointing in the positive directions). Every vector a can be uniquely written in the form

a = a1i + a2j + a3k for some scalars a1, a2, a3. This is achieved by orthogonal projection of the vector a (moved so that it starts at the origin) onto each of the three coor- dinate axes. The numbers ai are called the components of a in each of the three directions.

Remarks

• If a = a1i + a2j + a3k and b = b1i + b2j + b3k then a = b iff ai = bi; that is, corresponding components are equal.

• 0 = 0i + 0j + 0k.

• If a = a1i + a2j + a3k and b = b1i + b2j + b3k then

a + b = (a1 + b1)i + (a2 + b2)j + (a3 + b3)k.

• If a = a1i + a2j + a3k then λa = λa1i + λa2j + λa3k.

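Theorem 9.2.1 (Scalar products). Let $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$. Then
$$\mathbf{a} \cdot \mathbf{b} = a_1b_1 + a_2b_2 + a_3b_3.$$

Proof. This is proved using Theorem 9.1.7 (iv) and the following table
$$\begin{array}{c|ccc} \cdot & \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \hline \mathbf{i} & 1 & 0 & 0 \\ \mathbf{j} & 0 & 1 & 0 \\ \mathbf{k} & 0 & 0 & 1 \end{array}$$
computed from the definition of the inner product. We have that
$$\mathbf{a} \cdot \mathbf{b} = \mathbf{a} \cdot (b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}) = b_1(\mathbf{a} \cdot \mathbf{i}) + b_2(\mathbf{a} \cdot \mathbf{j}) + b_3(\mathbf{a} \cdot \mathbf{k}).$$
We now compute $\mathbf{a} \cdot \mathbf{i}$, $\mathbf{a} \cdot \mathbf{j}$, and $\mathbf{a} \cdot \mathbf{k}$ in turn: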

• a · i = a1.

• a · j = a2.

• a · k = a3. Putting everything together we get

a · b = a1b1 + a2b2 + a3b3, as required.

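Remark If $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ then $\|\mathbf{a}\| = \sqrt{a_1^2 + a_2^2 + a_3^2}$.

Theorem 9.2.2 (Vector products). Let $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$. Then
$$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}$$
Warning! This ‘determinant’ can only be expanded along the first row.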

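Proof. This follows by Theorem 9.1.10 (iii) and the following table, in which the entry in the row labelled by one vector and the column labelled by another is the vector product of the first with the second,
$$\begin{array}{c|ccc} \times & \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \hline \mathbf{i} & \mathbf{0} & \mathbf{k} & -\mathbf{j} \\ \mathbf{j} & -\mathbf{k} & \mathbf{0} & \mathbf{i} \\ \mathbf{k} & \mathbf{j} & -\mathbf{i} & \mathbf{0} \end{array}$$
computed from the definition of the vector product. We have that
$$\mathbf{a} \times \mathbf{b} = \mathbf{a} \times (b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}) = b_1(\mathbf{a} \times \mathbf{i}) + b_2(\mathbf{a} \times \mathbf{j}) + b_3(\mathbf{a} \times \mathbf{k}).$$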

We now compute a × i, a × j, and a × k in turn:

• a × i = −a2k + a3j.

• a × j = a1k − a3i.

• a × k = −a1j + a2i. Putting everything together we get

a × b = (a2b3 − a3b2)i − (a1b3 − a3b1)j + (a1b2 − a2b1)k which is equal to the given determinant.
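The component formulae for the inner and vector products are easy to turn into code. Here is a minimal sketch in Python, representing the vector a1i + a2j + a3k by the tuple (a1, a2, a3); the function names `dot` and `cross` are my own choice.

```python
def dot(a, b):
    """Inner product a1*b1 + a2*b2 + a3*b3."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product, expanded along the first row of the 'determinant'."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, -(a1 * b3 - a3 * b1), a1 * b2 - a2 * b1)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(i, j))   # (0, 0, 1), that is, k
print(cross(j, i))   # (0, 0, -1), that is, -k
print(dot(i, j))     # 0, since i and j are orthogonal
```

A useful sanity check on any implementation is that `cross(a, b)` should always be orthogonal to both `a` and `b`, so that both inner products with it are zero.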

The proof of the following now follows by our two theorems above. Theorem 9.2.3 (Scalar triple products and determinants). Let

a = a1i + a2j + a3k, b = b1i + b2j + b3k, c = c1i + c2j + c3k.

Then
$$[\mathbf{a}, \mathbf{b}, \mathbf{c}] = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$$
Thus the properties of scalar triple products are the same as the properties of 3 × 3 determinants.

Proof. We calculate a · (b × c). This is equal to

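$$(a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}) \cdot [(b_2c_3 - b_3c_2)\mathbf{i} - (b_1c_3 - b_3c_1)\mathbf{j} + (b_1c_2 - b_2c_1)\mathbf{k}].$$
But this is equal to
$$a_1(b_2c_3 - b_3c_2) - a_2(b_1c_3 - b_3c_1) + a_3(b_1c_2 - b_2c_1)$$
which is nothing other than
$$\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$$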
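As a numerical illustration of this theorem, the sketch below computes a scalar triple product in two independent ways: as a · (b × c) using the component formulae, and as a 3 × 3 determinant expanded as a sum of six signed products. The function names and the sample vectors are my own choices for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, -(a1 * b3 - a3 * b1), a1 * b2 - a2 * b1)

def triple(a, b, c):
    """Scalar triple product [a, b, c] = a . (b x c)."""
    return dot(a, cross(b, c))

def det3(a, b, c):
    """Determinant with rows a, b, c, written out as six signed products."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    c1, c2, c3 = c
    return (a1 * b2 * c3 + a2 * b3 * c1 + a3 * b1 * c2
            - a3 * b2 * c1 - a1 * b3 * c2 - a2 * b1 * c3)

a, b, c = (1, 2, 3), (2, 0, 1), (-1, 1, 2)
print(triple(a, b, c), det3(a, b, c))   # -5 -5
```

Interchanging two of the vectors changes the sign of the result, exactly as interchanging two rows changes the sign of a determinant.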

Before we look at some examples it is worth stepping back a bit to see where we are.

Summary

In Section 9.1, we defined vectors and vector operations geometrically. In Section 9.2, we showed that once we had chosen a coordinate system, vectors and vector operations could be described algebraically. The important point to remember in what follows is that the two approaches must give the same answers.

Exercises 9.2

1. Let a = 3i + 4j, b = 2i + 2j − k and c = 3i − 4k.

(i) Find $\|\mathbf{a}\|$, $\|\mathbf{b}\|$, and $\|\mathbf{c}\|$.

(ii) Find a + b and a − c.

(iii) Determine $\|\mathbf{a} - \mathbf{c}\|$.

2. (i) Let a = 4i + j − 3k and b = i + 2j + 2k. Find a · b. Are a and b orthogonal?

(ii) Find the angle between −2(i − j) + k and j − i.

3. The unit cube is determined by the three vectors i, j and k. Find the angle between the long diagonal of the unit cube and one of its edges.

4. Calculate i × (i × k) and (i × i) × k. What do you deduce as a result of this?

5. Calculate u · (v × w) where u = 3i − 2j − 5k, v = i + 4j − 4k, and w = 3j + 2k.

6. If [a, b, c] = 0 what can you deduce?

9.3 Geometry with vectors

There are two kinds of vectors: the free vectors that we have been dealing with up to now and the position vectors we introduce next.

9.3.1 Position vectors

So far, we have used vectors to describe line segments. But we can also use vectors to describe the precise location of points. To do this, we have to choose and fix a point O in space, called an origin. We can then consider all the directed line segments that start at O. Each such segment represents a vector and every vector is thus represented. The tips of the line segments are points in space, and every point thus occurs. It follows that once an origin has been fixed, vectors can be used to describe points. We talk about the position vectors of points. However, we can only talk about position vectors with respect to some fixed point O.

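Example 9.3.1. The point A has position vector $\mathbf{a} = -\mathbf{i} + \mathbf{j}$ and the point B has position vector $\mathbf{b} = 2\mathbf{i} + \mathbf{j} - \mathbf{k}$. Find the position vector of the point P which is $\frac{2}{3}$ of the way between A and B.

[Figure: the triangle OAB with position vectors a and b; the point P, with position vector p, lies on AB and divides it into parts in the ratio 2 : 1.]

We have that
$$\begin{aligned} \overrightarrow{OP} &= \overrightarrow{OA} + \overrightarrow{AP} \\ &= \overrightarrow{OA} + \tfrac{2}{3}\overrightarrow{AB} \\ &= \mathbf{a} + \tfrac{2}{3}(\mathbf{b} - \mathbf{a}) \\ &= \tfrac{1}{3}\mathbf{a} + \tfrac{2}{3}\mathbf{b} \\ &= \tfrac{1}{3}(-\mathbf{i} + \mathbf{j}) + \tfrac{2}{3}(2\mathbf{i} + \mathbf{j} - \mathbf{k}) \\ &= \mathbf{i} + \mathbf{j} - \tfrac{2}{3}\mathbf{k}. \end{aligned}$$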
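The same calculation can be carried out with coordinates. The sketch below is an illustration under my own conventions: the vector xi + yj + zk is stored as the tuple (x, y, z), and the hypothetical helper `point_between` returns the position vector a + t(b − a) of the point that is a fraction t of the way from A to B.

```python
from fractions import Fraction

def point_between(a, b, t):
    """Position vector a + t(b - a) of the point a fraction t of the way from A to B."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))

a = (-1, 1, 0)    # a = -i + j
b = (2, 1, -1)    # b = 2i + j - k
p = point_between(a, b, Fraction(2, 3))
print([str(component) for component in p])   # ['1', '1', '-2/3']
```

Taking t = 0 gives A itself and t = 1 gives B, which is a quick check that the fraction is being measured from the correct end.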

9.3.2 Linear combinations

Let $\mathbf{v}_1, \ldots, \mathbf{v}_n$ be n vectors and let $\lambda_1, \ldots, \lambda_n$ be n scalars. Then the vector
$$\mathbf{v} = \lambda_1\mathbf{v}_1 + \ldots + \lambda_n\mathbf{v}_n$$
is called a linear combination of the n vectors. Only two cases of this definition are needed in this course. If we are given just one vector $\mathbf{v}_1$ then a linear combination is just a scalar multiple of that vector. The other case is where we have two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$. Linear combinations then look like this

λ1v1 + λ2v2.

Let v be any non-zero vector. Then any vector parallel to this vector has the form λv for some scalar λ.

Now let v1 and v2 be two non-zero vectors where neither is a multiple of the other. Then these two vectors determine a plane in space. This plane is not rooted to any point and so, for convenience, we may move it parallel to itself so that it passes through some fixed point that we may treat as an origin. Now let v be any vector which is parallel to this plane. We may move it parallel to itself so that its tail is at the origin. By plane geometry, we may find real numbers λ1 and λ2 such that

v = λ1v1 + λ2v2.

We shall use these ideas in deriving formulae for lines and planes in space in the sections below.

9.3.3 Lines Intuitively, a line in space is determined by one of the following two pieces of information:

1. Two distinct points each described by a position vector.

2. One point and a direction where the point is given by a position vector and the direction by a (free) vector.

Let’s see how we can use vectors to obtain the equation of that line. Let a and b be the position vectors of two distinct points. Let r = xi + yj + zk be the position vector of a point on the line they determine. Observe that the line determined by the two points will be parallel to the vector b − a which is the direction the line is parallel to.

The vectors r − a and b − a will be parallel. Thus there is a scalar λ such that r − a = λ(b − a). It follows that r = a + λ(b − a). This is called the (vector form of) the parametric equation of the line. The parameter in question is λ.

We now derive the coordinate form of the parametric equation. Let

a = a1i + a2j + a3k and b = b1i + b2j + b3k. Substituting in our vector form above and equating components we obtain

x = a1 + λ(b1 − a1), y = a2 + λ(b2 − a2), z = a3 + λ(b3 − a3).

For convenience, put $c_i = b_i - a_i$. Thus the coordinate form of the parametric equation for the line is
$$x = a_1 + \lambda c_1, \quad y = a_2 + \lambda c_2, \quad z = a_3 + \lambda c_3.$$

If $c_1, c_2, c_3 \neq 0$ then we can eliminate the parameter in the above equations to get the non-parametric equations of the line:
$$\frac{x - a_1}{c_1} = \frac{y - a_2}{c_2}, \quad \frac{y - a_2}{c_2} = \frac{z - a_3}{c_3}.$$

It’s worth noting that

• The parametric equation is useful for generating points on the line (by choosing values of the parameter λ).

• The non-parametric equation is useful for checking that given points lie on a given line.

Example 9.3.2. Find the parametric and the non-parametric equations of the line through the point with position vector i + 2j + 3k and parallel to the vector 4i + 5j + 6k. In this question, we are given the direction that the line is parallel to. Thus r − (i + 2j + 3k) is parallel to 4i + 5j + 6k. It follows that r = i + 2j + 3k + λ(4i + 5j + 6k) is the vector form of the parametric equation of the line. We now find the cartesian form of the parametric equation. Put

r = xi + yj + zk.

Then xi + yj + zk = i + 2j + 3k + λ(4i + 5j + 6k). These two vectors are equal iff their coordinates are equal. Thus we have that

x = 1 + 4λ, y = 2 + 5λ, z = 3 + 6λ.

This is the cartesian form of the parametric equation of the line. Finally, we eliminate λ to get the non-parametric equation of the line

$$\frac{x - 1}{4} = \frac{y - 2}{5} \qquad\text{and}\qquad \frac{y - 2}{5} = \frac{z - 3}{6}.$$

These two equations can be rewritten in the form

5x − 4y = −3 and 6y − 5z = −3.
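As a quick sanity check on this example, the following Python sketch generates points from the parametric form and confirms that each one satisfies both non-parametric equations. The function names are hypothetical and chosen only for this illustration.

```python
def point(lam):
    """Point on the line of Example 9.3.2 with parameter lam."""
    return (1 + 4 * lam, 2 + 5 * lam, 3 + 6 * lam)

def satisfies_both_equations(p):
    """Check the two equations 5x - 4y = -3 and 6y - 5z = -3."""
    x, y, z = p
    return 5 * x - 4 * y == -3 and 6 * y - 5 * z == -3

# Try several integer values of the parameter.
checks = [satisfies_both_equations(point(lam)) for lam in range(-3, 4)]
```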

9.3.4 Planes

Intuitively, a plane in space is determined by one of the following three pieces of information:

1. Any three points that do not all lie in a straight line; that is, the points form the vertices of a triangle.

2. One point and two non-parallel directions.

3. One point and a direction which is perpendicular or normal to the plane.

We shall begin by finding the parametric equation of the plane determined by the three points with position vectors a, b and c.

The vectors b − a and c − a are both parallel to the plane, but are not parallel to each other. Thus every vector parallel to the plane they determine has the form λ(b − a) + µ(c − a) for some scalars λ and µ. Here we use the ideas of Section 4.3.2. Thus if the position vector of an arbitrary point on the plane is r, then r − a = λ(b − a) + µ(c − a). Thus the (vector form of) the parametric equation of the plane is

r = a + λ(b − a) + µ(c − a).

This can easily be written in coordinate form by equating components.

To find the non-parametric equation of a plane, we use the fact that a plane is determined once we know a point on the plane and a vector orthogonal to every vector in the plane; such a vector is said to be normal to the plane. Let n be a vector normal to our plane, and let a be the position vector of a point in the plane.

Then r − a is orthogonal to n. Thus (r − a) · n = 0. This is the (vector form) of the non-parametric equation of the plane. To find the coordinate form of the non-parametric equation, let

r = xi + yj + zk, a = a1i + a2j + a3k, n = n1i + n2j + n3k.

From (r − a) · n = 0 we get (x − a1)n1 + (y − a2)n2 + (z − a3)n3 = 0. Thus the non-parametric equation of the plane is

n1x + n2y + n3z = a1n1 + a2n2 + a3n3.

Remark From the equation above, we deduce that the solutions of a linear equation in three unknowns ax + by + cz = d all lie on a plane in general (although there are some degenerate cases where something different from a plane will be obtained). We observe that the non-parametric equation of the line in fact describes the line as the intersection of two planes. If we have three equations in three unknowns then, as long as the planes are angled correctly, they will intersect in a point; that is, the equations will have a unique solution. However, there are many cases where either the planes have no points in common (no solution) or have lines or indeed planes in common (infinitely many solutions). Thus the nature of the solutions of a system of linear equations in three unknowns is intimately bound up with the geometry of the planes they determine.

We have one final question to answer: given the parametric equation of the plane, how do we find the non-parametric equation? The vectors b − a and c − a are parallel to the plane but not parallel to each other. The vector

n = (b − a) × (c − a)

is normal to our plane.

Example 9.3.3. Find the parametric and non-parametric equations of the plane containing the three points with position vectors

a = j − k, b = i + j, c = i + 2j.

We have that b − a = i + k and c − a = i + j + k. Thus the parametric equation of the plane is

r = j − k + λ(i + k) + µ(i + j + k).

To find the non-parametric equation, we need to find a vector normal to the plane. We calculate (b − a) × (c − a) = k − i. Thus (r − a) · (k − i) = 0.

That is (xi + (y − 1)j + (z + 1)k) · (k − i) = 0. This simplifies to z − x = −1, the non-parametric equation of the plane. We now check that our three original points satisfy this equation. The point a has co-ordinates (0, 1, −1); the point b has co-ordinates (1, 1, 0); the point c has co-ordinates (1, 2, 0). It is easy to check that each set of co-ordinates satisfies the equation.
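The calculation in this example can be reproduced with a short Python sketch. The helper `plane_through` is a hypothetical name; it returns the normal n = (b − a) × (c − a) together with the constant a · n, so that the plane is n · r = a · n.

```python
def sub(u, v):
    """Difference u - v of two vectors given as tuples."""
    return tuple(ui - vi for ui, vi in zip(u, v))

def dot(u, v):
    """Inner product in co-ordinate form."""
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    """Vector product in co-ordinate form."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_through(a, b, c):
    """Return (n, d) so that the plane through a, b, c is n . r = d."""
    n = cross(sub(b, a), sub(c, a))
    return n, dot(a, n)

a, b, c = (0, 1, -1), (1, 1, 0), (1, 2, 0)
n, d = plane_through(a, b, c)   # n is k - i, and the equation is -x + z = d
```

Here n comes out as (−1, 0, 1), which is k − i, and d is −1, in agreement with z − x = −1.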

9.3.5 Determinants

Let’s start with 1 × 1 matrices. The determinant of (a) is just a. The length of a is |a|, the absolute value of the determinant of (a).

Theorem 9.3.4. Let $\mathbf{a} = a\mathbf{i} + c\mathbf{j}$ and $\mathbf{b} = b\mathbf{i} + d\mathbf{j}$ be a pair of plane vectors. Then the area of the parallelogram determined by these vectors is the absolute value of the determinant

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix}$$

Proof. The proof I give will be for the case where both vectors are in the first quadrant. I shall consider two cases.

(Case 1): $\mathbf{b}$ is to the left of $\mathbf{a}$ when standing at the origin and looking along $\mathbf{a}$. Let $\mathbf{a} = a\mathbf{i} + c\mathbf{j}$ and $\mathbf{b} = b\mathbf{i} + d\mathbf{j}$. The area of the parallelogram is the area of the rectangle defined by the points $\mathbf{0}$, $(a + b)\mathbf{i}$, $\mathbf{a} + \mathbf{b}$, $(c + d)\mathbf{j}$ minus the area of two rectangles the same size, labelled (1), two triangles the same size, labelled (2), and another two triangles of the same size, labelled (3). That is

$$(a + b)(c + d) - 2bc - 2\left(\tfrac{1}{2}\right)ac - 2\left(\tfrac{1}{2}\right)bd$$

which is equal to

ac + ad + bc + bd − 2bc − bd − ac = ad − bc.

(Case 2): b is to the right of a when standing at the origin and looking along a. A similar argument shows that the area is bc − ad which is the negative of the determinant. Putting these two cases together, we see that the area is the absolute value of the determinant, because we usually expect areas to be non-negative.
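The theorem can be checked numerically with a small Python sketch. As an independent comparison it uses the fact that the squared area of the parallelogram is ‖a‖² ‖b‖² sin² θ = ‖a‖² ‖b‖² − (a · b)², which follows from the definition of the inner product. The function names and the sample vectors are assumptions made for this illustration.

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 array with rows (a, b) and (c, d)."""
    return a * d - b * c

def area_squared(u, v):
    """Square of the parallelogram area: |u|^2 |v|^2 - (u . v)^2."""
    uu = u[0] ** 2 + u[1] ** 2
    vv = v[0] ** 2 + v[1] ** 2
    uv = u[0] * v[0] + u[1] * v[1]
    return uu * vv - uv ** 2

u = (3, 1)   # the vector a i + c j with a = 3, c = 1
v = (1, 2)   # the vector b i + d j with b = 1, d = 2

determinant = det2(u[0], v[0], u[1], v[1])   # Case 1: v is to the left of u
swapped = det2(v[0], u[0], v[1], u[1])       # Case 2: roles exchanged, sign flips
area = abs(determinant)
```

Exchanging the two vectors changes the sign of the determinant but not its absolute value, which is why the theorem is stated with absolute values.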

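Theorem 9.3.5. Let $\mathbf{a} = a\mathbf{i} + d\mathbf{j} + g\mathbf{k}$, $\mathbf{b} = b\mathbf{i} + e\mathbf{j} + h\mathbf{k}$, $\mathbf{c} = c\mathbf{i} + f\mathbf{j} + i\mathbf{k}$ be three vectors. Then the volume of the parallelepiped (‘squashed box’) determined by these three vectors is the absolute value of the determinant

$$\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}$$

or its transpose.

Proof. We refer to the diagram below.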

The volume of the box determined by the vectors a, b, c is equal to the base area times the vertical height. This is equal to the absolute value of ‖a‖ ‖b‖ sin θ ‖c‖ cos φ. We have to use the absolute value of this expression because cos φ can take negative values if c is below rather than above the plane of a and b as I have drawn it. Now

• a × b = ‖a‖ ‖b‖ sin θ n, where n is the unit vector orthogonal to a and b and in the correct direction.

• n · c = ‖c‖ cos φ.

Thus ‖a‖ ‖b‖ sin θ ‖c‖ cos φ = (a × b) · c. By the properties of the inner product (a × b) · c = c · (a × b) = [c, a, b]. We now use properties of the determinant: [c, a, b] = −[a, c, b] = [a, b, c]. It follows that the volume of the box is the absolute value of [a, b, c].

It follows from the above theorem and our theorem on scalar triple products that the volume of the parallelepiped determined by the three vectors a, b, and c is the absolute value of the scalar triple product [a, b, c].

The geometric significance of determinants is that they enable us to measure lengths, areas and volumes.
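A minimal Python sketch of the volume calculation follows. It computes the scalar triple product directly and compares it with the 3 × 3 determinant whose columns are the three vectors, and with its transpose. The function names and the sample box (base area 2, height 3) are assumptions chosen for illustration.

```python
def dot(u, v):
    """Inner product in co-ordinate form."""
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    """Vector product in co-ordinate form."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def triple(a, b, c):
    """Scalar triple product [a, b, c] = a . (b x c)."""
    return dot(a, cross(b, c))

def det3(m):
    """Determinant of a 3x3 array given as three rows, expanded along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

a = (1, 0, 0)
b = (1, 2, 0)
c = (1, 1, 3)
columns = list(zip(a, b, c))   # a, b, c written as columns, as in Theorem 9.3.5
rows = [a, b, c]               # the transpose
volume = abs(triple(a, b, c))
```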

Exercises 4.3

1. (i) Find the parametric and the non-parametric equations of the line through the two points with position vectors i − j + 2k and 2i + 3j + 4k.

(ii) Find the parametric and the non-parametric equations of the plane containing the three points with position vectors i + 3k, i + 2j − k, and 3i − j − 2k.

2. Let c be the position vector of the centre of a sphere with radius R. Let an arbitrary point on the sphere have position vector r. Why is ‖r − c‖ = R? Squaring both sides we get

$$(\mathbf{r} - \mathbf{c}) \cdot (\mathbf{r} - \mathbf{c}) = R^2.$$

If r = xi + yj + zk and c = c1i + c2j + c3k, deduce that the equation of the sphere with centre c1i + c2j + c3k and radius R is

$$(x - c_1)^2 + (y - c_2)^2 + (z - c_3)^2 = R^2.$$

(i) Find the equation of the sphere with centre i + j + k and radius 2. (ii) Find the centre and radius of the sphere with equation

$$x^2 + y^2 + z^2 - 2x - 4y - 6z - 2 = 0.$$

3. The distance of a point from a line is defined to be the length of the perpendicular from the point to the line. Let the line in question have parametric equation r = p + λd and let the position vector of the point be q. Show that the distance of the point from the line is

$$\frac{\|\mathbf{d} \times (\mathbf{q} - \mathbf{p})\|}{\|\mathbf{d}\|}.$$

4. The distance of a point from a plane is defined to be the length of the perpendicular from the point to the plane. Let the position vector of the point be q and the equation of the plane be (r − p) · n = 0. Show that the distance of the point from the plane is

$$\frac{|(\mathbf{q} - \mathbf{p}) \cdot \mathbf{n}|}{\|\mathbf{n}\|}.$$

9.4 Summary of vectors

Inner products

Definition

Let a and b be two vectors. If a, b ≠ 0 then we define

a · b = ‖a‖ ‖b‖ cos θ

where θ is the angle between a and b. Note that this angle is always chosen so that 0 ≤ θ ≤ π. If either a or b is zero then a · b is defined to be zero. We call a · b the inner product of a and b.

Co-ordinate form

Let a = a1i + a2j + a3k and b = b1i + b2j + b3k. Then

a · b = a1b1 + a2b2 + a3b3.

Uses

• The most important application is the following: if the vectors a and b are non-zero then a · b = 0 precisely when a and b are orthogonal — meaning ‘at right angles to each other’.

• The inner product can more generally be used to work out the angle between two vectors:

$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|}$$

where θ is the angle between the non-zero vectors a and b.

• The inner product can be used to work out the lengths of vectors: $\|\mathbf{a}\| = \sqrt{\mathbf{a} \cdot \mathbf{a}}$.
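The three uses listed above can be sketched in Python as follows. The function names are hypothetical, and floating-point arithmetic is used for lengths and angles, so results should be compared with a tolerance rather than exactly.

```python
import math

def dot(u, v):
    """Inner product in co-ordinate form."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Length of a vector: the square root of u . u."""
    return math.sqrt(dot(u, u))

def angle(u, v):
    """Angle in [0, pi] between two non-zero vectors."""
    cos_theta = dot(u, v) / (norm(u) * norm(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))

theta = angle((1, 0, 0), (1, 1, 0))            # angle between i and i + j
orthogonal = dot((1, 2, 3), (3, 0, -1)) == 0   # inner product zero: at right angles
length = norm((3, 4, 12))
```

The angle between i and i + j is π/4, and the vector 3i + 4j + 12k has length 13.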

Vector products

Definition

Let a and b be non-zero vectors. We define a new vector

a × b = ‖a‖ ‖b‖ sin θ n

where θ is the angle between a and b, and n is a unit vector at right angles to the plane containing a and b. This determines n up to sign: we choose the direction of n so that when rotating a to b in a clockwise direction through the angle θ we are looking in the direction of n. If a or b is zero then a × b is the zero vector. We call it the vector product of a and b.

Co-ordinate form

Let a = a1i + a2j + a3k and b = b1i + b2j + b3k. Then

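$$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}$$

Uses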

• The most important application of the vector product is in constructing a vector orthogonal to two other vectors, and in particular in constructing a vector orthogonal to a plane, that is, a vector normal to the plane.

• If the vectors a and b are non-zero then a × b = 0 precisely when a and b are parallel to each other.

• The vector product can be used to calculate the sine of the angle between two vectors:

$$\sin\theta = \frac{\|\mathbf{a} \times \mathbf{b}\|}{\|\mathbf{a}\| \, \|\mathbf{b}\|}$$

where θ is the angle between the non-zero vectors a and b.
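These uses can be illustrated with a short Python sketch based on the co-ordinate form. The names are hypothetical. The last check combines the sine and cosine formulas: multiplying sin² θ + cos² θ = 1 through by ‖a‖² ‖b‖² gives ‖a × b‖² + (a · b)² = ‖a‖² ‖b‖², which can be tested in exact integer arithmetic.

```python
def dot(u, v):
    """Inner product in co-ordinate form."""
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    """Vector product: the determinant in the text expanded along its first row."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

a = (1, 2, 3)
b = (4, 5, 6)
n = cross(a, b)                       # orthogonal to both a and b
parallel_case = cross(a, (2, 4, 6))   # (2, 4, 6) = 2a is parallel to a

# |a x b|^2 + (a . b)^2 = |a|^2 |b|^2
identity_holds = dot(n, n) + dot(a, b) ** 2 == dot(a, a) * dot(b, b)
```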

Scalar triple products

Definition

Let a, b and c be three vectors. Then b × c is a vector. Thus a · (b × c) is a scalar. We define [a, b, c] = a · (b × c). It is called the scalar triple product.

Co-ordinate form

Let a = a1i + a2j + a3k, b = b1i + b2j + b3k, and c = c1i + c2j + c3k. Then

$$[\mathbf{a}, \mathbf{b}, \mathbf{c}] = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$$

Uses

• The absolute value of [a, b, c] is the volume of the parallelepiped (‘squashed box’) determined by the three vectors.

• The scalar triple product gives a geometric interpretation of 3 × 3 determinants.

9.5 *Two vector proofs*

This section will not be examined in 2013.

My development of the theory of vectors in this chapter depended on two important results: Theorem 9.1.7 (iv), the fact that

a · (b + c) = a · b + a · c

and Theorem 9.1.10 (iii), the fact that

a × (b + c) = a × b + a × c.

I shall sketch out proofs of both of these results here. The proof of the first is not too difficult.

Theorem 9.5.1. a · (b + c) = a · b + a · c.

Proof. Let x and y be a pair of vectors. Then the component of x in the direction of y, written comp(x, y), is by definition the number ‖x‖ cos θ where θ is the angle between x and y. Clearly

x · y = ‖y‖ comp(x, y).

Geometry shows (this means you should draw the pictures) that

comp(b + c, a) = comp(b, a) + comp(c, a).

We therefore have that

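(b + c) · a = ‖a‖ comp(b + c, a) = ‖a‖ comp(b, a) + ‖a‖ comp(c, a) = b · a + c · a.

Since the inner product does not depend on the order of its two vectors, this is the same as a · (b + c) = a · b + a · c.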

The proof of the second is hairier.

Theorem 9.5.2. a × (b + c) = a × b + a × c.

Proof. We defined the vector product in terms of geometry and so we shall have to prove this property by means of geometry. I shall sketch out a proof following one given in Pettofrezzo’s book.

We begin with what is in effect a lemma. Let a and b be a pair of vectors. It is convenient to move them so that they are both emanating from the same point P. They determine a plane. In that plane, we can draw a line perpendicular to the vector a and passing through the point P. We project the vector b onto this line and we get a vector b′. We claim that a × b = a × b′. The proof follows by observing that these two vectors clearly have the same direction and a calculation shows that they have the same length (‖b′‖ = ‖b‖ sin θ, and a is orthogonal to b′).

We now prove our theorem. We orientate ourselves so that the vector a is at right angles to the page and pointing at you the reader. We project the vectors b and c onto the plane of the page to get the vectors b′ and c′. We shall prove that a × (b′ + c′) = a × b′ + a × c′.

Let’s see first why this result is enough to prove the theorem. The vectors a and b + c define a plane. As in our lemma above, we have that a × (b + c) = a × (b + c)′. Also a × b = a × b′ and a × c = a × c′. As long as (b + c)′ = b′ + c′, our theorem will follow.

We now prove that a × (b′ + c′) = a × b′ + a × c′. Now, by the way we have defined our vectors, a × b′ and a × c′ are in the plane of the page and are orthogonal to b′ and c′, respectively. This leads to the crux of the proof: the angle between a × b′ and a × c′ is the same as the angle between b′ and c′. The point is that because a is pointing out of the page, the operator a × − has the effect of rotating vectors by a right angle in the plane of the page. It follows that a × b′ + a × c′ is at right angles to b′ + c′. Thus a × b′ + a × c′ and a × (b′ + c′) are vectors pointing in the same direction.

We now compare the lengths of these two vectors. We shall use the fact that the triangle formed by the vectors a × b′ and a × c′ is similar to the triangle formed by the vectors b′ and c′. Thus

$$\frac{\|\mathbf{a} \times \mathbf{b}' + \mathbf{a} \times \mathbf{c}'\|}{\|\mathbf{b}' + \mathbf{c}'\|} = \frac{\|\mathbf{a} \times \mathbf{b}'\|}{\|\mathbf{b}'\|}.$$

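But ‖a × b′‖ = ‖a‖ ‖b′‖, because a is orthogonal to b′, so this works out to give that

‖a × b′ + a × c′‖ = ‖a‖ ‖b′ + c′‖.

The right-hand side is the length of a × (b′ + c′), because a is also orthogonal to b′ + c′.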

Our claim is now proved.
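The crux of the proof, that a × − acts on the plane of the page as a rotation through a right angle followed by scaling by ‖a‖, can be illustrated with a small Python sketch. The co-ordinate setup is an assumption made for this illustration: the page is the plane z = 0, a points along the positive z-axis towards the reader, and in these co-ordinates the rotation is anticlockwise as seen by the reader. All names are hypothetical.

```python
def cross(u, v):
    """Vector product in co-ordinate form."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def add(u, v):
    """Sum of two vectors."""
    return tuple(ui + vi for ui, vi in zip(u, v))

def scale(k, v):
    """Scalar multiple k*v."""
    return tuple(k * vi for vi in v)

def quarter_turn(v):
    """Rotate a vector lying in the plane z = 0 anticlockwise by a right angle."""
    x, y, _ = v
    return (-y, x, 0)

s = 3
a = (0, 0, s)        # at right angles to the page, of length s
b1 = (2, 1, 0)       # plays the role of b' in the proof
c1 = (-1, 4, 0)      # plays the role of c' in the proof

lhs = cross(a, add(b1, c1))              # a x (b' + c')
rhs = add(cross(a, b1), cross(a, c1))    # a x b' + a x c'
```

Because rotation and scaling both respect vector addition, the two sides agree, which is the content of the claim.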

9.6 Quaternions

The set of quaternions, denoted by H, was invented by the Irish mathematician Sir William Rowan Hamilton in 1843. They are 4-dimensional generalisations of the complex numbers. It was from the theory of quaternions that the modern theory of vectors with inner and vector products developed. To describe what they are, I shall reverse history and derive them from vectors. Recall the following from an earlier exercise. The Pauli matrices are I, X, Y, Z, −I, −X, −Y, −Z where

$$X = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} \qquad\text{and}\qquad Z = \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}$$

where i is the complex number i. You were asked to show that the product of any two Pauli matrices is again a Pauli matrix by completing a table. We shall just need a portion of that table relating to X, Y and Z. This is

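$$\begin{array}{c|ccc} & X & Y & Z \\ \hline X & -I & Z & -Y \\ Y & -Z & -I & X \\ Z & Y & -X & -I \end{array}$$

The entry in the row labelled A and the column labelled B is the product AB.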
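The table can be confirmed by direct matrix multiplication. The following Python sketch does this using Python’s built-in complex numbers, with matrices stored as tuples of rows. The helper names `matmul` and `neg` are hypothetical.

```python
I = ((1, 0), (0, 1))
X = ((0, 1), (-1, 0))
Y = ((1j, 0), (0, -1j))
Z = ((0, -1j), (-1j, 0))

def matmul(A, B):
    """Product of two 2x2 matrices given as tuples of rows."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2))
                 for i in range(2))

def neg(A):
    """The matrix -A."""
    return tuple(tuple(-x for x in row) for row in A)

named = {"X": X, "Y": Y, "Z": Z}

# (row label, column label) -> entry of the table
table = {
    ("X", "X"): neg(I), ("X", "Y"): Z,      ("X", "Z"): neg(Y),
    ("Y", "X"): neg(Z), ("Y", "Y"): neg(I), ("Y", "Z"): X,
    ("Z", "X"): Y,      ("Z", "Y"): neg(X), ("Z", "Z"): neg(I),
}

table_is_correct = all(matmul(named[r], named[c]) == entry
                       for (r, c), entry in table.items())
```

All the entries involved are 0, ±1 or ±i, so the floating-point comparisons here are exact.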

We shall now consider matrices of the form

λI + αX + βY + γZ where λ, α, β, γ ∈ R. We calculate the product of two such matrices using the distributivity and scalar multiplication properties of matrix multiplication and the above multiplication table. The product

(λI + αX + βY + γZ)(µI + α′X + β′Y + γ′Z)

can be written in the form aI + bX + cY + dZ where a, b, c, d ∈ R although I shall write it in a slightly different form:

(λµ − αα′ − ββ′ − γγ′)I + λ(α′X + β′Y + γ′Z) + µ(αX + βY + γZ) + (βγ′ − γβ′)X + (γα′ − αγ′)Y + (αβ′ − βα′)Z.

Although this looks complicated there are some familiar things within it: the first term contains what looks like an inner product and the last term contains what looks like a vector product. Note that because this is matrix multiplication this operation is associative.

The above calculation motivates the following construction. Let E³ denote the set of all 3-dimensional vectors. Thus a typical element of E³ is αi + βj + γk. Put

H = R × E³.

The elements of H are therefore ordered pairs (λ, a) consisting of a real number λ and a vector a. We define the sum of two elements of H in a very simple way:

(λ, a) + (µ, a′) = (λ + µ, a + a′).

The product is defined in a way that mimics what I did above (you should check this):

(λ, a)(µ, a′) = (λµ − a · a′, λa′ + µa + (a × a′)).
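The suggested check can be carried out numerically. The Python sketch below implements the product on pairs (λ, a), converts a pair to the matrix λI + αX + βY + γZ, and compares the two ways of multiplying. The function names `qmul`, `to_matrix` and `matmul` are hypothetical, and the sample values are arbitrary integers so that all arithmetic is exact.

```python
def qmul(p, q):
    """Product (lam, a)(mu, a') = (lam*mu - a.a', lam*a' + mu*a + a x a')."""
    lam, a = p
    mu, b = q
    inner = sum(x * y for x, y in zip(a, b))
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    vec = tuple(lam * y + mu * x + w for x, y, w in zip(a, b, cross))
    return (lam * mu - inner, vec)

def to_matrix(q):
    """The 2x2 complex matrix lam*I + alpha*X + beta*Y + gamma*Z."""
    lam, (al, be, ga) = q
    return ((complex(lam, be), complex(al, -ga)),
            (complex(-al, -ga), complex(lam, -be)))

def matmul(A, B):
    """Product of two 2x2 matrices given as tuples of rows."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2))
                 for i in range(2))

p = (1, (2, 3, 4))
q = (5, (6, 7, 8))
r = (-2, (1, 0, 3))

pq = qmul(p, q)
mimics_matrices = to_matrix(pq) == matmul(to_matrix(p), to_matrix(q))
associative_here = qmul(qmul(p, q), r) == qmul(p, qmul(q, r))
```

A check on a few sample values is of course not a proof, but it is a useful guard against sign errors in the formula.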

It follows that this product is associative! We shall now investigate what we can do with H. I shall only deal with multiplication because addition poses no problems.

• Consider the subset R of H which consists of elements of the form (λ, 0). You can check that (λ, 0)(µ, 0) = (λµ, 0). Thus R mimics the real numbers.

• Consider the subset C of H which consists of the elements of the form (λ, ai). You can check that

(λ, ai)(µ, a′i) = (λµ − aa′, (λa′ + µa)i).

In particular, (0, i)(0, i) = (−1, 0). Thus C mimics the set of complex numbers.

• Consider the subset E of H which consists of elements of the form (0, a). You can check that

(0, a)(0, a′) = (−a · a′, a × a′).

Thus E mimics vectors, the inner product and the vector product.
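The three checks above can be tried out in Python. This is a minimal sketch: the function name `qmul` is hypothetical, elements of H are stored as pairs consisting of a number and a tuple of three co-ordinates, and the sample values are arbitrary.

```python
def qmul(p, q):
    """Product (lam, a)(mu, a') = (lam*mu - a.a', lam*a' + mu*a + a x a')."""
    lam, a = p
    mu, b = q
    inner = sum(x * y for x, y in zip(a, b))
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    vec = tuple(lam * y + mu * x + w for x, y, w in zip(a, b, cross))
    return (lam * mu - inner, vec)

zero = (0, 0, 0)
i = (1, 0, 0)

real_product = qmul((2, zero), (3, zero))                 # (lam, 0)(mu, 0)
i_squared = qmul((0, i), (0, i))                          # (0, i)(0, i)
complex_product = qmul((1, (2, 0, 0)), (3, (4, 0, 0)))    # mimics (1 + 2i)(3 + 4i)
vector_product = qmul((0, (1, 2, 3)), (0, (4, 5, 6)))     # (0, a)(0, a')
```

The third product returns (−5, (10, 0, 0)), matching (1 + 2i)(3 + 4i) = −5 + 10i, and the fourth returns the negative of the inner product together with the vector product.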

The set H with the above operations of addition and multiplication is the set of quaternions. This structure pulls together most of the important elements of this course: complex numbers, vectors and matrices.