MATHEMATICS Algebra, geometry, combinatorics
Dr Mark V Lawson
November 17, 2014

Contents

Preface
Introduction

1 The nature of mathematics
  1.1 The scope of mathematics
  1.2 Pure versus applied mathematics
  1.3 The antiquity of mathematics
  1.4 The modernity of mathematics
  1.5 The legacy of the Greeks
  1.6 The legacy of the Romans
  1.7 What they didn’t tell you in school
  1.8 Further reading and links

2 Proofs
  2.1 How do we know what we think is true is true?
  2.2 Three fundamental assumptions of logic
  2.3 Examples of proofs
    2.3.1 Proof 1
    2.3.2 Proof 2
    2.3.3 Proof 3
    2.3.4 Proof 4
    2.3.5 Proof 5
  2.4 Axioms
  2.5 Mathematics and the real world
  2.6 Proving something false
  2.7 Key points
  2.8 Mathematical creativity
  2.9 Set theory: the language of mathematics
  2.10 Proof by induction

3 High-school algebra revisited
  3.1 The rules of the game
    3.1.1 The axioms
    3.1.2 Indices
    3.1.3 Sigma notation
    3.1.4 Infinite sums
  3.2 Solving quadratic equations
  3.3 *Order
  3.4 *The real numbers

4 Number theory
  4.1 The remainder theorem
  4.2 Greatest common divisors
  4.3 The fundamental theorem of arithmetic
  4.4 *Modular arithmetic
    4.4.1 Congruences
    4.4.2 Wilson’s theorem
  4.5 *Continued fractions
    4.5.1 Fractions of fractions
    4.5.2 Rabbits and pentagons

5 Complex numbers
  5.1 Complex number arithmetic
  5.2 The fundamental theorem of algebra
    5.2.1 The remainder theorem
    5.2.2 Roots of polynomials
    5.2.3 The fundamental theorem of algebra
  5.3 Complex number geometry
    5.3.1 sin and cos
    5.3.2 The complex plane
    5.3.3 Arbitrary roots of complex numbers
    5.3.4 Euler’s formula
  5.4 *Making sense of complex numbers
  5.5 *Radical solutions
    5.5.1 Cubic equations
    5.5.2 Quartic equations
    5.5.3 Symmetries and particles
  5.6 *Gaussian integers and factorizing primes

6 *Rational functions
  6.1 Numerical partial fractions
  6.2 Analogies
  6.3 Partial fractions
  6.4 Integrating rational functions

7 Matrices I: linear equations
  7.1 Matrix arithmetic
    7.1.1 Basic matrix definitions
    7.1.2 Addition, subtraction, scalar multiplication and the transpose
    7.1.3 Matrix multiplication
    7.1.4 Special matrices
    7.1.5 Linear equations
    7.1.6 Conics and quadrics
    7.1.7 Graphs
  7.2 Matrix algebra
    7.2.1 Properties of matrix addition
    7.2.2 Properties of matrix multiplication
    7.2.3 Properties of scalar multiplication
    7.2.4 Properties of the transpose
    7.2.5 Some proofs
  7.3 Solving systems of linear equations
    7.3.1 Some theory
    7.3.2 Gaussian elimination
  7.4 Blankinship’s algorithm

8 Matrices II: inverses
  8.1 What is an inverse?
  8.2 Determinants
  8.3 When is a matrix invertible?
  8.4 Computing inverses
  8.5 The Cayley-Hamilton theorem
  8.6 Determinants redux
  8.7 *Complex numbers via matrices

9 Vectors
  9.1 Vectors geometrically
    9.1.1 Addition and scalar multiplication of vectors
    9.1.2 Inner products
    9.1.3 Vector products
  9.2 Vectors algebraically
    9.2.1 The geometric meaning of determinants
  9.3 Geometry with vectors
    9.3.1 Position vectors
    9.3.2 Linear combinations
    9.3.3 Lines
    9.3.4 Planes
    9.3.5 The geometric meaning of linear equations
  9.4 *Quaternions

10 Combinatorics
  10.1 More set theory
    10.1.1 Operations on sets
    10.1.2 Partitions
  10.2 Ways of counting
    10.2.1 Counting principles
    10.2.2 The power set
    10.2.3 Counting arrangements: permutations
    10.2.4 Counting choices: combinations
    10.2.5 Examples of counting
  10.3 The binomial theorem
  10.4 *Infinite numbers

Preface
Mathematics is the single most important cultural innovation after language. But if your recollections of school mathematics don’t go much beyond solving quadratic equations, then you would be forgiven for thinking this a wild claim. In fact, the modern world would be impossible without mathematics. I don’t mean just more difficult and inconvenient; I mean impossible. However, the mathematics that makes the world go round is hidden, usually embedded in the programs that turn inert silicon into useful technology. But the usefulness of mathematics is not the only reason to study it. Mathematics is also a man-made universe that is endlessly fascinating. Where else, for example, could such ideas as being infinitely small or infinitely large be contemplated and used?

The aim of this book is essentially a practical one: to provide a bridge between school and university mathematics, and a firm foundation for further study. But to do this, I have to talk about mathematics as well as do mathematics. The talking-about should help you with the doing; it is not supposed to be just waffle. This book is mainly self-contained but where there are connections with calculus I have mentioned them. Mathematics does not divide into water-tight compartments. For example, some of the deepest theorems about the prime numbers require the tools of calculus applied to complex numbers, that is complex analysis, for their proof. Purity of method is artificial and misleading.

Maths is difficult and ’twas always so. The commentator Proclus in the fifth century records a story about the mathematician Euclid. He was asked by the ruler of Egypt, Ptolemy, if there wasn’t some easier way of learning maths than through Euclid’s big book on geometry, known as the Elements. Euclid’s reply was correct in every respect but didn’t contribute to the popularity of mathematicians. There was, he said, no royal road to geometry. In other words: no short-cuts, not even for god-kings. Despite that, I hope this book will make the road a little easier.

Introduction: what are algebra, geometry and combinatorics?
Algebra
Algebra started as the study of equations. The simplest kinds of equations are ones like $3x - 1 = 0$ where there is only one unknown $x$ and that unknown occurs to the power 1. This means we have $x$ alone and not, say, $x^{1000}$. It is easy to solve this specific equation. Add 1 to both sides to get

\[ 3x = 1 \]

and then divide both sides by 3 to get

\[ x = \frac{1}{3}. \]

This is the solution to my original equation and, to make sure, we check our answer by calculating

\[ 3 \cdot \frac{1}{3} - 1 \]

and observing that we really do get 0 as required. Even this simple example raises an important point: to carry out these calculations, I had to know what rules the numbers and symbols obeyed. You probably applied these rules unconsciously, but in this book it will be important to know explicitly what they are. The method used for the specific example above can be applied to any equation of the form

\[ ax + b = 0 \]

as long as $a \neq 0$. Here $a, b$ are specific numbers, probably real numbers, and $x$ is the real number I am trying to find. This equation is the most general example of a linear equation in one unknown. If $x$ occurs to the power 2 then we get

\[ ax^2 + bx + c = 0 \]

where $a \neq 0$. This is an example of a quadratic equation in one unknown. You will have learnt a formula to solve such equations. But there is no reason to stop at 2. If $x$ occurs to the power 3 we get a cubic equation in one unknown

\[ ax^3 + bx^2 + cx + d = 0 \]

where $a \neq 0$. Solving such equations is much harder than solving quadratics but there is also an algebraic formula for the roots. But there is no reason to stop at cubics. We could look at equations in which $x$ occurs to the power 4, quartics, and once again there is a formula for finding the roots. The highest power of $x$ that occurs in such an equation is called its degree. These results might lead you to expect that there are always algebraic formulae for finding the roots of any polynomial equation whatever its degree. There aren’t. For equations of degree 5, the quintics, and more, there are no algebraic formulae which enable you to solve the equations. I don’t mean that no formulae have yet been discovered, I mean that someone has proved that such a formula is impossible, that someone being the young French mathematician Évariste Galois (1811–1832), the James Dean of mathematics. Galois’s work meant the end of the view that algebra was about finding formulae to solve equations. We shall not study Galois’s work in this book but it has had a huge impact on algebra. It is one of the reasons why the algebra you study later in your university careers will look very different from the algebra you studied at school. In fact, one of my goals in writing this book is to help you navigate this transition.

I have talked about solving equations where there is one unknown but there is no reason to stop there. We can also study equations where there are any finite number of unknowns and those unknowns occur to any powers. The best place to start is where we have any number of unknowns but each unknown can occur only to the first power and no products of unknowns are allowed. This means we are studying linear equations like

\[ x + 2y + 3z = 4. \]
Our goal is to find all the values of x, y and z that satisfy this equation. Thus the solutions are ordered triples (x, y, z). For example, both (0, 2, 0) and (2, 1, 0) are solutions whereas (1, 1, 1) is not a solution. It is unusual to have just one linear equation to solve. Usually we have two or more such as
x + 2y + 3z = 4 and x + y + z = 0.
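To make "satisfying an equation" concrete, here is a minimal Python sketch that tests candidate triples against the equations above. The function names and the small search range are illustrative assumptions, not part of the book; the search only looks at whole numbers, whereas the equations themselves allow any real values.

```python
from itertools import product


def satisfies_first(x, y, z):
    """True when (x, y, z) solves x + 2y + 3z = 4."""
    return x + 2 * y + 3 * z == 4


def satisfies_second(x, y, z):
    """True when (x, y, z) solves x + y + z = 0."""
    return x + y + z == 0


def common_integer_solutions(bound):
    """Whole-number triples with entries in [-bound, bound] solving both equations."""
    values = range(-bound, bound + 1)
    return [t for t in product(values, repeat=3)
            if satisfies_first(*t) and satisfies_second(*t)]


if __name__ == "__main__":
    # The three triples mentioned in the text for the single equation.
    for triple in [(0, 2, 0), (2, 1, 0), (1, 1, 1)]:
        print(triple, satisfies_first(*triple))
    # Triples in a small box that satisfy both equations at once.
    print(common_integer_solutions(3))
```

A brute-force search like this is only a sanity check: it cannot find every solution, because there are infinitely many.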
We then need to find all the triples (x, y, z) that satisfy both equations simultaneously. In fact, as you should check, all the triples
\[ (\lambda - 4,\; 4 - 2\lambda,\; \lambda) \]

where $\lambda$ is any number satisfy both equations. For this reason, we often speak about simultaneous linear equations. It turns out that solving systems of linear equations never becomes difficult however many unknowns there are. The modern way of studying systems of linear equations uses matrix theory.

That leaves studying equations where there are at least 2 unknowns and where there are no constraints on the powers of the unknowns and the extent to which they may be multiplied together. This is much more complicated. If you only allow squares such as $x^2$ or products of at most two unknowns, such as $xy$, then there are relatively simple methods for solving them. But, even here, strange things happen. For example, the solutions to

\[ x^2 + y^2 = 1 \]

can be written $(x, y) = (\sin\theta, \cos\theta)$. If you allow cubes or products of more than two unknowns then you enter the world of subjects like algebraic geometry and even connect with current research. In this book, I shall introduce you to the theory of polynomial equations and also to the theory of linear equations. I shall also show you how to solve equations that look like this

\[ ax^2 + bxy + cy^2 + dx + ey + f = 0. \]
So far, I have been talking about the algebra of numbers. But I shall also introduce you to the algebra of matrices, and the algebra of vectors, and the algebra of subsets of a set, amongst others. In fact, I think the first shock on encountering university mathematics can be summed up in the following statement.
There is not one algebra, but many different algebras, each designed for different purposes.
These different algebras are governed by different sets of rules. For this reason, it becomes crucial in university mathematics to make those rules explicit. In this book, the algebra you studied at school I often call high-school algebra so we know what we are talking about.

In my description of solving equations, I have left to one side something that probably seemed obvious: the nature of the solutions. These solutions are of course numbers but what do we mean by ‘numbers’? You might think that a number is a number but in mathematics this concept turns out to be much more interesting than it might first appear. The everyday idea of a number is essentially that of a real number. Informally, these are the numbers that can be expressed as positive or negative decimals, with possibly an infinite number of digits after the decimal point such as

\[ \pi = 3.14159265358\ldots \]

where the dots indicate that this can be continued forever. Whilst such numbers are sufficient to solve linear equations in one unknown, they are not enough to solve quadratics, cubics, quartics etc. These require the introduction of complex numbers which involve such apparent ineffabilities as the square root of minus one. Because such numbers don’t occur in everyday life, there is a temptation to view them as somehow artificial or of purely theoretical interest. This is wrong with a capital w. All numbers are artificial, in that they are artefacts of our imaginations that help us to understand the world. Although you can see examples of two things you cannot see the number two. It is an idea, an abstraction. As for being of only theoretical interest, it is worth noting that quantum mechanics, the theory that explains the behaviour of atoms and their constituents, uses complex numbers in an essential way. In fact, for mathematicians the word ‘number’ usually means ‘complex number’ and mathematics is unthinkable without them.

But this is not the end of our excavations of what we mean by the word ‘number’. There are occasions when we want to restrict the solutions: we might want whole number solutions or solutions as fractions. It turns out that the usual high-school methods for solving equations don’t work in these cases. For example, consider the equation
2x + 4y = 3.
To find the real or complex solutions, we let x be any real or complex value and then we can solve the equation to work out the corresponding value of y. But suppose that we are only interested in whole number solutions? In fact, there are none. You can see why by noting that the left-hand side of the equation is exactly divisible by 2, whereas the right-hand side isn’t. When we are interested in solving equations, of whatever type, by means of whole numbers or fractions we say that we are studying Diophantine equations. The name comes from Diophantus of Alexandria who flourished around 250 CE, and who studied such equations in his book Arithmetica. It is ironic that solving Diophantine equations is often much harder than solving equations using real or complex numbers.
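The divisibility argument above can be sketched in a few lines of Python. The sketch below is an illustration only, and its function names are hypothetical. It relies on a standard fact that goes slightly beyond what the passage states: an equation ax + by = c has whole-number solutions exactly when the greatest common divisor of a and b divides c. A small brute-force search is included as an independent check.

```python
from math import gcd


def has_integer_solution(a, b, c):
    """Decide whether a*x + b*y = c has whole-number solutions x, y.

    Uses the standard criterion that gcd(a, b) must divide c
    (assumes a and b are not both zero).
    """
    return c % gcd(a, b) == 0


def brute_force_solutions(a, b, c, bound):
    """List integer pairs (x, y) with |x|, |y| <= bound solving a*x + b*y = c."""
    return [(x, y)
            for x in range(-bound, bound + 1)
            for y in range(-bound, bound + 1)
            if a * x + b * y == c]


if __name__ == "__main__":
    # 2x + 4y = 3: the left-hand side is always even, the right-hand side is odd.
    print(has_integer_solution(2, 4, 3))
    print(brute_force_solutions(2, 4, 3, 50))
    # Changing the right-hand side to an even number gives solutions.
    print(has_integer_solution(2, 4, 6))
```

The search can only ever confirm that no solutions exist in the range it covers; it is the divisibility argument that rules them out everywhere.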
Geometry
If algebra is about manipulating symbols, then geometry is about pictures. The Ancient Greeks developed geometry to a very high level. Some of their achievements are recorded in Euclid’s book the Elements which I shall have more to say about later. It developed the whole of what became known as Euclidean geometry on the basis of a few rules known as axioms. This geometry gives every impression of being a faithful mathematical version of the geometry of actual space and for that reason you might expect that, unlike algebra, there is only one geometry and that’s that. In fact, it was discovered in the nineteenth century that there are other mathematical geometries such as spherical geometry and hyperbolic geometry. In the twentieth century, it became apparent that even the space we inhabit was much more complex than it appeared. First came the four dimensional geometry of special relativity and then the curved space-time of general relativity. Modern particle physics suggests that there may be many more dimensions in real space than we can see. So, in fact, we have the following.
There is not one geometry, but many different geometries, each designed for different purposes.
In this book, I will only talk about three-dimensional Euclidean geometry, but this is the gateway to all these other geometries.
This, however, is not the end of the story. In fact, any book about algebra must also be about geometry. The two are indivisible but it was not always like that. Unlike geometry which began with a sort of Big Bang in Ancient Greece, algebra crystallized much more slowly over time and in different places. There is even some algebra, disguised, in the Elements. In the 17th century, René Descartes discovered the first connection between algebra and geometry which will be completely familiar to you. For example, $x^2 + y^2 = 1$ is an algebraic equation, but it also describes something geometric: a circle of unit radius centred on the origin. This connection between algebra and geometry will play an important role in our study of linear equations and vectors. But it is just a beginning.
If you are studying an algebra look for an accompanying geometry, and if you are studying a geometry find a companion algebra.
This is quite a fancy way of saying things, but it boils down to the fact that manipulating symbols is often helped by drawing pictures, and sometimes the pictures are too complex so it is helpful to replace them with symbols. It’s not a one-way street.

I want to give you some idea of why the connection between algebra and geometry is so significant. Let me start with a problem that looks completely algebraic.

Problem: find all whole numbers $a, b, c$ that satisfy the equation $a^2 + b^2 = c^2$.

I’ll write solutions that satisfy this equation as $(a, b, c)$. Such numbers are called Pythagorean triples. Thus $(0, 0, 0)$ is a solution and so is $(3, 4, 5)$, and I can put in minus signs since when squared they disappear so $(-3, 4, -5)$ is a solution. In addition, if $(a, b, c)$ is a solution so is $(\lambda a, \lambda b, \lambda c)$ where $\lambda$ is any whole number. I shall now show that this problem is equivalent to one in geometry. Suppose first that $a^2 + b^2 = c^2$. We exclude the case where $c = 0$ since then $a = 0$ and $b = 0$. We may therefore divide both sides by $c^2$ and get

\[ \left(\frac{a}{c}\right)^2 + \left(\frac{b}{c}\right)^2 = 1. \]

Recall that a rational number is a real number that can be written in the form $\frac{u}{v}$ where $u$ and $v$ are whole numbers and $v \neq 0$. It follows that

\[ (x, y) = \left(\frac{a}{c}, \frac{b}{c}\right) \]

is a rational point on the unit circle; that is, a point with rational co-ordinates. On the other hand, if

\[ (x, y) = \left(\frac{m}{n}, \frac{p}{q}\right) \]

is a rational point on the unit circle then

\[ \frac{(mq)^2}{(nq)^2} + \frac{(np)^2}{(nq)^2} = 1. \]

Thus $(mq, pn, nq)$ is a Pythagorean triple. We may therefore interpret our algebraic question as a geometric one: to find all Pythagorean triples, find all those points on the unit circle with centre the origin whose x and y co-ordinates are both rational. In fact, this can be used to get a very nice solution to the original algebraic problem as we shall show later.
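The two-way translation between Pythagorean triples and rational points on the unit circle can be tried out with exact rational arithmetic. The following is a small sketch using Python's standard `fractions` module; the function names are illustrative assumptions rather than anything defined in the book.

```python
from fractions import Fraction


def triple_to_point(a, b, c):
    """Map a Pythagorean triple (a, b, c) with c != 0 to the point (a/c, b/c)."""
    return Fraction(a, c), Fraction(b, c)


def point_to_triple(x, y):
    """Map a rational point (m/n, p/q) to the triple (mq, pn, nq)."""
    m, n = x.numerator, x.denominator
    p, q = y.numerator, y.denominator
    return m * q, p * n, n * q


def is_pythagorean(a, b, c):
    """True when a^2 + b^2 = c^2."""
    return a * a + b * b == c * c


def on_unit_circle(x, y):
    """True when x^2 + y^2 = 1, checked exactly with fractions."""
    return x * x + y * y == 1


if __name__ == "__main__":
    x, y = triple_to_point(3, 4, 5)
    print(x, y, on_unit_circle(x, y))
    triple = point_to_triple(x, y)
    print(triple, is_pythagorean(*triple))
```

Note that the round trip does not return the triple it started from: (3, 4, 5) comes back as a whole-number multiple of itself, which is still a Pythagorean triple, as the text observes.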
Combinatorics
The term ‘combinatorics’ may not be familiar though the sorts of questions it deals with are. Combinatorics is the branch of mathematics that deals with arrangements and the counting of arrangements. The fact that it deals in counting makes it sound like this should be an easy subject. In fact, it is often very difficult. For example, counting lies behind probability theory, a subject that can often defy intuition. Let me give you a simple example. In a class of, say, 25 students, how likely do you think it is that two students will share the same birthday? By this I mean, the same date and month, though not year. Unless you’ve seen this problem before, I think the instinct is to say ‘not very’. This is because we imagine in our mind’s eye those 25 students to be arranged across 365 days without any pair of students landing on the same date. In fact the answer, which you can calculate using the methods of this book, is just over a half. In other words, there is the same chance of two students sharing the same birthday as there is of tossing a coin and getting heads. This little problem is often known as the birthday paradox. It is a good example of where maths can be used to correct our faulty intuition. But this is really a counting problem. To get the right answers to such problems, you need to think about what you are counting in the right way.
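For readers who want to see the birthday figure for themselves, here is a short Python sketch of the calculation. It rests on simplifying assumptions that the passage does not spell out: every one of 365 days is equally likely, birthdays are independent, and leap years are ignored. The function name is hypothetical.

```python
def shared_birthday_probability(students, days=365):
    """Probability that at least two of `students` people share a birthday.

    Computed as 1 minus the probability that all birthdays are different,
    assuming each of `days` days is equally likely.
    """
    all_different = 1.0
    for i in range(students):
        # The (i + 1)-th student must avoid the i days already taken.
        all_different *= (days - i) / days
    return 1.0 - all_different


if __name__ == "__main__":
    print(shared_birthday_probability(25))
```

Under these assumptions the value for 25 students comes out at roughly 0.57, consistent with ‘just over a half’.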
Chapter 1

The nature of mathematics
This chapter is a guide to the mathematics described in this book.
1.1 The scope of mathematics
The most common replies to the question ‘what is mathematics?’ addressed to a non-mathematician are usually the depressing ‘arithmetic’ or ‘accountancy’. Ask them what they remember about school maths and they might be able to dredge up some more-or-less arcane words with challenging spellings: hypotenuse, isosceles, parallelogram. It either sounds a bit boring or a bit weird, but in any event is so obviously completely removed from real life that it can safely be ignored. Mathematics, therefore, has an image problem. I think part of the reason for this is the kind of maths that is taught in schools and the way it is taught. School mathematics suffers by being based on the narrow syllabuses prescribed by examining boards under political direction. As a result, it is more by luck than design if anyone at school gets an idea of what maths is actually about. In addition, teaching too often means teaching to the exam, which means working through past exam papers and learning tricks.[1]

[1] I say teaching and not teachers. My criticism is directed at policy, not those who are forced to carry out that policy, often under enormous pressures.

Let me begin by showing you just how vast a subject mathematics really is. The official Mathematics Subject Classification currently divides mathematics into 64 broad areas in any one of which a mathematician could work their entire professional life. You can see what they are in the box. By the way, the missing numbers are deliberate and not because I cannot count.
Mathematics Subject Classification 2010 (adapted)

00. General 01. History and biography 03. Mathematical logic and foundations 05. Combinatorics 06. Order theory 08. General algebraic systems 11. Number theory 12. Field theory 13. Commutative rings 14. Algebraic geometry 15. Linear and multilinear algebra 16. Associative rings 17. Non-associative rings 18. Category theory 19. K-theory 20. Group theory and generalizations 22. Topological groups 26. Real functions 28. Measure and integration 30. Complex functions 31. Potential theory 32. Several complex variables 33. Special functions 34. Ordinary differential equations 35. Partial differential equations 37. Dynamical systems 39. Difference equations 40. Sequences, series, summability 41. Approximations and expansions 42. Harmonic analysis 43. Abstract harmonic analysis 44. Integral transforms 45. Integral equations 46. Functional analysis 47. Operator theory 49. Calculus of variations 51. Geometry 52. Convex geometry and discrete geometry 53. Differential geometry 54. General topology 55. Algebraic topology 57. Manifolds 58. Global analysis 60. Probability theory 62. Statistics 65. Numerical analysis 68. Computer science 70. Mechanics 74. Mechanics of deformable solids 76. Fluid mechanics 78. Optics 80. Classical thermodynamics 81. Quantum theory 82. Statistical mechanics 83. Relativity 85. Astronomy and astrophysics 86. Geophysics 90. Operations research 91. Game theory 92. Biology 93. Systems theory 94. Information and communication 97. Mathematics education
Each of these broad areas is then subdivided into a large number of smaller areas, any one of which could be the subject of a PhD thesis. This is a little overwhelming, so to make it more manageable it can be summarized, very roughly, into the following ten areas:

Algebra
Number theory
Calculus and analysis
Probability and statistics
Combinatorics
Differential equations
Geometry and topology
Mathematical physics
Logic
Computing
Most undergraduate courses will fit under one of these headings. But it is important to remember that mathematics is one subject — dividing it up into smaller areas is done for convenience only. When solving a problem any and all of the above areas might be needed.
1.2 Pure versus applied mathematics
Sometimes a distinction is drawn between pure and applied mathematics. Pure maths is supposed to be maths done for its own sake with no thought to applications, whereas applied maths is maths used to solve some, presumably practical, problem. I think there is often an implicit moralistic undertone to this distinction with pure maths being viewed as perhaps rather self-indulgent and decorative, and applied maths as socially responsible grown-up maths that pays its way. Politicians prefer applied maths because they think it will make money. The following quote from the English mathematician G. H. Hardy (1877–1947) is often offered as evidence for this distinction:
“I have never done anything ‘useful’. No discovery of mine has made, or is likely to make, directly or indirectly, for good or ill, the least difference to the amenity of the world.”
Hardy was a truly great mathematician and a decent human being. As his dates show, he was of the generation that witnessed the First World War where science and technology were applied to the business of wholesale slaughter. His views on maths are therefore a not unnatural reaction on the part of someone who taught young people who then went to war never to return. Maths for him was perhaps a sanctuary.[2] In reality, the terms pure and applied are extremely fuzzy. A mathematician might start work on solving a real-life problem and then be led to develop new pure mathematics, or start in pure maths and develop an application. Calculus, for example, developed mainly out of the need to solve problems in physics and then was applied to pure maths. Complex numbers couldn’t have been more pure, introduced to provide the missing roots to polynomial equations, but are now the basis of quantum mechanics. In reality, there is just one mathematics.

[2] There was a similar reaction at the end of the Second World War amongst physicists who turned instead to biology as an alternative to building weapons.
The Banach-Tarski Paradox

The glory of mathematics is often to be found in its sheer weirdness. For a universe founded on logic, it can lead to some pretty confounding conclusions. For example, a solid the size of a pea may be cut into a finite number of pieces which may then be reassembled in such a way as to form another solid the size of the sun. This is known as the Banach-Tarski Paradox (1924). There’s no trickery involved here and no sleight of hand. This is clearly pure maths (give me a real pea and whatever I do it will remain resolutely pea-sized) but the ideas it uses involve such fundamental and seemingly straightforward notions as length, area and volume that have important applications in applied maths.
1.3 The antiquity of mathematics
The history of chemistry or astronomy is not hugely relevant, however interesting it may be, to modern theories of chemistry or astronomy. A few hundred years ago, chemistry was alchemy and astronomy was astrology: modern chemists are not searching for the philosopher’s stone and astronomers don’t cast horoscopes. Alchemists and astrologers are often the forebears they would prefer to forget.[3]

[3] I am exaggerating a little here for rhetorical purposes. In fact, much fine work was carried out under the guise of alchemy and astrology.

Maths is different, since what was mathematically true hundreds of years ago remains true today. Here is a famous example. Plimpton 322 is a small clay tablet, dating to about 1800 BCE, kept in the George A. Plimpton Collection at Columbia University. Impressed on the tablet are a number of columns of numbers written in cuneiform. The numbers are written not in base 10 but in base 60, the base that still lies behind the way we tell the time and measure angles. The meaning and purpose of this clay tablet is much disputed. But the second and third columns consist of the following numbers, where I have given the usual corrected numbers. I have given the first seven lines of the table; there are fifteen in the original.
     B        C
1    119      169
2    3367     4825
3    4601     6649
4    12709    18541
5    65       97
6    319      481
7    2291     3541
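The seven rows above hide a pattern that is easy to test by machine: for each pair (B, C), the difference C² − B² turns out to be a perfect square D², so that B, D and C form a Pythagorean triple. Here is a minimal Python sketch of that check. The names `ROWS` and `third_side` are illustrative, and `math.isqrt` requires Python 3.8 or later.

```python
from math import isqrt

# The (B, C) pairs from the first seven rows of the table.
ROWS = [
    (119, 169),
    (3367, 4825),
    (4601, 6649),
    (12709, 18541),
    (65, 97),
    (319, 481),
    (2291, 3541),
]


def third_side(b, c):
    """Return D with D*D == c*c - b*b, or None if that difference is not a perfect square."""
    difference = c * c - b * b
    d = isqrt(difference)
    return d if d * d == difference else None


if __name__ == "__main__":
    for b, c in ROWS:
        print(b, third_side(b, c), c)
```

The first row, for instance, gives D = 120, since 169² − 119² = 14400 = 120².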
If you calculate $C^2 - B^2$ you will get a perfect square $D^2$. Thus $(B, D, C)$ is a Pythagorean triple. How such large Pythagorean triples were computed is a mystery. This antiquity, combined with the fact that maths is a cumulative subject, meaning that you have to learn X before you can learn Y, has the unfortunate effect that most of the mathematics you learnt at school was invented before 1800. Here is a very rough chronology.
BCE                                      CE
2000  Solving quadratics                 1550  Solving cubics and quartics
400   Existence of irrational numbers    1590  Logarithms
300   Euclidean geometry                 1630  Analytic geometry
200   Conics                             1675  Calculus
                                         1700  Probability
                                         1795  Complex numbers
Only matrices (1850) and vectors (1880) were introduced more recently. However, if you think of all the developments in physics since 1800, such as black holes, the big bang theory, parallel universes and quantum theory, then you might suspect that there have also been big developments in mathematics. There have, but you would be forgiven for not knowing about them because they are not promoted in the media or taught in school. I should add that like any other field of human endeavour, it is of course true that mathematical ideas go in and out of fashion, but crucially they don’t become wrong with time.

1.4 The modernity of mathematics
The fact that what’s taught in schools doesn’t seem to change much from generation to generation leads to one of the biggest misconceptions about mathematics: that it has already all been discovered. To try and bring you up to date, I am going to say a little about three mathematicians and their work: Alan Turing (1912–1954), Sir Andrew Wiles (b. 1953), and Terence Tao (b. 1975). I have chosen them to illustrate some additional points I want to make about maths.
Alan Turing
Alan Turing is the only mathematician I know who has had a West End play written about his life: the 1986 play Breaking the Code by Hugh Whitemore. Turing is best known as one of the leading members of Bletchley Park during the Second World War, for his role in the British development of computers during and after the War, and for the ultimately tragic nature of his early death. Here I want to return to Turing the mathematician. As a graduate student, he wrote a paper in 1936 entitled On computable numbers with an application to the Entscheidungsproblem, where the long German word means decision problem and refers to a specific question in mathematical logic. It was as a result of solving this problem that Turing was led to formulate a precise mathematical blueprint for a computer, now called a Turing machine in his honour. This is the most extreme example I know of a problem in pure maths leading to new applied maths; in fact, it led to the whole field of computer science and the information age we now inhabit. Amongst computer scientists, Turing is regarded as the father of computer science. So, mathematicians invented the modern world.
Andrew Wiles
Mathematicians operate on a completely different timescale from everyone else. I have already talked about Pythagorean triples, those whole numbers (x, y, z) that satisfy the equation x2 + y2 = z2. Here’s an idle thought. What happens if we try to find whole number solutions to x3+y3 = z3 or x4+y4 = z4 or more generally xn + yn = zn where n ≥ 3. Let’s exclude the trivial case where some of the numbers x, y or z are 0. So, here is the question: for n ≥ 3 find all whole number solutions to xn + yn = zn where xyz 6= 0. Back in the 17th century, Pierre de Fermat (1601?–1665) wrote in the margin of a book, 1.4. THE MODERNITY OF MATHEMATICS 7 the Arithmetica of Diophantus, that he had found a proof that there were no such solutions but that sadly there wasn’t enough room for him to record it. This became known as Fermat’s Last Theorem. In fact, since Fermat’s supposed proof was never found, it was really a conjecture. More to the point, it is highly unlikely that he ever had a proof since in the subsequent centuries many attempts were made to prove this result, all in vain, although substantial progress was made. This problem became one of mathematics’ many Mount Everests: the peak that everyone wanted to scale. Finally, on Monday 19th September, 1994, sitting at his desk, Andrew Wiles, building on over three centuries of work, and haunted by his premature announcement of his success the previous year, had a moment of inspiration as the following quote from the Daily Telegraph dated 3rd May 1997 reveals “Suddenly, totally unexpectedly, I had this incredible revelation. It was so indescribably beautiful, it was so simple and so elegant.” As a result Fermat’s Conjecture really is a theorem, but the proof required travelling through what can only be described as mathematical hyperspace. Wiles’s reaction to his discovery is also a glimpse of the profound intellectual excitement that engages the emotions as well as the intellect when doing mathematics4.
Terence Tao
Tao won a Fields Medal in 2006. This is a mathematical honour comparable with a Nobel Prize though with the added twist that you have to be under 40 to get one. You can read his thoughts at his blog, as well as use it to find all manner of interesting things. So, what sorts of things does he do? Here is one example that is remarkably easy to explain though the proof is formidable. You know what primes are and, in any event, we shall talk about them later. They can be regarded as the atoms of numbers and their properties have inspired hard questions and deep results. One of the things that interests mathematicians is the sorts of patterns that can be found in primes. An arithmetic progression is a sequence of numbers of the form a + dk, where a and d are fixed numbers and k = 0, 1, 2, .... Consider the arithmetic progression 3 + 2k. Observe that for the consecutive values k = 0, 1, 2, the numbers 3, 5, 7 which arise are all prime. But when k = 3 we get 9 which is not prime. Our little example is an instance of an arithmetic progression with 3 terms all prime. Here is one with 10 terms: 199 + 210k where k = 0, 1, ..., 9. In 2004, Tao and his colleague Ben Green proved that there are arithmetic progressions of arbitrary length all of whose terms are prime. In other words, for any number n there is an arithmetic progression whose first n terms are all prime.

[4] There is a BBC documentary about Andrew Wiles, directed by Simon Singh, made for the BBC’s Horizon series. It is an exemplary demonstration of how to portray complex mathematics in an accessible way and cannot be too highly recommended.
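The two progressions mentioned in the Terence Tao example, 3 + 2k and 199 + 210k, are small enough to check by machine. What follows is only a sketch: the helper names `is_prime` and `progression` are my own, and trial division is chosen for simplicity rather than speed. Checking a particular progression like this has, of course, nothing to do with how Green and Tao proved their theorem.

```python
def is_prime(n: int) -> bool:
    """Trial division; perfectly adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def progression(a: int, d: int, length: int) -> list[int]:
    """The first `length` terms a + d*k, for k = 0, 1, ..., length - 1."""
    return [a + d * k for k in range(length)]


short = progression(3, 2, 4)      # 3, 5, 7, 9
long = progression(199, 210, 10)  # the ten-term example from the text

print(short, [is_prime(t) for t in short])
print(long, all(is_prime(t) for t in long))

# The eleventh term, 199 + 210 * 10 = 2299 = 11 * 11 * 19, is not prime.
print(is_prime(199 + 210 * 10))
```

The last line shows that the ten-term progression cannot be extended by one more term with the same a and d.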
1.5 The legacy of the Greeks
The word ‘mathematics’ is Greek. In fact, many mathematical terms are Greek: lemma, theorem, hypotenuse, orthogonal, polygon, to name just a few. The Greek alphabet is used as a standard part of mathematical notation. The very concept of a mathematical proof is a Greek idea. All of this reflects the fact that Ancient Greece is the single most important historical influence on the development and content of mathematics. By Ancient Greek mathematics, I mean the mathematics developed in the wider Greek world around the Mediterranean in the thousand or more years between roughly 600 BCE and 600 CE. It begins with the work of semi-mythical figures, such as Thales of Miletus and Pythagoras of Samos, and is developed in the books of such mathematicians as Euclid, Archimedes, Apollonius of Perga, Diophantus and Pappus. Of all the Ancient Greek mathematicians the greatest was Archimedes. His work is sophisticated mathematics of the highest order. In particular, he developed methods that are close to those of integral calculus and used them to calculate areas and volumes of complicated curved shapes.
1.6 The legacy of the Romans
For all their aqueducts, roads, baths and maintenance of public order, it has been said of the Romans that their only contribution to mathematics was when Cicero rediscovered the grave of Archimedes and had it restored.[5]

[5] George Simmons, Calculus Gems, McGraw-Hill, Inc., New York, 1992, page 38.

1.7 What they didn’t tell you in school
This book is written to help you make the transition from school maths to university maths. You might well still be in school, or you might have left school fifty years ago; it doesn’t matter. Maths as taught in school and maths as taught at university are very different, and the failure to understand those differences can cause problems. To be successful in university mathematics you have to think in new ways. University mathematics is not just school mathematics with harder sums and fancier notation; it is fundamentally different from what you did at school.
In much of school mathematics, you learn methods for solving specific problems. Often, you just learn formulae.

A method for solving a problem that requires little thought in its application is called an algorithm. Computer programs are the supreme examples of algorithms, and it is certainly true that finding algorithms for solving specific problems is an important part of mathematics, but it is by no means the only part. Problems do not come neatly labelled with the methods needed for their solution. A new problem might be solvable using old methods or it might require you to adapt those methods. On the other hand, you may have to invent completely new methods to solve it. Such new methods require new ideas. In fact, what you might not have appreciated from school mathematics is the important role played in mathematics by ideas. An idea is a tool to help you think.
Mathematics at school is often taught without reasons being given for why the methods work.
This is the fundamental difference between school mathematics and university mathematics. A reason why something works is called a proof. I shall say a lot more about proofs in Chapter 2.
The Millennium Problems

Mathematics is difficult but intellectually rewarding. Just how hard can be gauged by the following. The Millennium Problems is a list of seven outstanding problems posed by the Clay Mathematics Institute in the year 2000. A correct solution to any one of them carries a one million dollar prize. To date, only one has been solved: the Poincaré conjecture, by Grigori Perelman, who was awarded the prize in 2010 but declined to take the money. The point is that no one offers a million dollars for something that is trivial. You can read more about these problems at
http://www.claymath.org/millennium-problems
1.8 Further reading and links
There is a wealth of material about mathematics available on the Web and I would encourage exploration. Here, I will point out some books and links that develop the themes of this chapter. A book that is in tune with the goals of this chapter is
P. Davis, R. Hersh, E. A. Marchisotto, The mathematical experience, Birkhäuser, 2012.
It’s one of those books that you can dip into and you will learn something interesting but, most importantly, it will expand your understanding of what mathematics is, as it did mine. A good source book for the history of mathematics, and again something that can be dipped into, is
C. B. Boyer, U. C. Merzbach, A history of mathematics, Jossey Bass, 3rd Edition, 2011.
The books above are about maths rather than doing maths. Let me now turn to some books that do maths in a readable way. There is a plethora of popular maths books now available, and if you pick up any book by Ian Stewart or Peter Higgins then you will find something interesting (though if the book appears to be rather more about volcanoes than is seemly in a maths book, you have Iain Stewart). Sir (William) Timothy Gowers won a Fields Medal in 1998 and so can be assumed to know what he is talking about.

T. Gowers, Mathematics: A Very Short Introduction, Oxford University Press, 2002.
It is worth checking out his homepage for some interesting links. He also has his own blog which is worth checking out. I think the Web is serving to humanize mathematicians: their ivory towers all have wi-fi. A classic book of this type is
R. Courant, H. Robbins, What is mathematics, OUP, 1996.
This is also an introduction to university-level maths, and it has influenced my thinking on the subject. If you have never looked into Euclid’s book the Elements, then I would recommend you do.[6] There is an online version that you can access via David E. Joyce’s website at Clark University. A handsome printed version, edited by Dana Densmore, has been published by Green Lion Press, Santa Fe, New Mexico. Finally, let me mention the books of Martin Gardner. For a quarter of a century, he wrote a monthly column on recreational mathematics for the Scientific American which inspired amateurs and professionals alike. I would start with

M. Gardner, Hexaflexagons, probability paradoxes, and the Tower of Hanoi: Martin Gardner’s first book of mathematical puzzles and games, CUP, 2002

and follow your interests.

[6] Whenever I refer to Euclid, it will always be to this book. It consists of thirteen chapters, themselves called ‘books’, which are numbered in the Roman fashion I–XIII.

Chapter 2
Proofs
Part of the argument sketch, Monty Python
M = Man looking for an argument A = Arguer
M: An argument isn’t just contradiction.
A: It can be.
M: No it can’t. An argument is a connected series of statements intended to establish a proposition.
A: No it isn’t.
M: Yes it is! It’s not just contradiction.
A: Look, if I argue with you, I must take up a contrary position.
M: Yes, but that’s not just saying ‘No it isn’t.’
A: Yes it is!
M: No it isn’t!
A: Yes it is!
M: Argument is an intellectual process. Contradiction is just the automatic gainsaying of any statement the other person makes.
(short pause)
A: No it isn’t.
The most fundamental difference between school and university mathematics lies in proofs. At school, you were probably told mathematical facts and given recipes that solved particular kinds of problems. But the chances are, you were not given any reasons to back up those facts or explanations as to why those recipes worked. University and professional mathematics is different. Reasons and explanations are essential and are called proofs. They are the essence of mathematics. Mathematical truth, and the notion of proof that supports it, is so different from what we encounter in everyday life that I shall need to begin by setting the scene.
2.1 How do we know what we think is true is true?
Human beings usually believe something first for emotional reasons, and then look for the evidence to back it up. The pitfalls of this are obvious. We shall therefore be interested in reasons that do not involve emotion. To be concrete, how would you verify the following claim: Mount Everest is between 8 and 9 km high?
The appeal to authority
In the past, claims such as this would be resolved by consulting an encyclopedia or atlas whereas today, of course, we would simply go online. If you do this, you will find that a height of about 8.8 km is quoted. For most purposes this would settle things. But it’s important to understand what this entails. We are, in effect, taking someone’s word for it. We assume that whoever posted this information knows what they are talking about. What we are doing, therefore, is appealing to authority. Most of what we take to be true is based on such appeals to authority: parents, teachers, politicians, religiosi, etc. tell us things that they claim to be true and more often than not we believe them. There’s a small element of laziness involved on our part, but it is so convenient. The pitfalls of this are also obvious.
The appeal to experiment
But where did the figure of 8.8 km come from? It wasn’t just plucked from the sky. The height of Mount Everest was first measured as part of the great survey of India undertaken in the nineteenth century. This was carried out by a team of expert surveyors who not only employed extremely precise instruments to take multiple measurements but who also tried to minimize the effect of factors influencing the accuracy of their measurements such as temperature and, amazingly, variations in gravity. Making measurements, and taking great pains over those measurements together with estimates of the error bounds, is such an important part of science that science itself would be impossible without it. Let’s call this the appeal to experiment.
This brings me to how we know statements are true in mathematics. The essential point is the following:
Neither of the above methods for ascertaining truth plays any role whatsoever in determining mathematical truth.
This is so important, I am going to say it again in a different way:
• Results are not true in maths because I say so or because someone important said they were true a long time ago.
• Results are not true in mathematics because I have carried out experiments and I always get the same answer.
• Results are not true in maths ‘just because they are’.
How then can we determine whether something in mathematics is true?
• Results are true in maths only because they have been proved to be true.
• A proof shows that a result is true.
• A proof is something that you yourself can follow and at the end you will see the truth of what has been proved.
• A result that has been proved to be true is called a theorem.
• The appeal to authority and the appeal to experiment are both fallible. The appeal to proof is never fallible. The only truths we know for certain are mathematical truths.
This is heady stuff. So what, then, is a proof? The remainder of this chapter is devoted to an introductory answer to this question.

2.2 Three fundamental assumptions of logic
In order to understand how mathematical proofs work, there are three simple, but fundamental, assumptions you have to understand.
I. Mathematics only deals in statements that are capable of being either true or false.
Mathematics does not deal in statements which are ‘sometimes true’ or ‘mostly false’. There are no approximations to the truth in mathematics and no grey areas. Either a statement is true or a statement is false, though we might not know which. This is quite different from everyday life, where we often say things which contain a grain of truth or where we say things for rhetorical reasons which we don’t entirely mean. Mathematics also doesn’t deal in statements that are neither true nor false like exclamations such as ‘Out damned spot!’ or with questions such as ‘To be or not to be?’.
II. If a statement is true then its negation is false, and if a statement is false then its negation is true.
In natural languages, negating a sentence is achieved in different ways. In English, the negation of ‘It is raining’ is ‘It is not raining’. In French, the negation of ‘Il pleut’ is obtained by wrapping the verb in ‘ne . . . pas’ to get ‘Il ne pleut pas’. To avoid grammatical idiosyncrasies, we can use the formal phrase ‘it is not the case that’ and place it in front of any sentence to negate it. So, ‘It is not the case that it is raining’ is the negation of ‘It is raining’. In some languages, and French is one of them, adding negatives is used for emphasis. This used to be the case in older forms of English and is often the case in informal English. In formal English, we are taught that two negatives make a positive, which is really the rule (II) above borrowed from mathematics, where it does hold. In fact, negating negatives in natural languages is more complex than this. For example, if your partner says they are ‘not unhappy’ then this isn’t quite the same as being ‘happy’ and maybe you need to talk.
III. Mathematics is free of contradictions.
A contradiction is where both a statement and its negation are true. This is impossible by (II) above. This assumption will play a vital role in proofs as we shall see later.
2.3 Examples of proofs
Armed with the three assumptions above, I am going to take you through five proofs of five results, three of them being major theorems. This will enable me to show you examples of proofs but will also illustrate important issues about how proofs, and mathematics, work. Although proofs can be long or short, hard or easy they all tend to follow the same script. First, there will be a statement of what is going to be proved. This usually has the form: if a bunch of things are assumed true then something else is also true. If the things assumed true are lumped together as A, for assumptions, and the thing to be proved true is labelled C, for conclusion, then a statement to be proved usually has the shape ‘if A then C’ or ‘A implies C’ or, in notation, ‘A ⇒ C’. The proof itself should be thought of as a (rational) argument between two protagonists whom we shall call Alice and Bob. We assume that Alice wants to prove C. She can use any of the assumptions A, any previously proved theorems, the rules of logic, which I shall describe as we meet them, and definitions. Bob’s role is to act like an attorney and to demand that Alice justify each claim she makes. Thus Alice cannot just make assertions without justifying them, and she is limited in the sorts of things that count as justifications. At the end of this, Alice can say something like ‘ . . . and so C is proved’ and Bob will be forced to agree.
2.3.1 Proof 1

We shall prove the following statement.
The square of an even number is even, and the square of an odd number is odd.
In fact, this is really two statements: ‘If n is an even number then $n^2$ is even’ and ‘If n is an odd number then $n^2$ is odd.’ Before we can prove them, we need to understand what they are actually saying. The terms odd and even are only used of whole numbers such as
0, 1, 2, 3, 4,...
These numbers are called the natural numbers and they are the first kinds of numbers we learn about as children. Thus we are being asked to prove a statement about natural numbers. The terms ‘odd’ and ‘even’ might seem obvious, but we need to be clear about how they are used in maths. By definition, a natural number n is even if it is exactly divisible by 2, otherwise it is said to be odd. In maths, we usually just say divisible rather than exactly divisible. This definition of divisibility only makes sense when talking about whole numbers. For fractions, for example, it is pointless since one fraction will always divide another fraction. Notice that 0 is an even number because 0 = 2 × 0. In other words, 0 is exactly divisible by 2. However, remember, you cannot divide by 0 but you can certainly divide into 0. You might have been told that a number is even if its last digit is one of the digits 0, 2, 4, 6, 8. In fact, this is a consequence of our definition rather than a definition itself. I shall ask you to prove this result in the exercises. I shall say no more about the definition of even. What about the definition of odd? A number is odd if it is not even. This is not a very useful definition since a number is odd if it fails to be even. We want a more positive characterization. So we shall describe a better one. If you attempt to divide a number by 2 then there are two possibilities: either it goes exactly, in which case the number is even, or it goes so many times plus a remainder of 1, in which case the number is odd. It follows that a better way of defining an odd number n is one that can be written n = 2m + 1 for some natural number m. So, the even numbers are those natural numbers that are divisible by 2, thus the numbers of the form 2n for some n, and the odd numbers are those that leave the remainder 1 when divided by 2, thus the numbers of the form 2n + 1 for some n. Every number is either odd or even but not both. 
There is a moral to be drawn from what I have just done, and I shall state it boldly because of its importance. It may seem obvious but experience shows that it is, in fact, not.

Every time you are asked to prove a statement, you must ensure that you understand what that statement is saying. This means, in particular, checking that you understand what all the words in the statement mean.

The next point is that we are making a claim about all even numbers. If you pick a few even numbers at random and square them then you will find in every case that the result is even but this does not prove our claim. Even if you checked a trillion even numbers and squared them and the results were all even it wouldn’t prove the claim. Maths, remember, is not an experimental science. There are plenty of examples in maths of statements that look true and are true for umpteen cases but are in fact bunkum.

This means that, in effect, we have to prove an infinite number of statements: $0^2$ is even, and $2^2$ is even, and $4^2$ is even, and so on. I cannot therefore prove my claim by picking a specific even number, like 12, and checking that its square is even. This simply verifies one of the infinitely many statements above. As a result, the starting point for my proof cannot be a specific even number. It has to be a general even number. We are now in a position to prove our claims. First, we prove that the square of an even number is even.
1. Let n be an even number. This is the assumption that gets the ball rolling. Notice that n is not a specific even number. We want to prove something for all even numbers so we cannot argue with a specific one.
2. Then n = 2m for some natural number m. Here we are using the definition of what it means to be an even number.
3. Square both sides of the equation in (2) to get $n^2 = 4m^2$. To do this correctly, you need to follow the rules of high-school algebra.

4. Now rewrite this equation as $n^2 = 2(2m^2)$. This uses more basic high-school algebra.

5. Since $2m^2$ is a natural number, it follows that $n^2$ is even using our definition of an even number. This proves our claim.
Second, we prove that the square of an odd number is odd. I’ll provide less commentary than in the previous case.
1. Let n be an odd number.
2. By definition n = 2m + 1 for some natural number m.

3. Square both sides of the equation in (2) to get $n^2 = 4m^2 + 4m + 1$.

4. Now rewrite the equation in (3) as $n^2 = 2(2m^2 + 2m) + 1$.

5. Since $2m^2 + 2m$ is a natural number, it follows that $n^2$ is odd using our definition of an odd number. This proves our claim.
We have therefore proved our two claims. I admit that they are not exciting but just bear with me.
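It is worth seeing concretely why checking cases, however many, is no substitute for the two proofs just given. The sketch below runs two experiments. The first checks the parity of squares over a finite range, which is consistent with what we proved but establishes nothing about all natural numbers. The second uses the expression $n^2 + n + 41$, which is not taken from the discussion above: it is a standard cautionary example that I am assuming here for illustration. The names `is_even`, `is_prime` and `euler_polynomial` are likewise my own.

```python
def is_even(n: int) -> bool:
    """A natural number is even if it is divisible by 2."""
    return n % 2 == 0


def is_prime(n: int) -> bool:
    """Trial division; fine for small numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def euler_polynomial(n: int) -> int:
    """An illustrative expression, assumed here: n^2 + n + 41."""
    return n * n + n + 41


# Experiment 1: n and n^2 have the same parity for every n we try.
# This agrees with Proof 1 but checks only finitely many cases.
print(all(is_even(n * n) == is_even(n) for n in range(10_000)))

# Experiment 2: "n^2 + n + 41 is always prime" survives forty checks...
print(all(is_prime(euler_polynomial(n)) for n in range(40)))

# ...and then fails, because 40^2 + 40 + 41 = 1681 = 41 * 41.
first_failure = next(n for n in range(1_000) if not is_prime(euler_polynomial(n)))
print(first_failure, euler_polynomial(first_failure))  # 40 1681
```

Both experiments look equally convincing after forty cases. Only the first statement is true, and we know that because it has a proof, not because of the experiment.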
2.3.2 Proof 2

We shall prove the following statement.
If the square of a number is even then that number is even, and if the square of a number is odd then that number is odd.
In fact, this is really two statements: ‘If $n^2$ is even then n is even’ and ‘If $n^2$ is odd then n is odd’. At first reading, you might think that I am simply repeating what I proved above. But in Proof 1, I proved

‘if n is even then $n^2$ is even’ whereas now I want to prove

‘if $n^2$ is even then n is even’.
Our assumptions in each case are different and our conclusions in each case are different. It is therefore important to distinguish between A ⇒ B and B ⇒ A. The statement B ⇒ A is called the converse of the statement A ⇒ B. Experience shows that people are prone to swapping assumptions and conclusions without being aware of it. We prove the first claim.
1. Suppose that $n^2$ is even.

2. Now it is very tempting to try and use the definition of even here, just as we did in Proof 1, and write $n^2 = 2m$ for some natural number m. But this turns out to be a dead-end. Just like playing a game such as chess, not every possible move is a good one. Choosing the right move comes with experience and sometimes just plain trial-and-error.

3. So we make a different move. We know that n is either odd or even. Our goal is to prove that it must be even.

4. Could n be odd? The answer is no, because, as we showed in Proof 1, if n is odd then $n^2$ is odd, whereas we are supposing that $n^2$ is even.
5. Therefore n is not odd.
6. But a number that is not odd must be even. It follows that n is even.
We use a similar strategy to prove the second claim.

The proofs here were more subtle, and less direct, than in our first example and they employed the following important strategy: if there are two possibilities, exactly one of which is true, we rule out one of those possibilities and so deduce that the other possibility must be true.[1] Here is a concrete example. There are two politicians, Alice and Bob. One of them always lies and the other always tells the truth. Suppose you ask Bob the question: is it true that 2 + 2 = 5? If he replies ‘yes’ then you know Bob is lying. Without further ado, you can deduce that Alice is that paragon of politicians and always tells the truth.

If A ⇒ B and B ⇒ A then we say that A if, and only if, B or A iff B or A ⇔ B. The use of the word iff is peculiar to mathematical English. If we combine Proofs 1 and 2, we have proved the following two statements for all natural numbers n: ‘n is even if, and only if, $n^2$ is even’ and ‘n is odd if, and only if, $n^2$ is odd’. It is important to remember that the statement ‘A if, and only if, B’ is in fact two statements in one. It means (1) ‘A implies B’ and (2) ‘B implies A’. So, to prove the statement ‘A if, and only if, B’ we have to prove TWO statements: we have to prove ‘A implies B’ and we have to prove ‘B implies A’.

The results of this example were trickier to prove than the previous ones, but not much more exciting. However, we have now laid the foundations for a truly remarkable result.

[1] This might be called the Sherlock Holmes method. “How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?” The Sign of Four, 1890.
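The difference between a statement and its converse, and the fact that ‘A if, and only if, B’ is two statements in one, can be tabulated mechanically. The sketch below assumes the usual truth-table reading of ‘A implies B’ (false only when A is true and B is false), which has not been formally introduced in this chapter. The function names `implies` and `iff` are my own.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Assumed truth-table reading of 'A implies B':
    false only when A is true and B is false."""
    return (not a) or b


def iff(a: bool, b: bool) -> bool:
    """'A if, and only if, B' as two implications taken together."""
    return implies(a, b) and implies(b, a)


print("A      B      A=>B   B=>A   A<=>B")
for a, b in product([True, False], repeat=2):
    row = (a, b, implies(a, b), implies(b, a), iff(a, b))
    print("  ".join(f"{value!s:<5}" for value in row))
```

In the rows where A and B differ, the columns for A ⇒ B and B ⇒ A disagree, which is why swapping assumptions and conclusions is a genuine error. The last column is true exactly when A and B have the same truth value.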
2.3.3 Proof 3

We shall now prove our first real theorem.

$\sqrt{2}$ cannot be written as an exact fraction.

If you square each of the fractions in turn

$\frac{3}{2}, \frac{7}{5}, \frac{17}{12}, \frac{41}{29}, \ldots$

you will find that you get closer and closer to 2 and so each of these numbers is an approximation to the square root of 2. This raises the question: is it possible to find a fraction $\frac{x}{y}$ whose square is exactly 2? In fact, it isn’t, but that isn’t proved just because my attempts above failed. Maybe I just haven’t looked hard enough. So, I have to prove that it is impossible. To prove that $\sqrt{2}$ is not an exact fraction, I am actually going to begin by trying to show you that it is.

1. Suppose that $\sqrt{2} = \frac{x}{y}$ where x and y are positive whole numbers (so, in particular, $y \neq 0$).

2. We may assume that $\frac{x}{y}$ is a fraction in its lowest terms so that the only natural number that divides both x and y is 1. Keep your eye on this assumption because it will come back to sting us later.

3. Square both sides of the equation in (1) to get $2 = \frac{x^2}{y^2}$.

4. Multiply both sides of the equation in (3) by $y^2$.

5. We therefore get the equation $2y^2 = x^2$.

6. Since 2 divides the left-hand side of this equation, it must divide the right-hand side. This means that $x^2$ is even.

7. We now use Proof 2 to deduce that x is even.

8. We may therefore write x = 2u for some natural number u.

9. Substitute this value for x into the equation in (5) to get $2y^2 = 4u^2$.

10. Divide both sides of the equation in (9) by 2 to get $y^2 = 2u^2$.

11. Since the right-hand side of the equation in (10) is even, so is the left-hand side. Thus $y^2$ is even.

12. Since $y^2$ is even, it follows by Proof 2 that y is even.

13. If (1) is true then we are led to the following two conclusions. From (2), we have that the only natural number to divide both x and y is 1. From (7) and (12), 2 divides both x and y. This is a contradiction. Thus (1) cannot be true. Hence $\sqrt{2}$ cannot be written as an exact fraction.
This result is phenomenal. It says that no matter how much money you spend on a computer it will never be able to calculate the exact value of $\sqrt{2}$, just a very, very good approximation. We now make a very important definition. A real number that is not rational is called irrational. We have therefore proved that $\sqrt{2}$ is irrational.
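The fractions 3/2, 7/5, 17/12, 41/29 that opened this proof can be explored with exact arithmetic, so that no rounding hides what is going on. The sketch below makes one assumption that is not in the text: it generates each fraction from the previous one by the rule x/y ↦ (x + 2y)/(x + y), a standard recurrence that happens to reproduce the four fractions listed. The function names are my own.

```python
from fractions import Fraction


def next_approximation(q: Fraction) -> Fraction:
    """Assumed recurrence (not from the text): x/y -> (x + 2y)/(x + y)."""
    x, y = q.numerator, q.denominator
    return Fraction(x + 2 * y, x + y)


def approximations(count: int) -> list[Fraction]:
    """The first `count` fractions, starting from 3/2."""
    result = [Fraction(3, 2)]
    while len(result) < count:
        result.append(next_approximation(result[-1]))
    return result


for q in approximations(8):
    error = q * q - 2  # exact; it is never zero for the fractions printed
    print(f"{q!s:>10}   square minus 2 = {error}")
```

For every fraction printed, the square misses 2 by exactly plus or minus 1 over the square of the denominator: very close, and closer each time, but never equal. That is an experiment, so it proves nothing about all fractions; the thirteen steps above are what rule out every fraction at once.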
2.3.4 Proof 4

We now prove our second real theorem.

The angles in a triangle add up to 180°.

This is a famous result that everyone knows. You might have learnt about it at school by drawing lots of triangles and measuring their angles but, as I said above, maths is not an experimental science and so this enterprise proves nothing. The proof I give is very old and occurs in Euclid’s book the Elements: specifically, Book I, Proposition 32. Draw a triangle and call its three angles α, β and γ respectively.
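[Figure: a triangle with angle β at the top vertex and angles α and γ at the base.]

Our goal is to prove that α + β + γ = 180°. In fact, we shall show that the three angles add up to a straight line, which is the same thing. Draw a line through the point P parallel to the base of the triangle.

[Figure: the same triangle with its top vertex labelled P and a line through P parallel to the base.]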
Then extend the two sides of the triangle that meet at the point P as shown.
[Figure: the two sides of the triangle extended beyond P, forming three new angles α′, β′ and γ′ at P on the far side of the parallel line.]

As a result, we get three angles that I have called α′, β′ and γ′. I now make the following claims.

• β′ = β because the angles are opposite each other in a pair of intersecting straight lines.

• α′ = α because these two angles are formed from a straight line cutting two parallel lines.

• γ′ = γ for the same reason as above.

But since α′ and β′ and γ′ add up to give a straight line, we have proved the claim. Now this is all well and good, but we have proved our result on the basis of three other results currently unproved:
1. That given a line l and a point P not on that line I may draw a line through the point P and parallel to l.
2. If two lines intersect, then opposite angles are equal.

3. If a line l cuts two parallel lines $l_1$ and $l_2$, the angle l makes with $l_1$ is the same as the angle it makes with $l_2$.
How do we know they are true? Result (2) can readily be proved. We shall use the diagram below.
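[Figure: two intersecting straight lines forming four angles labelled α, β, γ and δ in order around the point of intersection.]

The proof that α = γ follows from the simple observation that α + β = β + γ, since each side of this equation is the angle on a straight line. This still leaves (1) and (3). I shall say more about them later when I talk about axioms.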
2.3.5 Proof 5
The most famous theorem of them all is the one attributed to Pythagoras and proved in Book I, Proposition 47 of Euclid. We begin with a right-angled triangle.
[Figure: a right-angled triangle with hypotenuse c and shorter sides a and b.]
We want to prove, of course, that
$a^2 + b^2 = c^2$.
Consider the shape below. It has been constructed from four copies of our triangle and two squares of areas $a^2$ and $b^2$, respectively. I claim that this shape is actually a square. First, the sides all have the same length a + b. Second, the angles at the corners are right angles by Proof 4.

[Figure: a square of side a + b divided into a square of area $a^2$, a square of area $b^2$ and four copies of the triangle.]
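Now look at the following picture. This is also a square with sides a + b so it has the same area as the first square. Using Proof 4, the shape in the middle really is a square with area $c^2$.

[Figure: a square of side a + b with the four copies of the triangle placed in its corners, leaving a central square of area $c^2$.]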
If we subtract the four copies of the original triangle from both squares, the shapes that remain must have the same areas, and we have proved the claim.
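The subtraction of areas can also be written out as a short calculation. This is only a sketch of the same argument in symbols; it assumes the usual formulae for the area of a square and for the area of a right-angled triangle with shorter sides a and b, namely $\tfrac{1}{2}ab$, which the pictures use implicitly.

```latex
\begin{align*}
\text{first square:}  \quad (a+b)^2 &= a^2 + b^2 + 4 \cdot \tfrac{1}{2}ab, \\
\text{second square:} \quad (a+b)^2 &= c^2 + 4 \cdot \tfrac{1}{2}ab, \\
\text{hence}          \quad a^2 + b^2 &= c^2 .
\end{align*}
```

The term $4 \cdot \tfrac{1}{2}ab$ is the combined area of the four triangles, and it is exactly what is removed from both squares.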
Exercises 2.3
1. Raymond Smullyan is both a mathematician and a magician. Here are two of his puzzles. On an island there are two kinds of people: knights who always tell the truth and knaves who always lie. They are indistinguishable.
(a) You meet three such inhabitants A, B and C. You ask A whether he is a knight or knave. He replies so softly that you cannot make out what he said. You ask B what A said and they say ‘he said he is a knave’. At which point C interjects and says ‘that’s a lie!’. Was C a knight or a knave?

(b) You encounter three inhabitants: A, B and C. A says ‘exactly one of us is a knave’. B says ‘exactly two of us are knaves’. C says: ‘all of us are knaves’. What type is each?
2. There are five houses, from left to right, each of which is painted a different colour. Their inhabitants are called W, C, O, S and M, but not necessarily in that order, and they own different pets, drink different drinks and drive different cars.
(a) There are five houses.
(b) W lives in the red house.
(c) C owns the dog.
(d) Coffee is drunk in the green house.
(e) O drinks tea.
(f) The green house is immediately to the right (that is: your right) of the ivory house.
(g) The Oldsmobile driver owns snails.
(h) The Bentley owner lives in the yellow house.
(i) Milk is drunk in the middle house.
(j) S lives in the first house.
(k) The person who drives the Chevy lives in the house next to the man with the fox.
(l) The Bentley owner lives in a house next to the house where the horse is kept.
(m) The Lotus owner drinks orange juice.
(n) M drives the Porsche.
(o) S lives next to the blue house.
There are two questions: who drinks water and who owns the aardvark?
3. Prove that the sum of any two even numbers is even, that the sum of any two odd numbers is even, and that the sum of an odd and an even number is odd.
4. Prove that the sum of the interior angles in any quadrilateral is equal to 360°.
5.
(a) A rectangular box has sides of length 2, 3 and 7 units. What is the length of the longest diagonal?
(b) I draw a square. Without measuring any lengths, you now have to construct a square that has exactly twice the area.
(c) A right-angled triangle has sides with lengths x, y and hypotenuse z. Prove that if the area of the triangle is z²/4 then the triangle is isosceles.
6.
(a) Prove that the last digit in the square of a positive whole number must be one of 0, 1, 4, 5, 6, or 9. Is the converse true?
(b) Prove that a natural number is even if, and only if, its last digit is even.
(c) Prove that a natural number is exactly divisible by 9 if, and only if, the sum of its digits is divisible by 9.
7. Prove that √3 cannot be written as an exact fraction.
8. *The goal of this question is to prove Ptolemy’s theorem.² This deals with cyclic quadrilaterals, that is, those quadrilaterals whose vertices lie on a circle. With reference to the diagram below,
[Figure: a cyclic quadrilateral with vertices A, B, C, D, sides of lengths a, b, c, d and diagonals of lengths x, y.]
this theorem states that
xy = ac + bd.
Hint. Show that on the line BD there is a point X such that the angle XÂD is equal to the angle BÂC. Deduce that the triangles AXD and ABC are similar, and that the triangles AXB and ACD are similar. Let the distance between D and X be e. Show that
$$\frac{e}{a} = \frac{c}{x} \quad \text{and that} \quad \frac{y - e}{d} = \frac{b}{x}.$$
From this, the result follows by simple algebra. To help you show that the triangles are similar, you will need to use Proposition III.21 from Euclid which is illustrated by the following diagram
² Claudius Ptolemaeus was a Greek mathematician and astronomer who flourished around 150 CE in the city of Alexandria.
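9. *The goal of this question is to find all Pythagorean triples, that is, triples of natural numbers (a, b, c) such that a² + b² = c². We shall do this using geometry by finding all the rational points on the unit circle. We shall use the diagram below.
[Figure: the unit circle centred at the origin, with the point A = (−1, 0) joined by a straight line to another point P on the circle.]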
We have drawn a unit circle centre the origin. From the point (−1, 0), called A, we draw a line to any other point P on the circle.
(a) Show that any non-vertical line passing through the point A has the equation y = t(x + 1) for some real number t.
(b) Show that this line intersects the circle at a point P, different from A, when
$$(x, y) = \left( \frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2} \right).$$
(c) Deduce that the rational points on the circle correspond to the values of t which are rational.
(d) Put t = p/q, in its lowest terms. Deduce that all Pythagorean triples are obtained as the following (r(q² − p²), 2pqr, r(p² + q²)) where p, q, r are any integers.
10. *Take any positive natural number n; so n = 1, 2, 3, ... If n is even, divide it by 2 to get n/2; if n is odd, multiply it by 3 and add 1 to obtain 3n + 1. Now repeat this process and stop only if you get 1. For example, if n = 6 you get 6, 3, 10, 5, 16, 8, 4, 2, 1. What happens if n = 11? What about n = 27? Prove that no matter what number you start with, you will always eventually reach 1.
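The process in Question 10 is easy to experiment with on a computer. What follows is only a minimal sketch in Python; the function name `collatz_sequence` is my own choice for illustration. Bear in mind that running it for many starting values gathers evidence but proves nothing, and that the loop stops only if the value 1 is actually reached.

```python
def collatz_sequence(n):
    """Return the list n, next value, next value, ... up to the first 1."""
    if n < 1:
        raise ValueError("n must be a positive whole number")
    sequence = [n]
    while n != 1:
        # Halve even numbers; send odd numbers to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence


print(collatz_sequence(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```

Calling `collatz_sequence(11)` or `collatz_sequence(27)` answers the two experimental parts of the question; the final part is another matter.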
2.4 Axioms
At this point, I need to confront some potential problems with the idea of proof I have been developing. Once this is done, I will then be able to complete Proof 4.
Suppose I am trying to prove the statement S. Then I am done if I can find a theorem S₁ such that S₁ ⇒ S. But this raises the question of how I know that S₁ is a theorem. This can only be because I can find a theorem S₂ such that S₂ ⇒ S₁. There are now three possibilities:
1. At some point I find a theorem Sₙ such that S ⇒ Sₙ. This is clearly a bad thing. In trying to prove S I have in fact used S and so haven’t proved anything at all. This is an example of circular reasoning and has to be avoided. I can do this by organizing what I know in a hierarchy, so that to prove a result I am only allowed to use those theorems already proved. In this way, I can avoid going around in circles.
2. Assuming I have avoided the above pitfall, the next nasty possibility is that I get an infinite sequence of implications:
... ⇒ Sₙ ⇒ Sₙ₋₁ ⇒ ... ⇒ S₁ ⇒ S.
I never actually know that S is a theorem because it is always proved in terms of something else without end. This is also clearly a bad thing. I establish relative truth, a statement is true if another is true, but not absolute truth. I clearly don’t want this to happen. But if not, then I am led inexorably to the third possibility.
3. To prove S, I have to prove only a finite number of implications
Sₙ ⇒ Sₙ₋₁ ⇒ ... ⇒ S₁ ⇒ S.
But, if Sₙ is supposed to be a theorem then how do I know it is true if not in terms of something else, contradicting the assumption that this was supposed to be a complete argument?
I shall now delve into case (3) above in more detail, since resolving it will lead to an important insight. Maths is supposed to be about proving theorems but the analysis above has led us to the uncomfortable possibility that some things have to be accepted as true ‘because they are’, which contradicts what I went to great trouble to rubbish earlier. Before I explain the way out of this conundrum, let me first consider an example from an apparently completely different enterprise: playing a game. To be concrete, let’s take the game of chess. Most people have learnt chess at some point even if, like me, they are not very good at it. This game consists of a board and some pieces. The pieces are of different types (kings, queens, knights, bishops, castles, pawns), each of which can be moved in different ways. To play chess means to accept the rules of chess and to move the pieces in accordance with the rules. Whether one player wins or there is a draw is also described by the rules of chess. It’s meaningless to ask whether the rules of chess are true. But a move in chess is valid, which is another way of saying true, if it is made according to those rules.
This example provides a way of understanding how maths works. Maths should be viewed as a collection of different mathematical domains each described by its own ‘rules of the game’ which in maths are termed axioms. These axioms are the basic assumptions on which the theory is built and are the building blocks of all proofs within that mathematical domain. Our goal is to prove interesting theorems from those axioms. As an example, consider Euclidean geometry.
The Greeks attributed the discovery of geometry to the Ancient Egyptians who needed it in recalculating land boundaries for the purposes of tax assessment after the yearly flood of the Nile. Thus geometry probably first existed as a collection of geometrical methods that worked: the tax was calculated, the pyramids built and everyone was happy. But it was the Ancient Greeks themselves who elevated it into a mathematical science and a model of what could be achieved in mathematics. Euclid’s book the Elements codified what was known about geometry into a handful of axioms and then showed that all of geometry could be deduced from those axioms by the use of mathematical proof. The Elements is not only the single most important mathematics book ever written but one of the most important books, full stop. Here is a list of the key axioms.
1. Two distinct points determine a unique straight line.
2. A line segment can be extended infinitely in either direction.
3. Circles can be drawn with any centre and any radius.
4. Any two right angles are equal to each other.
5. Suppose that a straight line cuts two lines l₁ and l₂. If the interior angles on the same side add up to strictly less than 180°, then if l₁ and l₂ are extended on that side they will eventually meet.
The last axiom needs a picture to illustrate what is going on.
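[Figure: a straight line cutting two lines l₁ and l₂; the interior angles on one side add up to less than 180°, and the two lines meet when extended on that side.]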
In principle, all of the results you learnt in school about triangles and circles can be proved from these axioms. I say ‘in principle’ since there were a few bugs which were later fixed by a number of mathematicians, most notably David Hilbert. But this shouldn’t detract from what an enormous achievement Euclid’s book was and is. We may now finish off Proof 4: claim (1) is proved in Book I, Proposition 31, and claim (3) is proved in Book I, Proposition 29.
One way of teaching maths at university would therefore be to start with a list of axioms and start proving things. But this approach has a number of disadvantages: it is time-consuming, laborious, sometimes even a bit tedious, and takes a very long time to reach the really interesting theorems. Therefore, in this book, I shall usually base each topic on quite high-level axioms so that we can get to the interesting theorems quickly, but I shall also give pointers to readers who want to see the full axiomatic development.
Exercises 2.4
1.* Hofstadter’s MU-puzzle. A string is just an ordered sequence of symbols. In this puzzle, you will construct strings using the letters M, I, U. You are given the string MI which is your only axiom. You can make new strings only by using the following rules any number of times in succession in any order:
(I) If you have a string that ends in I then you can add a U on at the end.
(II) If you have a string Mx where x is a string then you may form Mxx.
(III) If III occurs in a string then you may make a new string with III replaced by U.
(IV) If UU occurs in a string then you may erase it.
I shall write x → y to mean that y is the string obtained from the string x by applying one of the above four rules. Here are some examples:
• By rule (I), MI → MIU.
• By rule (II), MIU → MIUIU.
• By rule (III), UMIIIMU → UMUMU.
• By rule (IV), MUUUII → MUII.
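The four rules are mechanical enough to hand to a computer. The following Python sketch is one possible encoding, not part of Hofstadter's puzzle itself: the names `successors` and `reachable` and the default limit of four steps are my own assumptions.

```python
def successors(s):
    """Return the set of strings obtainable from s by applying one rule once."""
    results = set()
    # Rule (I): a string ending in I may have a U added at the end.
    if s.endswith("I"):
        results.add(s + "U")
    # Rule (II): a string Mx may be turned into Mxx.
    if s.startswith("M"):
        results.add(s + s[1:])
    for i in range(len(s)):
        # Rule (III): any occurrence of III may be replaced by U.
        if s[i:i + 3] == "III":
            results.add(s[:i] + "U" + s[i + 3:])
        # Rule (IV): any occurrence of UU may be erased.
        if s[i:i + 2] == "UU":
            results.add(s[:i] + s[i + 2:])
    return results


def reachable(start="MI", steps=4):
    """Collect every string reachable from start in at most `steps` rule applications."""
    seen = {start}
    frontier = {start}
    for _ in range(steps):
        frontier = {t for s in frontier for t in successors(s)} - seen
        seen |= frontier
    return seen


print(sorted(successors("MI")))  # ['MII', 'MIU']
```

A bounded search of this kind can show that a string is reachable, but failing to find a string within a few steps does not, by itself, show that it can never be made.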
The question is: can you make MU?
2.5 Mathematics and the real world
Euclidean geometry appears to be about the real world. In fact, for thousands of years this was what mathematicians believed until they discovered other geometries with different properties. On the surface of a sphere, for example, the sum of the angles in a spherical triangle will actually be bigger than 180°, the exact amount being determined by the area of the triangle. This result played an important role in surveying. But our analysis above leads us to the following conclusion:
Mathematics is about logically consistent mathematical universes.
A mathematical truth is therefore something proved in one of those mathematical universes, and is not a truth about ‘out there’. Despite this, mathematical truths do help us to understand the actual physical universe we inhabit. For example, does the geometry of the universe follow the rules of Euclidean geometry? Here is what NASA says on the basis of the Wilkinson Microwave Anisotropy Probe (WMAP): “WMAP also confirms the predictions that the amplitude of the variations in the density of the universe on big scales should be slightly larger than smaller scales, and that the universe should obey the rules of Euclidean geometry so the sum of the interior angles of a triangle add to 180 degrees.”
http://map.gsfc.nasa.gov/news/index.html
2.6 Proving something false
‘Proving a statement true’ and ‘proving a statement false’ sound similar but it turns out that ‘proving a statement false’ requires a lot less work than ‘proving a statement true’. There is an asymmetry between them. To prove a statement false all you need do is find a counterexample. Here is an example. Consider the following statement: every odd number bigger than 1 is a prime. This is false. The reason is that 9 is odd, bigger than 1, and not prime. Thus 9 is a counterexample. The number 9 here can be regarded as a witness that shows the claim to be false. To prove a statement true, you have to work hard. To prove a statement false, you only have to find one counterexample and you are done (though in research mathematics finding a counterexample can be a Herculean task).
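Hunting for a counterexample is often something a computer can help with. Here is a minimal sketch in Python of a search for a witness against the claim ‘every odd number bigger than 1 is a prime’; the function names and the search limit of 100 are assumptions made for the illustration.

```python
def is_prime(n):
    """Trial division: True exactly when n is a prime number."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def first_counterexample(limit=100):
    """Return the smallest odd number in 3..limit that is not prime, or None."""
    for n in range(3, limit + 1, 2):
        if not is_prime(n):
            return n
    return None


print(first_counterexample())  # 9
```

Note the asymmetry again: a single returned value settles that the claim is false, whereas a search that returns `None` would only show that no counterexample exists below the chosen limit.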
2.7 Key points
• One of the goals of this book is to introduce you to proofs. This does not mean that you will afterwards be able to do proofs. That takes time and practice.
• Initially, you should aim to understand proofs. This means seeing why a proof is true. A good test of whether you really understand a proof is whether you can explain it to someone else. It is much easier to check that a proof is correct than it is to invent the proof in the first place. Nevertheless, be warned, it can also take a long time just to understand a proof.
• I shall ask you to find some proofs for yourself. But do not expect to find them in a few minutes. Constructing proofs takes time, trial and error and, yes, luck.
• If you don’t understand the words used in a statement that you are asked to prove then you are not going to be able to prove that statement. Definitions are vitally important in mathematics.
• Every statement that you make in a proof must be justified: if it is a definition, say that it is a definition; if it is a result known to be true, that is a theorem, say that it is known to be true; if it is one of the assumptions, say that it is one of the assumptions; if it is an axiom, say that it is an axiom.
• When starting out, it is probably best to write each statement of a proof on a separate line followed by its justification.
Finally, there are one or two pieces of terminology and notation that are worth mentioning here. The conclusion of a proof is marked using the symbol □. This replaces the older use of QED. If we believe something might be true but there isn’t yet a proof we say that it is a conjecture. The things we can prove fall, roughly, into the following categories: a theorem is a major result, worthy of note; a proposition is a result; a lemma is an auxiliary result, a tool, useful in many different places; and a corollary is a result we can deduce with little or no effort from a proposition or theorem.
2.8 Mathematical creativity
Everything I have said above is true, but does need to be placed in perspective. Where do proofs come from? More to the point, where do theorems come from? Music is a useful analogy. You can learn how to write music down, but that doesn’t make you a musician. In fact, there are some talented musicians who cannot even read music. Proofs keep us honest and ground what we are doing, but what makes maths fun is that it is creative, and for creativity there are no rules. For example, in dreaming up a theorem, experimentation may well play a role. Sometimes a theorem may evolve in tandem with a proof; at other times the theorem, or more accurately the conjecture, comes first and then there is the struggle to prove it, which may take place over many generations and centuries.
2.9 Set theory: the language of mathematics
Everyday English is good at everyday jobs, but can be hopelessly imprecise where accuracy is important. To get around this, special varieties of English, little dialects, have been constructed for particular purposes. In mathematics, we use precise versions of everyday language augmented with special symbols. Part of this special language is that of set theory, invented by Georg Cantor (1845–1918) in the last quarter of the nineteenth century. This section is mainly a phrasebook of the most important terms we shall need for most of this book. I shall develop this language further when I need to when studying combinatorics. The starting point of set theory is the following pair of deceptively simple definitions:
• A set is a collection of objects which we wish to regard as a whole. The members of a set are called its elements.³
• Two sets are equal precisely when they have the same elements.
³ Strictly speaking this definition is nonsense. Why?
We often use capital letters to name sets, such as A, B or C, or fancy capital letters such as ℕ and ℤ. The elements of a set are usually denoted by lower case letters. If x is an element of the set A then we write
x ∈ A
and if x is not an element of the set A then we write
x ∉ A.
A set should be regarded as a bag of elements, and so the order of the elements within the set is not important. In addition, repetition of elements is ignored.⁴
Examples 2.9.1.
1. The following sets are all equal: {a, b}, {b, a}, {a, a, b}, {a, a, a, a, b, b, b, a} because the order of the elements within a set is not important and any repetitions are ignored. Despite this it is usual to write sets without repetitions to avoid confusion. We have that a ∈ {a, b} and b ∈ {a, b} but α ∉ {a, b}.
2. The set {} is empty and is called the empty set. It is given a special symbol ∅, which is taken from the letter Ø of the Danish and Norwegian alphabets. Remember that ∅ means the same thing as {}. Take careful note that ∅ ≠ {∅}. The reason is that the empty set contains no elements whereas the set {∅} contains one element. By the way, the symbol for the empty set is different from the Greek letter phi: φ or Φ.
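These two examples can be checked directly in a programming language that has sets built in. The sketch below uses Python, where `frozenset` stands in for a set that is allowed to be an element of another set; that substitution is a feature of Python rather than of set theory.

```python
# Order and repetition are ignored when comparing sets.
assert {"a", "b"} == {"b", "a"} == {"a", "a", "b"}

# Membership corresponds to the symbols ∈ and ∉.
assert "a" in {"a", "b"}
assert "α" not in {"a", "b"}

# The empty set has no elements, but the set containing it has one.
# frozenset is used because Python sets can only contain hashable values.
empty = frozenset()
assert len(empty) == 0
assert len({empty}) == 1
assert empty != frozenset({empty})
print("all checks passed")
```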
The number of elements in a set is called its cardinality. If X is a set then |X| denotes its cardinality. A set is finite if it only has a finite number of elements, otherwise it is infinite. If a set has only finitely many elements then we might be able to list them if there aren’t too many: this is done by putting them in ‘curly brackets’ { and }. We can sometimes define infinite sets by using curly brackets but then, because we can’t list all elements in an infinite set, we use ‘...’ to mean ‘and so on in the obvious way’. This can also be used to define finite sets where there is an obvious pattern. Often, we describe a set by saying what properties an element must have to belong to the set. Thus {x: P(x)} means ‘the set of all things x which satisfy the condition P’. Here are some examples of sets defined in various ways.
⁴ If you want to take account of repetitions you have to use multisets.
Examples 2.9.2.
1. D = { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday }, the set of the days of the week. This is a small finite set and so we can conveniently list its elements.
2. M = { January, February, March, . . . , November, December }, the set of the months of the year. This is a finite set but I didn’t want to write down all the elements so I wrote ‘. . . ’ to indicate that there were other elements of the set which I was too lazy to write down explicitly but which are, nevertheless, there.
3. A = {x: x is a prime number}. I define a set by describing the properties that the elements of the set must have. Here P(x) is the statement ‘x is a prime number’ and those natural numbers x are admitted membership to the set when they are indeed prime.
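The notation {x: P(x)} has a close cousin in Python, the set comprehension. The sketch below builds the primes below 30; the bound is my own addition, needed because a computer can only list a finite part of an infinite set, and `is_prime` is a simple helper written for this illustration.

```python
def is_prime(x):
    """True exactly when x is a prime number (trial division)."""
    return x >= 2 and all(x % d != 0 for d in range(2, int(x ** 0.5) + 1))


# {x : x < 30 and P(x)}, where P(x) is the statement 'x is a prime number'.
primes_below_30 = {x for x in range(30) if is_prime(x)}
print(sorted(primes_below_30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```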
In this book, the following sets of numbers will play a special role. We shall use this notation throughout and so it is worthwhile getting used to it.
Examples 2.9.3.
1. The set ℕ = {0, 1, 2, 3, ...} of all natural numbers.
2. The set ℤ = {..., −3, −2, −1, 0, 1, 2, 3, ...} of all integers. The reason ℤ is used to designate this set is because ‘Z’ is the first letter of the word ‘Zahl’, the German for number.
3. The set ℚ of all rational numbers, i.e. those numbers that can be written as fractions whether positive or negative.
4. The set ℝ of all real numbers, i.e. all numbers which can be represented by decimals with potentially infinitely many digits after the decimal point.
5. The set ℂ of all complex numbers, which I shall introduce from scratch later on.
Given a set A, a new set B can be formed by choosing elements from A to put in B. We say that B is a subset of A, which is written B ⊆ A. In mathematics, the word ‘choose’ also includes the possibility of choosing nothing and the possibility of choosing everything. In addition, there doesn’t have to be any rhyme or reason to your choices: you can pick elements ‘at random’ if you want. If B ⊆ A and A ≠ B then we say that B is a proper subset of A.
Examples 2.9.4.
1. ∅ ⊆ A for every set A, where we choose no elements from A. It is a very common mistake to forget the empty set when listing subsets of a set.
2. A ⊆ A for every set A, where we choose all the elements from A. It is a very common mistake to forget the set itself when listing subsets of a set.
3. ℕ ⊆ ℤ ⊆ ℚ ⊆ ℝ ⊆ ℂ.
4. E, the set of even natural numbers, is a subset of ℕ.
5. O, the set of odd natural numbers, is a subset of ℕ.
6. P = {2, 3, 5, 7, 11, 13, 17, 19, 23, ...}, the set of primes, is a subset of ℕ.
7. A = {x: x ∈ ℝ and x² = 4} which is just the set {−2, 2}.
There is a particular kind of subset that will be convenient to define now. If A and B are sets we define the set A \ B to consist of those elements of A that are not in B. Thus, in particular, A \ B ⊆ A. The operation is called relative complement. For example, ℕ \ E = O. The set ℝ \ ℚ is precisely the set of irrational numbers.
When set theory is first encountered it doesn’t look very impressive. What could you possibly do with these very simple, if not simple-minded, definitions? In fact, all of mathematics can be developed using set theory. I am going to finish off this section with a first glimpse at the power of set theory.
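The subset relation and the relative complement described above can be tried out on finite sets. In this Python sketch the set `N` is only a finite window onto the natural numbers (the bound 20 is an arbitrary choice of mine), `<=` plays the role of ⊆, `<` means proper subset and `-` plays the role of \.

```python
# A finite window onto the natural numbers; the bound 20 is an arbitrary choice.
N = set(range(20))
E = {n for n in N if n % 2 == 0}   # even numbers in the window
O = {n for n in N if n % 2 == 1}   # odd numbers in the window

assert E <= N and O <= N           # E ⊆ N and O ⊆ N
assert set() <= N and N <= N       # the empty set and N itself are subsets
assert E < N                       # E is a proper subset: E ⊆ N and E ≠ N
assert N - E == O                  # relative complement: N \ E = O
print(sorted(N - E))
```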
Consider the set {a, b}. I have explained above that order doesn’t matter and so this is the same set as {b, a}. But there are many occasions where we do want order to matter. For example, in the Olympics it is important to know who came first and second in the 100m sprint, not merely that the first two over the finishing line were X and Y in alphabetical order. So we need a new notion where order does matter. It is called an ordered pair and is written (a, b), where a is called the first component and b is called the second component.⁵ The key feature of this new object is that (a, b) = (c, d) if, and only if, a = c and b = d. So, order matters. For example, the ordered pair (1, 2) is different from the ordered pair (2, 1). Furthermore, (1, 1) does not mean the same as 1 on its own. The idea of an ordered pair is a familiar one from co-ordinate geometry. We use ordered pairs of real numbers (x, y) to specify points in the plane. At first blush, set theory seems inadequate to define ordered pairs. But in fact it can. I have put the details in a box and you don’t need to read them to understand the rest of the book.
Ordered Pairs
I am going to show you how sets, which don’t encode order directly, can nevertheless be used to define ordered pairs. It is an idea due to Kuratowski (1896–1980). Define
(a, b) = {{a}, {a, b}}.
We have to prove, using only this definition, that we have (a, b) = (c, d) if, and only if, a = c and b = d. The proof is essentially an exercise in special cases. I shall prove the hard direction. Suppose that
{{a}, {a, b}} = {{c}, {c, d}}.
Since {a} is an element of the lefthand side it must be an element of the righthand side. So {a} ∈ {{c}, {c, d}}. There are now two possibilities. Either {a} = {c} or {a} = {c, d}. The first case gives us that a = c, and the second case gives us that a = c = d. Since {a, b} is an element of the lefthand side it must be an element of the righthand side. So {a, b} ∈ {{c}, {c, d}}. There are again two possibilities. Either {a, b} = {c} or {a, b} = {c, d}. The first case gives us that a = b = c, and the second case gives us that (a = c and b = d) or (a = d and b = c). We therefore have the following possibilities:
• a = b = c. But then {{a}, {a, b}} = {{a}}. It follows that c = d and so a = b = c = d and, in particular, a = c and b = d.
• a = c and b = d.
• In all remaining cases, a = b = c = d and so, in particular, a = c and b = d.
⁵ This notation should not be confused with the notation for real intervals where (a, b) denotes the set {r : r ∈ ℝ and a < r < b}, nor with the use of brackets in clarifying the meaning of algebraic expressions. The context should make clear what is intended.
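Kuratowski's definition in the box above can be tested by brute force on a small collection of values. The following Python sketch encodes the pair with `frozenset` (needed so that sets can be elements of sets in Python); the name `kpair` and the four-element test domain are my own choices, and a finite check of this kind illustrates the key property rather than proving it.

```python
from itertools import product


def kpair(a, b):
    """Kuratowski's encoding (a, b) = {{a}, {a, b}}, built from frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})


assert kpair(1, 2) != kpair(2, 1)                      # order matters
assert kpair(1, 1) == frozenset({frozenset({1})})      # {{a}, {a, a}} = {{a}}

# Exhaustive check of the key property on a small test domain.
domain = range(4)
for a, b, c, d in product(domain, repeat=4):
    assert (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
print("key property holds on the test domain")
```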
We can now build sets of ordered pairs. Let A and B be sets. Define A × B, the product of A and B, to be the set
A × B = {(a, b): a ∈ A and b ∈ B}.
Example 2.9.5. Let A = {1, 2, 3} and let B = {a, b}. Then
A × B = {(1, a), (1, b), (2, a), (2, b), (3, a), (3, b)}
and
B × A = {(a, 1), (b, 1), (a, 2), (b, 2), (a, 3), (b, 3)}.
So, in particular, A × B ≠ B × A, in general.
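The example can be reproduced in Python, where `itertools.product` from the standard library generates the ordered pairs. This is only a sketch: the letters a and b are represented by the strings "a" and "b", which is my own encoding.

```python
from itertools import product

A = {1, 2, 3}
B = {"a", "b"}

# A × B = {(x, y) : x ∈ A and y ∈ B}
AxB = set(product(A, B))
BxA = set(product(B, A))

assert (1, "a") in AxB and ("a", 1) in BxA
assert AxB != BxA              # the two products differ
print(sorted(AxB))
```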
If A = B it is natural to abbreviate A × A as A². This now agrees with the notation ℝ² which is the set of all ordered pairs of real numbers and, geometrically, can be regarded as the real plane.
We have defined ordered pairs but there is no reason to stop with just pairs. We may also define ordered triples. This can be done by defining
(x, y, z) = ((x, y), z).
The key property of ordered triples is that if (a, b, c) = (d, e, f) then a = d, b = e and c = f. Given three sets A, B and C we may define their product A × B × C to be the set of all ordered triples (a, b, c) where a ∈ A, b ∈ B and c ∈ C. A good example of an ordered triple in everyday life is a date that consists of a day, a month and a year. Thus the 16th June 1904 is really an ordered triple (16, June, 1904) where we specify day, month and year in that order. If A = B = C then we write A³ rather than A × A × A. Thus the set ℝ³ consists of all Cartesian co-ordinates (x, y, z). In general, we may define ordered n-tuples, which look like this (x₁, ..., xₙ), and products of n sets A₁ × ... × Aₙ. If A₁ = ... = Aₙ = A then we write Aⁿ for their n-fold product.
Russell’s Paradox
There is more to sets than meets the eye. I shall now describe a famous result in the history of mathematics called Russell’s Paradox. Define the following R = {x: x ∉ x}, in other words: the set of all sets that do not contain themselves as an element. For example, ∅ ∈ R. We now ask the question: is R ∈ R? Before resolving this question, let’s back off a bit and ask what it means for X ∈ R. From the entry requirements, we would have to show that X ∉ X. Putting X = R we deduce that R ∈ R is true only if R ∉ R. Since this is an evident contradiction, we are inclined to deduce that R ∉ R. However, if R ∉ R then in fact R satisfies the entry requirements to be an element of R and so R ∈ R. Thus exactly one of R ∈ R and R ∉ R must be true but assuming one is true implies the other is true. We therefore have an honest-to-goodness contradiction. Our only way out is to conclude that, whatever R might be, it is not a set. But this in turn contradicts my definition of a set as a collection of objects since R is a collection of objects. If you want to understand how to escape this predicament, you will have to study set theory.
Disconcerting as this might be to you, imagine how much more so it was to the mathematician Gottlob Frege (1848–1925). He was working on a book which based the development of maths on sets when he received a letter from Russell describing this paradox and undermining what Frege was attempting to achieve.
Bertrand Russell himself was an Anglo-Welsh philosopher born in 1872, when Queen Victoria still had another thirty years on the throne as ‘Queen empress’, and who died in 1970 a few months after Neil Armstrong stepped onto the moon. As a young man he made important contributions to the foundations of mathematics but in the course of his extraordinary life he found time to stand for parliament, encouraged the philosopher Ludwig Wittgenstein, received two prison sentences, won the Nobel prize for literature, was the first president of CND, and campaigned against the Vietnam war. See Russell: a very short introduction by A. C. Grayling published by OUP, 2002, for a very short introduction.
I shall conclude this section by touching on a fundamental notion of mathematics: that of a function. I shall approach it by first defining something more general. Let A and B be any sets. By definition a subset X ⊆ A × B is called a relation from A to B. To motivate this definition, and new terminology, I shall consider an example.
Example 2.9.6. Let A be the set {A(dam), B(eth), C(ate), D(ave)} of people. Let B be the set {a(pples), b(ananas), o(ranges)} of fruit. Define X to be the following set of ordered pairs
{(A, a), (A, o), (B, b), (D, a), (D, b), (D, o)}
which tells us who likes which fruit. Thus, for example, Adam likes apples and oranges (but not bananas) and Cate doesn’t like any of the fruit on offer. It is pretty irresistible to represent this information by means of a directed graph, such as the one below. Clearly, such graphs can be drawn to represent any relation. The term ‘relation’ is now explained by the fact that X tells us how the elements of A are related to the elements of B. In this case, the relation is ‘likes to eat’.
[Figure: directed graph with the people A, B, C, D on one side, the fruit a, o, b on the other, and an arrow from each person to each fruit they like.]
Let X be a relation from A to B. We say that X is a function if it satisfies two additional conditions: first, for each a ∈ A there is at least one b ∈ B such that (a, b) ∈ X; second, if (a, b), (a, c) ∈ X then b = c. If we think back to the graph in our example above, then the first condition says that every element in A is at the base of an arrow, and the second condition says that each element in A is never at the base of two, or more, arrows.
Slightly different notation is used when dealing with functions. Rather than thinking of ordered pairs, we think instead of inputs and outputs. Thus a function from A to B is determined when for each a ∈ A there is associated exactly one element b ∈ B. We think of a as the input and b as the corresponding, uniquely determined, output. If we denote our function by f then we write b = f(a) or a ↦ b. Thus the corresponding relation is the set of all ordered pairs (a, f(a)) where a ∈ A. We call the set A the domain of the function and the set B the codomain of the function. We write f : A → B or $A \xrightarrow{f} B$.
Example 2.9.7. Here is an example of a function f : A → B. Let A be the set of all students in the lecture theatre at this time. Let B be the set of natural numbers. Then f is defined when for each student a ∈ A we associate their age f(a). We can see why this is precisely a function and not merely a relation. First, everyone has an age and, assuming they don’t lie, they have exactly one age. On the other hand, if we kept A as it is and let B be the set of nationalities then we will no longer have a function in general. Some people might be stateless, but even if we include that as a possibility in the set B, we still won’t necessarily have a function since some people own more than one passport.
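The two conditions that single out functions among relations can be checked mechanically when the sets are finite. The Python sketch below does this for the ‘likes to eat’ relation of Example 2.9.6; the helper names `is_relation` and `is_function`, and the second relation `favourite`, are hypothetical and included only for illustration.

```python
people = {"Adam", "Beth", "Cate", "Dave"}
fruit = {"apples", "bananas", "oranges"}

# The relation X of Example 2.9.6: who likes which fruit.
likes = {
    ("Adam", "apples"), ("Adam", "oranges"), ("Beth", "bananas"),
    ("Dave", "apples"), ("Dave", "bananas"), ("Dave", "oranges"),
}

# A made-up relation (not from the text) giving each person one favourite.
favourite = {
    ("Adam", "apples"), ("Beth", "bananas"),
    ("Cate", "apples"), ("Dave", "oranges"),
}


def is_relation(X, A, B):
    """True when X is a subset of A × B."""
    return all(a in A and b in B for (a, b) in X)


def is_function(X, A, B):
    """True when X is a relation from A to B that relates each
    element of A to exactly one element of B."""
    if not is_relation(X, A, B):
        return False
    return all(len({b for (a, b) in X if a == x}) == 1 for x in A)


print(is_function(likes, people, fruit))      # False
print(is_function(favourite, people, fruit))  # True
```

The relation `likes` fails both conditions: Cate is related to nothing, and Adam and Dave are each related to more than one fruit.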
Exercises 2.9
1. Let A = {♣, ♦, ♥, ♠}, B = {♠, ♦, ♣, ♥} and C = {♠, ♦, ♣, ♥, ♣, ♦, ♥, ♠}. Is it true or false that A = B and B = C? Explain.
2. Let X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Write down the following subsets of X:
(a) The subset A of even elements of X.
(b) The subset B of odd elements of X.
(c) C = {x: x ∈ X and x ≥ 6}.
(d) D = {x: x ∈ X and x > 10}.
(e) E = {x: x ∈ X and x is prime}.
(f) F = {x: x ∈ X and (x ≤ 4 or x ≥ 7)}.
3.
(a) Find all subsets of {a, b}. How many are there? Write down also the number of subsets with respectively 0, 1 and 2 elements.
(b) Find all subsets of {a, b, c}. How many are there? Write down also the number of subsets with respectively 0, 1, 2 and 3 elements.
(c) Find all subsets of the set {a, b, c, d}. How many are there? Write down also the number of subsets with respectively 0, 1, 2, 3 and 4 elements.
(d) What patterns do you notice arising from these calculations?
4. If the set A has m elements and the set B has n elements how many elements does the set A × B have?
5. If A has m elements, how many elements does the set Aⁿ have?
6. Prove that two sets A and B are equal if, and only if, A ⊆ B and B ⊆ A.
2.10 Proof by induction
This is a method of proof that, although useful, does not always deliver much insight into why something is true. The basis of this method is the following:
Let X be a subset of ℕ that satisfies the following two conditions: first, 0 ∈ X, and second, if n ∈ X then n + 1 ∈ X. Then X = ℕ.
This fact is called the induction principle, and can be viewed as one of the basic axioms describing the natural numbers. We may use it as a proof technique in the following way. Suppose we have an infinite number of statements S₀, S₁, S₂, ... which we want to prove. By the induction principle, it is enough to do two things:
1. Show that S0 is true.
2. Show that if Sn is true then Sn+1 is also true.
It will then follow that Si is true for all positive i. Proofs by induction have the following script: 2.10. PROOF BY INDUCTION 47
Base step Show that the case n = 0 holds.
Induction hypothesis (IH) Assume that the case where n = k holds.
Proof bit Now use (IH) to show that the case where n = k + 1 holds.
Conclude that the result holds for all n by the induction principle.
Example 2.10.1. Prove by induction that $n^3 + 2n$ is exactly divisible by 3 for all natural numbers n ≥ 0.
Base step: when n = 0, we have that $0^3 + 2 \cdot 0 = 0$ which is exactly divisible by 3.
Induction hypothesis: assume the result is true for n = k. We prove it for n = k + 1. We need to prove that $(k + 1)^3 + 2(k + 1)$ is exactly divisible by 3 assuming only that $k^3 + 2k$ is exactly divisible by 3. We first expand $(k + 1)^3 + 2(k + 1)$ to get
\[ k^3 + 3k^2 + 3k + 1 + 2k + 2. \]
This is equal to $(k^3 + 2k) + 3(k^2 + k + 1)$ which is exactly divisible by 3 using the induction hypothesis.
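The two ingredients of this proof can be spot-checked numerically. The sketch below is a finite sanity check and not a proof; the function name `f` and the range of 1000 values are arbitrary choices made for illustration.

```python
def f(n):
    """The expression n^3 + 2n from Example 2.10.1."""
    return n**3 + 2 * n


# Base step: f(0) is divisible by 3.
base_step_ok = f(0) % 3 == 0

# The identity used in the induction step:
# (k+1)^3 + 2(k+1) = (k^3 + 2k) + 3(k^2 + k + 1).
identity_ok = all(
    f(k + 1) == f(k) + 3 * (k**2 + k + 1)
    for k in range(1000)
)

# A finite check of the conclusion itself.
divisible = all(f(n) % 3 == 0 for n in range(1000))
```

The identity is what carries divisibility from k to k + 1: the second summand is visibly a multiple of 3, so the sum is divisible by 3 whenever the first summand is.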
In practice, some simple variants of this principle are used. Rather than the whole set N, we often work with a set of the form
\[ \mathbb{N}^{\geq k} = \mathbb{N} \setminus \{0, 1, \ldots, k - 1\} \]
where k ≥ 1. Our induction principle is modified accordingly: a subset X of $\mathbb{N}^{\geq k}$ that contains k and contains n + 1 whenever it contains n must be equal to the whole of $\mathbb{N}^{\geq k}$. In our script above, the base step involves checking the case where n = k.
What I described above I shall call basic induction. There is also something called the strong induction principle which runs as follows:
Let X be a subset of N that satisfies the following two conditions: first, 0 ∈ X and second, if {0, 1, . . . , n} ⊆ X, where n ≥ 0, then {0, 1, . . . , n + 1} ⊆ X. Then X = N.
Finally, there is the well-ordering principle that states that every non-empty subset of the natural numbers has a smallest element. Induction, strong induction and well-ordering look very different from each other. In fact, they are equivalent and all useful in proving theorems.
Proposition 2.10.2. The following are equivalent.
1. The induction principle.
2. The strong induction principle.
3. The well-ordering principle.
Proof. (1)⇒(2). I shall assume that the induction principle holds and prove that the strong induction principle holds. Let X ⊆ N be such that 0 ∈ X and if {0, 1, . . . , n} ⊆ X, where n ≥ 0, then {0, 1, . . . , n + 1} ⊆ X. We shall use induction to prove that X = N. Let Y ⊆ N consist of all natural numbers n such that {0, 1, . . . , n} ⊆ X. We have that 0 ∈ Y and we have that n + 1 ∈ Y whenever n ∈ Y. By induction, we deduce that Y = N. It follows that X = N.
(2)⇒(3). I shall assume that the strong induction principle holds and prove that the well-ordering principle holds. Let X ⊆ N be a subset that has no smallest element. I shall prove that X must be empty. Put Y = N \ X. I claim that 0 ∈ Y. If not, then 0 ∈ X and that would obviously have to be the smallest element, which is a contradiction. Suppose that {0, 1, . . . , n} ⊆ Y. Then we must have that n + 1 ∈ Y because otherwise n + 1 would be the smallest element of X. We now invoke strong induction to deduce that Y = N and so X = ∅.
(3)⇒(1). I shall assume the well-ordering principle and prove the induction principle. Let X ⊆ N be a subset such that 0 ∈ X and whenever n ∈ X then n + 1 ∈ X. Suppose that N \ X is non-empty. Then it would have a smallest element k say. Since 0 ∈ X, we have that k ≥ 1. But then k − 1 ∈ X and so, by assumption, k ∈ X, which is a contradiction. Thus N \ X is empty and so X = N.
Strong induction will be used in a few places in this book but I will discuss it in more detail when needed.
Exercises 2.10
1. Prove that for each natural number n ≥ 3, we have that $n^2 > 2n + 1$.
2. Prove that for each natural number n ≥ 5, we have that $2^n > n^2$.
3. Prove that for each natural number n ≥ 1, the number $4^n + 2$ is divisible by 3.
4. Prove that
\[ 1 + 2 + 3 + \ldots + n = \frac{n(n + 1)}{2}. \]
5. Prove that 2 + 4 + 6 + ... + 2n = n(n + 1).
6. Prove that
\[ 1^3 + 2^3 + 3^3 + \ldots + n^3 = \left( \frac{n(n + 1)}{2} \right)^2. \]
7. Prove that a set with n ≥ 0 elements has exactly $2^n$ subsets.
Chapter 3
High-school algebra revisited
In this chapter, I will review some of the basic constructions from high-school algebra from the perspective of this book.
3.1 The rules of the game
3.1.1 The axioms
Algebra deals with the manipulation of symbols. This means that symbols are altered and combined according to certain rules. In high-school, the algebra you studied was mainly based on the properties of the real numbers. This means that when you write x you mean an unknown or yet-to-be-determined real number. In this section, I shall describe the rules, or axioms, that you use for doing algebra with real numbers. The primary operations we are interested in are addition x + y and multiplication x × y. As usual, I shall abbreviate the operation of multiplication by concatenation, which simply means we write xy. Sometimes, it is helpful to denote multiplication as follows x · y. Of course, there are two other familiar operations: subtraction and division. We shall see that these should be treated in a different way: subtraction as the inverse of addition, and division as the inverse of multiplication.
Both addition and multiplication require two inputs and then deliver one output with the inputs and outputs all being taken from the same set. They are therefore examples of what are called binary operations and are the commonest kinds of operations in algebra. For example, as we shall see later, matrix addition and matrix multiplication are both binary operations, the vector product of two vectors is a binary operation, and the intersection and union of two sets are both binary operations. A binary operation on a set X is nothing other than a function from X × X to X. I shall use ∗ to mean any binary operation defined on some specified set X. We usually write binary operations between the inputs rather than using the usual functional notation.
[Figure: a box labelled ∗ with inputs a and b and output a ∗ b.]
The two most important properties a binary operation may have are commutativity and associativity. A binary operation is commutative if a ∗ b = b ∗ a in all cases. That is, the order in which you carry out the operation is not important. Addition and multiplication of real, and as we shall see later, complex numbers are commutative. But we shall also meet binary operations that are not commutative: both matrix multiplication and vector products are examples. Commutativity is therefore not automatic. A binary operation is associative if (a ∗ b) ∗ c = a ∗ (b ∗ c) in all cases. Remember that the brackets tell you how to work out the product. Thus (a ∗ b) ∗ c means first work out a ∗ b, let’s call it d, and then work out d ∗ c. Almost all the binary operations we shall meet in this book are associative, the one important exception being the vector product. In order to show that a binary operation ∗ is associative, we have to check that all possible products (a ∗ b) ∗ c and a ∗ (b ∗ c) are equal. To show that a binary operation is not associative, we simply have to find specific values for a, b and c so that (a ∗ b) ∗ c ≠ a ∗ (b ∗ c). Here are examples of both of these possibilities.
Example 3.1.1. Let’s take the set of real numbers R and investigate a new binary operation denoted by ◦ that is defined as follows
a ◦ b = a + b + ab.
We shall prove that it is associative. First, we have to understand what it is we have to show. From the definition of associativity, we have to prove that (a ◦ b) ◦ c = a ◦ (b ◦ c) for all real numbers a, b and c. To do this, we calculate first the lefthand side and then the righthand side and then verify they are equal. Because we are trying to prove a result true for all real numbers, we cannot choose specific values of a, b and c. We first calculate (a ◦ b) ◦ c. Using the axioms for real numbers, we get that (a ◦ b) ◦ c = (a + b + ab) ◦ c = (a + b + ab) + c + (a + b + ab)c which is equal to a + b + c + ab + ac + bc + abc. Now we calculate a ◦ (b ◦ c). We get that a ◦ (b ◦ c) = a ◦ (b + c + bc) = a + (b + c + bc) + a(b + c + bc) which is equal to a + b + c + ab + ac + bc + abc. We now see that we get the same answers however we bracket the product and so we have proved that the binary operation ◦ is associative.
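A computer cannot replace the proof above, because it can only test finitely many values, but it can serve as a sanity check on the algebra. The sketch below assumes exact rational sample values chosen arbitrarily; the names `circ` and `samples` are illustrative only.

```python
from fractions import Fraction
from itertools import product


def circ(a, b):
    """The operation a ∘ b = a + b + ab from Example 3.1.1."""
    return a + b + a * b


# Arbitrary exact sample values; Fractions avoid floating-point rounding.
samples = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]

# Compare both bracketings on every triple of sample values.
associative_on_samples = all(
    circ(circ(a, b), c) == circ(a, circ(b, c))
    for a, b, c in product(samples, repeat=3)
)
```

Agreement on the samples is consistent with the proof but is not a substitute for it: only the symbolic calculation covers all real numbers.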
Example 3.1.2. Let’s take the set N and define the binary operation ⊕ as follows
\[ a \oplus b = a^2 + b^2. \]
I shall show that this binary operation is not associative. Let’s calculate first (1 ⊕ 2) ⊕ 3. By definition this is computed as follows
\[ (1 \oplus 2) \oplus 3 = (1^2 + 2^2) \oplus 3 = 5 \oplus 3 = 5^2 + 3^2 = 25 + 9 = 34. \]
Now we calculate 1 ⊕ (2 ⊕ 3) as follows
\[ 1 \oplus (2 \oplus 3) = 1 \oplus (2^2 + 3^2) = 1 \oplus (4 + 9) = 1 \oplus 13 = 1^2 + 13^2 = 1 + 169 = 170. \]
Therefore (1 ⊕ 2) ⊕ 3 ≠ 1 ⊕ (2 ⊕ 3). It follows that the binary operation ⊕ is not associative.
We are now ready to state the algebraic axioms that form the basis of high-school algebra. We shall split them up into three groups: those dealing only with addition, those dealing only with multiplication, and finally those that deal with both operations together.
Axioms for addition
(F1) Addition is associative. Let x, y and z be any real numbers. Then (x + y) + z = x + (y + z).
(F2) There is an additive identity. The number 0 (zero) is the additive identity. This means that for any real number x we have that x + 0 = x = 0 + x. Thus adding zero to a number leaves it unchanged.
(F3) Each element has a unique additive inverse. This means that for each number x there is another number, written −x, with the property that x+(−x) = 0 = (−x)+x. The number −x is called the additive inverse of the number x.
(F4) Addition is commutative. Let x and y be any real numbers. Then x + y = y + x. The word commutative means that the order in which you add the numbers does not matter.
The first thing to understand is that none of these axioms should be surprising. They should all agree with your intuition.
Axioms for multiplication
(F5) Multiplication is associative. Let x, y and z be any real numbers. Then (xy)z = x(yz).
(F6) There is a multiplicative identity. The number 1 is the multiplicative identity. This means that for any real number x we have that 1x = x = x1.
(F7) Each non-zero number has a unique multiplicative inverse. Let x ≠ 0. Then there is a unique real number written $x^{-1}$ with the property that $x^{-1}x = 1 = xx^{-1}$. The number $x^{-1}$ is called the multiplicative inverse of x. It is, of course, the number $\frac{1}{x}$. It is very important to observe that zero does not have a multiplicative inverse.
(F8) Multiplication is commutative. Let x and y be any real numbers. Then xy = yx. Once again the word commutative means that the order in which you carry out the operations doesn’t matter. In this case, the operation is multiplication.
The axioms for multiplication are very similar to those for addition. The only real difference between them is axiom (F7). This expresses the fact that you cannot divide by zero.
Linking axioms
(F9) 0 ≠ 1.
(F10) The additive identity is a multiplicative zero. This means that 0x = 0 = x0. If you multiply any real number by 0 then you get 0.
(F11) Multiplication distributes over addition on the left and the right. There are actually two distributive laws: the left distributive law
x(y + z) = xy + xz
and the right distributive law
(y + z)x = yx + zx.
Let me come back to the omission of subtraction and division. These are not viewed as binary operations in their own right. Instead, we define a − b to mean a + (−b). Thus to subtract b means the same thing as adding −b. Likewise, we define a ÷ b, when b ≠ 0, to mean $a \times b^{-1}$. Thus to divide by b is to multiply by $b^{-1}$. We have missed out one further ingredient in algebra, and that is the properties of equality.
Properties of equality
(E1) If a = b then c + a = c + b.
(E2) If a = b then ca = cb.
Example 3.1.3. When I talked about algebra in Chapter 1, I mentioned that the usual way of solving a linear equation in one unknown depended on the properties of real numbers. Let me now show you how we use the above axioms to solve ax + b = 0 where a ≠ 0. Throughout, I use without comment the two properties of equality I have listed above.
\begin{align*}
ax + b &= 0 \\
(ax + b) + (-b) &= 0 + (-b) && \text{by (F3)} \\
ax + (b + (-b)) &= 0 + (-b) && \text{by (F1)} \\
ax + 0 &= 0 + (-b) && \text{by (F3)} \\
ax &= 0 + (-b) && \text{by (F2)} \\
ax &= -b && \text{by (F2)} \\
a^{-1}(ax) &= a^{-1}(-b) && \text{by (F7) since } a \neq 0 \\
(a^{-1}a)x &= a^{-1}(-b) && \text{by (F5)} \\
1x &= a^{-1}(-b) && \text{by (F7)} \\
x &= a^{-1}(-b) && \text{by (F6)}
\end{align*}
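The end result of this derivation, $x = a^{-1}(-b)$, translates directly into a small function. This is a minimal sketch using exact rational arithmetic; the name `solve_linear` and the sample coefficients 3 and 5 are assumptions made for illustration.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve ax + b = 0 for a != 0 as x = a^(-1) * (-b)."""
    a, b = Fraction(a), Fraction(b)
    if a == 0:
        # Zero has no multiplicative inverse, see axiom (F7).
        raise ValueError("a must be non-zero")
    a_inverse = 1 / a        # the multiplicative inverse of a
    return a_inverse * (-b)  # -b is the additive inverse of b


x = solve_linear(3, 5)       # solves 3x + 5 = 0
residual = 3 * x + 5         # substituting back should give 0
```

The explicit test for a = 0 mirrors the hypothesis a ≠ 0 in the example: without it, the step that multiplies by $a^{-1}$ is unavailable.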
I don’t propose that you go into quite such gory detail when solving equations, but I wanted to show you what actually lay behind the rules that you might have been taught at school.
Example 3.1.4. We can use our axioms to prove that −1 × −1 = 1, something which is hard to understand in any other way. By definition, −1 is the additive inverse of 1. This means that 1 + (−1) = 0. Let us calculate (−1)(−1) − 1. We have that
\begin{align*}
(-1)(-1) - 1 &= (-1)(-1) + (-1) && \text{by definition of subtraction} \\
&= (-1)(-1) + (-1)1 && \text{since 1 is the multiplicative identity} \\
&= (-1)[(-1) + 1] && \text{by the left distributivity law} \\
&= (-1)0 && \text{by properties of additive inverses} \\
&= 0 && \text{by properties of zero.}
\end{align*}
Hence (−1)(−1) = 1. In other words, the result follows from the usual rules of algebra.
My final example explains the reason for the prohibition about dividing by zero.
Example 3.1.5. The following fallacious ‘proof’ shows that 1 = 2.
1. Let a = b.
2. Then $a^2 = ab$ when we multiply both sides by a.
3. Now add $a^2$ to both sides to get $2a^2 = a^2 + ab$.
4. Subtract 2ab from both sides to get $2a^2 - 2ab = a^2 + ab - 2ab$.
5. Thus $2(a^2 - ab) = a^2 - ab$.
6. We deduce that 2 = 1 by cancelling.
The error lies in passing from line (5) to line (6). Since a = b, we have that $a^2 - ab = 0$, and so the cancellation is in fact a division by zero.
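The faulty step can be made visible by carrying out the ‘proof’ with an actual number. In the sketch below the common value 7 is an arbitrary assumption and the variable names are illustrative; any other choice with a = b behaves in the same way.

```python
from fractions import Fraction

a = b = Fraction(7)             # step 1: a = b (7 is an arbitrary choice)

lhs = 2 * (a**2 - a * b)        # left-hand side of step 5
rhs = a**2 - a * b              # right-hand side of step 5
common_factor = a**2 - a * b    # the quantity 'cancelled' in step 6

step5_holds = lhs == rhs        # true, but it only says that 2 * 0 == 0

try:
    lhs / common_factor         # step 6 would require this division
    division_ok = True
except ZeroDivisionError:
    division_ok = False
```

Step 5 is a perfectly correct equation, but both of its sides are zero, and the attempted cancellation raises `ZeroDivisionError`.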
3.1.2 Indices
We usually write $a^2$ rather than aa, and $a^3$ instead of aaa. In this section, I want to review the meaning of algebraic expressions such as $a^{r/s}$ where $\frac{r}{s}$ is any rational number. Our starting point is a result that I would encourage you to assume as an axiom at a first reading. I have included the proof to show you a more sophisticated example of proof by induction.
Lemma 3.1.6 (Generalized associativity). Let ∗ be any binary operation defined on a set X. If ∗ is associative then however you bracket a product such as $x_1 \ast \ldots \ast x_n$ you will always get the same answer.
Proof. If $x_1, x_2, \ldots, x_n$ are elements of the set X then one particular bracketing will play an important role in our proof
\[ x_1 \ast (x_2 \ast (\cdots (x_{n-1} \ast x_n) \cdots )) \]
which we write as $[x_1 x_2 \ldots x_n]$.
The proof is by strong induction on the length n of the product in question. The base case is where n = 3 and is just an application of the associative law. Assume that n ≥ 4 and that for all k < n, all bracketings of a sequence of k elements of X lead to the same answer. This is therefore the induction hypothesis for strong induction. Let X denote any properly bracketed expression obtained by inserting brackets into the sequence $x_1, x_2, \ldots, x_n$. Observe that the computation of such a bracketed product involves computing n − 1 products. This is because at each step we can only compute the product of adjacent letters $x_i \ast x_{i+1}$. Thus at each step of our calculation we reduce the number of letters by one until there is only one letter left. However the expression may be bracketed, the final step in the computation will be of the form Y ∗ Z, where Y and Z will each have arisen from properly bracketed expressions. In the case of Y it will involve a bracketing of some sequence $x_1, x_2, \ldots, x_r$, and for Z the sequence $x_{r+1}, x_{r+2}, \ldots, x_n$ for some r such that 1 ≤ r ≤ n − 1. Since Y involves a product of length r < n, we may assume by the induction hypothesis that $Y = [x_1 x_2 \ldots x_r]$. If r = 1 then $X = x_1 \ast Z$ already. Otherwise, observe that $[x_1 x_2 \ldots x_r] = x_1 \ast [x_2 \ldots x_r]$. Hence by associativity,
\[ X = Y \ast Z = (x_1 \ast [x_2 \ldots x_r]) \ast Z = x_1 \ast ([x_2 \ldots x_r] \ast Z). \]
But $[x_2 \ldots x_r] \ast Z$ is a properly bracketed expression of length n − 1 in $x_2, \ldots, x_n$ and so using the induction hypothesis must equal $[x_2 x_3 \ldots x_n]$. It follows that $X = [x_1 x_2 \ldots x_n]$. We have therefore shown that all possible bracketings yield the same result in the presence of associativity.
We illustrate a special case of the above proof in the example below.
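The lemma can also be sanity-checked by brute force before reading the worked example. The sketch below enumerates every proper bracketing by choosing the final step Y ∗ Z in all possible ways, just as in the proof. It is an illustration under assumptions: the function names are hypothetical, and the two operations tried are the associative ◦ of Example 3.1.1 and the non-associative ⊕ of Example 3.1.2.

```python
def all_bracketings(xs, op):
    """Values of every properly bracketed product of the sequence xs."""
    if len(xs) == 1:
        return [xs[0]]
    values = []
    # The final step is always Y * Z: Y comes from xs[:r] and Z from xs[r:].
    for r in range(1, len(xs)):
        for y in all_bracketings(xs[:r], op):
            for z in all_bracketings(xs[r:], op):
                values.append(op(y, z))
    return values


def circ(a, b):
    """Associative: a ∘ b = a + b + ab (Example 3.1.1)."""
    return a + b + a * b


def oplus(a, b):
    """Not associative: a ⊕ b = a^2 + b^2 (Example 3.1.2)."""
    return a**2 + b**2


xs = (1, 2, 3, 4, 5)
circ_values = all_bracketings(xs, circ)
oplus_values = all_bracketings(xs, oplus)
```

For five letters there are 14 bracketings. With ◦ they all give the same value, whereas with ⊕ several different values appear, so the associativity hypothesis in the lemma cannot be dropped.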
Example 3.1.7. Take n = 5. Then the notation $[x_1 x_2 x_3 x_4 x_5]$ introduced in the above proof means $x_1 \ast (x_2 \ast (x_3 \ast (x_4 \ast x_5)))$. Consider the product $((x_1 \ast x_2) \ast x_3) \ast (x_4 \ast x_5)$. Here we have $Y = (x_1 \ast x_2) \ast x_3$ and $Z = x_4 \ast x_5$. By associativity $Y = x_1 \ast (x_2 \ast x_3)$. Thus $Y \ast Z = (x_1 \ast (x_2 \ast x_3)) \ast (x_4 \ast x_5)$. But this is equal to $x_1 \ast ((x_2 \ast x_3) \ast (x_4 \ast x_5))$ again by associativity. By the induction hypothesis $(x_2 \ast x_3) \ast (x_4 \ast x_5) = x_2 \ast (x_3 \ast (x_4 \ast x_5))$, and so
\[ ((x_1 \ast x_2) \ast x_3) \ast (x_4 \ast x_5) = x_1 \ast (x_2 \ast (x_3 \ast (x_4 \ast x_5))), \]
as required.
If a binary operation is associative then the above lemma tells us that computing products of elements is straightforward because we never have to worry about how to evaluate it as long as we maintain the order of the elements. We now consider a special case of this result. Let a be any real number. Define the nth power $a^n$ of a, where n is a natural number, as follows: $a^1 = a$ and $a^n = a a^{n-1}$ for any n ≥ 2. Generalized associativity tells us that $a^n$ can in fact be calculated in any way we like because we shall always obtain the same answer. The following result should be familiar. I shall ask you to prove it in the exercises.
Lemma 3.1.8 (Laws of exponents). Let m, n ≥ 1 be any natural numbers.
1. $a^{m+n} = a^m a^n$.
2. $(a^m)^n = a^{mn}$.
It follows from the above lemma that powers of the same element a commute with one another: $a^m a^n = a^n a^m$ as both products equal $a^{m+n}$. Our goal now is to define what $a^m$ means when m is an arbitrary rational number. We shall be guided by the requirement that the above laws of exponents should continue to hold. We may extend the laws of exponents to allow m or n to be 0. The only way to do this is to define $a^0 = 1$, where 1 is the identity and a ≠ 0.
An extreme case! What about $0^0$? This is a can of worms. For this book, it is probably best to define $0^0 = 1$.
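The recursive definition of powers, extended by the convention that the zeroth power is 1, is easy to sketch in code. The function `power` below is a hypothetical helper written for illustration, and the base 3/2 and the range of exponents are arbitrary; the check of the laws of exponents is a finite spot check rather than a proof.

```python
from fractions import Fraction


def power(a, n):
    """a^n for an integer n >= 0, from a^0 = 1 and a^n = a * a^(n-1)."""
    result = Fraction(1)      # the empty product; this also gives 0^0 = 1
    for _ in range(n):
        result = a * result
    return result


a = Fraction(3, 2)            # arbitrary sample base
laws_hold = all(
    power(a, m + n) == power(a, m) * power(a, n)
    and power(power(a, m), n) == power(a, m * n)
    for m in range(6)
    for n in range(6)
)
zero_to_zero = power(Fraction(0), 0)
```

Starting the loop from 1 is what forces the zeroth power of 0 to be 1 here. Python’s built-in `**` operator follows the same convention for integers, so `0**0 == 1`.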
We have explained what $a^n$ means when n is positive but what can we say when the exponent is negative? In other words, what does $a^{-n}$ mean? We assume that the rules above still apply. Thus whatever $a^{-n}$ means we should have that $a^{-n} a^n = a^0 = 1$. It follows that $a^{-n} = \frac{1}{a^n}$. With this interpretation we have defined $a^n$ for all integer values of n.
We now investigate what $a^{1/n}$ should mean. If the laws of exponents are to continue holding, then $(a^{1/n})^n = a^1 = a$. It follows that $a^{1/n} = \sqrt[n]{a}$. We may now calculate $a^{r/s}$: it is equal to
\[ a^{r/s} = \left( \sqrt[s]{a} \right)^r. \]
How do we calculate $(ab)^n$? This is just ab times itself n times. But the order in which we multiply a’s and b’s doesn’t matter and so we can arrange all the a’s to the front. Thus $(ab)^n = a^n b^n$.
We also have similar results for addition. We define 2x = x + x and nx = x + ... + x where the x occurs n times. We have 1x = x and 0x = 0.
Let $\{a_1, \ldots, a_n\}$ be a set of n elements. If we write them all in some order $a_{i_1}, \ldots, a_{i_n}$ then we have what is called a permutation of the elements. The following lemma can be treated as an axiom and the proof omitted until later.
Lemma 3.1.9 (Generalized commutativity). Let ∗ be an associative and commutative binary operation on a set X. Let $a_1, \ldots, a_n$ be any n elements of X. Then
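\[ a_1 \ast \ldots \ast a_n = a_{i_1} \ast \ldots \ast a_{i_n}. \]
Proof. First prove by induction the result that
\[ a_1 \ast \ldots \ast a_n \ast b = b \ast a_1 \ast \ldots \ast a_n. \]
Let $a_1, \ldots, a_n, a_{n+1}$ be n + 1 elements. Consider the product $a_{i_1} \ast \ldots \ast a_{i_n} \ast a_{i_{n+1}}$. Suppose that $a_{n+1} = a_{i_r}$. Then
\[ a_{i_1} \ast \ldots \ast a_{i_r} \ast \ldots \ast a_{i_n} \ast a_{i_{n+1}} = (a_{i_1} \ast \ldots \ast a_{i_{r-1}} \ast a_{i_{r+1}} \ast \ldots \ast a_{i_{n+1}}) \ast a_{n+1} \]
where the expression in the brackets is a product of some permutation of the elements $a_1, \ldots, a_n$. We have used here our result above. But by the induction hypothesis, the product in the brackets is equal to $a_1 \ast \ldots \ast a_n$, and so the whole product is equal to $a_1 \ast \ldots \ast a_n \ast a_{n+1}$.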
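For a small number of elements the lemma can be checked exhaustively by running through all permutations. The sketch below is illustrative only: the element values and helper names are assumptions, addition stands in for an associative and commutative operation, and subtraction for one that is neither.

```python
from functools import reduce
from itertools import permutations


def product_in_order(elements, op):
    """Combine the elements from left to right with the binary operation op."""
    return reduce(op, elements)


def add(x, y):
    """Associative and commutative."""
    return x + y


def subtract(x, y):
    """Neither associative nor commutative."""
    return x - y


elements = (2, 3, 5, 7)  # arbitrary sample values
add_results = {product_in_order(p, add) for p in permutations(elements)}
sub_results = {product_in_order(p, subtract) for p in permutations(elements)}
```

All 24 orderings give the same sum, so `add_results` contains a single value, whereas `sub_results` contains several, which shows that the hypotheses of the lemma are doing real work.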
3.1.3 Sigma notation
At this point, it is appropriate to introduce some useful notation. Let $a_1, a_2, \ldots, a_n$ be n numbers. Their sum is $a_1 + a_2 + \ldots + a_n$ and because of generalized associativity we don’t have to worry about brackets. We now abbreviate this as
\[ \sum_{i=1}^{n} a_i, \]
where Σ is Greek ‘S’ and stands for Sum. The letter i is called a subscript. The equality i = 1 tells us that we start the value of i at 1. The equality i = n tells us that we end the value of i at n. Although I have started the sum at 1, I could, in other circumstances, have started at 0, or any other appropriate number. This notation is very useful and can be manipulated using the rules above. If 1 < s < n, then we can write
\[ \sum_{i=1}^{n} a_i = \sum_{i=1}^{s} a_i + \sum_{i=s+1}^{n} a_i. \]
If b is any number then
\[ b \left( \sum_{i=1}^{n} a_i \right) = \sum_{i=1}^{n} b a_i \]
is the generalized distributivity law that you are asked to prove in the exercises. These uses of sigma-notation shouldn’t cause any problems.
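Both manipulations translate directly into sums over a list. In the sketch below the values of the numbers, the split point s = 3 and the multiplier b = 5/2 are arbitrary assumptions, and `sigma` is a hypothetical helper that uses the same 1-based, inclusive limits as the Σ-notation.

```python
from fractions import Fraction

# a_1, ..., a_8 with arbitrary exact values.
a = [Fraction(k, k + 1) for k in range(1, 9)]
n = len(a)


def sigma(first, last):
    """Sum of a_i for i = first, ..., last (1-based and inclusive)."""
    return sum(a[first - 1:last], Fraction(0))


s = 3  # any s with 1 < s < n
split_ok = sigma(1, n) == sigma(1, s) + sigma(s + 1, n)

b = Fraction(5, 2)
distributive_ok = b * sigma(1, n) == sum((b * x for x in a), Fraction(0))
```

Python lists are indexed from 0, which is why the helper shifts the lower limit by one; the mathematics is unaffected by this bookkeeping.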
The most complicated use of Σ-notation arises when we have to sum up what is called an array of numbers $a_{ij}$ where 1 ≤ i ≤ m and 1 ≤ j ≤ n. This arises in matrix theory, for example. For concreteness, I shall give the example where m = 3 and n = 4. We can therefore think of the numbers $a_{ij}$ as being arranged in a 3 × 4 array as follows:
\[ \begin{array}{cccc} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{array} \]
Observe that the first subscript tells you the row and the second subscript tells you the column. Thus $a_{23}$ is the number in the second row and the third column. Now we can add these numbers up in two different ways getting the same answer in both cases. The first way is to add the numbers up along the rows. So, we calculate the following sums
\[ \sum_{j=1}^{4} a_{1j}, \quad \sum_{j=1}^{4} a_{2j}, \quad \sum_{j=1}^{4} a_{3j}. \]
We then add up these three numbers
\[ \sum_{j=1}^{4} a_{1j} + \sum_{j=1}^{4} a_{2j} + \sum_{j=1}^{4} a_{3j} = \sum_{i=1}^{3} \left( \sum_{j=1}^{4} a_{ij} \right). \]
The second way is to add the numbers up along the columns. So, we calculate the following sums
\[ \sum_{i=1}^{3} a_{i1}, \quad \sum_{i=1}^{3} a_{i2}, \quad \sum_{i=1}^{3} a_{i3}, \quad \sum_{i=1}^{3} a_{i4}. \]
We then add up these four numbers
\[ \sum_{i=1}^{3} a_{i1} + \sum_{i=1}^{3} a_{i2} + \sum_{i=1}^{3} a_{i3} + \sum_{i=1}^{3} a_{i4} = \sum_{j=1}^{4} \left( \sum_{i=1}^{3} a_{ij} \right). \]
The fact that
\[ \sum_{i=1}^{3} \left( \sum_{j=1}^{4} a_{ij} \right) = \sum_{j=1}^{4} \left( \sum_{i=1}^{3} a_{ij} \right) \]
is a consequence of the generalized commutativity law that you are asked to prove in the exercises. We therefore have in general that
\[ \sum_{i=1}^{m} \left( \sum_{j=1}^{n} a_{ij} \right) = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} a_{ij} \right). \]
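The two orders of summation can be compared on a concrete array. The entries below are hypothetical values chosen for illustration, with `rows[i - 1][j - 1]` playing the role of $a_{ij}$.

```python
# A hypothetical 3 x 4 array; rows[i - 1][j - 1] plays the role of a_ij.
rows = [
    [2, 7, 1, 8],
    [2, 8, 1, 8],
    [2, 8, 4, 5],
]
m, n = len(rows), len(rows[0])

# First way: sum along each row, then add up the m row sums.
row_sums = [sum(rows[i][j] for j in range(n)) for i in range(m)]
total_by_rows = sum(row_sums)

# Second way: sum along each column, then add up the n column sums.
column_sums = [sum(rows[i][j] for i in range(m)) for j in range(n)]
total_by_columns = sum(column_sums)
```

The intermediate lists differ (three row sums against four column sums), but the two grand totals agree, as the general identity predicts.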
3.1.4 Infinite sums
What I have defined so far are finite sums and form part of algebra. There are also infinite sums
\[ \sum_{i=1}^{\infty} a_i \]
which form part of analysis, the subject that provides the foundations for calculus. There is one place where we use infinite sums in everyday life, and that is in the decimal representations of numbers. Thus the fraction $\frac{1}{3}$ can be written as 0 · 3333 ... and this is in fact an infinite sum: it means the infinite sum
\[ \sum_{i=1}^{\infty} \frac{3}{10^i}. \]
But in general infinite sums are problematic. For example, consider the infinite sum
\[ S = \sum_{i=1}^{\infty} (-1)^{i+1}. \]
So, this is just S = 1 − 1 + 1 − 1 + ... What is S? Your first instinct might be to say 0 because
S = (1 − 1) + (1 − 1) + ...
But it could equally well be 1 calculated as follows
S = 1 + (−1 + 1) + (−1 + 1) + ...
In fact, it could even be $\frac{1}{2}$ since S + S = 1 and so $S = \frac{1}{2}$. There is clearly something seriously awry here, and it is that infinite sums have to be handled very carefully if they are to make sense. Just how is the business of analysis and won’t be an issue in this book.
Warning! ∞ is not a number. It simply tells us to keep adding on terms for increasing values of i without end so we never write
\[ \frac{3}{10^{\infty}}. \]
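The contrast between the two infinite sums above shows up clearly if we compute only finitely many terms. The sketch below uses partial sums, which is the standard device from analysis and is an assumption here since the text does not develop it; the function names are illustrative.

```python
from fractions import Fraction


def decimal_partial_sum(n):
    """Sum of 3/10^i for i = 1, ..., n, that is 0.33...3 with n threes."""
    return sum((Fraction(3, 10**i) for i in range(1, n + 1)), Fraction(0))


def alternating_partial_sum(n):
    """Sum of (-1)^(i+1) for i = 1, ..., n."""
    return sum((-1) ** (i + 1) for i in range(1, n + 1))


# How far the first few decimal partial sums fall short of 1/3.
gaps = [Fraction(1, 3) - decimal_partial_sum(n) for n in range(1, 6)]

# The first few partial sums of 1 - 1 + 1 - 1 + ...
alternating = [alternating_partial_sum(n) for n in range(1, 7)]
```

The gaps shrink by a factor of ten at each step, so the decimal sums settle down towards one third. The alternating sums flip between 1 and 0 for ever and settle on nothing, which is one way of seeing why no single value for S is forced.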
Exercises 3.1
1. Prove the following identities using the axioms introduced.
(a) $(a + b)^2 = a^2 + 2ab + b^2$.
(b) $(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$.
(c) $a^2 - b^2 = (a + b)(a - b)$.
(d) $(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$.
2. Calculate the following.
(a) $2^3$.
(b) $2^{\frac{1}{3}}$.
(c) $2^{-4}$.
(d) $2^{-\frac{3}{2}}$.
3. Assume that aij are assigned the following values
\[ \begin{array}{llll} a_{11} = 1 & a_{12} = 2 & a_{13} = 3 & a_{14} = 4 \\ a_{21} = 5 & a_{22} = 6 & a_{23} = 7 & a_{24} = 8 \\ a_{31} = 9 & a_{32} = 10 & a_{33} = 11 & a_{34} = 12 \end{array} \]
Calculate the following sums.
(a) $\sum_{i=1}^{3} a_{i2}$.
(b) $\sum_{j=1}^{4} a_{3j}$.
(c) $\sum_{i=1}^{3} \sum_{j=1}^{4} a_{ij}^2$.
4. Let a, b, c ∈ R. If ab = ac is it true that b = c? Explain.
5. Laws of exponents.
(a) Prove by induction that $a^{m+n} = a^m a^n$. To do this, fix m and then prove the result by induction on n. Deduce that it holds for all m.
(b) Prove by induction that $(a^m)^n = a^{mn}$. To do this, fix m and then prove the result by induction on n. Deduce that it holds for all m.
6. Prove by induction that the left generalized distributivity law holds
\[ a(b_1 + b_2 + b_3 + \ldots + b_n) = ab_1 + ab_2 + ab_3 + \ldots + ab_n, \]
for any n ≥ 2.
3.2 Solving quadratic equations
The previous section might have given the impression that algebraic calculations are routine. In fact, once you pass beyond linear equations, they usually require good ideas. The first place where a good idea is needed is in solving quadratic equations. Quadratic equations were solved by the Babylonians and the Egyptians and are dealt with in all school algebra courses. I have included them here because I want to show you that you don’t have to remember a formula to solve such equations; what you have to remember is a method. Let’s recall some definitions. An expression of the form
\[ ax^2 + bx + c \]
where a, b, c are numbers and a ≠ 0 is called a quadratic polynomial or a polynomial of degree 2. The numbers a, b, c are called the coefficients of the quadratic. A quadratic where a = 1 is said to be monic. A number r such that
\[ ar^2 + br + c = 0 \]
is called a root of the polynomial. The problem of finding all the roots of a quadratic is called solving the quadratic. Usually this problem is stated in the form: ‘solve the quadratic equation $ax^2 + bx + c = 0$’. Equation because we have set the polynomial equal to zero. I shall now show you how to solve a quadratic equation without having to remember a formula. Observe first that if $ax^2 + bx + c = 0$ then
\[ x^2 + \frac{b}{a}x + \frac{c}{a} = 0. \]
Thus it is enough to find the roots of monic quadratics. We shall solve this equation by trying to do the following: write $x^2 + \frac{b}{a}x$ as a perfect square plus a number. This will turn out to be the crux of solving the quadratic. We shall illustrate our construction by using some diagrams. First, we represent geometrically the expression $x^2 + \frac{b}{a}x$.
[Figure: a square with sides x adjoined to a rectangle with sides x and b/a.]
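Now cut the red rectangle into two pieces along the dotted line and rearrange them as shown below.
[Figure: the square with sides x, with a rectangle with sides x and b/2a attached to each of two adjacent sides; a small dotted square with sides b/2a fills the remaining corner.]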
It is now geometrically obvious that if we add in the small dotted square, we get a new bigger square. This explains why the procedure is called completing the square. We now express in algebraic terms what these diagrams suggest.
\[ x^2 + \frac{b}{a}x = \left( x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} \right) - \frac{b^2}{4a^2} = \left( x + \frac{b}{2a} \right)^2 - \frac{b^2}{4a^2}. \]
We therefore have that
\[ x^2 + \frac{b}{a}x = \left( x + \frac{b}{2a} \right)^2 - \frac{b^2}{4a^2}. \]
Look carefully at what we have done here: we have rewritten the lefthand side as a perfect square (the first term on the righthand side) plus a number (the second term on the righthand side). It follows that
\[ x^2 + \frac{b}{a}x + \frac{c}{a} = \left( x + \frac{b}{2a} \right)^2 - \frac{b^2}{4a^2} + \frac{c}{a} = \left( x + \frac{b}{2a} \right)^2 + \frac{4ac - b^2}{4a^2}. \]
Setting the last expression equal to zero and rearranging, we get
\[ \left( x + \frac{b}{2a} \right)^2 = \frac{b^2 - 4ac}{4a^2}. \]
Now take square roots of both sides, remembering that a non-zero number has two square roots:
\[ x + \frac{b}{2a} = \pm \sqrt{\frac{b^2 - 4ac}{4a^2}} \]
which of course simplifies to
\[ x + \frac{b}{2a} = \pm \frac{\sqrt{b^2 - 4ac}}{2a}. \]
Thus
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]
the usual formula for finding the roots of a quadratic.
Example 3.2.1. Solve the quadratic equation
\[ 2x^2 - 5x + 1 = 0 \]
by completing the square. Divide through by 2 to make the quadratic monic giving
\[ x^2 - \frac{5}{2}x + \frac{1}{2} = 0. \]
We now want to write
\[ x^2 - \frac{5}{2}x \]
as a perfect square plus a number. We get
\[ x^2 - \frac{5}{2}x = \left( x - \frac{5}{4} \right)^2 - \frac{25}{16}. \]
Thus our quadratic becomes
\[ \left( x - \frac{5}{4} \right)^2 - \frac{25}{16} + \frac{1}{2} = 0. \]
Rearranging and taking roots gives us
\[ x = \frac{5}{4} \pm \frac{\sqrt{17}}{4} = \frac{5 \pm \sqrt{17}}{4}. \]
We now check our answer by substituting each of our two roots back into the original quadratic and ensuring that we get zero in both cases.
For the quadratic equation $ax^2 + bx + c = 0$ the number $D = b^2 - 4ac$, called the discriminant of the quadratic, plays an important role.
• If D > 0 then the quadratic equation has two distinct real solutions.
• If D = 0 then the quadratic equation has one real root repeated. In this case, the monic quadratic $x^2 + \frac{b}{a}x + \frac{c}{a}$ is the perfect square $\left( x + \frac{b}{2a} \right)^2$.
• If D < 0 then we shall see that the quadratic equation has two complex roots which are complex conjugate to each other. This is called the irreducible case.
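The method of completing the square, together with the three cases for the discriminant, can be sketched as a short program. This is a minimal illustration under assumptions rather than a robust solver: the function names are hypothetical, floating-point arithmetic introduces small rounding errors, and the standard `cmath` module is used so that the case D < 0 yields complex roots.

```python
import cmath


def solve_by_completing_square(a, b, c):
    """Roots of ax^2 + bx + c = 0, from (x + b/2a)^2 = (b^2 - 4ac)/(4a^2)."""
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    discriminant = b * b - 4 * a * c
    shift = b / (2 * a)                             # the b/2a inside the square
    root = cmath.sqrt(discriminant / (4 * a * a))   # one square root of the right-hand side
    return discriminant, (-shift + root, -shift - root)


def classify(discriminant):
    """Describe the roots according to the sign of the discriminant."""
    if discriminant > 0:
        return "two distinct real roots"
    if discriminant == 0:
        return "one repeated real root"
    return "two complex conjugate roots"


D, (x1, x2) = solve_by_completing_square(2, -5, 1)   # the quadratic of Example 3.2.1
```

For this quadratic D = 17, so there are two distinct real roots. They are returned as complex numbers with zero imaginary part because `cmath.sqrt` always produces a complex value.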
If we put $y = ax^2 + bx + c$ then we may draw the graph of this equation. The roots of the original quadratic therefore correspond to the points where this graph crosses the x-axis. The diagrams below illustrate the three cases that can arise.
[Figure: three graphs of $y = ax^2 + bx + c$, one for each of the cases D > 0, D = 0 and D < 0.]
Exercises 3.2
1. Calculate the discriminants of the following quadratics and so determine whether they have two distinct roots, or repeated roots, or no real roots.
(a) $x^2 + 6x + 5$.
(b) $x^2 - 4x + 4$.
(c) $x^2 - 2x + 5$.
2. Solve the following quadratic equations by completing the square. Check your answers.
(a) $x^2 + 10x + 16 = 0$.
(b) $x^2 + 4x + 2 = 0$.
(c) $2x^2 - x - 7 = 0$.
3. I am thinking of two numbers x and y. I tell you their sum a and their product b. What are x and y in terms of a and b?
4. Let $p(x) = x^2 + bx + c$ be a monic quadratic with roots $x_1$ and $x_2$. Express the discriminant of p(x) in terms of $x_1$ and $x_2$.
5. This question is an interpretation of part of Book X of Euclid. We shall be interested in numbers of the form $a + \sqrt{b}$ where a and b are rational and b > 0 where $\sqrt{b}$ is irrational¹.
(a) If $\sqrt{a} = b + \sqrt{c}$ where $\sqrt{c}$ is irrational then b = 0.
(b) If $a + \sqrt{b} = c + \sqrt{d}$ where a and c are rational and $\sqrt{b}$ and $\sqrt{d}$ are irrational then a = c and $\sqrt{b} = \sqrt{d}$.
(c) Prove that the square roots of $a + \sqrt{b}$ have the form $\pm(\sqrt{x} + \sqrt{y})$.
¹Remember that irrational means not rational.
3.3 *Order
In addition to algebraic operations, the real numbers are also ordered: we can always say of two real numbers whether they are equal or whether one of them is bigger than the other. I shall write down first the axioms for order that hold both for rational and real numbers. The following notation is important. If a ≤ b and a ≠ b then we write a < b and say that a is strictly less than b.
Axioms for order
(O1) For every element a, we have that a ≤ a.
(O2) If a ≤ b and b ≤ a then a = b.
(O3) If a ≤ b and b ≤ c then a ≤ c.
(O4) Given any two elements a and b then either a ≤ b or b ≤ a or a = b.
If a > 0 then we say that it is positive and if a < 0 we say it is negative.
(O5) If a ≤ b and c ≤ d then a + c ≤ b + d.
(O6) If a ≤ b and c is positive then ac ≤ bc.
The only axiom that you really have to watch is (O6). Here is an example of a proof using these axioms.
Example 3.3.1. We prove that a ≤ b if, and only if, 0 ≤ b − a, that is, b − a is positive or zero. Since this statement involves an ‘if, and only if’ there are, as usual, two statements to be proved. Suppose first that a ≤ b. By axiom (O5), we may add −a to both sides to get a + (−a) ≤ b + (−a). But a + (−a) = 0 and b + (−a) = b − a, by definition. It follows that 0 ≤ b − a. Now we prove the converse. Suppose that 0 ≤ b − a. By definition, b − a = b + (−a). Thus 0 ≤ b + (−a). By axiom (O5), we may add a to both sides to get 0 + a ≤ (b + (−a)) + a. But 0 + a = a and (b + (−a)) + a quickly simplifies to b. We have therefore proved that a ≤ b, as required.
Exercises 3.3
1. Prove that between any two distinct rational numbers there is another rational number.
2. Prove the following using the axioms.
(a) If a ≤ b then −b ≤ −a.
(b) $a^2$ is positive for all a ≠ 0.
(c) If 0 < a < b then $0 < b^{-1} < a^{-1}$.
3.4 *The real numbers
The axioms I have introduced so far apply equally well to both the rational numbers Q and the real numbers R. But we have seen that although Q ⊆ R the two sets are not equal because we have proved that $\sqrt{2} \notin$ Q. In fact, we shall see later that there are many more irrational numbers than there are rational numbers. In this section, I shall explain the fundamental difference between rationals and reals. This material will not be needed in the rest of this book; instead, its rôle is to connect with the foundations of calculus, that is, with analysis.
It is convenient to write K to mean either Q or R in what follows because I want to make the same definitions for both sets. A non-empty subset A ⊆ K is said to be bounded above if there is some number b ∈ K so that for all a ∈ A we have that a ≤ b. For example, the set $A = \{2^n : n \geq 0\}$ is not bounded above since its elements get bigger and bigger without limit. On the other hand, the set $B = \{\frac{1}{2^n} : n \geq 0\}$ is bounded above, for example by 1. A non-empty subset A as above is said to have a least upper bound if you can find a number a ∈ K with the following two properties: first of all, a must be an upper bound for A and second of all if b is any upper bound for
A then a ≤ b. We shall now apply these definitions to a result we obtained earlier. Let
$A = \{a : a \in \mathbb{Q} \text{ and } a^2 \leq 2\}$
and let
$B = \{a : a \in \mathbb{R} \text{ and } a^2 \leq 2\}$.
Then A ⊆ Q and B ⊆ R. Both sets are bounded above: the number $1\frac{1}{2}$, for example, works in both cases. However, I shall prove that the subset A does not have a least upper bound, whereas the subset B does.
Let’s consider subset A first. Suppose that r were a least upper bound. I claim that $r^2$ would have to equal 2, which is impossible because we have proved that $\sqrt{2}$ is irrational.
Suppose first that $r^2 < 2$. Then I claim there is a rational number $r_1$ such that $r < r_1$ and $r_1^2 < 2$. Choose any rational number h such that 0 < h < 1 and
$h < \frac{2 - r^2}{2r + 1}$.
Put $r_1 = r + h$. By construction $r_1 > r$. We calculate $r_1^2$ as follows
$r_1^2 = r^2 + 2rh + h^2 = r^2 + (2r + h)h < r^2 + (2r + 1)h < r^2 + 2 - r^2 = 2.$
Thus $r_1^2 < 2$ as claimed. But this contradicts the fact that r is an upper bound of the set A.
Suppose now that $2 < r^2$. Then I claim that I can find a rational number $r_1$ such that $r_1 < r$ and $2 < r_1^2$. Put $h = \frac{r^2 - 2}{2r}$ and define $r_1 = r - h$. Clearly, $0 < r_1 < r$. We calculate $r_1^2$ as follows
$r_1^2 = r^2 - 2rh + h^2 = r^2 - (r^2 - 2) + h^2 > r^2 - (r^2 - 2) = 2.$
But this contradicts the fact that r is supposed to be a least upper bound. We have therefore proved that if r is a least upper bound of A then $r = \sqrt{2}$. But this is impossible because we have proved that $\sqrt{2}$ is irrational. Thus the set A does not have a least upper bound in the rationals. However, by essentially the same reasoning the set B does have a least upper bound in the reals: the number $\sqrt{2}$. This motivates the following definition. It is this axiom that is needed to develop calculus properly.
The completeness axiom for R Every non-empty subset of the reals that is bounded above has a least upper bound.
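The two constructions used in the argument about the set A can be checked with exact rational arithmetic. What follows is only a sketch in Python: the names `step_up` and `step_down` are made up for this illustration, and the particular h chosen in `step_up` is an assumption (any rational h satisfying the two stated inequalities would do).

```python
from fractions import Fraction

def step_up(r):
    """Given a rational r > 0 with r*r < 2, return a rational r1 > r with r1*r1 < 2."""
    bound = (2 - r * r) / (2 * r + 1)
    # Assumed choice of h: half the bound, capped at 1/2,
    # so that 0 < h < 1 and h < (2 - r^2)/(2r + 1).
    h = min(Fraction(1, 2), bound / 2)
    return r + h

def step_down(r):
    """Given a rational r > 0 with r*r > 2, return a rational r1 with 0 < r1 < r and r1*r1 > 2."""
    h = (r * r - 2) / (2 * r)
    return r - h

r = Fraction(7, 5)            # 49/25 < 2, so r is too small to be an upper bound
r1 = step_up(r)
print(r1, r < r1, r1 * r1 < 2)

s = Fraction(3, 2)            # 9/4 > 2, so s is an upper bound but not the least one
s1 = step_down(s)
print(s1, s1 < s, s1 * s1 > 2)
```

Repeating either step never stops inside the rationals, which is the computational shadow of the fact that A has no least upper bound in Q.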
The Peano Axioms
Set theory is supposed to be a framework in which all of mathematics can take place. Let me briefly sketch out how we can construct the real numbers using set theory. The starting point is the Peano axioms studied by G. Peano (1858–1932). These deal with a set P and an operation on this set called the successor function which for each n ∈ P produces a unique element $n^+$. The following four axioms should hold:
(P1) There is a distinguished element of P that we denote by 0.
(P2) There is no element n ∈ P such that $n^+ = 0$.
(P3) If m, n ∈ P and $m^+ = n^+$ then m = n.
(P4) If X ⊆ P is such that 0 ∈ X and, whenever n ∈ X, also $n^+$ ∈ X, then X = P.
By using ideas from set theory, one shows that P is essentially the set of natural numbers together with its operations of addition and multiplication. The natural numbers are deficient in that it is not always possible to solve equations of the form a + x = b because of the lack of negative numbers. However, we can use set theory to construct Z from N by using ordered pairs. The idea is to regard (a, b) as meaning a − b. However, there are many names for the same negative number so we should have (0, 1) and (2, 3) and (3, 4) all signifying the same number: namely, −1. To make this work, one uses another idea from set theory, that of equivalence relations, which we shall meet later. This gives rise to the set Z. Again using ideas from set theory, the usual operations can be constructed on Z. But the integers are deficient because we cannot always solve equations of the form ax + b = 0 because of the lack of rational numbers. To construct them we use ordered pairs again. This time (a, b), where b ≠ 0, is interpreted as $\frac{a}{b}$. But again we have the problem of multiple names for what should be the same number. Thus (1, 2) should equal (−1, −2) should equal (2, 4) and so forth. Once again this problem is solved by using an equivalence relation, and once again, the set which arises, which is denoted by Q, is endowed with the usual operations. As we have seen, the rationals are deficient in not containing numbers like $\sqrt{2}$. The intuitive idea behind the construction of the reals from the rationals is that we want to construct R as all the numbers that can be approximated arbitrarily closely by rational numbers. To do this, we form the set of all subsets X of Q which have the following characteristics: X ≠ ∅, X ≠ Q, if x ∈ X and y ≤ x then y ∈ X, and X doesn’t have a biggest element. These subsets are called Dedekind cuts and should be regarded as defining the real number r so that X consists of all the rational numbers less than r.
Chapter 4
Number theory
Number theory is one of the oldest branches of mathematics and deals, mainly, with the properties of the integers, the simplest kinds of numbers. It is a vast subject, and so this chapter can only be an introduction. The main result proved is that every natural number greater than one can be written as a product of powers of primes, a result known as the fundamental theorem of arithmetic. This shows that the primes are the building blocks, or atoms, from which all natural numbers are constructed. The primes are still the subject of intensive research and the source of many unanswered questions. It is ironic that the numbers we learn about first as children are the source of some of mathematics’ most difficult and interesting questions. The tool that enables this chapter to work is the remainder theorem so that is where we shall start.
4.1 The remainder theorem
We begin by stating a basic result that you may assume as an axiom but which I shall also set as a proof in one of the exercises.
Lemma 4.1.1 (Remainder Theorem). Let a and b be integers where b > 0. Then there are unique integers q and r such that
a = bq + r where 0 ≤ r < b.
The number q is called the quotient and the number r is called the remainder. For example, if we consider the pair of natural numbers 14 and 3 then 14 = 3 · 4 + 2 where 4 is the quotient and 2 is the remainder. Your first reaction to this result is that it is obvious and you might conclude from this that it is therefore uninteresting. But this would be wrong. It is certainly not hard to understand but despite that it is important. The reason is that whenever we have a question that involves divisibility, it is very likely going to require the use of this result.
Example 4.1.2. From the remainder theorem, we know that every natural number n can be written as n = 10q + r where 0 ≤ r ≤ 9. The integer r is nothing other than the units digit in the usual base 10 representation of n. Thus, for example, 42 = 10 × 4 + 2. Similarly, it is the remainder theorem that tells us that odd numbers are precisely those that leave remainder 1 when divided by 2.
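The remainder theorem is easy to experiment with. The sketch below uses Python's built-in `divmod`, which for a positive divisor returns a remainder in the range 0 ≤ r < b even when a is negative; the wrapper name `quotient_remainder` is an assumption made for this illustration.

```python
def quotient_remainder(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < b, for integers a and b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    # For b > 0, Python's floor division guarantees 0 <= r < b.
    return divmod(a, b)

q, r = quotient_remainder(14, 3)
print(q, r)                          # quotient 4, remainder 2
print(quotient_remainder(-14, 3))    # negative a still gives a remainder in 0..2
print(quotient_remainder(42, 10))    # the remainder is the units digit of 42
```

Note that for negative a the quotient is rounded down, not towards zero: −14 = 3 · (−5) + 1.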
Let a and b be integers where a ≠ 0. We say that a divides b or that b is divisible by a if there is a q such that b = aq. In other words, there is no remainder. We also say that a is a divisor or factor of b. We write a | b to mean the same thing as ‘a divides b’. It is very important to remember that a | b does not mean the same thing as $\frac{a}{b}$. The latter is a number, the former is a statement about two numbers. As a very simple example of the remainder theorem, we shall look at how we write numbers down.
I don’t think our hunter-gatherer ancestors worried too much about writing numbers down because there wasn’t any need: they didn’t have to fill in tax-returns and so didn’t need accountants. However, organizing cities does need accountants and so ways had to be found of writing numbers down. The simplest way of doing this is to use a mark like |, called a tally, for each thing being counted. So |||||||||| means 10 things. This system has advantages and disadvantages. The advantage is that you don’t have to go on a training course to learn it. The disadvantage is that even quite small numbers need a lot of space like
|||||||||||||||||||||||||||||||||||||| 4.1. THE REMAINDER THEOREM 77
It’s also hard to tell whether
||||||||||||||||||||||||||||||||||||||| is the same number or not. (It’s not.) It’s inevitable that people will in- troduce abbreviations to make the system easier to use. Perhaps it was in this way that the next development occurred. Both the ancient Egyptians and Romans used similar systems but I’ll describe the Roman system because it involves letters rather than pictures. First, you have a list of basic symbols:
number   1   5   10   50   100   500   1000
symbol   I   V   X    L    C     D     M
There are more symbols for bigger numbers. Numbers are then written according to the additive principle. Thus MMVIIII is 2009. Incidentally, I understand that the custom of also using a subtractive principle so that, for example, IX means 9 rather than using VIIII, is a more modern innovation. This system is clearly a great improvement on the tally-system. Even quite big numbers are written compactly and it is easy to compare numbers. On the other hand, there is more to learn. The other disadvantage is that we need separate symbols for different powers of 10 and their multiples by 5. This was probably not too inconvenient in the ancient world where it is likely that the numbers needed on a day-to-day basis were never going to be that big. A common criticism of this system is that it is hard to do multiplication in. However, that turns out to be a non-problem because, like us, the Romans used pocket calculators or, more accurately, a device called an abacus that could easily be carried under a toga. The real evidence for the usefulness of this system of writing numbers is that it survived for hundreds and hundreds of years.
The system used throughout the world today is quite different and is called the positional number system. It seems to have been in place by the ninth century in India but it was hundreds of years in development and the result of ideas from many different cultures: the invention of zero on its own is one of the great steps in human intellectual development. The genius of the system is that it requires only 10 symbols
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
and every natural number can be written using a sequence of these symbols. The trick to making the system work is that we use the position on the page of a symbol to tell us what number it means. Thus 2009 means
$10^3$   $10^2$   $10^1$   $10^0$
2        0        0        9
In other words $2 \times 10^3 + 0 \times 10^2 + 0 \times 10^1 + 9 \times 10^0$. Notice the important rôle played by the symbol 0, which makes it clear to which column a symbol belongs: otherwise we couldn’t tell 29 from 209 from 2009. The disadvantage of this system is that you do have to go on a course to learn it because it is a highly sophisticated way of writing numbers. On the other hand, it has the enormous advantage that any number can be written down in a compact way. Once the basic system had been accepted it could be adapted to deal not only with positive whole numbers but also negative whole numbers, using the symbol −, and also fractions with the introduction of the decimal point. By the end of the sixteenth century, the full decimal system was in place. In the UK, we use a raised decimal point like 0·123 and not a comma. Also we generally write the number 1 without a long hook at the top. If you do write it like that there is a danger that people will confuse it with the number 7 which is not always written in the UK with a line through it.
We shall now look in more detail at the way in which numbers can be written down using a positional notation. In order not to be biased, we shall not just work in base 10 but show how any base can be used. Our main tool is the remainder theorem. Let’s see how to represent numbers in base d where d ≥ 2. If d ≤ 10 then we represent numbers by sequences of symbols taken from the set
$\mathbb{Z}_d = \{0, 1, 2, 3, \ldots, d - 1\}$
but if d > 10 then we need new symbols for 10, 11, 12 and so forth. It’s convenient to use A, B, C, .... For example, if we want to write numbers in base 12 we use the set of symbols
{0, 1, ..., 9, A, B}
whereas if we work in base 16 we use the set of symbols
{0, 1,..., 9, A, B, C, D, E, F }.
If x is a sequence of symbols then we write $x_d$ to make it clear that we are to interpret this sequence as a number in base d. Thus $BAD_{16}$ is a number in base 16. The symbols in a sequence $x_d$, reading from right to left, tell us the contribution each power of d such as $d^0$, $d^1$, $d^2$, etc makes to the number the sequence represents. Here are some examples.
Examples 4.1.3. Converting from base d to base 10.
1. $11A9_{12}$ is a number in base 12. This represents the following number in base 10:
$1 \times 12^3 + 1 \times 12^2 + A \times 12^1 + 9 \times 12^0$,
which is just the number
$12^3 + 12^2 + 10 \times 12 + 9 = 2001$.
2. $BAD_{16}$ represents a number in base 16. This represents the following number in base 10:
$B \times 16^2 + A \times 16^1 + D \times 16^0$,
which is just the number
$11 \times 16^2 + 10 \times 16 + 13 = 2989$.
3. $5556_7$ represents a number in base 7. This represents the following number in base 10:
$5 \times 7^3 + 5 \times 7^2 + 5 \times 7^1 + 6 \times 7^0 = 2001$.
These examples show how easy it is to convert from base d to base 10.
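The conversions in Examples 4.1.3 can be reproduced mechanically. The following Python sketch is one possible way to do it; the function name `from_base` and the digit alphabet `DIGITS` are assumptions for this illustration. Instead of computing each power of d separately it reads the symbols from left to right and multiplies the running total by d at each step, which gives the same sum.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # assumed symbols for bases up to 36

def from_base(symbols, d):
    """Interpret the string `symbols` as a number written in base d."""
    if not 2 <= d <= len(DIGITS):
        raise ValueError("unsupported base")
    n = 0
    for s in symbols:
        value = DIGITS.index(s)        # raises ValueError for an unknown symbol
        if value >= d:
            raise ValueError(f"symbol {s!r} is not a digit in base {d}")
        n = n * d + value              # shift the total one place left, add the new digit
    return n

print(from_base("11A9", 12))
print(from_base("BAD", 16))
print(from_base("5556", 7))
```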
There are two ways to convert from base 10 to base d.
1. The first runs in outline as follows. Let n be the number in base 10 that we wish to write in base d. Look for the largest power $d^m$ of d such that $d^m \leq n$, and then the largest a < d such that $a d^m \leq n$. Then repeat for $n - a d^m$. Continuing in this way, we write n as a sum of multiples of powers of d and so we can write n in base d.
2. The second makes use of the remainder theorem. The idea behind this method is as follows. Let
$n = a_m \ldots a_1 a_0$
in base d. We may think of this as
$n = (a_m \ldots a_1)d + a_0$.
It follows that $a_0$ is the remainder when n is divided by d, and the quotient is $n' = a_m \ldots a_1$. Thus we can generate the digits of n in base d from right to left by repeatedly finding the next quotient and next remainder by dividing the current quotient by d; the process starts with our input number as first quotient.
Examples 4.1.4. Converting from base 10 to base d.
1. Write 2001 in base 7. I’ll solve this question in two different ways: the long but direct route and then the short but more thought-provoking route. We see that $7^4 > 2001$. Thus we divide 2001 by $7^3$. This goes 5 times plus a remainder. Thus $2001 = 5 \times 7^3 + 286$. We now repeat with 286. We divide it by $7^2$. It goes 5 times again plus a remainder. Thus $286 = 5 \times 7^2 + 41$. We now repeat with 41. We get that $41 = 5 \times 7 + 6$. We have therefore shown that $2001 = 5 \times 7^3 + 5 \times 7^2 + 5 \times 7 + 6$. Thus 2001 in base 7 is just 5556. Now for the short method.
      quotient   remainder
7 |   2001
7 |    285       6
7 |     40       5
7 |      5       5
         0       5
Thus 2001 in base 7 is: 5556.
2. Write 2001 in base 12.
       quotient   remainder
12 |   2001
12 |    166       9
12 |     13       10 = A
12 |      1       1
          0       1
Thus 2001 in base 12 is: 11A9.
3. Write 2001 in base 2.
      quotient   remainder
2 |   2001
2 |   1000       1
2 |    500       0
2 |    250       0
2 |    125       0
2 |     62       1
2 |     31       0
2 |     15       1
2 |      7       1
2 |      3       1
2 |      1       1
         0       1
Thus 2001 in base 2 is (reading from bottom to top):
11111010001.
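The short method, repeated division with remainder, translates directly into a loop. This is a minimal Python sketch; the name `to_base` and the digit alphabet `DIGITS` are assumptions for this illustration.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # assumed symbols for bases up to 36

def to_base(n, d):
    """Write the natural number n in base d by repeated division with remainder."""
    if not 2 <= d <= len(DIGITS):
        raise ValueError("unsupported base")
    if n < 0:
        raise ValueError("n must be a natural number")
    if n == 0:
        return "0"
    symbols = []
    while n > 0:
        n, r = divmod(n, d)          # next quotient and next remainder
        symbols.append(DIGITS[r])    # remainders arrive from right to left
    return "".join(reversed(symbols))

print(to_base(2001, 7))
print(to_base(2001, 12))
print(to_base(2001, 2))
```

The remainders are produced in the order units digit first, which is why the list is reversed at the end, just as the tables are read from bottom to top.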
When converting from one base to another it is always wise to check your calculations by converting back. Number bases have some special terminology associated with them which you might encounter:
Base 2 binary.
Base 8 octal.
Base 10 decimal.
Base 12 duodecimal.
Base 16 hexadecimal.
Base 20 vigesimal.
Base 60 sexagesimal.
Binary, octal and hexadecimal occur in computer science; there are remnants of a vigesimal system in French and the older Welsh system of counting; base 60 was used by astronomers in ancient Mesopotamia and is still the basis of time measurement (60 seconds = 1 minute, and 60 minutes = 1 hour) and angle measurement. As a final example of the importance of the remainder theorem, we look at how we may write proper fractions as decimals. To see what’s involved, let’s calculate some decimal fractions.
Examples 4.1.5.
1. $\frac{1}{20} = 0\cdot 05$. This fraction has a finite decimal representation.
2. $\frac{1}{7} = 0\cdot 142857142857142857142857142857\ldots$. This fraction has an infinite decimal representation, which consists of the same sequence of numbers repeated. We abbreviate this decimal to $0\cdot\overline{142857}$.
3. $\frac{37}{84} = 0\cdot 44\overline{047619}$. This fraction has an infinite decimal representation, which consists of a non-repeating part followed by a part which repeats.
I shall characterize those fractions which have a finite decimal representation once we have proved our main theorem. I want to focus here on the last two cases. Case (2) is said to be a purely periodic decimal whereas case (3), which is more general, is said to be ultimately periodic.
Proposition 4.1.6. An infinite decimal fraction represents a rational number if and only if it is ultimately periodic.
Proof. The key is in the remainders. Consider the ultimately periodic decimal number $r = 0\cdot a_1 \ldots a_s \overline{b_1 \ldots b_t}$. We shall prove that r is rational. Observe that
$10^s r = a_1 \ldots a_s \cdot \overline{b_1 \ldots b_t}$
and
$10^{s+t} r = a_1 \ldots a_s b_1 \ldots b_t \cdot \overline{b_1 \ldots b_t}$.
From these we get that
$10^{s+t} r - 10^s r = a_1 \ldots a_s b_1 \ldots b_t - a_1 \ldots a_s$
where the righthand side is the decimal form of some integer that we shall call a. It follows that
$r = \frac{a}{10^{s+t} - 10^s}$
is a rational number.
The proof of the converse is based on the method we use to compute the decimal expansion of $\frac{m}{n}$. We carry out repeated divisions by n and at each step of the computation we use the remainder obtained to calculate the next digit. But there are only a finite number of possible remainders and our expansion is assumed infinite. Thus at some point there must be repetition.
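The converse half of the proof is really a description of long division, and it can be watched in action. The Python sketch below is an illustration under stated assumptions: the name `decimal_expansion` and the convention of returning the non-repeating and repeating digits as two strings are choices made here. It records each remainder as it appears and stops as soon as a remainder repeats or becomes zero.

```python
def decimal_expansion(m, n):
    """Return (prefix, period) for the proper fraction m/n, where 0 <= m < n.

    The decimal expansion is 0.prefix followed by period repeated for ever;
    period is the empty string when the expansion is finite.
    """
    if not 0 <= m < n:
        raise ValueError("expected a proper fraction with 0 <= m < n")
    digits = []
    seen = {}                      # remainder -> position of the digit it produces
    r = m
    while r != 0 and r not in seen:
        seen[r] = len(digits)
        q, r = divmod(10 * r, n)   # next digit and next remainder
        digits.append(str(q))
    if r == 0:
        return "".join(digits), ""
    start = seen[r]                # the expansion repeats from this position on
    return "".join(digits[:start]), "".join(digits[start:])

print(decimal_expansion(1, 20))
print(decimal_expansion(1, 7))
print(decimal_expansion(37, 84))
```

Since a remainder is always one of 0, 1, ..., n − 1, the loop runs at most n times, which is exactly the finiteness argument used in the proof.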
Example 4.1.7. We shall write the ultimately periodic decimal $0\cdot 9\overline{4}$ as a proper fraction in its lowest terms. Put $r = 0\cdot 9\overline{4}$. Then
• $r = 0\cdot 9\overline{4}$
• $10r = 9\cdot 444\ldots$
• $100r = 94\cdot 444\ldots$
Thus $100r - 10r = 94 - 9 = 85$ and so $r = \frac{85}{90}$. We can simplify this to $r = \frac{17}{18}$. We can now easily check that this is correct.
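The calculation in this example follows the formula from the proof of Proposition 4.1.6, so it can be automated. The sketch below assumes a hypothetical helper `periodic_to_fraction` that takes the non-repeating digits and the repeating digits as strings; Python's `fractions.Fraction` reduces the result to lowest terms.

```python
from fractions import Fraction

def periodic_to_fraction(prefix, period):
    """Convert 0.prefix followed by period repeated for ever into a fraction.

    prefix and period are strings of decimal digits; period must be non-empty.
    """
    if not period:
        raise ValueError("period must be non-empty")
    s, t = len(prefix), len(period)
    # a = (integer with digits prefix+period) - (integer with digits prefix)
    a = int(prefix + period) - (int(prefix) if prefix else 0)
    return Fraction(a, 10 ** (s + t) - 10 ** s)

print(periodic_to_fraction("9", "4"))
print(periodic_to_fraction("", "142857"))
print(periodic_to_fraction("44", "047619"))
```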
Exercises 4.1
1. Find the quotients and remainders for each of the following pairs of numbers. Divide the smaller into the larger.
(a) 30 and 6. (b) 100 and 24. (c) 364 and 12.
2. Write the number 2009 in
(a) Base 5. (b) Base 12. (c) Base 16.
3. Write the following numbers in base 10.
(a) $DAB_{16}$.
(b) $ABBA_{12}$.
(c) $44332211_5$.
4. Write the following decimals as fractions in their lowest terms.
(a) 0·534. (b) 0·2106. (c) 0·076923.
5. Prove the following properties of the division relation on Z.
(a) If a ≠ 0 then a | a.
(b) If a | b and b | a then a = ±b.
(c) If a | b and b | c then a | c.
(d) If a | b and a | c then a | (b + c).
6. *This question develops a proof of the remainder theorem. Let a and b be integers with b > 0. Then there exists a unique pair of integers q and r such that a = qb + r where 0 ≤ r < b.
(a) Let X = {a − nb : n ∈ Z}. Show that this set contains non-negative elements.
(b) Let $X^+$ be the subset of X consisting of non-negative elements. This subset is non-empty by the first step. Use the well-ordering principle to deduce that this set contains a minimum element r. Thus r = a − qb ≥ 0 for some q ∈ Z.
(c) Show that if r ≥ b then $X^+$ in fact contains a smaller element, which is a contradiction.
(d) We therefore have that a = bq + r where 0 ≤ r < b. It remains to prove that q and r are unique with these properties. Assume therefore that a = bq′ + r′ where 0 ≤ r′ < b. Deduce that q = q′ and r = r′.
4.2 Greatest common divisors
Let a, b ∈ N. A number d which divides both a and b is called a common divisor of a and b. The largest number which divides both a and b is called the greatest common divisor of a and b and is denoted by gcd(a, b). A pair of natural numbers a and b is said to be coprime if gcd(a, b) = 1. For us gcd(0, 0) is undefined but if a ≠ 0 then gcd(a, 0) = a.
Example 4.2.1. Consider the numbers 12 and 16. The set of divisors of 12 is {1, 2, 3, 4, 6, 12}. The set of divisors of 16 is {1, 2, 4, 8, 16}. The set of common divisors is the set of numbers that belong to both of these two sets: namely, {1, 2, 4}. The greatest common divisor of 12 and 16 is therefore 4. Thus gcd(12, 16) = 4.
One application of greatest common divisors is in simplifying fractions. For example, the fraction $\frac{12}{16}$ is equal to the fraction $\frac{3}{4}$ because we can divide out by the common divisor of numerator and denominator. The fraction which results cannot be simplified further and is in its lowest terms.
Lemma 4.2.2. Let d = gcd(a, b). Then $\gcd(\frac{a}{d}, \frac{b}{d}) = 1$.
Proof. Because d divides both a and b we may write a = a′d and b = b′d for some natural numbers a′ and b′. We therefore need to prove that gcd(a′, b′) = 1. Suppose that e | a′ and e | b′. Then a′ = ex and b′ = ey for some natural numbers x and y. Thus a = exd and b = eyd. Observe that ed | a and ed | b and so ed is a common divisor of both a and b. But d is the greatest common divisor and so e = 1, as required.
Let me paraphrase what the result above says since it is not surprising. If I divide two numbers by their greatest common divisor then the numbers that remain are coprime. This seems intuitively plausible and the proof ensures that our intuition is correct.
Example 4.2.3. Greatest common divisors arise naturally in solving linear equations where we require the solutions to be integers. Consider, for example, the linear equation
12x + 16y = 5.
If we want our solutions (x, y) to have real number co-ordinates, then it is of course easy to solve this equation and find infinitely many solutions since the solutions form a line in the plane. But suppose now that we require $(x, y) \in \mathbb{Z}^2$; that is, we want the solutions to be integers. In other words, we want to know whether the line contains any points with integer co-ordinates. We can see immediately that this is impossible. We have calculated that gcd(12, 16) = 4. Thus if x and y are integers, the number 4 divides the lefthand side of our equation. But clearly, 4 does not divide the righthand side of our equation. Thus the set
$\{(x, y) : (x, y) \in \mathbb{Z}^2 \text{ and } 12x + 16y = 5\}$
is empty.
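The divisibility argument in this example gives a necessary condition that is easy to test by machine. The sketch below uses `math.gcd` from the Python standard library; the function name `may_have_integer_solution` is an assumption for this illustration. The example only shows that the condition is necessary: if gcd(a, b) does not divide c then there is no integer solution.

```python
from math import gcd

def may_have_integer_solution(a, b, c):
    """Test the necessary condition for ax + by = c to have an integer solution.

    If gcd(a, b) does not divide c, the equation has no solution in integers.
    """
    return c % gcd(a, b) == 0

print(may_have_integer_solution(12, 16, 5))   # False: 4 does not divide 5
print(may_have_integer_solution(12, 16, 8))   # True: 4 divides 8, e.g. x = 2, y = -1
```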
If the numbers a and b are large, then calculating their gcd in the way I did above would be time-consuming and error-prone. We want to find an efficient method of calculating the greatest common divisor. The following lemma is the basis of just such a method.
Lemma 4.2.4. Let a, b ∈ N, where b ≠ 0, and let a = bq + r where 0 ≤ r < b. Then gcd(a, b) = gcd(b, r).
Proof. Let d be a common divisor of a and b. Since a = bq + r we have that a − bq = r so that d is also a divisor of r. It follows that any divisor of a and b is also a divisor of b and r. Now let d be a common divisor of b and r. Since a = bq + r we have that d divides a. Thus any divisor of b and r is a divisor of a and b. It follows that the set of common divisors of a and b is the same as the set of common divisors of b and r. Thus gcd(a, b) = gcd(b, r).
The point of the above result is that b < a and r < b. So calculating gcd(b, r) will be easier than calculating gcd(a, b) because the numbers involved are smaller. Compare $\overbrace{a = b}\,q + r$ with $a = \underbrace{b\,q + r}$. The above result is the basis of an efficient algorithm for computing greatest common divisors. It was described in Propositions 1 and 2 of Book VII of Euclid.
Algorithm 4.2.5 (Euclid’s algorithm). Input: a, b ∈ N such that a ≥ b and b ≠ 0. Output: gcd(a, b). Procedure: write a = bq + r where 0 ≤ r < b. Then gcd(a, b) = gcd(b, r). If r ≠ 0 then repeat this procedure with b and r and so on. The last non-zero remainder is gcd(a, b).
Example 4.2.6. Let’s calculate gcd(19, 7) using Euclid’s algorithm. I have highlighted the numbers that are involved at each stage.
19 = 7 · 2 + 5
7 = 5 · 1 + 2
5 = 2 · 2 + 1 ∗
2 = 1 · 2 + 0
By Lemma 4.2.4 we have that gcd(19, 7) = gcd(7, 5) = gcd(5, 2) = gcd(2, 1) = gcd(1, 0). The last non-zero remainder is 1 and so gcd(19, 7) = 1 and, in this case, the numbers are coprime.
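Euclid's algorithm is short enough to write out in full. The Python sketch below records every division step so that the output can be compared with the working in Example 4.2.6; the name `euclid_steps` and the format of the recorded steps are assumptions for this illustration.

```python
def euclid_steps(a, b):
    """Run Euclid's algorithm on natural numbers a >= b > 0.

    Returns (g, steps) where g = gcd(a, b) and steps is a list of
    tuples (a, b, q, r) recording each division a = b*q + r.
    """
    if b <= 0 or a < b:
        raise ValueError("expected a >= b > 0")
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append((a, b, q, r))
        a, b = b, r                  # gcd(a, b) = gcd(b, r) by Lemma 4.2.4
    return a, steps                  # a is now the last non-zero remainder

g, steps = euclid_steps(19, 7)
for a, b, q, r in steps:
    print(f"{a} = {b} * {q} + {r}")
print("gcd =", g)
```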
There are occasions when we need to extract more information from Euclid’s algorithm as we shall discover later when we come to deal with prime numbers. The following provides what we need.
Theorem 4.2.7 (Bézout’s theorem). Let a and b be natural numbers. Then there are integers x and y such that
gcd(a, b) = xa + yb.
I shall prove this theorem by describing an algorithm that will compute the integers x and y above. This is achieved by running Euclid’s algorithm in reverse and is called the extended Euclidean algorithm. The procedure for doing so is outlined below but the details are explained in the example that follows it.
Algorithm 4.2.8 (Extended Euclidean algorithm). Input: a, b ∈ N where a ≥ b and b 6= 0. Output: numbers x, y ∈ Z such that gcd(a, b) = xa + yb. Procedure: apply Euclid’s algorithm to a and b; working from bottom to top rewrite each remainder in turn.
Example 4.2.9. This is a little involved so I have split the process up into steps. I shall apply the extended Euclidean algorithm to the example I calculated above. I have highlighted the non-zero remainders wherever they occur, and I have discarded the last equality where the remainder was zero. I have also marked the last non-zero remainder.
19 = 7 · 2 + 5
7 = 5 · 1 + 2
5 = 2 · 2 + 1 ∗
The first step is to rearrange each equation so that the non-zero remainder is alone on the lefthand side.
5 = 19 − 7 · 2
2 = 7 − 5 · 1
1 = 5 − 2 · 2
Next we reverse the order of the list
1 = 5 − 2 · 2
2 = 7 − 5 · 1
5 = 19 − 7 · 2
We now start with the first equation. The lefthand side is the gcd we are interested in. We treat all other remainders as algebraic quantities and systematically substitute them in order. Thus we begin with the first equation
1 = 5 − 2 · 2.
The next equation in our list is
2 = 7 − 5 · 1
so we replace 2 in our first equation by the expression on the right to get
1 = 5 − (7 − 5 · 1) · 2.
We now rearrange this equation by collecting up like terms treating the highlighted remainders as algebraic objects to get
1 = 3 · 5 − 2 · 7.
We can of course make a check at this point to ensure that our arithmetic is correct. The next equation in our list is
5 = 19 − 7 · 2
so we replace 5 in our new equation by the expression on the right to get
1 = 3 · (19 − 7 · 2) − 2 · 7.
Again we rearrange to get
1 = 3 · 19 − 8 · 7.
The algorithm now terminates and we can write
gcd(19, 7) = 3 · 19 + (−8) · 7,
as required. We can also, of course, easily check the answer!
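The back-substitution carried out by hand above can be organised as a recursion. The Python sketch below is one common formulation and is offered only as an illustration (the name `extended_euclid` is an assumption, and this is not the matrix-based method): it first solves the problem for the smaller pair (b, r) and then rewrites r as a − bq, which is exactly the substitution step of the example.

```python
def extended_euclid(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g == x*a + y*b, for naturals a, b not both 0."""
    if b == 0:
        return a, 1, 0               # gcd(a, 0) = a = 1*a + 0*0
    q, r = divmod(a, b)
    g, x1, y1 = extended_euclid(b, r)
    # g = x1*b + y1*r and r = a - q*b, so g = y1*a + (x1 - q*y1)*b
    return g, y1, x1 - q * y1

g, x, y = extended_euclid(19, 7)
print(g, x, y)
print(x * 19 + y * 7 == g)
```

For the pair 19 and 7 this reproduces the coefficients 3 and −8 found in the example. The coefficients are not unique, so a different method may legitimately return a different pair.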
I shall describe a much more efficient algorithm for implementing the extended Euclidean algorithm later in this book when I have discussed matrices. A very useful application of Bézout’s theorem is the following.
Lemma 4.2.10. Let a and b be natural numbers. Then a and b are coprime if, and only if, we may find integers x and y such that
1 = xa + yb.
Proof. Suppose first that a and b are coprime. Then by Bézout’s theorem
gcd(a, b) = xa + yb
for some integers x and y. But, by assumption, gcd(a, b) = 1. Conversely, suppose that 1 = xa + yb. Then any natural number that divides both a and b must divide 1. It follows that gcd(a, b) = 1.
The significance of the above lemma is that whenever you know that a and b are coprime, you can actually write down an expression 1 = xa + yb which means the same thing. This turns out to be enormously useful.
Exercises 4.2
1. Use Euclid’s algorithm to find the gcd’s of the following pairs of numbers.
(a) 35, 65. (b) 135, 144. (c) 17017, 18900.
2. Use the extended Euclidean algorithm to find integers x and y such that gcd(a, b) = ax+by for each of the following pairs of numbers. You should ensure that your answers for x and y have the correct signs.
(a) 112, 267. (b) 242, 1870.
3. *We know how to find the greatest natural number that divides two numbers. Define now gcd(a, b, c) to be the greatest common divisor of a and b and c jointly. Prove that
gcd(a, b, c) = gcd(gcd(a, b), c).
Deduce that gcd(gcd(a, b), c) = gcd(a, gcd(b, c)). We may similarly define gcd(a, b, c, d) to be the greatest common divisor of a and b and c and d jointly. Calculate gcd(910, 780, 286, 195) and justify your calculations.
4. *The following question is by Dubisch Amer. Math. Mon. 69. Define N∗ = N \{0}. A binary operation ◦ defined on N∗ is known to have the following properties:
(a) a ◦ b = b ◦ a. (b) a ◦ a = a. (c) a ◦ (a + b) = a ◦ b.
Prove that a ◦ b = gcd(a, b). Hint: the question is not asking you to prove that gcd(a, b) has these properties.
5. You have an unlimited supply of 3 cent stamps and an unlimited supply of 5 cent stamps. By combining stamps of different values you can make up other values: for example, three 3 cent stamps and two 5 cent stamps make the value 19 cents. What is the largest value you cannot make? Hint: you need to show that the question makes sense.
6. Let n ≥ 1. Define φ(n) to be the number of numbers less than or equal to n and coprime to n. This is the Euler totient function. Tabulate the values of φ(n) for 1 ≤ n ≤ 12.
4.3 The fundamental theorem of arithmetic
The goal of this section is to state and prove the most basic result about the natural numbers: each natural number, excluding 0 and 1, can be written as a product of powers of primes in essentially one way. The primes are therefore the ‘atoms’ from which all natural numbers can be built.
A proper divisor of a natural number n is a divisor that is neither 1 nor n. A natural number n is said to be prime if n ≥ 2 and the only divisors of n are 1 and n itself. A number bigger than or equal to 2 which is not prime is said to be composite. It is important to remember that the number 1 is not a prime. The only even prime is the number 2. The properties of primes have exercised a great fascination ever since they were first studied and continue to pose questions that mathematicians have yet to solve. There are no nice formulae to tell us what the nth prime is but there are still some interesting results in this direction. The polynomial $p(n) = n^2 - n + 41$ has the property that its value for n = 1, 2, 3, 4, ..., 40 is always prime. Of course, for n = 41 it is clearly not prime. In 1971, the mathematician Yuri Matijasevic found a polynomial in 26 variables of degree 25 with the property that when non-negative integers are substituted for the variables the positive values it takes are all and only the primes. However, this polynomial does not generate the primes in any particular order.
Lemma 4.3.1. Let n ≥ 2. Either n is prime or the smallest proper divisor of n is prime.
Proof. Suppose n is not prime. Let d be the smallest proper divisor of n. If d were not prime then d would have a smallest proper divisor and this divisor would in turn divide n, but this would contradict the fact that d was the smallest proper divisor of n. Thus d must itself be prime.
The following was also proved by Euclid: it is Proposition 20 of Book IX of Euclid.
Theorem 4.3.2. There are infinitely many primes.
Proof. Let $p_1, \ldots, p_n$ be the first n primes. Put

$N = (p_1 \cdots p_n) + 1.$

If N is a prime, then N is a prime bigger than $p_n$. If N is composite, then N has a prime divisor p by Lemma 4.3.1. But p cannot equal any of the primes $p_1, \ldots, p_n$ because N leaves remainder 1 when divided by $p_i$. It follows that p is a prime bigger than $p_n$. Thus we can always find a bigger prime. It follows that there must be an infinite number of primes.
Example 4.3.3. It’s interesting to consider some specific cases of the numbers introduced in the above proof. The first few are already prime.
• 2 + 1 = 3 prime.
• 2 · 3 + 1 = 7 prime.
• 2 · 3 · 5 + 1 = 31 prime.
• 2 · 3 · 5 · 7 + 1 = 211 prime.
• 2 · 3 · 5 · 7 · 11 + 1 = 2,311 prime.
• 2 · 3 · 5 · 7 · 11 · 13 + 1 = 30,031 = 59 · 509.
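The numbers in Example 4.3.3 can be recomputed with a few lines of Python. This is only a rough sketch: the function names `smallest_prime_factor` and `euclid_numbers` are invented for this illustration and do not come from the text.

```python
def smallest_prime_factor(n):
    """Return the smallest prime divisor of n (n >= 2), found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_numbers(primes):
    """Yield p1 * ... * pk + 1 for k = 1, 2, ..., len(primes)."""
    product = 1
    for p in primes:
        product *= p
        yield product + 1


first_primes = [2, 3, 5, 7, 11, 13]
for N in euclid_numbers(first_primes):
    p = smallest_prime_factor(N)
    status = "prime" if p == N else f"composite, smallest prime factor {p}"
    print(N, status)
```

As Lemma 4.3.1 predicts, the smallest proper divisor found for the composite case is itself a prime, and it is bigger than every prime in the list.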
The Prime Number Theorem

There are infinitely many primes but how are those primes distributed? For example, are they arranged fairly regularly, or do the gaps between them get bigger and bigger? There are no formulae which output the nth prime in a usable way, but if we adopt a statistical approach then we can obtain much more useful results. The idea is that for each natural number n we count the number of primes π(n) less than or equal to n. The graph of π(n) has a staircase shape (it certainly isn’t smooth) but as you zoom out it begins to look smoother and smoother. This raises the question of whether there is a smooth function that is a good approximation to π(n). In 1792, the young Carl Friedrich Gauss (1777–1855) observed that π(n) appeared to be close to the value of the amazingly simple function $\frac{n}{\ln(n)}$. But proving that this was always true, and not just an artefact of the comparatively small numbers he looked at, turned out to be difficult. Eventually, in 1896 two mathematicians, Jacques Hadamard (1865–1963) and the spectacularly named Charles Jean Gustave Nicolas Baron de la Vallée Poussin (1866–1962), proved independently of each other that

$\lim_{x \to \infty} \frac{\pi(x)}{x / \ln(x)} = 1$

a result known as the Prime Number Theorem. It was proved using complex analysis; that is, calculus using complex numbers. As an example, we have that $\pi(1{,}000{,}000) = 78{,}498$ whereas $\frac{10^6}{\ln 10^6} \approx 72{,}382$.
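The comparison between π(x) and x/ln(x) is easy to reproduce numerically. The following Python sketch counts primes with a simple sieve; the function name `prime_count` is an invention for this illustration, and the sieve is just one convenient way of counting, not a method taken from the text.

```python
import math


def prime_count(limit):
    """Return pi(limit), the number of primes <= limit, using a simple sieve."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out the multiples of p, starting at p * p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sum(is_prime)


for exponent in range(2, 7):
    x = 10 ** exponent
    estimate = x / math.log(x)
    # Columns: x, pi(x), x / ln(x) rounded, and the ratio of the two.
    print(x, prime_count(x), round(estimate), round(prime_count(x) / estimate, 3))
```

The last column is the ratio that the Prime Number Theorem says tends to 1; over this small range it approaches 1 only slowly.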
Algorithm 4.3.4. To decide whether a number n is prime or composite. Check to see if any prime $p \le \sqrt{n}$ divides n. If none of them do, the number n is prime. We shall now explain why this works. If a divides n then we can write n = ab for some number b. If $a < \sqrt{n}$ then $b > \sqrt{n}$ whilst if $a > \sqrt{n}$ then $b < \sqrt{n}$. Thus to decide if n is prime or not we need only carry out trial divisions by all numbers $a \le \sqrt{n}$. However, this is inefficient because if a divides n and a is not prime then a is divisible by some prime p which must therefore also divide n. It follows that we need only carry out trial divisions by the primes $p \le \sqrt{n}$.
Example 4.3.5. Determine whether 97 is prime using the above algorithm. We first calculate the largest whole number less than or equal to $\sqrt{97}$. This is 9. We now carry out trial divisions of 97 by each prime number p where 2 ≤ p ≤ 9; by the way, if you aren’t certain which of these numbers is prime: just try them all. You’ll get the right answer although not as efficiently. You might also want to remember that if m doesn’t divide a number neither can any multiple of m. In any event, in this case we carry out trial divisions by 2, 3, 5 and 7. None of them divides 97 exactly and so 97 is prime.
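Algorithm 4.3.4 translates almost directly into code. The sketch below, in Python, takes the lazy route mentioned in the example: it divides by every number from 2 up to the whole-number part of √n rather than by the primes only. The function name `is_prime` is chosen for this illustration.

```python
import math


def is_prime(n):
    """Decide primality by trial division by the numbers 2 <= d <= sqrt(n)."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False  # d is a proper divisor, so n is composite
    return True


print(math.isqrt(97))  # largest whole number <= sqrt(97)
print(is_prime(97))
```

Dividing by composite numbers as well wastes some work but cannot change the answer, for the reason given after the algorithm.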
Cryptography

Prime numbers play an important role in exchanging secret information. In 1976, Whitfield Diffie and Martin Hellman wrote a paper on cryptography that can genuinely be called ground-breaking. In ‘New directions in cryptography’, IEEE Transactions on Information Theory 22 (1976), 644–654, they put forward the idea of a public-key cryptosystem which would enable

. . . a private conversation . . . [to] be held between any two individuals regardless of whether they have ever communicated before.

With considerable farsightedness, Diffie and Hellman foresaw that such cryptosystems would be essential if communication between computers was to reach its full potential. However, their paper did not describe a concrete way of doing this. It was R. L. Rivest, A. Shamir and L. Adleman (RSA) who found just such a concrete method, described in their paper ‘A method for obtaining digital signatures and public-key cryptosystems’, Communications of the ACM 21 (1978), 120–126. Their method is based on the following observation. Given two prime numbers it takes very little time to multiply them together, but if I give you a number that is a product of two primes and ask you to factorize it then it takes a lot of time. You might like to think about why in relation to the algorithm I gave for factorizing numbers above. After considerable experimentation, RSA showed how to use little more than undergraduate mathematics to put together a public-key cryptosystem that is an essential ingredient in e-commerce. Ironically, this secret code had in fact been invented in 1973 at GCHQ, who had kept it secret.
The following is the key property of primes we shall need to prove the fundamental theorem of arithmetic. It is the main reason why we needed Bézout’s theorem. It is Proposition 30 of Book VII of Euclid.
Lemma 4.3.6 (Euclid’s lemma).
1. Let p | ab where p is a prime. Then p | a or p | b.
2. Let $p \mid a_1 \cdots a_n$ where p is a prime. Then $p \mid a_i$ for some i.

Proof. (1) Suppose that p does not divide a. We shall prove that p must then divide b. If p does not divide a, then a and p are coprime. By Lemma 4.2.10, there exist integers x and y such that 1 = px + ay. Thus b = bpx + bay. Now p | bpx and p | bay, since p | ab by assumption, and so p | b, as required.

(2) This is a typical application for proof by induction. We have proved the base case where n = 2. Assume that the result holds when n = k. We prove that it holds for n = k + 1. Suppose that $p \mid (a_1 \cdots a_k) a_{k+1}$. From the base case, either $p \mid a_1 \cdots a_k$ or $p \mid a_{k+1}$. But we may now deduce that $p \mid a_i$ for some 1 ≤ i ≤ k or $p \mid a_{k+1}$ by the induction hypothesis. We have therefore proved the result.
Example 4.3.7. The above result is not true if p is not a prime. For example, 6 | 9 × 4 but 6 divides neither 9 nor 4.
Lemma 4.3.6 is so important, I want to spell out in words what it says: If a prime divides a product of numbers it must divide at least one of them.
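A quick experiment makes the contrast between primes and composites concrete. The Python sketch below searches a small range for a pair a, b with p | ab where p divides neither factor; the function name `lemma_holds` and the search bound of 50 are arbitrary choices made for this illustration, and a finite search is of course evidence rather than a proof.

```python
def lemma_holds(p, bound=50):
    """Check: whenever p | a*b (1 <= a, b <= bound), p divides a or p divides b.

    Returns (True, None) if no counterexample is found, and otherwise
    (False, (a, b)) for the first counterexample met.
    """
    for a in range(1, bound + 1):
        for b in range(1, bound + 1):
            if (a * b) % p == 0 and a % p != 0 and b % p != 0:
                return False, (a, b)
    return True, None


print(lemma_holds(7))  # 7 is prime: no counterexample in range
print(lemma_holds(6))  # 6 is composite: a counterexample turns up
```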
There is a very nice application of Euclid’s lemma to proving that certain numbers are irrational. It generalizes our proof that $\sqrt{2}$ is irrational described in Chapter 2.

Theorem 4.3.8. The square root of every prime number is irrational.

Proof. We shall prove this by contradiction. Assume that we can write $\sqrt{p}$ as a rational. I shall show that this assumption leads to a contradiction and so must be false. We are assuming that $\sqrt{p} = \frac{a}{b}$. By cancelling the greatest common divisor of a and b we can in fact assume that gcd(a, b) = 1. This will be crucial to our argument. Squaring both sides of the equation $\sqrt{p} = \frac{a}{b}$ and multiplying the resulting equation by $b^2$ we get that $pb^2 = a^2$. This says that $a^2$ is divisible by p. But if a prime divides a product of two numbers it must divide at least one of those numbers by Euclid’s lemma. Thus p divides a. Thus we can write a = pc for some natural number c. Substituting this into our equation above we get that $pb^2 = p^2c^2$. Dividing both sides of this equation by p gives $b^2 = pc^2$. This tells us that $b^2$ is divisible by p and so in the same way as above p divides b. We have therefore shown that our assumption that $\sqrt{p}$ is rational leads to both a and b being divisible by p. But this contradicts the fact that gcd(a, b) = 1. Our assumption is therefore wrong, and so $\sqrt{p}$ is not a rational number.

We now come to the main theorem of this chapter.

Theorem 4.3.9 (Fundamental theorem of arithmetic). Every number n ≥ 2 can be written as a product of primes in one way if we ignore the order in which the primes appear. By product we allow the possibility that there is only one prime.
Proof. Let n ≥ 2. If n is already a prime then there is nothing to prove, so we can suppose that n is composite. Let $p_1$ be the smallest prime divisor of n. Then we can write $n = p_1 n'$ where $n' < n$. Once again, $n'$ is either prime or composite. Continuing in this way, we can write n as a product of primes.

We now prove uniqueness. Suppose that

$n = p_1 \cdots p_s = q_1 \cdots q_t$

are two ways of writing n as a product of primes. Now $p_1 \mid n$ and so $p_1 \mid q_1 \cdots q_t$. By Euclid’s Lemma, the prime $p_1$ must divide one of the $q_i$’s and, since they are themselves prime, it must actually equal one of the $q_i$’s. By relabelling if necessary, we can assume that $p_1 = q_1$. Cancel $p_1$ from both sides and repeat with $p_2$. Continuing in this way, we see that every prime occurring on the lefthand side occurs on the righthand side. Changing sides, we see that every prime occurring on the righthand side occurs on the lefthand side. We deduce that the two prime decompositions are identical.

When we write a number as a product of primes we usually gather together the same primes into a prime power, and write the primes in increasing order which then gives a unique representation. This is illustrated in the example below.

Example 4.3.10. Let n = 999,999. Write n as a product of primes. There are a number of ways of doing this but in this case there is an obvious place to start. We have that

$n = 3^2 \cdot 111{,}111 = 3^3 \cdot 37{,}037 = 3^3 \cdot 7 \cdot 5{,}291 = 3^3 \cdot 7 \cdot 11 \cdot 481 = 3^3 \cdot 7 \cdot 11 \cdot 13 \cdot 37.$

Thus the prime factorisation of 999,999 is $999{,}999 = 3^3 \cdot 7 \cdot 11 \cdot 13 \cdot 37$.
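The existence half of the proof is really an algorithm: keep splitting off the smallest prime divisor. Here is a minimal Python sketch of that idea, assuming trial division is used to find each divisor; the function name `prime_factorization` is made up for this illustration.

```python
def prime_factorization(n):
    """Return the prime factorization of n >= 2 as a dict {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        # Each d that divides n at this point is the smallest divisor of
        # what remains, and so is prime by Lemma 4.3.1.
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        # Whatever is left has no divisor up to its square root, so it is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


print(prime_factorization(999_999))
```

The dictionary printed for 999,999 gathers equal primes into prime powers, just as in Example 4.3.10.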
Supernatural Numbers

There are natural numbers. Are there super natural numbers? It sounds like a joke but in fact there are, though to be honest they are only encountered in advanced work. But since they are easy to understand and I like the name, I have included a brief description. List the primes in order 2, 3, 5, 7, .... By the fundamental theorem of arithmetic, each natural number ≥ 2 may be expressed as a unique product of powers of primes. Let’s write each such natural number as a product of all primes. This can be achieved by including those primes not needed by raising them to the power 0. For example,

$10 = 2 \cdot 5 = 2^1 \cdot 3^0 \cdot 5^1 \cdot 7^0 \cdots$

which we could write as

(1, 0, 1, 0, 0, 0, ...)

and $12 = 2^2 \cdot 3 = 2^2 \cdot 3^1 \cdot 5^0 \cdot 7^0 \cdots$ which we could write as

(2, 1, 0, 0, 0, 0, ...)

Of course, for each natural number from some point on all the entries will be zero. Thus each natural number ≥ 2 is encoded by an infinite sequence of natural numbers that are zero from some point onwards. We now define a supernatural number to be any sequence

$(a_1, a_2, a_3, \ldots)$

where the $a_i$ are natural numbers. We define a natural number to be a supernatural number where $a_i = 0$ for all i ≥ m for some natural number m ≥ 1. This makes sense because each natural supernatural number can be regarded as the encoded version of a natural number in the non-super sense. I shall denote the set of supernatural numbers by S; this is not yet the complete list since I still have to add some special such numbers. I shall denote supernatural numbers by bold letters such as $\mathbf{a}$. I shall also denote the ith component by $a_i$. Let $\mathbf{a}$ and $\mathbf{b}$ be two supernatural numbers. We define their product as follows

$(\mathbf{a} \cdot \mathbf{b})_i = a_i + b_i.$

This makes sense because, for example,

10 · 12 = 120

and

(1, 0, 1, 0, 0, 0, ...) · (2, 1, 0, 0, 0, 0, ...) = (3, 1, 1, 0, 0, 0, ...)

which encodes $2^3 3^1 5^1 = 120$. I shall leave you to check that the multiplication is associative. If we define

$\mathbf{1} = (0, 0, 0, \ldots)$

and allow it to be supernatural then we also have a multiplicative identity because $\mathbf{1} \cdot \mathbf{a} = \mathbf{a} = \mathbf{a} \cdot \mathbf{1}$. Now introduce a new symbol ∞ which satisfies a + ∞ = ∞ = ∞ + a. Then if we allow $\mathbf{0} = (\infty, \infty, \infty, \ldots)$ as a supernatural number then we also have a zero in the set of supernatural numbers since $\mathbf{0} \cdot \mathbf{a} = \mathbf{0} = \mathbf{a} \cdot \mathbf{0}$. Finally, allow ∞ to occur anywhere any number of times in the definition of a supernatural number. Then we have the full set of supernatural numbers. How do you think that we could define $\gcd(\mathbf{a}, \mathbf{b})$ and $\operatorname{lcm}(\mathbf{a}, \mathbf{b})$ of supernatural numbers?
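The encoding of natural numbers by exponent sequences is easy to play with on a computer. The Python sketch below is a toy version only: it truncates every sequence to the first six primes, and the names `PRIMES`, `encode`, `multiply` and `decode` are invented for this illustration.

```python
PRIMES = [2, 3, 5, 7, 11, 13]  # assumption: only the first six primes are tracked


def encode(n):
    """Exponent sequence of n >= 1 over PRIMES."""
    if n < 1:
        raise ValueError("n must be a positive whole number")
    exponents = []
    for p in PRIMES:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exponents.append(e)
    if n != 1:
        raise ValueError("n has a prime factor outside PRIMES")
    return tuple(exponents)


def multiply(a, b):
    """Product of two encoded numbers: add the exponents componentwise."""
    return tuple(x + y for x, y in zip(a, b))


def decode(a):
    """Turn an exponent sequence back into an ordinary natural number."""
    result = 1
    for p, e in zip(PRIMES, a):
        result *= p ** e
    return result


product = multiply(encode(10), encode(12))
print(product, decode(product))
```

The sketch handles only the natural supernatural numbers. Entries equal to ∞ could be modelled in the same spirit, for instance with Python's `math.inf`, which already satisfies a + ∞ = ∞, but then `decode` would no longer make sense.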
I shall now describe two simple applications of our main theorem. The greatest common divisor of two numbers a and b is the largest number that divides into both a and b. On the other hand, if a | c and b | c then we say that c is a common multiple of a and b. The smallest common multiple of a and b is called the least common multiple of a and b and is denoted by lcm(a, b). You might expect that to calculate the least common multiple we would need a new algorithm, but in fact we can use Euclid’s algorithm as the following result shows.
Proposition 4.3.11. Let a and b be natural numbers not both zero. Then
gcd(a, b) · lcm(a, b) = ab.
Proof. We begin with a special case to motivate the idea. Suppose that $a = p^r$ and $b = p^s$ where p is a prime. Then it is immediate from the properties of indices that

$\gcd(a, b) = p^{\min(r,s)}$ and $\operatorname{lcm}(a, b) = p^{\max(r,s)}$

and so, in this special case, we have that gcd(a, b) · lcm(a, b) = ab, because min(r, s) + max(r, s) = r + s. Next suppose that the prime factorizations of a and b are

$a = p_1^{r_1} \cdots p_m^{r_m}$ and $b = p_1^{s_1} \cdots p_m^{s_m}$

where the $p_i$ are primes. We may easily determine the prime factorization of gcd(a, b) when we bear in mind the following points. The primes that occur in the prime factorization of gcd(a, b) must be from the set $\{p_1, \ldots, p_m\}$, and the number $p_i^{\min(r_i, s_i)}$ divides gcd(a, b) but no higher power does. It follows that

$\gcd(a, b) = p_1^{\min(r_1, s_1)} \cdots p_m^{\min(r_m, s_m)}.$

A similar kind of argument proves that

$\operatorname{lcm}(a, b) = p_1^{\max(r_1, s_1)} \cdots p_m^{\max(r_m, s_m)}.$

The proof of the fact that gcd(a, b) · lcm(a, b) = ab now follows by multiplying the above two prime factorizations together. In the above proof, we assumed that a and b had prime factorizations using the same set of primes. This need not be true in general, but by allowing zero powers of primes we can easily arrange for the same sets of primes to occur and the argument above remains valid.
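The min and max formulas in the proof can be checked on examples. The Python sketch below computes gcd and lcm from prime factorizations, using zero exponents for missing primes exactly as the proof suggests. The function names are hypothetical, and trial division is assumed as the factorization method; in practice Euclid's algorithm is far quicker for the gcd.

```python
from collections import Counter


def factorize(n):
    """Prime factorization of n >= 1 as a Counter {prime: exponent}."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors


def from_factors(factors):
    """Multiply out a mapping {prime: exponent}."""
    result = 1
    for p, e in factors.items():
        result *= p ** e
    return result


def gcd_lcm(a, b):
    """Return (gcd(a, b), lcm(a, b)) via min and max of the exponents."""
    fa, fb = factorize(a), factorize(b)
    primes = set(fa) | set(fb)  # a missing prime counts with exponent 0
    g = from_factors({p: min(fa[p], fb[p]) for p in primes})
    m = from_factors({p: max(fa[p], fb[p]) for p in primes})
    return g, m


g, m = gcd_lcm(360, 84)
print(g, m, g * m == 360 * 84)
```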
For our next result, we begin with an observation. Some fractions, such as $\frac{1}{2}$, can be written with only a finite number of digits after the decimal place, but others, such as $\frac{1}{3}$, require an infinite number of digits. We can now account for this using our main theorem.

Proposition 4.3.12. A proper rational number $\frac{a}{b}$ in its lowest terms has a finite decimal expansion if and only if $b = 2^m 5^n$ for some natural numbers m and n.

Proof. Let $\frac{a}{b}$ have the finite decimal representation $0 \cdot a_1 \ldots a_n$. This means

$\frac{a}{b} = \frac{a_1}{10} + \frac{a_2}{10^2} + \ldots + \frac{a_n}{10^n}.$

The righthand side is just the fraction

$\frac{a_1 10^{n-1} + a_2 10^{n-2} + \ldots + a_n}{10^n}.$

The denominator contains only the prime factors 2 and 5 and so the reduced form will also only contain at most the prime factors 2 and 5.

To prove the converse, consider the proper fraction

$\frac{a}{2^\alpha 5^\beta}.$

If α = β then the denominator is $10^\alpha$. If α ≠ β then multiply the numerator and denominator by a suitable power of 2 or 5 as appropriate so that the resulting fraction has denominator a power of 10. But any fraction with denominator a power of 10 has a finite decimal expansion.
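Proposition 4.3.12 gives a mechanical test for whether a decimal expansion terminates. A minimal Python sketch follows; the function name `has_finite_decimal` is invented for this illustration, and `fractions.Fraction` from the standard library is used only to put the fraction into lowest terms.

```python
from fractions import Fraction


def has_finite_decimal(a, b):
    """True if a/b has a finite decimal expansion (Proposition 4.3.12)."""
    denominator = Fraction(a, b).denominator  # denominator in lowest terms
    for p in (2, 5):
        while denominator % p == 0:
            denominator //= p
    # Only factors of 2 and 5 were present exactly when nothing is left over.
    return denominator == 1


print(has_finite_decimal(1, 2), has_finite_decimal(1, 3))
```

Reducing to lowest terms first matters: 6/15 has a denominator divisible by 3, yet it equals 2/5 and so has a finite expansion.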
Exercises 4.3
1. List the primes less than 100. Hint: use the Sieve of Eratosthenes¹ which can be used to construct a table of all primes up to the number N. List all numbers from 2 to N inclusive. Mark 2 as prime and then cross out from the table all numbers which are multiples of 2. The process now iterates as follows. Find the smallest number which is not marked as a prime and which has not been crossed out. Mark it as a prime and cross out all its multiples. If no such number can be found then you have found all primes less than or equal to N.
2. For each of the following numbers use Algorithm 4.3.4 to determine whether they are prime or composite. When they are composite find a prime factorization. Show all working.
(a) 131. (b) 689. (c) 5491.
3. Find the lowest common multiples of the following pairs of numbers.
¹ Eratosthenes of Cyrene who lived about 250 BCE. He is famous for using geometry and some simple observations to estimate the circumference of the earth.
(a) 22, 121. (b) 48, 72. (c) 25, 116.
4. Given $2^4 \cdot 3 \cdot 5^5 \cdot 11^2$ and $2^2 \cdot 5^6 \cdot 11^4$, calculate their greatest common divisor and least common multiple.
5. Use the fundamental theorem of arithmetic to show that we can always write $\sqrt{n}$, where n is a natural number, as a product of a natural number and a product of square roots of primes. Calculate the square roots of the following numbers exactly using the above method.
(a) 10. (b) 42. (c) 54.
6. Let a and b be coprime. Prove that if a | bc then a | c.
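The Sieve of Eratosthenes described in the hint to Exercise 1 can be carried out by machine as well as by hand. The Python sketch below follows the hint step by step; the function name `sieve_of_eratosthenes` and the use of a list of flags for the crossings-out are choices made for this illustration.

```python
def sieve_of_eratosthenes(N):
    """Return the list of all primes less than or equal to N."""
    crossed_out = [False] * (N + 1)
    primes = []
    for n in range(2, N + 1):
        if not crossed_out[n]:
            # n is the smallest number neither marked nor crossed out: mark it prime.
            primes.append(n)
            # Cross out all the proper multiples of n.
            for multiple in range(2 * n, N + 1, n):
                crossed_out[multiple] = True
    return primes


print(len(sieve_of_eratosthenes(100)))  # how many primes there are below 100
```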
4.4 *Modular arithmetic
From an early age, we are taught to think of numbers as being strung out along the number line
−3 −2 −1 0 1 2 3
But that is not the only way we count. We count the seasons in a cyclic manner
. . . autumn, winter, spring, summer . . . and likewise the days of the week
. . . Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday . . .
Also the months of the year or the hours in a day, whether by means of the 12-hour clock or the 24-hour clock. The fact that we use words for these events obscures the fact that we really are counting. This is clearer in the names for the months since October, November and December were originally the eighth, ninth and tenth months, respectively, until Roman politics intervened and they were shifted. But the counting in all these cases is not linear but cyclic. Rather than using a number line to represent this type of counting, we use instead number circles, and rather than using the words above, I shall use numbers. Here is the number circle for the seasons with numbers replacing words.
[Figure: number circle with 0 at the top, 1 on the right, 2 at the bottom and 3 on the left.]
Adding in these systems of arithmetic means stepping around in a clockwise direction, whereas subtracting means stepping around in an anticlockwise direction. Modular arithmetic is the name given to these different systems of cyclic counting. It was Gauss who realised that these different systems of counting were mathematically interesting.
4.4.1 Congruences

Let n ≥ 2 be a fixed natural number which in this context we call the modulus. If a, b ∈ Z we write a ≡ b if, and only if, a and b leave the same remainder when divided by n or, what amounts to the same thing, n | a − b. Here are a couple of simple examples. If n = 2, then a ≡ b if, and only if, a and b are either both odd or both even. On the other hand, if n = 10 then a ≡ b if, and only if, a and b have the same units digit. The symbol ≡ is a modification of the equality symbol =. If a ≡ b with respect to n we say that a is congruent to b modulo n. In fact, congruence behaves like a weakened form of equality as we now show.
Lemma 4.4.1. Let n ≥ 2 be a fixed modulus.
1. a ≡ a.
2. a ≡ b implies b ≡ a.
3. a ≡ b and b ≡ c implies that a ≡ c.
4. a ≡ b and c ≡ d implies that a + c ≡ b + d.
5. a ≡ b and c ≡ d implies that ac ≡ bd.
Here is a very simple application of modular arithmetic.
Lemma 4.4.2. A natural number n is divisible by 9 if, and only if, the sum of the digits of n is divisible by 9.
Proof. We shall work modulo 9. The proof hinges on the fact that 10 ≡ 1 modulo 9. By using Lemma 4.4.1, we quickly find that $10^r \equiv 1$ for all natural numbers r ≥ 1. We use this result now. Let

$n = a_k 10^k + a_{k-1} 10^{k-1} + \ldots + a_1 10 + a_0$

where $a_k, \ldots, a_0$ are the digits of n. Then $n \equiv a_k + \ldots + a_0$. Thus n and the sum of the digits of n leave the same remainder when divided by 9, and so n is divisible by 9 if, and only if, the sum of the digits of n is divisible by 9.

Solving a linear equation such as ax + by = c is very easy. For each possible real value of x we can compute the corresponding real value of y. But suppose now that a, b and c are integers and we only want to find solutions (x, y) whose co-ordinates are integers? This is an example of a Diophantine equation. We shall show how it may be solved with the help of modular arithmetic. First, we shall show that the problem of finding integer solutions is equivalent to solving a simple kind of linear equation in one unknown in modular arithmetic.
Lemma 4.4.3. Let a, b and c be integers. Then the following are equivalent.
1. The pair $(x_1, y_1)$ is an integer solution to ax + by = c for some $y_1$.

2. The integer $x_1$ is a solution to the equation ax ≡ c (mod b).

Proof. (1) ⇒ (2). Suppose that $ax_1 + by_1 = c$. Then it is immediate that $ax_1 \equiv c \pmod{b}$. (2) ⇒ (1). Suppose that $ax_1 \equiv c \pmod{b}$. Then by definition, $ax_1 - c = bz_1$ for some integer $z_1$. Thus $ax_1 + b(-z_1) = c$. We may therefore put $y_1 = -z_1$.
We shall now describe how to solve all equations of the form
ax ≡ b (mod n).
Lemma 4.4.4. Consider the linear congruence ax ≡ b (mod n).

1. The linear congruence has a solution if, and only if, d = gcd(a, n) is such that d | b.

2. If the condition in part (1) holds and $x_0$ is any solution, then all solutions have the form

$x = x_0 + t\,\frac{n}{d}$

where t ∈ Z.

Proof. (1). Suppose first that $x_1$ is a solution to our linear congruence. Then by definition, $ax_1 - b = nq$ for some integer q. It follows that $ax_1 + n(-q) = b$. By definition d | a and d | n and so d | b. We now prove the converse. By Bézout’s theorem, we may find integers u and v such that au + nv = d. By assumption, d | b and so b = dw for some integer w. It follows that auw + nvw = dw = b. Thus a(uw) ≡ b (mod n), and we have found a solution.

(2) Let $x_0$ be any one solution to ax ≡ b (mod n). It is routine to check that $x = x_0 + t\,\frac{n}{d}$ is a solution for any t ∈ Z. Let $x_1$ be any solution to ax ≡ b (mod n). Then $a(x_1 - x_0) \equiv 0 \pmod{n}$. Thus $a(x_1 - x_0) = sn$ for some integer s. Dividing through by d gives $\frac{a}{d}(x_1 - x_0) = s\,\frac{n}{d}$. Since $\frac{a}{d}$ and $\frac{n}{d}$ are coprime, $\frac{n}{d}$ divides $x_1 - x_0$ (see Question 6 of Exercises 4.3). The result now follows.

There is a special case of the above result that is very important. Its proof is immediate.

Corollary 4.4.5. Let p be a prime. Then the linear congruence ax ≡ b (mod p), where a is not congruent to 0 modulo p, always has a solution, and all solutions are congruent modulo p.

Example 4.4.6. Let’s find all the points on the line 2x + 3y = 5 that have integer co-ordinates. Observe first that gcd(2, 3) = 1. Thus such points exist. In this case, by inspection, 1 = 2 · 2 + (−1)3. Thus 5 = 10 · 2 + (−5)3. It follows that (10, −5) is one point on the line with integer co-ordinates. Thus the set of integer solutions is

{(10 + 3t, −5 − 2t): t ∈ Z}.
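The proof of Lemma 4.4.4 is constructive, so it can be turned into a procedure. The Python sketch below assumes positive a and n and uses the extended form of Euclid's algorithm to find the integers u and v of Bézout's theorem. The function names `extended_gcd` and `solve_linear_congruence` are invented for this illustration.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and a*u + b*v == g (a, b > 0)."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v


def solve_linear_congruence(a, b, n):
    """Solve a*x ≡ b (mod n).

    Returns None if there is no solution, and otherwise a pair (x0, step)
    meaning that the solutions are exactly x0 + t*step for integers t.
    """
    d, u, _ = extended_gcd(a, n)
    if b % d != 0:
        return None  # part (1) of the lemma: d must divide b
    step = n // d
    x0 = (u * (b // d)) % step  # a*(u*w) ≡ b where b = d*w
    return x0, step


# Example 4.4.6: integer points on 2x + 3y = 5 come from 2x ≡ 5 (mod 3).
x0, step = solve_linear_congruence(2, 5, 3)
for t in range(4):
    x = x0 + t * step
    y = (5 - 2 * x) // 3
    print((x, y), 2 * x + 3 * y)
```

The sketch returns the smallest non-negative solution rather than the x = 10 found by inspection in the example, but both describe the same family of solutions.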
4.4.2 Wilson’s theorem

I shall finish off this section with an application of congruences to primes. It is the first hint of hidden patterns in the primes. We need some notation first. For each natural number n define n!, pronounced n factorial, or if you are more extrovert n shriek, as follows: 0! = 1 and for n > 0 define n! = n · (n − 1)!. In other words, n! is what you get when you multiply together all the positive integers less than or equal to n. For each natural number n ≥ 2, we shall be interested in the value of (n − 1)! modulo n. Observe that there is no point in studying n! (mod n) since the answer is always 0. It’s worth doing some numerical calculations first to see if you can spot a pattern.

Theorem 4.4.7 (Wilson’s Theorem). Let n ≥ 2 be a natural number. Then n is a prime if, and only if,

(n − 1)! ≡ n − 1 (mod n).

Since n − 1 ≡ −1 (mod n) this is usually expressed in the form (n − 1)! ≡ −1 (mod n).

Proof. The statement to be proved is an ‘if, and only if’ and so we have to prove two statements: (1) If n is prime then (n − 1)! ≡ n − 1 (mod n). (2) If (n − 1)! ≡ n − 1 (mod n) then n is prime.

We prove (1) first. Let n be a prime. The result is clearly true when n = 2 so we may assume n is an odd prime. By Corollary 4.4.5, for each 1 ≤ a ≤ n − 1 there is a unique number 1 ≤ b ≤ n − 1 such that ab ≡ 1 (mod n). If a = b then $a^2 \equiv 1$ (mod n) which means that n | (a − 1)(a + 1). Since n is a prime either n | a − 1 or n | a + 1. This can only occur if a = 1 or a = n − 1. The remaining numbers 2, ..., n − 2 therefore fall into pairs of distinct numbers whose product is congruent to 1. Thus (n − 1)! ≡ 1 · (n − 1) ≡ n − 1 (mod n), as claimed.

We now prove (2). Suppose that (n − 1)! ≡ n − 1 (mod n). We prove that n is a prime. When n = 4, we get that (4 − 1)! ≡ 2 (mod 4), which is not congruent to 3, and so n ≠ 4. Suppose that n > 4 is not prime. Then n = ab where 1 < a, b < n. If a ≠ b then a and b both occur as factors of (n − 1)! and so this is congruent to 0 modulo n. If a = b then a > 2, so a occurs in (n − 1)! and so does 2a, since $2a < a^2 = n$. Thus n is again a factor of (n − 1)!. In both cases (n − 1)! ≡ 0 (mod n), which is not congruent to n − 1. This contradicts our assumption, and so n is prime.
This theorem is interesting for another reason. To show that a number is prime, we would usually apply the algorithm we described earlier which is just a systematic way of carrying out trial division. This theorem shows that a number is prime in a completely different way. Although it is not a practical test for deciding whether a number is prime or composite, since n! gets very big very quickly, it shows that there might be backdoor ways of showing that a number is prime. This is a very important question in the light of the rôle of prime numbers in cryptography.
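Wilson's theorem can at least be tried out on small numbers. The Python sketch below reduces modulo n after every multiplication, so the huge value of (n − 1)! is never formed; it still needs about n multiplications, which is far more work than trial division. The function name `wilson_is_prime` is invented for this illustration.

```python
def wilson_is_prime(n):
    """Test primality of n using Wilson's theorem: (n - 1)! ≡ n - 1 (mod n)."""
    if n < 2:
        return False
    factorial_mod_n = 1
    for k in range(2, n):
        # Reduce at each step so the numbers involved stay below n * n.
        factorial_mod_n = (factorial_mod_n * k) % n
    return factorial_mod_n == n - 1


print([n for n in range(2, 30) if wilson_is_prime(n)])
```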
4.5 *Continued fractions
The goal of this section is to show how some of the ideas we have introduced so far can interact with each other. The material we cover is not needed elsewhere in this book.
4.5.1 Fractions of fractions

We return to an earlier calculation. We used Euclid’s algorithm to calculate gcd(19, 7) as follows.

19 = 7 · 2 + 5
7 = 5 · 1 + 2
5 = 2 · 2 + 1
2 = 1 · 2 + 0

We first rewrite each line, except the last, as follows

$\frac{19}{7} = 2 + \frac{5}{7}$
$\frac{7}{5} = 1 + \frac{2}{5}$
$\frac{5}{2} = 2 + \frac{1}{2}$

Take the first equality

$\frac{19}{7} = 2 + \frac{5}{7}.$

But $\frac{5}{7}$ is the reciprocal of $\frac{7}{5}$, and from the second equality

$\frac{7}{5} = 1 + \frac{2}{5}.$

If we combine them, we get

$\frac{19}{7} = 2 + \cfrac{1}{1 + \cfrac{2}{5}}$

however strange this may look. We may repeat the process to get

$\frac{19}{7} = 2 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{2}}}$

Fractions like this are called continued fractions. Suppose I just gave you

$2 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{2}}}$

You could work out what the usual rational expression was by working from the bottom up. First compute the part in bold below

$2 + \cfrac{1}{1 + \cfrac{1}{\mathbf{2 + \cfrac{1}{2}}}}$

to get

$2 + \cfrac{1}{1 + \cfrac{1}{\frac{5}{2}}}$

which simplifies to

$2 + \cfrac{1}{1 + \cfrac{2}{5}}$

This process can now be repeated and we shall eventually obtain a standard fraction.

I am not going to develop the theory of continued fractions, but I shall show you one more application. Let r be a real number. We may write r as $r = m_1 + r_1$ where $0 \le r_1 < 1$. For example, π may be written as π = 3·14159265358... where here $m_1 = 3$ and $r_1$ = 0·14159265358.... Now $r_1 < 1$; assume that it is non-zero. Then $\frac{1}{r_1} > 1$. We may therefore repeat the above process and write $\frac{1}{r_1} = m_2 + r_2$ where once again $r_2 < 1$. This begins to feel an awful lot like what we did above. In fact, we may write

$r = m_1 + \cfrac{1}{m_2 + r_2},$

and we can continue the above process with $r_2$. It looks like we would obtain a continued fraction representation of r with the big difference that it could be infinite. Here is a concrete example.

Example 4.5.1. We apply the above process to $\sqrt{3}$. Clearly, $1 < \sqrt{3} < 2$. Thus

$\sqrt{3} = 1 + (\sqrt{3} - 1)$

where $\sqrt{3} - 1 < 1$. We now focus on

$\frac{1}{\sqrt{3} - 1}.$

To convert this into a more usable form we multiply top and bottom by $\sqrt{3} + 1$. We therefore get that

$\frac{1}{\sqrt{3} - 1} = \frac{1}{2}(\sqrt{3} + 1).$

It is clear that $1 < \frac{1}{2}(\sqrt{3} + 1) < 1\frac{1}{2}$. Thus

$\frac{1}{\sqrt{3} - 1} = 1 + \frac{\sqrt{3} - 1}{2}.$

We now focus on

$\frac{2}{\sqrt{3} - 1}$

which simplifies to $\sqrt{3} + 1$. Clearly

$2 < \sqrt{3} + 1 < 3.$

Thus $\sqrt{3} + 1 = 2 + (\sqrt{3} - 1)$. However, we have now gone full circle. Let’s see what we have obtained. We have that

$\sqrt{3} = 1 + \cfrac{1}{1 + \cfrac{1}{2 + (\sqrt{3} - 1)}}.$

However, we saw above that the pattern repeats at $\sqrt{3} - 1$, so what we actually have is

$\sqrt{3} = 1 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1 + \ldots}}}.$

Let’s see where we are by computing

$1 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1}}}$

which simplifies to $\frac{7}{4}$. You can check that this is an approximation to $\sqrt{3}$.
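Both directions of the calculation, from a fraction to its continued fraction and back again, are short enough to sketch in Python. The function names `continued_fraction` and `evaluate` are invented for this illustration, and exact arithmetic is done with `fractions.Fraction` from the standard library.

```python
from fractions import Fraction


def continued_fraction(x):
    """Partial quotients of a positive rational x, read off Euclid's algorithm."""
    x = Fraction(x)
    num, den = x.numerator, x.denominator
    quotients = []
    while den != 0:
        q, r = divmod(num, den)  # num = den * q + r, one line of Euclid's algorithm
        quotients.append(q)
        num, den = den, r
    return quotients


def evaluate(quotients):
    """Collapse the list [m1, m2, ..., mk] into an ordinary fraction, bottom up."""
    value = Fraction(quotients[-1])
    for m in reversed(quotients[:-1]):
        value = m + 1 / value
    return value


print(continued_fraction(Fraction(19, 7)))
print(evaluate([2, 1, 2, 2]))
print(evaluate([1, 1, 2, 1]), float(evaluate([1, 1, 2, 1])), 3 ** 0.5)
```

The last line evaluates the truncated continued fraction for √3 from Example 4.5.1 and prints it next to a decimal value of √3 for comparison.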
4.5.2 Rabbits and pentagons

We now illustrate some of the ways that algebra and geometry may interact. We begin with an artificial looking question. In his book, Liber Abaci, Fibonacci raised the following little puzzle which I’ve taken from MacTutor:

“A certain man put a pair of rabbits in a place surrounded on all sides by a wall. How many pairs of rabbits can be produced from that pair in a year if it is supposed that every month each pair begets a new pair which from the second month on becomes productive?”

These are obviously mathematical rabbits rather than real ones so let me spell out the rules more explicitly:

Rule 1 The problem begins with one pair of immature rabbits.²

Rule 2 Each immature pair of rabbits takes one month to mature.

Rule 3 Each mature pair of rabbits produces a new immature pair at the end of a month.

Rule 4 The rabbits are immortal.

² Fibonacci himself seems to have assumed that the starting pair was already mature but we shan’t.

The important point is that we must solve the problem using the rules we have been given. To do this, I am going to draw a picture. I will represent an immature pair of rabbits by ◦ and a mature pair by •. Rule 2 will be represented by

[Figure: ◦ with an arrow down to •.]

and Rule 3 will be represented by

[Figure: • with two arrows down, one to • and one to ◦.]

Rule 1 tells us that we start with ◦. Applying the rules we obtain the following picture for the first 4 months.

[Figure: tree of rabbit pairs, one row per month. Start: ◦ (1 pair). Month 1: • (1 pair). Month 2: • ◦ (2 pairs). Month 3: • ◦ • (3 pairs). Month 4: • ◦ • • ◦ (5 pairs).]
We start with 1 pair and at the end of the first month we still have 1 pair, at the end of the second month 2 pairs, at the end of the third month 3 pairs, and at the end of the fourth month 5 pairs. I shall write this $F_0 = 1$, $F_1 = 1$, $F_2 = 2$, $F_3 = 3$, $F_4 = 5$, and so on. Thus the problem will be solved if we can compute $F_{12}$. There is an apparent pattern in the sequence of numbers 1, 1, 2, 3, 5, ...: after the first two terms in the sequence each number is the sum of the previous two. Let’s check that we are not just seeing things. Suppose that the number of immature pairs of rabbits at a given time t is $I_t$ and the number of mature pairs is $M_t$, so that $F_t = M_t + I_t$. Then using our rules at time t + 1 we have that $M_{t+1} = M_t + I_t$ and $I_{t+1} = M_t$. Thus

$F_{t+1} = 2M_t + I_t.$

Similarly $F_{t+2} = 3M_t + 2I_t$. It is now easy to check that

$F_{t+2} = F_{t+1} + F_t.$
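The bookkeeping with mature and immature pairs can also be replayed mechanically, month by month, without using the recurrence for the totals at all. Here is a minimal Python sketch; the function name `rabbit_pairs` is invented for this illustration.

```python
def rabbit_pairs(months):
    """Total number of pairs after the given number of months, following Rules 1 to 4."""
    immature, mature = 1, 0  # Rule 1: start with one immature pair
    for _ in range(months):
        # Rule 2: the immature pairs mature.
        # Rule 3: each mature pair produces one new immature pair.
        # Rule 4: nobody dies, so the old mature pairs are all still there.
        immature, mature = mature, mature + immature
    return immature + mature


print([rabbit_pairs(t) for t in range(5)])
print(rabbit_pairs(12))
```

The first line of output can be compared with the picture of the first four months, and the second answers Fibonacci's original question.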
The sequence of numbers such that $F_0 = 1$, $F_1 = 1$ and satisfying the rule $F_{t+2} = F_{t+1} + F_t$ is called the Fibonacci sequence. We have that

$F_0 = 1, F_1 = 1, F_2 = 2, F_3 = 3, F_4 = 5, F_5 = 8, F_6 = 13, F_7 = 21,$

$F_8 = 34, F_9 = 55, F_{10} = 89, F_{11} = 144, F_{12} = 233.$

The solution to the original question is therefore 233 pairs of rabbits.

Fibonacci numbers arise in the most diverse situations: famously, in phyllotaxis which is the study of how leaves and petals are arranged on plants. We shall now look for a formula that will enable us to calculate $F_n$ directly. To begin, we’ll follow an idea due to the astronomer Johannes Kepler, and look at the behaviour of the fractions $\frac{F_{n+1}}{F_n}$ as n gets bigger and bigger. I have tabulated some calculations below.

Ratio   F1/F0   F2/F1   F3/F2   F4/F3   F5/F4   F6/F5   F7/F6   F14/F13
Value   1       2       1·5     1·667   1·6     1·625   1·615   1·6180

These ratios seem to be going somewhere; the question is: where? Notice that

$\frac{F_{n+1}}{F_n} = \frac{F_n + F_{n-1}}{F_n} = 1 + \frac{F_{n-1}}{F_n} = 1 + \cfrac{1}{\frac{F_n}{F_{n-1}}}.$

But for very large n we suspect that $\frac{F_{n+1}}{F_n}$ and $\frac{F_n}{F_{n-1}}$ will be almost the same. This suggests, but doesn’t prove, that we need to find the positive solution x to

$x = 1 + \frac{1}{x}.$

Thus x is a number that when you take its reciprocal and add 1 you get x back again. This problem is really a quadratic equation in disguise

$x^2 = x + 1$ or more usually $x^2 - x - 1 = 0$.
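This equation can be solved very simply to give us
$$x = \frac{1 \pm \sqrt{5}}{2}.$$
That is
$$\phi = \frac{1 + \sqrt{5}}{2} \quad\text{and}\quad \bar{\phi} = \frac{1 - \sqrt{5}}{2}.$$
The number $\phi$ is called the golden ratio, about which a deal of nonsense has been written. Let’s go back and see if this calculation makes sense. First we calculate $\phi$ and we get
$$\phi = 1.618033988\ldots$$
I compute
$$\frac{F_{19}}{F_{18}} = \frac{6765}{4181} = 1.618033963$$
on my pocket calculator. This is pretty close. We can now get our formula for the Fibonacci numbers. Define
$$f_n = \frac{1}{\sqrt{5}}\left(\phi^{n+1} - \bar{\phi}^{n+1}\right).$$
I’m going to show you that $F_n = f_n$. To do this, I’ll use the following identities which are straightforward to check
$$\phi - \bar{\phi} = \sqrt{5}, \qquad \phi^2 = \phi + 1 \quad\text{and}\quad \bar{\phi}^2 = \bar{\phi} + 1.$$
Let’s start with $f_0$. We know that $\phi - \bar{\phi} = \sqrt{5}$ and so we really do have that $f_0 = 1$. To calculate $f_1$ we use the other formulae and again we get $f_1 = 1$. We now calculate $f_n + f_{n+1}$ and get
$$\begin{aligned}
f_n + f_{n+1} &= \frac{1}{\sqrt{5}}\left(\phi^{n+1} - \bar{\phi}^{n+1}\right) + \frac{1}{\sqrt{5}}\left(\phi^{n+2} - \bar{\phi}^{n+2}\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+1} + \phi^{n+2} - (\bar{\phi}^{n+1} + \bar{\phi}^{n+2})\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+1}(1 + \phi) - \bar{\phi}^{n+1}(1 + \bar{\phi})\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+1}\phi^2 - \bar{\phi}^{n+1}\bar{\phi}^2\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+3} - \bar{\phi}^{n+3}\right) = f_{n+2}.
\end{aligned}$$
Because $f_n$ and $F_n$ start in the same place and satisfy the same rules, we have therefore proved that
$$F_n = \frac{1}{\sqrt{5}}\left(\phi^{n+1} - \bar{\phi}^{n+1}\right).$$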
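The agreement between the recurrence and the closed formula can also be checked numerically. The following is a minimal sketch in Python; the function names `fib_recurrence` and `fib_closed_form` are illustrative choices of mine, and the closed form is evaluated in floating point and rounded, so it should only be trusted for moderately small $n$.

```python
from math import sqrt

def fib_recurrence(n):
    """Return F_n using F_0 = F_1 = 1 and F_{t+2} = F_{t+1} + F_t."""
    a, b = 1, 1  # a = F_0, b = F_1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed_form(n):
    """Return (phi^(n+1) - phibar^(n+1)) / sqrt(5), rounded to the nearest integer."""
    phi = (1 + sqrt(5)) / 2
    phibar = (1 - sqrt(5)) / 2
    return round((phi ** (n + 1) - phibar ** (n + 1)) / sqrt(5))

# The two definitions agree on the values tabulated in the text.
for n in range(13):
    assert fib_recurrence(n) == fib_closed_form(n)

print(fib_recurrence(12))                        # number of rabbit pairs
print(fib_recurrence(19) / fib_recurrence(18))   # ratio close to the golden ratio
```

The loop uses exact integer arithmetic, so it is the safer of the two for large $n$; the closed form is mainly of theoretical interest.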
At this point, we can go back and verify our original idea that the fractions $\frac{F_{n+1}}{F_n}$ seem to get closer and closer to $\phi$ as $n$ gets larger and larger. We have that
$$\frac{F_{n+1}}{F_n} = \frac{\phi^{n+2} - \bar{\phi}^{n+2}}{\phi^{n+1} - \bar{\phi}^{n+1}} = \frac{\phi}{1 - \left(\frac{\bar{\phi}}{\phi}\right)^{n+1}} - \frac{\bar{\phi}}{\left(\frac{\phi}{\bar{\phi}}\right)^{n+1} - 1}.$$
I have rewritten it like this so that we can see what happens as $n$ gets larger and larger. Observe that the absolute value of $\frac{\bar{\phi}}{\phi}$ is less than 1. So as $n$ gets larger and larger the first term above gets closer and closer to $\phi$. Now look at the second term. The absolute value of the fraction $\frac{\phi}{\bar{\phi}}$ is strictly greater than 1. Thus as $n$ gets larger and larger the denominator of the second term gets larger and larger and so the fraction as a whole gets smaller and smaller. Thus we have proved that $\frac{F_{n+1}}{F_n}$ really is close to $\phi$ when $n$ is large.

So far, what we have been doing is algebra. I shall now show that there is geometry here as well. Below is a picture of a regular pentagon. I have assumed that the length of the sides is 1. I claim that the length of a diagonal, such as $BE$, is equal to $\phi$.
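[Figure: a regular pentagon with vertices A, B, C, D, E, a side marked with length 1 and a diagonal marked with length φ.]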
To prove this I am going to use Ptolemy’s theorem. We shall concentrate on the cyclic quadrilateral formed by the vertices $ABDE$.
[Figure: the regular pentagon ABCDE, showing the cyclic quadrilateral ABDE.]
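I’ll let the length of a diagonal be $x$. In the quadrilateral $ABDE$ the diagonals $AD$ and $BE$ both have length $x$, the opposite sides $AB$ and $DE$ both have length 1, and the opposite sides $BD$ and $EA$ have lengths $x$ and 1. Then by Ptolemy’s theorem, which equates the product of the diagonals with the sum of the products of the opposite sides, we have that
$$x^2 = 1 + x.$$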
But this is precisely the quadratic equation we solved above. Its positive solution is φ and so the length of a diagonal of a regular pentagon with side 1 is φ. This raises the question of whether we can somehow see the Fibonacci numbers in the regular pentagon. The answer is: almost. Consider the diagram below.
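[Figure: the regular pentagon ABCDE with all its diagonals drawn; the points where the diagonals cross are labelled a′, b′, c′, d′, e′, and the triangles BCD and Ac′E are shaded.]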
I’ve drawn in all the diagonals. The shaded triangle $BCD$ is similar to the shaded triangle $Ac'E$. This means that they have exactly the same shapes just different sizes. It follows that
$$\frac{Ac'}{AE} = \frac{BC}{BD}.$$
But $AE$ is a side of the pentagon and so has unit length, and $BD$ is of length $\phi$. Thus
$$Ac' = \frac{1}{\phi}.$$
Now, $Dc'$ has the same length as $BC$ which is a side of the pentagon. Thus $Dc' = 1$. We now have
$$\phi = DA = Dc' + c'A = 1 + \frac{1}{\phi}.$$
Thus, just from geometry, we get
$$\phi = 1 + \frac{1}{\phi}.$$
This is a very odd equation because $\phi$ is mentioned on both sides. Let’s go with it and repeat:
$$\phi = 1 + \cfrac{1}{1 + \cfrac{1}{\phi}}$$
and
$$\phi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{\phi}}}$$
and
$$\phi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{\phi}}}}$$
and so on. We therefore obtain a continued fraction. For each of these fractions cover up the term $\frac{1}{\phi}$ and then calculate what you see to get
$$1, \quad 1 + \frac{1}{1} = 2, \quad 1 + \cfrac{1}{1 + \cfrac{1}{1}} = \frac{3}{2}, \quad 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1}}} = \frac{5}{3}, \quad \ldots$$
and the Fibonacci sequence reappears.

Chapter 5

Complex numbers
Why be one-dimensional when you can be two-dimensional?

[Figure: the number line marked with the integers −3, −2, −1, 0, 1, 2, 3, with one question mark above the line and one below it.]

We begin by returning to the familiar number line. Where I have placed the question marks there appear to be no numbers. I shall rectify this by defining the complex numbers, which give us a number plane rather than just a number line. Complex numbers play a fundamental rôle in mathematics. For example, in this chapter, I shall use them to show how $e$ and $\pi$, numbers of radically different origins, are in fact connected.
5.1 Complex number arithmetic
In the set of real numbers we can add, subtract, multiply and divide, but we cannot always extract square roots. For example, the real number 1 has the two real square roots 1 and −1, whereas the real number −1 has no real square roots, the reason being that the square of any real non-zero number is always positive. In this section, we shall repair this lack of square roots and, as we shall learn, we shall in fact have achieved much more than this. Complex numbers were first studied in the 1500’s but were only fully accepted and used in the 1800’s.

Warning! If $r$ is a positive real number then $\sqrt{r}$ is usually interpreted to mean the positive square root. If I want to emphasize that both square roots need to be considered I shall write $\pm\sqrt{r}$.
When the discriminant of a quadratic equation is strictly less than zero, we know that it has no real roots. In this section, we shall show that in this case the equation has two complex roots. This will mean that quadratic equations will always have two roots. The key step is the following.

We introduce a new number, denoted by $i$, whose defining property is that $i^2 = -1$. We shall assume that in all other respects it satisfies the usual axioms of high-school algebra.

This assumption will be justified later. We shall now explore the consequences of this definition, which turns out to be a profound one for mathematics. The numbers $i$ and $-i$ are the two ‘missing’ square roots of $-1$. In all other respects, the number $i$ will behave like a real number. Thus if $b$ is any real number then $bi$ is a number, and if $a$ is any real number then $a + bi$ is a number. We therefore formally define a complex number to be a number of the form $a + bi$ where $a, b \in \mathbb{R}$. We denote the set of complex numbers by $\mathbb{C}$.

Complex numbers are sometimes called imaginary numbers. This is not such a good term: they are not figments of our imagination like unicorns or dragons. Like all numbers they are, however, products of our imagination: no one has seen the complex number $i$ but, then again, no one has seen the number 2.

If $z = a + bi$ then we call $a$ the real part of $z$, denoted $\mathrm{Re}(z)$, and $b$ the complex or imaginary part of $z$, denoted $\mathrm{Im}(z)$. Two complex numbers $a + bi$ and $c + di$ are equal precisely when $a = c$ and $b = d$. In other words, when their real parts are equal and when their complex parts are equal. We can think of every real number as being a special kind of complex number because if $a$ is real then $a = a + 0i$. Thus $\mathbb{R} \subseteq \mathbb{C}$. Complex numbers of the form $bi$ are said to be purely imaginary.

Now we show that we can add, subtract, multiply and divide complex numbers. Addition, subtraction and multiplication are all easy. Let $a + bi, c + di \in \mathbb{C}$.
To add these numbers means to calculate (a + bi) + (c + di). We assume that the order in which we add complex numbers doesn’t matter and that we may bracket sums of complex numbers how we like and still get the same answer and so we can rewrite this as a+c+bi+di. Next we assume that multiplication of complex numbers distributes over addition of complex numbers to get (a+c)+(b+d)i. Thus (a + bi) + (c + di) = (a + c) + (b + d)i. The definition of subtraction is similar and justified in the same way
(a + bi) − (c + di) = (a − c) + (b − d)i.
To multiply our numbers means to calculate $(a + bi)(c + di)$. We first assume complex multiplication distributes over complex addition to get $(a + bi)(c + di) = ac + adi + bic + bidi$. Next we assume that the order in which we multiply complex numbers doesn’t matter to get $ac + adi + bic + bidi = ac + adi + bci + bdi^2$. Now we use the fact that $i^2 = -1$ to get $ac + adi + bci + bdi^2 = ac + adi + bci - bd$. We now rearrange the terms to get the following definition of multiplication
$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i.$$
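The rules just derived for addition and multiplication can be written out directly. Here is a small sketch in Python in which a complex number $a + bi$ is represented by the pair `(a, b)`; the names `add` and `mul` are my own illustrative choices, and the last line compares the result with Python's built-in `complex` type.

```python
def add(z, w):
    """(a + bi) + (c + di) = (a + c) + (b + d)i."""
    a, b = z
    c, d = w
    return (a + c, b + d)

def mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

print(add((7, -1), (-6, 3)))   # (7 - i) + (-6 + 3i)
print(mul((2, 1), (1, 2)))     # (2 + i)(1 + 2i)
print(mul((0, 1), (0, 1)))     # i times i

# Cross-check against the built-in complex type.
assert complex(*mul((2, 1), (1, 2))) == (2 + 1j) * (1 + 2j)
```

Representing a complex number as a pair of reals makes it plain that nothing mysterious is going on: multiplication is just a rule for combining two pairs.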
Examples 5.1.1. Carry out the following calculations.
1. (7 − i) + (−6 + 3i). We add together the real parts to get 1; adding together −i and 3i we get 2i. Thus the solution is 1 + 2i.
2. $(2 + i)(1 + 2i)$. First we multiply out the brackets as usual to get $2 + 4i + i + 2i^2$. We now use the fact that $i^2 = -1$ to get $2 + 4i + i - 2$. Finally we simplify to get $0 + 5i = 5i$.

3. $\left(\frac{1 - i}{\sqrt{2}}\right)^2$. Multiply out and simplify to get $-i$.

The final operation is division. We have to show that when $a + ib \neq 0$ the reciprocal
$$\frac{1}{a + ib}$$
is also a complex number. We use an idea that can also be applied in other situations called rationalizing the denominator. It is convenient first to define a new operation on complex numbers. Let $z = a + bi \in \mathbb{C}$. Define
$$\bar{z} = a - bi.$$
The number $\bar{z}$ is called the complex conjugate of $z$. Why is this operation useful? Let’s calculate $z\bar{z}$. We have
$$z\bar{z} = (a + bi)(a - bi) = a^2 - abi + abi - b^2i^2 = a^2 + b^2.$$
Notice that $z\bar{z} = 0$ if and only if $z = 0$. Thus for non-zero complex numbers $z$, the number $z\bar{z}$ is a positive real number. Let’s see how we can use the complex conjugate to define division of complex numbers. Our goal is to calculate
$$\frac{1}{a + bi}$$
where $a + bi \neq 0$. The first step is to multiply top and bottom by the complex conjugate of $a + bi$. We therefore get
$$\frac{1}{a + bi} = \frac{a - bi}{(a + bi)(a - bi)} = \frac{a - bi}{a^2 + b^2} = \frac{1}{a^2 + b^2}(a - bi).$$
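The recipe of multiplying top and bottom by the complex conjugate can be sketched in a few lines of Python. This is only an illustration under my own conventions: a complex number $a + bi$ is the pair `(a, b)` with integer entries, the name `divide` is hypothetical, and exact fractions from the standard `fractions` module are used so that no rounding occurs.

```python
from fractions import Fraction

def divide(z, w):
    """Return z / w as a pair of exact fractions, where z = a + bi, w = c + di != 0."""
    a, b = z
    c, d = w
    denom = c * c + d * d          # w times its conjugate: a positive real number
    if denom == 0:
        raise ZeroDivisionError("division by the complex number 0")
    # Numerator: (a + bi)(c - di) = (ac + bd) + (bc - ad)i
    return (Fraction(a * c + b * d, denom), Fraction(b * c - a * d, denom))

print(divide((1, 1), (0, 1)))    # (1 + i) / i
print(divide((4, 3), (7, -1)))   # (4 + 3i) / (7 - i)
```

Working with `Fraction` rather than floating point keeps the answers in the form $a + ib$ with exact rational $a$ and $b$, which is convenient for checking hand calculations.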
Examples 5.1.2. Carry out the following calculations.
1. $\frac{1 + i}{i}$. The complex conjugate of $i$ is $-i$. Multiply top and bottom of the fraction to get $\frac{-i + 1}{1} = 1 - i$.

2. $\frac{i}{1 - i}$. The complex conjugate of $1 - i$ is $1 + i$. Multiply top and bottom of the fraction to get $\frac{i(1 + i)}{2} = \frac{i - 1}{2}$.

3. $\frac{4 + 3i}{7 - i}$. The complex conjugate of $7 - i$ is $7 + i$. Multiply top and bottom of the fraction to get $\frac{(4 + 3i)(7 + i)}{50} = \frac{1 + i}{2}$.

We shall need the following properties of the complex conjugate later on.
Lemma 5.1.3.

1. $\overline{z_1 + \ldots + z_n} = \bar{z}_1 + \ldots + \bar{z}_n$.

2. $\overline{z_1 \ldots z_n} = \bar{z}_1 \ldots \bar{z}_n$.

3. $z$ is real if and only if $\bar{z} = z$.

Proof. (1) We prove the case where $n = 2$. The general case can then be proved using induction. Let $z_1 = a + ib$ and $z_2 = c + id$. Then $z_1 + z_2 = (a + c) + i(b + d)$. Thus $\overline{z_1 + z_2} = (a + c) - i(b + d)$. But $\bar{z}_1 = a - ib$ and $\bar{z}_2 = c - id$ and so $\bar{z}_1 + \bar{z}_2 = (a - ib) + (c - id) = (a + c) - (b + d)i$. Thus $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$.

(2) We prove the case where $n = 2$. The general case can then be proved using induction. Using the notation from part (1), we have that $z_1z_2 = (ac - bd) + (ad + bc)i$. Thus $\overline{z_1z_2} = (ac - bd) - (ad + bc)i$. On the other hand, $\bar{z}_1\bar{z}_2 = (a - ib)(c - id) = (ac - bd) - i(ad + bc)$, as required.

(3) If $z$ is real then it is immediate that $\bar{z} = z$. Suppose that $\bar{z} = z$ where $z = a + ib$. Then $a + ib = a - ib$. Hence $b = -b$ and so $b = 0$. It follows that $z$ is real.
We now introduce a way of thinking about complex numbers that enables us to visualize them. A complex number z = a + bi has two components: a and b. It is irresistible to plot these as a point in the plane. The plane used in this way is called the complex plane: the x-axis is the real axis and the y-axis is interpreted as the complex axis.
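[Figure: the complex plane, showing the point z = a + ib, with a marked on the real axis and ib on the complex axis.]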
Although a complex number can be thought of as labelling a point in the complex plane, it can also be regarded as labelling the directed line segment from the origin to the point, and this turns out to be the more fruitful viewpoint. By Pythagoras’ theorem, the length of this line is $\sqrt{a^2 + b^2}$. We define
$$|z| = \sqrt{a^2 + b^2}$$
where $z = a + bi$. This is called the modulus (plural: moduli) of the complex number $z$. Observe that
$$|z| = \sqrt{z\bar{z}}.$$
We shall use the following important property of moduli.

Lemma 5.1.4. $|wz| = |w|\,|z|$.

Proof. Let $w = a + bi$ and $z = c + di$. Then $wz = (ac - bd) + (ad + bc)i$. Now $|wz| = \sqrt{(ac - bd)^2 + (ad + bc)^2}$ whereas $|w|\,|z| = \sqrt{(a^2 + b^2)(c^2 + d^2)}$. But $(ac - bd)^2 + (ad + bc)^2 = (ac)^2 + (bd)^2 + (ad)^2 + (bc)^2 = (a^2 + b^2)(c^2 + d^2)$. Thus the result follows.

The complex numbers were obtained from the reals by simply adjoining one new number, $i$, a square root of $-1$. Remarkably, every complex number has a square root: there is no need to invent any new numbers.

Theorem 5.1.5. Every nonzero complex number has exactly two square roots.

Proof. Let $z = a + bi$ be a nonzero complex number. We want to find a complex number $w$ so that $w^2 = z$. Let $w = x + yi$. Then we need to find real numbers $x$ and $y$ such that $(x + yi)^2 = a + bi$. Thus $(x^2 - y^2) + 2xyi = a + bi$, and so equating real and imaginary parts, we have to solve the following two equations
$$x^2 - y^2 = a \quad\text{and}\quad 2xy = b.$$
Now we actually have enough information to solve our problem, but we can make life easier for ourselves by adding one extra equation. To get it, we use the modulus function. From $(x + yi)^2 = a + bi$ we get that $|x + yi|^2 = |a + bi|$. Now $|x + yi|^2 = x^2 + y^2$ and $|a + bi| = \sqrt{a^2 + b^2}$. We therefore have three equations
$$x^2 - y^2 = a \quad\text{and}\quad 2xy = b \quad\text{and}\quad x^2 + y^2 = \sqrt{a^2 + b^2}.$$
If we add the first and third equation together we get
$$x^2 = \frac{a}{2} + \frac{\sqrt{a^2 + b^2}}{2} = \frac{a + \sqrt{a^2 + b^2}}{2}.$$
We can now solve for $x$ and therefore for $y$.

Example 5.1.6. Every negative real number has two square roots. We have that the square roots of $-r$, where $r > 0$, are $\pm i\sqrt{r}$.
Example 5.1.7. Find both square roots of $3 + 4i$ and check your answers. We assume that there is a complex number $x + yi$ where both $x$ and $y$ are real such that $(x + yi)^2 = 3 + 4i$. Squaring and comparing real and imaginary parts we get that the following two equations must be satisfied by $x$ and $y$
$$x^2 - y^2 = 3 \quad\text{and}\quad 2xy = 4.$$
We also have a third equation by taking moduli
$$x^2 + y^2 = 5.$$
Adding the first and third equation together we get $2x^2 = 8$ and so $x = \pm 2$. Thus $y = 1$ if $x = 2$ and $y = -1$ if $x = -2$. The roots we want are therefore $2 + i$ and $-2 - i$. Of course, one root will be minus the other. Now square either root to check your answer: $(2 + i)^2 = 4 + 4i - 1 = 3 + 4i$, as required.
Remark Notice that the two square roots of a non-zero complex number will have the form w and −w; in other words, one root will be −1 times the other.
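The method of Theorem 5.1.5 translates into a short routine. The sketch below is in Python and makes some assumptions of its own: the function name `square_roots` is hypothetical, the answers are floating-point approximations, and the value of $y$ is obtained by subtracting the first equation from the third, giving $y^2 = \frac{1}{2}\left(\sqrt{a^2 + b^2} - a\right)$, with the sign of $y$ then fixed by $2xy = b$.

```python
from math import sqrt, hypot

def square_roots(a, b):
    """Return the two square roots of a + bi as pairs (x, y) standing for x + yi."""
    modulus = hypot(a, b)            # |a + bi| = sqrt(a^2 + b^2)
    x = sqrt((a + modulus) / 2)      # from x^2 = (a + |z|) / 2
    y = sqrt((modulus - a) / 2)      # from y^2 = (|z| - a) / 2
    if b < 0:
        y = -y                       # 2xy = b: x and y have opposite signs
    return (x, y), (-x, -y)

print(square_roots(3, 4))    # the two square roots of 3 + 4i
print(square_roots(-9, 0))   # the two square roots of the negative real -9
```

As the Remark says, the second root is simply minus the first, so only one square root of a real number ever has to be extracted for each of $x$ and $y$.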
If we combine our method for solving quadratics with our method for determining the square roots of complex numbers, we have a method for finding the roots of quadratics with any coefficients, whether they be real or complex.
Example 5.1.8. Solve the quadratic equation
$$4z^2 + 4iz + (-13 - 16i) = 0.$$
The complex numbers obey the same algebraic laws as the reals and so we can solve this equation by completing the square or we can simply plug the numbers into the formula for the roots of a quadratic. Here I shall complete the square. First, we convert the equation into a monic one
$$z^2 + iz + \frac{(-13 - 16i)}{4} = 0.$$
Next, we observe that
$$\left(z + \frac{i}{2}\right)^2 = z^2 + iz - \frac{1}{4}.$$
Thus
$$z^2 + iz = \left(z + \frac{i}{2}\right)^2 + \frac{1}{4}.$$
Our equation therefore becomes
$$\left(z + \frac{i}{2}\right)^2 + \frac{1}{4} - \frac{13}{4} - 4i = 0.$$
We therefore have
$$\left(z + \frac{i}{2}\right)^2 = 3 + 4i.$$
Taking square roots of both sides using a previous calculation, we have that
$$z + \frac{i}{2} = 2 + i \quad\text{or}\quad -2 - i.$$
It follows that $z = 2 + \frac{i}{2}$ or $-2 - \frac{3i}{2}$. Now check that these roots really do work.
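The check can be delegated to a machine. The sketch below, in Python, takes the other route mentioned in the example, namely the formula for the roots of a quadratic rather than completing the square; the function name `solve_quadratic` is my own, and the square root of the discriminant is taken with the standard library function `cmath.sqrt`, so the answers are floating-point approximations.

```python
import cmath

def solve_quadratic(a, b, c):
    """Return both roots of a z^2 + b z + c = 0, where a != 0 and a, b, c may be complex."""
    disc = b * b - 4 * a * c        # the discriminant, possibly complex
    root = cmath.sqrt(disc)         # one of its two square roots
    return (-b + root) / (2 * a), (-b - root) / (2 * a)

z1, z2 = solve_quadratic(4, 4j, -13 - 16j)
print(z1, z2)

# Substitute back into the equation: both values should be (nearly) zero.
for z in (z1, z2):
    print(abs(4 * z * z + 4j * z + (-13 - 16j)))
```

Because the two square roots of the discriminant differ only by sign, it does not matter which one `cmath.sqrt` returns: the pair of roots is the same either way.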
Every quadratic equation ALWAYS has exactly two roots.
Exercises 5.1
1. Solve the following problems in complex number arithmetic. In each case, the answer should be in the form a + ib where a and b are real.
(a) $(2 + 3i) + (4 + i)$.
(b) $(2 + 3i)(4 + i)$.
(c) $(8 + 6i)^2$.
(d) $\frac{2 + 3i}{4 + i}$.
(e) $\frac{1}{i} + \frac{3}{1 + i}$.
(f) $\frac{3 + 4i}{3 - 4i} - \frac{3 - 4i}{4 + 4i}$.

2. Find the square roots of each of the following complex numbers and check your answers.

(a) $-i$.
(b) $-1 + i\sqrt{24}$.
(c) $-13 - 84i$.

3. Solve the following quadratic equations and check your answers.

(a) $x^2 + x + 1 = 0$.
(b) $2x^2 - 3x + 2 = 0$.
(c) $x^2 - (2 + 3i)x - 1 + 3i = 0$.
5.2 The fundamental theorem of algebra
We have proved that every quadratic equation has exactly two roots. The goal of this section is to generalize this result: I shall prove that every polynomial equation of degree $n$ has exactly $n$ roots. This result plays a key role in calculus where it is used (in its real version which I also describe) to prove that any rational function can be integrated using partial fractions. In this section, we shall work with arbitrary polynomials so I shall now recall some of the terminology needed to handle them. An expression
$$a_nx^n + a_{n-1}x^{n-1} + \ldots + a_1x + a_0$$
where the $a_i$ are complex numbers, called the coefficients, is called a polynomial. If all the coefficients are zero then the polynomial is identically zero and we shall call it the zero polynomial. We assume $a_n \neq 0$. The degree of this polynomial is $n$. We abbreviate this to deg. If $a_n = 1$ the polynomial is said to be monic. The term $a_0$ is called the constant term and the term $a_nx^n$ is called the leading term. Polynomials can be added, subtracted and multiplied. Two polynomials are equal if they have the same degree and the coefficients of terms of the same degree are equal.

• Polynomials of degree 1 are said to be linear.

• those of degree 2, quadratic.

• those of degree 3, cubic.

• those of degree 4, quartic.

• those of degree 5, quintic.

There are special terms for polynomials of degree higher than 5, if you want them.

Why are polynomials interesting? There are two answers to this question. First, they have widespread applications such as in helping to solve linear differential equations and in studying matrices. Second, a polynomial defines a function which is calculated in a very simple way using the operations of addition, subtraction and multiplication. However many, more complicated, functions can be usefully approximated by polynomial ones.

We denote by $\mathbb{C}[x]$ the set of polynomials with complex coefficients and by $\mathbb{R}[x]$ the set of polynomials with real coefficients. I will write $F[x]$ when the coefficients may be taken from either $F = \mathbb{R}$ or $F = \mathbb{C}$.
5.2.1 The remainder theorem

The addition, subtraction and multiplication of polynomials is easy. We shall therefore concentrate in this section on division. Let $f(x), g(x) \in F[x]$. We say that $g(x)$ divides $f(x)$, denoted by
$$g(x) \mid f(x),$$
if there is a polynomial $q(x) \in F[x]$ such that $f(x) = g(x)q(x)$. We say that $g(x)$ is a factor of $f(x)$. There are obvious similarities here with our work in Chapter 4.

Example 5.2.1. Let $f(x) = x^4 + 2x + 1$ and $g(x) = x + 1$. Then
$$x + 1 \mid x^4 + 2x + 1$$
since $x^4 + 2x + 1 = (x + 1)(x^3 - x^2 + x + 1)$.

In multiplying and dividing polynomials the following result is key.

Lemma 5.2.2. Let $f(x), g(x) \in F[x]$ be non-zero polynomials. Then
$$\deg f(x)g(x) = \deg f(x) + \deg g(x).$$

Proof. Let $f(x)$ have leading term $a_mx^m$ and let $g(x)$ have leading term $b_nx^n$. Then the leading term of $f(x)g(x)$ is $a_mb_nx^{m+n}$. Now $a_mb_n \neq 0$ and so the degree of $f(x)g(x)$ is $m + n$, as required.

The following result is analogous to the remainder theorem for integers, Lemma 4.1.1. I shall not prove it here.

Lemma 5.2.3 (Remainder theorem). Let $f(x)$ and $g(x)$ be polynomials in $F[x]$ where $\deg f(x) \geq \deg g(x)$. Then either $g(x) \mid f(x)$ or $f(x) = g(x)q(x) + r(x)$ where $\deg r(x) < \deg g(x)$.

Example 5.2.4. Let $f(x) = x^3 + x + 3$ and $g(x) = x^2 + x$. Then $x^3 + x + 3 = (x - 1)(x^2 + x) + (2x + 3)$. Here $x - 1$ is the quotient and $2x + 3$ is the remainder.

The following example is a reminder of how to carry out long division of polynomials. Remember that answers can always be checked by multiplying out.

Example 5.2.5. Divide $6x^4 + 5x^3 + 4x^2 + 3x + 2$ by $2x^2 + 4x + 5$ and so find the quotient and remainder. We set out the computation in the following form.
$$\begin{array}{r|l}
2x^2 + 4x + 5 & 6x^4 + 5x^3 + 4x^2 + 3x + 2
\end{array}$$
To get the term involving $6x^4$ we would have to multiply the lefthand side by $3x^2$. As a result we write down the following
$$\begin{array}{r|l}
 & 3x^2 \\ \hline
2x^2 + 4x + 5 & 6x^4 + 5x^3 + 4x^2 + 3x + 2 \\
 & 6x^4 + 12x^3 + 15x^2
\end{array}$$
We now subtract the lower righthand side from the upper and we get
$$\begin{array}{r|l}
 & 3x^2 \\ \hline
2x^2 + 4x + 5 & 6x^4 + 5x^3 + 4x^2 + 3x + 2 \\
 & 6x^4 + 12x^3 + 15x^2 \\ \cline{2-2}
 & -7x^3 - 11x^2 + 3x + 2
\end{array}$$
The procedure is now repeated with the new polynomial.
$$\begin{array}{r|l}
 & 3x^2 - \frac{7}{2}x \\ \hline
2x^2 + 4x + 5 & 6x^4 + 5x^3 + 4x^2 + 3x + 2 \\
 & 6x^4 + 12x^3 + 15x^2 \\ \cline{2-2}
 & -7x^3 - 11x^2 + 3x + 2 \\
 & -7x^3 - 14x^2 - \frac{35}{2}x \\ \cline{2-2}
 & 3x^2 + \frac{41}{2}x + 2
\end{array}$$
The procedure is repeated one more time with the new polynomial
$$\begin{array}{r|l}
 & 3x^2 - \frac{7}{2}x + \frac{3}{2} \quad \text{(quotient)} \\ \hline
2x^2 + 4x + 5 & 6x^4 + 5x^3 + 4x^2 + 3x + 2 \\
 & 6x^4 + 12x^3 + 15x^2 \\ \cline{2-2}
 & -7x^3 - 11x^2 + 3x + 2 \\
 & -7x^3 - 14x^2 - \frac{35}{2}x \\ \cline{2-2}
 & 3x^2 + \frac{41}{2}x + 2 \\
 & 3x^2 + \frac{12}{2}x + \frac{15}{2} \\ \cline{2-2}
 & \frac{29}{2}x - \frac{11}{2} \quad \text{(remainder)}
\end{array}$$
This is the end of the line because the new polynomial we obtain has degree strictly less than the polynomial we are dividing by. What we have shown is that
$$6x^4 + 5x^3 + 4x^2 + 3x + 2 = \left(2x^2 + 4x + 5\right)\left(3x^2 - \frac{7}{2}x + \frac{3}{2}\right) + \frac{29}{2}x - \frac{11}{2}.$$
You can verify this is true by multiplying out the righthand side.
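Long division of polynomials is mechanical enough to hand over to a short program. The following is a sketch in Python under conventions of my own: a polynomial is a list of coefficients running from the leading term down to the constant term, the name `poly_divmod` is hypothetical, and exact fractions are used so that quotients such as $-\frac{7}{2}$ are not rounded.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g, returning (quotient, remainder) as coefficient lists.

    Coefficients are listed from the leading term down to the constant term,
    and g must have a non-zero leading coefficient.
    """
    g = [Fraction(c) for c in g]
    remainder = [Fraction(c) for c in f]
    quotient = []
    while len(remainder) >= len(g):
        factor = remainder[0] / g[0]       # next term of the quotient
        quotient.append(factor)
        for i, c in enumerate(g):          # subtract factor * g, aligned at the top
            remainder[i] -= factor * c
        remainder.pop(0)                   # the leading term is now zero
    return quotient, remainder

q, r = poly_divmod([6, 5, 4, 3, 2], [2, 4, 5])
print(q)   # coefficients of the quotient
print(r)   # coefficients of the remainder
```

Each pass of the loop is one row of the long-division layout: find the multiple of the divisor that kills the leading term, subtract, and carry on with what is left. When the division is exact, the remainder comes back as a list of zeros.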
5.2.2 Roots of polynomials

Let $f(x) \in F[x]$. A number $r \in F$ is said to be a root or zero of $f(x)$ if $f(r) = 0$. The roots of $f(x)$ are the solutions of the equation $f(x) = 0$.

Example 5.2.6. The number 1 is a root of $x^{100} - 2x^{98} + 1$ because $1 - 2 + 1 = 0$.

Checking whether a number is a root is easy, but finding a root in the first place is trickier. The next result tells us that when we find roots of polynomials we are in fact determining linear factors. It is crucial to everything we shall do.
Proposition 5.2.7. Let r ∈ F . Then r is a root of f(x) ∈ F [x] if and only if (x − r) | f(x).
Proof. Suppose that (x − r) | f(x). Then by definition f(x) = (x − r)q(x) for some polynomial q(x). If we now calculate f(r) we see immediately that it must be zero. We now prove the converse. Suppose that r is a root of f(x). By the remainder theorem, either (x − r) | f(x) or f(x) = q(x)(x − r) + r(x) where deg(r(x)) < deg(x − r) = 1. If the former then we are done. If the latter then it follows that r(x) is in fact a constant (that is, just a number). Call this number a. If we calculate f(r) we get a. It follows that in fact a = 0 and so (x − r) | f(x).
Example 5.2.8. We have seen that the number 1 is a root of $x^{100} - 2x^{98} + 1$. Thus by the above result $(x - 1) \mid x^{100} - 2x^{98} + 1$.

A root $r$ of a polynomial $f(x)$ is said to have multiplicity $m$ if
$$(x - r)^m \mid f(x)$$
but $(x - r)^{m+1}$ does not divide $f(x)$. A root is always counted according to its multiplicity.

Example 5.2.9. The polynomial $x^2 + 2x + 1$ has $-1$ as a root and no other roots. However $(x + 1)^2 = x^2 + 2x + 1$ and so the root $-1$ occurs with multiplicity 2. Thus the polynomial has two roots counting multiplicities. This is the sense in which we can say that a quadratic equation always has two roots.
The following result is extremely useful. It provides an upper bound to the number of roots a polynomial may have.
Theorem 5.2.10. A non-constant polynomial of degree n has at most n roots.
Proof. Let $f(x)$ be a non-zero polynomial of degree $n > 0$. Suppose that $f(x)$ has a root $a$. Then $f(x) = (x - a)f_1(x)$ by Proposition 5.2.7 and the degree of $f_1(x)$ is $n - 1$. This argument can be repeated and we reach the desired conclusion.
5.2.3 The fundamental theorem of algebra

The big question I have so far not dealt with is whether a polynomial need have a root at all. This is answered by the following theorem whose name reflects its importance when first discovered, though not its significance in modern algebra. We shall not give a proof because that would require more advanced methods than are covered in this book. It was first proved by Gauss.

Theorem 5.2.11 (Fundamental theorem of algebra (FTA)). Every non-constant polynomial of degree $n$ with complex coefficients has a root.

The fundamental theorem of algebra has the following important consequence using Theorem 5.2.10.

Corollary 5.2.12. Every non-constant polynomial with complex coefficients of degree $n$ has exactly $n$ complex roots (counting multiplicities). Thus every such polynomial can be written as a product of linear polynomials.

Proof. Let $f(x)$ be a non-constant polynomial of degree $n$. By the FTA, this polynomial has a root $r_1$. Thus $f(x) = (x - r_1)f_1(x)$ where $f_1(x)$ is a polynomial of degree $n - 1$. This argument can be repeated and we eventually end up with $f(x) = a(x - r_1) \ldots (x - r_n)$ where $a$ is the last quotient, necessarily a complex number.

Example 5.2.13. It can be checked that the quartic $x^4 - 5x^2 - 10x - 6$ has roots $-1$, $3$, $i - 1$ and $-1 - i$. We can therefore write
$$x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x + 1 + i)(x + 1 - i).$$
In many practical examples, our polynomials will have real coefficients and we will want any factors of the polynomial to likewise be real. The result above doesn’t do that because it could produce complex factors. However, we can rectify this situation at a very small price. We shall use the notion of the complex conjugate of a complex number that we introduced earlier. We may now prove the following key lemma.
Lemma 5.2.14. Let $f(x)$ be a polynomial with real coefficients. If the complex number $z$ is a root then so too is $\bar{z}$.

Proof. Let
$$f(x) = a_nx^n + a_{n-1}x^{n-1} + \ldots + a_1x + a_0$$
where the $a_i$ are real numbers. Let $z$ be a complex root. Then
$$0 = a_nz^n + a_{n-1}z^{n-1} + \ldots + a_1z + a_0.$$
Take the complex conjugate of both sides and use the properties of the complex conjugate from Lemma 5.1.3, remembering that each $a_i$ is real and so equal to its own conjugate, to get
$$0 = a_n\bar{z}^n + a_{n-1}\bar{z}^{n-1} + \ldots + a_1\bar{z} + a_0$$
and so $\bar{z}$ is also a root.
Example 5.2.15. We saw above that
$$x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x + 1 + i)(x + 1 - i).$$
Observe that the complex roots $-1 - i$ and $-1 + i$ are complex conjugates of each other.

Lemma 5.2.16. Let $z$ be a complex number which is not real. Then
$$(x - z)(x - \bar{z})$$
is an irreducible quadratic with real coefficients. On the other hand, if $x^2 + bx + c$ is an irreducible quadratic with real coefficients then its roots are complex conjugates of each other.

Proof. To prove the first claim, we multiply out to get
$$(x - z)(x - \bar{z}) = x^2 - (z + \bar{z})x + z\bar{z}.$$
Observe that $z + \bar{z}$ and $z\bar{z}$ are both real numbers. The discriminant of this polynomial is $(z + \bar{z})^2 - 4z\bar{z} = (z - \bar{z})^2$. You can check that if $z$ is complex and non-real then $z - \bar{z}$ is purely imaginary. It follows that its square is negative. We have therefore shown that our quadratic is irreducible. The proof of the second claim follows from the formula for the roots of a quadratic combined with the fact that the square root of a negative real will have the form $\pm\alpha i$ where $\alpha$ is real.
Example 5.2.17. We saw above that
$$x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x + 1 + i)(x + 1 - i).$$
Multiply out $(x + 1 + i)(x + 1 - i)$ and we get $x^2 + 2x + 2$. Thus
$$x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x^2 + 2x + 2)$$
with all the polynomials involved being real.

The following theorem is the one that we can use to help us solve problems involving real polynomials.

Theorem 5.2.18 (Fundamental theorem of algebra for real polynomials). Every non-constant polynomial with real coefficients can be written as a product of polynomials with real coefficients which are either linear or irreducible quadratic.

Proof. We can write the polynomial as a product of linear polynomials. Bring the real linear factors to the front. The remaining linear polynomials will have complex coefficients. They correspond to roots that come in complex conjugate pairs. Multiplying together those complex linear factors corresponding to complex conjugate roots we get real quadratics and the result is proved.

In fact, we can write any real polynomial as a real number times a product of monic linear and quadratic factors. This result is the basis of the method of partial fractions used in integrating rational functions in calculus. This is discussed in Chapter 6.

Finding the exact roots of a polynomial is difficult, in general. However, the following result tells us how to find the rational roots of polynomials with integer coefficients. It is a nice, and perhaps unexpected, application of the number theory we developed in Chapter 4.

Theorem 5.2.19 (Rational root theorem). Let
$$f(x) = a_nx^n + a_{n-1}x^{n-1} + \ldots + a_1x + a_0$$
be a polynomial with integer coefficients. If $\frac{r}{s}$ is a root with $r$ and $s$ coprime then $r \mid a_0$ and $s \mid a_n$. In particular, if the polynomial is monic then any rational roots must be integers and divide the constant term.

Proof. Substituting $\frac{r}{s}$ into $f(x)$ we have, by assumption, that
$$0 = a_n\left(\frac{r}{s}\right)^n + a_{n-1}\left(\frac{r}{s}\right)^{n-1} + \ldots + a_1\left(\frac{r}{s}\right) + a_0.$$
Multiply through by $s^n$ to get
$$0 = a_nr^n + a_{n-1}sr^{n-1} + \ldots + a_1s^{n-1}r + a_0s^n.$$
We now make two observations. First, $r \mid a_0s^n$. I claim that $r$ and $s^n$ are coprime. We may now deduce that $r \mid a_0$ from a previous exercise. It only remains to prove the claim. Let $p$ be any prime that divides $r$ and $s^n$. Then by Euclid’s lemma, $p$ divides $r$ and $s$, which is a contradiction since $r$ and $s$ are coprime. It follows that $r \mid a_0$. Second, $s \mid a_nr^n$. By a similar argument to the previous case $s \mid a_n$.

Example 5.2.20. Find all the roots of the following polynomial
$$x^4 - 8x^3 + 23x^2 - 28x + 12.$$
The polynomial is monic and so the only possible rational roots are integers and must divide 12. Thus the only possible rational roots are
$$\pm 1, \pm 2, \pm 3, \pm 4, \pm 6, \pm 12.$$
We find immediately that 1 is a root and so $(x - 1)$ must be a factor. Dividing out by this factor we get the quotient
$$x^3 - 7x^2 + 16x - 12.$$
We check this polynomial for rational roots and find 2 works. Dividing out by $(x - 2)$ we get the quotient
$$x^2 - 5x + 6.$$
Once we get down to a quadratic we can solve it directly. In this case it factorizes as $(x - 2)(x - 3)$. We therefore have that
$$x^4 - 8x^3 + 23x^2 - 28x + 12 = (x - 1)(x - 2)^2(x - 3).$$
At this point, multiply out the righthand side and check that we really do have an equality. In this case, all roots are rational and are 1, 2, 2, 3.
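The search for rational roots is a finite one, so it can be automated. The sketch below is in Python and rests on a few assumptions of mine: the polynomial is a list of integer coefficients from the leading term down to the constant term, both of those end coefficients are non-zero, and the names `divisors`, `evaluate` and `rational_roots` are hypothetical. The polynomial is evaluated by Horner's rule, a standard technique that is not used in the text itself, and multiplicities are not reported.

```python
from fractions import Fraction

def divisors(n):
    """Return the positive divisors of the non-zero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule (leading coefficient first)."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def rational_roots(coeffs):
    """Return the distinct rational roots, testing only the candidates r/s
    with r dividing the constant term and s dividing the leading coefficient."""
    a_n, a_0 = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * r, s)
                  for r in divisors(a_0)
                  for s in divisors(a_n)
                  for sign in (1, -1)}
    return sorted(x for x in candidates if evaluate(coeffs, x) == 0)

print(rational_roots([1, -8, 23, -28, 12]))
```

For the quartic of Example 5.2.20 the candidates are exactly the twelve numbers $\pm 1, \pm 2, \pm 3, \pm 4, \pm 6, \pm 12$; the repeated root 2 shows up only once in the output, and its multiplicity still has to be found by dividing out.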
Exercises 5.2
1. Find the quotient and remainder when the first polynomial is divided by the second.

(a) $x^3 - 7x - 1$ and $x - 2$.
(b) $x^4 - 2x^2 - 1$ and $x^2 + 3x - 1$.
(c) $2x^3 - 3x^2 + 1$ and $x$.

2. Find all roots using the information given.

(a) 4 is a root of $3x^3 - 20x^2 + 36x - 16$.
(b) $-1, -2$ are both roots of $x^4 + 2x^3 + x + 2$.

3. Find a cubic having roots $2, -3, 4$.

4. Find a quartic having roots $i$, $-i$, $1 + i$ and $1 - i$.

5. The cubic $x^3 + ax^2 + bx + c$ has roots $\alpha$, $\beta$ and $\gamma$. Show that $a, b, c$ can each be written in terms of the roots.

6. $3 + i\sqrt{2}$ is a root of $x^4 + x^3 - 25x^2 + 41x + 66$. Find the remaining roots.

7. $1 - i\sqrt{5}$ is a root of $x^4 - 2x^3 + 4x^2 + 4x - 12$. Find the remaining roots.

8. Find all the roots of the following polynomials.

(a) $x^3 + x^2 + x + 1$.
(b) $x^3 - x^2 - 3x + 6$.
(c) $x^4 - x^3 + 5x^2 + x - 6$.

9. Write each of the following polynomials as a product of linear or quadratic real factors.

(a) $x^3 - 1$.
(b) $x^4 - 1$.
(c) $x^4 + 1$.

5.3 Complex number geometry
We have proved that every non-zero complex number has two square roots and from the fundamental theorem of algebra (FTA), we know that every non-zero complex number has three cube roots, and four fourth roots, and more generally n nth roots. However, we didn’t prove the FTA. The main goal of this section is to prove that every non-zero complex number has n nth-roots. To do this, we shall think about complex numbers in a geometric, rather than an algebraic, way. Throughout this section we shall not assume FTA. We shall only need Theorem 5.2.10: every polynomial of degree n has at most n roots.
5.3.1 sin and cos

We recall some well-known properties of the trigonometric functions sin and cos. First the addition formulae
$$\sin(\alpha + \beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta$$
and
$$\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta.$$
These formulae were important historically because they enabled unknown values of sin’s and cos’s to be calculated from known ones, and so they were useful in constructing trig tables in the days before calculators.

Angles are most naturally measured in radians rather than degrees; the system of angle measurement based on degrees is an historical accident. Why 360 degrees in a circle? You would have to ask the Ancient Babylonians. Recall that positive angles are measured in an anticlockwise direction. The sin and cos functions are periodic functions with period $2\pi$. This means that for all angles $\theta$
$$\sin(\theta + 2\pi n) = \sin\theta \quad\text{and}\quad \cos(\theta + 2\pi n) = \cos\theta$$
for all $n \in \mathbb{Z}$. This fact will be crucial in what follows.
5.3.2 The complex plane In this section, we shall describe in more detail an alternative way of thinking about complex numbers which turns out to be very fruitful. Recall that a 136 CHAPTER 5. COMPLEX NUMBERS complex number z = a + bi has two components: a and b. We can plot these as a point in the plane. The plane used in this way is called the complex plane: the x-axis is the real axis and the y-axis is interpreted as the complex axis. Although a complex number can be thought of as labelling a point in the complex plane, it can more usefully be regarded as labelling the directed line segment from the origin to the point. This is how we shall regard it. Let z = a + bi be a non-zero complex number and let θ be the angle that it makes with the positive reals. The length of z as a directed line segment in the complex plane is |z|, and by basic trig a = |z| cos θ and b = |z| sin θ. It follows that z = |z| (cos θ + i sin θ) .
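[Figure: the complex number z drawn as a directed line segment from the origin at angle θ to the positive real axis, with horizontal component |z| cos θ and vertical component i|z| sin θ.]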
Observe that |z| is a non-negative real number. This way of writing complex numbers is called the polar form. At this point, I need to clarify the only feature of complex numbers that causes confusion. I have already mentioned that the functions sin and cos are periodic. For that reason, there is not just one number θ that yields the complex number z but infinitely many of them: namely, all the numbers θ + 2πk where k ∈ Z. For this reason, we define the argument of z, denoted by arg z, not merely to be the single angle θ but the set
arg z = {θ + 2πk : k ∈ Z}.

The angle θ is chosen so that 0 ≤ θ < 2π and is called, for convenience, the principal argument. But note that books vary on what they choose to call the principal argument. This feature of the argument plays a crucial role when we come to calculate nth roots. Let w = r (cos θ + i sin θ) and z = s (cos φ + i sin φ) be two non-zero complex numbers. We shall calculate wz. We have that
wz = rs (cos θ + i sin θ) (cos φ + i sin φ) = rs[(cos θ cos φ − sin θ sin φ) + (sin θ cos φ + cos θ sin φ)i] but using the properties of the sin and cos functions this reduces to
wz = rs (cos(θ + φ) + i sin(θ + φ)) .
We thus have the following important result:
when two non-zero complex numbers are multiplied together their lengths are multiplied and their arguments are added.
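The boxed result is easy to check numerically. The following is a minimal Python sketch, assuming only the standard cmath module; the helper name polar_product and the sample values of w and z are my own choices for illustration. Note that cmath reports arguments in the range −π < θ ≤ π rather than 0 ≤ θ < 2π, which is one of the alternative conventions for the principal argument.

```python
import cmath

def polar_product(w: complex, z: complex) -> tuple[float, float]:
    """Return (length, angle) of w*z computed from the polar forms of w and z."""
    r, theta = cmath.polar(w)   # w = r (cos theta + i sin theta)
    s, phi = cmath.polar(z)     # z = s (cos phi + i sin phi)
    return r * s, theta + phi   # lengths are multiplied, arguments are added

w, z = 1 + 1j, -2 + 3j          # illustrative values only
length, angle = polar_product(w, z)
direct = w * z                          # ordinary multiplication
rebuilt = cmath.rect(length, angle)     # convert the polar answer back to a + bi
print(direct, rebuilt)
```

The two printed values should agree up to rounding error, whichever non-zero w and z are used.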
This result helps us to understand the meaning of i. Multiplication by i is the same as a rotation about the origin by a right angle. Multiplication by $i^2$ is therefore the same as a rotation about the origin by two right angles. But this is exactly the same as multiplication by −1.
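[Figure: the points 1, i, −1 and −i on the unit circle in the complex plane, each obtained from the previous one by a rotation through a right angle.]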
We may apply similar reasoning to explain geometrically why −1 × −1 = 1. We of course proved this algebraically in Chapter 3. Multiplication by −1 is interpreted as rotation about the origin by 180°. It follows that doing this twice takes us back to where we started and so is equivalent to multiplication by 1. The proof of the next theorem follows by induction from the result we proved above. But it is important to note that it is the result above that is really fundamental.
Theorem 5.3.1 (De Moivre). Let $n$ be a positive integer. If $z = r(\cos\theta + i\sin\theta)$ then
\[ z^n = r^n(\cos n\theta + i\sin n\theta). \]

Example 5.3.2. Observe that the set of complex numbers
\[ S^1 = \{\cos\theta + i\sin\theta : \theta \in \mathbb{R}\} \]
can be interpreted geometrically as being the unit circle with centre the origin in the complex plane. Thus every non-zero complex number is a real number times a complex number lying on the unit circle. The set $S^1$ has some interesting algebraic properties as well. Observe that if $u, v \in S^1$ then $uv \in S^1$, and that if $u \in S^1$ then $u^{-1} \in S^1$.

Our results above have nice applications in painlessly obtaining trigonometric identities.

Example 5.3.3. If you remember that when multiplying complex numbers in polar form you add their arguments, then you can easily reconstitute the identities we started with since
\[ (\cos\alpha + i\sin\alpha)(\cos\beta + i\sin\beta) = \cos(\alpha + \beta) + i\sin(\alpha + \beta). \]
This is helpful in getting both sines and signs right.

Example 5.3.4. Express $\cos 3\theta$ in terms of $\cos\theta$ and $\sin\theta$ using De Moivre’s Theorem. We have that
\[ (\cos\theta + i\sin\theta)^3 = \cos 3\theta + i\sin 3\theta. \]
However, we can expand the lefthand side to get
\[ \cos^3\theta + 3i\cos^2\theta\sin\theta + 3\cos\theta(i\sin\theta)^2 + (i\sin\theta)^3 \]
which simplifies to
\[ \left(\cos^3\theta - 3\cos\theta\sin^2\theta\right) + i\left(3\cos^2\theta\sin\theta - \sin^3\theta\right) \]
where we use the fact that $i^2 = -1$ and $i^3 = -i$ and $i^4 = 1$. Equating real and imaginary parts we get
\[ \cos 3\theta = \cos^3\theta - 3\cos\theta\sin^2\theta. \]
We also get the formula
\[ \sin 3\theta = 3\cos^2\theta\sin\theta - \sin^3\theta \]
for free.
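De Moivre’s theorem and the triple-angle identities of Example 5.3.4 can be spot-checked numerically. The sketch below is only an illustration: the function names de_moivre and cos3_sin3 and the test angle 0.7 are assumptions of mine, and a numerical check at one angle is of course not a proof.

```python
import math

def de_moivre(r: float, theta: float, n: int) -> complex:
    """Right-hand side of De Moivre's theorem: r^n (cos n*theta + i sin n*theta)."""
    return r**n * complex(math.cos(n * theta), math.sin(n * theta))

def cos3_sin3(theta: float) -> tuple[float, float]:
    """cos 3θ and sin 3θ computed from the identities of Example 5.3.4."""
    c, s = math.cos(theta), math.sin(theta)
    return c**3 - 3 * c * s**2, 3 * c**2 * s - s**3

theta = 0.7                                         # arbitrary test angle
z = 2 * complex(math.cos(theta), math.sin(theta))   # z = 2(cos θ + i sin θ)
print(z**5, de_moivre(2, theta, 5))                 # the two values should agree
print(cos3_sin3(theta), (math.cos(3 * theta), math.sin(3 * theta)))
```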
5.3.3 Arbitrary roots of complex numbers

In this section, we shall prove that every non-zero complex number has n nth roots: thus it has three cube roots, and four fourth roots and so on. We begin with a special case that turns out to give us almost all the information we need to solve the general case.

We shall also need the following important idea. The word radical simply means a square root, or a cube root, or a fourth root and so on. We regard the four basic operations of algebra (addition, subtraction, multiplication and division) together with the extraction of nth roots as purely algebraic operations. Although slightly failing as a precise definition, I shall say that a radical expression is an algebraic expression involving nth roots. For example, the formula for the roots of a quadratic describes the roots as radical expressions in terms of the coefficients of the quadratic. Thus a radical expression is supposed to be an explicit description of some real number. The following table gives some easy to find radical expressions for the sines and cosines of some well-known angles.
θ      sin θ    cos θ
0°     0        1
30°    1/2      √3/2
45°    1/√2     1/√2
60°    √3/2     1/2
90°    1        0
The nth roots of unity
We shall show that the number 1 has n nth roots; these are called the nth roots of unity. We know that the equation $z^n - 1 = 0$ has at most n roots, so all we need do is find n roots and we are home and dry. We begin with a motivating example.
Example 5.3.5. We find the three cube roots of 1. There are two ways of writing these roots: as trigonometric expressions and as radical expressions. Inscribe an equilateral triangle in the unit circle in the complex plane with 1 as one of its vertices. Then the other two roots are $\omega_1 = \cos 120^\circ + i\sin 120^\circ$, whose angle is obtained by dividing $2\pi$ by 3, and $\omega_2 = \cos 240^\circ + i\sin 240^\circ$, whose angle is twice $\frac{2\pi}{3}$. If we put $\omega = \omega_1$ then in fact $\omega_2 = \omega^2$. This is the trigonometric form of the roots.
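[Figure: the cube roots of unity 1, ω and ω² forming an equilateral triangle inscribed in the unit circle.]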
In this case, it is easy to write down the radical expressions for the roots as well since we already have radical expressions for sin 60° and cos 60°. We therefore have that
\[ \omega = \frac{1}{2}\left(-1 + i\sqrt{3}\right) \quad\text{and}\quad \omega^2 = -\frac{1}{2}\left(1 + i\sqrt{3}\right). \]
The general case is solved in a similar way to our example above using regular n-gons in the complex plane where one of the vertices is 1.

Theorem 5.3.6 (Roots of unity). The nth roots of unity are given by the following formula
\[ \cos\frac{2k\pi}{n} + i\sin\frac{2k\pi}{n} \]
for $k = 1, 2, \ldots, n$. These complex numbers are arranged uniformly on the unit circle and form a regular polygon with n sides: the cube roots of unity form an equilateral triangle, the fourth roots form a square, the fifth roots form a pentagon, and so on.

There is one point here that is potentially confusing. It is always possible, and easy, to write down trigonometric expressions for the nth roots of unity. Using such an expression, we can then write down numerical values of the nth roots to any desired degree of accuracy. Thus, from a purely practical point of view, we can find the nth roots of unity. It is also always possible to write down the radical expressions of the nth roots of unity but this is far from easy in general. In fact, it forms part of the advanced subject known as Galois theory.

Example 5.3.7. Gauss proved the following result which is highly non-trivial. You can verify that it is true by using a calculator, at least up to the limits of your calculator. It is a good example of a radical expression where, on this occasion, the only radicals that occur are square roots; the theory Gauss developed showed that this implied that the 17-gon could be constructed using only a ruler and compass.
\[ 16\cos\frac{2\pi}{17} = -1 + \sqrt{17} + \sqrt{34 - 2\sqrt{17}} + \sqrt{68 + 12\sqrt{17} - 16\sqrt{34 + 2\sqrt{17}} - 2\left(1 - \sqrt{17}\right)\sqrt{34 - 2\sqrt{17}}} \]
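Both the formula for the roots of unity and Gauss’s expression lend themselves to the calculator check suggested above. The following Python sketch does this under stated assumptions: the function names roots_of_unity and gauss_16cos are hypothetical, and floating-point arithmetic only confirms agreement up to rounding error.

```python
import cmath
import math

def roots_of_unity(n: int) -> list[complex]:
    """The n nth roots of unity: cos(2kπ/n) + i sin(2kπ/n) for k = 1, ..., n."""
    return [cmath.rect(1.0, 2 * k * math.pi / n) for k in range(1, n + 1)]

def gauss_16cos() -> float:
    """Right-hand side of Gauss's radical expression for 16 cos(2π/17)."""
    s = math.sqrt(17)
    a = math.sqrt(34 - 2 * s)
    b = math.sqrt(34 + 2 * s)
    return -1 + s + a + math.sqrt(68 + 12 * s - 16 * b - 2 * (1 - s) * a)

# Each cube root of unity w should satisfy w^3 = 1 up to rounding error.
for w in roots_of_unity(3):
    print(w, abs(w**3 - 1))

# The two sides of Gauss's identity, for comparison.
print(gauss_16cos(), 16 * math.cos(2 * math.pi / 17))
```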
Arbitrary nth roots

The nth roots of unity play an important role in finding arbitrary nth roots. We begin with an example to illustrate the idea.

Example 5.3.8. We find the three cube roots of 2. If you use your calculator you will simply find $\sqrt[3]{2}$, a real number. There should be two others: where are they? The explanation is that the other two cube roots are complex. Let $\omega$ be the complex cube root of 1 that we described above. Then the three cube roots of 2 are the following
\[ \sqrt[3]{2}, \quad \omega\sqrt[3]{2}, \quad \omega^2\sqrt[3]{2}. \]

The above example generalizes.

Theorem 5.3.9 (nth roots). Let $z = r(\cos\theta + i\sin\theta)$ be a non-zero complex number. Put
\[ u = \sqrt[n]{r}\left(\cos\frac{\theta}{n} + i\sin\frac{\theta}{n}\right), \]
the obvious nth root, and put
\[ \omega = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}, \]
the first interesting nth root of unity. Then the nth roots of $z$ are as follows
\[ u, \; u\omega, \; \ldots, \; u\omega^{n-1}. \]

It follows that the nth roots of $z = r(\cos\theta + i\sin\theta)$ can be written in the form
\[ \sqrt[n]{r}\left[\cos\left(\frac{\theta}{n} + \frac{2k\pi}{n}\right) + i\sin\left(\frac{\theta}{n} + \frac{2k\pi}{n}\right)\right] \]
for $k = 0, 1, 2, \ldots, n - 1$. This is the reason why every non-zero number has two square roots that differ by a factor of −1: the two square roots of 1 are 1 and −1.
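The recipe in Theorem 5.3.9 translates almost word for word into code. The sketch below is one possible rendering, assuming Python’s cmath module; the function name nth_roots is my own, and the argument θ it uses is the one cmath returns (between −π and π), which is a perfectly good choice of argument for this purpose.

```python
import cmath
import math

def nth_roots(z: complex, n: int) -> list[complex]:
    """All n nth roots of a non-zero complex number z, listed as u, uω, ..., uω^(n-1)."""
    r, theta = cmath.polar(z)                   # z = r (cos θ + i sin θ)
    u = cmath.rect(r ** (1 / n), theta / n)     # the "obvious" nth root
    omega = cmath.rect(1.0, 2 * math.pi / n)    # the first interesting nth root of unity
    return [u * omega**k for k in range(n)]

# The three cube roots of 2 from Example 5.3.8; cubing each should give back 2.
for root in nth_roots(2, 3):
    print(root, root**3)
```

Only the first root printed is real; the other two are its products with ω and ω².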
5.3.4 Euler’s formula

We have seen that every real number can be written as a whole number plus a possibly infinite decimal part. It turns out that many functions can also be written as a sort of decimal. I shall illustrate this by means of an example. Consider the function $e^x$. All you need to know about this function is that it is equal to its derivative and $e^0 = 1$. We would like to write
\[ e^x = a_0 + a_1x + a_2x^2 + a_3x^3 + \ldots \]
where the $a_i$ are real numbers that we have yet to determine. We can work out the value of $a_0$ easily by putting $x = 0$. This tells us that $a_0 = 1$. To get the value of $a_1$ we first differentiate our expression to get
\[ e^x = a_1 + 2a_2x + 3a_3x^2 + \ldots \]
Now put $x = 0$ again and this time we get that $a_1 = 1$. To get the value of $a_2$ we differentiate our expression again to get
\[ e^x = 2a_2 + 3 \cdot 2 \cdot a_3x + \ldots \]
Now put $x = 0$ and we get that $a_2 = \frac{1}{2}$. Continuing in this way we quickly spot the pattern for the values of the coefficients $a_n$. We find that $a_n = \frac{1}{n!}$ where $n! = n(n-1)(n-2)\cdots 2 \cdot 1$. What we have done for $e^x$ we can also do for $\sin x$ and $\cos x$ and we obtain the following series expansions of each of these functions.
• $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \ldots$

• $\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \ldots$

• $\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots$
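To see these three series at work, one can add up the first few terms and compare with the library functions. This is a rough Python sketch only: the function names and the default of 20 terms are arbitrary choices of mine, and nothing here addresses the question of why the infinite sums converge.

```python
import math

def exp_series(x: float, terms: int = 20) -> float:
    """Partial sum of 1 + x + x^2/2! + x^3/3! + ... ."""
    return sum(x**n / math.factorial(n) for n in range(terms))

def sin_series(x: float, terms: int = 20) -> float:
    """Partial sum of x - x^3/3! + x^5/5! - ... ."""
    return sum((-1)**k * x**(2 * k + 1) / math.factorial(2 * k + 1) for k in range(terms))

def cos_series(x: float, terms: int = 20) -> float:
    """Partial sum of 1 - x^2/2! + x^4/4! - ... ."""
    return sum((-1)**k * x**(2 * k) / math.factorial(2 * k) for k in range(terms))

x = 1.5  # arbitrary test value
print(exp_series(x), math.exp(x))
print(sin_series(x), math.sin(x))
print(cos_series(x), math.cos(x))
```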
There are interesting connections between these three series. We shall now show that complex numbers help to explain them. Without worrying about the validity of doing so, we calculate the infinite series expansion of $e^{i\theta}$. We have that
\[ e^{i\theta} = 1 + (i\theta) + \frac{1}{2!}(i\theta)^2 + \frac{1}{3!}(i\theta)^3 + \ldots \]
that is
\[ e^{i\theta} = 1 + i\theta - \frac{1}{2!}\theta^2 - \frac{1}{3!}\theta^3 i + \frac{1}{4!}\theta^4 + \ldots \]
By separating out real and imaginary parts, and using the infinite series we obtained above, we get Euler’s remarkable formula
\[ e^{i\theta} = \cos\theta + i\sin\theta. \]
Thus the complex numbers enable us to find the hidden connections between the three most important functions of calculus: the exponential function and the sine and cosine functions. It follows that every non-zero complex number can be written in the form $re^{i\theta}$. If we put $\theta = \pi$ in Euler’s formula, we get the following result, which is widely regarded as one of the most amazing in mathematics.
Theorem 5.3.10 (Euler’s identity).
\[ e^{\pi i} = -1. \]
This result shows us that the real numbers π, e and −1 are connected, but that to establish that connection we have to use the complex number i. This is one of the important roles of the complex numbers in mathematics in that they enable us to make connections between topics that look different: they form a mathematical hyperspace.
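Euler’s formula and Euler’s identity can both be tested numerically by summing the exponential series at a purely imaginary argument. The sketch below assumes nothing beyond the Python standard library; the function name exp_i_theta and the cut-off of 30 terms are illustrative choices of mine, not a statement about how such values ought to be computed.

```python
import cmath
import math

def exp_i_theta(theta: float, terms: int = 30) -> complex:
    """Partial sum of the series 1 + (iθ) + (iθ)^2/2! + (iθ)^3/3! + ... ."""
    total = 0j
    term = 1 + 0j                        # the n = 0 term
    for n in range(terms):
        total += term
        term *= 1j * theta / (n + 1)     # turn (iθ)^n/n! into (iθ)^(n+1)/(n+1)!
    return total

theta = 1.2  # arbitrary test angle
print(exp_i_theta(theta), complex(math.cos(theta), math.sin(theta)))
print(exp_i_theta(math.pi))          # should be very close to -1
print(cmath.exp(1j * math.pi))       # the library's value, for comparison
```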
Exercises 5.3
1. Express cos 5x and sin 5x in terms of cos x and sin x.
2. Prove the following where $x$ is real.

(a) $\sin x = \frac{1}{2i}\left(e^{ix} - e^{-ix}\right)$.

(b) $\cos x = \frac{1}{2}\left(e^{ix} + e^{-ix}\right)$.

Hence show that $\cos^4 x = \frac{1}{8}\left[\cos 4x + 4\cos 2x + 3\right]$.

[Footnote: Compare (a) and (b) with $\sinh x = \frac{1}{2}\left(e^{x} - e^{-x}\right)$ and $\cosh x = \frac{1}{2}\left(e^{x} + e^{-x}\right)$.]

3. Find the 4th roots of unity as radical expressions.
4. Find the 6th roots of unity as radical expressions.
5. Find the 8th roots of unity as radical expressions.
6. Solve $x^3 = -8i$.
7. *Find radical expressions for the roots of $x^5 - 1$, and so show that
\[ \cos 72^\circ = \frac{\sqrt{5} - 1}{4} \quad\text{and}\quad \sin 72^\circ = \frac{\sqrt{10 + 2\sqrt{5}}}{4}. \]
To do this, consider the equation
\[ x^4 + x^3 + x^2 + x + 1 = 0. \]
Divide through by $x^2$ to get
\[ x^2 + \frac{1}{x^2} + x + \frac{1}{x} + 1 = 0. \]
Put $y = x + \frac{1}{x}$. Show that $y$ satisfies the quadratic
\[ y^2 + y - 1 = 0. \]
You can now find all four values of x.
8. *Determine all the values of $i^i$. What do you notice?
5.4 *Making sense of complex numbers
In this chapter, I have assumed that complex numbers exist and that they obey the usual high-school rules of algebra. In this section, I shall sketch out a proof of this. We start with the set $\mathbb{R} \times \mathbb{R}$ whose elements are ordered pairs $(a, b)$ where $a$ and $b$ are real numbers. It will be helpful to denote these ordered pairs by bold letters so $\mathbf{a} = (a_1, a_2)$. We define $\mathbf{0} = (0, 0)$, $\mathbf{1} = (1, 0)$ and $\mathbf{i} = (0, 1)$. We now define operations as follows.

• If $\mathbf{a} = (a_1, a_2)$ and $\mathbf{b} = (b_1, b_2)$, define $\mathbf{a} + \mathbf{b} = (a_1 + b_1, a_2 + b_2)$.

• If $\mathbf{a} = (a_1, a_2)$ define $-\mathbf{a} = (-a_1, -a_2)$.

• If $\mathbf{a} = (a_1, a_2)$ and $\mathbf{b} = (b_1, b_2)$, define $\mathbf{a}\mathbf{b} = (a_1b_1 - a_2b_2, \; a_1b_2 + a_2b_1)$.

• If $\mathbf{a} = (a_1, a_2) \neq \mathbf{0}$ define
\[ \mathbf{a}^{-1} = \left(\frac{a_1}{a_1^2 + a_2^2}, \; \frac{-a_2}{a_1^2 + a_2^2}\right). \]

It is now a long exercise to check that all the usual axioms of high-school algebra hold. Observe now that the element $(a_1, a_2)$ can be written
\[ (a_1, 0)\mathbf{1} + (a_2, 0)\mathbf{i} \]
and that
\[ \mathbf{i}^2 = (0, 1)(0, 1) = (-1, 0) = -\mathbf{1}. \]
The elements of the form $(a, 0)$ can be identified with the real numbers. This proves that the complex numbers as I described them earlier in this chapter really do exist.
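The construction by ordered pairs is concrete enough to be typed straight into a computer. The following Python sketch models pairs as tuples; the function names add, neg, mul and inv are my own. The inverse divides each component by a1² + a2², with no square root, since that is what makes the product of a pair with its inverse come out as (1, 0).

```python
Pair = tuple[float, float]

ZERO: Pair = (0.0, 0.0)
ONE: Pair = (1.0, 0.0)
I: Pair = (0.0, 1.0)

def add(a: Pair, b: Pair) -> Pair:
    return (a[0] + b[0], a[1] + b[1])

def neg(a: Pair) -> Pair:
    return (-a[0], -a[1])

def mul(a: Pair, b: Pair) -> Pair:
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def inv(a: Pair) -> Pair:
    """Multiplicative inverse of a non-zero pair."""
    d = a[0] ** 2 + a[1] ** 2      # squared length; non-zero because a is non-zero
    return (a[0] / d, -a[1] / d)

print(mul(I, I))                   # the pair playing the role of i squared
a = (3.0, 4.0)                     # illustrative value, i.e. 3 + 4i
print(mul(a, inv(a)))              # should be (1, 0) up to rounding
print(add(mul((a[0], 0.0), ONE), mul((a[1], 0.0), I)))   # rebuilds a from its components
```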
5.5 *Radical solutions
There are two great historical revolutions in the history of algebra. The first is the discovery that there are irrational numbers; this means that we have to learn to work with real numbers that are described by radical expressions such as $\sqrt{2}$. The second is Galois’s discovery that the roots of a polynomial need not be radical expressions of the coefficients of the polynomial; put simply, that there is not always a formula for the roots of a polynomial equation. We begin by describing the way in which cubics and quartics can be solved purely algebraically.
5.5.1 Cubic equations

Let
\[ f(x) = a_3x^3 + a_2x^2 + a_1x + a_0 \]
where $a_3 \neq 0$. I shall assume all coefficients are real though the theory works in general. We shall find all the roots of $f(x)$. This problem can be simplified in two ways. First, we may divide through by $a_3$ and so, without loss of generality, we may assume that $f(x)$ is monic. That is, $a_3 = 1$. Second, by means of a substitution we may obtain a cubic in which the coefficient of the term in $x^2$ is zero. Put $x = y - \frac{a_2}{3}$. You should do this and check that you get a polynomial of the form
\[ g(y) = y^3 + py + q. \]
We say that such a cubic is reduced. It follows that without loss of generality, we need only solve the cubic
\[ g(x) = x^3 + px + q. \]
To do this needs what looks like a minor miracle. Let $u$ and $v$ be two complex variables. Let $\omega = \cos\frac{2\pi}{3} + i\sin\frac{2\pi}{3}$, one of the complex cube roots of unity. You should now check that the following cubic
\[ t(x) = x^3 - 3uvx - (u^3 + v^3) \]
has the roots
\[ u + v, \quad u\omega + v\omega^2, \quad u\omega^2 + v\omega. \]
Now we can solve $x^3 + px + q = 0$ if we can find $u$ and $v$ such that
\[ p = -3uv, \quad q = -u^3 - v^3. \]
Now if we cube the first equation, we get the following two equations
\[ -\frac{p^3}{27} = u^3v^3, \quad -q = u^3 + v^3. \]
If we regard $u^3$ and $v^3$ as the unknowns we know their sum and we know their product. This means that $u^3$ and $v^3$ are the roots of the quadratic equation
\[ x^2 + qx - \frac{p^3}{27} = 0. \]
We therefore have that
\[ u^3 = \frac{1}{2}\left(-q + \sqrt{\frac{27q^2 + 4p^3}{27}}\right) \]
and
\[ v^3 = \frac{1}{2}\left(-q - \sqrt{\frac{27q^2 + 4p^3}{27}}\right). \]
To find $u$ we have to take a cube root of the number $u^3$ and there are three possible such roots. Choose one such value for $u$. We then choose the value of $v$ so that $p = -3uv$.
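The steps just described can be assembled into a short program. What follows is a sketch under stated assumptions rather than a polished solver: the function name solve_reduced_cubic is hypothetical, the case p = 0 is excluded so that u is never zero, and the cubic used to try it out, x³ − 6x − 9 = 0, is an example of my own chosen because 3 is visibly a root.

```python
import cmath
import math

def solve_reduced_cubic(p: float, q: float) -> list[complex]:
    """Roots of x^3 + p x + q = 0 found via u and v with p = -3uv and q = -u^3 - v^3."""
    if p == 0:
        raise ValueError("this sketch assumes p != 0")
    root = cmath.sqrt((27 * q**2 + 4 * p**3) / 27)   # may be imaginary
    u_cubed = (-q + root) / 2                        # one root of x^2 + qx - p^3/27 = 0
    u = u_cubed ** (1 / 3)                           # one of the three cube roots of u^3
    v = -p / (3 * u)                                 # chosen so that p = -3uv
    omega = cmath.rect(1.0, 2 * math.pi / 3)         # complex cube root of unity
    return [u + v, u * omega + v * omega**2, u * omega**2 + v * omega]

# Illustrative cubic x^3 - 6x - 9 = 0, so p = -6 and q = -9.
for x in solve_reduced_cubic(-6, -9):
    print(x, abs(x**3 - 6 * x - 9))   # second value should be close to 0
```

Because v is computed from u rather than as an independent cube root of v³, the pairing condition p = −3uv holds automatically.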
Example 5.5.1. Find the roots of $x^3 + 9x - 2 = 0$. Here $p = 9$ and $q = -2$. The quadratic equation we have to solve is therefore
\[ x^2 - 2x - 27 = 0. \]
This has roots $1 \pm 2\sqrt{7}$. Put $u^3 = 1 + 2\sqrt{7}$. We may choose a real cube root in this case to get
\[ u = \sqrt[3]{1 + \sqrt{28}}. \]
We must then choose $v$ to be
\[ v = \sqrt[3]{1 - \sqrt{28}}. \]
We may now write down the three roots of our original cubic.
The following cubic equation was studied by Bombelli in 1572 and had an important influence on the development of complex numbers.
Example 5.5.2. Consider the cubic
\[ x^3 - 15x - 4 = 0. \]

The associated quadratic in this case is
\[ x^2 - 4x + 125 = 0. \]

This gives the two solutions that Bombelli would have written in a way equivalent to the following
\[ x = 2 \pm \sqrt{-121}. \]

We would write this as $x = 2 \pm 11i$. Thus $u^3 = 2 + 11i$ and $v^3 = 2 - 11i$.

There are three cube roots of $2 + 11i$, all complex. Let’s press on regardless. Write $\sqrt[3]{2 + 11i}$ to represent one of those cube roots. Write $\sqrt[3]{2 - 11i}$ to be the corresponding cube root such that their product is 5. Thus at least symbolically we may write
\[ u + v = \sqrt[3]{2 + 11i} + \sqrt[3]{2 - 11i}. \]
What is surprising is that for some choice of these cube roots this value must be real. The reason is that our cubic has a real root, which can easily be checked to be 4. To see why, observe that
\[ (2 + i)^3 = 2 + 11i \quad\text{and}\quad (2 - i)^3 = 2 - 11i. \]
If we choose $2 + i$ as one of the cube roots of $2 + 11i$ then we have to choose $2 - i$ as the corresponding cube root of $2 - 11i$. In this way, we get
\[ 4 = (2 + i) + (2 - i) \]
as a root. It was the fact that real roots arose in this way that provided the first inkling that there was a number system, the complex numbers, that extended the so-called real numbers, but had every bit as tangible an existence.
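Bombelli’s example can be replayed numerically: take each of the three cube roots u of 2 + 11i in turn, pair it with the v for which uv = 5, and look at u + v. The Python sketch below does exactly this; the function name bombelli_sums is my own, and the claim that the imaginary parts vanish holds only up to rounding error.

```python
import cmath
import math

def bombelli_sums() -> list[complex]:
    """u + v for each cube root u of 2 + 11i, with v chosen so that uv = 5."""
    r, theta = cmath.polar(2 + 11j)
    u0 = cmath.rect(r ** (1 / 3), theta / 3)    # one cube root of 2 + 11i
    omega = cmath.rect(1.0, 2 * math.pi / 3)    # complex cube root of unity
    sums = []
    for k in range(3):
        u = u0 * omega**k                       # the k-th cube root of 2 + 11i
        v = 5 / u                               # the matching cube root of 2 - 11i
        sums.append(u + v)
    return sums

for x in bombelli_sums():
    print(x, abs(x**3 - 15 * x - 4))            # imaginary part and residual near 0
```

One of the three sums comes from the pair 2 + i and 2 − i discussed above.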
5.5.2 Quartic equations

Let
\[ f(x) = a_4x^4 + a_3x^3 + a_2x^2 + a_1x + a_0. \]
As usual, we may assume that $a_4 = 1$. By means of a suitable substitution, which is left as an exercise, we may eliminate the cubed term. We therefore end up with a reduced quartic which it is convenient to write in the following way
\[ x^4 = ax^2 + bx + c. \]
Suppose that we could write the righthand side as a perfect square $(dx + e)^2$. Then our quartic could be written as the product of two quadratics