Hello Prospective Students of aBa’s Math 395 course on the Beauty of Math,

This 66-page PDF contains the one-page course outline and an excerpt from the text Proofs from THE BOOK. It is chapter one on the topic of number theory. This is one of 5 areas of math which we will explore in Part 1 of the course dealing with BEAUTIFUL PROOFS. You ABSOLUTELY need to have taken (even concurrently this Fall 2014) at least ONE of the following proof-writing courses: 314, 324, or 425. By the way, blank pages have been placed selectively into the PDF to preserve the front and back pages as they appear in the book. So print this PDF DOUBLE-SIDED starting with pg.1 or pg.3 (if you want to start with the “cover” of the book; well, I made this cover, it’s not the official cover of the book). Part 2 of the course will be on combinatorics of permutations. There is a lot of beauty here too and this is part of my research area, so this part of the course will give you a flavor of what doing research with me might entail. Specifically in my research with a future student, I would like to extend the topics in part 2 to the world of imprimitive complex reflection groups (think of permutation matrices where the nonzero entries are rth roots of unity). This group is a generalization of the symmetric group which you have seen in Math 312 and definitely in Math 425 (and in its permutation matrix form in Math 324). I will brush you up on this if it is foggy or missing completely from your memory.

Is Prof aBa “Power of Shalom, brANDing” this course aBa with me, the cat? Prof aBa’s Math 395 Directed Studies Course – Fall 2014 (1 credit course meeting 1 hour per week – time & place TBD)

“Beautiful Proofs in Mathematics and an Exploration of the Combinatorics of Permutations”

GOAL OF COURSE: There are two. First, we explore the most BEAUTIFUL proofs in mathematics. And I mean that! Second, we focus on combinatorial objects called permutations and study various combinatorial properties of these objects.

TEXTBOOKS: (I will provide excerpts of these for you as needed)
Proofs from THE BOOK (4th Edition) by Martin Aigner and Günter M. Ziegler (2010)
Combinatorics of Permutations by Miklós Bóna (2004)
The Symmetric Group (2nd Edition) by Bruce Sagan (2001)

FORMAT OF PART 1 OF THE COURSE: Since this is a directed studies course, we will all take part in teaching it. The plan for the “Beauty of Mathematics” part of the course is for each of us to take turns presenting proofs from the book called Proofs from THE BOOK. This part will be REALLY fun because there are quite a variety of topics presented and EVERYTHING is beautiful. We will be learning some of the most fundamental results in areas of math from number theory, geometry, analysis, combinatorics, and graph theory.

FORMAT OF PART 2 OF THE COURSE: This will start out with me lecturing on permutations like in a standard lecture class. Then students present material. Time-permitting, we will explore possible directions of research that we can do next semester, next summer, next year, or whenever!

CAVEAT!!!: There being two parts to this course does NOT mean that Part 1 takes half the semester and Part 2 is the other half. If we are having LOADS of fun with Part 1, then we can just continue presenting the most beautiful proofs to one another. I can always teach Part 2 next semester, if that is the case! And if after the 1st two weeks or so, everyone is bored with beautiful proofs, then we can move on to Part 2.

Paul Erdős liked to talk about THE BOOK, in which God maintains the perfect proofs for mathematical theorems, following the dictum of G.H. Hardy that there is no permanent place for ugly mathematics.

“You do not have to believe in God but, as a mathematician, you should believe in the book.” -Paul Erdős

Martin Aigner Günter M. Ziegler Proofs from THE BOOK

Fourth Edition

Including Illustrations by Karl H. Hofmann

Prof. Dr. Martin Aigner
FB Mathematik und Informatik
Freie Universität Berlin
Arnimallee 3
14195 Berlin
Deutschland
[email protected]

Prof. Günter M. Ziegler
Institut für Mathematik, MA 6-2
Technische Universität Berlin
Straße des 17. Juni 136
10623 Berlin
Deutschland
[email protected]

ISBN 978-3-642-00855-9 e-ISBN 978-3-642-00856-6 DOI 10.1007/978-3-642-00856-6 Springer Dordrecht London New York

© Springer-Verlag Berlin Heidelberg 2010

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: deblik, Berlin

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Paul Erdős liked to talk about The Book, in which God maintains the perfect proofs for mathematical theorems, following the dictum of G. H. Hardy that there is no permanent place for ugly mathematics. Erdős also said that you need not believe in God but, as a mathematician, you should believe in The Book. A few years ago, we suggested to him to write up a first (and very modest) approximation to The Book. He was enthusiastic about the idea and, characteristically, went to work immediately, filling page after page with his suggestions. Our book was supposed to appear in March 1998 as a present to Erdős' 85th birthday. With Paul's unfortunate death in the summer of 1996, he is not listed as a co-author. Instead this book is dedicated to his memory.

[Photo: Paul Erdős]

We have no definition or characterization of what constitutes a proof from The Book: all we offer here is the examples that we have selected, hoping that our readers will share our enthusiasm about brilliant ideas, clever insights and wonderful observations. We also hope that our readers will enjoy this despite the imperfections of our exposition. The selection is to a great extent influenced by Paul Erdős himself. A large number of the topics were suggested by him, and many of the proofs trace directly back to him, or were initiated by his supreme insight in asking the right question or in making the right conjecture. So to a large extent this book reflects the views of Paul Erdős as to what should be considered a proof from The Book.

[Illustration: “The Book”]

A limiting factor for our selection of topics was that everything in this book is supposed to be accessible to readers whose backgrounds include only a modest amount of technique from undergraduate mathematics. A little linear algebra, some basic analysis and number theory, and a healthy dollop of elementary concepts and reasonings from discrete mathematics should be sufficient to understand and enjoy everything in this book.

We are extremely grateful to the many people who helped and supported us with this project, among them the students of a seminar where we discussed a preliminary version, to Benno Artmann, Stephan Brandt, Stefan Felsner, Eli Goodman, Torsten Heldmann, and Hans Mielke. We thank Margrit Barrett, Christian Bressler, Ewgenij Gawrilow, Michael Joswig, Elke Pose, and Jörg Rambau for their technical help in composing this book. We are in great debt to Tom Trotter who read the manuscript from first to last page, to Karl H. Hofmann for his wonderful drawings, and most of all to the late great Paul Erdős himself.

Berlin, March 1998
Martin Aigner · Günter M. Ziegler

Preface to the Fourth Edition

When we started this project almost fifteen years ago we could not possibly imagine what a wonderful and lasting response our book about The Book would have, with all the warm letters, interesting comments, new editions, and as of now thirteen translations. It is no exaggeration to say that it has become a part of our lives.

In addition to numerous improvements, partly suggested by our readers, the present fourth edition contains five new chapters: Two classics, the law of quadratic reciprocity and the fundamental theorem of algebra, two chapters on tiling problems and their intriguing solutions, and a highlight in graph theory, the chromatic number of Kneser graphs.

We thank everyone who helped and encouraged us over all these years: For the second edition this included Stephan Brandt, Christian Elsholtz, Jürgen Elstrodt, Daniel Grieser, Roger Heath-Brown, Lee L. Keener, Christian Lebœuf, Hanfried Lenz, Nicolas Puech, John Scholes, Bernulf Weißbach, and many others. The third edition benefitted especially from input by David Bevan, Anders Björner, Dietrich Braess, John Cosgrave, Hubert Kalf, Günter Pickert, Alistair Sinclair, and Herb Wilf. For the present edition, we are particularly grateful to contributions by France Dacar, Oliver Deiser, Anton Dochtermann, Michael Harbeck, Stefan Hougardy, Hendrik W. Lenstra, Günter Rote, Moritz Schmitt, and Carsten Schultz. Moreover, we thank Ruth Allewelt at Springer in Heidelberg as well as Christoph Eyrich, Torsten Heldmann, and Elke Pose in Berlin for their help and support throughout these years. And finally, this book would certainly not look the same without the original design suggested by Karl-Friedrich Koch, and the superb new drawings provided for each edition by Karl H. Hofmann.

Berlin, July 2009
Martin Aigner · Günter M. Ziegler

Table of Contents

Number Theory

1. Six proofs of the infinity of primes
2. Bertrand’s postulate
3. Binomial coefficients are (almost) never powers
4. Representing numbers as sums of two squares
5. The law of quadratic reciprocity
6. Every finite division ring is a field
7. Some irrational numbers
8. Three times π²/6

Geometry

9. Hilbert’s third problem: decomposing polyhedra
10. Lines in the plane and decompositions of graphs
11. The slope problem
12. Three applications of Euler’s formula
13. Cauchy’s rigidity theorem
14. Touching simplices
15. Every large point set has an obtuse angle
16. Borsuk’s conjecture

Analysis

17. Sets, functions, and the continuum hypothesis
18. In praise of inequalities
19. The fundamental theorem of algebra
20. One square and an odd number of triangles
21. A theorem of Pólya on polynomials
22. On a lemma of Littlewood and Offord
23. Cotangent and the Herglotz trick
24. Buffon’s needle problem

Combinatorics

25. Pigeon-hole and double counting
26. Tiling rectangles
27. Three famous theorems on finite sets
28. Shuffling cards
29. Lattice paths and determinants
30. Cayley’s formula for the number of trees
31. Identities versus bijections
32. Completing Latin squares

Graph Theory

33. The Dinitz problem
34. Five-coloring plane graphs
35. How to guard a museum
36. Turán’s graph theorem
37. Communicating without errors
38. The chromatic number of Kneser graphs
39. Of friends and politicians
40. Probability makes counting (sometimes) easy

About the Illustrations
Index

Number Theory

“Irrationality and π”

Chapter 1: Six proofs of the infinity of primes

It is only natural that we start these notes with probably the oldest Book Proof, usually attributed to Euclid (Elements IX, 20). It shows that the sequence of primes does not end.

Euclid’s Proof. For any finite set $\{p_1, \dots, p_r\}$ of primes, consider the number $n = p_1 p_2 \cdots p_r + 1$. This $n$ has a prime divisor $p$. But $p$ is not one of the $p_i$: otherwise $p$ would be a divisor of $n$ and of the product $p_1 p_2 \cdots p_r$, and thus also of the difference $n - p_1 p_2 \cdots p_r = 1$, which is impossible. So a finite set $\{p_1, \dots, p_r\}$ cannot be the collection of all prime numbers. □

Before we continue let us fix some notation. $\mathbb{N} = \{1, 2, 3, \dots\}$ is the set of natural numbers, $\mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}$ the set of integers, and $\mathbb{P} = \{2, 3, 5, 7, \dots\}$ the set of primes.

In the following, we will exhibit various other proofs (out of a much longer list) which we hope the reader will like as much as we do. Although they use different view-points, the following basic idea is common to all of them: The natural numbers grow beyond all bounds, and every natural number $n \ge 2$ has a prime divisor. These two facts taken together force $\mathbb{P}$ to be infinite. The next proof is due to Christian Goldbach (from a letter to Leonhard Euler 1730), the third proof is apparently folklore, the fourth one is by Euler himself, the fifth proof was proposed by Harry Fürstenberg, while the last proof is due to Paul Erdős.
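Euclid's construction is easy to try out. The following is a minimal sketch, not part of the book: the helper names `smallest_prime_factor` and `euclid_new_prime` are made up for illustration, and trial division is used only because the numbers are small.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n >= 2 (trial division)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_new_prime(primes: list[int]) -> int:
    """Return a prime that is missing from the given finite list of primes."""
    n = prod(primes) + 1  # n = p_1 p_2 ... p_r + 1
    return smallest_prime_factor(n)


print(euclid_new_prime([2, 3, 5, 7, 11, 13]))  # 30031 = 59 * 509, prints 59
```

Note that $n$ itself need not be prime (here $30031 = 59 \cdot 509$); the proof only needs some prime divisor of $n$.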

Second Proof. Let us first look at the Fermat numbers $F_n = 2^{2^n} + 1$ for $n = 0, 1, 2, \dots$. We will show that any two Fermat numbers are relatively prime; hence there must be infinitely many primes. To this end, we verify the recursion

$$\prod_{k=0}^{n-1} F_k = F_n - 2 \qquad (n \ge 1),$$

from which our assertion follows immediately. Indeed, if $m$ is a divisor of, say, $F_k$ and $F_n$ ($k < n$), then $m$ divides 2, and hence $m = 1$ or 2. But $m = 2$ is impossible since all Fermat numbers are odd.

[Margin: The first few Fermat numbers: $F_0 = 3$, $F_1 = 5$, $F_2 = 17$, $F_3 = 257$, $F_4 = 65537$, $F_5 = 641 \cdot 6700417$.]

To prove the recursion we use induction on $n$. For $n = 1$ we have $F_0 = 3$ and $F_1 - 2 = 3$. With induction we now conclude

$$\prod_{k=0}^{n} F_k = \Big(\prod_{k=0}^{n-1} F_k\Big) F_n = (F_n - 2) F_n = (2^{2^n} - 1)(2^{2^n} + 1) = 2^{2^{n+1}} - 1 = F_{n+1} - 2.$$ □
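Both the recursion and the pairwise coprimality of the Fermat numbers can be checked numerically for the first few indices. This is only a sanity check under the assumption that eight Fermat numbers are enough to be convincing; the names `fermat`, `recursion_ok` and `coprime_ok` are illustrative.

```python
from itertools import combinations
from math import gcd, prod


def fermat(n: int) -> int:
    """The n-th Fermat number F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


F = [fermat(n) for n in range(8)]

# Recursion: F_0 * F_1 * ... * F_{n-1} = F_n - 2 for n >= 1.
recursion_ok = all(prod(F[:n]) == F[n] - 2 for n in range(1, len(F)))

# Any two distinct Fermat numbers are relatively prime.
coprime_ok = all(gcd(a, b) == 1 for a, b in combinations(F, 2))

print(recursion_ok, coprime_ok)
```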

Third Proof. Suppose $\mathbb{P}$ is finite and $p$ is the largest prime. We consider the so-called Mersenne number $2^p - 1$ and show that any prime factor $q$ of $2^p - 1$ is bigger than $p$, which will yield the desired conclusion. Let $q$ be a prime dividing $2^p - 1$, so we have $2^p \equiv 1 \pmod{q}$. Since $p$ is prime, this means that the element 2 has order $p$ in the multiplicative group $\mathbb{Z}_q \setminus \{0\}$ of the field $\mathbb{Z}_q$. This group has $q - 1$ elements. By Lagrange’s theorem (see the box) we know that the order of every element divides the size of the group, that is, we have $p \mid q - 1$, and hence $p < q$. □

[Box: Lagrange’s Theorem. If $G$ is a finite (multiplicative) group and $U$ is a subgroup, then $|U|$ divides $|G|$.

Proof. Consider the binary relation
$$a \sim b \;:\Longleftrightarrow\; ba^{-1} \in U.$$
It follows from the group axioms that $\sim$ is an equivalence relation. The equivalence class containing an element $a$ is precisely the coset
$$Ua = \{xa : x \in U\}.$$
Since clearly $|Ua| = |U|$, we find that $G$ decomposes into equivalence classes, all of size $|U|$, and hence that $|U|$ divides $|G|$. □

In the special case when $U$ is a cyclic subgroup $\{a, a^2, \dots, a^m\}$ we find that $m$ (the smallest positive integer such that $a^m = 1$, called the order of $a$) divides the size $|G|$ of the group.]

Now let us look at a proof that uses elementary calculus.

Fourth Proof. Let $\pi(x) := \#\{p \le x : p \in \mathbb{P}\}$ be the number of primes that are less than or equal to the real number $x$. We number the primes $\mathbb{P} = \{p_1, p_2, p_3, \dots\}$ in increasing order. Consider the natural logarithm $\log x$, defined as $\log x = \int_1^x \frac{1}{t}\,dt$.

Now we compare the area below the graph of $f(t) = \frac{1}{t}$ with an upper step function. (See also the appendix on page 10 for this method.) Thus for $n \le x < n + 1$ we have

$$\log x \;\le\; 1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n-1} + \frac{1}{n} \;\le\; \sum \frac{1}{m},$$

where the sum extends over all $m \in \mathbb{N}$ which have only prime divisors $p \le x$.

[Figure: Steps above the function $f(t) = \frac{1}{t}$]

Since every such $m$ can be written in a unique way as a product of the form $\prod_{p \le x} p^{k_p}$, we see that the last sum is equal to

$$\prod_{\substack{p \in \mathbb{P} \\ p \le x}} \Big( \sum_{k \ge 0} \frac{1}{p^k} \Big).$$

The inner sum is a geometric series with ratio $\frac{1}{p}$, hence

$$\log x \;\le\; \prod_{\substack{p \in \mathbb{P} \\ p \le x}} \frac{1}{1 - \frac{1}{p}} \;=\; \prod_{\substack{p \in \mathbb{P} \\ p \le x}} \frac{p}{p - 1} \;=\; \prod_{k=1}^{\pi(x)} \frac{p_k}{p_k - 1}.$$

Now clearly $p_k \ge k + 1$, and thus

$$\frac{p_k}{p_k - 1} = 1 + \frac{1}{p_k - 1} \le 1 + \frac{1}{k} = \frac{k+1}{k},$$

and therefore

$$\log x \;\le\; \prod_{k=1}^{\pi(x)} \frac{k+1}{k} = \pi(x) + 1.$$

Everybody knows that $\log x$ is not bounded, so we conclude that $\pi(x)$ is unbounded as well, and so there are infinitely many primes. □
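The key facts of the third and fourth proofs can be spot-checked numerically. The sketch below is an illustration only: the helpers `prime_factors` and `primes_up_to` and the bounds 30 and 10000 are arbitrary choices, not taken from the text. It checks that every prime factor $q$ of $2^p - 1$ satisfies $q \equiv 1 \pmod p$ (so $q > p$), and that $\log x \le \pi(x) + 1$ for integer $x$.

```python
import math


def prime_factors(n: int) -> list[int]:
    """Prime factors of n >= 2 with multiplicity, by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


# Third proof: p divides q - 1 for every prime factor q of 2^p - 1.
mersenne_ok = all(
    q % p == 1 for p in primes_up_to(30) for q in prime_factors(2**p - 1)
)

# Fourth proof: log x <= pi(x) + 1, checked for integers 2 <= x <= LIMIT.
LIMIT = 10_000
prime_set = set(primes_up_to(LIMIT))
pi_x = 0
log_bound_ok = True
for x in range(2, LIMIT + 1):
    if x in prime_set:
        pi_x += 1
    if math.log(x) > pi_x + 1:
        log_bound_ok = False

print(mersenne_ok, log_bound_ok)
```

The bound $\log x \le \pi(x) + 1$ is of course very weak; it is used in the proof only because it shows that $\pi(x)$ is unbounded.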

Fifth Proof. After analysis it’s topology now! Consider the following curious topology on the set $\mathbb{Z}$ of integers. For $a, b \in \mathbb{Z}$, $b > 0$, we set

$$N_{a,b} = \{a + nb : n \in \mathbb{Z}\}.$$

Each set $N_{a,b}$ is a two-way infinite arithmetic progression. Now call a set $O \subseteq \mathbb{Z}$ open if either $O$ is empty, or if to every $a \in O$ there exists some $b > 0$ with $N_{a,b} \subseteq O$. Clearly, the union of open sets is open again. If $O_1, O_2$ are open, and $a \in O_1 \cap O_2$ with $N_{a,b_1} \subseteq O_1$ and $N_{a,b_2} \subseteq O_2$, then $a \in N_{a,b_1 b_2} \subseteq O_1 \cap O_2$. So we conclude that any finite intersection of open sets is again open. So, this family of open sets induces a bona fide topology on $\mathbb{Z}$. Let us note two facts:

(A) Any nonempty open set is infinite.

(B) Any set $N_{a,b}$ is closed as well.

Indeed, the first fact follows from the definition. For the second we observe

$$N_{a,b} = \mathbb{Z} \setminus \bigcup_{i=1}^{b-1} N_{a+i,b},$$

which proves that $N_{a,b}$ is the complement of an open set and hence closed.

So far the primes have not yet entered the picture, but here they come. Since any number $n \ne 1, -1$ has a prime divisor $p$, and hence is contained in $N_{0,p}$, we conclude

$$\mathbb{Z} \setminus \{1, -1\} = \bigcup_{p \in \mathbb{P}} N_{0,p}.$$

[Illustration: “Pitching flat rocks, infinitely”]

Now if $\mathbb{P}$ were finite, then $\bigcup_{p \in \mathbb{P}} N_{0,p}$ would be a finite union of closed sets (by (B)), and hence closed. Consequently, $\{1, -1\}$ would be an open set, in violation of (A). □
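The two set identities used in the fifth proof can be checked on a finite window of integers. This is only a finite sanity check, not a substitute for the argument: the window $[-100, 100]$ and the names `N`, `closedness_identity_holds` and `covered_by_primes` are assumptions made for this sketch.

```python
WINDOW = range(-100, 101)


def N(a: int, b: int) -> set[int]:
    """The part of the progression N_{a,b} = {a + n*b : n in Z} inside WINDOW."""
    return {x for x in WINDOW if (x - a) % b == 0}


def closedness_identity_holds(a: int, b: int) -> bool:
    """Check N_{a,b} = Z minus the union of N_{a+i,b}, i = 1..b-1, on WINDOW."""
    others = set().union(*(N(a + i, b) for i in range(1, b)))
    return N(a, b) == set(WINDOW) - others


# Primes up to 100 suffice to cover every integer n with 2 <= |n| <= 100.
PRIMES = [p for p in range(2, 101) if all(p % d for d in range(2, p))]
covered_by_primes = set().union(*(N(0, p) for p in PRIMES))

print(closedness_identity_holds(2, 5))
print(covered_by_primes == set(WINDOW) - {1, -1})
```

Note that $0$ lies in every $N_{0,p}$, which is why only $1$ and $-1$ are left uncovered.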

Sixth Proof. Our final proof goes a considerable step further and demonstrates not only that there are infinitely many primes, but also that the series $\sum_{p \in \mathbb{P}} \frac{1}{p}$ diverges. The first proof of this important result was given by Euler (and is interesting in its own right), but our proof, devised by Erdős, is of compelling beauty.

Let $p_1, p_2, p_3, \dots$ be the sequence of primes in increasing order, and assume that $\sum_{p \in \mathbb{P}} \frac{1}{p}$ converges. Then there must be a natural number $k$ such that $\sum_{i \ge k+1} \frac{1}{p_i} < \frac{1}{2}$. Let us call $p_1, \dots, p_k$ the small primes, and $p_{k+1}, p_{k+2}, \dots$ the big primes. For an arbitrary natural number $N$ we therefore find

$$\sum_{i \ge k+1} \frac{N}{p_i} < \frac{N}{2}. \qquad (1)$$

Let $N_b$ be the number of positive integers $n \le N$ which are divisible by at least one big prime, and $N_s$ the number of positive integers $n \le N$ which have only small prime divisors. We are going to show that for a suitable $N$

$$N_b + N_s < N,$$

which will be our desired contradiction, since by definition $N_b + N_s$ would have to be equal to $N$.

To estimate $N_b$ note that $\lfloor \frac{N}{p_i} \rfloor$ counts the positive integers $n \le N$ which are multiples of $p_i$. Hence by (1) we obtain

$$N_b \;\le\; \sum_{i \ge k+1} \Big\lfloor \frac{N}{p_i} \Big\rfloor \;<\; \frac{N}{2}. \qquad (2)$$

Let us now look at $N_s$. We write every $n \le N$ which has only small prime divisors in the form $n = a_n b_n^2$, where $a_n$ is the square-free part. Every $a_n$ is thus a product of different small primes, and we conclude that there are precisely $2^k$ different square-free parts. Furthermore, as $b_n \le \sqrt{n} \le \sqrt{N}$, we find that there are at most $\sqrt{N}$ different square parts, and so

$$N_s \le 2^k \sqrt{N}.$$

Since (2) holds for any $N$, it remains to find a number $N$ with $2^k \sqrt{N} \le \frac{N}{2}$ or $2^{k+1} \le \sqrt{N}$, and for this $N = 2^{2k+2}$ will do. □

References

[1] B. ARTMANN: Euclid — The Creation of Mathematics, Springer-Verlag, New York 1999.

[2] P. ERDŐS: Über die Reihe $\sum \frac{1}{p}$, Mathematica, Zutphen B 7 (1938), 1-2.

[3] L. EULER: Introductio in Analysin Infinitorum, Tomus Primus, Lausanne 1748; Opera Omnia, Ser. 1, Vol. 8.

[4] H. FÜRSTENBERG: On the infinitude of primes, Amer. Math. Monthly 62 (1955), 353.

Chapter 2: Bertrand’s postulate

We have seen that the sequence of prime numbers $2, 3, 5, 7, \dots$ is infinite. To see that the size of its gaps is not bounded, let $N := 2 \cdot 3 \cdot 5 \cdots p$ denote the product of all prime numbers that are smaller than $k + 2$, and note that none of the $k$ numbers

$$N + 2,\; N + 3,\; N + 4,\; \dots,\; N + k,\; N + (k + 1)$$

is prime, since for $2 \le i \le k + 1$ we know that $i$ has a prime factor that is smaller than $k + 2$, and this factor also divides $N$, and hence also $N + i$. With this recipe, we find, for example, for $k = 10$ that none of the ten numbers

$$2312, 2313, 2314, \dots, 2321$$

is prime.

But there are also upper bounds for the gaps in the sequence of prime numbers. A famous bound states that “the gap to the next prime cannot be larger than the number we start our search at.” This is known as Bertrand’s postulate, since it was conjectured and verified empirically for $n < 3\,000\,000$ by Joseph Bertrand. It was first proved for all $n$ by Pafnuty Chebyshev in 1850. A much simpler proof was given by the Indian genius Ramanujan. Our Book Proof is by Paul Erdős: it is taken from Erdős’ first published paper, which appeared in 1932, when Erdős was 19.

[Photo: Joseph Bertrand]
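The recipe for runs of $k$ consecutive composite numbers described above can be reproduced in a few lines. This is a small sketch; the function names `is_prime` and `composite_run` are illustrative and not from the text.

```python
from math import prod


def is_prime(n: int) -> bool:
    """Primality by trial division (fine for the small numbers used here)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def composite_run(k: int) -> list[int]:
    """Return the k numbers N+2, ..., N+(k+1), where N is the product of all primes < k+2."""
    N = prod(p for p in range(2, k + 2) if is_prime(p))
    return [N + i for i in range(2, k + 2)]


run = composite_run(10)
print(run[0], run[-1])                 # 2312 2321
print(any(is_prime(x) for x in run))   # False
```

For $k = 10$ the primes below 12 are 2, 3, 5, 7, 11, so $N = 2310$, in agreement with the ten numbers listed in the text.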

Bertrand’s postulate. For every $n \ge 1$, there is some prime number $p$ with $n < p \le 2n$.

Proof. We will estimate the size of the binomial coefficient $\binom{2n}{n}$ carefully enough to see that if it didn’t have any prime factors in the range $n < p \le 2n$, then it would be “too small.” Our argument is in five steps.

(1) We first prove Bertrand’s postulate for $n < 4000$. For this one does not need to check 4000 cases: it suffices (this is “Landau’s trick”) to check that

$$2, 3, 5, 7, 13, 23, 43, 83, 163, 317, 631, 1259, 2503, 4001$$

is a sequence of prime numbers, where each is smaller than twice the previous one. Hence every interval $\{y : n < y \le 2n\}$, with $n \le 4000$, contains one of these 14 primes.
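Landau's trick in step (1) is a finite check and can be carried out mechanically. The sketch below assumes nothing beyond the chain of 14 numbers given in the text; the helper names are made up for illustration.

```python
LANDAU_CHAIN = [2, 3, 5, 7, 13, 23, 43, 83, 163, 317, 631, 1259, 2503, 4001]


def is_prime(n: int) -> bool:
    """Primality by trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def chain_is_valid(chain: list[int]) -> bool:
    """All entries are prime and each is smaller than twice the previous one."""
    all_prime = all(is_prime(p) for p in chain)
    doubling = all(b < 2 * a for a, b in zip(chain, chain[1:]))
    return all_prime and doubling


def bertrand_witness(n: int, chain: list[int] = LANDAU_CHAIN):
    """Return a prime p from the chain with n < p <= 2n, or None if there is none."""
    return next((p for p in chain if n < p <= 2 * n), None)


print(chain_is_valid(LANDAU_CHAIN))
print(all(bertrand_witness(n) is not None for n in range(1, 4001)))
```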

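(2) Next we prove that

$$\prod_{p \le x} p \;\le\; 4^{x-1} \qquad \text{for all real } x \ge 2, \qquad (1)$$

where our notation, here and in the following, is meant to imply that the product is taken over all prime numbers $p \le x$. The proof that we present for this fact uses induction on the number of these primes. It is not from Erdős’ original paper, but it is also due to Erdős (see the margin), and it is a true Book Proof. First we note that if $q$ is the largest prime with $q \le x$, then

$$\prod_{p \le x} p = \prod_{p \le q} p \qquad \text{and} \qquad 4^{q-1} \le 4^{x-1}.$$

Thus it suffices to check (1) for the case where $x = q$ is a prime number. For $q = 2$ we get “$2 \le 4$,” so we proceed to consider odd primes $q = 2m + 1$. (Here we may assume, by induction, that (1) is valid for all integers $x$ in the set $\{2, 3, \dots, 2m\}$.) For $q = 2m + 1$ we split the product and compute

$$\prod_{p \le 2m+1} p \;=\; \prod_{p \le m+1} p \cdot \prod_{m+1 < p \le 2m+1} p \;\le\; 4^m \binom{2m+1}{m} \;\le\; 4^m 2^{2m} \;=\; 4^{2m}.$$

All the pieces of this “one-line computation” are easy to see. In fact,

$$\prod_{p \le m+1} p \le 4^m$$

holds by induction. The inequality

$$\prod_{m+1 < p \le 2m+1} p \le \binom{2m+1}{m}$$

follows from the observation that $\binom{2m+1}{m} = \frac{(2m+1)!}{m!\,(m+1)!}$ is an integer, where the primes that we consider all are factors of the numerator $(2m+1)!$, but not of the denominator $m!\,(m+1)!$. Finally

$$\binom{2m+1}{m} \le 2^{2m}$$

holds since $\binom{2m+1}{m}$ and $\binom{2m+1}{m+1}$ are two (equal!) summands that appear in

$$\sum_{k=0}^{2m+1} \binom{2m+1}{k} = 2^{2m+1}.$$

[Box: Legendre’s theorem. The number $n!$ contains the prime factor $p$ exactly
$$\sum_{k \ge 1} \Big\lfloor \frac{n}{p^k} \Big\rfloor$$
times.

Proof. Exactly $\lfloor \frac{n}{p} \rfloor$ of the factors of $n! = 1 \cdot 2 \cdot 3 \cdots n$ are divisible by $p$, which accounts for $\lfloor \frac{n}{p} \rfloor$ $p$-factors. Next, $\lfloor \frac{n}{p^2} \rfloor$ of the factors of $n!$ are even divisible by $p^2$, which accounts for the next $\lfloor \frac{n}{p^2} \rfloor$ prime factors $p$ of $n!$, etc. □]

(3) From Legendre’s theorem (see the box) we get that $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$ contains the prime factor $p$ exactly

$$\sum_{k \ge 1} \Big( \Big\lfloor \frac{2n}{p^k} \Big\rfloor - 2 \Big\lfloor \frac{n}{p^k} \Big\rfloor \Big)$$

times. Here each summand is at most 1, since it satisfies

$$\Big\lfloor \frac{2n}{p^k} \Big\rfloor - 2 \Big\lfloor \frac{n}{p^k} \Big\rfloor \;<\; \frac{2n}{p^k} - 2 \Big( \frac{n}{p^k} - 1 \Big) = 2,$$

and it is an integer. Furthermore the summands vanish whenever $p^k > 2n$. Thus $\binom{2n}{n}$ contains $p$ exactly

$$\sum_{k \ge 1} \Big( \Big\lfloor \frac{2n}{p^k} \Big\rfloor - 2 \Big\lfloor \frac{n}{p^k} \Big\rfloor \Big) \;\le\; \max\{r : p^r \le 2n\}$$

times. Hence the largest power of $p$ that divides $\binom{2n}{n}$ is not larger than $2n$. In particular, primes $p > \sqrt{2n}$ appear at most once in $\binom{2n}{n}$.

Furthermore (and this, according to Erdős, is the key fact for his proof) primes $p$ that satisfy $\frac{2}{3} n < p \le n$ do not divide $\binom{2n}{n}$ at all! Indeed, $3p > 2n$ implies (for $n \ge 3$, and hence $p \ge 3$) that $p$ and $2p$ are the only multiples of $p$ that appear as factors in the numerator of $\frac{(2n)!}{n!\,n!}$, while we get two $p$-factors in the denominator.

[Margin: Examples such as
$$\binom{26}{13} = 2^3 \cdot 5^2 \cdot 7 \cdot 17 \cdot 19 \cdot 23$$
$$\binom{28}{14} = 2^3 \cdot 3^3 \cdot 5^2 \cdot 17 \cdot 19 \cdot 23$$
$$\binom{30}{15} = 2^4 \cdot 3^2 \cdot 5 \cdot 17 \cdot 19 \cdot 23 \cdot 29$$
illustrate that “very small” prime factors $p < \sqrt{2n}$ can appear as higher powers in $\binom{2n}{n}$, “small” primes with $\sqrt{2n} < p \le \frac{2}{3} n$ appear at most once, while factors in the gap with $\frac{2}{3} n < p \le n$ don’t appear at all.]

(4) Now we are ready to estimate $\binom{2n}{n}$. For $n \ge 3$, using an estimate from page 12 for the lower bound, we get

$$\frac{4^n}{2n} \;\le\; \binom{2n}{n} \;\le\; \prod_{p \le \sqrt{2n}} 2n \cdot \prod_{\sqrt{2n} < p \le \frac{2}{3} n} p \cdot \prod_{n < p \le 2n} p$$

and thus, since there are not more than $\sqrt{2n}$ primes $p \le \sqrt{2n}$,

$$4^n \;\le\; (2n)^{1 + \sqrt{2n}} \cdot \prod_{\sqrt{2n} < p \le \frac{2}{3} n} p \cdot \prod_{n < p \le 2n} p \qquad \text{for } n \ge 3. \qquad (2)$$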
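Legendre's theorem gives a direct way to factor $\binom{2n}{n}$ without computing the (large) binomial coefficient first. The following sketch reproduces the margin example for $\binom{26}{13}$; the function names are hypothetical and chosen for this illustration only.

```python
import math


def is_prime(n: int) -> bool:
    """Primality by trial division."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def legendre(n: int, p: int) -> int:
    """Exponent of the prime p in n!, i.e. the sum of floor(n / p^k) over k >= 1."""
    exponent, power = 0, p
    while power <= n:
        exponent += n // power
        power *= p
    return exponent


def central_binomial_factorization(n: int) -> dict[int, int]:
    """Prime factorization of C(2n, n) as a dict {prime: exponent}."""
    result = {}
    for p in range(2, 2 * n + 1):
        if is_prime(p):
            e = legendre(2 * n, p) - 2 * legendre(n, p)
            if e > 0:
                result[p] = e
    return result


print(central_binomial_factorization(13))
# {2: 3, 5: 2, 7: 1, 17: 1, 19: 1, 23: 1}
```

In the output, the primes 11 and 13 (which lie in the range $\frac{2}{3} n < p \le n$ for $n = 13$) are absent, as step (3) predicts.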

(5) Assume now that there is no prime $p$ with $n < p \le 2n$, so the second product in (2) is 1. Substituting (1) into (2) we get

$$4^n \;\le\; (2n)^{1 + \sqrt{2n}}\, 4^{\frac{2}{3} n}$$

or

$$4^{\frac{1}{3} n} \;\le\; (2n)^{1 + \sqrt{2n}}, \qquad (3)$$

which is false for $n$ large enough! In fact, using $a + 1 < 2^a$ (which holds for all $a \ge 2$, by induction) we get

$$2n = \big(\sqrt[6]{2n}\big)^6 < \big(\lfloor \sqrt[6]{2n} \rfloor + 1\big)^6 < 2^{6 \lfloor \sqrt[6]{2n} \rfloor} \le 2^{6 \sqrt[6]{2n}}, \qquad (4)$$

and thus for $n \ge 50$ (and hence $18 < 2\sqrt{2n}$) we obtain from (3) and (4)

$$2^{2n} \;\le\; (2n)^{3(1 + \sqrt{2n})} \;<\; 2^{\sqrt[6]{2n}\,(18 + 18\sqrt{2n})} \;<\; 2^{20 \sqrt[6]{2n} \sqrt{2n}} \;=\; 2^{20 (2n)^{2/3}}.$$

This implies $(2n)^{1/3} < 20$, and thus $n < 4000$. □
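Inequality (3) can be examined numerically by comparing base-2 logarithms of both sides. This is a floating-point illustration, not a proof; the function name `inequality_3_holds` is made up, and the sample values of $n$ are arbitrary.

```python
import math


def inequality_3_holds(n: int) -> bool:
    """Check 4^(n/3) <= (2n)^(1 + sqrt(2n)) by comparing base-2 logarithms."""
    lhs = 2 * n / 3                                   # log2 of 4^(n/3)
    rhs = (1 + math.sqrt(2 * n)) * math.log2(2 * n)   # log2 of (2n)^(1 + sqrt(2n))
    return lhs <= rhs


print(inequality_3_holds(100))    # True: (3) still holds for small n
print(inequality_3_holds(4000))   # False: (3) fails at n = 4000
```

The proof only needs that (3) fails for all $n \ge 4000$; numerically it already fails for considerably smaller $n$, but the estimate via (4) is what makes the argument rigorous.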

One can extract even more from this type of estimates: From (2) one can derive with the same methods that

$$\prod_{n < p \le 2n} p \;\ge\; 2^{\frac{1}{30} n} \qquad \text{for } n \ge 4000,$$

and thus that there are at least

$$\log_{2n}\big(2^{\frac{1}{30} n}\big) = \frac{1}{30} \frac{n}{\log_2 n + 1}$$

primes in the range between $n$ and $2n$.

This is not that bad an estimate: the “true” number of primes in this range is roughly $n / \log n$. This follows from the “prime number theorem,” which says that the limit

$$\lim_{n \to \infty} \frac{\#\{p \le n : p \text{ is prime}\}}{n / \log n}$$

exists, and equals 1. This famous result was first proved by Hadamard and de la Vallée-Poussin in 1896; Selberg and Erdős found an elementary proof (without complex analysis tools, but still long and involved) in 1948.

On the prime number theorem itself the final word, it seems, is still not in: for example a proof of the Riemann hypothesis (see page 49), one of the major unsolved open problems in mathematics, would also give a substantial improvement for the estimates of the prime number theorem. But also for Bertrand’s postulate, one could expect dramatic improvements. In fact, the following is a famous unsolved problem:

Is there always a prime between $n^2$ and $(n + 1)^2$?

For additional information see [3, p. 19] and [4, pp. 248, 257].

Appendix: Some estimates

Estimating via integrals

There is a very simple-but-effective method of estimating sums by integrals (as already encountered on page 4). For estimating the harmonic numbers

$$H_n = \sum_{k=1}^{n} \frac{1}{k}$$

we draw the figure in the margin and derive from it

$$H_n - 1 = \sum_{k=2}^{n} \frac{1}{k} < \int_1^n \frac{1}{t}\,dt = \log n$$

by comparing the area below the graph of $f(t) = \frac{1}{t}$ ($1 \le t \le n$) with the area of the dark shaded rectangles, and

$$H_n - \frac{1}{n} = \sum_{k=1}^{n-1} \frac{1}{k} > \int_1^n \frac{1}{t}\,dt = \log n$$

by comparing with the area of the large rectangles (including the lightly shaded parts). Taken together, this yields

$$\log n + \frac{1}{n} < H_n < \log n + 1.$$

In particular, $\lim_{n \to \infty} H_n \to \infty$, and the order of growth of $H_n$ is given by $\lim_{n \to \infty} \frac{H_n}{\log n} = 1$. But much better estimates are known (see [2]), such as

$$H_n = \log n + \gamma + \frac{1}{2n} - \frac{1}{12 n^2} + \frac{1}{120 n^4} + O\Big(\frac{1}{n^6}\Big),$$

where $\gamma \approx 0.5772$ is “Euler’s constant.”

[Margin: Here $O\big(\frac{1}{n^6}\big)$ denotes a function $f(n)$ such that $f(n) \le c \frac{1}{n^6}$ holds for some constant $c$.]

Estimating factorials: Stirling’s formula

The same method applied to

$$\log(n!) = \log 2 + \log 3 + \dots + \log n = \sum_{k=2}^{n} \log k$$

yields

$$\log((n-1)!) < \int_1^n \log t\,dt < \log(n!),$$

where the integral is easily computed:

$$\int_1^n \log t\,dt = \Big[ t \log t - t \Big]_1^n = n \log n - n + 1.$$

Thus we get a lower estimate on $n!$

$$n! > e^{n \log n - n + 1} = e \Big(\frac{n}{e}\Big)^n$$

and at the same time an upper estimate

$$n! = n\,(n-1)! < n\,e^{n \log n - n + 1} = e\,n \Big(\frac{n}{e}\Big)^n.$$

Here a more careful analysis is needed to get the asymptotics of $n!$, as given by Stirling’s formula

$$n! \sim \sqrt{2 \pi n} \Big(\frac{n}{e}\Big)^n.$$

[Margin: Here $f(n) \sim g(n)$ means that $\lim_{n \to \infty} \frac{f(n)}{g(n)} = 1$.]

And again there are more precise versions available, such as

$$n! = \sqrt{2 \pi n} \Big(\frac{n}{e}\Big)^n \Big( 1 + \frac{1}{12 n} + \frac{1}{288 n^2} - \frac{139}{51840 n^3} + O\Big(\frac{1}{n^4}\Big) \Big).$$
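The estimates for $H_n$ and $n!$ above are easy to test in floating point. The sketch below works with logarithms (via `math.lgamma`, which returns $\log \Gamma(n+1) = \log n!$) to avoid huge numbers; the function names are illustrative, and the bounds are checked only for $n \ge 2$, where the inequalities are strict.

```python
import math


def harmonic(n: int) -> float:
    """The n-th harmonic number H_n."""
    return sum(1.0 / k for k in range(1, n + 1))


def harmonic_bounds_hold(n: int) -> bool:
    """log n + 1/n < H_n < log n + 1, for n >= 2."""
    return math.log(n) + 1 / n < harmonic(n) < math.log(n) + 1


def factorial_bounds_hold(n: int) -> bool:
    """e (n/e)^n < n! < e n (n/e)^n, for n >= 2, compared on a log scale."""
    log_factorial = math.lgamma(n + 1)   # log(n!)
    lower = n * math.log(n) - n + 1      # log of e (n/e)^n
    return lower < log_factorial < lower + math.log(n)


def stirling_ratio(n: int) -> float:
    """n! divided by Stirling's approximation sqrt(2 pi n) (n/e)^n."""
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n
    return math.exp(math.lgamma(n + 1) - log_stirling)


print(harmonic_bounds_hold(100), factorial_bounds_hold(100))
print(stirling_ratio(10), 1 + 1 / 120 + 1 / 28800 - 139 / 51840000)
```

The last line compares the ratio for $n = 10$ with the first correction terms $1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3}$; the two printed values should agree to several decimal places.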

Estimating binomial coefficients

Just from the definition of the binomial coefficients $\binom{n}{k}$ as the number of $k$-subsets of an $n$-set, we know that the sequence $\binom{n}{0}, \binom{n}{1}, \dots, \binom{n}{n}$ of binomial coefficients

• sums to $\sum_{k=0}^{n} \binom{n}{k} = 2^n$
• is symmetric: $\binom{n}{k} = \binom{n}{n-k}$.

From the functional equation $\binom{n}{k} = \frac{n-k+1}{k} \binom{n}{k-1}$ one easily finds that for every $n$ the binomial coefficients $\binom{n}{k}$ form a sequence that is symmetric and unimodal: it increases towards the middle, so that the middle binomial coefficients are the largest ones in the sequence:

$$1 = \binom{n}{0} < \binom{n}{1} < \dots < \binom{n}{\lfloor n/2 \rfloor} = \binom{n}{\lceil n/2 \rceil} > \dots > \binom{n}{n-1} > \binom{n}{n} = 1.$$

Here $\lfloor x \rfloor$ resp. $\lceil x \rceil$ denotes the number $x$ rounded down resp. rounded up to the nearest integer.

[Margin: Pascal’s triangle
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1]

From the asymptotic formulas for the factorials mentioned above one can obtain very precise estimates for the sizes of binomial coefficients. However, we will only need very weak and simple estimates in this book, such as the following: $\binom{n}{k} \le 2^n$ for all $k$, while for $n \ge 2$ we have

$$\binom{n}{\lfloor n/2 \rfloor} \ge \frac{2^n}{n},$$

with equality only for $n = 2$. In particular, for $n \ge 1$,

$$\binom{2n}{n} \ge \frac{4^n}{2n}.$$

This holds since $\binom{n}{\lfloor n/2 \rfloor}$, a middle binomial coefficient, is the largest entry in the sequence $\binom{n}{0} + \binom{n}{n}, \binom{n}{1}, \binom{n}{2}, \dots, \binom{n}{n-1}$, whose sum is $2^n$, and whose average is thus $\frac{2^n}{n}$.

On the other hand, we note the upper bound for binomial coefficients

$$\binom{n}{k} = \frac{n(n-1) \cdots (n-k+1)}{k!} \le \frac{n^k}{k!} \le \frac{n^k}{2^{k-1}},$$

which is a reasonably good estimate for the “small” binomial coefficients at the tails of the sequence, when $n$ is large (compared to $k$).
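All of the elementary binomial estimates listed above can be verified with exact integer arithmetic. The sketch below bundles them into one function; the name `binomial_row_facts` and the dictionary keys are made up for this illustration, and the checks are meant for $n \ge 2$.

```python
from math import comb, factorial


def binomial_row_facts(n: int) -> dict[str, bool]:
    """Check the elementary facts about row n of Pascal's triangle (n >= 2)."""
    row = [comb(n, k) for k in range(n + 1)]
    mid = n // 2
    return {
        "sums_to_2^n": sum(row) == 2**n,
        "symmetric": row == row[::-1],
        # together with symmetry, this gives unimodality
        "increasing_to_middle": all(row[k] < row[k + 1] for k in range(mid)),
        # C(n, floor(n/2)) >= 2^n / n, written without division
        "middle_at_least_2^n/n": n * row[mid] >= 2**n,
        # C(n, k) <= n^k / k! <= n^k / 2^(k-1) for k >= 1, written without division
        "tail_bound": all(
            factorial(k) * row[k] <= n**k and 2 ** (k - 1) <= factorial(k)
            for k in range(1, n + 1)
        ),
    }


print(binomial_row_facts(7))
print(all(2 * n * comb(2 * n, n) >= 4**n for n in range(1, 50)))
```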

References

[1] P. ERDŐS: Beweis eines Satzes von Tschebyschef, Acta Sci. Math. (Szeged) 5 (1930-32), 194-198.

[2] R. L. GRAHAM, D. E. KNUTH & O. PATASHNIK: Concrete Mathematics. A Foundation for Computer Science, Addison-Wesley, Reading MA 1989.

[3] G. H. HARDY & E. M. WRIGHT: An Introduction to the Theory of Numbers, fifth edition, Oxford University Press 1979.

[4] P. RIBENBOIM: The New Book of Prime Number Records, Springer-Verlag, New York 1989.

Chapter 3: Binomial coefficients are (almost) never powers

There is an epilogue to Bertrand’s postulate which leads to a beautiful result on binomial coefficients. In 1892 Sylvester strengthened Bertrand’s postulate in the following way:

If $n \ge 2k$, then at least one of the numbers $n, n-1, \dots, n-k+1$ has a prime divisor $p$ greater than $k$.

Note that for $n = 2k$ we obtain precisely Bertrand’s postulate. In 1934, Erdős gave a short and elementary Book Proof of Sylvester’s result, running along the lines of his proof of Bertrand’s postulate. There is an equivalent way of stating Sylvester’s theorem:

The binomial coefficient
$$\binom{n}{k} = \frac{n(n-1) \cdots (n-k+1)}{k!} \qquad (n \ge 2k)$$
always has a prime factor $p > k$.

With this observation in mind, we turn to another one of Erdős’ jewels. When is $\binom{n}{k}$ equal to a power $m^\ell$? It is easy to see that there are infinitely many solutions for $k = \ell = 2$, that is, of the equation $\binom{n}{2} = m^2$. Indeed, if $\binom{n}{2}$ is a square, then so is $\binom{(2n-1)^2}{2}$. To see this, set $n(n-1) = 2m^2$. It follows that

$$(2n-1)^2 \big( (2n-1)^2 - 1 \big) = (2n-1)^2\, 4n(n-1) = 2 \big( 2m(2n-1) \big)^2,$$

and hence

$$\binom{(2n-1)^2}{2} = \big( 2m(2n-1) \big)^2.$$

Beginning with $\binom{9}{2} = 6^2$ we thus obtain infinitely many solutions; the next one is $\binom{289}{2} = 204^2$. However, this does not yield all solutions. For example, $\binom{50}{2} = 35^2$ starts another series, as does $\binom{1682}{2} = 1189^2$. For $k = 3$ it is known that $\binom{n}{3} = m^2$ has the unique solution $n = 50$, $m = 140$. But now we are at the end of the line. For $k \ge 4$ and any $\ell \ge 2$ no solutions exist, and this is what Erdős proved by an ingenious argument.

[Margin: $\binom{50}{3} = 140^2$ is the only solution for $k = 3$, $\ell = 2$.]
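The step from one solution of $\binom{n}{2} = m^2$ to the next can be iterated mechanically. The following sketch uses exact integers; `next_solution` is a hypothetical helper name, and the number of iterations is an arbitrary choice.

```python
from math import comb


def next_solution(n: int, m: int) -> tuple[int, int]:
    """Map a solution of C(n, 2) = m^2 to the solution ((2n-1)^2, 2m(2n-1))."""
    return (2 * n - 1) ** 2, 2 * m * (2 * n - 1)


n, m = 9, 6                      # C(9, 2) = 36 = 6^2
series = [(n, m)]
for _ in range(4):
    n, m = next_solution(n, m)
    series.append((n, m))

print(series[:2])                                      # [(9, 6), (289, 204)]
print(all(comb(n, 2) == m * m for n, m in series))     # True
```

Starting instead from $(50, 35)$ or $(1682, 1189)$ produces the other series mentioned in the text.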

Theorem. The equation $\binom{n}{k} = m^\ell$ has no integer solutions with $\ell \ge 2$ and $4 \le k \le n - 4$.

Proof. Note first that we may assume $n \ge 2k$ because of $\binom{n}{k} = \binom{n}{n-k}$. Suppose the theorem is false, and that $\binom{n}{k} = m^\ell$. The proof, by contradiction, proceeds in the following four steps.

(1) By Sylvester’s theorem, there is a prime factor $p$ of $\binom{n}{k}$ greater than $k$, hence $p^\ell$ divides $n(n-1) \cdots (n-k+1)$. Clearly, only one of the factors $n - i$ can be a multiple of $p$ (because of $p > k$), and we conclude $p^\ell \mid n - i$, and therefore

$$n \ge p^\ell > k^\ell \ge k^2.$$

(2) Consider any factor $n - j$ of the numerator and write it in the form $n - j = a_j m_j^\ell$, where $a_j$ is not divisible by any nontrivial $\ell$-th power. We note by (1) that $a_j$ has only prime divisors less than or equal to $k$. We want to show next that $a_i \ne a_j$ for $i \ne j$. Assume to the contrary that $a_i = a_j$ for some $i < j$. Then $m_i \ge m_j + 1$ and

$$k > (n - i) - (n - j) = a_j (m_i^\ell - m_j^\ell) \ge a_j \big( (m_j + 1)^\ell - m_j^\ell \big) > a_j \ell m_j^{\ell - 1} \ge \ell (a_j m_j^\ell)^{1/2} \ge \ell (n - k + 1)^{1/2} \ge \ell \big( \tfrac{n}{2} + 1 \big)^{1/2} > n^{1/2},$$

which contradicts $n > k^2$ from above.
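The decomposition $n - j = a_j m_j^\ell$ from step (2) can be computed explicitly. As an illustration, the sketch below applies it to the known solution $\binom{50}{3} = 140^2$ (which has $k = 3$ and is therefore outside the range of the theorem); the function name `power_free_part` is an assumption of this sketch.

```python
def power_free_part(x: int, ell: int) -> tuple[int, int]:
    """Return (a, m) with x = a * m**ell, where a is divisible by no d**ell with d > 1."""
    a, m, d = x, 1, 2
    while d**ell <= a:
        while a % (d**ell) == 0:   # strip every factor d^ell from a
            a //= d**ell
            m *= d
        d += 1
    return a, m


n, k, ell = 50, 3, 2
parts = [power_free_part(n - j, ell) for j in range(k)]
print(parts)                           # [(2, 5), (1, 7), (3, 4)]
print(sorted(a for a, _ in parts))     # [1, 2, 3]
```

Here the $a_j$ are pairwise distinct and in fact form the set $\{1, 2, 3\} = \{1, \dots, k\}$, while the $m_j$ multiply to $5 \cdot 7 \cdot 4 = 140$.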

(3) Next we prove that the $a_i$’s are the integers $1, 2, \dots, k$ in some order. (According to Erdős, this is the crux of the proof.) Since we already know that they are all distinct, it suffices to prove that

$$a_0 a_1 \cdots a_{k-1} \ \text{ divides } \ k!.$$
Substituting $n-j = a_j m_j^\ell$ into the equation $\binom{n}{k} = m^\ell$, we obtain
$$a_0 a_1 \cdots a_{k-1} (m_0 m_1 \cdots m_{k-1})^\ell = k!\, m^\ell.$$

Cancelling the common factors of $m_0 \cdots m_{k-1}$ and $m$ yields
$$a_0 a_1 \cdots a_{k-1} u^\ell = k!\, v^\ell$$
with $\gcd(u, v) = 1$. It remains to show that $v = 1$. If not, then $v$ contains a prime divisor $p$. Since $\gcd(u, v) = 1$, $p$ must be a prime divisor of $a_0 a_1 \cdots a_{k-1}$ and hence is less than or equal to $k$. By the theorem of Legendre (see page 8) we know that $k!$ contains $p$ to the power $\sum_{i \ge 1} \lfloor k/p^i \rfloor$. We now estimate the exponent of $p$ in $n(n-1)\cdots(n-k+1)$. Let $i$ be a positive integer, and let $b_1 < b_2 < \cdots < b_s$ be the multiples of $p^i$ among $n, n-1, \dots, n-k+1$. Then $b_s = b_1 + (s-1)p^i$ and hence
$$(s-1)p^i = b_s - b_1 \le n - (n-k+1) = k-1,$$
which implies
$$s \le \Big\lfloor \frac{k-1}{p^i} \Big\rfloor + 1 \le \Big\lfloor \frac{k}{p^i} \Big\rfloor + 1.$$

So for each $i$ the number of multiples of $p^i$ among $n, \dots, n-k+1$, and hence among the $a_j$’s, is bounded by $\lfloor k/p^i \rfloor + 1$. This implies that the exponent of $p$ in $a_0 a_1 \cdots a_{k-1}$ is at most
$$\sum_{i=1}^{\ell-1} \Big( \Big\lfloor \frac{k}{p^i} \Big\rfloor + 1 \Big)$$
with the reasoning that we used for Legendre’s theorem in Chapter 2. The only difference is that this time the sum stops at $i = \ell-1$, since the $a_j$’s contain no $\ell$-th powers.

Taking both counts together, we find that the exponent of $p$ in $v^\ell$ is at most

$$\sum_{i=1}^{\ell-1} \Big( \Big\lfloor \frac{k}{p^i} \Big\rfloor + 1 \Big) - \sum_{i \ge 1} \Big\lfloor \frac{k}{p^i} \Big\rfloor \le \ell - 1,$$
and we have our desired contradiction, since $v^\ell$ is an $\ell$-th power.

This suffices already to settle the case $\ell = 2$. Indeed, since $k \ge 4$ one of the $a_i$’s must be equal to 4, but the $a_i$’s contain no squares. So let us now assume that $\ell \ge 3$.

[Margin note: We see that our analysis so far agrees with $\binom{50}{3} = 140^2$, as $50 = 2 \cdot 5^2$, $49 = 1 \cdot 7^2$, $48 = 3 \cdot 4^2$ and $5 \cdot 7 \cdot 4 = 140$.]

(4) Since $k \ge 4$, we must have $a_{i_1} = 1$, $a_{i_2} = 2$, $a_{i_3} = 4$ for some $i_1, i_2, i_3$, that is,
$$n - i_1 = m_1^\ell, \qquad n - i_2 = 2 m_2^\ell, \qquad n - i_3 = 4 m_3^\ell.$$
We claim that $(n-i_2)^2 \ne (n-i_1)(n-i_3)$. If not, put $b = n - i_2$ and $n - i_1 = b - x$, $n - i_3 = b + y$, where $0 < |x|, |y| < k$. Hence
$$b^2 = (b-x)(b+y) \quad \text{or} \quad (y-x)b = xy,$$
where $x = y$ is plainly impossible. Now we have by part (1)

$$|xy| = b\,|y-x| \ge b > n-k > (k-1)^2 \ge |xy|,$$
which is absurd.

So we have $m_2^2 \ne m_1 m_3$, where we assume $m_2^2 > m_1 m_3$ (the other case being analogous), and proceed to our last chains of inequalities. We obtain

$$2(k-1)n > n^2 - (n-k+1)^2 > (n-i_2)^2 - (n-i_1)(n-i_3) = 4\big[m_2^{2\ell} - (m_1 m_3)^\ell\big] \ge 4\big[(m_1 m_3 + 1)^\ell - (m_1 m_3)^\ell\big] \ge 4 \ell\, m_1^{\ell-1} m_3^{\ell-1}.$$
Since $\ell \ge 3$ and $n > k^\ell \ge k^3 > 6k$, this yields
$$2(k-1)\, n\, m_1 m_3 > 4 \ell\, m_1^\ell m_3^\ell = \ell (n-i_1)(n-i_3) > \ell (n-k+1)^2 > 3\big(n - \tfrac{n}{6}\big)^2 > 2n^2.$$

Now since $m_i \le n^{1/\ell} \le n^{1/3}$ we finally obtain
$$k n^{2/3} \ge k m_1 m_3 > (k-1) m_1 m_3 > n,$$
or $k^3 > n$. With this contradiction, the proof is complete. □
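The statement of the theorem can be sanity-checked by brute force for small $n$. The sketch below is only an illustration (the function names and the search bound 40 are our own choices, and a finite search proves nothing): it looks for pairs $(n, k)$ with $4 \le k \le n-4$ for which $\binom{n}{k}$ is a perfect power.

```python
from math import comb

def integer_root(v, l):
    """Largest integer r with r**l <= v, by binary search in exact arithmetic."""
    lo, hi = 1, 1 << (v.bit_length() // l + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** l <= v:
            lo = mid
        else:
            hi = mid - 1
    return lo

def is_perfect_power(v):
    """True if v == m**l for some integers m >= 2 and l >= 2."""
    for l in range(2, v.bit_length() + 1):
        r = integer_root(v, l)
        if r >= 2 and r ** l == v:
            return True
    return False

def power_binomials(n_max):
    """All (n, k) with 4 <= k <= n - 4 and n <= n_max such that C(n, k) is a perfect power."""
    return [(n, k)
            for n in range(8, n_max + 1)
            for k in range(4, n - 3)
            if is_perfect_power(comb(n, k))]

hits = power_binomials(40)  # the theorem predicts an empty list
```

The same `is_perfect_power` helper does detect the small-$k$ exceptions discussed before the theorem, such as $\binom{50}{3} = 140^2$.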

References

[1] P. ERDŐS: A theorem of Sylvester and Schur, J. London Math. Soc. 9 (1934), 282-288.

[2] P. ERDŐS: On a diophantine equation, J. London Math. Soc. 26 (1951), 176-178.

[3] J. J. SYLVESTER: On arithmetical series, Messenger of Math. 21 (1892), 1-19, 87-120; Collected Mathematical Papers Vol. 4, 1912, 687-731.

Chapter 4

Representing numbers as sums of two squares

Which numbers can be written as sums of two squares?

[Margin note:
1 = $1^2 + 0^2$
2 = $1^2 + 1^2$
3 =
4 = $2^2 + 0^2$
5 = $2^2 + 1^2$
6 =
7 =
8 = $2^2 + 2^2$
9 = $3^2 +$
10 = $3^2 +$
11 =
...]

This question is as old as number theory, and its solution is a classic in the field. The “hard” part of the solution is to see that every prime number of the form $4m + 1$ is a sum of two squares. G. H. Hardy writes that this two square theorem of Fermat “is ranked, very justly, as one of the finest in arithmetic.” Nevertheless, one of our Book Proofs below is quite recent.

Let’s start with some “warm-ups.” First, we need to distinguish between the prime $p = 2$, the primes of the form $p = 4m + 1$, and the primes of the form $p = 4m + 3$. Every prime number belongs to exactly one of these three classes. At this point we may note (using a method “à la Euclid”) that there are infinitely many primes of the form $4m + 3$. In fact, if there were only finitely many, then we could take $p_k$ to be the largest prime of this form. Setting
$$N_k := 2^2 \cdot 3 \cdot 5 \cdots p_k - 1$$
(where $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, ... denotes the sequence of all primes), we find that $N_k$ is congruent to 3 (mod 4), so it must have a prime factor of the form $4m + 3$, and this prime factor is larger than $p_k$, a contradiction.

[Margin figure: Pierre de Fermat]

Our first lemma characterizes the primes for which $-1$ is a square in the field $\mathbb{Z}_p$ (which is reviewed in the box on the next page). It will also give us a quick way to derive that there are infinitely many primes of the form $4m + 1$.

Lemma 1. For primes $p = 4m + 1$ the equation $s^2 \equiv -1 \pmod{p}$ has two solutions $s \in \{1, 2, \dots, p-1\}$, for $p = 2$ there is one such solution, while for primes of the form $p = 4m + 3$ there is no solution.

Proof. For $p = 2$ take $s = 1$. For odd $p$, we construct the equivalence relation on $\{1, 2, \dots, p-1\}$ that is generated by identifying every element with its additive inverse and with its multiplicative inverse in $\mathbb{Z}_p$. Thus the “general” equivalence classes will contain four elements

$$\{x, -x, \bar{x}, -\bar{x}\}$$
since such a 4-element set contains both inverses for all its elements. However, there are smaller equivalence classes if some of the four numbers are not distinct:

• $x \equiv -x$ is impossible for odd $p$.
• $x \equiv \bar{x}$ is equivalent to $x^2 \equiv 1$. This has two solutions, namely $x = 1$ and $x = p-1$, leading to the equivalence class $\{1, p-1\}$ of size 2.
• $x \equiv -\bar{x}$ is equivalent to $x^2 \equiv -1$. This equation may have no solution or two distinct solutions $x_0, p - x_0$: in this case the equivalence class is $\{x_0, p - x_0\}$.

[Margin note: For $p = 11$ the partition is $\{1, 10\}, \{2, 9, 6, 5\}, \{3, 8, 4, 7\}$; for $p = 13$ it is $\{1, 12\}, \{2, 11, 7, 6\}, \{3, 10, 9, 4\}, \{5, 8\}$: the pair $\{5, 8\}$ yields the two solutions of $s^2 \equiv -1 \bmod 13$.]

The set $\{1, 2, \dots, p-1\}$ has $p-1$ elements, and we have partitioned it into quadruples (equivalence classes of size 4), plus one or two pairs (equivalence classes of size 2). For $p - 1 = 4m + 2$ we find that there is only the one pair $\{1, p-1\}$, the rest is quadruples, and thus $s^2 \equiv -1 \pmod{p}$ has no solution. For $p - 1 = 4m$ there has to be the second pair, and this contains the two solutions of $s^2 \equiv -1$ that we were looking for. □

Lemma 1 says that every odd prime dividing a number $M^2 + 1$ must be of the form $4m + 1$. This implies that there are infinitely many primes of this form: Otherwise, look at $(2 \cdot 3 \cdot 5 \cdots q_k)^2 + 1$, where $q_k$ is the largest such prime. The same reasoning as above yields a contradiction.
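The partition used in the proof of Lemma 1 can be computed directly for small primes. The following sketch is illustrative only; the function names are ours, and `pow(x, -1, p)` (available from Python 3.8) is used for the multiplicative inverse.

```python
def inverse_classes(p):
    """Partition {1, ..., p-1} into the classes {x, -x, 1/x, -1/x} modulo the odd prime p."""
    seen = set()
    classes = []
    for x in range(1, p):
        if x in seen:
            continue
        x_inv = pow(x, -1, p)  # multiplicative inverse of x modulo p
        cls = sorted({x, p - x, x_inv, p - x_inv})
        seen.update(cls)
        classes.append(cls)
    return classes

def sqrt_minus_one(p):
    """All s in {1, ..., p-1} with s^2 = -1 (mod p), found by direct search."""
    return [s for s in range(1, p) if (s * s + 1) % p == 0]

classes_11 = inverse_classes(11)  # 11 = 4*2 + 3: only one pair appears
classes_13 = inverse_classes(13)  # 13 = 4*3 + 1: a second pair appears
```

For $p = 13$ the second class of size 2 is exactly the set returned by `sqrt_minus_one(13)`, in line with the counting argument.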

Prime fields

If $p$ is a prime, then the set $\mathbb{Z}_p = \{0, 1, \dots, p-1\}$ with addition and multiplication defined “modulo $p$” forms a finite field. We will need the following simple properties:

• For $x \in \mathbb{Z}_p$, $x \ne 0$, the additive inverse (for which we usually write $-x$) is given by $p - x \in \{1, 2, \dots, p-1\}$. If $p > 2$, then $x$ and $-x$ are different elements of $\mathbb{Z}_p$.
• Each $x \in \mathbb{Z}_p \setminus \{0\}$ has a unique multiplicative inverse $\bar{x} \in \mathbb{Z}_p \setminus \{0\}$, with $x\bar{x} \equiv 1 \pmod{p}$.
  The definition of primes implies that the map $\mathbb{Z}_p \to \mathbb{Z}_p$, $z \mapsto xz$ is injective for $x \ne 0$. Thus on the finite set $\mathbb{Z}_p \setminus \{0\}$ it must be surjective as well, and hence for each $x$ there is a unique $\bar{x} \ne 0$ with $x\bar{x} \equiv 1 \pmod{p}$.
• The squares $0^2, 1^2, 2^2, \dots, h^2$ define different elements of $\mathbb{Z}_p$, for $h = \lfloor \frac{p}{2} \rfloor$.
  This is since $x^2 \equiv y^2$, or $(x+y)(x-y) \equiv 0$, implies that $x \equiv y$ or that $x \equiv -y$. The $1 + \lfloor \frac{p}{2} \rfloor$ elements $0^2, 1^2, \dots, h^2$ are called the squares in $\mathbb{Z}_p$.

[Margin tables: Addition and multiplication in $\mathbb{Z}_5$]

 + | 0 1 2 3 4
 0 | 0 1 2 3 4
 1 | 1 2 3 4 0
 2 | 2 3 4 0 1
 3 | 3 4 0 1 2
 4 | 4 0 1 2 3

 · | 0 1 2 3 4
 0 | 0 0 0 0 0
 1 | 0 1 2 3 4
 2 | 0 2 4 1 3
 3 | 0 3 1 4 2
 4 | 0 4 3 2 1

At this point, let us note “on the fly” that for all primes there are solutions for $x^2 + y^2 \equiv -1 \pmod{p}$. In fact, there are $\lfloor \frac{p}{2} \rfloor + 1$ distinct squares $x^2$ in $\mathbb{Z}_p$, and there are $\lfloor \frac{p}{2} \rfloor + 1$ distinct numbers of the form $-(1 + y^2)$. These two sets of numbers are too large to be disjoint, since $\mathbb{Z}_p$ has only $p$ elements, and thus there must exist $x$ and $y$ with $x^2 \equiv -(1 + y^2) \pmod{p}$.
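The pigeonhole remark above translates into a short search. This is a sketch under our own naming (`solve_sum_minus_one` is not from the text): it tabulates the squares $x^2$ for $0 \le x \le \lfloor p/2 \rfloor$ and looks for a $y$ in the same range with $-(1 + y^2)$ among them.

```python
def solve_sum_minus_one(p):
    """Return (x, y) with x^2 + y^2 = -1 (mod p) for a prime p, or None if none is found."""
    h = p // 2
    # map each square residue to one x in {0, ..., h} producing it
    square_of = {x * x % p: x for x in range(h + 1)}
    for y in range(h + 1):
        target = -(1 + y * y) % p
        if target in square_of:
            return square_of[target], y
    return None  # not reached for primes, by the counting argument

solutions = {p: solve_sum_minus_one(p) for p in (2, 3, 5, 7, 11, 13)}
```

For a prime $p$ the loop always finds a pair, since the two sets of size $\lfloor p/2 \rfloor + 1$ must overlap.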

Lemma 2. No number n = 4m + 3 is a sum of two squares.

Proof. The square of any even number is $(2k)^2 = 4k^2 \equiv 0 \pmod{4}$, while squares of odd numbers yield $(2k+1)^2 = 4(k^2 + k) + 1 \equiv 1 \pmod{4}$. Thus any sum of two squares is congruent to 0, 1 or 2 (mod 4). □

This is enough evidence for us that the primes $p = 4m + 3$ are “bad.” Thus, we proceed with “good” properties for primes of the form $p = 4m + 1$. On the way to the main theorem, the following is the key step.

Proposition. Every prime of the form $p = 4m + 1$ is a sum of two squares, that is, it can be written as $p = x^2 + y^2$ for some natural numbers $x, y \in \mathbb{N}$.

We shall present here two proofs of this result, both of them elegant and surprising. The first proof features a striking application of the “pigeonhole principle” (which we have already used “on the fly” before Lemma 2; see Chapter 25 for more), as well as a clever move to arguments “modulo $p$” and back. The idea is due to the Norwegian number theorist Axel Thue.

Proof. Consider the pairs $(x', y')$ of integers with $0 \le x', y' \le \sqrt{p}$, that is, $x', y' \in \{0, 1, \dots, \lfloor \sqrt{p} \rfloor\}$. There are $(\lfloor \sqrt{p} \rfloor + 1)^2$ such pairs. Using the estimate $\lfloor x \rfloor + 1 > x$ for $x = \sqrt{p}$, we see that we have more than $p$ such pairs of integers. Thus for any $s \in \mathbb{Z}$, it is impossible that all the values $x' - sy'$ produced by the pairs $(x', y')$ are distinct modulo $p$. That is, for every $s$ there are two distinct pairs
$$(x', y'), (x'', y'') \in \{0, 1, \dots, \lfloor \sqrt{p} \rfloor\}^2$$
with $x' - sy' \equiv x'' - sy'' \pmod{p}$. Now we take differences: We have $x' - x'' \equiv s(y' - y'') \pmod{p}$. Thus if we define $x := |x' - x''|$, $y := |y' - y''|$, then we get
$$(x, y) \in \{0, 1, \dots, \lfloor \sqrt{p} \rfloor\}^2 \quad \text{with} \quad x \equiv \pm sy \pmod{p}.$$
Also we know that not both $x$ and $y$ can be zero, because the pairs $(x', y')$ and $(x'', y'')$ are distinct.

[Margin note: For $p = 13$, $\lfloor \sqrt{p} \rfloor = 3$ we consider $x', y' \in \{0, 1, 2, 3\}$. For $s = 5$, the sum $x' - sy' \pmod{13}$ assumes the following values:]

 x' \ y' |  0   1   2   3
    0    |  0   8   3  11
    1    |  1   9   4  12
    2    |  2  10   5   0
    3    |  3  11   6   1

Now let $s$ be a solution of $s^2 \equiv -1 \pmod{p}$, which exists by Lemma 1. Then $x^2 \equiv s^2 y^2 \equiv -y^2 \pmod{p}$, and so we have produced
$$(x, y) \in \mathbb{Z}^2 \quad \text{with} \quad 0 < x^2 + y^2 < 2p \quad \text{and} \quad x^2 + y^2 \equiv 0 \pmod{p}.$$
But $p$ is the only number between 0 and $2p$ that is divisible by $p$. Thus $x^2 + y^2 = p$: done! □

Our second proof for the proposition, also clearly a Book Proof, was discovered by Roger Heath-Brown in 1971 and appeared in 1984. (A condensed “one-sentence version” was given by Don Zagier.) It is so elementary that we don’t even need to use Lemma 1. Heath-Brown’s argument features three linear involutions: a quite obvious one, a hidden one, and a trivial one that gives “the final blow.” The second, unexpected, involution corresponds to some hidden structure on the set of integral solutions of the equation $4xy + z^2 = p$.
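Before the second proof begins, here is a sketch of Thue’s pigeonhole argument from the first proof, traced in code for small primes. It is an illustration under our own assumptions: the function name is ours, and the root $s$ of $s^2 \equiv -1$ is found by plain search rather than by any efficient method.

```python
from math import isqrt

def two_squares_thue(p):
    """Find (x, y) with x*x + y*y == p for a prime p = 4m + 1 via Thue's pigeonhole argument."""
    # s with s^2 = -1 (mod p); it exists by Lemma 1, here located by brute force
    s = next(t for t in range(1, p) if (t * t + 1) % p == 0)
    r = isqrt(p)
    first_pair = {}  # value of x' - s*y' (mod p) -> first pair (x', y') producing it
    for x1 in range(r + 1):
        for y1 in range(r + 1):
            key = (x1 - s * y1) % p
            if key in first_pair:
                x2, y2 = first_pair[key]
                return abs(x1 - x2), abs(y1 - y2)
            first_pair[key] = (x1, y1)
    raise ValueError("no collision found; p is not a prime of the form 4m + 1")

representations = {p: two_squares_thue(p) for p in (5, 13, 17, 29, 37, 41, 97, 101)}
```

For $p = 13$ the search uses $s = 5$ and stops at the first repeated value in the margin table, which gives $13 = 2^2 + 3^2$.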

Proof. We study the set

$$S := \{(x, y, z) \in \mathbb{Z}^3 : 4xy + z^2 = p, \ x > 0, \ y > 0\}.$$
This set is finite. Indeed, $x \ge 1$ and $y \ge 1$ implies $y \le \frac{p}{4}$ and $x \le \frac{p}{4}$. So there are only finitely many possible values for $x$ and $y$, and given $x$ and $y$, there are at most two values for $z$.

1. The first linear involution is given by

$$f : S \longrightarrow S, \qquad (x, y, z) \longmapsto (y, x, -z),$$
that is, “interchange $x$ and $y$, and negate $z$.” This clearly maps $S$ to itself, and it is an involution: Applied twice, it yields the identity. Also, $f$ has no fixed points, since $z = 0$ would imply $p = 4xy$, which is impossible. Furthermore, $f$ maps the solutions in

$$T := \{(x, y, z) \in S : z > 0\}$$
to the solutions in $S \setminus T$, which satisfy $z < 0$. Also, $f$ reverses the signs of $x - y$ and of $z$, so it maps the solutions in
$$U := \{(x, y, z) \in S : (x - y) + z > 0\}$$
to the solutions in $S \setminus U$. For this we have to see that there is no solution with $(x - y) + z = 0$, but there is none since this would give $p = 4xy + z^2 = 4xy + (x - y)^2 = (x + y)^2$.

What do we get from the study of $f$? The main observation is that since $f$ maps the sets $T$ and $U$ to their complements, it also interchanges the elements in $T \setminus U$ with these in $U \setminus T$. That is, there is the same number of solutions in $U$ that are not in $T$ as there are solutions in $T$ that are not in $U$, so $T$ and $U$ have the same cardinality.

2. The second involution that we study is an involution on the set $U$:

$$g : U \longrightarrow U, \qquad (x, y, z) \longmapsto (x - y + z, \ y, \ 2y - z).$$
First we check that indeed this is a well-defined map: If $(x, y, z) \in U$, then $x - y + z > 0$, $y > 0$ and $4(x - y + z)y + (2y - z)^2 = 4xy + z^2$, so $g(x, y, z) \in S$. By $(x - y + z) - y + (2y - z) = x > 0$ we find that indeed $g(x, y, z) \in U$.

Also $g$ is an involution: $g(x, y, z) = (x - y + z, y, 2y - z)$ is mapped by $g$ to $((x - y + z) - y + (2y - z), \ y, \ 2y - (2y - z)) = (x, y, z)$.

And finally: $g$ has exactly one fixed point:

$$(x, y, z) = g(x, y, z) = (x - y + z, \ y, \ 2y - z)$$
holds exactly if $y = z$: But then $p = 4xy + y^2 = (4x + y)y$, which holds only for $y = 1 = z$, and $x = \frac{p-1}{4}$.

But if $g$ is an involution on $U$ that has exactly one fixed point, then the cardinality of $U$ is odd.

3. The third, trivial, involution that we study is the involution on T that interchanges x and y:

$$h : T \longrightarrow T, \qquad (x, y, z) \longmapsto (y, x, z).$$
This map is clearly well-defined, and an involution. We combine now our knowledge derived from the other two involutions: The cardinality of $T$ is equal to the cardinality of $U$, which is odd. But if $h$ is an involution on a finite set of odd cardinality, then it has a fixed point: There is a point $(x, y, z) \in T$ with $x = y$, that is, a solution of
$$p = 4x^2 + z^2 = (2x)^2 + z^2.$$ □

[Margin note: On a finite set of odd cardinality, every involution has at least one fixed point.]
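The three involutions can be followed on concrete primes by enumerating the finite set $S$. The sketch below is an illustration with names of our own choosing (`solution_sets`, `g`, `h`); it enumerates $S$ by brute force, which is only practical for small $p$.

```python
from math import isqrt

def solution_sets(p):
    """Return (S, T, U) from the proof for the equation 4xy + z^2 = p with x, y > 0."""
    S = set()
    for x in range(1, p // 4 + 1):
        for y in range(1, p // (4 * x) + 1):
            rest = p - 4 * x * y          # must equal z^2
            z = isqrt(rest)
            if z > 0 and z * z == rest:
                S.update({(x, y, z), (x, y, -z)})
    T = {(x, y, z) for (x, y, z) in S if z > 0}
    U = {(x, y, z) for (x, y, z) in S if x - y + z > 0}
    return S, T, U

def g(t):
    """Second involution: (x, y, z) -> (x - y + z, y, 2y - z)."""
    x, y, z = t
    return (x - y + z, y, 2 * y - z)

def h(t):
    """Third involution: (x, y, z) -> (y, x, z)."""
    x, y, z = t
    return (y, x, z)

S, T, U = solution_sets(13)
g_fixed = [u for u in U if g(u) == u]   # expected: the single point ((p-1)/4, 1, 1)
h_fixed = [t for t in T if h(t) == t]   # each gives p = (2x)^2 + z^2
```

For $p = 13$ both $T$ and $U$ have three elements, `g_fixed` is `[(3, 1, 1)]`, and `h_fixed` is `[(1, 1, 3)]`, corresponding to $13 = 2^2 + 3^2$.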

Note that this proof yields more: the number of representations of $p$ in the form $p = x^2 + (2y)^2$ is odd for all primes of the form $p = 4m + 1$. (The representation is actually unique, see [3].) Also note that both proofs are not effective: Try to find $x$ and $y$ for a ten digit prime! Efficient ways to find such representations as sums of two squares are discussed in [1] and [7].

The following theorem completely answers the question which started this chapter.

Theorem. A natural number $n$ can be represented as a sum of two squares if and only if every prime factor of the form $p = 4m + 3$ appears with an even exponent in the prime decomposition of $n$.

Proof. Call a number $n$ representable if it is a sum of two squares, that is, if $n = x^2 + y^2$ for some $x, y \in \mathbb{N}_0$. The theorem is a consequence of the following five facts.

(1) $1 = 1^2 + 0^2$ and $2 = 1^2 + 1^2$ are representable. Every prime of the form $p = 4m + 1$ is representable.

(2) The product of any two representable numbers $n_1 = x_1^2 + y_1^2$ and $n_2 = x_2^2 + y_2^2$ is representable: $n_1 n_2 = (x_1 x_2 + y_1 y_2)^2 + (x_1 y_2 - x_2 y_1)^2$.

(3) If $n$ is representable, $n = x^2 + y^2$, then also $nz^2$ is representable, by $nz^2 = (xz)^2 + (yz)^2$.

Facts (1), (2) and (3) together yield the “if” part of the theorem.

(4) If $p = 4m + 3$ is a prime that divides a representable number $n = x^2 + y^2$, then $p$ divides both $x$ and $y$, and thus $p^2$ divides $n$. In fact, if we had $x \not\equiv 0 \pmod{p}$, then we could find $\bar{x}$ such that $x\bar{x} \equiv 1 \pmod{p}$, multiply the equation $x^2 + y^2 \equiv 0$ by $\bar{x}^2$, and thus obtain $1 + y^2\bar{x}^2 = 1 + (\bar{x}y)^2 \equiv 0 \pmod{p}$, which is impossible for $p = 4m + 3$ by Lemma 1.

(5) If $n$ is representable, and $p = 4m + 3$ divides $n$, then $p^2$ divides $n$, and $n/p^2$ is representable. This follows from (4), and completes the proof. □
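The characterization in the theorem can be compared against a direct search for small $n$. This is a minimal sketch with helper names of our own; the trial-division factorization is chosen for brevity, not efficiency.

```python
from math import isqrt

def is_sum_of_two_squares(n):
    """Direct search: is n = x^2 + y^2 for some integers x, y >= 0?"""
    for x in range(isqrt(n) + 1):
        rest = n - x * x
        y = isqrt(rest)
        if y * y == rest:
            return True
    return False

def satisfies_criterion(n):
    """True if every prime factor p = 4m + 3 of n appears with an even exponent."""
    d = 2
    while d * d <= n:
        exponent = 0
        while n % d == 0:
            n //= d
            exponent += 1
        if d % 4 == 3 and exponent % 2 == 1:
            return False
        d += 1
    # what is left is 1 or a single prime with exponent 1
    return n % 4 != 3

mismatches = [n for n in range(1, 2001)
              if is_sum_of_two_squares(n) != satisfies_criterion(n)]
```

The list `mismatches` should be empty; for instance $45 = 3^2 \cdot 5 = 6^2 + 3^2$ is representable, while $21 = 3 \cdot 7$ is not.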

Two remarks close our discussion:

• If $a$ and $b$ are two natural numbers that are relatively prime, then there are infinitely many primes of the form $am + b$ ($m \in \mathbb{N}$); this is a famous (and difficult) theorem of Dirichlet. More precisely, one can show that the number of primes $p \le x$ of the form $p = am + b$ is described very accurately for large $x$ by the function $\frac{1}{\varphi(a)} \frac{x}{\log x}$, where $\varphi(a)$ denotes the number of $b$ with $1 \le b < a$ that are relatively prime to $a$. (This is a substantial refinement of the prime number theorem, which we had discussed on page 10.)
• This means that the primes for fixed $a$ and varying $b$ appear essentially at the same rate. Nevertheless, for example for $a = 4$ one can observe a rather subtle, but still noticeable and persistent tendency towards “more” primes of the form $4m + 3$: If you look for a large random $x$, then chances are that there are more primes $p \le x$ of the form $p = 4m + 3$ than of the form $p = 4m + 1$. This effect is known as “Chebyshev’s bias”; see Riesel [4] and Rubinstein and Sarnak [5].

References

[1] F. W. CLARKE, W. N. EVERITT, L. L. LITTLEJOHN & S. J. R. VORSTER: H. J. S. Smith and the Fermat Two Squares Theorem, Amer. Math. Monthly 106 (1999), 652-665.

[2] D. R. HEATH-BROWN: Fermat’s two squares theorem, Invariant (1984), 2-5.

[3] I. NIVEN & H. S. ZUCKERMAN: An Introduction to the Theory of Numbers, Fifth edition, Wiley, New York 1972.

[4] H. RIESEL: Prime Numbers and Computer Methods for Factorization, Second edition, Progress in Mathematics 126, Birkhäuser, Boston MA 1994.

[5] M. RUBINSTEIN & P. SARNAK: Chebyshev’s bias, Experimental Mathematics 3 (1994), 173-197.

[6] A. THUE: Et par antydninger til en taltheoretisk metode, Kra. Vidensk. Selsk. Forh. 7 (1902), 57-75.

[7] S. WAGON: Editor’s corner: The Euclidean algorithm strikes again, Amer. Math. Monthly 97 (1990), 125-129.

[8] D. ZAGIER: A one-sentence proof that every prime $p \equiv 1 \pmod{4}$ is a sum of two squares, Amer. Math. Monthly 97 (1990), 144.

Chapter 5

The law of quadratic reciprocity

Which famous mathematical theorem has been proved most often? Pythagoras would certainly be a good candidate or the fundamental theorem of algebra, but the champion is without doubt the law of quadratic reciprocity in number theory. In an admirable monograph Franz Lemmermeyer lists as of the year 2000 no fewer than 196 proofs. Many of them are of course only slight variations of others, but the array of different ideas is still impressive, as is the list of contributors. Carl Friedrich Gauss gave the first complete proof in 1801 and followed up with seven more. A little later Ferdinand Gotthold Eisenstein added five more, and the ongoing list of provers reads like a Who is Who of mathematics.

With so many proofs present the question which of them belongs in the Book can have no easy answer. Is it the shortest, the most unexpected, or should one look for the proof that had the greatest potential for generalizations to other and deeper reciprocity laws? We have chosen two proofs (based on Gauss’ third and sixth proofs), of which the first may be the simplest and most pleasing, while the other is the starting point for fundamental results in more general structures.

[Margin figure: Carl Friedrich Gauss]

As in the previous chapter we work “modulo $p$”, where $p$ is an odd prime; $\mathbb{Z}_p$ is the field of residues upon division by $p$, and we usually (but not always) take these residues as $0, 1, \dots, p-1$. Consider some $a \not\equiv 0 \pmod{p}$, that is, $p \nmid a$. We call $a$ a quadratic residue modulo $p$ if $a \equiv b^2 \pmod{p}$ for some $b$, and a quadratic nonresidue otherwise. The quadratic residues are therefore $1^2, 2^2, \dots, (\frac{p-1}{2})^2$, and so there are $\frac{p-1}{2}$ quadratic residues and $\frac{p-1}{2}$ quadratic nonresidues. Indeed, if $i^2 \equiv j^2 \pmod{p}$ with $1 \le i, j \le \frac{p-1}{2}$, then $p \mid i^2 - j^2 = (i-j)(i+j)$. As $2 \le i + j \le p-1$ we have $p \mid i - j$, that is, $i \equiv j \pmod{p}$.

[Margin note: For $p = 13$, the quadratic residues are $1^2 \equiv 1$, $2^2 \equiv 4$, $3^2 \equiv 9$, $4^2 \equiv 3$, $5^2 \equiv 12$, and $6^2 \equiv 10$; the nonresidues are 2, 5, 6, 7, 8, 11.]

At this point it is convenient to introduce the so-called Legendre symbol. Let $a \not\equiv 0 \pmod{p}$, then
$$\left(\frac{a}{p}\right) := \begin{cases} 1 & \text{if } a \text{ is a quadratic residue,} \\ -1 & \text{if } a \text{ is a quadratic nonresidue.} \end{cases}$$

The story begins with Fermat’s “little theorem”: For $a \not\equiv 0 \pmod{p}$,
$$a^{p-1} \equiv 1 \pmod{p}. \qquad (1)$$
In fact, since $\mathbb{Z}_p^* = \mathbb{Z}_p \setminus \{0\}$ is a group with multiplication, the set $\{1a, 2a, 3a, \dots, (p-1)a\}$ runs again through all nonzero residues,
$$(1a)(2a) \cdots ((p-1)a) \equiv 1 \cdot 2 \cdots (p-1) \pmod{p},$$
and hence by dividing by $(p-1)!$, we get $a^{p-1} \equiv 1 \pmod{p}$.

In other words, the polynomial $x^{p-1} - 1 \in \mathbb{Z}_p[x]$ has as roots all nonzero residues. Next we note that

$$x^{p-1} - 1 = \big(x^{\frac{p-1}{2}} - 1\big)\big(x^{\frac{p-1}{2}} + 1\big).$$
Suppose $a \equiv b^2 \pmod{p}$ is a quadratic residue. Then by Fermat’s little theorem $a^{\frac{p-1}{2}} \equiv b^{p-1} \equiv 1 \pmod{p}$. Hence the quadratic residues are precisely the roots of the first factor $x^{\frac{p-1}{2}} - 1$, and the $\frac{p-1}{2}$ nonresidues must therefore be the roots of the second factor $x^{\frac{p-1}{2}} + 1$. Comparing this to the definition of the Legendre symbol, we obtain the following important tool.

Euler’s criterion. For $a \not\equiv 0 \pmod{p}$,
$$\left(\frac{a}{p}\right) \equiv a^{\frac{p-1}{2}} \pmod{p}.$$

[Margin note: For example, for $p = 17$ and $a = 3$ we have $3^8 = (3^4)^2 = 81^2 \equiv (-4)^2 \equiv -1 \pmod{17}$, while for $a = 2$ we get $2^8 = (2^4)^2 \equiv (-1)^2 \equiv 1 \pmod{17}$. Hence 2 is a quadratic residue, while 3 is a nonresidue.]

This gives us at once the important product rule
$$\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right), \qquad (2)$$
since this obviously holds for the right-hand side of Euler’s criterion. The product rule is extremely helpful when one tries to compute Legendre symbols: Since any integer is a product of $\pm 1$ and primes we only have to compute $(\frac{-1}{p})$, $(\frac{2}{p})$, and $(\frac{q}{p})$ for odd primes $q$.

By Euler’s criterion $(\frac{-1}{p}) = 1$ if $p \equiv 1 \pmod{4}$, and $(\frac{-1}{p}) = -1$ if $p \equiv 3 \pmod{4}$, something we have already seen in the previous chapter. The case $(\frac{2}{p})$ will follow from the Lemma of Gauss below: $(\frac{2}{p}) = 1$ if $p \equiv \pm 1 \pmod{8}$, while $(\frac{2}{p}) = -1$ if $p \equiv \pm 3 \pmod{8}$.

Gauss did lots of calculations with quadratic residues and, in particular, studied whether there might be a relation between $q$ being a quadratic residue modulo $p$ and $p$ being a quadratic residue modulo $q$, when $p$ and $q$ are odd primes. After a lot of experimentation he conjectured and then proved the following theorem.

Law of quadratic reciprocity. Let $p$ and $q$ be different odd primes. Then
$$\left(\frac{q}{p}\right)\left(\frac{p}{q}\right) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}.$$

If $p \equiv 1 \pmod{4}$ or $q \equiv 1 \pmod{4}$, then $\frac{p-1}{2}$ (resp. $\frac{q-1}{2}$) is even, and therefore $(-1)^{\frac{p-1}{2}\frac{q-1}{2}} = 1$; thus $(\frac{q}{p}) = (\frac{p}{q})$. When $p \equiv q \equiv 3 \pmod{4}$, we have $(\frac{p}{q}) = -(\frac{q}{p})$. Thus for odd primes we get $(\frac{p}{q}) = (\frac{q}{p})$ unless both $p$ and $q$ are congruent to 3 (mod 4).

[Margin note: Example: $(\frac{3}{17}) = (\frac{17}{3}) = (\frac{2}{3}) = -1$, so 3 is a nonresidue mod 17.]
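Euler’s criterion gives a direct way to evaluate Legendre symbols, and with it the reciprocity law can be spot-checked on small primes. The sketch below is illustrative; `legendre` and `reciprocity_holds` are names we introduce here, and a finite check is of course no substitute for the proofs that follow.

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's criterion; p is an odd prime not dividing a."""
    r = pow(a, (p - 1) // 2, p)   # r is 1 or p - 1
    return -1 if r == p - 1 else r

def reciprocity_holds(p, q):
    """Check (q/p)(p/q) == (-1)^(((p-1)/2) * ((q-1)/2)) for distinct odd primes p, q."""
    lhs = legendre(q, p) * legendre(p, q)
    rhs = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    return lhs == rhs

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
all_ok = all(reciprocity_holds(p, q)
             for p in odd_primes for q in odd_primes if p != q)
residues_mod_13 = [a for a in range(1, 13) if legendre(a, 13) == 1]
```

The values `legendre(3, 17)` and `legendre(2, 17)` reproduce the margin example, and `residues_mod_13` matches the list of quadratic residues given for $p = 13$.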

First proof. The key to our first proof (which is Gauss’ third) is a counting formula that soon came to be called the Lemma of Gauss:

Lemma of Gauss. Suppose $a \not\equiv 0 \pmod{p}$. Take the numbers $1a, 2a, \dots, \frac{p-1}{2}a$ and reduce them modulo $p$ to the residue system smallest in absolute value, $ia \equiv r_i \pmod{p}$ with $-\frac{p-1}{2} \le r_i \le \frac{p-1}{2}$ for all $i$. Then
$$\left(\frac{a}{p}\right) = (-1)^s, \quad \text{where } s = \#\{i : r_i < 0\}.$$

Proof. Suppose $u_1, \dots, u_s$ are the residues smaller than 0, and that $v_1, \dots, v_{\frac{p-1}{2}-s}$ are those greater than 0. Then the numbers $-u_1, \dots, -u_s$ are between 1 and $\frac{p-1}{2}$, and are all different from the $v_j$s (see the margin); hence $\{-u_1, \dots, -u_s, v_1, \dots, v_{\frac{p-1}{2}-s}\} = \{1, 2, \dots, \frac{p-1}{2}\}$. Therefore
$$\prod_i (-u_i) \prod_j v_j = \frac{p-1}{2}!,$$
which implies

[Margin note: If $-u_i = v_j$, then $u_i + v_j \equiv 0 \pmod{p}$. Now $u_i \equiv ka$, $v_j \equiv \ell a \pmod{p}$ implies $p \mid (k + \ell)a$. As $p$ and $a$ are relatively prime, $p$ must divide $k + \ell$ which is impossible, since $k + \ell \le p - 1$.]

$$(-1)^s \prod_i u_i \prod_j v_j \equiv \frac{p-1}{2}! \pmod{p}.$$
Now remember how we obtained the numbers $u_i$ and $v_j$; they are the residues of $1a, \dots, \frac{p-1}{2}a$. Hence
$$\frac{p-1}{2}! \equiv (-1)^s \prod_i u_i \prod_j v_j \equiv (-1)^s\, \frac{p-1}{2}!\; a^{\frac{p-1}{2}} \pmod{p}.$$
Cancelling $\frac{p-1}{2}!$ together with Euler’s criterion gives

$$\left(\frac{a}{p}\right) \equiv a^{\frac{p-1}{2}} \equiv (-1)^s \pmod{p},$$
and therefore $(\frac{a}{p}) = (-1)^s$, since $p$ is odd. □

With this we can easily compute $(\frac{2}{p})$: Since $1 \cdot 2, 2 \cdot 2, \dots, \frac{p-1}{2} \cdot 2$ are all between 1 and $p-1$, we have
$$s = \#\{i : \tfrac{p-1}{2} < 2i \le p-1\} = \tfrac{p-1}{2} - \#\{i : 2i \le \tfrac{p-1}{2}\} = \Big\lceil \frac{p-1}{4} \Big\rceil.$$
Check that $s$ is even precisely for $p = 8k \pm 1$.

The Lemma of Gauss is the basis for many of the published proofs of the quadratic reciprocity law. The most elegant may be the one suggested by Ferdinand Gotthold Eisenstein, who had learned number theory from Gauss’ famous Disquisitiones Arithmeticae and made important contributions to “higher reciprocity theorems” before his premature death at age 29. His proof is just counting lattice points!
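The Lemma of Gauss stated above is itself a small algorithm. The following sketch (the function name `gauss_lemma_sign` is ours) counts the negative residues $r_i$ and compares the resulting sign with Euler’s criterion and with the rule for $(\frac{2}{p})$.

```python
def gauss_lemma_sign(a, p):
    """Return (-1)^s, where s counts the i in 1..(p-1)/2 with i*a reducing to a negative residue."""
    half = (p - 1) // 2
    s = 0
    for i in range(1, half + 1):
        r = (i * a) % p
        if r > half:      # the representative smallest in absolute value is r - p < 0
            s += 1
    return (-1) ** s

def euler_sign(a, p):
    """Legendre symbol via Euler's criterion, for comparison."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
agree = all(gauss_lemma_sign(a, p) == euler_sign(a, p)
            for p in primes for a in range(1, p))
two_rule = all(gauss_lemma_sign(2, p) == (1 if p % 8 in (1, 7) else -1)
               for p in primes)
```

Both `agree` and `two_rule` should evaluate to `True` on this list of primes.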

Let $p$ and $q$ be odd primes, and consider $(\frac{q}{p})$. Suppose $iq$ is a multiple of $q$ that reduces to a negative residue $r_i < 0$ in the Lemma of Gauss. This means that there is a unique integer $j$ such that $-\frac{p}{2} < iq - jp < 0$. Note that $0 < j < \frac{q}{2}$ since $0 < i < \frac{p}{2}$. In other words, $(\frac{q}{p}) = (-1)^s$, where $s$ is the number of lattice points $(x, y)$, that is, pairs of integers $x, y$ satisfying
$$0 < py - qx < \frac{p}{2}, \qquad 0 < x < \frac{p}{2}, \qquad 0 < y < \frac{q}{2}. \qquad (3)$$
Similarly, $(\frac{p}{q}) = (-1)^t$ where $t$ is the number of lattice points $(x, y)$ with
$$0 < qx - py < \frac{q}{2}, \qquad 0 < x < \frac{p}{2}, \qquad 0 < y < \frac{q}{2}. \qquad (4)$$
Now look at the rectangle with side lengths $\frac{p}{2}$, $\frac{q}{2}$, and draw the two lines parallel to the diagonal $py = qx$: $y = \frac{q}{p}x + \frac{1}{2}$ or $py - qx = \frac{p}{2}$, respectively, $y = \frac{q}{p}(x - \frac{1}{2})$ or $qx - py = \frac{q}{2}$.

The figure shows the situation for $p = 17$, $q = 11$.

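[Figure: the rectangle with side lengths $\frac{17}{2}$ and $\frac{11}{2}$ for $p = 17$, $q = 11$, with the outer regions $R$ and $S$. Here $s = 5$ and $t = 3$, so $(\frac{q}{p}) = (-1)^5 = -1$ and $(\frac{p}{q}) = (-1)^3 = -1$.]

The proof is now quickly completed by the following three observations: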

1. There are no lattice points on the diagonal and the two parallels. This is so because $py = qx$ would imply $p \mid x$, which cannot be. For the parallels observe that $py - qx$ is an integer while $\frac{p}{2}$ or $\frac{q}{2}$ are not.

2. The lattice points observing (3) are precisely the points in the upper strip $0 < py - qx < \frac{p}{2}$, and those of (4) the points in the lower strip $0 < qx - py < \frac{q}{2}$. Hence the number of lattice points in the two strips is $s + t$.

3. The outer regions $R : py - qx > \frac{p}{2}$ and $S : qx - py > \frac{q}{2}$ contain the same number of points. To see this consider the map $\varphi : R \to S$ which maps $(x, y)$ to $(\frac{p+1}{2} - x, \frac{q+1}{2} - y)$ and check that $\varphi$ is an involution.

Since the total number of lattice points in the rectangle is $\frac{p-1}{2} \cdot \frac{q-1}{2}$, we infer that $s + t$ and $\frac{p-1}{2} \cdot \frac{q-1}{2}$ have the same parity, and so
$$\left(\frac{q}{p}\right)\left(\frac{p}{q}\right) = (-1)^{s+t} = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}.$$ □
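The lattice-point count in this proof is easy to reproduce. The sketch below uses a function name of our own (`strip_counts`) and integer arithmetic only: the conditions $0 < py - qx < \frac{p}{2}$ and $0 < qx - py < \frac{q}{2}$ are tested after multiplying by 2.

```python
def strip_counts(p, q):
    """Count lattice points in the upper strip (3) and the lower strip (4) for odd primes p != q."""
    s = t = 0
    for x in range(1, (p - 1) // 2 + 1):
        for y in range(1, (q - 1) // 2 + 1):
            d = p * y - q * x
            if 0 < 2 * d < p:        # 0 < py - qx < p/2
                s += 1
            elif 0 < -2 * d < q:     # 0 < qx - py < q/2
                t += 1
    return s, t

s, t = strip_counts(17, 11)
total = ((17 - 1) // 2) * ((11 - 1) // 2)
same_parity = (s + t) % 2 == total % 2
```

For $p = 17$, $q = 11$ this yields $s = 5$ and $t = 3$, as in the figure, and $s + t = 8$ has the same parity as the 40 lattice points in the rectangle.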

Second proof. Our second choice does not use Gauss’ lemma, instead it employs so-called “Gauss sums” in finite fields. Gauss invented them in his study of the equation $x^p - 1 = 0$ and the arithmetical properties of the field $\mathbb{Q}(\zeta)$ (called cyclotomic field), where $\zeta$ is a $p$-th root of unity. They have been the starting point for the search for higher reciprocity laws in general number fields.

Let us first collect a few facts about finite fields.

A. Let $p$ and $q$ be different odd primes, and consider the finite field $F$ with $q^{p-1}$ elements. Its prime field is $\mathbb{Z}_q$, whence $qa = 0$ for any $a \in F$. This implies that $(a + b)^q = a^q + b^q$, since any binomial coefficient $\binom{q}{i}$ is a multiple of $q$ for $0 < i < q$, and thus 0 in $F$. Note that Euler’s criterion is an equation $(\frac{p}{q}) = p^{\frac{q-1}{2}}$ in the prime field $\mathbb{Z}_q$.

B. The multiplicative group $F^* = F \setminus \{0\}$ is cyclic of size $q^{p-1} - 1$ (see the box on the next page). Since by Fermat’s little theorem $p$ is a divisor of $q^{p-1} - 1$, there exists an element $\zeta \in F$ of order $p$, that is, $\zeta^p = 1$, and $\zeta$ generates the subgroup $\{\zeta, \zeta^2, \dots, \zeta^p = 1\}$ of $F^*$. Note that any $\zeta^i$ ($i \ne p$) is again a generator. Hence we obtain the polynomial decomposition $x^p - 1 = (x - \zeta)(x - \zeta^2) \cdots (x - \zeta^p)$.

Now we can go to work. Consider the Gauss sum

$$G := \sum_{i=1}^{p-1} \left(\frac{i}{p}\right) \zeta^i \in F,$$
where $(\frac{i}{p})$ is the Legendre symbol. For the proof we derive two different expressions for $G^q$ and then set them equal.

First expression. We have

$$G^q = \sum_{i=1}^{p-1} \left(\frac{i}{p}\right)^{q} \zeta^{iq} = \sum_{i=1}^{p-1} \left(\frac{i}{p}\right) \zeta^{iq} = \left(\frac{q}{p}\right) \sum_{i=1}^{p-1} \left(\frac{iq}{p}\right) \zeta^{iq} = \left(\frac{q}{p}\right) G, \qquad (5)$$
where the first equality follows from $(a + b)^q = a^q + b^q$, the second uses that $(\frac{i}{p})^q = (\frac{i}{p})$ since $q$ is odd, the third one is derived from (2), which yields $(\frac{i}{p}) = (\frac{q}{p})(\frac{iq}{p})$, and the last one holds since $iq$ runs with $i$ through all nonzero residues modulo $p$.

[Margin note: Example: Take $p = 3$, $q = 5$. Then $G = \zeta - \zeta^2$ and $G^5 = \zeta^5 - \zeta^{10} = \zeta^2 - \zeta = -(\zeta - \zeta^2) = -G$, corresponding to $(\frac{5}{3}) = (\frac{2}{3}) = -1$.]

Second expression. Suppose we can prove

$$G^2 = (-1)^{\frac{p-1}{2}}\, p, \qquad (6)$$
then we are quickly done. Indeed,

$$G^q = G\,(G^2)^{\frac{q-1}{2}} = G\,(-1)^{\frac{p-1}{2}\frac{q-1}{2}}\, p^{\frac{q-1}{2}} = G \left(\frac{p}{q}\right) (-1)^{\frac{p-1}{2}\frac{q-1}{2}}. \qquad (7)$$
Equating the expressions in (5) and (7) and cancelling $G$, which is nonzero by (6), we find $(\frac{q}{p}) = (\frac{p}{q})(-1)^{\frac{p-1}{2}\frac{q-1}{2}}$, and thus
$$\left(\frac{q}{p}\right)\left(\frac{p}{q}\right) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}.$$

The multiplicative group of a finite field is cyclic

Let $F^*$ be the multiplicative group of the field $F$, with $|F^*| = n$. Writing $\mathrm{ord}(a)$ for the order of an element, that is, the smallest positive integer $k$ such that $a^k = 1$, we want to find an element $a \in F^*$ with $\mathrm{ord}(a) = n$. If $\mathrm{ord}(b) = d$, then by Lagrange’s theorem, $d$ divides $n$ (see the margin on page 4). Classifying the elements according to their order, we have

$$n = \sum_{d \mid n} \psi(d), \quad \text{where } \psi(d) = \#\{b \in F^* : \mathrm{ord}(b) = d\}. \qquad (8)$$
If $\mathrm{ord}(b) = d$, then every element $b^i$ ($i = 1, \dots, d$) satisfies $(b^i)^d = 1$ and is therefore a root of the polynomial $x^d - 1$. But, since $F$ is a field, $x^d - 1$ has at most $d$ roots, and so the elements $b, b^2, \dots, b^d = 1$ are precisely these roots. In particular, every element of order $d$ is of the form $b^i$.

On the other hand, it is easily checked that $\mathrm{ord}(b^i) = \frac{d}{(i,d)}$, where $(i, d)$ denotes the greatest common divisor of $i$ and $d$. Hence $\mathrm{ord}(b^i) = d$ if and only if $(i, d) = 1$, that is, if $i$ and $d$ are relatively prime. Denoting Euler’s function by $\varphi(d) = \#\{i : 1 \le i \le d, \ (i, d) = 1\}$, we thus have $\psi(d) = \varphi(d)$ whenever $\psi(d) > 0$. Looking at (8) we find

$$n = \sum_{d \mid n} \psi(d) \le \sum_{d \mid n} \varphi(d).$$
But, as we are going to show that

$$\sum_{d \mid n} \varphi(d) = n, \qquad (9)$$
we must have $\psi(d) = \varphi(d)$ for all $d$. In particular, $\psi(n) = \varphi(n) \ge 1$, and so there is an element of order $n$.

[Cartoon caption: “Even in total chaos we can hang on to the cyclic group”]

The following (folklore) proof of (9) belongs in the Book as well. Consider the $n$ fractions
$$\frac{1}{n}, \frac{2}{n}, \dots, \frac{k}{n}, \dots, \frac{n}{n},$$

reduce them to lowest terms $\frac{k}{n} = \frac{i}{d}$ with $1 \le i \le d$, $(i, d) = 1$, $d \mid n$, and check that the denominator $d$ appears precisely $\varphi(d)$ times.
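The folklore proof of (9) can be replayed literally with exact fractions. In this sketch (helper names are ours) the fractions $\frac{k}{n}$ are reduced by Python’s `fractions.Fraction`, and the number of times each denominator $d$ occurs is compared with $\varphi(d)$.

```python
from collections import Counter
from fractions import Fraction
from math import gcd

def denominator_counts(n):
    """Reduce 1/n, 2/n, ..., n/n to lowest terms and count how often each denominator occurs."""
    return dict(Counter(Fraction(k, n).denominator for k in range(1, n + 1)))

def phi(d):
    """Euler's function: the number of i in 1..d with gcd(i, d) = 1."""
    return sum(1 for i in range(1, d + 1) if gcd(i, d) == 1)

counts_12 = denominator_counts(12)
phi_by_divisor_12 = {d: phi(d) for d in range(1, 13) if 12 % d == 0}
```

For $n = 12$ both dictionaries agree, and their values add up to 12.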

It remains to verify (6), and for this we first make two simple observations:

• $\sum_{i=1}^{p} \zeta^i = 0$ and thus $\sum_{i=1}^{p-1} \zeta^i = -1$. Just note that $-\sum_{i=1}^{p} \zeta^i$ is the coefficient of $x^{p-1}$ in $x^p - 1 = \prod_{i=1}^{p} (x - \zeta^i)$, and thus 0.
• $\sum_{k=1}^{p-1} (\frac{k}{p}) = 0$ and thus $\sum_{k=1}^{p-2} (\frac{k}{p}) = -(\frac{-1}{p})$, since there are equally many quadratic residues and nonresidues.

We have

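$$G^2 = \Big( \sum_{i=1}^{p-1} \left(\frac{i}{p}\right) \zeta^i \Big) \Big( \sum_{j=1}^{p-1} \left(\frac{j}{p}\right) \zeta^j \Big) = \sum_{i,j} \left(\frac{ij}{p}\right) \zeta^{i+j}.$$
Setting $j \equiv ik \pmod{p}$ we find
$$G^2 = \sum_{i,k} \left(\frac{k}{p}\right) \zeta^{i(1+k)} = \sum_{k=1}^{p-1} \left(\frac{k}{p}\right) \sum_{i=1}^{p-1} \zeta^{(1+k)i}.$$
For $k = p - 1 \equiv -1 \pmod{p}$ this gives $(\frac{-1}{p})(p - 1)$, since $\zeta^{1+k} = 1$. Move $k = p - 1$ in front and write
$$G^2 = \left(\frac{-1}{p}\right)(p - 1) + \sum_{k=1}^{p-2} \left(\frac{k}{p}\right) \sum_{i=1}^{p-1} \zeta^{(1+k)i}.$$
Since $\zeta^{1+k}$ is a generator of the group for $k \ne p - 1$, the inner sum equals $\sum_{i=1}^{p-1} \zeta^i = -1$ for all $k \ne p - 1$ by our first observation. Hence the second summand is $-\sum_{k=1}^{p-2} (\frac{k}{p}) = (\frac{-1}{p})$ by our second observation. It follows that $G^2 = (\frac{-1}{p})\, p$ and thus with Euler’s criterion $G^2 = (-1)^{\frac{p-1}{2}} p$, which completes the proof. □

[Margin note: Euler’s criterion: $(\frac{-1}{p}) = (-1)^{\frac{p-1}{2}}$.]

[Margin note: For $p = 3$, $q = 5$, $G^2 = (\zeta - \zeta^2)^2 = \zeta^2 - 2\zeta^3 + \zeta^4 = \zeta^2 - 2 + \zeta = -3 = (-1)^{\frac{3-1}{2}}\, 3$, since $1 + \zeta + \zeta^2 = 0$.]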
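Identity (6) can be illustrated numerically, with one important change of setting that is our assumption and not the text’s: instead of the finite field $F$, the sketch below takes $\zeta = e^{2\pi i/p}$ in the complex numbers, where the same computation of $G^2$ goes through. Floating-point arithmetic is used, so the comparison is only up to a small tolerance.

```python
import cmath

def gauss_sum(p):
    """Gauss sum over the complex p-th root of unity zeta = exp(2*pi*i/p), p an odd prime."""
    zeta = cmath.exp(2j * cmath.pi / p)
    total = 0
    for i in range(1, p):
        r = pow(i, (p - 1) // 2, p)          # Euler's criterion
        symbol = -1 if r == p - 1 else r     # Legendre symbol (i/p)
        total += symbol * zeta ** i
    return total

def defect(p):
    """Distance between G^2 and (-1)^((p-1)/2) * p; should be close to 0."""
    return abs(gauss_sum(p) ** 2 - (-1) ** ((p - 1) // 2) * p)

defects = {p: defect(p) for p in (3, 5, 7, 11, 13, 17)}
```

For $p = 3$ this gives $G = \zeta - \zeta^2 = i\sqrt{3}$ and $G^2 = -3$, matching the margin computation.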

References

[1] A. BAKER: A Concise Introduction to the Theory of Numbers, Cambridge University Press, Cambridge 1984.

[2] F. G. EISENSTEIN: Geometrischer Beweis des Fundamentaltheorems für die quadratischen Reste, J. Reine Angewandte Mathematik 28 (1844), 186-191.

[3] C. F. GAUSS: Theorema arithmetici demonstratio nova, Comment. Soc. regiae sci. Göttingen XVI (1808), 69; Werke II, 1-8 (contains the 3rd proof).

[4] C. F. GAUSS: Theorematis fundamentalis in doctrina de residuis quadraticis demonstrationes et amplicationes novae (1818), Werke II, 47-64 (contains the 6th proof).

[5] F. LEMMERMEYER: Reciprocity Laws, Springer-Verlag, Berlin 2000.

[Cartoon: “What’s up?” “I’m pushing 196 proofs for quadratic reciprocity”]

Every finite division ring is a field Chapter 6

Rings are important structures in modern algebra. If a ring R has a multiplicative unit element 1 and every nonzero element has a multiplicative inverse, then R is called a division ring. So, all that is missing in R from being a field is the commutativity of multiplication. The best-known example of a non-commutative division ring is the ring of quaternions discovered by Hamilton. But, as the chapter title says, every such division ring must of necessity be infinite. If R is finite, then the axioms force the multiplication to be commutative. This result, which is now a classic, has caught the imagination of many mathematicians, because, as Herstein writes: “It is so unexpectedly interrelating two seemingly unrelated things, the number of elements in a certain algebraic system and the multiplication of that system.”

Theorem. Every finite division ring is commutative.

[Portrait: Ernst Witt]

This beautiful theorem, which is usually attributed to MacLagan Wedderburn, has been proved by many people using a variety of different ideas. Wedderburn himself gave three proofs in 1905, and another proof was given by Leonard E. Dickson in the same year. More proofs were later given by Emil Artin, Hans Zassenhaus, Nicolas Bourbaki, and many others. One proof stands out for its simplicity and elegance. It was found by Ernst Witt in 1931 and combines two elementary ideas towards a glorious finish.

Proof. Our first ingredient comes from a blend of linear algebra and basic group theory. For an arbitrary element $s \in R$, let $C_s$ be the set $\{x \in R : xs = sx\}$ of elements which commute with $s$; $C_s$ is called the centralizer of $s$. Clearly, $C_s$ contains 0 and 1 and is a sub-division ring of $R$. The center $Z$ is the set of elements which commute with all elements of $R$, thus $Z = \bigcap_{s \in R} C_s$. In particular, all elements of $Z$ commute, 0 and 1 are in $Z$, and so $Z$ is a finite field. Let us set $|Z| = q$.

We can regard $R$ and $C_s$ as vector spaces over the field $Z$ and deduce that $|R| = q^n$, where $n$ is the dimension of the vector space $R$ over $Z$, and similarly $|C_s| = q^{n_s}$ for suitable integers $n_s \ge 1$.

Now let us assume that $R$ is not a field. This means that for some $s \in R$ the centralizer $C_s$ is not all of $R$, or, what is the same, $n_s < n$. On the set $R^* := R \setminus \{0\}$ we consider the relation

$$r' \sim r \;:\Longleftrightarrow\; r' = x^{-1} r x \ \text{ for some } x \in R^*.$$

It is easy to check that $\sim$ is an equivalence relation. Let

$$A_s := \{x^{-1} s x : x \in R^*\}$$

be the equivalence class containing $s$. We note that $|A_s| = 1$ precisely when $s$ is in the center $Z$. So by our assumption, there are classes $A_s$ with $|A_s| \ge 2$. Consider now for $s \in R^*$ the map $f_s : x \mapsto x^{-1} s x$ from $R^*$ onto $A_s$. For $x, y \in R^*$ we find

$$x^{-1} s x = y^{-1} s y \iff (y x^{-1}) s = s (y x^{-1}) \iff y x^{-1} \in C_s^* \iff y \in C_s^* x,$$

for $C_s^* := C_s \setminus \{0\}$, where $C_s^* x = \{zx : z \in C_s^*\}$ has size $|C_s^*|$. Hence any element $x^{-1} s x$ is the image of precisely $|C_s^*| = q^{n_s} - 1$ elements in $R^*$ under the map $f_s$, and we deduce $|R^*| = |A_s|\,|C_s^*|$. In particular, we note that

$$\frac{|R^*|}{|C_s^*|} = \frac{q^n - 1}{q^{n_s} - 1} = |A_s| \ \text{ is an integer for all } s.$$

We know that the equivalence classes partition $R^*$. We now group the central elements $Z^*$ together and denote by $A_1, \dots, A_t$ the equivalence classes containing more than one element. By our assumption we know $t \ge 1$. Since $|R^*| = |Z^*| + \sum_{k=1}^{t} |A_k|$, we have proved the so-called class formula

$$q^n - 1 = q - 1 + \sum_{k=1}^{t} \frac{q^n - 1}{q^{n_k} - 1}, \tag{1}$$

where we have $1 < \frac{q^n - 1}{q^{n_k} - 1} \in \mathbb{N}$ for all $k$.

With (1) we have left abstract algebra and are back to the natural numbers. Next we claim that $q^{n_k} - 1 \mid q^n - 1$ implies $n_k \mid n$. Indeed, write $n = a n_k + r$ with $0 \le r < n_k$, then $q^{n_k} - 1 \mid q^{a n_k + r} - 1$ implies

$$q^{n_k} - 1 \mid (q^{a n_k + r} - 1) - (q^{n_k} - 1) = q^{n_k} (q^{(a-1) n_k + r} - 1),$$

and thus $q^{n_k} - 1 \mid q^{(a-1) n_k + r} - 1$, since $q^{n_k}$ and $q^{n_k} - 1$ are relatively prime. Continuing in this way we find $q^{n_k} - 1 \mid q^r - 1$ with $0 \le r < n_k$, which is only possible for $r = 0$, that is, $n_k \mid n$. In summary, we note

$$n_k \mid n \ \text{ for all } k. \tag{2}$$

Now comes the second ingredient: the complex numbers $\mathbb{C}$. Consider the polynomial $x^n - 1$. Its roots in $\mathbb{C}$ are called the n-th roots of unity. Since $\lambda^n = 1$, all these roots $\lambda$ have $|\lambda| = 1$ and lie therefore on the unit circle of the complex plane. In fact, they are precisely the numbers $\lambda_k = e^{\frac{2k\pi i}{n}} = \cos(2k\pi/n) + i \sin(2k\pi/n)$, $0 \le k \le n-1$ (see the box on the next page). Some of the roots $\lambda$ satisfy $\lambda^d = 1$ for $d < n$; for example, the root $\lambda = -1$ satisfies $\lambda^2 = 1$. For a root $\lambda$, let $d$ be the smallest positive exponent with $\lambda^d = 1$, that is, $d$ is the order of $\lambda$ in the group of the roots of unity.

Then $d \mid n$, by Lagrange’s theorem (“the order of every element of a group divides the order of the group”; see the box in Chapter 1). Note that there are roots of order $n$, such as $\lambda_1 = e^{\frac{2\pi i}{n}}$.
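The arithmetic claim behind (2), that $q^{m} - 1$ divides $q^{n} - 1$ only when $m$ divides $n$, is easy to test by brute force. This is just a sketch with a hypothetical helper name; it also checks the easy converse direction, which the text does not need.

```python
def divisibility_matches(q, n_max):
    """True if for all 1 <= m <= n <= n_max:
    (q**m - 1) divides (q**n - 1) exactly when m divides n."""
    for n in range(1, n_max + 1):
        for m in range(1, n + 1):
            power_divides = (q**n - 1) % (q**m - 1) == 0
            if power_divides != (n % m == 0):
                return False
    return True

# q = |Z| is a prime power >= 2 in the proof; any integer q >= 2 works here
assert divisibility_matches(2, 30)
assert divisibility_matches(3, 20)
```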

Roots of unity

Any complex number $z = x + iy$ may be written in the “polar” form

$$z = r e^{i\varphi} = r(\cos\varphi + i \sin\varphi),$$

where $r = |z| = \sqrt{x^2 + y^2}$ is the distance of $z$ to the origin, and $\varphi$ is the angle measured from the positive x-axis. The n-th roots of unity are therefore of the form

$$\lambda_k = e^{\frac{2k\pi i}{n}} = \cos(2k\pi/n) + i \sin(2k\pi/n), \quad 0 \le k \le n-1,$$

since for all $k$

$$\lambda_k^n = e^{2k\pi i} = \cos(2k\pi) + i \sin(2k\pi) = 1.$$

We obtain these roots geometrically by inscribing a regular n-gon into the unit circle. Note that $\lambda_k = \zeta^k$ for all $k$, where $\zeta = e^{\frac{2\pi i}{n}}$. Thus the n-th roots of unity form a cyclic group $\{\zeta, \zeta^2, \dots, \zeta^{n-1}, \zeta^n = 1\}$ of order $n$.

[Figures: the polar form $z = re^{i\varphi}$ with $x = r\cos\varphi$, $y = r\sin\varphi$, $r = |z|$; the roots of unity for $n = 6$, with $\lambda_1 = \zeta$ and $\lambda_2$ marked.]

Now we group all roots of order d together and set

$$\phi_d(x) := \prod_{\lambda \text{ of order } d} (x - \lambda).$$

Note that the definition of $\phi_d(x)$ is independent of $n$. Since every root has some order $d$, we conclude that

$$x^n - 1 = \prod_{d \mid n} \phi_d(x). \tag{3}$$

Here is the crucial observation: The coefficients of the polynomials $\phi_n(x)$ are integers (that is, $\phi_n(x) \in \mathbb{Z}[x]$ for all $n$), where in addition the constant coefficient is either 1 or −1.

Let us carefully verify this claim. For $n = 1$ we have 1 as the only root, and so $\phi_1(x) = x - 1$. Now we proceed by induction, where we assume $\phi_d(x) \in \mathbb{Z}[x]$ for all $d < n$, and that the constant coefficient of $\phi_d(x)$ is 1 or −1. By (3),

$$x^n - 1 = p(x)\, \phi_n(x) \tag{4}$$

where $p(x) = \sum_{j=0}^{\ell} p_j x^j$, $\phi_n(x) = \sum_{k=0}^{n-\ell} a_k x^k$, with $p_0 = 1$ or $p_0 = -1$.

Since $-1 = p_0 a_0$, we see $a_0 \in \{1, -1\}$. Suppose we already know that $a_0, a_1, \dots, a_{k-1} \in \mathbb{Z}$. Computing the coefficient of $x^k$ on both sides of (4) we find

$$\sum_{j=0}^{k} p_j a_{k-j} = \sum_{j=1}^{k} p_j a_{k-j} + p_0 a_k \in \mathbb{Z}.$$

By assumption, all $a_0, \dots, a_{k-1}$ (and all $p_j$) are in $\mathbb{Z}$. Thus $p_0 a_k$ and hence $a_k$ must also be integers, since $p_0$ is 1 or −1.

We are ready for the coup de grâce. Let $n_k \mid n$ be one of the numbers appearing in (1). Then

$$x^n - 1 = \prod_{d \mid n} \phi_d(x) = (x^{n_k} - 1)\, \phi_n(x) \prod_{d \mid n,\ d \nmid n_k,\ d \ne n} \phi_d(x).$$

We conclude that in $\mathbb{Z}$ we have the divisibility relations

$$\phi_n(q) \mid q^n - 1 \quad\text{and}\quad \phi_n(q) \,\Big|\, \frac{q^n - 1}{q^{n_k} - 1}. \tag{5}$$

Since (5) holds for all $k$, we deduce from the class formula (1)

$$\phi_n(q) \mid q - 1,$$

but this cannot be. Why? We know $\phi_n(x) = \prod (x - \lambda)$ where $\lambda$ runs through all roots of $x^n - 1$ of order $n$. Let $\tilde\lambda = a + ib$ be one of those roots. By $n > 1$ (because of $R \ne Z$) we have $\tilde\lambda \ne 1$, which implies that the real part $a$ is smaller than 1. Now $|\tilde\lambda|^2 = a^2 + b^2 = 1$, and hence

$$|q - \tilde\lambda|^2 = |q - a - ib|^2 = (q - a)^2 + b^2 = q^2 - 2aq + a^2 + b^2 = q^2 - 2aq + 1 > q^2 - 2q + 1 = (q - 1)^2$$

(because of $a < 1$), and so $|q - \tilde\lambda| > q - 1$ holds for all roots of order $n$.

[Margin figure: a root $\mu$ on the unit circle and the point $q$ on the real axis, illustrating $|q - \mu| > |q - 1|$.]

This implies

$$|\phi_n(q)| = \prod_{\lambda} |q - \lambda| > q - 1,$$

which means that $\phi_n(q)$ cannot be a divisor of $q - 1$, contradiction and end of proof. □
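The two facts about $\phi_n$ that drive the proof (integer coefficients with constant term $\pm 1$, and $|\phi_n(q)| > q - 1$) can be observed on small cases. The sketch below is not Witt's argument, only a check: it computes $\phi_n$ from (3) by exact integer polynomial division. The helper names are invented for this example, and polynomials are stored as coefficient tuples, lowest degree first.

```python
from functools import lru_cache

def divide_exact(num, den):
    """Divide integer polynomials (coefficients, lowest degree first).
    Assumes den is monic and divides num exactly, so no fractions arise."""
    rem = list(num)
    quot = [0] * (len(rem) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        quot[i] = rem[i + len(den) - 1]  # den is monic
        for j, c in enumerate(den):
            rem[i + j] -= quot[i] * c
    if any(rem):
        raise ValueError("division was not exact")
    return tuple(quot)

@lru_cache(maxsize=None)
def cyclotomic(n):
    """phi_n(x) via (3): divide x^n - 1 by phi_d(x) for every proper divisor d of n."""
    poly = (-1,) + (0,) * (n - 1) + (1,)  # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = divide_exact(poly, cyclotomic(d))
    return poly

def evaluate(poly, x):
    return sum(c * x**k for k, c in enumerate(poly))

assert cyclotomic(6) == (1, -1, 1)       # x^2 - x + 1
assert evaluate(cyclotomic(6), 2) > 2 - 1
```

Because the division never leaves the integers, a successful run is itself a small-scale confirmation of the induction step.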

References

[1] L. E. DICKSON: On finite algebras, Nachrichten der Akad. Wissenschaften Göttingen Math.-Phys. Klasse (1905), 1-36; Collected Mathematical Papers Vol. III, Chelsea Publ. Comp, The Bronx, NY 1975, 539-574.

[2] J. H. M. WEDDERBURN: A theorem on finite algebras, Trans. Amer. Math. Soc. 6 (1905), 349-352.

[3] E. WITT: Über die Kommutativität endlicher Schiefkörper, Abh. Math. Sem. Univ. Hamburg 8 (1931), 413.

Some irrational numbers Chapter 7

“π is irrational”

This was already conjectured by Aristotle, when he claimed that diameter and circumference of a circle are not commensurable. The first proof of this fundamental fact was given by Johann Heinrich Lambert in 1766. Our Book Proof is due to Ivan Niven, 1947: an extremely elegant one-page proof that needs only elementary calculus. Its idea is powerful, and quite a bit more can be derived from it, as was shown by Iwamoto and Koksma, respectively:

• $\pi^2$ is irrational and
• $e^r$ is irrational for rational $r \ne 0$.

Niven’s method does, however, have its roots and predecessors: It can be traced back to the classical paper by Charles Hermite from 1873 which first established that $e$ is transcendental, that is, that $e$ is not a zero of a polynomial with rational coefficients.

[Portrait: Charles Hermite]

Before we treat $\pi$ we will look at $e$ and its powers, and see that these are irrational. This is much easier, and we thus also follow the historical order in the development of the results.

[Margin notes: $e := 1 + \frac{1}{1} + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \dots = 2.718281828\ldots$ and $e^x := 1 + \frac{x}{1} + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \dots = \sum_{k \ge 0} \frac{x^k}{k!}$]

To start with, it is rather easy to see (as did Fourier in 1815) that $e = \sum_{k \ge 0} \frac{1}{k!}$ is irrational. Indeed, if we had $e = \frac{a}{b}$ for integers $a$ and $b > 0$, then we would get

$$n!\, b e = n!\, a$$

for every $n \ge 0$. But this cannot be true, because on the right-hand side we have an integer, while the left-hand side with

$$e = \Big(1 + \frac{1}{1!} + \frac{1}{2!} + \dots + \frac{1}{n!}\Big) + \Big(\frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \frac{1}{(n+3)!} + \dots\Big)$$

decomposes into an integral part

$$b\, n! \Big(1 + \frac{1}{1!} + \frac{1}{2!} + \dots + \frac{1}{n!}\Big)$$

and a second part

$$b \Big(\frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \frac{1}{(n+1)(n+2)(n+3)} + \dots\Big)$$

which is approximately $\frac{b}{n}$, so that for large $n$ it certainly cannot be integral: It is larger than $\frac{b}{n+1}$ and smaller than $\frac{b}{n}$, as one can see from a comparison with a geometric series:

$$\frac{1}{n+1} < \frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \frac{1}{(n+1)(n+2)(n+3)} + \dots < \frac{1}{n+1} + \frac{1}{(n+1)^2} + \frac{1}{(n+1)^3} + \dots = \frac{1}{n}.$$

[Margin box: Geometric series. For the infinite geometric series $Q = \frac{1}{q} + \frac{1}{q^2} + \frac{1}{q^3} + \dots$ with $q > 1$ we clearly have $qQ = 1 + \frac{1}{q} + \frac{1}{q^2} + \dots = 1 + Q$ and thus $Q = \frac{1}{q-1}$.]

Now one might be led to think that this simple multiply-by-$n!$ trick is not even sufficient to show that $e^2$ is irrational. This is a stronger statement: $\sqrt{2}$ is an example of a number which is irrational, but whose square is not.

From John Cosgrave we have learned that with two nice ideas/observations (let’s call them “tricks”) one can get two steps further nevertheless: Each of the tricks is sufficient to show that $e^2$ is irrational, the combination of both of them even yields the same for $e^4$. The first trick may be found in a one page paper by J. Liouville from 1840, and the second one in a two page “addendum” which Liouville published on the next two journal pages.

Why is $e^2$ irrational? What can we derive from $e^2 = \frac{a}{b}$? According to Liouville we should write this as

$$b e = a e^{-1},$$

substitute the series

$$e = 1 + \frac{1}{1} + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \frac{1}{120} + \dots$$

and

$$e^{-1} = 1 - \frac{1}{1} + \frac{1}{2} - \frac{1}{6} + \frac{1}{24} - \frac{1}{120} \pm \dots,$$

and then multiply by $n!$, for a sufficiently large even $n$. Then we see that $n!\, b e$ is nearly integral:

$$n!\, b \Big(1 + \frac{1}{1} + \frac{1}{2} + \frac{1}{6} + \dots + \frac{1}{n!}\Big)$$

is an integer, and the rest

$$n!\, b \Big(\frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \dots\Big)$$

is approximately $\frac{b}{n}$: It is larger than $\frac{b}{n+1}$ but smaller than $\frac{b}{n}$, as we have seen above.

[Margin figure: Liouville’s paper]

At the same time $n!\, a e^{-1}$ is nearly integral as well: Again we get a large integral part, and then a rest

$$(-1)^{n+1} n!\, a \Big(\frac{1}{(n+1)!} - \frac{1}{(n+2)!} + \frac{1}{(n+3)!} \mp \dots\Big),$$

and this is approximately $(-1)^{n+1} \frac{a}{n}$. More precisely: for even $n$ the rest is larger than $-\frac{a}{n}$, but smaller than

$$-a \Big(\frac{1}{n+1} - \frac{1}{(n+1)^2} - \frac{1}{(n+1)^3} - \dots\Big) = -\frac{a}{n+1} \Big(1 - \frac{1}{n}\Big) < 0.$$

But this cannot be true, since for large even $n$ it would imply that $n!\, a e^{-1}$ is just a bit smaller than an integer, while $n!\, b e$ is a bit larger than an integer, so $n!\, a e^{-1} = n!\, b e$ cannot hold. □

In order to show that $e^4$ is irrational, we now courageously assume that $e^4 = \frac{a}{b}$ were rational, and write this as

$$b e^2 = a e^{-2}.$$

We could now try to multiply this by $n!$ for some large $n$, and collect the non-integral summands, but this leads to nothing useful: The sum of the remaining terms on the left-hand side will be approximately $b \frac{2^{n+1}}{n}$, on the right side $(-1)^{n+1} a \frac{2^{n+1}}{n}$, and both will be very large if $n$ gets large.

So one has to examine the situation a bit more carefully, and make two little adjustments to the strategy: First we will not take an arbitrary large $n$, but a large power of two, $n = 2^m$; and secondly we will not multiply by $n!$, but by $\frac{n!}{2^{n-1}}$. Then we need a little lemma, a special case of Legendre’s theorem (see page 8): For any $n \ge 1$ the integer $n!$ contains the prime factor 2 at most $n - 1$ times, with equality if (and only if) $n$ is a power of two, $n = 2^m$.

This lemma is not hard to show: $\lfloor \frac{n}{2} \rfloor$ of the factors of $n!$ are even, $\lfloor \frac{n}{4} \rfloor$ of them are divisible by 4, and so on. So if $2^k$ is the largest power of two which satisfies $2^k \le n$, then $n!$ contains the prime factor 2 exactly

$$\Big\lfloor \frac{n}{2} \Big\rfloor + \Big\lfloor \frac{n}{4} \Big\rfloor + \dots + \Big\lfloor \frac{n}{2^k} \Big\rfloor \le \frac{n}{2} + \frac{n}{4} + \dots + \frac{n}{2^k} = n \Big(1 - \frac{1}{2^k}\Big) \le n - 1$$

times, with equality in both inequalities exactly if $n = 2^k$.

Let’s get back to $b e^2 = a e^{-2}$. We are looking at

$$b\, \frac{n!}{2^{n-1}}\, e^2 = a\, \frac{n!}{2^{n-1}}\, e^{-2} \tag{1}$$

and substitute the series

$$e^2 = 1 + \frac{2}{1} + \frac{4}{2} + \frac{8}{6} + \dots + \frac{2^r}{r!} + \dots$$

and

$$e^{-2} = 1 - \frac{2}{1} + \frac{4}{2} - \frac{8}{6} \pm \dots + (-1)^r \frac{2^r}{r!} + \dots$$

For $r \le n$ we get integral summands on both sides, namely

$$b\, \frac{n!}{2^{n-1}} \frac{2^r}{r!} \quad\text{resp.}\quad (-1)^r a\, \frac{n!}{2^{n-1}} \frac{2^r}{r!},$$

where for $r > 0$ the denominator $r!$ contains the prime factor 2 at most $r - 1$ times, while $n!$ contains it exactly $n - 1$ times. (So for $r > 0$ the summands are even.)

And since $n$ is even (we assume that $n = 2^m$), the series that we get for $r \ge n + 1$ are

$$2b \Big(\frac{2}{n+1} + \frac{4}{(n+1)(n+2)} + \frac{8}{(n+1)(n+2)(n+3)} + \dots\Big)$$

resp.

$$2a \Big(-\frac{2}{n+1} + \frac{4}{(n+1)(n+2)} - \frac{8}{(n+1)(n+2)(n+3)} \pm \dots\Big).$$

These series will for large $n$ be roughly $\frac{4b}{n}$ resp. $-\frac{4a}{n}$, as one sees again by comparison with geometric series. For large $n = 2^m$ this means that the left-hand side of (1) is a bit larger than an integer, while the right-hand side is a bit smaller: contradiction! □

So we know that $e^4$ is irrational; to show that $e^3$, $e^5$ etc. are irrational as well, we need heavier machinery (that is, a bit of calculus), and a new idea, which essentially goes back to Charles Hermite, and for which the key is hidden in the following simple lemma.

Lemma. For some fixed $n \ge 1$, let

$$f(x) = \frac{x^n (1 - x)^n}{n!}.$$

(i) The function $f(x)$ is a polynomial of the form $f(x) = \frac{1}{n!} \sum_{i=n}^{2n} c_i x^i$, where the coefficients $c_i$ are integers.

(ii) For $0 < x < 1$ we have $0 < f(x) < \frac{1}{n!}$.

(iii) The derivatives $f^{(k)}(0)$ and $f^{(k)}(1)$ are integers for all $k \ge 0$.

Proof. Parts (i) and (ii) are clear. For (iii) note that by (i) the k-th derivative $f^{(k)}$ vanishes at $x = 0$ unless $n \le k \le 2n$, and in this range $f^{(k)}(0) = \frac{k!}{n!} c_k$ is an integer. From $f(x) = f(1 - x)$ we get $f^{(k)}(x) = (-1)^k f^{(k)}(1 - x)$ for all $x$, and hence $f^{(k)}(1) = (-1)^k f^{(k)}(0)$, which is an integer. □

Theorem 1. $e^r$ is irrational for every $r \in \mathbb{Q} \setminus \{0\}$.

Proof. It suffices to show that $e^s$ cannot be rational for a positive integer $s$ (if $e^{s/t}$ were rational, then $(e^{s/t})^t = e^s$ would be rational, too). Assume that $e^s = \frac{a}{b}$ for integers $a, b > 0$, and let $n$ be so large that $n! > a s^{2n+1}$.

[Margin note: The estimate $n! > e (\frac{n}{e})^n$ yields an explicit $n$ that is “large enough.”]

Put

$$F(x) := s^{2n} f(x) - s^{2n-1} f'(x) + s^{2n-2} f''(x) \mp \dots + f^{(2n)}(x),$$

where $f(x)$ is the function of the lemma.

$F(x)$ may also be written as an infinite sum

$$F(x) = s^{2n} f(x) - s^{2n-1} f'(x) + s^{2n-2} f''(x) \mp \dots,$$

since the higher derivatives $f^{(k)}(x)$, for $k > 2n$, vanish. From this we see that the polynomial $F(x)$ satisfies the identity

$$F'(x) = -s F(x) + s^{2n+1} f(x).$$

Thus differentiation yields

$$\frac{d}{dx} \big[e^{sx} F(x)\big] = s e^{sx} F(x) + e^{sx} F'(x) = s^{2n+1} e^{sx} f(x)$$

and hence

$$N := b \int_0^1 s^{2n+1} e^{sx} f(x)\, dx = b \big[e^{sx} F(x)\big]_0^1 = a F(1) - b F(0).$$

This is an integer, since part (iii) of the lemma implies that $F(0)$ and $F(1)$ are integers. However, part (ii) of the lemma yields estimates for the size of $N$ from below and from above,

$$0 < N = b \int_0^1 s^{2n+1} e^{sx} f(x)\, dx < b s^{2n+1} e^s \frac{1}{n!} = \frac{a s^{2n+1}}{n!} < 1,$$

which shows that $N$ cannot be an integer: contradiction. □

Now that this trick was so successful, we use it once more.
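Before reusing the lemma, its three parts can be confirmed for small $n$ with exact rational arithmetic. The following is a sketch only; the function names are illustrative, and the coefficients $c_{n+j} = (-1)^j \binom{n}{j}$ come from expanding $(1-x)^n$ by the binomial theorem.

```python
from fractions import Fraction
from math import comb, factorial, perm

def coefficients(n):
    """Integers c_i with x^n (1-x)^n = sum_{i=n}^{2n} c_i x^i, as a dict i -> c_i."""
    return {n + j: (-1) ** j * comb(n, j) for j in range(n + 1)}

def derivative(n, k, x):
    """Exact value of the k-th derivative of f(x) = x^n (1-x)^n / n! at x."""
    total = Fraction(0)
    for i, c in coefficients(n).items():
        if i >= k:
            # d^k/dx^k x^i = i!/(i-k)! * x^(i-k)
            total += c * perm(i, k) * Fraction(x) ** (i - k)
    return total / factorial(n)

n = 4
for k in range(2 * n + 2):
    # part (iii): integer derivatives at 0 and 1, related by the sign (-1)^k
    assert derivative(n, k, 0).denominator == 1
    assert derivative(n, k, 1) == (-1) ** k * derivative(n, k, 0)
# part (ii): 0 < f(x) < 1/n! inside the unit interval (k = 0 gives f itself)
assert 0 < derivative(n, 0, Fraction(1, 2)) < Fraction(1, factorial(n))
```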

Theorem 2. $\pi^2$ is irrational.

Proof. Assume that $\pi^2 = \frac{a}{b}$ for integers $a, b > 0$. We now use the polynomial

$$F(x) := b^n \Big(\pi^{2n} f(x) - \pi^{2n-2} f^{(2)}(x) + \pi^{2n-4} f^{(4)}(x) \mp \dots\Big),$$

which satisfies $F''(x) = -\pi^2 F(x) + b^n \pi^{2n+2} f(x)$.

From part (iii) of the lemma we get that $F(0)$ and $F(1)$ are integers. Elementary differentiation rules yield

$$\frac{d}{dx} \big[F'(x) \sin \pi x - \pi F(x) \cos \pi x\big] = \big(F''(x) + \pi^2 F(x)\big) \sin \pi x = b^n \pi^{2n+2} f(x) \sin \pi x = \pi^2 a^n f(x) \sin \pi x,$$

and thus we obtain

$$N := \pi \int_0^1 a^n f(x) \sin \pi x\, dx = \Big[\frac{1}{\pi} F'(x) \sin \pi x - F(x) \cos \pi x\Big]_0^1 = F(0) + F(1),$$

which is an integer. Furthermore $N$ is positive since it is defined as the integral of a function that is positive (except on the boundary). However, if we choose $n$ so large that $\frac{\pi a^n}{n!} < 1$, then from part (ii) of the lemma we obtain

$$0 < N = \pi \int_0^1 a^n f(x) \sin \pi x\, dx < \frac{\pi a^n}{n!} < 1,$$

a contradiction. □

[Margin note: $\pi$ is not rational, but it does have “good approximations” by rationals; some of these were known since antiquity: $\frac{22}{7} = 3.142857142857\ldots$, $\frac{355}{113} = 3.141592920353\ldots$, $\frac{104348}{33215} = 3.141592653921\ldots$, $\pi = 3.141592653589\ldots$]

Here comes our final irrationality result.

Theorem 3. For every odd integer $n \ge 3$, the number

$$A(n) := \frac{1}{\pi} \arccos\Big(\frac{1}{\sqrt{n}}\Big)$$

is irrational.

We will need this result for Hilbert’s third problem (see Chapter 9) in the cases $n = 3$ and $n = 9$. For $n = 2$ and $n = 4$ we have $A(2) = \frac{1}{4}$ and $A(4) = \frac{1}{3}$, so the restriction to odd integers is essential. These values are easily derived by appealing to the diagram in the margin, in which the statement “$\frac{1}{\pi} \arccos\big(\frac{1}{\sqrt{n}}\big)$ is irrational” is equivalent to saying that the polygonal arc constructed from $\frac{1}{\sqrt{n}}$, all of whose chords have the same length, never closes into itself.

We leave it as an exercise for the reader to show that $A(n)$ is rational only for $n \in \{1, 2, 4\}$. For that, distinguish the cases when $n = 2^r$, and when $n$ is not a power of 2.

Proof. We use the addition theorem

$$\cos \alpha + \cos \beta = 2 \cos \tfrac{\alpha + \beta}{2} \cos \tfrac{\alpha - \beta}{2}$$

from elementary trigonometry, which for $\alpha = (k+1)\varphi$ and $\beta = (k-1)\varphi$ yields

$$\cos (k+1)\varphi = 2 \cos \varphi \cos k\varphi - \cos (k-1)\varphi. \tag{2}$$

For the angle $\varphi_n = \arccos\big(\frac{1}{\sqrt{n}}\big)$, which is defined by $\cos \varphi_n = \frac{1}{\sqrt{n}}$ and $0 \le \varphi_n \le \pi$, this yields representations of the form

$$\cos k\varphi_n = \frac{A_k}{\sqrt{n}^{\,k}},$$

where $A_k$ is an integer that is not divisible by $n$, for all $k \ge 0$. In fact, we have such a representation for $k = 0, 1$ with $A_0 = A_1 = 1$, and by induction on $k$ using (2) we get for $k \ge 1$

$$\cos (k+1)\varphi_n = 2\, \frac{1}{\sqrt{n}}\, \frac{A_k}{\sqrt{n}^{\,k}} - \frac{A_{k-1}}{\sqrt{n}^{\,k-1}} = \frac{2A_k - nA_{k-1}}{\sqrt{n}^{\,k+1}}.$$

Thus we obtain $A_{k+1} = 2A_k - nA_{k-1}$. If $n \ge 3$ is odd, and $A_k$ is not divisible by $n$, then we find that $A_{k+1}$ cannot be divisible by $n$, either.

Now assume that

$$A(n) = \frac{1}{\pi} \varphi_n = \frac{k}{\ell}$$

is rational (with integers $k, \ell > 0$). Then $\ell \varphi_n = k\pi$ yields

$$\pm 1 = \cos k\pi = \frac{A_\ell}{\sqrt{n}^{\,\ell}}.$$

Thus $\sqrt{n}^{\,\ell} = \pm A_\ell$ is an integer, with $\ell \ge 2$, and hence $n \mid \sqrt{n}^{\,\ell}$. With $\sqrt{n}^{\,\ell} \mid A_\ell$ we find that $n$ divides $A_\ell$, a contradiction. □
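The recursion $A_{k+1} = 2A_k - nA_{k-1}$ at the heart of this proof is easy to experiment with. The sketch below (with an invented function name) generates the sequence, compares it with $\cos k\varphi_n$ in floating point, and confirms for a few odd $n$ that no $A_k$ is divisible by $n$.

```python
import math

def a_sequence(n, length):
    """A_0, A_1, ... with A_0 = A_1 = 1 and A_{k+1} = 2*A_k - n*A_{k-1}."""
    seq = [1, 1]
    while len(seq) < length:
        seq.append(2 * seq[-1] - n * seq[-2])
    return seq[:length]

for n in (3, 5, 9, 15):
    phi_n = math.acos(1 / math.sqrt(n))
    for k, a_k in enumerate(a_sequence(n, 12)):
        # cos(k * phi_n) = A_k / sqrt(n)^k, up to rounding error
        assert abs(math.cos(k * phi_n) - a_k / n ** (k / 2)) < 1e-9
        # for odd n >= 3, A_k is never divisible by n
        assert a_k % n != 0
```

For even $n$ the divisibility argument breaks down, in line with the rational values $A(2) = \frac14$ and $A(4) = \frac13$: for instance `a_sequence(2, 3)` ends in $A_2 = 0$.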

References

[1] C. HERMITE: Sur la fonction exponentielle, Comptes rendus de l’Académie des Sciences () 77 (1873), 18-24; Œuvres de Charles Hermite, Vol. III, Gauthier-Villars, Paris 1912, pp. 150-181.

[2] Y. IWAMOTO: A proof that π2 is irrational, J. Osaka Institute of Science and Technology 1 (1949), 147-148.

[3] J. F. KOKSMA: On Niven’s proof that π is irrational, Nieuw Archief voor Wiskunde (2) 23 (1949), 39.

[4] J. LIOUVILLE: Sur l’irrationalité du nombre e = 2,718..., Journal de Mathématiques Pures et Appl. (1) 5 (1840), 192; Addition, 193-194.

[5] I. NIVEN: A simple proof that π is irrational, Bulletin Amer. Math. Soc. 53 (1947), 509.

Three times $\pi^2/6$ Chapter 8

We know that the infinite series $\sum_{n \ge 1} \frac{1}{n}$ does not converge. Indeed, in Chapter 1 we have seen that even the series $\sum_{p \in \mathbb{P}} \frac{1}{p}$ diverges. However, the sum of the reciprocals of the squares converges (although very slowly, as we will also see), and it produces an interesting value.

Euler’s series.

$$\sum_{n \ge 1} \frac{1}{n^2} = \frac{\pi^2}{6}.$$

This is a classical, famous and important result by Leonhard Euler from 1734. One of its key interpretations is that it yields the first non-trivial value $\zeta(2)$ of Riemann’s zeta function (see the appendix on page 49). This value is irrational, as we have seen in Chapter 7. But not only the result has a prominent place in mathematics history, there are also a number of extremely elegant and clever proofs that have their history: For some of these the joy of discovery and rediscovery has been shared by many. In this chapter, we present three such proofs.

[Margin table of partial sums:
$1 = 1.000000$
$1 + \frac14 = 1.250000$
$1 + \frac14 + \frac19 = 1.361111$
$1 + \frac14 + \frac19 + \frac1{16} = 1.423611$
$1 + \frac14 + \frac19 + \frac1{16} + \frac1{25} = 1.463611$
$1 + \frac14 + \frac19 + \frac1{16} + \frac1{25} + \frac1{36} = 1.491388$
$\pi^2/6 = 1.644934$.]

Proof. The first proof appears as an exercise in William J. LeVeque’s number theory textbook from 1956. But he says: “I haven’t the slightest idea where that problem came from, but I’m pretty certain that it wasn’t original with me.”

The proof consists in two different evaluations of the double integral

$$I := \int_0^1 \int_0^1 \frac{1}{1 - xy}\, dx\, dy.$$

For the first one, we expand $\frac{1}{1 - xy}$ as a geometric series, decompose the summands as products, and integrate effortlessly:

$$I = \int_0^1 \int_0^1 \sum_{n \ge 0} (xy)^n\, dx\, dy = \sum_{n \ge 0} \int_0^1 \int_0^1 x^n y^n\, dx\, dy = \sum_{n \ge 0} \Big(\int_0^1 x^n\, dx\Big) \Big(\int_0^1 y^n\, dy\Big) = \sum_{n \ge 0} \frac{1}{n+1} \cdot \frac{1}{n+1} = \sum_{n \ge 0} \frac{1}{(n+1)^2} = \sum_{n \ge 1} \frac{1}{n^2} = \zeta(2).$$

This evaluation also shows that the double integral (over a positive function with a pole at $x = y = 1$) is finite. Note that the computation is also easy and straightforward if we read it backwards; thus the evaluation of $\zeta(2)$ leads one to the double integral $I$.

The second way to evaluate $I$ comes from a change of coordinates: in the new coordinates given by $u := \frac{y + x}{2}$ and $v := \frac{y - x}{2}$ the domain of integration is a square of side length $\frac12 \sqrt{2}$, which we get from the old domain by first rotating it by 45° and then shrinking it by a factor of $\sqrt{2}$. Substitution of $x = u - v$ and $y = u + v$ yields

$$\frac{1}{1 - xy} = \frac{1}{1 - u^2 + v^2}.$$

To transform the integral, we have to replace $dx\, dy$ by $2\, du\, dv$, to compensate for the fact that our coordinate transformation reduces areas by a constant factor of 2 (which is the Jacobi determinant of the transformation; see the box on the next page). The new domain of integration, and the function to be integrated, are symmetric with respect to the u-axis, so we just need to compute two times (another factor of 2 arises here!) the integral over the upper half domain, which we split into two parts in the most natural way:

$$I = 4 \int_0^{1/2} \Big(\int_0^{u} \frac{dv}{1 - u^2 + v^2}\Big) du + 4 \int_{1/2}^{1} \Big(\int_0^{1-u} \frac{dv}{1 - u^2 + v^2}\Big) du.$$

[Margin figures: the unit square in the $(x, y)$-plane with the $u$- and $v$-axes; the rotated square in the $(u, v)$-plane, split at $u = \frac12$ into the parts $I_1$ and $I_2$.]

Using $\int \frac{dx}{a^2 + x^2} = \frac{1}{a} \arctan \frac{x}{a} + C$, this becomes

$$I = 4 \int_0^{1/2} \frac{1}{\sqrt{1 - u^2}} \arctan\Big(\frac{u}{\sqrt{1 - u^2}}\Big) du + 4 \int_{1/2}^{1} \frac{1}{\sqrt{1 - u^2}} \arctan\Big(\frac{1 - u}{\sqrt{1 - u^2}}\Big) du.$$

These integrals can be simplified and finally evaluated by substituting $u = \sin \theta$ resp. $u = \cos \theta$. But we proceed more directly, by computing that the derivative of $g(u) := \arctan\big(\frac{u}{\sqrt{1 - u^2}}\big)$ is $g'(u) = \frac{1}{\sqrt{1 - u^2}}$, while the derivative of $h(u) := \arctan\big(\frac{1 - u}{\sqrt{1 - u^2}}\big) = \arctan\big(\sqrt{\frac{1 - u}{1 + u}}\big)$ is $h'(u) = -\frac12 \frac{1}{\sqrt{1 - u^2}}$. So we may use $\int_a^b f'(x) f(x)\, dx = \big[\frac12 f(x)^2\big]_a^b = \frac12 f(b)^2 - \frac12 f(a)^2$ and get

$$I = 4 \int_0^{1/2} g'(u) g(u)\, du + 4 \int_{1/2}^{1} -2 h'(u) h(u)\, du = 2 \big[g(u)^2\big]_0^{1/2} - 4 \big[h(u)^2\big]_{1/2}^{1}$$

$$= 2 g(\tfrac12)^2 - 2 g(0)^2 - 4 h(1)^2 + 4 h(\tfrac12)^2 = 2 \big(\tfrac{\pi}{6}\big)^2 - 0 - 0 + 4 \big(\tfrac{\pi}{6}\big)^2 = \frac{\pi^2}{6}. \quad \square$$

This proof extracted the value of Euler’s series from an integral via a rather simple coordinate transformation. An ingenious proof of this type, with an entirely non-trivial coordinate transformation, was later discovered by Beukers, Calabi and Kolk. The point of departure for that proof is to split the sum $\sum_{n \ge 1} \frac{1}{n^2}$ into the even terms and the odd terms. Clearly the even terms $\frac{1}{2^2} + \frac{1}{4^2} + \frac{1}{6^2} + \dots = \sum_{k \ge 1} \frac{1}{(2k)^2}$ sum to $\frac14 \zeta(2)$, so the odd terms $\frac{1}{1^2} + \frac{1}{3^2} + \frac{1}{5^2} + \dots = \sum_{k \ge 0} \frac{1}{(2k+1)^2}$ make up three quarters of the total sum $\zeta(2)$. Thus Euler’s series is equivalent to

$$\sum_{k \ge 0} \frac{1}{(2k+1)^2} = \frac{\pi^2}{8}.$$

[Margin box: The Substitution Formula. To compute a double integral $I = \int_S f(x, y)\, dx\, dy$ we may perform a substitution of variables $x = x(u, v)$, $y = y(u, v)$, if the correspondence of $(u, v) \in T$ to $(x, y) \in S$ is bijective and continuously differentiable. Then $I$ equals $\int_T f(x(u, v), y(u, v)) \big|\frac{d(x, y)}{d(u, v)}\big|\, du\, dv$, where $\frac{d(x, y)}{d(u, v)}$ is the Jacobi determinant: $\frac{d(x, y)}{d(u, v)} = \det \begin{pmatrix} \frac{dx}{du} & \frac{dx}{dv} \\ \frac{dy}{du} & \frac{dy}{dv} \end{pmatrix}$.]

Proof. As above, we may express this as a double integral, namely

$$J = \int_0^1 \int_0^1 \frac{1}{1 - x^2 y^2}\, dx\, dy = \sum_{k \ge 0} \frac{1}{(2k+1)^2}.$$

So we have to compute this integral $J$. And for this Beukers, Calabi and Kolk proposed the new coordinates

$$u := \arccos \sqrt{\frac{1 - x^2}{1 - x^2 y^2}} \qquad v := \arccos \sqrt{\frac{1 - y^2}{1 - x^2 y^2}}.$$

To compute the double integral, we may ignore the boundary of the domain, and consider $x, y$ in the range $0 < x < 1$ and $0 < y < 1$. Then $u, v$ will lie in the triangle $u > 0$, $v > 0$, $u + v < \pi/2$. The coordinate transformation can be inverted explicitly, which leads one to the substitution

$$x = \frac{\sin u}{\cos v} \quad\text{and}\quad y = \frac{\sin v}{\cos u}.$$

It is easy to check that these formulas define a bijective coordinate transformation between the interior of the unit square $S = \{(x, y) : 0 \le x, y \le 1\}$ and the interior of the triangle $T = \{(u, v) : u, v \ge 0,\ u + v \le \pi/2\}$.

Now we have to compute the Jacobi determinant of the coordinate transformation, and magically it turns out to be

$$\det \begin{pmatrix} \frac{\cos u}{\cos v} & \frac{\sin u \sin v}{\cos^2 v} \\ \frac{\sin u \sin v}{\cos^2 u} & \frac{\cos v}{\cos u} \end{pmatrix} = 1 - \frac{\sin^2 u \sin^2 v}{\cos^2 u \cos^2 v} = 1 - x^2 y^2.$$

But this means that the integral that we want to compute is transformed into

$$J = \int_0^{\pi/2} \int_0^{\pi/2 - u} 1\, dv\, du,$$

which is just the area $\frac12 \big(\frac{\pi}{2}\big)^2 = \frac{\pi^2}{8}$ of the triangle $T$. □

[Margin figures: the unit square $S$ in the $(x, y)$-plane and the triangle $T$ in the $(u, v)$-plane with legs of length $\frac{\pi}{2}$.]

Beautiful; even more so, as the same method of proof extends to the computation of $\zeta(2k)$ in terms of a 2k-dimensional integral, for all $k \ge 1$. We refer to the original paper of Beukers, Calabi and Kolk [2], and to Chapter 23, where we’ll achieve this on a different path, using the Herglotz trick and Euler’s original approach.

After these two proofs via coordinate transformation we can’t resist the temptation to present another, entirely different and completely elementary proof for $\sum_{n \ge 1} \frac{1}{n^2} = \frac{\pi^2}{6}$. It appears in a sequence of exercises in the problem book by the twin brothers Akiva and Isaak Yaglom, whose Russian original edition appeared in 1954. Versions of this beautiful proof were rediscovered and presented by F. Holme (1970), I. Papadimitriou (1973), and by Ransford (1982) who attributed it to John Scholes.

Proof. The first step is to establish a remarkable relation between values of the (squared) cotangent function. Namely, for all $m \ge 1$ one has

$$\cot^2\big(\tfrac{\pi}{2m+1}\big) + \cot^2\big(\tfrac{2\pi}{2m+1}\big) + \dots + \cot^2\big(\tfrac{m\pi}{2m+1}\big) = \frac{2m(2m-1)}{6}. \tag{1}$$

[Margin note: For $m = 1, 2, 3$ this yields $\cot^2 \frac{\pi}{3} = \frac13$, $\cot^2 \frac{\pi}{5} + \cot^2 \frac{2\pi}{5} = 2$, and $\cot^2 \frac{\pi}{7} + \cot^2 \frac{2\pi}{7} + \cot^2 \frac{3\pi}{7} = 5$.]

To establish this, we start with the relation $e^{ix} = \cos x + i \sin x$. Taking the n-th power $e^{inx} = (e^{ix})^n$, we get

$$\cos nx + i \sin nx = (\cos x + i \sin x)^n.$$

The imaginary part of this is

$$\sin nx = \binom{n}{1} \sin x \cos^{n-1} x - \binom{n}{3} \sin^3 x \cos^{n-3} x \pm \dots \tag{2}$$

Now we let $n = 2m + 1$, while for $x$ we will consider the $m$ different values $x = \frac{r\pi}{2m+1}$, for $r = 1, 2, \dots, m$. For each of these values we have $nx = r\pi$, and thus $\sin nx = 0$, while $0 < x < \frac{\pi}{2}$ implies that for $\sin x$ we get $m$ distinct positive values.

In particular, we can divide (2) by $\sin^n x$, which yields

$$0 = \binom{n}{1} \cot^{n-1} x - \binom{n}{3} \cot^{n-3} x \pm \dots,$$

that is,

$$0 = \binom{2m+1}{1} \cot^{2m} x - \binom{2m+1}{3} \cot^{2m-2} x \pm \dots$$

for each of the $m$ distinct values of $x$. Thus for the polynomial of degree $m$

$$p(t) := \binom{2m+1}{1} t^m - \binom{2m+1}{3} t^{m-1} \pm \dots + (-1)^m \binom{2m+1}{2m+1}$$

we know $m$ distinct roots

$$a_r = \cot^2\big(\tfrac{r\pi}{2m+1}\big) \quad\text{for } r = 1, 2, \dots, m.$$

Hence the polynomial coincides with

$$p(t) = \binom{2m+1}{1} \Big(t - \cot^2\big(\tfrac{\pi}{2m+1}\big)\Big) \cdots \Big(t - \cot^2\big(\tfrac{m\pi}{2m+1}\big)\Big).$$

[Margin note: Comparison of coefficients: If $p(t) = c(t - a_1) \cdots (t - a_m)$, then the coefficient of $t^{m-1}$ is $-c(a_1 + \dots + a_m)$.]

Comparison of the coefficients of $t^{m-1}$ in $p(t)$ now yields that the sum of the roots is

$$a_1 + \dots + a_m = \frac{\binom{2m+1}{3}}{\binom{2m+1}{1}} = \frac{2m(2m-1)}{6},$$

which proves (1).

We also need a second identity, of the same type,

$$\csc^2\big(\tfrac{\pi}{2m+1}\big) + \csc^2\big(\tfrac{2\pi}{2m+1}\big) + \dots + \csc^2\big(\tfrac{m\pi}{2m+1}\big) = \frac{2m(2m+2)}{6}, \tag{3}$$

for the cosecant function $\csc x = \frac{1}{\sin x}$. But

$$\csc^2 x = \frac{1}{\sin^2 x} = \frac{\cos^2 x + \sin^2 x}{\sin^2 x} = \cot^2 x + 1,$$

so we can derive (3) from (1) by adding $m$ to both sides of the equation.

Now the stage is set, and everything falls into place. We use that in the range $0 < y < \frac{\pi}{2}$ we have

$$0 < \sin y < y < \tan y,$$

and thus (since $0 < a < b < c$ implies $0 < \frac{1}{c} < \frac{1}{b} < \frac{1}{a}$)

$$0 < \cot y < \frac{1}{y} < \csc y,$$

which implies

$$\cot^2 y < \frac{1}{y^2} < \csc^2 y.$$

Now we take this double inequality, apply it to each of the $m$ distinct values of $x$, and add the results. Using (1) for the left-hand side, and (3) for the right-hand side, we obtain

$$\frac{2m(2m-1)}{6} < \Big(\frac{2m+1}{\pi}\Big)^2 + \Big(\frac{2m+1}{2\pi}\Big)^2 + \dots + \Big(\frac{2m+1}{m\pi}\Big)^2 < \frac{2m(2m+2)}{6},$$

that is,

$$\frac{\pi^2}{6} \cdot \frac{2m}{2m+1} \cdot \frac{2m-1}{2m+1} < \frac{1}{1^2} + \frac{1}{2^2} + \dots + \frac{1}{m^2} < \frac{\pi^2}{6} \cdot \frac{2m}{2m+1} \cdot \frac{2m+2}{2m+1}.$$

Both the left-hand and the right-hand side converge to $\frac{\pi^2}{6}$ for $m \to \infty$: end of proof. □
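Both the cotangent identity (1) and the final sandwich can be watched in action with a few lines of floating-point code. This is only a numerical illustration with made-up function names, not part of the proof.

```python
import math

def cot_squared_sum(m):
    """cot^2(pi/(2m+1)) + ... + cot^2(m*pi/(2m+1))."""
    return sum(1 / math.tan(r * math.pi / (2 * m + 1)) ** 2 for r in range(1, m + 1))

def sandwich(m):
    """(lower bound, partial sum of 1/n^2 up to m, upper bound) from the last inequality."""
    scale = math.pi**2 / 6
    lower = scale * (2 * m / (2 * m + 1)) * ((2 * m - 1) / (2 * m + 1))
    upper = scale * (2 * m / (2 * m + 1)) * ((2 * m + 2) / (2 * m + 1))
    partial = sum(1 / n**2 for n in range(1, m + 1))
    return lower, partial, upper

for m in (1, 2, 3, 10):
    assert abs(cot_squared_sum(m) - 2 * m * (2 * m - 1) / 6) < 1e-9
    lower, partial, upper = sandwich(m)
    assert lower < partial < upper
```

The gap between the two bounds is of order $1/m$, which already hints at the slow convergence discussed next.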

1 2 So how fast does n2 converge to π /6? For this we have to estimate the difference m P π2 1 ∞ 1 = . 6 − n2 n2 n=1 n=m+1 X X 48 Three times π2/6

This is very easy with the technique of “comparing with an integral” that we have reviewed already in the appendix to Chapter 2 (page 10). It yields

1 f(t) = ∞ 1 ∞ 1 1 t2 < dt = n2 t2 m n=m+1 m X Z for an upper bound and

1 ∞ 1 ∞ 1 1 (m+1)2 > dt = n2 t2 m + 1 ... n=m+1 m+1 X Z m + 1 t for a lower bound on the “remaining summands” — or even

∞ 1 ∞ 1 1 2 > 2 dt = 1 n 1 t m + n=m+1 m+ 2 2 X Z if you are willing to do a slightly more careful estimate, using that the 1 function f(t) = t2 is convex. This means that our series does not converge too well; if we sum the first one thousand summands, then we expect an error in the third digit after the decimal point, while for the sum of the first one million summands, m = 1000000, we expect to get an error in the sixth decimal digit, and we do. However, then comes a big surprise: to an accuracy of 45 digits, π2/6 = 1.644934066848226436472415166646025189218949901, 6 10 1 = 1.644933066848726436305748499979391855885616544. n2 n=1 X So the sixth digit after the decimal point is wrong (too small by 1), but the next six digits are right! And then one digit is wrong (too large by 5), then again five are correct. This surprising discovery is quite recent, due to Roy D. North from Colorado Springs, 1988. (In 1982, Martin R. Powell, a school teacher from Amersham, Bucks, England, failed to notice the full effect due to the insufficient computing power available at the time.) It is too strange to be purely coincidental ... A look at the error term, which again to 45 digits reads

∞ 1 2 = 0.000000999999500000166666666666633333333333357, 6 n n=10X+1 reveals that clearly there is a pattern. You might try to rewrite this last number as

$$+\,10^{-6} \;-\; \tfrac12\,10^{-12} \;+\; \tfrac16\,10^{-18} \;-\; \tfrac1{30}\,10^{-30} \;+\; \tfrac1{42}\,10^{-42} \;+\; \dots$$

where the coefficients $(1, -\frac12, \frac16, 0, -\frac1{30}, 0, \frac1{42})$ of $10^{-6i}$ form the beginning of the sequence of Bernoulli numbers that we’ll meet again in Chapter 23. We refer our readers to the article by Borwein, Borwein & Dilcher [3] for more such surprising “coincidences”, and for proofs.
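The digit pattern can be reproduced with exact integer arithmetic. The following Python sketch is an illustration only: the 60-digit fixed-point scale and the helper names `partial_sum`, `bernoulli_tail` and `digits` are choices made here, and the value of $\pi^2/6$ is simply copied from the text. Since that constant has 45 digits, the computed remainder is printed to 42 digits only.

```python
from fractions import Fraction

M = 10**6        # number of summands, as in the text
SCALE = 10**60   # fixed-point scale (assumption: 60 digits is ample here)

# pi^2/6 to 45 decimal digits, copied from the text.
PI2_OVER_6 = Fraction("1.644934066848226436472415166646025189218949901")

def partial_sum(m, scale=SCALE):
    """Sum of 1/n^2 for n <= m; each term is floored, so the result is low by less than m/scale."""
    return Fraction(sum(scale // (n * n) for n in range(1, m + 1)), scale)

def bernoulli_tail(m):
    """The five nonzero terms of the expansion quoted in the text."""
    return (Fraction(1, m) - Fraction(1, 2 * m**2) + Fraction(1, 6 * m**3)
            - Fraction(1, 30 * m**5) + Fraction(1, 42 * m**7))

def digits(x, places=45):
    """Decimal expansion of a non-negative Fraction, truncated (not rounded)."""
    scaled = x.numerator * 10**places // x.denominator
    whole, frac = divmod(scaled, 10**places)
    return f"{whole}.{frac:0{places}d}"

s = partial_sum(M)
tail = PI2_OVER_6 - s   # remaining summands n > M

print("partial sum:", digits(s))
print("tail       :", digits(tail, 42))
print("expansion  :", digits(bernoulli_tail(M), 42))
print("1/(m+1) < tail < 1/m:", Fraction(1, M + 1) < tail < Fraction(1, M))
```

Working with integers scaled by $10^{60}$ avoids floating-point rounding entirely, which matters here because the interesting digits lie far beyond double precision.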

Appendix: The Riemann zeta function

The Riemann zeta function $\zeta(s)$ is defined for real $s > 1$ by

$$\zeta(s) := \sum_{n \ge 1} \frac{1}{n^s}.$$

Our estimates for $H_n$ (see page 10) imply that the series for $\zeta(1)$ diverges, but for any real $s > 1$ it does converge. The zeta function has a canonical continuation to the entire complex plane (with one simple pole at $s = 1$), which can be constructed using power series expansions. The resulting complex function is of utmost importance for the theory of prime numbers. Let us mention four diverse connections:

(1) The remarkable identity

$$\zeta(s) = \prod_{p} \frac{1}{1 - p^{-s}}$$

is due to Euler. It encodes the basic fact that every natural number has a unique (!) decomposition into prime factors; using this, Euler’s identity is a simple consequence of the geometric series expansion

$$\frac{1}{1 - p^{-s}} = 1 + \frac{1}{p^{s}} + \frac{1}{p^{2s}} + \frac{1}{p^{3s}} + \dots$$

(2) The following marvelous argument of Don Zagier computes $\zeta(4)$ from $\zeta(2)$. Consider the function

$$f(m, n) = \frac{2}{m^3 n} + \frac{1}{m^2 n^2} + \frac{2}{m n^3}$$

for integers $m, n \ge 1$. It is easily verified that for all $m$ and $n$,

$$f(m, n) - f(m + n, n) - f(m, m + n) = \frac{2}{m^2 n^2}.$$

Let us sum this equation over all $m, n \ge 1$. If $i \ne j$, then $(i, j)$ is either of the form $(m + n, n)$ or of the form $(m, m + n)$, for $m, n \ge 1$. Thus, in the sum on the left-hand side all terms $f(i, j)$ with $i \ne j$ cancel, and so

$$\sum_{n \ge 1} f(n, n) = \sum_{n \ge 1} \frac{5}{n^4} = 5\,\zeta(4)$$

remains. For the right-hand side one obtains

$$\sum_{m, n \ge 1} \frac{2}{m^2 n^2} = 2 \sum_{m \ge 1} \frac{1}{m^2} \cdot \sum_{n \ge 1} \frac{1}{n^2} = 2\,\zeta(2)^2,$$

and out comes the equality $5\,\zeta(4) = 2\,\zeta(2)^2$.

With $\zeta(2) = \frac{\pi^2}{6}$ we thus get $\zeta(4) = \frac{\pi^4}{90}$. Another derivation via Bernoulli numbers appears in Chapter 23.
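The identity behind Zagier's argument is "easily verified" by hand, but it can also be checked mechanically. The sketch below is an illustration under stated assumptions: the function name `identity_holds`, the range of test values and the truncation point `N` are arbitrary choices made here. It verifies the identity exactly with rational arithmetic and then compares both sides of $5\,\zeta(4) = 2\,\zeta(2)^2$ using truncated sums.

```python
import math
from fractions import Fraction

def f(m, n):
    """Zagier's function f(m, n) = 2/(m^3 n) + 1/(m^2 n^2) + 2/(m n^3)."""
    return Fraction(2, m**3 * n) + Fraction(1, m**2 * n**2) + Fraction(2, m * n**3)

def identity_holds(m, n):
    """Check f(m,n) - f(m+n,n) - f(m,m+n) == 2/(m^2 n^2) exactly."""
    return f(m, n) - f(m + n, n) - f(m, m + n) == Fraction(2, m**2 * n**2)

print(all(identity_holds(m, n) for m in range(1, 30) for n in range(1, 30)))

# Floating-point sanity check with truncated series (N is an arbitrary cutoff).
N = 100_000
zeta2 = sum(1.0 / k**2 for k in range(1, N + 1))
zeta4 = sum(1.0 / k**4 for k in range(1, N + 1))
print(5 * zeta4, 2 * zeta2**2, math.pi**4 / 18)
```

The two truncated values agree only to a few digits because the series for $\zeta(2)$ converges slowly, as discussed before the appendix.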

(3) It has been known for a long time that $\zeta(s)$ is a rational multiple of $\pi^s$, and hence irrational, if $s$ is an even integer $s \ge 2$; see Chapter 23. In contrast, the irrationality of $\zeta(3)$ was proved by Roger Apéry only in 1979. Despite considerable effort the picture is rather incomplete about $\zeta(s)$ for the other odd integers, $s = 2t + 1 \ge 5$. Very recently, Keith Ball and Tanguy Rivoal proved that infinitely many of the values $\zeta(2t + 1)$ are irrational. And indeed, although it is not known for any single odd value $s \ge 5$ that $\zeta(s)$ is irrational, Wadim Zudilin has proved that at least one of the four values $\zeta(5)$, $\zeta(7)$, $\zeta(9)$, and $\zeta(11)$ is irrational. We refer to the beautiful survey by Fischler.

(4) The location of the complex zeros of the zeta function is the subject of the “Riemann hypothesis”: one of the most famous and important unresolved conjectures in all of mathematics. It claims that all the non-trivial zeros $s \in \mathbb{C}$ of the zeta function satisfy $\mathrm{Re}(s) = \frac12$. (The zeta function vanishes at all the negative even integers, which are referred to as the “trivial zeros.”)

Very recently, Jeff Lagarias showed that, surprisingly, the Riemann hypothesis is equivalent to the following elementary statement: For all $n \ge 1$,

$$\sum_{d \mid n} d \;\le\; H_n + \exp(H_n)\,\log(H_n),$$

with equality only for $n = 1$, where $H_n$ is again the $n$-th harmonic number.
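Lagarias' inequality is elementary enough to test directly for small n. The following Python sketch is a numerical illustration only and proves nothing about the Riemann hypothesis; the function names `divisor_sums` and `lagarias_gaps` and the bound of 2000 are assumptions made here, and the right-hand side is evaluated in floating point.

```python
import math

def divisor_sums(limit):
    """sigma[n] = sum of the divisors of n for 0 <= n <= limit, via a sieve."""
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):
            sigma[multiple] += d
    return sigma

def lagarias_gaps(limit):
    """Return [(n, rhs - sigma(n))] with rhs = H_n + exp(H_n) * log(H_n)."""
    sigma = divisor_sums(limit)
    gaps, h = [], 0.0
    for n in range(1, limit + 1):
        h += 1.0 / n                      # running harmonic number H_n
        rhs = h + math.exp(h) * math.log(h)
        gaps.append((n, rhs - sigma[n]))
    return gaps

gaps = lagarias_gaps(2000)
print("equality at n = 1:", gaps[0][1] == 0.0)
print("strict for 2 <= n <= 2000:", all(gap > 0 for _, gap in gaps[1:]))
print("smallest gap for n >= 2:", min(gaps[1:], key=lambda item: item[1]))
```

For $n = 1$ the right-hand side is $1 + e \cdot \log 1 = 1$, which matches the claimed case of equality.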

References

[1] K. BALL & T. RIVOAL: Irrationalité d’une infinité de valeurs de la fonction zêta aux entiers impairs, Inventiones math. 146 (2001), 193-207.

[2] F. BEUKERS, J. A. C. KOLK & E. CALABI: Sums of generalized harmonic series and volumes, Nieuw Archief voor Wiskunde (4) 11 (1993), 217-224.

[3] J. M. BORWEIN, P. B. BORWEIN & K. DILCHER: Pi, Euler numbers, and asymptotic expansions, Amer. Math. Monthly 96 (1989), 681-687.

[4] S. FISCHLER: Irrationalité de valeurs de zêta (d’après Apéry, Rivoal, ...), Bourbaki Seminar, No. 910, November 2002; Astérisque 294 (2004), 27-62.

[5] J. C. LAGARIAS: An elementary problem equivalent to the Riemann hypothesis, Amer. Math. Monthly 109 (2002), 534-543.

[6] W. J. LEVEQUE: Topics in Number Theory, Vol. I, Addison-Wesley, Reading MA 1956.

[7] A. M. YAGLOM & I. M. YAGLOM: Challenging mathematical problems with elementary solutions, Vol. II, Holden-Day, Inc., San Francisco, CA 1967.

[8] D. ZAGIER: Values of zeta functions and their applications, Proc. First European Congress of Mathematics, Vol. II (Paris 1992), Progress in Math. 120, Birkhäuser, Basel 1994, pp. 497-512.

[9] W. ZUDILIN: Arithmetic of linear forms involving odd zeta values, J. Théorie Nombres Bordeaux 16 (2004), 251-291.

About the Illustrations

We are happy to have the possibility and privilege to illustrate this volume with wonderful original drawings by Karl Heinrich Hofmann (Darmstadt). Thank you!

The regular polyhedra on page 76 and the fold-out map of a flexible sphere on page 84 are by WAF Ruppert (Vienna). Jürgen Richter-Gebert (Munich) provided the two illustrations on page 78, and Ronald Wotzlaw wrote the nice postscript graphics for page 134.

Page 231 features the Weisman Art Museum in Minneapolis designed by Frank Gehry. The photo of its west façade is by Chris Faust. The floorplan is of the Dolly Fiterman Riverview Gallery behind the west façade.

The portraits of Bertrand, Cantor, Erdős, Euler, Fermat, Herglotz, Hermite, Hilbert, Pólya, Littlewood, and Sylvester are all from the photo archives of the Mathematisches Forschungsinstitut Oberwolfach, with permission. (Many thanks to Annette Disch!)

The Gauss portrait on page 23 is a lithograph by Siegfried Detlev Bendixen published in Astronomische Nachrichten 1828, as provided by Wikipedia. The picture of Hermite is from the first volume of his collected works. The Eisenstein portrait is reproduced with friendly permission by Prof. Karin Reich from a collection of portrait cards owned by the Mathematische Gesellschaft Hamburg.

The portrait stamps of Buffon, Chebyshev, Euler, and Ramanujan are from Jeff Miller’s mathematical stamps website http://jeff560.tripod.com with his generous permission.

The photo of Claude Shannon was provided by the MIT Museum and is here reproduced with their permission.

The portrait of Cayley is taken from the “Photoalbum für Weierstraß” (edited by Reinhard Bölling, Vieweg 1994), with permission from the Kunstbibliothek, Staatliche Museen zu Berlin, Preussischer Kulturbesitz.

The Cauchy portrait is reproduced with permission from the Collections de l’École Polytechnique, Paris. The picture of Fermat is reproduced from Stefan Hildebrandt and Anthony Tromba: The Parsimonious Universe. Shape and Form in the Natural World, Springer-Verlag, New York 1996.

The portrait of Ernst Witt is from volume 426 (1992) of the Journal für die Reine und Angewandte Mathematik, with permission by Walter de Gruyter Publishers. It was taken around 1941.

The photo of Karol Borsuk was taken in 1967 by Isaac Namioka, and is reproduced with his kind permission.

We thank Dr. Peter Sperner (Braunschweig) for the portrait of his father, and Vera Sós for the photo of Paul Turán.

Thanks to Noga Alon for the portrait of A. Nilli!

Index

acyclic directed graph, 196
addition theorems, 150
adjacency matrix, 248
adjacent vertices, 66
antichain, 179
arithmetic mean, 119
art gallery theorem, 232
average degree, 76
average number of divisors, 164

Bernoulli numbers, 48, 152
Bertrand’s postulate, 7
bijection, 103, 207
Binet–Cauchy formula, 197, 203
binomial coefficient, 13
bipartite graph, 67, 223
birthday paradox, 185
Bolyai–Gerwien Theorem, 53
Borsuk’s conjecture, 95
Borsuk–Ulam theorem, 252
Bricard’s condition, 57
Brouwer’s fixed point theorem, 169
Buffon’s needle problem, 155

Calkin–Wilf tree, 105
Cantor–Bernstein theorem, 110
capacity, 242
cardinal number, 103
cardinality, 103, 115
Cauchy’s arm lemma, 82
Cauchy’s minimum principle, 127
Cauchy’s rigidity theorem, 81
Cauchy–Schwarz inequality, 119
Cayley’s formula, 201
center, 31
centralizer, 31
centrally symmetric, 61
chain, 179
channel, 241
Chebyshev polynomials, 144
Chebyshev’s theorem, 140
chromatic number, 221, 251
class formula, 32
clique, 67, 235, 242
clique number, 237
2-colorable set system, 261
combinatorially equivalent, 61
comparison of coefficients, 47
complete bipartite graph, 66
complete graph, 66
complex polynomial, 139
components of a graph, 67
cone lemma, 56
confusion graph, 241
congruent, 61
connected, 67
connected components, 67
continuum, 109
continuum hypothesis, 112
convex polytope, 59
convex vertex, 232
cosine polynomial, 143
countable, 103
coupon collector’s problem, 186
critical family, 182
crossing lemma, 267
crossing number, 266
cube, 60
cycle, 67
$C_4$-condition, 257
$C_4$-free graph, 166

degree, 76
dense, 114
determinants, 195
dihedral angle, 57
dimension, 109
dimension of a graph, 162
Dinitz problem, 221
directed graph, 223
division ring, 31
double counting, 164
dual graph, 75, 227

edge of a graph, 66
edge of a polyhedron, 60
elementary polygon, 79
equal size, 103
equicomplementability, 54
equicomplementable polyhedra, 53
equidecomposability, 54
equidecomposable polyhedra, 53
Erdős–Ko–Rado theorem, 180
Euler’s criterion, 24
Euler’s function, 28
Euler’s polyhedron formula, 75
Euler’s series, 43
even function, 152
expectation, 94

face, 60, 75
facet, 60
Fermat number, 3
finite field, 31
finite fields, 28
finite set system, 179
forest, 67
formal power series, 207
four-color theorem, 227
friendship theorem, 257
fundamental theorem of algebra, 127

Gale’s theorem, 253
Gauss lemma, 25
Gauss sum, 27
general position, 253
geometric mean, 119
Gessel–Viennot lemma, 195
girth, 263
golden section, 245
graph, 66
graph coloring, 227
graph of a polytope, 60

harmonic mean, 119
harmonic number, 10
Herglotz trick, 149
Hilbert’s third problem, 53
hyper-binary representation, 106

incidence matrix, 64, 164
incident, 66
indegree, 223
independence number, 241, 251, 264
independent set, 67, 221
induced subgraph, 67, 222
inequalities, 119
infinite products, 207
initial ordinal number, 116
intersecting family, 180, 251
involution, 20
irrational numbers, 35
isomorphic graphs, 67

Jacobi determinants, 45

kernel, 223
Kneser graph, 251
Kneser’s conjecture, 252

labeled tree, 201
Lagrange’s theorem, 4
Latin rectangle, 214
Latin square, 213, 221
lattice, 79
lattice basis, 80
lattice paths, 195
lattice points, 26
law of quadratic reciprocity, 24
Legendre symbol, 23
Legendre’s theorem, 8
lexicographically smallest solution, 56
line graph, 226
linear extension, 176
linearity of expectation, 94, 156
list chromatic number, 222
list coloring, 222, 228
Littlewood–Offord problem, 145
loop, 66
Lovász’ theorem, 247
Lovász umbrella, 244
Lyusternik–Shnirel’man theorem, 252

Markov’s inequality, 94
marriage theorem, 182
matching, 224
matrix of rank 1, 96
matrix-tree theorem, 203
Mersenne number, 4
Minkowski symmetrization, 91
mirror image, 61
monotone subsequences, 162
Monsky’s Theorem, 133
multiple edges, 66
museum guards, 231
Mycielski graph, 264

near-triangulated plane graph, 228
nearly-orthogonal vectors, 96
needles, 155
neighbors, 66
Newman’s function, 107
non-Archimedean real valuation, 132
non-Archimedean valuation, 136

obtuse angle, 89
odd function, 150
order of a group element, 4
ordered abelian group, 136
ordered set, 115
ordinal number, 115
orthonormal representation, 244
outdegree, 223

p-adic value, 132
partial Latin square, 213
partition, 207
partition identities, 207
path, 67
path matrix, 195
pearl lemma, 55
pentagonal numbers, 209
periodic function, 150
Petersen graph, 251
Pick’s theorem, 79
pigeon-hole principle, 161
planar graph, 75
plane graph, 75, 228
point configuration, 69
polygon, 59
polyhedron, 53, 59
polynomial with real roots, 121, 142
polytope, 89
prime field, 18
prime number, 3, 7
prime number theorem, 10
probabilistic method, 261
probability distribution, 236
probability space, 94
product of graphs, 241
projective plane, 167

quadratic nonresidue, 23
quadratic reciprocity, 24
quadratic residue, 23

rainbow triangle, 133
Ramsey number, 262
random variable, 94
rate of transmission, 241
red-blue segment, 135
refining sequence, 205
Riemann zeta function, 49
riffle shuffles, 191
Rogers–Ramanujan identities, 211
rooted forest, 205
roots of unity, 33

scalar product, 96
Schönhardt’s polyhedron, 232
segment, 54
Shannon capacity, 242
shuffling cards, 185
simple graph, 66
simplex, 60
size of a set, 103
slope problem, 69
speed of convergence, 47
Sperner’s lemma, 169
Sperner’s theorem, 179
squares, 18
stable matching, 224
star, 65
Stern’s diatomic series, 104
Stirling’s formula, 11
stopping rules, 188
subgraph, 67
sums of two squares, 17
Sylvester’s theorem, 13
Sylvester–Gallai theorem, 63, 78
system of distinct representatives, 181

tangential rectangle, 121
tangential triangle, 121
top-in-at-random shuffles, 187
touching simplices, 85
tree, 67
triangle-free graph, 264
Turán graph, 235
Turán’s graph theorem, 235
two square theorem, 17

umbrella, 244
unimodal, 12
unit d-cube, 60

valuation ring, 136
valuations, 131, 136
vertex, 60, 66
vertex degree, 76, 165, 222
vertex-disjoint path system, 195
volume, 84

weighted directed graph, 195
well-ordered, 115
well-ordering theorem, 115
windmill graph, 257

zero-error capacity, 242
Zorn’s lemma, 137