
Princeton Companion to Proof 1

Algebraic Numbers

By Barry Mazur

The roots of our subject go back to ancient Greece while its branches touch all aspects of contemporary mathematics. In 1801 the Disquisitiones Arithmeticae of Carl Friedrich Gauss was first published, a “founding treatise,” if ever there was one, for the modern attitude towards number theory. Many of the still unachieved aims of current research can be seen, at least in embryonic form, as arising from Gauss’s work.

This article is meant to serve as a companion to the reader who might be interested in learning, and thinking about, some of the classical theory of algebraic numbers. Much can be understood, and much of the beauty of algebraic numbers can be appreciated, with a minimum of theoretical background. I recommend that readers who wish to begin this journey carry in their backpacks Gauss’s Disquisitiones Arithmeticae as well as Davenport’s The Higher Arithmetic (1992), which is one of the gems of exposition of the subject, and which explains the founding ideas clearly and in depth using hardly anything more than high-school mathematics.

1 The Root of 2

The study of algebraic numbers and algebraic integers begins with, and constantly reverts back to, the study of ordinary rational numbers and ordinary integers. The first algebraic irrationalities occurred not so much as numbers but rather as obstructions to simple answers to questions in geometry.

That the ratio of the diagonal of a square to the length of its side cannot be expressed as a ratio of whole numbers is purported to be one of the vexing discoveries of the early Pythagoreans. But this very ratio, when squared, is 2:1. So we might (and later mathematicians certainly did) deal with it algebraically. We can think of this ratio as a cipher, about which we know nothing beyond the fact that its square is 2 (a viewpoint taken toward algebraic numbers by Kronecker, as we shall see below). We can write $\sqrt{2}$ in various forms, e.g.

$$\sqrt{2} = |1 - i|, \tag{1.1}$$

and we can think of $1 - i = 1 - e^{2\pi i/4}$ as the world’s simplest trigonometric sum; we shall see generalizations of this for all quadratic surds below. We can also view $\sqrt{2}$ as a limit of various infinite sequences, one of which is given by the elegant continued fraction

$$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \ddots}}. \tag{1.2}$$

Directly connected to this continued fraction (1.2) is the Diophantine equation

$$2X^2 - Y^2 = \pm 1 \tag{1.3}$$

known as the Pell equation. There are infinitely many pairs of integers $(x, y)$ satisfying this equation, and the corresponding fractions $y/x$ are precisely what you get by truncating the expression in (1.2). For example, the first few solutions are (1, 1), (2, 3), (5, 7), and (12, 17), and

$$
\begin{aligned}
\tfrac{3}{2} &= 1 + \cfrac{1}{2} = 1.5, \\
\tfrac{7}{5} &= 1 + \cfrac{1}{2 + \cfrac{1}{2}} = 1.4, \\
\tfrac{17}{12} &= 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2}}} = 1.416\ldots.
\end{aligned}
\tag{1.4}
$$

Replace the $\pm 1$ on the right-hand side of (1.3) by zero and you get $2X^2 - Y^2 = 0$, an equation all of whose positive real-number solutions $(X, Y)$ have the ratio $Y/X = \sqrt{2}$, so it is easy to see that the sequence of fractions (1.4) (these being alternately larger and smaller than $\sqrt{2} = 1.414\ldots$) converges to $\sqrt{2}$ in the limit. Even more striking is that (1.4) is the full list of fractions that best approximate $\sqrt{2}$. (A rational number $a/d$ is said to be a best approximant to a real number $\alpha$ if $a/d$ is closer to $\alpha$ than any rational number of denominator smaller than or equal to $d$.) To deepen the picture, consider another important infinite expression, the conditionally convergent series

$$\frac{\log(\sqrt{2} + 1)}{\sqrt{2}} = 1 - \frac{1}{3} - \frac{1}{5} + \frac{1}{7} + \frac{1}{9} - \cdots \pm \frac{1}{n} + \cdots. \tag{1.5}$$

Here the $n$ range over positive odd numbers, and the sign of the term $\pm 1/n$ is plus if $n$ has a remainder of 1 or 7 when divided by 8, and it is minus otherwise, i.e. if $n$ has a remainder of 3 or 5 when divided by 8. This elegant formula (1.5), which you

are invited to “check out” at least to one digit accuracy with a calculator, is an instance of the powerful and general theory of analytic formulas for special values of L-functions, which plays the role of a bridge between the more algebraic and the more analytic sides of the story. When we allude to this, below, we will call it “the analytic formula,” for short.

2 The Golden Mean

If you are looking for quadratic irrationalities that have been the subject of geometric fascination through the ages, then $\sqrt{2}$ has a strong competitor in the number $\tfrac{1}{2}(1 + \sqrt{5})$, known as the golden mean. The ratio $\tfrac{1}{2}(1 + \sqrt{5}) : 1$ gives the proportions of a rectangle with the property that when you remove a square from it, as in Figure 1.1, you are left with a smaller rectangle whose sides are in the same proportion. Its corresponding trigonometric sum description is

$$\tfrac{1}{2}(1 + \sqrt{5}) = \tfrac{1}{2} + \cos\tfrac{2}{5}\pi - \cos\tfrac{4}{5}\pi. \tag{2.1}$$

Figure 1.1. The outer rectangle has its height-to-width ratio equal to the golden mean. If you remove a square from it as indicated in the figure, you are left with a rectangle that has the golden mean as its width-to-height ratio. This procedure is of course repeatable.

Its continued-fraction expansion is

$$\tfrac{1}{2}(1 + \sqrt{5}) = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \ddots}}, \tag{2.2}$$

where the sequence of fractions obtained by successive truncations of this continued fraction,

$$\frac{y}{x} = \frac{1}{1}, \frac{2}{1}, \frac{3}{2}, \frac{5}{3}, \frac{8}{5}, \frac{13}{8}, \frac{21}{13}, \frac{34}{21}, \ldots, \tag{2.3}$$

is the sequence of best rational-number approximants to

$$\tfrac{1}{2}(1 + \sqrt{5}) = 1.618033988749894848\ldots,$$

where “best” has the sense already mentioned. For example, the fraction

$$\frac{34}{21} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1}}}}}}}$$

equals 1.619047619047619047... and is closer to the golden mean than any fraction with denominator less than 21.

Nevertheless, the exclusive appearance of 1s in this continued fraction [1] can be used to show that, among all irrational real numbers, the golden mean is the number that is, in a specific technical sense, least well approximated by rational numbers.

[1] The continued-fraction expansion of any real quadratic algebraic number has an eventually recurring pattern in its entries, as is vividly exhibited by the two examples (1.2) and (2.2) given above.

Readers familiar with the sequence of Fibonacci numbers will recognize them in the successive denominators of (2.3), and in the numerators as well. The analogue to equation (1.3) is

$$X^2 + XY - Y^2 = \pm 1. \tag{2.4}$$

This time, if you replace the $\pm 1$ on the right-hand side of the equation by 0, you get the equation $X^2 + XY - Y^2 = 0$, whose positive real-number solutions $(X, Y)$ have the ratio $Y/X = \tfrac{1}{2}(1 + \sqrt{5})$, that is, the golden mean. And now the numerators and denominators $y$, $x$ that appear in (2.3) run through the positive solutions of (2.4). The analogue of formula (1.5) (the “analytic formula”) for the golden mean is the conditionally convergent infinite sum

$$\frac{2\log\bigl(\tfrac{1}{2}(1 + \sqrt{5})\bigr)}{\sqrt{5}} = 1 - \frac{1}{2} - \frac{1}{3} + \frac{1}{4} + \frac{1}{6} - \cdots \pm \frac{1}{n} + \cdots, \tag{2.5}$$

where the $n$ range over positive integers not divisible by 5, and the sign of $\pm 1/n$ is plus if $n$ has a remainder of $\pm 1$ when divided by 5, and minus otherwise.

What governs the choice of the plus terms and minus terms is whether or not $n$ is a quadratic residue modulo 5. Here is a brief explanation of this terminology. If $m$ is an integer, two integers $a$, $b$ are said to be congruent modulo $m$ (in symbols we write $a \equiv b \bmod m$) if the difference $a - b$ is an integral multiple of $m$; if $a$, $b$, and $m$ are positive numbers, it is equivalent to ask that $a$ and $b$ have the same “remainder” (sometimes also called “residue”) when each is divided by $m$. An integer $a$ relatively prime to $m$ is called a quadratic residue modulo $m$ if $a$ is congruent to the square of some integer, modulo $m$; otherwise it is called a quadratic nonresidue modulo $m$. So, 1, 4, 6, 9, ... are quadratic residues modulo 5, while 2, 3, 7, 8, ... are quadratic nonresidues modulo 5.

A generalization of (1.5) and (2.5) (the “analytic formula for the L-function attached to quadratic Dirichlet characters”) gives a very surprising formula for the conditionally convergent sum of terms $\pm 1/n$, where $n$ runs through positive integers relatively prime to a fixed integer and the sign of $\pm 1/n$ corresponds to whether $n$ is a quadratic residue or nonresidue modulo that integer (see Section 17).

3 Quadratic Irrationalities

The quadratic formula

$$X = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

gives the solutions (usually two) to the general quadratic equation $aX^2 + bX + c = 0$ as a rational expression of the number $\sqrt{D}$, where $D = b^2 - 4ac$ is known as the discriminant of the polynomial $aX^2 + bX + c$, or, equivalently, of the corresponding homogeneous quadratic form $aX^2 + bXY + cY^2$. This formula introduces many irrational numbers: Plato’s dialogue “Theaetetus” has the young Theaetetus credited with the discovery that $\sqrt{D}$ is irrational whenever $D$ is a natural number that is not a perfect square. The curious switch from initially perceiving an obstruction to a problem, to eventually embodying this obstruction as a number or an algebraic object of some sort that we can effectively study is repeated over and over again, in different contexts, throughout mathematics. Much later, complex quadratic irrationalities also made their appearance. Again these were not at first regarded as “numbers as such,” but rather as obstructions to the solution of problems. Nicholas Chuquet, for example, in his 1484 manuscript, Le Triparty, raised the question of whether or not there is a number whose triple is four plus its square, and he came to the conclusion that there is no such number because the quadratic formula applied to this problem yields “impossible” numbers, i.e. complex quadratic irrationalities in our terminology. [2]

[2] Rafael Bombelli, in the sixteenth century, would refer to irrational square roots, of positive or of negative numbers, as “deaf” (reminiscent of the word surd that is still in use) and as “numbers impossible to name.”

For any real quadratic (“integral”) irrationality there is a discussion along similar lines to the ones we have just given (expressions (1.1)–(1.5) for $\sqrt{2}$ and expressions (2.1)–(2.5) for $\tfrac{1}{2}(1 + \sqrt{5})$). For complex irrationalities, there is also such a theory, but with interesting twists. For one thing, we do not have anything directly comparable to continued-fraction expansions for a complex quadratic irrationality. In fact, the simple, but true, answer to the problem of how to find an infinite number of rational numbers that converge to such an irrationality is that you cannot! Correspondingly, the analogue of the Pell equation has only finitely many solutions. As a consolation, however, the appropriate “analytic formula” has a simpler sum, as we will see below.

Let $d$ be any square-free integer, positive or negative. Associated with $d$ is a particularly important number $\tau_d$, defined as follows. If $d$ is congruent to 1 mod 4 (that is, if $d - 1$ is a multiple of 4), then $\tau_d = \tfrac{1}{2}(1 + \sqrt{d})$; otherwise, $\tau_d = \sqrt{d}$. We will refer to these quadratic irrationalities $\tau_d$ as fundamental algebraic integers of degree 2. The general notion of an “algebraic integer” is defined in Section 11 of this article. An algebraic integer of degree two is simply a root of a quadratic polynomial of the form $X^2 + aX + b$ with $a$, $b$ ordinary integers. In the first case (when $d \equiv 1$ modulo 4), $\tau_d$ is a root of the polynomial $X^2 - X + \tfrac{1}{4}(1 - d)$ and in the second it is a root of $X^2 - d$. The reason special names are given to these quadratic irrationalities is that any quadratic algebraic integer is a linear combination

(with ordinary integers as coefficients) of 1 and one of these fundamental quadratic algebraic integers.
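As a quick numerical sanity check of the definition of $\tau_d$ given above, here is a minimal Python sketch. The function names `tau`, `min_poly`, and `residual` are illustrative choices, not notation from the article, and the check is done in floating point, so it only confirms the root property up to rounding error.

```python
import cmath


def tau(d):
    """Floating-point value of tau_d for a square-free integer d."""
    root = cmath.sqrt(d)  # complex square root, so negative d is allowed
    # Python's % returns a non-negative remainder, so d % 4 == 1 also
    # detects negative d congruent to 1 mod 4 (for example d = -3, -23).
    return (1 + root) / 2 if d % 4 == 1 else root


def min_poly(d):
    """Integers (a, b) such that tau_d is a root of X^2 + a*X + b."""
    if d % 4 == 1:
        return (-1, (1 - d) // 4)  # X^2 - X + (1 - d)/4
    return (0, -d)                 # X^2 - d


def residual(d):
    """|tau_d^2 + a*tau_d + b|, which should vanish up to rounding."""
    t = tau(d)
    a, b = min_poly(d)
    return abs(t * t + a * t + b)


for d in (2, 5, -1, -3, -23):
    print(d, min_poly(d), residual(d))
```

For $d = 5$ the polynomial is $X^2 - X - 1$, the familiar equation satisfied by the golden mean, and for $d = -23$ it is $X^2 - X + 6$.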

4 Rings and Fields

I think that one of the big early advances in mathematics is the now-current, universal recognition of the importance of studying the properties of collections of mathematical objects, and not just the objects in isolation. A ring $R$ of complex numbers is a collection of them that contains 1 and is closed under the operations of addition, subtraction, and multiplication. That is, if $a$, $b$ are any two numbers in $R$, $a \pm b$ and $ab$ must also be in $R$. If such a ring $R$ has the further property that it is closed under division by nonzero elements (i.e. if $a/b$ is again in $R$ whenever $a$ and $b$ are, and $b \neq 0$), then we say that $R$ is a field. (These concepts are discussed further in fields and rings and ideals.) The ring of ordinary integers, $\mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$, is our “founding example” of a ring; visibly, it is the smallest ring of complex numbers.

The collection of all real or complex numbers that are integral linear combinations of 1 and $\tau_d$ is closed under addition, subtraction, and multiplication, and so is a ring, which we denote by $R_d$. That is, $R_d$ is the set of all numbers of the form $a + b\tau_d$ where $a$ and $b$ are ordinary integers. These rings $R_d$, our first, basic, examples of rings of algebraic integers beyond that prototype, $\mathbb{Z}$, are the most important rings that are receptacles for quadratic irrationalities. Every quadratic irrational algebraic integer is contained in exactly one $R_d$.

For example, when $d = -1$ the corresponding ring $R_{-1}$, usually referred to as the ring of Gaussian integers, consists of the set of complex numbers whose real and imaginary parts are ordinary integers. These complex numbers may be visualized as the vertices of the infinite tiling of the complex plane by squares whose sides have length 1 (see Figure 1.2).

Figure 1.2. The Gaussian integers are the vertices of this lattice of squares tiling the complex plane.

When $d = -3$ the complex numbers in the corresponding ring $R_{-3}$ may be visualized as the vertices of the regular hexagonal tiling of the complex plane (see Figure 1.3).

With the rings $R_d$ in hand, we may ask ring-theoretic questions about them, and here is some of the standard vocabulary useful for this. A unit $u$ in a given ring $R$ of complex numbers is a number in $R$ whose reciprocal $1/u$ is also in $R$; a prime (or synonymously, an irreducible) element in $R$ is a nonunit that cannot be written as the product of two nonunits in $R$. A ring of complex numbers $R$ has the unique factorization property if every nonzero, nonunit, algebraic number in $R$ can be expressed as a product of prime elements in exactly one way (where two factorizations are counted as the same if one can be obtained from the other by rearranging the order in which the primes appear and multiplying them by units).

In the prototype ring $\mathbb{Z}$ of ordinary integers, the only units are $\pm 1$. The fundamental fact that any ordinary integer greater than 1 can be uniquely expressed as a product of (positive) prime numbers (that is, that $\mathbb{Z}$ enjoys the unique factorization property) is crucial for much of the number theory done with ordinary integers. That this unique factorization property for integers actually required proof was itself a hard-won realization of Gauss, who also provided its proof (see fundamental theorem of arithmetic).

It is easy to see that there are only four units in the ring $R_{-1}$ of Gaussian integers, namely $\pm 1$ and $\pm i$; multiplication by any of these units effects a symmetry of the infinite square tiling (Figure 1.2). There are only six units in the ring $R_{-3}$, namely $\pm 1$, $\pm\tfrac{1}{2}(1 + \sqrt{-3})$ and $\pm\tfrac{1}{2}(1 - \sqrt{-3})$; multiplication by any of these units results in a symmetry of the infinite hexagonal tiling (Figure 1.3).
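The unit counts just quoted for $R_{-1}$ and $R_{-3}$ can be checked by brute force. The sketch below is only an illustration under stated assumptions: it represents $a + b\tau_d$ as the integer pair `(a, b)`, uses exact integer arithmetic, and searches a small box of coefficients for elements that have a reciprocal inside the same box. The helper names (`tau_square`, `multiply`, `units_in_box`) and the box size are hypothetical choices, and the box is assumed large enough to contain every unit together with its reciprocal, which is the case for these two rings.

```python
def tau_square(d):
    """Integers (p, q) with tau_d**2 = p + q*tau_d."""
    if d % 4 == 1:
        return ((d - 1) // 4, 1)  # tau^2 = tau + (d - 1)/4
    return (d, 0)                 # tau^2 = d


def multiply(u, v, d):
    """Product of a + b*tau_d and c + e*tau_d, as a coefficient pair."""
    a, b = u
    c, e = v
    p, q = tau_square(d)
    # (a + b*tau)(c + e*tau) = a*c + (a*e + b*c)*tau + b*e*tau^2
    return (a * c + b * e * p, a * e + b * c + b * e * q)


def units_in_box(d, bound=3):
    """Elements with |a|, |b| <= bound whose reciprocal is also in the box."""
    box = [(a, b)
           for a in range(-bound, bound + 1)
           for b in range(-bound, bound + 1)]
    return sorted(u for u in box
                  if any(multiply(u, v, d) == (1, 0) for v in box))


print(units_in_box(-1))  # coefficient pairs of the units found in R_{-1}
print(units_in_box(-3))  # coefficient pairs of the units found in R_{-3}
```

In this encoding the pair `(0, 1)` stands for $\tau_d$ itself, so for $d = -3$ the pairs `(0, 1)` and `(1, -1)` correspond to $\tfrac{1}{2}(1 + \sqrt{-3})$ and $\tfrac{1}{2}(1 - \sqrt{-3})$.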

Figure 1.3. The elements of the ring $R_{-3}$ are the vertices of this lattice of hexagons tiling the complex plane.

Fundamental to understanding the arithmetic of $R_d$ is the following question: which ordinary prime numbers $p$ remain prime in $R_d$ and which ones factorize into products of primes in $R_d$? We will see shortly that if a prime number does factorize in $R_d$, it must be expressible as the product of precisely two prime factors. For example, in the ring of Gaussian integers, $R_{-1}$, we have the factorizations

$$
\begin{aligned}
2 &= (1 + i)(1 - i), \\
5 &= (1 + 2i)(1 - 2i), \\
13 &= (2 + 3i)(2 - 3i), \\
17 &= (1 + 4i)(1 - 4i), \\
29 &= (2 + 5i)(2 - 5i), \\
&\;\;\vdots
\end{aligned}
$$

where all the factors in brackets above are prime in the ring of Gaussian integers. Let us say that an odd prime $p$ splits in $R_{-1}$ if it factorizes into a product of at least two primes and remains prime if it does not do so. As we shall soon see, the officially agreed-upon definitions of splitting and remaining prime for more general rings of algebraic integers (even ones of the form $R_d$) are worded slightly, but very significantly, differently from the way we have just defined these concepts in the ring $R_{-1}$ of Gaussian integers. (Note that we have excluded the prime $p = 2$ from the above dichotomy. This is because 2 ramifies in $R_{-1}$; for a discussion of this concept see Section 7 below.) In any event, there is an elementary computable rule that tells us, for any $R_d$, which primes $p$ split and which remain prime in this agreed sense. The rule depends upon the residue of $p$ modulo $4d$: the reader is invited to guess it for the ring of Gaussian integers given the data just displayed above. In general, an elementary computable rule that says which primes split and which do not in a ring of algebraic integers such as $R_d$ is referred to as a splitting law for the ring of algebraic integers in question.

5 The Rings $R_d$ of Quadratic Integers

There is a very important “symmetry,” or automorphism, defined on the ring $R_d$. It sends $\sqrt{d}$ to $-\sqrt{d}$, keeps all ordinary integers fixed, and more generally, for rational numbers $u$ and $v$, it sends $\alpha = u + v\sqrt{d}$ to what we may call its algebraic conjugate $\alpha' = u - v\sqrt{d}$. (The word “algebraic” is to remind you that this is not necessarily the same as the complex-conjugate symmetry of the complex numbers!)

You can immediately work out the formulas for this algebraic conjugation operation on the fundamental quadratic irrationalities $\tau_d$: if $d$ is not congruent to 1 modulo 4, then $\tau_d = \sqrt{d}$, so obviously $\tau_d' = -\tau_d$, while if $d$ is congruent to 1 modulo 4, then $\tau_d = \tfrac{1}{2}(1 + \sqrt{d})$ and $\tau_d' = \tfrac{1}{2}(1 - \sqrt{d}) = 1 - \tau_d$.

This symmetry $\alpha \to \alpha'$ respects all algebraic formulas. For example, to work out the algebraic conjugate of a polynomial expression like $\alpha\beta + 2\gamma^2$, where $\alpha$, $\beta$, and $\gamma$ are numbers in $R_d$, you just replace each individual number by its algebraic conjugate, obtaining the expression $\alpha'\beta' + 2\gamma'^2$.

The most telling integer quantity attached to a number $\alpha = x + y\tau_d$ in $R_d$ is its norm $N(\alpha)$, which is defined to be the product $\alpha\alpha'$. This equals $x^2 - dy^2$ when $\tau_d = \sqrt{d}$ and $x^2 + xy - \tfrac{1}{4}(d - 1)y^2$ when $\tau_d = \tfrac{1}{2}(1 + \sqrt{d})$. The norm turns out to be multiplicative, meaning that $N(\alpha\beta) = N(\alpha)N(\beta)$, as you can directly check by multiplying out the formula for the norm of each factor and comparing with the norm of the product. This gives us a useful tactic for trying to factorize algebraic numbers in $R_d$, and offers criteria for determining whether a number $\alpha$ in $R_d$ is a unit, and whether it is prime in $R_d$. In fact, an element $\alpha \in R_d$ is a unit if and only if $N(\alpha) = \alpha\alpha' = \pm 1$; in other words, the units are given by the integral solutions to the equations

$$X^2 - dY^2 = \pm 1 \tag{5.1}$$

or

$$X^2 + XY - \tfrac{1}{4}(d - 1)Y^2 = \pm 1 \tag{5.2}$$

following the two cases. Here is the proof of this. If $\alpha = x + y\tau_d$ is a unit in $R_d$, then its reciprocal, $\beta = 1/\alpha$, must also be in $R_d$, and, of course, we have $\alpha\beta = 1$. Applying the norm to both sides of this equation and using the multiplicative property of the norm discussed above, we see that $N(\alpha)$ and $N(\beta)$ are reciprocal ordinary integers, and therefore they are both either equal to $+1$ or $-1$. This shows that $(x, y)$ is a solution to whichever of equation (5.1) or (5.2) is appropriate. In the other direction, if $N(\alpha) = \alpha\alpha' = \pm 1$, then the reciprocal of $\alpha$ is simply $\pm\alpha'$. This is in $R_d$ so $\alpha$ is indeed a unit in $R_d$.

These homogeneous quadratic forms, the left-hand sides of equations (5.1) and (5.2) (which generalize formulas (1.3) and (2.4)), play an important role; let us refer to whichever of them is relevant to $R_d$ as the fundamental quadratic form for $R_d$, and its discriminant $D$ the fundamental discriminant. ($D$ is equal to $d$ if $d$ is congruent to 1 modulo 4 and to $4d$ otherwise.) When $d$ is negative there are only finitely many units (if $d < -3$ the only ones are $\pm 1$) but when $d$ is positive, so that $R_d$ consists entirely of real numbers, there are infinitely many. The ones that are greater than 1 are powers of a smallest such unit, $\varepsilon_d$, and this is called the fundamental unit.

For example, when $d = 2$ the fundamental unit, $\varepsilon_2$, is $1 + \sqrt{2}$, and when $d = 5$ it is the golden mean, $\varepsilon_5 = \tfrac{1}{2}(1 + \sqrt{5})$. Since any power of a unit is again a unit, we immediately have a machine for producing infinitely many units from any single one. For example, taking powers of the golden mean, we get

$$
\begin{aligned}
\varepsilon_5 &= \tfrac{1}{2}(1 + \sqrt{5}), & \varepsilon_5^2 &= \tfrac{1}{2}(3 + \sqrt{5}), \\
\varepsilon_5^3 &= 2 + \sqrt{5}, & \varepsilon_5^4 &= \tfrac{1}{2}(7 + 3\sqrt{5}), \\
\varepsilon_5^5 &= \tfrac{1}{2}(11 + 5\sqrt{5}),
\end{aligned}
$$

all of which are units in $R_5$. The study of these fundamental units was already under way in the twelfth century in India, but in general their detailed behavior as $d$ varies still holds mysteries for us today. For example, there is a deep theorem of Hua (1942) that tells us that $\varepsilon_d < (4e^2 d)^{\sqrt{d}}$ (for a proof of it along with a historical discussion of such estimates, see Chapters 3 and 8 in Narkiewicz (1973)). There are examples of $d$ that come close to attaining that bound, but we still do not know whether or not there is a positive number $\eta$ and an infinity of squarefree $d$ for which $\varepsilon_d > d^{d^{\eta}}$. (The answer to this question would be yes if, for example, there were an infinity of $R_d$ satisfying the unique factorization property! This follows from a famous theorem of Brauer (1947) and Siegel (1935); for a proof of the Brauer–Siegel theorem, see Theorem 8.2 of Chapter 8 in Narkiewicz (1973), or see Lang (1970).)

6 Binary Quadratic Forms and the Unique Factorization Property

The principle of unique factorization is an all-important fact for the ring of ordinary integers $\mathbb{Z}$. The question of whether this principle does or does not hold for a given ring $R_d$ is central to the theory. There are helpful, analyzable, obstructions to the validity of unique factorization in $R_d$. These obstructions, in turn, connect with profound arithmetic issues, and have become the focus of important study in their own right. One such mode of expressing the obstruction to unique factorization is already prominent in Gauss’s Disquisitiones Arithmeticae (1801), in which much of the basic theory of $R_d$ was already laid down.

This “obstruction” has to do with how many “essentially different” binary quadratic forms $aX^2 + bXY + cY^2$ there are with discriminant equal to the fundamental discriminant $D$ of $R_d$. (Recall that the discriminant of $aX^2 + bXY + cY^2$ is $b^2 - 4ac$, and that $D$ equals $4d$ unless $d \equiv 1 \bmod 4$, in which case it equals $d$.)

In order to define a binary quadratic form $aX^2 + bXY + cY^2$ of discriminant $D$, what you need to provide is simply a triplet of coefficients $(a, b, c)$ such that $b^2 - 4ac = D$. Given such a form, one can use it to define other ones. For example, if we make a small linear change of the variables, replacing $X$ by $X - Y$ and keeping $Y$ fixed, then we get $a(X - Y)^2 + b(X - Y)Y + cY^2$, which simplifies to $aX^2 + (b - 2a)XY + (c - b + a)Y^2$. That is, we get a new binary quadratic form whose triplet of coefficients is $(a, b - 2a, c - b + a)$, and which (as can easily be checked) has the same discriminant $D$. We can “reverse” this change by replacing $X$ by

$X + Y$ and keeping $Y$ fixed. If we do this reversal and perform the corresponding simplification then we get back our original binary quadratic form. Because of this reversibility, these two quadratic forms take exactly the same set of integer values as $X$ and $Y$ vary: it is therefore reasonable to think of them as equivalent.

More generally, then, one says that two binary quadratic forms are equivalent if one can be turned into the other (or minus the other) by any “reversible” linear change of variables with integer coefficients. That is, one chooses integers $r$, $s$, $u$, $v$ such that $rv - su = \pm 1$, replaces $X$ and $Y$ by the linear combinations $X' = rX + sY$, $Y' = uX + vY$, and simplifies the resulting expression to get a new triplet of coefficients. The condition $rv - su = \pm 1$ guarantees that by a similar operation we can get back to our original binary quadratic form, and also that the new binary quadratic form has the same discriminant $D$ as the old one. So when we talk of “essentially different” binary quadratic forms of discriminant $D$ we mean that we cannot turn one into the other by this kind of change of variables.

Here is the surprising obstruction to unique factorization that Gauss discovered.

The unique factorization principle is valid in $R_d$ if and only if every homogeneous quadratic form $aX^2 + bXY + cY^2$ with discriminant equal to the fundamental discriminant of $R_d$ is equivalent to the fundamental quadratic form of $R_d$.

Furthermore, the collection of inequivalent quadratic forms whose discriminant is the fundamental discriminant of $R_d$ expresses in concrete terms the degree to which $R_d$ “enjoys unique factorization.”

If you have never seen this theory of binary quadratic forms before, try your hand at working with quadratic forms in the case where $D = -23$. The idea is to start with some particular quadratic form $aX^2 + bXY + cY^2$ of your choice with discriminant $D = b^2 - 4ac = -23$. Then, using a sequence of carefully chosen linear changes of variables you reduce the size of the coefficients $a$, $b$, $c$ until you can go no further. Eventually you should end up with one of the two (inequivalent) quadratic forms that there are with discriminant $-23$: the fundamental form $X^2 + XY + 6Y^2$, or the form $2X^2 + XY + 3Y^2$. For example, can you see that the binary quadratic form $X^2 + 3XY + 8Y^2$ is equivalent to $X^2 + XY + 6Y^2$?

This type of exercise offers a small hint of the role that the Euclidean algorithm will play in the eventual theory. As you might expect from the venerability of these ideas, elegant streamlined methods have been discovered for making such calculations. Nevertheless, it is an open secret that any working mathematician, contemporary or ancient, engaged in this subject or nearby subjects, has done a myriad of straightforward simple hand computations along the lines of the above exercise.

If you try a few examples of this exercise, as I hope you do, here is one way of organizing your calculations. First, find a simple reversible linear change of variables to turn your form into an equivalent one with $a, b, c \geq 0$. (You may also have to multiply the whole form by $-1$.) The cleanest way of writing down all binary quadratic forms given by triplets $(a, b, c)$ of discriminant $-23$ is to list the triplets in increasing order of $b$, which will now be an odd positive integer. For each value of $b$ you can then choose $a$ and $c$ in such a way that their product is $\tfrac{1}{4}(b^2 + 23)$. At this point the aim is to build up a repertoire of moves that tend to decrease $b$ (which will keep $a$ and $c$ within bounds as well). A big clue, and aid, here is that for any pair of relatively prime integers $x$, $y$, if you evaluate your quadratic form $aX^2 + bXY + cY^2$ at $(X, Y) = (x, y)$ to get the integer $a' = ax^2 + bxy + cy^2$, you can find, for appropriate $b'$ and $c'$, a quadratic form $a'X^2 + b'XY + c'Y^2$ equivalent to yours, with first coefficient $a'$. So, one tactic is to look for small integers represented by your quadratic form. Also the “example” linear change of variables $X \to X - Y$, $Y \to Y$ will lead you to be able to reduce the coefficient $b$ to an integer smaller than $2a$. Can you check that $X^2 + XY + 6Y^2$ and $2X^2 + XY + 3Y^2$ are inequivalent?

Now, as we have just discussed, it follows from the general theory that $R_{-23}$ does not have the unique factorization property. We can also see this directly. For example,

$$\tau_{-23} \cdot \tau_{-23}' = 2 \cdot 3,$$

and all four of the factors in this equation are irreducible in $R_{-23}$. To be a faithful companion, I should at this point give at least a hint at what connection there might be between this specific

“failure of unique factorization” and the previous discussion. It may become a bit clearer in the next paragraph, but the underlying tension in the equation $\tau_{-23} \cdot \tau_{-23}' = 2 \cdot 3$ is that all the factors in our ring are prime: we are missing any elements in our ring $R_{-23}$ that could factorize it further. We lack, for example, elements that play the role of the greatest common divisor of factors of this equation. The general theory regarding these matters (which we are not entering into here, but see Euclid’s algorithm) tells us that what is missing is some element $\gamma$ in $R_{-23}$ that is both a linear combination of the numbers $\tau_{-23}$ and 2 (with coefficients in the ring $R_{-23}$) and also a common divisor of $\tau_{-23}$ and 2 in the ring $R_{-23}$, i.e. such that $\tau_{-23}/\gamma$ and $2/\gamma$ are both in $R_{-23}$. There is no such element, for its norm must divide $N(\tau_{-23}) = 6$ and $N(2) = 4$, and therefore be equal to 2, which can easily be shown to be impossible. But we are interested, rather, in the phenomenon that inequivalence of certain binary quadratic forms will indeed show this, so let us go on.

First, check that any linear combination

$$\alpha \cdot \tau_{-23} + \beta \cdot 2$$

with $\alpha$, $\beta$ elements of $R_{-23}$ can also be written as $u \cdot \tau_{-23} + v \cdot 2$, where $u$ and $v$ are ordinary integers. Now compute the binary quadratic form given by systematically taking the norms of these linear combinations, and viewing these norms as functions of the integer coefficients $u$, $v$:

$$N(u \cdot \tau_{-23} + v \cdot 2) = (\tau_{-23}u + 2v)(\tau_{-23}'u + 2v) = 6u^2 + 2uv + 4v^2.$$

Viewing the $u$ and the $v$ as variables, and dubbing them $U$ and $V$ to emphasize their status as variables, we can say that the norm quadratic form obtained from the collection of linear combinations of $\tau_{-23}$ and 2 is

$$6U^2 + 2UV + 4V^2 = 2 \cdot (3U^2 + UV + 2V^2).$$

Now suppose that, contrary to fact, there were a common divisor, $\gamma$, as above; in particular, the multiples of $\gamma$ in the ring $R_{-23}$ would then be precisely the linear combinations of the numbers $\tau_{-23}$ and 2. We would then have another way of describing those linear combinations; namely, for any pair of ordinary integers $(u, v)$ there would be a pair of ordinary integers $(r, s)$ such that

$$u \cdot \tau_{-23} + v \cdot 2 = \gamma \cdot (r\tau_{-23} + s) = r\gamma\tau_{-23} + s\gamma.$$

Taking norms, as above, we would get

$$N(\gamma \cdot (r\tau_{-23} + s)) = N(r\gamma\tau_{-23} + s\gamma) = N(\gamma)(6r^2 + rs + s^2).$$

Again, thinking of $r$ and $s$ as variables and renaming them $R$ and $S$ we would have the corresponding norm quadratic form:

$$N(\gamma) \cdot (6R^2 + RS + S^2) = 2 \cdot (6R^2 + RS + S^2).$$

Given the above facts (dependent, of course, on the contrary-to-fact hypothesis that there is a $\gamma$ as above) the key idea is that there would be linear changes of variables from $(U, V)$ to $(R, S)$ and back that would establish an equivalence between the two quadratic forms $2 \cdot (3U^2 + UV + 2V^2)$ and $2 \cdot (6R^2 + RS + S^2)$. But these quadratic forms are not equivalent! Their inequivalence therefore shows that the putative $\gamma$ does not exist and factorization in the ring $R_{-23}$ is not unique.

7 Class Numbers, and the Unique Factorization Property

In the previous section we saw that the collection of inequivalent quadratic forms of discriminant equal to the fundamental discriminant provides us with an obstruction to unique factorization. Somewhat later, a more articulated version of this obstruction arose, known as the ideal class group $H_d$ of $R_d$. As its name implies, to describe this we must use the vocabulary of ideals and groups. For a general discussion of these concepts, see fundamental concepts, groups and rings and ideals.

A subset $I$ of $R_d$ is an ideal if it has the following closure properties: if $\alpha$ belongs to $I$, so do $-\alpha$ and $\tau_d\alpha$, and if $\alpha$ and $\beta$ belong to $I$, so does $\alpha + \beta$. (The first and third properties imply together that any integer combination of $\alpha$ and $\beta$ belongs to $I$.) The basic example of such an ideal is the set of all multiples of some fixed, nonzero element $\gamma$ of $R_d$, where by a multiple of $\gamma$ we mean the product of $\gamma$ and an element of $R_d$. We denote this set tersely as $(\gamma)$, or, slightly more expressively, as $\gamma \cdot R_d$. An ideal of this sort, i.e. one that can be expressed as the set of all multiples of a single nonzero element $\gamma$, is called a principal ideal. For example, the ring $R_d$ itself is an ideal (it consists, after all, of all linear combinations of 1 and $\tau_d$) and is even a principal ideal: in our laconic terminology, it can

be denoted $(1) = 1\cdot R_d = R_d$. Strictly speaking, the singleton $\{0\}$ is also an ideal, but the ones that will interest us are the nonzero ideals.

As a direct counterpart to the obstruction principle involving binary quadratic forms that was described in the previous section, we have the following obstruction principle involving ideals.

The unique factorization principle is valid in $R_d$ if and only if every ideal in $R_d$ is principal.

Reflecting on this, you can get a sense of why the word ideal might have been chosen. Every principal ideal in $R_d$ is of the form $\gamma\cdot R_d$ for some number $\gamma$ in $R_d$ (which is uniquely determined apart from multiplication by units), but sometimes there are more general ideals. These arise if you ever have two elements of $R_d$ (think of $\tau_{-23}$ and 2, as in the previous section) such that the set of all their integer combinations cannot be expressed as the set of multiples of some fixed number $\gamma$ in $R_d$. This phenomenon is a sign that we may be missing numbers in $R_d$ that provide fine enough factorizations to make the arithmetic in $R_d$ as smooth going as one might hope for. Just as a principal ideal $\gamma\cdot R_d$ corresponds to the number $\gamma$, ideals of this more general kind (think of the set of all integer combinations of $\tau_{-23}$ and 2) can be thought of as corresponding to "ideal numbers" that should, "by rights," be present in our ring, but happen not to be.

Once we think of ideals as standing for ideal numbers it makes some sense to try to multiply them: if $I, J$ are two ideals in $R_d$, we let $I\cdot J$ denote the set of all finite sums of products $\alpha\cdot\beta$ where $\alpha$ is in $I$ and $\beta$ is in $J$. The product of two principal ideals $(\gamma_1)\cdot(\gamma_2)$ is the principal ideal $(\gamma_1\cdot\gamma_2)$ so, just as one would hope, multiplication of principal ideals corresponds to multiplication of the corresponding numbers. Multiplication of any ideal $I$ by the ideal $(1)$ leaves $I$ unchanged: $(1)\cdot I = I$; we therefore refer to the ideal $(1)$ as the unit ideal. With this new notion of multiplication of ideals we can now give the general definition of what it means for a prime number $p$ to split or to remain prime in a ring $R_d$: the definition we promised in Section 4.

The idea behind the definition is to use multiplication of ideals rather than of numbers. So if we are thinking about a prime $p$, the first thing we do is turn our attention to the principal ideal $(p)$ in $R_d$. If this can be factorized as a product of two different ideals (not necessarily principal ideals, this is the whole point) in $R_d$, and if neither of these is the unit ideal $(1) = R_d$, then we say that $p$ splits in $R_d$. If, on the other hand, no factorization of the ideal $(p)$ can be made without one of the factors being the ideal $(1) = R_d$, then we say that $p$ remains prime in $R_d$. There is also a third important definition: if the principal ideal $(p)$ can be expressed as the square of another ideal $I$, then we say that $p$ ramifies in $R_d$. Continuing with the momentum of this definition, we may say that an ideal $P$ is a prime ideal if $P$ cannot be "factorized" as the product of two ideals neither of which is the unit ideal. This definition makes sense whether or not $P$ is principal, so we are subtly shifting our attention from the multiplicative arithmetic of the numbers in $R_d$ to the ideals.

By definition, two ideals are in the same ideal class if when you multiply each by an appropriate principal ideal you get the same ideal as a result. This is a natural equivalence relation on ideals. It is also one that respects products, meaning that if $I$ and $J$ are two ideals, then the ideal class of their product $I\cdot J$ depends only on the ideal classes of $I$ and $J$. (In other words, if $I'$ is in the same ideal class as $I$ and $J'$ is in the same ideal class as $J$, then $I'\cdot J'$ is in the same ideal class as $I\cdot J$.) We can therefore say what we mean by multiplication of ideal classes: to multiply two classes, pick an ideal from each, multiply those, and take the ideal class of the resulting product. The set $H_d$ of ideal classes of $R_d$, given this operation of multiplication, forms an abelian group, in the sense that the multiplication law we have just defined is associative and commutative, and there are inverses. The identity element is the class of the principal ideal $R_d$ itself. This group $H_d$, known as the ideal class group, directly measures the extent to which the ideals of the ring $R_d$ are principal: roughly speaking it is what you get if you take the multiplicative structure of all ideals and "divide out" by the principal ones.

As was mentioned in Section 6, there is a close connection between ideal classes and binary quadratic forms. To begin to see this, take an ideal $I$ of $R_d$ and write it as the set of all integer combinations of two elements $\alpha, \beta$ of $R_d$. Then consider the norm function on the elements of $I$, that is,

\[ N(x\alpha + y\beta) = (x\alpha + y\beta)(x\bar{\alpha} + y\bar{\beta}) = \alpha\bar{\alpha}\,x^2 + (\alpha\bar{\beta} + \bar{\alpha}\beta)\,xy + \beta\bar{\beta}\,y^2. \]

This is a binary quadratic form in the variable coefficients $x$ and $y$. If you start with a different choice of $\alpha, \beta$ that generate $I$ you get a different form, but the two forms are multiples of two forms with discriminant $D$ that are equivalent to one another. Even better, the equivalence class of these forms depends only on the ideal class of $I$.

It can be shown that there are only a finite number of distinct ideal classes of $R_d$; that is, the ideal class group $H_d$ is finite. The number of its elements is denoted $h_d$ and called the class number of $R_d$. So, the obstruction to unique factorization of $R_d$ is given by the nontriviality of the group $H_d$; equivalently, unique factorization holds for $R_d$ if and only if its class number is 1. But whether or not $H_d$ is trivial, its detailed group-theoretic structure is profoundly related to the arithmetic of $R_d$.

The class number enters into the generalizations of formulas (1.5) and (2.5) of Section 1; that is, the analytic formulas we alluded to in that section. These formulas represent just the beginning of one of the ongoing chapters of our subject, and form a bridge between the world of discrete arithmetical issues and that of calculus, infinite series, volumes of spaces, all of which can be attacked by the methods of complex analysis (see fundamental concepts, holomorphic functions). Here is a sample of them.

(1) If $d > 0$ is a square-free integer and $D$ is either $d$ or $4d$ according to whether $d$ is congruent to 1 modulo 4 or not, then

\[ h_d\cdot\frac{\log\varepsilon_d}{\sqrt{D}} = \sum_{n>0} \pm\frac{1}{n}, \]

where the integers $n$ run through those that are relatively prime to $D$ and the signs $\pm$ are chosen in a way that depends only on the residue class of $n$ modulo $D$.

(2) If $d < 0$ we have a somewhat simpler formula: there is no fundamental unit $\varepsilon_d$ in $R_d$ to contend with, but when $d = -1$ or $-3$, there are more roots of unity than merely $\pm 1$. If $w_d$ denotes the number of roots of unity in $R_d$, then $w_{-1} = 4$, $w_{-3} = 6$ and otherwise $w_d = 2$, and then one has a formula of the following type:

\[ \frac{h_d}{w_d\sqrt{D}} = \sum_{n>0} \pm\frac{1}{n}. \]

As $d$ tends to $-\infty$ the class number $h_d$ tends to infinity.

We have effective lower bounds for the growth of $h_d$ but these lower bounds are probably still far from the actual growth (cf. Goldfeld 1985). The effective lower bounds that are known are exceedingly weak. They follow, however, from beautiful work of Goldfeld, and Gross and Zagier: for every real number $r < 1$ there is a computable constant $C(r)$ such that $h_d > C(r)\,(\log|D|)^r$. Here is a sample:

\[ h_d > \frac{1}{55}\prod_{p \mid D}\left(1 - \frac{2\sqrt{p}}{p+1}\right)\cdot\log|D| \]

if $(D, 5077) = 1$.

It is a striking lacuna in our theory that, even today, nobody knows how to prove that there are infinitely many values of $d > 0$ for which $R_d$ enjoys the unique factorization property, particularly since we expect that more than three-quarters of them do! Our expectations are even more precise than that, thanks to Henri Cohen and Hendrik Lenstra, who make use of certain probabilistic expectations (now known as the Cohen–Lenstra heuristics) to conjecture that the density of positive fundamental discriminants of class number 1 among all positive fundamental discriminants is 0.75446....

8 The Elliptic Modular Function and the Unique Factorization Property

A different obstruction to unique factorization in $R_d$ is available when $d$ is negative. Now $R_d$ may be thought of as a lattice in the complex plane (see Figure 1.4), which makes a wonderful tool available for us: the classical elliptic modular function of Klein,

\[ j(z) = e^{-2\pi i z} + 744 + 196\,884\,e^{2\pi i z} + 21\,493\,760\,e^{4\pi i z} + 864\,299\,970\,e^{6\pi i z} + \cdots. \tag{8.1} \]
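As a brief aside on the series (8.1), the terms that are written out can simply be summed numerically. The following is a minimal sketch rather than a serious implementation: the helper names `tau` and `j_truncated` are hypothetical, and the convention assumed for $\tau_d$ (namely $(1+\sqrt{d})/2$ when $d$ is congruent to 1 modulo 4 and $\sqrt{d}$ otherwise) is an assumption taken from the way $\tau_{-23}$ is used earlier.

```python
import cmath
import math

# Coefficients of q^0, q^1, q^2, q^3 in (8.1); the leading term is 1/q.
J_COEFFS = (744, 196884, 21493760, 864299970)


def tau(d: int) -> complex:
    """tau_d for a negative square-free d (assumed convention, see lead-in)."""
    if d >= 0:
        raise ValueError("this sketch only handles negative d")
    root = 1j * math.sqrt(-d)
    # Python's % returns a non-negative remainder, so -7 % 4 == 1.
    return (1 + root) / 2 if d % 4 == 1 else root


def j_truncated(z: complex) -> complex:
    """Sum only the displayed terms of (8.1), with q = exp(2*pi*i*z)."""
    if z.imag <= 0:
        raise ValueError("the series converges only for Im(z) > 0")
    q = cmath.exp(2j * math.pi * z)
    return 1 / q + sum(c * q**n for n, c in enumerate(J_COEFFS))


if __name__ == "__main__":
    for d in (-1, -7, -163):
        print(d, j_truncated(tau(d)).real)
```

Because the omitted terms shrink rapidly as the imaginary part of $z$ grows, even this short truncation lands close to the known integer values $j(i) = 1728$ and $j(\tau_{-7}) = -3375$; double-precision floats are too coarse to resolve the last digits for $d = -163$, where only the relative error is meaningful.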

The function $j$ of (8.1), also colloquially referred to as the "j-function," converges for complex numbers $z = x + iy$ with $y > 0$. If $z = x + iy$ and $z' = x' + iy'$ are two such complex numbers, then $j(z) = j(z')$ if and only if the lattice generated by $z$ and 1 in the complex plane is the same, up to multiplication by a nonzero complex number, as the lattice generated by $z'$ and 1 (or, equivalently, $z' = (az + b)/(cz + d)$, where $a, b, c, d$ are ordinary integers such that $ad - bc = 1$). We can paraphrase this by saying that the value $j(z)$ depends only on, and characterizes, the lattice generated by $z$ and 1 up to such scaling.

It turns out (by a theorem of Schneider) that if an algebraic number $\alpha = x + iy$ with $y > 0$ has the property that $j(\alpha)$ is also algebraic, then $\alpha$ is a (complex) quadratic irrationality; and the converse is also true. In particular, since $\alpha = \tau_d$ is such a complex quadratic irrationality when $d$ is negative, we have that the value, $j(\tau_d)$, of the j-function on $\tau_d$ is an algebraic number; in fact, it is an algebraic integer. This will be of some importance for our story. First, since the ring $R_d$ as situated in the complex plane is simply the lattice generated by $\tau_d$ and 1, it follows from the previous paragraph that this value $j(\tau_d)$ will be the same if we replace $\tau_d$ by any element $\alpha$ of $R_d$, as long as the lattice generated by $\alpha$ and 1 is the entire ring $R_d$. More importantly, $j(\tau_d)$ is an algebraic integer of degree roughly comparable with the class number of $R_d$. In particular, it is an ordinary integer if and only if the ring $R_d$ has the unique factorization property. (This result is one of the great applications of a classical theory known as complex multiplication.) In brief, here is yet another answer to the question of when the unique factorization principle holds for $R_d$ when $d$ is negative: if $j(\tau_d)$ is an ordinary integer, the answer is yes; otherwise it is no.

The search for the full list of negative values of $d$ for which $R_d$ has the unique factorization property makes a marvelous tale: there are precisely nine values of $d$ for which it occurs (see below), but for over two decades number theorists, while knowing these nine, could prove only that there were no more than ten. The history of how the nonexistence of a possible tenth value of $d$ was established, and reestablished, is one of the thrilling chapters in our subject. K. Heegner, in an article published in 1952, provided what he claimed was a proof of the nonexistence of the possible tenth value of $d$. [...] mathematicians of the time. His paper and his purported proof were largely forgotten until the late 1960s, when the nonexistence of the tenth field was established (to the mathematical community's satisfaction) by Stark (1967) and independently, via a different method, by Baker (1971). It was only then that mathematicians took a second and closer look at Heegner's original article and discovered that he had indeed proven exactly what he claimed. Moreover, his proof offered an elegant direct conceptual road to an understanding of the underlying issue.

Here are the nine values of $d$:

\[ d = -1, -2, -3, -7, -11, -19, -43, -67, -163. \]

And here are the corresponding nine values of $j(\tau_d)$:

\[ j(\tau_d) = 2^6 3^3,\ 2^6 5^3,\ 0,\ -3^3 5^3,\ -2^{15},\ -2^{15} 3^3,\ -2^{18} 3^3 5^3,\ -2^{15} 3^3 5^3 11^3,\ -2^{18} 3^3 5^3 23^3 29^3. \]

As Stark once pointed out, if, for some of these values of $d$, you simply "plug" $\tau_d$ into the power series expansion for $j$, you get rather surprising formulas. For example, when $d = -163$, then

\[ e^{-2\pi i \tau_d} = -e^{\pi\sqrt{163}} \]

is the first term of the power series for $j(\tau_{-163})$ (see formula (8.1)). Since $j(\tau_{-163}) = -2^{18} 3^3 5^3 23^3 29^3$ and since all the terms $e^{2\pi i n \tau_d}$ ($n > 0$) that appear in the power series for the j-function are relatively small, we find that $e^{\pi\sqrt{163}}$ is incredibly close to an integer. Indeed, it is $2^{18} 3^3 5^3 23^3 29^3 + 744 + \cdots$, which works out as $262\,537\,412\,640\,768\,744 - \epsilon$, where the error term $\epsilon$ is less than $7.5 \times 10^{-13}$.

9 Representations of Prime Numbers by Binary Quadratic Forms

It happens more often than you might at first expect that difficult and/or somewhat artificial problems about ordinary integers can be translated into natural and tractable problems about larger rings of algebraic integers. My favorite elementary example of this type is the theorem due to Fermat that if a prime number $p$ may be expressed as a sum of two squares, $p = a^2 + b^2$ with 0

not divisible by $p$. Then define $\sigma_p(a)$ to be $+1$ if $a$ is a quadratic residue modulo $p$, that is, if $a$ is congruent to the square of an integer modulo $p$, and $-1$ if not. The simple trigonometric sums of (1.1) and (2.1) generalize to quadratic Gauss sums:

\[ \pm i^{(p-1)/2}\sqrt{p} = \zeta_p + \sigma_p(2)\zeta_p^2 + \sigma_p(3)\zeta_p^3 + \cdots + \sigma_p(p-2)\zeta_p^{p-2} + \sigma_p(p-1)\zeta_p^{p-1}. \tag{13.2} \]

This formula is not too hard to prove, apart from determining which sign is correct in the initial $\pm$, but after considerable efforts Gauss managed to work this out too. To see the connection between, say, formula (2.1) and (13.2) note that when $p = 5$, the left-hand side of (13.2) is $\sqrt{5}$ and the right-hand side is

\[ \zeta_5 - \zeta_5^2 - \zeta_5^{-2} + \zeta_5^{-1} = 2\cos\tfrac{2}{5}\pi - 2\cos\tfrac{4}{5}\pi. \]

As for the circular unit $c_p$, it is defined to be

\[ \prod_{a=1}^{(p-1)/2} (\zeta_p^a - \zeta_p^{-a})^{\sigma_p(a)} = \prod_{a=1}^{(p-1)/2} \sin(2\pi a/p)^{\sigma_p(a)}, \]

and this leads to further formulas. For example, when $p = 5$, we have $\varepsilon_p = \tau_5 = \tfrac{1}{2}(1 + \sqrt{5})$, and since $h_5 = 1$, formula (1.4) for $p = 5$ tells us that

\[ \frac{1+\sqrt{5}}{2} = \frac{\zeta_5 - \zeta_5^{-1}}{\zeta_5^2 - \zeta_5^{-2}} = \frac{\sin(2\pi/5)}{\sin(\pi/5)}. \]

14 The Degree of an Algebraic Number

If $\Theta$ is an algebraic integer that is also a rational number, then $\Theta$ is an "ordinary" integer. Here is the proof of this fact. If $\Theta$ is a rational number, then we may write $\Theta = C/D$ as a fraction in lowest terms. If $\Theta$ is also an algebraic integer, then it is the root of a monic polynomial with rational integer coefficients, $\Theta^n + a_1\Theta^{n-1} + \cdots + a_n$, so we have an equation

\[ (C/D)^n + a_1(C/D)^{n-1} + \cdots + a_{n-1}(C/D) + a_n = 0. \]

Multiplying through by $D^n$ we get

\[ C^n + a_1 C^{n-1}D + \cdots + a_{n-1}CD^{n-1} + a_n D^n = 0, \]

where all terms are (ordinary) integers, and all but the first one are divisible by $D$. If $D > 1$ then it has some prime factor $p$, so all terms apart from the first are also divisible by $p$. Since the terms add up to zero, it follows that $p$ divides $C^n$, which implies that $p$ divides $C$, which contradicts the assertion that the fraction $C/D$ is in its lowest terms. This contradiction shows that $D = 1$, so that $\Theta = C$ is an ordinary integer. As the reader may like to verify, this fact implies the result attributed to Theaetetus above, that $\sqrt{A}$ is irrational if and only if $A$ is not a perfect square.

The degree of an algebraic number $\Theta$ is defined to be the smallest degree, $n$, of any polynomial relation $\Theta^n + a_1\Theta^{n-1} + \cdots + a_{n-1}\Theta + a_n = 0$ that $\Theta$ satisfies, where the coefficients $a_i$ are rational numbers. The corresponding polynomial, $P(X) = X^n + a_1X^{n-1} + \cdots + a_{n-1}X + a_n$, is unique, since if there were two of them then their difference would be of smaller degree and would also have $\Theta$ as a root. (One could make it monic by dividing it through by the leading coefficient.) Let us call $P(X)$ the minimal polynomial of $\Theta$. The minimal polynomial is irreducible over the field of rational numbers: that is, it cannot be factored as a product of two polynomials, each of smaller degree and having rational numbers as coefficients. (If it could, then it would not be of minimal degree, since one of its factors would have $\Theta$ as a root.) The minimal polynomial $P(X)$ of $\Theta$ is a factor of any monic polynomial $G(X)$ with rational coefficients that has $\Theta$ as root. (The greatest common divisor of $P$ and $G$ is another monic polynomial with rational coefficients that has $\Theta$ as a root, so it cannot be of degree smaller than that of $P$ and it must therefore be $P$.) The minimal polynomial $P(X)$ of $\Theta$ has distinct roots. (If $P(X)$ had multiple roots, then a little elementary calculus shows that it would share a nontrivial factor with its derivative, $P'(X)$. Since the derivative is of lower degree than $P(X)$ and again has rational coefficients, the greatest common divisor of $P$ and $P'$ would provide a nontrivial factorization of $P(X)$, contradicting its irreducibility.)

A fundamental result due to Gauss is that the root of unity $\zeta_n = e^{2\pi i/n}$ is an algebraic integer of degree precisely $\varphi(n)$, where $\varphi$ is Euler's $\varphi$-function. For example, if $p$ is prime, the minimal polynomial of $\zeta_p$ is

\[ \frac{X^p - 1}{X - 1} = X^{p-1} + X^{p-2} + \cdots + X + 1, \]

which is of degree $\varphi(p) = p - 1$.

15 Algebraic Numbers as Ciphers Determined by their Minimal Polynomials

We have expressly insisted that our algebraic numbers are complex numbers (of a certain sort). But another possible attitude towards an algebraic number, $\Theta$, an attitude at times promoted by Kronecker among others, is to deal with $\Theta$ as an unknown satisfying only the algebraic relations implied by the fact that it is a root of its (unique monic) minimal polynomial with rational coefficients. For example, if the minimal polynomial of $\Theta$ is $P(X) = X^3 - X - 1$, then, according to this view, $\Theta$ is just an algebraic symbol that comes with the rule that any occurrence of $\Theta^3$ may be replaced by $\Theta + 1$ (rather as the complex number $i$ can be regarded as a symbol with the property that $i^2$ may be replaced by $-1$). Any root of the minimal polynomial of $\Theta$ satisfies all the same polynomial relations with rational coefficients that $\Theta$ satisfies; these roots are called conjugates of $\Theta$. If $\Theta$ is an algebraic number of degree $n$, then $\Theta$ has $n$ distinct conjugates, all of them again, of course, algebraic numbers.

16 A Few Remarks About the Theory of Polynomials

Central to the theory of polynomials in one variable (and, therefore, particularly to the theory of algebraic numbers) is the general relationship that roots have to coefficients:

\[ \prod_{i=1}^{n}(X - T_i) = X^n + \sum_{j=1}^{n}(-1)^j A_j(T_1, T_2, \ldots, T_n)\,X^{n-j}. \]

The polynomial $A_j(T_1, T_2, \ldots, T_n)$ is homogeneous of degree $j$ (this means that every monomial in it has total degree $j$), has integer coefficients, and is symmetric in (i.e. unchanged by any permutation of) the variables $T_1, T_2, \ldots, T_n$.

The constant term is, up to sign, the product of the roots:

\[ A_n(T_1, T_2, \ldots, T_n) = T_1\cdot T_2\cdot\,\cdots\,\cdot T_n, \]

which is known as the norm form. The coefficient of $X^{n-1}$ is, up to sign, the sum of the roots:

\[ A_1(T_1, T_2, \ldots, T_n) = T_1 + T_2 + \cdots + T_n, \]

and this is the trace form.

When $n = 2$ the norm and trace are all the symmetric polynomials in the list. For $n = 3$, beyond the norm and trace we also have the symmetric polynomial of degree two:

\[ A_2(T_1, T_2, T_3) = T_1T_2 + T_2T_3 + T_3T_1 = \tfrac{1}{2}\{(T_1 + T_2 + T_3)^2 - (T_1^2 + T_2^2 + T_3^2)\}. \]

It is of major importance to this theory, and more specifically to Galois theory, that the symmetry properties of the conjugate roots are nicely reflected in these symmetric polynomials. In particular, we have the fundamental result that any symmetric polynomial in $T_1, T_2, \ldots, T_n$ with rational coefficients can be expressed as a polynomial with rational coefficients in the symmetric polynomials $A_j(T_1, T_2, \ldots, T_n)$, and similarly with integral coefficients. For example, the equation above shows that $T_1^2 + T_2^2 + T_3^2$ can be expressed as

\[ A_1(T_1, T_2, T_3)^2 - 2A_2(T_1, T_2, T_3). \]

17 Fields of Algebraic Numbers, Rings of Algebraic Integers

The inverse of a nonzero algebraic number is again an algebraic number; the sum, difference, and product of two algebraic numbers are algebraic numbers; the sum, difference, and product of two algebraic integers are algebraic integers. The neat proofs of these (latter) facts are a good demonstration of the power of linear algebra, and in particular of Cramer's rule, a consequence of which (the Cayley–Hamilton theorem) is that any square matrix with integer coefficients (and therefore also any linear transformation of a finite-dimensional vector space that preserves an integer lattice) satisfies a monic polynomial identity with integer coefficients.

To see just how useful this remark is for finding polynomial relations, and more specifically for showing that the collections of algebraic numbers and algebraic integers are closed under sums and products, try your hand at showing that $\sqrt{2} + \sqrt{3}$ is an algebraic integer. One way to do it is to search for the monic fourth-degree polynomial equation that it satisfies. But this is hardly a beautiful calculation! If, however, you are familiar with linear algebra, then a less painful route is to form the four-dimensional vector space over the rational numbers, generated by $1$, $\sqrt{2}$, $\sqrt{3}$, and $\sqrt{6}$ (which are linearly independent when the scalars are rational). Multiplication by $\sqrt{2} + \sqrt{3}$ defines a linear transformation $T$ of this vector space, and one can compute its characteristic polynomial $P$. The Cayley–Hamilton theorem says that $P(T) = 0$, and this translates into the statement that $\sqrt{2} + \sqrt{3}$ is a root of $P$.

These "closure properties" we have just discussed lead us to study, in complete generality, fields of algebraic numbers and rings of algebraic integers. A number field is a field that is generated (as a field) by finitely many algebraic numbers. A standard result tells us that any number field $K$ can in fact be generated by a single carefully chosen algebraic number. The degree of this algebraic number equals the degree of $K$, which is defined to be the dimension of $K$ when $K$ is viewed as a vector space over the field $\mathbb{Q}$ of rational numbers. One of the main introductory observations of Galois theory is that if $K$ is a number field of degree $n$, then there are exactly $n$ distinct ring-homomorphisms ("imbeddings") $\iota : K \to \mathbb{C}$ from $K$ into the field of complex numbers. (This means that $\iota$ sends 1 to 1 and respects the addition and multiplication laws within $K$. That is, $\iota(x + y) = \iota(x) + \iota(y)$ and $\iota(x\cdot y) = \iota(x)\cdot\iota(y)$.) From these imbeddings, we can construct some very useful rational-valued functions on $K$. For any element $x$ in $K$, we form the $n$ complex numbers $x_1, x_2, \ldots, x_n$ that are the images of $x$ under the $n$ different imbeddings of $K$ into $\mathbb{C}$. We then let

\[ a_j(x) := A_j(x_1, x_2, \ldots, x_n), \]

where $A_j(X_1, X_2, \ldots, X_n)$ is the $j$th symmetric polynomial of Section 16 above. (Because the polynomials $A_j$ are symmetric, we do not have to worry about the order of the images $x_1, x_2, \ldots, x_n$ in the above expression.) It is not immediately obvious that the values of $a_j$ are rational numbers, but there is a theorem that tells us this.

If an algebraic number $\Theta$ in $K$ generates $K$ (as a field), then the rational numbers $a_j(\Theta)$ are the coefficients of its minimal polynomial; in general they are the coefficients of a power of its minimal polynomial. The most prominent of these functions are the multiplicative function $a_n(x) = x_1\cdot x_2\cdot\,\cdots\,\cdot x_n$, called the norm function, usually denoted $x \mapsto N_{K/\mathbb{Q}}(x)$, and the additive function $a_1(x) = x_1 + x_2 + \cdots + x_n$, called the trace function, usually denoted $x \mapsto \mathrm{trace}_{K/\mathbb{Q}}(x)$.

The trace function can be used to define a fundamental symmetric bilinear form on the $\mathbb{Q}$-vector space $K$,

\[ \langle x, y\rangle := \mathrm{trace}_{K/\mathbb{Q}}(x\cdot y), \]

which turns out to be nondegenerate. This nondegeneracy, together with the fact that if $x, y$ are both algebraic integers, then $\langle x, y\rangle$ is an ordinary integer, can be used to show that the ring $O(K)$ of all algebraic integers in $K$ is finitely generated as an additive group. More specifically, there is a basis of algebraic integers in $K$, that is, a finite set $\{\Theta_1, \Theta_2, \ldots, \Theta_n\}$, such that any other algebraic integer in $K$ can be expressed as an "ordinary" integer combination of the numbers $\Theta_i$.

Let us summarize this structure. The number field $K$ is a finite-dimensional vector space over $\mathbb{Q}$ and comes equipped with a nondegenerate bilinear symmetric form $(x, y) \mapsto \langle x, y\rangle$, and also with a lattice $O(K) \subset K$. Moreover, the restriction of the bilinear form to $O(K)$ takes on integral values.

The discriminant of $K$, denoted $D(K)$, is defined to be the determinant of the matrix whose $ij$-entry is $\langle\Theta_i, \Theta_j\rangle$, for $\{\Theta_1, \Theta_2, \ldots, \Theta_n\}$ a basis of the lattice $O(K)$; this determinant does not depend on the basis chosen.

The discriminant represents important information about the number field $K$. For one thing, there is a natural generalization to any number field of the notions of splitting and ramification that we discussed for quadratic fields, and the prime divisors $p$ of $D(K)$ are precisely those prime numbers that ramify in the field extension $K$. By a theorem of Minkowski, the absolute value of the discriminant $D(K)$ of a number field $K$ of degree $n$ is always greater than

\[ \left(\frac{\pi}{4}\right)^{n}\cdot\left(\frac{n^n}{n!}\right)^{2}. \]

This is greater than 1 unless $K$ is the field of rational numbers. It follows that any nontrivial extension of the field of rational numbers has some prime that ramifies in it, a result that would be very hard to prove without the help of the algebraic structures we have just defined. This integer $D(K)$ really is quite a discriminating "tag" for our number field $K$, for, by a theorem of Hermite, given any integer $D$ there are only finitely many different number fields with discriminant equal to $D$. (Not all integers can be discriminants: as is true for quadratic number fields, the integers $D$ that are discriminants are either divisible by 4 or else congruent to 1 modulo 4.)

18 On the Size(s) of the Absolute Values of All Conjugates of an Algebraic Integer

As we have just seen, the coefficients of the minimal polynomial for an algebraic integer $\Theta$ are given by the ordinary integers $A_j(\Theta_1, \Theta_2, \ldots, \Theta_n)$, where the numbers $\Theta_i$ are all the conjugates of $\Theta$. The sizes of all these coefficients must therefore all be less than some universal number $M$ that depends only on the degree of $\Theta$ and the largest absolute value of any of its conjugates. As a consequence, given any $n$ and any positive number $B$, there are only finitely many algebraic integers $\Theta$ of degree at most $n$ such that the absolute values of $\Theta$ and its conjugates are all less than $B$. (This is because for any $n$ and $M$ there are only finitely many polynomials of degree less than or equal to $n$ with the absolute values of all their integer coefficients at most $M$.) This finiteness result is the key to the following observation, due to Kronecker: if $\Theta$ is an algebraic integer and if the absolute values of $\Theta$ and of all of its conjugates are equal to 1, then $\Theta$ is a root of unity. Indeed, all the powers of $\Theta$ have degree at most that of $\Theta$, and they enjoy the same property: their absolute value, and that of all their conjugates, is equal to 1. Consequently, there are only finitely many such algebraic integers, from which it follows that there must be at least one coincidence of the form $\Theta^a = \Theta^b$ for different $a$ and $b$. But this can happen only if $\Theta$ is a root of unity.

19 Weil Numbers

To follow this thread for just a bit, let us generalize the hypothesis of Kronecker's observation, and define a Weil number$^3$ of absolute value $r$ to be a nonzero algebraic integer such that it and all of its conjugates have the same absolute value $r$. By the discussion in the previous section there are only finitely many distinct Weil numbers of given degree and absolute value. By Kronecker's theorem, which we have just described, the Weil numbers of absolute value 1 are precisely the roots of unity. Here are further basic facts that you might try to prove. First, the quadratic Weil numbers $\omega$ are precisely those quadratic algebraic integers such that $|\mathrm{trace}(\omega)| \le 2\sqrt{|N(\omega)|} = 2\sqrt{|\omega\omega'|}$, where $\omega'$ is the (algebraic) conjugate of $\omega$. Second, if $p$ is prime then a quadratic Weil number $\omega$ of absolute value $\sqrt{p}$ is a prime element of the (unique) ring of quadratic integers $R_d$ that contains $\omega$, and therefore gives a prime factorization $\omega\omega' = \pm p$ of the integer $p$ in that ring.

$^3$ This is a weaker condition than is usually required for Weil numbers but our deviation from standard usage should not be the cause of too much confusion.

Weil numbers of absolute value $p^{\nu/2}$, where $p$ is again a prime number and $\nu$ is a natural number, are extremely important in arithmetic: they hold the key to counting numbers of rational solutions of systems of polynomial equations over finite fields. For just one concrete example, the Gaussian integer $\omega := -1 + i$ and its algebraic conjugate (which, in this instance, is also its complex conjugate) $\bar{\omega} = -1 - i$ are Weil numbers (of absolute value $\sqrt{2}$) that control the number of solutions of the equation $y^2 - y = x^3 - x$ over all finite fields of size a power of 2. Specifically, the number of solutions of that equation over a field of order $2^\nu$ is given by the formula

\[ 2^\nu - (-1 - i)^\nu - (-1 + i)^\nu \]

(which is an ordinary integer). This leads to another immense chapter of mathematics.

20 Epilogue

The single symmetry $\alpha \mapsto \bar{\alpha}$, the algebraic conjugation in the rings $R_d$ that we have discussed, gave birth, thanks to Abel and Galois in the beginning of the nineteenth century, to the rich study of (Galois) groups of symmetries of general number fields (see The Insolubility of the Quintic). This study continues with great intensity, since these Galois groups and their linear representations hold the key to a very detailed understanding of number fields. In its modern dress, algebraic number theory is closely connected with what is often called arithmetic algebraic geometry.

Kronecker's dream of getting explicit control of a wealth of algebraic number theoretic material by expressing algebraic numbers in terms of natural analytic functions has not yet been fully realized. Nevertheless, the scope of this dream (and, one might also add, the supply of natural analytic and algebraic functions) has expanded substantially: the full range of algebraic geometry and representation theory is now being brought to bear on it. This is done, for example, by the Langlands program, which among other things works with objects known as Shimura varieties. On the one hand, these varieties have close connections with the theory of group representations and classical algebraic geometry, which greatly helps us to understand them. On the other hand, they are a rich source of concrete linear representations of Galois groups of number fields. This program, one of the glories of current mathematics, will, I expect, make a terrific chapter for a Companion to Mathematics volume to be written at the beginning of the next century.

Further reading

Texts

First, three classics that require a minimum of background.

Gauss, C. F. 1986. Disquisitiones Arithmeticae, English edn. Springer.
Davenport, H. 1992. The Higher Arithmetic: An Introduction to the Theory of Numbers. Cambridge University Press.
Hardy, G. H. and E. M. Wright. 1979. An Introduction to the Theory of Numbers. Oxford University Press.

At a more advanced level, the following are extraordinary expository books.

Borevich, Z. I. and I. R. Shafarevich. 1966. Number Theory. Academic Press.
Cohen, H. 1993. A Course in Computational Algebraic Number Theory. Springer.
Cassels, J. and A. Fröhlich. 1967. Algebraic Number Theory. Academic Press.
Ireland, K. and M. Rosen. 1982. A Classical Introduction to Modern Number Theory, 2nd edn. Springer.
Serre, J.-P. 1973. A Course in Arithmetic. Springer.

Technical Articles and Books

Baker, A. 1971. Imaginary quadratic fields with class number 2. Annals of Mathematics (2) 94:139–152.
Brauer, R. 1950. On the Zeta-function of algebraic number fields. I. American Journal of Mathematics 69:243–250.
Brauer, R. 1950. On the Zeta-function of algebraic number fields. II. American Journal of Mathematics 72:739–746.
Goldfeld, D. 1985. Gauss's class number problem for imaginary quadratic fields. Bulletin of the American Mathematical Society 13:23–37.
Gross, B. and D. Zagier. 1986. Heegner points and derivatives of L-series. Inventiones Mathematicae 84:225–320.
Heegner, K. 1952. Diophantische Analysis und Modulfunktionen. Mathematische Zeitschrift 56:227–253.
Hua, L.-K. 1942. On the least solution of Pell's equation. Bulletin of the American Mathematical Society 48:731–735.
Lang, S. 1970. Algebraic Number Theory. Reading, MA: Addison-Wesley.
Narkiewicz, W. 1973. Algebraic Numbers. Warsaw: Polish Scientific Publishers.
Siegel, C. L. 1935. Über die Classenzahl quadratischer Zahlkörper. Acta Arithmetica 1:83–86.
Stark, H. 1967. A complete determination of the complex quadratic fields of class-number one. Michigan Mathematical Journal 14:1–27.