

Copyright 2019. All rights reserved. No part of this book may be used or reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission from the publisher. This work was printed in the United States.

Cover Page: Worradirek/Shutterstock.com

Beitler, Kenneth W.
Using the Greatest Integer Function to Reveal the Magic in Number Theory and Analysis
Includes bibliographical references and index.
1. United States-Theoretical and Applied Mathematics. I. Title. February 2019.

I dedicate this work to everyone who shares the uncommon goal of realizing the intricate beauty of mathematics.

Table of Contents

Preface: An introduction to this text ...... 3

PART I: Introduction

Chapter 1: An Introduction to the Greatest Integer Function ...... 6
  1.1: Rounding, Truncation, and Transformations
  1.2: An Analysis of the Greatest Integer Function
Chapter 2: More Properties of the Greatest Integer Function ...... 19
  2.1: Elementary Functions
  2.2: Congruence Relations
  2.3: Summation and Integration

PART II: Applications

Chapter 3: Number Bases and Digits ...... 30
  3.1: The Definitions and Properties of the Decimal Digit
  3.2: Using Digits for Change of Number Base Operations
Chapter 4: The Division Algorithm and Reciprocity ...... 38
  4.1: Reciprocal Subtraction, the Euclidean Approach
  4.2:
  4.3: Dedekind Reciprocity
Chapter 5: Miscellaneous Topics in Number Theory and Discrete Math ...... 52
  5.1: Finding Integer Solutions to One-Variable Equations
  5.2: Graphics and Block-Truncation of Images
  5.3: An Introduction to Analytic Number Theory
Chapter 6: Advanced Series Acceleration Methods ...... 65
  6.1: Series Acceleration for Monotone Decreasing Series
  6.2: Fourier Series Acceleration

Appendix: Part III Chapter Previews, References and Index ...... 74
  7.1: An Introduction to ODEs with Piecewise Constant Arguments
  8.1: General Methods for Solving Diophantine Equations (Preview)

PREFACE

This work can best be described as a study of various integer-intensive topics in mathematics. It is an exotic blend of discrete math, number theory, and analysis. The topics are chosen in part for their practical importance or theoretical interest, but our main criterion for selection is that our ability to study them in a rigorous and straightforward manner can be improved using the greatest integer function (or integer functions that can easily be expressed in terms of the greatest integer function). In this respect, our approach is unique for many of these topics, and it has produced some original results. Although most of the material (and all of the material in Part II) is not specifically about the greatest integer function, we study the greatest integer function first so that we can better apply its properties and basic applications to these topics.

What is the greatest integer function? The greatest integer of x is usually denoted [x], and it represents the greatest whole number less than or equal to x. For example, [2.00] = 2, [2.99] = 2, and [−3.14] = −4. This function and its square bracket notation were first introduced by the German mathematician and physicist Johann Carl Friedrich Gauss in 1808 in his third proof of quadratic reciprocity, which is presented in Chapter 4. The function is also known as the floor function, bracket function, or step function, and it is sometimes denoted ⌊x⌋, [[x]], E(x), or int(x). Aside from using int(x) in computations on the TI-83 graphing calculator, this work does not use any alternative name or notation for the greatest integer function in any context. Conversely, wherever square brackets are used throughout this work, the reader may assume that they denote the greatest integer function unless otherwise specified.
The advantage to using the square bracket notation is twofold: 1) it is Gauss's original notation and is therefore universally recognized, and 2) it can be read in any text file, whereas the L-shaped brackets ⌊x⌋ above are special characters, which makes them less transferable and less likely to be retrieved in the event the file is corrupted. The latter happened to one of my drafts.

The concept of the greatest integer dates back to Ancient Greece. Using a straight edge one unit in length and a compass, the Euclidean Greeks could construct segments with lengths that were not integer multiples of the length of the straight edge. They would represent the length of these segments as the greatest integer multiple of the length of the straight edge plus any remaining fraction of its length. Rather than represent a ratio as a fraction, the Euclidean Greeks would often express a ratio as an integer quotient with a remainder. For example, the Euclidean algorithm uses an iterative process of division with remainder to compute the greatest common divisor of two numbers. The concept is simple enough, so why should you study it? While we know so much more about mathematics today than did the ancient Greeks, I believe our understanding of what integers are and how to work with them rigorously has improved little since ancient times. I argue that a narrow concept of the integer is not without consequence.
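The iterative division with remainder that the Euclidean algorithm performs can be sketched in a few lines of Python (a modern aside of mine, not part of the historical account):

```python
def gcd(m, n):
    """Greatest common divisor by the Euclidean algorithm:
    repeatedly replace (m, n) with (n, remainder of m divided by n)."""
    while n != 0:
        m, n = n, m % n
    return m

print(gcd(252, 198))  # 18
```

Each step expresses the larger number as an integer quotient of the smaller plus a remainder, exactly the representation described above.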

I believe that a comprehensive study of the greatest integer function is justified for three main reasons. My first objective is to learn more about the properties and applications of integer functions. My second objective is to use integer functions to express more concepts and statements involving integers (such as in number theory, discrete math, and real analysis) mathematically, so that we can work with them on a more solid foundation. My third objective is to expand coverage of number theory topics to the extent that this work can serve as a first-year course on the subject.

The first reason is an end in itself. A lot of theoretical research is either a generalization or a spin-off of an earlier discipline. For example, the proof that the general quintic equation has no solution by radicals, using the fact that its Galois group is not solvable, led to a comprehensive study of solvable groups. Since number theory is the study of the properties of integers, and Gauss hailed it as the queen of mathematics, it seems only natural to expand our study to integer functions. In fact, Gauss's third and most famous proof of quadratic reciprocity generated further interest in the study of sums involving the greatest integer function. Some notable contributions were made by Eisenstein, Hacks, Stern, and Zeller in the 19th century and by Berndt, Carlitz, and Dieter in the 20th century.

The second reason is as practical as it is powerful. It would mean that the study of integer functions would not only unify several areas of mathematics, but would also help us build a framework for studying them in a more rigorous and straightforward manner. I am arguing that no amount of additional theory about integers can substitute for a mathematical expression. I believe that with sufficient study of the greatest integer function we can achieve this objective.
First, since the greatest integer function is continuous in the sense that its domain includes all real numbers and discrete in the sense that its range is restricted to integers, it acts as a bridge between the continuous and the discrete by mapping any real number to the greatest integer not exceeding it. It follows that by studying the properties of integers and integer functions, we may be able to use integer functions to express, if not outright discover, relationships between constructs involving integers.

Second, by expressing concepts and statements involving integers (including number-theoretic functions) in terms of the greatest integer function, we may be able to manipulate them mathematically using the properties of integer functions. This will allow us to simplify, if not solve, a greater variety of problems in number theory and to construct more direct number theory proofs. For this reason, the study of the greatest integer function is indispensable to our understanding of integers and hence to number theory.

Third, I have already built up enough theory on the greatest integer function to use it for solving an increasing variety of problems involving integers. If I can get to this point working almost entirely alone, imagine how much further we can go with collaboration.

The third reason necessarily follows from the first two. I am certain that if you enjoy number theory, then you will enjoy learning about the greatest integer function. My interest in the greatest integer
function led me to study number theory because the study of integers is the most immediate application of integer functions. I spent years reading number theory texts looking for material on the greatest integer function and for new problems I could solve with my current knowledge of it. The vast majority of the material in my work on the greatest integer function is in discrete mathematics and number theory, and between the text and the exercise set, this work covers at least as much number theory as a semester course on the subject. This work also attempts to present the material in a more straightforward manner than traditional number theory texts. I largely attribute this quality to differences in approach. A major difference is that my work places a greater emphasis on expressing number-theoretic statements mathematically, for manipulation in computations and proofs, as opposed to working with number-theoretic statements in English prose. Another difference is that my work has a common thread throughout, and that will continue to be the case as the accumulated material becomes increasingly diverse. In addition, I explain each concept as I understand it, and I put great effort into organizing the material so I can explain it as directly and concisely as possible. I also recognize that there are topics in number theory that are not straightforward and that the greatest integer function must be applied cleverly in these contexts. One of my reasons for writing this book is to help the reader develop that skill.

Over the past decade, I spent much of my free time researching the greatest integer function and its applications in number theory and analysis. Some of my work is original, and some of these results have the potential to open new avenues of study. Examples include a high-precision series acceleration method for summing ANY monotone decreasing series and a very powerful Euclidean-like algorithm for solving a broad range of Diophantine equations. I am humbled by what I have learned about the greatest integer function, and I am certain that integer functions merit greater attention.

Since there is little mainstream 21st century literature on the greatest integer function beyond its basic properties and applications, it is largely viewed as unimportant. For the most part, the greatest integer function is used for operations involving rounding and truncation. New textbooks ranging from algebra to calculus define it and discuss the discontinuities in its graph; however, they do not discuss it in greater detail. Many older college number theory texts give a more thorough discussion of the greatest integer function; however, they fall far short of fully describing this fascinating function, and new editions barely mention it at all. Of all the hours I spent researching books on discrete math and number theory, I found very little on the greatest integer function. And after researching the function on the internet, I found only three books on the greatest integer function. All three were published by the same author, Folke Ryde, a retired Swedish mathematician, now deceased, and they are all different editions of the same work. Since his books are very scarce and no longer in print, I was lucky to obtain one of them through an interlibrary loan.
He discusses and solves (by factorization) several forms of one-variable Diophantine equations involving nested divisions of the greatest integer function, much as a differential equations text catalogs and solves forms of differential equations. Although his work is somewhat interesting, he is not clear about its applications, and I could not think of any myself.

Although interest in the greatest integer function may have died out over the past fifty years, I am asking the mathematical community to reconsider the role that the greatest integer function can play in the advancement of mathematics, especially in number theory. I am also asking for more support for this project. This work is far from a dry theoretical text, and it is designed to inspire others to pursue math, especially in the field of number theory. I have found many fascinating applications for the greatest integer function, ranging from consumer mathematics to finding expressions for famous number-theoretic functions. This work may also be of interest to the computer scientist. In several instances throughout the text, expressions with the greatest integer function serve as if-statements. In addition, Section 3.2 is devoted entirely to number bases and digits, and it includes a subsection on binary digits. Since the material is diverse and applies mathematics at all levels from college algebra through analysis, this work is a good investment for anyone interested in broadly sharpening their mathematical skills.

I organized the chapters into three parts depending upon the level of study. Chapters in Part I are specifically about the greatest integer function. They discuss its properties and its basic applications, and they provide a comprehensive treatment of all requisite material. Chapter 1 is mostly an intensive study of the greatest integer function itself. Chapter 2 is an intensive study of the greatest integer function in relation to the following broader topics: elementary number-theoretic functions, congruence relations, and summation and integration. While most of the material is an end in itself, some of it also helps prepare the reader for the other levels of study.

Chapters in Part II introduce mainstream topics in discrete math, number theory, and analysis that we study using the greatest integer function. In this respect, our presentation of the material may differ from mainstream sources, but it is no less comprehensive. Chapters in Part III are advanced theoretical research, either specifically about the greatest integer function or about a theoretical topic we use the greatest integer function to study. They are also faster-paced, mentioning important requisite material and resources, but with less elaboration or proof. These chapters are omitted from copies of this work published online. I conclude with secrets for becoming a world-class mathematician, or at least for living a happier life, which I leave for the appendix. I include them in this work because they are not what you think and you are not likely to read them elsewhere. The wisest readers will use them to make lifestyle changes that prove more valuable than the content from the rest of this work.

However, determining what material to include is no less a challenge, even with this organizational criterion. I generally consider the extent to which integer functions are used (just in one or two steps
or throughout or in many terms e.g. in a summand), their ultimate purpose (just for rounding and truncation or also at a deeper level), applications or consequences of these results (the most practical and most general results have the broadest implications) and how much material I already have on the subtopic. For these reasons, the vast majority of results are either left out or left as exercises.

When writing this manuscript, I deviated from convention with the intention of making this an ultra-modern work. Since I believe internet publishing is a greener alternative to hard-copy publishing and is rapidly replacing it, this work is only accessible as a PDF. In addition, it is largely a product of internet sources. It is formatted so that it reads like books in other disciplines. And last, but certainly not least, since the TI-83 is the most popular graphing calculator used in high schools and colleges across the country, I discuss how to use functions and operations specific to the TI-83 for several computational objectives.

In my experience, advanced material on the greatest integer function is scarce, forcing me to extract many ideas from internet sources. With articles in the billions, the internet is among the richest sources of information, it is the future of the transmission of information, and it is a greener alternative to hard-copy textbook publishing. In addition, I recommend judging internet sources with an open mind. Although not all information on the internet is credible, hard-copy textbooks are not perfect either. As an example, on page 168 of Joseph H. Silverman's A Friendly Introduction to Number Theory (2001), he presents the Generalized Law of Quadratic Reciprocity but omits the caveat that (a|q) = 1 for a composite q does not imply that a is a quadratic residue modulo q. Although he did not make any statements that are mathematically false, I believe he should have mentioned this as an additional caveat concerning the relationship between quadratic residues and Jacobi symbols, which we discuss in Chapter 4. (Even so, I like and use his book because he makes every attempt to explain the material as simply as possible.) I found both Joseph Silverman's book and the additional caveat through Wikipedia.

The reader will notice that this work is formatted differently than most other mathematical texts.
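Returning for a moment to the Jacobi-symbol caveat above, it can be made concrete (my illustration, not Silverman's): the Jacobi symbol (2|15) = (2|3)(2|5) = (−1)(−1) = 1, yet 2 is not a quadratic residue modulo 15. A minimal Python sketch, using the standard binary Jacobi-symbol algorithm:

```python
def jacobi(a, n):
    """Jacobi symbol (a|n) for odd n > 0."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:          # factor out 2s: (2|n) depends on n mod 8
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # quadratic reciprocity flip
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def is_quadratic_residue(a, q):
    """Brute force: is a congruent to a square modulo q?"""
    return any((x * x - a) % q == 0 for x in range(q))

print(jacobi(2, 15))                # 1
print(is_quadratic_residue(2, 15))  # False: (a|q) = 1 does not imply a residue
```

For a prime modulus the two notions agree; the divergence appears only for composite q, which is precisely the caveat.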
Most formulas and mathematical expressions are not displayed on separate lines, but are inserted in the text instead. They are formatted as text and sized to fit on lines of text. Diagrams are no larger than required. These differences leave less need for spacing. In this respect, this work is much more concise than most other texts. I format the text this way because I believe in being comprehensive yet concise, and I believe that, as a branch of philosophy, mathematics can read like any other philosophical literature. When breaking this work into segments, I highlighted the first letter of each segment in red so that I would know to leave a blank line before it. Since I thought it looked nice, I decided to leave it this way when publishing consecutive drafts.

For the computations discussed in this work, there is no need for a graphing calculator more sophisticated than the TI-83. If you do not know how to perform certain operations or access certain functions on the TI-83 that are described in this work, then consult the index of your TI-83 instruction manual. In my opinion, the manual is underused. If you have had a year of calculus and you know how to use your TI-83, then you should be able to handle the vast majority of the computational objectives in this work. And if you do not have a TI-83, then you can download the software for it from the Texas Instruments website for free and use it on your laptop or desktop computer for a trial period. The link is https://education.ti.com/en/downloads/trial-software. Since all formulas and algorithms are first described mathematically, you can follow along even without experience using a graphing calculator.
Showing how to perform computations on the TI-83 is integral to this work because a) the reader will undoubtedly work out all but the simplest computations using a machine and b) there are many mathematical operations, functional properties of keys, and quirks specific to the TI-83 that, when fully understood and accounted for, can make the difference between a user-friendly algorithm and a cumbersome one.

My ultimate goal is to found and then temporarily fund The Greatest Integer Function Research Project, GIFRP. It would be a non-profit collaborative research project devoted to the study of the properties and applications of integer functions and adopted by an accredited university. Researchers from PhD students to tenured professors from anywhere in the world would periodically collaborate to edit the work and add new material. A board would approve any changes to the work. Graduate students would work out solutions to some exercises and propose new ones. Parts I and II would be available for free online and active members of the GIFRP would make presentations on highlights of the material to convince professors to teach a semester or one-year course on it. There is already enough material for it to make for a unique and quite spectacular first-semester course on number theory. Other parts of it can add spice to a course on real analysis by helping students identify with the material through interesting and often unique examples and by using the theory in practical and advanced applications. Its sources of funding could include publication of advanced results in Part III as an eBook, teaching courses on Parts I and II and commercializing the exercise set, and of course grants.

While I completed nearly all of the research on my own, I could never publish this book working alone. I offer my gratitude to my family for providing unceasing encouragement and support. Moreover, I applaud Dr. Gerald M. Funk, professor of Statistics at Loyola University of Chicago, for volunteering for an independent study on the subject and editing one of my drafts while I was an undergraduate. He helped me learn how to write like a mathematician. Furthermore, I thank everyone who helped me find literature on the greatest integer function or on any subject that gave me ideas which advanced this work.

"God created the integers; everything else is the work of man." −−Leopold Kronecker (1823−1891), German Mathematician and Professor at the University of Berlin______

CHAPTER 1: An Introduction to the Greatest Integer Function

What is the greatest integer function? The greatest integer of x is the greatest whole number less than or equal to x. It is denoted as [x] throughout this work, except where otherwise specified. For instance, [2.00] = 2, [2.99] = 2, and [−3.14] = −4. Notice that |−3.14| < |[−3.14]|; this property holds for all negative non-integers. Aside from using int(X) in operations involving the TI-83 graphing calculator, this work does not use any alternative name or notation for the greatest integer function in any context. Conversely, wherever square brackets are used throughout this work, the reader may assume that they denote the greatest integer function unless otherwise specified. Throughout the text, we discuss creative ways for using the greatest integer function to solve a broad range of math problems and express many statements involving integers mathematically. So, let’s get started!
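As a quick numerical check of these values (an aside of mine, not from the text): in most programming languages the greatest integer function is available as "floor". In Python, math.floor plays the role of [x], while int() truncates toward zero, much like the TI-83's iPart:

```python
import math

# [x]: the greatest integer not exceeding x
print(math.floor(2.00))   # 2
print(math.floor(2.99))   # 2
print(math.floor(-3.14))  # -4: floor rounds toward minus infinity
print(int(-3.14))         # -3: int() truncates toward zero instead
```

The last two lines illustrate the property noted above: for negative non-integers, |[x]| exceeds |x|, so floor and truncation disagree.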

Section 1.1: Rounding, Truncation, and Transformations

If x is a real number, then there is a largest integer not greater than x. An integer is a whole number, a member of the infinite set {…, −3, −2, −1, 0, 1, 2, 3, …}. Notice that the list of integers is consecutive. For example, there are no integers between 2 and 3, even though 3 is a full unit greater than 2. Since consecutive integers are spaced a unit apart on the real number line, any finite interval contains a finite number of integers and hence a largest one. Using a similar line of reasoning, suppose that x is a real number and that we partition the set of integers at x into the set of integers not greater than x and the set of integers greater than x. The Greatest integer axiom states that there is a largest integer, k, in the set of integers not greater than x. This axiom cannot be proved (at least not without the use of another axiom); it merely follows from the definition of the integer. It is also the fundamental assumption upon which this work is based.

One of the most interesting aspects of the greatest integer function is that even though it is a non-constant function, every point is a critical point. Proof: Let k be any integer and f be any positive fraction part, so that 0 < f < 1 and 0 < 1 − f < 1. By the Greatest integer axiom, if x is not an integer, then x = k + f for some k and f. Since the value of [x] at x = k + f is k by definition, the value of [x] is independent of the value of f. Therefore, the derivative of [x] at x = k + f is zero. At x = k, [x] = k, and at x = k − f, [x] = [k − f] = [(k − 1) + (1 − f)] = k − 1. Since [k − f] + 1 = k = [k + f], the graph of [x] has a positive jump discontinuity of one unit at x = k (as depicted by the gray vertical line segments below). Therefore, the derivative of [x] at x = k is said to be +∞, or undefined. Since the derivative of [x] is undefined for all integers and zero for all non-integers, every point is a critical point.
For the rest of this work, except where otherwise specified, let k denote an integer and f denote a fraction part.

We provide a complete graph of y = [x] above. The black horizontal line segments are called wafers. Each wafer at y = k extends over [k, k + 1) for x. The gray vertical line segments are called jump discontinuities. Each jump discontinuity at x = k extends over [k − 1, k) for y. It does not include the point (k, k) because that point is on a wafer. Notice that the wafers and jump discontinuities are exactly a unit in length and connect (but do not intersect) at integer coordinates. Moreover, since lim f→0⁺ [k + f] = [k + 0] = k, y = [x] is right-continuous at x = k. The greatest integer function is both one-to-one and onto, but only over the set of integers. Proof: A function is one-to-one if for every x ≠ x', f(x) ≠ f(x'), and onto if for each value of y there is a corresponding value of x such that y = f(x). As demonstrated above, if x = k and x' = k + f where 0 < f < 1, then x ≠ x', but [x] = [x']. Moreover, y must be an integer to have a corresponding value of x such that y = [x] by definition. So, in general, [x] is neither
one-to-one nor onto. However, if x an integer, then x = [x]. So, over the subset of integers, the function y = [x] is y = x, which is both one-to-one and onto because if x and x' are distinct, then y and y' are distinct and for any integer, y, there is a corresponding integer, x, such that y = x.

Since the transformation x → [x] acts as a bridge from the real numbers to the integers, there are a few important logical iterations of the greatest integer function. Logical iterations of the greatest integer function allow us to express statements about integers mathematically. The statement 'y is an integer' is logically equivalent to y = [y]. Moreover, the statement 'y is a function of an integer' is logically equivalent to y = f([x]), because the values of y are restricted to the values of f(x) for integer values of x. If y = [f(x)], then y is an integer. And if y = [f([x])], then the values of y are integers restricted to the values of [f(x)] for integer values of x. We call this last iteration, [f([x])], a sandwich iteration because f(x) is sandwiched between two iterations of the greatest integer function, one for x and one for y. Any function, f(x), is a sandwich iteration iff f(x) = [f([x])] for all real x. The proof is fairly straightforward and is left as an exercise. The sandwich iteration is a powerful concept that is explored further throughout this work and is used for block-truncation and solving Diophantine equations.

Example 1: Let y = (100x + 1)/667. Express each of the following statements about x and y mathematically in terms of the variable that is not necessarily integer valued: a) x is an integer; b) y is an integer; and c) y is an integer over a domain restricted to integers.

Solution: a. Since y = (100x + 1)/667, x = (667y − 1)/100. And since x is an integer, x = [x]. Therefore, (667y − 1)/100 = [(667y − 1)/100]. b. Since y is an integer, y = [y]. Therefore, (100x + 1)/667 = [(100x + 1)/667]. c. We already handled the case where y is an integer, but how do we handle the condition on its domain? The input variable must be restricted to integers, but restricting the domain of y to integers does not necessarily restrict x to integers. Since [x] is necessarily restricted to integers, replacing x with [x] restricts the domain of y to integers.
Therefore, (100[x] + 1)/667 = [(100[x] + 1)/667]. In Section 5.1, we use this relation to find integer solutions to one-variable equations.

Before continuing, we introduce some important notation. For the rest of this work, wherever we introduce a new variable, we implicitly or explicitly categorize it into one of the following four sets of real numbers. If y = |y| or if y = −|y|, then we say y is a real number, or y ∈ ℝ (y belongs to the set of real numbers). In this work, except where otherwise specified, all variables are real-valued. If y is a quotient (or ratio) of integers, then we say y is rational, or y ∈ ℚ. If y = [y], then we say y is an integer, or y ∈ ℤ (from Zahlen, the German word for numbers). If y is a positive integer, then we say y is a natural number, or y ∈ ℕ. We always categorize each new variable in its most exclusive set. For example, if y is an integer, then the statement y ∈ ℚ, though correct, is incomplete.
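Part c) of Example 1 can also be explored numerically; a small sketch of mine (the search bound and helper name are not from the text). For integer x, both 100x + 1 and 667 are integers, so the condition (100[x] + 1)/667 = [(100[x] + 1)/667] becomes an exact divisibility test:

```python
# y = (100x + 1)/667 restricted to integer x: y is an integer exactly
# when 667 divides 100x + 1, i.e. when (100[x] + 1)/667 = [(100[x] + 1)/667].
def y_is_integer(x):
    return (100 * x + 1) % 667 == 0

hits = [x for x in range(1, 700) if y_is_integer(x)]
print(hits)  # [20, 687]: e.g. (100*20 + 1)/667 = 2001/667 = 3
```

Working with the integer remainder rather than a floating-point [·] keeps the test exact, a point that matters again when we treat Diophantine equations.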

The greatest integer function was designed for rounding any real number to an integer multiple of a positive number. Let j > 0, n = jx, and [x] = k, so that x = k + f if x is not an integer. Then, i. n rounded down to the nearest j is j[n/j], or jk; ii. n rounded to the nearest j is j[n/j + ½]; and iii. n rounded up to the nearest j is −j[−n/j].

Proof: i. Since k is the greatest integer not greater than x, jk is the greatest integer multiple of j not greater than n. Since x = n/j, the truncation of n to the nearest j is j[n/j]. For instance, when rounding 35 down to the nearest 8, we divide 35 by 8 to obtain 4.375. Thus, x = 4.375, k = 4, and jk = 8·4 = 32, which is the greatest integer multiple of 8 not greater than 35. As a corollary, if n > 0, then the number of positive integer multiples of j that are not greater than n is [n/j]. ii. By definition, n rounded to the nearest j is jk if f < ½ and j(k + 1) if f ≥ ½. Since k + f is rounded up to k + 1 if f ≥ ½, the truncation must be shifted up a half unit. In order to shift the truncation up a half unit, we add ½ to x. This way, if f does not exist or f < ½, then f + ½ < 1 and hence [k + f + ½] = k. And if f ≥ ½, then 1 ≤ f + ½ < 1½ and hence [k + f + ½] = k + 1. So, n rounded to the nearest j is j[n/j + ½]. iii. If −j[−n/j] rounds n up to the nearest j, then −j[−n/j] leaves n = jk unchanged and rounds n = j(k + f) up to j(k + 1). Since −j[−(jk)/j] = −j(−k) = jk and −j[−j(k + f)/j] = −j[−(k + f)] = −j[−(k + 1 − 1 + f)] = −j[−(k + 1) + (1 − f)] = −j(−(k + 1)) = j(k + 1), −j[−n/j] rounds n up to the nearest j.

It follows that the greatest integer function is also useful for counting the number of integers (or integer multiples of a positive real) over a finite interval of positive width. First, we determine the number of integers in the closed interval, [x, x']. The smallest integer in the interval is x rounded up to the nearest integer, or −[−x].
And the largest integer in the interval is x' rounded down to the nearest integer, or [x']. So, the number of integers over [x, x'] is [x'] + [−x] + 1. Next, we determine the number of integers in the open interval, (x, x'). The smallest integer
in the interval is x rounded up to the nearest integer if x ∉ ℤ and x + 1 if x ∈ ℤ; either way, it is [x] + 1. And the largest integer in the interval is x' rounded down to the nearest integer if x' ∉ ℤ and x' − 1 if x' ∈ ℤ; either way, it is −[−x'] − 1. So, the number of integers over (x, x') is −[−x'] − 1 − ([x] + 1) + 1 = −[−x'] − [x] − 1.
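The three rounding formulas and the two counting rules above are easy to verify numerically; here is a sketch in Python, with math.floor standing in for [·] (the function names are mine, not the text's):

```python
import math

def round_down(n, j):     # j[n/j]
    return j * math.floor(n / j)

def round_nearest(n, j):  # j[n/j + 1/2]
    return j * math.floor(n / j + 0.5)

def round_up(n, j):       # -j[-n/j]
    return -j * math.floor(-n / j)

def count_closed(x, x2):  # integers in [x, x2]: [x2] + [-x] + 1
    return math.floor(x2) + math.floor(-x) + 1

def count_open(x, x2):    # integers in (x, x2): -[-x2] - [x] - 1
    return -math.floor(-x2) - math.floor(x) - 1

print(round_down(35, 8), round_nearest(35, 8), round_up(35, 8))  # 32 32 40
print(count_closed(1.5, 4.5))  # 3 (the integers 2, 3, 4)
print(count_open(2, 5))        # 2 (the integers 3, 4)
```

Note how −j·floor(−n/j) implements "round up" without needing a separate ceiling function, exactly as in the proof of iii.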

The greatest integer function has several additional properties involving truncation. Although the greatest integer function is not a linear transformation, it possesses traits of linearity under certain remarkably frequent conditions. A function, f(x), is a linear transformation iff f(x + y) = f(x) + f(y) and f(cx) = cf(x). It follows that if the domain of the greatest integer function were restricted to integers, then the greatest integer function would be a linear transformation. i. If x is a real number and y = k is an integer in the former qualification, then additivity holds: [x + k] = [x] + [k]. ii. If x and c are real numbers with c = [cx]/[x] in the latter qualification, then homogeneity holds: [cx] = c[x]. Proof: i. Since k is an integer, its fraction part is zero. So, [x + k] = [(x − [x]) + [x] + [k]]. Since 0 ≤ x − [x] < 1 and the fraction part of [x] + [k] is zero, [(x − [x]) + [x] + [k]] = [[x] + [k]] = [x] + [k]. ii. Let c = [cx]/[x]. Then by definition, [cx] = ([cx]/[x])[x] = c[x]. The reader may verify that this latter condition always holds for c = 0 and c = 1.

For all real x and y, [x] + [y] ≤ [x + y] ≤ [x] + [y] + 1. Proof: [x + y] = [[x] + (x − [x]) + [y] + (y − [y])] = [x] + [y] + [(x − [x]) + (y − [y])]. Since [x] ≤ x and [y] ≤ y, x − [x] and y − [y] are non-negative and hence the greatest integer of their sum is non-negative. Thus, [x] + [y] ≤ [x + y]. Moreover, since [x] ≤ x < [x] + 1 and [y] ≤ y < [y] + 1, we conclude that 0 ≤ (x − [x]) + (y − [y]) < 2. So, 0 ≤ [(x − [x]) + (y − [y])] ≤ 1. Thus, [x + y] ≤ [x] + [y] + 1. Therefore, [x] + [y] ≤ [x + y] ≤ [x] + [y] + 1.

For all x and all n ∈ ℕ, [x/n] = [[x]/n]. Proof: Since [x/n] ≤ x/n < [x/n] + 1, n[x/n] ≤ x < n[x/n] + n. Since n[x/n] is an integer, n[x/n] ≤ [x] < n[x/n] + n and so [x/n] ≤ [x]/n < [x/n] + 1. It follows that [x/n] = [[x]/n] and the theorem is proved. As a corollary, if n = dk for natural numbers d and k, then [[x/d]/k] = [x/n].
Proof: Since n = dk, [x/n] = [x/(dk)] = [(x/d)/k] = [[x/d]/k]. This property is called the property of nested divisions. The above result can be generalized to all monotone increasing functions as follows. If f(x) is any continuous monotone increasing function with the property that if f(x) ∈ ℤ then x ∈ ℤ, then [f(x)] = [f([x])]. Proof: Based on the restrictions on f(x), in order for f(x) to increase to [f(x)] + 1, x must be increased to an integer exceeding [x]. It follows that [f(x)] is frozen except at integer values of x and hence [f(x)] = [f([x])].

This next result is known as the division algorithm. Let n be a positive integer and m be any integer. Then, there exist unique integers for the quotient, q, and remainder, r, such that m = nq + r and 0 ≤ r < n, given by q = [m/n] and r = m − n[m/n]. Proof: Since q = (m − r)/n and 0 ≤ r < n, m/n − 1 < q ≤ m/n. If m/n is an integer, then the inequality forces q = m/n = [m/n] and r = 0. If m/n is not an integer, then by the greatest integer axiom, m/n = k + f for a unique integer, k = [m/n], and fraction part, f, with 0 < f < 1. By substitution, k + f − 1 < q ≤ k + f. Since 0 < f < 1, we have k − 1 < q < k + 1. Since q is an integer, the inequality forces q = k = [m/n]. So in either case, q = [m/n] and r = m − n[m/n].
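As a quick illustration, here is a Python sketch of the division algorithm and of nested divisions (the helper names are ours; Python's own floor division m // n computes exactly this quotient):

```python
import math

def divide(m, n):
    """Division algorithm: q = [m/n] and r = m - n[m/n], so m = nq + r with 0 <= r < n."""
    q = math.floor(m / n)
    r = m - n * q
    return q, r

def nested(x, d, k):
    """Nested divisions: [[x/d]/k], which should equal [x/(dk)]."""
    return math.floor(math.floor(x / d) / k)
```

Note that the remainder is non-negative even for negative m: divide(−17, 5) gives q = −4, r = 3.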

Many well-known integer functions can be expressed explicitly in terms of the greatest integer function. Several examples are stated below. The first three functions listed are operations on the TI-83. The function, iPart(x), also written fix(x) and read 'integer part of x', truncates x by discarding its fraction part. For instance, fix(2.7) = fix(2 + 0.7) = 2 and fix(−3.1) = fix(−3 − 0.1) = −3. And fix(x) = |x|[|x|]/x provided x ≠ 0. Proof: If x > 0 and x = k, then fix(k) = k = k[k]/k. If x > 0 and x = k + f, then fix(k + f) = k = (k + f)[k + f]/(k + f). Since fix(x) truncates all x > 0 down to the nearest integer, fix(x) = [x] for all x > 0. If x < 0 and x = k, then fix(x) = k = −k[−k]/k. If x < 0 and x = k + f, then x = k + 1 + f − 1 = (k + 1) − (1 − f) and fix(x) = fix((k + 1) − (1 − f)) = k + 1 = −[−(k + f)] = −(k + f)[−(k + f)]/(k + f). Since fix(x) rounds all x < 0 up to the nearest integer, fix(x) = −[−x] for all x < 0. Although fix(0) = 0, our formula has a removable discontinuity at x = 0.

Similarly, the function, fPart(x) (fraction part of x), returns the fraction part of x by discarding the integer part of x. For instance, fPart(2.3) = fPart(2 + 0.3) = 0.3 and fPart(−1.8) = fPart(−1 − 0.8) = −0.8. And fPart(x) = x − |x|[|x|]/x provided x ≠ 0. Proof: The statement that fPart(x) discards the integer part of x is equivalent to the statement that fPart(x) subtracts iPart(x) from x. Likewise, fPart(x) = x − iPart(x) = x − |x|[|x|]/x.

The rounding function, round(x,n), rounds x to n places to the right of the decimal point for all x and any natural number, n. For instance, round(log(282), 2) = 2.45. The function round(x,n) = [10^n x + ½]/10^n. Proof: As stated above, y[x/y + ½] rounds x to the nearest y. Since n represents the placement value of the nth digit to the

right of the decimal point, y = 10^−n.

The ceiling function rounds any number, x, UP to the nearest integer. We demonstrated above that −[−x] rounds x up to the nearest integer. Further, if x is a rational of the form n/m, then the ceiling of x is [(n + m − 1)/m] because the minimum fraction part of n/m, if any, is 1/m.

The mod function, x mod m, returns the remainder of x/m for any x ∈ ℤ and any natural number, m. That is, if x = mq + r with 0 ≤ r < m, then x mod m = r. For instance, 12 mod 4 = 0 and 11 mod 9 = 2. Therefore, x mod m = x − m[x/m]. Proof: This result follows at once from the division algorithm above. And two important additional properties of the mod function follow. That is, i. a mod m = b mod m iff (a − b) mod m = 0 and ii. (a + b) mod m = (a + b mod m) mod m. Proof: i. Using our expression for the mod function, a mod m = b mod m means a − m[a/m] = b − m[b/m]. So, a − b = m([a/m] − [b/m]). Since the right hand side is obviously an integer multiple of m, its remainder when divided by m is zero. ii. Using our expression for the mod function, (a + b mod m) mod m = (a + b − m[b/m]) − m[(a + b − m[b/m])/m] = (a + b) − m[b/m] − m[(a + b)/m] + m[b/m] = (a + b) − m[(a + b)/m] = (a + b) mod m. The mod function also returns the principal value, or least non-negative residue, of x in a congruence relation, which is a generalization of the mod function. Congruences are introduced in Chapter 2.

The standard sawtooth wave function, denoted as ((x)), is defined as ((x)) = {0 if x ∈ ℤ and x − [x] − ½ if x ∉ ℤ}. Although the sawtooth wave function is discontinuous at integers, it is often expressed as an infinite sine series, which is both continuous and integrable. Later in this chapter, we construct a Fourier sine series for it.
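Every function in this catalog reduces to the greatest integer function. A Python sketch (the names fix, fpart, round_n, and mod are ours) implements each one from [x] alone and checks it against Python's built-ins:

```python
import math

def fix(x):
    """iPart: [x] for x > 0 and -[-x] for x < 0, as derived above."""
    if x > 0:
        return math.floor(x)
    if x < 0:
        return -math.floor(-x)
    return 0

def fpart(x):
    """fPart: x minus its integer part (keeps the sign of x)."""
    return x - fix(x)

def round_n(x, n):
    """round(x, n) = [10^n x + 1/2] / 10^n."""
    return math.floor(10**n * x + 0.5) / 10**n

def mod(x, m):
    """Least non-negative residue: x mod m = x - m[x/m]."""
    return x - m * math.floor(x / m)
```

Because [x/m] rounds down, mod is non-negative even for negative x, e.g. mod(−7, 3) = 2; Python's % operator agrees for a positive modulus.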

Integer functions allow us to construct mathematical models for real-world functions that are based on the operations of rounding and truncation. They can be used to express just about any function with uniform thresholds. For this reason, many calculators and computer programs provide integer function operators. The TI-83 provides several of them. Just about every important integer function can be expressed in terms of the greatest integer function and other elementary functions. For this reason, as we apply integer functions to material throughout the text, they will usually not be addressed by their name, but in terms of the greatest integer function.

Example 2: Most cellular phone companies charge for an integer number of minutes. If a phone company charges $0.35 per minute and rounds up to the nearest minute, how much money does the company charge someone who talks for x minutes? Solution: If m is the number of minutes rounded up to the nearest minute, then the charge is $0.35m. Since m is x rounded up to the nearest integer, m = −[−x]. Therefore, the company charges $0.35(−[−x]) after x minutes.

Example 3: a. Create a mathematical model for a probability bar graph with the binomial distribution in percent for an event with a 10% probability of occurrence with twelve opportunities for occurrence. b. Graph the probability mass function (pmf) and cumulative distribution function (cdf) in percent for the minimum number of craps games a gambler must play to win three times. Assume the probability of winning each game is 244/495. Solution: a. The binomial probability mass function (pmf) represents the probability that the event has occurred exactly k out of 12 tries. The formula for the distribution is P(k) = 100·(12 choose k)·(0.1)^k·(0.9)^(12−k), where (12 choose k) = 12!/(k!(12−k)!). Since the study of probability theory is not an objective of this work, we do not discuss the pmf further except to note that it is a discrete function defined only at integers.
At this point, our goal is to transform the discrete function into a piecewise-continuous bar graph defined for all real x. We want to create a bar graph where each bar is a unit wide and has a height of P(k) so that its area is P(k). In addition, we want to have a jump discontinuity in the bar graph from P(k − 1) to P(k) at x = k. This way, P(k) is the height of the bar graph over the interval [k, k + 1). It follows that our mathematical model for P(k) is P([x]). Nevertheless, this mathematical model does not please all statisticians. Since the bar graph below models a pmf, the bar graph is also referred to as a histogram. The bars in a histogram are of uniform length; however, each bar is centered at (not bordered by) each possible integer value for the discrete, random variable. Since the width of each bar is a unit, the binomial distribution is transformed into a centered bar graph when x is rounded to the nearest integer, meaning k is replaced with [x + ½].

b. The negative binomial probability mass function represents the probability that the gambler must play exactly k games to win three times. The formula for its pmf is P(k) = 100·(k−1 choose 2)·(244/495)^3·(251/495)^(k−3). Again, P(k) is a discrete function defined only at integers because it is impossible to win a third time before a game is finished. Casino rules ensure this. Let C(k) denote the probability the gambler has already won at least three times after playing k games. Then, C(k) = 100 − 100{(k choose 0)(244/495)^0(251/495)^k + (k choose 1)(244/495)^1(251/495)^(k−1) + (k choose 2)(244/495)^2(251/495)^(k−2)}. The expression for

C(k) simplifies to C(k) = 100 − 100{(244k/251)^2/2 + (129/251)(244k/251) + 1}(251/495)^k. Moreover, C must be defined for all real x because the gambler could already have three wins while in the middle of a game. However, since P(k) is undefined at non-integers, C cannot accumulate over these intervals. So, the probability that the gambler has at least three wins after k games is the same as that after k + f games. For this reason, C should be defined for all real x even if P(k) is not. Accordingly, we replace the random variable, k, with [x] so that C([x]) = 100 − 100{(244[x]/251)^2/2 + (129/251)(244[x]/251) + 1}(251/495)^[x]. Unlike in part a, we are not modeling a bar graph for a discrete function. In general, if P(k) is a pmf, then its cdf is C([x]) = ∑_{k=−∞}^{[x]} P(k). So, P(k) is the magnitude of the jump in the graph of C([x]) at x = k. The probability bar graphs for parts a) and b) are shown below.
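The simplification of C(k) can be checked against a direct accumulation of the pmf. The Python sketch below (function names ours) compares the closed form with ∑ P(j) for j = 3 through k:

```python
import math

p, q = 244 / 495, 251 / 495   # probability of winning / losing one game

def pmf(k):
    """P(k): chance (in percent) that the third win arrives on game k."""
    return 100 * math.comb(k - 1, 2) * p**3 * q**(k - 3)

def cdf_closed(k):
    """C(k) = 100 - 100{(244k/251)^2/2 + (129/251)(244k/251) + 1}(251/495)^k."""
    t = 244 * k / 251
    return 100 - 100 * (t**2 / 2 + (129 / 251) * t + 1) * q**k
```

The two computations agree to floating-point precision, confirming the algebra.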

Example 4: Suppose that a high school English teacher weighs all minor assignments equally regardless of their point value and records each grade rounded to the nearest percent. Determine a formula to evaluate a student's grade, g, on any specific minor assignment given his or her raw score, x. Solution: Since x is the number of correct answers divided by the total number of answers, x ranges from zero to one and hence x is multiplied by 100 when converted into percent. The score is then rounded to the nearest percent. Since 100x rounded to the nearest percent is [100x + ½], g(x) = [100x + ½]. Example 5: You are traveling by train from a suburb to a city to give a presentation. a. An empty seven-car train pulls up to the station. Fifty people board the train. At least one car has at most M people and at least one car has at least L people. Determine the value of M + L. b. The train station has electronic depositing and vending. In order to attract price-elastic customers, the vending machine gives a bonus $1 for every integer multiple of $10 deposited. Since you have no money on your fare card, you deposit $d. How much money will you have on your fare card after making the deposit? Solution: a. The average number of people in a car is 50/7. Since a natural number of people must be in each car, some cars must have fewer passengers while others must have more. So, at least one car has at most 7 passengers and at least one car has at least 8 passengers. So, M + L = 15. This is an example of the generalized pigeonhole principle, which states that if n elements are divided into k sets, then at least one set has at most [n/k] elements and at least one set has at least −[−n/k] elements. In other words, if n + 1 pigeons are in n holes, then at least one hole must contain at least two pigeons. Although the pigeonhole principle is elementary, it is used extensively in number theory both in computations and proofs. b. The amount on your fare card will be $d and a bonus. 
Since you earn a bonus dollar for every integer multiple of $10 deposited, you earn [d/10] bonus dollars. So, you will have a total of $(d + [d/10]) on your fare card.
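Both results in this example reduce to one-line floor computations. A short Python sketch (helper names ours):

```python
import math

def pigeonhole_bounds(n, k):
    """Put n items into k sets: some set holds at most [n/k] items,
    and some set holds at least -[-n/k] items (the ceiling)."""
    return math.floor(n / k), -math.floor(-n / k)

def fare_card_total(d):
    """$1 bonus per full $10 deposited: total value is d + [d/10]."""
    return d + math.floor(d / 10)
```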

The greatest integer function is also useful in consumer mathematics. Suppose that a consumer has $d to spend on a particular good; the good is sold in a basic unit in small packages; the good is sold at a discount in a large unit that is an integer multiple of the basic unit in large packages; and the consumer must purchase a natural number of packages, if any. Then, the consumer should purchase as many large discounted packages as $d allows and thereafter purchase smaller packages with the money left over. With these assumptions, we can determine how much of a particular good a consumer can purchase with $d and the cost of any integer multiple of a basic unit of that good. In cases where the amount is a seemingly continuous variable, assume a very small basic unit. Example 6: Johnny Harrison is going to host a Halloween party at his house and invite several of his friends from school over. He knows his friends like M&Ms, so he buys as many as $d will allow at the local convenience store. He can buy a small 2-ounce bag of M&Ms for $2.17 or a large 10-ounce bag of M&Ms for $9.79. Assume that Johnny can only purchase an integer number of bags and sales tax is included in the price. a. Write an expression for the maximum amount of M&Ms Johnny can buy with $d in terms of d and evaluate the expression for d = 30 and d = 50. b. Johnny's parents want to offer each of his friends at least three small bags’

worth (6 ounces) of M&Ms so that they would not trick-or-treat after his party ends. Determine the minimum cost of 42 ounces of M&Ms. c. Suppose that the convenience store raises the price of its candy in anticipation of Halloween. Now, he can buy a small 2-ounce bag of M&Ms for $2.69 or a large 10-ounce bag of M&Ms for $9.99. Determine the minimum cost of 44 ounces of M&Ms and 48 ounces of M&Ms. Then, write an equation for the minimum cost in dollars, d, in terms of m ounces of M&Ms. Since M&Ms are sold in a basic unit of two ounces, assume m is even. Solution: a. Since the large bag has five times as many M&Ms as the small bag, but only costs 9.79/2.17, or about 4.51, times as much, the large bag is sold at a discount. Hence, we assume that Johnny will buy the large bag iff (if and only if) he can afford it. We also assume that Johnny will purchase as many small bags as he can iff he cannot afford another large bag. Johnny can purchase [d/9.79] large bags for $9.79[d/9.79]. He then has d − 9.79[d/9.79] = d1 dollars left over. He can purchase [d1/2.17] small bags thereafter. Therefore, he buys 10[d/9.79] ounces of M&Ms in large bags and 2[d1/2.17] ounces of M&Ms in small bags. Hence, with $d, Johnny can buy a total of 10[d/9.79] + 2[(d − 9.79[d/9.79])/2.17] ounces of M&Ms. So, Johnny can buy 30 ounces of M&Ms with $30 and 50 ounces of M&Ms with $50. b. For the maximum discount on 42 ounces of M&Ms, Johnny must purchase [42/10] large bags for a total of 10[42/10], or 40 ounces of M&Ms. Thereafter, he must purchase 42 − 40 ounces, or one small bag of M&Ms. So, Johnny must pay $9.79·4 + $2.17·1, or $41.33. c. As in part b, for the maximum discount on 44 ounces of M&Ms, Johnny must purchase four large bags and two small bags for a total cost of $9.99·4 + $2.69·2, or $45.34. At first glance, you might think that for the maximum discount on 48 ounces of M&Ms, he must purchase four large bags and four small bags for a total cost of $9.99·4 + $2.69·4, or $50.72.
Then again, if Johnny is clever enough, he would notice that he can buy five large bags for $49.95. Since he would have to purchase two more ounces of M&Ms to get the discount on another large bag, the minimum cost of 48 ounces of M&Ms is really $49.95. In general, when solving problems of this type, we must consider two cases. In the first case, the small bags needed (at most three, costing at most $8.07) cost less than one large bag. However, in the second case, the four small bags needed cost more than one large bag even though the large bag holds more M&Ms. Since we can only write one expression for cost, we must combine both cases. Regardless of the case, we can determine how many large bags he should buy using the greatest integer function. He must buy at least [m/10] large bags. Whether he should buy an additional large bag depends upon the remainder when m is divided by 10. If m mod 10 < 8, then he purchases three or fewer small bags for at most $8.07. If m mod 10 = 8, then he purchases an additional large bag for $9.99 instead. So, our intention is to round m/10 down to the nearest integer if m mod 10 < 8 and round m/10 up to the nearest integer if m mod 10 = 8. Since adding 2 to a remainder of 8 completes a multiple of 10, he purchases [(m + 2)/10] large bags. Thereafter, he must purchase (m − 10[m/10])/2 small bags if m mod 10 < 8 and 0 otherwise. In order to combine these two cases, we must multiply (m − 10[m/10])/2 by a function of m, f(m), such that f(m) = {1 if m mod 10 < 8 and 0 otherwise}. With a little ingenuity, we can express f(m) in terms of m. As stated above, [m/10] always rounds m/10 down to the nearest integer, but [(m + 2)/10] rounds m/10 down to the nearest integer if m mod 10 < 8 and up to the nearest integer if m mod 10 = 8. So, their difference, [m/10] − [(m + 2)/10], is 0 if m mod 10 < 8 and −1 if m mod 10 = 8. Hence, [m/10] − [(m + 2)/10] + 1 = f(m).
With a little algebra, we can express the number of small bags purchased as m/2 − 5[m/10] and f(m) more concisely as [m/10] − [(m − 8)/10]. So, d = 9.99[(m + 2)/10] + 2.69(m/2 − 5[m/10])([m/10] − [(m − 8)/10]). Example 7: The 10-point grading scale used by a history professor at a university has the following cutoffs: 90% 4.0; 87% 3.5; 80% 3.0; 77% 2.5; 70% 2.0; 67% 1.5; 60% 1.0. Express the number of grade-points, g, in terms of the grade in the course, x, where x is in percent. Assume that 60 < x < 97. Solution: We must first consider this scenario as a consumer problem. At the end of the semester, each student has a numerical grade of x% and must use it to purchase grade points. According to the grading scale, the first grade point costs 60% and every grade point thereafter costs 10%. In order to make the scale more uniform, we subtract 50 from x so that each integer grade point costs 10%. So, each student purchases [(x − 50)/10] integer grade points. And after purchasing an integer number of grade points, some students can afford to purchase half a grade point for 7%. Since integer grade points cost 10%, each student has (x − 10[x/10])% left over. Then each student purchases [(x − 10[x/10])/7] half-grade points. Hence, g(x) = [(x − 50)/10] + [(x − 10[x/10])/7]/2.
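The two closed forms just derived can be verified by brute force. The Python sketch below (all names ours) prices m ounces of M&Ms with the Example 6c formula, compares it with the cheapest combination of bags, and checks the grading formula of Example 7 against the professor's scale:

```python
import math

def mm_cost_formula(m):
    """Example 6c closed form, in cents: $9.99 large (10 oz), $2.69 small (2 oz)."""
    large = math.floor((m + 2) / 10)
    small = (m / 2 - 5 * math.floor(m / 10)) * (math.floor(m / 10) - math.floor((m - 8) / 10))
    return round(999 * large + 269 * small)

def mm_cost_brute(m):
    """Cheapest way to buy at least m ounces, in cents."""
    best = math.inf
    for b in range(m // 10 + 2):                  # number of large bags
        a = max(0, math.ceil((m - 10 * b) / 2))   # small bags still needed
        best = min(best, 999 * b + 269 * a)
    return best

def grade_points(x):
    """Example 7: g(x) = [(x - 50)/10] + [(x - 10[x/10])/7]/2, for 60 < x < 97."""
    return math.floor((x - 50) / 10) + math.floor((x - 10 * math.floor(x / 10)) / 7) / 2
```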

Section 1.2: An Analysis of the Greatest Integer Function

In this section, we analyze the deeper mathematical properties of the greatest integer function. While we consider the study of the greatest integer function an end in itself, the knowledge gained helps us work with the greatest integer function on a more solid foundation. We start with a thorough analysis of its continuity and discontinuities. Although we touched on continuity when introducing the greatest integer function in the previous section, we have yet to explore some of its deeper properties, which we now investigate using limits. We conclude by expressing the greatest integer function mathematically, that is, without using the bracket or int( ) notations.

Using limits, we can determine the extent of discontinuity of functions involving the greatest integer function. We define the magnitude of the jump in f(x) at x as lim h→0⁺ (f(x + h) − f(x − h)). We say that f(x) is continuous at x if the magnitude of the jump in f(x) at x is zero and lim h→0⁺ (f(x + h) − f(x)) = 0. If the former condition is met, but not the latter, then f(x) has a removable discontinuity at x. Further, where f(x) is differentiable, we define the derivative of f(x) at x as f ′(x) = lim h→0⁺ (f(x + h) − f(x − h))/(2h). If f(x) is differentiable at x, then f(x) is continuous at x, but the converse may not hold. For this reason, in the examples below, we evaluate f ′(x) everywhere except where at least one of the terms in f(x) has a jump discontinuity.

Example 1: Let f(x) = [x] + [−x] + 1. Determine f ′(x) or the magnitude of the jump in f(x) at all x. Solution: Since f(x) = [x] + [−x] + 1, the magnitude of the jump in f(x) at x is lim h→0⁺ {([x + h] + [−(x + h)] + 1) − ([x − h] + [−(x − h)] + 1)} = lim h→0⁺ {[x + h] − [x − h] + [−x − h] − [−x + h]}. If x ∉ ℤ, then f(x) does not have a jump discontinuity at x. For a sufficiently small h > 0, [x + h] = [x − h] and [−x − h] = [−x + h]. So, f ′(x) = lim h→0⁺ (f(x + h) − f(x − h))/(2h) = 0 for all x ∉ ℤ. If x ∈ ℤ, then some terms in f(x) have a jump discontinuity at x. If x = k, then for a sufficiently small h > 0, lim h→0⁺ ([k + h] − [k − h] + [−k − h] − [−k + h]) = k − (k − 1) + (−k − 1) − (−k) = 0. So, the magnitude of the jump in f(x) at x = k is 0. It follows that f(x) may be continuous at x = k and hence at all x, so we determine if lim h→0⁺ (f(x + h) − f(x)) = 0 at x = k to verify that. At x = k, lim h→0⁺ (f(x + h) − f(x)) = lim h→0⁺ {([k + h] + [−(k + h)] + 1) − ([k] + [−k] + 1)} = −1. So, even though the magnitude of the jump in f(x) at x = k is zero, f(x) is not continuous at x = k because its graph jumps up a unit at k and then crashes a unit just to the right of k.
In other words, f(x) has a removable discontinuity at integers. It turns out that we could have also defined f(x) above as f(x) = {1 if x ∈ ℤ and 0 if x ∉ ℤ}.

Example 2: Let f(x) = (−[x]^2 + (2x − 1)[x])/2. Determine f ′(x) or the magnitude of the jump in f(x) at all x. Solution: Since f(x) = (−[x]^2 + (2x − 1)[x])/2, the magnitude of the jump in f(x) at x is lim h→0⁺ {(−[x + h]^2 + (2(x + h) − 1)[x + h])/2 − (−[x − h]^2 + (2(x − h) − 1)[x − h])/2} = lim h→0⁺ {(x + h − ½)[x + h] + ([x − h]^2 − [x + h]^2)/2 + (h − x + ½)[x − h]}. If x ∉ ℤ, then f(x) does not have a jump discontinuity at x. For a sufficiently small h > 0, [x + h] = [x − h] = [x]. So, f ′(x) = lim h→0⁺ (f(x + h) − f(x − h))/(2h) = (2h)[x]/(2h) = [x] for all x ∉ ℤ. If x ∈ ℤ, then some terms in f(x) have a jump discontinuity at x. If x = k, then for a sufficiently small h > 0, lim h→0⁺ {(k + h − ½)[k + h] + ([k − h]^2 − [k + h]^2)/2 + (h − k + ½)[k − h]} = lim h→0⁺ (2hk − h) = 0. So, the magnitude of the jump in f(x) at x = k is 0. It follows that f(x) may be continuous at x = k and hence at all x, so we determine if lim h→0⁺ (f(x + h) − f(x)) = 0 at x = k to verify that. At x = k, lim h→0⁺ (f(x + h) − f(x)) = lim h→0⁺ {(−[k + h]^2 + (2(k + h) − 1)[k + h])/2 − (−[k]^2 + (2k − 1)[k])/2} = lim h→0⁺ hk = 0. So, f(x) is everywhere continuous even though some of its terms are not. Since f(x) is continuous at all x and f ′(x) = [x] except at countably many points, f(x) is a continuous antiderivative of [x].
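A numeric check of both claims, continuity at integers and slope [x] between them, is straightforward (the function name antider is ours):

```python
import math

def antider(x):
    """f(x) = (-[x]^2 + (2x - 1)[x]) / 2, a continuous antiderivative of [x]."""
    k = math.floor(x)
    return (-k * k + (2 * x - 1) * k) / 2
```

In particular, antider(n) = 0 + 1 + ... + (n − 1) = n(n − 1)/2, which is exactly the integral of [x] from 0 to n.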

We can also determine the limit of the greatest integer of a convergent infinite sequence. Let {sₙ}, n ∈ ℕ, denote an infinite sequence and let sₙ denote the nth term in the sequence. If {sₙ} converges to a constant, c, i.e. lim n→∞ sₙ = c, then {sₙ} is called a Cauchy sequence. Cauchy sequences are named after the French mathematician and physicist Augustin-Louis Cauchy because he gave the formal definition of convergence that is used today. Before continuing, we want to be precise about what convergence means. By definition, an infinite sequence converges to the constant, c, iff for all ε > 0 there is a first term in the sequence, n', such that |c − sₙ| < ε for all n > n'.

Let {sₙ} denote a Cauchy sequence with lim n→∞ sₙ ∉ ℤ. Then [lim n→∞ sₙ] = lim n→∞ [sₙ]. Proof: Let lim n→∞ sₙ = c and c = k + f for some integer, k, and fraction part, f, with 0 < f < 1. Then, [lim n→∞ sₙ] = k. Next, we must show that lim n→∞ [sₙ] = k. By definition of convergence, for all ε > 0 there is a first term in the sequence, n', such that |c − sₙ| < ε for all n > n'. Since both f > 0 and 1 − f > 0, we set ε = min(f, 1 − f). Then, the sequence has a first term, n', such that for all n > n', k < sₙ < k + 1 and hence lim n→∞ [sₙ] = k. Therefore, if lim n→∞ sₙ ∉ ℤ, then [lim n→∞ sₙ] = lim n→∞ [sₙ].

Let {sₙ} denote a monotone Cauchy sequence with lim n→∞ sₙ ∈ ℤ. If {sₙ} is monotone decreasing, then lim n→∞ sₙ = lim n→∞ [sₙ] and if {sₙ} is monotone increasing, then lim n→∞ sₙ = lim n→∞ [sₙ] + 1. Proof: Let lim n→∞ sₙ = k. If {sₙ} is monotone decreasing, then the sequence has a first term, n', such that sₙ − k < 1 for all n > n'. And if sₙ − k < 1 and {sₙ} is monotone decreasing, then 0 < sₙ − k < 1; k < sₙ < k + 1; and so [sₙ] = k. Since [sₙ] = k for all n > n', lim n→∞ [sₙ] = k. Therefore, if {sₙ} is monotone decreasing and lim n→∞ sₙ ∈ ℤ, then lim n→∞ sₙ = lim n→∞ [sₙ]. If {sₙ} is monotone increasing, then the sequence has a first term, n', such that k − sₙ < 1 for all n > n'. And if k − sₙ < 1 and {sₙ} is monotone increasing, then 0 < k − sₙ < 1; −k < −sₙ < −k + 1; k > sₙ > k − 1; and so [sₙ] = k − 1. Since [sₙ] = k − 1 for all n > n', lim n→∞ [sₙ] = k − 1. Therefore, if {sₙ} is monotone increasing and lim n→∞ sₙ ∈ ℤ, then lim n→∞ sₙ = lim n→∞ [sₙ] + 1.

Example 3: Evaluate a. lim n→∞ [sₙ] where sₙ = 3(1 + e^−n) and b. lim n→∞ [sₙ] where sₙ = 3(1 − e^−n). Solution: a. Since lim n→∞ e^−n = 0, lim n→∞ 3(1 + e^−n) = 3. Since 3(1 + e^−n) > 3 for all n, {sₙ} is monotone decreasing and hence lim n→∞ [sₙ] = 3. b. Since lim n→∞ e^−n = 0, lim n→∞ 3(1 − e^−n) = 3. Since 3(1 − e^−n) < 3 for all n, {sₙ} is monotone increasing and hence lim n→∞ [sₙ] = 2.
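These two limits are easy to watch numerically. In the Python sketch below (names ours), the floors of both sequences lock in once e^−n < 1/3, illustrating that a monotone increasing sequence with an integer limit has lim [sₙ] one less than its limit:

```python
import math

def floor_decreasing(n):
    """[s_n] for s_n = 3(1 + e^-n), which decreases to 3."""
    return math.floor(3 * (1 + math.exp(-n)))

def floor_increasing(n):
    """[s_n] for s_n = 3(1 - e^-n), which increases to 3."""
    return math.floor(3 * (1 - math.exp(-n)))
```

For very large n, 1 − e^−n rounds to 1.0 in floating point and the distinction disappears, so the check below stops at n = 30.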

Example 4: Let f(x) = [x·sin(90/x)]/(πx)^2 where sine is in radians. Express ∑_{k=1}^∞ f(k) in exact form using the identity ∑_{k=1}^∞ 1/(πk)^2 = ⅙. Assume ∑_{k=1}^∞ f(k) converges and {k·sin(90/k)} is monotone increasing for all k > 57. Solution: At first glance, this example seems like a tall order, but upon closer inspection, we see how we can express ∑_{k=1}^∞ f(k) as a finite sum. Since lim x→∞ x·sin(90/x) = 90 and x·sin(90/x) < 90 for all x, lim x→∞ [x·sin(90/x)] = 89. Since {k·sin(90/k)} is monotone increasing for all k > 57 and 347 is the largest value of k for which k·sin(90/k) < 89, [k·sin(90/k)] = 89 for all k > 347. Since f(k) = [k·sin(90/k) − 89]/(πk)^2 + 89/(πk)^2 and [k·sin(90/k) − 89] = 0 for all k > 347, ∑_{k=1}^∞ f(k) = ∑_{k=1}^{347} [k·sin(90/k) − 89]/(πk)^2 + ∑_{k=1}^∞ 89/(πk)^2. Using the identity, the latter term evaluates to 14⅚. Hence, ∑_{k=1}^∞ f(k) = 14⅚ + ∑_{k=1}^{347} [k·sin(90/k) − 89]/(πk)^2.
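The collapse to a finite sum can be confirmed numerically. The sketch below (variable names ours) compares the closed form with a large partial sum of f(k) taken directly; the discarded tail beyond N contributes less than 89/(π²N):

```python
import math

def f(k):
    """f(k) = [k sin(90/k)] / (pi k)^2, sine in radians."""
    return math.floor(k * math.sin(90 / k)) / (math.pi * k) ** 2

# closed form: 89/6 plus a finite correction over k = 1..347
closed = 89 / 6 + sum((math.floor(k * math.sin(90 / k)) - 89) / (math.pi * k) ** 2
                      for k in range(1, 348))

# direct partial sum of the series
N = 200_000
partial = sum(f(k) for k in range(1, N + 1))
tail_bound = 89 / (math.pi ** 2 * N)
```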

The greatest integer function has no explicit inverse function. The horizontal line, y = k, intersects y = [x] at infinitely many points, (x, k), where k ≤ x < k + 1. For this reason, the transformation x → [x] is a narrowing conversion. It follows that unless the fraction part of x is known, there is no way to solve for x in terms of k. Although the greatest integer function has no explicit inverse, it possesses certain properties that allow an explicit function to serve as its inverse in the event that its wafers and discontinuities are interchanged. This explicit function is ultimately used to create an imaginary inverse function for y = [x]. In order to find this explicit function, we must first compare the wafers and discontinuities on the graphs of y = [x] and [y] = x at x = k and at y = k. As in Chapter 1, the wafers are in black and the jump discontinuities are in gray. We provide a complete graph of y = [x] below. As discussed in Chapter 1, its jump discontinuity at x = k extends over [k − 1, k) for y and its wafer at y = k extends over [k, k + 1) for x. We also provide a complete graph of [y] = x below. Since the inverse function merely switches x and y (by reflecting each point across the line y = x), its jump discontinuity at y = k extends over [k − 1, k) for x and its wafer at x = k extends over [k, k + 1) for y. Now, imagine that we interchange the wafers and the jump discontinuities of [y] = x. Since the vertical wafers of [y] = x contain all the values of y, with each value of y corresponding to one value of x, the equation generated by the interchange of the wafers and jump discontinuities of [y] = x is a function. There is more than one way to interchange the wafers and jump discontinuities of [y] = x. We do so by interchanging each point with an x-coordinate of k with each point with a y-coordinate of k on the graph [y] = x (by reflecting each point with one coordinate, k, across the line y = 2k − x).
Let g(x) denote the function generated by reflecting each point with one

coordinate, k, across the line y = 2k − x. Then, the wafer on g(x) at y = k extends over (k − 1, k] for x and the jump discontinuity in g(x) at x = k extends over (k, k + 1] for y. Equivalently, its jump discontinuity at x = k extends over (k, k + 1] for y and its wafer at y = k + 1 extends over (k, k + 1] for x. So, g(x) = {x if x ∈ ℤ and [x] + 1 otherwise}. Recall from Chapter 1 that ceiling(x) = {x if x ∈ ℤ and [x] + 1 otherwise}. Therefore, g(x) = ceiling(x) = −[−x]. The motivation for reflecting each point on [y] = x with one coordinate, k, across the line y = 2k − x is so that the point (k, k) is reflected upon itself and so it remains on a wafer. Suppose we were to add the auxiliary vertical line, x = r, in the diagram below. If r ∉ ℤ, then the line would intersect the graph of [y] = x on its discontinuity and at exactly one point, (r, [r] + 1). If r ∈ ℤ, then the line would intersect the graph of [y] = x at infinitely many points, making it a little trickier for us to define g(k). Since y = [x] acts as an identity function for the set of integers, i.e. [k] = k, g(x) would be more like an inverse function if it also acts as an identity function for the set of integers. More importantly, since [[x]] = [x] for all x, we want g(k) = k so that g(g(x)) = g(x) for all x.

Since [x] has no real explicit inverse, we took a few bold steps to create an imaginary inverse. Between their wafers and jump discontinuities, the graphs of y = −[−x] and [y] = x have all the same points. In fact, their graphs would be indistinguishable had we not assigned different colors to the wafers and jump discontinuities. However, since −[−[x]] ≠ x, g(x) alone cannot serve as the inverse function. We define the imaginary inverse of the greatest integer function as follows. Let int⁻¹(x) denote the imaginary inverse of [x] and let p(x) and q(y) be functions such that p(x) = [q(y)]. Then, int⁻¹(p(x)) = −[−p(x)] and int⁻¹([q(y)]) = q(y). This way, if q(y) ∈ ℤ, then int⁻¹(p(x)) = q(y) and if q(y) ∉ ℤ, then p(x) < q(y) < p(x) + 1, meaning q(y) is on the interval of the jump discontinuity in the graph of y = −[−p(x)] at x. The imaginary inverse is obviously fictitious because it performs different operations on each side of the equation. Nevertheless, in the next few examples, we demonstrate that the imaginary inverse really works! The imaginary inverse allows us to solve for variables in simple equations involving the greatest integer function algebraically and find an equation which looks and behaves like the inverse with the exception that the wafers and discontinuities are interchanged.

Example 5: Complete each part using the imaginary inverse of the greatest integer function. a. Solve y = 1 − [x]/6 for x and then use the imaginary inverse again to solve for y. b. Let 5 = [6x + ½] and 5½ = [6z + ½]. Write an inequality for x and solve for z. c. Let f(k) denote the probability that we must flip a coin k times to get heads. Determine how many times we must flip a coin to have at least a 93¾% chance of getting heads at least once and at least a 95% chance of getting heads at least once. Solution: a. In this case, we use the imaginary inverse to solve for x. Since y = 1 − [x]/6; 6 − 6y = [x]; int⁻¹(6 − 6y) = int⁻¹([x]); −[−(6 − 6y)] = −[6y − 6] = 6 − [6y] = x.
Next, we solve x = −[−(6 − 6y)] for y. Since x = −[−(6 − 6y)]; −x = [−(6 − 6y)]; int⁻¹(−x) = int⁻¹([−(6 − 6y)]); −[−(−x)] = −(6 − 6y); [x] = 6 − 6y; and so y = 1 − [x]/6. b. Set y = 5 and then solve y = [6x + ½] for x. Since y = [6x + ½]; int⁻¹(y) = int⁻¹([6x + ½]); −[−y] = 6x + ½; and so (−[−y] − ½)/6 = x. At y = 5, (−[−y] − ½)/6 has a discontinuity because the ceiling function is not continuous at integers. Since the value of x can lie anywhere within the discontinuity, we seek its upper and lower bounds by evaluating −[−y] at values of y just to the left or right of the discontinuity, namely at lim h→0⁺ (5 − h) and lim h→0⁺ (5 + h). Since lim h→0⁺ (−[−(5 − h)] − ½)/6 = ¾ and lim h→0⁺ (−[−(5 + h)] − ½)/6 = 11/12, but 6 = [6(11/12) + ½], x cannot reach 11/12. Hence, ¾ ≤ x < 11/12. In this case, the equation was simple enough for us to solve 5 = 6x + ½ and 6 = 6x + ½ for x and then determine the interval for x. However, in cases where the equation is a more complicated iteration of the greatest integer function (such as a sandwich-iteration), solving for the inequality piecemeal could

become tedious. Next, we solve 5½ = [6z + ½] for z. Although there is obviously no solution, the imaginary inverse provides the extraneous solution, z = 11/12. Since limh0⁺ [6(11/12 + h) + ½] = 6 and limh0⁺ [6(11/12 − h) + ½] = 5, the graphs of y = 5½ and y = [6z + ½] intersect at z = 11/12. So, the question, “For what value of z is 5½ = [6z + ½]?” has no answer, but the question, “At what values of z do the graphs of y = 5½ and y = [6z + ½] intersect (including intersection at discontinuities)?” can have a very relevant answer as we will see in part c. This bad example demonstrates that the imaginary inverse must be applied in the proper context. c. The pmf for getting heads for the first time on the kth flip is f(k) = 2−k and it’s cdf is F(x) = 1 − 2−[x]. Since x represents the number of flips and the value of F(x) is given in the exercise, we solve for x and evaluate the expression at F(x). Since F(x) = 1 − 2−[x]; −log(1 − F(x))/log(2) = [x]; int−1(−log(1 − F(x))/log(2)) = int−1([x]); and −[log(1 − F(x))/log(2)] = x. At F(x) = 0.9375, −[log(1 − F(x))/log(2)] = 4 and at F(x) = 0.95, −[log(1 − F(x))/log(2)] = 5.
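The coin-flip computation in part c can be sketched numerically. This is a minimal check, assuming the cdf F(x) = 1 − 2^−[x] from the text; the function names `min_flips` and `min_flips_bruteforce` are ours, not the text's.

```python
import math

def min_flips(F):
    """Smallest integer x with 1 - 2**(-x) >= F, via the imaginary-inverse
    formula x = -[log(1 - F)/log(2)] derived in part c."""
    return -math.floor(math.log2(1 - F))

def min_flips_bruteforce(F):
    """Cross-check: walk the cdf F(x) = 1 - 2**(-x) directly."""
    x = 1
    while 1 - 2**(-x) < F:
        x += 1
    return x

print(min_flips(0.9375), min_flips(0.95))  # 4 5
```

Both approaches agree with the values found in the text: 4 flips for a 93¾% chance and 5 flips for a 95% chance.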

The greatest integer function can also be used to study set spanning of integers and rationals. To start with a trivial example, the set {[x]}_{x∈ℝ} is the set of all integers. In 2001, an Israeli mathematician proved that if f(x) = 1/(1 + 2[x] − x), x₁ = 0, and x_{n+1} = f(x_n), then {x_n}_{n∈ℕ} contains every rational number exactly once. In 2003, I determined that if g(n) = 1½ + ½[−(√(2n + ¼) + ½)] − (n − 1)/[−(√(2n + ¼) + ½)], then {g(n)}_{n∈ℕ} is the set of all rationals between 0 and 1. The most direct way to verify this is to make a table of consecutive values for g(n). Expect to find a very deliberate ordering of the rationals. Although infinitely many values of n generate each rational, g(n) generates every ordered pair of natural numbers (k, m) with k < m, as the fraction k/m, exactly once. This result illustrates that the set of rational numbers is countable because we can pair each element in the set, {g(n)}, with a natural number, n. These next two results are more famous examples of set spanning. The fraction part set, F = {nx − [nx]}_{n∈ℕ}, is dense over the interval (0, 1) iff x is irrational. (Kronecker) By dense we mean that for every fraction part, f, there is no ε > 0, however small, such that |nx − [nx] − f| > ε for all n ∈ ℕ. Proof: Assuming {nx − [nx]} is dense over (0, 1), it is obvious that x ∉ ℚ, for if x = k/m with k, m ∈ ℤ, then all elements of this set would be rationals with denominator m and hence they would all be either indistinct or spaced apart at intervals of 1/m. Assuming x ∉ ℚ, there are no n, m ∈ ℕ such that nx − [nx] = mx − [mx], for if there were, then x = ([mx] − [nx])/(m − n) ∈ ℚ. Since there are infinitely many distinct fraction parts within an interval of finite length, F must have elements such that 0 < (mx − [mx]) − (nx − [nx]) < ε for all ε > 0. But then (mx − [mx]) − (nx − [nx]) = (mx − [mx]) − (nx − [nx]) − [(mx − [mx]) − (nx − [nx])] = (m − n)x − [(m − n)x] ∈ F. So, F contains arbitrarily small elements.
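The ordering produced by g(n) can be tabulated exactly with rational arithmetic. This is a sketch under one assumption we introduce ourselves: since √(2n + ¼) + ½ = (√(8n + 1) + 1)/2, the inner floor can be computed with integer arithmetic via `isqrt`, avoiding floating-point error; the helper name `g` follows the text.

```python
from fractions import Fraction
from math import isqrt

def g(n):
    # d = [-(sqrt(2n + 1/4) + 1/2)]. Because sqrt(2n + 1/4) + 1/2 equals
    # (sqrt(8n + 1) + 1)/2, we can evaluate the floor exactly with isqrt.
    s = isqrt(8 * n + 1)                                   # [sqrt(8n + 1)]
    c = (s + 1) // 2 + (0 if s * s == 8 * n + 1 else 1)    # ceil((sqrt(8n+1)+1)/2)
    d = -c
    return Fraction(3, 2) + Fraction(d, 2) - Fraction(n - 1, d)

first = [g(n) for n in range(1, 11)]
print(first)  # the deliberate ordering begins 1/2, 1/3, 2/3, 1/4, 1/2 (= 2/4), 3/4, ...
```

Tabulating further values shows the pairs (k, m) appearing in order of increasing denominator m, which is exactly the pairing with ℕ that makes the rationals countable.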
If 0 < k((m − n)x − [(m − n)x]) < 1, then k((m − n)x − [(m − n)x]) = k((m − n)x − [(m − n)x]) − [k((m − n)x − [(m − n)x])] = k(m − n)x − [k(m − n)x] ∈ F. And hence 0 < (k(m − n)x − [k(m − n)x]) − ((k − 1)(m − n)x − [(k − 1)(m − n)x]) < ε for all ε > 0. It follows that elements of F are distributed throughout every open subinterval of (0, 1) and so F is dense in (0, 1). In 1909 Hermann Weyl proved the even stronger result that the fraction part set is equidistributed in (0, 1) iff x is irrational. This means that for every open subinterval (y, y′) ⊆ (0, 1), lim_{n→∞} #{nx − [nx] ∈ (y, y′)}/n = y′ − y. For example, out of the first 10,000 elements in the sequence, you would intuitively predict that roughly 100 elements are in the interval (0.76, 0.77). Since the proof involves analytical methods beyond the scope of this work, we omit it here, but it is given in Fourier Analysis: An Introduction by Stein and Shakarchi; see appendix. Sequences of the form {[nx]}_{n∈ℕ}, where x is a positive irrational, are known as Beatty sequences, named after Canadian mathematician Samuel Beatty, who wrote about them in 1926. They also have the following interesting property, first proved by Lord Rayleigh in 1894. The sequences {[nx]} and {[ny]} with x, y > 0 and n ∈ ℕ partition the positive integers iff x, y ∉ ℚ and 1/x + 1/y = 1. In other words, {[nx], [ny]}_{n∈ℕ} contains every natural number exactly once. We devote the next two paragraphs to the proof. First, we suppose {[nx]}⋂{[ny]} is empty and {[nx]}⋃{[ny]} is the set of positive integers and prove x, y ∉ ℚ and 1/x + 1/y = 1. Certainly x and y cannot both be rational, for if they were, say x = x₁/x₂ and y = y₁/y₂, then with n_x = x₂y₁ and n_y = x₁y₂, we would have n_x·x = x₁y₁ = n_y·y, meaning both sets would share a common integer. In addition, no set can share a common integer with itself, so x, y > 1. Let's count the number of integers in {[nx] ≤ k}. That would be the largest n such that [nx] ≤ k and, equivalently, nx < k + 1. So, n < (k + 1)/x and n ≤ [(k + 1)/x].
Hence, [(k + 1)/x] is the number of integers n such that [nx] ≤ k unless (k + 1)/x ∈ ℤ, which we later prove is not the case. So for now we replace [(k + 1)/x] with −[−(k + 1)/x] − 1 to eliminate that case. By hypothesis, the number of integers in {[ny] ≤ k} is therefore k − (−[−(k + 1)/x] − 1) = [(k + 1) − (k + 1)/x] = [(k + 1)(1 − 1/x)] = [(k + 1)/(1 − 1/x)^−1] = [(k + 1)/y], simultaneously proving 1/x + 1/y = 1 and x, y ∉ ℚ. Next, we suppose x, y ∉ ℚ and 1/x + 1/y = 1 and prove {[nx]}⋂{[ny]} is empty and {[nx]}⋃{[ny]} is the set of positive integers. We prove it using induction on k. For the base case, k = 1, we show that exactly one of [x] or [y] is one. If [x] = 1 and x ∉ ℚ, then 1 < x < 2; hence 2 < (1 − 1/x)^−1 = y; and 2 ≤ [y]. If [x] = 0 and x ∉ ℚ, then 0 < x < 1; 1 < 1/x; 1 − 1/x < 0; 1/y < 0 and hence y < 0, a contradiction. It follows that x, y > 1. If [x] ≥ 2 and x ∉ ℚ, then working backwards, we find [y] = 1. For the inductive case, we assume that {[nx] < k}⋂{[ny] < k} is empty and {[nx] < k}⋃{[ny] < k} is the set of positive integers (up to k − 1) and prove the result holds up to k. We can do this by showing there is exactly one element in exactly one set such that [nx] = k or [ny] = k. Since x, y > 1 and x, y ∉ ℚ, we demonstrated above that the number of integers in {[nx] ≤ k} is [(k + 1)/x]. It follows that [(k + 1)/x] + [(k + 1)/y] = [(k + 1)/x] + [(k + 1)/(1 − 1/x)^−1] = [(k + 1)/x] + [(k + 1)(1 − 1/x)] = [(k + 1)/x] + [(k + 1) − (k + 1)/x] = [(k + 1)/x] + [−(k + 1)/x] + k + 1 = −1 + k + 1 = k. Since [(k + 1)/x] + [(k + 1)/y] = k, there is exactly one more element between both sets, and hence the sets are partitioned up to {[nx] ≤ k} and {[ny] ≤ k}. There is an interesting corollary illustrated by a special case of the Beatty sequence. First and second powers of φ = (√5 + 1)/2 nested within iterations of the greatest integer function uniquely span the set of natural numbers above one iff [φ²] is at its core. For example, [φ²] = 2 and [φ²[φ[φ[φ²]]]] = 10. The proof follows by induction on k (above) and is left as an exercise.
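Rayleigh's partition theorem is easy to spot-check numerically. The sketch below, using the standard complementary pair x = √2 and y = x/(x − 1) = 2 + √2 (our choice of example, satisfying 1/x + 1/y = 1), verifies that the two Beatty sequences cover each positive integer up to a limit exactly once.

```python
import math

x = math.sqrt(2)
y = x / (x - 1)          # = 2 + sqrt(2), so 1/x + 1/y = 1
limit = 10000

# Collect [n*x] and [n*y]; n runs far enough that every value <= limit appears.
A = {math.floor(n * x) for n in range(1, limit + 1)}
B = {math.floor(n * y) for n in range(1, limit + 1)}

covered = sorted(v for v in (A | B) if v <= limit)
overlap = {v for v in (A & B) if v <= limit}
print(covered == list(range(1, limit + 1)), overlap == set())  # True True
```

Floating-point floors are safe here because n√2 stays far enough from integers for n of this size; an exact check would use integer square roots instead.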

The remainder of this section is devoted to redefining the greatest integer function. We started Section 1.1 by defining the greatest integer function in English, justifying its existence without proof, and assigning it the bracket notation. Here, we derive additional mathematical expressions for it. Expressing [x] mathematically is equivalent to defining truncation to the nearest integer mathematically. And since truncation requires the discrimination between integers and fraction parts, we would in effect be defining them both mathematically. Our concept of integers and integer functions would be transformed from an abstract idea to a tangible expression or equation that we can manipulate mathematically, thereby allowing us to work with them on a more solid foundation. Our approach is to find additional expressions for ((x)) first. We do so by exploiting its periodicity.

The late 18th and early 19th century French mathematician Joseph Fourier demonstrated how we can represent any well-defined periodic function as an infinite series of sines and cosines (in radians). We derive the following result in the next three paragraphs: [x] = {x if x ∈ ℤ and x − ½ + ∑_{n=1}^∞ sin(2πnx)/(πn) if x ∉ ℤ}. If f(x) is any piecewise continuous function over the interval (−t, t) and f(x) = f(x + 2t), then we can express f(x) as a Fourier series expansion using Euler formulas. The Fourier series for f(x) has three components. The first component is called a₀, which is an initial-valued constant equal to (1/t)∫_{−t}^{t} f(x) dx. The second component is called aₙ, which is the coefficient of the nth cosine term: aₙ = (1/t)∫_{−t}^{t} f(x)cos(nπx/t) dx. The third component is called bₙ, which is the coefficient of the nth sine term: bₙ = (1/t)∫_{−t}^{t} f(x)sin(nπx/t) dx. The integrals for the aₙ and bₙ components are known as Euler formulas and the Fourier series for f(x) is a₀/2 + ∑_{n=1}^∞ {aₙcos(nπx/t) + bₙsin(nπx/t)}. Note that this does not necessarily mean that a₀/2 + ∑_{n=1}^∞ {aₙcos(nπx/t) + bₙsin(nπx/t)} converges to f(x) for all x. We elaborate on the matter of convergence below. In order to express ((x)) as a Fourier series, we must first determine the interval and a piecewise expression for ((x)) over this interval. Since ((x)) has a period of one unit, it must be periodic over (−½, ½), so we set t = ½. Since ((x)) = ((x + k)), ((−k/2)) = ((k/2)). So, we can set t = k/2 for k > 1 and arrive at the same result; however, that would make the computations more involved than they have to be. Since we cannot easily integrate functions involving [x] using the bracket notation, we seek an alternative expression for ((x)) over the interval (−½, ½). Since [x] = {−1 for −½ ≤ x < 0 and 0 for 0 < x ≤ ½}, ((x)) = {x − ½ + 1 for −½ ≤ x < 0 and x − ½ for 0 < x ≤ ½}. At this point, we have what we need to find the Fourier series for ((x)) using Euler formulas.
a₀ = ∫_{−½}^{½} 2(x − ½) dx + ∫_{−½}^{0} 2 dx = (x² − x)|_{−½}^{½} + 2x|_{−½}^{0} = −1 + 1 = 0;
aₙ = ∫_{−½}^{½} 2(x − ½)cos(2πnx) dx + ∫_{−½}^{0} 2cos(2πnx) dx = {2πnx·sin(2πnx) + cos(2πnx) − πn·sin(2πnx)}/(2π²n²)|_{−½}^{½} + sin(2πnx)/(πn)|_{−½}^{0} = {(−1)ⁿ − (−1)ⁿ}/(2π²n²) + 0 = 0 for all n ∈ ℕ; and
bₙ = ∫_{−½}^{½} 2(x − ½)sin(2πnx) dx + ∫_{−½}^{0} 2sin(2πnx) dx = {sin(2πnx) − 2πnx·cos(2πnx) + πn·cos(2πnx)}/(2π²n²)|_{−½}^{½} − cos(2πnx)/(πn)|_{−½}^{0} = −(−1)ⁿ/(πn) + ((−1)ⁿ − 1)/(πn) = −1/(πn) for all n ∈ ℕ.

Therefore, the Fourier series for ((x)) is −∑_{n=1}^∞ sin(2πnx)/(πn) and hence our Fourier representation for [x] is x − ½ + ∑_{n=1}^∞ sin(2πnx)/(πn). For all real x, ((x)) = −∑_{n=1}^∞ sin(2πnx)/(πn). Proof: If x ∈ ℤ, then i. ((x)) = 0 by definition and ii. sin(2πnx) = 0 for all n ∈ ℕ and hence −∑_{n=1}^∞ sin(2πnx)/(πn) = 0. If x ∉ ℤ, then ((x)) = x − [x] − ½ by definition and x − [x] − ½ is differentiable at every x over the interval (k, k + 1). Let f be a Riemann-integrable function that is differentiable at a specific point, x. Then, the Fourier series for f evaluated at x converges to f(x). Since the proof involves analytical methods beyond the scope of this work, we omit it here, but it is given in Fourier Analysis: An Introduction by Stein and Shakarchi; see appendix. Therefore, ((x)) = −∑_{n=1}^∞ sin(2πnx)/(πn) for all real x. So, if x ∉ ℤ, then x − ½ + ∑_{n=1}^∞ sin(2πnx)/(πn) = [x] and if x ∈ ℤ, then x − ½ + ∑_{n=1}^∞ sin(2πnx)/(πn) = [x] − ½. The above theorem provides a criterion for determining convergence of Fourier series that we can generalize. First, the Fourier series for f evaluated at x cannot be expected to converge to f(x) at its discontinuities. For example, ((x)) and x − [x] − ½ have the same Fourier series even though their values differ at integers. Second, since sin(nπx/t) and cos(nπx/t) are uniformly continuous functions of x for all n ∈ ℕ and all t > 0, every finite linear combination of them must also be uniformly continuous. It follows that every partial sum of the Fourier series for f(x) is uniformly continuous. Since every partial sum of the Fourier series for f(x) is uniformly continuous, the Fourier series for f(x) must converge to the midpoint of any jump discontinuities in f(x). The Fourier series for ((x)) is accurate after summing its first few terms.
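The convergence of the sine series to ((x)) at non-integer points is easy to check numerically. A minimal sketch (function names are ours):

```python
import math

def sawtooth(x):
    """((x)) = x - [x] - 1/2, the sawtooth wave, for non-integer x."""
    return x - math.floor(x) - 0.5

def sawtooth_series(x, terms):
    """Partial sum of the Fourier series -sum_{n=1}^{terms} sin(2*pi*n*x)/(pi*n)."""
    return -sum(math.sin(2 * math.pi * n * x) / (math.pi * n)
                for n in range(1, terms + 1))

for x in (0.25, 0.7, -1.3):
    print(x, sawtooth(x), sawtooth_series(x, 2000))
```

With 2000 terms the partial sums agree with ((x)) to a few decimal places at these points; the slow, alternating-harmonic rate of convergence is visible if you lower the term count.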
Since the sine function oscillates between −1 and 1, the Fourier series for ((x)) is an alternating harmonic series of the first degree, meaning the rate at which the series converges is similar to that of the Leibniz series discussed in the next chapter, which is slow but steady. In fact, after adding the fifth sine term to the series, its graph becomes a rough approximation for ((x)) (see below). Example 6: Let l(t) denote the retinal luminance in trolands of a flickering red light-emitting diode in a circuit over time, where t is in seconds. Suppose that at time t = 0, the luminance is 300 trolands. Every 5 seconds, the luminance increases linearly from 300 trolands to 500 trolands and then decreases to 300 trolands in an instant. Then, l(t) = 40(t − 5[t/5]) + 300 and its Fourier series representation is l(t) = 400 − 200∑_{n=1}^∞ sin(2πnt/5)/(πn). To demonstrate the efficacy of the Fourier representation for l(t), we compare l(t) to partial sums of its Fourier representation below. We display a graph of l(t) (left), the first 5 trigonometric terms in the series (center), and the first 20 trigonometric terms in the series (right). Notice that l₂₀(t) slightly overshoots the 300 to 500 range. This is known as the Gibbs phenomenon.
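The Gibbs overshoot of l₂₀(t) can be measured directly; the grid resolution below is our own choice, not the text's.

```python
import math

def l_partial(t, terms=20):
    """Partial Fourier sum l_N(t) = 400 - 200 * sum sin(2*pi*n*t/5)/(pi*n)."""
    return 400 - 200 * sum(math.sin(2 * math.pi * n * t / 5) / (math.pi * n)
                           for n in range(1, terms + 1))

# Scan one period t in [0, 5) for the maximum of the 20-term partial sum.
peak = max(l_partial(k / 1000) for k in range(5000))
print(peak)  # exceeds the 500-troland ceiling near the jump at t = 5
```

The measured peak is roughly 518 trolands; the overshoot (about 9% of the 200-troland jump) does not shrink as more terms are added, which is the hallmark of the Gibbs phenomenon.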

The main advantage of expressing a piecewise-continuous function as a trigonometric series is that each term in the series is infinitely differentiable and integrable. The applications for differentiating Fourier representations of terms with the greatest integer function are few because the derivative of their Fourier representation does not converge. On the other hand, the applications for integrating Fourier representations of terms with the greatest integer function are far-reaching because the indefinite integral of their Fourier representation not only converges, but converges faster than the Fourier representation for [x]. In addition, it is often difficult if not impossible to determine the antiderivative of a composite function involving the greatest integer function without using series integration. For integrating functions of the form [f(x)], you can use the identity ∫∑_{n=1}^∞ f′(x)sin(2πnf(x))/(πn) dx = c − ∑_{n=1}^∞ cos(2πnf(x))/(2π²n²). If you want to integrate functions of the form f([x]) to find a Fourier representation for the cumulative sum of f(k) or extend its infinite series representation to the complex plane, consider using the identity ∑_{k=m}^{n} f(k) = ∫_{m}^{n} f(x) dx − ∑_{j=1}^∞ ∫_{m}^{n} f′(x)sin(2πjx)/(πj) dx + (f(m) + f(n))/2. This is a special case of summation by parts, which we introduce in Chapter 2. In Chapter 6, we use successive antiderivatives of the sawtooth wave function to derive identities between the greatest integer function and a class of Clausen functions that we use for evaluating a large subset of Fourier series either exactly or with very high precision. We devote the next example to developing theory for the latter objective.
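The summation identity can be spot-checked for f(x) = x² on [1, 10], where ∑ k² = 385. The closed form −9/(πj)² for the inner integral is our own integration-by-parts computation for this particular f, not a formula from the text.

```python
import math

# Check sum_{k=1}^{10} k^2 = 385 against
#   int_1^10 x^2 dx - sum_j int_1^10 2x*sin(2*pi*j*x)/(pi*j) dx + (f(1) + f(10))/2.
# For f(x) = x^2 with integer endpoints, integration by parts gives
#   int_1^10 2x*sin(2*pi*j*x)/(pi*j) dx = -9/(pi*j)^2,
# so the middle (correction) sum is +sum_j 9/(pi*j)^2.
integral = (10**3 - 1**3) / 3                      # 333.0
endpoints = (1**2 + 10**2) / 2                     # 50.5
correction = sum(9 / (math.pi * j)**2 for j in range(1, 200001))
total = integral + endpoints + correction
print(total)  # approaches 385 as the truncation point grows
```

The correction sum converges to 9ζ(2)/π² = 3/2 exactly, so the three pieces recombine to 333 + 50.5 + 1.5 = 385.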

Example 7: We have two representations for the sawtooth wave function: as x − [x] − ½ for x ∉ ℤ and as a sine series. Evaluate successive antiderivatives for both expressions of the sawtooth wave function and equate the two. After evaluating each antiderivative, choose the constant, c, that yields a pure sine or cosine series. Solution: Let fₖ(x) denote the kth antiderivative of the periodic function y = 1 over the interval (0, 1) such that fₖ(x) = ±2∑_{n=1}^∞ sin(2πnx)/(2πn)ᵏ for odd k and fₖ(x) = ±2∑_{n=1}^∞ cos(2πnx)/(2πn)ᵏ for even k. And let cₖ denote the integration constant for fₖ(x). Then, f₀(x) = 1 and f₁(x) = x + c₁. We set c₁ = −½ because the sawtooth wave function is x − ½ over (0, 1) and it has the pure sine series f₁(x) = −2∑_{n=1}^∞ sin(2πnx)/(2πn). When determining fₖ(x) for k > 1 over (0, 1), we can remove all [x]-terms because [x] = 0 over (0, 1), and we can reinsert them at the end by replacing each x with x − [x] because fₖ(x) has a period of one unit. For k > 1, we solve for cₖ by evaluating the series for fₖ(0). The reader may verify that for k > 1, cₖ = −2cos(πk/2)ζ(k)/(2π)ᵏ, where ζ(k) is the Riemann zeta function. Some important values for ζ(k) up to k = 6 are presented in Table 1 below. And the expressions for fₖ(x) up to k = 6 are presented in Table 2 below. To better understand what we did, we encourage the reader to follow the steps above to generate the results in Table 2 up to k = 3. The reader may notice that each fₖ(x) is proportional to the corresponding periodic Bernoulli polynomial, Bₖ(x − [x]), although we do not use this result.

 −2  −3  −4  −5  −6 Table 1 (2) = ∑ 푛=1n (3) = ∑ 푛=1n (4) = ∑ 푛=1n (5) = ∑ 푛=1n (6) = ∑ 푛=1n 1.644934066848 1.202056903159 1.082323233711 1.036927755143 1.017343061984 2/6 4/90 6/945

Table 2
fₖ(x): Polynomial Representation of fₖ(x) | Series Representation of fₖ(x)
f₁(x): (x − [x]) − ½, x ∉ ℤ | −2∑_{n=1}^∞ sin(2πnx)/(2πn)
f₂(x): ½(x − [x])² − ½(x − [x]) + 1/12 | 2∑_{n=1}^∞ cos(2πnx)/(2πn)²
f₃(x): (1/6)(x − [x])³ − ¼(x − [x])² + (1/12)(x − [x]) | 2∑_{n=1}^∞ sin(2πnx)/(2πn)³
f₄(x): (1/24)(x − [x])⁴ − (1/12)(x − [x])³ + (1/24)(x − [x])² − 1/720 | −2∑_{n=1}^∞ cos(2πnx)/(2πn)⁴
f₅(x): (1/120)(x − [x])⁵ − (1/48)(x − [x])⁴ + (1/72)(x − [x])³ − (1/720)(x − [x]) | −2∑_{n=1}^∞ sin(2πnx)/(2πn)⁵
f₆(x): (1/720)(x − [x])⁶ − (1/240)(x − [x])⁵ + (1/288)(x − [x])⁴ − (1/1440)(x − [x])² + 1/30240 | 2∑_{n=1}^∞ cos(2πnx)/(2πn)⁶
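As a sanity check on Table 2, the two representations in the f₂ row can be compared numerically; the 1/n² decay makes the cosine series converge quickly. Function names below are ours.

```python
import math

def f2_poly(x):
    """f2 polynomial form: (1/2)u^2 - (1/2)u + 1/12 with u = x - [x]."""
    u = x - math.floor(x)
    return 0.5 * u**2 - 0.5 * u + 1.0 / 12

def f2_series(x, terms=20000):
    """f2 series form: 2 * sum cos(2*pi*n*x)/(2*pi*n)^2."""
    return 2 * sum(math.cos(2 * math.pi * n * x) / (2 * math.pi * n)**2
                   for n in range(1, terms + 1))

for x in (0.0, 0.3, 1.7, -0.25):
    print(x, f2_poly(x), f2_series(x))
```

At x = 0 both forms reduce to c₂ = ζ(2)/(2π²) = 1/12, which is how the integration constant in the table was chosen.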

While a Fourier representation is certainly a useful alternative to the bracket notation, it is not a closed-form expression for the greatest integer function. In January of 2001, I experimented with the tangent and Arctangent functions and came up with a remarkably short closed-form expression for [x] in terms of elementary functions, though I doubt I was the first to do so. I discovered that [x] = {x if x ∈ ℤ and x − tan⁻¹(tan(π(x − ½)))/π − ½ if x ∉ ℤ}. The next two paragraphs are devoted to deriving this result. The key element of the derivation is that the Arctangent function is not exactly the inverse of the tangent function from −∞ to +∞. While the tangent function has a period of π, meaning tan(x) = tan(x − kπ), the Arctangent function, tan⁻¹(x), does not have a period. Rather, it is a one-to-one monotone function whose domain extends from −∞ to +∞ and whose range extends from −π/2 to π/2. This is because the Arctangent function only takes the inverse of a specific piece of the tangent function, namely over the interval (−π/2, π/2). Not only does Arctangent undo the operation of tangent, it sets every value of x plugged into tan(x) back into the interval above. So, the output from tan⁻¹(x) is the value on the interval (−π/2, π/2) such that the tangent of this value is x. This means tan(tan⁻¹(x)) = x, but not the other way around. For example, both tan(7) and tan(0.7168) approximate 0.8714. Since 0.7168 is on the interval (−π/2, π/2), but 7 is not, tan⁻¹(tan(7)) ≠ 7; instead, tan⁻¹(tan(7)) = 7 − 2π ≈ 0.7168. Since the interval (−π/2, π/2) has length π, tan⁻¹(tan(x)) = x − kπ such that −π/2 < x − kπ < π/2 for all x ≠ π(k + ½). Solving this inequality for k yields 0 < x + π/2 − kπ < π; 0 < x/π + ½ − k < 1; k < x/π + ½ < k + 1; and hence k = [x/π + ½]. So, if x ≠ π(k + ½), then tan⁻¹(tan(x)) = x − π[x/π + ½]. At this point, we see similarities between tan⁻¹(tan(x)) and ((x)) for all x ∉ ℤ.
By making a series of adjustments, we can express ((x)) in terms of the Arctangent and tangent functions for all x ∉ ℤ. Since tan⁻¹(tan(x)) = x − π[x/π + ½], tan⁻¹(tan(πx)) = πx − π[x + ½]; tan⁻¹(tan(π(x − ½))) = π(x − ½) − π[x]; and tan⁻¹(tan(π(x − ½)))/π = (x − ½) − [x] = ((x)) for all x ∉ ℤ. Therefore, if x ∉ ℤ, then x = k + f where k = x − tan⁻¹(tan(π(x − ½)))/π − ½ and f = tan⁻¹(tan(π(x − ½)))/π + ½.
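The closed form above can be exercised directly; within floating-point tolerance it reproduces the built-in floor at non-integer arguments (function name is ours).

```python
import math

def floor_closed_form(x):
    """[x] for non-integer x, via x - atan(tan(pi*(x - 1/2)))/pi - 1/2."""
    return x - math.atan(math.tan(math.pi * (x - 0.5))) / math.pi - 0.5

for x in (3.7, 0.2, -1.25, 15.999):
    print(x, floor_closed_form(x), math.floor(x))
```

Note the formula is undefined exactly at integers, where π(x − ½) hits a pole of the tangent; the piecewise definition in the text handles that case separately.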

CHAPTER 2: More Properties of the Greatest Integer Function

While both this chapter and the previous chapter analyze the greatest integer function in great detail, this chapter makes the transition from studying the greatest integer function purely on its own to studying its relationship to broad mathematical constructs in number theory and real analysis. This chapter discusses some important arithmetic (number-theoretic) properties of the greatest integer function, arithmetic properties of finite summations involving the greatest integer function, and their relationship to other arithmetic functions. It concludes with techniques for evaluating summations involving the greatest integer function and important theorems concerning the relationship between summation and integration.

Section 2.1: Elementary Arithmetic Functions

The greatest integer function has some important applications in number theory. In fact, quite a few famous arithmetic functions (functions with domain restricted to ℕ and range in ℂ) can be expressed explicitly as a finite sum involving the greatest integer function. Ironically, many of these profound results stem from a very elementary, yet very important theorem. In Section 1.1, we proved that the number of positive integer multiples of j that are not greater than n is [n/j]. The next several theorems stem from this very important relation. Before continuing, we introduce some important notation. If n, m ∈ ℤ and n/m = k ∈ ℤ, then we say both m and k are divisors of n, or both m and k divide n. We denote integer division with "∣" and non-integer division with "∤". For example, 2 ∣ 6 and 3 ∣ 6, but 4 ∤ 6 and 5 ∤ 6. A natural number, p, is a prime iff its only divisors are 1 and p.

If p is a prime and n ∈ ℕ, then the greatest integer exponent, e, such that pᵉ ∣ n! is e = ∑_{i=1}^{[log(n)/log(p)]} [n/pⁱ]. (While we call this de Polignac's formula, some sources attribute this result to Legendre.) Proof: Although seemingly trivial, we must first prove the existence of e. Since p⁰ ∣ n!, there is at least one non-negative integer power of p that divides n!. If i > log(n!)/log(p), then pⁱ > n! and then pⁱ ∤ n!. It follows that if pⁱ ∣ n!, then i belongs to a non-empty set of non-negative integers bounded from above and hence, by the Greatest integer axiom, there is a greatest element in this set, which we call e. Now we can proceed with the proof. Since n! is the product of all integers 1 through n, p divides n! exactly once for every positive multiple of p not greater than n, kp, such that p ∤ k. Similarly, p divides n! exactly i times for every positive multiple of pⁱ not greater than n, kpⁱ, such that p ∤ k. Moreover, the number of positive multiples of pⁱ that are not multiples of pⁱ⁺¹ and are not greater than n is [n/pⁱ] − [n/pⁱ⁺¹]. Since p is a prime, p has no factors other than 1 and p, meaning no product of non-integer multiples of p can be an integer multiple of p. Therefore, e is the sum, over each power pⁱ not greater than n, of i times the number of positive multiples of pⁱ that are not multiples of pⁱ⁺¹ and are not greater than n. Simply put,
e = 1([n/p] − [n/p²]) + 2([n/p²] − [n/p³]) + ... + (i − 1)([n/pⁱ⁻¹] − [n/pⁱ]) + i([n/pⁱ] − [n/pⁱ⁺¹]) + ...;
e = 1[n/p] − 1[n/p²] + 2[n/p²] − 2[n/p³] + ... + (i − 1)[n/pⁱ⁻¹] − (i − 1)[n/pⁱ] + i[n/pⁱ] − i[n/pⁱ⁺¹] + ...;
e = [n/p] + (−1[n/p²] + 2[n/p²]) + ... + (−(i − 1)[n/pⁱ] + i[n/pⁱ]) + ...;
e = [n/p] + [n/p²] + ... + [n/pⁱ] + ... = ∑_{i=1}^∞ [n/pⁱ].
Last, we prove that the upper bound of the summation need not exceed [log(n)/log(p)]. Since p^(log(n)/log(p)) = n, p^[log(n)/log(p)] is the greatest integer power of p not greater than n. Since p^([log(n)/log(p)]+1) > n, 0 < n/p^([log(n)/log(p)]+1) < 1 and hence [n/p^([log(n)/log(p)]+1)] = 0. So, all the terms with values of i exceeding log(n)/log(p) do not contribute to the sum. Therefore, e = ∑_{i=1}^{[log(n)/log(p)]} [n/pⁱ] for all primes, p, and n ∈ ℕ. Q.E.D. De Polignac's formula has some interesting properties. First, using the identities n/(p − 1) = ∑_{i=1}^∞ n/pⁱ and ∑_{i=1}^∞ n/pⁱ > ∑_{i=1}^∞ [n/pⁱ], we can conclude that e < n/(p − 1) for all primes p and n ∈ ℕ. Second, de Polignac's formula is also an interesting application of nested divisions involving the greatest integer function. We can compute the (i+1)st term in the sum recursively by dividing the ith term by p and taking the greatest integer. That is, [[n/pⁱ]/p] = [n/pⁱ⁺¹]. Third, de Polignac's formula is an algorithm with polynomial-time complexity. Formally, this means that the number of operations it takes you (or a computer) to evaluate de Polignac's formula can be expressed roughly as a finite polynomial of the number of digits of n. If a computer can evaluate [n/pⁱ] in linear time (using long division) for each of the [log(n)/log(p)] terms in the sum, then de Polignac's formula is an algorithm with quadratic-time complexity with respect to the number of digits of n. For the most part, the concept of

polynomial-time complexity is not a big topic in this work. Throughout this work, we use it to mean that we can evaluate certain expressions with large arguments, or within a specified level of precision, within a reasonable amount of computing time. Example 1: Determine how many times each of the following numbers divides 50! a. 7 b. 15 c. 27 Solution: a. The greatest integer, e, such that 7ᵉ ∣ 50! is [50/7¹] + [50/7²] = 8. b. Since 15 is a composite, we rewrite 15 in prime-factored form, which is 3¹5¹. The greatest integer, e, for which 3ᵉ ∣ 50! is [50/3¹] + [50/3²] + [50/3³] = 22. Similarly, the greatest integer, e, for which 5ᵉ ∣ 50! is [50/5¹] + [50/5²] = 12. So, the greatest e for which both 3ᵉ ∣ 50! and 5ᵉ ∣ 50! is the smaller of the two, which is 12. Let pₐ and p_b be primes such that pₐ < p_b. Then, [n/pₐⁱ] ≥ [n/p_bⁱ]. For this reason, if m is a composite of unique prime factors, p₁,...,pᵢ,...,pₖ, (meaning pᵢ = pⱼ iff i = j) which are ordered from smallest to largest, then the greatest e for which mᵉ ∣ n! is the greatest e for which pₖᵉ ∣ n!. In other words, since 5 is the largest unique prime factor of 15, the greatest e for which 15ᵉ ∣ 50! is the greatest e for which 5ᵉ ∣ 50!. c. Since 27 = 3³ and the greatest e for which 3ᵉ ∣ 50! is 22, the greatest e for which 27ᵉ ∣ 50! is [22/3] = 7 because 3³ must divide 50! an integer number of times. Hence, the greatest e for which (pᵏ)ᵉ ∣ n! is [(∑_{i=1}^{[log(n)/log(p)]} [n/pⁱ])/k]. The binomial coefficient, pronounced 'n choose k' and denoted as (ⁿₖ), is defined as the number of ways to choose k elements without replacement from a set of n elements for all non-negative integers n and k. If n ≥ k, then (ⁿₖ) = n!/(k!(n − k)!). If n < k, then (ⁿₖ) = 0. Example 2: Using de Polignac's formula, prove (ⁿₖ) ∈ ℕ for all non-negative integers, n and k, where n ≥ k. Solution: Since n ≥ k and n! ∈ ℕ, n!/(k!(n − k)!) > 0 and n!/(k!(n − k)!) ∈ ℚ.
It follows from the fundamental theorem of arithmetic that (ⁿₖ) = p₁^e₁...pᵢ^eᵢ...pₖ^eₖ, where each prime, p, is distinct (meaning pᵢ = pⱼ iff i = j) and each e is an integer, some of which may be negative. If we can show that for an arbitrary prime, p, the greatest integer exponent, e, such that pᵉ ∣ (ⁿₖ) is non-negative, then we can conclude (ⁿₖ) ∈ ℕ. Let eₙ be the greatest integer exponent such that p^eₙ ∣ n!, eₖ be the greatest integer exponent such that p^eₖ ∣ k!, and e_{n−k} be the greatest integer exponent such that p^e_{n−k} ∣ (n − k)!. Then, e = eₙ − eₖ − e_{n−k}. Using de Polignac's formula, we can set e = ∑_{i=1}^∞ ([n/pⁱ] − [k/pⁱ] − [(n − k)/pⁱ]). If we can show that for an arbitrary term, i, [n/pⁱ] − [k/pⁱ] − [(n − k)/pⁱ] ≥ 0, then we can conclude that the sum, e ≥ 0. Well, [n/pⁱ] − [k/pⁱ] − [(n − k)/pⁱ] = [(k + (n − k))/pⁱ] − [k/pⁱ] − [(n − k)/pⁱ]. Let k/pⁱ = x and (n − k)/pⁱ = y. Then, [(k + (n − k))/pⁱ] − [k/pⁱ] − [(n − k)/pⁱ] = [x + y] − [x] − [y]. Since [x + y] ≥ [x] + [y], [n/pⁱ] − [k/pⁱ] − [(n − k)/pⁱ] ≥ 0 for all i. This means that for an arbitrary prime, p, the greatest integer exponent, e, such that pᵉ ∣ (ⁿₖ) is non-negative. Therefore, (ⁿₖ) ∈ ℕ for all non-negative integers n and k. Although we could have given a simple combinatorial proof that (ⁿₖ) ∈ ℕ, we used de Polignac's formula to prepare the reader for more complicated proofs involving divisibility properties of binomial coefficients.
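Both Example 1 and Example 2 can be checked computationally; a minimal sketch (the helper name `depolignac` is ours, and the trial-division primality filter is only adequate for small bounds):

```python
import math

def depolignac(n, p):
    """Greatest e with p**e dividing n!: e = sum of [n/p^i]."""
    e, q = 0, p
    while q <= n:
        e += n // q
        q *= p
    return e

# Example 1: exponents of 7, 3, and 5 in 50!
print(depolignac(50, 7), depolignac(50, 3), depolignac(50, 5))  # 8 22 12

# Example 2: every prime exponent e_n - e_k - e_{n-k} of C(n, k) is
# non-negative, and the product of the prime powers rebuilds C(n, k).
n, k = 50, 20
primes = [p for p in range(2, n + 1) if all(p % d for d in range(2, p))]
exps = {p: depolignac(n, p) - depolignac(k, p) - depolignac(n - k, p)
        for p in primes}
assert all(e >= 0 for e in exps.values())
rebuilt = math.prod(p**e for p, e in exps.items())
print(rebuilt == math.comb(n, k))  # True
```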

Let I(i|n) denote the indicator divisor function for i, n ∈ ℕ. We define I(i|n) = {1 if i ∣ n and 0 if i ∤ n}. For example, I(7|77) = 1 and I(8|77) = 0. I invented the indicator divisor function so that I could restate some very important arithmetic expressions using conventional mathematical notation. Many number theory texts introduce two kinds of finite summation notation for 1 ≤ i ≤ n. When summing g(i) uniformly for i ∈ ℕ, we write ∑_{i=1}^{n} g(i). When summing g(i) only for divisors, i, of n, we write ∑_{i∣n} g(i). With the aid of the indicator divisor function, we can sometimes convert finite summation expressions from one kind to the other. More precisely, if n ∈ ℕ and g(i) is defined for all i ∈ ℕ such that 1 ≤ i ≤ n, then ∑_{i∣n} g(i) = ∑_{i=1}^{n} I(i|n)g(i). Proof: This property of I(i|n) is true by definition. Obviously, ∑_{i∣n} g(i) = ∑_{i∣n} 1·g(i) + ∑_{i∤n} 0·g(i). Since I(i|n) is 1 for all i ∣ n and 0 otherwise, ∑_{i∣n} 1·g(i) + ∑_{i∤n} 0·g(i) = ∑_{i=1}^∞ I(i|n)g(i). The upper bound of the summation need not exceed n because for all i > n, 0 < n/i < 1, meaning i ∤ n and hence I(i|n) = 0 for all i > n. Therefore, ∑_{i∣n} g(i) = ∑_{i=1}^{n} I(i|n)g(i). Notice that a uniform finite summation from i = 1 to n can be converted to a summation over divisors, i, of n iff I(i|n) can be factored out of the expression. Otherwise, non-zero terms of the sum are divided by zero. We can express some important arithmetic functions in terms of the indicator divisor function. Let τ(n), pronounced 'tau of n', represent the number of factors (positive divisors) of any n ∈ ℕ. For example, τ(10) = 4 because 10 has 4 factors, namely 1, 2, 5, and 10. Let σ(n), pronounced 'sigma of n', represent the sum of all the factors of any n ∈ ℕ. For example, σ(10) = 18 because the sum of the factors of 10 is 1 + 2 + 5 + 10 = 18. Since τ(n) represents the number of positive divisors, i, of n, τ(n) = ∑_{i∣n} 1 = ∑_{i=1}^{n} I(i|n). And since σ(n) represents the sum of all positive divisors, i, of n, σ(n) = ∑_{i∣n} i = ∑_{i=1}^{n} I(i|n)·i.
Sometimes the tau and sigma functions are referred to as divisor functions of order 0 and 1 respectively, though not in this work. One closed-form expression for I(i|n) is [n/i] − [(n − 1)/i]. Proof: Let [n/i] = k. First, suppose i ∣ n. If i = 1, then i ∣ n and [n/1] − [(n − 1)/1] = 1. If i > 1, then n/i = [n/i] = k and [(n − 1)/i] = [n/i − 1/i] = [k − 1/i] = [(k − 1) + (1 − 1/i)]. Since i > 1, 0 < 1/i < 1, so 0 < 1 − 1/i < 1. So, [(k − 1) + (1 − 1/i)] = k − 1. Therefore, if i ∣ n, then [n/i] − [(n − 1)/i] = 1. Next, suppose i ∤ n. Since i, n ∈ ℕ, n/i = k + f for some f ∈ ℚ. More precisely, f = j/i for some j ∈ ℕ. Then, 1 ≤ j ≤ i − 1 by the division algorithm, and so (n − 1)/i = n/i − 1/i = k + f − 1/i = k + (j − 1)/i. Since 1 ≤ j ≤ i − 1, 0 ≤ (j − 1)/i ≤ (i − 2)/i < 1. So, [(n − 1)/i] = [k + (j − 1)/i] = k. Therefore, if i ∤ n, then [n/i] − [(n − 1)/i] = 0. Therefore, I(i|n) = [n/i] − [(n − 1)/i] for all i, n ∈ ℕ.

Example 3: Express ∑_{n=1}^{k} τ(n) as a single finite uniform summation from n = 1 to k of a closed-form expression, meaning it can be expressed explicitly in terms of elementary functions without summation notation.

Solution: Obviously, ∑_{n=1}^{k} τ(n) = ∑_{n=1}^{k} ∑_{i=1}^{n} I(i|n). In order to proceed, we must answer two questions about the double sum of I(i|n). First, are the indices in the double sum, ∑_{n=1}^{k} ∑_{i=1}^{n} I(i|n), interchangeable? Second, if the indices are interchangeable, can ∑_{n=1}^{k} I(i|n) be simplified? To answer the first question, we can show that the indices are interchangeable if we can show that they are disjoint, or independent. At first glance, it appears that the indices may not be disjoint because n is both the upper bound of the inner sum and the index of the outer sum. Then again, since I(i|n) = 0 for all i > n, ∑_{n=1}^{k} τ(n) = ∑_{n=1}^{k} ∑_{i=1}^{k} I(i|n), and hence the indices are disjoint. Since addition is commutative, ∑_{n=1}^{k} ∑_{i=1}^{k} I(i|n) = ∑_{i=1}^{k} ∑_{n=1}^{k} I(i|n). Since k is the upper bound of the sum with respect to n, k is the largest possible value of n. Hence, I(i|n) = 0 for all i > k, meaning ∑_{n=1}^{k} τ(n) = ∑_{i=1}^{k} ∑_{n=1}^{k} I(i|n). To answer the second question, ∑_{n=1}^{k} I(i|n) = {∑_{n=1}^{k} [n/i]} − {∑_{n=1}^{k} [(n − 1)/i]} = {[k/i] − [0/i] + ∑_{n=0}^{k−1} [n/i]} − {∑_{n=0}^{k−1} [n/i]} = [k/i]. Using this powerful result, we can sum both sides of any equation of the form f(n) = ∑_{i=1}^{n} I(i|n)g(i) with respect to n and get ∑_{n=1}^{k} f(n) = ∑_{n=1}^{k} ∑_{i=1}^{n} I(i|n)g(i) = ∑_{i=1}^{k} g(i)[k/i]. So, ∑_{n=1}^{k} τ(n) = ∑_{i=1}^{k} [k/i]. Last, we can change our index from i to n so that the indices match. Therefore, ∑_{n=1}^{k} τ(n) = ∑_{n=1}^{k} [k/n].
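The derivation above is easy to sanity-check numerically. The following Python sketch (the function names are ours, not the text's) computes τ(n) from the indicator divisor function and compares the two sides of the final identity:

```python
def I(i, n):
    """Indicator divisor function: [n/i] - [(n - 1)/i], which is 1 iff i divides n."""
    return n // i - (n - 1) // i

def tau(n):
    """tau(n): the number of divisors of n, written as a sum of indicators."""
    return sum(I(i, n) for i in range(1, n + 1))

def tau_partial_sum(k):
    """Left-hand side: the sum of tau(n) for n = 1..k."""
    return sum(tau(n) for n in range(1, k + 1))

def floor_sum(k):
    """Right-hand side: the sum of [k/n] for n = 1..k."""
    return sum(k // n for n in range(1, k + 1))
```

For instance, tau_partial_sum(50) and floor_sum(50) agree, as do the two sides for every smaller k.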

The greatest common divisor (gcd) of any two natural numbers is the largest natural number that divides them both. Any two natural numbers with no common prime factors have a gcd of one, and they are said to be relatively prime or coprime. The gcd of a and c is generally denoted as gcd(a, c), (a, c), or d. Although seemingly trivial, we must first prove the existence of the gcd. Since a, c ∈ ℕ, they have at least a common divisor of 1. Further, since the greatest factors of a and c are themselves, gcd(a, c) ≤ min(a, c). It follows that the set of positive common divisors of a and c is a non-empty set of natural numbers bounded from above, and hence, by the greatest integer axiom, there is a greatest element in this set.

We can better describe the concept of the gcd using the fundamental theorem of arithmetic. It states that a and c have unique expressions as products of powers of primes, a = p_1^{e_{a1}}…p_i^{e_{ai}}…p_k^{e_{ak}} and c = p_1^{e_{c1}}…p_i^{e_{ci}}…p_k^{e_{ck}}, where each prime, p, is distinct, meaning p_i = p_j iff i = j, and each e is a non-negative integer. Then, each p_i^{min(e_{ai}, e_{ci})} is the greatest integer power of p_i that divides both a and c. Since all common divisors of a and c are products of primes and powers of primes that divide a and c, the product, p_1^{min(e_{a1}, e_{c1})}…p_i^{min(e_{ai}, e_{ci})}…p_k^{min(e_{ak}, e_{ck})}, is unique and is divisible by every common divisor of a and c. And since every term in the product above is a natural number, the product must be gcd(a, c).

Now that we have explained what the gcd is in English, we study its properties to derive mathematical expressions for it. If the prime factorization of a or c is known, then we can express their gcd in closed form in terms of the indicator divisor function. Proof: Let a be fixed and its prime factorization known. First, suppose a = p where p is a prime. Then, gcd(a, c) = {p if p ∣ c and 1 if p ∤ c} = (p − 1)I(p|c) + 1. Next, suppose a = p^k. Then, gcd(a, c) = {p^e where e is the greatest integer exponent such that p^e ∣ p^k and p^e ∣ c} = (p − 1)(∑_{i=1}^{k} p^{i−1} I(p^i|c)) + 1. This result follows from the identity ∑_{i=1}^{e} p^{i−1} = (p^e − 1)/(p − 1). Finally, suppose a is the product of powers of distinct primes so that a = p_1^{e_{a1}}…p_i^{e_{ai}}…p_k^{e_{ak}}. Then, gcd(a, c) = gcd(p_1^{e_{a1}}, c)…gcd(p_i^{e_{ai}}, c)…gcd(p_k^{e_{ak}}, c). This result follows from the fundamental theorem of arithmetic and the proof is complete. It also follows that gcd(a, c) is a multiplicative function with a fixed because gcd(a, 1) = 1 and if (i, j) = 1, then gcd(a, ij) = gcd(a, i)gcd(a, j).

For all a, c, n ∈ ℕ, (c/d) ∣ n iff c ∣ an, where d = gcd(a, c). Proof: Let I denote the set of all natural numbers, n, such that c ∣ an. First, we assume (c/d) ∣ n and write n = kc/d. Then, an/c = a(kc/d)/c = (a/d)k ∈ ℤ and n = kc/d ∈ I. Next, we assume c ∣ an and prove (c/d) ∣ n. Since c ∣ an and c ∣ a(kc/d), c ∣ a(n − kc/d) and n − kc/d ∈ I. If k = [n/(c/d)], then 0 ≤ n − kc/d < c/d by the division algorithm. It follows that if (c/d) ∤ n, then there is an element of I less than c/d.
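The prime-power case of the closed form can be checked against the ordinary gcd. A Python sketch (our own naming, not the book's):

```python
from math import gcd

def I(a, n):
    """1 if a divides n, else 0, via [n/a] - [(n - 1)/a]."""
    return n // a - (n - 1) // a

def gcd_prime_power(p, k, c):
    """Closed form for gcd(p**k, c) from the text:
    (p - 1) * sum_{i=1}^{k} p**(i-1) * I(p**i | c) + 1."""
    return (p - 1) * sum(p**(i - 1) * I(p**i, c) for i in range(1, k + 1)) + 1
```

For example, gcd_prime_power(2, 3, 12) returns gcd(8, 12) = 4: the indicators for 2 and 4 fire, the one for 8 does not, so the sum is 1 + 2 = 3, and (2 − 1)·3 + 1 = 4.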

Since I is a subset of the natural numbers, I necessarily has a smallest, or first, element, i, not exceeding c/d. Since c ∣ a(c/d) and c ∣ ai, c ∣ a(c/d − ik). If k = [(c/d)/i], then 0 ≤ c/d − ik < i by the division algorithm. We cannot have c/d − ik > 0 because that contradicts i being the smallest element of I. So, c/d = ik, and hence c/(dk) = i and dk ∣ c. Since c ∣ ai, ck ∣ a(ik), ck ∣ a(c/d), k ∣ (a/d), and dk ∣ a. Since dk ∣ a, dk ∣ c, gcd(a, c) = d, and k is positive (because i ≤ c/d, [(c/d)/i] ≥ 1), k must be 1. Hence, c/d = i. It follows from this result and the division algorithm that all elements of I must be integer multiples of c/d. Q.E.D.

Two important corollaries follow. First, this result is a generalization of Euclid's lemma, which states that if p is a prime, a, n ∈ ℤ, and p ∣ an, then p ∣ a or p ∣ n (and, more importantly, that if p ∤ a, then p ∣ n). The generalization is that if c ∣ an, then whatever part of the prime factorization of c does not divide a must divide n. We proved that if c ∣ an, d ∣ a, and d ∣ c, then (c/d) ∣ n. Second, we can use this result to derive an explicit expression for gcd(a, c) by counting the number of elements in I between 1 and c in two ways. First, it follows from the theorem that this is the number of positive integer multiples of c/d that are not greater than c, and this quantity is [c/(c/d)] = d. Second, since I(c|an) = {1 if c ∣ an and 0 otherwise}, I(c|an) = 1 if n ∈ I and I(c|an) = 0 otherwise. It follows that the number of elements in I between 1 and c is gcd(a, c) = ∑_{n=1}^{c} I(c|an) = ∑_{n=1}^{c} ([an/c] − [(an − 1)/c]). We can derive an algorithm for computing the gcd from this expression. Since ∑_{n=1}^{c} I(c|an) = (a, c) = (c, a) = ∑_{n=1}^{a} I(a|cn) and ∑_{n=1}^{a} I(a|cn) = ∑_{n=1}^{a} I(a|(c − ak)n) = (a, c − ak), we can alternately switch the arguments in the gcd and reduce them using the division algorithm. The objective of this subsection is only to introduce the gcd, and we are just getting started with it. It is indispensable to number theory, and it is applied in the vast majority of number theory topics throughout this work. Section 4.1 is devoted entirely to the study of greatest common divisors and continued fractions. There, we resume discussion of computation, derive the Euclidean algorithm for evaluating the gcd, and use it to evaluate the gcd of any two positive integers, Gaussian integers, and finally Eisenstein integers.
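The switch-and-reduce procedure just described can be sketched in a few lines of Python (the naming is ours; Section 4.1 develops this as the Euclidean algorithm):

```python
def gcd_by_reduction(a, c):
    """Alternately swap the arguments of the gcd and reduce them with the
    division algorithm: (a, c) = (c, a - c*[a/c]), stopping when the
    remainder reaches zero."""
    while c != 0:
        a, c = c, a - c * (a // c)   # a - c*[a/c] is the remainder of a mod c
    return a
```

For instance, gcd_by_reduction(48, 18) reduces (48, 18) → (18, 12) → (12, 6) → (6, 0) and returns 6.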

Section 2.2: Congruence Relations

The congruence relation is the generalization of the mod function defined in Chapter 1. While the set of integers is infinite and every integer is unique, in modulo m only the integers 0 through m − 1 are distinct. If x', x ∈ ℤ, m ∈ ℕ, and m ∣ (x' − x), then x' is said to be congruent to x in modulo m, which we denote as x' ≡ x mod m from this point forward. The following properties of congruences follow immediately from their definition. 1) There is some k for which x' = x + km iff x' ≡ x mod m. 2) If x' ≡ x mod m, then x' ≡ x + km mod m and x' ≡ x' + km mod m. 3) If x' ≡ x mod m, then x ≡ x' mod m because if x' = x + km, then x = x' + (−k)m. 4) If x ≡ x' mod m and x' ≡ x" mod m, then by transitivity, x ≡ x" mod m. Notice the similarity between a congruence relation and the arguments of the mod function. The mod function returns the principal value of the argument in a congruence relation, which is the smallest non-negative value of x' + km. The next result demonstrates that congruent integers have the same remainder, meaning the congruence relation and mod function can often be used interchangeably. If x' ≡ x mod m, then x' − m[x'/m] = x − m[x/m]. Proof: By definition, x' = x + km for some k. So, x' − m[x'/m] = (x + km) − m[(x + km)/m] = x + km − m[x/m + k] = x − m[x/m]. With a little algebra, we can also show that x' ≡ x mod m iff x'/m − x/m = [x'/m] − [x/m]. If x' ≡ x mod m and y' ≡ y mod m, then i. x' + y' ≡ x + y mod m; ii. x' − y' ≡ x − y mod m; iii. x'y' ≡ xy mod m. Proof: i. By definition, x' = k₁m + x and y' = k₂m + y for some integers k₁ and k₂. So, x' + y' = (k₁ + k₂)m + x + y = km + (x + y) for some k. ii. Similarly, x' − y' = (k₁ − k₂)m + x − y = km + (x − y) for some k. iii. And x'y' = (k₁m + x)(k₂m + y) = k₁k₂m² + k₁my + k₂mx + xy = (k₁k₂m + k₁y + k₂x)m + xy = km + xy for some k. It follows that if x' ≡ x mod m, then (x')² ≡ x² mod m. Since multiplication is associative, it follows by induction that (x')ⁿ ≡ xⁿ mod m for all n ∈ ℕ.
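Since the mod function returns the principal value x − m[x/m], congruent inputs must return equal outputs. A small Python check (the function name is ours); note that Python's floor division matches the greatest integer function even for negative x:

```python
def mod_principal(x, m):
    """The principal value of x in modulo m: x - m[x/m]."""
    return x - m * (x // m)   # // is floor division, i.e. [x/m]
```

For example, mod_principal(-7, 3) is 2, the smallest non-negative representative of −7 in modulo 3.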
General division is prohibited in modular arithmetic because the quotient of two integers is not always an integer. However, integer division is allowed. Our next theorem is about integer division, where x\n denotes [x/n]. i. If x' ≡ x mod nm, then x'\n ≡ x\n mod m. ii. x\n mod m = ((x mod nm) − (x mod n))/n. Proof: i. Since x' = x + knm, [x'/n] = [(x + knm)/n] = [x/n + km] = [x/n] + km. ii. We can evaluate x\n mod m using the nesting property of the greatest integer function. That is, x\n mod m = [x/n] mod m = [x/n] − m[[x/n]/m] = [x/n] − m[x/(nm)] = (n[x/n] − nm[x/(nm)])/n = (n[x/n] − x + x − nm[x/(nm)])/n = (−(x − n[x/n]) + (x − nm[x/(nm)]))/n = ((x mod nm) − (x mod n))/n. Congruences involving integer division arise in the study of binomial coefficients in prime moduli, which we encounter in Chapter 3.

Example 4: Evaluate 10¹⁸ − 21[10¹⁸/21]. Solution: 10¹⁸ − 21[10¹⁸/21] = 10¹⁸ mod 21 = (10⁶)³ mod 21 = (10⁶ − 21[10⁶/21])³ mod 21 = 1³ mod 21 = 1.

Example 5: Evaluate (6787984897² + 8794343970)(6896879879³ − 6787943895) mod 7843. Solution: The first thing we want to do is reduce these ten-digit numbers in modulo 7843. Since 6787984897 mod 7843 = 1728, 8794343970 mod 7843 = 3756, 6896879879 mod 7843 = 4498, and −6787943895 mod 7843 = 59, (6787984897² + 8794343970)(6896879879³ − 6787943895) mod 7843 = (1728² + 3756)(4498³ + 59) mod 7843 = (2989740 mod 7843)(4498{4498² mod 7843} + 59) mod 7843 = 1557(4498·4907 + 59) mod 7843 = 1557(22071745 mod 7843) mod 7843 = (1557·1543) mod 7843 = 2493.

Fermat's little theorem states that if p is prime and x ∈ ℕ, then x^p ≡ x mod p. Proof: We prove this theorem using induction on x. For the base case, let x = 1. Clearly, 1^p ≡ 1 mod p. For the inductive case, we assume x^p ≡ x mod p and prove (x + 1)^p ≡ x + 1 mod p. Using the binomial theorem, we find that (x + 1)^p = ∑_{i=0}^{p} C(p, i)x^{p−i} = x^p + ∑_{i=1}^{p−1} C(p, i)x^{p−i} + 1, where C(p, i) denotes the binomial coefficient. It follows from de Polignac's formula that p ∣ C(p, i) for all 1 ≤ i ≤ p − 1. So, x^p + ∑_{i=1}^{p−1} C(p, i)x^{p−i} + 1 ≡ x^p + 1 ≡ x + 1 mod p by hypothesis. Therefore, x^p ≡ x mod p for all x ∈ ℕ. Since x + kp ≡ x mod p, this result also holds for all x ∈ ℤ.

The Euler phi function is denoted as φ(m) and defined for all m ∈ ℕ as the number of natural numbers not greater than m that are relatively prime to m. It has some important arithmetic properties for computation that we state without proof. Since φ(m) is a multiplicative function (as defined above), we can easily compute it by factoring m into a product of primes.
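Part ii of the integer-division theorem, and Example 4, can be verified exhaustively for small parameters. A Python sketch (the names are ours):

```python
def floordiv_mod(x, n, m):
    """Left-hand side: [x/n] mod m, computed directly."""
    return (x // n) % m

def floordiv_mod_identity(x, n, m):
    """Right-hand side: ((x mod nm) - (x mod n)) / n."""
    return (x % (n * m) - x % n) // n
```

The division by n on the right-hand side is always exact, since both residues agree with x modulo n.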
If m = p_1^{e_1}…p_n^{e_n}, then φ(m) = (p_1^{e_1} − p_1^{e_1−1})…(p_n^{e_n} − p_n^{e_n−1}). Using the product symbol, ∏, we can restate this result more concisely as φ(∏_{i=1}^{n} p_i^{e_i}) = ∏_{i=1}^{n} (p_i^{e_i} − p_i^{e_i−1}).

If (a, m) = 1, then aⁿ ≡ a^{n mod φ(m)} mod m. Proof: Let ∏ᵢ i denote the product of all φ(m) elements in modulo m that are coprime to m. Then, a^{φ(m)} ∏ᵢ i ≡ ∏ᵢ (ai) mod m. Since (ai, m) = 1, ∏ᵢ (ai) is also the product of all φ(m) elements of m that are coprime to m, though the elements would be multiplied in a different order except if a = 1. Since (∏ᵢ i, m) = 1, ∏ᵢ i has a multiplicative inverse in modulo m, and hence it can be cancelled from both sides of the congruence. It follows that a^{φ(m)} ≡ 1 mod m. Since a^{kφ(m)} ≡ a⁰ mod m, φ(m) is always a multiple of the order of the exponential cycle of each coprime element in modulo m. Therefore, aⁿ ≡ a^{n − φ(m)[n/φ(m)]} mod m. It follows that a^{φ(m)−1} ≡ a⁻¹ mod m, where a⁻¹ is the multiplicative inverse of a in modulo m. For primes, φ(p) = p − 1, and if p ∤ a, then a^{p−1} ≡ 1 mod p and a^{p−2} ≡ a⁻¹ mod p. It follows that if p ∤ a, then the solution to the modular equation ax ≡ b mod p is x ≡ a^{p−2}b mod p.

In the previous section, we determined which combinatorial expressions are divisible by p and how many times they are divisible by p. Next, we study some properties of combinatorial expressions in prime moduli. The first result we present is Wilson's theorem, which states that p is prime iff (p − 1)! ≡ −1 mod p. Proof: First, assume p is prime. We attempt to evaluate the product, 1···(p − 1), which includes every non-zero element in modulo p. It follows from Fermat's little theorem that if p ∤ x, then x has a multiplicative inverse in modulo p. Furthermore, if x² ≡ 1 mod p, then x is its own inverse. If x² ≡ 1 mod p, then x² − 1 = kp, meaning p ∣ (x − 1)(x + 1). It follows by Euclid's lemma that either p ∣ (x − 1) or p ∣ (x + 1), and hence the only elements that are their own multiplicative inverse in modulo p are 1 and −1 (or p − 1). If p = 2, then (2 − 1)! = 1 ≡ −1 mod 2. If p > 2, then p is odd and p − 3 is even. This means that, except for the two non-zero elements 1 and p − 1, we can pair the other p − 3 non-zero elements in modulo p with their multiplicative inverses. So, 1···(p − 1) ≡ 1·(p − 1) mod p ≡ −1 mod p. Next, assume p is not prime. If p = ij with 1 < i < j < p, then i and j are in the product, 1···(p − 1). So, (p − 1)! ≡ 0 mod p. If p = i² and p ≤ 4, then either (p − 1)! = 1 ≡ 0 mod 1 or (p − 1)! = 6 ≡ 2 mod 4. If p = i² and p > 4, then i > 2, meaning i and 2i are in the product, 1···(p − 1). So, (p − 1)! ≡ 0 mod p. Either way, if p is not prime, then (p − 1)! ≢ −1 mod p. Hence, p is prime iff (p − 1)! ≡ −1 mod p.

The second result we present is Anton's congruence. Let n ∈ ℕ and p be a prime. Let (n!)_p denote the product of all natural numbers up to n except for those that are multiples of p. Then, (n!)_p ≡ (−1)^{[n/p]}(n − p[n/p])! mod p. Proof: Let n = qp + r with 0 ≤ r < p. If r = 0, then (n!)_p = {∏_{i=1}^{q} (ip − p + 1)···(ip − 1)}. If r > 0, then (n!)_p = {∏_{i=1}^{q} (ip − p + 1)···(ip − 1)}{1···r}. Either way, (n!)_p = {∏_{i=1}^{q} (ip − p + 1)···(ip − 1)}r!. And in modulo p, {∏_{i=1}^{q} (ip − p + 1)···(ip − 1)}r! ≡ {∏_{i=1}^{q} (1)···(p − 1)}r! ≡ (p − 1)!^q r!. Since p is prime, (p − 1)! ≡ −1 mod p; hence, (p − 1)!^q r! ≡ (−1)^q r! mod p. Since (n!)_p ≡ (−1)^q r! mod p and the division algorithm gives q = [n/p] and r = n − p[n/p], (n!)_p ≡ (−1)^{[n/p]}(n − p[n/p])! mod p. Although we did not use it in the proof, (n!)_p = n!/([n/p]! p^{[n/p]}). However, we use both Anton's congruence and the fact that (n!)_p = n!/([n/p]! p^{[n/p]}) in our study of binomial coefficients in prime moduli, which we encounter in Section 3.2.
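Both Wilson's theorem and Anton's congruence lend themselves to brute-force verification. In the Python sketch below (our naming), factorial_p computes (n!)_p directly and anton_rhs computes (−1)^[n/p](n − p[n/p])! mod p:

```python
from math import factorial

def factorial_p(n, p):
    """(n!)_p: the product of 1..n, skipping multiples of p, reduced mod p."""
    out = 1
    for i in range(1, n + 1):
        if i % p != 0:
            out = out * i % p
    return out

def anton_rhs(n, p):
    """(-1)**[n/p] * (n - p*[n/p])! mod p."""
    q, r = divmod(n, p)
    return (-1)**q * factorial(r) % p
```

Python's % always returns a non-negative residue for a positive modulus, so the (−1)^q factor is reduced correctly even when q is odd.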

We conclude this section with a brief treatment of calendar theory. In the next several paragraphs, we demonstrate how to use congruences to compute the day of the week given the date. Most Americans know that America declared independence from Great Britain on July 4th, 1776, or at least that John Hancock did. (The rest of the Continental Congress did not sign the declaration before August.) However, few Americans can say with certainty what day of the week that was. Occasionally, when analyzing historical events or planning for the future, it is important to know the day of the week, especially for verifying certain aspects of an event. In order to calculate the day of the week given the date (month, day, and year), we must first understand how our calendar works. Since 1752, we have used the Gregorian calendar, which improved upon the Julian and early Roman calendars. The Gregorian calendar has 12 months, each with the following number of days: January (31), February (28 or 29), March (31), April (30), May (31), June (30), July (31), August (31), September (30), October (31), November (30), and December (31). February has 29 days iff the year is a leap year. A year is a leap year iff it is divisible by 4 but not by 100, or it is divisible by 400. For example, 1900 is not a leap year, 1948 is a leap year, 1950 is not a leap year, and 2000 is a leap year. We must also understand how the evolution of our calendar affects the computation process. The early Roman calendar was invented with the purpose of marking the phases of the moon for religious reasons while keeping track of the seasons for agricultural reasons. At first, the Roman calendar had ten months from March to December (304 days) followed by a winter gap. January and February were added shortly thereafter, becoming the 11th and 12th months respectively. This brought the number of days in a year up to 355.
In every other year, the Pontifices, a council that assisted the chief magistrate in planning sacrificial functions, would usually replace the last five days of February with the month Intercalans, lasting 27 or 28 days. This brought the average number of days in a year up to 366¼. Obviously, this could not continue, and many reforms have been made since. However, up to the present day, February has been chosen to absorb the extra day in each leap year. Since February absorbs the extra day in each leap year, we consider the last day in February as the last day of the previous year. At this point, we have enough information to find the day of the week given a month, day, and year. Since the days of the week are not numbers, we have to number them. Since they are already ordered, we can number them as follows: Sunday is 0, Monday is 1,..., and Saturday is 6. Since Sunday comes after Saturday and the days repeat thereafter, the numbering for the day of each week can be expressed in modulo 7. Since a date is a month, day, and year, we can also think of a date as the period of time (measured in days) that has elapsed after a set number of years, months, and days. Let D be the day of the week corresponding to a given date. Then, D = {g(m) + d + g(y) + d0 − 1} mod 7, where g(m) is the number of days in the months after 3/1/y (since we start each year on March 1st), d − 1 is the number of days after m/1/y (since we start each month with day 1, we subtract 1 from the day of the month), g(y) is the number of days in the years after 3/1/0000, and d0 is the day of the week on 3/1/0000. We chose the start date 3/1/0000 to make the calculations easier; there was no year zero, and this formula cannot work on any historic dates before the Gregorian calendar's introduction in Rome on October 15, 1582. First, we seek an expression for g(m), the number of full days from 3/1/y to m/1/y (excluding m/1/y). This is simply the sum of the number of days in each month up to m/1/y. See the table below:

Month        Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec  Jan  Feb
g(m)           0   31   61   92  122  153  184  214  245  275  306  337
g(m) mod 7     0    3    5    1    3    6    2    4    0    2    5    1

Next, we find an expression for g(y), the number of full days from 3/1/0000 to 3/1/y (excluding 3/1/y). During this period, there are y years with at least 365 days. Leap years have an additional day in February. So, g(y) = 365y + {the number of leap years}. The number of leap years is the number of years that are multiples of 4 and are not multiples of 100 unless they are also multiples of 400. At first, it may appear that the first year is a leap year, but this is not the case. Since the calendar starts on 3/1/0000, the first year is from 3/1/0000 to 3/1/0001.

And since 0001 is not a multiple of 4, there is no extra day added in February 0001. Hence, the first year is not a leap year. It follows from this line of reasoning that the yth year is from 3/1/(y − 1) to 3/1/y. So, the number of leap years between 3/1/0000 and 3/1/y is [y/4] − [y/100] + [y/400]. Hence, g(y) = 365y + [y/4] − [y/100] + [y/400]. At this point, we can solve for d0 in modulo 7. Since we are solving for an initial condition, we must first find a date and corresponding day of the week. We leave it to the reader to verify that March 1st, 2006 falls on a Wednesday. Since our calendar formula will be of the form D = {g(m) + d + g(y) + d0 − 1} mod 7 and since we chose a date on March 1st, g(m) and d − 1 are zero. It follows that 3 = {g(2006) + d0} mod 7. Therefore, 365·2006 + [2006/4] − [2006/100] + [2006/400] + d0 ≡ 3 mod 7; 732,676 + d0 ≡ 3 mod 7; and hence d0 ≡ 3 mod 7. Now that we have solved for each variable in terms of m and y, we have the option of writing out our calendar formula or improving upon it. If we stop here, our formula would be far from perfect. First, we must replace y with y − 1 if the month is January or February, and we have yet to reduce g(y) in modulo 7. Second, we must subtract 2 from m, and even after doing so, we must set Jan = 11 and Feb = 12. Third, we have no explicit formula for g(m) in modulo 7. Since our formula still has a lot of room for improvement, we choose to improve upon it. With a few adjustments, we can correct for each problem. We take care of the first problem by replacing y with y + [.1m − .3] so that the output is y − 1 if the month is January or February and y otherwise. And we can reduce g(y) in modulo 7 by replacing 365y with y. In order to take care of the second problem, we revise our table by setting g'(m) = g(m) + d0 − 1, where Jan = 1, Feb = 2,..., and Dec = 12, in the table below.

Month        Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
g'(m)        308  339    2   33   63   94  124  155  186  216  247  277
g'(m) mod 7    0    3    2    5    0    3    5    1    4    6    2    4

We take care of the third problem using modeling techniques. We leave it to the reader to verify that g'(m) ≡ [2.57m + 1.87] − 3[.1m − .3] mod 7 for 1 ≤ m ≤ 12. So, our calendar formula is D = {[2.57m + 1.87] + d + y − 2[.1m − .3] + [(y + .1m − .3)/4] − [(y + .1m − .3)/100] + [(y + .1m − .3)/400]} mod 7.

Example 7: Determine the day of the week on which John Hancock signed the Declaration of Independence. Solution: All the work is done. Just set m = 7, d = 4, and y = 1776 and then evaluate D. Our output is 2230, which reduces to 4 in modulo 7. So, John Hancock signed the Declaration of Independence on a Thursday.
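The finished formula translates directly into code. A Python sketch (the function name is ours, not the book's), applying the floor exactly as written, with 0 = Sunday through 6 = Saturday:

```python
from math import floor

def day_of_week(m, d, y):
    """The calendar formula derived above, for Gregorian dates:
    D = {[2.57m + 1.87] + d + y - 2[.1m - .3]
         + [(y + .1m - .3)/4] - [(y + .1m - .3)/100]
         + [(y + .1m - .3)/400]} mod 7."""
    t = y + 0.1 * m - 0.3
    return (floor(2.57 * m + 1.87) + d + y - 2 * floor(0.1 * m - 0.3)
            + floor(t / 4) - floor(t / 100) + floor(t / 400)) % 7
```

day_of_week(7, 4, 1776) reproduces Example 7's answer of 4 (Thursday), and day_of_week(3, 1, 2006) returns 3 (Wednesday), the initial condition used to solve for d0.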

Section 2.3: Summation and Integration

Some summations and integral forms involving the greatest integer function have well-known and useful properties. Summations involving the greatest integer function can often be used to express arithmetic properties of various constructs. For this reason, we study some properties of summations involving the greatest integer function and general summation techniques so that we can better manipulate or evaluate them. The greatest integer function is also indispensable for expressing the relationship between summation and integration. Using the greatest integer function, we can express the infinite sum of any positive monotone decreasing infinite sequence as a variety of integral forms provided the series converges.

We devote the next several paragraphs to the study of finite sums involving the greatest integer function. Our first result is known as Hermite's identity. For any k ∈ ℕ and any x, [kx] + 1 = ∑_{i=1}^{k} [x + i/k]. Proof: First, let x ∈ ℤ. Then, [kx] = kx and ∑_{i=1}^{k} [x + i/k] = ∑_{i=1}^{k} {x + [i/k]} = [kx] + ∑_{i=1}^{k} [i/k] = [kx] + 1 because [i/k] = {1 if i = k and 0 if 1 ≤ i < k}. Next, let x ∉ ℤ and x − [x] = f. Then, [kx] = [k([x] + f)] = k[x] + [kf] and ∑_{i=1}^{k} [x + i/k] = ∑_{i=1}^{k} [x − [x] + [x] + i/k] = ∑_{i=1}^{k} {[x] + [f + i/k]} = k[x] + ∑_{i=1}^{k} [f + i/k]. Since 0 < f < 1 and 0 < i/k ≤ 1 for all 1 ≤ i ≤ k, it follows that 0 < f + i/k < 2 and hence 0 ≤ [f + i/k] ≤ 1. Let n denote the smallest natural number such that [f + n/k] = 1. Then, f + n/k ≥ 1, meaning n ≥ k(1 − f). So, if k(1 − f) ∈ ℤ, then n = k(1 − f), and if k(1 − f) ∉ ℤ, then n is k(1 − f) rounded up to the nearest integer. So, n = −[−k(1 − f)] = k − [kf]. Since n = k − [kf] ≤ k, ∑_{i=1}^{k} [f + i/k] = ∑_{i=n}^{k} 1 = k − n + 1 = [kf] + 1. So, ∑_{i=1}^{k} [x + i/k] = k[x] + [kf] + 1 = [k[x] + kf] + 1 = [k([x] + f)] + 1 = [kx] + 1. Therefore, [kx] + 1 = ∑_{i=1}^{k} [x + i/k] for all x and k. Equivalently, [kx] = ∑_{i=0}^{k−1} [x + i/k].
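Hermite's identity can be checked for exact rational x with Python's Fraction type, which avoids floating-point trouble at the boundary cases (a sketch with our own naming):

```python
from fractions import Fraction
from math import floor

def hermite_holds(x, k):
    """Check [kx] + 1 == sum_{i=1}^{k} [x + i/k] for exact rational x."""
    lhs = floor(k * x) + 1
    rhs = sum(floor(x + Fraction(i, k)) for i in range(1, k + 1))
    return lhs == rhs
```

Because Fraction arithmetic is exact, cases where x + i/k lands exactly on an integer, which is where a float check would misfire, are handled correctly.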

Our second result is applicable to a much wider variety of sums involving the greatest integer function. Let f(i) be a function such that, for all i, j, k, n ∈ ℕ with 1 ≤ i, j, k ≤ n, f(i) ∉ ℤ for all 1 ≤ i ≤ n, and for every j such that f(j) − [f(j)] = f, there exists a k for which f(k) − [f(k)] = 1 − f, except for at most one j for which f(j) − [f(j)] = ½. Then, ∑_{i=1}^{n} f(i) = ∑_{i=1}^{n} [f(i)] + n/2. Proof: If n is even, then there are n/2 distinct ordered pairs, (j, k), such that f(j) − [f(j)] + f(k) − [f(k)] = 1. If n is odd, then there are (n − 1)/2 distinct ordered pairs, (j, k), such that f(j) − [f(j)] + f(k) − [f(k)] = 1 and a single j for which f(j) − [f(j)] = ½. In either event, ∑_{i=1}^{n} {f(i) − [f(i)]} = n/2. This theorem can help us evaluate the sum of the greatest integer of any function provided the function has this symmetry in its fraction parts over the index of summation.

Example 1: Evaluate a. ∑_{k=1}^{2n} [9sin(kπ/n)] and b. ∑_{k=1}^{n} [9sin(2kπ/n)].

Solution: a. Our first objective is to evaluate ∑_{k=1}^{2n} 9sin(kπ/n). Since sin(kπ/n) + sin((2n − k)π/n) = 0, terms k and 2n − k cancel. Also, when k = n and k = 2n, sin(kπ/n) = 0. Hence, ∑_{k=1}^{2n} 9sin(kπ/n) = 0. Our next objective is to evaluate ∑_{k=1}^{2n} {9sin(kπ/n) − [9sin(kπ/n)]}. Summing terms k and 2n − k yields 9sin(kπ/n) − [9sin(kπ/n)] + 9sin((2n − k)π/n) − [9sin((2n − k)π/n)] = 0 − [9sin(kπ/n)] − [−9sin(kπ/n)]. Except where 9sin(kπ/n) ∈ ℤ, its ceiling is one unit above its greatest integer. Hence, the fraction parts for terms k and 2n − k sum to one except where k is an integer multiple of n/2. Since there is no fraction part there, the sum of the fraction parts is (2n − 2)/2 = n − 1 if n is odd and (2n − 4)/2 = n − 2 if n is even. Hence, ∑_{k=1}^{2n} [9sin(kπ/n)] = −2[(n − 1)/2]. b. Our first objective is to evaluate ∑_{k=1}^{n} 9sin(2kπ/n). Since sin(2kπ/n) + sin(2(n − k)π/n) = 0, terms k and n − k cancel. If n is odd, then all terms in the sum cancel except for term k = n, which is zero. If n is even, then all terms in the sum cancel except for terms k = n/2 and k = n, which are zero. So, ∑_{k=1}^{n} 9sin(2kπ/n) = 0. Our next objective is to evaluate ∑_{k=1}^{n} {9sin(2kπ/n) − [9sin(2kπ/n)]}. Summing terms k and n − k yields 9sin(2kπ/n) − [9sin(2kπ/n)] + 9sin(2(n − k)π/n) − [9sin(2(n − k)π/n)] = 0 − [9sin(2kπ/n)] − [−9sin(2kπ/n)]. Except where 9sin(2kπ/n) ∈ ℤ, its ceiling is one unit above its greatest integer. Hence, the fraction parts for terms k and n − k sum to one except where k is an integer multiple of n/4. Since there is no fraction part there, the sum of the fraction parts is (n − 1)/2 if n is odd, (n − 2)/2 if 2 ∣ n but 4 ∤ n, and (n − 4)/2 if 4 ∣ n. Hence, ∑_{k=1}^{n} [9sin(2kπ/n)] = (gcd(4, n) − n)/2.

We demonstrated above that we can use fraction-part symmetries to express sums involving the greatest integer function in terms of the greatest common divisor function. Additional examples include the following identities for all a, c ∈ ℕ: i. (a, c) = 1 iff ∑_{i=1}^{c−1} [ai/c] = (a − 1)(c − 1)/2; ii. −(a, c)/2 = ∑_{i=1}^{c} (ai/c − [ai/c] − ½). The proofs are left as exercises for the reader.

Our third result is known as Abel summation, or summation by parts. When summing a product of terms, sometimes we can manipulate the product to make it easier to work with. If F(x) = ∑_{k=1}^{x} f(k), then ∑_{k=1}^{n} f(k)g(k) = F(n)g(n) − ∑_{k=1}^{n−1} F(k)(g(k + 1) − g(k)). The expression, g(k + 1) − g(k), is called the first-difference of g(k). The proof is elementary and is left for the reader. We thank Dr. Kin-Yin Li for presenting the following example.

Example 2: Prove that if x > 0, then [nx] ≥ ∑_{k=1}^{n} [kx]/k for all n ∈ ℕ. (Source: 1982 USAMO)

Solution: The proof is by induction on n. We set n = 1 for the base case, which yields [x] ≥ [x], and assume the inequality holds through n − 1. Let f(k) = [kx]/k and g(k) = k. Applying summation by parts, we have ∑_{k=1}^{n} [kx] = ∑_{k=1}^{n} f(k)g(k) = F(n)n − ∑_{k=1}^{n−1} F(k). By the inductive hypothesis, we can assume that ∑_{k=1}^{n−1} [kx] ≥ ∑_{k=1}^{n−1} F(k). Then, F(n)n = ∑_{k=1}^{n} [kx] + ∑_{k=1}^{n−1} F(k) ≤ ∑_{k=1}^{n} [kx] + ∑_{k=1}^{n−1} [kx] = [nx] + ∑_{k=1}^{n−1} ([kx] + [(n − k)x]) ≤ [nx] + ∑_{k=1}^{n−1} [kx + (n − k)x] = n[nx]. Since F(n)n ≤ n[nx], ∑_{k=1}^{n} [kx]/k ≤ [nx].

Our fourth result has no name, but inverse-summation is an accurate description. The cumulative sums of the greatest integer of some inverse functions, such as the square root and the logarithm with an integer base, have a closed-form expression. The reason for this is that the greatest integer or ceiling of their inverse, when restricted to integer inputs, is just their inverse, because every integer input, such as a square or a power of an integer base, has an integer output. And the cumulative sum of squares and the cumulative sum of powers have closed-form expressions. So, arguably, the sums are about the same except for the fact that they are taken over different axes. The following example and the lattice-point diagram below should clear up any ambiguities.

Example 3: Let f(k) = √(k − ¾) − ½. Find a closed-form expression for ∑_{k=1}^{n} [f(k)] in terms of n. Solution: We will evaluate this sum, but not over the index, k; instead, our index will be i = [f(k)]. The sum of [f(k)] over any interval for k over which [f(k)] is constant is simply [f(k)] times the number of integers in that interval. And the number of integers in that interval is simply the first-difference of f⁻¹(k). Here, f⁻¹(k) = k² + k + 1 and its first-difference is 2k + 2. We illustrate these results on the first few terms in this sum in the table below.

[√(k − ¾) − ½]   Interval for k   #Integers in interval   2[√(k − ¾) − ½] + 2
      0             [1, 2]                  2                       2
      1             [3, 6]                  4                       4
      2             [7, 12]                 6                       6
      3             [13, 20]                8                       8

For example, ∑_{k=13}^{20} [√(k − ¾) − ½] = 3(2·3 + 2) and ∑_{k=1}^{20} [√(k − ¾) − ½] = ∑_{i=1}^{3} i(2i + 2). It follows that we can easily evaluate ∑_k [f(k)] 2k + 2 terms at a time up to n. But unless there is an integer i such that i² + i = n, we will have to cut the last interval short, extending it from i² + i + 1 to n for the largest i such that i² + i + 1 ≤ n. Let j denote the largest i such that i² + i + 1 ≤ n. Then, j = [√(n − ¾) − ½], and the last interval has width, n − (j² + j + 1) + 1 = n − j(j + 1), and height, j. The rest of the sum is ∑_{i=1}^{j−1} i(2i + 2) = 2j(j − 1)(j + 1)/3. And the grand total is j(n − j² − j) + 2j(j − 1)(j + 1)/3 = jn − j(j + 1)(j + 2)/3 = [√(n − ¾) − ½](3n − [√(n − ¾) + ½][√(n − ¾) + 1½])/3. It also follows that ∑_{k=1}^{n} [√(k − ¾) − ½] + ∑_{k=1}^{[√(n − ¾) − ½]} (k² + k) = n[√(n − ¾) − ½] for 3 ≤ n. This result is demonstrated quite easily with the lattice-point diagram and graph of f(x) over [0, 26] below. We simply count the number of lattice points in the rectangle with x-coordinates between 1 and n and y-coordinates between 1 and √(n − ¾) − ½. The reason why the latter summand in the identity above is k² + k instead of k² + k + 1 is because, in this case, f(x) ∈ ℕ at x ∈ ℕ, so all points at which f(x) ∈ ℕ are lattice points. To adjust for this, we either have to subtract one from the latter summand or count the lattice points with coordinates (x, √(x − ¾) − ½), both in ℕ, twice. This kind of relationship between two sums is known as reciprocity. We discuss reciprocity further in Chapter 4.
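Both the closed form and the reciprocity identity in Example 3 can be checked directly. In the Python sketch below (the names are ours), [√(k − ¾) − ½] is computed exactly as the largest i with i² + i + 1 ≤ k, avoiding floating-point square roots:

```python
from math import isqrt

def f_floor(k):
    """[sqrt(k - 3/4) - 1/2]: the largest i >= 0 with i*i + i + 1 <= k."""
    i = isqrt(k)              # an integer upper bound for the answer
    while i * i + i + 1 > k:
        i -= 1
    return i

def direct_sum(n):
    """Term-by-term evaluation of sum_{k=1}^{n} [f(k)]."""
    return sum(f_floor(k) for k in range(1, n + 1))

def closed_form(n):
    """The grand total jn - j(j + 1)(j + 2)/3 with j = [sqrt(n - 3/4) - 1/2]."""
    j = f_floor(n)
    return j * n - j * (j + 1) * (j + 2) // 3
```

The // division is exact, since j(j + 1)(j + 2) is a product of three consecutive integers and hence divisible by 3.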

Since our objective for this subsection was merely to introduce techniques for evaluating finite sums involving the greatest integer function, we stop for now although we are far from finished. Lattice-point enumeration and Riemann-Stieltjes integration are far more powerful techniques. However, since they are also very advanced, they are primarily reserved for Part III of this work, although we introduce lattice-point diagrams in Section 4.2. Further, our reasons for studying these sums will be better understood after completing Chapter 4.

We can use the greatest integer function to convert from summation to integration. After all, a summation differs from an integration because a summation adds values of the function only at consecutive integer values of x. Our first result is a simple yet important summation-to-integration identity. If m, n ∈ ℤ and m < n, then ∑_{k=m}^{n} f(k) = ∫_m^{n+1} f([x]) dx. Proof: Since we can partition the integral at integers, ∫_m^{n+1} f([x]) dx = ∑_{k=m}^{n} {∫_k^{k+1} f([x]) dx} = ∑_{k=m}^{n} {∫_k^{k+1} f(k) dx} = ∑_{k=m}^{n} f(k)·x|_k^{k+1} = ∑_{k=m}^{n} f(k). Note: The justification for the step ∫_k^{k+1} f([x]) dx = ∫_k^{k+1} f(k) dx is that f([x]) = f(k) for all k ≤ x < k + 1. For the remainder of this section, when integrating a function over subintervals, we assume the subintervals are open because 1) this assumption lets us assume a single value for the function over each subinterval and 2) values at endpoints of subintervals are of measure zero provided there are countably many endpoints and the function is bounded over every subinterval.

We can also use this identity to relate first-order difference and differential equations. Let n ≥ 1 and s(n) = ∑_{k=1}^{n} f(k) for n ∈ ℕ. Then, the first difference of s(n) is Δs(n) = s(n + 1) − s(n) = f(n + 1), and s(n) = ∫_1^{n+1} f([x]) dx. Furthermore, let F(x) be a continuous antiderivative of f([x]) so that s(n) = F(n + 1) − F(1). Then, s′(n) = F′(n + 1) = f([n] + 1) for n ∈ ℕ. A generalization of this result is that for the same initial conditions, the difference equation, y_{n+1} − y_n = f(y_n), and the differential equation, y′(x) = f(y([x])), are equivalent at all non-negative integers, n. Proof: We prove y(n) = y_n using induction on n. For the base case, we set n = 0. Since the initial conditions are the same for both equations, we can assume y(0) = y_0. And now that the base case is covered, we can also assume that y(n − 1) = y_{n−1} and must prove that y(n) = y_n. For the difference equation, y_n = y_{n−1} + f(y_{n−1}). For the differential equation, y(n) = y(n − 1) + y(n) − y(n − 1) = y(n − 1) + ∫_{n−1}^{n} y′(x) dx = y(n − 1) + ∫_{n−1}^{n} f(y([x])) dx = y(n − 1) + ∫_{n−1}^{n} f(y(n − 1)) dx = y(n − 1) + f(y(n − 1)). Therefore, y(n) = y_n. This is an elementary result in time-scale calculus, which studies the interrelationship between difference and differential equations. In Chapter 7, we expand on this concept in our study of hybrid discrete-continuous dynamical systems by using the greatest integer function to express the discrete-time elements in continuous time.

Our second result is a similar summation-to-integration identity with a continuous integrand. Starting with the set of points {f(k)}, we create a continuous approximation of f(k), fc(x), by 'connecting the dots' with straight lines, or linear interpolation. Assuming k ≤ x < k + 1, we perform linear interpolation simply by taking a weighted average of f(k) and f(k + 1) depending on the fraction part of x, with a larger fraction part putting more weight on f(k + 1) because then x is closer to k + 1. Then, fc(x) = (1 − x + [x])f([x]) + (x − [x])f([x] + 1). We leave it to the reader to verify that fc(x) is defined and continuous for all x. Additionally, fc′(x) = f([x] + 1) − f([x]) = Δf([x]) for all x ∉ ℤ. This last identity is our first practical example of the relationship between first-order difference and differential equations described above. Furthermore, if m, n ∈ ℤ and m < n, then ∑_{k=m}^{n} f(k) = ∫_m^{n} fc(x) dx + (f(m) + f(n))/2. Proof: Since we can partition the integral at integers, ∫_m^{n} fc(x) dx = ∑_{k=m}^{n−1} {∫_k^{k+1} fc(x) dx} = ∑_{k=m}^{n−1} ½(f(k) + f(k + 1)) = f(m)/2 + ∑_{k=m+1}^{n−1} f(k) + f(n)/2. Adding (f(m) + f(n))/2 to each side of the equation completes the proof.

Our third result is summation by parts for integral forms. If g(x) is differentiable over [1, n], then ∑_{k=1}^{n} g(k) = ng(n) − ∫_1^{n} [x]g′(x) dx. Proof: Here, F(x) = ∑_{k=1}^{[x]} 1 = [x] and ∫_1^{n} [x]g′(x) dx = ∑_{k=1}^{n−1} {∫_k^{k+1} [x]g′(x) dx} = ∑_{k=1}^{n−1} k·g(x)|_k^{k+1} = ∑_{k=1}^{n−1} k(g(k + 1) − g(k)). Reindexing and telescoping this last sum gives ∑_{k=1}^{n−1} k(g(k + 1) − g(k)) = ng(n) − ∑_{k=1}^{n} g(k), and rearranging yields the identity.
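A quick numerical sanity check of the three identities above (our own sketch, not from the text; the test function sin x + 2, the choice g(x) = x², and the step counts are arbitrary). A midpoint Riemann sum whose cells align with the integers integrates both the step function f([x]) and the piecewise-linear interpolant essentially exactly:

```python
import math

def f(x):
    return math.sin(x) + 2                     # arbitrary test function

def riemann(h, a, b, steps):
    """Midpoint Riemann sum of h over [a, b]. Choosing steps as a multiple of
    b - a keeps every cell inside one integer subinterval, so step functions
    and piecewise-linear functions are integrated essentially exactly."""
    dx = (b - a) / steps
    return math.fsum(h(a + (i + 0.5) * dx) for i in range(steps)) * dx

m, n = 2, 7
lhs = sum(f(k) for k in range(m, n + 1))

# first identity: sum_{k=m}^{n} f(k) = integral from m to n+1 of f([x]) dx
assert abs(lhs - riemann(lambda x: f(math.floor(x)), m, n + 1, 60000)) < 1e-9

# second identity: the integral of the linear interpolant fc plus the endpoint correction
def fc(x):
    k = math.floor(x)
    t = x - k                                  # fraction part weights f(k+1)
    return (1 - t) * f(k) + t * f(k + 1)

assert abs(lhs - (riemann(fc, m, n, 60000) + (f(m) + f(n)) / 2)) < 1e-9

# summation by parts with g(x) = x^2, g'(x) = 2x:
# sum_{k=1}^{n} g(k) = n g(n) - integral from 1 to n of [x] g'(x) dx
gsum = sum(k * k for k in range(1, n + 1))
assert abs(gsum - (n ** 3 - riemann(lambda x: math.floor(x) * 2 * x, 1, n, 60000))) < 1e-9
print("all three identities check out")
```

The 60000 steps are divisible by each interval length used, so no cell straddles an integer breakpoint.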
Further, if g(x) is integrable over [1, n] with antiderivative, G(x), then ∑_{k=1}^{n} g(k) = G(n) − G(1) + (g(n) + g(1))/2 + ∫_1^{n} ((x))g′(x) dx. Proof: We start with the identity, ∑_{k=1}^{n} g(k) = ng(n) − ∫_1^{n} [x]g′(x) dx, and then add and subtract ∫_1^{n} (x − ½)g′(x) dx from the right-hand side. This yields

∑_{k=1}^{n} g(k) = ng(n) − ∫_1^{n} [x]g′(x) dx + ∫_1^{n} (x − ½)g′(x) dx − ∫_1^{n} (x − ½)g′(x) dx =
ng(n) + ∫_1^{n} (x − [x] − ½)g′(x) dx + (G(x) − (x − ½)g(x))|_1^{n} =
ng(n) + ∫_1^{n} ((x))g′(x) dx + G(n) − G(1) − (n − ½)g(n) + (1 − ½)g(1) =
∫_1^{n} ((x))g′(x) dx + G(n) − G(1) + (g(n) + g(1))/2.

We can also use this identity to express an infinite converging series in integral form. Letting n → ∞ yields the following corollary. If f(x) is differentiable, f(x) has a closed-form antiderivative, F(x), with F(∞) = 0, and ∑_{k=1}^{∞} f(k) converges, then ∑_{k=1}^{∞} f(k) = ∫_1^{∞} ((x))f′(x) dx − F(1) + f(1)/2. Proof: If we can assume ∑_{k=1}^{∞} f(k) converges, then we can assume f(k) has all properties required for convergence. This means all limits at infinity are zero and therefore the corollary follows from our third result. In the next few paragraphs, we discuss how this corollary can be used to derive the Euler-Maclaurin summation formula, a high-precision method for approximating ∑_{k=1}^{∞} f(k).

Our goal is to reduce ∫_1^{∞} ((x))f′(x) dx further, and we can do so with integration by parts. That is, ∫_1^{∞} ((x))f′(x) dx = (f′(x)∫((x)) dx)|_1^{∞} − ∫_1^{∞} (∫((x)) dx)f″(x) dx. So, we need a continuous closed-form antiderivative for the sawtooth wave function with a constant chosen so that its integral over (k, k + 1) is zero. This would seem like a tall order, except that we worked out this result at the end of Section 1.2. Hence, the right-hand side simplifies to

{f′(x)(½(x − [x])² − ½(x − [x]) + 1/12)}|_1^{∞} − ∫_1^{∞} (½(x − [x])² − ½(x − [x]) + 1/12)f″(x) dx =
−f′(1)/12 − ∫_1^{∞} (½(x − [x])² − ½(x − [x]) + 1/12)f″(x) dx.

We illustrate below how the above iterations of integration by parts can be used to approximate ∑_{k=1}^{∞} f(k) with high precision. On the left, we superimpose the graphs of y = ((x)) and its continuous antiderivative. Notice how much smoother and how much closer to the x-axis its antiderivative is. As you might have guessed, each successive antiderivative of the sawtooth wave function is smoother than the previous one, and with the right choice of constants, its range is much smaller. For the purpose of illustration, we arbitrarily set f(x) = 1/(4x − 3) − 1/(4x − 1). The sum, ∑_{k=1}^{∞} {1/(4k − 3) − 1/(4k − 1)}, is called the Leibniz series and it converges to π/4. On the right, we have a table of values for the first few terms of f(x) and the absolute values of its first and second derivatives. Notice that the sequences generated by the absolute values of each successive derivative of f(x) decline to zero much more rapidly than the previous one; however, their first few terms are much larger.

x        1        2        3        4        5        10       15       20       25       30
f(x)     .666667  .057143  .020202  .010256  .006192  .001386  .000595  .000329  .000208  .000144
|f′(x)|  3.55556  .078367  .016325  .005891  .002760  .000292  .000082  .000034  .000017  .000010
|f″(x)|  30.8148  .162706  .019854  .005084  .001848  .000092  .000017  .000005  .000002  .000001

It follows that in order to evaluate ∑_{k=1}^{∞} f(k) with high precision, we have to raise the lower bound of the integral. This means we need to get rid of the first few terms in the sequence, and we do so by calculating ∑_{k=1}^{n} f(k) for a moderate-sized n first. Then, we evaluate antiderivatives and derivatives of f(x) at x = n. More precisely, ∑_{k=1}^{∞} f(k) = ∑_{k=1}^{n} f(k) − F(n) − f(n)/2 − f′(n)/12 + f‴(n)/720 − … It follows that summing the first few terms of f(k) is just as important as summing the first few terms in the series of derivatives generated by iterations of integration by parts. Our point is that by also using the greatest integer function with integration by parts, we do not have to sum nearly as many terms to achieve the desired level of precision. Although most other texts explain the mathematics behind the Euler-Maclaurin formula in terms of Bernoulli polynomials rather than the greatest integer function, all of our steps and calculations are the same.
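As a concrete check of this formula (our own sketch; the cutoff n = 10 and the explicitly computed derivatives are our choices), the correction terms collapse the Leibniz series to near machine precision even though the raw partial sum is only accurate to about two decimal places:

```python
import math

def f(x):  return 1 / (4*x - 3) - 1 / (4*x - 1)            # terms of the Leibniz series
def F(x):  return 0.25 * math.log((4*x - 3) / (4*x - 1))   # antiderivative with F(inf) = 0
def f1(x): return -4 / (4*x - 3)**2 + 4 / (4*x - 1)**2     # f'(x)
def f3(x): return -384 / (4*x - 3)**4 + 384 / (4*x - 1)**4 # f'''(x)

n = 10
partial = sum(f(k) for k in range(1, n + 1))
approx = partial - F(n) - f(n)/2 - f1(n)/12 + f3(n)/720

print(abs(partial - math.pi/4))   # roughly 1e-2: the raw partial sum is poor
print(abs(approx - math.pi/4))    # below 1e-8 after the corrections
```

Each additional derivative term buys several more digits, exactly as the table of derivative values suggests.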

The imaginary inverse of the greatest integer function can also help us express the infinite sum of any positive monotone decreasing infinite sequence as a definite integral. Let f(k) denote the kth term of the sequence, where k ∈ ℕ and f(k) ≥ f(k + 1). Since f(k) is a monotone decreasing function, f(1) is the first and largest term of the series and lim_{k→∞} f(k) = 0. Since the inverse of y = f(k) generates the values of k for all y ∈ {f(k)}, the imaginary inverse of y = f([x]) generates all the integer values of x > 0 for all y between 0 and f(1) (because f(∞) = 0 and f(1) is the first and largest term). Since the perpendicular edges of the graph of y = f([x]) and the x-axis box in the area representing the infinite sum, and neither the wafers nor the discontinuities enclose the region more effectively than the other, they are interchangeable for the purpose of integration over the region. For this reason, we can determine the infinite sum by integrating the imaginary inverse of y = f([x]) from x = 0 to f(1). Since the sequence starts at k = 1 rather than at k = 0, we must subtract 1 from the imaginary inverse (or subtract f(1) from the integrand). This is because the definite integral represents the area bounded above by the graph and below by y = 0, not y = 1. We call this summation technique the inverse summation and integration method.

If g(x) is defined, continuous, and monotone decreasing over [1, ∞) for x ∈ ℝ, then ∑_{k=1}^{∞} g(k) = ∫_0^{g(1)} [g⁻¹(x)] dx. Proof: We assume ∑_{k=1}^{∞} g(k) converges and that g has all properties required for convergence. Since g(x) is monotone decreasing, so are g⁻¹(x) and [g⁻¹(x)]. It follows that ∫_0^{g(1)} [g⁻¹(x)] dx = ∑_{k=1}^{∞} {∫_{g(k+1)}^{g(k)} [g⁻¹(x)] dx} = ∑_{k=1}^{∞} {∫_{g(k+1)}^{g(k)} k dx} = ∑_{k=1}^{∞} k·x|_{g(k+1)}^{g(k)} = −∑_{k=1}^{∞} k(g(k + 1) − g(k)). Setting f(k) = 1 and using the formula for summation by parts yields ∑_{k=1}^{n} g(k) = ng(n) − ∑_{k=1}^{n−1} k(g(k + 1) − g(k)). Letting n → ∞ yields ∑_{k=1}^{∞} g(k) = −∑_{k=1}^{∞} k(g(k + 1) − g(k)) = ∫_0^{g(1)} [g⁻¹(x)] dx.
Note: If g(x) is constant over some intervals, so that it is merely non-increasing rather than strictly decreasing, then some terms in the sum ∑_{k=1}^{∞} k(g(k + 1) − g(k)) could be zero, but the identity still holds. Example 4: The inverse summation and integration method is illustrated below using the Leibniz series, ∑_{k=1}^{∞} {1/(4k − 3) − 1/(4k − 1)}, which converges to π/4. Let f(x) = 1/(4x − 3) − 1/(4x − 1). Since f(x) is defined, continuous at integers, strictly decreasing over [1, ∞) for x ∈ ℝ, and ⅔ is the first term of the Leibniz series, ∑_{k=1}^{∞} f(k) = ∫_0^{⅔} [f⁻¹(x)] dx = π/4. Notice how both graphs below appear to be mirror images of each other about the line y = x, except for the fact that the wafers become the discontinuities and vice versa.
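Because [f⁻¹(x)] equals k exactly on the interval (f(k + 1), f(k)), the integral in Example 4 can be evaluated piece by piece without ever inverting f explicitly. A quick numerical check (our own sketch; the truncation point K is an arbitrary choice):

```python
import math

def f(x):
    return 1 / (4*x - 3) - 1 / (4*x - 1)   # Leibniz terms; f(1) = 2/3

# integral of the step function [f^{-1}(x)] over (0, f(1)), truncated after
# K pieces: each piece has height k and width f(k) - f(k+1)
K = 100000
integral = sum(k * (f(k) - f(k + 1)) for k in range(1, K + 1))

print(integral, math.pi / 4)   # the truncated integral approaches pi/4 as K grows
```

The truncation error shrinks like 1/K, so this is a demonstration of the identity rather than a fast way to compute π/4; Chapter 6's acceleration methods do far better.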

Using the inverse summation and integration method, we can numerically approximate all infinite converging sums of the form ∑_{k=1}^{∞} g⁻¹(k) provided that g(x) is defined, continuous, and monotone decreasing over (0, g⁻¹(1)] for x ∈ ℝ. Then ∑_{k=1}^{∞} g⁻¹(k) = ∫_0^{g⁻¹(1)} [g(x)] dx. This method may be useful in instances where there is an explicit expression for g(x), but no good explicit expression for g⁻¹(k). Although we can express infinite sums as definite integrals using the greatest integer function, we usually approximate infinite sums of convergent series using methods that combine partial summation, advanced properties of summations, integration and limits, but do not necessarily use the greatest integer function. This branch of analysis is called series acceleration. Chapter 6 discusses series acceleration methods that use the greatest integer function.

CHAPTER 3: Number Bases and Digits

This number-theory-intensive chapter redefines the digit and discusses problem-solving methods involving digits. Section 3.1 discusses decimal digits. Section 3.2 discusses p-adic digits and change-of-number-base operations.

Section 3.1: The Definitions and Properties of the Decimal Digit

The base-ten digit, or decimal digit, is indispensable. We use a combination of these digits for expressing any finite fixed figure, from the amount in our checkbook to important dates in history. We express numbers in terms of digits in conversation every day and take them for granted. By the way, what is a digit? According to Merriam-Webster's Collegiate Dictionary (Tenth Edition, 1993), the qualitative definition of the digit is any Arabic numeral between 0 and 9. Indeed, digits fit this criterion. However, the dictionary's definition is not precise because it does not describe the digit in terms of the number it represents. Without a precise definition of the decimal digit, we cannot work with it in a straightforward manner in a number-theory context. For this reason, this section is devoted to finding a more precise definition of the decimal digit. The dictionary also defines the decimal system as the expression of a number in base ten with digits 0 to 9 in each place, each place being multiplied by a power of ten. The dictionary's definition of the decimal system is sufficient since every real number's decimal expansion exists and is unique. In other words, we can express all x ∈ ℝ uniquely as x = d_k·10^{k−1} + d_{k−1}·10^{k−2} + ... + d_2·10 + d_1 + d_0/10 + d_{−1}/100 + ... + d_{−k+1}/10^{k} + d_{−k}/10^{k+1} + ... Proof: We can express zero with the single digit 0 and negative numbers with a minus sign in front of a positive number. So, we can restrict the proof to positive reals. Let x be a positive real number. Then, 10^{[log(x)]+1} > x. (Assume all logs are in base 10.) For this reason, all digits d_i where i > [log(x)] + 1 are zero and are usually omitted. Otherwise, some of the digits would have to be negative or fractional to make up for it, a contradiction. On the other hand, if x > 0, then x ≥ 10^{[log(x)]}, meaning x/10^{[log(x)]} ≥ 1. For this reason, some digits d_i where i ≤ [log(x)] + 1 are not zero and k = [log(x)] + 1.
At this point, we can prove both existence and uniqueness for the decimal expansion of x using induction. Let x_j = x_{j−1} − d_{k−j+1}·10^{k−j} and, for the base case j = 0, let x = x_0 and d_k = [x/10^{[log(x)]}]. Then, 1 ≤ d_k ≤ 9. If d_k < 1, then x/10^{[log(x)]} < 1, meaning 10^{[log(x)]} > x, a contradiction. And if d_k > 9, then x/10^{[log(x)]} ≥ 10, meaning 10^{[log(x)]+1} ≤ x, a contradiction. But could d_k be any other value? Well, if d_k > [x/10^{[log(x)]}], then d_k ≥ [x/10^{[log(x)]}] + 1 because d_k ∈ ℤ. But then d_k·10^{k−1} ≥ 10^{[log(x)]}([x/10^{[log(x)]}] + 1) > x, forcing some d_i < 0 to make up for it, a contradiction. And if d_k < [x/10^{[log(x)]}], then d_k ≤ [x/10^{[log(x)]}] − 1 because d_k ∈ ℤ. But then d_k·10^{k−1} ≤ 10^{[log(x)]}([x/10^{[log(x)]}] − 1) ≤ x − 10^{[log(x)]}. Since 10^{k−1} = 10^{[log(x)]} > d_{k−1}·10^{k−2} + ... + d_2·10 + d_1 + d_0/10 + d_{−1}/100 + ... even if all d_i = 9, we would have ∑ d_i·10^{i−1} < x, another contradiction. So, d_k = [x/10^{[log(x)]}] is indisputably the decimal digit of x corresponding to the highest power of ten, but what about the other digits of x? Let x_1 = x_0 − d_k·10^{k−1}. Since d_k exists and is unique, x_1 is unique, and hence the leftmost decimal digit of x_1, d_{k−1}, exists and is unique. In this fashion, let x_j = x_{j−1} − d_{k−j+1}·10^{k−j}. We have proved the existence and uniqueness of the leftmost decimal digit of x_0, and that if the leftmost decimal digit of x_{j−1} exists and is unique, then x_j is unique and therefore the leftmost decimal digit of x_j, d_{k−j}, exists and is unique. So, all the decimal digits of x exist and are unique.
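The induction above is effectively an algorithm: peel off the leftmost digit d = [x/10^{k−j}] and subtract d·10^{k−j}. A sketch of ours for natural numbers (the function name is our own):

```python
from math import floor, log10

def leading_digits(x):
    """Left-to-right digit extraction mirroring x_j = x_{j-1} - d_{k-j+1} 10^{k-j}."""
    k = floor(log10(x))          # k + 1 = [log(x)] + 1 digits in all
    digits = []
    for j in range(k + 1):
        d = x // 10 ** (k - j)   # the current leftmost digit
        digits.append(d)
        x -= d * 10 ** (k - j)   # strip it off and continue with the remainder
    return digits

print(leading_digits(6493736))   # [6, 4, 9, 3, 7, 3, 6]
```

The same idea applies to non-integers, but floating-point round-off then intrudes; the end of this section corrects for that by working with exact fractions.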

Using the greatest integer function, we can define the decimal digit more precisely. That is, the greatest integer function can help us express a decimal digit in terms of the natural number it represents by isolating the digit from the natural number. Let x be the n-digit natural number, d_n...d_{i+1}d_i d_{i−1}...d_1. Our objective is to isolate d_i. Since the coefficient of 10^{i−1} represents the ith place in x, x/10^{i−1} = d_n...d_{i+1}d_i.d_{i−1}...d_1, and [d_n...d_{i+1}d_i.d_{i−1}...d_1] = d_n...d_{i+1}d_i. Similarly, x/10^{i} = d_n...d_{i+1}.d_i d_{i−1}...d_1 and 10[d_n...d_{i+1}.d_i d_{i−1}...d_1] = d_n...d_{i+1}0. Therefore, [x/10^{i−1}] − 10[x/10^{i}] = d_n...d_{i+1}d_i − d_n...d_{i+1}0 = d_i. So, if x = d_n...d_i...d_1, then d_i = [x/10^{i−1}] − 10[x/10^{i}]. We call this expression the quantitative definition of the ith decimal digit of x. The quantitative definition of the decimal digit is so precise that it describes each digit mathematically in terms of the number it represents.

Since x = d_1 + ... + 10^{i−1}d_i + ... + 10^{n−1}d_n for all x ∈ ℕ and we now have an expression for the ith decimal digit of x in terms of x, it is only natural for us to investigate the sums and alternating sums of the decimal digits of x. Let s_1(x) = ∑_{i=1}^{n} d_i. Then, by the quantitative definition of the decimal digit,

s_1(x) = ∑_{i=1}^{n} ([x/10^{i−1}] − 10[x/10^{i}]) = ∑_{i=1}^{n} [x/10^{i−1}] − ∑_{i=1}^{n} [x/10^{i}] − 9∑_{i=1}^{n} [x/10^{i}] = [x/10^{0}] − [x/10^{n}] − 9∑_{i=1}^{n} [x/10^{i}].

Since x ∈ ℕ is an n-digit number, it follows that [x/10^{0}] = x and [x/10^{n}] = 0. Therefore, s_1(x) = x − 9∑_{i=1}^{n−1} [x/10^{i}]. Let s_{−1}(x) = −∑_{i=1}^{n} (−1)^{i} d_i. Then, by the quantitative definition of the decimal digit,

s_{−1}(x) = −∑_{i=1}^{n} {(−1)^{i}[x/10^{i−1}] − (−1)^{i}·10[x/10^{i}]} =
∑_{i=1}^{n} (−1)^{i−1}[x/10^{i−1}] − ∑_{i=1}^{n} (−1)^{i}[x/10^{i}] + 11∑_{i=1}^{n} (−1)^{i}[x/10^{i}] =
(−1)^{0}[x/10^{0}] − (−1)^{n}[x/10^{n}] + 11∑_{i=1}^{n} (−1)^{i}[x/10^{i}].

Since x ∈ ℕ is an n-digit number, it follows that (−1)^{0}[x/10^{0}] = x and (−1)^{n}[x/10^{n}] = 0. Therefore, s_{−1}(x) = x + 11∑_{i=1}^{n−1} (−1)^{i}[x/10^{i}]. Since we expressed the sums and alternating sums of the decimal digits of x in terms of x, we can draw some profound conclusions about their relationship. Since s_1(x) = x − 9∑_{i=1}^{n−1} [x/10^{i}] and ∑_{i=1}^{n−1} [x/10^{i}] ∈ ℤ, s_1(x) ≡ x mod 9. And since s_{−1}(x) = x + 11∑_{i=1}^{n−1} (−1)^{i}[x/10^{i}] and ∑_{i=1}^{n−1} (−1)^{i}[x/10^{i}] ∈ ℤ, s_{−1}(x) ≡ x mod 11. So, we can determine the remainder of any large number divided by 9 (or a factor of 9) or 11 simply by summing its decimal digits!

Using the greatest integer function, we can also define a group of k consecutive decimal digits more precisely. That is, the greatest integer function can help us express a group of k consecutive decimal digits in terms of the natural number it represents by isolating the group of consecutive digits from the natural number. As an example, let x = 6,493,736. The first group of 3 consecutive decimal digits of x is the number 736; the second group is 493; and the third group is 006, or 6. Let x be the nk-digit number, d_{nk}...d_{(n−1)k+1}d_{(n−1)k}...d_{ik+1}d_{ik}...d_{(i−1)k+1}d_{(i−1)k}...d_{k+1}d_k...d_1, where at least one of d_{nk} through d_{(n−1)k+1} exceeds 0. Our objective is to isolate d_{ik}...d_{(i−1)k+1}.
Since the coefficient of 10^{(i−1)k} represents the ith group of every k decimal places in x, we isolate d_{ik}...d_{(i−1)k+1} in the following steps:

x/10^{(i−1)k} = d_{nk}...d_{(n−1)k+1}d_{(n−1)k}...d_{ik+1}d_{ik}...d_{(i−1)k+1}.d_{(i−1)k}...d_{k+1}d_k...d_1;
[x/10^{(i−1)k}] = d_{nk}...d_{(n−1)k+1}d_{(n−1)k}...d_{ik+1}d_{ik}...d_{(i−1)k+1};
x/10^{ik} = d_{nk}...d_{(n−1)k+1}d_{(n−1)k}...d_{ik+1}.d_{ik}...d_{(i−1)k+1}d_{(i−1)k}...d_{k+1}d_k...d_1;
10^{k}[x/10^{ik}] = d_{nk}...d_{(n−1)k+1}d_{(n−1)k}...d_{ik+1}0_{ik}...0_{(i−1)k+1}; and
d_{nk}...d_{(n−1)k+1}d_{(n−1)k}...d_{ik+1}d_{ik}...d_{(i−1)k+1} − d_{nk}...d_{(n−1)k+1}d_{(n−1)k}...d_{ik+1}0_{ik}...0_{(i−1)k+1} = d_{ik}...d_{(i−1)k+1}.

Therefore, if x = d_{nk}...d_{(n−1)k+1}...d_{ik}...d_{(i−1)k+1}...d_k...d_1, then d_{ik}...d_{(i−1)k+1} = [x/10^{(i−1)k}] − 10^{k}[x/10^{ik}]. We call [x/10^{(i−1)k}] − 10^{k}[x/10^{ik}] the quantitative definition of the ith group of every k decimal digits of x. Since we now have an expression for the ith group of every k decimal digits of x in terms of x, we can also express the sums and alternating sums of groups of k consecutive decimal digits of x in terms of x. And we can draw some profound conclusions about their relationship. This latter objective is left for the exercises.
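The quantitative definitions translate directly into code (our own sketch, with integer division playing the role of the greatest integer function):

```python
def digit(x, i):
    """ith decimal digit of x (i = 1 is the ones place): [x/10^(i-1)] - 10[x/10^i]."""
    return x // 10 ** (i - 1) - 10 * (x // 10 ** i)

def digit_group(x, i, k):
    """ith group of k consecutive decimal digits: [x/10^((i-1)k)] - 10^k [x/10^(ik)]."""
    return x // 10 ** ((i - 1) * k) - 10 ** k * (x // 10 ** (i * k))

x = 6493736
print([digit(x, i) for i in range(1, 8)])         # digits from the ones place up
print([digit_group(x, i, 3) for i in (1, 2, 3)])  # [736, 493, 6]

# casting out nines and elevens via s_1(x) and s_{-1}(x)
s1  = sum(digit(x, i) for i in range(1, 8))
sm1 = -sum((-1) ** i * digit(x, i) for i in range(1, 8))
assert s1 % 9 == x % 9 and sm1 % 11 == x % 11
```

The two assertions are exactly the congruences s_1(x) ≡ x mod 9 and s_{−1}(x) ≡ x mod 11 derived above.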

In the next few paragraphs, we present a simple method for performing long division on rationals. We demonstrate how to convert a rational fraction between 0 and 1 to decimal notation. In our proof for the existence and uniqueness of the decimal expansion of x, we showed how to compute x using the iterative process x_j = x_{j−1} − d_{k−j+1}·10^{k−j} where x = x_0 and d_k = [x/10^{[log(x)]}]. With a few adjustments, the same iterative process can compute infinitely many digits of rational fractions as long as the fraction part of each iteration is expressed as c/d where c, d ∈ ℕ. Otherwise, if the fraction parts are expressed as a decimal expansion, then after enough iterations, they could lose precision due to round-off error. In the next example, we show how to correct for round-off error. Converting a rational fraction, 0 < x < 1, to decimal notation requires that we find the first decimal digit of x and then use a rapid iterative process to find each successive digit. We must use the iterative process for finding each successive digit of x, as opposed to using a closed-end formula, because its decimal expansion is often non-terminating and an iterative process preserves the fraction part after each digit, which is necessary to determine whether the decimal expansion of x terminates or repeats. If the fraction part complement to any of the digits is zero or matches the fraction part complement to a preceding digit, then the fraction's decimal expansion terminates or repeats, respectively. Proof: If the fraction part complement to d_i is zero, then all successive digits of x are zero. Since the fraction part from the previous iteration is the independent variable in the next iteration, if the fraction part of any iteration is the same as that of a preceding iteration, then the input is the same. Since the formula for the iterations is a function, if the inputs are the same, then the outputs are the same. Essentially, the digits repeat. Let 0 < x < 1, x = 0.d_1d_2...d_i d_{i+1}..., x = x_0, and x_i = d_i.d_{i+1}... Then x_{i+1} = 10(x_i − [x_i]). Proof: We can prove this statement using induction on i. For the base case, i = 1, we must show that x_1 = d_1.d_2...d_i d_{i+1}... = 10(x − [x]). Since 0 < x < 1, [x] = 0 and hence 10(x − [x]) = 10x = 10(0.d_1d_2...d_i d_{i+1}...) = d_1.d_2...d_i d_{i+1}... And for the inductive case, we must show that if x_i = 10(x_{i−1} − [x_{i−1}]), then x_{i+1} = 10(x_i − [x_i]). Since x_i = d_i.d_{i+1}..., [x_i] = d_i, x_i − [x_i] = 0.d_{i+1}d_{i+2}..., and 10(x_i − [x_i]) = d_{i+1}.d_{i+2}... Hence, for all i ∈ ℕ, x_{i+1} = 10(x_i − [x_i]).

Using this iterative process on the TI-83, we can generate each successive digit of x just by pressing the ENTER key. We carefully chose this expression for the iterative process and its index so that [x_i] = d_i in the proof above. First, type the value of x into the calculator and press ENTER. Next, type 10(Ans−int(Ans)). At this point, we can generate each digit just by pressing ENTER, where d_i is the greatest integer of our output from the ith iteration. If x = c/d with c, d ∈ ℕ, then every fraction part of an iteration of x can be expressed as k/d for some k between 0 and d − 1. Proof: We can prove this statement using induction on i. For the base case, i = 1, we must show that the fraction part of x_1 can be expressed as k_1/d for some k_1 between 0 and d − 1. In the proof above, we showed that x_1 = 10x. Since x = c/d, x_1 = 10c/d, and x_1 − [x_1] = 10c/d − [10c/d] = (10c − d[10c/d])/d. Since c, d ∈ ℕ, 10c − d[10c/d] ∈ ℤ. Since 10c − d[10c/d] is the remainder after dividing 10c by d, 0 ≤ 10c − d[10c/d] ≤ d − 1. For the inductive case, we must show that if the fraction part of x_i is k_i/d for some k_i between 0 and d − 1, then the fraction part of x_{i+1} is k_{i+1}/d for some k_{i+1} between 0 and d − 1. So, if k_i/d = x_i − [x_i], then x_{i+1} = 10k_i/d. And x_{i+1} − [x_{i+1}] = 10k_i/d − [10k_i/d] = (10k_i − d[10k_i/d])/d = k_{i+1}/d. Since k_i, d ∈ ℕ, k_{i+1} = 10k_i − d[10k_i/d] ∈ ℤ. Since 10k_i − d[10k_i/d] is the remainder after dividing 10k_i by d, 0 ≤ k_{i+1} ≤ d − 1. So, the fraction part of each x_i can be expressed as k_i/d for some k_i between 0 and d − 1. Using the preceding lemma, we can correct for round-off error by converting iterations of x to an exact fraction. Since x_{i+1} = 10k_i/d in the proof above, x_{i+1}·d = 10k_i ∈ ℤ. So, we can find the numerator of each x_i by multiplying it by d. Since the product, x_i·d ∈ ℤ, we can correct for round-off error by rounding x_i·d to the nearest integer. Then, we divide by d to get the exact fraction.
By using this combined iterative process, we can get as many digits as needed with complete precision on the TI-83 simply by pressing the ENTER key. Example 1: Express 1/23 as a repeating decimal. Solution: Let x = 1/23. Then, x_0 = 1/23, the first iteration of x is x_1 = 10(x_0 − [x_0]) = 10/23, and the (i+1)st iteration of x is x_{i+1} = 10(x_i − [x_i]). Since we need to correct for round-off error, we replace x_i with [23x_i + ½]/23. Since we are substituting a longer expression for x_i and x_i appears twice in the iteration, we express the (i+1)st iteration more concisely as x_{i+1} = 10fPart(x_i). So, the combined iteration is 10fPart(int(23Ans+.5)/23). We press ENTER until the fraction parts of the iterations start repeating. The first 22 digits of 1/23 are 0.0434782608695652173913. Since the 23rd iteration of x is 0.4347..., the 23rd digit of 1/23 is 0 and the fraction part is the same as that for the first iteration, 10/23. So, the decimal expansion of 1/23 consists of the 22-digit block 0434782608695652173913 repeated indefinitely. If we use a larger power of ten, then we can compute more digits per iteration; however, this is generally not recommended because without the fraction part after every decimal digit, it is harder to determine where the decimal expansion starts repeating.
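In Python, the Fraction type plays the role of the round-off correction: the iteration x_{i+1} = 10(x_i − [x_i]) stays exact, so we can read off digits and detect the repeating block by watching for a repeated fraction part (our own sketch; it assumes a purely repeating expansion such as 1/23):

```python
from fractions import Fraction

def repeating_digits(c, d):
    """Digits of c/d (0 < c/d < 1) until the fraction part first repeats."""
    x = Fraction(c, d)
    seen, digits = [], []
    while x not in seen:
        seen.append(x)            # fraction part entering this iteration
        x = 10 * x                # x_{i+1} = 10 * fraction part of x_i
        digits.append(int(x))     # the next decimal digit is [x_{i+1}]
        x -= int(x)               # keep only the fraction part, exactly
    return ''.join(map(str, digits))

print(repeating_digits(1, 23))    # the 22-digit repeating block of 1/23
```

Because every fraction part is some k/23 with 0 ≤ k ≤ 22, the loop must repeat within 23 iterations, exactly as the lemma predicts.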

Section 3.2: Using Digits for Change of Number Base Operations

An n-digit natural number in the number base m is usually denoted d_n...d_i...d_1 with the base m as a subscript. As for notation, throughout this section, we use the alternative notation, d_n...d_i...d_1 b_m, to avoid confusing it with the (n + 1)-digit natural number, d_n...d_i...d_1 m. The digits of x in base m are known as the m-adic digits of x. And their decimal expansion is the dot product, d_n·m^{n−1} + ... + d_i·m^{i−1} + ... + d_1·1, where each d_i is an integer between 0 and m − 1. For instance, 765b8 = 7·8² + 6·8 + 5 = 501. Wherever the base is not denoted, the reader may assume the number is a decimal expansion. Using the greatest integer function, we can convert any positive real number from base ten to another base. When converting natural numbers from base ten to base m, we use a closed-end formula for finding each digit in base m. When converting fractions from base ten to base m, we use a fast iterative process involving subtraction, truncation, and multiplication for finding each successive digit in base m. We omit the proof for the existence and uniqueness of the base m expansion of any positive real number because it is similar enough to the one given for the decimal expansion in Section 3.1. While the decimal system is by far the most popular number base, there are other important number bases. For instance, computers generally perform calculations in binary (base two) or in powers of base two such as octal (base eight) or hexadecimal (base sixteen). Moreover, not every human civilization uses base ten. For example, the ancient Sumerians used base twelve; the ancient Babylonians used a mixed base of ten and sixty; and the Mayans used base twenty in their calendar and astronomical systems. Even today their legacy remains with us. For instance, the fraction part of an hour and the degree of an angle are measured in minutes and seconds. And in Mexico and Central America, many astronomical calculations are still done in base twenty.

Suppose we wanted to convert 501 to base 8. We can perform this calculation using the quantitative definition of the digit. Let x ∈ ℕ be the n-digit number d_n...d_i...d_1 b_m. Then, d_i = [x/m^{i−1}] − m[x/m^{i}]. After replacing 10 with m, the method by which we derive this relation is the same as the method by which we derived the quantitative definition of the decimal digit. And the formula works the same way, by isolating d_i. So, assuming that we operate in decimal and we know the decimal expansion of x, we can find the ith digit of x in base m by using this formula. So, d_1 = [501/1] − 8[501/8] = 5; d_2 = [501/8] − 8[501/8²] = 6; and d_3 = [501/8²] − 8[501/8³] = 7. Example 1: Convert 73b8 to binary (base 2). Solution: Assuming that we are not familiar with base 8, we must first convert 73b8 into base ten; 73b8 = 7·8 + 3 = 59. Since [log(59)/log(2)] + 1 = 6, 59 is a 6-digit number in binary. Rather than finding all 6 digits one by one, we set Y1 = int(59/2^(X−1))−2int(59/2^X) on the TI-83 and examine the table of values for 1 ≤ x ≤ 6. (Do not forget to set TblStart = 1 and deltaTbl = 1.) We find that d_1 = 1; d_2 = 1; d_3 = 0; d_4 = 1; d_5 = 1; and d_6 = 1. Indeed, 1·32 + 1·16 + 1·8 + 0·4 + 1·2 + 1 = 59. So, 73b8 = 111011b2. Had we been more clever, we might have realized that since 8 is the third power of 2, it takes exactly three binary digits to represent each base 8 digit (except perhaps the leftmost digit). Since 7 is 111 in binary and 3 is 011 in binary, 73b8 = 111011b2. The following result demonstrates an important application for converting large numbers from decimal to binary notation.
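The base-m digit formula is a one-liner in code. A sketch of ours that collects all the digits at once, most significant first:

```python
def digits_base(x, m):
    """Digits of x in base m via d_i = [x/m^(i-1)] - m[x/m^i], most significant first."""
    n = 1
    while m ** n <= x:       # n = number of base-m digits of x
        n += 1
    return [x // m ** (i - 1) - m * (x // m ** i) for i in range(n, 0, -1)]

print(digits_base(501, 8))   # [7, 6, 5]  ->  501 = 765 in base 8
print(digits_base(59, 2))    # [1, 1, 1, 0, 1, 1]  ->  73 base 8 = 111011 in binary
```

Counting digits with the while loop sidesteps the floating-point [log(x)/log(m)] computation, which can misfire near exact powers of m.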

We can raise a number in a large modulus to a large power efficiently using an algorithm called successive squaring. With each successive square, the exponent doubles, i.e., (x²)² = x⁴ and ((x²)²)² = x⁸. Although the exponent, e, may not always be a power of 2, e is always a sum of powers of 2. We can determine the powers of 2 that sum to e by expressing e in binary. Then, we can express x^e mod m uniquely as the product of successive squares of x mod m. So, we can raise a number in a large modulus to a large power efficiently by successively squaring it. The most effective way to perform successive squaring is to make a table of each successive square. First, list each number from 0 to [log(e)/log(2)]. Next, represent e in binary, listing the binary digit of e for each power of 2 from 2⁰ to 2^{[log(e)/log(2)]}. If the binary digit of e for a power of 2 is 1, then also list the value of a^(2^power) mod m. Remember that the nth power of 2 corresponds to the (n+1)st digit in binary since the first digit is in the one's place. Then, take the product of the successive squares for which the binary digit of e is 1 and reduce modulo m. Fermat's little theorem has some big applications in number theory. An immediate application is primality testing. If m ∤ x and x^{m−1} ≢ 1 mod m, then m is composite. Conversely, if x ≢ 1 mod m and x^{m−1} ≡ 1 mod m, then m is a prime with rare exception. Since there are a few composite numbers, m, such that x^{m−1} ≡ 1 mod m for all x that are not multiples of a prime divisor of m (called Carmichael numbers), it helps to verify x^{m−1} ≡ 1 mod m for a few small odd prime values of x. At present, an algorithm building on Fermat's little theorem called the Rabin-Miller test is used to determine whether any given large number is prime in order to construct asymmetric public-key cryptosystems, which we introduce in Section 5.1.
We can also use Fermat's little theorem and successive squaring to find the multiplicative inverse (or discrete reciprocal) of any non-zero number in a prime modulus. Example 2: Using Fermat's little theorem, a) determine with certainty or near certainty if 9001 is prime and if 9017 is prime and b) solve 1989x ≡ 7 mod 10,007 for x. Assume that 10,007 is prime. Solution: a. Since the nth power of 2 corresponds to the (n+1)st binary digit, we add 1 to x so that the digit placements match the exponents. First, we set Y1 = int(9000/2^X)−2int(9000/2^(X+1)) on the TI-83. Since [log(9000)/log(2)] = 13, we examine the binary digits for 0 ≤ x ≤ 13; all digits after x = 13 are 0. On the TI-83, we evaluate 2^(2^x) mod 9001 by typing 2, pressing ENTER, and then typing Ans²−9001int(Ans²/9001). (Ans stands for the previous answer.) Then, we press ENTER successively. Each time we press ENTER, we effectively square our input modulo 9001. So, the number of times we press ENTER after typing Ans²−9001int(Ans²/9001) corresponds to the value of x. When taking the product of 2^(2^x) mod 9001 for different values of x, we multiply by one term at a time and then reduce it modulo 9001 by typing Ans−9001int(Ans/9001). We used the same method for 9017 and a similar method for part b. Our outputs from the iterative process are provided in the table below.

Exponents x of 2:        0  1  2  3  4  5  6  7  8  9  10 11 12 13
Binary digits of 9000:   0  0  0  1  0  1  0  0  1  1  0  0  0  1
2^(2^x) mod 9001, where the digit is 1:  256, 5131, 1953, 6786, 1991; product: 1
3^(2^x) mod 9001, where the digit is 1:  6561, 6998, 6164, 1675, 7812; product: 1

Binary digits of 9016:   0  0  0  1  1  1  0  0  1  1  0  0  0  1
2^(2^x) mod 9017, where the digit is 1:  256, 2417, 7890, 2048, 1399, 4; product: 5208
Binary digits of 10,005: 1  0  1  0  1  0  0  0  1  1  1  0  0  1
1989^(2^x) mod 10,007, where the digit is 1:  1989, 4861, 976, 7016, 9830, 1308, 8889; product: 4035

In conclusion, we can determine with near certainty that 9001 is prime and with absolute certainty that 9017 is composite. Since 2^{9000} ≡ 1 mod 9001, we calculated 3^{9000} mod 9001. Since 3^{9000} ≡ 1 mod 9001, 9001 is probably a prime, and it is indeed. Since 2^{9016} ≡ 5208 mod 9017, 9017 is not prime; however, Fermat's little theorem will not help us factor it. It turns out that 9017 = 71·127. While this method seems like a lot of work, as opposed to factoring, for determining if a 4-digit number is prime, it beats trying to factor 200-digit numbers, a task that can literally take forever. b. Since 1989^{10,005} ≡ 4035 mod 10,007, we assert that 4035 ≡ 1989⁻¹ mod 10,007. Since 1989·4035 = 1 + 802·10,007, our assertion is correct. Since x ≡ 1989⁻¹·7 mod 10,007, x ≡ 4035·7 ≡ 8231 mod 10,007.
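The whole table-driven computation is a few lines of code. The sketch below is our own; Python's built-in pow(a, e, m) performs the same successive squaring internally, and the explicit version reproduces the calculations of Example 2:

```python
def power_mod(a, e, m):
    """Successive squaring: scan the binary digits of e from the ones place up,
    multiplying in the square a^(2^i) mod m wherever the digit is 1."""
    result, square = 1, a % m
    while e:
        if e & 1:                        # current binary digit of e
            result = result * square % m
        square = square * square % m     # next successive square
        e >>= 1
    return result

assert power_mod(2, 9000, 9001) == 1     # consistent with 9001 being prime
assert power_mod(3, 9000, 9001) == 1
assert power_mod(2, 9016, 9017) != 1     # so 9017 is certainly composite
inv = power_mod(1989, 10005, 10007)      # 1989^(p-2) = 1989^(-1) mod p by Fermat
assert inv * 1989 % 10007 == 1
print(inv, inv * 7 % 10007)              # 4035 8231, matching the example
```

Only about log2(e) squarings are needed, which is why the method scales to the enormous exponents used in public-key cryptography.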

In Section 2.1, we determined which combinatorial expressions are divisible by p and how many times they are divisible by p. At this point, we take our study of combinatorial expressions one step further. First, we prove that the greatest integer exponent, e, such that p^e ∣ n! is a function of the sum of the p-adic digits of n. Next, we study properties of binomial coefficients in prime moduli. The first result we present is an alternative expression for de Polignac's formula. An alternative expression for e is e = (n − ∑_{i=1}^{[log(n)/log(p)]+1} d_{ni})/(p − 1), where d_{ni} is the ith p-adic digit of n. We can also think of e as the number of trailing zeros in the base-p expansion of n!, that is, the number of zeros before encountering its first non-zero p-adic digit. Proof: It suffices to show that (n − ∑_{i=1}^{[log(n)/log(p)]+1} d_{ni})/(p − 1) = ∑_{i=1}^{[log(n)/log(p)]} [n/p^{i}]. Let s = ∑_{i=1}^{[log(n)/log(p)]+1} d_{ni}. Then,

s = ∑_{i=1}^{[log(n)/log(p)]+1} {[n/p^{i−1}] − p[n/p^{i}]} = ∑_{i=0}^{[log(n)/log(p)]} [n/p^{i}] − p∑_{i=1}^{[log(n)/log(p)]+1} [n/p^{i}] = [n/p^{0}] + ∑_{i=1}^{[log(n)/log(p)]} [n/p^{i}] − p∑_{i=1}^{[log(n)/log(p)]} [n/p^{i}] − p[n/p^{[log(n)/log(p)]+1}] = n − (p − 1)∑_{i=1}^{[log(n)/log(p)]} [n/p^{i}],

where the last term vanishes because [n/p^{[log(n)/log(p)]+1}] = 0. Since s = n − (p − 1)∑_{i=1}^{[log(n)/log(p)]} [n/p^{i}], (n − s)/(p − 1) = ∑_{i=1}^{[log(n)/log(p)]} [n/p^{i}] and hence we achieve the desired result.
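This identity is easy to verify computationally. A sketch of ours comparing the digit-sum form against the usual de Polignac sum ∑ [n/p^i]:

```python
def prime_exponent_factorial(n, p):
    """Exponent of p in n! via de Polignac's formula: sum of [n/p^i]."""
    e, q = 0, p
    while q <= n:
        e += n // q
        q *= p
    return e

def padic_digit_sum(n, p):
    """Sum of the p-adic digits of n."""
    s = 0
    while n:
        s += n % p       # the current p-adic digit
        n //= p
    return s

for n in (10, 100, 1989):
    for p in (2, 3, 5, 7):
        assert prime_exponent_factorial(n, p) == (n - padic_digit_sum(n, p)) // (p - 1)
print("digit-sum form of de Polignac's formula verified")
```

For example, 100! contains 5 exactly 20 + 4 = 24 times, and (100 − 4)/(5 − 1) = 24 since 100 is 400 in base 5.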

The next result we present is Lucas's theorem. Let n, k ∈ ℕ, p be a prime, dni denote the ith p-adic digit of n, and dki denote the ith p-adic digit of k. Then, (n choose k) ≡ ∏_{i=1}^{[log(k)/log(p)]+1} (dni choose dki) mod p. This theorem was discovered in 1878 by the nineteenth-century French mathematician Edouard Lucas. There are several good proofs of Lucas's theorem; however, most involve concepts in combinatorics and group theory that this work does not cover. There is a popular elementary proof by Nathan J. Fine in which he uses the binomial theorem to demonstrate that ∑_{k=0}^{n} (n choose k)x^k = (x + 1)^n = ∏_i (x + 1)^{dni·p^{i−1}} ≡ ∏_i (x^{p^{i−1}} + 1)^{dni} ≡ ∏_i (∑_{k=0}^{dni} (dni choose k)x^{kp^{i−1}}) ≡ ∑_{k=0}^{n} (∏_{i=1}^{[log(k)/log(p)]+1} (dni choose dki))x^k mod p. In other words, Fine proves that for all n, the nth-degree polynomial (x + 1)^n = ∑_{k=0}^{n} (n choose k)x^k is congruent modulo p to the nth-degree polynomial whose binomial coefficients are reduced by Lucas's theorem. From this, Fine concludes that the coefficients of like powers of x in each sum are congruent modulo p. While his proof beautifully illustrates Lucas's theorem and proves a necessary condition, we did not believe that his last step was justified without additional proof. Due to Fermat's little theorem, two polynomials, the larger of which has degree at least p, can be congruent for all x modulo p without all their coefficients of like powers of x being congruent modulo p. For this reason, we present a more rigorous proof using the theorems above. Since it has many components, we organize it by dividing it into sub-objectives and providing sub-proofs for each. Proof: Let n = dnx…dn1 in base p, where dnx is the leftmost non-zero p-adic digit of n. We prove Lucas's theorem by induction on x, the number of p-adic digits of n. For the base case, we start with x = 1. Then n < p. Assuming k < p, Lucas's theorem does not reduce (n choose k) modulo p. If k ≥ p, then (n choose k) = 0 and the product ∏_i (dni choose dki) must contain at least one i > x with dki > 0.
Since dni = 0 for that i, the binomial coefficient (dni choose dki) is zero and hence the product is also zero. For the inductive case, we assume Lucas's theorem is true for all n with at most x − 1 p-adic digits and prove it is true for all n with x p-adic digits. Since we already covered the case with n < k, we assume k has at most x p-adic digits. If Lucas's theorem is true, then (n choose k) ≡ (dnx…dn2 bp choose dkx…dk2 bp)(dn1 choose dk1) mod p and we can assume the inductive hypothesis for both factors. We can express all terms explicitly as follows: dnx…dn2 bp = [n/p], dn1 = n − p[n/p], dkx…dk2 bp = [k/p], and dk1 = k − p[k/p]. For these reasons, the objective of our proof of Lucas's theorem, formally stated, is to show that (n choose k) ≡ ([n/p] choose [k/p])(dn1 choose dk1) mod p. To simplify matters, we set [n/p] = n′ and [k/p] = k′ so that (n choose k) = (n′p + dn1 choose k′p + dk1).

If n′ < k′ or dn1 < dk1, then (n′p + dn1 choose k′p + dk1) ≡ (n′ choose k′)(dn1 choose dk1) mod p. Proof: If n′ < k′ or dn1 < dk1, then (n′ choose k′)(dn1 choose dk1) = 0. So, we need to show that p ∣ (n′p + dn1 choose k′p + dk1) in either event. If n′ < k′, then regardless of the values of the remainders, n′p + dn1 < k′p + dk1. Hence, (n′p + dn1 choose k′p + dk1) = 0. If dn1 < dk1 and n′ = k′, then n′p + dn1 < k′p + dk1. Hence, (n′p + dn1 choose k′p + dk1) = 0. If dn1 < dk1 and n′ > k′, then regardless of the values of the remainders, n′p + dn1 > k′p + dk1. Hence, (n′p + dn1 choose k′p + dk1) > 0. To show that p ∣ (n′p + dn1 choose k′p + dk1), we use de Polignac's formula. The greatest integer exponent, e, such that p^e ∣ (n′p + dn1 choose k′p + dk1) is ∑_{i≥1} {[(n′p + dn1)/p^i] − [(k′p + dk1)/p^i] − [(n′p − k′p + dn1 − dk1)/p^i]}. The first term in the sum is n′ − k′ − [n′ − k′ − (dk1 − dn1)/p] = −[−(dk1 − dn1)/p]. Since 0 ≤ dn1 < dk1 < p, 0 < (dk1 − dn1)/p < 1 and hence −[−(dk1 − dn1)/p] = 1. We demonstrated in Section 2.1 that every term in the sum is non-negative. So, e ≥ 1 and hence p ∣ (n′p + dn1 choose k′p + dk1). Therefore, if n′ < k′ or dn1 < dk1, then Lucas's theorem holds. If n′ ≥ k′, dn1 ≥ dk1, and p ∣ (n′ choose k′), then (n′p + dn1 choose k′p + dk1) ≡ (n′ choose k′)(dn1 choose dk1) mod p. Proof: If n′ ≥ k′ and dn1 ≥ dk1, then we can express all three binomial coefficients in terms of factorials. That is, (n′p + dn1 choose k′p + dk1) = (n′p + dn1)!/{(k′p + dk1)!((n′ − k′)p + dn1 − dk1)!} and (n′ choose k′)(dn1 choose dk1) = n′!dn1!/{k′!(n′ − k′)!dk1!(dn1 − dk1)!}. Since (n′p + dn1 choose k′p + dk1) ≡ (n′ choose k′)(dn1 choose dk1) mod p iff p ∣ {(n′p + dn1 choose k′p + dk1) − (n′ choose k′)(dn1 choose dk1)}, we want to show that p divides (n′p + dn1)!/{(k′p + dk1)!((n′ − k′)p + dn1 − dk1)!} − n′!dn1!/{k′!(n′ − k′)!dk1!(dn1 − dk1)!}. Cross-multiplying yields: {(n′p + dn1)!k′!(n′ − k′)!dk1!(dn1 − dk1)! − (k′p + dk1)!((n′ − k′)p + dn1 − dk1)!n′!dn1!} / {(k′p + dk1)!((n′ − k′)p + dn1 − dk1)!k′!(n′ − k′)!dk1!(dn1 − dk1)!}.
We can simplify this expression by factoring all powers of p out of both terms in the numerator and in the denominator. That is, (n′p + dn1)!k′!(n′ − k′)!dk1!(dn1 − dk1)! = n′!p^{n′}((n′p + dn1)!)p k′!(n′ − k′)!dk1!(dn1 − dk1)! = n′!k′!(n′ − k′)!p^{n′}((n′p + dn1)!)p dk1!(dn1 − dk1)!; (k′p + dk1)!((n′ − k′)p + dn1 − dk1)!n′!dn1! = k′!p^{k′}((k′p + dk1)!)p (n′ − k′)!p^{n′−k′}(((n′ − k′)p + dn1 − dk1)!)p n′!dn1! = n′!k′!(n′ − k′)!p^{n′}((k′p + dk1)!)p (((n′ − k′)p + dn1 − dk1)!)p dn1!; and (k′p + dk1)!((n′ − k′)p + dn1 − dk1)!k′!(n′ − k′)!dk1!(dn1 − dk1)! = k′!p^{k′}((k′p + dk1)!)p (n′ − k′)!p^{n′−k′}(((n′ − k′)p + dn1 − dk1)!)p k′!(n′ − k′)!dk1!(dn1 − dk1)! = k′!²(n′ − k′)!²p^{n′}((k′p + dk1)!)p (((n′ − k′)p + dn1 − dk1)!)p dk1!(dn1 − dk1)!. We factored n′!k′!(n′ − k′)!p^{n′} out of each term in the numerator and k′!²(n′ − k′)!²p^{n′} out of the denominator. By the definition of (n!)p and the fact that 0 ≤ dk1 ≤ dn1 < p, no other factor of p can exist in either term in the numerator or in the denominator. Since {n′!k′!(n′ − k′)!p^{n′}}/{k′!²(n′ − k′)!²p^{n′}} = (n′ choose k′), (n′p + dn1 choose k′p + dk1) − (n′ choose k′)(dn1 choose dk1) simplifies to (n′ choose k′)·{((n′p + dn1)!)p dk1!(dn1 − dk1)! − ((k′p + dk1)!)p (((n′ − k′)p + dn1 − dk1)!)p dn1!} / {((k′p + dk1)!)p (((n′ − k′)p + dn1 − dk1)!)p dk1!(dn1 − dk1)!}. Since p ∣ (n′ choose k′) and no powers of p remain in the denominator, p ∣ {(n′p + dn1 choose k′p + dk1) − (n′ choose k′)(dn1 choose dk1)}. Therefore, (n′p + dn1 choose k′p + dk1) ≡ (n′ choose k′)(dn1 choose dk1) mod p if n′ ≥ k′, dn1 ≥ dk1, and p ∣ (n′ choose k′). If n′ ≥ k′, dn1 ≥ dk1, and p ∤ (n′ choose k′), then (n′p + dn1 choose k′p + dk1) ≡ (n′ choose k′)(dn1 choose dk1) mod p. Proof: Again, we must show that p ∣ {(n′p + dn1 choose k′p + dk1) − (n′ choose k′)(dn1 choose dk1)} by showing that the fraction above is divisible by p. Since p ∤ (n′ choose k′) and there are no powers of p left in the denominator, p ∣ {(n′p + dn1 choose k′p + dk1) − (n′ choose k′)(dn1 choose dk1)} iff p ∣ {((n′p + dn1)!)p dk1!(dn1 − dk1)! − ((k′p + dk1)!)p (((n′ − k′)p + dn1 − dk1)!)p dn1!}.
Since there are no powers of p left in either term, we must show that their difference is divisible by p. In other words, we must show that ((n′p + dn1)!)p dk1!(dn1 − dk1)! ≡ ((k′p + dk1)!)p (((n′ − k′)p + dn1 − dk1)!)p dn1! mod p, and we can do so using Anton's congruence. Since (n!)p ≡ (−1)^{[n/p]}(n − p[n/p])! mod p, ((n′p + dn1)!)p dk1!(dn1 − dk1)! ≡ (−1)^{n′}dn1!dk1!(dn1 − dk1)! mod p and ((k′p + dk1)!)p (((n′ − k′)p + dn1 − dk1)!)p dn1! ≡ (−1)^{k′}dk1!(−1)^{n′−k′}(dn1 − dk1)!dn1! ≡ (−1)^{n′}dn1!dk1!(dn1 − dk1)! mod p. So, p ∣ {(n′p + dn1 choose k′p + dk1) − (n′ choose k′)(dn1 choose dk1)}. Therefore, (n′p + dn1 choose k′p + dk1) ≡ (n′ choose k′)(dn1 choose dk1) mod p if n′ ≥ k′, dn1 ≥ dk1, and p ∤ (n′ choose k′). Lucas's theorem is true for all n, k ∈ ℕ. Proof: We proved that (n choose k) ≡ ([n/p] choose [k/p])(dn1 choose dk1) mod p for all non-negative integers n and k across the three cases above. Since [n/p] and [k/p] are each at most (x − 1)-digit numbers in base p, we can assume ([n/p] choose [k/p]) ≡ ∏_{i=2}^{x} (dni choose dki) mod p by the inductive hypothesis. Since (dni choose 0) = 1, we do not need to include binomial coefficients in the product past the leftmost non-zero p-adic digit of k, which is dk,[log(k)/log(p)]+1. Q.E.D. Example 3: a. How many times does 7 divide 50!? b. Evaluate (97 choose 33) mod 7. c. Simplify (n choose p) mod p.

Solution: a. In this example, we set n = 50 and p = 7. Then, we evaluate de Polignac's formula. So, e = (50 − ∑_{i=1}^{[log(50)/log(7)]+1} d50,i b7)/(7 − 1) = (50 − ∑_{i=1}^{3} {[50/7^{i−1}] − 7[50/7^i]})/(7 − 1) = (50 − 1 − 0 − 1)/(7 − 1) = 8, which confirms our answer from Example 1 in Section 2.1. b. In this example, we set n = 97, k = 33, and p = 7. Then, we evaluate the expression given by Lucas's theorem, (97 choose 33) ≡ ∏_{i=1}^{[log(33)/log(7)]+1} (d97,i choose d33,i) ≡ ∏_{i=1}^{2} ([97/7^{i−1}] − 7[97/7^i] choose [33/7^{i−1}] − 7[33/7^i]) mod 7 = (6 choose 5)(6 choose 4) mod 7 = 6. c. Since the p-adic representation of p is 10 bp and the ith p-adic digit of n is dni, (n choose p) ≡ (dn2 choose 1) mod p. And dn2 = [n/p^1] − p[n/p^2] ≡ [n/p] mod p, so (n choose p) ≡ [n/p] mod p.
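Lucas's theorem lends itself to a short digit-by-digit implementation; this Python sketch (the function name is mine) checks Example 3b and 3c:

```python
from math import comb

# Reduce (n choose k) mod p one p-adic digit at a time, per Lucas's
# theorem; comb(a, b) is 0 when a < b, which handles the zero cases.
def lucas(n, k, p):
    result = 1
    while k:
        result = result * comb(n % p, k % p) % p
        n //= p
        k //= p
    return result

assert lucas(97, 33, 7) == 6            # Example 3b
assert comb(97, 33) % 7 == 6            # direct check
# Example 3c: (n choose p) ≡ [n/p] mod p, spot-checked
for n in (7, 20, 97):
    assert comb(n, 7) % 7 == (n // 7) % 7
```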

Converting a fraction, 0 < x < 1, from base ten to base m requires a rapid iterative process for finding each digit of x in base m. If x is a fraction, then it is better to use an iterative process for finding each successive digit of x than a closed-form formula because the digital expansion of a fraction in base m is often non-terminating. In addition, if x ∈ ℚ, then an iterative process can preserve the fraction part after each digit, which is necessary to correct for round-off error and to determine whether the expansion of x in base m terminates or repeats. If the fraction part complement to any of the digits does not exist or matches the fraction part complement to a preceding digit, then the fraction's digital expansion in base m terminates or repeats, respectively. Proof: If the fraction part complement to di is zero, then all successive digits are zero. Since the fraction part from the previous iteration is the independent variable in the next iteration, if the fraction part of the output of an iteration is the same as that of a preceding iteration, then the input is the same. Since the iterations are functions of x, if the inputs are the same, then the outputs are the same. Essentially, the digits repeat. Let 0 < x < 1, x = 0.d1d2...didi+1...bm, x = x0, and xi = di.di+1...bm. Then, [xi] = di and xi+1 = m(xi − [xi]). Moreover, if x = c/d with c, d ∈ ℕ, then the fraction part of each xi in base m can be expressed as ki/d for some ki between 0 and d − 1. We can prove this statement using induction on i (provided i, m ∈ ℕ and m ≥ 2) in the same way we did for the decimal expansion of x in Section 3.1. Using this iterative process on the TI-83, we can generate each successive digit of x just by pressing the ENTER key. I carefully chose this expression for the iterative process and its index so that [xi] = di. First, type the numerical value of x in base ten into the calculator and press enter.
If x is irrational or precision is not an issue, then type m(Ans−int(Ans)) next. On the other hand, if x = c/d with c, d ∈ ℕ and you want to generate many digits and possibly even determine where the fraction's digital expansion in base m repeats, then you might need to correct for round-off error. Since the fraction part of each xi can be expressed as ki/d for some ki between 0 and d − 1, you can correct for round-off error by substituting [dxi + ½]/d for xi. Since we must substitute a longer expression for xi and xi appears twice in the iteration, we express the iterations more concisely as xi+1 = m·fPart(xi). So, we type mfPart(int(dAns+.5)/d) next. At this point, we can generate each digit of x in base m just by pressing enter, where each digit di in base m is simply the greatest integer of the output of the ith iteration. Example 4: Convert each fraction to base 8 exactly: a. 0.715625 b. 0.5009765625. Solution: First, we type each fraction into the TI-83 and press enter. Since both decimal expansions terminate in base ten, which is the base we start from and perform the arithmetic in, we do not need to correct for round-off error. So, we type 8(Ans−int(Ans)) next. Then, we press enter until the fraction part of an output is either zero or matches the fraction part of a previous output, meaning that the fraction parts, and hence the digits, repeat. Our outputs truncated to four significant figures are listed in the table below. For part a, notice that the second and the sixth iterations share the same fraction part. Therefore, the third and the seventh iterations must be the same, meaning that the digits repeat from d3 through d6. So, the base-8 expansion of 0.715625 has the four repeating digits 6314. For part b, notice that the fourth iteration has no fraction part and thus the fifth and sixth iterations are zero. This means that the expansion terminates after four places. So, 0.715625 = 0.556314b8 with the block 6314 repeating, and 0.5009765625 = 0.4004b8.

iteration   1      2      3      4      5      6
Output a    5.725  5.800  6.400  3.200  1.600  4.800
Output b    4.008  0.063  0.500  4.000  0      0
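The TI-83 iteration above can be mimicked exactly in Python; this sketch (the function name is mine) uses exact rationals, so no round-off correction is needed:

```python
from fractions import Fraction

# The digit iteration x_{i+1} = m*(x_i - [x_i]) from the text, run with
# exact rational arithmetic so every fraction part is preserved.
def digits_base_m(x, m, count):
    """First `count` digits of the fraction 0 < x < 1 in base m."""
    digits, xi = [], Fraction(x) * m          # x_1 = m * x_0
    for _ in range(count):
        digits.append(int(xi))                # [x_i] = d_i
        xi = m * (xi - int(xi))               # x_{i+1} = m * fPart(x_i)
    return digits

# Example 4a: 0.715625 = 0.55(6314)(6314)... in base 8
assert digits_base_m(Fraction(715625, 10**6), 8, 10) == [5, 5, 6, 3, 1, 4, 6, 3, 1, 4]
# Example 4b: 0.5009765625 = 0.4004 in base 8 (terminates)
assert digits_base_m(Fraction(5009765625, 10**10), 8, 6) == [4, 0, 0, 4, 0, 0]
```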

Let xℕ be the n-digit number dn...di...d1bm where m > 10 and not all di < 10. For digits above 9, the convention is to set A = 10, B = 11, etc. For example, 707 = 4ABb12. Example 5: Convert 10/51 to base 12 exactly.

Solution: First, we type 10/51 into the TI-83 and press enter. Since we are working in base 10, not base 51, we need to correct for round-off error. So, we type 12fPart(int(51Ans+.5)/51) next. Then, we press enter until the fraction parts repeat. Our outputs truncated to four significant figures are listed in the table below. To prove that the fraction parts repeat after the seventeenth iteration, we also recorded ki, which is 51 times the fraction part of each output. Since k1 = k17, the outputs for the second and the eighteenth iterations must be the same, meaning that the digits repeat from d2 through d17. So, 10/51 = 0.2429A708579214B36...b12, where the sixteen digits 429A708579214B36 repeat.

iteration   1      2      3      4      5      6      7      8      9      10     11     12     13     14     15     16     17
output      2.352  4.235  2.823  9.882  10.58  7.058  0.705  8.470  5.647  7.764  9.176  2.117  1.411  4.941  11.29  3.529  6.352
[output]    2      4      2      9      10     7      0      8      5      7      9      2      1      4      11     3      6
ki          18     12     42     45     30     3      36     24     33     39     9      6      21     48     15     27     18
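Tracking ki directly, as in the bottom row of the table, also detects the repeat automatically; a sketch (the function and variable names are mine):

```python
# Generate the base-m digits of c/d while tracking the numerator
# k_i of each fraction part; a repeated k_i marks where the digits
# begin to repeat, and k_i = 0 marks a terminating expansion.
def expand(c, d, m):
    """Digits of c/d (0 < c/d < 1) in base m, plus where they repeat."""
    seen, digits, k = {}, [], c
    while k and k not in seen:
        seen[k] = len(digits)          # position at which this k_i appeared
        k *= m
        digits.append(k // d)          # next base-m digit
        k %= d                         # numerator of the new fraction part
    start = seen.get(k)                # None if the expansion terminates
    return digits, start

digits, start = expand(10, 51, 12)
assert digits[:2] == [2, 4]            # 0.24...b12, matching the table
assert start == 1 and len(digits) - start == 16   # period 16 from d2 on
```

The returned `start` is the zero-based index of the first repeated digit, so `start == 1` says the repetition begins at d2, exactly as argued from k1 = k17 above.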

We conclude this section by demonstrating how to use change of base to express a double sum as a single sum and to express digits in a mixed-base system as a single number in decimal notation. Let s = ∑_{r=0}^{kr−1} ∑_{i=0}^{ki−1} f(i, r) and let x be the number whose last base-ki digit is d1 and whose remaining digits represent r. If we set each i = d1 = x − ki[x/ki] and each r = [x/ki], then s = ∑_{x=0}^{kikr−1} f(x). Proof: First, we evaluate the double sum by freezing the outermost index, r, and evaluating ∑_{i=0}^{ki−1} f(i, r) for a fixed r for every integer value of r between 0 and kr − 1 inclusive. Next, we sum ∑_{i=0}^{ki−1} f(i, r) over every fixed value of r between 0 and kr − 1 inclusive. Let n and x be integers such that 0 ≤ n ≤ kr − 1 and x ≥ 0. By setting r = [x/ki], we get r = n over the interval [nki, nki + ki − 1]. Hence, ki[x/ki] = nki over this interval. Since x ranges over [nki, nki + ki − 1], i ranges over [0, ki − 1] when r = n. Moreover, x can range over [0·ki, (kr − 1)ki + ki − 1], which is [0, kikr − 1]. Hence, s = ∑_{x=0}^{kikr−1} f(x). Alternatively, if s = ∑_{r=1}^{kr} ∑_{i=1}^{ki} f(i, r) and we set i = x − ki[(x − 1)/ki] and r = [(x − 1)/ki] + 1, then s = ∑_{x=1}^{kikr} f(x). If at most one of the sums is infinite and the two index ranges are independent, then we can still express the double sum as a single sum by rearranging the indices so that the infinite sum is the outermost sum. At this point, you might wonder if there is a similar approach for expressing triple or quadruple finite sums as a single sum. It turns out that we can express an nth-order finite sum as a single sum by expressing the indices as digits in a mixed-base system. Our last type of change of base operation involves expressing digits in a mixed-base system with decimal digits. For example, if the time is 11:00, the hours are in base twelve, but the minutes are in base sixty. Nevertheless, each digit is itself expressed as two digits in base ten.
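The double-sum flattening above can be checked numerically; a quick sketch with an arbitrary test function:

```python
# With i = x - ki*[x/ki] and r = [x/ki], the double sum over
# 0 <= i < ki, 0 <= r < kr equals a single sum over 0 <= x < ki*kr.
def f(i, r):                 # any test function will do here
    return 3 * i * i + 5 * r + 7

ki, kr = 6, 9
double = sum(f(i, r) for r in range(kr) for i in range(ki))
single = sum(f(x - ki * (x // ki), x // ki) for x in range(ki * kr))
assert double == single
```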
We typically encounter these situations when expressing non-metric weights and measures in terms of their various units and subunits. Our objective is to convert from one system to another or from units to subunits and then to express each unit and subunit with decimal digits. Our objective can be accomplished within a few short steps. First, consider whether the quantity, x, you are converting into another base is a fraction, a natural number, or both (a mixed number). Next, express each digit in terms of x by using an iterative process and then substituting expressions for previous digits in terms of x. Remember that each digit could have a different base. Then, working from right to left, determine the maximum number of decimal digits it takes to express each digit of the mixed base, and multiply each digit by 10 raised to the cumulative sum of the maximum numbers of decimal digits of the digits to its right. Finally, add the products and simplify to get an expression for a single number in decimal notation. Example 6: Military time has no colons and is expressed as a 4-digit number between 0000 and 2359. a. What is the military time when the day is 20.85% over? b. Let x, 0 < x < 1, denote the proportion of the day that has passed. Express the military time, m, in terms of x. Solution: Since we are using military time to divide up a day, we are converting a fraction, x, from decimal to the mixed base 0.d2b24 d1b60. a. Since 0.2085·24 = 5.004, at least 5 hours have passed. Since 0.004·60 = 0.24, less than one additional minute has passed. So, the military time is 0500, or o-five-hundred hours. b. We set d2 = [24x] and d1 = [60(24x − [24x])] = [1440x] − 60[24x]. Since d1 is the rightmost digit, we multiply it by 10^0, and since base sixty requires two decimal digits, we multiply d2 by 10^2. Summing the products and simplifying, we get 100d2 + d1 = 100[24x] + [1440x] − 60[24x] = 40[24x] + [1440x]. So, if x days have passed, m = 40[24x] + [1440x].
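Example 6b reduces to one line of integer arithmetic; a small sketch (the function name is mine):

```python
# Military time from the fraction x of the day elapsed:
# m = 40*[24x] + [1440x], which equals 100*hours + minutes.
def military_time(x):
    h = int(24 * x)                      # [24x] whole hours
    return 40 * h + int(1440 * x)        # = 100*h + (minutes past the hour)

assert military_time(0.2085) == 500      # part a: 0500 hours
assert military_time(0.5) == 1200        # noon
assert military_time(0.999) == 2358      # 23:58
```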

CHAPTER 4: The Division Algorithm and Reciprocity

This chapter discusses a variety of interesting topics in computational number theory. Section 4.1 introduces greatest common divisors and continued-fraction expansions and discusses many of their properties. We also introduce the concept of reciprocity and use it to create the Euclidean and Euclidean-like algorithms. Section 4.2 provides a comprehensive treatment of quadratic reciprocity. Section 4.3 provides a comprehensive treatment of Dedekind reciprocity. The common thread among these constructs is that their properties allow for polynomial-time computation using an iterative process of division with remainder.

Section 4.1: Reciprocal Subtraction, the Euclidean Approach

We continue our investigation of the gcd right where we left off. If you have not read through Section 2.1, then do so before reading on. In Section 2.1, we discovered a property of the gcd that helps us compute it for large arguments. Assume a < c. Since (a, c) = (a, c − ak), our first objective lies in reducing the value of c below the value of a but not below zero. We want c = k1a + r1 with 0 ≤ r1 ≤ a − 1. Then, k1 = [c/a] and d = (a, r1). Similarly, we can reduce the value of a below the value of r1 so that a = k2r1 + r2 with 0 ≤ r2 ≤ r1 − 1. Then, k2 = [a/r1] and d = (r1, r2). And we can reduce the value of ri−1 below the value of ri so that ri−1 = ki+1ri + ri+1 with 0 ≤ ri+1 ≤ ri − 1. Then, ki+1 = [ri−1/ri] and d = (ri, ri+1). On average, the iterations reduce the value of the last remainder by half. And after enough iterations, the remainders get so small that one remainder must divide the other remainder evenly, meaning the smaller of the two remainders is (a, c). Over the centuries, many mathematicians estimated upper boundaries for the number of iterations required to compute (a, c) using the Euclidean algorithm. Although not the first, in 1844, the French mathematician Gabriel Lamé used the Fibonacci sequence to express the upper bound in terms of a. If a < c, then the number of steps required to compute (a, c) using the Euclidean algorithm is less than 5log(a) + 1. Proof: The i+1st step in the Euclidean algorithm is of the form ri−1 = ki+1ri+ri+1 with 0 ≤ ri+1 < ri < ri−1. So, ki > 0 for all i. So that we reduce the remainders the least (and generate an example for our upper bound), we must find a pair of natural numbers, (a, c), such that ki = 1 for all i except for the last step, rn−2 = knrn−1. The Fibonacci sequence is defined by the recurrence relation given by F1 = 1, F2 = 1, and Fn = Fn−1 + Fn−2. Let a = Fn+1 and c = Fn+2. 
Then, ki = 1 for every step except the last, where F3 = 2F2, and hence it takes n steps to compute (a, c) using the Euclidean algorithm. We denote the golden ratio by φ = (√5 + 1)/2. Using induction on n and the fact that φ² = φ + 1, we can prove that if n ≥ 2, then Fn+1 > φ^(n−1). Since log(φ) > ⅕, log(a) > (n − 1)log(φ) > (n − 1)/5. Therefore, 5log(a) + 1 > n. The Euclidean algorithm has a very rich history. It is also the world's oldest algorithm. It was first recorded by Euclid, an ancient Greek mathematics teacher in Alexandria, in Book VII of The Elements around 300 BC, although it was probably discovered nearly a century earlier by the Pythagoreans. Since the division algorithm actually determines the remainders using subtraction, not division, by subtracting a from c until c − ak < a, his contemporaries called it 'antanairesis', or 'reciprocal subtraction'. Example 1: Compute gcd(784, 378). Solution: Using the division algorithm, 784 = 378[784/378] + 28; 378 = 28[378/28] + 14; and 28 = 14[28/14] + 0, meaning 14 ∣ 28. So, gcd(784, 378) = 14. Since the TI-83 has a gcd( operation that computes the gcd of any two non-negative integers, we can use it as an additional tool to verify our result. If d = gcd(a, c), then d = ax + cy for some x, y ∈ ℤ. (Bezout's lemma) Before starting the proof, we first define an ideal. For our purposes, an ideal is a subset, I, of the integers such that i. 0 ∈ I; ii. if a, c ∈ I, then a − c ∈ I; and iii. if c ∈ I and k ∈ ℤ, then kc ∈ I. In other words, if two elements are in I, their difference is in I, and for every element in I, its integer multiples are also in I. Moreover, if x, y ∈ ℤ, then ax + cy ∈ I by iii and ii, meaning every element of the form ax + cy for some x, y ∈ ℤ is in I. So, we must prove that if a, c ∈ I, then d ∈ I. Proof: It suffices to show that each iteration of the Euclidean algorithm, that is, each remainder ri ∈ I, because d is exactly one of the remainders. We prove this using induction on i.
For the base case, we set i = 1, where c = k1a + r1; hence c − k1a = r1. By iii, k1a ∈ I and hence, by ii, c − k1a ∈ I. So, r1 ∈ I. For the inductive case, we can assume r1,...,ri ∈ I and must prove ri+1 ∈ I. The (i+1)st iteration of the Euclidean algorithm is ri−1 = ki+1ri + ri+1; hence ri−1 − ki+1ri = ri+1. By iii, ki+1ri ∈ I and hence, by ii, ri−1 − ki+1ri ∈ I. So ri+1 ∈ I and hence d ∈ I. Q.E.D.
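The proof suggests carrying the Bezout coefficients along with the remainders; this standard "extended" form of the algorithm is sketched below (the variable names are mine):

```python
# The Euclidean algorithm with the Bezout coefficients carried along:
# at every step the invariants a = a0*x0 + c0*y0 and c = a0*x1 + c0*y1
# hold, so the final nonzero remainder comes with its certificate.
def egcd(a, c):
    """Return (d, x, y) with d = gcd(a, c) = a*x + c*y."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while c:
        q = a // c                       # k_{i+1} = [r_{i-1}/r_i]
        a, c = c, a - q * c              # division with remainder
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

d, x, y = egcd(784, 378)
assert d == 14 and 784 * x + 378 * y == 14   # Example 1, with certificate
```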

Several important corollaries follow from Bezout's lemma. 1) If b = kd, then b ∈ I and hence b = ax + cy for some x, y ∈ ℤ. So, if d = 1, then b = ax + cy for all b ∈ ℤ and some x, y ∈ ℤ. Equivalently, if d = 1, then there is an x modulo c such that ax ≡ b mod c. 2) Suppose a, c ∈ ℕ, (a, c) = d, and y = (ax + b)/c. Then, y = (ax + b)/c has solutions, (x, y), in integers iff d ∣ b. And if (x′, y′) is an integer solution to y = (ax + b)/c, then all integer solutions to y = (ax + b)/c are of the form (x′ + ck/d, y′ + ak/d). 3) From this we can establish a stronger relationship between the greatest common divisor and the greatest integer function. It follows that if [(ax + b)/c] = [(ax + b′)/c] for all x ∈ ℤ, then [b/d] = [b′/d]. The details are left as an exercise. 4) We can also use Bezout's lemma to derive Euclid's lemma and its generalization. 5) And we can use the generalization of Euclid's lemma to derive the Chinese remainder theorem, which states that if (n, m) = 1, then the simultaneous congruence, x ≡ i mod n and x ≡ j mod m, has a unique solution of the form x ≡ k mod nm.
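Corollary 5 can be made concrete; a sketch of the Chinese remainder theorem for coprime moduli, using Python's built-in modular inverse (pow with exponent −1, available since Python 3.8):

```python
# Solve x ≡ i (mod n), x ≡ j (mod m) for coprime n, m.  The inverse of
# n modulo m exists by corollary 1, since (n, m) = 1.
def crt(i, n, j, m):
    """The unique x in [0, n*m) with x ≡ i (mod n) and x ≡ j (mod m)."""
    n_inv = pow(n, -1, m)                # modular inverse of n mod m
    return (i + n * ((j - i) * n_inv % m)) % (n * m)

x = crt(2, 5, 3, 7)
assert x % 5 == 2 and x % 7 == 3 and 0 <= x < 35   # x = 17
```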

With modified definitions for certain cases, the concept of the gcd can be extended to any Euclidean ring. A commutative ring is a set of elements, S, such that for all x, y, z ∈ S, x + y = y + x, xy = yx, x(yz) = (xy)z, x(y + z) = xy + xz, 1 ∈ S, and 1·x = x. A domain satisfies the additional property that if zx = zy and z ≠ 0, then x = y. In other words, if x, y ≠ 0, then xy ≠ 0. And a field satisfies the additional property that if z ≠ 0, then z has a multiplicative inverse in S. A Euclidean ring is a domain that is not a field and that has a degree function, |k|, for every element k in the ring. The degree function is similar to the absolute value function. For all elements j and k in the ring, it satisfies the following two properties: if j, k ≠ 0, then 0 < |j|, |k| ≤ |jk|; and if j ≠ 0, then there are elements q and r, not necessarily unique, in the ring with k = qj + r such that 0 ≤ |r| < |j|. In other words, the division algorithm, and hence the Euclidean algorithm, can be extended to these rings. It is obvious that ℤ is a Euclidean ring. It turns out that the set of Gaussian integers (complex numbers of the form a + bi with a, b ∈ ℤ), denoted by ℤ[i], and the set of Eisenstein integers (complex numbers of the form a + bω where ω = −½ + √¾ i and a, b ∈ ℤ), denoted by ℤ[ω], are also Euclidean rings. We prove these results below. Before we can adequately define a greatest common divisor for complex integers, we must briefly review some basic definitions and properties of complex numbers. Let X = x + x′i and Y = y + y′i for x, x′, y, y′ ∈ ℝ. Then, the real part of Y is Re(Y) = y, the imaginary part of Y is Im(Y) = y′, and the conjugate of Y is Y̅ = y − y′i. Further, the product XY = (x + x′i)(y + y′i) = (xy − x′y′) + (xy′ + x′y)i and, if Y ≠ 0, the quotient X/Y = XY̅/(YY̅) = (x + x′i)(y − y′i)/((y + y′i)(y − y′i)) = (xy + x′y′)/(y² + y′²) + {(x′y − xy′)/(y² + y′²)}i. The absolute value, or modulus, of a complex number is its distance from the origin in the complex plane.
The square of the absolute value of a complex number is called its norm. It follows that the norm of Y is |Y|² = YY̅, the norm of a Gaussian integer, a + bi, is a² + b², and the norm of the Eisenstein integer, a + bω, is a² − ab + b². So, their norms are positive integers except when a and b are both zero. Some essential properties of moduli are |XY| = |X||Y|, X/Y = XY̅/|Y|² provided Y ≠ 0, and |X/Y| = |X|/|Y| provided Y ≠ 0. The proofs involve basic algebra and are left to the reader. The greatest integer of a complex number is defined as [x + yi] = [x] + [y]i. Right now, evaluate int(3.2 − 6.8i) on your graphing calculator. You should get 3 − 7i. However, we cannot say the same for the mathematical expressions for the greatest integer function in Section 1.2 when the input is a complex number. Indeed, even in the above instance, the greatest integer of complex numbers is defined in terms of the greatest integer of reals. We define a gcd of any two Gaussian integers, a, c ∈ ℤ[i], as a Gaussian integer, d, with the greatest norm such that d ∣ a and d ∣ c. If c, d ∈ ℤ[i], then d ∣ c iff c/d ∈ ℤ[i]. A unit in ℤ[i] is any of the four Gaussian integers that divide 1, which are 1, i, −1, and −i. Since the definition of the gcd for integers equates existence with uniqueness, it cannot be extended to the Gaussian integers because it would mean that there are four distinct values of the gcd. Then again, since the norm of a complex number is a non-negative real number, there is a greatest norm. So, we can define a gcd of two Gaussian integers as a common divisor with greatest norm. If a and c have no common factors, then (a, c) is a unit and a is said to be 'relatively prime' to c. In addition, if d = (a, c), then di, −d, and −di are also greatest common divisors of a and c. We define a gcd of any two Eisenstein integers, a, c ∈ ℤ[ω], as an Eisenstein integer, d, with the greatest norm such that d ∣ a and d ∣ c. If c, d ∈ ℤ[ω], then d ∣ c iff c/d ∈ ℤ[ω].
A unit in ℤ[ω] is any of the six Eisenstein integers that divide 1, which are ±1, ±ω, and ±ω². Since the definition of the gcd for integers equates existence with uniqueness, it cannot be extended to the Eisenstein integers because it would mean that there are six distinct values of the gcd. Then again, since the norm of a complex number is a non-negative real number, there is a greatest norm. So, we can define a gcd of two Eisenstein integers as a common divisor with greatest norm. If a and

c have no common factors, then (a, c) is a unit and a is said to be 'relatively prime' to c. In addition, if d = (a, c), then −d, dω, and dω² are also greatest common divisors of a and c. Example 2: Find a greatest common divisor of 289 − 278i and 378 + 238i. Solution: Even though one of the Gaussian integers may have a larger norm than the other, it makes no difference which one we reduce first, and so we arbitrarily choose 289 − 278i. Using the division algorithm, 289 − 278i = (378 + 238i)[(289 − 278i)/(378 + 238i)] + (51 + 100i); 378 + 238i = (51 + 100i)[(378 + 238i)/(51 + 100i)] + (−75 + 91i); 51 + 100i = (−75 + 91i)[(51 + 100i)/(−75 + 91i)] + (−40 + 25i); −75 + 91i = (−40 + 25i)[(−75 + 91i)/(−40 + 25i)] + (−20 + 1i); −40 + 25i = (−20 + 1i)[(−40 + 25i)/(−20 + 1i)] + (−2 − 17i); −20 + 1i = (−2 − 17i)[(−20 + 1i)/(−2 − 17i)] + (14 − 3i); −2 − 17i = (14 − 3i)[(−2 − 17i)/(14 − 3i)] + (4 + 11i); 14 − 3i = (4 + 11i)[(14 − 3i)/(4 + 11i)] + (−8 + 5i); 4 + 11i = (−8 + 5i)[(4 + 11i)/(−8 + 5i)] + (−6 − 5i); −8 + 5i = (−6 − 5i)[(−8 + 5i)/(−6 − 5i)] + (2 − 7i); −6 − 5i = (2 − 7i)[(−6 − 5i)/(2 − 7i)] + (1 − 3i); 2 − 7i = (1 − 3i)[(2 − 7i)/(1 − 3i)] + (3 + 0i); and 1 − 3i = (3 + 0i)[(1 − 3i)/(3 + 0i)] + 1. So, 289 − 278i and 378 + 238i are relatively prime. By following the steps of the Euclidean algorithm exactly, we achieved the desired result. Example 3: Find a greatest common divisor of 135 − 14ω and 155 + 34ω. Solution: Since the TI-83 can only do calculations on complex numbers of the form a + bi, inputs will be of the form x + y(−½ + √¾ i) and outputs will be of the form a + bi. It follows that we must be able to convert complex numbers of the form a + bi to x + yω before we can begin. If a + bi = x + yω, then x = a + b/√3 and y = 2b/√3. This adjustment may produce a small rounding error. This is usually not a problem for computing the quotient because we only want its greatest integer, i.e., x + yω ↦ [x] + [y]ω.
However, you should not take the greatest integer of the remainder term. Since the remainder term should be an Eisenstein integer, its fraction parts, if any, should either have a string of zeros or a string of nines so that x and y are very close to integers. If that is not the case, then review your work. If that is the case, then round x and y to the nearest integer. Now we are ready to begin. Aside from this adjustment, we use the same approach below to compute the gcd. 155 + 34ω = (135 − 14ω)[(155 + 34ω)/(135 − 14ω)] + (20 + 48ω); 135 − 14ω = (20 + 48ω)[(135 − 14ω)/(20 + 48ω)] + (3 + 18ω); 20 + 48ω = (3 + 18ω)[(20 + 48ω)/(3 + 18ω)] + (−4 − 3ω); 3 + 18ω = (−4 − 3ω)[(3 + 18ω)/(−4 − 3ω)] + (−2 − 2ω); −4 − 3ω = (−2 − 2ω)[(−4 − 3ω)/(−2 − 2ω)] + (−ω). So, 135 − 14ω and 155 + 34ω are relatively prime. Example 4: Find a greatest common divisor of 135 − 14i and 155 + 34i. Solution: Emboldened by our success, we use the same approach below to find the gcd. 155 + 34i = (135 − 14i)[(155 + 34i)/(135 − 14i)] + (20 + 48i); 135 − 14i = (20 + 48i)[(135 − 14i)/(20 + 48i)] + (−9 + 46i); 20 + 48i = (−9 + 46i)[(20 + 48i)/(−9 + 46i)] + (−26 + 39i); −9 + 46i = (−26 + 39i)[(−9 + 46i)/(−26 + 39i)] + (−48 + 20i); −26 + 39i = (−48 + 20i)[(−26 + 39i)/(−48 + 20i)] + (−46 − 9i); −48 + 20i = (−46 − 9i)[(−48 + 20i)/(−46 − 9i)] + (−39 − 26i); We terminate the algorithm here because the last three remainders are i times the first three. The remainders are not reducing. Rather, they are repeating after three iterations, off by a factor of i. What went wrong and what can we do? Since we demonstrated in Example 2 that we can use the division algorithm on Gaussian integers with negative real and imaginary parts, it suffices that we reduce the modulus of a Gaussian integer below the modulus of another. For this reason, we can round the real part of ri−1/ri to the nearest integer and we can do so by setting k = [ri−1/ri + ½]. Perhaps this will work. 
At this point, we pick up where we left off by redoing the last iteration. −48 + 20i = (−46 − 9i)[(−48 + 20i)/(−46 − 9i) + ½] + (7 − 17i); −46 − 9i = (7 − 17i)[(−46 − 9i)/(7 − 17i) + ½] + (5 + 12i); 7 − 17i = (5 + 12i)[(7 − 17i)/(5 + 12i) + ½] + 0. Therefore, 5 + 12i is a gcd of 135 − 14i and 155 + 34i. Although rounding the real part made our extension of the Euclidean algorithm to ℤ[i] efficient enough to compute the gcd in Example 4, we still do not know what went wrong. And we do not know if what went wrong can prevent us from computing a gcd in ℤ[ω] for a different pair of integers. Perhaps the proof that ℤ[i] is a

Euclidean ring can identify the problem. To ensure that we can make the necessary adjustments to extend the Euclidean algorithm from ℕ to ℤ[i] and ℤ[ω], we must prove that ℤ[i] and ℤ[ω] are Euclidean rings. The set of Gaussian integers is a domain and not a field. Proof: The proof that the Gaussian integers satisfy the axioms of a ring involves basic algebra and is left to the reader. Suppose a, b, c ∈ ℤ[i], ca = cb and c ≠ 0. We demonstrated above that if c ≠ 0, then c has a multiplicative inverse in ℂ. So, if ca = cb, then a = b. So, ℤ[i] is a domain. However, if c is not a unit in ℤ[i], then c⁻¹ ∉ ℤ[i]. As an example, 2 ∈ ℤ[i] and a multiplicative inverse of 2 is ½. Since ½ ∉ ℤ, ½ ∉ ℤ[i]. So, ℤ[i] is not a field. The modulus is a degree function in ℤ[i]. That is, for all k1, k2 ∈ ℤ[i], the modulus satisfies the following two properties. 1) If k1, k2 ≠ 0, then |k1| ≤ |k1k2|. 2) If k1 ≠ 0, then there exist q, r ∈ ℤ[i] with k2 = qk1 + r such that 0 ≤ |r| < |k1|. Proof: 1) Since |k1k2| = |k1||k2|, the result follows provided that |k2| ≥ 1. Since k2 ≠ 0 and k2 ∈ ℤ[i], its real part or imaginary part must be at least a unit. So, when the real and imaginary parts of k2 are squared and added using the formula |a + bi|² = a² + b², we find |k2| ≥ 1. 2) If k2 = qk1 + r and 0 ≤ |r| < |k1|, then k2/k1 = q + r/k1 and 0 ≤ |r|/|k1| = |r/k1| < |k1|/|k1| = 1. Since k2/k1 − q = r/k1, we can show that the inequality holds by finding a q ∈ ℤ[i] such that |k2/k1 − q| < 1, or equivalently |k2/k1 − q|² < 1. To see this geometrically, refer to the left diagram below. Let q = [k2/k1] so that k2/k1 − q = fx + fyi. Then, |k2/k1 − q|² = |fx + fyi|² = fx² + fy². Since 0 ≤ fx, fy < 1, 0 ≤ fx² + fy² < 2. Since the norm of r can be up to twice that of k1, it is no wonder we could not reduce the remainders in Example 4 by truncating the real and imaginary fraction parts of k2/k1. 
Example 4 also illustrates why the Euclidean algorithm in its present form cannot compute the gcd of any two Gaussian integers. So, we make adjustments. If q = [k2/k1 + ½] so that the real part of k2/k1 is rounded to the nearest integer, then k2/k1 − q = fx − [fx + ½] + fyi. Then, −½ ≤ fx − [fx + ½] < ½ and hence, |k2/k1 − q|² = |fx − [fx + ½] + fyi|² ≤ (½)² + 1² = 1¼. Although this is a marked improvement and was enough for us to complete Example 4, it may not always be enough. We can make a similar argument for rounding the imaginary part of k2/k1 to the nearest integer while truncating the real part. If q = [k2/k1 + (1 + i)/2] so that both the real and imaginary parts of k2/k1 are rounded to the nearest integer, then k2/k1 − q = fx − [fx + ½] + {fy − [fy + ½]}i. Then, −½ ≤ fx − [fx + ½], fy − [fy + ½] < ½ and hence, |k2/k1 − q|² = |fx − [fx + ½] + {fy − [fy + ½]}i|² ≤ (½)² + (½)² = ½ < 1. So, by rounding both the real and imaginary parts of k2/k1 to the nearest integer, we guarantee |r| < |k1|. Therefore, ℤ[i] is a Euclidean ring and the optimal value for our quotient is q = [k2/k1 + (1 + i)/2]. On the left are lattice-point diagrams in the complex plane for x + yi where 0 ≤ x, y ≤ 2. In each lattice-point diagram, the red dot represents k2/k1 and one of the four black dots intersecting the gridlines that enclose k2/k1 represents q.
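The rounded quotient q = [k2/k1 + (1 + i)/2] is easy to automate. Below is a minimal Python sketch (our own translation of the TI-83 procedure; the function name gauss_gcd is ours), using the fact that rounding both parts of the quotient keeps every remainder strictly smaller than the divisor:

```python
def gauss_gcd(a, b):
    """Euclidean algorithm in Z[i]. Rounding BOTH parts of the
    quotient (q = [a/b + (1+i)/2]) keeps every remainder norm
    at most half the divisor norm, so the loop terminates."""
    while b != 0:
        q = a / b
        # Python's round() sends halves to the nearest even integer,
        # which still leaves fraction parts of magnitude <= 1/2.
        q = complex(round(q.real), round(q.imag))
        a, b = b, a - q * b
    return a

# Example 4: a gcd of 135 - 14i and 155 + 34i is 5 + 12i (up to a
# unit factor), so the result must have norm 169.
g = gauss_gcd(135 - 14j, 155 + 34j)
print(round(abs(g) ** 2))  # → 169
```

Because the algorithm only determines a gcd up to the units ±1 and ±i, the sketch checks the norm of the result rather than the result itself.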

The set of Eisenstein integers is a domain and not a field. Proof: The proof that the Eisenstein integers satisfy the axioms of a ring involves basic algebra and is left to the reader. Suppose a, b, c ∈ ℤ[ω], ca = cb and c ≠ 0. We demonstrated above that if c ≠ 0, then c has a multiplicative inverse in ℂ. So, if ca = cb, then a = b. So, ℤ[ω] is a domain. However, if c is not a unit in ℤ[ω], then c⁻¹ ∉ ℤ[ω]. As an example, 2 ∈ ℤ[ω] and a multiplicative inverse of 2 is ½. Since ½ is real, ½ = a + bω with b = 0 and a = ½. So, ½ ∉ ℤ[ω] and therefore ℤ[ω] is not a field. The modulus is a degree function in ℤ[ω]. That is, for all k1, k2 ∈ ℤ[ω], the modulus satisfies the following two properties. 1) If k1, k2 ≠ 0, then |k1| ≤ |k1k2|. 2) If k1 ≠ 0, then there exist q, r ∈ ℤ[ω] with k2 = qk1 + r such that 0 ≤ |r| < |k1|. Proof: 1) Since |k1k2| = |k1||k2|, the result follows provided that |k2| ≥ 1. Since k2 ≠ 0 and k2 ∈ ℤ[ω], its coordinates a and b cannot both be zero. So, when they are combined using the formula |a + bω|² = a² − ab + b², we find |k2| ≥ 1. 2) If k2 = qk1 + r and 0 ≤ |r| < |k1|, then k2/k1 = q + r/k1 and 0 ≤ |r|/|k1| = |r/k1| < |k1|/|k1| = 1. Since k2/k1 − q = r/k1, we can show that the inequality holds by finding a q ∈ ℤ[ω] such that |k2/k1 − q| < 1, or equivalently |k2/k1 − q|² < 1. To see this geometrically, refer to the right diagram above. Let q = [k2/k1] so that k2/k1 − q = fx + fyω. Then, |k2/k1 − q|² = |fx + fyω|² = fx² − fxfy + fy². Since 0 ≤ fx, fy < 1, 0 ≤ fx² − fxfy + fy² < 1. This guarantees |r| < |k1|. Therefore, ℤ[ω] is a Euclidean ring and we can continue using the quotient q = [k2/k1]. At this point you may be curious why setting q = [k2/k1] works for ℤ[ω] but not for ℤ[i]. One reason is that taking the greatest integer of x + yω, i.e. x + yω → [x] + [y]ω, does not necessarily reduce its real fraction part. 
This is because the real part of x + yω is actually x − y/2 and hence the real part of [x] + [y]ω is

[x] − [y]/2. If x, y ∉ ℤ, then the decrease from x to [x] is somewhat offset by the increase from −y/2 to −[y]/2. The other reason is that taking the greatest integer of x + yω only reduces its imaginary part by up to √¾, compared with the transform a + bi → [a] + [b]i, which reduces its imaginary part by up to one. It follows that we actually did make an adjustment for q. We defined the greatest integer of a + bi not as a + bi → [a] + [b]i, but as a + bi → x + yω → [x] + [y]ω, or equivalently, as a + bi → [a + b/√3] + [2b/√3]ω.

Sometimes we want to approximate an irrational number with a rational number. We present a very effective method called continued fractions. The method works as follows. Let x be the positive irrational number that we want to approximate. Let x1 = 1/fPart(x) and xi+1 = 1/fPart(xi). Then,

x = [x] + 1/([x1] + 1/([x2] + ⋯ + 1/([xi] + 1/([xi+1] + ⋯))))

Let x ≈ pi/qi where pi/qi is the continued fraction of x carried out to i iterations. Then, pi/qi is called the ith convergent to x and is generally denoted as pi/qi = [[x]; [x1],..., [xi]]. Here, the outer brackets are just standard notation for the continued fraction representation, but the inner brackets denote the greatest integer function. The entry [xi] is known as the ith coefficient, or ith term, in the continued fraction expansion of x. Using this iterative process on the TI-83, we can generate each successive term of the continued fraction expansion of x just by pressing the ENTER key. I deliberately designed the iterative process so that the greatest integer of the ith iteration is the ith term in the continued fraction expansion for x. First, type the numerical value of x into the calculator and press ENTER. Next, type 1/fPart(Ans). At this point, we can generate each successive term in the expansion just by pressing ENTER and recording the greatest integer of each output. Notice how similar the iterative process for computing each term of a continued fraction expansion is to the process for computing each digit of a fraction in another base. The continued fractions method gives us a rational approximation of the irrational number that is sufficiently close to its exact value after a finite number of iterations. Continued fractions work by approximating the reciprocals of successive fraction parts with natural numbers. Since each successive fraction part is approximated by the reciprocal of a natural number, the continued fraction is rational after a finite number of iterations. 
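The ENTER-key loop just described can be sketched in a few lines of code. This is a Python stand-in for the TI-83 routine (fPart is modeled by subtracting the floor; the cutoff 1e-10 is our own guard against floating-point noise):

```python
from math import floor, pi

def cf_terms(x, n):
    """Return up to n terms [x], [x1], [x2], ... of the continued
    fraction expansion of x, mirroring the 1/fPart(Ans) loop."""
    terms = []
    for _ in range(n):
        a = floor(x)
        terms.append(a)
        frac = x - a          # fPart(x)
        if frac < 1e-10:      # fraction part vanishes: expansion terminates
            break
        x = 1 / frac          # the next iterate, 1/fPart
    return terms

print(cf_terms(pi, 4))  # → [3, 7, 15, 1]
```

As the text warns later for x23 of 3 − √7, floating-point rounding error eventually corrupts the terms, so a sketch like this is only trustworthy for the first several iterations.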
And since each fraction part is between 0 and 1, its reciprocal is greater than one, so each [xi] ≥ 1. Replacing a fraction part by the reciprocal of the natural number [xi] reduces the error with each convergent by a factor of roughly [xi]/xi, which exceeds one half. The irrational number with the slowest-converging set of convergents is the golden ratio, φ = (√5 + 1)/2, because [xi] = 1 for all i. Even then, the error with each convergent to the golden ratio is reduced by a factor of roughly 2/(√5 + 1), which exceeds three-fifths. Since the error shrinks with every iteration, the continued fraction converges to the irrational number. Hence, we can approximate any irrational number to any degree of precision after a finite number of iterations. Example 4: Find the first, second, and third convergents to π. Solution: First, type in π and press ENTER. Next, type 1/fPart(Ans) and press the ENTER key THREE more times. Then, [π] = 3; [x1] = 7; [x2] = 15; and [x3] = 1. So, the first convergent to π is 3 + 1/7 = 22/7 = [3; 7]; the second convergent to π is 3 + 1/(7 + 1/15) = 3 + 1/(106/15) = 3 + 15/106 = 333/106 = [3; 7, 15]; and the third convergent to π is 3 + 1/(7 + 1/(15 + 1/1)) = 3 + 16/113 = 355/113 = [3; 7, 15, 1]. Notice that for each convergent, the numerator is relatively prime to the denominator, meaning each convergent is already expressed in lowest terms. Our observation about convergents is true in general; the proof is left as an exercise. Also note that the third convergent is within a millionth of π. In certain cases, it is unnecessary to continue the iterative process to the desired number of places. Some continued fractions terminate while others repeat. If the fraction part of any iterate is zero, the continued fraction terminates; if it matches the fraction part of a preceding iterate, the continued fraction repeats. Proof: If the fraction part of xi is zero, then by definition, the ith convergent to x is the same as x. 
And the (i+1)st convergent is undefined! Since the fraction part from the previous iteration is the independent variable in the next iteration, if the fraction part of any iteration is the same as the fraction part of a preceding

iteration, then the inputs are the same and hence the outputs are the same. Essentially, the terms repeat. Example 5: Express 378/784 as a continued fraction. Solution: Since 378/784 is rational, the continued fraction will terminate. Let x = 378/784. Then, [x] = 0; [x1] = 2; [x2] = 13; and x3 = 2 exactly, so the expansion terminates. So, 378/784 = [0; 2, 13, 2]. When reconstructing x from the terms in the continued fraction, we find that x = 1/(2 + 1/(13 + 1/2)) = 1/(2 + 1/(27/2)) = 1/(2 + 2/27) = 1/(56/27) = 27/56. In conclusion, 378/784 reduces to 27/56 and hence gcd(378, 784) = 14. Example 6: One solution to x² − 6x + 2 = 0 is 3 − √7. Express 3 − √7 as a continued fraction. Solution: Let x = 3 − √7. Then, [x] = 0; [x1] = 2; [x2] = 1; [x3] = 4; [x4] = 1; [x5] = 1; [x6] = 1; [x7] = 4; [x8] = 1; [x9] = 1; etc. So, 3 − √7 = [0; 2, 1, 4, 1, 1, 1, 4, 1, 1,...]. Notice that although 3 − √7 is irrational, we see a pattern in its continued fraction, i.e. [xi] = [xi+4] for i > 1. Since x1 ≈ 2.822875655 and x5 ≈ 1.822875655, in all likelihood the fraction parts are the same. However, in order to be sure, we must evaluate the iterations using radicals. Although this standard of proof seems unreasonably high for this example, it is required for verifying longer patterns, since rounding error can have a significant effect on successive convergents. In this example, if [xi] = [xi+4] for i > 1, then [x23] = 4. However, according to the TI-83, x23 ≈ 11.49, meaning [x23] = 11. It turns out that the terms in the continued fraction expansion for x repeat iff x is a real irrational root of a quadratic polynomial with rational coefficients. While the general proof is beyond the scope of this work, the proof in individual instances is not. When we evaluate the iterations using radicals, we must remember to simplify our solution. When simplifying a solution by radicals, the convention is to have radicals in the numerator and a natural number in the denominator. 
We achieve this result by multiplying both the numerator and denominator by the conjugate of the algebraic number in the denominator. The following example should clear up any ambiguities. Example 7: Prove that the continued fraction expansion for 3 − √7 repeats after the fifth term. Solution: It suffices to show from Example 6 that x1 − [x1] = x5 − [x5] where x = 3 − √7. Then [x] = 0; x1 = 1/((3 − √7)/1 − 0) = (3 + √7)/((3 + √7)(3 − √7)) = (3 + √7)/2 and [x1] = 2; x2 = 1/((3 + √7)/2 − 2) = 2/(−1 + √7) = 2(−1 − √7)/((−1 − √7)(−1 + √7)) = (1 + √7)/3 and [x2] = 1; x3 = 1/((1 + √7)/3 − 1) = 3/(−2 + √7) = 3(−2 − √7)/((−2 − √7)(−2 + √7)) = (2 + √7)/1 and [x3] = 4; x4 = 1/((2 + √7)/1 − 4) = 1/(−2 + √7) = (−2 − √7)/((−2 − √7)(−2 + √7)) = (2 + √7)/3 and [x4] = 1; x5 = 1/((2 + √7)/3 − 1) = 3/(−1 + √7) = 3(−1 − √7)/((−1 − √7)(−1 + √7)) = (1 + √7)/2 and [x5] = 1. Since x1 = x5 + 1, their difference is an integer and hence their fraction parts must be the same. So, 3 − √7 = [0; 2, 1, 4, 1, 1, 1, 4, 1, 1,...], where the block 1, 4, 1, 1 repeats forever; in the conventional notation, an overbar is drawn over this repeating block of terms.
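The reconstruction step used in Example 5 can be checked with exact rational arithmetic. A short sketch using Python's fractions module (the helper name from_cf is ours):

```python
from fractions import Fraction

def from_cf(terms):
    """Rebuild the exact rational value of [t0; t1, ..., tk] by
    folding the continued fraction from the innermost term outward."""
    value = Fraction(terms[-1])
    for t in reversed(terms[:-1]):
        value = t + 1 / value
    return value

print(from_cf([0, 2, 13, 2]))                        # → 27/56
print(from_cf([0, 2, 13, 2]) == Fraction(378, 784))  # → True
```

Because Fraction keeps every intermediate result in lowest terms, this also confirms the observation from Example 4 that each convergent, such as from_cf([3, 7, 15, 1]) = 355/113, is automatically in lowest terms.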

Section 4.2: Quadratic Reciprocity

In the study of number theory, we occasionally encounter instances where we need to determine whether an integer is a perfect square in a given modulus. In other words, we are determining whether the modular equation x² ≡ a mod q has solutions. We present a very effective method called quadratic reciprocity. Since quadratic reciprocity does not work for even moduli, assume that q is odd. As we develop the properties of quadratic reciprocity, keep in mind that the last three laws are only proven to work for odd primes. Although we eventually generalize them for all odd q, if q is composite, then there are instances where we cannot conclusively determine whether x² ≡ a mod q has solutions based solely on our output. Before discussing quadratic reciprocity, we must first define quadratic residues and nonresidues. If q ∤ x and x is a perfect square in modulus q, then x is called a quadratic residue (QR) in modulus q. If q ∤ x and x is not a perfect square in modulus q, then x is called a nonresidue (NR) in modulus q. The second row of the table below shows the values of x² mod 15 for x ∈ ℕ, which are the quadratic residues in modulus 15.

x          1  2  3  4  5   6  7  8  9  10  11  12  13  14
x² mod 15  1  4  9  1  10  6  4  4  6  10  1   9   4   1

We immediately notice an interesting property of quadratic residues. Since x² ≡ (q − x)² mod q, the second row of the table is symmetric. But if x² ≡ (q − x)² mod q, then at most 7 quadratic residues can correspond to the 14 values of x. Later on, we prove that for each odd prime, p, there are always (p − 1)/2 quadratic residues and (p − 1)/2

nonresidues modulo p. The table above illustrates that composites can have even fewer quadratic residues. The products of residues have interesting properties as well. The product of two quadratic residues is a quadratic residue. The product of a quadratic residue and a nonresidue is a nonresidue. And the product of two nonresidues is a quadratic residue. Proof: Let x be a NR and let aq and bq be exponents such that x^aq ≡ a mod q and x^bq ≡ b mod q (the existence of such an x and such exponents is justified by the discussion of primitive roots below), so that ab ≡ x^(aq+bq) mod q. Suppose that a is a QR and b is a QR. Then, aq and bq must be even; otherwise, a and b would not be perfect squares. Since the sum of two even numbers is an even number, ab is a QR. Suppose that a is a QR and b is a NR. Then, aq is even and bq is odd. Since the sum of an even number and an odd number is an odd number, ab is a NR. Suppose that a is a NR and b is a NR. Then, aq and bq must be odd; otherwise, a or b would be a perfect square. Since the sum of two odd numbers is an even number, ab is a QR. However, from this point forward, every property regarding residues is developed only for prime moduli with p > 2 and will not necessarily hold for composite moduli. A primitive root, x, of an odd prime, p, is a natural number such that the smallest exponent, e, for which x^e ≡ 1 mod p is p − 1. Primitive roots are related to the repeating digital expansions of reciprocals of primes. For example, in Section 3.1 we demonstrated that 1/23 has 22 repeating decimal digits, as opposed to 2 consecutive cycles of 11 repeating digits or 11 consecutive cycles of 2 repeating digits. It follows that 10 is a primitive root of 23. An interesting property of primitive roots is that their powers span the set of nonzero integers modulo p. That is, {x^1,..., x^(p−1)} = {1,..., p − 1}, although we can expect the elements in each set to appear in a different order. Therefore, all the elements that are even powers of x must be quadratic residues, and since half the elements must be nonresidues, all the elements that are odd powers of x must be nonresidues. 
It turns out that every prime has primitive roots.
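The residue counts and the primitive-root claim above are easy to spot-check by brute force. A small Python sketch (the helper name quadratic_residues is ours):

```python
def quadratic_residues(q):
    """Distinct values of x^2 mod q for 1 <= x <= q - 1."""
    return {x * x % q for x in range(1, q)}

# Composite modulus from the table: only 5 residue values, fewer than 7.
print(sorted(quadratic_residues(15)))   # → [1, 4, 6, 9, 10]

# Odd prime p = 23: exactly (p - 1)/2 = 11 quadratic residues.
print(len(quadratic_residues(23)))      # → 11

# 10 is a primitive root of 23: its powers span 1, ..., 22.
print({pow(10, e, 23) for e in range(1, 23)} == set(range(1, 23)))  # → True
```

The three-argument form of pow performs modular exponentiation directly, which keeps the primitive-root check fast even for much larger primes.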

At this point, we are ready to introduce quadratic reciprocity. In order to determine whether a is a quadratic residue in a prime modulus, p, we use the Legendre symbol, named after the 18th century French mathematician Adrien-Marie Legendre. The Legendre symbol is generally written either in the fraction-like notation (a/p) or as (a|p). Throughout the text, we always refer to the Legendre symbol using the latter notation. The Legendre symbol is defined as (a|p) = {1 if a is a QR in modulo p, 0 (or in some texts undefined) if p ∣ a, and −1 if a is a NR in modulo p}. Notice how well the definition corresponds to the multiplicative properties of quadratic residues, i.e. QR·QR = QR just as 1·1 = 1; QR·NR = NR just as 1·(−1) = −1; and NR·NR = QR just as (−1)·(−1) = 1. The multiplication rule also implies that we can express the Legendre symbol as a product of factors of a. In other words, if a = a1a2, then (a|p) = (a1|p)(a2|p). Although the definition of the Legendre symbol is sufficient for us to call it a function of a and p, suppose we bet that it does not have an explicit formula. Then, the 18th century Swiss mathematician Leonhard Euler would prove us wrong! The following result is known as Euler's criterion. If p is an odd prime, then a^((p−1)/2) ≡ (a|p) mod p, or equivalently, (a|p) = a^((p−1)/2) − p[a^((p−1)/2)/p + ½]. Proof: Obviously, if a ≡ 0 mod p, then a^((p−1)/2) ≡ 0 mod p. So suppose p ∤ a, and let x be a primitive root in modulo p. Since the powers of x span the set of nonzero integers in modulo p, all quadratic residues are even powers of x and all nonresidues are odd powers of x. So, if a is a QR in modulo p, say a ≡ x^(2k) mod p, then a^((p−1)/2) ≡ (x^(2k))^((p−1)/2) ≡ (x^(p−1))^k ≡ 1 mod p. Similarly, if a is a NR in modulo p, say a ≡ x^(2k+1) mod p, then a^((p−1)/2) ≡ (x^(2k+1))^((p−1)/2) ≡ x^((p−1)k+(p−1)/2) ≡ (x^(p−1))^k · x^((p−1)/2) ≡ x^((p−1)/2) mod p. Since x is a primitive root, the smallest exponent, e, for which x^e ≡ 1 mod p is p − 1, so x^((p−1)/2) ≢ 1 mod p. But (x^((p−1)/2))² ≡ 1 mod p, and the only square roots of 1 in modulo p are ±1. So, x^((p−1)/2) ≡ −1 mod p. Therefore, a^((p−1)/2) ≡ (a|p) mod p. Let p be an odd prime. Then, i. 
(−1|p) = {1 if p ≡ 1 mod 4 and −1 if p ≡ 3 mod 4} and ii. (2|p) = {1 if p ≡ 1 mod 8 or p ≡ 7 mod 8 and −1 if p ≡ 3 mod 8 or p ≡ 5 mod 8}. Proof: We can evaluate both parts using Euler's criterion. i. Let p ≡ 1 mod 4. Then, (−1)^((p−1)/2) = (−1)^((4k+1−1)/2) = (−1)^(2k) = 1. Let p ≡ 3 mod 4. Then, (−1)^((p−1)/2) = (−1)^((4k+3−1)/2) = (−1)^(2k+1) = −1. ii. Determining 2^((p−1)/2) mod p is a little trickier, but Gauss figured it out! Suppose we multiply each of the numbers 1, 2,..., (p − 1)/2 by 2 and then take their product. It would be 2^((p−1)/2)·((p − 1)/2)!. Alternatively, suppose we multiply each of the numbers 1, 2,..., (p − 1)/2 by 2, subtract p from those products greater than p/2, and then take their product. It would be the product of the elements in the set {2, 4,..., 2k, 2k + 2 − p,..., −3, −1}, where 2k is the greatest even number not exceeding (p − 1)/2. Since p is an odd prime, each number in our list to the right of 2k must be odd. Since 2k + 2 is the smallest even number greater than p/2, 2k + 2 − p ≥ −(p − 1)/2. So, we have a set of (p − 1)/2 integers whose absolute values are unique natural numbers from 1 to (p − 1)/2, given by the set {−1, 2, −3, 4,..., (−1)^((p−1)/2)·(p − 1)/2}. And the product of its elements is (−1)^s·((p − 1)/2)! where s is the sum of the exponents of −1, which is ∑_{i=1}^{(p−1)/2} i = (p² − 1)/8. So, 2^((p−1)/2)·((p − 1)/2)! ≡ (−1)^s·((p − 1)/2)! mod p. Since p ∤ ((p − 1)/2)!, we can cancel both factorial terms, meaning (2|p) ≡ (−1)^s mod p. Therefore i. If p ≡ 1 mod 8, then (−1)^((p²−1)/8) = (−1)^(((8k+1)²−1)/8) = (−1)^(((8k)²+2(1)(8k)+1²−1)/8) = (−1)^(8k²+2k) = 1.

ii. If p ≡ 7 mod 8, then (−1)^((p²−1)/8) = (−1)^(((8k+7)²−1)/8) = (−1)^(((8k)²+2(7)(8k)+7²−1)/8) = (−1)^(8k²+14k+6) = 1. iii. If p ≡ 3 mod 8, then (−1)^((p²−1)/8) = (−1)^(((8k+3)²−1)/8) = (−1)^(((8k)²+2(3)(8k)+3²−1)/8) = (−1)^(8k²+6k+1) = −1. iv. If p ≡ 5 mod 8, then (−1)^((p²−1)/8) = (−1)^(((8k+5)²−1)/8) = (−1)^(((8k)²+2(5)(8k)+5²−1)/8) = (−1)^(8k²+10k+3) = −1. Q.E.D.
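Euler's criterion translates directly into code, since a^((p−1)/2) mod p is a single modular exponentiation. A Python sketch (the helper name legendre is ours; Python's three-argument pow performs the modular exponentiation, and it normalizes a negative base such as −1 into the range 0..p−1):

```python
def legendre(a, p):
    """Legendre symbol (a|p) for an odd prime p via Euler's criterion:
    a^((p-1)/2) mod p is 0, 1, or p - 1 (i.e. -1 mod p)."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

# (2|p): 7 ≡ 7 mod 8 gives 1, while 3 ≡ 3 mod 8 gives -1.
print(legendre(2, 7), legendre(2, 3))      # → 1 -1
# (-1|p): 13 ≡ 1 mod 4 gives 1, while 23 ≡ 3 mod 4 gives -1.
print(legendre(-1, 13), legendre(-1, 23))  # → 1 -1
```

The printed values agree with parts i and ii of the result just proved.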

Our next proof is also courtesy of Gauss. It is celebrated both for its elegance and for the fact that it was the greatest integer function's debut. This next theorem is often referred to as the law of quadratic reciprocity. It states that if p and q are distinct odd primes, then (p|q) = {(q|p) if p ≡ 1 mod 4 or q ≡ 1 mod 4 and −(q|p) if p ≡ 3 mod 4 and q ≡ 3 mod 4}. Gauss first proved the law of quadratic reciprocity at the age of 19 and found eight different proofs during his lifetime. His motivation was to discover theorems that could also be used for determining cubic and quartic residues. Not only do these theorems now exist, they have made a significant contribution to algebraic number theory. The proof we present here is Gauss’s third proof with modifications by Ferdinand Gotthold Max Eisenstein in 1844. Without the modifications, it would be much longer and very messy. Since Eisenstein's version still has many components, we organize his proof by dividing it into sub-objectives and providing sub-proofs for each. Proof: The objective of Gauss’s third proof, formally stated, is to show that (p|q)(q|p) = (−1)^((p−1)(q−1)/4). This is the relation from which we derive the law of quadratic reciprocity. We begin by letting {ai} denote the set of all even natural numbers less than p. So, {ai} = {2, 4,..., p − 3, p − 1}. Further, let {ri} = {qai − p[qai/p]} so that qai ≡ ri mod p with each 1 ≤ ri ≤ p − 1. If ri is even, then (−1)^(ri)·ri = ri and ri ∈ {ai}. But if ri is odd, then (−1)^(ri)·ri = −ri. Since 1 ≤ ri ≤ p − 1, 1 ≤ p − ri ≤ p − 1. If ri is odd, p − ri is even and hence p − ri ∈ {ai}. So, for every i there exists a j such that (−1)^(ri)·ri ≡ aj mod p, meaning (−1)^(ri)·ri is congruent to an element of {ai}. Each (−1)^(ri)·ri is unique in the subset {ai} of modulo p. Proof: We first show that each ri is unique, meaning ri = rj iff i = j. Let ri be an arbitrary remainder such that 0 ≤ ri ≤ p − 1 and let ai = 2i. We must show that for each ri there is exactly one value of i in modulo p such that 2qi ≡ ri mod p. Suppose 2qi′ ≡ ri mod p. 
Then, 2q(i − i′) ≡ 0 mod p. Since p and q are distinct odd primes, p ∤ 2q. It follows from Euclid's lemma that p ∣ (i − i′) and hence i ≡ i′ mod p. So, there is exactly one value of i for each ri. Therefore, each ri is unique in modulo p. Since (−1)^(ri)·ri = ri for all even ri, which are distinct in modulo p, we concern ourselves with odd ri. When ri is odd, (−1)^(ri)·ri ≡ p − ri ≡ rp−i mod p. So, each (−1)^(ri)·ri is not unique in modulo p; we must show that (−1)^(ri)·ri is unique in {ai}. Since ai < p, i < p/2 and 2(p − i) > p, so ap−i ∉ {ai} and rp−i ∉ {ri}. For this reason, for each odd ri, there exists no even rj ∈ {ri} such that rj = p − ri. Therefore, each (−1)^(ri)·ri ∈ {ai} is unique. Our next result is known as Eisenstein's lemma. If p is an odd prime and p ∤ q, then (q|p) = (−1)^s where s = ∑_{i=1}^{(p−1)/2} [qai/p]. Proof: Since each (−1)^(ri)·ri ∈ {ai} is unique, {(−1)^(ri)·ri mod p} = {ai}. Hence, ∏_{i=1}^{(p−1)/2} ai ≡ ∏_{i=1}^{(p−1)/2} (−1)^(ri)·ri mod p. Since qai ≡ ri mod p, we also have ∏_{i=1}^{(p−1)/2} qai ≡ ∏_{i=1}^{(p−1)/2} ri mod p. Multiplying each term in the first congruence by q yields ∏_{i=1}^{(p−1)/2} qai ≡ ∏_{i=1}^{(p−1)/2} q·(−1)^(ri)·ri mod p, and by transitivity, ∏_{i=1}^{(p−1)/2} ri ≡ ∏_{i=1}^{(p−1)/2} q·(−1)^(ri)·ri mod p. Multiplying each term in the third congruence by (−1)^(ri) yields ∏_{i=1}^{(p−1)/2} (−1)^(ri)·ri ≡ ∏_{i=1}^{(p−1)/2} q·(−1)^(ri)·(−1)^(ri)·ri mod p, which simplifies to (−1)^(∑ᵢ ri)·∏_{i=1}^{(p−1)/2} ri ≡ q^((p−1)/2)·∏_{i=1}^{(p−1)/2} ri mod p. We can simplify our congruence further! Recall that q^((p−1)/2) ≡ (q|p) mod p by Euler's criterion. And since p ∤ ri, p ∤ ∏_{i=1}^{(p−1)/2} ri. Using Euclid's lemma, we can cancel each ∏ ri term, which simplifies the congruence to (q|p) ≡ (−1)^(∑ᵢ ri) mod p. Since (q|p) = ±1 and (−1)^(∑ᵢ ri) = ±1, the two sides must in fact be equal, and our only concern with ∑_{i=1}^{(p−1)/2} ri is whether it is even or odd, namely its value in modulo 2. Since ri = −p[qai/p] + qai, −p is odd, and qai is even, ri ≡ [qai/p] mod 2. Therefore, (q|p) = (−1)^s where s ≡ ∑_{i=1}^{(p−1)/2} [qai/p] mod 2. 
Similarly, if {bi} = {2, 4,..., q − 3, q − 1}, then (p|q) = (−1)^s where s ≡ ∑_{i=1}^{(q−1)/2} [pbi/q] mod 2. Since we can conclude from Eisenstein's lemma that (q|p)(p|q) = (−1)^(∑ᵢ[qai/p] + ∑ᵢ[pbi/q]), the problem is reduced to proving that ∑_{i=1}^{(p−1)/2} [qai/p] + ∑_{i=1}^{(q−1)/2} [pbi/q] ≡ (p − 1)(q − 1)/4 mod 2. Eisenstein devised a clever method for proving this geometrically using lattice-point diagrams and symmetries. For the purposes of this proof, a lattice point is any point on the Cartesian grid with integer coordinates. Three lattice-point diagrams illustrating his proof for p = 23 and q = 11 are shown below.

Using lattice-point diagrams, we can illustrate Eisenstein's proof in three key steps. In the leftmost lattice diagram, the length of each gray line segment represents the value of [qai/p] for each even column, ai. In the middle lattice diagram, the length of each gray line segment represents the value of [pbi/q] for each even row, bi. First, we show that each gray line segment of length [qai/p] to the right of the line x = p/2 has the same parity as the black line segment above the line y = qx/p and hence, the black lines can represent that part of the sum. Next, we show that by flipping the triangle bounded by x = p/2 and y = qx/p about the lines x = p/2 and y = q/2, the black lines can represent the sum of [qai/p] for each odd column to the left of the line x = p/2. Then, we present a similar argument for each gray line segment of length [pbi/q] above the line y = q/2. Using the leftmost and middle lattice diagrams, he proves that ∑_{i=1}^{(p−1)/2} [qai/p] + ∑_{i=1}^{(q−1)/2} [pbi/q] ≡ ∑_{i=1}^{(p−1)/2} [qi/p] + ∑_{i=1}^{(q−1)/2} [pi/q] mod 2. Using the rightmost lattice diagram, he proves that ∑_{i=1}^{(p−1)/2} [qi/p] + ∑_{i=1}^{(q−1)/2} [pi/q] = (p − 1)(q − 1)/4. Rather than give away too much too soon, we provide a step-by-step algebraic explanation of how he achieves these results in the next several paragraphs. For the remaining sub-proofs below, we use functions of ai and bi as indices in addition to the standard index, i. This makes it easier to express summations of only even or only odd terms for partitions of {ai} and {bi}. Both sums are bounded by the axes and the line y = qx/p. Proof: Since [qai/p] is the number of lattice points at x = ai that are above the line y = 0 and on or below the line y = qx/p, ∑aᵢ[qai/p] represents the total number of lattice points in the even-numbered columns from ai = 2 to p − 1 above the line y = 0 and on or below the line y = qx/p. 
Similarly, since [pbi/q] is the number of lattice points at y = bi to the right of the line x = 0 and to the left of or on the line y = qx/p, ∑bᵢ[pbi/q] represents the total number of lattice points in the even-numbered rows from bi = 2 to q − 1 to the right of the line x = 0 and to the left of or on the line y = qx/p. Let x = k for some k ∈ ℤ so that y = qk/p. Then, by Euclid's lemma, y ∈ ℤ iff k is a multiple of p. So, the line y = qx/p only intersects lattice points on our grid at (0, 0) and (p, q), which are on the edges of our grid. Hence, every lattice point inside our grid is either above or below the line. The sums ∑aᵢ[qai/p] and ∑_{i=1}^{(p−1)/2} [qi/p] are congruent in modulo 2. Proof: Suppose we partition ∑aᵢ[qai/p] at x = p/2. Since the greatest even number less than p/2 is 2[p/4] and the smallest even number greater than p/2 is 2[p/4] + 2, ∑aᵢ[qai/p] = ∑_{ai=2}^{2[p/4]} [qai/p] + ∑_{ai=2[p/4]+2}^{p−1} [qai/p]. For now, we focus on the second partition of the sum. The total number of lattice points in each column above the x-axis and below q is q − 1. And since the line y = qx/p does not intersect any lattice points inside our grid, the number of lattice points in each column below q and above the line is (q − 1) − [qai/p]. Since q − 1 is even and 1 ≡ −1 mod 2, [qai/p] ≡ (q − 1) − [qai/p] mod 2. Since [qai/p] ≡ (q − 1) − [qai/p] mod 2, we can replace ∑_{ai=2[p/4]+2}^{p−1} [qai/p] with ∑_{ai=2[p/4]+2}^{p−1} ((q − 1) − [qai/p]). Now, imagine that we flip every lattice point represented by the second partition of the sum about the line x = p/2 (refer to the diagram). This means that each lattice point u units to the right of x = p/2 is now u units to the left of x = p/2. So, if ai = p/2 + u, we replace ai with p/2 − u. So, each lattice point with an x-coordinate of ai > p/2 is given the new x-coordinate p/2 − (ai − p/2), or p − ai. Replacing the index ai with p − ai in the sum yields ∑_{p−ai=1}^{p−2[p/4]−2} ((q − 1) − [qai/p]). 
So, we still sum (q − 1) − [qai/p] for an even ai between 2[p/4] + 2 and p − 1, but in the opposite order. In order to sum (q − 1) − [qai/p] with respect to p − ai, we need to replace all ai with p − ai. Working backwards, we determine that (q − 1) − [qai/p] = −1 − [−q + qai/p] = −1 − [−(qp/p − qai/p)] = −1 − [−q(p − ai)/p]. Notice that −[−q(p − ai)/p] = ceiling(q(p − ai)/p). And since q(p − ai)/p ∉ ℤ for all ai, ceiling(q(p − ai)/p) = [q(p − ai)/p] + 1. Hence, (q − 1) − [qai/p] = [q(p − ai)/p] and ∑aᵢ[qai/p] ≡ ∑_{ai=2}^{2[p/4]} [qai/p] + ∑_{p−ai=1}^{p−2[p/4]−2} [q(p − ai)/p] mod 2. Notice that the index for the first partition of the sum, ai, runs over every even number between 2 and p/2 and that the index for the second partition of the sum, p − ai, runs over every odd number between 1 and p/2. So, ∑_{ai=2}^{2[p/4]} [qai/p] + ∑_{p−ai=1}^{p−2[p/4]−2} [q(p − ai)/p] is the sum of [qi/p] for all i such that 1 ≤ i ≤ (p − 1)/2. So, ∑aᵢ[qai/p] ≡ ∑_{i=1}^{(p−1)/2} [qi/p] mod 2.
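The congruence just proved can be spot-checked numerically for the worked case p = 23, q = 11 (a Python sketch; integer division // plays the role of the greatest integer function for positive arguments):

```python
# Spot check of the parity congruence for p = 23, q = 11.
p, q = 23, 11

# Sum over the even columns ai = 2, 4, ..., p - 1 of [q*ai/p].
even_cols = sum(q * a // p for a in range(2, p, 2))

# Sum over every column i = 1, ..., (p - 1)/2 of [q*i/p].
all_cols = sum(q * i // p for i in range(1, (p - 1) // 2 + 1))

print(even_cols, all_cols)            # → 55 25
print(even_cols % 2 == all_cols % 2)  # → True
```

Both sums are odd here, which matches the claim that only their common parity matters.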

The sums ∑bᵢ[pbi/q] and ∑_{i=1}^{(q−1)/2} [pi/q] are congruent in modulo 2. Proof: Suppose we partition ∑bᵢ[pbi/q] at y = q/2. Since the greatest even number less than q/2 is 2[q/4] and the smallest even number greater than q/2 is 2[q/4] + 2, ∑bᵢ[pbi/q] = ∑_{bi=2}^{2[q/4]} [pbi/q] + ∑_{bi=2[q/4]+2}^{q−1} [pbi/q]. For now, we focus on the second partition of the sum. The total number of lattice points in each row to the right of the y-axis and to the left of p is p − 1. And since the line y = qx/p does not intersect any lattice points inside our grid, the number of lattice points in each row to the left of x = p and to the right of the line is (p − 1) − [pbi/q]. Since p − 1 is even and 1 ≡ −1 mod 2, [pbi/q] ≡ (p − 1) − [pbi/q] mod 2. Since [pbi/q] ≡ (p − 1) − [pbi/q] mod 2, we can replace ∑_{bi=2[q/4]+2}^{q−1} [pbi/q] with ∑_{bi=2[q/4]+2}^{q−1} ((p − 1) − [pbi/q]). Now, imagine that we flip every lattice point represented by the second partition of the sum about the line y = q/2 (refer to the diagram). This means that each lattice point u units above y = q/2 is now u units below y = q/2. So, if bi = q/2 + u, we replace bi with q/2 − u. So, each lattice point with a y-coordinate of bi > q/2 is given the new y-coordinate q/2 − (bi − q/2), or q − bi. Replacing the index bi with q − bi in the sum yields ∑_{q−bi=1}^{q−2[q/4]−2} ((p − 1) − [pbi/q]). So, we still sum (p − 1) − [pbi/q] for an even bi between 2[q/4] + 2 and q − 1, but in the opposite order. In order to sum (p − 1) − [pbi/q] with respect to q − bi, we need to replace all bi with q − bi. Working backwards, we determine that (p − 1) − [pbi/q] = −1 − [−p + pbi/q] = −1 − [−(pq/q − pbi/q)] = −1 − [−p(q − bi)/q]. Notice that −[−p(q − bi)/q] is the ceiling of p(q − bi)/q. And since p(q − bi)/q ∉ ℤ for all bi, its ceiling is [p(q − bi)/q] + 1. Hence, (p − 1) − [pbi/q] = [p(q − bi)/q] and ∑bᵢ[pbi/q] ≡ ∑_{bi=2}^{2[q/4]} [pbi/q] + ∑_{q−bi=1}^{q−2[q/4]−2} [p(q − bi)/q] mod 2. 
Notice that the index for the first partition of the sum, bᵢ, runs over every even number between 2 and q/2 and that the index for the second partition of the sum, q − bᵢ, runs over every odd number between 1 and q/2. So, ∑_{bᵢ=2}^{2[q/4]} [pbᵢ/q] + ∑_{q−bᵢ=1}^{q−2[q/4]−2} [p(q − bᵢ)/q] is the sum of [pi/q] for all i such that 1 ≤ i ≤ (q − 1)/2. So, ∑_bᵢ [pbᵢ/q] ≡ ∑_{i=1}^{(q−1)/2} [pi/q] mod 2. If p and q are odd primes, then (p|q)(q|p) = (−1)^((p−1)(q−1)/4). Proof: i. Since (q|p)(p|q) = (−1)^(∑_aᵢ [qaᵢ/p] + ∑_bᵢ [pbᵢ/q]), ∑_aᵢ [qaᵢ/p] ≡ ∑_{i=1}^{(p−1)/2} [qi/p] mod 2, and ∑_bᵢ [pbᵢ/q] ≡ ∑_{i=1}^{(q−1)/2} [pi/q] mod 2, it suffices to show that ∑_{i=1}^{(p−1)/2} [qi/p] + ∑_{i=1}^{(q−1)/2} [pi/q] = (p − 1)(q − 1)/4. We can prove this using the rightmost lattice-point diagram. Since [qi/p] is the number of lattice points at x = i that are above the line y = 0 and below the line y = qx/p, ∑_{i=1}^{(p−1)/2} [qi/p] represents the total number of lattice points in each column from i = 1 to (p − 1)/2 above the line y = 0 and below the line y = qx/p. Similarly, since [pi/q] is the number of lattice points at y = i to the right of the line x = 0 and to the left of the line y = qx/p, ∑_{i=1}^{(q−1)/2} [pi/q] represents the total number of lattice points in each row from i = 1 to (q − 1)/2 to the right of the line x = 0 and to the left of the line y = qx/p. First, we examine every lattice point contributing to the sum in an arbitrary column, i < p/2. The first [qi/p] lattice points in the column, above y = 0 and below the line, contribute to the sum of [qi/p]. The remaining lattice points contributing to the sum are above the line but below y = q/2, since we sum [pi/q] over only the first (q − 1)/2 rows. So, (q − 1)/2 lattice points in each column contribute to the sum. Next, we examine every lattice point contributing to the sum in an arbitrary row, i < q/2. The first [pi/q] lattice points in the row, to the right of x = 0 and to the left of the line, contribute to the sum of [pi/q].
The remaining lattice points contributing to the sum are to the right of the line but to the left of x = p/2, since we sum [qi/p] over only the first (p − 1)/2 columns. So, (p − 1)/2 lattice points in each row contribute to the sum. Therefore, ∑_{i=1}^{(p−1)/2} [qi/p] + ∑_{i=1}^{(q−1)/2} [pi/q] is the number of rows in each column times the number of columns in each row, or (p − 1)(q − 1)/4. If p and q are odd primes, then (p|q) = {(q|p) if p ≡ 1 mod 4 or q ≡ 1 mod 4 and −(q|p) if p ≡ 3 mod 4 and q ≡ 3 mod 4}. Proof: Since (p|q) = ±1 and (q|p) = ±1, if (p|q) = (q|p), then (p|q)(q|p) = 1, and if (p|q) = −(q|p), then (p|q)(q|p) = −1. Let i and j be non-negative integers. i. If p ≡ 1 mod 4 and q ≡ 1 mod 4, then (−1)^((p−1)(q−1)/4) = (−1)^((4i+1−1)(4j+1−1)/4) = (−1)^(4ij+0i+0j) = 1. So (p|q) = (q|p). ii. If p ≡ 1 mod 4 and q ≡ 3 mod 4, then (−1)^((p−1)(q−1)/4) = (−1)^((4i+1−1)(4j+3−1)/4) = (−1)^(4ij+2i+0j) = 1. So (p|q) = (q|p). iii. If p ≡ 3 mod 4 and q ≡ 1 mod 4, then (−1)^((p−1)(q−1)/4) = (−1)^((4i+3−1)(4j+1−1)/4) = (−1)^(4ij+0i+2j) = 1. So (p|q) = (q|p). iv. If p ≡ 3 mod 4 and q ≡ 3 mod 4, then (−1)^((p−1)(q−1)/4) = (−1)^((4i+3−1)(4j+3−1)/4) = (−1)^(4ij+2i+2j+1) = −1. So (p|q) = −(q|p).
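The lattice-point count above can be checked numerically. The short Python sketch below (our illustration, not part of the original text) computes the two floor sums directly and confirms that their total equals (p − 1)(q − 1)/4 for several pairs of odd primes.

```python
# Check that sum_{i=1}^{(p-1)/2} [qi/p] + sum_{i=1}^{(q-1)/2} [pi/q]
# equals (p-1)(q-1)/4 for distinct odd primes p and q.
def floor_sum(a, b, n):
    # sum of [a*i/b] for i = 1 .. n
    return sum(a * i // b for i in range(1, n + 1))

def reciprocity_exponent(p, q):
    return floor_sum(q, p, (p - 1) // 2) + floor_sum(p, q, (q - 1) // 2)

for p, q in [(3, 5), (5, 7), (11, 13), (13, 17), (7, 19)]:
    assert reciprocity_exponent(p, q) == (p - 1) * (q - 1) // 4
```

Since the exponent sums equal (p − 1)(q − 1)/4 exactly, their parity in particular gives the sign in the reciprocity law.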

We can use the law of quadratic reciprocity to reduce the values of q and p while preserving the value of (q|p). We can reduce p in modulo q after flipping (q|p). That is, (q|p) = {(p − kq|q) if q ≡ 1 mod 4 or p ≡ 1 mod 4 and −(p − kq|q) if q ≡ 3 mod 4 and p ≡ 3 mod 4}. By taking advantage of this property, we can make the computation process for the Legendre symbol very similar to the Euclidean algorithm. Each successive flip greatly reduces the values of the arguments in the Legendre symbol. After a finite number of iterations, they get so small that we can evaluate the Legendre symbol by factoring or by inspection. Since we can evaluate (−1|q), we can even reduce p

below zero in modulo q. For this reason, we let kq be p rounded to the nearest q by setting k = [p/q + ½]. Example 1: Determine if 564 is a perfect square in modulus 3067. Assume 3067 is a prime. Solution: We complete this exercise using the properties of the Legendre symbol by factoring the arguments. We get (564|3067) = (3|3067)(4|3067)(47|3067) = −(3067|3)·1·(−(3067|47)) = (1|3)(12|47) = 1·(3|47)(4|47) = −(47|3)·1 = −(−1|3) = −(−1) = 1. We can now guarantee that the congruence, x² ≡ 564 mod 3067, has solutions and the smallest is x ≡ 733 mod 3067. The process was easy because we could factor 564 easily. However, for large enough arguments, factorization is unreasonable.
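Euler's criterion gives an independent check on the value of a Legendre symbol. The sketch below (ours, not the book's) uses Python's built-in three-argument pow for modular exponentiation to verify the result of Example 1.

```python
# Euler's criterion: for an odd prime p and (a, p) = 1,
# a^((p-1)/2) mod p is 1 for residues and p - 1 for nonresidues.
def legendre(a, p):
    a %= p
    if a == 0:
        return 0
    r = pow(a, (p - 1) // 2, p)   # modular exponentiation
    return 1 if r == 1 else -1

# (564|3067) = 1, and x = 733 is the smallest square root of 564 mod 3067.
assert legendre(564, 3067) == 1
assert 733 * 733 % 3067 == 564
```

This confirms the chain of flips above without any factoring of 564, at the cost of one modular exponentiation.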

The Jacobi symbol is the extension of the Legendre symbol from the set of odd primes to the set of odd numbers. Let (a|q) denote the Jacobi symbol for all a ∈ ℤ and q ∈ ℕ with q odd and (a, q) = 1. Formally stated, if q has the prime factorization q = p₁^e₁···pᵢ^eᵢ···pₖ^eₖ, then (a|q) = (a|p₁)^e₁···(a|pᵢ)^eᵢ···(a|pₖ)^eₖ. The Jacobi symbol obeys all laws of the Legendre symbol. We omit formal proofs for this, but you can find them in Quadratic and Cubic Reciprocity by Suzanne Rousseau; see appendix. It follows that we could have also defined the Jacobi symbol as (a|q) = (−1)^s where s ≡ ∑_{i=1}^{(q−1)/2} [2ai/q] mod 2 by Eisenstein's lemma. Furthermore, i. (a₁a₂|q) = (a₁|q)(a₂|q); ii. (−1|q) = {1 if q ≡ 1 mod 4 and −1 if q ≡ 3 mod 4}; iii. (2|q) = {1 if q ≡ 1 mod 8 or q ≡ 7 mod 8 and −1 if q ≡ 3 mod 8 or q ≡ 5 mod 8}; and iv. (a|q) = {(q|a) if a ≡ 1 mod 4 or q ≡ 1 mod 4 and −(q|a) if a ≡ 3 mod 4 and q ≡ 3 mod 4} for all odd natural numbers, a and q. Property iv solves the problem of having to factor a before each flip, except for having to factor out at most a power of −1 and powers of 2. But just because the Jacobi symbol obeys all laws of the Legendre symbol does not mean that all properties of the Legendre symbol extend to the Jacobi symbol for composite q. If (a|q) = −1 for a composite q, then a is still a nonresidue. However, if (a|q) = 0 or if (a|q) = 1 for a composite q, then a is indeterminate: for a to be a residue, it must be a quadratic residue for every prime dividing q, but if a is a nonresidue for an even number of primes dividing q, then (a|q) = 1 because the product of two nonresidues is a residue (NR·NR = QR). And Euler's criterion obviously does not hold. Example 2: Determine if 564 is a perfect square a) in modulus 3067 and b) in modulus 1173. Solution: We complete this exercise using the properties of the Jacobi symbol by factoring powers of 2 out of the arguments.
For part a) we get (564|3067) = (4|3067)(141|3067) = 1·(3067|141) = (3067 − 141[3067/141 + ½]|141) = (−35|141) = (−1|141)(35|141) = 1·(35|141) = (141 − 35[141/35 + ½]|35) = (1|35) = 1. And for part b) we get (564|1173) = (4|1173)(141|1173) = 1·(1173|141) = (1173 − 141[1173/141 + ½]|141) = (45|141) = (141 − 45[141/45 + ½]|45) = (6|45) = (2|45)(3|45) = −1·(45|3) = −(0|3) = 0. Notice that using the Jacobi symbol is almost as easy as using the Euclidean algorithm, and it always works on prime moduli. However, in the latter case (564, 1173) > 1 and hence (564|1173) = 0. So, we cannot use the Jacobi symbol to determine if 564 is a perfect square in modulus 1173.
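The Euclidean-style computation above mechanizes directly. The sketch below (our Python rendering of properties ii–iv, not the book's code) strips powers of 2, flips, and reduces, exactly as in Example 2.

```python
def jacobi(a, q):
    # Jacobi symbol (a|q) for odd q > 0, computed Euclidean-style:
    # strip factors of 2 with rule iii, flip with rule iv, reduce mod q.
    assert q > 0 and q % 2 == 1
    a %= q
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if q % 8 in (3, 5):   # (2|q) = -1 exactly when q = 3 or 5 mod 8
                result = -result
        a, q = q, a               # reciprocity flip (rule iv)
        if a % 4 == 3 and q % 4 == 3:
            result = -result
        a %= q
    return result if q == 1 else 0   # 0 signals (a, q) > 1

assert jacobi(564, 3067) == 1
assert jacobi(564, 1173) == 0
```

The loop reduces the arguments at Euclidean-algorithm speed, so no factoring beyond powers of 2 is ever needed.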

We conclude with applications for quadratic reciprocity and a method for solving quadratic congruences. Quadratic residues and reciprocity have both commercial applications and applications to other areas of mathematics. Commercial applications include sound diffusion (the quadratic residue sound diffusor) and cryptography (such as primality testing). The most obvious application in number theory is determining if an equation with a quadratic component has integer solutions or proving that it does not before searching for them. We provide basic methods for doing that at the end of Section 5.1. In Section 3.2, we used Fermat's little theorem to derive a very good primality test for m, but we can improve upon it. Instead of verifying x^(m−1) ≡ 1 mod m, we can do better by verifying x^((m−1)/2) ≡ (x|m) mod m. Last, quadratic residues are used to construct Paley graphs, which are undirected graphs constructed from elements of a finite field that differ by a quadratic residue. Paley's discovery led to his development of Paley construction, which he used to construct Hadamard matrices from finite fields. We can use some properties of residues and nonresidues to find the discrete square-root of any non-zero number in a prime modulus. Let q be a QR in modulo p. If x² ≡ q mod p and p ≡ 3 mod 4, then x ≡ q^((p+1)/4) mod p. The result holds because x² ≡ q^((p+1)/2) ≡ q^((p−1)/2)·q ≡ q mod p. If x² ≡ q mod p and if p ≡ 5 mod 8, then x ≡ q^((p+3)/8) mod p or x ≡ 2q(4q)^((p−5)/8) mod p. The result holds because x⁴ ≡ q^((p+3)/2) ≡ q^((p−1)/2)·q² ≡ q² mod p and hence x² ≡ ±q mod p. If q^((p+3)/4) ≡ −q mod p, then {2q(4q)^((p−5)/8)}² = 4q²(4q)^((p−5)/4) = 4^((p−1)/4)·q^((p+3)/4) = 2^((p−1)/2)·q^((p+3)/4). Since 2 is a NR when p ≡ 5 mod 8 and q^((p+3)/4) ≡ −q mod p by hypothesis, {2q(4q)^((p−5)/8)}² ≡ q mod p. If 2^k ∣ (p − 1), then we would expect 2^(k−1) possible solutions of the form x^(2^k) ≡ q^(2^(k−1)) mod p, each of which differs by a factor of a root of unity in modulo p.
If k exceeds 2, then we recommend using the Tonelli–Shanks algorithm to solve the congruence.
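The two closed forms above can be sketched in a few lines of Python (our illustration, assuming q is already known to be a residue; the p ≡ 1 mod 8 case is deferred to Tonelli–Shanks as the text recommends).

```python
def sqrt_mod_p(q, p):
    # Square root of a known quadratic residue q modulo a prime p,
    # using the closed forms for p = 3 mod 4 and p = 5 mod 8.
    q %= p
    if p % 4 == 3:
        x = pow(q, (p + 1) // 4, p)
    elif p % 8 == 5:
        x = pow(q, (p + 3) // 8, p)
        if x * x % p != q:
            # second closed form: x = 2q * (4q)^((p-5)/8)
            x = 2 * q * pow(4 * q, (p - 5) // 8, p) % p
    else:
        raise NotImplementedError("p = 1 mod 8 needs Tonelli-Shanks")
    assert x * x % p == q
    return x

# Example 1 revisited: the square roots of 564 mod 3067 are 733 and 2334.
assert sqrt_mod_p(564, 3067) in (733, 3067 - 733)
```

Note that the function returns one of the two roots ±x; the other is p − x.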

Section 4.3: Dedekind Reciprocity

We conclude this chapter with a brief treatment of Dedekind sums. Richard Dedekind first used these sums in an expression for his Dedekind eta function. Since then, Dedekind sums have been studied in number theory, algebraic geometry, and topology. The objective of this subsection is to develop properties of Dedekind reciprocity so that we can create an algorithm for evaluating Dedekind sums with large arguments. Before we start, we provide a brief outline for how these sums arise in the study of the Dedekind eta function and introduce some basic properties of Dedekind sums. We conclude with applications for Dedekind sums. The Dedekind eta function is defined as η(x) = e^(πix/12) ∏_{n=1}^{∞} (1 − e^(2πinx)) for all x = a + bi with b > 0. When working with complex variables, it is often necessary to transform their arguments. The transformation x′ = (ax + b)/(cx + d) where ad − bc ≠ 0 is called a linear fractional transformation of x. If a, b, c, d ∈ ℤ with ad − bc = 1, then it is called a modular transformation of x. Dedekind wanted to express η(x′) strictly in terms of x and η(x) for all modular transformations of x. It turns out that for all modular transformations of x, η(x′) = e^(πiΦ/12)(cx + d)^(1/2)η(x). For the trivial case where c = 0 and d = 1, Φ = b. If c > 0, then Φ = (a + d)/c − 12s(d, c) − 3 where s(d, c) is the Dedekind sum of d and c. The bivariate Dedekind sum is defined for all p, q with q > 0 and (p, q) = 1 as a sum from k = 1 to q − 1 that has a few alternate forms. There are several ways it can be expressed as the sum of a complex sequence in terms of e^(2πki/q). These expressions are equivalent to s(p, q) = ∑_{k=1}^{q−1} cot(πk/q)cot(πpk/q)/(4q). The Dedekind sum also has the more common arithmetic expression s(p, q) = ∑_{i=1}^{q−1} ((i/q))((pi/q)) where ((x)) is the sawtooth wave function. For the remainder of this section, we use the latter expression. It follows from the definition that i. s(p, q) = ∑_{i=1}^{q−1} (i/q − ½)(pi/q − [pi/q] − ½); ii.
s(−p, q) = −s(p, q); and iii. s(p, q) = s(p − kq, q). Proof: i. Since (p, q) = 1, pi/q ∉ ℤ for all 0 < i < q. Therefore, for all terms in the sum, ((i/q))((pi/q)) = (i/q − [i/q] − ½)(pi/q − [pi/q] − ½). Since 0 < i/q < 1, [i/q] = 0. Therefore, ((i/q))((pi/q)) = (i/q − ½)(pi/q − [pi/q] − ½). ii. Since pi/q ∉ ℤ for all 0 < i < q, ((−pi/q)) = −pi/q − [−pi/q] − ½ = −pi/q + [pi/q] + 1 − ½ = −(pi/q − [pi/q] − ½) = −((pi/q)). Therefore, s(−p, q) = −s(p, q). iii. Since ((pi/q)) = pi/q − [pi/q] − ½ for all 0 < i < q, (((p − kq)i/q)) = (p − kq)i/q − [(p − kq)i/q] − ½ = pi/q − ki − [pi/q] + ki − ½ = ((pi/q)). Therefore, s(p, q) = s(p − kq, q). Using ii and iii, we can reduce the absolute value of the argument p with the right choice of k, which we discuss later. Next, we must find a way to reduce q.
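The sawtooth definition and properties i–iii are easy to verify with exact rational arithmetic. The sketch below (ours, using Python's fractions module, not the book's notation) evaluates s(p, q) directly from the definition.

```python
from fractions import Fraction

def sawtooth(x):
    # ((x)) = x - [x] - 1/2 for non-integer x, and 0 for integer x
    x = Fraction(x)
    if x.denominator == 1:
        return Fraction(0)
    return x - (x.numerator // x.denominator) - Fraction(1, 2)

def dedekind_sum(p, q):
    # s(p, q) = sum_{i=1}^{q-1} ((i/q))((pi/q)), exact rational arithmetic
    return sum(sawtooth(Fraction(i, q)) * sawtooth(Fraction(p * i, q))
               for i in range(1, q))

# properties from the text
assert dedekind_sum(1, 5) == Fraction(1, 5)              # 12*5*s(1,5) = 25 - 15 + 2
assert dedekind_sum(-3, 7) == -dedekind_sum(3, 7)        # ii: s(-p, q) = -s(p, q)
assert dedekind_sum(3 - 2 * 7, 7) == dedekind_sum(3, 7)  # iii: s(p - kq, q) = s(p, q)
```

Exact fractions matter here: the sums are rationals with denominator dividing 12q, which floating point would only approximate.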

This next theorem is often referred to as the law of Dedekind reciprocity. It states that if p, q ∈ ℕ and (p, q) = 1, then 12pq{s(p, q) + s(q, p)} = p² − 3pq + q² + 1. Dedekind proved the reciprocity theorem using modular transformations. Although his proof is not overly long, it involves some advanced analytical methods that this work does not cover. The proof we present here is due to Ulrich Dieter. Since Dieter's proof has many components, we organize it by dividing it into sub-objectives and providing sub-proofs for each. We start by proving two basic lemmas. a) The sawtooth wave function has the identity ((x)) = ∑_{i=0}^{k−1} (((x + i)/k)). Proof: Obviously, if x ∉ ℤ, then (x + i)/k ∉ ℤ for all i. The set of integers 0 through k − 1 covers the entire residue system of k exactly once. So, if x ∈ ℤ, then there is exactly one value of i such that (x + i)/k ∈ ℤ. If x ∉ ℤ, then ((x)) = x − [x] − ½ and (((x + i)/k)) = (x + i)/k − [(x + i)/k] − ½. Since ∑_{i=0}^{k−1} [x/k + i/k] = [x] by Hermite's identity, ∑_{i=0}^{k−1} {(x + i)/k − [(x + i)/k] − ½} = x − [x] − ½. If x ∈ ℤ, then ((x)) − ½ = x − [x] − ½ and (((x + i)/k)) − ½ = (x + i)/k − [(x + i)/k] − ½ for exactly one term in the sum. It follows that the identity holds for all x. b) The Dedekind sum has the identity 12q·s(1, q) = q² − 3q + 2. Proof: Since p = 1, we have 12q·s(1, q) = 12q∑_{i=1}^{q−1} (i/q − ½)² = 12∑_{i=1}^{q−1} {i²/q − i + q/4} = q² − 3q + 2. Let S = s(p, q) + s(q, p) = ∑_{i=0}^{q−1} ((i/q))((pi/q)) + ∑_{j=0}^{p−1} ((j/p))((qj/p)). Then, S = ∑_{j=0}^{p−1} ∑_{i=0}^{q−1} ((i/q + j/p))(i/q + j/p − 1). Proof: Since ((x)) = 0 when x = 0, we can add in terms for which i = 0 and j = 0. For all other terms, let x = pi/q and y = qj/p. Then, ((x)) = ∑_{j=0}^{p−1} (((x + j)/p)) = ∑_{j=0}^{p−1} ((i/q + j/p)) and ((y)) = ∑_{i=0}^{q−1} (((y + i)/q)) = ∑_{i=0}^{q−1} ((i/q + j/p)). Therefore, S = ∑_{i=0}^{q−1} ((i/q))∑_{j=0}^{p−1} ((i/q + j/p)) + ∑_{j=0}^{p−1} ((j/p))∑_{i=0}^{q−1} ((i/q + j/p)) = ∑_{i=0}^{q−1} ∑_{j=0}^{p−1} ((i/q + j/p))((i/q)) + ∑_{j=0}^{p−1} ∑_{i=0}^{q−1} ((i/q + j/p))((j/p)).
Since the indices in the double sums are disjoint, they are interchangeable and hence, the sums can be combined. So, S = ∑_{i=0}^{q−1} ∑_{j=0}^{p−1} ((i/q + j/p)){((i/q)) + ((j/p))}. Our expression for S can be simplified further. To see this, we partition the sum at values for which its indices are zero, so that ((i/q)) = i/q − ½ and ((j/p)) = j/p − ½ for all terms in the sum except for the term with i = j = 0. Since the value of this term is

zero, we do not include it in the sums. That is, S = ∑_{i=1}^{q−1} ∑_{j=1}^{p−1} ((i/q + j/p))(i/q + j/p − 1) + ∑_{i=1}^{q−1} (i/q − ½)(i/q − ½) + ∑_{j=1}^{p−1} (j/p − ½)(j/p − ½). Since ∑_{i=1}^{q−1} (i/q − ½) = 0 and ∑_{j=1}^{p−1} (j/p − ½) = 0, ∑_{i=1}^{q−1} (i/q − ½)(i/q − ½) + ∑_{j=1}^{p−1} (j/p − ½)(j/p − ½) = ∑_{i=1}^{q−1} (i/q − ½)(i/q − c) + ∑_{j=1}^{p−1} (j/p − ½)(j/p − c) for any real constant, c. Setting c = 1 yields the desired result. So, ∑_{i=1}^{q−1} (i/q − ½)(i/q − ½) + ∑_{j=1}^{p−1} (j/p − ½)(j/p − ½) = ∑_{i=1}^{q−1} ((i/q + 0/p))(i/q + 0/p − 1) + ∑_{j=1}^{p−1} ((0/q + j/p))(0/q + j/p − 1). Therefore, S = ∑_{j=0}^{p−1} ∑_{i=0}^{q−1} ((i/q + j/p))(i/q + j/p − 1). Let s′ = ∑_{j=0}^{p−1} ∑_{i=0}^{q−1} {i/q + j/p − 1 − ((i/q + j/p))}². Then, s′ = (pq + 3)/4. Proof: Suppose i/q + j/p ∈ ℤ for some i and j, so that pi + qj = kpq. Since (p, q) = 1, i = kq − qj/p ∉ ℤ for 0 < j < p and j = kp − pi/q ∉ ℤ for 0 < i < q. Hence i/q + j/p ∈ ℤ in the sum only for i = j = 0. So, for (i, j) ≠ (0, 0), (i/q + j/p − 1) − ((i/q + j/p)) = i/q + j/p − 1 − (i/q + j/p − [i/q + j/p] − ½) = [i/q + j/p] − ½. If i = j = 0, then {i/q + j/p − 1 − ((i/q + j/p))}² = 1 and {[i/q + j/p] − ½}² = ¼. So, s′ = ∑_{j=0}^{p−1} ∑_{i=0}^{q−1} {[i/q + j/p] − ½}² + ¾. Our expression for s′ can be simplified further since {[i/q + j/p] − ½}² = [i/q + j/p]² − [i/q + j/p] + ¼ = [i/q + j/p]([i/q + j/p] − 1) + ¼. Since 0 ≤ i/q + j/p < 2, [i/q + j/p] is either 0 or 1. In either event, [i/q + j/p]([i/q + j/p] − 1) = 0 and hence, s′ = ∑_{j=0}^{p−1} ∑_{i=0}^{q−1} ¼ + ¾ = (pq + 3)/4. Let s′ = ∑_{j=0}^{p−1} ∑_{i=0}^{q−1} {i/q + j/p − 1 − ((i/q + j/p))}², x = ∑_{j=0}^{p−1} ∑_{i=0}^{q−1} (i/q + j/p − 1)², and y = ∑_{j=0}^{p−1} ∑_{i=0}^{q−1} ((i/q + j/p))². Then, s′ = x + y − 2S with x = (pq + 3)/6 + (p² + q²)/(6pq), y = (p²q² − 3pq + 2)/(12pq), and S = (p² − 3pq + q² + 1)/(12pq). Proof: Since {i/q + j/p − 1 − ((i/q + j/p))}² = (i/q + j/p − 1)² − 2((i/q + j/p))(i/q + j/p − 1) + ((i/q + j/p))², s′ = x + y − 2S.
Since (i/q + j/p − 1)² = i²/q² + 2ij/(pq) − 2i/q + j²/p² − 2j/p + 1, we can use the summation formulas for powers of i and j to show that x = (pq + 3)/6 + (p² + q²)/(6pq). Since i/q + j/p = (pi + qj)/(pq) and (p, q) = 1, by Bezout's lemma and the Chinese remainder theorem, for every integer k between 0 and pq − 1, there is exactly one 0 ≤ i ≤ q − 1 and 0 ≤ j ≤ p − 1 such that k ≡ pi + qj mod pq, given by the solution to the simultaneous congruence k ≡ pi mod q and k ≡ qj mod p. Let k ≡ pi + qj mod pq. Then, k/(pq) − (pi + qj)/(pq) ∈ ℤ and hence ((k/(pq))) = (((pi + qj)/(pq))) = ((i/q + j/p)) for all i and j in the sum. So, the sequences for the sums ∑_{j=0}^{p−1} ∑_{i=0}^{q−1} ((i/q + j/p))² and ∑_{k=0}^{pq−1} ((k/(pq)))² have the same set of terms (in some order) and hence the sums are equal. Since y = ∑_{j=0}^{p−1} ∑_{i=0}^{q−1} ((i/q + j/p))² = ∑_{k=0}^{pq−1} ((k/(pq)))² = ∑_{k=1}^{pq−1} ((k/(pq)))((k/(pq))) = s(1, pq), y = (p²q² − 3pq + 2)/(12pq). Since s′ = x + y − 2S, s′ = (pq + 3)/6 + (p² + q²)/(6pq) + (p²q² − 3pq + 2)/(12pq) − 2S = (pq + 1)/4 + (p² + q² + 1)/(6pq) − 2S. Since s′ = (pq + 1)/4 + (p² + q² + 1)/(6pq) − 2S = (pq + 3)/4, we solve for S to obtain the desired result. Therefore, S = (p² − 3pq + q² + 1)/(12pq) = s(p, q) + s(q, p) and the reciprocity theorem is proved.
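The reciprocity law just proved can be spot-checked with exact arithmetic. The sketch below (ours, not part of Dieter's proof) evaluates both Dedekind sums from the sawtooth definition and confirms 12pq{s(p, q) + s(q, p)} = p² − 3pq + q² + 1 for several coprime pairs.

```python
from fractions import Fraction
from math import gcd

def s(p, q):
    # Dedekind sum via the sawtooth definition, exact rational arithmetic
    def saw(n, d):  # ((n/d))
        return Fraction(0) if n % d == 0 else Fraction(n % d, d) - Fraction(1, 2)
    return sum(saw(i, q) * saw(p * i, q) for i in range(1, q))

# reciprocity: 12pq{s(p,q) + s(q,p)} = p^2 - 3pq + q^2 + 1 for (p, q) = 1
for p, q in [(3, 5), (5, 7), (4, 9), (19, 81)]:
    assert gcd(p, q) == 1
    assert 12 * p * q * (s(p, q) + s(q, p)) == p * p - 3 * p * q + q * q + 1
```

Note that the law holds for any coprime pair, not just primes, which is what makes the Euclidean-style reduction in the next paragraph possible.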

We can use the reciprocity formula to reduce the values of p and q while preserving the value of s(p, q), and we modify it slightly for that purpose. First, s(p, q) + s(q, p) = (p² − 3pq + q² + 1)/(12pq) = ((p − q)² + 1)/(12pq) − 1/12. Second, s(p, q) = ((p − q)² + 1)/(12pq) − 1/12 − s(q − pk, p). By taking advantage of this property, we can make the computation process for the Dedekind sum similar to the Euclidean algorithm by alternately reducing one argument modulo the other and then switching them. (In fact, Dedekind sums can be expressed as finite sums in terms of the Euclidean remainders of p and q. This expression for Dedekind sums is introduced in the exercises.) Since (p, q) = 1, after a finite number of iterations, we can reduce the sum to s(1, n) = ((n − 1)² + 1)/(12n) − 1/12 for some n ∈ ℕ. In fact, since we can evaluate s(−r, q), we can even reduce p below zero in modulo q. For this reason, we let kq be p rounded to the nearest q and we do so by setting k = [p/q + ½]. Example 1: Evaluate s(586, 667). Assume (586, 667) = 1. Solution: It follows that s(586, 667) = s(586 − 667[586/667 + ½], 667) = s(−81, 667) = −s(81, 667); −s(81, 667) = −((81 − 667)² + 1)/(12·81·667) + 1/12 + s(667 − 81[667/81 + ½], 81) = −343,397/648,324 + 1/12 + s(19, 81); s(19, 81) = ((19 − 81)² + 1)/(12·19·81) − 1/12 − s(81 − 19[81/19 + ½], 19) = 3845/18,468 − 1/12 − s(5, 19); −s(5, 19) = −((5 − 19)² + 1)/(12·5·19) + 1/12 + s(19 − 5[19/5 + ½], 5) = −197/1140 + 1/12 + s(−1, 5); and −s(1, 5) = −((5 − 1)² + 1)/(12·5) + 1/12 = −17/60 + 1/12. So, our sum total is s(586, 667) = −343,397/648,324 + 1/12 + 3845/18,468 − 1/12 − 197/1140 + 1/12 − 17/60 + 1/12 = −815/1334.
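The reduction in Example 1 can be turned into a loop. The sketch below (our Python rendering of the procedure, assuming (p, q) = 1 and q > 1) rounds p to the nearest multiple of q, flips with reciprocity, and stops at the base case s(1, n).

```python
from fractions import Fraction

def dedekind_sum_fast(p, q):
    # Evaluate s(p, q) for (p, q) = 1, q > 1, Euclidean-style:
    # round p to the nearest multiple of q with k = [p/q + 1/2],
    # use s(-p, q) = -s(p, q), then flip with reciprocity.
    sign = 1
    total = Fraction(0)
    while True:
        k = (2 * p + q) // (2 * q)           # k = [p/q + 1/2]
        p -= k * q                           # reduce p toward 0 mod q
        if p < 0:
            p, sign = -p, -sign              # s(-p, q) = -s(p, q)
        if p == 1:                           # base case s(1, q)
            total += sign * (Fraction((q - 1) ** 2 + 1, 12 * q) - Fraction(1, 12))
            return total
        # flip: s(p, q) = ((p - q)^2 + 1)/(12pq) - 1/12 - s(q, p)
        total += sign * (Fraction((p - q) ** 2 + 1, 12 * p * q) - Fraction(1, 12))
        sign = -sign
        p, q = q, p

assert dedekind_sum_fast(586, 667) == Fraction(-815, 1334)
```

Tracing the loop on (586, 667) reproduces exactly the chain of fractions in Example 1, in O(log q) reciprocity steps rather than a sum of q − 1 terms.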

Although this section only provides an introduction to the topic, Dedekind Sums by Hans Rademacher and Emil Grosswald is a comprehensive and remarkable resource. It starts by providing four different proofs of Dedekind reciprocity, and their diverse approaches can be summarized as follows: the first is a clever elementary

proof and for this reason it is used in this work; the second uses lattice-point enumeration in three dimensions; the third uses the finite Fourier series for the sawtooth wave function; and the fourth uses a reciprocity law of Riemann-Stieltjes integration. Next, it offers many arithmetic properties of Dedekind sums, providing several identities involving the greatest integer function and/or the Jacobi symbol. The latter verify our results from quadratic reciprocity. Then, it describes modular transforms, including the derivation of the Dedekind sum from the Dedekind eta function. It elaborates on the use of Dedekind sums in the study of modular groups, lattice-point enumeration of polyhedrons, serial correlations, and the theory of partitions of an integer. It concludes the latter topic with a famous result by Hardy and Ramanujan. Finally, it discusses the history of Dedekind sums, most of which refers to various attempts at generalizing them either for purely theoretical research or for solving specialized cases of problems arising in some of the applications listed above. We believe two applications of Dedekind reciprocity are broad enough that they deserve further treatment here, so we elaborate on them below.

The most well-known application for Dedekind reciprocity in number theory is the lattice-point enumeration of polyhedrons. A polyhedron is any three-dimensional geometric structure created and bounded by the intersection of planes. The Dedekind reciprocity formula leads to the identity p∑_{i=1}^{q−1} i[pi/q] + q∑_{i=1}^{p−1} i[qi/p] = (p − 1)(q − 1)(8pq − p − q − 1)/12 for (p, q) = 1. The proof is left as an exercise. This identity allows us to evaluate ∑_{i=1}^{q−1} i[pi/q] for large p and q, which is an end in itself, but it can also be used to determine the number of lattice points inside or on a staircase rectangular pyramid whose base is given by the coordinates (0, 0, 0), (0, p, 0), (q, 0, 0), and (p, q, 0) and whose summit is given by the coordinates (0, 0, q). A more general result is given in the following theorem by Louis Mordell. Let i, j, k ∈ ℕ be pairwise coprime and n(i, j, k) be the number of lattice points in the tetrahedron given by the coordinates (0, 0, 0), (i, 0, 0), (0, j, 0), and (0, 0, k). Then, n(i, j, k) = ijk/6 − s(jk, i) − s(ki, j) − s(ij, k) + (i + j + k + jk + ki + ij)/4 + (1/(ijk) + jk/i + ki/j + ij/k)/12 + 2. A proof is given in Dedekind Sums. If i, j, and k are not pairwise coprime, then we can still determine the number of lattice points inside or on the tetrahedron after making a series of adjustments, including dropping the condition that the arguments are coprime.
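Both counting results lend themselves to numerical checks. The sketch below (ours, not the book's; it re-derives s(p, q) from the sawtooth definition) verifies the weighted floor-sum identity for several coprime pairs and checks Mordell's formula, with constant term +2, against direct enumeration of the tetrahedron.

```python
from fractions import Fraction

def s(p, q):
    # Dedekind sum via the sawtooth definition
    def saw(n, d):
        return Fraction(0) if n % d == 0 else Fraction(n % d, d) - Fraction(1, 2)
    return sum(saw(i, q) * saw(p * i, q) for i in range(1, q))

def weighted_floor_sum(p, q):
    # sum_{i=1}^{q-1} i * [p*i/q]
    return sum(i * (p * i // q) for i in range(1, q))

# identity: p*sum i[pi/q] + q*sum i[qi/p] = (p-1)(q-1)(8pq - p - q - 1)/12
for p, q in [(2, 3), (5, 7), (19, 81)]:
    lhs = p * weighted_floor_sum(p, q) + q * weighted_floor_sum(q, p)
    assert 12 * lhs == (p - 1) * (q - 1) * (8 * p * q - p - q - 1)

def mordell_count(i, j, k):
    # Mordell's count for the tetrahedron (0,0,0), (i,0,0), (0,j,0), (0,0,k)
    n = (Fraction(i * j * k, 6) - s(j * k, i) - s(k * i, j) - s(i * j, k)
         + Fraction(i + j + k + j * k + k * i + i * j, 4)
         + (Fraction(1, i * j * k) + Fraction(j * k, i) + Fraction(k * i, j)
            + Fraction(i * j, k)) / 12 + 2)
    assert n.denominator == 1
    return int(n)

def brute_count(i, j, k):
    # direct enumeration of x/i + y/j + z/k <= 1 with x, y, z >= 0
    return sum(1 for x in range(i + 1) for y in range(j + 1) for z in range(k + 1)
               if x * j * k + y * i * k + z * i * j <= i * j * k)

assert mordell_count(2, 3, 5) == brute_count(2, 3, 5) == 18
```

Direct enumeration agrees with the formula only with the constant +2, which is why the theorem is stated with that sign.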

We conclude this section with the fantastic result that we can use Dedekind reciprocity to find a general solution to all solvable linear congruences, ax ≡ b mod c. In the next few paragraphs, we develop the theory to express x not implicitly in terms of a multiplicative inverse, but explicitly as a sum involving the greatest integer function. While the summation expression promises a Euclidean-like algorithm for evaluation, its main advantage is providing an explicit arithmetic expression for x for use in proofs. Both 12q·s(p, q) and 12p·s(q, p) are integers. Proof: It suffices to prove 12q·s(p, q) ∈ ℤ because the latter case is the same except for a change in the order of variables. We proved above that s(p, q) = ∑_{i=1}^{q−1} (i/q − ½)(pi/q − [pi/q] − ½) and 12q∑_{i=1}^{q−1} (i/q − ½)² = q² − 3q + 2. But (i/q − ½)(pi/q − [pi/q] − ½) = (i/q − ½)(pi/q − p/2 + p/2 − [pi/q] − ½) = (i/q − ½)(pi/q − p/2) + (i/q − ½)(p/2 − [pi/q] − ½) = p(i/q − ½)² + (i/q − ½)(p/2 − [pi/q] − ½). And 12q∑_{i=1}^{q−1} {p(i/q − ½)² + (i/q − ½)(p/2 − [pi/q] − ½)} = p(q² − 3q + 2) + 3∑_{i=1}^{q−1} {(2i − q)(p − 2[pi/q] − 1)} ∈ ℤ. Since both 12q·s(p, q) and 12p·s(q, p) are integers, we can reduce the reciprocity formula in modulo q. We have 12pq{s(p, q) + s(q, p)} ≡ p² − 3pq + q² + 1 mod q; p{12q·s(p, q)} + q{12p·s(q, p)} ≡ p² + q(q − 3p) + 1 mod q; p{12q·s(p, q)} ≡ p² + 1 mod q. Since (p, q) = 1, p has a multiplicative inverse in modulo q and hence the congruence can be further simplified. Multiplying both sides by p⁻¹ yields 12q·s(p, q) ≡ p + p⁻¹ mod q. The congruence, ax ≡ b mod c, is solvable iff d ∣ b where d = (a, c). Proof: This is an immediate corollary of Bezout's lemma. Next we assume d ∣ b and solve the congruence (a/d)x ≡ (b/d) mod (c/d).
Since (a/d, c/d) = 1, x ≡ (a/d)⁻¹(b/d) mod (c/d); 12(c/d)s(a/d, c/d) ≡ (a/d) + (a/d)⁻¹ mod (c/d); 12(b/d)(c/d)s(a/d, c/d) ≡ (a/d)(b/d) + (a/d)⁻¹(b/d) mod (c/d); x ≡ 12(b/d)(c/d)s(a/d, c/d) − (a/d)(b/d) ≡ (a/d)⁻¹(b/d) mod (c/d); and x ≡ (b/d){12(c/d)s(a/d, c/d) − (a/d)} mod (c/d). Example 2: Solve 7032x ≡ 60 mod 8004. Solution: First, we divide both sides by (7032, 8004) = 12 to get 586x ≡ 5 mod 667. Next, we set 586 = a/d, 5 = b/d, and 667 = c/d and plug these numbers into the formula for x. We get x ≡ 5{12(667)s(586, 667) − 586} mod 667. Since s(586, 667) = −815/1334 from Example 1, we get x ≡ −27,380 ≡ 634 mod 667.
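The explicit formula for x can be sketched end to end in Python (our illustration; it evaluates s(a/d, c/d) directly from the sawtooth definition, which is practical only for moderate moduli).

```python
from fractions import Fraction
from math import gcd

def dedekind_s(p, q):
    # direct sawtooth definition, exact arithmetic (fine for small q)
    def saw(n, d):
        return Fraction(0) if n % d == 0 else Fraction(n % d, d) - Fraction(1, 2)
    return sum(saw(i, q) * saw(p * i, q) for i in range(1, q))

def solve_congruence(a, b, c):
    # Solve ax = b (mod c) via x = (b/d){12(c/d)s(a/d, c/d) - (a/d)} (mod c/d)
    d = gcd(a, c)
    if b % d != 0:
        return None                       # no solutions when d does not divide b
    a, b, c = a // d, b // d, c // d
    x = b * (12 * c * dedekind_s(a, c) - a)
    assert x.denominator == 1             # 12q*s(p, q) is always an integer
    return int(x) % c, c                  # solution x and its modulus c/d

assert solve_congruence(7032, 60, 8004) == (634, 667)
```

The assertion reproduces Example 2: d = 12, s(586, 667) = −815/1334, and x ≡ 634 mod 667.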

CHAPTER 5: Miscellaneous Topics in Number Theory and Discrete Math

We conclude our treatment of elementary number theory with a variety of important topics. We start by developing elementary theory for finding integer solutions to one-variable equations or determining their existence. Next, we develop theory for approximating lines, curves, and eventually two-dimensional images in color with gridlines (or colored rectangles) on lattice-point diagrams. Finally, we introduce analytic number theory as an appropriate finish to serve as the transition to the analysis topics in the remaining chapters of Part II.

Section 5.1: Finding Integer Solutions to One-Variable Equations

Finding integer solutions to equations, or even confirming their existence, is very important in number theory and discrete mathematics. This section develops the mathematical machinery necessary for a simple trial-and-error process for finding integer solutions or integer approximations to any explicit function using a graphing utility. It concludes with algorithms for finding integer solutions to some simple types of equations. Let us define an integer solution to y = f(x) as an ordered pair, (x, y), such that y = f(x) and x, y ∈ ℤ. Then, ([x], f([x])) is an integer solution to y = f(x) iff f([x]) − [f([x])] = 0. Proof: Let [x] = k so that y = f(k) at integer values of x. Suppose (k, f(k)) is an integer solution to y = f(x). Since k, f(k) ∈ ℤ, f(k) − [f(k)] = 0. Suppose f(k) − [f(k)] = 0. Then k, f(k) ∈ ℤ. Hence, (k, f(k)) is an integer solution to y = f(x). From this point forward, we refer to the equation, y = f(x), as a Diophantine equation because it is an equation for which integer solutions are sought. The name is derived from Diophantus of Alexandria, who studied such equations in the third century. Although an implicit solution for x exists, there is no general method for finding x other than calculator intersection. No large interval of f([x]) − [f([x])] is invertible because the function can fail the horizontal line test infinitely many times, potentially at every integer. In general, f([x]) − [f([x])] has no recognizable pattern or form of continuity. Likewise, Newton's method and even the bisection method fail. Fortunately, there is a step-by-step procedure for solving the implicit solution using calculator intersection on the TI-83. First, determine the most appropriate f(x) for the exercise. This step may involve consulting a number theory text. If f(x) cannot be expressed explicitly, then determine f⁻¹(x); either way, if x, y ∈ ℤ and their values are reversed, they should still be integers. Then set Y1 = f(X)−int(f(X)).
Otherwise, set Y1 = f(int(X))−int(f(int(X))). Next, set the X- and Y-windows on your graphing utility. Set the windows so that Xmin is the first integer the calculator tries, Xmax = Xmin + 94, Ymin = 0, and Ymax = 1×10⁻⁶ (1E−6). If your calculator or computer is not a TI-83, then choose an interval for x that does not exceed the number of points on the graph. In other words, the interval for x should not be so large that your calculator or computer cannot plot Y1 for consecutive values of [x]. We recommend setting the Y-window from Ymin = 0 to Ymax = 1×10⁻⁶ so that just about every non-integer solution is eliminated. In other words, we are only interested in the zeros of Y1. And by setting Ymax so close to zero, its zeros are much easier to spot. Since the graphing utility usually 'connects the dots', when Y1 declines to zero from a value above 1×10⁻⁶, its graph over this razor-thin strip looks like a vertical band at that value of x, so the vertical bands mark discontinuities in the graph at zeros. Ironically, when Y1 is graphed over this razor-thin strip, every point on the graph, including its zeros, is invisible. Our reason for setting the X-window from Xmin = 0 to Xmax = 94 (or from Xmin = 1 to Xmax = 95 for positive integer solutions) is the pixel dimensions of the screen on the TI-83. This interval only allows for integer x-coordinates, meaning we can set Y1 = f(X)−int(f(X)). In fact, if our interval is longer than 94, the TI-83 cannot sample all integer x-coordinates over the interval, meaning some integer solutions could be lost! For this reason, we can test a maximum of 95 integer x-coordinates per interval. Make sure the calculator is set to graph on connect; this is usually the default. Then, graph Y1. Expect to see either nothing or vertical lines on the screen. A solution exists where a vertical line appears on the screen. Finally, trace the cursor to the x-coordinate of this line.
The greatest integer of the x-coordinate of this line should be the integer solution for x. Always check each solution against y = f(x). In the following example, we illustrate each of the steps involved in finding integer solutions. Example 1: Solve 513xxx + xxxyyy + yyy12 = 1,164,561 where xxx, yyy ∈ ℕ are each 3-digit numbers and juxtaposition denotes concatenation of digits. Solution: It follows from the definition of the decimal digit that 513,000 + 12 + 1000xxx + xxx + 100yyy + yyy = 1,164,561. Hence, 1001xxx + 101yyy = 651,549. At this point, we can substitute x for xxx and y for yyy. We then solve for y to obtain y = (651,549 − 1001x)/101. Unfortunately, since the equation yields integer solutions at

every interval of 101, most of the solutions will not yield a 3-digit solution for y. On the other hand, this method is faster than trying 1001 integers for solutions to x = (651,549 − 101y)/1001. If we can determine a solution between 0 and 101, then we can just add 101 to our initial solution enough times to decrease y to three digits and obtain the solution. Thus, the trial-and-error process is still severely reduced. On the TI-83, we set Y1 = (651549−1001X)/101−int((651549−1001X)/101). Next, we graph Y1 and look for a vertical line on the screen. One vertical line appears on the screen. Then, we trace the line to X = 90 using the right and left arrow buttons. Finally, we search for an integer solution to y = (651,549 − 1001x)/101 such that x = 90 + 101k and x and y are integers between 0 and 999. We set the interval on the table to 101 (DeltaTbl = 101) and start at 90 (TblStart = 90). We find our solution by scrolling down the table. At x = 595 and y = 554, the solution holds. Since this is our only pair of 3-digit integer solutions, xxx = 595 and yyy = 554. Example 2: Evaluate τ(57,181), the number of positive factors of 57,181, by factoring 57,181. Solution: Our equation is y = 57,181/x. Since 57,181 is odd, all of its factors are odd. So, our X-window begins at 1 (to test it since 1 must be a factor) and consists of all consecutive odd numbers. Since the difference between consecutive odd numbers is two units, the interval ends at 1 + 94·2, or 189. If no factors exist on this window, then we increase both Xmin and Xmax by increments of 190 until Xmax exceeds √57,181. The first window presents a vertical line at x = 1 as expected, but no others. So, we examine the next window, which extends from 191 to 379. See the graph above. Since 379 > √57,181, all non-trivial factors of 57,181 must exist on or before this window. Two vertical lines appear on the screen. One is traced to x = 211 and the other is traced to x = 271. Since 211 is the smallest factor greater than 1, 211 is a prime.
Since 211 ∤ 271, 271 must also be a prime. Since 211·271 = 57,181 and both 211 and 271 are primes, 57,181 cannot have any other factors. So, 57,181 has four positive factors and τ(57,181) = 4.
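The calculator scan can be mirrored in a few lines of Python (our sketch, not the book's procedure): we simply test whether f(x) is an integer at every candidate x, which is what the vertical bands on the TI-83 reveal.

```python
# Plot-free analogue of the TI-83 search: find x where f(x) - [f(x)] = 0,
# i.e. where f(x) is an integer.
def integer_solutions(f, xs):
    sols = []
    for x in xs:
        y = f(x)
        if y == int(y):                   # fractional part is zero
            sols.append((x, int(y)))
    return sols

# Example 1: y = (651549 - 1001x)/101 with x and y both 3-digit numbers
sols = integer_solutions(lambda x: (651549 - 1001 * x) / 101, range(100, 1000))
three_digit = [(x, y) for x, y in sols if 100 <= y <= 999]
assert three_digit == [(595, 554)]
```

Because the machine tests every integer x directly, no window arithmetic is needed; the 95-pixel constraint is an artifact of the calculator screen.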

We can also use our implicit solution to find integer approximations when no integer solutions exist. There are some very good rational approximations for irrational numbers. That is, an irrational fraction can be approximated as an integer numerator over an integer denominator. When approximating irrational numbers on an interval, we might need to adjust the window. And since some very close approximations fall below zero, we should also look for vertical lines on the interval [1 − f, 1] for values of 1 − f close to 1. All vertical line segments in previous examples extended from top to bottom on the screen. However, here, part of the line might be missing! As a rule of thumb, when approximating 0 for overestimates, set Ymin = 0 and Ymax below 0.05 and when approximating 1 for underestimates, set Ymin above 0.95 and Ymax = 1. If no vertical segments are visible over the interval, then widen the window by either increasing the value of Ymax when approximating 0 or decreasing the value of Ymin when approximating 1. Since we want a concise approximation, we rarely want the numerator to exceed two digits. For this reason, we recommend setting Xmin = 6 and Xmax = 100. If a fraction with a numerator between 1 and 5 is a sufficient approximation, then we can usually determine it by inspection. On the other hand, if no fraction with a numerator between 1 and 100 is a sufficient rational approximation for our purposes, then continued fractions are a better alternative. Aside from these adjustments, finding rational approximations for irrational numbers over a specified interval is very similar to our method for finding integer solutions to equations. Since the fraction part of any number is by definition on the interval (0,1), the denominator is larger than the numerator. Given the irrational number to at least six decimal places in scientific notation and the numerator, there is enough information to numerically determine the optimal integer denominator. 
To make the most of the window, we choose the numerator as our independent variable and determine the numerators of the best approximations over a specified interval. We choose an integer numerator to approximate the fraction part of the irrational number, f. Since f can be approximated by an integer numerator over an integer denominator, given the numerator, x, we want to find its denominator, f(x). So, f = x/f(x) and f(x) = x/f. Example 4: Determine rational approximations for the fraction part of π. Solution: Since the reciprocal of the fraction part of π is 1/(π − [π]) ≈ 7.062513305, we set Y1 = 7.062513305X. We first determine rational approximations below the fraction part by setting the Y-window to Ymin = 0.95 and Ymax = 1. There are no vertical line segments on this window. In fact, no lines appear until Ymin is lowered to 0.93. The line segments are 16 units apart, each one unit to the left of a multiple of 16. The line

segment extending closest to 1 over [6, 100] is at x = 95, so our approximation for π is 3 + 95/671 and our error is about 1.29 × 10⁻⁵. We are guaranteed that our approximation is in lowest terms. If it were not in lowest terms, then a smaller numerator would offer the same approximation with a larger fraction part, since the fraction part decreases linearly as x increases; we explain the reason why later on. This means that the line segment at that value of x would extend even closer to 1, a contradiction. Next, we set Ymin = 0 and Ymax = 0.05 to determine rational approximations above the fraction part of π. There are many vertical line segments on this window. Since they all occur at multiples of 16, our approximation for π is 3 + 16/113 = 355/113 and our error is about 2.67 × 10⁻⁷. Since this approximation of π is so close for such a small numerator, it is among the famous rational approximations of π. Notice that the numerators for our underestimates are congruent modulo 16. In general, if f(x) is a linear equation with a positive slope, then the transformation f(x) − [f(x)] varies directly with x as its value increases to 1. Then its value falls to a number slightly above 0 and increases again. Therefore, for transformations of linear equations with a positive slope, if x produces an optimally high y on a specific interval, then we should expect that x + 1 produces an optimally low y on the same interval. The converse also holds. Suppose we want to approximate a fraction part, f ∉ ℚ, as a rational number within a specified tolerance. Then, our approximations can vary between a fixed upper bound, u, and a fixed lower bound, b, where 0 < b < u < 1. If we approximate the lower bound with overestimates, each of which is below the upper bound, then we can approximate f within a specified tolerance using just one window.
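The window search of Example 4 can be replayed numerically. In the sketch below (helper name ours), the ideal real denominator for numerator x is y = x/f; truncating y yields an overestimate x/[y] (the near-0 segments), while rounding y up yields an underestimate (the near-1 segments):

```python
import math

def best_fraction_approximations(f, x_min=6, x_max=100):
    """For each numerator x, the ideal denominator is y = x/f.
    floor(y) gives an overestimate x/[y]; ceil(y) gives an underestimate.
    Return the best of each over the numerator range."""
    xs = range(x_min, x_max + 1)
    over = min(((x, math.floor(x / f)) for x in xs),
               key=lambda p: p[0] / p[1] - f)      # smallest positive error
    under = min(((x, math.ceil(x / f)) for x in xs),
                key=lambda p: f - p[0] / p[1])
    return over, under

f = math.pi - 3                                    # fraction part of pi
print(best_fraction_approximations(f))             # ((16, 113), (95, 671))
```

The overestimate 16/113 recovers π ≈ 355/113, and the underestimate 95/671 matches the segment found at x = 95 above.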
Since the denominators of overestimates have been truncated, we can approximate b with the proper fraction x/[y], where x ∈ ℕ and y = x/b, iff b < x/[y] < u. The fraction x/[y] approximates f within the specified tolerance iff 0 < y − [y] < [y](u − b)/b at x. Proof: We can derive this inequality from the inequality above. If b < x/[y] < u, then [y]b/b < x/b < [y]u/b; [y] < y < [y]u/b; 0 < y − [y] < [y](u/b − 1); and hence 0 < y − [y] < [y](u − b)/b. Since the two expressions are mathematically equivalent, the converse holds. Hence, the vertical line segment at x intersects y = [x/b](u − b)/b iff b < x/[y] < u. Example 5: The fraction part representing the 'golden ratio' is f = (√5 − 1)/2. Find all rational approximations of f between b = 0.6180 and u = 0.6181 with a positive numerator x < 100. Solution: Since our objective is overestimating b rather than approximating f, we set f(x) = x/b. On the TI-83, we set Y1 = X/0.618 − [X/0.618], Y2 = [X/0.618]·0.0001/0.618, Xmin = 5, Xmax = 99, Ymin = 0, and Ymax = 0.05. Four vertical line segments shoot down the screen over the interval, but only the segment at x = 89 penetrates Y2. So, the only proper fraction between 0.6180 and 0.6181 with a numerator less than 100 is 89/144. It is not surprising that 89 and 144 are consecutive Fibonacci numbers, because Fₙ/Fₙ₊₁ → (√5 − 1)/2 as n → ∞. Also note that the graph of Y2 is almost linear for large x. For this reason, Y2 ≈ (u − b)x/b as long as 0 < Y2 < 1.
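Example 5's graphical search can also be carried out numerically. A minimal sketch (function name ours): the vertical segment at x penetrates Y2 exactly when b < x/[y] < u, which we can test directly.

```python
import math

def rational_approximations(b, u, x_min=5, x_max=99):
    """Find proper fractions x/k with b < x/k < u by scanning numerators.
    For each x, the truncated denominator is k = [x/b]; the segment at x
    'penetrates Y2' iff 0 < y - k < k*(u - b)/b, i.e. iff b < x/k < u."""
    hits = []
    for x in range(x_min, x_max + 1):
        k = math.floor(x / b)
        if k > 0 and b < x / k < u:
            hits.append((x, k))
    return hits

print(rational_approximations(0.6180, 0.6181))   # [(89, 144)]
```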

Finding integer solutions to linear equations of the form y = (ax + b)/c, where a, b, c ∈ ℤ, is indispensable in number theory. We already covered the applications of expressing (a, c) as a linear combination of a and c and solving linear modular equations. Also, methods for solving some types of non-linear modular equations involve reducing them to a linear modular equation or a linear simultaneous congruence, which is a system of linear congruences of the form a₁x ≡ b₁ mod c₁, ..., aᵢx ≡ bᵢ mod cᵢ, ..., aₙx ≡ bₙ mod cₙ. Linear modular equations and linear simultaneous congruences arise in other areas of mathematics, and in specialized instances they are applied in computer science, chemistry, and physics. Unfortunately, if |a| and |c| are large, then finding integer solutions by trial and error can be time-consuming. Fortunately, we can solve them quite easily using Bézout's lemma. This method is also known as the extended Euclidean algorithm. Before we begin, we review two results from Section 4.1 about integer solutions to linear equations. If d = (a, c), then 1) y = (ax + b)/c is solvable in integers iff d ∣ b, and 2) if (x', y') is an integer solution to y = (ax + b)/c, then (x' + ck/d, y' + ak/d) is also an integer solution to y = (ax + b)/c for every integer k. Using the Euclidean algorithm and Bézout's lemma, we can find integer solutions to every equation of the form (a, c) = ax + cy, provided we keep track of each linear combination of a and c at each iteration of the algorithm. In Section 4.1, we proved not only that d = ax + cy, but also that we can express every iteration of the Euclidean algorithm as a linear combination of a and c. Then, we can arrive at a solution without computing the gcd beforehand or using back substitution! In the following example, we leave a and c as variables on the right-hand side to avoid confusion and use them in place of their potentially large numerical values. Example 6: Find an (x, y) ∈ ℤ × ℤ for which a.
(5007, 3000) = 5007x + 3000y and b. y = (5007x + 57)/3000.

Solution: a. Set a = 5007 and c = 3000. Then, 5007 − 2·3000 = −993 = 1a − 2c; 3000 + 3(−993) = 21 = (1c) + 3(1a − 2c) = 3a − 5c; −993 + 47·21 = −6 = (1a − 2c) + 47(3a − 5c) = 142a − 237c; 21 + 3(−6) = 3 = (3a − 5c) + 3(142a − 237c) = 429a − 716c; and −6 + 2·3 = 0. Since the algorithm terminates here, (5007, 3000) = 3 and (x, y) = (429, −716) is an integer solution to 3 = 5007x + 3000y. b. If y = (5007x + 57)/3000, then 57 = 5007(−x) + 3000y. Since 57 = 19·3, there are integer solutions of the form 3 = 5007(−x/19) + 3000(y/19) = 5007(429) + 3000(−716), and hence −x/19 = 429 and y/19 = −716. Therefore, (x, y) = (−8151, −13,604) is an integer solution to y = (5007x + 57)/3000.
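The computation in Example 6 can be reproduced with a standard iterative extended Euclidean algorithm. The book's variant allows negative remainders, but the sketch below (helper name ours) reaches the same coefficients:

```python
def extended_gcd(a, c):
    """Return (d, x, y) with d = gcd(a, c) = a*x + c*y,
    tracking each remainder as a linear combination of a and c."""
    old_r, r = a, c
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

d, x, y = extended_gcd(5007, 3000)
print(d, x, y)                      # 3 429 -716

# Part b: y = (5007x + 57)/3000, i.e. 57 = 5007(-x) + 3000y
k = 57 // d                         # 57 = 19*3, so scale the solution by 19
x_b, y_b = -k * x, k * y
print(x_b, y_b)                     # -8151 -13604
```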

Before introducing second-order Diophantine equations, we introduce two general techniques for verifying that a given Diophantine equation either has no solution or cannot have solutions of a particular form. Since many Diophantine equations have no solution, we obviously want to determine whether that is the case before we take the time to look for them. Our first technique involves reducing both sides of the Diophantine equation in small moduli. Our second technique uses quadratic reciprocity on second-order Diophantine equations. Proving that a Diophantine equation must have a solution of a particular form requires much more advanced techniques. We can determine that a Diophantine equation has no solutions by reducing it in a small modulus: if it had solutions, then it would have solutions in every modulus. So, if we find a modulus without a solution, then we have proven that it has no solutions. If the Diophantine equation is expressed as a congruence, then rewrite it in equation form. In addition, it is preferable that all terms in the equation have integer coefficients and crucial that all terms are integer-valued for all integer inputs. As an example, we try to prove that y = √(1,000,003 − x²) has no solutions. First, we rewrite it as x² + y² = 1,000,003 so that all terms on each side of the equation are integer-valued for all integer inputs. Next, we reduce both sides in small moduli. We choose modulo 4, so our equation reduces to x² + y² ≡ 3 mod 4. Since x² ≡ 0 mod 4 when x is even and x² ≡ 1 mod 4 when x is odd, x² + y² is congruent to 0, 1, or 2 modulo 4. Therefore, the circle equation x² + y² = n has no integer solutions for any n ≡ 3 mod 4. Since the purpose of quadratic reciprocity is to determine whether a number is a perfect square in a given modulus, we can use it to determine whether a particular second-order Diophantine equation has solutions. Suppose ax² + b·f(x, y) = n, where a, b, n ∈ ℤ and a, b, n ≠ 0.
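The mod-4 obstruction is easy to verify exhaustively, since x² mod 4 depends only on x mod 4:

```python
# Every value of x^2 + y^2 (mod 4): one representative per residue class suffices
residues = {(x * x + y * y) % 4 for x in range(4) for y in range(4)}
print(residues)         # {0, 1, 2} -- the residue 3 never occurs
print(1_000_003 % 4)    # 3, so x^2 + y^2 = 1,000,003 has no integer solutions
```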
If dℕ and d ∣ b, then ax2 ≡ n mod d and if gcd(a, d) = 1, then x2 ≡ a−1n mod d. Suppose ax2 + bxy + cy2 = n, where a, b, cℤ and gcd(a, b, c) ∣ n. If we fix y, then we have a one-variable quadratic equation with the discriminant, D = b2y2 − 4acy2 + 4an. If ax2 + bxy + cy2 = n has solutions, then D must be a perfect square, so D is a perfect square in any modulus. So, if d ∣ (b2 − 4ac), then D ≡ 4an mod d.

Pell equations are second-order Diophantine equations of the form x² − ny² = 1, where n, x, y ∈ ℕ and √n ∉ ℤ, and they have been studied since antiquity. Pell equations and some Pell-like equations of the form x² − ny² = m for small m ∈ ℤ were studied in ancient times by Diophantus in Egypt and during the early middle ages by Brahmagupta in India. Over the next few centuries, some of Brahmagupta's successors developed the Chakravala method for solving all Pell equations. In 1657, British mathematician William Brouncker gave the first general method known to Europe for solving Pell equations. (In fact, he used continued fractions to find the smallest positive solution to x² − 313y² = 1 in just a few hours!) Euler mistakenly attributed the solution to British mathematician John Pell and called them Pell equations. Interest in Pell equations grew in the 19th century with the development of algebraic number theory, because solving equations of the form x² − ny² = 1 is necessary for finding units in the quadratic ring ℤ[√n]. (The quadratic ring ℤ[√n], for a natural number n that is not a perfect square, includes all elements of the form x + y√n where x, y ∈ ℤ. The norm of each element in the ring is x² − ny², and its units are the elements of norm ±1.) All Pell equations have solutions in positive integers. The proof involves Diophantine approximation and is beyond the scope of this work. An elementary, though fairly long, proof is given in A Friendly Introduction to Number Theory, 2nd ed., by Joseph Silverman; see appendix. However, if m is not a perfect square, then x² − ny² = m might not have solutions. The following example demonstrates how Brouncker used continued fractions to solve Pell equations. Example 7: Find the smallest x, y ∈ ℕ such that x² − 91y² = 1 by finding a rational approximation for √91. Solution: Since x² − 91y² = (x − √91·y)(x + √91·y), we have x − √91·y = 1/(x + √91·y). Even for small positive values

of x and y, 1/(x + √91·y) is near zero. Since x − √91·y ≈ 0, x/y ≈ √91. We find rational approximations for √91 using the technique introduced in Section 4.1. The continued fraction expansion of √91 is [9; 1, 1, 5, 1, 5, 1, 1, 18], where the block 1, 1, 5, 1, 5, 1, 1, 18 repeats. We represent the convergents as improper fractions and plug in the numerator for x and the denominator for y until we find a solution. The first seven convergents to √91 after the initial 9 are 10, 9 + 1/2, 9 + 6/11, 9 + 7/13, 9 + 41/76, 9 + 48/89, and 9 + 89/165. For the latter, 9 + 89/165 = 1574/165 and 1574² − 91·165² = 1. Since we are not guaranteed that x² − 91y² = −1 has solutions, we pause with the continued fractions approach and see if we can prove it has no solutions using the above techniques. Reducing it modulo 4 yields x² + y² ≡ 3 mod 4. We demonstrated above that this congruence has no solutions. Had we been less clever, we could have reduced it modulo 91 and used the Jacobi symbol to show that x² ≡ −1 mod 91 has no solutions. The method of continued fractions is only guaranteed to find all solutions, if any, to x² − ny² = 1, and relying on continued fractions to solve Pell-like equations for other m may cause us to overlook solutions. The reason is that continued fractions provide the best rational approximation, c/d, for a denominator not greater than d. This kind of laser precision is necessary for solving x² − ny² = 1, but perhaps not for solving x² − ny² = 10, so we can expect that the continued fractions method might skip solutions in the latter case. In Chapter 8, we present a more robust technique that we can use for solving Pell-like equations for small |m|.
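Brouncker's procedure can be sketched as follows (helper name ours), using the standard recurrence for the continued fraction of √n and updating convergents until one solves the Pell equation:

```python
import math

def solve_pell(n):
    """Smallest (x, y) in positive integers with x^2 - n*y^2 = 1,
    found by walking the continued fraction expansion of sqrt(n)."""
    a0 = math.isqrt(n)
    m, d, a = 0, 1, a0
    h_prev, h = 1, a0          # convergent numerators h_{k-1}, h_k
    k_prev, k = 0, 1           # convergent denominators
    while h * h - n * k * k != 1:
        m = d * a - m          # standard recurrence for periodic sqrt CFs
        d = (n - m * m) // d
        a = (a0 + m) // d
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k

print(solve_pell(91))   # (1574, 165)
```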

Fortunately, we need only find the smallest integer solution to x² − ny² = 1, because all of its integer solutions can be expressed in terms of the smallest one. Let (x₁, y₁) be the smallest solution to x² − ny² = 1 in positive integers, so that x₁² − ny₁² = 1. Our Pell equation can be rewritten as (x₁ + √n·y₁)(x₁ − √n·y₁) = 1. Raising both sides to the kth power yields (x₁ + √n·y₁)ᵏ(x₁ − √n·y₁)ᵏ = 1. Using induction on k ≥ 2, we can prove that (x₁ + √n·y₁)ᵏ is of the form xₖ + √n·yₖ and (x₁ − √n·y₁)ᵏ is of the form xₖ − √n·yₖ for natural numbers xₖ and yₖ. We omit a formal proof of this result, but you can find one in A Friendly Introduction to Number Theory, 2nd ed., by Joseph Silverman; see appendix. Using a similar approach, we can prove that if (xₘ, yₘ) is an integer solution to x² − ny² = m, then (xₘxₖ + nyₘyₖ, xₘyₖ + yₘxₖ) is also an integer solution to x² − ny² = m. We can simplify our expressions for xₖ and yₖ even further. Since x₁² − ny₁² = 1, x₁ = √(1 + ny₁²), and so x₁ − √n·y₁ = √(1 + ny₁²) − √n·y₁. Since y₁, n ∈ ℕ, ny₁² ≥ 1 and √(1 + ny₁²) + √n·y₁ > 2. Since (√(1 + ny₁²) − √n·y₁)(√(1 + ny₁²) + √n·y₁) = 1, we have √(1 + ny₁²) − √n·y₁ = 1/(√(1 + ny₁²) + √n·y₁), and so 0 < √(1 + ny₁²) − √n·y₁ < ½. Since √(1 + ny₁²) − √n·y₁ = x₁ − √n·y₁, 0 < (x₁ − √n·y₁)ᵏ < ½ for all k ∈ ℕ. Let i = (x₁ − √n·y₁)ᵏ and j = (x₁ + √n·y₁)ᵏ. Then, xₖ = (j + i)/2. Since xₖ ∈ ℕ, j + i ∈ ℕ. Write j = k' + f for an integer k' and fraction part f. Since 0 < i < ½ and j + i ∈ ℤ, i = 1 − f. So, adding i to j effectively rounds j up to the nearest integer. Since 0 < i < ½ for all k ∈ ℕ, [j + ½] also rounds j up to the nearest integer. So, xₖ = [(x₁ + √n·y₁)ᵏ + ½]/2. Similarly, yₖ = (j − i)/(2√n). Since yₖ ∈ ℕ, (j − i)/√n ∈ ℕ. Write j/√n = k' + f. Since 0 < i/√n < ½ and (j − i)/√n ∈ ℤ, i/√n = f. So, subtracting i/√n from j/√n effectively rounds j/√n down to the nearest integer. Therefore, yₖ = [(x₁ + √n·y₁)ᵏ/√n]/2.
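The composition rule above generates every solution from the fundamental one. A sketch in exact integer arithmetic (helper name ours; the closed forms with [·] lose precision in floating point for large k):

```python
def pell_solutions(n, x1, y1, count):
    """Generate the first `count` positive solutions of x^2 - n*y^2 = 1
    from the fundamental solution (x1, y1), using the composition rule
    (x_{k+1}, y_{k+1}) = (x1*x_k + n*y1*y_k, x1*y_k + y1*x_k)."""
    sols = []
    x, y = x1, y1
    for _ in range(count):
        sols.append((x, y))
        x, y = x1 * x + n * y1 * y, x1 * y + y1 * x
    return sols

for x, y in pell_solutions(91, 1574, 165, 3):
    assert x * x - 91 * y * y == 1      # every composed pair still solves it
```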

The ability (or inability) to solve some non-linear Diophantine equations with large arguments is crucial in modern cryptography. In the next few paragraphs, we provide a brief treatment of additional number-theoretic concepts necessary to understand asymmetric cryptosystems. Then, we present two asymmetric cryptosystems and the methods for solving the Diophantine equations that the cryptosystems use for deciphering messages. An asymmetric (or public-key) cryptosystem is a cryptosystem that provides the public with instructions for enciphering messages, but retains the instructions for deciphering them. For example, suppose Alice creates an asymmetric cryptosystem and asks Bob (and other agents) to send her secret messages while on a mission. Bob can cipher messages and send them to her. She can then decipher them. If Bob is ever captured, he would not be able to divulge how to decipher the messages. This idea was first proposed by Whitfield Diffie and Martin Hellman in 1976. In 1977, Ron Rivest, Adi Shamir, and Leonard Adleman invented the first practical asymmetric cryptosystem. They called it RSA, which stands for the first letter of each of their last names. In 1979, Israeli computer scientist Michael O. Rabin devised a cryptosystem that may be even more secure than RSA. Both cryptosystems exploit the fact that there is no known polynomial-time algorithm for integer factorization. Suppose Alice decides to use the RSA cryptosystem and she and Bob decide on a simple cipher for substituting numbers for letters. First, Alice chooses two large primes, p and q, and sets n = pq. (Since n is a product of only two primes, n is sometimes referred to as semiprime.) By large, we mean hundreds of digits, so she needs to use Fermat's little theorem to find them. Next, Alice computes n and φ(n). Then, Alice chooses an

integer exponent, e, such that 1 < e < φ(n) and (φ(n), e) = 1. Finally, Alice divulges the values of e and n (the public keys) to Bob so he can cipher messages. She keeps the values of p, q, and φ(n) (the private keys) secret. Suppose Bob has a message to cipher. First, Bob uses a simple cipher for substituting numbers for letters. This way, his message has a numerical value, m. Assume that m < n. If this is not the case, then Bob breaks the message into smaller messages, mᵢ, until all mᵢ < n. Next, Bob computes m^e mod n to scramble it. Given the size of n, Bob needs to use successive squaring. Then, Bob sends his message to Alice. Suppose Alice receives Bob's scrambled message, s, and wants to decipher it. She must solve the congruence, m^e ≡ s mod n, for m. If m^e ≡ s mod n, (s, n) = (φ(n), e) = 1, and s, n, and e are known, then we can solve for m by computing φ(n). That is, for some i ∈ ℕ, m ≡ s^i mod n where ie ≡ 1 mod φ(n). Proof: If m ≡ s^i mod n, then s^(ie) ≡ s mod n. Since (s, n) = 1, s^(kφ(n)+1) ≡ s mod n for some k ∈ ℕ. And since (φ(n), e) = 1, we can equate like powers of s. That is, ie = kφ(n) + 1, or equivalently, ie ≡ 1 mod φ(n). Therefore, m ≡ s^i mod n. And since the congruence ie ≡ 1 mod φ(n) gives a unique solution for i modulo φ(n), the value of m is unique. The RSA cryptosystem is asymmetric. Proof: We already demonstrated that Bob can scramble the message. In order for Alice to decipher it, she must know the value of φ(n), which is (p − 1)(q − 1). Then, she can solve for i using the extended Euclidean algorithm. By Lamé's theorem, the upper bound on the number of steps in the Euclidean algorithm is roughly proportional to the number of decimal digits of e, which is not necessarily large. And successive squaring is not a problem. Suppose Bob is captured and asked to help decipher other intercepted messages. Bob could divulge the value of n.
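The whole RSA exchange can be traced with deliberately tiny primes (real moduli run to hundreds of digits; the modular inverse via `pow(e, -1, phi)` needs Python 3.8+):

```python
# Toy RSA run with small primes -- insecure, for illustration only.
p, q = 61, 53
n = p * q                     # public modulus
phi = (p - 1) * (q - 1)       # Alice's secret: phi(n) = (p-1)(q-1)
e = 17                        # public exponent with gcd(e, phi(n)) = 1
i = pow(e, -1, phi)           # deciphering exponent: i*e = 1 (mod phi(n))

m = 65                        # Bob's numerical message, m < n
s = pow(m, e, n)              # scrambled message m^e mod n (successive squaring)
assert pow(s, i, n) == m      # Alice recovers m as s^i mod n
```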
However, in order for someone other than Alice to decipher messages, they would need to know how to compute discrete eth roots modulo n. At present, there is no good method for computing discrete eth roots modulo n without using φ(n). And at present, there is no good method for computing φ(n) other than factoring n. Factoring algorithms have come a long way; however, if n is semiprime and its only non-trivial factors are hundreds of digits long, then we cannot expect to determine them in our lifetime. So, the RSA cryptosystem cannot be cracked at present; however, no one has proven that it cannot be cracked in the future. Suppose Alice decides to use the Rabin cryptosystem and she and Bob decide on a simple cipher for substituting numbers for letters. First, Alice chooses two large primes, p and q, and sets n = pq. By large, we mean hundreds of digits. Next, Alice computes n and chooses 2 as the integer exponent. Finally, Alice divulges that the exponent is 2 and the product is n (the public keys) to Bob so that he can cipher messages. She keeps the values of p and q (the private keys) secret. Suppose Bob has a message to cipher. First, Bob uses a simple cipher for substituting numbers for letters so that his message has a numerical value, m. Assume that m < n. If this is not the case, then Bob breaks the message into smaller messages, mᵢ, until all mᵢ < n. Next, Bob computes m² mod n to scramble it. Then Bob sends his message to Alice. Suppose Alice receives Bob's scrambled message, s, and wants to decipher it. She must solve the congruence, m² ≡ s mod n, for m. If p and q are known, then we can solve for m by solving the quadratic simultaneous congruence, m² ≡ s mod p and m² ≡ s mod q. Proof: Since m² ≡ s mod n, m² = s + kpq for some k. So, m² ≡ s mod p and m² ≡ s mod q. Since (p, q) = 1 and n = pq, this simultaneous congruence has a unique solution for m² modulo n by the Chinese Remainder Theorem.
In Section 4.2, we discussed how to compute discrete square roots in prime moduli using successive squaring. Alice would probably choose primes that are congruent to 3 mod 4 or 5 mod 8 so that she does not have to use the Tonelli-Shanks algorithm. Once we solve both congruences for m, we are left with the linear simultaneous congruences m ≡ s₁ mod p and m ≡ s₂ mod q, which we can solve by using substitution and back-substitution. We can solve the resulting congruence, pk₁ ≡ s₂ − s₁ mod q, for k₁ by using the extended Euclidean algorithm or by computing the discrete reciprocal of p modulo q. The Rabin cryptosystem is asymmetric. Proof: We already demonstrated that Bob can scramble the message. In order for Alice to decipher it, she must compute the discrete square root of s modulo n. Suppose Bob is captured and asked to help decipher other intercepted messages. Bob could divulge the value of n. However, in order for someone other than Alice to decipher messages, they would need to know p and q. In fact, Rabin proved that anyone with a polynomial-time algorithm for computing m also has a polynomial-time algorithm for computing p and q. In this respect, the Rabin cryptosystem is even more secure than RSA, because deciphering it provably requires integer factorization. Unfortunately, the value of m is not unique. Notice that our linear simultaneous congruence is really four separate simultaneous congruences, one for each choice of sign of the square roots modulo p and modulo q. Since Alice would have to solve all four simultaneous congruences and choose the solution that produces a coherent message (or design software that would recognize it), and since the Rabin cryptosystem was developed two years after RSA, it is not in widespread use.
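A toy run of the Rabin scheme (all names ours), with p ≡ q ≡ 3 (mod 4) so each square root comes from a single modular exponentiation, illustrates the four candidate messages:

```python
# Toy Rabin run with tiny primes p = q = 3 (mod 4) -- insecure, for illustration.
p, q = 7, 11
n = p * q

m = 20                          # Bob's message, m < n
s = pow(m, 2, n)                # scrambled message m^2 mod n

# Alice: square roots mod p and mod q (easy because p, q = 3 mod 4)
rp = pow(s, (p + 1) // 4, p)
rq = pow(s, (q + 1) // 4, q)

# Combine the sign choices by the Chinese Remainder Theorem: four candidates
qi = pow(q, -1, p)              # discrete reciprocal of q modulo p
def crt(a, b):
    """Solve x = a (mod p), x = b (mod q) for 0 <= x < n."""
    return (b + q * ((a - b) * qi % p)) % n

roots = sorted({crt(sp, sq) for sp in (rp, p - rp) for sq in (rq, q - rq)})
print(roots)                    # [13, 20, 57, 64]; Alice picks the coherent one
```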

Section 5.2: Graphics and Block-Truncation of Images

The images of many real-life objects can be approximated with many tiny rectangles. For instance, computers, televisions, and Xerox machines approximate images and their corresponding colors so finely that the approximation appears artificial only upon close inspection. In fact, approximations of images are just information. If the user knows which rectangles, or edges thereof, constitute an approximation, then the approximation can be copied manually. The objective of this chapter is to approximate one-dimensional lines or curves with the edges of rectangles, and two-dimensional images with shaded rectangles, through the use of block-truncation. Although most of this section will not require a calculator, it will require graph paper, rulers, and colored pencils. This section combines math with art; try to have some fun with it. We define block truncation to unit squares on the Cartesian coordinate system as the mapping f(x) → [f([x])]. The unit squares on a Cartesian grid each have a length and width of one unit, and their edges are formed by the intersection of horizontal and vertical gridlines at integers. So, the distances from the horizontal edges of each square to the x-axis are integers, and the distances from the vertical edges of each square to the y-axis are also integers. Therefore, the mapping is an integer approximation of f(x) that maps every point to a horizontal gridline and every discontinuity to a vertical gridline. Another interesting point is that the block-truncated f(x) is frozen at an integer value over each interval [k, k + 1), meaning we only need to define f(x) for integer values of x. And at integer values of x, f(x) is truncated down to the nearest integer lattice point. If the graph of f(x) is not very volatile, then besides engaging in block truncation, the sandwich iteration does little to distort the shape of f(x). In fact, the sandwich iteration does not necessarily round f(x) down!
Of course, if the slope of f(x) is positive, then the sandwich iteration truncates because it replaces x with k. However, if the slope of f(x) is negative, then it is quite possible that [f(k)] > f(x), because f(k) > f(x). Example 1: The values of x for which f(x) is known are listed in the table below. Fill in the blanks.

x        2.4   3.0   3.6   4.2   4.8   5.4   6.0
f(x)     6.30  6.35  8.40  8.45  8.50  8.55  8.60
[f([x])] 6     _     6     _     8     _     8

Solution: The value of the leftmost blank is [f(3)] = [6.35] = 6. The value of the middle blank is [f([4.2])] = [f(4)] = [f([4.8])] = 8. And the value of the rightmost blank is [f([5.4])] = [f(5)], which is unknown. We can still estimate the value of the rightmost blank using linear interpolation. Since 4.8 < 5 < 6 and f(4.8) < f(5.4) < f(6), we expect that [f(5)] = 8, and we would be surprised if that were not the case; still, it may not be, especially if f(x) has a discontinuity near x = 5. Example 2: Block-truncate the graph of g(x) below and to the left to the edges of the nearest unit square. Solution: Notice that the graph of y = [g([x])] on the right connects the dots on the square lattice.
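Example 1's bookkeeping can be sketched in a few lines (the table is copied from above; helper name ours). The key point is that [f([x])] is computable only where f is tabulated at the integer [x]:

```python
from math import floor

# f(x) at the tabulated x-values from Example 1
f = {2.4: 6.30, 3.0: 6.35, 3.6: 8.40, 4.2: 8.45,
     4.8: 8.50, 5.4: 8.55, 6.0: 8.60}

def sandwich(x):
    """[f([x])] -- computable only when f is tabulated at the integer [x]."""
    k = floor(x)
    return floor(f[k]) if k in f else None

print(sandwich(3.0))   # 6: [f(3)] = [6.35]
print(sandwich(5.4))   # None: f(5) is not tabulated, so we interpolate instead
```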

What if the grid is not made up of unit squares? The grid could be made up of rectangles with non-integer lengths and widths. How is the block-truncation of f(x) different in this case? First, suppose the grid forms non-unit squares, each with length and width w. Since each square has a length and width of w, the transformation (for the block-truncation of f(x)) must truncate x to the nearest integer multiple of w, w[x/w], and truncate f(w[x/w]) to the nearest integer multiple of w, w[f(w[x/w])/w]. Next, suppose the grid is made up of rectangles with width w and length l. Then, the transformation for y is l[y/l]. So, if f(x) is truncated to the nearest l and x to the nearest w, then the transformation from f(x) to the block-truncated f(x) is l[f(w[x/w])/l]. This raises another question. What if the value of f(x) is unknown (or undefined) at integer multiples of w? If the value of f(x) is known (or defined) for only finitely many values of x over an interval, then we can only block-truncate f(x)

exactly if f(x) is defined at all integer multiples of w over the interval. Since this may not be the case for an arbitrary value of w, we must first approximate the value of f(kw). We can approximate f(kw) using linear interpolation. If the values of f(xᵢ) and f(xᵢ₊₁) are known, xᵢ < xᵢ₊₁, and xᵢ is the largest value of x less than kw for which f(x) is defined, then xᵢ < kw < xᵢ₊₁. Let y denote the straight line connecting the points (xᵢ, f(xᵢ)) and (xᵢ₊₁, f(xᵢ₊₁)). If the graph of f(x) is fairly stable, then the value of f(kw) is approximately the value of y at x = kw. Using the point-slope form for a line, f(kw) ≈ {(f(xᵢ₊₁) − f(xᵢ))kw + f(xᵢ)xᵢ₊₁ − f(xᵢ₊₁)xᵢ}/(xᵢ₊₁ − xᵢ) = {f(xᵢ)(xᵢ₊₁ − kw) + f(xᵢ₊₁)(kw − xᵢ)}/(xᵢ₊₁ − xᵢ). Example 3: a. The values of x for which f(x) is known are listed in the table below. Let f '(x) be f(x) block-truncated to rectangles with width ¾ units and length ⅔ units. Approximate the values of f '(x) using linear interpolation. b. Block-truncate the graph of g(x) below to the edges of the nearest ¾ × ⅔ rectangle.

x    9.6  9.8  10.0 10.2 10.4 10.6 10.8 11.0 11.2 11.4 11.6 11.8 12.0 12.2 12.4 12.6 12.8
f(x) 5.95 6.01 6.19 6.20 6.29 8.15 8.50 8.50 8.15 7.95 8.15 9.55 9.59 9.63 9.67 9.73 9.65

Solution: a. Since we are block-truncating f(x) to rectangles with width ¾ units and length ⅔ units, f '(x) = ⅔[1½·f(¾[1⅓x])]. Since the value of [1⅓x] is constant for all ¾k ≤ x < ¾(k + 1), we are only interested in the values of x between 9.6 and 12.8 that are integer multiples of ¾. These numbers are 9¾, 10½, 11¼, 12, and 12¾. Using linear interpolation, we can approximate f '(x) over the interval [9¾, 13½). We demonstrate this in the table on the left. b. Although it may be hard to tell from the graph of y = g'(x) on the right, the height of each wafer is a multiple of ⅔ and the distance between each jump discontinuity and the y-axis is an integer multiple of ¾.

¾k    xᵢ    f(xᵢ)   xᵢ₊₁   f(xᵢ₊₁)   ≈f(¾k)   ≈f '(¾k)
9¾    9.6   5.95    9.8    6.01      5.995    5⅓
10½   10.4  6.29    10.6   8.15      7.220    6⅔
11¼   11.2  8.15    11.4   7.95      8.100    8
12    12.0  9.59    12.0   9.59      9.590    9⅓
12¾   12.6  9.73    12.8   9.65      9.670    9⅓
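Example 3a can be reproduced directly. A sketch (helper names ours) that interpolates f at multiples of ¾ and then truncates to the ⅔-grid:

```python
from math import floor

# A slice of Example 3's table around the first two multiples of 3/4
pts = [(9.6, 5.95), (9.8, 6.01), (10.4, 6.29), (10.6, 8.15)]

def interpolate(x, pts):
    """Linear interpolation of f between tabulated (xi, f(xi)) pairs."""
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return (y0 * (x1 - x) + y1 * (x - x0)) / (x1 - x0)
    raise ValueError("x outside tabulated range")

w, l = 0.75, 2 / 3               # rectangle width and length

def f_prime(x):
    """Block truncation l*[f(w*[x/w])/l], with f supplied by interpolation."""
    return l * floor(interpolate(w * floor(x / w), pts) / l)

print(round(f_prime(9.75), 4))   # 5.3333  (= 5 1/3, matching the table)
print(round(f_prime(10.5), 4))   # 6.6667  (= 6 2/3)
```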

The transformation from x to the edge function, f, is useful when approximating an image of only a few shapes in only two colors, such as black and white. In order to approximate the image, first examine the edges of the figure(s) and assign them distinct f's for the intervals over which the edges are functions. Next, determine f for all x on which f is defined. The easiest way to determine each f is not to make a table as in Examples 1 and 2, but to graph it. Unless the equation for f is specified by the geometry of the figure, estimate each f for the values of x in between two rectangles on the grid. There is no need to estimate f for any other x, because the values of its x-coordinates are truncated to the values of the x-coordinates intersecting the nearest rectangle's vertical edge. Then, block-truncate f. Finally, make sure the two arbitrary colors contrast enough so that the edges are visible. How can we outline a block-truncated figure? Before we can outline a block-truncated figure, we must define its interior and exterior. Its interior consists of all rectangles within the block-truncated edges of the figure; its exterior consists of all rectangles outside the block-truncated edges of the figure. So, we can outline the figure by coloring in all interior or exterior rectangles that border the block-truncated edges of the approximation of the figure. If possible, we can outline the figure by coloring the segments of gridlines bounding the figure. If the objective is shading or coloring in the figure, then shade or color in either all interior rectangles or all exterior rectangles (not both) of the block-truncated figure. Example 4: Block-truncate the image of the rotated toothbrush below to 0.1″ × 0.1″ squares. Solution: Since some of the exercises ask that you block-truncate figures on graph paper, we work out this example by hand to show how it should look when you do it. We generate the image by tracing a toothbrush on the graph paper.
First, we determine the number of functions required to fully enclose the figure. It turns out that five functions are required: two to enclose the toothbrush handle and three to enclose the bristles. One edge function has a graph that ranges from the handle to the base of f₂. Since its graph does not intersect a vertical gridline, its block-truncated graph is undefined and hence does not exist. For this reason, we ignore it and label the other edge functions f₁ through f₄. Next, we mark the domains of the four functions that bound the image with an x.

Then, we block-truncate each function. Finally, we draw vertical line segments to close any gaps in the figure.

Approximating a multi-colored or detailed picture using the method above would require us to account for an unreasonably high number of edge functions and boundaries. So, when block-truncating a multi-colored or detailed image, block-truncate with respect to the color, not the border between colors. Think of the image as a contour diagram for the bivariate function f(x, y), where the input is a Cartesian coordinate and the output is the corresponding color at that point. First, determine the width and length of the grid. Next, call the lowest leftmost point on the picture the origin. Then, determine the color of each lattice point at the intersection of gridlines. Finally, give the rectangle extending above and to the right of the point the same color as the lattice point. The justification for the last step is that all points within the rectangle with dimensions w × l are block-truncated to the intersection of gridlines. That is, the transformation x → w[x/w] shifts all the points inside the rectangle to its leftmost vertical edge. Similarly, the transformation y → l[y/l] shifts all points on the leftmost vertical edge of the rectangle down to the lowest leftmost corner of the rectangle. Since this corner is the intersection of a horizontal and a vertical edge of the rectangle, by definition, it is also a lattice point. So, according to the theory of block-truncation, the color of the lowest leftmost corner of a rectangle on the original picture is the color of the entire rectangle on the block-truncated image. If the rectangles are sufficiently small, then the color of the lattice point is almost certainly the color of the entire rectangle on the original picture. We also recommend truncating with respect to color and shade. Since color and shade can be quantified and then ordered in some manner, such as by wavelength, we can create an objective method for approximating the image's color with the nearest color available to us. Some software programs quantify colors.
For example, Visual Basic assigns numbers from 0 to 2^24 − 1 to its colors, providing the user with 16,777,216 selections. Another method for truncating with respect to color is discrete-color pigmentation. It involves adding color by combining a set of similar objects that are already colored and have a finite number of different colors. For example, an artist may attempt to capture the beauty of a monarch butterfly using a lattice of thousands of miniature colored tiles, lights, pebbles, or even M&M's. An artist inevitably block-truncates an image by creating artwork using discrete-color pigmentation because the colored objects normally have a fairly uniform and measurable length and width (except in the extreme case of glitter or colored sand). The motivation behind manual block-truncation of a color picture is generally the creation of block-truncated art. For most business applications, we leave block-truncation to machines; however, the theory of block-truncation can still be used to design and improve on existing software.
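A minimal sketch of the grid procedure above, assuming the image is stored as a 2D list of color values (the function name and array representation are our own illustration, not tied to any particular software package):

```python
def block_truncate(image, w, l):
    """Give every pixel the color of the lattice point at the corner of its
    w-by-l rectangle, i.e. apply the transformation (x, y) -> (w[x/w], l[y/l])."""
    height, width = len(image), len(image[0])
    return [[image[(y // l) * l][(x // w) * w] for x in range(width)]
            for y in range(height)]

# A 4-by-4 "image" of sixteen distinct colors, truncated with 2-by-2 blocks;
# each block takes the color of its corner lattice point.
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(block_truncate(image, 2, 2))
```

Shrinking w and l toward 1 recovers the original picture, matching the remark that sufficiently small rectangles almost certainly share the color of their lattice point.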

Section 5.3: An introduction to Analytic Number Theory

Analytic number theory is the study of properties of integers and arithmetic functions using mathematical analysis. It differs from the elementary number theory we have studied so far in that its emphasis is not on computing exact quantities, but on studying the asymptotic behavior of arithmetic functions for large arguments using approximations and limits. Most sources credit Peter Gustav Lejeune Dirichlet with founding analytic number theory in 1837 with his proof, using Dirichlet characters and L-functions, that there are infinitely many primes in every arithmetic progression of the form a + ck with (a, c) = 1. However, much of the material we present in this section is an application of results discovered by Euler a century earlier. It also applies two key results we derived in Chapter 2. Before proceeding, we must define some notation. First, the Dirichlet product (or Dirichlet convolution) of arithmetic functions f(n) and g(n) is defined as (f*g)(n) = ∑_{i|n} f(i)g(n/i) = ∑_{kj=n} f(k)g(j). The Dirichlet product was devised for evaluating terms in the hyperbolic summation of the product of infinite series. That is, (∑_{k=1}^{∞} f(k))(∑_{j=1}^{∞} g(j)) = ∑_{n=1}^{∞} ∑_{kj=n} f(k)g(j). It follows from the definition that the Dirichlet product is commutative. The Dirichlet product has many additional properties that can be used for manipulating relationships between arithmetic functions, allowing us to prove far more identities involving divisor summation than we did in Section 2.1. However, our main objective for this subsection is far narrower; it is only to study f*1. Second, when expressing numerical error, inequalities usually suffice; however, if the error term must be expressed as a function, especially a function without an upper bound for large arguments, then we often use notation for its order of magnitude. Examples include big O notation, the order of magnitude estimate, and asymptotic equivalence.
If there exist positive constants n′ and c such that for all n ≥ n′, |f(n)| ≤ c|g(n)|, then f(n) = O(g(n)). In other words, f(n) does not exceed the order of magnitude of g(n). If there exist positive constants n′, c₁, and c₂ such that for all n ≥ n′, c₁|g(n)| ≤ |f(n)| ≤ c₂|g(n)|, then f(n) ≍ g(n). In other words, f(n) and g(n) have the same order of magnitude. If lim_{n→∞} f(n)/g(n) = 1, then f(n) ~ g(n). In other words, f(n) and g(n) are asymptotically equivalent. Third, we need to introduce more arithmetic functions. The Mobius function is denoted μ(n) and defined by μ(1) = 1 and, for n > 1, μ(n) = 0 if there is a prime p for which p² | n, and μ(n) = (−1)^k otherwise, where k is the number of distinct primes that divide n. It is defined so that if g = f*1, then f = μ*g. This result is called Mobius inversion. The von Mangoldt function is denoted Λ(n) and defined as Λ(n) = ln(p) if n is a power of a prime (n = p^k) and 0 otherwise. It is defined so that ln(n) = (Λ*1)(n). Its cumulative sum, ψ(k) = ∑_{n=1}^{k} Λ(n), is known as the von Mangoldt summatory function or the Chebyshev function. The nth harmonic number is denoted H_n and defined as H_n = ∑_{k=1}^{n} 1/k. The Euler-Mascheroni constant is denoted γ and defined as γ = lim_{n→∞} {H_n − ln(n)}. And last, the number of primes not exceeding n is denoted π(n). The prime number theorem states that π(n) ~ n/ln(n).
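The arithmetic functions just defined are easy to compute by trial division, and Mobius inversion can be checked directly on a sample f. A sketch (mobius, von_mangoldt, and divisors are our own helper names, not notation from the text):

```python
from math import log

def mobius(n):
    # mu(1) = 1; mu(n) = 0 if p^2 | n; otherwise (-1)^k for k distinct prime factors
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # a squared prime divides n
            result = -result
        p += 1
    return -result if n > 1 else result

def von_mangoldt(n):
    # Lambda(n) = ln(p) if n = p^k for a prime p, else 0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n) if n > 1 else 0.0   # n itself is prime (or n = 1)

def divisors(n):
    return [i for i in range(1, n + 1) if n % i == 0]

# Mobius inversion: if g = f*1, then f = mu*g.
f = {n: n * n for n in range(1, 31)}                    # an arbitrary arithmetic function
g = {n: sum(f[i] for i in divisors(n)) for n in f}      # g = f*1
recovered = {n: sum(mobius(i) * g[n // i] for i in divisors(n)) for n in f}
assert recovered == f
```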

Some sums of arithmetic functions that are continuously differentiable (smooth) over the naturals have traditionally been of interest in analytic number theory. Our objective is to approximate with precision H_n and ln(n!) for all large n ∈ ℕ, and the Riemann zeta function, ζ(s), for all s ∈ ℂ with a positive real part other than 1. The nth harmonic number has the approximation H_n = ln(n) + γ + O(1/n). We hope to approximate H_n better than this, but first we must prove γ is indeed a constant and then approximate it with precision. This cannot be done directly from the definition of γ because both lim_{n→∞} H_n and lim_{n→∞} ln(n) diverge to infinity. However, rewriting ∑_{k=1}^{n} 1/k in integral form allows us to subtract the ln(n) term as follows: ∑_{k=1}^{n} 1/k = ln(n) − ln(1) + (1/n + 1/1)/2 − ∫_1^n ((x))/x² dx; hence lim_{n→∞} {H_n − ln(n)} = ½ − ∫_1^∞ ((x))/x² dx. Since |((x))| < ½ and lim_{x→∞} 1/x = 0, the improper integral clearly has a limit. It follows that γ is a constant. We solve for γ by first approximating the cumulative sum of 1/k with precision, and we do so using the Euler-Maclaurin summation formula derived in Section 2.3. This yields H′_n = ln(n) + 1/(2n) − 1/(12n²) + 1/(120n⁴) − 1/(252n⁶). Next, we subtract our cumulative sum from the exact value of H_n for moderately sized n to solve for γ. Our results are in the table below.

n     H_n to 13 places     H′_n to 13 places    H_n − H′_n to 13 places
1     1.0000000000000      0.4210317460318      0.5789682539682
5     2.2833333333333      1.7061176584658      0.5772156748675
10    2.9289682539682      2.3517525890257      0.5772156649425
15    3.3182289932290      2.7410133283259      0.5772156649031
20    3.5977396571437      3.0205239922420      0.5772156649017
25    3.8159581777535      3.2387425128518      0.5772156649017
30    3.9949871309204      3.4177714660190      0.5772156649014
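The table of differences H_n − H′_n can be reproduced with a few lines of code; a sketch (harmonic and h_prime are our own helper names):

```python
from math import log

def harmonic(n):
    # H_n = 1 + 1/2 + ... + 1/n, summed directly
    return sum(1.0 / k for k in range(1, n + 1))

def h_prime(n):
    # the first five Euler-Maclaurin terms for H_n, without the constant term
    return log(n) + 1/(2*n) - 1/(12*n**2) + 1/(120*n**4) - 1/(252*n**6)

# the difference converges very quickly to the Euler-Mascheroni constant
for n in (1, 5, 10, 15, 20, 25, 30):
    print(n, round(harmonic(n) - h_prime(n), 13))
```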

These results demonstrate the sheer power of the Euler-Maclaurin summation formula. Notice that evaluating just the first five terms of the Euler-Maclaurin formula for n > 17 gives us γ to 12 decimal places! Our precision thereafter was limited only by the TI-83's round-off error in H_n. We also achieved such a powerful result because we did not fuss with the first term in the sum, which has the largest error. We can avoid it because its value and error term are part of the constant term, so the cumulative sum is all that matters. So, for best results, sum the first few terms manually (exactly) and evaluate the formula at larger n. Now that we have solved for γ, we can give the Euler-Maclaurin formula for H_n: H_n ≈ ln(n) + 0.5772156649015 + 1/(2n) − 1/(12n²) + 1/(120n⁴) − 1/(252n⁶). The first three terms in our formula for H_n also help us estimate ∑_{k=1}^{n} [n/k], which is needed for our next result. Dirichlet's divisor problem is finding the best estimate of the error term in his estimate of ∑_{k=1}^{n} d(k), where d(k) is the number of divisors of k, or equivalently of ∑_{k=1}^{n} [n/k], for large n. This is still an open problem, and the little progress that has been made required a lot of computation and advanced mathematics, most of which is beyond the scope of this work. However, the derivation of Dirichlet's estimate is elementary enough that we present it here. First, we attempt to estimate ∑_{k=1}^{n} [n/k] intuitively using our approximation for H_n. We have ∑_{k=1}^{n} [n/k] ≈ ∑_{k=1}^{n} (n/k − ½) = n{∑_{k=1}^{n} 1/k − ½} ≈ n(ln(n) + γ − ½). Our estimate of ∑_{k=1}^{n} [n/k] is quite accurate, but since we summed [n/k] for all k up to n, and since the fractional parts of n/k are not completely random (especially as k approaches n), its error term is O(n). Fortunately, we can improve on our estimate using a lattice point diagram and symmetry. We derive Dirichlet's estimate of ∑_{k=1}^{n} [n/k] by summing [n/k] just up to [√n].
We can represent ∑_{k=1}^{n} [n/k] geometrically as the number of lattice points with both coordinates in ℕ (above the x-axis and to the right of the y-axis) and not above y = n/x. We do so with n = 11 in the diagram on the right. Further, we add a square that is bounded by and touches both axes and y = n/x. This square is unique and its side length is √n. Let r₁ denote the number of lattice points with coordinates in ℕ in the region above the square and not above y = n/x, r₂ the number of lattice points with coordinates in ℕ in the region to the right of the square and not to the right of y = n/x, and r₃ the number of lattice points with coordinates in ℕ inside or on the square. Since y = n/x is its own inverse, its graph is symmetric with respect to y = x, so r₁ = r₂. In addition, ∑_{k=1}^{[√n]} [n/k] = r₁ + r₃ and r₃ = [√n]². It follows that 2∑_{k=1}^{[√n]} [n/k] − [√n]² = r₁ + r₂ + r₃. And 2∑_{k=1}^{[√n]} [n/k] − [√n]² ≈ ∑_{k=1}^{[√n]} {2(n/k − ½) − (√n − ½)} = ∑_{k=1}^{[√n]} {2n/k − (√n + ½)} ≈ n{2∑_{k=1}^{[√n]} 1/k − (√n − ½)(√n + ½)/n} ≈ n{2(ln(√n − ½) + γ + 1/(2√n)) − 1} ≈ n{2(ln(√n) − 1/(2√n) + γ + 1/(2√n)) − 1} ≈ n(ln(n) + 2γ − 1). So, the difference between our two estimates of ∑_{k=1}^{n} [n/k] is only n(γ − ½) ≈ n/13, but the difference is significant. This time, we estimated ∑_{k=1}^{n} [n/k] by summing [n/k] only up to √n, so the error term for Dirichlet's estimate is at most O(√n). Since the fractional parts of n/k for k up to √n appear far more random than the fractional parts of n/k for k from √n to n, we expect the error term to be far smaller, and over the years a number of mathematicians have shown that to be the case. At present, it is widely conjectured that the error term can be as small as O(n^{¼+ε}) for all ε > 0. The nth partial sum of ln(n), ∑_{k=1}^{n} ln(k) = ln(n!), has the approximation ln(n!) = (n + ½)(ln(n) − 1) + c + O(1/n) for some constant c. We hope to approximate ln(n!) better than this, but first we must prove c is indeed a constant.
This cannot be done directly because the limits of both terms diverge to infinity. However, rewriting ∑_{k=1}^{n} ln(k) in integral form allows us to evaluate ln(n!) − (n + ½)(ln(n) − 1) as follows: ∑_{k=1}^{n} ln(k) = nln(n) − n − 1·ln(1) + 1 + (ln(n) + ln(1))/2 + ∫_1^n ((x))/x dx; hence lim_{n→∞} {ln(n!) − (n + ½)(ln(n) − 1)} = 1½ + ∫_1^∞ ((x))/x dx. For the improper integral to have a limit, it is not enough that |((x))| < ½, because lim_{x→∞} ln(x) diverges. We can show that it has a limit using integration by parts, but there is a more obvious way. Since ((x)) is negative over (k, k + ½) and positive over (k + ½, k + 1), we want to express ∫_1^∞ ((x))/x dx as an alternating series. Alternating series have terms of the form (−1)^k |g(k)| and converge if lim_{k→∞} g(k) = 0, which is the case for our integrand because |((x))| < ½ and

lim_{x→∞} 1/x = 0. We express ∫_1^∞ ((x))/x dx as an alternating series as follows: ∫_1^∞ ((x))/x dx = ∑_{k=1}^{∞} {∫_k^{k+½} ((x))/x dx + ∫_{k+½}^{k+1} ((x))/x dx} = ∑_{k=1}^{∞} {−((k + ½)(ln(k + ½) − ln(k)) − ½) + (½ − (k + ½)(ln(k + 1) − ln(k + ½)))}. Our last required step is to demonstrate that the positive and negative terms in the series are asymptotically equivalent. Since the absolute value of the area of each triangle bounded by the x-axis in the sawtooth wave is ⅛, we expect the absolute value of the terms in the series to be asymptotically 1/(8(k + ½)), and we leave it for the reader to verify that −(k + ½)∫_k^{k+½} ((x))/x dx ~ (k + ½)∫_{k+½}^{k+1} ((x))/x dx ~ ⅛. It follows that c is a constant. We can solve for c by approximating the cumulative sum of ln(k) with precision using the Euler-Maclaurin summation formula. However, in this case the exact value of c can be derived analytically, and it is used in Stirling's approximation formula for n!. In conclusion, ln(n!) ≈ (n + ½)(ln(n) − 1) + (ln(2π) + 1)/2 + 1/(12n) − 1/(360n³) + 1/(1260n⁵). The Riemann zeta function is defined as ζ(s) = ∑_{k=1}^{∞} k^{−s} for all complex s with Re(s) > 1. Since ζ(s) is an infinite sum, unlike the partial sums discussed above, we do not need to express it in integral form to prove it has a constant term, because it is the constant term. And we can evaluate it to any degree of precision using the identity ∑_{k=1}^{∞} f(k) = ∑_{k=1}^{n} f(k) − F(n) − f(n)/2 − f′(n)/12 + f‴(n)/720 − … (where F is an antiderivative of f that vanishes at infinity) by choosing an arbitrarily large n. The Riemann hypothesis states that if s is in the critical strip (meaning 0 < Re(s) < 1) and ζ(s) = 0, then Re(s) = ½. Its consequences are significant and far-reaching, extending from an exact formula for π(n) as an infinite sum over the zeta zeros on the line Re(s) = ½ to stricter bounds for several other arithmetic functions for large arguments. Since this is where the action is, we want to extend the domain of ζ(s) through analytic continuation to the critical strip.
And we need to express ζ(s) in integral form to accomplish this. We have ∑_{k=1}^{∞} k^{−s} = 0 − 1/(1 − s) + (0 + 1)/2 − s∫_1^∞ ((x))x^{−s−1} dx = 1/(s − 1) + ½ − s∫_1^∞ ((x))/x^{s+1} dx. It follows immediately from the formula that ζ(0) = −½ and lim_{s→1} {ζ(s) − 1/(s − 1)} = γ. We have the latter result because we demonstrated above that γ = ½ − ∫_1^∞ ((x))/x² dx.
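The Euler-Maclaurin identity ∑_{k=1}^{∞} f(k) = ∑_{k=1}^{n} f(k) − F(n) − f(n)/2 − f′(n)/12 + f‴(n)/720 − … is easy to put to work numerically for f(k) = k^{−s}. A sketch (zeta_em is our own helper name); note that the same formula also reproduces ζ(0) = −½ from the continuation:

```python
from math import pi

def zeta_em(s, n=10):
    """Approximate zeta(s) via the truncated Euler-Maclaurin identity
    sum_{k>=1} f(k) = sum_{k<=n} f(k) - F(n) - f(n)/2 - f'(n)/12 + f'''(n)/720,
    with f(k) = k**-s and antiderivative F(x) = x**(1-s)/(1-s)."""
    partial = sum(k**-s for k in range(1, n + 1))
    F = n**(1 - s) / (1 - s)
    f = n**-s
    f1 = -s * n**(-s - 1)                     # f'(n)
    f3 = -s * (s + 1) * (s + 2) * n**(-s - 3) # f'''(n)
    return partial - F - f/2 - f1/12 + f3/720

print(zeta_em(2), pi**2 / 6)   # agree to roughly eight decimal places at n = 10
print(zeta_em(0))              # prints -0.5
```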

Some Dirichlet products of arithmetic functions (especially with 1) have traditionally been of interest in analytic number theory. Our objective is to use Dirichlet products to determine bounds for |∑_{i=1}^{n} μ(i)/i| and ψ(n) and to prove Wintner's mean value theorem, which helps us determine the average value of an arithmetic function. For all n > 1, |∑_{i=1}^{n} μ(i)/i| < 1. Proof: We start with the trivial identity 1 = ([1/n]*1)(n). Mobius inversion yields [1/n] = (μ*1)(n) = ∑_{i|n} μ(i). Summing both sides with respect to n yields 1 = ∑_{i=1}^{k} μ(i)[k/i]. Adding ∑_{i=1}^{k} μ(i)(k/i − [k/i]) to both sides yields 1 + ∑_{i=1}^{k} μ(i)(k/i − [k/i]) = ∑_{i=1}^{k} μ(i)k/i. It follows that |1 + ∑_{i=1}^{k} μ(i)(k/i − [k/i])| ≤ 1 + ∑_{i=1}^{k} |μ(i)||k/i − [k/i]| ≤ 1 + (k − 2) = k − 1, since the i = 1 and i = k terms vanish and each remaining fractional part is less than 1. Therefore, |∑_{i=1}^{k} μ(i)k/i| ≤ k − 1, and the result follows from dividing both sides by k. And a more general result follows from our proof. The mean value of an arithmetic function f is defined as lim_{k→∞} ∑_{n=1}^{k} f(n)/k. If f = 1*g and ∑_{i=1}^{∞} |g(i)|/i converges, then ∑_{i=1}^{∞} g(i)/i is the mean value of f. This result is known as Wintner's mean value theorem. Proof: We have f(n) = ∑_{i|n} g(i). Summing both sides with respect to n yields ∑_{n=1}^{k} f(n) = ∑_{i=1}^{k} g(i)[k/i] = ∑_{i=1}^{k} g(i)k/i − ∑_{i=1}^{k} g(i)(k/i − [k/i]). To get the mean value of f, we must divide both sides by k and then take the limit as k→∞, which yields lim_{k→∞} ∑_{n=1}^{k} f(n)/k = lim_{k→∞} ∑_{i=1}^{k} g(i)/i − lim_{k→∞} ∑_{i=1}^{k} g(i)(k/i − [k/i])/k. Since Wintner's theorem states that lim_{k→∞} ∑_{n=1}^{k} f(n)/k = lim_{k→∞} ∑_{i=1}^{k} g(i)/i, all that is left to show is lim_{k→∞} ∑_{i=1}^{k} g(i)(k/i − [k/i])/k = 0. We have |∑_{i=1}^{k} g(i)(k/i − [k/i])| ≤ ∑_{i=1}^{k} |g(i)||k/i − [k/i]| < ∑_{i=1}^{k} |g(i)|. Since ∑_{i=1}^{∞} |g(i)|/i converges, lim_{k→∞} ∑_{i=1}^{k} |g(i)|/k = 0 by Kronecker's lemma. Since lim_{k→∞} ∑_{i=1}^{k} |g(i)|/k = 0, lim_{k→∞} ∑_{i=1}^{k} g(i)(k/i − [k/i])/k = 0 and the theorem is proved.
Although Wintner's theorem requires that ∑_{i=1}^{∞} g(i)/i is absolutely convergent, there are conditions under which ∑_{i=1}^{∞} g(i)/i is conditionally convergent and still the mean value of f. If ∑_{i=1}^{k} |g(i)|/k < c for some constant c and all k, then lim_{k→∞} ∑_{i=1}^{k} g(i)(k/i − [k/i])/k = 0. A formal proof is left as an exercise. The Chebyshev function satisfies ψ(n) ≍ n. Proof: Recall that ln(n) = (Λ*1)(n) and ψ(k) = ∑_{n=1}^{k} Λ(n). Let s(k) = ∑_{n=1}^{k} ∑_{i|n} Λ(i) and S(k) = s(k) − 2s(k/2). We start by evaluating s(k) in two different ways. First, ∑_{n=1}^{k} ∑_{i|n} Λ(i) = ∑_{n=1}^{k} ln(n) = ln(k!) = (k + ½)(ln(k) − 1) + (ln(2π) + 1)/2 + O(1/k). So, S(k) = (k + ½)(ln(k) − 1) + (ln(2π) + 1)/2 − 2(k/2 + ½)(ln(k/2) − 1) − (ln(2π) + 1) + O(1/k) = ln(2)k − ln(k)/2 + ln(2/π)/2 + O(1/k). It follows that (ln(2) − ε)k < S(k) < ln(2)k for all ε > 0 for all sufficiently large k. Next, ∑_{n=1}^{k} ∑_{i|n} Λ(i) = ∑_{i=1}^{k} Λ(i)[k/i]. So, S(k) = ∑_{i=1}^{k} Λ(i)[k/i] − 2∑_{i=1}^{[k/2]} Λ(i)[(k/2)/i] = ∑_{i=1}^{k} Λ(i)([k/i] − 2[[k/i]/2]). If [k/i] is even, then [k/i] − 2[[k/i]/2] = 0. If [k/i] is odd, then [k/i] − 2[[k/i]/2] = 1. Since Λ(i) is non-negative and [k/i] − 2[[k/i]/2] ≤ 1, ∑_{i=1}^{k} Λ(i)([k/i] − 2[[k/i]/2]) ≤ ∑_{i=1}^{k} Λ(i). So, S(k) ≤ ψ(k). And we can conclude more. For all i such that k/2 < i ≤ k, [k/i] − 2[[k/i]/2] = 1. So, S(k) = ∑_{i=1}^{k} Λ(i)([k/i] − 2[[k/i]/2]) = ∑_{i=1}^{[k/2]} Λ(i)([k/i] − 2[[k/i]/2]) + ∑_{i=[k/2]+1}^{k} Λ(i) = ∑_{i=1}^{[k/2]} Λ(i)([k/i] − 2[[k/i]/2]) +

(k) − (k/2). Therefore, (k) − (k/2) ≤ S(k). It follows that (k/2) − (k/4) ≤ S(k/2) and more generally, [ln(푘)/ln(2)] i−1 i [ln(푘)/ln(2)] i−1 ∑ 푖=1 ((k/2 ) − (k/2 )) ≤ ∑ 푖=1 S(k/2 ). The sum on the left hand side telescopes to (k) − (1) = [ln(푘)/ln(2)] i−1 [ln(푘)/ln(2)] i−1 (k). So, (k) ≤ ∑ 푖=1 S(k/2 ). Therefore, S(k) ≤ (k) ≤ ∑ 푖=1 S(k/2 ). All that is left is for us to bound the expressions for S(k). Since (ln(2) − )k < S(k) < ln(2)k for all  > 0 for all sufficiently large k, we use [ln(푘)/ln(2)] i−1 ∞ i−1 0.69k < ln(2)k as the lower bound for S(k). Since ∑ 푖=1 S(k/2 ) < ∑ 푖=1ln(2)k/2 = 2ln(2)k, (k) < 2ln(2)k < 1.39k. Therefore, 0.69k < (k) < 1.39k for all sufficiently large k and the theorem is proved. This result is an important step in most proofs of the prime number theorem. We did a decent job bounding (k), but we could have done far better. Since our goal was merely to bound (k) and prove (k) ≍ k, we could use a relatively simple expression for S(k) as a linear combination of s(k/i) and be content with the bounds c1 = 0.69 < ln(2) and c2 = 1.39 > 2ln(2) for the coefficients of k. This suffices for most applications. Since Chebyshev wanted a tighter bound for his function, he set S(k) = s(k) − s(k/2) − s(k/3) − s(k/5) + s(k/30) to obtain c1 = 0.92 and c2 = 1.10. At this point, you might wonder if there are other linear th combinations of s(k/i) that allow us to achieve even tighter bounds for c1 and c2. In the 20 century, Harold Diamond and Paul Erdos proved that there are linear combinations of s(k/i) that allow us to achieve bounds for c1 and c2 arbitrarily close to 1, meaning (k) ~ k. Their proof would have also been a unique proof of the prime number theorem, using Chebyshev estimates to derive the relation (k) ~ k, except that they needed the prime number theorem to prove that c1 and c2 can be arbitrarily close to 1. 푘 It follows from the proof of our last result that ∑ 푖=1(i)/i = ln(k) + O(1). 
Proof: We start by setting s(k) = ∑_{n=1}^{k} ∑_{i|n} Λ(i) and evaluating s(k) in two different ways just as we did above. Our results were s(k) = ∑_{i=1}^{k} Λ(i)[k/i] and s(k) = (k + ½)(ln(k) − 1) + (ln(2π) + 1)/2 + O(1/k). It follows that s(k) = ∑_{i=1}^{k} Λ(i)k/i − ∑_{i=1}^{k} Λ(i)(k/i − [k/i]). Since Λ(i) is non-negative and k/i − [k/i] < 1, ∑_{i=1}^{k} Λ(i)(k/i − [k/i]) < ∑_{i=1}^{k} Λ(i) = ψ(k) ~ k. So, s(k) = k∑_{i=1}^{k} Λ(i)/i + O(k) = kln(k) + O(k). Dividing both sides by k yields ∑_{i=1}^{k} Λ(i)/i = ln(k) + O(1) and the result is proved. But the result can still be improved! Since ψ(k) ~ k for all sufficiently large k, 0 < ∑_{i=1}^{k} Λ(i)(k/i − [k/i]) < k. And s(k) = kln(k) − k + O(ln(k)). So, we set x = ∑_{i=1}^{k} Λ(i)(k/i − [k/i]) and equate both expressions for s(k). It follows that s(k) = k∑_{i=1}^{k} Λ(i)/i − x = kln(k) − k + O(ln(k)). Dividing both sides by k yields ∑_{i=1}^{k} Λ(i)/i = ln(k) + (x/k − 1) + O(ln(k)/k). Therefore, for all sufficiently large k, ln(k) − 1 < ∑_{i=1}^{k} Λ(i)/i < ln(k).
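The bounds derived above are easy to check numerically by trial division; a sketch (von_mangoldt and psi are our own helper names, and k = 1000 serves as "sufficiently large" here):

```python
from math import log

def von_mangoldt(n):
    # Lambda(n) = ln(p) if n = p^k for a prime p, else 0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n) if n > 1 else 0.0

def psi(k):
    # the Chebyshev function, the cumulative sum of Lambda
    return sum(von_mangoldt(n) for n in range(1, k + 1))

k = 1000
assert 0.69 * k < psi(k) < 1.39 * k                                    # psi(k) is of order k
assert log(k) - 1 < sum(von_mangoldt(i) / i for i in range(1, k + 1)) < log(k)
```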

CHAPTER 6: Advanced Series Acceleration Methods

There are a variety of important special functions in the sciences that cannot be expressed as a finite combination of elementary functions. Many have rapidly converging power series, and the best way to numerically evaluate them with high precision is simply to add enough terms in the order given by their index. Others have expressions as infinite sums or infinite products that we might say barely converge or slowly converge. We define this concept formally as logarithmic convergence. A logarithmically convergent series is the infinite sum of a sequence {f(k)} with the property lim_{k→∞} f(k)/f(k + 1) = 1. When determining the rate of convergence for a product, determine this limit for either {ln(f(k))} or {f(k) − 1}. The Riemann zeta function is our first example of an important special function with a logarithmically convergent series. We list several more examples in the table below. Since they usually require adding or multiplying thousands of terms for just a few digits of precision, you might think they are of little value for computational purposes without a supercomputer.

Hyperharmonic series:
  Hurwitz zeta function: ζ(x, y) = ∑_{k=1}^{∞} 1/(k + y)^x, Re(x) > 1 and Re(y) > 0
  Dirichlet L-function: L(x, χ) = ∑_{k=1}^{∞} χ(k)/k^x, Re(x) > 1 and |χ(k)| is 0 or 1
  Dirichlet eta function: η(x) = ∑_{k=1}^{∞} −(−1)^k/k^x = (1 − 2^{1−x})ζ(x), Re(x) > 0
  Dirichlet beta function: β(x) = ∑_{k=1}^{∞} −(−1)^k/(2k − 1)^x = (ζ(x, ¼) − ζ(x, ¾))/4^x
  Polygamma function: ψ^(n)(y) = −(−1)^n n! ζ(n + 1, y), n ∈ ℕ and y > 0 or y ∉ ℤ

Hyperharmonic products:
  Beta function: B(x, y) = (x + y)(xy)^{−1} ∏_{k=1}^{∞} k(x + y + k)(x + k)^{−1}(y + k)^{−1}
  Gamma function (by Euler): Γ(x) = x^{−1} ∏_{k=1}^{∞} k(1 + 1/k)^x (x + k)^{−1}, x > 0 or x ∉ ℤ
  Gamma function (by Weierstrass): Γ(x) = e^{−γx} x^{−1} ∏_{k=1}^{∞} k e^{x/k} (x + k)^{−1}, x > 0 or x ∉ ℤ, where γ ≈ 0.5772156649015 is the Euler-Mascheroni constant

Fourier series:
  Standard Clausen functions, defined for all n ∈ ℕ: Sl_{2n}(x) = ∑_{k=1}^{∞} cos(kx)/k^{2n}, Cl_{2n−1}(x) = ∑_{k=1}^{∞} cos(kx)/k^{2n−1}, Sl_{2n−1}(x) = ∑_{k=1}^{∞} sin(kx)/k^{2n−1}, Cl_{2n}(x) = ∑_{k=1}^{∞} sin(kx)/k^{2n}

We solve this problem by studying series acceleration, which is the transformation of a sequence to accelerate the rate of convergence of the sequence of its partial sums. This chapter is a continuation of our study of summation and integration from Sections 1.2 and 2.3. In Section 6.1, we derive an integral form for ∑_{k=1}^{∞} f(k) that helps us approximate it with high precision, provided the series converges and {f(k)} is a positive monotone decreasing infinite sequence. In Section 6.2, we offer a method for evaluating certain forms of Fourier series exactly, and we offer two methods for approximating the sum of all other forms of Fourier series with high precision, provided the series converges.

Section 6.1: Series Acceleration for Monotone Decreasing Sequences

Let {f(k)} denote a positive monotone decreasing infinite sequence with lim_{k→∞} f(k) = 0 for all k ∈ ℕ. This section discusses a few methods for numerically approximating ∑_{k=1}^{∞} f(k) with high precision. The reader should note that this section does not discuss techniques for determining whether a sequence or series converges, or any of their general properties except in specialized instances. The reader is expected to have at least a medium-level knowledge of the properties of sequences and series. Under certain conditions, some series acceleration methods work better than ours; however, they also have weaknesses that our methods do not. They may work only for certain forms of {f(k)} under the right conditions and diverge under the wrong conditions. Many of them are quite tedious and hence difficult to use outside of a software package. For the remainder of this section, we make three assumptions about the sequence {f(k)} for k ∈ ℕ. Our first assumption is that {f(k)} is a positive monotone decreasing infinite sequence. Our second assumption is that {∑_{k=1}^{n} f(k)} is logarithmically convergent. Our third assumption is that, in order to numerically approximate ∑_{k=1}^{∞} f(k) with high precision (or to any desired degree of precision) within a reasonable amount of running time, summing the first hundred terms is sufficient iff |∑_{k=101}^{∞} f(k)/∑_{k=1}^{∞} f(k)| < 10^{−12}.
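The logarithmic-convergence assumption is easy to test numerically for a candidate {f(k)}; a small sketch (term_ratio is our own hypothetical helper, not notation from the text) contrasting a logarithmically convergent sequence with a geometric one:

```python
def term_ratio(f, k):
    # f(k)/f(k+1) -> 1 signals logarithmic (slow) convergence of the series
    return f(k) / f(k + 1)

# Terms of the zeta-like series 1/k^2 barely converge: the ratio tends to 1 ...
print(term_ratio(lambda k: 1 / k**2, 100))    # ~1.0201, i.e. (101/100)^2
# ... while geometric terms keep a ratio bounded away from 1:
print(term_ratio(lambda k: 2.0**-k, 100))     # 2.0
```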

We can often evaluate the infinite sums of convergent series using a combination of partial summation and a variety of other techniques. We mention four in the table below. Any sequence meeting the necessary conditions for one of the top three methods listed in this table is best summed using it, unless an exact form can be obtained.

Euler-Maclaurin formula:
  Necessary conditions on {f(k)}: ∫f(x)dx has a closed form
  Example: ln(k)/k²
  Advantages or drawbacks: Diverges after too many iterations

Improved Euler transform (Van Wijngaarden method):
  Necessary conditions on {f(k)}: Alternating sequences
  Example: Leibniz series

Expand as a Laurent series:
  Necessary conditions on {f(k)}: f(1/x) has a power series
  Example: sin(π/k²)
  Advantages or drawbacks: No partial sum needed; just sum ζ(k) times the kth coefficient

Padé approximation:
  Necessary conditions on {f(k)}: There is a rational function r(x) with lim_{x→∞} f^(n)(x)/r^(n)(x) = 1
  Examples: ln(1 + 1/k²), sin(π/k²)
  Advantages or drawbacks: Hard to compute without a software package; r(x) diverges at poles

Despite the rich variety of acceleration methods listed above, our objective is to invent a series acceleration method that is easy to use and can sum any monotone decreasing sequence with high precision. Our approach is to improve upon Kummer's series acceleration method. We describe his method next and demonstrate how we can use the greatest integer function to maximize its precision at any point in the partial sum. Kummer's series transformation can turn a barely converging infinite series into one that converges more rapidly. Let {g(k)} be an arbitrary positive monotone decreasing infinite sequence for all k ∈ ℕ. Moreover, suppose that ∑_{k=1}^{∞} f(k) = F, ∑_{k=1}^{∞} g(k) = G, and lim_{k→∞} f(k)/g(k) = C for positive constants F, G, and C. Then, ∑_{k=1}^{∞} f(k) = ∑_{k=1}^{∞} {f(k) − Cg(k) + Cg(k)} = ∑_{k=1}^{∞} {f(k) − Cg(k)} + CG. Kummer's transformation is useful when F is sought and G is known. Since {g(k)} is arbitrary, the objective is to choose a {g(k)} so that G is known and lim_{k→∞} f(k)/g(k) converges to C. Then {f(k) − Cg(k)} converges to zero more rapidly than {f(k)}. For example, suppose f(x) has a closed-form antiderivative and we can evaluate ∑_{k=1}^{99} f(k) quickly on the TI-83 using the sum( and seq( operations (the TI-83 even allows us to sum up to the first 999 terms of any sequence); then partial summation and a single application of Kummer's method is often sufficient for approximating F to six decimal places. We start by approximating f(k) as ∫_{k+θ−1}^{k+θ} f(x)dx = g(k), where k ∈ ℕ and 0 < θ < 1. Then,
∑_{k=1}^{∞} f(k) = ∑_{k=1}^{∞} {f(k) − g(k)} + ∑_{k=1}^{∞} g(k)
= ∑_{k=1}^{99} {f(k) − g(k)} + ∑_{k=100}^{∞} {f(k) − g(k)} + ∑_{k=1}^{∞} g(k)
= ∑_{k=1}^{99} f(k) + ∑_{k=100}^{∞} {f(k) − g(k)} + ∑_{k=100}^{∞} g(k)
= ∑_{k=1}^{99} f(k) + ∑_{k=100}^{∞} {f(k) − g(k)} + ∑_{k=100}^{∞} ∫_{k+θ−1}^{k+θ} f(x)dx
= ∑_{k=1}^{99} f(k) + ∫_{99+θ}^{∞} f(x)dx + ∑_{k=100}^{∞} {f(k) − g(k)}.
If f(x) is continuous and strictly decreasing, then for every value of k there is a value of θ such that ∫_{k+θ−1}^{k+θ} f(x)dx = f(k).
Further, if ∑_{k=1}^{∞} f(k) is logarithmically convergent, then lim_{k→∞} θ = ½. For this reason, in the examples that follow, we may shift arguments by a half-unit to optimize precision, even when not integrating. We chose Kummer's method for its simplicity. We need only find a g(k) that approximates f(k) well enough that lim_{k→∞} f(k)/g(k) = C. Since we can choose our g(k) from an infinite set of functions, and our choice of g(k) only requires a rough knowledge of the convergence behavior of f(k), Kummer's method can work on any monotone decreasing sequence. While Kummer's series transformation is a clever trick, it is far from perfect. A clear disadvantage is that, unlike the four methods listed above, Kummer's method provides no algorithm (for finding a good g(k)). It follows that if f(x) has no Laurent series and no closed-form antiderivative, then finding a good g(k) will be more challenging. In addition, since we only sum the first n terms of f(k) in practice, Kummer's method leaves the error term ∑_{k=n+1}^{∞} {f(k) − Cg(k)}, which may still be a logarithmically convergent series. For this reason, depending on how much precision is required, we may either have to sum thousands of terms or use Kummer's method again to find an h(k) such that lim_{k→∞} {f(k) − Cg(k)}/h(k) = C′. This can usually be done, but it is also usually harder than finding a g(k). Instead, we improve upon Kummer's method by embracing the error term. We use his series transform to express the error term as the definite integral of a function that is almost smooth for large k and usually monotone. This allows us to estimate the error term with high precision using conventional numerical integration techniques. We derive an integral form for ∑_{k=1}^{∞} f(k) by expressing it as the sum of the areas of an infinite number of

adjacent boxes, each with width g(k) and height f(k)/g(k), where f(k) and g(k) retain their definitions above. So, the first box has width g(1) and height f(1)/g(1). To simplify matters, let the first box extend from x = 0 to g(1). Since the second box is adjacent to and to the right of the first, it must extend from x = g(1) to g(1) + g(2). Since the kth box is adjacent to and to the right of the (k−1)st box, it must extend from x = ∑_{i=1}^{k−1} g(i) to x = ∑_{i=1}^{k} g(i). Let G(k) = ∑_{i=1}^{k} g(i). Then G(0) = 0, G(k) − G(k − 1) = g(k), and lim_{k→∞} G(k) = G. If 0 < x < G, then x is in one of these boxes. Let k(x) denote the placement of the box x is in. Then k(x) = k for all x such that G(k − 1) ≤ x < G(k). And the height of the box containing x, in terms of f(k) and G(k), is f(k(x))/{G(k(x)) − G(k(x) − 1)}. So, if we can solve for k(x), then we can solve for the integrand. Suppose k − 1 ≤ y < k and G(y) = x. Then y = G^{−1}(x) and k = [G^{−1}(x)] + 1 = k(x). Therefore, ∑_{k=1}^{∞} f(k) = ∫_0^G f([G^{−1}(x)] + 1)/{G([G^{−1}(x)] + 1) − G([G^{−1}(x)])} dx. For practical use, we set ∑_{k=1}^{∞} f(k) = ∑_{k=1}^{n} f(k) + C(G − G(n)) + ∫_{G(n)}^{G} [f([G^{−1}(x)] + 1)/{G([G^{−1}(x)] + 1) − G([G^{−1}(x)])} − C] dx. We improved upon Kummer's method by expressing the error term as a definite integral (of an error function) that we can estimate with precision using conventional numerical integration techniques. For large n, the area under the graph can fit inside a small box with width and height near zero. And since the sequence {f(k)/g(k) − C} converges to zero, its variability, let alone curvature, is small for large n. These conditions allow us to estimate the integral with precision. This means that we no longer have to find a g(k) such that lim_{k→∞} f(k)/g(k) converges rapidly to a positive constant. It suffices that lim_{k→∞} f(k)/g(k) converges to a positive constant. We call this summation technique the Improved-Kummer, or Kummer-Beitler, series acceleration method.
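A minimal sketch of the construction above, assuming the user supplies C, the limit G, the closed-form cumulative sum G(k), and its inverse; the function name and the midpoint-rule integration are our own choices. We try it on f(k) = 1/k², whose sum is ζ(2) = π²/6, with G(k) = 2 − 1/(k + ½), so that g(k) = 1/(k² − ¼) and C = 1:

```python
from math import floor, pi

def kummer_beitler(f, G, G_inv, G_total, C, n=100, steps=100_000):
    """Partial sum plus C*(G - G(n)) plus the numerically integrated
    error function, following the practical formula derived above."""
    partial = sum(f(k) for k in range(1, n + 1))
    a, b = G(n), G_total - 1e-4      # stop just short of G, as the text suggests
    h = (b - a) / steps
    integral = 0.0
    for i in range(steps):           # midpoint rule on the error function
        x = a + (i + 0.5) * h
        k = floor(G_inv(x)) + 1      # k(x) = [G^-1(x)] + 1
        integral += (f(k) / (G(k) - G(k - 1)) - C) * h
    return partial + C * (G_total - G(n)) + integral

approx = kummer_beitler(lambda k: 1.0 / k**2,          # f(k)
                        lambda k: 2 - 1 / (k + 0.5),   # G(k), with G(0) = 0
                        lambda x: 1 / (2 - x) - 0.5,   # G^-1(x)
                        G_total=2.0, C=1.0)
print(approx, pi**2 / 6)
```

With only a hundred exact terms, the integrated error function recovers ζ(2) to roughly nine decimal places, illustrating why a rough g(k) suffices once the error term is embraced rather than discarded.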
This method is fairly straightforward except for three caveats. First, g(k) must have a closed-form cumulative sum, G(k). Since finding a closed-form cumulative sum for a given g(k) is usually difficult if not impossible, we recommend against choosing g(k) directly. We recommend choosing G(k) instead, because G(k) can be chosen from the set of monotone increasing functions whose sequence converges to a positive constant. There is an abundance of such functions. Second, it is not always possible to express G^{−1}(x) explicitly. If there is no good explicit expression for f(x) or G^{−1}(x) in terms of elementary functions, then they must be expressed implicitly and evaluated using numerical approximation techniques such as Newton's method. The TI-83 has a solve( function for this purpose. Ironically, if G^{−1}(x) is implicitly defined by G(y) = x, then iterative numerical approximation gives an exact result after a finite number of iterations, because it is sufficient that we find [G^{−1}(x)]. Third, we cannot integrate our expression up to G because G^{−1}(G) is undefined. We address the third problem by setting the upper bound for our integral just short of G. The additional error created by this adjustment is negligible.
Example 1: Let f(k) = sin(π/k)/(πk), where sine is in radians. a. Approximate ∑_{k=1}^{∞} f(k) by subtracting the first six Laurent terms from f(k) and then summing the first hundred terms of the difference. A table of values for the Riemann zeta function is provided below for reference. b. Approximate ∑_{k=1}^{∞} f(k) by summing the first hundred terms of {f(k)} and using the Improved-Kummer method on the remainder. Find G(k) on your own.

 −2  −3  −4  −5  −6 (2) = ∑ 푛=1n (3) = ∑ 푛=1n (4) = ∑ 푛=1n (5) = ∑ 푛=1n (6) = ∑ 푛=1n 1.644934066848 1.202056903159 1.082323233711 1.036927755143 1.017343061984 2/6 4/90 6/945

Solution: There are a number of ways to numerically evaluate ∑_{k=1}^∞ sin(π/k)/(πk) with high precision. We start by subtracting the first six Laurent terms from f(k) and then summing the first hundred terms of the difference. Replacing k with 1/x yields f(1/x) = x·sin(πx)/π, which has the power series {(πx)² − (πx)⁴/3! + (πx)⁶/5! − …}/π². Therefore, the first six terms in the Laurent series for sin(π/k)/(πk) are {(π/k)² − (π/k)⁴/3! + (π/k)⁶/5!}/π². It follows that {f(k)} = {1/k² − (π²/6)/k⁴ + (π⁴/120)/k⁶ + (sin(π/k)/(πk) − 1/k² + π²/(6k⁴) − π⁴/(120k⁶))}. Therefore, ∑_{k=1}^∞ f(k) = ∑_{k=1}^∞ (1/k² − (π²/6)/k⁴ + (π⁴/120)/k⁶) + ∑_{k=1}^∞ (sin(π/k)/(πk) − 1/k² + π²/(6k⁴) − π⁴/(120k⁶)). The first sum on the right-hand side evaluates to ζ(2) − π²ζ(4)/6 + π⁴ζ(6)/120 ≈ 0.690404232855. And we conveniently approximate the latter sum as ∑_{k=1}^{100} {sin(π/k)/(πk) − 1/k² + π²/(6k⁴) − π⁴/(120k⁶)} ≈ −0.167560683922. So, ∑_{k=1}^∞ sin(π/k)/(πk) ≈ 0.690404232855 − 0.167560683922 = 0.522843548933. Since the remainder term of our series is less than ∑_{k=101}^∞ π⁶/(7!k⁸) < 10⁻¹⁴, it is too small for us to measure using only 12 digits of precision, meaning our estimate is correct to either 11 or 12 decimal places!

Next, we use the Improved Kummer's method to derive an error function. Even though our method is more involved and less precise than summing the Laurent series for ∑_{k=1}^∞ sin(π/k)/(πk), we proceed with this example to see how our method compares. We start by finding G(k) using creativity. Since lim_{x→∞} x²f(x) = lim_{x→∞} x·sin(π/x)/π = lim_{x→0} sin(πx)/(πx) = 1, g(x) is roughly a constant multiple of 1/x². Since ∑_{k=1}^n 1/k² has no closed form, we choose our G(k) by integrating instead, and we get ∫1/x² dx = c − 1/x. We shift the argument by ½ for optimal precision and set c = 2 so that G(0) = 0. Then G(k) = 2 − 1/(k + ½), G(k) − G(k − 1) = 1/(k² − ¼) = g(k), which approximates f(k) very well even for small positive values of k, and k(x) = [1/(2 − x) + ½]. To perform this calculation easily on the TI-83, we set Y1 = int(1/(2−X)+1/2) and Y2 = sin(π/X)/(πX)/(1/(X−.5)−1/(X+.5))−1. The graph of the error function, Y2(Y1(X)), is nearly smooth and monotone almost up to G. We encourage the reader to graph it over various intervals and take the time to study its intricacies. This is essential for visualizing how the Improved Kummer's method works. Finally, we numerically integrate the error function. We can integrate it graphically or, without graphing, using the fnInt( function, which is usually faster and more precise. Since k(x) is undefined at x = 2, we set the upper bound for the integral at 1.9999 to minimize rounding error. Since we add and subtract C(G − G(n)) in our expression for ∑_{k=1}^∞ f(k), lim_{x→G⁻} Y2(Y1(x)) = 0. It follows that the additional error we create from adjusting the upper bound cannot exceed |Y2(Y1(1.9999))(2 − 1.9999)/2|, which is less than a trillionth. We summarize our results in the table below for various n. Quantities, q, with an asterisk denote the estimate given by the TI-83.

                                      n = 100          n = 200          n = 500
q1 = ∑_{k=1}^n sin(π/k)/(πk)          0.512893922396   0.517856096127   0.520845551974
q2 = C(G − G(n))                      0.009950248756   0.004987531172   0.001998001998
q3 = ∫_{G(n)}^{1.9999} Y2(Y1(X))dx*   −.000000623636   −.000000078207   −.000000005037
q4 = q1 + q2 + q3*                    0.522843547516   0.522843549092   0.522843548935
|q4 − 0.522843548933|                 1.417×10⁻⁹       1.59×10⁻¹⁰       2×10⁻¹²
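Part (a) of Example 1 is easy to reproduce in any environment with double-precision floats. The following Python rendering is our own; it computes both pieces of the estimate, using the closed forms ζ(2) = π²/6, ζ(4) = π⁴/90, and ζ(6) = π⁶/945 from the table above:

```python
import math

pi = math.pi
# First sum on the right-hand side: zeta(2) - pi^2*zeta(4)/6 + pi^4*zeta(6)/120.
zeta_part = pi**2 / 6 - pi**2 * (pi**4 / 90) / 6 + pi**4 * (pi**6 / 945) / 120

# Second sum: first hundred terms of f(k) minus its first Laurent terms.
tail = sum(math.sin(pi / k) / (pi * k)
           - 1 / k**2 + pi**2 / (6 * k**4) - pi**4 / (120 * k**6)
           for k in range(1, 101))

estimate = zeta_part + tail
```

Both pieces match the worked values: zeta_part ≈ 0.690404232855, tail ≈ −0.167560683922, and estimate ≈ 0.522843548933.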

Example 2: Let f(k) = ln(k)sin(π/k)/(πk), where sine is in radians. Approximate ∑_{k=1}^∞ f(k) by summing the first hundred terms and then using the Improved Kummer's method. Find G(k) on your own.

Solution: In this case, the antiderivative of f(x) is not obvious and probably cannot be expressed in closed form. Neither can we find its Laurent series, because ln(1/x) = −ln(x), which is not differentiable at x = 0. However, we can still find a good G(k) using creativity. Since lim_{x→∞} x²sin(π/x)/(πx)⁻¹·x⁻¹ = lim_{x→∞} x·sin(π/x)/π = 1, we have lim_{x→∞} [ln(x)sin(π/x)/(πx)]/[ln(x)/x²] = 1. Since ∑_{k=1}^n ln(k)/k² has no closed form, we choose our G(k) by integrating instead, and we get ∫ln(x)/x² dx = c − (ln(x) + 1)/x. We shift the argument by ½ for optimal precision and set c = 1 for simplicity. We can break with the convention that G(0) = 0 because it suffices that G(k) > 0 for all k > n. Then G(k) = 1 − (ln(k + ½) + 1)/(k + ½) and G(k) − G(k − 1) = {(k + ½)ln(k − ½) − (k − ½)ln(k + ½) + 1}/(k² − ¼) = g(k), which approximates f(k) very well even for small positive values of k. In addition, G(100) ≈ 0.944177535050, ∑_{k=1}^{100} f(k) ≈ 0.778271233071, and C = 1. Solving for G⁻¹(x) is a little trickier. We can express k implicitly as G(k) + (ln(k + ½) + 1)/(k + ½) − 1 = 0. Then we substitute x for G(k) and use the solve( operation on the TI-83. To help the calculator find the solution more quickly, we plug in 200 as a first guess; any number above 100 would do. It follows that k(x) = [solve(x+(ln(K+.5)+1)/(K+.5)−1,K,200)]+1. At this point, we have everything we need to numerically approximate the infinite sum. To perform this calculation on the TI-83, we set Y1 = int(solve(X+(ln(K+.5)+1)/(K+.5)−1,K,200))+1. Next, we set Y2 = ln(X)sin(π/X)/(πX)/((ln(X−.5)+1)/(X−.5)−(ln(X+.5)+1)/(X+.5))−1. Since |Y2(Y1(.9999))| < 9×10⁻⁹, integrating past x = 0.9999 will produce negligible results. Therefore, we estimate ∑_{k=1}^∞ f(k) as {.778271233071 + 1 − .944177535050} + ∫_{.944177535050}^{.9999} Y2(Y1(X))dx. The TI-83 approximated the integral as −3.006949×10⁻⁶. So, ∑_{k=1}^∞ ln(k)sin(π/k)/(πk) ≈ 0.834093698021 − 3.006949×10⁻⁶ ≈ 0.834090691072.

Although we cannot know the exact value of ∑_{k=1}^∞ f(k), we can compare our estimate with a much more precise estimate obtained from the partial sum of the first 999 terms of f(k) together with an estimate of the error. Since ∑_{k=1}^{999} f(k) ≈ 0.826179484232, G(999) ≈ 0.992088789241, and ∫_{.992088789241}^{.9999} Y2(Y1(X))dx ≈ −4.510×10⁻⁹, we get ∑_{k=1}^∞ f(k) ≈ {.826179484232 + 1 − .992088789241} − 4.510×10⁻⁹ ≈ 0.834090694991 − 4.510×10⁻⁹ ≈ 0.834090690481. Since our latter estimate is almost certainly correct to 11 significant digits and the difference between estimates is 591 trillionths, our first estimate is correct to 8 figures. This last example demonstrates that even when the sequence is complicated and its sum is slowly converging, we can achieve a few more digits of precision by finding a good G(k), and then about three more using our improvement on Kummer's method.
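Example 2 can also be reproduced without a TI-83. In the Python sketch below (our own), Newton's method plays the role of the calculator's solve( function, with the same first guess of 200; the iteration cap, the domain clamp, and the midpoint-rule step count are our choices.

```python
import math

LN, PI = math.log, math.pi
f = lambda k: LN(k) * math.sin(PI / k) / (PI * k)
G = lambda k: 1.0 - (LN(k + 0.5) + 1.0) / (k + 0.5)   # C = 1 and G = 1 here

def k_of_x(x, guess=200.0):
    # Newton's method on G(y) - x = 0, with G'(y) = ln(y + 1/2)/(y + 1/2)^2;
    # this stands in for the TI-83's solve( with the same first guess of 200.
    y = guess
    for _ in range(60):
        y -= (G(y) - x) * (y + 0.5) ** 2 / LN(y + 0.5)
        y = max(y, 1.0)               # keep the iterate inside G's domain
    return math.floor(y) + 1          # only the greatest integer is needed

def err(x):
    k = k_of_x(x)
    return f(k) / (G(k) - G(k - 1)) - 1.0

n = 100
a, b = G(n), 0.9999                   # the text's upper bound, just short of 1
steps = 4000
h = (b - a) / steps
integral = h * sum(err(a + (i + 0.5) * h) for i in range(steps))
psum = sum(f(k) for k in range(1, n + 1))
estimate = psum + (1.0 - G(n)) + integral
```

The pieces match the worked values: psum ≈ 0.778271233071, 1 − G(100) ≈ 0.055822464950, the integral lands close to the TI-83's −3.006949×10⁻⁶, and estimate agrees with 0.834090691 to roughly eight decimal places.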

At this point, we know how to use the Improved Kummer's method to evaluate some moderately difficult infinite series with high precision, but we have yet to find out how well it works on edge cases. In the first two examples, the sequences were constructed from continuous functions that we arbitrarily restricted to integers, so they were quite malleable, and hence with adjustments we might have been able to use some of the common series acceleration methods described above. Next, we analyze a family of sequences for which this is not the case. As with the Riemann zeta function, we can define some functions as an infinite series. We call the transform k→[k/x] for x > 1 a dilation of a series because it slows {f(k)} by making it repeat some of its terms. It immediately follows that ∑_{k=0}^∞ f([k/x]) = x∑_{k=0}^∞ f(k) for all x ∈ ℕ, and ∑_{k=0}^∞ f([k/x]) = [x]∑_{k=0}^∞ f(k) + ∑_{k=0}^∞ f([k/(x − [x])]) for all positive x ∉ ℕ. It is easier to see how these properties hold with simple examples. If x = 5, then f([k/x]) is frozen for all k over the interval [5n, 5n + 4] with n ∈ ℕ, so each term of {f(k)} is summed five times. If x = √5, then each term of {f(k)} is summed at least twice and about √5 − 2, or 23.6%, of terms are summed three times. Which terms are summed three times is determined by the sequence {[k/(√5 − 2)]} = {[(√5 + 2)k]} ≈ {[4.236k]}, whose terms are spaced either four or five units apart. The formal proofs of these twin properties are elementary and left for the reader. When explaining Kummer's method, we set F = ∑_{k=1}^∞ f(k). Next, we denote the dilation function as F(x) and set F(x) = ∑_{k=0}^∞ f([k/x]) for a positive monotone decreasing sequence {f(k)}. The dilation function has several more interesting properties for positive x ∉ ℕ. First, dilations can be used to partition the sum. If x ∈ ℚ, then the dilation adds or subtracts terms from the sum in arithmetic progression.
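The twin dilation identities are easy to spot-check numerically. The sketch below is our own; it uses the fast-decaying test sequence f(k) = 2⁻ᵏ (our choice, so that modest partial sums stand in for the infinite sums) and also checks the golden-ratio reciprocity property discussed in this section:

```python
import math

f = lambda k: 0.5 ** k                        # positive, monotone decreasing
F = lambda x, N=2000: sum(f(math.floor(k / x)) for k in range(N))

s5 = math.sqrt(5.0)
total = sum(f(k) for k in range(200))         # sum_{k>=0} f(k) = 2 here

# Integer dilation:      F(5) = 5 * sum_{k>=0} f(k) = 10.
# Non-integer dilation:  F(sqrt 5) = [sqrt 5]*sum f(k) + F(sqrt 5 - 2).
lhs = F(s5)
rhs = 2 * total + F(s5 - 2)

# Beatty reciprocity at the golden ratio: F(1/phi) + F(1/phi^2) = F + 2 f(0),
# where F = sum_{k>=1} f(k) = 1 for this test sequence.
phi = (1 + s5) / 2
beatty = F(1 / phi) + F(1 / phi ** 2)
```

All three checks come out equal to within float rounding, which is a reassuring sanity check before trusting the identities in the harder examples that follow.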
For example, F(⅗) = ∑_{k=0}^∞ (f(5k) + f(5k + 1) + f(5k + 3)), F(¼) = ∑_{k=0}^∞ f(4k), and F(1¼) = ∑_{k=0}^∞ (f(k) + f(4k)). If x ∉ ℚ, then we can use a Beatty sequence to partition a sum into two sums with distinct terms except for f(0). For example, if x is the golden ratio, φ = (√5 + 1)/2, then F(φ⁻¹) + F(φ⁻²) = F + 2f(0). So, the dilation function has a reciprocity property for irrational arguments. Second, F(x) is what you might call a beast of real analysis. Since its argument is nested within the greatest integer function, its slope, defined at x ∉ ℚ, is zero. However, F(x) has a right discontinuity at all x ∈ ℚ. For example, the magnitude of the right discontinuity at F(⅗) is |∑_{k=1}^∞ (f(5k) − f(5k − 1))|, which converges because ∑_{k=1}^∞ |f(k)| converges. And since {f(k)} is monotone decreasing, all right discontinuities in F(x) are positive. So, F(x) is defined for all x even though all its discontinuities are positive and its set of discontinuities in any open interval is dense. Our next question is whether we can evaluate F(x) with precision.

Example 3: Let f(k) = sin(π/[(k + 1)^1.9]), where sine is in radians. Approximate F(1) and F(√5) by summing the first 500 terms and the first 900 terms and then using the Improved Kummer's method. Find G(k) on your own.

Solution: We start by finding a G(k) for F(1). Since |sin(π/[(k + 1)^1.9]) − sin(π/(k + 1)^1.9)| < 1.5×10⁻¹⁰ for all k > 500, we ignore the greatest integer function when choosing G(k). Since lim_{x→∞} (x + 1)^1.9 sin(π/(x + 1)^1.9) = π, g(x) is roughly a constant multiple of (x + 1)^−1.9, and integrating yields ∫(x + 1)^−1.9 dx = c − (10/9)(x + 1)^−0.9. We shift the argument by ½ for optimal precision and set c = G = 1 for convenience. Then G(k) = 1 − (10/9)(k + 1½)^−0.9, C = lim_{k→∞} sin(π/[(k + 1)^1.9])/((10/9)(k + ½)^−0.9 − (10/9)(k + 1½)^−0.9) = π, and k(x) = [(.9 − .9x)^(−10/9) − ½]. On the TI-83, we set Y1 = int((.9−.9X)^(-10/9)−.5) and Y2 = .9sin(π/int((X+1)^1.9))/((X+.5)^-.9−(X+1.5)^-.9)−π.
We set the upper bound for the integral at 0.9999 to minimize rounding error. The additional error we create from adjusting the upper bound cannot exceed a trillionth. Next, we find a G(k) for F(√5). Using the properties of dilations explained above, F(√5) = 2F(1) + F(√5 − 2). In addition, F(√5 − 2) = ∑_{k=0}^∞ f([k/(√5 − 2)]) and f([k/(√5 − 2)]) = f([(√5 + 2)k]) ≈ f((√5 + 2)k − ½) for large k. So, our task is to evaluate ∑_{k=0}^∞ sin(π/[[(√5 + 2)k + 1]^1.9]), and we can do so by setting G(k) = 1 − (10/9)((√5 + 2)k + 1)^−0.9. Then C = lim_{k→∞} sin(π/[[(√5 + 2)k + 1]^1.9])/((10/9)((√5 + 2)(k − 1) + 1)^−0.9 − (10/9)((√5 + 2)k + 1)^−0.9) = π(√5 − 2), and k(x) = [(√5 − 2)((.9 − .9x)^(−10/9) − 1)] + 1. We set the upper bound for the integral at 0.99999 to minimize rounding error. The additional error we create from adjusting the upper bound cannot exceed fifty trillionths. We summarize our results in the table below for various n. Quantities, q, with an asterisk denote the estimate given by the TI-83.

                                     F(1), n = 500    F(1), n = 900    F(√5 − 2), n = 500   F(√5 − 2), n = 900
q1 = ∑_{k=1}^n sin(π/[(k + 1)^1.9]) for F(1), or
     ∑_{k=0}^n sin(π/[[(√5 + 2)k + 1]^1.9]) for F(√5 − 2)
                                     2.377681766888   2.382997454286   0.291472603127       0.291815731185
q2 = C(G − G(n))                     0.012961746386   0.007646068867   0.000836412110       0.000492897988
q3 = ∫_{G(n)}^{G−ε} Y2(Y1(X))dx*     0.000000009540   0.000000002827   −.000000529688       −.000000193408
q4 = q1 + q2 + q3*                   2.390643522814   2.390643525980   0.292308485549       0.292308435765
|q(4, 900) − q(4, 500)|              3.166×10⁻⁹       3.166×10⁻⁹       4.9784×10⁻⁸          4.9784×10⁻⁸

The Improved-Kummer’s method did a decent job evaluating F(x) at x = 1 and x = √5 − 2, but it did not do nearly as well as we had hoped. Although {f([k/x])} is monotone decreasing, the discontinuities created by the greatest integer function prevented {f([k/x])/g(k) − C} from decreasing monotonically. That made the integrand, Y2(Y1(X)), very jagged and that made it difficult for the TI-83 to evaluate it with precision. Although it is difficult to formally prove, we believe there is no elementary function that we can choose for G(k) to correct for this. Even so, the integral gave us about one extra digit of precision after our application of Kummer’s method. In conclusion, our estimate of F(√5) from summing the first 500 terms of {f([(√5 + 2)k])} is F(√5) ≈ 5.073595531177 and our estimate of F(√5) from summing the first 900 terms of {f([(√5 + 2)k])} is F(√5) ≈ 5.073595487725. Either way, our estimates of F(√5) differ by less than 50 billionths. Since our objective is a series acceleration method with high-precision, we are not satisfied with this result and we try to improve upon it. Fortunately, we can achieve a significant improvement using a smoothing technique on the integrand. To the best of our knowledge, this opportunity is unique to our acceleration method because ours is the only one that attempts to compute the error term with high-precision. Our idea is simply to sum blocks of ten consecutive terms of {f(k)} at a time and adjust our G(k) and k(x) accordingly. Before proceeding, we address two immediate problems you might have with this approach. First, you might consider this cheating because we could just as easily sum several hundred more terms of {f(k)} before applying Kummer’s method. Our answer to that is if the level of precision you need requires a smoothing technique, then you certainly should use more computing power. 
You can sum several hundred more terms first, and summing the first ten thousand terms using a modern computer would not be unreasonable. You can also sample a few hundred more points on the integrand. However, both approaches have diminishing returns. Further, the graph of y = f(x)/g(x) − C is jagged, so a second application of Kummer's method would be very difficult if not impossible. Our argument is that after these measures have been exhausted, the best use of any additional computing power is smoothing the integrand. Second, you might wonder why we recommend summing blocks of terms at a time instead of computing a moving average of a finite sum of consecutive terms. Our answer is that what is most precious to us is not continuity, but monotonicity of the integrand. We can agree that the graph of the integrand is jagged because the differences between some consecutive terms in {f([k/x])/g(k)} and C are larger than expected, while the differences between other consecutive terms and C are smaller than expected and may even differ in sign. It follows that summing groups of terms at a time can even out such differences. Even using a simple moving average, these differences, though much smaller, still depend on only two terms, namely the difference between the first term in the finite sum and the term just after the last term in the finite sum. However, by summing the terms in blocks, these differences depend on all the terms in each block. And the value of the integrand over the interval of each block is constant. Even summing blocks of ten consecutive terms at a time is not enough to make the integrand monotone in this example; however, it is much less jagged than before. So, our new f(k) = ∑_{i=1}^{10} f(10k + i) = ∑_{i=1}^{10} sin(π/[[(√5 + 2)(10k + i) + 1]^1.9]), and we adjust our G(k) and k(x) accordingly. For graphing on the TI-83, we set k = x and implement ∑_{i=1}^{10} f(10k + i) as sum(seq(sin(π/int(int((√(5)+2)(10X+I)+1)^1.9)),I,1,10)).
We replace G(k) with G(10(k + 1)), so our new G(k) = 1 − (10/9)((√5 + 2)·10(k + 1) + 1)^−0.9. Therefore, our new k(x) is [.1(√5 − 2)((.9 − .9x)^(−10/9) − 1)]. It is still the case that C = π(√5 − 2) and G = 1. Since we still use the sums of the first 500 and first 900 terms of {f(k)} to estimate F(√5 − 2), the lower bounds of the integral, ∫_{G(n)}^{G−ε} Y2(Y1(X))dx, for our estimates remain unchanged; all that changes is the value of n. That is, G(49) ≈ 0.998872196702 and G(89) ≈ 0.999335385069. Further, we still set the upper bound at 0.99999. We now have everything we need to evaluate ∫_{G(n)}^{G−ε} Y2(Y1(X))dx. We summarize our new results in the table below for various n. Quantities, q, with an asterisk denote the estimate given by the TI-83.

                                     F(√5 − 2), n = 49   F(√5 − 2), n = 89
q1 = ∑_{k=0}^n ∑_{i=1}^{10} sin(π/[[(√5 + 2)(10k + i) + 1]^1.9])
                                     0.291472603127      0.291815731185
q2 = C(G − G(n))                     0.000836412110      0.000492897988
q3 = ∫_{G(n)}^{G−ε} Y2(Y1(X))dx*     −.000000574641      −.000000188162
q4 = q1 + q2 + q3*                   0.292308440596      0.292308441011
|q(4, 900) − q(4, 500)|              4.15×10⁻¹⁰          4.15×10⁻¹⁰

We believe the results speak for themselves. Our objective was to invent a series acceleration method that is easy to use and can sum any monotone decreasing sequence with high-precision. And we believe the results show that we met this objective. By summing ten terms at a time only in the numerator of the integrand, we used far less than ten times the computing power to evaluate the integral, but doing so cut the disagreement between our estimates by more than a hundred-fold.

Section 6.2: Fourier Series Acceleration

Let {osc(nx)} denote an oscillating sequence that is bounded and periodic for all n ∈ ℕ. This section discusses a few methods for evaluating ∑_{n=1}^∞ f(n)osc(nx) exactly or numerically approximating it with high precision. The reader should note that this section does not discuss the Dini-Lipschitz test or any other method for determining whether a Fourier series converges, or any other properties of Fourier series except in specialized instances. The reader is expected to have read Section 1.2 and have at least medium-level knowledge of Fourier series. For the remainder of this section, we make four basic assumptions about {f(n)osc(nx)} for all n ∈ ℕ. Our first assumption is that {f(n)} is either a positive monotone decreasing infinite sequence or can be expressed as a finite linear combination of such sequences. Our second assumption is that ∑_{n=1}^∞ f(n)osc(nx) is logarithmically convergent. Our third assumption is that in order to numerically approximate ∑_{n=1}^∞ f(n)osc(nx) with high precision (or to any desired degree of precision) within a reasonable amount of running time, summing the first hundred terms is sufficient iff |∑_{n=101}^∞ f(n)osc(nx)/∑_{n=1}^∞ f(n)osc(nx)| < 10⁻¹². Our fourth assumption is that ∑_{n=1}^∞ f(n)osc(nx) is a Fourier series, and hence osc(nx) is either a sine function of x, a cosine function of x, or a finite linear combination of them. The Fourier series for a periodic function has the form F(x) = a₀/2 + ∑_{n=1}^∞ {aₙcos(nπx/t) + bₙsin(nπx/t)} (not to be confused with the dilation transform from the last section). If the series converges, then its last two components are each of the form ∑_{n=1}^∞ f(n)osc(nx). We can evaluate certain forms of F(x) exactly and accelerate other forms by expressing its trigonometric series in terms of the greatest integer function.
We conclude with a technique for transforming a Fourier series into a positive monotone decreasing infinite series so that it can be accelerated to any degree of precision using conventional techniques.

We can evaluate some Fourier series exactly or with high precision using successive antiderivatives of the sawtooth wave function. Near the end of Section 1.2, we showed that each of its antiderivatives is of the form fₖ(x) = 2∑_{n=1}^∞ sin(2πnx)/(2πn)^k for odd k and fₖ(x) = 2∑_{n=1}^∞ cos(2πnx)/(2πn)^k for even k. So, we can evaluate exactly all Fourier series that are a linear combination of a finite number of these forms, and we can approximate with high precision all Fourier series that are a linear combination of an infinite number of these forms. For convenience, we would rather work with a variant of the standard Clausen functions, Slₖ(x) = ∑_{n=1}^∞ sin(nx)/n^k for odd k and Slₖ(x) = ∑_{n=1}^∞ cos(nx)/n^k for even k. Since we believe it is more natural to multiply the index by π, we deviate slightly from convention and define SLₖ(x) as Slₖ(πx)/π^k, so that SLₖ(x) = ∑_{n=1}^∞ sin(nπx)/(nπ)^k for odd k and ∑_{n=1}^∞ cos(nπx)/(nπ)^k for even k. So, we make the necessary adjustments to the polynomial representation for fₖ(x) to get its Fourier series in this form. Our results are presented in Table 1 below.

Table 1
         Polynomial Representation for SLk(x), with t = x/2 − [x/2]       Fourier Series for SLk(x)
SL1(x)   −t + ½,  x/2 ∉ ℤ                                                 ∑_{n=1}^∞ sin(nπx)/(nπ)
SL2(x)   t² − t + 1/6                                                     ∑_{n=1}^∞ cos(nπx)/(nπ)²
SL3(x)   (2/3)t³ − t² + (1/3)t                                            ∑_{n=1}^∞ sin(nπx)/(nπ)³
SL4(x)   −(1/3)t⁴ + (2/3)t³ − (1/3)t² + 1/90                              ∑_{n=1}^∞ cos(nπx)/(nπ)⁴
SL5(x)   −(2/15)t⁵ + (1/3)t⁴ − (2/9)t³ + (1/45)t                          ∑_{n=1}^∞ sin(nπx)/(nπ)⁵
SL6(x)   (2/45)t⁶ − (2/15)t⁵ + (1/9)t⁴ − (1/45)t² + 1/945                 ∑_{n=1}^∞ cos(nπx)/(nπ)⁶
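The entries in Table 1 can be spot-checked by comparing each polynomial against a truncated version of its Fourier series. The following Python check is our own; the truncation points are chosen so that the neglected (absolutely convergent, for k ≥ 2) tails sit below the comparison tolerances:

```python
import math

def SL_poly(k, x):
    """Polynomial representations from Table 1, with t = x/2 - [x/2]."""
    t = x / 2 - math.floor(x / 2)
    return {
        1: -t + 1/2,
        2: t**2 - t + 1/6,
        3: (2/3)*t**3 - t**2 + t/3,
        4: -(1/3)*t**4 + (2/3)*t**3 - (1/3)*t**2 + 1/90,
        5: -(2/15)*t**5 + (1/3)*t**4 - (2/9)*t**3 + (1/45)*t,
        6: (2/45)*t**6 - (2/15)*t**5 + (1/9)*t**4 - (1/45)*t**2 + 1/945,
    }[k]

def SL_series(k, x, N):
    """Partial Fourier series for SL_k(x): sine for odd k, cosine for even k."""
    osc = math.sin if k % 2 else math.cos
    return sum(osc(n * math.pi * x) / (n * math.pi) ** k for n in range(1, N + 1))
```

The slowly (conditionally) convergent k = 1 series needs many more terms than the others, which is exactly why the closed-form polynomials are worth having.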

Armed with the information in Table 1, we would be ready to begin, except that Fourier series can appear in a variety of forms that we must first convert into a linear combination of the forms presented above. Fourier series can appear in the form ∑_{n=1}^∞ f(n)osc_i(nx)osc_j(ny), where x and y are real variables. If a component of a Fourier series is of the form ∑_{n=1}^∞ f(n)osc(nx) or ∑_{n=1}^∞ f(n)osc_i(nc)osc_j(nx) for a constant c ∈ ℚ, then the terms in the underlying sequence could be a linear combination of one or more of the following four common forms: even terms {f(2n)osc(2nx)}, alternating positive and negative terms {−(−1)ⁿf(n)osc(nx)}, odd terms {f(2n − 1)osc((2n − 1)x)}, or odd alternating positive and negative terms {−(−1)ⁿf(2n − 1)osc((2n − 1)x)}. We devote the next few paragraphs to converting series from each of these forms to SLₖ(x). If the Fourier series appears as a product of cosine and/or sine terms, then we can use trigonometric identities to express the product of sine and cosine functions as a sum of them. It follows from the sum and difference identities for sine and cosine that 2sin(x)cos(y) = sin(x + y) + sin(x − y); 2cos(x)sin(y) = sin(x + y) − sin(x − y); 2cos(x)cos(y) = cos(x + y) + cos(x − y); and 2sin(x)sin(y) = cos(x − y) − cos(x + y). If F(x) = ∑_{n=1}^∞ osc(nπx)/(nπ), then we can express the first three of its four common forms in terms of F(x) and F(2x). Since F(x) = ∑_{n=1}^∞ osc(nπx)/(nπ) is the sum of all terms in {osc(nπx)/(nπ)}, F(2x)/2 = ∑_{n=1}^∞ osc(2nπx)/(2nπ) is the sum of all even terms in {osc(nπx)/(nπ)}. It follows that F(x) − F(2x)/2 is the sum of all odd terms in {osc(nπx)/(nπ)} and F(x) − F(2x) is the sum of all alternating terms in {osc(nπx)/(nπ)}. However, there is no way to express ∑_{n=1}^∞ −(−1)ⁿosc((2n − 1)πx)/((2n − 1)π) as a finite linear combination of F(cx). To see this, we use SL1(x) as a counterexample.
We start by using the identity ∑_{n=1}^∞ −(−1)ⁿsin((2n − 1)πx)/((2n − 1)π) = ∑_{n=1}^∞ sin(nπ/2)sin(nπx)/(nπ) = ∑_{n=1}^∞ (cos(nπ(x − ½)) − cos(nπ(x + ½)))/(2nπ). Notice that ∑_{n=1}^∞ (cos(nπ(x − ½)) − cos(nπ(x + ½)))/(2nπ) diverges at x = ½; however, the series for SL1(x) converges for all x, and hence the odd alternating sine series cannot be expressed as a finite linear combination of sine series with sequences of the form {sin(ncπx)/(nπ)}. On the other hand, since sin(nπ/2)cos(nπx)/(nπ) = (sin(nπ(x + ½)) − sin(nπ(x − ½)))/(2nπ), (SL1(x + ½) − SL1(x − ½))/2 is the cosine series with odd alternating terms. Our results are summarized in Table 2 below for k = 1 and k = 2. The reader is expected to be familiar enough with infinite series to be able to express each of the four common forms in terms of SLₖ(x) without referring to Table 2.

Table 2
                           Even terms   Odd terms            Alternating terms   Odd Alternating terms
∑_{n=1}^∞ sin(nπx)/(nπ)    SL1(2x)/2    SL1(x) − SL1(2x)/2   SL1(x) − SL1(2x)    —
∑_{n=1}^∞ cos(nπx)/(nπ)    —            —                    —                   (SL1(x + ½) − SL1(x − ½))/2
∑_{n=1}^∞ sin(nπx)/(nπ)²   —            —                    —                   (SL2(x − ½) − SL2(x + ½))/2
∑_{n=1}^∞ cos(nπx)/(nπ)²   SL2(2x)/4    SL2(x) − SL2(2x)/4   SL2(x) − SL2(2x)/2  —
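The two odd-alternating entries in Table 2 are the least obvious, so they are worth verifying numerically. The sketch below is our own Python check; the conditionally convergent k = 1 series gets a loose tolerance, since its partial sums converge slowly:

```python
import math

PI = math.pi

def SL1(x):                      # from Table 1; x/2 must not be an integer
    t = x / 2 - math.floor(x / 2)
    return 1/2 - t

def SL2(x):
    t = x / 2 - math.floor(x / 2)
    return t * t - t + 1/6

x = 0.2
# Odd alternating cosine, k = 1: sum of -(-1)^n cos((2n-1)pi*x)/((2n-1)pi).
lhs1 = sum(-(-1) ** n * math.cos((2*n - 1) * PI * x) / ((2*n - 1) * PI)
           for n in range(1, 100001))
rhs1 = (SL1(x + 0.5) - SL1(x - 0.5)) / 2

# Odd alternating sine, k = 2: sum of -(-1)^n sin((2n-1)pi*x)/((2n-1)pi)^2.
lhs2 = sum(-(-1) ** n * math.sin((2*n - 1) * PI * x) / ((2*n - 1) * PI) ** 2
           for n in range(1, 100001))
rhs2 = (SL2(x - 0.5) - SL2(x + 0.5)) / 2
```

At x = 0.2 the closed forms give rhs1 = ¼ (the familiar square-wave value) and rhs2 = 1/20, and both partial sums agree to within their truncation errors.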

 3  n 3 Example 1: Evaluate each sum exactly: a. ∑ 푛=1sin(10n)/n and b. ∑ 푛=1(−1) cos(10n − 5)/(10n − 5) .  3  3 3 Solution: a. Since SL3(x) = ∑ 푛=1sin(nx)/(n) , ∑ 푛=1sin(10n)/n =  SL3(10/) ≈ −.457240. b. Since  n 3  3  ∑ 푛=1(−1) cos(10n − 5)/(10n − 5) = ∑ 푛=1−sin(n/2)cos(5n)/(5n) = ∑ 푛=1−(sin(n(/2 + 5))/2 + sin(n(/2 − 3  3  n 3 3 5)))/(5n) and SL3(x) = ∑ 푛=1sin(nx)/(n) , ∑ 푛=1(−1) cos(10n − 5)/(10n − 5) = −(/5) (SL3(5/ + ½)/2 − SL3(5/ − ½)) ≈ −.002579. 8 (−1)ⁿ ₁ ₁ Example 2: Let F(x, y) = x + ∑  sin( (2n − 1)x)cos( (2n − 1)y). Express F(x, y) in closed-form. 2 푛=1 (2푛−1)2 ² ² Solution: It follows from the above trigonometric identities that 8 (−1)ⁿ ₁ ₁ 2 a) x + ∑  sin( (2n − 1)x)cos( (2n − 1)y) = x + ∑  −8sin(n/2)sin(nx/2)cos(ny/2)/(n) ; and 2 푛=1 (2푛−1)2 ² ² 푛=1 b) x − 8sin(n/2)sin(nx/2)cos(ny/2)/(n)2 = x − 4sin(n/2)(sin(n(x + y)/2) + sin(n(x − y)/2))/(n)2 = x + (−2cos(n(x + y − 1)/2) + 2cos(n(x + y + 1)/2) − 2cos(n(x − y − 1)/2) + 2cos(n(x − y + 1)/2))/(n)2.  2 Since SL2(x) = ∑ 푛=1cos(nx)/(n) , we simplify further by expressing the sum in terms of SL2((x − y − 1)/2). Therefore, F(x, y) = x − 2SL2((x + y − 1)/2) + 2SL2((x + y + 1)/2) − 2SL2((x − y − 1)/2) + 2SL2((x − y + 1)/2). It turns out that when x is fixed so that F(x, y) = F(y), then F(y) is the inverse Laplace transform of F(s) = sinh(xs)/(s2cosh(s)) for 0 < x < 1. It is common for inverse Laplace transforms of elementary functions to have complicated Fourier series representations like the one above. k Even if the an or bn component in a Fourier series is not a finite linear combination of terms of the form ck/n , we might still be able to use SLk(x) to accelerate the sine or cosine series provided that the an or bn component can

k be expressed as an infinite Laurent series of terms of the form ck/n for kℕ. More specifically, we can use  2 SLk(x) to accelerate Fourier series of the form ∑ 푛=1bnsin(nx/t) provided bn is of the form cn/f(n ) and we can use  2 SLk(x) to accelerate Fourier series of the form ∑ 푛=1ancos(nx/t) provided an is of the form c/f(n ). Notice that both forms exclude the odd alternating series and Cl-type Clausen functions. If the series is odd alternating, then it must first be expressed as a linear combination of these forms and if the series is that for a Clausen function for k > 1, then it can still be accelerated using advanced techniques, one of which is discussed at the end of this section. We already illustrated how well this acceleration method works in the previous section.  2 Example 3: Round ∑ 푛=1cos(2n/9)/(4n − 1) to the nearest trillionth using Laurent series. 2 −2 −2 −2 −4 −6 Solution: Since an = a(n) = 1/(4n − 1) = (2n) /(1 − (2n) ) = (2n) + (2n) + (2n) + …,  2 3  2k  2 −2 −4 ∑ 푛=1cos(2n/9)/(4n − 1) = ∑ 푘=1∑ 푛=1cos(2n/9)/(2n) + ∑ 푛=1{1/(4n − 1) − (2n) − (2n) − −6 2 4 6  2 −2 −4 (2n) }cos(2n/9) = (/2) SL2(2/9) − (/2) SL4(2/9) + (/2) SL6(2/9) + ∑ 푛=1{1/(4n − 1) − (2n) − (2n) − −6 100 2 −2 (2n) }cos(2n/9) ≈ 0.167539580883 + 0.047849623987 + 0.011996534949 + ∑ 푛=1{1/(4n − 1) − (2n) − (2n)−4 − (2n)−6}cos(2n/9). Since the partial sum is 0.003992267769 to 12 decimal places, our estimate  2 for ∑ 푛=1cos(2n/9)/(4n − 1) to 12 decimal places is 0.231378007588. It turns out that we can determine the  2 exact value of the sum. Since |sin(x)| has the Fourier series 2/ − (4/)∑ 푛=1cos(2nx)/(4n − 1),  2 ∑ 푛=1cos(2n/9)/(4n − 1) = (2 − sin(/9))/4 ≈ 0.231378007587. Since our estimate for the sum is the sum of four estimates to 12 decimal places, we attribute the difference of one-trillionth to round-off error, not a weakness in our method. Therefore, this method is probably sufficient for all practical purposes.
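Examples 1 and 3 can be checked against brute-force partial sums. The following Python sketch is our own, built on the Table 1 polynomials; it confirms the SL3 closed forms of Example 1 and the Laurent-accelerated value of Example 3 against the exact value from the |sin x| Fourier series:

```python
import math

PI = math.pi

def SL(k, x):
    """Table 1 polynomials for the even/odd k needed below."""
    t = x / 2 - math.floor(x / 2)
    return {2: t**2 - t + 1/6,
            3: (2/3)*t**3 - t**2 + t/3,
            4: -(1/3)*t**4 + (2/3)*t**3 - (1/3)*t**2 + 1/90,
            6: (2/45)*t**6 - (2/15)*t**5 + (1/9)*t**4 - (1/45)*t**2 + 1/945}[k]

# Example 1a: sum sin(10n)/n^3 = pi^3 * SL3(10/pi).
ex1a = PI**3 * SL(3, 10 / PI)
brute1a = sum(math.sin(10 * n) / n**3 for n in range(1, 20001))

# Example 1b: sum (-1)^n cos(10n-5)/(10n-5)^3.
ex1b = -(PI / 5)**3 * (SL(3, 5/PI + 0.5) - SL(3, 5/PI - 0.5)) / 2
brute1b = sum((-1)**n * math.cos(10*n - 5) / (10*n - 5)**3 for n in range(1, 20001))

# Example 3: Laurent acceleration of sum cos(2 pi n/9)/(4n^2 - 1).
head = ((PI/2)**2 * SL(2, 2/9) + (PI/2)**4 * SL(4, 2/9) + (PI/2)**6 * SL(6, 2/9))
corr = sum((1/(4*n*n - 1) - (2*n)**-2 - (2*n)**-4 - (2*n)**-6)
           * math.cos(2 * PI * n / 9) for n in range(1, 101))
ex3 = head + corr
exact3 = (2 - PI * math.sin(PI / 9)) / 4   # from the Fourier series of |sin x|
```

In full double precision, ex3 and exact3 agree far beyond the trillionth place, supporting the text's attribution of the one-trillionth discrepancy to 12-digit round-off.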

We can evaluate any Fourier series to any degree of precision with polynomial time complexity iff we can sum its coefficients to any degree of precision with polynomial time complexity. Since the sequence of terms oscillates to its limit, the oscillation may appear to be random at first glance. However, if x/t ∈ ℚ, then the Fourier series is not a random oscillator because its sine and cosine terms are cyclic. We make our argument only for the sine series because it can be applied to the cosine series with minor adjustments. We leave it to the reader to verify the following identities. If |x/t| = c/d in lowest terms with c even, then ∑_{n=1}^∞ bₙsin(nπx/t) = ∑_{i=1}^{d−1} (∑_{n=1}^∞ b_{dn+i−d})sin(iπx/t). If |x/t| = c/d in lowest terms with c odd, then ∑_{n=1}^∞ bₙsin(nπx/t) = ∑_{i=1}^{d−1} (∑_{n=1}^∞ (−1)^{n−1}b_{dn+i−d})sin(iπx/t). Since there are a finite number of distinct sine terms in each double sum, they can be replaced by constants, and hence the Fourier series can be accelerated using conventional techniques iff the sum of each subsequence, ∑_{n=1}^∞ b_{dn+i−d} (or ∑_{n=1}^∞ (−1)^{n−1}b_{dn+i−d}, as the case may be), can be accelerated using conventional techniques. If x/t is irrational, or rational with a very large value of d in lowest terms, and the Fourier series is continuous at x/t, then we can still approximate the value of the Fourier series at x/t using at least two good rational approximations of x/t where d is not large. The only caveats are that we need at least one good rational approximation just below x/t and at least one good rational approximation just above x/t, and the Fourier series cannot have a discontinuity between the rational approximations of x/t. The best way to find a few good rational approximations of x/t with a numerator not exceeding 100 is to use the method presented in Section 5.1.
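The regrouping identity for rational x/t can be verified directly. Our Python check below takes bₙ = 1/n² and x/t = 2/9 (so c = 2 is even and d = 9), and sums the same d·M terms on both sides; the i = d residue class is dropped on the right because sin(dπc/d) = sin(cπ) = 0.

```python
import math

PI = math.pi
b = lambda n: 1.0 / n**2
c, d = 2, 9                      # x/t = 2/9 in lowest terms, c even
M = 400                          # number of full periods; d*M terms per side

# Left side: the sine series summed term by term.
lhs = sum(b(n) * math.sin(n * PI * c / d) for n in range(1, d * M + 1))

# Right side: one constant sine value per residue class i mod d,
# weighted by the subsequence sum of its coefficients.
rhs = sum(sum(b(d * n + i - d) for n in range(1, M + 1)) * math.sin(i * PI * c / d)
          for i in range(1, d))
```

Both sides agree to machine precision, which is the point of the identity: the right side replaces an oscillating series by d − 1 monotone coefficient sums that conventional acceleration methods can handle.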
We can sum the Fourier series precisely at each rational approximation of x/t using conventional techniques and then use polynomial interpolation to approximate the Fourier series at x/t. As for discontinuities, it is a good idea to locate them before using an acceleration technique. A crude approach is summing the first 25 terms or so at a few points near x/t and looking for a large gap between two points relative to that between other points. But we can usually do much better. If the {aₙ} and {bₙ} terms in the Fourier series have a Laurent series, then we can locate and remove any discontinuities in the Fourier series. The c₁osc_i(nx)osc_j(ny)/n¹ term is either a finite linear combination of ((ax + by)) or of Cl1(ax + by) = −ln(4sin²((ax + by)/2))/2, (ax + by)/(2π) ∉ ℤ. At discontinuities, ((x)) = 0 and Cl1(x) = +∞. The c₂osc_i(nx)osc_j(ny)/n² term is either a finite linear combination of SL2 or Cl2 functions. Since the sawtooth wave function is defined everywhere, its antiderivatives are obviously continuous. It turns out that Cl2(x) is everywhere continuous with a vertical tangent at x = 2πk. It follows that if the Fourier series has a Laurent series, then any discontinuities would be in its c₁/n¹ term, and we can remove all such discontinuities. To see this, subtract the first two Laurent terms of aₙ or bₙ from itself and then differentiate the Fourier series with respect to x. We conclude this section with an outline of a method for summing a Fourier series to any degree of precision within a reasonable time complexity.

CHAPTER 7: An Introduction to ODEs with Piecewise Constant Arguments (Partially Completed)

Differential equations with piecewise-constant arguments (EPCAs) are also known as hybrid continuous-discrete time systems or differential equations with discrete time delay. Many complex dynamical systems that are studied in the sciences contain a combination of continuous and discrete arguments. Differential EPCAs can be used to model dynamical systems with continuous and discrete arguments in a continuous timeframe. The discrete arguments are best incorporated in a continuous timeframe by expressing them as the greatest integer of a continuous argument (such as time) or as the greatest integer of a function of a continuous argument. The study of differential EPCAs is relatively young, spearheaded mostly by Joseph Wiener and Kenneth Cooke in the 1980s. This chapter is a brief introduction to the subject. In Section 2.3, we demonstrated that a first-order ordinary difference equation can be expressed as a first-order ordinary differential equation with the greatest integer function at integers. So, we can numerically approximate first-order ordinary differential EPCAs with an initial condition using recursive numerical approximation methods such as the improved Euler's method, and we consider this a good place to start. Later, we will demonstrate how to numerically approximate higher-order ordinary differential EPCAs with an initial condition and express more complicated dynamical systems with discrete arguments as ordinary differential EPCAs. We conclude each section with solutions to a few simple forms of ordinary differential EPCAs.

Section 7.1: First-Order ODEs with Piecewise Constant Arguments

In this section, we study first-order differential equations of the form yʹ(x) = f(x, y(x), y(g(x))) where g(x) is a step function. They are often called functional differential equations because yʹ(x) depends on the value of y at an argument other than x. If g(x) ≤ x, then they may also be referred to as delayed or retarded. And if g(x) ≥ x, then they may also be referred to as anticipatory or advanced. This terminology does not imply any problem with the differential equation or the dynamical system it represents. Since the first-order difference equation, yn+1 − yn = f(yn), and the differential equation, yʹ(x) = f(x, y([x])) with y(0) = y0, are equivalent for all x ∈ ℕ, we can evaluate f(x, y([x])) at integers using the corresponding difference equation. This relationship does not necessarily hold for higher-order differential EPCAs.

We start this section by approximating differential equations with difference equations. Suppose that the differential equation, yʹ(x) = f(x, y(x)), has the initial condition y(x0) = y0 and that f(x, y) and ∂f/∂y are continuous at all x ≥ x0. (However, if yʹ(x) also involves y(g(x)) for a step function g, then yʹ(x) need not be continuous.) In addition, let y(xn) = yn and let xn+1 = xn + ∆x. Here ∆x denotes the change in x; it is an arbitrary constant, usually chosen to be positive and small. It follows that xn = x0 + n∆x and xn+1 = x1 + n∆x. Since yn+1 = yn + (yn+1 − yn), we have yn+1 = yn + ∫[xn, xn+1] yʹ(x)dx. At this point, we must decide how we want to approximate ∫[xn, xn+1] yʹ(x)dx. For sufficiently small ∆x, ∫[xn, xn+1] yʹ(x)dx is approximately the area of a rectangle with height yʹ(xn) = f(xn, y(xn)) and width ∆x. So, yn+1 ≈ yn + ∆x·f(xn, yn) for sufficiently small ∆x. The numerical approximation of yʹ(x) = f(x, y(x)) using the recurrence, yn+1 = yn + ∆x·f(xn, yn), is known as Euler's method. Notice that Euler's method is exact when yʹ(x) = f(y(∆xk[x/(∆xk)])) for all k ∈ ℕ. Unfortunately, in most instances, numerical approximation of ∫[xn, xn+1] yʹ(x)dx using rectangles is not very good.
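Euler's method as derived above fits in a few lines. The routine name and the test equation yʹ = y (whose exact solution is eˣ) are our own illustrative choices, not anything from the text.

```python
import math

def euler(f, x0, y0, dx, steps):
    # Euler's method: y_{n+1} = y_n + dx*f(x_n, y_n)
    x, y = x0, y0
    for _ in range(steps):
        y += dx*f(x, y)
        x += dx
    return y

# For y' = y, y(0) = 1 the exact value at x = 1 is e. Shrinking dx by a
# factor of 10 shrinks the error by roughly a factor of 10: the global
# error is proportional to dx, as claimed below.
e1 = abs(euler(lambda x, y: y, 0.0, 1.0, 0.1, 10) - math.e)     # dx = 0.1
e2 = abs(euler(lambda x, y: y, 0.0, 1.0, 0.01, 100) - math.e)   # dx = 0.01
print(e1, e2)
```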
In fact, it can produce an error proportional to the size of ∆x. We can improve upon Euler's method by numerically approximating ∫[xn, xn+1] yʹ(x)dx using the trapezoidal rule. For sufficiently small ∆x, ∫[xn, xn+1] yʹ(x)dx is approximately the area of a trapezoid with average height, {f(xn, y(xn)) + f(xn+1, y(xn+1))}/2, and width, ∆x. So, yn+1 ≈ yn + ∆x{f(xn, y(xn)) + f(xn+1, y(xn+1))}/2. We can use this approximation by entering ∆x, xn, and yn and then solving for yn+1 implicitly. However, if we are satisfied with a decent approximation, then we do not have to settle for an implicit solution. Although the value of yn+1 is not yet known, we do know that yn+1 ≈ yn + ∆x·f(xn, yn) for sufficiently small ∆x. So, yn+1 ≈ yn + ∆x{f(xn, yn) + f(xn + ∆x, yn + ∆x·f(xn, yn))}/2 for sufficiently small ∆x. The numerical approximation of yʹ(x) = f(x, y(x)) using the recurrence, yn+1 = yn + ∆x{f(xn, yn) + f(xn + ∆x, yn + ∆x·f(xn, yn))}/2, is known as the improved Euler's method. The numerical approximation of ∫[xn, xn+1] yʹ(x)dx using a trapezoid is decent, but still not very good; it can produce an error proportional to the size of (∆x)². Although there are more powerful numerical approximation techniques such as the Taylor and Runge-Kutta methods, we do not pursue them here. Our primary objectives were to present a decent numerical approximation technique and describe the relationship between difference and differential equations. After the next example, our objectives will be met.
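The improved Euler recurrence can likewise be sketched directly; again the names and the test equation are illustrative.

```python
import math

def improved_euler(f, x0, y0, dx, steps):
    # y_{n+1} = y_n + dx*(f(x_n, y_n) + f(x_n + dx, y_n + dx*f(x_n, y_n)))/2
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)               # slope at the left endpoint
        k2 = f(x + dx, y + dx*k1)  # slope at the Euler predictor
        y += dx*(k1 + k2)/2
        x += dx
    return y

# For y' = y, y(0) = 1 the error at x = 1 now shrinks like (dx)^2:
# with dx = 0.01 it is a few parts in 10^5, versus ~0.013 for Euler's method.
approx = improved_euler(lambda x, y: y, 0.0, 1.0, 0.01, 100)
print(abs(approx - math.e))
```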

Example 1: Let yʹ(x) = y([4x]/4) − 2xy(x) and y(1.25) = 1. Set ∆x = 0.1 and numerically approximate y(x) for 1.3 ≤ x ≤ 1.9 using the improved Euler's method.

Solution: To compute y(1.3) and y(1.8), we have to cheat a bit and set the step size at 0.05 because we were not given y(1.2) and y([4x]/4) is constant over [k/4, (k + 1)/4). Using the formula to compute y(1.3) yields y(1.3) ≈ y(1.25) + 0.025{y(1.25) − 2 · 1.25 · y(1.25) + y(1.25) − 2 · 1.3 · (y(1.25) + 0.05(y(1.25) − 2 · 1.25 · y(1.25)))} = 0.927375. In this case, we cannot easily proceed backward. That is, we cannot compute values of y(x) for x < 1.25 directly because the value of y(1.0) is not given; we would first have to set ∆x = −0.25 and work backwards. Our outputs from the iterative process are provided in the table below. They were rounded to six significant figures at each step.

x      1.250   1.300   1.400   1.500   1.600   1.700   1.750   1.800   1.900
y(x)   1.000   0.927   0.797   0.668   0.549   0.453   0.406   0.359   0.283
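The first half-step of the solution can be checked directly. As in the text, the frozen argument y([4x]/4) = y(1.25) is held constant over [1.25, 1.5).

```python
frozen = 1.0                 # y(1.25), the frozen value of y([4x]/4) on [1.25, 1.5)

def f(x, y):
    # Right-hand side of Example 1 with the frozen argument treated as constant
    return frozen - 2*x*y

x, y, dx = 1.25, 1.0, 0.05   # the half step used to reach x = 1.3
k1 = f(x, y)                 # slope at (1.25, 1)
k2 = f(x + dx, y + dx*k1)    # slope at the Euler predictor
y_next = y + dx*(k1 + k2)/2
print(y_next)                # 0.927375, matching the table entry for y(1.3)
```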

If the improved Euler's method is new to you, then the above example might have been a little tricky. Although we only recognized x and y(x) as variables, y([4x]/4) is effectively a third variable whenever [4x]/4 < x. We could treat y([4x]/4) as a constant when approximating y(1.7) because its value is frozen over [1.5, 1.75), but this forces us to compute y(1.75) just to compute y(1.8), and it prevented us from easily working backwards because the value of y(1.0) is not given. For this reason, numerical approximation of differential equations with the greatest integer function in some of the arguments may require many initial conditions.

Our next objective is to construct differential equations that describe dynamical systems with continuous and discrete arguments in a continuous timeframe. Dynamical systems with continuous and discrete arguments can arise in just about any scientific discipline. We offer some practical examples of EPCAs in the next paragraphs. Our first example, from Akhmet, is a model of arterial blood pressure over time. Our assumptions are that a) blood circulation is continuous and blood pressure, P(t), decays exponentially with respect to time after each heartbeat and b) heartbeats occur at discrete and roughly uniform time intervals of length i, each instantaneously raising the blood pressure by u units. This system can be described mathematically as Pʹ(t) = −cP(t) for some positive constant, c, when t/i ∉ ℕ, and ∆P(t) = u for some positive constant, u, when t/i ∈ ℕ. It follows that P(t) = p(t) + u[t/i] where p(t) represents the cumulative drop in blood pressure over time. Further, p(t) is continuous but not differentiable at t/i ∈ ℕ, and Pʹ(t) = pʹ(t) = −c(p(t) + u[t/i]) when t/i ∉ ℕ.
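The heartbeat model above can be simulated exactly by alternating exponential decay with jumps of size u. The parameter values below (c, i, u, P0) are hypothetical, chosen only to illustrate the behavior.

```python
import math

c, i, u, P0 = 0.5, 1.0, 20.0, 100.0   # hypothetical decay rate, beat interval,
                                      # pressure boost per beat, initial pressure

def pressure(t):
    # Exact solution: P decays like e^{-c t} between beats and jumps by u
    # at each beat time t = k*i (k = 1, 2, ...).
    P, beats = P0, int(t // i)
    for _ in range(beats):
        P = P*math.exp(-c*i) + u          # decay over one interval, then a beat
    return P*math.exp(-c*(t - beats*i))   # decay over the leftover fraction

# The beat-to-beat map P -> P*e^{-ci} + u has the fixed point u/(1 - e^{-ci}),
# so the pressure settles into a periodic steady state regardless of P0.
fixed = u/(1 - math.exp(-c*i))
print(pressure(50.0), fixed)   # nearly equal after 50 beats
```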

Our last objective for this section is solving a few simple forms of first-order ODEs involving [x]. Since they are hybrid difference-differential equations, their solution requires us to solve both difference and differential equations. All of these results have been published by Wiener and Cooke.

Let yʹ(x) = ay([x]) and y(0) = y0 where a is a constant. At integers, this is equivalent to the recurrence relation yk+1 − yk = ayk; yk+1 = (a + 1)yk; and hence yk = (a + 1)^k · y0. If x ∈ ℕ, then we are done. Otherwise, x = k + f with 0 ≤ f < 1, and for all k ≤ x < k + 1, yʹ(x) = ay(k) = a(a + 1)^k · y0. Integrating with respect to x yields y(x) = yk + ∫[k, k+f] yʹ(x)dx = af·y0(a + 1)^k + (a + 1)^k · y0. Hence, y(x) = y0(ax − a[x] + 1)(a + 1)^[x]. Since yʹ(x) is defined for all x, y(x) is continuous.

Let yʹ(x) = ay(x) + by([x]) and y(0) = y0 where a and b are constants with a ≠ 0. This equation is trickier because y appears both with and without the greatest integer function on the right-hand side. Fortunately, for all k ≤ x < k + 1, yʹ(x) = ay(x) + byk. Since yk is a constant, this is a first-order linear differential equation with the solution yk(x) = ce^(ax) − byk/a, where c is a constant and yk(x) denotes the local solution just for k ≤ x < k + 1. Since k is a constant, we can set c' = ce^(ak) so that ce^(ax) = c'e^(a(x−k)). This substitution will help us express yk+1 strictly in terms of yk. Setting x = k yields y(k) = yk = c' − byk/a; c' = (b/a + 1)yk; and hence yk(x) = (e^(a(x−k))·b/a + e^(a(x−k)) − b/a)yk. Since yʹ(x) is defined for all x, y(x) is continuous. This means that yk(k + 1) = yk+1. So, yk+1 = (e^a·b/a + e^a − b/a)yk; yk = (e^a·b/a + e^a − b/a)^k · y0; and hence yk(x) = (e^(a(x−k))·b/a + e^(a(x−k)) − b/a)(e^a·b/a + e^a − b/a)^k · y0. Therefore, y(x) = (e^(a(x−[x]))·b/a + e^(a(x−[x])) − b/a)(e^a·b/a + e^a − b/a)^[x] · y0.
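The first closed form, y(x) = y0(ax − a[x] + 1)(a + 1)^[x], is easy to sanity-check numerically; the values a = 0.3 and y0 = 2 below are arbitrary test choices.

```python
import math

a, y0 = 0.3, 2.0   # arbitrary test values

def y(x):
    # Closed-form solution of y'(x) = a*y([x]), y(0) = y0
    k = math.floor(x)
    return y0*(a*x - a*k + 1)*(a + 1)**k

# At non-integer points, a central difference of y should equal a*y([x]).
for x in (0.25, 1.5, 3.75):
    k = math.floor(x)
    slope = (y(x + 1e-6) - y(x - 1e-6))/2e-6
    assert abs(slope - a*y(k)) < 1e-5

# And y is continuous at the integers: the left limit matches the value.
assert abs(y(2 - 1e-9) - y(2)) < 1e-6
print("closed form checks out")
```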

I am using the following sources to write this chapter; their partial bibliographies are listed below.
Generalized Solutions of Functional Differential Equations by Joseph Wiener. http://books.google.com/books?id=cGiPd6X88ckC
Nonlinear Hybrid Continuous/Discrete-Time Models by Marat Akhmet. http://books.google.com/books?id=Q1sSdcHwPWEC

CHAPTER 8: General Methods for Solving Diophantine Equations (Preview Only)

In Section 5.1, we introduced Diophantine equations and studied how to solve some of them, along with some of their applications. We presented fast Euclidean and Euclidean-like algorithms for solving linear Diophantine equations and Pell equations, which are by far the most important types. However, if we want to solve other types of Diophantine equations over large intervals for x, then the trial-and-error process can be time consuming. Since there are more efficient methods for solving many types of Diophantine equations, and the ability to solve them is still of great theoretical interest, we present general algorithms for solving them here. While many texts restrict solutions to the rationals, as Diophantus originally did, we go a step further and restrict solutions to the integers or positive integers, depending on the context. Given the difficulty and depth of the problem, there is no good general method for solving Diophantine equations; indeed, by Matiyasevich's resolution of Hilbert's tenth problem, no algorithm can decide whether an arbitrary Diophantine equation has integer solutions. Decent general methods are exceedingly scarce and do not work on most forms. Our method is no exception, but it stands out from the rest in that it is a Euclidean-like algorithm. It is applicable to a broad set of Diophantine equations, and on those for which it works, it is very powerful at reducing the trial-and-error intervals. Before continuing, you may want to review previous sections of this work for additional background. If you have not yet read Section 1.1 and Section 5.1, then do so now. And if you are unfamiliar with lattice points or reciprocity theorems, then review all of Chapter 4. It is important to read this chapter in order because each example, though an end in itself, is also a stepping stone for the next.

Section 8.1: Developing Theory for Solving Diophantine Equations

Three common methods for solving a broad range of Diophantine equations, or for proving that no solutions exist, are Fermat's method of descent, Gauss's method of exclusion, and unique factorization in rings. We briefly introduce each method in order of discovery, from oldest to most recent, and discuss some pros and cons of each. We also encourage the reader to seek additional sources.

Fermat's method of infinite descent is used for solving Diophantine equations of the form f(x, y) = 1. To use it, start with a solution to f(x, y) = k for some k > 1. Then, using a variety of clever manipulations of f(x, y) = k, you find a solution to f(x, y) = k' for some k' < k. By proceeding in this fashion, you are guaranteed to arrive at a solution to f(x, y) = 1 after a finite (and usually relatively small) number of steps. The Chakravala method for solving Pell equations (which precedes Fermat by several centuries) and Fermat's algorithm for solving x² + y² = p for a prime p ≡ 1 mod 4 are special cases of such descent. Euler would later use it to prove Fermat's theorem that x² + y² = p is solvable iff p ≡ 1 mod 4. It is also a powerful method for proving by contradiction that a particular type of Diophantine equation has no solutions: start by assuming it has a smallest integer solution and then prove it must have an even smaller one. A good elementary source for learning more about infinite descent is A Friendly Introduction to Number Theory, 2nd ed., by Joseph Silverman; see appendix.

Gauss's method of exclusion reduces the trial-and-error interval by excluding arithmetic progressions for x which cannot produce a solution. In Section 5.1, we introduced simple methods for proving that some Diophantine equations either cannot have solutions or cannot have solutions of a particular form (such as x ≡ 3 mod 8). Of course, when setting out to solve one, this is an important first step, but we can go further.
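Fermat's algorithm for x² + y² = p lends itself to a short sketch of descent in action: start from x² + y² = kp with k > 1 and shrink k at each step. This is one standard way to organize the descent, not a claim about Fermat's own procedure.

```python
def two_squares(p):
    # Write a prime p ≡ 1 (mod 4) as x^2 + y^2 by descent on k in x^2 + y^2 = k*p.
    # Euler's criterion finds a quadratic nonresidue n; z = n^((p-1)/4) mod p
    # then satisfies z^2 ≡ -1 (mod p), giving z^2 + 1 = k*p with 1 <= k < p.
    z = next(pow(n, (p - 1)//4, p) for n in range(2, p)
             if pow(n, (p - 1)//2, p) == p - 1)
    x, y, k = z, 1, (z*z + 1)//p
    while k > 1:
        # Reduce x, y modulo k to symmetric residues a, b in (-k/2, k/2].
        a = x % k
        b = y % k
        if a > k//2: a -= k
        if b > k//2: b -= k
        # Then a^2 + b^2 = k*k' with k' < k, and the Brahmagupta-Fibonacci
        # identity (ax+by)^2 + (ay-bx)^2 = (a^2+b^2)(x^2+y^2) divides by k^2,
        # leaving a representation of k'*p: the descent step.
        x, y, k = abs(a*x + b*y)//k, abs(a*y - b*x)//k, (a*a + b*b)//k
    return x, y

print(two_squares(13))   # (3, 2): 3^2 + 2^2 = 13
```

Each pass at least halves k, so the loop terminates after finitely many steps, exactly as the text describes.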
Gauss leveraged these techniques to exclude candidate values of x over the trial-and-error interval. By excluding residue classes that cannot produce a solution in a dozen or so different moduli, he could eliminate well over 99% of the candidate values in the interval before starting trial-and-error on the Diophantine equation. Sieving (for finding primes or divisors of a natural number) is a special case of exclusion. Unfortunately, when multiple moduli are used, the trial-and-error process is not eliminated; it is merely shifted from finding solutions of the Diophantine equation to finding the next value of x that is not excluded. In Gauss's time, this greatly simplified the arithmetic and hence accelerated the process. However, with modern computing, it is more often not worth the fuss. For this reason, his method is not used much today. Even so, it can often help us determine whether no solutions exist, and it can still reduce the trial-and-error interval if just one modulus is used. That is, when using the calculator intersection method from Section 5.1, you can replace x with mx + b for any b not excluded modulo m. You can go through the values of b until the desired number of integer solutions is found or the list is exhausted. A good illustration of Gauss's method of exclusion is given in "Replies" by Escott and Ling; see appendix.

If the Diophantine equation can be expressed as (y + k)ⁿ = p(x) where p(x) is a polynomial with integer coefficients, then it can often be solved with a non-trivial factorization of p(x) into two coprime polynomials with integer coefficients. If p(x) cannot be factored into smaller polynomials with coefficients in ℤ, it may be amenable to factorization with coefficients in ℤ[i] or ℤ[√−2] or another ring of integers extending ℤ. Any such ring can be used as long as it is a unique factorization domain (UFD). The factorization can usually be expressed as another, often much simpler, Diophantine equation which can be solved by factoring.
The factorization required is not at the difficulty level relevant to the RSA cryptosystem; rather, it is usually so trivial that it can be done without a calculator. Plenty of good sources for learning more about this method are available online. For more advanced results on Diophantine equations of this form, see the works of Axel Thue and Louis Mordell.
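Gauss's method of exclusion from the preceding discussion can be sketched on a small test case, the Mordell equation y² = x³ − 2. The moduli chosen below are illustrative; other sets work as well.

```python
import math

def allowed_residues(m):
    # Residues x (mod m) for which x^3 - 2 is congruent to a square mod m;
    # every integer solution of y^2 = x^3 - 2 must fall in one of these classes.
    squares = {y*y % m for y in range(m)}
    return {x for x in range(m) if (x**3 - 2) % m in squares}

moduli = [8, 9, 5, 7, 11, 13]
allowed = {m: allowed_residues(m) for m in moduli}

solutions, checked = [], 0
for x in range(100000):
    if all(x % m in allowed[m] for m in moduli):   # survived every exclusion
        checked += 1
        t = x**3 - 2
        if t >= 0 and math.isqrt(t)**2 == t:
            solutions.append((x, math.isqrt(t)))

print(solutions, checked)   # [(3, 5)], with far fewer than 100000 candidates tested
```

Modulo 8 alone, only x ≡ 3 survives, so a handful of moduli already removes the vast majority of the interval before any square test is performed.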

There is another general approach we can use for solving some types of Diophantine equations of the form y = f(x). In Example 1 of Section 5.1, we inverted f(x) because its domain exceeds its range. By finding integer solutions for f⁻¹(x), we used trial-and-error over a narrower interval, [0, 100] as opposed to [0, 1000]. Our objective is to successively narrow the trial-and-error interval without eliminating integer solutions. We meet this objective by performing successive iterations on our Diophantine equation with the following four steps:

______BIBLIOGRAPHY______

Funk, Gerald M., and Martin Buntinas. Applied Statistics for Science. Chicago: Loyola University of Chicago Press, 2002.

Granville, Andrew, Binomial coefficients modulo prime powers School of Mathematics, Institute for Advanced Study, Princeton, NJ 08540, USA. http://www.dms.umontreal.ca/~andrew/PDF/BinCoeff.pdf.

"Calendars" Encyclopaedia Britannica. 1992 edition.

Weisstein, Eric W. "Clausen Function." From MathWorld—A Wolfram Web Resource. http://mathworld.wolfram.com/ClausenFunction.html

Graham, Ronald L.; Knuth, Donald E.; Patashnik, Oren (1990), Concrete Mathematics, Reading Ma.: Addison-Wesley, ISBN 0-201-55802-5

Rademacher, Hans and Grosswald, Emil. Dedekind Sums. Carus Mathematical Monographs #16. Washington D.C.: The Mathematical Association of America, 1972. http://www.maths.ed.ac.uk/~aar/papers/rademacher2.pdf

C. T. Long, Elementary Introduction to Number Theory. Boston, Massachusetts: D.C. Heath, 1965.

Dence, Joseph B., Elements of the Theory of Numbers. San Diego, California: Academic Press, 1999.

"Euclid" Encyclopaedia Britannica. 1992 edition.

Rotman, Joseph J., A First Course in Abstract Algebra, Second Edition. Upper Saddle River, New Jersey: Prentice−Hall Inc., 2000.

"Floor and Ceiling Functions" Wikipedia, the free Encyclopedia. http://en.wikipedia.org/wiki/Greatest_integer_function#cite_note−17

Stein, Elias M. and Shakarchi, Rami. Fourier Analysis An Introduction. Princeton, New Jersey: Princeton University Press, 2002. http://prof.usb.ve/bueno/Libros/069111384X%20-%20Princeton%20University%20- %20Fourier%20Analysis~%20An%20Introduction%20-%20%282003%29.pdf

Silverman, Joseph H., A Friendly Introduction to Number Theory, Second Edition. Upper Saddle River, New Jersey: Prentice−Hall Inc., 2001.

Nagle, R. Kent, et al. Fundamentals of Differential Equations and boundary value problems. Reading, Massachusetts: Addison Wesley Longman, 2000.

S. Avital, et al. "The Greatest Integer Function: A Function with Many Uses" Two−Year College Mathematics Readings. 1981.

Hildebrand, A.J., Introduction to Analytic Number Theory Math 531 Lecture Notes, Fall 2005 https://faculty.math.illinois.edu/~hildebr/ant/main.pdf

"Jacobi Symbol" Wikipedia, the free Encyclopedia. http://en.wikipedia.org/wiki/Jacobi_symbol

Weisstein, Eric W. "Kummer's Series Transformation." From MathWorld—A Wolfram Web Resource. http://mathworld.wolfram.com/KummersSeriesTransformation.html

Merriam−Webster's collegiate dictionary.−10th ed. Springfield, Massachusetts: Merriam−Webster, Inc., 1993.

Basilla, Julius Magalona. "On the solution of x² + dy² = m" Proc. Japan Acad. Ser. A Math. Sci. Volume 80, Number 5 (2004), 40−41. http://projecteuclid.org/euclid.pja/1116442240

Rousseau, Suzanne. “Quadratic and Cubic Reciprocity” (2012). EWU Masters Thesis Collection. 27. http://dc.ewu.edu/thesis/27

Escott, E. B., and G. H. Ling. “Replies.” The American Mathematical Monthly, Volume 26, No. 6, 1919, pp. 239–241. JSTOR, JSTOR, www.jstor.org/stable/2973523.

Li, Kin Y. “Summation by Parts” Mathematical Excalibur Volume 11, Number 3, 2006 https://www.math.ust.hk/excalibur/v11_n3.pdf

Berndt, Bruce C., and Ulrich Dieter. "Sums involving the greatest integer function and Riemann−Stieltjes integration" Journal für die reine und angewandte Mathematik. 1982. https://gdz.sub.uni-goettingen.de/id/PPN243919689_0337?tify={"view"%3A"info"%2C"pages"%3A[212]}

Stewart, B. M., Theory of Numbers, New York, New York: MacMillan Company, 1963.

A−F______INDEX

Anton's congruence 2.2, 3.2
arithmetic function 2.1, 5.3
bar graph 1.1
Bezout's lemma 4.1, 5.1
Bernoulli polynomial 1.2, 2.3
binomial coefficient 2.1, 3.2
binary expansion 3.2
block-truncation 5.2
bracket notation 1.1
calculator-intersection 5.1
calendar formula 2.2
ceiling and floor 1.1, 2.3
change of base 3.2
Chinese remainder theorem 4.1
Chebyshev function 5.3
circle equation 8.1
congruence relation 2.2, 4.1
connected graph 2.1, 2.2, 5.1
continued fractions 4.1, 5.1
continuity 1.1, 1.2, 2.3
convergence 1.2, 2.3, 5.3, 6.1, 6.2
convergent 4.1
Cornacchia's algorithm 8.1
countable set 1.2, 2.3
cryptosystem, asymmetric 5.1
Dedekind eta function 4.3
Dedekind sum 4.3
de Polignac's formula 1.2, 3.2
derivative of [x] 1.1, 1.2
Dieter, Ulrich 4.3, 9.2
digit 3.1, 3.2
  decimal, definition of 3.1
  mixed base 3.2
  b-adic 3.2
difference equation 2.3, 7.1
differential equation 2.3, 7.1
dilation transform of series 6.1
Diophantine equation 5.1
  algorithm 8.1
Dirichlet, Peter Gustav 5.3
  divisor problem 5.3
  product (see also series) 5.3
discontinuity 1.1, 2.1, 2.2
discrete 1.1
  color pigmentation 5.2
  roots in moduli 4.2, 5.1
  reciprocal mod m 3.2, 4.3, 5.1
divisibility 2.1, 2.2, 3.1, 4.1
division algorithm 1.1, 4.1, 5.1
  see also gcd and reciprocity
edge function 5.2
Eisenstein, Gotthold Max 4.2
  integers 4.1
  lemma 4.2
Euclid's lemma 2.1, 3.2, 4.1
Euclidean Algorithm 4.1
  Extended (by Bezout) 5.1
  Ring 4.1
Euler's criterion 4.2
Euler formulas 1.2
Euler-Maclaurin formula 2.3, 6.1
Euler-Mascheroni constant 5.3, 6.1
Euler phi function 2.2, 5.1
Fermat's little theorem 2.1
Fourier, Joseph 1.2
fraction part, denoted as f 1.1
  set {nx − [nx]} in reals 1.2
function (see integer functions)

G−P______INDEX

Gauss, Carl Friedrich 4.1, 4.2
  method of exclusion 8.1
  third proof of QR 4.2
golden ratio 1.2, 4.1, 5.1
greatest common divisor, gcd 2.1
  of Eisenstein integers 4.1
  of Gaussian integers 4.1
Greatest integer axiom 1.1, 2.1
greatest integer exponent, e 2.1
gridlines 5.2
Hermite's identity 2.1, 4.3
ideal 4.1
image 5.2
imaginary inverse of [x] 2.1
Improved Euler's method 7.1
Indicator divisor 2.1
induction 1.2
integer (see also number) 1.1
  factorization 5.1, 8.2
  functions 1.1
  Gaussian and Eisenstein 4.1
  solutions to equations 5.1, 8.1
interpolation 5.2, 6.2
inverse
  imaginary inverse of [x] 1.2
  multiplicative 2.2, 3.2, 4.3
  summation-integration 2.3
iPart(x), fix(x) 1.1
irrational number 1.2, 4.1, 5.1
Jacobi symbol 4.2, 4.3
jump discontinuity 1.1, 1.2
Kummer-Beitler acceleration 6.1
lattice-point diagram 2.3, 4.1, 4.2
Legendre symbol 4.2
limits (see convergence)
linear transformation 1.1
logical iterations of [x] 1.1
long division 3.1, 3.2
Lucas's theorem 3.2
mathematical model for [x] 1.2
mixed base 3.2
Mobius function 5.3
mod function, x mod m 1.1, 2.2
modular equation 3.2, 4.2, 5.1
modular transformation 4.3
modulus (in congruences) 2.2
  of a complex number 4.1
monotonicity 1.2, 2.3, 6.1, 8.1
nonresidue 4.2
norm 4.1
number 1.1
  base (b-adic) 3.1, 3.2
  complex, ℂ 4.1
  harmonic 5.3
  integer, ℤ or k 1.1
  natural, ℕ 1.1
  rational, ℚ 1.1
  real, ℝ 1.1
  theoretic function 2.1
oscillating function 1.2, 6.2
Pell equation 5.1, 8.1
periodic function 1.2
pigeonhole principle 1.1
polyhedron 4.3
prime number 2.1, 2.2
  modulus 2.2, 3.2, 4.2
  testing 3.2, 4.2
primitive root 4.2
probability distribution 1.1, 2.1

Q−Z______INDEX

quadratic residue 4.2, 5.1
quotient 1.1, 2.1, 3.1, 3.2
radicals, solution by 4.1
rational approximation 4.1, 5.1
  mapping from ℤ onto ℚ 1.2
reciprocity 4.1, 4.2, 4.3
  subtraction for gcd 2.1, 4.1
  Quadratic 4.2
  Dedekind 4.3
rectangle 5.2
recurrence relation 2.3, 7.1
relatively prime 2.1, 2.2, 4.1
remainder 1.1, 2.1, 2.2, 4.1
Riemann-Stieltjes integral 4.3, 9.2
Riemann zeta, ζ(s) 1.2, 5.3, 6.1, 6.2
Riemann hypothesis 5.3
rounding 1.1, 4.1, 4.2, 4.3
  function, round(x, n) 1.1
sandwich iteration 1.1, 5.1, 5.2
sawtooth wave function 1.1
  Dedekind summation 4.3
  properties of 1.1, 1.2, 4.3
  Fourier series of 1.2, 6.2
semiprime 5.1, 8.2
sequence
  Beatty 1.2
  Cauchy 1.2
  Fibonacci 4.1, 5.1
  implicit 2.3
  limit of 1.2, 2.3, 6.1
  monotone 1.2, 2.3, 6.1
  of series (see series)
series (see also sequence) 1.2
  acceleration 6.1
  Dirichlet 5.3, 6.1
  divergent 1.2
  Fourier 1.2, 4.3, 6.2
  implicit 2.3
  Laurent 6.1, 6.2
  Leibniz 2.3
  set spanning 1.2
Sigma function 2.1
simultaneous congruence 4.1, 5.1
square 5.2
  perfect 4.1
  successive squaring 3.2
Standard Clausen function 6.2
step (integer) functions 1.1
summation 1.2, 2.2, 5.1, 6.1
  Abel (by parts) 2.3
  and integration 2.3, 5.3, 6.1
  Dedekind 4.3
  double 3.2
  Euler-Maclaurin 2.3, 5.3, 6.1
  expression for gcd and tau 2.1
  identities (see reciprocity)
  of b-adic digits 3.1, 3.2
  of divisors 2.1, 5.3
  of Fourier series 1.2, 6.2
  of monotone series 2.3, 6.1
tau divisor function 2.1, 5.1, 5.3
tetrahedron 4.3
truncation 1.1
  block 5.2
uniform summation notation 2.1
von Mangoldt function 5.3
wafer 1.1, 1.2
Wilson's Theorem 2.2
Wintner's mean value theorem 5.3