A Scalable System-On-A-Chip Architecture for Prime Number Validation

Total Page:16

File Type:pdf, Size:1020Kb

A Scalable System-On-A-Chip Architecture for Prime Number Validation A SCALABLE SYSTEM-ON-A-CHIP ARCHITECTURE FOR PRIME NUMBER VALIDATION Ray C.C. Cheung and Ashley Brown Department of Computing, Imperial College London, United Kingdom Abstract This paper presents a scalable SoC architecture for prime number validation which targets reconfigurable hardware This paper presents a scalable SoC architecture for prime such as FPGAs. In particular, users are allowed to se- number validation which targets reconfigurable hardware. lect predefined scalable or non-scalable modular opera- The primality test is crucial for security systems, especially tors for their designs [4]. Our main contributions in- for most public-key schemes. The Rabin-Miller Strong clude: (1) Parallel designs for Montgomery modular arith- Pseudoprime Test has been mapped into hardware, which metic operations (Section 3). (2) A scalable design method makes use of a circuit for computing Montgomery modu- for mapping the Rabin-Miller Strong Pseudoprime Test lar exponentiation to further speed up the validation and to into hardware (Section 4). (3) An architecture of RAM- reduce the hardware cost. A design generator has been de- based Radix-2 Scalable Montgomery multiplier (Section veloped to generate a variety of scalable and non-scalable 4). (4) A design generator for producing hardware prime Montgomery multipliers based on user-defined parameters. number validators based on user-specified parameters (Sec- The performance and resource usage of our designs, im- tion 5). (5) Implementation of the proposed hardware ar- plemented in Xilinx reconfigurable devices, have been ex- chitectures in FPGAs, with an evaluation of its effective- plored using the embedded PowerPC processor and the soft ness compared with different size and speed tradeoffs (Sec- MicroBlaze processor. Our work demonstrates the flexibil- tion 6). (6) Evaluation of a SoC approach using customised ity and trade-offs in using reconfigurable platform for de- IP blocks with both hard [5] and soft processor (Section 7). veloping System-on-a-Chip applications. It is shown that, The rest of the paper is organized as follows. Section 2 de- for instance, a 1024-bit Primality test can be completed in scribes the related work. Section 3 presents a design flow less than a second, and a low cost XC3S2000 FPGA chip for mapping scalable prime number validators into hard- can accommodate a 32k-bit scalable primality test with 64 ware. Section 4 covers a scalable architecture for modu- parallel processing elements. lar multiplication. Section 5 presents a design generator GenDesign which produces architectures with different 1 Introduction design trade-offs, based on user-specified parameters. Sec- tion 6 provides experimental results of one single IP-block. Section 7 describes the SoC-based framework using em- For centuries, the problem of validating prime numbers, the bedded processors. Finally, Section 8 summarises our cur- Primality test, has posed a great challenge to both computer rent and future research. scientists and mathematicians [6]. The problem of identi- fying Prime and Composite numbers is known as one of the most important problems in arithmetic. Many security 2 Background and Related work applications also involve large numbers: while it is easy to Primality test is essential for prime number generation. multiply prime numbers to get a product, the reverse pro- Cryptography often uses large prime numbers. For in- cess of recovering the primes is much more difficult. stance, RSA public key algorithm requires 512-bit, 1,024- Field programmable gate arrays offer a rapid-prototyping bit for key generation [17]. Much research work has been platform for the verification of embedded systems.FPGAs done on primality test and generation, and most algorithms are also used as active components in security systems, are based on factorization [16]. In [2], the primality test has such as Firewall processors [12]. As the threat of attacks been proved to be solvable in polynomial time. However, appears to be increasing, many existing systems do not it requires a log operation which is expensive for hardware seem to be secure enough. One of the reasons is due to implementation; it also requires knowledge of the primal- the use of weak key generation. It has been recognised ity of preceding numbers, which is impractical for arbitrary that strong prime number generation is important, and the prime testing. Joye et al [10] has proposed an efficient prime validation is an intrinsic part of the generation. Here prime number generation scheme based on pseudo-random we define validation as the process of performing prime number generation using a smart-card implementation. Lu testing on a given number. et al [13] has further developed the RSA key generation for smart card using different prime number algorithms. Input: The number under primality test: p Input: X * Y (mod M), Output: S 1. S = 0 1. Random choose a number b in [1, p-1] Z 2. for i = 0 to n - 1 (q m) 2 (0) (0) (0) 2. Let p = 2 × +1 where m is an odd integer 3. (Ca,S ) = xi*Y + S 3. IF either 4. if S(0) = 1 then m 0 4. Case 1: b = 1 (mod p) or 5. (Cb,S(0)) = S(0) + S(0) 5. Case 2: there is an integer i in [0, q-1] i 6. for j = 1 to e m 2 (j) (j) (j) 6. such that b × = -1 (mod p) 7. (Ca,S ) = Ca + xi*Y + S (j) (j) (j) 7. RETURN "Inconclusive" 8. (Cb,S ) = Cb + M + S 8. ELSE (j 1) (j) (j 1) 9. S − = (S0 ,Sw −1::1) 9. RETURN "Composite" 10. end for − 11. else 12. for j = 1 to e (j) (j) (j) Figure 1: The Rabin-Miller Probabilistic Primality Test 13. (Ca,S ) = Ca + xi*Y + S (j 1) (j) (j 1) 14. S − = (S0 ,Sw −1::1) 15. end for − 16. end if There is a special class of very large prime numbers (e) which have the representation of 2n 1. They are called 17. s = 0 − 18. end for Mersenne prime numbers. The 40th Mersenne prime num- ber 224;036;583 1 has been discovered in May 2004. A − clustered workforce GIMPS [9] in the Internet is currently Figure 2: The Multiple Word Radix-2 Montgomery Multiplica- dedicated on locating the next Mersenne number. tion Algorithm 2.1 Rabin-Miller Primality Test cation has been implemented as a coprocessor in reconfig- urable RSA System-on-Chip building block in [8]. Further- In this paper, we have selected the Rabin-Miller probabilis- more, a high-radix design of scalable modular multiplier tic primality test algorithm [15] (Figure 1) as the core for has also been discussed. We observe that the scalable de- the primality test. The algorithm can quickly determine sign is particularly useful for very large precision such as the primality of the given large number with a control- the prime test. lably small probability of error [1]. It requires a num- ber of small primes for repeatedly testing the input num- ber. The probability of falsely identifying a composite 3 Design Flow as a prime decreases with every additional small prime used. This method enables the tradeoff between accuracy In this section we present the design flow for our scalable (more small primes) and efficiency (fewer small primes) prime number validator. The input to the system includes for different applications. As shown in the algorithm, user-specified constraints, such as the bit-width of the input “inconclusive” implies the number p maybe prime. prime number and the word-width which is required for The selection of number b is based on a set of small primes the scalable multiplier, while the output from the system is such as 2, 3, 5, ... With the use of first eight small primes, a synthesizeable hardware block that can be embedded in the 100% accuracy of primality test can be achieved for different cryptographic designs. This flexible design flow numbers up to 3:4 1010 [1]. facilitates the ability to upgrade existing security systems × with very small overhead. 2.2 Scalable Montgomery Algorithm The major features of the proposed validator include: (1) A variable compile-time prime number validator in which The fundamental operation of the RSA public key cryp- user can easily update the complexity of the system by tosystem is modular exponentiation, achieved by repeated controlling the prime width. (2) The variable input small modular multiplications. The Montgomery modular mul- prime numbers determine the accuracy and performance of tiplication [14] has been widely used to speed up the mul- the system. (3) The user-specified word-width controls the tiplication, squaring and exponentiation. There are many modular operation taken in the hardware. (4) Users are able different extended algorithms [11] and software implemen- to select different Montgomery architectures for hardware tation based on the Montgomery algorithm, and the high- implementation and thus enhance rapid prototyping. radix Montgomery modular exponentiation has been im- plemented in reconfigurable hardware [3]. The Rabin-Miller algorithm is first implemented using standard multiplication and exponentiation with a sequen- The reusability and scalability of the Montgomery multi- tial modulo-multiply. We observe that one of the bottle- plier have been investigated in [18, 19] such that the de- necks of the Rabin-Miller algorithm is the modular ex- sign is no longer restricted to a fixed precision. The in- ponentiation in testing the two valid checks in Section put operands are partitioned into multiple words. Figure 2 2. Montgomery algorithm is widely used for modu- shows the algorithmic description of the Radix-2 Mont- lar multiplication and exponentiation. It requires a con- gomery multiplication. The operand Y (multiplicand) is stant at the start of algorithm to facilitate conversion be- scanned word-by-word and the operand X (multiplier) is tween standard and Montgomery space.
Recommended publications
  • Computation of 2700 Billion Decimal Digits of Pi Using a Desktop Computer
    Computation of 2700 billion decimal digits of Pi using a Desktop Computer Fabrice Bellard Feb 11, 2010 (4th revision) This article describes some of the methods used to get the world record of the computation of the digits of π digits using an inexpensive desktop computer. 1 Notations We assume that numbers are represented in base B with B = 264. A digit in base B is called a limb. M(n) is the time needed to multiply n limb numbers. We assume that M(Cn) is approximately CM(n), which means M(n) is mostly linear, which is the case when handling very large numbers with the Sch¨onhage-Strassen multiplication [5]. log(n) means the natural logarithm of n. log2(n) is log(n)/ log(2). SI and binary prefixes are used (i.e. 1 TB = 1012 bytes, 1 GiB = 230 bytes). 2 Evaluation of the Chudnovsky series 2.1 Introduction The digits of π were computed using the Chudnovsky series [10] ∞ 1 X (−1)n(6n)!(A + Bn) = 12 π (n!)3(3n)!C3n+3/2 n=0 with A = 13591409 B = 545140134 C = 640320 . It was evaluated with the binary splitting algorithm. The asymptotic running time is O(M(n) log(n)2) for a n limb result. It is worst than the asymptotic running time of the Arithmetic-Geometric Mean algorithms of O(M(n) log(n)) but it has better locality and many improvements can reduce its constant factor. Let S be defined as n2 n X Y pk S(n1, n2) = an . qk n=n1+1 k=n1+1 1 We define the auxiliary integers n Y2 P (n1, n2) = pk k=n1+1 n Y2 Q(n1, n2) = qk k=n1+1 T (n1, n2) = S(n1, n2)Q(n1, n2).
    [Show full text]
  • Course Notes 1 1.1 Algorithms: Arithmetic
    CS 125 Course Notes 1 Fall 2016 Welcome to CS 125, a course on algorithms and computational complexity. First, what do these terms means? An algorithm is a recipe or a well-defined procedure for performing a calculation, or in general, for transform- ing some input into a desired output. In this course we will ask a number of basic questions about algorithms: • Does the algorithm halt? • Is it correct? That is, does the algorithm’s output always satisfy the input to output specification that we desire? • Is it efficient? Efficiency could be measured in more than one way. For example, what is the running time of the algorithm? What is its memory consumption? Meanwhile, computational complexity theory focuses on classification of problems according to the com- putational resources they require (time, memory, randomness, parallelism, etc.) in various computational models. Computational complexity theory asks questions such as • Is the class of problems that can be solved time-efficiently with a deterministic algorithm exactly the same as the class that can be solved time-efficiently with a randomized algorithm? • For a given class of problems, is there a “complete” problem for the class such that solving that one problem efficiently implies solving all problems in the class efficiently? • Can every problem with a time-efficient algorithmic solution also be solved with extremely little additional memory (beyond the memory required to store the problem input)? 1.1 Algorithms: arithmetic Some algorithms very familiar to us all are those those for adding and multiplying integers. We all know the grade school algorithm for addition from kindergarten: write the two numbers on top of each other, then add digits right 1-1 1-2 1 7 8 × 2 1 3 5 3 4 1 7 8 +3 5 6 3 7 914 Figure 1.1: Grade school multiplication.
    [Show full text]
  • Abstract in This Paper, D-Strong and Almost D-Strong Near-Rings Ha
    Periodica Mathematica Hungarlca Vol. 17 (1), (1986), pp. 13--20 D-STRONG AND ALMOST D-STRONG NEAR-RINGS A. K. GOYAL (Udaipur) Abstract In this paper, D-strong and almost D-strong near-rings have been defined. It has been proved that if R is a D-strong S-near ring, then prime ideals, strictly prime ideals and completely prime ideals coincide. Also if R is a D-strong near-ring with iden- tity, then every maximal right ideal becomes a maximal ideal and moreover every 2- primitive near-ring becomes a near-field. Several properties, chain conditions and structure theorems have also been discussed. Introduction In this paper, we have generalized some of the results obtained for rings by Wong [12]. Corresponding to the prime and strictly prime ideals in near-rings, we have defined D-strong and almost D-strong near-rings. It has been shown that a regular near-ring having all idempotents central in R is a D-strong and hence almost D-strong near-ring. If R is a D-strong S-near ring, then it has been shown that prime ideals, strictly prime ideals and com- pletely prime ideals coincide and g(R) = H(R) ~ ~(R), where H(R) is the intersection of all strictly prime ideals of R. Also if R is a D-strong near-ring with identity, then every maximal right ideal becomes a maximal ideal and moreover every 2-primitive near-ring becomes a near-field. Some structure theorems have also been discussed. Preliminaries Throughout R will denote a zero-symmetric left near-ring, i.e., R -- Ro in the sense of Pilz [10].
    [Show full text]
  • Neural Networks and the Search for a Quadratic Residue Detector
    Neural Networks and the Search for a Quadratic Residue Detector Michael Potter∗ Leon Reznik∗ Stanisław Radziszowski∗ [email protected] [email protected] [email protected] ∗ Department of Computer Science Rochester Institute of Technology Rochester, NY 14623 Abstract—This paper investigates the feasibility of employing This paper investigates the possibility of employing ANNs artificial neural network techniques for solving fundamental in the search for a solution to the quadratic residues (QR) cryptography problems, taking quadratic residue detection problem, which is defined as follows: as an example. The problem of quadratic residue detection Suppose that you have two integers a and b. Then a is is one which is well known in both number theory and a quadratic residue of b if and only if the following two cryptography. While it garners less attention than problems conditions hold: [1] such as factoring or discrete logarithms, it is similar in both 1) The greatest common divisor of a and b is 1, and difficulty and importance. No polynomial–time algorithm is 2) There exists an integer c such that c2 ≡ a (mod b). currently known to the public by which the quadratic residue status of one number modulo another may be determined. This The first condition is satisfied automatically so long as ∗ work leverages machine learning algorithms in an attempt to a 2 Zb . There is, however, no known polynomial–time (in create a detector capable of solving instances of the problem the number of bits of a and b) algorithm for determining more efficiently. A variety of neural networks, currently at the whether the second condition is satisfied.
    [Show full text]
  • Version 0.5.1 of 5 March 2010
    Modern Computer Arithmetic Richard P. Brent and Paul Zimmermann Version 0.5.1 of 5 March 2010 iii Copyright c 2003-2010 Richard P. Brent and Paul Zimmermann ° This electronic version is distributed under the terms and conditions of the Creative Commons license “Attribution-Noncommercial-No Derivative Works 3.0”. You are free to copy, distribute and transmit this book under the following conditions: Attribution. You must attribute the work in the manner specified by the • author or licensor (but not in any way that suggests that they endorse you or your use of the work). Noncommercial. You may not use this work for commercial purposes. • No Derivative Works. You may not alter, transform, or build upon this • work. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to the web page below. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author’s moral rights. For more information about the license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/ Contents Preface page ix Acknowledgements xi Notation xiii 1 Integer Arithmetic 1 1.1 Representation and Notations 1 1.2 Addition and Subtraction 2 1.3 Multiplication 3 1.3.1 Naive Multiplication 4 1.3.2 Karatsuba’s Algorithm 5 1.3.3 Toom-Cook Multiplication 6 1.3.4 Use of the Fast Fourier Transform (FFT) 8 1.3.5 Unbalanced Multiplication 8 1.3.6 Squaring 11 1.3.7 Multiplication by a Constant 13 1.4 Division 14 1.4.1 Naive
    [Show full text]
  • Diffie-Hellman Key Exchange
    Public Key Cryptosystems - Diffie Hellman Public Key Cryptosystems - Diffie Hellman Get two parties to share a secret number that no one else knows Public Key Cryptosystems - Diffie Hellman Get two parties to share a secret number that no one else knows Receiver Sender Public Key Cryptosystems - Diffie Hellman Get two parties to share a secret number that no one else knows Can only use an insecure communications channel for exchange Receiver Sender Attacker Public Key Cryptosystems - Diffie Hellman Get two parties to share a secret number that no one else knows Can only use an insecure communications channel for exchange p,g Receiver Sender p: prime & (p-1)/2 prime g: less than p n = gk mod p Attacker for all 0<n<p and some k (see http://gauss.ececs.uc.edu/Courses/c6053/lectures/Math/Group/group.html) Public Key Cryptosystems - Diffie Hellman A safe prime p: p = 2q + 1 where q is also prime Example: 479 = 2*239 + 1 If p is a safe prime, then p-1 has large prime factor, namely q. If all the factors of p-1 are less than logcp, then the problem of solving the discrete logarithm modulo p is in P (it©s easy). Therefore, for cryptosystems based on discrete logarithm (such as Diffie-Hellman) it is required that p-1 has at least one large prime factor. Public Key Cryptosystems - Diffie Hellman A strong prime p: p is large p-1 has large prime factors (p = aq+1 for integer a and prime q) q-1 has large prime factors (q = br+1 for integer b, prime r) p+1 has large prime factors.
    [Show full text]
  • 1 Multiplication
    CS 140 Class Notes 1 1 Multiplication Consider two unsigned binary numb ers X and Y . Wewanttomultiply these numb ers. The basic algorithm is similar to the one used in multiplying the numb ers on p encil and pap er. The main op erations involved are shift and add. Recall that the `p encil-and-pap er' algorithm is inecient in that each pro duct term obtained bymultiplying each bit of the multiplier to the multiplicand has to b e saved till all such pro duct terms are obtained. In machine implementations, it is desirable to add all such pro duct terms to form the partial product. Also, instead of shifting the pro duct terms to the left, the partial pro duct is shifted to the right b efore the addition takes place. In other words, if P is the partial pro duct i after i steps and if Y is the multiplicand and X is the multiplier, then P P + x Y i i j and 1 P P 2 i+1 i and the pro cess rep eats. Note that the multiplication of signed magnitude numb ers require a straight forward extension of the unsigned case. The magnitude part of the pro duct can b e computed just as in the unsigned magnitude case. The sign p of the pro duct P is computed from the signs of X and Y as 0 p x y 0 0 0 1.1 Two's complement Multiplication - Rob ertson's Algorithm Consider the case that we want to multiply two 8 bit numb ers X = x x :::x and Y = y y :::y .
    [Show full text]
  • Divide-And-Conquer Algorithms
    Chapter 2 Divide-and-conquer algorithms The divide-and-conquer strategy solves a problem by: 1. Breaking it into subproblems that are themselves smaller instances of the same type of problem 2. Recursively solving these subproblems 3. Appropriately combining their answers The real work is done piecemeal, in three different places: in the partitioning of problems into subproblems; at the very tail end of the recursion, when the subproblems are so small that they are solved outright; and in the gluing together of partial answers. These are held together and coordinated by the algorithm's core recursive structure. As an introductory example, we'll see how this technique yields a new algorithm for multi- plying numbers, one that is much more efficient than the method we all learned in elementary school! 2.1 Multiplication The mathematician Carl Friedrich Gauss (1777–1855) once noticed that although the product of two complex numbers (a + bi)(c + di) = ac bd + (bc + ad)i − seems to involve four real-number multiplications, it can in fact be done with just three: ac, bd, and (a + b)(c + d), since bc + ad = (a + b)(c + d) ac bd: − − In our big-O way of thinking, reducing the number of multiplications from four to three seems wasted ingenuity. But this modest improvement becomes very significant when applied recur- sively. 55 56 Algorithms Let's move away from complex numbers and see how this helps with regular multiplica- tion. Suppose x and y are two n-bit integers, and assume for convenience that n is a power of 2 (the more general case is hardly any different).
    [Show full text]
  • WHAT IS the BEST APPROACH to COUNTING PRIMES? Andrew Granville 1. How Many Primes Are There? Predictions You Have Probably Seen
    WHAT IS THE BEST APPROACH TO COUNTING PRIMES? Andrew Granville As long as people have studied mathematics, they have wanted to know how many primes there are. Getting precise answers is a notoriously difficult problem, and the first suitable technique, due to Riemann, inspired an enormous amount of great mathematics, the techniques and insights penetrating many different fields. In this article we will review some of the best techniques for counting primes, centering our discussion around Riemann's seminal paper. We will go on to discuss its limitations, and then recent efforts to replace Riemann's theory with one that is significantly simpler. 1. How many primes are there? Predictions You have probably seen a proof that there are infinitely many prime numbers. This was known to the ancient Greeks, and today there are many different proofs, using tools from all sorts of fields. Once one knows that there are infinitely many prime numbers then it is natural to ask whether we can give an exact formula, or at least a good approximation, for the number of primes up to a given point. Such questions were no doubt asked, since antiquity, by those people who thought about prime numbers. With the advent of tables of factorizations, it was possible to make predictions supported by lots of data. Figure 1 is a photograph of Chernac's Table of Factorizations of all integers up to one million, published in 1811.1 There are exactly one thousand pages of factorization tables in the book, each giving the factorizations of one thousand numbers. For example, Page 677, seen here, enables us to factor all numbers between 676000 and 676999.
    [Show full text]
  • Local Prime Factor Decomposition of Approximate Strong Product Graphs
    Local Prime Factor Decomposition of Approximate Strong Product Graphs Von der Fakultät für Mathematik und Informatik der Universität Leipzig angenommene DISSERTATION zur Erlangung des akademischen Grades DOCTOR RERUM NATURALIUM (Dr. rer. nat.) im Fachgebiet Informatik vorgelegt von Diplom Wirtschaftsmathematiker Marc Hellmuth geboren am 25. Juni 1980 in Nordhausen Die Annahme der Dissertation wurde empfohlen von: 1. Professor Dr. Peter F. Stadler (Leipzig, Deutschland) 2. Professor Dr. Sandi Klavžar (Ljubljana, Slowenien) Die Verleihung des akademischen Grades erfolgt mit Bestehen der Vertei- digung am 22.04.2010 mit dem Gesamtprädikat summa cum laude. Acknowledgements Let me thank you very much !! peter F. stadler and wilfried imrich paula werner kloeckl, daniel merkle, lydia gringmann, maribel hernandez-rosales, steve hoffmann, phil ostermeier, kon- stantin klemm, sven findeisz, and the entire beerinformatics community my family christine rahn, marlen pelny, dietrich becker, gilbert spiegel, min choe, and my old friends in nordhausen jens steuck and petra pregel josef leydold, manja marz, sonja prohaska, and martin middendorf ... and all other persons that are in the closed neighborhood of mine !! Abstract In practice, graphs often occur as perturbed product structures, so-called approximate graph products. The practical application of the well-known prime factorization algorithms is therefore limited, since most graphs are prime, although they can have a product-like structure. This work is concerned with the strong graph product. Since strong product graphs G contain subgraphs that are itself products of subgraphs of the underlying factors of G, we follow the idea to develop local approaches that cover a graph by factorizable patches and then use this information to derive the global factors.
    [Show full text]
  • Fast Integer Multiplication Using Modular Arithmetic ∗
    Fast Integer Multiplication Using Modular Arithmetic ∗ Anindya Dey Piyush P Kurur,z Computer Science Division Dept. of Computer Science and Engineering University of California at Berkeley Indian Institute of Technology Kanpur Berkeley, CA 94720, USA Kanpur, UP, India, 208016 [email protected] [email protected] Chandan Saha Ramprasad Saptharishix Dept. of Computer Science and Engineering Chennai Mathematical Institute Indian Institute of Technology Kanpur Plot H1, SIPCOT IT Park Kanpur, UP, India, 208016 Padur PO, Siruseri, India, 603103 [email protected] [email protected] Abstract ∗ We give an N · log N · 2O(log N) time algorithm to multiply two N-bit integers that uses modular arithmetic for intermediate computations instead of arithmetic over complex numbers as in F¨urer's algorithm, which also has the same and so far the best known complexity. The previous best algorithm using modular arithmetic (by Sch¨onhageand Strassen) has complexity O(N · log N · log log N). The advantage of using modular arithmetic as opposed to complex number arithmetic is that we can completely evade the task of bounding the truncation error due to finite approximations of complex numbers, which makes the analysis relatively simple. Our algorithm is based upon F¨urer'salgorithm, but uses FFT over multivariate polynomials along with an estimate of the least prime in an arithmetic progression to achieve this improvement in the modular setting. It can also be viewed as a p-adic version of F¨urer'salgorithm. 1 Introduction Computing the product of two N-bit integers is nearly a ubiquitous operation in algorithm design.
    [Show full text]
  • Chap03: Arithmetic for Computers
    CHAPTER 3 Arithmetic for Computers 3.1 Introduction 178 3.2 Addition and Subtraction 178 3.3 Multiplication 183 3.4 Division 189 3.5 Floating Point 196 3.6 Parallelism and Computer Arithmetic: Subword Parallelism 222 3.7 Real Stuff: x86 Streaming SIMD Extensions and Advanced Vector Extensions 224 3.8 Going Faster: Subword Parallelism and Matrix Multiply 225 3.9 Fallacies and Pitfalls 229 3.10 Concluding Remarks 232 3.11 Historical Perspective and Further Reading 236 3.12 Exercises 237 CMPS290 Class Notes (Chap03) Page 1 / 20 by Kuo-pao Yang 3.1 Introduction 178 Operations on integers o Addition and subtraction o Multiplication and division o Dealing with overflow Floating-point real numbers o Representation and operations x 3.2 Addition and Subtraction 178 Example: 7 + 6 Binary Addition o Figure 3.1 shows the sums and carries. The carries are shown in parentheses, with the arrows showing how they are passed. FIGURE 3.1 Binary addition, showing carries from right to left. The rightmost bit adds 1 to 0, resulting in the sum of this bit being 1 and the carry out from this bit being 0. Hence, the operation for the second digit to the right is 0 1 1 1 1. This generates a 0 for this sum bit and a carry out of 1. The third digit is the sum of 1 1 1 1 1, resulting in a carry out of 1 and a sum bit of 1. The fourth bit is 1 1 0 1 0, yielding a 1 sum and no carry.
    [Show full text]