Non-Restoring Integer Square Root: A Case Study in Design by Principled Optimization


John O'Leary (1), Miriam Leeser (1), Jason Hickey (2), Mark Aagaard (1)

(1) School of Electrical Engineering, Cornell University, Ithaca, NY 14853
(2) Department of Computer Science, Cornell University, Ithaca, NY 14853

Abstract. Theorem proving techniques are particularly well suited for reasoning about arithmetic above the bit level and for relating different levels of abstraction. In this paper we show how a non-restoring integer square root algorithm can be transformed to a very efficient hardware implementation. The top level is a Standard ML function that operates on unbounded integers. The bottom level is a structural description of the hardware consisting of an adder/subtracter, simple combinational logic and some registers. Looking at the hardware, it is not at all obvious what function the circuit implements. At the top level, we prove that the algorithm correctly implements the square root function. We then show a series of optimizing transformations that refine the top level algorithm into the hardware implementation. Each transformation can be verified, and in places the transformations are motivated by knowledge about the operands that we can guarantee through verification. By decomposing the verification effort into these transformations, we can show that the hardware design implements a square root. We have implemented the algorithm in hardware both as an Altera programmable device and in full-custom CMOS.

1 Introduction

In this paper we describe the design, implementation and verification of a subtractive, non-restoring integer square root algorithm. The top level description is a Standard ML function that implements the algorithm for unbounded integers. The bottom level is a highly optimized structural description of the hardware implementation.
Due to the optimizations that have been applied, it is very difficult to directly relate the circuit to the algorithmic description and to prove that the hardware implements the function correctly. We show how the proof can be done by a series of transformations from the SML code to the optimized structural description.

At the top level, we have used the Nuprl proof development system [Lee92] to verify that the SML function correctly produces the square root of the input. We then use Nuprl to verify that transformations to the implementation preserve the correctness of the initial algorithm.

Intermediate levels use Hardware ML [OLLA93], a hardware description language based on Standard ML. Starting from a straightforward translation of the SML function into HML, a series of transformations is applied to obtain the hardware implementation. Some of these transformations are expressly concerned with optimization and rely on knowledge of the algorithm; these transformations can be justified by proving properties of the top-level description.

The hardware implementation is highly optimized: the core of the design is a single adder/subtracter. The rest of the datapath is registers, shift registers and combinational logic. The square root of a 2n-bit-wide number requires n cycles through the datapath. We have two implementations of square root chips based on this algorithm. The first is done as a full-custom CMOS implementation; the second uses Altera EPLD technology. Both are based on a design previously published by Bannur and Varma [BV85]. Implementing and verifying the design from the paper required clearing up a number of errors in the paper and clarifying many details.

This is a good case study for theorem proving techniques. At the top level, we reason about arithmetic operations on unbounded integers, a task theorem provers are especially well suited for. Relating this to lower levels is easy to do using theorem proving based techniques.
Many of the optimizations used are applicable only if very specific conditions are satisfied by the operands. Verifying that the conditions hold allows us to safely apply optimizations.

Automated techniques such as those based on BDDs and model checking are not well suited for verifying this and similar arithmetic circuits. It is difficult to come up with a Boolean statement for the correctness of the outputs as a function of the inputs and to argue that this specification correctly describes the intended behavior of the design. Similarly, specifications required for model checkers are difficult to define for arithmetic circuits.

There have been several verifications of hardware designs which lift the reasoning about hardware to the level of integers, including the Sobel Image processing chip [NS88], and the factorial function [CGM86]. Our work differs from these and similar efforts in that we justify the optimizations done in order to realize the square root design. The DDD system [BJP93] is based on the idea of design by verified transformation, and was used to derive an implementation of the FM9001 microprocessor. High level transformations in DDD are not verified by explicit use of theorem proving techniques.

The most similar research is Verkest's proof of a non-restoring division algorithm [VCH94]. This proof was also done by transforming a design description to an implementation. The top level of the division proof involves consideration of several cases, while our top level proof is done with a single loop invariant. The two implementations vary as well: the division algorithm was implemented on an ALU, and the square root on custom hardware. The algorithms and implementations are sufficiently similar that it would be interesting to develop a single verified implementation that performs both divide and square root based on the research in these two papers.

The remainder of this paper is organized as follows.
In section 2 we describe the top-level non-restoring square root algorithm and its verification in the Nuprl proof development system. We then transform this algorithm down to a level suitable for modelling with a hardware description language. Section 3 presents a series of five optimizing transformations that refine the register transfer level description of the algorithm to the final hardware implementation. In section 4 we summarize the lessons learned and our plans for future research.

2 The Non-Restoring Square Root Algorithm

An integer square root calculates $y = \sqrt{x}$ where $x$ is the radicand, $y$ is the root, and both $x$ and $y$ are integers. We define the precise square root $p$ to be the real valued square root and the correct integer square root to be the floor of the precise root. We can write the specification for the integer square root as shown in Definition 1.

Definition 1 (Correct integer square root)

$$y \text{ is the correct integer square root of } x \;\mathrel{\hat{=}}\; y^2 \le x < (y+1)^2$$

We have implemented a subtractive, non-restoring integer square root algorithm [BV85]. For radicands in the range $x = \{0 .. 2^{2n} - 1\}$, subtractive methods begin with an initial guess of $y = 2^{n-1}$ and then iterate from $i = (n-1) \ldots 0$. In each iteration we square the partial root $y$, subtract the squared partial root from the radicand and revise the partial root based on the sign of the result. There are two major classes of algorithms: restoring and non-restoring [Flo63].

In restoring algorithms, we begin with a partial root for $y = 0$ and at the end of each iteration, $y$ is never greater than the precise root $p$. Within each iteration $i$, we set the $i$-th bit of $y$, and test if $x - y^2$ is negative; if it is, then setting the $i$-th bit made $y$ too big, so we reset the $i$-th bit and proceed to the next iteration.

Non-restoring algorithms modify each bit position once rather than twice.
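Definition 1 and the restoring scheme just described can be made concrete with a small sketch. Python is used here purely for illustration (the paper's own top level is a Standard ML function), and the names `is_correct_isqrt` and `restoring_isqrt` are hypothetical, not taken from the paper.

```python
def is_correct_isqrt(x: int, y: int) -> bool:
    """Definition 1: y is the correct integer square root of x
    exactly when y^2 <= x < (y + 1)^2."""
    return y * y <= x < (y + 1) * (y + 1)


def restoring_isqrt(x: int, n: int) -> int:
    """Binary restoring square root for radicands 0 <= x < 2^(2n).

    Illustrative sketch only: starts from y = 0, sets bit i, and
    resets it again if that made y too big.
    """
    if n < 1 or not 0 <= x < (1 << (2 * n)):
        raise ValueError("x must lie in 0 .. 2^(2n) - 1")
    y = 0
    for i in range(n - 1, -1, -1):
        y |= 1 << i            # tentatively set the i-th bit
        if x - y * y < 0:      # setting the bit made y too big
            y &= ~(1 << i)     # restore: reset the i-th bit
    return y
```

Because the partial root never exceeds the precise root, the value returned by this sketch satisfies Definition 1 for every radicand in range, which can be checked exhaustively for small n.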
Instead of setting the $i$-th bit of $y$, testing if $x - y^2$ is positive, and then possibly resetting the bit, the non-restoring algorithms add or subtract a 1 in the $i$-th bit of $y$ based on the sign of $x - y^2$ in the previous iteration. For binary arithmetic, the restoring algorithm is efficient to implement. However, most square root hardware implementations use a higher radix, non-restoring implementation. For higher radix implementations, non-restoring algorithms result in more efficient hardware implementations.

The results of the non-restoring algorithms do not satisfy our definition of correct, while restoring algorithms do satisfy our definition. The resulting value of $y$ in the non-restoring algorithms may have an error in the last bit position. For the algorithm used here, we can show that the final value of $y$ will always be either the precise root for radicands which are perfect squares or will be odd and be within one of the correct root. The error in non-restoring algorithms is easily corrected in a cleanup phase following the algorithm.

Below we show how a binary, non-restoring algorithm runs on some values for $n = 3$. Note that the result is either exact or odd.

$x = 18_{10} = 10010_2$
    iterate 1    $y = 100$    $x - y^2 = +$
    iterate 2    $y = 110$    $x - y^2 = -$
    iterate 3    $y = 101$

$x = 15_{10} = 01111_2$
    iterate 1    $y = 100$    $x - y^2 = -$
    iterate 2    $y = 010$    $x - y^2 = +$
    iterate 3    $y = 011$

$x = 16_{10} = 10000_2$
    iterate 1    $y = 100$    $x - y^2 = 0$

In our description of the non-restoring square root algorithm, we define a datatype state that contains the state variables.
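The worked example for n = 3 can be reproduced with a short sketch. This is an illustration in Python, not the paper's Standard ML function: the name `nonrestoring_isqrt_trace`, the helper `is_exact_or_odd_within_one`, the loop bounds (steps of 2^(n-2) down to 2^0 after the initial guess) and the early exit when x - y^2 is zero are all assumptions inferred from the example runs above.

```python
from math import isqrt
from typing import List


def nonrestoring_isqrt_trace(x: int, n: int) -> List[int]:
    """Return the successive partial roots y for 0 <= x < 2^(2n).

    Hypothetical reconstruction from the worked example: start at
    y = 2^(n-1), then add or subtract 2^i (i = n-2 .. 0) according
    to the sign of x - y^2, stopping early if x - y^2 is zero.
    """
    if n < 1 or not 0 <= x < (1 << (2 * n)):
        raise ValueError("x must lie in 0 .. 2^(2n) - 1")
    y = 1 << (n - 1)                 # initial guess
    trace = [y]
    for i in range(n - 2, -1, -1):
        diff = x - y * y
        if diff == 0:                # perfect square found, y is exact
            break
        y = y + (1 << i) if diff > 0 else y - (1 << i)
        trace.append(y)
    return trace


def is_exact_or_odd_within_one(x: int, y: int) -> bool:
    """The property claimed in the text for the final value of y."""
    return y * y == x or (y % 2 == 1 and abs(y - isqrt(x)) <= 1)
```

For x = 18 the sketch yields the partial roots 4, 6, 5 (binary 100, 110, 101), for x = 15 it yields 4, 2, 3, and for x = 16 it stops immediately at 4, matching the three runs shown above. Checking `is_exact_or_odd_within_one` over all radicands for small n is a quick sanity test, not a substitute for the Nuprl proof the paper describes.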