Parallel Implementation of Multiple-Precision Arithmetic and 2,576,980,370,000 Decimal Digits of π Calculation


Author: Daisuke Takahashi
Journal: Parallel Computing, vol. 36, no. 8, pp. 439-448, August 2010
Rights: (C) 2010 Elsevier B.V.
URL: http://hdl.handle.net/2241/106066
doi: 10.1016/j.parco.2010.02.007

Daisuke Takahashi
Graduate School of Systems and Information Engineering, University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan

Abstract

We present efficient parallel algorithms for multiple-precision arithmetic operations on numbers of more than several million decimal digits on distributed-memory parallel computers. Since the key operation for fast multiple-precision arithmetic is multiplication, a parallel implementation of floating-point real FFT-based multiplication is used. The operation for releasing propagated carries and borrows in multiple-precision addition, subtraction, and multiplication was also parallelized. More than 2.576 trillion decimal digits of π were computed on 640 nodes of an Appro Xtreme-X3 (648 nodes, 147.2 GFlops/node, 95.4 TFlops peak performance) with an elapsed computing time of 73 hours 36 minutes, which includes the time required for verification.

Key words: multiple-precision arithmetic, fast Fourier transform, distributed-memory parallel computer

1 Introduction

Several software packages are available for multiple-precision computation [1-4]. At present, GNU MP [3] is probably the most widely used package due to its greater functionality and efficiency. Using GMP 4.2 with sufficient memory, it should be possible to compute up to 41 billion digits [3].
In 2009, Bellard computed π up to about 2.7 trillion digits in about 131 days using an Intel Core i7 PC [5]. However, parallel processing on a distributed-memory parallel computer is required to calculate further digits in a reasonable amount of time.

Email address: [email protected] (Daisuke Takahashi).
Preprint submitted to Elsevier Science, 1 February 2010

A parallel implementation of Karatsuba's multiplication algorithm [6,7] on a distributed-memory parallel computer has been proposed [8]. Karatsuba's algorithm is known as an O(n^(log_2 3)) multiplication algorithm.

However, multiple-precision multiplication of n-digit numbers can be performed in O(n log n log log n) operations by using the Schönhage-Strassen algorithm [9], which is based on the fast Fourier transform (FFT). For multiple-precision multiplication of more than several thousand decimal digits, FFT-based multiplication is the fastest method, and FFT-based multiplication algorithms are known to be good candidates for parallel implementation. The Fermat number transform (FNT) for large-integer multiplication was performed on the Connection Machine CM-2 [10]. On the other hand, the number theoretic transform (NTT) uses many modulo operations, which are slow due to the integer division they involve. Thus, floating-point real FFT-based multiplication was used for multiple-precision multiplication on distributed-memory parallel computers.

Parallel computation of √2 up to 1 million decimal digits was performed on a network of workstations [11]. However, multiple-precision parallel division was not presented in that paper, and a parallel version of Karatsuba's multiplication algorithm was used.

Section 2 describes the parallelization of multiple-precision addition, subtraction, and multiplication. Section 3 describes the parallelization of multiple-precision division and square root operations. Section 4 presents the experimental results.
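The floating-point FFT approach to multiplication can be sketched serially as follows. This is only a minimal illustration, not the paper's parallel implementation: radix B = 10 and a naive recursive complex FFT are chosen for clarity, and the final carry release is done sequentially rather than with the parallelized scheme of Section 2.

```python
import cmath

def fft(a):
    # Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def ifft(a):
    # Inverse FFT via conjugation.
    n = len(a)
    y = fft([x.conjugate() for x in a])
    return [x.conjugate() / n for x in y]

def fft_multiply(x, y, B=10):
    # x, y: little-endian digit lists in radix B (e.g. 123 -> [3, 2, 1]).
    # The digit sequences are convolved via the FFT, then carries released.
    n = 1
    while n < len(x) + len(y):
        n *= 2
    fx = fft([complex(d) for d in x] + [0j] * (n - len(x)))
    fy = fft([complex(d) for d in y] + [0j] * (n - len(y)))
    z = [int(round(c.real)) for c in ifft([a * b for a, b in zip(fx, fy)])]
    carry = 0                      # release the propagated carries serially
    for i in range(len(z)):
        t = z[i] + carry
        z[i], carry = t % B, t // B
    while carry:
        z.append(carry % B)
        carry //= B
    while len(z) > 1 and z[-1] == 0:   # strip leading zero limbs
        z.pop()
    return z
```

For example, `fft_multiply([3, 2, 1], [6, 5, 4])` computes 123 × 456 = 56088 as the digit list [8, 8, 0, 6, 5]. In a production implementation the limited precision of floating-point rounding bounds the usable digit count per limb, which is why radices such as 10^4 per double are used in practice.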
Section 5 describes the calculation of π to 2,576,980,370,000 decimal digits on a distributed-memory parallel computer. Finally, Section 6 presents some concluding remarks.

2 Parallelization of Multiple-Precision Addition, Subtraction, and Multiplication

2.1 Parallelization of Multiple-Precision Addition and Subtraction

Let us consider an n-digit number X with radix B:

    X = Σ_{i=0}^{n-1} x_i B^i,    (1)

where 0 ≤ x_i < B.

Algorithm 1 Sequential Addition
Input:  X = Σ_{i=0}^{n-1} x_i B^i,  Y = Σ_{i=0}^{n-1} y_i B^i
Output: Z = X + Y := Σ_{i=0}^{n-1} z_i B^i
 1: c ← 0
 2: for i = 0 to n-2 do
 3:   w ← x_i + y_i + c
 4:   c ← w div B
 5:   z_i ← w mod B
 6: z_{n-1} ← x_{n-1} + y_{n-1} + c
 7: return Z

Fig. 1. Multiple-precision sequential addition

In the case of block distribution, the n-digit multiple-precision numbers are distributed across all P processors. The corresponding index at processor m (0 ≤ m ≤ P-1) is denoted as i (m = ⌊i/⌈n/P⌉⌋) in equation (1). In the case of cyclic distribution, the n-digit multiple-precision numbers are also distributed across all P processors. The corresponding index at processor m (0 ≤ m ≤ P-1) is denoted as i (m = i mod P) in equation (1).

The arithmetic operation count for n-digit multiple-precision sequential addition and subtraction is clearly O(n). Likewise, the arithmetic operation count for n-digit multiple-precision sequential multiplication by a single-precision integer is O(n). However, releasing the carries and borrows in these operations is a major factor that can prevent parallelization.

For example, pseudo code for multiple-precision sequential addition is shown in Fig. 1. Here, c is a variable that stores the carry, w is a temporary variable, and B is the radix of the multiple-precision numbers. We assume that the input data has been normalized from 0 to B-1 and is stored in arrays X and Y. Since, at line 3 of this program, the value of c is used recurrently to determine the value of w, the algorithm cannot be parallelized because of this data dependency.
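Algorithm 1 can be transcribed directly into code. This is a sketch only, with Python lists standing in for the digit arrays; digits are little-endian in radix B, and the serial carry chain (the data dependency noted above) is visible in the loop.

```python
def sequential_add(x, y, B=10):
    # x, y: little-endian digit arrays of equal length n, digits in [0, B-1].
    n = len(x)
    z = [0] * n
    c = 0                                 # line 1: c <- 0
    for i in range(n - 1):                # lines 2-5: serial carry chain
        w = x[i] + y[i] + c
        c = w // B                        # line 4: c <- w div B
        z[i] = w % B                      # line 5: z_i <- w mod B
    z[n - 1] = x[n - 1] + y[n - 1] + c    # line 6: top digit absorbs carry
    return z
```

For example, `sequential_add([9, 9, 1], [2, 0, 0])` computes 199 + 2 = 201 as [1, 0, 2]. Note that, as in the paper's Algorithm 1, the most significant digit simply absorbs the final carry and may exceed B-1.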
Fig. 2 shows an algorithm that allows the parallelization of addition by introducing a method for releasing the carries. We assume that the input data has been normalized from 0 to B-1 and is stored in arrays X and Y. Multiple-precision addition without the propagation of carries is performed in lines 1 and 2. While the maximum value of z_i (0 ≤ i ≤ n-2) is greater than or equal to B, the carries are computed in lines 5 and 6, then released in lines 7 to 9. The maximum value can easily be evaluated in parallel. Here, c_i (0 ≤ i ≤ n-1) is a working array to store the carries.

Algorithm 2 Parallel Addition
Input:  X = Σ_{i=0}^{n-1} x_i B^i,  Y = Σ_{i=0}^{n-1} y_i B^i
Output: Z = X + Y := Σ_{i=0}^{n-1} z_i B^i
 1: for i = 0 to n-1 do in parallel
 2:   z_i ← x_i + y_i
 3: while max_{0≤i≤n-2} (z_i) ≥ B do
 4:   c_0 ← 0
 5:   for i = 0 to n-2 do in parallel
 6:     c_{i+1} ← z_i div B
 7:   for i = 0 to n-2 do in parallel
 8:     z_i ← (z_i mod B) + c_i
 9:   z_{n-1} ← z_{n-1} + c_{n-1}
10: return Z

Fig. 2. Multiple-precision parallel addition

In line 6, communication is required to send the carries to the neighboring processor. The amount of communication is O(1) in the block distribution, whereas it is O(n/P) in the cyclic distribution on parallel computers that have P processors.

In this algorithm, it is necessary to repeat the normalization until all carries have been released. For random inputs, this process is usually performed only a few times. When the propagation of carries repeats, as in the case of 0.99999999...9 + 0.00000000...1, we have to use the carry skip method [12].

Pseudo code for multiple-precision parallel addition with the carry skip method is shown in Fig. 3. We assume that the input data has been normalized from 0 to B-1 and is stored in arrays X and Y. Here, c_i (0 ≤ i ≤ n-1) is a working array to store the carries. Multiple-precision addition without the propagation of carries is performed in lines 1 and 2.
Then, incomplete normalization from 0 to B is performed in lines 3 to 9. Note that the while loop in lines 3 to 9 is repeated at most twice. The range for carry skipping is determined in lines 11 to 22. Finally, carry skipping is performed in lines 23 to 26.

Because carry skips occur only intermittently in the case of 0.998998998... + 0.0100100100..., this is one of the worst cases for the carry skip method. For such a worst case, the carry look-ahead method is effective. However, the carry look-ahead method requires O(log P) steps on parallel computers that have P processors. Since we assumed that such a worst case occurs rarely, the carry skip method was used for multiple-precision parallel addition and subtraction.

Algorithm 3 Parallel Addition with Carry Skip Method
Input:  X = Σ_{i=0}^{n-1} x_i B^i,  Y = Σ_{i=0}^{n-1} y_i B^i
Output: Z = X + Y := Σ_{i=0}^{n-1} z_i B^i
 1: for i = 0 to n-1 do in parallel
 2:   z_i ← x_i + y_i
 3: while max_{0≤i≤n-2} (z_i) > B do
 4:   c_0 ← 0
 5:   for i = 0 to n-2 do in parallel
 6:     c_{i+1} ← z_i div B
 7:   for i = 0 to n-2 do in parallel
 8:     z_i ← (z_i mod B) + c_i
 9:   z_{n-1} ← z_{n-1} + c_{n-1}
10: while max_{0≤i≤n-2} (z_i) = B do
11:   for i = 0 to n-1 do in parallel
12:     if z_i = B then
13:       p_i ← i
14:     else
15:       p_i ← n-1
16:   l ← min_{0≤i≤n-1} (p_i)
17:   for i = l+1 to n-1 do in parallel
18:     if z_i < B-1 then
19:       p_i ← i
20:     else
21:       p_i ← n-1
22:   m ← min_{l+1≤i≤n-1} (p_i)
23:   z_l ← z_l - B
24:   for i = l+1 to m-1 do in parallel
25:     z_i ← z_i - (B-1)
26:   z_m ← z_m + 1
27: return Z

Fig. 3. Multiple-precision parallel addition with the carry skip method
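To see why the carry skip method matters, consider a serial simulation of Algorithm 2's normalization loop (a sketch: each "in parallel" loop is modeled as an elementwise pass, and the number of passes is counted). The 0.99...9 + 0.00...1 pattern forces the carry to advance one digit position per pass, so Algorithm 2 needs n-1 passes where Algorithm 3 would skip the whole run at once.

```python
def parallel_add(x, y, B=10):
    # Serial model of Algorithm 2: x, y are little-endian digit arrays of
    # equal length n. Each pass computes all carries independently, then
    # releases them all at once; repeat until no digit z_i (i < n-1) >= B.
    n = len(x)
    z = [x[i] + y[i] for i in range(n)]       # lines 1-2: no carries yet
    passes = 0
    while max(z[:n - 1], default=0) >= B:     # line 3
        c = [0] * n                           # line 4 (and working array)
        for i in range(n - 1):                # lines 5-6: all carries at once
            c[i + 1] = z[i] // B
        for i in range(n - 1):                # lines 7-8: release them
            z[i] = z[i] % B + c[i]
        z[n - 1] += c[n - 1]                  # line 9
        passes += 1
    return z, passes
```

For example, `parallel_add([9, 9, 9, 9], [1, 0, 0, 0])` (i.e. 9999 + 1) needs three normalization passes for four digits, one per position in the carry chain, while a random input such as 65 + 87 settles in a single pass. It is exactly this linear-in-n worst case that the carry skip of Algorithm 3 collapses into a constant number of steps.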