A Radix-10 Combinational Multiplier
Tomás Lang and Alberto Nannarelli∗
Dept. of Electrical Engineering and Computer Science, University of California, Irvine, USA
∗Dept. of Informatics & Math. Modelling, Technical University of Denmark, Kongens Lyngby, Denmark

1424407850/06/$20.00

Abstract: In this work, we present a combinational decimal multiply unit which can be pipelined to reach the desired throughput. With respect to previous implementations of decimal multiplication, the proposed unit is combinational (parallel) and not sequential, has a simpler recoding of the operands, which reduces the number of partial product precomputations, and uses counters to eliminate the need for the decimal equivalent of a 4:2 adder. The results of the implementation show that the combinational decimal multiplier offers a good compromise between latency and area when compared to other decimal multiply units and to binary double-precision multipliers.

I. INTRODUCTION

Hardware implementations of decimal arithmetic units have recently gained importance because they provide higher accuracy in financial applications [1]. In this work, we present a combinational decimal multiplier which can be pipelined to reach the desired throughput. The multiply unit is organized as follows: the multiplier is recoded; the partial products are kept in a redundant format; the partial products are accumulated by a tree of redundant adders; and the final product is obtained by converting the carry-save tree's outputs into binary-coded decimal (BCD) format.

With respect to previous implementations of radix-10 multipliers such as the ones in [2], [3] and [4], our design differs in the following aspects: 1) the multiplier is combinational (parallel) and not sequential; 2) we recode only the multiplier, while in [4] both operands are recoded in -5 to +5; 3) in the partial product generation only multiples of 5 and 2 are required; 4) the accumulation of partial products is done in a tree of radix-10 carry-save adders and counters, while in the sequential unit of [4] signed-digit adders are used.

We present the standard cells implementation of the multiplier and compare its latency with those of the schemes presented in [2] and [3]. Moreover, we compare the delay and the area of the decimal combinational multiplier with those obtained by the implementation of a binary (radix-4) double-precision multiplier.

II. MULTIPLIER ARCHITECTURE

For the multiplication $p = x \cdot y$, we assume that both the multiplicand $x$ and the multiplier $y$ are sign-and-magnitude $n$-digit fractional numbers in BCD format normalized in $[0.1, 1.0)$. The multiplication shift-and-add algorithm is based on the identity

$$p = x \cdot y = \sum_{i=0}^{n-1} x \, y_i \, r^i$$

where for decimal operands $r = 10$, $y_i \in [0, 9]$ and $x$ is an $n$-digit BCD vector. We consider in the following $n = 16$.

To avoid complicated multiples of $x$, we recode $y_i = y_{Hi} + y_{Li}$ with $y_H \in \{0, 5, 10\}$ and $y_L \in \{-2, -1, 0, 1, 2\}$, as indicated in Table I. With this recoding, we need to precompute only the multiples $5x$ and $2x$, while $10x$ is obtained by left-shifting $x$ one digit. The negative values $-x$ and $-2x$ are represented in radix-10 complement. Each partial product $x y_i$ is positive and sign extension is not necessary. The partial products are accumulated by using an adder tree, and the multiplication is completed by a carry-save to BCD conversion, as shown in Fig. 1.

  y_i  y_H  y_L  |  y_i  y_H  y_L
   0    0    0   |   5    5    0
   1    0    1   |   6    5    1
   2    0    2   |   7    5    2
   3    5   -2   |   8   10   -2
   4    5   -1   |   9   10   -1

TABLE I. RECODING OF DIGITS OF y.

[Figure omitted: block diagram with precomputation, recoder, partial-product generation, adder tree, and carry-save-to-BCD stages.]
Fig. 1. Scheme of the multiplier.

A. Precomputation of 2x and 5x

The multiplication by 2 is straightforward. Each digit is multiplied by 2 and the carry is propagated to the next digit. The carry does not propagate any further, because the doubled digit modulo 10 is even (at most 8) and therefore absorbs an incoming carry of 1.

In [5] the multiples $5x$ and $2x$ are used for decimal multiplication and division. However, the generation of $5x$ is performed with a carry propagation over the whole number. We now present the algorithm we use for the precomputation of $5x$ without carry propagation.

To obtain $g = 5x = 10x/2$ we perform the following two steps:
(1) $e = 10x$ (shift left one digit)
(2) $g = e/2$

To perform the second step we divide each digit of $e$ by two. However, since $e_i/2$ has a fractional part when $e_i$ is odd, we have

$$e_i/2 = f_i + h_{i+1}/2, \qquad f_i = 0, \ldots, 4 \ \text{ and } \ h_i = 0, 1$$
$$g_i = f_i + 5 h_i, \qquad g_i = 0, \ldots, 9$$

That is, the algorithm to produce $g = 5x$ is

$$h_{i+1} = 1 \ \text{ if } e_i \text{ (or } x_{i+1}\text{) is odd}$$
$$f_i = \lfloor e_i/2 \rfloor = \lfloor x_{i+1}/2 \rfloor$$
$$g_i = f_i + 5 h_i$$

B. Partial Product Generation

The recoding of Table I produces two $n$-digit terms for each partial product (PP). Table II shows an example of multiplication (4×4 digits). Negative numbers are represented in radix-10 complement and are implemented by taking the 9's complement of each BCD digit and adding a 1 in the least-significant position.

                 x :       1963 ×
                 y :       8145 =
  xy_0 :   5x              9815
            0              0000
  xy_1 :   5x             9815
           −x            -1963
  xy_2 :    x            1963
            0            0000
  xy_3 :  10x          19630
          −2x          -3926
                       --------
                       15988635

TABLE II. EXAMPLE OF MULTIPLICATION (4 × 4 DIGITS).

To reduce the delay of the additions, we perform the addition $xy_i = xy_{Hi} + xy_{Li}$ in carry-save form and use a radix-10 carry-save representation of the partial products

$$xy_i = xy_{Hi} + xy_{Li} = \begin{cases} xy_{iS} \in [0, 9] \\ xy_{iC} \in [0, 1] \end{cases}$$

That is, the addition to produce a partial product results in

            n-1  ...    1     0
  xy_Hi    xxxx  ...  xxxx  xxxx
  xy_Li    xxxx  ...  xxxx  xxxx
  xy_iS    ssss  ...  ssss  ssss
  xy_iC       c  ...     c     b

with $b = 1$ if $y_{Li} < 0$.

The scheme for the generation of a partial product is shown in Fig. 2.

[Figure omitted: 9's complement blocks, 5:1-digit and 3:1-digit multiplexers, and a radix-10 carry-save adder.]
Fig. 2. Partial product generation.

C. Partial Product Accumulation

The carry-free accumulation of the partial products is done using radix-10 carry-save adders (CSA), which add a carry-save operand plus another BCD operand to produce a carry-save result (see Fig. 3).

          n-1  ...    1     0
  A      aaaa  ...  aaaa  aaaa
  B      bbbb  ...  bbbb  bbbb
  D         d  ...     d     d
  S      ssss  ...  ssss  ssss
  C         c  ...     c     0

Fig. 3. n-digit radix-10 CSA.

For 16-digit operands, we obtain 16 carry-save partial products. Because the radix-10 CSA reduces two BCD digits and one carry bit to one BCD digit and one carry bit, arranging the $xy_{iS}$ vectors in the first level of the tree and using 8 CSAs leaves the carries ($xy_{iC}$) of 8 partial products not accounted for. This is shown in Fig. 4.

Fig. 4. Array for partial products. Solid circles indicate BCD digits, hollow circles indicate bits.

These carry vectors are added by using an array of carry-counters (CC). A digit counter of this array adds 8 carries of the same weight and produces a decimal digit. That is,

  inputs:  8 $m$-bit vectors $C_i$, with $c_j \in [0, 1]$
  output:  1 $m$-digit vector $Z$, with $z_j \in [0, 8]$

as shown in Fig. 5.

          m-1   m-2  ...    2     1     0
  C1        c     c  ...    c     c     c
  C2        c     c  ...    c     c     c
  C3        c     c  ...    c     c     c
  C4        c     c  ...    c     c     c
  C5        c     c  ...    c     c     c
  C6        c     c  ...    c     c     c
  C7        c     c  ...    c     c     c
  C8        c     c  ...    c     c     c
  Z      zzzz  zzzz  ...  zzzz  zzzz  zzzz

Fig. 5. m-digit radix-10 counter.

By arranging the radix-10 carry-save adders and the carry-counters as in Fig. 6, we can accumulate the partial products in a 6-level tree.

D. Radix-10 carry-save to BCD converter

The final carry-propagating addition consists of converting the radix-10 carry-save representation into the BCD one. This can be done with a simplified radix-10 CPA in which the input is just a value in radix-10 carry-save representation.

The adder can be divided into three stages. As in radix-2, the first stage of the unit produces $p$ and $g$, defined as

$$p_i = \begin{cases} 1 & \text{if } a_i + d_i = 9 \\ 0 & \text{otherwise} \end{cases} \qquad g_i = \begin{cases} 1 & \text{if } a_i + d_i = 10 \\ 0 & \text{otherwise} \end{cases}$$

In parallel, the following operations are computed (modulo 10)

$$s0_i = |a_i + d_i|_{10}, \qquad s1_i = |a_i + d_i + 1|_{10}, \qquad i = 0, 1, \ldots, 2n-1$$

A second stage consists of a parallel prefix structure that computes the carries $c_i$ from the $p$'s and $g$'s, and a final stage selects the precomputed digits of $s0$ or $s1$ according to the specific carry bit $c_i$

$$s_i = \begin{cases} s0_i & \text{if } c_i = 0 \\ s1_i & \text{if } c_i = 1 \end{cases} \qquad i = 0, 1, \ldots, 2n-1$$

This adding scheme can be generalized to a BCD carry-propagate adder by first applying a simplified radix-10 CSA¹ which reduces the two BCD vectors to a radix-10 carry-save representation. Then, the converter presented above is used to complete the carry-propagate addition.

III. IMPLEMENTATION AND COMPARISONS

The 16-digit radix-10 combinational multiplier (Fig. 1) has been synthesized using the STM 90 nm CMOS standard cells library [6] and Synopsys Design Compiler. The synthesized unit has a critical path of 2.65 ns and an area of 0.3 mm², equivalent to the area of about 68,000 NAND-2 gates (Ta-
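As an illustration of Sections II-A, II-B and II-D, the following Python sketch models the digit recoding of Table I, the 2x and 5x precomputations, and the carry-save to BCD selection at the digit level. It is not the authors' hardware description: all function names are hypothetical, digit lists are stored most significant digit first, negative terms are handled with ordinary integer subtraction instead of radix-10 complement, the partial products are summed with plain integer addition instead of the CSA/counter tree, and a serial loop stands in for the parallel prefix carry computation.

```python
# Table I: multiplier digit y_i -> (y_H, y_L)
RECODE = {
    0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (5, -2), 4: (5, -1),
    5: (5, 0), 6: (5, 1), 7: (5, 2), 8: (10, -2), 9: (10, -1),
}


def to_digits(value, n):
    """Return the n decimal digits of value, most significant first."""
    return [int(ch) for ch in str(value).zfill(n)]


def from_digits(digits):
    """Inverse of to_digits."""
    return int("".join(str(d) for d in digits))


def times2(x):
    """2x: each result digit depends only on a digit and its less significant neighbour."""
    padded = [0] + x + [0]  # room for the top carry; zero neighbour for the last digit
    out = []
    for i in range(len(padded) - 1):
        carry_in = 1 if padded[i + 1] >= 5 else 0    # carry produced by the neighbour
        out.append((2 * padded[i]) % 10 + carry_in)  # even digit <= 8, so no new carry
    return out


def times5(x):
    """5x = (10x)/2 without carry propagation."""
    e = x + [0]                         # e = 10x (shift left one digit)
    h = [0] + [d % 2 for d in e[:-1]]   # h_{i+1} = 1 if e_i is odd
    f = [d // 2 for d in e]             # f_i = floor(e_i / 2)
    return [fi + 5 * hi for fi, hi in zip(f, h)]  # g_i = f_i + 5 h_i


def multiply(x_val, y_val, n):
    """Multiply two n-digit integers using only the multiples x, 2x, 5x and 10x."""
    x = to_digits(x_val, n)
    multiples = {
        0: 0,
        1: x_val,
        2: from_digits(times2(x)),
        5: from_digits(times5(x)),
        10: from_digits(x + [0]),
    }
    product = 0
    # i = 0 is the least significant multiplier digit
    for i, y_i in enumerate(reversed(to_digits(y_val, n))):
        y_h, y_l = RECODE[y_i]
        sign = -1 if y_l < 0 else 1
        partial = multiples[y_h] + sign * multiples[abs(y_l)]  # always >= 0
        product += partial * 10 ** i
    return product


def cs_to_bcd(a, d):
    """Convert a radix-10 carry-save value (digits a, carry bits d) to plain digits.

    A carry out of the most significant digit is dropped.
    """
    p = [int(ai + di == 9) for ai, di in zip(a, d)]
    g = [int(ai + di == 10) for ai, di in zip(a, d)]
    s0 = [(ai + di) % 10 for ai, di in zip(a, d)]
    s1 = [(ai + di + 1) % 10 for ai, di in zip(a, d)]
    out, carry = [], 0
    for i in reversed(range(len(a))):  # least significant digit first
        out.append(s1[i] if carry else s0[i])
        carry = g[i] | (p[i] & carry)
    return out[::-1]
```

With the operands of Table II, `times5(to_digits(1963, 4))` yields the digits of 09815, `times2` yields 03926, and `multiply(1963, 8145, 4)` returns 15988635.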