A Radix-10 Combinational Multiplier

Tomás Lang and Alberto Nannarelli∗
Dept. of Electrical Engineering and Computer Science, University of California, Irvine, USA
∗Dept. of Informatics & Math. Modelling, Technical University of Denmark, Kongens Lyngby, Denmark
1424407850/06/$20.00

Abstract. In this work, we present a combinational decimal multiply unit which can be pipelined to reach the desired throughput. With respect to previous implementations of decimal multiplication, the proposed unit is combinational (parallel) and not sequential, has a simpler recoding of the operands which reduces the number of partial product precomputations, and uses counters to eliminate the need for the decimal equivalent of a 4:2 adder. The results of the implementation show that the combinational decimal multiplier offers a good compromise between latency and area when compared to other decimal multiply units and to binary double-precision multipliers.

I. INTRODUCTION

Hardware implementations of decimal arithmetic units have recently gained importance because they provide higher accuracy in financial applications [1]. In this work, we present a combinational decimal multiplier which can be pipelined to reach the desired throughput. The multiply unit is organized as follows: the multiplier is recoded; the partial products are kept in a redundant format; the partial products are accumulated by a tree of redundant adders; and the final product is obtained by converting the carry-save tree's outputs into binary-coded decimal (BCD) format.

With respect to previous implementations of radix-10 multipliers such as the ones in [2], [3] and [4], our design is different in the following aspects: 1) the multiplier is combinational (parallel) and not sequential; 2) we recode only the multiplier, while in [4] both operands are recoded in -5 to +5; 3) in the partial product generation only multiples of 5 and 2 are required; 4) the accumulation of partial products is done in a tree of radix-10 carry-save adders and counters, while in the sequential unit of [4] signed-digit adders are used.

We present the standard cells implementation of the multiplier and compare its latency with those of the schemes presented in [2] and [3]. Moreover, we compare the delay and the area of the decimal combinational multiplier with those obtained by the implementation of a binary (radix-4) double-precision multiplier.

II. MULTIPLIER ARCHITECTURE

For the multiplication $p = x \cdot y$, we assume that both the multiplicand $x$ and the multiplier $y$ are sign-and-magnitude n-digit fractional numbers in BCD format normalized in [0.1, 1.0). The multiplication shift-and-add algorithm is based on the identity

$$p = x \cdot y = \sum_{i=0}^{n-1} x \, y_i \, r^i$$

where for decimal operands $r = 10$, $y_i \in [0, 9]$ and $x$ is an n-digit BCD vector. We consider in the following $n = 16$.

To avoid complicated multiples of $x$, we recode $y_i = y_{Hi} + y_{Li}$ with $y_H \in \{0, 5, 10\}$ and $y_L \in \{-2, -1, 0, 1, 2\}$, as indicated in Table I. With this recoding, we need to precompute only the multiples $5x$ and $2x$, while $10x$ is obtained by left-shifting $x$ one digit. The negative values $-x$ and $-2x$ are represented in radix-10 radix-complement. Each partial product $x y_i$ is positive and sign extension is not necessary.

The partial products are accumulated by using an adder tree, and the multiplication is completed by a carry-save to BCD conversion, as shown in Fig. 1.

TABLE I. RECODING OF DIGITS OF y.

  yi   yH   yL   |   yi   yH   yL
   0    0    0   |    5    5    0
   1    0    1   |    6    5    1
   2    0    2   |    7    5    2
   3    5   -2   |    8   10   -2
   4    5   -1   |    9   10   -1

[Fig. 1. Scheme of the multiplier.]

A. Precomputation of 2x and 5x

The multiplication by 2 is straightforward. Each digit is multiplied by 2 and the carry is propagated to the next digit. The carry does not propagate any further.

In [5] the multiples $5x$ and $2x$ are used for decimal multiplication and division. However, the generation of $5x$ is performed with a carry propagation over the whole number. We now present the algorithm we use for the precomputation of $5x$ without carry propagation.

To obtain $g = 5x = 10x/2$ we perform the following two steps:

(1) $e = 10x$ (shift left one digit)
(2) $g = e/2$

To perform the second step we divide by two each digit of $e$. However, since $e_i/2$ has a fractional part when $e_i$ is odd, we have

$$e_i/2 = f_i + h_{i+1}/2, \qquad f_i = 0, \ldots, 4 \ \text{and} \ h_i = 0, 1$$
$$g_i = f_i + 5h_i, \qquad g_i = 0, \ldots, 9$$

That is, the algorithm to produce $g = 5x$ is

$$h_{i+1} = 1 \ \text{if} \ e_i \ (\text{or} \ x_{i+1}) \ \text{odd}$$
$$f_i = \lfloor e_i/2 \rfloor = \lfloor x_{i+1}/2 \rfloor$$
$$g_i = f_i + 5h_i$$

B. Partial Product Generation

The recoding of Table I produces two n-digit terms for each partial product (PP). Table II shows an example of multiplication (4×4 digits). Negative numbers are represented in radix-10 complement and are implemented by taking the 9's complement of each BCD digit and adding a 1 in the least-significant position.

TABLE II. EXAMPLE OF MULTIPLICATION (4 × 4 DIGITS).

  x :              1963 ×
  y :              8145 =
  xy0 :   5x       9815
          0        0000
  xy1 :   5x       9815
          −x      -1963
  xy2 :   x        1963
          0        0000
  xy3 :   10x     19630
          −2x     -3926
               15988635

To reduce the delay of the additions, we perform the addition $xy_i = xy_{Hi} + xy_{Li}$ in carry-save form and use a radix-10 carry-save representation of the partial products

$$xy_i = xy_{Hi} + xy_{Li} = \begin{cases} xy_{iS} \in [0, 9] \\ xy_{iC} \in [0, 1] \end{cases}$$

That is, the addition to produce a partial product results in

          n−1   ...     1      0
  xyHi    xxxx  ...   xxxx   xxxx
  xyLi    xxxx  ...   xxxx   xxxx
  xyiS    ssss  ...   ssss   ssss
  xyiC       c  ...      c      b

with b = 1 if $y_{Li} < 0$.

The scheme for the generation of a partial product is shown in Fig. 2.

[Fig. 2. Partial product generation.]

C. Partial Product Accumulation

The carry-free accumulation of the partial products is done using radix-10 carry-save adders (CSA), which add a carry-save operand plus another BCD operand to produce a carry-save result (see Fig. 3).

[Fig. 3. n-digit radix-10 CSA. Inputs: BCD vectors A and B (n digits each) and carry vector D (n bits). Outputs: BCD sum vector S (n digits) and carry vector C (n bits, with 0 in the least-significant position).]

For 16-digit operands, we obtain 16 carry-save partial products. Because the radix-10 CSA reduces two BCD digits and one carry bit to one BCD digit and one carry bit, by arranging the $xy_{iS}$ vectors in a first level of the tree and using 8 CSAs, we leave the carries ($xy_{iC}$) of 8 partial products not accounted for. This is shown in Fig. 4.

[Fig. 4. Array for partial products. Solid circles indicate BCD digits, hollow circles indicate bits.]

These carry vectors are added by using an array of carry-counters (CC). A digit counter of this array adds 8 carries of the same weight and produces a decimal digit. That is,

  inputs:  8 m-bit vectors $C_i$, with $c_j \in [0, 1]$
  output:  1 m-digit vector $Z$, with $z_j \in [0, 8]$

as shown in Fig. 5.

[Fig. 5. m-digit radix-10 counter. Inputs: carry vectors C1 to C8 (m bits each). Output: BCD vector Z (m digits).]

By arranging the radix-10 carry-save adders and the carry-counters as in Fig. 6, we can accumulate the partial products in a 6-level tree.

D. Radix-10 carry-save to BCD converter

The final carry-propagating addition consists of converting the radix-10 carry-save representation into the BCD one. This can be done with a simplified radix-10 CPA in which the input is just a value in radix-10 carry-save representation.

The adder can be divided into three stages. As in radix-2, the first stage of the unit produces $p$ and $g$ defined as

$$p_i = \begin{cases} 1 & \text{if } a_i + d_i = 9 \\ 0 & \text{otherwise} \end{cases} \qquad g_i = \begin{cases} 1 & \text{if } a_i + d_i = 10 \\ 0 & \text{otherwise} \end{cases}$$

In parallel, the following operations are computed (modulo 10)

$$s0_i = (a_i + d_i) \bmod 10, \qquad s1_i = (a_i + d_i + 1) \bmod 10, \qquad i = 0, 1, \ldots, 2n - 1$$

A second stage consists of a parallel prefix structure to compute the carries $c_i$ from the $p$'s and $g$'s, and a final stage selects the precomputed digits of $s0$ or $s1$ according to the specific carry bit $c_i$

$$s_i = \begin{cases} s0_i & \text{if } c_i = 0 \\ s1_i & \text{if } c_i = 1 \end{cases} \qquad i = 0, 1, \ldots, 2n - 1$$

This adding scheme can be generalized to a BCD carry-propagate adder by first applying a simplified radix-10 CSA which reduces the two BCD vectors to a radix-10 carry-save representation. Then, the converter presented above is used to complete the carry-propagate addition.

III. IMPLEMENTATION AND COMPARISONS

The 16-digit radix-10 combinational multiplier (Fig. 1) has been synthesized using the STM 90 nm CMOS standard cells library [6] and Synopsys Design Compiler. The synthesized unit has a critical path of 2.65 ns and an area of 0.3 mm², equivalent to the area of about 68,000 NAND-2 gates (Ta-
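
The recoding of Table I, the precomputations of Section II-A and the converter of Section II-D can be checked with a small behavioral model. The Python sketch below is an illustration only and not the synthesized design. It makes several assumptions that are not in the paper: digits are stored least-significant first (so the half of an odd digit moves to index i-1 rather than i+1), partial products are accumulated with ordinary integers instead of the carry-save tree, and the converter ripples its carry where the unit uses a parallel prefix structure. The names RECODE, to_digits, from_digits, times2, times5, multiply and carry_save_to_bcd are hypothetical.

```python
# Behavioral sketch of the recoded radix-10 multiplication (not a hardware model).
# Digit lists are little-endian: index 0 holds the least-significant digit.

# Table I: y_i -> (y_H, y_L), y_H in {0, 5, 10}, y_L in {-2, ..., 2}.
RECODE = {0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (5, -2), 4: (5, -1),
          5: (5, 0), 6: (5, 1), 7: (5, 2), 8: (10, -2), 9: (10, -1)}


def to_digits(value, width):
    """Split a non-negative integer into `width` decimal digits."""
    return [(value // 10**i) % 10 for i in range(width)]


def from_digits(digits):
    """Inverse of to_digits."""
    return sum(d * 10**i for i, d in enumerate(digits))


def times2(x):
    """2x: each digit is doubled; the carry reaches only the next digit."""
    out, carry = [], 0
    for d in x + [0]:
        out.append((2 * d) % 10 + carry)  # (2d mod 10) <= 8, so no new carry
        carry = (2 * d) // 10
    return out


def times5(x):
    """5x = (10x)/2 computed digit by digit, with no carry chain."""
    e = [0] + x  # e = 10x: shift left one digit
    g = []
    for i, e_i in enumerate(e):
        f = e_i // 2                                # f in 0..4
        h = e[i + 1] % 2 if i + 1 < len(e) else 0   # odd digit one place up
        g.append(f + 5 * h)                         # g in 0..9
    return g


def multiply(x_val, y_val, n):
    """Multiply two n-digit integers using only the multiples x, 2x, 5x, 10x."""
    x = to_digits(x_val, n)
    multiples = {0: 0, 1: x_val, 2: from_digits(times2(x)),
                 5: from_digits(times5(x)), 10: 10 * x_val}
    product = 0
    for i, y_i in enumerate(to_digits(y_val, n)):
        y_h, y_l = RECODE[y_i]
        low = multiples[abs(y_l)]
        partial = multiples[y_h] + (low if y_l >= 0 else -low)
        product += partial * 10**i  # integer add stands in for the adder tree
    return product


def carry_save_to_bcd(a, d):
    """Convert BCD digits `a` plus carry bits `d` (same weights) to BCD."""
    s, c = [], 0
    for a_i, d_i in zip(a, d):
        t = a_i + d_i
        p, g = (t == 9), (t == 10)     # propagate and generate signals
        s.append((t + c) % 10)         # picks s0 (c = 0) or s1 (c = 1)
        c = 1 if g or (p and c) else 0
    s.append(c)
    return s
```

With the operands of Table II, `multiply(1963, 8145, 4)` returns 15988635, and `times5` applied to the digits of 1963 gives 9815.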
