Arithmetic Optimization Using Carry-Save-Adders

Taewhan Kim†, William Jao‡, Steve Tjiang†
† Synopsys Inc., 700 E. Middlefield Rd., Mountain View, CA 94043
‡ Aimfast Corp., 846 Stewart Dr., Sunnyvale, CA 94086

Abstract

The carry-save adder (CSA) is the most often used type of operation for implementing fast arithmetic computations in register-transfer level designs in industry. This paper establishes a relationship between the properties of arithmetic computations and several optimizing transformations using CSAs to derive consistently better quality of results than that of manual implementations. In particular, we introduce two important concepts, operation-duplication and operation-split, which are the main driving techniques of our algorithm for achieving an extensive utilization of CSAs. Experimental results from a set of typical arithmetic computations found in industry designs indicate that automating CSA optimization with our algorithm produces designs with significantly faster timing and less area.

1 Introduction

Hardware designers have long applied many arithmetic optimization techniques to the implementation of arithmetic functionality such as addition, subtraction, and multiplication. Among them, the carry-save adder (CSA)[1] has proved a powerful mechanism for improving timing with little, if any, area penalty, and sometimes even with reduced area. Figure 1(a) shows the structure of an n-bit CSA.

The n-bit CSA consists of n disjoint full adders (FAs). It consumes three n-bit input vectors and produces two outputs, i.e., an n-bit sum vector S and an n-bit carry vector C. We use the block symbol in Figure 1(b) to represent a CSA operation. Unlike normal adders (e.g., the ripple-carry adder (RCA) and the carry-lookahead adder (CLA)), a CSA contains no carry propagation. Consequently, the CSA has a propagation delay of only one FA, compared with the RCA's n FA delays where n is the bit-width, and its delay is constant for any value of n. For sufficiently large n, the CSA implementation becomes much faster and also relatively smaller in size than the implementation of normal adders.¹

There has been extensive research on arithmetic optimization in several areas[2, 3, 4]. That work, however, focused on the transformation of operations using techniques such as simple algebraic manipulations and constant propagation; it did not address the problem of arithmetic optimization using CSAs. This paper introduces the concept of CSA transformation and presents an algorithm that effectively utilizes CSAs to derive consistently better quality of results than that of manual implementations.

We define a CSA tree to be a tree of CSA operators with one adder at the root of the tree. A CSA tree can be used to transform an arbitrary number of additions so that they produce two adding operands, and the adder at the root of the CSA tree produces the final sum. In other words, an expression of N additions can be transformed into a CSA tree of depth \(\log_{1.5} N\), and the overall delay is \(\log_{1.5} N\) plus the delay of the final adder, which is about \(\log_2 n\) for a CLA. For example, the expression A+B+C can be transformed into one CSA and one adder as shown in Figure 1(c).

Based on several results of our experiments, our transformation problem can be stated as follows: Given a non-cyclic data flow graph of arithmetic operations, we wish to transform the computations using as many CSA operations as possible while preserving the functionality of the design.
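The CSA operation and the CSA tree described in the introduction can be illustrated with a minimal Python sketch. It models n-bit vectors as integers and is not the authors' implementation; the names `csa` and `csa_tree_sum`, the grouping of operands in threes per level, and the omission of the carry-in shown in Figure 1 are all assumptions made for illustration.

```python
def csa(x, y, z, n):
    """One n-bit carry-save step: three addends in, (sum, carry) vectors out."""
    mask = (1 << n) - 1
    s = (x ^ y ^ z) & mask                            # per-bit full-adder sum
    c = (((x & y) | (y & z) | (x & z)) << 1) & mask   # per-bit carry, weighted by 2
    return s, c


def csa_tree_sum(operands, n):
    """Reduce operands with CSAs until two remain, then do one final addition.

    Returns the n-bit result and the number of CSA levels that were needed.
    """
    mask = (1 << n) - 1
    ops = [v & mask for v in operands]
    levels = 0
    while len(ops) > 2:
        nxt = []
        full = len(ops) - len(ops) % 3
        # Each group of three operands feeds one CSA at this level of the tree.
        for i in range(0, full, 3):
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2], n))
        nxt.extend(ops[full:])        # leftover operands pass to the next level
        ops = nxt
        levels += 1
    # The single carry-propagating adder at the root of the tree.
    return sum(ops) & mask, levels
```

For instance, `csa_tree_sum([3, 5, 7, 9], 8)` reduces four operands through two CSA levels before the final adder. Each CSA turns three operands into two without propagating carries, which is why only the root adder contributes a width-dependent delay.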
[Figure 1: An n-bit CSA and an example of use]

¹ Note that a CSA performs the same function as a conventional adder in the sense that each reduces the number of adding operands by one, i.e., a CSA reduces them from 3 to 2 and an adder from 2 to 1.

35th Design Automation Conference ® Copyright ©1998 ACM 1-58113-049-x-98/0006/$3.50 DAC98 - 06/98 San Francisco, CA USA

2 Applicability of Transformation

CSA transformation is not limited to addition. We can transform other arithmetic operations such as subtraction and multiplication into additions to produce longer chains of additions. We replace a subtraction by the addition of the negated subtrahend. That is, \(x - y\) in an expression is changed into \(x + \bar{y} + 1\), where \(\bar{y}\) is the bitwise complement of \(y\). For multiplication, there are two possible options:

  • sum of products:
    A multiplication is decomposed into a set of shift-and-add operations, which is definitely beneficial when it has a constant as input. With full decomposition we can effectively extend the addition tree by merging the decomposed additions with other additions. However, decomposition can increase the number of CSA operations drastically as the bit-width increases.

  • partial multiplication and addition:
    Two types of implementation of the multiplication operation are typically found in designs[6]: (i) the Wallace tree model for fast timing and (ii) the carry-save array model for small area. The Wallace tree model consists of two blocks called partial-mult and final-add. Partial-mult takes the two inputs of the multiplication and produces two outputs whose sum is the final result of the multiplication. Consequently, by decomposing a multiplication into a partial-mult and a final-add, we can merge the final-add with a descendant operation and transform it into CSAs. This option adds only one operation, but is less flexible than full decomposition.

3 An Algorithm

Our transformation algorithm consists of three major steps: (1) identification of an operation tree to be transformed, (2) translation of the expression of the identified tree into an addition expression, and (3) conversion of the addition expression into a CSA tree. Given a non-cyclic data flow graph of arithmetic computations, our algorithm iteratively performs the three steps until there is no candidate tree expression left to be transformed. The following subsections describe the details of the three steps.

3.1 Candidate Identification

As explained in Sec. 2, the CSA transformation can be applied to any type of operation that can be converted to additions. A candidate cluster is basically a tree that contains operations such as additions, subtractions, and multiplications. Our algorithm is a bottom-up approach (from the output boundary of the design toward the input boundary). [...] transformed yet in the previous iterations. We refer to this operation as the root. The root is then expanded toward the input boundary of the design to construct a tree of operations. Note that the type of the root must be one of addition, subtraction, and multiplication.

Suppose that operation A is a leaf of the operation tree expanded so far and B is an operation whose output is used as an input of operation A. We expand the current operation tree by including B if the following four conditions are satisfied:

Condition 1: A must be an addition or a subtraction operation.

Condition 2: B must be an addition, subtraction, or multiplication operation.

Condition 3: The output of B must have a single fanout. That is, B drives only an input of A.

Condition 4: There should be no "leak" of data values through the connections from B to A, up to the output bit-width of the root of the tree.

Condition 1 ensures that only additions and subtractions are used as non-leaf operations of the tree, and Condition 2 ensures that only additions, subtractions, and multiplications are used as leaf operations. Condition 3 ensures that we do not transform non-tree structures of operations. We solve the multiple fanout case in Sec. 4.1. Finally, Condition 4 is required for preserving the functionality of the design before and after transformation. The examples shown in Figure 2 clarify the concept of leakage of data values. Suppose operation A is also the root of the current tree. Figure 2(a) shows a case of upper-bit truncation between B and A. Because up to 8 bits of the data values must be preserved, the truncation does not allow merging operation B with A. Similarly, Figure 2(b) shows a case of lower-bit truncation between the two operations, which also prevents merging them. However, Figure 2(c) shows a case in which data values are preserved up to 8 bits from B through A. Consequently, the two operations can be merged and transformed into CSA operations without changing the functionality of the design.

3.2 Conversion to Additions

With the identified cluster of the operation tree obtained from Step 1, this step converts the expression of the operation tree, consisting of additions, subtractions, and multiplications, into a tree of additions. What we are interested in at this step is extracting all the operands from the converted addition expression. Table 1 summarizes all possible conversion rules for basic operations [...]
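The idea of converting an operation tree into a flat list of addends can be sketched in Python as follows. This is only an illustration under stated assumptions, not the paper's Table 1 rules, which are not reproduced here: the tuple encoding `("add", a, b)`, `("sub", a, b)`, `("mulc", a, k)`, the function name `to_addends`, and the restriction of multiplication to a non-negative constant `k` are all hypothetical. Subtraction uses the two's-complement rule from Sec. 2 and constant multiplication uses shift-and-add.

```python
def to_addends(expr, n, negate=False):
    """Flatten an expression into n-bit addends whose sum (mod 2**n)
    equals the value of the expression (or of its negation if negate=True)."""
    mask = (1 << n) - 1
    if isinstance(expr, int):                  # leaf operand
        v = expr & mask
        # In n-bit two's complement, -v equals (bitwise complement of v) + 1.
        return [~v & mask, 1] if negate else [v]
    op, a, b = expr
    if op == "add":
        return to_addends(a, n, negate) + to_addends(b, n, negate)
    if op == "sub":                            # a - b becomes a + (-b)
        return to_addends(a, n, negate) + to_addends(b, n, not negate)
    if op == "mulc":                           # a times constant b >= 0
        terms = []
        for t in to_addends(a, n):             # flatten the multiplicand first
            for i in range(b.bit_length()):
                if (b >> i) & 1:               # one shifted copy per set bit of b
                    terms += to_addends((t << i) & mask, n, negate)
        return terms
    raise ValueError(f"unsupported operation: {op}")
```

For example, `to_addends(("sub", 5, 9), 8)` yields the addends 5, the complement of 9, and the constant 1. The resulting list is exactly the kind of operand set that a CSA tree can then reduce to two operands before the final adder.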
