Arithmetic Optimization Using Carry-Save-Adders

Arithmetic Optimization using Carry-Save-Adders y z y Taewhan Kim William Jao Steve Tjiang y z Synopsys Inc. Aimfast Corp. 700 E. Middle eld Rd. 846 Stewart Dr. Mountain View, CA 94043 Sunnyvale, CA 94086 Abstract The n-bit CSA consists of n disjoint full addersFAs. It consumes three n-bit input vectors and pro duces Carry-save-adderCSA is the most often used typ e two outputs, i.e., n-bit sum vector S and n-bit carry of op eration in implementing a fast computation of vector C .We use the blo ck symb ol in Figure 1b to arithmetics of register-transfer level design in indus- represent a CSA op eration. Unlike the normal adders try. This pap er establishes a relationship b etween the e.g., ripple-carry adderRCA and carry-lo okahead prop erties of arithmetic computations and several op- adderCLA, a CSA contains no carry propagation. timizing transformations using CSAs to derive consis- Consequently, the CSA has the same propagation de- tently b etter qualities of results than those of manual lay as only one FA delay compared to RCA's n FA implementations. In particular, weintro duce two im- delay where n is the bit-width, and the delay is con- p ortant concepts, operation-duplication and operation- stant for anyvalue of n.For suciently large n, the split, which are the main driving techniques of our al- CSA implementation b ecomes much faster and also gorithm for achieving an extensive utilization of CSAs. relatively smaller in size than the implementation of Exp erimental results from a set of typical arithmetic 1 normal adders. computations found in industry designs indicate that automating CSA optimization with our algorithm pro- There has b een an extensive researchwork on duces designs with signi cantly faster timing and less the arithmetic optimizations in several areas[2, 3, 4]. area. They,however, fo cused on the transformation of op erations using techniques such as simple algebraic ma- 1 Intro duction nipulations and constant propagation; they did not address the problem of arithmetic optimization us- Hardware designers have long applied many arith- ing CSAs. This pap er intro duces the concept of CSA metic optimization techniques for implementation transformation and presents an algorithm that e ec- of arithmetic functionality like addition, subtrac- tively utilizes CSAs to derive consistently b etter qual- tion, and multiplication. Among them, carry-save- ity of results than that of manual implementations. adderCSA[1] has proved a p owerful mechanism to improve timing with little, if any, area p enalty even We de ne a CSA tree to b e a tree of CSA op erators reduced area. Figure 1a shows the structure of an and one adder at the ro ot of the tree. A CSA tree n-bit CSA. can b e used to transform an arbitrary numb er of additions to pro duce two adding op erands and the adder Xn-1 YZn-1 n-1 Xn-2 Yn-2 Z n-2 X00Y0 ZCI is used at the ro ot of CSA tree to pro duce a nal sum. In other words, an expression of N additions can b e transformed to a CSA tree of depth log N and the :5 FA FA FA 1 overall delayislog N plus the delay of nal adder 1:5 whichisaboutlog n for a CLA. For example, ex- 2 CO Sn-1 Sn-2S 0 A+B +C can b e transformed into one CSA C pression n-1 (a) C1 C 0 wn in Figure 1c. ABC and one adder as sho eral results of our exp erimentations, XYZ Based on sev CI our transformation can b e stated as follows: Given a non-cyclic data ow graph of arithmetic op erations, we wish to transform the computations using as many CO CSA op erations as p ossible while preserving the func- CS y of the design. (b) tionalit F (c) 1 Figure 1: An n-bit CSA and an example of use Note that a CSA p erforms the same functionality of the conventional adders in the sense that each reduces the number of adding op erands by one, i.e., a CSA reduces from 3 to 2 and an adder from 2 to 1. 35th Design Automation Conference ® Copyright ©1998 ACM 1-58113-049-x-98/0006/$3.50 DAC98 - 06/98 San Francisco, CA USA 2 ApplicabilityofTransformation transformed yet in the previous iterations. We refer the op eration to root. The ro ot is then expanded to- CSA transformation is not limited to addition only. ward the input b oundary of design to construct a tree We can transform other arithmetic op erations like sub- of op erations. Note that the typ e of ro ot must b e traction and multiplicationinto additions to pro duce one of addition, subtraction, and multiplication op er- longer chains of additions. We replace a subtraction 2 ations. by adding the negation of the subtraction. That is, Supp ose that op eration A is a leaf of the op eration x-y in expression is changed into x +y+ 1. For tree expanded so far and B is an op eration in which multiplication, we can use two p ossible options: its output is used as an input of op eration A. We expand the current op eration tree by including B if sum of pro ducts: the following four conditions are satis ed: Amultiplication is decomp osed into a set of shift- and-add op erations, which is de nitely b ene cial Condition 1: A must b e one of addition and sub- 3 when it has a constant as input. With full de- traction op erations. comp osition we can e ectively extend the addition tree by merging the decomp osed additions Condition 2: B must b e one of addition, subtrac- with another additions. However, decomp osition tion, and multiplication op erations. can increase the numb er of CSA op erations dras- tically as the bit-width increases. Condition 3: Output of B must b e single fanout. That is, B drives only an input of A. partial multiplication and addition: Condition 4: There should b e no \leak" of data Twotyp es of implementation for multiplication values through the connections from B to A up-to op eration are typically found in designs[6]: i the output bit-width of ro ot of the tree. wallace tree mo del for fast timing and ii carry- save array mo del for small area. The wallace tree 4 Condition 1 ensures that only addition and sub- mo del consists of two blo cks called partial-mult traction are used as non-leaf op erations of the tree, and nal-add.Partial-mult uses the two inputs and Condition 2 ensures that only addition, subtrac- of the multiplication as input and pro duces two tion and multiplication are used as leaf op erations. outputs, in whichby adding them the nal result 5 Condition 3 ensures that we do not allow transform- of multiplication is obtained. Consequently,by ing non-tree structure of op erations. We solve the decomp osing into a partial-mult and a nal-add multiple fanout case in Sec. 4.1. Finally, Condition we can merge the nal-add with a descendent op- 4 is required for preserving the functionality of de- eration to transform into CSAs. This option in- sign b efore and after transformation. The examples creases only one op eration, but is less exible than shown in Figure 2 clarify the concept of leakage of the case of full decomp osition. data values. Supp ose op eration A is also the ro ot of the current tree. Figure 2a shows a case of upp er- 3 An Algorithm bit truncation b etween B and A. Because up-to 8 Our transformation algorithm consists of three ma- bits of data values must b e preserved, the truncation jor steps: 1 identi cation of op eration tree to b e do es not allow merging op eration B with A. Simi- transformed, 2 translation of the expression of the larly, Figure 2b shows a case of lower-bit truncation identi ed tree into an addition expression, and 3 between the two op erations. It also prevents merg- conversion of the addition expression into a CSA tree. ing the two op erations. However, Figure 2c shows Given a non-cycle data ow graph of arithmetic com- a case of preserved data values up-to 8 bits from B putations, our algorithm iteratively p erforms the three through A. Consequently, the two op eration can b e steps until there is no candidate expression of tree to merged and transformed into CSA op erations without b e transformed. The following subsections describ e changing the functionality of design. the details of the three steps. 3.2 Conversion to Additions 3.1 Candidate Identi cation With the identi ed cluster of op eration tree ob- As explained in Sec. 2, the CSA transformation tained from Step 1, this step converts the expression can b e applied to anytyp e of op erations that can of op eration tree consisting of additions, subtractions, b e converted to additions. A candidate cluster is and multiplications into a tree of additions. What we basically a tree which contains op erations like addi- are interested in this step is to extract all the op erands tions/subtractions/multiplications. Our algorithm is from the converted addition expression. Table 1 sum- a b ottom-up from the output b oundary of design to- marizes all p ossible conversion rules for basic op er- ward the input b oundary approach.

Arithmetic Optimization Using Carry-Save-Adders

Digital Electronic Circuits

Design and Analysis of Power Efficient PTL Half Subtractor Using 120Nm Technology

Design of Adder / Subtractor Circuits Based On

Area and Power Optimized D-Flip Flop and Subtractor

High Performance Full Subtractor Using Floating-Gate MOSFET

Design and Implementation of Adder/Subtractor and Multiplication Units for Floating-Point Arithmetic

Design of Fault Tolerant Full Subtractor

8-Bit Adder and Subtractor with Domain Label Based on DNA Strand Displacement

VHDL: Dataflow

Digital Electonics and Microprocessor Date

COMPUTER ARITHMETIC 1. Addition and Subtraction of Unsigned Numbers the Direct Method of Subtraction Taught in Elementary Schools Uses the Borrowconcept

Multi‐Bit Adder/Subtractor with Overflow