University of California s12

UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Study Homework: Arithmetic NTU IC541CA (Fall 2004) SOLUTIONS

1 Short adders

1A The delay of the ripple carry adder can be split into three parts: the first bit, the middle six bits,

and the last bit. For the first bit, the critical path for the cout is one XOR, one AND, and one OR gate. Then for the next 6 bits, since p has had plenty of time to be computed already, we need

only one AND and one OR gate to get from cin to cout. In the final stage, the important path is from cin to the s output, which is one XOR delay.

Therefore, the delays are: 2 2 tfirst = 0.5*2 *tp + 2*0.25*2 *tp = 4tp 2 tmiddle = 2*0.25*2 *tp = 2tp 2 tlast = 0.5*2 *tp = 2tp

ttotal_ripple8 = tfirst + 6*tmiddle + tlast = 4tp + 6*2tp + 2tp = 18tp

1B The CLA logic functions implemented are

co3 = g3 + p3 g2+p3 p2 g1+p3 p2 p1 g0+p3 p2 p1 p0 cin co2= g2+ p2 g1+ p2 p1 g0+ p2 p1 p0 cin co1= g1+ p1 g0+ p1 p0 cin co0= g0+ p0 cin

cin c o3 4bit CLA 4bit CLA

Figure 1. Carry look-ahead architecture.

For the first 4-bit CLA group, the important path is from cin to co3. After one XOR delay, all the gi’s and pi’s are computed. In the most straightforward implementation, the worst-case path is then through one 5-input AND gate and a 5-input OR gate (other topologies are possible) which takes a total of: 2 2 2 tfirstCLA = 0.5*2 *tp + 0.25*5 *tp + 0.25*5 *tp = 14.5tp

For the second 4-bit CLA group, the important path is from cin to co2 to the final sum bit. Again using the most straightforward implementation, cin to co2 takes one 4-input AND gate and one 4- input OR gate. From co2 to sum goes through one XOR gate: 2 2 2 tsecondCLA = 0.25*4 *tp + 0.25*4 *tp + 0.5*2 *tp = 10tp

The total delay is then the sum of these two groups:

ttotal_CLA8 = tfirstCLA + tsecondCLA = 24.5tp

1C In the analysis of a 32-bit ripple carry adder, the same delays calculated for the 8-bit case can be reused: ttotal_ripple32 = tfirst + 30*tmiddle + tlast = 4tp + 30*2tp + 2tp = 66tp

For the 32-bit CLA using a hierarchical propagate/generate scheme, refer to the figure for the worst-case path through the entire adder.

Obtain pi,gi (2tp) Calculate the sum bit

2tp+2tp Obtain P,G block propagate generates Pass the carry back 8tp+8tp 2t + 2t p p

It takes one XOR delay to get the propagate/generate terms for each bit, which is 2tp. The propagate/generate terms in the second- and third-levels are computed as:

Pmiddle = Ptop0*Ptop1*Ptop2*Ptop3 Gmiddle = Gtop3 + Gtop2*Ptop3 + Gtop1*Ptop3*Ptop2 + Gtop0*Ptop3*Ptop2*Ptop1 So the worst-case path to make these P and G terms is one 4-input AND gate followed by a 4-

input OR gate, which totals 8tp for each level.

Traversing back up the tree, the groups can compute their carry-outs using the simple

co = G + P*ci so it only takes one 2-input AND gate and one 2-input OR gate, which is 2tp for each level.

Once the top level is reached, the final carry takes 2tp and the sum requires an additional 2-input XOR, adding another 2tp.

Totaling all the delays yields a critical path delay of

ttotal_CLA32 = 2tp + (8tp + 8tp) + (2tp + 2tp) + (2tp + 2tp) = 26tp

As the as the number of bits increases, the carry look ahead adder has a distinct advantage. However, for adders with less than 10-bits, it is usually faster to implement the simpler ripple carry architecture. 2 Comparator design

2A The simplest implementation calculates the (B-A). If the resulting is negative (ie. if the sign bit is one), then A is greater than B. If the sign bit is zero, then A <= B. The implementation would simply bitwise complement B and compute OUT = A + (~B) + 1. The output of the N-bit comparator would simply be the last bit of this result, ie. OUT[N-1].

A N OUT[N-1] B A>B OUT N N

1

Assuming the adder is a ripple carry adder, the worst case delay of the comparator would be of the same form as 1A:

4*tp+(N-2)*2tp +2*tp = (N+1)*2tp 3 Signed Multiplication

This solution implements Baugh-Wooley multiplier technique. As an illustrative example, extend the traditional binary notation to allow bits to be negative (similar to the carry-save technique). In this case, the failing example can be transformed to: Extended binary Decimal equivalent -1 1 0 0 (-8)+4 = (-4) x -1 0 0 1 (-8)+1 = (-7) -1 1 0 0 (-8)+4 = (-4) 0 0 0 0 (0) 0 0 0 0 (0) 1 -1 0 0 64+(-32)=32 0 1 -1 0 -1 1 0 0 64+(-32)+(-8)+4=28

So, this wacky method appears to give the correct answer (at least for this example). It is left as an exercise to prove that it works for all numbers.

Now, consider how to implement this multiplier. In the general case, the following partial products are required (parentheses signify the negative weighted bits).

(a3) a2 a1 a0

(b3) b2 b1 b0

(a3b0) a2b0 a1b0 a0b0

(a3b1) a2b1 a1b1 a0b1

(a3b2) a2b2 a1b2 a0b2

a3b3 (a2b3) (a1b3) (a0b3)

P7 p6 p5 p4 p3 p2 p1 p0

The bits in the partial products can be reordered to form more useful groups consisting of positive and negative terms:

(a3) a2 a1 a0

(b3) b2 b1 b0

a2b0 a1b0 a0b0

a2b1 a1b1 a0b1

a3b3 0 a2b2 a1b2 a0b2

(a3b2) (a3b1) (a3b0)

(a2b3) (a1b3) (a0b3)

P7 p6 p5 p4 p3 p2 p1 p0

We are trying to add the positive partial products (above dashed line) and subtract the negative partial products (those below the dashed line). The negatively weighted partial products can be interpreted as two numbers that can be negated, allowing the whole array to be added together. Note that blank entries in the table are implicit zeros. Consider the first negative row. Negation is tricky here because we only want to negate if a3 is one. If a3 is zero, we want the entire row to be zero. The logic function for each bit is then a3 bi . For the entire number [0 0 (a3b2) (a3b1) (a3b0)], the result would be the following sum:

a3 a3 a3 b2 a3 b1 a3 b0

a3 where the extra a3 terms sprinkled around are for sign-extension and for the +1 in the two’s complement conversion. As another simplification, the leading two positions can be interpreted is either as:

[0 0] a3  0 [a4 a3 ] = [a3 a3 ] =   [11] a3  1 or equivalently as:

[11] 1 a3  0 [0 0] a3  0 [1 a4 ] +1 = [1 a3 ] +1 =   =   [1 0] 1 a3  1 [11] a3  1

If the above methods are also use to negate the partial products starting with b3, the resulting the initial partial sum array would be:

(a3) a2 a1 a0

(b3) b2 b1 b0

a2b0 a1b0 a0b0

a2b1 a1b1 a0b1

a3b3 0 a2b2 a1b2 a0b2

1 a3 a3 b2 a3 b1 a3 b0 1

1 b3 a2b3 a1b3 a0b3 1 a3 b3

P7 P6 p5 p4 p3 p2 p1 p0

Merging the constant one’s together, yields the final partial sum array:

(a3) a2 a1 a0

(b3) b2 b1 b0

a2b0 a1b0 a0b0

a2b1 a1b1 a0b1

a3b3 0 a2b2 a1b2 a0b2

a3 a3 b2 a3 b1 a3 b0

b3 a2b3 a1b3 a0b3 1 a3 b3

P7 P6 p5 p4 p3 p2 p1 p0 We can implement this array in the following way. Using the block below as the processing element of the main array we can obtain the result p[7:0] sin a sin a b b cout cin cout FA cin sout

sout a1 a0 b3 a3 a2 b0

b1

b2

b3 a3 1 ~a3 1 1 ~b4

p7 p6 p5 p4 p3 p2 p1 p0