Arithmetic and Logic Unit (ALU)
Designing an Adder
Traditional circuit design: truth table approach
Cost/speed tradeoff:
• n-bit adder: truth table with 2n inputs and $2^{2n}$ rows → fast circuit (theoretically), but very costly
• n 1-bit adders: cheap but slow
• Tradeoff: n 1-bit adders and additional circuitry to speed up computation
1-bit Adder
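Full adder (inputs x, y and carry-in r; outputs sum S and carry-out R):

r x y   S R
0 0 0   0 0
0 0 1   1 0
0 1 0   1 0
0 1 1   0 1
1 0 0   1 0
1 0 1   0 1
1 1 0   0 1
1 1 1   1 1

$R = x \cdot y + r \cdot (x + y) = x \cdot y + r \cdot (x \oplus y)$
$S = r' \cdot (x' \cdot y + x \cdot y') + r \cdot (x \cdot y + x' \cdot y') = r \oplus x \oplus y$

Exclusive OR:

x y   x ⊕ y
0 0   0
0 1   1
1 0   1
1 1   0

[Figure: half adder and full adder circuits; inputs x, y, r; outputs S, R]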
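The two equations above can be checked exhaustively against the arithmetic meaning of a full adder. The following is a minimal sketch, not the slide author's code; the function names are illustrative, and the construction from two half adders plus an OR gate is the usual textbook one, assumed here rather than read from the figure.

```python
from itertools import product

def half_adder(x, y):
    """Return (sum, carry) for two input bits."""
    return x ^ y, x & y

def full_adder(r, x, y):
    """Return (S, R) using S = r XOR x XOR y and R = x.y + r.(x XOR y)."""
    s = r ^ x ^ y
    carry = (x & y) | (r & (x ^ y))
    return s, carry

def full_adder_from_half_adders(r, x, y):
    """Same function built from two half adders and an OR gate (assumed layout)."""
    s1, c1 = half_adder(x, y)
    s, c2 = half_adder(s1, r)
    return s, c1 | c2

for r, x, y in product((0, 1), repeat=3):
    s, carry = full_adder(r, x, y)
    # Arithmetic definition: r + x + y = 2*R + S
    assert 2 * carry + s == r + x + y
    assert (s, carry) == full_adder_from_half_adders(r, x, y)
    # The OR form and the XOR form of the carry equation agree
    assert carry == (x & y) | (r & (x | y))
```

The loop covers all eight rows of the truth table, so it also confirms that the two forms of R are interchangeable.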
n-bit Addition/Subtraction
[Figure: 4-bit adder made of four chained 1-bit adders (+). Each y_i is combined with the control signal c before entering its adder; inputs x3..x0, carries r3..r0, outputs S3..S0]
c=0: addition
c=1: subtraction
Subtraction:
• X-Y = X + (-Y) = X + Y' + 1
• c = 1 → no additional adder is needed for the +1
Bottleneck: carry propagation

        r3 r2 r1 r0
X          x3 x2 x1 x0
Y        + y3 y2 y1 y0
Z       r3 s3 s2 s1 s0
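A small software model makes the role of the control signal c concrete. This sketch assumes the usual wiring, which the flattened figure suggests but does not spell out: each y_i is XORed with c, and c is also the carry-in of bit 0, so that c=1 computes X + Y' + 1. The helper names are illustrative.

```python
def full_adder(r, x, y):
    """Return (S, R) for carry-in r and bits x, y."""
    return r ^ x ^ y, (x & y) | (r & (x ^ y))

def add_sub(x_bits, y_bits, c):
    """Chain 1-bit adders. Bit lists are least significant first.

    c = 0 adds, c = 1 subtracts. Returns (sum_bits, carry_out).
    """
    r = c                      # assumption: c feeds the carry-in of bit 0
    s_bits = []
    for x, y in zip(x_bits, y_bits):
        s, r = full_adder(r, x, y ^ c)   # y ^ c complements y when c = 1
        s_bits.append(s)
    return s_bits, r

def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

diff_bits, _ = add_sub(to_bits(6, 4), to_bits(5, 4), 1)
print(from_bits(diff_bits))   # 6 - 5 on 4 bits
```

Each iteration needs the carry produced by the previous one, which is the carry-propagation bottleneck the slide points at.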
A Simple 1-bit ALU
[Figure: 1-bit ALU; inputs x, y and carry r, control inputs c1 c0, an adder (+) and a MUX producing the result, carry-out R]

Multiplexor (MUX): select among several inputs

c1 c0   m
0  0    m0
0  1    m1
1  0    m2
1  1    m3

ALU operations:

c1 c0   ALU
0  0    ADD
0  1    AND
1  0    NOT
1  1    d

Elementary operations:
• ADD, AND, NOT
ALU: Arithmetic and Logic Unit
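The selection tables above can be modelled as a 4-to-1 multiplexor whose inputs are the outputs of the elementary operations. This is a sketch under two assumptions that the slide does not state: NOT is applied to x, and the select code 11 (shown only as "d") is tied to 0 as a placeholder.

```python
def mux4(c1, c0, m0, m1, m2, m3):
    """4-to-1 multiplexor: return the input selected by (c1, c0)."""
    return (m0, m1, m2, m3)[2 * c1 + c0]

def alu_1bit(c1, c0, x, y, r):
    """1-bit ALU slice. Returns (result bit, carry-out R)."""
    add_out = r ^ x ^ y
    carry = (x & y) | (r & (x ^ y))
    and_out = x & y
    not_out = 1 - x      # assumption: NOT operates on x
    unspecified = 0      # assumption: code 11 ("d") is not defined on the slide
    result = mux4(c1, c0, add_out, and_out, not_out, unspecified)
    return result, carry

print(alu_1bit(0, 0, 1, 1, 0))   # ADD: 1 + 1 with no carry-in
print(alu_1bit(0, 1, 1, 1, 0))   # AND
print(alu_1bit(1, 0, 1, 0, 0))   # NOT x
```

All operations are computed in parallel and the multiplexor only picks one, which mirrors how the hardware works: the control bits never switch a unit off, they select an output.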
n-bit ALU
n 1-bit ALUs
[Figure: n-bit ALU; n-bit inputs X and Y, n-bit output Z, control inputs c1 c0]
Embryo of instruction set:

c1 c0   ALU
0  0    ADD
0  1    AND
1  0    NOT
1  1    d
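Chaining n 1-bit slices gives a word-level ALU whose control bits behave like a tiny opcode. The sketch below assumes the same conventions as a typical slice (NOT applied to X, code 11 left at 0 because the slide only labels it "d") and a carry-in of 0 for bit 0; the `OPCODES` dictionary is an illustrative name, not part of the slides.

```python
def alu_1bit(c1, c0, x, y, r):
    """1-bit slice: returns (result bit, carry-out)."""
    add_out = r ^ x ^ y
    carry = (x & y) | (r & (x ^ y))
    outputs = (add_out, x & y, 1 - x, 0)   # ADD, AND, NOT x, unspecified "d"
    return outputs[2 * c1 + c0], carry

def alu(c1, c0, x, y, n=8):
    """n-bit ALU built from n slices. x and y are integers in [0, 2**n)."""
    z, r = 0, 0                  # assumption: carry-in of bit 0 is 0
    for i in range(n):
        zi, r = alu_1bit(c1, c0, (x >> i) & 1, (y >> i) & 1, r)
        z |= zi << i
    return z

# Control codes taken from the table: c1 c0 -> operation
OPCODES = {"ADD": (0, 0), "AND": (0, 1), "NOT": (1, 0)}

print(alu(*OPCODES["ADD"], 100, 55))
print(bin(alu(*OPCODES["AND"], 0b1100, 0b1010)))
print(bin(alu(*OPCODES["NOT"], 0b00001111, 0)))
```

The ADD result wraps modulo 2**n because the final carry-out is simply dropped in this model.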
Overflow
4-bit two's-complement examples (carries on the top row):

          1 0 0
    6     0 1 1 0
+   5   + 0 1 0 1
   -5     1 0 1 1

        1 0 0 0
   -4     1 1 0 0
+  -7   + 1 0 0 1
    5   1 0 1 0 1

How can overflow be detected in two's complement?
Overflow
Overflow signal $S_{overflow}$ (1 = overflow)

z_{n-1} x_{n-1} y_{n-1}   S_overflow
0       0       0         0
0       0       1         0
0       1       0         0
0       1       1         1
1       0       0         1
1       0       1         0
1       1       0         0
1       1       1         0

Overflow in two's complement:
• x ≤ 0, y ≥ 0 → no overflow possible
• x ≥ 0, y ≥ 0:
  Overflow: Z = X + Y (two's complement)
  $Z > 2^{n-1} - 1$, and $0 \le X \le 2^{n-1} - 1$, $0 \le Y \le 2^{n-1} - 1$
  ⇒ $2^{n-1} - 1 < Z \le 2^n - 2$
  ⇒ In two's complement, negative numbers are coded in $[2^{n-1}; 2^n - 1]$
  ⇒ Z is negative
• x ≤ 0, y ≤ 0: same; in case of overflow, Z is positive
→ overflow detection criterion:
$S_{overflow} = x_{n-1} \cdot y_{n-1} \cdot z'_{n-1} + x'_{n-1} \cdot y'_{n-1} \cdot z_{n-1}$
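The sign-bit criterion can be tried on small words. The sketch below is an illustration with made-up function names: it adds two n-bit two's-complement values, keeps only n bits of the result, and evaluates the overflow signal from the three sign bits exactly as in the formula above.

```python
def sign_bit(value, n):
    """Most significant bit of the n-bit two's-complement encoding of value."""
    return ((value & ((1 << n) - 1)) >> (n - 1)) & 1

def add_with_overflow(x, y, n=4):
    """Return (z, s_overflow), where z is the n-bit two's-complement sum."""
    mask = (1 << n) - 1
    z_raw = (x + y) & mask                 # keep n bits, drop the carry-out
    xs, ys, zs = sign_bit(x, n), sign_bit(y, n), z_raw >> (n - 1)
    # S_overflow = x.y.z' + x'.y'.z on the sign bits
    s_overflow = (xs & ys & (1 - zs)) | ((1 - xs) & (1 - ys) & zs)
    z = z_raw - (1 << n) if zs else z_raw  # decode back to a signed integer
    return z, s_overflow

print(add_with_overflow(6, 5))     # two positives, negative result
print(add_with_overflow(-4, -7))   # two negatives, positive result
print(add_with_overflow(-4, 6))    # opposite signs, no overflow possible
```

On 4 bits the representable range is -8 to 7, so 6 + 5 and -4 + -7 both fall outside it, while any sum of operands with opposite signs stays inside.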
Overflow
Action upon overflow? Several solutions:
• Stop program (TRAP)
  Example: MIPS R3000
• Raise a signaling bit such as $S_{overflow}$
  Example: Intel x86, Sun SPARC
• Do nothing
  Example: using C on a 32-bit Sun SPARC, range -2147483648 ($-2^{31}$) → 2147483647 ($2^{31} - 1$)

    01110111001101011001010000000000
  + 01110111001101011001010000000000
  ----------------------------------
    11101110011010110010100000000000
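The binary addition above can be decoded to see what "do nothing" produces. This sketch emulates 32-bit two's-complement wraparound in Python, whose integers do not overflow by themselves; `wrap_int32` is an illustrative helper, not a library function.

```python
def wrap_int32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

a = 0b01110111001101011001010000000000   # operand from the slide
raw_sum = (a + a) & 0xFFFFFFFF           # 32-bit result, carry-out dropped

print(a)                        # decimal value of the operand
print(format(raw_sum, "032b"))  # bit pattern of the sum
print(wrap_int32(a + a))        # the same bits read as a signed integer
```

The operand is 2,000,000,000, so the true sum of 4,000,000,000 exceeds $2^{31} - 1$ and the stored bit pattern reads as a negative number.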
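Speeding Up Carry Propagation
Chained 1-bit adders: $t_{r_7} = 16$, $t_{r_3} = 8$
[Figure: eight chained 1-bit adders (+) with carry-in c]
CLA (Carry Look-Ahead): $t_{r_7} = 6$, $t_{r_3} = 4$
[Figure: CLA blocks taking p7g7 ... p0g0 and c, producing carries r3 r2 r1 r0]
Carry generation (g) and carry propagation (p):
$r_0 = (x_0 \cdot y_0) + c \cdot (x_0 \oplus y_0) = g_0 + c \cdot p_0$
$r_1 = g_1 + r_0 \cdot p_1 = (g_1 + g_0 \cdot p_1) + c \cdot (p_0 \cdot p_1) = G_1 + c \cdot P_1$
$r_2 = G_2 + c \cdot P_2$
$r_3 = G_3 + c \cdot P_3$
Timing:
• x, y: t = 0
• p, g: t = 2
• $r_{4k-1}$ (carry into block k): $t = t_{r_{4k-1}}$
• $r_{4k}, r_{4k+1}, r_{4k+2}, r_{4k+3}$: $t = \max(t_{r_{4k-1}} + 2, 4)$
• S: $t = t_r + 2$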
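The look-ahead equations and the timing figures can both be reproduced in a few lines. This is a functional sketch only: the loop builds the group terms G_i and P_i step by step, whereas hardware would expand them into two-level logic. The delay model is an interpretation of the slide's annotations (2 time units per chained adder, and max(t + 2, 4) per 4-bit CLA block), so treat it as an assumption.

```python
def cla4(x_bits, y_bits, c):
    """4-bit carry look-ahead block. Bit lists are least significant first.

    Returns (s_bits, r_bits) with r_i = G_i + c.P_i.
    """
    p = [x ^ y for x, y in zip(x_bits, y_bits)]
    g = [x & y for x, y in zip(x_bits, y_bits)]
    G, P = [g[0]], [p[0]]
    for i in range(1, 4):
        G.append(g[i] | (G[i - 1] & p[i]))   # G_i = g_i + G_{i-1}.p_i
        P.append(P[i - 1] & p[i])            # P_i = p_0.p_1...p_i
    r = [G[i] | (c & P[i]) for i in range(4)]
    s = [p[i] ^ (c if i == 0 else r[i - 1]) for i in range(4)]
    return s, r

def chained_delay(n_bits):
    """Assumed model: each chained 1-bit adder adds 2 time units."""
    return 2 * n_bits

def cla_delay(n_bits):
    """Assumed model: each 4-bit block yields carries at max(t_in + 2, 4)."""
    t = 0                                    # carry-in c is available at t = 0
    for _ in range(n_bits // 4):
        t = max(t + 2, 4)
    return t

print(chained_delay(4), chained_delay(8))    # compare with t_r3, t_r7 chained
print(cla_delay(4), cla_delay(8))            # compare with t_r3, t_r7 for CLA
```

Every carry in a block depends only on c and on the p and g signals, which is why all four become available at the same time.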
Speeding Up Carry Propagation
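n = number of bits
• CLA: O(n)
• Tree structure: $O(\log_2 n)$
$p_i = x_i \oplus y_i$
$g_i = x_i \cdot y_i$
$s_i = p_i \oplus r_{i-1}$
$r_0 = g_0 + p_0 \cdot c$
$P_{2..1} = p_2 \cdot p_1$
$G_{2..1} = g_2 + p_2 \cdot g_1$
Tree operators:
• $(a_2, b_2), (a_1, b_1) \to (a_1 \cdot a_2,\ b_2 + b_1 \cdot a_2)$
• $(a, b), r \to b + a \cdot r$
[Figure: 4-bit tree adder; inputs x3y3 ... x0y0 and c, signals p3g3 ... p0g0, carries r0..r3, sums s3..s0]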
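The pair operator above is associative, which is what allows the carries to be computed in a tree. The sketch below applies it with a simple recursive-doubling schedule (in the style of a Kogge-Stone network). That schedule is a stand-in chosen for brevity and is not necessarily the tree drawn on the slide; the function names are illustrative.

```python
def pg_op(hi, lo):
    """Combine (a2, b2) from the more significant group with (a1, b1).

    Returns (a1.a2, b2 + b1.a2), where a is propagate and b is generate.
    """
    a2, b2 = hi
    a1, b1 = lo
    return a1 & a2, b2 | (b1 & a2)

def prefix_add(x, y, c, n):
    """Add two n-bit integers with a parallel-prefix carry computation.

    Returns (sum on n bits, carry-out, number of operator levels used).
    """
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]
    pairs = list(zip(p, g))          # pairs[i] covers bit i only
    d, levels = 1, 0
    while d < n:
        # After this level, pairs[i] covers bits i down to max(0, i - 2d + 1)
        pairs = [pg_op(pairs[i], pairs[i - d]) if i >= d else pairs[i]
                 for i in range(n)]
        d *= 2
        levels += 1
    r = [b | (a & c) for a, b in pairs]      # r_i = G_{i..0} + P_{i..0}.c
    carries_in = [c] + r[:-1]
    s = sum((p[i] ^ carries_in[i]) << i for i in range(n))
    return s, r[-1], levels

print(prefix_add(40000, 30000, 0, 16))
```

With this schedule the number of operator levels is log2(n) for a power-of-two width, at the price of more operators per level than sparser trees.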
Speeding Up Carry Propagation
[Figure: 16-bit Brent-Kung tree, bit positions 15..0 and carry-in c]
n=16, Brent-Kung.
• CLA: 5 steps (pg and 4 CLA) to propagate carry
• Tree: 7 steps (pg and 6 PG operators)
• Starting with 32 bits, tree (8) faster than CLA (9)
Speeding Up Carry Propagation
[Figure: 16-bit Han-Carlson tree, bit positions 15..0 and carry-in c]
n=16, Han-Carlson.
Cost/performance tradeoff
Itanium:
• 64 bits
• 0.18 µm
• 482 ps for one addition