Arithmetic Circuits & Multipliers

Arithmetic Circuits & Multipliers • Addition, subtraction • Performance issues -- ripple carry -- carry bypass -- carry skip -- carry lookahead • Multipliers Reminder: Lab #3 due tonight! Pizza Wed 6p 6.111 Fall 2017 Lecture 8 1 Signed integers: 2’s complement N bits -2N-1 2N-2 … … … 23 22 21 20 Range: – 2N-1 to 2N-1 –1 “sign bit” “decimal” point 8-bit 2’s complement example: 11010110 = –27 + 26 + 24 + 22 + 21 = – 128 + 64 + 16 + 4 + 2 = – 42 If we use a two’s complement representation for signed integers, the same binary addition mod 2n procedure will work for adding positive and negative numbers (don’t need separate subtraction rules). The same procedure will also handle unsigned numbers! By moving the implicit location of “decimal” point, we can represent fractions too: 1101.0110 = –23 + 22 + 20 + 2-2 + 2-3 = – 8 + 4 + 1 + 0.25 + 0.125 = – 2.625 6.111 Fall 2017 Lecture 8 2 Sign extension Consider the 8-bit 2’s complement representation of: 42 = 00101010 -5 = ~00000101 + 1 = 11111010 + 1 = 11111011 What is their 16-bit 2’s complement representation? 4242 == 00000000________00101010000000000010101000101010 -5-5 == ________11111011________111110111111111111111011 Extend the MSB (aka the “sign bit”) into the higher-order bit positions 6.111 Fall 2017 Lecture 8 3 Adder: a circuit that does addition Here’s an example of binary addition as one might do it by “hand”: Carries from previous 1 1 0 1 column Adding two N-bit 1101 numbers produces an + 0101 (N+1)-bit result 10010 If we build a circuit that implements one column: we can quickly build a circuit to add two 4-bit numbers… “Ripple- carry adder” 6.111 Fall 2017 Lecture 8 4 “Full Adder” building block ABCSCO 00000 00110 01010 The “half adder” circuit has only the 01101 A and B inputs 10010 10101 11001 11111 S A B C CO ABC ABC ABC ABC (A A)BC (B B)AC AB(C C) BC AC AB 6.111 Fall 2017 Lecture 8 5 Subtraction: A-B = A + (-B) Using 2’s complement representation: –B = ~B + 1 ~ = bit-wise complement So let’s build an arithmetic unit that does both addition and subtraction. Operation selected by control input: But what about the “+1”? 6.111 Fall 2017 Lecture 8 6 Condition Codes Besides the sum, one often wants four other bits of information from an arithmetic unit: To compare A and B, perform A–B and use Z (zero): result is = 0 big NOR gate condition codes: Signed comparison: N (negative): result is < 0 S N-1 LT NV C (carry): indicates an add in the most LE Z+(N V) significant position produced a carry, e.g., EQ Z 1111 + 0001 from last FA NE ~Z GE ~(NV) V (overflow): indicates that the answer GT ~(Z+(NV)) has too many bits to be represented correctly by the result width, e.g., Unsigned comparison: 0111 + 0111 LTU C LEU C+Z V A B S A B S GEU ~C N1 N1 N1 N1 N1 N1 GTU ~(C+Z) V COUT CIN N1 N1 6.111 Fall 2017 Lecture 8 7 Condition Codes in Verilog Z (zero): result is = 0 N (negative): result is < 0 C (carry): indicates an add wire signed [31:0] a,b,s; in the most significant wire z,n,v,c; assign {c,s} = a + b; position produced a carry, assign z = ~|s; e.g., 1111 + 0001 assign n = s[31]; assign v = a[31]^b[31]^s[31]^c; V (overflow): indicates that the answer has too many Might be better to use sum-of- bits to be represented products formula for V from previous correctly by the result slide if using LUT implementation width, e.g., 0111 + 0111 (only 3 variables instead of 4). 6.111 Fall 2017 Lecture 8 8 Modular Arithmetic The Verilog arithmetic operators (+,-,*) all produce full-precision results, e.g., adding two 8-bit numbers produces a 9-bit result. In many designs one chooses a “word size” (many computers use 32 or 64 bits) and all arithmetic results are truncated to that number of bits, i.e., arithmetic is performed modulo 2word size. Using a fixed word size can lead to overflow, e.g., when the operation produces a result that’s too large to fit in the word size. One can •Avoid overflow: choose a sufficiently large word size •Detect overflow: have the hardware remember if an operation produced an overflow – trap or check status at end •Embrace overflow: sometimes this is exactly what you want, e.g., when doing index arithmetic for circular buffers of size 2N. •“Correct” overflow: replace result with most positive or most negative number as appropriate, aka saturating arithmetic. Good for digital signal processing. 6.111 Fall 2017 Lecture 8 9 Speed: tPD of Ripple-carry Adder CO = AB + ACI + BCI Worst-case path: carry propagation from LSB to MSB, (N) is read e.g., when adding 11…111 to 00…001. “order N” : means that the latency of our tPD = (N-1)*(tPD,OR + tPD,AND) + tPD,XOR (N) adder grows at worst in proportion to CI to CO CIN-1 to SN-1 the number of bits in the operands. 6.111 Fall 2017 Lecture 8 10 How about the tPD of this circuit? Cn-1 Cn-2 C2 C1 C0 Is the tPD of this circuit = 2 * tPD,N-BIT RIPPLE ? Nope! tPD of this circuit = tPD,N-BIT RIPPLE + tPD,FA !!! Timing analysis is tricky! 6.111 Fall 2017 Lecture 8 11 Alternate Adder Logic Formulation How to Speed up the Critical (Carry) Path? (How to Build a Fast Adder?) A B Full Cin C Adder o S Generate (G) = AB Propagate (P) = A B Note: can also use P = A + B for Co Faster carry logic Let’s see if we can improve the speed by rewriting the equations for COUT: C = AB + AC + BC AB OUT IN IN = AB + (A + B)CIN where G = AB COUT CIN = G + P CIN P = A + B generate propagate S Actually, P is usually module fa(input a,b,cin, output s,cout); wire g = a & b; defined as P = A^B wire p = a ^ b; which won’t change assign s = p ^ cin; COUT but will allow us assign cout = g | (p & cin); to express S as a endmodule simple function : S = P^CIN 6.111 Fall 2017 Lecture 8 13 Virtex II Adder Implementation Cout LUT: AB B A Y = A B Cin P G Dedicated adder logic 1 half-Slice = 1-bit adder Cin 6.111 Fall 2017 Lecture 8 14 Virtex II Carry Chain 1 CLB = 4 Slices = 2, 4-bit adders 64-bit Adder: 16 CLBs A[63:0] + Y[63:0] B[63:0] Y[64] A[63:60] CLB15 B[63:60] Y[63:60] A[7:4] CLB1 B[7:4] Y[7:4] A[3:0] CLB0 B[3:0] Y[3:0] CLBs must be in same column 6.111 Fall 2017 Lecture 8 15 Carry Bypass Adder A0 B0 A1 B1 A2 B2 A3 B3 P,G P,G P,G P,G Can compute P, G P0 G0 P1 G1 P2 G2 P3 G3 in parallel for all bits C/S C/S C/S Ci,0 C/S Co,0 Co,1 Co,2 Co,3 BP= P0P1P2P3 P,G P,G P,G P,G P0 G0 P1 G1 P2 G2 P3 G3 Ci,0 0 C/S C/S C/S C/S Co,3 Co,0 Co,1 Co,2 1 Key Idea: if (P0 P1 P2 P3) then Co,3 = Ci,0 6.111 Fall 2017 Lecture 8 16 16-bit Carry Bypass Adder BP= P0P1P2P3 BP= P4P5P6P7 BP= P8P9P10P11 BP= P12P13P14P15 P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G Ci,0 Co,3 C C C/S C/S C/S C/S 0 o,7 o,11 C/S C/S C/S C/S 0 C/S C/S C/S C/S 0 C/S C/S C/S C/S 0 Co,0 Co,1 C C C o,2 1 o,4 o,5 Co,6 C C C C C C 1 o,8 o,9 o,10 1 o,12 o,13 o,14 1 Co,15 What is the worst case propagation delay for the 16-bit adder? Assume the following for delay each gate: P, G from A, B: 1 delay unit P, G, Ci to Co or Sum for a C/S: 1 delay unit 2:1 mux delay: 1 delay unit 6.111 Fall 2017 Lecture 8 17 Critical Path Analysis BP= P P P P 0 1 2 3 BP2= P4P5P6P7 BP3= P8P9P10P11 BP4= P12P13P14P15 P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G P,G Ci,0 Co,3 C C C/S C/S C/S C/S 0 o,7 o,11 C/S C/S C/S C/S 0 C/S C/S C/S C/S 0 C/S C/S C/S C/S 0 Co,0 Co,1 C C C o,2 1 o,4 o,5 Co,6 C C C C C C 1 o,8 o,9 o,10 1 o,12 o,13 o,14 1 Co,15 For the second stage, is the critical path: BP2 = 0 or BP2 = 1 ? Message: Timing analysis is very tricky – Must carefully consider data dependencies for false paths 6.111 Fall 2017 Lecture 8 18 Carry Bypass vs Ripple Carry Ripple Carry: tadder = (N-1) tcarry + tsum Carry Bypass: tadder = 2(M-1) tcarry + tsum + (N/M-1) tbypass This image cannot currently Thisbe imagedisplayed.

Load more