EECS150 - Digital Design Lecture 23 - Arithmetic Blocks, Counters, & Other stuff

April 14, 2009 John Wawrzynek

Spring 2009 EECS150 - Lec23-counters Page 1

Multiplication Review

a3 a2 a1 a0 Multiplicand

b3 b2 b1 b0 Multiplier

X a3b0 a2b0 a1b0 a0b0

a3b1 a2b1 a1b1 a0b1 Partial

a3b2 a2b2 a1b2 a0b2 products a3b3 a2b3 a1b3 a0b3

. . . a1b0+a0b1 a0b0 Product Many different circuits exist for . Each one has a different balance between speed (performance) and amount of logic (cost). Spring 2009 EECS150 - Lec23-counters Page 2 “Shift and Add” Multiplier • Sums each partial product, one at a time. • In binary, each partial product is shifted versions of A or 0.

Control Algorithm: 1. P ← 0, A ← multiplicand, B ← multiplier 2. If LSB of B==1 then add A to P else add 0 • Cost α n, Τ = n clock cycles. 3. Shift [P][B] right 1 • What is the critical path for determining the min clock 4. Repeat steps 2 and 3 n-1 times. period? 5. [P][B] has product.

Spring 2009 EECS150 - Lec23-counters Page 3

Bit-serial Multiplier • Bit-serial multiplier (n2 cycles, one bit of result per n cycles):

• Control Algorithm:

repeat n cycles { // outer (i) loop repeat n cycles{ // inner (j) loop shiftA, selectSum, shiftHI } Note: The occurrence of a control shiftB, shiftHI, shiftLOW, reset signal x means x=1. The absence } of x means x=0.

Spring 2009 EECS150 - Lec23-counters Page 4 Array Multiplier Single cycle multiply: Generates all n partial products simultaneously.

Each row: n-bit with AND gates

What is the critical path? Spring 2009 EECS150 - Lec23-counters Page 5

Carry-Save Addition • Speeding up multiplication is a • Example: sum three numbers,

matter of speeding up the 310 = 0011, 210 = 0010, 310 = 0011 summing of the partial products. • “Carry-save” addition can help. 310 0011 • Carry-save addition passes + 2 0010 (saves) the carries to the output, 10 carry-save add c 0100 = 4 rather than propagating them. 10 s 0001 = 110

carry-save add 310 0011

c 0010 = 210 carry-propagate add s 0110 = 610

1000 = 810 • In general, carry-save addition takes in 3 numbers and produces 2. • Whereas, carry-propagate takes 2 and produces 1. • With this technique, we can avoid carry propagation until final addition Spring 2009 EECS150 - Lec23-counters Page 6 Carry-save Circuits

• When adding sets of numbers, carry-save can be used on all but the final sum. • Standard adder (carry propagate) is used for final sum. • Carry-save is fast (no carry propagation) and cheap (same cost as ripple adder)

Spring 2009 EECS150 - Lec23-counters Page 7

Array Multiplier using Carry-save Addition

Fast carry- propagate adder

Spring 2009 EECS150 - Lec23-counters Page 8 Carry-save Addition CSA is associative and communitive. For example:

(((X0 + X1) + X2 ) + X3 ) = ((X0 + X1) +( X2 + X3 ))

• A balanced tree can be used to reduce the logic delay.

• This structure is the basis of the Wallace Tree Multiplier. • Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.

• Multiplier delay α log3/2N + log2N

Spring 2009 EECS150 - Lec23-counters Page 9

Constant Multiplication • Our discussion so far has assumed both the multiplicand (A) and the multiplier (B) can vary at runtime. • What if one of the two is a constant? Y = C * X • “Constant Coefficient” multiplication comes up often in signal processing and other hardware. Ex:

yi = αyi-1+ xi xi yi

where α is an application dependent constant that is hard-wired into the circuit.

• How do we build and array style (combinational) multiplier that takes advantage of the constancy of one of the operands? Spring 2009 EECS150 - Lec23-counters Page 10 Multiplication by a Constant • If the constant C in C*X is a power of 2, then the multiplication is simply a shift of X. • Ex: 4*X

• What about division?

• What about multiplication by non- powers of 2?

Spring 2009 EECS150 - Lec23-counters Page 11

Multiplication by a Constant • In general, a combination of fixed shifts and addition: – Ex: 6*X = 0110 * X = (22 + 21)*X

– Details:

Spring 2009 EECS150 - Lec23-counters Page 12 Multiplication by a Constant

• Another example: C = 2310 = 010111

• In general, the number of additions equals then number of 1’s in the constant minus one, C. • Using carry-save adders (for all but one of these) helps reduce the delay and cost, but the number of adders is still the number of 1’s in C minus 2.

• Is there a way to further reduce the number of adders (and thus the cost and delay)? Spring 2009 EECS150 - Lec23-counters Page 13

Multiplication using Subtraction • Subtraction is ~ the same cost and delay as addition.

• Consider C*X where C is the constant value 1510 = 01111. C*X requires 3 additions. • We can “recode” 15 from 01111 = (23 + 22 + 21 + 20 ) to 10001 = (24 - 20 ) where 1 means negative weight. • Therefore, 15*X can be implemented with only one .

Spring 2009 EECS150 - Lec23-counters Page 14 Canonic Signed Digit Representation • CSD represents numbers using 1, 1, & 0 with the least possible number of non-zero digits. – Strings of 2 or more non-zero digits are replaced. – Leads to a unique representation. • To form CSD representation might take 2 passes: – First pass: replace all occurrences of 2 or more 1’s: 01..10 by 10..10 – Second pass: same as a above, plus replace 0110 by 0010 • Examples:

011101 = 29 0010111 = 23 0110110 = 54 100101 = 32 - 4 + 1 0011001 1011010 0101001 = 32 - 8 - 1 1001010 = 64 - 8 - 2 • Can we further simplify the multiplier circuits?

Spring 2009 EECS150 - Lec23-counters Page 15

“Constant Coefficient Multiplication” (KCM)

Binary multiplier: Y = 231*X = (27 + 26 + 25 + 22 + 21+20)*X

• CSD helps, but the multipliers are limited to shifts followed by adds. – CSD multiplier: Y = 231*X = (28 - 25 + 23 - 20)*X

• How about shift/add/shift/add …? – KCM multiplier: Y = 231*X = 7*33*X = (23 - 20)*(25 + 20)*X

• No simple algorithm exists to determine the optimal KCM representation. • Most use exhaustive search method. Spring 2009 EECS150 - Lec23-counters Page 16 Counter Circuits

Spring 2009 EECS150 - Lec23-counters Page 17

Counters • Special sequential circuits (FSMs) that repeatedly sequence through a set of outputs. • Examples: – binary counter: 000, 001, 010, 011, 100, 101, 110, 111, 000, – gray code counter: 000, 010, 110, 100, 101, 111, 011, 001, 000, 010, 110, … – one-hot counter: 0001, 0010, 0100, 1000, 0001, 0010, … – BCD counter: 0000, 0001, 0010, …, 1001, 0000, 0001 – pseudo-random sequence generators: 10, 01, 00, 11, 10, 01, 00, ... • Moore machines with “ring” structure in State Transition Diagram: S0 S3 S1 Spring 2009 EECS150 - Lec23-counters S2 Page 18 What are they used? • Counters are commonly used in hardware designs because most (if not all) computations that we put into hardware include iteration (looping). Examples: – Shift-and-add multiplication scheme. – Bit serial communication circuits (must count one “words worth” of serial bits. • Other uses for counter: – Clock divider circuits 16MHz ÷64 – Systematic inspection of data-structures • Example: Network packet parser/filter control. • Counters simplify “controller” design by: – providing a specific number of cycles of action, – sometimes used with a decoder to generate a sequence of timed control signals. – Consider using a counter when many FSM states with few branches. Spring 2009 EECS150 - Lec23-counters Page 19

Controller using Counters

• Example, Bit-serial multiplier (n2 cycles, one bit of result per n cycles):

• Control Algorithm: repeat n cycles { // outer (i) loop repeat n cycles{ // inner (j) loop shiftA, selectSum, shiftHI } Note: The occurrence of a control shiftB, shiftHI, shiftLOW, reset signal x means x=1. The absence } of x means x=0.

Spring 2009 EECS150 - Lec23-counters Page 20 Controller using Counters

• State Transition Diagram: – Assume presence of two binary counters. An “i” counter for the outer loop and “j” counter for inner loop.

TC is asserted when the counter reaches it maximum count value. CE is “count enable”. The counter increments its value on the rising edge of the clock if CE is asserted.

Spring 2009 EECS150 - Lec23-counters Page 21

Controller using Counters

• Controller circuit • Outputs: implementation:

CEi = q2

CEj = q1

RSTi = q0

RSTj = q2

shiftA = q1

shiftB = q2

shiftLOW = q2

shiftHI = q1 + q2

reset = q2

selectSUM = q1

Spring 2009 EECS150 - Lec23-counters Page 22 How do we design counters?

• For binary counters (most common case) incrementer circuit would work: 1 + register

• In Verilog, a counter is specified as: x = x+1; – This does not imply an adder – An incrementer is simpler than an adder – And a counter is simpler yet.

• In general, the best way to understand counter design is to think of them as FSMs, and follow general procedure, however some special cases can be optimized.

Spring 2009 EECS150 - Lec23-counters Page 23

Synchronous Counters All outputs change with clock edge.

+ c b a c+ b+ a+ a = a’ • Binary Counter Design: + 0 0 0 0 0 1 b = a ⊕ b Start with 3-bit version and 0 0 1 0 1 0 0 1 0 0 1 1 cb generalize: 0 1 1 1 0 0 a 00 01 11 10 1 0 0 1 0 1 0 0 0 1 1 1 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 1 c+ = a’c + abc’ + b’c 1 1 1 0 0 0 = c(a’+b’) + c’(ab) = c(ab)’ + c’(ab) = c ⊕ ab

Spring 2009 EECS150 - Lec23-counters Page 24 Synchronous Counters

• How do we extend to n-bits? • Extrapolate c+: d+ = d ⊕ abc, e+ = e ⊕ abcd

• Has difficulty scaling (AND gate inputs grow with n)

TC

• CE is “count enable”, allows external control of counting, • TC is “terminal count”, is asserted on highest value, allows cascading, external sensing of occurrence of max value. Spring 2009 EECS150 - Lec23-counters Page 25

Synchronous Counters

TC

• How does this one scale? • Generation of TC signals very similar to  Delay grows α n generation of carry signals in adder. • “Parallel Prefix” circuit reduces delay:

log2n

log2n

Spring 2009 EECS150 - Lec23-counters Page 26 Up-Down Counter

c b a c+ b+ a+ 0 0 0 1 1 1 0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 0 1 0 1 0 0 0 1 1 1 0 1 1 0 0 1 1 0 1 0 1 1 1 1 1 1 0

Down-count

Spring 2009 EECS150 - Lec23-counters Page 27

Odd Counts

• Extra can • Alternative: be added to terminate count before max value is reached: • Example: count to 12

Spring 2009 EECS150 - Lec23-counters Page 28 Ring Counters

• “one-hot” counters • What are these good for? 0001, 0010, 0100, 1000, 0001, …

“Self-starting” version:

Spring 2009 EECS150 - Lec23-counters Page 29

Johnson Counter

Spring 2009 EECS150 - Lec23-counters Page 30 Asynchronous “Ripple” counters

A3 A2 A1 A0 0000 • Each stage is ÷2 of 0001 Forbidden in previous. 0010 Synchronous • Look at output 0011 Design waveforms: 0100 time CLK 0101 A 0110 0 A1 0111 A 1000 2 A 1001 3 1010 • Often called 1011 “asynchronous” 1100 counters. 1101 • A “T” flip-flop is a 1110 “toggle” flip-flop. 1111 Flips it state on cycles when T=1.

Spring 2009 EECS150 - Lec23-counters Page 31