10/19/2016

Technique #5 : Logical Effort  The optimum effective fan-out for a chain of N inverters driving a load C is N L f  CL / Cin  Set N such that the fan-out per stage is around 4, whenever possible (FO4)  Can we generalize this approach (logical effort) to any gate?  The inverter equation is Lecture 11, 12: Introduction to Dynamic Logic tp = tp0 (1 + Cext/ Cg) = tp0 (1 + f/) we can generalize it to…

Reading: Chap. 9, section 9.2 (esp. 9.2.4) October 17, 19 2016 tp = tp0 (p + g f/)

Weste & Harris Prof. R. Iris Bahar  tp0 is the intrinsic delay of an inverter

 f is the effective fan-out (Cext/Cg) – also called the electrical effort  p is the ratio of the intrinsic delay of the gate relative to a simple inverter (a function of the gate topology and layout style): parasitic delay  g is the logical effort © 2016 R.I. Bahar Portions of these slides taken from Professors J. Rabaey, J. Irwin, V. Narayanan, and S. Reda

Path delay (equation derivation) Path delay of logic gate network (cont.)

 For gate i in the chain, its size is determined by h = figibi  The path logical effort, G =  gi [1] 1 c  Path effective fanout (path electrical effort) is F = CL/Cg1 a b C 5  The branching effort accounts for fan-out to other gates in L the network: b = (Con-path + Coff-path)/Con-path  For this network what do we need to compute h?  The path branching effort is then B =  b i  F = CL/Cg1 = 5 …and the total path effort is then H = GFB [1]  G = 1 x 5/3 x 5/3 x 1 = 25/9 N  B = 1 (no branching)  The gate effort that minimizes path delay is h  H 4  So, the minimum delay through the path is  H = GFB = 125/9, so the optimal stage effort is h= H = 1.93  Fanout factors are computed as fi=h/(gi·bi). Since bi=1 we have: N N  N H  f = h/g =1.93, f = 1.93 / (5/3) = 1.16, D  t  p   [1] Note the textbook 1 1 2 p0  j  swaps the definitions j1 γ f = 1.93/(5/3)=1.16, f = 1.93   of F,H and fi, h 3 4

1 10/19/2016

Path delay of logic gate network (cont.) Path delay of logic gate network (cont.)  g  Given f for each gate i in the chain, what is the final sizing? So what are the actually scaling sizes of the gates? c c i i g,i g,i 1   h 1 c a b c 1 scal e g,i 1 c 5  a b CL i  h 5 CL  Consider again the intrinsic capacitance values we calculated  f =1.93, f =1.16, f = 1.16, f = 1.93 1 2 3 4  cg,4=5/1.93=2.59, cg,3=2.59/1.16=2.23, cg,2=2.23/1.16=1.93, c =1.93/1.93=1.00 relative to a minimum sized inverter  So the gate sizes are (working from outputs to inputs): g,1  We need to adjust to the gate type:  c: f =c /c = c /c = 1.93  c =5/1.93=2.59 4 ext,4 g,4 L g,4 g,4  cg,4 = 2.59  gate c is 2.59X size of minimum sized inverter, so S =2.59 since gate c is an inverter as well.  b: f3=cext,3/cg,3 = cg,4/cg,3 = 1.16  cg,3=2.59/1.16=2.23 c  c = 2.23  gate b is 2.23X size of minimum sized NOR, which is  a: f =c /c = c /c = 1.16  c =2.23/1.16=1.93 g,3 2 ext,2 g,2 g,3 g,2 g,2 5/3 as big as min sized INV so Sb = 2.23 X 3/5 = 1.34  NOR is 1.34X size of minimum sized NOR  g1: f1=cext,1/cg,1 = cg,2/cg,1 = 1.93  cg,1=1.93/1.93=1.00  cg,2 = 1.93  gate a is 1.93X size of minimum sized NAND3, which is 5/3 as big as min sized NAND so Sa = 1.93X 3/5 = 1.16  NAND is 1.16X size of minimum sized NAND3

Dynamic gate Properties of dynamic gates

off  Logic function is implemented by the PDN only CLK M CLK M p p on 1  How many are needed to implement an N-input Out Out function in dynamic logic? !((A&B)|C)  In1 CL What does this imply about total area and load capacitance, CL? A In2 PDN  Full swing outputs (VOL = GND and VOH = VDD) C In  Faster switching speeds 3 B  Fewer transistors means reduced logical effort. CLK Me off What is the logical effort of a 2-input dynamic NOR gate? CLK Me on  no short circuit current, I , so all the current provided by PDN goes Two phase operation sc into discharging CL Precharge (CLK = 0)  We assume tpLH = 0, but the presence of the evaluation Evaluate (CLK = 1) slows down the tpHL (extra transistor in series).

2 10/19/2016

Conditions on output Dynamic behavior

CLK Evaluate  Once the output of a dynamic gate is discharged, it Out 2.5 cannot be charged again until the next precharge In1 operation. In2  What does this imply about the inputs to the gate 1.5 transitioning during evaluation? In3 In & 0.5 CLK Out Precharge In4  Output can be in the high impedance state during the evaluation stage if the PDN is turned off CLK -0.5 00.51Time, ns State is stored on CL

Input/Output Coupling: Bootstrapping Susceptibility to Noise

 The amount by which the output voltage drops is a strong 3

2.5  What’s with the overshoot and function of the input voltage and the available evaluation time. 2 undershoot?  Noise needed to corrupt the signal has to be larger if the evaluation time 1.5  Notice input changes with steep is short 1 voltage step 0.5  Takes a while for output to react CLK 0 2.5 0 100 200 300 400 to change in input value -0.5  Gate-drain capacitances of the Vout (VG=0.45) transistors couple input directly to 1.5 output in the short time before Vout (VG=0.55) 2.5 transistors have a chance to Vout (VG=0.5) switch (V) Voltage 0.5 1.5 VG  Extra charge must be supplied to discharge the bootstrap -0.5 0.5 capacitor 0 20 40 60 80 100  See section 4.4.6.6 -0.5 Time (ns) 00.51

3 10/19/2016

Power and energy Power dissipation of dynamic gate

 Power dissipation: how much energy is consumed per CLK Mp operation and how much heat the circuit dissipates Out

 Two important components: static and dynamic In1 CL

In2 PDN In 2 3 E (joules) = CL Vdd P01 + tsc Vdd Ipeak P01 + Vdd Ileakage CLK Me

f01 = P01 * fclock Power only dissipated when previous Out = 0 2 P(watts) = CL Vdd f01 + tscVdd Ipeak f01 + Vdd Ileakage 2 Probability the output E = CL VDD P01 transitions from 0 to 1

Dynamic power is data dependent Properties of dynamic gates (cont.) Dynamic 2-input NOR Gate  Power dissipation should be better Assume signal probabilities  no short circuit power consumption since the pull-up path is not ABOut on when evaluating PA=1 = 1/2 00 1 PB=1 = 1/2  lower CL 01 0  by construction can have at most one transition per cycle Then transition probability 10 0  no glitching P01 = Pout=0 × Pout=1 11 0  But power dissipation can be significantly higher due to = 3/4 × 1 = 3/4  higher transition probabilities  extra load on CLK Switching activity can be higher in dynamic gates!  Needs a precharge clock P01 = Pout=0

4 10/19/2016

Cascading dynamic gates Domino Logic V CLK CLK M M CLK p p CLK M Out2 CLK Mp p Out1 1  1 Out1 Out2 In 1  0 In 0  0 In1 0  1

CLK CLK VTn In PDN In4 PDN Me Me Out1 2 In In3 5 V CLK M Out2 CLK Me e

t Only a single 0  1 transition allowed at the inputs during the evaluation period!

Why Domino? NP-CMOS (Zipper)

!CLK M CLK Mp e 1  1 Out1 CLK 1  0 In PUN In1 4 In In1 In2 PDN 5 In PDN In PDN In PDN In PDN In 0  0 Out2 i i i i 3 0  1 Inj Inj Inj Inj (to PDN) CLK M !CLK Mp CLK e

to other to other PDN’s PUN’s Only 0  1 transitions allowed at inputs of PDN Like falling dominos! Only 1  0 transitions allowed at inputs of PUN

5 10/19/2016

Dynamic Zero Detector Dynamic Manchester Carry Chain

In7 In6 In5 In4 In3 In2 In1 In0 CLK not zero P0 P1 P2 P3

Ci,4

CLK Ci,0 G0 G1 G2 G3

CLK

 Efficient use of a dynamic gate.

 Wide NOR gate that uses minimum sized gate !(G0 + P0 Ci,0)!(G1 + P1G0 + P1P0 Ci,0)  Low logical effort

Charge leakage Impact of charge leakage  Output settles to an intermediate voltage determined by a  Ideally, if the PDN is off, the output should remain high (VDD) for the duration of the evaluate stage resistive divider of the pull-up and pull-down networks  Leakage currents cause charge to degrade over time  If output drops below switching threshold, output interprets as low CLK 4 CLK 3 Mp CLK Out 2.5 1 Out CL A=0 1.5 2 Evaluate

V (V) Voltage CLK Out Me 0.5 Precharge -0.5 Leakage sources 02040

Minimum clock rate of a few kHz Time (ms)

6 10/19/2016

A solution to charge leakage Charge sharing

Charge stored originally on CL  Keeper compensates for the charge lost due to the CLK Mp is redistributed (shared) over C Out L pull-down leakage paths. and C leading to possible Keeper A A CL circuit malfunction. CLK Mp Mkp B=0 Ca !Out Ca is initially discharged A CLK CL Me Cb B

CLK Me Ca If Vout  VDD ( )  VTn , Ca  CL How should the keeper device be sized relative to the this could cause the gate it drives to malfunction. NMOS devices?

Charge sharing example Solution to charge redistribution What is the worst case voltage drop on y? Load CLK CLK (Assume inputs switch after prechargeCLK inverter Mp Mkp y = A  B  C and internal nodes are initially at 0V.) Out A!A A a Cy=50fF B B b Ca=15fF !B B !B c d Cb=15fF CLK Me !C C Cc=15fF Cd=10fF

CLK Precharge internal nodes using a clock-driven transistor (at the cost of increased area and capacitance)

Vout = - VDD ((Ca + Cc)/((Ca + Cc) + Cy)) = - 2.5V*(30/(30+50)) = -0.94V

7 10/19/2016

Differential (dual rail) domino DCVS logic off on CLK Mp Mkp Mkp Mp CLK  DCVS: Differential Cascade Voltage Switch logic !Out = !(AB) Out = AB 1 0 1 0 A !A !B on  off B 10 0 off on  1 Out !Out M CLK e In1 !In 1 PDN1 PDN2 In2 off on on off !In2 Due to its high-performance, differential domino has been very popular in the design of in several commercial microprocessors. PDN1 and PDN2 are mutually exclusive

DCVS example NMOS transistors in series/parallel

 So far we have assumed that primary inputs are only allowed to drive gate terminals of MOS transistors.  Now assume primary inputs can drive both gate and !Out source/drain terminals Out  NMOS switch closes when the gate input is high AB B !B !B B X = Y if A and B XY A A!A

B X = Y if A or B XY  Some transistors are shared between Out and !Out  Overall lower transistor count  Remember - NMOS transistors pass a strong 0 but a weak 1

8 10/19/2016

Pass Transistor (PT) LogicB VTC of pass transistor AND gate B B A A F = A  B B B 2 0 F= A  B B=VDD, A=0VDD , V

0 out V A 1 B A=V , B=0V F= AB DD DD  Gate is static – a low-impedance path exists to both A=B=0VDD 0 0 supply rails under all circumstances 012  N transistors instead of 2N  Pure PT logic is not regenerative: signal gradually  No static power consumption degrades after passing through a number of PTs  Bidirectional (versus non-directional)  fix with static CMOS inverter insertion

NMOS only PT driving an inverter Transmission gates (TGs) In 3 C C In = VDD  Most widely used solution Vx = x = 1.8V V M2 A B GS V -V 2 A = VDD DD Tn D S ABC B M1 1 Voltage, V Voltage, Out C 0 C = GND C = GND 00.511.52 Time, ns A = VDD B A = GND B  Vx does not pull up to VDD , but VDD –VTn  Threshold voltage drop causes static power dissipation C = V C = V (M may be weakly conducting forming a path from DD DD 2  Full swing bidirectional switch controlled by gate signal C: VDD to GND) A = B if C = 1

9 10/19/2016

TG multiplexer S S F Transmission gate example S

VDD

In2 What is this function?

S F A F = A ⊕ B

In1

S B GND F = !(In1  S + In2  S)

In1 SSIn2

Logical Effort and Transmission gates Sequential logic

 What is the logical effort for In and Sel? ___ Sel Inputs Outputs Combinational 4 g = 2 In 2 Out In Logic 2 2 Current Next gSel = 2/3 Sel State State

 Consider the NMOS pass transistor as part of the pull- down network  In transmission gates the PMOS and NMOS have the same clock size since they conduct at the same time

10 10/19/2016

MUX based latches TG MUX based latch implementation

 Change the stored value by cutting the feedback loop clk

feedback feedback Q 1 0 Q Q !clk input sampled D 0 D 1 (transparent mode) D clk clk clk Negative Latch Positive Latch clk D Q !clk Q = clk & Q | !clk & D Q = !clk & Q | clk & D transparent when the transparent when the clock is low clock is high clk feedback (hold mode)

11