Performance Characterization • Delay analysis • sizing • Logical effort • Power analysis

ECE 261 James Morizio 1 Delay Definitions

• tpdr: rising propagation delay

– From input to rising output crossing VDD/2

• tpdf: falling propagation delay

– From input to falling output crossing VDD/2

• tpd: average propagation delay

– tpd = (tpdr + tpdf)/2

• tr: rise time

– From output crossing 0.2 VDD to 0.8 VDD

• tf: fall time

– From output crossing 0.8 VDD to 0.2 VDD

ECE 261 James Morizio 2 Simulated Delay • Solving differential equations by hand is too hard • SPICE simulator solves the equations numerically – Uses more accurate I-V models too! • But simulations take time to write 2.0

1.5

1.0 (V) t = 66ps t = 83ps V pdf pdr in V 0.5 out

0.0

0.0 200p 400p 600p 800p 1n t(s)

ECE 261 James Morizio 3 Delay Estimation

• We would like to be able to easily estimate delay – Not as accurate as simulation – But easier to ask “What if?” • The step response usually looks like a 1st order RC response with a decaying exponential. • Use RC delay models to estimate delay – C = total capacitance on output node – Use effective resistance R

– So that tpd = RC • Characterize by finding their effective R – Depends on average current as gate switches

ECE 261 James Morizio 4 RC Delay Models • Use equivalent circuits for MOS transistors – Ideal switch + capacitance and ON resistance – Unit nMOS has resistance R, capacitance C – Unit pMOS has resistance 2R, capacitance C • Capacitance proportional to width • Resistance inversely proportional to width d s kC kC R/k d 2R/k d g k g kC g k g s kC kC kC s s d

ECE 261 James Morizio 5 Example: 3-input NAND

• Sketch a 3-input NAND with transistor widths chosen to achieve effective rise and fall resistances equal to a unit inverter (R).

ECE 261 James Morizio 6 Example: 3-input NAND

• Sketch a 3-input NAND with transistor widths chosen to achieve effective rise and fall resistances equal to a unit inverter (R).

ECE 261 James Morizio 7 Example: 3-input NAND • Sketch a 3-input NAND with transistor widths chosen to achieve effective rise and fall resistances equal to a unit inverter (R). 2 2 2

3 3 3

ECE 261 James Morizio 8 3-input NAND Caps • Annotate the 3-input NAND gate with gate and diffusion capacitance.

2 2 2

3

3

3

ECE 261 James Morizio 9 3-input NAND Caps • Annotate the 3-input NAND gate with gate and diffusion capacitance. 2C 2C 2C 2C 2C 2C 2 2 2 2C 2C 2C

3C 3 3C 3C 3 3C 3C 3 3C 3C

ECE 261 James Morizio 10 3-input NAND Caps • Annotate the 3-input NAND gate with gate and diffusion capacitance. 2 2 2

3 9C 5C 3 3C 5C 3 3C 5C

ECE 261 James Morizio 11 Elmore Delay • ON transistors look like resistors • Pullup or pulldown network modeled as RC ladder • Elmore delay of RC ladder t pd ≈ ƒ Ri−to−sourceCi nodes i

= R1C1 + (R1 + R2 )C2 + ... + (R1 + R2 + ... + RN )CN

R1 R2 R3 RN

C1 C2 C3 CN

ECE 261 James Morizio 12 Example: 2-input NAND • Estimate worst-case rising and falling delay of 2- input NAND driving h identical gates.

2 2 Y A 2 h copies x B 2

ECE 261 James Morizio 13 Example: 2-input NAND

• Estimate rising and falling propagation delays of a 2-input NAND driving h identical gates.

2 2 Y A 2 6C 4hC x h copies B 2 2C

ECE 261 James Morizio 14 Example: 2-input NAND • Estimate rising and falling propagation delays of a 2-input NAND driving h identical gates.

2 2 Y A 2 6C 4hC x h copies B 2 2C

R Y t = (6+4h)C pdr

ECE 261 James Morizio 15 Example: 2-input NAND • Estimate rising and falling propagation delays of a 2-input NAND driving h identical gates.

2 2 Y A 2 6C 4hC x B 2 2C h copies

R Y t = (6 + 4h)RC (6+4h)C pdr

ECE 261 James Morizio 16 Example: 2-input NAND • Estimate rising and falling propagation delays of a 2-input NAND driving h identical gates.

2 2 Y A 2 6C 4hC x h copies B 2 2C

ECE 261 James Morizio 17 Example: 2-input NAND • Estimate rising and falling propagation delays of a 2-input NAND driving h identical gates.

2 2 Y A 2 6C 4hC h copies x B 2 2C

R/2 x Y t = R/2 2C (6+4h)C pdf

ECE 261 James Morizio 18 Example: 2-input NAND • Estimate rising and falling propagation delays of a 2-input NAND driving h identical gates.

2 2 Y A 2 6C 4hC x h copies B 2 2C

t = (2C)(R )+ »(6 + 4h)Cÿ(R + R ) x R/2 Y pdf 2 ⁄ 2 2 2C (6+4h)C R/2 = (7 + 4h)RC

ECE 261 James Morizio 19 Delay Components

• Delay has two parts – Parasitic delay • 6 or 7 RC • Independent of load – Effort delay • 4h RC • Proportional to load capacitance

ECE 261 James Morizio 20 Contamination Delay • Best-case (contamination) delay can be substantially less than propagation delay. • Ex: If both inputs fall simultaneously

2 2 Y A 2 6C 4hC x B 2 2C

R R Y t = (3+ 2h)RC (6+4h)C cdr

ECE 261 James Morizio 21 Diffusion Capacitance • We assumed contacted diffusion on every s / d. • Good layout minimizes diffusion area • Ex: NAND3 layout shares one diffusion contact – Reduces output capacitance by 2C – Merged uncontacted diffusion might help too

2C 2C Shared Contacted Diffusion Isolated Contacted 2 2 2 Merged Diffusion Uncontacted 3 7C Diffusion 3 3C 3C 3C 3C 3 3C

ECE 261 James Morizio 22 Layout Comparison

• Which layout is better? VDD VDD A B A B

Y Y

GND GND

ECE 261 James Morizio 23 Resizing the Inverter Minimum-sized 2λ transistor: W=3λ, L=2λ To get equal rise and fall times, Ω βn = βp Wp = 3Wn, assuming 3λ n-diffusion that electron mobility is three times that of holes

poly Wp=9λ

2λ Sometimes the function being implemented 9λ p-diffusion makes resizing unnecessary!

poly

ECE 261 James Morizio 24 Analyzing the NAND Gate VDD

βp1 1 βp2 βp3 a βn, eff = b c 1 + 1 + 1 F βn1 βn2 βn3 a βn1 Resistances are in series (conductances b β n2 are in parallel)

c βn3 If βn1 = βn2 = βn3 = βn then βn, eff = βn/3 • Pull-down circuit has three times resistance, Why not consider reGsisntdances in parallel?one-third times the conductance

For pull-up, only one transistor has to be on, βp, eff = min{βp1,βp2,βp3} Ω If βp1 = βp2 = βp3 = βp = βn/3 then βn, eff = βp no resizing is necessary

ECE 261 James Morizio 25 Analyzing the NOR Gate VDD 1 β = p, eff 1 a + 1 + 1 βp1 βp1 βp2 βp3 b βp2 Resistances are in series (conductances c are in parallel) βp3

βn1 If β = β = β = β then β = β /3 βn2 βn3 p1 p2 p3 p p, eff p a b c • Pull-up circuit has three times resistance, Gnd one-third times the conductance

For pull-down, only one transistor has to be on, βn, eff = min{βn1,βn2,βn3} Ω If βn1 = βn2 = βn3 = βn = 3βp then βn,eff=9βp,eff considerable resizing is necessary Wp = 9Wn!

ECE 261 James Morizio 26 Effect of Series Transistors

Diffusion Diffusion L poly poly

L poly 3L

L poly

W W

ECE 261 James Morizio 27 Effect of Series Transistors

VDD Transistor resizing Resize the pull-up transistors to a example βp make pull-up times equal c βp b After resizing: βp a: 2βp, b: 2βp, c: βp

Pull-down

ECE 261 James Morizio 28 Transistor Placement (Series Stack) How to order transistors in a series stack?

Body effect: δVt ∝ √Vsb • At time t = 0, a=b=c=0, f=1, capacitances are charged Pull-up • Ideally V = V = V ≈ 0.8V stack ta tb tc • However, V > V > V because of F ta tb tc a t body effect a Ca b • If a, b, c become 1 at the same time, which tb Cb transistor will switch on first? c • tc will switch on first (Vsb for tc is zero), Cc will Cc discharge, pulling Vsb for tb to zero tc • If signals arrive at different times, how should the Gnd transistors be ordered? • Design strategy: place latest arriving signal nearest to output-early signals will discharge internal nodes

ECE 261 James Morizio 29 Transistor Placement Pull-up stack 2 a F 2 ta C 2 a

2 tb b Cb c Primary Cc Pull-up t inputs c stack (change Gnd 2 2 b simultaneously) 2 ta C F 2 a

tb a Cb

c Cc tc

ECE 261 James Morizio 30 Some Design Guidelines • Use NAND gates (instead of NOR) wherever possible • Placed inverters (buffers) at high fanout nodes to improve drive capability • Avoid use of NOR completely in high-speed

circuits: A1 + A2 + … + An = A1.A2….An

ECE 261 James Morizio 31 Some Design Guidelines

• Use limited fan-in (<10): high fan-in Ω long series stacks • Use minimum-sized gates on high fan-out nodes: minimize load presented to driving gate

ECE 261 James Morizio 32 Logical Effort

• Chip designers face a bewildering array of choices ? ? ? – What is the best circuit topology for a function? – How many stages of logic give least delay? – How wide should the transistors be?

• Logical effort is a method to make these decisions – Uses a simple model of delay – Allows back-of-the-envelope calculations – Helps make rapid comparisons between alternatives – Emphasizes remarkable symmetries

ECE 261 James Morizio 33 Example

• Ben Bitdiddle is the memory designer for the Motoroil 68W86, an embedded automotive processor. Help Ben design the decoder for a A[3:0] A[3:0] register file. 32 bits 4 : 1 1 6 6

• Decoder specifications:

w D 16 o e r d c Register File s – 16 word register file o d e – Each word is 32 bits wide r – Each bit presents load of 3 unit-sized transistors – True and complementary address inputs A[3:0] – Each input may drive 10 unit-sized transistors • Ben needs to decide: – How many stages to use? – How large should each gate be? – How fast can decoder operate?

ECE 261 James Morizio 34 Delay in a • Express delays in process-independent unit d τ = 3RC d = abs τ ≈ 12 ps in 180 nm process 40 ps in 0.6 µm process

ECE 261 James Morizio 35 Delay in a Logic Gate • Express delays in process-independent unit d d = abs τ • Delay has two components d = f + p

ECE 261 James Morizio 36 Delay in a Logic Gate

• Express delays in process-independent unit d d = abs τ • Delay has two components d = f + p • Effort delay f = gh (a.k.a. stage effort) – Again has two components

ECE 261 James Morizio 37 Delay in a Logic Gate • Express delays in process-independent unit d d = abs τ • Delay has two components d = f + p • Effort delay f = gh (a.k.a. stage effort) – Again has two components • g: logical effort – Measures relative ability of gate to deliver current – g ≡ 1 for inverter

ECE 261 James Morizio 38 Delay in a Logic Gate • Express delays in process-independent unit d d = abs τ • Delay has two components d = f + p • Effort delay f = gh (a.k.a. stage effort) – Again has two components

• h: electrical effort = Cout / Cin – Ratio of output to input capacitance – Sometimes called fanout

ECE 261 James Morizio 39 Delay in a Logic Gate • Express delays in process-independent unit d d = abs τ • Delay has two components d = f + p

• Parasitic delay p – Represents delay of gate driving no load – Set by internal parasitic capacitance

ECE 261 James Morizio 40 Delay Plots 2-input NAND Inverter 6 d = f + p g = d : 5 y p = a = gh + p l

e d =

D 4 g = d

e p = z i l 3

a d = m r

o 2 N

1

0 0 1 2 3 4 5 Electrical Effort:

h = Cout / Cin

ECE 261 James Morizio 41 Delay Plots 2-input NAND Inverter 6 d = f + p g = 4/3 d : 5 y p = 2 a = gh + p l

e d = (4/3)h + 2

D 4 g = 1 d

e p = 1 z i l 3

a d = h + 1

• What about m r

o 2 Effort Delay: f NOR2? N 1 Parasitic Delay: p 0 0 1 2 3 4 5 Electrical Effort:

h = Cout / Cin

ECE 261 James Morizio 42 Computing Logical Effort • DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current. • Measure from delay vs. fanout plots • Or estimate by counting transistor widths 2 2 A 4 Y 2 B 4 A 2 A Y Y 1 B 2 1 1

Cin = 3 Cin = 4 Cin = 5 g = 3/3 g = 4/3 g = 5/3

ECE 261 James Morizio 43 Catalog of Gates

• Logical effort of common gates Gate type Number of inputs 1 2 3 4 n Inverter 1 NAND 4/3 5/3 6/3 (n+2)/3 NOR 5/3 7/3 9/3 (2n+1)/3 Tristate / mux 2 2 2 2 2

ECE 261 James Morizio 44 Catalog of Gates • Parasitic delay of common gates

– In multiples of pinv (≈1) Gate type Number of inputs 1 2 3 4 n Inverter 1 NAND 2 3 4 n NOR 2 3 4 n Tristate / mux 2 4 6 8 2n XOR, XNOR 4 6 8

ECE 261 James Morizio 45 Example: Ring Oscillator

• Estimate the frequency of an N-stage ring oscillator

Logical Effort: g = Electrical Effort: h = Parasitic Delay: p = Stage Delay: d =

Frequency: fosc =

ECE 261 James Morizio 46 Example: Ring Oscillator

• Estimate the frequency of an N-stage ring oscillator

31 stage ring oscillator in Logical Effort: g = 1 0.6 µm process has Electrical Effort: h = 1 frequency of ~ 200 MHz Parasitic Delay: p = 1 Stage Delay: d = 2

Frequency: fosc = 1/(2*N*d) = 1/4N

ECE 261 James Morizio 47 Example: FO4 Inverter • Estimate the delay of a fanout-of-4 (FO4) inverter d

Logical Effort: g = Electrical Effort: h = Parasitic Delay: p = Stage Delay: d =

ECE 261 James Morizio 48 Example: FO4 Inverter • Estimate the delay of a fanout-of-4 (FO4) inverter d

Logical Effort: g = 1 Electrical Effort: h = 4 The FO4 delay is about Parasitic Delay: p = 1 200 ps in 0.6 µm process 60 ps in a 180 nm process Stage Delay: d = 5 f/3 ns in an f µm process

ECE 261 James Morizio 49 Multistage Logic Networks • Logical effort generalizes to multistage networks • Path Logical Effort G = ∏ gi

Cout-path • Path Electrical Effort H = Cin-path

• Path Effort F = ∏ fi = ∏ gihi

10 x y z 20 g1 = 1 g2 = 5/3 g3 = 4/3 g4 = 1 h1 = x/10 h2 = y/x h3 = z/y h4 = 20/z

ECE 261 James Morizio 50 Multistage Logic Networks • Logical effort generalizes to multistage networks • Path Logical Effort G = ∏ gi

Cout path • Path Electrical Effort H = − Cin− path

• Path Effort F = ∏ fi = ∏ gihi

• Can we write F = GH?

ECE 261 James Morizio 51 Paths that Branch

• No! Consider paths that branch: 15 90 5 G = 15 H = 90 GH =

h1 =

h2 = F = GH?

ECE 261 James Morizio 52 Paths that Branch

• No! Consider paths that branch: 15 90 5 G = 1 15 H = 90 / 5 = 18 90 GH = 18

h1 = (15 +15) / 5 = 6

h2 = 90 / 15 = 6

F = g1g2h1h2 = 36 = 2GH

ECE 261 James Morizio 53 Branching Effort • Introduce branching effort – Accounts for branching between stages in path C + C b = on path off path Con path Note: B = ∏bi ∏hi = BH • Now we compute the path effort – F = GBH

ECE 261 James Morizio 54 Multistage Delays

• Path Effort Delay DF = ƒ fi

• Path Parasitic Delay P = ƒ pi

• Path Delay D = ƒdi = DF + P

ECE 261 James Morizio 55 Designing Fast Circuits

D = ƒdi = DF + P • Delay is smallest when each stage bears same effort

1 ˆ N f = gihi = F • Thus minimum delay of N stage path is 1 D = NF N + P • This is a key result of logical effort – Find fastest possible delay – Doesn’t require calculating gate sizes

ECE 261 James Morizio 56 Gate Sizes • How wide should the gates be for least delay? fˆ = gh = g Cout Cin g C Ω C = i outi ini fˆ

• Working backward, apply capacitance transformation to find input capacitance of each gate given load it drives. • Check work by verifying input cap spec is met.

ECE 261 James Morizio 57 Example: 3-stage path

• Select gate sizes x and y for least delay from A to B x

y x 45 A 8 x y B 45

ECE 261 James Morizio 58 Example: 3-stage path

x

y x 45 A 8 x y B 45

Logical Effort G = Electrical Effort H = Branching Effort B = Path Effort F = Best Stage Effort fˆ = Parasitic Delay P = Delay D =

ECE 261 James Morizio 59 Example: 3-stage path x

y x 45 A 8 x y B 45

Logical Effort G = (4/3)*(5/3)*(5/3) = 100/27 Electrical Effort H = 45/8 Branching Effort B = 3 * 2 = 6 Path Effort F = GBH = 125 Best Stage Effort fˆ = 3 F = 5 Parasitic Delay P = 2 + 3 + 2 = 7 Delay D = 3*5 + 7 = 22 = 4.4 FO4

ECE 261 James Morizio 60 Example: 3-stage path

• Work backward for sizes y = x = x

y x 45 A 8 x y B 45

ECE 261 James Morizio 61 Example: 3-stage path • Work backward for sizes y = 45 * (5/3) / 5 = 15 x = (15*2) * (5/3) / 5 = 10

45 A P: 4 P: 4 N: 4 P: 12 N: 6 B N: 3 45

ECE 261 James Morizio 62 Best Number of Stages • How many stages should a path use? – Minimizing number of stages is not always fastest • Example: drive 64-bit datapath with unit inverter Initial Driver 1 1 1 1

D =

Datapath Load 64 64 64 64

N: 1 2 3 4 f: D:

ECE 261 James Morizio 63 Best Number of Stages • How many stages should a path use? – Minimizing number of stages is not always fastest • Example: drive 64-bit datapath with unit inverter

Initial Driver 1 1 1 1

8 4 2.8 1/N D = NF + P 16 8 = N(64)1/N + N 23

Datapath Load 64 64 64 64

N: 1 2 3 4 f: 64 8 4 2.8 D: 65 18 15 15.3 Fastest

ECE 261 James Morizio 64 Derivation • Consider adding inverters to end of path – How many give least delay? N - n1 Extra Inverters Logic Block: n n Stages 1 1 1 N Path Effort F D = NF + ƒ pi + (N − n1 )pinv i=1 ∂D 1 1 1 = −F N ln F N + F N + p = 0 ∂N inv 1 • Define best stage effort ρ = F N

pinv + ρ (1− ln ρ )= 0

ECE 261 James Morizio 65 Best Stage Effort

pinv + ρ (1− ln ρ )= 0 • has no closed-form solution

• Neglecting parasitics (pinv = 0), we find ρ = 2.718 (e)

• For pinv = 1, solve numerically for ρ = 3.59

ECE 261 James Morizio 66 Review of Definitions Term Stage Path number of stages 1 N g logical effort G = ∏ gi C h = Cout H = out-path electrical effort Cin Cin-path

Con-path +Coff-path b = C B = b branching effort on-path ∏ i effort f = gh F = GBH D = f effort delay f F ƒ i parasitic delay p P = ƒ pi D d D P delay d = f + p = ƒ i = F +

ECE 261 James Morizio 67 Method of Logical Effort 1) Compute path effort F = GBH

2) Estimate best number of stages N = log4 F 3) Sketch path with N stages 1 4) Estimate least delay D = NF N + P 1 ˆ N 5) Determine best stage effort f = F g C i outi Cin = 6) Find gate sizes i fˆ

ECE 261 James Morizio 68 Limits of Logical Effort

• Chicken and egg problem – Need path to compute G – But don’t know number of stages without G • Simplistic delay model – Neglects input rise time effects • Interconnect – Iteration required in designs with wire • Maximum speed only – Not minimum area/power for constrained delay

ECE 261 James Morizio 69 Summary

• Logical effort is useful for thinking of delay in circuits – Numeric logical effort characterizes gates – NANDs are faster than NORs in CMOS – Paths are fastest when effort delays are ~4 – Path delay is weakly sensitive to stages, sizes – But using fewer stages doesn’t mean faster paths

– Delay of path is about log4F FO4 inverter delays – Inverters and NAND2 best for driving large caps • Provides language for discussing fast circuits – But requires practice to master

ECE 261 James Morizio 70 Power and Energy

• Power is drawn from a voltage source attached to

the VDD pin(s) of a chip. P(t) = iDD (t)VDD • Instantaneous Power: T T E = — P(t)dt = — iDD (t)VDDdt 0 0 • Energy: E 1 T Pavg = = — iDD (t)VDDdt T T 0 • Average Power:

ECE 261 James Morizio 71 Dynamic Power • Dynamic power is required to charge and discharge load capacitances when transistors switch. • One cycle involves a rising and falling output.

• On rising output, charge Q = CVDD is required • On falling output, charge is dumped to GND

• This repeats Tfsw times over an interval of T VDD iDD(t)

C fsw

ECE 261 James Morizio 72 Dynamic Power Cont.

Pdynamic =

VDD

iDD(t)

C fsw

ECE 261 James Morizio 73 Dynamic Power Cont. 1 T Pdynamic = — iDD (t)VDDdt T 0 T VDD = — iDD (t)dt T 0

VDD = [Tf CV ] VDD T sw DD iDD(t) 2 = CVDD fsw

C fsw

ECE 261 James Morizio 74 Activity Factor

• Suppose the system clock frequency = f

• Let fsw = αf, where α = activity factor – If the signal is a clock, α = 1 – If the signal switches once per cycle, α = ½ – Dynamic gates: • Switch either 0 or 2 times per cycle, α = ½ – Static gates: • Depends on design, but typically α = 0.1

2 • Dynamic power: Pdynamic = αCVDD f

ECE 261 James Morizio 75 Short Circuit Current

• When transistors switch, both nMOS and pMOS networks may be momentarily ON at once • Leads to a blip of “short circuit” current. • < 10% of dynamic power if rise/fall times are comparable for input and output

ECE 261 James Morizio 76 Example

• 200 Mtransistor chip – 20M logic transistors • Average width: 12 λ – 180M memory transistors • Average width: 4 λ – 1.2 V 100 nm process

– Cg = 2 fF/µm

ECE 261 James Morizio 77 Dynamic Example

• Static CMOS logic gates: activity factor = 0.1 • Memory arrays: activity factor = 0.05 (many banks!)

• Estimate dynamic power consumption per MHz. Neglect wire capacitance and short-circuit current.

ECE 261 James Morizio 78 Dynamic Example • Static CMOS logic gates: activity factor = 0.1 • Memory arrays: activity factor = 0.05 (many banks!) • Estimate dynamic power consumption per MHz. Neglect wire capacitance.

6 Clogic = (20 ×10 )(12λ)(0.05µm / λ)(2 fF / µm) = 24nF

6 Cmem = (180 ×10 )(4λ)(0.05µm / λ)(2 fF / µm) = 72nF » ÿ 2 Pdynamic = 0.1Clogic + 0.05Cmem ⁄(1.2) f = 8.6 mW/MHz

ECE 261 James Morizio 79 Static Power • Static power is consumed even when chip is quiescent. – Ratioed circuits burn power in fight between ON transistors – Leakage draws power from nominally OFF devices

V −V gs t » −Vds ÿ nvT vT Ids = Ids0e …1− e Ÿ … ⁄Ÿ

Vt = Vt0 −ηVds + γ ( φs +Vsb − φs )

ECE 261 James Morizio 80 Ratio Example • The chip contains a 32 word x 48 bit ROM – Uses pseudo-nMOS decoder and bitline pullups – On average, one wordline and 24 bitlines are high • Find static power drawn by the ROM – β = 75 µA/V2

– Vtp = -0.4V

ECE 261 James Morizio 81 Ratio Example • The chip contains a 32 word x 48 bit ROM – Uses pseudo-nMOS decoder and bitline pullups – On average, one wordline and 24 bitlines are high • Find static power drawn by the ROM – β = 75 µA/V2 – V = -0.4V tp 2 (VDD − Vtp ) • Solution: I = β = 24A pull-up 2

Ppull-up = VDD Ipull-up = 29W

Pstatic = (31+ 24)Ppull-up = 1.6 mW

ECE 261 James Morizio 82 Leakage Example

• The process has two threshold voltages and two oxide thicknesses. • Subthreshold leakage:

– 20 nA/µm for low Vt

– 0.02 nA/µm for high Vt • Gate leakage: – 3 nA/µm for thin oxide – 0.002 nA/µm for thick oxide • Memories use low-leakage transistors everywhere • Gates use low-leakage transistors on 80% of logic

ECE 261 James Morizio 83 Leakage Example Cont.

• Estimate static power:

ECE 261 James Morizio 84 Leakage Example Cont. • Estimate static power:

– High leakage: (20 ×106 )(0.2)(12λ)(0.05µm / λ) = 2.4 ×106 µm – Low leakage: (20 ×106 )(0.8)(12λ)(0.05µm / λ)+ (180 ×106 )(4λ)(0.05µm / λ) = 45.6 ×106 µm 6 » ÿ Istatic = (2.4 ×10 µm) (20nA/ µm)/ 2 + (3nA/ µm)⁄+

6 (45.6 ×10 µm) »(0.02nA/ µm)/ 2 + (0.002nA/ µm)⁄ÿ = 32mA

Pstatic = IstaticVDD = 38mW

ECE 261 James Morizio 85 Leakage Example Cont.

• Estimate static power: (20 ×106 )(0.2)(12λ)(0.05µm / λ) = 2.4 ×106 µm – High leakage: (20 ×106 )(0.8)(12λ)(0.05µm / λ)+ – Low leakage: 6 6 (180 ×10 )(4λ)(0.05µm / λ) = 45.6 ×10 µm 6 » ÿ Istatic = (2.4 ×10 µm) (20nA/ µm)/ 2 + (3nA/ µm)⁄+

6 (45.6 ×10 µm) »(0.02nA/ µm)/ 2 + (0.002nA/ µm)⁄ÿ = 32mA

Pstatic = IstaticVDD = 38mW

• If no low leakage devices, Pstatic = 749 mW (!)

ECE 261 James Morizio 86 Low Power Design

• Reduce dynamic power – α: – C:

– VDD: – f: • Reduce static power

ECE 261 James Morizio 87 Low Power Design

• Reduce dynamic power – α: clock gating, sleep mode – C:

– VDD: – f: • Reduce static power

ECE 261 James Morizio 88 Low Power Design

• Reduce dynamic power – α: clock gating, sleep mode – C: small transistors (esp. on clock), short wires

– VDD: – f: • Reduce static power

ECE 261 James Morizio 89 Low Power Design

• Reduce dynamic power – α: clock gating, sleep mode – C: small transistors (esp. on clock), short wires

– VDD: lowest suitable voltage – f: • Reduce static power

ECE 261 James Morizio 90 Low Power Design

• Reduce dynamic power – α: clock gating, sleep mode – C: small transistors (esp. on clock), short wires

– VDD: lowest suitable voltage – f: lowest suitable frequency • Reduce static power

ECE 261 James Morizio 91 Low Power Design

• Reduce dynamic power – α: clock gating, sleep mode – C: small transistors (esp. on clock), short wires

– VDD: lowest suitable voltage – f: lowest suitable frequency • Reduce static power – Selectively use ratioed circuits

– Selectively use low Vt devices – Leakage reduction: stacked devices, body bias, low temperature

ECE 261 James Morizio 92