EE241 - Spring 2005 Advanced Digital Integrated Circuits

Lecture 7: Logic Families for Performance

Admin

Homework is due on Wednesday. A new assignment is on its way. You will get feedback on the projects within a week.

Logical Effort: Summary

                              Stage            Path
Logical effort                g                G = ∏ g_i
Electrical effort (fanout)    f = Cout/Cin     F = Cout/Cin
Branching effort              n/a              B = ∏ b_i
Effort                        h = fg           H = FGB
Effort delay                  h                D_H = ∑ h_i
Number of stages              1                N
Intrinsic delay               p                P = ∑ p_i
Delay                         d = h + p        D = D_H + P
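The path quantities in the summary can be evaluated numerically. The sketch below is a minimal illustration, not part of the lecture: the gate values (inverter g = 1, p = 1; NAND2 g = 4/3, p = 2) are the usual textbook numbers for a 2:1 PMOS/NMOS ratio and are assumptions here, as are the function names. It also uses the standard logical-effort result that delay is minimized when every stage bears the same effort h = H^(1/N). Delays are in units of the technology time constant.

```python
from math import prod


def path_effort(g, b, c_in, c_out):
    """Path effort H = F * G * B for per-stage logical efforts g and branching efforts b."""
    G = prod(g)          # path logical effort
    B = prod(b)          # path branching effort
    F = c_out / c_in     # path electrical effort
    return F * G * B


def min_path_delay(g, b, p, c_in, c_out):
    """Path delay D = D_H + P when all N stages carry equal effort h = H**(1/N)."""
    n = len(g)
    H = path_effort(g, b, c_in, c_out)
    h_opt = H ** (1.0 / n)   # equal stage effort
    D_H = n * h_opt          # effort delay, sum of the stage efforts
    P = sum(p)               # intrinsic (parasitic) delay
    return D_H + P


# Assumed example path: inverter -> NAND2 -> inverter, load 64x the input capacitance.
g = [1.0, 4.0 / 3.0, 1.0]
b = [1.0, 1.0, 1.0]
p = [1.0, 2.0, 1.0]
D = min_path_delay(g, b, p, c_in=1.0, c_out=64.0)
print(f"H = {path_effort(g, b, 1.0, 64.0):.2f}, D = {D:.2f}")
```

For this assumed path H is about 85.3, so each stage should carry an effort of about 4.4.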


Increasing Performance

Scaling technology
Circuit/logic level:
1. Logic optimizations
2. Sizing, buffering
3. Wire optimization, repeaters
4. Supply voltage
5. Threshold voltage
6. Logic styles
7. Timing, latches
Microarchitecture level

Design Techniques

Performance does not come for free

[Figure: design effort versus performance for ASIC/RTL, ‘Enhanced’ ASIC, structured ASIC, custom design, and dynamic custom design.]

RTL Design Flow

[Figure: RTL design flow. HDL feeds RTL synthesis, manual design, and module generators; together with a library these produce a netlist, which goes through physical design to layout. From K. Keutzer.]

RTL/ASIC Design

• Design description in Verilog/VHDL RTL
• Synthesized logic
  - Standard cells
  - Pre-defined macros
• Static timing verification, pre- and post-layout
  - Statistical vs. extracted wire loads
• Physical design
  - Top-level floorplan
  - Automatic
• Clock tree synthesized
• Post layout optimization, verification


Logic Optimization

Perform a variety of transformations and optimizations:
• Structural graph transformations
• Boolean transformations
• Mapping into a physical library

[Figure: a netlist and a library enter logic optimization; the resulting netlist is smaller, faster, and uses less power. From K. Keutzer.]

Optimization

Input:

• Initial Boolean network

• Timing characterization for the module
  - input arrival times and drive factors
  - output loading factors

• Optimization goals
  - output required times

• Target library description

Output:

• Minimum-area netlist of library gates which meets timing constraints

A very difficult optimization problem! [from K. Keutzer]
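To make the problem statement concrete, here is a toy version in Python: pick one library cell per gate on a single path so that area is minimal and the output required time is met. Everything in it is hypothetical: the cell names, areas, and delays are made up, delays are fixed per cell rather than load dependent, and the exhaustive search only works for tiny examples. Real synthesis tools use heuristics on whole networks.

```python
from itertools import product

# Hypothetical library: function -> list of (cell name, area, delay).
# All numbers are invented for illustration.
LIBRARY = {
    "nand2": [("nand2_x1", 1.0, 3.0), ("nand2_x2", 1.8, 2.0), ("nand2_x4", 3.2, 1.4)],
    "inv": [("inv_x1", 0.6, 2.0), ("inv_x2", 1.0, 1.3), ("inv_x4", 1.8, 0.9)],
}


def min_area_mapping(path, arrival, required):
    """Return (area, output time, cell names) of the smallest mapping that meets timing.

    path     -- list of function names along one path, input to output
    arrival  -- input arrival time
    required -- output required time
    Returns None when no combination of cells meets the required time.
    """
    best = None
    for choice in product(*(LIBRARY[func] for func in path)):
        area = sum(cell[1] for cell in choice)
        out_time = arrival + sum(cell[2] for cell in choice)
        if out_time <= required and (best is None or area < best[0]):
            best = (area, out_time, [cell[0] for cell in choice])
    return best


result = min_area_mapping(["nand2", "inv", "nand2"], arrival=0.5, required=6.0)
print(result)
```

Even this toy search visits 3^N combinations for N gates, which hints at why the full problem is hard.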

Logic Optimization

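[Figure: logic optimization flow. A netlist goes through technology-independent optimization (2-level and multilevel logic optimization, using a generic library) and then technology-dependent library optimization (using the real library), producing a netlist. From K. Keutzer.]

Modern Approach to Logic Optimization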

Divide logic optimization into two subproblems:
• Technology-independent optimization
  - determine overall logic structure
  - estimate costs (mostly) independent of technology
  - simplified cost modeling
• Technology-dependent optimization (technology mapping)
  - binding onto the gates in the library
  - detailed technology-specific cost model

Orchestration of various optimization/transformation techniques for each subproblem


Logic Level Optimizations

Logic Depth

[Figure: two alternative gate structures with different logic depth; logic between registers (R).]

Techniques: restructuring, pipelining, retiming, technology mapping

Well covered by today’s logic and sequential synthesis

Logic Optimizations (2)

Fanout

Tp = O(FO); fanout also affects wiring capacitance

[Figure: late-arriving signal in a network with large fanout.]

Technique: removal of common sub-expressions
Start from tree structure/output

Technology mapping

Fanin

Tp = O(FI²)!

Observation: only true if FI translates into series devices (e.g. NAND pull-down, NOR pull-up); otherwise linear

[Figure: tp (nsec) versus fan-in from 1 to 9; tpHL grows quadratically, tpLH linearly.]

AVOID LARGE FAN-IN GATES! (Typically not more than FI < 4)
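The quadratic growth for series devices follows from a first-order Elmore delay model. The Python sketch below assumes identical devices, each modeled as a resistance r with a capacitance c at every stack node; these values, the function names, and the simple parallel-device model are illustrative assumptions, not data from the plot.

```python
def series_stack_elmore(n, r=1.0, c=1.0, c_out=None):
    """Elmore delay of n identical series devices discharging the output node.

    Node k (counted from the supply rail) sees k*r of resistance to the rail
    and carries capacitance c, except the output node (k = n), which carries c_out.
    """
    if c_out is None:
        c_out = c
    delay = 0.0
    for k in range(1, n + 1):
        c_k = c_out if k == n else c
        delay += k * r * c_k
    return delay


def parallel_worst_case(n, r=1.0, c=1.0):
    """Worst case for n parallel devices: one device (r) drives n drain capacitances."""
    return r * n * c


for fan_in in (1, 2, 4, 8):
    print(fan_in, series_stack_elmore(fan_in), parallel_worst_case(fan_in))
```

With equal node capacitances the series delay is r·c·n(n+1)/2, so doubling the fan-in from 4 to 8 multiplies it by 3.6, while the parallel case only doubles.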

Technology Mapping for Performance

Alternative coverings
Use low FI modules on critical path(s)
Library composition?

CMOS Logic Styles

CMOS tradeoffs:
• Speed
• Power (energy)
• Area
Design tradeoffs:
• Robustness, scalability
• Design time
Many styles: don’t try to remember the names – remember the principles
Changing the logic style – can it be done without breaking the synthesis flow?

CMOS Logic Styles

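Complementary
[Figure: pull-up network (PUN) to VDD and pull-down network (PDN) to GND, both driven by inputs A, B, C.]
• versatile
• robust
• scales
• large and slow

Pass Transistor Logic
[Figure: logic network with inputs A, B, C and outputs OUT.]
• simple and fast
• not always very efficient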

CMOS Logic Styles

Ratioed Logic
[Figure: load between VDD and the output; PDN with inputs In1, In2, In3 to GND; RPDN << RLOAD.]
• small & fast
• static power

Dynamic Logic
[Figure: clocked (φ) gate with PDN inputs A, B, C and output capacitance CL.]
• small & fastest!
• noise issues
• scales?

Others

Current-mode logic
Adiabatic logic

Pulsed Static CMOS

RH – Reset high
RL – Reset low

Fast pull-up
Fast pull-down

Chen, Ditlow, US Pat. 5,495,188, Feb. 1996.

PS-CMOS

Evaluation and reset waves: reset is 1.5x slower


PS-CMOS

Advantages:
• No dynamic nodes – good noise immunity
• Reset delay slower than evaluation
• No data dependent delay (worst case gets better)
• No false transitions
Disadvantages:
• Width of reset wave limits logic depth
• Margin in design

Skewing Gates

Different rising and falling delays

[Figure: inverter with device widths W and W.]

LE =

Skewing Gates

[Figure: inverter with device widths 4W and W.]

LE =
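One common way to assign a logical effort to a skewed gate is per transition: compare its input capacitance with that of an unskewed inverter that delivers the same drive for that transition. The Python sketch below applies that definition under loudly stated assumptions: the first width is the PMOS and the second the NMOS, an unskewed inverter has a PMOS/NMOS ratio of 2, and capacitance is proportional to width. The slides do not state these conventions, so the numbers are an illustration rather than the values intended on the slide.

```python
def skewed_inverter_le(wp, wn, mobility_ratio=2.0):
    """Return (LE for rising output, LE for falling output) of an inverter.

    wp, wn         -- PMOS and NMOS widths (assumed order: PMOS first)
    mobility_ratio -- assumed PMOS/NMOS width ratio of an unskewed inverter
    """
    c_in = wp + wn
    # Unskewed inverter with the same pull-up drive: PMOS wp, NMOS wp / ratio.
    c_ref_rise = wp + wp / mobility_ratio
    # Unskewed inverter with the same pull-down drive: NMOS wn, PMOS ratio * wn.
    c_ref_fall = wn + mobility_ratio * wn
    return c_in / c_ref_rise, c_in / c_ref_fall


for wp, wn in ((1.0, 1.0), (2.0, 1.0), (4.0, 1.0)):
    le_rise, le_fall = skewed_inverter_le(wp, wn)
    print(f"Wp={wp}, Wn={wn}: LE_rise={le_rise:.3f}, LE_fall={le_fall:.3f}")
```

Under these assumptions the 4W/W inverter favors the rising output (LE below 1) at the expense of the falling one, while the W/W inverter does the opposite.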

Ratioed Logic

[Figure: three ratioed gates, each with a load between VDD and output F and a PDN (inputs In1, In2, In3) between F and VSS: (a) resistive load RL, (b) depletion load NMOS (VT < 0), (c) pseudo-NMOS with a PMOS load.]

Goal: to reduce the number of devices over complementary CMOS


Pseudo-NMOS

[Figure: pseudo-NMOS gate (PMOS load, PDN with inputs In1 to In3, output F) and its voltage transfer characteristic, Vout versus Vin from 0 to 2.5 V, for PMOS load W/Lp = 4, 2, 1, 0.5, and 0.25.]

Trade-off between performance and power + noise margins
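The trade-off can be seen with a first-order resistive model of a ratioed gate. The Python sketch below treats both the load and the conducting pull-down network as linear resistors, which is a simplification: a real pseudo-NMOS load is a nonlinear PMOS device. The component values and the function name are assumptions chosen only for illustration.

```python
def resistive_load_gate(vdd, r_load, r_pdn, c_load):
    """First-order figures for a ratioed gate with a linear load resistor.

    Returns (V_OL, static power with output low, low-to-high delay).
    """
    v_ol = vdd * r_pdn / (r_pdn + r_load)   # resistive divider when the PDN conducts
    p_static = vdd ** 2 / (r_pdn + r_load)  # DC current flows while the output is low
    t_plh = 0.69 * r_load * c_load          # output charged through the load only
    return v_ol, p_static, t_plh


# Assumed values: 2.5 V supply, 10 kOhm pull-down network, 10 fF load capacitance.
for r_load in (20e3, 40e3, 80e3):
    v_ol, p_static, t_plh = resistive_load_gate(2.5, r_load, 10e3, 10e-15)
    print(f"RL={r_load:.0f} Ohm: VOL={v_ol:.2f} V, "
          f"P={p_static * 1e6:.1f} uW, tpLH={t_plh * 1e12:.0f} ps")
```

A stronger (lower resistance) load speeds up the rising output but raises V_OL and the static power, which is the same tension the transfer curves show for increasing W/Lp.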

Differential Logic

Differential Cascode Voltage Switch (DCVS)
Differential Split-Level Logic (DSL)
Regenerative Push-Pull Cascode Logic (PPCL)
Pass transistor logic families
Dynamic logic families

Differential Logic

+ implicit invert, higher logic density


Cascode Voltage Switch Logic

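[Figure: CVSL gate with cross-coupled PMOS loads M1 and M2 to VDD, complementary outputs Out, and two pull-down networks PDN1 and PDN2 to VSS driven by A, B and their complements.]

Cascode Voltage Switch Logic (CVSL)

Sometimes called Differential Cascode Voltage Switch Logic (DCVSL)

CVSL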

[Figure: CVSL gate example (devices M1 to M4, inputs A and B) and its transient response: voltage (V) versus time from 0 to 1.0 ns for the inputs A, B and both outputs.]

Fast (but hysteresis due to latch function)
No static power dissipation
BUT: large cross-over current!

CVSL

Full adder design

How to design for reduced transistor count?

Technique

Karnaugh Map Technique

          x2x3
  x1    00  01  11  10
   0     0   0   0   1
   1     0   1   1   1

[Figure: differential pull-down trees for Q and its complement under two LOAD blocks, built from devices driven by x1, x2, x3 and their complements.]

Build shared cubes first!
Add other cubes next
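A differential tree needs two pull-down networks that are exact complements of each other. The Python sketch below encodes the Karnaugh map as I read it from the slide (rows x1 = 0, 1; columns x2x3 = 00, 01, 11, 10) and checks one possible pair of covers, Q = x1·x3 + x2·x3' and Q' = x1'·x3 + x2'·x3'. The covers are my own choice for illustration and are not necessarily the ones drawn in the lecture.

```python
from itertools import product

# Karnaugh map as read from the slide: row = x1, columns in Gray-code order.
COLUMNS = [(0, 0), (0, 1), (1, 1), (1, 0)]   # (x2, x3)
KMAP = {0: [0, 0, 0, 1], 1: [0, 1, 1, 1]}


def q_from_map(x1, x2, x3):
    """Look up Q in the Karnaugh map."""
    return KMAP[x1][COLUMNS.index((x2, x3))]


def q_tree(x1, x2, x3):
    """Candidate pull-down cover for Q: x1*x3 + x2*x3'."""
    return int((x1 and x3) or (x2 and not x3))


def qb_tree(x1, x2, x3):
    """Candidate pull-down cover for Q': x1'*x3 + x2'*x3'."""
    return int(((not x1) and x3) or ((not x2) and (not x3)))


for x1, x2, x3 in product((0, 1), repeat=3):
    assert q_tree(x1, x2, x3) == q_from_map(x1, x2, x3)
    assert qb_tree(x1, x2, x3) == 1 - q_tree(x1, x2, x3)
print("covers match the map and are complementary")
```

In this pair both covers split on x3 and x3', so the devices driven by x3 and x3' can sit at the bottom of the tree and be shared by the Q and Q' branches.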

Example

Q = x1x2x3x4 + x1(x2+x3+x4)


Push-Pull Cascode Logic

Gieseke et al., U.S. Patent 5,023,480, June 1991.

DSL Differential Split-Level Logic

Simulation Results for Different Adders

Pass-Transistor Logic

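[Figure: inputs drive a switch network that produces Out; example F = AB using NMOS pass transistors, where B passes A and the complement of B passes 0.]

• N transistors
• No static consumption
• Transistor implementation using NMOS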

Pass-Transistor Logic

Performance of PTL:
• Advantage over CMOS in implementing XOR, MUX
• Disadvantage in implementing AND, OR
Datapaths and arithmetic circuits are examples of use:
• Adders and multipliers use XOR, MUX
• Advantage of complementary implementation
Comparisons:
• When a new logic family is introduced, the examples are chosen to show its advantages (not its disadvantages)
• Comparison papers sometimes point to the disadvantages
Full-custom design

Examples of PTL Styles

Complementary Pass-Transistor Logic
• NMOS-only pass-transistor network
Transmission-gate logic
• NMOS+PMOS pass gates
Double Pass-Transistor Logic
• NMOS+PMOS network
Numerous other logic families


NMOS-only switch

[Figure: NMOS-only switch Mn (gate C = 2.5 V, input A = 2.5 V) driving node x with load CL and an inverter (M1, M2); transient plot of voltage (V) versus time from 0 to 2 ns for In, x, and Out.]

Threshold voltage loss causes static power consumption
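The threshold loss can be illustrated with an idealized model in which an NMOS pass transistor delivers at most its gate voltage minus VTn. In the Python sketch below the threshold values (0.5 V and 0.4 V) are assumptions, body effect is ignored (it makes the real loss larger), and the function names are hypothetical.

```python
def nmos_pass_high(v_in, v_gate, v_tn):
    """Highest voltage an ideal NMOS pass transistor can put on its output."""
    return max(0.0, min(v_in, v_gate - v_tn))


def series_cascade(vdd, v_tn, stages):
    """Signal passes through several switches whose gates are all at VDD."""
    v, levels = vdd, []
    for _ in range(stages):
        v = nmos_pass_high(v, vdd, v_tn)
        levels.append(v)
    return levels


def gate_driven_cascade(vdd, v_tn, stages):
    """Each switch passes VDD, but its gate is driven by the previous degraded output."""
    v_gate, levels = vdd, []
    for _ in range(stages):
        v_gate = nmos_pass_high(vdd, v_gate, v_tn)
        levels.append(v_gate)
    return levels


def pmos_leaks(vdd, v_x, v_tp_abs):
    """True if the PMOS of the following inverter still conducts with its input at v_x."""
    return vdd - v_x > v_tp_abs


print(series_cascade(2.5, 0.5, 3))        # one threshold drop in total
print(gate_driven_cascade(2.5, 0.5, 3))   # one additional drop per stage
print(pmos_leaks(2.5, 2.0, 0.4))          # degraded high leaves the PMOS partly on
```

In this model the static power arises because the degraded high level leaves the PMOS of the receiving inverter conducting while its NMOS is on. Driving a pass-transistor gate from a degraded output makes the loss accumulate.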

Solutions

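Transmission gates – adding
Low-threshold switches – leakage!
Level-restoration

[Figure: level restorer. Pass transistor Mn (gate B, input A) drives node X, which feeds an inverter (M1, M2) with output Out; PMOS Mr connects X to VDD.]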

Single-Ended Level Restoring

[Figure: single-ended level restoring circuit with input, output, feedback inverter, and level restoration transistor.]

Differential Level Restoring

[Figure: differential NMOS logic tree with inputs and complementary outputs f.]

Different level restoration leads to different logic families

Different Restoration Schemes

Swing-Restored Pass-Transistor Logic

[Figure: differential NMOS logic tree with inputs and complementary outputs f.]

Parameswar et al., CICC’94, JSSC 6/96

Other Level-Restoring Schemes

[Figure: two differential NMOS logic trees with inputs and complementary outputs f: Energy Economized Pass-Transistor Logic, and DCVS with Pass Gates (DCVS-PG).]

Pass-Transistor Logic Families

Complementary Pass-Transistor Logic (CPL)

[Figure: a pass-transistor network and a complementary pass-transistor network, both driven by A, B and their complements, producing F and its complement.]

• Complementary functions
• Reduced number of logic levels
• Fewer transistors than CMOS
• Fast – reduced load
• Complementary inputs – complementary outputs
• VT drop – several solutions

CPL

Level restoration

Yano et al., CICC’89, JSSC 4/90

CPL

Same topology of networks
Just different signal arrangements
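The point about identical topology can be checked with a behavioral model. In the Python sketch below every CPL gate is the same pair of 2:1 pass multiplexers selected by B; only the signals wired to the data inputs change between AND/NAND, OR/NOR, and XOR/XNOR. This is a logic-level sketch with ideal switches (no VT drop, no output inverters), and the wiring table follows the usual textbook CPL arrangement rather than a circuit taken from these slides.

```python
def mux(sel, when_sel, when_not_sel):
    """Ideal 2:1 pass-transistor multiplexer."""
    return when_sel if sel else when_not_sel


def cpl(a, b, kind):
    """Return (F, F_bar) of a two-input CPL gate; B and its complement drive the gates."""
    a_b, b_b = 1 - a, 1 - b
    # For each gate: (passed when B = 1, passed when B = 0) for F, then the same for F_bar.
    wiring = {
        "and": ((a, b), (a_b, b_b)),     # AND / NAND
        "or": ((b, a), (b_b, a_b)),      # OR / NOR
        "xor": ((a_b, a), (a, a_b)),     # XOR / XNOR
    }
    (f1, f0), (g1, g0) = wiring[kind]
    return mux(b, f1, f0), mux(b, g1, g0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, cpl(a, b, "and"), cpl(a, b, "or"), cpl(a, b, "xor"))
```

Because the network is fixed, one layout can serve several functions, and both output polarities come for free.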

Complementary Pass-Transistor Logic (CPL)

[Figure: (a) nFET logic network with inputs A, B, C and their complements, outputs Q and Qb; (b) XOR (sum) implementation with outputs S and its complement.]

- Fast
- VT drop
- Efficient implementation of arithmetic

CPL Karnaugh Maps

[Figure: Karnaugh map over A and B covered by cubes C1 and C2, with the corresponding pass-transistor network producing A⋅B.]

CPL vs. CMOS

Skewing Output Inverter

Differential vs. Single-Ended

Leap Cell Library

Yano et al, CICC’94, JSSC 6/96

Goal: Implement full logic functionality with a small library
Rely on automated design methodology

Various Logic Functions of the Leap Library

LEAP Comparison

Double Pass-Transistor Logic (DPL)

[Figure: DPL AND/NAND and XOR/XNOR gates with inputs A, B and their complements and complementary outputs O.]

Designing DPL Gates

[Figure: Karnaugh map of A⋅B over A and B covered by cubes C1 to C4, with the corresponding DPL pass-transistor network.]

Designing DPL Gates (2)

[Figure: Karnaugh maps of A⊕B over A and B covered by cubes C1 to C4, with the corresponding DPL pass-transistor networks.]

Applications of DPL

Full adder:
1.5ns 32-bit ALU in 0.25µm CMOS

Suzuki, ISSCC’93, JSSC 11/93

Comparison of Logic Styles

Zimmermann, Fichtner, JSSC 7/97


Results
