EE241 - Spring 2005 Advanced Digital Integrated Circuits
Lecture 7: Logic Families for Performance
Admin
Homework due on Wednesday. New assignment on its way. Will get feedback on the projects within a week.
Logical Effort: Summary
                              Stage                    Path
Logical effort                $g$                      $G = \prod g_i$
Electrical effort (fanout)    $f = C_{out}/C_{in}$     $F = C_{out}/C_{in}$
Branching effort              n/a                      $B = \prod b_i$
Effort                        $h = fg$                 $H = FGB$
Effort delay                  $h$                      $D_H = \sum h_i$
Number of stages              1                        $N$
Intrinsic delay               $p$                      $P = \sum p_i$
Delay                         $d = h + p$              $D = D_H + P$
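A minimal sketch of how the path quantities in the summary combine, assuming the standard logical-effort result that path delay is smallest when every stage bears the same effort $h = H^{1/N}$. The function name `path_delay` and the example path (inverter, 2-input NAND, inverter with textbook values g = 4/3 and p = 2 for the NAND) are illustrative assumptions, not taken from the slides.

```python
from math import prod

def path_delay(g, b, p, c_in, c_out):
    """Minimum path delay (in units of tau) from the logical-effort summary.

    g, b, p: per-stage logical effort, branching effort, intrinsic delay.
    c_in, c_out: path input capacitance and final load (same units).
    """
    n = len(g)                  # number of stages N
    G = prod(g)                 # path logical effort
    B = prod(b)                 # path branching effort
    F = c_out / c_in            # path electrical effort
    H = F * G * B               # path effort
    h_opt = H ** (1.0 / n)      # equal effort per stage (assumed optimum)
    D_H = n * h_opt             # effort delay, the sum of the h_i
    P = sum(p)                  # path intrinsic delay
    return H, h_opt, D_H + P

# Hypothetical path: inverter -> 2-input NAND -> inverter, no branching,
# driving a load 64 times the path input capacitance.
H, h_opt, D = path_delay(g=[1, 4 / 3, 1], b=[1, 1, 1], p=[1, 2, 1],
                         c_in=1.0, c_out=64.0)
print(f"H = {H:.2f}, stage effort = {h_opt:.2f}, D = {D:.2f} tau")
```

Changing the number of stages in the lists shows how N trades effort delay against intrinsic delay.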
Increasing Performance
• Scaling technology
• Circuit/logic level:
  1. Logic optimizations
  2. Transistor sizing, buffering
  3. Wire optimization, repeaters
  4. Supply voltage
  5. Threshold voltage
  6. Logic styles
  7. Timing, latches
• Microarchitecture level
Design Techniques
Performance does not come for free
[Figure: design effort vs. performance for ASIC/RTL, ‘Enhanced’ ASIC, Structured ASIC, Custom design, and Dynamic custom]
RTL Design Flow
[Figure: HDL → RTL synthesis (alongside manual design and module generators) → netlist → logic optimization (using the library) → netlist → physical design → layout] [from K. Keutzer]
RTL/ASIC Design
• Design description in Verilog/VHDL RTL
• Synthesized logic
  - Standard cells
  - Pre-defined macros
• Static timing verification, pre- and post-layout
  - Statistical vs. extracted wire loads
• Physical design
  - Top-level floorplan
  - Automatic place and route
  - Clock tree synthesized
  - Post layout optimization, verification
Logic Optimization
Perform a variety of transformations and optimizations:
• Structural graph transformations
• Boolean transformations
• Mapping into a physical library
[Figure: netlist → logic optimization (using the library) → netlist that is smaller, faster, and uses less power] [from K. Keutzer]
Combinational Logic Optimization
Input:
• Initial Boolean network
• Timing characterization for the module
  - input arrival times and drive factors
  - output loading factors
• Optimization goals
  - output required times
• Target library description
Output:
• Minimum-area netlist of library gates which meets timing constraints
A very difficult optimization problem! [from K. Keutzer]
Logic Optimization
[Figure: flow from netlist through technology-independent logic optimization (2-level, multilevel) and technology-dependent library optimization to the final netlist; libraries shown: generic library, real library] [from K. Keutzer]
Modern Approach to Logic Optimization
Divide logic optimization into two subproblems:
• Technology-independent optimization
  - determine overall logic structure
  - estimate costs (mostly) independent of technology
  - simplified cost modeling
• Technology-dependent optimization (technology mapping)
  - binding onto the gates in the library
  - detailed technology-specific cost model
Orchestration of various optimization/transformation techniques for each subproblem
Logic Level Optimizations
Logic Depth
[Figure: two alternative logic structures ("or") and a register-to-register (R ... R) path]
Techniques: restructuring, pipelining, retiming, technology mapping
Well covered by today’s logic and sequential synthesis
Logic Optimizations (2)
Fanout
$t_p = O(FO)$; fanout also affects wiring capacitance
[Figure label: late-arriving signal]
Technique: removal of common sub-expressions
Start from tree structure/output
Technology mapping
Fanin
$t_p = O(FI^2)$!
[Figure: $t_p$ (nsec) vs. fan-in from 1 to 9; $t_{pHL}$ grows quadratically, $t_{pLH}$ grows linearly]
Observation: only true if FI translates into series devices (e.g. NAND pull-down, NOR pull-up); otherwise linear
AVOID LARGE FAN-IN GATES! (Typically FI < 4)
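The quadratic dependence can be illustrated with a first-order Elmore delay model of a series stack. This is a sketch under simplifying assumptions: identical transistors modeled as resistors of value `r`, a capacitance `c_node` on each internal node, and `c_load` on the output. The function name and the unit values are hypothetical.

```python
def series_stack_delay(fan_in, r=1.0, c_node=1.0, c_load=1.0):
    """Elmore delay of `fan_in` identical series transistors discharging a load.

    Node k (counted from the supply rail) sees the resistance of the k
    transistors between it and the rail. Internal nodes carry c_node,
    the output node carries c_load.
    """
    delay = 0.0
    for k in range(1, fan_in + 1):
        cap = c_load if k == fan_in else c_node
        delay += k * r * cap
    return delay

for fi in (1, 2, 4, 8):
    print(fi, series_stack_delay(fi))
```

With unit values the delay is FI(FI + 1)/2, so doubling the fan-in roughly quadruples the delay of the series path.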
Technology Mapping for Performance
• Alternative coverings
• Use low FI modules on critical path(s)
• Library composition?
CMOS Logic Styles
• CMOS tradeoffs:
  - Speed
  - Power (energy)
  - Area
• Design tradeoffs:
  - Robustness, scalability
  - Design time
• Many styles: don’t try to remember the names – remember the principles
• Changing the logic style – can it be done without breaking the synthesis flow?
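CMOS Logic Styles
Complementary
[Figure: inputs A, B, C drive a PUN between VDD and OUT and a PDN between OUT and GND]
• versatile
• robust
• scales
• large and slow
Pass Transistor Logic
[Figure: inputs A, B, C drive a logic network that produces OUT]
• simple and fast
• not always very efficient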
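CMOS Logic Styles
Ratioed Logic
[Figure: LOAD between VDD and OUT, PDN with inputs In1, In2, In3 between OUT and GND; $R_{PDN} \ll R_{LOAD}$]
• small & fast
• static power
Dynamic Logic
[Figure: gate clocked by φ with PDN inputs A, B, C and load capacitance $C_L$ at Out]
• small & fastest!
• noise issues
• scales?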
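Why ratioed logic needs $R_{PDN} \ll R_{LOAD}$ and why it draws static power can be seen from a simple resistive-divider model. This is a rough sketch, assuming both the load and the conducting pull-down network behave as linear resistors; the function name and the numeric values are hypothetical.

```python
def ratioed_gate(vdd, r_load, r_pdn):
    """Low output level and static power of a ratioed gate with the PDN on.

    Both the load and the pull-down network are modeled as linear resistors.
    """
    v_ol = vdd * r_pdn / (r_pdn + r_load)    # resistive divider
    p_static = vdd ** 2 / (r_pdn + r_load)   # DC current path from VDD to GND
    return v_ol, p_static

vdd = 2.5  # volts, hypothetical
for ratio in (2, 5, 10, 20):                 # R_LOAD / R_PDN
    v_ol, p = ratioed_gate(vdd, r_load=ratio * 1e3, r_pdn=1e3)
    print(f"R_LOAD/R_PDN = {ratio:2d}: V_OL = {v_ol:.3f} V, "
          f"P_static = {p * 1e3:.2f} mW")
```

A larger load resistance lowers both the output low level and the static power, but in a real gate it also slows the pull-up transition.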
Others
• Current-mode logic
• Adiabatic logic
Pulsed Static CMOS
RH – Reset high, RL – Reset low
Fast pull-up, fast pull-down
Chen, Ditlow, US Pat. 5,495,188, Feb. 1996.
PS-CMOS
Evaluation and reset waves: reset is 1.5x slower
PS-CMOS
Advantages:
• No dynamic nodes – good noise immunity
• Reset delay slower than evaluation
• No data dependent delay (worst case gets better)
• No false transitions
Disadvantages:
• Width of reset wave limits logic depth
• Margin in design
Skewing Gates
Different rising and falling delays
[Figure: gate with device widths W and W]
LE =
Skewing Gates
[Figure: gate with device widths 4W and W]
LE =
Ratioed Logic
[Figure: three ratioed gates, each with a PDN (inputs In1, In2, In3) between output F and VSS: (a) resistive load $R_L$, (b) depletion-load NMOS ($V_T < 0$), (c) pseudo-NMOS (PMOS load)]
Goal: to reduce the number of devices over complementary CMOS
Pseudo-NMOS
[Figure: pseudo-NMOS gate with PMOS load and PDN (inputs In1, In2, In3), and its voltage transfer characteristic $V_{out}$ vs. $V_{in}$ (0 to 2.5 V) for $W/L_p$ = 4, 2, 1, 0.5, 0.25]
Trade-off between performance and power + noise margins
Differential Logic
• Differential Cascode Voltage Switch (DCVS)
• Differential Split-Level Logic (DSL)
• Regenerative Push-Pull Cascode Logic (PPCL)
• Pass transistor logic families
• Dynamic logic families
Differential Logic
+ implicit invert, higher logic density
Cascode Voltage Switch Logic
[Figure: PMOS loads M1 and M2 from VDD to Out and its complement; pull-down networks PDN1 and PDN2, driven by A, B and their complements, to VSS]
Cascode Voltage Switch Logic (CVSL)
Sometimes called Differential Cascode Voltage Switch Logic (DCVSL)
CVSL
[Figure: CVSL gate with inputs A, B (devices M1–M4) and its transient response, voltage (V) vs. time (0 to 1.0 ns), for the inputs and both outputs]
• Fast (but hysteresis due to latch function)
• No static power dissipation
• BUT: large cross-over current!
CVSL
Full adder design
How to design for reduced transistor count?
Karnaugh Map Technique
            x2x3
x1     00  01  11  10
 0      0   0   0   1
 1      0   1   1   1
[Figure: differential trees for Q and its complement under shared LOADs, built from x1, x2, x3 devices]
Build shared cubes first! Add other cubes next
Example
Q = x1x2x3x4 + x1(x2+x3+x4)
Push-Pull Cascode Logic
Gieseke et al, U.S. Patent 5,023,480, June 1991.
DSL
Differential Split-Level Logic
Simulation Results for Different Adders
Pass-Transistor Logic
[Figure: inputs drive a switch network that produces Out; example AND gate F = AB using NMOS switches: B passes A, the complement of B passes 0]
• N transistors
• No static consumption
• Transistor implementation using NMOS
Pass-Transistor Logic
Performance of PTL:
• Advantage over CMOS in implementing XOR, MUX
• Disadvantage in implementing AND, OR
Datapaths, arithmetic circuits are examples of use:
• Adders and multipliers use XOR, MUX
• Advantage of complementary implementation
Comparisons:
• When a new logic family is introduced, the examples are chosen to show its advantages (not disadvantages)
• Comparison papers sometimes point to the disadvantages
• Full-custom design
Examples of PTL Styles
• Complementary Pass-Transistor Logic
  - NMOS-only pass-transistor network
• Transmission-gate logic
  - NMOS+PMOS pass gates
• Double Pass-Transistor Logic
  - NMOS+PMOS network
• Numerous other logic families
NMOS-only switch
[Figure: NMOS pass transistor Mn (gate C = 2.5 V, input A = 2.5 V) driving inverter M1/M2 with load $C_L$; transient response, voltage (V) vs. time (0 to 2 ns), for In, node x, and Out]
Threshold voltage loss causes static power consumption
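The size of the threshold loss can be estimated by solving $V_x = V_{DD} - V_T(V_x)$, where the body effect raises $V_T$ as the source rises. The sketch below uses the standard body-effect expression and fixed-point iteration; the device parameters are hypothetical values chosen for illustration, not data from the slides. Because the high level at node x stays below $V_{DD}$, the PMOS of the following inverter is not fully off, which is the source of the static current.

```python
from math import sqrt

def nmos_pass_high_level(vdd, vt0, gamma, two_phi_f, iters=50):
    """Highest voltage an NMOS pass transistor (gate at vdd) can pass.

    Solves V_x = vdd - V_T(V_x) by fixed-point iteration, where the body
    effect raises V_T as the source voltage V_x rises:
        V_T = vt0 + gamma * (sqrt(two_phi_f + V_x) - sqrt(two_phi_f))
    """
    vx = vdd - vt0                      # first guess ignores body effect
    for _ in range(iters):
        vt = vt0 + gamma * (sqrt(two_phi_f + vx) - sqrt(two_phi_f))
        vx = vdd - vt
    return vx

# Hypothetical device parameters, not taken from the slides.
vx = nmos_pass_high_level(vdd=2.5, vt0=0.43, gamma=0.4, two_phi_f=0.6)
print(f"V_x = {vx:.2f} V, lost swing = {2.5 - vx:.2f} V")
```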
Solutions
• Transmission gates – adding complexity
• Low-threshold switches – leakage!
• Level-restoration
[Figure: level restorer Mr between VDD and node X, which sits between pass transistor Mn (inputs A, B) and inverter M1/M2 driving Out]
Single-Ended Level Restoring
[Figure: input, level restoration transistor, output inverter, feedback inverter, output]
Differential Level Restoring
[Figure: differential NMOS logic tree with inputs and complementary outputs f]
Different level restoration leads to different logic families
Different Restoration Schemes
Swing-Restored Pass-Transistor Logic
[Figure: differential NMOS logic tree with inputs and complementary outputs f]
Parameswar, et al, CICC’94, JSSC 6/96
Other Level-Restoring Schemes
[Figure: two differential NMOS logic trees with inputs and complementary outputs f: Energy Economized Pass-Transistor Logic, and DCVS with Pass Gates (DCVS-PG)]
Pass-Transistor Logic Families
Complementary Pass-Transistor Logic (CPL)
[Figure: inputs A, B and their complements drive a pass-transistor network producing F and a complementary pass-transistor network producing the complement of F]
• Complementary functions
• Reduced number of logic levels
• Fewer transistors than CMOS
• Fast – reduced load
• Complementary inputs – complementary outputs
• VT drop – several solutions
CPL
Level restoration
Yano et al, CICC’89, JSSC 4/90
CPL
Same topology of networks, just different signal arrangements
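The idea that one network topology yields different gates by rearranging the signals can be sketched behaviorally. The model below treats each output as a pair of switches gated by B and its complement, and shows one common input wiring for AND/NAND, OR/NOR, and XOR/XNOR. It ignores threshold drops and level restoration, and the function names and wiring table are illustrative assumptions.

```python
from itertools import product

def cpl_network(sel, pass_if_sel, pass_if_not_sel):
    """Two pass switches sharing one output, gated by sel and not sel."""
    return pass_if_sel if sel else pass_if_not_sel

def cpl_gate(a, b, kind):
    """Return (F, F_bar) from two identical networks with different wiring."""
    na, nb = 1 - a, 1 - b
    wiring = {
        # F: (passed when B=1, passed when B=0), then the same for F_bar
        "and": ((a, b), (na, nb)),
        "or":  ((b, a), (nb, na)),
        "xor": ((na, a), (a, na)),
    }
    (t1, t0), (c1, c0) = wiring[kind]
    return cpl_network(b, t1, t0), cpl_network(b, c1, c0)

# Exhaustive check against the Boolean definitions.
for a, b in product((0, 1), repeat=2):
    assert cpl_gate(a, b, "and") == (a & b, 1 - (a & b))
    assert cpl_gate(a, b, "or") == (a | b, 1 - (a | b))
    assert cpl_gate(a, b, "xor") == (a ^ b, 1 - (a ^ b))
```

Only the `wiring` table changes between gate types; `cpl_network` stays the same, which mirrors the point that the topology is shared.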
Complementary Pass-Transistor Logic (CPL)
[Figure: (a) nFET logic network with inputs A, B, C and their complements, internal nodes n1–n4, and outputs Q and Qb; (b) XOR/Sum circuit with output S and its complement]
- Fast
- $V_T$ drop
- Efficient implementation of arithmetic
CPL Karnaugh Maps
[Figure: Karnaugh maps over A and B, partitioned into cubes C1 and C2, for A⋅B and its complement]
CPL vs. CMOS
Skewing Output Inverter
Differential vs. Single-Ended
LEAP Cell Library
Yano et al, CICC’94, JSSC 6/96
Goal:
• Implement full logic functionality with small library
• Rely on automated design methodology
Various Logic Functions of the LEAP Library
LEAP Comparison
Double Pass-Transistor Logic (DPL)
[Figure: DPL AND/NAND and XOR/XNOR gates with inputs A, B and their complements and complementary outputs O]
Designing DPL Gates
[Figure: Karnaugh map over A and B for A⋅B with cubes C1 to C4, and the corresponding DPL pass network]
Designing DPL Gates (2)
[Figure: Karnaugh maps over A and B for A ⊕ B with cubes C1 to C4, and the corresponding DPL pass networks]
Applications of DPL
Full adder
1.5 ns 32-bit ALU in 0.25 µm CMOS
Suzuki, ISSCC’93, JSSC 11/93
Comparison of Logic Styles
Zimmermann, Fichtner, JSSC 7/97
Results