EE241 - Spring 2012 Advanced Digital Integrated Circuits
Lecture 18: Dynamic Voltage Scaling
Outline
• Finish multiple supplies
• Dynamic voltage scaling
Supply Voltage Tradeoffs
Multiple Supplies in a Block
CVS Layout:
Usami’98
Level-Converting Flip-Flop
[Figure: level-converting flip-flop schematic with supplies VH and VL, clock CLK/CK, input D, and output Q]
Three VDD’s
[Figure: power reduction ratio as a function of V2 (V) and V3 (V), each axis spanning 0.4 to 1.4]
From Kuroda. V1 = 1.5V, VTH = 0.3V, p(t):lambda
Optimum Numbers of Supplies
[Figure: three panels for the supply sets {V1, V2}, {V1, V2, V3}, and {V1, V2, V3, V4}; each plots the supply voltage ratios (V2/V1, V3/V1, V4/V1) and the power dissipation ratio (P2/P1, P3/P1, P4/P1) versus V1 (V) from 0.5 to 1.5]
• The more VDD’s, the less power, but the effect saturates.
• The power reduction effect decreases as VDD’s are scaled.
• Optimum V2/V1 is around 0.7.
Hamada, CICC’01
Multiple Supply Voltages
• Two supply voltages per block are optimal
• Optimal ratio between the supply voltages is 0.7
• Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF)
• An option is to use an asynchronous (combinatorial) level converter
  - More sensitive to coupling and supply noise
Dual-Supply-Datapath: Layout Issue
[Figure: three datapath layouts, with a legend marking VDDH circuits, VDDL circuits, and signal flow]
(a) Dedicated row (Conventional): VDDL row, VDDH row (empty)
(b) Possible layout reduction (Conventional): VDDL row, VDDH row, complex interconnections
(c) Shared-well layout
• A shared-well technique is appropriate for random placement of cells
Standard-Cell Dual-Supply-Voltage
[Figure: (a) circuit schematic and (b) layout of alternating VDDH and VDDL circuits (signals i1, o1, i2, o2) with VDDH and VDDL rails, a common VSS, and N-well isolation]
• A VDDH circuit is assigned only to a critical path
• A VDDL circuit is used in a non-critical path and for driving a large capacitive load
Shared-Well Dual-Supply-Voltage
[Figure: (a) circuit schematic and (b) layout of VDDH and VDDL circuits (signals i1, o1, i2, o2) in a shared N-well, with VDDH and VDDL rails and a common VSS]
• Both circuits can be placed in the same N-well
• Cell layout becomes complex
• An intrinsic negative back-biasing of PMOS degrades speed
Shimazaki, ISSCC’03
Measured Results: Energy & Delay
[Figure: energy (pJ, 200 to 800) versus TCYCLE (ns, 0.6 to 1.6) at room temperature, comparing single-supply and shared-well (VDDH = 1.8V) designs; 1.16GHz point marked]
• VDDL = 1.4V: Energy -25.3%, Delay +2.8%
• VDDL = 1.2V: Energy -33.3%, Delay +8.3%
• The dual-supply technique expands the power-delay optimization space
Power/Energy Optimization Space

          Constant Throughput/Latency                                   Variable Throughput/Latency
Energy  | Design Time                      | Sleep Mode               | Run Time
Active  | Logic design, Scaled VDD,        | Clock gating             | DFS, DVS
        | Trans. sizing, Multi-VDD         |                          |
Leakage | Stack effects, Trans. sizing,    | Sleep T’s, Multi-VDD,    | + Variable VTh
        | Scaling VDD, + Multi-VTh         | Variable VTh,            |
        |                                  | + Input control          |
Adaptive Supply Voltages
Processors for Portable Devices
[Figure: performance (MIPS, 0.1 to 1000) versus processor energy (Watt*sec) for notebook computers, pocket PCs, and PDAs, with the dynamic voltage scaling region marked]
• Eliminate the performance-energy trade-off
Burd, ISSCC’00
Typical MPEG IDCT Histogram
Processor Usage Model
[Figure: desired throughput versus time, relative to the maximum processor speed, showing compute-intensive and low-latency processes, background and high-latency processes, and system idle]
System Optimizations:
• Maximize Peak Throughput
• Minimize Average Energy/operation
Burd, ISSCC’00
Common Design Approaches (Fixed VDD)
Compute ASAP:
[Figure: delivered throughput versus time; always high throughput, with excess throughput]
Clock Frequency Reduction:
[Figure: delivered throughput versus time with fCLK reduced]
Energy/operation remains unchanged… while throughput is scaled down with fCLK
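A minimal sketch of why clock frequency reduction alone leaves energy/operation unchanged, assuming the usual switching-energy model E = C·VDD² with one operation per clock cycle. The capacitance and the clock frequencies are illustrative assumptions, not values from the slides.

```python
def energy_per_op(c_eff_farads, vdd_volts):
    """Switching energy per operation, E = C_eff * VDD^2, in joules."""
    return c_eff_farads * vdd_volts ** 2


def average_power(c_eff_farads, vdd_volts, f_clk_hz):
    """Average switching power, assuming one operation per clock cycle."""
    return energy_per_op(c_eff_farads, vdd_volts) * f_clk_hz


C_EFF = 1e-9  # assumed effective switched capacitance per operation (F)
VDD = 3.3     # fixed supply voltage (V)

for f_clk in (80e6, 40e6, 20e6):  # assumed clock frequencies (Hz)
    e = energy_per_op(C_EFF, VDD)
    p = average_power(C_EFF, VDD, f_clk)
    print(f"fCLK = {f_clk / 1e6:3.0f} MHz  power = {p:.3f} W  energy/op = {e * 1e9:.2f} nJ")
```

Power falls with fCLK, but each operation still costs the same energy because VDD is unchanged.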
Scale VDD with Clock Frequency
[Figure: normalized energy/operation (0 to 1) versus normalized throughput fCLK (0 to 1); the constant-supply-voltage line stays at the 3.3V level, while scaling VDD down to 1.1V gives ~10x energy reduction]
• Reduce VDD, slow circuits down.
Burd, ISSCC’00
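The ~10x figure can be checked roughly from the two supply endpoints, assuming energy/operation scales as VDD² (a first-order model that ignores leakage and short-circuit current):

```python
def energy_ratio(vdd_high, vdd_low):
    """Ratio of energy/operation at vdd_high to that at vdd_low, assuming E ~ VDD^2."""
    return (vdd_high / vdd_low) ** 2


# Supply endpoints taken from the slide: 3.3 V and 1.1 V.
print(f"Energy/operation reduction: {energy_ratio(3.3, 1.1):.1f}x")
```

Under this assumption the reduction is about 9x, consistent with the ~10x shown.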
CMOS Circuits Track Over VDD
[Figure: normalized max. fCLK (0 to 1.0) versus VDD (VT to 4VT) for an inverter, ring oscillator, register file, and SRAM]
• Delay tracks within +/- 10%
Burd, ISSCC’00
Dynamic Voltage Scaling (DVS)
[Figure: delivered throughput versus time; (1) vary fCLK and VDD, (2) dynamically adapt]
• Dynamically scale energy/operation with throughput.
• Always minimize speed → minimize average energy/operation.
• Extend battery life up to 10x with the exact same hardware!
Burd, ISSCC’00
Operating System Sets Processor Speed
• DVS requires a voltage scheduler (VS).
• VS predicts workload to estimate CPU cycles.
• Applications supply completion deadlines.
[Figure: Processor Speed (MPEG): FDESIRED (MHz, 0 to 80), derived from CPU cycles and time, plotted over 0 to 1.4 sec]
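The scheduler's role can be sketched as below, assuming the desired speed is the predicted cycle count divided by the time left before the deadline. The function name, the single-task view, and the 5 to 80 MHz range are hypothetical choices for illustration, not details taken from the slides.

```python
def desired_frequency_mhz(cycles_remaining, time_to_deadline_s,
                          f_min_mhz=5.0, f_max_mhz=80.0):
    """Lowest clock frequency (MHz) that finishes the predicted work on time.

    cycles_remaining: predicted CPU cycles still needed (workload estimate)
    time_to_deadline_s: seconds until the application's completion deadline
    f_min_mhz, f_max_mhz: assumed operating range of the processor
    """
    if time_to_deadline_s <= 0:
        return f_max_mhz  # deadline already reached: run as fast as possible
    f_mhz = cycles_remaining / time_to_deadline_s / 1e6
    return min(max(f_mhz, f_min_mhz), f_max_mhz)


# Example: 20 million cycles predicted, deadline in 0.5 s.
print(desired_frequency_mhz(20e6, 0.5))
```

Clamping to the operating range means an overloaded system simply runs at full speed, while a nearly idle one stays at the lowest supported frequency.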
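Converter Loop Sets VDD, fCLK
[Figure: converter loop block diagram. A ring oscillator supplies fCLK to the processor. A counter, reset (RST) and latched by f1MHz, produces the 7-bit FMEAS. FMEAS is subtracted from FDES (a register set by the O.S., e.g. 0110100) to give FERR, which drives a digital loop filter and a buck converter (PENAB, NENAB, L, CDD) that supplies VDD and IDD]
• Feedback loop sets VDD so that FERR → 0.
• Ring oscillator delay-matched to CPU critical paths.
• Custom loop implementation → can optimize CDD.
Burd, ISSCC’00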
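A behavioral sketch of the loop, under loudly stated assumptions: the ring oscillator follows a simple long-channel delay model, the counter reports whole cycles per 1 µs window (so FMEAS is an integer in MHz), and the digital loop filter is a plain integrator. The constants VT, K_MHZ, and GAIN are hypothetical, and the buck converter dynamics are not modeled.

```python
VT = 0.8        # assumed threshold voltage (V)
K_MHZ = 36.0    # assumed fitting constant (gives roughly 85 MHz at 3.8 V)
GAIN = 0.01     # assumed loop gain (V per MHz of frequency error)
VDD_MIN, VDD_MAX = 1.2, 3.8  # assumed supply range (V)


def ring_osc_mhz(vdd):
    """Assumed model: ring oscillator frequency proportional to (VDD - VT)^2 / VDD."""
    return K_MHZ * (vdd - VT) ** 2 / vdd


def run_loop(f_des_mhz, vdd=VDD_MIN, max_steps=200):
    """Adjust VDD until the measured frequency matches the desired one."""
    for _ in range(max_steps):
        f_meas = int(ring_osc_mhz(vdd))  # counter: whole cycles in a 1 us window
        f_err = f_des_mhz - f_meas       # FERR = FDES - FMEAS
        if f_err == 0:
            break
        vdd += GAIN * f_err              # integrating loop filter
        vdd = min(max(vdd, VDD_MIN), VDD_MAX)
    return vdd


vdd_final = run_loop(40)
print(f"VDD = {vdd_final:.2f} V, fCLK = {ring_osc_mhz(vdd_final):.1f} MHz")
```

The loop never needs to know the voltage-to-frequency relationship explicitly: because the oscillator tracks the critical paths, driving FERR to zero is enough to land on the lowest adequate VDD.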
Design Over Wide Range of Voltages
• Circuit design constraints. (Functional verification)
• Circuit delay variation. (Timing verification)
• Noise margin reduction. (Power grid, coupling)
• Delay sensitivity. (Local power distribution)
Design verification complexity similar to
high-performance processor design @ fixed VDD
Delay Variation & Circuit Constraints
[Figure: normalized max. fCLK (0 to 1.0) versus VDD (VT to 4VT) for an inverter, ring oscillator, register file, and SRAM]
• Cannot use NMOS pass gates – fails for VDD < 2VT.
• Functional verification only needed at one VDD value.
Burd, ISSCC’00
Relative Delay Variation
[Figure: percent delay variation (-20 to +40), relative to the ring oscillator, versus VDD (VT to 4VT)]
Four extreme cases of critical paths:
• Gate
• Interconnect
• Diffusion
• Series
All vary monotonically with VDD.
• Timing verification only needed at min. & max. VDD.
Burd, ISSCC’00
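Delay Sensitivity
$$\frac{\Delta \mathrm{Delay}}{\mathrm{Delay}} = \frac{\partial \mathrm{Delay}}{\partial V_{DD}} \cdot \frac{\Delta V_{DD}}{\mathrm{Delay}(V_{DD})}, \qquad \Delta V_{DD} = I(V_{DD}) \cdot R$$
[Figure: normalized ΔDelay / Delay (0 to 1) versus VDD (VT to 4VT)]
• Design of the local power grid (for timing constraints) only needs to consider VDD ≈ 2VT.
Burd, ISSCC’00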
Multiple Path Tracking
A. Drake, ISSCC’07
Alternative: Error Detection
Bull, ISSCC’2010
Design for Dynamically Varying VDD
• Static CMOS logic.
• Ring oscillator.
• Dynamic logic (& tri-state busses).
• Sense amp (& memory cell).
Max. allowed |dVDD/dt| → Min. CDD = 100nF (0.6µm)
Circuits continue to operate properly as VDD changes
Static CMOS Logic
[Figure: static CMOS gate with Vin = 0 and Vout = VDD; the output node Vout, loaded by CL, is tied to VDD through rds|PMOS]
• τ (max.) = 4ns
• 0.6µm CMOS: |dVDD/dt| < 200V/µs
• Static CMOS robustly operates with varying VDD.
Ring Oscillator
[Figure: simulated VDD and fCLK waveforms (volts, 0 to 4) versus time (60 to 260 ns), with dVDD/dt = 20V/µs]
• Output fCLK instantaneously adapts to new VDD.
Dynamic Logic
[Figure: dynamic logic gate (clk, Vin, Vout) in the evaluation state (clk = 1), with waveforms of VDD, clk, and Vout in volts versus time]
Errors:
• False logic low: ΔVDD > VTP
• Latch-up: ΔVDD > Vbe
0.6µm CMOS: |dVDD/dt| < 20V/µs
• Cannot gate clock in evaluation state.
• Tri-state busses fail similarly → Use hold circuit.
Measured System Performance & Energy
[Figure: Dhrystone 2.1 MIPS (0 to 100) versus energy (mW/MIPS, 0 to 6), comparing dynamic VDD and static VDD]
• 85 MIPS @ 5.6 mW/MIPS (3.8V)
• 6 MIPS @ 0.54 mW/MIPS (1.2V)
• Dynamic operation can increase energy efficiency > 10x.
Burd, ISSCC’00
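The > 10x claim follows directly from the two measured operating points quoted on the slide. A short check is sketched below; the variable names are illustrative.

```python
# Measured operating points from the slide: (MIPS, energy in mW/MIPS).
HIGH_SPEED = (85.0, 5.6)   # at VDD = 3.8 V
LOW_SPEED = (6.0, 0.54)    # at VDD = 1.2 V


def total_power_mw(point):
    """Total power in mW: throughput (MIPS) times energy per operation (mW/MIPS)."""
    mips, mw_per_mips = point
    return mips * mw_per_mips


efficiency_gain = HIGH_SPEED[1] / LOW_SPEED[1]

print(f"Power at 3.8 V: {total_power_mw(HIGH_SPEED):.0f} mW")
print(f"Power at 1.2 V: {total_power_mw(LOW_SPEED):.2f} mW")
print(f"Energy efficiency gain: {efficiency_gain:.1f}x")
```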
VDD-Hopping
[Figure: MPEG-4 encoding timeline of slices #n and #n+1, marking where the n-th slice finished and the next milestone; normalized power (0 to 1) versus number of frequency levels (1, 2, 3, 8)]
• Transition time between ƒ levels = 200µs
• Application slicing and software feedback guarantee real-time operation.
• Two hopping levels are sufficient.
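One way to picture the software feedback is sketched here under stated assumptions: at the start of each slice, pick the lowest frequency level that still reaches the next milestone, charging the 200µs transition time on every decision. The function name, the two levels (0.5 and 1.0 of full speed), and the worst-case cycle estimate are hypothetical.

```python
def pick_level(worst_case_cycles, time_to_milestone_s, f_max_hz,
               levels=(0.5, 1.0), t_transition_s=200e-6):
    """Return the lowest normalized frequency level that meets the milestone.

    worst_case_cycles: assumed worst-case cycle count of the next slice
    time_to_milestone_s: time left until the next milestone (s)
    levels: assumed hopping levels as fractions of f_max_hz
    t_transition_s: transition time between levels (200 us on the slide),
                    conservatively charged even if the level does not change
    """
    for level in sorted(levels):
        run_time = worst_case_cycles / (level * f_max_hz)
        if run_time + t_transition_s <= time_to_milestone_s:
            return level
    return max(levels)  # nothing fits: fall back to full speed


# The previous slice finished early, so there is slack: the low level suffices.
print(pick_level(1e6, 0.025, 100e6))
# Little slack left: full speed is needed.
print(pick_level(1e6, 0.015, 100e6))
```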
Clock gating
Requires careful skew control ... Well handled in today’s EDA tools
Clock-gating efficiently reduces power
[Figure: MPEG4 decoder power (mW, axis 0 to 25) with and without clock gating; chip blocks DEU, VDE, MIF, DSP/HIF, and 896Kb SRAM]
• Without clock gating: 30.6mW
• With clock gating: 8.5mW
• 90% of F/F’s were clock-gated.
• 70% power reduction by clock-gating alone.
Courtesy M. Ohashi, Matsushita, ISSCC 2002
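The 70% figure can be reproduced from the two measured power numbers on the slide. A small check is sketched below; the function name is illustrative.

```python
def power_reduction(p_before_mw, p_after_mw):
    """Fractional power saving when going from p_before_mw to p_after_mw."""
    return 1.0 - p_after_mw / p_before_mw


# Measured MPEG4 decoder power from the slide.
saving = power_reduction(30.6, 8.5)
print(f"Power reduction from clock gating: {saving:.0%}")
```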
Local Clock Gating
[Figure: ‘clock on demand’ flip-flop schematic with annotated transistor sizes; blocks: data-transition look-ahead, XNOR, and pulse generator; signals D, DI, Q, CKI, CKIB, and CP]
Circuit-Level Activity Encoding
Conditional Inversion Coding for Interconnect
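The slide gives only the title, so the following is a sketch of one common form of conditional inversion, usually called bus-invert coding, and not necessarily the exact scheme shown in lecture. Each word is sent inverted, with an extra invert line asserted, whenever sending it unmodified would toggle more than half of the bus lines. The 8-bit width, the all-zero initial bus state, and the function names are assumptions.

```python
def bus_invert_encode(words, width=8):
    """Return (bus_value, invert_bit) pairs for a sequence of data words."""
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the bus lines
    encoded = []
    for word in words:
        word &= mask
        toggles = bin(word ^ prev).count("1")
        invert = 2 * toggles > width  # more than half the lines would switch
        bus = (~word & mask) if invert else word
        encoded.append((bus, int(invert)))
        prev = bus
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(bus ^ mask) if inv else bus for bus, inv in encoded]


def count_toggles(values):
    """Total bit transitions on the data lines, starting from all zeros."""
    prev, total = 0, 0
    for value in values:
        total += bin(value ^ prev).count("1")
        prev = value
    return total


data = [0x00, 0xFF, 0x0F, 0xF0]
enc = bus_invert_encode(data)
print("raw toggles:    ", count_toggles(data))
print("encoded toggles:", count_toggles([bus for bus, _ in enc]))
```

The comparison counts data-line transitions only; transitions on the invert line itself add a small overhead that a full accounting would include.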
Number Representation
Next Lecture
Leakage management