Micro transductors ’08, Low Power VLSI Design 2
Dr.-Ing. Frank Sill Department of Electrical Engineering, Federal University of Minas Gerais, Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil
[email protected] http://www.cpdee.ufmg.br/~frank/
Agenda
Recap
Power reduction on:
  Gate level
  Architecture level
  Algorithm level
  System level
Copyright Sill, 2008 Micro transductors ‘08, Low Power 2
Recap: Problems of Power Dissipation
Continuously increasing performance demands
⇒ Increasing power dissipation of technical devices
⇒ Today: power dissipation is a main problem
High power dissipation leads to:
  Reduced time of operation
  High effort for cooling
  Higher weight (batteries)
  Increasing operational costs
  Reduced mobility
  Reduced reliability
Recap: Consumption in CMOS
Water analogy:
  Voltage (Volt, V) ↔ Water pressure (bar)
  Current (Ampere, A) ↔ Water quantity per second (liter/s)
  Energy ↔ Amount of water
[Figure: output switching between 1 and 0 across the load capacitance CL]
Energy consumption is proportional to capacitive load!
Recap: Energy and Power
Power is the height of the curve
[Figure: power (Watts) over time for Approach 1 and Approach 2]
Energy is the area under the curve
[Figure: power (Watts) over time for Approach 1 and Approach 2]
Energy = Power * time for calculation = Power * Delay
Recap: Power Equations in CMOS
$P = \alpha f C_L V_{DD}^2 + V_{DD} I_{peak} (P_{0 \to 1} + P_{1 \to 0}) + V_{DD} I_{leak}$
First term: dynamic power (≈ 40 - 70% today and decreasing relatively)
Second term: short-circuit power (≈ 10% today and decreasing absolutely)
Third term: leakage power (≈ 20 - 50% today and increasing)
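A minimal sketch of how the three terms of the power equation could be evaluated numerically. The function name `cmos_power`, the parameter `p_sc` (standing for the bracket P0→1 + P1→0) and all example values are hypothetical and chosen only for illustration.

```python
def cmos_power(alpha, f_clk, c_load, v_dd, i_peak, p_sc, i_leak):
    """Return (dynamic, short_circuit, leakage, total) power in watts.

    p_sc is an assumed name for the bracket (P_0->1 + P_1->0).
    """
    dynamic = alpha * f_clk * c_load * v_dd ** 2
    short_circuit = v_dd * i_peak * p_sc
    leakage = v_dd * i_leak
    return dynamic, short_circuit, leakage, dynamic + short_circuit + leakage


# Hypothetical example values (not taken from the slides).
dyn, sc, leak, total = cmos_power(
    alpha=0.1, f_clk=100e6, c_load=1e-9, v_dd=1.2,
    i_peak=1e-3, p_sc=0.2, i_leak=5e-3,
)
print(f"dynamic={dyn:.4f} W, short-circuit={sc:.4f} W, leakage={leak:.4f} W")
```

Because only the dynamic term depends on the square of VDD, halving the supply voltage in this sketch quarters the dynamic power but only halves the other two terms.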
Copyright Sill, 2008 Micro transductors ‘08, Low Power 2 6 RecapRecap:: LevelsLevels ofof OptimizationOptimization MEM MEM
Copyright Sill, 2008 Micro transductors ‘08, Low Power 2 7 nach Massoud Pedram Recap:Recap: LogicLogic RestructuringRestructuring
Logic restructuring: changing the topology of a logic network to reduce transitions
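AND: P0→1 = P0 * P1 = (1 - PA PB) * PA PB
For PA = PB = 0.5: (1 - 0.25) * 0.25 = 3/16
[Figure: chain implementation (W = A·B with 3/16 = 0.188, X = W·C with 7/64 = 0.109, F = X·D with 15/256) versus tree implementation (Y = A·B and Z = C·D with 3/16 = 0.188 each, F = Y·Z with 15/256); all inputs A, B, C, D have signal probability 0.5]
⇒ Chain implementation has a lower overall switching activity than tree implementation for random inputs
BUT: Ignores glitching effects
Source: Timmermann, 2007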
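The comparison of chain and tree can be reproduced with a short calculation. This is a sketch under the slide's assumptions (independent inputs, AND gates only, glitching ignored); the function names are hypothetical.

```python
from fractions import Fraction


def p_0_to_1(p_one):
    """0->1 transition probability of a node with signal probability p_one."""
    return (1 - p_one) * p_one


def chain_activity(probs):
    """Sum of P0->1 over all gate outputs of a chain of 2-input AND gates."""
    total = Fraction(0)
    p = probs[0]
    for q in probs[1:]:
        p = p * q              # signal probability of the next AND output
        total += p_0_to_1(p)
    return total


def tree_activity(probs):
    """Same sum for a balanced tree (input count must be a power of two)."""
    total = Fraction(0)
    level = list(probs)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            p = level[i] * level[i + 1]
            total += p_0_to_1(p)
            next_level.append(p)
        level = next_level
    return total


inputs = [Fraction(1, 2)] * 4
print("chain:", chain_activity(inputs))  # 3/16 + 7/64 + 15/256
print("tree: ", tree_activity(inputs))   # 3/16 + 3/16 + 15/256
```

For four inputs with probability 0.5 the chain sums to 91/256 and the tree to 111/256, which matches the statement that the chain switches less when glitches are ignored.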
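Recap: Input Ordering
[Figure: 3-input AND built from two 2-input gates, with signal probabilities P(A) = 0.5, P(B) = 0.2, P(C) = 0.1
 Ordering 1, X = A·B then F = X·C: P0→1 at X = (1 - 0.5 x 0.2) * (0.5 x 0.2) = 0.09
 Ordering 2, X = B·C then F = X·A: P0→1 at X = (1 - 0.2 x 0.1) * (0.2 x 0.1) = 0.0196]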
AND: P0→1 = (1 - PAPB) * PAPB
Beneficial: postponing introduction of signals with a high transition rate (signals with signal probability close to 0.5)
Source: Irwin, 2000
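As an illustration, the activity of the internal node X can be computed for every choice of the first input pair. This is only a sketch using the probabilities from the figure; the function names are hypothetical, and the output node F is left out because its signal probability is the same for every ordering.

```python
from itertools import combinations


def p_0_to_1(p_one):
    """0->1 transition probability of a node with signal probability p_one."""
    return (1.0 - p_one) * p_one


def internal_node_activity(probs):
    """P0->1 at the internal AND node X for each possible first input pair."""
    result = {}
    for a, b in combinations(sorted(probs), 2):
        result[(a, b)] = p_0_to_1(probs[a] * probs[b])
    return result


probs = {"A": 0.5, "B": 0.2, "C": 0.1}
activity = internal_node_activity(probs)
best_pair = min(activity, key=activity.get)
for pair, value in activity.items():
    print(pair, round(value, 4))
print("lowest activity at X when combining first:", best_pair)
```

The pair (B, C) gives the lowest activity, so the input with probability 0.5 is introduced last, as the slide recommends.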
Recap: Glitching
[Figure: circuit with inputs A, B, C, internal node X and output Z; waveforms of X and Z for the input change ABC = 101 → 000 under a unit-delay model]
Source: Irwin, 2000
Design Layer: Gate Level
Basic elements:
  Logic gates
  Sequential elements (flipflops, latches)
Behavior of elements is described in libraries
Dynamic Power and Device Size
Device Sizing (= changing gate width)
⇒ Affects input capacitance Cin
⇒ Affects load capacitance Cload
⇒ Affects dynamic power consumption Pdyn
Optimal fanout factor f for Pdyn is smaller than for performance (especially for large loads)
e.g., for Cload = 20, Cin = 1
⇒ fcircuit = 20
⇒ fopt_energy = 3.53
⇒ fopt_performance = 4.47
For low power: avoid oversizing (f too big) beyond the optimum
[Figure: normalized energy (0 to 1.5) versus fanout f (1 to 7) for fcircuit = 1, 2, 5, 10, 20]
Source: Nikolic, UCB
VDD versus Delay and Power
[Figure: relative delay td (0 to 6) and relative dynamic power Pdyn (0 to 10) versus supply voltage VDD (0.8 to 2.4)]
Delay (td) and dynamic power consumption (Pdyn) are functions of VDD
Multiple VDD
Main ideas:
Use of different supply voltages within the same design
  High VDD for critical parts (high performance needed)
  Low VDD for non-critical parts (only low performance demands)
At design phase:
Determine critical path(s) (see the slide after next)
High VDD for gates on those paths
Lower VDD on the other gates (in non-critical paths)
For low VDD: prefer gates that drive large capacitances (yields the largest energy benefits)
Usually two different VDD (but more are possible)
Multiple VDD cont’d
Level converters:
  Necessary when a module at the lower supply drives a gate at the higher supply (step-up)
  If a gate supplied with VDDL drives a gate supplied with VDDH ⇒ PMOS never turns off
  Possible implementation:
    Cross-coupled PMOS transistors
    NMOS transistors operate on reduced supply
  No need for level converters for a step-down change in voltage
  Reducing the overhead:
    Conversions at register boundaries
    Embedding of the level conversion inside flipflops
[Figure: level converter between VDDL and VDDH with input Vin and output Vout]
Data Paths
Data propagate through different data paths between registers (flipflops, FF)
Paths mostly differ in propagation delay times
Frequency of clock signal (CLK) depends on path with longest delay ⇒ critical path
[Figure: data paths between registers]
Data Paths: Slack
[Figure: timing diagram for gate G1 (inputs A, B, output Y) driving gate G2 (further input C); marked over time: all inputs of G1 arrived, delay of G1, G1 ready with evaluation, slack for G1, all inputs of G2 arrived]
Multiple VDD in Data Paths
Minimum energy consumption when all logic paths are critical (same delay)
Possible algorithm: clustered voltage-scaling
  Each path starts with VDDH and switches to VDDL (blue gates) when slack is available
  Level conversion in flipflops at end of paths
[Figure: data paths with gates connected to VDDL (blue) and gates connected to VDDH]
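The idea of switching to VDDL where slack is available can be sketched for a single path. This is a strongly simplified, hypothetical greedy procedure and not the full clustered voltage-scaling algorithm, which works on the whole netlist. The function name, the delay factor `slowdown` for gates at VDDL and the example numbers are assumptions.

```python
def assign_vddl_on_path(delays_vddh, slowdown, clock_period):
    """Greedy single-path sketch of the VDDH -> VDDL assignment.

    delays_vddh: gate delays at VDDH, ordered from path start to path end.
    slowdown:    assumed delay factor (> 1) of a gate when moved to VDDL.
    Only a suffix of the path is moved to VDDL, so the path starts at VDDH,
    switches to VDDL once, and level conversion is needed only at the
    flipflop at the end of the path.
    """
    supplies = ["H"] * len(delays_vddh)
    path_delay = sum(delays_vddh)
    for i in range(len(delays_vddh) - 1, -1, -1):
        extra = delays_vddh[i] * (slowdown - 1.0)
        if path_delay + extra > clock_period:
            break              # no slack left for this gate
        supplies[i] = "L"
        path_delay += extra
    return supplies, path_delay


# Hypothetical path: four gates of 2 ns each, 10 ns clock period.
supplies, delay = assign_vddl_on_path([2.0, 2.0, 2.0, 2.0], 1.5, 10.0)
print(supplies, delay)
```

In this example the last two gates move to VDDL and the path delay grows from 8 ns to exactly the clock period, so the path has become critical.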
Design Layer: Architecture Level
Also known as register transfer level (RTL)
Base elements:
  Register structures
  Arithmetic logic units (ALU)
  Memory elements
Only behavior is described (no inner structure)
Clock Gating
Most popular method for power reduction of clock signals and functional units
Gate off clock to idle functional units
Logic for generation of disable signal necessary
  Higher complexity of control logic
  Higher power consumption
  Timing is critical for avoiding clock glitches at the OR gate output
  Additional gate delay on clock signal
[Figure: register (Reg) feeding a functional unit; the register clock is the OR of clock and disable]
Source: Irwin, 2000
Clock Gating cont’d
Clock-Gating in Low-Power Flip-Flop
[Figure: flip-flop with data input D, output Q and gated clock CLK]
Source: Agarwal, 2007
Clock Gating cont’d
Clock gating by considering the state in finite-state machines (FSM)
[Figure: FSM with primary inputs (PI), combinational logic, primary outputs (PO) and flip-flops; clock activation logic and a latch gate CLK to the flip-flops]
Source: L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.
Clock Gating: Example
MPEG4 decoder
  Without clock gating: 30.6 mW
  With clock gating: 8.5 mW
90% of flipflops clock-gated
70% power reduction by clock-gating
[Figure: bar chart of power [mW] without and with clock gating; chip plot with blocks DEU, VDE, MIF, DSP/HIF and 896Kb SRAM]
Source: M. Ohashi, Matsushita, 2002
Recap: VDD versus Delay and Power
[Figure: relative delay td (0 to 6) and relative dynamic power Pdyn (0 to 10) versus supply voltage VDD (0.8 to 2.4)]
Dynamic power can be traded for delay
A Reference Datapath
[Figure: input register → combinational logic (Cref) → output register, clocked by CLK]
Supply voltage = Vref
Total capacitance switched per cycle = Cref
Clock frequency = fclk
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f_{clk}$
Source: Agarwal, 2007
Parallel Architecture
Each copy processes every Nth input, operates at reduced voltage
Supply voltage: VN ≤ Vref
N = degree of parallelism
[Figure: input register (fclk) feeding N copies of the combinational logic, each with its own register clocked at fclk/N; an N to 1 multiplexer drives the output register (fclk); multiphase clock generator and mux control driven by CK]
Source: Agarwal, 2007
Pipelined Architecture
Reduces the propagation time of a block by factor N
⇒ Voltage can be reduced at constant clock frequency
Constant throughput
[Figure: block of area A between registers (CLK) split into N pipeline stages of area A/N each, separated by registers (CLK)]
[Figure: functionality, timing of Data and CLK]
Parallel Architecture: Example
Reference datapath (for example)
[Figure: datapath with inputs A and B, an adder and a comparator]
Critical path delay = Tadder + Tcomparator (= 25 ns) ⇒ fref = 40 MHz
Total capacitance being switched = Cref
VDD = Vref = 5V
Power for reference datapath: $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$
Source: Irwin, 2000
Parallel Architecture: Example cont’d
Area = 1476 x 1219 µm²
The clock rate can be reduced by half with the same throughput ⇒ fpar = fref / 2
Vpar = Vref / 1.7, Cpar = 2.15 Cref
$P_{par} = (2.15\, C_{ref}) (V_{ref} / 1.7)^2 (f_{ref} / 2) \approx 0.36\, P_{ref}$
Source: Irwin, 2000
Pipelined Architecture: Example
fpipe = fref, Cpipe = 1.1 Cref, Vpipe = Vref / 1.7
Voltage can be dropped while maintaining the original throughput
$P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1\, C_{ref}) (V_{ref} / 1.7)^2 f_{ref} \approx 0.37\, P_{ref}$
Source: Irwin, 2000
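The two examples can be checked with a small calculation of power relative to Pref = Cref Vref² fref. The function name is hypothetical. With the rounded voltage divisor 1.7 the results come out at roughly 0.37 and 0.38, slightly above the 0.36 and 0.37 quoted on the slides, which presumably rest on an unrounded voltage value.

```python
def relative_power(c_factor, v_divisor, f_factor):
    """Power relative to P_ref for C = c_factor * C_ref,
    V = V_ref / v_divisor and f = f_factor * f_ref."""
    return c_factor * f_factor / v_divisor ** 2


# Parallel datapath: 2.15x capacitance, half the clock frequency.
p_par = relative_power(c_factor=2.15, v_divisor=1.7, f_factor=0.5)
# Pipelined datapath: 1.1x capacitance, unchanged clock frequency.
p_pipe = relative_power(c_factor=1.1, v_divisor=1.7, f_factor=1.0)

print(f"P_par / P_ref  = {p_par:.3f}")
print(f"P_pipe / P_ref = {p_pipe:.3f}")
```

Both architectures save a similar amount of power in this sketch; the pipeline does so with far less additional capacitance.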
Approximate Trend
                 N-parallel proc.          N-stage pipeline proc.
Capacitance      N*Cref                    Cref
Voltage          Vref/N                    Vref/N
Frequency        fref/N                    fref
Dynamic Power    Cref Vref^2 fref / N^2    Cref Vref^2 fref / N^2
Chip area        N times                   10-20% increase
Source: G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
Guarded Evaluation
Reduction of switching activity by adding latches at inputs
[Figure: multiplier with inputs A, B and condition C, without and with latches at the inputs controlled by the condition]
Latch preserves previous value of inputs to suppress activity
Could also use AND gates to mask inputs to zero (= forced zero)
Precomputation
[Figure: precomputed inputs go to register R1 and to the precomputation logic g(X); gated inputs go to register R2, whose load is disabled by g(X); R1 and R2 feed the combinational logic f(X), which produces the outputs]
Identify logical conditions at the inputs under which the output is invariant
Since those inputs don’t affect the output, disable input transitions
Trade area for energy
Source: Irwin, 2000
Precomputation: Design Issues
Design steps
1. Selection of precomputation architecture
2. Determination of precomputed and gated inputs (register R1 should be much smaller than R2)
3. Search for a good implementation of g(X)
4. Evaluation of potential energy savings based on input statistics (if savings are not sufficient, go to step 2 or 3 and try again)
Also works for multiple-output functions, where g(X) is the product of gj(X) over all j
Source: Irwin, 2000
Precomputation: Example
Binary Comparator
[Figure: n-bit binary value comparator A > B; the most significant bits An, Bn are stored in R1, the remaining bits An-1, Bn-1, ..., A1, B1 in R2; the load of R2 is disabled unless An = Bn]
Can achieve up to 75% power reduction with 3% area overhead and 1 to 5 additional gate delays in worst case path
Source: Irwin, 2000
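A behavioral sketch of the comparator with precomputation, assuming unsigned operands and that R2 is loaded only when the most significant bits are equal. The function names and the tuple used to model the contents of R2 are hypothetical.

```python
def compare_with_precomputation(a, b, n_bits, r2_state):
    """Return (a > b, new R2 contents, whether R2 was loaded).

    R1 always captures the MSBs. If they differ, the result is already
    decided and R2 keeps its old contents, so the low-order comparator
    logic sees no input transitions.
    """
    msb = n_bits - 1
    a_msb = (a >> msb) & 1
    b_msb = (b >> msb) & 1
    low_mask = (1 << msb) - 1
    if a_msb != b_msb:
        return a_msb > b_msb, r2_state, False
    r2_state = (a & low_mask, b & low_mask)
    return r2_state[0] > r2_state[1], r2_state, True


def load_fraction(n_bits):
    """Fraction of all input pairs for which R2 has to be loaded."""
    loads = 0
    total = 0
    r2 = (0, 0)
    for a in range(1 << n_bits):
        for b in range(1 << n_bits):
            result, r2, loaded = compare_with_precomputation(a, b, n_bits, r2)
            assert result == (a > b)   # behavior matches a plain comparator
            loads += loaded
            total += 1
    return loads / total


print(load_fraction(4))
```

For uniformly distributed inputs R2 is loaded for only half of the input pairs; the actual power saving depends on the input statistics, as noted in the design steps.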
Adder Design
Various algorithms exist to implement an integer adder
  Ripple, select, skip (x2), look-ahead, conditional-sum
  Each with its own characteristics of timing and power consumption
[Figure: full-adder (FA) block diagrams of ripple carry, carry select, variable/fixed width carry skip and carry look-ahead adders]
Source: Mendelson, Intel
Adder Design
                              Energy (pJ)   Delay (ns)
Ripple Carry                  117           54.27
Constant Width Carry Skip     109           28.38
Variable Width Carry Skip     126           21.84
Carry Lookahead               171           17.13
Carry Select                  216           19.56
Conditional Sum               304           20.05
Adders differ in energy and delay
⇒ Different adders for different applications
⇒ Also true for other units (multiplier, counter, …)
Source: Callaway, Swartzlander “Estimating the power consumption of CMOS adders” - 11th Symposium on Computer Arithmetic, 1993. Proceedings.
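To illustrate "different adders for different applications", the table values can be ranked by different criteria. The energy-delay product used here is a common combined metric but is an addition for illustration; it does not appear on the slide.

```python
# (energy in pJ, delay in ns), values taken from the table above.
adders = {
    "Ripple Carry": (117, 54.27),
    "Constant Width Carry Skip": (109, 28.38),
    "Variable Width Carry Skip": (126, 21.84),
    "Carry Lookahead": (171, 17.13),
    "Carry Select": (216, 19.56),
    "Conditional Sum": (304, 20.05),
}

lowest_energy = min(adders, key=lambda name: adders[name][0])
fastest = min(adders, key=lambda name: adders[name][1])
# Energy-delay product (pJ * ns) as an assumed combined figure of merit.
best_edp = min(adders, key=lambda name: adders[name][0] * adders[name][1])

print("lowest energy:", lowest_energy)
print("fastest:      ", fastest)
print("best EDP:     ", best_edp)
```

Each criterion selects a different adder, which is the point of the slide.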
Bus Power
Buses are a significant source of power dissipation
  50% of dynamic power for interconnect switching (Magen, SLIP 04)
  MIT Raw processor’s on-chip network consumes 36% of total chip power (Wang et al. 2003)
Caused by:
  High switching activities
  Large capacitive loading
[Figure: bus with bus drivers (inputs Ain, Bin, Cin, Din) and bus receivers (outputs Wout, Xout, Yout, Zout)]
Source: Irwin, 2000
Bus Power Reduction
For an n-bit bus: $P_{bus} = n\, \alpha\, f_{clk}\, C_{load}\, V_{DD}^2$
Alternative bus structures
  Segmented buses (lower Cload)
  Charge recovery buses
  Bus multiplexing (lower fclk possible)
Minimizing bus traffic (n)
  Code compression
  Instruction loop buffers
Minimization of bit switching activity (α) by data encoding
Minimize voltage swing (VDD^2) using differential signaling
Source: Irwin, 2000
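A sketch of how the bus power formula could be applied to a recorded sequence of bus words. It assumes that α is the average probability of a 0→1 transition per bit and cycle; the function names and all numbers are hypothetical.

```python
def switching_activity(words, n_bits):
    """Average number of 0->1 transitions per bit and transfer."""
    mask = (1 << n_bits) - 1
    rising = 0
    for prev, cur in zip(words, words[1:]):
        rising += bin(~prev & cur & mask).count("1")
    transfers = len(words) - 1
    return rising / (n_bits * transfers)


def bus_power(n_bits, alpha, f_clk, c_load, v_dd):
    """P_bus = n * alpha * f_clk * C_load * V_DD^2, in watts."""
    return n_bits * alpha * f_clk * c_load * v_dd ** 2


# Hypothetical 4-bit trace that toggles all bits in every cycle.
trace = [0b0000, 0b1111, 0b0000]
alpha = switching_activity(trace, n_bits=4)
print(alpha, bus_power(4, alpha, f_clk=100e6, c_load=1e-12, v_dd=1.2))
```

Every factor in `bus_power` corresponds to one of the reduction options listed above: fewer bits, lower activity, lower clock frequency, smaller load or smaller voltage swing.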
Reducing Shared Resources
Shared resources incur switching overhead
Local bus structures reduce overhead
[Figure: global bus architecture versus local bus architecture]
Source: Irwin, 2000
Reducing Shared Resources cont’d
Bus segmentation
  Another way to reduce shared buses
  Control of bus segments by controller blocks (B)
[Figure: shared bus versus segmented bus with controller blocks (B)]
Source: Evgeny Bolotin – Jan 2004
Design Layer: Algorithm Level
Base elements:
  Functions
  Procedures
  Processes
  Control structures
Description of design behavior
Coding styles
Use processor-specific instruction style:
  Variable types
  Function calls style
  Conditionalized instructions (for ARM)
Follow general guidelines for software coding
  Use table look-up instead of conditionals
  Make local copies of global variables so that they can be assigned to registers
  Avoid multiple memory look-ups with pointer chains
Source-code Transformations
Minimize power-consuming activity:
Computation
  Before: A*B+A*C
  After:  A*(B+C)
Communication
  Before: for (c = 1..N)
            receive (A)
            B=c*A
  After:  receive (A)
          for (c = 1..N)
            B=c*A
Storage
  Before: for (c = 1..N)
            B[c] = A[c]*D[c]
          for (c = 1..N)
            F[c] = B[c]-1
  After:  for (c = 1..N)
            F[c] = A[c]*D[c]-1
Datapath Energy Consumption
[Figure: stacked bar chart of switched capacitance (nF, 0 to 14000) for bubble.c, heap.c and quick.c, split into Register File, Pipeline Registers, Functional Unit and Others]
⇒ Algorithms can differ in power dissipation
Source: Irwin, 2000
Adaptive Dynamic Voltage Scaling (DVS)
Slow down processor to fill idle time
More delay ⇒ lower operational voltage
[Figure: alternating Active/Idle phases at 3.3 V versus continuously Active operation at 2.4 V]
Runtime scheduler determines processor speed and selects appropriate voltage
Transition delay for frequency changes ~150 μs
Potential to realize 10x energy savings
Adaptive DVS: Example
Task with 100 ms deadline, requires 50 ms CPU time at full speed
  Normal system gives 50 ms computation, 50 ms idle/stopped time
  Half speed/voltage system gives 100 ms computation, 0 ms idle
  Same number of CPU cycles but: $E = C (V_{DD}/2)^2 = E_{ref} / 4$
Dynamic Voltage Scaling adapts voltage to workload
[Figure: speed over time for tasks T1 and T2, full speed with idle phases versus reduced speed without idle; same work, lower energy]
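The example can be reproduced with a few lines. The sketch assumes, as the slide does, that the supply voltage scales linearly with processor speed and that only dynamic energy counts; the function name is hypothetical.

```python
def run_task(cpu_time_full_speed_ms, deadline_ms, speed_fraction):
    """Return (runtime in ms, idle time in ms, energy relative to full speed).

    Assumes V_DD scales linearly with speed, so the energy per cycle
    scales with speed_fraction ** 2 while the cycle count stays the same.
    """
    runtime = cpu_time_full_speed_ms / speed_fraction
    idle = deadline_ms - runtime
    if idle < 0:
        raise ValueError("deadline missed at this speed")
    relative_energy = speed_fraction ** 2
    return runtime, idle, relative_energy


print(run_task(50, 100, 1.0))   # full speed: 50 ms busy, 50 ms idle
print(run_task(50, 100, 0.5))   # half speed: 100 ms busy, no idle time
```

Running at half speed and half voltage fills the idle time completely and needs a quarter of the energy for the same work.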
Design Layer: System Level
Basic elements:
  Complex modules
  Processors
  Calculation and control units
  Sensors
[Figure: system built from modules such as ALU, MEM and MP3]
Dynamic Power Management
Systems are:
  Designed to deliver peak performance, but …
  Not needing peak performance most of the time
  Components are idle sometimes
Dynamic power management (DPM):
  Puts components in low-power non-operational states when idle
Power manager:
  Observes and controls the system
  Power consumption of power manager is negligible
Processor Sleep Modes
Software power control - power management
  DOZE: most units stopped except on-chip cache memory (cache coherency)
  NAP: cache also turned off, PLL still on, time out or external interrupt to resume
  SLEEP: PLL off, external interrupt to resume
Deeper sleep mode requires more latency to resume
Deeper sleep mode consumes less power
Processor Sleep Modes: Example
PowerPC sleep modes
Mode                   66 MHz    80 MHz
No power mgmt          2.18 W    2.54 W
Dynamic power mgmt     1.89 W    2.20 W
DOZE                   307 mW    366 mW
NAP                    113 mW    135 mW
SLEEP                  89 mW     105 mW
SLEEP without PLL      18 mW     19 mW
SLEEP without clock    2 mW      2 mW
10 cycles to wake up from SLEEP
100 us to wake up from SLEEP+
Source: Irwin, 2000
Transmeta LongRun
Applies adaptive DVS
LongRun policies:
  Detection of different workload scenarios
  Based on runtime performance information
After detection ⇒ corresponding adaptation of:
  Processor supply voltage
  Processor frequency
Clock frequency always within limits required by supply voltage to avoid clock skew problems
Use of hard-coded core frequency/voltage operating points
⇒ Best trade-off between performance and power possible
Transmeta LongRun cont’d
[Figure: % of max power consumption (0 to 100) versus frequency (300 to 1000 MHz), with typical operating region and peak performance region marked]
Operating points:
Frequency   300 MHz   433 MHz   533 MHz   667 MHz   800 MHz   900 MHz   1000 MHz
Voltage     0.80 V    0.87 V    0.95 V    1.05 V    1.15 V    1.25 V    1.30 V
Source: Transmeta
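A rough estimate of the relative power at each operating point, assuming that only dynamic power matters and that it scales with f·VDD² at constant switching activity and capacitance. This is an illustrative model and not the measured curve from the Transmeta figure; the function name is hypothetical.

```python
# (frequency in MHz, supply voltage in V), taken from the operating points above.
operating_points = [
    (300, 0.80), (433, 0.87), (533, 0.95), (667, 1.05),
    (800, 1.15), (900, 1.25), (1000, 1.30),
]


def relative_dynamic_power(f_mhz, v_dd, f_max=1000, v_max=1.30):
    """Dynamic power relative to the fastest operating point (P ~ f * V^2)."""
    return (f_mhz * v_dd ** 2) / (f_max * v_max ** 2)


for f_mhz, v_dd in operating_points:
    share = 100 * relative_dynamic_power(f_mhz, v_dd)
    print(f"{f_mhz:5d} MHz at {v_dd:.2f} V: {share:5.1f} % of max power")
```

Under this model the 300 MHz point needs only about 11% of the maximum power while still delivering 30% of the peak clock frequency.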
Transmeta LongRun: Example
Source: Transmeta
Battery aware design
Non-linear effects influence life time of batteries
“Rate Capacity”
  If discharging currents are higher than allowed (rated current)
  ⇒ real capacity goes under nominal capacity
[Figure: capacity (mAh, 0 to 1000) versus discharge current (mA); standard capacity 1000 mAh, rated current 125 mA]
“Battery Recovery”
  Pulsed discharge increases nominal capacity
  Based on recovery times (as long as there is no rate capacity effect)
[Figure: available charge (mA) and discharge current (mA) over time, with idle phases]
Source: Timmermann, 2007
Battery aware design cont’d
Diffusion model from Rakhmatov, Vrudhula et al.
[Figure: distribution of electro-active species at the electrode for four battery states: fully charged battery, after a recent discharge, after recovery, fully discharged]
Analytically very sound but computationally intensive
Cannot be used for online scheduling decisions
Battery aware design: Example 1
Performance of a bipolar lead-acid battery subjected to six current impulses. Pulse length = 3 ms, rest period = 22 ms.
[Figure: current and battery voltage over time]
Source: LaFollette, “Design and performance of high specific power, pulsed discharge, bipolar lead acid batteries”, 10th Annual Battery Conference on Applications and Advances, Long Beach, pp. 43–47, January 1995.
Battery aware design: Example 2
[Figure: current [mA] over time for discharge profile A and discharge profile B]
Profile   Aver. current [mA]   Battery lifetime [ms]   Specif. energy [Wh/kg]
A         123.8                357053                  15.12
B         124.2                536484                  18.58
⇒ Minimum average current ≠ maximum battery life time
Source: Timmermann, 2007
Backup
FSM: Clock-Gating
Moore machine: Outputs depend only on the state variables.
If a state has a self-loop in the state transition graph (STG), then the clock can be stopped whenever a self-loop is to be executed.
[Figure: STG with states Si, Sj, Sk; transitions into Sk labeled Xi/Zk and Xj/Zk; self-loop at Sk labeled Xk/Zk]
Clock can be stopped when the (Xk, Sk) combination occurs.
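The self-loop condition can be sketched with a small simulation. The transition table follows the state and input names of the figure but is otherwise a hypothetical example; the function names are assumptions.

```python
def clock_enable(transitions, state, inp):
    """The clock is needed only if the next state differs from the current one."""
    return transitions[(state, inp)] != state


def simulate(transitions, start_state, inputs):
    """Run the FSM and count the clock pulses that reach the state register."""
    state = start_state
    clock_pulses = 0
    for inp in inputs:
        if clock_enable(transitions, state, inp):
            state = transitions[(state, inp)]
            clock_pulses += 1
        # On a self-loop the clock is gated and the state register keeps its value.
    return state, clock_pulses


# Hypothetical STG fragment: Si and Sj lead to Sk, Sk has a self-loop on Xk.
transitions = {
    ("Si", "Xi"): "Sk",
    ("Sj", "Xj"): "Sk",
    ("Sk", "Xk"): "Sk",
}

print(simulate(transitions, "Si", ["Xi", "Xk", "Xk"]))
```

In this run only one of the three cycles clocks the state register; the two self-loop cycles are gated.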
Trend: Interconnects
Propagation delays of global wires will be a multiple of the clock cycle.
Example (very optimistic): 6–10 clock cycles in 50nm technology [Benini, 2002]
Source: Tenhunen, 2005
Bus Multiplexing
Number of bus transitions per cycle = 2 (1 + 1/2 + 1/4 + ...) = 4
Source: Irwin, 2000
Resource Sharing and Activity II
Bus Multiplexing
Sharing of long data buses with time multiplexing
Example:
  S1 uses even cycles
  S2 uses odd cycles
[Figure: dedicated buses from S1 to D1 and from S2 to D2 versus one shared, time-multiplexed bus]
Source: Irwin, 2000
Correlated Data Streams
For a shared (multiplexed) bus, advantages of data correlation are lost (bus carries samples from two uncorrelated data streams)
  Bus sharing should not be used for positively correlated data streams
  Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) - more random switching
[Figure: bit switching probabilities (0 to 1) versus bit position (14 = MSB to 0 = LSB) for muxed and dedicated buses]
Source: Irwin, 2000
Disadvantages of Bus Multiplexing
If data bus is shared, advantages of data correlation are lost (bus carries samples from two uncorrelated data streams)
  Bus sharing should not be used for positively correlated data streams
  Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) - more random switching
Adaptive DVS cont’d
Implementation
[Figure: FIFO input buffer feeding a variable power-speed system; a workload filter drives the power-speed control knob]