Micro transductors ’08, Low Power VLSI Design 2

Dr.-Ing. Frank Sill Department of Electrical Engineering, Federal University of Minas Gerais, Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil

[email protected] http://www.cpdee.ufmg.br/~frank/

Agenda

• Recap
• Power reduction on
  - Gate level
  - Architecture level
  - Algorithm level
  - System level

Copyright Sill, 2008 Micro transductors ‘08, Low Power 2

Recap: Problems of Power Dissipation

• Continuously increasing performance demands
→ Increasing power dissipation of technical devices
→ Today: power dissipation is a main problem
• High power dissipation leads to:
  - Reduced time of operation
  - High effort for cooling
  - Higher weight (batteries)
  - Increasing operational costs
  - Reduced mobility
  - Reduced reliability

Recap: Consumption in CMOS

Water analogy:
• Voltage (volt, V): water pressure (bar)
• Current (ampere, A): water quantity per second (liter/s)
• Energy: amount of water

[Figure: gate output switching between 1 and 0 across the load capacitance CL]

Energy consumption is proportional to capacitive load!

Recap: Energy and Power

• Power is the height of the curve
[Figure: power (watts) over time for Approach 1 and Approach 2]

• Energy is the area under the curve
[Figure: power (watts) over time for Approach 1 and Approach 2]

Energy = Power * time for calculation = Power * Delay

Recap: Power Equations in CMOS

$P = \alpha f C_L V_{DD}^2 + V_{DD} I_{peak} (P_{0 \to 1} + P_{1 \to 0}) + V_{DD} I_{leak}$

• First term, dynamic power: ≈ 40 - 70% today and decreasing relatively
• Second term, short-circuit power: ≈ 10% today and decreasing absolutely
• Third term, leakage power: ≈ 20 - 50% today and increasing
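As a rough numerical illustration of the power equation, the following Python sketch evaluates the three terms separately. The function name `cmos_power`, the parameter `p_switch` (standing for P0→1 + P1→0), and all numeric values are assumptions chosen for illustration, not figures from the lecture.

```python
def cmos_power(alpha, f_clk, c_load, v_dd, i_peak, p_switch, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts.

    p_switch stands for the sum P0->1 + P1->0 in the power equation.
    """
    dynamic = alpha * f_clk * c_load * v_dd ** 2
    short_circuit = v_dd * i_peak * p_switch
    leakage = v_dd * i_leak
    return dynamic, short_circuit, leakage


# Hypothetical operating point (illustrative values only).
params = dict(alpha=0.1, f_clk=1e9, c_load=1e-9,
              i_peak=1e-3, p_switch=0.2, i_leak=5e-3)
full = cmos_power(v_dd=1.0, **params)
half = cmos_power(v_dd=0.5, **params)

# With the currents held fixed (a simplification of this sketch),
# halving VDD quarters the dynamic term and halves the leakage term.
dynamic_ratio = half[0] / full[0]
leakage_ratio = half[2] / full[2]
```

The quadratic dependence of the dynamic term on VDD is what the voltage-scaling techniques in the remainder of the lecture exploit.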

Recap: Levels of Optimization

[Figure: levels of optimization]

Source: after Massoud Pedram

Recap: Logic Restructuring

• Logic restructuring: changing the topology of a logic network to reduce transitions

AND: $P_{0 \to 1} = P_0 \cdot P_1 = (1 - P_A P_B) \cdot P_A P_B$
With input probabilities of 0.5: (1 - 0.25) * 0.25 = 3/16

[Figure: chain and tree implementations of F = A·B·C·D, all input probabilities 0.5]
  - Chain: W = A·B (3/16), X = W·C (7/64 = 0.109), F = X·D (15/256)
  - Tree: Y = A·B (3/16 = 0.188), Z = C·D (3/16 = 0.188), F = Y·Z (15/256)

→ Chain implementation has a lower overall switching activity than tree implementation for random inputs
• BUT: ignores glitching effects

Source: Timmermann, 2007
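The chain-versus-tree comparison can be checked numerically. The sketch below assumes statistically independent inputs with signal probability 0.5 and sums the 0→1 transition probabilities of all AND outputs; the helper name `and_gate` is an assumption of this example.

```python
from fractions import Fraction


def and_gate(p_a, p_b):
    """Return (signal probability, 0->1 transition probability) of a 2-input AND.

    Assumes independent inputs: P(out = 1) = p_a * p_b.
    """
    p_out = p_a * p_b
    return p_out, (1 - p_out) * p_out


half = Fraction(1, 2)

# Chain: W = A*B, X = W*C, F = X*D
p_w, act_w = and_gate(half, half)
p_x, act_x = and_gate(p_w, half)
p_f_chain, act_f_chain = and_gate(p_x, half)
chain_total = act_w + act_x + act_f_chain

# Tree: Y = A*B, Z = C*D, F = Y*Z
p_y, act_y = and_gate(half, half)
p_z, act_z = and_gate(half, half)
p_f_tree, act_f_tree = and_gate(p_y, p_z)
tree_total = act_y + act_z + act_f_tree

print("chain:", chain_total, "tree:", tree_total)
```

Under these assumptions the chain sums to 91/256 and the tree to 111/256, matching the statement that the chain has the lower overall activity. Glitching is not modeled.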

Recap: Input Ordering

[Figure: two input orderings of F = A·B·C with signal probabilities A = 0.5, B = 0.2, C = 0.1]
  - Ordering 1: X = A·B, F = X·C; activity at X: (1 - 0.5 x 0.2) * (0.5 x 0.2) = 0.09
  - Ordering 2: X = B·C, F = X·A; activity at X: (1 - 0.2 x 0.1) * (0.2 x 0.1) = 0.0196

AND: $P_{0 \to 1} = (1 - P_A P_B) \cdot P_A P_B$

Beneficial: postponing the introduction of signals with a high transition rate (signals with signal probability close to 0.5)

Source: Irwin, 2000
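To see which pair of inputs should be combined first, one can evaluate the internal node X for every possible pairing. This is a minimal sketch assuming independent inputs and the signal probabilities from the slide (A = 0.5, B = 0.2, C = 0.1); the output F has the same activity in every ordering, so only X is compared.

```python
from itertools import combinations

# Signal probabilities taken from the slide.
PROBS = {"A": 0.5, "B": 0.2, "C": 0.1}


def and_activity(p_a, p_b):
    """0->1 transition probability at the output of a 2-input AND gate."""
    p_out = p_a * p_b
    return (1 - p_out) * p_out


# Activity at the internal node X for each choice of the first input pair.
activity_by_pair = {
    pair: and_activity(PROBS[pair[0]], PROBS[pair[1]])
    for pair in combinations(sorted(PROBS), 2)
}
best_pair = min(activity_by_pair, key=activity_by_pair.get)

for pair, activity in activity_by_pair.items():
    print(pair, round(activity, 4))
print("combine first:", best_pair)
```

The pairing that leaves A (probability 0.5) for last gives the lowest activity at X, in line with the guideline on the slide.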

Recap: Glitching

[Figure: circuit with inputs A, B, C, internal node X, and output Z; waveforms of X and Z for the input transition ABC = 101 → 000 under a unit-delay model]

Source: Irwin, 2000

Design Layer: Gate Level

• Basic elements:
  - Logic gates
  - Sequential elements (flip-flops, latches)
• Behavior of elements is described in libraries

Dynamic Power and Device Size

• Device sizing (= changing gate width)
→ Affects input capacitance Cin
→ Affects load capacitance Cload
→ Affects dynamic power consumption Pdyn
• Optimal fanout factor f for Pdyn is smaller than for performance (especially for large loads)
  e.g., for Cload = 20, Cin = 1:
  → fcircuit = 20
  → fopt_energy = 3.53
  → fopt_performance = 4.47
• For low power: avoid oversizing (f too big) beyond the optimum

[Figure: normalized energy (0 to 1.5) versus fanout f (1 to 7) for fcircuit = 1, 2, 5, 10, 20]

Source: Nikolic, UCB

VDD versus Delay and Power

[Figure: relative delay td (left axis, 0 to 6) and relative dynamic power Pdyn (right axis, 0 to 10) versus supply voltage VDD (0.8 to 2.4)]

• Delay (td) and dynamic power consumption (Pdyn) are functions of VDD

Multiple VDD

• Main ideas:
  - Use of different supply voltages within the same design
  - High VDD for critical parts (high performance needed)
  - Low VDD for non-critical parts (only low performance demands)
• At design phase:
  - Determine critical path(s) (see the Data Paths slide)
  - High VDD for gates on those paths
  - Lower VDD on the other gates (in non-critical paths)
  - For low VDD: prefer gates that drive large capacitances (yields the largest energy benefits)
• Usually two different VDD (but more are possible)

Multiple VDD cont’d

• Level converters:
  - Necessary when a module at lower supply drives a gate at higher supply (step-up)
  - If a gate supplied with VDDL drives a gate supplied with VDDH → the PMOS never turns off
  - Possible implementation:
      - Cross-coupled PMOS transistors
      - NMOS transistors operate on reduced supply
  - No need for level converters for a step-down change in voltage
  - Reducing overhead:
      - Conversions at register boundaries
      - Embedding of the level conversion inside the flip-flop

[Figure: level converter between VDDL and VDDH with input Vin and output Vout]

Data Paths

• Data propagate through different data paths between registers (flip-flops, FF)
• Paths mostly differ in propagation delay times
• Frequency of the clock signal (CLK) depends on the path with the longest delay → critical path

[Figure: data paths between registers]

Data Paths: Slack

[Figure: timing of gates G1 and G2 with signals A, B, C, Y. Timeline: all inputs of G1 arrived; G1 ready with evaluation (after the delay of G1); all inputs of G2 arrived. The slack for G1 is the time between G1 finishing its evaluation and the arrival of all inputs of G2.]

Multiple VDD in Data Paths

• Minimum energy consumption when all logic paths are critical (same delay)
• Possible algorithm: clustered voltage scaling
  - Each path starts with VDDH and switches to VDDL (blue gates) when slack is available
  - Level conversion in flip-flops at end of paths

[Figure: gate network; legend: connected with VDDL, connected with VDDH]
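The idea of switching a path from VDDH to VDDL once slack is available can be sketched for a single path. This is a deliberately simplified illustration, not the full clustered voltage-scaling algorithm (which operates on a whole netlist): it assumes one chain of gates with known delays at both supplies, and the function name `assign_vddl_suffix` and the delay values are hypothetical.

```python
def assign_vddl_suffix(delays_high, delays_low, clock_period):
    """Move gates at the end of one path to VDDL while the path still meets timing.

    delays_high[i] / delays_low[i] are the delays of gate i at VDDH / VDDL.
    Returns (k, path_delay): gates[:k] stay on VDDH, gates[k:] run on VDDL,
    so level conversion is only needed in the flip-flop at the path end.
    """
    k = len(delays_high)
    path_delay = sum(delays_high)
    # Walk backward from the path end; stop as soon as the slack is used up.
    while k > 0:
        candidate = path_delay - delays_high[k - 1] + delays_low[k - 1]
        if candidate > clock_period:
            break
        path_delay = candidate
        k -= 1
    return k, path_delay


# Hypothetical path: four gates, each 1.0 ns at VDDH and 1.5 ns at VDDL.
split, delay = assign_vddl_suffix([1.0] * 4, [1.5] * 4, clock_period=5.0)
print(f"gates 0..{split - 1} on VDDH, gates {split}..3 on VDDL, delay {delay} ns")
```

Because only a suffix of the path is moved to VDDL, a VDDL gate never drives a VDDH gate, which is the property that keeps level converters out of the combinational logic.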

Design Layer: Architecture Level

• Also known as register transfer level (RTL)
• Base elements:
  - Register structures
  - Arithmetic logic units (ALU)
  - Memory elements
• Only behavior is described (no inner structure)

Clock Gating

• Most popular method for power reduction of clock signals and functional units
• Gate off clock to idle functional units
• Logic for generation of the disable signal is necessary
  - Higher complexity of control logic
  - Higher power consumption
  - Timing is critical to avoid clock glitches at the OR gate output
  - Additional gate delay on the clock signal

[Figure: register (Reg) feeding a functional unit; the register clock is derived from clock and disable through an OR gate]

Source: Irwin, 2000

Clock Gating cont’d

• Clock gating in a low-power flip-flop

[Figure: flip-flop with data input D, clock input CLK, and output Q]

Source: Agarwal, 2007

Clock Gating cont’d

• Clock gating based on the state in finite-state machines (FSM)

[Figure: FSM with primary inputs (PI), combinational logic, primary outputs (PO), and flip-flops; clock activation logic and a latch gate the CLK signal of the flip-flops]

Source: L. Benini and G. De Micheli, Dynamic , Boston: Springer, 1998.

Clock Gating: Example

MPEG4 decoder:
• Without clock gating: 30.6 mW
• With clock gating: 8.5 mW
• 90% of flip-flops clock-gated
• 70% power reduction by clock gating

[Figure: bar chart of power (0 to 25 mW) with and without clock gating; chip blocks DEU, VDE, MIF, DSP/HIF, 896Kb SRAM]

Source: M. Ohashi, Matsushita, 2002

Recap: VDD versus Delay and Power

[Figure: relative delay td (left axis, 0 to 6) and relative dynamic power Pdyn (right axis, 0 to 10) versus supply voltage VDD (0.8 to 2.4)]

Dynamic power can be traded for delay

A Reference Datapath

[Figure: input register → combinational logic (switched capacitance Cref) → output register, clocked by CLK]

• Supply voltage = Vref
• Total capacitance switched per cycle = Cref
• Clock frequency = fclk
• Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f_{clk}$

Source: Agarwal, 2007

Parallel Architecture

• Each copy processes every Nth input and operates at reduced voltage
• Supply voltage: VN ≤ Vref
• N = degree of parallelism

[Figure: input register (fclk) feeding N copies of the combinational logic (Copy 1, Copy 2, ..., Copy N), each followed by a register clocked at fclk/N; an N-to-1 multiplexer drives the output register; a multiphase clock generator and mux control are driven by CK]

Source: Agarwal, 2007

Pipelined Architecture

• Reduces the propagation time of a block by factor N
→ Voltage can be reduced at constant clock frequency
• Constant throughput

[Figure: a block of area A between two registers is split into N stages of area A/N separated by registers, all clocked by CLK]

• Functionality:

[Figure: timing of Data and CLK]

Parallel Architecture: Example

• Reference data path (for example)

[Figure: data path with inputs A and B]

• Critical path delay: Tadder + Tcomparator (= 25 ns) → fref = 40 MHz
• Total capacitance being switched = Cref
• VDD = Vref = 5 V
• Power for reference datapath: $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$

Source: Irwin, 2000

Parallel Architecture: Example cont’d

Area = 1476 x 1219 µm²

• The clock rate can be reduced by half with the same throughput: fpar = fref / 2
• Vpar = Vref / 1.7, Cpar = 2.15 Cref
• $P_{par} = (2.15 C_{ref}) (V_{ref}/1.7)^2 (f_{ref}/2) \approx 0.36 P_{ref}$

Source: Irwin, 2000

Pipelined Architecture: Example

• fpipe = fref, Cpipe = 1.1 Cref, Vpipe = Vref / 1.7
• Voltage can be dropped while maintaining the original throughput
• $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1 C_{ref}) (V_{ref}/1.7)^2 f_{ref} \approx 0.37 P_{ref}$

Source: Irwin, 2000
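The two examples can be recomputed from the scaling factors given on the slides. The sketch below uses the rounded voltage divisor 1.7 and therefore lands slightly above the quoted 0.36 and 0.37; the difference is presumably rounding in that factor. The function name `relative_power` is an assumption of this example.

```python
def relative_power(c_ratio, v_divisor, f_ratio):
    """Dynamic power relative to Pref = Cref * Vref^2 * fref.

    c_ratio   = C / Cref
    v_divisor = Vref / V
    f_ratio   = f / fref
    """
    return c_ratio * (1.0 / v_divisor) ** 2 * f_ratio


# Parallel datapath: 2.15x capacitance, half the clock rate.
p_par = relative_power(c_ratio=2.15, v_divisor=1.7, f_ratio=0.5)

# Pipelined datapath: 1.1x capacitance, same clock rate.
p_pipe = relative_power(c_ratio=1.1, v_divisor=1.7, f_ratio=1.0)

print(f"parallel:  {p_par:.3f} * Pref")
print(f"pipelined: {p_pipe:.3f} * Pref")
```

In both cases almost all of the saving comes from the quadratic voltage term; the extra capacitance only partly offsets it.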

Approximate Trend

                  N-parallel proc.          N-stage pipeline proc.
Capacitance       N*Cref                    Cref
Voltage           Vref/N                    Vref/N
Frequency         fref/N                    fref
Dynamic power     Cref Vref^2 fref / N^2    Cref Vref^2 fref / N^2
Chip area         N times                   10-20% increase

Source: G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.

Guarded Evaluation

• Reduction of switching activity by adding latches at inputs

[Figure: multiplier with inputs A, B, C and a condition signal, shown without and with a latch at the inputs controlled by the condition]

• Latch preserves previous value of inputs to suppress activity
• Could also use AND gates to mask inputs to zero (= forced zero)

Precomputation

[Figure: precomputed inputs are stored in register R1 and gated inputs in register R2; both feed the combinational logic f(X), which produces the outputs; the precomputation logic g(X) generates the load-disable signal for R2]

• Identify logical conditions on the inputs under which the output does not depend on the remaining inputs
  - Since those inputs don’t affect the output, disable their input transitions
  - Trade area for energy

Source: Irwin, 2000

Precomputation: Design Issues

• Design steps
  1. Selection of precomputation architecture
  2. Determination of precomputed and gated inputs (register R1 should be much smaller than R2)
  3. Search for a good implementation of g(X)
  4. Evaluation of potential energy savings based on input statistics (if savings are not sufficient, go to step 2 or 3 and try again)
• Also works for multiple-output functions, where g(X) is the product of gj(X) over all j

Source: Irwin, 2000

Precomputation: Example

• Binary comparator

[Figure: n-bit binary value comparator computing A > B; the most significant bits An and Bn are stored in R1, the remaining bits An-1, Bn-1, ..., A1, B1 in R2; the load-disable signal of R2 is derived from An and Bn]

• R2 only needs to be loaded when An = Bn; otherwise the most significant bits alone decide A > B
• Can achieve up to 75% power reduction with 3% area overhead and 1 to 5 additional gate delays in the worst-case path

Source: Irwin, 2000
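The comparator example can be mimicked in software to see how often the large register R2 actually needs to load. This is a behavioral sketch for unsigned operands, not a model of the circuit on the slide; the class name `PrecomputedComparator` and its attributes are assumptions of this example.

```python
class PrecomputedComparator:
    """Behavioral model of an n-bit A > B comparator with precomputation.

    The MSBs play the role of register R1; the lower n-1 bits are only
    loaded into R2 when the MSBs are equal (load disable otherwise).
    """

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.low_mask = (1 << (n_bits - 1)) - 1
        self.r2_a = 0
        self.r2_b = 0
        self.r2_loads = 0

    def compare(self, a, b):
        a_msb = (a >> (self.n_bits - 1)) & 1
        b_msb = (b >> (self.n_bits - 1)) & 1
        if a_msb != b_msb:
            # MSBs differ: the result is already known, R2 keeps its old value.
            return a_msb > b_msb
        # MSBs equal: the lower bits decide, so R2 must be loaded.
        self.r2_a = a & self.low_mask
        self.r2_b = b & self.low_mask
        self.r2_loads += 1
        return self.r2_a > self.r2_b


comparator = PrecomputedComparator(n_bits=4)
pairs = [(a, b) for a in range(16) for b in range(16)]
all_correct = all(comparator.compare(a, b) == (a > b) for a, b in pairs)
print("correct:", all_correct, "R2 loads:", comparator.r2_loads, "of", len(pairs))
```

For uniformly distributed operands R2 is loaded in half of the comparisons; the actual power saving depends on the input statistics, as noted in the design steps.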

Adder Design

• Various algorithms exist to implement an integer adder
  - Ripple, select, skip (x2), look-ahead, conditional-sum
  - Each with its own characteristics of timing and power consumption

[Figure: full-adder (FA) block diagrams of ripple carry, carry select, variable/fixed width carry skip, and carry look-ahead adders]

Source: Mendelson, Intel

Adder Design

Adder                        Energy (pJ)   Delay (ns)
Ripple Carry                 117           54.27
Constant Width Carry Skip    109           28.38
Variable Width Carry Skip    126           21.84
Carry Lookahead              171           17.13
Carry Select                 216           19.56
Conditional Sum              304           20.05

• Adders differ in energy and delay
→ Different adders for different applications
→ Also true for other units (multiplier, counter, …)

Source: Callaway, Swartzlander “Estimating the power consumption of CMOS adders” - 11th Symposium on Computer Arithmetic, 1993. Proceedings.
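One way to read "different adders for different applications" is to pick the lowest-energy adder that still meets a given delay budget. The sketch below does this with the values from the table; the selection rule and the function name `pick_adder` are assumptions of this example, not a method from the cited paper.

```python
# (energy in pJ, delay in ns), taken from the table on the slide.
ADDERS = {
    "Ripple Carry": (117, 54.27),
    "Constant Width Carry Skip": (109, 28.38),
    "Variable Width Carry Skip": (126, 21.84),
    "Carry Lookahead": (171, 17.13),
    "Carry Select": (216, 19.56),
    "Conditional Sum": (304, 20.05),
}


def pick_adder(max_delay_ns):
    """Return the lowest-energy adder with delay <= max_delay_ns, or None."""
    feasible = {
        name: energy
        for name, (energy, delay) in ADDERS.items()
        if delay <= max_delay_ns
    }
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


for budget in (60.0, 25.0, 18.0, 10.0):
    print(f"delay budget {budget} ns ->", pick_adder(budget))
```

Tightening the delay budget moves the choice from the carry-skip adders to carry look-ahead, at the cost of higher energy per addition.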

Bus Power

• Buses are a significant source of power dissipation
  - 50% of dynamic power for interconnect switching (Magen, SLIP 04)
  - MIT Raw processor’s on-chip network consumes 36% of total chip power (Wang et al. 2003)
• Caused by:
  - High switching activities
  - Large capacitive loading

[Figure: shared bus with bus drivers (inputs Ain, Bin, Cin, Din) and bus receivers (outputs Wout, Xout, Yout, Zout)]

Source: Irwin, 2000

Bus Power Reduction

• For an n-bit bus: $P_{bus} = n \cdot \alpha f_{clk} C_{load} V_{DD}^2$
• Alternative bus structures
  - Segmented buses (lower Cload)
  - Charge recovery buses
  - Bus multiplexing (lower fclk possible)
• Minimizing bus traffic (n)
  - Code compression
  - Instruction loop buffers
• Minimization of bit switching activity (α) by data encoding
• Minimize voltage swing (VDD²) using differential signaling

Source: Irwin, 2000

Reducing Shared Resources

• Shared resources incur switching overhead
• Local bus structures reduce overhead

[Figure: global bus architecture versus local bus architecture]

Source: Irwin, 2000

Reducing Shared Resources cont’d

• Bus segmentation
  - Another way to reduce shared buses
  - Control of bus segments by controller blocks (B)

[Figure: shared bus versus segmented bus with controller blocks (B)]

Source: Evgeny Bolotin – Jan 2004

Design Layer: Algorithm Level

• Base elements:
  - Functions
  - Procedures
  - Processes
  - Control structures
• Description of design behavior

Coding Styles

• Use processor-specific instruction style:
  - Variable types
  - Function call style
  - Conditionalized instructions (for ARM)
• Follow general guidelines for software coding
  - Use table look-up instead of conditionals
  - Make local copies of global variables so that they can be assigned to registers
  - Avoid multiple memory look-ups with pointer chains

Source-code Transformations

• Minimize power-consuming activity:

  - Computation

      A*B + A*C    →    A*(B+C)

  - Communication

      Before:                 After:
      for (c = 1..N)          receive (A)
        receive (A)           for (c = 1..N)
        B = c*A                 B = c*A

  - Storage

      Before:                     After:
      for (c = 1..N)              for (c = 1..N)
        B[c] = A[c]*D[c]            F[c] = A[c]*D[c]-1
      for (c = 1..N)
        F[c] = B[c]-1
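The communication and storage transformations can be written out as runnable Python to confirm that the results stay the same while the costly activity drops. The `Channel` class and all function names are hypothetical helpers for this sketch; the communication rewrite is only valid if every `receive` would return the same value A.

```python
class Channel:
    """Hypothetical input channel that counts how often it is read."""

    def __init__(self, value):
        self.value = value
        self.reads = 0

    def receive(self):
        self.reads += 1
        return self.value


def scale_before(channel, n):
    """Communication, before: one receive per loop iteration."""
    out = []
    for c in range(1, n + 1):
        a = channel.receive()
        out.append(c * a)
    return out


def scale_after(channel, n):
    """Communication, after: receive hoisted out of the loop."""
    a = channel.receive()
    return [c * a for c in range(1, n + 1)]


def storage_before(a, d):
    """Storage, before: intermediate array B is written and read back."""
    b = [x * y for x, y in zip(a, d)]
    return [v - 1 for v in b]


def storage_after(a, d):
    """Storage, after: fused loop, no intermediate array."""
    return [x * y - 1 for x, y in zip(a, d)]


slow, fast = Channel(3), Channel(3)
same_scaling = scale_before(slow, 4) == scale_after(fast, 4)
same_storage = storage_before([1, 2, 3], [4, 5, 6]) == storage_after([1, 2, 3], [4, 5, 6])
print(same_scaling, slow.reads, fast.reads, same_storage)
```

In this run the transformed communication loop performs one receive instead of four, and the fused storage loop avoids N writes and N reads of the intermediate array.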

Datapath Energy Consumption

[Figure: stacked bar chart of switched capacitance (nF, 0 to 14000) for bubble.c, heap.c, and quick.c, broken down into register file, pipeline registers, functional unit, and others]

→ Algorithms can differ in power dissipation

Source: Irwin, 2000

Adaptive Dynamic Voltage Scaling (DVS)

• Slow down the processor to fill idle time
• More delay → lower operational voltage

[Figure: at 3.3 V the processor alternates between active and idle phases; at 2.4 V it stays active]

• Runtime scheduler determines processor speed and selects appropriate voltage
• Delay for frequency transitions: ~150 μs
• Potential to realize 10x energy savings

Adaptive DVS: Example

• Task with 100 ms deadline requires 50 ms CPU time at full speed
  - Normal system gives 50 ms computation, 50 ms idle/stopped time
  - Half speed/voltage system gives 100 ms computation, 0 ms idle
  - Same number of CPU cycles, but: $E = C (V_{DD}/2)^2 = E_{ref}/4$
• DVS adapts voltage to workload

[Figure: speed over time for tasks T1 and T2, once with task and idle phases and once with the task stretched over the whole period: same work, lower energy]
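The factor of four follows directly from the energy per cycle being proportional to VDD². A minimal sketch, assuming a hypothetical 100 MHz full-speed clock, a 3.3 V full-speed supply, a fixed switched capacitance per cycle, and no energy consumed while idle:

```python
def task_energy(cycles, c_per_cycle, v_dd):
    """Switching energy of a task: cycles * C * VDD^2 (idle energy ignored)."""
    return cycles * c_per_cycle * v_dd ** 2


def runtime_ms(cycles, f_hz):
    """Execution time in milliseconds at clock frequency f_hz."""
    return 1e3 * cycles / f_hz


# Hypothetical full-speed operating point.
F_FULL_HZ = 100e6
V_FULL = 3.3
C_PER_CYCLE = 1e-9      # farads switched per cycle (illustrative)
CYCLES = 5_000_000      # 50 ms of work at 100 MHz

e_full = task_energy(CYCLES, C_PER_CYCLE, V_FULL)
e_half = task_energy(CYCLES, C_PER_CYCLE, V_FULL / 2)

t_full = runtime_ms(CYCLES, F_FULL_HZ)       # finishes early, then idles
t_half = runtime_ms(CYCLES, F_FULL_HZ / 2)   # just meets the 100 ms deadline

print(f"full speed: {t_full:.0f} ms, half speed: {t_half:.0f} ms")
print(f"energy ratio half/full: {e_half / e_full:.2f}")
```

The cycle count is unchanged, so the saving comes entirely from the lower voltage that the slower clock permits.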

Design Layer: System Level

• Basic elements:
  - Complex modules
  - Processors
  - Calculation and control units
  - Sensors

[Figure: system block diagram with ALU, MEM, and MP3 blocks]

Dynamic Power Management

• Systems are:
  - Designed to deliver peak performance, but …
  - Not needing peak performance most of the time
• Components are idle sometimes
• Dynamic power management (DPM):
  - Puts idle components into low-power, non-operational states
• Power manager:
  - Observes and controls the system
  - Power consumption of the power manager is negligible

Processor Sleep Modes

• Software power control - power management
  - DOZE: most units stopped except on-chip cache memory (cache coherency)
  - NAP: cache also turned off, PLL still on, time-out or external interrupt to resume
  - SLEEP: PLL off, external interrupt to resume
• A deeper sleep mode requires more latency to resume
• A deeper sleep mode consumes less power

Processor Sleep Modes: Example

• PowerPC sleep modes

Mode                   66 MHz    80 MHz
No power mgmt          2.18 W    2.54 W
Dynamic power mgmt     1.89 W    2.20 W
DOZE                   307 mW    366 mW
NAP                    113 mW    135 mW
SLEEP                  89 mW     105 mW
SLEEP without PLL      18 mW     19 mW
SLEEP without clock    2 mW      2 mW

• 10 cycles to wake up from SLEEP
• 100 µs to wake up from SLEEP+

Source: Irwin, 2000
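To get a feeling for these numbers, the energy spent during an idle interval can be compared across modes. The sketch uses the 66 MHz column of the table and ignores the energy and latency of entering and leaving a mode, which a real power manager would have to account for; the function names are assumptions of this example.

```python
# Power in watts at 66 MHz, taken from the table on the slide.
POWER_66MHZ_W = {
    "No power mgmt": 2.18,
    "Dynamic power mgmt": 1.89,
    "DOZE": 0.307,
    "NAP": 0.113,
    "SLEEP": 0.089,
    "SLEEP without PLL": 0.018,
    "SLEEP without clock": 0.002,
}


def idle_energy_j(mode, idle_seconds):
    """Energy in joules spent in `mode` for the given idle time."""
    return POWER_66MHZ_W[mode] * idle_seconds


def saving_vs_no_mgmt(mode):
    """Fraction of power saved relative to running without power management."""
    return 1.0 - POWER_66MHZ_W[mode] / POWER_66MHZ_W["No power mgmt"]


for mode in POWER_66MHZ_W:
    print(f"{mode:20s} {idle_energy_j(mode, 1.0):6.3f} J/s idle, "
          f"saving {100 * saving_vs_no_mgmt(mode):5.1f}%")
```

The deeper modes save more per idle second, but as the slide notes they also take longer to wake up, so they only pay off for sufficiently long idle periods.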

Transmeta LongRun

• Applies adaptive DVS
• LongRun policies:
  - Detection of different workload scenarios
  - Based on runtime performance information
• After detection → corresponding adaptation of:
  - Processor supply voltage
  - Processor frequency
  - Clock frequency always within limits required by supply voltage to avoid clock skew problems
• Use of hard-coded core frequency/voltage operating points

→ Best trade-off between performance and power possible

Transmeta LongRun cont’d

[Figure: % of max power consumption (0 to 100) versus frequency (300 to 1000 MHz), with a typical operating region and a peak performance region]

Operating points:
Frequency   300 MHz   433 MHz   533 MHz   667 MHz   800 MHz   900 MHz   1000 MHz
Voltage     0.80 V    0.87 V    0.95 V    1.05 V    1.15 V    1.25 V    1.30 V

Source:
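The hard-coded operating points lend themselves to a simple table lookup. The sketch below picks the slowest operating point that still meets a required frequency and estimates the dynamic power relative to the fastest point using P ∝ f·V². This is an illustrative model only, not Transmeta's actual policy, and the estimate covers the dynamic term alone, so it need not match the measured curve in the figure.

```python
# (frequency in MHz, core voltage in V), taken from the slide.
OPERATING_POINTS = [
    (300, 0.80), (433, 0.87), (533, 0.95), (667, 1.05),
    (800, 1.15), (900, 1.25), (1000, 1.30),
]


def select_operating_point(required_mhz):
    """Return the slowest (f, V) pair with f >= required_mhz.

    Falls back to the fastest point if the demand exceeds it.
    """
    for f_mhz, v_core in OPERATING_POINTS:
        if f_mhz >= required_mhz:
            return f_mhz, v_core
    return OPERATING_POINTS[-1]


def relative_dynamic_power(f_mhz, v_core):
    """Dynamic power relative to the fastest point, assuming P ~ f * V^2."""
    f_max, v_max = OPERATING_POINTS[-1]
    return (f_mhz * v_core ** 2) / (f_max * v_max ** 2)


point = select_operating_point(500)
print(point, f"{100 * relative_dynamic_power(*point):.1f}% of peak dynamic power")
```

Because voltage drops together with frequency, running at roughly half the peak frequency costs well under half of the peak dynamic power in this model.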

Transmeta LongRun: Example

Source: Transmeta

Battery Aware Design

• Non-linear effects influence the lifetime of batteries
• “Rate capacity”
  - If discharging currents are higher than allowed (rated current)
  → real capacity drops below nominal capacity

[Figure: capacity (mAh, 200 to 1000) versus discharge current (mA); standard capacity 1000 mAh at a rated current of 125 mA]

• “Battery recovery”
  - Pulsed discharge increases nominal capacity
  - Based on recovery times (as long as there is no rate capacity effect)

[Figure: available charge (mA) and discharge current (mA) over time, with idle phases between discharge pulses]

Source: Timmermann, 2007

Battery Aware Design cont’d

Diffusion model from Rakhmatov, Vrudhula et al.

[Figure: distribution of electro-active species at the electrode in four states: fully charged battery, after a recent discharge, after recovery, fully discharged]

• Analytically very sound but computationally intensive
• Cannot be used for online scheduling decisions

Battery Aware Design: Example 1

• Performance of a bipolar lead-acid battery subjected to six current impulses. Pulse length = 3 ms, rest period = 22 ms.

[Figure: current and battery voltage over time]

Source: LaFollette, “Design and performance of high specific power, pulsed discharge, bipolar lead acid batteries”, 10th Annual Battery Conference on Applications and Advances, Long Beach, pp. 43–47, January 1995.

Battery Aware Design: Example 2

[Figure: current (mA) over time for discharge profile A and discharge profile B]

Profile   Aver. current [mA]   Battery lifetime [ms]   Specif. energy [Wh/kg]
A         123.8                357053                  15.12
B         124.2                536484                  18.58

→ Minimum average current ≠ maximum battery lifetime

Source: Timmermann, 2007

Backup

FSM: Clock-Gating

• Moore machine: outputs depend only on the state variables.
  - If a state has a self-loop in the state transition graph (STG), then the clock can be stopped whenever a self-loop is to be executed.

[Figure: STG with states Si, Sj, Sk and transitions labeled Xi/Zk, Xj/Zk, Xk/Zk; Sk has a self-loop]

• Clock can be stopped when the (Xk, Sk) combination occurs.
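The self-loop rule can be illustrated with a small behavioral model: whenever the next state equals the current state, the state register receives no clock pulse. The class name `GatedMooreFSM`, the input names, and the transition table below are hypothetical and only loosely follow the state names on the slide.

```python
class GatedMooreFSM:
    """Moore FSM model that suppresses the clock pulse on self-loops."""

    def __init__(self, transitions, initial_state):
        self.transitions = transitions   # (state, input) -> next state
        self.state = initial_state
        self.clock_pulses = 0            # cycles in which the register is clocked
        self.gated_cycles = 0            # cycles in which the clock is stopped

    def step(self, x):
        next_state = self.transitions[(self.state, x)]
        if next_state == self.state:
            # Self-loop detected by the clock activation logic: stop the clock.
            self.gated_cycles += 1
        else:
            self.clock_pulses += 1
            self.state = next_state
        return self.state


# Hypothetical STG: Sk has a self-loop on input "hold".
TRANSITIONS = {
    ("Si", "go"): "Sk",
    ("Si", "hold"): "Si",
    ("Sk", "hold"): "Sk",
    ("Sk", "back"): "Si",
}

fsm = GatedMooreFSM(TRANSITIONS, initial_state="Si")
for x in ["go", "hold", "hold", "hold", "back"]:
    fsm.step(x)
print(fsm.state, fsm.clock_pulses, fsm.gated_cycles)
```

Since the outputs of a Moore machine depend only on the state, skipping the clock on a self-loop leaves the outputs unchanged; the saving has to be weighed against the power of the activation logic itself.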

Trend: Interconnects

Propagation delays of global wires will be a multiple of the clock cycle.

Example (very optimistic): 6–10 clock cycles in 50nm technology [Benini, 2002]

Source: Tenhunen, 2005

Bus Multiplexing

• Number of bus transitions per cycle = 2 (1 + 1/2 + 1/4 + ...) = 4

Source: Irwin, 2000
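The value 4 is the limit of a geometric series. Written out, treating the number of terms as unbounded (an idealization; a bus of finite width gives a value slightly below 4):

```latex
2 \sum_{k=0}^{\infty} \left(\tfrac{1}{2}\right)^{k}
  = 2 \cdot \frac{1}{1 - \tfrac{1}{2}}
  = 2 \cdot 2
  = 4
```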

Resource Sharing and Activity II

Bus Multiplexing

• Sharing of long data buses with time multiplexing
• Example:
  - S1 uses even cycles
  - S2 uses odd cycles

[Figure: dedicated buses from sources S1 and S2 to destinations D1 and D2, versus one shared, multiplexed bus]

Source: Irwin, 2000

Correlated Data Streams

• For a shared (multiplexed) bus, the advantages of data correlation are lost (bus carries samples from two uncorrelated data streams)
  - Bus sharing should not be used for positively correlated data streams
  - Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits): more random switching

[Figure: bit switching probabilities (0 to 1) versus bit position (14 = MSB to 0 = LSB) for the muxed and the dedicated bus]

Source: Irwin, 2000

Disadvantages of Bus Multiplexing

• If a data bus is shared, the advantages of data correlation are lost (bus carries samples from two uncorrelated data streams)
• Bus sharing should not be used for positively correlated data streams
• Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits): more random switching

Adaptive DVS cont’d

• Implementation

[Figure: a workload filter drives the power-speed control knob of a variable power-speed system, which is fed by a FIFO input buffer]
