A 0.4-V UWB Baseband

Vivienne Sze, Anantha P. Chandrakasan
Massachusetts Institute of Technology

ISLPED – Portland, OR – August 28, 2007

Outline

 UWB Specifications and System Architecture

 Baseband Algorithm and Architecture

 Parallelism for Energy Efficiency
   aggressive voltage scaling
   reduced receiver on-time

 Chip Measurements

 Conclusions

Ultra-wideband (UWB) Radio

[Figure: UWB versus Narrowband, compared in frequency and in time]

[Figure: Possible UWB Applications by data rate (500 kb to 500 Mb) and distance (1 m to 100 m): WLAN, Locationing/Tagging, USB & Multimedia, with UWB and narrowband regions marked]

 Advantages of UWB communications include high data rate, excellent multi-path resolution, and low interference

 Integrate UWB on battery-operated devices

 Need an energy-efficient UWB system

UWB System Architecture

[Figure: 14-channel frequency plan. Power density [dBm] (axis ticks -41.3, -51.3, -61.3) versus frequency [GHz] (3.1 to 10.6). Figure courtesy of D. Wentzloff.]

Sampling Rate: 500 MSPS
Data Rate: 100 Mbps

[F.S. Lee et al., ICUWB 2006]

Packet Structure (PHY layer)

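[Figure: packet structure. The PREAMBLE is a sequence of repeated 31-bit Gold codes and is followed by the PAYLOAD bits; timing annotations of 10 ns and 40 ns. The receiver turns on after the packet begins, passes through stages labeled Acquisition, Channel Estimation, and Demodulation (state labels: State 1, State 1, State 2), and then turns off.]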

 Goal: Reduce overhead energy (acquisition)

 Majority of acquisition energy is spent on computing the cross-correlation

Cross-Correlation Computation

$$y[n] = \sum_{k=0}^{619} h[k]\, x[k-n]$$
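Each point y[n] depends only on the template h and the received samples x, so the points can be evaluated in any order or in parallel. Below is a minimal Python sketch of that idea, following the index convention of the formula above. The function names are hypothetical, and treating samples outside the recorded range of x as zero is an assumption made for illustration, not a detail from the slides.

```python
def cross_correlation_point(h, x, n):
    """Compute a single point y[n] = sum over k of h[k] * x[k - n]."""
    total = 0
    for k, h_k in enumerate(h):
        idx = k - n
        # Assumption: samples outside the recorded range of x count as zero.
        if 0 <= idx < len(x):
            total += h_k * x[idx]
    return total


def cross_correlation(h, x, lags):
    """Evaluate y[n] for each lag; no point depends on any other point,
    so each one could be handed to a separate correlator."""
    return {n: cross_correlation_point(h, x, n) for n in lags}


def peak_lag(y):
    """Return the lag whose correlation value has the largest magnitude."""
    return max(y, key=lambda n: abs(y[n]))


# Toy 4-tap template (the slides use 620 taps) and a copy delayed by 2 samples.
h = [1, -1, 1, 1]
x = [0, 0, 1, -1, 1, 1]
y = cross_correlation(h, x, range(-3, 4))
print(peak_lag(y), y[peak_lag(y)])
```

With the k − n convention used on the slide, a received signal delayed by two samples produces its peak at n = −2 in this toy example.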

 Points can be computed independently

Baseband Architecture


 Cross-correlation requires a fixed number of operations

 Reduce energy of each operation in order to reduce baseband energy

 Map operations to architecture that reduces system energy

Energy Efficiency Using Parallelism

Exploit TWO forms of parallelism in Correlator Bank

 Ultra-Low Voltage Operation - Maintain Throughput (L)
   Reduce supply voltage
   Minimize energy of baseband processor

 Reducing Acquisition Time - Parallelized Computation (M)
   Reduce receiver on-time
   Minimize energy of entire receiver (system)

Baseband Energy Savings

Correlator Architecture: $y[n] = \sum_{k} h[k]\, x[k-n]$

 Correlators compute the cross-correlation function

 Voltage scaling to reduce energy per operation

 Parallelize to maintain throughput of 500 MSPS

 Designed and simulated in a 90-nm process

Baseband Energy Savings

 At the minimum energy point of 0.3 V
   9X energy reduction

 Set clock frequency to 25 MHz (preamble PRF)

 Parallelize by L = 20 to maintain 500 MSPS throughput

 Need to raise voltage to 0.4 V to achieve 25 MHz

 At 0.4 V, reduce energy per operation by 5.8X

[Figure: energy per operation (normalized, log scale from 10^0 down to 10^-3) versus V_DD (0 to 1 V), showing total energy, active (dynamic) energy, and leakage energy, with the minimum energy point marked]

$$E_{total} = E_{active} + E_{leakage}$$

$$E_{active} \propto C_{eff} V_{DD}^2, \qquad E_{leakage} \propto T_{period} V_{DD} I_{leak}$$

Summary of Methodology
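The decomposition of total energy into an active term proportional to C_eff V_DD^2 and a leakage term proportional to T_period V_DD I_leak explains why a minimum energy point exists: lowering V_DD shrinks the active term but stretches the clock period, so leakage is integrated over a longer time. The Python sketch below is a toy model of that behavior only. The exponential delay model and every constant in it are hypothetical, picked so that the curve has an interior minimum near the 0.3 V quoted on the slide; nothing here is fitted to the measured chip.

```python
import math

# All constants are hypothetical and normalized; they are chosen only so the
# toy curve has an interior minimum near 0.3 V, not fitted to any measurement.
C_EFF = 1.0      # effective switched capacitance
I_LEAK = 6.0     # leakage current, assumed independent of VDD
SLOPE_V = 0.1    # voltage scale of the assumed exponential delay model


def clock_period(vdd):
    """Assumed delay model: the period grows exponentially as VDD drops."""
    return math.exp(-vdd / SLOPE_V)


def active_energy(vdd):
    """E_active proportional to C_eff * VDD^2."""
    return C_EFF * vdd ** 2


def leakage_energy(vdd):
    """E_leakage proportional to T_period * VDD * I_leak."""
    return clock_period(vdd) * vdd * I_LEAK


def total_energy(vdd):
    """E_total = E_active + E_leakage."""
    return active_energy(vdd) + leakage_energy(vdd)


def minimum_energy_point(voltages):
    """Grid search for the supply voltage with the lowest total energy."""
    return min(voltages, key=total_energy)


# Search a grid from 0.10 V to 1.00 V in 10 mV steps.
voltages = [i / 100 for i in range(10, 101)]
v_mep = minimum_energy_point(voltages)
print(v_mep, total_energy(v_mep))
```

Below the minimum, leakage dominates in this model; above it, the quadratic active term dominates.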

 Select the optimal degree of parallelism that
   minimizes energy consumption
   meets performance constraints

1. Determine VDD,MEP, the supply voltage at the minimum energy point

2. Determine the delay and throughput at VDD,MEP

3. Divide the required throughput by the throughput at VDD,MEP to obtain the necessary degree of parallelism (L)

System Energy Savings
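Step 3 of the methodology is a single division, sketched below in Python with the figures quoted in this deck: a required throughput of 500 MSPS and a 25 MHz clock per correlator. Note that the slides report 25 MHz as achievable at 0.4 V rather than at the 0.3 V minimum energy point, so the numbers correspond to the 0.4 V operating point. The function name is hypothetical, and rounding up to a whole number of units is an assumption for cases where the division is not exact.

```python
import math


def degree_of_parallelism(required_throughput, unit_throughput):
    """Number of parallel units needed to reach the required throughput,
    rounded up to a whole unit (the rounding rule is an assumption)."""
    if unit_throughput <= 0:
        raise ValueError("unit throughput must be positive")
    return math.ceil(required_throughput / unit_throughput)


# Figures from the slides: 500 MSPS required, 25 MHz per correlator at 0.4 V.
REQUIRED_SAMPLES_PER_S = 500_000_000
CORRELATOR_CLOCK_HZ = 25_000_000

parallelism = degree_of_parallelism(REQUIRED_SAMPLES_PER_S, CORRELATOR_CLOCK_HZ)
print(parallelism)
```

The result matches the L = 20 used in the design.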

 Trade off area for time by mapping to a parallel architecture

 Reducing acquisition time allows fewer Gold code repetitions in the preamble

 RF front-end and ADC can be turned off earlier

 Energy savings across the entire system

Acquisition Energy Reduction

[Figure: normalized acquisition (synchronization) energy versus parallelism M (1, 4, 8, 11, 16, 31), broken down into digital baseband, ADCs, baseband amplifiers, and RF front end. Annotation: reduce RF front-end and ADC energy, 14.7X overall reduction.]

[V. Sze, R. Blázquez, M. Bhardwaj, A. Chandrakasan, ICASSP 2006.]

Maximum Ratio Combiner (MRC)

 Demodulation uses a 5-fingered RAKE receiver

 A hard decision is made at the MRC output to resolve a bit

Parallelized Baseband Architecture
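Maximum ratio combining weights each RAKE finger output by the complex conjugate of that path's channel estimate and sums the results; a hard decision then maps the combined value to a bit. A minimal Python sketch follows. The slides do not state the modulation or the bit mapping, so the antipodal mapping (non-negative real part gives bit 1) is an assumption, and the function names and channel values are hypothetical.

```python
def mrc_combine(finger_outputs, channel_estimates):
    """Weight each finger by the conjugate of its channel estimate and sum."""
    if len(finger_outputs) != len(channel_estimates):
        raise ValueError("one channel estimate is needed per finger")
    return sum(c.conjugate() * r
               for r, c in zip(finger_outputs, channel_estimates))


def hard_decision(statistic):
    """Assumed antipodal mapping: non-negative real part -> 1, otherwise 0."""
    return 1 if statistic.real >= 0 else 0


# Hypothetical 5-path channel estimate, one tap per RAKE finger.
channel = [1 + 1j, 0.5 - 0.5j, -0.3j, 0.2 + 0j, -0.1 + 0.1j]

# Noise-free finger outputs for a transmitted +1 and a transmitted -1.
received_plus = [c * 1 for c in channel]
received_minus = [c * -1 for c in channel]

print(hard_decision(mrc_combine(received_plus, channel)))
print(hard_decision(mrc_combine(received_minus, channel)))
```

In the noise-free case the combined value equals the total channel energy (the sum of squared tap magnitudes) times the transmitted symbol, which is why stronger paths contribute more to the decision.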

[Figure: parallelized baseband block diagram. The 5-bit complex input from the ADCs feeds a serial-to-parallel retiming block and a correlator bank of M sub-banks with L correlators each (Correlator 1 through Correlator ML). The cross-correlation function goes to a peak detector, which finds n_peak for synchronization/timing control, and to channel estimation. Demodulation uses four 5-finger RAKE MRC units followed by a bit decoder that outputs the demodulated bits.]

L = 20; M = 31
Total number of correlators = 620

Energy-Area Tradeoff

[Figure: Energy-Area Tradeoff for Digital Baseband Processor. Normalized baseband processor energy (0 to 8) versus normalized baseband processor area (log scale, 1 to 1000). Annotations: 5.8x, 9.6x, "This Design".]

Energy-Area Tradeoff

[Figure: Receiver Energy - Digital Baseband Area Tradeoff. Normalized receiver energy (0 to 16) versus normalized digital baseband processor area (0 to 10). Annotations: 14.7x, 8.6x, "This Design".]

400-mV Baseband Processor

 STMicroelectronics standard-VT 90-nm CMOS process

 281,260 gates
   Includes 620 correlators and 4 maximum ratio combiners

 Die area: 10.94 mm² (die photo labeled 3.3 mm × 3.3 mm)

 Active area: 23%

Correct Operation at 400 mV

[Figure: oscilloscope traces of Data Ready, Output Data [1-0], and Output Clock]

 Oscilloscope plot shows correct functionality at 400 mV
   Note: I/O has a 1-V power supply

 Operating frequency of 25 MHz

 Four bits demodulated in parallel every 40-ns cycle

Energy Per Bit

 Power Measurements
   Acquisition 7 mW / Demodulation 1.7 mW
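The measured demodulation power can be turned into energy per bit by dividing by the 100 Mbps data rate: 1.7 mW / 100 Mbps is about 17 pJ/bit, consistent with the 16.8 pJ/bit quoted in this deck. Acquisition energy is paid once per packet, so its per-bit share shrinks as packets grow. The Python sketch below illustrates that amortization. The slides do not state the acquisition energy per packet; here it is inferred from the two quoted figures (16.8 pJ/bit for demodulation, 20 pJ/bit for a 4-kb packet), assuming that the 20 pJ/bit figure is the total and that 4 kb means 4096 bits. Both assumptions, and all function names, are illustrative.

```python
DEMOD_POWER_W = 1.7e-3        # measured demodulation power (from the slides)
DATA_RATE_BPS = 100e6         # data rate (from the slides)
DEMOD_PJ_PER_BIT = 16.8       # quoted demodulation energy per bit
TOTAL_PJ_PER_BIT_4KB = 20.0   # quoted energy per bit for a 4-kb packet
BITS_4KB = 4096               # assumption: "4 kb" means 4096 bits


def demod_energy_per_bit_pj(power_w, rate_bps):
    """Energy per bit in pJ from average power and data rate."""
    return power_w / rate_bps * 1e12


def acquisition_energy_pj():
    """Per-packet acquisition energy inferred from the two quoted figures,
    assuming the 4-kb figure is acquisition plus demodulation."""
    return (TOTAL_PJ_PER_BIT_4KB - DEMOD_PJ_PER_BIT) * BITS_4KB


def energy_per_bit_pj(packet_bits):
    """Demodulation energy per bit plus the amortized acquisition energy."""
    return DEMOD_PJ_PER_BIT + acquisition_energy_pj() / packet_bits


for bits in (512, 1024, 2048, 4096, 8192):
    print(bits, round(energy_per_bit_pj(bits), 1))
```

Under these assumptions the per-bit energy approaches the demodulation figure as the packet size grows, which is the trend the energy-per-bit breakdown is meant to show.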

[Figure: Breakdown of Baseband Processor's Energy Per Bit. Energy per bit (pJ, 0 to 50) versus size of packet (512 b, 1 kb, 2 kb, 4 kb, 8 kb), split into acquisition energy and demodulation energy; demodulation energy annotated as 16.8 pJ/bit.]

Power Gating

[Figure: Instantaneous Power. Power (mW, -5 to 20) versus time (µs, 0 to 400) across the intervals Sleep = 1, Sleep = 0, Sleep = 1, with recovery, acquisition, and demodulation phases marked.]

 Reduce leakage power using a high-VT "sleep" transistor to gate the leakage current when the block is idle

 Breakeven time: 137 µs

Conclusions

 Reduce energy to receive a UWB packet by
   Scaling to optimum supply voltage
   Mapping algorithm to parallel architecture

 Voltage scaling to ultra-low voltage (1 V to 0.4 V)
   5.8X reduction in energy per operation of correlators

 Reduced acquisition time
   14.7X reduction in receiver acquisition energy

 400-mV 100 Mbps UWB Baseband Processor
   16.8 pJ/bit for demodulation
   20 pJ/bit for a 4-kb packet

 Demonstrate high performance at ultra-low voltage
   Can be applied to other high-performance communication and signal processing applications

Acknowledgements

 Funding: DARPA and NSERC Fellowship

 Chip Fabrication: STMicroelectronics