A 0.4-V UWB Baseband Processor
Vivienne Sze Anantha P. Chandrakasan Massachusetts Institute of Technology
ISLPED – Portland, OR – August 28, 2007 OutlineOutline
UWB Specifications and System Architecture
Baseband Algorithm and Architecture
Parallelism for Energy Efficiency aggressive voltage scaling reduced receiver on-time
Chip Measurements
Conclusions UltraUltra --widebandwideband (UWB)(UWB) RadioRadio
UWB versus Narrowband Possible UWB Applications
Narrowband 500Mb WLAN UWB 50Mb freq Wireless Locationing/Tagging 5Mb USB & Multimedia 500kb time
UWB Narrowband 1m 10m 100m distance
Advantages of UWB communications include High Data Rate Excellent Multi-path Resolution Low Interference
Integrate UWB radios on battery operated devices
Need an energy efficient UWB System UWBUWB SystemSystem ArchitectureArchitecture
14 Channel Frequency Plan -41.3
-51.3
-61.3 Power density [dBm] density Power 3.1 10.6 Figure courtesy of D.Wentzloff Frequency [GHz]
Sampling Rate Data Rate 500 MSPS 100 Mbps
[F.S. Lee et al., ICUWB 2006] PacketPacket StructureStructure (PHY(PHY layer)layer)
PREAMBLE PAYLOAD
31-bits 10ns
...
40ns
Gold Gold Gold Gold Gold ... Payload Bits Code Code Code Code Code
Packet Begins
State 1 State 1 State 2 Channel Acquisition Demodulation Estimation
Receiver Receiver Turns ON Turns OFF
Goal : Reduce overhead energy (acquisition)
Majority of acquisition energy spent on computation of cross-correlation CrossCross --CorrelationCorrelation ComputationComputation
619 y[n] = ∑h[k]× x[k − n] k=0
Points can be computed independently BasebandBaseband ArchitectureArchitecture
10100110
Cross-correlation requires a fixed number of operations
Reduce energy of each operation in order to reduce baseband energy
Map operations to architecture that reduces system energy EnergyEnergy EfficiencyEfficiency UsingUsing ParallelismParallelism
Exploit TWO forms of parallelism in Correlator Bank
Ultra-Low Voltage Operation - Maintain Throughput (L) Reduce supply voltage Minimize energy of baseband processor
Reducing Acquisition Time - Parallelized Computation (M) Reduce receiver on-time Minimize energy of entire receiver (system) BasebandBaseband EnergyEnergy SavingsSavings
Correlator Architecture y[n] = ∑ h[k]× x[k − n]
Correlators compute the cross-correlation function
Voltage scaling to reduce energy per operation
Parallelize to maintain throughput of 500 MSPS
Designed and simulated in a 90-nm process BasebandBaseband EnergyEnergy SavingsSavings
0 10 At the minimum energy point MinimumMinimum Energy Energy Point Point of 0.3 V 9X energy reduction
Total Energy Energy Set clock frequency to 25 MHz 10 -1 (preamble PRF)
DynamicActive EnergyEnergy Parallelize by L=20 to maintain 500 MSPS throughput 10 -2 Leakage Energy Leakage Energy Need to raise voltage to 0.4 V to Energyper operation (normalized) achieve 25 MHz
10 -3 0 0.2 0.4 0.6 0.8 1 At 0.4 V, reduce energy per V (V) DD operation by 5.8X = + E total E active E leakage
∝ 2 E ∝ T V I Eactive Ceff VDD leakage period DD leak SummarySummary ofof MethodologyMethodology
Select the optimal degree of parallelism that minimizes energy consumption meets performance constraints
1. Determine VDD MEP (at minimum energy point)
2. Determine delay and throughput at VDD MEP
3. Divide required throughput by the throughput at VDD MEP to obtain the necessary degree of parallelism (L) SystemSystem EnergyEnergy SavingsSavings
Trade-off area for time by mapping to parallel architecture
Reducing acquisition time allows for fewer number of Gold Code repetitions in the preamble
RF front-end and ADC can be turned off earlier
Energy savings across the entire system AcquisitionAcquisition EnergyEnergy ReductionReduction
Digital Baseband ADCs Baseband Amplifiers RF front end
1.2
1.0
0.8 Reduce RF front-end 0.6 and ADC energy! 14.7X overall reduction 0.4 (normalized)
0.2 Synchronization energy Synchronization energy Acquisition Energy Acquisition 0.0 1 4 8 11 16 31 Parallelism (M)
[V. Sze, R. Blázquez, M. Bhardwaj, A. Chandrakasan, ICASSP 2006.] MaximumMaximum RatioRatio CombinerCombiner (MRC)(MRC)
Demodulation uses a 5-fingered RAKE receiver A hard decision is made at the output MRC to resolve a bit ParallelizedParallelized BasebandBaseband ArchitectureArchitecture
Demodulation Bit Decoder 5-bit complex input from
Serial to Parallel 5-finger RAKE MRC Retiming Block Correlator Bank 5-finger RAKE MRC Demodulated Correlator Sub-bank 1 Bits 5-finger RAKE MRC
ADCs Correlator 1 Correlator 2 5-finger RAKE MRC … … … Channel Correlator L Estimation
Correlator Sub-bank 2 Peak
Correlator L+1 DetectorPeak
……… Detector Correlator L+2 Cross- … … … Correlation Function Correlator 2L
… … … npeak n Correlator Sub-bank M Cross-Correlation Correlator (M-1)L+1 Function Correlator (M-1)L+2 npeak … … …
Correlator ML Synchronization/ L = 20; M = 31 Timing Control Total # of correlators = 620 EnergyEnergy --AreaArea TradeoffTradeoff
Energy-Area Tradeoff for Digital Baseband Processor
8
7
6
5 5.8x 4
3 This Design
2 Processor Energy Processor Normalized Baseband Baseband Normalized 1 9.6x 0 1 10 100 1000 Normalized Baseband Processor Area EnergyEnergy --AreaArea TradeoffTradeoff
Receiver Energy - Digital Baseband Area Tradeoff
16
14
12
10 14.7x 8 This Design 6
4
2 Normalized Receiver Energy 8.6x 0 0 2 4 6 8 10 Normalized Digital Baseband Processor Area 400400 --mVmV BasebandBaseband ProcessorProcessor
STMicroelectronics standard-VT 90-nm CMOS process 281,260 gates Includes 620 Correlators & 4 Maximum Ratio mm 3.3 Combiners Die area: 10.94mm 2 Active area 23% 3.3 mm CorrectCorrect OperationOperation atat 400400 mVmV
Data Ready
Output Data [1-0]
Output Clock
Oscilloscope plot shows correct functionality at 400mV Note: I/O has a 1V power supply
Operating frequency of 25 MHz
Four bits demodulated in parallel every 40-ns cycle EnergyEnergy PerPer BitBit
Power Measurements Acquisition 7 mW / Demodulation 1.7 mW
Breakdown of Baseband Processor's Energy Per Bit
50
45
40
35 30 Acquisition Energy 25 Demodulation Energy 20
Energy per bit (pJ) bit per Energy 15 10 16.8 pJ/bit 5
0 512b 1kb 2kb 4kb 8kb Size of Packet (bits) PowerPower GatingGating
Instantaneous Power
20 Sleep = 1 Sleep = 0 Sleep = 1
15 Recovery Acquisition 10 Demodulation 5 Power(mW)
0 0 50 100 150 200 250 300 350 400
-5 Time (us)
Reduce leakage power using a high V T “sleep” transistor to gate the leakage current when block is idle
Breakeven time 137 µs ConclusionsConclusions
Reduce energy to receive a UWB packet by Scaling to optimum supply voltage Mapping algorithm to parallel architecture Voltage scaling to ultra-low voltage (1 V 0.4 V) 5.8X reduction in energy per operation of correlators Reduced acquisition time 14.7X reduction in receiver acquisition energy 400-mV 100 Mbps UWB Baseband Processor 16.8 pJ/bit for demodulation 20 pJ/bit for a 4-kb packet Demonstrate high performance at ultra-low voltage Can be applied to other high performance communication and signal processing applications AcknowledgementsAcknowledgements
Funding: DARPA and NSERC Fellowship
Chip Fabrication: STMicroelectronics