PWRficient Architecture in Critical Embedded Systems
Breakthrough Performance/Watt Multicore Processor
Peter Bannon, VP of Architecture and Verification
Bus & Boards 2007, January 15th, 2007
1/15/07 Copyright P.A. Semi, Inc.--Bus and Boards 2007
Agenda
- The Power Problem
- PWRficient™ Platform Processors
  - PWRficient 1682M platform processor
  - PA6T core
  - CONEXIUM coherent interconnect architecture
  - ENVOI I/O System
  - Frequency & Power Summary
- Example System Designs
- Low Power Design
- Conclusion
The Power Problem
Energy Efficiency is a Dominant Industry Theme
“We are at the point of new wealth creation when it comes to green technology”
Bill Joy, co-founder, Sun Microsystems
“Customers continue to demand solutions that focus on low-power consumption and quieter operation.”
Bob Brewer, corporate vice president, Desktop Division, AMD; from Dec 2006 press release
“The industry is going through the most profound shift in decades, moving to an era where performance and energy efficiency are critical in all market segments and all aspects of computing”
Paul Otellini, President & CEO, Intel; IDF, September 2006
P.A. Semi’s green contribution: the world’s most power-efficient high-performance processor
Useful Power Is Decreasing
Useful power peaked in 2003
[Figure: power and leakage by year, 1992 to 2008; vertical axis 0 to 140.]
The Escalating Power Problem
- Shrinking device geometries provide
  - Faster gates
  - Increased density
BUT
- Moore’s Law means more power
EXCESSIVE POWER DISSIPATION LIMITS USABLE GATE CAPACITY
[Figure: curves labeled Power Density and Area/Gate, with a Capacity Gap, plotted against process technology: 250nm, 180nm, 130nm, 90nm, 65nm, 45nm.]
3-Year Operating Cost vs. Server Cost
[Figure: 3-year operating cost and server cost by year, 1992 to 2008; vertical axis 0 to 7000.]
Choosing Power Over Ultimate Performance
[Figure: Watts (0 to 120) versus Performance (80 to 100), marking an old design point and a new design point.]
- Look for the exponential opportunities in power/performance
- Give up some performance for substantial power decrease
Design Choices for Low Power
- Design choices at several levels favor low power
  - CMOS process target
  - Circuit design style and sizing
  - Micro-architecture features
- Integration
  - Saves interface power
- Management
  - Voltage/frequency scaling
  - Multiple power planes for optimal voltage selection per region
  - Clock gating to reduce power of idle circuits
  - Active and pre-charge standby modes in DRAM array
  - PCIe power saving modes
  - Nap and Sleep modes for CPU
PWRficient Family Overview
Platform Processor
P.A. Semi Introduction
- Santa Clara-based fabless processor company
  - Developing Power Architecture™ microprocessors under license from IBM
  - Industry veterans lead 150-strong team
  - Pioneering methodology for high-performance, low-power design
- Founded in 2003, broke out of stealth mode in October 2005
  - Started design in September 2004
  - Unveiled PWRficient™ Platform Processors at MPF 2005
  - Now sampling family of high-performance, 64-bit, multicore SoC devices
  - High-speed I/O integration
  - Fabricated on 65nm technology
  - Target markets
    - Telecom, datacom, storage
    - Military/aerospace embedded computing and control
- Vision to dramatically reposition the ‘performance per watt’ experience for high-performance embedded computing.
Power Architecture Processors (PowerPC)
- Will be in 50% of the world’s automobiles by 2007
- Runs 4 of the world’s top 5 fastest supercomputers and 44 of the top 100
- Dominant in major embedded markets
  - automotive control
  - networking & communications
  - gaming consoles
- 64% of VME and CompactPCI SBCs
  - North America and Europe (Source: VDC)
PWRficient Family
- PA6T core: the CPU
  - Power Architecture compliant, 64-bit, high-performance FPU & VMX
  - 7W @ 2GHz worst-case power dissipation
  - Provides a high-performance alternative to leverage existing Power Architecture software (from IBM, Freescale, AMCC)
- CONEXIUM™: the on-chip coherent interconnect
  - Scalable cross-bar interconnect
  - 1–8 SMP cores
  - 1 or 2 L2 caches, sized 512KB–8MB
  - 1–4 64-bit DDR2 memory controllers
- ENVOI™: the I/O system
  - SERDES I/O: PCI Express™, XAUI, SGMII, SATA
  - Offload engines: TCP/IP, iSCSI, cryptography, and RAID
  - Support I/O: boot bus, UARTs, SMBus, GPIOs
PWRficient PA6T-1682M Block Diagram
[Block diagram: two 2GHz PA6T cores, each with a 64KB I-Cache and a 64KB D-Cache; 2MB L2 cache; two DDR2 controllers; CONEXIUM™ Interchange; Power Control, System Control, Interrupt/GPIO, TTM* and PTM†; I/O Bridge, DMA, Cache, and Offload Engines; ENVOI™ Intelligent I/O with Octal PCI Express, Dual 10GbE XAUI, Quad GbE SGMII, Configurable SERDES (24), SMBus (3), UART (2), and Boot Bus.]
*Transaction trace memory †Peripheral trace memory
PWRficient PA6T Core
- Power Architecture rev 2.04
  - 2.0GHz 64-bit core with FP and VMX
  - Hypervisor and virtualization support
  - 64KB L1 I Cache & 64KB D Cache
  - Idle and sleep modes
  - 11M logic transistors
- Super-scalar, out-of-order design
  - Quad fetch into 64-entry scheduler
  - Issue up to 3 per cycle
  - Strongly ordered memory model
  - Issue out of order, retire in order
  - 15 transactions in flight
- Max Power
  - 7W @ 2.0GHz (“thermal virus”)
- Performance
  - Estimated 4,400 Dhrystone MIPS per core
[Block diagram labels: Instruction Fetcher/Decoder, Branch Prediction, I Cache, Instruction Scheduler, D Cache, LSU, BIU, MMU, Reg File, ALU, FPU, VMX Unit.]
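The per-core figures on this slide can be turned into rough efficiency ratios with simple arithmetic. The following is a minimal sketch that uses only the slide's numbers (an estimated 4,400 Dhrystone MIPS per core, 2.0GHz, 7W worst case); the function and constant names are illustrative and do not come from P.A. Semi. Because it divides an estimated throughput by a worst-case power figure, the result is a rough, conservative ratio rather than a measured benchmark.

```python
# Figures taken from the slide; all names here are illustrative only.
DMIPS_PER_CORE = 4400      # estimated Dhrystone MIPS per core
CLOCK_MHZ = 2000           # 2.0GHz core clock
MAX_CORE_POWER_W = 7.0     # worst-case ("thermal virus") power per core


def dmips_per_mhz(dmips: float, clock_mhz: float) -> float:
    """Throughput normalized by clock frequency."""
    return dmips / clock_mhz


def dmips_per_watt(dmips: float, power_w: float) -> float:
    """Throughput normalized by power."""
    return dmips / power_w


if __name__ == "__main__":
    print(f"{dmips_per_mhz(DMIPS_PER_CORE, CLOCK_MHZ):.2f} DMIPS/MHz")
    print(f"{dmips_per_watt(DMIPS_PER_CORE, MAX_CORE_POWER_W):.0f} DMIPS/W")
```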
Processor Pipeline
Multi-Issue, Out-of-Order, Superpipelined
[Pipeline diagram: 19 numbered stages. Stage labels: IP IF IC DE UC MP SE SP IS RR AL AD RT WB, with a second path AG TR DC DD RT WB. Unit labels with the numbers shown beside them: FPU Arithmetic 8, FPU Compare 4, VMX Float 8, VMX Complex Integer 6, VMX Permute 2, VMX Easy Int 2. Other labels include I-Cache, IERAT, uIERAT, Dec, uDec, uSeq, Map, Pick, Issue, Integer/FPU/VMX register files, D-Cache, SLB, TLB, HTW, ERAT, MRB, RSB, Load/Store Queue, and CONEXIUM.]
- Reduced instruction issue width to save power
- Longer pipeline to accommodate slower wire delays
- Increased branch prediction resources to reduce false speculation
DDR2 Memory Controller
- ~5 Watts for active DRAM rank
- ~1 Watt for standby DRAM rank
- DRAM is a significant percentage of system power
  - SOC 13 Watt typical
  - 2 x 2 DIMM memory is ~12 Watts
- DRAM controller manages power
  - Extensive clock gating within the controller
  - Use of DDR2 power saving modes
    - Precharge standby
    - Active standby
  - Sorts requests to maximize power saving
[Diagram: PWRficient with DDR2 DIMMs on each side and 10GbE ports.]
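The ~12 Watt memory figure is consistent with the per-rank numbers under one plausible reading. The sketch below assumes that "2 x 2 DIMM" means four ranks in total, with two active and two in standby at a given moment. That reading is an assumption, not something the slide states, and the function names are illustrative.

```python
ACTIVE_RANK_W = 5.0    # ~5 W per active DRAM rank (from the slide)
STANDBY_RANK_W = 1.0   # ~1 W per standby DRAM rank (from the slide)
SOC_TYPICAL_W = 13.0   # SoC typical power (from the slide)


def dram_power_w(active_ranks: int, standby_ranks: int) -> float:
    """Approximate DRAM power for a mix of active and standby ranks."""
    if active_ranks < 0 or standby_ranks < 0:
        raise ValueError("rank counts must be non-negative")
    return active_ranks * ACTIVE_RANK_W + standby_ranks * STANDBY_RANK_W


def dram_share_of_total(active_ranks: int, standby_ranks: int) -> float:
    """DRAM power as a fraction of SoC-plus-DRAM power."""
    dram = dram_power_w(active_ranks, standby_ranks)
    return dram / (dram + SOC_TYPICAL_W)


if __name__ == "__main__":
    # Assumed split: two ranks active, two in standby.
    print(dram_power_w(2, 2), "W with two of four ranks active")
    print(dram_power_w(4, 0), "W with all four ranks active")
    print(f"{dram_share_of_total(2, 2):.0%} of SoC-plus-DRAM power")
```

Under the same assumption, keeping all four ranks active would cost about 20 Watts, which is why sorting requests to leave ranks in standby matters.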
ENVOI: Intelligent I/O
- Interoperability between PCI Express, packets, DMA, and memory
  - Centralized DMA model
  - Shared resources save area and power, which enables integration
  - Idle functions are clock gated or powered off
[Diagram: CONEXIUM™ Interchange connected through the I/O Bridge and the DMA, Cache, Offload block to ENVOI™ Intelligent I/O: 8 x PCI Express, 2 x 10GbE XAUI, 4 x GbE SGMII.]
- 24 multi-mode SERDES
- Boot time configurable
- Flexible mapping to MACs & PCI Express
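One way to reason about the configurable SERDES is as a lane budget. The sketch below checks a proposed interface mix against the limits named on the slide (24 lanes, up to 8 PCI Express links, 2 XAUI ports, 4 SGMII ports). It relies on the standard lane counts of four per XAUI port and one per SGMII port. The helper names are hypothetical, and the real device may impose further mapping constraints that this check ignores.

```python
from typing import List

TOTAL_SERDES_LANES = 24    # from the slide
MAX_PCIE_LINKS = 8         # "8 x PCI Express"
MAX_XAUI_PORTS = 2         # "2 x 10GbE XAUI"
MAX_SGMII_PORTS = 4        # "4 x GbE SGMII"
LANES_PER_XAUI = 4         # XAUI is a four-lane interface
LANES_PER_SGMII = 1        # SGMII uses a single SERDES lane


def lanes_used(pcie_widths: List[int], xaui_ports: int, sgmii_ports: int) -> int:
    """Total SERDES lanes consumed by a proposed configuration."""
    return (sum(pcie_widths)
            + xaui_ports * LANES_PER_XAUI
            + sgmii_ports * LANES_PER_SGMII)


def problems(pcie_widths: List[int], xaui_ports: int, sgmii_ports: int) -> List[str]:
    """Return human-readable reasons the configuration does not fit."""
    found = []
    if len(pcie_widths) > MAX_PCIE_LINKS:
        found.append("too many PCI Express links")
    if xaui_ports > MAX_XAUI_PORTS:
        found.append("too many XAUI ports")
    if sgmii_ports > MAX_SGMII_PORTS:
        found.append("too many SGMII ports")
    total = lanes_used(pcie_widths, xaui_ports, sgmii_ports)
    if total > TOTAL_SERDES_LANES:
        found.append(f"{total} lanes requested, {TOTAL_SERDES_LANES} available")
    return found


if __name__ == "__main__":
    # Hypothetical mix: x16, x4, and x1 PCI Express links plus two SGMII ports.
    print(lanes_used([16, 4, 1], 0, 2), problems([16, 4, 1], 0, 2))
    # Hypothetical mix that overruns the lane budget.
    print(lanes_used([8, 8, 4], 2, 0), problems([8, 8, 4], 2, 0))
```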
Server-Class Reliability Features
- Guiding principles
  - Arrays protected based on predicted soft error rates
  - Machine check architecture logs all detected errors
  - Interrupt notification based on programmable thresholds
  - Error propagation ensures that erroneous data is never used
  - I/O protocol engines provide in-band and/or out-of-band error notification where possible per protocol definition
- ECC protection
  - L1 data cache data
  - L2 tags and data
  - I/O cache and buffers
  - DDR memory controller supports ECC memories
    - Includes scrubbing engine
    - Combined CRC/ECC for single bit correct, double detect, chip fail detect
- Parity protection
  - CONEXIUM bus
  - L1 instruction cache tags and data
  - L1 data cache tags
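To illustrate what "single bit correct, double detect" means, here is a textbook extended Hamming (8,4) code in Python. It is only an illustration of the principle: the slide does not describe the actual code used in the PWRficient arrays or memory controller (which combines CRC and ECC and adds chip-fail detection), and every name below is hypothetical.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into 8 bits: Hamming(7,4) plus an overall parity bit."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("nibble must be in 0..15")
    code = [0] * 8                      # index 0 = overall parity, 1..7 = Hamming positions
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total number of ones even
    return code


def decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (nibble, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions of set bits
    parity_odd = sum(code) % 2 == 1
    fixed = list(code)
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        fixed[syndrome] ^= 1                # single error; syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        return None, "uncorrectable"        # even parity but nonzero syndrome: two bits flipped
    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1                            # inject a single-bit error
    print(decode(word))
    word[2] ^= 1                            # inject a second error
    print(decode(word))
```

A scrubbing engine of the kind the slide mentions would periodically read memory and write back corrected data, so that single-bit errors are repaired before a second error in the same word makes them uncorrectable.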
Example System Block Diagrams
Advanced Mezzanine Card (AMC)
- High-performance, low power
  - Dual-core 2GHz under 40W in a single-width mid-size module
- Flexible I/O options
  - Support varying widths of PCIe to card edge
  - Support XAUI or SGMII to the card edge
  - Support GbE or XAUI to the front panel
- Multiple boot options
  - LPC flash
  - SPI flash
[Block diagram labels: PWRficient, DDR2, Hot Swap, SM Bus, IPMI µC, UART, RJ11 UART, SGMII, XAUI, 4x PCIe, 8x PCIe, Compact Flash, LPC Flash, Front Panel, Rear Card Edge.]
ATX form-factor Motherboard
- Extensive I/O
  - Four PCI slots
    - One x16 PCI Express
    - One x4 PCI Express
    - One x1 PCI Express
    - PCI 32/33MHz
  - Two GbE ports
  - Separate XAUI option card for x4 PCIe slot
  - IDE port
- Memory sockets
  - Four DIMM sockets
  - CompactFlash socket
  - Boot ROM emulator socket
- Standard JTAG Emulators
  - Corelis JTAG controller
  - WindRiver ICE and Probe
[Board diagram labels: 2x USB, 2x USB OTG, JTAG, 2x UART, 2x GbE, x16 PCIe, x4 PCIe, x1 PCIe, PCI, Power, DDR2, ROM, IDE, Compact Flash.]
Higher-end fileserver/NAS
- Full Redundancy
  - Requires use of SAS discs
  - No single point of failure (for RAID discs)
- PCIe Links to SAS controller
- PCIe/XAUI Link between CPUs
  - Allows metadata log from one to other
- Memory controller supports backup battery
  - Can use memory for file log
  - Care taken for reset and power fail
- Assist engines
  - TCP assist
  - CRC engine or signatures for digest
  - RAID XOR option
[Block diagram: two PWRficient processors, each with DDR2 memory and a backup battery, SGMII and XAUI ports, and PCIe links to SAS.]
Low-Power Design
Power Efficient
POWER IS FIRST-ORDER DESIGN PRINCIPLE
[Annotated die diagram with blocks labeled CPU, L2, MC, IO, SERDES, PwrCtl, and PADS.]
- > 25,000 gated clocks
- MC optimizes DRAM power
- L2 cache and I/O are coherent with core off
- Separate power rails for cores, I/O pads
- Turn off unused I/Os
- Dynamic power-control hardware and software voltage/frequency management
Fine-Grained Clock Gating Reduces Dynamic Power
[Figure: % of flops clocked (vertical axis, 10 to 80) over time, with regions labeled Coarse-Grained Clock Gating, Thermal Virus, and Normal Operation.]
Device-Specific Vdd Reduces Static and Dynamic Power
- Conventional approach
  - Operate at 1.1V across entire process range
  - Fast parts tend to be very leaky
[Figure: leakage power and dynamic power (vertical axis 0 to 45) for FF @ 1.1V, TT @ 1.1V, SS @ 1.1V.]
- P.A. Semi approach
  - Operate at device-specific optimal Vdd
  - Partition power plane for optimal voltage selection per region
  - Enables full process range for power yield
[Figure: leakage power and dynamic power (vertical axis 0 to 45) for FF @ 0.8V, TT @ 0.9V, SS @ 1.1V.]
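The benefit of lowering Vdd on fast parts can be estimated with the usual first-order CMOS model, in which dynamic power is proportional to C·V²·f. The sketch below holds capacitance and frequency fixed and compares each corner's device-specific voltage with the conventional 1.1V. It covers the dynamic component only: leakage also falls with voltage but is not modeled here, and the bar heights from the slide's chart are not used. The names are illustrative.

```python
REFERENCE_VDD = 1.1   # conventional single operating voltage (from the slide)

# Device-specific voltages per process corner (from the slide's chart labels).
DEVICE_SPECIFIC_VDD = {"FF": 0.8, "TT": 0.9, "SS": 1.1}


def relative_dynamic_power(vdd: float, reference_vdd: float = REFERENCE_VDD) -> float:
    """Dynamic power relative to the reference voltage, assuming P is proportional to V^2."""
    if vdd <= 0 or reference_vdd <= 0:
        raise ValueError("voltages must be positive")
    return (vdd / reference_vdd) ** 2


if __name__ == "__main__":
    for corner, vdd in DEVICE_SPECIFIC_VDD.items():
        ratio = relative_dynamic_power(vdd)
        print(f"{corner} @ {vdd}V: {ratio:.0%} of the dynamic power at {REFERENCE_VDD}V")
```

Under this model a fast (FF) part run at 0.8V draws a little over half the dynamic power it would at 1.1V, which is the effect the second chart illustrates.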
Voltage/Frequency Scaling
- Multiple power planes for maximum control
  - VID regulators for cores, SoC
  - Support for either or both cores powered down
  - Low power I/O coherent nap mode
[Diagram: PA6T-1682M fed by three VRMs: VID_CP0 (6) and VDD_CP0 for one PA6T core, VID_CP1 (6) and VDD_CP1 for the other PA6T core, VID_SoC and VDD_SoC for the SoC.]
Part             Max Freq   Typ    Max
PA6T-1682M-FCN   2.0GHz     13W    25W
PA6T-1682M-FCG   1.5GHz     8W     15W
PA6T-1682M-FCD   1.0GHz     6W     10W
I/O coherent nap: 2W*
*PA6T-1682M-FCN nap power may be higher
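The frequency and power table lends itself to a simple selection rule: pick the fastest part whose power fits a given budget. The sketch below assumes the "Typ" and "Max" columns are typical and maximum power for the processor alone, which is the natural reading of the flattened table but is not stated explicitly. The helper and field names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Sku(NamedTuple):
    part: str
    max_freq_ghz: float
    typ_power_w: float
    max_power_w: float


# Values copied from the slide's table.
SKUS: List[Sku] = [
    Sku("PA6T-1682M-FCN", 2.0, 13.0, 25.0),
    Sku("PA6T-1682M-FCG", 1.5, 8.0, 15.0),
    Sku("PA6T-1682M-FCD", 1.0, 6.0, 10.0),
]


def fastest_within(budget_w: float, use_max_power: bool = True) -> Optional[Sku]:
    """Fastest SKU whose max (or typical) power fits the budget, else None."""
    def power(sku: Sku) -> float:
        return sku.max_power_w if use_max_power else sku.typ_power_w

    candidates = [sku for sku in SKUS if power(sku) <= budget_w]
    return max(candidates, key=lambda sku: sku.max_freq_ghz, default=None)


if __name__ == "__main__":
    print(fastest_within(20.0))                        # sized against max power
    print(fastest_within(13.0, use_max_power=False))   # sized against typical power
    print(fastest_within(5.0))                         # nothing fits
```

Sizing against maximum power is the conservative choice for thermal design. Sizing against typical power suits energy estimates, but it can leave no margin under worst-case workloads.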
Comparison with 65nm 2GHz processors
- PWRficient 1682M takes less total power than competing support chipsets alone
- 3x - 4x advantage in total system power* compared to contemporary 65nm dual-core processors
- Reduced system cooling and chip count result in improved system reliability
[Figure: bar chart, 0 to 120W, of power for CPUs @ 2GHz and for I/O (Northbridge, Southbridge, 10GbE MACs) across the compared processors, with a 25W label.]
Summary
- PWRficient processors set the bar for ultimate energy conservation at full performance
  - Lowest total energy for a computation or transaction
- Power conservation is a key initiative from architectural concept through design implementation
- High levels of integration enable low latency, high bandwidth interfaces to cache, memory, & IO
Contact P.A. Semi
- For further information, please visit the P.A. Semi web site at:
www.pasemi.com
- Kindly direct sales inquiries to:
- Full contact information:
P.A. Semi, Inc. 3965 Freedom Circle, Floor 8 Santa Clara CA 95054-1203 USA Main: 408.200.4500 Fax: 408.200.4501
Thank You
The P.A. Semi name and the P.A. Semi logo and combinations thereof are trademarks of P.A. Semi, Inc. The Power name is a trademark of International Business Machines Corporation, used under license therefrom. SPECint and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation (SPEC). All other trademarks are the property of their respective owners.