HyperTransport Extending Technology Leadership
International HyperTransport Symposium 2009
February 11, 2009
Mario Cavalli General Manager HyperTransport Technology Consortium
Copyright HyperTransport Consortium 2009 HyperTransport Extending Technology Leadership
HyperTransport and Consortium Snapshot
Industry Status and Trends
HyperTransport Leadership Role
HyperTransport Snapshot
Low Latency, High Bandwidth, High Efficiency Point-to-Point Interconnect Leadership
CPU-to-I/O CPU-to-CPU CPU-to-Coprocessor
Adopted by Industry Leaders in a Wider Range of Applications than Any Other Interconnect Technology
Consortium Snapshot
Formed 2001
Controls, Licenses, Promotes HyperTransport as Royalty-Free Open Standard
World Technology Leaders among Commercial and Academic Members
Newly Elected President: Mike Uhler, VP Accelerated Computing, Advanced Micro Devices
Industry Status and Trends
Global Economic Downturn
Tough State of Affairs for All Industries
Consumer Markets Crippled, with Long-Term Recovery
Commercial Markets Strongly Impacted
Consequent Business Focus
Cost Effectiveness
No Redundancy
Frugality
Downturn Breeds Opportunities
Reinforced Need for More Optimized, Cost-Effective Computing Infrastructure
Good for HPC Sector
Creating Demand for New Technology
Delivering:
• More Value for Same Power and Cost
• Same Value for Less Power and Cost
• Best Investment Preservation
Minimized Total Cost of Ownership Through Better:
• Performance and Power Efficiency
• Resource Flexibility and Adaptability
• System Virtualization → Consolidation
Producing New Computing Trends
Cloud Computing → Hosted Software, Software as a Service (SaaS) Replace Costly In-House Infrastructure and Management Resources
Infrastructure Centralization Demands Efficient Data Centers, Server Farms
Producing New Computing Trends (cont.)
Netbook over Notebook / Desktop
New? No Innovative? No Same for Less? No Less for Much Less? Yes!
Good Enough if Budget Tight? Yes! Right-Time, Right-Place Products? Right!
HyperTransport Leadership Role
Answers Market Trend Expectations
With Core Values
• Leading Performance
• Full Scalability
• Power Efficiency
• Low Design Cost
• Market-Proven Solidity
• Vast Product Ecosystem
Continued Technology Progression
With Expanding Market Presence
[Timeline, year axis 2001 to 2009: HT 1.0, HT 1.1, HT 2.0, HTX, HT 3.0, HT 3.1, HTX3, HNC 1.0 (Note 3)]
17.7M HT-Based Systems Shipped (Note 1)
62.7M HT-Based Systems Shipped (Note 2)
Note 1: by end of 2003 – Source InStat
Note 2: by end of 2008 – Source InStat
Note 3: High Node Count HT Specification 1.0 - Accessible/Usable by HTC Promoter and Contributor Members Only
HT 3.1 Specification
Keeps HT Ahead of Industry Requirements
Feature      Current Use   HT 3.1 Max   Max Headroom
Clock Rate   2.0 GHz       3.2 GHz      60%
Bandwidth    16 GB/s       51.2 GB/s    220%
Link Width   16-bit        32-bit       100%
Solidifies HT Leadership
Reinforces HT ROI
The Only 32-Bit-Capable Processor Interconnect in Industry
[Chart: peak bandwidth vs. clock (2.6 GHz, 2.8 GHz, 3.0 GHz, 3.2 GHz). HT 3.0: 20.8 GB/s (16-Bit), 41.6 GB/s (32-Bit). HT 3.1: 25.6 GB/s (16-Bit), 51.2 GB/s (32-Bit)]
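The headline numbers on this slide can be reproduced with simple arithmetic. The sketch below assumes that peak bandwidth is the clock rate, doubled for double-data-rate signaling, multiplied by the link width in bytes and counted in both directions. The function names are illustrative only and are not part of any HyperTransport specification.

```python
def ht_peak_bandwidth_gbytes(clock_ghz: float, width_bits: int) -> float:
    """Aggregate (both directions) peak bandwidth of one HT link, in GB/s.

    Assumptions (not taken from the specification text): two transfers
    per clock (DDR) and bandwidth summed over both directions.
    """
    transfers_gt_per_s = clock_ghz * 2        # DDR: two transfers per clock
    bytes_per_transfer = width_bits / 8       # link width expressed in bytes
    per_direction = transfers_gt_per_s * bytes_per_transfer
    return per_direction * 2                  # count both directions


def headroom_percent(current: float, maximum: float) -> float:
    """Extra capacity of `maximum` relative to `current`, as a percentage."""
    return (maximum / current - 1) * 100


# Reproduce the chart points for the lowest and highest clocks shown.
for clock in (2.6, 3.2):
    for width in (16, 32):
        bw = ht_peak_bandwidth_gbytes(clock, width)
        print(f"{clock} GHz, {width}-bit link: {bw:.1f} GB/s")

# Reproduce the "Max Headroom" column.
print(f"Clock headroom:     {headroom_percent(2.0, 3.2):.0f}%")
print(f"Bandwidth headroom: {headroom_percent(16, 51.2):.0f}%")
print(f"Width headroom:     {headroom_percent(16, 32):.0f}%")
```

Under these assumptions the 16 GB/s "current use" figure corresponds to a 16-bit link at 2.0 GHz, and the 51.2 GB/s maximum to a 32-bit link at 3.2 GHz.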
HTX3™ Specification
3x Bandwidth of HTX™ Connector Standard
• HT3.0 Performance
• HT3.0 Link Splitting Support
• More Power Mgmt. Features
• 100% Backward Compatibility
For Highest Performance Subsystems
High Node Count HT Specification 1.0
Enables Scalable HPC Systems and Clusters with Low Latency Non-Coherent Shared Memory Architecture
[Figure: Server 1, Server 2, ... Server n and memory nodes M1 to M8 and Mx, connected through a Direct Network / Switched Network]
High Node Count HT Specification 1.0 (cont.)
Answers Ever Compounding On-Chip + In-System Addressing Challenge
[Chart: Exponential Number of CPU Cores and Exponential Number of Clusters/Subclusters, with a "You are Here" marker]
High Node Count Specification 1.0 (cont.)
Supports Global Sharing of Localized Data Storage Subsystems, Especially High-Density DRAM and Low Power Flash-Based Memory Subsystems
[Figure: Server X, Server Y and Server Z sharing High-Density DRAM and Flash Memory over the Network]
High Node Count Specification 1.0 (cont.)
Best System and Performance Scalability
Minimized Power Consumption
Optimized Total Cost of Ownership
Mature Stability, Mission-Critical Reliability
Field-Proven Dependability for Demanding Markets
63 Million HT-Powered Products by end of 2008
2007 Capture   Market                  2007 Yr/Yr Growth
8%             Defense Applications    17%
32%            Top500 Supercomputers   28%
11%            Core Routers            1.2%
22%            Edge Routers            34%
15%            SAN                     11%
23%            Servers                 38%
Source: InStat
Ever Expanding Product Ecosystem
• From HT IP to HT Software
• 12 HT-Based Processor Brands
• Fosters Technology Strength
• Widespread Market Utilization
X86 Computing
Graphics
Security
Packet
Media
Comm
Acceleration
System Virtualization
Expanding Product Ecosystem (cont.)
New Godson Multi-Core Server-Class CPU
• Petascale Performance Target by 2010
• Backed by China’s Government
• MIPS-Based with 200+ More Instructions for x86 Translation and Acceleration
• 16 GFLOPS at 1 GHz and 10 W of Power
• Earlier versions (non-HT) produced by ST Microelectronics and sold to 40 companies in set-top boxes, laptops, etc.
• ~200 developers working on Godson HW, ~100 on SW and Compilers
Institute of Computing Technology, Chinese Academy of Sciences
HyperTransport Book
Covers All HT Link and HTX Specifications
700 Pages of Must-Have Tutorial
Co-Authored by HTC’s Brian Holden
Available Online from MindShare (www.mindshare.com) in Paper and eBook Formats
Thank You!
Mario Cavalli General Manager HyperTransport Technology Consortium
Corollary Information Not Part of Live Presentation
HyperTransport Everywhere!
Also in PowerPC-Based and Intel-Based Products
Godson Server-Class CPU
Institute of Computing Technology - Chinese Academy of Sciences
4-Core Reconfigurable Architecture
• 65-nm Technology
• Directory-Based Coherence Protocol Safeguards Cache Data
• 8 Config. Address Windows of Each Master Port Allow Pages Migration Across L2 and Memory
• Nodes Organized in Mesh
• 2 Links for Each Node’s 4 Connection Points
• Shared L2 Configurable as Internal RAM, DMA to Internal RAM Directly
• DMA Engine Supports Pre-Fetch and Matrix (Stream Processor)
[Block diagram labels: ncHT1.0 (x2), 8x8 AXI Switch, PCIe (x2)]
Godson Server-Class CPU (cont.)
Institute of Computing Technology - Chinese Academy of Sciences
Godson Versions
8-Core Multi-Chip 20W Version Possible in 2009
Godson Server-Class CPU (cont.)
Institute of Computing Technology - Chinese Academy of Sciences
Godson Cores Profile
HTX™ Spotlight
How and Why HyperTransport HTX Proves Best Choice for Compute-Intensive Applications
HTX™ Values Snapshot
Enables
• HPC Products Demanding Performance Beyond the Reach of PCI-Class Interconnects
• Integration of System Functionality Too New/Complex/Costly for MB Integration
Empowers
• HPC Solution Providers with a Competitive Edge
– No Risks of Premature MB Integration
– Shortest Time-to-Market
– One MB Fits Multiple Markets/Applications
– Up-Sell Factor
HTX™ Applications
Compute Intensive
• High Bandwidth + Low Latency
• Multi-Processing, Co-Processing
Target Markets
• Database Analytics
• High Traffic Web Services
• Stock Trading Acceleration
• Server Clustering and SMP
• Streaming Media Servers
• Financial Modeling
Expanding HTX™ Product Ecosystem
Server / MB
Data Analysis Coprocessor
HTX™
Content-Aware Routing Processor
High-Perf Server Clustering Controller
Content/Security Processor
Content/Security Processor
10GE NIC Ref Design
Universal HTX/HTX3 Board Ref Design
FPGA Ref Design Board
More Innovative HTX™ Systems and Subsystems in the Pipeline
New HTX™ Systems
[Figure: ProLiant DL165-G5 and ProLiant DL785-G5 servers, with an expansion-slot diagram showing HTX x16 slots alongside PCIe x4, x8 and x16 slots]
New HTX™ Subsystems
NumaChip Technology
Cache-Coherent Shared Memory Processor for Scalable Server Clustering
New HTX™ Subsystems (cont.)
Vulcan Content-Aware Routing Processor for Multi-Core Systems
Delivers Unprecedented Multi-Core Processing and Power Optimization
Applications: High-Traffic Web, Telecom, Automated Trading
High Throughput, Fast Network Access
New HTX™ Reference Designs
HTX3™ Universal Reference Design Board
HT3 Core IP
Why HTX3™?
Empowers Future HPC Innovation
• FPGAs Playing Key Role in Compute-Intensive Designs
• HTX3 Paves Way for New Generation FPGA Technology
– FPGAs from Bandwidth Bottlenecks to Performance Drivers
• Power Optimization Ranks High in HPC Agenda
• HT 3.0 Has Reached Maturity and Stability
• HT 3.0 Capability Now Safely and Stably “Connectorized”
Reinforces HTX Performance Edge over PCI Express
HTX3™ Features Summary
Feature                         HTX        HTX3        Notes
Max Clock Rate                  800 MHz    2.6 GHz     12” trace length
Max Bandwidth x Lane            1.6 GT/s   5.2 GT/s    Bi-directional
Max Bandwidth Aggregate         6.4 GB/s   20.8 GB/s   Bi-directional 16-Bit HT link
HT3 Link Splitting Support      NO         YES         HT link can be 1x 16-Bit or 2x 8-Bit for multi-CPU support
HT3 Extended Power Management   NO         YES         LDTREQ# signal added to participate in x86 power states
Extended FPGA Guidelines        NO         YES         Incorporated field-proven recommendations
Full Backward Compatibility     --         YES         Level shifters and signal allocation
For more details, see HTX3 specifications on HTC’s web site
HTX™ a Substitute for PCI Express?
No – HTX Complements and Coexists with PCIe by Providing the Capability that PCIe Cannot Deliver
[Figure labels: DDR Memory (x2); HTX™ / HTX3™ 16-Bit or 2x 8-Bit, Direct Connect to Compute-Intensive Subsystems; Chipset, Peripheral Interconnects]
Unique HTX™ Capabilities
Aggregate Latency Advantage
• 20% Better Physical Layer Latency and Bandwidth due to Absence of 8B/10B Clock Recovery Overhead
– No SerDes
• 55% Lower Latency Per Transaction due to Absence of Intermediate Control Logic Overhead
– 95 ns of PCIe Gen2’s Estimated Round Trip Penalty out of 170 ns Total on Short, Open Page DRAM Reads
• Vastly Leaner Protocol (Packet Payload)
– 12 Fewer Bytes of Overhead per Packet Compared to PCIe
• 20 ns Better Per-Transaction Latency in Heavy Traffic Environments due to HT’s Priority Request Interleaving™
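The two percentages quoted above can be sanity-checked with simple arithmetic. This is a minimal sketch, assuming the 55% figure is the ratio of the 95 ns control-logic penalty to the 170 ns round trip, and the 20% figure is the 8B/10B coding overhead (10 line bits carry 8 data bits). The constant and function names are illustrative and do not come from the presentation.

```python
# Figures quoted on the slide for a short, open-page DRAM read over PCIe Gen2.
PCIE_TOTAL_ROUND_TRIP_NS = 170.0
PCIE_CONTROL_LOGIC_PENALTY_NS = 95.0


def penalty_share(penalty_ns: float, total_ns: float) -> float:
    """Fraction of the round trip spent in intermediate control logic."""
    return penalty_ns / total_ns


def encoding_overhead(data_bits: int = 8, line_bits: int = 10) -> float:
    """Fraction of the raw line rate consumed by 8B/10B coding."""
    return 1 - data_bits / line_bits


share = penalty_share(PCIE_CONTROL_LOGIC_PENALTY_NS, PCIE_TOTAL_ROUND_TRIP_NS)
overhead = encoding_overhead()

print(f"Control-logic share of round trip: {share:.1%}")
print(f"8B/10B coding overhead:            {overhead:.0%}")
```

Under this reading the control-logic share comes out just under 56%, which the slide quotes as 55%.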
Unique HTX™ Capabilities (cont.)
Up to Twice Packet/Latency Efficiency in Intra-Processor Traffic
[Chart: HTX™ Packet Overhead Efficiency Margins over PCIe. Axes: Efficiency vs. Data Bytes per Packet. Series: Min Overhead, Max Overhead. Marked region: Usual Intra-Processor Traffic]
Unique HTX™ Capabilities (cont.)
Considerable Per-Packet Latency Advantage
[Chart: HTX3™ Per Packet Latency Advantage over PCIe Gen2. HTX3: 2.6 GHz, x16 Links; PCIe: 5.0 GHz, x16 Links. Axes: Latency Advantage (ns) vs. Data Bytes Per Packet. Series: Min Packet Overhead, Max Packet Overhead. Marked region: Usual Intra-Processor Traffic]
The results take into account PCIe’s 20% clock recovery, packet payload and 55% chipset overhead penalties. HTX’s Priority Request Interleaving, if applicable, will add to HTX’s total latency advantage.
Unique HTX™ Capabilities (cont.)
Superior Bandwidth
Feature                            PCIe Gen1    PCIe Gen2     HTX            HTX3
Max Clock Rate                     2.5 GHz      5.0 GHz       800 MHz        2.6 GHz
Double Data Rate                   NO           NO            YES            YES
Max Bandwidth x Lane               2.5 Gbps     5.0 Gbps      1.6 GT/s (*)   5.2 GT/s (*)
8B/10B Penalty                     -20%         -20%          No Penalty     No Penalty
Net Bandwidth x Lane               2.0 Gbps     4.0 Gbps      1.6 GT/s (*)   5.2 GT/s (*)
Net Bandwidth 16-Bit - Aggregate   8 GBytes/s   16 GBytes/s   6.4 GBytes/s   20.8 GBytes/s
(*) HyperTransport supports Double Data Rate (DDR), transferring data on both the leading and trailing edge of the clock. Therefore HyperTransport’s bandwidth is more appropriately represented by the term “Transfers/second” than the term “Bits/second.”
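The aggregate net bandwidth figures in this comparison can be recomputed from the per-lane rates. The following is a rough sketch, assuming 16 lanes (or a 16-bit link), bandwidth counted in both directions, and an 8B/10B penalty applied only to PCIe Gen1 and Gen2. The helper names are made up for illustration.

```python
def pcie_net_gbytes(line_rate_gbps: float, lanes: int = 16) -> float:
    """Aggregate net bandwidth in GB/s for an 8B/10B-coded PCIe link."""
    net_per_lane_gbps = line_rate_gbps * 8 / 10   # 8 data bits per 10 line bits
    return net_per_lane_gbps * lanes * 2 / 8      # both directions, bits -> bytes


def htx_net_gbytes(clock_ghz: float, width_bits: int = 16) -> float:
    """Aggregate net bandwidth in GB/s for an HTX link (DDR, no coding penalty)."""
    transfers_gt_per_s = clock_ghz * 2            # two transfers per clock
    return transfers_gt_per_s * width_bits * 2 / 8


results = {
    "PCIe Gen1": pcie_net_gbytes(2.5),
    "PCIe Gen2": pcie_net_gbytes(5.0),
    "HTX": htx_net_gbytes(0.8),
    "HTX3": htx_net_gbytes(2.6),
}
for name, gbytes in results.items():
    print(f"{name}: {gbytes:.1f} GB/s")
```

With these assumptions HTX3 comes out at 1.3 times the PCIe Gen2 aggregate, even though its clock rate is roughly half.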
Unique HTX™ Capabilities (cont.)
Tangible Time-to-Result Savings!
Compute-Intensive Tasks Require 100Ks to Billions of Packet Transactions
HTX3™ Time-to-Result Savings vs. PCIe Gen2
Bytes per Packet   Number of Packets Transferred
Transferred        100,000 Per Task   1 Million Per Task   1 Billion Per Task
4                  0.78 ms            7.8 ms               7.8 Sec
16                 4 ms               40 ms                40 Sec
256                0.32 Sec           3.20 Sec             53 Min
512                1.16 Sec           11.62 Sec            3.23 Hrs
The results take into account PCIe’s 20% clock recovery, packet payload and 55% chipset overhead penalties. HTX’s Priority Request Interleaving™, if applicable, will add to HTX’s total time-to-result latency advantage.
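The savings scale linearly with the number of packets, so each row implies a fixed per-packet latency advantage. The sketch below works backward from the "1 Million Per Task" column to that per-packet figure and then rescales it. It assumes perfectly linear scaling, and the dictionary and function names are illustrative rather than taken from the presentation.

```python
# Savings in seconds per 1 million packets, as quoted on the slide,
# keyed by bytes per packet.
SAVINGS_PER_MILLION_S = {
    4: 7.8e-3,
    16: 40e-3,
    256: 3.20,
    512: 11.62,
}


def per_packet_advantage_ns(bytes_per_packet: int) -> float:
    """Implied latency advantage per packet, in nanoseconds."""
    return SAVINGS_PER_MILLION_S[bytes_per_packet] / 1_000_000 * 1e9


def savings_seconds(bytes_per_packet: int, packets: int) -> float:
    """Time-to-result savings for a task, assuming linear scaling."""
    return SAVINGS_PER_MILLION_S[bytes_per_packet] / 1_000_000 * packets


for size in SAVINGS_PER_MILLION_S:
    ns = per_packet_advantage_ns(size)
    billion = savings_seconds(size, 1_000_000_000)
    print(f"{size:>3} bytes/packet: {ns:,.1f} ns per packet, "
          f"{billion:,.1f} s per billion packets")
```

For 512-byte packets this gives about 11.6 µs per packet, or roughly 3.23 hours over a billion packets, matching the last row.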
Unique HTX™ Capabilities (cont.)
Example: Celoxica’s Accelerator Company’s Benchmark Results
Latency Access to Network Data Regardless of Packet Size
[Chart: latency by interface, HTX™ Interface vs. a second interface; values shown: 1.4 µs and <10 µs]
HPC - Industry’s Bright Star
Strong Business Growth Opportunities