AMD's Direct Connect Architecture

AMD's Direct Connect Architecture

AMD’s Direct Connect Architecture 4 Node Opteron MP Architecture Pat Conway 12.8 GB/s DRAM DRAM Principal Member of Technical Staff 128-bit “Fully Connected 4 Node and 8 Node Core 0 Core 0 MCT MCT Topologies are becoming Feasible” SRI SRI Core 1 Core 1 ncHT Additional HyperTransport™ Ports XBAR XBAR Glueless Multiprocessing with I/O I/O HT HT I/O • Enable Fully Connected 4 Node (four x16 HT) HT-HB HT HT HT-HB 4.0GB/s per Opteron CMP processors direction and 8 Node (eight x8 HT) cHT @ 2GT/s Data Rate • Reduced network diameter This poster illustrates the variety of Opteron system topologies built to date using – Fewer hops to memory HyperTransport and AMD's Direct Connect Architecture for glueless multiprocessing HT HT • Increased Coherent Bandwidth Four x16 HT I/O I/O and summarizes the latency/bandwidth characteristics of these systems. HT HT – more links HT-HB HT-HB OR – cHT packets visit fewer links XBAR XBAR Building and using Opteron systems has provided useful, real world lessons which expose – HyperTransport3 up to 5.2GT/s the challenges to be addressed when designing future system interconnect, memory Core 0 Core 0 • Benefits “NorthBridge” hierarchy and I/O to scale up both the number of cores and sockets in future x86 CMP SRI Core 1 MCT Core 1 SRI MCT Architectures. – Low latency because of lower diameter – Evenly balanced utilization of HyperTransport links Eight x8 HT – Low queuing delays DRAM DRAM Low latency under load Lessons Learned #1 Torrenza - HyperTransport-based Accelerators Memory Latency is the Key to Application Imagine it, Build it Fully Connected 4 Node Performance Performance! 4 Node Square 4 Node fully connected SOASOA XMLXML I/O I/O I/O I/O AcceleratorAccelerator AcceleratorAccelerator 16 16 P0 P2 P0 P2 Performance vs Average Memory Latency (single 2.8GHz core, 400MHz DDR2 PC3200, 2GT/s HT with 1MB cache in MP system) P1 P3 P1 P3 Native Native 7 120% Dual-Core Dual-Core I/O I/O I/O I/O 6 8 GB/S 100% + 2 EXTRA LINKS W/ HYPERTRANSPORT3 5 4N FC (4.4GT/s 80% SRQ SRQ 4N SQ (2GT/s 4N FC (2GT/s OLTP1 Crossbar Crossbar HyperTransport) HyperTransport) HyperTransport3) Mem.Ctrlr Mem.Ctrlr 4 OLTP2 HT HT 60% Diam 2 Avg Diam 1.00 Diam 1 Avg Diam 0.75 Diam 1 Avg Diam 0.75 SW99 3 8 GB/S 8 GB/S XFIRE BW 14.9GB/s XFIRE BW 29.9GB/s XFIRE BW 65.8GB/s SSL 40% 2 JBB (2X) (4X) OtherOther PCI-EPCI-E FLOPs 20% Optimized Bridge FLOPs 1 Optimized Bridge Accelerator System Performance SiliconSilicon Accelerator XFIRE (“crossfire”) 0 0% Processor Performance BW is the link-limited 8 GB/S all-to-all communication 1N 2N 4N (SQ) 8N (TL) 8N (L) bandwidth (data only) USBUSB I/O Hub I/O Hub 4 Node Square 1 Node 8N Ladder PCIPCI I/O I/O I/O P0 P2 P4 P6 I/O P0 P0 P2 Fully Connected 8 Node Performance P1 P3 • Open platform for system builders I/O I/O P1 P3 P5 P7 I/O I/O I/O – 3rd Party Accelerators 8 Node Fully Connected AvgD 0 hops 1 hops 1.8 hops –Media 8 Node8 Node 6HT 6HT 2x4 2x4 I/O I/O 8N Twisted Ladder 8 Latency x + 0ns x + 44ns (124 cpuclk) x + 105ns (234 cpuclk) P0 P2 P5 P4 16 8 –FLOPs I/O P0 P2 P4 P6 I/O I/O I/O P1 P3 I/O 543 435 I/O I/O P0 P7 I/O P0 0 0 P2 1 1 OR 8N Twisted Ladder 2 2 –XML I/O I/O P1 P6 3 4 I/O P1 P3 P5 P7 I/O 2 21 1 P4 P6 P1 0 0 P3 P2 P3 5 43 435 P0 P2 P6 –SOA I/O I/O P5 P7 I/O P4 I/O I/O I/O • AMD Opteron™ Socket or HTX slot I/O I/O 2 Node I/O P1 P3 P5 P7 I/O • HyperTransport interface is an open standard see www.hypertransport.org I/O P0 P1 I/O • Coherent HyperTransport interface available if the accelerator caches system memory (under 8N TL (2GT/s 8N 2x4 (4.4GT/s 8N FC (4.4GT/s 0.5 hops 1.5 hops license) HyperTransport) HyperTransport3) HyperTransport3) x + 17ns (47 cpuclk) x + 76ns (214 cpuclk) • Homogeneous/heterogeneous computing Diam 3 Avg Diam 1.62 Diam 2 Avg Diam 1.12 Diam 1 Avg Diam 0.88 • Architectural enhancements planned to support fine grain parallel communication and XFIRE BW 15.2GB/s XFIRE BW 72.2GB/s XFIRE BW 94.4GB/s synchronization (5X) (6X) 2006 Workshop on On- and Off-Chip Interconnection Networks for Multicore Systems, 6-7 December 2006, Stanford, California.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    1 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us