HyperTransport Consortium Globalpress Presentation
HyperTransport™ Technology Tutorial

Prof. José Duato
Technical University of Valencia, Spain
Simula Research Laboratory, Oslo, Norway

HyperTransport Technology Consortium
Hot Chips Symposium, August 23, 2009
www.hypertransport.org
Copyright HyperTransport Consortium, 2009

With Us Today and Happy to Address Your Questions

Brian Holden
VP and Chair, Technical Working Group
HyperTransport Technology Consortium
[email protected]
408-472-6310
Topics: HyperTransport Technology

Mario Cavalli
General Manager
HyperTransport Technology Consortium
[email protected]
925-968-0220
Topics: HyperTransport Market Positioning, HyperTransport Consortium

Topics
• Scope and Design Goals
• HyperTransport Defined
• Host Interface
• Connecting Device to Host
• Connecting Multiple Devices to Host
• Interconnecting Multiple Hosts
• AMD Cache Coherence Support
• Beyond Motherboards
• New in HT3
• Beyond HT3
• Beyond Conventional
• HyperTransport Technology Consortium

Scope and Design Goals
• System Area Network Supporting Cache-Coherent Shared-Memory Multiprocessors and I/O Devices
• High-Performance Replacement for Processor Front Side Bus (Point-to-Point Links vs. Bus)
• HyperTransport's Distinction: Processor-Native
• Integrated in Processor Architectures
• Lowest-Latency, High-Bandwidth, Cost-Effective, Reliable Motherboard-Level Interconnect
• SMP Programming Model
• Unified Interface for Local and Remote Memory
• Self-Configuring Topology and Link Speed
• HT3 Enhancements
  • Increased Bandwidth and Reliability
  • Link Splitting
  • Dynamic Power Management
  • AC Mode
  • Hot Plugging

Typical Server Architecture

HyperTransport™ Defined
Nine Years of Fine Tuning, Perfecting, Polishing

Host Interface
• Single Interface for All Cores (SRQ/SRI)
• On-Chip Crossbar and Routing
• Host Bridge for I/O Device Chain

Northbridge Architecture

Connecting Device to Host
• Approach: High-Speed Point-to-Point Parallel Link
• Point-to-Point Link Minimizes Parasitic Capacitance
• Clock Forwarding Removes Clock Recovery Overhead
• Parallel Link Delivers High Bandwidth, Low Latency
• Control and Data Packets Interleaved on Each Link
• CTL Signal Distinguishes Between Control and Data
• Two Additional System Signals: PWROK and RESET

HT Physical Layer
• Low-Voltage Differential Signalling (LVDS)
• Pre-Emphasis Supports Higher Clock Rates
• Clock Rate: From 200 MHz to 3.2 GHz
• Link Width: 2, 4, 8, 16, 32-Bit
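The clock-rate and link-width ranges listed for the HT physical layer translate into a peak link bandwidth. The sketch below is only a back-of-the-envelope illustration: the function name `ht_peak_bandwidth_gbytes` is hypothetical, and it assumes double-data-rate signalling (two transfers per clock cycle), which the slides themselves do not state. It ignores packet headers and any encoding overhead.

```python
# Link widths listed on the HT Physical Layer slide (bits per direction).
VALID_WIDTHS_BITS = (2, 4, 8, 16, 32)


def ht_peak_bandwidth_gbytes(clock_hz: float, width_bits: int, ddr: bool = True) -> float:
    """Return the peak raw bandwidth of one link direction in GB/s.

    Assumption (not from the slides): data is transferred on both clock
    edges when ddr is True, so the transfer rate is twice the clock rate.
    """
    if width_bits not in VALID_WIDTHS_BITS:
        raise ValueError(f"unsupported link width: {width_bits}")
    transfers_per_second = clock_hz * (2 if ddr else 1)
    bits_per_second = transfers_per_second * width_bits
    return bits_per_second / 8 / 1e9


if __name__ == "__main__":
    # Slowest, narrowest link versus fastest, widest link from the slide.
    low = ht_peak_bandwidth_gbytes(200e6, 2)
    high = ht_peak_bandwidth_gbytes(3.2e9, 32)
    print(f"200 MHz x 2-bit:  {low:.1f} GB/s per direction")
    print(f"3.2 GHz x 32-bit: {high:.1f} GB/s per direction")
```

Under these assumptions the range spans roughly 0.1 GB/s to 25.6 GB/s per direction; since each link carries traffic in both directions, the aggregate figure would be twice that.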
• Support for Asymmetric and Mixed Link Width

HT Transaction/Data Link Layer
Header          DATA
8 or 12 Bytes   4-64 Bytes

HT Basic Read/Write Sequences

HT Request Packet Format

HT Read Response Packet Format

Common HT Command Types

Connecting Multiple Devices to Host

HyperTransport I/O Device Configurations

Routing to Target Device

Pipelining Multiple Requests

Communication Between Two I/O Devices

Transmission Error Handling

Priority Request Interleaving™
Three-panel diagram (CPU, Peripheral A, Peripheral B):
1. Data transfer 1 under way
2. PRI: While transfer 1 carries on…
3. Data transfer 2 initiates while data transfer 1 still under way
Lowest Achievable Latency

Interconnecting Multiple Hosts
• Coherent vs. Non-Coherent HyperTransport
• cHT-Enabled Links Configurable at Boot Time

Cache Coherence Support
Proprietary Technology

AMD cHT Basics
• On-Chip Support for Up to 8 CPUs
• Broadcast-Based 3-Hop Invalidation Cache Coherence Protocol

AMD cHT Read Request Example

AMD cHT HT-Assist (Probe Filter)
• Old cHT Broadcast Protocol Broadcasts Probes to Invalidate Copies Even if Memory Line Is Clean
• Sparse Directory Cache Next to Memory Controller
• Rule: If a Line Is Cached, It Has an Entry in PF
• Replacement Policy Makes Room for New Lines
• Enhanced Behavior:
  • No Probing for Uncached Lines
  • Directed Probe to Request Copy of Cached Line
• Benefits:
  • Significantly Less Bandwidth Use
  • Shorter Access Latency, Mainly for Uncached Lines
For More Details: "Blade Computing with the AMD Magny-Cours Processor", presented by Pat Conway, Hot Chips 2009

Beyond Motherboards

HTX Connector
• Low-Latency CPU-to-High-Perf. Subsystem Direct Connect
• Removes Performance Bottlenecks in Compute-Intensive Data Processing and Acceleration Functions
• Complements PCI-Class Interconnects
• Link Splitting Capability

HTX Specification Evolution

New in HyperTransport™ 3 (HT3)

HT3 - Link Splitting
All Links or Individual Links
1x 4-Bit    2x 2-Bit
1x 8-Bit    2x 4-Bit
1x 16-Bit   2x 8-Bit
1x 32-Bit   2x 16-Bit
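The link-splitting pairs listed for HT3 follow a simple pattern: one link of a given width becomes two independent links of half that width. A minimal sketch of that mapping is shown here; the names `SPLITTABLE_WIDTHS_BITS` and `split_link` are hypothetical and serve only to illustrate the table, not any real configuration interface.

```python
# Link widths (in bits) that the HT3 link-splitting slide lists as splittable.
SPLITTABLE_WIDTHS_BITS = (4, 8, 16, 32)


def split_link(width_bits: int) -> tuple[int, int]:
    """Return the widths of the two sub-links produced by splitting one link.

    Mirrors the slide's table: 1x N-bit -> 2x (N/2)-bit.
    """
    if width_bits not in SPLITTABLE_WIDTHS_BITS:
        raise ValueError(f"link width {width_bits} is not listed as splittable")
    half = width_bits // 2
    return (half, half)


if __name__ == "__main__":
    for width in SPLITTABLE_WIDTHS_BITS:
        a, b = split_link(width)
        print(f"1x {width}-Bit -> 2x {a}-Bit")
```

The total number of signal lanes is unchanged by a split; what changes is that the two halves can connect to different neighbours.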
Extended SMP Topologies Enabled
(Diagram: CPU and I/O node topologies)

HT3 – Dynamic Power Management
• Dynamic Link Width, Clock Rate and Voltage Scaling
• Partial Link Shutdown via Link Splitting

HT3 – AC Mode (Optional – Enabled if Needed)
• 8b/10b Encoding
• Lower Bandwidth, Higher Latency than DC Mode
• DC/AC Autoconfiguration
• TX Equalization
DC Mode: HT3 Spec at <= 12 In, In-System
AC Mode (Decoupling Caps): HT3 Spec at <= 3 Ft, Transmit with Pre- and Post-Cursor De-Emphasis, Backplane and Chassis-