Advancing Applications Performance With InfiniBand
Pak Lui, Application Performance Manager
September 12, 2013

Mellanox Overview (Ticker: MLNX)
. Leading provider of high-throughput, low-latency server and storage interconnect
  • FDR 56Gb/s InfiniBand and 10/40/56GbE
  • Reduces application wait-time for data
  • Dramatically increases ROI on data center infrastructure
. Company headquarters
  • Yokneam, Israel; Sunnyvale, California
  • ~1,200 employees worldwide (as of June 2013)
. Solid financial position
  • Record revenue in FY12: $500.8M, up 93% year-over-year
  • Q2'13 revenue of $98.2M
  • Q3'13 guidance ~$104M to $109M
  • Cash + investments at 6/30/13 = $411.3M

Providing End-to-End Interconnect Solutions
. Comprehensive end-to-end software accelerators and management
  • MXM – Mellanox Messaging Acceleration
  • FCA – Fabric Collectives Acceleration
  • UFM – Unified Fabric Management
  • VSA – Storage Accelerator (iSCSI)
  • UDA – Unstructured Data Accelerator
. Comprehensive end-to-end InfiniBand and Ethernet solutions portfolio
  • ICs, adapter cards, switches/gateways, long-haul systems, cables/modules

Virtual Protocol Interconnect (VPI) Technology
. VPI adapters and VPI switches managed by the Unified Fabric Manager
. Switch OS layer serving networking, storage, clustering and management applications
. Switch configurations: 64 ports 10GbE, 36 ports 40/56GbE, 48 ports 10GbE + 12 ports 40/56GbE, 36 ports InfiniBand up to 56Gb/s, 8 VPI subnets
. Acceleration engines; PCIe 3.0 adapters available as LOM, adapter card or mezzanine card
. Ethernet: 10/40/56 Gb/s; InfiniBand: 10/20/40/56 Gb/s
. From data center to campus and metro connectivity

MetroDX™ and MetroX™
. MetroX™ and MetroDX™ extend InfiniBand and Ethernet RDMA reach
. Fastest interconnect over 40Gb/s InfiniBand or Ethernet links
. Supports multiple distances
. Simple management to control distant sites
. Low-cost, low-power, long-haul solution
. 40Gb/s over campus and metro distances

Data Center Expansion Example – Disaster Recovery

Key Elements in a Data Center Interconnect
. Applications running on servers and storage
. Adapters and ICs, switches and ICs
. Cables, silicon photonics, parallel optical modules

IPtronics and Kotura Complete Mellanox's 100Gb/s+ Technology
. Recent acquisitions of Kotura and IPtronics enable Mellanox to deliver complete high-speed optical interconnect solutions for 100Gb/s and beyond

Mellanox InfiniBand Paves the Road to Exascale

Meteo France

NASA Ames Research Center Pleiades
. 20K InfiniBand nodes
. Mellanox end-to-end FDR and QDR InfiniBand
. Supports a variety of scientific and engineering projects
  • Coupled atmosphere-ocean models
  • Future space vehicle design
  • Large-scale dark matter halos and galaxy evolution
. Asian Monsoon Water Cycle high-resolution climate simulations

NCAR (National Center for Atmospheric Research)
. "Yellowstone" system
. 72,288 processor cores, 4,518 nodes
. Mellanox end-to-end FDR InfiniBand, CLOS (full fat tree) network, single plane
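The cluster deployments above and the application results that follow ultimately rest on the point-to-point latency and bandwidth of the InfiniBand fabric. For reference, below is a minimal MPI ping-pong microbenchmark of the kind commonly used (for example, in the OSU benchmark suite) to measure that latency on such systems. It is an illustrative sketch, not material from the presentation; the message size, iteration count and timing method are arbitrary choices.

/* Minimal MPI ping-pong latency sketch (illustrative; not from the deck).
 * Rank 0 and rank 1 bounce a small message back and forth and report the
 * average one-way latency, the kind of number behind the interconnect
 * latency figures quoted later in the deck. */
#include <mpi.h>
#include <stdio.h>

#define ITERATIONS 10000
#define MSG_SIZE   8          /* bytes; small messages expose wire latency */

int main(int argc, char **argv)
{
    int rank;
    char buf[MSG_SIZE] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < ITERATIONS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double elapsed = MPI_Wtime() - t0;
    if (rank == 0)
        printf("avg one-way latency: %.2f us\n",
               elapsed / (2.0 * ITERATIONS) * 1e6);

    MPI_Finalize();
    return 0;
}

Run with two ranks placed on different nodes (for example, mpirun -np 2 --host node1,node2 ./pingpong under Open MPI, with the hostnames as placeholders), the reported average approximates the one-way small-message MPI latency across the fabric.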
Applications Performance (Courtesy of the HPC Advisory Council)
[Charts: application performance gains of 284%, 1556%, 173%, 282% and 182% measured on Intel Ivy Bridge and Intel Sandy Bridge platforms]

Applications Performance (Courtesy of the HPC Advisory Council)
[Charts: application performance gains of 135% and 392%]

Dominant in Enterprise Back-End Storage Interconnects
. SMB Direct

Leading Interconnect, Leading Performance
. Bandwidth roadmap, 2001-2017: 10Gb/s, 20Gb/s, 40Gb/s, 56Gb/s, 100Gb/s, 200Gb/s
. Latency roadmap, 2001-2017: 5usec, 2.5usec, 1.3usec, 0.7usec, 0.6usec, <0.5usec
. Same software interface across generations

Connect-IB Architectural Foundation for Exascale Computing

Connect-IB: The Exascale Foundation
. World's first 100Gb/s interconnect adapter
  • PCIe 3.0 x16, dual FDR 56Gb/s InfiniBand ports to provide >100Gb/s
. Highest InfiniBand message rate: 137 million messages per second
  • 4X higher than other InfiniBand solutions
. <0.7 microsecond application latency
. Supports GPUDirect RDMA for direct GPU-to-GPU communication
. Unmatched storage performance: 8,000,000 IOPs (1 QP), 18,500,000 IOPs (32 QPs)
. New innovative transport: Dynamically Connected Transport (DCT) service
. Supports Scalable HPC with MPI, SHMEM and PGAS/UPC offloads
. Enter the world of boundless performance

Connect-IB Memory Scalability
[Chart: host memory consumption (MB, log scale) versus cluster size (8, 2K, 10K and 100K nodes) for InfiniHost with RC (2002), InfiniHost-III with SRQ (2005), ConnectX with XRC (2008) and Connect-IB with DCT (2012)]

Dynamically Connected Transport Advantages

FDR InfiniBand Delivers Highest Application Performance
[Chart: message rate in millions of messages per second, QDR InfiniBand versus FDR InfiniBand]

Scalable Communication – MXM, FCA

Mellanox ScalableHPC Accelerates Parallel Applications
. MPI, SHMEM and PGAS (logical shared memory) programming models layered over MXM and FCA, on top of the InfiniBand verbs API
. MXM
  • Reliable messaging optimized for Mellanox HCAs
  • Hybrid transport mechanism
  • Efficient memory registration
  • Receive-side tag matching
. FCA
  • Topology-aware collective optimization
  • Hardware multicast
  • Separate virtual fabric for collectives
  • CORE-Direct hardware offload

MXM v2.0 - Highlights
. Transport library integrated with Open MPI, OpenSHMEM, Berkeley UPC (BUPC) and MVAPICH2; more solutions will be added in the future
. Utilizes Mellanox offload engines
. Supported APIs (both sync/async): AM, p2p, atomics, synchronization
. Supported transports: RC, UD, DC, RoCE, SHMEM
. Supported built-in mechanisms: tag matching, progress thread, memory registration cache, fast-path send for small messages, zero copy, flow control
. Supported data transfer protocols: Eager Send/Recv, Eager RDMA, Rendezvous
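The MXM highlights above list eager and rendezvous data transfer protocols without explaining the split. As a rough illustration of how such a protocol choice typically works in an MPI-style messaging layer, the sketch below selects a path by message size; the function names and the 16 KB threshold are hypothetical and are not part of the MXM API.

/* Illustrative eager vs. rendezvous protocol selection, in the style of
 * MPI messaging layers such as MXM. All names and the threshold value
 * are hypothetical; MXM's real internals are not shown in the deck. */
#include <stdio.h>
#include <stddef.h>

#define EAGER_THRESHOLD (16 * 1024)   /* hypothetical cutoff in bytes */

/* Stub standing in for a wire-level eager send: the payload is copied into
 * the send descriptor and pushed immediately; the receiver buffers it until
 * a matching receive is posted. One network trip, but one extra copy. */
static void send_eager(const void *buf, size_t len)
{
    (void)buf;
    printf("eager send of %zu bytes (copied, single trip)\n", len);
}

/* Stub standing in for a rendezvous send: a ready-to-send control message
 * goes first, and the bulk payload moves via RDMA only after the receiver
 * has posted a matching buffer, avoiding intermediate copies. */
static void send_rendezvous(const void *buf, size_t len)
{
    (void)buf;
    printf("rendezvous send of %zu bytes (RTS/CTS + RDMA)\n", len);
}

static void protocol_send(const void *buf, size_t len)
{
    if (len <= EAGER_THRESHOLD)
        send_eager(buf, len);        /* latency-optimized path */
    else
        send_rendezvous(buf, len);   /* bandwidth/zero-copy path */
}

int main(void)
{
    char small[256] = {0};
    char large[64 * 1024] = {0};

    protocol_send(small, sizeof(small));   /* takes the eager path */
    protocol_send(large, sizeof(large));   /* takes the rendezvous path */
    return 0;
}

Keeping small messages on the eager path minimizes latency at the cost of a receive-side copy, while the rendezvous path lets large payloads move zero-copy once the receiver is ready, which is where the zero-copy and Eager RDMA mechanisms listed above come into play.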
Mellanox FCA Collective Scalability
[Charts: barrier and reduce collective latency (us) versus number of processes (PPN=8, up to ~2,500 processes), and 8-byte broadcast bandwidth (KB*processes) versus number of processes, each with and without FCA]

FDR InfiniBand Delivers Highest Application Performance

GPU Direct

GPUDirect RDMA
[Diagram: transmit and receive paths with GPUDirect 1.0, where data is staged through system memory via the CPU and chipset, versus GPUDirect RDMA, where the InfiniBand adapter accesses GPU memory directly]

Preliminary Performance of MVAPICH2 with GPUDirect RDMA
. GPU-GPU internode MPI latency, small messages: 19.78 us with MVAPICH2-1.9 vs. 6.12 us with MVAPICH2-1.9-GDR (69% lower latency)
. GPU-GPU internode MPI bandwidth, small messages: 3X increase in throughput
. Source: Prof. DK Panda

Preliminary Performance of MVAPICH2 with GPUDirect RDMA
. Execution time of the HSG (Heisenberg Spin Glass) application with 2 GPU nodes
. Source: Prof. DK Panda
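The MVAPICH2-GDR results above rely on a CUDA-aware MPI: device pointers are passed directly to MPI calls, and with GPUDirect RDMA the InfiniBand adapter moves the data without staging it through host memory. The sketch below shows what that looks like from application code. It assumes a CUDA-aware MPI build (such as MVAPICH2 1.9 with the GDR extension) and the CUDA runtime; the buffer size, tag and one-GPU-per-rank assumption are arbitrary choices, not details from the presentation.

/* Minimal CUDA-aware MPI sketch: rank 0 sends a GPU-resident buffer to
 * rank 1 by passing device pointers directly to MPI. On a GPUDirect RDMA
 * capable stack (e.g. MVAPICH2-GDR over Connect-IB) the transfer can
 * bypass host memory. Illustrative only; not taken from the deck. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

#define COUNT (1 << 20)     /* 1M floats; arbitrary size for illustration */

int main(int argc, char **argv)
{
    int rank;
    float *d_buf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(0);                          /* one GPU per rank assumed */
    cudaMalloc((void **)&d_buf, COUNT * sizeof(float));
    cudaMemset(d_buf, 0, COUNT * sizeof(float));

    if (rank == 0) {
        /* Device pointer handed straight to MPI_Send: a CUDA-aware MPI
         * detects it and can use the GPUDirect RDMA path when available. */
        MPI_Send(d_buf, COUNT, MPI_FLOAT, 1, 42, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, COUNT, MPI_FLOAT, 0, 42, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d floats into GPU memory\n", COUNT);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

Without GPUDirect RDMA support in the MPI stack, the same exchange is pipelined through host memory internally; the latency and bandwidth comparison above quantifies the benefit of removing that host-memory hop.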
Thank You