S7281: Device Lending: Dynamic Sharing of GPUs in a PCIe Cluster

Jonas Markussen, PhD student, Simula Research Laboratory

Outline
• Motivation
• PCIe Overview
• Non-Transparent Bridges
• Device Lending

Distributed applications may need to access and use IO resources that are physically located inside remote hosts.
[Figure: a front-end and several compute nodes connected by an interconnect that carries control, signaling, and data.]

Software abstractions simplify the use and allocation of resources in a cluster and facilitate the development of distributed applications. Control, signaling, and data are handled in software, for example by:
• rCUDA
• CUDA-aware Open MPI
• Custom GPUDirect RDMA implementations
[Figure: logical view of resources as seen from the front-end.]

[Figure: local resource vs. remote resource using middleware. Local: Application → CUDA library + driver → local PCIe IO bus. Remote: Application → CUDA-middleware integration → middleware service → interconnect transport (RDMA) → interconnect → interconnect transport (RDMA) → middleware service/daemon → CUDA driver → remote PCIe IO bus.]

In PCIe clusters, the same fabric is used both as the local IO bus within a single node and as the interconnect between separate nodes.
[Figure: nodes with CPU and chipset, RAM on the memory bus, a PCIe bus switch, and PCIe IO devices, connected through PCIe interconnect host adapters, external PCIe cables, and a PCIe interconnect switch.]

[Figure: local resource vs. remote resource over native fabric. Both use Application → CUDA library + driver; the local resource sits on the local PCIe IO bus, the remote one on the remote PCIe IO bus reached through a PCIe-based interconnect.]

PCIe Overview

PCIe is the dominant IO bus technology in computers today, and it can also be used as a high-bandwidth, low-latency interconnect.
[Figure: bandwidth in gigabytes per second (GB/s) of PCIe x4, x8, and x16 links for Gen 2, Gen 3, and Gen 4.]
Source (PCIe bandwidth chart): PCI-SIG. PCI Express 3.1 Base Specification, 2010.
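The per-generation scaling in the chart can be reproduced with back-of-the-envelope arithmetic: per-lane transfer rate times line-encoding efficiency times lane count. The Python sketch below is not the data behind the chart. It assumes the standard signaling rates (5, 8, and 16 GT/s) and encodings (8b/10b for Gen 2, 128b/130b for Gen 3 and Gen 4), reports bandwidth per direction, and ignores packet and flow-control overhead, so real payload throughput is lower.

```python
# Per-direction PCIe link bandwidth after line encoding only.
# TLP headers, DLLPs, and flow-control traffic are ignored, so real
# payload throughput is lower than these figures.

GENERATIONS = {
    # generation: (transfer rate in GT/s per lane, encoding efficiency)
    "Gen 2": (5.0, 8 / 10),      # 8b/10b encoding
    "Gen 3": (8.0, 128 / 130),   # 128b/130b encoding
    "Gen 4": (16.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gbytes(generation: str, lanes: int) -> float:
    """Return per-direction bandwidth in GB/s (10**9 bytes per second)."""
    rate_gt, efficiency = GENERATIONS[generation]
    bits_per_second = rate_gt * 1e9 * efficiency * lanes
    return bits_per_second / 8 / 1e9


if __name__ == "__main__":
    for gen in GENERATIONS:
        row = ", ".join(
            f"x{lanes}: {link_bandwidth_gbytes(gen, lanes):5.2f} GB/s"
            for lanes in (4, 8, 16)
        )
        print(f"{gen}: {row}")
```

Gen 3 to Gen 4 doubles the transfer rate and therefore the bandwidth for the same lane count. Gen 2 to Gen 3 raises the rate by only 1.6×, but dropping 8b/10b encoding brings the per-lane gain to almost 2× (about 0.5 to 0.98 GB/s).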
Source (PCIe bandwidth chart): http://www.eetimes.com/document.asp?doc_id=1259778

Memory reads and writes are handled by PCIe as transactions that are packet-switched through the fabric according to their target address:
• Upstream
• Downstream
• Peer-to-peer (shortest path)
[Figure: CPU and chipset with RAM at the root of the fabric, with several PCIe devices below it.]

IO devices and the CPU share the same physical address space, allowing devices to access system memory and other devices:
• Memory-mapped IO (MMIO / PIO)
• Direct Memory Access (DMA)
• Message-Signaled Interrupts (MSI-X)
[Figure: physical address space from 0x00000… to 0xFFFFF…, containing RAM, IO device regions, and interrupt vectors at 0xfee00xxx.]

Non-Transparent Bridges

A remote address space can be mapped into the local address space by using PCIe Non-Transparent Bridges (NTBs).
[Figure: local and remote hosts, each with CPU and chipset, RAM, and a PCIe NTB adapter; the NTB address mapping translates local address 0xf000 to remote address 0x9000.]

Using NTBs, each node in the cluster takes part in a shared address space and has its own “window” into the global address space.
[Figure: address spaces of nodes A, B, and C, each with local RAM and local IO devices, connected by an NTB-based interconnect; each node exports an address range into the global address space.]

Device Lending

A remote IO device can be “borrowed” by mapping it into the local address space, making it appear locally installed in the system.
[Figure: owner and borrower hosts, each with CPU and chipset, RAM, and an NTB adapter; the owner's physical device is inserted into the borrower through PCIe hot-plug, using an NTB address mapping from remote 0xb000 to local 0x2000.]

By intercepting DMA API calls to set up IOMMU mappings and inject reverse NTB mappings, the physical location of the device is made completely transparent.
[Figure: the device driver on the borrower calls dma_addr = dma_map_page(0x9000); in the owner, the NTB maps local 0x5000 to remote 0x9000 and the IOMMU maps IO virtual address 0xf000 to physical 0x5000, so the device is told to use address 0xf000.]
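The address translations on the Non-Transparent Bridges and Device Lending slides can be traced with a small model. This is a hypothetical Python sketch, not Dolphin's implementation: NtbWindow, Iommu, lend_dma_map, and device_dma_target are illustrative names, and it assumes 4 KiB pages with one NTB window per mapped page. It reuses the slide addresses: the borrower's driver maps a buffer at 0x9000, the owner gets a reverse NTB mapping from 0x5000 to that buffer plus an IOMMU entry from IOVA 0xf000 to 0x5000, and the device is handed 0xf000.

```python
"""Toy model of the address translations in the Device Lending DMA path.

Hypothetical sketch: these names are illustrative, not Dolphin's or the
Linux kernel's API. Addresses are assumed page-aligned (4 KiB pages).
"""
from dataclasses import dataclass, field


@dataclass
class NtbWindow:
    """Maps [local_base, local_base + size) onto a remote host's addresses."""
    local_base: int
    remote_base: int
    size: int

    def contains(self, addr: int) -> bool:
        return self.local_base <= addr < self.local_base + self.size

    def translate(self, addr: int) -> int:
        if not self.contains(addr):
            raise ValueError(f"{addr:#x} is outside the NTB window")
        return self.remote_base + (addr - self.local_base)


@dataclass
class Iommu:
    """Page-granular IO virtual address (IOVA) to physical address table."""
    page_size: int = 0x1000
    table: dict = field(default_factory=dict)

    def map(self, iova: int, phys: int) -> None:
        self.table[iova // self.page_size] = phys // self.page_size

    def translate(self, iova: int) -> int:
        frame = self.table.get(iova // self.page_size)
        if frame is None:
            raise LookupError(f"IOMMU fault at IOVA {iova:#x}")
        return frame * self.page_size + iova % self.page_size


def lend_dma_map(borrower_phys: int, owner_ntb_addr: int, iova: int,
                 owner_windows: list, owner_iommu: Iommu) -> int:
    """What an intercepted DMA mapping call might set up (assumption).

    1. Reverse NTB mapping in the owner: owner_ntb_addr -> borrower_phys.
    2. IOMMU mapping in the owner: iova -> owner_ntb_addr.
    The returned IOVA is the address the borrower's driver gives the device.
    """
    owner_windows.append(NtbWindow(owner_ntb_addr, borrower_phys, 0x1000))
    owner_iommu.map(iova, owner_ntb_addr)
    return iova


def device_dma_target(dma_addr: int, owner_windows: list,
                      owner_iommu: Iommu) -> tuple:
    """Follow a device DMA access from the owner host to its final target."""
    phys = owner_iommu.translate(dma_addr)
    for window in owner_windows:
        if window.contains(phys):
            return ("borrower", window.translate(phys))
    return ("owner", phys)


if __name__ == "__main__":
    # Basic NTB mapping from the Non-Transparent Bridges slide.
    ntb = NtbWindow(local_base=0xf000, remote_base=0x9000, size=0x1000)
    print(hex(ntb.translate(0xf000)))          # 0x9000

    # DMA interception example from the Device Lending slide.
    windows: list = []
    iommu = Iommu()
    dma_addr = lend_dma_map(0x9000, 0x5000, 0xf000, windows, iommu)
    print(hex(dma_addr))                       # 0xf000
    where, addr = device_dma_target(dma_addr + 0x10, windows, iommu)
    print(where, hex(addr))                    # borrower 0x9010
```

The sketch keeps the two hardware stages separate: the IOMMU decides where a device's DMA lands in the owner's address space, and the NTB window decides whether that address is forwarded to another host. A real implementation would also remove both mappings again when the driver calls dma_unmap_page(); the sketch omits that step.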
Borrowed remote resource
• Resource appears local to the OS, driver, and application
• Unmodified local driver (with hot-plug support)
• Hardware mappings ensure a fast data path
• Works with any PCIe device (even individual SR-IOV functions)
[Figure: Application → CUDA library + driver → local PCIe IO bus → PCIe NTB interconnect → remote PCIe IO bus.]

[Figure: borrowed remote resource vs. remote resource using middleware. The borrowed resource is reached directly over the PCIe NTB interconnect, while the middleware path adds CUDA-middleware integration, a local middleware service, RDMA interconnect transport on both sides, and a remote middleware service/daemon in front of the CUDA driver.]

[Figure: borrowed remote resource vs. local resource. Both use the same Application → CUDA library + driver stack; the borrowed resource sits on the remote PCIe IO bus behind the PCIe NTB interconnect.]

Device-to-host memory transfer
[Figure: bandwidth in GB/s versus transfer size (4 KB to 16 MB) for bandwidthTest (Local), bandwidthTest (Borrowed), and PXH830 DMA (GPUDirect RDMA).]
Setup: GPU: Quadro P400; Nvidia driver: version 375.26 (CentOS 7); CPU: Xeon E5-1630 3.7 GHz; memory: DDR4 2133 MHz.
1. Nvidia CUDA 8.0 Samples bandwidthTest
2. GPUDirect RDMA benchmark using Dolphin NTB DMA
https://github.com/Dolphinics/cuda-rdma-bench

Using Device Lending, nodes in a PCIe cluster can share resources through a process of borrowing and giving back devices.
[Figure: nodes running tasks A, B, and C, each with RAM, CPU and chipset, and an NTB, drawing GPUs, SSDs, NICs, and FPGAs from a shared device pool.]

EIR: Efficient computer-aided diagnosis framework for gastrointestinal examination
[Figure: examination rooms connected to a server room.]
http://mlab.no/blog/2016/12/eir/

Moving forward
• Strategy-based management
• Fail-over mechanisms
• VFIO and other API integration (“SmartIO”)
• Borrowing vGPU functions

Thank you!
My email address: [email protected]

Selected publications
• “Device Lending in PCI Express Networks”, ACM NOSSDAV 2016
• “Efficient Processing of Video in a Multi Auditory Environment using Device Lending of GPUs”, ACM Multimedia Systems 2016 (MMSys’16)
• “PCIe Device Lending”, University of Oslo 2015

Device Lending demo and more: visit Dolphin in the exhibition area (booth 625).
