
HORIZON 2020 TOPIC FETHPC-02-2017 Transition to Exascale Computing

Exascale Programming Models for Heterogeneous Systems 801039

D 3.1 Report on Current and Emerging Transport Technologies

WP3: Efficient and Simplified Usage of Diverse Memories

Date of preparation (latest version): 31/1/2019
Copyright © 2018-2021 The EPiGRAM-HS Consortium

The opinions of the authors expressed in this document do not necessarily reflect the official opinion of the EPiGRAM-HS partners nor of the European Commission.

DOCUMENT INFORMATION

Deliverable Number: D 3.1
Deliverable Name: Report on Current and Emerging Transport Technologies for Data Movement
Due Date: 31/1/2019 (PM 5)
Deliverable Lead: Adrian Tate
Authors: Tim Dykes, Harvey Richardson, Adrian Tate, Steven W. D. Chien
Responsible Author: Adrian Tate (Cray), e-mail: [email protected]
Keywords: memory data movement, transport
WP/Task: WP 3 / Tasks 3.1, 3.4
Nature: R
Dissemination Level: PU
Planned Date: 28/1/2019
Final Version Date: 31/1/2019
Reviewed by: Stefano Markidis (KTH), Martin Kuehn (FhG)
MGT Board Approval: YES

DOCUMENT HISTORY

Partner   Date         Comment                                Version
Cray UK   15/01/2019   Version for internal review            v0.1
Cray UK   24/01/2019   Including comments from S. Markidis    v0.2
Cray UK   28/01/2019   Final version including all feedback   v1.0

Executive Summary

This report presents an overview of the current and upcoming landscape of heterogeneous memory technologies for pre-Exascale and Exascale high performance computing systems, surveying both hardware and software interfaces and identifying those of greatest importance to the EPiGRAM-HS project. A series of observations is made to guide the Memory Work Package, and the project as a whole, with particular consideration given to the requirement to present, allocate, and move data, and to the design of the memory-abstraction device to be developed as part of EPiGRAM-HS. The report concludes that while the landscape of Exascale memory systems and software is complex, a subset of the hardware and software technologies is robust enough to be relied upon to develop interesting advances in this area.

Contents

1 Introduction

2 Heterogeneous Memory Systems
  2.1 Memory Trends
  2.2 Exascale Memory Systems
  2.3 Exascale Programmes
  2.4 Review of Exascale Candidate Memories
    2.4.1 Static Random-Access Memory (SRAM)
    2.4.2 High Bandwidth Memory (HBM)
    2.4.3 Dynamic Random-Access Memory (DRAM)
    2.4.4 DIMM-Based Non-Volatile Memory (NVDIMM)
    2.4.5 Solid State Drive (SSD)
    2.4.6 Hard Disk Drive (HDD)

3 Interfaces to Memory and Storage
  3.1 Operating System Memory Management
  3.2 Low-Level Hardware Support
    3.2.1 The Peripheral Component Interconnect (PCI)
    3.2.2 Non-Volatile Memory Express (NVMe)
    3.2.3 Storage Device Interfaces
    3.2.4 Accelerators
    3.2.5 Bus and Interconnect Developments
  3.3 Shared Memory
    3.3.1 Basic Memory Allocation Support
    3.3.2 POSIX Shared Memory
    3.3.3 Malloc-Like Memory Allocators
    3.3.4 Memory Copy and Move APIs
    3.3.5 Persistent Memory APIs
    3.3.6 Tmpfs
    3.3.7 XPMEM
  3.4 OS Support for Heterogeneity
    3.4.1 NUMA Memory Management
    3.4.2 Heterogeneous Memory Management
  3.5 Programming Frameworks and APIs
    3.5.1 CUDA
    3.5.2 ROCm
    3.5.3 HSA
    3.5.4 OpenMP
    3.5.5 OpenCL
    3.5.6 OpenACC
    3.5.7 Memkind
    3.5.8 MPI for Memory Access
    3.5.9 Higher Level Programming Frameworks
  3.6 Off-Node Access to Memory
    3.6.1 RDMA
    3.6.2 Burst Buffers
    3.6.3 Object Store
    3.6.4 Data Brokering and Workflow Services

4 Discussion
  Observation 1
  Observation 2
  Observation 3
  Observation 4

1 Introduction

The EPiGRAM-HS project seeks to research and apply the state of the art in programming environments for heterogeneous Exascale platforms, with the goal of 'enabling extreme scale applications on heterogeneous hardware'. Work Package 3 (WP3) of the EPiGRAM-HS project focuses on the memory technologies in such systems, with the goal of providing 'simplified efficient usage of complex memory systems'. This document surveys pre-Exascale and Exascale data movement technologies in order to define a set of relevant hardware and software technologies to be pursued in WP3.

Within this document the term 'transport' is defined as a technology or set of technologies by which data is moved from one storage location to another (possibly remote) storage location. It is important to note that a canonical interpretation of the term 'transport' specifically concerns low-level protocols within buses and networks, e.g. those used within PCIe and NVLink. While such transports are mentioned, this document is primarily concerned with the movement of data into and out of storage locations.

The investigation is motivated by recent and upcoming changes in the balance of computing systems and hence in supercomputers. Today's software systems and programming environments were designed primarily during an earlier paradigm, when arithmetic operations were costly compared to data movement. Since the reverse is true today, the software systems are not well equipped to minimise data movement, or even to present to the programmer toolsets that allow fine control of data movement. This landscape and the constraints acting within it are described in Section 2.1.

Data movement is a broad term from a technical perspective and could encompass large swathes of the HPC and technical computing sectors. Furthermore, there has been an explosion in the variants of storage media that are either available at the time of writing¹ or will become available in the next few years. The potential scope of this document is therefore extremely wide. However, many candidate technologies are not realistically relevant to mainstream HPC and/or Exascale computing. Section 2.2 defines an Exascale candidate technology as one which has the right combination of availability, cost, and performance attributes to be deemed relevant to the EPiGRAM-HS project. Thankfully, the list of such technologies is manageable. Many technologies, for example those that may be interesting for post-Exascale computing but do not fit the above criteria, are therefore not discussed in this document.

Concerning data-movement software, the question of relevance to Exascale is more nuanced, and much of this report serves as supporting information and analysis on this point. Given the variety of storage media that are expected to play a role in an Exascale system (as outlined in Section 2.3), it is important to understand which software technologies are required in order to use each class of memory efficiently. 'Usage' of a memory includes each of: explicit programming, performance-portability features, and abstraction of the memory. The relevant technologies are reviewed in Section 3, and in Section 4 we classify each technology by its relevance to the project and present key observations and analysis as a guide for future activity in the EPiGRAM-HS project.

¹ January 2019

Figure 1: Growth in processor performance over 40 years, from [1].

2 Heterogeneous Memory Systems

2.1 Memory Trends

The slowdown in CPU performance gains over recent years has been extremely well documented. At the turn of the millennium the 50% annual growth rate that had characterised the previous 15 years began to slow, and in recent years the growth rate has been less than 5%, as shown in Figure 1. Although the growth rate and its decline is an extremely nuanced topic, the most significant factor has been the end of Dennard scaling (the linear relationship between supply voltage and transistor size/density that allowed chip makers to effectively make smaller, faster chips), which ended in about 2002. Since this time, chip makers have relied upon increasing the amount of on-chip parallelism, which has stressed software implementations and made high efficiency more difficult to obtain.

Although processor performance increases have slowed, memory performance has consistently decreased relative to processor performance since 1980. The dramatic divergence can be seen in Figure 2, which plots memory performance (measured as DRAM access latency) and single-core processor performance over 35 years. With 1980 as the baseline, modern memories have slowed by three orders of magnitude relative to processor performance. Even if processor performance were to continue to plateau, it would take many decades for this deficit to be corrected. This chart illustrates why DRAM advances are no longer satisfactory and why we are seeing enormous innovation in memory systems (see Section 2.4.2). We have not seen a dramatic change in the software environment to address this three-orders-of-magnitude shift.

2.2 Exascale Memory Systems

As described in Section 1, the Exascale candidate memory technologies that this document reviews are a subset of the total number of technologies that could be considered. We define an Exascale candidate technology as a memory, storage, or data-movement technology that

Figure 2: Relative performances of processors and memory over 35 years, from [1].

possesses the following qualities:

• The technology is, or will become, generally available somewhere in the timeframe 2019-2025.

• The technology has a cost roadmap that makes its GA (general availability) product suitably placed for mainstream supercomputing. For technologies similar in performance to DRAM this requires that the cost is roughly $10 per GB, though this depends on the type, purpose and speed of the memory. At the lower end of the memory-storage hierarchy, the costs per GB of storage are considerably lower.

• The technology possesses some performance (bandwidth or latency) advantage over existing technologies, and the conditions under which applications can exploit this advantage can be classified, at least on paper.

At the time of writing², the first Exascale architectures are becoming concretely defined, and so a first approximation to the list of Exascale memory candidates can be obtained simply by reviewing those architectures.

2.3 Exascale Programmes

There are four active, realistic Exascale programmes across three continents: in Europe, the United States, Japan, and China. Figure 3 compares all four programmes, which are reviewed briefly in this section.

The US initially pursued three distinct architectural approaches to achieving Exascale, using GPUs, Intel Xeon Phi, and ARM processors. In 2018, Intel discontinued the Xeon Phi processor line [3], cancelling the codenamed Knights Hill processor that would have been the basis of the first Exascale system at Argonne (A18). Instead, the so-called A21 system will be built by Intel and Cray [4] using an undisclosed non-von Neumann

² January 2019

Figure 3: Comparison of the four Exascale programmes: (a) timelines and specifications, (b) funding levels, from [2].

processing technology called CSA [5]. While details of CSA are not public, it is well known that the processor will have a data-flow quality combined with reconfigurable processing elements. Despite its exotic processing, it is likely to be deployed with a fairly standard memory system (possibly including specialised buffers) in combination with Intel's DIMM-based NVRAM Optane persistent memory.

The Summit and Sierra systems are pre-Exascale machines combining NVIDIA GPUs and IBM POWER9 CPUs. The node diagram of the Summit architecture is shown in Figure 4 and consists of 2 POWER9 CPUs, each with three NVIDIA Volta GPUs attached via NVIDIA's proprietary NVLink interconnect. Each Volta has 16GB of HBM. Attached to each node is 1.6 TB of NVMe flash.

A third US-based Exascale approach will pursue ARM-based processors using HBM (see Section 2.4.2) but with no accelerator attached. Although a number of vendors now develop 64-bit ARM processors, the number of candidate commodity processors that provide HBM as well as sufficient degrees of SIMD parallelisation is small. The most likely candidate is a node consisting of 2 or 4 ARM processors developed by Marvell Semiconductor Inc. that would be a successor of the current ThunderX2 processor³. That future chip would likely combine 32GB of HBM with DDR4/5 memory. A prototype machine, Astra, is being installed at Sandia National Laboratories [6].

In October 2018 the European Commission announced that it will fund the development of two or more Exascale systems under the umbrella of the 'EuroHPC' joint undertaking, which pools resources from participating EU member states and other partners (e.g. Switzerland). Of the €1.4B allocated to the project, 50% is sourced from the European Commission and the remainder from the member states themselves [7]. Fundamental to the European approach is the development of a home-grown European processor, developed as part of the European Processor Initiative. As shown in Figure 5, the programme will develop a CPU based on an ARM instruction set, and a RISC-V accelerator.

³ https://www.marvell.com/server-processors/thunderx2-arm-processors/

Figure 4: Node architecture of the Summit Supercomputer (WikiChip).

Generation I of both will be deployed into a pre-Exascale system in 2021 and generation II will be integrated into a full Exascale system in 2023-2024. The memory systems of neither have been fully defined, so we assume that each will follow a memory design reminiscent of its cousin architectures. In the case of the ARM-based CPU, it is expected that the attached memory will resemble that attached to the Marvell ARM products of the same era, which would mean a third-generation HBM product combined with DRAM (DDR5), as will be found on the Marvell ThunderX3 package. The memory system of the accelerator product will certainly also use HBM, since SDRAM-based GDDR products will not be competitive in this timeframe.

Japan's approach to Exascale is also largely home-grown. The so-called Post-K system will be deployed in 2021 and be built from Fujitsu's A64FX processor, which is based on the ARMv8.2 instruction set. Interestingly, the processor will exclusively use HBM as main memory. The node design of A64FX is shown in Figure 6. In addition to the HBM physical memory, a multi-tier cache (SRAM) system will be provided. There is no mention of planned usage of additional memories in the public materials.

China will likely install the first Exaflop computer built from homespun processing technologies, possibly in combination with commodity GPUs. Additionally, China is set to install 3 pre-Exascale systems in 2019-2020. Due to low levels of public information, it is not clear what specific memory features the Chinese programmes will develop, though some combination of DRAM, SRAM and HBM is inevitable.

The list of candidate Exascale memory and storage technologies is thus fairly short and is summarised in Table 1.

Figure 5: European Processor Roadmap, from [8].

Figure 6: Fujitsu's Post-K processor, from [9].

Memory       Purpose                                     Programmes
SRAM         cache                                       All
HBM          cache / main memory / accelerator memory    All
DRAM         main memory                                 Europe, US-A21, US-GPU, US-ARM
NV-DIMM      persistent memory                           US-A21
NVMe-Flash   node-local storage                          Europe, US-GPU, US-A21
NA-Flash     burst buffer                                Europe, US-GPU, US-A21, US-ARM
HDD          parallel filesystem                         All

Table 1: Exascale candidate memories.

Although this report discusses both memory and storage media, there is a growing interest in viewing the entire memory-storage hierarchy as a single set of memory/storage devices. There are two motivating factors behind this recent trend: i) a number of software technologies allow existing memories to be accessed like storage devices and vice versa, and ii) some emerging products, such as storage-class memories and DIMM-based non-volatile memories, genuinely blur the line between memory and storage. Although the unification of the memory hierarchy is unarguably a distant goal, this report will adopt the more general term memory to describe either a traditional memory or a storage level. Each memory appearing in a row of Table 1 will be described further in the following section and its relevance to the EPiGRAM-HS project summarised in Section 4.

2.4 Review of Exascale Candidate Memories

2.4.1 Static Random-Access Memory (SRAM)

Unlike DRAM, Static Random-Access Memory does not need to refresh data to preserve it, meaning that its access time is much closer to the clock cycle time and that its power requirement is generally much lower when accessed infrequently. SRAM can provide 100 times the bandwidth of DRAM, though the practical speed obtained depends on its positioning (on-chip versus off-chip) and the size of the cache.

SRAM has been the material of choice for caches for decades, usually presented as a small number of hierarchical memories (2-4 levels, with 3 being very common) with higher levels providing increased performance at lower capacity. Such caches are often hardware controlled by associative replacement logic which loads data (by cacheline) automatically with the intention of reusing that cacheline later. Such caches cannot be programmed explicitly. In general-purpose computing, caches have been shown to be surprisingly effective [1], but for domain-specific architectures such associative replacement can be suboptimal. Hence, in recent times NVIDIA GPUs have introduced programmable 'shared memory' caches, and several AI-specific chip architectures will provide a small amount of SRAM local to each compute core [10].

All Exascale architectures will include SRAM as part of a traditional multi-tier cache hierarchy and some systems will see SRAM in programmable caches. EPiGRAM-HS should therefore be concerned with SRAM, since cache performance remains critical in many applications.

Figure 7: Stacking of DRAM to form HBM, from [1].

Controlling the number of cache misses in associative caches through, for example, loop tiling can be extremely difficult, but it is feasible that an EPiGRAM-HS application could be modified to obtain improved cache performance.
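To make the preceding point concrete, the sketch below shows a loop-tiling (cache-blocking) transformation of a matrix multiplication in C. This is a minimal illustration only: the matrix size N and tile size TILE are hypothetical values chosen for the example, not figures from this report, and in practice the tile size would be tuned to the cache capacities of the target processor.

    /* Minimal sketch of loop tiling (cache blocking) for matrix multiply.
     * N and TILE are illustrative values; TILE is chosen so that a
     * TILE x TILE block of each matrix fits in cache. */
    #include <stddef.h>

    #define N    1024
    #define TILE 64                    /* assumes N is a multiple of TILE */

    void matmul_tiled(double A[N][N], double B[N][N], double C[N][N])
    {
        for (size_t ii = 0; ii < N; ii += TILE)
            for (size_t kk = 0; kk < N; kk += TILE)
                for (size_t jj = 0; jj < N; jj += TILE)
                    /* operate on one cache-sized block at a time */
                    for (size_t i = ii; i < ii + TILE; i++)
                        for (size_t k = kk; k < kk + TILE; k++)
                            for (size_t j = jj; j < jj + TILE; j++)
                                C[i][j] += A[i][k] * B[k][j];
    }

The same data are touched as in the untiled loop nest; only the order of accesses changes, so that each cache line loaded by the associative replacement logic is reused before it is evicted.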

2.4.2 High Bandwidth Memory (HBM)

One response to the diminishing returns of powering DDR DRAM memories (see Section 2.4.3) has been to stack multiple DRAM slices in so-called 2.5D and 3D configurations, as shown in Figure 7. The family of stacked, on-package, high bandwidth memories represents a set of game-changing technologies for HPC. The main device memory of NVIDIA Pascal GPUs is formed of one such High Bandwidth Memory (HBM) material.

It is worth clarifying the terminology, as the term high bandwidth memory has both a generic and a specific meaning. The generic term has referred to two previous incarnations of memory developed from materials quite distinct from the HBM in Pascal GPUs. The first is GDDR5 memory, a form of synchronous Graphics RAM (SGRAM) built from DDR3 with additional specialist features. The second is MCDRAM, which is formed from the Hybrid Memory Cube material⁴ developed jointly by Intel and Micron, and which was used in the Intel Xeon Phi (Knights Landing) processor package. Since both the Hybrid Memory Cube and the Knights Landing chips are discontinued, such memories do not need to be covered by EPiGRAM-HS except where they can simulate other types of HBM.

The specific term HBM refers to a technical specification defined by JEDEC in document JESD235. In the HBM design, memory dies and a logic die are packaged together with the CPU or GPU on a single substrate. HBM chips borrow heavily from the DDR4 design (see Section 2.4.3). Each stack of HBM DRAM provides 8 independent channels. Each of the channels runs at somewhat low speeds by DDR standards (1-to-2 GT/s). However, the larger width of each channel (128 bits), combined with the high channel count, means that each stack is able to provide 128-to-256 GB/s of bandwidth. The wide-and-slow interface, coupled with the smaller capacitive loading of the short channel to the CPU, also makes HBM much more power-efficient than other solutions. Projections indicate that HBM will deliver full bandwidth at less than half the I/O power used by DDR4. With a data rate of 2 Gbps per signal (Gen-2), the bandwidth is 32 GB/s per channel. With

⁴ http://hybridmemorycube.org/

8 channels the bandwidth is 256 GB/s per stack. Hence with 4 stacks of HBM operating at 2 Gbps, a node can provide an aggregate of 1 TB/s of on-package memory bandwidth. For comparison, previous high-end NVIDIA GPUs had a 384-bit-wide GDDR5 interface (12 x32 devices operating at 7 Gbps) for an aggregate of 336 GB/s. Overall system power is 6-7 pJ/bit versus 18-22 pJ/bit for GDDR5.

Early implementations of HBM, such as those in NVIDIA Pascal, use 2.5D solutions, but later SoCs, especially those with lower thermal design power (TDP), could take advantage of the cost savings of direct 3D stacking. JEDEC took the approach with HBM of standardising the interface to the DRAM chips directly, which differs from the approach taken by Hybrid Memory Cube (HMC). Pascal GPUs shipped with HBM-2 in 2016. HBM-3 will double per-channel bandwidths, providing up to 512 GB/s per stack, while keeping the power consumption constant.

HBM prices remain high today, but are expected to move towards those of DDR as the device and packaging technology matures. It is unlikely that HBM devices will ever reach the same price per GB as DRAM, and even if the costs (per GB) converge the pricing is likely to reflect the higher bandwidth. In the timeframe of the pre-Exascale systems of interest to EPiGRAM-HS, HBM will offer substantially higher bandwidth than DDR5, but the capacity available for constant system cost will be lower. Efficient usage of HBM is likely one of the most critical aspects of heterogeneous memory research in the project.
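For reference, the per-channel, per-stack and per-node bandwidth figures quoted in this section follow from simple arithmetic (a restatement of the numbers above, not additional data):

    \begin{aligned}
    \text{per channel:} \quad & 2\ \text{Gbit/s} \times 128\ \text{bits} = 256\ \text{Gbit/s} = 32\ \text{GB/s} \\
    \text{per stack:}   \quad & 8\ \text{channels} \times 32\ \text{GB/s} = 256\ \text{GB/s} \\
    \text{per node:}    \quad & 4\ \text{stacks} \times 256\ \text{GB/s} \approx 1\ \text{TB/s}
    \end{aligned}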

2.4.3 Dynamic Random-Access Memory (DRAM)

DRAM remains the ubiquitous material for main memory in commodity CPUs. However, as described in the preceding section, HBM is used as the main device memory in NVIDIA GPUs, and will be used exclusively as main memory in the Fujitsu A64FX processor (see Section 2.2). While the topic of DRAM and DDR trends is diverse and detailed, the questions that are of interest to the EPiGRAM-HS project are simply the performance trends and costs of DDR versus alternatives within the project period of interest (2018-2025). As such we first review recent and upcoming DDR specifications.

The initial DDR4 specification was published by JEDEC in November 2013 and has since been updated for clarity and additional details, including higher data rate support. DDR4 is a standard 1-transistor, 1-capacitor (1T1C) DRAM. Initial offerings support 2133 and 2400 data rates, with a goal of reaching 3200 during the lifetime of DDR4. While x4, x8, and x16 devices are available, HPC applications primarily use the x4 and/or x8 devices for higher capacity and RAS purposes. The core frequency of the DRAM array has not increased significantly across generations, but the IO data rates have increased by approximately 2X each generation. Significant resilience features have been added to successive generations of DDR, but with some power cost. Figure 8 summarises the specifications of DDR1-4.

In 2019, the first DDR products based on the DDR5 specification are becoming available. The general goals are to improve bandwidth and capacity while maintaining a low power envelope and low latency. From JEDEC⁵:

⁵ https://www.jedec.org/category/technology-focus-area/main-memory-ddr3-ddr4-sdram

Figure 8: Comparison of DDR1-4 specifications. DDR5, which is available at the time of writing, promises double the data rate and increased density, but practical performance data are still lacking.

The JEDEC DDR5 standard is currently in development in JEDEC's JC-42 Committee for Solid State Memories. JEDEC DDR5 will offer improved performance with greater power efficiency as compared to previous generation DRAM technologies. As planned, DDR5 will provide double the bandwidth and density over DDR4, along with delivering improved channel efficiency. These enhancements, combined with a more user-friendly interface for server and client platforms, will enable high performance and improved power management in a wide variety of applications.

However, the number of DDR5 memory products on the market remains low, and the technical specifications of those in production are not well documented. Technical, physical and power considerations mean that there is some scepticism that DDR5 performance can meet expectations [11]. DRAM prices have recently risen above the $10 per GB norm, and higher-speed and higher-capacity DIMMs are significantly more expensive. It is therefore expected that DDR5 DIMMs will be unfavourably costly in the early part of the 2019-2025 timeframe and that DDR4 DRAM will continue to dominate the market. As such, interest in DRAM alternatives such as HBM, NVDIMM-P and DIMM-based NVRAM is expected to accelerate.

2.4.4 DIMM-Based Non-Volatile Memory (NVDIMM)

One way to package flash memory is in a standard Dual In-line Memory Module (DIMM) format, called an NVDIMM. Typically the non-volatile memory used is NAND flash, a type of flash memory named after its usage of NAND logic gates. There are multiple types of DIMM-based non-volatile memory, as differentiated by the JEDEC taxonomy [12]: NVDIMM-F, NVDIMM-N, and NVDIMM-P; each is briefly described below.

NVDIMM-F consists entirely of DIMM-based NAND flash memory with an integrated ASIC to introduce DRAM-like behaviour. This type of NVDIMM is equivalent to an SSD attached to the system bus and supports large, block-addressable capacities (i.e. equivalent to SSDs, of terabyte order). NVDIMM-F was first popularised by Diablo Technologies in 2013 as Memory Channel Storage (MCS) technology [13], and was employed in IBM's (now withdrawn) eXFlash DIMM product line; however, in recent years interest has died down.

NVDIMM-N is a combination of regular DRAM and on-board NAND flash memory: the DRAM is used as system memory whilst the NAND serves as non-volatile storage to provide persistency. In case of power failure, an on-board battery/capacitor keeps the DIMM powered whilst a control unit copies DRAM contents to the flash storage. This type of NVDIMM provides persistency in a byte- or block-addressable manner at DRAM speeds, but does not typically extend capacity beyond traditional DRAM (i.e. gigabyte order). A variety of NVDIMM-N solutions exist from vendors such as AgigA, Micron and Netlist. Notably, Netlist's HybriDIMM product employs a similar approach but uses a non-volatile storage much larger than the DRAM, with an on-board ASIC that actively moves and caches data between the DRAM and NVRAM to allow a larger addressable capacity [14].

NVDIMM-P is an upcoming JEDEC specification for a new high-capacity memory module interface, where main memory may be presented as RAM but consist of one of a variety of emerging non-volatile memory technologies. It is expected to include a new DDR5 interface, and modifications to DDR4. Details are not yet confirmed, as the specification has not been released; the closest existing product is provided by Intel (see below).

In terms of candidate memories for Exascale, Intel's Optane DC DIMM-based non-volatile memory is the only hardware solution currently considered a likely candidate, due to its planned inclusion in the A18 Exascale system, originally planned as part of the US Exascale programme as discussed in Section 2.3. This system was replaced with A21, for which significant architectural details have not yet been announced; however, Intel Optane DC is a potential candidate memory for this system. Intel's solution for persistent memory is based on 3D XPoint memory [15], a transistor-less stacked memory bearing similarities to both Phase Change Memory (PCM) and Resistive RAM (ReRAM). Intel's DIMM-based DC range (it is also packaged as an NVMe device) offers significant bandwidth improvements over flash, comparable to those of DRAM, packaged densely in DIMMs of 64-128GB today, with the expectation that 1TB DIMMs will be available in coming years. Prices track DRAM, so it is conceivable that a node can see many TB of memory capacity. Similarly, vendors will pursue disaggregated nodes [16] of dense NVDIMM solutions to provide network-attached storage or backup.

2.4.5 Solid State Drive (SSD)

The class of memories that we call Solid State Drives can be constructed from a variety of storage media and presented to the system in a variety of ways. We again restrict the review to those technologies that are immediately under consideration for Exascale systems, which means (i) NAND flash SSDs local to the node and (ii) NAND flash SSDs accessed over the network. A third potential class of SSDs, using non-volatile materials such as Intel's 3D XPoint (see Section 2.4.4), is not deemed relevant to the Exascale programmes since the cost of these devices versus flash makes them unlikely to be considered; the higher bandwidth of 3D XPoint is more interesting in the DIMM-based product.

In HPC, SSD/flash-based storage is a natural successor to traditional hard disk storage systems, providing significantly better bandwidth and latency with lower power requirements.

Figure 9: Bandwidth and capacity disparity of hybrid SSD/HDD tiered storage, from [19].

Furthermore, traditional Lustre HDD-based parallel filesystems are designed for the large streaming reads/writes required by many varieties of simulation checkpointing; modern high performance workloads increasingly require mixed I/O (e.g. small, unaligned, or random reads/writes) [17] that is problematic for current parallel filesystems but manageable through the high IOPS (input/output operations per second) capability of flash-based storage. However, primarily due to the cost and capacity limitations of flash, HDDs currently form the basis of most large storage solutions for HPC systems.

Hybrid flash/HDD storage is currently a popular means of exploiting the benefits of flash storage without the cost of an all-flash storage solution. Tiered storage in the form of burst buffers (discussed further in Section 3.6.2), and hybrid storage such as the Cray ClusterStor L300N, combine HDDs for large capacity and large streaming block I/O with SSDs for high bandwidth and for I/O patterns not suited to HDD-based Lustre storage (i.e. for high IOPS). Tiered storage suffers from the typical drawbacks of tiered memory systems, in that data must be staged in and out of each tier, accounting for the disparate bandwidths and capacities as illustrated in Figure 9.

It is expected that, as the price of flash memory decreases (a clear trend as observed in [18]), HPC storage solutions will eventually become entirely flash-based. Whilst there are already all-flash Lustre storage solutions on the market, such as the Cray ClusterStor L300F, it is expected that in the 2023-2025 timeframe flash-based storage solutions will rival HDDs in cost for large HPC centre requirements (e.g. 50-250 PB) [19].

2.4.6 Hard Disk Drive (HDD)

Storage solutions for current leadership-class high performance computing systems are still predominantly HDD-based, with some usage of hybrid flash storage (e.g. burst buffers). As discussed in Section 2.4.5, all-flash storage is expected to be the norm within the next decade; however, certainly pre-Exascale, and potentially Exascale, systems will inevitably exploit some form of HDD-based Lustre parallel filesystem. It is expected that a user's interaction with spinning disks will remain via the main parallel filesystem or through commercially popular filesystems such as XFS. Although I/O and filesystem tuning will remain a critical component of application optimisation, EPiGRAM-HS does not directly deal with this topic in detail and the relevance of HDD to the project is quite low.

3 Interfaces to Memory and Storage

Memory and storage devices are accessible via a range of interfaces, from low-level and device-specific interfaces to high-level APIs and frameworks targeted at application developers. First, in Section 3.1, the memory management mechanisms in the Linux kernel are introduced; the relevant interfaces are then covered, starting from the hardware level (Section 3.2) and moving on to higher-level memory movement APIs and operating system support for heterogeneity (Sections 3.3 and 3.4 respectively). Programming models and APIs are considered in Section 3.5, and finally off-node memory access in Section 3.6.

3.1 Operating System Memory Management

Many of the memory interfaces described in the following subsections interact to some degree with the Linux memory subsystem. As such, this section first introduces the memory management system in the Linux kernel, which is based on the concept of virtual memory: processes are each given a virtual address space that abstracts the physical memory of the system. This abstraction allows multiple processes to use address ranges larger than physical memory, originally to support systems with a small amount of physical memory augmented by a swap device on disk. It further provides the operating system with a means to transparently allocate, re-map, move, and reclaim physical memory from processes without affecting their virtual memory layout, and to abstract complex physical memory hierarchies.

Physical memory is divided into memory banks. Each memory bank is called a node, and represents a bank of memory typically with Uniform Memory Access (UMA). A node is further divided into zones, representing memory ranges with specific attributes, such as Direct Memory Access (DMA) or non-DMA writeable. Finally, each zone is divided into discrete physical memory chunks known as pages, grouped by page type (for example pages that are unallocated, mapped to processes, or reserved by the kernel), and sized as a multiple of 1024 bytes. Each physical page may be mapped to one or more virtual pages.

Processes are each given a virtual address space, and interact with physical memory by initiating requests for virtual pages from the page allocator, which are mapped into a virtual memory area (VMA) in the process's address space. The mappings between virtual and physical pages are stored in a hierarchical structure known as the page table. Each memory access requested by the process is translated to a physical address using the page table, typically in hardware via a TLB structure in a CPU-integrated Memory Management Unit (MMU). The virtual address space is organised into segments (for example: data, stack, and heap), some of which may be mapped to files (executables, dynamic libraries, and normal files), and which may consist of multiple memory arenas. The size of some of these segments is defined by the format of the executable (ELF, for example). Note that the fact that virtual address space is available does not mean that a virtual page is mapped to a physical page. Mapping happens by demand paging: when an unmapped address is referenced a page fault is generated which traps to the kernel; the kernel then maps the virtual page to a physical page in memory, possibly backed by a file in the filesystem.

Figure 10 presents an illustrative example of the described memory management system, for a fictitious compute node consisting of dual-socket CPUs and a single attached GPU.


Figure 10: Representation of Linux memory management: physical and virtual memory layout for a compute node with two CPU sockets and one attached GPU (for illustrative purposes). The heterogeneous memory management (HMM) system shown for the attached GPU, along with page movement mechanisms, is described in Section 3.4.

Each of the nodes, zones, and pages for the physical address space is represented, along with a process virtual address space split into memory arenas and further into logical pages, with a page mapping mechanism to convert between virtual and physical addresses. Further discussion of heterogeneous memory management (labelled as HMM in Figure 10), as well as of page movement mechanisms, is presented in Section 3.4.

The kernel provides a small number of system calls for memory management, which are typically abstracted by programming languages and higher-level APIs, as outlined in later sections. Full detail on the fundamentals of Linux memory management can be found in [20], and a more up-to-date overview in the current kernel documentation [21].

3.2 Low-Level Hardware Support In this subsection we consider the standard low-level interfaces that are relevant when memory devices are directly attached to a compute node or when devices that expose memory are connected to a compute node.

3.2.1 The Peripheral Component Interconnect (PCI)

The PCI interface is used to connect peripherals both internal and external to a compute node [22, 23, 24]. PCI was originally developed by Intel in the early 1990s. At the time of writing, nodes on HPC systems support PCIe gen3, which has a signalling rate of 8 GT/s; in future, gen4 will deliver 16 GT/s [25]. As with other interconnects, PCI has moved from a parallel bus architecture to one of differential signalling over 'lanes' in PCI Express (PCIe), which delivers a high-performance, bidirectional serial interconnect.

Support for PCIe lanes is implemented directly in modern processors (for example, 32 or more lanes); in addition, processors can connect to PCI devices using support in compatible chipsets.


Figure 11: Illustrative Node Architecture

In Figure 11 we show an illustrative node architecture where the processor provides a direct connection to RAM and PCIe lanes (it could also provide a network connection). In this case the processor uses DMI to communicate with a chipset that provides extra I/O expansion in the form of more PCIe lanes tied to expansion sockets, SATA connections, an M.2 NVMe socket, etc. The precise design of a node varies; for example, a cluster node or consumer desktop would have much more I/O expansion than a node on an HPC supercomputer. Accelerator devices would typically be mounted in PCIe expansion slots. PCI provides the ability to communicate with devices and transfer data (memory and IO) using a transactional interface with flow control.

3.2.2 Non-Volatile Memory Express (NVMe)

Figure 12: M.2 NVMe SSD (Image: Dmitry Nosachev)

NVM Express (NVMe) [26, 27] is an interface used to connect non-volatile memory to a CPU via PCI Express. The protocol supports low-latency parallel data paths to memory and is much more efficient than traditional storage APIs like SCSI, SAS, ATAPI and AHCI [28]. Further efficiencies are obtained through multiple I/O queues. In addition to availability in the traditional 2.5" form factor and on PCI Express expansion cards, NVMe devices may be delivered in an M.2 form factor, requiring an M.2 slot that supports NVMe (with PCIe connections). The M.2 form factor specifies the features of internally mounted expansion cards; these are keyed for capability and can deliver PCI Express 3.0, SATA 3.0 or USB 3.0 lanes to the device. M.2 supports AHCI and NVMe for SSD devices. Support for NVMe devices is provided by a combination of the platform BIOS and an OS driver.

3.2.3 Storage Device Interfaces

Disk-based devices use a range of interfaces, the most recent of which are SCSI, SATA, SAS, AHCI and NVMe [29]. These interfaces have developed from parallel buses (ATA, SCSI) to serial variants (SATA, SAS). AHCI is a more recent standard for ATA/SATA drives that improves queuing support and is appropriate for SSDs. SATA drives can be supported by chipsets on a node, while SCSI/SAS support usually requires a PCI adapter. The interfaces in this section are going to be inefficient when used to access a memory device, both because the device is pretending to be a hard drive and because of the performance limitations of the physical interfaces. Note that accessing a device as a disk drive means that it will be exposed by the kernel as a block device, typically with a filesystem interface on top.

3.2.4 Accelerators

Today, accelerators of interest to the HPC community are hosted on PCIe expansion cards within a compute node. With the exception of POWER-based systems, which can use native NVLink⁶, the PCIe interface is used for the connection of devices. Vendor-specific device drivers are required which support vendor-supplied software stacks, the most mature being that from NVIDIA for their range of GPUs. The interface provided by the CUDA programming model is described elsewhere in this document.

3.2.5 Bus and Interconnect Developments

In 2016, three efforts were announced to standardise the bus/interconnect to allow tighter coupling between processors and accelerators with streamlined software stacks. These efforts are:

• Cache Coherent Interconnect for Accelerators (CCIX) [30]

• Gen-Z [31]

• Open Coherent Accelerator Processor Interface (OpenCAPI) [32]

CCIX aims to provide a processor/ISA-agnostic interface to accelerators and memory, with hardware cache coherence across the link and a driver-less and interrupt-less data sharing framework. Gen-Z targets communication to directly attached components and even to those on a fabric, with unified data access provided as memory operations, even supporting byte-addressable load/store and messaging (put/get).

⁶ https://www.nvidia.com/en-gb/data-center/nvlink/

OpenCAPI addresses a tightly coupled interface between processors, accelerators and memory, and is not as ambitious as Gen-Z, for example in extending to a switched network or fabric. It requires direct processor support for virtual-to-physical translation, which allows attached devices to operate with virtual addresses and gives coherent access from the accelerator to system memory. The OpenCAPI transport is IBM BlueLink. BlueLink will also support NVLink 2.0 connections; NVLink was originally developed by NVIDIA to connect Pascal-architecture GPUs together and to host CPUs (only IBM POWER to date).

At the moment these initiatives are in their infancy, but they may in future lead to much more efficient architectures for data access (both direct and block access) from device to accelerator and from accelerator device to device. A useful comparison of Gen-Z, CCIX and OpenCAPI was presented by Brad Benton at the 13th Annual OpenFabrics Workshop [33].

3.3 Shared Memory

This section covers the interfaces that are available to applications and libraries to use memory on Linux. The most fundamental operations include kernel support to provide memory to a process and to provide access to memory from multiple processes simultaneously. Also covered are libraries that deal with memory allocation, efficient movement of data, and memory usage in particular contexts such as persistent memory and memory-resident filesystems. Note that there is a distinction between interfaces provided by the kernel as system calls (documented in section 2 of the Linux man pages) and interfaces layered on top of these and provided by libraries (which may be standard on Linux), documented in section 3 of the man pages.

3.3.1 Basic Memory Allocation Support

As described in Section 3.1, a Linux process has access to a range of virtual addresses. New virtual addresses can be gained by growing existing memory segments via the brk(2) and sbrk(2) system calls, or by using the mmap(2) system call to provide addresses linked to anonymous memory or to a file.

The mmap(2) system call associates (maps) files into the address space of the calling process at a user- or kernel-supplied location. The user specifies how much of the allocation is initialised from the file before mmap returns. The munmap(2) call deletes a mapping previously created with mmap(2). One use of mmap is to map a real file, but it can also be used to create an anonymous mapping which is not backed by a file. Multiple processes can call mmap and share a mapping; in this case msync(3) must be used to ensure changes made by one process are written to the file.

Most applications and libraries that require explicit memory allocation typically do so by directly or indirectly using the standard memory allocator, as described in Section 3.3.3. Subsequent sections describe in more detail various APIs that are relevant to the allocation and movement of data; note that the specific APIs mentioned are not intended to be a comprehensive list but merely indicative of the core functionality that is available.
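The sketch below illustrates the two mmap(2) usages described above on a Linux system: an anonymous mapping, whose pages are allocated lazily by demand paging, and a read-only file-backed mapping. The file name data.bin is hypothetical and used only for illustration.

    /* Minimal sketch of mmap(2): anonymous and file-backed mappings.
     * "data.bin" is a hypothetical file name used for illustration. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 1 << 20;                        /* 1 MiB */

        /* Anonymous mapping: physical pages are allocated on first touch. */
        char *anon = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (anon == MAP_FAILED) { perror("mmap"); return 1; }
        memset(anon, 0, len);                        /* demand paging occurs here */

        /* File-backed mapping: file contents appear in the address space. */
        int fd = open("data.bin", O_RDONLY);
        if (fd >= 0) {
            struct stat st;
            fstat(fd, &st);
            char *file = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
            if (file != MAP_FAILED) {
                printf("first byte: %d\n", file[0]);
                munmap(file, st.st_size);
            }
            close(fd);
        }

        munmap(anon, len);
        return 0;
    }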

3.3.2 POSIX Shared Memory

As discussed in Section 3.3.1, it is possible to share memory between processes using the mmap(2) system call. As an alternative, it is possible to use the IPC shared memory routines to create shared memory segments that can be shared between unrelated processes. Historically these routines have been available in different variants: BSD, SYSV and POSIX. The POSIX variant (documented in shm_overview(7)) actually uses mmap along with other functions. The crucial API calls are:

shm_open, ftruncate, mmap/munmap, shm_unlink, close, fstat

shm_open(3) is used to open, or create and open, a shared memory object with a given name and obtain a new file descriptor. The object is removed by the corresponding shm_unlink(3) call. The ftruncate(3) call is used to size the memory mapping, which can then be mapped into the process address space using mmap(3). The fstat(3) call can be used to determine the size of an existing shared memory object.
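A minimal sketch of this call sequence is shown below; the object name /epigram_demo is hypothetical. A second, unrelated process would call shm_open with the same name (without O_CREAT) and mmap the same region in order to share the memory.

    /* Minimal sketch of POSIX shared memory: create, size, map, use, remove.
     * "/epigram_demo" is a hypothetical object name. Older glibc versions
     * require linking with -lrt. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const char *name = "/epigram_demo";
        size_t len = 4096;

        int fd = shm_open(name, O_CREAT | O_RDWR, 0600);   /* create and open */
        if (fd < 0) { perror("shm_open"); return 1; }

        if (ftruncate(fd, len) != 0) {                      /* size the object */
            perror("ftruncate"); return 1;
        }

        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p, "hello from process A");   /* visible to other mapping processes */

        munmap(p, len);
        close(fd);
        shm_unlink(name);                    /* remove the shared memory object */
        return 0;
    }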

3.3.3 Malloc-Like Memory Allocators

The standard library call (found in glibc on Linux) to allocate memory is malloc(3). The basic API calls supported are:

calloc    Allocate and zero memory of the requested size and return a pointer
malloc    Allocate memory of the requested size and return a pointer
free      Free previously allocated memory associated with a pointer
realloc   Reallocate previously allocated memory associated with a pointer

The memory allocator itself uses chunks of memory acquired from the heap to satisfy allocation requests. Note that memory returned from malloc may still need to be paged in to the process requesting it. Memory allocators are complex, and other implementations exist that may be optimised for particular circumstances, for example jemalloc, mtmalloc and ptmalloc. Other API calls are available to control memory allocations:

posix_memalign   Return an allocation of aligned memory

madvise          Give hints about memory usage

We also note that for HPC workloads it can be advantageous to use huge pages, and these are not satisfactorily implemented on Linux. Transparent huge pages may be enabled statically in the system configuration and provided by default to applications; however, this can lead to problems. Alternatively, solutions like libhugetlbfs can be used, which provide process segments based on huge pages for specific applications that require them. Memory allocation otherwise remains the same.
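The sketch below combines the two calls listed above: an aligned allocation with posix_memalign followed by an madvise(2) hint. The 2 MiB alignment and the MADV_HUGEPAGE advice are illustrative, Linux-specific choices; the kernel is free to ignore the hint.

    /* Minimal sketch: aligned allocation plus an madvise(2) hint.
     * MADV_HUGEPAGE is Linux-specific and only a hint; the alignment and
     * size are illustrative values. */
    #define _GNU_SOURCE
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t size = 64UL << 20;                 /* 64 MiB */
        void *buf = NULL;

        /* Request memory aligned to a 2 MiB (huge-page sized) boundary. */
        if (posix_memalign(&buf, 2UL << 20, size) != 0)
            return 1;

        /* Hint that this range is a good candidate for transparent huge pages. */
        madvise(buf, size, MADV_HUGEPAGE);

        memset(buf, 0, size);                     /* touch the pages */
        free(buf);
        return 0;
    }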

3.3.4 Memory Copy and Move APIs

Utilities to move data efficiently between memory locations are provided by Linux libraries and by the libraries that support compiler runtime routines required by language standards (for example C). The most basic routines to move memory are memcpy(3), bcopy(3), memmove(3) and wmemcpy(3); these routines differ in whether or not they allow the source and destination to overlap. Similar routines (strcpy(3) and variants) provide the ability to stop copying on a null character.

Note that these routines are not essential for application developers, since programming languages have built-in support to copy data between variables/objects. However, routines like memcpy(3) are optimised for performance, particularly in the case of large buffers, for example by taking care of data alignment and using the most optimal processor instructions.
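A minimal sketch of the overlap distinction mentioned above: memmove(3) is safe when the source and destination ranges overlap, whereas memcpy(3) assumes that they do not.

    /* Minimal sketch contrasting memcpy(3) and memmove(3). */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buf[16] = "abcdefgh";

        /* Overlapping shift by two characters: memmove handles the overlap. */
        memmove(buf + 2, buf, 6);          /* buf now holds "ababcdef" */
        printf("%s\n", buf);

        /* Non-overlapping copy: memcpy is sufficient (and typically fastest). */
        char dst[16];
        memcpy(dst, buf, sizeof buf);
        printf("%s\n", dst);
        return 0;
    }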

3.3.5 Persistent Memory APIs

Persistent memory standardisation is being driven in part by the non-profit Storage Networking Industry Association [34], which has developed a high-level NVM Programming Model (NPM) describing recommended behaviour between system components (OS/user space) and operational models for NVM access.


Figure 13: SNIA NVM Software Architecture.

Figure 13 illustrates the SNIA architecture for persistent memory access. Both access via a filesystem and a direct access mechanism (DAX, the rightmost arrow) are provided; the latter OS feature provides direct access to memory with no page cache via memory-mapped files. Using a filesystem paradigm provides a way to name persistent data and to provide familiar access controls.

Intel developed a set of APIs delivered in the Persistent Memory Development Kit (PMDK) [35], which uses the DAX feature. This project was started by Intel during development of what was to become the Optane range of persistent memory products. The lowest-level library (libpmem) API follows the POSIX shared memory model but adds features relevant to persistent storage: the ability to flush data, to test if the file is persistent, and a malloc implementation (providing huge pages if possible). The library also provides its own version of memcpy. The librpmem library supports remote access to persistent memory.

Higher-level support can be built on top of libpmem, for example transaction support. The libpmemlog library supports an interface to a persistent-memory-resident log file. The libpmemobj library provides a flexible object store backed by a persistent memory file; it provides persistent memory management, transactions, lists, locking and other features. The libpmemblk library provides interfaces to create arrays of pmem-resident blocks of fixed size that are atomically updated even in the case of program or power failure.
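As an illustration of the lowest-level PMDK library, the sketch below maps a file with libpmem and flushes a store so that it is durable. The path /mnt/pmem/demo is hypothetical and assumes a DAX-mounted filesystem; on storage without real persistent memory the library falls back to msync-based flushing.

    /* Minimal sketch of the libpmem (PMDK) API; /mnt/pmem/demo is a
     * hypothetical path on a DAX-mounted filesystem. Link with -lpmem. */
    #include <libpmem.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        size_t mapped_len;
        int is_pmem;

        /* Create (or open) a 4 KiB persistent-memory backed file and map it. */
        char *addr = pmem_map_file("/mnt/pmem/demo", 4096, PMEM_FILE_CREATE,
                                   0600, &mapped_len, &is_pmem);
        if (addr == NULL) { perror("pmem_map_file"); return 1; }

        strcpy(addr, "persistent hello");

        /* Flush the stores so they are durable across a power failure. */
        if (is_pmem)
            pmem_persist(addr, mapped_len);   /* true persistent memory */
        else
            pmem_msync(addr, mapped_len);     /* fallback for other storage */

        pmem_unmap(addr, mapped_len);
        return 0;
    }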

3.3.6 Tmpfs

The tmpfs(7) feature of Linux provides a memory-resident filesystem of fixed maximum size once mounted. The filesystem exists in virtual memory (specifically the page cache). Typical usages of tmpfs are the /tmp filesystem and the /dev/shm filesystem used for POSIX shared memory support. Note that there are other mechanisms to use memory for filesystems (ramfs, for example).

3.3.7 XPMEM

XPMEM (Cross Partition Memory) is a technology comprising a kernel module and a library that allows a process to make segments of its address space available to be mapped by another process. It was originally developed by SGI and was further developed by Cray. It can provide the support needed for the development of PGAS languages, where direct access to the address space of other processes of a distributed application is required. There is also a public version of XPMEM⁷ that is used in the vader BTL transport of Open MPI as a replacement for the sm (shared memory) transport. The API of the publicly available version provides the following routines:

xpmem_make     Export a region of a process's address space

xpmem_get      Acquire permission to attach to a (remote) memory region

xpmem_attach   Attach to a (remote) memory region

XPMEM is still provided by Cray and HPE (SGI) as part of their environments.

⁷ https://code.google.com/archive/p/xpmem and https://github.com/hjelmn/xpmem

3.4 OS Support for Heterogeneity

This section discusses the current state of support for heterogeneous memory in the Linux kernel, existing research efforts, and future directions.

3.4.1 NUMA Memory Management

Non-Uniform Memory Access (NUMA) architectures have been supported in the Linux kernel since the early 2000s. On boot, the memory architecture of the machine is detected using the Advanced Configuration and Power Interface (ACPI), and an appropriate set of nodes is created to represent each memory bank found. For UMA architectures this is a single node, whilst for NUMA architectures a list of nodes is created. A node stores the memory bank characteristics, such as the memory size, how much is unallocated, a list of associated CPUs with local access to that memory bank, and NUMA distances. NUMA distance is a metric used to define the relative cost of non-local memory access, typically measured in latency, bandwidth, or hops. Command line utilities exist to query the current NUMA configuration, such as taskset, numactl, and numastat, plus an accompanying library, libnuma, supplying system call wrappers for NUMA-aware applications to use. A NUMA representation of a two-CPU, one-GPU system is illustrated in Figure 10.

Memory allocation for a NUMA system is defined by memory policies, which can be set system wide, per process, or per VMA in the process's address space. The default policy is node local, meaning an allocation request from a process via e.g. mmap(2) is serviced from the node with which the hosting CPU is associated, if possible. The memory policy can be set explicitly via the set_mempolicy/get_mempolicy system calls, for example allowing a process to explicitly request allocations from alternative NUMA nodes. The typical approach for advanced programmers is to rely on the default first-touch memory allocation policy, where pages are allocated in the NUMA node associated with the thread that first references the page memory; appropriate placement can be obtained by taking care to initially reference allocations correctly from each thread, and by binding threads to CPUs.

Application-level memory movement across NUMA domains is supported in libnuma through explicit page migration (a short sketch follows the list below):

move_pages     Migrate a list of pages to a specified memory node, highlighted in Figure 10.

migrate_pages  Migrate all pages for a specific process belonging to the specified set of old nodes, to the specified set of new nodes.
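The sketch below uses the move_pages(2) interface listed above to migrate a single page of the calling process to another NUMA node. It assumes a system with at least two NUMA nodes and libnuma installed; the target node number is an illustrative choice.

    /* Minimal sketch of explicit page migration with move_pages(2).
     * Assumes >= 2 NUMA nodes and libnuma; link with -lnuma. */
    #include <numa.h>
    #include <numaif.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support\n");
            return 1;
        }

        long page_size = sysconf(_SC_PAGESIZE);
        char *buf = aligned_alloc(page_size, page_size);
        if (!buf) return 1;
        memset(buf, 0, page_size);            /* first touch allocates the page */

        void *pages[1]  = { buf };
        int   nodes[1]  = { 1 };              /* illustrative target: node 1 */
        int   status[1] = { -1 };

        /* pid 0 means the calling process. */
        if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) != 0)
            perror("move_pages");
        else
            printf("page now resides on node %d\n", status[0]);

        free(buf);
        return 0;
    }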

Kernel extensions for automatic migration have been proposed using schemes such as affinity-on-next-touch [36, 37], an approach which is supported on Solaris; however, so far only explicit migration is available in the Linux kernel.

3.4.2 Heterogeneous Memory Management

Heterogeneous Memory Management (HMM) is a recent addition to the Linux kernel (as of 4.14) that provides infrastructure for integrating external device memory, such as that of accelerators, into the kernel memory management system. This infrastructure currently provides two main features:

Shared address space: The typical approach for accelerator devices with on-board memory is a split address space; the CPU maintains a main-memory page table, and the device maintains a device-memory page table. HMM exposes helper functions such that the CPU page table can be mirrored on the device, and both main and device memory can be presented as a shared address space.

Device zones: Data movement currently requires explicit copying through device driver functions. This feature creates an additional memory zone to track device pages, marked as un-mappable by the CPU. This allows page migration between CPU and device memory using regular paging mechanisms, and page faults can trigger a migration from the device back to main memory.

Figure 10 illustrates the new HMM features incorporated into the Linux memory management system. Device memory is included as a new memory zone with appropriate pages in the page tables, which are replicated both on the GPU and on the host to present an extended address space in which identical virtual addresses on the device and the host always map to the same physical address. An important consideration for these features is the necessity of vendor cooperation to provide support in device drivers.

Beyond the existing support, further kernel extensions have been proposed. These include: extending the NUMA abstraction to heterogeneous memories and providing OS services for asynchronous and DMA-accelerated memory movement [38]; allocation and migration services that can transparently target storage-class memories [39]; and expanding the concept of NUMA distance to include heterogeneous memory architectures [40].

3.5 Programming Frameworks and APIs

This section summarises a variety of parallel programming frameworks and APIs that specifically target heterogeneous environments, with accompanying interfaces for the associated heterogeneous memory hierarchies. Many of these frameworks target an assumed architecture of a multi-core CPU paired with an accelerator (GPU, FPGA, etc.); code runs on the host CPU and utilises an attached accelerator device via the programming framework. Throughout this section the term host will refer to the host CPU, whilst device will refer to the paired accelerator, unless otherwise stated.

3.5.1 CUDA

Compute Unified Device Architecture (CUDA) refers to both the hardware architecture and programming framework for NVIDIA GPUs. CUDA supports explicit memory allocation and movement; the memory model was that of a split address space up until CUDA 4.0. Runtime API calls are modelled on standard library functions accepting device and/or host pointers, such as:

cudaMalloc Allocate a linear block of device memory, returning a device pointer.

cudaFree Free a memory block allocated by cudaMalloc.

cudaMalloc* Family of related functions to allocate formatted chunks of device memory (e.g. cudaMalloc3D).

cudaMemcpy Copy a memory block host to host, host to device, device to host, or device to device.

cudaMemcpy* Family of related functions for moving formatted blocks of memory (e.g. cudaMemcpy3D) and for asynchronous movement.

cudaMemset Fill a device memory block with a constant byte value.

cudaMemset* Family of related functions for filling formatted blocks of memory (e.g. cudaMemset3D) and for asynchronous filling.
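To make the split address space model concrete, the following minimal host-side sketch (buffer size illustrative, error checking omitted) allocates device memory, copies a host buffer in, and copies the result back:

    #include <cuda_runtime.h>
    #include <stdlib.h>

    int main(void)
    {
        const size_t n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        float *h_buf = (float *)malloc(bytes);           /* host allocation */
        for (size_t i = 0; i < n; i++) h_buf[i] = 1.0f;

        float *d_buf = NULL;
        cudaMalloc((void **)&d_buf, bytes);              /* device allocation */

        /* Explicit movement across the split address space. */
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
        /* ... launch kernels operating on d_buf here ... */
        cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);

        cudaFree(d_buf);
        free(h_buf);
        return 0;
    }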

Subsequently, more advanced features for memory management have been supported through the concepts of unified virtual addressing (UVA), and unified memory. UVA provides a virtual address space encompassing both host and device memory (potentially multiple devices), whilst unified memory takes this concept a step further and provides a managed memory system with features such as automatic page migration and prefetching, which can be integrated into high level data structures. The current API (at the time of writing, CUDA 10.0) supports:

cudaMallocManaged Allocate a managed block of unified memory.

cudaMemAdvise Advise the CUDA memory subsystem on expected utilisation for memory ranges (modelled on system call madvise).

cudaMemPrefetchAsync Pre-fetch memory ranges to specific devices, or host CPU.

cuda*Attributes Family of functions for getting and setting attributes on memory ranges, and getting attributes of device pointers.
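A minimal unified memory sketch using these calls might look as follows; the target device 0, the read-mostly hint and the buffer size are illustrative, and the actual migration behaviour depends on the GPU architecture and driver as described below:

    #include <cuda_runtime.h>

    int main(void)
    {
        const size_t bytes = 1 << 24;
        char *data = NULL;

        /* One pointer, valid on both host and device. */
        cudaMallocManaged((void **)&data, bytes, cudaMemAttachGlobal);

        for (size_t i = 0; i < bytes; i++) data[i] = 1;  /* first touch on the host */

        /* Hint that the range will mostly be read by device 0, then migrate it there. */
        cudaMemAdvise(data, bytes, cudaMemAdviseSetReadMostly, 0);
        cudaMemPrefetchAsync(data, bytes, 0, 0 /* default stream */);

        /* ... launch kernels reading 'data' here ... */

        cudaDeviceSynchronize();
        cudaFree(data);
        return 0;
    }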

Memory management features are architecture dependent; current support for x86 Linux platforms with Pascal and Volta GPU architectures includes GPU page faulting, automatic page migration (on first touch), and an extended 48-bit virtual address space. Support for passing host OS allocated (malloc) memory directly to GPU kernels, with automatic migration management, is planned based on the recently added HMM kernel features (Section 3.4.2). Alternative architectures, such as POWER9 paired with Volta GPUs, already support advanced page migration using access counters (hot pages), hardware coherency over NVLink 2, and hardware-supported address translation (ATS) for integration with system-allocated memory.

Direct access to GPU memory from other devices on the PCI bus is also possible through GPUDirect RDMA (Remote Direct Memory Access), using the CUDA driver and kernel APIs. Memory ranges must be registered, to ensure correct synchronisation, via the driver API functions below (a minimal registration sketch follows the list):

cuPointerSetAttribute Set an attribute of a device pointer; can be used to enable safer synchronised behaviour on an address range, allowing concurrent RDMA operations.

cuPointerGetAttribute Get attribute of device pointer.
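A minimal registration sketch, assuming a single GPU (device 0) and omitting error checking, allocates device memory with the driver API and marks it for synchronised memory operations before it is handed to a third-party driver:

    #include <cuda.h>    /* CUDA driver API */

    int main(void)
    {
        CUdevice dev; CUcontext ctx; CUdeviceptr dptr;
        const size_t bytes = 1 << 20;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        cuMemAlloc(&dptr, bytes);

        /* Force synchronous memory operations on this allocation so that
           concurrent RDMA transfers observe consistent data. */
        unsigned int flag = 1;
        cuPointerSetAttribute(&flag, CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, dptr);

        /* The range starting at dptr can now be pinned by a third-party
           kernel driver via nvidia_p2p_get_pages (see below). */

        cuMemFree(dptr);
        cuCtxDestroy(ctx);
        return 0;
    }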

Whilst the following process may be simplified by upcoming HMM integration, currently memory must then be pinned/unpinned for RDMA access via kernel API functions:

nvidia_p2p_get_pages Get the physical pages underlying a range of GPU virtual memory for access by a third-party device.

nvidia_p2p_put_pages Release a set of pages previously made accessible to a third-party device.

nvidia_p2p_free_page_table Free a third-party P2P page table; meant to be invoked during the execution of the nvidia_p2p_get_pages callback.

nvidia_p2p_dma_map_pages Make physical pages retrieved with nvidia_p2p_get_pages accessible to a third-party device.

nvidia_p2p_dma_unmap_pages Unmap pages that were previously mapped with nvidia_p2p_dma_map_pages.

nvidia_p2p_free_dma_mapping Free DMA-mapped pages; meant to be invoked during the execution of the nvidia_p2p_get_pages callback.

Full details may be found in the up-to-date CUDA Toolkit Documentation [41].

3.5.2 ROCm

ROCm is the compute platform for AMD Radeon GPUs, consisting of a kernel driver and a Heterogeneous System Architecture (HSA) enabled runtime (see Section 3.5.3). Interfaces to ROCm-supported GPUs include Heterogeneous Compute (HC) C++, a dialect of C++ with extensions for accelerators supported by the Heterogeneous Compute Compiler (HCC). ROCm also supports the OpenCL parallel programming model, which will be described in Section 3.5.5, and the memory management routines described therein. Furthermore, an automatic tool is included to convert CUDA source to an intermediate C++ dialect, HIP, the Heterogeneous-compute Interface for Portability, which supports a subset of the CUDA memory management routines. The ROCm runtime implements memory management for each of these languages through the HSA API discussed in Section 3.5.3.

RDMA is also supported through the ROCK kernel driver, using an amd_rdma interface that can be queried via amdkfd_query_rdma_interface. This provides the RDMA functions:

get_pages Make the physical pages underlying a range of GPU virtual memory accessible to a third-party device.

put_pages Release a set of pages previously made accessible to a third-party device.

is_gpu_address Check if a page range belongs to the GPU address space.

get_page_size Check the size of a single GPU page.

Full detail is provided in the ROCm Documentation [42].

3.5.3 HSA

The Heterogeneous System Architecture [43] is a set of specifications defining a virtual instruction set intermediate layer (HSAIL), memory model, task dispatcher, and runtime aimed at coherent integration of heterogeneous hardware systems. Whilst not in widespread use today, HSA provides a runtime API that is implemented by the ROCm driver for AMD GPUs (Section 3.5.2), which may potentially feature in some pre-Exascale and Exascale HPC systems. Compilers, such as HCC, may generate HSAIL for high level languages with extensions for parallelism such as HC C++, which can either be JIT-compiled into platform-specific code by an HSA finalizer at runtime, or offline-compiled at application build time.

The memory abstraction of HSA provides shared virtual addressing for all hosts and agents (devices) visible to the runtime. Memory is split into segments at the system architecture level, such as globally accessible, private, read-only and more (as defined in the System Architecture Specification [44]). At the programming API level, memory blocks are split into regions, with associated sizes, segment associations, and allocation characteristics. The runtime allocation and movement API is provided as follows:

hsa_memory_allocate Allocate a block of memory in a given region.

hsa_memory_free Free a block of memory allocated with hsa_memory_allocate.

hsa_memory_copy Copy a memory block from src to dst.

hsa_memory_copy_multiple Copy a memory block from src to multiple dsts.

hsa_memory_assign_agent Change ownership of a memory buffer.

hsa_memory_register Register a memory buffer for access by a kernel agent.

hsa_memory_deregister Deregister a memory buffer registered with hsa_memory_register.

Further API functions are provided for interacting with memory regions. A user-level application may not need to interact directly with the HSA memory API, as it is intended to abstract specific vendor driver APIs below higher level approaches such as C++, OpenCL, Java, and domain specific languages (DSLs). Direct interaction may nevertheless be necessary when working with vendor drivers that implement the HSA runtime API for memory management, such as ROCm.

3.5.4 OpenMP

OpenMP (Open Multi-Processing) is a popular specification for shared memory multi-threaded parallelism, originally built on the fork-join execution model using directives and library calls. As of the 4.0 specification, OpenMP includes extensions to support heterogeneous environments, such as a set of directives to offload work to generic devices (GPU, FPGA, etc.). Each device has a device environment defined by an implicit target data region. Through target directives, variables can be associated to, and mapped between, specific environments, enabling migration of data between device environments.

The most recent OpenMP specification (5.0) aims to provide a portable interface for memory placement. This interface is based on the concepts of memory spaces, traits, and allocators.

A memory space represents a storage resource with a specific set of traits; the predefined list of spaces is as follows (as defined by the specification):

omp_default_mem_space Represents the system default storage.

omp_large_cap_mem_space Represents storage with large capacity.

omp_const_mem_space Represents storage optimized for variables with constant values.

omp_high_bw_mem_space Represents storage with high bandwidth.

omp_low_lat_mem_space Represents storage with low latency.

An allocator is associated with a memory space, from which memory is allocated according to a set of user-specified traits defining requirements such as alignment, threaded access, and pinning behaviour. Default allocators are defined for each of the predefined memory spaces, and for the predefined threaded access models (cgroup, pteam, thread). The available memory spaces are implementation defined, and API functions are provided to initialise an allocator and allocate memory in a specific memory space; a minimal sketch is given below. Directives are provided to specify allocator choice for automatic allocation in the context of a parallel block.
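The following sketch of the OpenMP 5.0 allocator API assumes a compiler and runtime implementing the 5.0 interface; whether omp_high_bw_mem_space maps onto real high bandwidth memory is implementation defined, and the fallback trait here returns default memory otherwise:

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        /* Traits: 64-byte alignment, pinned pages, fall back to the
           default memory space if the request cannot be satisfied. */
        omp_alloctrait_t traits[] = {
            { omp_atk_alignment, 64 },
            { omp_atk_pinned,    omp_atv_true },
            { omp_atk_fallback,  omp_atv_default_mem_fb }
        };

        omp_allocator_handle_t hbw =
            omp_init_allocator(omp_high_bw_mem_space, 3, traits);

        double *x = omp_alloc(1024 * sizeof(double), hbw);

        #pragma omp parallel for
        for (int i = 0; i < 1024; i++)
            x[i] = 2.0 * i;

        printf("x[10] = %f\n", x[10]);

        omp_free(x, hbw);
        omp_destroy_allocator(hbw);
        return 0;
    }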

3.5.5 OpenCL

OpenCL (Open Computing Language) is a parallel programming model for heterogeneous environments employing a wide range of hardware, including GPUs, FPGAs and DSPs. Programs are written in OpenCL C, a language based on C99, and an API is provided for interacting with devices and the runtime. More recently, OpenCL 2.2 introduces a C++ kernel language utilising a static subset of C++14 language features.

The OpenCL memory model is divided into host memory and device memory. Device memory is further split into four non-overlapping address regions: global, constant, local, and private. These logical regions are non-overlapping segments of a larger general address space, and are defined by the access characteristics for levels of the OpenCL platform model; for example, global is read-writeable from any device in the context, whilst private is local to a single work item (kernel iteration). Data buffers are stored in memory objects and may be moved between host, device, and specific regions using a series of commands submitted to an OpenCL command queue, as defined by the specification [45]:

read/write/fill The data associated with a memory object is explicitly read and writ- ten between the host and global memory regions using commands enqueued to an OpenCL command queue.

map/unmap Data from a memory object is mapped into a contiguous block of memory accessed through a host accessible pointer.

copy The data associated with a memory object is copied between two buffers, each of which may reside either on the host or on the device.

As of OpenCL 2.0, a shared virtual memory (SVM) address space is supported, with memory movement based on user triggered synchronisation points. SVM allocations can be made using the clSVMAlloc API call, and various granularities of synchronisation can be enforced.
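The explicit buffer commands above can be illustrated with a minimal host-side sketch (default platform and device, error checking omitted; clCreateCommandQueueWithProperties requires an OpenCL 2.0 implementation):

    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platform;
        cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
        cl_command_queue q =
            clCreateCommandQueueWithProperties(ctx, device, NULL, NULL);

        float host_data[256] = { 0 };

        /* Memory object backing a buffer in the global memory region. */
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                    sizeof(host_data), NULL, NULL);

        /* Explicit, blocking write and read between host memory and the buffer. */
        clEnqueueWriteBuffer(q, buf, CL_TRUE, 0, sizeof(host_data),
                             host_data, 0, NULL, NULL);
        /* ... enqueue kernels operating on 'buf' here ... */
        clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(host_data),
                            host_data, 0, NULL, NULL);

        clReleaseMemObject(buf);
        clReleaseCommandQueue(q);
        clReleaseContext(ctx);
        return 0;
    }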

3.5.6 OpenACC

OpenACC (Open Accelerators) is a set of directive-based language extensions to parallelise computational kernels in C, C++, or Fortran code, targeting both accelerator devices and CPUs. The memory model of OpenACC is similar to OpenMP; memories are exposed via a device data environment, and data directives are used to define memory behaviours. Movement can be implicitly handled by an OpenACC-capable compiler, auto-generating the appropriate device-specific memory handling code. Finer-grained control is supported through directives to signal expected behaviour for memory migration between environments, and to create mappings between variables. The data clauses defining the behaviour of specified variables are as follows:

copy Allocate device space for specified variables, copy in (to device) and out (to host) for the region duration, and release device space.

copyin Allocate and copy in, release without copy out.

copyout Allocate but do not initialise with copy in, copy out and release.

create Allocate space, no copies in or out.

present Specified variables already present on device, no movement necessary.

deviceptr Specified variables on device but not managed by OpenACC; address translation is disabled.

Further clauses exist for behaviours such as conditional copies and synchronisation, and additional syntax provides further context, such as specifying array shapes. For recent NVIDIA GPUs, OpenACC (depending on implementation) may also support the CUDA unified memory model (see Section 3.5.1), providing automatic page-based data migration. Some limitations currently apply; however, further releases of the PGI compilers are expected to support the full range of unified memory features (including the proposed x86 HMM support discussed in Section 3.5.1) [46].
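The data clauses listed above can be combined as in the following minimal sketch (array size illustrative); an OpenACC-capable compiler generates the allocation, copy-in and copy-out code around the data region:

    #include <stdio.h>
    #define N 1024

    int main(void)
    {
        float a[N], b[N];
        for (int i = 0; i < N; i++) a[i] = (float)i;

        /* copyin: a is copied to the device on entry and released on exit;
           copyout: b is allocated on the device and copied back on exit. */
        #pragma acc data copyin(a[0:N]) copyout(b[0:N])
        {
            #pragma acc parallel loop
            for (int i = 0; i < N; i++)
                b[i] = 2.0f * a[i];
        }

        printf("b[10] = %f\n", b[10]);
        return 0;
    }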

3.5.7 Memkind

The memkind library [47] provides alternatives to the ISO C standard interface for memory management, extending the interface to support different memory kinds. Memkind aims to be a user-extensible heap manager that exposes the memory management features in the Linux kernel (e.g. as outlined in Sections 3.3 and 3.4) to support heterogeneous memory systems, building on top of the existing heap manager jemalloc8.

8http://jemalloc.net/

The jemalloc memory model partitions memory into arenas, primarily to reduce lock contention; threads are distributed amongst arenas in a round-robin fashion, and allocations are served from the assigned arena. The memkind library exploits this abstraction, and the extension interface of jemalloc, to create arenas with specific memory properties. At allocation time, arenas with specific properties can be selected through a series of flags passed to the memkind API, which mirrors the ISO C standard allocation routines with additional parameters and a memkind prefix.

Default kinds are provided, such as MEMKIND_HBW and MEMKIND_HUGETLB, targeting high bandwidth memory and the Linux hugetlbfs. An API is also defined for users to create their own kinds, providing a decorator interface for mixed allocations. As part of the memkind library, hbwmalloc is provided as an interface explicitly targeting high bandwidth memories, such as Intel Xeon Phi MCDRAM; AutoHBW is further included to automatically replace regular allocations with HBM allocations.
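A minimal sketch of the memkind and hbwmalloc interfaces is shown below (link with -lmemkind); MEMKIND_HBW_PREFERRED is used here so that, where the library allows it, the allocation can fall back to ordinary DRAM on systems without high bandwidth memory:

    #include <memkind.h>
    #include <hbwmalloc.h>
    #include <stdio.h>

    int main(void)
    {
        const size_t n = 1 << 20;

        /* memkind interface: select the kind explicitly at allocation time. */
        double *x = memkind_malloc(MEMKIND_HBW_PREFERRED, n * sizeof(double));
        if (x == NULL) { perror("memkind_malloc"); return 1; }
        x[0] = 42.0;
        memkind_free(MEMKIND_HBW_PREFERRED, x);

        /* hbwmalloc interface: targets high bandwidth memory directly. */
        if (hbw_check_available() == 0) {
            double *y = hbw_malloc(n * sizeof(double));
            y[0] = 42.0;
            hbw_free(y);
        }
        return 0;
    }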

3.5.8 MPI for Memory Access

The Message Passing Interface (MPI) [48] is the de facto standard message-passing API. This standard is pervasive in High Performance Computing, so any capability offered that is relevant for allocation of or access to memory is of interest. There are two specific capabilities that are worth mentioning in the context of this document:

• Memory allocation

• RMA Window allocation

MPI provides its own memory allocation routine, MPI_Alloc_mem, which returns memory of the requested size taking account of an implementation-defined info argument. This mechanism could be used to return memory with specific capabilities. Recent work has aimed to support heterogeneous memory allocation transparently via arenas and MPI_Info arguments to MPI_Alloc_mem [49]. MPI also provides the capability to expose memory in a window so that it is accessible to other processes via RMA operations. This mechanism can be used to provide an MPI interface to shared memory on a node using MPI_Win_allocate_shared. In addition, one could augment an MPI implementation to provide access to memory regions using MPI windows.
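The two capabilities can be sketched as follows; the info key bind_to_device is purely illustrative (info keys accepted by MPI_Alloc_mem are implementation defined), and error checking is omitted:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Memory allocation with an implementation-defined info hint. */
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "bind_to_device", "hbm");   /* illustrative key only */

        double *buf;
        MPI_Alloc_mem(1024 * sizeof(double), info, &buf);
        buf[0] = (double)rank;
        MPI_Free_mem(buf);
        MPI_Info_free(&info);

        /* Shared-memory window: ranks on the same node can load/store
           each other's segments directly. */
        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);

        double *base;
        MPI_Win win;
        MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                                MPI_INFO_NULL, node_comm, &base, &win);
        base[0] = (double)rank;        /* write into the local segment */
        MPI_Win_free(&win);

        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }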

3.5.9 Higher Level Programming Frameworks

Higher level programming frameworks build on the previously discussed libraries and APIs to present more general abstractions for heterogeneous programming, and typically include memory abstractions or interfaces that may be relevant in the context of EPiGRAM-HS.

As an example, Kokkos [50] is a C++ library-based programming model focusing on achieving performance portability on heterogeneous systems by abstracting both the execution and memory models of a user code. Execution spaces encapsulate the locations where code can be executed, such as CPUs or GPUs, with accompanying patterns and policies to further describe the execution structure and scheduling. Data structures are defined in terms of memory spaces, with associated layouts and traits. A memory space defines where the data should reside, such as HBM, DDR, or non-volatile memory. The layout defines the in-memory data organisation, such as row- vs. column-major, tiled, or strided array ordering. Traits define how the memory is accessed, such as random, stream, or atomic. To target specific hardware devices and memory tiers, Kokkos provides a variety of platform-specific back ends, which are utilised based on the chosen execution space. Current parallel models with varying levels of support as back ends are OpenMP, CUDA, pthreads, ROCm, and Qthreads. Memory management in the Kokkos programming model is addressed via a multidimensional array abstraction. A Kokkos array implements custom access operators utilising a defined mapping between the user's logical array indices and the physical data layout in memory. Memory allocation and movement is performed transparently behind this array abstraction, via one of the supported back ends.

Another emerging programming framework that makes data movement the primary concern is Legion [51], which enables full-system data-awareness and improved productivity. The Legion authors describe their central philosophy as follows:

...the cost of data movement within these [modern computer] architectures is now coming to dominate the overall cost of computation, both in terms of power and performance. Despite these conditions, most machines are still pro- grammed using an eclectic mix of programming systems that focus only on de- scribing parallelism (MPI, Pthreads, OpenMP, OpenCL, OpenACC, CUDA). Achieving high performance and power efficiency on future architectures will require programming systems capable of reasoning about the structure of pro- gram data to facilitate efficient placement and movement of data.

A key design feature of Legion is the separation of concerns amongst its three main abstractions:

• Tasks (execution model): Describe parallel execution elements and algorithmic operations, with sequential semantics and out-of-order execution.

• Regions (data model): Describe the decomposition of the computational domain, with privileges (read-write, read-only, reduce) and coherence modes (exclusive, atomic).

• Mapper: Describes how tasks and regions should be mapped to the target architecture.

Legion is being actively developed and pursued seriously by a number of Exascale activities, e.g. [52]. Furthermore, and unlike Kokkos, it is not restricted to C++ codes. As such, Legion is something that the EPiGRAM-HS project should follow carefully.

Another abstraction that has become prominent in recent years is SYCL [53, 54]. It provides a 'single-source' approach to programming heterogeneous processors (and a host) using the latest C++ features and, as of July 2018 (SYCL 1.2.1.r3), is a pure C++ DSL based on OpenCL 1.2. It was originally introduced by Codeplay, with the standardisation effort undertaken by the Khronos Group. SYCL provides a high level programming model using C++ templates and lambda functions, and provides optimisation across a range of OpenCL implementations. A SYCL program defines device kernels through a parallel_for function; these can be queued for execution and may access data defined in a data buffer. SYCL is designed to support source compilation by multiple compilers so that host and device code may be separately compiled. SYCL kernels are despatched in SPMD fashion across the PEs of the targeted OpenCL device. A parallel implementation of the C++ standard template library (STL) is currently under development by Khronos, and there is growing support in existing libraries, such as the SYCL support in Tensorflow implemented by Codeplay [55].

3.6 Off-Node Access to Memory

Beyond on-node interfaces to memory, this section briefly overviews the options for remote memory access and movement, considering RDMA (Section 3.6.1), intermediate storage layers exploiting burst buffers and object stores (Sections 3.6.2, 3.6.3), and finally higher level abstractions for data brokering and workflow services (Section 3.6.4).

3.6.1 RDMA

One of the most efficient mechanisms for off-node data access is via Remote Direct Memory Access (RDMA). RDMA allows direct transfer of data without involving the CPU or, crucially, the OS at either end, and is likely to offer the best efficiency and latency possible. Networks that support RDMA transfers have been popular in HPC systems for many years, for example SCI, Myrinet, Elan, Quadrics, Infiniband and Gemini/Aries.

On Linux the OpenFabrics Enterprise Distribution (OFED) stack has become the standard stack used in Infiniband networks, perhaps the most prevalent of network technologies in current use. Although initially Infiniband-specific low-level APIs were used, a more portable low-level library called Libfabric is available as the main entry point for end-user software that needs to use the fabric.

Described here are popular message-passing and PGAS implementations that can take advantage of RDMA network hardware, although it should be noted that higher level frameworks such as DataSpaces (Section 3.6.4) may also have custom network transport layers capable of exploiting RDMA.

• MPI: DPM / Windows The Message Passing Interface is the de facto API used for message passing. It provides a single-sided API to communicate data between processes by means of a programmer-allocated window which can then be the target of put and get opera- tions. Typical MPI implementations implement these operations using RDMA.

• GASPI, SHMEM, Fortran, UPC etc. There are a range of Partitioned Global Address Space (PGAS) languages that provide at a minimum put and get operations to remote data. The earliest of these is SHMEM and more recently GASPI, UPC and Fortran coarrays.

3.6.2 Burst Buffers

A burst buffer is typically used as an intermediate storage layer between compute nodes and storage systems, providing a high bandwidth intermediary for I/O acceleration based on memory technologies such as NVRAM and SSD.

Burst buffers are designed to address bursty I/O patterns [56], resulting from applications switching between compute and I/O dominant execution phases. The burst buffer provides a high peak bandwidth to allow I/O dominant phases to quickly complete without waiting for data to reach the parallel filesystem (PFS), reducing load and contention on the PFS. Two popular burst buffer solutions are described here:

• Cray DataWarp Cray DataWarp [57] exploits SSDs as directly attached node storage to decouple application I/O from the PFS for Cray XC supercomputers. Usage is typically enabled at job submission time through the workload manager; DataWarp is then exploited via regular POSIX I/O API calls, with a C library also available for manual utilisation. DataWarp can be used, for example, as application cache memory, scratch storage, data sharing, or as swap space.

• Infinite Memory Engine DataDirect Networks' (DDN) Infinite Memory Engine (IME) product provides platform-independent burst buffer features exploiting SSD-based burst buffers. IME can be purchased as a hardware-supported burst buffer or a software-only solution added to existing systems, and supports similar use cases to DataWarp.

The typical expectation is that applications should not require modification to exploit burst buffer systems, instead benefiting from transparent usage behind POSIX I/O calls. However, advanced burst buffer usage is likely to require some modification; for example, data migration between burst buffers and a PFS may be explicitly specified during a checkpoint-restart scenario, or for data sharing during in-transit visualisation. A variety of middlewares are available to abstract specific vendor APIs and simplify the use of tiered storage, for example LibHIO for hierarchical I/O [58], SCR for scalable checkpoint-restart [59], and coordination frameworks for scalable burst buffer utilisation [60].

3.6.3 Object Store

An object storage system is an architecture which manages data as objects instead of files [61]. Due to their scalability, object stores are widely adopted in cloud-based systems. Object store operations are stateless, and in object store semantics there are only two basic operations: GET and PUT. A PUT operation returns an ID which uniquely represents the object. Object store implementations usually provide a facility to map an assigned name to an ID, together with metadata which describes the object. This is often implemented with a key-value store. All objects are stored without structure, and clients communicate directly with the storage node where the data physically resides; the node is located by hashing the object's ID, without requiring a location lookup [62]. Objects are immutable and it is impossible to concurrently create or update the same object. This eliminates the bottleneck caused by locking. In contrast to POSIX I/O, object stores support a weak form of consistency: eventual consistency. This means that a successfully returned PUT operation does not necessarily mean that the object will be visible immediately. Deterministic placement of objects through ID hashing thus eliminates the lookup bottleneck [63].

• Mero is the core object store of the SAGE storage system, with the aim of providing Exascale-capable object storage. Mero consists of a cluster of nodes to which persistent storage is attached. A feature of Mero is its support for different tiers of storage technology, such as NVMe SSDs, NAND flash SSDs and hard disks. Apart from that, it features in-storage compute, where certain computations can be offloaded to the storage system. Operations in Mero are transaction based. A key-value store is also provided.

• DAOS is another object storage system that targets Exascale I/O capability. Similar to Mero, DAOS supports the use of multi-tier storage technology. The entire software stack consists of top level APIs, caching and tiering, sharding and finally persistent memory. Operations are transaction based.

• Ceph is one of the most commonly known object storage systems, providing interfaces for each of object-, block- and file-based access and a set of POSIX I/O extensions which provide relaxed consistency. Unlike Lustre, any party can compute the physical location of an object by hashing its ID. For this reason, location metadata is completely eliminated. This reduces the stress on the metadata cluster. Additionally, it is possible to manipulate the underlying object store directly through librados [64].

3.6.4 Data Brokering and Workflow Services

Memory interfacing and movement is an intrinsic component of workflow management, enabling the output of one workflow task to become the input of another. Built on top of the existing memory interfaces described in earlier sections, there also exist a series of data brokerage services to support movement of memory on and off node in contexts such as workflows, data staging, I/O acceleration and application coupling. The remainder of this section provides a brief representative description of such services.

• DataSpaces DataSpaces [65] provides a shared virtual memory space for the purpose of coupling applications, for example in mixed simulation and analysis workflows. The DataSpaces software architecture is split into three layers: the Data Storage Layer, Communication Layer, and Data Lookup Layer. The Data Storage Layer allocates and manages memory buffers which are hosted on intermediate staging nodes, and data is moved between applications and memory spaces via the Communication Layer exploiting the DART transport library [66]. The Data Lookup Layer further provides high level indexing services via a distributed hash table. Included in the DataSpaces project is DIMES [67], a similar data staging library with the ability to store data in memory on application nodes as opposed to requiring additional data storage nodes.

• DataBroker The IBM DataBroker (DBR, https://github.com/IBM/data-broker) is an in-memory distributed data store for coupling workflow applications. Data is stored in key-value form (or as tuples), within namespaces that provide a sharing mechanism across applications. Memory storage and movement is enabled by a generic back-end runtime interface, the default of which is the Redis (https://redis.io/) in-memory data store, with interfaces to GASNet [68] and IBM runtimes.

• ADIOS The Adaptable IO System (ADIOS) [69] is a framework for parallel I/O in scientific applications. The ADIOS API can be used to insert generic I/O hooks in an application code, with specific I/O behaviour set at application start-up through an XML (eXtensible Markup Language) configuration file. ADIOS relies on a series of underlying transport methods to move data between memory tiers and filesystems, including POSIX I/O, MPI-IO, the Lustre filesystem API, DataSpaces, DIMES, FlexPath [70], HDF5, NetCDF4, ICEE [71], and more. Users may perform simple I/O, data staging, or direct transport for application coupling, with support for aggregated writes.

• GLEAN GLEAN [72] is a framework for data staging and parallel I/O to support analysis in an in-situ or coupled context. GLEAN may be invoked explicitly through its API or transparently via its embedding within popular I/O libraries such as NetCDF and HDF5. Staging nodes are exploited for I/O aggregation, and topology-aware memory movement is performed via MPI.

4 Discussion

This section discusses how the information presented in this document may influence the EPiGRAM-HS project. Figure 14 aims to bring together the wide variety of hardware and software technologies discussed in this report, presenting a simplified view of the relation of Exascale memory candidate technologies to the existing memory hierarchy and software ecosystem. Figure 15 ranks these technologies in terms of relevance to the EPiGRAM-HS project. The final contribution of this document is a series of observations that are intended to guide the EPiGRAM-HS project and in particular the Memory work package.

Observation 1: A memory-focused approach to performance portability would be highly impactful, and is possible (though challenging) under current conditions.

Exascale computing and the current technological climate have made architectures increasingly complex and highly susceptible to further change in the future.



Figure 14: A visual representation of the hardware memory technologies as they fit into the memory and storage hierarchy, as discussed in Section 2, coupled with the software and hardware interfaces discussed in Section 3.

Figure 15 groups the surveyed technologies into three tiers of relevance: low interest for WP3 (HDD, Gen-Z/CCIX/OpenCAPI, tmpfs, PCIe, object stores, NVMe, HSA); be aware of / consider in design (data brokers, XPMEM, OpenACC, SSD, RDMA, memkind, SYCL, Kokkos, Legion, MPI); and focus on / design for WP3 (DRAM, NVDIMM, OpenCL, ROCm, CUDA, OpenMP, SRAM, PMDK, HMM, HBM).

Figure 15: Relevance of Technologies for EPiGRAM-HS Memory Work Package

As such, future-proofing of applications and software (or performance portability) remains one of the most critical goals. Performance portability across heterogeneous hardware is a challenging technical goal and is usually concerned primarily with accessing the right type of processing device (as well as the associated data copies needed to execute on accelerators, for example). EPiGRAM-HS WP3 should view performance portability through the lens of data movement and efficient usage of complex memories. This document has shown that although upcoming systems are complex, that complexity is finite and manageable, and that software frameworks exist at all levels that can be leveraged to provide such portability. This document does not attempt to define such a solution, but it is clear that this would require building on multiple existing or custom packages across programming frameworks, operating system tools, runtimes, and low-level APIs.

Observation 2: Efficient usage of the upper memory hierarchy consisting of SRAM, DRAM and HBM will be vital.

DRAM remains highly relevant, but pricing and unimpressive DDR evolution mean that HBM will be increasingly important, not only as device memory but also as cache and as main memory. Each of these usage models has a different set of requirements, and most applications will eventually encounter all three models. As such, the relevance of the previous observation applies directly here.

It is rare for software to be optimized for memory access via cache except in some performance-critical contexts like numerical libraries or some important application kernels. Cache hardware generally does a good job, and doing better is very hard. Compiler developers implement optimizations for cache, but there is a limit to what can be done, particularly without full knowledge of loop extents and data sizes/relationships at runtime. An additional complexity comes from the advent of programmable cache. For the programmer / application scientist this is bad news, since the burden of getting good performance out of this memory now rests with them. But there is scope to do significantly better than associative algorithms, which have no awareness of the application at all. Performance portability across SRAM implementations is a key concern of EPiGRAM-HS.

Observation 3: Increased diversity in the memory hierarchy will require new software support and increased levels of abstraction.

On-node and network-attached flash are becoming very important; as discussed in Sections 2.4.4 and 2.4.5, SSDs are likely to completely replace HDDs for Exascale storage by 2025 (or at the very least, hybrid systems will be standard). Where flash is a drop-in replacement for spinning disks, these devices will be accessed primarily via a parallel filesystem such as Lustre. Features within Lustre for caching and tiering are under development and lie outside the scope of EPiGRAM-HS. Where the devices are accessed directly, either locally or with domain-specific libraries (e.g. burst buffers), they are in scope for the project. Furthermore, efficient usage of the latter type will be a high priority for a small number of applications, including potentially some within the project.

The general availability of Intel's Optane DC Persistent Memory and the emergence of other specifications such as NVDIMM-P will create interesting performance/capacity considerations that deviate from the traditional model and present opportunities for experimentation. While the price remains similar to DRAM, we are not likely to see these products take a large role, but in future we may see terabytes of persistent memory per node, presenting opportunities for staging. The programmability of such devices remains an open question, and the project should attempt to clarify what is available beyond PMDK.

Observation 4: Useful technologies exist for memory modelling and abstraction that could support a heterogeneous memory abstraction framework.

Kernel support for heterogeneous memory is maturing, as discussed in Section 3.4. This is important for unified memory views and allows driver-based automatic page migration; however, there are some potential drawbacks to relying on OS and driver-based data movement across heterogeneous memories. Despite these limitations, it will be important for any abstraction layer to exploit such OS services where possible, as it will likely be more efficient to do so compared to higher-abstraction software.

Vendor-specific APIs such as CUDA and ROCm are important for optimal hardware usage; however, ultimately performance portable code should rely on generic heterogeneous libraries and programming frameworks. A memory abstraction layer may have to explicitly implement memory management with vendor APIs, e.g. CUDA code on NVIDIA GPUs. However, it will also be necessary to integrate user code written for vendor-specific APIs, i.e. abstracting vendor-API data types while retaining the ability to interact with them and return them to the user if required.

The memory models of pervasive HPC parallel programming frameworks, such as MPI and OpenMP, are maturing to consider heterogeneity. Any library for memory abstraction or movement will inevitably need to integrate with such frameworks, and this should be strongly considered at the design stage. However, for non-local data, internal communications may make better use of RDMA via libfabric or a PGAS language. Such considerations are not of primary concern; the developed toolset should interact well with standard libraries but use the most efficient option internally.

Of the generic heterogeneous programming frameworks, OpenCL appears to be the most widely implemented framework with memory management routines. OpenCL has some level of support on Intel/Arm CPUs, NVIDIA GPUs, AMD GPUs, Intel/Altera FPGAs (Stratix), and Xilinx FPGAs. There has not been a strong uptake in terms of application development, but OpenCL should be considered as one of the most hardware-agnostic programming frameworks.

Memory abstraction (or data movement abstraction between host and device) has been performed well in the Kokkos library. This is an example of how details can be hidden from the user. However, Kokkos' reliance on advanced C++ features such as lambdas means that the abstraction comes at a price. Furthermore, a large number of HPC applications are written in Fortran or C, and do not interface simply to C++ frameworks. Legion has been used in non-C++ applications to similar effect (though with a different emphasis). Recreating Kokkos-style abstractions in languages such as Fortran will be extremely challenging, and alternative approaches including Legion should be considered. Nonetheless, the success of Kokkos should be an inspiration to the project, which itself should have the lofty goal of matching the applicability of the Kokkos abstractions, but in a more general manner.

In this report we have reviewed the relevant technologies that will be required to allocate, place and move data on and between locations in a heterogeneous Exascale system. These operations are fundamental to our design of an abstract memory device in the EPiGRAM-HS project.

References

[1] John L Hennessy and David A Patterson. Computer architecture: a quantitative approach. Elsevier, 6th edition, 2017.

[2] B. Sorensen. Exascale update. https://insidehpc.com/2018/09/exascale-update-hyperion-research/, 2018. Online; accessed January 9, 2019.

[3] T. Trader. Requiem for a phi: Knights landing discontinued. https://www. hpcwire.com/2018/07/25/end-of-the-road-for-knights-landing-phi/, 2018. Online; accessed January 9, 2019.

[4] T. Morgan. Argonne hints at future architecture of Aurora exascale system. https://www.nextplatform.com/2018/03/19/ argonne-hints-at-future-architecture-of-aurora-exascale-system/, 2018. Online; accessed January 9, 2019.

[5] T. Morgan. Intels exascale dataflow engine drops x86 and von Neumann. https://www.nextplatform.com/2018/08/30/ intels-exascale-dataflow-engine-drops-x86-and-von-neuman/, 2018. Online; accessed January 9, 2019.

[6] M. Feldman. Sandia to install first petascale supercomputer powered by arm processors. https://www.top500.org/news/sandia-to-install- first-petascale-supercomputer-powered-by-arm-processors/, 2018. Online; accessed January 9, 2019.

[7] E. Kelly. Eu launches e1B project to build worlds fastest supercomputer. https://sciencebusiness.net/news/ eu-launches-eu1b-project-build-worlds-fastest-supercomputer, 2018. Online; accessed January 9, 2019.

[8] M. Valero. European processor initiative & risc-v. https://content.riscv.org/wp-content/uploads/2018/05/11.15-11. 45-EXASCALE-RISC-V-Mateo.Valero-9-5-2018-1.pdf, 2018. Online; accessed January 23, 2019.

[9] T. Morgan. Fujitsus a64fx arm chip waves the hpc banner high. https://www.nextplatform.com/2018/08/24/ fujitsus-a64fx-arm-chip-waves-the-hpc-banner-high/, 2018. Online; accessed January 23, 2019.

[10] S. Knowles. How to build a processor for artificial intelligence (part 2). https://www.graphcore.ai/posts/how-to-build-a-processor-for-machine-intelligence-part-2, 2017. Online; accessed January 9, 2019.

[11] M. Sanders. DDR5 RAM expected in 2019 but the figures are hardly impressive. https://www.eteknix.com/ddr5-ram-expected-2019/, 2017. Online; accessed January 9, 2019.

[12] B. Gervasi and J. Hinkle. Overcoming system memory challenges with persistent memory and nvdimm-p. In JEDEC Server Forum, 2017.

[13] Objective Analysis. Using flash as memory. https://objective-analysis.com/uploads/2013-07-30_Objective_Analysis_ White_Paper_-_Using_Flash_as_Memory.pdf, 2013. Online; accessed Jan 10, 2018.

[14] Objective Analysis. Why wait for storage class memory? https://objective-analysis.com/uploads/2016-08-15_Objective_Anlaysis_ Tech_Brief_for_Netlist.pdf, 2016. Online; accessed Jan 10, 2018.

[15] R. Crooke and M. Durcan. 3D XPoint: A breakthrough in non-volatile memory technology. https://www.intel.co.uk/content/www/uk/en/ architecture-and-technology/intel-micron-3d-xpoint-webcast.html, 2015. Online; accessed January 9, 2019.

[16] K. Lim, Y. Turner, J. R. Santos, A. AuYoung, J. Chang, P. Ranganathan, and T. F. Wenisch. System-level implications of disaggregated memory. In IEEE International Symposium on High-Performance Comp Architecture, pages 1–12, Feb 2012.

[17] DataDirect Networks. DDN annual high performance computing trends survey reveals that data storage has become the most strategic part of the HPC data center. https://www.ddn.com/press-releases/ddn-annual-high-performance-computing-trends-survey-reveals-that-data-storage-has-become-the-most-strategic-part-of-the-hpc-data-center/, 2015. Online; accessed January 30, 2019.

[18] Hyperion Research. Flash storage trends and impacts. https://www.weka.io/wp-content/uploads/2018/04/ HPE-Flash-Tech-Spotlight-Hyperion-Research.pdf, 2018. Online; accessed January 29, 2018.

[19] Torben Peterson. Flash acceleration of hpc storage - nx. https://www.cray.com/ resources/flash-acceleration-hpc-storage-nxd-performance-comparison, 2018. Online; accessed Jan 10, 2018.

[20] Mel Gorman. Understanding the Linux Virtual Memory Manager. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2004.

[21] The kernel development community. The linux kernel user's and administrator's guide: Memory management. https://www.kernel.org/doc/html/latest/admin-guide/mm/index.html, 2018. Online; accessed December 10, 2018.

[22] National Instruments. Introduction to PCI Express. http://www.ni.com/white-paper/3540/en/, 2016. Online; accessed January 3, 2019.

[23] National Instruments. PCI Express an overview of the pci express standard. http://www.ni.com/white-paper/3767/en/, 2014. Online; accessed January 3, 2019.

[24] PCI-SIG. PCI-SIG website. https://pcisig.com/, 2019. Online; accessed January 3, 2019.

[25] PCI-SIG. PCI-SIG publishes PCI Express 4.0, revision 0.9 specification. http://www.businesswire.com/news/home/20170607005319/en/PCI-SIG%C2% AE-Publishes-PCI-Express%C2%AE-4.0-Revision-0.9, 2017. Online; accessed January 3, 2019.

[26] NVM Express. NVM Express website. https://nvmexpress.org/, 2018. Online; accessed January 4, 2019.

[27] Western Digital. NVM Express website. https: //blog.westerndigital.com/nvme-important-data-driven-businesses/, 2019. Online; accessed January 3, 2019.

[28] Seagate. NVMe performance for the SSD age (technical paper). https: //www.seagate.com/files/www-content/product-content/ssd-fam/nvme-ssd/ nytro-xf1440-ssd/_shared/docs/nvme-performance-tp692-1-1610us.pdf, 2015. Online; accessed January 3, 2019.

[29] Tom Coughlin, Roger Hoyt, and Jim Hardy. Digital storage and memory technology (part 1). https://www.ieee.org/content/dam/ieee-org/ieee-web/ pdf/digital-storage-memory-technology.pdf, 2017. Online; accessed January 4, 2019.

[30] CCIX Consortium. CCIX Consortium website. https://www.ccixconsortium.com/, 2019. Online; accessed January 4, 2019.

[31] GEN-Z Consortium. GEN-Z Consortium website. https://genzconsortium.org/, 2019. Online; accessed January 4, 2019.

[32] OpenCAPI Consortium. OpenCAPI Consortium website. https://opencapi.org/, 2019. Online; accessed January 4, 2019.

[33] Benton, Brad. CCIX, GEN-Z, OpenCAPI: Overview & comparison. https://www.openfabrics.org/images/eventpresos/2017presentations/213_ CCIXGen-Z_BBenton.pdf, 2017. Online; accessed January 7, 2019.

[34] The Storage Network Industry Association. The Storage Network Industry Association website. https://www.snia.org, 2019. Online; accessed January 9, 2019.

[35] Intel Corporation. Persistent memory development kit. http://pmem.io/pmdk/, 2009. Online; accessed December 4, 2018.

[36] Stefan Lankes, Boris Bierbaum, and Thomas Bemmerl. Affinity-on-next-touch: An extension to the linux kernel for numa architectures. In Proceedings of the 8th International Conference on Parallel Processing and Applied Mathematics: Part I, PPAM’09, pages 576–585, Berlin, Heidelberg, 2010. Springer-Verlag.

[37] B. Goglin and N. Furmento. Enabling high-performance memory migration for multithreaded applications on linux. In 2009 IEEE International Symposium on Parallel Distributed Processing, pages 1–9, May 2009.

[38] Felix Xiaozhu Lin and Xu Liu. Memif: Towards programming heterogeneous memory asynchronously. SIGARCH Comput. Archit. News, 44(2):369–383, March 2016.

[39] M. Giardino, K. Doshi, and B. Ferri. Soft2LM: Application guided heterogeneous memory management. In 2016 IEEE International Conference on Networking, Architecture and Storage (NAS), pages 1–10, Aug 2016.

[40] Sean Williams, Latchesar Ionkov, and Michael Lang. NUMA distance for heterogeneous memory. In Proceedings of the Workshop on Memory Centric Programming for HPC, MCHPC’17, pages 30–34, New York, NY, USA, 2017. ACM.

[41] NVIDIA Corporation. CUDA toolkit documentation. https://docs.nvidia.com/cuda/index.html, 2018. Online; accessed December 12, 2018.

[42] AMD Corporation. ROCm documentation. https://github.com/RadeonOpenCompute/ROCm_Documentation, 2018. Online; accessed December 12, 2018.

[43] Wen-mei Hwu. Heterogeneous system architecture: A New Compute Platform Infrastructure. Morgan Kaufmann, 2016.

[44] HSA Foundation. HSA platform system architecture specification. http://www.hsafoundation.com/standards/, 2017. Online; accessed Jan 3, 2018.

[45] Khronos Group. The OpenCL 2.2 specification. https://www.khronos.org/registry/OpenCL/, 2018. Online; accessed December 6, 2018.

[46] S. Deldon, J. Beyer, and D Miles. OpenACC and CUDA unified memory. In Proceedings of the Cray User Group, 2018.

[47] Christopher Cantalupo, Vishwanath Venkatesan, Jeff R Hammond, Krzysztof Czurylo, and Simon Hammond. User extensible heap manager for heterogeneous memory platforms and mixed memory policies. Architecture document, 2015.

[48] Message Passing Interface Forum. MPI: a message-passing interface standard, version 3.1. https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf, 2015. Online; accessed January 4, 2019.

[49] Sean Williams, Latchesar Ionkov, Michael Lang, and Jason Lee. Heterogeneous memory and arena-based heap allocation. MCHPC'18: Workshop on Memory Centric High Performance Computing, 2018.

[50] H. Carter Edwards, Christian R. Trott, and Daniel Sunderland. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. Journal of Parallel and Distributed Computing, 74(12):3202–3216, 2014. Domain-Specific Languages and High-Level Frameworks for High-Performance Computing.

[51] Michael Bauer, Sean Treichler, Elliott Slaughter, and Alex Aiken. Legion: Expressing locality and independence with logical regions. In High Performance Computing, Networking, Storage and Analysis (SC), 2012 International Conference for, pages 1–11. IEEE, 2012.

[52] Sean Treichler, Michael Bauer, Ankit Bhagatwala, Giulio Borghesi, Ramanan Sankaran, Hemanth Kolla, Patrick S McCormick, Elliott Slaughter, Wonchan Lee, Alex Aiken, et al. S3D-Legion: An exascale software for direct numerical simulation of turbulent combustion with complex multicomponent chemistry. In Exascale Scientific Applications, pages 257–278. Chapman and Hall/CRC, 2017.

[53] Khronos OpenCL Working Group — SYCL subgroup. SYCL specification v 1.2.1. https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf, 2019.

[54] Ronan Keryell, Ruyman Reyes, and Lee Howes. Khronos SYCL for OpenCL. In Proceedings of the 3rd International Workshop on OpenCL - IWOCL 15. ACM Press, 2015.

[55] Mehdi Goli, Luke Iwanski, and Andrew Richards. Accelerated machine learning using TensorFlow and SYCL on OpenCL devices. In Proceedings of the 5th International Workshop on OpenCL, page 8. ACM, 2017.

[56] N. Liu, J. Cope, P. Carns, C. Carothers, R. Ross, G. Grider, A. Crume, and C. Maltzahn. On the role of burst buffers in leadership-class storage systems. In 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST), pages 1–11, April 2012.

[57] D. Henseler, B. Landsteiner, D. Petesch, C. Wright, and N. Wright. Architecture and design of Cray DataWarp. In Proceedings of the Cray User Group, 2016.

[58] N. Hjelm and C. Wright. libhio: Optimizing IO on Cray XC systems with DataWarp. In Proceedings of the Cray User Group, 2017.

[59] A. Moody, G. Bronevetsky, K. Mohror, and B. R. d. Supinski. Design, modeling, and evaluation of a scalable multi-level checkpointing system. In SC '10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–11, Nov 2010.

[60] Ziqi Fan, Fenggang Wu, Jim Diehl, David H. C. Du, and Doug Voigt. CDBB: An NVRAM-based burst buffer coordination system for parallel file systems. In Proceedings of the High Performance Computing Symposium, HPC ’18, pages 1:1–1:12, San Diego, CA, USA, 2018. Society for Computer Simulation International.

[61] Michael Factor, Kalman Meth, Dalit Naor, Ohad Rodeh, and Julian Satran. Object storage: The future building block for storage systems. In Local to Global Data Interoperability-Challenges and Technologies, 2005, pages 119–123. IEEE, 2005.

[62] Sage A Weil, Scott A Brandt, Ethan L Miller, Darrell DE Long, and Carlos Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proceedings of the 7th symposium on Operating systems design and implementation, pages 307–320. USENIX Association, 2006.

[63] S. W. Chien, Stefano Markidis, Rami Karim, Erwin Laure, and Sai Narasimhamurthy. Exploring Scientific Application Performance Using Large Scale Object Storage. arXiv preprint arXiv:1807.02562, 2018.

[64] Red Hat, Inc, and Contributors. Introduction to librados. http://docs.ceph.com/docs/master/rados/api/librados-intro/, 2018. Online; accessed December 12, 2018.

[65] Ciprian Docan, Manish Parashar, and Scott Klasky. Dataspaces: An interaction and coordination framework for coupled simulation workflows. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC ’10, pages 25–36, New York, NY, USA, 2010. ACM.

[66] Ciprian Docan, Manish Parashar, and Scott Klasky. Enabling high-speed asynchronous data extraction and transfer using dart. Concurr. Comput. : Pract. Exper., 22(9):1181–1204, June 2010.

[67] Fan Zhang. Programming and runtime support for enabling data-intensive coupled scientific simulation workflows. PhD thesis, Rutgers University, New Jersey, USA, 5 2017.

[68] Dan Bonachea and Jaein Jeong. Gasnet: A portable high-performance communication layer for global address-space languages. CS258 Parallel Computer Architecture Project, Spring, 2002.

[69] Jay F. Lofstead, Scott Klasky, Karsten Schwan, Norbert Podhorszki, and Chen Jin. Flexible IO and integration for scientific codes through the adaptable io system (adios). In Proceedings of the 6th International Workshop on Challenges of Large Applications in Distributed Environments, CLADE ’08, pages 15–24, New York, NY, USA, 2008. ACM.

[70] Jai Dayal, Drew Bratcher, Greg Eisenhauer, Karsten Schwan, Matthew Wolf, Xuechen Zhang, Hasan Abbasi, Scott Klasky, and Norbert Podhorszki. Flexpath: Type-based publish/subscribe system for large-scale science analytics. In 2014 14th

IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pages 246–255. IEEE, 2014.

[71] Jong Y Choi, Kesheng Wu, Jacky C Wu, Alex Sim, Qing G Liu, Matthew Wolf, C Chang, and Scott Klasky. Icee: Wide-area in transit data processing framework for near real-time scientific applications. In 4th SC Workshop on Petascale (Big) Data Analytics: Challenges and Opportunities in conjunction with SC13, 11, 2013.

[72] V. Vishwanath, M. Hereld, and M. E. Papka. Toward simulation-time data analysis and I/O acceleration on leadership-class systems. In 2011 IEEE Symposium on Large Data Analysis and Visualization, pages 9–14, Oct 2011.