A Case for RDMA in Clouds: Turning Supercomputer Networking into Commodity

Animesh Trivedi, Bernard Metzler, Patrick Stuedi
IBM Research, Saeumerstrasse 4, Rueschlikon, Switzerland
[email protected], [email protected], [email protected]

ABSTRACT

Modern cloud infrastructures are steadily pushing the performance of their network stacks. At the hardware level, some cloud providers have already upgraded parts of their network to 10GbE. At the same time there is a continuous effort within the cloud community to improve the networking inside the virtualization layers. The low-latency/high-throughput properties of those network interfaces are not only opening the cloud for HPC applications, they will also be well received by traditional large-scale web applications or data processing frameworks. However, as commodity networks get faster, the burden on the end hosts increases. Inefficient memory copying in socket-based networking takes up a significant fraction of the end-to-end latency and also creates serious CPU load on the machine. Years ago, the supercomputing community developed RDMA network stacks like Infiniband that offer both low end-to-end latency and a low CPU footprint. While adopting RDMA in the commodity cloud environment is difficult (costly, requires special hardware), we argue in this paper that most of the benefits of RDMA can in fact be provided in software. To demonstrate our findings we have implemented and evaluated a prototype of a software-based RDMA stack. Compared to a socket/TCP approach (with TCP receive copy offload), our results show a significant reduction in end-to-end latencies for messages larger than a modest 64kB, and a reduction of CPU load (w/o TCP receive copy offload) for better efficiency while saturating the 10Gbit/s link.

Categories and Subject Descriptors

C.2.5 [Local and Wide-Area Networks]: Ethernet; D.4.4 [Communications Management]: Network communication—RDMA, Efficient data transfer, performance measurement

General Terms

Design, Measurement, Performance

Keywords

RDMA, Commodity Clouds, Networking

1. INTRODUCTION

Applications running in today's cloud computing infrastructures are increasingly making high demands on the network layer. To support current data-intensive applications, terabytes worth of data are moved every day among the servers inside a data center, requiring high network throughput. For large-scale web applications like Facebook, low data access latencies are key to providing a responsive interactive experience to users [4]. And recently there have been cases of HPC workloads running inside scalable commodity clouds (see http://aws.amazon.com/hpc-applications/), strengthening the need for a high-performance network stack further.

To accommodate those requirements, modern cloud infrastructures are continuously pushing the performance of their network stack. Most of the work focuses on improving the network performance inside the virtualization layers [14, 21]. In parallel, some cloud providers have upgraded parts of the networking hardware to 10GbE, which recently has moved closer to a price range acceptable for commodity clouds.

While high network performance is greatly desired, it also increases the burden on end hosts. This is especially true for applications relying on standard TCP sockets as their communication abstraction. With raw link speeds of 10Gbit/s, protocol processing inside the TCP stack can consume a significant amount of CPU cycles and also increases the end-to-end latency perceived by the application. A large part of the inefficiency stems from unnecessary copying of data inside the OS kernel [3]. The increased CPU load is particularly problematic as parallelizing the network stack remains challenging [25].
High-performance networks have been state of the art in supercomputers for a long time. For instance, the Remote Direct Memory Access (RDMA) network stack provides low latency and high throughput with a small CPU footprint. This is achieved in RDMA mainly by omitting intermediate data copies and offloading the protocol processing to hardware [16]. Adopting RDMA in commodity clouds is, however, difficult due to cost and special hardware requirements. In this paper we argue that the basic concept and the semantics of RDMA can be beneficial for modern cloud infrastructures as the raw network speed increases, and, most importantly, that RDMA network support can efficiently be provided in pure software, without a need for special hardware. To demonstrate our findings we have developed SoftiWARP, a software-based RDMA stack. We have evaluated our stack against a socket-based approach and show that SoftiWARP provides high-performance network I/O with reduced CPU load and significantly lower end-to-end latencies for reasonably large message sizes.

2. REVISITING SUPERCOMPUTER NETWORKING

Supercomputers are designed to run specialized workloads and are built using customized hardware and interconnects. The type of workload running on supercomputers, e.g. MPI-based data processing, requires super-fast and efficient network I/O to shuffle data across the interconnects. At first glance, it seems apparent that the requirements faced by supercomputers in the past resemble the ones of future commodity data centers. Thus, a natural next step is to look at the network stack operated by supercomputers and see whether a similar approach could be beneficial for data centers as well.

RDMA is one networking technology used by supercomputers. RDMA enables applications to have high-bandwidth and low-latency access to data by efficiently moving data between application buffers. It is built upon the user-level networking concept and separates the data path from the control path, as only the latter requires host OS involvement. An important aspect of RDMA when compared to socket-based networking is that RDMA has rich asynchronous communication semantics, which helps in overlapping communication with computation for better resource utilization.
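To make these asynchronous, one-sided semantics concrete, the following is a minimal sketch using the standard verbs API (libibverbs). The queue pair, completion queue, registered memory region and the peer's remote address and rkey are assumed to have been set up beforehand; the point is only that the work request is posted without blocking and its completion is reaped later, which is what lets communication overlap with computation.

```c
#include <infiniband/verbs.h>
#include <stdint.h>

/* Sketch: post a one-sided RDMA Write and poll for its completion later.
 * qp, cq and mr are assumed to be created via the usual verbs calls. */
static int rdma_write_async(struct ibv_qp *qp, struct ibv_cq *cq,
                            struct ibv_mr *mr, void *buf, uint32_t len,
                            uint64_t peer_addr, uint32_t peer_rkey)
{
        struct ibv_sge sge = {
                .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey,
        };
        struct ibv_send_wr wr = {
                .wr_id = 1, .sg_list = &sge, .num_sge = 1,
                .opcode = IBV_WR_RDMA_WRITE,      /* one-sided push            */
                .send_flags = IBV_SEND_SIGNALED,  /* request a completion      */
        };
        wr.wr.rdma.remote_addr = peer_addr;       /* remote target location    */
        wr.wr.rdma.rkey = peer_rkey;

        struct ibv_send_wr *bad_wr;
        if (ibv_post_send(qp, &wr, &bad_wr))      /* returns immediately       */
                return -1;

        /* ... application work can overlap with the transfer here ... */

        struct ibv_wc wc;
        while (ibv_poll_cq(cq, 1, &wc) == 0)      /* reap the completion later */
                ;
        return wc.status == IBV_WC_SUCCESS ? 0 : -1;
}
```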
2.1 Commodity Clouds and RDMA

Clouds run on data centers built from commodity hardware and perform on-demand, flexible and scalable resource allocation for the applications. Resource utilization becomes an important factor for cloud efficiency because hardware resources (e.g. CPU, memory, and networks) are multiplexed among many virtual machines. Due to the economy of scale, inefficiencies in resource usage can potentially eclipse large gains from the clouds. In the following we look at CPU usage, latency and energy consumption in data centers and discuss how RDMA can improve the situation in each case.

CPU Usage: Network I/O can cost a lot of CPU cycles [3]. Applications in clouds require better support from the OS to efficiently transfer large quantities of data. RDMA-enabled NICs (RNICs) have sufficient information to completely bypass the host OS and directly move data between application buffers and the network without any intermediate copies. This requires considerably lower CPU involvement and frees up CPU cycles for productive application processing.

Latency: Low data access latencies are key for large-scale web applications like Twitter or Facebook in order to provide sufficient responsiveness to user interactions [20]. Low data access latencies also play an important role in determining which consistency model a cloud storage layer can possibly implement [23]. And finally, the absence of timely access to data may render certain applications inefficient and cause a loss of productivity. RDMA helps in reducing end-to-end application latencies as it does not require local or remote applications to be scheduled during network transfers. As a consequence, RDMA also minimizes the penalty of processor state pollution such as cache flushes.

Energy: The energy consumption of commodity data centers has become alarming [2]. Future cloud infrastructures need to squeeze more performance per Watt. Because of an efficient data movement pattern on the memory bus (zero-copy), less application involvement, and low OS overhead, RDMA data transfers are more energy efficient than traditional socket-based transfers [12].

2.2 Challenges for RDMA in Clouds

Although promising, RDMA has neither extensive end-user experience and expertise nor widespread deployment outside the HPC world. We point out three major challenges in deploying RDMA hardware in commodity cloud infrastructure.

First, to efficiently move data between NIC and application buffers, current RDMA technology offloads the transport stack. The issues related to the integration of stateful offload NICs into mature operating systems are well documented [16]. These problems include consistently maintaining shared protocol resources such as port space, routing tables or protocol state, as well as scalability, security and bug fixing issues.

Second, clouds run on commodity data centers. Deploying RDMA in such an environment might be expensive. RDMA requires special adapters with offload capabilities like RNICs. The usability of those adapters, however, is limited to a couple of years at best, since processor and software advancements are catching up rapidly.

Third, RDMA technology has often been criticized for its complex and inflexible host resource management [10]. RDMA applications are required to be completely aware of their resource needs. System resources such as memory and queue pairs are then statically committed at connection initialization. This goes against the cloud philosophy of flexible and on-demand resource allocation.
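As an illustration of this up-front commitment, the sketch below shows the kind of allocation a typical verbs application performs at connection setup: queue depths are fixed when the queue pair is created, and the communication buffer is registered (and pinned) in advance. The protection domain, completion queue and buffer are assumed to exist; the queue depths and buffer size are arbitrary example values, not prescribed by any stack.

```c
#include <infiniband/verbs.h>
#include <stddef.h>

/* Sketch: resources a verbs application typically commits at connection
 * setup. pd, cq and buf are assumed to be created elsewhere. */
static int commit_resources(struct ibv_pd *pd, struct ibv_cq *cq,
                            void *buf, size_t buf_size,
                            struct ibv_qp **qp, struct ibv_mr **mr)
{
        struct ibv_qp_init_attr attr = {
                .send_cq = cq,
                .recv_cq = cq,
                .qp_type = IBV_QPT_RC,
                .cap = {                      /* fixed for the QP's lifetime */
                        .max_send_wr  = 256,
                        .max_recv_wr  = 256,
                        .max_send_sge = 1,
                        .max_recv_sge = 1,
                },
        };
        *qp = ibv_create_qp(pd, &attr);       /* queue depths committed here */
        if (!*qp)
                return -1;

        /* The whole buffer is registered (and pinned) up front, whether or
         * not all of it will actually be used. */
        *mr = ibv_reg_mr(pd, buf, buf_size,
                         IBV_ACCESS_LOCAL_WRITE |
                         IBV_ACCESS_REMOTE_WRITE |
                         IBV_ACCESS_REMOTE_READ);
        return *mr ? 0 : -1;
}
```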
3. RDMA IN SOFTWARE

Despite these challenges we argue that a major subset of RDMA benefits can still be provided in commodity data center environments using only software-based stacks. In this section we present the case for a software-based RDMA stack, discuss its advantages, and explain why it is a good match for the clouds.

First, there are currently several network stacks supporting RDMA communication semantics. iWARP [11] (Internet Wide Area RDMA Protocol), being one of them, enables RDMA transport on IP networks. In contrast to other RDMA stacks like Infiniband, iWARP uses TCP/IP or SCTP/IP as its end-to-end transport. This enables RDMA on the ubiquitous Ethernet infrastructure typical of commodity data centers and opens the door to inexpensive, Internet-wide and pure software-based implementations.

Second, we believe that the availability of sufficient application-specific knowledge (passed using the RDMA API) during network protocol processing can single-handedly boost efficiency [6], obsoleting offloading in many cases. The rich RDMA API supports one-sided pull/push operations, semantic aggregation of operations, grouping of work postings, completion notifications etc. Using this knowledge with modest support from the OS, goals of the RDMA stack such as zero-copy transmission, direct data placement into the application's buffer, and one-sided operations are achievable even in a software-based solution. This results in better end-to-end latencies and a low CPU footprint in commodity data centers.

Third, with its flexible resource management policies, a software-based solution closely matches the requirements of the cloud. It also helps in materializing on-demand RDMA networks between hosts. While dynamically managing resources can result in some performance loss, applications can amortize this cost to an extent by reusing I/O-critical resources such as memory regions. Note that raw performance is not the primary concern for such a software-based stack. It cannot outperform a hardware-based solution, but it can still provide a large subset of RDMA benefits in pure software.

3.1 SoftiWARP

We have implemented the iWARP protocol in a software stack for Linux called SoftiWARP [15]. SoftiWARP enables commodity Ethernet NICs to handle RDMA traffic in software. It aims to provide the benefits of rich RDMA semantics, while avoiding most of the problems of TOE or RNIC deployment. By integrating transparently with the industry-standard OFED stack [18], SoftiWARP appears to the application as a regular RNIC device. It also interoperates with available RNIC hardware such as the Chelsio T3 adapters.
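Because SoftiWARP registers as an ordinary RDMA device with the OFED stack, the usual connection setup path applies unchanged. The sketch below uses librdmacm with a hypothetical, pre-filled server address and abbreviates the remaining steps; the same code would run over a hardware RNIC or over SoftiWARP driving a plain Ethernet NIC.

```c
#include <rdma/rdma_cma.h>
#include <netinet/in.h>

/* Sketch: standard OFED/librdmacm connection setup. Nothing here is
 * SoftiWARP-specific; 'server' is a hypothetical sockaddr_in of the peer. */
static struct rdma_cm_id *start_connect(struct sockaddr_in *server)
{
        struct rdma_event_channel *ec = rdma_create_event_channel();
        struct rdma_cm_id *id = NULL;

        if (!ec || rdma_create_id(ec, &id, NULL, RDMA_PS_TCP))
                return NULL;

        /* Resolve the destination; the RDMA device (hardware RNIC or
         * software stack) is selected based on the route to the peer. */
        if (rdma_resolve_addr(id, NULL, (struct sockaddr *)server, 2000))
                return NULL;

        /* ... then: wait for RDMA_CM_EVENT_ADDR_RESOLVED on 'ec',
         * rdma_resolve_route(), rdma_create_qp(), rdma_connect() ... */
        return id;
}
```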
3.1.1 Comparing SoftiWARP with plain TCP

Historically, TCP implementations have been closely associated with the BSD socket interface. The socket API abstracts all protocol details and exports a simple send and receive interface for data transfer. From the application's perspective, all additional communication semantics must be part of the application itself. As a result, most of the data path optimizations RDMA focuses on are not consistently achievable, despite some attempts to augment the OS networking stack with shortcuts depending on implicit state information (e.g. [5]). In this section we highlight the advantages RDMA brings to applications and compare it to socket-based TCP network I/O:

Application Data Copy Avoidance: To preserve system call semantics, the socket interface must copy application data between user space and kernel. The RDMA API helps to avoid copy operations on the transmit path by explicitly passing buffer ownership when posting a work request. Any sending or retransmission of data can be done directly from the user buffer. However, using kernel TCP sockets, the current SoftiWARP implementation does not explicitly know when data is finally delivered to the peer and therefore restricts zero-copy to the processing of non-signaled work. A simple check of TCP send state information would be sufficient to appropriately delay work completion generation.

Less Application Scheduling: With the necessary data placement information known, SoftiWARP places data directly into the user's buffers from the TCP receive softirq processing context. Also, while processing one-sided operations, it does not schedule the application for network I/O. These characteristics are particularly attractive for applications such as media streaming, CDNs, distributed data storage and processing etc. Furthermore, work completion processing is minimized to the application's needs: completing non-signaled work will not trigger application scheduling.

Non-blocking Operations: The asynchronous application interface frees applications from blocking waits for send or receive operations to complete. If semantically needed, the application may still do a blocking wait for completion.

Flexible Memory Management: Pinning user buffers in memory is inevitable for hardware RDMA implementations [10]. A software-based in-kernel RDMA implementation, however, has the flexibility to perform on-demand buffer pinning once the source or sink address and the data length are known. This puts memory resource economics on par with the kernel TCP stack, though it may result in some performance loss.

3.1.2 SoftiWARP and RNICs

Looking beyond the context of this paper, SoftiWARP will likely not replace RNICs in existing installations. It will also not obsolete their applicability in high-end server systems or where very low delay requirements are stringent. Rather, it enables RDMA communication on any commodity system. Heterogeneous setups, deploying RNICs on the server side and running a software RDMA stack on the typically less loaded client, are thus made possible.

4. PERFORMANCE

In this section we give some early performance results for SoftiWARP. The numbers reported are the average of 3 runs, each lasting 60 seconds. The experiments were run on identical IBM¹ HS22 blades containing dual quad-core 2.5 GHz Intel Xeon CPUs (E554), 8GB RAM, connected using Mellanox ConnectX 10GbE adapters. All experiments were done on Linux (2.6.36) using netperf [17] with our extensions for RDMA tests and oprofile [19]. We plan to open source these extensions, which consist of uni-directional data transfer tests. To obtain CPU usage, we divided the total number of CPU samples obtained during a benchmark run by the number of samples obtained when running a busy loop on a single CPU. Hence the numbers include the overhead of buffer allocation and page pinning for RDMA. We report two variants of TCP performance, with and without TCP receive copy offload using Intel QuickData [1] technology in Linux (CONFIG_NET_DMA). Although helpful with CPU usage for small message sizes, in our experiments we found TCP receive copy offloading to be a major source of performance degradation for large message sizes.
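For clarity, the sketch below shows one plausible reading of how these measurements combine; the sample counts and throughput are hypothetical, and the Gbit-per-GHz figure corresponds to the receive-efficiency metric reported later.

```c
#include <stdio.h>

/* Sketch with hypothetical numbers: deriving CPU load from oprofile samples
 * and turning it into a Gbit-per-GHz receive-efficiency figure. */
int main(void)
{
        double run_samples     = 1.8e6;  /* oprofile samples during a 60 s benchmark run */
        double busy_samples    = 6.0e6;  /* samples for a 60 s busy loop on one CPU      */
        double cpu_load        = run_samples / busy_samples;  /* fraction of one core    */

        double throughput_gbps = 9.4;    /* example receive throughput                   */
        double clock_ghz       = 2.5;    /* clock of the test machines                   */

        /* Gbits moved per GHz of CPU cycles actually consumed. */
        double efficiency = throughput_gbps / (cpu_load * clock_ghz);

        printf("CPU load %.2f cores, efficiency %.1f Gbit/GHz\n", cpu_load, efficiency);
        return 0;
}
```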

Bandwidth: SoftiWARP does not impose any performance penalty for large buffer sizes. Compared to TCP's buffer size of 2kB, SoftiWARP achieves full line speed with buffer sizes of 16kB and 32kB for RDMA Write and Read tests respectively (not shown in the paper). If aggregate buffer posting using multiple work elements (100) is performed, it saturates the link with message sizes of 8kB and 16kB for RDMA Write and Read tests respectively (see the sketch at the end of this section). We are working on optimizing SoftiWARP's performance for transmitting small buffer sizes.

CPU Efficiency: SoftiWARP is very economical with CPU cycles. By using a zero-copy mechanism on the transmit side, SoftiWARP's CPU usage for non-signaled workloads is approximately 40%-45% (not shown), which is less than a send call and on par with the TCP sendpage mechanism. For small buffer sizes (less than 64kB), however, send is more efficient. On the receiving side, SoftiWARP consumes 8%-22% less CPU than TCP while receiving at full link speed. Figure 1 shows the receive-side efficiency of SoftiWARP and TCP (w/o TCP receive copy offload) in terms of Gbits received per GHz of CPU use. Beyond a modest 16kB (64kB for reads), SoftiWARP's receive is more efficient than the TCP receive.

Figure 1: Receive efficiency of TCP w/o receive copy offloading and SoftiWARP.

Latency: The latency experiment consists of a request (4 bytes) and a response (variable, requested size). Figure 2 shows the end-to-end delay of a request together with the CPU usage on the receiving side. For small requests (in kBs), TCP copy offloading saves significant CPU cycles (13%-27% compared to the non-offloaded version) but results in a non-negligible performance penalty for the end-to-end latencies. SoftiWARP offers an 11%-15% (for the 100kB-1000kB range) and a significant 5%-67% (20MB-200MB range) reduction in end-to-end application latencies over TCP (copy-offloaded version). Without TCP receive copy offloading, however, the performance gap closes at the cost of higher CPU usage (5%-18%). On the response-transmitting side, by using the in-kernel sendpage mechanism for already pinned pages, SoftiWARP improves CPU usage by 34%-50% (not shown) compared to TCP.

Figure 2: Round trip latencies for TCP request response and RDMA read: a) 100-1000kB, b) 20-200MB data blocks.

The CPU and performance gains are primarily achieved by avoiding application involvement and exploiting a zero-copy mechanism for the transmission. We will provide an in-depth analysis of the system in future work.
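As a concrete illustration of the aggregated posting mentioned under Bandwidth, the sketch below chains a batch of work requests and hands them to the stack in a single post, with only the last request asking for a completion. The queue pair, memory region, buffer and peer parameters are assumed to exist; the batch size and chunk size are arbitrary example values rather than the exact parameters of our tests.

```c
#include <infiniband/verbs.h>
#include <stdint.h>

#define BATCH 100          /* number of work elements posted together */
#define CHUNK (16 * 1024)  /* example message size per work element   */

/* Sketch: aggregate posting of BATCH RDMA Write work requests in one call. */
static int post_batch(struct ibv_qp *qp, struct ibv_mr *mr, char *buf,
                      uint64_t peer_addr, uint32_t peer_rkey)
{
        static struct ibv_sge sge[BATCH];
        static struct ibv_send_wr wr[BATCH];
        struct ibv_send_wr *bad_wr;

        for (int i = 0; i < BATCH; i++) {
                sge[i].addr   = (uintptr_t)buf + (uint64_t)i * CHUNK;
                sge[i].length = CHUNK;
                sge[i].lkey   = mr->lkey;

                wr[i].wr_id      = i;
                wr[i].sg_list    = &sge[i];
                wr[i].num_sge    = 1;
                wr[i].opcode     = IBV_WR_RDMA_WRITE;
                wr[i].next       = (i + 1 < BATCH) ? &wr[i + 1] : NULL;
                wr[i].send_flags = 0;                 /* non-signaled        */
                wr[i].wr.rdma.remote_addr = peer_addr + (uint64_t)i * CHUNK;
                wr[i].wr.rdma.rkey        = peer_rkey;
        }
        wr[BATCH - 1].send_flags = IBV_SEND_SIGNALED; /* one completion only */

        return ibv_post_send(qp, &wr[0], &bad_wr);    /* single post for all */
}
```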
5. RELATED WORK

iWARP is built upon the Virtual Interface Architecture (VIA) concept, which has its roots in systems such as U-Net [24], Hamlyn [8], ADC [9], Typhoon [22] etc. Although these systems helped in developing the key principles behind iWARP and RDMA, the absence of any standardization work led to multiple ad-hoc implementations of application and network interfaces. SoftiWARP is an effort to find a middle ground between trivial read/write system calls on sockets and exotic user-accessible network interface exports. Unlike true user-level networking systems, SoftiWARP does not run the network stack in user space. Networking remains a service of the OS, but a user-tailored one. SoftiWARP uses application-specific knowledge, available through the rich RDMA API, to make efficient use of resources. Also, keeping resource control in the kernel preserves its status as a resource arbitrator, which helps in imposing global policies [13]. The Ohio Supercomputer Center has presented another kernel-based iWARP implementation [7]. Their system, however, is not compatible with the OFED framework.

6. CONCLUSION

With the availability of 10Gbit/s Ethernet for commodity data centers and the ever-increasing demand of distributed applications for efficient inter-server network I/O, network stacks are challenged to keep up with the pace. In this paper, we argue that traditional socket-based networking is not capable of satisfying those future demands, as it puts too big a burden onto the host. Instead we propose the use of RDMA as a networking concept to overcome those issues. RDMA has traditionally been used as a high-performance networking technology in supercomputers, but RDMA interconnects are costly and require special hardware, which makes them difficult to adopt in commodity data centers. However, we believe that the main advantages of RDMA can be provided in pure software inside commodity data centers. In this paper, we discuss an early implementation of a software-based RDMA solution and present experimental results. The results confirm the high-throughput performance of such an approach while significantly reducing end-to-end application latencies compared to plain socket-based TCP/IP.

In the future we plan to investigate potential performance benefits of SoftiWARP in the presence of OS and application-level virtualization. SoftiWARP integrates seamlessly with asynchronous communication APIs found in, e.g., Java NIO, and may allow virtual layers to be efficiently bypassed down to the hypervisor.

SoftiWARP is an open source project and the code is available at www.gitorious.org/softiwarp.
7. REFERENCES

[1] Intel Corporation. Intel QuickData Technology Software Guide for Linux. http://www.intel.com/technology/quickdata/whitepapers/sw_guide_linux.pdf, 2008.
[2] D. G. Andersen et al. FAWN: A fast array of wimpy nodes. In Proceedings of the ACM SOSP, pages 1–14, 2009.
[3] P. Balaji. Sockets vs RDMA interface over 10-gigabit networks: An in-depth analysis of the memory traffic bottleneck. In Proceedings of the RAIT workshop, 2004.
[4] D. Beaver et al. Finding a needle in haystack: Facebook's photo storage. In Proceedings of the 9th OSDI, pages 1–8, 2010.
[5] H.-k. J. Chu. Zero-copy TCP in Solaris. In Proceedings of the 1996 USENIX ATC, pages 21–21, 1996.
[6] D. D. Clark et al. Architectural considerations for a new generation of protocols. In Proceedings of ACM SIGCOMM, pages 200–208, 1990.
[7] D. Dalessandro et al. Design and implementation of the iWARP protocol in software. In Proceedings of IASTED, pages 471–476, 2005.
[8] G. B. David et al. Hamlyn: a high-performance network interface with sender-based memory management. In Proceedings of the Hot Interconnects III Symposium, 1995.
[9] P. Druschel et al. Experiences with a high-speed network adaptor: a software perspective. In Proceedings of ACM SIGCOMM, pages 2–13, 1994.
[10] P. W. Frey and G. Alonso. Minimizing the hidden cost of RDMA. In Proceedings of ICDCS, pages 553–560. IEEE Computer Society, 2009.
[11] IETF. Remote direct data placement working group. http://datatracker.ietf.org/wg/rddp/charter/.
[12] J. Liu et al. Evaluating high performance communication: A power perspective. In Proceedings of the 23rd ICS, pages 326–337, 2009.
[13] K. Magoutis. The case against user-level networking. In Proceedings of the 3rd Workshop on Novel Uses of System Area Networks, 2004.
[14] A. Menon et al. Optimizing network virtualization in Xen. In Proceedings of the USENIX ATC, pages 2–2, 2006.
[15] B. Metzler, P. Frey, and A. Trivedi. SoftiWARP - Project Update, 2010. Available online at http://www.openfabrics.org/OFA-Events-sonoma2010.html.
[16] J. C. Mogul. TCP offload is a dumb idea whose time has come. In Proceedings of the 9th HotOS, pages 1–5, 2003.
[17] Netperf, 2.4.5. http://www.netperf.org/netperf/, 2011.
[18] OpenFabrics Alliance. OpenFabrics Enterprise Distribution (OFED) Stack, 2010. Available online at www.openfabrics.org.
[19] Oprofile, 0.9.6. http://oprofile.sourceforge.net, 2011.
[20] J. K. Ousterhout et al. The case for RAMClouds: Scalable high-performance storage entirely in DRAM. In SIGOPS OSR, Stanford InfoLab, 2009.
[21] K. K. Ram et al. Achieving 10 Gb/s using safe and transparent network interface virtualization. In Proceedings of the 2009 ACM SIGPLAN/SIGOPS VEE, pages 61–70, 2009.
[22] S. K. Reinhardt et al. Tempest and Typhoon: user-level shared memory. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 325–336, 1994.
[23] B. Tiwana et al. Location, location, location!: modeling data proximity in the cloud. In Proceedings of the 9th ACM SIGCOMM HotNets, pages 15:1–15:6, 2010.
[24] T. von Eicken et al. U-Net: a user-level network interface for parallel and distributed computing. In Proceedings of the 15th ACM SOSP, pages 40–53, 1995.
[25] P. Willmann et al. An evaluation of network stack parallelization strategies in modern operating systems. In Proceedings of the USENIX ATC, pages 8–8, 2006.

Notes

1. IBM is a trademark of International Business Machines Corporation, registered in many jurisdictions worldwide. Intel is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries. Other product and service names might be trademarks of IBM or other companies.