Cluster Computing 6, 95–104, 2003. © 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

InfiniBand: The “De Facto” Future Standard for System and Local Area Networks or Just a Scalable Replacement for PCI Buses?

TIMOTHY MARK PINKSTON ∗ University of Southern California

ALAN F. BENNER IBM Corporation

MICHAEL KRAUSE Hewlett Packard

IRV M. ROBINSON Intel Corporation

THOMAS STERLING California Institute of Technology

Abstract. InfiniBand is a new industry-wide general-purpose interconnect standard designed to provide significantly higher levels of reliability, availability, performance, and scalability than alternative I/O technologies. More than two years after its official release, many are still trying to understand what the profitable uses for this new and promising interconnect technology are, and how the technology might evolve. In this article, we provide a summary of several industry and academic perspectives on this issue expressed during a panel discussion at the Workshop for Communication Architecture for Clusters (CAC), held in conjunction with the International Parallel and Distributed Processing Symposium (IPDPS) in April 2001, in hopes of narrowing down the design space for InfiniBand-based systems.

Keywords: InfiniBand, I/O, system area network, fabric, interconnection network standard

1. Introduction

In an attempt to solve a wide spectrum of problems associated with server I/O, many commercial entities worked together to develop an industry-wide general-purpose interconnect standard called InfiniBand [1]. InfiniBand was designed to provide significantly higher levels of reliability, availability, performance, and scalability than could be achieved with alternative server I/O technology. In October of 2000, the first version of the InfiniBand specs was released with much fanfare. At its release, this non-proprietary, low-overhead, point-to-point communication standard was poised to become the interconnection network fabric technology on which commodity and high-end servers could be based [2].

The first generation of InfiniBand products has appeared, and prototype InfiniBand-based clustered applications have been demonstrated, but it is not yet clear in which areas InfiniBand technology will be most successfully employed as it matures. Since its release, many realize that InfiniBand is not a panacea and was never meant to be one. There has been much effort put towards understanding just what this technology is best used for, how it should be integrated into future systems, and how it might be improved. In addition, not everyone is enamored with this technology. Some claim it is too expensive, others that it is too complex, still others that it attempts to address too many disparate problems. Moreover, some believe that because of the way InfiniBand has been positioned, it directly competes with PCI, Ethernet, Fibre Channel, and other well-established industry standards and, thus, may never be widely accepted. While there is some validity in some of these claims and beliefs, the reality is that InfiniBand is the first technology to really solve the entire server I/O problem and much of the high-speed, low-latency inter-processor communication (IPC) problem within a single, open industry standard specification.

Nevertheless, since nature abhors a vacuum, it is likely that many vendors will continue to invest in evolutionary approaches to solve some of the same problems addressed by InfiniBand. It will take some time for any new technology targeted for the server market to gain a foot-hold – many believing that 2003/2004 will be the time frame at which InfiniBand could really start to take flight. More important than when, however, is the question of where it makes sense to deploy this new technology. What will be the possible application areas for InfiniBand: I/O interconnect, system area network (SAN), storage area network (STAN), or local area network (LAN)? Is it useful only for IPC, or might it also be useful as a unified network fabric (backbone) in servers, server clusters, and data centers? Is there interesting research to be done on InfiniBand architecture? Will InfiniBand have a significant impact on the way in which future systems are designed, or might it have only limited impact like some of its predecessors, e.g., VIA, SCI, etc.? These and other such questions were raised and debated during a panel discussion at the CAC Workshop, held in conjunction with IPDPS'01. As with many such panel discussions, a wide variety of views were expressed, with similarities as well as disagreements among them. This article represents an attempt to summarize and clarify the various converging and conflicting perspectives shared during that workshop, to help narrow the possible design space for InfiniBand-based systems.

∗ Corresponding author.

Figure 1. Conceptual diagram of InfiniBand's layered architecture.

2. InfiniBand overview

InfiniBand is a layered architecture that provides physical, data link, network, and transport layer services (see figure 1). At the physical and data link layers, its switch-based architecture allows for richly connected, arbitrary topologies to be configured with some degree of flexibility in routing across logical and physical channels. It provides scalable, increased I/O bandwidth for driving I/O at link rates from 2.5 Gbps to 12 times that rate, increased distance (as compared to PCI) of up to 300 meters, and standardized form factors for supporting a variety of simple to complex I/O solutions, including serial or parallel lines, copper or fiber links, and wide or tall modules. It also provides support for traffic prioritization, deadlock avoidance, and segregation of traffic classes. At the network and transport layers, it provides various types of connection-oriented and datagram communication services between consumers at network endpoints, including remote direct memory access (RDMA) and atomic operations. It also provides standardized fabric management services, fault isolation/containment, and reliability functions.

Figure 2. Conceptual diagram of an InfiniBand fabric.

Discrete message passing via send and receive queue pairs (QPs) and completion queue elements (CQEs) is supported, as shown in figure 2. Its programming model is derived from the Virtual Interface Architecture (VIA) [3]; however, InfiniBand is intended to enable the most efficient interface possible between a message passing interconnection network and a server's memory controller, to facilitate highly efficient data transfers. For example, data movement is via DMA, scheduled by fabric-connected devices, which enables data movement without CPU interaction. In support of this, InfiniBand is defined to make it practical to implement protocol stack processing in ASICs, with a strategy for integration of InfiniBand Host Channel Adapters (HCAs) and target channel adapters (TCAs) into server chipsets (see note 1). While chipset integration of other interconnection networks is certainly possible, InfiniBand was conceived to make the process easier and provide the highest performance and efficiency of any non-proprietary alternative. One such efficiency (see note 2) has been achieved by the use of a message passing network which can be used for nearly every kind of server I/O, perhaps making I/O buses like PCI superfluous. It is this application – as a single fabric for server I/O use – that causes the greatest speculation regarding its role juxtaposed to other standard alternatives, such as PCI (for I/O) and Ethernet (for IPC). This issue is addressed in the following sections.

Although many important elements have been specified in the standard, some details have been left for vendor innovation or have not been specified in the current version, possibly left for future improvement of the standard. For example, at the higher layers, operations over InfiniBand are strategically specified at a functional level using verbs in a vendor-neutral, operating-system-independent way. As application programmer interfaces (APIs) are included in the operating system, it is up to operating system vendors to decide how the verbs should be mapped to particular operating systems to support various APIs. While this level of specification purposefully allows for vendor differentiation, some details are not specified at any level. For instance, since wide-area network (WAN) and server network architecture issues and requirements are quite different from one another, there is no WAN support. Also, there is no support for cache-coherent non-uniform memory-access (cc-NUMA) data transactions, since it may not be possible to define a stable, vendor-neutral architecture. Moreover, at the lower layers, no strategy is specified for ensuring quality-of-service guarantees, computing deadlock-free routing paths, or updating forwarding tables in a deadlock-free manner when the network undergoes reconfiguration, other than by dropping packets. Such unresolved issues present opportunities for further research on InfiniBand architecture, some of which are actively being pursued [4–8].
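As a concrete illustration of the queue-pair model just described, the sketch below uses the OpenFabrics libibverbs interface, one later open-source realization of the verbs concept; the specification itself defines verbs only abstractly, so the exact calls shown are an illustration rather than part of the standard, and the buffer size, queue depths, and message contents are arbitrary choices for the example. Connection establishment (queue pair state transitions and the out-of-band exchange of queue pair numbers and port addresses) and most error handling are omitted for brevity.

#include <infiniband/verbs.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Open the first host channel adapter (HCA) found on this node. */
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) { fprintf(stderr, "no HCA found\n"); return 1; }
    struct ibv_context *ctx = ibv_open_device(devs[0]);

    struct ibv_pd *pd = ibv_alloc_pd(ctx);                      /* protection domain */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);  /* completion queue  */

    /* Register a buffer so the adapter can DMA it without CPU involvement. */
    static char buf[4096];
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, sizeof(buf), IBV_ACCESS_LOCAL_WRITE);

    /* Create a reliable-connection queue pair: a send queue and a receive queue. */
    struct ibv_qp_init_attr qp_attr = {
        .send_cq = cq, .recv_cq = cq, .qp_type = IBV_QPT_RC,
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &qp_attr);

    /* ... the QP would be connected to a remote QP and moved to the
     *     ready-to-send state here (omitted) ...                     */

    /* Post a send work request: a discrete message, not a load/store. */
    strcpy(buf, "hello over the fabric");
    struct ibv_sge sge = { .addr = (uintptr_t)buf,
                           .length = (uint32_t)strlen(buf) + 1, .lkey = mr->lkey };
    struct ibv_send_wr wr = { .wr_id = 1, .sg_list = &sge, .num_sge = 1,
                              .opcode = IBV_WR_SEND,
                              .send_flags = IBV_SEND_SIGNALED };
    struct ibv_send_wr *bad_wr = NULL;
    ibv_post_send(qp, &wr, &bad_wr);

    /* Completion is reported asynchronously through the completion queue. */
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;                       /* spin until the adapter posts a completion */
    printf("work request %llu finished with status %d\n",
           (unsigned long long)wc.wr_id, (int)wc.status);

    ibv_destroy_qp(qp); ibv_dereg_mr(mr); ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd); ibv_close_device(ctx); ibv_free_device_list(devs);
    return 0;
}

The same queue pair abstraction carries RDMA reads and writes and atomic operations; only the opcode and the remote address/key fields of the work request change.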
3. InfiniBand as an I/O fabric

One of InfiniBand's original requirements was to be usable as a next generation I/O fabric for server systems. Given this, it has been positioned as a possible "replacement" for the PCI suite of general-purpose I/O interconnects. PCI and its derivatives represent a simple, cost-effective means for connecting a small number of devices to a server using a shared-memory programming model. As servers continue to grow in complexity, however, they are starting to outgrow the limitations of PCI's simplicity. Recently, advancements in PCI-based technologies (i.e., PCI 2.2, PCI-X, PCI-X 2.0, etc.) [9] and the advent of a variety of switched I/O fabrics such as RapidIO, HyperTransport, and PCI Express (formerly 3GIO) have caused a fair amount of confusion about the role of InfiniBand in the I/O arena.

Some of these I/O fabrics offer bandwidths of 400 Mbps to 16 Gbps (aggregate, full-duplex application bandwidth), inter-rack distances of up to 5 meters, standardized hot plug and swap capability, high fan-out attachment of multiple cards, and load/store/interrupt semantics which are software compatible with traditional PCI-based I/O. This would suggest prolonged usage of these I/O fabrics in future server systems. Nevertheless, there are a few very important capabilities that InfiniBand offers that these multi-drop and switched I/O fabrics do not. Among these are protection, partitioning, operating system (OS) by-pass, and transport level features.

PCI-based I/O technologies rely on a trust model of the highest degree since they provide open access to memory. Although misbehaving PCI devices could be relatively rare, the potential for intentional or unintentional user corruption in large database servers, for example, is unacceptable. The problem increases in scope when one considers the growing functionality and complexity of PCI-connected devices, which makes the possibility of errant operations even greater. This is only one of many such deficiencies inherent to PCI-based I/O architectures. InfiniBand separates itself from the PCI comparison by having additional I/O functionality features. Among these are a sophisticated virtual memory protection scheme (using registration), atomic and remote memory access, protection key-based fabric partitioning, support for multiple subnets, a rich set of connection-oriented and datagram transport functions, multiple logical channels (queue pairs) per channel adapter, architected operations queues on each channel, direct data placement into user space (i.e., OS by-pass), congestion management (i.e., automatic path migration for fail-over and load-balancing of data flows across different physical paths), and greater distance capability.

Logical partitioning is an especially useful feature supported by InfiniBand. With this, a single large server can be made to appear as multiple consolidated smaller servers of various sizes. For example, some systems (see note 3) can support up to several thousand virtual servers that are time-multiplexed across a single physical symmetric multiprocessor (SMP) machine, with virtualized processors, virtualized memory, and virtualized I/O for each virtual server. Virtualized I/O is, in some ways, the most difficult part, since I/O involves interaction with the outside world. With a load/store/interrupt interface to I/O devices, as is done in PCI, time-multiplexing between logical partitions in a protected way requires a great deal of overhead and complexity. With an InfiniBand-type queued-messages interface to I/O, each queue pair can be uniquely assigned to a logical partition, and the queue pairs can time-multiplex external interfaces independent of the host virtual servers. In concept, this is very similar to the idea of OS-bypass for user-space communications, except that rather than multiple user-space processes sharing a virtual network interface through their own queue pairs, kernels share virtual I/O adapters through end-node and fabric partitioning. The hardware HCAs and TCAs handle traffic multiplexing, access control, and scheduling to prevent one virtual server's kernel from adversely affecting another's. Since this is done in a well-defined, inter-operable way, it should allow industry standard TCAs to be shared among different OS kernels, greatly increasing the simplicity and availability of this key server consolidation technology.
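The protection-key mechanism behind this partitioning can be pictured with a small sketch. Every InfiniBand packet and queue pair carries a 16-bit partition key (P_Key) whose low 15 bits name a partition and whose high bit marks full versus limited membership; channel adapters discard traffic whose keys do not agree. The fragment below is a simplified, hypothetical illustration of that check (the partition values and function name are invented for the example), not an excerpt from the specification or from any adapter implementation.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PKEY_FULL_MEMBER 0x8000u   /* high bit set: full member; clear: limited member */
#define PKEY_BASE_MASK   0x7FFFu   /* low 15 bits identify the partition               */

/* Hypothetical check a channel adapter might apply before delivering a packet
 * to a queue pair: both keys must name the same partition, and at least one
 * of the two endpoints must be a full member of it. */
static bool pkey_allows(uint16_t qp_pkey, uint16_t pkt_pkey)
{
    if ((qp_pkey & PKEY_BASE_MASK) != (pkt_pkey & PKEY_BASE_MASK))
        return false;                            /* different partitions */
    return (qp_pkey & PKEY_FULL_MEMBER) || (pkt_pkey & PKEY_FULL_MEMBER);
}

int main(void)
{
    uint16_t db_partition_qp  = 0x8012;  /* QP owned by a "database" logical partition */
    uint16_t web_partition_qp = 0x8034;  /* QP owned by a "web" logical partition      */

    printf("same partition:  %s\n", pkey_allows(db_partition_qp, 0x8012) ? "deliver" : "drop");
    printf("cross partition: %s\n", pkey_allows(db_partition_qp, web_partition_qp) ? "deliver" : "drop");
    return 0;
}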
Given the differences in system programming models, usage models, and functionality, considering IBA simply as a PCI replacement does not seem very accurate. It might be more properly said that InfiniBand can mitigate or possibly even eliminate many of the limitations of PCI buses in servers and server clusters. If there is truly a PCI replacement to emerge, it would likely be PCI Express (formerly 3GIO) [10], a new architecture implementing the PCI programming model on an IBA-like bundled-link serial connection. PCI Express will almost certainly provide additional capabilities beyond those provided by PCI, but it represents an evolutionary rather than a revolutionary strategy – InfiniBand changes the paradigm, espousing that server I/O should be done using message passing. For simple systems consisting of a single low-end server and a small population of added devices, the need for InfiniBand is certainly less urgent than for high-end servers or server clusters with numerous shared devices that could benefit from a highly functional I/O technology.
4. InfiniBand as a SAN/LAN fabric

InfiniBand could be used as a SAN/LAN fabric in high-performance server clusters – particularly between different kinds of servers – or in commodity clusters comprised of workstations and PCs. Clustering has been very important for many years in some specialized, low-volume applications where application fail-over and parallel processing or load distribution across many separate machines are important. Recently, the notion of cluster systems has grown in popularity for high-performance scientific computing as well as commercial computing. This is due to the fact that commodity clusters, including the Beowulf class [11–13], have been shown to provide a dramatic improvement in cost-performance ratio. Therefore, the development of a new industry standard communication subsystem that promises order-of-magnitude improvement and benefits from economy of scale through mass-market production must be considered an important potential opportunity for cluster-based computing systems.

Throughout the evolution of commodity (i.e., low-end) clusters over the last decade, the dominant constraining factor has been the interconnection network. While it is true that the effectiveness of commodity clusters is sensitive to the data movement demands of specific computational tasks, it is also apparent that the bandwidth, latency, cost, scalability, and other properties of the underlying network architecture have been both an enabler and an inhibitor to cluster systems. From systems leveraging commodity off-the-shelf (COTS) local area networks such as Ethernet and ATM to systems employing more specialized system-area and storage-area networks such as the SP2 switch, ServerNet II, Myrinet, cLAN, QSW, and SCI, the network infrastructure has determined the pace of advancement of commodity clusters as well as their impact on end-user applications in science, technology, defense, and commerce. For example, weakly-coupled compute-centric applications that rely less on the bandwidth and latency properties of the network could use low cost LAN technology such as 100 Mbps Fast Ethernet with average inter-node latencies of 100–200 µs. With commodity compute nodes, this, arguably, could provide the best price-performance, ranging from $0.25/Mflops to $1/Mflops sustained. However, other classes of more tightly-coupled data-centric applications that impose more frequent and synchronized exchanges of intermediate data among concurrent tasks and/or storage devices would work well only on cluster systems employing lower-latency, higher-bandwidth network technology such as Myrinet. Many quickly-growing e-commerce and database processing applications (among many others) requiring shared access to files or block storage devices fall into this latter category.

For a wide range of applications typical of Beowulf-class commodity clusters, the bandwidth requirement, arguably, is correlated with the floating-point performance of the computational nodes. One figure of merit is that one to four sustained bits per second is required for each sustained floating-point operation per second (i.e., 1–4 bps/flops sustained), depending on the application and system scale. Current generation COTS compute nodes incorporate one to four microprocessors, each with a peak performance of one to four Gflops. This is likely to increase to ten Gflops within the next few years, though their use in Beowulf-class systems may take longer. Typically, a sustained floating-point performance of approximately one quarter of this peak is achieved, although efficiencies of significantly less than that are possible. Thus, over the next one to three years, SAN/LAN per-port sustained bandwidth should, theoretically, be capable of between one and ten Gbps. InfiniBand's per channel bandwidth capability is consistent with this projected range needed by Beowulf-class clusters over the next few years. It remains to be seen, however, whether InfiniBand's price/performance will be sufficient to warrant change-over from entrenched Myrinet and Ethernet technology families, which continue to advance [14].
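As a rough worked example of this figure of merit (the node parameters below are representative values chosen for illustration, not measurements of any particular system), the required per-port bandwidth can be estimated as

    B_port ≈ r × p × R_peak × e   bits per second,

where r is the communication intensity (1–4 bps/flops sustained), p the number of processors per node, R_peak the peak rate per processor, and e ≈ 0.25 the sustained fraction of peak. A current node with p = 1 and R_peak = 4 Gflops at r = 1 bps/flops needs about 1 × 1 × 4 × 0.25 = 1 Gbps, while a near-term node with p = 4 and R_peak = 10 Gflops needs about 1 × 4 × 10 × 0.25 = 10 Gbps, bracketing the one-to-ten Gbps range quoted above.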
Latency requirements for SAN/LAN technologies are more difficult to quantify. Many latency-tolerant algorithms have been developed over recent years for a wide range of application classes, permitting effective use of commodity clusters for an increasing body of problems. For commodity clusters employing cLAN and Myrinet, for example, best-case end-to-end latencies at or below 10 µs can be achieved, allowing some applications to achieve superior overall performance as compared to using the Ethernet network family. It is expected that networks in the microsecond latency range will significantly enhance the utility of such clusters. This target must be realizable across cluster systems comprising hundreds or even thousands of nodes. While the commercial sweet spot for Beowulf-class clusters is centered around 64 processors (plus or minus a factor of two), there is much need for low cost systems in the multi-Teraflops performance regime integrating up to 10,000 processors. Even with this many nodes, the worst-case unloaded network latency should not exceed a few microseconds, if possible. As the optimal topology may vary depending on the scale of the system, another requirement is that the network should support a diversity of topologies. Consistent with this, InfiniBand allows the formation of arbitrary network topologies with an estimated sub-microsecond pin-to-pin switch latency.

In addition to similar trends in bandwidth and latency requirements, the server industry has seen a trend toward denser rack-mounted chassis and a newly emerging blade design strategy, particularly in high-end machines. With blade-based design, servers are reduced essentially to "cards" in a backplaned cage which, in turn, is rack-mounted. Server clusters may be configured using homogeneous nodes or heterogeneous mixtures of thin appliance blades, SMPs, and mainframe systems. Increasing the density as such requires increasing the interconnect's scalability, expandability, and efficiency (form-factor). This makes the movement towards a single fabric capable of handling both IPC and shared storage I/O in server clusters even more urgent. InfiniBand includes in its architectural specifications various mechanical form factors and a rich set of modules with a decidedly server orientation compatible with blade-based design. In addition, InfiniBand provides a means of enabling inter-processor and storage communication on the same interconnection network fabric, between many servers and many storage subsystems. These capabilities are, perhaps, among InfiniBand's greatest strengths.

In such environments, communication between different servers or commodity compute nodes could be done with standard TCP/IP over Ethernet or some other industry standard network. However, for many applications, the overall cluster performance would be limited by TCP/IP protocol processing overhead, which is optimized for long-distance communications over unreliable, long-latency, low-bandwidth links. This is more overhead than is necessary in a cluster environment. As stated previously, clustering fabrics have existed for many years on specific platforms but have been either proprietary (single-source), like the SP2 switch [15] and ServerNet II [16], and/or not inter-operable across other networking platforms, like Myrinet [17]. The need has arisen for an efficient, high-bandwidth, low-overhead, open and inter-operable cluster interconnect fabric such as InfiniBand, which has provisions for "raw" packets to be transported to targets which use a protocol other than that defined by the network architecture. Several other technologies have been promulgated in the past (e.g., FDDI, Fibre Channel, HIPPI-6400, etc.), but none have gained a widespread foot-hold. InfiniBand appears to be the first to get right the combination of wide industry support, clear hardware/software architecture support with sufficient protocol offload to channel adapter hardware, and the performance capabilities necessary to work sufficiently well in this context.

Some particular advantages of InfiniBand in this area are that the components are available from multiple sources and are being supported inter-operably across a multiplicity of different server platforms. In addition, InfiniBand's support for high performance discrete message passing (as opposed to TCP's stream-based orientation) makes it especially convenient to solve the IPC problem. Nevertheless, stream-based mechanisms (such as Sockets) could be mapped over InfiniBand's transport layer. Since the InfiniBand architecture is optimized around tightly-coupled clusters with a high proportion of the protocol processing functions offloaded to high-function channel adapters, the processing overhead will be relatively low, leaving more processing power available for application-level functions. Similarly, the optimization of the HW/SW functionality split leads to a better likelihood of effectively utilizing the full amount of link bandwidth available with InfiniBand than with many other industry standard technologies.

5. InfiniBand as a STAN fabric

Perhaps the biggest open question is whether InfiniBand can compete against Fibre Channel and Ethernet for storage area networking and network-attached storage (NAS) traffic, respectively. InfiniBand may not be viewed as a solid contender due not to any major technical issues but, rather, to economic realities of the industry as a whole. Storage vendors need to use technology that will be ubiquitous and provide connectivity to existing infrastructure while enabling new services such as high-speed remote mirroring or remote data locality management. So far, the only two interconnects of major interest that have surfaced are Fibre Channel and iSCSI, which transports SCSI operations (e.g., disk block read and write operations) over TCP/IP network interfaces. If InfiniBand components can become readily and cheaply available, the high efficiency and throughput of InfiniBand fabrics could be used very effectively for interconnecting servers with storage devices such as RAID or tape systems, thus providing fewer translation levels and a less expensive storage infrastructure than current common practices allow. The problem, however, is that competing technologies are continuing to develop.

Storage management and operating system support are dramatically improving, allowing Fibre Channel solutions to span the entire gamut of server and storage design points. With the addition of new long-haul optics and a draft standard for Fibre Channel over IP, new distributed storage solutions for disaster protection and distributed content management are forthcoming. What's more, iSCSI is a new standard being developed and backed by nearly every server, storage, storage management, network equipment provider, and operating system vendor. The potential for this technology is quite high. Its primary benefits are that it is a consolidated unified fabric and storage interconnect capable of delivering differentiated services to meet service level agreements and capable of delivering a security infrastructure to maintain customer privacy. In addition, it is able to support the same storage services to all endnodes (servers, storage, desktops, appliances, laptops, etc.) independent of their locality, over wired and wireless Ethernet solutions (including 10 GbE). Given these developments, it is possible that the majority of storage devices will use Fibre Channel or Ethernet fabrics instead of InfiniBand for economic and accessibility reasons.

Figure 3. Conceptual diagram of a typical data center.

6. Putting it all together: possible applications for InfiniBand

As stated previously, a major trend in server design recently has been the push towards server packaging in form factors that look more like telecommunications equipment than traditional computer equipment. Vertically-oriented blades containing electronic components are inserted into midplane or backplane cards of racks, and the racks provide aggregated power, packaging, cooling, and cabling for dozens or hundreds of cards. As silicon technology trends continue to drive more on-chip integration, and the relative prices of silicon and cards decrease with respect to fans, power supplies, sheet metal and cables, we may expect to see hundreds or thousands of general-purpose and special-function processors on blades aggregated together into single-rack or multi-rack systems.

Key questions for this emerging system design philosophy are "How tightly integrated will the different blades be?" and "What communication mechanisms will most commonly be used?" This is an area where InfiniBand has a strong chance of being a key technology. As noted before, there are several other alternatives besides InfiniBand. Compact PCI systems (as opposed to conventional PCI), for example, have supported low-bandwidth inter-blade communication across a shared PCI-derivative backplane for many years. Alternatively, blade servers could use Ethernet-based backplane switches to provide LAN-in-a-box performance, and some blade servers may use SMP backplane interconnects to provide very tight coupling across a small number of blades. However, for highly-efficient backplane communication across hundreds or thousands of blades, there are no industry-standard networks that compare to InfiniBand technology.

As an example of possible target areas for InfiniBand deployment, let us consider the data center. One possible data center configuration is illustrated in figure 3. It is comprised of three major subsystems upon which the four standard data center service tiers (access, web, application, and database/back-end) are supported. The processing subsystem is composed of servers, appliances, and I/O chassis/modules. Servers and appliances provide computational and data manipulation services, while I/O chassis/modules provide the communication services to the other subsystems. The unified fabric subsystem is composed of switches, routers, and appliances used to interconnect the processing subsystem, the storage subsystem, and the outside world (Internet or private LAN). The storage subsystem is composed of storage endnodes including disk arrays (RAID), tape libraries, storage area networks, etc. Assuming these three subsystems, we examine how InfiniBand might compare against other interconnect technologies applicable to each domain.

6.1. Processing subsystem

Within the processing subsystem, the PCI technology suite currently is the dominant server/appliance I/O point of attachment. With the advent of PCI-X 2.0 and serial PCI (3GIO), this may remain true for lower-end systems for at least the next 10 years, for many of the reasons discussed previously. In this case, InfiniBand could use PCI derivatives as the point of attachment to insert an InfiniBand Host Channel Adapter (HCA) into a server, to provide low-latency IPC between server endnodes and for server-to-I/O-module expansion chassis (i.e., PCI bridging) [18]. In this scenario, InfiniBand is not required to be an intrinsic component of the server enclosure but, rather, can be added as a hot-plug capability to any existing or future low-end server design. That is, lower-end server chipsets can be optimized to provide a single, well-understood and high-speed I/O interconnect technology (e.g., the PCI technology suite) that can be quickly adapted to whatever the customer requires. Then, InfiniBand HCAs with an extremely simplified management infrastructure can be used to provide attachment to external I/O chassis to increase I/O scalability. These I/O chassis can further be dedicated or shared among a set of hosts, providing a cost-effective, low-footprint solution. On the other hand, high-end servers could implement InfiniBand as the native I/O infrastructure. In this case, PCI derivatives would attach below InfiniBand, mainly for backward compatibility with legacy adapters; InfiniBand would be where PCI hubs attach. This use of InfiniBand is likely to be several years out, but it is already planned for some of IBM's high-end servers, e.g., zSeries mainframes (previously, S/390), iSeries business and financial servers (previously, AS/400), and the pSeries SMPs (previously, RS/6000 systems).

A possible emerging alternative to using InfiniBand for IPC within the processing subsystem is to use Gigabit or 10 Gigabit Ethernet (10 GbE) with lighter-weight Send/RDMA/OS-bypass protocol capability, which may be on the horizon. This could deliver high-bandwidth solutions across the subsystem fabric and to the outside world, which is something that InfiniBand may not be well designed for. However, Ethernet switches typically have latency overheads in the range of 4–9 µs (as measured from the time a packet starts to enter the switch until it starts to leave the switch), which is still unacceptable. InfiniBand switches will typically have latencies in the 50–200 ns range. Hence, InfiniBand is likely to provide direct benefit for such application environments. The key question is whether or not Ethernet switch developers will take steps to aggressively lower their switch latency to within the competitive range so as to prevent InfiniBand from gaining a foot-hold here.
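To put these switch latencies in perspective, consider a packet that crosses three switch stages between two nodes (an illustrative path length, not one prescribed by either technology). The switch contribution alone is roughly

    3 × (4–9 µs) ≈ 12–27 µs for Ethernet,   versus   3 × (50–200 ns) ≈ 0.15–0.6 µs for InfiniBand.

The former by itself exceeds the few-microsecond end-to-end budget discussed in section 4, while the latter leaves nearly all of that budget for the channel adapters and software.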
6.2. Unified fabric and storage subsystems

The unified fabric subsystem is the nexus within the data center for communication between the processing subsystem, the storage subsystem, and the outside world. This subsystem is composed of switches, routers, and appliances, with each element providing different levels of basic and value-add services (e.g., firewalls, load-balancers, virtual private network gateways, quality-of-service controls, etc.). It need not be composed of only one type of network technology. For instance, this fabric could be composed of three separate use-specific networks – one for efficient, short-distance communication to devices within the processing subsystem; one for connecting to other networks; and one for connecting to storage subsystem devices. On the other hand, the unified fabric could be composed of one all-purpose network technology.

As alluded to earlier, there are several key attributes needed by the fabric interconnect: the use of open, standard interfaces with well-understood end-to-end link protocols/semantics and established compliance and inter-operability; seamless integration with the Internet; low-cost, high-volume components that can be integrated across the entire spectrum of price/performance design points; support for multiple physical implementations, allowing the link protocol to operate across arbitrary distances while meeting performance requirements within the data center, metropolis, and wide-area; and the ability to facilitate rapid innovation while providing forward and backward customer investment protection. Many interconnects have tried to deliver these attributes, but it is argued that only one all-purpose network is able to solve all the problems – Ethernet. Currently, Ethernet is considered the "de facto" interconnect of choice if the unified fabric subsystem is to be implemented by one technology. With the upcoming release of 10 GbE, which defines a physical layer that supports distances of up to 40 km (the protocol is capable of operating over any distance within this range at maximum performance), data center administrators would be able to leverage the same technology everywhere and reap the corresponding benefits and cost savings.

Given InfiniBand's decidedly upper hand in supporting low-latency IPC within the processing subsystem, might it challenge Ethernet within the unified fabric subsystem for this specific use? This would follow the argument that the center of investment for Ethernet is moving toward metropolitan-area networks (MANs); therefore, SANs and STANs will be optimized using application-specific technologies such as InfiniBand and Fibre Channel, respectively. Under this scenario, future systems will need a mixture of both general-purpose blades, which operate as present-day uni- or multiprocessor servers, and special-purpose blades, which will operate as I/O adapters attached as devices on a separate server. Thus, delivering the value-add functionality supported in today's unified fabric subsystem will require the industry to integrate blade modules that can translate between different protocol domains, e.g., that of InfiniBand and other network technologies.

With InfiniBand, different blades can be flexibly configured into servers of varying performance by logically connecting queue pairs together and running different software functions on each one. Each blade can be separately optimized for specific functions, e.g., using a network processor for the networking blade, a general-purpose processor for the application blade, etc. For example, a multi-blade server that allows true TCP offload could be configured such that the TCP networking blade terminates TCP sockets and communicates with the host on a different blade.
Also, a separate storage blade or storage interface blade can offload storage functions. That is, the application blade makes file-level requests (NAS storage requests) to the storage blade, which then makes SAN block-level requests to the disks. These disks may be in the same rack or in different racks, connected through other InfiniBand links. Another possible system could be a multi-blade server that has router functionality. Each blade operates as a line card in a high-throughput router, and an InfiniBand-switched backplane operates as the switching core of the router. This allows the construction of a server/router with extremely high throughput and close tie-in to application blades that are internally interconnected through an efficient, flow-controlled and protected protocol. All of these examples require efficient, well-controlled cluster communications between separately-operating devices over short distances in local environments, which is a key design target for InfiniBand.

Such modularization represents new and exciting business and technology opportunities. However, it is likely to be seen as a threat to many network equipment providers and, thus, may only be adopted reluctantly. This being the case, the unified fabric subsystem may continue to be dominated by Ethernet in the short term but, eventually, could migrate to an optimized fabric composed of two or three use-specific networks, with InfiniBand being the technology of choice for low-latency, short-distance IPC. This scenario is likely to come sooner rather than later as multi-blade server clusters grow in popularity.

7. Conclusion

InfiniBand was designed to solve a set of server I/O problems, with extended support for low-latency inter-processor communication also included. In this article, areas have been identified where InfiniBand currently has essentially no open industry-standard, well-accepted competitors: low-latency, high-bandwidth fabrics for commodity clusters, virtualized I/O, and multi-blade tightly-coupled servers or heterogeneous server clusters. Each of these applications is for a different type of system, but all will eventually be important as inter-operable parts of sophisticated multi-tier enterprise and Internet data centers. Due to economic realities, it is highly unlikely that InfiniBand will be used as an all-purpose data center backbone with any great success anytime soon, but it is likely to play a significant role as a use-specific network within the data center. It is likely that InfiniBand will be deployed primarily within the processing subsystem initially; but, depending on market forces, it could possibly be used within the storage subsystem as well. One related concern in the arena of commodity clusters is whether subsetting of the standard might reduce its ability to exploit economy of scale, especially given that, as it is, commodity clusters make up only a relatively small portion of the computer marketplace. This remains an open issue.

One of the clear long-term trends in computer design has been the increasing complexity of systems. In the past, computer systems were very simple and centralized, with a single processor and a single operating system managing applications across a single main memory and a set of I/O cards. In the present and future, however, computer systems will continue to move toward a distributed, more autonomic model that, in some senses, works by analogy with the human body. Just as dozens of different specialized organs are tightly coupled together in the body and communicate with each other through a central nervous system to work as an integrated unit, future server complexes will have dozens of modules or blades, each specialized and/or configurable to perform specific operations as part of an integrated system connected by an underlying fabric. A single integrated system might consist of particular blades or modules dedicated to storage, to TCP/IP communication with the outside world, to running numerically-intensive applications, to cryptography, to transaction processing, or to managing resources across the rest of the system. Accordingly, different virtualized portions of the server or server cluster will work independently yet cooperatively to configure, heal, optimize, and protect themselves and the system as a whole. In such a system model, the interconnect fabric would act as the "central nervous system" type of communication mechanism that ties everything together. That fabric must be efficient, distributed, flexible, partitionable, reliable, scalable, and inter-operable across the whole integrated system. From a technical point of view, InfiniBand appears well positioned to play an integral part in such a fabric for the foreseeable future.

Will the advantages offered by InfiniBand be significant enough to warrant a change-over from the evolutionary path of existing server and cluster interconnect subsystems? Many indicators suggest "yes", particularly for certain optimized uses, but perhaps it is still too early to tell. After all, InfiniBand is only a specification, not a product – it is the actual products that, ultimately, will determine the final outcome. Nonetheless, the underlying concept of a tightly-coupled, multi-layered interconnect architecture with the potential of wide deployment across a broad range of server I/O products is exciting and well worth waiting for.

Notes

1. Chipsets is a term for the collection of IC devices in a server that incorporate memory controllers, microprocessor interconnects, and I/O bridges.
2. Efficiency in this case refers to the ability to provide the greatest I/O bandwidth with the least processor overhead.
3. IBM's zSeries eServers is one such system.

References

[1] InfiniBand Trade Association, InfiniBand Architecture Specification, Vol. 1, Release 1.0a (October 2000). Available at http://www.infinibandta.com.
[2] G.F. Pfister, An introduction to the InfiniBand architecture, in: Proceedings of the Cluster Computing Conference (Cluster00) (November 2000) Ch. 57.

[3] Virtual Interface Architecture Specification, Version 1.0 (December 1997), http://www.viarch.org.
[4] J. Pelissier, Providing quality of service over InfiniBand architecture fabrics, in: Proceedings of the Symposium on Hot Interconnects (August 2000).
[5] J.C. Sancho, A. Robles and J. Duato, Effective strategy to compute forwarding tables for InfiniBand networks, in: Proceedings of the International Conference on Parallel Processing, September 2001 (IEEE Computer Society Press, 2001) pp. 48–57.
[6] P. Lopez, J. Flich and J. Duato, Deadlock-free routing in InfiniBand through destination renaming, in: Proceedings of the International Conference on Parallel Processing, September 2001 (IEEE Computer Society Press, 2001) pp. 427–434.
[7] T.M. Pinkston, B. Zafar and J. Duato, A method for applying double scheme dynamic reconfiguration over InfiniBand, USC Technical Report (March 2002).
[8] F.J. Alfaro, J.L. Sanchez and J. Duato, A strategy to manage time sensitive traffic in InfiniBand, in: Workshop on Communication Architecture for Clusters (CAC'02) (April 2002).
[9] PCI SIG, PCI Specifications, www.pcisig.com/specifications.
[10] A.V. Bhatt, Creating a Third Generation I/O Interconnect (White Paper), http://www.pcisig.com/data/news_room/3gio/3gio_whitepaper.pdf (2001).
[11] T.L. Sterling, J. Salmon, D.J. Becker and D.F. Savarese, How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters (MIT Press, Cambridge, MA, 1999).
[12] T. Sterling, Beowulf Cluster Computing with Linux (MIT Press, Cambridge, MA, 2001).
[13] T. Sterling, Beowulf Cluster Computing with Windows (MIT Press, Cambridge, MA, 2001).
[14] C.L. Seitz, Recent advances in cluster networks, in: International Conference on Cluster Computing (Keynote Speech) (October 2001).
[15] C. Stunkel et al., The SP2 high-performance switch, IBM Systems Journal 34(2) (1995) 185–204.
[16] D. Garcia and W. Watson, ServerNet II, in: Proceedings of the 2nd PCRCW (Springer, 1997) p. 109.
[17] N.J. Boden, D. Cohen, R.E. Felderman, A.E. Kulawik, C.L. Seitz, J. Seizovic and W. Su, Myrinet – A gigabit per second local area network, IEEE Micro (February 1995) 29–36.
[18] C. Eddington, InfiniBridge: An integrated InfiniBand switch and channel adapter, in: Proceedings of the Symposium on Hot Chips (August 2001).

Timothy Mark Pinkston completed his B.S.E.E. degree from The Ohio State University in 1985 and his M.S. and Ph.D. degrees in electrical engineering from Stanford University in 1986 and 1993, respectively. Prior to joining the University of Southern California (USC) in 1993, he was a Member of Technical Staff at Bell Laboratories, a Hughes Doctoral Fellow at Hughes Research Laboratory, and a visiting researcher at IBM T.J. Watson Research Laboratory. Presently, Dr. Pinkston is an Associate Professor in the Computer Engineering Division of the EE-Systems Department at USC and heads the SMART Interconnects Group. His current research interests include the development of deadlock-free adaptive routing techniques and optoelectronic network router architectures for achieving high-performance communication in parallel computer systems – massively parallel processor (MPP) and network-based (NOW) computing systems. Dr. Pinkston has authored over fifty refereed technical papers and has received numerous awards, including the Zumberge Fellow Award, the National Science Foundation Research Initiation Award, and the National Science Foundation Career Award. Dr. Pinkston is a member of the ACM and a senior member of the IEEE. He has also been a member of the program committee for several major conferences (ISCA, HPCA, ICPP, IPPS/IPDPS, ICDCS, SC, CS&I, CAC, PCRCW, OC, MPPOI, IEEE LEOS, WOCS, and WON), the Program Chair for HiPC'03, the Program Co-Chair for MPPOI'97, the Workshops Chair for ICPP'01, and the Finance Chair for Cluster 2001. Recently, he has served as an Associate Editor for the IEEE Transactions on Parallel and Distributed Systems (1998–2002).

Alan F. Benner is a member of the Server Technology Architecture and Performance Group in IBM's eServer development division, working in hardware and software for high-performance server networking. He received his B.S. in physics from Harvey Mudd College, and M.S. and Ph.D. degrees in physics from the University of Colorado at Boulder. Between 1986 and 1988 he was at the Photonics Networks and Components research department of AT&T Bell Laboratories. Since 1992, Dr. Benner has been with IBM, at several development labs and at the Zurich Research lab. His primary interests have been in optical and electronic networking, with particular impact on development of the RS/6000 SP parallel supercomputer, enterprise-scale Internet switch/routers, and the InfiniBand architecture for server I/O and high-performance clustering. E-mail: [email protected]

Michael Krause is a senior I/O architect at Hewlett Packard, where he has worked for the last 17 years. Michael is responsible for HP platform I/O architecture and is the HP technical lead for InfiniBand and 3GIO. In addition, Michael has served as the IBTA Link Workgroup Co-Chair and works within the IETF with a focus on new technologies used to create ubiquitous use of RDMA/OS Bypass/Direct Data Placement capabilities that will provide customers with high-performance/QoS-based service delivery.

Irv M. Robinson is Architecture Director for the Fabric Components Division of Intel Corporation. He is also the Co-Chair of the Technical Working Group of the InfiniBand Trade Association. His interests include clustered servers and interconnects, parallel databases, and transaction processing systems.

Thomas Sterling received his Ph.D. from MIT in 1984 and has held research scientist positions with the Harris Corporation's Advanced Technology Department, the IDA Supercomputing Research Center, and the USRA Center of Excellence in Space Data and Information Sciences. In 1996 Dr. Sterling received a joint appointment at the NASA Jet Propulsion Laboratory's High Performance Computing group, where he is a Principal Scientist, and the California Institute of Technology's Center for Advanced Computing Research, where he is a Faculty Associate. For the last 20 years, he has engaged in applied research in parallel processing hardware and software systems for high performance computing. Sterling was a developer of the Concert shared memory multiprocessor, the YARC static dataflow computer, and the Associative Template Dataflow computer concept, and has conducted extensive studies of distributed shared memory cache coherence systems.

In 1994, Dr. Sterling led the team at the NASA Goddard Space Flight Center that developed the first Beowulf-class PC clusters, including the Ethernet networking software for the Linux operating system, and is an author of the 1999 book, "How to Build a Beowulf," published by MIT Press. Since 1994, Sterling has been a leader in the national Petaflops initiative, chairing three workshops on petaflops systems development and chairing the subgroup on the Petaflops computing implementation plan for the President's Information Technology Advisory Committee. He is also an author of the book, "Enabling Technologies for Petaflops Computing," published by MIT Press in 1995. Sterling was the Principal Investigator for the interdisciplinary Hybrid Technology Multithreaded (HTMT) architecture research project sponsored by NASA, NSA, NSF, and DARPA, which ended in 2000. Currently, Sterling is Principal Investigator leading the Gilgamesh Architecture research project sponsored by NASA.