Server I/O Networks Past, Present, and Future Renato John Recio Chief Architect, IBM Eserver I/O IBM Systems Group, Austin, Texas [email protected]

Server I/O Networks Past, Present, and Future Renato John Recio Chief Architect, IBM eServer I/O IBM Systems Group, Austin, Texas [email protected] Abstract • Low latency - the total time required to transfer the first bit of a message from the local application to the remote appli- Enterprise and technical customers place a diverse set of cation, including the transit times spent in intermediate requirements on server I/O networks. In the past, no single switches and I/O adapters. network type has been able to satisfy all of these requirements. As • High throughput - the number of small block transactions a result several fabric types evolved and several interconnects performed per second. The size and rate of small block I/O emerged to satisfy a subset of the requirements. Recently several depends on the workload and fabric type. A first level technologies have emerged that enable a single interconnect to be approximation of the I/O block size and I/O throughput rates required for various I/O workloads can be found in used as more than one fabric type. This paper will describe the “I/O Workload Characteristics of Modern Servers” [4]. requirements customers place on server I/O networks; the various • High bandwidth - the number of bytes per second sup- fabric types and interconnects that have been used to satisfy those ported by the network. Typically, used to gauge perfor- requirements; the technologies that are enabling network mance for large block data transfers. Peek bandwidth convergence; and how these new technologies are being deployed describes the bandwidth provided by a given link; whereas on various network families. Sustained bandwidth describes the actual bandwidths provided after taking application, operating system, I/O adapter, and protocol inefficiencies into account. Again, a General Terms first level approximation of the bandwidth required for var- Design, Experimentation, Performance, Standardization. ious I/O workloads can be found in “I/O Workload Charac- teristics of Modern Servers” [4] and ASCI Purple Statement of Work [1]. Keywords • Host Overhead (a.k.a. efficiency, utilization) - a measure 10 GigE, Cluster, Cluster Networks, Gigabit Ethernet, of the number of cycles expended by the host CPU to per- InfiniBand, I/O Expansion Network, IOEN, iONIC, iSER, form an I/O operation. iSCSI, LAN, PCI, PCI Express, RDMA, RNIC, SAN, Socket • Memory Overhead (a.k.a. efficiency, utilization) - a mea- Extensions, and TOE. sure of the number of times data must be moved through host memory, before it reaches the final consumer of the 1. INTRODUCTION TO SERVER I/O data. NETWORKS • Connectivity related requirements Server I/O networks are used by servers to connect to I/O • Scalability - ability to interconnect from a small number to devices, clients, and other servers. Servers use a wide range of a large number of endpoints. interconnects to attach I/O, from traditional buses (e.g. PCI) to • Distance - ability to interconnect endpoints that span a more complex general purpose networks that incorporate serial wide range (from short to long). links, packet-routing, and switches. • Performance Density - ability to package a larger number of computing resources in a limited space, and is typically 3 2. SERVER I/O REQUIREMENTS measured in performance/ft (/watt). The requirements placed on Server I/O networks are derived • Self-management related requirements from server requirements and include the following: • Self-healing (unscheduled outage protection) - ability to remain available despite the presence of failures. Unsched- • Performance related requirements uled outages can be caused by transient (short duration) or permanent (long duration) faults. For transient faults, server I/O interconnects typically use a combination of data redundancy (e.g. cyclical redundancy checks) and sig- Permission to make digital or hard copies of all or part of this nal quality inspection to detect the fault and use operation work for personal or classroom use is granted without fee retries to recover from the fault. If a fault cannot be recov- provided that copies are not made or distributed for profit or ered through temporal recovery, it is typically considered a commercial advantage and that copies bear this notice and the permanent fault. Server I/O interconnects typically recover full citation on the first page. To copy otherwise, or republish, from permanent faults through the use of redundant paths to post on servers or to redistribute to lists, requires prior and path switchover that reroute traffic through faultless specific permission and/or a fee. paths. SIGCOMM'03, August 25-29, 2003, Karlsruhe, Germany. • Self-configuring (scheduled outage protection) - ability to Copyright 2003 remain available during the addition, modification, or dele- tion of server I/O interconnects resources (e.g. links, ACM 1-58113-735-4/03/0008...$5.00. switches, bridges, and other endpoints). • Self-optimizing (Service Level Agreement, Quality of Ser- proliferated. Figure1 shows a typical large server topology vice) - ability to provide a sustained level of service and the various Server I/O Fabric Types. despite increases in fabric or resource utilization. For Figure 1.Server I/O Networks server I/O interconnects, several key self-optimizing requirements are: uP, $ uP, $ uP, $ uP, $ uP, $ uP, $ uP, $ uP, $ Memory Memory Cluster Area Memory Memory Fabric congestion management - static or dynamic Controller Network Controller mechanisms used to manage fabric utilization peaks. I/O Expansion I/O Expansion Network Storage Area Network I/O Network Bridge I/O Service differentiation management - mechanisms Attachment Switch Attachment I/O I/O Virtual used to match the performance levels associated with Virtual Virtual AdapterVirtual Virtual Adapter VirtualAdapter . Adapter. Adapter VirtualAdapter Local Area Virtual Adapter a given service to the performance levels provided by VirtualAdapter Virtual I/O Adapter Network Adapter I/O the end point and fabric. I/O I/O Server Packaging Envelope Router, Server Packaging Envelope Router, Gateway Router, • Self-protecting (a.k.a. secure) - ability to protect against Gateway Gateway unauthorized access. WAN/MAN Access Access Gateway Gateway • Cost related requirements User Access User Networks User Devices Devices • Infrastructure build-up cost - amount of infrastructure change required to support a given interconnect. Includes development effort (costs) and customer effort (e.g. new management tools). 3.1 Fabric Types I/O Expansion Networks (IOENs) - These networks are used • Standardization - interconnect protocols and interfaces to connect the host processor complex to I/O adapters through defined by a specification that is widely accepted and the use of switches and/or bridges. IOENs allow large servers adhered to within the industry. Allows procurement of to attach a large number of I/O adapters, typically through a products from several (to many) different vendors. switched fabric. However, on a small server (esp. a server blade) the IOEN is typically a direct point-to-point • Total Cost of Ownership - Hardware, software, and management costs associated with a given I/O interconnect. interconnect between the memory controller and the I/O Standards interacts with volumes to lower TCO. Bridge/Switch. • Today, IOENs are proprietary and include: IBM’s Remote • Fabric consolidation - ability to solve the needs of more I/O (RIO) network used on pSeries and iSeries servers[25]; than one fabric type, which reduces total cost of ownership. IBM’s Self-Timed Interface used on zSeries servers[12]; For example, fabric consolidation reduces the number of HP’s Superdome I/O network[26]; HP’s ServerNet[13]; and software management products and skills used to manage Silicon Graphics’ XIO[23]. Some of these, for example the system. IBM’s RIO, are switched, link identifier based networks. • Virtualization related requirements I/O Attachment - If the I/O adapter supports the link used by the I/O Expansion Network, then it can be attached directly to • Host virtualization - a mechanism that allows a single the IOEN. If the I/O adapter does not support the link used by physical host to run multiple, independent operating sys- the IOEN, then it is attached through an IOEN bridge tem images (a.k.a. partitions) concurrently and provides access protection between the operating system images. • Today, server I/O adapters primarily attach through a standard I/O bus (PCI family) and are used to either connect • Interconnect virtualization - a mechanism that allows a sin- other fabric types or directly attach storage (e.g. disk, gle physical interconnect to be segmented into multiple, optical, tape) devices. Storage I/O devices today typically virtual interconnects and provides access protection use parallel SCSI (Small Computer Systems Interface), between the virtual interconnects. parallel ATA (Advanced Technology Attachment), or Fibre Channel Arbitrated Loop (FC-AL). Though the industry is • I/O virtualization - a mechanism that enables physical I/O migrating the parallel links to serial (Serial Attached SCSI unit’s resources to be aggregated and managed in shared and Serial ATA). resource pools. Each shared resource pool is associated to one or more host. Only hosts that are associated with a Cluster Area Networks (CAN) support message-passing shared resource pool are allowed to access the shared communications between servers in a cluster. Cluster Area resource pool. Networks can be built from low-bandwidth, high-latency standard interconnects to high-bandwidth, low-latency proprietary interconnects. Examples of standard CAN 3. SERVER I/O FABRICS AND interconnects include: Ethernet and, more recently, InfiniBand. INTERCONNECTS Examples of proprietary CAN interconnects include: IBM’s SP fabric [15]; IBM’s InterSystem Channel; Myricom’s Myrinet; Server I/O network requirements conflict to some degree. Quadrics’ QsNet; HP’s ServerNet, and HP’s Hyperfabric [14]. Fully satisfying all the requirements with a single network has Storage Area Networks (SAN) connect servers to storage not been possible in the past, so several “Fabric Types” (on servers (a.k.a.

Server I/O Networks Past, Present, and Future Renato John Recio Chief Architect, IBM Eserver I/O IBM Systems Group, Austin, Texas [email protected]

OES 2 SP3: Novell CIFS for Linux Administration Guide 9.2.1 Mapping Drives from a Windows 2000 Or XP Client

Shared Resource Matrix Methodology: an Approach to Identifying Storage and Timing Channels

Shared Resource

Resource Management in Multimedia Networked Systems

Novell Cluster Services,. for Linux. and Netware

Novell® Platespin® Recon 3.7.4 User Guide 5.6.4 Printing and Exporting Reports

Characterizing Peer-Level Performance of Bittorrent

NFS Gateway for Netware 6 Administration Guide October 22, 2003

Casual Resource Sharing with Shared Virtual Folders

A Pattern Language for Designing Application-Level Communication Protocols and the Improvement of Computer Science Education Through Cloud Computing

Architecture Layers Document

Security and Resource Sharing in an Nt Environment