Extending the CompactPCI Bus Architecture with InfiniBand, by Chris Eddington
TECHNOLOGY FEATURE

Extending the CompactPCI bus architecture with InfiniBand
By Chris Eddington

InfiniBand technology is a powerful new architecture designed to support I/O connectivity for advanced Internet data center and enterprise infrastructure deployment. The InfiniBand architecture was developed to overcome the I/O bottleneck in today's server architectures and is broadly supported by the server, storage, and communications industries. Although its primary motivation is next-generation server I/O, InfiniBand will extend its benefits to the embedded computing and telecommunications industries. For the first time, a high volume, industry standard I/O interconnect is available that extends the role of traditional backplane and board buses beyond the physical connector for additional functionality. And this technology is designed from the ground up for RAS: reliability, availability, and serviceability.

About seven years ago, the embedded computing and telecommunication industry chose a new interconnect architecture as the basis of its technology: the PCI bus. Researchers evaluated and chose this architecture for several reasons, including bandwidth, interoperability with a variety of platforms and processors, software and application development, and the economies of choosing a PC-based solution.

As the industry again begins to evaluate new switched fabric architectures that will meet its needs for years to come, systems vendors must look at technologies that provide these attributes and more, while determining how best to preserve the investments made with PCI.

Clearly InfiniBand is the logical successor to the PCI bus in the computing server architecture, and it is unique in providing chip-to-chip, board-to-board, and chassis-to-chassis interconnects based on industry standards. But is it really suited for all of these applications? Is it better than PCI? Does it meet the needs of the future? And if so, how does it assist in the transition? These are important questions that system vendors must consider. Here is a look at InfiniBand adoption in the server market and the impact it will have on CompactPCI.

InfiniBand architecture overview
InfiniBand is a switch-based point-to-point interconnect architecture. Each individual link is based on a four-wire 2.5 Gbits/sec bidirectional connection with a flexible choice of media and scalable link speeds. The architecture defines a layered hardware protocol (physical, link, network, and transport layers) as well as a software layer to support fabric management and low-latency communication between devices. Some of the key features are:

- Scalable link speeds starting at 2.5 Gbits/sec over PCB, copper, or fiber cable
- Packet-based, switched communication with data integrity and flow control
- Quality of Service (QoS)
- Flexible hardware transport mechanisms
- Optimized software interface and Remote Direct Memory Access (RDMA)
- Management infrastructure supporting fault tolerance, failover, and hot swap functions

This overview will briefly cover these InfiniBand features, keeping in mind the application for embedded systems, "lightweight" implementations, and migration from PCI. Although the architecture has a rich feature set, it is flexible and has many options that may not be applicable to all designs. The reader may consult the references for more details.

Physical layer
The InfiniBand specification defines a very flexible and scalable physical layer that provides room for further growth in speed and media. These features include:

- Currently defined signal rates of 2.5 Gbits/sec (1X), 10 Gbits/sec (4X), and 30 Gbits/sec (12X)
- Low wire count: 4, 16, and 48 wires for 1X, 4X, and 12X
- Embedded clock and control
- Scalable clock frequency
- 8B/10B encoding
- Media: copper (for economy) and optical (for distance)

InfiniBand has defined the electrical and mechanical characteristics for the several different media listed in Table 1.

Media                 Transceiver         Connector        Distance
Board and Backplane   Differential Pair   SpeedPac*        20-30 Inches*
Copper Cable          Differential Pair   HSSDC2           17 Meters
Multimode Fiber       SW (850nm)          Fiber LC Duplex  300 Meters
Singlemode Fiber      LW (1350nm)         Fiber LC Duplex  10 Kilometers

Table 1

*InfiniBand defines a backplane connector for its chassis specification. However, there are many available connectors that meet the cross talk and impedance requirements of the InfiniBand physical layer. Distance depends on other factors such as PCB material, layout practices, use of repeaters, etc.

Packet switching
InfiniBand has defined a flexible set of packet forwarding features that allow developers to tune an InfiniBand SAN for different system requirements. These include:

- Variable packet size up to a maximum (defined as the Maximum Transfer Unit, or MTU)
- Choice of MTU size: 256, 512, 1K, 2K, and 4K bytes
- A minimum Layer 2 packet header defined for local switching (LRH)
- An optional Layer 3 header for global routing (GRH)
- Multicast support
- Variant and Invariant Checksums (VCRC and ICRC) for data integrity

The choice of MTU sizes enables the control of system characteristics such as packet jitter, encapsulation overhead, and latency, aiding in the development of multi-protocol systems. The ability to omit global routing information for local subnet destinations reduces the overhead of local communication. The VCRC is recalculated at each hop and the ICRC is calculated at the packet destination, providing link-level and end-to-end data integrity, respectively.

InfiniBand network components
An InfiniBand System Area Network (SAN) has four basic system components that interconnect using InfiniBand links, as shown in Figure 1. These are:

- Host Channel Adapter (HCA) – terminates a connection for a host node. It includes hardware features to support high performance memory transfers into CPU memory.
- Target Channel Adapter (TCA) – terminates a connection for a peripheral node. It defines a subset of HCA functionality and can be optimized for embedded applications.
- Switch – handles link layer packet forwarding. A switch does not consume or generate packets other than management packets.
- Router – routes packets between subnets. InfiniBand routers divide InfiniBand networks into subnets and do not consume or generate packets other than management packets.

[Figure 1: An InfiniBand fabric of switches connecting processor nodes (CPUs and memory attached through HCAs) with I/O chassis, I/O modules, drives, a storage subsystem controller, and consoles attached through TCAs; a router connects the fabric to other IB subnets, WANs, and LANs.]

A subnet manager is required to run on each subnet and handles device and connection management tasks. A subnet manager may run on a host or be embedded in switches and routers. All system components must include a Subnet Management Agent (SMA), required for handling communication with the subnet manager.

Flow control
InfiniBand defines two levels of credit-based flow control to manage congestion: link-level flow control and end-to-end flow control. Link-level flow control applies back pressure to traffic on a link, while end-to-end flow control is used to protect against buffer overflow at an endpoint connection that may be multiple hops away. Each receiving end of a link/connection supplies credits to the sending device to specify the amount of data that can be reliably received. Data is not transmitted unless the receiver advertises credits indicating receive buffer space is available. Credit passing between each device is built into the link and connection protocols to guarantee reliable flow control operation. Link-level flow control is handled on a per Virtual Lane (VL) basis (discussed next).

Quality of service
InfiniBand supports QoS through VLs. These VLs are separate logical communication links that share a single physical link. Each VL has its own buffer and flow control mechanism for each port in a switch. InfiniBand allows up to 15 general purpose VLs to be supported, plus one additional lane dedicated to management traffic.

QoS is realized at the link layer by isolating traffic congestion to individual VLs. Here's a simple example of QoS involving two types of traffic on the same fabric: non-real-time data, such as IP traffic or storage backup sessions, is coupled with real-time traffic for voice or multimedia. The system manager can assign each type of traffic to a different VL, such as VL1 for data and VL2 for voice, and give higher priority to the voice traffic. When data traffic becomes congested due to a large file transfer, the VL1 link buffer begins to fill and flow control kicks in to apply back pressure to the data traffic source. If a voice packet (VL2) arrives at this time, it will be scheduled ahead of the congested data traffic. Thus, the voice traffic will still move through the fabric with minimal latency.

Hardware transport
InfiniBand defines a set of transport services that are implemented in the channel adapter hardware. These services provide a reliable, in-order, connection-oriented packet delivery system that is highly efficient compared to software-implemented transport services. TCP/IP, for example, provides reliable packet delivery for Ethernet networks via a list of functions that consume processing, such as transmit data buffering, a sliding-window flow control transmission policy, congestion control algorithms, segmentation and re-assembly, and checksum calculation. InfiniBand transport services are designed to be implemented efficiently in hardware, offering a significant reduction in message latency (see Table 2).

Transport Service       Description                            VL QoS
Reliable Connection     Acknowledged – connection oriented     Yes
Reliable Datagram       Acknowledged – multiplexed             Yes
Unreliable Connection   Unacknowledged – connection oriented   Yes
Unreliable Datagram     Unacknowledged – connectionless        Yes
Raw Datagram            Unacknowledged – connectionless        Yes

Table 2

Copyright 2001 CompactPCI Systems. Reprinted from CompactPCI Systems / September 2001.
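The signal rates quoted in the physical layer section can be turned into usable data rates with a little arithmetic: 8B/10B encoding carries 8 data bits in every 10 coded bits, so the data bandwidth is 80 percent of the signal rate. A short sketch of that calculation (function name is illustrative):

```python
# With 8B/10B encoding, 10 coded bits carry 8 data bits, so usable bandwidth
# is 80% of the signal rate. Figures are per direction of the link.

def effective_gbps(signal_gbps, encoding_efficiency=8 / 10):
    """Data rate after subtracting 8B/10B coding overhead."""
    return signal_gbps * encoding_efficiency

for name, rate in (("1X", 2.5), ("4X", 10.0), ("12X", 30.0)):
    print(f"{name}: {rate} Gbits/sec signal -> {effective_gbps(rate)} Gbits/sec data")
```

So a 1X link delivers about 2 Gbits/sec of payload bandwidth, 4X about 8 Gbits/sec, and 12X about 24 Gbits/sec, before packet header overhead.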
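The effect of MTU choice and of omitting the optional GRH can be sketched numerically. This is a simplified model, not a complete wire-format accounting: it counts the LRH (8 bytes), optional GRH (40 bytes), base transport header (12 bytes), ICRC (4 bytes), and VCRC (2 bytes) per the InfiniBand specification, and ignores padding and extended transport headers.

```python
# Per-packet overhead for InfiniBand-style framing: a larger MTU means fewer
# packets (and less total header overhead) per message, and skipping the
# optional GRH for local-subnet traffic saves 40 bytes per packet.

LRH, GRH, BTH, ICRC, VCRC = 8, 40, 12, 4, 2   # header/CRC sizes in bytes

def packets_and_overhead(message_bytes, mtu, global_route=False):
    """Segment a message at the chosen MTU; return (packet count, header bytes)."""
    per_packet = LRH + BTH + ICRC + VCRC + (GRH if global_route else 0)
    npackets = -(-message_bytes // mtu)        # ceiling division
    return npackets, npackets * per_packet

if __name__ == "__main__":
    for mtu in (256, 2048, 4096):              # three of the five legal MTUs
        n, ovh = packets_and_overhead(64 * 1024, mtu)
        print(f"MTU {mtu:>4}: {n:>3} packets, {ovh} header bytes")
```

Running the sketch on a 64 KB transfer shows the trade-off the article describes: a 256-byte MTU minimizes per-packet jitter but multiplies encapsulation overhead, while a 4K MTU does the opposite.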
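The credit-based flow control mechanism can be illustrated with a minimal simulation: the receiver advertises credits equal to its free buffer space, and the sender transmits only while credits remain. All class and function names here are illustrative, and one credit is modeled as one packet-sized buffer slot for simplicity.

```python
# Sketch of link-level credit-based flow control: data is never transmitted
# unless the receiver has advertised buffer space, so a full buffer applies
# back pressure to the sender automatically.

from collections import deque

class Receiver:
    def __init__(self, buffer_slots):
        self.buffer = deque()
        self.free = buffer_slots

    def advertise_credits(self):
        return self.free                        # credits = free buffer slots

    def accept(self, packet):
        assert self.free > 0, "sender violated flow control"
        self.buffer.append(packet)
        self.free -= 1

    def drain(self, n):
        """Application consumes packets, freeing slots (fresh credits)."""
        for _ in range(min(n, len(self.buffer))):
            self.buffer.popleft()
            self.free += 1

def send(sender_queue, rx):
    """Transmit only while the receiver advertises credits."""
    sent = 0
    while sender_queue and rx.advertise_credits() > 0:
        rx.accept(sender_queue.popleft())
        sent += 1
    return sent

rx = Receiver(buffer_slots=4)
q = deque(range(10))
print(send(q, rx))   # sender stalls once the 4 credits are consumed
rx.drain(2)          # receiver frees space, granting new credits
print(send(q, rx))   # transmission resumes
```

End-to-end flow control works on the same principle, but between the connection endpoints rather than across a single link, protecting a destination buffer that may be multiple hops away.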
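The VL1/VL2 example in the QoS section can be sketched as a toy priority arbiter: each VL has its own queue, and the scheduler always services the higher-priority lane first, so a voice packet bypasses a congested data backlog. The fixed priority list is an illustrative simplification of the specification's VL arbitration tables.

```python
# Toy VL arbiter for the QoS example: VL1 carries bulk data, VL2 carries
# voice; congestion on VL1 fills its own buffer without delaying VL2.

from collections import deque

vls = {1: deque(), 2: deque()}   # per-VL queues: VL1 = data, VL2 = voice
priority = [2, 1]                # service VL2 (voice) before VL1 (data)

def enqueue(vl, packet):
    vls[vl].append(packet)

def schedule_next():
    """Pick the next packet from the highest-priority non-empty VL."""
    for vl in priority:
        if vls[vl]:
            return vl, vls[vl].popleft()
    return None

# A large file transfer backs up VL1...
for i in range(5):
    enqueue(1, f"data-{i}")
# ...then a voice packet arrives on VL2 and is still serviced first.
enqueue(2, "voice-0")
print(schedule_next())
```

Because each VL also has its own flow control, the back pressure generated by the data backlog stays on VL1; it never blocks the voice lane, which is the congestion isolation the article describes.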