Proceedings of the Second International Workshop on HyperTransport Research and Applications (WHTRA2011) Feb. 09, 2011, Mannheim, Germany

HyperTransport over Ethernet - A Scalable, Commodity Standard for Resource Sharing in the Data Center

Jeffrey Young, Sudhakar Yalamanchili∗
Georgia Institute of Technology
[email protected], [email protected]

Brian Holden, Mario Cavalli
HyperTransport Consortium
{brian.holden, mario.cavalli}@hypertransport.org

Paul Miranda
AMD
[email protected]

Abstract

HyperTransport interconnect technology has been in use for several years as a low-latency interconnect for processors and peripherals [9] [7] and more recently as an off-chip interconnect using the HTX card [5]. However, HyperTransport adoption for scalable cluster solutions has typically been limited by the number of available coherent connections between AMD processors (8 sockets) and by the need for custom HyperTransport connectors between nodes.

The HyperTransport Consortium’s new HyperShare market strategy has presented three new options for building scalable, low-cost cluster solutions using HyperTransport technology: 1) a HyperTransport-native torus-based network fabric using PCI Express-enabled network interface cards implementing the HyperTransport High Node Count specification [14], 2) HyperTransport encapsulated into InfiniBand physical layer packets, and 3) HyperTransport encapsulated into Ethernet physical layer packets. These three approaches provide different levels of advantages and trade-offs across the spectrum of cost and performance. This paper describes the encapsulation of HyperTransport packets into Ethernet, thereby leveraging the cost and performance advantages of Ethernet to enable sharing of resources and (noncoherent) memory across future data centers. More specifically, this paper describes key aspects of the HyperTransport over Ethernet (HToE) specification that is part of the HyperShare strategy.

1. Introduction

Future data center configurations are driven by total cost of ownership (TCO) for specific performance capabilities. Low-latency interconnects are central to performance, while the use of commodity interconnects is central to cost. This paper reports on an effort to combine a very high-performance, commodity interconnect (HyperTransport) with a high-volume interconnect (Ethernet). Previous approaches to extending HyperTransport (HT) over a cluster used custom FPGA cards [5] and proprietary extensions to coherence schemes [22], but these solutions mainly have been adopted for use in research-oriented clusters. The new HyperShare strategy from the HyperTransport Consortium proposes several new ways to create low-cost, commodity clusters that can support scalable high performance computing in either clusters or in the data center.

HyperTransport over Ethernet (HToE) is the newest specification in the HyperShare strategy that aims to combine favorable market trends with a high-bandwidth and low-latency hardware solution for noncoherent sharing of resources in a cluster. This paper illustrates the motivation behind using 10, 40, or 100 Gigabit Ethernet as an encapsulation layer for HyperTransport, the requirements for the HToE specification, and engineering solutions for implementing key portions of the specification.

In the following sections, we describe 1) the motivation for using HToE in both the HPC and data center arenas, 2) challenges facing the encapsulation of HT packets over Ethernet, 3) an overview of the major components of this specification, and 4) use cases that demonstrate how this new specification can be utilized for resource sharing in high node count environments.

∗This research was supported in part by NSF grant CCF-0874991, and Jeffrey Young was supported by an NSF Graduate Research Fellowship.


2. The Motivation for HToE: Trends in Interconnects

The past ten years in the high-performance computing world have seen dramatic decreases in off-chip latency along with increases in available off-chip bandwidth, due largely to the introduction of commodity networking technologies like InfiniBand and 10 Gigabit Ethernet (10GE). Arguably, InfiniBand has made the most inroads in the high-performance computing space, with InfiniBand composing 42.6% of the fabrics for clusters on the current Top 500 Supercomputing list [18].

At the same time, Ethernet has evolved as a lower-cost and “software-friendly” alternative that enjoys higher volumes. The ability to integrate HT over Ethernet would enjoy significant infrastructure and operating cost advantages in data center applications and certain segments of the high-performance marketplace.

2.1. Performance

The ratification of the 10 Gigabit Ethernet standard in 2002 [1] has led to its adoption in data centers and the high-performance community. Woven Systems (now Fortinet) in 2007 demonstrated that 10 Gigabit Ethernet with TCP offloading can compete in terms of performance with SDR InfiniBand, with both fabrics demonstrating latencies in the low microseconds during a Sandia test [28]. In addition, switch manufacturers have built 10 Gigabit Ethernet devices with latencies in the low hundreds of nanoseconds [11] [31]. Recent tests with iWARP-enabled 10GE adapters have shown latencies on the order of 8-10 microseconds, as compared to similar InfiniBand adapters with latencies of 4-6 microseconds [12]. More recent tests have confirmed that 10 Gigabit Ethernet latency for MPI with iWARP is in the range of 8 microseconds [20].

These latencies already are low enough to support the needs of many high-performance applications, such as retail forecasting and many forms of financial analysis, which typically require end-to-end packet latencies in the range of a few microseconds. The new IEEE 802.3ba standard for 40 and 100 Gbps Ethernet also aims to make Ethernet more competitive with InfiniBand. Although full-scale adoption is likely to take several years, there are already some early products that support 100 Gigabit Ethernet [25].

The challenge with using these lower-latency fabrics is in making these lower hardware latencies accessible to the application software layers without having to engage higher overhead legacy software protocol stacks that can add microseconds of latency [4] [23]. The HToE specification described here is a step towards that goal, since it focuses on using Layer 2 (L2) packets and a global address space memory model to reduce dependencies on software and OS-level techniques in performing remote memory accesses.

2.2. Cost and Market Share

While 10 Gigabit Ethernet has had a relatively slow adoption rate in the past few years, it should be noted that 1 Gigabit and 10 Gigabit Ethernet still have a 45.6% share of the Top 500 Supercomputing list [18], with a majority of these installations still using 1 Gigabit Ethernet. This indicates that cost plays an important role in the construction of computational clusters on this list (for example, for market analysis and geological data analysis in the mineral and natural resource industries). Additionally, networks composed of 1 and 10 Gigabit Ethernet also have a dominant position in high-performance web server farms. Part of this widespread market share is due to the low cost of Gigabit Ethernet and the falling cost of 10 Gigabit Ethernet, as well as the management and operational simplicity of Ethernet networks.

However, it should also be noted that InfiniBand still enjoys a price and power advantage over 10 and 40 Gbps Ethernet due to being first to market. A 40 Gbps, 36 port InfiniBand switch now costs around $6,500 and has a typical power dissipation of 226 Watts [8], while a 10 Gbps, 48 port Ethernet switch costs around $20,900 and has a power dissipation of 360 Watts.

One of the strongest factors for using Ethernet is the trend toward converged networks, driven in large part by the need to lower the total cost of ownership (TCO). For example, Fibre Channel (FC) has been the de facto high-performance standard for SANs for the past 15 years. The technical committee behind FC has been a major proponent of convergence in the data center with their introduction of the Fibre Channel over Ethernet (FCoE) standard [15]. This standard relies on several new IEEE Ethernet standards that are collectively referred to as either Data Center Bridging (DCB) or Converged Enhanced Ethernet (CEE) and are described in more detail in Section 3.3. The approval of this standard and subsequent adoption by hardware vendors bodes well for the continued usage of Ethernet in data centers and smaller high-performance clusters.

Possibly one of the best indicators of the future market share for Ethernet as a high-performance data center and cluster fabric is the willingness of competitors to embrace and extend Ethernet technologies.


Two examples are the creation of high-performance Ethernet switches [24] and the development of RDMA over Converged Ethernet (RoCE) [3], which has been referred to by some as “InfiniBand over Ethernet” since it utilizes the InfiniBand verbs and transport layer with a DCB Ethernet link layer and physical network.

2.3. Scalability

Because Ethernet is the most prevalent commodity interconnect technology in previous-generation data centers, there has been considerable effort devoted to constructing scalable Ethernet fabrics for data centers. For instance, consider the use of highly scalable fat tree networks for data centers using 10 Gigabit Ethernet [27], while network vendors have already embraced the in-progress standards for Data Center Bridging as a way to create converged SANs and a high performance cluster fabric [21]. Other recent studies have demonstrated techniques for active congestion management to enable further scaling of topologies constructed around Ethernet [28]. We can expect to see continued efforts toward expanding the use of Ethernet in an effort to leverage legacy software, existing expertise in the networking-related workforce, and volume cost-related advantages.

2.4. The Case for HToE

As the previous sections have shown, Ethernet has significant benefits in the areas of cost, market share, and competitive performance. HyperTransport over Ethernet shares these benefits while adding the advantage of a transparent on-package to off-package encapsulation using 10, 40, and 100 Gbps Ethernet. The IEEE 802.3ba standard also includes support for short-reach (7 meter) copper cable physical layers for 40 and 100 Gigabit Ethernet, which should allow for more cost-effective implementations of 40 and 100 Gigabit Ethernet. As the penetration of these new flavors of Ethernet grows, the potential for HyperTransport over Ethernet also grows as a high-performance hardware communication and sharing mechanism. In fact, this capability for improved resource sharing is one of the best motivators for using HToE and is discussed in more detail in Section 5.

HyperTransport over Ethernet also addresses a different market space than that served by the HyperTransport High Node Count specification and HyperTransport over InfiniBand. Specifically, HToE is well suited for creating scalable, low cost clusters that rely on a converged Ethernet fabric to share resources in a noncoherent fashion. Ethernet’s market share ensures that the barrier to entry in using HyperTransport over Ethernet is low in most cases, and using converged Ethernet negates the need for a custom sharing fabric like NUMAlink [16] or additional cabling for an InfiniBand or other custom network.

Figure 1. HyperTransport Over Ethernet Layers

3. HToE Specification Requirements

Due to the differences between the point-to-point communication of HyperTransport and the switched, many-to-many communication of Ethernet, the HyperTransport over Ethernet specification needs to address several key requirements to ensure correct functionality. To manage the traversal of packets between these fabrics, we focus on a bridged implementation using encapsulation of HT packets (typically up to 64 Bytes of data) in larger Ethernet packets (up to 1500 Bytes, or larger in some cases). If we are to remain faithful to end-to-end HT transparency at the software level, the requirements of the HT protocol now translate into requirements for Ethernet transport that are realized in Layer 2 switches.

Furthermore, to productively harness the capabilities of HToE, it must be implemented in the context of a global system model that defines how the system-wide memory address space is deployed and utilized. Toward this end, we advocate the use of global address space models and specifically the Partitioned Global Address Space (PGAS) model [30]. In particular, we are concerned about the portability of the model and application/system software across future generations of processors with increasing physical address ranges.

To illustrate the differences between HyperTransport and HToE and to help illustrate how HToE supports global address models, we have divided the core functionality of HToE into three “layers”: the “mapping” layer, the “ordering and flow control” layer, and the “encapsulation” layer, as shown in Figure 1.
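To make the bridged encapsulation concrete, the sketch below shows one possible in-memory layout for an Ethernet frame that carries several HT packets. The field names, field widths, and placeholder values are illustrative assumptions only; they are not the wire format defined by the HToE specification [32].

    #include <stdint.h>

    #define MAX_HT_DATA      64    /* an HT packet carries at most 64 bytes of data         */
    #define MAX_ETH_PAYLOAD  1500  /* standard Ethernet payload; jumbo frames can be larger */

    /* One encapsulated HyperTransport packet (illustrative layout). */
    struct ht_packet {
        uint32_t cmd[2];                /* HT command and address words        */
        uint8_t  data[MAX_HT_DATA];     /* optional data payload               */
        uint8_t  data_len;              /* number of valid bytes in data[]     */
        uint8_t  vchan;                 /* Posted, Non-Posted, or Response     */
    };

    /* Illustrative HToE frame: an Ethernet header, an encapsulation header with a
     * sequence number for the retry protocol, and several HT packets that all
     * share one virtual channel and one destination (see Section 3.2). */
    struct htoe_frame {
        uint8_t  dst_mac[6];
        uint8_t  src_mac[6];
        uint16_t ethertype;                /* would identify HToE traffic            */
        uint16_t seq_num;                  /* sequence number for Go-Back-N retry    */
        uint8_t  vchan;                    /* virtual channel of all packed packets  */
        uint8_t  ht_count;                 /* number of encapsulated HT packets      */
        uint8_t  payload[MAX_ETH_PAYLOAD]; /* packed ht_packet entries + payload CRC */
    };

Packing several small HT packets into one frame amortizes the fixed Ethernet header and CRC overhead over multiple HT transactions, which is the motivation given in Section 3.2 for allowing more than one HT packet per Ethernet packet.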


3.1. On-package and Off-package Addressing

HyperTransport address mapping allows for I/O devices and local DRAM to be mapped to physical addresses that are interpreted by the processor for read and write operations. This physical address mapping is hidden from applications using standard virtual addressing techniques in the operating system.

HToE supports a global, system-wide, noncoherent address space. Addresses must be transparently recognized as either local or remote, and the latter must be mapped to memory or device addresses on a remote node. Implicitly, this mapping must translate between address spaces and Ethernet MAC addresses and vice versa. Consequently, this mapping between local HT addresses and the global HToE address space is necessary to encapsulate and transmit HT packets from a local node to a remote node’s memory. Additionally, the remote node must not require modification to its local HyperTransport links in order to route packets that have been sent from a remote node – that is, any remote requests must appear to the local HT link as an access by a local device to a local address. For more details on the specific mapping used by HToE, see Sections 4.2 and 4.3.

3.2. Scaling HyperTransport Ordering and Flow Control for Cluster Environments

HyperTransport is a point-to-point protocol that uses three virtual channels to send and receive command and data packets. The HT protocol has been designed to ensure that packet ordering on these channels is preserved on local links via the HT Section 6 Ordering algorithm [7]. This algorithm ensures not only that packets arrive in a logical order but also that the network remains deadlock free. In a switched Ethernet environment with the possibility of packet loss, preservation of ordering becomes a much more difficult problem. Thus, our HToE solution must ensure that packets remain ordered correctly within their virtual channels. For more information on maintaining order, see Section 4.4.

In addition to packet ordering, the HT 3.1 specification also defines a multi-channel, credit-based flow control algorithm. Credits typically flow between two point-to-point links based on the receipt and processing of packets within each virtual channel. In a scalable, switched Ethernet environment, packets could conceivably flow from multiple sources to one destination. Furthermore, since HyperTransport packets are much smaller than Ethernet packets, another requirement is that multiple HyperTransport packets can be encapsulated in one Ethernet packet to reduce the overhead of encapsulation. Both of these requirements indicate the need for a careful rethinking of how to send HyperTransport credits and packets when using HToE. The requirement is that the sender must possess credits for all HyperTransport packets that it encapsulates, and HyperTransport packets that are encapsulated in a single Ethernet packet must belong to the same virtual channel and be headed for the same destination.
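A sending HTEA therefore has to verify, before building a frame, that it holds a credit for every HT packet it is about to encapsulate and that all of those packets share a virtual channel and destination. The fragment below is a minimal sketch of that check; the data structures and credit accounting are assumptions for illustration, not the specification’s mechanism.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NUM_VCHANS 3                /* Posted, Non-Posted, Response */

    /* Hypothetical sender-side state for one Virtual Link (source/destination pair). */
    struct virtual_link {
        uint8_t  dst_mac[6];            /* destination HTEA                            */
        unsigned credits[NUM_VCHANS];   /* end-to-end credits granted by that receiver */
    };

    /* Returns true if 'count' HT packets, all taken from virtual channel 'vc' and all
     * bound for the destination represented by 'vl', may be packed into one Ethernet
     * frame.  One credit is consumed per encapsulated HT packet. */
    static bool reserve_credits_for_frame(struct virtual_link *vl, unsigned vc, size_t count)
    {
        if (vc >= NUM_VCHANS || vl->credits[vc] < count)
            return false;               /* sender must already hold a credit per packet */
        vl->credits[vc] -= count;
        return true;
    }

    /* Credits come back only when the receiving HTEA frees buffers and advertises
     * them to this sender (see Section 4.4). */
    static void return_credits(struct virtual_link *vl, unsigned vc, unsigned n)
    {
        if (vc < NUM_VCHANS)
            vl->credits[vc] += n;
    }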


3.3. The Benefit of a Congestion-Managed Ethernet Network for Flow Control

One recent development that was investigated for this specification was the introduction of several IEEE specifications, collectively known as Data Center Bridged (DCB) Ethernet or sometimes Converged Enhanced Ethernet (CEE), depending on the company promoting it.

Data Center Bridged Ethernet aims to provide a congestion-managed Ethernet environment to support converged fabrics in the data center and was motivated by the convergence of the Fibre Channel standard onto Ethernet fabric, aka FCoE [29]. These fabrics aim to prevent packet loss due to congestion but do not prevent packet loss due to bit errors or other sources such as equipment failure or fail-over. Data Center Bridged Ethernet incorporates several specifications including per-priority flow control (IEEE 802.1Qbb), congestion notification (IEEE 802.1Qau), and the Data Center Bridging Capabilities Exchange Protocol and Enhanced Transmission Selection (IEEE 802.1Qaz) [17]. These congestion-management algorithms are especially helpful in high-performance computing because of the intensely self-similar nature of HPC traffic.

3.4. Recovery from Failures

HyperTransport 3.1 has several methods for recovering from errors. A special “poison” bit can be set in HT response packets to indicate to the source processor or device that an operation failed (e.g., a read failed to complete). This error notification typically is passed upstream to the initial requesting device without any notion of the initial request’s address. In addition, HyperTransport can use the HT 3.1 retry mechanism to resend packets between source and destination HT devices based on a Go-Back-N algorithm that relies on sequence numbers included in packets. If this mechanism should fail to recover from errors, the host processor has the option to issue a warm or cold reset that is communicated to devices via separate physical signals.

In the HToE environment, these requirements for recovery from errors become more complex due to the nature of HyperTransport transactions and due to the fact that Ethernet does not support the HyperTransport physical signals. Thus the HyperTransport over Ethernet specification must ensure that 1) errors can be appropriately reported to the requesting remote node, 2) resets can be accurately communicated to remote nodes when otherwise unrecoverable failures occur, and 3) resets for traffic between one source and destination HTEA do not affect traffic from other HTEAs.

3.5. Requirements for Retry in HToE

The HT specification defines a retry mechanism that resends packets when errors are discovered, using a Go-Back-N algorithm and sequence numbers for HyperTransport packets. This mechanism must be extended to function over Ethernet and thereby becomes part of the HToE specification. We did not want to rely on TCP’s retry algorithm, and Ethernet does not define a Layer 2 error retry protocol. Therefore, we created a variant of HyperTransport 3.1’s retry algorithm that functions across an Ethernet fabric in the presence of packet loss due to congestion or bit errors.
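A minimal sketch of the sender side of such a Go-Back-N retry, of the kind Section 3.5 describes adapting from HyperTransport 3.1, is shown below; the window size, the stand-in frame type, and the acknowledgement handling are illustrative assumptions rather than details taken from the specification.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define RETRY_WINDOW 32                   /* assumed per-Virtual-Link retry window  */

    struct eth_frame { uint8_t bytes[1536]; size_t len; };   /* stand-in for an HToE frame */

    static void mac_transmit(const struct eth_frame *f) { (void)f; } /* stub for the MAC layer */

    /* Hypothetical retry state kept per Virtual Link. */
    struct retry_state {
        struct eth_frame unacked[RETRY_WINDOW]; /* copies of frames awaiting acknowledgement */
        uint16_t base_seq;                      /* oldest unacknowledged sequence number     */
        uint16_t next_seq;                      /* sequence number of the next new frame     */
    };

    /* Send a new frame if the window allows, retaining a copy for retransmission. */
    static bool send_frame(struct retry_state *rs, const struct eth_frame *f)
    {
        if ((uint16_t)(rs->next_seq - rs->base_seq) >= RETRY_WINDOW)
            return false;                       /* window full: back-pressure encapsulation */
        rs->unacked[rs->next_seq % RETRY_WINDOW] = *f;
        mac_transmit(&rs->unacked[rs->next_seq % RETRY_WINDOW]);
        rs->next_seq++;
        return true;
    }

    /* Go-Back-N: on a timeout or an out-of-sequence report from the receiver, resend
     * everything from the oldest unacknowledged frame onward. */
    static void go_back_n(struct retry_state *rs)
    {
        for (uint16_t s = rs->base_seq; s != rs->next_seq; s++)
            mac_transmit(&rs->unacked[s % RETRY_WINDOW]);
    }

    /* Cumulative acknowledgement: frames with sequence numbers below 'ack' are released. */
    static void handle_ack(struct retry_state *rs, uint16_t ack)
    {
        rs->base_seq = ack;
    }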


Figure 2. HyperTransport Ethernet Adapter with Opteron Memory Subsystem

Figure 3. HyperTransport Ethernet Adapter Virtual Link

4. The HyperTransport Over Ethernet Specification

The HyperTransport over Ethernet specification outlines the basic functionality of the HToE bridge device, or HyperTransport Ethernet Adapter (HTEA), that is used to encapsulate HyperTransport 3.1 packets into Ethernet packets. The location of this device in relation to a typical Opteron system is shown in Figure 2. Note that a normal Ethernet MAC can be shared for both HToE traffic and TCP/IP traffic, although the implementer should decide on how to prioritize each traffic type.

To assist with the implementation of each of the specification’s requirements, functionality in the HTEA is divided into separate “layers” that are implemented in the hardware of the HTEA and that communicate with other layers when processing incoming or outgoing HT packets. Here we describe some of the more interesting aspects of the “mapping” layer, the “ordering” layer, and the “encapsulation” layer. Full details are available in the HToE specification [32].

4.1. HToE’s Relationship with DCB

HyperTransport over Ethernet is intended to be used with switches that have been designed for Data Center Bridging environments, such as those explicitly created to support Fibre Channel over Ethernet. However, some of the DCB specifications would interfere with the normal ordering and priority requirements specified by the HT Section 6 Ordering Requirements. For this reason, many of the solutions specified for ordering and flow control do not explicitly require features like per-flow flow control. This means that HToE could likely be supported on normal 10 GE hardware, but it could also be enhanced by allowing for the usage of the DCBX protocol, per-flow priorities (for packet flows between different sources and destinations), and Enhanced Transmission Selection for usage with other types of network traffic.

4.2. Mapping HT Addresses into the Global Address Space

HyperTransport over Ethernet assumes that the range of memory addresses on each node forms a subset of a global, 64 bit physical address space. In order to map the local HyperTransport address to a global memory address, such as those used with some PGAS models [30], and to a destination Ethernet address for remote nodes, a few of the upper bits from the physical address are used to select among potential remote nodes in the mapping layer of the HTEA, as shown in Figure 2. This mapping allows a processor on a local node to make a remote noncoherent “put” or “get” operation to the memory of a remote node.

While the creation of a mapping table is left up to implementers of the HTEA, the selection of global addresses for a particular HTEA and node can be defined using OS-level communication and subsequent PCI-style Programmed I/O commands to write to the HTEA, or by using the new capabilities of the Data Center Bridging Capabilities Exchange Protocol (DCBX) [17] to communicate mapping parameters at the link layer between DCB-enabled switches.

This mapping of local HT packets to remote nodes also requires the creation of a logical organization scheme to keep track of distinct source and destination pairs, known as a Virtual Link in the specification. As shown in Figure 3, a Virtual Link couples information such as available credits and buffers for the three virtual channels on the local link as well as information like the destination MAC address. Once the mapping layer of the HTEA decides which destination MAC address a particular HT request address maps to, the HyperTransport packet is queued according to available credits and associated buffer space at the remote HTEA. These credits are discussed more in Section 4.4.
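The sketch below illustrates the kind of mapping Section 4.2 describes, in which a few upper bits of the 64-bit global physical address index a table of Virtual Links. The bit split, table layout, and function names are assumptions for illustration, since the specification leaves the mapping table itself to implementers.

    #include <stdbool.h>
    #include <stdint.h>

    #define NODE_BITS   8                         /* assumed: upper 8 address bits select the node */
    #define NODE_SHIFT  (64 - NODE_BITS)
    #define MAX_NODES   (1u << NODE_BITS)

    /* Hypothetical mapping-layer entry: one Virtual Link per remote node. */
    struct vl_entry {
        bool     valid;
        uint8_t  dst_mac[6];                      /* MAC address of the remote HTEA           */
        uint64_t remote_base;                     /* base of the exported region on that node */
    };

    static struct vl_entry vl_table[MAX_NODES];   /* filled by PIO writes or via DCBX          */
    static unsigned        local_node;            /* this node's index in the global space     */

    /* Classify a global physical address: local addresses are passed to the local HT link
     * unchanged, while remote addresses yield the Virtual Link (and hence destination MAC)
     * plus the HT address to use at the remote node. */
    static bool map_global_address(uint64_t gpa, const struct vl_entry **vl,
                                   uint64_t *remote_ht_addr)
    {
        unsigned node = (unsigned)(gpa >> NODE_SHIFT);

        if (node == local_node || !vl_table[node].valid)
            return false;                         /* local access or unmapped destination      */

        *vl             = &vl_table[node];
        *remote_ht_addr = vl_table[node].remote_base + (gpa & ((UINT64_C(1) << NODE_SHIFT) - 1));
        return true;
    }

A request that maps to a remote Virtual Link is then queued behind the credit check of Section 4.4, while a local address is handed to the HT link exactly as it would be without HToE, preserving the transparency requirement of Section 3.1.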


4.3. Tag Remapping for Higher Performance

In addition to mapping local HT requests into the global address space supported by HToE, the HToE specification also supports mapping optimizations for the HTEA that allow for increased scalability while still preserving the local link’s ability to transparently handle remote HT packets without needing knowledge of their source.

One of the limits to scalability in an HToE implementation is related to the number of outstanding Non-Posted requests that can be issued by a HyperTransport device at one time. Since the HTEA interface with the HyperTransport link follows all the normal protocols of a HyperTransport device, it is limited to sending a relatively small number of Non-Posted requests (that require a response packet) to the local link using unique Source Tag (SrcTag) bits. Furthermore, packets that are received at an HTEA may have their own Source Tag bits that conflict with requests from other source HTEAs. For this reason, the HToE standard implements a technique called tag remapping [30] to maximize the number of Non-Posted requests that can be sent to the local HT link. Figure 4 shows how tag remapping works with two conflicting incoming requests. The original SrcTag, Unit ID, and source MAC address are stored in a pending request table on receipt. If a newly arrived request conflicts with a pending request, its SrcTag and Unit ID bits are remapped and the mapping is maintained in the pending request table. On completion of the servicing of a request, the corresponding responses are matched up against this table to restore the SrcTag and Unit ID fields as well as to determine the correct destination HTEA for a response.

Figure 4. Tag Remapping in the HTEA

The HyperTransport specification also specifies an optional technique called Unit ID Clumping that can be used with tag remapping to give the HTEA additional Source Tags for use with the local HyperTransport link. Unit ID Clumping is not a requirement for HToE implementations, but it provides an example of how HToE can be scaled to handle additional sending HTEAs while conforming to the requirements of the original HyperTransport specification.

4.4. HToE Ordering and Flow Control for Multiple Senders, Single Receivers

HToE ordering relies on the HT 3.1 ordering requirements, also known as the HyperTransport Section 6 Ordering Requirements. Although there are no requirements for packets going to different destinations (from different VLs), ordering of packets within a VL is preserved by the HToE retry algorithm and by sending all Ethernet packets for a specific source/destination pair on the same Ethernet priority level.

In contrast to point-to-point communication, an HTEA must receive packets from multiple source HTEAs. To handle this many-to-many communication pattern, the HToE specification uses a very simple credit-based principle for end-to-end buffer management – any HyperTransport packets that are sent to a remote node must have a standard HyperTransport credit for the Virtual Link before they can be encapsulated into an Ethernet packet. Additionally, each HT credit is equal to one buffer in the receiving HTEA.

Unlike HT links, where HT credit-carrying NOP packets continuously flow on the physical link, credits are passed in the HToE environment only when the receiving HTEA has available buffers for incoming HT packets. A certain number of buffers must be reserved to allow sending HTEAs to initiate new connections, but additional buffers and credits are allocated by the receiving HTEA as its flow control and credit allocation schemes specify.

As buffers are filled in a receiving HTEA, the lack of available credits introduces backpressure on the sending HTEAs. Figure 5 shows how this backpressure causes buffers in the sending HTEA at Node 1 to become full, pausing transactions until more credits are available. Note that since each Virtual Link has its own set of credits, lack of credits for one source-destination pair should not affect the traffic for another VL.

Figure 5. HToE Backpressure-based Flow Control

The HToE specification defines the minimum required flow control mechanism. However, it mentions, and leaves open, many opportunities to optimize the allocation of credits and buffers to multiple senders.

4.5. Encapsulation and Support for Recovery and Resets

In addition to specifying how HyperTransport packets are packed into Ethernet packets, the encapsulation layer also interacts with recovery and reset mechanisms that have been adapted from HT 3.1 to handle HToE packets. Each HToE packet contains a special sequence number that is used by the HToE retry algorithm to determine if HToE packets are received in order. This sequence number and retry algorithm are very similar to the HT 3.1 Go-Back-N algorithm, but each sequence number refers to an entire HToE packet, not just one HT packet. Further error checking is provided by a CRC on the HToE payload in addition to the normal Ethernet CRC.

In the case of an unrecoverable error that leads to a reset, the encapsulation layer specifies a method for performing link-level resets of one or more Virtual Links that is similar to HyperTransport’s concept of cold and warm resets. Since HyperTransport over Ethernet does not include the additional physical sideband signals that HyperTransport devices normally include (such as the power and reset signals), resets must be passed using packets or using OS-level communication. A special encapsulation packet header defines fields for these selective resets, limits their scope, and keeps the entire HTEA from having to reset due to an error between one source and one destination.

While some errors lead to a reset, many errors just require a response to notify the original requesting processor that a request packet has not received a valid response. Similar to how HT 3.1 specifies a method for sending responses with error bits to notify of errors, HToE allows for remote transactions to be terminated and handles error notification. To do this, the HTEA must keep track of sent HyperTransport packets that require a response (Non-Posted packets), and if it receives a notification that the response has been lost or the remote node has been reset, it can then reply with a normal HT 3.1 packet with the “poison” or error bit set. This additional state for remote requests allows for easier error detection and detection of request timeouts.
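One way to picture the bookkeeping that Section 4.5 requires of an HTEA for outstanding Non-Posted requests is sketched below; the table size, timeout field, and helper names are hypothetical, not taken from the specification.

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_OUTSTANDING 32          /* assumed limit on in-flight Non-Posted requests */

    /* Hypothetical record of one Non-Posted request awaiting a response. */
    struct np_request {
        bool     in_use;
        uint8_t  src_tag;               /* (possibly remapped) SrcTag on the local HT link */
        uint8_t  unit_id;
        uint16_t vl_id;                 /* Virtual Link the request was sent on            */
        uint64_t deadline;              /* time after which the response is presumed lost  */
    };

    static struct np_request pending[MAX_OUTSTANDING];

    /* Stub standing in for injecting an HT 3.1 response with the "poison"/error bit set
     * onto the local link, so the original requester observes the failure. */
    static void send_poisoned_response(uint8_t src_tag, uint8_t unit_id)
    {
        (void)src_tag;
        (void)unit_id;
    }

    /* Invoked when a lost-response notification arrives or when Virtual Link 'vl' is
     * reset; every pending Non-Posted request on that link is completed with an error
     * instead of leaving the requesting processor waiting forever.  A periodic sweep
     * could apply the same completion to entries whose deadline has passed. */
    static void fail_pending_requests(uint16_t vl)
    {
        for (int i = 0; i < MAX_OUTSTANDING; i++) {
            if (pending[i].in_use && pending[i].vl_id == vl) {
                send_poisoned_response(pending[i].src_tag, pending[i].unit_id);
                pending[i].in_use = false;
            }
        }
    }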


4.6. Security in HToE-enabled Data Centers

Since HyperTransport over Ethernet enables easy, transparent (OS interaction is not necessarily needed) hardware sharing of noncoherent memory between nodes, more care must be taken to make sure that malicious HyperTransport packets are not inserted into an Ethernet packet and sent to a remote node. While certain HPC-oriented clusters that are not used to handle web-related data may not have as high security requirements, networks exposed to the Internet may require additional security measures. Fortunately, HToE defines the use of IEEE 802.1AE MACsec to provide for encrypted 10 Gigabit Ethernet traffic between nodes.

5. Resource Sharing with HToE - Use Cases

The creation of a high-performance, scalable, commodity network using HyperTransport over Ethernet opens up the possibility of many application models that are based on low-latency noncoherent communication. Here we present two potential usages of this commodity standard to promote resource sharing within a data center or HPC environment. Both are predicated on the assumption that future clusters will be limited not necessarily by processing power but rather by factors like TCO and power usage.

5.1. PGAS Support for Virtualizing DIMMs and DRAM Power Efficiency

Previous research has examined the use of HyperTransport over Ethernet as the hardware support for a PGAS implementation that can be used to reduce DRAM overprovisioning in servers in data centers [33]. DRAM in data centers is typically overprovisioned to handle infrequent peaks in workloads, but low-latency memory transfers can help reduce the need for overprovisioning while also providing much lower latency than swapping data out to disk.

These low-latency remote memory accesses provide an alternative to existing RDMA models and also allow for the “virtualizing” of DIMMs on remote nodes. This means that a node could request the use of part of a remote DIMM for noncoherent accesses to grow its own available memory temporarily. At the same time, applications running on the local node are unaware of the DIMM’s actual location due to the transparent address mapping of a local HT request address into the global address space, the transmission of a low-latency HToE packet, and traditional CPU techniques that are used to hide normal memory access latency.

DIMM virtualization can provide opportunities for reducing the amount of installed DRAM in a data center, based on average memory requirements rather than peak requirements. For instance, a 10,000 core data center might currently consist of 625 individual blades, each with 4 sockets and quad-core CPUs. Based on previous estimates of memory requirements for data center workloads [6], each blade would require anywhere from 32 to 64 GB of DRAM in an overprovisioned scenario. The current retail price of a registered 8 gigabyte DDR3-1333 DIMM is around $300 [10], so reducing the amount of memory by 50% (from 64 GB to 32 GB) would save $750,000 over the entire data center. A 75% reduction would save $1,125,000 in memory costs alone, not to mention TCO related to cooling and power. Using HP’s online power calculator, we can also estimate that this reduction in memory would save either 8,500 Watts (50% reduction) or 12,750 Watts (75%) due to related reductions in idle memory power [19].
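Assuming each blade's 64 GB is built from eight of the 8 GB DIMMs priced above (an assumption consistent with, but not stated in, the example), the quoted savings follow directly:

    $625 \text{ blades} \times 4 \text{ DIMMs removed} \times \$300 = \$750{,}000$   (50% reduction, 64 GB to 32 GB)
    $625 \text{ blades} \times 6 \text{ DIMMs removed} \times \$300 = \$1{,}125{,}000$   (75% reduction, 64 GB to 16 GB)

The 8,500 Watt and 12,750 Watt figures are taken from the HP power calculator [19] rather than derived from this arithmetic.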


5.2. Pooled Accelerators to Reduce Cluster TCO and Power Usage

In addition to virtualizing DRAM, there are also several researchers interested in virtualizing and sharing accelerators, such as GPUs. Provisioning an entire cluster with GPU cards can prove to be cost- and power-inefficient, especially in situations where only a few applications can take advantage of the benefits of better performance on these accelerators. In the same vein as other approaches that utilize MPI or sockets to access remote accelerators [13], HToE can be used as an enabling technology to allow for pooling accelerators (i.e., sharing a few accelerators between a larger number of general purpose nodes) and reducing cost and power inefficiency in the cluster.

While current approaches to remote accelerator access would likely rely on using HToE packets to perform remote reads and writes to shared CPU-GPU memory pages, it is foreseeable that GPUs could be accessed directly using HyperTransport packets, either natively or after being translated over the PCI Express bus. The availability of direct access to GPUs using HyperTransport packets would allow remote nodes to directly read or write GPU DRAM and would provide a much higher performing model for sharing remote accelerators between nodes in a cluster.

Using our example cluster from Section 5.1 with mid-range GPUs, we can give a simplistic approximation of how pooled accelerators could be used to reduce overall cost and power usage. We assume that a blade could potentially house two PCIe-based GPU cards and that these GPUs are not typically fully utilized. The Fermi-based NVIDIA GeForce GTX 570 GPU currently retails for around $350 and has a maximum power dissipation of 220 Watts [26] and an idle power dissipation of around 30 Watts [2]. In our pooled accelerator scenario, one GPU could be shared between two adjacent blades, providing a 75% reduction in cost ($328,125 for the entire data center). More importantly, the power consumption due to idle GPUs would be reduced by at least 28,125 Watts (assuming each GPU uses 30 Watts when inactive).

These savings are highly dependent on the expected workload, but the existence of pooled accelerators would allow for much greater flexibility in the initial provisioning and upgrading of clusters to meet computational, power, and TCO requirements.

6. Conclusions

As part of the new HyperShare strategy, HyperTransport over Ethernet (HToE) provides a low-cost, commodity standard that can be used to enable new higher performance models of resource sharing in clusters and data centers. This specification proposes several engineering solutions for encapsulating HyperTransport packets over a highly scalable, many-to-many interconnect, and it provides cost- and performance-related motivation for using HToE in environments where 10 Gigabit Ethernet is already deployed and where 40 or 100 Gigabit Ethernet is likely to gain future market share.

Additionally, we have proposed several usage cases to demonstrate how HToE can be utilized to dramatically improve resource sharing for overprovisioned hardware such as DRAM and expensive accelerators such as GPUs. The HToE standard can enable these sharing techniques in data centers while taking advantage of the cost, scalability, and management benefits associated with Ethernet interconnect technology.

References

[1] IEEE 802.3ae 10Gb/s Ethernet Task Force. 10 Gigabit Ethernet 802.3ae standard. 2002. http://grouper.ieee.org/groups/802/3/ae/index.html.
[2] NVIDIA's GeForce GTX 570: Filling in the gaps - power, temperature, and noise. 2011. http://www.anandtech.com/show/4051/nvidias-geforce-gtx-570-filling-in-the-gaps/15.
[3] InfiniBand Trade Association. RDMA over Converged Ethernet specification. 2010. http://www.infinibandta.org.
[4] Pavan Balaji, Wu-chun Feng, and Dhabaleswar K. Panda. Bridging the Ethernet-Ethernot performance gap. IEEE Micro, 26:24–40, May 2006.
[5] Ulrich Bruening. The HTX board: The universal HTX test platform. http://www.hypertransport.org/members/u_of_man/htx_board_data_sheet_UoH.pdf.
[6] S. Chalal and T. Glasgow. Memory sizing for server virtualization. 2007. http://communities.intel.com/docs/.
[7] HyperTransport Consortium. HyperTransport specification, 3.10. 2008. http://www.hypertransport.org.
[8] HyperTransport Consortium. Clustering 360 market analysis. 2010. http://www.hypertransport.org/default.cfm?page=Clustering360.
[9] Pat Conway and Bill Hughes. The AMD Opteron northbridge architecture. IEEE Micro, 27(2):10–21, 2007.
[10] Crucial memory 8 GB, DDR3 PC3-10600 memory module pricing. 2011. http://www.crucial.com/server/index.aspx.
[11] Uri Cummings. FocalPoint: A low-latency, high-bandwidth Ethernet switch chip. In Hot Chips 18, 2006. http://www.hotchips.org/archives/hc18/3_Tues/HC18.S8/HC18.S8T1.pdf.
[12] D. Dalessandro, P. Wyckoff, and G. Montry. Initial performance evaluation of the NetEffect 10 Gigabit iWARP adapter. In Cluster Computing, 2006 IEEE International Conference on, pages 1–7, 2006.
[13] J. Duato, A. J. Peña, F. Silla, R. Mayo, and E. S. Quintana-Orti. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters. In High Performance Computing and Simulation (HPCS), 2010 International Conference on, pages 224–231, July 2010.
[14] J. Duato, F. Silla, S. Yalamanchili, B. Holden, P. Miranda, J. Underhill, M. Cavalli, and U. Bruning. Extending HyperTransport protocol for improved scalability. In First International Workshop on HyperTransport Research and Applications, 2009. http://ra.ziti.uni-heidelberg.de/coeht/pages/events/20090212/whtra09-paper16.pdf.
[15] Fibre Channel over Ethernet FC-BB-5 standard. 2010. http://www.t11.org/fcoe.
[16] Silicon Graphics. SGI NUMAlink: Industry leading interconnect technology (white paper). 2005. http://www.sgi.com.
[17] IEEE 802.1 Working Group. IEEE 802.1Qaz standards page (in progress). http://www.ieee802.org/1/pages/802.1az.html.
[18] Interconnect share of Top 500 for November 2010 - HPC Top 500. 2010. http://www.top500.org.
[19] HP Power Advisor. 2011. http://h18000.www1.hp.com/products/solutions/power/advisor-online/HPPowerAdvisor.html.


[20] Swamy N. Kandadai and Xinghong He. Performance of HPC applications over InfiniBand, 10 Gb and 1 Gb Ethernet. 2010. http://www.chelsio.com/assetlibrary/whitepapers/HPC-APPS-PERF-IBM.pdf.
[21] M. Ko, D. Eisenhauer, and R. Recio. A case for convergence enhanced Ethernet: Requirements and applications. In Communications, 2008. ICC '08. IEEE International Conference on, pages 5702–5707, May 2008.
[22] Rajesh Kota and Rich Oehler. Horus: Large-scale symmetric multiprocessing for Opteron systems. IEEE Micro, 25(2):30–40, 2005.
[23] Jiuxing Liu, Jiesheng Wu, Sushmitha P. Kini, Pete Wyckoff, and Dhabaleswar K. Panda. High performance RDMA-based MPI implementation over InfiniBand. In Proceedings of the 17th Annual International Conference on Supercomputing, ICS '03, pages 295–304, New York, NY, USA, 2003. ACM.
[24] Myricom's Myri-10G 10-Gigabit Ethernet solutions. 2010. http://www.myri.com/Myri-10G/10gbe_solutions.html.
[25] Juniper Networks. Press release for Juniper Networks' T1600 100 GE core router. 2009. http://www.juniper.net/us/en/company/press-center/press-releases/2009/pr_2009_06_08-09_00.html.
[26] NVIDIA GeForce GTX 570 specification. 2011. http://www.nvidia.com/object/product-geforce-gtx-570-us.html.
[27] M. Schlansker, J. Tourrilhes, Y. Turner, and J. R. Santos. Killer fabrics for scalable datacenters. In Communications (ICC), 2010 IEEE International Conference on, pages 1–6, May 2010.
[28] Woven Systems. 10 GE fabric delivers consistent high performance for computing clusters at Sandia National Labs. 2007. http://www.chelsio.com/assetlibrary/pdf/Sandia_Benchmark_Tech_Note.pdf.
[29] Jon Tate. An introduction to Fibre Channel over Ethernet, and Fibre Channel over Convergence Enhanced Ethernet. 2009. http://www.redbooks.ibm.com/redpapers/pdfs/redp4493.pdf.
[30] Sudhakar Yalamanchili, Jose Duato, Jeffrey Young, and Federico Silla. A dynamic, partitioned global address space model for high performance clusters. Technical report, 2008. http://www.cercs.gatech.edu/tech-reports/tr2008/git-cercs-08-01.pdf.
[31] Yoichi Koyanagi, Tadafusa Niinomi, and Yasushi Umezawa. 10 Gigabit Ethernet switch blade for large-scale blade servers. Fujitsu Scientific and Technical Journal, 46(1):56–62, 2010.
[32] Jeff Young and Brian Holden. HyperTransport over Ethernet specification, 1.0. 2010. http://www.hypertransport.org.
[33] Jeffrey Young and Sudhakar Yalamanchili. Dynamic partitioned global address spaces for power efficient DRAM virtualization. In Works in Progress in Green Computing, 2010 International Green Computing Conference, 2010.
