InfiniBand and 10-Gigabit Ethernet for Dummies

Designing Cloud and Grid Computing Systems with InfiniBand and High-Speed Ethernet
A Tutorial at CCGrid '11

Dhabaleswar K. (DK) Panda, The Ohio State University
E-mail: [email protected], http://www.cse.ohio-state.edu/~panda

Sayantan Sur, The Ohio State University
E-mail: [email protected], http://www.cse.ohio-state.edu/~surs

Presentation Overview
• Introduction
• Why InfiniBand and High-speed Ethernet?
• Overview of IB, HSE, their Convergence and Features
• IB and HSE HW/SW Products and Installations
• Sample Case Studies and Performance Numbers
• Conclusions and Final Q&A

Current and Next Generation Applications and Computing Systems
• Diverse range of applications
  – Processing and dataset characteristics vary
• Growth of High Performance Computing
  – Growth in processor performance: chip density doubles every 18 months
  – Growth in commodity networking: increasing speed and features at decreasing cost
• Different kinds of systems: clusters, grids, clouds, datacenters, …

Cluster Computing Environment
[Figure: a compute cluster (frontend and compute nodes on a LAN) attached to a storage cluster (meta-data manager and I/O servers holding data).]

Trends for Computing Clusters in the Top 500 List (http://www.top500.org)
  Nov. 1996:  0/500  (0.0%)   Nov. 2001:  43/500  (8.6%)   Nov. 2006: 361/500 (72.2%)
  Jun. 1997:  1/500  (0.2%)   Jun. 2002:  80/500 (16.0%)   Jun. 2007: 373/500 (74.6%)
  Nov. 1997:  1/500  (0.2%)   Nov. 2002:  93/500 (18.6%)   Nov. 2007: 406/500 (81.2%)
  Jun. 1998:  1/500  (0.2%)   Jun. 2003: 149/500 (29.8%)   Jun. 2008: 400/500 (80.0%)
  Nov. 1998:  2/500  (0.4%)   Nov. 2003: 208/500 (41.6%)   Nov. 2008: 410/500 (82.0%)
  Jun. 1999:  6/500  (1.2%)   Jun. 2004: 291/500 (58.2%)   Jun. 2009: 410/500 (82.0%)
  Nov. 1999:  7/500  (1.4%)   Nov. 2004: 294/500 (58.8%)   Nov. 2009: 417/500 (83.4%)
  Jun. 2000: 11/500  (2.2%)   Jun. 2005: 304/500 (60.8%)   Jun. 2010: 424/500 (84.8%)
  Nov. 2000: 28/500  (5.6%)   Nov. 2005: 360/500 (72.0%)   Nov. 2010: 415/500 (83.0%)
  Jun. 2001: 33/500  (6.6%)   Jun. 2006: 364/500 (72.8%)   Jun. 2011: to be announced

Grid Computing Environment
[Figure: multiple compute/storage clusters, each as above, interconnected over a WAN.]

Multi-Tier Datacenters and Enterprise Computing
[Figure: an enterprise multi-tier datacenter with routers/servers (Tier 1), application servers (Tier 2), and database servers (Tier 3), connected by switches.]

Integrated High-End Computing Environments
[Figure: a compute cluster and storage cluster connected over a LAN/WAN to an enterprise multi-tier datacenter for visualization and mining.]

Cloud Computing Environments
[Figure: physical machines hosting VMs with local storage, connected over a LAN to a virtual file system with a meta-data server and I/O servers.]

Hadoop Architecture
• Built on the Hadoop Distributed File System (HDFS)
• Fault tolerance by replicating data blocks
• NameNode: stores information on data blocks
• DataNodes: store blocks and host Map-Reduce computation
• JobTracker: tracks jobs and detects failures
• The model scales, but the intermediate (shuffle) phases generate a high amount of communication

Memcached Architecture
[Figure: Internet-facing clients reach web-frontend servers, which talk to memcached servers and database servers over a system area network.]
• Distributed caching layer
  – Aggregates spare memory from multiple nodes
  – General purpose
• Typically used to cache database queries and results of API calls
• Scalable model, but typical usage is very network intensive (see the client sketch below)
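As a concrete illustration of that traffic pattern, here is a minimal client sketch using the libmemcached C API. The server address, port, key, and value are hypothetical placeholders; error handling is abbreviated. The point to notice is that the set and the get are each a full network round trip to a cache server, which is why heavy memcached use is so network intensive.

/* Minimal memcached client sketch using libmemcached.
 * Build (assuming libmemcached is installed):
 *   cc memc.c -lmemcached -o memc
 * The server address, key, and value are placeholders. */
#include <libmemcached/memcached.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    memcached_st *memc = memcached_create(NULL);
    /* Point the client at one (hypothetical) cache server. */
    memcached_server_add(memc, "10.0.0.1", 11211);

    /* Cache a database result under a key derived from the query. */
    const char *key = "user:42:profile";
    const char *val = "{\"name\": \"example\"}";
    memcached_return_t rc = memcached_set(memc, key, strlen(key),
                                          val, strlen(val),
                                          (time_t)300 /* expire in 5 min */,
                                          (uint32_t)0 /* flags */);
    if (rc != MEMCACHED_SUCCESS)
        fprintf(stderr, "set failed: %s\n", memcached_strerror(memc, rc));

    /* Later, a web-frontend checks the cache before hitting the database. */
    size_t len;
    uint32_t flags;
    char *cached = memcached_get(memc, key, strlen(key), &len, &flags, &rc);
    if (rc == MEMCACHED_SUCCESS) {
        printf("cache hit: %.*s\n", (int)len, cached);
        free(cached);  /* memcached_get returns malloc'd memory */
    } else {
        /* Cache miss: query the database, then memcached_set the result. */
    }

    memcached_destroy(memc);
    return 0;
}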
Networking and I/O Requirements
• Good system area networks with excellent performance (low latency, high bandwidth, and low CPU utilization) for inter-processor communication (IPC) and I/O
• Good storage area networks with high-performance I/O
• Good WAN connectivity in addition to intra-cluster SAN/LAN connectivity
• Quality of Service (QoS) for interactive applications
• RAS (Reliability, Availability, and Serviceability)
• All of the above at low cost

Major Components in Computing Systems
• Hardware components
  – Processing cores and the memory subsystem
  – I/O bus or links
  – Network adapters/switches
• Software components
  – Communication stack
• Bottlenecks at any of these components (processing, I/O interface, or network) can artificially limit the network performance the user perceives
[Figure: the path from cores and memory through the I/O bus, network adapter, and network switch, with processing, I/O interface, and network bottlenecks marked.]

Processing Bottlenecks in Traditional Protocols
• Examples: TCP/IP, UDP/IP
• Generic architecture that works over any network
• The host processor handles almost all aspects of communication:
  – Data buffering (copies on both sender and receiver)
  – Data integrity (checksums)
  – Routing aspects (IP routing)
• Signaling between the different layers:
  – Hardware interrupts on packet arrival or transmission
  – Software signals between layers to handle protocol processing at different priority levels
(The sketch below illustrates this host-based model.)
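To make the slide's point concrete, here is a minimal TCP sender using standard POSIX sockets; the peer address and port are placeholders. write() returns once the kernel has copied the user buffer into its own socket buffers, and checksumming, segmentation, and IP routing all run on the host CPU, with interrupts signaling packet events.

/* Minimal TCP sender sketch (POSIX sockets).  Every byte sent here is
 * copied from the user buffer into kernel socket buffers, checksummed,
 * and segmented by the host CPU -- the processing overhead described
 * above.  The peer address and port are placeholders. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_in peer = { .sin_family = AF_INET,
                                .sin_port   = htons(5000) };
    inet_pton(AF_INET, "10.0.0.2", &peer.sin_addr);

    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        perror("connect");
        return 1;
    }

    char buf[64 * 1024];
    memset(buf, 'x', sizeof buf);

    /* write() returns as soon as the kernel has *copied* buf into its
     * own socket buffer; the copy and the later checksum/segmentation
     * work all consume host CPU cycles. */
    ssize_t n = write(fd, buf, sizeof buf);
    if (n < 0)
        perror("write");

    close(fd);
    return 0;
}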
Bottlenecks in Traditional I/O Interfaces and Networks
• Traditionally relied on bus-based technologies (the "last mile" bottleneck)
  – E.g., PCI, PCI-X
  – One bit per wire
  – Performance increased by raising the clock speed or widening the bus
  – Not scalable:
    • Cross talk between bits
    • Skew between wires
    • Signal integrity makes it difficult to increase bus width significantly, especially at high clock speeds

  PCI     1990          33 MHz / 32 bit        1.05 Gbps (shared bidirectional)
  PCI-X   1998 (v1.0)   133 MHz / 64 bit       8.5 Gbps (shared bidirectional)
  PCI-X   2003 (v2.0)   266-533 MHz / 64 bit   17 Gbps (shared bidirectional)
  (Peak rate is roughly clock rate times bus width; e.g., 33 MHz x 32 bit ≈ 1.05 Gbps.)

Bottlenecks in Traditional Networks
• Network speeds saturated at around 1 Gbps
  – The features provided were limited
  – Commodity networks were not considered scalable enough for very large-scale systems

  Ethernet (1979-)           10 Mbit/sec
  Fast Ethernet (1993-)      100 Mbit/sec
  Gigabit Ethernet (1995-)   1000 Mbit/sec
  ATM (1995-)                155/622/1024 Mbit/sec
  Myrinet (1993-)            1 Gbit/sec
  Fibre Channel (1994-)      1 Gbit/sec

Motivation for InfiniBand and High-speed Ethernet
• Both are industry networking standards, introduced to address these bottlenecks
• InfiniBand aimed at all three bottlenecks: protocol processing, I/O bus, and network speed (see the verbs sketch at the end of this section)
• Ethernet aimed directly at the network speed bottleneck, relying on complementary technologies to alleviate the protocol processing and I/O bus bottlenecks

IB Trade Association
• The IB Trade Association was formed by seven industry leaders (Compaq, Dell, HP, IBM, Intel, Microsoft, and Sun)
• Goal: to design a scalable, high-performance communication and I/O architecture by taking an integrated view of computing, networking, and storage technologies
• Many other industry partners participated in the effort to define the IB architecture specification
• The IB Architecture (Volume 1, Version 1.0) was released to the public on Oct 24, 2000
  – The latest version, 1.2.1, was released in January 2008
• http://www.infinibandta.org

High-speed Ethernet Consortium (10GE/40GE/100GE)
• The 10GE Alliance was formed by several industry leaders to take the Ethernet family to the next speed step
• Goal: to achieve a scalable, high-performance communication architecture while maintaining backward compatibility with Ethernet
• http://www.ethernetalliance.org
• 40-Gbps (servers) and 100-Gbps Ethernet (backbones, switches, routers): IEEE 802.3 WG
• Energy-efficient and power-conscious protocols
  – On-the-fly link speed reduction for under-utilized links

Tackling Communication Bottlenecks with IB and HSE
• Network speed bottlenecks
• Protocol processing bottlenecks
• I/O interface bottlenecks

Network Bottleneck Alleviation: InfiniBand ("Infinite Bandwidth") and High-speed Ethernet (10/40/100 GE)
• Bit serial
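For contrast with the TCP path shown earlier, the following sketch shows the opening steps of a program using the OpenFabrics verbs API, the interface commonly used to program InfiniBand adapters. It is a setup-only sketch under stated assumptions (first device used, arbitrary 4 KB buffer, no queue pairs or actual data transfer): once memory is registered, the adapter can DMA data to and from it directly, without per-message kernel copies or host-side checksumming.

/* Sketch of kernel-bypass setup with the OpenFabrics verbs API
 * (link with -libverbs).  Only device and memory setup is shown;
 * creating completion queues and queue pairs, exchanging addresses,
 * and posting sends or RDMA writes would follow. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Enumerate RDMA-capable devices and open the first one. */
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) {
        fprintf(stderr, "no RDMA device found\n");
        return 1;
    }
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "cannot open device\n");
        return 1;
    }

    /* A protection domain scopes which resources may touch which memory. */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Register (pin) a buffer with the adapter.  After this, the NIC can
     * DMA into and out of buf directly -- no per-message kernel copies
     * or checksumming on the host CPU, unlike the TCP path above. */
    void *buf = malloc(4096);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr");
        return 1;
    }

    printf("registered 4 KB buffer, lkey=0x%x rkey=0x%x\n",
           mr->lkey, mr->rkey);

    /* ... create a completion queue and queue pair, exchange addresses
       out of band, then post work requests directly to the adapter ... */

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return 0;
}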
