High-End HPC Architectures


High-End HPC Architectures
Mithuna Thottethodi
School of Electrical and Computer Engineering, Purdue University

What Makes a Supercomputer a Supercomputer
• Top 500 (www.top500.org)
  – Processor family: Intel/AMD/Power families
    • 96% of the top 500
  – Operating systems
    • Linux/Unix/BSD dominate
  – Scale
    • Ranges from 128 to 128K processors
  – Interconnect
    • Also varies significantly
• Is this really surprising?
• A better interconnect can scale to more CPUs

#1 and #500 over Time
[Chart: performance of the #1 and #500 systems over time]

The Pyramid
[Pyramid figure: Number of Systems vs. Cost/Performance]
• 3 of the top 5 and 13 of the top 50 are BlueGene solutions
• MPPs (60% of the top 50 vs. 21% of the top 500)
• Clusters (~75%)
  – with high-performance interconnect (#8 and #9 in the top 10)
  – with Gigabit Ethernet (41% of the top 500, only 1 in the top 50)

Outline
• Interconnects?
  – Connectivity: MPP vs. cluster
  – Latency, bandwidth, bisection
• When is computer A faster than computer B?
  – Algorithm scaling, concurrency, communication
• Other issues
  – Storage I/O, failures, power
• Case studies

Connectivity
• How fast/slow can the processor get information on/off the network?
  – How far is the on-ramp/the exit from the source/destination?
  – Can limit performance even if the network is fast

Massively Parallel Processing (MPP)
[Diagram: the network interface sits on the memory bus next to the processor and cache; an I/O bridge connects the memory bus to main memory and to the I/O bus with its disk controller and disks]
• Network interface typically close to the processor
  – Memory bus:
    • locked to a specific processor architecture/bus protocol
  – Registers/cache:
    • only in research machines
• Time-to-market is long
  – processor must already be available, or the builders work closely with the processor designers
• Maximizes performance, and cost
• Examples: NUMAlink, BlueGene network

Clusters
[Diagram: the core chip set connects the processor and cache to main memory and the I/O bus; the network interface sits on the I/O bus alongside the disk and graphics controllers, and reaches the processor via interrupts]
• Network interface on the I/O bus
• Standards (e.g., PCI-X) => longer life, faster time to market
• Slow to access the network interface
• Examples: Quadrics, Myrinet, Infiniband, GigE

Link Speeds

Technology                 Vendor     MPI latency        Bandwidth per link
                                      (usec, short msg)  (unidirectional, MB/s)
NUMAlink 4 (Altix)         SGI        1                  3200
RapidArray (XD1)           Cray       1.8                2000
QsNet II                   Quadrics   2                  900
Infiniband                 Voltaire   3.5                830
High Performance Switch    IBM        5                  1000
Myrinet XP2                Myricom    5.7                495
SP Switch 2                IBM        18                 500
Ethernet                   Various    30                 100

Source: http://www.sgi.com/products/servers/altix/numalink.html

Topology
• Link speeds alone are not sufficient
• Topology matters
  – Bisection
    • The weakest links
    • The most likely spot for traffic jams and unnecessary serialization
  – Not cost-neutral
    • Cost-performance is important
• 64K nodes in BlueGene/L
• No node farther than 64 hops from any other
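To make the Link Speeds table concrete, the sketch below applies the standard first-order "alpha-beta" model of message transfer time, T(m) = alpha + m/beta, to two rows of the table (NUMAlink 4 and Ethernet). The model and the chosen message sizes are illustrative assumptions, not measurements from the slides.

```python
# Standard first-order "alpha-beta" message-time model:
#   T(m) = alpha + m / beta
# alpha = short-message MPI latency, beta = unidirectional link bandwidth.
# The two entries below are copied from the Link Speeds table above.

links = {
    # technology: (MPI latency in microseconds, bandwidth in MB/s)
    "NUMAlink 4": (1.0, 3200.0),
    "Ethernet": (30.0, 100.0),
}

def transfer_time_us(tech: str, msg_bytes: int) -> float:
    """Predicted one-way transfer time in microseconds."""
    latency_us, bandwidth_mb_s = links[tech]
    bytes_per_us = bandwidth_mb_s  # 1 MB/s = 1e6 B/s = 1 byte per microsecond
    return latency_us + msg_bytes / bytes_per_us

for size in (8, 1024, 1 << 20):  # 8 B, 1 KiB, 1 MiB (illustrative sizes)
    fast = transfer_time_us("NUMAlink 4", size)
    slow = transfer_time_us("Ethernet", size)
    print(f"{size:>8} B: NUMAlink {fast:9.2f} us  Ethernet {slow:9.2f} us  "
          f"ratio {slow / fast:5.1f}x")
```

For 8-byte messages both transfers are latency-bound, so the 30x latency gap is what separates the two technologies; only at megabyte sizes does the 32x bandwidth gap take over. This is why the table reports short-message MPI latency separately from link bandwidth.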
Outline
• Interconnects?
  – Connectivity: MPP vs. cluster
  – Latency, bandwidth, bisection
• When is computer A faster than computer B?
  – Algorithm scaling, concurrency, communication
• Other issues
  – Storage I/O, failures, power
• Case studies

Caveat: Methodology
• When is computer A faster than computer B?
• Before answering that question: which is better, a car or a bus?
  – If metric = cost AND typical payload = 2
    • Car wins
  – If metric = persons delivered per unit time and cost AND typical payload = 30
    • Bus wins
• Back to the original question

Caveat: Methodology
• When is computer A faster than computer B?
• The Top500.org answer
  – Flops on LINPACK
  – Rewards scaling and interconnect performance
  – Other cases?
    • Application does not scale (~ only 2 people ride)
    • Application scales even without a better interconnect

Case 2
• Independent task parallelism
  – Run 1000 simulations with different parameters
  – Scenario 1: run 100 simulations on 100 machines
    • Repeat 10 times
  – Scenario 2: run 200 simulations on 200 machines
    • Repeat 5 times
• No need to parallelize the application
• No need for an MPP
  – A large cluster is adequate
  – A more expensive interconnect is a waste of money
  – Storage I/O may still be the bottleneck

Case 1
• Application does not scale
• Cost exceeds benefit
  – Cost of increased parallelism
  – Benefit of concurrency
• Need a new scalable parallel algorithm or parallelization
  – Why pay for an aggressive machine?

Storage
• Can only compute as fast as data can be fed
• Large-data-set HPC is often disk-bound
• The Top 500 does not report storage subsystem statistics
• Anecdotes of disk array racks being moved physically

Miscellany
• Interaction of scale and MTTF
  – 64K-processor system: failures every 6 days
  – Time to repair?
    • n days
    • effective performance scaled by 6/(6+n)?
• Power
  – 1.2 MW for 64K nodes
  – Cooling
• Multicore
  – Increasingly multicore-based platforms in the top 500
  – Anecdotes of users using only one core
    • Using the other core degrades performance by 20%

Case Studies
• High-end MPP: Blue Gene/L (at LLNL)
  – #1 for two years (four ranking cycles)
  – 3 of the top 5
  – 13 of the top 50
• MPP with support for global shared memory
  – SGI Altix 4700
• Clusters
  – 74% of the top 500

The BlueGene Interconnect
• 3D torus
  – Cube with wraparound links
  – 64K nodes
  – No node is more than 64 hops away (see the sketch below)
  – Clever tricks to create subtori
• Multiple networks
  – Global reduce
  – Tree-based network
  – GigE
Source: llnl.gov

SGI Altix
• Altix 4700
• Global shared memory
  – Up to 128 TB
  – Very fast MPI performance
• Fat-tree topology with NUMAlink links
  – Attached at the memory bus (MPP)
  – A hub chip catches remote loads/stores
    • Translates them into network traffic
• Support for FPGA acceleration (RASC)

Clusters
• The most accessible platform
• Great with a fast interconnect
• Under-representation in the top 10 may not be relevant for many application domains
  – Clusters are as fast as BlueGene/L if communication is minimal
• Cost-performance leader
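The 64-hop figure quoted for the BlueGene/L torus follows directly from torus geometry: with wraparound links, the farthest node along a dimension of size d is floor(d/2) hops away, and the worst case is the sum over dimensions. The sketch below assumes the commonly reported 64x32x32 BlueGene/L torus shape, which the slides themselves do not state (they give only "64K nodes" and "64 hops").

```python
# Worst-case hop count in a torus with wraparound links: in each
# dimension the farthest node is floor(d/2) hops away, so the overall
# worst case is the sum over dimensions.

def max_torus_hops(dims):
    """Maximum hop distance between any two nodes of a torus."""
    return sum(d // 2 for d in dims)

dims = (64, 32, 32)                  # assumed BlueGene/L shape: 65,536 nodes
assert 64 * 32 * 32 == 64 * 1024     # matches the "64K nodes" on the slide
print(max_torus_hops(dims))          # prints 64: 32 + 16 + 16 hops
```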