Implementation of Open MPI on Red Storm
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Red Storm Infrastructure at Sandia National Laboratories
Red Storm Infrastructure Robert A. Ballance, John P. Noe, Milton J. Clauser, Martha J. Ernest, Barbara J. Jennings, David S. Logsted, David J. Martinez, John H. Naegle, and Leonard Stans Sandia National Laboratories∗ P.O. Box 5800 Albquuerque, NM 87185-0807 May 13, 2005 Abstract Large computing systems, like Sandia National Laboratories’ Red Storm, are not deployed in isolation. The computer, its infrastructure support, and the operations of both have to be designed to support the mission of the or- ganization. This talk and paper will describe the infrastructure and operational requirements of Red Storm along with a discussion of how Sandia is meeting those requirements. It will include discussions of the facilities that surround and support Red Storm: shelter, power, cooling, security, networking, interactions with other Sandia platforms and capabilities, operations support, and user services. 1 Introduction mary components: Stewart Brand, in How Buildings Learn[Bra95], has doc- 1. The physical setting, such as buildings, power, and umented many of the ways in which buildings — often cooling, thought to be static structures – evolve over time to meet the needs of their occupants and the constraints of their 2. The computational setting which includes the re- environments. Machines learn in the same way. This is lated computing systems that support ASC efforts not the learning of “Machine learning” in Artificial Intel- on Red Storm, ligence; it is the very human result of making a machine 3. The networking setting that ties all of these to- smarter as it operates within its distinct environment. gether, and Deploying a large computational resource teaches many lessons, including the need to meet the expecta- 4. -
Introducing the Cray XMT
Introducing the Cray XMT Petr Konecny Cray Inc. 411 First Avenue South Seattle, Washington 98104 [email protected] May 5th 2007 Abstract Cray recently introduced the Cray XMT system, which builds on the parallel architecture of the Cray XT3 system and the multithreaded architecture of the Cray MTA systems with a Cray-designed multithreaded processor. This paper highlights not only the details of the architecture, but also the application areas it is best suited for. 1 Introduction past two decades the processor speeds have increased several orders of magnitude while memory speeds Cray XMT is the third generation multithreaded ar- increased only slightly. Despite this disparity the chitecture built by Cray Inc. The two previous gen- performance of most applications continued to grow. erations, the MTA-1 and MTA-2 systems [2], were This has been achieved in large part by exploiting lo- fully custom systems that were expensive to man- cality in memory access patterns and by introducing ufacture and support. Developed under code name hierarchical memory systems. The benefit of these Eldorado [4], the Cray XMT uses many parts built multilevel caches is twofold: they lower average la- for other commercial system, thereby, significantly tency of memory operations and they amortize com- lowering system costs while maintaining the sim- munication overheads by transferring larger blocks ple programming model of the two previous genera- of data. tions. Reusing commodity components significantly The presence of multiple levels of caches forces improves price/performance ratio over the two pre- programmers to write code that exploits them, if vious generations. they want to achieve good performance of their ap- Cray XMT leverages the development of the Red plication. -
Merrimac – High-Performance and Highly-Efficient Scientific Computing with Streams
MERRIMAC – HIGH-PERFORMANCE AND HIGHLY-EFFICIENT SCIENTIFIC COMPUTING WITH STREAMS A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY Mattan Erez May 2007 c Copyright by Mattan Erez 2007 All Rights Reserved ii I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (William J. Dally) Principal Adviser I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (Patrick M. Hanrahan) I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (Mendel Rosenblum) Approved for the University Committee on Graduate Studies. iii iv Abstract Advances in VLSI technology have made the raw ingredients for computation plentiful. Large numbers of fast functional units and large amounts of memory and bandwidth can be made efficient in terms of chip area, cost, and energy, however, high-performance com- puters realize only a small fraction of VLSI’s potential. This dissertation describes the Merrimac streaming supercomputer architecture and system. Merrimac has an integrated view of the applications, software system, compiler, and architecture. We will show how this approach leads to over an order of magnitude gains in performance per unit cost, unit power, and unit floor-space for scientific applications when compared to common scien- tific computers designed around clusters of commodity general-purpose processors. -
Cray XT and Cray XE Y Y System Overview
Crayyy XT and Cray XE System Overview Customer Documentation and Training Overview Topics • System Overview – Cabinets, Chassis, and Blades – Compute and Service Nodes – Components of a Node Opteron Processor SeaStar ASIC • Portals API Design Gemini ASIC • System Networks • Interconnection Topologies 10/18/2010 Cray Private 2 Cray XT System 10/18/2010 Cray Private 3 System Overview Y Z GigE X 10 GigE GigE SMW Fibre Channels RAID Subsystem Compute node Login node Network node Boot /Syslog/Database nodes 10/18/2010 Cray Private I/O and Metadata nodes 4 Cabinet – The cabinet contains three chassis, a blower for cooling, a power distribution unit (PDU), a control system (CRMS), and the compute and service blades (modules) – All components of the system are air cooled A blower in the bottom of the cabinet cools the blades within the cabinet • Other rack-mounted devices within the cabinet have their own internal fans for cooling – The PDU is located behind the blower in the back of the cabinet 10/18/2010 Cray Private 5 Liquid Cooled Cabinets Heat exchanger Heat exchanger (XT5-HE LC only) (LC cabinets only) 48Vdc flexible Cage 2 buses Cage 2 Cage 1 Cage 1 Cage VRMs Cage 0 Cage 0 backplane assembly Cage ID controller Interconnect 01234567 Heat exchanger network cable Cage inlet (LC cabinets only) connection air temp sensor Airflow Heat exchanger (slot 3 rail) conditioner 48Vdc shelf 3 (XT5-HE LC only) 48Vdc shelf 2 L1 controller 48Vdc shelf 1 Blower speed controller (VFD) Blooewer PDU line filter XDP temperature XDP interface & humidity sensor -
Developing Integrated Data Services for Cray Systems with a Gemini Interconnect
Developing Integrated Data Services for Cray Systems with a Gemini Interconnect Ron A. Oldeld Todd Kordenbrock Jay Lofstead May 1, 2012 Abstract Over the past several years, there has been increasing interest in injecting a layer of compute resources between a high-performance computing application and the end storage devices. For some projects, the objective is to present the parallel le system with a reduced set of clients, making it easier for le-system vendors to support extreme-scale systems. In other cases, the objective is to use these resources as staging areas to aggregate data or cache bursts of I/O operations. Still others use these staging areas for in-situ analysis on data in-transit between the application and the storage system. To simplify our discussion, we adopt the general term Integrated Data Services to represent these use-cases. This paper describes how we provide user-level, integrated data services for Cray systems that use the Gemini Interconnect. In particular, we describe our implementation and performance results on the Cray XE6, Cielo, at Los Alamos National Laboratory. 1 Introduction den on the le system and potentially improve over- all ecency of the application workow. Current In our quest toward exascale systems and applica- work to enable these coupling and workow scenar- tions, one topic that is frequently discussed is the ios are focused on the data issues to resolve resolu- need for more exible execution models. Current tion and mesh mismatches, time scale mismatches, models for capability class high-performance comput- and make data available through data staging tech- ing (HPC) systems are essentially static, requiring niques [20, 34, 21, 11, 2, 40, 10]. -
Jaguar and Kraken -The World's Most Powerful Computer Systems
Jaguar and Kraken -The World's Most Powerful Computer Systems Arthur Bland Cray Users’ Group 2010 Meeting Edinburgh, UK May 25, 2010 Abstract & Outline At the SC'09 conference in November 2009, Jaguar and Kraken, both located at ORNL, were crowned as the world's fastest computers (#1 & #3) by the web site www.Top500.org. In this paper, we will describe the systems, present results from a number of benchmarks and applications, and talk about future computing in the Oak Ridge Leadership Computing Facility. • Cray computer systems at ORNL • System Architecture • Awards and Results • Science Results • Exascale Roadmap 2 CUG2010 – Arthur Bland Jaguar PF: World’s most powerful computer— Designed for science from the ground up Peak performance 2.332 PF System memory 300 TB Disk space 10 PB Disk bandwidth 240+ GB/s Based on the Sandia & Cray Compute Nodes 18,688 designed Red Storm System AMD “Istanbul” Sockets 37,376 Size 4,600 feet2 Cabinets 200 3 CUG2010 – Arthur Bland (8 rows of 25 cabinets) Peak performance 1.03 petaflops Kraken System memory 129 TB World’s most powerful Disk space 3.3 PB academic computer Disk bandwidth 30 GB/s Compute Nodes 8,256 AMD “Istanbul” Sockets 16,512 Size 2,100 feet2 Cabinets 88 (4 rows of 22) 4 CUG2010 – Arthur Bland Climate Modeling Research System Part of a research collaboration in climate science between ORNL and NOAA (National Oceanographic and Atmospheric Administration) • Phased System Delivery • Total System Memory – CMRS.1 (June 2010) 260 TF – 248 TB DDR3-1333 – CMRS.2 (June 2011) 720 TF • File Systems -
Cray XT3/XT4 Software
Cray XT3/XT4 Software: Status and Plans David Wallace Cray Inc ABSTRACT: : This presentation will discuss the current status of software and development plans for the CRAY XT3 and XT4 systems. A review of major milestones and accomplishments over the past year will be presented. KEYWORDS: ‘Cray XT3’, ‘Cray XT4’, software, CNL, ‘compute node OS’, Catamount/Qk processors, support for larger system configurations (five 1. Introduction rows and larger), a new version of SuSE Linux and CFS Lustre as well as a host of other new features and fixes to The theme for CUG 2007 is "New Frontiers ," which software deficiencies. Much has been done in terms of reflects upon how the many improvements in High new software releases and adding support for new Performance Computing have significantly facilitated hardware. advances in Technology and Engineering. UNICOS/lc is 2.1 UNICOS/lc Releases moving to new frontiers as well. The first section of this paper will provide a perspective of the major software The last twelve months have been very busy with milestones and accomplishments over the past twelve respect to software releases. Two major releases months. The second section will discuss the current status (UNICOS/lc 1.4 and 1.5) and almost weekly updates have of the software with respect to development, releases and provided a steady stream of new features, enhancements support. The paper will conclude with a view of future and fixes to our customers. UNICOS/lc 1.4 was released software plans. for general availability in June 2006. A total of fourteen minor releases have been released since last June. -
The Gemini Network
The Gemini Network Rev 1.1 Cray Inc. © 2010 Cray Inc. All Rights Reserved. Unpublished Proprietary Information. This unpublished work is protected by trade secret, copyright and other laws. Except as permitted by contract or express written permission of Cray Inc., no part of this work or its content may be used, reproduced or disclosed in any form. Technical Data acquired by or for the U.S. Government, if any, is provided with Limited Rights. Use, duplication or disclosure by the U.S. Government is subject to the restrictions described in FAR 48 CFR 52.227-14 or DFARS 48 CFR 252.227-7013, as applicable. Autotasking, Cray, Cray Channels, Cray Y-MP, UNICOS and UNICOS/mk are federally registered trademarks and Active Manager, CCI, CCMT, CF77, CF90, CFT, CFT2, CFT77, ConCurrent Maintenance Tools, COS, Cray Ada, Cray Animation Theater, Cray APP, Cray Apprentice2, Cray C90, Cray C90D, Cray C++ Compiling System, Cray CF90, Cray EL, Cray Fortran Compiler, Cray J90, Cray J90se, Cray J916, Cray J932, Cray MTA, Cray MTA-2, Cray MTX, Cray NQS, Cray Research, Cray SeaStar, Cray SeaStar2, Cray SeaStar2+, Cray SHMEM, Cray S-MP, Cray SSD-T90, Cray SuperCluster, Cray SV1, Cray SV1ex, Cray SX-5, Cray SX-6, Cray T90, Cray T916, Cray T932, Cray T3D, Cray T3D MC, Cray T3D MCA, Cray T3D SC, Cray T3E, Cray Threadstorm, Cray UNICOS, Cray X1, Cray X1E, Cray X2, Cray XD1, Cray X-MP, Cray XMS, Cray XMT, Cray XR1, Cray XT, Cray XT3, Cray XT4, Cray XT5, Cray XT5h, Cray Y-MP EL, Cray-1, Cray-2, Cray-3, CrayDoc, CrayLink, Cray-MP, CrayPacs, CrayPat, CrayPort, Cray/REELlibrarian, CraySoft, CrayTutor, CRInform, CRI/TurboKiva, CSIM, CVT, Delivering the power…, Dgauss, Docview, EMDS, GigaRing, HEXAR, HSX, IOS, ISP/Superlink, LibSci, MPP Apprentice, ND Series Network Disk Array, Network Queuing Environment, Network Queuing Tools, OLNET, RapidArray, RQS, SEGLDR, SMARTE, SSD, SUPERLINK, System Maintenance and Remote Testing Environment, Trusted UNICOS, TurboKiva, UNICOS MAX, UNICOS/lc, and UNICOS/mp are trademarks of Cray Inc. -
Red Storm IO Performance Analysis James H
1 Red Storm IO Performance Analysis James H. Laros III #1,LeeWard#2, Ruth Klundt ∗3, Sue Kelly #4 James L. Tomkins #5, Brian R. Kellogg #6 #Sandia National Laboratories 1515 Eubank SE Albuquerque NM, 87123-1319 [email protected] [email protected], [email protected] [email protected], [email protected] ∗Hewlett-Packard 3000 Hanover Street Palo Alto, CA 94304-1185 [email protected] Abstract— This paper will summarize an IO1 performance hardware or software, potentially affects how the entire system analysis effort performed on Sandia National Laboratories Red performs. Storm platform. Our goal was to examine the IO system We consider the testing described in this paper to be second performance and identify problems or bottle-necks in any aspect of the IO sub-system. Our process examined the entire IO path phase testing. Initial testing was accomplished to establish from application to disk both in segments and as a whole. Our baselines, develop test harnesses and establish test parameters final analysis was performed at scale employing parallel IO that would provide meaningful information within the lim- access methods typically used in High Performance Computing its imposed such as time constraints. All testing presented applications. in this paper was accomplished during system preventative Index Terms— Red Storm, Lustre, CFS, Data Direct Networks, maintenance periods on the production Red Storm platform, Parallel File-Systems on production file-systems. Testing time on heavily utilized production platforms such as Red Storm is precious. Detailed results of previous, current and future testing is posted on- I. INTRODUCTION line[2]. IO is a critical component of capability computing2.At In section II we will discuss components of the Red Storm Sandia Labs, high value is placed on balanced platforms. -
Overview of the Next Generation Cray XMT
Overview of the Next Generation Cray XMT Andrew Kopser and Dennis Vollrath, Cray Inc. ABSTRACT: The next generation of Cray’s multithreaded supercomputer is architecturally very similar to the Cray XMT, but the memory system has been improved significantly and other features have been added to improve reliability, productivity, and performance. This paper begins with a review of the Cray XMT followed by an overview of the next generation Cray XMT implementation. Finally, the new features of the next generation XMT are described in more detail and preliminary performance results are given. KEYWORDS: Cray, XMT, multithreaded, multithreading, architecture 1. Introduction The primary objective when designing the next Cray XMT generation Cray XMT was to increase the DIMM The Cray XMT is a scalable multithreaded computer bandwidth and capacity. It is built using the Cray XT5 built with the Cray Threadstorm processor. This custom infrastructure with DDR2 technology. This change processor is designed to exploit parallelism that is only increases each node’s memory capacity by a factor of available through its unique ability to rapidly context- eight and DIMM bandwidth by a factor of three. switch among many independent hardware execution streams. The Cray XMT excels at large graph problems Reliability and productivity were important goals for that are poorly served by conventional cache-based the next generation Cray XMT as well. A source of microprocessors. frustration, especially for new users of the current Cray XMT, is the issue of hot spots. Since all memory is The Cray XMT is a shared memory machine. The programming model assumes a flat, globally accessible, globally accessible and the application has access to shared address space. -
Titan Introduction and Timeline Bronson Messer
Titan Introduction and Timeline Presented by: Bronson Messer Scientific Computing Group Oak Ridge Leadership Computing Facility Disclaimer: There is no contract with the vendor. This presentation is the OLCF’s current vision that is awaiting final approval. Subject to change based on contractual and budget constraints. We have increased system performance by 1,000 times since 2004 Hardware scaled from single-core Scaling applications and system software is the biggest through dual-core to quad-core and challenge dual-socket , 12-core SMP nodes • NNSA and DoD have funded much • DOE SciDAC and NSF PetaApps programs are funding of the basic system architecture scalable application work, advancing many apps research • DOE-SC and NSF have funded much of the library and • Cray XT based on Sandia Red Storm applied math as well as tools • IBM BG designed with Livermore • Computational Liaisons key to using deployed systems • Cray X1 designed in collaboration with DoD Cray XT5 Systems 12-core, dual-socket SMP Cray XT4 Cray XT4 Quad-Core 2335 TF and 1030 TF Cray XT3 Cray XT3 Dual-Core 263 TF Single-core Dual-Core 119 TF Cray X1 54 TF 3 TF 26 TF 2005 2006 2007 2008 2009 2 Disclaimer: No contract with vendor is in place Why has clock rate scaling ended? Power = Capacitance * Frequency * Voltage2 + Leakage • Traditionally, as Frequency increased, Voltage decreased, keeping the total power in a reasonable range • But we have run into a wall on voltage • As the voltage gets smaller, the difference between a “one” and “zero” gets smaller. Lower voltages mean more errors. -
Taking the Lead in HPC
Taking the Lead in HPC Cray X1 Cray XD1 Cray Update SuperComputing 2004 Safe Harbor Statement The statements set forth in this presentation include forward-looking statements that involve risk and uncertainties. The Company wished to caution that a number of factors could cause actual results to differ materially from those in the forward-looking statements. These and other factors which could cause actual results to differ materially from those in the forward-looking statements are discussed in the Company’s filings with the Securities and Exchange Commission. SuperComputing 2004 Copyright Cray Inc. 2 Agenda • Introduction & Overview – Jim Rottsolk • Sales Strategy – Peter Ungaro • Purpose Built Systems - Ly Pham • Future Directions – Burton Smith • Closing Comments – Jim Rottsolk A Long and Proud History NASDAQ: CRAY Headquarters: Seattle, WA Marketplace: High Performance Computing Presence: Systems in over 30 countries Employees: Over 800 BuildingBuilding Systems Systems Engineered Engineered for for Performance Performance Seymour Cray Founded Cray Research 1972 The Father of Supercomputing Cray T3E System (1996) Cray X1 Cray-1 System (1976) Cray-X-MP System (1982) Cray Y-MP System (1988) World’s Most Successful MPP 8 of top 10 Fastest First Supercomputer First Multiprocessor Supercomputer First GFLOP Computer First TFLOP Computer Computer in the World SuperComputing 2004 Copyright Cray Inc. 4 2002 to 2003 – Successful Introduction of X1 Market Share growth exceeded all expectations • 2001 Market:2001 $800M • 2002 Market: about $1B