Cray XT and Cray XE System Overview


Customer Documentation and Training

Overview Topics
• System Overview
  – Cabinets, Chassis, and Blades
  – Compute and Service Nodes
  – Components of a Node
    • Opteron Processor
    • SeaStar ASIC
      – Portals API Design
    • Gemini ASIC
• System Networks
• Interconnection Topologies

Cray XT System
[Photograph of a Cray XT system.]

System Overview
[Diagram: the SMW attaches over GigE; login, network, boot, syslog/database, and I/O and metadata service nodes connect to the compute nodes over the X/Y/Z high-speed network; the network nodes provide 10-GigE links and the I/O nodes connect over Fibre Channel to the RAID subsystem.]

Cabinet
• The cabinet contains three chassis, a blower for cooling, a power distribution unit (PDU), a control system (CRMS), and the compute and service blades (modules).
• All components of the system are air cooled.
  – A blower in the bottom of the cabinet cools the blades within the cabinet.
  – Other rack-mounted devices within the cabinet have their own internal fans for cooling.
• The PDU is located behind the blower in the back of the cabinet.

Liquid-Cooled Cabinets
[Diagram: a liquid-cooled cabinet showing the heat exchangers (LC cabinets only; an additional exchanger on the XT5-HE LC), 48 Vdc flexible buses and shelves 1–3, cages 0–2 with cage VRMs, backplane assemblies, and cage ID controllers, the interconnect network cables, cage inlet air temperature sensor, airflow conditioner (slot 3 rail), L1 controller, blower speed controller (VFD), blower, PDU line filter, and the XDP temperature/humidity sensor and interface board behind perforated panels in XT5-HE LC cabinets.]

Blades
• The system contains two types of blades:
  – Compute blades
    • 4 SeaStar or 2 Gemini ASICs
    • 4 nodes
  – Service blades
    • SIO (XT systems)
      – 4 SeaStar ASICs
      – 2 nodes
      – One dual-slot PCI-X or PCIe riser assembly per node
    • XIO (XE systems)
      – 2 Gemini ASICs
      – 4 nodes
      – One single-slot PCIe riser per node (except on the boot node)

PCI Cards
• Cray supports the following types of PCIe cards:
  – Gigabit Ethernet
  – 10 Gigabit Ethernet
  – Fibre Channel (FC2, FC4, and FC8)
  – InfiniBand

Node Components
• Opteron processor
  – Single-, dual-, quad-, six-, eight-, and twelve-core versions
• Node memory
  – 2 to 8 GB of DDR1 memory in Cray XT3 compute nodes
  – 2 to 8 GB of DDR1 memory in Cray XT service (SIO) nodes
  – 4 to 16 GB of DDR2 memory in Cray XT4 compute nodes
  – 4 to 32 GB of DDR2 memory in Cray XT5 compute nodes
  – 8 to 64 GB of DDR3 memory in Cray XT6 compute nodes
  – 8 to 64 GB of DDR3 memory in Cray XE6 compute nodes
  – 8 to 16 GB of DDR2 memory in Cray XE6 service (XIO) nodes
• SeaStar or Gemini ASIC

Cray XT4 Compute Blade
[Diagram: four nodes (0–3), each with node memory and a node VRM, four SeaStar ASICs, the L0 controller, and the 48 Vdc to 12 Vdc VRMs.]

XT4 Node Block Diagram
[Diagram: one x86-64 Opteron processor with four DIMM slots, connected by a HyperTransport link to the SeaStar ASIC on the high-speed network (HSN) interconnect; the L0 controller monitors the node.]

SIO Service Blade
[Diagram: two nodes (0 and 3), each with a dual-slot PCI riser (PCI-X or PCIe); the blade's PCI cards occupy slots 0–3.]

SeaStar ASIC
• Direct HyperTransport connection to the Opteron.
• The DMA engine transfers data between host memory and the network.
  – Separate read and write sides in the DMA engine.
  – The Opteron does not directly load/store to the network (see the descriptor sketch below).
  – On the receive side, a set of content-addressable memory (CAM) locations is used to route incoming messages.
    • There are 256 CAM entries.
• Designed for the Sandia Portals API.
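Because the processor never loads or stores directly to the network, a send is described to the DMA engine rather than performed by the CPU. The sketch below is a conceptual model only: the descriptor layout, ring, and function names (tx_desc_t, post_send) are hypothetical illustrations, not the actual SeaStar interface.

```c
/*
 * Conceptual model of handing a send to a network DMA engine.
 * The descriptor layout, ring, and function names below are hypothetical;
 * they only illustrate that the host CPU describes the transfer (it never
 * loads or stores to the network itself) and the SeaStar-style DMA engine
 * moves the bytes.
 */
#include <stdint.h>
#include <stdio.h>

#define TX_RING_SIZE 64

typedef struct {
    uint64_t local_addr;   /* source buffer in host memory            */
    uint32_t length;       /* bytes to transfer                       */
    uint32_t dest_node;    /* destination node ID on the HSN          */
    uint64_t match_bits;   /* carried to the receiver for matching    */
    uint8_t  valid;        /* ownership: 1 = ready for the DMA engine */
} tx_desc_t;

static tx_desc_t tx_ring[TX_RING_SIZE];
static unsigned  tx_head;  /* next slot the host will fill */

/* Host side: post a send by filling a descriptor. Returns -1 if the ring
 * is full, i.e. the DMA engine has not yet consumed the oldest entry.
 * A real interface would also need a doorbell write and memory ordering. */
int post_send(uint64_t local_addr, uint32_t len,
              uint32_t dest_node, uint64_t match_bits)
{
    tx_desc_t *d = &tx_ring[tx_head % TX_RING_SIZE];
    if (d->valid)
        return -1;                 /* ring full, try again later */
    d->local_addr = local_addr;
    d->length     = len;
    d->dest_node  = dest_node;
    d->match_bits = match_bits;
    d->valid      = 1;             /* hand ownership to the DMA engine */
    tx_head++;
    return 0;
}

int main(void)
{
    /* Describe a 4 KB send from a (hypothetical) host buffer to node 17. */
    if (post_send(0x100000, 4096, 17, 0x42) == 0)
        puts("descriptor posted; the DMA engine will move the data");
    return 0;
}
```

In the real hardware the Portals layer builds this kind of work description and the SeaStar's DMA engine and embedded processor consume it; the point here is only the division of labor between the Opteron and the NIC.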
SeaStar Diagram
[Diagram: the Opteron core and Northbridge, with its DDR SDRAM memory interface, connect over HyperTransport to the SeaStar; the SeaStar contains the HyperTransport interface, the DMA engine, local memory, the router with its six network links, a PPU, and an SSI connection to the L0 controller; on service blades a bridge provides the PCI-X connection.]

Node Connection Bandwidths
• HyperTransport (Opteron to SeaStar)
  – Cray XT3: 6.4 GB/s bidirectional, 2.17 GB/s sustained
  – Cray XT4: 8.0 GB/s bidirectional
• Memory
  – Cray XT3, DDR1 (400 MHz DIMMs): 6.4 GB/s peak, 5.7 GB/s sustained, 51.6 ns measured latency
  – Cray XT4, DDR2 (667 MHz DIMMs): 10.4 GB/s peak, 8.5 GB/s sustained, 50 ns measured latency
  – Cray XT4, DDR2 (800 MHz DIMMs): 12.8 GB/s peak
  – Cray XT5, DDR2 (667 MHz DIMMs): 10.4 GB/s peak
  – Cray XT5, DDR2 (800 MHz DIMMs): 12.8 GB/s peak
• High-speed network: 9.6 GB/s bidirectional on each of the six links (±X, ±Y, ±Z), 6.5 GB/s sustained

Portals API Design
• Designed for messaging among thousands of nodes.
  – Performance is critical only in terms of scalability.
  – Success is measured by how large an application is allowed to scale, not by a two-node ping-pong time.
• Designed to avoid scalability limitations.
  – Network independent.
  – Bypasses the OS: no memory copies into or out of the kernel; few interrupts.
  – Bypasses the application: does not require activity by the application to ensure progress.
  – Reliable, ordered delivery of messages between two nodes.

Portals API Design (continued)
• Receiver (target) managed: no memory is reserved for thousands of potential senders.
  – Connectionless: no explicit connection is established with other processes in the application.
    • Only a minimum amount of state is maintained.
  – The target process determines how to respond to the incoming message.
    • The target node can choose to accept or ignore a message from any node.
  – The destination of a message is not an address; instead, "match bits" enable the receiver to place it (see the sketch after this slide).
  – Unexpected messages are handled by the unexpected message queue.
  – All buffers are in user space.
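A small conceptual sketch of this receiver-managed matching follows. It is not the real Portals API: the types and functions here (match_entry_t, post_match_entry, incoming_message) are hypothetical. It only illustrates how match bits with ignore bits steer an arriving message into a target-supplied buffer, with non-matching messages parked on an unexpected queue.

```c
/*
 * Conceptual sketch of Portals-style receiver-managed matching.
 * This is NOT the real Portals API; every name here is hypothetical.
 * The target posts match entries describing where messages may land; an
 * arriving message is steered by its match bits, and anything that matches
 * no entry is parked on an unexpected-message queue.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct match_entry {
    uint64_t match_bits;
    uint64_t ignore_bits;         /* bits the receiver does not compare   */
    void    *buffer;              /* user-space buffer supplied by target */
    size_t   length;
    struct match_entry *next;
} match_entry_t;

typedef struct unexpected {
    uint64_t match_bits;
    size_t   length;
    struct unexpected *next;
} unexpected_t;

static match_entry_t *match_list;   /* entries posted by the target   */
static unexpected_t  *unexpected_q; /* headers of unmatched messages  */

/* Target posts a buffer: accept any message whose match bits equal `match`
 * on the bits not covered by `ignore`. */
void post_match_entry(uint64_t match, uint64_t ignore, void *buf, size_t len)
{
    match_entry_t *e = malloc(sizeof *e);
    e->match_bits  = match;
    e->ignore_bits = ignore;
    e->buffer      = buf;
    e->length      = len;
    e->next        = match_list;
    match_list     = e;
}

/* Called for each arriving message: place it if a match entry accepts it,
 * otherwise record it on the unexpected queue for later retrieval. */
void incoming_message(uint64_t match_bits, const void *payload, size_t len)
{
    for (match_entry_t *e = match_list; e != NULL; e = e->next) {
        if (((e->match_bits ^ match_bits) & ~e->ignore_bits) == 0) {
            size_t n = len < e->length ? len : e->length;
            memcpy(e->buffer, payload, n);   /* deliver into user space */
            return;
        }
    }
    unexpected_t *u = malloc(sizeof *u);     /* no match: park the header */
    u->match_bits = match_bits;
    u->length     = len;
    u->next       = unexpected_q;
    unexpected_q  = u;
}

int main(void)
{
    char buf[64] = {0};
    post_match_entry(0x42, 0, buf, sizeof buf); /* compare all 64 bits */

    incoming_message(0x42, "hello", 6);   /* matches: lands in buf        */
    incoming_message(0x99, "stray", 6);   /* no match: goes to unexpected */

    printf("delivered: \"%s\", unexpected queued: %s\n",
           buf, unexpected_q ? "yes" : "no");
    return 0;
}
```

The actual Portals specification adds access control, offsets, truncation semantics, and event queues, but the placement decision is driven by the same kind of match/ignore comparison.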
X6 Compute Blade with Gemini
[Diagram: the four compute nodes (0–3) and the two Gemini ASICs on the blade, with Gemini 0 between nodes 0 and 1 and Gemini 1 between nodes 2 and 3.]

XE6 Node Block Diagram
[Diagram: two x86-64 Opteron processors, each with four DIMM slots, joined by HyperTransport links to each other and to the SeaStar (XT) or Gemini (XE) ASIC on the high-speed network (HSN) interconnect; the L0 controller monitors the node.]

XIO Service Blades
• Four nodes; each node contains:
  – One AMD Opteron processor with up to 16 GB of DDR2 memory
    • The processor is a six-core Opteron
  – A connection to a Gemini ASIC
  – Voltage regulating modules (VRMs)
• L0 controller
• Gemini mezzanine card
  – Contains two Gemini ASICs
• Four PCIe risers
  – One riser per node

XIO Blade
[Diagram: the four XIO nodes and the two Gemini ASICs; node 1 and node 3 connect to the lower mezzanine.]

XIO Riser Configuration
[Diagram: in each bay (upper and lower), two Opteron nodes with DDR2 memory connect through HyperTransport mezzanine bridges to x16 PCIe Gen2 risers (inner risers for nodes 1 and 3, outer risers for nodes 0 and 2) and through HyperTransport to the Gemini ASICs; the risers provide 75-watt and 225-watt card slots.]

XIO Boot Node Configuration
[Diagram: the same layout as the XIO riser configuration, except that node 0, the boot node, connects to an outer boot x8 PCIe Gen2 riser.]

Opteron Processors
[Diagram: block diagrams of the single-, dual-, and quad-core Opterons, showing per-core L1 instruction and data caches and L2 caches, the shared L3 cache on the quad-core part, the system request queue, crossbar, memory controller, and HyperTransport links to DDR memory.]
• AMD Opteron single-core processor
  – Cray XT3 system: Socket 940, DDR1 buffered DIMMs, and 3 HyperTransport links
• AMD Opteron dual-core processor
  – Cray XT3 system: Socket 940, DDR1 buffered DIMMs, and 3 HyperTransport links
  – Cray XT4 system: Socket AM2, DDR2 unbuffered DIMMs, and one HyperTransport link
• AMD Opteron quad-core processor
  – Cray XT4 system: Socket AM2, DDR2 unbuffered DIMMs, and one HyperTransport link
  – Cray XT5 system: Socket F, DDR2 registered (buffered) DIMMs, and 3 HyperTransport links
• AMD Opteron hex-core processor (covered on the next slide)

Opteron Processors (continued)
[Diagram: block diagrams of the six-core and twelve-core Opterons; each core has its own L1 and L2 caches, and the cores on a die share an L3 cache, system request queue, crossbar, memory controller, and HyperTransport links to DDR memory.]
• AMD Opteron six-core processor
  – Cray XT5 system: Socket F, DDR2 unbuffered DIMMs, and 3 HyperTransport links
  – L3 cache: 6 MB
• AMD Opteron twelve-core processor
  – Dual-die socket (for an eight-core socket, remove two CPUs from each die)
  – Cray XT6 and Cray XE6 systems: Socket G34, DDR3 unbuffered DIMMs, and 4 HyperTransport links
  – L3 cache: 6 MB per die, 12 MB per socket

Gemini
• Gemini is designed to pass data to and from the network with less control overhead.
  – Gemini is suited for SMP.
  – Gemini allows for adaptive routing.
  – Hardware support for PGAS languages.
    • Remote memory address translations, as with Cray X2 systems.
  – Atomic memory operations (illustrated in the SHMEM-style sketch below).
• The Gemini …
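The remote address translation and atomic memory operations listed above are the hardware hooks that one-sided PGAS libraries build on. Below is a minimal sketch in the classic SHMEM style; it assumes a 2010-era SHMEM interface (start_pes, shmem_int_fadd, shmem_barrier_all) and the <mpp/shmem.h> header, so treat it as illustrative rather than a Gemini-specific example.

```c
/*
 * Minimal SHMEM-style sketch of a remote atomic fetch-and-add, the kind of
 * one-sided operation that Gemini's AMO hardware accelerates. Assumes a
 * classic (2010-era) SHMEM interface; exact headers and routine names vary
 * by implementation, so treat this as illustrative.
 */
#include <stdio.h>
#include <mpp/shmem.h>

static int counter = 0;   /* symmetric variable: exists on every PE */

int main(void)
{
    start_pes(0);                      /* initialize the SHMEM runtime */
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Every PE atomically increments the counter that lives on PE 0 and
     * gets back the value it saw; on Gemini this can complete as a
     * network atomic without involving PE 0's processor. */
    int previous = shmem_int_fadd(&counter, 1, 0);
    printf("PE %d of %d saw previous value %d\n", me, npes, previous);

    shmem_barrier_all();               /* wait until all increments land */
    if (me == 0)
        printf("final counter on PE 0: %d\n", counter);
    return 0;
}
```

Compiled against a SHMEM library and launched across several PEs (for example with aprun on a Cray system), each processing element increments the counter owned by PE 0 without any matching receive on PE 0's side.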