CUG 09-Program-042109-Fix NERSC.Indd
Total pages: 16
File type: PDF, size: 1020 KB
Recommended publications
Workload Management and Application Placement for the Cray Linux Environment™
Cray Inc. publication S–2496–4101, © 2010–2012 Cray Inc. All Rights Reserved. (Front matter only in this extract: the standard U.S. Government restricted rights notice under FAR 48 CFR 52.227-14 and DFARS 48 CFR 252.227-7014, followed by the Cray trademark list.)
Cray XD1™ Supercomputer Release 1.3
CRAY XD1 DATASHEET: Cray XD1™ Supercomputer Release 1.3

The Cray XD1 supercomputer combines breakthrough interconnect, management and reconfigurable computing technologies to meet users' demands for exceptional performance, reliability and usability. Designed to meet the requirements of highly demanding HPC applications in fields ranging from product design to weather prediction to scientific research, the Cray XD1 system is an indispensable tool for engineers and scientists to simulate and analyze faster, solve more complex problems, and bring solutions to market sooner.

Cray XD1 System Highlights
■ Purpose-built for HPC — delivers exceptional application performance
■ Affordable power — designed for a broad range of HPC workloads and budgets
■ Linux, 32 and 64-bit x86 compatible — runs a wide variety of ISV applications and open source codes
■ Simplified system administration — automates configuration and management functions
■ Highly reliable — monitors and maintains system health
■ Scalable to hundreds of compute nodes — high bandwidth and low latency let applications scale

Direct Connected Processor Architecture
The Cray XD1 system is based on the Direct Connected Processor (DCP) architecture, harnessing many processors into a single, unified system to deliver new levels of application performance. Cray's implementation of the DCP architecture optimizes message-passing applications by directly linking processors to each other through a high performance interconnect fabric, eliminating shared memory contention and PCI bus bottlenecks. (Diagram: compute processors.) Twelve AMD Opteron™ 64-bit single or dual core processors run Linux and are organized as six nodes of 2 or 4-way SMPs to deliver up to 106 GFLOPS* per chassis. Matching memory and I/O performance removes bottlenecks and maximizes processor performance.
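Since the DCP pitch above centers on message passing, a concrete illustration may help. Below is a minimal MPI ping-pong in C; it is generic MPI rather than anything XD1-specific, and the payload size and two-rank setup are illustrative assumptions.

```c
/* Minimal MPI ping-pong sketch: rank 0 sends a buffer to rank 1 and
   waits for it to come back. Generic MPI, not XD1-specific; run with
   exactly two ranks (e.g. mpirun -np 2 ./pingpong). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, partner;
    double buf[1024] = {0};              /* 8 KB illustrative payload */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    partner = 1 - rank;                  /* assumes two ranks: 0 and 1 */
    if (rank == 0) {
        MPI_Send(buf, 1024, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, 1024, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &status);
        printf("ping-pong complete\n");
    } else if (rank == 1) {
        MPI_Recv(buf, 1024, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &status);
        MPI_Send(buf, 1024, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Timing many such round trips is the usual way to measure the interconnect latency and bandwidth the datasheet advertises.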
Merrimac – High-Performance and Highly-Efficient Scientific Computing with Streams
MERRIMAC – HIGH-PERFORMANCE AND HIGHLY-EFFICIENT SCIENTIFIC COMPUTING WITH STREAMS. A dissertation submitted to the Department of Electrical Engineering and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Mattan Erez, May 2007. © Copyright by Mattan Erez 2007. All Rights Reserved. (Signature pages: William J. Dally, principal adviser; Patrick M. Hanrahan; Mendel Rosenblum.)

Abstract: Advances in VLSI technology have made the raw ingredients for computation plentiful. Large numbers of fast functional units and large amounts of memory and bandwidth can be made efficient in terms of chip area, cost, and energy; however, high-performance computers realize only a small fraction of VLSI's potential. This dissertation describes the Merrimac streaming supercomputer architecture and system. Merrimac has an integrated view of the applications, software system, compiler, and architecture. We will show how this approach leads to over an order of magnitude gains in performance per unit cost, unit power, and unit floor-space for scientific applications when compared to common scientific computers designed around clusters of commodity general-purpose processors.
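Because the abstract turns on the stream model, a small sketch may make it concrete: stream programs separate bulk data movement (streams) from compute kernels that have no cross-element dependences. The plain C below only mimics that structure; it is not Merrimac's actual programming system, and the kernel itself is an arbitrary example.

```c
/* Stream-style structure in plain C: a pure kernel applied
   elementwise over input streams, writing an output stream.
   Illustrative only; not Merrimac's real toolchain. */
#include <stddef.h>
#include <math.h>

/* "Kernel": pure function of its stream elements, no side effects. */
static double kernel(double a, double b)
{
    return sqrt(a * a + b * b);
}

/* Apply the kernel over streams x and y of length n into out.
   Iterations are independent, which is what lets stream hardware
   keep many functional units busy. */
void stream_apply(size_t n, const double *x, const double *y, double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = kernel(x[i], y[i]);
}
```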
Cray XD1™ Release Description (S–2453–14)
Cray XD1™ Release Description, Private, S–2453–14. © 2006 Cray Inc. All Rights Reserved. (Front matter only in this extract: unpublished-work notice, U.S. Government restricted rights notice, and the Cray trademark list, truncated mid-list.)
The Gemini Network
The Gemini Network, Rev 1.1, Cray Inc. © 2010 Cray Inc. All Rights Reserved. (Front matter only in this extract: unpublished proprietary information notice, U.S. Government limited rights notice, and the Cray trademark list.)
Using Cray Performance Analysis Tools (S–2376–53)
Using Cray Performance Analysis Tools, S–2376–53. © 2006–2011 Cray Inc. All Rights Reserved. (Front matter only in this extract: U.S. Government restricted rights notice and the Cray trademark list.)
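For readers who have not used these tools, here is a hedged sketch of user-level instrumentation with the CrayPat API. The pat_api.h header and the PAT_region_begin/PAT_region_end calls follow the CrayPat documentation, but verify the exact signatures against your installed release; the loop body is a placeholder.

```c
/* Sketch: marking a code region for CrayPat profiling. Assumes the
   perftools module is loaded so pat_api.h is on the include path;
   check PAT_region_begin/PAT_region_end against your release. */
#include <pat_api.h>
#include <stdio.h>

int main(void)
{
    double sum = 0.0;

    PAT_region_begin(1, "accumulate");   /* region id 1, label appears in reports */
    for (int i = 0; i < 1000000; i++)
        sum += (double)i;
    PAT_region_end(1);                   /* close region 1 */

    printf("sum = %f\n", sum);
    return 0;
}
```

The usual workflow around such code (load the perftools module, build, instrument the binary with pat_build, run the instrumented executable that pat_build produces, then summarize with pat_report) is what this manual documents in detail.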
INNOVATION FOR HPC: AMD Launches Opteron 6300 Series x86 & Announces 64-bit ARM Strategy
HPC Advisory Council Switzerland Conference 2013. Roberto Dognini, Head of Commercial Sales EMEA, March 2013.

Agenda: AMD & HPC; Opteron 6300; Is AMD out of the server game?; What's changing... and the roadmap; Products; Q&A.

Innovation for HPC timeline (from the slide graphic):
• 2003 – AMD launches the Opteron processor: the first 64-bit x86 processor, with Direct-Connect architecture
• 2004 – Cray introduces the Cray XD1 based on AMD Opteron technology
• 2005 – AMD launches the first x86 dual-core Opteron processor
• 2006 – Cray/Sandia "Red Storm" ranks #2 on the Top500 with AMD Opteron processors
• 2008 – IBM "Roadrunner" (#1) and Cray "Jaguar" (#2) lead the Top500; AMD Opteron powers the first x86 PetaFlop supercomputer
• 2009 – Cray "Jaguar" reaches #1 on the Top500
• 2011 – AMD launches the world's first 16-core x86 server processor
• 2012 – AMD launches the Opteron 6300 Series x86 and announces its 64-bit ARM strategy; the Cray ORNL "Titan" ranks #1 on the Top500
Also on the slide: AMD achieves its sixth #1 spot in the last five years; 24 of the 50 fastest supercomputers on the Top500 use AMD Opteron processors; AMD technology powers 4 of the top 5 systems and the fastest supercomputers in 11 countries.

Today: new AMD Opteron™ 6300 Series processors, optimized for HPC TCO.
• Scalable performance: scalability under heavy load to maintain SLAs at peak times; record-breaking Java performance¹
• Energy efficient: up to 40% higher performance per watt than the previous generation; flexible power management
• Cost effective: low acquisition costs; the right performance at the right price

AMD Opteron™ "Bulldozer" module. (Block diagram: shared fetch and decode feed two integer cores, each with a dedicated scheduler, four pipelines, and an L1 data cache; a shared FP scheduler drives two 128-bit FMACs; the L2 cache is shared at the module level, and the L3 cache and northbridge are shared at the chip level.)

Competitive performance, 2P SPECfp, performance per dollar:
• 2 x Abu Dhabi 6380 – $0.19
• 2 x SandyBridge E5-2670 – $0.16
• 2 x Abu Dhabi 6386 SE – $0.16
• 2 x SandyBridge E5-2690 – $0.12
4 socket...
IUCAA CRAY CX1 Supercomputer Users Guide (V2.1)
IUCAA CRAY CX1 supercomputer Users Guide (v2.1)

1 Introduction

The Cray CX1 at the Inter-University Centre for Astronomy & Astrophysics (IUCAA), Pune, is a desktop/deskside supercomputer with 1 head node, 1 GPU node and 4 compute nodes. Each node has two 2.67 GHz Intel Xeon (5650) processors, with every processor having six cores; in total there are 72 cores on the system. Since there are twelve cores on every node, one can use twelve OpenMP hardware threads. However, the system supports more than 12 OpenMP threads on all the nodes (apart from the head node). Every node of the system has 24 GB RAM, and in total the system has 144 GB RAM. At present around 5 TB of disk space is available for users on three different partitions, named /home, /data1 and /data2. Keeping in mind that this is a small system and not more than ten users will be using it, every user can get 100 GB of space in the home area. Data of common use (like WMAP and Planck data) is kept in a common area, /data1/cmb_data, with permissions set so that users can use it. The Cray CX1 is equipped with a 16-port RJ-45 10Base-T, 100Base-T, 1000Base-T Gigabit Ethernet switch, with a data transfer capability of 1 Gbps, supporting the Ethernet, Fast Ethernet, and Gigabit Ethernet data link protocols. The list of nodes is given in Table 1. Many software packages and libraries are already installed on the system, and more (open …
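As a worked illustration of the twelve-threads-per-node point above, here is a minimal OpenMP program in C. It is a generic sketch, not from the guide; fixing the thread count at 12 is an assumption matching one fully occupied CX1 node.

```c
/* Minimal OpenMP sketch for one CX1 node (12 cores per node, per the
   guide). Setting 12 threads here is an illustrative assumption; the
   same effect is usually achieved with OMP_NUM_THREADS=12. */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_set_num_threads(12);              /* one thread per core on a node */

    #pragma omp parallel
    {
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```

Compiling with an OpenMP-enabled compiler (for example, gcc -fopenmp) and running on a compute node should report twelve threads.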
Taking the Lead in HPC
Taking the Lead in HPC: Cray X1, Cray XD1. Cray Update, SuperComputing 2004.

Safe Harbor Statement: The statements set forth in this presentation include forward-looking statements that involve risk and uncertainties. The Company wishes to caution that a number of factors could cause actual results to differ materially from those in the forward-looking statements. These and other factors which could cause actual results to differ materially from those in the forward-looking statements are discussed in the Company's filings with the Securities and Exchange Commission.

Agenda: Introduction & Overview – Jim Rottsolk; Sales Strategy – Peter Ungaro; Purpose Built Systems – Ly Pham; Future Directions – Burton Smith; Closing Comments – Jim Rottsolk.

A Long and Proud History. NASDAQ: CRAY. Headquarters: Seattle, WA. Marketplace: high performance computing. Presence: systems in over 30 countries. Employees: over 800. Building systems engineered for performance. (Timeline graphic: Seymour Cray, "The Father of Supercomputing," founded Cray Research in 1972; Cray-1 (1976), the first supercomputer; Cray X-MP (1982), the first multiprocessor supercomputer; Cray Y-MP (1988), the first GFLOP computer; Cray T3E (1996), the world's most successful MPP and the first TFLOP computer; Cray X1, 8 of the top 10 fastest computers in the world.)

2002 to 2003 – Successful introduction of the X1. Market share growth exceeded all expectations. 2001 market: $800M; 2002 market: about $1B …
BlackWidow System Update
BlackWidow System Update: Performance Update; Programming Environment; Scalar & Vector Computing; Roadmap.

(Roadmap slide, "The Cray Roadmap – Realizing Our Adaptive Supercomputing Vision": Phase 0, 2006, individually architected machines, unique products serving individual market needs – Cray X1E, Cray XD1, Cray XT3; Phase I, 2007–2008, the Rainier program, multiple processor types with integrated infrastructure and user environment – Cray XT4, Cray XMT, BlackWidow; Phase II, 2009–2010, the Cascade program, adaptive hybrid systems – Cray XT4+ upgrade, Baker, Granite.)

BlackWidow Summary
• Project name for Cray's next generation vector system
• Follow-on to the Cray X1 and Cray X1E vector systems
• Special focus on improving scalar performance
• Significantly improved price-performance over earlier systems
• Closely integrated with the Cray XT infrastructure
• Product will be launched and shipments will begin towards the end of this year

"Vector MPP" system: highly scalable, to thousands of CPUs; high network bandwidth; minimal OS "jitter" on compute blades.

BlackWidow System Overview: tightly integrated with a Cray XT3 or Cray XT4 system; Cray XT SIO nodes provide I/O and login support for BlackWidow; users can easily run on either or both vector and scalar compute blades. (Figure: BlackWidow cabinets attached to a Cray XT4 system, artist's conception.)

Compute cabinet: blower, rank-1 router modules, compute chassis; up to 4 chassis and …
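Because the update stresses vector alongside scalar performance, a small example of the code pattern at stake may help: a loop whose iterations are independent, which a vector compiler can pipeline across vector units. The sketch below is generic C; the _CRI ivdep pragma spelling is an assumption to check against the Cray compiler manual, and other compilers will simply ignore it.

```c
/* A vectorizable daxpy-style loop of the kind vector compilers
   target. The pragma asserts no loop-carried dependence; its exact
   spelling is an assumption based on Cray C documentation. */
#include <stddef.h>

void daxpy(size_t n, double a, const double *restrict x,
           double *restrict y)
{
#pragma _CRI ivdep
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];       /* independent iterations vectorize */
}
```

The restrict qualifiers tell the compiler the arrays do not alias, which is often what decides whether a loop like this vectorizes at all.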
Breakthrough Science Via Extreme Scalability
Breakthrough Science via Extreme Scalability. Greg Clifford, Segment Manager, Cray Inc., [email protected]

Outline: Cray's focus; the requirement for highly scalable systems; Cray XE6 technology; the path to Exascale computing.

About Cray:
• 35 year legacy focused on building the world's fastest computers
• 850 employees worldwide, growing in a tough economy
• Cray XT5 was the first computer to deliver a PetaFLOP/s in a production environment (the Jaguar system at Oak Ridge)
• A full range of products, from the Cray CX1 to the Cray XE6, with options including unsurpassed scalability, GPUs, SMP to 128 cores & 2 TB, AMD and Intel, InfiniBand and Gemini, high performance I/O, …

Designed for "mission critical" HPC environments, "when you cannot afford to be wrong":
• Sustainable performance on production applications
• Reliability
• A complete HPC environment with a focus on productivity: Cray Linux Environment, compilers, libraries, etc.
• Partnerships with industry leaders (e.g. PGI, Platform)
• Compatibility with the open source world
• Unsurpassed scalability/performance (compute, I/O and software)
• Proprietary system interconnect (Cray Gemini router)
• Performance on "grand challenge" applications

Jaguar upgrade history: award (U. Tennessee/ORNL); Sep 2007, Cray XT3: 7K cores, 40 TF; Jun 2008, Cray XT4: 18K cores, 166 TF; Aug 18, 2008, Cray XT5: 65K cores, 600 TF; Feb 2, 2009, Cray XT5+: ~100K cores, 1 PF; Oct 2009, Kraken and Krakettes! NICS is specializing on true capability …
Ian Miller (Cray) and Sumit Gupta (NVIDIA)
Ian Miller, Cray, [email protected]; Sumit Gupta, NVIDIA, [email protected]

The Ease-of-Everything solution, to ease the transition to HPC, increase engineering efficiency, and improve competitiveness:
• Ease of configuration and purchase
• Ease of installation and deployment
• Ease of maintenance
• Pre-installed and tested combinations of industry-leading standard hardware, OS, and HPC management tools
• Upgradeable over time
• Complemented by Cray's renowned quality of service and support

No need for a dedicated computer room:
• Compact deskside design
• Use of a standard office power outlet (20A/110V or 16A/240V)
• Active noise reduction, NR45 compliant
• Minimal power and cooling requirements

Ability to mix and match compute, visualization, and storage blades according to a customer's HPC needs:
• Up to 8 blades per chassis (with the ability to combine up to 3 chassis)
• 2 sockets per blade (16 sockets per chassis)
• Up to 16 Intel Xeon 5500 quad-core processors per chassis
• Up to 64 cores per chassis (8 compute blades x 2 sockets x quad-core Xeon processors = 64 cores)
• Up to 48 GB of memory per blade, or 384 GB per chassis (when 8 GB DIMMs are available, a maximum of 96 GB per blade and 768 GB per chassis)
• 1.7 TB of storage per storage blade, 6.8 TB per chassis
• Built-in Gigabit Ethernet interconnect, optional InfiniBand

GPU computing technology, Tesla with CUDA:
• Up to 4 C1060 internal units per chassis
• Up to 4 S1070 external units per chassis

Blade types: CC55 (Dual Xeon) compute node, 1 slot; CV55-01 visualization node, 2 slots; CT55-01 GPU computing node, 2 slots; CS55-04 storage node, …