Accelerators in Cray's Adaptive Supercomputing

NCSA’s Reconfigurable Systems Summer Institute
Dave Strenski, Application Analyst
May 16, 2007
Copyright 2007 – Cray Inc.

What does a petaflop look like?

[Figure: Theoretical Performance with 10 Gflop/s Nodes. Tflop/s (0 to 1000) plotted against Number of Nodes (1 to 100,000) and % Parallel (0.00000, 0.90000, 0.99000, 0.99900, 0.99990, 0.99999, 1.00000).]

Supercomputing is increasingly about managing scalability

• Exponential increase with advent of multi-core chips
• Systems with more than 100,000 processing cores
• 80+ core processor expected within the decade

[Figure: Average Number of Processors Per Supercomputer (Top 20), 1993 to 2006. Bar labels as extracted: 16,316; 10,073; 3,518; 2,827; 3,093; 2,230; 1,644; 1,847; 1,245; 1,073; 722; 808; 202; 408. Source: www.top500.org]

Opportunities to Exploit Heterogeneity

Applications vary considerably in their demands

Any HPC application contains some form of parallelism
• Many HPC apps have rich, SIMD-style data-level parallelism
  Can significantly accelerate via vectorization
• Those that don’t generally have rich thread-level parallelism
  Can significantly accelerate via multithreading
• Some parts of applications are not parallel at all
  Need fast serial scalar execution speed (Amdahl’s Law)

Applications also vary in their communications needs
• Required memory bandwidth and granularity
  Some work well out of cache, some don’t
• Required network bandwidth and granularity
  Some ok with message passing, some need shared memory

No one processor/system design is best for all apps
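
The petaflop plot and the Amdahl's Law remark above can be made concrete with a few lines of arithmetic. The sketch below assumes the plot follows the textbook form of Amdahl's Law (the slides name the law but do not spell out the formula) and that every node peaks at 10 Gflop/s, as the plot title states. The function names are illustrative and do not come from the slides.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Speedup over one node when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


def sustained_tflops(parallel_fraction, nodes, node_gflops=10.0):
    """Sustained Tflop/s for `nodes` nodes, each peaking at `node_gflops` Gflop/s."""
    # Assumption: a single node runs the whole code at its peak rate.
    return amdahl_speedup(parallel_fraction, nodes) * node_gflops / 1000.0


if __name__ == "__main__":
    # Same parallel fractions as the plot's "% Parallel" axis, at 100,000 nodes.
    for p in (0.9, 0.99, 0.999, 0.9999, 0.99999, 1.0):
        print(f"parallel fraction {p:.5f}: {sustained_tflops(p, 100_000):10.3f} Tflop/s")
```

Under these assumptions, 100,000 nodes reach the full 1,000 Tflop/s only when the code is perfectly parallel; at a parallel fraction of 0.99999 the same machine sustains roughly 500 Tflop/s, and at 0.99 it stays below 1 Tflop/s.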

Increasingly Complex Application Requirements

Earth Sciences Example

Evolution of Computational Climate Simulation Complexity
International Intergovernmental Panel on Climate Change, 2004, as updated by Washington, NCAR, 2005
NASA Report: Earth Sciences Vision 2030

Increased complexity and number of components lends itself well to a variety of processing technologies.

“As scientific computing migrated toward commodity platforms, interconnect technology, both in terms of bandwidth and latency, became the limiting factor on application performance and continues to be a performance bottleneck.”
- James Hack (NCAR), ComputerWorld article, 2/6/06

Similar trends in astrophysics, nuclear engineering, CAE, etc.

So, Can We Just Pack Chips with Flops?

Key is making the system easily programmable

Must balance peak computational power with generality
• How easy is it to map high level code onto the machine?
• How easy is it for computation units to access global data?

Some examples:
• XD1 FPGAs
• Clearspeed CSX600
• IBM Cell

Flop efficiency vs. generality/programmability spectrum:
• Qualitative only; also influenced by memory system

[Figure: spectrum running from "More area/power efficient" to "More programmable", placing FPGAs, Clearspeed, Cell, GP-GPU, Streaming, Vectors, BG/L, Multi-core, and General Purpose µprocs. Positions are not preserved in this text.]

Cray XD1 FPGA Accelerators

Performance gains from FPGA:

RC5 Cipher Breaking
• Implemented on Xilinx Virtex II
• 1000x faster than 2.4 GHz P4

Elliptic Curve Cryptography
• Implemented on Xilinx Virtex II
• 895-1300x faster than 1 GHz P3

Vehicular Traffic Simulation
• Implemented on Xilinx Virtex II (XC2V6000) and Virtex II Pro (XC2VP100)
• 300x faster on XC2V6000 than 1.7 GHz Xeon
• 650x faster on XC2VP100 than 1.7 GHz Xeon

Smith Waterman DNA matching
• 28x faster than 2.4 GHz Opteron

Primary challenge is programming
No general-purpose compiler available

Peak GFLOP/s per processor

            Opteron      Virtex4    Virtex5
            Dual-core    LX200      LX330
            2.5 GHz      185 MHz    237 MHz
mult/add    10           15.9       28.0
mult        5            12.0       19.9
add         5            23.9       55.3

www.fpgajournal.com/articles_2006/pdf/20061114_cray.pdf
www.hpcwire.com/hpc/1195762.html

Clearspeed CSX600

• 50 Gflops on card
• 6 GB/s to on-card local memory (4 GB)
• 2+ GB/s to local host memory
• Doesn’t share memory with host
• Mostly used for accelerating libraries
• No general-purpose compiler available

Cell Processor

Each chip contains:
• One PowerPC
• Eight “synergistic processing elements” (SPEs)

Targeted for:
• (1) Playstations, (2) HDTVs, (3) computing

Lots of flops
• 250 Gflops (32 bit)
• ~25 Gflops (64 bit)

25 GB/s to < 1 GB memory

Big challenge is programming
• SPEs have no virtual memory
• Can only access data in local 256 KB buffers
• Requires alignment for good performance

No general-purpose compiler available
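
The "Peak GFLOP/s per processor" table earlier in this section can be read as simple rate arithmetic: peak = units × clock × results per unit per cycle. The sketch below is one way to check that reading. It assumes that each Opteron core completes one add and one multiply per cycle, which is consistent with the table's 5 / 5 / 10 split but is not stated on the slide. For the FPGAs it only derives how many floating-point results per cycle the quoted peaks imply. The function names are illustrative.

```python
def peak_gflops(units, clock_ghz, flops_per_unit_per_cycle=1.0):
    """Peak rate in GFLOP/s for `units` identical units running at `clock_ghz`."""
    return units * clock_ghz * flops_per_unit_per_cycle


def implied_flops_per_cycle(gflops, clock_mhz):
    """Floating-point results per clock cycle needed to reach `gflops`."""
    return gflops * 1000.0 / clock_mhz


# Dual-core 2.5 GHz Opteron; assumed 1 add + 1 multiply per core per cycle.
opteron_mult_add = peak_gflops(units=2, clock_ghz=2.5, flops_per_unit_per_cycle=2.0)
opteron_mult_only = peak_gflops(units=2, clock_ghz=2.5, flops_per_unit_per_cycle=1.0)

# FPGA peaks copied from the table: (mult/add, mult, add) in GFLOP/s.
FPGA_PEAKS = {
    "Virtex4 LX200 @ 185 MHz": (185.0, (15.9, 12.0, 23.9)),
    "Virtex5 LX330 @ 237 MHz": (237.0, (28.0, 19.9, 55.3)),
}

if __name__ == "__main__":
    print("Opteron mult/add:", opteron_mult_add, "GFLOP/s")
    print("Opteron mult only:", opteron_mult_only, "GFLOP/s")
    for name, (clock_mhz, peaks) in FPGA_PEAKS.items():
        per_cycle = [round(implied_flops_per_cycle(g, clock_mhz), 1) for g in peaks]
        print(name, "implied results per cycle (mult/add, mult, add):", per_cycle)
```

Read this way, the FPGA figures come from many operators running in parallel at a low clock rate, while the Opteron figures come from very few operators at a high clock rate.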

Adaptive Supercomputing

Combines multiple processing architectures into a single, scalable system

• Scalar X86/64
• Vector
• Multithreaded
• HW Accelerators

• Transparent Interface
• Libraries
• Tools
• Compilers
• Scheduling
• System Management
• Runtime

• Interconnect
• File Systems
• Storage
• Packaging

Adapt the system to the application – not the application to the system

Step 1 to Adaptive Supercomputing: Rainier Program -- Cray XT Infrastructure

[Figure legend: C = Compute, S = Service, A = Accelerator]

Cray’s Rainier generation of products uses a common infrastructure:
• Opteron-based service & I/O (SIO) blades
• Cray SeaStar interconnect
• Single global file system
• Single point of login
• Single point of administration

Delivered with one or more types of compute resources
• Cray XT4 compute blades (scalar)
• Cray XMT compute blades (multithreading)
• “BlackWidow” compute cabinets (vector)
• Hardware Accelerators

The Cray XT Infrastructure allows customers to “mix-and-match” compute resources

DARPA HPCS Program

Focused on providing a new generation of economically viable high productivity computing systems for the national security and industrial user community in the 2010 timeframe

Performance (time-to-solution): speed up critical applications by factors of 10 to 40

Programmability (idea-to-first solution): reduce cost and time for developing application solutions

Portability: insulate application software from system specifics

Robustness: protect applications from hardware faults and system software errors

“High productivity computing is a key technology enabler for meeting our national security and economic competitiveness requirements.”
- Dr. William Harrod, DARPA HPCS Program

CRAY SIGNS $250 MILLION AGREEMENT WITH DARPA TO DEVELOP BREAKTHROUGH ADAPTIVE SUPERCOMPUTER

SEATTLE, WA, November 21, 2006 -- Global supercomputer leader Cray Inc. announced today that it has been awarded a $250 million agreement from the U.S. Defense Advanced Research Projects Agency (DARPA). Under this agreement, Cray will develop a revolutionary new supercomputer based on the company's Adaptive Supercomputing vision, a phased approach to hybrid computing that integrates a range of processing technologies into a single scalable platform. […]

Motivation for Cascade

Why are HPC machines unproductive?

Difficult to write parallel code (e.g., MPI)
• Major burden for computational scientists

Lack of programming tools to understand program behavior
• Conventional models break with scale and complexity

Time spent trying to modify code to fit machine’s characteristics
• For example, cluster machines have relatively low bandwidth between processors, and can’t directly access global memory…
• As a result, programmers try hard to reduce communication, and have to bundle communication up in messages instead of simply accessing shared memory

If the machine doesn’t match your code’s attributes, it makes the programming job much more difficult. And codes vary significantly in their requirements…
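
The point above about bundling communication into messages can be illustrated with the common latency-plus-bandwidth cost model, in which every message pays a fixed startup cost before any data moves. The sketch below is an illustration only: the model, the 5 µs latency, and the 2 GB/s bandwidth are assumptions chosen for the example and do not come from the slides or from any particular Cray system.

```python
import math


def exchange_time_us(n_values, values_per_message,
                     latency_us=5.0, bandwidth_gb_s=2.0, bytes_per_value=8):
    """Estimated time in microseconds to send `n_values` values in fixed-size batches.

    Cost model (assumed): each message costs `latency_us` plus its payload
    divided by the link bandwidth.
    """
    n_messages = math.ceil(n_values / values_per_message)
    bytes_per_us = bandwidth_gb_s * 1000.0  # 1 GB/s equals 1000 bytes per microsecond
    payload_us = n_values * bytes_per_value / bytes_per_us
    return n_messages * latency_us + payload_us


if __name__ == "__main__":
    for batch in (1, 10, 100, 1000):
        t = exchange_time_us(1000, batch)
        print(f"{batch:5d} values per message: {t:8.1f} us")
```

With these placeholder numbers, sending 1,000 values one at a time is dominated by per-message latency, while one bundled message is dominated by the payload. That gap is what pushes programmers on such machines to restructure code around large messages.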

Cascade Approach

Design an adaptive, configurable machine that can match the attributes of a wide variety of applications:
• Serial (single thread, latency-driven) performance
• SIMD data level parallelism (vectorizable)
• Fine grained MIMD parallelism (threadable)
• Regular and sparse bandwidth of varying intensities
⇒ Increases performance
⇒ Significantly eases programming
⇒ Makes the machine much more broadly applicable

Ease the development of parallel codes
• Legacy programming models: MPI, OpenMP
• Improved variants: SHMEM, UPC and CoArray Fortran (CAF)
• New alternative: Global View (Chapel)

Provide programming tools to ease debugging and tuning at scale
• Automatic performance analysis; comparative debugging

Integrated Multi-Architecture System

Basic architecture separates system services from computational tasks
• Purpose-built service and compute nodes
• Sets the infrastructure for hybrid computing

Service nodes provide command and control for variety of compute nodes
• Assignment and management of the applications
• I/O functions available to all nodes
• Consistent control layer for the underlying hardware
• Full system administration

[Figure labels: Specialized Processor Compute Nodes; Service Nodes]

Flexible Execution Environment