Jaguar: The World's Most Powerful Computer System – An Update


Jaguar: The World's Most Powerful Computer System – An Update

Arthur S. Bland, Wayne Joubert, Ricky A. Kendall, Douglas B. Kothe, James H. Rogers, Galen M. Shipman
Oak Ridge National Laboratory

ABSTRACT: At the SC'09 conference in November 2009, Jaguar was crowned as the world's fastest computer by the web site www.Top500.org. In this paper, we will describe Jaguar, present results from a number of benchmarks and applications, and discuss future computing in the Oak Ridge Leadership Computing Facility. This paper updates the CUG 2009 paper that described Jaguar with new information about the upgrade from 4-core to 6-core processors and ORNL's plans for exaflops computing by 2018.

KEYWORDS: Jaguar, XT4, XT5, Lustre, Exascale, OLCF, Top500, Istanbul

1. Introduction

In 2004, the National Center for Computational Sciences (NCCS) at the U.S. Department of Energy's Oak Ridge National Laboratory (ORNL) partnered with Cray Inc., Argonne National Laboratory, Pacific Northwest National Laboratory, and others to propose a new national user facility for leadership computing.[1] Having won the competition, the new Oak Ridge Leadership Computing Facility (OLCF) has delivered a series of increasingly powerful computer systems to the science community based on Cray's X1 and XT product lines.[2] The series of machines includes a 6.4 trillion floating-point operations per second (TF) Cray X1 in 2004, an upgrade of that system to an 18.5 TF X1e in 2005, a 26 TF single-core Cray XT3[3] in 2005, an upgrade of that system to a 54 TF dual-core system in 2006, and the addition of a 65 TF XT4[4] in 2006 that was combined with the XT3 in 2007 to make Jaguar XT4 a 119 TF system. In 2008 the system was further upgraded to quad-core processors, increasing performance to 263 TF with 62 terabytes (TB) of system memory. This ten-fold increase in Jaguar's computing power and memory from February 2005 through April 2008 provided a productive, scalable computing system for the development and execution of the most demanding science applications.

On July 30, 2008 the OLCF took delivery of the first 16 of 200 cabinets of an XT5 system providing an additional 1,375 TF, 300 TB of system memory, and over 10,000 TB of total disk space to the OLCF. The final cabinets were delivered on September 17, 2008. Twelve days later, on September 29th, this incredibly large and complex system, known as Jaguar XT5, ran a full-system benchmark application that took two and one-half hours to complete. The system completed acceptance testing in December 2008 and was transitioned to operations shortly thereafter. In late July 2009, a phased upgrade of the Jaguar XT5 system from 2.3 GHz 4-core Barcelona AMD processors to 2.6 GHz 6-core Istanbul AMD processors began. This phased approach provided users with a substantial fraction of the Jaguar XT5 platform throughout the upgrade period, minimizing the impact to our users. Completion of this upgrade on November 7th increased the performance of Jaguar XT5 to 2,332 TF, with improved memory bandwidth due to enhancements in the Istanbul's memory controller. Today, Jaguar XT5 is running a broad range of time-critical applications of national importance in such fields as energy assurance, climate modelling, superconducting materials, bio-energy, chemistry, combustion, astrophysics, and nuclear physics.

[Figure 1: OLCF Petascale Roadmap]

The successful deployment of the Jaguar XT5 platform represents a landmark in the OLCF's roadmap to petascale computing. This roadmap, illustrated in Figure 1, was based on phased scaling steps of system upgrades and new system acquisitions, allowing a concurrent phased approach to facility upgrades, system software development, and dramatic application scalability improvements over an acceptable time horizon.

Having successfully transitioned from sustained teraflop to sustained petaflop computing, the OLCF has laid out a similar roadmap to transition to exaflop computing, as illustrated in Figure 2. The expectation is that systems on the way to exascale will be composed of a relatively modest number of CPU cores optimized for single-thread performance coupled with a large number of CPU cores optimized for multi-threaded performance and low power consumption. Images in Figure 2 are placeholders and are meant to illustrate that systems could be quite different at each stage of the roadmap.

[Figure 2: OLCF Exascale Roadmap]

2. Scaling Applications and Supporting Infrastructure

Successful execution of our 5-year roadmap to sustained petascale computing required focused efforts in improving application scalability, facility upgrades, and scaling out our supporting infrastructure. In 2005 our initial installation of the Jaguar platform resulted in a 3,748-core XT3; subsequent upgrades and additions have resulted in nearly a 60-fold increase in the number of CPU cores in the current Jaguar system (224,256 cores). Scaling applications and infrastructure over this period, while challenging, has been critical to the success of the OLCF.

2.1. Improving Application Scalability

Recognizing that improving application scalability would require a concerted effort between computational scientists with domain expertise and the science teams, and that a gap in computational science expertise existed in many of these science teams, the OLCF created the Scientific Computing group, in which group members act as liaisons to science teams, helping them improve application scalability, performance, and scientific productivity. To complement these efforts, Cray established a Supercomputer Center of Excellence (COE) at the OLCF in 2005, providing expertise in performance optimization and scalability on Cray platforms. The liaison model coupled with the Cray COE has proven successful in improving application performance and scalability. As an example, beginning in September 2009, the OLCF worked closely with a number of key applications teams to demonstrate application scalability on the Jaguar XT5 platform. Work on a scalable method for ab-initio computation of free energies in nanoscale systems using the locally self-consistent multiple scattering method (LSMS)[5] resulted in sustained performance of 1.836 Petaflops/sec on 223,232 CPU cores, claiming the 2009 Gordon Bell Award at SC2009. Using the NWChem application, a study[6] of CCSD(T) calculations of the binding energies of H2O resulted in sustained performance of 1.4 Petaflops/sec on 224,196 CPU cores, a 2009 Gordon Bell Award finalist at SC2009. This result was particularly notable from a system and software architecture perspective, as NWChem's communication model is based on the Global Arrays[7] framework rather than the dominant communication model in large-scale HPC, MPI[8].
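The distinction matters because Global Arrays presents a logically shared, distributed array that any process can read or update with one-sided put/get operations, whereas classic two-sided MPI message passing requires the data's owner to post a matching receive. The sketch below is illustrative only; it is not NWChem or Global Arrays code, and it uses MPI-2 remote memory access (MPI_Win_create/MPI_Get) simply to make the one-sided access pattern concrete.

```c
/* One-sided access sketch: each rank exposes a block of a distributed
 * array through an MPI window; any rank can read a remote block with
 * MPI_Get without the owner issuing a matching receive. Illustrative
 * only: NWChem itself uses the Global Arrays API (e.g. NGA_Get/NGA_Put),
 * not raw MPI windows. Build with: mpicc -o rma_sketch rma_sketch.c */
#include <mpi.h>
#include <stdio.h>

#define BLOCK 4   /* elements owned by each rank (toy size) */

int main(int argc, char **argv)
{
    int rank, nprocs;
    double local[BLOCK];    /* this rank's slice of the "global" array */
    double remote[BLOCK];   /* buffer for a slice fetched from a peer  */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (int i = 0; i < BLOCK; i++)
        local[i] = rank * BLOCK + i;   /* globally unique values */

    /* Expose the local slice for one-sided access by all ranks. */
    MPI_Win_create(local, BLOCK * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* The fence opens an access epoch; the target never posts a receive. */
    MPI_Win_fence(0, win);
    int target = (rank + 1) % nprocs;
    MPI_Get(remote, BLOCK, MPI_DOUBLE, target, 0, BLOCK, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);   /* second fence completes the transfer */

    printf("rank %d fetched the block of rank %d starting at %g\n",
           rank, target, remote[0]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

At scale, the appeal of this model is that a process can fetch or update remote patches of a distributed data structure without the owning process having to participate in, or even be aware of, the transfer.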
2.2. Scaling Supporting Infrastructure

To scale Jaguar from a 26 TF system in 2005 to a 2.332 PF system in 2009 required both upgrades to our 40,000 square foot facility and scaling out of our supporting infrastructure. Power requirements necessitated facility upgrades from 8 MW in 2005 to 14 MW in 2009. Power efficiency improvements, such as transitioning from a traditional 208-volt power distribution system to a 480-volt power distribution system, resulted in energy savings estimated at nearly $500,000 while dramatically decreasing the cost of our power distribution installation. Cooling infrastructure required dramatic improvements in efficiency and capacity. Over this four-year period the OLCF's cooling capacity grew from 3,600 tons to 6,600 tons. In order to efficiently cool the Jaguar XT5 system, a new cooling system was designed for the XT5 platform using R-134a refrigerant within the XT5 cabinet and chilled-water heat exchangers to remove the heat. This system requires less than 0.2 Watts of power for cooling for every 1 Watt of power used for computing, substantially better than the industry average ratio of 0.8 : 1.
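A back-of-the-envelope view, using an illustrative 10 MW compute load rather than a figure from this paper, shows why both changes matter: for a fixed load, raising the distribution voltage reduces the current, and hence the resistive distribution losses, roughly as the square of the voltage ratio, and a 0.2 : 1 cooling overhead instead of the 0.8 : 1 industry average cuts the cooling power for the same compute load by a factor of four.

\[
\frac{I_{480}}{I_{208}} = \frac{208}{480} \approx 0.43,
\qquad
\frac{P_{\mathrm{loss},480}}{P_{\mathrm{loss},208}} \approx \left(\frac{208}{480}\right)^{2} \approx 0.19
\]
\[
P_{\mathrm{cooling}} \approx 0.2 \times 10\ \mathrm{MW} = 2\ \mathrm{MW}
\quad \text{vs.} \quad
0.8 \times 10\ \mathrm{MW} = 8\ \mathrm{MW}
\quad \text{(hypothetical 10 MW compute load)}
\]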
In addition to facility enhancements, the OLCF required dramatic scalability and capacity improvements in its supporting infrastructure, particularly in the areas of parallel I/O and archival storage. Scaling from 8 GB/sec of peak I/O performance and 38.5 TB of capacity on Jaguar XT3 in 2005 to over 240 GB/sec and 10 PB of capacity on Jaguar XT5 in 2009 required improvements in parallel I/O software[9], storage systems, and system area network technologies. Archival storage requirements have grown dramatically, from less than 500 TB of data stored in 2005 to over 11 PB as of this writing, requiring improvements in archival storage software[10] and tape archive infrastructure.

3. System Architecture

Jaguar is a descendant of the Red Storm[11] computer system developed by Sandia National Laboratories with Cray and first installed at Sandia in 2004. Jaguar is a massively parallel, distributed-memory system composed of a 2,332 TF Cray XT5, a 263 TF Cray XT4, a 10,000 TB external file system known as Spider, and a cluster of external login, compile, and job submission nodes. Spider's storage servers use […] core Xeon processors and are connected to 48 Data Direct Networks S2A9900[13] disk subsystems. The login cluster is composed of eight quad-socket servers with quad-core AMD Opteron processors and 64 GB of memory per node, running SUSE Linux.

3.1. Node Architecture

The XT5 compute nodes are general-purpose, symmetric multi-processor, shared-memory building blocks designed specifically to support the needs of a broad range of applications. Each node has two AMD Opteron 2435 "Istanbul" six-core processors running at 2.6 GHz, connected to each other through a pair of HyperTransport[14] connections. Each of the Opteron processors has 6 MB of level 3 cache shared among the six cores and a DDR2 memory controller connected to a pair of 4 GB DDR2-800 memory modules. The HyperTransport connections between the processors provide a cache-coherent shared-memory node with twelve cores, 16 GB of memory, and 25.6 GB per second of memory bandwidth.
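These per-node figures can be checked against the component specifications, assuming the standard DDR2-800 transfer rate of 6.4 GB/s per 64-bit channel (800 MT/s × 8 bytes) and the Istanbul core's peak of 4 double-precision floating-point operations per clock:

\[
B_{\mathrm{node}} = 2\ \text{sockets} \times 2\ \text{channels} \times 800\ \tfrac{\mathrm{MT}}{\mathrm{s}} \times 8\ \mathrm{B}
= 25.6\ \tfrac{\mathrm{GB}}{\mathrm{s}}
\]
\[
F_{\mathrm{node}} = 12\ \text{cores} \times 2.6\ \mathrm{GHz} \times 4\ \tfrac{\text{flops}}{\text{cycle}}
= 124.8\ \mathrm{GF\ (peak)}
\]

Multiplying the per-core peak by the 224,256 cores quoted earlier reproduces the 2,332 TF system peak, so the quoted numbers are internally consistent.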
Recommended publications
  • New CSC Computing Resources
  • Petaflops for the People
  • Materials Modelling and the Challenges of Petascale and Exascale
  • Technical Manual for a Coupled Sea-Ice/Ocean Circulation Model (Version 3)
  • Introducing the Cray XMT
  • Survey of Computer Architecture
  • Lessons Learned in Deploying the World's Largest Scale Lustre File System
  • Titan: A New Leadership Computer for Science
  • Musings (Rik Farrow, Opinion)
  • Recent Developments in Supercomputing
  • Cray XT and Cray XE System Overview
  • JSC News No. 166, July 2008