Assessing the Facilities and Systems Supporting HPC at ORNL

Summer 2019
Jim Rogers, Director, Computing and Facilities
National Center for Computational Sciences, Oak Ridge National Laboratory
ORNL is managed by UT-Battelle, LLC for the US Department of Energy

ORNL Leadership Class Systems, 2004 - 2018
[Timeline figure: Cray X1E "Phoenix" (2004, 18.5 TF); Cray XT3 "Jaguar" (2005, 25 TF; 2006, 54 TF); Cray XT4 "Jaguar" (2007, 62 TF; 2008, 263 TF); Cray XT5 "Jaguar" (2008, 1 PF; 2009, 2.5 PF); Cray XK7 "Titan" (2012, 27 PF).]
From 2004 through 2018, these HPC systems relied on chiller-based cooling (5.5°C supply) with annualized PUEs of ~1.4.

ORNL's Transition to Warmer Facility Supply Temperatures
• Titan (Cray XK7, 2012, 27 PF): refrigerant-based per-rack cooling with direct rejection of heat to cold 5.5°C water.
  – Below dewpoint
  – 100% use of chillers
• Summit (IBM, 2018/2019, 200 PF): a combination of direct on-package cooling and RDHX with a 21°C supply is >95% room-neutral.
  – Above dewpoint
  – Contribution by chillers for ~20% of the hours of the year
• Frontier (2021/2022, ~1.5 EF): custom mechanical packaging is >95% room-neutral with a 30°C supply.
  – ~100% evaporative cooling, with supplemental HVAC for parasitic loads

Oak Ridge National Laboratory's Cray XK7 Titan
• Operational from November 2012 through August 1, 2019
• 18,688 compute nodes – 1 AMD Opteron + 1 NVIDIA Kepler GPU per node
• 27 PF (peak); 17.59 PF (HPL); 9.5 MW (peak)
• Delivered >27 billion compute hours over its life
• Titan entered as the #1 supercomputer in the world in November 2012, and was still #12 on the Top500 list at the time of its decommissioning
(See GTC'15 Session S5566, "GPU Errors on HPC Systems.")

Perspective on Titan as an Air-Cooled Supercomputer
• ORNL's Cray XK7 "Titan"
  – 200 cabinets, each with a 3,000 CFM fan, for 600,000 CFM of moved air
  – Dry air has a specific volume of 13.076 cubic feet per pound
  – Titan therefore moves 600,000 / 13.076 / 60 ≈ 765 lbs/sec of air
• Wait. What? Titan moves as much air as a long-range Airbus A340 at cruising altitude.
  – At takeoff, each of the A340's four engines generates 140 kN of thrust and consumes ~1,000 lbs/sec of air
  – At cruise, each engine produces ~29 kN and consumes ~200 lbs/sec of air

Motivation for "Warmer" Cooling Solutions Serving HPC Centers
• Reduced cost, both CAPEX and OPEX
  – Reduce or eliminate the need for traditional chillers; no chillers, no ozone-depleting refrigerants (GREEN)
  – Oak Ridge calculates an annualized PUE for air-cooled devices of no better than 1.4 (Oak Ridge, TN, is in ASHRAE Zone 4A – Mixed Humid)
  – HPC power budgets continue to grow: Summit has a design point of >12 MW (HPC only), so minimizing PUE/ITUE is critical to the budget
• Easier, more reliable design
  – The design is reduced to pumps, evaporative cooling, and heat exchangers
  – Traditional chilled water may not be necessary at all (NREL, NCAR, et al.)
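PUE figures anchor most of the remaining slides, so a short sketch of the arithmetic behind an "annualized PUE" may be useful: total facility energy divided by IT energy, accumulated over a year of samples. The Python below is a minimal illustration only; the flat 9 MW load and 1.4 overhead factor are hypothetical placeholders chosen to mirror the chiller-based ~1.4 figure above, not measured ORNL data.

```python
def annualized_pue(it_kw: list[float], facility_kw: list[float]) -> float:
    """Annualized PUE from paired hourly power samples (kW).

    PUE = total facility energy / IT energy; with uniform sampling the
    energy ratio reduces to a ratio of summed power readings.
    """
    assert len(it_kw) == len(facility_kw)
    return sum(facility_kw) / sum(it_kw)

# Hypothetical example: a steady 9 MW IT load with mechanical overhead
# averaging ~40% of IT power, roughly the chiller-based regime above.
hours = 8760
it = [9000.0] * hours
facility = [p * 1.4 for p in it]  # facility = IT + mechanical loads
print(f"annualized PUE ~ {annualized_pue(it, facility):.2f}")  # ~1.40
```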
OLCF Facilities Supporting Summit
• Titan – 9 MW at heavy load
  – Sits on a 250 lb/ft² raised floor
  – Uses 42°F water and special CDUs (XDPs)
• Summit – 256 compute cabinets on-slab
  – 100% room-neutral design uses RDHX
  – 20 MW warm-water cooling plant using a centralized CDU/secondary loop

21°C / 20 MW / 7,700-Ton Facility System Design
• Summit – demand: 3-10 MW
• Secondary loop – supply 3,300 GPM (12,500 liters/min) at 21°C; return at 29-33°C
  – CPUs and GPUs use cold plates
  – DIMMs and parasitic loads use RDHX
  – Storage and network use RDHX

Cooling Design
• The existing chilled-water cooling loop is joined by a new primary cooling loop back to Summit
• The primary loop uses evaporative cooling towers (~80% of the hours of the year)
• When the MTW return is above the 21°C setpoint, a second set of trim heat exchangers (with 5.5°C chilled water on the other side) drives the MTW back to the 21°C setpoint
• The trim loop is needed for about 20% of the hours in the year and can ramp 0-100% to meet the setpoint
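The trim-loop behavior described on the cooling-design slide, engaging the 5.5°C trim heat exchangers only when the controlled MTW temperature rises above the 21°C setpoint and ramping them 0-100%, is in essence a bounded proportional control. The sketch below illustrates that idea under assumed names and gain; it is not the actual facility PLC program.

```python
def trim_hx_command(mtw_temp_c: float,
                    setpoint_c: float = 21.0,
                    gain_per_c: float = 0.25) -> float:
    """Fraction (0.0-1.0) of trim heat-exchanger capacity to engage.

    Below the setpoint the evaporative towers alone hold the loop
    temperature and the trim loop stays off; above it, the chilled-water
    trim HX ramps proportionally with the error, capped at 100%.
    (Illustrative sketch only; gain and interface are assumptions.)
    """
    error_c = mtw_temp_c - setpoint_c
    if error_c <= 0.0:
        return 0.0
    return min(1.0, gain_per_c * error_c)

for t in (20.5, 21.5, 23.0, 26.0):
    print(f"MTW at {t:4.1f} C -> trim HX at {trim_hx_command(t):.0%}")
```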
Benefits of Warm Water + Operating Dashboards
• Warm water allows an annualized PUE of 1.1
  – ~$1M cost per MW-year for consumption on Summit
  – ~$100k cost per MW-year for waste-heat management
• Integration with the PLC allows us to tune water flow: better delta-T, less pumping energy
• Integration with IBM's OpenBMC allows us to protect these ~40k components from inadequate flow across the cold plates
• Integration with the scheduler allows us to correlate power and temperature data with individual applications
• Additional data streams are to be added, most from the facility PLC

Comparing Energy Performance – Titan to Summit
• Demonstrated performance on Titan (Oct 2012): 17.59 PF; 8.9 MW peak; 8.3 MW average
• Demonstrated performance on Summit (Oct 2018): 143.5 PF; 11,065 kW peak; 9,783 kW average
• 8.16x performance increase for a 17.9% average power increase
• Titan: ~2.1 GFLOPS/Watt; Summit: 14.66 GFLOPS/Watt, a >7x increase in energy efficiency

Summit MTW Cooling Loads and Temperatures – HPL Run 5/24/19, Duration 5:48; PUE during the HPL run = 1.081
[Chart: MTW and CHW cooling loads (kW) and loop temperatures (°F) versus time. Series: MTW kW (cooling), CHW kW (cooling), MTW return temp, MTW supply temp, K100 average space temp, outdoor air wet-bulb temp, IT kW. Annotations: the IT load follows a traditional HPL profile; the supply temperature to Summit stayed rock-solid at 70°F; the OAWBT remained at or above the supply target, affecting ECT performance; the total power load on the energy plant also reflects storage and other items; a small portion of the total load required the use of CHW (trim RDHX).]

OLCF-4 Summit HPL Power Measurements – 10-25-18, 03:31:48 to 09:32:00, measured once per second (21,612 measurements)
[Charts: per-second power and accumulated energy over the run. Idle: 2.97 MW; max power: 11,065 kW; average power: 9,783 kW. Accumulated totals: IT load 58,730 kWh, mechanical load 1,449 kWh, for a PUE of 1.0246. A companion trace compares the power profiles of the June 2018 (122.3 PF) and October 2018 (143.5 PF) HPL runs.]
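The headline figures on the preceding slides can be cross-checked against one another using nothing beyond what is quoted there: average power is accumulated energy over run time, PUE is (IT + mechanical) energy over IT energy, and efficiency is HPL performance over average IT power. A small Python check, assuming only the slide values:

```python
# Figures quoted on the slides above (OLCF-4 Summit, Oct 2018 HPL run).
run_seconds = 6 * 3600 + 12      # 03:31:48 to 09:32:00 -> 21,612 s
it_energy_kwh = 58_730.0         # accumulated IT load
mech_energy_kwh = 1_449.0        # accumulated mechanical (cooling) load
hpl_pflops = 143.5

avg_it_kw = it_energy_kwh / (run_seconds / 3600.0)
pue = (it_energy_kwh + mech_energy_kwh) / it_energy_kwh
gflops_per_watt = (hpl_pflops * 1e6) / (avg_it_kw * 1e3)  # PF->GF, kW->W

print(f"average IT power ~ {avg_it_kw:,.0f} kW")        # ~9,783 kW (slide: 9,783)
print(f"PUE over the run ~ {pue:.4f}")                  # ~1.0247 (slide: 1.0246)
print(f"efficiency       ~ {gflops_per_watt:.1f} GF/W") # ~14.7 (slide: 14.66)
```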
Cooling System Performance – PUE
[Charts: cooling-system PUE at facility supply temperatures ranging from 19°C to 27°C.]

Caution – Secondary (Closed) Loop Concerns Worsen as Temperatures Rise…
• Allowable wetted materials: lead-free copper alloys with less than 15% zinc
• Accelerated corrosion