Early Evaluation of the Cray XC40 Xeon Phi System 'Theta' at Argonne
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Petaflops for the People
PETAFLOPS SPOTLIGHT: NERSC housands of researchers have used facilities of the Advanced T Scientific Computing Research (ASCR) program and its EXTREME-WEATHER Department of Energy (DOE) computing predecessors over the past four decades. Their studies of hurricanes, earthquakes, NUMBER-CRUNCHING green-energy technologies and many other basic and applied Certain problems lend themselves to solution by science problems have, in turn, benefited millions of people. computers. Take hurricanes, for instance: They’re They owe it mainly to the capacity provided by the National too big, too dangerous and perhaps too expensive Energy Research Scientific Computing Center (NERSC), the Oak to understand fully without a supercomputer. Ridge Leadership Computing Facility (OLCF) and the Argonne Leadership Computing Facility (ALCF). Using decades of global climate data in a grid comprised of 25-kilometer squares, researchers in These ASCR installations have helped train the advanced Berkeley Lab’s Computational Research Division scientific workforce of the future. Postdoctoral scientists, captured the formation of hurricanes and typhoons graduate students and early-career researchers have worked and the extreme waves that they generate. Those there, learning to configure the world’s most sophisticated same models, when run at resolutions of about supercomputers for their own various and wide-ranging projects. 100 kilometers, missed the tropical cyclones and Cutting-edge supercomputing, once the purview of a small resulting waves, up to 30 meters high. group of experts, has trickled down to the benefit of thousands of investigators in the broader scientific community. Their findings, published inGeophysical Research Letters, demonstrated the importance of running Today, NERSC, at Lawrence Berkeley National Laboratory; climate models at higher resolution. -
Ushering in a New Era: Argonne National Laboratory & Aurora
Ushering in a New Era Argonne National Laboratory’s Aurora System April 2015 ANL Selects Intel for World’s Biggest Supercomputer 2-system CORAL award extends IA leadership in extreme scale HPC Aurora Argonne National Laboratory >180PF Trinity NNSA† April ‘15 Cori >40PF NERSC‡ >30PF July ’14 + Theta Argonne National Laboratory April ’14 >8.5PF >$200M ‡ Cray* XC* Series at National Energy Research Scientific Computing Center (NERSC). † Cray XC Series at National Nuclear Security Administration (NNSA). 2 The Most Advanced Supercomputer Ever Built An Intel-led collaboration with ANL and Cray to accelerate discovery & innovation >180 PFLOPS (option to increase up to 450 PF) 18X higher performance† >50,000 nodes Prime Contractor 13MW >6X more energy efficient† 2018 delivery Subcontractor Source: Argonne National Laboratory and Intel. †Comparison of theoretical peak double precision FLOPS and power consumption to ANL’s largest current system, MIRA (10PFs and 4.8MW) 3 Aurora | Science From Day One! Extreme performance for a broad range of compute and data-centric workloads Transportation Biological Science Renewable Energy Training Argonne Training Program on Extreme- Scale Computing Aerodynamics Biofuels / Disease Control Wind Turbine Design / Placement Materials Science Computer Science Public Access Focus Areas Focus US Industry and International Co-array Fortran Batteries / Solar Panels New Programming Models 4 Aurora | Built on a Powerful Foundation Breakthrough technologies that deliver massive benefits Compute Interconnect File System 3rd Generation 2nd Generation Intel® Xeon Phi™ Intel® Omni-Path Intel® Lustre* Architecture Software >17X performance† >20X faster† >3X faster† FLOPS per node >500 TB/s bi-section bandwidth >1 TB/s file system throughput >12X memory bandwidth† >2.5 PB/s aggregate node link >5X capacity† bandwidth >30PB/s aggregate >150TB file system capacity in-package memory bandwidth Integrated Intel® Omni-Path Architecture Processor code name: Knights Hill Source: Argonne National Laboratory and Intel. -
Architectural Trade-Offs in a Latency Tolerant Gallium Arsenide Microprocessor
Architectural Trade-offs in a Latency Tolerant Gallium Arsenide Microprocessor by Michael D. Upton A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering) in The University of Michigan 1996 Doctoral Committee: Associate Professor Richard B. Brown, CoChairperson Professor Trevor N. Mudge, CoChairperson Associate Professor Myron Campbell Professor Edward S. Davidson Professor Yale N. Patt © Michael D. Upton 1996 All Rights Reserved DEDICATION To Kelly, Without whose support this work may not have been started, would not have been enjoyed, and could not have been completed. Thank you for your continual support and encouragement. ii ACKNOWLEDGEMENTS Many people, both at Michigan and elsewhere, were instrumental in the completion of this work. I would like to thank my co-chairs, Richard Brown and Trevor Mudge, first for attracting me to Michigan, and then for allowing our group the freedom to explore many different ideas in architecture and circuit design. Their guidance and motivation combined to make this a truly memorable experience. I am also grateful to each of my other dissertation committee members: Ed Davidson, Yale Patt, and Myron Campbell. The support and encouragement of the other faculty on the project, Karem Sakallah and Ron Lomax, is also gratefully acknowledged. My friends and former colleagues Mark Rossman, Steve Sugiyama, Ray Farbarik, Tom Rossman and Kendall Russell were always willing to lend their assistance. Richard Oettel continually reminded me of the valuable support of friends and family, and the importance of having fun in your work. Our corporate sponsors: Cascade Design Automation, Chronologic, Cadence, and Metasoft, provided software and support that made this work possible. -
Roofline Model Toolkit: a Practical Tool for Architectural and Program
Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis Yu Jung Lo, Samuel Williams, Brian Van Straalen, Terry J. Ligocki, Matthew J. Cordery, Nicholas J. Wright, Mary W. Hall, and Leonid Oliker University of Utah, Lawerence Berkeley National Laboratory {yujunglo,mhall}@cs.utah.edu {swwilliams,bvstraalen,tjligocki,mjcordery,njwright,loliker}@lbl.gov Abstract. We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level paral- lelism. These benchmarks are specialized to quantify the behavior of dif- ferent architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the perfor- mance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture char- acterization with the Roofline model based solely on architectural spec- ifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we in- strument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture. Keywords: Roofline, Memory Bandwidth, CUDA Unified Memory 1 Introduction The growing complexity of high-performance computing architectures makes it difficult for users to achieve sustained application performance across different architectures. -
TOP500 Supercomputer Sites
7/24/2018 News | TOP500 Supercomputer Sites HOME | SEARCH | REGISTER RSS | MY ACCOUNT | EMBED RSS | SUPER RSS | Contact Us | News | TOP500 Supercomputer Sites http://top500.org/blog/category/feature-article/feeds/rss Are you the publisher? Claim or contact Browsing the Latest Browse All Articles (217 Live us about this channel Snapshot Articles) Browser Embed this Channel Description: content in your HTML TOP500 News Search Report adult content: 04/27/18--03:14: UK Commits a 0 0 Billion Pounds to AI Development click to rate The British government and the private sector are investing close to £1 billion Account: (login) pounds to boost the country’s artificial intelligence sector. The investment, which was announced on Thursday, is part of a wide-ranging strategy to make the UK a global leader in AI and big data. More Channels Under the investment, known as the “AI Sector Deal,” government, industry, and academia will contribute £603 million in new funding, adding to the £342 million already allocated in existing budgets. That brings the grand total to Showcase £945 million, or about $1.3 billion at the current exchange rate. The UK RSS Channel Showcase 1586818 government is also looking to increase R&D spending across all disciplines by 2.4 percent, while also raising the R&D tax credit from 11 to 12 percent. This is RSS Channel Showcase 2022206 part of a broader commitment to raise government spending in this area from RSS Channel Showcase 8083573 around £9.5 billion in 2016 to £12.5 billion in 2021. RSS Channel Showcase 1992889 The UK government policy paper that describes the sector deal meanders quite a bit, describing a lot of programs and initiatives that intersect with the AI investments, but are otherwise free-standing. -
MPI on Aurora
AN OVERVIEW OF AURORA, ARGONNE’S UPCOMING EXASCALE SYSTEM ALCF DEVELOPERS SESSION COLLEEN BERTONI, SUDHEER CHUNDURI www.anl.gov AURORA: An Intel-Cray System Intel/Cray machine arriving at Argonne in 2021 Sustained Performance greater than 1 Exaflops 2 AURORA: A High-level View § Hardware Architecture: § Intel Xeon processors and Intel Xe GPUs § Greater than 10 PB of total memory § Cray Slingshot network and Shasta platform § IO • Uses Lustre and Distributed Asynchronous Object Store IO (DAOS) • Greater than 230 PB of storage capacity and 25 TB/s of bandwidth § Software (Intel One API umbrella): § Intel compilers (C,C++,Fortran) § Programming models: DPC++, OpenMP, OpenCL § Libraries: oneMKL, oneDNN, oneDAL § Tools: VTune, Advisor § Python 3 Node-level Hardware The Evolution of Intel GPUs Source: Intel 5 Intel GPUs § Intel Integrated GPUs are used for over a decade in § Laptops (e.g. MacBook pro) § Desktops § Servers § Recent and upcoming integrated generations : § Gen 9 – used in Skylake based nodes § Gen 11 – used in Ice Lake based nodes § Gen 9: Double precision peak performance: 100-300 GF § Low by design due to power and space limits Layout of Architecture components for an Intel Core i7 processor 6700K for desktop systems (91 W TDP, 122 mm) § Future Intel Xe (Gen 12) GPU series will provide both integrated and discrete GPUs 6 Intel GPU Building Blocks EU: Execution Unit Subslice L2 Slice: 24 EUs SIMD FPU Dispatch&I$ L1 Shared Local Memory (64 KB/subslice) SIMD FPU Subslice: 8 EUs Send 8x EU Sampler L2 $ L3 Data Cache Branch Dataport -
Characterizing and Understanding HPC Job Failures Over the 2K-Day Life of IBM Bluegene/Q System
Characterizing and Understanding HPC Job Failures over The 2K-day Life of IBM BlueGene/Q System Sheng Di,∗ Hanqi Guo,∗ Eric Pershey,∗ Marc Snir,y Franck Cappello∗y ∗Argonne National Laboratory, IL, USA [email protected], [email protected], [email protected], [email protected] yUniversity of Illinois at Urbana-Champaign, IL, USA [email protected] Abstract—An in-depth understanding of the failure features of Blue Joule (UK), also adopt the same Blue Gene/Q system HPC jobs in a supercomputer is critical to the large-scale system architecture and they are still in operation. maintenance and improvement of the service quality for users. In Studying the job failure features in a large-scale system is this paper, we investigate the features of hundreds of thousands of jobs in one of the most powerful supercomputers, the IBM nontrivial in that it involves numerous messages logged from Blue Gene/Q Mira, based on 2001 days of observations with a across multiple data sources and many messages are heavily total of over 32.44 billion core-hours. We study the impact of the duplicated [2]. In our work, we performed a joint analysis system’s events on the jobs’ execution in order to understand the by leveraging four different data sources: the reliability, avail- system’s reliability from the perspective of jobs and users. The ability, and serviceability (RAS) log; task execution log; job characterization involves a joint analysis based on multiple data sources, including the reliability, availability, and serviceability scheduling log; and I/O behavior log. The RAS log is the most (RAS) log; job scheduling log; the log regarding each job’s important system log related to system reliability such as node physical execution tasks; and the I/O behavior log. -
June 2012 | TOP500 Supercomputing Sites
PROJECT LISTS STATISTICS RESOURCES NEWS CONTACT SUBMISSIONS LINKS HOME Home Lists June 2012 MANNHEIM, Germany; BERKELEY, Calif.; and KNOXVILLE, Tenn.—For the first time since November 2009, a United Contents States supercomputer sits atop the TOP500 list of the world’s top supercomputers. Named Sequoia, the IBM BlueGene/Q system installed at the Department of Energy’s Lawrence Livermore National Laboratory achieved an impressive 16.32 Release petaflop/s on the Linpack benchmark using 1,572,864 cores. Top500 List Sequoia is also one of the most energy efficient systems on Press Release (PDF) the list, which will be released Monday, June 18, at the 2012 Press Release International Supercomputing Conference in Hamburg, Germany. This will mark the 39th edition of the list, which is List highlights compiled twice each year. Performance Development On the latest list, Fujitsu’s “K Computer” installed at the RIKEN Related Files Advanced Institute for Computational Science (AICS) in Kobe, Japan, is now the No. 2 system with 10.51 Pflop/s on the TOP500 List (XML) Linpack benchmark using 705,024 SPARC64 processing TOP500 List (Excel) A 1.044 persone piace cores. The K Computer held the No. 1 spot on the previous TOP500 Poster Mi piace two lists. questo elemento. Di' che Poster in PDF piace anche a te, prima di The new Mira supercomputer, an IBM BlueGene/Q system at tutti i tuoi amici. Argonne National Laboratory in Illinois, debuted at No. 3, with Drilldown 8.15 petaflop/s on the Linpack benchmark using 786,432 Performance Development cores. The other U.S. -
Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA
Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA Kazuhiko Komatsu, S. Momose, Y. Isobe, O. Watanabe, A. Musa, M. Yokokawa, T. Aoyama, M. Sato, H. Kobayashi Tohoku University 14 November, 2018 SC18 Outline • Background • Overview of SX-Aurora TSUBASA • Performance evaluation • Benchmark performance • Application performance • Conclusions 14 November, 2018 SC18 2 Background • Supercomputers become important infrastructures • Widely used for scien8fic researches as well as various industries • Top1 Summit system reaches 143.5 Pflop/s • Big gap between theore8cal performance and sustained performance ◎ Compute-intensive applica8ons stand to benefit from high peak performance ✖Memory-intensive applica8ons are limited by lower memory performance Memory performance has gained more and more attentions 14 November, 2018 SC18 3 A new vector supercomputer SX-Aurora TSUBASA • Two important concepts of its design • High usability • High sustained performance • New memory integration technology • Realize the world’s highest memory bandwidth • New architecture • Vector host (VH) is attached to vector engines (VEs) • VE is responsible for executing an entire application • VH is used for processing system calls invoked by the applications VH VEVE 14 November, 2018 SC18 5 X86 Linux Vector processor New execution model • Conven1onal model • New execution model Host GPU VH VE Start Exe module load Start processing processing Transparent System call Kernel offload (I/O, etc) execution OS function Kernel Finish execution processing Fisnish processing -
Supercomputers – Prestige Objects Or Crucial Tools for Science and Industry?
Supercomputers – Prestige Objects or Crucial Tools for Science and Industry? Hans W. Meuer a 1, Horst Gietl b 2 a University of Mannheim & Prometeus GmbH, 68131 Mannheim, Germany; b Prometeus GmbH, 81245 Munich, Germany; This paper is the revised and extended version of the Lorraine King Memorial Lecture Hans Werner Meuer was invited by Lord Laird of Artigarvan to give at the House of Lords, London, on April 18, 2012. Keywords: TOP500, High Performance Computing, HPC, Supercomputing, HPC Technology, Supercomputer Market, Supercomputer Architecture, Supercomputer Applications, Supercomputer Technology, Supercomputer Performance, Supercomputer Future. 1 e-mail: [email protected] 2 e-mail: [email protected] 1 Content 1 Introduction ..................................................................................................................................... 3 2 The TOP500 Supercomputer Project ............................................................................................... 3 2.1 The LINPACK Benchmark ......................................................................................................... 4 2.2 TOP500 Authors ...................................................................................................................... 4 2.3 The 39th TOP500 List since 1993 .............................................................................................. 5 2.4 The 39th TOP10 List since 1993 ............................................................................................... -
An Analysis of System Balance and Architectural Trends Based on Top500 Supercomputers
ORNL/TM-2020/1561 An Analysis of System Balance and Architectural Trends Based on Top500 Supercomputers Hyogi Sim Awais Khan Sudharshan S. Vazhkudai Approved for public release. Distribution is unlimited. August 11, 2020 DOCUMENT AVAILABILITY Reports produced after January 1, 1996, are generally available free via US Department of Energy (DOE) SciTech Connect. Website: www.osti.gov/ Reports produced before January 1, 1996, may be purchased by members of the public from the following source: National Technical Information Service 5285 Port Royal Road Springfield, VA 22161 Telephone: 703-605-6000 (1-800-553-6847) TDD: 703-487-4639 Fax: 703-605-6900 E-mail: [email protected] Website: http://classic.ntis.gov/ Reports are available to DOE employees, DOE contractors, Energy Technology Data Ex- change representatives, and International Nuclear Information System representatives from the following source: Office of Scientific and Technical Information PO Box 62 Oak Ridge, TN 37831 Telephone: 865-576-8401 Fax: 865-576-5728 E-mail: [email protected] Website: http://www.osti.gov/contact.html This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal lia- bility or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or rep- resents that its use would not infringe privately owned rights. Refer- ence herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not nec- essarily constitute or imply its endorsement, recommendation, or fa- voring by the United States Government or any agency thereof. -
ALCF Newsbytes Argonne Leadership Computing Facility Argonne National Laboratory
ALCF Newsbytes Argonne Leadership Computing Facility Argonne National Laboratory Volume 2, Issue 3 | Summer 2012 Table of Contents Program Awards 247 Million Hours Program Awards 247 Million Hours .......... 1 of ALCF Supercomputer Time Mira Ranks Third on TOP500 .................... 2 Rapid Procedure Unites Quantum Chemistry with Artificial Intelligence ........ 4 Nine research projects have been awarded 247 million Lithium-Air: The “Mount Everest” processor hours of computing time at the Argonne of Batteries for Electric Vehicles ............... 5 Spotlight: Steve Crusan ............................. 6 Leadership Computing Facility (ALCF) through the ALCF Systems Shine on Green500 ............. 7 U.S. Department of Energy’s (DOE) initiative—the ASCR Mira Shares Graph 500 Top Spot ............... 8 Leadership Computing Challenge (ALCC). Chosen through a Leap to Petascale Workshop in Review ..... 8 peer-review process, the projects selected reflect areas of special interest to the DOE: energy, national emergencies, and broadening the community of researchers capable of Events of Interest Argonne’s Energy Showcase using leadership-class computing resources. Saturday, September 15, 2012 9 a.m. – 4 p.m. Argonne National Laboratory will open its gates to the community for a day of discovery ALCC awards of compute time on Intrepid (the ALCF’s Blue Gene/P) become available and fun for the whole family. The event is free in July for the following recipients: and open to the public. Advance registration is required as attendance will be limited.