No Slide Title

Total Pages: 16

File Type: pdf, Size: 1020 KB

20 Jahre TOP500 mit einem Ausblick auf neuere Entwicklungen (20 years of TOP500, with an outlook on newer developments)
Hans Meuer, Prometeus GmbH & Universität Mannheim, [email protected]
ZKI Herbsttagung in Leipzig, 11. - 12. September 2012

Outline
• Mannheim Supercomputer Statistics 1986 - 1992 & TOP500 Project Start in 1993
• 39 TOP500 Lists / Highlights of the 39th List and Developments
• Competition between Countries, Manufacturers and Sites
• Linpack Pros and Cons & Alternative Supercomputer Rankings
• HPC & Cloud Computing
• Summary

Mannheim Supercomputer Statistics & TOP500 Project Start in 1993

Mannheim Supercomputer Statistics 1986 – 1992
• Presented at ISC'86 – ISC'92 / Mannheim Supercomputer Seminars
• Counting of supercomputers in the world
• 530 systems in the year 1992

Manufacturer Share
[Chart: manufacturer share of the counted systems, 1986–1992; vendors shown: Cray, CDC, Fujitsu, Hitachi, NEC]

Countries Share
[Chart: country share of the counted systems, 1986–1992; shown: USA, Japan, UK, France, Germany, other European countries, others]

Deficits of the Mannheim Supercomputer Statistics
• Not very reliable
• Vector supercomputers only: "numerical" / mainly "scientific and engineering" applications
• Definition of "supercomputer" necessary

TOP500 Procedure
• Listing of the 500 most powerful computers in the world
• Yardstick: Rmax of Linpack; solve Ax = b for a dense problem where the matrix is random
• Updated twice a year: at ISC'xy in June in Germany and at SC'xy in November in the U.S.
• All information is available from the TOP500 web server at www.top500.org
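The Linpack yardstick above can be illustrated with a minimal sketch: time a dense random solve of Ax = b and convert the operation count into Gflop/s. This is not the official HPL benchmark used for TOP500 submissions, only a toy version of the same idea; the problem size n is an arbitrary choice.

```python
# Toy version of the Linpack idea behind Rmax: time a dense random Ax = b
# solve and report Gflop/s. Not the official HPL code; n is arbitrary.
import time
import numpy as np

def linpack_like_gflops(n=4096, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.random((n, n))
    b = rng.random(n)
    t0 = time.perf_counter()
    x = np.linalg.solve(A, b)                 # LU factorization + triangular solves
    elapsed = time.perf_counter() - t0
    flops = 2.0 / 3.0 * n**3 + 2.0 * n**2     # operation count used by Linpack
    residual = np.linalg.norm(A @ x - b)      # sanity check on the solution
    return flops / elapsed / 1e9, residual

if __name__ == "__main__":
    gflops, res = linpack_like_gflops()
    print(f"~{gflops:.1f} Gflop/s, residual {res:.2e}")
```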
Top500 Authors
• The project was started in spring 1993 by Hans W. Meuer and Erich Strohmaier
• Authors since 06/1993: Jack Dongarra; since 11/2000: Horst Simon

Dongarra's TOP500 List of World's Fastest Supercomputers Released at Mannheim Conference
"CRPC researcher Jack Dongarra of the University of Tennessee and Oak Ridge National Laboratory is one of three renowned computer scientists who assemble the legendary TOP500 List of the world's fastest supercomputers. Released twice a year since 1993, the list features sites with the most powerful computer systems, determined with information from a questionnaire sent to high-performance computer (HPC) experts, computational scientists, manufacturers, and the Internet community at large. Dongarra, Hans Meuer of the University of Mannheim, and Erich Strohmaier of the University of Tennessee released their June 1999 TOP500 List at the 14th Mannheim Supercomputing Conference and Seminar, held June 10-12 in Mannheim, Germany."
Source: http://www.crpc.rice.edu/WhatsNew/top500.html (The Center for Research on Parallel Computation at Rice University)

39 TOP500 Lists / Highlights of the 39th List and Developments

38th List: The TOP10 – Nov 2011. Each entry: rank. system (manufacturer: configuration), site, country: cores, Rmax, power.
1. K Computer (Fujitsu: SPARC64 VIIIfx 2.0 GHz, Tofu interconnect), RIKEN Advanced Institute for Computational Science, Japan: 795,024 cores, 10.51 Pflops, 12.66 MW
2. Tianhe-1A (NUDT: NUDT TH MPP, Xeon 6C, NVidia, FT-1000 8C), National SuperComputer Center in Tianjin, China: 186,368 cores, 2.566 Pflops, 4.04 MW
3. Jaguar (Cray: Cray XT5, HC 2.6 GHz), Oak Ridge National Laboratory, USA: 224,162 cores, 1.759 Pflops, 6.95 MW
4. Nebulae (Dawning: TC3600 Blade, Intel X5650, NVidia Tesla C2050 GPU), National Supercomputing Centre in Shenzhen, China: 120,640 cores, 1.271 Pflops, 2.58 MW
5. TSUBAME-2 (NEC/HP: HP ProLiant, Xeon 6C, NVidia, Linux/Windows), GSIC, Tokyo Institute of Technology, Japan: 73,278 cores, 1.192 Pflops, 1.40 MW
6. Cielo (Cray: Cray XE6, 8C 2.4 GHz), DOE/NNSA/LANL/SNL, USA: 142,272 cores, 1.110 Pflops, 3.98 MW
7. Pleiades (SGI: SGI Altix ICE 8200EX/8400EX), NASA/Ames Research Center/NAS, USA: 111,104 cores, 1.088 Pflops, 4.10 MW
8. Hopper (Cray: Cray XE6, 6C 2.1 GHz), DOE/SC/LBNL/NERSC, USA: 153,408 cores, 1.054 Pflops, 2.91 MW
9. Tera 100 (Bull: bullx super-node S6010/S6030), Commissariat a l'Energie Atomique (CEA), France: 138,368 cores, 1.050 Pflops, 4.59 MW
10. Roadrunner (IBM: BladeCenter QS22/LS21), DOE/NNSA/LANL, USA: 122,400 cores, 1.042 Pflops, 2.34 MW

39th List: The TOP10 – Jun 2012. Each entry: rank. system (manufacturer: configuration), site, country: cores, Rmax, power.
1. Sequoia (IBM: BlueGene/Q, Power BQC 16C 1.6 GHz, Custom), Lawrence Livermore National Laboratory, USA: 1,572,864 cores, 16.3 Pflops, 7.89 MW
2. K Computer (Fujitsu: SPARC64 VIIIfx 2.0 GHz, Tofu interconnect), RIKEN Advanced Institute for Computational Science, Japan: 795,024 cores, 10.5 Pflops, 12.66 MW
3. Mira (IBM: BlueGene/Q, Power BQC 16C 1.6 GHz, Custom), Argonne National Laboratory, USA: 786,432 cores, 8.16 Pflops, 3.95 MW
4. SuperMUC (IBM: iDataPlex DX360M4, Xeon E5 8C 2.7 GHz, Infiniband FDR), Leibniz Rechenzentrum, Germany: 147,456 cores, 2.90 Pflops, 3.52 MW
5. Tianhe-1A (NUDT: NUDT TH MPP, Xeon 6C, NVidia, FT-1000 8C), National SuperComputer Center in Tianjin, China: 186,368 cores, 2.57 Pflops, 4.04 MW
6. Jaguar (Cray: Cray XK6, Opteron 16C 2.2 GHz, Gemini, NVIDIA 2090), Oak Ridge National Laboratory, USA: 298,592 cores, 1.94 Pflops, 5.14 MW
7. Fermi (IBM: BlueGene/Q, Power BQC 16C 1.6 GHz, Custom), CINECA, Italy: 163,840 cores, 1.73 Pflops, 0.82 MW
8. JuQUEEN (IBM: BlueGene/Q, Power BQC 16C 1.6 GHz, Custom), Forschungszentrum Juelich (FZJ), Germany: 131,072 cores, 1.38 Pflops, 0.66 MW
9. Curie thin nodes (Bull: Bullx B510, Xeon E5 8C 2.7 GHz, Infiniband QDR), Commissariat a l'Energie Atomique, CEA/TGCC-GENCI, France: 77,184 cores, 1.36 Pflops, 2.25 MW
10. Nebulae (Dawning: TC3600 Blade, Intel X5650, NVidia Tesla C2050 GPU), National Supercomputing Centre in Shenzhen, China: 120,640 cores, 1.27 Pflops, 2.58 MW

Replacement Rate
[Chart: number of systems replaced on each new TOP500 list, 1993–2012; the most recent list is annotated with 170]
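The TOP10 tables above quote both Rmax and power draw, so the implied energy efficiency follows directly; a small illustrative sketch using a few systems from the June 2012 table (values copied from the table above):

```python
# Energy efficiency implied by the June 2012 TOP10 table above:
# Rmax [Pflops] divided by power [MW] gives Gflop/s per watt.
top10_jun12 = {  # system: (Rmax in Pflops, power in MW), from the table above
    "Sequoia":    (16.3, 7.89),
    "K Computer": (10.5, 12.66),
    "Mira":       (8.16, 3.95),
    "SuperMUC":   (2.90, 3.52),
    "JuQUEEN":    (1.38, 0.66),
}

for name, (rmax_pflops, power_mw) in top10_jun12.items():
    gflops_per_watt = (rmax_pflops * 1e6) / (power_mw * 1e6)  # Gflop/s per W
    print(f"{name:10s}: {gflops_per_watt:.2f} Gflop/s per watt")
```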
Performance Development
[Chart: performance development 1993–2012 on a logarithmic scale. Start values in 1993: SUM 1.17 Tflop/s, N=1 59.7 Gflop/s, N=500 400 Mflop/s. June 2012 values: SUM 123.03 Pflop/s, N=1 16.32 Pflop/s (IBM Sequoia, BlueGene/Q), N=500 60.82 Tflop/s. Earlier N=1 systems marked: Fujitsu NWT (NAL), Intel ASCI Red (Sandia), IBM ASCI White (LLNL), NEC Earth Simulator, IBM BlueGene/L, Roadrunner, Fujitsu K Computer; a notebook line is drawn for comparison.]

Performance Projection
[Chart: performance projection to 2020 obtained by extrapolating the 1993–2012 trend lines; the SUM and N=1 lines reach 1 Eflop/s around 2018–2020; annotated lags of 6–8 years and 8–10 years separate the notebook line from the TOP500 curves.]
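The projection above is a straight-line extrapolation on the logarithmic scale. The same idea in a short sketch: fit exponential growth to two anchor points read off the performance-development chart (N=1 at 59.7 Gflop/s in 1993 and 16.32 Pflop/s in 2012) and ask when the trend crosses 1 Eflop/s; the result is only indicative.

```python
# Straight-line-on-a-log-scale projection: fit exponential growth between two
# anchor points from the chart above and extrapolate to 1 Eflop/s.
import math

def year_reaching(p_start, year_start, p_end, year_end, target):
    growth = (p_end / p_start) ** (1.0 / (year_end - year_start))  # factor per year
    years_ahead = math.log(target / p_end) / math.log(growth)
    return year_end + years_ahead, growth

eflop_in_gflops = 1e9
year, growth = year_reaching(59.7, 1993, 16.32e6, 2012, eflop_in_gflops)
print(f"N=1 grows ~{growth:.2f}x per year; 1 Eflop/s around {year:.0f}")  # ~2018
```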
Competition between Countries, Manufacturers and Sites

39th List: Countries / System Share
[Pie chart: United States 50%, China 14%, Japan 7%, United Kingdom 5%, France 4%, Germany 4%, Canada 2%, Italy 2%, others 12%]

Performance of Countries
[Chart: total performance in Tflop/s (logarithmic scale) per country/region, 2000–2012: US, EU, Japan, China]

Asian Countries
[Chart: number of TOP500 systems in Asian countries, 1993–2012: Japan; South Korea; China; India; others]

European Countries
[Chart: number of TOP500 systems in European countries, 2000–2012: Germany, United Kingdom, France, Italy, Netherlands, Switzerland, Sweden, Spain, Russia, Poland, others]

39th List: Vendors / System Share
[Pie chart: IBM 213 systems (43%), HP 137 (27%), Cray Inc. 27 (6%), Appro 19 (4%), SGI 16 (3%), Bull 16 (3%), Dell 12 (3%), Oracle 7 (1%), Fujitsu 7 (1%), others 46 (9%)]

39th List: Vendors (TOP50) / System Share
[Pie chart: IBM 22 systems (44%), Cray Inc 7 (14%), Bull SA 4 (8%), Appro 3 (6%), others 14 (28%)]

Top20 Sites since 1993 (rank, site, country, overall share 1993–2012)
1. Lawrence Livermore National Laboratory, USA: 5.50%
2. Los Alamos National Laboratory, USA: 3.30%
3. Sandia National Laboratories, USA: 2.61%
4. Government, USA: 2.28%
5. DOE/SC/Oak Ridge National Laboratory, USA: 2.18%
6. Japan Agency for Marine-Earth Science and Technology, Japan: 1.60%
7. NASA/Ames Research Center/NAS, USA: 1.47%
8. National Aerospace Laboratory of Japan, Japan: 1.29%
9. Forschungszentrum Juelich (FZJ), Germany: 1.25%
10. NCSA, USA: 1.07%
11. DOE/SC/LBNL/NERSC, USA: 1.07%
12. ECMWF, U.K.
Recommended publications
  • Performance Evaluation of the Intel Sandy Bridge Based NASA Pleiades Using Scientific and Engineering Applications
    Performance Evaluation of the Intel Sandy Bridge Based NASA Pleiades Using Scientific and Engineering Applications
    Subhash Saini, Johnny Chang, Haoqiang Jin
    NASA Advanced Supercomputing Division, NASA Ames Research Center, Moffett Field, California 94035-1000, USA
    {subhash.saini, johnny.chang, haoqiang.jin}@nasa.gov
    Abstract: We present a performance evaluation of Pleiades based on the Intel Xeon E5-2670 processor, a fourth-generation eight-core Sandy Bridge architecture, and compare it with the previous third generation Nehalem architecture. Several architectural features have been incorporated in Sandy Bridge: (a) four memory channels as opposed to three in Nehalem; (b) memory speed increased from 1333 MHz to 1600 MHz; (c) ring to connect on-chip L3 cache with cores, system agent, memory controller, and QPI agent and I/O controller to increase the scalability; (d) new AVX unit with wider vector registers of 256 bit; (e) integration of PCI-Express 3.0 controllers into the I/O subsystem on chip; (f) new Turbo Boost version 2.0 where base frequency of processor increased from 2.6 to 3.2 GHz; and (g) QPI link [...]
    [...] based on the Many Integrated Core (code-named Knight's Corner) architecture and Yellowstone [1, 5, 6]. New and extended features of Sandy Bridge architecture are: a) A ring to connect on-chip L3 cache with cores, system agent, memory controller, and QPI agent and I/O controller to increase the scalability. L3 cache per core has been increased from 2 MB to 2.5 MB. b) New micro-ops (L0) cache that caches instructions as they are decoded. The cache is direct mapped and can store 1.5 K micro-ops.
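The per-socket peak implied by the Sandy Bridge figures quoted above (8 cores, 2.6 GHz base clock, 256-bit AVX) can be estimated with a short back-of-the-envelope sketch; the 8 double-precision operations per cycle per core is an assumption added here (a commonly cited figure for this microarchitecture), not something stated in the excerpt.

```python
# Rough peak double-precision estimate for one Xeon E5-2670 socket, using the
# 8 cores and 2.6 GHz base clock quoted in the abstract above.
# Assumption (not in the excerpt): the 256-bit AVX unit sustains 8 DP
# floating-point operations per cycle per core (4-wide add + 4-wide multiply).
cores = 8
base_clock_ghz = 2.6
dp_flops_per_cycle = 8  # assumed

peak_gflops = cores * base_clock_ghz * dp_flops_per_cycle
print(f"Estimated peak DP per socket at base clock: {peak_gflops:.1f} Gflop/s")  # 166.4
```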
  • President Lands at Moffett Field During Silicon Valley Visit
    Fall 2011 - A Quarterly Publication

    Thousands turn out to tour airborne observatory at Ames
    The Stratospheric Observatory for Infrared Astronomy (SOFIA) visited NASA Ames and provided a rare opportunity to tour the airborne observatory in October. News media and Ames employees were invited to tour on Friday, Oct. 14, and the public on Saturday, Oct. 15. On Friday, an estimated 2,500 people, including representatives from CNET, Fox News, KQED (PBS), New Scientist, Space.com, Mountain View Patch and the San Mateo Daily Journal, attended the event. On Saturday, an estimated 5,500 people visited Ames to tour SOFIA. See page 6 for a feature about the SOFIA visit. (NASA photo by Dominic Hart)

    President lands at Moffett Field during Silicon Valley visit
    By Huong Nguyen and Jessica Culler. President Barack Obama's arrival onboard Air Force One on Sunday, Sept. 25, 2011, at Moffett Federal Airfield marked his first landing at NASA Ames. Center Director Pete Worden met President Barack Obama upon his arrival. "I had the honor to meet President Obama when he arrived at Moffett Federal Airfield," said Worden, who, along with San Jose Mayor Chuck Reed and Mountain View Mayor Jac Siegel, greeted the president. "It was fitting that the president came to Silicon Valley to talk about his job creation plan, given how critical Silicon Valley is to the future of the U.S. economy," Worden said. NASA Ames is an integral component of the world-renowned innovation economy and actively participates in the valley's technological and scientific evolution. (Photo credit: Official White House Photo by Pete Souza)
  • Presentación de PowerPoint (PowerPoint presentation)
    Towards Exaflop Supercomputers
    Prof. Mateo Valero, Director of BSC, Barcelona
    Cartagena, Colombia, May 18-20

    National U. of Defense Technology (NUDT) Tianhe-1A
    • Hybrid architecture:
      • Main node:
        • Two Intel Xeon X5670 2.93 GHz 6-core Westmere, 12 MB cache
        • One Nvidia Tesla M2050 448-ALU (16 SIMD units) 1150 MHz Fermi GPU
        • 32 GB memory per node
      • 2048 Galaxy "FT-1000" 1 GHz 8-core processors
    • Number of nodes and cores:
      • 7168 main nodes * (2 sockets * 6 CPU cores + 14 SIMD units) = 186368 cores (not including 16384 Galaxy cores)
    • Peak performance (DP):
      • 7168 nodes * (11.72 GFLOPS per core * 6 CPU cores * 2 sockets + 36.8 GFLOPS per SIMD unit * 14 SIMD units per GPU) = 4701.61 TFLOPS
    • Linpack performance: 2.507 PF (53% efficiency)
    • Power consumption: 4.04 MWatt
    Source: http://blog.zorinaq.com/?e=36

    Top10 (rank. site; computer; processors; Rmax; Rpeak, with Rmax and Rpeak in Gflop/s)
    1. Tianjin, China; Xeon X5670 + NVIDIA; 186368; 2566000; 4701000
    2. Oak Ridge Nat. Lab.; Cray XT5, 6 cores; 224162; 1759000; 2331000
    3. Shenzhen, China; Xeon X5670 + NVIDIA; 120640; 1271000; 2984300
    4. GSIC Center, Tokyo; Xeon X5670 + NVIDIA; 73278; 1192000; 2287630
    5. DOE/SC/LBNL/NERSC; Cray XE6, 12 cores; 153408; 1054000; 1288630
    6. Commissariat a l'Energie Atomique (CEA); Bull bullx super-node S6010/S6030; 138368; 1050000; 1254550
    7. DOE/NNSA/LANL; QS22/LS21 Cluster, PowerXCell 8i / Opteron, Infiniband; 122400; 1042000; 1375780
    8. National Institute for Computational Sciences / University of Tennessee; Cray XT5-HE, 6 cores; 98928; 831700; 1028850
    9. Forschungszentrum Juelich (FZJ); Blue Gene/P Solution; 294912; 825500; 825500
    10. DOE/NNSA/LANL/SNL; Cray XE6, 8-core; 107152; 816600; 1028660

    Looking at the Gordon Bell Prize
    • 1 GFlop/s; 1988; Cray Y-MP; 8 processors: static finite element analysis
    • 1 TFlop/s; 1998; Cray T3E; 1024 processors: modeling of metallic magnet atoms, using a variation of the locally self-consistent multiple scattering method.
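The peak-performance and efficiency arithmetic quoted in the Tianhe-1A slide above can be reproduced directly from the numbers in the excerpt:

```python
# Reproduce the Tianhe-1A arithmetic quoted above (all inputs from the excerpt).
nodes = 7168
cpu_gflops_per_core = 11.72
cpu_cores = 6
sockets = 2
gpu_gflops_per_simd_unit = 36.8
simd_units_per_gpu = 14

peak_tflops = nodes * (cpu_gflops_per_core * cpu_cores * sockets
                       + gpu_gflops_per_simd_unit * simd_units_per_gpu) / 1000.0
linpack_tflops = 2507.0  # 2.507 PF
print(f"Peak: {peak_tflops:.2f} TFLOPS")                          # ~4701 (slide: 4701.61)
print(f"Linpack efficiency: {linpack_tflops / peak_tflops:.0%}")  # ~53%
```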
  • The Outer Limits
    Welcome to the outer limits
    Budapest, March 19, 2013
    M.Sc. Jiří Hlaváč, HPC consultant + sales manager for CEE, [email protected]

    Jiri Hlavac (in Czech, Jiří Hlaváč): 51 years old, 4 children …
    • MSc. Computers (1986)
    • Development of PC OSs for Tesla Czech (1986-1989)
    • Own SW company (1986-1991)
    • Owner, SGI Distributor @ Czechoslovakia (1991-1995)
    • Employee @ SGI Czech office (1995-now): Technical Director, Academic Sales, Enterprise Sales
    • HPC Consultant (2001-2011)
    • Sales Manager for Central + East Europe (2005-now)

    SGI = Experts @ HPC: Structural Mechanics (Implicit), Structural Mechanics (Explicit), Computational Fluid Dynamics, Electro-Magnetics, Computational Chemistry (Quantum Mechanics), Computational Chemistry (Molecular Dynamics), Computational Biology, Seismic Processing, Reservoir Simulation, Rendering / Ray Tracing, Climate / Weather, Ocean Simulation, Data Analytics

    SGI = Focus on every detail (here: power consumption)

    SGI = Frontier @ Research
    • SGI = winner of the last HPCwire Readers' Choice Award (Nov 2012) for "Top Supercomputing Achievement", for SGI's contribution to the NASA Ames Pleiades supercomputer.
    • SGI = winner of the last HPCwire Editor's Choice Award (Nov 2012) for "Best use of HPC in 'edge HPC' application", for Wikipedia historical mapping and exploration on UV 2000.
    • SGI stock is growing

    Advanced Energy Exploration and Production
    Total: World's Largest Commercial HPC System (2.3 PF), SGI ICE

    ©2012 Silicon Graphics International Corp. / Presented Only Under Non-Disclosure Agreement
  • NASA Announces a New Approach to Earth Science Data Analysis 20 April 2010
    NASA Announces A New Approach To Earth Science Data Analysis 20 April 2010 (PhysOrg.com) -- The way we analyze planet Earth than ten hours. will never be the same, thanks to a new initiative at NASA that integrates supercomputers with global NEX uses a new approach for collaboration among satellite observations and sophisticated models of scientists and science teams working to model the the Earth system in an online collaborative Earth system and analyze large Earth observation environment. As part of its celebration of Earth datasets. Using on-line collaboration technologies, Week, NASA unveiled the NASA Earth Exchange NEX will bring together geographically dispersed (NEX) at a “Green Earth” public forum held at the multi-disciplinary groups of scientists focused on NASA Exploration Center, Moffett Field, Calif. global change research. Scientists will be able to build custom project environments containing the By making NEX available, NASA expects to better datasets and software components needed to solve enable scientists to collaboratively conduct complex Earth science problems. These project research and address the impacts of changes in environments, built using virtualization technology, climate and land use patterns on ecosystems. NEX will be highly portable and reusable and will will link NASA’s supercomputing resources with automatically capture the entire analysis process, massive Earth system data sets, and provide a including the data and processing steps required to collection of tools for analysis and visualization. replicate the results in an open and transparent way. For example, results from the processing of "Currently, it can require months for scientists to the global Landsat data would be available to gather and analyze global-scale data sets, due to scientists with the additional expertise required to computing limitations, data storage requirements analyze rates of urbanization, deforestation, or and network bandwidth constraints”, said biodiversity impacts.
  • Ushering in a New Era: Argonne National Laboratory & Aurora
    Ushering in a New Era: Argonne National Laboratory's Aurora System, April 2015

    ANL Selects Intel for World's Biggest Supercomputer: 2-system CORAL award extends IA leadership in extreme scale HPC
    [Diagram: Aurora (Argonne National Laboratory, >180 PF), Theta (Argonne National Laboratory, >8.5 PF), Trinity (NNSA†, >40 PF), Cori (NERSC‡, >30 PF); dates shown: April '14, July '14, April '15; >$200M]
    ‡ Cray* XC* Series at National Energy Research Scientific Computing Center (NERSC). † Cray XC Series at National Nuclear Security Administration (NNSA).

    The Most Advanced Supercomputer Ever Built: an Intel-led collaboration with ANL and Cray to accelerate discovery & innovation
    • >180 PFLOPS (option to increase up to 450 PF), 18X higher performance†
    • >50,000 nodes; prime contractor: Intel; subcontractor: Cray
    • 13 MW, >6X more energy efficient†
    • 2018 delivery
    † Comparison of theoretical peak double precision FLOPS and power consumption to ANL's largest current system, MIRA (10 PFs and 4.8 MW). Source: Argonne National Laboratory and Intel.

    Aurora | Science From Day One! Extreme performance for a broad range of compute and data-centric workloads
    • Focus areas: Transportation (aerodynamics), Biological Science (biofuels / disease control), Renewable Energy (wind turbine design / placement), Materials Science (batteries / solar panels), Computer Science (new programming models, Co-array Fortran)
    • Training: Argonne Training Program on Extreme-Scale Computing; public access; US industry and international

    Aurora | Built on a Powerful Foundation: breakthrough technologies that deliver massive benefits
    • Compute: 3rd Generation Intel® Xeon Phi™ (processor code name: Knights Hill), integrated Intel® Omni-Path Architecture; >17X performance† (FLOPS per node); >12X memory bandwidth†; >30 PB/s aggregate in-package memory bandwidth
    • Interconnect: 2nd Generation Intel® Omni-Path Architecture; >20X faster†; >500 TB/s bi-section bandwidth; >2.5 PB/s aggregate node link bandwidth
    • File System: Intel® Lustre* Software; >3X faster†; >1 TB/s file system throughput; >5X capacity†; >150 TB file system capacity
    Source: Argonne National Laboratory and Intel.
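The daggered comparisons above (18X performance, >6X energy efficiency relative to Mira) follow from the quoted figures; a quick check using 180 PF / 13 MW for Aurora and 10 PF / 4.8 MW for Mira, exactly as given in the footnote:

```python
# Check the footnote comparison: Aurora (>180 PF peak, 13 MW) vs. Mira (10 PF, 4.8 MW).
aurora_pf, aurora_mw = 180.0, 13.0
mira_pf, mira_mw = 10.0, 4.8

perf_ratio = aurora_pf / mira_pf                                   # 18x
efficiency_ratio = (aurora_pf / aurora_mw) / (mira_pf / mira_mw)   # ~6.6x
print(f"Performance: {perf_ratio:.0f}x, energy efficiency: {efficiency_ratio:.1f}x")
```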
  • Performance Evaluation of the Intel Sandy Bridge Based NASA Pleiades Using Scientific and Engineering Applications
    NAS Technical Report: NAS-2015-05
    Performance Evaluation of the Intel Sandy Bridge Based NASA Pleiades Using Scientific and Engineering Applications
    Subhash Saini, Johnny Chang, Haoqiang Jin
    NASA Advanced Supercomputing Division, NASA Ames Research Center, Moffett Field, California 94035-1000, USA
    {subhash.saini, johnny.chang, haoqiang.jin}@nasa.gov
    Abstract: We present a performance evaluation of Pleiades based on the Intel Xeon E5-2670 processor, a fourth-generation eight-core Sandy Bridge architecture, and compare it with the previous third generation Nehalem architecture. Several architectural features have been incorporated in Sandy Bridge: (a) four memory channels as opposed to three in Nehalem; (b) memory speed increased from 1333 MHz to 1600 MHz; (c) ring to connect on-chip L3 cache with cores, system agent, memory controller, and QPI agent and I/O controller to increase the scalability; (d) new AVX unit with wider vector registers of 256 bit; (e) integration of PCI-Express 3.0 controllers into the I/O subsystem on chip; (f) new Turbo Boost version 2.0 where base frequency of processor increased from 2.6 to 3.2 GHz; and (g) QPI link [...]
    [...] based on the Many Integrated Core (code-named Knight's Corner) architecture and Yellowstone [1, 5, 6]. New and extended features of Sandy Bridge architecture are: a) A ring to connect on-chip L3 cache with cores, system agent, memory controller, and QPI agent and I/O controller to increase the scalability. L3 cache per core has been increased from 2 MB to 2.5 MB. b) New micro-ops (L0) cache that caches instructions as they are decoded. The cache is direct mapped and can store 1.5 K micro-ops.
  • Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis
    Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis
    Yu Jung Lo, Samuel Williams, Brian Van Straalen, Terry J. Ligocki, Matthew J. Cordery, Nicholas J. Wright, Mary W. Hall, and Leonid Oliker
    University of Utah, Lawrence Berkeley National Laboratory
    {yujunglo,mhall}@cs.utah.edu, {swwilliams,bvstraalen,tjligocki,mjcordery,njwright,loliker}@lbl.gov
    Abstract. We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI) and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
    Keywords: Roofline, Memory Bandwidth, CUDA Unified Memory
    1 Introduction. The growing complexity of high-performance computing architectures makes it difficult for users to achieve sustained application performance across different architectures.
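The Roofline model that the toolkit characterizes reduces to a single bound: attainable performance is the minimum of the machine's peak compute rate and its peak memory bandwidth multiplied by the kernel's arithmetic intensity. A minimal sketch of that formula (this is not the toolkit's code; the peak numbers are placeholder machine parameters):

```python
# Roofline bound: attainable Gflop/s for a kernel with a given arithmetic
# intensity (flops per byte moved). Placeholder peaks, not measured values.
def roofline(arithmetic_intensity, peak_gflops=166.4, peak_bw_gbs=51.2):
    return min(peak_gflops, peak_bw_gbs * arithmetic_intensity)

for ai in (0.1, 1.0, 4.0, 16.0):
    print(f"AI = {ai:5.1f} flop/byte -> attainable {roofline(ai):7.1f} Gflop/s")
```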
  • Project Status Report High End Computing Capability Strategic Capabilities Assets Program December 10, 2014
    Project Status Report, High End Computing Capability, Strategic Capabilities Assets Program, December 10, 2014
    Dr. Rupak Biswas, Project Manager, NASA Ames Research Center, Moffett Field, CA
    [email protected], (650) 604-4411, www.nasa.gov

    Pleiades Achieves 3.38 Petaflops, Ranks 7th in U.S. and 11th Worldwide on TOP500 List
    • On the November 2014 TOP500 list of the world's most powerful supercomputers, Pleiades ranked 7th in the U.S. and 11th worldwide, with an impressive efficiency of 84.6%. The new list was released on November 17 at the SC14 conference in New Orleans (see slide 11).
    • The HECC Systems team worked with SGI engineers to complete system-wide testing on Pleiades, including a LINPACK benchmark run, prior to the release of new Intel Haswell processors for general production on October 27.
    • During the LINPACK run, Pleiades delivered a sustained performance of 3.38 petaflops using 73% of its compute nodes, representing 89% of the theoretical peak of the overall system. The benchmark was run on 1,024 Haswell nodes, 5,365 Ivy Bridge nodes, and 1,792 Sandy Bridge nodes.
    • In addition to measuring performance, the LINPACK run helped the Systems team detect
    Mission Impact: Enhancing NASA's most powerful supercomputer and running subsequent diagnostic tests enables HECC to validate the entire system, improving stability for all jobs.
    [Figure caption: Pleiades' increased LINPACK benchmark performance since 2008, when the original system was installed at the NASA Advanced Supercomputing (NAS) facility. The initial configuration achieved a sustained performance of 0.487 petaflops and 80% efficiency in benchmark testing. With the latest upgrade, Pleiades achieved 84.6% efficiency on the LINPACK benchmark.]
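The efficiency figures quoted above are the usual LINPACK efficiency, Rmax divided by Rpeak. A small sketch; the excerpt does not state Rpeak, so it is back-computed here from the quoted 3.38 petaflops and 84.6% purely for illustration:

```python
# LINPACK efficiency as used on the TOP500 list: Rmax / Rpeak.
# Rpeak is not given in the excerpt, so it is back-computed from the quoted
# Rmax (3.38 Pflop/s) and efficiency (84.6%) for illustration only.
def linpack_efficiency(rmax_pflops, rpeak_pflops):
    return rmax_pflops / rpeak_pflops

rmax = 3.38        # Pflop/s, from the report excerpt
efficiency = 0.846 # from the report excerpt
implied_rpeak = rmax / efficiency
print(f"Implied Rpeak of the benchmarked configuration: {implied_rpeak:.2f} Pflop/s")
print(f"Check: {linpack_efficiency(rmax, implied_rpeak):.1%}")  # 84.6%
```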
  • Technological Forecasting of Supercomputer Development: the March to Exascale Computing
    Portland State University, PDXScholar
    Engineering and Technology Management Faculty Publications and Presentations, 10-2014

    Technological Forecasting of Supercomputer Development: The March to Exascale Computing
    Dong-Joon Lim, Portland State University
    Timothy R. Anderson, Portland State University, [email protected]
    Tom Shott, Portland State University, [email protected]
    Follow this and additional works at: https://pdxscholar.library.pdx.edu/etm_fac (part of the Engineering Commons). Let us know how access to this document benefits you.
    Citation Details: Lim, Dong-Joon; Anderson, Timothy R.; and Shott, Tom, "Technological Forecasting of Supercomputer Development: The March to Exascale Computing" (2014). Engineering and Technology Management Faculty Publications and Presentations. 46. https://pdxscholar.library.pdx.edu/etm_fac/46
    This post-print is brought to you for free and open access. It has been accepted for inclusion in Engineering and Technology Management Faculty Publications and Presentations by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected].

    Technological forecasting of supercomputer development: The march to Exascale computing
    Dong-Joon Lim*, Timothy R. Anderson, Tom Shott
    Dept. of Engineering and Technology Management, Portland State University, USA
    Abstract: Advances in supercomputers have come at a steady pace over the past 20 years. The next milestone is to build an Exascale computer; however, this requires not only speed improvement but also significant enhancements for energy efficiency and massive parallelism. This paper examines the technological progress of supercomputer development to identify the innovative potential of three leading technology paths toward Exascale development: hybrid system, multicore system and manycore system.
  • Pleiades • The NASA Workload • System Build Strategy • Operational Strategy • Tools and Analysis Software • Issues Do We See • What Lustre Does • What We Want
    NASA SuperComputing: A Ground Based Instrument for Exploration and Discovery
    LUG 2015 - Denver, Apr 2015
    Bob Ciotti, Chief Architect / Supercomputing Systems Lead

    Discussion
    • What is Pleiades
    • The NASA Workload
    • System Build Strategy
    • Operational Strategy
    • Tools and Analysis Software
    • Issues Do We See
    • What Lustre Does
    • What We Want

    Supercomputing Support for NASA Missions
    • Agency wide resource
    • Production Supercomputing
    • Focus on availability
    • Machines mostly run large ensembles
    • Some very large calculations (50k)
      – Typically ~500 jobs running
    • Example applications
      • ARMD
        – LaRC: Jet wake vortex simulations, to increase airport capacity and safety
        – GRC: Understanding jet noise simulations, to decrease airport noise
      • ESMD
        – ARC: Launch pad flame trench simulations for Ares vehicle safety analysis
        – MSFC: Correlating wind tunnel tests and simulations of Ares I-X test vehicle
        – ARC/LaRC: High-fidelity CLV flight simulation with detailed protuberances
      • SMD
        – Michigan State: Ultra-high-resolution solar surface convection simulation
        – GSFC: Gravity waves from the merger of orbiting, spinning black holes
      • SOMD
        – JSC/ARC: Ultra-high-resolution Shuttle ascent analysis
      • NESC
        – KSC/ARC: Initial analysis of SRB burn risk in Vehicle Assembly Building (VAB)
    [Figure labels: jet aircraft wake vortices; jet engine noise emissions; orbiting, spinning black holes; detailed CLV geometry; solar surface convection; 2-SRB burn in VAB; Shuttle ascent configuration]

    ECCO – Ocean Modeling
  • Characterizing and Understanding HPC Job Failures Over the 2K-Day Life of IBM Bluegene/Q System
    Characterizing and Understanding HPC Job Failures over the 2K-day Life of IBM BlueGene/Q System
    Sheng Di*, Hanqi Guo*, Eric Pershey*, Marc Snir†, Franck Cappello*†
    *Argonne National Laboratory, IL, USA: [email protected], [email protected], [email protected], [email protected]
    †University of Illinois at Urbana-Champaign, IL, USA: [email protected]
    Abstract: An in-depth understanding of the failure features of HPC jobs in a supercomputer is critical to the large-scale system maintenance and improvement of the service quality for users. In this paper, we investigate the features of hundreds of thousands of jobs in one of the most powerful supercomputers, the IBM Blue Gene/Q Mira, based on 2001 days of observations with a total of over 32.44 billion core-hours. We study the impact of the system's events on the jobs' execution in order to understand the system's reliability from the perspective of jobs and users. The characterization involves a joint analysis based on multiple data sources, including the reliability, availability, and serviceability (RAS) log; job scheduling log; the log regarding each job's physical execution tasks; and the I/O behavior log.
    [...] Blue Joule (UK), also adopt the same Blue Gene/Q system architecture and they are still in operation. Studying the job failure features in a large-scale system is nontrivial in that it involves numerous messages logged from across multiple data sources and many messages are heavily duplicated [2]. In our work, we performed a joint analysis by leveraging four different data sources: the reliability, availability, and serviceability (RAS) log; task execution log; job scheduling log; and I/O behavior log. The RAS log is the most important system log related to system reliability such as node
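The joint analysis across logs described above essentially means correlating records by job and by time window. A hypothetical sketch of that idea in pandas; the file names and column names are invented for illustration and are not the paper's actual data layout:

```python
# Hypothetical sketch: correlate scheduler records with RAS events by time
# window. File names and columns are invented for illustration only.
import pandas as pd

jobs = pd.read_csv("job_scheduling_log.csv", parse_dates=["start", "end"])  # job_id, start, end, exit_status
ras = pd.read_csv("ras_log.csv", parse_dates=["timestamp"])                 # timestamp, severity, component

def ras_events_during(job, ras_events):
    """RAS events whose timestamps fall inside the job's run window."""
    in_window = (ras_events["timestamp"] >= job["start"]) & (ras_events["timestamp"] <= job["end"])
    return ras_events[in_window]

# Count FATAL RAS events overlapping each job: a first step toward relating
# system events to job failures.
jobs["fatal_ras_events"] = jobs.apply(
    lambda job: (ras_events_during(job, ras)["severity"] == "FATAL").sum(),
    axis=1,
)
print(jobs[["job_id", "exit_status", "fatal_ras_events"]].head())
```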