HPC Challenge (HPCC) Benchmark Suite Acknowledgements

Total Page:16

File Type:pdf, Size:1020Kb

HPC Challenge (HPCC) Benchmark Suite Acknowledgements HPC Challenge (HPCC) Benchmark Suite Jack Dongarra Piotr Luszczek Allan Snavely SC07, Reno, Nevada Sunday, November 11, 2007 Session S09 8:30AM-12:00PM, A1 Acknowledgements • This work was supported in part by the Defense Advanced Research Projects Agency (DARPA), the Department of Defense, the National Science Foundation (NSF), and the Department of Energy (DOE) through the DARPA High Productivity Computing Systems (HPCS) program under grant FA8750-04-1-0219 and under Army Contract W15P7T-05-C-D001 • Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government 1 Introduction • HPC Challenge Benchmark Suite – To examine the performance of HPC architectures using kernels with more challenging memory access patterns than HPL – To augment the TOP500 list – To provide benchmarks that bound the performance of many real applications as a function of memory access characteristics ― e.g., spatial and temporal locality – To outlive HPCS • HPC Challenge pushes spatial and temporal boundaries and defines performance bounds TOP500 and HPCC • TOP500 • HPCC – Performance is represented by – Performance is represented by only a single metric multiple metrics – Data is available for an – Benchmark is new — so data is extended time period available for a limited time (1993-2005) period (2003-2007) • Problem: • Problem: There can only be one “winner” There cannot be one “winner” • Additional metrics and statistics • We avoid “composite” benchmarks – Count (single) vendor systems – Perform trend analysis on each list • HPCC can be used to show – Count total flops on each list complicated kernel/ per vendor architecture performance characterizations – Use external metrics: price, ownership cost, power, … – Select some numbers for comparison – Focus on growth trends over time – Use of kiviat charts • Best when showing the differences due to a single independent “variable” • Over time — also focus on growth trends 2 High Productivity Computing Systems (HPCS) Goal: Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2010) s & Ass nalysi essme Impact: A ry R nt Indust &D Performance (time-to-solution): speedup critical national Programming Performance Models security applications by a factor of 10X to 40X Characterization & Prediction Hardware Technology Programmability (idea-to-first-solution): reduce cost and System Architecture time of developing application solutions Software Technology Portability (transparency): insulate research and Ind operational application software from system ustry R&D Anal sment Robustness (reliability): apply all known techniques to ysis & Asses protect against outside attacks, hardware faults, & HPCS Program Focus Areas programming errors Applications: Intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling and biotechnology FillFill thethe CriticalCritical TechnologyTechnology andand CapabilityCapability GapGap TodayToday (late(late 80’s80’s HPCHPC technology)…..technology)…..toto…..Future…..Future (Quantum/Bio(Quantum/Bio Computing)Computing) HPCS Program Phases I-III Productivity Assessment (MIT LL, DOE, DoD, NASA, NSF) Concept System CDR Early Final Review Design PDR SCR DRR Demo Demo Industry Review Milestones 1 2 3 4 5 6 7 Technology Assessment Review HPLS SW SW SW SW MP Peta-Scale Plan Rel 1 Rel 2 Rel 3 Dev Unit Procurements Mission Partner Mission Partner Mission Partner Deliver Units Peta-Scale Dev Commitment System Commitment Application Dev MP Language Dev Year (CY) 0203 04 05 06 07 08 09 10 11 (Funded Five) (Funded Three) Mission Program Reviews Phase I Phase II Phase III Partners Industry R&D Prototype Development Critical Milestones Concept Program Procurements Study 3 HPCS Benchmark and Application Spectrum Execution and System Bounds Execution Development Indicators Indicators Discrete Math 3 Scalable … Compact Apps Execution Graph Pattern Matching Bounds Analysis Graph Analysis Signal Processing Local … DGEMM Linear STREAM Solvers 3 Petascale/s RandomAccess HPCS … Simulation 1D FFT Signal (Compact) Spanning Set Applications of Kernels Processing Global … Linpack PTRANS Simulation Others RandomAccess Classroom 1D FFT … Experiment Future Applications Codes Existing Applications I/O Applications Emerging 8 HPCchallenge Benchmarks (~40) Micro & Kernel (~10) Compact 9 Simulation Benchmarks Applications Applications Simulation Intelligence Reconnaissance SpectrumSpectrum ofof benchmarksbenchmarks provideprovide differentdifferent viewsviews ofof systemsystem •• HPCchallengeHPCchallenge pushespushes spatialspatial andand temporaltemporal boundaries;boundaries; setssets performanceperformance boundsbounds •• ApplicationsApplications drivedrive systemsystem issues;issues; setset legacylegacy codecode performanceperformance boundsbounds •• KernelsKernels andand CompactCompact AppsApps forfor deeperdeeper analysisanalysis ofof executionexecution andand developmentdevelopment timetime Motivation of the HPCC Design FFT DGEMM High HPL Mission Partner Applications PTRANS Temporal Locality Temporal RandomAccess STREAM Low Spatial Locality High 4 Measuring Spatial and Temporal Locality (1/2) Generated by PMaC @ SDSC HPC Challenge Benchmarks 1.00 Select Applications HPL 0.80 Gamess Overflow HYCOM Test3D OOCore 0.60 FFT RFCTH2 AVUS 0.40 CG Temporal locality Temporal 0.20 RandomAccess STREAM 0.00 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 Spatial Locality •• SpatialSpatial andand temportemporalal datadata localitylocality herehere isis forfor oneone node/processornode/processor —— i.e.,i.e., locallylocally oror “in“in thethe small”small” Measuring Spatial and Temporal Locality (2/2) Generated by PMaC @ SDSC High Temporal Locality Good Performance on Cache-based systems HPC Challenge High Spatial Locality Spatial Locality occurs Benchmarks Sufficient Temporal Locality in registers 1.00 Select Applications Satisfactory Performance on HPL 0.80 Gamess Overflow Cache-based systems HYCOM Test3D OOCore 0.60 FFT RFCTH2 AVUS 0.40 CG Temporal locality Temporal 0.20 RandomAccess STREAM 0.00 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 Spatial Locality No Temporal or Spatial Locality High Spatial Locality Poor Performance on Moderate Performance on Cache-based systems Cache-based systems 5 Supercomputing Architecture Issues Standard Parallel Corresponding Performance Computer Architecture Memory Hierarchy Implications CPU CPU CPU CPU Registers RAM RAM RAM RAM Instr. Operands Cache Disk Disk Disk Disk Blocks Network Switch Local Memory Messages CPU CPU CPU CPU Bandwidth Increasing Remote Memory Increasing Programmability RAM RAM RAM RAM Pages Increasing Latency Increasing Increasing Capacity Increasing Disk Disk Disk Disk Disk •• StandardStandard architecturearchitecture producesproduces aa “steep”“steep” multi-layered multi-layered memorymemory hierarchyhierarchy –– ProgrammerProgrammer mustmust managemanage thisthis hierarchyhierarchy toto getget goodgood performanceperformance •• HPCSHPCS technicaltechnical goalgoal –– ProduceProduce aa systemsystem withwith aa “flatter”“flatter” memory memory hierarchyhierarchy thatthat isis easiereasier toto programprogram HPCS Performance Targets HPC Challenge Corresponding HPCS Targets Benchmark Memory Hierarchy (improvement) • Top500: solves a system Registers Ax = b 2 Petaflops Instr. Operands (8x) • STREAM: vector operations Cache 6.5 Petabyte/s A = B + s x C (40x) Blocks Local Memory 0.5 Petaflops • FFT: 1D Fast Fourier bandwidth (200x) Transform latency Messages Z = FFT(X) 64,000 GUPS Remote Memory • RandomAccess: random (2000x) updates Pages T(i) = XOR( T(i), r ) Disk •• HPCSHPCS programprogram hashas developeddeveloped aa newnew suitesuite ofof benchmarksbenchmarks (HPC(HPC Challenge)Challenge) •• EachEach benchmarkbenchmark focusesfocuses onon aa differentdifferent partpart ofof thethe memorymemory hierarchyhierarchy •• HPCSHPCS programprogram performanceperformance targetstargets willwill flattenflatten thethe memorymemory hierarchy,hierarchy, improveimprove realreal applicationapplication performance,performance, andand makemake programmingprogramming easiereasier 6 Official HPCC Submission Process 1. Download Prequesites:Prequesites: ●● CC compilercompiler 2. Install ●● BLASBLAS ● MPI 3. Run ● MPI 4. Upload results ProvideProvide detaileddetailed 5. Confirm via @email@ installationinstallation andand executionexecution environmentenvironment 6. Tune 7. Run 8. Upload results ●● OnlyOnly somesome routinesroutines cancan bebe replacedreplaced ● Data layout needs to be preserved 9. Confirm via @email@ ● Data layout needs to be preserved ●● MultipleMultiple languageslanguages cancan bebe usedused ResultsResults areare immediatelyimmediately availableavailable onon thethe webweb site:site: ●● InteractiveInteractive HTMLHTML ●● XMLXML OptionalOptional ●● MSMS ExcelExcel ●● KiviatKiviat chartscharts (radar(radar plots)plots) HPCC as a Framework (1/2) • Many of the component benchmarks were widely used before the HPC Challenge suite of Benchmarks was assembled – HPC Challenge has been more than a packaging effort – Almost all component benchmarks were augmented from their original form to provide consistent verification and reporting • We stress the importance of running these benchmarks on a single machine — with a single configuration and options – The benchmarks were useful separately for the HPC community, meanwhile – The unified
Recommended publications
  • Investigations of Various HPC Benchmarks to Determine Supercomputer Performance Efficiency and Balance
    Investigations of Various HPC Benchmarks to Determine Supercomputer Performance Efficiency and Balance Wilson Lisan August 24, 2018 MSc in High Performance Computing The University of Edinburgh Year of Presentation: 2018 Abstract This dissertation project is based on participation in the Student Cluster Competition (SCC) at the International Supercomputing Conference (ISC) 2018 in Frankfurt, Germany as part of a four-member Team EPCC from The University of Edinburgh. There are two main projects which are the team-based project and a personal project. The team-based project focuses on the optimisations and tweaks of the HPL, HPCG, and HPCC benchmarks to meet the competition requirements. At the competition, Team EPCC suffered with hardware issues that shaped the cluster into an asymmetrical system with mixed hardware. Unthinkable and extreme methods were carried out to tune the performance and successfully drove the cluster back to its ideal performance. The personal project focuses on testing the SCC benchmarks to evaluate the performance efficiency and system balance at several HPC systems. HPCG fraction of peak over HPL ratio was used to determine the system performance efficiency from its peak and actual performance. It was analysed through HPCC benchmark that the fraction of peak ratio could determine the memory and network balance over the processor or GPU raw performance as well as the possibility of the memory or network bottleneck part. Contents Chapter 1 Introduction ..............................................................................................
    [Show full text]
  • Supercomputers – Prestige Objects Or Crucial Tools for Science and Industry?
    Supercomputers – Prestige Objects or Crucial Tools for Science and Industry? Hans W. Meuer a 1, Horst Gietl b 2 a University of Mannheim & Prometeus GmbH, 68131 Mannheim, Germany; b Prometeus GmbH, 81245 Munich, Germany; This paper is the revised and extended version of the Lorraine King Memorial Lecture Hans Werner Meuer was invited by Lord Laird of Artigarvan to give at the House of Lords, London, on April 18, 2012. Keywords: TOP500, High Performance Computing, HPC, Supercomputing, HPC Technology, Supercomputer Market, Supercomputer Architecture, Supercomputer Applications, Supercomputer Technology, Supercomputer Performance, Supercomputer Future. 1 e-mail: [email protected] 2 e-mail: [email protected] 1 Content 1 Introduction ..................................................................................................................................... 3 2 The TOP500 Supercomputer Project ............................................................................................... 3 2.1 The LINPACK Benchmark ......................................................................................................... 4 2.2 TOP500 Authors ...................................................................................................................... 4 2.3 The 39th TOP500 List since 1993 .............................................................................................. 5 2.4 The 39th TOP10 List since 1993 ...............................................................................................
    [Show full text]
  • Overview of the HPC Challenge Benchmark Suite
    Overview of the HPC Challenge Benchmark Suite Jack J. Dongarra Piotr Luszczek Oak Ridge National Laboratory and University of Tennessee Knoxville University of Tennessee Knoxville Abstract— The HPC Challenge1 benchmark suite has been library development, and comparison of system architectures, released by the DARPA HPCS program to help define the parallel system design, and procurement of new systems. performance boundaries of future Petascale computing systems. This paper is organized as follows: sections II and III give HPC Challenge is a suite of tests that examine the performance of HPC architectures using kernels with memory access patterns motivation and overview of HPC Challenge while section IV more challenging than those of the High Performance Lin- provides more detailed description of the HPC Challenge pack (HPL) benchmark used in the Top500 list. Thus, the suite is tests and section V talks briefly about the scalability of designed to augment the Top500 list, providing benchmarks that the tests; sections VI, VII, and VIII describe the rules, the bound the performance of many real applications as a function of software installation process and some of the current results, memory access characteristics e.g., spatial and temporal locality, and providing a framework for including additional tests. In respectively. Finally, section IX concludes the paper. particular, the suite is composed of several well known compu- II. MOTIVATION tational kernels (STREAM, HPL, matrix multiply – DGEMM, parallel matrix transpose – PTRANS, FFT, RandomAccess, The DARPA High Productivity Computing Systems (HPCS) and bandwidth/latency tests – b eff) that attempt to span high program has initiated a fundamental reassessment of how we and low spatial and temporal locality space.
    [Show full text]
  • The HPC Challenge (HPCC) Benchmark Suite Characterizing a System with Several Specialized Kernels
    The HPC Challenge (HPCC) Benchmark Suite Characterizing a system with several specialized kernels Rolf Rabenseifner [email protected] High Performance Computing Center Stuttgart (HLRS) University of Stuttgart www.hlrs.de SPEC Benchmarking Joint US/Europe Colloquium June 22, 2007 Technical University of Dresden, Dresden, Germany http://www.hlrs.de/people/rabenseifner/publ/publications.html#SPEC2007 1/31 Acknowledgements • Update and extract from SC’06 Tutorial S12: Piotr Luszczek, David Bailey, Jack Dongarra, Jeremy Kepner, Robert Lucas, Rolf Rabenseifner, Daisuke Takahashi: “The HPC Challenge (HPCC) Benchmark Suite” SC06, Tampa, Florida, Sunday, November 12, 2006 • This work was supported in part by the Defense Advanced Research Projects Agency (DARPA), the Department of Defense, the National Science Foundation (NSF), and the Department of Energy (DOE) through the DARPA High Productivity Computing Systems (HPCS) program under grant FA8750-04-1-0219 and under Army Contract W15P7T-05-C- D001 • Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government 2/31 Outline • Overview • The HPCC kernels • Database & output formats • HPCC award • Augmenting TOP500 • Balance Analysis • Conclusions 3/31 • Overview • The Kernels Introduction • Output formats • HPCC awards • Augm. TOP500 • HPC Challenge Benchmark Suite • Balance Analys. – To examine the performance • Conclusions of HPC architectures using kernels with more challenging memory access patterns than
    [Show full text]
  • Parallel Programming (MPI) and Batch Usage (SLURM) Dr
    Parallel Programming (MPI) and Batch Usage (SLURM) Dr. Gabriele Cavallaro Postdoctoral Researcher High Productivity Data Processing Group Jülich Supercomputing Centre Outline • High Performance Computing (HPC) – TOP500 – Architectures of HPC Systems – Message Passing Interface (MPI) Concepts – Collective functions • Batch System – The Batch System Flow • Practicals – Access the JURECA Cluster – Write and Submit a Batch Script Parallel Programming (MPI) and Batch Usage (SLURM) 2 High Performance Computing Parallel Programming (MPI) and Batch Usage (SLURM) 3 What is High Performance Computing? • Wikipedia: ‘redirects from HPC to Supercomputer’ – A supercomputer is a computer at the frontline of contemporary processing capacity with particularly high speed of calculation [1] Wikipedia ‘Supercomputer’ Online • HPC includes work on ‘four basic building blocks’: – Theory (numerical laws, physical models, speed-up performance, etc.) – Technology (multi-core, supercomputers, networks, storages, etc.) – Architecture (shared-memory, distributed-memory, interconnects, etc.) – Software (libraries, schedulers, monitoring, applications, etc.) [2] Introduction to High Performance Computing for Scientists and Engineers Parallel Programming (MPI) and Batch Usage (SLURM) 4 Understanding High Performance Computing • High Performance Computing (HPC) is based on computing resources that enable the efficient use of parallel computing techniques through specific support with dedicated hardware such as high performance cpu/core interconnections HPC network interconnection
    [Show full text]
  • Icl-2014-Report.Pdf
    EDITED BY Sam Crawford DESIGNED BY David Rogers ©2014 Innovative Computing Laboratory. All rights reserved. Any trademarks or registered trademarks that appear in this document are the property of their respective owners. EEO/TITLE IX/AA/SECTION 504 STATEMENT All qualified applicants will receive equal consideration for employment and admissions without regard to race, color, national origin, religion, sex, pregnancy, marital status, sexual orientation, gender identity, age, physical or mental disability, or covered veteran status. Eligibility and other terms and conditions of employment benefits at The University of Tennessee are governed by laws and regulations of the State of Tennessee, and this non-discrimination statement is intended to be consistent with those laws and regulations. In accordance with the requirements of Title VI of the Civil Rights Act of 1964, Title IX of the Education Amendments of 1972, Section 504 of the Rehabilitation Act of 1973, and the Americans with Disabilities Act of 1990, The University of Tennessee affirmatively states that it does not discriminate on the basis of race, sex, or disability in its education programs and activities, and this policy extends to employment by the University. Inquiries and charges of violation of Title VI (race, color, national origin), Title IX (sex), Section 504 (disability), ADA (disability), Age Discrimination in Employment Act (age), sexual orientation, or veteran status should be directed to the Office of Equity and Diversity (OED), 1840 Melrose Avenue, Knoxville, TN 37996-3560, telephone (865) 974-2498 (V/TTY available) or 974-2440. Requests for accommodation of a disability should be directed to the ADA Coordinator at the Office of Equity and Diversity.
    [Show full text]
  • How to Benchmark Your Application
    1967-17 Advanced School in High Performance and GRID Computing 3 - 14 November 2008 How to benchmark your application COZZINI Stefano CNR-INFM Democritos c/o SISSA via Beirut 2-4, 34014 Trieste ITALY AdvancedAdvanced SchoolSchool iinn HHighigh PerformancePerformance andand GGRIDRID ComputingComputing TheThe art art ofof benchmarkingbenchmarking Stefano Cozzini CNR-INFM DEMOCRITOS, Trieste ICTPICTP HPCHPC SchoolSchool 20082008 – Trieste,Trieste, IItalytaly - N Novemberovember 003-14,3-14, 22008008 Agenda/Agenda/ AimsAims Give you the feeling how much is important to know how your system/ application/computational experiment is performing.. Name a few standard benchmarks that can help you in making/taking a decision Show you some tricks and tips how to make your own benchmarking procedure Stop in less than 30 minutes. benchmark:benchmark: aa definition definition a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it from wikipedia threethree importantimportant notes:notes: no single number can reflect overall performance the only benchmark that matters is the intended workload. The purpose of benchmarking is not to get the best results, but to get consistent repeatable accurate results that are also the best results. aa fewfew challengeschallenges inin benchmarking:benchmarking: Benchmarking is not easy and often involves several iterative rounds in order to arrive at predictable, useful conclusions. Interpretation of benchmarking data is also extraordinarily difficult. – Vendors tend to tune their products specifically for industry-standard benchmarks. Use extreme caution in interpreting their results.
    [Show full text]
  • Design and Implementation of High Performance Computing Cluster for Educational Purpose
    Design and Implementation of High Performance Computing Cluster for Educational Purpose Dissertation submitted in partial fulfillment of the requirements for the degree of Master of Technology, Computer Engineering by SURAJ CHAVAN Roll No: 121022015 Under the guidance of PROF. S. U. GHUMBRE Department of Computer Engineering and Information Technology College of Engineering, Pune Pune - 411005. June 2012 Dedicated to My Mother Smt. Kanta Chavan DEPARTMENT OF COMPUTER ENGINEERING AND INFORMATION TECHNOLOGY, COLLEGE OF ENGINEERING, PUNE CERTIFICATE This is to certify that the dissertation titled Design and Implementation of High Performance Computing Cluster for Educational Purpose has been successfully completed By SURAJ CHAVAN (121022015) and is approved for the degree of Master of Technology, Computer Engineering. PROF. S. U. GHUMBRE, DR. JIBI ABRAHAM, Guide, Head, Department of Computer Engineering Department of Computer Engineering and Information Technology, and Information Technology, College of Engineering, Pune, College of Engineering, Pune, Shivaji Nagar, Pune-411005. Shivaji Nagar, Pune-411005. Date : Abstract This project work confronts the issue of bringing high performance computing (HPC) education to those who do not have access to a dedicated clustering en- vironment in an easy, fully-functional, inexpensive manner through the use of normal old PCs, fast Ethernet and free and open source softwares like Linux, MPICH, Torque, Maui etc. Many undergraduate institutions in India do not have the facilities, time, or money to purchase hardware, maintain user accounts, configure software components, and keep ahead of the latest security advisories for a dedicated clustering environment. The projects primary goal is to provide an instantaneous, distributed computing environment. A consequence of provid- ing such an environment is the ability to promote the education of high perfor- mance computing issues at the undergraduate level through the ability to turn an ordinary off the shelf networked computers into a non-invasive, fully-functional cluster.
    [Show full text]
  • Design and Implementation of the HPC Challenge
    Design and Implementation of the November 2006 A HPC Challenge Benchmark Suite Piotr Luszczek, University of Tennessee Jack Dongarra, University of Tennessee, Oak Ridge National Laboratory Jeremy Kepner, MIT Lincoln Lab Introduction The HPCC benchmark suite was initially developed for the DARPA's HPCS program 1 to provide a set of standardized hardware probes based on commonly occurring computational software kernels. The HPCS program has initiated a fundamental reassessment of how we define and measure performance, programmability, portability, robustness and, ultimately, productivity in the high-end domain. Consequently, the suite was aimed to both provide conceptual expression of the underlying computation as well as be applicable to a broad spectrum of computational science fields. Clearly, a number of compromises must have lead to the current form of the suite given such a broad scope of design requirements. HPCC was designed to approximately bound computations of high and low spatial and temporal locality (see Figure 1 which gives the conceptual design space for the HPCC component tests). In addition, because the HPCC tests consist of simple mathematical operations, this provides a unique opportunity to look at language and parallel programming model issues. As such, the benchmark is to serve both the system user and designer communities 2. Finally, Figure 2 shows a generic memory subsystem and how each level of the hierarchy is tested by the HPCC software and what are the design goals of the future HPCS system - these are the projected target performance numbers that are to come out of the wining HPCS vendor designs. Figure 2. HPCS program benchmarks and Figure 1.
    [Show full text]
  • LINPACK Benchmark?Benchmark?
    Center for Information Services and High Performance Computing (ZIH) Performance Analysis of Computer Systems Benchmarks: TOP 500, Stream, and HPCC Nöthnitzer Straße 46 Raum 1026 Tel. +49 351 - 463 - 35048 Holger Brunst ([email protected]) Matthias S. Mueller ([email protected]) Center for Information Services and High Performance Computing (ZIH) Summary of Previous Lecture Nöthnitzer Straße 46 Raum 1026 Tel. +49 351 - 463 - 35048 Holger Brunst ([email protected]) Matthias S. Mueller ([email protected]) Summary of Previous Lecture Different workloads: – Test workload – Real workload – Synthetic workload Historical examples for test workloads: – Addition instruction – Instruction mixes – Kernels – Synthetic programs – Application benchmarks Holger Brunst, Matthias Müller: Leistungsanalyse Excursion on Speedup and Efficiency Metrics Comparison of sequential and parallel algorithms Speedup: T1 Sn = Tn – n is the number of processors – T1 is the execution time of the sequential algorithm – Tn is the execution time of the parallel algorithm with n processors Efficiency: Sp E = p p – Its value estimates how well-utilized p processors solve a given problem – Usually between zero and one. Exception: Super linear speedup (later) Holger Brunst, Matthias Müller: Leistungsanalyse Amdahl’s Law Find the maximum expected improvement to an overall system when only part of the system is improved Serial execution time = s+p Parallel execution time = s+p/n s + p S = n p s + n – Normalizing with respect to serial time (s+p) = 1 results in: • Sn = 1/(s+p/n) – Drops off rapidly as serial fraction increases – Maximum speedup possible = 1/s, independent of n the number of processors! Bad news: If an application has only 1% serial work (s = 0.01) then you will never see a speedup greater than 100.
    [Show full text]
  • Introduction to the HPC Challenge Benchmark Suite∗
    Introduction to the HPC Challenge Benchmark Suite∗ Jack J. Dongarra† Piotr Luszczek‡ December 13, 2004 Abstract trial user community. HPCS program researchers have initiated a fundamental reassessment of how we define The HPC Challenge suite of benchmarks will exam- and measure performance, programmability, portabil- ine the performance of HPC architectures using ker- ity, robustness and ultimately, productivity in the HPC nels with memory access patterns more challenging domain. than those of the High Performance Linpack (HPL) The HPCS program seeks to create trans-Petaflop benchmark used in the Top500 list. The HPC Chal- systems of significant value to the Government HPC lenge suite is being designed to augment the Top500 community. Such value will be determined by assess- list, provide benchmarks that bound the performance ing many additional factors beyond just theoretical peak of many real applications as a function of memory flops (floating-point operations). Ultimately, the goal is access characteristics e.g., spatial and temporal local- to decrease the time-to-solution, which means decreas- ity, and provide a framework for including additional ing both the execution time and development time of an benchmarks. The HPC Challenge benchmarks are scal- application on a particular system. Evaluating the capa- able with the size of data sets being a function of bilities of a system with respect to these goals requires a the largest HPL matrix for a system. The HPC Chal- different assessment process. The goal of the HPCS as- lenge benchmark suite has been released by the DARPA sessment activity is to prototype and baseline a process HPCS program to help define the performance bound- that can be transitioned to the acquisition community aries of future Petascale computing systems.
    [Show full text]
  • Introduction to the HPC Challenge Benchmark Suite
    Introduction to the HPC Challenge Benchmark Suite Piotr Luszczek1, Jack J. Dongarra1,2, David Koester3, Rolf Rabenseifner4,5, Bob Lucas6,7, Jeremy Kepner8, John McCalpin9, David Bailey10, and Daisuke Takahashi11 1University of Tennessee Knoxville 2Oak Ridge National Laboratory 3MITRE 4High Performance Computing Center (HLRS) 5University Stuttgart, www.hlrs.de/people/rabenseifner 6Information Sciences Institute 7University of Southern California 8MIT Lincoln Lab 9IBM Austin 10Lawrence Berkeley National Laboratory 11University of Tsukuba March 2005 Abstract and bandwidth/latency tests – b eff) that attempt to span high and low spatial and temporal locality space. By de- The HPC Challenge1 benchmark suite has been re- sign, the HPC Challenge tests are scalable with the size leased by the DARPA HPCS program to help define the of data sets being a function of the largest HPL matrix performance boundaries of future Petascale computing for the tested system. systems. HPC Challenge is a suite of tests that exam- ine the performance of HPC architectures using kernels with memory access patterns more challenging than 1 High Productivity Computing those of the High Performance Linpack (HPL) bench- mark used in the Top500 list. Thus, the suite is designed Systems to augment the Top500 list, providing benchmarks that The DARPA High Productivity Computing Sys- bound the performance of many real applications as tems (HPCS) [1] is focused on providing a new gener- a function of memory access characteristics e.g., spa- ation of economically viable high productivity comput- tial and temporal locality, and providing a framework ing systems for national security and for the industrial for including additional tests.
    [Show full text]