5.4 Mixed Workload Scheduling
PERCU: A Holistic Method for Evaluating High Performance Computing Systems

William TC Kramer

Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2008-143
http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-143.html
November 5, 2008

Copyright 2008, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

See text of report

PERCU: A Holistic Method for Evaluating High Performance Computing Systems

by William T.C. Kramer

B.S. (Purdue University) 1975
M.S. (Purdue University) 1976
M.E. (University of Delaware) 1986

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Sciences in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor James Demmel, Chair
Professor David Culler
Professor James Siegrist

Fall 2008

The dissertation of William T. C. Kramer is approved:

Chair _________________________________________ Date _____________
______________________________________________ Date _____________
______________________________________________ Date _____________

University of California, Berkeley
Fall 2008

PERCU: A Holistic Method for Evaluating High Performance Computing Systems
© 2008 by William T.C. Kramer

Abstract

PERCU: A Holistic Method for Evaluating High Performance Computing Systems

by William T.C. Kramer
Doctor of Philosophy in Computer Sciences
University of California, Berkeley

Professor James Demmel, Chair
Professor David Culler
Professor James Siegrist

PERCU is a comprehensive evaluation methodology for large-scale systems that expands Performance analysis to include Effective work dispatching, Reliability, Consistency, and Usability. The PERCU approach and its components can be used for initial system assessment as well as for on-going quality assurance of High Performance Computing (HPC) and other systems. PERCU leverages work that has to be done in traditional benchmarking and acquisition approaches by compositing existing data to gain additional insights.

A key contribution is the Sustained System Performance (SSP) concept which uses time-to-solution for assessing the productive work potential of systems for an arbitrary set of applications. The SSP provides a fair way to compare systems deployed at different times and provides a method to assess sustained price performance in a comprehensive manner.

This work also discusses the Effective System Performance (ESP) test, developed to encourage and assess improved job launching and resource management – both important aspects for a productive HPC system. Reliability is the third characteristic of a productive system. This work explores the major causes of failure for very large systems and suggests improved methods for a priori assessment of the reliability of HPC systems. Consistent execution of programs is a metric often overlooked in assessments, but is a key service quality feature. This work shows how lack of consistency impacts quality of service and defines approaches for assessing and improving consistency. Usability is discussed for completeness and as future work.

PERCU can be used, in all or part, and with a limitless scale of detail and effort. At its simplest, it is a framework for holistic evaluation.
In its detail, it introduces a set of methods for measurement of key parameters that impact quality of service on HPC systems. The use and impact of each PERCU element is documented for multiple systems, mostly using systems evaluated at the National Energy Research Scientific Computing (NERSC) Facility.

_____________________________________
Professor James Demmel, Chair

Dedication

This work is dedicated to the two ladies in my life that give me the inspiration to and the joy of excellence. My daughter Victoria, who works harder than anyone I have ever met, from her first days, has a love of learning, a sense of humor and a style that inspire me. My wife Laura is my complete partner and friend, for infinity. She inspires me with her continuous evolution and re-invention of herself, her unselfishness and her intelligence that are beyond anyone I have ever met. To these two outstanding ladies, I dedicate this work and my life as a small token of my thanks and love.

Table Of Contents

CHAPTER 1: INTRODUCTION AND MOTIVATION
  1.1 CHAPTER SUMMARY
  1.2 STEPS TAKEN TO DEVELOP PERCU METHODOLOGY
  1.3 PERCU'S IMPORTANCE TO THE HPC COMMUNITY
  1.4 ORGANIZATION
    1.4.1 Introduction and Motivation
    1.4.2 Comparing Evaluation Requirements
    1.4.3 Sustained System Performance Method
    1.4.4 Practical Use of SSP for HPC Systems
    1.4.5 Effectiveness of Resource Use and Work Scheduling
    1.4.6 Reliability
    1.4.7 Consistency of Performance
    1.4.8 Usability – Something for the Future
    1.4.9 PERCU's Impacts, Conclusions and Observations

CHAPTER 2: COMPARING EVALUATION REQUIREMENTS
  2.1 CHAPTER SUMMARY
  2.2 ANALYSIS METHOD
  2.3 SUMMARY OF EVALUATION FACTOR ANALYSIS
  2.4 OVERALL CATEGORIES OF EVALUATION FACTORS
    2.4.1 Minimum/Mandatory/Baseline Requirements
    2.4.2 Desired/Performance/Non-Mandatory
  2.5 CROSS CUT GROUPINGS
  2.6 CHAPTER CONCLUSION

CHAPTER 3: SUSTAINED SYSTEM PERFORMANCE METHOD
  3.1 CHAPTER SUMMARY
  3.2 THE BASIC SSP CONCEPT
  3.3 BUYING TECHNOLOGY AT THE BEST MOMENT
  3.4 GOOD BENCHMARK TESTS SHOULD SERVE FOUR PURPOSES
  3.5 DEFINITIONS FOR SSP
  3.6 CONSTANTS
  3.7 VARIABLES
  3.8 RUNNING EXAMPLE PART 1 – APPLICATIONS
  3.9 ALIGNING THE TIMING OF THE PHASES
  3.10 RUNNING EXAMPLE PART 2 – SYSTEMS
  3.11 THE COMPOSITE PERFORMANCE FUNCTION (W, P)
  3.12 SSP AND TIME-TO-SOLUTION
  3.13 ATTRIBUTES OF GOOD METRICS
  3.14 RUNNING EXAMPLE PART 3 – HOLISTIC ANALYSIS
  3.15 CHAPTER CONCLUSION

CHAPTER 4: PRACTICAL USE OF SSP FOR HPC SYSTEMS
  4.1 CHAPTER SUMMARY
  4.2 A REAL WORLD PROBLEM, ONCE REMOVED
  4.3 DIFFERENT COMPOSITE