Application Signature: a New Way to Predict Application Performance Rajat Kumar Todi Iowa State University
Recommended Citation: Todi, Rajat Kumar, "Application Signature: a new way to predict application performance" (2003). Retrospective Theses and Dissertations. 1913. https://lib.dr.iastate.edu/rtd/1913

Application Signature: A new way to predict application performance

by

Rajat Kumar Todi

A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Major: Computer Science

Program of Study Committee:
John Gustafson, Co-major Professor
Gurpur Prabhu, Co-major Professor
Don Heller
Srinivas Aluru
Doug Jacobson

Iowa State University
Ames, Iowa
2003

Copyright © Rajat Kumar Todi, 2003. All rights reserved.

UMI Number: 3279645

UMI Microform 3279645 Copyright 2007 by ProQuest Information and Learning Company.
All rights reserved.

Graduate College
Iowa State University

This is to certify that the doctoral dissertation of Rajat Kumar Todi has met the dissertation requirements of Iowa State University.

Co-major Professor (signature redacted for privacy)
Co-major Professor (signature redacted for privacy)
For the Major Program (signature redacted for privacy)

TABLE OF CONTENTS

List of Figures
List of Tables
Acknowledgments
Abstract

CHAPTER 1 Introduction

CHAPTER 2 Benchmarks
2.1 Introduction
2.2 User Groups of Benchmarks
2.3 Usefulness of Benchmarking

CHAPTER 3 Benchmarks Classification
3.1 Classification Based on Usage
3.2 Benchmark Strategy
3.3 Narrow versus Broad-spectrum Benchmark
3.4 Benchmark Examples
3.4.1 Peak Performance
3.4.2 Linpack
3.4.3 STREAM
3.4.4 SPEC CPU95
3.4.5 SPEC CPU2000
3.4.6 SPEC CPU2004
3.4.7 NPB Benchmarks
3.4.8 SPLASH Benchmarks
3.4.9 GAMESS
3.4.10 Assorted Benchmarks
3.4.11 HINT
3.4.12 Peak FLOPS
3.4.13 Lawrence Livermore Loops
3.4.14 Whetstone
3.4.15 SLALOM Benchmark

CHAPTER 4 Common Problems with Benchmarks
4.1 Benchmarks Won't Follow Moore's Law
4.2 Benchmarks Won't Correlate With Real Applications
4.3 Benchmarks are Redundant
4.3.1 Inter-Benchmark Redundancy
4.3.2 Intra-Benchmark Redundancy
4.4 Past Benchmarks Predict Future Performance
4.5 Other Selected Problems of Benchmark
4.6 Summary

CHAPTER 5 Metrics
5.1 Characteristics of Good Performance Metrics
5.2 Means versus Ends Metrics
5.3 Uniprocessor Performance Metrics
5.3.1 MFLOPS
5.3.2 MIPS
5.3.3 Clock Frequency
5.3.4 QUIPS
5.4 Parallel Processing Performance Metrics
5.4.1 Speedup
5.4.2 Efficiency
5.4.3 Scalability
5.5 Summary

CHAPTER 6 Statistical Background
6.1 Pearson Product Moment Correlation
6.2 Linear Relation
6.3 Spearman's Rank Correlation
6.3.1 A Matlab Example
6.4 The Harmonic Mean
6.5 The Weighted Harmonic Mean

CHAPTER 7 HINT: The Hardware Signature
7.1 Introduction
7.2 Task and Terminology
7.3 An Example using 8-bit Data Type
7.4 Salient features
7.5 Understanding HINT Graphs
7.5.1 Generic HINT Graphs
7.5.2 Classical Memory-Regime Revealing Graph
7.5.3 Varying Precision
7.5.4 Varying Main Memory
7.5.5 Varying Clock Speed
7.5.6 Cache-Dependent and Cache-Independent systems
7.5.7 Dedicated Machine versus Machine with Interrupts
7.5.8 Scalable Parallel Computers
7.5.9 Non-Scalable Parallel Computers
7.5.10 Special-Purpose Computer
7.5.11 Business computer
7.5.12 Serial versus Workstation Clusters
7.5.13 Same Machine Different Operating System
7.5.14 Serial versus Vector Computer
7.5.15 Region of Computation
7.5.16 Superset of Other Benchmarks
7.5.17 Problem Detection using HINT
7.5.18 Identical Machines Varied Performance
7.5.19 Bug in Motherboard's BIOS software
7.5.20 Dual processors Pentium machine with Slow Memory Bandwidth

CHAPTER 8 Application Signature
8.1 History of Application Signature
8.2 What is Application Signature?
8.3 Characteristics of Application Signature
8.4 Modeling Application-Architecture Performance: A Car Transportation Analogy
8.5 Application Performance Model
8.5.1 Hardware Performance Predictors
8.5.2 Application Performance Predictors
8.5.3 The Proposed Computer Design Model
8.6 Experiment Setup
8.6.1 Machines Used
8.6.2 Benchmarks Used
8.7 Summary

CHAPTER 9 Definitions and Notations
9.1 Measured Time, APPMAP Time, and Projected Time
9.2 Validation Strategy for the Models

CHAPTER 10 Model 1: Application Signature Using Instantaneous QUIPS
10.1 Model
10.1.1 Using Instantaneous QUIPS as Application Signature
10.1.2 Using NetQUIPS as Application Signature
10.1.3 Using NetQUIPS and Instantaneous QUIPS Application Signature
10.1.4 Using Correlation Vector as Application Signature
10.2 Results
10.3 Summary

CHAPTER 11 Model 2: Application Signature Using Optimization Method
11.1 Model
11.2 Results

CHAPTER 12 Model 3: Application Signature Using Cache Misses
12.1 Model
12.2 Results

CHAPTER 13 Model 4: Application Signature Using Cache Sensitivity
13.1 Model
13.2 Results

CHAPTER 14 Applications of APPMAP technology
14.1 System Design
14.2 Selecting System on Applications
14.3 Multiprocessor Scheduling
14.4 Utility based Computing
14.5 Power versus Performance
14.6 Chapter Summary

CHAPTER 15 Conclusion and Future Directions
15.1 Original Contributions of the Thesis
15.2 Future Directions

APPENDIX A Cache Memory Subsystem
APPENDIX B HINT Database
APPENDIX C Application Characteristics - I
APPENDIX D Application Characteristics - II
APPENDIX E Machine Characteristics using HINT
APPENDIX F LMBENCH
APPENDIX G Machine Profile Using Stream Benchmark
APPENDIX H More Model 1 Results
APPENDIX I More Model 2 Results

Bibliography

LIST OF FIGURES

1.1 Application Signature Performance Model
2.1 iCOMP Index 2.0 Weightings
4.1 Computation Chemistry vs LINPACK
4.2 Peak FLOPS versus EP benchmark
4.3 Workload Benchmarks (a) Ideal (b) Redundancy in SPEC CFP2000 Benchmarks
4.4 Intra-Benchmark Redundancy in SWIM benchmark of SPEC CFP2000
4.5 Past Benchmarks are used for Future System Design
4.6 Benchmarks emphasize different Problem Sizes
7.1 Problem Solved by HINT: Area to be Bounded under the Curve
7.2 Two Subintervals of One Dimension Integration with 8-bit Data Precision
7.3 Sequence of Hierarchical Refinement of Integral Bounds
7.4 Precision-Limited Last Iteration, 8-bit data
7.5 Memory Cost versus QUIPS
7.6 Generic HINT Graphs
7.7 Memory Regime Revealing Graph
7.8 Varying Precision
7.9 Varying Main Memory
7.10 Varying Clock Speed
7.11 Cache-independent and Cache-dependent System
7.12 Dedicated Machine versus Machine with interrupts
7.13 Scalable Parallel Computer
7.14 Unscalable Parallel Computer
7.15 Special Purpose Computer
7.16 Business Computer
7.17 Serial versus Workstation Cluster
7.18 Linux Cluster
7.19 Serial versus Workstation Cluster
7.20 Serial versus Vector Machine
7.21 Vector versus Parallel Computers
7.22 Region of Computation
7.23 Superset of Other Benchmarks
7.24 Identical Machine Varied Performance
7.25 Mosix Cluster's Identical Nodes Perform Differently
7.26 Bug in Alpha LX motherboard's BIOS
7.27 Serial versus Threaded HINT on Dual Processors 300 MHz Pentium II
8.1 Hypothetical Application Signature for (a) Word Processing Application (b) Computational Fluid Dynamic
8.2 Gustafson's Great Crossover: The crossover of memory and arithmetic performance
8.3 HINT (Double) QUIPS-Time Graph for Machines M1-M8
9.1 Two Different Ranking of Machines k1 and k2 at Memory Points m1 and m2.