Unifying Hardware and Software Benchmarking: A Resource-Agnostic Model
Davide Morelli
University of Pisa, Computer Science Department
June 2015
Ph.D. thesis

Abstract

Lilja (2005) states that "In the field of computer science and engineering there is surprisingly little agreement on how to measure something as fundamental as the performance of a computer system." The field lacks the most fundamental element for sharing measures and results: an appropriate metric to express performance.

Since the introduction of laptops and mobile devices, there has been a strong research focus on the energy efficiency of hardware. Many papers, from both academia and industrial research labs, focus on methods and ideas to lower power consumption in order to lengthen the battery life of portable devices. Much less effort has been spent on defining the responsibility of software in the overall energy consumption of a computational system. Some attempts have been made to describe the energy behaviour of software, but none of them abstracts from the physical machine where the measurements were taken. In our opinion this is a strong drawback, because the results cannot be generalized.

In this work we attempt to bridge the gap between characterization and prediction, of both hardware and software, of performance and energy, in a single unified model. We propose a model designed to be as simple as possible and generic enough to abstract from the specific resource being described or predicted, yet concrete and practical, allowing useful and precise performance and energy predictions. The model applies to the broadest possible set of resources; we focus mainly on time and memory (hence bridging hardware benchmarking and the classical time complexity of algorithms) and on energy consumption. To ensure wide applicability in real-world scenarios, the model is completely black-box: it does not require any information about the source code of the program, and relies only on external metrics such as completion time, energy consumption, or performance counters.

Extending the benchmarking model, we define the notion of experimental computational complexity as the characterization of how resource usage changes as the input size grows. Finally, we define a high-level energy model capable of characterizing the power consumption of computers and clusters in terms of the usage of resources as defined by our benchmarking model.

We tested our model in four experiments:

Expressiveness: we show the close relationship between energy and classical theoretical complexity, also showing that our experimental computational complexity is expressive enough to capture interesting behaviour of programs simply by analysing their resource usage.

Performance prediction: we use the large database of performance measures available on the CPU SPEC website to train our model and predict the performance of the CPU SPEC suite on randomly selected computers.

Energy profiling: we test our model by characterizing and predicting the power usage of a cluster running OpenFOAM while changing the number of active nodes and cores.

Scheduling on heterogeneous systems: applying our performance prediction model to features of programs extracted at runtime, we predict the device on which it is most convenient to execute each program in a heterogeneous system.
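To make the black-box model concrete, here is a minimal sketch in Python of the core idea summarized above: express a target program's measured resource consumption as a linear combination of a small basis of computational patterns, then reuse those machine-independent coefficients to predict consumption on an unseen machine. All numbers and names are illustrative assumptions, and ordinary least squares stands in for the thesis' solvers (which also include simplex and non-negative matrix factorization); this is not the thesis' actual code.

```python
import numpy as np

# Illustrative, made-up measurements (NOT data from the thesis).
# X[i, j] = measured resource consumption (e.g. completion time, joules,
#           or a performance counter) of basis program j on machine i.
X = np.array([
    [1.2, 0.8, 2.0],   # machine 0
    [0.9, 1.1, 1.5],   # machine 1
    [2.1, 1.7, 3.8],   # machine 2
    [1.0, 0.9, 1.9],   # machine 3
])
# y[i] = consumption of the target program on the same machines.
y = np.array([3.1, 2.6, 5.9, 2.9])

# The surrogate: coefficients expressing the target program as a linear
# combination of the basis programs (here fitted via least squares).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Prediction for an unseen machine: benchmark only the basis programs
# there, then combine them with the machine-independent surrogate.
x_new = np.array([1.5, 1.0, 2.4])
print("surrogate coefficients:", w)
print("predicted consumption:", x_new @ w)
```

The notion of experimental computational complexity can be sketched in the same hedged spirit: measure a resource at several input sizes and fit the growth trend, for instance a power law on a log-log scale. Again, the data below is invented for illustration.

```python
import numpy as np

# Made-up measurements: completion time (s) at increasing input sizes.
n = np.array([1_000, 2_000, 4_000, 8_000, 16_000])
t = np.array([0.011, 0.024, 0.051, 0.110, 0.236])

# Fit t ~ c * n^k; the exponent k acts as an empirical analogue of the
# asymptotic growth rate (here k is close to 1, i.e. near-linear).
k, log_c = np.polyfit(np.log(n), np.log(t), 1)
print(f"estimated experimental complexity: time ~ n^{k:.2f}")
```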
Acknowledgements

I want to thank my supervisor Antonio Cisternino, for his constant insight, for ensuring I was following the right path in my research, and for his friendship.

The advice from Andrea Canciani has been extremely important in assessing the methodology in general, and in developing the energy model in particular.

Leonardo Bartoloni helped me understand the probabilistic framework underlying a model based on regression analysis.

The last experiment reported in this thesis was developed with Gabriele Cocco: building on his Ph.D. research, we created a scheduling methodology for heterogeneous platforms.

I am grateful for the support and constant affection I received from my family: my parents, my grandparents, and my sister, who supported and encouraged me throughout this journey.

I wish to thank my wife for sharing her life with me, constantly pushing me to pursue my dreams.

Contents

I Introduction

1 Benchmarking: art or science
  1.1 Introduction
  1.2 Benchmarking is pre-science

2 Lack of a unified model
  2.1 Time and energy
  2.2 Hardware and Software
  2.3 Computational energy models

3 Contributions of this work
  3.1 Contributions
  3.2 Plan of work

4 State of the art: characterization and prediction of performance and energy
  4.1 Hardware benchmarking
    4.1.1 Metrics
    4.1.2 Benchmarks
  4.2 Software characterization
  4.3 Energy characterization
    4.3.1 Power measurement approaches
    4.3.2 Energy consumption of virtual machines
    4.3.3 Energy consumption related to parallelism
    4.3.4 Energy consumption related to time complexity
    4.3.5 Energy management policies
    4.3.6 Coarse grain energy models
  4.4 Scheduling on heterogeneous architectures

II Benchmarking Model

5 High level model
  5.1 Definitions
    5.1.1 Program
    5.1.2 Computational Environment
    5.1.3 Measure of resource consumption
    5.1.4 Computational Pattern
    5.1.5 Solver
  5.2 The model
    5.2.1 Characterizing a target program
    5.2.2 Characterizing the target resource
    5.2.3 A unified HW and SW model
    5.2.4 A resource agnostic model
  5.3 Role of computational patterns
  5.4 Algebraic characterization of X
    5.4.1 X needs to be full rank
    5.4.2 Norm
    5.4.3 Cosine similarity
  5.5 Conclusion

6 Solvers
  6.1 Simplex
    6.1.1 The simplex algorithm
    6.1.2 Geometric interpretation
    6.1.3 Limits
  6.2 Linear regression
    6.2.1 Example
    6.2.2 Linear regression
    6.2.3 Algebraic characterization of the solver
    6.2.4 Hardware and Software intrinsic relationship
    6.2.5 Assumptions and limits
    6.2.6 Residuals analysis
    6.2.7 Algebraic characterization of computational patterns
  6.3 Non negative matrix factorization
    6.3.1 The surrogate
    6.3.2 The predictor
    6.3.3 Hardware and Software intrinsic relationship
    6.3.4 Limits of NMF
  6.4 Conclusion and future work
    6.4.1 Bayesian regression
    6.4.2 Support Vector Regression

7 Experimental complexity of software
  7.1 Characterization through the surrogate
  7.2 Definition of experimental complexity
  7.3 Relation with computational complexity
  7.4 Bottlenecks
    7.4.1 Grace area
  7.5 Compositionality of Surrogates and Experimental complexity
    7.5.1 Linear composition of surrogates and experimental complexity
    7.5.2 Function composition
  7.6 A single number is not enough
  7.7 Conclusions

8 Computational energy model
  8.1 The energy model
    8.1.1 General form
    8.1.2 Fixed number of active machines and limited considered resources
    8.1.3 Automatic characterization of hardware
  8.2 Discussion
  8.3 Conclusions

9 Conclusions

III Experimental validation

10 Validating the expressiveness of the model and experimental complexity using micro-benchmarks
  10.1 Expressiveness of the model
    10.1.1 Using the simplex solver
    10.1.2 Using the linear regression and NMF solvers
  10.2 Experimental computational complexity
    10.2.1 The experimental setup
    10.2.2 ξ is compositional
  10.3 Energy and experimental computational complexity: sorting algorithms
    10.3.1 Experiment setting
    10.3.2 Experimental results
  10.4 Micro-architecture independence: limits of a single dimension
    10.4.1 Different architectures
    10.4.2 Using all the measures in the same model, only 1 program as basis
    10.4.3 Using all the measures in the same model, with a larger basis
    10.4.4 Using the target program as basis
    10.4.5 Discussion
  10.5 Conclusions

11 Validating performance prediction on multiple architectures using CPUSPEC
  11.1 CPUSPEC
    11.1.1 Bias error
  11.2 Predicting completion time using the linear regression solver
    11.2.1 Completion time prediction accuracy
    11.2.2 Fitting residuals, bias, and prediction error
    11.2.3 Discussion
  11.3 Conclusions

12 Validating the energy model for concurrent parallel tasks using OpenFOAM
  12.1 Predict peak power
    12.1.1 Experiment setting
    12.1.2 Modelling power using P_infr+m and P_Δc, without separating P_infr and P_m
    12.1.3 Modelling power using P_infr, P_Δc, and P_m
  12.2 Modelling energy consumption of concurrent programs
  12.3 Conclusions

13 Combining static analysis with the prediction model: best device scheduling in heterogeneous environments
  13.1 Introduction
  13.2 Code analysis and feature extraction
  13.3 Prediction model
  13.4 Experimental validation
    13.4.1 Experiment setup
    13.4.2 Predicting the completion time
    13.4.3 Best-device prediction
    13.4.4 Interpretation of the regression coefficients
    13.4.5 Limits
  13.5 Conclusions

IV Conclusions

14 Contributions

15 Future work

List of Figures

6.1 Convex polytope of solutions in the resource consumption vector space