Performance Prediction Based on Inherent Program Similarity
Total Page:16
File Type:pdf, Size:1020Kb
Performance Prediction based on Inherent Program Similarity Kenneth Hostey, Aashish Phansalkarz, Lieven Eeckhouty, Andy Georgesy, Lizy K. Johnz and Koen De Bosscherey yELIS, Ghent University, Belgium zECE, The University of Texas at Austin fkehoste,leeckhou,ageorges,[email protected] faashish,[email protected] ABSTRACT 1. INTRODUCTION A key challenge in benchmarking is to predict the performance of From a benchmark consumer point-of-view, a key challenge is to an application of interest on a number of platforms in order to de- determine the platform that yields the best performance for a given termine which platform yields the best performance. This paper application of interest. Ideally, the user's application of interest proposes an approach for doing this. We measure a number of is his best benchmark. However, in many practical circumstances microarchitecture-independent characteristics from the application the user has to rely on the performance scores of a standardized of interest, and relate these characteristics to the characteristics of benchmark suite for estimating the performance of the application the programs from a previously profiled benchmark suite. Based of interest for two reasons. First, it is too difficult or costly to port on the similarity of the application of interest with programs in the the application program of interest to a wide range of platforms. benchmark suite, we make a performance prediction of the applica- Second, there are many platforms for which the performance needs tion of interest. We propose and evaluate three approaches (normal- to be measured before making a choice about which platform yields ization, principal components analysis and genetic algorithm) to to the best performance for the given application. transform the raw data set of microarchitecture-independent char- A popular tool for estimating performance of an application pro- acteristics into a benchmark space in which the relative distance gram on an unavailable platform is detailed cycle-accurate proces- is a measure for the relative performance differences. We evalu- sor simulation. However, next to not solving the porting problem, ate our approach using all of the SPEC CPU2000 benchmarks and simulation is very time consuming and thus is difficult to use in real hardware performance numbers from the SPEC website. Our practice. framework estimates per-benchmark machine ranks with a 0.89 av- This motivates us to come up with a different solution to this erage and a 0.80 worst case rank correlation coefficient. ubiquitous problem in benchmarking. The methodology proposed in this paper uses the already known performance scores of stan- dardized benchmark suites on the systems of our interest. As a Categories and Subject Descriptors part of our methodology we measure a set of microarchitecture- C.4 [Performance of Systems]: Modeling Techniques, Performance independent characteristics for the new application of interest and attributes relate them to the same characteristics of the benchmarks in the standardized benchmark suite. The microarchitecture-independent characteristics capture the inherent program behavior that is un- General Terms biased towards a particular microarchitecture. We then use the Experimentation, Measurement, Performance knowledge of similarity between the application of interest and the corresponding benchmarks to predict the performance of the appli- cation of interest. In other words, we use the standardized bench- Keywords marks as proxies for our application of interest based on similarity. Performance Modeling, Workload Characterization, Inherent Pro- The key issue in a methodology that uses program similarity gram Behavior based on microarchitecture-independent program characteristics is to determine how differences in microarchitecture-independent char- acteristics translate into differences in performance. We propose and evaluate three approaches for achieving that, namely normal- ization, principal components analysis and a genetic algorithm, of which the genetic algorithm shows to be the most accurate. A ge- netic algorithm learns how to rescale the benchmark space so that Permission to make digital or hard copies of all or part of this work for the Euclidean distance in the benchmark space becomes a more personal or classroom use is granted without fee provided that copies are accurate measure for performance differences when running the not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to benchmarks on a variety of platforms. republish, to post on servers or to redistribute to lists, requires prior specific We evaluate our framework for predicting machine ranks using permission and/or a fee. SPEC published speedup rates that cover various commercial ma- PACT'06, September 16–20, 2006 , Seattle, Washington, USA. chines with different ISAs, compiler settings and microprocessors. Copyright 2006 ACM 159593264X/06/0009 ...$5.00. benchmark space uarch-indep data chars transform data uarch-indep transform chars build data application performance transform of interest numbers benchmark suite predict performance based on programs in neighborhood Figure 1: The framework proposed in this paper for predicting performance based on microarchitecture-independent program characteristics. Current practice, which uses the average rank across all bench- the microarchitecture-independent characteristics using the same marks for predicting ranks for specific applications of interest, a- data transformation matrix as above. This locates the application chieves an average 0.83 and a worst case 0.64 rank correlation co- of interest in the benchmark space. Performance is then predicted efficient for the estimated speedups versus the measured speedups. by appropriately weighting the performance numbers of the bench- Our framework based on inherent program similarity achieves an marks in the neighborhood of the application of interest. average correlation coefficient of 0.89; the worst case correlation We now discuss a number of aspects of this framework: (i) the coefficient that we observe is 0.79. These results demonstrate that microarchitecture-independent characteristics, (ii) how to build the our framework is indeed capable of tracking performance differ- data transformation matrix, and (iii) how to compute a performance ences across platforms with different ISAs, compilers and micro- number for the application of interest. processors. To the best of our knowledge, this paper is the first to propose a methodology for predicting machine ranks for indi- 2.1 Microarchitectureindependent character vidual programs based on microarchitecture-independent program istics similarity. Ideally, the program characteristics that serve as input to our This paper is organized as follows. We first detail on our per- framework should be platform-independent characteristics. In other formance prediction framework. We then present our experimental words, they should be compiler-independent, ISA-independent and setup followed by the evaluation of our framework. Finally, we microarchitecture-independent in order to capture the true inherent discuss related work and conclude. program behavior. Since this is difficult to do, we take a prag- matic approach and use microarchitecture-independent character- istics. The characteristics that we collect are specific to a given 2. PERFORMANCE PREDICTION FRAME ISA and a given compiler, however they are independent of a given WORK microarchitecture, i.e., the characteristics are independent of cache Figure 1 illustrates the framework that we propose in this paper size, branch predictor size, processor core configuration, etc. As for predicting performance based on microarchitecture-independent will be shown in the evaluation section of this paper, these charac- program similarity. The framework assumes a collection of pro- teristics, inspite of being ISA-dependent and compiler-dependent, grams which we call the benchmark suite. For each of these bench- are accurate enough for tracking performance across different plat- marks, we have a collection of microarchitecture-independent char- forms with different ISAs and compilers. acteristics as well as performance numbers on a (number of) plat- Table 1 summarizes the 47 microarchitecture-independent char- form(s). The performance numbers could be obtained from simu- acteristics that we use in this paper. The range of microarchitecture- lation or from real hardware execution. These microarchitecture- independent characteristics is fairly broad in order to cover all ma- independent characteristics along with the performance numbers jor program behaviors such as instruction mix, inherent ILP, work- are then used to build a data transformation matrix — building the ing set sizes, memory strides, branch predictability, etc. Measur- data transformation matrix can also be done without using perfor- ing these program characteristics can be done efficiently through mance numbers, thus using microarchitecture-independent charac- instrumentation which is substantially faster than simulation. We teristics solely (hence the dashed line between the `performance include the following characteristics: numbers' box and the `build data transform' box in Figure 1). Once Instruction mix. We include the percentage of loads, stores, the data transformation matrix is computed, the original microarchi- control transfers, arithmetic operations, integer multiplies and floa-