Benchmarking Optimization Software with Performance Profiles

ARGONNE NATIONAL LABORATORY
9700 South Cass Avenue, Argonne, Illinois 60439

Benchmarking Optimization Software with Performance Profiles

Elizabeth D. Dolan and Jorge J. Moré

Mathematics and Computer Science Division

Preprint ANL/MCS-P, January

Contents

1  Introduction
2  Performance Evaluation
3  Benchmarking Data
4  Case Study: Optimal Control and Parameter Estimation Problems
5  Case Study: The Full COPS
6  Case Study: Linear Programming
7  Conclusions
   Acknowledgments
   References

Benchmarking Optimization Software with Performance Profiles

Elizabeth D. Dolan and Jorge J. Moré

This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing, U.S. Department of Energy, under Contract W-31-109-Eng-38, and by the National Science Foundation Challenges in Computational Science grant CDA and Information Technology Research grant CCR.

Elizabeth D. Dolan: Department of Electrical and Computer Engineering, Northwestern University, and Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois (dolan@mcs.anl.gov).

Jorge J. Moré: Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois (more@mcs.anl.gov).

Abstract

We propose performance profiles, distribution functions for a performance metric, as a tool for benchmarking and comparing optimization software. We show that performance profiles combine the best features of other tools for performance evaluation.

1  Introduction

The benchmarking of optimization software has recently gained considerable visibility. Hans Mittelmann's work on a variety of optimization software has frequently uncovered deficiencies in the software and has generally led to software improvements. Although Mittelmann's efforts have gained the most notice, other researchers have been concerned with the evaluation and performance of optimization codes, as several recent studies attest. The interpretation and analysis of the data generated by the benchmarking process are the main technical issues addressed in this paper.

Most benchmarking efforts involve tables displaying the performance of each solver on each problem for a set of metrics such as CPU time, number of function evaluations, or iteration counts (for algorithms where an iteration implies a comparable amount of work). Failure to display such tables for a small test set would be a gross omission, but they tend to be overwhelming for large test sets. In all cases, the interpretation of the results from these tables is often a source of disagreement.

The quantities of data that result from benchmarking with large test sets have spurred researchers to try various tools for analyzing the data. The solver's average or cumulative total for each performance metric over all problems is sometimes used to evaluate performance. As a result, a small number of the most difficult problems can tend to dominate these results, and researchers must take pains to give additional information. Another drawback is that computing averages or totals for a performance metric necessitates discarding problems for which any solver failed, effectively biasing the results against the most robust solvers. As an alternative to disregarding some of the problems, a penalty value can be assigned for failed solver attempts, but this requires a subjective choice for the penalty. Most researchers choose to report the number of failures only in a footnote or separate table.
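To make these drawbacks concrete, the small sketch below uses invented timing data for three solvers on five problems (none of it from the paper's test sets): a single hard problem dominates the totals, and a single failure forces either discarding that problem for every solver or choosing an arbitrary penalty.

```python
# Invented timing data (seconds); not from the paper's experiments.
import math

times_A = {"p1": 1.0, "p2": 2.0, "p3": 1.5, "p4": 2.5, "p5": 300.0}     # fastest on p1-p4
times_B = {"p1": 4.0, "p2": 5.0, "p3": 6.0, "p4": 7.0, "p5": 150.0}     # fastest only on the hard problem p5
times_C = {"p1": 1.3, "p2": 2.2, "p3": 1.7, "p4": 2.6, "p5": math.inf}  # close to A, but fails on p5

# Totals are dominated by the single hard problem p5:
print(sum(times_A.values()))   # 307.0: A has the worse total even though it is fastest on 4 of 5 problems
print(sum(times_B.values()))   # 172.0

# A failure (solver C on p5) leaves two unattractive options: discard p5 for
# every solver (biasing the comparison against the robust solvers A and B), or
# assign a subjective penalty time to the failed run:
penalty = 3600.0               # arbitrary choice
total_C = sum(t if math.isfinite(t) else penalty for t in times_C.values())
print(total_C)                 # the result depends entirely on the chosen penalty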
To address the shortcomings of the previous approach, some researchers rank the solvers. In other words, they count the number of times that a solver comes in kth place, usually for small values of k. Ranking the solvers' performance for each problem helps prevent a minority of the problems from unduly influencing the results. Information on the size of the improvement, however, is lost.

Comparing the medians and quartiles of some performance metric, for example the difference between solver times, appears to be a viable way of ensuring that a minority of the problems do not dominate the results, but in our testing we have witnessed large leaps in quartile values of a performance metric rather than gradual trends. If only quartile data is used, then information on trends occurring between one quartile and the next is lost, and we must assume that the journey from one point to another proceeds at a moderate pace. Also, in the specific case of contrasting the differences between solver times, the comparison fails to provide any information on the relative size of the improvement. A final drawback is that if results are mixed, interpreting quartile data may be no easier than using the raw data, and dealing with comparisons of more than two solvers might become unwieldy.

The idea of comparing solvers by the ratio of one solver's runtime to the best runtime appears in earlier work, with solvers rated by the percentage of problems for which a solver's time is termed very competitive or competitive. The ratio approach avoids most of the difficulties that we have discussed, providing information on the percent improvement and eliminating the negative effects of allowing a small portion of the problems to dominate the conclusions. The main disadvantage of this approach lies in the authors' arbitrary choice of limits to define the borders of very competitive and competitive.

In Section 2 we introduce performance profiles as a tool for evaluating and comparing the performance of optimization software. The performance profile for a solver is the cumulative distribution function for a performance metric. In this paper we use the ratio of the computing time of the solver versus the best time of all of the solvers as the performance metric.

Section 3 provides an analysis of the test set and solvers used in the benchmark results of Sections 4 and 5. This analysis is necessary to understand the limitations of the benchmarking process. Sections 4 and 5 demonstrate the use of performance profiles with results obtained with the COPS test set. We show that performance profiles eliminate the influence of a small number of problems on the benchmarking process and the sensitivity of results associated with the ranking of solvers. Performance profiles provide a means of visualizing the expected performance difference among many solvers while avoiding arbitrary parameter choices and the need to discard solver failures from the performance data.

We conclude in Section 6 by showing how performance profiles apply to the data of Mittelmann for linear programming solvers. This section provides another case study of the use of performance profiles and also shows that performance profiles can be applied to a wide range of performance data.

2  Performance Evaluation

Benchmark results are generated by running a solver on a set $\mathcal{P}$ of problems and recording information of interest, such as the number of function evaluations and the computing time.
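As a purely hypothetical illustration of such recorded data (the solver names, problem names, and times below are invented for this sketch, not taken from the paper's test sets), the computing times could be stored in a problems-by-solvers array, with failed runs marked explicitly:

```python
# Hypothetical benchmark data layout; NaN marks a failed solver run.
import numpy as np

solvers = ["solverA", "solverB", "solverC"]        # n_s = 3 solvers (invented names)
problems = ["prob1", "prob2", "prob3", "prob4"]    # n_p = 4 problems (invented names)

# times[p, s] = computing time (seconds) of solver s on problem p.
times = np.array([
    [ 1.0,  2.5,  1.2],
    [10.0,  5.0, np.nan],   # solverC failed on prob2
    [ 0.5,  0.6,  0.4],
    [40.0, 80.0, 30.0],
])
```

Recording failures explicitly, rather than discarding the affected problems, is what allows them to be mapped to the parameter $r_M$ introduced below.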
In this section we introduce the notion of a performance profile as a means to evaluate and compare the performance of the solvers on a test set $\mathcal{P}$.

We assume that we have $n_s$ solvers and $n_p$ problems. We are interested in using computing time as a performance measure, although the ideas below can be used with other measures. For each problem $p$ and solver $s$, we define

$$ t_{p,s} = \text{computing time required to solve problem } p \text{ by solver } s. $$

If, for example, the number of function evaluations is the performance measure of interest, set $t_{p,s}$ accordingly.

We require a baseline for comparisons. We compare the performance on problem $p$ by solver $s$ with the best performance by any solver on this problem; that is, we use the performance ratio

$$ r_{p,s} = \frac{t_{p,s}}{\min\{\, t_{p,s} : 1 \le s \le n_s \,\}} . $$

We assume that a parameter $r_M \ge r_{p,s}$ for all $p$, $s$ is chosen, and that $r_{p,s} = r_M$ if and only if solver $s$ does not solve problem $p$.

The performance of solver $s$ on any given problem may be of interest, but we would like to obtain an overall assessment of the performance of the solver. If we define

$$ \rho_s(\tau) = \frac{1}{n_p}\, \mathrm{size}\{\, p \in \mathcal{P} : r_{p,s} \le \tau \,\} , $$

then $\rho_s(\tau)$ is the probability for solver $s$ that a performance ratio $r_{p,s}$ is within a factor $\tau$ of the best possible ratio. The function $\rho_s$ is the cumulative distribution function for the performance ratio.

We use the term performance profile for the distribution function of a performance metric. Our claim is that a plot of the performance profile reveals all of the major performance characteristics. In particular, if the set of problems $\mathcal{P}$ is suitably large and representative of problems that are likely to occur in applications, then solvers with large probability $\rho_s(\tau)$ are to be preferred.

The term performance profile has also been used for a plot of some performance metric versus a problem parameter. For example, Higham plots the ratio of an estimate of the $l_1$ norm of a matrix $A$, produced by the LAPACK condition number estimator, to $\|A\|_1$. Note that in Higham's use of the term performance profile there is no attempt at determining a distribution function.

The performance profile $\rho_s : \mathbb{R} \to [0,1]$ for a solver is a nondecreasing, piecewise constant function, continuous from the right at each breakpoint. The value of $\rho_s(1)$ is the probability that the solver will win over the rest of the solvers. Thus, if we are interested only in the number of wins, we need only compare the values of $\rho_s(1)$ for all of the solvers.
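The following minimal sketch shows one way the performance ratios $r_{p,s}$ and the profiles $\rho_s(\tau)$ defined above could be computed. It reuses the invented timing array from the previous sketch (repeated here so the block runs on its own) and is not the paper's actual benchmarking code.

```python
# Minimal sketch of computing performance profiles rho_s(tau) as defined above.
# The timing data is the same invented example as before; it does not come
# from the paper's test sets.
import numpy as np

times = np.array([               # times[p, s]; NaN marks a failure
    [ 1.0,  2.5,  1.2],
    [10.0,  5.0, np.nan],
    [ 0.5,  0.6,  0.4],
    [40.0, 80.0, 30.0],
])
n_p, n_s = times.shape

# Performance ratios r[p, s] = t[p, s] / min_s t[p, s] (failures ignored in the min).
best = np.nanmin(times, axis=1, keepdims=True)
r = times / best

# Map failures to a parameter r_M strictly larger than any finite ratio.
r_M = 2.0 * np.nanmax(r)
r = np.where(np.isnan(r), r_M, r)

def rho(s, tau):
    """Fraction of problems on which solver s has performance ratio r[p, s] <= tau."""
    return np.count_nonzero(r[:, s] <= tau) / n_p

# Evaluate each profile on a grid of tau values below r_M (ratios equal to r_M
# correspond to failures, so profiles are usually examined only for tau < r_M).
taus = np.linspace(1.0, 0.99 * r_M, 50)
for s in range(n_s):
    profile = [rho(s, tau) for tau in taus]
    print(f"solver {s}: rho_s(1) = {rho(s, 1.0):.2f}, "
          f"fraction of problems solved = {profile[-1]:.2f}")
```

Plotting the step functions $\rho_s$ for all solvers on one set of axes gives the performance-profile plots used in the case studies later in the paper: the value at $\tau = 1$ shows the fraction of wins, and the height for large $\tau$ shows robustness.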
