Estimating Parallel Runtimes for Randomized Algorithms in Constraint Solving

Charlotte Truchet, Alejandro Arbelaez, Florian Richoux, Philippe Codognet

To cite this version: Charlotte Truchet, Alejandro Arbelaez, Florian Richoux, Philippe Codognet. Estimating parallel runtimes for randomized algorithms in constraint solving. Journal of Heuristics, Springer Verlag, 2015, pp. 1-36. DOI: 10.1007/s10732-015-9292-3. HAL Id: hal-01248168 (https://hal.archives-ouvertes.fr/hal-01248168), submitted on 24 Dec 2015.

Abstract

This paper presents a detailed analysis of the scalability and parallelization of Local Search algorithms for constraint-based and SAT (Boolean satisfiability) solvers. We propose a framework to estimate the parallel performance of a given algorithm by analyzing the runtime behavior of its sequential version. Indeed, by approximating the runtime distribution of the sequential process with statistical methods, the runtime behavior of the parallel process can be predicted by a model based on order statistics.
We apply this approach to study the parallel performance of a Constraint-Based Local Search solver (Adaptive Search), two SAT Local Search solvers (namely Sparrow and CCASAT), and a propagation-based constraint solver (Gecode, with a random labeling heuristic). We compare the performance predicted by our model to actual parallel implementations of those methods using up to 384 processes. We show that the model is accurate and predicts performance close to the empirical data. Moreover, as we study different types of problems, we observe that the solvers studied exhibit different behaviors and that their runtime distributions can be approximated by two types of distributions: exponential (shifted and non-shifted) and lognormal. Our results show that the proposed framework estimates the runtime of the parallel algorithm with an average discrepancy of 21% w.r.t. the empirical data across all the experiments, using the maximum number of processors allowed for each technique.

Charlotte Truchet, Florian Richoux: LINA, UMR 6241 / University of Nantes
Alejandro Arbelaez: INSIGHT Centre for Data Analytics / University College Cork
Philippe Codognet: JFLI - CNRS / UPMC / University of Tokyo

1 Introduction

In recent years, parallel algorithms for solving hard combinatorial problems, such as Constraint Satisfaction Problems (CSP), have been of increasing interest in the scientific community. The combinatorial nature of these problems makes it difficult to parallelize existing solvers without costly communication schemes.
Several parallel schemes have been proposed for incomplete or complete solvers. One of the most popular (as observed in the latest SAT competitions, www.satcompetition.org) is to run several competing instances of the algorithm on different processes with different initial conditions or parameters, and let the fastest process win over the others. The resulting parallel algorithm thus terminates with the minimal runtime among the launched processes. This framework of independent multi-walk parallelism seems to be a promising way to deal with large-scale parallelism. Cooperative algorithms might perform well on shared-memory machines with a few tens of processors, but are difficult to extend efficiently to distributed hardware. This leads to so-called independent multi-walk algorithms in the CSP community [55] and portfolio algorithms in the SAT community (satisfiability of Boolean formulae) [28]. However, although it is easy to obtain a good speed-up on a small-scale parallel machine (i.e., with a few tens of processes), it is not easy to know how a parallel variant of a given algorithm would perform on a massively parallel machine (i.e., with thousands of processes). Parallel performance models are thus particularly important for parallel constraint solvers, and any indication of how a given algorithm (or, more precisely, a pair formed by the algorithm and the problem instance) would scale on massively parallel hardware is valuable. If it becomes possible to estimate the maximum number of processes up to which parallelization remains efficient, then the parallel computing power actually needed to solve a problem could be deduced. This information is relevant in practice, since supercomputers and cloud systems such as Google Cloud and Amazon EC2 can be rented by the processor-hour, with a limit on the maximum number of processors that can be used. In this context, tools for modelling the behavior of parallel algorithms are expected to become increasingly valuable.
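Before any distribution is fitted, the min-of-n behavior described above can already be emulated directly from observed sequential runs. The following Python sketch is illustrative only: it assumes a list of measured sequential runtimes (CPU time or iteration counts), fully independent walks, and no communication or start-up overhead, and its function names are hypothetical. Each emulated parallel run draws n runtimes from the sample with replacement and keeps the smallest, since the first walk to find a solution ends the computation.

```python
import random
from statistics import mean

# Illustrative sketch: function names are hypothetical, and the walks are
# assumed independent with no communication or start-up overhead.


def resampled_parallel_runtime(seq_runtimes, n_walks, n_trials=10_000, seed=0):
    """Estimate E[min of n_walks runtimes] by resampling sequential runs."""
    rng = random.Random(seed)
    # One trial = one emulated parallel run: draw n_walks runtimes with
    # replacement and keep the fastest, because the first walk that finds
    # a solution stops the whole computation.
    return mean(min(rng.choices(seq_runtimes, k=n_walks)) for _ in range(n_trials))


def resampled_speedup(seq_runtimes, n_walks, **kwargs):
    """Predicted speed-up: mean sequential runtime over estimated parallel runtime."""
    return mean(seq_runtimes) / resampled_parallel_runtime(seq_runtimes, n_walks, **kwargs)
```

With runs = [1.0, 2.0, 3.0, 4.0], two walks give an expected parallel runtime of 1.875, against 2.5 sequentially. Because resampling can never go below the smallest observed runtime, the estimated speed-up saturates at the sample mean divided by the sample minimum as n grows, which is one reason to approximate the empirical distribution by a parametric one, as this paper does.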
The goal of this paper is to study the parallel performance of randomized constraint solving algorithms under the independent multi-walk scheme, and to model the performance of the parallel execution from the runtime distribution of sequential runs of a given algorithm. Randomized constraint solvers considered in this paper include Local Search algorithms for Constraint Satisfaction Problems, Local Search techniques for SAT, and complete algorithms with random components, e.g., a propagation-based backtrack search with random heuristics. An important application of this work relates to the increasing computational power available in cloud systems (e.g., Amazon Cloud EC2, Google Cloud and Microsoft Azure): a good estimate of how the algorithm scales might allow users to rent just the right number of cores. Most papers on the performance of stochastic Local Search algorithms focus on the average runtime in order to measure the performance of both sequential and parallel executions. However, a more detailed analysis can be done by treating the runtime of the algorithm (e.g., CPU time or number of iterations) as a random variable and performing a statistical analysis of its probability distribution. More precisely, we first approximate the empirical sequential runtime distribution by a well-known statistical distribution (e.g., exponential or lognormal) and then derive the runtime distribution of the parallel version of the solver. Our model is related to order statistics, a rather recent domain of statistics [23], which studies sorted random draws. Our method encompasses any algorithm whose solving time is random and makes it possible to formally determine the average parallel runtime of such algorithms for any number of processors.
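The order-statistics argument can be made concrete with a short numerical sketch, offered as a minimal illustration rather than the authors' implementation. If Y is the sequential runtime with cumulative distribution function F and the n processes run independent copies Y_1, ..., Y_n, the parallel runtime Z(n) = min(Y_1, ..., Y_n) satisfies P(Z(n) > x) = (1 - F(x))^n, hence E[Z(n)] = ∫_0^∞ (1 - F(x))^n dx for nonnegative runtimes, and the predicted speed-up is G(n) = E[Y] / E[Z(n)]. For an exponential distribution with rate λ this gives E[Z(n)] = 1/(nλ), i.e., a linear speed-up; for a shifted exponential with shift x0 it gives x0 + 1/(nλ), so the speed-up levels off at 1 + 1/(λ x0). For the lognormal case the integral is simply evaluated numerically here. The function names, the trapezoidal rule, the truncation bound `upper` and the parameter values are assumptions chosen for illustration.

```python
import math

# Illustrative sketch: names, the trapezoidal rule and the truncation bound
# `upper` are hypothetical choices, not taken from the paper.


def shifted_exp_survival(x0, lam):
    """S(x) = P(Y > x) for a shifted exponential runtime (shift x0, rate lam)."""
    return lambda x: 1.0 if x < x0 else math.exp(-lam * (x - x0))


def lognormal_survival(mu, sigma):
    """S(x) = P(Y > x) for a lognormal runtime with log-mean mu, log-std sigma."""
    def s(x):
        if x <= 0.0:
            return 1.0
        return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2.0)))
    return s


def expected_parallel_runtime(survival, n, upper, steps=100_000):
    """E[min(Y_1, ..., Y_n)] as the integral of S(x)**n over [0, upper].

    Composite trapezoidal rule; `upper` must be large enough that the tail
    beyond it is negligible (the caller has to check this assumption).
    """
    h = upper / steps
    total = 0.5 * (survival(0.0) ** n + survival(upper) ** n)
    for i in range(1, steps):
        total += survival(i * h) ** n
    return total * h


def predicted_speedup(survival, n, upper, steps=100_000):
    """G(n) = E[Y] / E[min of n independent copies of Y]."""
    seq = expected_parallel_runtime(survival, 1, upper, steps)
    return seq / expected_parallel_runtime(survival, n, upper, steps)
```

For example, with x0 = 1 and λ = 0.1 (sequential mean 11), predicted_speedup(shifted_exp_survival(1.0, 0.1), 16, 400.0) is about 6.77, matching the closed form 11 / 1.625, while the asymptotic ceiling is 11; setting x0 = 0 recovers the linear case.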
For Local Search, we will consider algorithms in the framework of Las Vegas algorithms [10], a class of algorithms related to Monte-Carlo algorithms, introduced a few decades ago, whose runtime may vary from one execution to another, even on the same input. The classical multi-walk parallelization scheme for Local Search methods can easily be generalized to any Las Vegas algorithm. We will study two different sets of algorithms and problems: first, a Constraint-Based Local Search solver on CSP instances and, second, two SAT Local Search solvers on random and crafted instances. Interestingly, this general framework encompasses other types of Las Vegas algorithms, and we will also apply it to a propagation-based constraint solver with a random labeling procedure on CSP instances. We will compare the performance predicted by the statistical model with the actual speed-ups obtained by parallel implementations of the above-mentioned algorithms, and show that the prediction can be quite accurate, matching the actual speed-up up to a large number of processors. More interestingly, we can also model both the initial and the asymptotic behavior of the parallel algorithm. This paper extends [51] and [9] by giving a detailed presentation of the runtime estimation model based on order statistics, by validating the model on randomized propagation-based constraint solvers, and by reporting extensive experimental results for stochastic local search algorithms on well-known CSP instances from CSPLib and SAT instances from the international SAT competition. Additionally, we provide a more detailed theoretical analysis of the reference distributions used for predicting the parallel performance. The paper is organized as follows. Section 2 presents the existing approaches in parallel constraint solving, and formulates the question addressed in this paper.
Section 3 details our probabilistic model for the class of parallel algorithms we tackle in this article, based on Las Vegas algorithms. Several such algorithms can be used in constraint solving. Each of the last three sections is dedicated to a specific family of Las Vegas algorithms: Constraint-Based Local Search in
