Multi-Level Parallel Branch-And-Bound Algorithms for Solving Permutation Problems on GPU-Accelerated Clusters Mohand Mezmaz
To cite this version: Mohand Mezmaz. Multi-level Parallel Branch-and-Bound Algorithms for Solving Permutation Problems on GPU-accelerated Clusters. Complexité [cs.CC]. Université Lille 1, 2020. tel-03100303

HAL Id: tel-03100303
https://hal.archives-ouvertes.fr/tel-03100303
Submitted on 6 Jan 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Public Domain

École Doctorale Sciences pour l’Ingénieur
Université Lille Nord de France

Habilitation à Diriger des Recherches (accreditation to supervise research)
Discipline: Computer Science

presented and publicly defended on 18 September 2020 by Mohand Mezmaz

Multi-level Parallel Branch-and-Bound Algorithms for Solving Permutation Problems on GPU-accelerated Clusters

French title: Algorithmes Parallèles Multi-niveaux de Séparation et Évaluation pour la Résolution de Problèmes de Permutation sur des Clusters Accélérés par GPUs

Jury
President: Pierre Boulet, Professor, Université Lille Nord de France
Reviewers: Enrique Alba, Professor, Université de Málaga
Frédéric Saubion, Professor, Université d’Angers
Pierre Sens, Professor, Sorbonne Université
Examiners: Pascal Bouvry, Professor, Université du Luxembourg
Amir Nakib, Associate Professor (MdC HDR), Université Paris-Est Créteil
Guarantor: Nouredine Melab, Professor, Université Lille Nord de France

Acknowledgments

I would particularly like to express my deep gratitude to Nouredine Melab, guarantor of my HDR: tannemirt! It has been a great privilege to continue, after my PhD thesis, to benefit from Nouredine's rigor, motivation and advice.

My sincere thanks also go to Enrique Alba, Frédéric Saubion and Pierre Sens for doing me the honor of reviewing this HDR, and to Pierre Boulet, Pascal Bouvry and Amir Nakib for agreeing to serve on my jury. Their strong commitment to this defense allowed me to present my HDR, even under unusual circumstances.

I would also like to warmly thank Daniel Tuyttens, Head of the MARO Department at the Université de Mons. Through his trust and advice, Daniel has contributed greatly to my research over all these years.

My sincere thanks also go to Rudi Leroy and Jan Gmys, whom I co-supervised during their PhD theses. Their contributions are the secret ingredients of this HDR's success.

Thank you to all current and former colleagues of the MARO Department for contributing to such a wonderful team spirit.

Last but not least, I thank my family, and in particular my parents. Their efforts, patience and encouragement were decisive for this achievement.

Contents

1 Main research activities since my PhD
2 Introduction
3 Background and related works
  3.1 Introduction
  3.2 Permutation problems
    3.2.1 Permutations
    3.2.2 Flowshop problem
  3.3 Linked-list based PB&B
    3.3.1 Branching
    3.3.2 Bounding
    3.3.3 Pruning
    3.3.4 Selection
  3.4 Related works
    3.4.1 Parallel CPU B&B
    3.4.2 Parallel GPU B&B
    3.4.3 Cluster B&B
  3.5 Conclusions
4 Single-core IVM-based permutation B&B
  4.1 Introduction
  4.2 IVM data structure
  4.3 PB&B Operations
    4.3.1 Selection
    4.3.2 Branching
    4.3.3 Bounding
    4.3.4 Pruning
    4.3.5 Initialization and finalization
  4.4 Experiments
    4.4.1 Experimental protocol
    4.4.2 Obtained results
  4.5 Conclusions
5 Multi-core Interval-based permutation B&B
  5.1 Introduction
  5.2 Factoradic intervals
    5.2.1 Number systems
    5.2.2 Factorial number system
  5.3 Interval-based work stealing
    5.3.1 Work unit: interval of factoradics
    5.3.2 Work unit communication
    5.3.3 Victim selection policies
    5.3.4 Granularity policies
  5.4 Experiments
    5.4.1 Experimental protocol
    5.4.2 Pool management evaluation
    5.4.3 Interval-based work stealing evaluation
  5.5 Conclusions
6 GPU-centric permutation Branch-and-Bound
  6.1 Introduction
  6.2 IVM-based Data structures
    6.2.1 Memory requirements
    6.2.2 Memory constraint: Number of IVMs
  6.3 Operations
    6.3.1 Bound mapping operation
    6.3.2 IVM mapping operation
    6.3.3 Work stealing operation on GPU
  6.4 Experiments
    6.4.1 Experimental protocol
    6.4.2 Evaluation of bound mapping schemes
    6.4.3 Evaluation of IVM mapping schemes
    6.4.4 Evaluation of work stealing schemes
    6.4.5 Comparison of PB&B@GPU and GPU LL PB&B
  6.5 Conclusions
7 Permutation B&B for GPU-powered clusters
  7.1 Introduction
  7.2 Interval-list operators
    7.2.1 Interval operators
    7.2.2 Interval-list operators
  7.3 Data structures and operations
    7.3.1 Data structures
    7.3.2 Work processing operations
  7.4 Experiments
    7.4.1 Experiments on a cluster of GPUs
    7.4.2 Experiments on a cluster of GPUs and CPUs
    7.4.3 Resolution of a big instance
  7.5 Conclusions
8 Conclusions and perspectives
Bibliography

Chapter 1
Main research activities since my PhD

The research conducted in my PhD thesis, defended at the end of 2007, focused on the parallelization of combinatorial optimization algorithms. Its main contribution was a new approach, called B&B@Grid [MMT07b], for parallelizing Branch-and-Bound (B&B) algorithms. Our experiments show that the B&B@Grid encoding reduces the size of the communicated information by a factor of 766 on average.
These experiments also show that the information communicated with the B&B@Grid encoding is on average 92 times smaller than with the token approach of PICO [EPH00], and 465 times smaller than with the variable approach published in [IF00]. They demonstrate the advantage of using B&B@Grid in clusters to reduce the size of the communicated information, and therefore the communication delays. Moreover, despite the high number of CPU cores in our clusters, about 1111 on average, the experiments, carried out with the master-worker paradigm, showed that the master uses only 1.29% of its processor on average, and that workers spend an average of 99.94% of their time computing. These two percentages are good indicators of the quality of B&B@Grid’s load balancing strategy and of its ability to scale up.

The B&B@Grid approach was used to solve an instance of the flowshop problem, known as Ta056 (50 jobs and 20 machines) and published in [Tai93], which had never been solved to optimality before. In terms of computing power used, the Ta056 resolution ranks second among the large-scale challenges tackled in the combinatorial optimization field before 2007. On average, 328 processors were used for more than 25 days, and a peak of about 1200 CPU cores was recorded during this resolution.

In recent years, GPU-powered clusters have emerged as increasingly attractive platforms for parallelizing some algorithms. GPU accelerators are often used in today’s largest high performance computing systems for regular and data-parallel applications. However, B&B algorithms are highly irregular in terms of workload, control flow and memory access patterns. The research works, presented in this document, are