Contributions for Resource and Job Management in High Performance Computing Yiannis Georgiou
Total Page:16
File Type:pdf, Size:1020Kb
Contributions for Resource and Job Management in High Performance Computing Yiannis Georgiou To cite this version: Yiannis Georgiou. Contributions for Resource and Job Management in High Performance Computing. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Grenoble, 2010. English. tel- 01499598 HAL Id: tel-01499598 https://hal-auf.archives-ouvertes.fr/tel-01499598 Submitted on 31 Mar 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Public Domain THÈSE Pour obtenir le grade de DOCTEUR DE L’UNIVERSITÉ DE GRENOBLE Spécialité : Informatique Arrêté ministériel : 7 août 2006 Présentée par « Yiannis GEORGIOU » Thèse dirigée par « Olivier RICHARD » et codirigée par « Jean-Francois MEHAUT » préparée au sein du Laboratoire d'Informatique de Grenoble dans l'École Doctorale de Mathématiques, Sciences et Technologies de l'Information, Informatique Contributions for Resource and Job Management in High Performance Computing Thèse soutenue publiquement le « 5 novembre 2010 », devant le jury composé de : M. Daniel, HAGIMONT Professeur à INPT/ENSEEIHT, France, Président M. Franck, CAPPELLO Directeur de Recherche à INRIA, France, Rapporteur M. William T.C., KRAMER Directeur de Recherche à NCSA, USA, Rapporteur M. Morris, JETTE Informaticien au LLNL, USA, Membre Mme. Pascale, ROSSE-LAURENT Architecte Informatique à BULL, France, Membre M. Jean-Francois, MEHAUT Professeur à l'UJF, France, Membre M. Olivier, RICHARD Maître de Conférence à l'UJF, France, Membre CONTRIBUTIONS FOR RESOURCE AND JOB MANAGEMENT IN HIGH PERFORMANCE COMPUTING Yiannis Georgiou November 2010 Sth γιαγιά mou KrustallÐa Abstract High Performance Computing is characterized by the latest technological evolutions in com- puting architectures and by the increasing needs of applications for computing power. A particular middleware called Resource and Job Management System (RJMS), is responsible for delivering computing power to applications. The RJMS plays an important role in HPC since it has a strategic place in the whole software stack because it stands between the above two layers. However the latest evolutions in hardware and applications layers have provided new levels of complexities to this middleware. Issues like scalability, management of topological constraints, energy efficiency and fault tolerance have to be particularly considered, among others, in order to provide a better system exploitation from both the system and user point of view. This dissertation provides a state of the art upon the fundamental concepts and research issues of Resources and Jobs Management Systems. It provides a multi-level comparison (concepts, functionalities, performance) of some of the most widely used Resource and Jobs Management Systems in High Performance Computing. An important metric to evaluate the work of a RJMS on a platform is the observed system utilization. However studies and logs of production platforms show that HPC systems in general suffer of significant unutilization rates. Our study deals with these clusters’ unutilization periods by proposing methods to aggregate otherwise unutilized resources for the benefit of the system or the application. More particularly this thesis explores RJMS level mechanisms: 1) for increasing the jobs valuable computation rates in the high volatile environments of a lightweight grid context, 2) for improving system utilization with malleability techniques and 3) providing energy efficient system management through the exploitation of idle computing machines. The experimentation and evaluation in this type of contexts provide important complexities due to the inter-dependency of multiple parameters that have to be taken into control. In our study we have developed a methodology based upon real-scale controlled experimentation with submission of synthetic or real workload traces. Acknowledgements First of all I would like to thank my advisor Olivier Richard who accepted me as I am, back in the days when my main motivation for an internship in the laboratory was the proximity of ski resorts. Thank you for giving me the opportunity to discover the world of research in computer science. Thank you for your patience and continuous support during my internship, master’s and PhD. The writing of this PhD would never have been possible without your guidance and motivation throughout all these years. Then I would like to express my gratitude to my second advisor Pascale Rosse-Laurent for her mentorship within the enterprise environment. Thank you for all the time, interesting discussions and precious advices. Thank you for showing me the way and supporting me during my PhD and afterwards. Besides my advisors, I would like to thank my colleague and friend Marcia Cristina Cera with whom I collaborated during our PhDs. She was a very nice person always cheerful and motivated. It was a pleasure working with her and it is so difficult for me to believe that she is not here anymore. I also thank Prof. Jean-Francois Mehaut for his time and exchanges and Andry Razafinjatovo for the motivation and support throughout the final stage of my thesis. My sincere thanks goes to the reporters of my thesis Dr. Franck Cappello, Dr. William T.C. Kramer and Mr. Morris Jette for their precious comments and corrections. I would also like to thank my colleagues from the laboratory and enterprise for the nice working environment. A special thanks goes to all my friends and flat-mates particularly Eleni, Lauriane, Vasilis and Alexandros. Thanks for all the fun and for standing by me during the difficult moments of this thesis. Last but not least, I would like to thank my family for their spiritual and financial support throughout my academic studies all these years. Contents List of Tables . 7 List of Figures . 9 1 Introduction 13 1.1 Motivation . 14 1.2 Research Subjects . 15 1.2.1 Scheduling and Management of Resources and Jobs . 15 1.2.2 Techniques for efficient resources exploitation . 16 1.2.3 Experimental Methodology based on workload traces . 19 1.3 Thesis Overview . 20 2 Evaluation Methodology for Resource and Job Management Systems 21 2.1 Related Work . 22 2.1.1 Experimentation methodologies . 23 2.1.2 Workload modelling, characterization and benchmarks . 26 2.2 Real-scale experimentation upon Grid5000 platform . 28 2.2.1 Grid5000 Design and Architecture . 28 2.2.2 Automatization and Reproduction techniques . 33 2.3 Evaluation Methodology based upon Replay of Workloads . 34 2.3.1 System factors . 35 2.3.2 Workloads . 36 2.3.3 Executed Applications . 41 2.3.4 Evaluation Metrics . 45 2.3.5 Xionee: Workload Trace Analysis, Visualization and Automatic Injection Toolkit . 46 2.4 Conclusions . 49 3 3 Comparisons of Resource and Job Management Systems 51 3.1 Background and Related Work . 52 3.2 RJMS Principles . 54 3.2.1 Resource Management . 57 3.2.2 Job Management . 59 3.2.3 Scheduling . 61 3.3 Research challenges . 64 3.3.1 Topology aware Placement . 64 3.3.2 High Availability . 67 3.3.3 Energy Consumption . 68 3.3.4 Launcher and Scheduler . 69 3.4 Survey of Resource and Job Management Systems . 73 3.4.1 Commercial RJMS . 73 3.4.2 Opensource RJMS . 78 3.4.3 SLURM . 81 3.4.4 OAR . 86 3.4.5 Synthesis . 94 3.5 Performance Evaluation of opensource RJMS . 96 3.5.1 Launching and Scheduling Evaluation . 97 3.5.2 Network Topology Aware placement Evaluation . 102 3.5.3 Scalability and Efficiency Evaluation . 107 3.6 Conclusions . 114 4 Improving system utilization in a lightweight grid context 116 4.1 Background Information and Related Work . 118 4.1.1 Grid and alternative grid technologies . 118 4.1.2 Fault Treatment upon grid technologies . 119 4.1.3 Scheduling for bag-of-tasks applications . 120 4.1.4 Checkpoint-Restart Recovery technique . 121 4.2 An alternative lightweight grid computing approach for bag-of-tasks applications . 122 4.2.1 Lightweight grid and cluster utilization policy . 122 4.2.2 Global Architecture . 123 4.3 Enhancements for turnaround time optimization . 127 4.3.1 Transparent Checkpoint/Restart mechanism for a lightweight grid . 128 4 4.3.2 Periodic and Triggered checkpoints strategies . 129 4.4 Experimentation Results . 130 4.4.1 Deploying a lightweight grid upon Grid5000 platform . 131 4.4.2 Performance Evaluation . 133 4.5 Conclusions . 137 5 Improving resources exploitation with Malleability techniques 139 5.1 Background and Related Work . 140 5.1.1 Dynamic Applications and Programming . 141 5.1.2 Resource Management and Dynamicity . 143 5.1.3 RJMS and Applications Communication protocols . 144 5.1.4 Scheduling with Dynamic Applications . 146 5.2 Malleability techniques: Implementation and Experimental Testbed . 148 5.2.1 Implementation upon OAR resource manager . 148 5.2.2 Experimentation through controlled platform and real traces . 149 5.3 Malleability on Clusters with Dynamic MPI . 151 5.3.1 Dynamic MPI mechanism Requirements . 151 5.3.2 Evaluating Dynamic MPI technique . 153 5.4 Malleability on Multicore Nodes with Dynamic CPUSET Mapping . 157 5.4.1 Dynamic CPUSETs Mapping Requirements . 158 5.4.2 Evaluating Dynamic CPUSETs Mapping technique . 160 5.5 Improving Resource Utilization using Malleability . 165 5.6 Conclusions . 169 6 Energy Efficient Management Techniques 171 6.1 Related Work . 172 6.2 Measuring Energy Consumption upon Grid5000 . 174 6.2.1 The Hardware Energy-Meters . 174 6.2.2 A Library to Interface with Energy-Meters . 175 6.3 Energy Reduction through idle resources manipulation . 176 6.3.1 Adapting OAR to exploit idle resources for energy conservations .