HPC Scheduling in a Brave New World

Gonzalo P. Rodrigo Álvarez

PhD Thesis, March 2017
Department of Computing Science
Umeå University
SE-901 87 Umeå, Sweden
[email protected]

Copyright © 2017 by the authors

Except Paper II, © Springer-Verlag, 2013

ISBN 978-91-7601-693-0 ISSN 0348-0542 UMINF 17.05

Printed by Print & Media, Umeå University, 2017

Abstract

Many breakthroughs in scientific and industrial research are supported by simulations and calculations performed on high performance computing (HPC) systems. These systems typically consist of uniform, largely parallel compute resources and high bandwidth concurrent file systems interconnected by low latency synchronous networks. HPC systems are managed by batch schedulers that order the execution of application jobs to maximize utilization while steering turnaround time.

In the past, demands for greater capacity were met by building more powerful systems with more compute nodes, greater transistor densities, and higher processor operating frequencies. Unfortunately, the scope for further increases in processor frequency is restricted by the limitations of semiconductor technology. Instead, parallelism within processors and in numbers of compute nodes is increasing, while the capacity of single processing units remains unchanged. In addition, HPC systems' memory and I/O hierarchies are becoming deeper and more complex to keep up with the systems' processing power. HPC applications are also changing: the need to analyze large data sets and simulation results is increasing the importance of data processing and data-intensive applications. Moreover, composition of applications through workflows within HPC centers is becoming increasingly important.

This thesis addresses the HPC scheduling challenges created by such new systems and applications. It begins with a detailed analysis of the evolution of the workloads of three reference HPC systems at the National Energy Research Scientific Computing Center (NERSC), with a focus on job heterogeneity and scheduler performance. This is followed by an analysis and improvement of a fairshare prioritization mechanism for HPC schedulers. The thesis then surveys the current state of the art and expected near-future developments in HPC hardware and applications, and identifies unaddressed scheduling challenges that they will introduce. These challenges include application diversity and issues with workflow scheduling or the scheduling of I/O resources to support applications. Next, a cloud-inspired HPC scheduling model is presented that can accommodate application diversity, takes advantage of malleable applications, and enables short wait times for applications. Finally, to support ongoing scheduling research, an open source scheduling simulation framework is proposed that allows new scheduling algorithms to be implemented and evaluated in a production scheduler using workloads modeled on those of a real system. The thesis concludes with the presentation of a workflow scheduling algorithm to minimize workflows' turnaround time without over-allocating resources.


Sammanfattning på svenska (Summary in Swedish)

Many breakthroughs in scientific and industrial research are supported by simulations and computations performed on high performance computing systems, so-called HPC systems. Traditionally, these consist of a large number of parallel and homogeneous compute resources interconnected by a synchronous network with high bandwidth and low latency. The resources of an HPC system are managed by a scheduler that plans where and when different applications are to be executed in order to minimize users' wait times and maximize resource utilization.

Until a few years ago, ever more powerful HPC systems were normally obtained by increasing the number of nodes in the parallel system and by increasing the transistor density and clock frequency of the processors. More recently, semiconductor limitations have made it impossible to benefit from further increases in clock frequency. Instead, capacity is increased through greater parallelism within processors and through larger numbers of compute nodes. Another aspect of this development is that the memory and I/O hierarchies of HPC systems are becoming ever deeper and more complex in order to keep pace with the growth in processing power. The applications run on HPC systems are also changing: the need to analyze large data sets and simulation results increases the importance of data management and data-intensive applications. In addition, the need to run applications as long workflows instead of individual jobs has also grown.

This thesis studies how these developments create new challenges for the schedulers of future HPC systems. The study begins with a detailed analysis of the evolution of three reference systems at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley Lab, focusing on job heterogeneity and scheduler performance. This is followed by an analysis and improvement of a prioritization mechanism (so-called fairshare) for schedulers. Next, a study of the current state and expected future development of HPC hardware and applications is presented. The study identifies a number of important challenges for future schedulers, such as the increased heterogeneity of applications, the need to schedule workflows efficiently, and the scheduling of I/O resources. A scheduling model for HPC systems, developed with inspiration from cloud scheduling, is then presented. The thesis also presents an open source simulator for scheduling research. The simulator makes it possible to study new scheduling algorithms in a production-grade system. We conclude by presenting an algorithm for scheduling workflows with the aim of minimizing users' wait times without over-allocating resources.


Preface

This thesis contains an introduction that summarizes key concepts that the author considers useful to understand before reading research articles about HPC scheduling, including the definition of and need for high performance computing, basic batch scheduling, and the characteristics of present and future HPC systems together with the associated scheduling challenges. The thesis includes the following papers:

Paper I Gonzalo P. Rodrigo, Per-Olov Östberg, Erik Elmroth, Katie Antypas, Richard Gerber, and Lavanya Ramakrishnan. Towards Understanding HPC Users and Systems: A NERSC Case Study. Submitted, 2017. This paper is an extended, joint version of Papers IX and X.

Paper II Gonzalo P. Rodrigo, Per-Olov Östberg, and Erik Elmroth. Priority Operators for Fairshare Scheduling. In Proceedings of the 18th Workshop on Job Scheduling Strategies for Parallel Processing, pp. 70-89, Springer International Publishing, 2014.

Paper III Gonzalo P. Rodrigo, Per-Olov Östberg, Erik Elmroth, and Lavanya Ramakrishnan. A2L2: An Application Aware Flexible HPC Scheduling Model for Low-Latency Allocation. In Proceedings of the 8th International Workshop on Virtualization Technologies in Distributed Computing (VTDC'15), pp. 11-19, ACM, 2015.

Paper IV Gonzalo P. Rodrigo, Erik Elmroth, Per-Olov Östberg, and Lavanya Ramakrishnan. ScSF: A Scheduling Simulation Framework. In Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing, Accepted, Springer International Publishing, 2017.

Paper V Gonzalo P. Rodrigo, Erik Elmroth, Per-Olov Östberg, and Lavanya Ramakrishnan. Enabling workflow aware scheduling on HPC systems. Submitted, 2017.

In addition, the following publications were produced during the author's PhD studies but are not included in the thesis:

Paper VI Gonzalo P. Rodrigo. Proof of compliance for the relative operator on the proportional distribution of unused share in an ordering fairshare system. Technical report UMINF-14.14, Umeå University, 2014.

Paper VII Gonzalo P. Rodrigo. Establishing the equivalence between operators: theorem to establish a sufficient condition for two operators to produce the same ordering in a Fairshare prioritization system. Technical report UMINF-14.15, Umeå University, 2014.

Paper VIII Gonzalo P. Rodrigo. Theoretical analysis of a workflow aware scheduling algorithm. Technical report UMINF-17.06, Umeå University, 2017.

Paper IX Gonzalo P. Rodrigo, Per-Olov Östberg, Erik Elmroth, Katie Antypas, Richard Gerber, and Lavanya Ramakrishnan. HPC System Lifetime Story: Workload Characterization and Evolutionary Analyses on NERSC Systems. In Proceedings of the 24th International ACM Symposium on High-Performance Distributed Computing (HPDC'15), ACM, 2015.

Paper X Gonzalo P. Rodrigo, Per-Olov Östberg, Erik Elmroth, Katie Antypas, Richard Gerber, and Lavanya Ramakrishnan. Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study. In Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'16), IEEE/ACM, 2016.

This material is based upon work supported in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) and the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Financial support has been provided in part by the Swedish Government's strategic effort eSSENCE, by the European Union's Seventh Framework Programme under grant agreement 610711 (CACTOS), the European Union's Framework Programme Horizon 2020 under grant agreement 732667 (RECAP), and the Swedish Research Council (VR) under contract number C0590801 for the project Cloud Control.

Acknowledgements

"Four years of work, four years of hardship, four years on a road towards enlightenment and joy. The road was sometimes steep, the road was sometimes deep, but the road was never lonely, as many people always helped me. In my falls, with patience, they consoled me, in my struggles, with energy, they pushed me, harsh and gently they brought me, to the place I wanted to be..." 1

First, I would like to thank Lavanya Ramakrishnan, my advisor at the Berkeley Lab. We met in one of the most challenging moments of my PhD and, under her supervision and the Californian sun, we managed to steer my work towards the thesis that now sits in your hands (or on your screen). Always patient and insightful, she was (and is) a great advisor, colleague, and friend. Second, but in no way smaller, comes my gratitude to Erik Elmroth, my advisor at Umeå University and suspected shareholder of Lottas Krog. Apart from patiently understanding my unconventional PhD program, his scientific and personal support were fundamental in these years. He is probably the busiest person I have ever met; still, he always had time for matters that were important. I also would like to thank P-O Östberg, co-advisor at Umeå University, who helped me settle down in Umeå, start a new life, and improve my skiing technique (the hard way).

To the pecan cheesecake club members: Mina, patient ex-office mate, confessor, and victim of my devil's advocacy; Jakub, ex-house mate, PhD brother, and comrade in the same battles; and Cristian, source of wisdom, cardio partner-in-crime, and psycho-canoeing victim. Thanks for all those great conversations about science, life, "moment soaking", and everything else.

I would like to express my gratitude to the rest of the distributed systems group at Umeå University: Abel (the HPC torch is yours now), Ahmed (we didn't have enough coffees together), Amardeep (first room-mate), Chanh (so brief!), Daniel (fairshare advisor), Ewnetu (smartest coworker), Francisco (unexpected good advice), Johan (always on target), Lars (stuff you should know), Luis (a man with a red hat), Muyi (anomalies for all), Peter

1 My apologies for this poetic crime.

(the stuff that he knows), Petter (I finally got a flat!), Selome (always power-aware), Simon (the Openstack master!), and Viali (unexpected owner of a Quebec driving license). And also to other people at the department who have been great company these years.

I should thank the members of the Data Science and Technology department (and surroundings) at the Berkeley Lab who were great co-workers, friends, donut dealers, Friday pizza-goers, and running mates during these years: Abdelrahman Elbashandy, Dan Gunter, Deb Agarwal, Devarshi Goshal, Eugen Feller, Gilberto Pastorello, Gunther Weber, Hamdy Elgammal, Katie Antypas, Megha Sandesh, Richard Gerber, Paolo Calafiura, Sarah Poon, Shreyas Cholia, Sophia Passadis, Val Hendrix, Wenqin Chen, and You-Wei Cheah.

During these years, I worked on two projects that were not strictly thesis related and I would like to thank: Stephen Bailey, for teaching me the true value of open-source work and showing me a different side of the Berkeley Lab; and John Wilkes for making me a better engineer through the fear (and joy) of Google's code-review system and the Oxford comma.

For being patient, responsive, and for helping me to harness the compute power that made this work possible, I have to especially thank the systems' gurus Tomas Forsman (UmU) and Keith Beattie (LBNL).

To my parents, Gonzalo and Nieves, whose support in the last 37 years gave a personal, deeper meaning to the quote "If I have seen further, it is by standing on the shoulders of giants". And, finally, to Katherine, who made these last three years worth it.

¡Gracias a todos!

Contents

1 Introduction
2 High performance computing
2.1 High performance applications
2.2 HPC systems
2.3 HPC scheduling
2.3.1 Job submission and priority
2.3.2 Job scheduling
2.3.3 Placement, accounting, and other functions

3 New scheduling challenges
3.1 Present and future workloads
3.2 Future systems
3.2.1 Dennard scaling break and processor parallelism
3.2.2 The memory gap
3.2.3 I/O gap
3.2.4 Hardware heterogeneity
3.3 Scheduling research
4 Contributions
4.1 Paper I: Characteristics and trends in HPC workloads
4.2 Paper II: Prioritization in batch schedulers, Fairshare
4.3 Paper III: A survey of current scheduling challenges and a position model to ease them
4.4 Paper IV: Building tools for scheduling research
4.5 Paper V: Workload-aware scheduling in HPC systems

Paper I

Paper II

Paper III

Paper IV

Paper V


Chapter 1

Introduction

High performance computing (HPC) systems are designed to support the concurrent execution of performance-critical and large-scale applications at the minimum possible cost in terms of initial investment, operating costs, and energy consumption [20]. Research in scientific fields such as high-energy physics, geophysics, climate study, and bioinformatics relies on HPC systems to perform complex large-scale experiments [9]. HPC systems are managed by HPC batch schedulers, which arrange the concurrent execution of applications (jobs) to maximize utilization while offering reasonable turnaround times.

Classical HPC systems were uniform, largely parallel, and built around low latency synchronous interconnects [11]. To meet the increasing demands of computation-based science, compute capacity was increased by using processors with higher operating frequencies. However, current semiconductor technologies do not support further frequency increases, so new design strategies are required [26]. Currently, the parallelism within compute nodes is increasing but individual processing units are not becoming more powerful. Also, systems are becoming more heterogeneous and incorporating specialized resources tailored to the execution of specific application types [11]. Despite this growing raw processing capacity, current I/O and memory technologies cannot deliver matching performance improvements, creating a widening capability gap. As a consequence, HPC systems' memory and I/O hierarchies are becoming deeper and more complex.

Changes in hardware and architecture present new challenges when making scheduling decisions. For instance, the heterogeneity of modern systems' resources increases the complexity of placement and scheduling decisions: the scheduler must infer a resource allocation that will run an application efficiently while avoiding resource fragmentation that would reduce utilization. In addition, new I/O and memory systems include resources that are managed by the scheduler, such as burst buffers - high bandwidth, low access-time solid-state drives that act as caches to boost file system performance [36]. Their allocation is requested by users but the scheduler controls their assignment and coordinates their use with that of the compute resources, increasing the dimensionality of the scheduling problem.

HPC applications are also changing. Classical HPC workloads were dominated by large, parallel, and tightly coupled jobs. However, data processing and high throughput applications, whose performance models differ from those of tightly coupled jobs, are more prevalent in scientific workloads [21]. Also, some applications are based on the composition of sub-applications (tasks) with different performance and resource requirements. These applications are usually structured as workflows whose composition derives from the control or data dependencies of their tasks. HPC schedulers support the execution of these increasingly common applications, but their scheduling decisions are typically not optimized for such uses. For example, workflows may suffer from long turnaround times because most schedulers cannot recognize their structure, leading to long intermediate wait times for their tasks.

This thesis proposes models and solutions for current and future scheduling challenges. The first step in solving any scheduling challenge is to understand and characterize current systems and their workloads to identify trends in their evolution. To this end, the first stage of the work presented in this thesis involved analyzing the workloads of three HPC systems at the National Energy Research Scientific Computing Center (NERSC) using a novel workload analysis method that incorporates a characterization of job diversity. Many differentiated groups of jobs were detected in the workloads, indicating that their job geometry (in terms of requested resources and runtime) was very heterogeneous. Job heterogeneity also affected the predictability of job wait times. Under ideal conditions, a larger job (i.e. one requiring a greater allocation of resources) of a given priority (i.e. importance) would be expected to wait longer than a smaller job of the same priority. Also, for a given resource allocation, a job with a higher priority should wait less than one with lower priority. The job geometry heterogeneity within priority groups in the studied workloads produced wait times that did not match this expected behavior.

Next, the mechanisms used by current HPC schedulers were analyzed. Specifically, a deep analysis of the fairshare prioritization model, common in HPC systems, was performed. Priority systems steer the wait time of jobs according to administrative policies, and this mechanism allows priority models to be expressed as organization hierarchy trees. This analysis showed how individual prioritization rules shape overall system behavior, and led to new mechanisms that improve it.

The next step involved surveying new systems and applications, and identifying the scheduling challenges they present. This survey was complemented by an analysis of existing cloud scheduling solutions that address similar problems. Based on these investigations, a two-level cloud-based application-aware scheduling model was proposed. This model uses one scheduler per application type, and thus avoids the problem of comparing applications whose performance metrics have different semantics (for example, wait times have different impacts on regular jobs and workflow jobs). It also includes mechanisms to enable dynamic scaling of malleable applications, whose resource usage can be changed during runtime to increase utilization.

In addition, it supports real-time jobs (jobs that require short wait times) by enabling the schedulers to borrow resources for their execution.

Unfortunately, a lack of suitable scheduling research tools made it difficult to implement or evaluate this model. A new open source Scheduling Simulation Framework, ScSF, was therefore developed. Built around a real instance of the Slurm scheduler (the most popular scheduler among the systems on the Top500 list), the framework includes modules for modeling systems, generating workloads based on the models, running simulations, and analyzing their results. It also includes orchestration tools to automate the concurrent execution of experiments over distributed resources. This tool was very useful in the work presented herein, and will be useful in any future HPC scheduling research involving simulation.

To conclude, this thesis describes WoAS, a Workflow Aware Scheduling technique for efficient workflow scheduling. Workflows in HPC systems are often run as chained jobs or pilot jobs. Chained jobs are groups of jobs interconnected by dependencies such that no job can start until all those that precede it in the workflow are complete. Pilot jobs allocate to the workflow the maximum resources it will need at any point in its entire runtime, leaving resources idle at some stages. WoAS instead proposes submitting workflows as pilot jobs that include manifests describing the workflow's structure. It also modifies the way in which the scheduler's waiting queue exposes jobs to the rest of the scheduler, keeping them as pilot jobs when they are handled by the prioritization engine, and transforming them into chained jobs when they are handled by the scheduling algorithms (first come first serve and backfilling). WoAS was implemented as an open-source project in Slurm and evaluated in ScSF. Experiments showed that with WoAS, workflow turnaround times are significantly shorter than under the chained jobs approach, giving speedups of up to 3.75x with no over-allocation of resources or significant effects on the turnaround time of non-workflow jobs.

The rest of the thesis is structured as follows. Chapter 2 provides an overview of high performance computing as well as HPC systems and schedulers. Current and new scheduling challenges are described in Chapter 3. The contributions of this thesis are summarized in Chapter 4. Finally, the five papers that describe the thesis work in full detail are presented.


Chapter 2

High performance computing

This chapter introduces basic concepts in High Performance Computing (HPC): its purpose, applications, systems, and scheduling. On the basis of previous work [20] and the author's own observations, HPC can be defined as follows: a computing system should be considered high performance if it supports the execution of large-scale, performance-oriented applications, at the smallest possible cost, with the shortest possible runtime, within some time constraint. Each part of the definition imposes certain requirements, constraints, and characteristics on HPC systems and their schedulers. These are discussed in the following sections.

2.1 High performance applications

High performance computing supports the execution of applications used in fields such as science, engineering [13], and finance [12]. Although all these fields employ similar systems and computing models, this thesis focuses on scientific applications because of the author's knowledge base and research interests. However, the results presented herein should be equally applicable to HPC for engineering and finance. Put simply, scientific applications are applications that support the scientific process by enabling researchers to perform simulations, analyze data, or apply numerical methods to solve large computational problems. These applications are executed on HPC systems as batch jobs and have diverse characteristics [53]. The remainder of this section describes the application characteristics with the greatest impact on the design and operation of HPC systems.

Large tightly-coupled parallel applications: Some scientific applications run many parallel processes with synchronized execution steps. Their complexity is illustrated by a classical tightly-coupled application: atmospheric simulation [41].

Figure 2.1: Spatial mapping of the atmosphere onto computing threads in an atmospheric simulation.

Figure 2.1 depicts a computational approach used to predict the weather in Scandinavia. First, the space over Scandinavia is divided into a three-dimensional grid and each cell is assigned to a single CPU core in an HPC system. Each core executes a loop to iteratively simulate the atmosphere in the corresponding cell during one time step. After the simulation of each time step, the results for the cell are propagated to the neighboring cells (and thus the corresponding cores) and used as inputs for the next iterative loop. Because each loop iteration depends on the previous iteration's results from multiple cells, the communication step between cells becomes the application's repeating synchronization point. Under this execution model, a higher communication latency between processes implies a longer runtime per simulation step, a longer overall runtime, and thus worse performance and efficiency of resource use. Moreover, if any of the parallel processes run more slowly than the others, the need for synchronization delays the execution of all processes.

Some specific hardware and software requirements must be met to achieve good performance under a synchronization model of this kind [7]. First, the compute hardware must be homogeneous to ensure that the execution time for a given piece of code is identical on all the system's CPU cores. Also, the system must employ a low-jitter and low-latency interconnect to minimize the latency and variability of the communication phase. Moreover, the system's resources must be large enough to run the applications. For example, the resolution of the weather simulation discussed above depends on the number of concurrent processes: a system with more CPU cores permits the execution of more concurrent processes, making it possible to perform simulations that use greater numbers of smaller cubes and thus yield more precise weather predictions. Finally, a simple low latency communications framework or message passing interface (MPI [49]) is required to ease the development and deployment of applications that require communication and synchronization between processes.
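To make this pattern concrete, the sketch below shows a minimal one-dimensional version of the iterate-and-exchange loop using mpi4py. The library choice and the placeholder "physics" are purely illustrative; production atmospheric models are typically MPI codes written in Fortran or C.

    # Minimal 1-D sketch of the iterate/exchange pattern described above
    # (assumes mpi4py is available). Each rank owns one cell and, after every
    # simulation step, exchanges boundary values with its neighbours; the
    # exchange is the repeating synchronization point of the application.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    state = np.array([float(rank)])              # toy local state for this cell
    left, right = (rank - 1) % size, (rank + 1) % size

    for step in range(100):
        state = 0.5 * state                      # placeholder local computation
        # No rank can start the next step before its neighbours have sent
        # their results, so a single slow process delays everyone.
        from_left = comm.sendrecv(state, dest=right, source=left)
        from_right = comm.sendrecv(state, dest=left, source=right)
        state = state + 0.25 * (from_left + from_right)

Such a script would be launched with a command like mpirun -n 8 python halo.py; the script name here is of course arbitrary.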

Data processing and high-throughput applications: High performance computing also supports scientific applications for analyzing large data sets produced by real life experiments or simulations [40]. Such applications do not adhere to the tightly-coupled model: while they may be parallel, their execution threads are not tightly synchronized, and their resource allocation may be malleable, i.e. they can tolerate dynamic resource allocation during runtime. However, such applications are typically I/O and memory bound, and require low latency and high bandwidth memory and I/O to achieve good performance. This requirement is amplified by the concurrent execution of applications on HPC systems: I/O requirements aggregate, multiplying the peak I/O bandwidth the system must provide.

To satisfy these memory performance requirements, the compute nodes of high performance systems have fast memory and large caches. The I/O requirements are satisfied by using large parallel file systems (PFS) that serve multiple concurrent I/O operations and divide objects' data across disks in a process known as striping [34] to achieve high bandwidth.

Workflows: Some scientific processes rely on applications that perform different computational operations in sequence. For example, a tightly-coupled simulation using a subatomic particle model may generate a dataset that is then analyzed using a data processing application to evaluate the simulation's accuracy. In such cases, the applications have multiple stages that execute different tasks with different execution models and resource allocations (e.g. different sizes, durations, or specific hardware requirements). These applications are often composed as workflows that map the stages onto different batch jobs, which then receive stage-specific resource allocations [51]. However, workflows require the system scheduler to at least support the expression of dependencies between jobs, i.e. statements indicating that one job cannot start until another has finished correctly. Users commonly employ workflow managers that automate the submission of workflow jobs and supervise their execution.
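As a concrete illustration of such dependencies, the sketch below chains a simulation job and an analysis job using Slurm's job-dependency mechanism (sbatch's --dependency=afterok flag). The script names and resource requests are hypothetical, and error handling is omitted.

    # Submit a two-stage workflow as chained jobs: the analysis job may only
    # start after the simulation job has completed successfully.
    import subprocess

    def sbatch(args):
        """Submit a job with 'sbatch --parsable' and return its job ID."""
        out = subprocess.run(["sbatch", "--parsable"] + args,
                             capture_output=True, text=True, check=True)
        return out.stdout.strip().split(";")[0]

    sim_id = sbatch(["--nodes=128", "--time=02:00:00", "simulate.sh"])
    ana_id = sbatch(["--nodes=1", "--time=05:00:00",
                     f"--dependency=afterok:{sim_id}", "analyze.sh"])
    print(f"submitted workflow: {sim_id} -> {ana_id}")

A workflow manager automates exactly this kind of submission and resubmits or cancels jobs when a stage fails.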

2.2 HPC systems

HPC systems are necessarily shaped by applications' requirements and performance-efficiency trade-offs. This section describes the characteristics of such systems with reference to Edison, a Cray XC30 supercomputer at NERSC that produces 2.5 Pflops/s of raw computational power [4]. Edison was deployed in 2014, when it was ranked #18 in the Top500 list. As of April 2017, Edison is still operational and is ranked #60 in the Top500.

Performance and efficiency: Distributed systems running scientific and industrial applications are created to support different economic models. For example, the compute infrastructure of an online company must always deliver the performance required to process any level of its variable bursty workload. Cost reduction and control of underutilization are addressed (e.g. through overbooking, elasticity, and idle capacity lending) but are considered secondary problems [37].

However, in high performance computing, workload levels always exceed the system's capacity and a backlog of waiting work is always present [20]. Also, applications require specialized hardware, which makes HPC systems significantly more expensive than other large computing systems. Therefore, producing the maximum amount of work per cost unit is a high priority. The following subsections discuss some key characteristics of HPC systems that affect their efficiency in terms of the costs of system deployment, operating costs, and power consumption.

Uniformity and a high degree of parallelism: Applications for studying large scientific problems benefit from large or extreme parallelism. In some cases, the precision of their results depends on the availability of resources, as in the case of the weather simulation discussed above or when solving linear systems. In addition, tightly-coupled applications require parallelized hardware to achieve uniform performance and avoid straggler threads that slow down execution.

The Edison system consists of 5586 identical compute nodes, each having two processors (with 12 CPU cores each) and 64 GB of RAM, giving 134064 CPU cores and 357 TB of RAM in total. The system can run parallel applications with up to 134064 threads, or 268128 threads if hyper-threading is enabled. Such a system allows large applications to be executed without adversely affecting overall system performance. For instance, during the work on Paper I we observed applications allocating tens of thousands of threads in Edison without noticeably affecting the performance of other applications or their wait times.

A low-latency, low-jitter, high-bandwidth, uniform interconnect: The need to support large, parallel, tightly-coupled applications imposes unique requirements on the interconnect of an HPC system: it must provide synchronous, low latency, high-bandwidth communications between processes running across the whole system. These requirements affect the hardware and topology of the network. For instance, the degree of connectivity is higher and the maximum number of hops is smaller than in conventional networks. This ensures that latency does not vary significantly across the system, simplifying the scheduler's placement decisions. Also, network cards and drivers must support synchronized inter-process communication across nodes. This allows the processes of tightly-coupled parallel applications to reduce their execution time by controlling the precise instant at which a message is sent through the network.

In the case of Edison, its interconnect has a Dragonfly topology [33] that guarantees uniform low latency between all processes in all system nodes. This model's degree of connectivity is superior to that of the Fat-tree interconnects [35] used in older systems, which do not provide uniform internode latency and thus cause tightly-coupled applications to run more slowly unless all their assigned nodes are leaves of the same network switch.

Power-efficient computation: HPC systems are large systems (e.g. Edison has 134064 cores) that are used at almost their full capacity without interruption throughout their lifetime. Since their power budget is large (e.g. 3748 kW for Edison), power efficiency is a critical requirement that must be considered during both hardware design and system operation.

For example, systems are built with the most recent, power-efficient semiconductor and system technologies available at design time. Moreover, systems are retired when a newer system becomes available that consumes significantly less power while offering equivalent or superior compute capabilities.

For example, the Hopper supercomputer at NERSC produced 1.28 Petaflops/sec and offered 212 terabytes of RAM while consuming 2910 kW of power (2227 kW/Petaflops) [5]. It was retired after only four years of operation and replaced by Edison, a more modern and power-efficient system. Edison produces 2.59 Petaflops/sec and offers 356 terabytes of RAM while consuming 3747 kW of power (1446 kW/Petaflops) [4].

High-bandwidth, concurrent I/O: High performance systems support concurrent execution of many applications with high individual I/O requirements. This necessitates the use of large parallel file systems (PFSs) that divide files into stripes that are spread across multiple disks. The combined performance of multiple disks when reading or writing a file provides the high bandwidth needed in HPC systems. The distribution of files across a large pool of disks also provides the capacity to serve different applications concurrently while satisfying each application's performance requirements.

Edison is served by four different file systems with a total storage capacity of 7.56 petabytes [2]. Each file system has different performance characteristics that reflect its intended use. For example, the home folder system is the slowest, serving a maximum bandwidth of 100 MB/s. On the other hand, the global scratch Lustre PFS can serve over 700 GB/s of combined bandwidth.

Software support: HPC systems' software stacks offer common interfaces and functionalities to sustain widely-used operations in scientific applications. Such software often takes the shape of software libraries or execution frameworks. Many parallel tightly-coupled applications use message passing libraries to coordinate their processes. Consequently, systems include implementations of MPI (e.g. Open MPI [6] or MVAPICH2 [3]) that are adapted to the peculiarities of their interconnects and hardware. Similarly, execution frameworks facilitate application development by abstracting the underlying system, making it possible to create applications that should need only minor modifications to run on any system that supports the execution framework in question. For example, users at NERSC are increasingly migrating their Spark [55] and R [52] applications from their private clusters to Edison because both frameworks are supported at NERSC [1].

However, deploying new applications in an HPC system is not trivial, even if the required libraries are supported. Particularities in hardware and differences in implementation alter the behavior of libraries across systems, affecting the results of overlaying applications. To facilitate portability, NERSC systems support Shifter [30], a container technology [38] for HPC systems. Using Shifter, HPC applications are deployed together with specific versions of the required libraries that override those already present in the system.

Figure 2.2: Job ordering for maximum utilization.
Figure 2.3: Job ordering to avoid long individual turnaround times.

2.3 HPC scheduling

Applications in HPC systems are run as batch jobs, i.e. time-limited requests for resources to run the application binaries. To run multiple applications concurrently, HPC schedulers order the execution of batch jobs to achieve high utilization while controlling their turnaround times. However, as demonstrated by the following examples, these two objectives may oppose each other. Figures 2.2 and 2.3 present two different orderings for scheduling the same group of jobs. The job ordering in Figure 2.2 maximizes utilization and, over the measured window (indicated by the red lines), it minimizes the presence of idle resources. However, we observe that job J3 is delayed significantly more than the others. In contrast, Figure 2.3 depicts an ordering that limits the maximum per-job wait time to control turnaround times. The resulting ordering equalizes the jobs' turnaround times more effectively (no job waits as long as J3 in the previous example), but yields more idle resources during the observation window. In HPC schedulers, the balance between utilization and turnaround time is controlled by the scheduler's prioritization system and the scheduling algorithms, which are discussed in Sections 2.3.1 and 2.3.2.

Production HPC systems use different workload managers that combine scheduling and resource management. Some of the most popular are Slurm [54], LSF [15], LoadLeveler [31], and the combination of Moab (scheduler) with Torque (resource manager) [16]. While these tools are similar in characteristics and implementation, the work presented in this thesis was performed using Slurm because of its open-source nature and its dominant presence in the Top500 systems list.

The following subsections describe the components and mechanisms present in a generic HPC scheduler. Understanding of these descriptions may be facilitated by looking at Figure 2.4, which depicts the steps in the life of a batch job and the relationship between the scheduler's components.

10 Figure 2.4: Components of a generic batch scheduler.

2.3.1 Job submission and priority

The life-cycle of a batch job starts with its submission to the system. The batch job submission must provide a detailed specification of the requested resources (e.g. the number of cores, minimum RAM per core, or specific compute nodes to run on), an estimate of the job's runtime, a priority request (expressing the job's importance), and, if applicable, a list of dependencies on other jobs (e.g. statements that the job should not start until some set of conditions is met). If no other jobs are waiting and there are enough resources available, the scheduler runs the job immediately. Otherwise, it is appended to a job waiting queue, and will be executed when it reaches the top of the queue.

The jobs in the waiting queue are initially ordered by arrival time. However, to steer the jobs' turnaround time, jobs are ranked and re-ordered by a priority engine. Different ranking policies define priorities based on job size (e.g. smaller jobs should run sooner), priority class (e.g. jobs in the real-time class should run before any other job), fairness (i.e. priorities dictated by system quotas), or other administrator-defined criteria. This prioritization makes it possible to steer the turnaround time of jobs by assigning higher priorities to those that should have shorter turnaround times.
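In practice, the ranking is often computed as a weighted sum of normalized factors. The sketch below is loosely inspired by that weighted-sum approach (used, for example, by Slurm's multifactor priority plugin); the weights, normalizations, and example jobs are invented for illustration.

    # Illustrative multi-factor job priority. Weights and normalizations are
    # made up; real deployments tune them according to site policy.
    def job_priority(age_s, req_cores, fair_share, qos_factor,
                     max_age_s=7 * 24 * 3600, system_cores=134064,
                     w_age=1000, w_size=1000, w_fs=2000, w_qos=4000):
        age_factor = min(age_s / max_age_s, 1.0)      # waiting jobs slowly rise
        size_factor = req_cores / system_cores        # here: favor large jobs
        return (w_age * age_factor + w_size * size_factor +
                w_fs * fair_share + w_qos * qos_factor)

    # A small, recently submitted job in a high-priority class outranks a much
    # larger job that has already waited a day, because of the QOS weight:
    print(job_priority(age_s=600,   req_cores=64,    fair_share=0.5, qos_factor=1.0))
    print(job_priority(age_s=86400, req_cores=32768, fair_share=0.5, qos_factor=0.1))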

2.3.2 Job scheduling

Jobs progress towards the top of the waiting queue until they are extracted by the scheduling algorithms and then executed. Most HPC batch schedulers include the FCFS (First Come First Served) and backfilling scheduling algorithms [50]. Under FCFS, the first jobs in the queue are run in order until there are no longer enough unallocated resources for the next job in the queue. This algorithm preserves the queue's initial order, guaranteeing a certain level of wait time equality among jobs. However, by itself FCFS produces low utilization. Figure 2.5 presents an example of the scheduling of five jobs using FCFS alone. The jobs J1 and J2 start as soon as they are submitted. J3 starts as soon as J2 ends and enough resources become available. The jobs J4 and J5 start after J3 because they arrived later. Given the available idle resources, J4 could be started before J3 to increase utilization. However, FCFS does not support such changes in job order.

The backfilling algorithm solves the underutilization problem by starting waiting jobs whose execution would not delay previous jobs in the queue.

Figure 2.5: Scheduling with FCFS alone: idles resources over J2.
Figure 2.6: Scheduling with FCFS and backfilling: job J4 is backfilled.

An example of the backfilling algorithm's effect is presented in Figure 2.6. The job J4 is scheduled before J3 because it does not delay the latter job's expected start time (which is after J2 ends). However, J5 is started after J3 because backfilling it (plotted with dotted lines) would delay the start of J3 (favoring wait time control).

These two algorithms are the most common scheduling mechanisms present in HPC schedulers. They are effective in achieving high utilization and producing turnaround times (enforced through the waiting queue order) that reflect the jobs' priorities. However, they do not take into account any application-specific characteristics, which may lead to suboptimal scheduling, for example by generating long turnaround times for workflows as intermediate wait times accumulate. Some of the work presented in this thesis (Paper V) involved modifying these mechanisms to improve workflow scheduling.
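The combination of FCFS and backfilling can be sketched as follows for a single resource dimension (cores). This is a simplified, EASY-style variant in which only the reservation of the first blocked job is protected; it is not a description of any production scheduler's implementation.

    # Simplified FCFS + backfilling over one resource dimension. Jobs are
    # (name, cores, runtime); 'running' holds (end_time, cores) pairs.
    def schedule_step(queue, running, free_cores, now):
        started = []
        # FCFS: start jobs from the head of the queue while they fit.
        while queue and queue[0][1] <= free_cores:
            name, cores, runtime = queue.pop(0)
            running.append((now + runtime, cores))
            free_cores -= cores
            started.append(name)
        if not queue:
            return started, free_cores

        # Reservation for the first blocked job: earliest time enough cores free up.
        blocked_cores = queue[0][1]
        avail, shadow = free_cores, None
        for end, cores in sorted(running):
            avail += cores
            if avail >= blocked_cores:
                shadow = end            # estimated start time of the blocked job
                break

        # Backfill later jobs only if they fit now AND finish before 'shadow',
        # so the blocked job's expected start time is not delayed.
        for job in list(queue[1:]):
            name, cores, runtime = job
            if cores <= free_cores and (shadow is None or now + runtime <= shadow):
                queue.remove(job)
                running.append((now + runtime, cores))
                free_cores -= cores
                started.append(name)
        return started, free_cores

    # J3 is blocked; J4 fits in the idle cores and finishes before J3's reserved
    # start, while J5 would still be running at that time, so only J4 is backfilled.
    queue = [("J3", 80, 4), ("J4", 20, 2), ("J5", 10, 6)]
    running = [(3, 50)]                 # a running job releasing 50 cores at t=3
    print(schedule_step(queue, running, free_cores=30, now=0))   # (['J4'], 10)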

2.3.3 Placement, accounting, and other functions

Although beyond the scope of this thesis, HPC schedulers include several mechanisms that are needed to manage an HPC system. Among other things, they incorporate placement systems that calculate which resources should be used for specific jobs. These subsystems' decisions take into account the network topology or special job requirements. For instance, in a system with a fat-tree interconnect topology [35], a tightly coupled application will run faster if all its assigned nodes are leaves of the same network switch. Otherwise, inter-node latencies may vary, slowing down all the application's processes.

HPC systems also require accounting to register the use of compute hours and resources by user jobs. In general, HPC centers assign allocations of core-hours or system shares to users and projects (i.e. quotas). Accounting is required to prevent users from utilizing the system beyond their assigned quota (e.g. by de-prioritizing their jobs) and to encourage those who have not used it (e.g. by elevating the priority of users with little quota usage).

Finally, workload managers include functions to handle the basic operations needed to run an HPC system, such as managing the compute resources, staging in jobs, controlling their execution, and staging out results.
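As a rough illustration of how accounting data feeds back into the priority engine described above, a fair-share factor can be made to decay as a user consumes their allocated share. The exponential form below is one common choice in production schedulers; the account names and figures are invented.

    # Fair-share factor derived from accounting data: 1.0 for an unused
    # allocation, 0.5 when the allocation is exactly consumed, and close to 0
    # for heavy over-use. Account names and numbers are made up.
    def fairshare_factor(used_core_hours, allocated_core_hours):
        return 2 ** (-used_core_hours / allocated_core_hours)

    accounts = {"astro":   (50_000, 100_000),    # (used, allocated) core-hours
                "climate": (300_000, 100_000),
                "bio":     (0, 100_000)}
    for name, (used, allocated) in accounts.items():
        print(f"{name:8s} fair-share factor = {fairshare_factor(used, allocated):.2f}")

The resulting factor can then be combined with other criteria, such as the weighted priority sketched in Section 2.3.1.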

Chapter 3

New scheduling challenges

The use of HPC in scientific research is increasing, introducing new application models into HPC workloads and necessitating increases in the capacity of HPC systems. This chapter outlines some of the challenges of scheduling and running these new HPC workloads. It also describes the difficulties of building larger HPC systems, many of which relate to the limitations of current semiconductor technologies, and the resulting challenges for HPC schedulers.

3.1 Present and future workloads

New scientific applications are changing the scientific workload landscape. HPC workloads were traditionally dominated by large parallel tightly-coupled applications, but recent years have seen dramatic increases in the prevalence of other application types, including significantly more data-intensive and high-throughput applications, workflows, and malleable jobs. This section describes the requirements of such applications and the challenges they present for HPC systems and schedulers.

Data-intensive workloads: Many scientific discoveries are supported by data produced by scientific simulations run on HPC systems (e.g. atmospheric simulations). Also, a large corpus of science is based on analyzing data generated during real world experiments (e.g. Large Hadron Collider experiments). Increases in sensor resolutions and the availability of compute capacity are causing the quantities of data produced by experiments and simulations to grow at unprecedented rates. This has increased the importance and complexity of data analysis in science, establishing what has been described as the "fourth paradigm" [27]: scientific discoveries resulting from the use of data-intensive and data analysis processes to interpret, group, and correlate large datasets.

However, the performance model of data-intensive applications often differs from that of the parallel tightly-coupled applications (e.g., MPI) [22] that historically dominated HPC workloads and guided the design of HPC systems and their schedulers.

While the performance of MPI applications depends on execution uniformity and communication latency [7], data-intensive applications' performance typically depends on memory and I/O performance [22]. For example, the interconnect of an HPC system dominated by I/O intensive applications will be heavily utilized, increasing the possibility of network hotspot formation and congestion, and reducing the system's overall performance. Consequently, future schedulers will require placement systems that account for the system's network topology and the expected I/O traffic generated by individual applications.

Old and new characteristics: Some characteristics of applications are rarely taken into account by modern schedulers. For example, malleable applications support changes in resource allocation during their runtime [25]. Such applications were rare in traditional HPC workloads, so few HPC schedulers incorporate mechanisms for exploiting their malleability. However, data-intensive applications are often malleable and are becoming important components of HPC workloads. Therefore, the importance of managing malleability is growing. Malleability presents opportunities to increase system utilization or, as described in Paper III [47], to quickly borrow resources for other applications.

On the other hand, new applications may have new requirements. For example, the Advanced Light Source [32] is a particle accelerator that produces high quality infra-red and x-ray beams to "illuminate" experiments. These experiments produce large amounts of data that are transferred to NERSC systems for later processing and analysis. These experiments would benefit from real-time data processing - for example, scientists could use live feedback [10] to adjust their experimental conditions in real time. This could be enabled by running the experiment application using pre-allocated resources at a planned time (i.e. by making advance resource reservations). However, such practices significantly reduce overall utilization, especially because experiments' start times are hard to predict. Consequently, it is difficult to schedule quasi-real-time applications without reducing utilization. To ease this problem, we propose a model (Paper III [47]) that takes advantage of the presence of malleable applications to temporarily borrow resources and allocate them to real-time applications. Another approach has been implemented by NERSC [29]: "real-time jobs" are allocated small resource sets but are given extremely high priority. They are then run in a partition of the system where jobs' runtimes are observed to be short. Because jobs in this partition are likely to finish at any given moment, resources are often freed up for the real-time applications. This strategy enables wait times of about two minutes, which is close to real time for a batch job.

Workflows: Scientific applications are becoming more complex, requiring the composition of various applications with different resource requirements [51] or compute models. In a very simple example, a simulation stage requiring a large parallel allocation is often followed by a data analysis job that processes the simulation's results using long-running serial code. Also, applications running on a heterogeneous system could potentially use different resource types at different stages of their execution.

For instance, an application might use regular compute nodes supporting MPI in a first stage, followed by a machine learning classification stage that requires compute nodes with GPUs and large memory. In both cases, the application consists of different tasks that can be allocated as different jobs, together forming a workflow.

However, current schedulers' workflow support is limited to offering job dependencies. In this approach, users submit workflows as a group of jobs and dependencies, which prevent one job from starting until the jobs it depends on have successfully completed. This leads to long intermediate wait times and overall turnaround times because the scheduler does not consider whether a job belongs to a workflow or not. Consequently, users often run workflows as pilot jobs, which are allocated the maximum resources required at any point in the workflow's runtime for the entire duration of that runtime. This minimizes the workflow's turnaround time, but wastes resources through over-allocation. This thesis presents new ways of solving such problems. For example, it introduces a scheduling model that includes different schedulers for batch jobs and workflows (Paper III [47]), eliminating the interference of regular jobs and reducing workflows' intermediate wait times. It also details an algorithm for modifying the scheduler's job waiting queue (Paper V [44]), which enables regular schedulers to minimize workflows' turnaround times without over-allocating resources.

Job diversity: MPI applications, data-intensive applications, malleable jobs, real-time jobs, workflows, and regular jobs co-exist in current HPC workloads [8]. This application diversity poses new challenges for the scheduling process. For example, applications may be affected differently by the scheduler's decisions - the wait time of a workflow job affects the total workflow turnaround time. Also, different applications may have different performance objectives: real-time jobs must be run as quickly as possible, whereas regular jobs need only achieve "reasonable" turnaround times. Moreover, some applications support specific scheduling operations, e.g., malleable jobs support online re-sizing of their resource allocations. However, current HPC schedulers treat all applications equally, and specific scheduling behaviors are enforced by tweaking the scheduler's configuration. For example, real-time applications at NERSC are run as high priority jobs on a partition containing small, fast-running jobs [29]. This thesis presents an alternative model (Paper III [47]) in which independent schedulers manage applications of different types. This makes it possible to compare similar applications using the same criteria and to incorporate mechanisms for exploiting their unique characteristics or ensuring that their unique requirements are satisfied.

Finally, application diversity also affects the geometry of the jobs in the workload. The characterization of NERSC workloads presented in Paper I [46] includes an analysis of this phenomenon, revealing that batch jobs in the Edison, Hopper, and Carver workloads are very diverse in their geometries. Moreover, it is shown that a job's wait time may depend on the geometry diversity within its priority group (queue).

3.2 Future systems

Because Grand Challenge science is increasingly supported by simulations, data analysis, and numerical methods, larger and more powerful HPC systems are required to increase scientific research capacity. Increases in capacity were traditionally achieved by adding more compute nodes and increasing processor frequency, the amount of RAM per compute node, and the size of the parallel file system. However, the limitations of current semiconductor technologies have made some of these strategies ineffective or impractical. This section presents some of the challenges of scaling up modern HPC systems and their impact on system schedulers.

3.2.1 Dennard scaling break and processor parallelism

Dennard's scaling law is related to Moore's law, and states that, as transistors become smaller, their power density remains roughly constant [26]. This made it possible to increase the number of transistors per chip surface unit and their operating frequencies without causing major increases in the processors' power budgets and heat dissipation. This was responsible for much of the increase in the capacity of all computational systems up until 2006, a year that has been described as the Dennard scaling break [26]. Since then, processors' transistor counts and parallelism have increased but their operating frequencies have not, and in some cases have even fallen slightly. Consequently, recent processor speed-ups have been much more gradual than before the break.

The primary reason for the break is that transistors have become so small that their switching frequency is limited by current leakage. Higher frequencies imply more residual heat production and power consumption, which has two side effects [24]: (1) processor frequencies cannot increase at historical rates, reducing processing speed-ups; and (2) adding more cores to the processors increases their capacity, but the higher number of active switching elements still increases overall power consumption and makes CPU heat dissipation problems more severe. These problems can be mitigated (but not eliminated) by using concepts such as the dark silicon approach [24], whereby some of the chip's circuits are powered down as necessary to keep its power consumption within safe limits. Unfortunately, this limits the chip's potential performance.

Also, even if higher core parallelization within processors can be achieved without power constraints, this will not necessarily deliver perfect performance scaling. First, increasing the number of cores in a processor increases the likelihood of resource contention on caches and memory buses, which reduces the processor's effective performance [28]. Also, even assuming perfect hardware parallelization, Amdahl's law limits the speed-ups that can be achieved through parallelization: these speed-ups depend on the nature of the workload [28]. Specifically, the performance gain achievable through parallelization depends on the relative prevalence of serial and parallel code within the application or workload, which in turn depends on the usage patterns of individual users.
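Amdahl's law can be stated compactly: with a serial fraction s and N processing units, the achievable speedup is bounded as shown below. The 5% serial fraction used in the example is an assumed value, chosen only for illustration.

    \[
      S(N) = \frac{1}{\,s + \dfrac{1-s}{N}\,},
      \qquad
      \lim_{N \to \infty} S(N) = \frac{1}{s}.
    \]

For example, with s = 0.05 and N = 1000 cores, S(1000) = 1/(0.05 + 0.95/1000) ≈ 19.6, far below the ideal thousand-fold speedup.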

16 3.2.2 The memory gap

Memory technology has been outpaced by Moore's law, and memory performance and power-efficiency bottlenecks will become more severe in future systems. This section describes some of the problems arising from these bottlenecks and limitations.

Ideally, the amount of RAM in compute nodes should grow in parallel with the number of cores. However, this has not been possible to achieve because memory density grows more slowly than the scaling implied by Moore's law [11]. Put another way, current memory technology is less power-efficient than processor technology. Consequently, in recent years the amount of memory per core in larger systems has been getting smaller, limiting the memory footprint of applications' individual processes. Also, the gap between processors' cycle times and memory latency has necessitated the introduction of caches to prefetch data from the slower memory before the processor requests it. However, the cache size cannot increase at the same rate as the number of cores per processor because this would require the cache to take up too much space on the processor die [24]. This reduces the efficiency of the cache mechanism and the performance of memory-related operations in code. Moreover, since memory technology is less power-efficient than CPU technology, memory movements and operations are becoming one of the largest contributors to the power consumption of modern systems [11]. This will limit the potential for increasing the size of memory hierarchies, necessitating the development of alternative mechanisms to compensate for the memory gap.

The memory gap's impact is demonstrated by the performance of the #1 supercomputer on the Top500 list (in March 2017): the TaihuLight system [17]. This supercomputer has a theoretical peak performance of 125.4 Pflops/s produced by 10,649,600 cores and 1.31 PB of RAM. Its first performance measurements were impressive, with a Linpack benchmark result of 93 Pflops/s, corresponding to 74% of its theoretical peak performance [18]. However, when tested with benchmarks that model real applications' data access patterns (the HPCG benchmark [19]), it achieved only 0.3% of the theoretical peak performance. This value is much lower than those achieved by the next two systems in the Top500 (1.1% for Tianhe-2 and 1.2% for Titan), and suggests lower efficiency when executing real applications. Initial reports on the system [18] attribute this low efficiency to the memory gap, and specifically to its high number of floating point operations per byte of data transferred from memory to the chips (22.4 Flops(DP)/byte transferred) and the small amount of memory per core (132 MB).

To conclude, the memory gap does not present direct challenges to schedulers. However, the scheduler will be confronted by fragmented resources if the memory gap problem is addressed by increasing heterogeneity, because some resources may be tailored to memory-intensive applications.

3.2.3 I/O gap

As with memory, there is also a performance gap between the I/O and processing capabilities of HPC systems. Traditional parallel file systems achieve concurrent high read and write bandwidths by striping and replicating data across disks and combining their bandwidths in I/O operations. However, increasing compute capacity implies more numerous and larger applications running concurrently and, as a consequence, higher aggregated I/O bandwidth peaks. In older systems, this problem was solved by increasing the parallelism of the PFS (i.e., adding more disks). However, this is not economically viable with modern systems: adding more spinning disks is too expensive in terms of both material costs and energy consumption. Replacing the spinning disks of the PFS with higher bandwidth solid state drives (SSDs) would increase overall bandwidth capacity and reduce I/O-related power consumption. However, SSDs are currently too expensive to be used in the construction of a complete PFS [36]. An intermediate solution is to increase the depth of the I/O hierarchy by, for example, allocating burst buffers to applications. A burst buffer is a space allocated in a small PFS built with SSDs that can absorb peaks in the system's I/O operations [36]. For read peaks, data can be pre-fetched in a similar way as in memory caches. For write peaks, the data can be written directly to the burst buffer and subsequently 'drained' to the main PFS. However, this poses a new challenge for HPC schedulers because the burst buffer space becomes a new constrained resource to be allocated to applications and (if poorly managed) can reduce scheduling efficiency and hence reduce utilization, leading to longer wait times. For instance, the scheduler must determine a burst buffer allocation for each application that is large enough to absorb the application's read and write bursts but not so large as to be underutilized and unduly limit the burst buffer capacity available to other applications.
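To make the sizing trade-off concrete, here is a minimal sketch (my illustration, not a mechanism from the thesis or from any specific scheduler) of how a burst-buffer request could be derived from an application's expected write burst; the rates and the safety margin are assumptions.

```python
# Size a burst-buffer allocation so it absorbs the part of a write burst
# that the parallel file system (PFS) cannot drain in time, plus a margin.

def burst_buffer_request(burst_gb_per_s: float, burst_seconds: float,
                         drain_gb_per_s: float, margin: float = 1.2) -> float:
    """Return an illustrative burst-buffer size (GB) for one application."""
    # Data produced during the burst minus what the PFS drains meanwhile.
    backlog = max(0.0, (burst_gb_per_s - drain_gb_per_s) * burst_seconds)
    return backlog * margin

# e.g. a 60 s write burst at 50 GB/s against a 5 GB/s drain rate
print(f"{burst_buffer_request(50, 60, 5):.0f} GB")  # 3240 GB
```

A request much larger than this leaves SSD space idle; a much smaller one stalls the application on the slower PFS, which is exactly the allocation dilemma described above.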

3.2.4 Hardware heterogeneity

The previous sections described new HPC challenges, such as the limits on processor frequency speed-ups, the memory gap, and growing I/O requirements, that are increasingly addressed by adding problem-specific hardware. Application diversity pushes in the same direction: for example, some modern systems pair regular compute nodes with GPU-based compute nodes that offer better performance in machine learning or artificial intelligence applications [14]. The increasing heterogeneity of HPC systems poses new challenges for schedulers. For example, an application's performance may depend on the scheduler's placements more strongly than was previously the case. Also, applications' resource requests may no longer be uniform; certain applications might request specific resource types. In such contexts, the scheduler must assign resources that satisfy the applications' demands without fragmenting the resource pool and thus reducing utilization.
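The following sketch (illustrative only; the node and job fields are hypothetical and not a real scheduler API) shows the kind of type-aware placement this implies: jobs that request a special node type get only that type, and generic jobs default to plain CPU nodes, so scarce accelerator nodes are not fragmented by jobs that do not need them.

```python
# Hypothetical free-node pool with two node types.
free_nodes = [
    {"id": "n01", "type": "cpu"}, {"id": "n02", "type": "cpu"},
    {"id": "g01", "type": "gpu"}, {"id": "g02", "type": "gpu"},
]

def pick_nodes(job: dict, nodes: list):
    """Return nodes of the requested type, or None if the job must wait."""
    wanted = job.get("node_type", "cpu")   # generic jobs land on CPU nodes
    candidates = [n for n in nodes if n["type"] == wanted]
    if len(candidates) < job["nodes"]:
        return None                        # wait rather than mis-place the job
    return candidates[:job["nodes"]]

print(pick_nodes({"nodes": 1, "node_type": "gpu"}, free_nodes))
# -> [{'id': 'g01', 'type': 'gpu'}]
```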

3.3 Scheduling research

One challenge in HPC scheduling research relates to how such research should be performed, because it requires information, methodology, and resources that are not always available. This section explores the process of performing HPC scheduling research and the tools available to support that process. Previous studies [48] have defined three basic methods for job scheduling research:

• Theoretical analysis: in which algorithm behavior is analyzed by identifying boundary cases representing the best- and worst-case performance. This method can provide insight into the algorithm's behavior but might not reflect its expected performance when supporting a real workload.

• Real system experiments: which provide clear measures of the algorithm's behavior with a real system and workload but require the running of many different experiments, which can be challenging for several reasons. First, allocating real system time is expensive. Also, a single experiment only provides results for one workload and system state, which is normally not enough to reason about the general case or test a general hypothesis. Finally, workload conditions are hard to control, so it can be difficult to test specific scenarios.

• Simulation: in which a system is emulated, a batch scheduler is run on the emulated system, and a synthetic workload is submitted to the scheduler. This method makes it possible to create different experimental scenarios and run experiments at large scale to generate data that enables induction to general conclusions. However, its results are only valid if the recreated workloads, systems, and scheduling behaviors are representative.

The work presented in this thesis used simulation and theoretical analysis to understand how the hierarchical fairshare prioritization engine works (Paper II, [43]). In addition, the Edison system was modeled from its job logs, and the resulting models were used to perform simulations and evaluate a workflow-aware scheduling method (a theoretical analysis of this method was also performed [42] but is not included in this thesis).

Research using scheduling simulations follows the cycle depicted in Figure 3.1. This process starts with system modeling, which captures the characteristics and typical workloads of the system under investigation. These characteristics are used to configure the simulator and create model workloads that are representative of the original system. Once the system has been modeled, an experiment is defined and performed in the simulator, and the scheduler's actions are recorded for later analysis. To test a scheduling hypothesis, it is necessary to define multiple experiments that must be replicated several times to ensure that representative results are obtained.

Figure 3.1: The scheduler research cycle. Experiments are defined and simulated, and then their results are analyzed. Finally, the resulting data are used to prove/disprove a hypothesis or suggest new experiments.

For example, the experimental testing of one of the workflow scheduling algorithms presented in this thesis involved testing with six workflow types appearing in five different patterns, scheduled with three different techniques, with six experimental replicates per combination of pattern, workflow type, and technique. These 540 experiments spanned six simulated days of Edison's life, representing 3240 days of supercomputer time. Simulating them on a single instance would take around 320 days (at an estimated 10x speedup). To avoid such unrealistically long experiments, it is necessary to perform large-scale concurrent simulations. However, no tools previously available to the research community could be applied to every step in the scheduler research cycle or be used to perform experiments at large scale. For example, these tools lack modules for workload analysis and generation. Moreover, their scheduling simulation components are not powerful enough to run all the necessary experiments within a reasonable time frame, or they require the use of a scheduler that is unrepresentative of those used in HPC production systems. To address these issues, Paper IV [45] presents a new simulation framework that covers every step in the scheduler research cycle. It can model systems and their workloads to generate equivalent synthetic workloads. It then emulates the operation of the modeled system using a real instance of Slurm, and schedules a synthetic workload. It also provides tools for running concurrent simulations at large scale, and tools for analyzing their results.
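The experiment count quoted above can be reconstructed directly from the stated factors (a worked check, assuming exactly a 10x simulation speed-up):

```python
# Number of experiments and simulated time for the example above.
workflow_types, patterns, techniques, repeats = 6, 5, 3, 6
experiments = workflow_types * patterns * techniques * repeats
print(experiments)                 # 540 experiments

simulated_days = experiments * 6   # six simulated days of Edison each
print(simulated_days)              # 3240 days of supercomputer time

print(simulated_days / 10)         # ~324 wall-clock days on one instance
                                   # (the text rounds this to ~320 days)
```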

Chapter 4

Contributions

This chapter summarizes the contributions of this thesis and the included papers as outlined in the introduction. The first contribution is a characterization of the HPC workloads of three systems at NERSC (Paper I [46]). This enabled the identification of present and future scheduling challenges such as the effects of workload diversity on the predictability of jobs' wait times. The second contribution is a characterization and improvement of a common HPC prioritization system: a hierarchical fairshare system (Paper II, [43]). The third contribution is a survey of present and future HPC scheduling problems together with a proposed scheduling model that addresses some of them (Paper III, [47]). The development of that model was hindered by a lack of suitable tools for performing research on HPC scheduling. This led directly to the fourth contribution: the development of an open source scheduling simulation framework (Paper IV, [45]). This framework allows scheduling researchers to implement and evaluate scheduling algorithms in the context of the Slurm workload manager. The final contribution is a new workflow-aware scheduling method (Paper V, [44]) that minimizes workflows' turnaround times without over-allocating resources. Its implementation is open source and its performance was evaluated using our scheduling simulation framework. The following sections of this chapter describe these contributions in more detail.

4.1 Paper I: Characteristics and trends in HPC workloads

Paper I [46] analyzes the characteristics of the jobs in the workloads of two supercomputers (Edison and Hopper) and a high performance cluster (Carver) at the National Energy Research Scientific Computing Center (NERSC) in Berkeley, California. The analysis features both a classical characterization of job variables and a trend analysis based on these variables that accounts for job diversity.

The first contribution of this paper is a novel method for characterizing diversity in job geometry (allocated CPU cores and runtime) based on k-means clustering. This makes it possible to identify dominant job geometries in a workload and analyze their mapping onto priority queues. The second contribution is a characterization of the NERSC systems' workloads, providing reference datasets for the workloads of systems having similar sizes, workloads, and architectures to Edison, Hopper, and Carver. The third contribution is the observation that priority queues containing more diverse job geometries exhibited less predictable wait times. According to the expected batch scheduler behavior, for a given priority, larger jobs should have longer wait times. In addition, for a given job geometry, jobs with higher priority are expected to wait less. The analysis showed that these expectations are not fulfilled for queues with more diverse job geometries. This indicated a need to better understand the effects of job diversity on HPC systems and the performance of their schedulers.
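The sketch below illustrates the kind of k-means clustering over job geometry this involves. It is not the paper's implementation: the toy data, the log scaling of both axes, and the fixed choice of k are my assumptions for the example.

```python
# Cluster jobs by geometry (allocated cores, runtime) to find dominant groups.
import numpy as np
from sklearn.cluster import KMeans

# One row per job: (allocated cores, runtime in seconds) -- toy values.
jobs = np.array([[24, 1800], [24, 2000], [48, 7200],
                 [9600, 36000], [10000, 40000], [24, 1750]])

# Log-scale both axes so the largest jobs do not dominate the distances.
X = np.log10(jobs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # e.g. the four small/short jobs vs. the two large/long jobs
```

In the paper's setting, the resulting cluster centers describe the dominant job geometries, and each queue's mix of cluster labels indicates how diverse that queue is.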

4.2 Paper II: Prioritization in batch schedulers, Fairshare

Paper II [43] analyzes the internal mechanisms of Karma, a distributed prioritization mechanism that supports hierarchical policies. This work provides an expanded understanding of the prioritization system itself and its effect on the division of the system's compute time based on a fairshare priority factor. Karma and similar mechanisms are used at HPC sites to calculate jobs' fairshare prioritization factors. These factors account for differences between the shares of a system that are assigned to users, groups, or institutions, and the corresponding shares of actual usage. The capacity of this system has been evaluated before [39], [23], but this paper goes further than previous evaluations by explaining how certain system properties emerge from the definition of one of the model's key components: the fairshare priority operators. For example, the hierarchical model makes it possible to assign an allocation share to a project and then divide that share among the users belonging to that project. If one of those users does not consume their allocation, it would be desirable for the prioritization system to divide that allocation between the project's other users, in proportion to their prior allocations within the project. This work shows how the calculation of some of the operators in Karma produces priorities equivalent to those obtained if the idle user did not exist and the other project users received a proportional allocation of that user's share. The second contribution of this paper is an analysis of the effect of the resolution of the scheduler's priority factor on Karma's capacity to discern priority differences in hierarchical models with many levels. This revealed that the factor resolution is divided across the levels of the priority tree and can be used to determine the minimum priority factor resolution needed for any given hierarchical fairshare tree.
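A minimal sketch of the redistribution property described above (my illustration, not Karma's actual operators): when one user in a project submits nothing, the remaining users effectively receive that user's share in proportion to their own shares.

```python
# Redistribute an idle user's share among the project's active users.
def effective_shares(user_shares: dict, active_users: set) -> dict:
    """Normalize shares over the users that actually have demand."""
    total = sum(s for u, s in user_shares.items() if u in active_users)
    return {u: (s / total if u in active_users else 0.0)
            for u, s in user_shares.items()}

project = {"alice": 0.5, "bob": 0.3, "carol": 0.2}
# carol submits nothing: her 0.2 is split between alice and bob pro rata.
print(effective_shares(project, {"alice", "bob"}))
# {'alice': 0.625, 'bob': 0.375, 'carol': 0.0}
```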

4.3 Paper III: A survey of current scheduling challenges and a position model to ease them

Building on the understanding of current HPC workloads and scheduling mechanisms obtained in Papers I [46] and II [43], Paper III [47] makes two contributions. The first is a survey of current and future scheduling challenges in HPC systems and an analysis of their relationships to cloud scheduling challenges. This survey focuses particularly on the issue of schedulers applying the same policies to applications with different performance models and target objectives. For instance, stream applications that require short wait times are scheduled by the same mechanisms as workflows, whose turnaround times should ideally be short but depend on the intermediate wait times between jobs. It also shows that malleable jobs are becoming increasingly common but most HPC schedulers cannot exploit their particular characteristics, i.e., the potential to change their resource allocation during their runtime. Finally, it summarizes other challenges such as the need for deeper memory and I/O hierarchies in future HPC systems.

The paper's second contribution is a description of A2L2, an Application Aware Flexible Scheduling Model for Low Latency Allocation. A2L2 is a cloud-inspired, two-level scheduling model that supports the use of one scheduler per application type on top of a smart resource manager. Each scheduler prioritizes applications of the appropriate type, and thus is not required to compare applications with different performance models and targets (Application Aware). Schedulers in such a system could offer application-specific capabilities, such as dynamic resource allocation for malleable applications (Flexible). The underlying smart resource manager provides primitives to request resources with short wait times using borrow operations. During a borrow operation, the resource manager takes resources from schedulers that manage malleable applications and lends them to the requesting scheduler. This enables support for almost real-time (Low Latency) jobs by allowing resources to be freed rapidly enough for such jobs to be allocated.
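The sketch below illustrates the borrow primitive described above. It is not the model's actual interface: the class and function names are hypothetical, and a real implementation would also have to return the borrowed nodes and roll back a partially satisfied request.

```python
# Illustrative borrow operation: reclaim nodes from pools of malleable jobs
# and hand them to a scheduler with a low-latency request.

class MalleablePool:
    """Nodes currently held by one scheduler's malleable applications."""
    def __init__(self, nodes: int):
        self.nodes = nodes
    def shrink(self, n: int) -> int:
        """Ask malleable jobs to release up to n nodes; return what was freed."""
        freed = min(n, self.nodes)
        self.nodes -= freed
        return freed

def borrow(pools: list, needed: int):
    borrowed = []
    for pool in pools:
        if needed <= 0:
            break
        got = pool.shrink(needed)
        borrowed.append(got)
        needed -= got
    return borrowed if needed <= 0 else None  # None: request cannot be met

pools = [MalleablePool(4), MalleablePool(10)]
print(borrow(pools, 8))   # [4, 4]: 4 nodes from each pool
```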

4.4 Paper IV: Building tools for scheduling research

The evaluation of job scheduling algorithms by simulation involves a research cycle consisting of system and workload modeling, workload generation, scheduling and simulation, and result analysis. However, the analysis and implementation of scheduling techniques reported in Papers I-III was hampered by a lack of suitable publicly available research tools to support this cycle. The contribution of Paper IV [45] is ScSF, an open source Scheduling Simulation Framework that can be used by future scheduling researchers. ScSF includes tools for running and coordinating every step of the simulation research

cycle. It creates system models that can subsequently be used in experiments. Users can define experiments based on a system and model workloads while also incorporating experimental conditions needed to analyze a specific scheduling behavior, such as a particular configuration of the priority engine, the presence of certain types of workflows in the workload, or a workload engineered to test classic backfilling efficiency. The framework runs these experiments automatically, generating synthetic workloads that are submitted to an instance of a Slurm simulator. The Slurm simulator emulates the resources of the chosen system model, replays the job submission, and executes a real instance of the Slurm scheduler with modifications of the time routines to accelerate its execution. The framework also includes tools to analyze relevant scheduling variables in the resulting scheduling logs and perform comparisons across experiments. The number of experiments required to test a scheduling hypothesis can be large, since the system configuration and the workload composition offer multiple degrees of freedom. Therefore, the framework can be used to run large scale experiments. The paper demonstrates this capability by running the framework over 161 VMs across a distributed infrastructure of 17 physical compute nodes at LBL and Umeå University. The final contribution of this paper is a set of lessons learned from the process of running large scale scheduling simulations that will be useful to future researchers. All the components of the framework other than Slurm and its simulator were created by the authors. Additionally, Slurm was modified to increase its simulation speed while preserving time determinism.
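To give a flavor of what "defining an experiment" means here, the configuration sketch below describes one experiment as a plain data structure. The field names and values are illustrative assumptions, not ScSF's actual schema.

```python
# Hypothetical experiment definition for an ScSF-like simulation framework.
experiment = {
    "system_model": "edison",           # system whose workload model is used
    "scheduler_conf": {                  # scheduling behavior under test
        "backfill": True,
        "priority_engine": "multifactor",
    },
    "workload": {
        "duration_days": 6,              # simulated span per run
        "workflow_share": 0.05,          # fraction of load coming from workflows
        "workflow_types": ["montage", "sipht"],   # example workflow names
    },
    "repetitions": 6,                    # replicate runs per configuration
}
```

One such definition per combination of factors, run concurrently over many simulator instances, yields the kind of large-scale campaign described in Section 3.3.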

4.5 Paper V: Workflow-aware scheduling in HPC systems

One of the challenges identified in Paper III [47] was the scheduling of workflows within HPC systems. Workflows are often run as a group of jobs related by completion dependencies, leading to long intermediate wait times and thus long turnaround times. An alternative possibility is to run all the workflow tasks within a single pilot job that is allocated the maximum resources required within the workflow throughout its runtime, potentially causing some resources to be idle during some stages of the workflow. The main contribution of Paper V [44] is WoAS, a new Workflow Aware Scheduling algorithm that can minimize workflows’ intermediate wait times (and thus their turnaround times) without over-allocating resources. Using this technique, workflows are initially represented in the same way as in the pilot job approach. However, the pilot job includes extra information that describes the workflow’s internal structure. This information is used by the scheduler to modify the representation of the workflow during different stages of its scheduling: it is treated as a pilot job during prioritization but then its internal tasks are exposed to the FCFS and backfilling algorithms. An open-source implementation of the algorithm was incorporated into

Slurm and evaluated in the ScSF framework by simulating workloads with different types and abundances of workflows. It was found that WoAS can run workflows with turnaround times that are significantly shorter than those of the chained jobs approach (achieving speedups of up to 3.75x) and almost match those achieved using the pilot job approach, but without allocating extra resources.
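The sketch below shows the kind of extra structure a workflow-aware pilot job could carry. The manifest format is illustrative, not WoAS's actual encoding: the point is only that the job is prioritized as one entity while its internal tasks, with their own geometries and dependencies, are what FCFS and backfilling actually place.

```python
# Hypothetical workflow manifest attached to a pilot job.
workflow_job = {
    "name": "wf-example",
    "tasks": {
        "stage_in": {"cores": 48,   "runtime_s": 600,  "deps": []},
        "simulate": {"cores": 4096, "runtime_s": 7200, "deps": ["stage_in"]},
        "analysis": {"cores": 96,   "runtime_s": 1800, "deps": ["simulate"]},
    },
}
# Pilot-job view: hold max(cores) for the whole makespan -> 4096 cores for 9600 s.
# Workflow-aware view: each task holds only the cores it needs, while the
# scheduler keeps task start times close together to avoid intermediate waits.
```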


Bibliography

[1] Data analytics (at NERSC), http://www.nersc.gov/users/data-analytics/data-analytics/
[2] Edison file storage and I/O, http://www.nersc.gov/users/computational-systems/edison/file-storage-and-i-o/
[3] MVAPICH, http://mvapich.cse.ohio-state.edu/
[4] NERSC's Edison, Top 500, https://www.top500.org/system/178443
[5] NERSC's Hopper, Top 500, https://www.top500.org/system/176952
[6] Open MPI, https://www.open-mpi.org/
[7] Adhianto, L., Chapman, B.: Performance modeling of communication and computation in hybrid MPI and OpenMP applications. Simulation Modelling Practice and Theory 15(4), 481–491 (2007)
[8] Ahern, S., Alam, S.R., Fahey, M.R., Hartman-Baker, R.J., Barrett, R.F., Kendall, R.A., Kothe, D.B., Mills, R.T., Sankaran, R., Tharrington, A.N., et al.: Scientific application requirements for leadership computing at the exascale. Tech. rep., Oak Ridge National Laboratory (ORNL); Center for Computational Sciences (2007)
[9] Antypas, K., Austin, B., Butler, T., Gerber, R., Whitney, C., Wright, N., Yang, W.S., Zhao, Z.: NERSC workload analysis on Hopper. Tech. rep., LBNL Report (2013)
[10] Bauer, M.A., Biem, A., McIntyre, S., Tamura, N., Xie, Y.: High-performance parallel and stream processing of x-ray microdiffraction data on multicores. In: Journal of Physics: Conference Series. vol. 341, p. 012025. IOP Publishing (2012)
[11] Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dally, W., Denneau, M., Franzon, P., Harrod, W., Hill, K., Hiller, J., et al.: Exascale computing study: Technology challenges in achieving exascale systems. Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Tech. Rep 15 (2008)

[12] Bethel, E., Leinweber, D., Rübel, O., Wu, K.: Federal market information technology in the post flash crash era: roles for supercomputing. In: Proceedings of the fourth workshop on High performance computational finance. pp. 23–30. ACM (2011)

[13] Bhat, P.B., Lim, Y.W., Prasanna, V.K.: Issues in using heterogeneous HPC systems for embedded real time signal processing applications. In: Real-Time Computing Systems and Applications, 1995. Proceedings., Second International Workshop on. pp. 134–141. IEEE (1995)

[14] Catanzaro, B.: Deep learning with COTS HPC systems (2013)

[15] Chiang, S.H., Vernon, M.K.: Production job scheduling for parallel shared memory systems. In: Parallel and Distributed Processing Symposium., Proceedings 15th International. pp. 10–pp. IEEE (2001)

[16] Declerck, T.M., Sakrejda, I.: External Torque/Moab on an XC30 and Fairshare. Tech. rep., NERSC, Lawrence Berkeley National Lab (2013)

[17] Dongarra, J.: Report on the Sunway TaihuLight system. www.netlib.org. Retrieved June 20 (2016)

[18] Dongarra, J.: Sunway TaihuLight supercomputer makes its appearance. National Science Review 3(3), 265 (2016), http://dx.doi.org/10.1093/nsr/nww044

[19] Dongarra, J., Heroux, M.A.: Toward a new metric for ranking high per- formance computing systems. Sandia Report, SAND2013-4744 312 (2013)

[20] Dongarra, J., Sterling, T., Simon, H., Strohmaier, E.: High-performance computing: clusters, constellations, mpps, and future directions. Comput- ing in Science & Engineering 7(2), 51–59 (2005)

[21] Dongarra, J., et al.: The international exascale software project roadmap. International Journal of High Performance Computing Applications p. 1094342010391989 (2011)

[22] Ekanayake, J., Pallickara, S., Fox, G.: Mapreduce for data intensive scientific analyses. In: eScience, 2008. eScience'08. IEEE Fourth International Conference on. pp. 277–284. IEEE (2008)

[23] Elmroth, E., Gardfjäll, P.: Design and evaluation of a decentralized system for Grid-wide fairshare scheduling. In: H. Stockinger et al (ed.) Proceedings of e-Science 2005. pp. 221–229. IEEE CS Press (2005)

[24] Esmaeilzadeh, H., Blem, E., St Amant, R., Sankaralingam, K., Burger, D.: Dark silicon and the end of multicore scaling. In: ACM SIGARCH Computer Architecture News. vol. 39, pp. 365–376. ACM (2011)

[25] Feitelson, D.G., Rudolph, L.: Toward convergence in job schedulers for parallel supercomputers. In: Workshop on Job Scheduling Strategies for Parallel Processing. pp. 1–26. Springer (1996)

[26] Haensch, W., Nowak, E.J., Dennard, R.H., Solomon, P.M., Bryant, A., Dokumaci, O.H., Kumar, A., Wang, X., Johnson, J.B., Fischetti, M.V.: Silicon CMOS devices beyond scaling. IBM Journal of Research and Development 50(4.5), 339–361 (2006)

[27] Hey, T., Tansley, S., Tolle, K.M., et al.: The fourth paradigm: data-intensive scientific discovery, vol. 1. Microsoft Research, Redmond, WA (2009)

[28] Hill, M.D., Marty, M.R.: Amdahl’s law in the multicore era. Computer 41(7) (2008)

[29] Jacobsen, D.: NERSC Site Report-One year of Slurm. In: Slurm User Group 2016 (2016)

[30] Jacobsen, D.M., Canon, R.S.: Contain this, unleashing docker for hpc. Proceedings of the Cray User Group (2015)

[31] Kannan, S., Roberts, M., Mayes, P., Brelsford, D., Skovira, J.F.: Workload management with loadleveler. IBM Redbooks 2(2) (2001)

[32] Kilcoyne, A., Tyliszczak, T., Steele, W., Fakra, S., Hitchcock, P., Franck, K., Anderson, E., Harteneck, B., Rightor, E., Mitchell, G., et al.: Interferometer-controlled scanning transmission X-ray microscopes at the advanced light source. Journal of synchrotron radiation 10(2), 125–136 (2003)

[33] Kim, J., Dally, W.J., Scott, S., Abts, D.: Technology-driven, highly-scalable dragonfly topology. In: ACM SIGARCH Computer Architecture News. vol. 36, pp. 77–88. IEEE Computer Society (2008)

[34] Kim, Y., Gunasekaran, R.: Understanding I/O workload characteristics of a peta-scale storage system. The Journal of Supercomputing 71(3), 761–780 (2015)

[35] Leiserson, C.E.: Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Transactions on Computers 100(10), 892–901 (1985)

[36] Liu, N., Cope, J., Carns, P., Carothers, C., Ross, R., Grider, G., Crume, A., Maltzahn, C.: On the role of burst buffers in leadership-class storage systems. In: Mass Storage Systems and Technologies (MSST), 2012 IEEE 28th Symposium on. pp. 1–11. IEEE (2012)

[37] Mell, P., Grance, T., et al.: The nist definition of cloud computing (2011)

[38] Merkel, D.: Docker: lightweight linux containers for consistent development and deployment. Linux Journal 2014(239), 2 (2014)

[39] Östberg, P.O., Espling, D., Elmroth, E.: Decentralized scalable fairshare scheduling. Future Generation Computer Systems - The International Journal of Grid Computing and eScience 29, 130–143 (2013)

[40] Raicu, I., Foster, I.T., Zhao, Y.: Many-task computing for grids and supercomputers. In: Many-Task Computing on Grids and Supercomputers, 2008. MTAGS 2008. Workshop on. pp. 1–11. IEEE (2008)

[41] Rockel, B., Will, A., Hense, A.: The regional climate model COSMO-CLM (CCLM). Meteorologische Zeitschrift 17(4), 347–348 (2008)

[42] Rodrigo, G.P.: Theoretical analysis of a workflow aware scheduling algorithm. Tech. Rep. UMINF 17.06, Umeå University (2017)

[43] Rodrigo, G.P., Östberg, P.O., Elmroth, E.: Priority operators for fairshare scheduling. In: Workshop on Job Scheduling Strategies for Parallel Processing. pp. 70–89. Springer (2014)

[44] Rodrigo, G.P., Elmroth, E., Östberg, P.O., Ramakrishnan, L.: Enabling Workflow Aware Scheduling on HPC systems. Submitted (2017)

[45] Rodrigo, G.P., Elmroth, E., Östberg, P.O., Ramakrishnan, L.: ScSF: A Scheduling Simulation Framework. In: Workshop on Job Scheduling Strategies for Parallel Processing. Accepted, Springer (2017)

[46] Rodrigo, G.P., Östberg, P.O., Elmroth, E., Antypas, K., Gerber, R., Ramakrishnan, L.: Towards Understanding HPC Users and Systems: A NERSC Case Study. Submitted (2017)

[47] Rodrigo, G.P., Ramakrishnan, L., Östberg, P.O., Elmroth, E.: A2L2: an Application Aware flexible HPC scheduling model for low latency allocation. In: The 8th International Workshop on Virtualization Technologies in Distributed Computing (VTDC) (2015)

[48] Schwiegelshohn, U.: How to design a job scheduling algorithm. In: Workshop on Job Scheduling Strategies for Parallel Processing. pp. 147–167. Springer (2014)

[49] Snir, M., Otto, S., Walker, D., Dongarra, J., Huss-Lederman, S.: MPI: The complete reference. MIT Press Cambridge, MA, USA (1995)

[50] Srinivasan, S., Kettimuthu, R., Subramani, V., Sadayappan, P.: Characterization of backfilling strategies for parallel job scheduling. In: Parallel Processing Workshops, 2002. Proceedings. International Conference on. pp. 514–519. IEEE (2002)

[51] Taylor, I.J., Deelman, E., Gannon, D.B., Shields, M.: Workflows for e-Science: scientific workflows for grids. Springer Publishing Company, Incorporated (2014)
[52] Team, R.C.: R language definition. Vienna, Austria: R foundation for statistical computing (2000)
[53] Vetter, J.S., Mueller, F.: Communication characteristics of large-scale scientific applications for contemporary cluster architectures. Journal of Parallel and Distributed Computing 63(9), 853–865 (2003)
[54] Yoo, A., Jette, M., Grondona, M.: SLURM: Simple Linux Utility for Resource Management. In: Feitelson, D., Rudolph, L., Schwiegelshohn, U. (eds.) Job Scheduling Strategies for Parallel Processing, Lecture Notes in Computer Science, vol. 2862, pp. 44–60. Springer Berlin / Heidelberg (2003)
[55] Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. HotCloud 10 (2010)


Paper I

Towards Understanding HPC Users and Systems: A NERSC Case Study
Gonzalo P. Rodrigo, Per-Olov Östberg, Erik Elmroth, Katie Antypas, Richard Gerber, and Lavanya Ramakrishnan

Submitted 2017
Extends two previously published papers by the same authors:
1) HPC System Lifetime Story: Workload Characterization and Evolutionary Analyses on NERSC Systems, published at the 24th International ACM Symposium on High-Performance Distributed Computing (HPDC), 2015
2) Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study, published at the 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2016

Towards Understanding HPC Users and Systems: A NERSC Case Study

Gonzalo P. Rodrigo a,1, P-O Östberg a, Erik Elmroth a, Katie Antypas b, Richard Gerber b, Lavanya Ramakrishnan b

a Dept. Computing Science, Umeå University, SE-901 87 Umeå, Sweden
b Lawrence Berkeley National Lab, Berkeley, CA 94720, USA

Abstract
The high performance computing (HPC) scheduling landscape is changing. Previously dominated by tightly coupled MPI jobs, HPC workloads are increasingly including high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both the application and job level, posing new challenges to classical HPC schedulers. There is a need to understand current HPC workloads and their evolution in order to perform informed scheduling research and enable efficient scheduling in future HPC systems. In this paper, we present a methodology to characterize workloads and assess their heterogeneity, both for a particular time period and as they evolve over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance that includes detailed information of a year of workload (2014) and evolution through the systems' lifetime. Among the results, we highlight the observation of discontinuities in the jobs' wait time for priority groups with high job diversity. Finally, we conclude by summarizing our analysis to establish a reference and inform future scheduling research.
Keywords: workload analysis, supercomputer, HPC, scheduling, NERSC, heterogeneity, k-means

1. Introduction investigated. Such research must be informed by a charac- terization of the state, with a focus on diversity, of current High performance computing (HPC) supports scientific workloads in HPC centers and their evolution. However, research by providing capacity to run large simulations existing work on workload modeling [1] characterizes sys- or solve large mathematical problems. Such applications tems that are too old, too small, or not representative of largely rely on the tightly coupled MPI model, which, as the top HPC systems. Also, previous work did not focus on a consequence, has dominated HPC workloads. However, the workload diversity, a new trait present in recent work- the workload configuration is changing as HPC systems’ loads. Thus, there is a need to investigate the workloads in use evolves. For instance, some scientific fields like biology current HPC centers to understand users and applications or astrophysics increasing rely on analysis of large datasets. requirements and project them in the future. Also, as compute capacity keeps growing, simulations pro- In this work, we present a methodology to character- duce larger datasets that require analysis. Finally, semi- ize HPC workloads in detail. It includes classical work- conductor advances enable real experiments to produce load analysis methods such as value distribution analysis higher resolution data that requires processing. Pushed by on job variables (e.g., degree of parallelism or runtime), these use-cases, workloads are becoming more diverse, in- system utilization estimation, or overall wait time analy- creasing the importance of high-throughput, data-intensive, sis. However, it includes an innovative method to analyze and stream-processing applications. These applications job geometry (allocated resources and runtime) diversity. differ in performance models and target objectives from This method employs k-means clustering to identify dom- the classical parallel tightly coupled. As a consequence, inating groups of jobs in the workload according to their current HPC schedulers might not produce optimal de- geometry (runtime and allocated CPU cores). Under this cisions since they support diverse workloads which confi- analysis, workloads with more job geometry clusters are gurations differ from what batch schedulers were designed considered more diverse, and vice versa. Also, job groups to support (uniform and MPI dominated). can be mapped on the waiting queues (priority categories) Supporting the new workload landscape and its future to analyze jobs diversity within each queue. Consequently, evolution requires new scheduling models that need to be an analysis of the correlation of queue diversity and wait Email addresses: [email protected] (Gonzalo P. Rodrigo ), times for jobs of different sizes is performed. This analy- [email protected] (P-O Ostberg),¨ [email protected] (Erik Elmroth), sis verifies that jobs’ wait time match the expected values [email protected] (Katie Antypas), [email protected] (Richard according to priority and system configuration (i.e., larger Gerber), [email protected] (Lavanya Ramakrishnan) jobs should wait longer, higher priority jobs should wait 1Work performed in part at the Lawrence Berkeley National Lab.

Preprint submitted to Journal of Parallel and Distributed Computing 22nd March 2017 35 shorter) or if they deviate due to job heterogeneity. 2. Background We apply this methodology to the workloads of three systems (Hopper, Carver, and Edison) at the National This section describes the challenges in the HPC com- Energy Research Scientific Computing Center (NERSC) munity that motivate this work and presents background [2]. The output of these analyses is a reference of the work- on parallel job scheduling and workload analysis relevant loads of three systems representative of others at the HPC to understand our methodology and results. community: Carver is a terascale IBM high performance cluster built on commodity hardware supported by an In- 2.1. Challenges in HPC scheduling finiband interconnect; Hopper is an early petascale Cray The challenges of resource management in HPC are supercomputer based on AMD processors; and Edison is changing. New application characteristics and technolo- a more modern and energy efficient petascale Cray super- gical shifts are bringing new concepts and requirements to computer based on Intel processors. The results include the scheduling models and system architectures. In this a detailed analysis on the jobs, queues, and system beha- section, we highlight some workloads’ changes that stress vior in each year over their lifetime for the Hopper and the importance of our analysis methods. Carver and in 2014’s for Edison. Yearly data of Hopper Stream applications are becoming more present in HPC and Carver is compared to produce a trend analysis that systems. Scientists conduct experiments that would bene- allows to observe the evolution of their lifetime. These fit from real-time processing of large amounts of data on results establish a first data-point to predict future HPC HPC systems (e.g. X-Ray Micro-diffraction on Advanced workloads to designing future resource management mo- Light Source at LBNL [5]). Real-time processing could dels. potentially be performed by providing resources through Our workload analysis methodology can also be used advance reservations. Advance reservations, however, have to support informed short-term and long-term decisions at a negative impact on the overall utilization, showing the HPC centers. Periodical workload analyses can be aggre- need for real-time scheduling (i.e. low-latency allocation gated in a growing trend analysis that can reveal changes of resources, with no previous reservation as a response in the user behavior and system performance. Also, the to a real-time event). As another step in application evo- boundary geometries (i.e. runtime and degree of paral- lution, scientific experiments in fields like biology, earth lelization) in workload’s job clusters might be used as a sciences, or high energy physics are increasingly relying starting template to define priority groups (queues) to on data analysis to extract useful information from large avoid mixed queues and minimize discontinuities in ex- experimental datasets, or results from large simulations pected job wait time behavior. [6], [7]. These applications increase the importance of Specifically, in this paper: data-intensive computational models in HPC workloads, or the composition of different applications through work- We propose analysis methods to understand and com- flows (e.g., simulation followed by results analysis). 
These • pare the workload diversity: how self similar are jobs changes motivate us to analyze the workloads at super- in the workload and their mapping on the prioritiza- computers to understand their current characteristics. tion queues. The importance of stream and data intensive applica- We define a method to analyze the wait time of jobs • tions point at an increasing diversity in workloads not only depending on their geometry, queue priority, and di- dominated by large tightly coupled parallel jobs. Diversity versity. might affect the performance of the scheduler, which go- We provide a detailed job, queue, performance, and • verns the execution of applications in HPC systems. For diversity characterization of the NERSC workload, example, the impact of the scheduling decisions is different and their evolution over time. The results allow to across applications: e.g. delaying one job belonging to a understand the users, system behavior, and the effect workflow may have a significant impact on its overall run of queue heterogeneity on jobs’ wait time. time, while delaying a stream job that has to be rapidly We present a summary of analysis results and com- • scheduled might render it useless. Also, schedulers are un- pare them with characterizations of other existing aware of the different architecture-related constraints in HPC workloads. applications (e.g. I/O bound performance, loosely coupled The rest of the paper is organized as follows. We jobs, and data locality). However, information about such present background on HPC systems, scheduling, and work- constraints is required for the scheduler to perform op- load analysis in Section 2. A high level description of our timal placement decisions to maximize the applications’ method and the analyzed systems is presented in Section 3. performance. Understanding the impact of the application The details of the methodology and its application to the diversity on the system motivates our workload heteroge- NERSC workloads are described in Sections 4 to 7. Fi- neity analysis. nally, we provide a summary of our results together with conclusions in Section 8. 2.2. Scheduling This work includes and extends previously published HPC schedulers optimize job placement to achieve the work from the same authors [3], [4]. highest system utilization possible with a reasonable turn- 2 36 System Vendor Model Built Nodes Cores/N Cores Memory Network TFlops/s Service Hopper Cray XE6 2010 6,384 24 154,216 212 TB Gemini 1280 Jan’10 Edison Cray XC30 2013 5,576 24 133,824 357 TB Aries 2570 Jan’13 Carver IBM iDataPlex 2010 1,120 8/12/32 9,984 147 TB Infiniband 106.5 Apr’10

Table 1: Edison, Hopper, and Carver characteristics around time according to the job priority. The most com- new methodology to compare the degree of heterogeneity mon base technique in schedulers is FCFS (First-Come, across systems and system states. First-Served) [8]. With FCFS, jobs are selected in or- A previous analysis [12] on the applications run on Ho- der, reserving the associated resources required for a job. pper in 2012 characterizes the importance of the different However, with FCFS, the scheduler has to drain the sys- applications run on the system, with an initial insight on tem in order to schedule a large job, leading to resource the jobs’ geometry and memory requirements. fragmentation that reduces the overall utilization. Thus, backfilling is normally used to move jobs forward to fill re- 3. Methodology source gaps produced by the FCFS. Backfilling provides an ordered search in the waiting queue to map jobs to empty In this section we describe the three systems analyzed resource windows even if they are not at the head of the (their characteristics, workload, scheduling model, and con- queue [9]. figuration), our data source (size, time span, format), ana- The quality of the results of the backfilling algorithm lysis framework (motivation for analyzed variables), and depends on the user’s wall clock time estimation [8]. If trend analysis methodology. a job wall clock time is overestimated, the scheduler will assign an unnecessary large resource window, reducing the 3.1. System descriptions opportunities to schedule a job through backfilling. On the In this work, the results of characterizing the workload contrary, if wall clock time is underestimated (i.e., runs of three HPC systems are presented. In this section, we over its limit), the system will kill the job resulting in describe the characteristics of these systems to enable the lost work. These effects motivate the jobs wall clock time discussion about the applicability of our results to other accuracy (relationship between estimated and actual wall systems. clock time) characterization presented in Section 4.2. Finally, a job’s turnaround time depends on its prior- 3.1.1. System characteristics ity (influencing its progress on the scheduler wait queue in each scheduling pass), geometry (jobs requiring more re- NERSC is a HPC center at Lawrence Berkeley Na- sources are harder to schedule), and requested resource tional Lab, that has the mission to provide computing in- load (how many jobs compete for the same resources). frastructure and tools for scientists performing research of relevance to the DOE (Department of Energy). Our However, job diversity in the queues might affect this re- lationship. In Section 6.2 we present an analysis of the work analyzes the various years of real jobs from three of possible impact of these factors (including job diversity) NERSC’s systems: Carver, Hopper, and Edison. These on the job’s wait time. three systems were selected because their different hard- ware characteristics and origin in the timeline of HPC sys- 2.3. Related work on workload analysis tems evolution, which can be observed in Table 1. Carver is a terascale IBM iDataPlex Linux cluster [13] deployed in Previous work on scientific Grid and HPC workloads April 2010. Its configuration is the closest to commodity characterization is found in the Grid Workloads Archive hardware servers of the three systems and it is supported [10] and the Parallel Workload Archive [1]. The archives by an Infiniband interconnect. 
Hopper and Edison are spe- contain job and performance characteristics (run time, par- cialized Cray supercomputers with custom interconnects allelism, inter-arrival time, wait time, disk space, and [14]. Hopper is a petascale Cray XE system, based on memory), but their analyses overlook the workload he- AMD processors and a Gemini interconnect, and deployed terogeneity. Also, analyzed systems are either at least 10 in 2010. Edison is a newer, more power efficient petascale years old or significantly smaller than the current top HPC Cray XC30, constructed with Intel processors supported systems. Our work extends them by addressing jobs’ he- by a Aries interconnect and deployed in 2014. Thus, these terogeneity and performing analyses on large, more recent systems allow us to capture the workload characteristics of systems (e.g. Edison was deployed in 2014 and still ranks high-end clusters and supercomputers, belonging to diffe- 60 in the Top 500 list in February 2017). rent HPC system generations and optimized for slightly Job heterogeneity has been observed in industrial work- different applications. load diversity analysis [11], which introduces k-means as On the resource management side, all three systems use a tool for job similarity clustering. This work inspired our the Moab scheduler [15, 13, 16] running atop the Torque workload diversity analysis method for HPC workloads, resource manager [17]. Edison’s workload manager was which we complemented with per queue analysis and a replaced by Slurm at the end of 2015.

3 37 3.1.2. Workload a job runs longer than the user estimated wall clock time, Over 5000 users and 700 distinct projects use NERSC the job is terminated. resources [18, 12]. The workload is composed of applica- Number of cores (Torque): Each queue has a pre- tions from various scientific fields like Fusion, Chemistry, defined minimum and maximum limit of a job’s requested Material Science, Climate Research, Lattice Gauge The- number of cores. Submission of a job allocating a number ory, Accelerator Physics, Astrophysics, Life Sciences, and of cores outside this range will fail. Nuclear Physics. Queue priority (P ) (Moab): Each queue in the system In addition to serving typical MPI workloads, Carver is assigned a priority (represented as an integer where a provides a serial queue [19]. The serial queue allows users higher number represent a higher priority). to submit and execute jobs with a very low degree of paral- Eligible jobs limit per user (E) (Moab): Only the lelism (i.e., one single core). Carver has 80 compute nodes first E jobs of the same user in the same execution queue allocated to serial jobs. Serial queues were added to Ho- are eligible for scheduling. This can affect a job’s wait pper and Edison in late 2014. The serial queue on Edison time. For example, if a user would submit 25 jobs to and Hopper is configured via a super-job run under the the serial queue on Carver, only the first 20 jobs will be special Cluster Compatibility Mode (CCM). There are a considered for scheduling. The last five jobs will only be total of 15 compute nodes each allocated to run serial jobs considered to be scheduled after the first five jobs have on Edison and on Hopper. Serial queues contain jobs run- finished. This can impact wait times for the jobs where ning long time (limited to 48 hours) on a single core. The the last five jobs may have significantly higher wait times purpose of the serial queue is to increase resource utiliza- than the other 20 jobs. tion density. It serves packs jobs on the same node that The execution queues do not exist as separate data do not benefit from parallelism and which performance is structures inside Moab. All jobs are stored in a single either not critical or rarely affected by resource sharing. queue. When a job is passed to Moab, it is inserted in The conclusions of this work are only based on the job its job waiting queue with a job priority of zero. In every related information of the workload. Run time characte- scheduling pass, the job priority is recalculated by adding ristics of applications, execution schema or other variables a value, which depends on the associated execution queue were not considered or analyzed in this study. priority. If a job is in a higher priority queue, the job prior- ity will grow faster and it will be eligible for execution more 3.1.3. Scheduler characteristics quickly. The analysis of the impact of the queues’ charac- The configuration of a system scheduler has an impact teristics on jobs wait time is presented in Section 6.2. on the system performance (i.e., utilization, wait time) and the workload shape: e.g., jobs allocation sizes will 3.1.4. Queues configuration cluster around the allowed values in submission queues. In The analyzed system’s scheduler re-calculates the jobs this subsection, we present the configuration of the analyze priority depending on the queue they are submitted to. systems scheduler to provide context for later analyses. 
Since the configuration of such queues affect the overall First, node sharing is only enabled for nodes executing system behavior, we present their configuration in detail jobs from the serial queue to avoid performance degrada- in this section. tion [20]. In order to keep the same baseline, we consider Table 2 presents the execution queue configuration of cores as the degree of parallelism unit in our analysis. Edison, Hopper, and Carver used in the analysis. It covers In all systems there is a distinction between the queues each queue’s job maximum run time (Wall Clock Time), chosen at submission time (Torque) and the queues that job allowed allocations in numbers of cores (Cores), num- the scheduler use for priority calculation (Moab). Users ber of eligible jobs allowed to be scheduled simultaneously submit jobs to the Torque submission queues. Moab has (E), and the priority of the queue (P). This information its own queue configuration - the execution queues. Torque allows us to understand the reasons for different wait time translates the queue into Moab’s execution queues and behaviors between queues. passes the job to the scheduler. Submission queues can The batch queue policies influence the jobs execution be mapped to a single or multiple execution queues. For order. These policies changed slightly through the studied example, jobs of up to 10 hours of runtime maybe sub- period. To simplify the analysis, we used the settings that mitted to the same submission queue, to be sorted into were most common through the period of study. Also, two execution queues with ranges of [0, 5), [5, 10) runtime some queues were filtered out of this study as they repres- hours. Table 2 shows queue properties that govern the ented too little of the workload, or were related to system scheduling decisions for our three systems. The properties maintenance or tests. are explained below: Edison and Hopper map their queues on a single set of Maximum wall clock time (Torque): Each queue has resources (independent for each system). However, Carver an upper limit for a job’s estimated wall clock time speci- queues are mapped in sets that partly overlap: general fied by the user at submission time. If a job’s estimated set (1080 nodes), matgen set (64 nodes, subset of the gen- wall clock time is longer than this limit, submission fails. If eral set), xlmem set (two nodes with large memory capa- city), and serial set (80 nodes, not overlapping). Different 4 38 Hopper Wall Cores E. P. Edison Wall Cores E. P. Carver Wall Cores E. P. Queues clock Queues clock Queues clock bigmem 24h. 1-8,856 1 0 matgen low Unk 1-256 66 0 ccm int 30m. 1-12,288 2 1 cm int 30m. 1-12,288 2 1 matgen prior Unk 1-256 66 10 ccm queue 96h. 1-12,288 16 1 ccm queue 96h. 1-16,368 16 0 matgen reg Unk 1-256 66 1 debug 30m. 1-12,288 2 1 debug 30m. 1-12,288 2 1 debug 30m. 1-256 1 2 low 48h. 1-16,392 6 -3 low 48h. 1-16,392 6 -3 low 24h. 1-256 3 -2 premium 48h. 1-49,152 1 2 premium 36h. 1-49,152 1 2 xlmem sm 72h. 8 1 0 reg 1hour 1h. Unk. 8 0 reg 1hour 1h. Unk. 16 0 xlmem lg 72h. 32 2 0 reg big 36h. 49,153- 2 1 reg big 36h. 49,153- 2 1 reg big 24h. 257-512 1 0 98,304 98,304 reg long 96h. 1-1,536 4 0 reg long 168h. 1-128 1 0 reg med 36h. 16,369- 4 1 reg med 36h. 16,369- 8 1 reg med 36h. 129-256 2 0 49,152 49,152 reg short 6h. 1-16,368 16 0 reg short 6h. 1-16,368 24 0 reg short 4h. 1-128 4 0 reg small 48h. 1-16,368 16 0 reg small 48h. 1-16,368 24 0 reg small 48h. 1-128 3 0 reg xbig 12h. 98,305- 2 0 reg xbig 12h. 98,305- 2 1 reg xlong 504h. 
1-32 1 0 146,400 131,088 interactive 30m. 1-64 1 2 thruput 168h. 1-48 500 0 killable 48h. 1-16,368 8 0 serial 48h. 1 20 -

Table 2: Hopper, Edison, and Carver queue characteristics. Jobs have to be within certain limits to be accepted in a queue: requested runtime upper limit (wall clock time) and accepted number of cores range (Cores). Eligibility (E.): maximum number of jobs from the same user in the same queue which are considered in job priority recalculation. Priority (P.): queue priority.

queues have access to different sets: matgen queue jobs can only run on matgen resources (but jobs from other queues can use them when they are available). xlmem nodes can only be used by xlmem jobs. This implies that different queues may not present the same ratio of job core-hours requested over resources' core-hours available. How this difference may impact job wait times is studied in Section 6.2.

The serial queue jobs allocate one core per job and are executed on shared nodes (more than one job per node). Also, this queue has exclusive access to the serial resource set, so it does not compete with any other queue. Thus, wait times of the serial queue should not be compared with those of other queues.

3.2. Data Source

All workload analysis is performed on the job summary entries from the systems' Torque logs. The data includes 1 year and 1,357,366 jobs for Edison, 4.5 years and 4,326,870 jobs for Hopper, and 4.5 years and 9,508,054 jobs for Carver. The raw data size is 45 GB, which, after filtering and parsing, is reduced to 6 GB of net data.

3.3. Analysis Framework

The analysis framework is composed of a set of scripts that express the data pipeline used to process the log data. The pipeline is divided into three parts. The first is the data extractor, which retrieves the log files from the NERSC repository, parses them, eliminates invalid entries, and inserts them in a MySQL database. The second component is a Python API to insert, manipulate, and retrieve the data from the MySQL database. The MySQL database is indexed to facilitate queries based on multiple fields. The third component is the analysis toolkit. It implements the logic to retrieve, analyze, and visualize the data for all the analyses. A specific plotting library was developed to support the graph generation. The code consists of 14K Python lines using the scientific libraries SciPy and NumPy combined with the plotting library Matplotlib [21]. All analyses were run on an Intel i7 quad-core desktop computer with 8 GB of RAM. The database is hosted on a department server at Berkeley Lab.

Our analysis focuses on understanding the variables of the workload from the user (i.e., job) and system (i.e., queues and performance) perspectives.

The job perspective includes:

Job size: includes wall clock time, degree of parallelism, and the resulting compute time allocation. These parameters define the system boundaries' requirements and job granularity.

Wall clock time accuracy: represents how accurate the user estimations of the jobs' runtime are. The variable measures the quality of the information used by the scheduler in its job planning.

Inter-arrival time: models the time between the submission of two jobs. It represents the load to be managed by the scheduler and affects the overall wait time. For instance, for the same job sizes, a smaller inter-arrival time represents a larger job load.

Job diversity: measures how different the geometries of the jobs in the workload are. It includes the analysis of dominant job geometries in the workload.

The queues and their configuration represent the mapping of the prioritization policies to the workload job mix. The study includes:

Queue significance: represents the impact of each queue on the overall system. It allows us to understand how the properties of each queue contribute to the overall system behavior according to their importance.

Queue job diversity: analyzes how self-similar the jobs within each queue are in terms of geometry. This analysis is relevant because execution queues allow the system scheduler to prioritize jobs depending on their geometry. However, the existing queues might not represent the most significant types of jobs (by geometry) present in the job mix. This analysis is focused on two objectives: understanding the diversity of the jobs across the entire workload, and the similarity of jobs contained in the same queues.

The performance perspective covers the system utilization and job wait time. Additionally, the job wait time is studied from system, queue, and job geometry points of view. This study allows us to understand how the effectiveness of the priorities for different job geometries in different queues might be affected by job diversity in the queues.

3.4. Trend analysis

In this work, the workload of Carver and Hopper was analyzed in detail for each year of their lifetime. To simplify interpretation, the detailed analysis is only presented for 2014. However, the relevant results of all years were aggregated in the trend analysis, presenting the evolution of the Carver and Hopper workloads through their lifetime. Edison's trend analysis was not performed because not enough workload data was available.

In this work, we focus on understanding the evolution of the system's workload, overall performance, and user behavior. As explained in Section 3.3, the workload trend covers the evolution of the job geometry (wall clock time and degree of parallelism). Overall performance is analyzed through the evolution of job wait time. User behavior is analyzed by observing the evolution of the wall clock time accuracy.

Finally, since the trend is obtained by analyzing the workload in sequential time periods and aggregating the results over a time line, an adequate period had to be chosen. The size of the workload periods is calculated by detecting repeating user patterns in the workloads through Fourier transform analysis on the number of tasks submitted per hour [22]. Results of this analysis are described in Section 7.1; the dominant detected cycle was one year.

4. Job Characterization

In this section we present the workload analysis of the jobs of Edison, Hopper, and Carver in 2014. This analysis is performed with special attention to job geometry, user submission patterns, and job diversity on all three systems.

4.1. Job geometry

Job geometry analysis allows us to observe the patterns in the jobs' resource allocation and to analyze the job mix that the scheduler manages. All variables are analyzed by calculating their value distribution and the consequent Cumulative Distribution Function (CDF). This allows us to understand if jobs are dominantly small or large, if their sizes concentrate around certain values, or if the job mix includes enough jobs of smaller sizes to allow high system utilization. We proceed to present the results for each of the job variables.

Job Distribution                 Edison   Hopper   Carver
%Jobs Wall Clock < 2 h.           88%      86%      87%
%Jobs Width < 240 cores           69%      75%      99%
%Jobs Width <= 1 Node             39%      37%      92%
%Jobs Alloc. <= 1 core-h.         19%      26%      77%
%Jobs Alloc. >= ~1K core-h.       7%       8%       8%

Table 3: Detailed job characteristics distribution analysis.

Job wall clock time. Figure 1a shows the Cumulative Distribution Function (CDF) of the job wall clock time for the three systems in 2014. In the case of Hopper, we observe jobs running up to 160 hours, with a high concentration running under two hours. Table 3 shows an overview of the job characteristics' distribution analysis. It shows that 86-88% of the jobs on all three systems run for less than two hours. For Carver, a large number of the jobs have a wall clock time well under one hour; in fact, 60% of the jobs run for less than 13 minutes.

Additionally, all three CDFs present steep slopes around 30 minutes and 6, 12, 24, and 36 hours (better observed in a non-log scale version of the graphs), numbers that are similar to the queues' configured wall clock time limits. These limits are similar across the three machines (more details in Table 2).

Cores per job. Figure 1b presents the distribution of cores allocated to jobs on the three systems. It represents the number of cores requested and allocated to a certain job, and does not include any information on the actual usage of the cores. On Hopper and Edison, requests for a single job range from 24 (1 node) to over 100,000 cores (i.e., close to the full capacity of the systems). A small number of cores are requested for Hopper's jobs: 75% under 240 cores (10 nodes), and 37% of all jobs run on a single node (Table 3). Edison presents a similar pattern with 69% of the jobs running on less than 240 cores and 39% on a single node. Carver shows a different trend from Edison and Hopper. On Carver, many jobs run on a small number of cores: 99% run on 240 cores or less, and 92% of all jobs run on a single node.

Allocated core-hours per job. Figure 1c shows the core-hours allocated to the jobs in the system. The figure shows that Hopper and Edison core-hour allocations are similar. Jobs on Hopper and Edison are significantly larger than those on Carver: 99% of Carver jobs individually consume less than one core-hour, in comparison with 42% on Edison and 46% on Hopper. On the other extreme, we can observe that almost 10% of Edison and Hopper jobs individually consume more than 1,000 core-hours.

(a) Wall clock time (h.) (b) Cores allocated (c) Core-hours

Figure 1: Job geometry characterization on Hopper, Edison, and Carver. a) A significant percentage (Edison: 87%, Hopper: 82%, Carver: 87%) of the jobs run for 2 h or less. b) 69% of Edison, 75% of Hopper, and 99% of Carver jobs allocate 240 cores or less. c) Carver's jobs allocate significantly fewer core-hours.
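The CDF analyses of Figure 1 can be reproduced with the same NumPy/Matplotlib stack the analysis toolkit is built on. The following is a minimal, illustrative sketch only (not the toolkit's actual code); it assumes job records parsed from the Torque logs are available as dictionaries with hypothetical 'start' and 'end' epoch-timestamp fields.

```python
import numpy as np
import matplotlib.pyplot as plt

def wall_clock_cdf(jobs, label):
    """Plot the empirical CDF of job wall clock time (hours).

    `jobs` is an iterable of dicts with 'start' and 'end' epoch
    timestamps (hypothetical field names for parsed Torque records).
    """
    hours = np.sort([(j["end"] - j["start"]) / 3600.0 for j in jobs])
    # Empirical CDF: fraction of jobs with wall clock time <= x.
    cdf = np.arange(1, len(hours) + 1) / float(len(hours))
    plt.step(hours, cdf, where="post", label=label)

# Tiny synthetic example: two one-hour jobs and one ten-hour job.
wall_clock_cdf([{"start": 0, "end": 3600},
                {"start": 0, "end": 3600},
                {"start": 0, "end": 36000}], label="example")
plt.xscale("log")                 # Figure 1 uses a logarithmic time axis
plt.xlabel("Wall clock time (h.)")
plt.ylabel("Fraction of jobs")
plt.legend()
plt.show()
```

The same helper, fed with cores or core-hours instead of hours, would produce the other two panels.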

(a) Inter-arrival time (s.) (b) Wall clock time accuracy (c) Wait time (h.)

Figure 2: Job characterization on Hopper, Edison, and Carver. a) Carver receives significantly more job submissions per time unit than the other systems: 40% of jobs are followed by another job within one second. b) 11% of Edison and 10% of Hopper jobs run over the requested time. Carver: 92% of the jobs run under 50% of the requested wall clock time. c) Jobs that wait less than 3 h to be executed: Edison (67%), Hopper (60%), Carver (79%).

4.2. Job's characteristics

In this section we study other variables of the jobs that depend on other external agents. This includes the user's submission pattern (inter-arrival time and wall clock time accuracy) and the job's overall wait time (which depends on the scheduler configuration). A more detailed analysis of the jobs' wait time is presented in Section 6.2.

Inter-arrival time. Figure 2a represents the CDF of the inter-arrival times on Edison, Hopper, and Carver. The inter-arrival time measures the time elapsed between the arrival of consecutive jobs in a system, which can affect the granularity of scheduling and help to understand the load on the schedulers.

Edison and Hopper have very similar distributions: 90% of the jobs have inter-arrival times under two minutes. The remaining 10% are distributed in the 1500-2000 second (25-33 minute) range. On the other hand, more than 95% of Carver's inter-arrival times are under 25 seconds. As the CDFs of the three systems are compared, Carver's inter-arrival times are shorter than those for Edison and Hopper. Thus, when compared to Edison and Hopper, we observe that more jobs are submitted to Carver's queues during the studied time period.

Wall clock time accuracy. For each job we study the difference between the actual and the requested wall clock times. The accuracy is defined as W/W_r, where W is the actual wall clock time of a job and W_r is the wall clock time that the user requested for the job. The accuracy will be close to one when the estimation is good, and closer to zero when the job running time is overestimated. If the job runs over the requested time, the job will get preempted. However, this is caught during the next scheduling pass. Thus, we see values over one when jobs run over the estimated time.

As discussed in Section 2.2, the wall clock time accuracy affects the backfilling decision quality. Figure 2b presents the distribution of the wall clock time accuracy values for the three systems studied. The initial steep slope of the Carver CDF shows that it executes many jobs that use much less than the requested wall clock time. Edison and Hopper have a more linear CDF for values between zero and close to one. However, in all systems we observe jobs with an accuracy slightly above one (as they exceed their allocated run time and are terminated). We see that the percentage of jobs that run out of wall clock time is higher on Edison (11%) and Hopper (10%) than on Carver (2%). Approximately 60% of Edison and 66% of Hopper jobs run 50% or less of the requested time. On Carver, around 93% of the jobs run 50% or less of the requested time.
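A minimal sketch of how the two submission-pattern variables above could be derived from parsed job records; the 'submit', 'start', 'end', and 'wall_clock_limit' field names are hypothetical and not taken from the actual toolkit.

```python
import numpy as np

def job_characteristics(jobs):
    """Derive inter-arrival times and wall clock accuracy from job records.

    `jobs` is a list of dicts with 'submit', 'start', 'end' epoch
    timestamps and the requested 'wall_clock_limit' in seconds
    (hypothetical field names for parsed Torque job summaries).
    """
    submits = np.sort(np.array([j["submit"] for j in jobs], dtype=float))
    inter_arrival = np.diff(submits)            # seconds between consecutive submissions
    # Accuracy W / W_r: values slightly above 1 mean the job hit its limit.
    accuracy = np.array([(j["end"] - j["start"]) / j["wall_clock_limit"]
                         for j in jobs])
    return inter_arrival, accuracy
```

Both arrays can then be passed to the CDF helper shown earlier to reproduce Figures 2a and 2b.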

Wait time. Figure 2c presents the distribution of job wait times under 24 hours (jobs with longer wait times are not included in this graph). The figure shows that Hopper has more jobs with longer wait times, followed by Edison and Carver. Considering all the jobs in the system, we see that 61% of Hopper jobs, 67% of Edison jobs, and 80% of Carver jobs have a wait time of less than three hours. Further analysis of the wait time values is presented in Section 6.2.

4.3. Job diversity

The job diversity analysis is based on a machine learning technique and extends a previous work about job grouping within clusters [11]. We construct job geometry tuples that contain job wall clock time and number of cores allocated. Before performing the analysis, the tuples are normalized (whitened [23]) to reduce the effect of the value magnitudes on the clustering process. In the analysis, the target is to find the smallest number of k-means clusters [24] among the job geometry tuples where the variation coefficient (standard deviation divided by the mean) is at most 1.1. If jobs are similar, the method will group them in a small number of clusters. Jobs in more diverse workloads are grouped in larger numbers of clusters.

When invoked, k-means produces k clusters from an input dataset and a list of k centroids used as search starting points. However, k-means produces k clusters, not the minimum possible, and does not guarantee that the clusters are not disperse. As a consequence, k-means must be invoked with different k sizes and different start centroids to obtain a minimum possible number of non-dispersed clusters. Figure 3 illustrates the algorithm that searches for the minimum clusters in the job geometries by trying different k values and random centroid points.

Figure 3: Minimum number of k-means cluster search algorithm for a list of job geometries.

 1: (repetitions, trialsSmallerK) <- (10, 10)
 2: maxClusterVar <- 1.1
 3: minKFound <- -1
 4: (finalClust, finalCent) <- (None, None)
 5: normJobs <- whiten(allJobs)
 6: for i <- 1, repetitions do
 7:   seed = genRandomSeed()
 8:   k <- 2
 9:   cent <- genRandomCentroids(k, seed)
10:   for j <- 1, trialsSmallerK do
11:     if k >= minKFound and minKFound != -1 then
12:       break
13:     end if
14:     clust, cent <- kMeans(normJobs, cent)
15:     cvList <- calcCVForClusters(clust, cent)
16:     newCentroids <- []
17:     for l <- 0, len(cvList) do
18:       if cvList[l] <= maxClusterVar then
19:         newCentroids.append(cent[l])
20:       else
21:         newCentroids.append(splitCentInTwo(cent[l]))
22:       end if
23:     end for
24:     if len(cent) = len(newCentroids) then
25:       (finalClust, finalCent) <- (clust, cent)
26:       if minKFound = len(finalCent) then
27:         break
28:       end if
29:       if minKFound = -1 then
30:         minKFound <- len(finalCent)
31:       else
32:         minKFound <- min(k, len(finalCent))
33:       end if
34:     end if
35:     cent <- newCentroids
36:     k <- len(newCentroids)
37:   end for
38: end for
39: return finalClust, finalCent

It starts by whitening the input job tuples (line 5) to reduce the effect of the value sizes on the clustering. Then, the process to find the minimum number of k-means clusters is repeated 10 times with different starting centroids. In each trial (lines 7-37), the process starts by producing an initial random centroid set of two points. Then, starting at k = 2, the algorithm tries to find k non-dispersed clusters, increasing the value of k in each unsuccessful trial (lines 10-37). For each k, k clusters are produced (line 14) and considered disperse if their variation coefficient is larger than 1.1 (lines 15-23). Each disperse cluster is divided, i.e., two centroids close to the cluster centroid are produced and added to the obtained cluster lists, increasing the resulting k size by one (line 21). In summary, the obtained clusters are reviewed; if they are not too dispersed, they are left as they are, otherwise they are split, increasing k by one per cluster split. Then, the search is repeated with the new centroids as starting points. The process is repeated until non-dispersed clusters are found (lines 26, 27) or the found k is larger than the minimum size of previously observed non-dispersed clusters (lines 11, 12). The algorithm does not guarantee that the obtained number of clusters is the minimum possible; however, it produces a local minimum.
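A simplified, runnable sketch of this cluster search using SciPy's k-means routines is shown below. Unlike the algorithm of Figure 3, which repeatedly splits dispersed clusters starting from random centroids, this version simply increases k until no cluster is dispersed; the dispersion test (coefficient of variation of the members' distances to their centroid) is one plausible reading of the criterion, not necessarily the exact one used in the paper.

```python
import numpy as np
from scipy.cluster.vq import whiten, kmeans2

def min_clusters(job_geometries, max_cv=1.1, k_max=40):
    """Search for a small k whose k-means clusters are not dispersed.

    job_geometries: array of shape (n_jobs, 2) with
    (wall clock time, allocated cores) per job.
    """
    norm = whiten(np.asarray(job_geometries, dtype=float))  # line 5: per-dimension normalization
    for k in range(2, k_max + 1):
        centroids, labels = kmeans2(norm, k, minit="++")
        dispersed = False
        for c in range(k):
            members = norm[labels == c]
            if len(members) < 2:
                continue
            dist = np.linalg.norm(members - centroids[c], axis=1)
            # Cluster is dispersed if std/mean of member distances exceeds the threshold.
            if dist.mean() > 0 and dist.std() / dist.mean() > max_cv:
                dispersed = True
                break
        if not dispersed:
            return k, centroids, labels
    return None
```

Because k-means is randomized, a production version would repeat the search (as the paper's algorithm does) and keep the smallest k found.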
Figure 4 shows the results of the clustering search method for Edison's jobs in 2014. This graph is a scatter plot of the jobs where each job is represented by a colored dot. The x-coordinate corresponds to the job's wall clock time and the y-coordinate to the number of cores allocated to the job. Note that the y-axis is in logarithmic scale. The execution queue of the job is identified by the color of the dot. The clusters' centroids are represented by black dots, while the color boxes are the boundary jobs observed in each cluster (minimum and maximum wall clock time and number of cores).

Table 5 shows the results of the clustering. Eight clusters were found for Carver, 11 for Edison, and 12 for Hopper. This implies that Carver has a more homogeneous job set compared to Edison and Hopper. As noted in the previous section, 70% of Carver jobs come from the serial queue, defined for single core jobs with long run times.

Figure 4: Result of the job clustering method for Edison 2014 with 8 clusters. Jobs are mapped on queues and clusters: each dot is a job and the dot color indicates the queue. Black dots are cluster centroids and color boxes are the surrounding jobs belonging to the same cluster. Clusters are sets of jobs with similar geometry.

Figure 5: Normalized view of the number of jobs and core-hours per queue in Edison, Hopper, and Carver.

5. Queue Characterization

In the analyzed systems, job priorities are calculated depending on the queues that they are assigned to; also, different job geometries were limited to be submitted to certain queues. As a consequence, queue configuration has an effect on how users submit jobs and how job priorities are distributed. In this section, we analyze the relationship between queue configuration, job geometry, and submission behavior.

5.1. Queue significance

The first step in analyzing the queue configuration and behavior is to understand the relevance of each queue in the system. For this, Figure 5 shows the normalized view of the number of jobs and core-hours per execution queue on Edison, Hopper, and Carver. The interpretation of this data is based on the configuration presented in Table 2.

On Hopper, the debug queue contains approximately 20% of the total jobs. The debug queue is typically used for testing and has a wall clock time limit of 30 minutes. The queue reg small contains 30% of the jobs and approximately 56% of the core-hours. The queue reg med presents a lower job count (< 5%) and core-hours (~15%). The thruput queue admits 168-hour jobs and multiple submissions from the same user (600). However, its contribution to the total system utilization is less than 10% of jobs and 2% of core-hours.

Edison is very similar to Hopper, with the reg small queue having ~40% of the jobs, and ~40% of the core-hours contributed by the debug and reg 1h queues.

Carver shows a different pattern. The serial queue contains more than 70% of the system's jobs, which consumed less than 10% of the core-hours. This percentage matches with the fact that this queue has exclusive access to 80 out of 1,180 total nodes (~7%) that are used for computation. The reg small queue (~40%) and matgen reg (~15%) account for the majority of the remaining core-hour usage.
5.2. Queue Diversity

The minimum number of clusters and their boundaries provide an overall view of the job diversity, and a possible recommendation for queue configuration. However, it gives no information about the job mix of a queue, i.e., how similar the jobs in each queue are. Thus, we define the queue homogeneity index as a new metric to compare the diversity of the jobs in a queue. After the clustering process, it is possible to identify each job's original cluster. Since jobs from different clusters are significantly different, a queue whose jobs largely map onto a single cluster will contain more homogeneous jobs than a queue whose jobs map to many clusters. The queue homogeneity index is the percentage of queue jobs that are mapped to the queue's dominant cluster, i.e., the cluster to which most jobs of the queue belong. For example, a queue maps jobs to three clusters with the following shares: 20%, 30%, and 50%. The cluster that contributes the most has 50% of the jobs and, as a consequence, the queue homogeneity index is 50%. A higher index value indicates that many jobs are mapped to the same cluster and are thus geometrically similar to each other, while a lower index value means that the queue's jobs are more heterogeneous.

Table 4 presents the queue homogeneity indices for all the queues in Edison, Hopper, and Carver. This data allows us to identify which queues contain more diverse job mixes and are candidates to be reviewed and, if needed, divided into smaller, better defined queues.

Edison (11 c.)        Hopper (12 c.)        Carver (8 c.)
Queue        /1       Queue        /1       Queue         /1
ccm queue    0.46     bigmem       0.31     debug         0.32
debug        0.63     ccm queue    0.45     interactive   0.35
killable     0.40     debug        0.70     low           0.99
low          0.53     interactive  0.85     matgen low    0.62
premium      0.27     killable     0.45     matgen prior  0.66
reg 1hour    0.71     low          0.59     matgen reg    0.68
reg big      0.96     premium      0.40     reg big       0.70
reg med      0.98     reg 1hour    0.69     reg long      0.31
reg short    0.42     reg big      0.66     reg med       0.82
reg small    0.42     reg long     0.50     reg short     0.62
reg xbig     1.00     reg med      0.86     reg small     0.26
                      reg short    0.39     reg xlong     0.55
                      reg small    0.36     serial        0.87
                      reg xbig     1.00     usplanck      0.54
                      thruput      0.77     xlmem lg      0.24
                                            xlmem sm      0.58

Table 4: Queue homogeneity indices for each machine: share of the number of jobs belonging to its dominant cluster. In light green, queues with indices in the (0.50, 0.75] interval; in darker green, queues with indices in (0.75, 1.00].

In Edison, reg big, reg med, and reg xbig are the more homogeneous queues, while premium is the more diverse (0.27). Among the intermediate values (close to 0.50), reg small appears to be diverse (0.42) and, since its impact on the system is large (40% of jobs and core-hours), it is a good candidate for further study for division. Among Hopper's queues, reg small is quite diverse (0.36) and significant in the system (50% of jobs and core-hours), becoming a good candidate for revision. In Carver, reg small is diverse (0.26) and significant (40% of core-hours) and should also be reviewed.

Systems have different queue configurations and, in order to compare different systems with different workloads, a global metric is established. It aggregates the queue homogeneity index of all the queues by taking into account the queues' importance relative to the entire workload in the system.

Figure 5 presents two criteria for the impact of a queue on the system: number of jobs contained and amount of core-hours contributed. The former is useful to understand scheduler behavior, while the latter represents the fraction of the machine time the queue occupies. Missing one aspect of the system may give an incomplete picture, so we define two more metrics:

Job homogeneity index is calculated as a linear combination of the queues' homogeneity indices. The coefficients are the shares of jobs contained by the corresponding queues. For example, Queue1 has a homogeneity index of 0.6 and contributes 30% of the system's jobs. Queue2 has a homogeneity index of 0.4 and contributes 70% of the jobs. The Job homogeneity index is thus calculated as: 0.6 * 0.3 + 0.4 * 0.7 = 0.46.

Time homogeneity index is calculated as a linear combination of the queues' homogeneity indices, but in the time dimension. The coefficients are the shares of core-hours contributed to the system by the corresponding queues. In the example of Queue1 and Queue2: Queue1 contributes 30% of the system's jobs and represents 80% of the core-hours of the system. Queue2 contributes 70% of the jobs that represent 20% of the system's core-hours. The Time homogeneity index is thus calculated as: 0.6 * 0.8 + 0.4 * 0.2 = 0.56.

We understand that larger indices imply that jobs in queues are more homogeneous, and policies are able to do more precise job prioritization for certain types of jobs. A lower index implies the existence of queues that contain a diverse job mix, which probably should be divided into more narrowly defined queues. The queue homogeneity index defined above can be used to determine what and how queues should be modified. These indices are not absolute measurements, and can only be used to compare similar systems (like the ones studied) or the same system over time.

                          Edison   Hopper   Carver
Clusters                  11       12       8
Job homogeneity idx.      0.51     0.57     0.82
Time homogeneity idx.     0.64     0.49     0.51

Table 5: Queue analysis results: for each machine, minimum number of k-means clusters discovered in the jobs and homogeneity indices.

Table 5 shows the calculated homogeneity indices for the three NERSC systems. The job homogeneity index shows that Carver is the system in which queues contain a reasonably uniform job mix. The time homogeneity index produces a different ordering: Edison, Carver, Hopper. This implies that the uniform queues on Carver have many jobs, but they are very small in terms of the total number of core-hours. On Hopper the uniform queues contain fewer jobs, but they contribute a significant part of the system's core-hours.

This analysis can be used to improve the job sorting across queues. From the point of view of core-hours, Edison has the best sorted queues. The queue reg small is the largest contributor in terms of core-hours and jobs; however, only 42% of the queue jobs are mapped to the same cluster, implying a high diversity within the queue. The reg small queue allows jobs up to 48 hours long and between 1 and 16,368 cores in size. Dividing this queue into subqueues with sub-ranges of wall clock times or cores could have a positive impact on the time homogeneity index of Edison, and support improved job prioritization on the system.
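The queue homogeneity index and the two aggregate indices reduce to a few lines once each job is labeled with its cluster. A minimal sketch, reproducing the Queue1/Queue2 worked example from the text:

```python
from collections import Counter

def queue_homogeneity(cluster_ids):
    """Share of a queue's jobs that fall in its dominant cluster."""
    counts = Counter(cluster_ids)
    return max(counts.values()) / float(sum(counts.values()))

def aggregate_index(queue_indices, queue_weights):
    """Linear combination of per-queue homogeneity indices.

    queue_weights are the queues' shares of jobs (job homogeneity index)
    or of core-hours (time homogeneity index); they should sum to 1.
    """
    return sum(queue_indices[q] * queue_weights[q] for q in queue_indices)

# Worked example from the text: Queue1 (index 0.6) and Queue2 (index 0.4).
idx = {"Queue1": 0.6, "Queue2": 0.4}
print(round(aggregate_index(idx, {"Queue1": 0.3, "Queue2": 0.7}), 2))  # job shares       -> 0.46
print(round(aggregate_index(idx, {"Queue1": 0.8, "Queue2": 0.2}), 2))  # core-hour shares -> 0.56
```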

10 44 Figure 6: Job wait time median per queue depending on the requested cores. Aggregated on top: priority per queue, median of the wall clock time of queue jobs, jobs per queue (normalized), core-hours per queue (normalized).

6. Performance Characterization

In this section we study the performance of the systems from the perspectives of both resource providers (utilization) and users (wait time). This study includes a more detailed analysis of the jobs' wait time with a focus on its relationship with queue organization and job diversity.

6.1. Utilization

Facilities such as NERSC report the utilization of their resources periodically. The 2014 NERSC report calculates the Total System Utilization (TSU) of Hopper and Edison as:

TSU = (core-hours used in period) / (core-hours available in period)    (1)

The available core-hours are calculated by subtracting maintenance time (full and partial) and other temporary resource reductions. Down times are tracked manually and the TSU is calculated for reporting reasons. The available logs for this analysis do not contain system availability information. Thus, we calculate the theoretical Total System Utilization (tTSU) as:

tTSU = (core-hours used in period) / (time period * maximum system capacity)    (2)

By definition, tTSU will be less than or equal to TSU. We present the reported TSU and the tTSU in Table 6. Carver's TSU was not available for 2014.

System   Edison   Hopper   Carver
TSU      0.91     0.90     N/A
tTSU     0.87     0.80     0.88

Table 6: Total System Utilization (TSU) and theoretical Total System Utilization (tTSU). tTSU is under TSU since it does not take into account system maintenance down times.

6.2. Job wait time

The previous analysis (Section 4.2) presents a coarse-grained analysis of the jobs' wait time. To understand the performance of the system, it is important to also understand wait times relative to job geometry and queue priorities. In this section, we detail our wait time distribution analysis, i.e., we calculate the median wait time of jobs grouped by queue (and thus corresponding priority) and job geometry. User wall clock time estimations on these systems are inaccurate (Section 4.2); as a consequence, we use the number of cores requested as the job geometry metric for this analysis.

We present job wait time for the three systems as heat maps in Figure 6. For each system, queues appear on the x-axis, ordered by priority. The y-axis represents a non-linear categorization of the possible numbers of cores allocated to jobs. Each square contains the wait time median (in seconds, minutes, hours, or days) for a particular queue that was allocated the cores specified on the y-axis. White regions indicate that there were no jobs with the specific queue and cores combination. A darker tone or red represents longer wait times (24 hours or longer), and a lighter tone or yellow represents shorter wait times. The priority of each queue is specified above the heat map and below the bar graph. The bar chart on top shows the number of jobs and core-hours contributed by each queue to each system.

In all systems, we observe that the graph is darker at the top left corner and lighter at the bottom right. Thus, jobs in the same queue with a larger degree of parallelism have longer wait times. The effect follows from the fact that wider jobs are harder to fit during scheduling. Also, jobs with a similar number of cores have shorter wait times in queues with higher priorities. While these capture the general trend, we look more closely at the anomalies in the heat map.

On Hopper, the low queue has the lowest priority (-3) and longer wait times than most of the other queues. The queues with priority zero, reg 1h, reg xbig, and reg short, present the expected behavior: longer wait times than the ones with priority one and shorter than the one with -3. The bigmem, reg long, reg small, and thruput queues present significantly higher wait times. The bigmem queue is the gateway for the nodes with more memory (384 large memory nodes vs. 6,000 regular nodes). The jobs in this queue may be experiencing higher wait times because they

11 45 compete to use a smaller resource set. The reg long and thruput queues contain longer jobs than the rest with the same priority (also much longer than the ones in low), and thus might not be able to take advantage of backfilling. Fi- nally, the reg small long wait times may be related to its large contribution of jobs and core-hours. The queues with higher priorities than zero show shorter wait times. Wait time for jobs allocating different number of cores but in the same queue presented unexpected values (i.e. longer wait times for larger number of cores allocated). In some cases it could be related to the jobs’ wall clock time, as is the case for the big mem queue. Its wait time for the 64 to 511 cores range is two days, while between 512 to 1023 cores is seven hours. We analyzed the median of wall clock times in those ranges, obtaining one day and three hours respectively. The jobs allocating 64 to 511 cores Figure 7: Fourier decomposition of the series of the jobs submitted were probably harder to backfill due to their longer wall per hour for Edison, Hopper, and Carver to detect dominant sub- clock times, increasing the wait time. mission cycle. Note the logarithmic scale for the frequencies. Most Edison exhibits a similar behavior to Hopper. The powerful frequencies highlighted with a black arrow and its corres- queues killable and ccm queue have longer wait times, be- ponding period. cause their job’s run-time is longer than the jobs in other queues with similar or lower priority. The reg small queue 7. Trend Analysis has the maximum jobs and core-hours used on Edison, res- ulting in possibly longer wait times of its jobs. The jobs After analyzing the system behavior during a particu- with higher priorities behave as expected, showing shorter lar period of time, this section presents a similar analysis wait times. but performed over the lifetime (Jan 2010 to June 2014) Carver displays different trends compared to Hopper for Hopper and Carver. The purpose is to observe if there and Edison. The serial queue has exclusive resources and a is any clear evolution pattern. median wait time of five minutes. The matgen queues has a pool of resources, but those might be used by other queues. 7.1. Time Patterns and analysis granularity The queue also has a high job count, and thus higher wait Choosing a time period to slice the data was the first times than queues with lower priority. The xlmem queues step of this analysis. A Fourier transform analysis was per- are similar to Hopper’s bigmem, and meant to serve jobs formed on the number of tasks submitted per hour, since with large memory requirements. However, their resource this analysis allows us to detect cycles in the user behavior assignment is different: On Hopper, bigmem jobs can only [22]. The result can be observed in Figure 7: Black arrows be executed on nodes with large memory capacity, but point to the most powerful frequencies correspond to the these nodes can also execute jobs from other queues. In the periods of 1 day, 1 week, 3 months, 6 months. This data case of Carver, only the jobs from the xlmem queues can be matches human period times (days, working weeks). Each run in the special nodes. This exclusive access, combined project has a number of core-hours to be used in a year, with xlmem’s low job count and core-hour contribution, divided in 4 allocation quarters in which the project has explains why these queues’ median job wait time is under to consume (or forfeit) the corresponding allocated time. four minutes. 
The strong pattern around the allocation year led us to Three queues (reg big, reg long, reg xlong) have prior- choose one year as the trend analysis period time. ity zero and longer wait times than the other queues. The reg small queue jobs consume more core-hours than any 7.2. Job geometry other queue (apart from serial), which may be the reason for the long wait times in this queue. Finally, the queues The evolution of the first job geometry variable is presen- with higher priorities behave as expected. ted in Figure 8 as a box plot of job wall clock time for In general we observed that queues that did not display each system in each year. Hopper shows a significantly expected queue wait time patterns had low homogeneity low wall clock time median in 2010 (< 1 minute), which indices (under 60%). In particular, the queues on Hopper might be related to the fact that it was a smaller testbed with such indexes should be further studied. More predict- system that year. In 2011, the median increased to 5 minutes and subsequently increased to 12 minutes∼ by able wait times could be possible by dividing these queues ∼ according to the observed job clusters. 2014. Carver shows a different trend: the median, up- per and lower quartile decrease effectively over the period studied. The median decreased from 20 minutes (2010) to 6 minutes (2014). However, there∼ is some variation ∼ 12 46 Figure 8: Job wall clock time for each each workload year. Trend: Figure 10: Allocated core-hours for each workload year. Trend: No Hopper jobs become longer, Carver jobs shorter. Majority of jobs changes on Hopper. Carver jobs become smaller. under one hour.

Figure 11: Jobs’ wait time evolution for each workload year. Trend: Figure 9: Allocated number of cores for each workload year. Trend: All systems increase wait time. Carver lower wait time in 2011. Hopper jobs allocate less cores. In 2011-2013, most Carver jobs used one core. last upper quartile of 1 core-hour). In summary, Hopper jobs (shorter jobs, with a higher from year to year: It is observed that in the first year in degree of parallelism, bigger than Carver’s) seem to be production, Carver ran longer jobs than Hopper, a fact showing an increase in their wall clock time. As the effect- that slowly changed in 2014 when Hopper ran longer jobs ive job’s core-hours remain the same, they must be using that Carver. More generally, Hopper presents fairly short fewer cores. Carver jobs (longer jobs, lower degree of par- jobs, as the highest upper quartile is around the one hour allelism, fewer core-hours than Hopper’s) have decreasing value. Carver presents a similar behavior as the upper wall clock time and use more cores, but the increase is quartiles of the last years are under one hour. not sufficient to keep the job’s core hours steady over the The evolution of the width of jobs (number of alloca- years. ted cores per job) is shown in Figure 10. For Hopper, the median decreases from 100 cores (2010) to under 30 cores 7.3. Job wait time (2014). Carver presents an opposite pattern. Except for According to Figure 11, for Hopper, the median of the 2010, the median of the rest of the years is one core, show- wait time is steadily increasing from under 100 seconds ing the predominance of single core serial jobs. In 2014, to over 20 minutes (a pattern also present in the upper the upper quartile increased to 8 cores. and lower quartiles). On Carver, the effective wait time The core-time i.e., total clock time across all cores was increases over the four years from 10 minutes in 2010 to also studied. In the case of Hopper, it remains nearly the 20 minutes in 2014. However we∼ notice a zigzag pattern same through time with a median of 20 core-hours and trend∼ in between. In 2011, Carver presented significantly the upper quartile slightly under 200∼ core-hours in most shorter wait times, which could be attributed to a known years. In the case of Carver, it slowly decreases from a increase of resources in the system. The steady increase of median of almost 1 core-hour to 6 core minutes (and a ∼ 13 47 Figure 12: Jobs’ wall clock time accuracy evolution for each workload year. In all systems wall clock time remains low.

wait time over the lifetime fits with the growth of the user community of the systems.

Figure 13: Workload diversity and in-queue homogeneity index: overall workload becoming less diverse; job mix in queues becoming more uniform.

7.4. Wall clock time accuracy According to Figure 13, Hopper’s homogeneity job ho- The wall clock time accuracy is calculated as mogeneity index increased 0.36 to 0.71, while its time ho- real/estimated wall clock time. The results are shown mogeneity index increased from 0.33 to 0.46. Queues with in Figure 12. Hopper does not show a clear trend: 2011 more jobs contain more self similar jobs in 2014 than in to 2013 presents a higher accuracy than 2010 and 2014, 2010, It is interesting to note that 2012 was higher than in with a median variation between 0.2 and 0.4. For Carver, other years, implying that queues contributing more core- the median decreases over time, with significant changes hours had more self similar jobs. In the case of carver between 2010 ( 0.25) and 2011 (<0.1). In 2014, the me- the trend is similar, job homogeneity index increases from dian is under 0.1∼ and the last quartile it is under 0.2. For 0.44 (2010) to 0.81 (2014) and time homogeneity index in- Carver, there is a clear pattern of worse estimations as the creases from 0.34 (2010) to 0.54 (2011). As we observe time proceeds. In general, both systems present very low all years, it can be stated that under both criteria Carver values with medians under 0.4. queues job diversity was lower than in Hopper. Wall clock accuracy time does not show a noticeable Overall, it can be concluded that, as both systems pattern beyond the fact that accuracy is low. On Hopper, aged, the jobs submitted by the users evolved to become 50% of all jobs run less than 40% of the estimated time. more uniform. Additionally, the configured policies have Similarly on Carver, 50% of all jobs less than 20% of the evolved to build queues that classify better the jobs to estimated time. These values indicate that the decisions contain more self similar jobs. made by the backfilling algorithms are based on inaccurate user estimations. 8. Results summary and conclusions

7.5. Job and queue diversity In this section, we summarize our results and present Following the methodology presented in Section 3.3, our conclusions on the workload analysis method and res- the diversity analysis was performed for each year and sys- ults characterization. tem under two different perspectives: how different are the jobs overall and how different are the jobs inside of each 8.1. Summary of results, a year of workload queue. Results can be observed in Figure 13. We present a reference of the current state of the work- Hopper shows lower numbers of clusters over time, de- loads of two large scale and one mid scale high performance creasing from 17 to 12 clusters, implying a decreasing gen- systems: We summarize the key results from our detailed eral diversity. In the case of Carver, it started with a fairly analysis performed on the 2014’s workload from Edison, simple job mix (7 clusters), had an increase of complexity Hopper, and Carver. We also compare them with pre- in the second year (13 clusters), to go down to the same existing analysis of similar, still older. number of clusters (7) in the last two years. In all years Carver presents a more homogeneous job mix in compar- The job wall clock times are short on all three sys- ison to Hopper. • tems. From 86% to 88% of the jobs run less than 2 hours.

14 48 On Edison and Hopper, 37%, 39% of the number of manycore Intel chips [25]. Phi processors require a regular • jobs run on one node and 69%, 75% run on 10 nodes processor to load work on them and manage their opera- or less. On Carver, 92% of the jobs run on one node. tions. We compare with an analysis over a trace of three On Carver, 77% of its jobs require one or less core- months of Stampede in 2013 [26]. • hours. Carver jobs use far fewer hours than jobs on Looking into the workloads, Edison and Hopper’s wall Hopper and Edison. clock time distribution matches the patterns observed on Jobs run less than 50% of their requested time: 60% Intrepid [1]. Carver’s run time CDF is steeper and similar • of Edison jobs, 66% of Hopper’s jobs and 95% of to Stampede [26]. Jobs in all three systems use fewer cores Carver jobs. Jobs run over their underestimated wall than Intrepid. Edison and Hopper jobs are similar to the clock time: 10% Hopper’s jobs, 11% of Edison’s and jobs on Stampede in terms of cores. 8% of Carver’s. The serial queue jobs on Carver dominate the distri- Carver has the most homogeneous workload (more bution. Thus, Carver jobs are very different from Edison • similar jobs, similarity among jobs in the same queue). and Hopper and other HPC systems. Edison and Hopper Hopper has a diverse workload with a complex job share characteristics with reference systems like Intrepid mix in its queues. or Stampede. It is possible that current DOE Leader- Carver has the longest wait times, although its jobs ship Computing Facilities exhibit slightly different work- • allocate significantly fewer core-hours than jobs on load characteristics [27]. This work is a first step to un- Hopper and Edison. derstand if results from NERSC may translate to such fa- On all systems, the wait time increases as jobs al- cilities. • locate more resources. The wait time decreases with higher priority in most cases. In some cases anom- 8.2. Summary of results, systems lifetime evolution alies are observed; e.g. larger jobs with lower priority Observing the evolution of large scientific infrastructu- experience shorter wait time. res through their life time allows to understand systems evolution as their use mature. It also provides data to Although these results represent the state of current support the characteristics of future workloads. systems, it is relevant to understand their difference to ex- Figure 13 presents how Hopper’s workload evolved to iting work on workloads analysis of similar systems. In a more uniform job mix. However, Carver presents a spike particular, our results were compared to analyses on In- in the second year of its lifetime, to return to values similar trepid an Stampede, one current and one past HPC sys- to the beginning. Hopper results capture the nature evo- tems. lution of systems, where the scheduler and other machine Intrepid was a Blue Gene/P supercomputer, with characteristics are refined based on the workload charac- 163,840 cores, 80 TB of memory (512 MB per core), cus- teristics. Carver suffered two significant changes in 2011 tom interconnect, peak Linpack performance of that might have affected its diversity. First, Carver was ex- 458.6 TFLOPS, and was deployed in 2008 at the Argonne panded in 2011 [28]. Second, the serial batch queue (long National Laboratory. From the point of view of config- running, low degree of parallelization) was added [29]. 
uration, Intrepid is more similar to Carver, as both are Figure 11 presents how the wait time steadily increases Teraflop systems and closer in deployment time. However, as the systems age, this fits with a growing scientific com- Intrepid is a Blue Gene/P system, characterized by provid- munity using the same system and more advanced applica- ing compute power through smaller but more numerous tions that require faster infrastructure. Still, there is one CPU cores, an opposing approach to the analyzed NERSC exception, in 2011 Carver presented significantly smaller systems’ architecture, based on more powerful cores. We wait times. This could be attributed to its expansion in compare with a trace from nine months of Intrepid in 2009 2011. [1]. As Hopper and Carver are compared in Figure 8 and Stampede is a POWEREDGE C8220 high performance Figure 10, the analysis on the evolution of the job geome- cluster with 462,462 cores, an Infiniband interconnect, that try reveal that Hopper jobs (which had shorter jobs but can deliver up to 8 PFLOPS. It was deployed at the Texas with a higher degree of parallelism than Carver) seems Advanced Computing Center (Univ. of Texas) in 2012. be increasing their wall clock time but using fewer CPU Stampede could be compared to Edison or Hopper in terms cores, while Carver jobs (which had longer jobs but with of capacity, but its architecture differ from them as its pro- a lower degree of parallelism than Hopper) are decreasing cessing units are hybrid. Stampede includes both Xeon their wall clock time and using more CPU cores. and Phi processors in its compute nodes. For applications In general, it is interesting to observe signs that would using its Xeon processors, Stampede performs like Edison support the idea that as a systems ages, its workload vari- or Hopper, in fact, Edison’s processor are the next genera- ables evolve with a distinct trend, becoming more homo- tion (Ivy Bridge) to Stampede’s (Sandy Bridge). However, geneous but putting an increasing pressure on the available the Phi processors are different. They are manycore pro- resources. cessors that include 61 light but power efficient CPU cores, from a generation before current Knightslanding (KNL) 15 49 System Vendor Model Built Nodes Cores/N Cores Memory Network TFlops/s Processor Intrepid IBM Blue Gene/P 2008 40,960 4 163,840 80 TB Torus 557.1 Blue Gene Stampede Dell PowerEdge 2012 6,400 16+61 462,462 192 TB Inf. FDR 8,520 Xeon, Phi

Table 7: Intrepid and Stampede characteristics

8.3. Conclusions to predict wait times. Queues obtained by subdividing Analyzing NERSC’s workload offered a challenge that dominant job groups could show predictable wait times. required new analysis tools. For this, we establish a me- thodology that include traditional workload analysis tech- 9. Acknowledgments niques (e.g., CDF analysis of job variables) but incorpor- ates new methods to asses job heterogeneity. The job he- This material is based upon work supported by the terogeneity analysis includes a novel algorithm that em- U.S. Department of Energy, Office of Science, Office of ploy’s k-means clustering to detect the minimum number Advanced Scientific Computing Research (ASCR) and the of dominant job geometries in an HPC workload. The me- National Energy Research Scientific Computing Center, thod also analyzes the mapping of dominant job groups a DOE Office of Science User Facility supported by the on the system prioritization schema and the resulting job Office of Science of the U.S. Department of Energy un- wait times. This enables to asses the effect of job hetero- der Contract No. DE-AC02-05CH11231. Financial sup- geneity on the the scheduling performance in terms of wait port has been provided in part by the Swedish Govern- time. ment’s strategic effort eSSENCE, by the European Union’s The results of the first application of this methodo- Seventh Framework Programme under grant agreement logy establish a reference of the state of the workload in 610711 (CACTOS), the European Unions Framework Pro- 2014 of three high performance systems (Edison, Hopper, gramme Horizon 2020 under grant agreement 732667 (RE- and Carver). Such systems are similar size, architecture, CAP), and the Swedish Research Council (VR) under con- and workload to many other current HPC systems. These tract number C0590801 for the project Cloud Control. We results can be of use to understand the behavior in other would like to thank Sophia Pasadis for editing help with systems, and among them we highlight: (1) The job geo- the paper. metries were fairly diverse including significant number of smaller jobs compared to older systems. The low per queue References homogeneity indexes, show that (2) single priority policies are affecting jobs with a fairly diverse geometry. The wait [1] D. Feitelson, Parallel workloads archive 71 (86) (2007) 337–360, time analysis shows that (3) studied queues with low ho- http://www.cs.huji.ac.il/labs/parallel/workload. mogeneity indexes present poor correlation between job’s [2] NERSC, http://www.nersc.gov, 2015-01-18. [3] G. P. R. Alvarez, P.-O. Ostberg,¨ E. Elmroth, K. Antypas, wait time and geometry. Job’s submission patterns show R. Gerber, L. Ramakrishnan, Towards understanding job hete- that (4) the accuracy of users’ predictions of their job’s rogeneity in hpc: A nersc case study, in: 2016 16th IEEE/ACM wall clock time (fundamental for the performance of back- International Symposium on Cluster, Cloud and Grid Comput- ing (CCGrid), IEEE, 2016, pp. 521–526. filling functions) is very low, and does not improve over [4] G. Rodrigo, P.-O. Ostberg,¨ E. Elmroth, K. Antypass, R. Gerber, time. Finally, (5) Hopper and Carver workloads presented L. Ramakrishnan, HPC system lifetime story: Workload char- a clear trend in their four year lifetime: they become less acterization and evolutionary analyses on NERSC systems, in: diverse, their queues classify better their jobs, and they The 24th International ACM Symposium on High-Performance Distributed Computing (HPDC), 2015. 
become more similar. (6) Also, they experience a heavy [5] M. A. Bauer, A. Biem, S. McIntyre, N. Tamura, Y. Xie, High- load that increases the overall wait times. performance parallel and stream processing of x-ray microdif- Our results and methodology are of use for future sche- fraction data on multicores, in: Journal of Physics: Conference duling research and systems operations management. Sche- Series, Vol. 341, IOP Publishing, 2012, p. 012025. [6] S. N. Srirama, P. Jakovits, E. Vainikko, Adapting scientific com- duling research needs to address present and future work- puting problems to clouds using mapreduce, Future Generation loads and our results set a first step to understand charac- Computer Systems 28 (1) (2012) 184–192. teristics of future systems (e.g., diverse jobs, smaller jobs, [7] T. Hey, S. Tansley, K. M. Tolle, et al., The fourth paradigm: or low accuracy in runtime estimations). data-intensive scientific discovery, Vol. 1, Microsoft research Redmond, WA, 2009. For system management, we highlight a result and an [8] D. G. Feitelson, L. Rudolph, U. Schwiegelshohn, Parallel job alternative application of our methodology. First, low va- scheduling, a status report, in: Job Scheduling Strategies for lues on wall clock time accuracy points to further research Parallel Processing, Springer, 2005, pp. 1–16. on how to encourage users to encourage users to provide [9] D. A. Lifka, The ANL/IBM SP scheduling system, in: Job Sche- duling Strategies for Parallel Processing, Springer, 1995, pp. better predictions. Better runtime accuracy will increase 295–303. the quality of the backfilling in schedulers. Finally, the [10] A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, dominant job groups produced by the job heterogeneity D. Epema, The grid workloads archive, Future Generation Com- puter Systems 24 (7) (2008) 672–686. analysis could be a template to define priority groups and [11] A. K. Mishra, J. L. Hellerstein, W. Cirne, C. R. Das, Towards queues. Our results show that diverse queues offered hard characterizing cloud backend workloads: insights from google

16 50 compute clusters, ACM SIGMETRICS Performance Evaluation Review 37 (4) (2010) 34–41. [12] K. Antypas, B. A. Austin, T. L. Butler, R. A. Gerber, NERSC workload analysis on Hopper, Tech. rep., LBNL Report: 6804E (October 2014). [13] NERSC, Submitting batch jobs (carver), https: //www.nersc.gov/users/computational-systems/carver/ running-jobs/batch-jobs/, 2015.1.15. [14] C. Vaughan, M. Rajan, R. Barrett, D. Doerfler, K. Pedretti, Investigating the impact of the Cielo Cray XE6 architecture on scientific application codes, in: 2011 IEEE International Sym- posium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), IEEE, 2011, pp. 1831–1837. [15] T. M. Declerck, I. Sakrejda, External Torque/Moab on an XC30 and Fairshare, Tech. rep., NERSC, Lawrence Berkeley National Lab (2013). [16] Y. Etsion, D. Tsafrir, A short survey of commercial cluster batch schedulers, School of Computer Science and Engineering, The Hebrew University of Jerusalem 44221 (2005) 2005–13. [17] G. Staples, Torque resource manager, in: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, ACM, 2006, p. 8. [18] K. Antypas, NERSC-6 workload analysis and benchmark selec- tion process, Lawrence Berkeley National Laboratory. [19] NERSC, Queues and polices (carver), https://www.nersc. gov/users/computational-systems/carver/running-jobs/ queues-and-policies/, 2014.1.15. URL https://www.nersc.gov/users/computational-systems/ carver/running-jobs/queues-and-policies/ [20] J. Weinberg, A. Snavely, Symbiotic space-sharing on sdsc’s datastar system, in: Job Scheduling Strategies for Parallel Pro- cessing, Springer, 2007, pp. 192–209. [21] J. D. Hunter, Matplotlib: A 2D graphics environment, Com- puting In Science & Engineering 9 (3) (2007) 90–95. [22] W. W.-S. Wei, Time series analysis, Addison-Wesley publ, 1994. [23] A. Coates, A. Y. Ng, Learning feature representations with k- means, in: Neural networks: Tricks of the trade, Springer, 2012, pp. 561–580. [24] J. A. Hartigan, M. A. Wong, Algorithm as 136: A k-means clustering algorithm, Applied statistics (1979) 100–108. [25] A. Sodani, Knights landing (knl): 2nd generation intel R xeon phi processor, in: Hot Chips 27 Symposium (HCS), 2015 IEEE, IEEE, 2015, pp. 1–24. [26] J. Emeras, Workload traces analysis and replay in large scale distributed systems, Ph.D. thesis, Grenoble INP (2014). [27] S. Ahern, S. R. Alam, M. R. Fahey, R. J. Hartman-Baker, R. F. Barrett, R. A. Kendall, D. B. Kothe, R. T. Mills, R. Sankaran, A. N. Tharrington, et al., Scientific application requirements for leadership computing at the exascale, Tech. rep., Oak Ridge Na- tional Laboratory (ORNL); Center for Computational Sciences (2007). [28] NERSC, Magellan batch queues on carver, http://www. nersc.gov/REST/announcements/message_text.php?id=1991, 2015.01.15. URL http://www.nersc.gov/REST/announcements/message_ text.php?id=1991 [29] NERSC, Serial queue on carver/magellan, http://www. nersc.gov/REST/announcements/message_text.php?id=2007, 2015.01.15. URL http://www.nersc.gov/REST/announcements/message_ text.php?id=2007


Paper II

Priority Operators for Fairshare Scheduling
Gonzalo P. Rodrigo, Per-Olov Östberg, and Erik Elmroth

In 18th Workshop on Job Scheduling Strategies for Parallel Processing, pp. 70-89, Springer International Publishing, 2014.

Priority Operators for Fairshare Scheduling

Gonzalo P. Rodrigo, Per-Olov Östberg, and Erik Elmroth

Distributed Systems Group, Department of Computing Science, Umeå University, SE-901 87, Umeå, Sweden {gonzalo,p-o,elmroth}@cs.umu.se www.cloudresearch.se

Abstract. Decentralized prioritization is a technique to influence job scheduling order in grid fairshare scheduling without centralized control. The technique uses an algorithm that measures individual user distances between quota allocations and historical resource usage (intended and current system state) to establish a semantic for prioritization. In this work we address design and evaluation of priority operators, mathematical functions to determine such distances. We identify desirable operator characteristics, establish a methodology for operator evaluation, and evaluate a set of proposed operators for the algorithm.

1 Introduction

Fairshare scheduling is a scheduling technique derived from an operating system task scheduler algorithm [1] that prioritizes tasks based on the historical resource usage of the task owner rather than that of the task itself. This technique defines a ”fair” model of resource sharing that allows users to receive system capacity proportional to quota allocations irrespective of the number of tasks they have running on the system (i.e. preventing starvation of users with fewer tasks). In local resource management (cluster scheduler) systems such as SLURM [2] and Maui [3], this prioritization technique is extended to job level and fairshare prioritization is used to influence scheduling order for jobs based on historical consumption of resource capacity. At this level, fairshare is typically treated as one scheduling factor among many and administrators can assign weights to configure the relative importance of fairsharing in systems. For distributed computing environment such as compute grids [4], a model for decentralized fairshare scheduling based on distribution of hierarchical allocation policies is proposed in [5], and a prototype realization and evaluation of the model is presented in [6]. Based on this work a generalized model for decentralized prioritization in distributed computing is discussed in [7], and we here extend on this work and address design and evaluation of priority operators: mathematical functions to determine the distance between individual users’ quota allocations and historical resource consumption.


2 Decentralized Prioritization

2.1 Model

Fig. 1. A computational pipeline for decentralized prioritization. Illustration from [7].

As illustrated in Figure 1, the decentralized prioritization model defines a computational pipeline for calculation and application of prioritization. From a high level, this pipeline can be summarized in three steps: distribution of prioritization information (historical usage records and quota allocations), calculation of prioritization data, and domain-specific application of prioritization (e.g., fairshare prioritization of jobs in cluster scheduling).

Fig. 2. A tree-based priority calculation algorithm. Tree structure models organization hierarchies (two virtual organizations and a local resource queue). Illustration from [6].

To model organizational hierarchies, the system expresses prioritization target functions (quota allocations or intended system behavior) in tree formats, and prioritization calculation is performed using an algorithm that uses tree structures to efficiently calculate prioritization ordering for users. The tree calculation part of the algorithm is illustrated in Figure 2, where the distance between the intended system state (target tree) and the current system state (measured tree) is calculated via node-wise application of a priority operator (in this case subtraction). The algorithm produces a priority tree - a single data structure containing all priority information needed for prioritization.

For application of prioritization (the last step in the pipeline), priority vectors are extracted from the priority tree and used to infer a prioritization order of the items to be prioritized. In fairshare prioritization of jobs in grid scheduling for example, the target tree contains quota information for users, the (measured) state tree contains a summation of historical usage information for users, and the resulting priority tree contains a measurement of how much of their respective quota each user has consumed. As each user is represented with a unique node in the trees, the values along the tree path to the node can be used to construct a priority vector for the user. Full details of the prioritization pipeline and computation algorithm are available in [6] and [7].
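A minimal sketch of the node-wise operator application that produces the priority tree. Trees are represented here as nested dictionaries with values only at the leaves, which simplifies the model (in Figure 2 internal nodes also carry values), and the subtraction operator is just the example used in the figure.

```python
def priority_tree(target, state, op=lambda t, s: t - s):
    """Apply a priority operator node-wise to two trees of the same shape.

    target and state are nested dicts; leaves hold the per-node target
    (quota) and normalized state (historical usage) values.
    """
    if isinstance(target, dict):
        return {name: priority_tree(target[name], state[name], op)
                for name in target}
    return op(target, state)

# Toy example: one virtual organization with two users.
target = {"vo1": {"u1": 0.6, "u2": 0.4}}
state = {"vo1": {"u1": 0.7, "u2": 0.3}}
print(priority_tree(target, state))   # {'vo1': {'u1': ~-0.1, 'u2': ~0.1}}
```

Swapping in any of the operators defined in Section 3 only requires passing a different `op` function.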


2.2 Challenges / Resolution Limitations

As priority operators lie at the heart of the prioritization algorithm, operator design can have great impact on the semantics of prioritization systems, and further understanding of the characteristics of the priority operator functions is needed, motivating the first part of this work.


Fig. 3. Construction of priority tree and priority vectors.

On the other side, the decentralized fairshare prioritization systems are meant to serve a number of resource management systems in environments created from large, complex, multilevel organizations. Access to the resources is governed by policies whose structure is mapped from those organizations and, as a consequence, these policies have the shape of large trees (both deep and wide). After using the priority operators to create the priority tree (as seen in Figure 2), the tree is traversed from top to bottom extracting the priority vectors (Figure 3), which have independent components representing each "level" of corresponding subgroups in the tree. The vector with the highest value at the most relevant component is the one chosen. For two or more vectors with the same component value, subsequent components are used to determine priority order.


Fig. 4. Priority vector serialization, resolution r=10000.

Fig. 5. Resolution limitation impact on the ordering process.

However, the dimension of the overall system is translated to the size of the vector. To simplify the compare operations they are mapped onto lexicographic strings. As we can see in Figure 4, for an example scalar resolution of 10000, each component of the vector is linearly mapped to a scalar in the range of 0 to 9999, where -1 is translated to 0 and 1 to 9999:

scalar(p, r) = \left\lfloor \frac{p + 1}{2} \cdot r \right\rfloor

Then, the values are concatenated to construct the final string. In this example, by comparing the scalars with a simple numeric operation we can determine that u2 has a greater priority than u1 (the full process is described in [6]). It is important to highlight that the full final string is needed, because any transformation of the vectors that would not keep the individual element presence in it would eliminate the capacity of dividing the share in a true hierarchical way.

However, mapping the priority values to a scalar space with a limited domain has consequences: a number of priority values will map to the same scalar, which may affect the ordering process. In Figure 5 two users have two different priority vectors; when the scalar resolution is 100, the first components of both vectors map to the same scalar, although u1 has a greater priority. The result is that u2 gets a bigger final scalar string, and thus is selected over u1. When we increase the resolution to 1000 we observe how the ordering becomes correct.

At the same time, it is also important to remember that it is desirable to use the smallest possible resolution. The overall size of the system will bring thousands of users organized in deep trees, increasing the number of comparisons and elements in each priority vector. Any small reduction in the resolution will have a significant impact on the resources needed to compute the priority vectors. The behavior of the operators in low resolution has to be understood and modeled, motivating the second part of this work, in which we will study its impact on each operator's performance and the trade-off between resolution and other characteristics of the system.
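The mapping and concatenation described above can be sketched as follows. The small epsilon guards against binary floating-point artifacts when decimal priority values fall exactly on a scalar boundary, and the example reproduces the u1/u2 vectors of Figures 4 and 5.

```python
import math

def scalar(p, r):
    """Map a priority value p in [-1, 1] to an integer in [0, r-1]."""
    return min(int(math.floor((p + 1) / 2.0 * r + 1e-9)), r - 1)

def serialize(priority_vector, r=10000):
    """Concatenate the per-level scalars into a lexicographic string."""
    width = len(str(r - 1))
    return "".join(str(scalar(p, r)).zfill(width) for p in priority_vector)

u1 = [0.11, 0.2]
u2 = [0.10, 0.3]
# With r=100 both first components collapse to the same scalar (55) and
# u2's string wins; with r=1000 the extra digit preserves u1's advantage.
print(serialize(u1, 100), serialize(u2, 100))    # 5560   5565
print(serialize(u1, 1000), serialize(u2, 1000))  # 555600 550650
```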

3 Operator design

3.1 Operator definition

For fairshare prioritization, we define priority operators as follows:

1. An operator is a function of two input variables such that:

\[ t \in [0, 1],\; s \in [0, 1] \;\Rightarrow\; F(t, s) \in [-1, 1] \tag{1} \]

\[ F(t, s) = 0 \iff t = s, \quad F(t, s) > 0 \iff t > s, \quad F(t, s) < 0 \iff t < s \tag{2} \]

where t represents a target value, s a (normalized) state value, and t = s is the ideal balance axis transecting the operator value space.

2. The function is strictly increasing on target and strictly decreasing on state:

\[ \forall\, t_i, t_j, s_i, s_j, t, s \in (0, 1]: \]
\[ F(t_j, s) > F(t_i, s) \iff t_j > t_i \]
\[ F(t, s_j) > F(t, s_i) \iff s_j < s_i \tag{3} \]
\[ F(t_j, s) = F(t_i, s) \iff t_j = t_i \]
\[ F(t, s_i) = F(t, s_j) \iff s_i = s_j \]

3. Operator functions are idempotent and deterministic.


3.2 Operator characteristics

Desirable operator characteristics depend on the application scenario. In the context of ordering prioritization (ranking) problems, it is considered desirable for operators to:

1. Have well-defined boundary behaviors:

\[ \forall\, s \in (0, 1],\; t = 0 \;\Rightarrow\; F(t, s) = -1 \]
\[ \forall\, t \in (0, 1],\; s = 0 \;\Rightarrow\; F(t, s) = 1 \tag{4} \]

so users with target 0 will always get the lowest possible priority (−1), and users with some target but state 0 will get the highest possible priority (1).

2. Subgroup isolation: depend only on the target and state values of subgroup member nodes.

3. Redistribute unused resource capacity among users in the same subgroup proportionally to their allocations.

4. Be computationally efficient and have a minimal memory footprint.

5. Abide by the principle of equivalence: two operators F, F′ are equivalent if they produce the same priority ordering for a set of users given their targets and state histories. Equivalent operators have the same characteristics for user priority ordering problems. This property is not strictly desirable in itself, but it can be used to ensure that all ordering characteristics proved for an operator automatically hold for its equivalent operators.

3.3 Operators object of study

In this section we present the operators already used in fairshare prioritization, plus one new contribution (Sigmoid) and one brought from the open source cluster scheduler SLURM. The Absolute, Relative, Relative-2, and Combined operators are defined in [6].

1. Absolute: expressed as the subtraction of the state from the target.

\[ d_{Absolute} = t - s \tag{5} \]

2. Relative: expresses what proportion of the user's allocation is available.

\[ d_{Relative} = \begin{cases} \dfrac{t - s}{t} & s < t \\ 0 & s = t \\ -\dfrac{s - t}{s} & s > t \end{cases} \tag{6} \]

3. Relative exponential: increases the effects of the Relative.

\[ d_{Relative\text{-}n} = \begin{cases} \left(\dfrac{t - s}{t}\right)^n & s < t \\ 0 & s = t \\ -\left(\dfrac{s - t}{s}\right)^n & s > t \end{cases} \tag{7} \]

4. Sigmoid: designed as a vertical inverse of the Relative operator.

\[ d_{Sigmoid} = \begin{cases} \sin\left(\dfrac{\pi}{2} \cdot \dfrac{t - s}{t}\right) & s < t \\ 0 & s = t \\ -\sin\left(\dfrac{\pi}{2} \cdot \dfrac{s - t}{s}\right) & s > t \end{cases} \tag{8} \]

5. Sigmoid exponential: increases the effects of the Sigmoid.

\[ d_{Sigmoid\text{-}n} = \begin{cases} \sin^n\left(\dfrac{\pi}{2} \cdot \dfrac{t - s}{t}\right) & s < t \\ 0 & s = t \\ -\sin^n\left(\dfrac{\pi}{2} \cdot \dfrac{s - t}{s}\right) & s > t \end{cases} \tag{9} \]

6. Combined: a controlled aggregation of the Absolute and Relative operators.

\[ d_{Combined} = k \cdot d_{Absolute} + (1 - k)\, d_{Relative\text{-}2}, \quad \forall k \in [0, 1] \tag{10} \]

7. SLURM: the operator used by the SLURM scheduling system [8].

\[ d_{SLURM_{Original}} = 2^{-\frac{s}{t}} \tag{11} \]

Modified to the operator output value range [−1, 1]:

\[ d_{SLURM} = 2^{\,1 - \frac{s}{t}} - 1 \tag{12} \]
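As a reference, the operators above can be transcribed directly. The Python sketch below is only illustrative: the names are ours, inputs are assumed to satisfy t, s > 0, the boundary cases with divisions by zero are not handled, and the weighting k = 0.5 for the Combined operator is an assumption.

```python
import math

def absolute(t, s):
    """Absolute operator: target minus normalized state (eq. 5)."""
    return t - s

def relative(t, s, n=1):
    """Relative (n = 1) and Relative-n operators (eqs. 6-7); assumes t, s > 0."""
    if s < t:
        return ((t - s) / t) ** n
    if s > t:
        return -(((s - t) / s) ** n)
    return 0.0

def sigmoid(t, s, n=1):
    """Sigmoid (n = 1) and Sigmoid-n operators (eqs. 8-9); assumes t, s > 0."""
    if s < t:
        return math.sin(math.pi / 2 * (t - s) / t) ** n
    if s > t:
        return -(math.sin(math.pi / 2 * (s - t) / s) ** n)
    return 0.0

def combined(t, s, k=0.5):
    """Combined operator (eq. 10); k = 0.5 is an assumed weighting."""
    return k * absolute(t, s) + (1 - k) * relative(t, s, n=2)

def slurm(t, s):
    """SLURM operator rescaled to the [-1, 1] output range (eq. 12)."""
    return 2 ** (1 - s / t) - 1
```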

4 Operator evaluation

We investigate each operator for each desirable operator characteristic.

4.1 Operator Definition

We start with the second point of the definition. For each operator we study the sign of the first partial derivatives when t ≠ s. For all operators F:

\[ \forall\, s, t \in [0, 1] \wedge t \neq s: \quad \frac{\partial F(t, s)}{\partial t} > 0, \quad \frac{\partial F(t, s)}{\partial s} < 0 \tag{13} \]

assuring compliance with this part of the definition. Then, as all F(t, s) are strictly increasing on t and strictly decreasing on s, by studying the upper and lower bounds of the input space we can assure compliance with the output value space:

\[ F(1, 0) \leq 1, \quad F(0, 0) = 0, \quad F(0, 1) \geq -1 \tag{14} \]
\[ s, t \in [0, 1] \wedge t = s \iff F(t, s) = 0 \]


4.2 Boundary behavior

In Table 1 we observe the priority values in the boundary cases for each operator:

Operator      t = 0           s = 0
Absolute      −s              t
Relative      −1              1
Relative-2    −1              1
Combined      −0.5 − s/2      0.5 + t/2
Sigmoid       −1              1
Sigmoid-2     −1              1
SLURM         −1              1

Table 1. Boundary behavior for each operator.

The Absolute and Combined operators fail to comply with this property. For the Absolute, the maximum and minimum possible priorities are limited in each case by the target value. The Combined operator inherits this behavior from its Absolute component.

4.3 Subgroup isolation

This property is assured by the definition of the operators: they take into account only the state and target of the corresponding node, which relate to the values of the other nodes in the same subgroup, as the state represents what share of the subgroup's usage corresponds to this node and the target what share of the usage should correspond to it. No data from outside the subgroup is used to calculate these input values.

4.4 Proportional distribution of unused share

The situation in which a subset of users in a subgroup are not submitting jobs can be understood as a scenario with a new set of target values (virtual targets): eliminating the non-submitting users and recalculating the targets of the submitting users by dividing the unused share among them in proportion to their original targets. If the system operated only with the virtual targets as input to the operator, it would converge to those new targets. If an operator produces the same ordering with the virtual targets (all users submitting jobs) as with the old targets (but with some users not submitting jobs), then we can state that the operator will bring the system to the virtual targets (even if the input is the old targets), spreading the unused share proportionally to the users' targets. If we define T as the set of indices of the users submitting jobs, the condition to be satisfied by the operator can be expressed as:

\[ \forall\, i, j \in T: \quad t'_i = \frac{t_i}{\sum_{i \in T} t_i}, \quad \sum_{i \in T} t_i \leq 1, \quad t'_i \geq t_i \]

\[ \forall\, t'_i, t'_j, s_i, s_j \in [0, 1]: \tag{15} \]
\[ 1.\; F(t'_j, s_j) > F(t'_i, s_i) \Rightarrow F(t_j, s_j) > F(t_i, s_i) \]
\[ 2.\; F(t'_j, s_j) = F(t'_i, s_i) \Rightarrow F(t_j, s_j) = F(t_i, s_i) \]

where t_i is the target of user i, s_i the normalized state of user i, and t'_i the virtual target of user i after adding its proportional part of the unused share. Following this reasoning, we proved in [9] that the Relative operator complies with this property and that any operator producing the same ordering also complies with it. As we will see in the following section, the Relative-2, Sigmoid, Sigmoid-2, and SLURM operators are equivalent to the Relative, so we can conclude that they will also distribute the unused share proportionally among the active users of the same subgroup.

4.5 Computationally efficient

Looking into the formulation of the different operators, it is clear that those including power, root, or trigonometric operations will require more CPU cycles for the mathematical operations. In terms of memory requirements, all should perform similarly. Still, the real performance of these operators will largely depend on the final implementation; as a consequence, we leave this matter for the evaluation of implemented systems.

4.6 Equivalence

We formulated a theorem in [10] that allows us to state that two operators are equivalent under one condition: if two operators F, F′ have the same function G(F, t_j, s_j, t_i) = s_i such that F(t_j, s_j) = F(t_i, s_i), then they are equivalent and thus produce the same ordering. We observe that G for the Relative operator is:

\[ G(F, t_j, s_j, t_i) = \frac{t_i}{t_j}\, s_j \tag{16} \]

Analyzing the operators, we can state that the Relative exponential, Sigmoid, Sigmoid exponential, and SLURM operators share the same G and thus are equivalent to the Relative, sharing the same ordering characteristics, including the proportional distribution of unused share among peer users. The Absolute and Combined operators have different G functions, both from each other and from the Relative.
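A quick numerical sanity check of this equivalence (a sketch, not a proof; the operator implementations are our own illustrative versions) is to compare the orderings that two operators induce on random (target, state) pairs:

```python
import math
import random

def relative(t, s):
    if s < t:
        return (t - s) / t
    if s > t:
        return -(s - t) / s
    return 0.0

def sigmoid(t, s):
    if s < t:
        return math.sin(math.pi / 2 * (t - s) / t)
    if s > t:
        return -math.sin(math.pi / 2 * (s - t) / s)
    return 0.0

def same_ordering(op_a, op_b, samples=100000):
    """True if both operators rank every sampled pair of users identically."""
    random.seed(0)
    for _ in range(samples):
        ti, tj = random.uniform(0.01, 1.0), random.uniform(0.01, 1.0)
        si, sj = random.uniform(0.0, 1.0), random.uniform(0.0, 1.0)
        a = op_a(ti, si) - op_a(tj, sj)
        b = op_b(ti, si) - op_b(tj, sj)
        if (a > 0) != (b > 0) or (a == 0) != (b == 0):
            return False
    return True

print(same_ordering(relative, sigmoid))  # expected: True
```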

5 Limited output resolution

For the definition of the problem, we consider the number of possible scalar values as the resolution r. Also, under a certain resolution, each priority value p will have a resulting effective priority value S(p, r), understood as the minimum priority value that will have the same corresponding scalar as p. By applying the scalar formula from Figure 4 and composing it with its own inverse function, we can calculate this effective priority value as:

\[ S(p, r) = 2 \cdot \frac{scalar(p, r)}{r} - 1 \tag{17} \]
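A direct transcription of this definition in Python (illustrative names, same clamping assumption as before) shows how nearby priorities collapse onto the same effective value at low resolution:

```python
def scalar(p, r):
    """Map a priority p in [-1, 1] onto one of r discrete values (0 .. r-1)."""
    return min(int((p + 1) / 2 * r), r - 1)

def effective_priority(p, r):
    """S(p, r): the lowest priority that shares p's scalar (eq. 17)."""
    return 2 * scalar(p, r) / r - 1

# With a 3-bit resolution (r = 8), 0.10 and 0.20 become indistinguishable.
print(effective_priority(0.10, 8), effective_priority(0.20, 8))  # 0.0 0.0
```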


5.1 Methodology

The output resolution problem is studied in three steps for each operator. First, we present a coarse-grained study of how all the possible input pairs (t, s) are divided into sets whose elements map onto the same priority value. We can argue that, for an operator, the bigger the set corresponding to a priority value p, the smaller the resolving power (capacity to distinguish between two users) around p, and vice versa. We call this the input density study.

The second approach is a fine-grained extension of the first. We observe that the priority operators are not defined over the full output range [−1, 1] for all possible input target values. When the input density is calculated for a priority value, it aggregates all the corresponding state values for each input target value (whether there is a corresponding output or not), averaging the data of the resulting analysis and hiding the local behavior of the operators. In this second approach we therefore study each target value, analyzing the sizes of the sets of state values that map onto the same output priority value. For a given operator, priority p, and target t, the bigger the set of state values corresponding to that p under t, the smaller the resolving power, and vice versa. We call this the input local density study.

Finally, we bring the input local density study to a semi-real scenario. We define a grid scenario with a time window, a resource dimension, and a fictitious average job size. Then, we determine the minimum output resolution required for that job to be significant for a certain user, i.e., to make sure that its corresponding priority value changes. This study is based on the previous step, as the input local density of a priority and target value pair can be understood as the minimum amount of normalized state that has to be added to a user's history to assure that its corresponding priority value changes. We call this the job size analysis.

In all cases the study focuses on output ranges that are significant to the system: around balance, where the state value of a user is close to its target; under target, when the state is far under the target; and over target, when the state is far over the target.

5.2 Input density analysis

Calculation method. For this analysis we needed to calculate the relationship between the input values corresponding to an effective priority value and the complete input range. The method follows a geometric approach when possible. In Figure 6 we can observe the effect of the discretization on the output of the Absolute operator: for each priority value p there is one horizontal surface related to the set of (t, s) pairs that produce that effective priority value. The area of each surface represents the relative size of that set and, as a consequence, what we are looking for: the input density corresponding to p. When the geometry of a surface was not simple enough, we used sampling to estimate the number of input values corresponding to the same priority value.


Fig. 6. 3D representation of effective priorities for the Absolute operator, output resolution 3 bits (8 values).

Fig. 7. Input density, Absolute operator, 3 bits, 8 values.

Input density results. In order to generate all the input density maps, we ran experiments for all resolutions between 8 and 1048576 values; to ease the description we will from now on refer to the number of bits needed to represent a given resolution, in this case 3 to 20 bits. As we increased the resolution we noticed something expected: the differences between the priority values became less and less significant. However, for the lower resolutions the operators kept a similar relationship in the same priority value areas. We have chosen 3 bits (8 values) to present the results, as it is enough to show the effects of low resolution on each operator.

The results for each operator are presented in the format shown in Figure 7, and the data for all operators is presented in the heat map in Figure 8. We can observe several things in this graph. In the first place, there is a symmetric behavior around balance for most of the operators (except for the Combined and SLURM). We also observe that the Relative operator presents the same input density for all its priority values, that the Sigmoid-2 presents the lowest density around balance and the highest around the over/under target cases, and that the Absolute operator presents the lowest density in the over/under target cases. This implies that the Sigmoid-2 operator should present the highest resolving power around balance and the lowest in the cases of over and under usage, while the Relative presents the same resolving power along the whole output spectrum.

Considerations about this analysis. The input density analysis gives a coarse-grained picture of the resolution problem: it presents roughly how the whole input is mapped onto the output. However, it fails to capture the particularities of the operators. As we will see in the next section, the operators present different input local density distributions for different target values.


Fig. 8. Input density comparison among all operators, 3 bits, 8 values.

Fig. 9. Aggregation of the input density analysis for different target values, Absolute operator. Resolution 3 bits.

As the density of a priority value is the normalized aggregation of the local input densities over the whole target range, higher values are combined with lower ones, averaging the final result; this effect is even more significant because the operators do not cover the full output range for all target values. Let us illustrate this with an example: the Absolute operator. According to the results in this section, it presents a high resolving power around the over/under target areas and a low one around balance. However, let us look at what happens for each target value (results derived from the next section). In Figure 9 we can see the aggregation of input local densities for 11 different targets between 0.0 and 1.0. The first observation is that, as expected, not all targets can generate the full priority range. What is more important, however, is that the resolving power is the same in all cases. This result seems contradictory to the one observed in Figure 7. The apparent divergence comes from the fact that the input density is the aggregation of the input local density along the target range, hiding the local behavior and the best-case and worst-case scenarios. We can conclude that the input density view gives an overall picture of the operator behavior but is incomplete without the per-target input local density study.

5.3 Input local density analysis

Calculation methods. In order to calculate the input local density for an operator, we use an inverse method. For an operator F, a priority p, target t, and resolution r, we compute the lowest and highest state values which produce the effective priority p for target t. The local density is the difference

between them. This can be expressed as:

\[ D(F, p, t, r) = |s_j - s_i| \;:\; S(p, r) = f \;\wedge\; (\forall\, s_i \leq s \leq s_j,\; F(t, s) = f) \;\wedge\; (\forall\, s < s_i \wedge s_j < s,\; F(t, s) \neq f) \tag{18} \]

In order to calculate s_j and s_i we use the inverse expression of the operators on the input s and the set of possible effective priority values for resolution r, P_r. The resulting operation is:

\[ D(F, p_i, t, r) = s_{i+1} - s_i = I_s(F, p_{i+1}, t) - I_s(F, p_i, t) \tag{19} \]

where:

\[ P_r = \left\{ p_i : i \in \mathbb{N}_r,\; p_i = -1 + (i - 1) \cdot \frac{2}{r} \right\} \tag{20} \]

(for example, P_4 = {−1, −0.5, 0, 0.5}) and the inverse function on s of operator F is:

\[ I_s(F, p, t) = s : F(t, s) = p \tag{21} \]
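As an illustration of this inverse method, the sketch below computes the input local density for the Relative operator, whose inverse on s has a simple closed form. The helper names are ours; other operators would need their own inverses (or a numerical inversion).

```python
def inverse_relative(p, t):
    """I_s for the Relative operator: the state s with relative(t, s) = p.

    Returns None when no state in [0, 1] produces priority p for this target
    (the operators do not cover the full output range for every target)."""
    if p <= -1.0:
        return None
    s = t * (1 - p) if p >= 0 else t / (1 + p)
    return s if 0.0 <= s <= 1.0 else None

def local_density(p_i, p_next, t):
    """D for one effective priority step [p_i, p_next) (eq. 19)."""
    s_i, s_next = inverse_relative(p_i, t), inverse_relative(p_next, t)
    if s_i is None or s_next is None:
        return None
    return abs(s_next - s_i)

def effective_levels(r):
    """P_r: the r possible effective priority values (eq. 20)."""
    return [-1 + (i - 1) * 2 / r for i in range(1, r + 1)]

# 3-bit resolution (r = 8), target 0.5: density of each priority step.
levels = effective_levels(8)
for lo, hi in zip(levels, levels[1:]):
    print(round(lo, 2), local_density(lo, hi, t=0.5))
```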

Input local density results. For this study we ran experiments for all resolutions between 3 and 20 bits. The target range was divided into 10 values, [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]; as the functions are strictly increasing on the target, this is enough to appreciate the general tendency. This yields 180 operator profiles to be analyzed. One example of the aggregation of all the profiles for the Absolute operator and a resolution of 3 bits is shown in Figure 9. As the resolution increased we noticed something expected: the local densities became more and more similar for a given operator, policy, and target. However, for the lower resolutions they kept a similar relationship in the same priority areas. We have chosen 3 bits for the resolution and 0.1 and 0.5 as the target values (as they represent an extreme and a middle point) to illustrate the behavior of the operators under low resolution.

In Figure 10 we can see one way to represent this data, the operator profile: an overlay of the input density and the operator plot for the defined target value. It correlates graphically the priority value, input target, input state, and input density, where we consider lower input local densities desirable since they indicate higher resolving power. The profiles are interpreted as follows: the horizontal axis represents the normalized state from 0.0 to 1.0, while the vertical axis covers the range of priority values from −1 to 1. We overlay the different operators in different colors: the priority values and their corresponding state values are shown as a plotted line in the corresponding color, and on the vertical axis the input density of each priority value is shown as a horizontal bar in the corresponding color (its unit is normalized state). In this representation it is possible to observe the different operator function shapes while comparing the different input densities.


Fig. 10. Operator profile overlay, all operators. Resolution 3 bits, target value 0.1.

Fig. 11. Operator profile overlay, all operators. Resolution 3 bits, target value 0.5.

In Figure 11 we can observe the contrast between the Sigmoid-2 and the Relative-2: the Sigmoid shape is designed to offer the largest slope (and thus the smallest local density) around balance, while being flatter at the extremes. As we compare the graphs in Figures 10 and 11, we observe that the range of priority values present in the graphs shrinks as the target increases. This is due to how the operator functions are built: none of them fully covers the range [−1, 1]. As the target value increases, the most negative value possible for over-state becomes closer to 0.0 (for example, with the Absolute operator, target 0.5 and state 1.0, the minimum possible priority value is −0.5). This has a first consequence for some of the operators: as the target value increases, the same input range [0, 1] is mapped onto a smaller output range, increasing the overall input local density and thus the average size of the bars in the graphs. One interesting point is that the Absolute operator presents a constant density in all graphs and for all priority values; this is due to the subtraction-only operation that composes it.

In order to ease the density analysis we mapped the density values onto heat maps which oppose the priority values and the operators. The results are Figures 12 and 13, in which a redder/darker color indicates a higher local density (and lower resolving power) for a pair of operator and target value, while a yellower/lighter color implies a lower local density (and higher resolving power). Observing the graphs, we confirm that certain operators have lower input local densities in different cases: in balance situations the Sigmoid and Sigmoid-2 present smaller densities; in under-target situations the Relative-2 operator presents smaller densities; and in over-target situations, again, the Sigmoid and Sigmoid-2 operators present lower densities, although not by much. An intuitive conclusion of this section is that the Sigmoid family presents a higher resolving power for the balance and over-usage cases, while the Relative operator presents a higher resolving power for the under-usage cases.


Fig. 12. Input local density heat map. Resolution 3 bits, target value 0.1.

Fig. 13. Input local density heat map. Resolution 3 bits, target value 0.5.

5.4 Job Size Analysis

In this study we bring our analysis to a semi-real environment: a grid scenario with a time window (representing the total historical usage taken into account) and a set of resources. The idea is to understand how big a job has to be in order to be significant for a user, i.e., to change its effective priority value to the next possible one. This shows how many decisions would be made with the same priority values (even though new jobs have completed), or how many bits (minimum resolution) would be required for a single job to be significant.

Calculation Method. In order to do this analysis we start from the input local density study. For a given resolution r, operator F, priority p, and target t, D(F, p, t, r) can be seen as the minimum amount the state has to increase to change the output effective priority. This step can be expressed as:

\[ M(F, t, s, r) = m \;:\; m \in (0, 1] \;\wedge\; S(F(t, s), r) \neq S(F(t, s + m), r) \;\wedge\; \nexists\, n : 0 < n < m \wedge S(F(t, s), r) \neq S(F(t, s + n), r) \tag{22} \]
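The same minimum step can also be found numerically for any operator; the brute-force scan below is only a sketch (the operator argument is any of the illustrative implementations above, and the scan granularity is an arbitrary assumption):

```python
def min_significant_step(op, t, s, r, grain=1e-5):
    """Smallest state increase m that changes the effective priority of
    op(t, s) at resolution r (a brute-force sketch of eq. 22)."""
    def to_scalar(p):
        return min(int((p + 1) / 2 * r), r - 1)
    base = to_scalar(op(t, s))
    m = grain
    while m <= 1.0 and s + m <= 1.0:
        if to_scalar(op(t, s + m)) != base:
            return m
        m += grain
    return None  # no step within the state range changes the effective priority

# Example: Absolute operator, target 0.5, state 0.4, 3-bit resolution (r = 8).
print(min_significant_step(lambda t, s: t - s, t=0.5, s=0.4, r=8))  # ~0.1
```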

This minimum step can be translated to a job size; however, the translation is not trivial. The impact of a job of a given size on the normalized state depends on the normalized state itself, as the latter represents how much of the state pool corresponds to this user: a job of a certain size will have a bigger impact for a user with a smaller normalized state than for one with a larger one. The current state and the state of a user after adding the length of a job can be expressed as:



\[ k = K(F, p_i, t, r, s_i^d) = \frac{-(1 + s_i^d) + \sqrt{(1 + s_i^d)^2 + 4 \cdot D(F, p_i, t, r)}}{2} \tag{25} \]

This final result expresses k as a function of the operator, the previous state, the target, the priority value, and the resolution. It is used to calculate the minimum job size that has to be submitted in order to change a user's priority value, according to its current state and target and the corresponding operator and resolution. Now that we can find the job size in any case, we established a strategy for the calculations on each operator and resolution: for each priority value, among all the possible k for each target, we take the biggest one (as a bigger k means a smaller job). This allows us to calculate the minimum job size that would assure a change in the effective priority value.
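The closed form can be evaluated directly. The snippet below is only a sketch with illustrative inputs; converting the resulting k into a job size in hours relies on the state-pool relation of the preceding derivation, which is not reproduced here.

```python
import math

def k_from_density(s_d, density):
    """Evaluate eq. (25): the positive root of k**2 + (1 + s_d)*k - D = 0."""
    return (-(1 + s_d) + math.sqrt((1 + s_d) ** 2 + 4 * density)) / 2

# Illustrative inputs: a user at normalized state 0.1 whose local density
# (minimum normalized-state increase) at the current priority step is 0.02.
print(k_from_density(s_d=0.1, density=0.02))
```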

Job size analysis results. We took Swegrid [11] as a reference for this study to create a synthetic example in which the time window is 1 year and the managed resources are 600 nodes. This implies that the total state pool of the time window is U^d = 8760 h/node · 600 nodes = 5,256,000 h. Using that U^d and the method to obtain k, we calculated the corresponding job sizes. A bigger job size implies a smaller resolving power, and a smaller job size implies a bigger resolving power. For the priority values to study, we focused our calculations on three contexts: around balance, where the state is close to the target and the priority values are in the range [−0.25, 0.25]; under target, where the priority values are in the range [0.6, 0.9]; and over target, where the priority values are in the range [−0.9, −0.6].


Fig. 14. Job size in hours required to alter a user's priority value when the state is close to the target.

Fig. 15. Job size in hours required to alter a user's priority value when the state is under target.

We can observe the results of the around-balance study in Figure 14. This heat map represents the worst case (biggest) among the minimum job sizes required to make a difference in the priority value for each resolution with each operator. Bigger jobs (and thus smaller resolving power) are represented with a darker/redder color, while smaller jobs (and thus bigger resolving power) imply a lighter/yellower color. We limited the color range at a job size of 10,000 h and used a logarithmic scale for the color distribution. As we can observe, the ordering of the operators is independent of the resolution: the Absolute operator needs bigger jobs to make a difference, while the Sigmoid-2 has enough resolving power around balance to always require a smaller job than the rest of the operators. Ordering the operators by job size from better to worse resolving power we obtain: Sigmoid-2, Sigmoid, SLURM, Relative, Relative-2, Combined, Absolute. This ordering does not change as we increase the resolution although, as expected, the job size decreases as the resolution increases.

In the over-target scenario presented in Figure 16 we observe the following ordering, from better to worse resolving power, for all resolutions: Sigmoid-2, Sigmoid, SLURM, Relative, Absolute, Relative-2, and Combined. In the under-target scenario presented in Figure 15 we observe the following ordering for all resolutions: Relative-2, Combined, SLURM, Sigmoid, Relative, and Sigmoid-2.

At this point we had a good vision of the impact of a certain job for each operator in the studied cases. However, we wanted to understand it from a different point of view: for a given job size, what is the minimum resolution in bits needed for an operator to make a difference?


Operator      Balance   Over t.   Under t.   Overall
Absolute      12        14        12         14
Relative      9         14        9          14
Relative-2    9         15        8          15
Sigmoid       9         12        9          12
Sigmoid-2     6         11        9          11
SLURM         9         13        8          13
Combined      10        15        8          15

Fig. 16. Job size in hours required to alter a user's priority value when the state is over target.

Fig. 17. Output resolution bits required for a job of 2,000 h to be significant for Tw = 1 year and Rs = 600 nodes. Less is better.

We chose an average job size of 2,000 h (200 h, 10 nodes). This parameter only affects the gross number of bits obtained for each operator; it does not change the relative relationship between the operators. By parsing Figures 14, 15, and 16 we produced the results of Figure 17. We can see how the Sigmoid-2 presents a clear advantage in situations of balance. The Sigmoid-2 is the operator requiring the fewest bits in the balance and over-target cases. For under target, the Relative-2, SLURM, and Combined perform better, but only by a small difference. Looking at the overall picture, the Sigmoid-2 operator is the one requiring the fewest bits in balance and over target, while for under target it is just one bit away from the best.

6 Prior and related work

The need for this work was detected in earlier efforts. In [6] and [12], the Relative family and the Combined operator were added, as noted in Section 3. The system's robustness and capabilities were tested and, as a side product, it was discovered that the Absolute and Relative operators generated different orderings when some users did not submit jobs. Also, convergence delays appeared at the low resolutions required for large-scale experiments. Those findings are the motivation for this paper.

Fairshare priority is present as a decision factor in well-known schedulers. SLURM ([12] shows how our system can substitute SLURM's fairshare engine), like the rest of these schedulers, is not meant to deal with hierarchies as deep as Karma's ([8], [7]), so the output resolution of its operator is not constrained. The SLURM operator, as studied in this paper, complies with all our desired characteristics. Maui [13] presents similar characteristics but uses a version of the Relative as its operator. Another well-known example is LSF, used at CERN [14], which chooses the Absolute operator [15].


The Karma hierarchical system is prepared for much deeper tree schemas than current schedulers, and to find work related to priority resolution we had to look into another scheduling field in which memory is limited: real-time schedulers for communication and embedded systems. In the study of the rate monotonic scheduling algorithm in [16], this problem is named priority granularity: "fewer priority levels available than there are task periods", similar to the idea of a number of users with very similar priorities and limited resolution around them. In [17], a solution is proposed that creates an exponential grid to distribute the priorities, increasing the resolving power around a certain desired area, very similar to what we intended when designing the Sigmoid operator. Finally, [18] presents how low output resolution can negatively affect the utilization of resources, as it alters the best possible decisions; for this case a logarithmic grid is presented. These studies focus on the final priority value without taking into account the relationship between the input of the prioritization system and its output. In our work we analyze how the system states (input to the priority operator) are affected by the limitation on resolution, which allows the system to be modified according to the desired resolving-power behavior.

7 Future work

This study provides a mathematical understanding of the operators in fairshare prioritization scheduling. It allows us to establish boundaries on the mathematical behavior of the functions; however, it would be desirable to confront the results with the Karma system [7]: to establish a simulation environment in which all the operators are tested in a close-to-real situation, with different output resolutions, tree models (different target values, target tree shapes, and depths), and sources of system noise (as presented in [6]). In this context it would be interesting to study a possible adaptive operator: a dynamic recommendation of the best-suited operator and minimum output resolution depending on the overall state of the system.

We also believe it would be interesting to test the Relative and Sigmoid operator families on the SLURM scheduling system. Our studies indicate that these operators may have a better resolving power than the SLURM operator while showing similar general characteristics. Bringing those conclusions to SLURM, which has a different data pipeline than Karma, would shed light on the compatibility of our operators with other scheduling strategies.

In a different line of thinking, as presented in [7], the Karma prioritization engine can be used in scenarios that are not strictly concerned with the ordering of elements but also with the magnitude of the priority values produced by the system. It would be desirable to understand the impact of the different operators' output value distributions on such magnitude-aware systems.


8 Conclusions

We achieved a deeper understanding of the role of priority operators in fairshare prioritization scheduling. We established their core definition and the desired characteristics that emanate from the studied problem. This was followed by a review of the existing operators and the contribution of a new one: the Sigmoid operator.

To evaluate the desired characteristics we established a set of methods and mathematical proofs that allow the compliance of existing and future operators to be tested. Following these methods, all the presented operators were tested to determine which ones comply with the desired characteristics. The results: while the Relative, Relative-n, Sigmoid, Sigmoid-n, and SLURM operators comply with all of them, the Absolute and Combined do not comply with some.

At this point our study moved into another dimension: understanding the impact of the limited output resolution on the whole prioritization. The first contribution was a three-step methodology to evaluate its impact on the resolving power of each operator: the study of the input density, the input local density, and the significant job size. We reviewed the potential of the different possibilities and understood the need to go through all three steps in order to have, first, an overall view; second, a detailed local understanding of the operator behavior over its whole input domain; and finally, a real-world quantification of the relationship between output resolution and resolving power for each operator.

By using this methodology, we were able to compare all the listed operators. We established that the Sigmoid-2 is the one which, overall, has the best resolving power with the least output resolution, being the best in situations of balance and over target. In the case of under target, the Relative-2, SLURM, and Combined operators are the ones with a better resolving power at low output resolution (although very close to the Sigmoid-2). We would like to highlight that the SLURM operator, taken from the SLURM scheduling system, performs average in the balance and over-target situations. One interesting conclusion is that the Sigmoid family, a contribution of this paper, presents the best overall characteristics as a fairshare priority operator. Finally, this paper establishes a set of research lines to bring these results into the Karma prioritization system.

Acknowledgments

The authors extend their gratitude to Daniel Espling for prior work and tech- nical support, Cristian Klein for feedback and Tomas Forsman for technical assistance. Financial support for the project is provided by the Swedish Gov- ernment’s strategic research effort eSSENCE and the Swedish Research Council (VR) under contract number C0590801 for the project Cloud Control.


References

1. Kay, J., Lauder, P.: A fair share scheduler. Communications of the ACM 31(1) (1988) 44–55
2. Yoo, A., Jette, M., Grondona, M.: SLURM: Simple Linux Utility for Resource Management. In Feitelson, D., Rudolph, L., Schwiegelshohn, U., eds.: Job Scheduling Strategies for Parallel Processing. Volume 2862 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg (2003) 44–60
3. Maui Cluster Scheduler: http://www.adaptivecomputing.com/products/open-source/maui/ (January 2014)
4. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann (2004)
5. Elmroth, E., Gardfjäll, P.: Design and evaluation of a decentralized system for Grid-wide fairshare scheduling. In H. Stockinger et al., ed.: Proceedings of e-Science 2005, IEEE CS Press (2005) 221–229
6. Östberg, P.-O., Espling, D., Elmroth, E.: Decentralized scalable fairshare scheduling. Future Generation Computer Systems - The International Journal of Grid Computing and eScience 29 (2013) 130–143
7. Östberg, P.-O., Elmroth, E.: Decentralized prioritization-based management systems for distributed computing. In: eScience (eScience), 2013 IEEE 9th International Conference on, IEEE (2013) 228–237
8. Slurm: Multifactor priority plugin - simplified fair-share formula. https://computing.llnl.gov/linux/slurm/priority_multifactor.html (January 2014)
9. Rodrigo, G.P.: Proof of compliance for the relative operator on the proportional distribution of unused share in an ordering fairshare system. http://www8.cs.umu.se/~gonzalo/ShareDemonstration.pdf (January 2014) (To be published as a technical report)
10. Rodrigo, G.P.: Establishing the equivalence between operators. http://www8.cs.umu.se/~gonzalo/EquivalenceDemonstration.pdf (January 2014) (To be published as a technical report)
11. Swegrid: Swegrid organization. http://snicdocs.nsc.liu.se/wiki/SweGrid (January 2014)
12. Espling, D., Östberg, P.-O., Elmroth, E.: Integration and evaluation of decentralized fairshare prioritization (Aequus). (2013) (Submitted for publication)
13. Jackson, D., Snell, Q., Clement, M.: Core algorithms of the Maui scheduler. In: Job Scheduling Strategies for Parallel Processing, Springer (2001) 87–102
14. CERN: IT services - batch service. http://information-technology.web.cern.ch/services/batch (January 2014)
15. LSF: Fairshare scheduling. http://www.ccs.miami.edu/hpc/lsf/7.0.6/admin/fairshare.html (January 2014)
16. Lehoczky, J., Sha, L., Ding, Y.: The rate monotonic scheduling algorithm: Exact characterization and average case behavior. In: Real Time Systems Symposium, 1989, Proceedings, IEEE (1989) 166–171
17. Sha, L., Lehoczky, J.P., Rajkumar, R.: Task scheduling in distributed real-time systems. In: Robotics and IECON'87 Conferences, International Society for Optics and Photonics (1987) 909–917
18. Lehoczky, J.P., Sha, L.: Performance of real-time bus scheduling algorithms. ACM SIGMETRICS Performance Evaluation Review 14(1) (1986) 44–53

Paper III

A2L2: An Application Aware Flexible HPC Scheduling Model for Low-Latency Allocation
Gonzalo P. Rodrigo, Per-Olov Östberg, Erik Elmroth, and Lavanya Ramakrishnan

In Proceedings of the 8th International Workshop on Virtualization Techno- logies in Distributed Computing, pp. 11-19. ACM, 2015.

A2L2: an Application Aware Flexible HPC Scheduling Model for Low-Latency Allocation

Gonzalo P. Rodrigo, Per-Olov Östberg, Erik Elmroth
Dept. Computing Science, Umeå University
SE-901 87, Umeå, Sweden
{gonzalo, p-o, elmroth}@cs.umu.se

Lavanya Ramakrishnan
Lawrence Berkeley National Lab
Berkeley, CA 94720, USA
[email protected]

ABSTRACT
High-performance computing (HPC) is focused on providing large-scale compute capacity to scientific applications. HPC schedulers tend to be optimized for large parallel batch jobs and, as such, often overlook the requirements of other scientific applications. In this work, we propose a cloud-inspired HPC scheduling model that aims to capture application performance and requirement models (Application Aware - A2) and dynamically resize malleable application resource allocations to be able to support applications with critical performance or deadline requirements (Low Latency allocation - L2). The proposed model incorporates measures to improve data-intensive applications' performance on HPC systems and is derived from a set of cloud scheduling techniques that are identified as applicable in HPC environments. The model places special focus on dynamically malleable applications: data-intensive applications that support dynamic resource allocation without incurring severe performance penalties, which are proposed for fine-grained backfilling and dynamic resource allocation control without job preemption.

Categories and Subject Descriptors
D4.1 [Operating Systems]: Process Management—Scheduling

Keywords
Scheduling; job; HPC; malleable; applications; low-latency

1. INTRODUCTION
High-performance computing (HPC) and cloud computing are paradigms focused on large-scale resource provisioning through aggregation of resources in data centers. Although similar at a high level, the paradigms have fundamental differences in system objectives and target applications that affect the design of their infrastructure and schedulers.

HPC systems serve scientific applications, traditionally composed of tightly coupled parallel jobs, and use batch schedulers focused on system utilization [5]. Cloud computing offers variable, on-demand compute capacity to run different types of applications that often face great variability in load and require resource and infrastructure elasticity [4]. As the data-intensive applications that are common in cloud workloads [31] are becoming increasingly common in HPC [40], a trend towards increased overlap between the paradigms can be observed.

In this work, we focus on scientific applications in HPC systems and, among these, in particular on data-intensive applications. As this application type is characterized by minimal communication between tasks and I/O centric performance models [41], application input data can often be rearranged among the processing nodes without affecting application result or performance (as in, e.g., MapReduce applications [9]). This property allows us to dynamically change application resource allocations without incurring significant performance penalties (lost work or low resource utilization). We call these applications dynamically malleable and observe that they provide schedulers interesting opportunities for increased scheduling performance: application malleability provides schedulers the ability to make small adjustments to the size of jobs, which allows for tighter packing of jobs in backfilling (improved resource utilization). In addition, the ability to downsize jobs allows schedulers to temporarily free (some of the) resources allocated to dynamically malleable applications, which can be used to make room to schedule time-critical jobs with short response times. This enables allocation of resources to run event-synchronized computations (e.g., real-time processing of data from external experiments) without advance reservations of resources (known to reduce resource utilization [1]).

However, this approach also offers challenges: in order to efficiently alter application resource assignments, schedulers require application-level control to adapt to resource changes [34]. Also, to allow schedulers to decide on resource allocations aimed at a specific performance target, users should provide performance expectations rather than resource estimations (e.g., execution deadlines instead of job runtime estimations [12]). In addition, data-intensive applications can expect high performance improvements from data locality (data on compute nodes) [34]. This is hard to realize on HPC systems that typically provide high-performance shared (distributed) file systems rather than local node storage [26]. Finally, today's batch schedulers do not capture the performance and requirement models of data-intensive

Applications. HPC: Batch, data-intensive, parallel tightly coupled jobs. Cloud: Batch jobs, data-intensive applications, services.
Platform. HPC: Bare metal. Cloud: Bare metal, virtualization, execution frameworks.
Hardware. HPC: Homogeneous (not in new systems [8]), low-latency networks, Burst Buffer or I/O nodes. Cloud: Heterogeneous, commodity, large RAM, storage on node.
Target. HPC: High utilization, low wait times, low cost. Cloud: Availability, no wait for services, low cost.
Technique. HPC: Job ordering, FCFS, backfilling, monolithic, static partition. Cloud: Placement, feedback, elasticity, migration, monolithic, static partition, two-level, shared state.

Table 1: Applications, objectives, and scheduler techniques in HPC and cloud environments. applications, often resulting in sub-optimal decisions in sche- data. The processing of each quantum is independent and, duling of data-intensive applications on HPC systems. as a consequence, processing n quanta will require the same In this paper we present A2L2, an Application Aware amount of compute time (under the same resource circum- flexible HPC scheduling model for Low Latency allocation stances), no matter whether the computations are done in of resources for jobs that adapts cloud techniques to tackle parallel or serially [41]. Additionally, as each quantum is HPC scheduling challenges. The model is considered ap- independent, runtime changes in the job geometry are pos- plication aware as it is built on a two-level cloud inspired sible without losing intermediate results (dynamically mal- model [33] in which each type of application has an individ- leable). However, the application performance model differs ual scheduler that captures its performance and requirement from that of tightly coupled parallel batch jobs and appli- models. The model is considered to be flexible as it uses dy- cation level knowledge is required to control application re- namically malleable applications to backfill resource gaps source allocations. (later described as flexible backfill technique). Finally, the model aims to support low-latency allocations by using re- 2.1 Performance model source expropriation to free resources for applications with The performance of data-intensive applications is by def- time-sensitive or real-time needs. The main contributions of inition I/O bound. In cloud environments, distributed file this paper are: systems that divide and place data on compute nodes are A2L2, an HPC application-aware scheduling model that commonly used to achieve high throughput for data-intensive • makes use of dynamically malleable applications to en- applications [34]. As such, there are two possible perfor- able flexible backfilling, low-latency scheduling of jobs, mance models depending on the relationship between the and performance-oriented scheduling of data-intensive size of the input data and the compute node: applications. Storage-centric: Input data does not fit in node mem- A discussion of dynamic management and control of • ory and is staged onto the local node storage system. • malleable applications in HPC environments. Hadoop [34] is an example of an execution framework built around compute node storage based distributed A comparative discussion of scheduling techniques appli- file systems. • cable to both HPC and cloud computing.

The remainder of this paper is structured as follows. Sec- Memory-centric: The input data of all application stages tion 2 presents the potential and challenges of managing • fit in node memory and no local node storage is needed. dynamically malleable applications. Section 3 provides a Spark [42] is an example of an execution framework comparative overview of cloud and HPC resource provision- specialized for this kind of applications that creates a ing (focused on scheduling and placement mechanisms) and memory-based distributed file system. identifies a set of key infrastructure and application charac- teristics that impact design of such mechanisms. Section 4 As I/O operations on memory are faster, memory-centric presents A2L2, a proposed scheduling model aimed to im- applications offer an overall better data processing through- prove the flexibility and efficiency of HPC environments by put than storage-centric applications. However, the input use of cloud-style placement techniques. Section 5 provides dataset size is limited by the total available node mem- a summary of identified challenges that must be overcome ory, memory-centric applications may require more compute to realize this model and proposes a way forward in this nodes to process the same input dataset. If not enough nodes work. Section 6 outlines a brief summary of related work are available, memory-centric applications have to load more and Section 7 concludes the paper. input data on the main memory of the compute nodes, re- ducing system overall performance [42]. 2. DYNAMICALLY MALLEABLE However, the key performance enhancement mechanism Data-intensive applications are typically organized as work- of data intensive clouds environments (distributed storage) flows and characterized by significant I/O operations. They cannot natively be used in HPC as compute nodes typically benefit from parallelism but are not necessarily tightly cou- do not have local storage and large fast shared file systems pled. In this work, we focus on what we call dynamically may not offer optimal performance for an application’s data malleable applications, where input data can be divided in access patterns [26]. In Section 4.3.2 we present how A2L2 a minimum unit (quantum) and output data only depends attempts to bring cloud’s performance improvement tech- on a specific operation or set of operations on that input niques to HPC.


12 2.2 Flexibility and management The degree of parallelism can be changed in dynamically malleable applications, impacting the overall runtime ac- cordingly. As we present in the next sections, this flexi- bility can be used to meet scheduler objectives. However, this flexibility requires applications to be aware of changes in resources state and availability, and claim and release re- sources as needed during runtime. In data intensive cloud frameworks (e.g. Hadoop [34]), each job has a master task controlling the rest. The frameworks communicate with the master tasks to coordinate changes in resource allocation. The scientific community has developed similar tools for Figure 1: Scheduling of five batch jobs with FCFS certain dynamically malleable applications in HPC, but they and backfilling. Job J5 can be backfilled while J4 are not integrated with resource management system sche- can not (as it would delay J3, which arrived before). dulers and, as a consequence, do not support graceful re- scaling of applications during runtime. incorporate checkpointing techniques that can be used to restart jobs with reduced lost work after preemption. 3. HPC AND CLOUD SYSTEMS Figure 1 illustrates scheduling of jobs using FCFS with This work proposes a new scheduling model that focuses backfilling. Each box represents a job (y-axis represents the on application-aware scheduling to enable low-latency (on- fraction of the total nodes required by the job; x-axis repre- demand) scheduling of applications in HPC environments. sents the estimated runtime). By following FCFS, J3 starts The first step is to understand what cloud techniques en- after J2, leaving a significant amount of resources free to run able these features, and compare these to the regular HPC J4 or J5 when they arrive. The backfilling algorithm finds scheduling. Table 1 summarizes the observed differences and that, according to its estimated runtime, J4 would end after similarities between HPC and cloud scheduling. J3’s estimated start time, while J5 would end before. As a consequence J5 is backfilled. Classic batch schedulers take 3.1 Batch centric HPC scheduling into account the requested resources (width) and estimated HPC systems focus on computing needs of scientific appli- runtime (length) of jobs in a static way. This limits backfill- cations. The traditional HPC application is often described ing as it can only be performed if a match with a correspond- as a large simulation that run tightly coupled parallel jobs ing geometry can be fit with a resource gap. AL2L aims to [35], allocates large numbers of nodes for long periods of address this limitation by adapting dynamically malleable time, and requires low-latency synchronized inter-process jobs to the available resource windows. communication across nodes. Lately however, data anal- ysis applications have become more present on HPC sys- 3.2 Application-aware cloud placement tems. These applications do not have strong coupling re- Cloud computing emerged as a way to sell spare compute quirements, are easier to parallelize, and may in many cases capacity from under-utilized infrastructures [4] and has since be dynamically malleable (the degree of parallelism can be evolved into a model that offers variable, on-demand, ac- adapted at runtime) [9]. 
countable, and instantly available resources to run different HPC system architectures are deigned to support the sci- types of applications using different delivery service models entific applications characteristics (e.g., provide a high [23], e.g. Infrastructure as a service (IaaS), platform as a enough degree of parallelism to simulate a certain model) service (PaaS) and software as a service (SaaS). Services, and to offer large amounts of compute capacity to large user batch jobs, and data-intensive applications are some of the bases while keeping the cost per compute time-unit low. The types of applications that are common in cloud data centers resulting systems are typically very efficient in terms of ope- [31]. Typically, parallel cloud applications are not as tightly rational, energy, and procurement costs, but the support for coupled as in HPC environments [38] and very variable levels diverse applications models is often limited. Typically HPC of utilization of the allocated resources are observed [13]. infrastructures are relatively homogeneous (albeit less so in The NIST definition of cloud computing captures its ob- newer systems [8]), have no storage on compute nodes, and jectives: no wait time to provide resources to services, dy- use synchronized networks and high performance parallel namically increase/decrease resources allocated to an appli- storage systems. cation, and support for diverse applications and platforms at HPC schedulers use a combination of techniques aimed the lowest cost possible. To comply with this definition cloud to satisfy the system’s objectives. First Come First Serve computing brings a series of required infrastructure charac- (FCFS) algorithms execute jobs in arrival order until no teristics to the data center: commodity hardware, nodes more jobs are waiting, or there are not enough available re- with large RAM capacity (to support multiple VMs on the sources to run the first job in the queue [15]. FCFS is often same node) and high-speed networks (but not low-latency, complemented with Backfilling [27], a scheduling technique like in HPC systems). Cloud applications also present var- aimed to increase utilization by searching in the waiting ious scheduling challenges. Some applications must be run queue for jobs whose resource requirements and estimated immediately or at deadlines (e.g., services), allocation on- runtimes can be met using available resources without de- demand imposes the use of prediction methods to perform laying the start time of jobs found earlier in the waiting resource planning [4] and the performance of an application queue. Job prioritization is achieved by adding techniques is hard to predict as it depends on the code itself, the sup- such as fairshare [29] or priority queues [5]. Also, jobs can ported load and the presence of other applications hosted on

79

However, cloud environments typically also offer opportunities and tools not present in HPC environments. For example, applications can be migrated between compute nodes [4] and applications not fully using their allocated resource capacity can sometimes be overbooked without noticeable performance degradation [36].
Scheduling in a cloud datacenter starts as a placement decision. When more capacity is needed by an application, it can be allocated a larger share of its host machine (vertical elasticity) or increase the number of instances of the application on other machines (horizontal elasticity). Placement, to optimize a set of target functions (e.g., energy efficiency, resource utilization, quality of service) while satisfying a set of constraints (e.g., memory or affinity requirements of a VM) by assigning resources to applications, is an NP-hard problem [24].
Compared to HPC environments, the variability of cloud applications can be seen to bring different performance models and requirements to the cloud data center; e.g., batch jobs can wait to be executed, while services can not; batch job performance is measured by execution time while services focus on quality of service aspects such as response time. There are different approaches to capture application models in a scheduler [33]; with A2L2, we borrow a construct from cloud environments: two-level scheduling, i.e., there is one scheduler per application type that interacts with a single resource manager which governs how the resources are assigned to each scheduler.
3.3 HPC trends
Currently, the HPC community is striving towards Exascale, i.e., systems with Exaflop peak compute capacity. When reaching this level of capacity, the number of cores per node is expected to increase significantly [6] while node memory capacity and I/O latency and bandwidth are expected to not be able to grow at the same pace (due to power consumption and technology limitations, respectively). Thus, the gap between CPU and I/O capacity is expected to increase, and I/O to become a performance cap for certain applications. As a consequence, a set of solutions for reducing I/O limitations are being investigated for HPC systems:
• Burst buffers on selected compute nodes: solid state memory that behaves like a cache level between main memory and the external I/O systems, with larger capacity than the node main memory. The I/O latency of the burst buffer is expected to be significantly smaller than the parallel file system's. Applications may decide to use this burst buffer as a local storage system [21].
• I/O dedicated nodes, which present a similar function to the burst buffers but outside the compute nodes, with higher capacity and slightly larger latency [6].
Burst buffers can bring fast storage to (or close to) HPC nodes (i.e., dedicated analysis nodes), which can increase the performance of data-intensive applications as they are often I/O limited. As a side effect, these proposed changes reduce the differences between cloud and HPC environments as some nodes present storage and many cloud infrastructures already incorporate SSDs in their systems [16]. In Section 4.3.2 we explore the idea of using burst buffers in the context of the A2L2 model to increase data-intensive application performance in HPC environments.
Figure 2: A2L2 with three schedulers. The low-latency scheduler requests expropriated resources, which are taken from the dynamically malleable applications.
4. A2L2 SCHEDULING MODEL
Section 3 compares HPC and cloud systems, identifying the methods used in cloud to tackle the challenges induced by the presence of data-intensive applications. In this section we discuss techniques to realize A2L2: a cloud-inspired scheduling model aimed to enable application-aware scheduling of heterogeneous workloads that takes advantage of the presence of dynamically malleable applications to improve utilization and enable low-latency allocation of resources. In Section 4.1 we present the model and its underlying challenges.
4.1 Application-aware scheduling
The core of the A2L2 approach is a two-level resource manager model that serves multiple schedulers at the same time. Each application is managed by a scheduler specific for that application type that captures the application's performance characteristics and requirements. The resource manager controls the resource allocation among schedulers. An example with three schedulers is presented in Figure 2. This schema is a hybrid between the two-level and shared-state approaches presented in [33] to schedule heterogeneous workloads (services, batch jobs, workflows) on cloud infrastructures. At the core of our approach is the resource manager, which requires the following system characteristics to enable application-aware scheduling and the other features presented in this work:
• More than one scheduler uses the resource manager.
• The resource manager offers a stable and consistent view of the state of the resources to all schedulers.
• All resource allocations follow a request-and-offer resource protocol jointly controlled by the resource manager and the schedulers. Periodically, the resource manager allows all the schedulers to request resources at the same time. This process has two stages. First, schedulers request free resources to run an application and, if not all requests can be satisfied, a conflict resolving method is applied (described later). Then, the resource borrowing phase starts: schedulers are offered any remaining free resources to increase the allocation of already running applications. Borrowed resources are considered free in the next request phase and it is the responsibility of the allocating scheduler to de-allocate borrowed resources when needed.


• Dynamically malleable application schedulers must support resource expropriation, i.e., de-allocation of resources at the request of the resource manager.
• Schedulers can request expropriated resources to run time-critical applications if there are not enough free resources to run them.
In our model, each scheduler is isolated from the rest; their decisions are made solely on their individual performance models and policies. The resource manager controls the resources assigned to each scheduler. This control is enforced by individual decisions to solve resource request conflicts between schedulers (conflicts arise when there are not enough resources to satisfy all the resource requests). The aggregated effect of all decisions is intended to produce a resource sharing policy. These are some of the resource sharing policies to be explored in our work:
• Weighted random: Probabilities for each scheduler are configured. Schedulers with higher probability will submit more jobs regardless of their resource time consumption.
• Fairshare: Each scheduler should be able to use a predefined share of the resource time for a given time window (e.g., 20% of the produced core-hours in one day). The resource manager keeps track of the past consumed resource time and decisions are aimed to bring the system to the target shares [29]. Schedulers' shares can be adjusted to different values in periods of time to enforce a temporary higher presence of certain types of applications.
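As a rough illustration of how such a sharing policy could drive the conflict resolution step of the request phase, the sketch below grants conflicting requests to the scheduler that is furthest below its target fairshare. The class layout, the request format, and the bookkeeping of consumed core-hours are assumptions made for illustration; the paper does not prescribe an implementation.

    # Minimal sketch of one request phase in an A2L2-style resource manager.
    # Scheduler names, the Request type, and the fairshare bookkeeping are
    # illustrative assumptions, not the paper's design.
    from dataclasses import dataclass

    @dataclass
    class Request:
        scheduler: str   # which application scheduler is asking
        cores: int       # cores requested for a new application

    class ResourceManager:
        def __init__(self, total_cores, target_shares):
            self.free_cores = total_cores
            self.target_shares = target_shares               # e.g. {"batch": 0.6, "malleable": 0.4}
            self.consumed = {s: 0.0 for s in target_shares}  # core-hours used so far (updated as jobs run)

        def _most_underserved(self, schedulers):
            # Pick the scheduler furthest below its target share.
            total = sum(self.consumed.values()) or 1.0
            return min(schedulers,
                       key=lambda s: self.consumed[s] / total - self.target_shares[s])

        def request_phase(self, requests):
            granted, pending = [], list(requests)
            while pending and self.free_cores > 0:
                feasible = [r for r in pending if r.cores <= self.free_cores]
                if not feasible:
                    break
                # Conflict resolution by fairshare deficit among competing schedulers.
                winner = self._most_underserved({r.scheduler for r in feasible})
                req = next(r for r in feasible if r.scheduler == winner)
                self.free_cores -= req.cores
                granted.append(req)
                pending.remove(req)
            return granted, self.free_cores   # leftover cores go to the borrowing phase

    rm = ResourceManager(100, {"batch": 0.6, "malleable": 0.4})
    print(rm.request_phase([Request("batch", 60), Request("malleable", 60)]))

Other policies (e.g., weighted random) would only change the selection rule inside the conflict resolution step; the two-stage request/borrow cycle stays the same.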
This two-level model may produce clear benefits. Although at its core designed for a cloud-type usage scenario, high utilization is enforced by the schedulers, which are built around the HPC applications' performance models and the objectives of high utilization and capacity. The flexible backfilling may increase the utilization over classic backfilling. At the same time, the model separates how applications share the system from application-specific policies (classic batch schedulers aggregate the application-specific priorities, dissolving both the effect of resource sharing policies and individual application sharing).
In addition, this model supersedes the batch scheduler as it allows application-specific scheduling, i.e., the introduction of new types of applications does not require a change of the whole scheduler system (merely the addition of a new scheduler for that application type). Also, it provides a level of fault-tolerance as the parts are decoupled and can easily be replicated. It aims to enforce a usage share among applications through the different policies on the resource conflict resolution. Finally, flexible backfilling and resource expropriation have great potential for increasing resource utilization and adding capabilities to the system (more details in the next sections).
4.2 Dynamic allocation of resources
Classic HPC jobs receive resource allocations that remain constant throughout the entire job make-span. In contrast, cloud systems are capable of increasing and reducing the resource allocations of applications to cope with variations in load and demand. Use of dynamically malleable applications in dynamic resource management has the potential for disrupting how HPC resources are scheduled and enabling more flexible backfilling and resource expropriation for time-critical applications.
Figure 3: J1 and J2 borrow free resources (flexible backfilling). J2 returns the borrowed resources when the batch scheduler requests them.
4.2.1 Flexible backfilling
As presented in Section 3.1, resource fragmentation is a problem that can occur to batch schedulers. We propose the use of flexible backfilling: resource allocations for dynamically malleable jobs are increased (with little performance penalty) to use non-allocated resources without delaying other applications. At the end of the resource request phase, the resource manager re-offers unused (but not borrowed) resources to schedulers as borrowable resources that can be used to increase the allocations of already running applications. However, as borrowed resources are still considered available (free) for new application allocations, the resource manager may require schedulers to de-allocate borrowed resources at any time. The theoretical result of this technique is illustrated in Figure 3, where applications with changing resource allocations can be seen in contraposition to the classic batch scheduler example of Figure 1. In this example, two dynamically malleable applications (stacked on the upper part of the figure) reduce their resource allocations, which are used to schedule a streamed job. Once the job finishes, one of the applications from which the resources were expropriated is still running and the resources are returned to that application. Remaining resources are given to another dynamically malleable application.
Figure 4: Resources expropriated from two jobs. After the streamed job is completed, the expropriated resources are returned.
4.2.2 Low-latency allocation
Many scientific experiments produce large amounts of data that are stored for later analysis on HPC systems. However, some experiments would benefit from processing data while the experiment is running.


This can be achieved by aligning the experiment with an advance reservation of HPC resources. However, the use of advance reservations frequently leads to low utilization as they cannot fit the exact experiment time window [1]. Instead, a special type of job submission that aims for jobs to be run within a short time period is needed (equivalent to the on-demand nature of cloud): low-latency allocation of resources.
Low-latency allocation of resources is added to our model by using resource expropriation for temporary partial preemption of dynamically malleable jobs' resources. A step-by-step example of this technique can be followed in Figure 2 and its corresponding effect observed in Figure 4: the low-latency scheduler requests resources from the resource manager, which expropriates resources from the dynamically malleable applications scheduler. The resource manager evaluates what resource reduction will impact application performance objectives the least and enforces a decision on the two applications. The resources are then used by the low-latency application. Once this application terminates, the expropriated resources are returned to the original scheduler. This scheduler re-assigns these resources to the dynamically malleable applications.
In batch jobs, partial preemption brings the negative effect of lost work. However, the technique is only used on dynamically malleable applications whose performance model allows their resource pool to be resized with a small performance penalty. If a processing node is stopped, its produced output data can be stored and its remaining input data transferred to another node to be processed. The overall performance of the job decreases but no work is lost. The application can continue with reduced resources, and if more resources are allocated to it later, its performance may return to the pre-expropriation level.
This technique is inspired by cloud techniques such as Brownout [17] (graceful degradation of application quality of service as a means to cope with increased resource load) and overbooking [36] (packing physical hosts with VMs which do not fully utilize their allocation to better utilize the resources).
4.3 Dynamically malleable management
Dynamically malleable jobs do not have a hard constraint on the required resources. A larger resource allocation implies a shorter running time and vice versa. In classical HPC scenarios, users specify the required resource allocation and estimated runtime for jobs. This poses two challenges. First, users have to estimate the resource allocation and consequent runtime for an application whose performance is hard to predict (and thus, easy to under/overestimate). Second, it eliminates the freedom of the scheduler to adapt the job's allocation to achieve overall targets.
We propose performance-based management for dynamically malleable applications: users provide a deadline by when the job should be completed, and application schedulers have the freedom to dynamically allocate resources so that application deadlines are met. The benefits for the scheduling model are the following:
• Eliminate user over/under estimations: Workflows run multiple phases with different operations on resources where performance might be unknown. It is complex to estimate the required resources and runtime for each phase. However, if the user provides a deadline, the scheduler can adapt the resource allocation dynamically and avoid over/under allocation.
• Malleable applications can be used for flexible backfilling or as a source of expropriated resources.
4.3.1 Deadline based resource management
In the proposed A2L2 approach, a specific scheduler will exist for dynamically malleable applications. This scheduler has two goals: to control and manage dynamically malleable jobs (job manager) and to calculate the resources needed for these jobs to meet their deadlines (resource calculator). The job manager is meant to be built over one of the existing cloud execution frameworks for dynamically malleable applications (e.g., Hadoop, Spark). It will have the responsibility of controlling the job workers on each allocated node, managing the distribution of input data among them, allocating new resources to a job, and de-allocating them if needed. The scheduling functions will interact with the job manager to free resources or to offer the possibility of using borrowed ones.
The resource calculator will inform the job manager of the resources to allocate to the jobs to meet the user-provided deadlines. Job deadlines are a well known model for expressing an execution target [12]. Deadlines can be synchronized with real life events and, hence, are easy to understand by the user. We propose to study what methodology should be used to perform these calculations and enforce them on dynamically malleable workflows. As presented in previous work [12], it implies translating the overall deadline to per-stage deadlines and estimating the resource performance for each stage. The resource performance estimation is used to calculate the required resource size to meet the deadline of a running workflow stage. Finally, the job manager uses the calculation to alter the resource set assigned to the stage.
Finally, the dynamically malleable scheduler must support resource expropriation and resource borrowing. One of the challenges will be to determine the best heuristic to calculate what resources, and from which jobs, should be released to minimize the impact in terms of the jobs' execution slowdown. A similar heuristic will be investigated to determine the jobs that should receive extra resources in case the scheduler can borrow resources.
4.3.2 Performance enhancements
The throughput of dynamically malleable applications highly depends on the system's I/O performance. In cloud environments, data locality is often leveraged through distributed file systems [16]. However, in most HPC systems, compute nodes use remote parallel file systems and have little or no local storage on nodes. Application performance will depend on the data access patterns [26] and the network capacity of the data center. This is especially important in Exascale systems as I/O operations will be more expensive at that level [6].
Another performance concern comes from A2L2's ability to dynamically change the resource allocation of applications. In this case, although the performance penalty is smaller than that of full preemption of jobs, the time to recreate the application data on nodes may affect the overall performance.
In order to address these two challenges we propose to use memory-centric approaches when possible. For storage-centric applications, burst buffers and I/O nodes can host a filesystem, equivalent to the distributed file system used in cloud environments.


Application data has to be preloaded on nodes but, once loaded, the application read/write patterns will not impact the performance of the I/O network or the central parallel file system [19]. Also, this distributed file system can be used in a semi-persistent way: if a node is expropriated from an application, the file system is not erased. Once the node is free and the application is restored, the application can keep using the near storage system. The only performance penalty will come from checking what input data has already been processed. This penalty can be neutralized by using eventual data consistency techniques on the distributed file system.
5. CHALLENGES AND FUTURE WORK
Based on the A2L2 model outlined in this paper, further research poses three immediate challenges: further detailed modeling of the different components of the scheduling environment, implementation of the proposed model, and evaluation of the quality of the scheduling produced. These challenges are embodied in the various system components for an HPC resource management schema with batch job scheduling, dynamic resource allocation on a per-job basis, application level control, resource borrowing, flexible backfilling, resource expropriation, low-latency job allocation, workflow-aware scheduling, and data locality enforcement for data-intensive applications on HPC.
The work and research on these challenges cannot be performed as a single line of work, but as the aggregation of three parallel lines of work. The first is to characterize the workload of a set of reference systems over both HPC and cloud workloads. We plan to study traces from systems at NERSC (the US National Energy Research Computing Center), the Parallel Workload Archive [10], HPC2N (the High Performance Computing Center North, Sweden), and Google cluster data [39]. This line of investigation has two purposes: performing a comparison between HPC and cloud (to verify the feasibility of using cloud techniques in HPC) and producing a model for synthetic workloads to test our scheduler. Our previous work has initiated this line of research with the workload analysis of two NERSC systems with a special focus on their evolution throughout the systems' lifetime [32].
The second research line aims at building a resource performance model that includes node compute capacity, memory, network, and I/O capabilities. In particular, analysis of the effect on the memory and I/O models of the presence of node burst buffers, as well as the expected Exascale increases in the gap between the capacity of processing units and I/O systems, are topics for study.
The third line aims at creating and evaluating three components: a scheduling suite (implementing our model), a resource emulator (based on the resource model), and a workload emulator (that captures characteristics of the workload model). The scheduling suite will be based on community open source scheduling software and will be developed for real system deployment. The resource emulator will wrap the scheduler to run it in a test environment. The workload emulator will model different types of job submissions and user behaviors. This approach will support the implementation test and algorithm evaluation processes. Each model feature will be evaluated by comparing its performance (in terms of utilization, turnaround time, or enabled capacity) against a reference scenario using a classical HPC scheduling algorithm. The ultimate goal of this effort is to produce a software suite to run over a real HPC system.
6. RELATED WORK
Applications whose resource allocation needs are variable are not new to HPC centers. In 1996 Feitelson and Rudolph [11] made a classification that identifies three types of jobs with different degrees of flexibility: moldable, malleable, and evolving.
Moldable applications [7, 18] are those whose degree of parallelism can be chosen just before they are started but do not support any changes after that [11]. A scheduler can decide on the geometry of a moldable job by considering the current state of the resources, the geometry of waiting jobs, and the system's target functions: e.g., high utilization or short turnaround time. However, unlike dynamically malleable applications, moldable applications do not allow any changes of their resource allocations once allocated. This disables any further scheduling decisions for them (apart from abortion of the job).
Malleable applications support changes in their resource set during execution time without stopping execution [11]. An example are malleable MPI applications [37, 22]. We define dynamically malleable applications as a subset of the malleable applications characterized by being data-intensive and organized as a workflow. These applications require scheduler support as they need to be aware of any resource offering or reduction. This support has appeared in highly distributed settings such as grid scheduling [28, 3] but not in pure HPC schedulers. In this work, we propose a model exclusively for HPC systems, with a fixed-size resource set and no possibility to schedule jobs to external resources.
Evolving jobs have resource requirement changes throughout their execution time [11] and the system must satisfy them or their execution will not continue. Some grid scheduling systems offer support for these types of applications [20]. HPC schedulers have limited support for evolving jobs [30] and it is based on advance resource reservations. Our model proposal for HPC offers the possibility of freeing resources from dynamically malleable applications to serve evolving jobs or other rigid jobs with higher priority without using advance reservations.
Finally, multi-level scheduling approaches have been investigated in the past in the grid [2, 25] and cloud communities [14, 33]. The resulting schedulers cannot be directly applied to an HPC system. However, the two-level architecture of A2L2 is inspired by some of the models presented in that work.
7. CONCLUSIONS
This paper presents a comparative analysis between HPC and cloud scheduling methods and extrapolates on insights from this comparison to outline A2L2: a scheduling model for HPC systems that takes application characteristics into consideration when realizing low-latency, on-demand allocation of resources in HPC environments. The first insight of this analysis is that applications with different performance models are present in both HPC and cloud environments. To address this, we propose a multi-level HPC scheduler model that separates application specific policies from how compute time is shared by application types.


The second insight is the proposal to use the flexible nature of dynamically malleable jobs to enable different features in HPC scheduling. The presence of jobs whose resource allocations can be scaled up and down during execution, with very low performance penalties, allows both a more flexible backfilling model (where job sizes are changed to free space for backfilling jobs as well as to improve the fit of backfilling jobs to resource allocation gaps) and the use of resource expropriation in order to allow admission of time-critical application jobs on-demand.
8. ACKNOWLEDGMENTS
This work is funded by the Swedish Government's strategic effort eSSENCE, the Swedish Research Council (VR) under contract number C0590801 for the project Cloud Control, the European Union's Seventh Framework Programme under grant agreement 610711 (CACTOS), and the Office of Science, Office of Advanced Scientific Computing Research (ASCR) of the U.S. Department of Energy under Contract Number DE-AC02-05CH11231.
9. REFERENCES
[1] M. A. Bauer, A. Biem, S. McIntyre, N. Tamura, and Y. Xie. High-performance parallel and stream processing of x-ray microdiffraction data on multicores. In Journal of Physics: Conference Series, volume 341, page 012025. IOP Publishing, 2012.
[2] F. Berman, R. Wolski, H. Casanova, W. Cirne, H. Dail, M. Faerman, S. Figueira, J. Hayes, G. Obertelli, J. Schopf, et al. Adaptive computing on the grid using AppLeS. IEEE Transactions on Parallel and Distributed Systems, 14(4):369–382, 2003.
[3] J. Buisson, O. Sonmez, H. Mohamed, W. Lammers, and D. Epema. Scheduling malleable applications in multicluster systems. In 2007 IEEE International Conference on Cluster Computing, pages 372–381. IEEE, 2007.
[4] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6):599–616, 2009.
[5] N. Capit, G. Da Costa, Y. Georgiou, G. Huard, C. Martin, G. Mounié, P. Neyron, and O. Richard. A batch scheduler with high level components. In IEEE International Symposium on Cluster Computing and the Grid (CCGRID 2005), volume 2, pages 776–783. IEEE, 2005.
[6] Y. Chen. Towards scalable I/O architecture for exascale systems. In Proceedings of the 2011 ACM international workshop on many-task computing on grids and supercomputers, pages 43–48. ACM, 2011.
[7] W. Cirne and F. Berman. A model for moldable supercomputer jobs. In Proceedings 15th International Parallel and Distributed Processing Symposium. IEEE, 2001.
[8] J. Dongarra et al. The international exascale software project roadmap. International Journal of High Performance Computing Applications, 2011.
[9] J. Ekanayake, S. Pallickara, and G. Fox. MapReduce for data intensive scientific analyses. In IEEE Fourth International Conference on eScience, pages 277–284. IEEE, 2008.
[10] D. Feitelson. Parallel workloads archive. 71(86):337–360, 2007. http://www.cs.huji.ac.il/labs/parallel/workload.
[11] D. G. Feitelson and L. Rudolph. Toward convergence in job schedulers for parallel supercomputers. In Job Scheduling Strategies for Parallel Processing, pages 1–26. Springer, 1996.
[12] A. D. Ferguson, P. Bodik, S. Kandula, E. Boutin, and R. Fonseca. Jockey: guaranteed job latency in data parallel clusters. In Proceedings of the 7th ACM European Conference on Computer Systems, pages 99–112. ACM, 2012.
[13] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel. The cost of a cloud: research problems in data center networks. ACM SIGCOMM Computer Communication Review, 39(1):68–73, 2008.
[14] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. H. Katz, S. Shenker, and I. Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Networked Systems Design and Implementation (NSDI), volume 11, pages 22–22, 2011.
[15] M. Hovestadt, O. Kao, A. Keller, and A. Streit. Scheduling in HPC resource management systems: Queuing vs. planning. In Job Scheduling Strategies for Parallel Processing, pages 1–20. Springer, 2003.
[16] S.-H. Kang, D.-H. Koo, W.-H. Kang, and S.-W. Lee. A case for flash memory SSD in Hadoop applications. International Journal of Control and Automation, 6(1):201–210, 2013.
[17] C. Klein, M. Maggio, K.-E. Årzén, and F. Hernández-Rodriguez. Brownout: Building more robust cloud applications. In Proceedings of the 36th International Conference on Software Engineering, pages 700–711. ACM, 2014.
[18] C. Klein and C. Perez. An RMS architecture for efficiently supporting complex-moldable applications. In 2011 IEEE 13th International Conference on High Performance Computing and Communications (HPCC), pages 211–220. IEEE, 2011.
[19] S. Krishnan, M. Tatineni, and C. Baru. myHadoop - Hadoop-on-Demand on Traditional HPC Resources. San Diego Supercomputer Center Technical Report TR-2011-2, University of California, San Diego, 2011.
[20] W. Lammers. Adding support for new application types to the Koala grid scheduler. PhD thesis, Citeseer, 2005.
[21] N. Liu, J. Cope, P. Carns, C. Carothers, R. Ross, G. Grider, A. Crume, and C. Maltzahn. On the role of burst buffers in leadership-class storage systems. In IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST), pages 1–11. IEEE, 2012.
[22] K. Maghraoui, T. J. Desell, B. K. Szymanski, and C. A. Varela. Dynamic malleability in iterative MPI applications. In 7th Int. Symposium on Cluster Computing and the Grid, pages 591–598, 2007.
[23] P. Mell and T. Grance. The NIST definition of cloud computing. 2011.


[24] K. Mills, J. Filliben, and C. Dabrowski. Comparing VM-placement algorithms for on-demand clouds. In IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), pages 91–98. IEEE, 2011.
[25] H. Mohamed and D. Epema. Koala: a co-allocating grid scheduler. Concurrency and Computation: Practice and Experience, 20(16):1851–1876, 2008.
[26] E. Molina-Estolano, M. Gokhale, C. Maltzahn, J. May, J. Bent, and S. Brandt. Mixing Hadoop and HPC workloads on parallel filesystems. In Proceedings of the 4th Annual Workshop on Petascale Data Storage, pages 1–5. ACM, 2009.
[27] A. Mu'alem and D. Feitelson. Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Transactions on Parallel and Distributed Systems, 12(6):529–543, 2001.
[28] A. P. Nascimento, A. C. Sena, C. Boeres, and V. E. Rebello. Distributed and dynamic self-scheduling of parallel MPI grid applications. Concurrency and Computation: Practice and Experience, 19(14):1955–1974, 2007.
[29] P.-O. Östberg, D. Espling, and E. Elmroth. Decentralized scalable fairshare scheduling. Future Generation Computer Systems - The International Journal of Grid Computing and eScience, 29:130–143, 2013.
[30] S. Prabhakaran, M. Iqbal, S. Rinke, C. Windisch, and F. Wolf. A batch system with fair scheduling for evolving applications. In 2014 43rd International Conference on Parallel Processing (ICPP), pages 351–360. IEEE, 2014.
[31] B. P. Rimal, E. Choi, and I. Lumb. A taxonomy and survey of cloud computing systems. In Fifth International Joint Conference on INC, IMS and IDC, pages 44–51. IEEE, 2009.
[32] G. Rodrigo, P.-O. Östberg, E. Elmroth, K. Antypas, R. Gerber, and L. Ramakrishnan. HPC system lifetime story: Workload characterization and evolutionary analyses on NERSC systems. In The 24th International ACM Symposium on High-Performance Distributed Computing (HPDC), 2015.
[33] M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, and J. Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In Proceedings of the 8th ACM European Conference on Computer Systems, pages 351–364. ACM, 2013.
[34] K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The Hadoop distributed file system. In 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pages 1–10. IEEE, 2010.
[35] A. Snavely, X. Gao, C. Lee, L. Carrington, N. Wolter, J. Labarta, J. Gimenez, and P. Jones. Performance modeling of HPC applications. Advances in Parallel Computing, 13:777–784, 2004.
[36] L. Tomás and J. Tordsson. Improving cloud infrastructure utilization through overbooking. In Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference, page 5. ACM, 2013.
[37] G. Utrera, J. Corbalan, and J. Labarta. Implementing malleability on MPI jobs. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pages 215–224. IEEE Computer Society, 2004.
[38] L. Wang, J. Tao, M. Kunze, A. C. Castellanos, D. Kramer, and W. Karl. Scientific cloud computing: Early definition and experience. In High Performance Computing and Communications, volume 8, pages 825–830, 2008.
[39] J. Wilkes. More Google cluster data. Google research blog, Nov. 2011. Posted at http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html.
[40] R. Williams, I. Gorton, P. Greenfield, and A. Szalay. Guest editors' introduction: Data-intensive computing in the 21st century. Computer, 41(4):30–32, 2008.
[41] J. Wolf, D. Rajan, K. Hildrum, R. Khandekar, V. Kumar, S. Parekh, K.-L. Wu, and A. Balmin. Flex: A slot allocation scheduling optimizer for MapReduce workloads. In Middleware 2010, pages 1–20. Springer, 2010.
[42] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing, pages 10–10, 2010.
[43] X. Zhang, E. Tune, R. Hagmann, R. Jnagal, V. Gokhale, and J. Wilkes. CPI2: CPU performance isolation for shared compute clusters. In Proceedings of the 8th ACM European Conference on Computer Systems, pages 379–391. ACM, 2013.



Paper IV

ScSF: A Scheduling Simulation Framework
Gonzalo P. Rodrigo, Erik Elmroth, Per-Olov Östberg, and Lavanya Ramakrishnan

In 21st Workshop on Job Scheduling Strategies for Parallel Processing, Accepted, Springer International Publishing, 2014.

ScSF: A Scheduling Simulation Framework

Gonzalo P. Rodrigo*, Erik Elmroth, Per-Olov Östberg, and Lavanya Ramakrishnan+

Department of Computing Science, Umeå University, SE-901 87, Umeå, Sweden
Lawrence Berkeley National Lab, 94720, Berkeley, California+
{gonzalo,elmroth,p-o}@cs.umu.se, [email protected]

Abstract. High-throughput and data-intensive applications are increasingly present, often composed as workflows, in the workloads of current HPC systems. At the same time, trends for future HPC systems point towards more heterogeneous systems with deeper I/O and memory hierarchies. However, current HPC schedulers are designed to support classical large tightly coupled parallel jobs over homogeneous systems. Therefore, there is an urgent need to investigate new scheduling algorithms that can manage the future workloads on HPC systems. However, there is a lack of appropriate models and frameworks to enable development, testing, and validation of new scheduling ideas.
In this paper, we present an open-source scheduler simulation framework (ScSF) that covers all the steps of scheduling research through simulation. ScSF provides capabilities for workload modeling, workload generation, system simulation, comparative workload analysis, and experiment orchestration. The simulator is designed to be run over a distributed computing infrastructure, enabling testing at scale.
We describe in detail a use case of ScSF to develop new techniques to manage scientific workflows in a batch scheduler. In the use case, this technique was implemented in the framework scheduler. For evaluation purposes, 1728 experiments, equivalent to 33 years of simulated time, were run in a deployment of ScSF over a distributed infrastructure of 17 compute nodes during two months. The experimental results were then analyzed in the framework to confirm that the technique minimizes workflows' turnaround time without over-allocating resources. Finally, we discuss lessons learned from our experiences that will help future researchers.

1 Introduction

In recent years, high-throughput and data-intensive applications are increasingly present in the workloads at HPC centers. Current trends to build larger HPC systems point towards heterogeneous systems and deeper I/O and memory hierarchies. However, HPC systems and their schedulers were designed to support large communication-intensive MPI jobs run over uniform systems.

* Work performed while working at the Lawrence Berkeley National Lab


The changes in workloads and underlying hardware have resulted in an urgent need to investigate new scheduling algorithms and models. However, there is limited availability of tools to facilitate scheduling research. Current simulator frameworks largely do not capture the complexities of a production batch scheduler. Also, they are not powerful enough to simulate large experiment sets, or they do not cover all the relevant aspects (i.e., workload modeling and generation, scheduler simulation, and result analysis).
Scheduling simulators: Schedulers are complex systems and their behavior is the result of the interaction of multiple mechanisms that rank and schedule jobs while monitoring the system state. Many scheduler simulators, like Alea [14], include state-of-the-art implementations of some of these mechanisms, but do not capture the interaction between the components. As a consequence, hypotheses tested on such simulators might not hold in real systems.
Experimental scale: A particular scheduling behavior depends on the configuration of the scheduler and the characteristics of the workload. As a consequence, the potential number of experiments needed to evaluate a scheduling improvement is high. Also, experiments have to be run for a long time to be significant and have to be repeated to ensure representative results. Unfortunately, current simulation tools do not provide support to scale up and run large numbers of long experiments. Finally, workload analysis tools to correlate large scheduling result sets are not available.
In this paper, we present ScSF, a scheduling simulation framework that covers the scheduling research life-cycle. It includes a workload modeling engine, a synthetic workload generator, an instance of Slurm wrapped in a simulator, a results analyzer, and an orchestrator to coordinate experiments run over a distributed infrastructure. This framework will be published as open source, allowing user customization of modules. We also present a use case that illustrates the use of the scheduling framework for evaluating a workflow-aware scheduling algorithm. Our use case demonstrates the modeling of the workload of a peta-scale system, Edison at NERSC (National Energy Research Scientific Computing Center). We also describe the mechanics of implementing a new scheduling algorithm in Slurm and running experiments over distributed infrastructures. Specifically, our contributions are:
– We describe the design and implementation of a scalable scheduling simulator framework (ScSF) that supports and automates workload modeling and generation, Slurm simulation, and data analysis. ScSF will be published as open source.
– We detail a case study that works as a guideline to use the framework to evaluate a workflow-aware scheduling algorithm.
– We discuss the lessons learned from running scheduling experiments at scale to help scheduling researchers in the future.
The rest of the paper is organized as follows. In Section 2, we present the state of the art of scheduling research tools and the previous work supporting the framework. The architecture of ScSF and the definition of its modules are presented in Section 3. In Section 4 we describe the steps to use the framework to evaluate a new scheduling algorithm.


In Section 5, we present some lessons learned while using ScSF at scale. We present our conclusions in Section 6.

Fig. 1: Slurm is composed of three daemons: slurmctld (scheduler), slurmd (compute node management and supervision), and slurmdbd (accounting). A plug-in structure wraps the main functions in those daemons.

2 Background

In this section, we describe the state of the art and the challenges in scheduling research.

2.1 HPC schedulers and Slurm

The scheduling simulation framework aims to support research on HPC scheduling. The framework incorporates a full production scheduler that is modified to include the new scheduling algorithms to be evaluated.
Different options were considered for the framework scheduler. Moab (scheduler) plus Torque (resource manager) [7], LSF [11], and LoadLeveler [13] have been quite popular in HPC centers. However, their source code is not easily available, which makes extensibility difficult. The Maui cluster scheduler is an open-source precursor of Moab [12]; however, it does not support current system needs. Slurm is one of the most popular recent workload managers in HPC (used in 5 of the top 10 HPC systems [2]). It was originally developed at Lawrence Livermore National Laboratory [20], is now maintained by SchedMD [2], and is available as open source. Also, there are publicly available projects to support simulation in it [19]. Hence, our simulator framework is based on Slurm.
As illustrated in Figure 1, Slurm is structured as a set of daemons that communicate over RPC calls:
slurmctld is the scheduling daemon. It contains the scheduling calculation functions, hosts the waiting queue, receives batch job submissions from users, and distributes work across the instances of slurmd.
slurmd is the worker daemon. There can be one instance per compute node or a single instance (front-end mode) managing all nodes. It places and runs work on compute nodes and reports the resource status to slurmctld. The simulator uses front-end mode.
sbatch is a command that wraps the Slurm RPC API to submit jobs to slurmctld. It is the submission path most commonly used by users.
Slurm has a plug-in architecture. Many of the internal functions are wrapped by C APIs loaded dynamically depending on the configuration files of Slurm (slurmctld.conf, slurmd.conf).
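To make the submission path above concrete, the small sketch below submits a job through sbatch from Python. It assumes a standard Slurm installation with sbatch on the PATH; the resource numbers and the wrapped command are arbitrary examples, not part of ScSF.

    # Hedged illustration of user-side job submission via sbatch, the command
    # that wraps slurmctld's RPC API. Resource values and the command are
    # placeholders.
    import subprocess

    def submit(ntasks: int, minutes: int, command: str) -> str:
        result = subprocess.run(
            ["sbatch", "--ntasks", str(ntasks), "--time", str(minutes),
             "--wrap", command],
            capture_output=True, text=True, check=True)
        # sbatch prints e.g. "Submitted batch job 12345"; return the job id.
        return result.stdout.strip().split()[-1]

    if __name__ == "__main__":
        print("job id:", submit(48, 30, "srun ./my_app"))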


The Slurm simulator wraps Slurm to emulate HPC resources, emulate users' job submissions, and speed up Slurm's execution. We extended previous work from the Swiss Supercomputing Center (CSCS, [19]), which was in turn based on the original simulator by the Barcelona Supercomputing Center (BSC, [16]). Our contributions increase Slurm's simulation speed-up while maintaining determinism in the simulations, and add workflow support. These improvements are presented in Section 3.5.

2.2 HPC workload analysis and generation

ScSF includes the capacity to model system workloads and generate synthetic ones accordingly. Workload modeling starts with flurries elimination, i.e., removal of events that are not representative and would skew the model [10]. The generator models each job variable with the empirical distribution [15], i.e., it recreates the shape of job variable distributions by constructing a histogram and CDF from the observed values.

2.3 Related work

Previous work [18] proposes three main methods for scheduling algorithm research: theoretical analysis, execution on a real system, and simulation. Theoretical analysis is limited to producing boundary values for an algorithm (i.e., best and worst cases) and does not characterize its typical performance. Also, since continuous testing of new algorithms on large real systems is not possible, simulation is the option chosen in our work.
Available simulation tools do not cover the full cycle of modeling, generation, simulation, and analysis. Also, public up-to-date simulators or workload generators are scarce. As an example, our work is based on the most recent peer-reviewed work on Slurm simulation (CSCS, [19]), and we had to improve its synchronization to speed up its execution. For more grid-like workloads, Alea [14] is an example of a current HPC simulator. However, it does not include a production scheduler at its core and does not generate workloads.
For workload modeling, function fitting and user modeling are recognized methods [9]. However, ScSF's workload model is based on empirical distributions [15], as they produce good enough models and do not require specific information about system users. Also, this work's modeling methods are based on the experience of our previous work on understanding workload evolution over HPC systems' life cycles [17] and job heterogeneity in HPC workloads [6].
In workload generation, previous work compares closed- and open-loop approaches [21], i.e., whether or not the scheduling decisions are taken into account when calculating the jobs' arrival times. ScSF is used in environments with reduced user information, which would be needed to create closed-loop models. Thus, ScSF uses an open-loop workload generation model together with fill and load mechanisms (Section 3.4) to avoid under and over job submission.
Finally, other workloads and models [8] are available, but they are less representative than the one in the use case (a Cray XC30 supercomputer, deployed in 2013 with 133,824 cores and 357 TB of RAM). Still, ScSF can run simulations on the models of such systems.


Fig. 2: ScSF schema with green color representing components developed in this work and purple representing modified and improved components.

3 Architecture and simulation process

Figure 2 shows ScSF's architecture, built around a MySQL database that stores the framework's data and meta-data. Running experiments based on a reference system requires modeling its workload first by processing the system's scheduling logs in the workload model engine. This model is used in the experiments to generate synthetic workloads with characteristics similar to the original ones.
A simulation process starts with the description of the experimental setup in an experiment definition, including workload characteristics, scheduler configuration, and simulation time. The experiment runner processes experiment definitions and orchestrates experiments accordingly. First, it invokes the workload generator to produce a synthetic workload with job characteristics (size, inter-arrival time) similar to the real ones of the chosen reference system. This workload may include specific jobs (e.g., workflows) according to the experiment definition. Next, the runner invokes the simulator that wraps Slurm to increase its execution pace and emulate the HPC system and the users interacting with it. The simulator sets Slurm's configuration according to the experiment definition and emulates the submission of the synthetic workload jobs. Slurm schedules the jobs over the virtual resources until the last workload job is submitted. At that moment, the simulation is considered completed.
Completed simulations are processed by the workload analyzer. The analysis of a simulation result covers the characterization of jobs, workflows, and the system. This module includes tools to compare experiments to differentiate the effects of scheduling behaviors on the workload.
In the rest of this section we present the components of the framework involved in these processes.

3.1 Workload model engine

A workload model is composed of statistical data that allows creating synthetic jobs with characteristics similar to the original ones. To create this model, the workload model engine first extracts batch jobs' variable values from Slurm or Moab scheduling logs, including wait time, allocated CPU cores, requested wall clock time, actual runtime, inter-arrival time, and runtime accuracy (runtime / requested wall clock time).


Fig. 3: Empirical distribution construction for job variables: calculating a cumulative histogram and transforming it into a mapping table.

Table 1: Experiment definition fields
trace type: "single": regular experiment; "grouped": aggregation of experiments (repetitions).
subtraces: list of single experiments related to this grouped one.
system: name of the system to model the workload after.
workflow policy: "period": one workflow every workflow_period_s seconds; "percent": workflow_share of the core-hours are workflows; "no": no workflows.
manifest list: list of workflows to appear in the workload.
workflow handling: workflow submission method; "single": pilot job; "multi": chained jobs; "manifest": workflow-aware job.
start date: submit time of the first job valid for analysis.
preload time s: time prepended to the workload for stabilization.
workload duration s: the workload stops at start_date + workload_duration_s.
seed: string to initialize the random number generators.

Jobs with missing information (e.g., start time) or individual rare and very large jobs that would skew the model (e.g., system test jobs) are filtered out. Next, the extracted values are used to produce the empirical distributions [15] of each job variable, as illustrated in Figure 3. A normalized histogram is calculated on the source values. Then, the histogram is transformed into a cumulative histogram, i.e., each bin represents the percentage of observed values that are less than or equal to the upper boundary of the bin. Finally, the cumulative histogram is transformed into a table that maps probability ranges onto a value: e.g., in Figure 3, bin (10-20] becomes a [0.3, 0.8) probability range as its value is 80% and its left neighboring bin's value is 30%. The probability ranges map to the mid value of the bin range that they correspond to, e.g., 15 is the mid value of (10-20]. This model is then ready to produce values: e.g., a random number (0.91) is mapped on the table, obtaining 25. Each variable's histogram is calculated with specific bin sizes adapted to its resolution. By default, the bin size for the job's requested wall clock time is one minute (Slurm's resolution). The corresponding bin size for inter-arrival time is one second, as that is the timestamp log resolution. Finally, for the job CPU core allocation, the bin size is the number of cores per node of the reference system, as node sharing is usually disabled in HPC systems.
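The sketch below condenses the histogram-to-mapping-table construction just described. It is a minimal illustration, not ScSF's code: function names, the tuple layout of the table, and the toy input are assumptions.

    # Sketch of the empirical-distribution construction: normalized histogram,
    # cumulative histogram, then a table mapping a probability range onto the
    # mid value of each bin.
    import numpy as np

    def build_mapping_table(values, bin_size):
        values = np.asarray(values, dtype=float)
        edges = np.arange(values.min(), values.max() + bin_size, bin_size)
        counts, edges = np.histogram(values, bins=edges)
        cdf = np.cumsum(counts) / counts.sum()          # cumulative histogram
        table, lower = [], 0.0
        for upper, left, right in zip(cdf, edges[:-1], edges[1:]):
            mid = (left + right) / 2.0                  # value produced for this range
            table.append((lower, upper, mid))           # probabilities in [lower, upper) -> mid
            lower = upper
        return table

    # Toy usage: six observed runtimes, 10-unit bins.
    for row in build_mapping_table([5, 12, 14, 18, 25, 27], bin_size=10):
        print(row)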

3.2 Experiment definition

An experiment definition governs the experimental conditions in an experiment process, configuring the scheduler, the workload characteristics, and the experiment duration. A definition is composed of a scheduler configuration file and an experiments database entry (Table 1) that includes:


1 " tasks ":[ { 2 "id":"SWide", "cmd":"./W.py", "cores":480, "rtime":360.0 , { } 3 "id":"SLong", "cmd":"./L.py", "cores":48, "rtime":1440.0, { 4 " deps ":["SWide"] ] } } Fig. 4: WideLong workflow manifest in JSON format. trace type and subtraces: The tag ”single” identifies those experiments which are meant to be run in the simulator. A workload will be generated and run through the simulator for later analysis. The experiments with trace type ”grouped” are definitions that list the experiments that are the different repetitions of the same experimental conditions in the ”subtraces” field. system model: selects which system model is to be used to produce the work- load in the experiment. workflow policy: controls presence of workflows in the workload. If set to ”no”, workflows are not present. If set to ”period” a workflow is submitted periodi- cally once every workflow period s seconds. If set to ”percentage”, workflows contribute workflow share of the workload core hours. manifest list: List of pairs (share, workflow) defining the workflows present in the workload: e.g., (0.4 Montage.json ), (0.6 Sipht.json ) indicates that 40% of the workflows will{ be Montage, and 60% Sipht . The workflow} field points to a JSON file specifying the structure of the workflow. Figure 4 is an example of the such file. It includes two tasks, the first running for 10 minute, allocating 480 cores (wide task); and the second running for 40 minutes, allocating 48 cores (long task).The SLong task requires SWide to be completed before it starts. workflow handling: This parameter controls the method to submit workflows. The workload generators supports workflows submitted as chained jobs (multi), in which workflow tasks are submitted as independent jobs, expressing their relationship as job completion dependencies. Under this method, workflow tasks allocate exactly the resources they need, but intermediate job wait times might be long, increasing the turnaround time. Another approach supported is the pilot job (single), in which a workflow is submitted as a single job, allocating the maximum resource required within the workflow for its minimum possible runtime. The workflow tasks are run within the job, with no intermediate wait times, and thus, producing shorter turnaround times. However, it over-allocates resources, that are left idle at certain stages of the workflow. start date, preload time s, and workload duration s: defines the dura- tion of the experiment workload. start date sets the submit time of the first job in the analyzed section of the workload, which will span until (start date + workload duration s). Before the main section, a workload of preload time s seconds is prepended, to cover the cold start and stabilization of the system. Random seed: this alphanumeric string is used to initialize the random gen- erator within the workload generator. If two experiments definitions have the same parameters, including the seed, their workloads will become identical. If two experiment definitions have the same parameters, but a different seed, their workloads will become similar in overall characteristics, but different as individ- ual jobs (i.e. repetitions of the same experiment). In general, repetitions of the


Fig. 5: Steps to run an experiment (numbers circled indicate order) taken by the experiment runner component. Once step seven is completed, the step sequence is re-started.

In general, repetitions of the same experiment with different seeds are subtraces of a "grouped" type experiment.
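For orientation, the following is a made-up experiment definition entry using the fields of Table 1. The concrete values (system name, workflow file, durations) are examples only, and the exact column names in ScSF's database may differ from the keys shown here.

    # Hypothetical experiment-definition entry mirroring Table 1.
    experiment = {
        "trace_type": "single",                  # run directly in the simulator
        "subtraces": [],                         # only used by "grouped" entries
        "system": "edison",                      # workload model to draw jobs from
        "workflow_policy": "period",             # one workflow per period
        "workflow_period_s": 3600,
        "manifest_list": [(1.0, "WideLong.json")],
        "workflow_handling": "single",           # pilot-job submission
        "start_date": "2016-01-01T00:00:00",
        "preload_time_s": 24 * 3600,             # stabilization prepended to the trace
        "workload_duration_s": 72 * 3600,
        "seed": "repetition-1",
    }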

3.3 Experiment runner

The experiment runner is an orchestration component that controls the workload generation and scheduling simulation. It invokes the workload generator and controls, through SSH, a VM that contains a Slurm simulator instance. Figure 5 presents the experiment runner operations after it is invoked with a hostname or IP of a Slurm simulator VM. First, the runner reboots the VM (step 0) to clear processes and memory and reset the operating system state. Next, an experiment definition is retrieved from the database (step 1) and the workload generator produces the corresponding experiment's workload file (step 2). This file is transferred to the VM (step 4) together with the corresponding Slurm configuration files (obtained in step 3). Then, the simulation inside the VM is started (step 5); it stops after the last job of the workload has been submitted in the simulation, plus extra time to avoid including the system termination noise in the results. The experiment runner monitors Slurm (step 6) and, when it terminates, the resulting scheduler logs are extracted and inserted into the central database (step 7).
Only one experiment runner can be started per simulator VM; however, multiple runners can manage multiple VMs in parallel, which enables a simple scaling mechanism to run experiments concurrently.
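A condensed sketch of the runner loop of Figure 5 is shown below, driving one simulator VM over plain ssh/scp. The remote paths, script names, and result file are placeholders standing in for ScSF's actual SSH and database code, which is not reproduced here.

    # Hedged sketch of one experiment-runner iteration (steps 0-7 of Figure 5).
    import subprocess, time

    def ssh(host, cmd):
        return subprocess.run(["ssh", host, cmd], capture_output=True, text=True)

    def run_experiment(host, exp_id, workload_file, conf_file):
        ssh(host, "sudo reboot")                                # step 0: clean VM state
        time.sleep(120)                                         # crude wait for the VM to return
        subprocess.run(["scp", workload_file, conf_file,        # step 4: ship inputs (from steps 2-3)
                        f"{host}:/opt/scsf/"], check=True)
        ssh(host, "/opt/scsf/start_sim.sh")                     # step 5: launch the simulation
        while "running" in ssh(host, "/opt/scsf/sim_status.sh").stdout:
            time.sleep(60)                                      # step 6: monitor until Slurm terminates
        subprocess.run(["scp", f"{host}:/opt/scsf/results.log", # step 7: collect scheduler logs
                        f"results/{exp_id}.log"], check=True)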

3.4 Workload generation

The workload generator in ScSF produces synthetic workloads representative of real system models. The workload structure is presented in Figure 6: all workloads start with a fill phase, which includes a group of jobs meant to fill the system. The fill phase is followed by the stabilization phase, which includes 24 hours of regular jobs controlled by a job-pressure mechanism to ensure that there are enough jobs to keep the system utilized. The stabilization phase captures the cold start of the system, and it is ignored in later analysis.


Fig. 6: Sections of a workload: fill, stabilization, experiment, and drain, presented with the associated utilization that this workload produced in the system.

The next stage is the experiment phase, which runs for a fixed time (72 hours in the figure) and includes regular batch jobs complemented by the experiment-specific jobs (in this case workflows). Although not present in the workload, after the workload is completely submitted the simulation runs for extra time (the drain period, configured in the simulator) to avoid the presence of noise from the system termination in the experiment section. In the rest of this section, we present all the mechanisms involved in detail.

Job generation: The workload generator produces synthetic workloads according to an experiment definition. The system model set in the definition is combined with a random number generator to produce synthetic batch jobs. The system model is chosen among those produced by the workload model engine (Section 3.1). Also, the random generator is initialized with the experiment definition's seed. Finally, the workload generator also supports the inclusion of workflows according to the experiment definition (Section 3.2).
The workload generator's fidelity is evaluated by modeling NERSC's Edison and comparing one year of synthetic workload with the system's jobs in 2015. The characteristics of both workloads are presented in Figure 7, where the histograms and CDFs for inter-arrival time, wall clock limit, and allocated number of cores are almost identical. For runtime there are small differences in the histogram that barely impact the CDF.
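To show how the per-variable mapping tables and the seed could combine into jobs, here is a toy generator. The table contents and job dictionary layout are invented for illustration; the point is only that the same seed yields the same synthetic workload.

    # Sketch of seeded job generation from per-variable mapping tables
    # (probability range -> value). Toy tables, not a real system model.
    import random

    def build_workload(tables, seed, n_jobs):
        rng = random.Random(seed)                  # seed comes from the experiment definition
        def draw(table):
            p = rng.random()
            for lower, upper, value in table:
                if lower <= p < upper:
                    return value
            return table[-1][2]
        t, jobs = 0, []
        for _ in range(n_jobs):
            t += draw(tables["inter_arrival"])
            jobs.append({"submit_s": t,
                         "cores": draw(tables["cores"]),
                         "runtime_s": draw(tables["runtime"])})
        return jobs

    toy_tables = {
        "inter_arrival": [(0.0, 0.5, 30), (0.5, 1.0, 120)],
        "cores":         [(0.0, 0.7, 24), (0.7, 1.0, 1200)],
        "runtime":       [(0.0, 1.0, 3600)],
    }
    print(build_workload(toy_tables, "repetition-1", 3))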

Filling and load mechanisms: Users of an HPC system submit a job load that fills the system and creates a backlog of jobs that induces an overall system wait time. The filling and load mechanisms steer the job generation to reproduce this phenomenon.
The load mechanism ensures that the size of the backlog of jobs does not change significantly. It induces a job pressure (submitted over produced work) close to a configured value, usually 1.0. Every time a new job is added to the workload, the load mechanism calculates the current job pressure as P(t) = coreHoursSubmitted(t) / coreHoursProduced(t), where coreHoursProduced(t) = t * coresInTheSystem. If P(t) < 1.0, new jobs are generated and inserted with the same submit time until P(t) >= 1.0. If P(t) >= 1.1, the submit time is kept as a reference, but the job is discarded, to avoid overflowing the system. The effect of the load mechanism is observed in Figure 9, where the utilization rises to values close to one for the same workload parameters as in Figure 8.
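The job-pressure check can be sketched as follows. The thresholds (1.0 and 1.1) follow the text; the function names and job dictionary layout are assumptions, and the loop that keeps drawing jobs at the same submit time is only indicated in a comment.

    # Sketch of the load mechanism's job-pressure steering.
    # Time is in hours, work in core-hours.
    def job_pressure(submitted_core_hours, t_hours, system_cores):
        produced = t_hours * system_cores                 # coreHoursProduced(t)
        return submitted_core_hours / produced if produced > 0 else float("inf")

    def apply_load_mechanism(workload, candidate, submitted_core_hours, system_cores):
        t_hours = candidate["submit_s"] / 3600.0
        if job_pressure(submitted_core_hours, t_hours, system_cores) >= 1.1:
            return submitted_core_hours                   # discard job, keep its submit time as reference
        workload.append(candidate)
        submitted_core_hours += candidate["cores"] * candidate["runtime_s"] / 3600.0
        # If P(t) is still below 1.0, the generator would keep inserting jobs
        # with the same submit time until P(t) >= 1.0 (loop omitted here).
        return submitted_core_hours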


Fig. 7: Job characteristics in a year of Edison's real workload (darker) vs. a year of synthetic workload (lighter). Distributions are similar.

Increasing the job pressure raises system utilization but does not induce the backlog of jobs and associated overall wait time that is present in real systems. As an example, Figure 12a presents the median wait time of the jobs submitted in every minute of the experiment using the load mechanism of Figure 9. Here, the system is utilized but the jobs' wait time is very short, only increasing to values of 15 minutes for larger jobs (over 96 core-hours) at the end of the stabilization period (vs. the four hours intended).

The fill mechanism inserts an initial job backlog equivalent to the experiment's configured overall wait time. The filling jobs' characteristics guarantee that they will not all end at the same time or allocate too many cores. As a consequence, the scheduler is able to fill the gaps left when they end.

Fig. 8: No job pressure mechanism, no fill: low utilization due to not enough work.
Fig. 9: Job pressure 1.0, no fill: low utilization due to no initial filling jobs.

Fig. 10: Job pressure 1.0, fill with large jobs: initial falling spikes.
Fig. 11: Job pressure 1.0, fill with small jobs: good utilization, more stable start.


Fig. 12: Median wait time of jobs submitted in each minute. a: Job pressure 1.0, no fill mechanism, and thus no wait time baseline is present. b: Job pressure 1.0, fill mechanism configured to induce a four-hour wait time baseline.

Figure 10 shows an experiment in which the fill jobs' allocations are too big: each allocates 33,516 cores (1/4 of the system's CPU core count). Every time a fill job ends (t = 8, 9, 10, and 11 h), a drop in the utilization is observed because the scheduler has to fill a too-large gap with small jobs. To avoid this, the filling mechanism calculates a fill job size that induces the desired overall wait time while not producing utilization drops. The fill job size calculation is based on a fixed inter-arrival time, the capacity of the system, and the desired wait time. Figure 11 shows the utilization of a workload whose fill jobs are calculated following this method. They are submitted at 10-second intervals, creating the soft slope in the figure. Figure 12b shows the wait time evolution for the same workload, sustained around four hours after the fill jobs are submitted.
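One plausible way to compute such a fill-job geometry is sketched below. The formula is an illustration of the idea (a backlog worth the desired wait time of the whole machine, spread over jobs submitted every 10 seconds so their ends are staggered), not ScSF's exact calculation.

    # Illustrative fill-job sizing (assumed parameters, not ScSF's formula).
    def fill_job_geometry(system_cores, target_wait_h, fill_runtime_h,
                          inter_arrival_s=10.0):
        backlog_core_h = system_cores * target_wait_h           # pending work to create
        n_jobs = int(fill_runtime_h * 3600 / inter_arrival_s)   # staggered submissions and ends
        cores_per_job = max(1, int(backlog_core_h / (n_jobs * fill_runtime_h)))
        return n_jobs, cores_per_job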

Customization: The workload generator includes classes to define user job submission patterns. Trigger classes define mechanisms to decide the insertion time pattern, such as periodic, alarm (at one or multiple time stamps), re-programmable alarm, or random. The job pattern is set as a fixed job sequence or a weighted random selection between patterns. Once a generator is integrated, it is selected by setting a special string in the workflow policy field of the experiment definition.
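For illustration, a trigger/pattern pair along these lines could be written as follows; the class names and interface are hypothetical, not ScSF's actual classes.

    import random

    class PeriodicTrigger:
        """Returns insertion time stamps separated by a fixed period."""
        def __init__(self, start_s, period_s):
            self._next, self._period = start_s, period_s
        def next_time(self):
            t = self._next
            self._next += self._period
            return t

    class WeightedJobPattern:
        """Picks one of several fixed job sequences with configured weights."""
        def __init__(self, patterns, weights, seed=0):
            self._patterns, self._weights = patterns, weights
            self._rnd = random.Random(seed)
        def next_sequence(self):
            return self._rnd.choices(self._patterns, weights=self._weights, k=1)[0]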

3.5 Slurm and the Simulator

Slurm version 14.3.8 was chosen as the scheduler of the framework: it is one of the most popular open-source workload managers in HPC. Also, as a real scheduler, it includes the effects and interactions of mechanisms such as priority engines, scheduling algorithms, node placement algorithms, compute node management, the job submission system, and scheduling accounting. Finally, Slurm includes a simulator that runs it on top of an emulated version of an HPC system, submitting a trace of jobs to it and accelerating its execution. This tool enables experimentation without requiring the use of a real HPC system. In this section, we present a brief introduction to the simulator's structure and the improvements that we performed on it.


Fig. 13: Slurm simulator architecture. Slurm system calls are replaced to speed up execution. Scheduling is synchronized. Job submission is emulated.

Fig. 14: Simulated time running during RPC communications delays resource de-allocation, compromising backfilling's job planning and JobB's start.

Architecture: The architecture of Slurm and its simulator is presented in Figure 13. The Slurm daemons (slurmctld and slurmd) are wrapped by the emulator. Both daemons are dynamically linked with the sim_func library, which adds the functions required to support the acceleration of Slurm's execution. Also, slurmd is compiled to include a resource and job emulator. On the simulator side, the sim_mgr controls the three core functions of the system: execution time acceleration, synchronization of the scheduling processes, and emulation of the job submission. These functions are described below.

Time acceleration: In order to accelerate the execution time, the simulator decouples the Slurm binaries from the real system time. Slurm binaries are dynamically linked with the sim_func library, replacing the time, sleep, and wait system calls. The replaced system calls use an epoch value controlled by the time controller. For example, if the time controller sets the simulated time to 1485551988, any call to time will return 1485551988 regardless of the system time. This reduces the wait times within Slurm: e.g., if the scheduling is configured to run once every 30 simulated seconds, it may run once every 300 milliseconds of real time.

Scheduling and simulation synchronization: The original simulated time pace set by CSCS produces small speed-ups for large simulated systems. However, increasing the simulated time pace triggers timing problems because of the RPC nature of Slurm daemon communications: an RPC with sub-second real-time latency is measured by Slurm as hundreds of simulated seconds. Increasing the simulation pace has several negative effects. First, timeouts occur, triggering multiple RPC re-transmissions that degrade the performance of Slurm and the simulator. Second, job timing determinism degrades. Each time a job ends, slurmd sends an RPC notification to slurmctld, and its arrival time is considered the job end time. This time is imprecise if the simulated time increases during the RPC notification propagation. As a consequence, low utilization and starvation of large jobs (e.g., those allocating 30% of the resources) occur. Figure 14 details this effect: a large JobB is to be executed after JobA. However, JobA's resources are not considered free until two sequential RPC calls are completed (end of job and epilogue), lowering the utilization as they are not producing work. The late resource liberation also prevents JobB from starting but does not stop the jobs

that are programmed to start after JobB. As the process repeats, the utilization loss accumulates and JobB is delayed indefinitely.

The time controller component of the sim_mgr was modified to control a synchronization crossbar among the Slurm functions that are relevant to the scheduling timing. This solves the described synchronization problems by controlling the simulation time and avoiding its increase while RPC calls are traveling between the Slurm daemons.

Job submission and simulation: The job submitter component of the sim_mgr emulates the submission of jobs to slurmctld following the workload trace of the simulation. Before submitting each job, it communicates the actual runtime (different from the requested one) to the resource emulator in slurmd. Slurmctld notifies slurmd of the scheduling of a job through an RPC. The emulator uses the notification arrival time and the job runtime (received from the sim_mgr) to calculate the job end time. When the job end time is reached, the emulator forces slurmd to communicate to slurmctld that the job has ended. This process emulates the job execution and resource allocation.
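A Python analogue of these two mechanisms (the real implementation works by relinking the C daemons) could look as follows; the class and field names are purely illustrative.

    # Conceptual sketch: a controlled simulated clock plus job-end emulation.
    class SimClock:
        def __init__(self, epoch):
            self.now = epoch                  # value returned by the replaced time() calls
        def advance(self, seconds):
            self.now += seconds               # only advanced when no RPC is in flight

    class EmulatedJob:
        def __init__(self, job_id, start_epoch, actual_runtime_s):
            self.job_id = job_id
            self.end_epoch = start_epoch + actual_runtime_s
        def finished(self, clock):
            return clock.now >= self.end_epoch    # slurmd then reports completion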

3.6 Workload analyzer

ScSF includes analysis tools to extract relevant information across repetitions of the same experiment or to plot and compare results from multiple experimental conditions.

Value extraction and analysis: Simulation results are processed by the workload analyzer. The jobs in the fill, stabilization, and drain phases (Figure 6) are discarded; the module then extracts (1) for all jobs: wait time, runtime, requested runtime, user accuracy (estimating the runtime), allocated CPU cores, turnaround time, and slowdown, grouped by job size; (2) for all workflows and by type of workflow: wait time, runtime, turnaround time, and stretch factor; and (3) overall: median job wait time and mean utilization for each minute of the experiment. The module performs different analyses for different data types. Percentiles and histograms characterize the distribution and trend of the jobs' and workflows' variables. Integrated utilization (i.e., coreHoursProduced/coreHoursExecuted) measures the impact of the scheduling behavior on the system usage. Finally, customized analysis modules can be implemented and added to the analysis pipeline.
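For illustration, the kind of per-job post-processing described above can be sketched as follows; the field names are assumptions, not ScSF's schema.

    import numpy as np

    def slowdown(job):
        return (job["wait_s"] + job["runtime_s"]) / job["runtime_s"]

    def percentile_summary(values, percentiles=(5, 25, 50, 75, 95)):
        return {p: float(np.percentile(values, p)) for p in percentiles}

    def integrated_utilization(core_hours_produced, core_hours_executed):
        # Ratio used above to gauge the scheduler's impact on system usage.
        return core_hours_produced / core_hours_executed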

Repetitions and comparisons: Experiments are repeated with different ran- dom seeds to ensure that observed phenomena are not isolated occurrences. The workload analysis module analyzes all the repetitions together, merging the re- sults to ease later analysis. Also, experiments might be grouped if they differ only in one experimental condition. The analysis module studies these groups together to analyze the effect of that experimental condition on the system. For instance, some experiments are identical except for the workflow submission

method, which affects the number of workflows that get to execute in each experiment. The module calculates compared workflow turnaround times, correcting any possible skew in the results derived from the difference in the number of executed workflows.

Result analysis and plotting: Analysis results are stored in the database to allow review and visualization using the plotter component. This component includes tools to plot histograms (Figure 7), box plots, and bar charts of the median of jobs' and workflows' variables for one or multiple experiments (Figure 17). It also includes tools to plot the per-minute utilization (Figures 8 to 11) and the per-minute median job wait time in an experiment (Figures 12a and b), which allow observation of dynamic effects within the simulation. Finally, it also includes tools to extract and compare utilization values from multiple experiments.

4 Scheduling simulation framework use case

In this section, we describe a use case that demonstrates the use of ScSF. The case study uses the simulator to implement and evaluate a workflow-aware scheduling algorithm [5]. In particular, the use case includes modeling a real HPC system, implementing a new algorithm in the Slurm simulator, and defining the evaluation experiments. We also detail a distributed deployment of ScSF and present some examples of the results to illustrate the scalability of our framework.

4.1 Tuning the model

Experiments to evaluate a scheduling algorithm require representative workload and system models. In this use case, NERSC's Edison is chosen as the reference system. Its workload is modeled by processing almost four years of its jobs. Next, a Slurm configuration is defined to imitate Edison's scheduler behavior, including Edison's resource definition (number of nodes and hardware configuration); FCFS; backfilling with a depth of 50 jobs once every 30 s; and a multi-factor priority model that takes into account age (older, higher) and job geometry (smaller, higher). The workload tuning is completed by running a set of experiments to explore different job pressure and filling configurations to induce a stable four-hour wait time baseline (observed in Edison).
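For illustration, a slurm.conf fragment along the lines of the configuration described above might look as follows; the weight values are placeholders for the sketch, not the exact settings used in the use case.

    # Illustrative slurm.conf fragment (values are assumptions).
    SchedulerType=sched/backfill
    # backfill pass every 30 s over up to 50 jobs
    SchedulerParameters=bf_interval=30,bf_max_job_test=50
    PriorityType=priority/multifactor
    # smaller geometry and older age raise priority
    PriorityFavorSmall=YES
    PriorityWeightAge=1000
    PriorityWeightJobSize=1000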

4.2 Implementing a workflow scheduling algorithm in Slurm

The evaluated algorithm concerns the method used to submit workflows to an HPC system. As presented in Section 3.2, workflows are run as pilot jobs (i.e., a single job over-allocating resources) or chained jobs (i.e., task jobs linked by dependencies, which suffer long turnaround times).


Set   Wf. Submit          #Wfs.  Wf. Pres.  #Pres.     Sim. t.  #Reps  #Exps  Agg. Sim. t.
Set0  aware/single/multi  18     Period     1 per wf.  7d       6      324    2268d
Set1  aware/single/multi  6      Period     6          7d       6      648    4536d
Set2  aware/single/multi  6      Share      7          7d       6      756    5292d

Table 2: Summary of experiments run in ScSF.

However, the workflow-aware scheduling algorithm [5] is a third method that enables per-task resource allocation while minimizing the intermediate wait times. In this section, we present the integration of the algorithm in ScSF.

The algorithm integration requires modifying Slurm's job submission system and adding some actions on the job queue before and after scheduling happens. First, sbatch, Slurm's job submission RPC, and the internal job record data structure are extended so that jobs can include a workflow manifest. This enables workflow-aware jobs to be submitted as pilot jobs with an attached workflow description (manifest). Second, queue transformation actions are inserted before and after FCFS and backfilling act on the queue. Before they act, workflow jobs are transformed into task jobs that keep the original job priority. When the scheduling is completed, the original workflow jobs are restored. As a consequence, workflow task jobs are scheduled individually but, as they share the same priority, the workflow's intermediate waits are minimized.

4.3 Creating the experiments

The workflow-aware scheduling approach is evaluated by comparing its effect on workflow turnaround time and system utilization with that of the pilot job and chained job approaches. Three versions (one per approach) of each experiment are created to compare the performance of the three approaches under different conditions. Table 2 presents the three sets of experiments created. Workflows in set0 express different structural properties to study their interaction with the different approaches. Set1 studies the effect of the approaches on isolated workflows and includes two synthetic workflows plus four real ones (Montage, Sipht, Cybershake, FloodPlain [4]) submitted with different periods (0, 1/12h, 1/6h, 1/h, 2/h, 6/h). Set2 studies the effect of the approaches on systems increasingly dominated by workflows. It includes the same workflows as set1 submitted with different workflow shares (1%, 5%, 10%, 25%, 50%, 75%, 100%). In total, they sum 1,728 experiments, equivalent to 33 years of simulated time. Experiments are created and stored using a Python class that is initialized with all the experiment parameters. Synthetic workflow manifest files are created manually following the framework's manifest JSON format. Real workflow manifests are created using a workflow generator from the Pegasus project [4] that captures the characteristics of popular research workflows. ScSF includes a tool to transform the output of this generator into the expected JSON format.
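A hedged sketch of what defining one such experiment programmatically could look like is shown below; the class and argument names are illustrative of the Python class mentioned above, not its actual interface.

    from dataclasses import dataclass, asdict

    @dataclass
    class ExperimentDefinition:
        system_model: str = "edison2015"
        scheduling_method: str = "aware"     # or "single" (pilot) / "multi" (chained)
        workflow: str = "cybershake"
        workflow_policy: str = "share"       # submission pattern
        workflow_share: float = 0.25         # 25% of the workload's core hours
        simulated_time_h: int = 168          # 7 days
        seed: int = 1

    exp = ExperimentDefinition(seed=3)
    print(asdict(exp))   # e.g., serialized and stored for the experiment runners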


Fig. 15: Schema of the distributed execution environment: VMs containing the Slurm simulator are distributed in hosts at LBNL and UMU. Each VM is controlled by an instance of the experiment runner in the controller host at LBNL.

4.4 Running experiments at scale

This use case requires running 1,728 individual experiments that sum 33 years of simulated time. Estimating an average speedup of 10x, the simulations would require more than three years of real time. In order to reduce the real time required to complete this work, simulation is parallelized to increase throughput.

As presented in Section 3, the minimum experiment worker unit is composed of an instance of the experiment runner component and a VM containing the Slurm simulator. As shown in Figure 15, parallelization is achieved by running multiple worker units concurrently. To configure the infrastructure, VirtualBox's hypervisor is deployed on six compute nodes at the Lawrence Berkeley National Lab (LBNL) and 17 compute nodes at Umeå University (UMU). Over them, 161 Slurm simulator VMs are deployed. Each VM allocates two cores, four GB of RAM, and 20 GB of storage. The compute nodes have different configurations and, thus, the number of VMs per host and their performance are not uniform; e.g., some compute nodes host only two VMs and some host 15. All the experiment runners run in a single compute node at LBNL (Ubuntu, 12 cores x 2.8 GHz, 36 GB RAM). However, the VMs are not exposed directly through their host's NIC and required access from the control node over sshuttle [3], a VPN-over-ssh tool that does not require installation on the destination host. Even if both sites are distant, the network is not a significant source of problems since the connection between UMU and LBNL traverses two high performance research networks, Nordunet (Sweden) and ESnet (EU and USA). Latency is relatively low (170-200 ms), the data rate is high (firewall capped at ≈100 Mbit/s per TCP flow), and the stability is consistent.

4.5 Experimental performance

The experiments' wall clock time is characterized as a function of their experimental setup to understand the factors driving simulation speed-up. Figure 16 shows the median runtime of the experiments of one experiment set, grouped by scheduling method, workflow type, and workflow presence. For the same simulated time, simulations run longer under the chained job and workflow-aware approaches than under the pilot job approach. Also, for the chained job and aware approaches, experiments run longer if more workflows are present or the workflows include more task jobs.


Fig. 16: Median wall clock time for a set of simulations. More complex workloads (more workflows, larger workflows) present longer times. The pilot job approach presents shorter times. Simulated time is 168 hours (7 days).

As individual experiments are analyzed, longer runtimes, and thus smaller speed-ups, appear to be related to longer runtimes of the scheduling passes caused by higher numbers of jobs in the waiting queue. In summary, simulations containing numbers of jobs similar to real system workloads present median runtimes between 10 and 12 hours for 7 days (168 h) of simulated time, i.e., a 15x speedup. The speed-up degrades as experiments become more complex. Speed-ups under 1 are observed for experiments whose large job count would be hard to manage for a production scheduler (e.g., Montage-75%). To conclude, the limiting factor of the simulations' speed-up is the scheduling runtime, which, in this use case, depends on the number of jobs in the waiting queue.

4.6 Analyzing at scale

The analysis of the presented use case required synthesizing the results of 1,728 experiments into meaningful, understandable metrics. The tools described in Section 3.6 supported this task.

As an example, Figure 17 condenses the results of 324 experiments (six repetitions per experiment setting): the median workflow runtime speed-up (left) and value (right) observed for Cybershake, Sipht, and Montage, for different workflow shares and scheduling approaches. Results show that chained job workflows suffer much longer runtimes in all cases, while aware and pilot job workflows show shorter and similar runtimes.

5 Lessons learned

The initial design goal of ScSF was readiness, not scale, and its first deployment included four worker VMs. As the number of experiments and the simulated time grew in the use case (33 years), the resource pool had to be increased (161 VMs and 24 physical hosts), even expanding to resources in different locations.


Fig. 17: Comparison of median workflow runtime under different experimental conditions as speed-up (left) and absolute numbers (right). Data from the workflows in 108 experiments.

In this section, we describe some of the resulting problems and what we learned from them.

Loss-less experiment restart is needed: As the framework runs longer and on more nodes, the probability of node reboots becomes higher. In the months of experiments, our resources required rebooting due to power cuts, hypervisor failures, VM freezes, and system updates (e.g., we had to update the whole cluster to patch the Dirty Cow exploit [1]). Unfortunately, in ScSF, rebooting a worker host means that all simulation work in its VMs is lost. Also, if the controller host is rebooted, all the experiment runners are stopped and all the simulation work of the whole cluster is lost. For some of the longest experiments, the amount of work lost amounts to days of real time. The lesson learned is that experiments in ScSF should support graceful pause and restart so that resource reboots do not imply loss of work. This would be provided by a control mechanism to pause and restart worker VMs. Also, the experiment runner functionality should be hosted in the worker VM so it is paused with the VM, unaffected by any reboot.

Loaded systems' networks fail: Surges of experiment failures appeared occasionally: multiple VMs would become temporarily unresponsive to ssh connections when their hypervisor was heavily loaded. Subsequently, the experiment runner would fail to connect to the VM, and the experiment was considered failed. The lesson learned is that saturated resources are unreliable. All runner-VM communications were hardened by adding retries, which greatly reduced the failure rate.

Monitoring is important: Many types of failure impact experiments, such as simulator or Slurm bugs, communication problems, resource saturation in the VMs, or hypervisor configuration issues. Failures are expected, but ScSF lacked the tools and information to quickly diagnose the cause of the problems. The lesson learned is that monitoring should register metadata that allows quick diagnosis of problems. As a consequence, the level of detail in experiment logs was increased and a mechanism to retrieve Slurm crash debug files was added.

The system is as weak as its weakest link: All of ScSF's data and metadata is stored in a MySQL database hosted in the controller host.

In a first experiment run, at 80% of completed experiments, the hard disk containing the database crashed and all the experimental data was lost. Two months of real time were lost and an article submission deadline was missed. Currently, the data is subject to periodic backups and the database is replicated.

6 Conclusions

We present ScSF, a scheduling simulation framework that provides tools to support all the steps of the scheduling research cycle: modeling, generation, simulation, and result analysis. ScSF is scalable: it can be deployed over distributed resources to run and manage multiple concurrent simulations, and it provides tools to synthesize results over large experiment sets. The framework produces representative results by relying on Slurm, which captures the behavior of real system schedulers. ScSF is also modular and can be extended by researchers to generate customized workloads or calculate new analysis metrics over the results. Finally, we improved the Slurm simulator, which now achieves up to 15x simulated-over-real-time speed-ups while preserving its determinism and experiment repeatability.

This work provides a foundation for future scheduling research. ScSF will be released as open source, enabling scheduling scientists to concentrate their effort on designing scheduling techniques and evaluating them in the framework. Also, we share the experience of using ScSF to design our own scheduling algorithm and evaluating it through the simulation of a large experiment set. This experience shows that the framework is capable of simulating 33 years of real system time in less than two months over a small distributed infrastructure. It also constitutes a guide for future users of the framework. Finally, our experiences building ScSF and running it at scale might be of use to researchers who are building similar systems.

7 Acknowledgments

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). The National Energy Research Scientific Computing Center, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Financial support has been provided in part by the Swedish Government's strategic effort eSSENCE and the Swedish Research Council (VR) under contract number C0590801 (Cloud Control). Special thanks to Stephen Trofinoff and Massimo Benini from the Swiss National Supercomputing Centre, who shared with us the code base of their Slurm simulator. Also, we would like to thank the members of the DST department at LBNL and the distributed systems group at Umeå University who administered or gave up the compute nodes supporting the use case.

References

1. Dirty cow (January 2017), https://dirtycow.ninja/


2. SchedMD (January 2017), https://www.schedmd.com/
3. sshuttle (January 2017), https://github.com/apenwarr/sshuttle
4. WorkflowGenerator (January 2017), https://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator
5. Alvarez, G.P.R., Elmroth, E., Östberg, P.O., Ramakrishnan, L.: Enabling workflow aware scheduling on HPC systems. In: Under review for HPDC 2017
6. Alvarez, G.P.R., Östberg, P.O., Elmroth, E., Antypas, K., Gerber, R., Ramakrishnan, L.: Towards understanding job heterogeneity in HPC: A NERSC case study. In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). pp. 521–526. IEEE (2016)
7. Declerck, T.M., Sakrejda, I.: External Torque/Moab on an XC30 and Fairshare. Tech. rep., NERSC, Lawrence Berkeley National Lab (2013)
8. Feitelson, D.: Parallel workloads archive 71(86), 337–360 (2007), http://www.cs.huji.ac.il/labs/parallel/workload
9. Feitelson, D.G.: Workload modeling for computer systems performance evaluation. Cambridge University Press (2015)
10. Feitelson, D.G., Tsafrir, D.: Workload sanitation for performance evaluation. In: 2006 IEEE International Symposium on Performance Analysis of Systems and Software. pp. 221–230. IEEE (2006)
11. IBM: Platform Computing - LSF. http://www-03.ibm.com/systems/technicalcomputing/platformcomputing/products/lsf/sessionscheduler.html (January 2014)
12. Jackson, D., Snell, Q., Clement, M.: Core algorithms of the Maui scheduler. In: Job Scheduling Strategies for Parallel Processing. pp. 87–102. Springer (2001)
13. Kannan, S., Mayes, P., Roberts, M., Brelsford, D., Skovira, J.: Workload Management with LoadLeveler. IBM Corp. (2001)
14. Klusáček, D., Rudová, H.: Alea 2 – job scheduling simulator. In: Proceedings of the 3rd International ICST Conference on Simulation Tools and Techniques (SIMUTools 2010). ICST (2010)
15. Lublin, U., Feitelson, D.G.: The workload on parallel supercomputers: modeling the characteristics of rigid jobs. Journal of Parallel and Distributed Computing 63(11), 1105–1122 (2003)
16. Lucero, A.: Simulation of batch scheduling using real production-ready software tools. Proceedings of the 5th IBERGRID (2011)
17. Rodrigo, G., Östberg, P.O., Elmroth, E., Antypas, K., Gerber, R., Ramakrishnan, L.: HPC system lifetime story: Workload characterization and evolutionary analyses on NERSC systems. In: The 24th International ACM Symposium on High-Performance Distributed Computing (HPDC) (2015)
18. Schwiegelshohn, U.: How to design a job scheduling algorithm. In: Workshop on Job Scheduling Strategies for Parallel Processing. pp. 147–167. Springer (2014)
19. Trofinoff, S., Benini, M.: Using and Modifying the BSC Slurm Workload Simulator. In: Slurm User Group (2015)
20. Yoo, A., Jette, M., Grondona, M.: SLURM: Simple Linux Utility for Resource Management. In: Feitelson, D., Rudolph, L., Schwiegelshohn, U. (eds.) Job Scheduling Strategies for Parallel Processing, Lecture Notes in Computer Science, vol. 2862, pp. 44–60. Springer Berlin / Heidelberg (2003)
21. Zakay, N., Feitelson, D.G.: Preserving user behavior characteristics in trace-based simulation of parallel job scheduling. In: IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2014. pp. 51–60. IEEE (2014)

Paper V

Enabling Workflow Aware Scheduling on HPC Systems
Gonzalo P. Rodrigo, Erik Elmroth, Per-Olov Östberg, and Lavanya Ramakrishnan

Submitted 2017

Enabling Workflow Aware Scheduling on HPC systems

Gonzalo P. Rodrigo*, Erik Elmroth, Per-Olov Östberg, Lavanya Ramakrishnan+
Dept. Computing Science, Umeå University, SE-901 87, Umeå, Sweden
Lawrence Berkeley National Lab, 94720, Berkeley, California+
{gonzalo, elmroth, p-o}@cs.umu.se, [email protected]

ABSTRACT

Workflows from diverse scientific domains are increasingly present in the workloads of current HPC systems. However, HPC scheduling systems do not incorporate workflow specific mechanisms beyond the capacity to declare dependencies between jobs. Thus, when users run workflows as sets of batch jobs with completion dependencies, the workflows experience long turnaround times. Alternatively, when they are submitted as single jobs, allocating the maximum requirement of resources for the whole runtime, they waste resources, reducing the HPC system utilization. In this paper, we present a workflow aware scheduling (WoAS) system that enables pre-existing scheduling algorithms to take advantage of the fine grained workflow resource requirements and structure, without any modification to the original algorithms. The current implementation of WoAS is integrated in Slurm, a widely used HPC batch scheduler. We evaluate the system in simulation using real and synthetic workflows and a synthetic baseline workload that captures the job patterns observed over three years of the real workload data of Edison, a large supercomputer hosted at the National Energy Research Scientific Computing Center. Finally, our results show that WoAS effectively reduces workflow turnaround time and improves system utilization without a significant impact on the slowdown of traditional jobs.

1 INTRODUCTION

In recent years, we have seen an increase in the processing of large amounts of scientific data and high-throughput processing at HPC centers. These scientific workloads are changing the landscape of software ecosystems on HPC centers that have traditionally supported large communication-intensive MPI jobs. The scientific workloads are increasingly composed as scientific workflows with complex dependencies.

The HPC batch schedulers still operate with a job-centric view and do not account for the complexities and dependencies of scientific workflows. Scientific workflow tools present in HPC centers often run workflows as chained jobs (jobs with dependencies) or as a pilot job (a single job containing the entire workflow). These approaches both have their drawbacks. Workflows run as chained jobs have very long and unpredictable turnaround times as they include the intermediate wait times for each job in the critical path. Workflows run as pilot jobs are likely to have shorter turnaround time, since the intermediate tasks do not wait for resources. However, they allocate the maximum required resources over the length of the workflow for its entire runtime. Thus, resources are wasted as they are allocated but idle in parts of the workflow.

Scientific workflows also provide a unique opportunity for future HPC scheduling and resource management systems. Scientific workflows include detailed knowledge of the complex pipeline and dependencies between the tasks that can be used to gain efficiency in the system. In this work, we present the design and implementation for extending existing batch schedulers with workflow awareness. Our workflow aware scheduling system (WoAS) takes advantage of dependencies to achieve short turnaround times while not wasting resources. WoAS enables pre-existing scheduling algorithms to be aware of the fine grained workflow resource requirements and structure, without any modification to the original scheduling algorithms. WoAS uses the knowledge of the workflow graph to schedule individual tasks while minimizing the wait times for the jobs.

We implement WoAS within Slurm, a common HPC workload manager. Execution of diverse workflow workloads was simulated over a model of a real system (NERSC's Edison) [18], [1]. In our evaluation, we compare its performance with the chained and pilot job approaches. The experiment set is composed of 271 scenarios, covering different workflow types and submission patterns. Simulated time accounts for 253,484 hours (29 years) of system time and 3.8 million compute core-years.

Our experiments show that in most workloads run with WoAS, workflows show significantly shorter turnaround times than the chained job and single job approaches without wasting resources. The impact on non-workflow jobs was minimal except for workloads heavily dominated by very large workflows where performance limitations of the backfilling (queue depth limit) interfered with their scheduling.

Specifically, in this paper, our contributions are:
- We design a workflow aware scheduling system and algorithms that produce turnaround times with almost no intermediate wait times and wastage of resources.
- We present the WoAS implementation and its integration with Slurm. The WoAS system will soon be made available open source.
- We evaluate and present the results of a detailed comparison of the workflow performance and system impact of WoAS, the pilot job, and the chained job approaches for diverse workflow workloads simulated over the model of Edison, a supercomputing system at the National Energy Research Scientific Computing Center (NERSC).

The rest of the paper is organized as follows. In Section 2, we present the life cycle of workflows and current scheduling approaches. The Workflow Aware Scheduling technique (WoAS) is discussed in Section 3. We present the methodology to compare WoAS to the existing workflow scheduling methods in Section 4 and experimental results in Section 5. We discuss our conclusions in Section 6.

* Work performed while working at the Lawrence Berkeley National Lab.


Figure 1: Cybershake workflow executed using chained and pilot job approaches. The chained jobs approach increases execution time due to the wait times. In the single job, there are no intermediate wait times but it wastes 600 cores for 4h.

Figure 2: Classical batch scheduler with the waiting job queue in its center, whose jobs are ranked by the priority engine and scheduled by FCFS and backfilling.

2 BACKGROUND

In this section, we describe the state of the art and challenges of managing scientific workflows on HPC systems and discuss related work.

2.1 A workflow's life-cycle

Workflows are represented as Directed Acyclic Graphs (DAGs), i.e., each vertex represents one or more work tasks, and the edges express control or data dependencies between the (vertices) tasks. The first step to run a workflow on an HPC system is to map the DAG into one or more batch jobs, while respecting the data and control dependencies expressed by the edges. Users manually do the mapping or rely on workflow managers [26], which might also automate the submission and control of the workflow job(s). There are different mapping techniques governed by targets such as minimizing cost [27], minimizing runtime and turnaround time [4, 25], or tolerating faulty, distributed resources [17].

Once the execution plan is defined as a list of jobs and dependencies, users usually follow one of two strategies to submit a workflow. The strategies balance between lower resource consumption (as chained jobs) and shorter turnaround time (as a pilot job). Figure 1 illustrates these approaches for the Cybershake workflow ([3]), which is used to simulate geological structures to characterize earthquake hazards in a region.

2.1.1 Workflow as chained jobs. In this approach, one batch job is submitted per execution plan job. Current batch schedulers allow users to specify dependencies between batch jobs. The scheduler then forces jobs to wait to start until the completion of their dependencies. Alternatively, users or their workflow engines might submit a job when the ones it depends upon have completed. Each job receives the exact amount of resources required to run and no allocated hardware resources are intentionally left idle. However, the workflow runtime will include the wait times endured by each of the jobs in the workflow's critical path. As described in Section 2.2, job priority systems do not consider a job until its dependencies are resolved. As a consequence, job wait times for explicit dependencies are equal to the ones observed by submitting jobs when their dependencies are solved.

Figure 1a shows Cybershake's execution plan submitted as chained jobs. Even if both jobs are submitted at the same time, Job2 has to wait four hours from the point Job1 is completed, as its priority (and thus position in the waiting queue) does not increase from its initial value until its dependencies are resolved. The result is that the workflow runtime (i.e., start of first job to completion) is nine hours, of which wait time is four hours (44%), while the turnaround time (i.e., from job submission to completion) is 13 hours, of which eight hours account for wait time (61%).

2.1.2 Workflow as a pilot job. In this approach, the execution plan is submitted as a single job. The job's time limit is set to the expected runtime of the critical path with no intermediate job wait time. The job's resource request is the maximum resource allocation needed at any point in the workflow. As a consequence, the runtime of the workflow is the minimum possible, but some allocated resources might be idle, and thus wasted.

Figure 1b presents Cybershake's execution plan submitted as a pilot job. The workflow wait time is larger than the one faced by the chained job approach. However, the runtime is the minimum possible as there is no intermediate wait time. In this case, the wait time for the pilot job to start is smaller than the wait time for the two jobs in the chained job approach. However, during the first four hours of the workflow 600 CPU cores are allocated to the workflow but left unused, totaling 2400 idle core hours. This is the caveat of this approach: turnover is better but at a higher cost of consumed resources over the same work. This approach would work well for workflows where the difference between the minimal and maximum width of the workflow is not significant.

2.2 Classic HPC scheduling systems

Figure 2 presents the schema of a classic HPC scheduler with a queue of waiting jobs in its center: a) jobs are inserted in a queue when users submit them; b) scheduling algorithms select which jobs should start running and extract them from the queue. A classical HPC batch scheduler incorporates at least the following scheduling algorithms: FCFS (First-Come, First-Served) [6], running the first job of the queue if enough resources are available; and backfilling [11], scanning jobs in the queue in order to run them if enough resources are available and if they would not delay the start time of previous jobs in the queue.

Combined, they achieve high utilization and reasonable job turnaround times in HPC systems.

Schedulers also include internal prioritization engines to manage turnaround time by ranking waiting jobs following some administrator-set policy. Backfilling algorithms try to schedule sooner those jobs with a higher rank (priority).

Batch schedulers also understand job dependencies (relationships between jobs); the most common one enforces that a job cannot start until n previous jobs end successfully. Dependencies can be used to express the structure of the jobs within a workflow. However, we observed that in existing HPC schedulers dependencies affect the job priority calculation. For example, job age priority engines consider that a job's age starts when its dependencies are resolved. In such a schema, the submission time of the job will be its dependency resolution time, no matter when it was submitted. In Figure 1a we show the impact of this policy on a workflow's turnaround time.

2.3 Related work

Complex experiments in scientific fields like high-energy physics, geophysics, climate study, or bioinformatics require distributed resources such as computational devices, data-sets, applications, and scientific instruments. The orchestration of such processes is organized as scientific workflows: collections of tasks structured by their control and data dependencies [26]. Distributed scientific workflows have been explored in detail in the last few years. Due to location-specific or large resource requirements, a large portion of the workflows are distributed [16], i.e., their tasks run and their data is stored in different compute centers. In such environments, their execution depends on user inputs, specific resource characteristics, and run-time resource availability variations [9].

In other cases, workflows are run within the same compute facilities (single site workflows); however, their tasks might be too large, or require resource sets different enough to force running them as different entities. This work focuses on the scheduling of the jobs of single site workflows.

Scheduling, automation, and execution systems for scientific workflows have been largely studied. Pegasus [4], Askalon [5], Koala [14], and VGrADS [17] are examples of Grid workflow managers that include different approaches to workflow mapping, meta-scheduling, execution, task management, monitoring, and fault tolerance. However, they do not propose specific solutions to schedule the jobs of a workflow inside each of the Grid sites, which is the responsibility of the site scheduler.

There is also work on specific Grid workflow scheduling algorithms like Myopic [25], Min-Min [13], Max-Min [13], Sufferage [13], the Heterogeneous-Earliest-Finish-Time (HEFT) algorithm [24], or Hybrid [20]. These algorithms schedule jobs within, between, or across workflows under different strategies and objective functions. However, they rarely schedule regular jobs and workflow jobs together, which is the main focus of this work.

Scientific workflow management systems for high throughput applications have become more popular in the last years. Fireworks [8], QueueDO (QDO) [2], Falkon [15], and Swift [29] offer tools for workflow composition and management, execution, job packing of tasks (serial, OpenMP, MPI, and hybrid), and monitoring. These systems may deploy their own execution frameworks or run their tasks in workers packed inside HPC jobs which, in the end, are submitted as a pilot job or chained jobs.

Finally, data intensive and streaming workflows have become very important for the data processing within large IT companies. Frameworks like Hadoop [22], Spark [28], or Heron [10] offer workflow composition, management, and automation.

Clusters with batch jobs and services present the challenge of scheduling different workloads whose metrics cannot be compared. In that context, multilevel scheduling approaches have appeared allowing independent schedulers for different workloads (Mesos [7]), smart resource managers (Omega [21]), or cloud-inspired two-level scheduling for HPC systems (A2L2 [19]).

3 WORKFLOW AWARE SCHEDULING

The Workflow Aware Scheduler (WoAS) provides an interface that allows users to submit workflow jobs. As illustrated in Figure 3, a user submits a workflow job, which is a batch job that includes a manifest describing its internal workflow structure (e.g., two task jobs in the example). The workflow job is stored in the system's waiting queue.

There are three separate threads that work on the waiting queue: WoAS, the scheduler, and the priority engine. WoAS is always activated between the scheduler and the priority engine. Thus the order of execution would be [WoAS, Scheduler, WoAS, Priority Engine]. When WoAS acts before the scheduler, it substitutes each workflow aware job by the task jobs described in its manifest, configuring the corresponding dependencies and placing them in the same position of the queue as the original job. The resulting version of the queue is the scheduler view of the queue.

Once the queue is transformed, the scheduling algorithms act on it. In our example, backfilling selects and starts the first task job of the example workflow, allocating exactly the resources that it requires. After the scheduling phase is over, WoAS transforms the waiting queue, removing the task jobs of workflows that have not started and restoring the corresponding workflow-aware jobs. The current state of the queue is the priority view of the queue. The priority engine periodically processes the waiting queue, calculating the priority of each job and ordering jobs accordingly.

The system continues repeating the cycle of a) recalculating the jobs' priority, b) transforming the queue into its scheduler view, c) doing a scheduling pass (scheduling the second job of the example workflow when the first has completed), and d) restoring the queue to its priority view.

In this section, we describe the steps of this process in detail.

3.1 Workflow job submission

In WoAS, users submit a workflow as a workflow aware job. This is an extension of the way workflows are represented in the pilot job approach. Users submit a job allocating the maximum resources required in the workflow for the minimum duration of its critical path, similar to a pilot job. However, a manifest describing the workflow is attached to the batch script.


Figure 3: A workflow in the WoAS scheduling model from its submission to execution start.

Figure 4 is an example workflow description in JSON format for the LongWide workflow (defined in Table 2). It contains the definition of all the tasks within the workflow, including their resource allocation requirement (allocated CPU cores for an estimated runtime), command or application to be executed (cmd), and dependencies with other tasks (deps, where SWide depends on completion of SLong). This manifest information is used by WoAS to transform a workflow aware job (priority view) into its task jobs (scheduler view).

    {"tasks": [
        {"id": "SLong", "cmd": "./SLong.py",
         "cores": 48, "runtime": 14400.0},
        {"id": "SWide", "cmd": "./SWide.py",
         "cores": 480, "runtime": 3600.0,
         "deps": ["SLong"]}]}

Figure 4: LongWide workflow manifest in JSON format.

3.2 Workflow Aware Scheduling system

The Workflow Aware Scheduling system (WoAS) is a job waiting queue model to bring workflow awareness to an HPC scheduler by offering different job lists (views) depending on the scheduler element that is interacting with the queue. The dual-view enables WoAS to enforce general scheduling behaviors such as the ones in Section 3.3, without requiring changes to the code of the scheduler elements interacting with the queue.

WoAS controls the access to the waiting queue and, depending on the scheduler component interacting with it, presents two views:

The priority view. In this view, each workflow aware job is presented as a single job (the one submitted by the user). This view is the one presented to the priority engine. As a consequence, the priority and queue position of each workflow is based on the workflow aware job characteristics (submission time, geometry). All tasks in a workflow have the same priority or start with the same priority?

The scheduler view. In this view, each workflow aware job is present in the waiting queue through instances of its internal task jobs (and corresponding dependencies) placed in the same position of the queue where the original workflow aware job was. This view is the one presented to the scheduler algorithms, so they can schedule the workflow task jobs individually.

3.3 Workflow awareness in WoAS

In this section, we discuss the impact of the views model on the workflow awareness in the scheduler. Specifically, there are three behaviors. First, workflow task job level scheduling ensures that the allocated resources are the minimum possible. Second, the intermediate wait times are minimized to avoid the ones observed in the chained job approach. Finally, this minimizes system gaming, as users do not have to ask for strange resource requests to make sure the tasks in their workflows get the correct priority.

Workflow awareness is a consequence of the interaction of the scheduler with the views and the way a job's priority information is transferred when the queue is transformed between views.

3.3.1 Workflow task level job scheduling. In WoAS, the scheduling algorithms schedule workflow task jobs, assigning the precise required resource allocation to each step of the workflow. This is possible because the scheduling algorithms act on the scheduler view provided by WoAS. In opposition to the pilot job approach, even if workflows are submitted as a single job, WoAS ensures that the scheduling algorithms will see the workflows as their task jobs. This characteristic is what allows WoAS not to waste resources to run workflows, even if they are submitted as a single job.

3.3.2 Minimization of the intermediate wait times. With WoAS, when the first task job of a workflow is started under the scheduling view, the rest of the workflow task jobs remain related by their dependencies in the waiting queue. This situation is similar to how task jobs stay in the queue for the chained job approach.

However, in the chained job approach, intermediate wait times might be very long. Classical schedulers consider jobs with non-resolved dependencies as not "submitted", so their priority does not increase as they wait. From the moment that the jobs they depend on are completed, task jobs have to wait as if they had just been submitted, even if they had been waiting in the waiting job queue for a much longer time.

WoAS reduces intermediate wait times by propagating the priority attributes of the original workflow aware job to all its tasks. All the tasks have the same geometry priority factor and submit time as the original workflow aware job. As a consequence, under the scheduling view, all the tasks in a workflow are positioned in adjacent positions in the waiting queue. If the first task job starts, its queue position should be close to the top. As all the workflow task jobs have similar queue positions, once the first task job is completed, the following one will still be in a good queue position to be started. In such situations, the intermediate wait time should be close to the time until adequate resources for that task job are available. This time can be significantly shorter than the time that it would take for that task job to progress from the bottom to the top of the priority queue in a highly utilized system.

Propagation of the priority information is performed by a combination of the views and operations within WoAS.


A workflow aware job's priority information is set during the priority calculation under the priority view. In our system, the priority of a job depends on two factors:

1) The job's geometry factor (smaller job, higher priority). It is calculated only once in the life of a job, in the first priority calculation process that considers it.

2) The job's age factor (older job, higher priority), which is recalculated in every priority calculation process. It depends on the time the job was submitted.

Algorithm 1 Show scheduler view actions.

    def woas_show_scheduler_view():
        global waiting_queue
        for job in list(waiting_queue):
            if is_workflow_aware_job(job):
                remove_job(waiting_queue, job)
                for task_desc in job.manifest["tasks"]:
                    new_job = create_job(task_desc)
                    new_job.prio.geometry = job.prio.geometry
                    new_job.prio.age = job.prio.age
                    new_job.submit_time = job.submit_time
                    new_job.copy_wf_job = job
                    insert_job(waiting_queue, new_job)

The priority information of the workflow aware job is propagated to its task jobs through the operation woas_show_scheduler_view, which transforms the waiting queue into the scheduler view. The detailed actions of this operation can be followed in Algorithm 1, where each task job receives the workflow aware job's geometry factor, age factor, and submit time.

This propagation has three consequences for future priority calculations of all the task jobs of the same workflow. First, future task job age factor calculations will be based on the workflow's submit time. Second, the geometry factor of a job is set, so the priority engine does not recalculate it; thus, future task job priority calculations will be based on the workflow aware job's geometry, not their own. Finally, all task jobs of the same workflow will have the same priority (the same the workflow aware job would have) since they have the same geometry factor and same submit time.

This ensures that all task jobs of the workflow will have the same priority and will occupy a similar position in the waiting queue, which leads to the minimization of the intermediate wait time.

As a final note, the priority propagation of non-started workflows is closed by the woas_show_priority_view operation. As it transforms the queue into its priority view, it enforces that if a workflow has not started, it becomes again the same workflow aware job, with the same priority factors.

3.3.3 Minimize system gaming. The priority propagation mechanism described in Section 3.3.2 has another side effect. Since the task jobs' priority factors are the same as the ones of the workflow aware job, the waiting time of the first task job is equivalent to the waiting time of a job with the geometry and submission time of the workflow aware job.

This is a desired effect to stop users from gaming the system, i.e., users submitting workflows where the first job is very small, expecting a short wait time to then run larger task jobs. This takes advantage of the short wait time minimization of WoAS: as the workflow wait time depends on the workflow aware geometry, such a schema would only produce longer wait times.

3.4 Batch Scheduler integration

Algorithm 2 Simplified classical scheduler algorithm with WoAS calls to enable the views model.

     1  def scheduling_loop_with_WoAS():
     2      while True:
     3          if time_to_check_priority():
     4              do_priority_calculations()
     5          if (time_to_do_fifo() or
     6                  time_to_do_backfilling()):
     7              woas_show_scheduler_view()     # WoAS specific
     8              if time_to_do_fifo():
     9                  do_fifo_scheduling()
    10              if time_to_do_backfilling():
    11                  do_backfilling_scheduling()
    12              woas_show_priority_view()      # WoAS specific

WoAS was incorporated into the scheduler by modifying the core batch scheduling loop. Algorithm 2 describes a simplified representation of such a process, in which a phase where job priorities are recalculated (line 4) alternates with another in which the scheduling algorithms act (lines 8-11). In such a model, introducing WoAS does not require changing the priority or scheduler behavior. It requires adding just two actions to the loop, as listed below.

1) Adding woas_show_scheduler_view before the scheduling phase starts (line 7). This function transforms the waiting queue so that the following code (the scheduling phase) sees the waiting queue through the scheduler view.

2) Adding woas_show_priority_view after the scheduling phase ends (line 12). This function restores the waiting queue into the priority view, so the priority actions (if executed) act over that view.

In Slurm, the priority and scheduling components run concurrently, and exclusive access to the queue is enforced through a lock. WoAS is integrated by adding the woas_show_scheduler_view call just after the scheduling code acquires the queue lock. Next, it adds woas_show_priority_view just after the scheduling code frees the queue lock. This ensures a behavior equivalent to Algorithm 2.

4 METHODOLOGY

We evaluate the workflow aware system using a Slurm simulator and analyze the resulting scheduling logs. In this section, we describe our simulator setup, metrics, and experiment definitions.

4.1 System

NERSC's Edison is the reference system chosen to be emulated and to model the baseline workload. Edison is a Cray XC30 supercomputer, with 6,384 nodes, 24 cores per node, and a total of 133,824 cores and 357 TB of RAM, installed in 2014. It uses an Aries interconnect and can produce a peak of 2.57 PFLOPS. Edison's hardware and workload are representative of systems and applications present in the high performance scientific community.


4.2 Simulation framework system will have have enough pending work to support the wait We implemented WoAS in Slurm, since it is increasingly used in time baseline, but with not to much to signi€cantly increase the job high performance systems. Œis functional implementation of a wait time as the workload scheduling progresses. Also, the simu- WoAS enabled Slurm will be distributed as open source. Also, lator uses a system cold-start stabilization period of one day. Œis previous work by the Barcelona Supercomputing Center (BSC) workload is not representative of a regular day systems operation [12] and the Swiss Supercomputing Center (CSCS) [23] provided and is discarded for the analyses. a Slurm simulator base code. We extended the simulator for our Finally, the workload generator uses a random number generator experiments. Œe simulator allows us to run experiments up to 20x that can be initialized with a seed. Œe same seed always produces faster than real time and run multiple simulations in parallel (up to the exact same regular jobs and workƒow submission times, inde- 200). pendently of the workƒow scheduling system chosen (as long as Œe Slurm scheduler is con€gured similar to current HPC sys- the same workload con€guration is used). Œis is used to do a fair tems and uses FIFO, back€lling, and it gives higher priority to smal- comparison between di‚erent scheduling techniques for the same ler jobs. However, to reduce complexity of the experiments and experiment con€guration. ease analysis, di‚erentiated queues or QoS levels are not con€gured Œe described workload analysis and modeling tools; the work- in our simulator. Œese features provide user-level conveniences load generator; the framework to de€ne and run experiments; the and will translate to the workƒow awareness and are not central to tools to process and analyze experiment results; and the improve- the focus of our experiments. ments on the Slurm simulator, were developed in the context of Œe core of our simulator is the Slurm scheduler. Slurm is con- this work. €gured to use the desired scheduling method (chained jobs, pilot job, or workƒow aware). A‰er con€guration, the simulator starts Slurm and submits the workload to it, emulating the user behavior. 4.3 Evaluation metrics Œe scheduling process is run for a con€gured simulation time (5 In this section, we present the metrics used to compare experiment days plus an extra for cold start stabilization) and the scheduler results and the method to calculate them. logs are registered in a MySQL database for later analysis. An experiment is de€ned by its workload characteristics, a sche- 4.3.1 Performance metrics. In our analyses, we use three work- duling method choice, a simulated system con€guration, a target ƒow (wait time, run time and turnaround time) and two system simulated time, and a random seed. To run an experiment, the (system utilization, job slowdown) performance metrics to compare workload is generated according to the workload characteristics. the pilot job, chained job, and WoAS approaches. Workƒow characteristics include the characteristics of the real HPC Workƒow wait time (wW ) is the time between the submission of system workload a‰er regular jobs are modeled, a list of speci€c the €rst job of the workƒow and its execution start. Smaller wait workƒows present in the workload, and their submission paŠerns. times are preferred. It depends on the load in the system (waiting Finally, each experiment is repeated using six di‚erent random work vs. 
4.3 Evaluation metrics

In this section, we present the metrics used to compare experiment results and the method used to calculate them.

4.3.1 Performance metrics. In our analyses, we use three workflow performance metrics (wait time, runtime, and turnaround time) and two system performance metrics (system utilization and job slowdown) to compare the pilot job, chained job, and WoAS approaches.

Workflow wait time ($w_W$) is the time between the submission of the first job of the workflow and its execution start. Smaller wait times are preferred. It depends on the load in the system (waiting work vs. compute capacity, with higher loads implying overall longer waiting times), the geometry (smaller jobs tend to wait less due to backfill), and priority (higher priority tends to imply shorter wait time).

Workflow runtime ($r_W$) is the time between the execution start of the first job and the execution completion of the last job of the workflow. It includes the runtime of the jobs in the critical path of the plan and the wait time between them. Smaller runtimes indicate less waste between the tasks of the workflow. The minimum possible workflow runtime is the sum of the runtimes of the jobs in the critical path (as they run back to back).

Workflow turnaround time ($t_W$) is the time between the submission of the first job and the execution completion of the last job of the plan. Smaller values are better. It is obtained as the sum of $w_W$ and $r_W$, and thus depends on the factors the previous two depend on.

Actual utilization during a time period $t$ is $(\sum_i corehours_{J_i} - \sum_i waste_{W_i}) / (cores_S \cdot t)$, where $corehours_{J_i}$ are the core hours allocated by jobs and workflows that are executed, $waste_{W_i}$ is the number of core hours allocated by a workflow that are not assigned to an internal task or job, and $cores_S$ is the number of cores of the compute system. This metric is a variation of the classical utilization that takes into account that workflows might allocate resources but not use them throughout their runtime. It measures the actual work done over the system capacity, not just the allocation. This is relevant to measure since pilot job workloads might show a high theoretical classical utilization and hide the fact that resources might be allocated but not used.
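A minimal sketch of how the actual utilization metric can be computed from scheduler logs is shown below; the record field names are illustrative assumptions, not the simulator's schema.

    # Minimal sketch of the actual utilization metric defined above.
    # Record field names are illustrative assumptions, not the simulator's schema.

    def actual_utilization(jobs, workflows, system_cores, period_hours):
        # Core hours allocated by all executed jobs (workflow jobs included).
        allocated_core_h = sum(j["cores"] * j["runtime_h"] for j in jobs)
        # Core hours allocated by workflows but never assigned to an internal task.
        wasted_core_h = sum(w["allocated_core_h"] - w["used_core_h"] for w in workflows)
        return (allocated_core_h - wasted_core_h) / (system_cores * period_hours)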


Table 1: Workflow characteristics for workflow groups: critical path size, pilot job geometry (rtime and max cores), workflow tasks usage (usage), potential wasted resources (waste), and a profile of the allocated resources in time if the critical path is run with no intermediate waits. (Profile plots not reproduced.)

  Group A: Workflow critical path length.
    Geometry:    n jobs / rtime: n h / max 240 cores
    Usage/Waste: 240*n core-h / 0 core-h
  Group B: Allocated cores, overall vs. 1st job.
    Geometry:    2 jobs / rtime: 2 h / max 240*n cores
    Usage/Waste: 240 + n*240 core-h / (n-1)*240 core-h
  Group C: Allocated cores and rtime, overall vs. first job.
    Geometry:    2 jobs / rtime: 2n h / max 240*n cores
    Usage/Waste: 240 + n*(2n-1)*240 core-h / n*(n-1)*240 core-h

Job's slowdown is measured as $(r_J + w_J) / r_J$, i.e. the job's turnaround time divided by its runtime. This metric allows us to compare the wait time of jobs with different runtimes. We use it to measure the impact of the different workflow scheduling techniques on the non-workflow jobs. We calculate this metric for non-workflow jobs grouped in three different sizes ([0, 48), [48, 960), and [960, ∞) core hours). The median values of this metric for each job group are used in comparisons.

4.3.2 Metrics calculation. All the metrics of this work are obtained over the aggregation of the results of multiple repetitions of the same experiment. To keep the meaning of each metric, the aggregation method differs per metric.

For the workflow performance metrics, the performance values of the $m_i$ first workflows of each repetition $i$ are aggregated and then the percentile metrics are calculated. $m_i$ of a repetition of an experiment is the minimum number of workflows completed in the three versions (WoAS, pilot job, and chained job) of that repetition $i$. This pre-selection is required to compare similar datasets, since these metrics cannot be calculated for incomplete workflows.

Actual utilization for an experiment is calculated as the mean of the observed actual utilization in the six repetitions. This is equivalent to calculating the utilization of an experiment that is the concatenation of the six repetitions. For the aggregated calculation of the job's slowdown, all the non-workflow job slowdown values in the repetitions are read, and the percentile analysis is performed on them.
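This aggregation can be sketched as follows, assuming a hypothetical data layout in which each repetition stores, per scheduling variant, the metric values of its completed workflows in completion order.

    # Sketch of the per-experiment aggregation described in Section 4.3.2.
    # The data layout (one list of metric values per variant and repetition)
    # is an assumption for illustration only.
    import numpy as np

    VARIANTS = ("woas", "pilot", "chained")

    def aggregate_workflow_metric(repetitions):
        # repetitions: list of dicts {variant: [metric values, in completion order]}
        pooled = {v: [] for v in VARIANTS}
        for rep in repetitions:
            m_i = min(len(rep[v]) for v in VARIANTS)   # workflows completed in all three variants
            for v in VARIANTS:
                pooled[v].extend(rep[v][:m_i])         # keep only the first m_i workflows
        # percentile metrics (e.g. the median) computed over the pooled values
        return {v: np.percentile(vals, [25, 50, 75]) for v, vals in pooled.items()}

    def aggregate_actual_utilization(per_repetition_utilization):
        # Mean over the repetitions, equivalent to the utilization of their concatenation.
        return sum(per_repetition_utilization) / len(per_repetition_utilization)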
4.4 Experiment sets

Two experiment sets are analyzed in this work, studying the scheduling techniques from a more analytical and a more realistic point of view, resulting in 271 experiment configurations. Each individual experiment consists of five days of simulated scheduling of the workload plus an extra initial day for the system cold start.

4.4.1 Workflow characteristics study. In this experiment set, we analyzed the effect on the workflow metrics of using different scheduling techniques to run workflows with different internal characteristics. There is a workflow group for each workflow characteristic, and inside each group a workflow is defined by n: a knob that controls the effect of the workflow characteristic, where a larger n implies a larger effect. These experiments allow creating a base knowledge of the expected wait time, runtime, and turnaround times for some basic workflow characteristics. These are the workflow groups presented in Table 1 (a sketch of their geometries as functions of n is shown after the descriptions):

Workflow critical path length, Group A: The goal of this group of workflows is to study the effect on the workflow metrics of the number of tasks in the workflow critical path (n defines the number of those tasks). All workflows in this group are chained lists of n tasks of the same size (240 CPU cores and 1 h runtime); e.g. if n = 3 the resulting workflow has three tasks, in which the second depends on the first and the third on the second. Workflows with a longer critical path should suffer: a) a larger difference between the runtime of the pilot job and the first job in the chained job and WoAS approaches; b) more intermediate wait time periods between the tasks in the chained job and WoAS approaches (workflow runtime related); c) lower priority for the pilot job and the workflow aware job vs. the priority of the first job in the chained job approach (workflow wait time related).

Allocated CPU cores: first job vs. workflow's maximum, Group B: Group B is used to study the effect on the workflow metrics of the difference between the cores allocated by the first job and the workflow maximum (n is the difference multiplier controlling the breadth of the workflow). All workflows in this group are composed of two jobs: the first allocates 240 cores for one hour and the second allocates n*240 cores for one hour. When combined with the workflow scheduling approaches, a higher value of n will induce two workflow wait time related effects: a) a larger difference between the cores allocated by the pilot job and the first job in the chained jobs and WoAS approaches; b) lower priority for the pilot job and the workflow aware job vs. the priority of the first job in the chained job approach (workflow wait time related). The difference in priority is induced by the difference in resource allocation, in opposition to the workflow runtime (Group A).

Allocated CPU cores and runtime: first job vs. workflow's maximum, Group C: This group is used to study the effect on the workflow metrics of the difference in allocated cores between the first job and the workflow maximum, combined with the difference in runtime between the first job and the minimum critical path runtime of the workflow (n is the difference multiplier controlling the length and breadth of the workflow). All workflows in this group are composed of two jobs: the first allocates 240 cores for one hour, and the second allocates n*240 cores for 2n-1 hours. When combined with the workflow scheduling techniques, a higher value of n will induce two workflow wait time related effects: a) a larger difference between the allocated cores and runtime for the pilot job and the first job in the chained jobs and WoAS approaches; b) lower priority for the pilot job and the workflow aware job vs. the priority of the first job in the chained job approach (workflow wait time related). The difference in priority is induced by the difference in resource allocation and workflow runtime.
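As a sketch, the geometries of the three groups can be written as simple functions of the knob n, following the descriptions above and Table 1; the (cores, hours) pair representation is an assumption made for illustration.

    # Synthetic workflow groups from Table 1 expressed as functions of the knob n.
    # Each task is an illustrative (cores, runtime_hours) pair; successive tasks
    # form a chain (each depends on the previous one).

    def group_a(n):
        # critical path length grows with n: n tasks of 240 cores and 1 h each
        return [(240, 1)] * n

    def group_b(n):
        # breadth grows with n: 240 cores for 1 h, then n*240 cores for 1 h
        return [(240, 1), (n * 240, 1)]

    def group_c(n):
        # breadth and length grow with n: 240 cores for 1 h, then n*240 cores for 2n-1 h
        return [(240, 1), (n * 240, 2 * n - 1)]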


Table 2: Workflow characteristics for individual workflows including: critical path size, pilot job geometry (rtime and max cores), workflow tasks usage (usage), potential wasted resources (waste), and a profile of the allocated resources in time if the critical path is run with no intermediate waits. (Profile plots not reproduced.)

  LongWide:    Geometry: 2 jobs / rtime: 3 h / 480 max cores.    Usage/Waste: 672 core-h / 1728 core-h
  WideLong:    Geometry: 2 jobs / rtime: 3 h / 480 max cores.    Usage/Waste: 672 core-h / 1728 core-h
  Floodplain:  Geometry: 7 jobs / rtime: 32.5 h / 512 max cores. Usage/Waste: 5624 core-h / 11016 core-h
  Montage:     Geometry: 5 jobs / rtime: 7.6 h / 960 max cores.  Usage/Waste: 375 core-h / 6920 core-h
  Cybershake:  Geometry: 5 jobs / rtime: 4.5 h / 721 max cores.  Usage/Waste: 1145 core-h / 2077 core-h
  Sipht:       Geometry: 9 jobs / rtime: 1.2 h / 384 max cores.  Usage/Waste: 185 core-h / 395 core-h

In this experiment set, six workflows of each workflow group are defined (n ∈ {1, 2, 4, 8, 16, 32}; group C was not analyzed for n > 8, as the resulting workflows were too big and would overflow the system). For each individual workflow (16 in total), we create a workload in which a workflow is submitted with a fixed inter-workflow time. Each experiment is run using the pilot job, chained jobs, and WoAS techniques to compare the resulting metrics across techniques and values of n.

4.4.2 Performance comparison. In this experiment set we compared the performance of the different workflow scheduling techniques for two synthetic and four real workflows, which are presented in Table 2. The synthetic ones (LongWide and WideLong) are the minimum building units of any workflow (a serial phase followed by a parallel one, and vice versa). The real ones allow testing our technique against more realistic workloads, each with a particular characteristic: fixed jobs with a complex profile (Floodplain), many small grouped tasks (Montage), a large workflow with two large parallel stages (Cybershake), and a small workflow with a complex profile shape and many small jobs (Sipht).

In the experiments, workflows are submitted using the workflow share approach with seven percentages: 1%, 5%, 10%, 25%, 50%, 75%, 100%. The experiments with the lower ones (1% to 25%) allow understanding the performance of the techniques in realistic scenarios with increasing workflow importance. The larger values (50%, 75%, 100%) allow understanding what happens in a system when the workload is dominated by workflows over regular jobs. The resulting 42 experiments are run using the pilot job, chained jobs, and WoAS techniques.

Similar experiments were run using the workflow periodic submission (periods 1/12h, 1/2h, 1/h, 2/h, 6/h). These allow comparing the workflow metrics in cases in which the workflow presence is not important enough to influence the whole system behavior.

5 RESULTS AND ANALYSIS

This section presents the results and analyses of our simulation experiments. Our evaluation focuses on: a) a study of the impact of workflow characteristics on the workflow metrics obtained with different workflow scheduling techniques (Section 5.1); b) a performance comparison of the different scheduling techniques (Section 5.2).

5.1 Workflow characteristics study

Figures 5, 6, and 7 present the observed median workflow wait times, runtimes, and turnaround times for the experiments with workflows from Groups A, B, and C (described in Section 4.4.1). Each horizontal block corresponds to a different workflow group. Inside each block, adjacent bars represent the measured median value for the same experiment configuration but run with different scheduling approaches (pilot job, chained jobs, and WoAS). The x-axis corresponds to n, the value that defines the actual workflow used in each workflow group (defined in Section 4.4.1). In each group a higher value of n indicates that the special characteristic of the workflow group is more present.

5.1.1 Workflow wait time. For group A workflows (top block of Figure 5), we observe that the median wait times for the pilot job and WoAS approaches are similar, with slightly shorter wait times for WoAS at all the workflow path sizes (n). Any difference in workflow wait time is related to differences in backfilling eligibility, since the priority and CPU core allocation of the pilot job and the first job are the same.

In contrast, the chained job workflow has the same FIFO and backfilling eligibility as WoAS (both first jobs have the same geometry), but higher priority (the job size used for priority is bigger in WoAS). Hence, workflows run as chained jobs show much shorter wait times (almost half at n = 32) as the critical path and workflow aware job sizes increase and the priority gap increases.


Figure 5: Wait time evolution as a dimension (one per horizontal group) of the workflow group is increased.

Figure 6: Runtime evolution as a dimension (one per horizontal group) of the workflow group is increased.

Figure 7: Turnaround time evolution as a dimension (one per horizontal group) of the workflow group is increased.

Similarly, workflows in groups B and C (second and third blocks of Figure 5) exhibit the shortest wait times when run as chained jobs, intermediate ones as WoAS, and much longer ones (specially for n ≥ 16) as pilot jobs. The differences are due to the priority and backfilling eligibility of the workflow starting job in each case: higher priority and a smaller job (more backfill-eligible) for chained jobs; lower priority and a smaller job for WoAS; and lower priority and a larger job (less backfill-eligible) for the pilot job.

5.1.2 Workflow runtime. In Figure 6, we observe that all workflows run as pilot jobs or under WoAS present a very similar median workflow runtime, close to the expected minimum runtime for each value of n. This is expected for the pilot job, since all the tasks run within a single job with no internal wait times. We see that WoAS is able to perform as well; the inter-job wait time when using WoAS is close to 0, e.g. workflows in group A with n = 32 are composed of 32 jobs (see Table 1) and the median of the accumulated 31 intermediate wait times accounts for only three minutes, which constitutes 0.1% of the total runtime.

When run as chained jobs, group A workflows show longer accumulated inter-job wait times as the number of jobs in the workflow critical path increases. This matches the observation that most schedulers do not consider a job as truly submitted until its dependencies are resolved. Each extra job in the critical path adds an extra wait time to the runtime.

Similarly, the runtimes of workflows in groups B and C are the minimum possible when run as pilot jobs, and close to the minimum when using WoAS. When run as chained jobs, the increasing runtimes show the effect of the wait time of the second job on the runtime: as n increases, the geometry of the second job grows (more slowly in B than in C) and its wait time becomes longer.

5.1.3 Turnaround time. In Figure 7, we observe that for all workflow groups the chained jobs approach presents the longest turnaround times, followed by the pilot job and WoAS approaches, with WoAS showing the shortest (or equal to the pilot job). Workflows run as a workflow aware job present bigger gains in turnaround time over the pilot job approach as they face significantly shorter wait times (in our experiments, group B and n ≥ 4).

5.1.4 Summary. We observed that running a workflow as chained jobs results in the shortest wait time but the longest runtime. Running it as a single pilot job produces the shortest runtime but the longest wait time. WoAS produces intermediate wait times and close to the shortest runtimes.

Also, the results show that WoAS produces the best turnaround times in all the scenarios. Finally, a workflow aware job can be considered to perform significantly better than the pilot job approach even when the turnaround time is similar, since the former does not waste resources for workflows with varied resource requirements (more in the next section).


Figure 8: Workflow turnaround time speedup when scheduled as WoAS over the pilot (blue) and chained job (pink) approaches. Six workflows, six different workflow shares. The dashed line is speedup = 1; a value > 1 implies better performance of WoAS.

Figure 9: Workflow turnaround time speedup when scheduled with WoAS over the pilot (blue) and chained job (pink) approaches. Six workflows, six different workflow periods.

5.2 Performance comparison

In this section, we extend the analyses of Section 5.1 to four real and two synthetic workflows. We focus on workflow turnaround time, and on the impact of the scheduling techniques on the system (actual utilization) and on non-workflow job (slowdown) performance.

In this experiment set workflows are submitted using the workflow share approach (described in Section 4.2.1) to set the percentage of workload core hours corresponding to workflows.

5.2.1 Workflow performance. Figure 8 presents the median workflow turnaround time speedup of WoAS relative to the chained jobs and pilot job approaches. The x-axis shows the percentage of workload core hours contributed by workflows. A bar value X = 1 means that the median turnaround times for WoAS and the corresponding method are the same. A bar value X > 1 means that the median turnaround time for the corresponding approach is X times the one observed with WoAS.

Compared to the chained job approach, WoAS showed very large speedups for long, complex workflows like Cybershake (≈2x) and Sipht (≈3x). The rest of the workflows (shorter critical path) showed smaller but clear speedups in most cases (e.g. ≈1.4x for WideLong, ≈1.9x for Montage). Floodplain has relatively small jobs, reducing the effect of the intermediate wait times (1.2x-1.3x speedup).

Also, LongWide workflows showed shorter turnaround times when run as chained jobs in the 75% and 100% scenarios (< 1.0 speedups). After analyzing the system's actual utilization and overall wait time, it was observed that, when using WoAS in these scenarios, the wait time baseline is more elevated and utilization is lower (20% and 30% less). This is likely related to the workflow shape. The workflow consists of a long job (48 cores, 4 hours runtime) and a wide job (480 cores, 1 hour), where the long job can become a barrier that cannot start until a number of previous jobs end. The long job also stops other jobs from being backfilled, since they would delay its start. This wait creates an unused free resource gap that results in low utilization.

For workflow shares over 25%, Sipht experiments show turnaround speedups under one that get smaller as the workflow share increases. In these scenarios, the chained jobs approach achieved lower utilization values (≈10-20% less) and fewer workflows would complete than with WoAS. Sipht is the workflow with the most jobs but a very small resource allocation. In a workflow-saturated scenario the scheduler manages a large number of active jobs, affecting its performance, and thus its capacity to utilize the system. Since WoAS represents all workflows that have not started yet as a single job, the scheduler requirements to deal with such a workload should be smaller.

Both situations are very unlikely in a real system, where the workload includes different types of workflows and regular jobs that can be used for efficient backfilling.

Compared to the pilot job approach, in Figure 8, WoAS shows turnaround times orders of magnitude shorter than the pilot job. The long pilot job turnaround times are due to their long wait times. This is an artifact of the workload generation, which is designed to contain the same amount of work, i.e., workloads contain the exact same jobs and workflows, submitted at the same time. When run with the chained job technique, that work is meant to produce a job pressure of ≈1.0 over the system. However, when run with pilot jobs, the same work allocates more core hours because of the potentially wasted resources, increasing the job pressure. For example, in a 100% workflow share workload, the job pressure when workflows are pilot jobs is ≈1.0 + k, k being the percentage of wasted core hours in the selected workflow.

In summary, for the same amount of work, the single job approach presents much longer turnaround times and loads the system significantly more than the other approaches (more in Section 5.2.3).

Finally, we assess whether the previously observed short turnaround times for isolated workflows run as pilot jobs are possible for the workflows in this section. A set of experiments was performed, reducing the workflow presence in the workload by submitting one workflow in every time period. In Figure 9, WoAS presents similar or shorter turnaround times than the pilot job approach, confirming that for isolated workflows, the pilot job approach also shows short turnaround times.
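The pressure inflation caused by pilot jobs can be illustrated with a small, hedged calculation: assuming the baseline workload is tuned to a job pressure of about 1.0 and reading k as the workflow's wasted core hours relative to its useful core hours (one reading of the definition above), the pilot job pressure grows with the waste reported in Table 2. The helper below is purely illustrative.

    # Illustrative sketch: job pressure of a 100% workflow-share workload
    # when workflows are submitted as pilot jobs. Not part of the simulator.

    def pilot_job_pressure(used_core_h, wasted_core_h, base_pressure=1.0):
        # Chained jobs and WoAS submit only the used core hours (pressure ~ base),
        # while a pilot job also allocates the wasted core hours.
        k = wasted_core_h / used_core_h   # one reading of "percentage of wasted core hours"
        return base_pressure * (1 + k)

    # Example with Montage's figures from Table 2 (375 used / 6920 wasted core-h):
    # pilot_job_pressure(375, 6920) -> roughly 19, i.e. the pilot-job version of
    # that workload requests many times the system capacity, hence the long waits.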


Table 3: Difference of actual utilization of WoAS over the pilot job approach for different workflow shares.

  Gain (%)    1%     5%     10%    25%    50%    75%    100%
  floodP      1.80   5.22   14.46  29.29  44.53  51.64  64.47
  longW       2.30   8.33   18.93  30.84  40.25  31.99  27.18
  wideL       0.33   10.64  19.74  32.35  48.22  57.19  66.16
  cybers      1.66   7.72   13.92  25.58  36.72  44.45  52.83
  sipht       2.55   11.41  18.16  34.85  42.77  37.27  35.83
  montage     12.36  44.90  60.30  72.34  80.13  82.14  85.26

Figure 10: Relative difference on slowdown (withWorkflowSlow/noWorkflowSlow) for jobs allocating [0, 48) core hours. Tested for three real workflows and different workflow shares.

5.2.2 Job fairness. Figure 10 presents the observed median slowdown for small jobs (under 96 core hours) over different workflow shares (x-axis). A value of one means that the median slowdown for non-workflow jobs is the same as in the case where no workflows are present. A value of two means that the observed median slowdown is two times the one observed when no workflows are present. It is important to note that adding workflows changes the workload composition, and small variations in the slowdown are considered normal.

Figure 10 shows that the presence of workflows affects the regular jobs' slowdown. All experiments using the pilot job approach showed the biggest increases in slowdown (up to 10x), followed by WoAS (up to 8x) and the chained job ones (up to 2x). This difference is specially significant for Montage workflows run as pilot jobs, where just a 1% workflow share induces a three times bigger median slowdown, and almost 10 times the slowdown with a 5% workflow share. The observed slowdown is due to the elevation of the wait time baseline caused by the > 1.0 job pressure of the pilot jobs. In opposition, when the workload contained 1% of Montage workflows but they were scheduled using WoAS, there was no effect on the non-workflow jobs.

The experiments run with the chained job approach show the smallest changes in slowdown, with maximum variations of 2x for the over-50% workflow share scenarios. The chained job scenarios are the best possible fairness scenario for each workflow and workflow share, since workflows are handled as regular jobs. It is significant that the regular jobs' slowdown seems to stop growing after workflow shares of 25%, even decreasing for Floodplain. This effect can be explained by the lower priority of the Floodplain jobs, which are larger than a good share of the workload jobs.

In the case of the workflow aware approach, in experiments with smaller shares (≤ 10%) the job slowdown is almost the same as in the case with zero workflows. For the rest of the experiments (except Montage over 25%), WoAS shows only slightly bigger slowdowns (< 2x) than the chained job approach. Big increases in slowdown (over 4x) for Montage and large workflow shares (> 25%) point out that, for workloads heavily dominated by workflows with large resource allocations, WoAS might have a large impact on smaller jobs. In a system heavily dominated by workflows, regular jobs might not be as important. This is an effect of the backfilling limited queue processing depth (a common performance optimization practice), where non-workflow jobs have to climb up the queue by waiting longer. A technique filtering non-ready queue jobs before the scheduling processes are started could alleviate this problem.

Finally, the slowdown analysis was performed for all the workflows in Table 2 and for medium ([96, 480) core hours) and large ([480, ∞) core hours) jobs. Since the results were similar, the data was not included due to space limitations.

5.2.3 System utilization. We compare the actual utilization between running the same experiment using workflow aware scheduling, pilot jobs, and chained jobs.

The data shows that experiments using WoAS and chained jobs present utilization over 90% and differences ≤ 5%. As already analyzed in Section 5.2.1, the only exceptions are the experiments with the LongWide workflows and workflow shares of 75% and 100%: chained jobs show over 90% utilization, but the WoAS ones show ≈70% and ≈60%.

Compared with the pilot job approach, WoAS has much higher levels of actual utilization as the workflow share increases (Table 3). Thus, we see that WoAS does not waste resources while producing good turnaround times (Section 5.2.1), which indicates its suitability as a workflow scheduling technique.

5.2.4 Summary. For workloads with moderate workflow presence (< 50% core hours) WoAS presents the shortest workflow turnaround times, keeps high system utilization (over 90%), wastes no resources, and the regular job slowdown is equivalent to the best case scenario.

For workflow dominated workloads, WoAS showed the shortest turnaround times and high utilization except for the LongWide experiments. However, the slowdown of regular jobs is higher than in the chained job cases.

As a final note, it is possible that WoAS could perform better in the workflow dominated scenarios if some queue filtering were added (not considering jobs with dependencies for queue construction) and with workloads with more workflow diversity. However, that is the subject of future work.


6 CONCLUSIONS

We propose Workflow Aware Scheduling (WoAS), a new model for a batch queue scheduler that enables unmodified pre-existing scheduling algorithms to take advantage of fine grained workflow resource requirements to produce short turnaround times without wasting resources. We implemented WoAS and integrated it in Slurm, and our implementation will be available as open source. We evaluated WoAS by simulating NERSC's Edison supercomputer and its workload, modeled after the study of three years of its real job traces.

Our results show that with WoAS, workflows show significantly shorter turnaround times than with the chained job and single job approaches, and no wasted resources. In traces that have moderate workflow presence (< 50% core hours), using WoAS, FCFS and backfilling achieve turnaround times as short as or shorter than submitting workflows as single jobs and much shorter than as chained jobs (up to 3.75x speedup), while keeping the system highly utilized (over 90% and no allocated idle resources). It also produces utilization gains over the single job approach (e.g. 60% for Montage workflows, 10% workflow share) and has no or negligible impact on the slowdown of non-workflow jobs.

Similarly, in workloads dominated by workflows (≥ 50%), experiments show that workflow performance is similar to the one stated above. However, other scheduling mechanisms, like the queue depth limits in the backfilling algorithm, may affect WoAS, increasing the non-workflow job slowdown, specially in the presence of large workflows (e.g. 7x in Montage, 75% share). This effect could be eased by filtering not-ready jobs in the waiting queue, which is future work.

We conclude that WoAS workflow scheduling performs significantly better than current approaches to executing workflows on HPC systems while posing no significant drawbacks.

7 ACKNOWLEDGMENTS

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). The National Energy Research Scientific Computing Center, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Financial support has been provided in part by the Swedish Government's strategic effort eSSENCE and the Swedish Research Council (VR) under contract number C0590801 (Cloud Control).

REFERENCES

[1] Gonzalo Pedro Rodrigo Alvarez, Per-Olov Östberg, Erik Elmroth, Katie Antypas, Richard Gerber, and Lavanya Ramakrishnan. 2016. Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study. In 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). IEEE, 521–526.
[2] Stephen Bailey. 2016. (01 2016). https://bitbucket.org/berkeleylab/qdo
[3] Shishir Bharathi, Ann Chervenak, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, and Karan Vahi. 2008. Characterization of scientific workflows. In 2008 Third Workshop on Workflows in Support of Large-Scale Science. IEEE, 1–10.
[4] Ewa Deelman, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Sonal Patil, Mei-Hui Su, Karan Vahi, and Miron Livny. 2004. Pegasus: Mapping scientific workflows onto the grid. In Grid Computing. Springer, 11–20.
[5] T. Fahringer, R. Prodan, R. Duan, J. Hofer, F. Nadeem, F. Nerieri, S. Podlipnig, J. Qin, M. Siddiqui, H.-L. Truong, A. Villazon, and M. Wieczorek. 2007. ASKALON: A Development and Grid Computing Environment for Scientific Workflows. In Workflows for e-Science, I. Taylor and others (Eds.). Springer-Verlag, 450–471.
[6] Dror G Feitelson, Larry Rudolph, and Uwe Schwiegelshohn. 2005. Parallel job scheduling, a status report. In Job Scheduling Strategies for Parallel Processing. Springer, 1–16.
[7] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D Joseph, Randy H Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI, Vol. 11.
[8] Anubhav Jain, Shyue Ping Ong, Wei Chen, Bharat Medasani, Xiaohui Qu, Michael Kocher, Miriam Brafman, Guido Petretto, Gian-Marco Rignanese, Geoffroy Hautier, and others. 2015. FireWorks: a dynamic workflow system designed for high-throughput applications. Concurrency and Computation: Practice and Experience 27, 17 (2015), 5037–5059.
[9] William TC Kramer and Clint Ryan. 2003. Performance variability of highly parallel architectures. In International Conference on Computational Science. Springer, 560–569.
[10] Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M Patel, Karthik Ramasamy, and Siddarth Taneja. 2015. Twitter heron: Stream processing at scale. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 239–250.
[11] David A Lifka. 1995. The ANL/IBM SP scheduling system. In Job Scheduling Strategies for Parallel Processing. Springer, 295–303.
[12] Alejandro Lucero. 2011. Simulation of batch scheduling using real production-ready software tools. Proceedings of the 5th IBERGRID (2011).
[13] Muthucumaru Maheswaran, Shoukat Ali, HJ Siegal, Debra Hensgen, and Richard F Freund. 1999. Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems. In Heterogeneous Computing Workshop, 1999 (HCW'99) Proceedings. Eighth. IEEE, 30–44.
[14] Hashim H Mohamed and Dick HJ Epema. 2005. The design and implementation of the KOALA co-allocating grid scheduler. In European Grid Conference. Springer, 640–650.
[15] Ioan Raicu, Yong Zhao, Catalin Dumitrescu, Ian Foster, and Mike Wilde. 2007. Falkon: a Fast and Light-weight tasK executiON framework. In Proceedings of the 2007 ACM/IEEE Conference on Supercomputing. ACM, 43.
[16] Lavanya Ramakrishnan and Dennis Gannon. 2008. A survey of distributed workflow characteristics and resource requirements. Indiana University (2008), 1–23.
[17] Lavanya Ramakrishnan, Charles Koelbel, Yang-Suk Kee, Rich Wolski, Daniel Nurmi, Dennis Gannon, Graziano Obertelli, Asim YarKhan, Anirban Mandal, T Mark Huang, and others. 2009. VGrADS: enabling e-Science workflows on grids and clouds with fault tolerance. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. IEEE, 1–12.
[18] Gonzalo Rodrigo, P-O Östberg, Erik Elmroth, Katie Antypas, Richard Gerber, and Lavanya Ramakrishnan. 2015. HPC System Lifetime Story: Workload Characterization and Evolutionary Analyses on NERSC Systems. In The 24th International ACM Symposium on High-Performance Distributed Computing (HPDC).
[19] Gonzalo Rodrigo, Lavanya Ramakrishnan, P-O Östberg, and Erik Elmroth. 2015. A2L2: an Application Aware flexible HPC scheduling model for Low Latency allocation. In The 8th International Workshop on Virtualization Technologies in Distributed Computing (VTDC).
[20] Rizos Sakellariou and Henan Zhao. 2004. A hybrid heuristic for DAG scheduling on heterogeneous systems. In Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International. IEEE, 111.
[21] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. 2013. Omega: flexible, scalable schedulers for large compute clusters. In Proceedings of the 8th ACM European Conference on Computer Systems. ACM, 351–364.
[22] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. 2010. The Hadoop distributed file system. In 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). IEEE, 1–10.
[23] Massimo Benini and Stephen Trofinoff. 2015. Using and Modifying the BSC Slurm Workload Simulator. In Slurm User Group.
[24] Haluk Topcuoglu, Salim Hariri, and Min-you Wu. 2002. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13, 3 (2002), 260–274.
[25] Marek Wieczorek, Radu Prodan, and Thomas Fahringer. 2005. Scheduling of scientific workflows in the ASKALON grid environment. ACM SIGMOD Record 34, 3 (2005), 56–62.
[26] Jia Yu and Rajkumar Buyya. 2005. A taxonomy of scientific workflow systems for grid computing. ACM SIGMOD Record 34, 3 (2005), 44–49.
[27] Jia Yu, Rajkumar Buyya, and Chen Khong Tham. 2005. Cost-based scheduling of scientific workflow applications on utility grids. In First International Conference on e-Science and Grid Computing (e-Science'05). IEEE.
[28] Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010).
[29] Yong Zhao, Mihael Hategan, Ben Clifford, Ian Foster, Gregor Von Laszewski, Veronika Nefedova, Ioan Raicu, Tiberiu Stef-Praun, and Michael Wilde. 2007. Swift: Fast, reliable, loosely coupled parallel computation. In 2007 IEEE Congress on Services. IEEE, 199–206.
