
HPC Scheduling in a Brave New World

Gonzalo P. Rodrigo Álvarez

PhD Thesis, March 2017
Department of Computing Science
Umeå University
Sweden

Department of Computing Science
Umeå University
SE-901 87 Umeå, Sweden
[email protected]

Copyright © 2017 by the authors
Except Paper II, © Springer-Verlag, 2013
ISBN 978-91-7601-693-0
ISSN 0348-0542
UMINF 17.05
Printed by Print & Media, Umeå University, 2017

Abstract

Many breakthroughs in scientific and industrial research are supported by simulations and calculations performed on high performance computing (HPC) systems. These systems typically consist of uniform, largely parallel compute resources and high bandwidth concurrent file systems interconnected by low latency synchronous networks. HPC systems are managed by batch schedulers that order the execution of application jobs to maximize utilization while steering turnaround time.

In the past, demands for greater capacity were met by building more powerful systems with more compute nodes, greater transistor densities, and higher processor operating frequencies. Unfortunately, the scope for further increases in processor frequency is restricted by the limitations of semiconductor technology. Instead, parallelism within processors and in numbers of compute nodes is increasing, while the capacity of single processing units remains unchanged. In addition, HPC systems' memory and I/O hierarchies are becoming deeper and more complex to keep up with the systems' processing power. HPC applications are also changing: the need to analyze large data sets and simulation results is increasing the importance of data processing and data-intensive applications. Moreover, composition of applications through workflows within HPC centers is becoming increasingly important.

This thesis addresses the HPC scheduling challenges created by such new systems and applications.
It begins with a detailed analysis of the evolution of the workloads of three reference HPC systems at the National Energy Research Scientific Computing Center (NERSC), with a focus on job heterogeneity and scheduler performance. This is followed by an analysis and improvement of a fairshare prioritization mechanism for HPC schedulers. The thesis then surveys the current state of the art and expected near-future developments in HPC hardware and applications, and identifies unaddressed scheduling challenges that they will introduce. These challenges include application diversity and issues with workflow scheduling or the scheduling of I/O resources to support applications. Next, a cloud-inspired HPC scheduling model is presented that can accommodate application diversity, takes advantage of malleable applications, and enables short wait times for applications. Finally, to support ongoing scheduling research, an open source scheduling simulation framework is proposed that allows new scheduling algorithms to be implemented and evaluated in a production scheduler using workloads modeled on those of a real system. The thesis concludes with the presentation of a workflow scheduling algorithm to minimize workflows' turnaround time without over-allocating resources.

Sammanfattning på svenska (Summary in Swedish)

Many breakthroughs in scientific and industrial research are supported by simulations and calculations on high performance computer systems, so-called HPC systems. Traditionally, these consist of a large number of parallel, homogeneous compute resources interconnected by a synchronous network with high bandwidth and low latency. The resources of an HPC system are managed by a scheduler that plans where and when different applications are executed, in order to minimize users' wait times and maximize resource utilization.

Until a few years ago, increasingly powerful HPC systems were normally developed by increasing the number of nodes in the parallel system and by increasing the transistor density and clock frequency of the processors. More recently, semiconductor limitations have made it impossible to keep benefiting from higher clock frequencies. Instead, capacity is increased through greater parallelism within processors and through a larger number of compute nodes. Another part of this development is that the memory and I/O hierarchies of HPC systems are becoming ever deeper and more complex in order to keep pace with the growth in processing power. The applications that run on HPC systems are also changing: the need to analyze large data sets and simulation results increases the importance of data handling and data-intensive applications. In addition, the need for applications in the form of long workflows, rather than individual jobs, has also grown.

This thesis studies how these developments create new challenges for the scheduling systems of future HPC systems. The study begins with a detailed analysis of the evolution of three reference systems at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley Lab, with a focus on job heterogeneity and scheduler performance. This is followed by an analysis and improvement of a prioritization mechanism (so-called fairshare) for schedulers. Next, a study is presented of the current state and expected future development of HPC hardware and applications. The study presents a number of important challenges for future schedulers, such as the increased heterogeneity of applications, the need to schedule workflows efficiently, and the scheduling of I/O resources. After that, a scheduling model for HPC systems is presented that was developed with inspiration from scheduling for computing clouds. The thesis also presents a simulator, available as open source, for scheduling research. The simulator makes it possible to study new scheduling algorithms in a production-like system. We conclude by presenting an algorithm for scheduling workflows that aims to minimize users' wait time and to minimize resource usage.

Preface

This thesis contains an introduction that summarizes key concepts that the author considers useful to understand before reading research articles about HPC scheduling, including the definition of and need for high performance computing, basic batch scheduling, and the characteristics of present and future HPC systems together with the associated scheduling challenges. The thesis includes the following papers:

Paper I    Gonzalo P. Rodrigo, Per-Olov Östberg, Erik Elmroth, Katie Antypas, Richard Gerber, and Lavanya Ramakrishnan. Towards Understanding HPC Users and Systems: A NERSC Case Study. Submitted, 2017. This paper is an extended, joint version of Papers IX and X.
Paper II   Gonzalo P. Rodrigo, Per-Olov Östberg, and Erik Elmroth. Priority Operators for Fairshare Scheduling. In Proceedings of the 18th Workshop on Job Scheduling Strategies for Parallel Processing, pp. 70-89, Springer International Publishing, 2014.
Paper III  Gonzalo P. Rodrigo, Per-Olov Östberg, Erik Elmroth, and Lavanya Ramakrishnan. A2L2: An Application Aware Flexible HPC Scheduling Model for Low-Latency Allocation. In Proceedings of the 8th International Workshop on Virtualization Technologies in Distributed Computing (VTDC'15), pp. 11-19, ACM, 2015.
Paper IV   Gonzalo P. Rodrigo, Erik Elmroth, Per-Olov Östberg, and Lavanya Ramakrishnan. ScSF: A Scheduling Simulation Framework. In Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing, Accepted, Springer International Publishing, 2017.
Paper V    Gonzalo P. Rodrigo, Erik Elmroth, Per-Olov Östberg, and Lavanya Ramakrishnan. Enabling workflow aware scheduling on HPC systems. Submitted, 2017.

In addition, the following publications were produced during the author's PhD studies but are not included in the thesis:

Paper VI   Gonzalo P. Rodrigo. Proof of compliance for the relative operator on the proportional distribution of unused share in an ordering fairshare system. Technical report UMINF-14.14, Umeå University, 2014.
Paper VII  Gonzalo P. Rodrigo. Establishing the equivalence between operators: theorem to establish a sufficient condition for two operators to produce the same ordering in a Fairshare prioritization system. Technical report UMINF-14.15, Umeå University, 2014.
Paper VIII Gonzalo P. Rodrigo. Theoretical analysis of a workflow aware scheduling algorithm. Technical report UMINF-17.06, Umeå University, 2017.
Paper IX   Gonzalo P. Rodrigo, Per-Olov Östberg, Erik Elmroth, Katie Antypas, Richard Gerber, and Lavanya Ramakrishnan. HPC System Lifetime Story: Workload Characterization and Evolutionary Analyses on NERSC Systems. In Proceedings of the 24th International ACM Symposium on High-Performance Distributed Computing (HPDC'15), ACM, 2015.
Paper X    Gonzalo P. Rodrigo, Per-Olov Östberg, Erik Elmroth, Katie Antypas, Richard Gerber, and Lavanya Ramakrishnan. Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study. In Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'16), IEEE/ACM, 2016.

This material is based upon work supported in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) and the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Financial support has been provided in part by the Swedish Government's strategic effort eSSENCE, by the European Union's Seventh Framework Programme under grant agreement 610711 (CACTOS), the European Union's Framework