Jakub T. Mościcki Understanding and Mastering Dynamics

UvA-DARE (Digital Academic Repository)

Understanding and mastering dynamics in computing grids: processing moldable tasks with user-level overlay
Mościcki, J.T.

Publication date: 2011
Document version: Final published version

Citation for published version (APA):
Mościcki, J. T. (2011). Understanding and mastering dynamics in computing grids: processing moldable tasks with user-level overlay.

UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)
Download date: 24 Sep 2021

Understanding and Mastering Dynamics in Computing Grids

Jakub T. Mościcki is a researcher and software engineer at CERN, Geneva, Switzerland. He obtained his MSc in Computer Science from the AGH University of Science and Technology in Kraków, Poland. He is a lead developer of the Ganga project and creator of the DIANE framework, which are used to support very large LHC user communities as well as users of multidisciplinary applications in theoretical physics, medical and radiation studies, bio-informatics, drug design, and telecommunications.
His research interests focus on scheduling and management of distributed and parallel applications, large-scale computing infrastructures such as grids, and various forms of High Throughput and High Performance Computing.

Thousands of scientific users witness every day the inherent instabilities and bottlenecks of large-scale task processing systems: lost or incomplete jobs and hard-to-predict completion times. They struggle to resubmit failed jobs and get consistent results, and it is always difficult to catch up with the latest deployed software environments or system configurations. In addition, users often have more than one system to deal with: they continue to use locally available computing power (a desktop PC, a nearby computing center, a small cluster next door) while exploiting global resources such as grids. On top of this, grids use a large variety of middleware stacks, which are customized in different ways by user communities. Quality of Service and usability are probably the two keywords most frequently echoed in the corridors of many "grid-enabled" research labs.

This PhD dissertation presents scientific research from the problem statement, system analysis, modeling and simulation, to validation through experimental results. It captures and characterizes the complexity and dynamics of global task processing systems, using as an example the largest scientific grid to date, the EGEE/EGI Grid. The task processing model developed in this work makes it possible to rigorously explain why the late-binding method is superior to traditional task scheduling based on early binding. A study of statistical properties of task processing times is complemented by Monte Carlo simulation.

This book is also addressed to grid practitioners: developers and users.
Presenting several successful application examples from diverse domains, it explains how the heterogeneity and dynamics of global task processing systems may be addressed and mastered in a cost-effective way directly by the users. It describes a User-level Overlay, based on two software packages, Ganga and DIANE, which are ready to use with little or no customization for your application. Advanced resource selection strategies and scheduling approaches developed in this book may be reused in your environment.

Understanding and Mastering Dynamics in Computing Grids:
Processing Moldable Tasks with User-Level Overlay

ACADEMIC DISSERTATION

to obtain the degree of doctor at the Universiteit van Amsterdam, on the authority of the Rector Magnificus, prof. dr. D.C. van den Boom, before a committee appointed by the Doctorate Board, to be defended in public in the Agnietenkapel on Tuesday 12 April 2011, at 12:00,

by Jakub Tomasz Mościcki, born in Kraków, Poland.

Doctoral committee
Supervisor: Prof. Dr. Marian T. Bubak
Co-supervisor: Prof. Dr. Peter M.A. Sloot
Other members:
Prof. Dr. Hamideh Afsarmanesh
Dr. ir. Alfons G. Hoekstra
Dr. Juergen Knobloch
Prof. Dr. ir. Cees Th.A.M. de Laat
Prof. Dr. Krzysztof Zielinski
Faculty: Faculteit der Natuurwetenschappen, Wiskunde en Informatica

This work makes use of results produced by the Enabling Grids for E-sciencE project, a project co-funded by the European Commission (under contract number INFSO-RI-222667) through the Seventh Framework Programme. EGEE brings together 91 partners in 32 countries to provide a seamless Grid infrastructure available to the European research community 24 hours a day. Full information is available at www.eu-egee.org and www.egi.eu.

The book cover uses Stacy Reed's Chaos From Order artwork, which she kindly shared with me for this purpose. Stacy is a diverse artist who enjoys exploring chaos through fractal applications.
Chaos From Order represents the notion that in evolution, chaotic shapes and patterns, mutations, abnormalities and anomalies emerge over time, from what was once, in this case, a perfect mathematical form. More of her artwork can be viewed by visiting www.shedreamsindigital.net.

Author contact: [email protected]
Printed by lulu.com
Copyright © 2011 Jakub T. Mościcki

Table of Contents

1 Motivation and research objectives
  1.1 Distributed applications: common patterns and characteristics
  1.2 Infrastructures for scientific computing
  1.3 Higher-level middleware systems
  1.4 User requirements
  1.5 The research objectives and roadmap
2 Dynamics of large computing grids
  2.1 EGEE - world's largest computing and data Grid
  2.2 Grid as an infrastructure
  2.3 Grid as a task processing system
  2.4 Summary
3 Analysis and modeling of task processing with late binding on the Grid
  3.1 Introduction
  3.2 Task processing model
  3.3 Distribution of job queuing time
  3.4 Simulation of task processing models
  3.5 Summary
4 Development of the User-level Overlay
  4.1 Vision
  4.2 Functional breakdown and architecture
  4.3 DIANE and Ganga software packages
  4.4 Operation of the User-level Overlay
  4.5 The DIANE task coordination framework
  4.6 The Ganga resource access API and user interface
  4.7 Heuristic resource selection
  4.8 Adaptive workload balancing
  4.9 Summary
5 User-level Overlay in action
  5.1 Monte Carlo simulation with Geant4 toolkit
  5.2 Workflows for medical imaging simulations
  5.3 Data processing for ATLAS and LHCb experiments
  5.4 Massive molecular docking for Avian Flu
  5.5 Other examples of using DIANE/Ganga overlay
  5.6 Summary
6 Capability computing case study: ITU broadcasting planning
  6.1 Introduction
  6.2 Broadcasting planning process
  6.3 Compatibility analysis
  6.4 Implementation of grid-based analysis system for the RRC06
  6.5 Analysis of task processing
  6.6 Summary
7 Capacity computing case study: LatticeQCD simulation
  7.1 Introduction
  7.2 Problem to be solved
  7.3 Simulation model
  7.4 Implementation and operation of the simulation system
  7.5 Task scheduling and prioritization
  7.6 Analysis of adaptive resource selection
  7.7 Exploiting low-level parallelism for finer lattices
  7.8 Summary
8 Conclusions and future work
  8.1 Grid dynamics and its consequences for task processing
  8.2 Contributions of this work
  8.3 Open issues
  8.4 Future work
  8.5 Postscriptum
Bibliography
Summary
Summary in Dutch
Summary in Polish
Publications
Acknowledgments
Index

List of Abbreviations

AWLB  Adaptive Workload Balancing
CDF   Cumulative Probability Distribution Function
CE    Computing Element
CERN  European Laboratory for

Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++

Parallel Computing a Key to Performance

Control Replication: Compiling Implicit Parallelism to Efficient SPMD with Logical Regions

Regent: a High-Productivity Programming Language for Implicit Parallelism with Logical Regions

Implicit Parallelism, Mixed Approaches

Lawrence Berkeley National Laboratory Recent Work

14. Parallel Computing 14.1 Introduction 14.2 Independent

Implicitly-Multithreaded Processors

Parallel Programming – Multicore Systems

MT 2015 21011433 8070.Pdf

Global HPCC Benchmarks in Chapel: STREAM Triad, Random Access, and FFT (Revision 1.4 — June 2007 Release)

SCS5108 Parallel Systems Unit II Parallel Programming