STATE ENGINEERING UNIVERSITY OF ARMENIA
Department of Computer Systems and Informatics

Artem T. HARUTYUNYAN

Development of Resource Sharing System Components for AliEn Grid Infrastructure

Ph.D. Thesis

CERN-THESIS-2010-084
07/05/2010

Scientific adviser: Prof. Ara A. GRIGORYAN

Yerevan-2010

CONTENTS

INTRODUCTION
ACKNOWLEDGEMENTS
CHAPTER 1. GRID COMPUTING AND CLOUD COMPUTING
  1.1 Grid computing concepts
  1.2 The architecture of the Grid
    1.2.1 End systems
    1.2.2 Clusters
    1.2.3 Intranets
    1.2.4 Internets
    1.2.5 Core Grid services
  1.3 Implementations of Grid infrastructures and projects
    1.3.1 Grid projects worldwide
    1.3.2 National Grid initiative in Armenia
  1.4 Cloud computing concepts
    1.4.1 Infrastructure as a Service (IaaS)
    1.4.2 Platform as a Service (PaaS)
    1.4.3 Software as a Service (SaaS)
  1.5 Summary of Chapter 1
CHAPTER 2. ALIEN - GRID INFRASTRUCTURE OF CERN ALICE EXPERIMENT
  2.1 CERN ALICE experiment
  2.2 Distributed computing architecture of ALICE
  2.3 The architecture of AliEn
    2.3.1 AliEn file and metadata catalogue
    2.3.2 AliEn monitoring system
    2.3.3 AliEn Workload Management System (WMS)
  2.4 Problem definition
  2.5 Summary of Chapter 2
CHAPTER 3. DESIGN AND DEVELOPMENT OF GRID BANKING SERVICE MODEL FOR JOB SCHEDULING IN ALIEN
  3.1 Development of the Grid Banking Service (GBS) model for AliEn WMS
  3.2 Discrete-event system model for the simulation of the work of AliEn WMS and GBS
  3.3 The simulation toolkit details
  3.4 Evaluation of the GBS model with the use of simulation toolkit
  3.5 Integration of the banking service with AliEn WMS
  3.6 Summary of Chapter 3
CHAPTER 4. DEVELOPMENT OF TWO MODELS OF INTEGRATION OF CLOUD COMPUTING RESOURCES WITH ALIEN
  4.1 CernVM – a virtual appliance for LHC applications
  4.2 Nimbus – a toolkit for building IaaS computing clouds
  4.3 Development of 'Classic' model for integration of cloud computing resources with AliEn
  4.4 Development of 'Co-Pilot' model for integration of cloud computing resources with AliEn
    4.4.1 Development of Co-Pilot Agent – Co-Pilot Adapter communication protocol
  4.5 Comparison of 'Classic' and 'Co-Pilot' models. Measurement of their timing characteristics
  4.6 Summary of Chapter 4
CHAPTER 5. DEVELOPMENT OF SASL BASED SECURITY SYSTEM AND DEMONSTRATION OF THE PORTABILITY OF ALIEN CLIENT PART TO WINDOWS
  5.1 Development of SASL based authentication and authorization system in AliEn
  5.2 Demonstration of the portability of the client part of AliEn to Windows
  5.3 Summary of Chapter 5
BIBLIOGRAPHY
Appendix A. Glossary of acronyms
Appendix B. AliEn site description file for deploying dynamic virtual sites on Nimbus IaaS cloud (“Classic” model)
Appendix C. Implementation certificate (YerPhI)
Appendix D. Implementation certificate (CERN)
Appendix E. Implementation certificate (University of Chicago and Argonne National Laboratory)

INTRODUCTION

The provision, sharing, accounting and use of resources is a principal issue in contemporary scientific cyberinfrastructures. For example, collaborations in physics, astrophysics, Earth science, biology and medicine need to store huge amounts of data, on the order of several petabytes (1 petabyte = 2^50 bytes), as well as to conduct highly intensive computations. A single research center, even a very large one, cannot provide the required computing and storage capacity. The modern approach to this problem is to exploit the computational and data storage facilities of the centers participating in the collaborations. The most advanced implementation of this approach is based on Grid technologies, which enable the members of collaborations to work effectively regardless of their geographical location.

Currently there are several tens of Grid infrastructures deployed all over the world. The Grid infrastructures of the CERN Large Hadron Collider experiments (ALICE, ATLAS, CMS and LHCb), which are used by specialists from five inhabited continents, are among the largest.

A decade of extensive exploitation of Grid resources by various scientific communities has revealed the following problems:

- The need for appropriate coordination of resource usage and accounting for the resources.
- The need to increase the computing and storage capacity of the Grid by seamless integration of external resources.
- The need to minimize the work of resource administrators on the maintenance and support of the specific application software required by different scientific communities.
- The need for secure access to resources on the basis of different authentication mechanisms.

This dissertation is devoted to the solution of the aforementioned problems within AliEn (ALICE Environment on the Grid), the Grid infrastructure of the ALICE experiment at the CERN Large Hadron Collider (LHC). AliEn is a set of Grid middleware and application tools and services which the ALICE collaboration uses to store and analyze the experimental data, as well as to perform Monte-Carlo simulations. AliEn uses computing and data storage facilities of member institutions from Europe, Asia, the Americas and Africa, about 100 centers overall¹. Yerevan Physics Institute has participated in the ALICE collaboration since 1994.

Objectives of the work are:

- Design and development of a model for the coordination and accounting of the use of resources in the AliEn Workload Management System.
- Design and development of a model for seamless integration of external resources provided using Cloud Computing technologies.
- Design and development of an authentication and authorization framework for access to the AliEn Grid resources.
- Demonstration of the portability of AliEn code to different operating systems.

The main results of the work are:

- Development and implementation of a model of a Grid Banking Service for job scheduling in the Workload Management System of AliEn. The service provides a flexible job scheduling scheme based on the collaborative use of resources: users 'pay' from their 'bank accounts' to the sites where their jobs were executed.

¹ AliEn is also exploited by other physics collaborations: Panda and CBM at GSI (Gesellschaft