Experience with LCG-2 and Storage Resource Management Middleware

Dimitrios Tsirigkas
September 10th, 2004

MSc in High Performance Computing
The University of Edinburgh
Year of Presentation: 2004


Authorship declaration

I, Dimitrios Tsirigkas, confirm that this dissertation and the work presented in it are my own achievement.

1. Where I have consulted the published work of others this is always clearly attributed;
2. Where I have quoted from the work of others the source is always given. With the exception of such quotations this dissertation is entirely my own work;
3. I have acknowledged all main sources of help;
4. If my research follows on from previous work or is part of a larger collaborative research project I have made clear exactly what was done by others and what I have contributed myself;
5. I have read and understand the penalties associated with plagiarism.

Signed:
Date:
Matriculation no:


Abstract

The University of Edinburgh is participating in the ScotGrid project, working with Glasgow and Durham to create a prototype Tier 2 site for the LHC Computing Grid (LCG). This requires that LCG-2, the software release of the LCG project, be installed on the University hardware. As a site that will mainly provide storage, Edinburgh is also actively involved in the development of ways to interface such resources to the Grid. The Storage Resource Manager (SRM) is a protocol for an interface between client applications and storage systems. The Storage Resource Broker (SRB), developed at the San Diego Supercomputer Center (SDSC), is a system that can be used to manage distributed storage resources in Grid-like environments. In this report, we will describe work done during a period of sixteen weeks, in the context of an MSc in High Performance Computing. The first part of the work involved helping to set up LCG software at the Edinburgh ScotGrid site and to monitor the hardware using the Ganglia distributed monitoring system. The second part of the work aimed at the development of an interface between the SDSC Storage Resource Broker and an implementation of the SRM specification, which was developed at Lawrence Berkeley National Laboratory (LBNL).


Acknowledgements

I would like to thank James Perry and Philip Clark for supervising my dissertation. I am also grateful to Alasdair Earl and Steve Thorn, who offered a great deal of both practical help and information in the course of my project. Paul Walsh should also be thanked for helping me write the LaTeX file for this document.


Contents

List of Figures
1 Introduction
2 Background on Grid Computing
  2.1 Virtual Organisations
  2.2 Grid Computing
    2.2.1 Security
    2.2.2 Information
    2.2.3 Data and Storage Resources Management
    2.2.4 Job and Computing Resources Management
  2.3 Grid Projects
    2.3.1 Globus
    2.3.2 The European Data Grid
3 LCG, GridPP and ScotGrid
  3.1 The Large Hadron Collider - why Grid Technologies?
  3.2 The LCG Project
  3.3 Status and near future of the LCG Project
  3.4 GridPP
  3.5 ScotGrid
    3.5.1 ScotGrid Hardware in Edinburgh
4 LCG-2
  4.1 Interaction with the user and the applications
  4.2 Interaction with the resources
  4.3 Security
  4.4 Information System
  4.5 Job Management
    4.5.1 The Job Description Language
    4.5.2 Command line tools
  4.6 Data Management
    4.6.1 File names
    4.6.2 Command line tools
  4.7 Relevance to the Dissertation
5 LCFGng
  5.1 The architecture of LCFG
    5.1.1 Source Files
    5.1.2 Profiles
    5.1.3 Components
  5.2 LCFG and LCG-2. Relevance to the Dissertation
    5.2.1 Installing LCFG
    5.2.2 Installing LCG
    5.2.3 Relevance to the Dissertation
6 Monitoring with Ganglia
  6.1 The Ganglia Architecture
  6.2 Metrics
  6.3 Transmitting and Storing Monitoring Information
    6.3.1 Messages on the multicast channel
    6.3.2 XML messages
    6.3.3 Storing Monitoring Information
  6.4 The PHP Front end
  6.5 Using Ganglia for the ScotGrid Hardware. Relevance to the Dissertation
    6.5.1 Using LCFG to configure Ganglia
    6.5.2 The near future
    6.5.3 Monitoring Examples
7 SRM and SRB
  7.1 SRM
    7.1.1 SRM file and storage space types
    7.1.2 SRM functionality
    7.1.3 File pinning
    7.1.4 LCG-2 and SRM
  7.2 The LBNL SRM
  7.3 The SDSC SRB
    7.3.1 The SRB architecture
    7.3.2 The Metadata Catalogue
    7.3.3 The S-commands
    7.3.4 The SRB client API
  7.4 An interface between SRM and SRB
    7.4.1 Installing SRB from source
    7.4.2 Installing SRM from source
    7.4.3 Thoughts on an Interface
8 Summary and Conclusions
  8.1 Summary of the MSc Project
  8.2 Post Mortem
    8.2.1 General Issues
    8.2.2 Specific Issues
    8.2.3 Final thoughts
Appendices
A leak.c
B Index of Acronyms
Bibliography


List of Figures

2.1 The structure of the EDG project. Image taken from [3].
3.1 A logical layout of ScotGrid. Image taken from [6].
5.1 The LCFG architecture.
6.1 Ganglia cluster hierarchies.
6.2 Ganglia on the Edinburgh front end.
6.3 The leak program output, a terminal running the top utility and the Ganglia webpage. The program claims to have allocated 1253 MB, the top utility gives a value of 1.3 GB and the value shown on the Ganglia graph for total memory usage is approximately 1.4 GB.
6.4 Memory usage, free memory and free swap memory. We notice that the first two graphs are consistent. The third graph shows the usage of swap memory for the second run.
6.5 The Ganglia page for Glenmorangie shortly after the file transfer. The start of the file transfer resulted in a quick rise in memory usage. When all the memory was used, Glenmorangie only received and processed data as quickly as it could write it to the hard disk. The result was a drop in CPU and network activity.
6.6 As Glenmorangie receives packets of data it sends confirmation packets back to Glenkinchie. Almost 20 minutes after the transfer started, and with almost 5 GB transferred, the process was stopped, since there was not enough disk space on Glenellen to hold a 21 GB file.
7.1 SRM filetypes. Image taken from [22].
7.2 SRM and file transfers. Image taken from [17].
7.3 DRM, TRM and HRM and the systems they interface to the Grid. Image taken from [21].
7.4 The SRB architecture. Image taken from [20].


Chapter 1
Introduction

When the Large Hadron Collider (LHC) comes online in 2007, it will become the largest elementary particle accelerator ever to have operated in the world. Four experiments will be conducted on the LHC and the data generated will scale to Petabytes. Managing this data efficiently across a worldwide network of collaborating institutes and universities is a challenge, which the particle physics community has chosen to address using Grid computing. The University of Edinburgh is one of the major contributors to this effort.
It possesses substantial storage resources to be used for storing LHC data and is currently in the process of connecting them to a prototype Grid being set up by a number of institutes in the UK.

This document details the work completed as part of a dissertation project for the MSc in High Performance Computing at EPCC. The project had two main parts. The first part involved work on setting up and configuring the Edinburgh Grid site. The goal of the second part was to create an interface between two pieces of middleware used to manage storage resources in a distributed environment. The contents of the chapters following this introduction are summarised below.

Chapter 2 provides a background in Grid computing. All the concepts needed to understand the following chapters can be found here. There is also a brief description of two Grid-related projects that are very relevant to this work: Globus and the European DataGrid.

In Chapter 3 we explain why modern experimental particle physics can benefit from Grid computing. We then introduce three related projects: the LHC Computing Grid, GridPP and ScotGrid. The LHC Computing Grid is a successor of the European DataGrid and aims at utilising Grid technologies to address the computing needs of the Large Hadron Collider experiments. GridPP is an effort to produce the infrastructure and deploy the technology for the creation of a particle physics Grid in the UK, and ScotGrid is the subset of GridPP that covers the Scottish Grid sites.

Describing LCG-2, the latest release of the LHC Computing Grid, is the main purpose of Chapter 4. We will see how LCG-2 attempts to address the challenges associated with any Grid project and provide an outline of how it can be used. LCFGng is a piece of software that was developed at.