Performance of the NorduGrid ARC and the Dulcinea Executor in ATLAS Data Challenge 2


Sturrock, R., Eerola, P., Konya, B., Smirnova, O., Lindemann, J., et al. (2005). Performance of the NorduGrid ARC and the Dulcinea Executor in ATLAS Data Challenge 2. In A. Aimar, J. Harvey, & N. Knoors (Eds.), CERN-2005-002 (Vol. 2, pp. 1095-1098). CERN.
Robert Sturrock, University of Melbourne, Australia
Henrik Jensen, Josva Kleist, Aalborg University, Denmark
John Renner Hansen, Jakob Langgaard Nielsen, Anders Wäänänen, NBI, Copenhagen, Denmark
Daniel Kalici, University of Southern Denmark, Odense, Denmark
Arto Teräs, CSC, Helsinki, Finland
Helmut Heller, Leibniz-Rechenzentrum, Munich, Germany
John Kennedy, Günter Duckeck, Ludwig-Maximilians-Universität, Munich, Germany
Aleksandr Konstantinov, University of Oslo, Norway and Vilnius University, Lithuania
Jan-Frode Myklebust, University of Bergen, Norway
Farid Ould-Saada, Katarzyna Pajchel, Alex Read, Haakon Riiser, University of Oslo, Norway
Morten Hanshaugen, Sturle Sunde, USIT, University of Oslo, Norway
Andrej Filipčič, Matevž Tadel, Jožef Stefan Institute, Ljubljana, Slovenia
Leif Nixon, NSC, Linköping University, Sweden
Paula Eerola, Balázs Kónya, Oxana Smirnova, Lund University, Sweden
Jonas Lindemann, LUNARC, Lund University, Sweden
Lars Malinowsky, Nils Smeds, PDC, KTH, Stockholm, Sweden
Åke Sandgren, Mattias Wadenstein, HPC2N, Umeå University, Sweden
Tord Ekelöf, Uppsala University, Sweden
Christian Häberli, University of Bern, Switzerland
Mattias Ellert*, CERN, Geneva, Switzerland

* [email protected]

Abstract

This talk describes the various stages of ATLAS Data Challenge 2 (DC2) in what concerns usage of resources deployed via NorduGrid's Advanced Resource Connector (ARC). It also describes the integration of these resources with the ATLAS production system using the Dulcinea executor.

ATLAS Data Challenge 2 (DC2), run in 2004, was designed to be a step forward in distributed data processing. In particular, much of the coordination of task assignment to resources was planned to be delegated to the Grid in its different flavours. An automatic production management system was designed to direct the tasks to Grids and conventional resources.

The Dulcinea executor is the part of this system that provides an interface to the information system and resource brokering capabilities of the ARC middleware. The executor translates the job definitions received from the supervisor into the extended resource specification language (XRSL) used by the ARC middleware. It also takes advantage of the ARC middleware's built-in support for the Globus Replica Location Server (RLS) for file registration and lookup.

NorduGrid's ARC has been deployed on many ATLAS-dedicated resources across the world in order to enable effective participation in ATLAS DC2. This was the first attempt to harness large amounts of strongly heterogeneous resources in various countries for a single collaborative exercise using Grid tools. This talk addresses various issues that arose during different stages of DC2 in this environment: preparation, such as ATLAS software installation; deployment of the middleware; and processing. The results and lessons are summarised as well.

ATLAS DATA CHALLENGE 2

The ATLAS [1] collaboration, in preparation for the startup of its experiment at the Large Hadron Collider (LHC) [2] at CERN in 2007, is conducting a series of so-called Data Challenges. These exercises are intended to verify the capability of the collaboration to handle the computational effort needed to analyse the data collected by its experiment. The Data Challenges have been designed to gradually increase in size to eventually reach a complexity equal to the computational effort foreseen to be needed when the experiment starts acquiring data. During 2004 the second of these exercises, ATLAS Data Challenge 2 (DC2), was carried out.
THE ATLAS PRODUCTION SYSTEM

In order to handle the task of ATLAS DC2 an automated production system [3] was designed. This production system consists of several parts: a database for defining and keeping track of the computing tasks to be done; the Don Quijote data management system [4] for handling the input and output data of the computations; the Windmill supervisor program, which was in charge of distributing the tasks between the various computing resources; and a set of executors responsible for carrying out the tasks. By writing various executors, the supervisor could be presented with a common interface to each type of computing resource available to the ATLAS collaboration. Executors were written to handle resources on the LHC Computing Grid [5], Grid3 [6, 7], NorduGrid's ARC and various legacy batch systems [8]. During ATLAS DC2 each of the three Grid flavours carried out about one third of the total computational task. The subject of this paper is the executor written for NorduGrid's ARC, called Dulcinea, and the part of DC2 that was carried out with it.

NORDUGRID'S ARC

The Advanced Resource Connector (ARC) [9] is a Grid middleware developed by the NorduGrid collaboration [10]. It is built on top of the libraries provided by the Globus toolkit [11]. An overview of the ARC middleware components can be found in Figure 1.

Figure 1: An overview of the ARC architecture showing the different components of the middleware.

NorduGrid's ARC has been deployed at a number of computing resources around the world (see Figure 2). These resources run various Linux distributions and use several different local resource management systems (LRMS). Although various flavours of PBS are most common, there are sites running SGE, Easy and Condor as well. Using different LRMS-specific information providers, the sites can present the information about their available resources in a uniform way in ARC's information system [12]. This information system is based on the Globus Monitoring and Discovery Service (MDS). The information provided is used by the brokering algorithm in the ARC user interface API [13] to find suitable resources for the tasks to be performed, as well as by various monitoring tools like the NorduGrid Grid Monitor (see Figure 3).

Figure 2: This map shows the geographical location of the ARC enabled sites.

NorduGrid's ARC is fully integrated with the Globus Replica Location Service (RLS) [14]. This allows jobs sent to an ARC enabled site to specify input file locations as an RLS catalogue entry instead of a physical file location. A job can also, if desired, automatically register created output files in an RLS catalogue.

THE DULCINEA EXECUTOR

The ATLAS production system executor for NorduGrid's ARC, Dulcinea, was implemented as a C++ shared library. This shared library was then imported into the production system's python framework. The executor calls the ARC user interface API and the Globus RLS API to perform its tasks.

The job description received from the Windmill supervisor in the form of an XML message was translated by the Dulcinea executor into an extended resource specification language (XRSL) [15] job description. This job description was then sent to one of the ARC enabled sites, a suitable site being selected using the resource brokering capabilities of the ARC user interface API. The brokering takes into account, among other things, the availability of free CPUs and the amount of data that needs to be staged in at each site to perform a specific task. The look-up of input data files in the RLS catalogue and the registration of the produced output files were likewise handled through these APIs. The ATLAS software releases needed for the tasks were installed at the sites as runtime environments (RTEs). By having the Dulcinea executor request the RTE in its XRSL job description, only sites with the proper software installation were considered in the brokering.
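As a rough illustration of the translation target, a generated XRSL job description might look like the sketch below. The file names, the RLS host and the runtime environment tag are invented for this example, not taken from the paper; the attribute set follows the XRSL conventions described in the text (RLS URLs for input and output files, and a runtime environment request used in the brokering).

```
&(executable="runatlas.sh")
 (arguments="jobdef.xml")
 (inputfiles=("jobdef.xml" "")
             ("input.pool.root" "rls://rls.example.org/dc2.simul.input.0042"))
 (outputfiles=("output.pool.root" "rls://rls.example.org/dc2.simul.output.0042"))
 (runtimeenvironment="APPS/HEP/ATLAS-8.0.5")
 (stdout="stdout.txt")
 (stderr="stderr.txt")
```

An empty source string for an input file means it is uploaded from the submitting client; an `rls://` source lets the site resolve the physical location through the catalogue, and an `rls://` output destination asks the middleware to register the created file there, as described above.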
Of the computing resources connected through NorduGrid's ARC, 22 sites had installed the ATLAS software needed to run DC2 (see Table 1). In total these sites had approximately 3,000 CPUs, of which approximately 700 were dedicated to ATLAS DC2. During the first weeks of ATLAS DC2 we had severe problems due to the Globus RLS server hanging. This caused jobs to fail either at stage-in, because the location of input files could not be looked up in the catalogue, or at stage-out, because the location of the output files could not be registered. The first kind of failure was not that serious, since no CPU time was lost for these jobs, while the second kind was serious, because the CPU time for those jobs had been wasted. A few weeks into ATLAS DC2 a patch from the developers of the RLS became available that solved this problem. Apart from this, most failures were due to site specific hardware problems. A storage element going down meant that all jobs requiring an input file stored on this particular storage element would fail at stage-in.
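The cost asymmetry just described, where a stage-in failure loses no CPU time while a stage-out failure wastes everything the job consumed, can be sketched as a small accounting helper. This is an illustrative sketch only; the class names and the numbers are invented, not taken from the DC2 bookkeeping.

```python
# Illustrative sketch (names and figures are hypothetical): accounting for
# CPU time wasted by RLS-related job failures, as discussed in the text.
from dataclasses import dataclass
from enum import Enum


class FailureStage(Enum):
    STAGE_IN = "stage-in"    # input file look-up in the RLS catalogue failed
    STAGE_OUT = "stage-out"  # output file registration in the RLS catalogue failed


@dataclass
class FailedJob:
    job_id: str
    stage: FailureStage
    cpu_hours_used: float


def wasted_cpu_hours(jobs):
    """A stage-in failure happens before the job runs, so no CPU time is
    lost; a stage-out failure wastes all CPU time the job consumed."""
    return sum(j.cpu_hours_used for j in jobs
               if j.stage is FailureStage.STAGE_OUT)


failures = [
    FailedJob("dc2-0001", FailureStage.STAGE_IN, 0.0),
    FailedJob("dc2-0002", FailureStage.STAGE_OUT, 12.5),
    FailedJob("dc2-0003", FailureStage.STAGE_OUT, 7.5),
]
print(wasted_cpu_hours(failures))  # 20.0
```

The same asymmetry explains why a site could safely retry stage-in failures after the RLS patch, while stage-out failures had already consumed their share of the dedicated CPUs.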
