PoS(ACAT08)046 — http://pos.sissa.it/

The Advanced Resource Connector for Distributed LHC Computing

David Cameron∗ab, Katarina Pajchel†a, Farid Ould-Saada a, Adrian Taga a, Aleksandr Konstantinov a, Bjørn Samset a, Alexander Read a

a Department of Physics, University of Oslo, P.b. 1048 Blindern, N-0316 Oslo
b Nordic Data Grid Facility, Kastruplundgade 22, DK-2770 Kastrup

E-mail: [email protected]

The NorduGrid collaboration and its middleware product, ARC (the Advanced Resource Connector), span institutions in Scandinavia and several other countries in Europe and the rest of the world. The innovative nature of the ARC design and its flexible, lightweight distribution make it an ideal choice for connecting heterogeneous distributed resources for use by HEP and non-HEP applications alike. ARC has been used by scientific projects for many years and has been hardened and refined through experience into a reliable, efficient software product. In this paper we present the architecture and design of ARC and show how ARC's simplicity eases application integration and facilitates taking advantage of distributed resources. Example applications are shown, along with some results from one particular application, simulation production and analysis for the ATLAS experiment, as an illustration of ARC's common usage today. These results demonstrate ARC's ability to handle significant fractions of the computing needs of the LHC experiments today and well into the future.

∗Speaker.
†This work was partly supported by the Nordic Data Grid Facility and EU KnowARC (contract nr. 032691).

© Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike Licence.

XII Advanced Computing and Analysis Techniques in Physics Research
November 3-7, 2008, Erice, Italy

1. Introduction

NorduGrid [8] is a Grid research and development collaboration aiming at the development, maintenance and support of the free Grid middleware known as the Advanced Resource Connector (ARC) [4]. Institutes and projects from several European countries participate in NorduGrid and contribute to ARC. ARC was first conceived in 2001, to meet the urgent needs of users wishing to harness Grid computing resources. Existing projects such as the Globus Toolkit [2], UNICORE [1] and the European DataGrid [3] were either missing required features or were scheduled to finish too far in the future to be useful. Hence the NorduGrid collaboration started to develop its own Grid middleware solution, built on top of components of the Globus Toolkit. By 2002, ARC was already in use in a production environment, managing thousands of Monte-Carlo simulation jobs for the ATLAS [5] experiment at NorduGrid sites in several countries. These jobs have been running almost non-stop since then, along with jobs from other scientific communities, in parallel with a rapid growth in the amount of computing resources managed by ARC. At the end of 2008, ARC was managing over 25000 CPUs in more than 50 sites across 13 countries [9].

2. ARC Architecture and Design

An important requirement from the very start of ARC development was to keep the middleware portable, compact and manageable on both the server and the client side. The basic set of high-level requirements is that the system must be efficient, reliable, scalable, secure and simple.

Perhaps the most important requirement affecting the ARC architecture stems from the fact that most user tasks deal with massive data processing. A typical job expects that a certain set of input data is available via a POSIX open call, and produces another set of data as a result of its activity. Moreover, such input and output data might well reach several Gigabytes in size. The most challenging task is thus not just to find free cycles and disk space for a job, but to ensure that the necessary input is staged in to the computing resource and the output properly staged out. The only adequate approach is to build massive data staging functionality into the Grid middleware itself: the decision was to include job and data management functionality in the computing resource front-end and to optimize it for massive data processing tasks. The reliability of this front-end job and data management is one of the cornerstones of the architecture.

Another important requirement is that of a simple client that can be easily installed at any location by any user. Such a client must provide all the functionality necessary to work in the Grid environment and yet remain small in size and simple in installation. The functionality includes Grid job submission, monitoring and manipulation, the possibility to declare and submit batches of jobs with minimal overhead, and means to work with data located on the Grid. The ARC client fulfils these requirements, exposing the functionality through a Command Line Interface and C++ and Python Application Programming Interfaces (see Section 3).
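To make the job model behind these requirements concrete, the following is a minimal sketch of a job description in ARC's XRSL language (introduced in Section 3). The file names, URL and attribute values here are hypothetical and the attribute set is abbreviated; a real description would reflect the job's actual requirements:

```
&(executable="runsim.sh")
 (inputFiles=("events.conf" "gsiftp://se.example.org/data/events.conf"))
 (outputFiles=("result.root" ""))
 (stdout="stdout.txt")
 (stderr="stderr.txt")
 (cpuTime="24 hours")
```

The input file is staged in by the front-end from the (hypothetical) storage element before the job starts; an empty destination in outputFiles conventionally means the result stays in the session directory for the client to retrieve.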

In ARC, a major effort was put into making the resulting Grid system stable by design, by avoiding centralised services, for example for job brokering. There are three main components in the architecture (see Figure 1):

1. The Computing Service, which is implemented as a GridFTP–Grid Manager pair of services. The Grid Manager is the "heart" of the Grid, instantiated at each computing resource's (usually a cluster of computers) front-end as a new computing service. It serves as a gateway to the computing resource through a GridFTP channel. The Grid Manager provides an interface to the Local Resource Management System (LRMS), facilitates job manipulation, management (download and upload) of input and output data, caching of popular data, and allows for accounting and other essential local functions.

2. The Information System, which serves as the "nervous system" of the Grid. It is implemented as a hierarchical distributed database: the information is produced and stored locally for each service (computing, storage) in the Local Information System (LIS, contained within the Computing Service), while the hierarchically connected Index Services maintain the list of known resources.

3. The Client, which is the "brain" of the Grid and the Grid user's interface. It is equipped with powerful resource discovery and brokering capabilities to distribute the workload across the Grid. It also provides client functionality for all the Grid services, and yet is lightweight and installable by any user in an arbitrary location in a matter of a few minutes.

Figure 1: ARC server architecture, showing the set of services which enable a resource: the Grid Manager on the head node with its cache, session directory, job control directory and downloader/uploader behind a GridFTP server for stage-in and stage-out, and the LDAP information provider registering to the hierarchy of Index Services, which clients query for brokering and job submission.

This combination of components gives two distinct advantages over other Grid middleware designs. Firstly, no Grid data management is performed on the computing nodes, so access to storage and data indexing services is controlled rather than chaotic. Secondly, there is no resource broker service between the client and the resources - all resource brokering is done by each user's client.

3. Application Integration with ARC
The major user of ARC is currently the ATLAS experiment, performing Monte-Carlo generation and reconstruction of simulated data in order to prepare and test both the ATLAS software and the Grid infrastructure for the real data which is due to arrive in 2009. Monte Carlo simulation tasks consisting of up to tens of thousands of simulation or reconstruction jobs are defined by physics teams and assigned to one of the 3 Grids used by ATLAS according to their capacities (the Nordic countries have agreed to provide roughly 5.6% of the required computing and storage resources for the ATLAS experiment, but have consistently provided more than 10% over the last 2 years).

All the interactions with jobs on NorduGrid are handled by an "executor" which performs a set of call-outs to prepare jobs, to check their status, and to post-process the results. The executor is written in Python and interacts via the Python API of the NorduGrid ARC middleware. Figure 2 shows the basic Python code that is used to interact with ARC.

    from arclib import *                               # import ARC API
    clusters = GetClusterResources()                   # get list of clusters from infosys
    queuelist = GetQueueInfo(clusters, ...)            # get queues on each cluster
    xrsl = '&(executable="/bin/sh")...'                # construct job description (xrsl)
    X = Xrsl(xrsl)
    targetlist = ConstructTargets(queuelist, X)        # job submission targets
    targetlist = PerformStandardBrokering(targetlist)  # rank the targets
    submitter = JobSubmission(X, targetlist)           # job submitter object
    jobid = submitter.Submit()                         # submit job
    jobinfos = GetJobInfo(jobids, ...)                 # check status

Figure 2: ATLAS use of the ARC Python API.

Firstly the information system is queried for a list of possible job submission targets. Then the job description is constructed in the eXtended Resource Specification Language (XRSL), an extended version of the Globus RSL [7]. Brokering is performed by matching the job description against the possible targets and producing a ranked list. The job is then submitted, and later its status can be polled. The ease of use of the ARC API makes it suitable for any application wishing to take advantage of Grid resources.

4. Results

These jobs typically run from 30 minutes to 24 hours depending on the type of simulation. Using ARC has proven very successful for ATLAS. In 2008 just over one million jobs ran on ARC resources, around 10% of the total ATLAS jobs run worldwide, as shown on the ATLAS Dashboard [6]. The number of jobs run per day from mid-March to mid-December 2008 can be seen in Figure 3; the green bars represent successfully completed jobs and the red bars jobs which failed.

The efficiency of these jobs (ratio of successfully completed jobs to total submitted jobs) was 92.4%, and the walltime efficiency (ratio of CPU time used for successful jobs to total CPU time used) was 97.1%. Both of these values are far higher than for either of the other Grids used by ATLAS. This can be attributed to two main factors - that data transfer does not use CPU cycles and
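The brokering step described above — matching the job description against the possible targets and producing a ranked list — can be sketched in plain Python. This is an illustration only, not the arclib implementation: the Target fields and the ranking criterion (most free CPUs first, ties broken by the shorter queue) are simplifying assumptions, and the cluster names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Target:
    """Simplified view of a job submission target from the information system."""
    name: str
    free_cpus: int
    queued_jobs: int
    max_cputime_hours: int

def matches(target: Target, required_cputime_hours: int) -> bool:
    # Matchmaking: keep only targets that can satisfy the job's requirements.
    return target.max_cputime_hours >= required_cputime_hours

def broker(targets: list[Target], required_cputime_hours: int) -> list[Target]:
    # Filter on requirements, then rank: most free CPUs first,
    # shorter queue breaks ties.
    candidates = [t for t in targets if matches(t, required_cputime_hours)]
    return sorted(candidates, key=lambda t: (-t.free_cpus, t.queued_jobs))

targets = [
    Target("cluster-a.example.org", free_cpus=0,  queued_jobs=40, max_cputime_hours=48),
    Target("cluster-b.example.org", free_cpus=12, queued_jobs=3,  max_cputime_hours=24),
    Target("cluster-c.example.org", free_cpus=12, queued_jobs=9,  max_cputime_hours=12),
]

ranked = broker(targets, required_cputime_hours=24)
print([t.name for t in ranked])
# cluster-c is filtered out (cannot satisfy 24 hours of CPU time);
# cluster-b outranks cluster-a on free CPUs.
```

Because the ranking runs entirely in the client, each user can in principle substitute a different criterion without any change to central services — the property the architecture section attributes to broker-free designs.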
hence data transfer problems do not contribute to walltime loss, and that the close-knit and highly experienced NorduGrid community is fast and effective at reacting to any problems that do occur.

Figure 3: ATLAS ARC jobs completed per day for 9 months in 2008.

5. Conclusion

This paper has presented the ARC middleware and shown how simple it is to use distributed resources to obtain scientific results. Although initially designed for high-energy physics applications, ARC is now being used by scientific projects in a variety of diverse fields, from medical imaging to bioinformatics. ARC itself continues to evolve, and a large amount of effort is now focussed on enabling interoperability between ARC and other Grids through standards-based web services. However, the simple and lightweight design will remain, and continue to be attractive to an ever-expanding number of scientific communities.

References

[1] D. Erwin, editor. Unicore Plus Final Report - Uniform Interface to Computing Resources. Unicore Forum, Forschungszentrum Jülich, 2003.
[2] I. Foster and C. Kesselman. Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, 11(2):115-128, 1997.
[3] E. Laure et al. The EU DataGrid - Setting the Basis for Production Grids. Journal of Grid Computing, 2(4):299-400, 2004.
[4] M. Ellert et al. Advanced Resource Connector middleware for lightweight computational Grids. Future Generation Computer Systems, 23:219-240, 2007.
[5] The ATLAS Collaboration. ATLAS - A Toroidal LHC ApparatuS. Web site. URL http://atlas.web.cern.ch.
[6] The ATLAS Dashboard. Web site. URL http://dashb-atlas-prodsys.cern.ch/dashboard/request.py/summary.
[7] The Globus Alliance. The Globus Resource Specification Language RSL v1.0. Web site. URL http://www.globus.org/toolkit/docs/2.4/gram/rsl_spec1.html.
[8] The NorduGrid Collaboration. Web site. URL http://www.nordugrid.org.
[9] The NorduGrid Monitor. Web site. URL http://www.nordugrid.org/monitor/.