PoS(ISGC 2013)010

EGI grid middleware and distributed computing infrastructures integration

Emir Imamagic¹
University Computing Centre, University of Zagreb
J. Marohnica 5, Zagreb, Croatia
E-mail: [email protected]

Tiziana Ferrari
EGI.eu
Science Park 140, Amsterdam, The Netherlands
E-mail: [email protected]

The European Grid Infrastructure (EGI) is a distributed computing infrastructure (DCI) that provides more than 320,000 logical CPUs and 152 PB of disk space and data. EGI provides a set of basic operational interfaces that enable efficient management of the distributed computing environment and coordination of all stakeholders. The integration of resource infrastructures based on different software stacks relies on the operational tools that support daily operations: configuration management, monitoring, dashboard, support and accounting. This paper describes the tools, interfaces and procedures that allow the integration of other software stacks into EGI. In parallel, the operational interfaces are being extended in order to enable the integration with other infrastructures such as PRACE and EUDAT.

¹ Speaker

Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike Licence. http://pos.sissa.it

The International Symposium on Grids and Clouds (ISGC) 2013
March 17-22, 2013
Academia Sinica, Taipei, Taiwan

1. Introduction

The European Grid Infrastructure (EGI) [8] is a publicly funded distributed computing infrastructure (DCI) that provides more than 320,000 logical CPUs and 152 PB of disk space and data. EGI provides the set of basic operational interfaces that enable efficient and sustainable management of the distributed computing environment and coordination of all stakeholders (resource and infrastructure providers, technology providers, users). The basic operational interfaces consist of a management interface, a monitoring interface, an accounting interface, a support interface and a dashboard interface. These interfaces are implemented by tools and services that form part of the EGI operations portfolio. Operational procedures and policies should be supported by all resources.

The integration of resources into EGI is facilitated by a number of operational interfaces provided by the EGI core infrastructure platform. At the same time, EGI operational interfaces are being extended in order to enable integration with other DCIs such as the Partnership for Advanced Computing in Europe (PRACE) and EUDAT, thus enabling users to easily utilize resources from multiple infrastructures.

This paper describes the basic operational interfaces provided by EGI and the work done to integrate with resources using the following software stacks or platforms: Advanced Resource Connector (ARC) [3], Desktop Grids [15], gLite [9], Globus Toolkit [14], QosCosGrid (QCG) [21] and UNICORE [25]. Operational procedures were developed to define the use of the agreed basic set of operational interfaces by all stakeholders (EGI operations, National Grid Infrastructure managers, site administrators, users, security staff, etc.) for integrating new resources. Furthermore, details are provided about the current integration status of the supported platforms.

2. Operational tools

EGI operations rely on a set of operational tools which are used to assist administrators with their daily operations duties.

The operational tools are part of the EGI [8] core infrastructure platform and provide support for infrastructure operators' day-to-day activities. The basic features provided are: configuration management, monitoring, dashboard, support and accounting. These features are provided by different tools which are developed and maintained by the EGI-InSPIRE project [20]. These tools enable smooth operations of a distributed federated infrastructure. The following subsections provide a high level overview of each operational tool. Additional information can be found in [1] and [2].

2.1 Configuration management

The Grid Configuration Database (GOCDB) [5] contains general static information about the sites participating in one or more infrastructures. It provides the topology of those infrastructures by defining a respective list of service endpoints. GOCDB allows sites to store, update and view the topology of the production infrastructure and basic information about the respective resources within it. GOCDB acts as the authoritative reference database for all other operational tools, providing the definitive source of information about National Grid Infrastructures (NGI), sites, services and administrators responsible for services.

GOCDB can host information for multiple infrastructures and is independent of the services being registered. It can provide information about logical groups of services belonging to multiple sites. All data are stored in a database and users are provided with a common web interface. GOCDB is deployed as a single, centrally managed repository within EGI, and is backed up to a remote failover instance every two hours.

Integration of new middleware stacks into GOCDB requires the creation of new service types. This step enables site administrators with resources running new middleware to register the respective service endpoints. The appropriate procedure for adding new service types is described in [6]. New service types require documentation about the nature of the service requested. In addition, the defined service type name reflects the usable scope of deployment (e.g. eu.egi.MPI, ch..cvmfs.stratum.0) in order to define unambiguously the semantics associated to a service type name.

2.2 Monitoring

The Service Availability Monitor (SAM) is used to monitor the resources within the production infrastructure. SAM notifies site administrators in case of malfunctioning services and outages that concern their portion of the infrastructure. NGI operations teams and Virtual Organization (VO) operators are also notified for services in their scope. Monitoring results are the source of alarms raised in the Operations Portal dashboard and are also used for the calculation of availability and reliability of the infrastructure at different levels of aggregation. SAM is based on well-established open source projects such as Nagios and the ActiveMQ messaging system.

SAM consists of the following components:

• a probe execution framework based on the open source monitoring framework Nagios and the Nagios Configuration Generator (NCG),
• databases containing topology information gathered from GOCDB and other sources, test results and availability and reliability of sites and services,
• a message bus infrastructure used for communication between distributed SAM instances,
• the Profile Management (POEM) system that provides an interface to group different tests into profiles which are the basis for configuring Nagios and other SAM components,
• probes used to test monitored services which are provided by middleware developers and third parties (e.g. NGIs, Nagios community),
• a programmatic interface that enables other tools (e.g. Operations Portal, VO dashboards) to access test results and availability and reliability,
• the visualization portal MyEGI that enables users to access current status, history and availability of monitored sites and services.

SAM is deployed in a distributed manner. Each NGI deploys its own SAM instance that is responsible for monitoring sites and services of that NGI. The central instance deployed at CERN consumes results from all NGI instances, calculates availability and reliability and provides information for the central visualization portal. Figure 1 shows the architecture of the SAM framework.

Figure 1: Architecture of the SAM framework
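The availability and reliability figures that the central SAM instance calculates can be illustrated with a minimal sketch. The paper does not give the formulas; the sketch assumes the commonly used definitions in which availability is uptime divided by the time for which the status was known, and reliability additionally excludes scheduled downtime. The `StatusHours` record, the function names and the summation used for aggregation are hypothetical and are not part of SAM.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StatusHours:
    """Hours a service or site spent in each state (hypothetical record)."""
    up: float
    down: float                # unscheduled downtime
    scheduled_downtime: float
    unknown: float             # no monitoring result available


def availability(h: StatusHours) -> float:
    """Assumed definition: uptime / (total time - unknown time)."""
    known = h.up + h.down + h.scheduled_downtime
    return h.up / known if known > 0 else 0.0


def reliability(h: StatusHours) -> float:
    """Assumed definition: uptime / (total - scheduled downtime - unknown)."""
    expected = h.up + h.down
    return h.up / expected if expected > 0 else 0.0


def aggregate(parts: Iterable[StatusHours]) -> StatusHours:
    """Sum hours over several entities, e.g. all sites of an NGI.

    Plain summation is an assumption made for illustration only.
    """
    parts = list(parts)
    return StatusHours(
        up=sum(p.up for p in parts),
        down=sum(p.down for p in parts),
        scheduled_downtime=sum(p.scheduled_downtime for p in parts),
        unknown=sum(p.unknown for p in parts),
    )


if __name__ == "__main__":
    # A 31-day month (744 hours) for one hypothetical site.
    month = StatusHours(up=700, down=20, scheduled_downtime=24, unknown=0)
    print(f"availability={availability(month):.4f}")
    print(f"reliability={reliability(month):.4f}")
```

Under these assumed definitions, scheduled downtime lowers availability but not reliability, which is why the two numbers are reported separately.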

Integrating a new middleware stack into SAM requires the definition and implementation of tests for the service types defined in GOCDB. The probes are subsequently integrated into the SAM release. Integration also requires the definition of naming and test configuration (e.g. probe parameters, execution frequency, timeout, etc). Two EGI procedures were defined to structure the integration activities:

• Adding new probes to SAM [11], which defines how to add new tests,
• Management of the EGI Availability and Reliability Profile [12].

The procedure "Setting a Nagios test status to operations" [13] enables the probe developer to define the list of tests which generate alarms in the Operations Portal. Once the test is added, alarms are made accessible through the Operations Portal.

2.3 Dashboard

The Operations Portal combines and harmonizes different static and dynamic information and enables the operators to manage alarms coming from the SAM system. Operators use the dashboard to react on alarms, interact with sites, provide first-level support and perform oversight of alarms and tickets on national level.

The portal content is based on information which is retrieved from several different sources (e.g. GOCDB, SAM monitoring system, EGI information discovery system, GGUS, web services, etc.) and organized around a highly modular web service, Lavoisier, that provides a transparent integration of various information sources. Information of the regional and central Operations Portal is subsequently integrated into a single interface. Figure 2 shows the architecture of the Operations Portal.

Figure 2: Architecture of the Operations Portal
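How a dashboard might turn monitoring results into alarms can be sketched as follows. This is an illustration only and not the Operations Portal implementation: the `ProbeResult` record, the test names and the rule that only `CRITICAL` results of tests in the alarm-generating list raise an alarm are assumptions. The four status names are the standard Nagios plugin states.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

# Standard Nagios plugin states.
NAGIOS_STATES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")

# Assumption: only CRITICAL results raise an alarm.
ALARM_STATES = frozenset({"CRITICAL"})


@dataclass(frozen=True)
class ProbeResult:
    """One monitoring result (hypothetical record layout)."""
    site: str
    service_endpoint: str
    test_name: str
    status: str


def select_alarms(results: Iterable[ProbeResult],
                  alarm_tests: Set[str],
                  alarm_states=ALARM_STATES) -> List[ProbeResult]:
    """Keep results of alarm-generating tests that are in an alarm state."""
    return [r for r in results
            if r.test_name in alarm_tests and r.status in alarm_states]


def alarms_by_site(alarms: Iterable[ProbeResult]) -> Dict[str, List[ProbeResult]]:
    """Group alarms per site, as an operator's dashboard view might do."""
    grouped: Dict[str, List[ProbeResult]] = defaultdict(list)
    for alarm in alarms:
        grouped[alarm.site].append(alarm)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical test names and endpoints.
    alarm_tests = {"org.example.Job-Submit"}
    results = [
        ProbeResult("SITE-A", "ce.site-a.example", "org.example.Job-Submit", "CRITICAL"),
        ProbeResult("SITE-A", "ce.site-a.example", "org.example.Cert-Lifetime", "CRITICAL"),
        ProbeResult("SITE-B", "ce.site-b.example", "org.example.Job-Submit", "OK"),
    ]
    for site, alarms in alarms_by_site(select_alarms(results, alarm_tests)).items():
        print(site, [a.test_name for a in alarms])
```

Keeping the list of alarm-generating tests separate from the probe results mirrors the text: a test can be monitored without raising alarms until it is explicitly added to that list.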

If the resources using new grid middleware are already integrated in all the other operational tools through existing information providers (e.g. registered in GOCDB, monitored by SAM, present in the information system and registered in GGUS), existing plugins can be reused and no additional integration effort is needed. For new information providers, new plugins need to be developed and integrated.

Alarms are displayed for the respective service endpoints and indicate problems in these endpoints. Operators follow standard procedures for the handling of alarms.

2.4 Support

The user support infrastructure within the EGI Helpdesk is distributed and consists of various topical and regional helpdesk systems that are linked together through a central integration platform, the Global Grid User Support (GGUS) system [7]. This central helpdesk enables formalized communication between the submitter and all partners involved in user support by providing an interface to which all other tools can connect, thereby enabling the central management of incidents.

The interlinking of all ticket systems in place throughout the infrastructure makes it possible to pass trouble tickets from one system to the other in a way that is transparent to the user. It also enables the communication and ticket assignment between experts from different areas. By exposing agreed interfaces, a hierarchy of helpdesk systems can be implemented, allowing transparent exchange of incident records across different resource infrastructures. A reference implementation was defined for the interface between ticket systems, and a template for a ticket layout exists to ensure the quality of service.

The regionalized implementation of GGUS is called xGUS. xGUS is a simplified regional helpdesk instance that EGI can provide to resource infrastructure providers, technology providers and new virtual research environments. The regional helpdesk instances are operated centrally but can be customized by the support units. In xGUS the tickets can have a local or global scope. All answers to a global ticket are redirected to the central GGUS and centrally accessible.

The new middleware can use multiple approaches for integration into the EGI helpdesk. One approach is to instantiate a specific support unit in GGUS, which will enable operators to assign tickets directly to the appropriate support structure. This approach is heavyweight as the middleware team must follow all EGI procedures. When new middleware is deployed in the infrastructure, an agreement has to be made with that technology provider on how to integrate its support infrastructure within the EGI Helpdesk.

2.5 Accounting

The EGI accounting infrastructure is a distributed system with central aggregation, where sites publish their usage records centrally. The usage of the EGI computing resources by the users is accounted at service level and stored in a central repository.

The EGI accounting infrastructure currently collects CPU accounting records from sites and/or grid infrastructures and summarizes the data by site, date, VO, and user. Publication of accounting records for storage, virtual machines and parallel jobs is being validated. Summaries can be displayed in a central Accounting Portal, with a wide range of filtering options, at any level of the hierarchical tree structure which defines the EGI infrastructure (e.g. federation, national infrastructure and resource centre level).

In order to calculate the overall usage statistics of the distributed resource infrastructure, the accounting data from the different sites is aggregated into the central repository. Usage records are transferred from services to the central repository by using the EGI message bus infrastructure.

The core EGI accounting infrastructure is based on APEL [4]. Other accounting systems have to interface to APEL to publish data into the EGI accounting repository. For a unified access to VO accounting information, sites can publish VO usage records centrally or into a regional repository.

Secure Stomp Messenger (SSM) [23] is the messaging system used by APEL to securely transmit messages. It is written in Python and uses the STOMP protocol [24]. Figure 3 shows a diagram of data exchange with SSM. In addition, data must be formatted according to the OGF standard Usage Record scheme-compatible profile.

Figure 3: Exchange and collection of accounting records through SSM
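The summarization of accounting records by site, date, VO and user described above can be sketched in a few lines. This is an illustration under stated assumptions, not APEL's implementation: the `JobRecord` fields, the function names and the example values are hypothetical, and real usage records carry many more attributes.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class JobRecord:
    """One per-job CPU usage record (hypothetical, heavily simplified)."""
    site: str
    day: date
    vo: str
    user: str
    cpu_seconds: int


SummaryKey = Tuple[str, date, str, str]


def summarize(records: Iterable[JobRecord]) -> Dict[SummaryKey, Dict[str, int]]:
    """Aggregate job records into summaries keyed by (site, date, VO, user)."""
    totals: Dict[SummaryKey, Dict[str, int]] = defaultdict(
        lambda: {"jobs": 0, "cpu_seconds": 0})
    for r in records:
        entry = totals[(r.site, r.day, r.vo, r.user)]
        entry["jobs"] += 1
        entry["cpu_seconds"] += r.cpu_seconds
    return dict(totals)


def cpu_seconds_per_site(summary: Dict[SummaryKey, Dict[str, int]]) -> Dict[str, int]:
    """Roll summaries up to site level, one level of the hierarchy."""
    per_site: Dict[str, int] = defaultdict(int)
    for (site, _day, _vo, _user), entry in summary.items():
        per_site[site] += entry["cpu_seconds"]
    return dict(per_site)


if __name__ == "__main__":
    records = [
        JobRecord("SITE-A", date(2013, 3, 17), "vo.example", "alice", 3600),
        JobRecord("SITE-A", date(2013, 3, 17), "vo.example", "alice", 1800),
        JobRecord("SITE-B", date(2013, 3, 17), "vo.example", "bob", 600),
    ]
    summary = summarize(records)
    print(summary)
    print(cpu_seconds_per_site(summary))
```

Summaries of this kind are much smaller than the per-job records they replace, which is one reason to publish summaries rather than raw records to a central repository.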

3. Software integration

The EGI infrastructure currently consists of resources that are accessible through the following middlewares: ARC, Desktop Grids, gLite, Globus Toolkit, QCG and UNICORE. Predecessors of the EGI infrastructure, DataGrid and later EGEE, were heavily gLite-oriented, so gLite middleware is fully integrated with all operational tools. However, with the increased integration effort other middleware are reaching a similar integration status. The following subsections provide an overview of the integration status and the actions completed for each grid middleware. Additional information can be found in [2].

3.1 EMI middlewares

The European Middleware Initiative (EMI) [9] is a collaboration of the major European middleware providers: ARC, dCache, gLite and UNICORE. The main goal of the project is to improve the existing middleware and harmonize it, thus achieving a common framework with simpler and easier to use components. Such middleware at the same time reduces interoperability problems and simplifies deployment and maintenance for the distributed computing infrastructure providers. ARC and gLite are fully integrated into all EGI operational tools.

In order to speed up the integration of UNICORE resources within EGI, a task force was created. Thirteen service types for UNICORE services were added to the GOCDB. For monitoring purposes, Nagios probes were provided, which are maintained by the UNICORE project. In total there are ten tests fully integrated into SAM, out of which eight are integrated into the Operations Portal. Accounting integration is possible with the publishers released with the EMI version 3.0 release which supports SSM version 2.0.

3.2 Globus Toolkit

Globus Toolkit is an open source software toolkit used for building grids. It is being developed by the Globus Alliance. The Initiative for Globus in Europe (IGE) [17] is especially focusing on providing packaged versions of various Globus Toolkit components (e.g. GRAM, MyProxy, GridFTP).

In order to speed up the integration of Globus resources within EGI, a task force was created. Currently there are 5 Globus service types in the GOCDB. Standard Nagios probes and probes provided by the Croatian NGI are used for monitoring. Globus Toolkit resources are currently monitored with 8 different tests, all of which are integrated into the Operations Portal.

Globus Toolkit provides its own accounting system (GridSAFE). After the IGE 3.0 release, integration of GridSAFE with APEL via SSM client was successfully tested.

3.3 Desktop Grids

The International Desktop Grid Federation (IDGF) [15] consolidates the results achieved in the EDGeS project concerning the extension of Service Grids with Desktop Grids (DGs) in order to support EGI and NGI user communities that are heavy users and require an extremely large number of CPUs and cores.

GOCDB was extended with three new service types representing desktop grid resources. A probe for monitoring Desktop Grids services was integrated into SAM. The new follow-up project, International Desktop Grid Federation Support Project (IDGF-SP) [16], is going to continue work on integrating into EGI dashboard, support and accounting interfaces.

3.4 QCG

QosCosGrid (QCG) develops the QCG middleware stack. The Multiscale APPlications on EuRopean e-infrastructures (MAPPER) project [19] aims to deploy a computational science environment for distributed multiscale computing, on and across European e-infrastructures. In order to further the project's aim, MAPPER initiated a collaboration with EGI-InSPIRE and PRACE. The main goal was to integrate two applications that perform distributed multiscale computing by using a set of core middleware services from the QosCosGrid middleware stack.

For the needs of QCG middleware 3 new service types were added to the GOCDB. The QCG team provided a set of probes for monitoring QCG services. Probes were integrated into SAM and 5 new tests were added both to SAM and to the Operations Portal. The QCG team provided an accounting solution compatible with the APEL system. Deployment in production is possible with the publishers released with the EMI version 3.0 release which supports SSM version 2.0.

4. DCIs integration

EGI is currently investigating the operations integration with two different infrastructures: PRACE and EUDAT. PRACE [20] is a pan-European distributed supercomputing infrastructure, providing access to PRACE high performance computing resources. EUDAT [26] is a collaborative distributed infrastructure that enables users and communities to seamlessly share and process data.

Requirements were collected from MAPPER users which are in need of using both EGI and PRACE resources. The main requirements are:

• need for a streamlined access to e-Infrastructure services and resources through common mechanisms for resource allocation reducing the bureaucratic overhead,
• user authentication through X.509 certificates,
• availability of accounting information through a single portal,
• availability of monitoring information across EGI and PRACE,

• harmonized user support in case of problems concerning different infrastructures at the same time, i.e. single point of contact for both EGI and PRACE infrastructures.

A feasibility study of the integration of helpdesk and accounting infrastructures was identified as the first step. The user requirements mentioned above need common policies for secure access to confidential information. EGI and PRACE already collaborate on security matters by sharing security policies and communication channels to support integrated security incident response activities.

5. Conclusion

The EGI operations portfolio consists of services that are crucial for sustainable and efficient management of the EGI infrastructure. These operational services are designed in a modular fashion that enables integration of new grid middleware and other software into EGI. Several procedures and policies are defined to ensure a smooth integration process. This paper provides an overview of operational interfaces and the current status of middleware integration.

Functionalities and requirements of the different operational interfaces will evolve over time, based on the input from resource providers and research infrastructures. Integration with other DCIs will also bring new requirements for the extension of the operational interfaces currently deployed in EGI.

References

[1] S. Burke, T. Ferrari, M. Krakowian, P. Solagna, EGI-InSPIRE D4.6 EGI Operations, https://documents.egi.eu/document/1309
[2] E. Imamagic, M. Lechner, EGI-InSPIRE MS421 Integrating Resources into the EGI Production Infrastructure, https://documents.egi.eu/document/1308
[3] Advanced Resource Connector (ARC), http://www.nordugrid.org/arc/
[4] APEL, https://wiki.egi.eu/wiki/
[5] Grid Configuration Database (GOCDB) of the EGI, https://wiki.egi.eu/wiki/GOCDB
[6] GOCDB Documentation – Adding new service types, https://wiki.egi.eu/wiki/GOCDB/Input_System_User_Documentation#Adding_new_services_types
[7] Global Grid User Support (GGUS), https://wiki.egi.eu/wiki/
[8] European Grid Infrastructure (EGI), http://www.egi.eu
[9] European Middleware Initiative (EMI), http://www.eu-emi.eu

[10] EGI InSPIRE paper, http://go.egi.eu/pdnon
[11] EGI Procedure – Adding new probes to SAM, https://wiki.egi.eu/wiki/PROC07
[12] EGI Procedure – Management of the EGI OPS Availability and Reliability Profile, https://wiki.egi.eu/wiki/PROC08
[13] EGI Procedure – Setting Nagios test status to operations, https://wiki.egi.eu/wiki/PROC06
[14] Globus Toolkit, http://www.globus.org
[15] IDGF, http://www.desktopgridfederation.org
[16] IDGF-SP, http://www.idgf-sp.eu
[17] Initiative for Globus in Europe (IGE), http://www.ige-project.eu
[18] D. Lecarpentier, Towards a collaborative data infrastructure for science, ISGTW, 2012, http://www.isgtw.org/feature/towards-collaborative-data-infrastructure-science
[19] MAPPER, http://www.mapper-project.eu/
[20] PRACE, http://www.prace-ri.eu/
[21] QosCosGrid (QCG), http://www.qoscosgrid.org
[22] Service Availability Monitoring (SAM), https://wiki.egi.eu/wiki/SAM
[23] Secure Stomp Messenger (SSM), https://wiki.egi.eu/wiki/SSM
[24] Stomp Protocol Specification, http://stomp.github.com/stomp-specification-1.1.html
[25] UNICORE, http://www.unicore.eu/
[26] EUDAT, http://www.eudat.eu/