PoS(ISGC 2013)010

EGI grid middleware and distributed computing infrastructures integration

Emir Imamagic
University Computing Centre, University of Zagreb
J. Marohnica 5, Zagreb, Croatia
E-mail: [email protected]

Tiziana Ferrari
EGI.eu
Science Park 140, Amsterdam, The Netherlands
E-mail: [email protected]

The European Grid Infrastructure (EGI) is a distributed computing infrastructure (DCI) that provides more than 320,000 logical CPUs and 152 PB of disk space and data. EGI provides a set of basic operational interfaces that enable efficient management of the distributed computing environment and coordination of all stakeholders. This paper describes the tools that support daily operations: configuration management, monitoring, dashboard, support and accounting. The integration of resources based on different software stacks relies on the operational tools, interfaces and procedures that allow the integration of these software stacks into EGI. In parallel, the operational interfaces are being extended in order to enable integration with other resource infrastructures such as PRACE and EUDAT.

The International Symposium on Grids and Clouds (ISGC) 2013
March 17-22, 2013
Academia Sinica, Taipei, Taiwan

1 Speaker

Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike Licence. http://pos.sissa.it

1. Introduction

The European Grid Infrastructure (EGI) [8] is a publicly funded Distributed Computing Infrastructure (DCI) that provides more than 320,000 logical CPUs and 152 PB of disk space and data. EGI provides the set of basic operational interfaces that enable efficient and sustainable management of the distributed computing environment and coordination of all stakeholders (resource and infrastructure providers, technology providers, users). The basic operational interfaces consist of a management interface, a monitoring interface, an accounting interface, a support interface and a dashboard interface. These interfaces are implemented by tools and services that form part of the EGI operations portfolio.

EGI provides access to resources using the following supported platforms: Advanced Resource Connector (ARC) [3], Desktop Grids [15], gLite [9], Globus Toolkit [14], QosCosGrid (QCG) [21] and UNICORE [25]. The integration of resources whose access is supported by different software stacks is facilitated by a number of operational interfaces. Operational procedures and policies were developed to define the use of the agreed basic set of operational interfaces, which should be supported by all resources.

At the same time EGI operational interfaces are being extended in order to enable integration with other infrastructures such as the Partnership for Advanced Computing in Europe (PRACE) and EUDAT, thus enabling users to easily utilize resources from multiple infrastructures.

This paper describes the basic operational interfaces provided by EGI and the procedures that assist all stakeholders (EGI operations, National Grid Infrastructure managers, site administrators, users, security staff, etc.) in integrating new resources. Furthermore, details are provided about the current integration status of the supported platforms and about the work done to integrate with other DCIs.

2. Operational tools

The EGI core infrastructure platform consists of a set of operational tools which are used to assist administrators with their daily operations duties.
The basic features provided by the core infrastructure platform are: configuration management, monitoring, dashboard, support and accounting. These features are provided by different tools which are developed and maintained by the EGI-InSPIRE project. These tools enable smooth operations of a distributed federated infrastructure and provide support for the operators' day-to-day activities. Additional information can be found in [1] and [2]. The following subsections provide a high level overview of each operational tool.

2.1 Configuration management

The Grid Configuration Database (GOCDB) [5] contains general static information about National Grid Infrastructures (NGI), sites, services and administrators responsible for services. GOCDB allows sites to store, update and view the topology of the production infrastructure and basic information about the respective resources within it. GOCDB acts as the authoritative reference database for all other operational tools, providing the definitive source of information about services being registered. All data are stored in a database and users are provided with a common web interface. GOCDB is deployed as a single, centrally managed repository within EGI, and is backed up to a remote failover instance every two hours.

GOCDB can host information for multiple infrastructures. It provides topology information about those infrastructures by defining a respective list of service endpoints. It can also provide information about logical groups of services belonging to multiple sites and about the sites participating in one or more infrastructures.

Integration of new middleware stacks into GOCDB requires the creation of new service types. This step enables site administrators with resources running new middleware to register the respective service endpoints. The appropriate procedure for adding new service types [6] is followed in order to define unambiguously the semantics associated with a service type name. New service types require documentation about the nature of the service requested. In addition, the defined service type name reflects the usable scope of deployment (e.g. eu.egi.MPI, ch.cern.cvmfs.stratum.0).

2.2 Monitoring

The Service Availability Monitor (SAM) [22] is used to monitor the resources within the production infrastructure. SAM notifies site administrators in case of malfunctioning services and outages that concern their portion of the infrastructure. NGI operations teams and Virtual Organization (VO) operators are also notified for services in their scope. Monitoring results are also used for the calculation of availability and reliability of the infrastructure at different levels of aggregation. SAM is the source of alarms raised in the Dashboard of the Operations Portal.

SAM is based on well-established open source projects such as Nagios and the ActiveMQ messaging system. SAM consists of the following components:
• probes used to test monitored services, which are provided by middleware developers and third parties (e.g. NGIs, community),
• a probe execution framework based on the open source monitoring framework Nagios and the Nagios Configuration Generator (NCG),
• the Profile Management (POEM) system, that provides an interface to group different tests into profiles which are the basis for configuring Nagios and other SAM components,
• databases containing topology information gathered from GOCDB and other sources, test results and availability and reliability of sites and services,
• a message bus infrastructure used for communication between distributed SAM instances,
• the visualization portal MyEGI, that enables users to access current status, history and availability of monitored sites and services,
• the programmatic interface, that enables other tools (e.g. Operations Portal, VO dashboards) to access test results, availability and reliability of sites and services.

SAM is deployed in a distributed manner. Each NGI deploys its own SAM instance that is responsible for monitoring sites and services of that NGI. The central instance deployed at CERN consumes results from all NGI instances, calculates availability and reliability and provides information for the central visualization portal. Figure 1 shows the architecture of the SAM framework.

Figure 1: Architecture of the SAM framework
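The paper states that the central SAM instance calculates availability and reliability but does not give the formulas. The sketch below is only an illustration under a commonly used assumption: availability is the fraction of known time a service was up, and reliability additionally excludes scheduled downtime. The class `StatusHours` and all function names are hypothetical and are not part of SAM.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StatusHours:
    """Hours a service spent in each monitoring state (hypothetical model)."""
    up: float
    down: float
    scheduled_downtime: float
    unknown: float

    @property
    def total(self) -> float:
        return self.up + self.down + self.scheduled_downtime + self.unknown


def availability(h: StatusHours) -> float:
    """Assumed definition: up time divided by time with a known status."""
    known = h.total - h.unknown
    return h.up / known if known > 0 else 0.0


def reliability(h: StatusHours) -> float:
    """Assumed definition: as availability, but scheduled downtime is excluded."""
    expected = h.total - h.unknown - h.scheduled_downtime
    return h.up / expected if expected > 0 else 0.0


def aggregate(items: Iterable[StatusHours]) -> StatusHours:
    """Sum the hours of several services, e.g. to move from service to site level."""
    items = list(items)
    return StatusHours(
        up=sum(i.up for i in items),
        down=sum(i.down for i in items),
        scheduled_downtime=sum(i.scheduled_downtime for i in items),
        unknown=sum(i.unknown for i in items),
    )


if __name__ == "__main__":
    # A 720-hour month with 12 h of failures and 24 h of scheduled maintenance.
    month = StatusHours(up=684, down=12, scheduled_downtime=24, unknown=0)
    print(round(availability(month), 3), round(reliability(month), 3))
```

Summing the hours before dividing is one simple way to obtain figures at a higher level of aggregation (site, NGI); other weighting schemes are possible and the one used in production is not described here.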
Integrating a new middleware stack into SAM requires the definition of a list of tests for the service types defined in GOCDB. The probes are subsequently integrated into the SAM release. Integration also requires configuration (e.g. probe parameters, execution frequency, timeout, etc.). Two EGI procedures were defined to structure the integration activities:
• Adding new probes to SAM [11] enables the probe developer to define the test naming, definition and implementation,
• Management of the EGI OPS Availability and Reliability Profile [12] defines how to manage the Availability and Reliability Profile.

2.3 Dashboard

The Operations Portal content is based on information which is retrieved from several different sources (e.g. GOCDB, SAM monitoring system, GGUS, web services, etc.) and organized around Lavoisier, a modular web service that provides a highly transparent integration of various information sources.

The Operations Portal combines and harmonizes different static and dynamic information. The information of the regional and central Operations Portal is integrated and made accessible through a single interface. The dashboard enables the operators to manage alarms coming from the SAM system. Operators use the dashboard to react on alarms, interact with sites, provide first-level support and perform oversight of alarms and tickets on national level. Figure 2 shows the architecture of the Operations Portal.

Figure 2: Architecture of the Operations Portal

The procedure "Setting a Nagios test status to operations" [13] defines how to add a new test to the tests which generate alarms in the Operations Portal. Once the test is added, alarms are displayed for the respective service endpoints.
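As a hedged illustration of how a set of operations tests can decide which monitoring results become dashboard alarms, the sketch below filters probe results against such a set. It is not the Operations Portal implementation: the class `ProbeResult`, the function name, the test names and the rule that only CRITICAL results raise an alarm are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ProbeResult:
    """One monitoring result for a service endpoint (hypothetical model)."""
    endpoint: str  # host name of the service endpoint
    test: str      # name of the test that produced the result
    status: str    # "OK", "WARNING", "CRITICAL" or "UNKNOWN"


def alarms_for_dashboard(results: Iterable[ProbeResult],
                         operations_tests: Set[str]) -> List[ProbeResult]:
    """Keep only failing results of tests that are in the operations set.

    Assumption: only CRITICAL results raise an alarm; results of tests that
    have not been given operations status are ignored.
    """
    return [r for r in results
            if r.test in operations_tests and r.status == "CRITICAL"]


if __name__ == "__main__":
    # Hypothetical test names; real names follow the procedures cited in the text.
    operations_tests = {"org.example.JobSubmit"}
    results = [
        ProbeResult("ce01.example.org", "org.example.JobSubmit", "CRITICAL"),
        ProbeResult("ce02.example.org", "org.example.JobSubmit", "OK"),
        ProbeResult("ce01.example.org", "org.example.NewProbe", "CRITICAL"),
    ]
    for alarm in alarms_for_dashboard(results, operations_tests):
        print(alarm.endpoint, alarm.test)
```

In this sketch, giving a test operations status amounts to adding its name to `operations_tests`; from then on its failures appear as alarms for the affected endpoints.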
Operators follow standard procedures for handling the alarms.

If the resources using new grid middleware are already integrated in all the other operational tools through existing information providers (e.g. registered in GOCDB, monitored by SAM, present in the information system), existing plugins can be reused and no additional integration effort is needed. For new information providers, plugins need to be developed and integrated.

2.4 Support

The user support infrastructure within EGI Helpdesk is distributed and consists of various topical and regional helpdesk systems that are linked together through a central integration platform, the Global Grid User Support (GGUS) [7] system. This central helpdesk enables formalized communication between the submitter of an incident record and all partners involved in user support by providing an interface to which all other tools can connect, and enables trouble tickets to be passed from one system to the other in a way that is transparent to the user. By exposing agreed interfaces, a hierarchy of helpdesk systems can be implemented, allowing transparent exchange of incident records across different resource infrastructures. It also enables the communication and ticket assignment between experts from different areas and of various levels (e.g. technology providers, new virtual research environments, operators).

The interlinking of all ticket systems in place throughout the infrastructures that are part of the federation enables the central management of incidents. A reference implementation was defined for the interface between ticket systems, and a template for a ticket layout exists to ensure the quality of service.

The regionalized implementation of GGUS is called xGUS. xGUS is a simplified instance that EGI can provide to resource infrastructure providers. These regional helpdesk instances are operated centrally but can be customized by the regions by defining specific support units. In xGUS the tickets can have a local or global scope. All answers to a global ticket are redirected to the central GGUS and are centrally accessible.

The new middleware can use multiple approaches for integration into the EGI helpdesk. One approach is to instantiate a specific support unit in GGUS, which will enable operators to assign tickets directly to the appropriate support structure. This approach is heavyweight, as the middleware team must follow all EGI procedures. When new middleware is deployed in the infrastructure, an agreement has to be made with that technology provider on how to integrate its support infrastructure within the EGI Helpdesk.

2.5 Accounting

The EGI accounting infrastructure is a distributed system with central aggregation, where sites and resource infrastructures publish their usage records centrally. The usage of the EGI computing resources by the users is accounted at service level and is aggregated into a federation-wide repository.
In order to calculate the overall usage statistics of the distributed resource infrastructure, the accounting data from the different sites is stored in a central repository. Usage records are transferred from services to the central repository by using the EGI message bus infrastructure. Summaries can be displayed in a central Accounting Portal at any level of the hierarchical tree structure which defines the EGI infrastructure (e.g. national infrastructure and resource centre level). The portal enables a wide range of filtering options.

The core EGI accounting infrastructure is based on APEL [4]. The EGI accounting infrastructure currently collects CPU accounting records from sites and/or grid infrastructures and summarizes the data by site, date, VO, and user. Publication of accounting records for storage, virtual machines and parallel jobs is being validated.

Secure Stomp Messenger (SSM) [23] is the messaging system used by APEL to securely transmit messages. It is written in Python and uses the STOMP protocol [24]. Other accounting systems have to interface to APEL to publish data into the EGI accounting repository through SSM. In addition, data must be formatted according to the OGF standard Usage Record scheme and an EGI-compatible profile. For a unified access to VO accounting information, sites can publish VO usage records centrally or into an alternative regional repository. Figure 3 shows a diagram of data exchange with the central repository.

Figure 3: Exchange and collection of accounting records through SSM
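The summarization of accounting data by site, date, VO and user can be pictured with a short sketch. It is a simplified illustration and not APEL code: the record fields, the class `UsageRecord` and both functions are hypothetical, and real records follow the OGF Usage Record scheme rather than this reduced shape.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class UsageRecord:
    """A reduced CPU usage record (hypothetical fields)."""
    site: str
    day: date
    vo: str
    user: str
    cpu_seconds: int


SummaryKey = Tuple[str, date, str, str]


def summarize(records: Iterable[UsageRecord]) -> Dict[SummaryKey, int]:
    """Total CPU seconds per (site, date, VO, user)."""
    totals: Dict[SummaryKey, int] = defaultdict(int)
    for r in records:
        totals[(r.site, r.day, r.vo, r.user)] += r.cpu_seconds
    return dict(totals)


def rollup_by_site(summary: Dict[SummaryKey, int]) -> Dict[str, int]:
    """Aggregate a summary one level up the hierarchy, to site totals."""
    totals: Dict[str, int] = defaultdict(int)
    for (site, _day, _vo, _user), cpu in summary.items():
        totals[site] += cpu
    return dict(totals)


if __name__ == "__main__":
    d = date(2013, 3, 17)
    records = [
        UsageRecord("SITE-A", d, "vo.example", "alice", 3600),
        UsageRecord("SITE-A", d, "vo.example", "alice", 1800),
        UsageRecord("SITE-A", d, "vo.example", "bob", 600),
        UsageRecord("SITE-B", d, "vo.example", "alice", 7200),
    ]
    summary = summarize(records)
    print(summary[("SITE-A", d, "vo.example", "alice")])
    print(rollup_by_site(summary))
```

Keeping the summary keyed by all four dimensions lets a portal-style view regroup the same data by any of them, which is the kind of filtering the text attributes to the Accounting Portal.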
3. Software integration

The EGI infrastructure currently consists of resources that are accessible through the following middlewares: ARC, Desktop Grids, gLite, Globus Toolkit, QCG and UNICORE. Predecessors of the EGI infrastructure, DataGrid and later EGEE, were heavily gLite-oriented, so gLite middleware is fully integrated with all operational tools. However, with the increased integration effort other middleware stacks are reaching a similar integration status. The following subsections provide an overview of the integration status and the actions completed for each grid middleware. Additional information can be found in [2].

3.1 EMI middlewares

The European Middleware Initiative (EMI) [9] is a collaboration of the major European middleware providers: ARC, dCache, gLite and UNICORE. The main goal of the project is to improve the existing middleware and harmonize it, thus achieving a common framework with simpler and easier to use components. Such middleware at the same time reduces interoperability problems and simplifies deployment and maintenance for the distributed computing infrastructure providers.

ARC and gLite are fully integrated into all EGI operational tools. In order to speed up the integration of UNICORE resources, a task force was created within EGI. Thirteen service types for UNICORE services were added to the GOCDB. For monitoring purposes, Nagios probes were provided, which are maintained by the UNICORE project. In total there are ten tests fully integrated into SAM, out of which eight are integrated into the Operations Portal. Accounting integration is possible with the publishers released with the EMI version 3.0 release, which supports SSM version 2.0.

3.2 Globus Toolkit

The Globus Toolkit is an open source software toolkit used for building grids. It is being developed by the Globus Alliance and the Initiative for Globus in Europe (IGE) [17]. IGE is especially focusing on providing packaged versions of various Globus Toolkit components (e.g. MyProxy, GRAM, GridFTP).

In order to speed up the integration of Globus resources, a task force was created within EGI. Currently there are 5 Globus service types in the GOCDB. Standard Nagios probes and Globus Toolkit probes provided by the Croatian NGI are used for monitoring. Globus Toolkit resources are currently monitored with 8 different tests, all of which are integrated into the Operations Portal.

Globus Toolkit provides its own accounting system (GridSAFE). After the IGE 3.0 release, the integration of GridSAFE with APEL via the SSM client was successfully tested.
3.3 Desktop Grids

The International Desktop Grid Federation (IDGF) [15] develops middleware that consolidates the results achieved in the EDGeS project concerning the extension of Service Grids with Desktop Grids (DGs) in order to support EGI and NGI user communities that are heavy users and require an extremely large number of CPUs and cores.

GOCDB was extended with three new service types representing desktop grid resources. A probe for monitoring Desktop Grids services was integrated into SAM. The new follow-up project International Desktop Grid Federation - Support Project (IDGF-SP) [16] aims to continue work on integrating into the EGI dashboard, support and accounting interfaces.

3.4 QCG

The Multiscale APPlications on EuRopean e-infrastructures (MAPPER) project [19] aims to deploy a computational science environment for distributed multiscale computing, on and across European e-infrastructures. In order to further the project's aim, MAPPER initiated collaboration with EGI-InSPIRE and PRACE. The main goal was to integrate two applications that perform distributed multiscale computing by using a set of core middleware services from the QosCosGrid middleware stack.

For the needs of QCG middleware 3 new service types were added to the GOCDB. The QCG team provided a set of probes for monitoring QCG services. Probes were integrated into SAM and 5 new tests were added both to SAM and to the Operations Portal. The QCG team provided an accounting solution compatible with the APEL system. Deployment in production is possible with the publishers released with the EMI version 3.0 release, which supports SSM version 2.0.

4. DCIs integration

EGI is currently investigating the operations integration with two different infrastructures: PRACE and EUDAT. PRACE [20] is a pan-European distributed supercomputing infrastructure, providing access to PRACE high performance computing resources. EUDAT [26] is a collaborative distributed infrastructure that enables users and communities to seamlessly share and process data.

Requirements were collected from MAPPER users who need to use both EGI and PRACE resources. The main requirements are:
• need for a streamlined access to e-Infrastructure services and resources through common mechanisms for resource allocation, reducing the bureaucratic overhead,
• availability of monitoring information across EGI and PRACE,
• availability of accounting information through a single portal,
• user authentication through X.509 certificates,
• harmonized user support in case of problems concerning different infrastructures at the same time, i.e. a single point of contact for both EGI and PRACE infrastructures.

A feasibility study of the integration of helpdesk and accounting infrastructures was identified as the first step. The user requirements mentioned above need common policies for secure access to confidential information. EGI and PRACE already collaborate on security matters by sharing security policies and communication channels to support integrated security incident response activities.

5. Conclusion

The EGI operations portfolio consists of services that are crucial for sustainable and efficient management of the EGI infrastructure. These operational services are designed in a modular fashion that enables integration of new grid and other software into EGI. Several procedures and policies are defined to ensure a smooth integration process. This paper provides an overview of the operational interfaces and the current status of middleware integration.

Functionalities and requirements of the different operational interfaces will evolve over time, based on the input from resource providers and research infrastructures. Integration with other DCIs will also bring new requirements for the extension of the operational interfaces currently deployed in EGI.

References

[1] S. Burke, T. Ferrari, M. Krakowian, P. Solagna, EGI-InSPIRE D4.6 EGI Operations architecture, [https://documents.egi.eu/document/1308]
[2] E. Imamagic, M. Lechner, EGI-InSPIRE MS421 Integrating Resources into the EGI Production Infrastructure, [https://documents.egi.eu/document/1309]
[3] Advanced Resource Connector (ARC), [http://www.nordugrid.org/arc/]
[4] APEL, [https://wiki.egi.eu/wiki/APEL]
[5] Grid Configuration Database (GOCDB) of the EGI, [https://wiki.egi.eu/wiki/GOCDB]
[6] GOCDB Documentation - Adding new service types, [https://wiki.egi.eu/wiki/GOCDB/Input_System_User_Documentation#Adding_new_services_types]
[7] Global Grid User Support (GGUS), [https://wiki.egi.eu/wiki/GGUS]
[8] European Grid Infrastructure (EGI), [http://www.egi.eu]
[9] European Middleware Initiative (EMI), [http://www.eu-emi.eu]
[10] EGI-InSPIRE paper, [http://go.egi.eu/pdnon]
[11] EGI Procedure - Adding new probes to SAM, [https://wiki.egi.eu/wiki/PROC07]
[12] EGI Procedure - Management of the EGI OPS Availability and Reliability Profile, [https://wiki.egi.eu/wiki/PROC08]
[13] EGI Procedure - Setting Nagios test status to operations, [https://wiki.egi.eu/wiki/PROC06]
[14] Globus Toolkit, [http://www.globus.org]
[15] IDGF, [http://www.desktopgridfederation.org]
[16] IDGF-SP, [http://www.idgf-sp.eu]
[17] Initiative for Globus in Europe (IGE), [http://www.ige-project.eu]
[18] D. Lecarpentier, Towards a collaborative data infrastructure for science, ISGTW, 2012, [http://www.isgtw.org/feature/towards-collaborative-data-infrastructure-science]
[19] MAPPER, [http://www.mapper-project.eu/]
[20] PRACE, [http://www.prace-ri.eu/]
[21] QosCosGrid (QCG), [http://www.qoscosgrid.org]
[22] Service Availability Monitoring (SAM), [https://wiki.egi.eu/wiki/SAM]
[23] Secure Stomp Messenger (SSM), [https://wiki.egi.eu/wiki/SSM]
[24] Stomp Protocol Specification, [http://stomp.github.com/stomp-specification-1.1.html]
[25] UNICORE, [http://www.unicore.eu/]
[26] EUDAT, [http://www.eudat.eu/]