Enabling Grids for E-sciencE

Overview of the EGEE project and the gLite middleware

www.eu-egee.org

EGEE-III INFSO-RI-222667

Outline

• What is EGEE?
  – The project
  – The infrastructure

• gLite middleware

• EGEE applications

• Sources of further information

Defining the Grid

• A Grid is the combination of networked resources and the corresponding middleware, which provides services for the user.

Providing a Production Grid Infrastructure for Collaborative Science

The EGEE Project

• Aim of EGEE: “to establish a seamless European Grid infrastructure for the support of the European Research Area (ERA)”

• EGEE
  – 1 April 2004 – 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids

• EGEE-II
  – 1 April 2006 – 30 April 2008
  – Expanded consortium

• EGEE-III
  – 1 May 2008 – 30 April 2010
  – Transition to sustainable model


EGEE working with related infrastructure projects

GIN

What is happening now?

• Real Time Monitor
  – Java tool
  – Displays jobs running (submitted through RBs)
  – Shows jobs moving around world map in real time, along with changes in status
  – http://gridportal.hep.ph.ic.ac.uk/rtm/ (snapshot 16 January 2007)

EGEE Infrastructures

• Production service
  – Scaling up the infrastructure with resource centres around the globe
  – Stable, well-supported infrastructure, running only well-tested and reliable middleware

• Pre-production service
  – Run in parallel with the production service (restricted number of sites)
  – First deployment of new versions of the gLite middleware
  – Test-bed for applications and other external functionality

• T-Infrastructure (Training & Education)
  – Complete suite of Grid elements and applications (Testbed, CA, VO, monitoring, support, …)
  – 20 sites on 3 continents
  – Everyone can register and use GILDA for training and testing

NA3 activity: User training and induction

• Expand portfolio of training materials & courses
• Train a wide variety of EGEE users (internal/external)
• Develop effective mechanisms for training end-users of the EGEE infrastructure
• Collaborate in cross-activity initiatives
  – ICEAGE Project Digital Library
  – http://library.iceage-eu.org/
  – Videos, MP3 talks
• http://www.egee.nesc.ac.uk/
  – Training events
  – Training material repository
• http://egee.lib.ed.ac.uk/
  – EGEE Digital Library
  – Repository of training materials

[Figure: pie chart by activity: SA3 9%, JRA1 5%, NA1 2%, NA2 5%, NA3 8%, SA2 2%, NA4 19%, SA1 49%, NA5 1%. NA3: 29 active partners, ~ 29 FTEs, 89 individuals, 6 federations]

NA4 activity: Application identification and support

• Application Identification and Support (NA4)
  – 25 countries, 40 partners, 280+ participants, 1000s of users
• Support the large and diverse EGEE user community:
  – Promote dialog: Users’ Forums & EGEE Conferences
  – Technical Aid: Porting support, procedural issues
  – Liaison: Software and operational requirements
• Main activities:
  – 5 application clusters: HEP, Life sciences, Astronomy & astrophysics, Earth science, Computational chemistry, Fusion, Grid observatory
  – Support:
    ▪ Application porting support: www.lpds.sztaki.hu/gasuc
    ▪ VO support
    ▪ Direct user support: www.ggus.org
    ▪ Regional support
• http://egeena4.lal.in2p3.fr

EGEE Infrastructure

[Figure: map of countries participating in EGEE; charts of the number of cores (0 to 80000) and the number of sites (0 to 300), April 2004 to April 2008]

• > 200 sites in 40 countries
• ~ 38 000 CPUs
• ~ 5 PB storage
• 98k jobs/day
• > 200 Virtual Organizations

⇨ The world’s largest multi-disciplinary Grid

Resource management: structure

• Operations Coordination Centre (OCC)
  – Management, oversight of all operational and support activities
• Regional Operations Centres (ROC)
  – Providing the core of the support infrastructure, each supporting a number of resource centres within its region
  – Grid Operator on Duty
• Resource centres
  – Providing resources (computing, storage, network, etc.)
• Grid User Support (GGUS)
  – At FZK, coordination and management of user support, single point of contact for users

VO concept

• gLite middleware runs on each shared resource to provide
  – Data services
  – Computation services
  – Security service

• Resources and users form Virtual organisations: basis for collaboration

• Distributed services (both people and middleware) enable the grid


Grid middleware

• The Grid relies on advanced software, called middleware, which interfaces between resources and the applications

• The Grid middleware:
  – Basic services
    ▪ Secure and effective access to resources
  – High level services
    ▪ Optimal use of resources
    ▪ Authentication to the different sites that are used
    ▪ Job execution & monitoring of progress
    ▪ Problem recovery
    ▪ Transfer of results back to the user

Grid Middleware

• When using a PC or workstation you
  – Login with a username and password (“Authentication”)
  – Use rights given to you (“Authorisation”)
  – Run jobs
  – Manage files: create them, read/write, list directories
• Components are linked by a bus
• Operating system
• One admin. domain

• When using a Grid you
  – Login with digital credentials, single sign-on (“Authentication”)
  – Use rights given to you (“Authorisation”)
  – Run jobs
  – Manage files: create them, read/write, list directories
• Services are linked by the Internet
• Middleware
• Many admin. domains
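The distinction between authentication and authorisation on a Grid can be sketched in a few lines of Python. This is a toy model, not gLite code: the `ProxyCredential` class, the function names, the 12-hour lifetime and the VO names "biomed" and "atlas" are all assumptions made for illustration. Real middleware verifies cryptographic credentials, whereas this sketch only models an expiry check and a VO membership check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Set


@dataclass(frozen=True)
class ProxyCredential:
    """Hypothetical stand-in for the short-lived credential created at sign-on."""
    subject: str          # who the user is
    vo: str               # Virtual Organisation the user signed on with
    expires_at: datetime  # the credential is rejected from this moment on


def sign_on(subject: str, vo: str, lifetime_hours: int = 12) -> ProxyCredential:
    """Single sign-on: one credential is created and then shown to every site."""
    expiry = datetime.now(timezone.utc) + timedelta(hours=lifetime_hours)
    return ProxyCredential(subject, vo, expiry)


def authenticate(cred: ProxyCredential, now: Optional[datetime] = None) -> bool:
    """Authentication (toy version): is the credential still within its lifetime?"""
    current = now or datetime.now(timezone.utc)
    return current < cred.expires_at


def authorise(cred: ProxyCredential, supported_vos: Set[str]) -> bool:
    """Authorisation: does this site grant rights to the user's VO?"""
    return cred.vo in supported_vos


def may_run_job(cred: ProxyCredential, supported_vos: Set[str],
                now: Optional[datetime] = None) -> bool:
    """A site accepts a job only if both checks pass."""
    return authenticate(cred, now) and authorise(cred, supported_vos)


# One sign-on, several independently administered sites (names are made up).
cred = sign_on("Alice", "biomed")
sites = {"site-a": {"biomed", "atlas"}, "site-b": {"atlas"}}
for name, vos in sites.items():
    print(name, may_run_job(cred, vos))
```

The same credential is presented to every site, which is what makes the sign-on "single", while each administrative domain keeps its own decision about which VOs it supports.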

EGEE Middleware: gLite

• gLite 3.0, gLite 3.1 ⇨ Merger of LCG 2.7 and gLite 1.5
  – Exploit experience and existing components from VDT (Condor, Globus), EDG/LCG, and others
  – Develop a lightweight stack of generic middleware useful to EGEE applications (HEP and Biomedics are pilot applications).
    ▪ Should eventually deploy dynamically (e.g. as a Globus job)
    ▪ Pluggable components: cater for different implementations
  – Focus is on providing a stable and usable infrastructure

Basic gLite use case: Job submission

[Figure: job submission flow between the User Interface, Resource Broker, Information System, File and Replica Catalog, Logging and Bookkeeping, VO Management Service (DB of VO users), and the Computing Element and Storage Element at Site X. Arrow labels: create proxy; submit job (executable + small inputs); query; publish state; input file(s); output file(s); register file; job status; retrieve status & (small) output files]

Main components

User Interface (UI): The place where users log on to the Grid

Resource Broker (RB) / Workload Management System (WMS): Matches the user requirements with the available resources on the Grid

Information System: Characteristics and status of CE and SE

File and replica catalog: Location of grid files and grid file replicas

Logging and Bookkeeping (LB): Log information of jobs

Computing Element (CE): A batch queue on a site’s computers where the user’s job is executed

Storage Element (SE): Provides (large-scale) storage for files

All of these components are built upon authorisation, authentication and security.
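How a broker might match user requirements against the resources published by the Information System can be sketched as a filter followed by a ranking. The sketch below is a toy model under stated assumptions, not the gLite matchmaking algorithm: the record fields, the ranking by free slots, the host names and the VO names are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class ComputingElement:
    """Hypothetical record a CE might publish to the Information System."""
    name: str
    supported_vos: FrozenSet[str]
    free_slots: int
    max_wall_minutes: int


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical user requirements attached to a job."""
    vo: str
    wall_minutes: int


def match(job: JobRequest,
          published: Iterable[ComputingElement]) -> Optional[ComputingElement]:
    """Keep the CEs that satisfy the requirements, then pick the least busy one."""
    candidates = [
        ce for ce in published
        if job.vo in ce.supported_vos
        and ce.free_slots > 0
        and ce.max_wall_minutes >= job.wall_minutes
    ]
    # Ranking rule chosen for illustration only: prefer the most free slots.
    return max(candidates, key=lambda ce: ce.free_slots, default=None)


# Made-up snapshot of what the Information System could report.
info_system = [
    ComputingElement("ce.site-x.example", frozenset({"biomed"}), 4, 720),
    ComputingElement("ce.site-y.example", frozenset({"biomed", "fusion"}), 10, 120),
    ComputingElement("ce.site-z.example", frozenset({"fusion"}), 50, 1440),
]

chosen = match(JobRequest(vo="biomed", wall_minutes=300), info_system)
print(chosen.name if chosen else "no matching resource")  # prints ce.site-x.example
```

Here site-y is excluded because its wall-time limit is too short and site-z because it does not support the job's VO, so the job goes to site-x even though it has the fewest free slots.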

Who provides the resources?!

Service                   | Provider                             | Note
User interface            | User / institute / VO                | Computer with client SW
Resource Broker (WMS)     | VOs - EGEE does not fund RBs         |
Information System        | Grid operations - EGEE funded effort |
File and replica catalog  | VOs - EGEE does not fund catalogs    |
Logging and Bookkeeping   | VOs - EGEE does not fund LB servers  |
Computing Element (CE)    | VOs - EGEE does not fund CEs         | VOs provide resources to match average need
Storage Element (SE)      | VOs - EGEE does not fund SEs         | VOs provide resources to match average need
External services         | User / institute / VO                | To extend the capabilities of the core infrastructure

Empowering VOs

Where computer science meets the application communities!

• RESPECT: Recommended External Software Packages for Egee CommuniTies
  – Current RESPECT tools:
    ▪ GridWay
    ▪ P-GRADE Portal
    ▪ GANGA
    ▪ GRelC
    ▪ I2glogin
  – http://egeena4.lal.in2p3.fr/ → “Grid software” menu

• Production infrastructure contains these services
  – Basic services: must be complete and robust; should not assume the use of higher-level Grid services
  – High level services: help the users build their computing infrastructure but should not be mandatory

[Figure: software layers: Application; Application toolkits; Command line & APIs; Higher-level gLite services (WMS, …); Basic gLite services (CE, SE, info, security)]


EGEE Applications

• >270 VOs from several scientific domains
  – Astronomy & Astrophysics
  – Civil Protection
  – Computational Chemistry
  – Comp. Fluid Dynamics
  – Computer Science/Tools
  – Condensed Matter Physics
  – Earth Sciences
  – Fusion
  – High Energy Physics
  – Life Sciences
• Further applications under evaluation

Applications have moved from testing to routine and daily usage

~80-95% efficiency

Application families

• Simulation
  – Large number of similar, independent jobs: parameter study
• Bulk Processing
  – Widely-distributed input data, sophisticated data management
• Workflow
  – Complex dependencies between individual tasks
• Legacy Applications
  – Licenses: control access to software on the grid
  – No recompilation ⇒ no direct use of grid APIs
• Parallel Jobs
  – Many CPUs needed simultaneously, use of MPI libraries
  – Limited support in gLite: MPI configuration is not uniform
• Responsive Apps.
  – Short response time
  – No real support in gLite → Interactive Grid FP6 project
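The "Simulation" family (a large number of similar, independent jobs forming a parameter study) can be illustrated by expanding a parameter grid into one job description per combination. This is a minimal sketch: the function name, the dictionary layout, the `simulate.sh` executable and the parameter names are assumptions for illustration, and the descriptions would still have to be translated into whatever job description format the middleware expects.

```python
from itertools import product
from typing import Dict, List, Sequence


def parameter_study(executable: str,
                    grid: Dict[str, Sequence]) -> List[dict]:
    """Expand a parameter grid into independent job descriptions.

    Each combination of parameter values becomes one job; no job depends
    on another, so they can all be submitted and run in any order.
    """
    names = sorted(grid)  # fixed order so job numbering is reproducible
    jobs = []
    for index, values in enumerate(product(*(grid[n] for n in names))):
        arguments = " ".join(f"--{n}={v}" for n, v in zip(names, values))
        jobs.append({
            "id": index,
            "executable": executable,
            "arguments": arguments,
            "output": f"result_{index:04d}.out",  # one output file per job
        })
    return jobs


# Hypothetical study: 3 temperatures x 2 pressures = 6 independent jobs.
jobs = parameter_study("simulate.sh",
                       {"temperature": [280, 300, 320], "pressure": [1, 2]})
for job in jobs:
    print(job["id"], job["executable"], job["arguments"], "->", job["output"])
```

Because the jobs share nothing but the executable, a study like this scales with the number of available CPUs, which is why it maps so naturally onto a Grid.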

Further information, references

• EGEE
  – http://www.eu-egee.org/
• gLite middleware
  – http://www.glite.org
• gLite manuals, documentation
  – http://glite.web.cern.ch/glite/documentation/ (gLite user guide)
• Recommended External Software Packages for Egee CommuniTies (RESPECT)
  – http://egeena4.lal.in2p3.fr/

• Description of work of EGEE-III – https://edms. . ch/document/886385/4

Summary

• EGEE is running the largest multi-VO grid in the world!
  – Creating the “grid layer” in e-Infrastructure for research, public service and industry
• Key concepts for EGEE
  – Sustainability: planning for the long-term
  – Production quality
  – User support
• EGEE’s middleware: gLite. Current version 3.1
  – Basic middleware services
  – High level middleware services
• External software to foster uptake of technology
