
Worker Node Management: the VO perspective

Mark Santcroos, Dennis van Dok

Introduction

• e-BioScience group – Bioinformatics Laboratory – Clinical Epidemiology, Biostatistics and Bioinformatics – Academic Medical Centre, Amsterdam

• Intermediary between medical researchers and the Dutch NGI

• Support a wide range of applications in Next Generation Sequencing and Medical Imaging

Worker Node Software

• Running on 15 sites in the Netherlands

• Base worker node installation (glite-WN)

• Proof of Concept (PoC) software installation, inherited from the Virtual Laboratory for e-Science project (ended 2009)

Perspective

• Dennis van Dok is part of the team that developed and managed the PoC environment at BiG Grid

• Mark Santcroos is a VO manager for the vlemed VO

Job / Application Scenarios

• Use installed software

• Application in Job Sandbox

• Fetch Application using wrapper

• Upgrade versions in PoC distribution

• Lobby for new versions with site admins

Limitations

• Sandbox solution has size limits

• Sandbox and wrapper have network overhead

• Installed version out of date / too new

• Making end users responsible for maintaining applications is not always desirable

• Site admins have to be in the loop

High Level Goal

• Have a flexible solution for making software available on the grid to end users that is also manageable from a VO admin perspective.

Packaging Requirements

• Automatic dependency resolution

• Supported on

• Tools for install/update/remove/status

• Running entirely in userspace, unprivileged

• Multiple installed versions of the same software

Unsuitable candidates

• rpm//

• Arch User Repository

• pacman

• …

• Reasons: too OS specific, difficult to manage unprivileged

pkgsrc

• Originating in NetBSD

• Supported on Linux

• Self-contained

• Actively maintained

• Can be used as a non-privileged user

• Large collection of applications already packaged

• Can make use of system-provided dependencies

• Allows maintaining a local set of packages

• Could add packages to the main distribution

• Supports binary and source packages

Creating a package

DISTNAME=	vlet-1.3.2
CATEGORIES=	local
MASTER_SITES=	http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
EXTRACT_SUFX=	.zip

MAINTAINER=	[email protected]
HOMEPAGE=	http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
COMMENT=	This is the VL-e Toolkit
LICENSE=	apache-2.0
NO_CONFIGURE=
NO_BUILD=	yes

PKG_DESTDIR_SUPPORT=	user-destdir

INSTALLATION_DIRS=	bin lib

post-extract:
	${CP} ${FILESDIR}/Makefile ${WRKSRC}/Makefile

.include "../../mk/bsd.pkg.mk"

Package Tree Management

• update-tree.sh
  – Pull upstream pkgsrc changes
  – Create tarball
  – Put on website
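
The three steps of update-tree.sh could look roughly like the sketch below. It is not the actual script: the function name, the paths, the `RUN` dry-run switch, and the assumption that the upstream tree is a CVS checkout are all illustrative.

```sh
#!/bin/sh
# Hypothetical sketch of the update-tree.sh steps, not the real script.
# RUN defaults to "echo" so the commands are only printed; set RUN= to run them.
RUN=${RUN-echo}

in_dir() {
    # Run a command inside directory $1 without changing the caller's cwd.
    ( cd "$1" && shift && "$@" )
}

update_tree() {
    tree=$1      # local pkgsrc checkout (assumed to be a CVS working copy)
    tarball=$2   # tarball to create
    webroot=$3   # directory served by the web server (assumption)

    # 1. Pull upstream pkgsrc changes.
    $RUN in_dir "$tree" cvs -q update -dP
    # 2. Create a tarball of the tree.
    $RUN tar -czf "$tarball" -C "$(dirname "$tree")" "$(basename "$tree")"
    # 3. Put the tarball on the website.
    $RUN cp "$tarball" "$webroot/"
}
```

Calling `update_tree /src/pkgsrc /tmp/pkgsrc.tar.gz /var/www/pkgsrc` in the default dry-run mode only prints the three commands, which makes the plan easy to review before it is run for real.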

Implementation Principles

• $VO_[VONAME]_SW_DIR is a directory shared between all worker nodes on a site

• Run with a Software (VO) Manager proxy

• Install packages per site / cluster / CE
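
As a minimal sketch of how a job might locate the shared software area, the functions below derive the variable name from the VO name. The helper names are hypothetical, and the mapping (upper-case the VO name, replace '.' and '-' with '_') is an assumption about how sites define the variable.

```sh
#!/bin/sh
# Hypothetical helpers; the name mapping below is an assumption.

vo_sw_dir_var() {
    # Print the environment variable name for VO "$1",
    # e.g. vlemed -> VO_VLEMED_SW_DIR.
    vo=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')
    printf 'VO_%s_SW_DIR\n' "$vo"
}

vo_sw_dir() {
    # Print the directory stored in that variable; fail if it is unset or empty.
    var=$(vo_sw_dir_var "$1")
    case $var in
        *[!A-Z0-9_]*) echo "error: unexpected VO name: $1" >&2; return 1 ;;
    esac
    eval "dir=\${$var:-}"
    [ -n "$dir" ] || { echo "error: $var is not set" >&2; return 1; }
    printf '%s\n' "$dir"
}
```

A management job could then start with something like `cd "$(vo_sw_dir vlemed)"` and abort early on sites where the variable is not defined.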

Architecture

[Figure: architecture diagram with the labels Server (UI), Management Jobs, Worker Nodes, Mount, Shared Storage Area]

Managing packages

• site-pkgtool.sh
  – Program to manage packages centrally
  – Initiates grid jobs

• Install, Remove, Update

• Init, Reinit, Check, Dump, Info, Version

Script on the worker node

• pkgsrc-cmd.sh
  – Wrapper program that runs on the worker node

• Runs as a grid job

Information Management

• list-installed-packages.sh
  – Display information about installed packages for sites

• get-site-status.sh
  – Gather information from all supported sites

• verify-package.sh
  – Check if a certain package is available on a site

• get-tags.sh
  – Get all the package tags for the configured sites

Installing a package

• Check if distribution is fresh

• Extract tree in scratch space

• Build package and dependencies

• Install package in shared software area

• Install modulefile
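
The steps above can be sketched as a job payload. This is an illustration only: the function names, the `RUN` dry-run switch, the stamp-file freshness check, the `pkgsrc` directory inside the tarball, and the `$sw_dir/pkg` prefix for the bootstrapped bmake are all assumptions, not the project's actual layout.

```sh
#!/bin/sh
# Hypothetical sketch of the install steps; paths and names are assumptions.
# RUN defaults to "echo" so the commands are only printed; set RUN= to run them.
RUN=${RUN-echo}

in_dir() {
    # Run a command inside directory $1 without changing the caller's cwd.
    ( cd "$1" && shift && "$@" )
}

tree_is_fresh() {
    # Assumed freshness check: the local stamp file must be identical
    # to the stamp file published next to the tarball.
    cmp -s "$1" "$2"
}

install_package() {
    pkgpath=$1      # category/package inside the tree, e.g. local/vlet
    sw_dir=$2       # shared software area
    scratch=$3      # node-local scratch space
    tarball=$4      # package tree tarball, fetched beforehand
    modulefile=$5   # modulefile to publish for this package

    # Extract the tree in scratch space.
    $RUN mkdir -p "$scratch"
    $RUN tar -xzf "$tarball" -C "$scratch"
    # Build the package and its dependencies, then install into the
    # prefix that the bootstrapped bmake in the shared area was built for.
    $RUN in_dir "$scratch/pkgsrc/$pkgpath" "$sw_dir/pkg/bin/bmake" install
    # Install the modulefile next to the packages.
    $RUN mkdir -p "$sw_dir/modulefiles"
    $RUN cp "$modulefile" "$sw_dir/modulefiles/"
}
```

Building in node-local scratch space and writing only the final result to the shared area keeps build traffic off the shared file system.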

Environment Modules

• “The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles.”

• Select versions

• Set up environment

• Integrates with system-provided setup
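
To illustrate how version selection could work, the sketch below writes a small modulefile per package version. The function name, directory layout, and the choice of variables to set are assumptions; a real package may need more than PATH and LD_LIBRARY_PATH.

```sh
#!/bin/sh
# Hypothetical helper that writes a modulefile to <moddir>/<name>/<version>.

write_modulefile() {
    name=$1      # package name, e.g. vlet
    version=$2   # package version, e.g. 1.3.2
    prefix=$3    # installation prefix of the package (assumption)
    moddir=$4    # root directory of the modulefiles

    mkdir -p "$moddir/$name" || return 1
    cat > "$moddir/$name/$version" <<EOF
#%Module1.0
## Illustrative modulefile for $name $version
module-whatis "$name $version"
conflict $name
prepend-path PATH $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib
EOF
}
```

After `write_modulefile vlet 1.3.2 /path/to/prefix /path/to/modulefiles`, a job would typically run `module use /path/to/modulefiles` followed by `module load vlet/1.3.2`. The `conflict` line keeps two versions of the same package from being loaded at once.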

Tags

• Software Tags in Information System (BDII)

• Publish installed software versions per CE

• Used for resource selection by adding it to the “Requirements” of a JDL

• Use lcg-ManageVOTag tool to publish tag

• Structure of tags is VO-${vo}_SW_${package}

Practical issues
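
To make the tag convention concrete, and to show why matching is limited to exact strings, here is a minimal sketch that builds a tag and the corresponding JDL line. The helper names are hypothetical, and passing the version as part of the package string is an assumption.

```sh
#!/bin/sh
# Hypothetical helpers for software tags and JDL requirements.

sw_tag() {
    # Build a tag following the structure VO-${vo}_SW_${package}.
    printf 'VO-%s_SW_%s\n' "$1" "$2"
}

jdl_requirement() {
    # Emit a JDL Requirements line that selects CEs publishing the tag.
    # Member() tests for an exact string in the published list, so
    # "any version newer than X" cannot be expressed this way.
    printf 'Requirements = Member("%s", other.GlueHostApplicationSoftwareRunTimeEnvironment);\n' \
        "$(sw_tag "$1" "$2")"
}
```

For example, `jdl_requirement vlemed vlet-1.3.2` prints a line that can be pasted into a JDL file; a job that would also accept vlet-1.3.3 needs a second `Member(...)` clause joined with `||`.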

• Tags are not omnipresent

• Shared area can become a bottleneck

• No intelligent matching on tags

Conclusions

• Flexible software management system

• Relieves the user of the maintenance burden

• Creating packages is still labor-intensive work

Discussion

• One size fits all? (Did we reinvent the wheel?)

• Connect to EGI AppDB?

• EMI Community Repositories?

• Usable for data distribution?

• Other mechanism for matching?

Links

• pkgsrc – http://www.netbsd.org/docs/software/packages.html

• Modules – http://modules.sourceforge.net/

• BiG Grid – http://www.biggrid.nl/

• Bioinformatics Laboratory – http://www.bioinformaticslaboratory.nl/

• Project Code – http://dvandok.github.com/userspace-package-management/

Acknowledgements

• AMC Bioinformatics Laboratory
  – Prof. dr. Antoine van Kampen
  – Dr. Silvia Delgado Olabarriaga
  – Barbera van Schaik

• BiG Grid / Nikhef
  – Jan Just Keijser

Thanks!