Worker Node Software Management: the VO perspective
Mark Santcroos, Dennis van Dok
Introduction
• e-BioScience group – Bioinformatics Laboratory – Clinical Epidemiology, Biostatistics and Bioinformatics – Academic Medical Centre, Amsterdam
• Intermediate between medical researchers and Dutch NGI
• Support a wide range of applications in Next Generation Sequencing and Medical Imaging
Worker Node Software
• Running on 15 sites in the Netherlands
• Base worker node installation (glite-WN)
• Proof of Concept (PoC) software installation, a heritage of the Virtual Laboratory for e-Science (ended 2009)
Perspective
• Dennis van Dok is part of the team that developed and managed the PoC environment at BiG Grid
• Mark Santcroos is a VO manager for the vlemed VO
Job / Application Scenarios
• Use installed software
• Application in Job Sandbox
• Fetch Application using wrapper
• Upgrade versions in PoC distribution
• Lobby for new versions with site admins
Limitations
• Sandbox solution has size limits
• Sandbox and wrapper have network overhead
• Installed version may be out of date or too new
• Making end users responsible for maintaining applications is not always desirable
• Site admins have to be in the loop
High Level Goal
• A flexible way to make software available to end users on the grid that is also manageable from a VO admin perspective
Packaging Requirements
• Automatic dependency resolution
• Supported on Linux
• Tools for install/update/remove/status
• Runs entirely in user space, unprivileged
• Multiple installed versions of the same software
Unsuitable candidates
• rpm/yum
• deb/apt
• portage
• Arch User Repository
• pacman
• …
• Reasons: too OS-specific, difficult to manage unprivileged
Pkgsrc
• Originated in NetBSD
• Supported on Linux
• Self-contained
• Actively maintained
• Can be used as a non-privileged user
• Large collection of applications already packaged
• Can make use of system-provided dependencies
• Allows maintaining a local set of packages
• Could add packages to the main distribution
• Supports binary and source packages
Creating a package
DISTNAME= vlet-1.3.2
CATEGORIES= local
MASTER_SITES= http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
EXTRACT_SUFX= .zip
MAINTAINER= [email protected]
HOMEPAGE= http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
COMMENT= This is the VL-e Toolkit
LICENSE= apache-2.0
NO_CONFIGURE= yes
NO_BUILD= yes
PKG_DESTDIR_SUPPORT= user-destdir
INSTALLATION_DIRS= bin lib
post-extract:
	${CP} ${FILESDIR}/Makefile ${WRKSRC}/Makefile
.include "../../mk/bsd.pkg.mk"
Package Tree Management
• update-tree.sh
  – Pull upstream pkgsrc changes
  – Create tarball
  – Put on website
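The three steps of the tree update can be sketched as a small shell function. This is a minimal sketch and not the project's actual update-tree.sh: the function name, its arguments, the tarball naming and the PULL_CMD variable are all hypothetical, and the default pull command assumes the local tree is a CVS checkout.

```shell
#!/bin/sh
# Hypothetical sketch of a tree update script; names and paths are assumptions.

# update_tree TREE_DIR WEB_DIR
#   TREE_DIR: local checkout of the pkgsrc tree (including local packages)
#   WEB_DIR:  directory that is served by the web server
# PULL_CMD overrides the command used to pull upstream changes.
update_tree() {
    tree_dir=$1
    web_dir=$2
    pull_cmd=${PULL_CMD:-"cvs -q update -dP"}
    tarball="pkgsrc-$(date +%Y%m%d).tar.gz"

    # 1. Pull upstream pkgsrc changes into the local tree.
    (cd "$tree_dir" && $pull_cmd) || return 1

    # 2. Create a tarball, rooted at the tree's own directory name.
    tmp=$(mktemp -d) || return 1
    tar -czf "$tmp/$tarball" \
        -C "$(dirname "$tree_dir")" "$(basename "$tree_dir")" || return 1

    # 3. Put the tarball on the website (here: copy into the served directory).
    mkdir -p "$web_dir" || return 1
    cp "$tmp/$tarball" "$web_dir/" || return 1
    rm -rf "$tmp"

    # Print the location of the published tarball.
    echo "$web_dir/$tarball"
}
```

Keeping the pull command configurable means the same sketch would work with another way of tracking upstream, for example a git mirror.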
Implementation Principles
• $VO_[VONAME]_SW_DIR is a directory shared between all worker nodes on a site
• Run with a Software (VO) Manager proxy
• Install packages per site / cluster / CE
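A management job first has to locate the shared software directory for its VO. The sketch below shows one way to do that; the function names are hypothetical, and the rule for deriving the variable name (upper-case the VO name and replace every other character with an underscore) is an assumption about how sites name the variable.

```shell
#!/bin/sh
# Hypothetical helpers; the naming rule below is an assumption.

# vo_sw_dir_var VO: print the name of the software directory variable,
# e.g. "vlemed" gives VO_VLEMED_SW_DIR.
vo_sw_dir_var() {
    name=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr -c 'A-Z0-9' '_')
    printf 'VO_%s_SW_DIR\n' "$name"
}

# vo_sw_dir VO: print the shared software directory for VO, or fail if the
# variable is unset or the directory is not writable for the current proxy.
vo_sw_dir() {
    var=$(vo_sw_dir_var "$1")
    # Safe to eval: $var only contains A-Z, 0-9 and underscores.
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
        echo "error: $var is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "error: $dir is not a writable directory" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}
```

The write check matters because only a job running with a Software (VO) Manager proxy is expected to be able to modify the shared area.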
Architecture
[Diagram: Server (UI), Management Jobs, Worker Nodes, Mount, Shared Storage Area]
Managing packages
• site-pkgtool.sh
  – Program to manage packages centrally
  – Initiates grid jobs
• Install, Remove, Update
• Init, Reinit, Check, Dump, Info, Version
Script on the worker node
• pkgsrc-cmd.sh
  – Wrapper program that runs on the worker node
• Runs as a grid job
Information Management
• list-installed-packages.sh
  – Display information about installed packages for sites
• get-site-status.sh
  – Gather information from all supported sites
• verify-package.sh
  – Check if a certain package is available on a site
• get-tags.sh
  – Get all the package tags for the configured sites
Installing a package
• Check if distribution is fresh
• Extract tree in scratch space
• Build package and dependencies
• Install package in shared software area
• Install modulefile
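The install steps above can be sketched as a shell function that a worker node job might run. This is a sketch under stated assumptions, not the project's pkgsrc-cmd.sh: all function names and arguments are hypothetical, "fresh" is taken to mean that the local tarball is less than a day old, and pkgsrc is assumed to be bootstrapped already so that bmake exists under the install prefix. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical sketch; not the project's actual pkgsrc-cmd.sh.

# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# is_fresh FILE DAYS: succeed if FILE exists and was modified
# less than DAYS days ago (assumed freshness rule).
is_fresh() {
    [ -f "$1" ] && [ -n "$(find "$1" -mtime "-$2" 2>/dev/null)" ]
}

# install_package PKGPATH TARBALL_URL PREFIX SCRATCH MODULEFILE
#   PKGPATH:    package directory inside the tree, e.g. local/vlet
#   PREFIX:     pkgsrc prefix inside the shared software area
#   MODULEFILE: prepared modulefile, named after the version
install_package() {
    pkgpath=$1 url=$2 prefix=$3 scratch=$4 modulefile=$5
    tarball="$scratch/pkgsrc.tar.gz"
    moddir="$prefix/modulefiles/$(basename "$pkgpath")"

    # 1. Check if the distribution is fresh; fetch it again if not.
    is_fresh "$tarball" 1 || run curl -fsS -o "$tarball" "$url" || return 1

    # 2. Extract the tree in scratch space.
    run tar -xzf "$tarball" -C "$scratch" || return 1

    # 3 and 4. Build the package with its dependencies and install it
    # under the prefix in the shared software area.
    run sh -c 'cd "$1" && "$2" install' sh \
        "$scratch/pkgsrc/$pkgpath" "$prefix/bin/bmake" || return 1

    # 5. Install the modulefile next to the installed software.
    run mkdir -p "$moddir" || return 1
    run cp "$modulefile" "$moddir/" || return 1
}
```

A dry run such as `DRY_RUN=1 install_package local/vlet <tarball-url> <prefix> <scratch> <modulefile>` lists the commands that would be executed, which is a cheap way to check the plan before submitting a grid job.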
Environment Modules
• “The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles.”
• Select versions
• Set up environment
• Integrates with system-provided setup
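To illustrate how a modulefile ties a version to an environment, the sketch below generates a minimal one from shell. The function name, its arguments and the directory layout (one file per version under a directory named after the package) are assumptions for illustration; the generated lines use standard modulefile commands.

```shell
#!/bin/sh
# Hypothetical generator for a minimal modulefile.

# write_modulefile NAME VERSION PREFIX MODULE_DIR
#   Writes MODULE_DIR/NAME/VERSION, which puts PREFIX/bin and PREFIX/lib
#   on the search paths when the module is loaded.
#   PREFIX is assumed to contain no whitespace.
write_modulefile() {
    name=$1 version=$2 prefix=$3 module_dir=$4
    mkdir -p "$module_dir/$name" || return 1
    cat > "$module_dir/$name/$version" <<EOF
#%Module1.0
module-whatis "$name $version"
prepend-path PATH $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib
EOF
}
```

A job could then select a version with `module use <module_dir>` followed by `module load vlet/1.3.2`, where the directory is whatever location the site uses for modulefiles.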
Tags
• Software Tags in Information System (BDII)
• Publish installed software versions per CE
• Used for resource selection by adding them to the “Requirements” of a JDL
• Use the lcg-ManageVOTag tool to publish tags
• Structure of tags is VO-${vo}_SW_${package}
Practical issues
• Tags are not omnipresent
• Shared area can become a bottleneck
• No intelligent matching on tags
Conclusions
• Flexible software management system
• Relieves the burden on users
• Creating packages is still labor-intensive work
Discussion
• One size fits all? (Did we reinvent the wheel?)
• Connect to EGI AppDB?
• EMI Community Repositories?
• Usable for data distribution?
• Other mechanism for matching?
Links
• pkgsrc – http://www.netbsd.org/docs/software/packages.html
• Modules – http://modules.sourceforge.net/
• BiG Grid – http://www.biggrid.nl/
• Bioinformatics Laboratory – http://www.bioinformaticslaboratory.nl/
• Project Code – http://dvandok.github.com/userspace-package-management/
Acknowledgements
• AMC Bioinformatics Laboratory
  – Prof. dr. Antoine van Kampen
  – Dr. Silvia Delgado Olabarriaga
  – Barbera van Schaik
• BiG Grid / Nikhef
  – Jan Just Keijser
Thanks!