CernVM - a virtual software appliance for LHC applications

C. Aguado-Sanchez 1) , P. Buncic 1) , L. Franco 1) , A. Harutyunyan2), P. Mato 1), Y. Yao 3)

1) CERN, Geneva, 2) Yerevan Physics Institute, Yerevan, 3) LBNL, Berkeley

Predrag Buncic (CERN/PH-SFT) CernVM Project

• Talk Outline ƒ CernVM Project • Building blocks • User Interface and API • Scalability and performance ƒ Fut ure plans & directions ƒ Release status ƒ Conclusions ƒ Links

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 2 CernVM Project

• Portable Analysis Environment using Virtualization Technology (WP9) ƒ Approved in 2007 (2+2 years) as R&D activity in CERN/PH Department ƒ Start ed J anuary 2008 ƒ Sister project to Multicore R&D

• Project goals: ƒ Provide a complete, portable and easy to configure user environment for developing and running LHC data analysis locally and on the Grid independent of physical software and hardware platform (Linux, Windows, MacOS) ƒ Decouple application lifecycle from evolution of system infrastructure ƒ Reduce effort to install, maintain and keep up to date the experiment software ƒ Lower the cost of software development by reducing the number of compiler-platform combinations

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 3 • A complete Data Analysis environment available for each experiment ƒ Code check-out, edition, compilation, local small test, debugging, … ƒ Grid submission, data access… ƒ Event displays, interactive data analysis, … • No software installation required • Su spend/resume capabilit y

[email protected]/9/09 4 CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 4 Key Building Blocks

• rBulder from rPath (www.rpath.org) ƒ A tool to build VM images for various virtualization platforms • rPath Linux 1 ƒ Slim Linux OS binary compatible with RH/SLC4 • rAA -rPthLiPath Linux App liance Agen t Build types ƒ Web user interface ƒ XMLRPC API ƒ Installable CD/DVD ƒ Stub Image • Can be f ull y customiz ed an d ex ten ded by ƒ Raw Filesystem Image means of plugins (#401) ƒ Netboot Image ƒ Compressed Tar File ƒ Demo CD/DVD (Live CD/DVD) ƒ Raw Hard Disk Image • CVMFS - CernVM file system ƒ Vmware ® Virtual Appliance ƒ Read only file system optimized for software ƒ Vmware ® ESX Server Virtual Appliance distribution ƒ Microsoft ® VHD Virtual Apliance • Aggressive caching ƒ Xen Enterprise Virtual Appliance ƒ Virtual Iron Virtual Appliance ƒ Operational in offline mode ƒ Parallels Virtual Appliance • For as long as you stay within the cache ƒ Amazon Machine Image ƒ Update CD/DVD ƒ Appliance Installable ISO • 

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 5 Conary Package Manager

class Root(CPackageRecipe): name='root' version='5.19.02'

buildRequires = [ ' libpng:devel', 'libpng:devellib','krb5:devel', 'libstdc++:devel’,'libxml2:devel', 'openssl:devel','python:devel', 'xorg-x11:devel', 'zlib:devel', ':devel', 'perl:runtime']

def setup(r): r.addArchive('ftp://root.cern.ch/root/%(name)s v%(version)s.source.tar.gz’) r.Environment((,())'ROOTSYS',%(builddir)s') r.ManualConfigure('--prefix=/opt/root ') r.Make() r.MakeInstall()

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 6 CernVM Variations

group--desktop (lightweight X environment)

group- (grou ps and ex tra packages group-cernvm-desktop required by experiment) X11

group-cernvm-devel group-slc4 (SLC4 compatibility libs) (development tools) (SLC4 compatibility libs) compat-db4 compat-openssl compat-libstdc++slc3 compat-libxml2 compat-readline compat-tcl group-cernvm compat-tk 100 MB (core packages)

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 7 As easy as 1,2,3

imppport xmlrpclib import os url=‘http://user:password@host:8004/rAA/xmlrpc’ server = xmlrpclib.ServerProxy(url) r = server.cernvm.Config.configGridUIVersion(”3.1.22-0")

1. Login to Web interface

2. Create user account

3. Select experiment, appliance flavor and preferences

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 8 CVMFS

• CernVM File System (CVMFS) is derived from Parrot (http://www.cctools .org )and) and its GROW-FS plugin code base and adapted to run as a FUSE kernel module adding extra features like: ƒ possibility to use multiple file catalogues on the server side ƒ ttfilidiithhldtransparent file compression under given size threshold ƒ dynamical expansion of environment variables embedded in symbolic links

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 9 Scalable Infrastructure

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 10 Publishing Releases

1. Each experiment is given a VM to install and test their software using own 2. Publishing is an atomic installation tools operation Current deployment model

Primary (master) software repository Load Secondary software Secondary software Reverse Proxy Balancing repository repository Reverse Proxy Load Balancing Load RPReverse Proxy Balancing Regional Reverse Proxy Reverse Proxy

CernVM (()thin client)

Reverse Proxy Site Reverse Proxy

The aim is to reduce latency which is the most important issue for distributed network file systems

[email protected]/9/09 12 CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 12 Benchmark setup ROOT (version 5.22, 100Mb)

• The benchmark runs in Xen VM and includes compilation and execution of ROOT stressHepix suite ƒ The system has to find and fetch binaries, libraries and headers from the network file system but, once loaded, the libraries remain in a local cache • We compare the performance of CVMFS to AFS by artificially introducing network latency and limiting bandwith (using tc tool) CVMFS vs AFS

(Δ ) • These plots show the time penalty ttt=tAFS,CVMFS - tlocal resulting from having the application binaries, search paths, and include paths reside on a network file system, ƒ Additional performance loss due to virtualization (not shown here) is consistently <5% • CVMFS shows consistently better performance than AFS in case of ‘cold cache’ ƒ ‘Worm cache’ performance is in case of AFS and CVMFS equal to local file access Removing single point of failure (1) HTTP server Slave servers could be deployed on Proxy strategic locations to reduce latency Server and provide redundancy HTTP server CernVM HTTP server

Proxy Proxy Server CernVM Server Proxy Server HTTP server

HTTP CernVM server Proxy Server HTTP server Removing single point of failure (2)

CernVM

HTTP server CernVM Content Proxy Distributio Server n Network HTTP server CernVM

LAN WAN Use P2P like mechanism for discovery Use existing Content Delivery Networks to of nearby CernVMs and cache sharing remove single point of failure between them. No need to manually ƒAmazon CloudFront (http://aws.amazon.com/cloudfront/ setup proxy servers (but they could still ƒCoral CDN (http://www.coralcdn.org) be used where exist) Bridging Grids & Clouds

• BOINC ƒ Open-source software for volunteer computing and grid computing ƒ http://boinc.berkeley.edu/

• CernVM CoPilot development ƒ Based on BOINC, LHC@HOME exper ience an d Cern VM image ƒ Image size is of outmost importance to motivate volonteers ƒ Can be easily adapted to Pilot Job frameworks (AliEn,Dirac, Panda) • … or Condor Worker,,p or proofd.. ƒ Aims to demonstrate running of ATLAS simulation using BOINC infrastructure and PanDa

BOINC LHC@HOM E PanDA Pilot

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 17 CernVM CoPilot AliEn/DIRAC/PanDA Adapter

0. Send host JDL 1. Append framework (free dis k space, free memory, specific information and available packages) request a job

3. Send input files and 2. Send user job JDL from Task commands for execution Queue (packages are already there)

4. When the job is done 5. Register output files send back the output files (and the result of validation)

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 18 Simplifying Grid deployment

• Nimbus (former Globus Workspace Service) ƒ Nimbus is a set of open source tools that together provide an "Inf rastructure-as-a-Si"(IS)ldService" (IaaS) cloud comput ing so lilution • http://workspace.globus.org/ ƒ Google Summer School (hosted at ANL) project to deploy a one-click, auto-configuring virtual Grid overlay for Alice/AliEn • Successfully created virtual AliEn site for ALICE with one command • See CHEP’09 contribution by Artem Harutyunyan (poster #214)

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 19 1.2.0 Release now available

• Available now for download from ƒ http://rbuilder.cern.ch/project/cernvm/releases • CbCan be run on ƒ Linux (KVM, Xen,VMware Player, VirtualBox) ƒ Windows(WMware Player, VirtualBox) ƒ Mac (Fusion, Parallels, VirtualBox) • Release Notes http://cernvm.web.cern.ch/cernvm/index.cgi?page=ReleaseNotes

• Appliance can be configured and used with ALICE, LHCb, ATLAS (and CMS) software frameworks ƒ ALICE (AliEn 2.15,2.16, AliRoot v4-14-Rev-04, v4-15-Rev-03, v4-16-Rev-01) ƒ ATLAS (14.2.20, 14.2.23, 14.2.24, 14.4.0, 14.5.0, 14.5.1) ƒ CMS (CMSSW_2_2_5, CMSSW_2_2_6 ) ƒ LHCB (GAUDI_v19r8, GAUDI_v19r9, GAUDI_v20r0, GAUDI_v20r2, GAUDI_v20r3, GAUDI_v10r11, DAVINCI_v20r3, DAVINCI_v19r12, DAVINCI_v20r3,DAVINCI_v22r1)

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 20 Summary

• If you work for one of the LHC experiments and you have one or a few problems listed below then CernVM might be what you are looking for: ƒ Your expppypgyeriment software is not compatible with your hardware or s/w platform running on your laptop/desktop ƒ You do not want to spend time to manually install and keep software up to date ƒ Your want to profit from the latest developments in CPU technology and use your multi/many core CPU to its maximal potential without modifying your application ƒ You want to run yyyyour software on voluntary resources beyond the current Grid

• CernVM is a ‘thin’ virtual machines capable of running software of all four LHC experiments ƒ Based on rPath Linux, binary compatible with SL4 ƒ Extensible user interface (plu gins ) ƒ Package groups can be tailored to individual user group

• Experiment software is injected in VM by means of a file system (CVMFS) specially optimized for efficient software distribution ƒ This allows us to develop, deploy and maintain only ONE image ƒ Easy for end user but also great potential for deployment on cloud & grid ƒ Using HTTP protocol allows efficient caching using standard technology

• Version 1.2.0 available at http://rbuilder.cern.ch/project/cernvm/releases

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 21 CernVM Links…

• Mailing lists ƒ [email protected] (open list for announcements and discussion) cernvm.support@h([email protected] (end-user support fhCVMj)for the CernVM project) • Savannah Portal ƒ Please submit bugs and feature requests to Savannah at • http://savannah.cern.ch/projects/cernvm • CernVM Home Page: ƒ http://cernvm.cern.ch • rBuilder & Download Page: ƒ http://rbuilder.cern.ch • ATLAS ƒ https://twiki.cern.ch/twiki/bin/view/Atlas/CernVM

CHEP 2009 CernVM – A Virtial Machine for LHC Experiments Prague, 26/03/2009 - 22