Status for interfacing ARC and AliEn Using virtual machines in AliEn

Using virtual machines in AliEn

Bjarte Kileng

Bergen University College

June 15, 2012

Status for interfacing ARC and AliEn

- First, a status report on the interfacing of ARC and AliEn.

- Two students are following different approaches.

AliEn-ARC interface

- NDGF-T1 uses ARC middleware internally.

- AliEn is run directly on some NDGF-T1 sites, but not all.

- With an AliEn-ARC interface we might get access to all the NDGF-T1 sites.

- With an AliEn-ARC interface, we can have a common VO-box for all the NDGF-T1 sites.

- Challenge with a common VO-box for all the NDGF-T1 sites:
  - Different architectures at different sites.

- Solutions:
  - Torrent installation: Martin Mjelde Tollefsen
  - Virtual machines: Ann Kristin Skudal

AliEn-ARC and torrent installation

- The existing AliEn-ARC interface submits jobs into a local ARC environment.

- A new solution:
  - An ARC site is found using EGIIS (Enhanced Grid Information Indexing Service).
  - The job is sent to the ARC site.
  - AliEn and the necessary packages are retrieved using the torrent install method.

- We need a new ARC.pm module in AliEn.

AliEn-ARC and torrent installation: What has been done?

- A local AliEn setup is running:
  - The central services.
  - The VO-box services.
  - Xrootd for storage.
  - Torque as batch system.

- The student has set up a computer with ARC, including the indexing service and a computing element.

- An ARC site for a job can be found using the indexing service.

- Jobs can be submitted to ARC.

AliEn-ARC and the use of CernVM

- If all packages are preinstalled, there is no need for package installation.

- CernVM uses the CernVM File System (CernVM-FS), a network file system where all the experiment software is pre-installed.

- The alice.cern.ch software repository has been removed from CernVM.

- The student working on this project will go on maternity leave soon.

Why virtual machines?

- Using CernVM in an AliEn-ARC interface.

- Homogenizing the execution environment is interesting on its own:
  - Can circumvent incompatibility issues between AliEn and worker nodes.

Our setup

- Torque is used as batch system.

- Currently, only with XEN as hypervisor.

- Some code has been taken from the ViBatch project: https://ekptrac.physik.uni-karlsruhe.de/trac/BatchVirt/wiki/ViBatch

The setup is running

- Uses libvirt:
  - Libvirt can boot both XEN and KVM.
  - Newer versions of libvirt can also boot VirtualBox.

- Each job on a worker node will run inside its own virtual machine:
  - A worker node can have several virtual machines running simultaneously.

- The virtual machine must be set up for use by the system, but:
  - There are no site-specific configurations inside the virtual machine.

- Only tested on a worker node running SLC5.8 and with a virtual machine image for SLC5.8.

How it works

- Torque can run scripts when a job is started (prologue) and when a job finishes (epilogue):
  - These scripts start and stop the virtual machine.

- Using the option «-S» to qsub, we can specify a shell for the job:
  - The setup uses a script, remoteshell, which returns an SSH connection to the virtual machine.

- The home directory of the submitter on the worker node must be accessible as an NFS share in the virtual machine.
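
The role of the «-S» option can be sketched as below. This is a minimal illustration, not code from the project: the helper name and the install path /opt/vmbatch/remoteshell are hypothetical. Only qsub and its -S option are taken from Torque.

```python
from typing import List


def build_qsub_command(job_script: str, shell_path: str) -> List[str]:
    """Return a qsub command line that runs the job through a custom shell.

    qsub's -S option names the program that interprets the job script.
    Here that program is the remoteshell wrapper, so the job script ends
    up being executed inside the virtual machine.
    """
    return ["qsub", "-S", shell_path, job_script]


if __name__ == "__main__":
    # Hypothetical install location of the remoteshell script.
    command = build_qsub_command("job.sh", "/opt/vmbatch/remoteshell")
    print(" ".join(command))
```

From the point of view of the submitter nothing else changes: the job script is an ordinary script, and only the interpreter is swapped.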

Prologue, epilogue and the remote shell

Prologue: The prologue script is run as root on the worker node before the job is executed.

Epilogue: The epilogue script is run as root on the worker node after the job is finished.

The remote shell script: The script remoteshell is run as the submitting user on the worker node.

The prologue script

- New disk images are created based on template images:
  - Unless configured read only, a disk image cannot be shared between simultaneously running virtual machines.

- Disk images are mounted on the worker node and prepared for the virtual machine:
  - The job is copied to the image.
  - The home directory of the submitting user is created.
  - An SSH key for the submitting user is created (if necessary) and prepared on the disk image.
  - An option file is created, which is read by customized init.d scripts on the virtual machine.

- The disks are unmounted and removed from the worker node.

- The virtual machine is started.

- The script exits when the virtual machine is up and running.

The prologue script: Options stored in the option file on the disk image

- Username, group, uid and gid of the submitting user.

- NFS host, share and mount point.

- Path where to put the alive directory:
  - The alive directory is created by the virtual machine on the NFS share when the machine is up and running.
  - The prologue script will exit when the alive directory exists.
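
The hand-over between the prologue script and the virtual machine could look like the sketch below. The KEY=VALUE file format, the option names and the timeout values are assumptions made for illustration. The slides only state which pieces of information are stored and that the prologue waits for the alive directory.

```python
import os
import time
from typing import Dict


def write_option_file(path: str, options: Dict[str, str]) -> None:
    """Write the options as KEY=VALUE lines (the file format is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        for key in sorted(options):
            handle.write(f"{key}={options[key]}\n")


def wait_for_alive(alive_dir: str, timeout: float = 300.0, poll: float = 1.0) -> bool:
    """Poll until alive_dir exists; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isdir(alive_dir):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A timeout is worth having in a real prologue: if the virtual machine never boots, the job should fail instead of blocking the worker node indefinitely.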

The remoteshell script

- An SSH connection to the virtual machine is opened as the submitting user.

- The environment of the job on the worker node is piped into the SSH connection.

- The virtual machine is now ready for use by Torque as the execution environment.
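
One way to pipe the job environment into the SSH connection is sketched below. The helper names are hypothetical and the actual remoteshell script may work differently. The sketch assumes that the remote side runs a POSIX shell reading commands from standard input.

```python
import shlex
from typing import Dict, List


def export_lines(env: Dict[str, str]) -> str:
    """Render environment variables as shell 'export' statements.

    Values are quoted with shlex.quote so that spaces and shell
    metacharacters survive the trip through the remote shell.
    """
    lines = [f"export {name}={shlex.quote(env[name])}" for name in sorted(env)]
    return "\n".join(lines) + "\n"


def ssh_command(user: str, vm_address: str) -> List[str]:
    """ssh invocation that starts a remote shell reading commands from stdin."""
    return ["ssh", f"{user}@{vm_address}", "/bin/sh", "-s"]


if __name__ == "__main__":
    env = {"PBS_JOBID": "42.host", "PBS_O_WORKDIR": "/home/griduser/my job"}
    print(" ".join(ssh_command("griduser", "192.0.2.10")))
    print(export_lines(env), end="")
```

The export statements would be written to the standard input of the ssh process first, followed by the job script itself.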

The epilogue script

- The virtual machine is stopped.

- The disk images are deleted.

- The alive directory is removed.

- Lock files are removed.
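
The cleanup order above can be sketched as follows. The function signature is hypothetical, and stopping the virtual machine is left to a callback because it depends on the hypervisor. The sketch tolerates files that are already gone, so the epilogue can also run after a failed prologue.

```python
import os
import shutil
from typing import Callable, Iterable


def _remove_file(path: str) -> None:
    """Delete a file, ignoring the case where it no longer exists."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def epilogue(
    stop_vm: Callable[[], None],
    disk_images: Iterable[str],
    alive_dir: str,
    lock_files: Iterable[str],
) -> None:
    """Stop the virtual machine, then remove everything created for the job."""
    stop_vm()  # must come first: the images are in use while the VM runs
    for image in disk_images:
        _remove_file(image)
    shutil.rmtree(alive_dir, ignore_errors=True)
    for lock in lock_files:
        _remove_file(lock)
```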

The customized init.d scripts

- The submitting user is created on the virtual machine.

- The NFS share is mounted.

- The alive directory on the NFS share is created.
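
The three steps can be sketched as a function that turns the option file into the commands an init.d script would run. The option names and the KEY=VALUE format are assumptions. The commands themselves (groupadd, useradd, mount, mkdir) are standard Linux tools, but the real scripts may use different ones.

```python
from typing import Dict, List


def parse_options(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    options: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def boot_commands(opts: Dict[str, str]) -> List[List[str]]:
    """Commands for: create the user, mount the NFS share, signal 'alive'."""
    return [
        ["groupadd", "-g", opts["GID"], opts["GROUP"]],
        # -M: do not create the home directory, it is prepared elsewhere.
        ["useradd", "-u", opts["UID"], "-g", opts["GROUP"], "-M", opts["USERNAME"]],
        ["mount", "-t", "nfs", f"{opts['NFS_HOST']}:{opts['NFS_SHARE']}", opts["MOUNT_POINT"]],
        ["mkdir", "-p", opts["ALIVE_DIR"]],
    ]
```

Creating the alive directory last matters: it is the signal that the user exists and the share is mounted, so the job can start.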

Some network comments

- MAC addresses for the virtual machines can be configured either as a list of addresses or as an initial address:
  - If an initial address is given, successive values are used for the virtual machines.
  - A worker node will always use the same subset of MAC addresses for its virtual machines.

- The network setup on the virtual machines must use DHCP:
  - It would be easy to add support for a static network setup.
  - IP addresses could be given as a list or as an initial value, as is done with MAC addresses.

- The alive directory created by the virtual machine is identified by the MAC address of the virtual machine.

- The IP address assigned to the virtual machine is stored by the virtual machine in the alive directory.
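
Deriving successive MAC addresses from an initial address can be sketched as below. The function name is hypothetical and the example address is illustrative only (00:16:3e is the prefix reserved for XEN guests).

```python
from typing import List


def mac_sequence(initial: str, count: int) -> List[str]:
    """Return `count` successive MAC addresses starting at `initial`."""
    value = int(initial.replace(":", ""), 16)
    macs = []
    for offset in range(count):
        current = value + offset
        if current >= 1 << 48:
            raise ValueError("MAC address range exhausted")
        raw = f"{current:012x}"
        # Re-insert the colons between each pair of hex digits.
        macs.append(":".join(raw[i:i + 2] for i in range(0, 12, 2)))
    return macs


if __name__ == "__main__":
    for mac in mac_sequence("00:16:3e:00:00:fe", 3):
        print(mac)
```

Giving every worker node its own initial address, spaced by the maximum number of virtual machines per node, keeps the subsets disjoint.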

Creating disk images from templates

- Unless read only, a disk image cannot be shared between simultaneously running virtual machines.

- Jobs run in identical virtual machines.

Creating disk images from templates: QCOW files

- Copying complete disk images consumes both time and disk space:
  - Can use qcow files (QEMU copy-on-write).

- A qcow file can be based on a master file, or backing file.

- The new file (or «copy») only contains the diffs from the master.

- Reads of unmodified data go to the master file.

- Writes, and reads of modified data, go to the new file.
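
The read and write rules above can be illustrated with a toy model. This is not the qcow file format, only an in-memory sketch of the copy-on-write idea, with a hypothetical class name and a block-level granularity chosen for simplicity.

```python
from typing import Dict, List


class CowImage:
    """Toy copy-on-write image: a shared master plus a private set of diffs."""

    def __init__(self, master: List[bytes]) -> None:
        self._master = master              # shared backing data, never modified
        self._diff: Dict[int, bytes] = {}  # blocks written through this image

    def read(self, block: int) -> bytes:
        # Modified blocks come from the diff, all others from the master.
        if block in self._diff:
            return self._diff[block]
        return self._master[block]

    def write(self, block: int, data: bytes) -> None:
        if not 0 <= block < len(self._master):
            raise IndexError("block outside image")
        self._diff[block] = data

    def diff_blocks(self) -> int:
        """Number of blocks stored in the new file, i.e. its actual size."""
        return len(self._diff)
```

Two images created from the same master can be written to independently, which is what allows one template to serve several simultaneously running virtual machines.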

XEN and qcow on SLC5.8

- XEN supports qcow, but only if created with XEN tools.

- XEN cannot boot from a qcow file.

- One solution is to split the image in two:
  - The boot image is a raw disk image and configured read only.
  - The rest of the file system is stored on qcow files.

XEN and qcow on SLC5.8: Extract from the libvirt domain xml template

...

Reusing virtual machines

- If a virtual machine is up and running and a new job arrives, the virtual machine could also be used for the new job.

- Adding the necessary functionality should be rather simple.

- Configuring the virtual machine is difficult if the number of jobs run by the machine can vary:
  - How many CPUs should we assign to the machine?
  - How much memory?
  - ...

The need for shared file system

- If Torque uses a shared file system for file transfer, then:
  - We generally cannot know which files the job will use.
  - With AliEn, we could use a PBS.pm customized for our system.
  - The best solution is probably to include the virtual machines in the shared file system.

- If Torque uses file staging, then:
  - Files could be copied to disk images before a virtual machine is started.
  - After the job is finished and the virtual machine is down, files could be retrieved from the images.

Using the console of the virtual machine

- The console of the virtual machine can probably be used for running the job.

- This will remove the need for an SSH key in the home directory of the submitter on the virtual machine.

Other hypervisors: KVM

- The current version of our system requires XEN when preparing disk images on the worker node:
  - XEN on SLC5.8 cannot handle qcow files created for KVM.
  - Mounting disks using qemu tools is not yet implemented in our system.

- The rest of the system should work with KVM.

Other hypervisors: VirtualBox

- Libvirt on SLC5.8 cannot boot VirtualBox:
  - A newer version working with VirtualBox is easily installed.

- VirtualBox cannot read qemu files:
  - VirtualBox can use raw devices.
  - The qemu-nbd tool can use the nbd kernel module and map a qcow file to a device.

- We have successfully booted VirtualBox (and also XEN) from an nbd device connected to a qcow disk image.

- This functionality is not yet implemented in our system.
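
The nbd mapping described above could be driven by commands like the ones built in this sketch. The helper names and the default device /dev/nbd0 are illustrative and not part of the system. The sketch uses the nbd kernel module, qemu-nbd and its --connect and --disconnect options.

```python
from typing import List


def nbd_attach_commands(image: str, device: str = "/dev/nbd0") -> List[List[str]]:
    """Commands that expose a qcow image as a block device (run as root)."""
    return [
        ["modprobe", "nbd"],                       # load the nbd kernel module
        ["qemu-nbd", f"--connect={device}", image],
    ]


def nbd_detach_command(device: str = "/dev/nbd0") -> List[str]:
    """Command that releases the device again after the VM has stopped."""
    return ["qemu-nbd", "--disconnect", device]


if __name__ == "__main__":
    for command in nbd_attach_commands("/var/lib/vmbatch/job.qcow"):
        print(" ".join(command))
    print(" ".join(nbd_detach_command()))
```

The hypervisor is then pointed at the nbd device as if it were a raw disk, so it never needs to understand the qcow format itself.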

Outlook

- More testing is necessary.

- Improve the handling of disk images to support KVM and maybe also VirtualBox.

- Remove the need for a shared file system when Torque uses file staging.

- Add support for other execution shells on the virtual machine:
  - Using the console of the virtual machine.
  - Using RSH.

- Add support for static network configuration.

- Write documentation.

Download

Available with Subversion from https://eple.hib.no/svn/vmbatch/tags/.
