Report from CernVM Users Workshop 6-8 June 2016, RAL, UK

J Blomer, G Ganis

20 June 2016, EP-SFT weekly meeting

The event

• Second in the new series
  – Previous one at CERN, March 2015
• GridPP/STFC offered to fund and host it at RAL
  – Catalin Condurache provided the local organization
• 44 registered people
  – All LHC experiments, EUCLID, Stratum-1 admins, …
• 5 invited speakers from industry
  – 2 former members of the CernVM team

• Slides available at: https://indico.cern.ch/event/469775/
  – Videocasts should follow soon

The participants (some of)

The program

• Day 1 & Day 2
  – Status reports and plans
    • From the team
    • From related activities / collaborations
  – Feedback from experiments / users / admins
  – Technology session
• Day 3
  – Discussion
  – Hands-on sessions (NEW)

The program: day 1

G Ganis (CERN) – Welcome & Introduction
J Blomer (CERN) – CernVM-FS Status and Plans
R Meusel (ex CERN) – Performance Engineering the CernVM-FS Backend
G Ganis (CERN) – Status and Plans for CernVM
I Reid (Brunel) – CernVM and Volunteer Computing

D Dykstra (Fermilab) – Evolution of the HEP Content Distribution Network
R Walker (LMU Munich) – HEP Software on Supercomputers

Q Le Boulc'h (CC-IN2P3) – Using CernVM-FS to Deploy EUCLID Processing Software on Computing Centres
B Couturier (CERN) – Feedback from LHCb
D Berzano (CERN) – Feedback from ALICE
A Lahiff (RAL) – Feedback from CMS
A Dewhurst (RAL) – Feedback from ATLAS

A Harutyunyan (Mesosphere) – Introduction to Mesos and the Datacenter Operating System

Session categories: core activities, related activities, users' feedback, technology session

The program: day 2

Josh Simons (VMware) – Virtualizing High Performance Computing Workloads
Oliver Oberst (IBM) – IBM and the Next Generation HPC
Martin Stadtler (Linaro) – Linaro Enterprise Group (LEG)
George Lestaris (Pivotal) – CernVM-FS for Docker Image Distribution in Cloud Foundry

Brian Bockelman (Univ. Nebraska-Lincoln) – CVMFS and Data Federations
Jose Caballero (BNL) – CVMFS Stratum-1 Replica Daemon
Markus Fasel (LBNL) – Software Distribution via CVMFS @ NERSC

Wei-Jen Chang (ASGC) – CernVM-FS at ASGC
Dennis van Dok (NIKHEF) – CVMFS for the Rest of Us
Catalin Condurache (RAL) – EGI CernVM-FS Infrastructure Report
John Kelly (RAL) – CVMFS at RAL: Client View
Dan van der Ster (CERN) – CernVM-FS Operations in the CERN IT Storage Group

Tom Whyntie (QMUL) – CERN@school and CernVM
Frazer Barnsley (RAL) – CCP4 DAaaS: STFC Use Case

Session categories: technology session, related activities, admins' feedback, other activities

The program

• Day 1 & Day 2
  – Status reports and plans
    • From the team
    • From related activities / collaborations
  – Feedback from experiments / users / admins
  – Technology session
• Day 3
  – Discussion
  – Hands-on sessions

(This report focuses on the feedback sessions, the technology session, and the Day 3 discussion.)

Feedback from exp. / users: LHC

• CernVM-FS plan of work fits the needs well
  – Improved publication/propagation
  – Solutions for HPC (Parrot, preload) being tested (a preload sketch follows this list)
• CernVM used by all
  – Including CMS, in particular for cloud activities
  – Request to make it available in CERN OpenStack
• Touches all aspects
  – Conditions data delivery, standard & opportunistic clouds, HLT exploitation, software validation
  – Realized on top of Mesos and containers (ALICE)
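For the preload route mentioned above, the cvmfs_preload utility fills a shared cache directory over plain HTTP, which worker nodes without outbound connectivity can then use as an "alien" cache. A minimal sketch, with an illustrative stratum URL and shared-filesystem path (not the setup tested by the experiments):

    # Fill a shared alien cache from a stratum server over HTTP
    # (URL and target path are placeholders)
    cvmfs_preload -u http://cvmfs-stratum-one.example.org/cvmfs/sft.cern.ch \
                  -r /shared/cvmfs-cache

    # Matching client configuration on the compute nodes
    # (/etc/cvmfs/default.local):
    #   CVMFS_ALIEN_CACHE=/shared/cvmfs-cache
    #   CVMFS_SHARED_CACHE=no
    #   CVMFS_QUOTA_LIMIT=-1   # the cache is managed externally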

Feedback from exp. / users: EUCLID (Q Le Boulc'h)

• CernVM-FS chosen as the tool for software deployment
  – Current setup: 1 Stratum-0 and 2 Stratum-1s
  – Tuning configuration and deployment

Feedback from admins: CERN (D van der Ster)

• Stratum-0: fully virtualized backend
  – 30 repositories, ~11 TB, ~250M files, ~40 GB of catalogs
• Stratum-1 and proxy: virtualized Squids (on Ceph volumes)
• Expect growing usage from the AFS phase-out
• Dedicated nightly Stratum-0s being prototyped
• CernVM-FS Docker volume plug-in
  – Manages bind mounts between host and containers (illustrative sketch below)
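The plug-in's actual interface was not detailed in the talk; purely as an illustration of the underlying idea, a repository mounted on the host can be bind-mounted read-only into a container by hand (repository and image names are arbitrary examples):

    # Hand-rolled equivalent of what a volume plug-in automates:
    # expose a host-mounted repository read-only inside a container
    docker run --rm -it \
        -v /cvmfs/sft.cern.ch:/cvmfs/sft.cern.ch:ro \
        centos:7 ls /cvmfs/sft.cern.ch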

Feedback from admins: EGI (C Condurache)

• EGI CernVM-FS task force
  – Promote usage, create a network of sites providing the service, foster cooperation among organizations
  – egi.eu: 30 repos, HEP/non-HEP, Stratum-0 @ RAL

• Operated by STFC and EGI
• 30 repos, 1.2 TB, 3M files, 980 GB of catalogs
• Multi-disciplinary
• Testing S3+Ceph storage

Feedback from admins: NIKHEF (D van Dok)

• softdrive.nl: a single repository for e-science users in the Netherlands
  – User interface, Stratum-0, 2x Stratum-1, available on Dutch grid resources
  – Beta version, early adoption
  – Uses garbage collection (see the sketch below); would benefit from better monitoring
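For reference, garbage collection on a garbage-collectable repository is driven from the release manager with the cvmfs_server utility; a minimal sketch (repository name from the talk, option values illustrative):

    # Delete objects no longer referenced by any preserved catalog,
    # keeping the 5 most recent revisions as a safety margin
    cvmfs_server gc -r 5 softdrive.nl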

Related activities

• Enhancements
  – Evolution of the HEP Content Distribution Network (D Dykstra)
    • In particular: Squid registry and auto-discovery (config sketch after this list)
  – CernVM-FS Stratum-1 Replica Daemon (J Caballero)
• Harnessing HPC
  – HEP software on Supercomputers
  – Software Distribution via CernVM-FS @ NERSC
• Extended functionality
  – CernVM-FS and Data Federations
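The Squid auto-discovery work builds on the WPAD/PAC mechanism that the CernVM-FS client already understands; a minimal client-side sketch (the PAC URL is a placeholder):

    # /etc/cvmfs/default.local
    # Try proxies discovered via a PAC file first, then fall back
    # to a direct connection
    CVMFS_HTTP_PROXY="auto;DIRECT"
    CVMFS_PAC_URLS="http://wpad.example.org/wpad.dat"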

Harnessing HPC

• HEP software on SuperMUC (R Walker)
  – Parrot/CernVM-FS necessary to ramp up SuperMUC use to 4800 cores / 18M core-hours of Geant4 production
  – Other centres investigating Parrot/CernVM-FS: UK HPC Archer, CSCS, Prague
  – The trend is towards containers and more connectivity
• Software Distribution via CernVM-FS @ NERSC (M Fasel)
  – Successful experiments with Parrot/CernVM-FS (sketch after this list) and with "Shifter" (container-based software delivery)
  – Ongoing scale tests for Parrot/CernVM-FS
  – Special job scheduler to translate between HTC batch jobs and the HPC MPI world
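Parrot provides /cvmfs purely in user space, so no kernel module or root privileges are needed on the HPC system; a minimal sketch using cctools' parrot_run (the proxy address is a placeholder):

    # Mount the default repositories in user space and run a payload;
    # parrot intercepts the I/O system calls of the child process
    export PARROT_CVMFS_REPO="<default-repositories>"
    export HTTP_PROXY="http://squid.example.org:3128"
    parrot_run bash -c 'ls /cvmfs/atlas.cern.ch'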

CernVM-FS for Data Federations (B Bockelman)

• POSIX-compliant, consistent, cryptographically secured namespace for data files
  – Combines the data scalability of data federations with the metadata scalability of CernVM-FS

Required changes:
• Grafting files ✔ (e.g. LCG views): describe a namespace entry without processing the file (sketch below)
• Uncompressed files ✔
• External data ✔: separate downloads of file catalogs and data files
• HTTPS/VOMS support ✔: log in to the HTTPS server with VOMS authentication, via an external authentication helper
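Grafting records a file's size and checksum in the catalog without ingesting its data, so the namespace can be published while the bytes stay in the federation. A minimal sketch, assuming the graft support shipped with v2.3.0 (paths and values illustrative):

    # Compute size and checksum of a data file and emit the
    # corresponding graft description (path is a placeholder)
    cvmfs_swissknife graft -i /data/user/events.root

    # The resulting .cvmfsgraft-events.root sidecar file, placed next
    # to an empty "events.root" in the repository, looks roughly like:
    #   size=1048576
    #   checksum=da39a3ee5e6b4b0d3255bfef95601890afd80709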

CernVM-FS for DF: status (B Bockelman)

• Available for testing in CernVM-FS v2.3.0
• Being tested on
  – StashCache (OSG data federation service for VOs)
    • With a series of caches in front
    • Targeting workflows with datasets < 10 TB
    • Repo: stash.osgstorage.org (client-side sketch below)
  – LIGO: last few runs stored at Nebraska
    • Repo: ligo.osgstorage.org
• Next project: cms.osgstorage.org
  – CernVM-FS-based front-end to the entire CMS dataset
  – Prototype exists, providing ~1 PB of data from AAA sites exporting HTTPS
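On the client, such repositories fetch catalogs through the normal CernVM-FS channels but file contents from the federation; a sketch of the relevant client parameter (the server URL is a placeholder):

    # /etc/cvmfs/config.d/stash.osgstorage.org.local
    # File data is downloaded from the external (federation) servers
    # listed below instead of the repository backend
    CVMFS_EXTERNAL_URL="http://stashcache.example.org:8000"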

Other activities

• CERN@school, T Whyntie (STFC)
• CCP4 and DAaaS, F Barnsley (RAL)

CERN@school (T Whyntie)

• Based around the Timepix hybrid silicon detector
  – Used to detect ionising radiation, make energy measurements, and perform particle ID
• Deployed UK-wide: ~24 detectors funded by STFC
  – Also deployed on the ISS and on satellite experiments
• ~40 nodes in the CERN@school detector network
  – Allows students to have access to those data

CERN@school (T Whyntie) (2)

CCP4 and DAaaS (F Barnsley)

CCP4:
• Integrated suite of programs to study macromolecular structures
• Used at Diamond (RAL)
• Coordinated by STFC

Infrastructure:
• CernVM-FS
  – Stratum-0 at RAL
• SCARF HPC
  – 458 nodes, 5808 cores, 27 TB RAM
• OpenNebula cloud
  – 850 cores, 3.5 TB RAM

CCP4 and DAaaS (F Barnsley) (2)

• From F Barnsley's conclusions:

Technology session

• Computing infrastructures
  – Mesosphere, Pivotal
• Core technology enhancements
  – VMware, Linaro
• Supercomputer trends
  – IBM

Introducing Mesos and DC/OS (A Harutyunyan)

• Concepts of Mesos
  – Cluster management with two-level scheduling

  – Automated cluster management operations
  – View datacenters as computers: Mesos is the kernel

• Used by: Apple, eBay, Cisco, Netflix, PayPal, Twitter, …

Introducing Mesos and DC/OS (2) (A Harutyunyan)

Mesos features: quotas, oversubscription, maintenance, persistence primitives, external storage, security, CernVM-FS integration

Introducing Mesos and DC/OS (3) (A Harutyunyan)

DC/OS includes: Mesos; an API, CLI, and GUI; service discovery & load balancing; storage volumes; a package manager; an installer for on-premise and cloud deployments

Pivotal (G Lestaris)

• Cloud Foundry: Platform-as-a-Service
  – End-to-end system: scalability, orchestration, load balancing, isolation, logs, metrics, data services
  – Started by VMware, moved to Pivotal in 2013
  – Using containers since 2011
• Potential use cases for CernVM-FS in CF
  – Container distribution/caching; CF root file system management; providing packages/stacks
  – Prototyped integration in BOSH, CF's tool chain for cloud management
• OCI: Open Container Initiative
  – Linux Foundation collaborative project
  – Provides standards around container formats and runtime
    • runC: CLI tool, used by Docker, soon by CF (sketch below)
    • image-spec: attempt to standardize the container image format, under development
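For orientation, runC consumes an OCI bundle, i.e. a root filesystem plus a config.json; a minimal sketch of its CLI (bundle contents are left to the user):

    # Build an OCI bundle: a rootfs/ directory plus a config.json
    mkdir -p bundle/rootfs && cd bundle
    # ... populate rootfs/, e.g. from an exported container image ...
    runc spec        # generate a default config.json
    runc run demo    # create and start the container "demo"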

VMware: virtualizing HPC workloads (J Simons)

• Evolutionary container support
  – Containers in VMs, jeVM (Photon OS)

• Revolutionary container support
  – Photon Platform: deep integration with modern application frameworks

VMware: Performance tests (J Simons)

• RHEL 6.5-based setup for testing
• Bioinformatics benchmarks all close to bare metal
  – BioPerf, BLAST, Monte Carlo
• MPI-based benchmarks within 10%
  – With the latest ESXi
• I/O: within 5% with tuned setups

Anything worse than 5% is likely to be a VM/host configuration issue

VMware: Performance tuning tips (J Simons)

Linaro: ARM and virtualization (M Stadtler)

• Linaro: consortium of ARM-interested partners
  – Linaro Enterprise Group (LEG)
    • Unified kernel
    • UEFI/ACPI boot architecture
    • AArch64-based server enablement
    • Fostering commercial distribution development (RHEL, SUSE, ...)
• Focusing on typical Big Data problems
  – Needs input from HEP for new use cases
  – I/O is an issue
• Linaro ARM-based development cloud available for testing
  – http://register.linaro.cloud
• Measurements of the performance overhead of ARM virtualization
  – KVM on ARM vs x86

Performance overhead of ARM virtualization (M Stadtler)

IBM and HPC (O Oberst)

• Goals of ETP4HPC
  – Virtualization and cloud getting more and more important
  – An exascale machine is wanted
• Overview of OpenPOWER
  – Open architecture for HPC and Big Data
  – Wide collaboration (IBM, Nvidia, Mellanox, …)
  – R&D projects including academia (e.g. CORAL)
• Overview of POWER8(+) and POWER9
  – Virtualization support
    • PowerKVM, PowerVM, PowerVC
• Software-defined data management
  – Built around GPFS (now Spectrum Scale File System, SSFS)
• Path to next-generation HPC
  – Data-centric system node
  – New programming models at exascale
  – Enhancing interconnects
  – New shared file system

Added list of actions

• 16 new JIRA tickets

Summary

• Very nice atmosphere for discussions
• Very good feedback from all users
• Hands-on session useful
  – But the organization is to be improved

• Many thanks to GridPP/STFC for supporting the event and providing appropriate facilities

• Next workshop: most likely Jan 2018 @ CERN
  – It will be 10 years since the start of the project

Discussion Topics

• CernVM-FS
  – Propagation improvements
    • Concurrent (push) techniques
  – Secure software
    • Interface with Kerberos V
  – Monitoring
    • File access statistics at the Stratum-1
• CernVM
  – Commissioning of RHEL 7
    • Synchronized with lxplus/lxbatch
