Global Software Distribution with CernVM-FS
Jakob Blomer, CERN
2016 CCL Workshop on Scalable Computing, October 19th, 2016
The Anatomy of a Scientific Software Stack (In High Energy Physics)

The stack, from the rapidly changing top to the stable base:
∙ My Analysis Code: < 10 Python classes
∙ CMS Software Framework: O(1000) C++ classes
∙ Simulation and I/O Libraries: ROOT, Geant4, MC-XYZ
∙ CentOS 6 and Utilities: O(10) libraries

How to install (again and again, as the stack changes) on...
∙ my laptop: compile into /opt, ∼ 1 week
∙ my local cluster: ask the sys-admin to install in /nfs/software, > 1 week
∙ someone else's cluster: ?
Beyond the Local Cluster

Worldwide LHC Computing Grid (WLCG) sites:
∙ ∼ 200 sites, from 100 to 100 000 cores each
∙ Different countries, institutions, batch schedulers, operating systems, ...
∙ Augmented by clouds, supercomputers, LHC@Home
What about Docker?

Example: an R fitting tutorial in Docker
$ docker pull r-base          # pulls a ~1 GB image (Linux libs + container "app")
$ docker run -it r-base
$ ...                         # fitting tutorial: only ~30 MB of the image actually used

It is hard to scale Docker image distribution:

                     iPhone App                  Docker Image
Size                 20 MB                       1 GB
Update frequency     changes every month         changes twice a week
Update pattern       phones update staggered     servers update synchronized

→ Your preferred cluster or supercomputer might not run Docker.
A File System for Software Distribution

[Figure: the software file system sits alongside the basic system utilities on top of the OS kernel of the worker node; content is fetched on demand through a global HTTP cache hierarchy: worker node memory buffer (megabytes), worker node disk cache (gigabytes), central web server holding the entire software stack (terabytes).]

Pioneered by CCL's GROW-FS for CDF at the Tevatron; refined in CernVM-FS, in production for CERN's LHC experiments and others.

1. Single point of publishing
2. HTTP transport, access and caching on demand
3. Important for scaling: bulk meta-data download (not shown)
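Not on the slide, but to make "HTTP transport" concrete: a repository is served as plain files behind a web server, so standard HTTP tooling and caches can handle it. A minimal sketch, assuming the public CERN mirror server and the atlas.cern.ch repository (server and repository names are illustrative); .cvmfspublished is the signed repository manifest.

$ curl -s http://cvmfs-stratum-one.cern.ch/cvmfs/atlas.cern.ch/.cvmfspublished | head
$ # any forward proxy (e.g. Squid) between the worker nodes and the server can cache these requests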
One More Ingredient: Content-Addressable Storage

[Figure: on the software publisher / master source, a read/write file system is transformed into content-addressed objects; these are cached and replicated over HTTP transport and appear as a read-only file system on the worker nodes.]

Two independent issues:
1. How to mount a file system (on someone else's computer)?
2. How to distribute immutable, independent objects?
Content-Addressable Storage: Data Structures

Example repository: /cvmfs/icecube.opensciencegrid.org/amd64-gcc6.0/4.2.0/ChangeLog ...
A regular file such as ChangeLog is compressed and stored under its content hash (SHA-1), e.g. 806fbb67373e9...

Object Store
∙ Compressed files and chunks
∙ De-duplicated

File Catalog
∙ Directory structure, symlinks
∙ Content hashes of regular files
∙ Digitally signed ⇒ integrity, authenticity
∙ Time to live
∙ Partitioned / Merkle hashes (possibility of sub-catalogs)

⇒ Immutable files, trivial to check for corruption, versioning
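A minimal illustration of the idea (a generic sketch, not the exact CernVM-FS object-store layout; the file name and hash value are taken from the slide): an object is named by the hash of its content, so identical files de-duplicate automatically and corruption checks reduce to re-hashing.

$ sha1sum ChangeLog
806fbb67373e9...  ChangeLog                          # the content hash becomes the object name
$ mkdir -p objects/80 && cp ChangeLog objects/80/6fbb67373e9...   # store under the hash, first two hex digits as directory
$ sha1sum objects/80/6fbb67373e9...                  # corruption check: recompute and compare with the name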
Transactional Publish Interface

[Figure: on the release manager machine, a union file system (AUFS or OverlayFS) overlays a read/write scratch area on top of the read-only CernVM-FS mount; publishing writes the new content to a read/write storage backend (file system, S3).]

Publishing new content:
[ ~ ]# cvmfs_server transaction icecube.opensciencegrid.org
[ ~ ]# make DESTDIR=/cvmfs/icecube.opensciencegrid.org/amd64-gcc6.0/4.2.0 install
[ ~ ]# cvmfs_server publish icecube.opensciencegrid.org

Reproducible: as in git, you can always come back to this state.

Uses the cvmfs-server tools and an Apache web server.
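Not shown on the slide: a hedged sketch of how a repository might be created once and how an earlier published state could be revisited, assuming the standard cvmfs-server tools (exact command names and options can differ between versions).

[ ~ ]# cvmfs_server mkfs icecube.opensciencegrid.org                # one-time repository creation on the release manager
[ ~ ]# cvmfs_server tag icecube.opensciencegrid.org                 # list named snapshots of published states
[ ~ ]# cvmfs_server rollback -t <tag> icecube.opensciencegrid.org   # return to an earlier state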
Content Distribution over the Web

Server side: stateless services.

[Figure: worker nodes talk HTTP to caching proxies inside their data center; the proxies talk HTTP to the web / mirror servers. One caching proxy serves O(100) worker nodes; one mirror server serves O(10) data centers.]

∙ Load balancing across the caching proxies of a data center
∙ Failover between caching proxies and between mirror servers
∙ Geo-IP routing selects the closest mirror server
∙ Optionally, a prefetched cache on the worker nodes
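Load balancing and failover are expressed on the client side. A hedged sketch of the corresponding client configuration (proxy and server names are illustrative; in this syntax "|" separates load-balanced proxies and ";" separates failover groups, while @fqrn@ expands to the repository name):

# /etc/cvmfs/default.local (illustrative values)
CVMFS_HTTP_PROXY="http://squid1.example.org:3128|http://squid2.example.org:3128;DIRECT"
CVMFS_SERVER_URL="http://stratum1-a.example.org/cvmfs/@fqrn@;http://stratum1-b.example.org/cvmfs/@fqrn@"
CVMFS_USE_GEOAPI=yes    # let the servers order the mirror list by Geo-IP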
Mounting the File System: Fuse Client

∙ Available for RHEL, Ubuntu, OS X; Intel, ARM, Power
∙ Works on most grids and virtual machines (cloud)

[Figure: an application's open(/ChangeLog) goes through glibc and the kernel (VFS, inode and dentry caches) to /dev/fuse; the CernVM-FS Fuse module in user space serves it with an HTTP GET of the content-addressed object, inflates and verifies the data (SHA-1), and returns a file descriptor.]
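A hedged sketch of a typical Fuse client setup (not part of the slides; the proxy value and repository list are site-specific and illustrative):

[ ~ ]# yum install cvmfs                      # or the platform's package manager
[ ~ ]# cvmfs_config setup
[ ~ ]# cat /etc/cvmfs/default.local
CVMFS_REPOSITORIES=icecube.opensciencegrid.org
CVMFS_HTTP_PROXY="http://my-squid.example.org:3128"
[ ~ ]# cvmfs_config probe                     # mounts and checks the configured repositories
[ ~ ]# ls /cvmfs/icecube.opensciencegrid.org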
Mounting the File System: Parrot Client

∙ Available for Linux / Intel
∙ Works on supercomputers, opportunistic clusters, in containers

[Figure: the same picture, entirely in user space: inside the Parrot sandbox, the application's open(/ChangeLog) is intercepted by libparrot and answered by libcvmfs, which does the HTTP GET, inflate and SHA-1 verification; no kernel module or Fuse mount is needed.]
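A hedged sketch of unprivileged access through Parrot (not on the slide; the proxy value is illustrative, and environment variable handling may differ between cctools versions):

$ export HTTP_PROXY="http://my-squid.example.org:3128"
$ parrot_run ls /cvmfs/icecube.opensciencegrid.org
$ parrot_run bash                              # or run a whole shell inside the sandbox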
Scale of Deployment

∙ > 350 million files under management
∙ > 50 repositories
∙ Installation service provided by OSG and EGI
Docker Integration (under construction, funded project)

[Figure: today the Docker daemon pulls and pushes whole container images from a Docker registry; an improved Docker daemon would instead use file-based transfer backed by the CernVM File System.]
Client Cache Manager Plugins (under construction)

[Figure: the clients (cvmfs/fuse and libcvmfs/parrot) use a C library to talk to a cache manager, essentially a key-value store, over a transport channel (TCP, socket, ...); third-party plugins can back it with memory, Ceph, RAMCloud, ...]
Draft C Interface
// Reference counting and reads on content-addressed objects
cvmfs_add_refcount(struct hash object_id, int change_by);
cvmfs_pread(struct hash object_id, int offset, int size, void *buffer);

// Transactional writing in fixed-sized chunks
cvmfs_start_txn(struct hash object_id, int txn_id, struct info object_info);
cvmfs_write_txn(int txn_id, void *buffer, int size);
cvmfs_abort_txn(int txn_id);
cvmfs_commit_txn(int txn_id);
Summary

CernVM-FS
∙ Global, HTTP-based file system for software distribution
∙ Works great with Parrot
∙ Optimized for small files and heavy meta-data workloads
∙ Open source (BSD), used beyond high-energy physics

Use Cases
∙ Scientific software
∙ Distribution of static data, e.g. conditions and calibration data
∙ VM / container distribution (cf. CernVM)
∙ Building block for long-term data preservation
Source code: https://github.com/cvmfs/cvmfs
Downloads: https://cernvm.cern.ch/portal/filesystem/downloads
Documentation: https://cvmfs.readthedocs.org
Mailing list: [email protected]
Backup Slides

CernVM-FS Client Tools

Fuse Module
∙ Normal namespace: /cvmfs/...

Mount helpers
∙ Set up the environment (number of file ...)
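As a hedged illustration of the mount helpers (not from the slide; the repository name is an example): repositories under /cvmfs are typically mounted on first access via autofs, but they can also be mounted explicitly.

$ ls /cvmfs/icecube.opensciencegrid.org        # autofs triggers the mount helper on first access
[ ~ ]# mount -t cvmfs icecube.opensciencegrid.org /cvmfs/icecube.opensciencegrid.org   # explicit mount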
Experiment Software from a File System Viewpoint

[Figure: the software directory tree of atlas.cern.ch (repo / software / x86_64-gcc43 / 17.1.0, 17.2.0, ...) next to a bar chart of file system entries (in millions, statistics over 2 years), broken down into files, directories, symlinks, and duplicates, with the Linux kernel tree shown for comparison.]

∙ Fine-grained software structure (Conway's law)
∙ Between consecutive software versions: only ≈ 15 % new files
Directory Organization

[Figure: fraction of files [%] versus directory depth (0 to 20) for Athena 17.0.1, CMSSW 4.2.4, and LCG Externals R60.]

Typical (non-LHC) software keeps the majority of its files at directory depth ≤ 5.
Cumulative File Size Distribution

[Figure: file size [B] (2^4 to 2^18, logarithmic scale) versus percentile (0 to 100) for ATLAS, CMS, LHCb, ALICE, the files actually requested, and, for comparison, "UNIX" and "Web server" workloads; cf. Tanenbaum et al. 2006 for the latter two.]

Good compression rates (factor 2–3).

Runtime Behavior
Working set
∙ ≈ 10 % of all available files are requested at runtime
∙ Median file size: < 4 kB

Flash crowd effect (a self-inflicted distributed DoS on the shared software area)
∙ Up to 500 kHz meta-data request rate
∙ Up to 1 kHz file open request rate

[Figure: many worker nodes simultaneously hitting a shared /share software area.]
Software vs. Data

Based on ATLAS figures from 2012:

Software                             Data
POSIX interface                      put, get, seek, streaming
File dependencies                    Independent files
10^7 objects                         10^8 objects
10^12 B volume                       10^16 B volume
Whole files                          File chunks
Absolute paths                       Any mountpoint
Open source                          Confidential
WORM ("write-once-read-many")        Versioned