
Global Software Distribution with CernVM-FS
Jakob Blomer, CERN
2016 CCL Workshop on Scalable Computing, October 19th, 2016
[email protected]


The Anatomy of a Scientific Software Stack (In High Energy Physics)

From most changing (top) to most stable (bottom):
∙ My Analysis Code: < 10 Python classes
∙ CMS Software Framework: O(1000) C++ classes
∙ Simulation and I/O Libraries: ROOT, Geant4, MC-XYZ
∙ CentOS 6 and Utilities: O(10) libraries

How to install (again and again) on...
∙ my laptop: compile into /opt, ∼ 1 week
∙ my local cluster: ask the sys-admin to install in /nfs/software, > 1 week
∙ someone else's cluster: ?


Beyond the Local Cluster

WLCG: Worldwide LHC Computing Grid
∙ ∼ 200 sites: from 100 to 100 000 cores
∙ Different countries, institutions, batch schedulers, OSs, ...
∙ Augmented by clouds, supercomputers, LHC@Home


What about Docker?

Example: R in Docker
$ docker pull r-base          → 1 GB image
$ docker run -it r-base
$ ... (fitting tutorial)      → only 30 MB actually used

It's hard to scale Docker for containers ("apps"):
              iPhone App               Docker Image
Size          20 MB                    1 GB
Update rate   changes every month      changes twice a week
Rollout       phones update staggered  servers update synchronized
Your preferred cluster or supercomputer might not run Docker.


A File System for Software Distribution

Software stack: Software → FS → Basic System Utilities → OS Kernel
Cache hierarchy: worker node's memory buffer (megabytes) → worker node's disk cache (gigabytes) → global HTTP cache hierarchy and central web server (terabytes: the entire software stack)

Pioneered by CCL's GROW-FS for CDF at the Tevatron
Refined in CernVM-FS, in production for CERN's LHC and other experiments

1 Single point of publishing
2 HTTP transport, access and caching on demand
3 Important for scaling: bulk meta-data download (not shown)


One More Ingredient: Content-Addressable Storage

Read/write file system (software publisher / master source)
→ transformation → content-addressed objects
→ HTTP transport, caching & replication
→ read-only file system (worker nodes)

Two independent issues
1 How to mount a file system (on someone else's computer)?
2 How to distribute immutable, independent objects?


Content-Addressable Storage: Data Structures

Example: /cvmfs/icecube.opensciencegrid.org/amd64-gcc6.0/4.2.0/ChangeLog

Object Store
∙ Compressed files and chunks
∙ De-duplicated
∙ Addressed by content hash (compression, SHA-1), e.g. 806fbb67373e9...

File Catalog
∙ Directory structure, symlinks
∙ Content hashes of regular files
∙ Digitally signed ⇒ integrity, authenticity
∙ Time to live
∙ Partitioned / Merkle hashes (possibility of sub-catalogs)

Object store + file catalogs ⇒ immutable files, trivial to check for corruption, versioning


Transactional Publish Interface

Read/write scratch area + CernVM-FS read-only branch, combined by a union file system (AUFS or OverlayFS); read/write interface to a file system or S3.
Reproducible: as in git, you can always come back to this state.

Publishing new content:
[ ~ ]# cvmfs_server transaction icecube.opensciencegrid.org
[ ~ ]# make DESTDIR=/cvmfs/icecube.opensciencegrid.org/amd64-gcc6.0/4.2.0 install
[ ~ ]# cvmfs_server publish icecube.opensciencegrid.org

Uses the cvmfs-server tools and an Apache web server.


Content Distribution over the Web

Server side: stateless services
∙ Web servers: O(10) data centers per server
∙ Caching proxies in each data center: O(100) worker nodes per proxy, with load balancing and failover
∙ Mirror servers, selected via Geo-IP, as failover targets
∙ Worker nodes, optionally with a prefetched cache
All transfers over HTTP.


Mounting the File System — Client: Fuse

Available for RHEL, Ubuntu, OS X; Intel, ARM, Power
Works on most grids and virtual machines (cloud)
Data path: open(/ChangeLog) → glibc → syscall → /dev/fuse → libfuse → CernVM-FS (HTTP GET, inflate + verify SHA-1) → file descriptor


Mounting the File System — Client: Parrot

Available for Linux / Intel
Works on supercomputers, opportunistic clusters, in containers
Data path: open(/ChangeLog) → glibc → syscall intercepted by the Parrot sandbox → libparrot → libcvmfs (HTTP GET, inflate + verify) → file descriptor


Scale of Deployment

∙ > 350 million files under management
∙ > 50 repositories
∙ Installation service by OSG and EGI


Docker Integration — Under Construction!

Today: the Docker daemon pulls & pushes whole containers from a Docker registry.
Funded project: an improved Docker daemon that uses file-based transfer from the CernVM File System.


Client Cache Manager Plugins — Under Construction!

cvmfs/fuse and libcvmfs/parrot talk through a C-library cache manager (key-value store) over a transport channel (TCP, socket, ...) to third-party plugins: memory, Ceph, RAMCloud, ...
Draft C interface:

cvmfs_add_refcount(struct hash object_id, int change_by);
cvmfs_pread(struct hash object_id, int offset, int size, void *buffer);

// Transactional writing in fixed-sized chunks
cvmfs_start_txn(struct hash object_id, int txn_id, struct info object_info);
cvmfs_write_txn(int txn_id, void *buffer, int size);
cvmfs_abort_txn(int txn_id);
cvmfs_commit_txn(int txn_id);


Summary

CernVM-FS
∙ Global, HTTP-based file system for software distribution
∙ Works great with Parrot
∙ Optimized for small files and a heavy meta-data workload
∙ Open source (BSD), used beyond high-energy physics

Use Cases
∙ Scientific software
∙ Distribution of static data, e.g. conditions and calibration data
∙ VM / container distribution (cf. CernVM)
∙ Building block for long-term data preservation

Source code: https://github.com/cvmfs/cvmfs
Downloads: https://cernvm.cern.ch/portal/filesystem/downloads
Documentation: https://cvmfs.readthedocs.org
Mailing list: [email protected]


Backup Slides


CernVM-FS Client Tools

Fuse module
∙ Normal namespace: /cvmfs/<repository>, e.g. /cvmfs/atlas.cern.ch
∙ One process per fuse module + watchdog process
∙ Cache on local disk, LRU managed
∙ NFS export mode
∙ Hotpatch functionality: cvmfs_config reload

Mount helpers
∙ Set up the environment (number of file descriptors, access rights, ...)
∙ Used by autofs on /cvmfs
∙ Used by /etc/fstab, or mount as root:
  mount -t cvmfs atlas.cern.ch /cvmfs/atlas.cern.ch
∙ Private mount as a user possible

Diagnostics
∙ Nagios check available
∙ cvmfs_config probe
∙ cvmfs_config chksetup
∙ cvmfs_fsck
∙ cvmfs_talk, connect to a running instance

Parrot
∙ Built in by default


Experiment Software from a File System Viewpoint

[Figure: software directory tree atlas.cern.ch/repo/software/x86_64-gcc43/17.1.0, 17.2.0, ...; file system entries [× 10^6] over 2 years, broken down into files, duplicates, directories, and symlinks]

Between consecutive software versions: only ≈ 15 % new files