Updates on CVMFS and Container Integration

EP-SFT weekly meeting, 14 September 2020
Simone Mosciatti

Background on CVMFS
● FUSE file system targeted at software distribution
● Works over HTTP
● Lazy pull of files based on actual file-system calls
● Great bandwidth efficiency
● Aggressive use of caches + content-addressable storage (CAS) to improve latency
● The file system can grow indefinitely in size while keeping good performance, as long as the subcatalogs are managed correctly
● Widely used and deployed inside CERN, with great interest also outside (EESSI, Microsoft, HFT firms, other scientific experiments)

Background on containers
● Use Linux namespaces to create isolated environments in which to run computations
● Distribute the container root file system as hash-verified tarballs
● In a nutshell:
  a. Create the root file system by stacking the content of hash-verified tarballs on top of each other
  b. Create an isolated environment using namespaces
  c. Run the (reproducible) computation
● Different implementations (docker, singularity, podman, k8s CRI-O)
● Underneath they all use the same pieces:
  a. containerd
  b. containers/storage
● Moving towards rootless implementations, simplifying deployment on the GRID, in data centers and on supercomputers where CVMFS is already present

How containers can be consumed (1): unpacked file system
  a. The whole file system of the container is provided as a directory
  b. The container runtime picks up the directory file system and creates all the infrastructure
  c. Works out of the box with Singularity
  d. Works with podman with some tricks (specifying the runtime engine)

singularity exec /cvmfs/unpacked.cern.ch/registry.hub.docker.com/atlas/athena:21.0.77/

The whole file system of the container is already in the directory.
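To make this concrete, here is a minimal sketch of running the same unpacked image with Singularity and with podman. The athena path is the one from the slide above; the podman flags (--rootfs with an overlay suffix) are only an assumption about the "tricks" mentioned, not necessarily the production setup.

    # Minimal sketch: start a container whose root file system already lives,
    # unpacked, in CVMFS. Only the files actually accessed are fetched over
    # HTTP by the CVMFS client.
    IMAGE=/cvmfs/unpacked.cern.ch/registry.hub.docker.com/atlas/athena:21.0.77

    # Singularity works out of the box: the directory is used as-is.
    singularity exec "$IMAGE" /bin/sh -c 'cat /etc/os-release'

    # podman can also start from an exploded root file system; the ":O" suffix
    # asks for a writable overlay on top of the read-only CVMFS directory
    # (an assumption about the exact trick; a recent podman is required).
    podman run --rm --rootfs "${IMAGE}:O" /bin/sh -c 'cat /etc/os-release'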
How containers can be consumed (2): layers
  a. The image is described in a JSON manifest file
  b. The runtime fetches the layers as tarballs (or uses the ones already in its cache)
  c. The runtime generates the whole file system by stacking the layers one on top of the other using overlayfs or similar union file-system technologies

docker run atlas/athena:21

Step b is usually a problem, since it takes a lot of time and consumes a lot of bandwidth, especially with large images or when spawning several containers.

Why merge the two technologies?
● A very elegant model, taking the best from both technologies: efficient distribution from CVMFS and resource isolation from containers
● Users are accustomed to container technology

State of the art
● Distribution of the container root file system by pointing to a directory (such as /cvmfs/unpacked.cern.ch/registry.hub.docker.com/atlas/athena:21.0.77/)
  ○ Mostly used with singularity
  ○ Possible to use with podman
● Distribution of unpacked layers
  ○ Used with docker through the graph-driver plugin
● Automatic creation and management of the unpacked container image content on CVMFS with DUCC
  ○ /cvmfs/unpacked.cern.ch
  ○ DUCC is a golang application that interfaces directly with the CVMFS publisher / Stratum-0
  ○ It takes care of translating the high-level concept of container ingestion into low-level file-system manipulations

Recent advancements (DUCC)
1. Improved throughput and latency of DUCC
  a. More parallelization
  b. Faster check of images already in the file system (from several minutes to less than 30 seconds)
  c. Reasonable performance for a fresh installation: 250 images in ~24 hours

Recent advancements (containerd)
2. Creation of a remote snapshotter for containerd
● Allows containerd-based containers to start up using layers directly from CVMFS
  ○ docker, kubernetes
● It is not necessary to download the layers from a central service
● Provides savings in bandwidth
● Promising preliminary tests; complete test cycles still needed, in contact with IT

Standard docker images (no thin images) can now be used out of the box. containerd, if correctly configured, will use the unpacked layers stored in CVMFS to create the union file system. Only the files actually needed are downloaded from the network, not all of them.

Recent advancements (containers/storage)
3. Generation of container image metadata as part of the DUCC image conversion (GSoC)
● Allows containers/storage-based containers to start up using layers directly from CVMFS
  ○ podman, k8s CRI-O (needs testing)
● It is not necessary to download the layers from a central service
● Provides savings in bandwidth
● Good preliminary tests; found some minor bugs

The set of cached layers can now be hosted in CVMFS. When the runtime checks whether a layer is in local storage, it also checks in CVMFS. Layers found in CVMFS are not downloaded from the network.

Roadmap for the next 6 months
1. Wider tests on the containerd remote snapshotter
2. Iron out the bugs of the `containers/storage` file-system implementation and merge it into DUCC
3. Implement a "docker registry shim"
4. Store each step of a container file system (the "chain") in CVMFS using the new (unreleased, CVMFS 2.8) template transactions

Active development: docker-registry shim (1)
● The asynchronous nature of DUCC is an issue
  ○ A user requests an image to be published in unpacked.cern.ch
  ○ The request is, eventually, satisfied
  ○ It usually takes 10 minutes, but it may take much longer when the repository is busy
  ○ The only way to check is to look into /cvmfs/unpacked.cern.ch
● Users would prefer to push images to a docker registry and know that, once the push finishes, the image is in unpacked.cern.ch
  ○ A preliminary investigation suggests that this is possible with the docker registry API
  ○ The real implementation will require major adjustments to the DUCC code
  ○ Requires coordination with operations
● At the end of a CI/CD pipeline, the image is pushed into a registry and automatically appears in unpacked.cern.ch
● Fits well with the change in pricing recently announced by Docker Inc.

Active development: docker-registry shim (2)
● Will allow nicer integration with the containers ecosystem
  ○ We will "speak the same language"
● Will allow integration with Harbor (registry.cern.ch, the cloud registry unofficially blessed by IT) and the GitLab docker registry

Working principles
● When the user pushes a layer, before accepting it, the layer can be ingested into CVMFS
● When the user pushes a manifest, before accepting it, we can create all the supporting structures for the image

Active development: storing all the stages of the container file system for fast ingestion of derived containers
● Many containers are based on standard images, e.g. `FROM centos:centos7`
● When creating the unpacked flat root file system - e.g. for use with singularity - a lot of work is repeated to ingest all the files of the base image every time
  ○ The files are eventually deduplicated, but we pay a price in ingestion/conversion time
● A new feature called "template transactions" will allow us to avoid all this repeated work and to ingest only the files of the layers not seen so far (see the sketch below)
  ○ cvmfs_server transaction repo.ch/foo/:bar/
  ○ Creates a new transaction in which the content of the whole foo directory is already present in the bar directory
● Needs to be integrated with DUCC
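A minimal sketch of how such a template transaction could be driven by hand on the release manager machine. The .chain/ directory layout and the layer tarball name are placeholders invented for the illustration; only the `cvmfs_server transaction <repo>/<template-from>/:<template-to>/` syntax comes from the slide above.

    # Sketch, assuming a hypothetical .chain/ layout inside unpacked.cern.ch:
    # "foo" is a chain element already published, "bar" is the new element
    # needed by a derived image.

    # Open a transaction in which .chain/bar already contains the content
    # of .chain/foo (the template).
    cvmfs_server transaction unpacked.cern.ch/.chain/foo/:.chain/bar/

    # Only the files of the new, previously unseen layer have to be written.
    tar -C /cvmfs/unpacked.cern.ch/.chain/bar -xf new_layer.tar

    cvmfs_server publish unpacked.cern.ch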
Active development: storing all the stages of the container file system for fast ingestion of derived containers (1)

  FROM centos:centos7        sha256:00001
  RUN yum install python3    sha256:00002
  ADD analysis1.py           sha256:00003

● At the moment we only store the last ring of the chain (sha256:00003)
● If another image based on centos7 that installs python3 needs to be ingested, all the files need to be ingested again
● This is a quite common scenario: the building blocks are similar, and the thin layer of user code at the top changes frequently

Active development: storing all the stages of the container file system for fast ingestion of derived containers (2)

  FROM centos:centos7        sha256:00001
  RUN yum install python3    sha256:00002
  ADD better_analysis.py     sha256:00042

● With the new template transactions, we want to store all the rings of the chain (sha256:00001, sha256:00002 and sha256:00003)
● When a new image comes along, we can build on top of rings already ingested in CVMFS
● In this case we could ingest just the last layer, since sha256:00002 is already in the repository

Long term ideas (1) - CERN / HEP wide registry + unpacked
With the docker shim it would be possible to operate a CERN- or HEP-wide container registry together with a CVMFS repository as the canonical home for all the experiment containers. Moreover, it would expose the docker registry API to allow for the deletion of old images. Together with rootless containers, users would then be more easily able to:
1. Create and test their own containers in the local environment
2. Push the container to unpacked
3. Run the same computation on the GRID or on lxplus
4. Store the exact same environment for software preservation

Long term ideas (2) - Predictive cache
While the bandwidth savings are great, start-up latency can still be problematic, especially for interactive use cases. A predictive cache based on subcatalogs may fit very well and help a lot. Containers have a well-defined set of "hot files": all the files necessary to run the ENTRYPOINT. This "hot set" would fit such a cache very well. It was proposed as a summer student project; unfortunately it didn't work out due to COVID.

Recap
● Soon we will be able to start images from unpacked with podman (configuration needed; see the sketch below)
● containerd is able to pick up layers from unpacked (plugin installation needed)
● Working on a synchronous docker registry shim
● Working on fast image ingestion through chains

Questions?
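As a closing illustration of the first two recap items, this is roughly what the missing configuration could look like. The additional image store path, the snapshotter name and the socket path are placeholders invented for this sketch, not the official setup; `additionalimagestores` and containerd `proxy_plugins` are, however, the standard mechanisms for read-only image stores and remote snapshotters.

    # Sketch only - paths and plugin names are placeholders.

    # 1) podman / containers-storage: expose the CVMFS-hosted layer store as an
    #    additional, read-only image store (assuming [storage.options] is not
    #    already defined in /etc/containers/storage.conf).
    cat >> /etc/containers/storage.conf <<'EOF'
    [storage.options]
    additionalimagestores = [ "/cvmfs/unpacked.cern.ch/podman-store" ]
    EOF

    # 2) containerd: register the CVMFS remote snapshotter as a proxy plugin.
    cat >> /etc/containerd/config.toml <<'EOF'
    [proxy_plugins]
      [proxy_plugins.cvmfs-snapshotter]
        type = "snapshot"
        address = "/run/cvmfs-snapshotter/snapshotter.sock"
    EOF
    systemctl restart containerd

    # Pull through the snapshotter: only metadata is fetched up front, the
    # layer content is resolved from /cvmfs lazily.
    ctr images pull --snapshotter cvmfs-snapshotter \
        docker.io/atlas/athena:21.0.77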