A Versatile Persistent Caching Framework for File Systems

Gopalan Sivathanu and Erez Zadok
Stony Brook University

Technical Report FSL-05-05

Abstract

We propose and evaluate an approach for decoupling persistent-cache management from general file system design. Several distributed file systems maintain a persistent cache of data to speed up accesses. Most of these file systems retain complete control over various aspects of cache management, such as the granularity of caching and the policies for cache placement and eviction. Hardcoding cache management into the file system often results in sub-optimal performance, as the clients of the file system are prevented from exploiting information about their workload to tune cache management.

We introduce xCachefs, a framework that allows clients to transparently augment the cache management of the file system and customize the caching policy based on their resources and workload. xCachefs can be used to cache data persistently from any slow file system to a faster file system. It mounts over two underlying file systems, which can be local disk file systems like Ext2 or remote file systems like NFS. xCachefs maintains the same directory structure as the source file system, so that disconnected reads are possible when the source file system is down. xCachefs yielded a performance enhancement of 64% and 20% for read and read-write workloads, respectively, over NFS.

1 Introduction

Caching is a common way of improving I/O performance. Aside from non-persistent caching, caching file data from slow, possibly remote, file systems onto faster file systems can significantly increase the performance of slow file systems; such a cache also remains persistent across reboots. The main problem with most of today's persistent caching mechanisms in file systems like AFS [1] is that the scheme is mostly hardcoded into the file system, so that it provides generic caching policies for a variety of requirements. These generic caching policies often end up being either sub-optimal or unsuitable for clients with specific needs and conditions. For example, in a system where the read pattern is sequential for the entire file, caching the file in full during the first read improves performance more than caching each page separately. The same is not appropriate in a system where reads are random and few. Keeping the caching mechanism separate helps integrate persistent caching into any file system. For example, a large IDE RAID array that has an Ext2 file system can cache recently-accessed data onto a smaller but faster SCSI disk to improve performance. The same could be done for a local disk; it can be cached to a faster flash drive.

The second problem with the persistent caching mechanisms of present distributed file systems is that they have a separate name space for the cache. Having the persistent cache directory structure be an exact replica of the source file system is useful even when xCachefs is not mounted. For example, if the cache has the same structure as the source, disconnected reads are possible directly from the cache, as in Coda [2].

In this paper we present a decoupled caching mechanism called xCachefs. With xCachefs, data can be cached from any file system to a faster file system. xCachefs provides several options that can be set up based on user workloads to maximize performance. We implemented a prototype of xCachefs for Linux and evaluated its performance.

2 Background

Persistent caching in file systems is not a new idea. The Solaris CacheFS [5] is capable of caching files from remote file systems to a local disk directory. Solaris CacheFS is tailored to work with NFS as the source file system, though it can be used for other file systems as well. CacheFS does not maintain the directory structure of the source file system in its cache; it stores cache data in the form of a database. The Andrew File System (AFS) [1], a popular distributed file system, performs persistent caching of file data. AFS clients keep pieces of commonly-used files on local disk. The caching mechanism used by AFS is integrated into the file system itself and cannot be used with any other file system. Coda [2] performs client-side persistent caching to enable disconnected operations. The central idea behind Coda is that caching of data, widely used for performance, can also be exploited to improve availability. Sprite [3] uses a simple distributed mechanism for caching files among a networked collection of workstations. It ensures consistency of the cache even when multiple clients access and update data simultaneously. This caching mechanism is integrated into the Sprite operating system.

3 Design

The main goals behind the design of xCachefs are:

• To develop a framework useful for caching file data from any slow file system to a faster file system.

• To provide customizable options that can be set up based on the nature of the workload and access patterns, so as to maximize performance.

• To maintain the directory structure of the cache as an exact replica of the source file system structure, so that disconnected reads are possible without modifying or restarting applications.

We designed xCachefs as a stackable file system that mounts on top of two underlying file systems, the source and the cache file systems. Stackable file systems incrementally add new features to existing file systems [6]. A stackable file system operates between the virtual file system (VFS) and the lower-level file system. It intercepts calls to perform special operations and eventually redirects them to the lower-level file systems. Figure 1 shows the overall structure of xCachefs.

During every read, xCachefs checks if the requested page is in the cache file system. On a cache miss, xCachefs reads the page from the source file system and writes it to the cache. All writes are performed on both the cache and source file systems. xCachefs compares the file meta-data during every open to check if the cache is stale, and if so it updates or removes the cached copy appropriately. These checks do not require disk I/O, as recently-accessed inodes are cached in the VFS icache.

[Figure 1: xCachefs Structure. User processes and the cache cleaner access xCachefs through the VFS; xCachefs performs meta operations and writes on both lower-level file systems, directs reads to the cache branch, and copies data up to the cache upon a cache miss.]

Caching Options. xCachefs caches file data from the source file system either at the page level or at the file level. It has a configurable parameter called the size threshold. Files whose sizes are below the size threshold are cached in full upon an open. Files that are larger than the size threshold are cached one page at a time as the pages are read. If files below the size threshold grow due to appends, xCachefs automatically switches to page-level caching for those files. Similarly, if a large file is truncated to a size below the threshold, xCachefs caches it in full the next time it is opened. Based on average file sizes and access patterns, users can choose a suitable value for the size threshold. Generally it is more efficient to adopt page-level caching for large files that are not accessed sequentially. For small files that are usually read in full, whole-file caching is helpful. Since page-level caching has additional overheads due to the management of bitmaps, performing page-level caching for small files is less efficient.

Cache Structure and Reclamation. xCachefs maintains in the cache the same directory structure and file naming as the source. Hence, all file system meta operations like create, unlink, mkdir, etc., are performed in an identical manner in both the cache and the source file systems. Maintaining the same directory structure in the cache has several advantages. When the source file system is not accessible due to system or network failures, disconnected reads are possible directly from the cache file system without modifications to the applications that are running. xCachefs has a disconnected mode of operation, set through an ioctl, that directs all reads to the cache branch.

Typically, the cache file system has less storage space than the source branch, because the cache file system is expected to be on a faster medium than the source and hence might be more expensive. For example, the source file system can be on a large IDE RAID array, and the cache can be a fast SCSI disk of smaller capacity. Therefore, we have a user-level cleaning utility that runs periodically to delete the least recently used files from the cache file system, to free up space for new files. The user-level cleaner performs cleaning if the cache size has grown larger than an upper threshold. It deletes files from the cache until the size is below a lower threshold. Typically the upper and lower thresholds are around 90% and 70%, respectively.

Implementation. We implemented a prototype of xCachefs as a Linux kernel module. xCachefs has 10,663 lines of kernel code, of which 4,227 lines belong to the FiST [7] generated template that we used as our base.

Files whose sizes are smaller than the size threshold (a mount option) are copied in full to the cache branch whenever an open for the file is called. Page-level caching is performed in the readpage function of the address space operations (a_ops). Page bitmaps for partially-cached files are stored in separate files. Directories are cached during lookup, so as to ensure that the directory structure corresponding to a file always exists in the cache branch prior to caching that file. Since a lookup is always done before an open, this invariant holds. The create, unlink, and mkdir operations are performed on both the cache and source file systems. The permission and readdir operations are performed only on the source branch.

xCachefs has four ioctls. The STALE_CHECK and NO_STALE_CHECK ioctls turn staleness checking on or off. The CONNECTED and DISCONNECTED ioctls switch between connected and disconnected modes of operation.
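The size-threshold policy of Section 3 (whole-file copy-up on open for small files, per-page caching with a bitmap for large ones) can be sketched in user-level C as follows. This is our illustration only: the function names and the one-bit-per-page bitmap layout are assumptions, not the actual xCachefs kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Whole-file caching applies only to files below the size
 * threshold (a mount option in xCachefs). */
static int cache_whole_file(uint64_t file_size, uint64_t size_threshold)
{
    return file_size < size_threshold;
}

/* For large files, xCachefs records which pages are cached in a
 * per-file bitmap; we assume one bit per page, packed into bytes. */
static int page_is_cached(const uint8_t *bitmap, uint64_t page_index)
{
    return (bitmap[page_index / 8] >> (page_index % 8)) & 1;
}

static void mark_page_cached(uint8_t *bitmap, uint64_t page_index)
{
    bitmap[page_index / 8] |= (uint8_t)(1 << (page_index % 8));
}

/* Emulates the readpage decision: serve a hit from the cache
 * branch; on a miss, copy the page up from the source and record
 * it in the bitmap. */
static const char *read_page(uint8_t *bitmap, uint64_t offset)
{
    uint64_t page = offset / PAGE_SIZE;

    if (page_is_cached(bitmap, page))
        return "cache";
    mark_page_cached(bitmap, page);   /* copy-up happens here */
    return "source";
}
```

With a 409,600-byte threshold (the value used in the evaluation), a 10KB file would be copied in full on open, while a 1MB file would have pages faulted into the cache one at a time.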

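The user-level cleaner of Section 3 evicts least-recently-used files once cache occupancy passes an upper threshold, and stops once it drops below a lower threshold. A minimal sketch of that policy in C, assuming an in-memory file list (the real utility would walk the cache branch with stat(2) and unlink files):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A cached file, reduced to the two attributes the policy needs. */
struct cached_file {
    uint64_t atime;   /* last access time; smaller = older */
    uint64_t size;    /* bytes */
    int evicted;
};

static int by_atime(const void *a, const void *b)
{
    const struct cached_file *fa = a, *fb = b;
    return (fa->atime > fb->atime) - (fa->atime < fb->atime);
}

/* Evict LRU files when usage exceeds upper_pct of capacity, until
 * usage falls to at most lower_pct.  Returns the resulting usage. */
static uint64_t clean_cache(struct cached_file *files, size_t n,
                            uint64_t capacity,
                            unsigned upper_pct, unsigned lower_pct)
{
    uint64_t used = 0;
    size_t i;

    for (i = 0; i < n; i++)
        used += files[i].size;

    if (used * 100 <= capacity * upper_pct)
        return used;                  /* below the upper threshold */

    qsort(files, n, sizeof(*files), by_atime);
    for (i = 0; i < n && used * 100 > capacity * lower_pct; i++) {
        files[i].evicted = 1;         /* unlink(2) in the real tool */
        used -= files[i].size;
    }
    return used;
}
```

With the paper's typical thresholds of 90% and 70%, a cache at 95% occupancy would have its oldest files deleted until occupancy is at most 70%.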

4 Evaluation

We evaluated the performance of xCachefs with two configurations:

• SCSI-IDE: The source is an Ext2 file system on a 5,200 rpm IDE disk and the cache is an Ext2 file system on a 15,000 rpm SCSI disk.

• SCSI-NFS: The source is an NFS mount over a 100Mbps Ethernet link and the cache branch is an Ext2 file system on a 15,000 rpm SCSI disk.

For all benchmarks we used Linux kernel 2.4.27 running on a 1.7GHz Pentium 4 with 1GB of RAM. We unmounted and remounted the file systems before each run of the benchmarks to ensure cold caches. We ran all benchmarks at least ten times and computed 95% confidence intervals for the mean elapsed, system, user, and wait times. In each case, the half-widths of the intervals were less than 5% of the mean. Wait time is the elapsed time less CPU time and consists mostly of I/O, but process scheduling can also affect it.

To measure the performance of xCachefs, we used several benchmarks with different CPU and I/O characteristics. For an I/O-intensive benchmark, we used Postmark, a popular file system benchmarking tool. We configured Postmark to create 10,000 files (between 512 bytes and 10KB) and perform 100,000 transactions in 200 directories. For a CPU-intensive benchmark, we compiled the Am-utils package [4]. We used Am-utils version 6.1b3: it contains over 60,000 lines of C code in 430 files. Although the Am-utils compile is CPU-intensive, it contains a fair mix of file system operations. We ran the Postmark and Am-utils compilation benchmarks on both the SCSI-IDE and SCSI-NFS configurations. We also tested xCachefs using read micro-benchmarks. First, we performed a recursive grep on the kernel source tree. This test was run on both the SCSI-IDE and SCSI-NFS setups with warm and cold persistent-cache conditions. To evaluate the performance of per-page and whole-file caching, we wrote a program that reads selected pages from 500 files of size 1MB each. The program has 20 runs; during each run the program reads one page from each of the 500 files. We ran this test with size threshold values of 0 and 409,600 bytes so as to evaluate per-page and whole-file caching.

[Figure 2: Postmark Results. Elapsed times: SCSI 17s, IDE 22s, xCachefs[IDE] 32s, NFS 405s, xCachefs[NFS] 399s.]

Postmark Results. The Postmark results are shown in Figure 2. xCachefs on the SCSI-IDE configuration shows an elapsed time overhead of 44% over the source IDE. This is mostly because the meta operations and writes are done on both the source and cache file systems in a synchronous manner. xCachefs on the SCSI-NFS combination shows a small performance improvement of 1.6% elapsed time compared to plain NFS. This is because NFS is several times slower than the SCSI disk, which is the cache file system, and hence read operations are sped up by a larger ratio. Postmark is an I/O-intensive benchmark that performs creates, deletes, reads, and writes. Because most of the operations are performed on both file systems, xCachefs does not show a notable performance improvement.

[Figure 3: Am-utils Results. Elapsed times: SCSI 189s, IDE 189s, xCachefs[IDE] 196s, NFS 280s, xCachefs[NFS] 223s.]

Am-utils Results. Figure 3 shows the Am-utils compilation results. xCachefs on the SCSI-IDE combination shows an elapsed time overhead of 3% compared to the source IDE disk. As we can see from the figure, there is no difference in speed between the source IDE and cache SCSI disks for the Am-utils compilation. Therefore the overhead in xCachefs is due to the meta-data operations and writes that are done synchronously to both file systems. On the other hand, for the SCSI-NFS combination, xCachefs showed a performance improvement of 20% elapsed time compared to the source NFS. This is because of the significant speed difference between the source and the cache branches. Therefore, to achieve good performance gains from xCachefs for write-intensive applications, there must be a significant difference in speeds between the source and cache file systems. For read workloads, the speed difference does not matter as much.

[Figure 4: Grep -r test results. Elapsed times for SCSI, IDE, and xCachefs under cold and warm caches on both configurations.]

Microbenchmarks. Figure 4 shows the grep test results. In the SCSI-IDE configuration, under cold cache conditions, xCachefs showed an overhead of 1.6% elapsed time compared to the source IDE disk. This is because under cold cache conditions, xCachefs performs an additional write to copy data to the cache branch. Under warm cache conditions, xCachefs showed a performance improvement of 16% elapsed time, because all reads are directed to the faster cache branch. In the SCSI-NFS configuration, even under cold cache conditions, xCachefs had a performance improvement of 37%. It is interesting to note that even though there is an additional write to the cache file system, xCachefs performs better than the source file system. This is because xCachefs caches small files in full and hence reduces the number of getattr calls sent to the NFS server. In plain NFS, computation and I/O are more interleaved during the grep -r test, thereby requiring more getattr calls, which is not the case with xCachefs over NFS. Under warm cache conditions, xCachefs had an improvement of 92% elapsed time, basically because all reads in this case are directed to the faster SCSI disk acting as the cache file system.

[Figure 5: Read micro-benchmark Results. Elapsed times: SCSI 18s, IDE 51s, per-page cold 51s, whole-file cold 18s, per-page warm 2s, whole-file warm 18s.]

Figure 5 shows the results of the read micro-benchmark under the SCSI-IDE configuration. The results for the SCSI-NFS configuration were mostly similar, because this benchmark reads selected pages from files, and thus the total amount of data read is too small for the network speed to significantly influence performance. From the figure, we can see that there is a significant difference in the speeds of the IDE and SCSI disks for this benchmark, as it involves many seeks. Under cold cache conditions, the per-page caching configuration did not show any difference in performance, as the total amount of extra data written to the cache is negligible. However, the whole-file caching mode yielded a performance improvement of 64% elapsed time compared to IDE. This is because during the first run, all files are copied to the cache branch and they reside in memory, eliminating disk requests for subsequent runs. Under warm cache conditions, per-page caching mode yielded a performance enhancement of 95% elapsed time. This is because the pages that are cached in the sparse file are physically close together on disk, and hence seeks are reduced to a large extent. The whole-file caching mode under a warm cache yielded a performance improvement of 63% elapsed time compared to IDE; this equals the performance of the cache SCSI disk, because all reads are performed only on the cache branch, but there is no seek-time reduction in this case.

5 Conclusions

We have described the design and evaluation of an approach to decoupled persistent caching for file systems. Our main goal while designing xCachefs was versatility. File data can be cached from any slow file system to any faster file system while maintaining the original directory and file structure. The SCSI-IDE configuration is a good example of the portability of xCachefs. Our evaluations show that for read-intensive applications, xCachefs can provide up to a 95% performance improvement.

Future Work. We plan to extend xCachefs to improve its versatility. We plan to improve performance by performing meta-data operations and file writes asynchronously through a kernel thread. This could also help parallelize the I/O on the cache and source file systems. We also plan to support read-write disconnected operation and conflict resolution.

References

[1] J. H. Howard. An Overview of the Andrew File System. In Proceedings of the Winter USENIX Technical Conference, February 1988.

[2] J. J. Kistler and M. Satyanarayanan. Disconnected Operation in the Coda File System. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, pages 213–225, Asilomar Conference Center, Pacific Grove, CA, October 1991. ACM Press.

[3] M. Nelson, B. Welch, and J. Ousterhout. Caching in the Sprite Network File System. In Proceedings of the Eleventh ACM Symposium on Operating Systems Principles, 1987.

[4] J. S. Pendry, N. Williams, and E. Zadok. Am-utils User Manual, 6.1b3 edition, July 2003. www.am-utils.org.

[5] SunSoft. Cache File System (CacheFS). Technical report, Sun Microsystems, Inc., February 1994.

[6] E. Zadok and I. Bădulescu. A Stackable File System Interface for Linux. In LinuxExpo Conference Proceedings, pages 141–151, Raleigh, NC, May 1999.

[7] E. Zadok and J. Nieh. FiST: A Language for Stackable File Systems. In Proceedings of the Annual USENIX Technical Conference, pages 55–70, San Diego, CA, June 2000.
