“Application - File System” Divide with Promises

Bridging the “Application - File System” divide with promises

Raja Bala
Computer Sciences Department
University of Wisconsin, Madison, WI
[email protected]

Abstract

File systems today implement a limited set of abstractions and semantics wherein applications don’t really have much of a say. The generality of these abstractions tends to curb the application performance. In the global world we live in, it seems reasonable that applications are treated as first-class citizens by the file system layer.

In this project, we take a first step towards that goal by leveraging promises that applications make to the file system. The promises are then utilized to deliver a better-tuned and more application-oriented file system. A very simple promise, called unique-create, was implemented, wherein the application vows never to create a file with an existing name (in a directory), which is then used by the file system to speed up creation time. The application communicates this promise to the virtual file system (VFS) layer using its process state. VFS then skips the directory cache and real filesystem lookups, resulting in around 10% performance improvement in file creation time.

1 Introduction

The file system layer is a black box of limited abstractions in the eyes of applications. This is due to the limited number of system calls that hook into the file system and the belief that the underlying file system is the best judge when it comes to operations with files. Unfortunately, the latter isn’t true, since applications know more about their behavior and what they need or do not need from the file system. Currently, there is no real mechanism that allows the applications to communicate this information to the file system and thus have some degree of control over the file system functionality.

For example, an application that never appends to any of the files it creates has no means of conveying this information to the file system. Most file systems inherently assume that it is good to preallocate extra blocks to a file, so that when it expands, the preallocated blocks can be used. This is a simple optimization based on the realization that a sequential read is a lot faster than one interspersed with random seeks. For the aforementioned application, this would translate to a bunch of empty blocks that won’t be used unless there is a shortage of blocks. Such fine-grained control is completely absent in the current file system implementation.

It is interesting to compare the hourglass design of the traditional file system in Linux/Unix to that of the networking stack. In the latter, the Internet Protocol layer (IP) forms the neck of the hourglass and enables communication between different link layer and transport layer protocols as long as they use IP datagrams. In storage systems, the POSIX file system API serves a similar purpose of allowing transparent communication between applications and storage systems. There is a difference, however, in that IP has optional fields that can be leveraged by the other layers to exchange information that wasn’t thought of during the design. This has actually been utilized by transport protocols to provide hints to lower layers in wireless environments [5][6]. The virtual file system design, on the other hand, does not have any extensible mechanism for communication between the application layer and the low-level file systems.

In this paper, the notion of a promise is introduced as the extensible vehicle of communication between these layers. The local knowledge that an application possesses is called a promise and it is passed on to the kernel as part of the process state. The file system implementation has been modified to check for one particular promise, unique-create, and utilize the presence of this promise to speed up file creation time. Unique-create essentially tells the file system that the application promises never to create a file with an existing name in that directory.

The rest of the paper is organized as follows. Section 2 discusses some of the related work in passing information from higher layers to the file system. Section 3 describes underlying assumptions in promises and gives a few examples of useful promises. Section 4 describes the unique-create promise in more detail, while Section 5 discusses the implementation of unique-create, the tests used to evaluate file creation with and without the promise and the results observed. Section 5 serves as the summary and conclusion of this work.

2 Related Work

Patterson et al [1] utilize the disclosure of application knowledge of future accesses to enable informed prefetching and caching. Disclosing hints are issued via I/O control (ioctl) calls. Their work focused primarily on read requests and utilizes the application’s knowledge for proactive resource management and control over caching policies. They found that prefetching and caching based on the application-level information reduced file access read latency in both local and network file systems. The hints may or may not be used by the file system and hence are more like guidelines. This differs from promises, which are always checked by the file system.

Steere [2] proposes dynamic sets as an operating system abstraction to address the problem of I/O latency by exposing the application’s non-determinism and future data needs to the system, which then exploits this new information to reduce latency by improving scheduling and ordering of the I/O accesses. Cao et al [4] focus on applications with large data sets and allow them to express control over cache replacement. The application-controlled replacement decisions combined with their LRU-SP (Least Recently Used with Swapping and Placeholders) kernel allocation policy reduce the number of disk I/Os significantly.

These works indicate the importance of using the application’s knowledge to dictate some of the file system policies, but they’ve almost always seemed to focus on read-related issues. The application can leverage promises for both read- and write-related optimization. The unique-create promise discussed in Section 4 is an example of a write optimization.

3 Underlying assumptions

So far, it has been assumed that the promises are known beforehand. In reality, promises must be inferred from the applications, either by means of their very design and implementation, or by using formal analysis over the code, or by observing the arguments in the file system calls. This paper does not attempt to answer the bootstrapping problem of identifying promises. If the application doesn’t keep its promise, then all bets are off. The behavior in such a situation is a consequence of the implementation of the promise. It is the file system’s duty to check for the promises an application makes. This, in turn, means that only the file system implementation is modified to handle the promise, and thus requires no change in the application code, which makes promises a practical solution.

Some promises that could be of potential use are:

i) The application could tell the file system about the importance of a file by the number of times it reads or writes to it, to influence some of the caching policies. Most of the related work focused on similar issues.
ii) The application might want to disable journaling for writes to some files it doesn’t care about to speed up performance.
iii) In distributed file systems, a lot of the permission checks occur at the master, before being routed to the chunk servers. Thus, the low-level permission checks might not be that important and could be an unnecessary operation.
iv) In GFS, every chunk is limited to 64 megabytes. It might be interesting to see if this information could simplify the block allocation logic and speed up writes.

4 The Unique-Create Promise

To assess the viability of promises, a simple promise called “Unique-Create” was chosen. When an application makes this promise to the file system, it pledges that it will never create a file with an existing name in a directory’s namespace. This promise is limited to the particular case of the open system call wherein both the flags O_CREAT and O_EXCL are set. With these flags, open() returns an error if the file already exists and creates the file otherwise. Now, putting this in the context of the unique-create promise, it means that open() would never return an error for this reason, since the application promises the non-existence of the file. To understand why this promise could be useful, it is important to know the filename lookup and create mechanism in Linux.

Figure 1 shows a high-level view of the Linux file system architecture. The user space contains the applications and the GNU C Library that provides the interface for the file system calls open, read, write, close. The system call interface acts as a switch and funnels system calls from the user space to the appropriate end points in the kernel space. In Linux and Unix-based operating systems, the VFS is the primary interface to the underlying file systems. It exports a common set of interfaces and abstracts them to the individual file systems (such as ext3, ext4, btrfs, etc). There are two caches for the file system objects at the VFS level: the inode cache and the dentry (directory entry) cache.

When a file name lookup happens, VFS uses the file name and its parent directory and looks into the directory cache to check if the entry already exists.
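
The O_CREAT and O_EXCL behavior that unique-create builds on can be observed from user space. The following is a minimal sketch, assuming Python's os.open as a thin wrapper over the open system call. The helper names create_exclusive and demo, the file name, and the temporary directory are illustrative and do not come from the paper. It shows only the standard semantics, not the modified VFS.

```python
import errno
import os
import tempfile


def create_exclusive(path):
    """Try to create `path` with O_CREAT | O_EXCL.

    Returns (fd, None) on success, or (None, errno) if the name already exists.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        return fd, None
    except FileExistsError as exc:
        return None, exc.errno


def demo():
    """Create a file twice: first with a fresh name, then with the same name."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "unique.txt")  # hypothetical file name

        # First creation: the name is new, so the call succeeds.
        fd, first_error = create_exclusive(path)
        os.close(fd)

        # Second creation: the name exists, so the kernel must report EEXIST.
        # This is the case a unique-create application promises never to hit.
        _, second_error = create_exclusive(path)
        return first_error, second_error == errno.EEXIST


if __name__ == "__main__":
    print(demo())
```

An application that keeps the unique-create promise only ever takes the first path. The existence check that produces EEXIST on the second call is the work the promise is meant to make unnecessary.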
