Bridging the “Application - ” divide with promises

Raja Bala
Computer Sciences Department
University of Wisconsin, Madison, WI

Abstract

File systems today implement a limited set of abstractions and semantics in which applications have little say. The generality of these abstractions tends to curb application performance. In the global world we live in, it seems reasonable that applications be treated as first-class citizens by the file system layer.

In this project, we take a first step towards that goal by leveraging promises that applications make to the file system. The promises are then used to deliver a better-tuned and more application-oriented file system. A very simple promise, called unique-create, was implemented, wherein the application vows never to create a file with an existing name (in a directory); the file system then uses this promise to speed up file creation. The application communicates this promise to the virtual file system (VFS) layer using its process state. VFS then skips the directory cache and real file system lookups, resulting in around 10% performance improvement in file creation time.

1 Introduction

The file system layer is a black box of limited abstractions in the eyes of applications. This is due to the limited number of system calls that hook into the file system and the belief that the underlying file system is the best judge when it comes to operations with files. Unfortunately, the latter isn’t true, since applications know more about their behavior and what they need or do not need from the file system. Currently, there is no real mechanism that allows applications to communicate this information to the file system and thus have some degree of control over the file system functionality.

For example, an application that never appends to any of the files it creates has no means of conveying this information to the file system. Most file systems inherently assume that it is good to preallocate extra blocks to a file, so that when it expands, the preallocated blocks can be used. This is a simple optimization based on the realization that a sequential read is a lot faster than one interspersed with random seeks. For the aforementioned application, this would translate to a bunch of empty blocks that won’t be used unless there is a shortage of blocks. Such fine-grained control is completely absent in the current file system implementation.

It is interesting to compare the hourglass design of the traditional file system in Linux/Unix to that of the networking stack. In the latter, the Internet Protocol (IP) layer forms the neck of the hourglass and enables communication between different link layer and transport layer protocols as long as they use IP datagrams. In storage systems, the POSIX file system API serves a similar purpose of allowing transparent communication between applications and storage systems. There is a difference, however, in that IP has optional fields that can be leveraged by the other layers to exchange information that wasn’t thought of during the design. This has actually been utilized by transport protocols to provide hints to lower layers in wireless environments [5][6]. The virtual file system design, on the other hand, does not have any extensible mechanism for communication between the application layer and the low-level file systems.

In this paper, the notion of a promise is introduced as the extensible vehicle of communication between these layers. The local knowledge that an application possesses is called a promise and it is passed on to the kernel as part of the process state. The file system implementation has been modified to check for one particular promise, unique-create, and to use the presence of this promise to speed up file creation time. Unique-create essentially tells the file system that the application promises never to create a file with an existing name in that directory.

The rest of the paper is organized as follows. Section 2 discusses some of the related work in passing information from higher layers to the file system. Section 3 describes underlying assumptions in promises and gives a few examples of useful promises. Section 4 describes the unique-create promise in more detail, while Section 5 discusses the implementation of unique-create, the tests used to evaluate file creation with and without the promise and the results observed. Section 6 serves as the summary and conclusion of this work.

2 Related Work

Patterson et al. [1] utilize the disclosure of application knowledge of future accesses to enable informed prefetching and caching. Disclosing hints are issued via I/O control (ioctl) calls. Their work focused primarily on read requests and utilizes the application’s knowledge for proactive resource management and control over caching policies. They found that prefetching and caching based on the application-level information reduced file access read latency in both local and network file systems. The hints may or may not be used by the file system and hence are more like guidelines. This differs from promises, which are always checked by the file system.

Steere [2] proposes dynamic sets as an operating system abstraction to address the problem of I/O latency by exposing the application’s non-determinism and future data needs to the system, which then exploits this new information to reduce latency by improving scheduling and ordering of the I/O accesses. Cao et al. [4] focus on applications with large data sets and allow them to express control over cache replacement. The application-controlled replacement decisions, combined with their LRU-SP (Least Recently Used with Swapping and Placeholders) kernel allocation policy, reduce the number of disk I/Os significantly.

These works indicate the importance of using the application’s knowledge to dictate some of the file system policies, but they have almost always focused on read-related issues. The application can leverage promises for both read- and write-related optimizations. The unique-create promise discussed in Section 4 is an example of a write optimization.

3 Underlying assumptions

So far, it has been assumed that the promises are known beforehand. In reality, promises must be inferred from the applications, either by means of their very design and implementation, by using formal analysis over the code, or by observing the arguments in the file system calls. This paper does not attempt to answer the bootstrapping problem of identifying promises. If the application doesn’t keep its promise, then all bets are off. The behavior in such a situation is a consequence of the implementation of the promise. It is the file system’s duty to check for the promises an application makes. This, in turn, means that only the file system implementation is modified to handle the promise, and thus requires no change in the application code, which makes promises a practical solution.

Some promises that could be of potential use are:
i) The application could tell the file system about the importance of a file by the number of times it reads or writes to it, to influence some of the caching policies. Most of the related work focused on similar issues.
ii) The application might want to disable journaling for writes to some files it doesn’t care about to speed up performance.
iii) In distributed file systems, a lot of the permission checks occur at the master, before being routed to the chunk servers. Thus, the use of low-level permission checks might not be that important and may be an unnecessary operation.
iv) In GFS, every chunk is limited to 64 megabytes. It might be interesting to see if this information could simplify the block allocation logic and speed up writes.

4 The Unique-Create Promise

To assess the viability of promises, a simple promise called “Unique-Create” was chosen. When an application makes this promise to the file system, it pledges that it will never create a file with an existing name in a directory’s namespace. This promise is limited to the particular case of the open() system call wherein both the flags O_CREAT and O_EXCL are set. With these flags, open() returns an error if the file already exists and creates the file otherwise. Now, putting this in the context of the unique-create promise, it means that open() would never return this error, since the application promises the non-existence of the file. To understand why this promise could be useful, it is important to know the filename lookup and create mechanism in Linux.

Figure 1 shows a high-level view of the Linux file system architecture. The user space contains the applications and the GNU C library, which provides the interface for file system calls such as open, read and write. The system call interface acts as a switch and funnels system calls from user space to the appropriate end points in kernel space. In Linux and Unix-based operating systems, the VFS is the primary interface to the underlying file systems. It exports a common set of interfaces and abstracts them to the individual file systems (such as ext3). There are two caches for the file system objects at the VFS level: the inode cache and the dentry (directory entry) cache.

Figure 1: High Level Architecture of the Linux file system

When a file name lookup happens, VFS uses the file name and its parent directory and looks into the directory cache to check if the entry already exists. If the entry is found, it returns the dentry object immediately. If it is not found here, then a real lookup happens, resulting in a disk read. The latter is naturally a costly operation, since memory accesses are extremely fast when compared to disk accesses. Now, when an application promises unique-create, there is no reason to look up either the directory cache or the file system on disk when the system call open() has the flags O_CREAT and O_EXCL set. Thus, skipping this step should technically lead to much faster file creation times.

5 Implementation and Results

Applications are a bunch of processes and the promises they make can be incorporated as a part of the process state. By encoding the promises in this way, it becomes straightforward for the file system to check the promises field of the process (task) state and take the appropriate action. The implementation involved changing fs/namei.c, fs/proc/base.c, include/linux/sched.h and fs/ext3/namei.c in the Linux 2.6.35.9 kernel. The VFS code in fs/namei.c now checks if current->promises is 1 (indicating the presence of the unique-create promise), where current is a global pointer to the task structure of the current process. If present, both the directory cache and file system specific lookups are skipped and a VFS dentry object is created based on the filename and parent directory details. After the usual permissions check, the file system specific inode and dentry are created (in this case, ext3). The absence of the promise results in the usual lookup mechanism and thus doesn’t change anything.

5.1 Tests conducted

It thus becomes simple to test the system, since the process state can be changed on the fly. This obviates the need to have two kernel images, one with and another without the promise.

To check the performance impact of unique-create, two tests were conducted on a Pentium 4, 3 GHz machine with an 80GB SATA disk. In the first test, a large number of files (up to 100,000) were created in a directory and the time taken was measured with and without the promise. The sizes of the files were varied from 100 bytes to 10 kilobytes. The unique-create promise can be toggled using echo 0 > /proc/pid/promises and echo 1 > /proc/pid/promises, where pid denotes the process id of the application process. In the second test, a directory containing over 100,000 files was created. The file system was unmounted and mounted back to clear any cache entries. An alternative to unmount and mount is sync followed by echo 3 > /proc/sys/vm/drop_caches, which pushes all dirty files to disk and then clears the caches. A file was created in that directory and the time taken was measured, again with and without the unique-create promise. The inode cache and directory cache entries were also monitored using /proc/slabinfo.

5.2 Results

Figure 2 shows the plot of creation time versus number of files for a file size of 100 bytes. The creation time with the promise is consistently lower than without it, though the difference varies from 5% to 15% for different numbers of files in the directory. Similar graphs are seen for other file sizes as well, with an average creation time improvement of 10%.

Figure 2: Test 1 results for file size 100 bytes

Table 1: File creation time split

Operation         Promise (ms)   No Promise (ms)
Ext3 lookup       0.005          83.33
Inode creation    13.33          13.33
Dentry addition   73.33          3.33

Breaking down the steps in file creation into the lookup time, inode creation time and dentry addition time, one can see the gain in lookup time with the promise. But a lot of this gain is lost when the file system specific dentry is added to the parent directory, resulting in only a 10% speedup. It is still unclear why ext3_add_dentry() takes much longer with the promise, even though the ext3 filename-exists check in add_dirent_to_buf() has been skipped.

6 Summary and Future Work

The file system API is a one-size-fits-all approach and doesn’t give applications the freedom to control its policies and assumptions, in spite of them knowing more about their underlying needs. The unique-create promise allows applications that never create a file with an existing name to create files about 10% faster than usual without changing their code. The only modification needed in the application was to set the promises process state field to 1, since the file system was extended to check for the existence of the promise and take the necessary action.

Unique-create can be considered a simple experiment in the domain of promises. Judging the usefulness of promises on its basis alone would be a premature conclusion. Discovering promises that would be of use to applications without requiring extensive file system code changes is the crux of the problem. Proving that the file system works as before in the absence of promises is another problem that needs to be tackled in this space. The security implications of a promise need to be considered as well.

7 Acknowledgments

I’d like to thank Prof. Remzi Arpaci-Dusseau for his guidance throughout this project. Special thanks to Prof. Stupide for all the awesome brainstorming sessions that never happened.

References

[1] R. Hugo Patterson, Garth A. Gibson, Eka Ginting, Daniel Stodolsky and Jim Zelenka, “Informed Prefetching and Caching,” In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, pages 79-95, 1995

[2] David C. Steere, “Exploiting the non-determinism and asynchrony of set iterators to reduce aggregate file I/O latency,” In SIGOPS Operating Systems Review, pages 252-263, 1997

[3] James Griffioen and Randy Appleton, “Reducing file system latency using a predictive approach,” In Proceedings of the USENIX Summer 1994 Technical Conference - Volume 1, page 13, 1994

[4] Pei Cao, Edward W. Felten and Kai Li, “Implementation and Performance of Application-Controlled File Caching,” In Proceedings of the First Symposium on Operating Systems Design and Implementation, pages 165-178, 1994

[5] Andrei Gurtov and Sally Floyd, “Modeling wireless links for transport protocols,” SIGCOMM Computing and Communications Review, pages 85-96, 2004

[6] Andrei Gurtov and Reiner Ludwig, “Lifetime Packet Discard for Efficient Real-Time Transport over Cellular Links,” ACM Mobile Computing and Communications Review, pages 32-45, 2003