Why panic()? Improving Reliability with Restartable File Systems

Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift
Computer Sciences Department, University of Wisconsin, Madison

ABSTRACT

The file system is one of the most critical components of the operating system. Almost all applications running in the operating system require file systems to be available for their proper operation. Though file-system availability is critical in many cases, very little work has been done on tolerating file system crashes. In this paper, we propose Membrane, a set of changes to the operating system to support restartable file systems. Membrane allows an operating system to tolerate a broad class of file system failures and does so while remaining transparent to running applications; upon failure, the file system restarts, its state is restored, and pending application requests are serviced as if no failure had occurred. Our initial evaluation of Membrane with ext2 shows that Membrane induces little performance overhead and can tolerate a wide range of file system crashes. More critically, Membrane does so with few changes to ext2, thus improving robustness to crashes without mandating intrusive changes to existing file-system code.

1. INTRODUCTION

Operating systems crash. Whether due to software bugs or hardware bit-flips, the reality is clear: large code bases are brittle, and the smallest problem in software implementation or hardware environment can lead the entire monolithic operating system to fail.

Recent research has made great headway in operating-system crash tolerance, particularly in surviving device-driver failures [17, 18, 23, 25]. Many of these approaches achieve some level of fault resilience by building a hard wall around OS subsystems using address-space based isolation and microrebooting said drivers upon fault detection [17, 18]. Other approaches are similar, using variants of microkernel-based architectures [2, 23] or virtual machines [5, 9] to isolate drivers from the kernel.

Device drivers, however, are not the only OS subsystem, nor are they necessarily where the most important bugs reside. Many recent studies have shown that file systems contain a large number of bugs [4, 6, 11, 24]. Perhaps this is not surprising, as file systems are one of the largest and most complex code bases in the kernel. Further, file systems are still under active development, and new ones are introduced quite frequently.

Due to the presence of flaws in file system implementations, it is critical to consider how to handle their crashes. Unfortunately, two problems prevent us from directly applying previous work from the device-driver literature to improve file-system fault tolerance. First, most of the approaches mentioned above are heavyweight due to the high costs of data movement and page-table manipulation across address-space boundaries. Second, file systems, unlike device drivers, are extremely stateful, as they manage vast amounts of both in-memory and persistent data; making matters worse is the fact that file systems spread such state across many parts of the kernel, including the page cache, dynamically-allocated memory, and locks. Thus, when a file system crashes (typically by calling panic()), a great deal of care is required to recover from the crash while keeping the rest of the OS intact.

In this paper, we propose Membrane, an operating system framework to support lightweight, stateful recovery from file system crashes. During normal operation, Membrane logs file system operations and periodically performs lightweight checkpoints of file system state. If a file system crashes, Membrane parks pending requests, cleans up existing state, restarts the file system from the most recent checkpoint, and replays the in-memory operation log to restore the state of the running file system. Once finished with recovery, Membrane begins to service on-going application requests again; applications are kept unaware of the crash and restart except for a small performance blip during recovery.

Membrane does not place an address-space boundary between the file system and the rest of the kernel. Hence, it is possible that some types of crashes (e.g., wild writes) will corrupt kernel data structures and thus prohibit Membrane from properly recovering from a file system crash, an inherent weakness (and strength!) of Membrane's architecture. However, we believe that this approach will have the propaedeutic side-effect of encouraging file system developers to add a higher degree of integrity checking (e.g., via assertions, either by hand or through automated techniques such as SafeDrive [25]) into their code in order to fail quickly rather than run the risk of further corrupting the system. If such faults are transient (as many important classes of bugs are), crashing and quickly restarting is a sensible course of action.

Membrane, being a lightweight and generic operating system framework, requires little or no change to existing file systems. We believe that Membrane can be used to restart most existing file systems. We have prototyped Membrane with the ext2 file system. From our initial evaluation, we find that Membrane enables ext2 to recover from a wide range of fault scenarios. We also find that Membrane is relatively lightweight, adding a small amount of overhead across a set of file system benchmarks. Membrane achieves these goals with little or no intrusiveness: only five lines were added to transform ext2 into its restartable counterpart. Finally, Membrane improves robustness with complete application transparency; even though the underlying file system has crashed, applications continue to run.

The rest of this paper is organized as follows. Sections 2, 3, and 4 describe the motivation, the challenges in building restartable file systems, and the design of Membrane, respectively. Sections 5 and 6 discuss the consequences of having Membrane in the operating system and evaluate Membrane's robustness and performance. Section 7 places Membrane in the context of other relevant work; finally, Section 8 concludes the paper.

2. MOTIVATION

We first motivate the need for restartability in file systems. The file system is one of the critical components of the operating system; it is responsible for storing and retrieving user, application, and OS data from the disk. File systems are also among the largest and most complex code bases in the operating system; for example, modern file systems such as Sun's Zettabyte File System (ZFS) [1] and SGI's XFS [16], as well as older code bases such as Sun's UFS [10], contain nearly 100,000 lines of code [14]. Further, file systems are still under active development, and new ones are introduced quite frequently. For example, Linux has many established file systems, including ext2 [19], ext3 [20], and ReiserFS [13], and still there is great interest in next-generation file systems such as ext4 [21] and Btrfs [22]. Thus, file systems are large, complex, and under development: the perfect storm for numerous bugs to arise.

Existing file systems only provide data consistency in the presence of system crashes (power failures or operating system crashes), through techniques such as journaling [7, 16] and snapshotting [1]. However, little has been done to tolerate transient failures in file systems (such as those caused by bugs in file system code and memory corruption) and to prevent them from causing total system failures. Recent research on restarting individual OS components such as device drivers [17, 18, 25] cannot be directly applied to file systems because, unlike device drivers, file systems are very stateful, spread their state across many operating system components (e.g., in-memory objects, data in the page cache, locks, and meta-data cached in memory), and are fine-tuned for performance.

Consequently, when a file system crashes (due to a bug in the file system code), the entire operating system effectively becomes useless, as a majority of the applications running in the operating system depend on the file system for their proper operation. The currently available solution for handling file system crashes is to restart the operating system and run recovery (or a repair utility such as fsck for file systems that do not have any in-built crash consistency mechanism). Moreover, applications that were using the file system are killed, making their services unavailable during this period. This motivates the need for a restartable framework in the operating system to tolerate failures in file systems.

3. CHALLENGES

Building restartable file systems is hard. We now discuss the challenges in building such a system.

Fault Tolerance. A gamut of faults can occur in file systems. Failures can be caused by faulty hardware and/or buggy software. These failures can be permanent or transient, and can corrupt data arbitrarily or be fail-stop. The ideal restartable file system should recover from all possible faults.

Transparency. File-system failures must be transparent to applications. Applications should not require any modifications to work with restartable file systems. Moreover, multiple applications (or threads) could be modifying the file system state at the same time, and any one of them could trigger a crash (through a bug).

Performance. Since file systems are fine-tuned for performance, the infrastructure to restart file systems should add little or no overhead during regular operation. Also, the recovery time should be as small as possible, to hide file-system crashes from applications.

Consistency. Care must be taken to ensure that the on-disk state of the file system is consistent after recovery. Hence, infrastructure for creating light-weight recovery points should be built, as most file systems do not have any in-built crash consistency mechanism (e.g., only 8 out of 30 on-disk file systems in Linux 2.6.27 have such a feature). Also, kernel data structures (locks, memory, lists, etc.) should be kept consistent after recovery.

Generic. A large number of commodity file systems exist, and each has its own strengths and weaknesses. The infrastructure should not be designed for a specific file system. Ideally, the infrastructure should enable any file system to be transformed into a restartable file system with little or no changes.

4. DESIGN

In designing Membrane, we explicitly make the choice to favor performance, transparency, consistency, and generality over the ability to handle a wider range of faults. Membrane does not attempt to handle all types of faults. Like most work in subsystem fault detection and recovery, Membrane best handles failures that are transient and fail-stop [12, 18, 25]. We now present the three major pieces of the Membrane design, namely fault detection, fault anticipation, and recovery.

4.1 Fault Detection

The goal of fault detection within Membrane is to be lightweight while catching as many faults as possible. Membrane uses both hardware and software techniques to catch faults. The hardware support is simple: null-pointer dereferences and other exceptions are caught by the hardware and routed to the Membrane recovery subsystem.

The software techniques leverage the checks that already exist in file system code. For example, file systems contain assertions, calls to panic(), and similar functions. We take advantage of such internal integrity checking and modify these functions and macros to call into our recovery engine instead of crashing the system. Membrane provides further protection by adding extensive parameter checking on calls from the file system into the kernel.
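To make the software-detection idea concrete, the following is a minimal standalone sketch (ours, not code from the prototype) of how an assertion macro can be redirected into a recovery engine instead of panicking; the names MEMBRANE_ASSERT and membrane_recover are invented for illustration.

    /* Minimal sketch (not prototype code): file-system integrity checks
     * are redirected into a recovery engine rather than crashing the
     * kernel. MEMBRANE_ASSERT and membrane_recover are invented names. */
    #include <stdio.h>

    static void membrane_recover(const char *file, int line, const char *expr)
    {
        /* In Membrane, control would transfer to the recovery
         * subsystem (Section 4.3); here we only log the detection. */
        fprintf(stderr, "fault detected at %s:%d: %s\n", file, line, expr);
    }

    /* Drop-in replacement for BUG_ON()/panic()-style checks. */
    #define MEMBRANE_ASSERT(expr) \
        do { if (!(expr)) membrane_recover(__FILE__, __LINE__, #expr); } while (0)

    static int fs_create(int inode_valid)
    {
        MEMBRANE_ASSERT(inode_valid);   /* was: if (!inode_valid) panic(...) */
        return inode_valid ? 0 : -1;
    }

    int main(void)
    {
        return fs_create(0) ? 1 : 0;    /* triggers detection, not a panic */
    }

In the real system, the recovery hook would additionally unwind the calling thread back to the kernel/file-system boundary rather than simply returning.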

4.2 Fault Anticipation

Anticipation is the overhead incurred even when the system is behaving well; it should be minimized to the greatest extent possible while retaining the ability to recover.

In Membrane, there are two components of fault anticipation. First, the checkpointing subsystem partitions file system operations into different transactions and ensures that the checkpointed image on disk represents a consistent state. Second, updates to data structures and other state are tracked with a set of in-memory logs and per-thread stacks. The recovery subsystem utilizes these pieces in tandem to recover the file system after failure.

Checkpointing and state-tracking are central to the management of file system state. File system operations use many core kernel services (e.g., memory allocation), are heavily intertwined with kernel subsystems (e.g., the page cache), and have application-visible state (e.g., file descriptors). Careful state-tracking and checkpointing are thus required to enable clean recovery after a fault.

4.2.1 Checkpointing

Checkpointing is critical because a checkpoint defines a point in time to which Membrane can roll back and thus initiate recovery. A checkpoint also represents a consistent boundary where no file system operation is in flight.

Checkpointing must integrate with the file system's own consistency management scheme. Modern file systems take a number of different approaches, such as journaling [7, 16] and snapshotting [1]; some file systems, such as ext2, do not impose any ordering on updates at all. In all cases, Membrane must operate correctly and efficiently.

4.2.2 Tracking State with Logs and Stacks

Membrane must track the changes to various aspects of file system state that transpired after the last checkpoint. This is accomplished with five different types of logs or stacks that handle file system operations, application-visible sessions, mallocs, locks, and execution state.

First, an in-memory operation log records the file system operations that have taken place during the epoch or are currently in progress, together with relevant information. Second, Membrane maintains a small session log that tracks the files open at the beginning of an epoch along with their file positions. Third, an in-memory malloc table tracks heap-allocated memory so that, upon failure, Membrane can determine which blocks should be freed. Fourth, locks acquired and released are tracked by the lock stack: when a lock is acquired or released by a thread executing a file system operation, information about the lock is pushed onto or popped from a per-thread stack. Finally, an unwind stack pushes register state onto the per-thread stack during kernel-to-file-system calls, to track the execution of code in the file system and kernel. Thus, by using these data structures, Membrane tracks the file system state after the last checkpoint.
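As one possible concrete rendering of this bookkeeping, the sketch below shows declarations for the five kinds of state described above; all type and field names are our own invention for illustration, not identifiers from the prototype.

    /* Illustrative C declarations (invented names) for the five kinds
     * of state Membrane tracks between checkpoints. */
    #include <stdint.h>
    #include <stddef.h>
    #include <sys/types.h>

    struct op_log_entry {          /* 1: operation log, replayed on restart */
        uint64_t epoch;            /* checkpoint epoch of this operation    */
        int      opcode;           /* e.g., create, write, unlink           */
        uint64_t ino;              /* inode the operation touched           */
        struct op_log_entry *next;
    };

    struct session_entry {         /* 2: files open at start of the epoch */
        int      fd;
        uint64_t ino;
        off_t    pos;              /* file position to restore            */
    };

    struct malloc_entry {          /* 3: heap blocks to free on failure */
        void  *ptr;
        size_t len;
    };

    struct lock_stack {            /* 4: per-thread lock stack */
        void *held[32];            /* pushed on acquire, popped on release */
        int   top;
    };

    struct unwind_frame {          /* 5: register state saved at kernel-to-FS calls */
        void *sp, *ip;             /* enough to unwind back to the boundary */
    };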
4.3 Fault Recovery

Once a fault is detected, control is transferred to the recovery subsystem, which executes the recovery protocol. The six steps of the recovery protocol are described below.

First, Membrane halts the execution of threads within the file system, as well as late-arriving threads, to prevent further damage. Second, the crashed thread and all other in-flight threads are unwound and brought back to a point such that they appear to be just about to enter the file system call they are making. Third, Membrane moves the system to a clean starting point at the beginning of an epoch, committing any dirty pages from the previous epoch to disk. Fourth, Membrane frees any operating system state corresponding to the file system and releases any in-memory file-system objects by simulating an unmount operation; Membrane then mimics the mount operation to recreate state for the restarted file system. Fifth, Membrane reopens the sessions of active processes and replays the operations logged after the last checkpoint, restoring the file system to the state it was in before crashing. Finally, Membrane wakes all parked threads, which then behave as if a crash never occurred.

We have implemented a prototype of Membrane in the Linux 2.6.15 kernel. This prototype includes the features described above: fault detection, checkpointing, state tracking, and recovery.
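The ordering of the six steps above can be made explicit in code form; the outline below is purely illustrative, with every helper a hypothetical stand-in (stubbed so the sketch compiles), not a function from the prototype.

    /* Illustrative outline of the six-step recovery protocol. */
    struct fs_instance;                                           /* opaque */

    static void halt_and_park_threads(struct fs_instance *fs)    { (void)fs; }
    static void unwind_in_flight_threads(struct fs_instance *fs) { (void)fs; }
    static void commit_previous_epoch(struct fs_instance *fs)    { (void)fs; }
    static void simulate_unmount(struct fs_instance *fs)         { (void)fs; }
    static void simulate_mount(struct fs_instance *fs)           { (void)fs; }
    static void reopen_sessions_and_replay(struct fs_instance *fs) { (void)fs; }
    static void wake_parked_threads(struct fs_instance *fs)      { (void)fs; }

    static void membrane_recover_fs(struct fs_instance *fs)
    {
        halt_and_park_threads(fs);        /* 1: stop in-flight + late arrivals */
        unwind_in_flight_threads(fs);     /* 2: back to the FS entry point     */
        commit_previous_epoch(fs);        /* 3: flush prior epoch to disk      */
        simulate_unmount(fs);             /* 4: free OS and in-memory FS state */
        simulate_mount(fs);               /*    ... then recreate it           */
        reopen_sessions_and_replay(fs);   /* 5: restore sessions, replay log   */
        wake_parked_threads(fs);          /* 6: resume as if nothing happened  */
    }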

5. DISCUSSION

The major negative of the Membrane approach is that, without address-space-based protection, file system faults may corrupt other components of the system. If the file system corrupts other kernel code or data, or data that resides on disk, Membrane will not be able to recover the system. Thus, an important factor in Membrane's success will be minimizing the latency between when a fault occurs and when it is detected.

An assumption we make is that kernel code is trusted to work properly, even when the file system code fails and returns an error. We found that this is true in most cases across the kernel proper. But in twenty or so places, we found that the kernel proper did not check the return value from the file system, and additional code had to be added to clean up the kernel state and propagate the error back to the caller.

A potential limitation of our implementation is that, in some scenarios, a file system restart can be visible to applications. For instance, when a file is created, the file system assigns it a specific inode number, which an application may query (e.g., rsync and similar tools may use this low-level number for backup and archival purposes). If a crash occurs before the end of the epoch, Membrane will replay the file create; during replay, the file system may assign a different inode number to the file (based on in-memory state). In this case, the application would possess what it thinks is the inode number of the file, but which may in fact be either unallocated or allocated to a different file. Thus, to guarantee that the user-visible inode number is valid, an application must sync the file system state after the create operation.

On the brighter side, we believe Membrane will encourage two positive fault-detection behaviors among file-system developers. First, we believe that quick-fix bug patching will become more prevalent. Imagine a scenario where an important customer has a workload that is causing the file system to occasionally corrupt data, thus reducing the reliability of the system. After some diagnosis, the development team discovers the location of the bug in the code, but unfortunately there is no easy fix. With the Membrane infrastructure, the developers may be able to transform the corruption into a fail-stop crash. By installing a quick patch that crashes the code instead of allowing further corruption to continue, the deployment can operate correctly while a longer-term fix is developed. Even more interestingly, if such problems can be detected but would require extensive code restructuring to fix, then a patch may be the best possible permanent solution. As Tom West said: not all problems worth solving are worth solving well [8].

Second, with Membrane, file-system developers will see significant benefits to putting integrity checks into their code. Some of these lightweight checks could be automated (as was nicely done by SafeDrive [25]), but we believe that developers will be able to place much richer checks, as they have deep knowledge about expectations at various locations. For example, developers understand the exact meaning of a directory entry and can signal a problem if one has gone awry; automating such a check is a great deal more complicated [3]. The motivation to check for violations is low in current file systems, since there is little recourse when a problem is detected. The ability to recover from the problem in Membrane provides greater motivation.
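As a concrete, hypothetical example of such a quick-fix patch, consider converting a silently-corrupting directory-entry bug into a fail-stop fault; the structure and sanity check below are invented for illustration, not taken from ext2.

    /* Invented example: validate a directory entry before use, so a
     * corruption bug becomes a detectable fail-stop fault that
     * Membrane can recover from. */
    struct disk_dirent {
        unsigned int   inode;     /* 0 means entry unused            */
        unsigned short rec_len;   /* total bytes used by this entry  */
        unsigned char  name_len;  /* length of the name that follows */
    };

    static int dirent_is_sane(const struct disk_dirent *de, unsigned blocksize)
    {
        /* The entry must at least hold its header plus name, and must
         * not claim more space than the enclosing block. */
        return de->rec_len >= sizeof(*de) + de->name_len
            && de->rec_len <= blocksize;
    }

    /* Quick-fix patch in the buggy path:
     *     if (!dirent_is_sane(de, bs))
     *         MEMBRANE_ASSERT(0);    -- fail-stop; Membrane restarts FS
     */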

6. EVALUATION

We evaluated Membrane in terms of transparency and performance. All experiments were performed on Linux 2.6.15, on a machine with a 2.2 GHz Opteron processor, 1 GB of memory, and two 80 GB 7200 RPM Hitachi disks.

Transparency. To analyze the ability of Membrane to hide file system crashes from applications, we systematically inject faults into the file system code at points where faults may cause trouble. Table 1 presents the results of our study; the caption explains how to interpret the raw data in the table. Due to lack of space, we show only a few representative fault injections. We injected a total of 15 faults, the results of which are similar to the ones shown here.

                                              vanilla ext2               ext2+Membrane
    ext2 function  Fault injected             Det? App? Cons? Usable?    Det? App? Cons? Usable?
    create         mark_inode_dirty(bad)       o    x    x     x          d    Y    Y     Y
    free_inode     mark_buffer_dirty(bad)      o    x    x     x          d    Y    Y     Y
    readdir        page_address(bad)           G    x    x     x          d    Y    Y     Y
    add_nondir     d_instantiate(bad)          o    x    x     x          d    Y    Y     Y
    symlink        null-pointer exception      o    x    x     x          d    Y    Y     Y
    readpages      mpage_readpages(bad)        o    x    x     x          d    Y    Y     Y

Table 1: Fault Study. The table shows the results of some fault injections on the behavior of Linux ext2. Each row represents a fault-injection experiment, and the columns show (in left-to-right order): the routine the fault was injected into, the exact nature of the fault, and then, for both vanilla ext2 and ext2 with Membrane, how the fault was detected, whether the application survived, and whether the file system remained consistent and usable. Symbols used to condense the presentation are as follows: "o": kernel oops was triggered; "G": general protection fault occurred; "d": fault was successfully detected; an "x" or "Y" implies a no or yes to the question.

We begin our analysis with vanilla ext2, whose results are shown in the first group of result columns. We can see that in all cases an unhandled exception such as an "oops" is generated within the file system. This results in the application being killed, frequently leaving the file system inconsistent and/or unusable; a complete reboot is required to fix the system.

In our second set of fault injections, we analyze ext2 with Membrane, which includes stateful restart of the file system. We find that Membrane is able to detect all of the errors that we injected. Upon detection, Membrane transparently restarts the file system while the applications keep running normally. Further, Membrane keeps the file system in a consistent and usable state.
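To give a flavor of what injecting and detecting such a fault involves, the following standalone user-space analogue (ours, not the actual harness) dereferences a deliberately bad pointer, as in the page_address(bad) experiment, and routes the resulting hardware fault to a recovery point instead of dying.

    /* User-space analogue of hardware-level fault detection: a bad
     * pointer dereference (the injected fault) is caught and routed
     * to recovery instead of killing the process. Illustrative only. */
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>

    static sigjmp_buf recover_point;

    static void on_fault(int sig)
    {
        (void)sig;
        siglongjmp(recover_point, 1);    /* route to "recovery subsystem" */
    }

    int main(void)
    {
        signal(SIGSEGV, on_fault);
        if (sigsetjmp(recover_point, 1) == 0) {
            volatile int *bad = (int *)16;   /* injected bad address */
            (void)*bad;                      /* faults, like page_address(bad) */
            puts("not reached");
        } else {
            puts("fault detected; file system would be restarted");
        }
        return 0;
    }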
Performance. To measure performance, we ran a series of workloads on both standard ext2 and ext2 with Membrane. Table 2 shows the results of our experiments. From the table, we can see that the performance overheads of our untuned prototype are quite minimal. We expect that with further effort, the small overheads observed here could be reduced noticeably.

    Benchmark        ext2 (vanilla)    ext2+Membrane
    OpenSSH               57.7s             62.3s
    PostMark              21.0s             25.0s
    Sort (100 MB)        143.0s            145.0s

Table 2: Performance. The table presents the runtime (in seconds) of a number of benchmarks on both standard ext2 and ext2 within the Membrane framework. PostMark parameters are: 100 files, 1000 transactions, file sizes of 160 KB to 4 MB, and 50/50 read/append and create/delete biases.
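For concreteness, the relative slowdowns implied by Table 2 are roughly (62.3 - 57.7)/57.7 = 8% for OpenSSH, (25.0 - 21.0)/21.0 = 19% for PostMark, and (145.0 - 143.0)/143.0 = 1.4% for Sort.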

In summary, our initial evaluation suggests that Membrane is fault-resilient, lightweight, and transparent. Further, in the current prototype for ext2, we made only a minor modification to the file system (5 lines of code, for flushing super blocks at checkpoints), suggesting that the Membrane infrastructure can be generic rather than tailored to a specific file system.

7. RELATED WORK

We now discuss three previous systems that share the goal of increasing operating system fault resilience. First, the restarting of OS subsystems began with Swift et al.'s work on Nooks, followed by shadow drivers [17, 18]. The authors use memory-management hardware to build an isolation boundary around device drivers; not surprisingly, such techniques incur high overheads [17]. The subsequent shadow-driver work shows how recovery can be transparently achieved by restarting failed drivers and diverting clients, by passing them error codes and related tricks. However, such recovery is relatively straightforward: only a simple reinitialization must occur before reintegrating the restarted driver into the OS.

Second, SafeDrive takes a completely different approach to fault resilience [25]. Instead of address-space based protection, SafeDrive automatically adds assertions to device-driver code. Because the assertions are added in a C-to-C translation pass and the final driver code is produced by compiling this code, SafeDrive is lightweight and induces relatively low overheads. However, the SafeDrive recovery machinery does not handle stateful subsystems, and its recovery is not transparent to applications. Thus, while currently well-suited to a certain class of device drivers, SafeDrive recovery cannot be applied directly to file systems.

Finally, CuriOS, a microkernel-based operating system, also aims to be resilient to subsystem failure [2]. It achieves this end through address-space boundaries between servers, along with storing session state in an additional protection domain. Frequent kernel crossings (an expensive operation) are common for file systems in data-intensive environments and would dominate performance. CuriOS also represents one of the few systems that attempt to provide failure resilience for more stateful services such as file systems; other heavyweight checkpoint/restart systems share this property [15]. The CuriOS paper briefly describes an "ext2 implementation"; unfortunately, it is difficult to tell how sophisticated this file service is or how much work is required to recover from its failure.
In summary, we see that few lightweight techniques have been developed. Of those, we know of none that work for stateful subsystems such as file systems.

8. CONCLUSIONS

File systems fail. With Membrane, failure is transformed from a show-stopping event into a small performance issue. The benefits are many. Membrane enables file-system developers to ship file systems sooner, as small bugs will not cause massive user headaches. Membrane similarly enables customers to install new file systems with the knowledge that they won't bring down their entire operation. Bugs will still arise, but those that are rare and hard to reproduce will remain where they belong, automatically "fixed" by a system that can tolerate them.

Future work. We wish to employ the Membrane framework with different file systems, including other simple ones like VFAT and more complex journaling file systems like ext3, JFS, and ReiserFS. An interesting aspect of journaling file systems is that they are designed to group file operations into transactions, which Membrane could leverage to checkpoint file system state.

We also wish to carefully evaluate the robustness and performance of Membrane. This includes comprehensive fault-injection tests and a detailed characterization of the performance overheads and restart delay as perceived by applications.
9. ACKNOWLEDGMENTS

We thank the anonymous reviewers and Richard Golding (our shepherd) for their feedback and comments. We also thank the members of the ADSL research group for their insightful comments.

This material is based upon work supported by the National Science Foundation under the following grants: CCF-0621487, CNS-0509474, and CCR-0133456, as well as by generous donations from NetApp, Inc.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or other institutions.

10. REFERENCES

[1] Jeff Bonwick and Bill Moore. ZFS: The Last Word in File Systems. http://opensolaris.org/os/community/zfs/docs/zfs_last.pdf, 2007.
[2] Francis M. David, Ellick M. Chan, Jeffrey C. Carlyle, and Roy H. Campbell. CuriOS: Improving Reliability through Operating System Structure. In OSDI '08, San Diego, CA, December 2008.
[3] Brian Demsky and Martin Rinard. Automatic Detection and Repair of Errors in Data Structures. In OOPSLA '03, Anaheim, CA, October 2003.
[4] Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf. Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems Code. In SOSP '01, pages 57-72, Banff, Canada, October 2001.
[5] K. Fraser, S. Hand, R. Neugebauer, I. Pratt, A. Warfield, and M. Williamson. Safe Hardware Access with the Xen Virtual Machine Monitor. In Workshop on Operating System and Architectural Support for the On-Demand IT Infrastructure, 2004.
[6] Haryadi S. Gunawi, Cindy Rubio-Gonzalez, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Ben Liblit. EIO: Error Handling is Occasionally Correct. In FAST '08, pages 207-222, San Jose, CA, February 2008.
[7] Robert Hagmann. Reimplementing the Cedar File System Using Logging and Group Commit. In SOSP '87, Austin, TX, November 1987.
[8] Tracy Kidder. The Soul of a New Machine. Little, Brown, and Company, 1981.
[9] J. LeVasseur, V. Uhlig, J. Stoess, and S. Gotz. Unmodified Device Driver Reuse and Improved System Dependability via Virtual Machines. In Proceedings of the 6th USENIX OSDI, 2004.
[10] L. W. McVoy and S. R. Kleiman. Extent-like Performance from a UNIX File System. In USENIX Winter '91, pages 33-43, Dallas, TX, January 1991.
[11] Vijayan Prabhakaran, Lakshmi N. Bairavasundaram, Nitin Agrawal, Haryadi S. Gunawi, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. IRON File Systems. In SOSP '05, pages 206-220, Brighton, UK, October 2005.
[12] Feng Qin, Joseph Tucek, Jagadeesan Sundaresan, and Yuanyuan Zhou. Rx: Treating Bugs As Allergies. In SOSP '05, Brighton, UK, October 2005.
[13] Hans Reiser. ReiserFS. www.namesys.com, 2004.
[14] Eric Schrock. UFS/SVM vs. ZFS: Code Complexity. http://blogs.sun.com/eschrock/, November 2005.
[15] J. S. Shapiro and N. Hardy. EROS: A Principle-Driven Operating System from the Ground Up. IEEE Software, 19(1), January/February 2002.
[16] Adam Sweeney, Doug Doucette, Wei Hu, Curtis Anderson, Mike Nishimoto, and Geoff Peck. Scalability in the XFS File System. In USENIX 1996, San Diego, CA, January 1996.
[17] Michael M. Swift, Brian N. Bershad, and Henry M. Levy. Improving the Reliability of Commodity Operating Systems. In SOSP '03, Bolton Landing, NY, October 2003.
[18] Michael M. Swift, Brian N. Bershad, and Henry M. Levy. Recovering Device Drivers. In OSDI '04, pages 1-16, San Francisco, CA, December 2004.
[19] Theodore Ts'o. e2fsprogs. http://e2fsprogs.sourceforge.net, June 2001.
[20] Theodore Ts'o and Stephen Tweedie. Future Directions for the Ext2/3 Filesystem. In FREENIX '02, Monterey, CA, June 2002.
[21] Wikipedia. ext4. en.wikipedia.org/wiki/Ext4, 2008.
[22] Wikipedia. Btrfs. en.wikipedia.org/wiki/Btrfs, 2009.
[23] Dan Williams, Patrick Reynolds, Kevin Walsh, Emin Gun Sirer, and Fred B. Schneider. Device Driver Safety Through a Reference Validation Mechanism. In Proceedings of the 8th USENIX OSDI, 2008.
[24] Junfeng Yang, Paul Twohey, Dawson Engler, and Madanlal Musuvathi. Using Model Checking to Find Serious File System Errors. In OSDI '04, San Francisco, CA, December 2004.
[25] Feng Zhou, Jeremy Condit, Zachary Anderson, Ilya Bagrak, Rob Ennals, Matthew Harren, George Necula, and Eric Brewer. SafeDrive: Safe and Recoverable Extensions Using Language-Based Techniques. In OSDI '06, Seattle, WA, November 2006.
