
Evaluating File System Reliability on Solid State Drives

https://www.usenix.org/conference/atc19/presentation/jaffer

Shehbaz Jaffer*, Stathis Maneas*, Andy Hwang, and Bianca Schroeder
University of Toronto

*These authors contributed equally to this work.

Abstract

As solid state drives (SSDs) are increasingly replacing hard disk drives, the reliability of storage systems depends on the failure modes of SSDs and the ability of the file system layered on top to handle these failure modes. While the classical paper on IRON File Systems provides a thorough study of the failure policies of three file systems common at the time, we argue that 13 years later it is time to revisit file system reliability with SSDs and their reliability characteristics in mind, based on modern file systems that incorporate journaling, copy-on-write and log-structured approaches, and are optimized for flash. This paper presents a detailed study, spanning ext4, Btrfs and F2FS, and covering a number of different SSD error modes. We develop our own fault injection framework and explore over a thousand error cases. Our results indicate that 16% of these cases result in a file system that cannot be mounted or even repaired by its system checker. We also identify the key file system metadata structures that can cause such failures and finally, we recommend some design guidelines for file systems that are deployed on top of SSDs.

1 Introduction

Solid state drives (SSDs) are increasingly replacing hard disk drives as a form of secondary storage medium. With their growing adoption, storage reliability now depends on the reliability of these new devices as well as the ability of the file system above them to handle errors these devices might generate (including, for example, device errors when reading or writing a block, or silently corrupted data). While the classical paper by Prabhakaran et al. [45] (published in 2005) studied in great detail the robustness of three file systems that were common at the time in the face of hard disk drive (HDD) errors, we argue that there are multiple reasons why it is time to revisit this work.

The first reason is that the failure characteristics of SSDs differ significantly from those of HDDs. For example, recent field studies [39, 43, 48] show that, while their replacement rates (due to suspected hardware problems) are often an order of magnitude lower than those of HDDs, the occurrence of partial drive failures that lead to errors when reading or writing a block, or to corrupted data, can be an order of magnitude higher. Other work argues that the Flash Translation Layer (FTL) of SSDs might be more prone to bugs than HDD firmware, due to its higher complexity and lower maturity, and demonstrates this to be the case when drives are faced with power faults [53]. This makes it even more important than before that file systems can detect and deal with device faults effectively.

Second, file systems have evolved significantly since [45] was published 13 years ago; the ext family of file systems has undergone major changes from the ext3 version considered in [45] to the current ext4 [38]. New players with advanced file-system features have arrived. Most notably Btrfs [46], a copy-on-write file system whose avoidance of in-place writes makes it a good match for SSDs, has garnered wide adoption. The design of Btrfs is particularly interesting as it incurs fewer total writes than ext4's journaling mechanism. Further, there are new file systems that have been designed specifically for flash, such as F2FS [33], which follow a log-structured approach to optimize performance on flash.

The goal of this paper is to characterize the resilience of modern file systems running on flash-based SSDs in the face of SSD faults, along with the effectiveness of their recovery mechanisms when taking SSD failure characteristics into account. We focus on three different file systems: Btrfs, ext4, and F2FS. ext4 is an obvious choice, as it is the most commonly used file system. Btrfs and F2FS include features particularly attractive with respect to flash, with F2FS being tailored for flash. Moreover, these three file systems cover three different points in the design spectrum, ranging from journaling to copy-on-write to log-structured approaches.

The main contribution of this paper is a detailed study, spanning three very different file systems and their ability to detect and recover from SSD faults, based on error injection targeting all key data structures. We observe huge differences across file systems and describe the vulnerabilities of each in detail.

Over the course of this work we experiment with more than one thousand fault scenarios and observe that around 16% of them result in severe failure cases (kernel panic, unmountable file system). We make a number of observations and file several bug reports, some of which have already resulted in patches. For our experiments, we developed an error injection module on top of the Linux device mapper framework.

The remainder of this paper is organized as follows: Section 2 provides a taxonomy of SSD faults and a description of the experimental setup we use to emulate these faults and test the reaction of the three file systems. Section 3 presents the results from our fault emulation experiments. Section 4 covers related work and finally, in Section 5, we summarize our observations and insights.

2 File System Error Injection

Our goal is to emulate different types of SSD failures and check the ability of different file systems to detect and recover from them, based on which part of the file system was affected. We limit our analysis to a local file system running on top of a single drive. Note that although multi-drive redundancy mechanisms like RAID exist, they are not general substitutes for file system reliability mechanisms. First, RAID is not applicable to all scenarios, such as single drives on personal computers. Second, errors or data corruption can originate from higher levels in the storage stack, which RAID can neither detect nor recover.

Furthermore, our work only considers partial drive failures, where only part of a drive's operation is affected, rather than fail-stop failures, where the drive as a whole becomes permanently inaccessible. The reason lies in the numerous studies published over the last few years, using either lab experiments or field data, which have identified many different SSD internal error mechanisms that can result in partial failures, including mechanisms that originate both from the flash level [10, 12, 13, 16–19, 21, 23, 26, 27, 29–31, 34, 35, 40, 41, 47, 49, 50] and from bugs in the FTL code, e.g. when it is not hardened to handle power faults correctly [52, 53]. Moreover, a field study based on Google's data centers observes that partial failures are significantly more common for SSDs than for HDDs [48].

This section describes different SSD error modes and how they manifest at the file system level, as well as our experimental setup, including the error injection framework and how we target different parts of a file system.

2.1 SSD Errors in the Field and their Manifestation

This section provides an overview of the various mechanisms that can lead to partial failures and how they manifest at the file system level (all summarized in Table 1).

Uncorrectable Bit Corruption: Previous work [10, 12, 13, 16–19, 21, 26, 27, 29, 34, 35, 41] describes a large number of error mechanisms that originate at the flash level and can result in bit corruption, including retention errors, read and program disturb errors, and errors due to flash cell wear-out and failing blocks. Virtually all modern SSDs incorporate error correcting codes to detect and correct such bit corruption. However, recent field studies indicate that uncorrectable bit corruption, where more bits are corrupted than the error correcting code (ECC) can handle, occurs at a significant rate in the field. For example, a study based on field data observes 2-6 out of 1000 drive days with uncorrectable bit errors [48]. Uncorrectable bit corruption manifests as a read I/O error returned by the drive when an application tries to access the affected data ("Read I/O errors" in Table 1).

Silent Bit Corruption: This is a more insidious form of bit corruption, where the drive itself is not aware of the corruption and returns corrupted data to the application ("Corruption" in Table 1). While there have been field studies on the prevalence of silent data corruption for HDD based systems [9], there is to date no field data on silent bit corruption for SSD based systems. However, work based on lab experiments shows that 3 out of 15 drive models under test experience silent data corruption in the case of power faults [53]. Note that there are other mechanisms that can lead to silent data corruption, including mechanisms that originate at higher levels in the storage stack, above the SSD device level.

FTL Metadata Corruption: A special case arises when silent bit corruption affects FTL metadata. Among other things, the FTL maintains a mapping of logical to physical (L2P) blocks as part of its metadata [8]; metadata corruption could lead to "Read I/O errors" or "Write I/O errors", when the application attempts to read or write a block that does not have an entry in the L2P mapping due to corruption. Corruption of the L2P mapping could also result in wrong or erased data being returned on a read, manifesting as "Corruption" to the file system. Note that this is also a silent corruption, i.e. neither the device nor the FTL is aware of these corruptions.

Misdirected Writes: This refers to the situation where, during an SSD-internal write operation, the correct data is written to flash, but at the wrong location. This might be due to a bug in the FTL code or triggered by a power fault, as explained in [53]. At the file system level this might manifest as a "Corruption", where a subsequent read returns wrong data, or as a "Read I/O error". This form of corruption is silent; the device does not detect and propagate errors to the storage stack above until invalid data or metadata is accessed again.

Shorn Writes: A shorn write is a write that is issued by the file system, but only partially done by the device. In [53], the authors observe such scenarios surprisingly frequently during power faults, even for enterprise class drives, while issuing properly synchronized I/O and cache flush commands to the device. A shorn write is similar to a "torn write", where only part of a multi-sector update is written to the disk, but it applies to sector(s) which should have been fully persisted due to the use of a cache flush operation. One possible explanation is the mismatch of write granularities between layers. The default block size for file systems (e.g. 4KB for ext4/F2FS, and 16KB for Btrfs) is larger than the physical device block size (e.g. 512B). A block issued from the file system is mapped to multiple physical blocks inside the device. As a result, during a power fault, only some of the mappings are updated while others remain unchanged. Even if physical block sizes match that of the file system, another possible explanation is that, because SSDs include on-board cache memory for buffering writes, shorn writes may also be caused by alignment and timing bugs in the drive's cache management [53]. Moreover, recent SSD architectures use pre-buffering and striping across independent parallel units, which do not guarantee atomicity between them for an atomic write operation [11]. The increase in parallelism may further expose more shorn writes.

At the file system level, a shorn write is not detected until its manifestation during a later read operation, where the file system sees a 4KB block, part of which contains data from the most recent update to the block, while the remaining part contains either old or zeroed out data (if the block was recently erased). While this could be viewed as a special form of silent bit corruption, we consider it a separate category in terms of how it manifests at the file system level (called "Shorn Write", corresponding to column (d) in Table 1), as this form of corruption creates a particular pattern (each sequence of 512 bytes within a 4KB block is either completely corrupted or completely correct), compared to the more random corruption event referred to by column (c).

In [53], the authors observe shorn writes manifesting in two patterns, where only the first 3/8th or the first 7/8th of a block gets written and the rest is not. Similarly, in our experiments we keep only the first 3/8th of a 4KB block. We assume the block has been successfully erased, so the rest of the block remains zeroed out. Our module can be configured to test other shorn write sizes and patterns as well.
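To make the emulated shorn-write pattern concrete, the transformation it implies can be sketched as a small buffer operation (an illustrative user-space sketch, not the kernel module itself; the block and sector sizes and the 3/8th fraction follow the description above):

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  4096  /* file system block size used in the experiments */
#define SECTOR_SIZE  512  /* physical device sector size */

/*
 * Emulate a shorn write on an in-memory copy of a block: only the first
 * `written_sectors` 512-byte sectors of the new data survive; the tail of
 * the block is assumed to have been erased and therefore reads as zeroes.
 * written_sectors = 3 reproduces the "first 3/8th of a 4KB block" pattern.
 */
static void shear_block(uint8_t block[BLOCK_SIZE], unsigned written_sectors)
{
    size_t persisted = (size_t)written_sectors * SECTOR_SIZE;

    if (persisted > BLOCK_SIZE)
        persisted = BLOCK_SIZE;

    /* the shorn prefix keeps the new data, the rest is zeroed out */
    memset(block + persisted, 0, BLOCK_SIZE - persisted);
}
```

Calling shear_block(buf, 7) would instead emulate the 7/8th pattern reported in [53].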

Dropped writes: The authors in [53] observe cases where an SSD internal write operation gets dropped even after an explicit cache flush (e.g. in the case of a power fault when the update was in the SSD's cache, but not persisted to flash). If the dropped write relates to FTL metadata, in particular to the L2P mapping, this could manifest as a "Read I/O error", "Write I/O error" or "Corruption" on a subsequent read or write of the data. If the dropped write relates to a file system write, the result is the same as if the file system had never issued the corresponding write. We create a separate category for this manifestation, which we refer to as "Lost Write" (column (e) in Table 1).

Incomplete Program operation: This refers to the situation where a flash program operation does not fully complete (without the FTL noticing), so only part of a flash page gets written. Such scenarios were observed, for example, under power faults [53]. At the file system level, this manifests as a "Corruption" during a subsequent read of the data.

Incomplete Erase operation: This refers to the situation where a flash erase operation does not completely erase a flash erase block (without the FTL detecting and correcting this problem). Incomplete erase operations have been observed under power faults [53]. They could also occur when flash erase blocks wear out and the FTL does not handle a failed erase operation properly. Subsequent program operations to the affected erase block can result in incorrectly written data and consequently "Corruption", when this data is later read by the file system.

SSD/Flash Error                 Manifestation in the file system
Uncorrectable Bit Corruption    (a)
Silent Bit Corruption           (c)
FTL Metadata Corruption         (a), (b), (c)
Misdirected Writes              (a), (c)
Shorn Writes                    (d)
Dropped Write                   (a), (b), (c), (e)
Incomplete Program Operation    (c), (d)
Incomplete Erase Operation      (c)

Table 1: Different types of flash errors and their manifestation in the file system. (a) Read I/O error (b) Write I/O error (c) Corruption (d) Shorn Write (e) Lost Write.

2.2 Comparison with HDD faults

We note that there are also HDD-specific faults that would manifest in a similar way at the file system level. However, the mechanisms that cause faults within each medium are different and can, for example, affect the frequency of observed errors. One such case are uncorrectable read errors, which have been observed at a much higher frequency in production systems using SSDs than HDDs [48] (a trend that will likely only get worse with QLC). There are faults, though, whose manifestation does actually differ from HDDs to SSDs, due to inherent differences in their overall design and operation. For instance, a part affected by a shorn write may contain previously written data in the case of an HDD block, but would contain zeroed out data if that area within the SSD has been correctly erased. In addition, the large degree of parallelism inside SSDs makes correctness under power faults significantly more challenging than for HDDs (for example, ensuring atomic writes across parallel units). Finally, file systems might modify their behavior and apply different fault recovery mechanisms for SSDs and HDDs; for example, Btrfs turns off metadata duplication by default when deployed on top of an SSD.

2.3 Device Mapper Tool for Error Emulation

The key observation from the previous section is that all SSD faults we consider manifest in one of five ways, corresponding to the five columns (a) to (e) in Table 1. This section describes a device mapper tool we created to emulate all five scenarios. In order to emulate SSD error modes and observe each individual file system's response, we need to intercept the block I/O requests between the file system and the block device. We leverage the Linux device mapper framework to create a virtual block device that intercepts requests between the file system and the underlying physical device. This allows us to operate on block I/O requests and simulate faults as if they originate from a physical device, and also to observe the file system's reaction without modifying its source code. In this way, we can perform tracing, parse file system metadata, and alter block contents online, for both read and write requests, while the file system is mounted. For this study, we use the device mapper module version 4.17.

Our module can intercept read and write requests for selected blocks as they pass through the block layer and return an error code to the file system, emulating categories (a) "Read Error" and (b) "Write Error" in Table 1. Possible parameters include the request's type (read/write), block number, and data structure type. In the case of multiple accesses to the same block, one particular access can be targeted. We also support corruption of specific data structures, fields and bytes within blocks, allowing us to emulate category (c) "Corruption". The module can selectively shear multiple sectors of a block before sending it to the file system or writing it to disk, emulating category (d) "Shorn Write". Our module can further drop one or more blocks while writing the blocks corresponding to a file system operation, emulating the last category (e) "Lost Write". The module's API is generic and can be extended to other file systems. Our module can be found at [6].
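The heart of such a device mapper target is its map callback, which inspects every block I/O request (bio) before it reaches the physical device. The fragment below is a heavily simplified sketch of this idea for category (a), written against the Linux ~4.x device mapper API; the names faulty_ctx and faulty_map and the single-sector trigger are illustrative and are not taken from the released module [6]:

```c
#include <linux/module.h>
#include <linux/bio.h>
#include <linux/device-mapper.h>

/* Illustrative per-target state: fail reads that hit one chosen sector. */
struct faulty_ctx {
	struct dm_dev *dev;       /* underlying physical device */
	sector_t target_sector;   /* sector selected for error injection */
};

static int faulty_map(struct dm_target *ti, struct bio *bio)
{
	struct faulty_ctx *fc = ti->private;

	/* Category (a): complete a targeted read with an I/O error. */
	if (bio_op(bio) == REQ_OP_READ &&
	    bio->bi_iter.bi_sector == fc->target_sector)
		return DM_MAPIO_KILL;

	/*
	 * Everything else passes through to the real device unchanged.
	 * Corruption, shorn and lost writes would be emulated here as
	 * well, by rewriting or discarding the bio's payload first.
	 */
	bio_set_dev(bio, fc->dev->bdev);
	return DM_MAPIO_REMAPPED;
}
```

A complete target additionally needs constructor/destructor callbacks and a dm_register_target() call; those are omitted here for brevity.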

2.4 Test Programs

We perform fault injection experiments while executing test programs chosen to exercise different parts of the POSIX API, similar to the "singlets" used by Prabhakaran et al. [45]. Each individual program focuses on one system call, such as mkdir or write. Table 2 lists all the test programs that we used in our study. For each test program, we populate the disk with different files and directory structures to increase code coverage. For example, we generate small files that are stored inline within an inode, as well as large files that use indirect blocks. All our programs pedantically follow POSIX semantics; they call fsync(2) and sync(2), and check the return values to ensure that data and metadata have successfully persisted to the underlying storage device.

Programs: mount, umount, open, creat, access, stat, lstat, chmod, chown, utime, rename, read, write, truncate, readlink, symlink, unlink, chdir, rmdir, mkdir, getdirentries, chroot

Table 2: The programs used in our study. Each one stresses a single system call and is invoked several times under different file system images to increase coverage.
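As an illustration of such a singlet and of the pedantic return-value checking described above, a minimal write test program could look as follows (a simplified sketch; the mount point and payload are arbitrary and not those of the actual harness):

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* A minimal "write" singlet: every return value is checked so that any
 * error the file system propagates to user space is actually observed. */
int main(void)
{
    static const char buf[] = "payload that must reach stable storage\n";
    int fd = open("/mnt/test/file0", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }
    if (write(fd, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
        perror("write");
        return EXIT_FAILURE;
    }
    if (fsync(fd) < 0) {   /* force data and metadata to the device */
        perror("fsync");
        return EXIT_FAILURE;
    }
    if (close(fd) < 0) {
        perror("close");
        return EXIT_FAILURE;
    }
    sync();                /* flush anything still buffered system-wide */
    return EXIT_SUCCESS;
}
```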
2.5 Targeted Error Injection

Our goal is to understand the effect of block I/O errors and corruption in detail, depending on which part of a file system is affected. That means our error injection testbed requires the ability to target specific data structures and specific fields within a data structure for error injection, rather than randomly injecting errors. We therefore need to identify, for each program, which data structures are involved and how the parts of the data structure map to the sequence of block accesses generated by the program.

Understanding the relationship between the sequence of block accesses and the data structures within each file system required a significant amount of work and we had to rely on a combination of approaches. First, we initialize the file system to a clean state with representative data. We then run a specific test program (Table 2) on the file system image, capturing traces from blktrace and the kernel to learn the program's actually accessed blocks. Reading the file system source code also enables us to put logic inside our module to interpret blocks as requests pass through it. Lastly, we use offline tools such as dumpe2fs, btrfs-inspect, and dump.f2fs to inspect changes to disk contents. Through these multiple techniques, we can identify block types and specific data structures within the blocks. Table 3 summarizes our approach to identify different data structures in each of the file systems.

ext4: super block, group descriptor, inode blocks, block bmap, inode bmap: dumpe2fs. dir_entry: debugfs, get block inode, stat on inode number, check file type. extent: debugfs, check for extent of a file or directory path. data: debugfs, get block inode, stat on inode number, check file type. journal: debugfs, check if parent inode number is 8.
Btrfs: fstree, roottree, csumtree, extentTree, chunkTree, uuidTree, devTree, logTree: device mapper module, check Btrfs node header fields at runtime. DIR_ITEM, DIR_INDEX, INODE_REF, INODE_DATA, EXTENT_DATA: btrfs-debug-tree.
F2FS: superblock, checkpoint, SIT, NAT, inode, d/ind node, dir. block, data: dump.f2fs.

Table 3: The approach to type blocks collected using either blktrace or our own device mapper module.

After identifying all the relevant data structures for each program, we re-initialize the disk image and repeat test program execution for the error injection experiments. We use the same tools, along with our module, to inject errors into specific targets. A single block I/O error or data corruption is injected into a block or data structure during each execution. This allows us to achieve better isolation and characterization of the file system's reaction to the injected error.

Our error injection experiments allow us to measure both immediate and longer-term effects of device faults. We can observe immediate effects on program execution for some cases, such as system call errors or kernel panics (e.g. from write I/O errors). At the end of each test program execution, we unmount the file system and perform several offline tests to verify the consistency of the disk image, regardless of whether the corruption was silent or not (e.g. persisting lost/shorn writes): we invoke the file system's integrity checker (fsck), check if the file system is mountable, and check whether the program's operations have been successfully persisted by comparing the resultant disk image against the expected one. We also explore longer-term effects of faults where the test programs access data that were previously persisted with errors (read I/O, reading corrupted or shorn write data).

In this study, we use btrfs-progs v4.4, e2fsprogs v1.42.13, and f2fs-tools v1.10.0 for our error injection experiments.

2.6 Detection and Recovery Taxonomy

We report the detection and recovery policies of all three file systems with respect to the data structures involved. We characterize each file system's reaction via all observable interfaces: system call return values, changes to the disk image, log messages, and any side-effects to the system (such as kernel panics). We classify the file system's detection and recovery based on a taxonomy that was inspired by previous work [45], but with some new extensions: unlike [45], we also experiment with file system integrity checkers and their ability to detect and recover from errors that the file system might not be able to deal with, and as such we add a few additional categories within the taxonomy that pertain to file system checkers. Also, we create a separate category for the case where the file system is left in its previous consistent state prior to the execution of the program (RPrevious). In particular, if the program involves updates on the system's metadata, none of it is reflected to the file system. Table 4 presents our taxonomy in detail.

A file system can detect the injected errors online by checking the return value of the block I/O request (DErrorCode), inspecting the incoming data and performing some sanity checks (DSanity), or using redundancies, e.g. in the form of checksums (DRedundancy). A successful detection should alert the user via system call return values or log messages.

To recover from errors, the file system can take several actions. The most basic action is simply passing along the error code from the block layer (RPropagate). The file system can also decide to terminate the execution of the system call, either gracefully via a transaction abort, or abruptly, such as crashing the kernel (RStop). Lastly, the file system can perform retries (RRetry) in case the error is transient, or use its redundant data structures to recover the data.

It is important to note that for block I/O errors, the actual data stored in the block is not passed to the disk or the file system. Hence, no sanity check can be performed and DSanity is not applicable. Similarly, for silent data corruption experiments, our module does not return an error code, so DErrorCode is not relevant.

We also run each file system's fsck utility and report on its ability to detect and recover file system errors offline, as it may employ different detection and recovery strategies than the online file system. The different categories for fsck recovery are shown in Table 4.

Detection levels:
DZero: No detection.
DErrorCode: Check the error code returned from the lower levels.
DSanity: Check for invalid values within the contents of a block.
DRedundancy: Checksums, replicas, or any other form of redundancy.
DFsck: Detect the error using the file system checker.

Recovery levels:
RZero: No attempt to recover.
RRetry: Retry the operation first before returning an error.
RPropagate: Error code propagated to the user space.
RPrevious: File system resumes operation from the state exactly before the operation occurred.
RStop: The operation is terminated (either gracefully or abruptly); the file system may be mounted as read-only.
RFsck_Fail: Recovery failed, the file system cannot be mounted.
RFsck_Partial: The file system is mountable, but it has experienced data loss in addition to the operation failure.
RFsck_Orig: The current operation fails, the file system is restored to its pre-operation state.
RFsck_Full: The file system is fully repaired and its state is the same as the one generated by an execution where the operation succeeded without any errors.

Table 4: The levels of our detection and recovery taxonomy.
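For bookkeeping, the outcome of every injection experiment can be recorded with a pair of enums that mirror the levels of Table 4; the following is an illustrative sketch of such bookkeeping, not code released with the study:

```c
/* Detection and recovery levels, mirroring Table 4. */
enum detection_level {
	D_ZERO,         /* no detection */
	D_ERROR_CODE,   /* error code from the lower layers is checked */
	D_SANITY,       /* invalid values caught by sanity checks */
	D_REDUNDANCY,   /* checksums, replicas, or other redundancy */
	D_FSCK,         /* detected only by the file system checker */
};

enum recovery_level {
	R_ZERO, R_RETRY, R_PROPAGATE, R_PREVIOUS, R_STOP,
	R_FSCK_FULL, R_FSCK_ORIG, R_FSCK_PARTIAL, R_FSCK_FAIL,
};

/* One record per (program, data structure, fault mode) combination. */
struct experiment_result {
	const char *program;         /* e.g. "mkdir" */
	const char *data_structure;  /* e.g. "fs tree" */
	enum detection_level detected;
	enum recovery_level  recovered;
};
```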

3 Results

Tables 5 and 6 provide a high-level summary of the results from our error injection experiments, following the detection and recovery taxonomy from Table 4. Our results are organized into six columns corresponding to the fault modes we emulate. The six tables in each column represent the fault detection and recovery results for each file system under a particular fault. The columns (a-w) in each table correspond to the programs listed in Table 2, which specify the operation during which the fault mode was encountered, and the rows correspond to the file system specific data structure that was affected by the fault.

Note that the columns in Tables 5 and 6 have a one-to-one correspondence to the fault modes described in Section 2 (Table 1), with the exception of shorn writes. After a shorn write is injected during test program execution and persisted to the flash device, we examine two scenarios where the persisted partial data is accessed again: during fsck invocation (Shorn Write + Fsck column) and during test program execution (Shorn Write + Program Read column).

3.1 Btrfs

We observe in Table 5 that Btrfs is the only file system that consistently detects all I/O errors as well as corruption events, including those affecting data (rather than only metadata). It achieves this through the extensive use of checksums.

However, we find that Btrfs is much less successful in recovering from any issues than the other two file systems. It is the only file system where four of the six error modes can lead to a kernel crash or panic and subsequently a file system that cannot be mounted even after running btrfsck. It also has the largest number of scenarios that result in an unmountable file system after btrfsck (even if not preceded by a kernel crash). Furthermore, we find that node level checksums, although good for detecting block corruption, cause an entire node to be discarded even if a single byte becomes corrupted. As a result, large chunks of data are removed, causing data loss.

Before we describe the results in more detail below, we provide a brief summary of Btrfs data structures.

[Table 5 is a full-page matrix; only its structure is reproduced here. For each of the three fault modes (read I/O error, write I/O error, corruption) it reports one detection and one recovery sub-table per file system. Columns a-w are the programs listed in the caption below; rows are the data structures of each file system (Btrfs: fs tree, cksum tree, root tree, superblock, extent tree, chunk tree, dev tree, uuid tree, log tree, data; ext4: superblock, inode, group desc, block bitmap, inode bitmap, directory, extent, journal, data; F2FS: superblock, checkpoint, NAT, SIT, inode, (d/ind) node, dir. entry, data). Each cell records the detection level (DZero, DErrorCode, DSanity, DRedundancy, DFsck) or recovery level (RZero, RRetry, RPropagate, RPrevious, RStop, RFsck_Full, RFsck_Orig, RFsck_Partial, RFsck_Fail, Crash/Panic + RFsck_Fail) observed, using the taxonomy of Table 4.]

Table 5: The results of our analysis on the detection and recovery policies of Btrfs, ext4, and F2FS for different read, write, and corruption experiments. The programs that were used are: a: access, b: truncate, c: open, d: chmod, e: chown, f: utimes, g: read, h: rename, i: stat, j: lstat, k: readlink, l: symlink, m: unlink, n: chdir, o: rmdir, p: mkdir, q: write, r: getdirentries, s: creat, t: mount, v: umount, w: chroot. An empty box indicates that the block type is not applicable to the program in execution. Superimposed symbols indicate that multiple mechanisms were used.

[Table 6 is a full-page matrix with the same layout as Table 5 (the same per-file-system detection and recovery sub-tables, the same rows, and the same program columns a-w), reporting the shorn write + program read, shorn write + fsck, and lost write experiments. Cell values again use the detection levels (DZero, DErrorCode, DSanity, DRedundancy, DFsck) and recovery levels (RZero, RRetry, RPropagate, RPrevious, RStop, RFsck_Full, RFsck_Orig, RFsck_Partial, RFsck_Fail, Crash/Panic + RFsck_Fail) of Table 4.]

Table 6: The results of our analysis on the detection and recovery policies of Btrfs, ext4, and F2FS for different shorn write + program read, shorn write + fsck, and lost write experiments. The programs that were used are: a: access, b: truncate, c: open, d: chmod, e: chown, f: utimes, g: read, h: rename, i: stat, j: lstat, k: readlink, l: symlink, m: unlink, n: chdir, o: rmdir, p: mkdir, q: write, r: getdirentries, s: creat, t: mount, v: umount, w: chroot. An empty box indicates that the block type is not applicable to the program in execution. Superimposed symbols indicate that multiple mechanisms were used.

The Btrfs file system arranges data in the form of a forest of trees, each serving a specific purpose (e.g. the file system tree (fstree) stores file system metadata, the checksum tree (csumtree) stores file/directory checksums). Btrfs maintains checksums for all metadata within tree nodes. The checksum for data is computed and stored separately in the checksum tree. A root tree stores the location of the roots of all other trees in the file system. Since Btrfs is a copy-on-write file system, all changes made on a tree node are first written to a different location on disk. The locations of the new tree nodes are then propagated across the internal nodes up to the root of the file system trees. Finally, the root tree needs to be updated with the location of the other changed file system trees.

3.1.1 Read errors

All errors get detected (DErrorCode) and registered in the operating system's message log, and the current operation is terminated (RStop). btrfsck is able to run, detect and correct the file system in most cases, with two exceptions. When the fstree structure is affected, btrfsck removes blocks that are not readable and returns an I/O error. Another exception is when a read I/O error is encountered while accessing key tree structures during a mount procedure: mount fails and btrfsck is unable to repair the file system.

3.1.2 Corruption

Corruption of any B-tree node. Checksums inside each tree node enable reliable detection of corruption; however, Btrfs employs a different recovery protocol based on the type of the underlying device. When Btrfs is deployed on top of a hard disk, it provides recovery from metadata corruption using metadata replication. Specifically, reading a corrupted block leads to btrfs-scrub being invoked, which replaces the corrupted primary metadata block with its replica. Note that btrfs-scrub does not have to scan the entire file system; only the replica is read to restore the corrupted block. However, in case the underlying device is an SSD, Btrfs turns off metadata replication by default for two reasons [7]. First, an SSD can remap a primary block and its replica internally to a single physical location, thus deduplicating them. Second, SSD controllers may put data written together in a short time span into the same physical storage unit (i.e. cell, erase block, etc.), which is also a unit of SSD failures. Therefore, btrfs-scrub is never invoked in the case of SSDs, as there is no metadata duplication. This design choice allows a single bit flip in the fstree to wipe out files and entire directories within a B-tree node. If a corrupted tree node is encountered while mount reads all metadata trees into memory, the consequences are even more severe: the operation fails and the disk is left in an inconsistent and irreparable state, even after running btrfsck.

Directory corruption. We observe that when a node corruption affects a directory, the corruption could actually be recovered, but Btrfs fails to do so. For performance reasons, Btrfs maintains two independent data structures for a directory (DIR_ITEM and DIR_INDEX). If one of these two becomes corrupted, the other data structure is not used to restore the directory. This is surprising considering that the existing redundancy could easily be leveraged for increased reliability.
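Conceptually, the node-level detection just described boils down to recomputing a checksum over a metadata node and comparing it with the value stored in the node's header. The sketch below illustrates this with CRC-32C, which Btrfs uses by default; the offsets and the helper crc32c() are simplifications for illustration, not the actual on-disk layout:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define NODE_SIZE  16384   /* default Btrfs metadata node size */
#define CSUM_SIZE      4   /* bytes of the header used by CRC-32C here */

uint32_t crc32c(const void *buf, size_t len);   /* assumed helper */

/*
 * Verify a metadata node: the checksum stored at the start of the node
 * header must match the CRC-32C of everything that follows it.  A mismatch
 * invalidates the whole node, which is why a single flipped byte can make
 * every item inside the node unreachable when no replica exists.
 */
static int node_checksum_ok(const uint8_t node[NODE_SIZE])
{
    uint32_t stored, computed;

    memcpy(&stored, node, CSUM_SIZE);
    computed = crc32c(node + CSUM_SIZE, NODE_SIZE - CSUM_SIZE);
    return stored == computed;
}
```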
3.1.3 Write errors

Superblock & Write I/O errors: Btrfs has multiple copies of its superblock, but interestingly, the recovery policy upon a write error is not identical for all copies. The superblocks are located at fixed locations on the disk and are updated on every write operation. The replicas are kept consistent, which differs from ext4's behavior. We observe that a write I/O error while updating the primary superblock is considered severe; the operation is aborted and the file system remounts as read-only. On the other hand, write I/O errors for the secondary copies of the superblock are detected, but the overall operation completes successfully and the secondary copy is not updated. While this allows the file system to continue writing, it is a violation of the implicit consistency guarantee between all superblocks, which may lead to problems in the future, as the system operates with a reduced number of superblock copies.

Tree Node & Write I/O errors: A write I/O error on a tree node is registered in the operating system's message log, but due to the file system's asynchronous nature, errors cannot be directly propagated back to the user application that wrote the data. In almost all cases, the file system is forced to mount as read-only (RStop). A subsequent unmount and btrfsck run in repair mode makes the device unreadable for the extentTree, logTree, rootTree and the root node of the fstree.

3.1.4 Shorn Write + Program Read

We observe that the behavior of Btrfs during a read of a shorn block is similar to the one we observed earlier for corruption. The only exception is the superblock, as its size is smaller than 3/8th of the block and it does not get affected.

3.1.5 Shorn Write + Fsck

Shorn writes on the root tree cause the file system to become unmountable and unrecoverable even after a btrfsck operation. We also find kernel panics during shorn writes, as described in Section 3.1.7.

3.1.6 Lost Writes

Errors get detected only during btrfsck. They do not get detected or propagated to user space during normal operation. btrfsck is unable to recover the file system, which is rendered unmountable due to corruption. The only recoverable case is a lost write to the superblock; for the remaining data structures, the file system eventually becomes unmountable.

3.1.7 Bugs found/reported

We submitted 2 bug reports for Btrfs. The first bug report is related to the corruption of a DIR_INDEX key. The file system was able to detect the corruption but deadlocks while listing the directory entries. This bug was fixed in a later version [1].

The second bug is related to read I/O errors on one specific data structure, which can cause a kernel panic for certain programs. We encountered 2 additional bugs during a shorn write that result in a kernel panic, both having the same root cause. The first case involves a shorn write to the root of the fstree, while the second case involves a shorn write to the root of the extent tree. In both cases, there is a mismatch in the leaf node size, which forces Btrfs to print the entire tree in the operating system's message log. While printing the leaf block, another kernel panic occurs where the size of a Btrfs item does not match the Btrfs header. Rebooting the kernel and running btrfsck fails to recover the file system.

3.2 ext4

ext4 is the default file system for many widely used Linux distributions and Android devices. It maintains a journal where all metadata updates are sequentially written before the main file system is updated. First, data corresponding to a file system operation is written to the in-place location of the file system. Next, a transaction log is written to the journal. Once the transaction log is written to the journal, the transaction is said to be committed. When the journal is full or sufficient time has elapsed, a checkpoint operation takes place that writes the in-memory metadata buffers to the in-place metadata location on the disk. In the event of a crash before the transaction is committed, the file system transaction is discarded. If the commit has taken place successfully on the journal but the transaction has not been checkpointed, the file system replays the journal during remount, where all metadata updates that were committed to the journal are recovered from the journal and written to the main file system.
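The ordering described above (data in place, then journal commit, then a later checkpoint) can be summarized in a small conceptual sketch. This is not ext4/jbd2 code; the stubs below merely stand in for the three phases and print the step they represent:

```c
#include <stdio.h>

struct txn { int id; };   /* stand-in for a journal transaction handle */

static void write_data_in_place(struct txn *t)
{ printf("txn %d: data blocks written to their final location\n", t->id); }

static void commit_metadata_to_journal(struct txn *t)
{ printf("txn %d: metadata and commit record durable in the journal\n", t->id); }

static void checkpoint_metadata(struct txn *t)
{ printf("txn %d: committed metadata copied to its in-place location\n", t->id); }

int main(void)
{
    struct txn t = { .id = 1 };

    write_data_in_place(&t);          /* step 1 */
    commit_metadata_to_journal(&t);   /* step 2: transaction is committed */
    checkpoint_metadata(&t);          /* step 3: may happen much later    */

    /* A crash before step 2 discards the transaction; a crash between
     * steps 2 and 3 is repaired by replaying the journal at the next mount. */
    return 0;
}
```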
ext4 is able to recover from an impressively large range of fault scenarios. Unlike Btrfs, it makes little use of checksums unless the metadata_csum feature is enabled explicitly during file system creation. Further, it deploys a very rich set of sanity checks when reading data structures such as directories, inodes and extents¹, which helps it deal with corruption. It is also the only one of the three file systems that is able to recover lost writes of multiple data structures, due to its in-place nature of writes and a robust file system checker.

However, there are a few exceptions where the corresponding issue remains uncorrectable (see the red cells in Table 6 associated with ext4's recovery).

Furthermore, we observe instances of data loss caused by shorn and lost writes involving write programs, such as create and rmdir. For shorn writes, ext4 may incur silent errors and not notify the user about them.

Before describing some specific issues below, we point out that our ext4 results are very different from those reported for ext3 in [45], where a large number of corruption events and several read and write I/O errors were not detected or handled properly. Clearly, in the 13 years that have passed since then, ext developers have made improving reliability a priority, potentially motivated by the findings in [45].

I/O errors, corruption and shorn writes of inodes: The most common scenario leading to data loss (but still a consistent file system) is a fault, in particular a read I/O error, corruption or shorn write, that affects an inode block, which results in the data of all files having their inode structure stored inside the affected inode block becoming inaccessible.

Read I/O errors, corruption and shorn writes of directory blocks: A shorn write involving a directory block is detected by the file system and eventually, the corresponding files and directories are removed. Empty files are completely removed by e2fsck by default, while non-empty files are placed into the lost+found directory. However, the parent-child relationship is lost, which we encode as RFsck_Partial.

Write I/O errors and group descriptors: There is only one scenario where e2fsck does not achieve at least partial success: when e2fsck is invoked after a write error on a group descriptor, it tries to rebuild the group descriptor and write it to the same on-disk location. However, as it is the same on-disk location that generated the initial error, e2fsck encounters the same write error and keeps restarting, running into an infinite loop for 3 cases (see RFsck_Fail).

Read I/O error during mount: ext4 fails to complete the mount operation if a read I/O error occurs while reading a critical metadata structure. In this case, the file system cannot be mounted even after invoking e2fsck. We observe similar behavior for the other two file systems as well.

¹ We report failure results for both directory and file extents together. Since our pre-workload generation creates a number of files and directories in the root directory, at least 1 extent block corresponding to the root directory gets accessed by all programs.

3.3 F2FS

F2FS is a log-structured file system designed for devices that use an FTL. Data and metadata are separately written into 6 active logs (default configuration), which are grouped by data/metadata, files/directories, and other heuristics. This multi-head logging allows similar blocks to be placed together and increases the number of sequential write operations.

F2FS divides its logical address space into the metadata region and the main area. The metadata region is stored at a fixed location and includes the Checkpoint (CP), the Segment Information Table (SIT), and the Node Address Table (NAT). The checkpoint stores information about the state of the file system and is used to recover from system crashes. SIT maintains information on the segments (the unit at which F2FS allocates storage blocks) in the main area, while NAT contains the block addresses of the file system's nodes, which comprise file/directory inodes, direct, and indirect nodes. F2FS uses a two-location approach for these three structures. In particular, one of the two copies is "active" and used to initialize the file system's state during mount, while the other one gets updated during the file system's execution. Finally, each copy of the checkpoint points to its corresponding copy of SIT and NAT.

F2FS's behavior when encountering read/write errors or corruption differs significantly from that of ext4 and Btrfs. While read failures are detected and appropriately propagated in nearly all scenarios, we observe that F2FS consistently fails to detect and report any write errors, independently of the operation that encounters them and the affected data structure. Furthermore, our results indicate that F2FS is not able to deal with lost and shorn writes effectively and eventually suffers from data loss. In some cases, a run of the file system's checker (called fsck.f2fs) can bring the system to a consistent state, but in other cases, the consequences are severe. We describe some of these cases in more detail below.

3.3.1 Read errors

Checkpoint / NAT / SIT blocks & read errors. During its mount operation, if F2FS encounters a read I/O error while trying to fetch any of the checkpoint, NAT, or SIT blocks, then it mounts as read-only. Additionally, F2FS cannot be mounted if the inode associated with the root directory cannot be accessed. In general, fsck.f2fs cannot help the file system recover from the error since it terminates with an assertion error every time it cannot read a block from the disk.

3.3.2 Write errors & Lost Writes

We observe that F2FS does not detect write errors (as injected by our framework), leading to different issues, such as corruption, reading garbage values, and potentially data loss. As a result, during our experiments, newly created or previously existing entries have been completely discarded from the system, applications have received garbage values, and finally, the file system has experienced data loss due to an incomplete recovery protocol (i.e. when fsck.f2fs is invoked). Considering that F2FS does not detect write I/O errors, lost writes end up having the same effect, since the only difference between the two is that a lost write is silent (i.e. no error is returned).
3.3.3 Corruption

Data corruption is reliably detected only for inodes and checkpoints, which are the only data structures protected by checksums, but even for those two data structures, recovery can be incomplete, resulting in the loss of files (and the data stored in them). The corruption of other data structures can lead to even more severe consequences. For example, the corruption of the superblock can go undetected and lead to an unmountable file system, even if the second copy of the superblock is intact. We have filed two bug reports related to the issues we have identified and one has already resulted in a fix. Below we describe some of the issues around corruption in more detail.

Inode block corruption. Inodes are one of only two F2FS data structures that are protected by checksums, yet their corruption can still create severe problems. One such scenario arises when the information stored in the footer section of an inode block is corrupted. In this case, fsck.f2fs will discard the entry without even attempting to create an entry in the lost_found directory, resulting in data loss.

Another scenario is when an inode associated with a directory is corrupted. Then all the regular files stored inside that directory and its sub-directories are recursively marked as unreachable by fsck.f2fs and are eventually moved to the lost_found directory (provided that their inode is valid). However, we observe that fsck.f2fs does not attempt to recreate the structure of sub-directories. It simply creates an entry in the lost_found directory for regular files in the sub-directory tree, not for sub-directories. As a result, if there are different files with the same name (stored in different paths of the original hierarchy), then only one is maintained at the end of the recovery procedure.

Checkpoint corruption. Checkpoints are the other data structure, besides inodes, that is protected by checksums. We observe that issues only arise if both copies of a checkpoint become corrupted, in which case the file system cannot be mounted. Otherwise, the uncorrupted copy will be used during the system's mount operation.
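The two-location scheme makes this tolerance easy to see: at mount time the file system examines both checkpoint packs and keeps the newest copy that passes its integrity check. The following user-space sketch illustrates the idea under assumed structure and helper names; it is not F2FS's actual mount code:

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for an on-disk checkpoint pack (illustrative only). */
struct checkpoint {
    uint64_t version;    /* monotonically increasing checkpoint version */
    uint32_t checksum;   /* protects the rest of the pack */
    /* payload omitted */
};

int checkpoint_valid(const struct checkpoint *cp);   /* assumed helper */

/*
 * Pick the checkpoint to mount from: prefer the valid copy with the higher
 * version, fall back to the other copy if one is corrupted, and give up
 * only if both fail their integrity check (the unmountable case above).
 */
const struct checkpoint *
select_checkpoint(const struct checkpoint *cp1, const struct checkpoint *cp2)
{
    int ok1 = checkpoint_valid(cp1);
    int ok2 = checkpoint_valid(cp2);

    if (ok1 && ok2)
        return cp1->version >= cp2->version ? cp1 : cp2;
    if (ok1)
        return cp1;
    if (ok2)
        return cp2;
    return NULL;   /* both copies corrupted: the file system cannot mount */
}
```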

Superblock corruption. While there are two copies of the superblock, the detection of corruption to the superblock relies completely on a set of sanity checks performed on (most of) its fields, rather than checksums or a comparison of the two copies. If the sanity checks identify an invalid value, then the second copy is used for recovery. However, our results show that the sanity checks are not capable of detecting all invalid values and thus, depending on the corrupted field, the reliability of the file system can suffer.

One particularly dangerous situation is a corruption of the offset field, which is used to calculate a checkpoint's starting address inside the corresponding segment, as it causes the file system to boot from an invalid checkpoint location during a mount operation and to eventually hang during its unmount operation. We filed a bug report which has resulted in a new patch that fixes this problem during the operation of fsck.f2fs; specifically, the patch uses the (checksum-protected) checkpoint of the system to restore the correct value. Future releases of F2FS will likely include a patch that enables checksums for the superblock.

Another problem with superblock corruption, albeit less severe, arises when the field containing the counter of supported file extensions, which F2FS uses to identify cold data, is corrupted. The corruption goes undetected and as a result, the corresponding file extensions are not treated as expected. This might lead to file system performance problems, but should not affect reliability or consistency.

SIT corruption. SIT blocks are not protected against corruption through any form of redundancy. We find cases where the corruption of these blocks severely compromises the consistency of the file system. For instance, we were able to corrupt a SIT block's bitmap (which keeps track of the allocated blocks inside a segment) in such a way that the file system hit a bug during its mount operation and eventually became unmountable.

NAT corruption. This data structure is not protected against corruption and we observe several problems this can create. First, the node ID of an entry can be corrupted and thus point to another entry inside the file system, to an invalid entry, or to a non-existing one. Second, the block address of an entry can be corrupted and thus point to another entry in the system or to an invalid location. In both cases, the original entry is eventually marked as unreachable by fsck.f2fs, since the reference to it is no longer available inside the NAT copy, and is placed in the lost_found directory. As already mentioned, files with identical names overwrite each other and eventually only one is stored inside the lost_found directory.

Direct/Indirect Node corruption. These blocks are used to access the data of large files and also the entries of large directories (with multiple entries). Direct nodes contain entries that point to on-disk blocks, while indirect nodes contain entries that point to direct nodes. Neither single nor double indirect nodes are protected against corruption. We observe that corruption of these nodes is not detected by the file system. Even when an invocation of fsck.f2fs detects the corruption, problems can arise. For example, we find a case where after the invocation of fsck.f2fs the system kept reporting the wrong (corrupted) size for a file. As a result, when we tried to create a copy of the file, we received different content.

Directory entry corruption. Directory entries are stored and organized into blocks. Currently, there is no mechanism to detect corruption of such a block and we observe numerous problems this can create. For example, when the field in a directory entry that contains the length of the entry's name is corrupted, the file system returns garbage information when we try to get the file's status. Moreover, the field containing the node ID of the corresponding inode can be corrupted and as a result point to any node currently stored in the system. Finally, an entry can "disappear" if a zero value is stored in the corresponding index inside the directory's bitmap.

In the last two cases, any affected entry is eventually marked as unreachable by fsck.f2fs, since its parent directory no longer points to it. As already mentioned, files with identical names overwrite each other and eventually only one is stored inside the lost_found directory.
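A minimal defensive check of the kind that is missing here would bound the name-length field against the directory block before the entry is used. The sketch below uses assumed field names rather than the real F2FS on-disk dentry format:

```c
#include <stdint.h>
#include <stddef.h>

#define DIR_BLOCK_SIZE 4096

/* Illustrative directory entry header; the field names are assumptions. */
struct dir_entry {
    uint32_t ino;        /* node ID of the entry's inode */
    uint16_t name_len;   /* length of the name that follows the header */
    /* name bytes follow */
};

/*
 * Reject an entry whose name would run past the end of its directory block.
 * Without such a bound, a corrupted name_len makes lookups return garbage.
 */
static int dir_entry_sane(const struct dir_entry *de, size_t offset_in_block)
{
    if (de->name_len == 0)
        return 0;
    if (offset_in_block + sizeof(*de) + de->name_len > DIR_BLOCK_SIZE)
        return 0;
    return 1;
}
```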
3.3.4 Shorn Write + Program Read

The results when a program reads a block previously affected by a shorn write are similar to those for corruption, since shorn writes can be viewed as a special type of corruption. The only exception is the superblock, as it is a small data structure that happened not to be affected by our experiments.

3.3.5 Shorn Write + Fsck

Directory entries and shorn writes. Blocks that contain directory entries are not protected against corruption. Therefore, a shorn write goes undetected and can cause several problems. First, valid entries of the system "disappear" after invoking fsck.f2fs, including the special entries that point to the current directory and its parent. Second, in some cases, we additionally observed that after re-mounting the file system, an attempt to list the contents of a directory resulted in an infinite loop. In both cases, the affected entries were eventually marked as unreachable by fsck.f2fs and were dumped into the lost_found directory. As we have already mentioned, files with identical names in different parts of the directory tree conflict with each other and eventually only one makes it to the lost_found directory. In some cases, fsck.f2fs is not capable of detecting the entire damage a shorn write has caused; we ran into a case where, after remounting the file system, all the entries inside a directory ended up having the same name, eventually becoming completely inaccessible.

3.3.6 Bugs found/reported

We have filed two bug reports related to the issues we have identified around handling corrupted data and one has already resulted in a fix [2, 4]. Moreover, we have reported F2FS's failure to handle write I/O errors [3].
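To make the shorn-write fault mode used in the two preceding subsections concrete, the sketch below models it in user space: the write is acknowledged as successful, yet only the first few 512-byte sectors of a 4 KiB block reach the media. The constants and the file-backed image (SHORN_SECTORS, BLOCK_SIZE, shorn_write) are illustrative assumptions; the actual framework injects this fault at the block layer through a device mapper module rather than a user-space wrapper.

```c
/* Hypothetical user-space model of a "shorn write": the drive acknowledges a
 * full 4 KiB write, but only the first SHORN_SECTORS sectors persist. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SECTOR_SIZE   512
#define BLOCK_SIZE    4096
#define SHORN_SECTORS 3      /* assumed: only 3 of 8 sectors reach the media */

/* Write a 4 KiB block at block index blkno, but persist only the first
 * SHORN_SECTORS sectors; the rest of the on-disk block keeps its old data. */
static int shorn_write(int fd, uint64_t blkno, const uint8_t *buf)
{
    off_t off = (off_t)blkno * BLOCK_SIZE;
    ssize_t n = pwrite(fd, buf, (size_t)SHORN_SECTORS * SECTOR_SIZE, off);
    if (n < 0)
        return -1;
    return 0;   /* report success: the drive acknowledged the full write */
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <image-file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    uint8_t buf[BLOCK_SIZE];
    memset(buf, 0xAB, sizeof(buf));        /* new contents of block 0 */
    if (shorn_write(fd, 0, buf) != 0)      /* only 3 sectors actually land */
        perror("shorn_write");

    close(fd);
    return 0;
}
```

The property that matters is that the file system sees a successful write while only a prefix of the block persists; whatever previously occupied the remaining sectors is what it will read back later.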

4 Related Work

Our work is closest in spirit to the pioneering work by Prabhakaran et al. [45]; however, our focus is very different. While [45] was focused on HDDs, we are specifically interested in SSD-based systems and as such consider file systems with features that are attractive for usage with SSDs, including log-structured and copy-on-write systems. None of the file systems in our study existed at the time [45] was written and they mark a significant departure in terms of design principles compared to the systems in [45]. Also, since we are focused on SSDs, we specifically consider reliability issues that arise from the use of SSDs. Additionally, we provide some extensions to the work in [45], such as exploring whether fsck is able to detect and recover from those issues that the file systems cannot handle during their online operation.

Gunawi et al. [28] make use of static analysis and explore how file systems and storage device drivers propagate error codes. Their results indicate that write errors are neglected more often than read errors. Our study confirms that write errors are still not handled properly in some file systems, especially when lost and shorn writes are considered. In [51], the authors conduct a performance evaluation of a transaction processing system on different file systems, including NILFS2. In [42], the authors explore how existing file systems developed for different operating systems behave with respect to features such as crash resilience and performance; nonetheless, the provided experimental results only present the performance of read and write operations. Recently, two new studies presented their reliability analysis of file systems in a context other than the local file system. Ganesan et al. [25] present their analysis of how modern distributed storage systems behave in the presence of file-system faults. Cao et al. [20] present their study on the reliability of high-performance parallel file systems. In contrast, in our work, we focus on local file systems and explore the effect of SSD-related errors.

Finally, different techniques involving hardware or modifications inside the FTL have been proposed to mitigate existing errors inside SSDs [14, 15, 22, 35–37, 44].

5 Implications

• ext4 has significantly improved over ext3 in both detection and recovery from data corruption and I/O injection errors. Our extensive test suite generates only minor errors or data losses in the file system, in stark contrast with [45], where ext3 was reported to silently discard write errors.

• On the other hand, Btrfs, which is a production-grade file system with advanced features like snapshots and cloning, has good failure detection mechanisms, but is unable to recover from errors that affect its key data structures, partially due to disabling metadata replication when deployed on SSDs.

• F2FS has the weakest detection of the various errors our framework emulates. We observe that F2FS consistently fails to detect and report any write errors, regardless of the operation that encounters them and the affected data structure. It also does not detect many corruption scenarios. The result can be as severe as data loss or even an unmountable file system. We have filed 3 bug reports; 1 has already been fixed and the other 2 are currently under development.

• File systems do not always make use of the existing redundancy. For example, Btrfs maintains two independent data structures for each directory entry for enhanced performance, but upon failure of one, does not use the other for recovery.

• We notice potentially fatal omissions in error detection and recovery for all file systems except for ext4. This is concerning since technology trends, such as continually growing SSD capacities and increasing densities with QLC drives coming on the market, all point towards increasing rather than decreasing SSD error rates in the future. In particular, flash-focused file systems such as F2FS, whose focus has long been on performance optimization, need a stronger emphasis on reliability if they are to become a serious contender to ext4.

• File systems should make every effort to verify the correctness of metadata through sanity checks, especially when the metadata is not protected by a mechanism such as checksums. The most mature file system in our study, ext4, does a significantly more thorough job at sanity checks than, for example, F2FS, which has room for improvement. There have also been recent efforts towards this direction in the context of a popular enterprise file system [32].

• Checksums can be a double-edged sword. While they help increase error detection, coarse-granularity checksums can lead to severe data loss. For instance, manipulating even one byte of a checksummed file system unit leads to the entire unit being discarded in the case of Btrfs. Ideally, a directory-level or file-level checksum, which discards only a single entity instead of all co-located files and directories, should be implemented. A step in this direction is File-Level Integrity proposed for Android [5, 24]. The tradeoff of adding fine-grained checksums is the space and performance overhead, since a checksum protects a single inode instead of a block of inodes. Finally, note that checksums can only help with detecting corruption, but not with recovery (ideally a file system can both detect corruption and recover from it). These points have to be considered together when implementing checksums inside the file system.

• One might wonder whether added redundancy as described in the IRON file systems paper [45] might resolve many of the issues we observe. We hypothesize that for flash-based systems, redundancy can be less effective in those (less likely) cases where both the primary and replica blocks land in the same fault domain (same erase block or same flash chip), after being written together within a short time interval. Even though modern flash devices keep multiple erase blocks open and stripe incoming writes among them for throughput, this does not preclude the scenario where both copies land in the same fault domain.

• Maybe not surprisingly, a few key data structures (e.g. the journal's superblock in ext4, the root directory inode in ext4 and F2FS, the root node of fstree in Btrfs) are responsible for the most severe failures, usually when affected by a silent fault (e.g. silent corruption or a silently dropped write). It might be worthwhile to perform a series of sanity checks for such key data structures before persisting them to the SSD, e.g. during an unmount operation (a minimal sketch of such a check is shown below).
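The following is a minimal sketch of the kind of pre-persist sanity check suggested in the last bullet, applied to a toy superblock before it is written out at unmount. The structure, fields, and invariants (toy_sb, TOY_MAGIC, sb_sane, persist_sb) are hypothetical; a real file system would check far more invariants and would do so on its own metadata structures.

```c
/* Hypothetical sketch of a pre-persist sanity check for a key metadata
 * block: before writing a (toy) superblock at unmount, verify a few cheap
 * invariants and refuse to persist it if any fail. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct toy_sb {
    uint32_t magic;          /* expected constant */
    uint32_t block_size;     /* must be a supported power of two */
    uint64_t block_count;    /* must be non-zero */
    uint64_t root_inode;     /* must lie within the inode table */
    uint64_t inode_count;
};

#define TOY_MAGIC 0xF2F52019u

static bool sb_sane(const struct toy_sb *sb)
{
    if (sb->magic != TOY_MAGIC)
        return false;
    if (sb->block_size != 4096 && sb->block_size != 8192)
        return false;
    if (sb->block_count == 0)
        return false;
    if (sb->root_inode == 0 || sb->root_inode >= sb->inode_count)
        return false;
    return true;
}

/* Called on the in-memory copy right before it is written out (e.g. at
 * unmount); returns -1 instead of persisting a structure that would make
 * the next mount fail. */
static int persist_sb(const struct toy_sb *sb)
{
    if (!sb_sane(sb)) {
        fprintf(stderr, "refusing to persist corrupt superblock\n");
        return -1;
    }
    /* ... write sb to stable storage here ... */
    return 0;
}

int main(void)
{
    struct toy_sb ok  = { TOY_MAGIC, 4096, 1 << 20, 2, 1 << 16 };
    struct toy_sb bad = { TOY_MAGIC, 1234, 1 << 20, 2, 1 << 16 };
    printf("ok:  %d\n", persist_sb(&ok));
    printf("bad: %d\n", persist_sb(&bad));
    return 0;
}
```

Because the check runs on the in-memory copy immediately before it is persisted, it can stop a silently corrupted key structure from reaching the SSD in the first place.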
6 Limitations and Future Work

Some of the fault types we explore in our study are based on SSD models that are several years old by now, whose internal behavior could have changed since then. However, we observe that some issues are inherent to flash and therefore likely to persist in new generations of drives, such as retention and disturb errors, which will manifest as read errors at the file system level. The manifestation of other faults, e.g. those related to firmware bugs or changes in page and block size, might vary for future drive models. Our tool is configurable and can be extended to test new error patterns.

File systems must remain consistent in the face of different types of faults. As part of future work, we plan to extend our device mapper module to emulate additional fault modes, such as timeouts. Additionally, our work can be expanded to include additional file systems, such as XFS, NTFS and ZFS. Finally, another extension to our work could be exploring how file systems respond to timing anomalies such as those described in [26], where I/Os related to some blocks can become slower, or the whole drive is slow.

Acknowledgements

We thank our USENIX ATC '19 reviewers and our shepherd, Theodore Ts'o, for their detailed feedback and valuable suggestions.

References

[1] Btrfs Bug Report. https://bugzilla.kernel.org/show_bug.cgi?id=198457.

[2] F2FS Bug Report. https://bugzilla.kernel.org/show_bug.cgi?id=200635.

[3] F2FS Bug Report - Write I/O Errors. https://bugzilla.kernel.org/show_bug.cgi?id=200871.

[4] F2FS Patch File. https://sourceforge.net/p/linux-f2fs/mailman/message/36402198/.

[5] fs-verity: File System-Level Integrity Protection. https://www.spinics.net/lists/linux-fsdevel/msg121182.html. [Online; accessed 06-Jan-2019].

[6] Github code repository. https://github.com/uoftsystems/dm-inject.

[7] Btrfs mkfs manpage. https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs, 2019. [Online; accessed 06-Jan-2019].

[8] Nitin Agrawal, Vijayan Prabhakaran, Ted Wobber, John D Davis, Mark S Manasse, and Rina Panigrahy. Design tradeoffs for SSD performance. In USENIX Annual Technical Conference (ATC '08), volume 57, 2008.

[9] Lakshmi N Bairavasundaram, Andrea C Arpaci-Dusseau, Remzi H Arpaci-Dusseau, Garth R Goodson, and Bianca Schroeder. An Analysis of Data Corruption in the Storage Stack. ACM Transactions on Storage (TOS), 4(3):8, 2008.

[10] Hanmant P Belgal, Nick Righos, Ivan Kalastirsky, Jeff J Peterson, Robert Shiner, and Neal Mielke. A new reliability model for post-cycling charge retention of flash memories. In Proceedings of the 40th Annual International Reliability Physics Symposium, pages 7–20. IEEE, 2002.

[11] Matias Bjørling, Javier Gonzalez, and Philippe Bonnet. LightNVM: The Linux Open-Channel SSD Subsystem. In Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST '17), pages 359–374, Santa Clara, CA, 2017. USENIX Association.

[12] Simona Boboila and Peter Desnoyers. Write Endurance in Flash Drives: Measurements and Analysis. In Proceedings of the 8th USENIX Conference on File and Storage Technologies (FAST '10), pages 115–128. USENIX Association, 2010.

[13] Adam Brand, Ken Wu, Sam Pan, and David Chin. Novel read disturb failure mechanism induced by FLASH cycling. In Proceedings of the 31st Annual International Reliability Physics Symposium, pages 127–132. IEEE, 1993.
[14] Yu Cai, Saugata Ghose, Erich F Haratsch, Yixin Luo, and Onur Mutlu. Error characterization, mitigation, and recovery in flash-memory-based solid-state drives. Proceedings of the IEEE, 105(9):1666–1704, 2017.

[15] Yu Cai, Saugata Ghose, Yixin Luo, Ken Mai, Onur Mutlu, and Erich F Haratsch. Vulnerabilities in MLC NAND flash memory programming: experimental analysis, exploits, and mitigation techniques. In 23rd International Symposium on High-Performance Computer Architecture (HPCA), pages 49–60. IEEE, 2017.

[16] Yu Cai, Erich F Haratsch, Onur Mutlu, and Ken Mai. Error patterns in MLC NAND flash memory: Measurement, Characterization, and Analysis. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 521–526. EDA Consortium, 2012.

[17] Yu Cai, Yixin Luo, Erich F Haratsch, Ken Mai, and Onur Mutlu. Data retention in MLC NAND flash memory: Characterization, optimization, and recovery. In 21st International Symposium on High Performance Computer Architecture (HPCA), pages 551–563. IEEE, 2015.

[18] Yu Cai, Onur Mutlu, Erich F Haratsch, and Ken Mai. Program interference in MLC NAND flash memory: Characterization, modeling, and mitigation. In 31st International Conference on Computer Design (ICCD), pages 123–130. IEEE, 2013.

[19] Yu Cai, Gulay Yalcin, Onur Mutlu, Erich F Haratsch, Adrian Cristal, Osman S Unsal, and Ken Mai. Flash correct-and-refresh: Retention-aware error management for increased flash memory lifetime. In 30th International Conference on Computer Design (ICCD), pages 94–101. IEEE, 2012.

[20] Jinrui Cao, Om Rameshwar Gatla, Mai Zheng, Dong Dai, Vidya Eswarappa, Yan Mu, and Yong Chen. PFault: A General Framework for Analyzing the Reliability of High-Performance Parallel File Systems. In Proceedings of the 2018 International Conference on Supercomputing, pages 1–11. ACM, 2018.

[21] Paolo Cappelletti, Roberto Bez, Daniele Cantarelli, and Lorenzo Fratin. Failure mechanisms of Flash cell in program/erase cycling. In Proceedings of the IEEE International Electron Devices Meeting, pages 291–294. IEEE, 1994.

[22] Feng Chen, David A. Koufaty, and Xiaodong Zhang. Understanding Intrinsic Characteristics and System Implications of Flash Memory Based Solid State Drives. In Proceedings of the 2009 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '09), pages 181–192, 2009.

[23] Robin Degraeve, F Schuler, Ben Kaczer, Martino Lorenzini, Dirk Wellekens, Paul Hendrickx, Michiel van Duuren, GJM Dormans, Jan Van Houdt, L Haspeslagh, et al. Analytical percolation model for predicting anomalous charge loss in flash memories. IEEE Transactions on Electron Devices, 51(9):1392–1400, 2004.

[24] Jake Edge. File-level Integrity. https://lwn.net/Articles/752614/, 2018. [Online; accessed 06-Jan-2019].

[25] Aishwarya Ganesan, Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions. In Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST '17), pages 149–166, Santa Clara, CA, 2017. USENIX Association.

[26] L. M. Grupp, A. M. Caulfield, J. Coburn, S. Swanson, E. Yaakobi, P. H. Siegel, and J. K. Wolf. Characterizing Flash Memory: Anomalies, Observations, and Applications. In 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 24–33, Dec 2009.

[27] Laura M Grupp, John D Davis, and Steven Swanson. The bleak future of NAND flash memory. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST '12). USENIX Association, 2012.

[28] Haryadi S. Gunawi, Cindy Rubio-González, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Ben Liblit. EIO: Error Handling is Occasionally Correct. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST '08), pages 14:1–14:16, San Jose, CA, 2008.

[29] S Hur, J Lee, M Park, J Choi, K Park, K Kim, and K Kim. Effective program inhibition beyond 90nm NAND flash memories. Proc. NVSM, pages 44–45, 2004.

[30] Seok Jin Joo, Hea Jong Yang, Keum Hwan Noh, Hee Gee Lee, Won Sik Woo, Joo Yeop Lee, Min Kyu Lee, Won Yol Choi, Kyoung Pil Hwang, Hyoung Seok Kim, et al. Abnormal disturbance mechanism of sub-100 nm NAND flash memory. Japanese Journal of Applied Physics, 45(8R):6210, 2006.

[31] Myoungsoo Jung and Mahmut Kandemir. Revisiting Widely Held SSD Expectations and Rethinking System-level Implications. In Proceedings of the 2013 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '13), pages 203–216, 2013.

[32] Harendra Kumar, Yuvraj Patel, Ram Kesavan, and Sumith Makam. High Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System. In Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST '17), pages 197–212, Santa Clara, CA, 2017. USENIX Association.

[33] Changman Lee, Dongho Sim, Jooyoung Hwang, and Sangyeun Cho. F2FS: A New File System for Flash Storage. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST '15), pages 273–286, Santa Clara, CA, 2015. USENIX Association.

[34] Jae-Duk Lee, Chi-Kyung Lee, Myung-Won Lee, Han-Soo Kim, Kyu-Charn Park, and Won-Seong Lee. A new programming disturbance phenomenon in NAND flash memory by source/drain hot-electrons generated by GIDL current. In 21st Non-Volatile Semiconductor Memory Workshop (NVSMW 2006), pages 31–33. IEEE, 2006.

[35] Ren-Shuo Liu, Chia-Lin Yang, and Wei Wu. Optimizing NAND flash-based SSDs via retention relaxation. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST '12), page 11, San Jose, CA, 2012. USENIX Association.

[36] Yixin Luo, Saugata Ghose, Yu Cai, Erich F Haratsch, and Onur Mutlu. HeatWatch: Improving 3D NAND Flash Memory Device Reliability by Exploiting Self-Recovery and Temperature Awareness. In 24th International Symposium on High Performance Computer Architecture (HPCA), pages 504–517. IEEE, 2018.

[37] Yixin Luo, Saugata Ghose, Yu Cai, Erich F. Haratsch, and Onur Mutlu. Improving 3D NAND Flash Memory Lifetime by Tolerating Early Retention Loss and Process Variation. Proceedings of the 2018 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '18), 2(3):37:1–37:48, December 2018.

[38] Avantika Mathur, Mingming Cao, Suparna Bhattacharya, Andreas Dilger, Alex Tomas, and Laurent Vivier. The New ext4 Filesystem: Current Status and Future Plans. In Proceedings of the Linux Symposium, volume 2, pages 21–33, 2007.

[39] Justin Meza, Qiang Wu, Sanjev Kumar, and Onur Mutlu. A Large-Scale Study of Flash Memory Failures in the Field. In Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '15), pages 177–190, 2015.

[40] Neal Mielke, Hanmant P Belgal, Albert Fazio, Qingru Meng, and Nick Righos. Recovery Effects in the Distributed Cycling of Flash Memories. In Proceedings of the 44th Annual International Reliability Physics Symposium, pages 29–35. IEEE, 2006.

[41] Neal Mielke, Todd Marquart, Ning Wu, Jeff Kessenich, Hanmant Belgal, Eric Schares, Falgun Trivedi, Evan Goodness, and Leland R Nevill. Bit error rate in NAND flash memories. In Proceedings of the 46th Annual International Reliability Physics Symposium, pages 9–19. IEEE, 2008.

[42] Keshava Munegowda, GT Raju, and Veera Manikandan Raju. Evaluation of file systems for solid state drives. In Proceedings of the Second International Conference on Emerging Research in Computing, Information, Communication and Applications, pages 342–348, 2014.

[43] Iyswarya Narayanan, Di Wang, Myeongjae Jeon, Bikash Sharma, Laura Caulfield, Anand Sivasubramaniam, Ben Cutler, Jie Liu, Badriddine Khessib, and Kushagra Vaid. SSD Failures in Datacenters: What? When? And Why? In Proceedings of the 9th ACM International Systems and Storage Conference (SYSTOR '16), pages 7:1–7:11, 2016.

[44] Nikolaos Papandreou, Thomas Parnell, Haralampos Pozidis, Thomas Mittelholzer, Evangelos Eleftheriou, Charles Camp, Thomas Griffin, Gary Tressler, and Andrew Walls. Using Adaptive Read Voltage Thresholds to Enhance the Reliability of MLC NAND Flash Memory Systems. In Proceedings of the 24th Edition of the Great Lakes Symposium on VLSI (GLSVLSI '14), pages 151–156, 2014.

[45] Vijayan Prabhakaran, Lakshmi N. Bairavasundaram, Nitin Agrawal, Haryadi S. Gunawi, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. IRON File Systems. In Proceedings of the Twentieth ACM Symposium on Operating Systems Principles (SOSP '05), pages 206–220, Brighton, United Kingdom, 2005.

[46] Ohad Rodeh, Josef Bacik, and Chris Mason. BTRFS: The Linux B-Tree Filesystem. ACM Transactions on Storage (TOS), 9(3):1–32, August 2013.

[47] Marco AA Sanvido, Frank R Chu, Anand Kulkarni, and Robert Selinger. NAND flash memory and its role in storage architectures. Proceedings of the IEEE, 96(11):1864–1874, 2008.

[48] Bianca Schroeder, Raghav Lagisetty, and Arif Merchant. Flash Reliability in Production: The Expected and the Unexpected. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST '16), pages 67–80, Santa Clara, CA, 2016. USENIX Association.

[49] Kang-Deog Suh, Byung-Hoon Suh, Young-Ho Lim, Jin-Ki Kim, Young-Joon Choi, Yong-Nam Koh, Sung-Soo Lee, Suk-Chon Kwon, Byung-Soon Choi, Jin-Sun Yum, et al. A 3.3 V 32 Mb NAND flash memory with incremental step pulse programming scheme. IEEE Journal of Solid-State Circuits, 30(11):1149–1156, 1995.

[50] Hung-Wei Tseng, Laura Grupp, and Steven Swanson. Understanding the Impact of Power Loss on Flash Memory. In Proceedings of the 48th Design Automation Conference (DAC '11), pages 35–40, San Diego, CA, 2011.

[51] Yongkun Wang, Kazuo Goda, Miyuki Nakano, and Masaru Kitsuregawa. Early experience and evaluation of file systems on SSD with database applications. In 5th International Conference on Networking, Architecture, and Storage (NAS), pages 467–476. IEEE, 2010.

[52] Mai Zheng, Joseph Tucek, Feng Qin, and Mark Lillibridge. Understanding the Robustness of SSDs Under Power Fault. In Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST '13), pages 271–284, San Jose, CA, 2013. USENIX Association.

[53] Mai Zheng, Joseph Tucek, Feng Qin, Mark Lillibridge, Bill W. Zhao, and Elizabeth S. Yang. Reliability Analysis of SSDs Under Power Fault. ACM Transactions on Computer Systems (TOCS), 34(4):1–28, November 2016.
