<<

Designing a True Direct-Access File System with DevFS

Sudarsun Kannan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison; Yuangang Wang, Jun Xu, and Gopinath Palani, Huawei Technologies https://www.usenix.org/conference/fast18/presentation/kannan

This paper is included in the Proceedings of the 16th USENIX Conference on File and Storage Technologies. February 12–15, 2018 • Oakland, CA, USA ISBN 978-1-931971-42-3


Abstract

We present DevFS, a direct-access file system embedded completely within a storage device. DevFS provides direct, concurrent access without compromising file system integrity, crash consistency, and security. A novel reverse-caching mechanism enables the usage of host memory for inactive objects, thus reducing memory load upon the device. Evaluation of an emulated DevFS prototype shows more than 2x higher I/O throughput with direct access and up to a 5x reduction in device RAM utilization.

1 Introduction

The world of storage, after decades of focus on hard-drive technologies, is finally opening up towards a new era of fast solid-state storage devices. Flash-based SSDs have become standard technology, forming a new performance tier in the modern datacenter [7, 32]. New, faster flash memory technologies such as NVMe [20] and storage class memory (SCM) such as Intel's 3D X-point [1] promise to revolutionize how we access and store persistent data [10, 13, 50, 53]. State-of-the-art flash memory technologies have reduced storage-access latency to tens of microseconds compared to milliseconds in the hard-drive era [34, 52, 58].

To fully realize the potential of these storage devices, a careful reconsideration of the software storage stack is required. The traditional storage stack requires applications to trap into the OS and interact with multiple software layers such as the in-memory buffer cache, file system, and device drivers. While spending millions of cycles is not a significant problem for slow storage devices such as hard drives [3, 13, 58], for modern ultra-fast storage, software interactions substantially amplify access latencies, thus preventing applications from exploiting hardware benefits [3, 9, 34, 50]. Even the simple act of trapping into and returning from the OS is too costly for modern storage hardware [14, 49, 58].

To reduce OS-level overheads and provide direct storage access for applications, prior work such as Arrakis [34], Moneta-D [8], Strata [26], and others [20, 49, 50] split the file system into user-level and kernel-level components. The user-level component handles all data-plane operations (thus bypassing the OS), and the trusted kernel is used only for control-plane operations such as permission checking. However, prior approaches fail to deliver several important file-system properties. First, using untrusted user-level libraries to maintain file system metadata shared across multiple applications can seriously compromise file-system integrity and crash consistency. Second, unlike user-level networking [51], in file systems, data-plane operations (e.g., read or write to a file) are closely intertwined with control-plane operations (e.g., block allocation); bypassing the OS during data-plane operations can compromise the security guarantees of a file system. Third, most of these approaches require OS support when sharing data across applications even for data-plane operations.

To address these limitations, and realize a true user-level direct-access file system, we propose DevFS, a device-level file system inside the storage hardware. The DevFS design uses the compute capability and device-level RAM to provide applications with a high-performance direct-access file system that does not compromise integrity, concurrency, crash consistency, or security. With DevFS, applications use a traditional POSIX interface without trapping into the OS for control-plane and data-plane operations. In addition to providing direct storage access, a file system inside the storage hardware provides direct visibility to hardware features such as device-level capacitance and support for processing data from multiple I/O queues. With capacitance, DevFS can safely commit data even after a system crash and also reduce file system overhead for supporting crash consistency. With knowledge of multiple I/O queues, DevFS can increase file system concurrency by providing each file with its own I/O queue and journal.

A file system inside device hardware also introduces new challenges. First, even modern SSDs have limited RAM capacity due to cost ($/GB) and power constraints. In DevFS, we address this dilemma by introducing reverse caching, an approach that aggressively moves inactive file system data structures off the device to the host memory. Second, a file system inside a device is a separate runtime and lacks visibility to OS state (such as credentials) required for secure file access. To overcome this limitation, we extend the OS to coordinate with DevFS: the OS performs down-calls and shares process-level credentials without impacting direct storage access for applications.

To the best of our knowledge, DevFS is the first design to explore the benefits and implications of a file system inside the device to provide direct user-level access to applications. Due to a lack of real hardware, we implement and emulate DevFS at the device-driver level. Evaluation of benchmarks on the emulated DevFS prototype with direct storage access shows more than 2x higher write and 1.6x higher read throughput as compared to a kernel-level file system. DevFS memory-reduction techniques reduce file system memory usage by up to 5x. Evaluation of a real-world application, Snappy compression [11], shows 22% higher throughput.

In Section 2, we first categorize file systems, and then discuss the limitations of state-of-the-art user-level file systems. In Section 3, we make a case for a device-level file system. In Section 4, we detail the DevFS design and implementation, followed by experimental evaluations in Section 5. In Section 6, we describe the related literature, and finally present our conclusions in Section 7.

2 Motivation

Advancements in storage hardware performance have motivated the need to bypass the OS stack and provide applications with direct access to storage. We first discuss hardware and software trends, followed by a brief history of user-level file systems and their limitations.

2.1 H/W and S/W for User-Level Access

Prior work has explored user-level access for PCI-based solid state drives (SSD) and nonvolatile memory technologies.

Solid-state drives. Solid-state drives (SSD) have become the de facto storage device for consumer electronics as well as enterprise computing. As SSDs have evolved, their bandwidth has significantly increased along with a reduction in access latencies [6, 57]. To address system-to-device interface bottlenecks, modern SSDs have switched to a PCIe-based interface that can support up to 8-16 GB/s maximum throughput and 20-50 µs access latencies. Further, these modern devices use a large pool of I/O queues to which software can concurrently submit requests for higher parallelism.

With advancements in SSD hardware performance, bottlenecks have shifted to software. To reduce software overheads on the data path and exploit device-level parallelism, new standards such as NVMe [54] have been adopted. NVMe allows software to bypass OS-level software and directly program device registers with simple commands for reading and writing the device [18].

Storage class memory technologies. Storage class memory (SCM), such as Intel's 3D Xpoint [1] and HP's memristors, are an emerging class of nonvolatile memory (NVM) that provide byte-addressable persistence and provide access via the memory controller. SCMs have properties that resemble DRAM more than a block device. SCMs can provide 2-4x higher capacity than DRAMs, with variable read (100-200 ns) and write (400-800 ns) latencies. SCM bandwidth ranges from 8 GB/s to 20 GB/s, which is significantly faster than state-of-the-art SSDs. Importantly, reads (loads) and writes (stores) to SCMs happen via the processor cache, which plays a vital role in application performance.

Several software solutions that include kernel-level file systems [10, 13, 55, 56], user-level file systems [49], and libraries [17, 50] have been proposed for SCM. Kernel-level file systems retain the POSIX-based block interface and focus on thinning the OS stack by replacing page cache and block layers with simple byte-level load and store operations. Alternative approaches completely bypass the kernel by using an object-based or POSIX-compatible interface over memory-mapped files [49].

2.2 File System Architectures

We broadly categorize file systems into three types: (1) kernel-level file systems, (2) hybrid user-level file systems, and (3) true user-level direct-access file systems. Figure 1 shows these categories, and how control-plane and data-plane operations are managed by each. The figure additionally shows a hybrid user-level file system with a trusted server and a Fuse-based file system.

Kernel-level traditional file systems. Kernel-level file systems act as a central entity managing data and metadata operations as well as control-plane operations [13, 28, 56]. As shown in Figure 1, kernel-level file systems preserve the integrity of metadata and provide crash consistency. Applications using kernel-level file systems trap into the OS for both data-plane and control-plane operations.

Hybrid user-level file systems. To allow applications to access storage hardware directly without trapping into the kernel, a class of research file systems [8, 26, 34] split the file system across user and kernel space. In this paper, we refer to these as hybrid user-level file systems.


Figure 1: File system categories. (a) shows a kernel-level file system, which manages control-plane and data-plane operations. (b) shows a hybrid user-level file system in which a user library manages data-plane operations. (c) shows a hybrid user-level file system with a trusted server. The server partially manages control-plane operations. (d) shows a Fuse-based file system. (e) shows a true user-level direct-access file system inside the device. The device file system fully manages the control-plane and data-plane.

As shown in Figure 1.b, the user-level file system manages all data and metadata updates without trapping into the kernel for common-case reads and writes. The kernel file system is used only for control-plane operations such as permission checks, security, and data sharing across applications.

More specifically, Arrakis [34] is a hybrid user-level file system that proposes a generic user-level I/O framework for both network and storage. Arrakis aims to realize the ideas of U-Net [51] for modern hardware by virtualizing storage for each application and managing all data-plane operations at user-level, but does trap into the OS for control-plane operations. Strata [26], a hybrid user-level file system designed to combine ultra-fast NVM with high-capacity SSD and hard disk, uses NVM as a memory-mapped user-space log and writes an application's data-plane operations to a log. A background thread uses a kernel-level file system to digest the logs to SSD or hard disk. For sharing files across processes, Strata traps into the kernel-level file system, which coordinates concurrent data and metadata updates.

Moneta-D [8] is a hybrid user-level file system that customizes SSDs to provide direct access for data-plane operations. Moneta-D virtualizes an I/O interface (I/O channel) instead of storage to provide isolation and concurrency. Metadata operations are split between user-level and kernel-level. Operations such as file creation and size extension happen inside the kernel. Moneta-D enforces permission checks for data-plane operations with a user-level driver that reads a file's permissions and stores them in hardware registers; during an I/O operation, the driver compares the hardware register values with process credentials.

Finally, TxDev [37] proposes a transactional flash system in which each process encapsulates its updates into a transaction request, and the flash device serializes and atomically commits the transactions. While TxDev can reduce the overheads of transactions in either user-level or kernel-level file systems, the resulting system has the same structural advantages and disadvantages of other hybrid user-level file systems.

Hybrid file systems with trusted server. Another class of hybrid file systems, such as Aerie [49], aims to reduce kernel overheads for control-plane operations by using a trusted user-level third-party server similar to a microkernel design [30] (see Figure 1.c). The trusted server runs in a separate address space and facilitates control-plane operations such as permission checking and data sharing; the server also interacts with the OS for other privileged operations.

Fuse-based user-level file systems. Another class of user-level file systems, widely known as Fuse [38, 47], is mainly used for customizing and extending the in-kernel file system. As shown in Figure 1.d, in Fuse, the file system is split across a kernel driver and a user-level daemon. All I/O operations trap into the kernel, and the kernel driver simply queues I/O requests for the custom user-level file system daemon to process requests and return control to the application via the driver; as a result, Fuse file systems add an extra kernel trap for all I/O operations. Because we focus on direct-access storage solutions, we do not study Fuse in the rest of this paper.

True direct-access user-level file system. In this paper, we propose DevFS, a true user-level direct-access file system as shown in Figure 1.e. DevFS pushes the file system into the device, thus allowing user-level libraries and applications to access storage without trapping into the OS for both control-plane and data-plane operations.

2.3 Challenges

Current state-of-the-art hybrid user-level file systems fail to satisfy three important properties – integrity, crash consistency, and permission enforcement – without trading away direct storage access. We discuss the challenges in satisfying these properties while providing direct access next.

2.3.1 File System Integrity

Maintaining file system integrity is critical for correct behavior of a file system. In traditional file systems, only the trusted kernel manages and updates both in-memory and persistent metadata. However, satisfying file system integrity is hard with a hybrid user-level file system for the following reasons.

Single process. In hybrid user-level file systems such as Arrakis and Moneta-D, each application uses an instance of a file system library that manages both data and metadata. Consider an example of appending a block to a file: the library must allocate a free block, update the bitmap, and update the inode inside a transaction. A buggy or malicious application sharing the address space with the file system library can easily bypass or violate the transaction and incorrectly update the metadata; as a result, the integrity of the file system is compromised. An alternative approach is to use a trusted user-level server as in Aerie [49]. However, because applications and the user-level server run in different address spaces, applications must context-switch even for data-plane operations, thus reducing the benefits of direct storage access [58]. The metadata integrity problem cannot be solved by using TxDev (a transactional flash) in a hybrid user-level file system because TxDev cannot verify the contents of transactions composed by an untrusted user-level library.

Concurrent access and sharing. Maintaining integrity with hybrid user-level file systems is more challenging when applications concurrently access the file system or share data. Updates to in-memory and on-disk metadata must be serialized and ordered across all library instances with some form of shared user-level locking and transactions across libraries. However, a malicious or buggy application can easily bypass the lock or the transaction to update metadata or data, which can lead to an incorrect file system state [25]. Prior systems such as Arrakis [34] and Strata [26] sidestep this problem by trapping into the OS for concurrent file-system access (common-case) and concurrent file access (rare). In contrast, approaches such as Aerie suffer from the context-switch problem.

2.3.2 Crash Consistency

A system can crash or lose power before all in-memory metadata is persisted to storage, resulting in arbitrary file system state such as a persisted inode without its pointed-to data [4, 35, 36]. To provide crash consistency, kernel file systems carefully orchestrate the order of metadata and data updates. For example, in an update transaction, data blocks are first flushed to a journal, followed by the metadata blocks, and finally, a transaction commit record is written to the journal; at some point, the log updates are checkpointed to the original data and metadata locations to free space in the log.

For user-level file systems, every application's untrusted library instance must provide crash consistency, which is challenging for the following reasons. First, if even a single library or application violates the ordering protocol, the file system cannot recover to a consistent state after a crash. Second, with concurrent file system access, transactions across libraries must be ordered; as discussed earlier, serializing updates with user-level locking is ineffective and can easily violate crash consistency guarantees. While a trusted third-party server can enforce ordering, applications suffer from context switches and thus do not achieve the goal of direct access.

2.3.3 Permission Enforcement

Enforcing permission checks for both the control-plane and data-plane is critical for file system security. In a kernel-level file system, when a process requests an I/O operation, the file system uses OS-level process credentials and compares them with the corresponding file (inode) permissions. Hybrid user-level file systems [34] use the trusted OS for permission checks only for control-plane operations, and bypass the checks for common-case data-plane operations. Avoiding permission checks for data-plane operations violates security guarantees, specifically when multiple applications share a file system.
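To make the integrity and crash-consistency risks above concrete, the following sketch (in C) shows the append path of a hypothetical hybrid user-level file system library; all names (fs_lib_append, fs_tx_begin, and so on) are invented for illustration and do not correspond to Arrakis, Moneta-D, or Strata. Because the metadata and the transaction machinery live in the application's own address space, a buggy or malicious caller can simply skip them.

/* Illustrative sketch only: a hypothetical user-level file system
 * library whose bitmap and inode metadata live in memory shared with
 * the application. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct fs_metadata {
    uint8_t  block_bitmap[4096];     /* one bit per data block */
    uint64_t inode_size[1024];       /* per-file sizes */
    uint64_t inode_blocks[1024][16]; /* per-file block pointers */
};

static struct fs_metadata meta_storage;
static struct fs_metadata *meta = &meta_storage;

/* Stand-ins for the library's transaction and I/O machinery. */
static void fs_tx_begin(void)  {}
static void fs_tx_commit(void) {}
static void fs_tx_abort(void)  {}
static int  write_block(int blk, const void *buf, size_t len)
{ (void)blk; (void)buf; (void)len; return 0; }

static int alloc_block(void)
{
    for (int i = 0; i < 4096 * 8; i++) {
        if (!(meta->block_bitmap[i / 8] & (1u << (i % 8)))) {
            meta->block_bitmap[i / 8] |= 1u << (i % 8);
            return i;
        }
    }
    return -1;
}

/* The "correct" path: allocate a block, update the bitmap, and update
 * the inode inside one transaction -- but the ordering is enforced
 * only by convention. */
int fs_lib_append(int ino, const void *buf, size_t len)
{
    fs_tx_begin();
    int blk = alloc_block();
    if (blk < 0) { fs_tx_abort(); return -1; }
    meta->inode_blocks[ino][(meta->inode_size[ino] / 4096) % 16] = (uint64_t)blk;
    meta->inode_size[ino] += len;
    fs_tx_commit();
    return write_block(blk, buf, len);
}

/* A buggy or malicious caller can bypass the transaction entirely and
 * scribble on the shared metadata, corrupting state for every process
 * that uses the same library instance. */
void buggy_caller(void)
{
    meta->inode_size[42] = 1ull << 30;   /* no transaction, no lock */
    memset(meta->block_bitmap, 0, sizeof(meta->block_bitmap));
}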

3 The Case For DevFS

In the pursuit of providing direct storage access to user-level applications, prior hybrid approaches fail to satisfy one or more fundamental properties of a file system. To address the limitations, and design a true direct-access file system, we propose DevFS, a design that moves the file system inside the device. Applications can directly access DevFS using a standard POSIX interface. DevFS satisfies the file system integrity, concurrency, crash consistency, and security guarantees of a kernel-level file system. DevFS also supports a traditional file-system hierarchy such as files and directories, and their related functionality instead of primitive read and write operations. DevFS maintains file system integrity and crash consistency because it is trusted code that acts as a central entity. With minimal support and coordination with the OS, DevFS also enforces permission checks for common-case data-plane operations without requiring applications to trap into the kernel.

3.1 DevFS Advantages And Limitations

Moving a file system inside the device provides numerous benefits but also introduces new limitations.

Benefits. An OS-level file system generally views storage as a black box and lacks direct control over many hardware components, such as device memory, I/O queues, power-loss-protection capacitors, and the flash translation layer (FTL). This lack of control results in a number of limitations.

First, even though storage controllers often contain multiple CPUs that can concurrently process requests from multiple I/O queues [20], a host-based user-level or kernel-level file system cannot control how the device CPUs are utilized, the order in which they process requests from queues, or the mapping of queues to elements such as files. However, a device-level file system can redesign file system data structures to exploit hardware-level concurrency for higher performance.

Second, some current devices contain capacitors that can hold power until the device CPUs safely flush all device-memory state to persistent storage in case of an untimely crash [22, 44]. Since software file systems cannot directly use these capacitors, they always use high-overhead journaling or copy-on-write crash consistency techniques. In contrast, a device-level file system can ensure key data structures are flushed if a failure occurs.

Finally, in SSDs and storage class memory technologies, the FTL [21] performs block allocation, logical-to-physical block translation, garbage collection, and wear-leveling, but a software file system must duplicate many of these tasks since it lacks visibility of the FTL. We believe that DevFS provides an opportunity to integrate file system and FTL functionality, but we do not yet explore this idea, leaving it to future work.

Limitations. Moving the file system into the storage device introduces both hardware and software limitations. First, device-level RAM is limited by cost and power consumption; currently device RAM is used mainly by the FTL [16] and thus the amount is proportional to the size of the logical-to-physical block mapping table (e.g., a 512 GB SSD requires 2 GB of RAM). A device-level file system will substantially increase the memory footprint and therefore must strive to reduce its memory usage. Second, the number of CPUs inside a storage device can be limited and slower compared to host CPUs. While the lower CPU count impacts I/O parallelism and throughput, the slower CPUs reduce instructions per cycle (IPC) and thus increase I/O latency. Finally, implementing OS utilities and features, such as deduplication, incremental backup, and virus scans, can be challenging. We discuss these limitations and possible solutions in more detail in § 4.7.

Regarding software limitations, a device-level file system runs in a separate environment from the OS and hence cannot rely on the OS for certain functionality or information. In particular, a device-level file system must manage its own memory and must provide a mechanism to access process credentials from the OS.

3.2 Design Principles

To exploit the benefits and address the limitations of a device-level file system, we formulate the following DevFS design principles.

Principle 1: Disentangle file system data structures to embrace hardware-level parallelism. To utilize the hardware-level concurrency of multiple CPU controllers and thousands of I/O queues from which a device can process I/O requests, DevFS maps each fundamental data unit (i.e., a file) to an independent hardware resource. Each file has its own I/O queue and in-memory journal, which enables concurrent I/O across different files.

Principle 2: Guarantee file system integrity without compromising direct user-level access. To maintain integrity, the DevFS inside the device acts as a trusted central entity and updates file system metadata. To further maintain integrity when multiple processes share data, DevFS shares per-file structures across applications and serializes updates to these structures.

Principle 3: Simplify crash consistency with storage hardware capacitance. Traditional OS-level file systems rely on expensive journaling or log-structured (i.e., copy-on-write) mechanisms to provide crash consistency. While journaling suffers from "double write" [35] costs, log-structured file systems suffer from high garbage-collection overheads [5]. In contrast, DevFS exploits the power-loss-protection capacitors in the hardware to safely update data and metadata in-place without compromising crash consistency. DevFS thus avoids these update-related overheads.

Principle 4: Reduce the device memory footprint of the file system. Unlike a kernel-level file system, DevFS cannot use copious amounts of RAM for its data and metadata structures. To reduce memory usage, in DevFS, only the in-memory data structures (inodes, dentries, per-file structures) of active files are kept in device memory, spilling inactive data structures to host memory.

Principle 5: Enable minimal OS-level state sharing with DevFS. DevFS is a separate runtime and implements its own memory management. Concerning state sharing, because DevFS does not have information about processes, we extend the OS to share process credentials with DevFS. The credentials are used by DevFS for permission checks across control-plane and data-plane operations without forcing applications to trap into the kernel.

4 Design

DevFS provides direct user-level access to storage without trapping into the OS for most of its control-plane and data-plane operations. DevFS does not compromise basic file system abstractions (such as files and directories) and properties such as integrity, concurrency, consistency, and security guarantees. We discuss how DevFS realizes the design principles discussed earlier.
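As a preview of the structures that Section 4.1 and Figure 2 describe, the sketch below shows one plausible C layout of the global and per-file DevFS state. The field names (dentry, inode, queues, mem_journal, disk_journal) follow Figure 2, but the sizes and remaining details are assumptions rather than the actual DevFS definitions.

/* Simplified sketch of the DevFS data-structure split; not the
 * actual DevFS layout. */
#include <stdint.h>

struct devfs_cmd;        /* an I/O command received from the host */
struct devfs_inode;
struct devfs_dentry;

struct devfs_queue {     /* per-file submission/completion queues */
    struct devfs_cmd *sq[64];
    struct devfs_cmd *cq[64];
    uint32_t sq_head, sq_tail;
    uint32_t cq_head, cq_tail;
};

struct devfs_journal {   /* per-file in-memory metadata journal */
    uint8_t  log[4096];  /* 4 KB by default, configurable at mount */
    uint32_t tail;
};

/* Per-file state ("filemap"), created at file create/open. */
struct devfs_filemap {
    struct devfs_dentry  *dentry;
    struct devfs_inode   *inode;
    struct devfs_queue   *queues;        /* this file's I/O queue */
    struct devfs_journal *mem_journal;   /* in-memory journal */
    uint64_t              disk_journal;  /* on-disk journal location */
    /* per-file lock and red-black-tree linkage omitted for brevity */
};

/* Global state shared by the whole file system. */
struct devfs_sb {
    uint64_t block_count;
    uint8_t *block_bitmap;
    struct devfs_inode   *inode_table;
    struct devfs_dentry  *dentry_table;  /* radix tree in DevFS */
    struct devfs_filemap *filemap_tree;  /* red-black tree of open files */
};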

Figure 2: DevFS high-level design. The file system data structure is partitioned into global and per-file structures. The per-file structures are created during file setup. DevFS metadata structures are similar to other kernel-level file systems.

4.1 Disentangling File System Structures

To exploit hardware-level concurrency, DevFS provides each file with a separate I/O queue and journal. DevFS is compatible with the traditional POSIX I/O interface.

Global and per-file structures. In DevFS, the file system data structures are divided into global and per-file structures as shown in Figure 2. The global data structures manage the state of an entire file system, including metadata such as the superblock, inodes, and bitmaps. The per-file structures enable concurrency: given that modern controllers contain up to four CPUs [41], and this amount is expected to increase [19], DevFS attempts to utilize multiple CPUs. In contrast to prior approaches such as Moneta-D that provide each application with its own I/O channel, DevFS provides a per-file I/O queue and journal. DevFS also maintains an in-memory filemap structure for each file. The filemap structure is created during file creation (or during file open if it is not available in device memory) and is maintained in a red-black tree as shown in the figure. Processes sharing a file also share a filemap structure, which serializes access across the per-file I/O queue and the journal.

Most data structures of DevFS are similar to a kernel-level file system. Hence, we reuse and extend in-memory and on-disk data structures from the state-of-the-art persistent memory file system (PMFS) [13]. We use PMFS because it provides direct access to storage, bypassing the file system page cache. Specifically, the DevFS superblock contains global information of a file system, each inode contains per-file metadata and a reference to the per-file memory and disk journal, and finally, directory entries (dentries) are maintained in a radix-tree indexed by hash values of file path names.

File system interface. Unlike prior approaches that expose the storage device as a block device for direct access [34], DevFS supports the POSIX I/O interface and abstractions such as files, directories, etc. Similar to modern NVMe-based devices with direct-access capability, DevFS uses command-based programming. To support POSIX compatibility for applications, a user-level library intercepts I/O calls from applications and converts the I/O calls to DevFS commands. On receipt, the DevFS controller (device CPU) uses the request's file descriptor to move the request to a per-file I/O queue for processing and writing to storage.

4.2 Providing File System Integrity

To maintain integrity, file system metadata is always updated by the trusted DevFS. In contrast to hybrid user-level file systems that allow untrusted user-level libraries to update metadata [34], in DevFS, there is no concern about the legitimacy of metadata content (beyond that caused by bugs in the file system).

When a command is added to a per-file I/O queue, DevFS creates a corresponding metadata log record (e.g., for a file append command, the bitmap and inode block), and adds the log record to a per-file in-memory journal using a transaction. When DevFS commits updates from an in-memory I/O queue to storage, it first writes the data followed by the metadata. Updates to global data structures (such as bitmaps) are serialized using locks.

DevFS supports file sharing across processes without trapping into the kernel. Because each file has separate in-memory structures (i.e., an I/O queue and journal), one approach would be to use separate per-file structures for each instance of an open file and synchronize updates across structures; however, synchronization costs and device-memory usage would increase linearly with the number of processes sharing a file. Hence, DevFS shares in-memory structures across processes and serializes updates using a per-file filemap lock; to order updates, DevFS tags each command with a time-stamp counter (TSC). Applications requiring strict data ordering for shared files could implement custom user-level synchronization at the application level.
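The following sketch illustrates the admission path described above for a write command: take the per-file filemap lock, tag the command with a time-stamp counter, log its metadata to the in-memory journal, and queue it. It is a host-side approximation (hence the pthread lock and the x86 rdtsc), and every function and field name is illustrative rather than the actual DevFS code.

/* Illustrative admission path for a write command; bounds checks and
 * backpressure are omitted for brevity. */
#include <stdint.h>
#include <string.h>
#include <pthread.h>

struct devfs_cmd {
    uint32_t op;            /* e.g., a hypothetical DEVFS_WRITE */
    uint64_t off, len;
    uint64_t tsc;           /* ordering tag */
    const void *payload;
};

struct devfs_filemap {
    pthread_mutex_t lock;   /* serializes processes sharing this file */
    uint8_t  journal[4096]; /* per-file in-memory journal */
    uint32_t journal_tail;
    struct devfs_cmd *queue[64];
    uint32_t q_head, q_tail;
};

static uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Append a metadata record (e.g., bitmap + inode updates for an
 * append) to the per-file in-memory journal. */
static int journal_metadata(struct devfs_filemap *fm,
                            const void *rec, uint32_t len)
{
    if (fm->journal_tail + len > sizeof(fm->journal))
        return -1;          /* journal full: must be flushed first */
    memcpy(fm->journal + fm->journal_tail, rec, len);
    fm->journal_tail += len;
    return 0;
}

int devfs_submit_write(struct devfs_filemap *fm, struct devfs_cmd *cmd,
                       const void *meta_rec, uint32_t meta_len)
{
    pthread_mutex_lock(&fm->lock);        /* per-file filemap lock */
    cmd->tsc = read_tsc();                /* order commands across sharers */
    if (journal_metadata(fm, meta_rec, meta_len) < 0) {
        pthread_mutex_unlock(&fm->lock);
        return -1;
    }
    fm->queue[fm->q_tail++ % 64] = cmd;   /* hand off to this file's queue */
    pthread_mutex_unlock(&fm->lock);
    return 0;
}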

4.3 Simplifying Crash Consistency

DevFS avoids logging to persistent storage by using device capacitors that can hold power until the device controller can safely flush data and metadata to storage. Traditional kernel-level file systems use either journaling or copy-on-write techniques, such as log-structured updates, to provide crash consistency; the benefits and implications of these designs are well documented [5, 40]. Journaling commits data and metadata updates to a persistent log before committing to the original data and metadata location; as a result, journaling suffers from the "double write" problem [40, 56]. The log-structured design avoids double writes by treating an entire file system as a log, appending data and metadata blocks; however, a log-structured design suffers from high garbage-collection costs [40]. DevFS uses device-level capacitance to avoid both the double-write and garbage-collection problems.

Modern enterprise SSDs provide power-loss-protection capacitors inside the device hardware that can hold power until controllers can safely flush the contents of device-level DRAM [22, 44]. In existing systems, the device DRAM primarily contains the FTL's logical-to-physical block translation table, block error correction (ECC) flags, and in-flight data yet to be flushed to storage. Since DevFS runs inside the device, it uses device-level DRAM for all file system data structures. Although the goal of hardware capacitance is to safely flush device in-memory contents to storage, flushing larger amounts of memory would require a more expensive capacitor; in addition, not all DevFS state needs to be made persistent. To minimize the memory state that must be flushed, DevFS leverages its per-file in-memory journals, as shown in Figure 2. As described previously, after an I/O command is added to a device queue, DevFS writes the command's metadata to a per-file in-memory journal. If a power failure or crash occurs, the device capacitors can hold power for controllers to safely commit in-memory I/O queues and journals to storage, thus avoiding journal writes to storage.

Figure 3: DevFS reverse caching design. DevFS keeps only active and essential file system structures in device memory, and reverse caches others to host memory.

Figure 4: Decomposing large structures. A large static in-memory inode is decomposed into a dynamically allocatable device and host structure. The host structure is reverse cached.

4.4 Minimizing Memory Footprint

We next discuss how DevFS manages device memory, followed by three memory reduction techniques. The techniques include on-demand allocation, reverse caching, and a method to decompose inode structures.

DevFS uses its own memory allocator. Unlike the complex-but-generic slab allocator [15], the DevFS allocator is simple and customized to manage only DevFS data structures. In addition to device memory, DevFS reserves and manages a DMA-able region in the host for reverse caching.

In DevFS, there are four types of data structures that dominate memory usage: in-memory inodes, dentries, file pointers, and the DevFS-specific per-file filemap structure. Examining the data structures in detail, we see that each inode, dentry, file pointer, and filemap consumes 840 bytes, 192 bytes, 256 bytes, and 156 bytes respectively. Since inodes are responsible for the most memory usage, we examine them further. We find that 593 bytes (70.5%) of the inode structure are used by generic fields that are frequently updated during file operations; referred to as the VFS inode in other file systems, this includes the inode number, a pointer to its data blocks, permissions, access times, locks, and a reference to the corresponding dentry. The remaining 247 bytes (29.5%) of the inode are used by DevFS-specific fields, which include a reference to the in-memory and on-disk journals, the dentry, the per-file structure, other list pointers, and per-file I/O queues. To reduce device memory usage, we propose the following techniques.

On-demand memory allocation. In a naive DevFS design, the in-memory structures associated with a file, such as the I/O queue, in-memory journal, and filemap, are each allocated when a file is opened or created and not released until a file is deleted; however, these structures are not used until an I/O is performed. For workloads that access a large number of files, device memory consumption can be significant. To reduce memory consumption, DevFS uses on-demand allocation that delays allocation of in-memory structures until a read or write request is initiated; these structures are also aggressively released from device memory when a file is closed. Additionally, DevFS dynamically allocates the per-file I/O queue and memory journal and adjusts their sizes based on the availability of free memory.

Reverse caching metadata structures. In traditional OS-level file systems, the memory used by in-memory metadata structures such as inodes and dentries is a small fraction of the overall system memory; therefore, these structures are cached in memory even after the corresponding file is closed in order to avoid reloading the metadata from disk when the file is re-accessed. However, caching metadata in DevFS can significantly increase memory consumption. To reduce device memory usage, DevFS moves certain metadata structures such as in-memory inodes and dentries to host memory after a file is closed. We call this reverse caching because metadata is moved off the device to the host memory.

Figure 3 shows the reverse caching mechanism. A DMA-able host-memory cache is created when DevFS is initialized. The size of the host cache can be configured when mounting DevFS depending on the availability of free host memory; the cache is further partitioned into inode and dentry regions. After moving an inode or dentry to host memory, all its corresponding references (e.g., the inode list of a directory) are updated to point to the host-memory cache. When a file is re-opened, the cached metadata is moved back to device memory. Directories are reverse-cached only after all files in a directory are also reverse-cached. Note that the host cache contains only inodes and dentries of inactive (closed) files, since deleted files are also released from the host cache. Furthermore, in case of an update to in-memory structures that are reverse-cached, the structures are moved to device memory and the cached structures in host memory are deleted. As a result, reverse caching does not introduce any consistency issues. Although using host memory instead of persistent storage as a cache avoids serializing and deserializing data structures, the overhead of data movement between device and host memory depends on interface bandwidth. The data movement overhead could be further reduced by using incremental (delta-based) copying techniques.

Decomposing file system structures. One problem with reverse caching for a complicated and large structure such as an inode is that some fields are accessed even after a file is closed. For example, a file's inode in the directory list is traversed for search operations or other updates to a directory. Moving these structures back and forth from host memory can incur high overheads. To avoid this movement, we decompose the inode structure into a device-inode and a host-inode structure as shown in Figure 4. The device-inode contains fields that are accessed even after a file is closed, and therefore only the host-inode structure is moved to host memory. Each host inode is approximately 593 bytes of the overall 840 bytes. Therefore, this decomposition along with reverse caching significantly reduces inode memory use.
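The sketch below reconstructs the decomposed inode of Figure 4 and one way the host part could be reverse cached on close. The devfs_inode_info, inode_device, and inode_host names come from the figure; host_cache_alloc() and the copy logic are assumed helpers, not the actual implementation.

/* Sketch of the decomposed inode and reverse caching of its host
 * part; assumes a host_cache_alloc() helper over the DMA-able region
 * reserved at mount time. */
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/list.h>
#include <linux/string.h>
#include <linux/radix-tree.h>

struct devfs_inode_device {            /* always kept in device memory */
    struct list_head inode_list;       /* parent directory list */
    struct radix_tree_root page_tree;  /* radix tree of the file's pages */
    void *mem_journal;                 /* per-file journals */
    void *disk_journal;
};

struct devfs_inode_host {              /* moved to the host cache on close */
    struct inode vfs_inode;            /* generic, frequently updated fields */
};

struct devfs_inode_info {
    struct devfs_inode_device *inode_device;
    struct devfs_inode_host   *inode_host;   /* may point into host cache */
};

/* Assumed allocator over the reserved host-memory cache. */
void *host_cache_alloc(size_t size);

/* On close: copy the host part into the host cache and free the
 * device-resident copy, leaving the device part untouched. */
static int devfs_reverse_cache_inode(struct devfs_inode_info *di)
{
    struct devfs_inode_host *cached = host_cache_alloc(sizeof(*cached));

    if (!cached)
        return -ENOMEM;
    memcpy(cached, di->inode_host, sizeof(*cached));
    kfree(di->inode_host);
    di->inode_host = cached;           /* references now point to host memory */
    return 0;
}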
forth from host memory can incur high overheads. To These steps allow DevFS to safely execute requests in avoid this movement, we decompose the inode structure the I/O queue even after a process is context-switched. into a device-inode and host-inode structure as shown in Our future work will examine the overheads of OS down- the Figure 4. The device-inode contains fields that are calls to update the device-level credential table when pro- accessed even after a file is closed, and therefore only cesses are frequently context-switched across host CPUs. the host-inode structure is moved to host memory. Each host inode is approximately 593 bytes of the overall 840 4.6 Implementation and Emulation bytes. Therefore, this decomposition along with reverse We implement the DevFS prototype to understand the caching significantly reduces inode memory use. benefits and implications of a file system inside a storage device. Due to the current lack of programmable stor- 4.5 State Sharing for Permission Check age hardware, we implement DevFS as a driver in the DevFS provides security for control-plane and data- Linux 4 kernel and reserve DRAM at boot time to em- plane operations without trapping into the kernel by ex- ulate DevFS storage. We now describe our implementa- tending the OS to share application credentials. tion of the DevFS user-level library and device-level file In a kernel-level file system, before an I/O operation, system. the file system uses the credentials of a process from the User-level library and interface. DevFS utilizes OS-level process structure and compares them with per- command-based I/O, similar to modern storage hard- mission information stored in an inode of a file or direc- ware such as NVMe [54, 57]. The user library has three tory. However, DevFS is a separate runtime and cannot primary responsibilities: to create a command buffer in access OS-level data structures directly. To overcome host memory, to convert the applications POSIX inter- this limitation, as shown in Figure 5, DevFS maintains face into DevFS commands and add them to the com-

4.6 Implementation and Emulation

We implement the DevFS prototype to understand the benefits and implications of a file system inside a storage device. Due to the current lack of programmable storage hardware, we implement DevFS as a driver in the Linux 4 kernel and reserve DRAM at boot time to emulate DevFS storage. We now describe our implementation of the DevFS user-level library and device-level file system.

User-level library and interface. DevFS utilizes command-based I/O, similar to modern storage hardware such as NVMe [54, 57]. The user library has three primary responsibilities: to create a command buffer in host memory, to convert the application's POSIX interface into DevFS commands and add them to the command buffer, and to ring a doorbell for DevFS to process the request. When an application is initialized, the user-level library creates a command buffer by making a call to the OS, which allocates a DMA-able memory buffer, registers the allocated buffer, and returns the virtual address of the buffer to the user-level library. Currently, DevFS does not support sharing command buffers across processes, and the buffer size is restricted by the kernel memory allocation (kmalloc()) upper limit of 4 MB; these restrictions can be addressed by memory-mapping a larger region of shared memory in the kernel. The user library adds I/O commands to the buffer and rings a doorbell (emulated with an ioctl) with the address of the command buffer, from which DevFS can read I/O requests, perform permission checks, and add them to a device-level I/O queue for processing. For simplicity, our current library implementation only supports synchronous I/O operations: each command has an I/O completion flag that will be set by DevFS, and the user library must wait until an I/O request completes. The user library is implemented in about 2K lines of code.

DevFS file system. Because DevFS is a hardware-centric solution, DevFS uses straightforward data structures and techniques that do not substantially increase memory or CPU usage. We extend PMFS with the DevFS components and structures described earlier. Regarding DevFS block management, each block in DevFS is a memory page; pages for both metadata and data are allocated from memory reserved for DevFS storage. The per-file memory journal and I/O queue size are set to a default of 4 KB but are each configurable during file system mount. The maximum number of concurrently opened files or directories is limited by the number of I/O queues and journals that can be created in DevFS memory. Finally, DevFS does not yet support memory-mapped I/O. DevFS is implemented in about 9K lines of code.
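To illustrate the user-level library described above, the sketch below shows a synchronous write: the POSIX call is converted into a command in the shared command buffer, a doorbell ioctl notifies the (emulated) device, and the library spins on the completion flag. The command layout, ioctl number, and globals are hypothetical, not the actual DevFS ABI.

/* Sketch of the library's synchronous write path; error handling and
 * slot reuse are simplified. */
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ioctl.h>

#define DEVFS_OP_WRITE      2
#define DEVFS_IOC_DOORBELL  _IOW('D', 1, uint64_t)   /* hypothetical */

struct devfs_cmd {
    uint32_t op;
    uint32_t fd;
    uint64_t off;
    uint64_t len;
    const void *payload;
    volatile uint32_t done;   /* completion flag set by DevFS */
    int32_t  result;
};

/* Filled in at init: fd of the control device and the DMA-able
 * command buffer returned by the OS. */
extern int devfs_ctrl_fd;
extern struct devfs_cmd *devfs_cmd_buf;
extern uint32_t devfs_cmd_slots;
static uint32_t next_slot;

ssize_t devfs_write(int fd, const void *buf, size_t len, uint64_t off)
{
    struct devfs_cmd *cmd = &devfs_cmd_buf[next_slot++ % devfs_cmd_slots];

    memset((void *)cmd, 0, sizeof(*cmd));
    cmd->op = DEVFS_OP_WRITE;
    cmd->fd = (uint32_t)fd;
    cmd->off = off;
    cmd->len = len;
    cmd->payload = buf;

    /* Ring the doorbell with the command's address (emulated ioctl). */
    if (ioctl(devfs_ctrl_fd, DEVFS_IOC_DOORBELL,
              (uint64_t)(uintptr_t)cmd) < 0)
        return -1;

    while (!cmd->done)
        ;                      /* synchronous: spin on completion flag */
    return (ssize_t)cmd->result;
}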
4.7 Discussion

Moving the file system inside a hardware device avoids OS interaction and allows applications to attain higher performance. However, a device-level file system also introduces CPU limitations and adds complexity in deploying new file system features.

CPU limitations. The scalability and performance of DevFS depend on the device-level CPU core count and frequency. These device CPU limitations can impact (a) applications (or a system with many applications) that use several threads for frequent and non-dependent I/O operations, (b) multi-threaded applications that are I/O read-intensive or metadata lookup-intensive, and finally, (c) CPU-intensive file system features such as deduplication or compression. One possible approach to address the CPU limitation is to identify file-system operations and components that are CPU-intensive and move them to the user-level library in a manner that does not impact integrity, crash consistency, and security. However, realizing this approach would require extending DevFS to support a broader set of commands from the library in addition to application-level POSIX commands. Furthermore, we believe that DevFS's direct-access benefits could motivate hardware designers to increase the CPU core count inside the storage device [19], thus alleviating the problem.

Feature support. Moving the file system into storage complicates the addition of new file system features, such as snapshots, incremental backup, deduplication, or fixing bugs; additionally, limited CPU and memory resources also add to the complexity. One approach to solving this problem is implementing features that can be run in the background in software (OS or library), exposing the storage device as a raw block device, and using host CPU and memory. Another alternative is to support "reverse computation" by offloading file system state and computation to the host. Our future work will explore the feasibility of these approaches by extending DevFS to support snapshots, deduplication, and software RAID. Regarding bug fixes, changes to DevFS would require a firmware upgrade, which is supported by most hardware vendors today [45]. Additionally, with increasing focus on the programmability of I/O hardware (e.g., NICs [8, 29]) as dictated by new standards (e.g., NVMe), support for embedding software into storage should become less challenging.

5 Evaluation

Our evaluation of DevFS aims to answer the following important questions:
• What is the performance benefit of providing applications with direct access to a hardware-level file system?
• Does DevFS enable different processes to simultaneously access both the same file system and the same files?
• What is the performance benefit of leveraging device capacitance to reduce the double-write overhead of a traditional journal?
• How effective are DevFS's memory reduction mechanisms, and how much do they impact performance?
• What is the impact of running DevFS on a slower CPU inside the device compared to the host?
We begin by describing our evaluation methodology, and then we evaluate DevFS on micro-benchmarks and real-world applications.

5.1 Methodology

For our experiments, we use a 40-core Intel Xeon 2.67 GHz dual-socket system with 128 GB memory. DevFS reserves 60 GB of memory to emulate storage with maximum bandwidth and minimum latency. DevFS is run on 4 of the cores to emulate a storage device with 4 CPU controllers and with 2 GB of device memory, matching state-of-the-art NVMe SSDs [41, 42].

5.2 Performance

Single process performance. We begin by evaluating the benefits of direct storage access for a very simple workload of a single process accessing a single file with the Filebench workload generator [48]. We study three versions of DevFS: a naive version of DevFS with traditional journaling, DevFS with hardware capacitance support (+cap), and DevFS with capacitance support and without kernel traps (+cap +direct). We emulate DevFS without kernel traps by replicating the benchmark inside a kernel module. For comparison, we use NOVA [56], a state-of-the-art kernel-level file system for storage class memory technologies. Although NOVA does not provide direct access, it does use memory directly for storage and uses a log-structured design.

Figure 6: Write and read throughput. The graph shows results for the Filebench random write and read micro-benchmarks. The x-axis varies the write size, and the file size is kept constant at 32 GB. Results show single-thread performance. For DevFS, the per-file I/O queue and in-memory journal are set to 4 KB.

Figure 6.a shows the throughput of random writes as a function of I/O size. As expected, NOVA performs better than naive DevFS with traditional journaling. Because NOVA uses a log-structured design and writes data and metadata to storage only once, it outperforms DevFS-naive with traditional journaling, since DevFS-naive writes to an in-memory journal, a per-file storage log, and the final checkpointed region. For larger I/O sizes (16 KB), the data write starts dominating the cost, thus reducing the impact of journaling on performance.

However, DevFS with capacitance support, DevFS+cap, exploits the power-loss-protection capability and only writes metadata to the in-memory journal; both the metadata and the data can be directly committed to storage in-place without a storage log. For 1-KB writes, DevFS+cap achieves up to 27% higher throughput than the naive DevFS approach and 12% higher than NOVA. DevFS+cap outperforms NOVA because NOVA must issue additional instructions to flush its buffers, ordering writes to memory with a barrier after each write operation. Finally, by avoiding kernel traps, DevFS+cap+direct provides true direct access to storage and improves performance by more than 2x and 1.8x for 1-KB and 4-KB writes respectively.

Figure 6.b shows random read throughput. NOVA provides higher throughput than both the DevFS-naive and DevFS+cap approaches because our prototype manages all 4 KB blocks of a file in a B-tree and traverses the tree for every read operation; in contrast, NOVA simply maps a file's contents and converts block offsets to physical addresses with bit-shift operations, which is much faster. Even with our current implementation, DevFS+direct outperforms all other approaches since it avoids expensive kernel traps. We believe that incorporating NOVA's block mapping technique into DevFS would further improve read performance.

Figure 7: Concurrent access throughput. (a) shows throughput without data sharing. (b) shows throughput with data sharing. The x-axis shows the number of concurrent instances. Each instance opens ten files, appends 256 MB to each file using 4 KB writes, and then closes the files. DevFS uses up to 4 device CPUs.

Concurrent access performance. One of the advantages of DevFS over existing hybrid user-level file systems is that DevFS enables multiple competing processes to share the same file system and the same open files. To demonstrate this functionality, we begin with a workload in which processes share the same file system, but not the same files: each process opens ten files, appends 256 MB to each file using 4-KB writes, and then closes the files. In Figure 7.a, the number of processes is varied along the x-axis, where each process writes to a separate directory.

For a single process, DevFS+direct provides up to a 39% improvement over both NOVA and DevFS+cap by avoiding kernel traps. Since each file is allocated its own I/O queue and in-memory journal, the performance of DevFS scales well up to 4 instances; since we are emulating 4 storage CPUs, beyond four instances the device CPUs are shared across multiple instances and performance does not scale well.

In contrast, NOVA is able to use all 40 host CPUs and scales better.

To demonstrate that multiple processes can simultaneously access the same files, we modify the above workload so that each instance accesses the same ten files; the results are shown in Figure 7.b. As desired for file system integrity, when multiple instances share and concurrently update the same file, DevFS serializes metadata updates and updates to the per-file I/O queue and in-memory journal. Again, scaling of DevFS is severely limited beyond 4 instances given the contention for the 4 device CPUs. In other experiments, not shown due to space limitations, we observe that increasing the number of device CPUs directly benefits DevFS scalability.

Summary. By providing direct access to storage without trapping into the kernel, DevFS can improve write throughput by 1.5x to 2.3x and read throughput by 1.2x to 1.3x. DevFS also benefits from exploiting device capacitance to reduce journaling cost. Finally, unlike hybrid user-level file systems, DevFS supports concurrent file-system access and data sharing across processes; the lower I/O throughput beyond four concurrent instances is mainly due to the limited number of device-level CPUs.

5.3 Impact of Reverse Caching

A key goal of DevFS is to reduce the memory usage of the file system. We first evaluate the effectiveness of DevFS memory optimizations to reduce memory usage and then investigate the impact on performance.

5.3.1 Memory Reduction

To understand the effectiveness of DevFS memory-reduction techniques, we begin with DevFS+cap and analyze three memory reduction techniques: DevFS+cap+demand allocates each in-memory filemap on demand and releases it after a file is closed; DevFS+cap+demand+dentry reverse caches the corresponding dentry after a file is closed; DevFS+cap+demand+dentry+inode also decomposes a file's inode into inode-device and inode-host structures and reverse caches the inode-host structure. Because we focus on memory reduction, we do not consider DevFS+direct in this experiment.

Figure 8: DevFS memory reduction. +cap represents a baseline without memory reduction. The other bars show the impact of each incremental memory reduction technique.

Figure 8 shows the amount of memory consumed by the four versions of DevFS on Filebench's file-create workload, which opens a file, writes 16 KB, and then closes the file, for 1 million files. In the baseline (DevFS+cap), three data structures dominate memory usage: the DevFS inode (840 bytes), the dentry (192 bytes), and the filemap (156 bytes). While file pointers, per-file I/O queues, and in-memory journals are released after a file is closed, the three other structures are not freed until the file is deleted.

The first memory optimization, DevFS+cap+demand, dynamically allocates the filemap when a read or write is performed and releases the filemap after closing the file; this reduces memory consumption by 156 MB (13.4%). Reverse caching of dentries, shown by DevFS+cap+demand+dentry, reduces device memory usage by 193 MB (16.6%) by moving them to the host memory; the small dentry memory usage visible in the graph represents directory dentries, which are not moved to the host memory in order to provide fast directory lookup. Finally, decomposing the large inode structure into two smaller structures, inode-device (262 bytes) and inode-host (578 bytes), and reverse caching the inode-host structure reduces memory usage significantly. The three mechanisms cumulatively reduce device memory usage by up to 78% (5x) compared to the baseline. In our current implementation, we consider only these three data structures, but reverse caching could easily be extended to other file system data structures.

5.3.2 Performance Impact

The memory reduction techniques used by DevFS do have an impact on performance. To evaluate their impact on throughput, in addition to the file-create benchmark used above, we also evaluate a file-reopen workload that re-opens each of the files in the file-create benchmark immediately after it is closed. We also show the throughput for direct access (DevFS+cap+demand+dentry+inode+direct), which avoids expensive kernel traps.

Figure 9: Throughput impact of memory reduction. The Reopen-files benchmark reopens closed files; as a result, structures cached in host memory are moved back to the device.

5.3.2 Performance Impact

The memory-reduction techniques used by DevFS do have an impact on performance. To evaluate their impact on throughput, in addition to the file-create benchmark used above, we also evaluate a file-reopen workload that re-opens each of the files in the file-create benchmark immediately after it is closed. We also show the throughput for direct access (DevFS+cap+demand+dentry+inode+direct), which avoids expensive kernel traps.

For both benchmarks, DevFS with no memory optimizations and DevFS with on-demand allocation have similar throughput because the only difference is exactly when the filemap is allocated. However, the reverse-caching techniques do impact throughput. For the file-create benchmark, reverse caching only the dentry (DevFS+cap+demand+dentry) reduces throughput by 5%, while also reverse caching the inode (DevFS+cap+demand+dentry+inode) reduces it by 13%. The degradation occurs because reverse caching involves significant work: allocating memory in host DRAM, copying structures to host memory, updating the parent list with the new memory address, and later releasing the device memory. Reverse caching inodes performs worse than reverse caching dentries because of their relative sizes (578 bytes vs. 196 bytes). While the direct-access approach has similar reverse-caching overheads, by avoiding kernel traps for file open, write, and close it provides higher performance than all other approaches.
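The cost of reverse caching follows directly from the steps just listed. The standalone sketch below walks through them for a file's cold inode state; host_alloc()/device_alloc() are stand-ins for the real host-DRAM and device-RAM allocators (here they simply wrap malloc()), and the structure layout is hypothetical.

/*
 * Sketch of reverse caching a file's cold inode state on close and pulling
 * it back on reopen. Allocator names and structure sizes are stand-ins.
 */
#include <stdlib.h>
#include <string.h>

static void *host_alloc(size_t n)   { return malloc(n); }  /* host DRAM (stand-in) */
static void  host_free(void *p)     { free(p); }
static void *device_alloc(size_t n) { return malloc(n); }  /* device RAM (stand-in) */
static void  device_free(void *p)   { free(p); }

struct inode_host { char cold[578]; };     /* cold, host-resident part */
struct inode_device {                      /* hot, device-resident part */
    struct inode_host *host;               /* parent list's pointer to the cold part */
    int host_resident;                     /* 1 if `host` points into host DRAM */
};

/* On close: allocate in host DRAM, copy, repoint the parent, free the device copy. */
int reverse_cache_on_close(struct inode_device *di)
{
    struct inode_host *h = host_alloc(sizeof(*h));
    if (!h)
        return -1;                         /* keep the device copy on failure */
    memcpy(h, di->host, sizeof(*h));
    device_free(di->host);
    di->host = h;                          /* update parent list with new address */
    di->host_resident = 1;
    return 0;
}

/* On reopen: move the structure back into device RAM (the Reopen-files cost). */
int restore_on_open(struct inode_device *di)
{
    struct inode_host *d;
    if (!di->host_resident)
        return 0;
    d = device_alloc(sizeof(*d));
    if (!d)
        return -1;
    memcpy(d, di->host, sizeof(*d));
    host_free(di->host);
    di->host = d;
    di->host_resident = 0;
    return 0;
}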

With the file-reopen benchmark, reverse caching moves the corresponding inodes and dentries back to device memory, causing a throughput drop of 26%. Our results for the file-reopen benchmark can be considered worst-case behavior, since most real-world applications spend more time performing I/O before closing a file. Our current mechanism performs reverse caching aggressively, but it could easily be extended to slightly delay reverse caching based on the availability of free memory in the device.

Figure 9: Throughput impact of memory reduction (100K ops/second) for the Create-files and Reopen-files benchmarks. The Reopen-files benchmark reopens closed files; as a result, structures cached in host memory are moved back to the device.

Summary. DevFS memory-reduction techniques can reduce device DRAM usage by up to 5x. Although worst-case benchmarks do suffer some performance impact with these techniques, we believe memory reduction is essential for device-level file systems and that DevFS will obtain both memory savings and high performance for realistic workloads.

5.4 Snappy File Compression

To understand the performance impact on a real-world application, we use Snappy [11] compression. Snappy is widely used as a data-compression engine for several applications, including MapReduce, RocksDB, MongoDB, and Google Chrome. Snappy reads a file, performs compression, and writes the output to a file; for durability, we add an fsync() after writing the output. Snappy optimizes for throughput and is both CPU- and I/O-intensive; for small files, the I/O time dominates the computation time. Snappy can be used at both user level and kernel level [23], which helps us understand the impact of direct access. For the workload, we use four application threads, 16 GB of image files from the OpenImages repository [24], and vary the size of files from 1 KB to 256 KB.
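For reference, the per-file step of this workload (read, compress, write, fsync) looks roughly like the user-level sketch below, written against the C bindings shipped with the Snappy library (snappy-c.h) [11]. The file paths are placeholders and error handling is abbreviated; this is only meant to show the shape of the benchmark, not our exact harness.

/* Per-file Snappy step: read input, compress, write output, fsync for durability. */
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <snappy-c.h>

int compress_one_file(const char *in_path, const char *out_path)
{
    int in  = open(in_path, O_RDONLY);
    int out = open(out_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (in < 0 || out < 0)
        return -1;

    off_t  in_len  = lseek(in, 0, SEEK_END);          /* input file size */
    char  *src     = malloc(in_len);
    size_t dst_len = snappy_max_compressed_length(in_len);
    char  *dst     = malloc(dst_len);

    lseek(in, 0, SEEK_SET);
    read(in, src, in_len);                            /* read the file */
    if (snappy_compress(src, in_len, dst, &dst_len) != SNAPPY_OK)
        return -1;                                    /* compress (errors abbreviated) */
    write(out, dst, dst_len);                         /* write the compressed output */
    fsync(out);                                       /* durability */

    free(src); free(dst);
    close(in); close(out);
    return 0;
}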
Figure 10: Snappy compression throughput and CPU speed impact, comparing NOVA, DevFS [naive], DevFS [+cap], and DevFS [+cap +direct]. (a) Snappy throughput (100K ops/second) as the write size varies from 1 KB to 256 KB; (b) impact of CPU frequency from 1.2 GHz to 2.6 GHz. The application uses 4 CPUs. Memory-reduction techniques are enabled for DevFS (+cap) and DevFS (+cap +direct). For DevFS (+cap +direct), Snappy is run as a kernel module. The CPU speed is varied by scaling the frequency.

Comparing the performance of NOVA, DevFS-naive, DevFS+cap, and DevFS+direct, we see the same trends for the Snappy workload as for the previous micro-benchmarks. As shown in Figure 10.a, NOVA performs better than DevFS-naive due to DevFS-naive's journaling cost, while DevFS+cap removes this overhead. Because DevFS+direct avoids trapping into the kernel when reading and writing across all application threads, it provides up to 22% higher throughput than DevFS-cap for 4-KB files; as the file size increases, the benefit of DevFS+direct is reduced since compression costs dominate the runtime.

Device CPU Impact. One challenge for DevFS is that it is restricted to the CPUs on the storage device, and these device CPUs may be slower than those on the host. To quantify this performance impact, we run the Snappy workload as we vary the speed of the "device" CPUs, keeping the "host" CPUs at their original speed of 2.6 GHz [27]; the threads performing compression always run on the fast "host" CPUs. Figure 10.b shows the performance impact for 4-KB file compression for two versions of DevFS; we choose 4-KB files since they stress DevFS performance more than larger files (which instead stress CPU performance). As expected, DevFS-direct consistently performs better than DevFS-cap. More importantly, we see that reducing device CPU frequency does have a significant impact on performance (e.g., reducing device CPU frequency from 2.6 GHz to 1.4 GHz reduces throughput by 66%). However, comparing across graphs, we see that even with a 1.8 GHz device CPU, the performance of DevFS-direct is similar to that of NOVA running on all high-speed host CPUs. For workloads that are more CPU-intensive, the impact of slower device CPUs on DevFS performance is smaller (not shown due to space constraints).

Summary. DevFS-direct provides considerable performance improvement even for applications that are both CPU- and I/O-intensive. We observe that although slower device CPUs do impact DevFS performance, DevFS can still outperform the other approaches.
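As one concrete way to emulate slower "device" CPUs on commodity hardware, the frequency of the cores designated as device CPUs can be capped through Linux's cpufreq sysfs interface, as sketched below. The paper varies CPU speed by scaling frequency following [27] but does not specify the mechanism; treating CPUs 0-3 as the 4 device CPUs and the 1.4 GHz target are illustrative choices, and the knobs actually available depend on the cpufreq driver and governor in use.

/* Cap the frequency of the emulated "device" CPUs via cpufreq sysfs. */
#include <stdio.h>

static int set_max_khz(int cpu, long khz)
{
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_max_freq", cpu);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;                      /* needs root and cpufreq support */
    fprintf(f, "%ld\n", khz);
    fclose(f);
    return 0;
}

int main(void)
{
    /* Treat CPUs 0-3 as the 4 "device" CPUs and cap them at 1.4 GHz. */
    for (int cpu = 0; cpu < 4; cpu++)
        set_max_khz(cpu, 1400000);      /* value is in kHz */
    return 0;
}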

6 Related Work

Significant prior work has focused on providing direct access to storage, moving computation to storage, or improving the programmability of SSDs.

Direct-access storage. Several hybrid user-level file-system implementations, such as Intel's SPDK [18], LightNVM [6], and Micron's User Space NVMe [33], provide direct access to storage by exposing the device as a raw block device and exporting a user-space device driver for block access. LightNVM goes one step further and enables I/O-intensive applications to implement their own FTL. However, these approaches do not support traditional file-system abstractions and instead expose storage as a raw block device; they do not support fundamental properties of a file system such as integrity, concurrency, crash consistency, or security.

Computation inside storage. Providing compute capability inside storage for performing batch tasks has been explored for the past four decades. Systems such as CASSM [46] and RARES [31] proposed adding several processors to a disk for performing computation inside storage. Active Storage [2, 39] uses one CPU inside a hard disk for performing database scans and search operations, whereas Smart-SSD [12] is designed for query processing inside SSDs. Architectures such as BlueDBM [19] have shown the benefits of scaling compute and DRAM inside flash memory for running "big data" applications. DevFS also uses device-level RAM and compute capability; however, DevFS uses these resources to run a high-performance file system that applications can use.

Programmability. Willow [43] develops a system to improve SSD programmability. Willow's I/O component is offloaded to an SSD to bypass the OS and perform direct read and write operations. However, without a centralized file system, Willow has the same general structural advantages and disadvantages as hybrid user-level file systems.

7 Conclusion

In this paper, we address the limitations of prior hybrid user-level file systems by presenting DevFS, an approach that pushes file-system functionality down into the device hardware. DevFS is a trusted file system inside the device that preserves metadata integrity and concurrency by exploiting hardware-level parallelism, leverages hardware power-loss control to provide low-overhead crash consistency, and coordinates with the OS to satisfy security guarantees. We address the hardware limitation of low device RAM capacity by proposing three memory-reduction techniques (on-demand allocation, reverse caching, and decomposition of data structures) that reduce file-system memory usage by 5x (at the cost of a small performance reduction). Performance evaluation of our DevFS prototype shows more than 2x improvement in I/O throughput with direct access to storage. We believe our DevFS prototype is a first step towards building a true direct-access file system. Several engineering challenges, such as realizing DevFS in real hardware, supporting RAID, and integrating DevFS with the FTL, remain as future work.

Acknowledgements

We thank the anonymous reviewers and Ed Nightingale (our shepherd) for their insightful comments. We thank the members of ADSL for their valuable input. This material was supported by funding from NSF grants CNS-1421033 and CNS-1218405, and DOE grant DE-SC0014935. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and may not reflect the views of NSF, DOE, or other institutions.

References

[1] Intel-Micron Memory 3D XPoint. http://intel.ly/1eICR0a.
[2] Anurag Acharya, Mustafa Uysal, and Joel Saltz. Active Disks: Programming Model, Algorithms and Evaluation. SIGPLAN Not., 33(11):81–91, October 1998.
[3] Ameen Akel, Adrian M. Caulfield, Todor I. Mollov, Rajesh K. Gupta, and Steven Swanson. Onyx: A Prototype Phase Change Memory Storage Array. In Proceedings of the 3rd USENIX Conference on Hot Topics in Storage and File Systems, HotStorage '11, Portland, OR, 2011.
[4] Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. Operating Systems: Three Easy Pieces. Arpaci-Dusseau Books, 0.91 edition, May 2015.
[5] Valerie Aurora. Log Structured File System Issues. https://lwn.net/Articles/353411/.
[6] Matias Bjørling, Javier González, and Philippe Bonnet. LightNVM: The Linux Open-Channel SSD Subsystem. In Proceedings of the 15th USENIX Conference on File and Storage Technologies, FAST '17, Santa Clara, CA, USA, 2017.
[7] Eric Brewer. FAST Keynote: Disks and their Cloudy Future, 2015. https://www.usenix.org/sites/default/files/conference/protected-files/fast16_slides_brewer.pdf.
[8] Adrian M. Caulfield, Todor I. Mollov, Louis Alex Eisner, Arup De, Joel Coburn, and Steven Swanson. Providing Safe, User Space Access to Fast, Solid State Disks. SIGARCH Comput. Archit. News, 40(1), March 2012.
[9] Joel Coburn, Adrian M. Caulfield, Ameen Akel, Laura M. Grupp, Rajesh K. Gupta, Ranjit Jhala, and Steven Swanson. NV-Heaps: Making Persistent Objects Fast and Safe with Next-generation, Non-volatile Memories. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVI, Newport Beach, California, USA, 2011.
[10] Jeremy Condit, Edmund B. Nightingale, Christopher Frost, Engin Ipek, Benjamin Lee, Doug Burger, and Derrick Coetzee. Better I/O Through Byte-addressable, Persistent Memory. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP '09, Big Sky, Montana, USA, 2009.
[11] Jeff Dean, Sanjay Ghemawat, and Steinar H. Gunderson. Snappy Compression. https://github.com/google/snappy.
[12] Jaeyoung Do, Yang-Suk Kee, Jignesh M. Patel, Chanik Park, Kwanghyun Park, and David J. DeWitt. Query Processing on Smart SSDs: Opportunities and Challenges. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD '13, pages 1221–1230, New York, NY, USA, 2013. ACM.
[13] Subramanya R. Dulloor, Sanjay Kumar, Anil Keshavamurthy, Philip Lantz, Dheeraj Reddy, Rajesh Sankaran, and Jeff Jackson. System Software for Persistent Memory. In Proceedings of the Ninth European Conference on Computer Systems, EuroSys '14, pages 15:1–15:15, New York, NY, USA, 2014. ACM.
[14] D. R. Engler, M. F. Kaashoek, and J. O'Toole, Jr. Exokernel: An Operating System Architecture for Application-level Resource Management. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, SOSP '95, Copper Mountain, Colorado, USA, 1995.
[15] Mel Gorman. Understanding the Linux Virtual Memory Manager. http://bit.ly/1n1xIhg.
[16] Jun He, Sudarsun Kannan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. The Unwritten Contract of Solid State Drives. In Proceedings of the Twelfth European Conference on Computer Systems, EuroSys '17, pages 127–144, New York, NY, USA, 2017. ACM.
[17] Intel. NVM Library. https://github.com/pmem/nvml.
[18] Intel. Storage Performance Development Kit. http://www.spdk.io/.
[19] Sang-Woo Jun, Ming Liu, Sungjin Lee, Jamey Hicks, John Ankcorn, Myron King, Shuotao Xu, and Arvind. BlueDBM: Distributed Flash Storage for Big Data Analytics. ACM Trans. Comput. Syst., 34(3):7:1–7:31, June 2016.
[20] Hyeong-Jun Kim, Young-Sik Lee, and Jin-Soo Kim. NVMeDirect: A User-space I/O Framework for Application-specific Optimization on NVMe SSDs. In 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 16), Denver, CO, 2016. USENIX Association.
[21] Jesung Kim, Jong Min Kim, S. H. Noh, Sang Lyul Min, and Yookun Cho. A Space-Efficient Flash Translation Layer for CompactFlash Systems. IEEE Transactions on Consumer Electronics, 48(2):366–375, May 2002.
[22] Kingston. Kingston Power Loss Control. https://www.kingston.com/us/ssd/enterprise/technical_brief/tantalum_capacitors.
[23] Andi Kleen. Snappy Kernel Port. https://github.com/andikleen/snappy-c.
[24] Ivan Krasin, Tom Duerig, Neil Alldrin, Vittorio Ferrari, Sami Abu-El-Haija, Alina Kuznetsova, Hassan Rom, Jasper Uijlings, Stefan Popov, Andreas Veit, Serge Belongie, Victor Gomes, Abhinav Gupta, Chen Sun, Gal Chechik, David Cai, Zheyun Feng, Dhyanesh Narayanan, and Kevin Murphy. OpenImages: A Public Dataset for Large-scale Multi-label and Multi-class Image Classification. Dataset available from https://github.com/openimages, 2017.
[25] Harendra Kumar, Yuvraj Patel, Ram Kesavan, and Sumith Makam. High Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System. In 15th USENIX Conference on File and Storage Technologies (FAST 17), pages 197–212, Santa Clara, CA, 2017. USENIX Association.
[26] Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, and Thomas Anderson. Strata: A Cross Media File System. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP '17, 2017.
[27] Etienne Le Sueur and Gernot Heiser. Dynamic Voltage and Frequency Scaling: The Laws of Diminishing Returns. In Proceedings of the 2010 International Conference on Power Aware Computing and Systems, HotPower '10, Vancouver, BC, Canada, 2010.
[28] Changman Lee, Dongho Sim, Joo-Young Hwang, and Sangyeun Cho. F2FS: A New File System for Flash Storage. In Proceedings of the 13th USENIX Conference on File and Storage Technologies, FAST '15, Santa Clara, CA, 2015.
[29] Bojie Li, Zhenyuan Ruan, Wencong Xiao, Yuanwei Lu, Yongqiang Xiong, Andrew Putnam, Enhong Chen, and Lintao Zhang. KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP '17, Shanghai, China, 2017.
[30] J. Liedtke. On Micro-kernel Construction. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, SOSP '95, Copper Mountain, Colorado, USA, 1995.

[31] Chyuan Shiun Lin, Diane C. P. Smith, and John Miles Smith. The Design of a Rotating Associative Memory for Relational Database Applications. ACM Trans. Database Syst., 1(1):53–65, March 1976.
[32] Justin Meza, Qiang Wu, Sanjev Kumar, and Onur Mutlu. A Large-Scale Study of Flash Memory Failures in the Field. SIGMETRICS Perform. Eval. Rev., 43(1):177–190, June 2015.
[33] Micron. Micron User Space NVMe. https://github.com/MicronSSD/unvme/.
[34] Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. Arrakis: The Operating System is the Control Plane. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI '14, Broomfield, CO, 2014.
[35] Thanumalayan Sankaranarayana Pillai, Ramnatthan Alagappan, Lanyue Lu, Vijay Chidambaram, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Application Crash Consistency and Performance with CCFS. In Proceedings of the 15th USENIX Conference on File and Storage Technologies, FAST '17, Santa Clara, CA, USA, 2017.
[36] Thanumalayan Sankaranarayana Pillai, Vijay Chidambaram, Ramnatthan Alagappan, Samer Al-Kiswany, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-consistent Applications. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI '14, Broomfield, CO, 2014.
[37] Vijayan Prabhakaran, Thomas L. Rodeheffer, and Lidong Zhou. Transactional Flash. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI '08, San Diego, California, 2008.
[38] Aditya Rajgarhia and Ashish Gehani. Performance and Extension of User Space File Systems. In Proceedings of the 2010 ACM Symposium on Applied Computing, SAC '10, pages 206–213, New York, NY, USA, 2010. ACM.
[39] Erik Riedel, Garth A. Gibson, and Christos Faloutsos. Active Storage for Large-Scale Data Mining and Multimedia. In Proceedings of the 24th International Conference on Very Large Data Bases, VLDB '98, pages 62–73, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.
[40] Mendel Rosenblum and John K. Ousterhout. The Design and Implementation of a Log-structured File System. ACM Trans. Comput. Syst., 10(1), February 1992.
[41] Samsung. NVMe SSD 960 Polaris Controller. http://www.samsung.com/semiconductor/minisite/ssd/downloads/document/NVMe_SSD_960_PRO_EVO_Brochure.pdf.
[42] Samsung. Samsung NVMe SSD 960 Data Sheet. http://www.samsung.com/semiconductor/minisite/ssd/downloads/document/Samsung_SSD_960_PRO_Data_Sheet_Rev_1_1.pdf.
[43] Sudharsan Seshadri, Mark Gahagan, Sundaram Bhaskaran, Trevor Bunker, Arup De, Yanqin Jin, Yang Liu, and Steven Swanson. Willow: A User-programmable SSD. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI '14, Broomfield, CO, 2014.
[44] Y. Son, J. Choi, J. Jeon, C. Min, S. Kim, H. Y. Yeom, and H. Han. SSD-Assisted Backup and Recovery for Database Systems. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pages 285–296, April 2017.
[45] StorageReview.com. Firmware Upgrade. http://www.storagereview.com/how_upgrade_ssd_firmware.
[46] Stanley Y. W. Su and G. Jack Lipovski. CASSM: A Cellular System for Very Large Data Bases. In Proceedings of the 1st International Conference on Very Large Data Bases, VLDB '75, Framingham, Massachusetts, 1975.
[47] Bharath Kumar Reddy Vangoor, Vasily Tarasov, and Erez Zadok. To FUSE or Not to FUSE: Performance of User-space File Systems. In Proceedings of the 15th USENIX Conference on File and Storage Technologies, FAST '17, Santa Clara, CA, USA, 2017.
[48] Vasily Tarasov. Filebench. https://github.com/filebench/filebench.
[49] Haris Volos, Sanketh Nalli, Sankarlingam Panneerselvam, Venkatanathan Varadarajan, Prashant Saxena, and Michael M. Swift. Aerie: Flexible File-system Interfaces to Storage-class Memory. In Proceedings of the Ninth European Conference on Computer Systems, EuroSys '14, Amsterdam, The Netherlands, 2014.
[50] Haris Volos, Andres Jaan Tack, and Michael M. Swift. Mnemosyne: Lightweight Persistent Memory. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVI, Newport Beach, California, USA, 2011.
[51] T. von Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: A User-level Network Interface for Parallel and Distributed Computing. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, SOSP '95, Copper Mountain, Colorado, USA, 1995.
[52] Michael Wei, Matias Bjørling, Philippe Bonnet, and Steven Swanson. I/O Speculation for the Microsecond Era. In Proceedings of the 2014 USENIX Annual Technical Conference, USENIX ATC '14, Philadelphia, PA, 2014.
[53] Matthew Wilcox and Ross Zwisler. Linux DAX. https://www.kernel.org/doc/Documentation/filesystems/dax.txt.
[54] NVM Express Workgroup. NVM Express Specification. http://www.nvmexpress.org/resources/specifications/.
[55] Xiaojian Wu and A. L. Narasimha Reddy. SCMFS: A File System for Storage Class Memory. In Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, Seattle, Washington, 2011.
[56] Jian Xu and Steven Swanson. NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories. In Proceedings of the 14th USENIX Conference on File and Storage Technologies, FAST '16, 2016.
[57] Qiumin Xu, Huzefa Siyamwala, Mrinmoy Ghosh, Tameesh Suri, Manu Awasthi, Zvika Guz, Anahita Shayesteh, and Vijay Balakrishnan. Performance Analysis of NVMe SSDs and Their Implication on Real World Databases. In Proceedings of the 8th ACM International Systems and Storage Conference, SYSTOR '15, Haifa, Israel, 2015.
[58] Jisoo Yang, Dave B. Minturn, and Frank Hady. When Poll is Better than Interrupt. In Proceedings of the 10th USENIX Conference on File and Storage Technologies, FAST '12, San Jose, CA, 2012.
