The Aurora Operating System
Revisiting the Single Level Store

Emil Tsalapatis, Ryan Hancock, Tavian Barnes, Ali José Mashtizadeh
RCS Lab, University of Waterloo

ABSTRACT
Applications on modern operating systems manage their ephemeral state in memory, and persistent state on disk. Ensuring consistency between them is a source of significant developer effort, yet still a source of significant bugs in mature applications. We present the Aurora single level store (SLS), an OS that simplifies persistence by automatically persisting all traditionally ephemeral application state. With recent storage hardware like NVMe SSDs and NVDIMMs, Aurora is able to continuously checkpoint entire applications with millisecond granularity.

Aurora is the first full POSIX single level store to handle complex applications ranging from databases to web browsers. Moreover, by providing new ways to interact with and manipulate application state, it enables applications to provide features that would otherwise be prohibitively difficult to implement. We argue that this provides strong evidence that manipulation and persistence of application state naturally belong in an operating system.

CCS CONCEPTS
• Software and its engineering → Operating systems; • Computer systems organization → Secondary storage organization; Reliability; Dependable and fault-tolerant systems and networks; • Information systems → Storage architectures.

KEYWORDS
single level stores, transparent persistence, snapshots, checkpoint/restore

ACM Reference Format:
Emil Tsalapatis, Ryan Hancock, Tavian Barnes, and Ali José Mashtizadeh. 2021. The Aurora Operating System: Revisiting the Single Level Store. In Workshop on Hot Topics in Operating Systems (HotOS ’21), June 1-June 3, 2021, Ann Arbor, MI, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3458336.3465285

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
HotOS ’21, June 1-June 3, 2021, Ann Arbor, MI, USA
© 2021 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-8438-4/21/05.
https://doi.org/10.1145/3458336.3465285

1 INTRODUCTION
Single level storage (SLS) systems provide persistence of applications as an operating system service. Their advantage lies in removing the semantic gap between the in-memory representation and the serialized on-disk representation that uses file IO APIs. This gap often leads to increased code complexity and software bugs [13, 41]. Instead, applications solely use memory and the operating system persists this state to disk. Developers design programs as if they never crash and thus do not write code for persistence and recovery. After a crash, the SLS restores the application, including all state (i.e., CPU registers, OS state, and memory), which continues executing oblivious to the interruption.

SLSes have been impractical to build for decades for performance reasons, but this has changed with the advent of new storage technologies. Past systems suffered from a large performance gap between memory and disks in terms of bandwidth and latency. This was compounded by write amplification due to the tracking of memory modifications at page granularity, and the overhead of CPU and OS state. Modern flash, coupled with fast PCIe Gen 4–5, has largely closed the performance gap with memory.

We introduce the Aurora Operating System, a novel non-traditional single level storage system that enables persistence and manipulation of execution state. Aurora is based on the FreeBSD kernel and is the first SLS that can run unmodified POSIX applications. Aurora provides persistence at the granularity of process trees or containers, and supports multi-process applications with nearly all POSIX primitives. This allows for the persistence of complex applications like Firefox, a popular web browser.

Aurora differs from previous systems in several ways. EROS [45] requires application cooperation to achieve performance and its main contributions are optimizing checkpointing and swap for spinning disks. IBM’s AS/400 uses runtime and compile time hooks along with application hints to achieve good performance [46].

Aurora revisits the single level store with three main contributions. First, we depend on improvements in hardware to achieve performance and functionality, including new flash storage and large virtual address spaces. Second, we develop an architecture designed to support both unmodified and modified POSIX applications. Third, we expand the concept of a single level store with new primitives for manipulating execution state to enable novel applications.

Aurora also accurately captures application state by treating all POSIX primitives (e.g., Unix domain sockets, System V shared memory, and file descriptors) as first class objects, rather than as parts of a process. This allows Aurora to handle applications composed of processes that share memory or files in arbitrary ways, without duplicating work or leaving edge cases unhandled. Using this approach, Aurora supports complex programs like the Firefox web browser, the RocksDB key value store, and the Mosh remote shell.

Aurora provides a system level service for manipulating arbitrary application state. It goes much further than traditional SLSes by blurring the line between applications and data. Users can operate on running applications to persist, copy, revert, or transfer them the same way they would a file. Aurora makes state manipulation an explicit operation, which programs often need to do in an ad hoc manner by themselves. Aurora creates application checkpoints that encapsulate all information required to recreate the application, even across reboots and machines.

We argue that application persistence and manipulation of execution state naturally belong in the operating system, which enables novel applications on modern systems. Aurora allows us to solve a wide range of complex systems problems, from reducing startup times and increasing density of serverless computing, to improving debugging and simplifying database design.

2 BACKGROUND
Single level stores have existed for decades both in industry and academia [20, 23, 32, 45, 46]. These systems simplified [...] incremental checkpointing [51] to persist applications at regular intervals with runtime and/or application specific hooks. These systems were severely limited by the speed of storage devices at the time, e.g., the EROS research OS spent a large effort on masking spinning disk latency [45].

Existing persistent-by-default designs like The Machine [31] and Twizzler [17] are not transparent and depend on special hardware. The Machine was an attempt to build a supercomputer based on memristors, while Twizzler is an OS that uses only NVDIMMs for storage. These systems break compatibility with existing systems in that they depend on the byte addressability of persistent storage. Single level stores like Aurora conversely use regular DRAM and disks, and hide the distinction between the two from the application.

Aurora makes a key observation that device bandwidth and latency have improved to rival the memory bus. Modern CPUs can provide an aggregate PCIe bandwidth up to 256 GB/s, more than that of memory [16]. New Intel 3D XPoint SSDs reduce IO latency to 10 µs [8], within two orders of magnitude of memory. The combination of high bandwidth and low latency makes transparent persistence possible without needing byte addressability.

Popular checkpoint/restore mechanisms have been used for scientific computing to recover from failures and migrate workloads [29, 42, 47]. These systems do not checkpoint frequently enough to provide transparent persistence and the resulting checkpoints are not self-contained.

Checkpointing of virtual machines (VMs) has enjoyed a lot of popularity and applications. VMs package the application and all dependencies into a portable checkpoint. Live migration and incremental checkpointing have enabled distributed resource management, fault tolerance and other applications [5, 24, 36, 38].

Containers, which have less overhead than virtualization, have traditionally lacked these features. Providing the same functionality for containers enables the same distributed resource management and fault tolerance applications. Systems like CRIU [6], the standard for Linux container migration [40], piece together application state by querying the kernel through system calls and the proc file system.

While CRIU’s performance is tolerable for migration, its overheads are prohibitive for other applications including transparent persistence. Even research systems that optimize memory checkpointing in CRIU have failed to reduce overheads enough [50]. Furthermore, CRIU is incredibly complex, requiring 7 years to properly add UNIX socket support [7].

Aurora is different from previous systems in two ways. First, rather than checkpointing objects exposed at the system call boundary it