
LibrettOS: A Dynamically Adaptable Multiserver-Library OS∗

Ruslan Nikolaev, Mincheol Sung, Binoy Ravindran
Bradley Department of Electrical and Computer Engineering, Virginia Tech
frnikola;mincheol;[email protected]

arXiv:2002.08928v1 [cs.OS] 20 Feb 2020

∗ ©2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 16th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE ’20), March 17, 2020, Lausanne, Switzerland http://dx.doi.org/10.1145/3381052.3381316. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

Abstract

We present LibrettOS, an OS design that fuses two paradigms to simultaneously address issues of isolation, performance, compatibility, failure recoverability, and run-time upgrades. LibrettOS acts as a microkernel OS that runs servers in an isolated manner. LibrettOS can also act as a library OS when, for better performance, selected applications are granted exclusive access to virtual hardware resources such as storage and networking. Furthermore, applications can switch between the two OS modes with no interruption at run-time. LibrettOS has a uniquely distinguishing advantage in that, the two paradigms seamlessly coexist in the same OS, enabling users to simultaneously exploit their respective strengths (i.e., greater isolation, high performance). Systems code, such as device drivers, network stacks, and file systems remain identical in the two modes, enabling dynamic mode switching and reducing development and maintenance costs.

To illustrate these design principles, we implemented a prototype of LibrettOS using rump kernels, allowing us to reuse existent, hardened NetBSD device drivers and a large ecosystem of POSIX/BSD-compatible applications. We use hardware (VM) virtualization to strongly isolate different rump kernel instances from each other. Because the original rumprun unikernel targeted a much simpler model for uniprocessor systems, we redesigned it to support multicore systems. Unlike kernel-bypass libraries such as DPDK, applications need not be modified to benefit from direct hardware access. LibrettOS also supports indirect access through a network server that we have developed. Instances of the TCP/IP stack always run directly inside the address space of applications. Unlike the original rumprun or monolithic OSes, applications remain uninterrupted even when network components fail or need to be upgraded. Finally, to efficiently use hardware resources, applications can dynamically switch between the indirect and direct modes based on their I/O load at run-time. We evaluate LibrettOS with 10GbE and NVMe using Nginx, NFS, memcached, Redis, and other applications. LibrettOS’s performance typically exceeds that of NetBSD, especially when using direct access.

Keywords: operating system, microkernel, multiserver, network server, virtualization, Xen, IOMMU, SR-IOV, isolation

1 Introduction

Core components and drivers of a general purpose monolithic operating system (OS) such as Linux or NetBSD typically run in privileged mode. However, this design is often inadequate for modern systems [13, 30, 43, 50, 62]. On the one hand, a diverse and ever growing kernel ecosystem requires better isolation of individual drivers and other system components to localize security threats due to the increasingly large attack surface of OS kernel code. Better isolation also helps with tolerating component failures and thereby increases reliability. Microkernels achieve this goal, specifically in multiserver OS designs [13, 30, 33].¹ On the other hand, to achieve better device throughput and resource utilization, some applications need to bypass the system call and other layers so that they can obtain exclusive access to device resources such as network adapter’s (NIC) Tx/Rx queues. This is particularly useful in recent hardware with SR-IOV support [63], which can create virtual PCIe functions: NICs or NVMe storage partitions. Library OSes and kernel-bypass libraries [77, 84] achieve this goal. Multiserver-inspired designs, too, can outperform traditional OSes on recent hardware [54, 56].

¹ As microkernels are defined broadly in the literature, we clarify that we consider multiserver OSes as those implementing servers to isolate core OS components, e.g., MINIX 3 [30].

Microkernels, though initially slow in adoption, have gained more traction in recent years. Google’s Fuchsia OS [24] uses the Zircon microkernel. Intel Management Engine [37] uses MINIX 3 [30] since 2015. A multiserver-like networking design was recently revisited by Google [56] to improve performance and upgradability when using their private (non-TCP) messaging protocol. However, general-purpose application and device driver support for microkernels is limited, especially for high-end hardware such as 10GbE+.

Kernel-bypass techniques, e.g., DPDK [84] and SPDK [77], are also increasingly gaining traction, as they eliminate OS kernels from the critical data path, thereby improving performance. Unfortunately, these techniques lack standardized high-level APIs and require massive engineering effort to use, especially to adapt to existing applications [89]. Additionally, driver support in kernel-bypass libraries such as DPDK [84] is great only for high-end NICs from certain vendors. Re-implementing drivers and high-level OS stacks from scratch in user space involves significant development effort.

Oftentimes, it is overlooked that “no one size fits all.” In other words, no single OS model is ideal for all use cases. Depending upon the application, security or reliability requirements, it is desirable to employ multiple OS paradigms in the same OS. In addition, applications may need to switch between different OS modes based on their I/O loads. In Figure 1, we illustrate an ecosystem of a web-driven server running on the same physical or virtual host. The server uses tools for logging (rsyslog), clock synchronization (ntp), NFS shares, and SSH for remote access. The server also runs python and Java applications. None of these applications are performance-critical, but due to the complexity of the network and other stacks, it is important to recover from temporary failures or bugs without rebooting, which is impossible in monolithic OSes. One way to meet this goal is to have a network server as in the multiserver paradigm, which runs system components in separate user processes for better isolation and failure recoverability. This approach is also convenient when network components need to be upgraded and restarted at run-time [56] by triggering an artificial fault.

Figure 1. Server ecosystem example.

Core applications such as an HTTP server, database, and key-value store are more I/O performance-critical. [...] devices restrict the number of SR-IOV interfaces – e.g., the Intel 82599 adapter [38] supports up to 16 virtual NICs, 4 Tx/Rx queues each; for other adapters, this number can be even smaller. Thus, it is important to manage available hardware I/O resources efficiently. Since I/O load is usually non-constant and changes for each application based on external circumstances (e.g., the number of clients connected to an HTTP server during peak and normal hours), conservative resource management is often desired: use network server(s) until I/O load increases substantially, at which point, migrate to the library OS mode (for direct access) at run-time with no interruption. This is especially useful for recent bare metal cloud systems – e.g., when one Amazon EC2 bare metal instance is shared by several users.

In this paper, we present a new OS design – LibrettOS – that not only reconciles the library and multiserver OS paradigms while retaining their individual benefits (of better isolation, failure recoverability, and performance), but also overcomes their downsides (of driver and application incompatibility). Moreover, LibrettOS enables applications to switch between these two paradigms at run-time. While high performance can be obtained with specialized APIs, which can also be adopted in LibrettOS, they incur high engineering effort. In contrast, with LibrettOS, existing applications can already benefit from more direct access to hardware while still using POSIX.

We present a realization of the LibrettOS design through a prototype implementation. Our prototype leverages rump kernels [42] and reuses a fairly large NetBSD driver collection. Since the user space ecosystem is also inherited from NetBSD, the prototype retains excellent compatibility with existing POSIX and BSD applications as well. Moreover, in the two modes of operation (i.e., library OS mode and multiserver OS mode), we use an identical set of drivers and software. In our prototype, we focus only on networking and storage. However, the LibrettOS design principles are more general, as rump kernels can potentially support other subsystems – e.g., NetBSD’s sound drivers [18] can be reused. The prototype builds on rumprun instances, which execute rump kernels atop a hypervisor. Since the original rumprun did not support multiple cores, we redesigned it to support symmetric multiprocessing (SMP) systems. We also added 10GbE and NVMe drivers, and made other improvements to rumprun. As we show in Section 5, the prototype outperforms the original rumprun and NetBSD, especially when employing direct hardware access. In some tests, the prototype also outperforms Linux, which is often better optimized for performance than NetBSD.

The paper’s research contribution is the proposed OS design and its prototype. Specifically, LibrettOS is the first