mOS: An Architecture for Extreme-Scale Operating Systems

Robert W. Wisniewski† Todd Inglett† Pardo Keppel† Ravi Murty† Rolf Riesen†

ABSTRACT
Linux®, or more specifically, the Linux API, plays a key role in HPC computing. Even for extreme-scale computing, a known and familiar API is required for production machines. However, an off-the-shelf Linux faces challenges at extreme scale. To date, two approaches have been used to address the challenges of providing an operating system (OS) at extreme scale. In the Full-Weight Kernel (FWK) approach, an OS, typically Linux, forms the starting point, and work is undertaken to remove features from the environment so that it will scale up across more cores and out across a large cluster. A Light-Weight Kernel (LWK) approach often starts with a new kernel, and work is undertaken to add functionality to provide a familiar API, typically Linux. Either approach, however, results in an execution environment that is not fully Linux compatible.

mOS (multi OS) runs both an FWK (Linux) and an LWK simultaneously as kernels on the same compute node. mOS thereby achieves the scalability and reliability of LWKs, while providing the full Linux functionality of an FWK. Further, mOS works in concert with Operating System Nodes (OSNs) to offload system calls, e.g., I/O, that are too invasive to run on the compute nodes at extreme scale. Beyond providing full Linux capability with LWK performance, other advantages of mOS include the ability to effectively manage different types of compute and memory resources, interface easily with proposed asynchronous and fine-grained runtimes, and nimbly manage new technologies.

This paper is an architectural description of mOS. As a prototype is not yet finished, the contributions of this work are a description of mOS's architecture, an exploration of the tradeoffs and value of this approach for the purposes listed above, and a detailed architecture description of each of the six components of mOS, including the tradeoffs we considered. The uptick of OS research work indicates that many view this as an important area for getting to extreme scale. Thus, most importantly, the goal of the paper is to generate discussion in this area at the workshop.

†Intel Corporation
Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ROSS '14, June 10, 2014, Munich, Germany
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2950-7/14/06 ...$15.00.
http://dx.doi.org/10.1145/2612262.2612263

1. INTRODUCTION
As the system software community moves forward to exascale computing and beyond, there is the oft-debated question of how revolutionary versus how evolutionary the software needs to be. Over the last half decade, researchers have pushed in one direction or the other. We contend that both directions are valid and needed simultaneously. Throwing out all current software environments and starting over would be untenable from an application perspective. Yet, there are significant challenges getting to exascale and beyond, so revolutionary approaches are needed. Thus, we need to simultaneously allow the evolutionary path, i.e., in the OS context, a Linux API, to coexist with revolutionary models supportable by a nimble LWK. The focus of mOS is extreme-scale HPC. mOS, which simultaneously runs a Linux and an LWK, supports the coexistence of evolutionary and revolutionary models. In the rest of the introduction, we describe the following motivations for mOS:

• simultaneously support the existing Linux API with LWK performance, scalability, and reliability;
• resolve the tension between FWK and LWK approaches;
• nimbly incorporate new technologies such as hybrid memory and specialized cores, as well as new models such as fine-grained threading and asynchronicity; and
• provide a hierarchy for system call implementation.

The architecture of mOS is depicted in Figure 1. We describe it in more specificity in Section 4, but include it early so the following description has context.

Figure 1: mOS architecture © Intel

Simultaneously support legacy and new paradigms
Linux is crucial to a broad user base across many computing areas. As such, proposed modifications to Linux need to meet a high bar in terms of their applicability. An LWK has greater flexibility to target the specific needs of high-end HPC. mOS, by incorporating Linux and an LWK, leverages the strengths of Linux and provides a test bed to explore new technologies, allowing Linux to incorporate them when they have demonstrated broader-based appeal. For example, large pages were used in HPC LWKs a decade ago, and over time, as the usage models were validated, Linux started including large page support. Thus, in mOS, Linux and the LWK are not in competition but are symbiotic, each leveraging the strengths of the other. From a pragmatic point of view, vendors need to deliver solutions that work for their existing customer base, but also need to innovate to maintain competitiveness. mOS provides the ability to simultaneously support legacy and emerging paradigms.

Resolve tension in FWK and LWK approaches
Historically, high-end operating systems have been approached either from an FWK or an LWK perspective. In an FWK approach, an OS, typically Linux, forms the starting point, and work is undertaken to remove features from the environment so that it will scale across more cores and out across more nodes in a large cluster. An LWK approach often starts with a new kernel, and work is undertaken to add functionality to provide a familiar API, typically something close to that of a general purpose operating system such as Linux. Either approach's end point, however, is an environment that is not fully Linux compatible. The LWK approaches Linux, but some Linux functionality, either because of time constraints or intentionally, is not implemented. An FWK approach strips enough from Linux so that it, too, no longer supports generic Linux applications. Thus, some vendors offer two solutions: a fully compatible cluster HPC Linux that performs well but not at extreme scale, and a configured, patched offering for extreme scale that does not provide full Linux compatibility. As an example, the Cray® Linux Environment provides both extreme-scale and cluster-compatibility capability [3]. mOS runs both Linux and an LWK simultaneously. Therefore, Linux can be more Linux; it does not need to be stripped down. And the LWK can be more lightweight; it does not need to include support for services that should be supported on the compute node but are better left out of an LWK, e.g., Java, Python. Further, mOS allows Linux functionality to be achieved with minimal or no patching of Linux. The goal of upstreaming modifications can be achieved due to the narrow interface needed to interact with Linux, and this provides for better sustainability by not needing to maintain Linux patches.

Nimbly support new hardware and software needs
A primary motivation for including an LWK in mOS is that the LWK portion of mOS is a small piece of code that is easily modifiable for the needs of a targeted class of applications. Because it is small, a single developer can internalize the whole code base. That improves reliability, both because there are fewer misunderstandings about component interactions and simply because there is less code. LWKs have been shown to have low noise and high performance, and to scale well. An LWK also allows easier implementation of specialized features. One example that Linux probably would not entertain, but that can be done in an LWK, is to remove all dynamic memory mapping and statically map in all memory. Other examples include specialized placement of MPI-communicating threads nearest the network, having zero ticks, and automatic use of large pages.

In the long term, as frequency is not increasing and specialized cores proliferate, we contend the right way to manage the other cores is with an LWK on them, where that "kernel" is targeted to that special application or device. In particular, HPC systems are increasingly heterogeneous, with nodes containing a combination of cores optimized for single thread performance, ones optimized for power efficiency, and ones designed for special purposes but being used more generally, e.g., GPUs, programmable logic (FPGAs), and fixed-function application-specific hardware. Thus, an mOS-like approach will be increasingly important compared to either FWK or traditional LWK approaches, as mOS's architecture is suited to leverage special cores. It allows placing key services as close as possible to raw hardware, while at the same time keeping jittery and other system services isolated. Further, because mOS provides non-critical services by calling Linux, the mOS LWK is smaller than a conventional LWK; in turn, this makes it simpler to port and target mOS to novel hardware.

Provide a hierarchy for system calls
As machines increase in size – both node count across the machine and core count per node – a scalable method for handling the proliferation is to move to hierarchical mechanisms. This has been done for scalable job launch and I/O. Cray's Linux [3] and the IBM® CNK [9] use a hierarchy for I/O. By design, mOS leverages a hierarchy of places to implement system calls. First, performance-critical calls are serviced by the LWK on the compute cores running mOS. Second, calls that benefit from a short latency or require node-local information are serviced by the Linux running on the compute node. Third, calls with higher latency, or those requiring resources not available on the compute node, are off-loaded to an OS node. While we describe three levels of hierarchy for simplicity, we envision offload can be to more than just a single node.

This paper is an architectural description of mOS. As a prototype is not fully implemented, the contributions of this work are 1) a description of mOS's architecture; 2) an exploration of the tradeoffs we considered and the value of the mOS approach as a way to satisfy the above listed motivations – the motivations and requirements are representative of the needs of OS kernels for high-end HPC systems; and 3) a description of mOS's six components and the tradeoffs we considered.

There was significant operating system research in the 1990s, and then in the 2000s there was less emphasis on new operating system structure and more work on Linux enhancements. More recently there has been a resurgence of OS work [6, 7, 5, 15, 8, 16, 17, 14]. Thus, most importantly, the goal of this paper is to provide a basis for a timely and active discussion in this area at the workshop.

The rest of the paper is structured as follows. In Sections 2 and 3, we start by describing the design space for mOS and then describe the related work. Section 4 describes the architecture of mOS. It is broken into six subsections corresponding to each of the components in mOS. We provide concluding remarks in Section 5.

2. DESIGN SPACE
Designing an OS involves trade-offs, consideration of the target system, and the OS's use. This is particularly true for extreme-scale OSes, where seemingly insignificant features may render the final product unusable at scale or too slow. For the mOS project, we identified three competing requirements, as depicted in Figure 2. Each corner of the triangle represents one of the key design parameters of mOS: Linux compatibility, limited changes to Linux, and full LWK scalability and performance.

Figure 2: Design space for mOS © Intel

We are trying to achieve all three, but prioritize performance and scalability, followed by compatibility, then minimizing Linux changes for sustainability. Because mOS consists of two different kernels, it is depicted in an area with two target symbols: a Linux kernel in the upper part to give us Linux compatibility, and an LWK in the lower part to provide performance and scalability.

To evaluate and compare mOS with other approaches, we placed more markers in Figure 2. They indicate where we believe other projects place their priorities and efforts given the triumvirate design space. Research projects like Hobbes and Argo cover a larger area in our design space triangle than a single OS "point" because these projects encompass more than just an OS kernel. For example, Hobbes contains a node-level layer specialized for extreme-scale systems. This node layer can run OSes and kernels, such as Linux or Kitten, alongside each other and allows combining OS capabilities to adapt the system software environment to an application's needs.

The primary goal of Figure 2 is to convey the tensions in the design space. Being positioned closer to a vertex indicates a greater capability in that dimension. A pure LWK would be located near the lower right vertex. That implies that it would have little Linux compatibility. Since it is not Linux, there are also no changes to Linux. Placement of projects and OSes is subjective, but it helps with comparisons by showing where project teams place their priorities.

When we refer to Linux compatibility, it is from the application's perspective. System tools written specifically for Linux would need to be adapted to work with the LWK. However, for other administrative calls, our goal is to handle those in Linux to provide compatibility. For example, administrative calls that support clock manipulation for an NTP daemon will not be supported by the LWK. Nevertheless, these calls are in Linux, so NTP will continue to operate unmodified on mOS.

To avoid maintaining patches, which is costly and error prone, any modifications need to be sufficiently non-invasive and have broad appeal to the Linux community, so they are upstreamed. One approach to achieve this is inspired by FUSE. We call it "Cooperative Agent Kernel Extensions" or CAKE. Its goal is to provide internal APIs for resources that Linux manages and to coordinate resource management between Linux and LWKs. Using clean resource partitioning allows Linux and the LWKs each to manage their own partitions, yet safely share the managed resources. At any given time, a sharable resource is either private to Linux or the LWK, so that it can be managed directly by the current owner.

3. RELATED WORK
As mOS is a combination of techniques, there are several areas of related work pertinent to understanding where mOS fits in. The first is the tradeoff of LWK versus FWK for delivering HPC functionality and performance. The second area is that of providing an infrastructure to allow targeted HPC services to be instantiated either through a microkernel or a virtualized environment. The third is work that looks at combining multiple kernels to provide full OS functionality.

LWK versus FWK approach
Work on lightweight design at Sandia and the University of New Mexico has been ongoing for over two decades [18]. That collaboration produced operating systems for several top computer systems, including Catamount [12] for Red Storm. Catamount was designed to be lightweight to facilitate scalability. Its strategy of providing only the features necessary to support the application, and then giving control of the processor and hardware to the application, yields a low-noise kernel. Over three generations, CNK [9] on Blue Gene® provided an increasingly enriched operating environment, with more flexibility and functionality, by leveraging and integrating Linux code, e.g., glibc, NPTL. On-going work on Kitten [17] in combination with Palacios [14] has similar goals of providing a richer environment that more easily supports existing codes.

Another approach, taken by ZeptoOS [2], is to start with a Linux image and make modifications such as reserving memory during boot before Linux accesses it, and removing auxiliary daemons to reduce the noise. By doing so, ZeptoOS achieved good performance results and scaled well. A similar study undertaken by Shmueli et al. [19] looked at the issues that caused Linux not to scale to tens of thousands of nodes. They too observed that by modifying the memory management system of Linux, reducing the number of daemons, and synchronizing the remaining daemons, it was possible to get Linux to scale. Both of these approaches inherit the structure and algorithms of Linux, and though they can be modified, Linux is a moving target and is not focused on the high-end HPC space.

Other groups have taken the approach of providing a full-feature Linux. Both the SGI® Altix® ICE™ Pleiades Linux machine installed at NASA [1] and the Cray Compute-Node Linux (CNL) [11, 21] on Titan at ORNL take this approach. The goal behind a full CNL approach is to provide the rich set of operating system services and system calls that users and developers expect, and that their applications may require. These groups provide this functionality by reducing daemons and working on memory allocation issues, and they try to upstream their work so the end system can be built by config options. However, the modifications to Linux, such as Cray's Extreme Scale Linux [10, 3], needed to scale to machines the size of Titan, break full Linux compatibility and cause challenges with upstreaming. HPC has tended to lead the technology trend, often actively employing new hardware mechanisms sooner than the mass-market community. Because the Linux community focuses on the latter, it is not always receptive to all the changes needed for HPC, so the Linux approach often requires maintenance of patches.

Microkernels and virtualization for targeted HPC system services
mOS aims at providing revolutionary capability while maintaining an evolutionary path supporting existing computing paradigms, i.e., the Linux API. K42 [13] allowed the application to "reach around" Linux APIs and call native K42 interfaces. However, its approach involved significant entanglement at the implementation level, causing considerable effort to keep K42 tracked with the latest Linux. To avoid that, LibraOS [4] separated the Linux implementation by placing a Linux instance in a side-car instantiation, and used a hypervisor to direct performance-insensitive calls away from the performance-critical OS services compiled into the application. This approach incurred the cost of the hypervisor and was a static binding of those services. Kitten and Palacios [14] took the next step. Palacios is an open source Virtual Machine Monitor (VMM) embedded within the LWK Kitten. Palacios allows Kitten to host Linux as a guest OS. While this approach provides Linux functionality to Kitten applications with high performance, the Linux functionality is limited by what capability the VMM provides. For example, Palacios does not currently support symmetric multiprocessors (SMPs), thus Linux in this environment is not SMP capable. Although such capability could be added, it highlights the limitation afforded by this approach. Furthermore, Linux and Kitten can not be called simultaneously to maintain transparent name spaces.

Multiple kernels for specialized services
The Tesselation project [15] from UC Berkeley takes a similar approach to mOS, called Space-Time Partitions. It subdivides the available cores into groups called cells. Each cell is responsible for an application or a set of system services. This creates a two-level hierarchy, allowing the node-level resource manager to assign cores to cells, and allows the system service or application to have fine-grained control over the resources it was assigned.

NIX [5] is a closely related effort to our own work. Its focus is primarily on role assignment of cores. The work explores four different types of roles for the cores: time-shared cores (TCs) that run similar to a general purpose OS, kernel cores (KCs) that run only in supervisor mode and handle system services and device drivers as well as system service requests from applications, application cores (ACs) that run an application and direct remote system calls to a kernel-mode process in a TC or a KC, and exclusive cores (XCs) that run a single application thread and any kernel services it requires.

Hobbes is an on-going DOE OS/R (Operating System and Runtime) project. The central theme of Hobbes [7] is that of application composition. Instead of using sequences of simulation and analysis tasks that communicate via long-term storage, applications use a compositional approach consisting of combinations of coupled codes, data services, and tools. This model is largely driven by the expected constraints of power and the limitations of the I/O system (co-locating applications to avoid data movement) in the future. Hobbes utilizes virtualization to create multiple "virtual nodes" to support sharing of physical resources between applications in distinct enclaves.

Argo [6] is another DOE OS/R project, targeted at applications with complex workflows. Like Hobbes, they envision using OS and runtime specialization on the compute node. In their architecture, each node may contain a heterogeneous set of compute resources and a hierarchy of memory types with different performance (bandwidth, latency) and power characteristics. Given such a node architecture, Argo expects to use a ServiceOS like Linux to boot the node and run management services. It then expects to run different ComputeOS instances that cater to the specific needs of the application. Similar to mOS, kernels cooperate and are trusted, but do not form a single system image. Unlike Hobbes, virtualization is not used on a node.

McKernel [8, 20] is similar to mOS. It combines Linux and an LWK on the same many-core node, or it can run over PCI. As in mOS, coordination, data sharing, and function shipping among the running kernels, as well as off-node data transfers, are focus research topics. McKernel began life in a heterogeneous system with a traditional host CPU and PCI-bus-attached Intel Xeon Phi co-processors. Because of this heritage, McKernel has a hierarchical memory view and uses a hardware abstraction layer to let heterogeneous kernels communicate and share data. McKernel leverages Linux to natively manage process context such as file descriptors and, unlike mOS, currently does not incorporate offload to external OS nodes.

FusedOS's [16] approach is similar to mOS, with the objective of addressing core heterogeneity between system and application cores while providing a standard operating environment. However, in FusedOS, the LWK runs at user level on the compute cores. In the FusedOS prototype, the kernel code on the application core is a stub that relays system calls to a corresponding user-level proxy process called CL. Therefore, the entire LWK is implemented within the CL user proxy process on Linux. This provides the same functionality as the Blue Gene CNK from which CL was derived, but the system could benefit from the low-latency communication for function-shipping to Linux, and it opened up the possibility for shared memory and other Linux features to interoperate between the LWK and Linux. The FusedOS work demonstrated that Linux noise can be isolated to the Linux core and not interfere with the LWK cores or the application running on them. This result carries forward to mOS. The FusedOS project also concentrated on an effort that avoided changes to Linux.

4. ARCHITECTURE
Realizing mOS requires several components. The architecture of mOS is depicted in Figure 1. We divide mOS into six components, described in this section. The first two components are the LWK and Linux. A third component is the transport mechanism that connects the LWK and Linux. A fourth piece is the capability to triage, or appropriately direct, a system call made by the application to the correct kernel in the hierarchy. A fifth component is the offload mechanism, both on the compute node and its pair on the OS node. The sixth mechanism is the capability to partition resources between the LWK and Linux.

4.1 LWK
In the mOS architecture, one or more instantiations of an LWK run alongside a Linux kernel. The goal is to provide as much of the compute hardware resources as possible to the HPC applications. The amount of memory, cores, hardware threads, etc., dedicated to the LWK is configurable. Depending on the workload and how much Linux functionality is required, more or fewer resources may be allocated to the LWK.

The LWK is directly responsible for the management of the resources it has been allocated. In particular, memory management and process scheduling policies are the domain of the LWK. Unlike traditional LWKs, much of the other OS administrative work is done by the local Linux kernel. Traditionally, LWKs like SUNMOS, Puma, Catamount, and CNK were responsible for booting a node, initializing its hardware resources, including the NIC, and loading processes. In mOS, much of the hardware of a node will be booted and initialized by Linux. This allows us to design a simpler, more streamlined LWK, allowing us to more effectively deliver on its goals of making the performance and scalability of the underlying hardware accessible to application processes.

The mOS LWK is based on the following requirements for the resources for which it is responsible:

1. Manage memory as physically contiguous regions
   • Important for page table caching and large pages to reduce or avoid TLB misses
2. Generate no interrupts
   • Except those set up by the application
   • Run in a cooperative multi-tasking mode; i.e., no quantum timer
3. Provide full control of scheduling
4. Share memory regions across LWK processes
5. Ship system calls to the Linux core and OSNs
6. Provide efficient and user-level access to the hardware
   • Yield high performance MPI and PGAS runtimes
   • Allow scalable application load
7. Allow flexibility across cores in allocated memory
   • Rank 0 may need more memory than the remaining ranks of a parallel job

Unlike a pure LWK model, we intentionally simplify mOS's LWK by leveraging the on-node Linux. We have identified the following features:

1. Boot and configure the hardware
2. Direct interrupts to Linux instead of the LWK
3. Hand off cores, memory, and other resources to the LWK to manage

4.2 Linux
The primary role of the Linux kernel component is to provide Linux functionality in mOS. From a microkernel perspective, the Linux kernel component may be thought of as a service that provides Linux functionality. This functionality includes capability that the LWK may otherwise have had to implement. Examples include compute node TCP/IP sockets and non-I/O file descriptor operations, such as waiting on any of a timer, signal, or event descriptor. Secondary goals include providing an execution environment for various application assistance and infrastructure services, and providing a familiar administrative interface to the compute node. The former would include daemons typically found on Linux clusters for job launch and monitoring, but might also include specialized application framework daemons that support new workflow-based programming models.

There are several possibilities for the selection of a Linux kernel for mOS. The main criterion is that it should be a standard HPC cluster Linux. It does not need development or GUI packages, but it should have standard HPC execution packages. High performance paths will be handled by the LWK, and application memory will be managed by the LWK. Memory is a precious resource on compute nodes, so Linux is configured for minimal memory use, and without disk paging.

Our goal is to minimize the changes mOS requires of Linux to maximize the likelihood of upstreaming the modifications. The mOS operating environment relies on OS nodes and other external OS infrastructure, thus we plan to use the same Linux on other nodes as on mOS. However, different distributions of Linux may be desired on different systems, and if we succeed in upstreaming the mOS patches, it will be relatively easy to substitute a Linux of choice.

4.3 The In-Node LWK-Linux Connection
Communication between Linux and the LWK is explicit. We refer to moved functionality as "function shipping". The LWK is structured so some LWK components may be implemented as Linux device drivers and others as user-level servers. Communication between Linux and the LWK is done on the cores running Linux to avoid jitter on application cores, even though the LWK is initiating a request.

Figure 3 shows a general schematic of one node in mOS. On the left are LWK cores that may be running one or several threads; on the right are user-level servers. Requests from LWK cores are sent via messages, with delivery as discussed below.

Figure 3: Schematic of in-node shipping © Intel

User-level servers in mOS take several forms. The servers run on Linux, at user level, on Linux cores. They are leveraged by the LWK and are structured according to the needs of the specific service, including that some may utilize kernel drivers. One simple structure is a user-level server thread per LWK thread, and a user-level server process per LWK process. Specific services may, however, go directly to other servers.

Communication between the LWK and Linux is accomplished by one of three mechanisms: shared memory, messages, and inter-processor interrupts (IPIs). The Linux kernel directly accesses user level with loads and stores. This makes full marshalling a problem without invasive Linux changes, as cross-space accesses are not fully abstracted. To overcome this, mOS partitions physical memory into a Linux-managed part and an LWK-managed part, where each kernel is allowed to access the other's memory.

System calls are triaged (see 4.4) to determine which kernel should implement them. Calls that are forwarded to Linux have parameters assembled in a message, along with context information such as the hardware thread and core initiating the syscall, indicating where the syscall acknowledgment (ACK) should be sent. Fig. 4 shows an example. The Linux and the LWK trust each other. For debugging or more-protected designs, each incoming message can be checked against consistency rules.

    read(fd, buf, len):
        payload = { READ, fd, buf, len }
        msg = { route=pid, context=lwk_pid, payload }
        was = linux_channel.q.insert(msg)
        if (!was):
            linux_channel.receiver.send_ipi()
        lwk_channel.wait_for_ack()

Figure 4: Message send example.

There are latency and load tradeoffs in the function shipping implementation. At one extreme, an always-polling thread would be most responsive, but could monopolize resources and so may induce delays. At the other extreme, per-message interrupts only use resources on demand, but with a BSP programming model they can generate a "storm" of requests. As is common in device drivers, we use the approach in Figure 4, which sends IPIs only if there are no other queued requests; otherwise the mechanism piggy-backs on IPIs of prior not-yet-serviced requests. This slightly compromises latency at the lowest load, but significantly reduces IPIs at high load.

Each message has routing information indicating the eventual destination, as shown by route= in Fig. 4. The IPI handler examines routing information, re-queues the message, and if needed wakes the queue reader, as shown in Fig. 5.

    ipi_handler():
        msg_list = linux_channel.q.get_list()
        while ((msg = msg_list.pop()) != NULL):
            pid = msg→route
            channel[pid].q.insert(msg)
            sched_run(pid)

Figure 5: IPI receive/dispatch example.

The description assumes message sends are performed by the LWK in supervisor mode, but if a small relaxation of the protection model is tolerable, only send_ipi() needs privilege. Note, user-level queuing would allow a user to lie about context and thus cause service to be performed on behalf of the wrong actor. However, user-level service does not mean giving up protection entirely. Instead, the channel is in shared memory, and shared memory is set up under kernel control. Thus, only a subset of tasks can write to a given channel. If the context is outside of the tasks with channel access, an error is signaled.

Messages from Linux to the LWK (both ACKs and Linux requests for LWK service) go by similar channels, but LWK channels allow LWKs to tell senders to skip IPIs. When IPIs are skipped, the LWK promises to eventually poll for pending messages and/or re-enable IPIs. Skipping IPIs reduces receiver (LWK) jitter, at the expense of delayed response.

4.4 Triaging System Calls
A system call can be serviced by the LWK, the Linux on the compute node, or the Linux running on the OSN. The goal of triage is to route requests to the correct implementer. User-level interception in glibc makes this relatively easy: it avoids kernel overhead and offers a way to bind streamlined (caller-specific) implementations of services. It provides a rich programming and friendly debug environment. Triage may be staged. Depending on the syscall and parameters, triage might perform a simple table lookup for cases to offload to an OSN via a user-level aggregator, with remaining cases serviced via a trap to the LWK.

Currently, all calls that are not offloaded are directed to the LWK. It may be possible to have some shipped directly to the Linux core. This would increase the complexity of providing transparency, but would simplify the LWK and improve its performance. System calls that are performance sensitive for HPC applications are implemented by the LWK to provide high performance with no jitter, and the other calls (the large majority by number) are shipped to Linux.

Because Linux is available on the node, the LWK can remain lightweight, with system calls that need Linux handled directly by it. The ability to direct needed but non-performance-critical calls is an advantage of mOS not possible with a solely-LWK approach.

System calls directed to Linux are those considered less performance critical, including event and signal mechanisms, some of the date/time interfaces, /proc requests, and /sys requests. The local Linux in mOS is also where the industry-standard networking layer is implemented. Rather than function-ship the socket syscalls to the OS node, these are implemented by the local Linux so that data paths remain short and optimized. These network interfaces can be used by application assist code and tools running on the Linux core. The HPC application will continue to access the network directly through high-performance messaging libraries such as MPI.

Triage of the system calls requires that each layer maintain some information about what the other layers are doing in order to make the correct decisions. For example, the open system call operates in the context of a current working directory. The chdir system call that changes the working directory needs to notify the other layers of the triage so they can operate correctly.

The disposition of many of the system calls will be decided at build or even implementation time, but others may be decided at runtime. One reason to decide at runtime is that a syscall may be requesting service on a resource currently managed by the LWK, but that resource may at a future time move to be managed by Linux, in which case the syscall also should be handled by Linux. Secondly, applications will exhibit different patterns of syscall behavior, and thus it may make sense to tailor where certain syscalls are serviced. This is currently under study.

4.5 Offloading to an OS Node
A challenge in extreme-scale systems is the increasing scalability requirement placed on the parallel filesystem (PFS). A mitigation that has been successfully leveraged in previous systems is to offload PFS operations from a set of compute nodes to an I/O node, reducing the number of clients by one or two orders of magnitude. Not only does this reduce the number of PFS clients, but it also removes the jitter caused by the client by removing it from the compute node. It also reduces memory usage and cache pollution.

In the mOS architecture, the PFS would naturally exist in the Linux operating environment of the compute node.
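The three-level hierarchy of system call placement can be sketched as a staged triage table. This is a minimal illustration, not mOS's actual dispatch mechanism; all call names and their placement here are hypothetical:

```python
# Hypothetical triage table mapping a syscall to the hierarchy level that
# services it. Names and placement are illustrative only; the paper does
# not publish mOS's actual dispatch tables.
LWK, NODE_LINUX, OS_NODE = "lwk", "node-linux", "os-node"

TRIAGE = {
    # Level 1: performance-critical, serviced by the LWK on compute cores
    "mmap": LWK,
    "sched_yield": LWK,
    # Level 2: short latency or node-local, serviced by the on-node Linux
    "socket": NODE_LINUX,
    "clock_settime": NODE_LINUX,
    # Level 3: high latency or off-node resources, shipped to an OS node
    "pfs_open": OS_NODE,
    "pfs_write": OS_NODE,
}

def triage(syscall):
    """Route a request to the correct implementer; default to on-node Linux."""
    return TRIAGE.get(syscall, NODE_LINUX)
```

A real implementation would key on syscall numbers and parameters rather than names, but the shape of the decision is the same table lookup.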
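Requirement 1 above, managing memory as physically contiguous regions carved into large pages, can be illustrated with a toy bump allocator over one LWK-owned region. The class, sizes, and names are illustrative, not the mOS allocator:

```python
LARGE_PAGE = 2 * 1024 * 1024   # 2 MiB, a common large-page size

class ContiguousRegion:
    """Toy bump allocator over one physically contiguous, LWK-owned region."""
    def __init__(self, base, size):
        self.base, self.end, self.next = base, base + size, base

    def alloc(self, size):
        """Return a (start, size) range of whole large pages, or None."""
        start = -(-self.next // LARGE_PAGE) * LARGE_PAGE   # round up to a page
        size = -(-size // LARGE_PAGE) * LARGE_PAGE         # whole pages only
        if start + size > self.end:
            return None                                    # region exhausted
        self.next = start + size
        return (start, size)
```

Because every allocation is large-page-aligned and contiguous, a single TLB entry can cover each page, which is the motivation the requirement states.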
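The user-level server structure can be sketched as a loop that drains a request queue standing in for the shared-memory channel and posts an ACK per request. All names here are illustrative, not mOS's actual interfaces:

```python
from collections import deque

# Stand-ins for a kernel-established shared-memory channel and its ACK path.
channel = deque()   # requests enqueued by LWK threads
acks = []           # acknowledgments posted back toward the LWK

def serve_pending():
    """Drain queued requests in FIFO order and ACK each one."""
    served = 0
    while channel:
        req = channel.popleft()
        # ... service the request on a Linux core here ...
        acks.append(("ack", req["id"]))
        served += 1
    return served
```

A per-LWK-process server would run this loop whenever woken, then block or poll depending on the IPI policy described in Section 4.3.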
In the mOS architecture, the PFS would naturally exist in the Linux operating environment of the compute node. This addresses the jitter caused by the PFS, but it does not address the client scaling of the PFS, nor does it address the CPU and memory resources needed by the PFS. Therefore, an offload solution remains an important component of mOS. Because there will be tools and programs for application assist running on the Linux core(s), see Figure 1, it will be important to provide offload capability for Linux as well, as illustrated by the white bidirectional arrow from Linux to the OSN.

System call offload to a remote OS node is accomplished with a mechanism similar to a remote procedure call. The OS node implementation could be kernel or user-level based. A kernel-mode implementation could be done using existing kernel extension mechanisms, but user level is preferred due to the ease of development and the availability of efficient user-level access to the network hardware. Another motivation to pursue a user-level offload solution for user processes is that Linux already tracks the necessary context.

A growing problem with offload is the non-linear growth in the number of compute tasks a particular OS node is expected to serve. Future systems may keep the approximate OSN to compute node ratio, but the number of cores, threads, and tasks is growing substantially. Therefore, it will likely be important to aggregate I/O requests on the compute node before offloading them, which also motivates a user-level solution.

The mOS solution also introduces additional complexity over previous offload solutions. Previous solutions, such as Blue Gene/Q CNK, assigned only a single offload destination for the function-shipping solution. The mOS solution now allows some system calls to be directed to the local Linux and others to the remote OS node. This adds complexity in tracking process context (working directories, uid/gid, umask, etc.), which must be common across the independent Linux environments.

4.6 Partitioning Resources Between the LWK and Linux
The LWK and Linux run on the same node, so a mechanism is needed to share resources between the two kernels. The resources of interest are: 1) cores, 2) memory, and 3) devices, e.g., HFIs (Host Fabric Interface), the global timer, the IOAPIC, and the IOMMU. Each kernel depicted in Figure 1 needs to be given a set of resources. This can be done statically, requiring a reboot to change, or dynamically, using Linux's support for CPU and memory hotplug. There are more challenges when attempting to do this dynamically, but it provides more flexibility. We may relax the implementation from static to dynamic as we explore the needs of the applications. In the rest of the section we describe how to handle memory, core, and device partitioning.

Memory partitioning
Static partitioning of memory between Linux and the LWK is relatively straightforward. The platform BIOS provides Linux on the first core that is booted with an ACPI or E820 table that is a map of usable and reserved memory regions. Since this table describes all usable physical memory, we restrict Linux to use a fraction of the total memory, e.g., by using the Linux kernel command line parameter mem=nn[KMG], and then build a second E820 map that describes the remaining physical memory for the LWK. In this E820 map, the physical memory already used by Linux is marked reserved. When the LWK is started, it parses the table provided in memory and builds its list of physical pages without touching pages allocated to Linux. Using this mechanism, the size of each partition remains fixed throughout an application's execution. To resize the memory, Linux would need to be restarted.

Dynamic partitioning of memory can be achieved by using the logical memory hotplug support in Linux, i.e., via memory onlining/offlining entries. This memory can also be described using an E820 table as described above. This method allows resizing the partitions without rebooting. Another enhancement we are considering is allowing the LWK to dynamically integrate this memory.

In addition to partitioning physical memory, there are two other issues. First, each hardware thread starts out in real mode, where only the lower 1MB of physical memory is addressable (segment + offset = 20 bits); thus, a region of memory must be reserved to store the trampoline code for booting the other Linux processors and the LWK processors. Second, some shared physical memory must be reserved between the two kernels for messaging and notification, e.g., for system call shipping. This is in addition to mapping memory in a manner that allows it to be visible to the application on both Linux and the LWK.

Core partitioning
The MADT in the ACPI table describes a list of CPUs and interrupt sources to the Linux kernel. Just as the kernel can be limited to use a fraction of total memory, the kernel command line maxcpus=n can be used to limit the maximum number of CPUs it will boot. The remaining CPUs are left in a state where a set of IPIs and startup code can take them from real mode to 64-bit long mode with paging. This simple mechanism can be used to statically partition the CPUs between Linux and the LWK. Alternatively, like the memory partitioning mechanism described above, CPU offlining/onlining can be used to dynamically partition the number of CPUs allocated to each kernel without requiring a reboot of Linux.

Device partitioning
• HFI: The HFI is initialized by Linux as per the above text on simplifying the LWK, but it is made available to the LWK and to the user-level application running on the LWK.
• Other physical devices: Since our goal is to avoid interrupting the LWK, any other physical devices, e.g., the PCH, will be assigned to Linux, which will provide the appropriate drivers to manage them.
• IPIs: Care must be taken to avoid using IPI shortcuts like all and all-but-self, to prevent IPIs from crossing the Linux/LWK boundary.
• Global timer for local APIC calibration: This is handled by providing a kernel command line parameter similar to loops per jiffy in Linux.

5. CONCLUSIONS
We presented an architectural description of mOS and its components, and provided insight into the tradeoffs we considered. mOS represents a methodology for combining a Linux kernel that provides support for existing execution models with an LWK that provides support for emerging ones. We described the advantages that such an architecture provides. While that OS structure may be a good design across a wide variety of future areas, we are focusing mOS on extreme-scale HPC. Some of the next steps involve finishing our prototype and measuring the overheads associated with the triaging of system calls. Also, while previous work [16] indicated success in isolating noise between Linux and the LWK, we will need to verify that we can achieve the same with mOS's architecture.

Acknowledgments
It is wonderful to have a group of smart people who help push your ideas and ensure their validity. We want to thank Kurt Alstrup, Steve Hampson, Mike O'Hara, and Evan Powers for countless conversations that resulted in a stronger, more fleshed-out architecture. We thank Kevin Pedretti for help in better understanding Kitten and Hobbes.
6. REFERENCES
[1] NASA's Pleiades machine. http://www.nas.nasa.gov/Resources/Systems/pleiades.html.
[2] ZeptoOS: The small Linux for big computers. http://www.mcs.anl.gov/research/projects/zeptoos/.
[3] Cray Linux Environment (CLE) 4.0 software release overview. http://docs.cray.com/books/S-2425-40/S-2425-40.pdf, Mar. 2011.
[4] G. Ammons, J. Appavoo, M. Butrico, D. D. Silva, D. Grove, K. Kawachiya, O. Krieger, B. Rosenburg, E. V. Hensbergen, and R. W. Wisniewski. Libra: A library operating system for a JVM in a virtualized execution environment. In VEE (Virtual Execution Environments), San Diego, CA, June 13-15, 2007.
[5] F. J. Ballesteros, N. Evans, C. Forsyth, G. Guardiola, J. McKie, R. Minnich, and E. Soriano-Salvador. Nix: A case for a manycore system for cloud computing. Bell Labs Technical Journal, 17(2):41–54, 2012.
[6] P. Beckman. An exascale operating system and runtime research project.
[7] R. Brightwell, R. Oldfield, A. B. Maccabe, and D. E. Bernholdt. Hobbes: Composition and virtualization as the foundations of an extreme-scale OS/R. In Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS '13, pages 2:1–2:8, New York, NY, USA, 2013. ACM.
[8] B. Gerofi, A. Shimada, A. Hori, and Y. Ishikawa. Partially separated page tables for efficient operating system assisted hierarchical memory management on heterogeneous architectures. In Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on, pages 360–368, May 2013.
[9] M. Giampapa, T. Gooding, T. Inglett, and R. W. Wisniewski. Experiences with a lightweight supercomputer kernel: Lessons learned from Blue Gene's CNK. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing (SC10), New Orleans, LA, Nov. 2010.
[10] S. D. Hammond, G. R. Mudalige, J. A. Smith, J. Davis, S. A. Jarvis, J. Holt, J. A. Herdman, I. Miller, and A. Vadgama. To upgrade or not to upgrade? Catamount vs. Cray Linux Environment. In Large Scale Parallel Processing Workshop 2010 (LSPP10), 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), pages 1–8, April 2010.
[11] L. S. Kaplan. Lightweight Linux for high-performance computing, Dec. 2006.
[12] S. Kelly and R. Brightwell. Software architecture of the lightweight kernel, Catamount. In Cray Users' Group Annual Technical Conference, Albuquerque, New Mexico, June 2005.
[13] O. Krieger, M. Auslander, B. Rosenburg, R. W. Wisniewski, J. Xenidis, D. D. Silva, M. Ostrowski, J. Appavoo, M. Butrico, M. Mergen, A. Waterland, and V. Uhlig. K42: Building a real operating system. In Proceedings of EuroSys 2006, pages 133–145. ACM SIGOPS, April 2006.
[14] J. Lange, K. Pedretti, T. Hudson, P. Dinda, Z. Cui, L. Xia, P. Bridges, A. Gocke, S. Jaconette, M. Levenhagen, and R. Brightwell. Palacios: A new open source virtual machine monitor for scalable high performance computing. In IPDPS, April 2010.
[15] R. Liu, K. Klues, S. Bird, S. Hofmeyr, K. Asanović, and J. Kubiatowicz. Tessellation: Space-time partitioning in a manycore client OS. In Proceedings of the First USENIX Conference on Hot Topics in Parallelism, HotPar'09, pages 10–10, Berkeley, CA, USA, 2009. USENIX Association.
[16] Y. Park, E. Van Hensbergen, M. Hillenbrand, T. Inglett, B. Rosenburg, K. D. Ryu, and R. Wisniewski. FusedOS: Fusing LWK performance with FWK functionality in a heterogeneous environment. In Computer Architecture and High Performance Computing (SBAC-PAD), 2012 IEEE 24th International Symposium on, pages 211–218, Oct 2012.
[17] K. T. Pedretti, M. Levenhagen, K. Ferreira, R. Brightwell, S. Kelly, P. Bridges, and T. Hudson. LDRD final report: A lightweight operating system for multi-core capability class supercomputers. Technical report SAND2010-6232, Sandia National Laboratories, Sept. 2010.
[18] R. Riesen, R. Brightwell, P. G. Bridges, T. Hudson, A. B. Maccabe, P. M. Widener, and K. Ferreira. Designing and implementing lightweight kernels for capability computing. Concurrency and Computation: Practice and Experience, 21(6):793–817, Apr. 2009.
[19] E. Shmueli, G. Almasi, J. Brunheroto, J. Castanos, G. Dozsa, S. Kumar, and D. Lieber. Evaluating the effect of replacing CNK with Linux on the compute-nodes of Blue Gene/L. In ICS '08: Proceedings of the 22nd Annual International Conference on Supercomputing, pages 165–174, New York, NY, USA, 2008. ACM.
[20] M. Si, Y. Ishikawa, and M. Tatagi. Direct MPI library for Intel Xeon Phi co-processors. In Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW), 2013 IEEE 27th International, pages 816–824, May 2013.
[21] D. Wallace. Compute node Linux: Overview, progress to date & roadmap. In Proceedings of the Cray User Group (CUG), 2007.