Omni-Kernel: An Operating System Architecture for Pervasive Monitoring and Scheduling

Åge Kvalnes, Dag Johansen, Robbert van Renesse, Fred B. Schneider, Fellow, IEEE, and Steffen Viken Valvåg

Abstract—The omni-kernel architecture is designed around pervasive monitoring and scheduling. Motivated by new requirements in virtualized environments, this architecture ensures that all resource consumption is measured, that resource consumption resulting from a scheduling decision is attributable to an activity, and that scheduling decisions are fine-grained. Vortex, implemented for multi-core x86-64 platforms, instantiates the omni-kernel architecture, providing a wide range of operating system functionality and abstractions. With Vortex, we experimentally demonstrate the efficacy of the omni-kernel architecture in providing accurate scheduler control over resource allocation despite competing workloads. Experiments involving Apache, MySQL, and Hadoop quantify the cost of pervasive monitoring and scheduling in Vortex to be below 6 percent of CPU consumption.

Index Terms—Virtualization, multi-core, resource management, scalability, scheduling

1 INTRODUCTION

In a cloud environment, virtual machine monitors (VMMs) control what physical resources are available to virtual machines (VMs). For example, to prioritize I/O requests from a particular VM, a VMM must monitor and schedule all resource allocation. Failure to identify or prioritize VM-associated work subverts prioritization at other levels of the stack [1], [2], [3], [4], [5].

Modern VMMs are often implemented as extensions to an existing operating system (OS) or use a privileged OS to provide functionality [7], [9]. Xen [6] and Hyper-V [8], for example, rely on a privileged OS to provide drivers for physical devices, device emulation, administrative tools, and transformations along the I/O path (e.g., device aggregation, encryption, etc.). Requirements imposed on a VMM impact the supporting OS. The fine-grained control required for a virtualized environment is a new OS challenge, and no OS has yet taken pervasive monitoring and scheduling as its focus.

This paper presents the omni-kernel architecture, which offers visibility and opportunity for control over resource allocation in a system. All system devices (e.g., processors, memory, or I/O controllers) and higher-level resources (e.g., files and TCP connections) can have their usage monitored and controlled by schedulers. This resource management is accomplished by factoring the OS kernel into fine-grained components that communicate using messages and interposing message schedulers on communication paths. These schedulers control message processing order and attribute resource consumption to activities, which may be processes, services, database transactions, VMs, or any other units of execution.

Accurate attribution of resource consumption to activities allows fine-grained billing information to be generated for tenants that share a platform. For example, bad memory locality or caching performance can be exposed and penalized if page transfer costs are correctly attributed and billed. The capability for associating schedulers with any and all resources makes an omni-kernel instance well suited for preventing execution by one tenant from usurping resources intended for another. This form of isolation is critical for enforcing service level objectives required by VMMs.

This paper presents an omni-kernel instance, Vortex. Vortex implements a wide range of commodity OS functionality and is capable of providing execution environments for applications such as Apache, MySQL, and Hadoop. Vortex allowed us to quantify the cost of the pervasive monitoring and scheduling that defines omni-kernel architectures. Experiments reported in Section 4 demonstrate that, for complex applications, no more than 6 percent of application CPU consumption is overhead.

The contributions of this work include:

- The omni-kernel architecture, an OS architecture that offers a unified approach to resource-usage accounting and attribution, with a system structure that allows resources to be scheduled individually or in a coordinated fashion.
- A demonstration of the Vortex kernel for multi-core x86-64 platforms. Vortex instantiates the omni-kernel architecture and implements commodity abstractions, such as processes, threads, address spaces, files, and network communication.
- Experimental corroboration, using Vortex, of the efficacy of the omni-kernel architecture: accurate scheduler control over resource consumption despite competing workloads, achieved at low cost.

A. Kvalnes, D. Johansen, and S. V. Valvåg are with the Department of Computer Science, University of Tromsø, the Arctic University of Norway, 9037, Tromsø, Norway. E-mail: {aage, dag, steffenv}@cs.uit.no. R. van Renesse and F. B. Schneider are with the Department of Computer Science, Cornell University, Ithaca, NY 14853-7501. E-mail: {rvr, fbs}@cs.cornell.edu.


Fig. 1. A scheduler controls the order in which resource request messages are dispatched.

The remainder of the paper is organized as follows. Section 2 presents the omni-kernel architecture, and Section 3 gives an exposition of important elements in the Vortex implementation. In Section 4, we describe performance experiments that show the extent to which Vortex controls resource utilization and the overhead of doing so. Related work is discussed in Section 5, and Section 6 offers some conclusions.

2 OMNI-KERNEL ARCHITECTURE

In the omni-kernel architecture an OS is structured as a set of components that communicate using messages; message schedulers can be interpositioned on all of these communication paths. Message processing accompanies resource usage, and control over resource consumption is achieved by allowing schedulers to decide the order in which messages originating from different activities are processed.

2.1 Resources, Messages, and Schedulers

An omni-kernel resides in a single address space but is separated into components that exchange messages for their operation. A resource in the omni-kernel can be any software component exporting an interface for access to and use of hardware or software, such as an I/O device, a network protocol layer, or a layer in a file system. A resource uses the functionality provided by another by sending a resource request message, which specifies arguments and a function to invoke at the interface of the receiver. The sender is not delayed. Messages are deposited in request queues associated with destination resources. The order in which messages are retrieved from these queues is controlled by schedulers interpositioned between resources, as illustrated in Fig. 1.

2.2 Measurement, Attribution, and Activities

Measurement and attribution of resource consumption are separate tasks. Measurement is always retrospective, whereas attribution may or may not be known in advance of performing some task. The omni-kernel requires resource request messages to specify an activity to which resource consumption will be attributed. An activity can be a process, a collection of processes, or some processing within a single process. If a resource sends message m2 as part of handling message m1, then the activity of m2 is inherited from m1. For efficiency, request queues are activity-specific. For example, the scheduler depicted in Fig. 1 governs queues for three separate activities.

Attribution may have to be determined retrospectively for some resource consumption, in which case a substitute activity is used initially as a placeholder. For example, a CPU might not support identifying activities with separate interrupt vectors, making the identification of an activity to attribute part of interrupt handling. Similarly, a network packet that is received might have to be demultiplexed before attribution to an activity can be determined. Associating consumption with a substitute activity creates the ability to control available resources. Also, substitute activities are convenient when it is apparent in advance that some resource consumption cannot readily be attributed. For example, a file system would typically store meta-information about multiple files in the same disk blocks. By attributing the transfer cost of such blocks to a substitute activity, consumption can still be quantified and policies for cost-sharing can more easily be effectuated.

The omni-kernel uses resource consumption records to convey resource usage to schedulers. Instrumentation code measures CPU and memory consumption attributable to a message; resource consumption is then described by a resource consumption record that is reported to the dispatching scheduler. Additional consumption can be reported by instrumentation code that is part of the resource itself. For example, a disk driver could report the time to complete a request and the size of the queue of pending requests at the disk controller.

Resource consumption records can be saved until an accountable activity has been determined, allowing further improvements to attribution accuracy. For example, if demultiplexing determines that a network packet is associated with a particular TCP connection, then retained records can be used to attribute the associated activity and reimburse the placeholder activity.

2.3 Dependencies, Load Sharing, and Scalability

Dependencies among messages arise, for example, due to consistency requirements on consecutive writes to the same location in a file. Such dependencies are described using dependency labels in messages. Messages having the same dependency label are always processed in the order they were made. So a scheduler can read, modify, and reorder a request queue provided all dependency label constraints are maintained. Messages belonging to different activities and messages sent from different resources do not have dependencies. Experience from our implementation indicates that dependency labels suffice for enforcing common consistency needs, including those that arise in file systems.

Generally, an omni-kernel allows messages to be processed on any available core. But in multi-core architectures, certain sets of messages are best processed on the same core or on cores that can efficiently communicate. For example, cache hit rates improve if messages that result in access to the same data structures are processed on the same core. To convey information about data locality, resources attach affinity labels to messages. Affinity labels give hints about core preferences; if a core recently has processed a message with a particular affinity label, then subsequent messages having the same affinity label should preferably be processed by that same core. The decision about what core to select is made by the scheduler governing the destination resource of a message. This latitude also enables load sharing motivated by other concerns; for example, messages can be routed to a subset of cores in order to reduce power consumption.
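To make the preceding description concrete, the following C sketch shows the kind of information a resource request message and a resource consumption record could carry. The struct and field names (okrt_msg, okrt_rcr, and so on) are our own illustration rather than the actual Vortex definitions; they simply mirror the elements named above: an activity, a closure to invoke, dependency and affinity labels, and per-message consumption.

/*
 * Illustrative sketch only: hypothetical layout of a resource request
 * message and a resource consumption record.  Names are assumptions,
 * not taken from the Vortex source.
 */
#include <stdint.h>

struct resource;                                /* a software component in the grid */

typedef void (*okrt_closure_fn)(void *args);    /* function to invoke at the receiver */

struct okrt_msg {
    uint64_t         activity;     /* activity charged for processing this message */
    uint64_t         dependency;   /* messages with equal labels are processed in order */
    uint64_t         affinity;     /* hint: prefer the core that last saw this label */
    struct resource *destination;  /* resource whose scheduler dispatches the message */
    okrt_closure_fn  fn;           /* interface function invoked when dispatched */
    void            *args;         /* arguments captured for the invocation */
};

/* Reported to the dispatching scheduler after a message has been processed. */
struct okrt_rcr {
    uint64_t activity;             /* activity to which consumption is attributed */
    uint64_t cpu_cycles;           /* CPU time spent processing the message */
    uint64_t bytes_allocated;      /* memory allocated on behalf of the message */
    uint64_t resource_units;       /* optional resource-specific metric, e.g. bytes written */
};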

Fig. 2. Resources organized in a grid with schedulers and request queues on the communication paths.

Large numbers of messages might have to be exchanged by omni-kernel resources. So an effective way to reduce system overhead is to avoid preemption in message processing, thereby avoiding context-switch overhead and complicated lock management (in order to avoid deadlocks from priority inversion [12]). Omni-kernel messages are therefore processed to completion when scheduled. To further increase scalability, resources are required to concurrently process messages. Resources use synchronization mechanisms to protect their shared state.

Although scalability was a concern when devising the omni-kernel, our main motivation was control over the sharing of individual resources, such as cores and I/O devices. The omni-kernel has some structural similarity to OS architectures [13], [14], [15], [16] whose design is dominated by multi-core scalability concerns. As in these systems, the loose coupling of resources in the omni-kernel encourages partitioning, distribution, and replication, all of which are conducive to OS scalability [17], [18].

2.4 Resource Grid, Configuration, and Allocations

Resources exchange messages to implement higher-level OS abstractions. This organization of the OS kernel into a resource grid is illustrated in Fig. 2. Within the grid, some resources produce messages, some consume messages, and others do both. For example, a process might perform a system call to use an abstraction provided by a specific resource, and that resource might then communicate with other grid resources. Similarly, a resource encapsulating a network interface card (NIC) would produce messages containing ingress network packets and consume egress network packet messages.

Configuration of resource grid communication paths is performed in several ways. Within an omni-kernel, a resource can send a message to any other resource. A fully connected resource grid is, however, unlikely to be instantiated in practice because resources typically contribute functionality to implement some higher-level abstraction, so they communicate with other resources providing related functionality, either at a higher or lower abstraction level. For example, direct communication between a SCSI and a TCP resource is unlikely. Thus, some communication paths are likely to be created ahead of system deployment.

Some communication paths, though, are established because of runtime system configuration. For example, mounting a file system creates communication paths among the resources constituting the OS storage stack. Similarly, creating a network route establishes a communication path among the resources implementing the network stack. Communication paths might also be configured for supporting process interaction. For example, a process might memory map a file, causing multiple resources to be involved in fetching file data when a page fault occurs. Likewise, servicing system calls that involve I/O requires communication paths to be established.

In an omni-kernel, kernel-level resources are shared among activities according to a policy set by the governing scheduler. Often, the capacity of a resource depends on the CPU-time available to the resource. Other constraints might govern an I/O device, where capabilities of that I/O device limit capacity. But, generally, resources need CPU and memory to operate, and these needs are typically something that would be measured by performing test runs on the specific hardware, before system deployment.

I/O devices ideally should be able to operate at their capacity. For this level of performance to be possible, all resources leading up to I/O device interaction must be configured with sufficient resources.

An omni-kernel implementation typically implements interfaces for updating an active configuration, to facilitate automation of test runs. For example, our Vortex implementation allows runtime adjustments of the CPU-time available to a resource; it also allows changes to available cores and other parameters. A test run would use these interfaces to improve under-performing configurations.

In a deployed system, a service provider is typically concerned with multiplexing fractions of machine resources among consolidated services. Such control is often expressed using shares, reservations, and limits [19], [20], [21]. Supporting these controls requires suitable resource grid scheduler implementations, mechanisms for creating and associating activities with process interactions, and a means to convey priorities to schedulers based on service level objectives. Our Vortex omni-kernel implementation has a flexible notion of an activity for CPU, I/O, and memory use, where a process can associate a specific activity with operations and interactions. For example, different activities in Vortex can be associated with different I/O operations or threads.

3 THE VORTEX OMNI-KERNEL IMPLEMENTATION

Vortex is an omni-kernel implementation for multi-core x86-64 platforms. It comprises around 120,000 lines of C code and offers a wide range of commodity OS functionality. We give an overview of the implementation here; see [10], [11], [22] for additional implementation details.

The omni-kernel architecture can be seen in the Vortex implementation: the bulk of kernel functionality is contained within resources that communicate using message-passing in their operation. Also, that communication is mediated by schedulers that control message-processing order.
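The following sketch pulls these architectural elements together as a hypothetical per-core dispatch loop, using the message and consumption-record structures sketched in Section 2. It is our own illustration of the described behavior (a scheduler selects a message, the message runs to completion, and the measured cost is reported back), not code from Vortex; sched_pick, sched_report, and rdtsc are assumed helpers.

/* Illustrative per-core dispatch loop; names and helpers are assumptions. */
#include <stdint.h>

struct resource;

extern uint64_t rdtsc(void);                              /* read the CPU cycle counter */
extern struct okrt_msg *sched_pick(struct resource *r, int core);
extern void sched_report(struct resource *r, struct okrt_rcr *rcr);

static void dispatch_one(struct resource *r, int core)
{
    /* The scheduler chooses among per-activity request queues while
     * honoring dependency-label ordering and affinity-label hints. */
    struct okrt_msg *m = sched_pick(r, core);
    if (m == NULL)
        return;

    uint64_t start = rdtsc();
    m->fn(m->args);                        /* processed to completion, never preempted */
    uint64_t elapsed = rdtsc() - start;

    /* Report what the message cost, attributed to its activity. */
    struct okrt_rcr rcr = { .activity = m->activity, .cpu_cycles = elapsed };
    sched_report(r, &rcr);
}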

3.1 Omni-Kernel Runtime (OKRT)

Vortex encapsulates and automates tasks that are common across resources by implementing a general framework called the OKRT. The runtime defines modular interfaces for resources and schedulers, and it provides means to compose them into a resource grid, along with supporting representation and aggregation of request messages, inter-scheduler communication, instrumentation of CPU and memory usage, management of resource consumption records, resource naming, fine-grained memory allocation, and inter-core/CPU communication and management.

Fig. 3. Separate request queues per core per activity.

Schedulers maintain separate data structures for shared and core-specific state, implementing a set of functions to invoke when relevant state changes occur. The distinction between shared and core-specific state allows the framework to automate efficient multi-core scheduling of messages. Sharing typically occurs only when messages are sent by one core and queued for processing on another, or when a scheduler inspects shared state to select a core for an affinity label.

To improve locality, the omni-kernel runtime (OKRT) instantiates activities with one request queue per core at each destination resource, as shown in Fig. 3. A mapping associates affinity labels with cores for each activity, so that messages with the same affinity will be routed to the same core. The affinity-label mappings are determined by schedulers. In addition, these mappings are limited by expiration times, so that schedulers have the opportunity to share load across cores.

Resources are implemented in a modular way by declaring a public interface. And resource request messages cause asynchronous invocations of a function in the resource's interface. To handle the problems of stack ripping and obfuscation of control flow that can arise in message-driven environments [23], [24], OKRT offers a closure abstraction that associates arguments with a function pointer. Each message includes a closure to be invoked when the message is processed, the destination resource, dependency and affinity labels to guide schedulers, and an activity attribution. Closures also are used by OKRT for deferred function invocation. An action to take upon expiration of a timer, for example, is expressed as a closure; state updates that must be performed on a specific core also are expressed as closures, invoked either in the context of an inter-processor interrupt or through other mechanisms.

Resources both exchange and share state in their operation, so partitioning, distribution, and replication [13], [17], [18] are carefully employed to avoid shared state on critical paths. The OKRT object system provides a convenient basis for implementing these techniques and for managing state that is distributed among resources. This object system encourages resources to manage state in terms of typed objects, offers general approaches to object locking, references, and reference counting, and implements efficient slab allocation techniques [25] for allocation and reclamation of objects.

To instantiate a resource grid, OKRT reads a configuration file at boot time. This file describes the type of scheduler for each resource. A configuration can specify that only a subset of cores are available to a specific resource scheduler. This form of specification allows deployments with cores dedicated to certain resources, which is useful when scaling through fine-grained locking or avoidance of shared data structures is difficult. Typical examples are resources that govern I/O devices using memory-based data structures to specify DMA operations. Partitioning cores to have an OS and its processes use disjoint subsets, as was suggested in [14], is possible. These features are supported by exposing the configured number of cores to the resource scheduler.

Note that OKRT does not analyze scheduler composition, so a configuration may contain flaws. If a resource is scheduled using an earliest deadline first algorithm and CPU-time is requested from a CPU resource scheduler using a weighted fair queueing (WFQ) algorithm, for example, then the resource scheduler can make no real-time assumptions about deadlines. Reasoning about correctness requires a formalization of the behavior of each scheduler followed by an analysis of the interaction between behaviors. See [26], [27], [28], [29], [30] for work in this direction.

3.2 Vortex Abstractions

Vortex implements a number of resources on top of the OKRT. These provide a set of commodity OS abstractions. Resources are always accessed through their publicly declared interfaces, and selected parts of their interfaces might also be exposed as Vortex system calls, using an automatic stub generation utility. The generated stubs translate system calls that are synchronous from the perspective of the caller into resource request messages, which are processed asynchronously and subject to scheduling.
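A hypothetical stub for a file read illustrates this translation. The function and resource names (vortex_msg_post, thread_block_until, fcr_read, and so on) are placeholders we introduce for the sketch; the paper does not show the generated code itself.

/* Sketch of a generated system call stub: synchronous to the caller,
 * asynchronous underneath.  All identifiers below are illustrative. */
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

struct resource;
extern struct resource *fcr;                       /* file cache resource */
extern uint64_t current_activity(void);
extern void vortex_msg_post(struct resource *dst, uint64_t activity,
                            void (*fn)(void *), void *args);
extern void thread_block_until(volatile int *flag);

struct file_read_args {
    int           fd;
    void         *buf;
    size_t        len;
    ssize_t       result;
    volatile int  done;        /* set once the request has completed */
};

extern void fcr_read(void *args);                  /* interface function at FCR; fills in
                                                      result and sets done when complete */

ssize_t sys_file_read(int fd, void *buf, size_t len)
{
    struct file_read_args a = { .fd = fd, .buf = buf, .len = len, .done = 0 };

    /* Package the arguments as a resource request message for the file
     * cache resource; the message inherits the caller's activity. */
    vortex_msg_post(fcr, current_activity(), fcr_read, &a);

    /* The caller sees a synchronous call: block until the message has
     * been scheduled, processed, and completion signalled. */
    thread_block_until(&a.done);
    return a.result;
}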
Vortex uses processes and threads to represent running programs. The process abstraction is implemented by the process resource (PR), in cooperation with resources for specific subfeatures. For example, the address space resource (ASR) provides a virtual address space along with the ability to create and manipulate mappings within that address space; the thread resource (TR) implements execution contexts and conventional thread operations.

Each thread is modeled as a separate client to the TR scheduler. This scheduler decides when to activate the thread. After activation, the thread runs until the timeslice expires or it performs a blocking action. While the thread is running, OKRT regards TR as processing a message; the resulting CPU consumption is recorded automatically and attributed to the associated activity.

Many resources cooperate to create the abstraction of files stored on disk. A disk driver appears as two resources: a device read/write resource (DRWR) and a device interrupt resource (DIR). Insertion of disk read/write requests is performed by DRWR, and request completion processing is handled by DIR. The storage device read/write resource (SDRWR) interfaces the Vortex storage system with DRWR, translating between different data-buffer representations. Messages pass through the SCSI resource (SCSIR) for the appropriate SCSI message creation and response handling. The storage resource (SR) manages disk volumes in a generic manner, independent of disk technology. Upstream of SR, the EXT2 file system resource (EXT2R) implements the EXT2 file system on a disk volume. At the top of the stack, the file cache resource (FCR) initially receives file operations and communicates with EXT2R to retrieve and update file metadata and data.

Virtual memory is managed by ASR, which constructs and maintains page tables as well as providing an interface for allocating and controlling translations for regions of an address space. Virtual memory region allocations are on-demand, and page faults drive fetch and creation of page table translations for the data associated with a virtual address. Data structures for memory allocation are partitioned across cores using known techniques [14] to improve locality and to reduce contention on page table updates. Memory ranges can be backed, for example, by memory-mapped files, program binaries, or physical memory; this backing is implemented generically by assigning a resource as the provider for the range. When resolving page faults, ASR sends a message to the corresponding provider to establish the relevant page table translations. The memory resource (MR) implements a physical memory allocator and serves as the provider for physical memory. For memory-mapped files, the file cache resource (FCR) serves as the provider, and the executable resource (ER) provides data parsed from program binaries.

The MR scheduler tracks memory allocation for each activity and initiates memory reclamation when available memory is low or when an activity exceeds its memory budget. Making reclamation decisions to achieve improved performance typically requires additional information. For example, if frequently used memory in the process heap is reclaimed then performance will erode. Vortex, therefore, requires resources to instrument memory use across activities, as well as maintain sufficient state to perform a performance-conducive selection of what memory to remove references to. For example, FCR assigns to each activity a priority queue containing file references; the priority of an entry is updated whenever a file is accessed for the specific activity. When instructed to reclaim memory for a given activity, FCR evicts its lowest-priority files.

Decentralized memory reclamation removes some control from the MR scheduler—the scheduler cannot reclaim specific memory buffers. The tradeoff is a reduction in duplicated state and less complicated scheduler logic. Currently, the MR scheduler knows the memory usage of activities at each resource, and it is empowered to reclaim from any resource.

Vortex implements an I/O subsystem with a generic flow abstraction at its core. Flows are composable and can be used to pipe data asynchronously from a source resource to a sink resource. The asynchronous I/O resource (AIOR) implements this flow abstraction, and it uses prefetching and overlapping to speed up data flow. Data transfers from, say, a file to a TCP connection can be set up directly using flows. If a process provides data or receives data from a flow, then a complementary I/O stream abstraction is used. I/O streams are provided by the I/O stream resource (IOR), and they serve as endpoints to flows. IOR manages data buffers, communicating with the ASR to allocate virtual memory. Data is read-only to processes, and ASR employs a protocol by which newly allocated virtual memory regions do not have translation lookaside buffer (TLB) translations on any machine cores. Translations thus can be inserted without a need for TLB shootdowns, leading to efficient data transfers across address spaces.

On top of the core I/O abstractions, a compatibility layer implements POSIX-compliant interfaces for synchronous and asynchronous I/O. This layer [10], [11] includes different flavors of blocking and non-blocking reads and writes, as well as multiplexing mechanisms such as select and poll.

4 EVALUATION

This section describes an experimental evaluation of the efficacy of the omni-kernel architecture using Vortex. Our goal is to demonstrate that all resource consumption occurs because of scheduling decisions, and that scheduling decisions benefit the appropriate activity, so that measurement and attribution are accurate. We also aim to quantify the overhead of the pervasive monitoring and scheduling in Vortex.

We evaluate the efficacy of the omni-kernel by investigating whether activities obtain resources in accordance with the stated policies. The experiments use WFQ [31] schedulers, which are non-trivial policies that have well-known behavior. Observation of the expected behavior indicates that the omni-kernel is effective.

With variable demand, the service that a client receives is influenced by how the particular WFQ scheduler limits bursty behavior [32]. With uniform demand, however, expected behavior is more predictable—clients receive resources in proportion to assigned weights. We, therefore, structure workloads to exhibit uniform demand across cores. Also, the experiments associate activities with separate instances of the same application. This ensures high contention for the same set of limited resources, stressing the ability of schedulers to enforce sharing policies for those resources.

The CPU resource distributes CPU-time to other resources. As noted in Section 3.2, user-level threads receive CPU-time from TR. In our experiments, we configure TR to have a 50 percent entitlement at the CPU resource scheduler, with the remaining CPU-time shared equally among other resources. This leaves ample CPU-time for kernel-level functionality. Excess capacity remains available to processes, because the WFQ scheduler is work-conserving.

Vortex records performance data as part of its generic OKRT layer, measuring CPU and memory consumption, cache hit rates, and the like. Schedulers implement an interface for reporting internal statistics, such as resource-specific performance metrics. During experiments, a dedicated process extracts periodic snapshots of all aggregated performance data by using a system call interface. These snapshots enable detailed breakdowns of resource consumption per activity, over time.
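The experiments rely on standard WFQ behavior: backlogged clients receive the shared resource in proportion to their weights, and unused capacity is redistributed because the scheduler is work-conserving. The following minimal sketch of WFQ bookkeeping is our own illustration, not the Vortex scheduler; the cost fed into the virtual finish time can be any metric reported in consumption records, such as CPU cycles or bytes transferred.

/* Minimal weighted fair queueing bookkeeping; illustrative only. */
#include <stdint.h>

struct wfq_client {
    uint64_t weight;        /* entitlement, e.g. 50, 33, or 17 */
    uint64_t vfinish;       /* virtual finish time of the last dispatched work */
};

/* Select the backlogged client with the smallest virtual finish time. */
static struct wfq_client *wfq_pick(struct wfq_client **backlogged, int n)
{
    struct wfq_client *best = NULL;
    for (int i = 0; i < n; i++)
        if (best == NULL || backlogged[i]->vfinish < best->vfinish)
            best = backlogged[i];
    return best;
}

/* Charge a client for dispatched work; cost is in the scheduler's metric
 * (CPU cycles for most resources, bytes at the disk and network resources). */
static void wfq_charge(struct wfq_client *c, uint64_t vnow, uint64_t cost)
{
    uint64_t start = c->vfinish > vnow ? c->vfinish : vnow;
    c->vfinish = start + cost / c->weight;
}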

Fig. 4. Breakdown of relative CPU utilization for the file read experiment.

Our first experiments use applications designed to stress specific resources and schedulers. These applications run as native Vortex processes; an activity is associated with each process. To quantify overhead, we seek more realistic workloads, so we use Apache, MySQL, and Hadoop applications. These execute as unmodified binaries in virtual machines, drawing on prior work [10], [11]. All experiments exhibited deterministic behavior across runs, with consistent performance. So the error bars are small and we omit them from the figures.

Vortex was run on a Dell PowerEdge M600 blade server with 16 GB memory and two Intel Xeon E5430 quad-core processors. The server had two 1 Gbit Broadcom 5708S Ethernet network cards and two 146 GB Seagate 10K.2 disks in a RAID 0 (striped) configuration. To generate load, a separate cluster of machines was connected to the Vortex server through a dedicated HP ProCurve 4208 Gigabit switch.

4.1 Scheduling CPU-Bound Activities

We establish that CPU consumption can be accurately measured, attributed, and scheduled by considering a simple experiment with three activities, entitled to 50, 33, and 17 percent of the available CPU-time, respectively. Each activity was a process with one CPU-bound thread per core, programmed to repeatedly open a designated cached file, read 32 KB of data, and close the file. Fig. 4 shows the relative CPU consumption of each activity at the different resources involved in the experiment. Approximately 10 percent of the CPU-time is used at the I/O-related resources, and the remainder is used at TR in proportion to the configured entitlements. All CPU cycles are accounted for, with minor discrepancies that derive from ongoing but, as yet, unattributed cycles at the time performance data is sampled.

4.2 Scheduling Network-Bound Activities

We consider an experiment in which (1) schedulers use metrics other than CPU-time (bytes written and read), (2) resource consumption is inherently unattributable at the time of consumption (packet demultiplexing and interrupt processing), and (3) an I/O device rather than the CPU is a bottleneck to increased performance.

The experiment involved three instances of the single-threaded and event-driven THTTPD web server. The servers were subjected to external load generated by ApacheBench, repeatedly requesting the same 1 MB file with a concurrency level of 16. Prior to the experiment, testing revealed that ApacheBench could saturate a 1 Gbit network interface from a single machine. We used one load-generating machine per web server instance, which generated load well in excess of the network interface capacity.

The three web servers were configured with 50, 33, and 17 percent entitlements at all resources. CPU consumption was used as a metric, except in schedulers for the network device write resource (NDWR), device write resource (DWR), and network device read resource (NDRR). These resources implement low-level handling of network packets: attaching Ethernet headers, inserting packets into the NIC transmit ring, and demultiplexing incoming packets. Their schedulers were configured to measure the number of bytes transferred.

Fig. 5 shows how network bandwidth was shared at DWR. The demand for bandwidth generated by ApacheBench was the same for all web servers. However, the actual bandwidth consumed by each web server was proportional to its entitlement, as desired. Moreover, observe that the total bandwidth consumed was close to the maximum capacity of the NIC, confirming that the workload was I/O bound and that all network bandwidth consumption was accounted for.
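As a sketch of how a byte-count metric could reach such a scheduler, the fragment below reuses the consumption-record structure sketched in Section 2 and reports the number of bytes inserted into the transmit ring instead of CPU cycles. The driver and helper names are assumptions for illustration; the actual Vortex driver interface is not shown in the paper.

/* Illustrative only: reporting bytes written as the scheduling metric at DWR. */
#include <stdint.h>

struct resource;

extern void nic_txring_insert(const void *frame, uint32_t frame_len);
extern void sched_report(struct resource *r, struct okrt_rcr *rcr);

static void dwr_transmit(struct resource *dwr, const struct okrt_msg *m,
                         const void *frame, uint32_t frame_len)
{
    nic_txring_insert(frame, frame_len);      /* place the packet on the NIC ring */

    struct okrt_rcr rcr = {
        .activity       = m->activity,
        .resource_units = frame_len,          /* bytes written: what WFQ shares here */
    };
    sched_report(dwr, &rcr);
}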

Fig. 5. Bytes written at DWR in the web server experiment.

Fig. 7. Relative disk bandwidth consumption at DRWR in the file system experiment.

Fig. 6 decomposes CPU consumption across the involved resources. DIR removes packets from the NIC receive ring and sends them to NDRR for demultiplexing. CPU consumption by these resources is measured, but the target TCP connection (and its associated activity) is unknown prior to demultiplexing. In the figure, CPU consumption is labeled as infrastructure, a substitute activity. After demultiplexing, CPU consumption is attributed per activity at all resources. Because ApacheBench is limited to 16 concurrent connections and the network interface is the bottleneck, CPU consumption is largely proportional to the bandwidth entitlements of the activities.

4.3 Scheduling Disk-Bound Activities

The experiment involving disk I/O differs from the web server experiment by (1) introducing a foreign scheduler outside direct control of Vortex (the disk controller firmware scheduler), (2) I/O device capacity that fluctuates depending on how the device is accessed (i.e., which disk sectors are accessed and in what order), and (3) I/O requests that have markedly different sizes.

The experiment involved three activities with 50, 33, and 17 percent entitlements at all resources. Each activity was a process performing repeated file reads from a separate 512 MB data set on disk. Each process used a concurrency level of 1,024: the data sets were split into 256 files, and each process employed 8 threads that each read 32 files in parallel. Each file was read using four non-overlapping parallel flows. To ensure that the experiment was disk-bound, each file was evicted from memory caches after it had been read. Before the experiment was started, an empty file system was created on disk and files were then created and persisted. Files were created concurrently to produce a fragmented placement on disk, resulting in an access pattern with varying request sizes.

To share disk bandwidth in the desired proportions, the number of bytes read was used as a metric at DRWR (see Section 3.2). With a concurrency level of 1,024, the experiment was designed to generate numerous pending requests at all times, and the disk firmware was configured to handle up to 256 concurrent requests to allow ample opportunities for firmware to perform optimizations. Fig. 7 shows how disk bandwidth was shared at DRWR during the experiment. Because disk capacity varied across runs, due to changes in file block placement, the figure shows relative bandwidth consumption for the three activities. The bandwidth demand is the same for all three activities, but as desired and seen, actual allotment depends on entitlement.

Fig. 8 breaks down CPU consumption across the involved resources. For this workload, only 0.99 percent of available CPU cycles (the equivalent of 0.08 cores) are consumed, which establishes that the disk is the bottleneck to improved performance.

Fig. 6. Breakdown of CPU consumption for the web server experiment.

Fig. 8. Breakdown of CPU consumption for the file system experiment.

Fig. 9. Overhead when requesting 4 MB and 32 KB files from Apache.

4.4 Monitoring and Scheduling Overhead

One approach to measuring the overhead associated with the pervasive monitoring and scheduling in Vortex is to compare the performance of the same applications running on Vortex and on another conventionally-structured OS kernel. Running on the same hardware, performance differences could be attributed to Vortex overhead. However, Vortex does not have significant overlap in code-base with another OS; device drivers for disk and network controllers were ported from FreeBSD, but Vortex was otherwise implemented from scratch. Implementation differences could be a factor in observed performance differences. For example, despite offering commodity abstractions, Vortex interfaces are not as feature-rich as their commodity counterparts. This difference would benefit Vortex in a comparison. However, Vortex performance could possibly benefit from further code scrutiny and optimizations, which would favor the commodity OS.

To obtain a more nuanced quantification of overhead, we chose instead to focus on scheduling costs associated with applications running on Vortex. Specifically, we quantified what fraction of CPU consumption could be attributed to anything but message processing. The rationale is that message processing represents work that is needed to realize an OS abstraction or functionality, regardless of the scheduling involved. The utility of the metric is further strengthened by experiments showing that applications perform similarly on Vortex and Linux. We report on Linux 3.2.0 and Vortex application performance where appropriate.

Aggregated message processing costs were measured by instrumentation code in OKRT, and they were made available as part of the performance data recorded during experiments. Overhead could then be determined by subtracting message processing cost from the total CPU consumption. Some cost is not intrinsic to Vortex, such as the cost of activating an address space or restoring the CPU register context of a thread. This was not classified as overhead.

The number and type of messages processed by Vortex on behalf of a process will vary; some processes may generate few messages, because they perform CPU-bound tasks, while others generate a variety of messages because of, for example, file and network interaction. Overhead is therefore workload-dependent; the fraction of CPU consumption attributable to monitoring and scheduling depends on process behavior.

4.5 Apache Overhead

We first consider overhead when running the Apache web server, configured in single-process mode with 17 worker threads. As in the THTTPD experiment, we used ApacheBench to generate repeated requests for a static file. Recall that overhead is the fraction of process CPU consumption that can be attributed to anything but message processing.

Apache uses the Linux sendfile system call to respond to requests for static files. The VM OS handles this call using the asynchronous I/O interfaces of Vortex. Therefore, the user-level CPU-time consumed by Apache to process a request is independent of the size of the requested file. However, if small files are requested, then it takes more requests to saturate available network bandwidth. Thus, overhead is sensitive to the size of requested files: it will be higher for larger files because of relatively more interaction with Vortex during request handling.

Figs. 9a and 9b show overhead for requesting 4 MB and 32 KB files, respectively, as a percentage of the total CPU consumption. Measured overhead ranges from 10-16 percent for 4 MB files and 3-6 percent for 32 KB files. As expected, the fraction of CPU-time used for anything but the sendfile system call is higher for 32 KB files than 4 MB files, resulting in lower overhead in the 32 KB experiment.

As an optimization, Vortex allows scheduling decisions to encompass a batch of messages rather than a single message. This optimization is likely to reduce overhead at the cost of more coarse-grained sharing of resources. We measured message-processing CPU cost to be 2-15 μs depending on resource type. Configuring a batching factor of 8 therefore increases resource sharing granularity to at most 120 μs. (Recall that resources are expected to handle concurrent processing of messages. Even if a resource is engaged for 120 μs on one core, messages may still be dispatched to the resource from other cores.)

By increasing the batching factor from 1 to 4, overhead was reduced from 10-16 to 8-12 percent for the 4 MB file experiment. Beyond a factor of 4, there were no discernible overhead reductions. This is explained by Apache's low CPU consumption, as shown in Fig. 10, causing messages to be removed from request queues rapidly and batches frequently to contain fewer than four messages.
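The batching optimization can be sketched as a variant of the dispatch loop shown earlier: one scheduling decision selects a per-activity queue, and up to the configured number of messages are then processed back-to-back, with their costs folded into a single consumption record. As before, this is our own illustration under assumed helper names, not the Vortex implementation.

/* Illustrative batched dispatch; helper names and types are assumptions. */
#include <stdint.h>

struct resource;
struct okrt_queue;

extern struct okrt_queue *sched_pick_queue(struct resource *r, int core);
extern struct okrt_msg *queue_pop(struct okrt_queue *q);
extern void sched_report(struct resource *r, struct okrt_rcr *rcr);
extern uint64_t rdtsc(void);

static void dispatch_batch(struct resource *r, int core, int batch_factor)
{
    struct okrt_queue *q = sched_pick_queue(r, core);   /* one scheduling decision */
    if (q == NULL)
        return;

    struct okrt_rcr rcr = { 0 };
    for (int i = 0; i < batch_factor; i++) {
        struct okrt_msg *m = queue_pop(q);
        if (m == NULL)                                   /* queue may drain early */
            break;
        uint64_t start = rdtsc();
        m->fn(m->args);
        rcr.activity    = m->activity;                   /* per-activity queue: one activity */
        rcr.cpu_cycles += rdtsc() - start;
    }
    if (rcr.cpu_cycles != 0)
        sched_report(r, &rcr);
}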

Fig. 10. CPU consumption when requesting 4 MB files from Apache.

Fig. 11. Apache overhead for 4 MB files with a batching factor of 8 and background CPU load.

With higher contention for CPU-time, request queues are likely to be serviced less frequently and accumulate pending messages. In such scenarios, higher batching factors (beyond 4) can be beneficial. We demonstrate this by adding a background CPU-bound process to the Apache experiment. Fig. 11 shows the resulting overhead for 4 MB file requests with a batching factor of 8. Here, high CPU contention results in message batch sizes approaching the configured maximum of 8, and the fraction of Apache's CPU consumption attributable as overhead is on the order of 3-4 percent. So batching can be effective in reducing overhead, while trading off sharing granularity. When CPU consumption is used as a metric, sharing usually will be fine-grained, and such a tradeoff is worthwhile.

In Figs. 10 and 11, core 6 is an outlier because it handles all NIC interrupts and services ingress and egress network packets. A single lock serializes access to the NIC transmit ring, so only one core can insert packets at any given time; distributing this work across cores would only increase lock contention. Spare CPU capacity on core 6 also tends to be funneled into interrupt processing. The NIC uses message-signaled interrupts, and interrupts can be delivered with low latency at a rate matching packet transmission. When the load on core 6 is lower, it results in more frequent servicing of interrupts, even if less frequent servicing is sufficient to reach the maximum network bandwidth.

With 4 MB files, Apache is able to exploit the 1 Gb network link both when running on Vortex and when running on Linux 3.2.0. The average CPU consumption on Vortex across cores is 21.18 percent, with a standard deviation of 19.5. Excluding core 6, the average CPU consumption is 13.83 percent with a standard deviation of 1.23.

4.6 MySQL Overhead

We next consider overhead when running MySQL 5.6.10 with the open DBT2 [33] implementation of the TPC-C benchmark [34]. TPC-C simulates an online transaction processing environment where terminal operators execute transactions against a database. We sized the load to 10 warehouses and 10 operators per warehouse.

Whereas Apache has a straightforward internal architecture, having each client request be served by a thread from a thread pool, MySQL employs multiple threads to perform diverse but concerted tasks when servicing a query from a client. This is evident in Fig. 12, which shows a breakdown of CPU consumption during execution of the benchmark. For each core, the figure shows total CPU consumption (top) and the percentage of CPU consumption that can be attributed as overhead (bottom).

Vortex was configured with a batching factor of 8 in this experiment, except for resources controlling disk and network device drivers (which used a factor of 1). A core experiences load spikes approaching 100 percent CPU consumption, but the average CPU consumption is 19.95 percent, with a standard deviation of 23.9. We measured the average batch size to be around 3. Despite not fully exploiting the batching potential, CPU consumption attributable as overhead never exceeds 1.75 percent and is on average 0.12 percent with a standard deviation of 0.13. In other words, approximately 0.6 percent of total CPU consumption constitutes overhead (0.12 percentage points out of the 19.95 percent average).

In this experiment, DBT2 reports Vortex performance to be 106 new-order transactions per minute. For comparison, running the same experiment on Linux yields a performance of 114 transactions per minute. Performance is thus comparable, especially considering that thread scheduling and MySQL system calls on Vortex entail crossing virtual machine boundaries. A system call has a round-trip cost of around 696 cycles on the machine used in the evaluation. The round-trip cost of a virtual machine crossing (from guest to host mode and back) is on the order of 6,840 cycles.

4.7 Hadoop Overhead

The Java Virtual Machine is a complex user-level process, which exercises a wide range of system calls and OS functionality. The experiment used JRE 1.7.0 with HotSpot JVM 23.21 from Oracle. The application was Hadoop 1.0.4, an open source MapReduce engine for distributed data processing. We used the MRBench benchmark that is distributed with Hadoop, configured with 2^20 input lines to ensure ample load. Because the experiment only involved a single machine, we configured Hadoop to run in non-distributed mode (standalone operation). In this mode, jobs are executed by a set of threads internally in a single Java process.

Fig. 13 shows CPU consumption (top) and CPU consumption attributable as overhead (bottom) for each core during execution of the benchmark, omitting three cores that had negligible CPU use.

Fig. 12. MySQL DBT2/TPC-C CPU consumption (top) and CPU consumption that can be attributed as overhead (bottom).

Fig. 13. Hadoop MRBench CPU consumption (top) and CPU consumption that can be attributed as overhead (bottom).

Job execution goes through different phases with varying resource demands. Java and Hadoop are first initialized, and the input data file is constructed. The actual map and reduce phases of the MapReduce job follow. File operations to read input data and spill output data also cause spikes in overhead. These events involve I/O and produce corresponding spikes in scheduling activity. From CPU consumption it is evident that Hadoop uses a small number of threads to execute the job and that these threads are CPU-bound when active. Overall CPU consumption is therefore low (11.6 percent with a standard deviation of 31.4). CPU consumption attributable as overhead is 0.013 percent with a standard deviation of 0.035. Approximately 0.1 percent of total CPU consumption constitutes overhead. Running the same experiment on Linux yields a similar total execution time as reported by MRBench (within 5 percent, in favor of Vortex).

5 RELATED WORK

In contrast to a micro-kernel, the omni-kernel schedules message processing—not process or thread execution—and schedulers are introduced on all communication paths among components. Also, all omni-kernel functionality resides in the same privileged address space, as opposed to a micro-kernel where the bulk of OS functionality resides in user-level processes, each with their own address space for the purpose of failure containment.

Some recent OSs have emphasized designs based on message passing and partitioning, but their focus has been improved multi-core scalability. Barrelfish [13] provides a loosely coupled system with separate OS instances running on each core or subset of cores—a model coined a multikernel system. Corey [14] has similar goals, but it is structured as an Exokernel [35] and focuses on enabling application-controlled sharing of OS data. Tessellation [15] proposes to bundle OS services into partitions that are virtualized and multiplexed onto the hardware at a coarse granularity. Factored operating system [16] space-partitions OS services, due to TLB and caching issues. Partitioning was also advocated by Software Performance Units [36], but for resource control reasons. The omni-kernel has some structural similarity to these systems, but it is finer-grained and introduces message scheduling as an architectural element for the purpose of controlled sharing of individual resources.

The omni-kernel is motivated by the stringent control requirements of a virtualized environment. Modern VMMs interpose and transform virtual I/O requests to support features such as transparent replication of writes, encryption, firewalls, and intrusion-detection systems. Reflecting the relative or absolute performance requirements of individual VMs in the handling of their requests is critical when mutually distrusting workloads might be co-located on the same machine. AutoControl [37] represents one approach. The system instruments VMs to determine their performance and feeds data into a controller that computes resource allocations for actuation by Xen's credit-based virtual CPU and proportional-share I/O scheduler [38]. While differentiating among requests submitted to the physical I/O device is crucial, and algorithmic innovations such as mClock [39] and DVT [40] can further strengthen such differentiation, scheduling vigilance is required on the entire VM-to-I/O-device path. For example, [41] instrumented the scheduling of deferred work in the RTLinux kernel to prefer processing that would benefit high-priority tasks. Like the omni-kernel, this work reflects that failure to control resource consumption in some levels of a system can subvert prioritization at other levels.

A VM may be unable to exploit its I/O budget due to infrequent CPU control [10], [11], [42], [43], benefit from particular scheduling because of its I/O pattern [44], [45], or unduly receive resources because of poor accounting [46]. Functionality-enriching virtual I/O devices may lead to significant amounts of work being performed in the VMM on behalf of VMs. In [47], an I/O intensive VM was reported to spend as much as 34 percent of its overall execution time in the VMM. Today, it is common to reserve several machine cores to support the operation of the VMM [9]. In an environment where workloads can even deliberately disrupt or interfere [48], accurate accounting and attribution of all resource consumption are vital to making sharing policies effective.

Hierarchical scheduling has been explored to support processes with different CPU-time requirements [27], [30], [49], [50]. Recently, work has investigated use of hierarchical scheduling to support real-time systems in virtualized environments. RT-Xen [51] extends Xen with a framework for real-time scheduling of virtual CPUs and empirically studies a set of fixed-priority servers within the VMM. A method for selecting optimum time slices and periods for each VM was presented in [52]. The suitability of micro-kernel based VMMs was investigated in [53], while exporting VM scheduling to a micro-kernel based VMM scheduler was explored in [54]. The KUSP (formerly KURT) project focuses on modular hierarchical scheduling and instrumentation of Linux for improving real-time support [55], [56]. Common across this work is constraint-based scheduling of CPU-time to threads, processes, groups of processes, or VMs. This approach is complementary to ours, since we are primarily concerned with the scheduling of work performed by the OS/VMM on behalf of processes or VMs.

Previous efforts have attempted to increase the level of monitoring and control in an OS. None were focused on support for competing and potentially adversarial virtualized environments—rather, they tried to better meet the needs of certain classes of applications. Hence, control did not achieve the pervasiveness found in the omni-kernel architecture and its Vortex implementation. For example, Eclipse [57] grafted quality of service support for multimedia applications into an existing OS by fitting schedulers immediately above device drivers, an approach also used in an extension to VINO [58]. Limiting scheduling to the device driver level fails to take into account other resources that might be needed to exploit resource reservations, leaving the system open to forms of gaming. For example, an application could use grey-box [59] techniques to impose control of limited resources (e.g., inode caches, disk block table caches) on I/O paths, thereby increasing resource costs for competitors. Scout [60] connected individual modules into a graph structure where the modules implemented a specialized service such as an HTTP server or a packet router. Scout recognized the need for performance isolation among paths, but was limited to scheduling of CPU-time.

Admission control and periodic reservations of CPU- REFERENCES timetosupportprocessesthathandleaudioandvideo [1] E. Thereska, H. Ballani, G. O’Shea, T. Karagiannis, A. Rowstron, were central in both Processor Capacity Reserves [61] and T. Talpey, R. Black, and T. Zhu, “IOFlow: A software-defined stor- Rialto [62]. Nemesis [63] focused on reducing contention age architecture,” in Proc. 24th Symp. Oper. Syst. Principles, 2013, when different streams are multiplexed onto a single pp. 182–196. [2] D. Shue, M. J. Freedman, and A. Shaikh, “Performance isolation lower-level channel. To achieve this, OS code was moved and fairness for multi-tenant cloud storage,” in Proc. 10th Symp. into user-level libraries, resulting in a system similar Oper. Syst. Des. Implementation, 2012, pp. 349–362. instructuretotheCacheKernel[64]andtheExokernel. [3] A. Iosup, N. Yigitbasi, and D. Epema, “On the performance vari- This system structure makes it difficult to schedule access ability of production cloud services,” in Proc. Symp. Cluster, Cloud Grid Comput., 2011, pp. 104–113. to I/O devices and to higher-level abstractions shared [4] J. Schad, J. Dittrich, and J.-A. Quiane-Ruiz, “Runtime measure- among different domains. The flexible notion of an activ- ments in the cloud: Observing, analyzing, and reducing variance,” ity in the omni-kernel is inspired by Resource Containers in Proc. Very Large Databases, 2010, pp. 460–471. [5] A. Li, X. Yang, S. Kandula, and M. Zhang, “CloudCmp: Compar- [65], which recognized activities as separate from system ing public cloud providers,” in Proc. 10th Conf. Internet Meas., structuring entities such as processes. 2010, pp. 1–14. [6] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, ONCLUSION R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the art of 6C virtualization,” in Proc. 19th Symp. Oper. Syst. Principles, 2003, When consolidating competing workloads on shared infra- pp. 164–177. [7] A. Kivity, Y. Kamay, D. Laor, U. Lublin, and A. Liguori, “KVM: structure—typically to reduce operational costs—interfer- The Linux virtual machine monitor,” in Proc. Linux Symp., 2007, ence due to resource sharing can cause unpredictable pp. 225–230. performance. This phenomenon is particularly evident in [8] J. A. Kappel, A. Velte, and T. Velte, Microsoft Virtualization with cloud systems, where requests from different customers Hyper-V. New York, NY, USA: McGraw-Hill, 2009. [9] C. Waldspurger and M. Rosenblum, “I/O virtualization,” Com- contend for the resources available to an Internet-facing ser- mun. ACM, vol. 55, no. 1, pp. 66–72, 2012. vice, requests from services contend for resources available [10] A. Nordal, A. Kvalnes, and D. Johansen, “Paravirtualizing TCP,” to common platform services, and VMS encapsulating the in Proc. 6th Int. Workshop Virtualization Technol. Distrib. Comput., 2012, pp. 3–10. workloads of different services must contend for the resour- [11] A. Nordal, A. Kvalnes, R. Pettersen, and D. Johansen, “Streaming ces of their hosting machines. The high-fidelity control over as a service,” in Proc. 7th Int. Workshop Virtualization resource allocation needed in a virtualized environment is a Technol. Distrib. Comput., 2013, pp. 33–40. [12] L. Sha, R. Rajkumar, and J. P. Lehoczky, “Priority inheritance pro- new OS challenge. tocols: An approach to real-time synchronization,” IEEE Trans. This paper presents the omni-kernel architecture and its Comput., vol. 39, no. 9, pp. 1175–1185, Sep. 1990. Vortex implementation. The goal of the architecture is to [13] A. 
Results from experiments corroborate that the omni-kernel is effective: all resource consumption is accurately measured and attributed to the correct activity, and schedulers are able to control resource allocation. Using a metric that concisely reflects the main difference between an omni-kernel and a conventionally structured OS—the fraction of CPU consumption that can be attributed to anything but message processing—we determined omni-kernel scheduling overhead to be below 6 percent of CPU consumption, or substantially less, for the Apache, MySQL, and Hadoop applications.
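Written out as a formula (in our own notation; the paper does not fix symbols for this metric), the reported overhead is

\[ \text{overhead} = \frac{T_{\text{CPU}} - T_{\text{msg}}}{T_{\text{CPU}}}, \]

where $T_{\text{CPU}}$ is the total CPU time consumed on behalf of an application and $T_{\text{msg}}$ is the portion spent processing messages at destination resources; the experiments place this fraction below 0.06 for Apache, MySQL, and Hadoop.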
ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their feedback. Additionally, Robert Pettersen, Audun Nordal, and Robin Pedersen have made valuable contributions to the Vortex implementation. This work was supported in part by the Research Council of Norway as a Center for Research-based Innovation (iAD), ONR grants N00014-01-1-0968 and N00014-09-1-0652, AFOSR grant F9550-11-1-0137, AFRL, NSF grants 0430161, 0964409, CNS-0828923, and CCF-0424422 (TRUST), DARPA grant FA8750-10-2-0238 (CRASH), and a gift from Microsoft Corporation.

REFERENCES

[1] E. Thereska, H. Ballani, G. O'Shea, T. Karagiannis, A. Rowstron, T. Talpey, R. Black, and T. Zhu, "IOFlow: A software-defined storage architecture," in Proc. 24th Symp. Oper. Syst. Principles, 2013, pp. 182–196.
[2] D. Shue, M. J. Freedman, and A. Shaikh, "Performance isolation and fairness for multi-tenant cloud storage," in Proc. 10th Symp. Oper. Syst. Des. Implementation, 2012, pp. 349–362.
[3] A. Iosup, N. Yigitbasi, and D. Epema, "On the performance variability of production cloud services," in Proc. Symp. Cluster, Cloud Grid Comput., 2011, pp. 104–113.
[4] J. Schad, J. Dittrich, and J.-A. Quiane-Ruiz, "Runtime measurements in the cloud: Observing, analyzing, and reducing variance," in Proc. Very Large Databases, 2010, pp. 460–471.
[5] A. Li, X. Yang, S. Kandula, and M. Zhang, "CloudCmp: Comparing public cloud providers," in Proc. 10th Conf. Internet Meas., 2010, pp. 1–14.
[6] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in Proc. 19th Symp. Oper. Syst. Principles, 2003, pp. 164–177.
[7] A. Kivity, Y. Kamay, D. Laor, U. Lublin, and A. Liguori, "KVM: The Linux virtual machine monitor," in Proc. Linux Symp., 2007, pp. 225–230.
[8] J. A. Kappel, A. Velte, and T. Velte, Microsoft Virtualization with Hyper-V. New York, NY, USA: McGraw-Hill, 2009.
[9] C. Waldspurger and M. Rosenblum, "I/O virtualization," Commun. ACM, vol. 55, no. 1, pp. 66–72, 2012.
[10] A. Nordal, A. Kvalnes, and D. Johansen, "Paravirtualizing TCP," in Proc. 6th Int. Workshop Virtualization Technol. Distrib. Comput., 2012, pp. 3–10.
[11] A. Nordal, A. Kvalnes, R. Pettersen, and D. Johansen, "Streaming as a service," in Proc. 7th Int. Workshop Virtualization Technol. Distrib. Comput., 2013, pp. 33–40.
[12] L. Sha, R. Rajkumar, and J. P. Lehoczky, "Priority inheritance protocols: An approach to real-time synchronization," IEEE Trans. Comput., vol. 39, no. 9, pp. 1175–1185, Sep. 1990.
[13] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schupbach, and A. Singhania, "The multikernel: A new OS architecture for scalable multicore systems," in Proc. 22nd Symp. Oper. Syst. Principles, 2009, pp. 29–44.
[14] S. B. Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang, "Corey: An operating system for many cores," in Proc. 8th Symp. Oper. Syst. Des. Implementation, 2008, pp. 43–57.
[15] R. Liu, K. Klues, S. Bird, S. Hofmeyr, K. Asanovic, and J. Kubiatowicz, "Tessellation: Space-time partitioning in a many-core client OS," in Proc. 1st Workshop Hot Topics Parallelism, 2009, p. 10.
[16] D. Wentzlaff and A. Agarwal, "Factored operating systems (FOS): The case for a scalable operating system for multicores," ACM SIGOPS Oper. Syst. Rev., vol. 43, no. 2, pp. 76–85, 2009.
[17] B. Gamsa, O. Krieger, J. Appavoo, and M. Stumm, "Tornado: Maximizing locality and concurrency in a shared memory multiprocessor operating system," in Proc. 3rd Symp. Oper. Syst. Des. Implementation, 1999, pp. 87–100.
[18] J. Appavoo, D. da Silva, O. Krieger, M. Auslander, M. Ostrowski, B. Rosenburg, A. Waterland, R. W. Wisniewski, J. Xenidis, M. Stumm, and L. Soares, "Experience distributing objects in an SMMP OS," ACM Trans. Comput. Syst., vol. 25, no. 3, p. 6, 2007.
[19] VMware. (2013) [Online]. Available: http://www.vmware.com/pdf/vmware_drs_wp.pdf
[20] C. A. Waldspurger, "Memory resource management in VMware ESX server," in Proc. 5th Symp. Oper. Syst. Des. Implementation, 2002, pp. 181–194.
[21] L. Cherkasova, D. Gupta, and A. Vahdat, "Comparison of the three CPU schedulers in Xen," SIGMETRICS Perform. Eval. Rev., vol. 35, no. 2, pp. 42–51, 2007.
[22] A. Kvalnes, D. Johansen, R. van Renesse, F. B. Schneider, and S. V. Valvag, "Omni-kernel: An operating system architecture for pervasive monitoring and scheduling," Dept. Comput. Sci., Univ. Tromsø, Tromsø, Norway, Tech. Rep. IFI-UiT 2013-75, 2013.
[23] A. Adya, J. Howell, M. Theimer, B. Bolosky, and J. Douceur, "Cooperative task management without manual stack management," in Proc. USENIX Annu. Tech. Conf., 2002, pp. 289–302.
[24] R. von Behren, J. Condit, F. Zhou, G. C. Necula, and E. Brewer, "Capriccio: Scalable threads for internet services," in Proc. 19th Symp. Oper. Syst. Principles, 2003, pp. 268–281.
[25] J. Bonwick, "The Slab allocator: An object-caching kernel memory allocator," in Proc. USENIX Annu. Tech. Conf., 1994, pp. 87–98.
[26] G. Lipari, J. Carpenter, and S. Baruah, "A framework for achieving inter-application isolation in multiprogrammed hard real-time environments," in Proc. 21st IEEE Real-Time Syst. Symp., 2000, pp. 217–226.
[27] J. Regehr and J. A. Stankovic, "HLS: A framework for composing soft real-time schedulers," in Proc. 22nd IEEE Real-Time Syst. Symp., 2001, pp. 3–14.
[28] X. Feng and A. K. Mok, "A model of hierarchical real-time virtual resources," in Proc. 23rd IEEE Real-Time Syst. Symp., 2002, pp. 26–39.
[29] J. Regehr, A. Reid, K. Webb, M. Parker, and J. Lepreau, "Evolving real-time systems using hierarchical scheduling and concurrency analysis," in Proc. 24th IEEE Real-Time Syst. Symp., 2003, pp. 25–40.
[30] G. Lipari and E. Bini, "A framework for hierarchical scheduling on multiprocessors: From application requirements to run-time allocation," in Proc. 31st IEEE Real-Time Syst. Symp., 2010, pp. 249–258.
[31] A. Demers, S. Keshav, and S. Shenker, "Analysis and simulations of a fair queuing algorithm," SIGCOMM Comput. Commun. Rev., vol. 19, no. 4, pp. 1–12, 1989.
[32] J. Bennet and H. Zhang, "WF2Q: Worst-case fair weighted queueing," in Proc. Int. Conf. Comput. Commun., 1996, pp. 120–128.
[33] Database Test Suite. (2013) [Online]. Available: http://osdldbt.sourceforge.net
[34] Transaction Processing Performance Council. (2013) [Online]. Available: http://www.tpc.org/tpcc
[35] M. F. Kaashoek, D. R. Engler, G. R. Ganger, H. Briceno, R. Hunt, D. Mazieres, T. Pinckney, R. Grimm, J. Janotti, and K. Mackenzie, "Application performance and flexibility on Exokernel systems," in Proc. 16th Symp. Oper. Syst. Principles, 1997, pp. 52–65.
[36] B. Verghese, A. Gupta, and M. Rosenblum, "Performance isolation: Sharing and isolation in shared-memory multiprocessors," in Proc. 8th Conf. Archit. Support Programm. Lang. Oper. Syst., 1998, pp. 181–192.
[37] P. Padala, K.-Y. Hou, K. G. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, and A. Merchant, "Automated control of multiple virtualized resources," in Proc. Eur. Conf. Comput. Syst., 2009, pp. 13–26.
[38] Xen Credit Scheduler. (2013) [Online]. Available: http://wiki.xensource.com/xenwiki/creditscheduler
[39] A. Gulati, A. Merchant, and P. J. Varman, "mClock: Handling throughput variability for hypervisor IO scheduling," in Proc. 9th Symp. Oper. Syst. Des. Implementation, 2010, pp. 1–7.
[40] M. Kesavan, A. Gabrilovska, and K. Schwan, "Differential virtual time (DVT): Rethinking service differentiation for virtual machines," in Proc. 1st Symp. Cloud Comput., 2010, pp. 27–38.
[41] Y. Zhang and R. West, "Process-aware interrupt scheduling and accounting," in Proc. 27th IEEE Int. Real-Time Syst. Symp., 2006, pp. 191–201.
[42] S. Govindan, A. R. Nath, A. Das, B. Urgaonkar, and A. Sivasubramaniam, "Xen and co: Communication-aware CPU scheduling for consolidated Xen-based hosting platforms," in Proc. 3rd Conf. Virtual Execution Environments, 2007, pp. 126–136.
[43] C. Xu, S. Gamage, P. N. Rao, A. Kangarlou, R. R. Kompella, and D. Xu, "vSlicer: Latency-aware virtual machine scheduling via differentiated-frequency CPU slicing," in Proc. Int. Symp. High-Perform. Parallel Distrib. Comput., 2012, pp. 3–14.
[44] C. Weng, Z. Wang, M. Li, and X. Lu, "The hybrid scheduling framework for virtual machine systems," in Proc. Virtual Execution Environments, 2009, pp. 111–120.
[45] H. Kang, Y. Chen, J. L. Wong, R. Sion, and J. Wu, "Enhancement of Xen's scheduler for MapReduce workloads," in Proc. High Perform. Comput., 2011, pp. 251–262.
[46] D. Gupta, L. Cherkasova, R. Gardner, and A. Vahdat, "Enforcing performance isolation across virtual machines in Xen," in Proc. Int. Middleware Conf., 2006, pp. 342–362.
[47] L. Cherkasova and R. Gardner, "Measuring CPU overhead for I/O processing in the Xen virtual machine monitor," in Proc. USENIX Annu. Tech. Conf., 2005, pp. 387–390.
[48] P. Colp, M. Nanavati, J. Zhu, W. Aiello, and G. Coker, "Breaking up is hard to do: Security and functionality in a commodity hypervisor," in Proc. 23rd Symp. Oper. Syst. Principles, 2011, pp. 189–202.
[49] P. Goyal, X. Guo, and H. M. Vin, "A hierarchical CPU scheduler for multimedia operating systems," in Proc. 2nd Symp. Oper. Syst. Des. Implementation, 1996, pp. 107–121.
[50] Z. Deng and J. Liu, "Scheduling real-time applications in an open environment," in Proc. 18th IEEE Real-Time Syst. Symp., 1997, pp. 308–319.
[51] S. Xi, J. Wilson, C. Lu, and C. Gill, "RT-Xen: Towards real-time hypervisor scheduling in Xen," in Proc. 9th ACM Int. Conf. Embedded Softw., 2011, pp. 39–48.
[52] A. Masrur, T. Pfeuffer, M. Geier, S. Drössler, and S. Chakraborty, "Designing VM schedulers for embedded real-time applications," in Proc. 7th Int. Conf. Hardware/Softw. Codesign Syst. Synthesis, 2011, pp. 29–38.
[53] F. Bruns, S. Traboulsi, D. Szczesny, M. E. Gonzalez, Y. Xu, and A. Bilgic, "An evaluation of microkernel-based virtualization for embedded real-time systems," in Proc. 22nd Euromicro Conf. Real-Time Syst., 2010, pp. 57–65.
[54] A. Lackorzynski, A. Warg, M. Völp, and H. Härtig, "Flattening hierarchical scheduling," in Proc. 10th ACM Int. Conf. Embedded Softw., 2012, pp. 93–102.
[55] S. B. House and D. Niehaus, "KURT-Linux support for synchronous fine-grain distributed computations," in Proc. 6th IEEE Real Time Technol. Appl. Symp., 2000, pp. 78–89.
[56] T. Aswathanarayana, D. Niehaus, V. Subramonian, and C. D. Gill, "Design and performance of configurable endsystem scheduling mechanisms," in Proc. 11th IEEE Real-Time Embedded Technol. Appl. Symp., 2005, pp. 32–43.
[57] J. Bruno, J. Brustoloni, E. Gabber, B. Ozden, and A. Silberschatz, "Retrofitting quality of service into a time-sharing operating system," in Proc. USENIX Annu. Tech. Conf., 1999, pp. 15–26.
[58] D. Sullivan and M. Seltzer, "Isolation with flexibility: A resource management framework for central servers," in Proc. USENIX Annu. Tech. Conf., 2000, pp. 337–350.
[59] A. C. Arpaci-Dusseau and R. H. Arpaci-Dusseau, "Information and control in Grey-box systems," in Proc. 18th Symp. Oper. Syst. Principles, 2001, pp. 43–56.
[60] D. Mosberger and L. L. Peterson, "Making paths explicit in the Scout operating system," in Proc. 2nd Symp. Oper. Syst. Des. Implementation, 1996, pp. 153–167.
[61] C. W. Mercer, S. Savage, and H. Tokuda, "Processor capacity reserves: Operating system support for multimedia applications," in Proc. IEEE Int. Conf. Multimedia Comput. Syst., 1994, pp. 90–99.
[62] R. P. Draves, G. Odinak, and S. M. Cutshall, "The Rialto virtual memory system," Adv. Technol. Division, Microsoft Res., Redmond, WA, USA, Tech. Rep. MSR-TR-97-04, 1997.
[63] S. M. Hand, "Self-paging in the Nemesis operating system," in Proc. 3rd Symp. Oper. Syst. Des. Implementation, 1999, pp. 73–86.
[64] D. R. Cheriton and K. J. Duda, "A caching model of operating system functionality," in Proc. 2nd Symp. Oper. Syst. Des. Implementation, 1994, pp. 179–193.
[65] G. Banga, P. Druschel, and J. C. Mogul, "Resource containers: A new facility for resource management in server systems," in Proc. 3rd Symp. Oper. Syst. Des. Implementation, 1999, pp. 45–58.

Age Kvalnes is an associate professor in the Department of Computer Science, University of Tromsø, the Arctic University of Norway. He has cooperated closely with industry for more than a decade and been involved in two startups. He has developed operating system designs and implementations for more than 20 years, and his current research interest is virtual machine monitor extensions to the operating system.

Dag Johansen is a professor in the Department of Computer Science, University of Tromsø, the Arctic University of Norway. His research interest is large-scale distributed computer systems, currently focusing on runtimes and analytics applications in the elite sport and wellness domains. He has cofounded three companies and is an elected member of the Norwegian Academy of Technological Sciences (2008). Currently, he is on sabbatical leave at Cornell University.

Fred B. Schneider is the Samuel B. Eckert Professor of Computer Science at Cornell University and chair of the department. He was named Professor-at-Large at the University of Tromsø (Norway) in 1996 and was awarded a Doctor of Science honoris causa by the University of Newcastle-upon-Tyne in 2003 for his work in computer dependability and security. He received the 2012 IEEE Emanuel R. Piore Award. He is a fellow of AAAS (1992), ACM (1995), and IEEE (2008), a member of NAE (2011), and a foreign member of its Norwegian equivalent NKTV (2012).

Robbert van Renesse received the PhD degree from the Vrije Universiteit, Amsterdam, in 1989. He is a principal research scientist in the Department of Computer Science, Cornell University. After working at AT&T Bell Labs in Murray Hill, he joined Cornell in 1991. He was an associate editor of IEEE Transactions on Parallel and Distributed Systems from 1997 to 1999, and he is currently an associate editor for ACM Computing Surveys. His research interests include the fault tolerance and scalability of distributed systems. He is a fellow of the ACM.

Steffen Viken Valvag is a postdoctoral fellow in the Department of Computer Science, University of Tromsø, the Arctic University of Norway. After developing search engines in industry, he returned to academia to complete a PhD degree on the topic of an optimized MapReduce engine for distributed data processing. His current research interests include architectures and programming models for distributed systems.

" For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.