The Case Against User-Level Networking

Total Page:16

File Type:pdf, Size:1020Kb

The Case Against User-Level Networking The Case Against User-Level Networking Kostas Magoutis Margo Seltzer Eran Gabber IBM T. J. Watson Research Center Harvard University Google, Inc. 19 Skyline Drive 33 Oxford Street 1440 Broadway, 21st Floor Hawthorne, NY 10532 Cambridge, MA 02138 New York, NY 10018 Abstract— Extensive research on system support for enabling mote direct data placement (RDDP) [16] enables an operating I/O-intensive applications to achieve performance close to the system (OS) structure that supports user-level file caching [25] limits imposed by the hardware suggests two main approaches: and user-level network buffering [3]. User-level caching of- Low overhead I/O protocols and the flexibility to customize I/O policies to the needs of applications. One way to achieve fers full control over I/O policies, an essential requirement both is by supporting user-level access to I/O devices, enabling for resource-intensive applications, such as databases. This user-level implementations of I/O protocols. User-level networking paper describes such an OS structure that blends ideas from is an example of this approach, specific to network interface traditional and novel/radical OS architectures: In accordance controllers (NICs). In this paper, we argue that the real key to with standard OS principles, resource abstractions, such as high-performance in I/O-intensive applications is user-level file caching and user-level network buffering, both of which can be files and network sockets, and global system policies, are achieved without user-level access to NICs. implemented in the kernel. However, similarly in spirit to Avoiding the need to support user-level networking carries more radical system designs, such as the Exokernel [20], full two important benefits for overall system design: First, a NIC control over application-specific I/O policies is possible with exporting a privileged kernel interface is simpler to design and caching implemented in user space. This caching, however, implement than one exporting a user-level interface. Second, the kernel is re-instated as a global system resource controller and is performed over the file or network socket abstractions arbitrator. We develop an analytical model of network storage rather than over the raw hardware. The key advantages of this applications and use it to show that their performance is not approach over Exokernel’s are improved security and safety affected by the use of a kernel-based API to NICs. due to a larger common kernel code-base, support for global resource policies, and improved portability. I. INTRODUCTION In summary, the main arguments presented in this paper The need to reduce networking overhead in system-area are: First, NICs exporting a user-level host interface pose networks in the early 1990’s motivated a flurry of research on implementation challenges not present in NICs exporting an user-level networking protocols. Projects such as SHRIMP [2], interface to a trusted host entity such as the kernel. Second, Hamlyn [4], and U-Net [36] proposed user-level access to a with the transport protocol offloaded to the NIC, the key to network interface controller (NIC) as an approach that offered lowering networking overhead is remote direct data placement. two primary benefits: First, it enabled host-based implemen- RDDP enables a new OS structure that supports user-level file tations of new, lightweight networking protocols with lower caching and user-level network buffering. The performance of overhead compared to kernel-based TCP/IP protocol stacks. a user-level API to file and network services should be similar Second, for applications requiring use of the TCP/IP protocol to the performace of a kernel-based API, since the difference is stack, there is a potential for application-specific customization only in user-kernel boundary crossing, which is not a dominant of user-level libraries. In recent years, the need for scalable cost. Third, kernel involvement is necessary in order to enforce and more manageable services, such as HTTP servers and global policies. Systems that advocate implementing all system network storage systems, leads to increased amounts of data policies in user-space libraries [20] do not offer a solution to transferred through the network. This trend links overall I/O the problem of implementing global policies without involving performance to the efficiency of the network subsystem and a privileged server. In addition, security and safety are other makes network interface controllers (NICs) a key system I/O benefits of our proposed design: For example, common kernel- device. The large-scale deployment of high-speed (1 Gb/s and based networking code may be more robust against attacks soon 10 Gb/s) Ethernet networks stimulated interest in the than a multitude of user-level networking stacks. A benefit of design of systems that offload TCP/IP to a new generation of user-level networking is better portability due to bypassing the NICs [27] and can transfer data directly between the network OS. However, a kernel interface that is simple to implement and application buffers. Many such TCP-offload NICs are should be rapidly incorporated into most mainstream operating currently being designed to export user-level interfaces to host systems. applications. The layout of this paper is as follows: In Section III we In this paper, we show that a user-level interface to NICs compare the benefits of user-level networking to those of a offering transport offload is not necessary. A kernel host kernel interface to a NIC offering transport offload and RDDP. interface to the NIC in combination with NIC support for re- In Section IV we describe an OS structure that takes advantage of RDDP to support user-level file caching and user-level artifact, rather than to the use of hardware address space pro- network buffering. In Section V we develop an analytical tection. Research on systems such as L4 [23] and Pebble [14] model of network storage applications to study the effect of the showed that the overhead of switching protection domains user-kernel protection boundary crossing on performance. Our could be as low as 150 machine cycles, allowing for efficient motivation to use an analytical model backed by experimental modular designs using hardware address spaces. In addition, measurements of key parameters, instead of a full experimental we believe that increased cache and TLB sizes in modern system comparison, is based on the fact that the latter depends CPUs will help to alleviate cache and TLB miss overheads. significantly on the quality of the implementations and focuses Another significant factor contributing to network overhead on a particular point of the design space. An analytical model is hardware interrupts, as shown in a recent study of network enables the examination of the entire design space, provides protocol performance over high speed networks [6]. Detecting significant qualitative results, and points to the key parameters I/O completions by polling is an alternative that is offered by affecting system performance. Earlier studies of operating many user-level networking protocols [9, 36]. This mechanism system structure, such as the classical work by Lauer and can reduce network overhead in network storage applications Needham [21], have also used analytical modeling to reason as shown by Zhou and her colleagues [37]. Polling I/O about system performance. Finally, in Section VI we present completions, however, does not necessarily depend on user- experimental results. level access to the NIC and is equally applicable within the kernel. II. RELATED WORK User-level NICs enable the construction of user-level op- Support for user-level networking was first offered in su- erating system libraries1 (libOSes) such as proposed by Ex- percomputers such as the Thinking Machines CM-5 [22], okernel [15, 20]. In the same spirit, the Direct Access File Meiko CS-2 [17], and IBM SP-2 [35], and later introduced System [25] (DAFS) is an example of a network attached in commodity networks by U-Net [36], Application Device storage system that proposes a user-level file system client. Channels (ADC) [13], Hamlyn [4], SHRIMP [2], DART [31] Just like in Exokernel, application-specific extensibility was and others. User-level networking was recently standardized one of the main motivations behind user-level networking. For with the Virtual Interface [9] (VI) and InfiniBand [18] architec- example, application-specific customization of the networking tures. In addition to the ability for user-level access to the NIC, stack in a user-level networking system was shown to be which is the defining feature of user-level NICs, most of these beneficial by enabling application-controlled flow control and projects also advocated a new host programming interface to feedback [36]. As the debate on extensibility technologies the NIC. This programming interface is based on a queue-pair eventually pointed out, however, the key is often in finding abstraction and requires pre-posting of receive buffers [3, 9]. It the right API to kernel services rather than supporting user- is important to note that this new programming interface is not level implementations of kernel subsystems. In many cases, a defining feature of user-level NICs and can be implemented parametrized kernel interfaces are sufficient from a perfor- without special NIC support [5]. In contrast, user-level access mance standpoint [12]. to the NIC necessarily requires special NIC support, increasing Another motivation for user-level networking was the ob- the complexity of NIC
Recommended publications
  • A Programmable Microkernel for Real-Time Systems∗
    A Programmable Microkernel for Real-Time Systems∗ Christoph M. Kirsch Marco A.A. Sanvido Thomas A. Henzinger University of Salzburg VMWare Inc. EPFL and UC Berkeley [email protected] tah@epfl.ch ABSTRACT Categories and Subject Descriptors We present a new software system architecture for the im- D.4.7 [Operating Systems]: Organization and Design— plementation of hard real-time applications. The core of the Real-time systems and embedded systems system is a microkernel whose reactivity (interrupt handling as in synchronous reactive programs) and proactivity (task General Terms scheduling as in traditional RTOSs) are fully programma- Languages ble. The microkernel, which we implemented on a Strong- ARM processor, consists of two interacting domain-specific Keywords virtual machines, a reactive E (Embedded) machine and a proactive S (Scheduling) machine. The microkernel code (or Real Time, Operating System, Virtual Machine microcode) that runs on the microkernel is partitioned into E and S code. E code manages the interaction of the system 1. INTRODUCTION with the physical environment: the execution of E code is In [9], we advocated the E (Embedded) machine as a triggered by environment interrupts, which signal external portable target for compiling hard real-time code, and in- events such as the arrival of a message or sensor value, and it troduced, in [11], the S (Scheduling) machine as a universal releases application tasks to the S machine. S code manages target for generating schedules according to arbitrary and the interaction of the system with the processor: the exe- possibly non-trivial strategies such as nonpreemptive and cution of S code is triggered by hardware interrupts, which multiprocessor scheduling.
    [Show full text]
  • The Design of the EMPS Multiprocessor Executive for Distributed Computing
    The design of the EMPS multiprocessor executive for distributed computing Citation for published version (APA): van Dijk, G. J. W. (1993). The design of the EMPS multiprocessor executive for distributed computing. Technische Universiteit Eindhoven. https://doi.org/10.6100/IR393185 DOI: 10.6100/IR393185 Document status and date: Published: 01/01/1993 Document Version: Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers) Please check the document version of this publication: • A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website. • The final author version and the galley proof are versions of the publication after peer review. • The final published version features the final layout of the paper including the volume, issue and page numbers. Link to publication General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal.
    [Show full text]
  • Chapter 1. Origins of Mac OS X
    1 Chapter 1. Origins of Mac OS X "Most ideas come from previous ideas." Alan Curtis Kay The Mac OS X operating system represents a rather successful coming together of paradigms, ideologies, and technologies that have often resisted each other in the past. A good example is the cordial relationship that exists between the command-line and graphical interfaces in Mac OS X. The system is a result of the trials and tribulations of Apple and NeXT, as well as their user and developer communities. Mac OS X exemplifies how a capable system can result from the direct or indirect efforts of corporations, academic and research communities, the Open Source and Free Software movements, and, of course, individuals. Apple has been around since 1976, and many accounts of its history have been told. If the story of Apple as a company is fascinating, so is the technical history of Apple's operating systems. In this chapter,[1] we will trace the history of Mac OS X, discussing several technologies whose confluence eventually led to the modern-day Apple operating system. [1] This book's accompanying web site (www.osxbook.com) provides a more detailed technical history of all of Apple's operating systems. 1 2 2 1 1.1. Apple's Quest for the[2] Operating System [2] Whereas the word "the" is used here to designate prominence and desirability, it is an interesting coincidence that "THE" was the name of a multiprogramming system described by Edsger W. Dijkstra in a 1968 paper. It was March 1988. The Macintosh had been around for four years.
    [Show full text]
  • Research Purpose Operating Systems – a Wide Survey
    GESJ: Computer Science and Telecommunications 2010|No.3(26) ISSN 1512-1232 RESEARCH PURPOSE OPERATING SYSTEMS – A WIDE SURVEY Pinaki Chakraborty School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi – 110067, India. E-mail: [email protected] Abstract Operating systems constitute a class of vital software. A plethora of operating systems, of different types and developed by different manufacturers over the years, are available now. This paper concentrates on research purpose operating systems because many of them have high technological significance and they have been vividly documented in the research literature. Thirty-four academic and research purpose operating systems have been briefly reviewed in this paper. It was observed that the microkernel based architecture is being used widely to design research purpose operating systems. It was also noticed that object oriented operating systems are emerging as a promising option. Hence, the paper concludes by suggesting a study of the scope of microkernel based object oriented operating systems. Keywords: Operating system, research purpose operating system, object oriented operating system, microkernel 1. Introduction An operating system is a software that manages all the resources of a computer, both hardware and software, and provides an environment in which a user can execute programs in a convenient and efficient manner [1]. However, the principles and concepts used in the operating systems were not standardized in a day. In fact, operating systems have been evolving through the years [2]. There were no operating systems in the early computers. In those systems, every program required full hardware specification to execute correctly and perform each trivial task, and its own drivers for peripheral devices like card readers and line printers.
    [Show full text]
  • Microkernel Construction Introduction
    Microkernel Construction Introduction Nils Asmussen 04/09/2020 1 / 32 Normal Organization Thursday, 4th DS, 2 SWS Slides: www.tudos.org ! Studies ! Lectures ! MKC Subscribe to our mailing list: www.tudos.org/mailman/listinfo/mkc2020 In winter term: Microkernel-based operating systems (MOS) Various labs 2 / 32 Organization due to COVID-19 Slides and video recordings of lectures will be published Questions can be asked on the mailing list Subscribe to the mailing list! Practical exercises are planed for the end of the semester Depending on how COVID-19 continues, exercises are in person or we use some video-conferencing tool 3 / 32 Goals 1 Provide deeper understanding of OS mechanisms 2 Look at the implementation details of microkernels 3 Make you become enthusiastic microkernel hackers 4 Propaganda for OS research done at TU Dresden and Barkhausen Institut 4 / 32 Outline Organization Monolithic vs. Microkernel Kernel design comparison Examples for microkernel-based systems Vision vs. Reality Challenges Overview About L4/NOVA 5 / 32 Monolithic Kernel System Design u s Application Application Application e r k Kernel e r File Network n e Systems Stacks l m Memory Process o Drivers Management Management d e Hardware 6 / 32 Monolithic Kernel OS (Propaganda) System components run in privileged mode No protection between system components Faulty driver can crash the whole system Malicious app could exploit bug in faulty driver More than 2=3 of today's OS code are drivers No need for good system design Direct access to data structures Undocumented
    [Show full text]
  • Linux on the OSF Mach3 Microkernel
    Linux on top of the OSF Mach3 Microkernel Linux on top of the OSF Mach3 Microkernel Linux on top of OSF MK The Mach3.0 Microkernel: Design • microkernel abstractions François Barbou des Places ([email protected]) • task, thread, port, message, memory object Nick Stephen ([email protected]) • resources and services represented as ports OSF Research Institute - Grenoble • interfaces implemented via RPC to an object port Task Message • Objectives: • widen the audience for the free OSF micro-kernel Ports • provide a complete and free platform to develop new Threads servers, applications or kernel extensions • provide Apple with a freely redistributable microkernel-based operating system for the Memory Apple/PowerMac platform Object Freely Redistributable Software Conference - 5 Feb 1996 - Cambridge 1/17 Freely Redistributable Software Conference - 5 Feb 1996 - Cambridge 2/17 Linux on top of the OSF Mach3 Microkernel Linux on top of the OSF Mach3 Microkernel The Mach3.0 microkernel: Benefits OSF MK Improvements • portability (modular design) • performance • symmetric multi-processing support • kernel-loaded servers (collocation) • thread migration • scalability (from desktop PCs to high-end multi-computers) • short-circuited RPC • extensibility • real-time • OS-neutrality • preemptable kernel • support for OS personalities running as user tasks • real-time RPC, priority inheritance • pageable, preemptable • distributed system (clusters and multi-computers) • more portable: isolated from hardware • DIPC: transparent Distributed IPC • simpler:
    [Show full text]
  • Scalability of Microkernel-Based Systems
    Scalability of Microkernel-Based Systems Zur Erlangung des akademischen Grades eines DOKTORS DER INGENIERWISSENSCHAFTEN von der Fakultat¨ fur¨ Informatik der Universitat¨ Fridericiana zu Karlsruhe (TH) genehmigte DISSERTATION von Volkmar Uhlig aus Dresden Tag der mundlichen¨ Prufung:¨ 30.05.2005 Hauptreferent: Prof. Dr. rer. nat. Gerhard Goos Universitat¨ Fridericiana zu Karlsruhe (TH) Korreferent: Prof. Dr. sc. tech. (ETH) Gernot Heiser University of New South Wales, Sydney, Australia Karlsruhe: 15.06.2005 i Abstract Microkernel-based systems divide the operating system functionality into individ- ual and isolated components. The system components are subject to application- class protection and isolation. This structuring method has a number of benefits, such as fault isolation between system components, safe extensibility, co-existence of different policies, and isolation between mutually distrusting components. How- ever, such strict isolation limits the information flow between subsystems including information that is essential for performance and scalability in multiprocessor sys- tems. Semantically richer kernel abstractions scale at the cost of generality and mini- mality–two desired properties of a microkernel. I propose an architecture that al- lows for dynamic adjustment of scalability-relevant parameters in a general, flex- ible, and safe manner. I introduce isolation boundaries for microkernel resources and the system processors. The boundaries are controlled at user-level. Operating system components and applications can transform their semantic information into three basic parameters relevant for scalability: the involved processors (depending on their relation and interconnect), degree of concurrency, and groups of resources. I developed a set of mechanisms that allow a kernel to: 1. efficiently track processors on a per-resource basis with support for very large number of processors, 2.
    [Show full text]
  • The Outlook for Scalable Parallel Processing
    @) I AUTHOR'S COPY Decision Resources, Inc. Bay Colony Corporate Center 1100 Winter Street Information Systems Industry Waltham, Massachusetts 02154 Telephone 617.487.3700 The Outlook for Scalable Telefax 617.487.5750 Parallel Processing Gordon Bell Consultant to Decision Resources, Inc. It is likely that this decade will usher in the beginning Business Implications of an era in which general-purpose1 scalable parallel - A * Scalable, massively parallel processing computers computers assume most of the applications currently promise to become the most costeffective ap run on mainframes, supercomputers, and specialized proach to computing within the next decade, and scalable computers. A scalable computer is a com- the means by which to solve particular, difficult, puter designed from a small number of basic compo- large-scale commercial and technical problems. nents, without a single bottleneck component, so that The commercial and technical markets are funda- the computer can be incrementally expanded over its mentally different. Massively parallel processors designed scaling range, delivering linear incremental may be more useful for commercial applications performance for a well-defined set of scalable applica- because of the parallelism implicit in accessing a tions. General-purpose scalable computers provide a database through multiple, independent transac- wide range of processing, memory size, and 1/0 re- tions. Ease of programming will be the principal sources. Scalability is the degree to which perform- factor that determines how rapidly this class of ance increments of a scalable computer are linear. computer architecture will penetrate the general- Ideally, an application should be usable at all com- purpose computing market. puter size scales and operate with constant efficiency.
    [Show full text]
  • R00456--FM Getting up to Speed
    GETTING UP TO SPEED THE FUTURE OF SUPERCOMPUTING Susan L. Graham, Marc Snir, and Cynthia A. Patterson, Editors Committee on the Future of Supercomputing Computer Science and Telecommunications Board Division on Engineering and Physical Sciences THE NATIONAL ACADEMIES PRESS Washington, D.C. www.nap.edu THE NATIONAL ACADEMIES PRESS 500 Fifth Street, N.W. Washington, DC 20001 NOTICE: The project that is the subject of this report was approved by the Gov- erning Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engi- neering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for ap- propriate balance. Support for this project was provided by the Department of Energy under Spon- sor Award No. DE-AT01-03NA00106. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the organizations that provided support for the project. International Standard Book Number 0-309-09502-6 (Book) International Standard Book Number 0-309-54679-6 (PDF) Library of Congress Catalog Card Number 2004118086 Cover designed by Jennifer Bishop. Cover images (clockwise from top right, front to back) 1. Exploding star. Scientific Discovery through Advanced Computing (SciDAC) Center for Supernova Research, U.S. Department of Energy, Office of Science. 2. Hurricane Frances, September 5, 2004, taken by GOES-12 satellite, 1 km visible imagery. U.S. National Oceanographic and Atmospheric Administration. 3. Large-eddy simulation of a Rayleigh-Taylor instability run on the Lawrence Livermore National Laboratory MCR Linux cluster in July 2003.
    [Show full text]
  • Reversals of Fortune
    CHAPTER ONE Reversals of Fortune t was a warm, breezy day in the early summer of 1994 as I drove south I on Middlefield Road in Mountain View, California, on my way to a new startup named Mosaic Communications, Inc. I had arranged to meet some old Apple Computer friends who had bailed out of Apple to join the new company only a few days earlier. My nethead friends were overly enamored, for my tastes, with the recently commercialized ARPANet, now becoming the Internet. In fact, Apple Computer hadn't paid much attention to their Internet dreams, so they left. Apple was never going to care about networks, so why not move to a networking company? Even I thought the "Internet business" was far too techie for consumers. But when my friends were offered stock options to play with the Internet, they bolted. I thought my friends crazy for jumping the Apple Computer ship to swim with an unknown startup. After all, Apple Computer was going strong, growing its worldwide share of the personal computer market to nearly 10 percent, making it second only to all-powerful IBM. The per- sonal computer market was up for grabs as nobody had more than 11-12 percent of the blooming global market. Things could only get bet- ter for Apple. Mosaic could be gone in 6 months. A classic rock station blasted from my radio as I waited for a red stoplight to turn. The emblem of a polished horse glistened from the car next to me. These Ferrari's are a dime a dozen in Silicon Valley, I thought.
    [Show full text]
  • REAL-TIME SYSTEMS: an INTRODUCTION Keep up with the Data Stream from the Landing Radar
    R REAL-TIME SYSTEMS: AN INTRODUCTION keep up with the data stream from the landing radar. AND THE STATE-OF-THE-ART However, it was discovered that the missed deadlines were nonfatal, and the scheduler automatically adjusted, INTRODUCTION to meet soft real-time behavior for the landing tasks. Our goal in this article is to give an overview of the broad Periodic and Aperiodic Tasks area of real-time systems. This task daunting because real- Real-time applications are also classified depending on the time systems are everywhere, and yet no generally tasks that comprise the application. In some systems, tasks accepted definition differentiates real-time systems from are executed repetitively within a specified period. A task t non–real-time systems. We will make an attempt at pro- i is characterized as (p ,e), where p its periodicity and e is viding a general overview of the different classes of real- i i i i its (worst-case) execution time. Monitoring patient’s vitals time systems, scheduling of tasks (or threads) in such is an example of such a system. Hard real-time systems are systems, design tools and environments for real-time sys- usually designed using periodic tasks, and static schedul- tems, real-time operating systems, and embedded systems. ing can be used for periodic tasks. We will conclude our discussions with research challenges Aperiodic tasks are tasks that are executed on demand that still remain. (or with unknown period). A task is executed in response to an event, and a task t is characterized as (a ,r,e,d) where Definitions i i i i i ai its the arrival time.
    [Show full text]
  • Microkernel Virtualization Under One Roof - Dare the Impossible
    Microkernel virtualization under one roof - dare the impossible - Alexander Böttcher <[email protected]> Outline 1. Introduction 2. Kernel interfaces 3. VM interface harmonization 4. VMMs harmonized 5. Conclusion Microkernel virtualization under one roof - dare the impossible -2 Outline 1. Introduction 2. Kernel interfaces 3. VM interface harmonization 4. VMMs harmonized 5. Conclusion Microkernel virtualization under one roof - dare the impossible -3 Motivation Off-the-shell virtualization solution ridden with complexity. Application of virtualization call for trustworthy solutions. Complexity defeats trust. Alternative approach → Microkernels with hardware assisted virtualization extensions Microkernel virtualization under one roof - dare the impossible -4 Component based virtualization architecture Guest OS Guest OS Guest OS non-root mode root mode VMM VMM VMM Resource management Apps Drivers 9,000 SLOC kernel NOVA Microhypervisor Microkernel virtualization under one roof - dare the impossible -5 Genode OS framework Microkernel virtualization under one roof - dare the impossible -6 General supported kernels on Genode Microkernel virtualization under one roof - dare the impossible -7 Kernels with hardware assisted virtualization Microkernel virtualization under one roof - dare the impossible -8 VMM inventory of Genode Hardware assisted virtualization/separation support Microkernel Host VMM Guest vCPU hw ARM, 32bit custom 1, 32bit hw/trustzone ARM, 32bit custom 1, 32bit hw with Muen Intel, 64bit VBox 4 1, 32bit Seoul N,
    [Show full text]