Extreme High Performance Computing or Why Microkernels Suck

Pages: 16 | File type: PDF | Size: 1020 KB

Extreme High Performance Computing or Why Microkernels Suck

Christoph Lameter, SGI, [email protected]
2007 Linux Symposium, Volume One

Abstract

One often wonders how well Linux scales. We frequently get suggestions that Linux cannot scale because it is a monolithic operating system kernel. However, microkernels have never scaled well, and Linux has been scaled up to support thousands of processors, terabytes of memory, and hundreds of petabytes of disk storage, which is the hardware limit these days. Some of the techniques used to make Linux scale were per-cpu areas, per-node structures, lock splitting, cache line optimizations, memory allocation control, scheduler optimizations, and various other approaches. These required significant detail work on the code but no change in the general architecture of Linux.

The presentation will give an overview of why Linux scales and show the hurdles microkernels would have to overcome in order to do the same. It will assume a basic understanding of how operating systems work and familiarity with what functions a kernel performs.

1 Introduction

Monolithic kernels are in wide use today. One wonders, though, how far a monolithic kernel architecture can be scaled, given the complexity that would have to be managed to keep the operating system working reliably. We ourselves were initially skeptical that we could go any further when we were first able to run our Linux kernel with 512 processors, because we encountered a series of scalability problems that were due to the way the operating system handled the unusually high numbers of processes and processors. However, it was possible to address the scalability bottlenecks with some work using a variety of synchronization methods provided by the operating system, and we were then surprised to encounter only minimal problems when we later doubled the number of processors to 1024. At that point the primary difficulties seemed to shift to other areas having more to do with the limitations of the hardware and firmware. We were then able to further double the processor count to two thousand and finally four thousand processors, and we were still encountering only minor problems that were easily addressed. We expect to be able to handle 16k processors in the near future.

As the number of processors grew, so did the amount of memory. In early 2007, machines are deployed with 8 terabytes of main memory. Such a system with a huge amount of memory and a large set of processors creates the high performance capabilities in a traditional Unix environment that allow for the running of traditional applications, avoiding major efforts to redesign the basic logic of the software. Competing technologies, such as compute clusters, cannot offer such an environment. Clusters consist of many nodes that run their own operating systems, whereas a scaled-up monolithic operating system has a single address space and a single operating system. The challenge in clustered environments is to redesign the applications so that processing can be done concurrently on nodes that only communicate via a network. A large monolithic operating system with lots of processors and memory is easier to handle, since processes can share memory, which makes synchronization via Unix shared memory possible and data exchange simple.

Large scale operating systems are typically based on Non-Uniform Memory Architecture (NUMA) technology. Some memory is nearer to a processor than other memory that may be more distant and more costly to access. Memory locality in such a system determines the overall performance of the applications. The operating system has a role in that context of providing heuristics in order to place the memory in such a way that memory latencies are reduced as much as possible. These have to be heuristics because the operating system cannot know how an application will access allocated memory in the future. The access patterns of the application should ideally determine the placement of data, but the operating system has no way of predicting application behavior. A new memory control API was therefore added to the operating system so that applications can set up memory allocation policies to guide the operating system in allocating memory for the application. The notion of memory allocation control is not standardized, and so most contemporary programming languages have no ability to manage data locality on their own.¹ Libraries need to be provided that allow the application to provide information to the operating system about desirable memory allocation strategies.

¹There are some encouraging developments in this area with Unified Parallel C supporting locality information in the language itself. See Tarek, [2].

One idea that we keep encountering in discussions of these large scale systems is that microkernels should allow us to handle the scalability issues in a better way, and that they may actually allow a better designed system that is easier to scale. It was suggested that a microkernel design is essential to manage the complexity of the operating system and ensure its reliable operation. We will evaluate that claim in the following sections.

2 Micro vs. Monolithic Kernel

Microkernels allow the use of the context control primitives of the processor to isolate the various components of the operating system. This allows a fine grained design of the operating system with natural APIs at the boundaries of the subsystems. However, separate address spaces require context switches at the boundaries, which may create a significant overhead for the processors. Thus many microkernels are compromises between speed and the initially envisioned fine grained structure (a hybrid approach). To some extent that problem can be overcome by developing a very small low level kernel that fits into the processor cache (for example L4) [11], but then we no longer have an easily programmable and maintainable operating system kernel. A monolithic kernel usually has a single address space, and all kernel components are able to access memory without restriction.

2.1 IPC vs. function call

Context switches have to be performed in order to isolate the components of a microkernel. Thus communication between different components must be controlled through an Inter Process Communication mechanism that incurs overhead similar to a system call in a monolithic kernel. Typically microkernels use message queues to communicate between different components. In order to communicate between two components of a microkernel, the following steps have to be performed:

1. The originating thread, in the context of the originating component, must format and place the request (or requests) in a message queue.

2. The originating thread must somehow notify the destination component that a message has arrived. Either interrupts (or some other form of signaling) are used, or the destination component must be polling its message queue.

3. It may be necessary for the originating thread to perform a context switch if there are not enough processors around to continually run all threads (which is common).

4. The destination component must now access the message queue, interpret the message, and then perform the requested action. Then we potentially have to redo the four steps in order to return the result of the request to the originating component.

A monolithic operating system typically uses function calls to transfer control between subsystems that run in the same operating system context:

1. Place arguments in processor registers (done by the compiler).

2. Call the subroutine.

3. The subroutine accesses registers to interpret the request (done by the compiler).

4. The subroutine returns the result in another register.

From the description above it is already evident that the monolithic operating system can rely on much lower level processor components than the microkernel and is well supported by existing languages used to code for operating system kernels. The microkernel has to manipulate message queues, which are higher level constructs and, unlike registers, cannot be directly modified and handled by the processor.²

In a large NUMA system an even more troubling issue arises: while a function call uses barely any memory on its own (apart from the stack), a microkernel must place the message into queues. That queue must have a memory address. The queue location needs to be carefully chosen in order to make the data accessible in a fast way to the other operating system component involved in the message transfer. The complexity of making the determination where to allocate the message queue will typically be higher than the message handling overhead itself, since such a determination will involve consulting system tables to figure out memory latencies. If the memory latencies are handled by another compo-

However, the isolation will introduce additional complexities. Operating systems usually service applications running on top of them. The operating system must track the state of the application. A failure of one key component typically includes also the loss of relevant state information about the application. Some microkernel components that track memory use and open files may be so essential to the application that the application must terminate if either of these components fails. If one wanted to make a system fail safe in a microkernel environment, then additional measures, such as checkpointing, may have to be taken in order to guarantee that the application can continue. However, the isolation of the operating system state into different modules will make it difficult to track the overall system state that needs to be preserved in order for checkpointing to work.
Recommended publications
  • Examining the Viability of MINIX 3 as a Consumer Operating System
    Examining the Viability of MINIX 3 as a Consumer Operating System. Joshua C. Loew, March 17, 2016. Abstract: The developers of the MINIX 3 operating system (OS) believe that a computer should work like a television set. You should be able to purchase one, turn it on, and have it work flawlessly for the next ten years [6]. MINIX 3 is a free and open-source microkernel-based operating system. It is still in development, but it is capable of running on x86 and ARM processor architectures. Such processors can be found in computers such as embedded systems, mobile phones, and laptop computers. As a light and simple operating system, MINIX 3 could take the place of the software that many people use every day. As of now, MINIX 3 is not particularly useful to a non-computer scientist. Most interactions with MINIX 3 are done through a command-line interface or an obsolete window manager. Moreover, its tools require some low-level experience with UNIX-like systems to use. This project will examine the viability of MINIX 3 from a performance standpoint to determine whether or not it is relevant to a non-computer scientist. Furthermore, this project attempts to measure how a microkernel-based operating system performs against a traditional monolithic kernel-based OS.
  • Microkernels in a Bit More Depth
    Microkernels In a Bit More Depth. COMP9242 2007/S2 Week 4, UNSW. Motivation: Early operating systems had very little structure. A strictly layered approach was promoted by Dijkstra: THE Operating System [Dij68]. Later OSes (more or less) followed that approach (e.g., Unix). Such systems are known as monolithic kernels. Issues of monolithic kernels. Advantages: the kernel has access to everything, so all optimisations are possible and all techniques/mechanisms/concepts are implementable; the kernel can be extended by adding more code, e.g. for new services or support for new hardware. Problems: a widening range of services and applications makes the OS bigger, more complex, slower, and more error prone; the same OS must be supported on different hardware; various OS environments should be supported; with distribution it is impossible to provide all services from the same (local) kernel. Approaches to tackling complexity: the classical software-engineering approach is modularity: (relatively) small, mostly self-contained components, well-defined interfaces between them, enforcement of interfaces, and containment of faults to few modules. This doesn't work with monolithic kernels. A software-engineering study of the Linux kernel [SJW+02] looked at the size and interdependencies of kernel "modules" ("common coupling": interdependency via global variables) and analysed development over time (linearised version number). Result 1:
  • Operating System Structure
    Operating System Structure. Joey Echeverria [email protected], modified by: Matthew Brewer [email protected]. Nov 15, 2006. Carnegie Mellon University: 15-410 Fall 2006. Overview: Motivations; Kernel Structures (Monolithic Kernels, Kernel Extensions, Open Systems, Microkernels, Exokernels, More Microkernels); Final Thoughts. Motivations: Operating systems have a hard job. Operating systems are hardware multiplexers, abstraction layers, protection boundaries, and complicated. Hardware multiplexer: each process sees a "computer" as if it were alone; this requires allocation and multiplexing of memory, disk, CPU, and IO in general (network, graphics, keyboard, etc.); if the OS is multiplexing it must also allocate (priorities, classes? HARD problems!). Abstraction layer: presents a "simple", "uniform" interface to hardware; applications see a well defined interface (system calls): block device (hard drive, flash card, network mount, USB drive), CD drive (SCSI, IDE), tty (teletype, serial terminal, virtual terminal), filesystem (ext2-4, reiserfs, UFS, FFS, NFS, AFS, JFFS2, CRAMFS), network stack (TCP/IP abstraction). Protection boundaries: protect processes from each other; protect crucial services (like the kernel) from processes; note: everyone trusts the kernel. Complicated: see Project 3 :) Full
  • Based On: 2004 Deitel & Associates, Inc
    Based on: 2004 Deitel & Associates, Inc., Operating Systems. Computer Science Department, prepared by Dr. Suleyman Al-Showarah. 1.9 2000 and Beyond: Middleware is computer software that provides services to software applications beyond those available from the operating system. Middleware links two separate applications, often over a network and between incompatible machines (particularly important for Web services), and simplifies communication across multiple architectures. Middleware: software that acts as a bridge between an operating system or database and applications, especially on a network. A Web service is a method of communication between two electronic devices over a network. Web services encompass a set of related standards, are ready-to-use pieces of software on the Internet, and enable any two applications to communicate and exchange data. 1.10 Application Bases: An application base is the combination of hardware and operating system used to develop applications. Developers and users are unwilling to abandon an established application base because of the increased financial cost and time spent relearning. What does application base mean? The application base is the directory which contains all the files related to a .NET application, including the executable file (.exe) that loads into the initial or default application domain. 1.11 Operating System Environments: Operating systems intended for high-end environments have special design requirements and hardware support needs: large main memory, special-purpose hardware, and large numbers of processes. Embedded systems are characterized by a small set of specialized resources and provide functionality to devices such as cell phones and PDAs (see next slide); efficient resource management is key to building a successful operating system. A personal digital assistant (PDA), also known as a handheld PC or personal data assistant, is a mobile device that functions as a personal information manager.
  • Distribution and Operating Systems
    Distributed Systems | Distribution and Operating Systems. Allan Clark, School of Informatics, University of Edinburgh. http://www.inf.ed.ac.uk/teaching/courses/ds Autumn Term 2012. Overview: This part of the course will be chiefly concerned with the components of a modern operating system which allow for distributed systems. We will examine the design of an operating system within the context that we expect it to be used as part of a network of communicating peers, even if only as a client. In particular we will look at providing concurrency of individual processes all running on the same machine. Concurrency is important because messages take time to send, and the machine can do useful work in between messages, which may arrive at any time. An important point is that in general we hope to provide transparency of concurrency, that is, each process believes that it has sole use of the machine. Recent client machines, such as smartphones, have to some extent shunned this idea. Operating Systems: An operating system is a single process which has direct access to the hardware of the machine upon which it is run. The operating system must therefore provide and manage access to: the processor, system memory, storage media, networks, and other devices (printers, scanners, coffee machines, etc.; http://fotis.home.cern.ch/fotis/Coffee.html). As a provider of access to physical resources we are interested in the operating system providing: Encapsulation: not only should the operating system provide access to physical resources but also hide their low-level details behind a useful abstraction that applications can use to get work done. Concurrent processing: applications may access these physical resources (including the processor) concurrently, and the process manager is responsible for achieving concurrency transparency. Protection: physical resources should only be accessed by processes with the correct permissions and then only in safe ways.
  • Linux? POSIX? GNU/Linux? What Are They? a Short History of POSIX (Unix-Like) Operating Systems
    Unix? GNU? Linux? POSIX? GNU/Linux? What are they? A short history of POSIX (Unix-like) operating systems (image from gnu.org). Mohammad Akhlaghi, Instituto de Astrofísica de Canarias (IAC), Tenerife, Spain (founder of GNU Astronomy Utilities). Most recent slides available in the link below (this PDF is built from Git commit d658621): http://akhlaghi.org/pdf/posix-family.pdf. Understanding the relation between the POSIX/Unix family can be confusing (image from shutterstock.com). In the beginning there was ... the big bang! Fast forward to the 20th century: early computer hardware came with its custom OS (shown here: PDP-7, announced in 1964). Fast forward to the 20th century (~1970s): AT&T had a monopoly on USA telecommunications, so it had a lot of money for exciting research: the laser, the CCD, the transistor, radio astronomy (Jansky at Bell Labs), the Cosmic Microwave Background (Penzias at Bell Labs), etc. One of them was the Unix operating system: designed to run on different hardware, with the C programming language designed for writing Unix. To keep the monopoly, AT&T wasn't allowed to profit from its other research products, so it gave out Unix for free (including source). Unix was designed to be modular (image from an AT&T promotional video in 1982: https://www.youtube.com/watch?v=tc4ROCJYbm0). The user interface was only on the command line (image from the late 80s, from stevenrosenberg.net). AT&T lost its monopoly in 1982. Bell Labs started to ask for a license from Unix users.
  • Openafs Client for Macos
    OpenAFS client for macOS. Marcio Barbosa, 2021 OpenAFS Workshop. Agenda: a high-level view of XNU; Kernel Extensions; securing a modular architecture; System Extensions; Apple Silicon; conclusion; references / contact. A high-level view of XNU: The Mac OS X kernel is called XNU, which stands for X is Not UNIX. Microkernel architecture? No, XNU is a hybrid kernel (FreeBSD + Mach). Monolithic kernels: the "classic" kernel architecture, predominant in the UNIX and Linux realms; all kernel functionality in one address space; if any service fails, the whole system crashes; hard to extend. Microkernels: consist of only the core kernel functionality; the rest of the functionality is exported to external servers; there exists complete isolation between the individual servers; communication between them is carried out by message passing; failure is contained, whereas monolithic kernel failures usually trigger a complete kernel panic; performance can be an issue. Hybrid kernels: attempt to synthesize the best of both worlds; the innermost core of the kernel is self-contained; all other services are outside this core, but in the same memory space. XNU is a hybrid: the kernel is modular and allows for pluggable Kernel Extensions, but the absence of isolation exposes the system to bugs introduced by KEXTs. (Diagram: monolithic, microkernels, and hybrid; Golftheman, public domain, via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:OS-structure2.svg.) Kernel Extensions: no kernel can completely accommodate all the hardware, peripheral devices, and services available. KEXTs are kernel modules which may be dynamically inserted or removed on demand, augmenting kernel functionality with entirely self-contained subsystems.
  • Operating System Structure
    Operating System Structure. Joey Echeverria [email protected], modified by: Matthew Brewer [email protected], rampaged through by: Dave Eckhardt [email protected]. December 5, 2007. Carnegie Mellon University: 15-410 Fall 2007. Synchronization: P4 due tonight; Homework 2 out today, due Friday night; book report due Friday night (late days are possible); Friday lecture: exam review; exam: room change in progress; discard any cached values. Outline: OS responsibility checklist; kernel structures (monolithic kernels, kernel extensions, open systems, microkernels, provable kernel extensions, exokernels, more microkernels); final thoughts. OS responsibility checklist: it's not so easy to be an OS: 1. protection boundaries, 2. abstraction layers, 3. hardware multiplexers. Protection boundaries: protection is "Job 1": protect processes from each other; protect crucial services (like the kernel) from processes. Notes: implied assumption: everyone trusts the kernel; kernels are complicated (see Project 3 :). Something to think about: a full OS is millions of lines of code, and very roughly, correctness ∝ 1/code size. Abstraction layer: present a "simple", "uniform" interface to hardware; applications see a well defined interface (system calls): block device (hard disk, flash card, network mount, USB drive), CD drive (SCSI, IDE), tty (teletype,
  • A Secure Computing Platform for Building Automation Using Microkernel-Based Operating Systems Xiaolong Wang University of South Florida, [email protected]
    University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School November 2018 A Secure Computing Platform for Building Automation Using Microkernel-based Operating Systems Xiaolong Wang University of South Florida, [email protected] Follow this and additional works at: https://scholarcommons.usf.edu/etd Part of the Computer Sciences Commons Scholar Commons Citation Wang, Xiaolong, "A Secure Computing Platform for Building Automation Using Microkernel-based Operating Systems" (2018). Graduate Theses and Dissertations. https://scholarcommons.usf.edu/etd/7589 This Dissertation is brought to you for free and open access by the Graduate School at Scholar Commons. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Scholar Commons. For more information, please contact [email protected]. A Secure Computing Platform for Building Automation Using Microkernel-based Operating Systems by Xiaolong Wang A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science and Engineering Department of Computer Science and Engineering College of Engineering University of South Florida Major Professor: Xinming Ou, Ph.D. Jarred Ligatti, Ph.D. Srinivas Katkoori, Ph.D. Nasir Ghani, Ph.D. Siva Raj Rajagopalan, Ph.D. Date of Approval: October 26, 2018 Keywords: System Security, Embedded System, Internet of Things, Cyber-Physical Systems Copyright © 2018, Xiaolong Wang DEDICATION In loving memory of my father, to my beloved, Blanka, and my family. Thank you for your love and support. ACKNOWLEDGMENTS First and foremost, I would like to thank my advisor, Dr. Xinming Ou for his guidance, encouragement, and unreserved support throughout my PhD journey.
  • Kernel Operating System
    International Journal of Advanced Technology in Engineering and Science www.ijates.com Volume No.02, Special Issue No. 01, September 2014 ISSN (online): 2348 – 7550 KERNEL OPERATING SYSTEM Manjeet Saini1, Abhishek Jain2, Ashish Chauhan3 Department Of Computer Science And Engineering, Dronacharya College Of Engineering Khentawas, Farrukh Nagar, Gurgaon, Haryana, (India) ABSTRACT The central module of an operating system (OS) is the Kernel. It is the part of the operating system that loads first, and it remains in main memory. It is necessary for the kernel to be very small while still providing all the essential services needed by other parts of the OS because it stays in the memory. To prevent kernel code from being overwritten by programs or other parts of the operating system it is loaded into a protected area of memory. The presence of an operating system kernel is not a necessity to run a computer. Directly loading and executing the programs on the "bare metal" machine is possible, provided that the program authors are willing to do without any OS support or hardware abstraction. Many video game consoles and embedded systems still constitute the “bare metal” approach. But in general, newer systems use kernels and operating systems. Keywords: Scalability, Multicore Processors, Message Passing I. INTRODUCTION In computing, the kernel is a computer program that manages input/output requests from software, and translates them into data processing instructions for the central processing unit and other electronic components of a computer. When a computer program (in this context called a process) makes requests of the kernel, the request is called a system call.
  • Operating Systems Structures
    L6: Operating Systems Structures. Sam Madden, [email protected], 6.033 Spring 2014. Overview: theme: strong isolation for operating systems; OS organizations: monolithic kernels, microkernels, virtual machines. OS abstractions: virtual memory, threads, file system, IPC (e.g., pipes), ... Monolithic kernel (e.g., Linux): the kernel is one large C program; it has internal structure (e.g., an object-oriented programming style), but no enforced modularity. The kernel program is growing: the 1975 Unix kernel was 10,500 lines of code; Linux 3.2 (2012) totals 9,930,000 lines: 300,000 lines of header files (data structures, APIs), 490,000 lines of networking, 530,000 lines of sound, 700,000 lines of support for 60+ file systems, 1,880,000 lines of support for 25+ CPU architectures, and 5,620,000 lines of drivers. The Linux kernel has bugs: 5,000 bug reports fixed in ~7 years, i.e., 2+ per day. How bad is a bug? Demo: insert a kernel module that every 10 seconds overwrites N locations in physical memory, N = 1, 2, 4, 8, 16, 32, 64, .... What N makes Linux crash? Observations: Linux lasts surprisingly long; maybe files were corrupted; every bug is an opportunity for an attacker. Can we enforce modularity within the kernel? Microkernel organization: apply client/server to the kernel (sh, ls, net, pager, FS, driver, ... on top of IPC, threads, page tables); user programs interact with the OS using RPC. Examples: QNX, L4, Minix, etc. Challenges: communication cost is high, much higher than a procedure call; isolating big components doesn't help (if the entire FS crashes, the system is unusable); sharing between subsystems is difficult (e.g., sharing the buffer cache between pager and FS) and requires careful redesign. Why is Linux not a pure microkernel? Many dependencies between components; redesign is challenging (trade-off: new design or new features?); some services are run as user programs: the X server, some USB drivers, SQL database, DNS server, SSH, etc.
  • Harmonizing Performance and Isolation in Microkernels with Efficient Intra-Kernel Isolation and Communication
    Harmonizing Performance and Isolation in Microkernels with Efficient Intra-kernel Isolation and Communication. Jinyu Gu, Xinyue Wu, Wentai Li, Nian Liu, Zeyu Mi, Yubin Xia, Haibo Chen. Monolithic kernel and microkernel: the microkernel's philosophy is moving most OS components into isolated user processes. Benefits and usages of microkernels: good extensibility, security, and fault isolation; success in safety-critical scenarios (airplane, car); also targeted at more general-purpose applications (Google Zircon). Expensive communication cost: the tradeoff between performance and isolation shows up as inter-process communication (IPC) overhead, e.g., an application reaching the file system server, which in turn talks to the disk driver, all through microkernel IPC. The IPC overhead is considerable: measurements of IPC cost versus real work in servers (SQLite, xv6FS, Ramdisk on Zircon and seL4, with and without KPTI, evaluated on a Dell PowerEdge R640 server with an Intel Xeon Gold 6138 CPU). Direct cost: privilege switch, process switch, ...; indirect cost: pollution of CPU internal structures. Goal: both ends: harmonize the tension between performance and isolation in microkernels by reducing the IPC overhead while maintaining the isolation guarantee. New hardware brings opportunities: PKU, Protection Keys for Userspace (aka MPK): assign each page one PKEY (i.e., a memory domain ID in [0:15]); a new register, PKRU, stores read/write permissions. Efficient intra-process isolation: ERIM [Security'19] and Hodor [ATC'19], based on Intel PKU, build isolated domains in the same process efficiently; a domain switch only takes 28 cycles (modifying PKRU). Intra-process isolation + microkernel: system servers (app, FS, MM, Net, Drv, ...) above the microkernel (IPC, Sched) with Intel PKU between them. Design choice #1: isolate different system servers in a single process.