Design and Implementation of a Multi-Purpose Cluster System

Design and Implementation of a Multi-purpose Cluster System Network Interface Unit

by Boon Seong Ang

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, February

© Massachusetts Institute of Technology. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, February
Certified by: Arvind, Johnson Professor of Computer Science, Thesis Supervisor
Certified by: Larry Rudolph, Principal Research Scientist, Thesis Supervisor
Accepted by: A. C. Smith, Chairman, Departmental Committee on Graduate Students

Abstract

Today, the interface between a high-speed network and a high-performance computation node is the least mature hardware technology in scalable general-purpose cluster computing. Currently, the one-interface-fits-all philosophy prevails. This approach performs poorly in some cases because of the complexity of modern memory hierarchy and the wide range of communication sizes and patterns. Today's message-passing NIUs are also unable to utilize the best data transfer and coordination mechanisms due to poor integration into the computation node's memory hierarchy. These shortcomings unnecessarily constrain the performance of cluster systems.

Our thesis is that a cluster system NIU should support multiple communication interfaces layered on a virtual message queue substrate in order to streamline data movement both within each node as well as between nodes. The NIU should be tightly integrated into the computation node's memory hierarchy via the cache-coherent snoopy system bus so as to gain access to a rich set of data movement operations. We further propose to achieve the goal of a large set of high-performance communication functions with a hybrid NIU micro-architecture that combines custom hardware building blocks with an off-the-shelf embedded processor.

These ideas are tested through the design and implementation of the StarT-Voyager NES, an NIU used to connect a cluster of commercial PowerPC-based SMPs. Our prototype demonstrates that it is feasible to implement a multi-interface NIU at reasonable hardware cost. This is achieved by reusing a set of basic hardware building blocks and adopting a layered architecture that separates protected network sharing from software-visible communication interfaces. Through different mechanisms, our [...] MHz NIU ([...] MHz processor core) can deliver very low latency for very short messages (under [...] µs), very high bandwidth for multi-kilobyte block transfers ([...] MBytes/s bidirectional bandwidth), and very low processor overhead for multicast communication (each additional destination after the first incurs [...] processor clocks).

We introduce the novel idea of supporting a large number of virtual message queues through a combination of hardware Resident message queues and firmware-emulated Non-resident message queues. By using the Resident queues as firmware-controlled caches, our implementation delivers hardware speed on the average while providing graceful degradation in a low-cost implementation.

Finally, we also demonstrate that an off-the-shelf embedded processor complements custom hardware in the NIU, with the former providing flexibility and the latter performance. We identify the interface between the embedded processor and custom hardware as a critical design component and propose a command and completion queue interface to improve the performance and reduce the complexity of embedded firmware.

Thesis Supervisor: Arvind
Title: Johnson Professor of Computer Science

Thesis Supervisor: Larry Rudolph
Title: Principal Research Scientist

Acknowledgments

This dissertation would not have been possible without the encouragement, support, patience, and cooperation of many people. Although no words can adequately express my gratitude, an acknowledgement is the least I can do.

First and foremost, I want to thank my wife, Wee Lee, and our families for standing by me all these years. They gave me the latitude to seek my calling, were patient as the years passed but I was no closer to enlightenment, and provided me a sanctuary to retreat to whenever my marathon-like graduate school career wore me thin. To you all, my eternal gratitude.

I am greatly indebted to my advisors, Arvind and Larry, for their faith in my abilities and for standing by me throughout my long graduate student career. They gave me the opportunity to co-lead a large systems project, an experience which greatly enriched my systems building skills. To Larry, I want to express my gratitude for all the fatherly/brotherly advice and the cheering sessions in the last leg of my graduate school apprenticeship. I would also like to thank the other members of my thesis committee, Frans and Anant, for helping to refine this work.

I want to thank Derek Chiou for our partnership through graduate school, working together on Monsoon, StarT, StarT-NG, and StarT-Voyager. I greatly enjoy bringing vague ideas to you and jointly developing them into well-thought-out solutions. This work on StarT-Voyager NES is as much yours as it is mine. Thank you, too, for the encouragement and counselling you gave me all these years.

The graduate students and staff in Computation Structures Group gave me a home away from home. Derek Chiou, Alex Caro, Andy Boughton, James Hoe, R. Paul Johnson, Andy Shaw, Shail Aditya Gupta, Xiaowei Shen, Mike Ehrlich, Dan Rosenband, and Jan Maessen, thank you for the company in this long pilgrimage through graduate school. It was a pleasure working with all of you bright, hardworking
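The abstract's idea of using a few hardware Resident queues as a firmware-controlled cache for a much larger set of virtual message queues can be illustrated with a small C sketch. This is not the StarT-Voyager implementation: the type queue_cache, the functions qc_init and qc_access, the queue and slot counts, and the least-recently-used replacement policy are all hypothetical and chosen only for illustration. An access to a virtual queue that is already bound to a Resident slot takes the fast path. Any other access is counted as a slow, firmware-handled access, after which the firmware rebinds a slot to that queue.

```c
#include <assert.h>

#define NUM_VIRTUAL_QUEUES 64   /* hypothetical count of virtual queues */
#define NUM_RESIDENT_SLOTS 4    /* hypothetical count of hardware queues */
#define NOT_RESIDENT (-1)

typedef struct {
    int slot_owner[NUM_RESIDENT_SLOTS];         /* virtual queue bound to each slot, or -1 */
    int slot_of[NUM_VIRTUAL_QUEUES];            /* slot bound to each virtual queue, or -1 */
    unsigned long last_use[NUM_RESIDENT_SLOTS]; /* logical time of last access per slot */
    unsigned long clock;                        /* logical time, bumped on every access */
    unsigned long fast_hits;                    /* accesses served by a Resident queue */
    unsigned long slow_misses;                  /* accesses that needed the firmware path */
} queue_cache;

/* Start with every slot free and every virtual queue Non-resident. */
void qc_init(queue_cache *qc)
{
    for (int i = 0; i < NUM_RESIDENT_SLOTS; i++) {
        qc->slot_owner[i] = NOT_RESIDENT;
        qc->last_use[i] = 0;
    }
    for (int i = 0; i < NUM_VIRTUAL_QUEUES; i++)
        qc->slot_of[i] = NOT_RESIDENT;
    qc->clock = 0;
    qc->fast_hits = 0;
    qc->slow_misses = 0;
}

/* Access virtual queue vq and return the Resident slot that backs it
 * afterwards. On a miss, the least recently used slot is rebound
 * (an assumed policy; free slots have last_use 0 and are taken first). */
int qc_access(queue_cache *qc, int vq)
{
    assert(vq >= 0 && vq < NUM_VIRTUAL_QUEUES);
    qc->clock++;

    int slot = qc->slot_of[vq];
    if (slot != NOT_RESIDENT) {          /* fast path: already Resident */
        qc->fast_hits++;
        qc->last_use[slot] = qc->clock;
        return slot;
    }

    qc->slow_misses++;                   /* slow path: firmware handles it */
    int victim = 0;
    for (int i = 1; i < NUM_RESIDENT_SLOTS; i++)
        if (qc->last_use[i] < qc->last_use[victim])
            victim = i;

    if (qc->slot_owner[victim] != NOT_RESIDENT)
        qc->slot_of[qc->slot_owner[victim]] = NOT_RESIDENT;  /* evict old owner */
    qc->slot_owner[victim] = vq;
    qc->slot_of[vq] = victim;
    qc->last_use[victim] = qc->clock;
    return victim;
}
```

When the set of actively used queues fits in the Resident slots, almost every access takes the fast path, which is one way to read the claim of hardware speed on the average. A larger working set still functions, only more slowly, which corresponds to the graceful degradation the abstract mentions.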