Fork/Wait and Multicore Frequency Scaling

Total pages: 16

File type: PDF, size: 1020 KB

Fork/Wait and Multicore Frequency Scaling: a Generational Clash

Damien Carver (Sorbonne Université, LIP6, Inria), Redha Gouicem (Sorbonne Université, LIP6, Inria), Jean-Pierre Lozi (Oracle Labs), Julien Sopena (Sorbonne Université, LIP6, Inria), Baptiste Lepers (University of Sydney), Willy Zwaenepoel (University of Sydney), Nicolas Palix (Université Grenoble Alpes), Julia Lawall (Sorbonne Université, LIP6, Inria), Gilles Muller (Sorbonne Université, LIP6, Inria)

To cite this version: Damien Carver, Redha Gouicem, Jean-Pierre Lozi, Julien Sopena, Baptiste Lepers, et al. Fork/Wait and Multicore Frequency Scaling: a Generational Clash. 10th Workshop on Programming Languages and Operating Systems (PLOS '19), Oct 2019, Huntsville, Canada. pp. 53-59, 10.1145/3365137.3365400. hal-02349987v2.

HAL Id: hal-02349987, https://hal.inria.fr/hal-02349987v2. Submitted on 31 Jan 2020. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. PLOS '19, October 27, 2019, Huntsville, ON, Canada. © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-7017-2/19/10...$15.00. https://doi.org/10.1145/3365137.3365400

Abstract

The complexity of computer architectures has risen since the early years of the Linux kernel: Simultaneous Multi-Threading (SMT), multicore processing, and frequency scaling with complex algorithms such as Intel® Turbo Boost have all become omnipresent. In order to keep up with hardware innovations, the Linux scheduler has been rewritten several times, and many hardware-related heuristics have been added. Despite this, we show in this paper that a fundamental problem was never identified: the POSIX process creation model, i.e., fork/wait, can behave inefficiently on current multicore architectures due to frequency scaling. We investigate this issue through a simple case study: the compilation of the Linux kernel source tree. To do this, we develop SchedLog, a low-overhead scheduler tracing tool, and SchedDisplay, a scriptable tool to graphically analyze SchedLog's traces efficiently. We implement two solutions to the problem at the scheduler level which improve the speed of compiling part of the Linux kernel by up to 26%, and the whole kernel by up to 10%.

1 Introduction

Over the past decade, computer architectures have grown increasingly complex. It is now common, even for affordable machines, to sport multiple CPUs with dozens of hardware threads. These hardware threads are sometimes linked together through hyperthreading and different levels of shared caches, their memory latency and bandwidth are non-uniform due to NUMA and a complex network of interconnect links, and they run at different—yet non-independent—frequencies.

In order to tackle increased hardware complexity, the Linux scheduler has had to evolve. It was rewritten twice: in 2003, the O(1) scheduler was introduced, and in 2007, the Completely Fair Scheduler (CFS) replaced it. Since then, a myriad of heuristics has been added. A large body of research has focused on improving scheduling on modern multicore architectures, often focusing on locality and NUMA issues [13, 19].

Despite these efforts, recent works have hinted that there may still be significant room for improvement. Major performance bugs have gone unnoticed for years in CFS [20], despite the widespread use of Linux. Furthermore, while ULE, the scheduler of FreeBSD, does not outperform CFS on average, it does outperform it significantly on many workloads [7], for reasons that are not always well understood.

With currently available tools, studying scheduler behavior at a fine grain is cumbersome, and there is a risk that the overhead of monitoring will interfere with the behavior that is intended to be observed. Indeed, it is rarely done, either by Linux kernel developers or by the research community. However, understanding scheduler behavior is necessary in order to fully exploit the performance of multicore architectures, as most classes of workloads trust the scheduler for task placement.

In this paper, we show that CFS suffers from a fundamental performance issue that directly stems from the POSIX model of creating processes, i.e., fork/wait, on multicore architectures with frequency scaling enabled. Consider a common case where a parent process forks a child and waits for the result, in a low concurrency scenario, for example in the case of a typical shell script. If there are idle cores, CFS will choose one of them for the child process. As the core has been idle, the child process will likely start to run at a low frequency. On the other hand, the core of the parent has seen recent activity. The hardware's frequency scaling algorithm will likely have increased its frequency, even if, due to a wait, the core is now idle.

We expose the performance issue through a case study of Kbuild, i.e., building all or part of the Linux kernel. We conduct the study using our own dedicated tracing tool for the Linux scheduler, SchedLog, that focuses on recording scheduling events with very low overhead. We then visualize these events with SchedDisplay, a highly configurable visualization tool we developed, to understand the scheduler's behavior. With this new understanding, we then implement two solutions for the performance issues that improve performance by up to 26% on Kbuild. Finally, we propose alternative solutions that aim to be more efficient, and to reduce the likelihood of worsening performance on other workloads.

The contributions of this paper are the following:

• The identification of a performance issue in CFS that directly stems from the fork/wait process creation model of POSIX.
• A case study of the performance of the Linux scheduler using a workload that is representative of running compilation tasks in a large C project: Kbuild.
• SchedLog, a low-overhead tracing tool that focuses on scheduling events.
• SchedDisplay, a highly configurable graphical tracing tool for SchedLog traces to ease the detection of poor scheduler behavior.

The rest of this paper is organized as follows. Section 2 shows, through a case study of Kbuild, that the fork/wait process creation model of POSIX can behave inefficiently on current multicore architectures. Section 3 describes our graphical tracing tool for the Linux scheduler. Section 4 discusses solutions to the performance problem and presents some that we implemented. Section 5 presents related work. Finally, Section 6 concludes.

2 Kbuild and the Performance of Fork/Wait

We analyze the performance of Kbuild on a 4-socket Xeon E7-8870 machine (80 cores/160 hardware threads) with 512 GiB of RAM, running Linux 4.19 in Debian Buster. The CPUs have a frequency range of 1.2-2.1 GHz, and can reach up to 3.0 GHz with Intel® Turbo Boost.

Figure 1. Execution trace: building the Linux kernel scheduler using 320 jobs with CFS.

Figure 1 shows a trace from our graphical tool with the frequency of the active cores at each scheduler tick (every 4 ms) when compiling the whole Linux kernel. An initial kernel compilation (i.e., subsequent to make clean) performs a number of preparatory activities, then compiles the kernel source files, and finally performs some cleanup. Around 2 seconds there is a short burst of activity, performing an initial build across the source tree, where all of the cores are used and the cores reach a high frequency (2.1-2.6 GHz; they do not reach the highest frequencies provided by Turbo Boost due to safeguards in Turbo Boost to prevent overheating). Likewise, from 5 to 34 seconds, all of the cores are again used and reach a high frequency, to compile the major part of the kernel source code. There appears to be little room for improvement in these regions. On the other hand, the regions at the beginning and end of the graph show a moderate number of cores used, almost all running at the lowest frequency, around 1.2 GHz. Studying the trace at a higher degree of magnification shows that the code executed here is nearly sequential, typically alternating between one and two active cores at a time. This raises the question of why so many cores are used for so little work, and why they are not able to reach a higher frequency when they are occupied. We are in the presence of a performance bug.

3 A Tracing Tool for CFS

In this section, we describe a very low overhead tool that we have developed for tracing scheduler behavior. In the next section, we describe how we have used this tool to understand the Kbuild performance bug and to evaluate our proposed solutions.

3.1 Collecting scheduling events

In order to study the Linux scheduler, we need to understand its behavior. To this end, we must collect the list of scheduling events that happen at runtime, such as thread creations, clock ticks and thread migrations.
Recommended publications
  • Freebsd-And-Git.Pdf
    FreeBSD and Git. Ed Maste, FreeBSD Vendor Summit 2018. Purpose: ● History and context: ensure we're starting from the same reference. ● Identify next steps for more effective use of / integration with Git / GitHub. ● Understand what needs to be resolved for any future decision on Git as the primary repository. Version control history: ● CVS: 1993-2012. ● Subversion: src/ May 31 2008 (r179447); doc/www May 19 2012 (r38821); ports July 14 2012 (r300894). ● Perforce: 2001-2018. ● Hg mirror. ● Git mirror: 2011-. (Diagrams: the Subversion repository replicated via svnsync; today's setup, where svn2git and git svn bridge the Subversion repository to Git and GitHub; downstream forks of the FreeBSD GitHub repository.) "Git is not a Version Control System": phk@ missive, reproduced at https://blog.feld.me/posts/2018/01/git-is-not-revision-control/. Subversion vs. Git: Myths and Facts: https://svnvsgit.com/. "Git has a number of advantages in the popularity race, none of which are really to do with the technology": https://chapmanworld.com/2018/08/25/why-im-now-using-both-git-and-subversion-for-one-project/. 10 things I hate about Git: https://stevebennett.me/2012/02/24/10-things-i-hate-about-git. Git popularity: "Nobody uses Subversion anymore"? False; a myth. Despite all the marketing buzz related to Git, such notable open source projects as FreeBSD and LLVM continue to use Subversion as the main version control system. About 47% of other open source projects use Subversion too (while only 38% are on Git) (2016, https://svnvsgit.com/). Git popularity (2018). Git UI/UX: yes, it's a mess.
  • Getting Started With... Berkeley Software for UNIX† on the VAX‡ (The Third Berkeley Software Distribution)
    Getting started with... Berkeley Software for UNIX† on the VAX‡ (The third Berkeley Software Distribution). A package of software for UNIX developed at the Computer Science Division of the University of California at Berkeley is installed on our system. This package includes a new version of the operating system kernel which supports a virtual memory, demand-paged environment. While the kernel change should be transparent to most programs, there are some things you should know if you plan to run large programs to optimize their performance in a virtual memory environment. There are also a number of other new programs which are now installed on our machine; the more important of these are described below. Documentation: The new software is described in two new volumes of documentation. The first is a new version of volume 1 of the UNIX programmers manual which has integrated manual pages for the distributed software and incorporates changes to the system made while introducing the virtual memory facility. The second volume of documentation is numbered volume 2c, of which this paper is a section. This volume contains papers about programs which are in the distribution package. Where are the programs? Most new programs from Berkeley reside in the directory /usr/ucb. A major exception is the C shell, csh, which lives in /bin. We will later describe how you can arrange for the programs in the distribution to be part of your normal working environment. Making use of the Virtual Memory: With a virtual memory system, it is no longer necessary for running programs to be fully resident in memory.
  • CS 350 Operating Systems Spring 2021
    CS 350 Operating Systems, Spring 2021. 4. Process II. Discussion 1: Question • Why do UNIX OSes use the combination of "fork() + exec()" to create a new process (from a program)? • Can we directly call exec() to create a new process? Or, do you have better ideas/designs for creating a new process? • Hints: To answer this question, think about the possible advantages of this solution. (Diagram: the parent process forks a child; the child then execs a new program image.) Discussion 1: Background • Motivated by the UNIX shell • $ echo "Hello world!" • $ Hello world! • The shell figures out where in the file system the executable "echo" resides, calls fork() to create a new child process, calls exec() to run the command, and then waits for the command to complete by calling wait(). • You can see that you can use fork(), exec() and wait() to implement a shell program – our project! • But what is the benefit of separating fork() and exec() in creating a new process? Discussion 1: Case study • The following program implements the command: • $ wc p3.c > p4.output # wc counts the number of words in p3.c; the result is redirected to p4.output. (The program closes the default standard output, which points to the terminal, then opens the file p4.output.) Discussion 1: Case study • $ wc p3.c # wc outputs the result to the default standard output, specified by STDOUT_FILENO • By default, STDOUT_FILENO points to the terminal output. • However, we close this default standard output → close(STDOUT_FILENO); • Then, when we open the file "p4.output" • A file descriptor will be assigned to this file • UNIX systems start looking for free file descriptors from the beginning of the file descriptor table (i.e., at zero).
  • The Dragonflybsd Operating System
    The DragonFlyBSD Operating System. Jeffrey M. Hsu, Member, FreeBSD and DragonFlyBSD. Abstract— The DragonFlyBSD operating system is a fork of the highly successful FreeBSD operating system. Its goals are to maintain the high quality and performance of the FreeBSD 4 branch, while exploiting new concepts to further improve performance and stability. In this paper, we discuss the motivation for a new BSD operating system, new concepts being explored in the BSD context, the software infrastructure put in place to explore these concepts, and their application to the network subsystem in particular. Index Terms— Message passing, Multiprocessing, Network operating systems, Protocols, System software. I. INTRODUCTION. The DragonFlyBSD operating system is a fork of the highly successful FreeBSD operating system. Its goals are to maintain the high quality and performance of the FreeBSD 4 branch, while exploring new concepts to further improve performance and stability. [...] directories with slightly over 8 million lines of code, 2 million of which are in the kernel. The project has a number of resources available to the public, including an on-line CVS repository with mirror sites, accessible through the web as well as the cvsup service, mailing list forums, and a bug submission system. III. MOTIVATION. A. Technical Goals. The DragonFlyBSD operating system has several long-range technical goals that it hopes to accomplish within the next few years. The first goal is to add lightweight threads to the BSD kernel. These threads are lightweight in the sense that, while user processes have an associated thread and a process context, kernel processes are pure threads with no process context. The threading model makes several guarantees with respect to scheduling to ensure high performance and simplify reasoning about concurrency.
  • Linux-Kernel
    linux-kernel (#linux-kernel). Table of Contents: About; Chapter 1: Getting started with linux-kernel (Remarks; Versions; Examples: Installation or Setup; Download, extract and enter the kernel directory; Build the dependencies, compile the kernel and modules); Chapter 2: Creation and usage of Kernel Threads (Introduction; Examples: Creation of kernel threads); Chapter 3: Event Tracing (Examples: Tracing I2C Events); Chapter 4: Fork System call (Examples: fork() system call); Chapter 5: How to find the right person for help (Introduction; Examples: Find the "likely" maintainers for the FTDI USB serial converter); Chapter 6: Linux Hello World Device driver (Examples: An empty kernel module; Building and running the module); Chapter 7: Linux: Named Pipes (FIFO) (Examples: What is Named Pipe (FIFO)); Credits. About: You can share this PDF with anyone you feel could benefit from it; download the latest version from: linux-kernel. It is an unofficial and free linux-kernel ebook created for educational purposes. All the content is extracted from Stack Overflow Documentation, which is written by many hardworking individuals at Stack Overflow. It is neither affiliated with Stack Overflow nor official linux-kernel. The content is released under Creative Commons BY-SA, and the list of contributors to each chapter is provided in the credits section at the end of this book. Images may be copyright of their respective owners unless otherwise specified. All trademarks and registered trademarks are the property of their respective company owners.
  • Lecture 4: September 13 4.1 Process State
    CMPSCI 377 Operating Systems, Fall 2012. Lecture 4: September 13. Lecturer: Prashant Shenoy. TAs: Sean Barker & Demetre Lavigne. 4.1 Process State. 4.1.1 Process. A process is a dynamic instance of a computer program that is being sequentially executed by a computer system that has the ability to run several computer programs concurrently. A computer program itself is just a passive collection of instructions, while a process is the actual execution of those instructions. Several processes may be associated with the same program; for example, opening up several windows of the same program typically means more than one process is being executed. The state of a process consists of: the code for the running program (text segment), its static data, its heap and the heap pointer (HP) where dynamic data is kept, the program counter (PC), the stack and the stack pointer (SP), the values of CPU registers, the set of OS resources in use (list of open files, etc.), and the current process execution state (new, ready, running, etc.). Some state may be stored in registers, such as the program counter. 4.1.2 Process Execution States. Processes go through various process states, which determine how the process is handled by the operating system kernel. The specific implementations of these states vary in different operating systems, and the names of these states are not standardised, but the general high-level functionality is the same. When a process is first started/created, it is in the new state. It needs to wait for the process scheduler (of the operating system) to set its status to "new" and load it into main memory from a secondary storage device (such as a hard disk or a CD-ROM).
  • List of BSD Operating Systems
    List of FreeBSD-based operating systems:
    1. ArchBSD - A lightweight operating system that aims to bring the flexibility and philosophy of Arch Linux to BSD-based operating systems.
    2. AskoziaPBX - Discontinued.
    3. BSDBox
    4. BSDeviant
    5. BSDLive
    6. Bzerk CD
    7. DragonFly BSD - Originally forked from FreeBSD 4.8, now developed in a different direction.
    8. ClosedBSD
    9. DesktopBSD - A discontinued desktop-oriented FreeBSD variant using K Desktop Environment 3.5.
    10. EclipseBSD
    11. Evoke - Formerly DamnSmallBSD; a small live FreeBSD environment geared toward developers and system administrators.
    12. FenestrOS BSD
    13. FreeBSDLive
    14. FreeBSD LiveCD
    15. FreeNAS
    16. FreeSBIE
    17. Frenzy Live CD - A "portable system administrator toolkit". It generally contains software for hardware tests, file system check, security check and network setup and analysis.
    18. Debian GNU/kFreeBSD
    19. Ging
    20. Gentoo/FreeBSD - Gentoo/*BSD subproject to port Gentoo features such as Portage to the FreeBSD operating system.
    21. GhostBSD - A Unix derivative, desktop-oriented operating system based on FreeBSD. It aims to be easy to install, ready to use and easy to use. Its goal is to combine the stability and security of FreeBSD with a pre-installed Gnome, Mate, Xfce, LXDE or Openbox graphical user interface.
    22. GuLIC-BSD
    23. HamFreeSBIE
    24. HeX
    25. IronPort security appliances AsyncOS
    26. JunOS - For Juniper routers.
    27. MaheshaBSD - A LiveCD or USB stick-based modular toolkit, including an anonymous surfing capability using Tor. The author also made NetBSD LiveUSB (MaheshaNetBSD) and DragonFlyBSD LiveUSB (MaheshaDragonFlyBSD). A LiveCD can be made from all these USB distributions by running the /makeiso script in the root directory.
  • System Calls & Signals
    CS345 Operating Systems: System calls & Signals. Panagiotis Papadopoulos. SYSTEM CALL: When a program invokes a system call, it is interrupted and the system switches to kernel space. The kernel then saves the process execution context (so that it can resume the program later) and determines what is being requested. The kernel carefully checks that the request is valid and that the process invoking the system call has enough privilege. For instance, some system calls can only be called by a user with superuser privilege (often referred to as root). If everything is good, the kernel processes the request in kernel mode and can access the device drivers in charge of controlling the hardware (e.g. reading a character inputted from the keyboard). The kernel can read and modify the data of the calling process, as it has access to memory in user space (e.g. it can copy the keyboard character into a buffer that the calling process has access to). When the kernel is done processing the request, it restores the process execution context that was saved when the system call was invoked, and control returns to the calling program, which continues executing. SYSTEM CALLS: FORK(). THE FORK() SYSTEM CALL (1/2) • A process calling fork() spawns a child process. • The child is almost an identical clone of the parent: • Program text (segment .text) • Stack (ss) • PCB (e.g. registers) • Data (segment .data) #include <sys/types.h> #include <unistd.h> pid_t fork(void); THE FORK() SYSTEM CALL (2/2) • fork() is one of those system calls which is called once, but returns twice! Consider a piece of program • After fork() both the parent and the child are ..
  • OS-9 Technical Manual 4 Contents
    OS-9® Technical Manual, Version 4.2. RadiSys Corporation, 118th Street, Des Moines, Iowa 50325. 515-223-8000, www.radisys.com. Revision A, April 2003. Copyright and publication information: This manual reflects version 4.2 of Microware OS-9. Reproduction of this document, in part or whole, by any means, electrical, mechanical, magnetic, optical, chemical, manual, or otherwise, is prohibited without written permission from RadiSys Corporation. Reproduction notice: The software described in this document is intended to be used on a single computer system. RadiSys Corporation expressly prohibits any reproduction of the software on tape, disk, or any other medium except for backup purposes. Distribution of this software, in part or whole, to any other party or on any other system may constitute copyright infringements and misappropriation of trade secrets and confidential processes which are the property of RadiSys Corporation and/or other parties. Unauthorized distribution of software may cause damages far in excess of the value of the copies involved. Disclaimer: The information contained herein is believed to be accurate as of the date of publication. However, RadiSys Corporation will not be liable for any damages, including indirect or consequential, from use of the OS-9 operating system, Microware-provided software, or reliance on the accuracy of this documentation. The information contained herein is subject to change without notice. April 2003. Copyright © 2003 by RadiSys Corporation. All rights reserved. EPC and RadiSys are registered trademarks of RadiSys Corporation. ASM, Brahma, DAI, DAQ, MultiPro, SAIB, Spirit, and ValuePro are trademarks of RadiSys Corporation.
  • Chapter 4 Introduction to UNIX Systems Programming
    Chapter 4: Introduction to UNIX Systems Programming. 4.1 Introduction. The last chapter covered how to use UNIX from a shell program using UNIX commands. These commands are programs written in C that interact with the UNIX environment using functions called system calls. This chapter covers these system calls and how to use them inside a program. 4.2 What is an Operating System. An operating system is a program that sits between the hardware and the application programs. Like any other program, it has a main() function, and it is built like any other program with a compiler and a linker. However, it is built with some special parameters so that the starting address is the boot address, where the CPU will jump to start the operating system when the system boots. An operating system typically offers the following functionality: ● Multitasking: the operating system will allow multiple programs to run simultaneously in the same computer. The operating system will schedule the programs on the multiple processors of the computer even when the number of running programs exceeds the number of processors or cores. ● Multiuser: the operating system will allow multiple users to use the same computer simultaneously. ● File system: it allows storing files on disk or other media. ● Networking: it gives access to the local network and the Internet. ● Window system: it provides a graphical user interface. ● Standard programs: it also includes programs such as file utilities, task manager, editors, compilers, web browser, etc. © 2014 Gustavo Rodriguez-Rivera and Justin Ennen, Introduction to Systems Programming: a Hands-on Approach (V2014-10-27) (systemsprogrammingbook.com)
  • Lecture 22 Systems Programming Process Control
    Lecture 22, Systems Programming: Process Control. A process is defined as an instance of a program that is currently running. A uniprocessor system can still execute multiple processes, giving the appearance of a multiprocessor machine. A call to a program spawns a process. If a mail program is called by n users, then n processes or instances are created and executed by the UNIX system. Many operating systems, including Windows and UNIX, execute many processes at the same time. When a program is called, a process is created and a process ID is issued. The process ID is given by the function getpid(), defined in <unistd.h>. The prototype for getpid() is given by: #include <unistd.h> pid_t getpid(void); In a uniprocessor machine, each process takes turns running, and each turn lasts a short time interval called a timeslice. The UNIX command ps can be used to list the status of all current processes: ps → PID TTY TIME CMD; 10150 pts/16 00:00:00 csh; 31462 pts/16 00:00:00 ps. The command ps lists the process ID (PID), the terminal name (TTY), the amount of time the process has used so far (TIME), and the command it is executing (CMD). The ps command only displays the current user's processes, but we can get all the processes with the flag -a, and in long format with the flag -l: ps -a, ps -l, ps -al. Information provided for each process may include the following. PID: the process ID in integer form. PPID: the parent process ID in integer form. STAT: the state of the process. TIME: CPU time used by the process (in seconds). TT: control terminal of the process. COMMAND: the user command that started the process. Copyright @ 2008 Ananda Gunawardena. Each process has a process ID that can be obtained using getpid(), a system call declared in unistd.h.
  • Notes on the History of Fork-And-Join Linus Nyman and Mikael Laakso Hanken School of Economics
    Notes on the history of fork-and-join. Linus Nyman and Mikael Laakso, Hanken School of Economics. As part of a PhD on code forking in open source software, Linus Nyman looked into the origins of how the practice came to be called forking [1]. This search led back to the early history of the fork system call. Having not previously seen such a history published, this anecdote looks back at the birth of the fork system call to share what was learned, as remembered by its pioneers. The fork call allows a process (or running program) to create new processes. The original is deemed the parent process and the newly created one its child. On multiprocessor systems, these processes can run concurrently in parallel [2]. Since its birth 50 years ago, the fork has remained a central element of modern computing, both with regards to software development principles and, by extension, to hardware design, which increasingly accommodates parallelism in process execution. The fork system call is imagined: The year was 1962. Melvin Conway, later to become known for "Conway's Law" [3], was troubled by what seemed an unnecessary inefficiency in computing. As Conway recalls [4]: I was in the US Air Force at the time – involved with computer procurement – and I noticed that no vendor was distinguishing between "processor" and "process." That is, when a proposal involved a multiprocessor design, each processor was committed to a single process. By late 1962, Conway had begun contemplating the idea of using a record to carry the status of a process.