Proceedings of the FREENIX Track: 2003 USENIX Annual Technical Conference

USENIX Association Proceedings of the FREENIX Track: 2003 USENIX Annual Technical Conference San Antonio, Texas, USA June 9-14, 2003 THE ADVANCED COMPUTING SYSTEMS ASSOCIATION © 2003 by The USENIX Association All Rights Reserved For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: [email protected] WWW: http://www.usenix.org Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. An Implementation of User-level Restartable Atomic Sequences on the NetBSD Operating System Gregory McGarry [email protected] Abstract system[1]. While the kernel component of the model controls the switching of execution contexts, the user- This paper outlines an implementation of restartable level component is responsible for scheduling threads atomic sequences on the NetBSD operating system onto the available execution contexts. The user-level as a mechanism for implementing atomic operations component contains a complete scheduler implementa- in a mutual-exclusion facility on uniprocessor sys- tion with shared scheduling data which must be pro- tems. Kernel-level and user-level interfaces are dis- tected by the mutual-exclusion facility. Consequently, cussed along with implementation details. Issues as- the mutual-exclusion facility is a critical component of sociated with protecting restartable atomic sequences the scheduler activations threading model. from violation are considered. The performance of The basic building block for any mutual-exclusion fa- restartable atomic sequences is demonstrated to out- cility, whether it be a blocking or busy-wait construct, is perform syscall-based and emulation-based atomic op- the fetch-modify-write operation[5]. The fetch-modify- erations. Performance comparisons with memory- write operation reads a boolean flag that indicates own- interlocked instructions demonstrate that restartable ership of shared data. The operation modifies the flag atomic sequences continue to provide performance ad- from false to true, thereby acquiring ownership. The vantages on modern hardware. Restartable atomic se- fetch-modify-write operation must execute atomically quences are now the preferred mechanism for atomic (without interruption) to ensure the flag state is main- operations in the NetBSD threads library on all unipro- tained consistent across all contending threads. cessor systems. Modern systems generally provide sophisticated pro- 1 Introduction cessor primitives within the hardware in the form of memory-interlocked instructions and bus support to en- The NetBSD Project is currently adopting a new threads sure that a given memory location can be read, modi- system based on scheduler activations[6]. Part of this fied and written atomically. The specific primitive varies project is the implementation of a POSIX-compliant between processors, however common instructions in- threads library that utilises the scheduler activations include test-and-set, fetch-and-store (swap), fetch-and- terface. The motivation for the threads project is to add, load-locked/store-conditional and compare-and- support the multi-threaded programming model which swap[5, 2]. Unfortunately, there are two problems asso- is becoming increasingly popular for application de- ciated with the use of memory-interlocked instructions velopment. Multi-threaded applications use multiple within a user-level threads library: threads to aid portability to multiprocessor systems, and as a way to manage server concurrency even when no • Memory-interlocked instructions incur overhead true system parallelism is available. To support multi- since the cycle time for an interlocked memory ac- threaded applications, the POSIX standard specifies a cess is several times greater than that for a non- mutual-exclusion facility to serialise access and guar- interlocked access. The overhead associated with antee consistency of shared data. Even on uniproces- memory-interlocked instructions is due to memory sor systems, mutual-exclusion facilities are necessary to and interconnection contention. protect shared data against an interleaved thread sched- • Not all processors support memory-interlocked in- ule. Interleaving can occur when a thread blocks on a structions and therefore cannot provide atomic op- resource or when a thread is preempted, causing another erations for a mutual-exclusion facility. Example thread to assume control of the processor. processors include the MIPS R3000, VR4100 and The scheduler activations threading model also places ARM2 processors. Interestingly, processor manu- additional demand on the mutual exclusion facility. The facturers are choosing to introduce new processors primary advantage of scheduler activations is that it without memory-interlocked instructions. These combines the simplicity of a kernel-based threading processors generally provide a subset of the com- system and the performance of a user-level threading plete processor specification and primarily target USENIX Association FREENIX Track: 2003 USENIX Annual Technical Conference 311 embedded or low-power applications. well on both uniprocessor and multiprocessor systems. Both of these problems are important to the NetBSD However, even for the case when no contention and Project due to the large number of hardware systems that no collisions occur, reservation-based algorithms require are supported. several memory accesses per atomic operation. Addi- On multiprocessor systems, the hardware is expected tionally, these algorithms have storage requirements that to provide the necessary atomic operations. Multi- increase linearly with the number of potential contend- processor atomic operations for synchronisation have ing threads. For a user-level threads library, it is not al- been extensively investigated by Mellor-Crummey and ways possible to know the maximum number of threads Scott[5]. However, the majority of systems supported likely to be used in an application. For these reasons, by the NetBSD operating system are uniprocessor sys- the software reservation mechanism was not considered tems. Therefore, it is desirable to find an alternate mech- further. anism for atomic operations which works efficiently on The mechanism of restartable atomic sequences has all uniprocessor systems. been proposed to address the problems outlined above The simplest mechanism for implementing an atomic for implementing atomic operations on uniprocessor operation is to request the kernel to perform the opera- systems[2]. The basic concept of restartable atomic se- tion of behalf of the user-level thread. A call to a spe- quences is that a user-level thread which is preempted cific system call can be used to enter the kernel. While within a restartable atomic sequence is resumed by the inside the kernel, the scheduler can be paused so that kernel at the beginning of the atomic sequence rather the thread is guaranteed not to be preempted while a than at the point of preemption. This guarantees that non-atomic fetch-modify-write operation is performed. the thread eventually executes the instruction sequence Alternatively, an invalid instruction can be executed by atomically. the thread to cause an exception which can be inter- This paper outlines the implementation of restartable cepted by the kernel to emulate an atomic fetch-modify- atomic sequences to provide the atomic operations re- write operation. An advantage of instruction emulation quired by the POSIX-compliant threads library on the is that the invalid instruction can be forward compati- NetBSD operating system. Section 2 discusses the con- ble with newer versions of the processor which do sup- cept of restartable atomic sequences. Section 3 presents port explicit atomic operations. However, both syscall- details of the implementation on the NetBSD operat- based and emulation-based solutions incur significant ing system. Section 4 compares the performance of overhead by entering the kernel. restartable atomic sequences with other mechanisms Atomic operations can also be implemented using for implementing atomic operations including memory- a software reservation mechanism. With the software interlocked instructions, instruction emulation and a reservation mechanism, a thread must register its intent syscall-based mechanism. Section 4 also examines the to perform an atomic operation and then wait until no cost associated with restartable atomic sequences. Sec- other thread has registered a similar intent before pro- tion 5 discusses the application of restartable atomic ceeding. The most widely recognised algorithm based sequences in the POSIX-compliant threads library and on software reservation is Lamport’s mutual-exclusion presents some benchmarks of the mutual-exclusion fa- algorithm[3]. cility in the threads library. Finally, Section 6 presents In Lamport’s algorithm, the atomic operation is pro- conclusions. tected by two variables; one to indicate ownership of the atomic operation and another to place reservations. Each 2 Restartable Atomic Sequences thread registers in a global array its intent to perform an Restartable atomic sequences is a mechanism to imple- atomic operation. The thread then places a reservation ment atomic operations on uniprocessor systems. Re- into the reservation variable before testing the ownership sponsibility for executing the instruction

Proceedings of the FREENIX Track: 2003 USENIX Annual Technical Conference

The Design and Verification of the Alphastation 600 5-Series Workstation by John H

Alpha and VAX Comparison Based on Industry-Standard Benchmark

Personal Decstation 5000 User's Guide

Computer Architectures an Overview

Decstation/Decsystem 5000 Model 200 Series Maintenance Guide

Software Product Description

Digital Technical Journal, Volume 3, Number 3: Availability in Vaxcluster

Decstation 5000 Model 100 Series Maintenance Guide

Flexible, Powerful, and Upgradeable. Decstation 5000/100 Series Workstations

Location SCSI Addresses and Partitions

Digital's Decstation Family Performance Summary

Alpha AXP PC Hardware