TM-Dietlibc: a TM-Aware Real-World System Library

TM-dietlibc: A TM-aware Real-world System Library Vesna Smiljković∗, Martin Nowack†, Nebojˇsa Miletić∗◦, Tim Harris‡⋆, Osman Unsal¨ ∗, Adrián Cristal∗, Mateo Valero∗ ∗Barcelona Supercomputing Center, Spain {vesna.smiljkovic, osman.unsal, adrian.cristal, mateo.valero}.bsc.es [email protected] †Technische Universitat¨ Dresden, Germany [email protected] ‡Oracle Labs, Cambridge, UK [email protected] Abstract—The simplicity of concurrent programming with that are easy to use by average programmers and that can Transactional Memory (TM) and its recent implementation assure proper multithreaded synchronization. Transactional in mainstream processors greatly motivates researchers and Memory (TM) [1], [2], either as a software (STM), a industry to investigate this field and propose new implementations and optimizations. However, there is still no standard C hardware (HTM) or a hybrid (HyTM) implementation, has system library which a wide range of TM developers can adopt. been proposed as a solution to this issue. After dozens of TM TM application developers have been forced to avoid library implementations, several TM applications [3], [4], [5], [6] calls inside of transactions or to execute them irrevocably (i.e. and research on its fundamentals, a system library (a layer in serial order). In this paper, we present the first TM-aware between an operating system and an application) compatible system library, a complex software implementation integrated with TM principles and suited for software (STM), hardware with transactional memory is missing. (HTM) and hybrid TM (HyTM). Current TM implementations handle library code either The library we propose is derived from a modified lock- by marking it ineligible for execution in a transaction (and based implementation and can be used with the existing reporting an error if the library is invoked), or by using standard C API. In our work, we describe design challenges irrevocable execution (so that only one transaction may and code optimizations that would be specific to any TM- based system library or application. We argue about system call into the library at any time). The first approach limits call execution within transactions, highlighting the possibility the ability for complex programs to be adapted to use of unexpected results from threads. For this reason we propose: transactions. The second approach limits the scalability of (1) a mechanism for detecting conflicts over kernel data in user transactional programs. In this paper, we examine how to space, and (2) a new barrier to allow hybrid TM to be used adapt a complex existing library so that it can be used within effectively with system libraries. Our evaluation includes different TM implementations and transactions without making all operations irrevocable. the focus is on memory management and file operations since System libraries abstract and simplify the access to ser- they are widely used in applications and require additional vices of the operating system and encapsulate a shared state: mechanisms for concurrent execution. We show the benefit we e.g. memory allocation lists and file structures. If a library gain with our libc modifications providing parallel execution as is used in a large program, then its internal state might be much as possible. The library we propose shows high scalability when linked with STM and HTM. For file operations it shows accessed within transactions in some threads, and concur- on average a 1.1, 2.6 and 3.7x performance speedup for 8 cores rently accessed un-intendedly outside transactions in other using HyTM, STM and HTM, respectively (over a lock-based threads using other protection schemas (e.g. locks). This single-threaded execution). For a red-black tree it shows on is especially the case for TM implementations with weak average 3.14x performance speedup for 8 cores using STM isolation guarantees like most of STM implementations. If (over a multi-read single-threaded execution). an access to shared data structures is intended to be protected Keywords-transactional memory; system library; system by locks, operations have to be implemented in a way that calls; I/O; memory allocation the interaction of locks and transactions cannot cause any unwanted or undefined behavior [7]. In contrast, Hardware I. INTRODUCTION TM implementations like AMD’s Advanced Synchronization The growing number of on-chip cores and hardware Facility (ASF) [8] or Intel’s Restricted TM (RTM) in the threads increases the need for parallel programming models upcoming processor Haswell [9] provide strong isolation guarantees assuring consistent views even for unprotected ◦ This work was conducted when the author was with Barcelona Super- access. computing Center. ⋆This work was conducted when the author was with Microsoft Research In this paper, we present a TM-aware implementation Cambridge. of a standard C system library, based on diet libc [10]. Diet libc is an open-source standard C library designed II. VARIOUS SYSTEM LIBRARY DESIGNS to have the smallest possible code footprint. We modify the original lock-based implementation to make transac- Developing a TM-based system library or changing an tions as synchronization primitives without introducing new existing lock-based one involves choosing applicable pro- functions, system calls, or instructions. Instead, we perform gramming models and designs. In this section, we describe various modifications inside the library which are invisible different choices related to (1) how to mix locks and trans- to regular users. Other proposals require some degree of actions, (2) how to adapt a library to a TM system, and (3) change, in the form of a particular API [11], [12] or how to execute library code. specialized transactional calls [13]. Our design choices for TM-dietlibc are based on two external dependencies (rather than on the original lock-based This paper makes the following contributions: libc): the TM implementation and the TM compiler. • We present the first TM-aware system library with A. Mixing locks and transactions the standard libc API. It is based on an originally Applications can invoke system library functions from lock-based library and the API remains unchanged in both transactional and non-transactional parts, and having order to allow software developers to maintain their lock-based code is still inevitable in some cases. For ex- programming habits and to ease the transactification of ample, the functions that are part of TM initialization at existing software. application startup have to remain thread-safe, but should • We explain general design choices related to (1) in- not contain transactions. Therefore, we must handle any teraction between lock-based and transactional library interaction of locks and transactional boundaries as well as code, (2) system library adaptation to a TM system, any interaction of lock-protected and TM-protected accesses and (3) system library execution. We explain which to the same shared data. design choices are suitable for system libraries, and we Completely shared data. This programming model re- detail our experience with diet libc and TM integration, quires strong atomicity provided by a TM implementation. transactional code optimizations, insufficient support of With strong atomicity, transactional accesses of a shared TM tools, etc. variable are totally synchronized with unprotected accesses • We introduce a TM conflict detection mechanism exten- by other threads. However, most of STM implementations sion, a new technique to solve the problem of detecting do not guarantee strong isolation. conflicts that cannot be detected automatically by TM because they modify kernel space. Furthermore, we Partially shared data. Dynamic separation [15], [16] is explain how to support transactions with system calls a programming model where a programmer indicates shared in HyTM and avoid running them sequentially. variables that could be accessed within or outside of a transaction, and TM provides the necessary synchronization between transactional and non-transactional accesses. We evaluate our TM-aware diet libc implementation using Completely separated shared data. Some TM imple- a set of micro-benchmarks for file operations and the red- mentations require transactional and non-transactional data black tree benchmark for memory management functions. to be separated. The programming model is called static sep- We compare our TM-aware implementation using differ- aration [17] and the system library has to contain duplicated ent TM implementations with the traditional approach of data structures and duplicated code wherever a memory running library calls irrevocably or with locks. We show access can be transactional and non-transactional. that TM-aware implementation is scalable with a significant TM-dietlibc design choice: In the entire system library speadup for running on 8 cores. In addition, we demonstrate we completely separate shared data protected by locks and the first comparison with emulated Intel’s RTM implementa- protected by TM. The reasons for not choosing mixing trans- tion. TM-dietlibc is an open-source system library, available actional and lock-protected data and code are the following: at [14]. (1) breaking the TM isolation rule: a transaction should not In the following sections, we present: different design be able to see the intermediate state of another transaction, choices for integrating a system library to TM usage (Sec- e.g. when it directly accesses memory of a lock held by the tion II);

TM-Dietlibc: a TM-Aware Real-World System Library

Glibc and System Calls Documentation Release 1.0

Preview Objective-C Tutorial (PDF Version)

Also Includes Slides and Contents From

FPGA BASED RECONFIGURABLE BODY AREA NETWORK USING Nios II and Uclinux

Embedded Linux Systems with the Yocto Project™

Embedded System Setting up Development Environment

Smashing the Stack Protector for Fun and Profit

Operating Systems and Applications for Embedded Systems >>> Toolchains

Rmox: a Raw-Metal Occam Experiment

A Review on Elliptic Curve Cryptography for Embedded Systems

The Glib/GTK+ Development Platform

Porting Musl to the M3 Microkernel TU Dresden