Hardware Transactional Memory Support for Lightweight Dynamic Language Evolution

Nicholas Riley, Craig Zilles
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801–2302

ABSTRACT

Lightweight dynamic language runtimes have become popular in part because they integrate simply with a wide range of native code libraries and embedding applications. However, further development of these runtimes in the areas of concurrency, efficiency and safety is impeded by the desire to maintain their native code interfaces, even at a source level. Native extension modules’ lack of thread safety is a significant barrier to dynamic languages’ effective deployment on current and future multicore and multiprocessor systems. We propose the use of hardware transactional memory (HTM) to aid runtimes in evolving more capable and robust execution models while maintaining native code compatibility. To explore these ideas, we constructed a full-system simulation infrastructure consisting of an HTM implementation, modified Linux kernel and Python interpreter.

Python includes thread constructs, but its primary implementation is not architected to support their parallel execution. With small changes, a runtime can be made HTM-aware to enable parallel execution of Python code and extension modules. We exploit the semantics of Python execution to evaluate individual bytecodes atomically by default, using nested transactions to emulate programmer-specified locking constructs where possible in existing threaded code. We eliminate common transactional conflicts and defer I/O within transactions to make parallel Python execution both possible and efficient. Transactions also provide safety for foreign function invocations. We characterize several small Python applications executing on our infrastructure.

1. INTRODUCTION

Mainstream runtimes for lightweight dynamic languages, including Perl, Python, Ruby and Tcl, have been successful in part because they easily interface with native code, both through extension modules and by embedding themselves into host applications. These runtimes’ native code interfaces, like the runtimes themselves, are simple, easy to understand, transparent and portable. By placing few restrictions on what an extension module or embedding application can do, they function as “glue” in integrating disparate code bases.

Unfortunately, widespread use of open-ended native code interfaces restricts a runtime’s ability to evolve concurrency, efficiency and safety, which in turn can impair the language’s applicability. A multithreaded host application embedding a thread-unsafe dynamic language runtime may not scale well on current and future multicore and multiprocessor systems. Alternative implementations of these languages, built upon heavyweight runtimes such as Java or .NET, already scale to multiple processors, can outperform the mainstream implementations, and support safer execution, but are infrequently used. Disadvantages of these implementations include increased memory overhead and startup time, and restricted portability, embeddability and extensibility.

We propose applying hardware transactional memory (HTM) mechanisms to address several issues impeding development of lightweight dynamic language runtimes. An HTM extends a machine’s processor and memory architecture to support user-controlled speculative execution, conflict detection, and related facilities. We use features of a proposed HTM to incrementally incorporate concurrent execution and improved safety in a Python runtime, without significantly complicating the runtime’s implementation or requiring that extension modules and embedding applications be rewritten.

The “official” and most popular Python runtime is CPython, a bytecode interpreter written in C; runtimes for Java and .NET also exist. The PyPy project [20] aims to automatically generate a range of next-generation Python runtimes, including interpreters and just-in-time compilers, from descriptions specified in a subset of the Python language [24]. We selected the most mature PyPy target: pypy-c, an interpreter compiled from generated C code. Its design is similar to CPython’s, and the techniques we present would apply to CPython with a little more work.

The PyPy and CPython runtimes implement primarily non-concurrent threading using OS-level threads. Both use a Global Interpreter Lock (GIL) to prevent two threads from concurrently interpreting Python bytecode. A thread yields control by releasing the GIL between bytecodes, or before a blocking I/O operation [7].

We take a first step toward transactional concurrency for Python by constructing a full-system prototype for hardware transactional execution. The prototype enables PyPy to run existing lock-synchronized, GIL-threaded Python code in parallel and to fall back to sequential execution where required. Specifically, this paper makes the following contributions:

First, we propose a method for safe lock-transaction coexistence, in which threads using locks and threads using transactions for concurrency control can enforce the same set of atomicity constraints. Common embedding environments, such as the Apache HTTP Server’s mod_python and graphical or other event-based applications, allow Python execution in event handlers. Embedding applications’ threading models usually differ from the Python runtime’s. As a result, applications must carefully manage the context in which Python code is executed and avoid deadlock when Python code accesses data structures in the embedding application. Our model permits nontransactional code, such as that in an unmodified embedding application, to execute in the same address space as, and in parallel with, transactional code.

Second, to support the transactional execution of extension module functions that perform I/O, we propose a mechanism of automatic transactions. These stop and start transactions around I/O operations within a single bytecode execution, matching the most common GIL usage pattern in Python extension modules. Extension modules vary widely in their thread safety, potentially introducing bugs and incompatibilities in languages such as Perl, whose runtimes now permit concurrent execution. With hardware-supported transactions, extension modules written without thread safety in mind will continue to work.

Python, like other dynamic languages, has evolved generic foreign function interfaces (FFIs). These let the dynamic language programmer invoke arbitrary native functions and manage native data structures without manually wrapping them in a native language, which is a tedious and repetitive process. Generic FFIs can lower the barrier to native code integration and give dynamic language programmers opportunities to compromise the runtime’s stability. However, they also increase the runtime’s ability to introspect native code execution. Extension modules written using generic FFIs benefit from increased transparency, exposing to the runtime marshalling and exception handling that would otherwise be hidden in native code. By enclosing individual native function calls in transactions, …

… memory. In Section 3, we discuss both the benefits of using transactions for Python execution and the runtime changes required. Specifically, Section 3.1 explains how transactions enable concurrency while maintaining the semantics of Python’s GIL-based threading model. Section 3.2 discusses changes to the PyPy runtime that avoid false conflicts. Section 3.3 describes an execution model for running existing lock-based parallel Python applications in any combination of transactional and nontransactional threads. Section 3.4 discusses the interactions of memory allocation and garbage collection with transactions. Sections 3.5 and 3.6 describe methods for processing and deferring non-undoable actions, such as I/O, within transactions. Finally, Section 3.7 discusses a simple form of protection that can guard against erroneous native code execution, and Section 4 presents a characterization of Python executing transactionally.

2. BACKGROUND

In this section, we introduce the range of lightweight dynamic language concurrency models and give a brief introduction to the capabilities of hardware transactional memory. We also describe the layers of our transactional memory infrastructure below the PyPy runtime.

2.1 Dynamic Languages and Concurrency

Lightweight dynamic languages such as Perl, Python and Ruby are defined by their original, and still most commonly used, implementations. None were initially designed to support concurrent execution. While the languages’ users have come to accept relatively low performance, they do expect the runtime’s speed to scale with the rest of their applications. These languages are growing in popularity just as mainstream systems increasingly derive their speed gains from adding processor cores. As a result, the languages’ continued viability depends on implementing practical models of concurrency.

Concurrent multithreaded execution of arbitrary dynamic language code requires explicit support from extensions and embedding applications to avoid deadlocks, data corruption and other forms of incorrect execution. In addition, language runtimes’ concurrency models can interact in unexpected ways with the concurrency models of embedding applications, or of frameworks exposed through extension modules. For example, Ruby, Perl and Python have by now adopted one or more threading
