Transactional Runtime Extensions for Dynamic Language Performance

Nicholas Riley, Craig Zilles
Department of Computer Science
University of Illinois at Urbana-Champaign

ABSTRACT

We propose exposing best-effort atomic execution, as provided by a simple hardware transactional memory (HTM), in a managed runtime's bytecode interface. Dynamic language implementations built on such a runtime can generate more efficient code, using speculation to eliminate the overhead and obstructions to optimization incurred by code needed to preserve rarely used language semantics. In this initial work, we demonstrate the applicability of this framework with two optimizations in Jython, a Python implementation for the Java virtual machine, that would yield speedups between 13% and 38% in the presence of HTM. Three additional optimizations, which we applied by hand to Jython-generated code, provide an additional 60% speedup for one benchmark.

1. INTRODUCTION

Dynamic languages such as Python and Ruby are increasingly popular choices for general-purpose application development. Programmers value the languages' unique features, flexibility, pragmatic design and continued evolution; they are eager to apply them to a widening range of tasks. However, performance can be limiting: most programs written in these languages are executed by interpreters. Attempts to build faster, more sophisticated implementations (e.g., [5, 11, 15]) have been frustrated by the need to preserve compatibility with the languages' flexible semantics.

We evaluate the effectiveness of speculatively executing Python programs, where possible, with a common case subset of these semantics which correctly handles most Python code. It is difficult to efficiently perform this speculation in software alone, but with hardware support for transactional execution, easier-to-optimize speculative code can execute at full speed. When behavior unsupported by the common case semantics is needed, our implementation initiates hardware-assisted rollback, recovery and nontransactional reexecution with full semantic fidelity. Overall, our approach facilitates improved dynamic language performance with minimal cost in implementation complexity when compared with software-only approaches.

Our framework consists of three components. First, an efficient hardware checkpoint/rollback mechanism provides a set of transactional memory (TM) primitives to a Java virtual machine (JVM). Second, the JVM exposes the TM's capabilities as an interface for explicit speculation. Finally, dynamic language runtimes written in Java employ these extensions at the implementation language level in order to speculatively execute code with common case semantics. With hardware and JVM support for speculation, the dynamic language implementation doesn't need to know a priori when the common case applies: it can simply try it at runtime.

We demonstrate this approach by modifying a JVM dynamic language runtime (Jython [2], a Python implementation) to generate two copies of code (e.g., Figure 1). First, a speculatively specialized version implements a commonly used subset of Python semantics in Java, which, by simplifying or eliminating unused data structures and control flow, exposes additional optimization opportunities to the JVM. Second, a nonspeculative version consists of the original implementation, which provides correct execution in all cases. Execution begins with the faster, speculative version. If speculative code encounters an unsupported condition, an explicit transactional abort transfers control to the nonspeculative version. As a result, the JVM does not need dynamic language-specific adaptations, and high-performance dynamic language implementations can be written entirely in languages such as Java and C#, maintaining the properties of safety, security and ease of debugging which distinguish them from traditional interpreters written in C.

Even without execution feedback to the dynamic language implementation, transactional speculation leads to significant performance gains in single-threaded code. In three Python benchmarks run on Jython, a few simple speculation techniques improved performance between 13 and 38%.

The remainder of this paper is organized as follows. Section 2 introduces the challenges of dynamic language runtime performance. Section 3.3 describes the components of our framework and the optimizations we implemented in Jython, Section 4 presents our experimental method and Section 5 discusses the performance results.
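The two-version scheme described in the introduction can be illustrated in software. The following is a minimal Python sketch, not the Jython implementation: a deep copy of a dictionary stands in for the hardware checkpoint, and an exception stands in for the explicit transactional abort. The names `TransactionAbort`, `speculative`, `nonspeculative` and `run` are hypothetical, and the guard covers only one of the abort conditions named in Figure 1 (a modified function).

```python
import copy


class TransactionAbort(Exception):
    """Hypothetical stand-in for an explicit transactional abort."""


def g(x):
    return x + 1


def nonspeculative(frame):
    """Original implementation: looks up g on every call, correct in all cases."""
    func = frame["g"]
    frame["y"] = func(frame["x"])
    frame["z"] = 2 * frame["y"]


def speculative(frame):
    """Specialized for the common case in which g has not been rebound."""
    frame["y"] = frame["x"] + 1      # body of g inlined
    if frame["g"] is not g:          # guard: function modified?
        raise TransactionAbort()     # the write to y above must be undone
    frame["z"] = 2 * frame["y"]


def run(frame):
    """Start with the speculative version; on abort, roll back and re-execute."""
    checkpoint = copy.deepcopy(frame)    # software stand-in for a hardware checkpoint
    try:
        speculative(frame)
        return "committed"
    except TransactionAbort:
        frame.clear()
        frame.update(checkpoint)         # rollback discards speculative writes
        nonspeculative(frame)
        return "recovered"


common = {"g": g, "x": 3}
print(run(common), common["y"], common["z"])     # committed 4 8

rebound = {"g": lambda x: x * 10, "x": 3}
print(run(rebound), rebound["y"], rebound["z"])  # recovered 30 60
```

In this sketch the checkpoint is copied eagerly on every attempt, including attempts that commit. That per-attempt cost is one plausible reason software-only speculation is hard to make efficient, and it is the cost a hardware checkpoint/rollback mechanism is intended to remove.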
Figure 1: Potential transactional memory-aided speculative specialization of Python code. (Panels: Python source, nonspeculative implementation, speculative implementation.)

2. BACKGROUND AND RELATED WORK

Python, Ruby, Perl and similar dynamic languages differ substantially from historically interpreted counterparts such as Java and Smalltalk in that large portions of their basic data structures and libraries are written in the runtime's implementation language, typically C, rather than in the dynamic languages themselves. This choice substantially improves the performance and practicality of these languages even as they remain interpreted.

However, to get the best performance out of interpreted dynamic language implementations, programmers must ensure that as much work as possible takes place in native code. As a result, interpreter implementation details can inspire less-than-optimal programming practices motivated by performance. For example, in the CPython interpreter, function calls are relatively expensive, so performance-critical inner loops are typically written to minimize them. A more sophisticated runtime could perform inlining and specialization. Or, to sort a list by an arbitrary key, Perl and Python have adopted an idiom known as "decorate-sort-undecorate" in which each list item is transformed to and from a sublist of the form (sort key, item), so the list can be sorted entirely from compiled code instead of repeatedly invoking an interpreted comparator.

To reduce the need for such workarounds and improve execution performance in general, an obvious next step in dynamic language evolution is uniting the language and its implementation on a single platform which can aggressively optimize both. New dynamic language platforms do this in one of two ways. Some write the language implementation and/or its libraries in a common language, and build a dynamic language-specific runtime to execute it: for example, Rubinius [12] executes Ruby on a Smalltalk-style VM; PyPy [15] implements a translation infrastructure that can transform a Python implementation written in a subset of Python into an interpreter or naïve JIT, which is then compiled for one of several platforms. While these systems are written targeting dynamic [...]

[...] JVM. On these platforms, dynamic language code is translated into CLR/JVM bytecode, which references implementation components written in a platform-native language (C#/Java). Dynamic language implementations must generate code and maintain data structures to express the languages' rich semantics in terms of more limited CLR/JVM functionality, even for such basic operations as function invocation. This semantic adaptation layer acts as a barrier to optimization by the CLR/JVM, hindering performance.

One example of Python's flexible semantics is method dispatch, which in Jython involves call frame object instantiation and one or more hash table lookups (new PyFrame(...) and frame.getglobal calls, respectively, as shown in Figure 1). Dispatch must also simulate Python's multiple inheritance of classes on Java's single inheritance model [18]. Each of these operations is significantly more costly than dynamic dispatch in Java.

The CLR/JVM platforms could be extended with dynamic language-specific primitives to reduce the cost of adaptation. But selecting the right set of primitives is challenging, as each dynamic language has its own unique semantics. We have observed that the majority of a typical dynamic language program doesn't exploit its language's full semantic flexibility. In fact, existing CLR/JVM platform primitives can directly encode common uses of most dynamic language features, significantly improving performance. However, without a way to identify which operations can be so encoded without observing them during execution (which would require hooks into the CLR/JVM internals), dynamic language implementations cannot unconditionally generate fast code for the common case.

While adapting the CLR/JVM to accommodate individual dynamic languages' semantics would be inadvisable, some general-purpose VM additions can simplify dynamic language implementation. For example, some CLR-based dynamic languages are hosted on a layer called the Dynamic
