Pydgin: Generating Fast Instruction Set Simulators from Simple Architecture Descriptions with Meta-Tracing JIT Compilers
Appears in the Proceedings of the Int’l Symp. on Performance Analysis of Systems and Software (ISPASS-2015), March 2015

Derek Lockhart, Berkin Ilbeyi, and Christopher Batten
School of Electrical and Computer Engineering, Cornell University, Ithaca, NY
{dml257,bi45,cbatten}@cornell.edu

Abstract: Instruction set simulators (ISSs) remain an essential tool for the rapid exploration and evaluation of instruction set extensions in both academia and industry. Due to their importance in both hardware and software design, modern ISSs must balance a tension between developer productivity and high-performance simulation. Productivity requirements have led to “ADL-driven” toolflows that automatically generate ISSs from high-level architectural description languages (ADLs). Meanwhile, performance requirements have prompted ISSs to incorporate increasingly complicated dynamic binary translation (DBT) techniques. Construction of frameworks capable of providing both the productivity benefits of ADL-generated simulators and the performance benefits of DBT remains a significant challenge.

We introduce Pydgin, a new approach to ISS construction that addresses the multiple challenges of designing, implementing, and maintaining ADL-generated DBT-ISSs. Pydgin uses a Python-based, embedded-ADL to succinctly describe instruction behavior as directly executable “pseudocode”. These Pydgin ADL descriptions are used to automatically generate high-performance DBT-ISSs by creatively adapting an existing meta-tracing JIT compilation framework designed for general-purpose dynamic programming languages. We demonstrate the capabilities of Pydgin by implementing ISSs for two instruction sets and show that Pydgin provides concise, flexible ISA descriptions while also generating simulators with performance comparable to hand-coded DBT-ISSs.

I. INTRODUCTION

Recent challenges in CMOS technology scaling have motivated an increasingly fluid boundary between hardware and software. Examples include new instructions for managing fine-grain parallelism, new programmable data-parallel engines, programmable accelerators based on reconfigurable coarse-grain arrays, domain-specific co-processors, and application-specific instructions. This trend towards heterogeneous hardware/software abstractions combined with complex design targets is placing increasing importance on highly productive and high-performance instruction set simulators (ISSs).

Unfortunately, meeting the multitude of design requirements for a modern ISS (observability, retargetability, extensibility, support for self-modifying code, etc.) while also providing productivity and high performance has led to considerable ISS design complexity. Highly productive ISSs have adopted architecture description languages (ADLs) as a means to enable abstract specification of instruction semantics and simplify the addition of new instruction set features. The ADLs in these frameworks are domain specific languages constructed to be sufficiently expressive for describing traditional architectures, yet restrictive enough for efficient simulation (e.g., ArchC [3, 50], LISA [33, 55], LIS [34], MADL [43, 44], SimIt-ARM ADL [21, 40]). In addition, high-performance ISSs use dynamic binary translation (DBT) to discover frequently executed blocks of target instructions and convert these blocks into optimized sequences of host instructions. DBT-ISSs often require a deep understanding of the target instruction set in order to enable fast and efficient translation. However, promising recent work has demonstrated sophisticated frameworks that can automatically generate DBT-ISSs from ADLs [35, 42, 56].

Meanwhile, designers working on interpreters for general-purpose dynamic programming languages (e.g., Javascript, Python, Ruby, Lua, Scheme) face similar challenges balancing productivity of interpreter developers with performance of the interpreter itself. The highest performance interpreters use just-in-time (JIT) trace- or method-based compilation techniques. As the sophistication of these techniques has grown, so has the complexity of interpreter codebases. For example, the WebKit Javascript engine currently consists of four distinct tiers of JIT compilers, each designed to provide greater amounts of optimization for frequently visited code regions [37]. In light of these challenges, one promising approach introduced by the PyPy project uses meta-tracing to greatly simplify the design of high-performance interpreters for dynamic languages. PyPy’s meta-tracing toolchain takes traditional interpreters implemented in RPython, a restricted subset of Python, and automatically translates them into optimized, tracing-JIT compilers [2, 9, 10, 36, 39]. The RPython translation toolchain has been previously used to rapidly develop high-performance JIT-enabled interpreters for a variety of different languages [11–13, 24, 51, 52, 54]. We make the key observation that similarities between ISSs and interpreters for dynamic programming languages suggest that the RPython translation toolchain might enable similar productivity and performance benefits when applied to instruction set simulator design.

This paper introduces Pydgin¹, a new approach to ISS design that combines an embedded-ADL with automatically-generated meta-tracing JIT interpreters to close the productivity-performance gap for future ISA design. The Pydgin library provides an embedded-ADL within RPython for succinctly describing instruction semantics, and also provides a modular instruction set interpreter that leverages these user-defined instruction definitions. In addition to mapping closely to the pseudocode-like syntax of ISA manuals, Pydgin instruction descriptions are fully executable within the Python interpreter for rapid code-test-debug during ISA development. We adapt the RPython translation toolchain to take Pydgin ADL descriptions and automatically convert them into high-performance DBT-ISSs. Building the Pydgin framework required approximately three person-months worth of work, but implementing two different instruction sets (a simple MIPS-based instruction set and a more sophisticated ARMv5 instruction set) took just a few weeks and resulted in ISSs capable of executing many of the SPEC CINT2006 benchmarks at hundreds of millions of instructions per second.

¹ Pydgin loosely stands for [Py]thon [D]SL for [G]enerating [In]struction set simulators and is pronounced the same as “pigeon”. The name is inspired by the word “pidgin” which is a grammatically simplified form of language and captures the intent of the Pydgin embedded-ADL.

    jd = JitDriver( greens = ['bytecode', 'pc'],
                    reds   = ['regs', 'acc'] )

    def interpreter( bytecode ):
        regs = [0]*256  # vm state: 256 registers
        acc  = 0        # vm state: accumulator
        pc   = 0        # vm state: program counter

        while True:
            jd.jit_merge_point( bytecode, pc, regs, acc )
            opcode = ord(bytecode[pc])
            pc += 1

            if opcode == JUMP_IF_ACC:
                target = ord(bytecode[pc])
                pc += 1
                if acc:
                    if target < pc:
                        jd.can_enter_jit( bytecode, pc, regs, acc )
                    pc = target

            elif opcode == MOV_ACC_TO_REG:
                rs = ord(bytecode[pc])
                regs[rs] = acc
                pc += 1

            # ... handle remaining opcodes ...

Figure 1. Simple Bytecode Interpreter Written in RPython – bytecode is a string of bytes encoding instructions that operate on 256 registers and an accumulator. RPython enables succinct interpreter descriptions that can still be automatically translated into C code. Basic annotations (shown in blue) enable automatically generating a meta-tracing JIT compiler. Adapted from [10].

[Figure 2 diagram omitted. Panels: (a) Static Translation Toolchain; (b) Meta-Tracing JIT Compiler.]

Figure 2. RPython Translation Toolchain – (a) the static translation toolchain converts an RPython interpreter into C code along with a generated JIT compiler; (b) the meta-tracing JIT compiler traces the interpreter (not the application) to eventually generate optimized assembly for native execution.

as the foundation for the Pydgin framework. More detailed information about RPython and the PyPy project in general can be found in [2, 8–10, 36, 39].
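As an illustration of how the loop in Figure 1 behaves before any translation takes place, the following sketch runs it under a standard Python interpreter. It is not taken from the paper and rests on several assumptions: the JitDriver class is a no-op stand-in for the RPython one (it only counts back-edge hints), the opcode values are arbitrary, the DEC_ACC and HALT opcodes and the acc parameter are hypothetical additions so that a small program can terminate, and COUNTDOWN is a made-up test program.

```python
class JitDriver(object):
    """No-op stand-in for the RPython JitDriver (assumption: the hints
    only take effect after translation, so plain Python can ignore them)."""

    def __init__(self, greens, reds):
        self.greens = greens      # names that identify a position in the program
        self.reds = reds          # names of the remaining interpreter state
        self.back_edges = 0       # how many times can_enter_jit was reached

    def jit_merge_point(self, *args, **kwargs):
        pass                      # marks the top of the dispatch loop

    def can_enter_jit(self, *args, **kwargs):
        self.back_edges += 1      # a backwards jump, i.e. a candidate hot loop


# Hypothetical opcode encodings; Figure 1 leaves them unspecified.
JUMP_IF_ACC = 1
MOV_ACC_TO_REG = 2
DEC_ACC = 3   # hypothetical: decrement the accumulator
HALT = 4      # hypothetical: stop and return the machine state

jd = JitDriver(greens=['bytecode', 'pc'], reds=['regs', 'acc'])


def interpreter(bytecode, acc=0):
    """Figure 1 loop, extended with DEC_ACC, HALT, and an initial acc."""
    regs = [0] * 256
    pc = 0
    steps = 0                     # number of dispatched instructions
    while True:
        jd.jit_merge_point(bytecode, pc, regs, acc)
        opcode = ord(bytecode[pc])
        pc += 1
        steps += 1
        if opcode == JUMP_IF_ACC:
            target = ord(bytecode[pc])
            pc += 1
            if acc:
                if target < pc:   # backwards jump closes a loop
                    jd.can_enter_jit(bytecode, pc, regs, acc)
                pc = target
        elif opcode == MOV_ACC_TO_REG:
            rs = ord(bytecode[pc])
            regs[rs] = acc
            pc += 1
        elif opcode == DEC_ACC:
            acc -= 1
        elif opcode == HALT:
            return regs, acc, steps
        else:
            raise ValueError("unknown opcode %d" % opcode)


# Made-up program: store acc in r0, decrement acc, loop to 0 while acc != 0.
COUNTDOWN = ''.join(chr(b) for b in
                    [MOV_ACC_TO_REG, 0, DEC_ACC, JUMP_IF_ACC, 0, HALT])
```

Calling interpreter(COUNTDOWN, acc=3) executes the loop body three times and takes the backwards jump twice, so jd.back_edges ends at 2. Those two calls mark the points at which, after translation, the generated meta-tracing JIT could begin tracing the interpreter.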
This paper makes the following three contributions: (1) we describe the Pydgin embedded-ADL for productively specifying instruction set architectures; (2) we describe and quantify the performance impact of specific optimization techniques used to generate high-performance DBT-ISSs from the RPython translation toolchain; and (3) we evaluate the performance of Pydgin DBT-ISSs when running SPEC CINT2006 applications on two distinct ISAs.

Python is a dynamically typed language with typed objects but untyped variable names. RPython is a carefully chosen subset of Python that enables static type inference such that the type of both objects and variable names can be determined at translation time. Even though RPython sacrifices some of Python’s dynamic features (e.g., duck typing, monkey patching) it still maintains many of the features that make Python productive (e.g., simple syntax, automatic memory management, large standard