The Cog Smalltalk Virtual Machine. Writing a JIT in a High-Level Dynamic

The Cog Smalltalk Virtual Machine. Writing a JIT in a High-Level Dynamic

The Cog Smalltalk Virtual Machine writing a JIT in a high-level dynamic language Eliot Miranda Independent Consultant San Francisco [email protected] Introduction 1. Overview Cog is an extension of the Squeak VMMaker [1] vir- Cog is implemented within the Squeak VMMaker tual machine framework for implementing Smalltalk package and translated to C using the package’s virtual machines. The framework implements virtual Slang translator. Being a Smalltalk program, Cog is machines in a subset of Smalltalk and allows their evaluable within a Smalltalk environment and hence incremental development and execution within the allows incremental development and debugging of Smalltalk IDE, and then translates the code to C to most of the system. But to evaluate generated x86 produce a production VM. machine code the framework is extended with an Cog adds a just-in-time compiler for x86 to the exist- interface to the core processor implementation in the ing interpreter. At this stage in its evolution Cog Bochs [3] IBM PC emulator, which is married to the produces a VM that runs language-intensive (rather byte array object holding the heap and generated than library intensive) benchmarks from the com- code, and to variables in the Smalltalk interpreter puter language shootout [2] at around 5 times faster and JIT objects. than the interpreter. While Cog’s code generation Cog was implemented in a commercial context, the and message send optimization techniques are not Qwaq/Teleplace1 [4] application of the Croquet [5] novel, its plumbing of an x86 simulator into its virtual worlds system to business communication. In framework, which allows full simulation of gener- this context, incremental delivery of increased per- ated machine code, is probably of some interest to formance over the base interpreter and minimisation the VMIL community. The VM is also well- of risk was a major requirement. Hence Cog modularised and extensible, as exemplified by re- evolved in several stages, and forethought was given cently being applied to Newspeak [6]. to making major components pluggable to support Cog is fully open-source, released under the MIT that evolution. Squeak license. The author maintains a blog with This paper follows the above sketch. We first cover detailed articles on various components of the sys- the Slang Smalltalk-to-C translator and its exten- tem [7]. sions, going on to describe the processor simulation General Terms Virtual Machine Implementa- system with Bochs at its core. The next two sections tion, High-level language, dynamic language, just- then look at different aspects of modularity and ex- in-time compiler. tensibility in Cog. The last section presents a couple 1 The company was founded as Qwaq, Inc (see the left of a qwerty keyboard) and changed its name to Teleplace to avoid offense; try selling Qwaq Forums to a medical establishment. of cool hacks made possible by using Smalltalk and essor implementation in the Bochs [3] IBM PC emu- then we conclude. lator. This accesses data in the memory ByteArray, and executes instructions in the generated code por- 2. Slang tion of the memory ByteArray. Using Bochs is both As described in [1] the VMMaker framework uses a less work and doubtless provides faster execution subset of Smalltalk that is easily translated to C. The than implementing an x86 simulator in Smalltalk. translator, called Slang, is a one-pass traversal over The Bochs x86 simulator is derived by taking the the parse trees of methods in the framework, core processor definition from Bochs, changing the straightforwardly writing out C code. This implies memory interface so that it fetches/stores from/to the no rich data structures, no Dictionaries (hash maps), memory ByteArray and wrapping this up in a Squeak Sets or OrderedCollections (dequeues, stacks), only VM plugin2 with a primitive interface whose re- Arrays, simple to:by:do: loops, and no polymor- ceiver is an Alien3 for the Bochs C++ processor ob- phism; each message in the framework must have ject and whose arguments are commonly the memory only one implementation. ByteArray and some read/write/execute limits. The primitive interface comprises 6 methods for execu- The base Interpreter is monolithic, inheriting from tion, single stepping, disassembly, initialization and the object representation/heap/garbage collector Ob- error message retrieval. The remaining methods ac- jectMemory. All messages are therefore to self. Elements of this architecture were far too restrictive cess the register state of the x86 represented by the for a JIT and the architecture was extended, by split- C++ object. ting the Interpreter from the ObjectMemory, and Machine code is generated into the “method zone”, keeping the JIT separate from these two. In addition located low in the memory ByteArray. Correspond- to this support for multiple classes, there is support ingly there are structures in the memory object, such for record and pointer-to-record types which are used as machine-code methods, that must be accessed extensively to implement data structures such as in from the Smalltalk side of things, and hence proxy the Interpreter, a stack page of Smalltalk method ac- objects are used to wrap addresses in the memory tivations, or in the JIT, an abstract instruction, or an object (see Figure 1 on page 6). These wrapper entry on the simulation stack used to map stack byte- classes are auto-generated from the classes that de- code to register-based code. fine the relevant ADTs, e.g. CogMethod defines the ADT for a machine-code method header, generates The base Interpreter uses a ByteArray called memory to model the object heap, using integer addresses in the C typedef in the Slang translation and generates CogMethodSurrogate32/64 to access CogMethod the memory ByteArray as object references. In Cog instances in the simulation memory (see appendix). this memory ByteArray is extended with space for stack pages holding method activations in the form Accessing methods and variables in the interpreter, of stack frames, and with space for generated code. such as the bulk of primitives that are not imple- mented in machine code (window, file and operating 3. Bochs system interfaces, FFI, save/restore and complex To evaluate generated x86 machine code the frame- execution primitives such as perform: (Smalltalk’s work is extended with an interface to the core proc- apply)), requires a way of mapping machine code 2 A Squeak VM plugin is a class implementing a set of primitives that either implement or provide an interface to some useful functionality. It can be either statically or dynamically linked into the VM and is hence a modular extension to the VM providing encapsulation and platform independence over an FFI based implementation. See [12]. 3 An Alien is a wrapper for some external address along with a set of primitives such as longAt:put: for accessing external memory relative to that address. accesses to Smalltalk message sends. This is done cacheing techniques, which are focussed on call in- using illegal addresses. Each method and/or variable structions. But the mapping must also allow modifi- that needs to be accessed from machine code is given cation of the activation records, causing modification a unique illegal address and entered in a read and a or destruction of the associated stack frame as ap- write dictionary mapping illegal address to evaluable propriate. Such code is complex and Qwaq/ (either a Symbol or a Block) that can be used to ei- Teleplace management rightly wanted to speed de- ther fetch/store a variable or call a method. Bochs is livery of improved performance by requiring that the invoked to execute machine code until it hits an ille- context-to-stack mapping machinery was realised gal address, at which point the invocation primitive first in the context of a modified interpreter, without fails and maps the Bochs exception into a Smalltalk having to wait for the development of the JIT. This ProcessorSimulationTrap Exception subclass that context-to-stack mapping StackInterpreter is typi- records the offending instruction, next instruction cally about twice as fast as the original context-based address, the illegal address, and register source or Interpreter for Smalltalk-intensive code. Further, destination. The system simulator then handles Qwaq/Teleplace negotiated a naive initial JIT code ProcessorSimulationTrap exceptions by looking up generator for early delivery. This code generator has the illegal address in the relevant Dictionary, evaluat- a one-to-one correspondence between bytecodes and ing it and continuing execution. generated code, hence every bytecode that pushes an Execution of the simulator (see Figure 1 on page 6) operand results in machine code that also pushes that starts in Smalltalk, the interpreter being pure Small- operand on the real stack. talk. But once the CoInterpreter has asked the JIT to Given the StackInterpreter it was natural to use it via compile a method execution may continue in ma- the CoInterpreter subclass4 which married the chine code, surfacing back up to Smalltalk on en- context-to-stack mapping machinery, bytecode inter- countering an illegal address, or the VM being inter- preter and primitive set to Cog’s JIT (called the Co- rupted. Use of single-stepping allows capturing ma- git), providing a mixed execution model where code chine instructions so that one may conveniently ex- may either be interpreted or compiled to machine amine the previous N instructions on hitting a bug. code and freely inter-called. Hence Cog evolved in five stages, an extended bytecode set and new Small- 4. Modularity, Evolution and Mixed- talk bytecode compiler that implements full closures Mode Execution and hence enables context-to-stack mapping, a faster Cog was implemented in a commercial context, the context-to-stack mapping interpreter, a naive code Qwaq/Teleplace [4] application of the Croquet [5] generator, a threading model supporting a non- virtual worlds system to business communication.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    7 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us