SISC: a Complete Scheme Interpreter in Java
Total Page:16
File Type:pdf, Size:1020Kb
SISC: A Complete Scheme Interpreter in Java Scott G. Miller <[email protected]> 20th February 2003 Abstract We will examine the design of SISC, a fully R5RS compliant heap-based inter- preter of the functional language Scheme, with proper tail-recursion and first-class continuations. All noteworthy components of the interpreter will be described, and as well as discussion of some implementation details related to interpretation optimizations and efficient use of the Object-Oriented host language Java. Contents 1 Introduction 2 2 Rationale 3 2.1 Proper tail recursion . 3 2.2 Continuations . 3 3 The Interpreter 4 3.1 Expression compilation . 4 3.2 Environments . 5 3.2.1 Associative Environments . 5 3.2.2 Lexical Environments . 5 3.3 Evaluation . 6 3.3.1 Execution . 7 3.3.2 Expressions . 7 3.4 Procedures . 10 3.4.1 Built-in procedures . 10 3.4.2 Closures . 10 3.4.3 Continuations . 11 3.5 Proper tail recursion . 11 3.6 Continuations . 11 3.7 Error handling . 12 3.7.1 The User’s view . 12 3.7.2 The Implementation . 13 1 4 Datatypes 13 4.1 Symbols . 14 4.2 Numbers . 14 4.3 Multiple values . 14 5 Java: Pros and Cons 15 5.1 Object-Oriented benefits . 15 6 Optimizations 16 6.1 Call-frame recycling . 16 6.2 Expression recycling . 17 6.3 Immediate fast-tracking . 17 7 Extensibility 18 8 Performance comparison 18 9 Conclusion 18 1 Introduction SISC1 is a Java based interpreter of the Scheme. It differs from most Java Scheme interpreters in that it attempts to sacrifice some simplicity in the evaluation model in order to provide a complete interpretation of Scheme. In particular, the complete R5RS Scheme standard is supported by SISC without sacrificing speed2. SISC uses a heap-based approach to control-context management, rather than a stack-based approach (using the procedure-call mechanism of the host language, and thus its native stack, to maintain continuations), and thus has full control over lan- guage continuations. For this reason, SISC is able to provide first-class continuations as defined in the standard, not just a subset of continuations such as the escaping con- tinuations often provided by stack-based interpreters in Java. For a more thorough discussion of heap and stack-based interpreters, refer to [1]. SISC’s engine is able to provide support for the core language features. With the addition of the portable syntax-case macro expander[3], SISC is able to provide hy- gienic macro support as per the standard. Lexical environments are used in SISC to provide efficient searchless linear time lookups and sets (linear with respect to lexical depth) for lexically referenced vari- ables. Top-level or global variable lookups and sets are constant-time through the use of inlined storage location pointers to store free variable references and syntactic defi- nitions. 1SISC’s predecessor was a stack-based interpreter named LISC (Lightweight Interpreter of Scheme Code), so, Second Interpreter of Scheme Code. 2There are a few exceptions to the standard in SISC that are not violations of the standard. In particular, SISC allows some syntax that the standard does not. For example, SISC does not require vector constants to be quoted, an extension common to many implementations. 2 SISC’s speed is also quite competitive with respect to other Java Scheme inter- preters. Since SISC compiles Scheme s-expressions to their fully-expanded forms, and stores them in an object-oriented internal representation, programs typically perform significantly faster than other interpreters that evaluate s-expressions in place. Later we will compare SISC to two other popular Java-based Scheme interpreters using publicly available benchmarks. SISC also implements a full number stack, with support for fixed and arbitrary pre- cision integers, arbitrary precision floating point numbers, rational numbers with ar- bitrary precision components, and complex numbers with arbitrary precision floating- point components. All the number forms are printed correctly, and all R5RS standard mathematical operations function correctly on these forms. 2 Rationale When creating an interpreter for Scheme, the question of design goals is often raised. Many implementations of Scheme strive to prove truly useful for real-world use (Chez, MIT, Guile). Others are created for teaching purposes (DrScheme). Others seem de- signed as ’toy’ implementations with interesting but irrelevant design goals (SILK3, Scheme-in-one-Defun). Such endevours show the broad range of possibilities and the flexibility of Scheme as a small and elegant language. Often, these implementations sacrifice some important features to meet their goals. But are the right sacrifices made? The two most frequently ignored requirements of the Scheme standard seem to be proper tail recursion and full first-class continuations. 2.1 Proper tail recursion Proper tail recursion is not just an academic excercise. One of the central pillars of functional programming is the idea of recursive problem solving. It is the ability of functional languages to encourage this type of problem solving that is its greatest strength. Consequently, programmers of functional languages use recursion to solve prob- lems in ways that would ordinarily be tasked to iteration in other languages. A pro- grammer assumes in such circumstances that their programs will run regardless of the size of their data, or the number of loops through a recursive function. Broken or non- existant tail call optimization cripples a fundamental benefit of functional languages. SISC’s realization of proper tail recursion is described in section 3.5. 2.2 Continuations Continuations are less clear cut. Useful programs can be written without ever being aware of their existence. Nevertheless, their inclusion in a language as a controlable entity removes a significant barrier to producing programs that need more advanced control flow. 3SILK has since grown up, and renamed itself to JScheme. Its execution model remains the same, how- ever. 3 First-class continuations allow for progamming constructs not possible in most lan- gages today such as threading, co-routines, engines, and exceptions, without requiring explicit language support for those features. Along with Scheme’s hygenic syntax ex- tension, continuations make Scheme one of the most flexible languages in existence. SISC’s realization of first-class continuations is described in section 3.6. 3 The Interpreter 3.1 Expression compilation Before a Scheme expression can be evaluated, it must be compiled into SISC’s inter- nal expression representation. Do not put to much stock into the term ’compiling’ as it is used here. S-expressions are merely parsed for semantics and turned into an in- ternal representation that is used by the SISC evaluation engine. This process entails syntactic-expansion followed by the examination of the expression and the creation of the internal representation. Syntactic expansion is accomplished using a portable macro-expander package by R. Kent Dybvig, et al [3]. This is a fully hygienic macro-expander that takes Scheme lists representing expressions and returns Scheme lists representing their fully ex- panded form. The expander is implements the algorithm described in [2]. After syntac- tic expansion, only the core Scheme expression types (BEGIN, LAMBDA, IF, QUOTE, DEFINE, variable references/sets, and applications), remain. Then the expression is run through the compiler, which performs the task of exam- ining the Scheme expression to create a number of SISC Expressions. The recognition of expression types such as LAMBDA, IF, DEFINE, etc. and the generation of internal representations to handle these forms is performed once at compile time. Consequently, we are able to realize a substantial performance increase at runtime, since we no longer have to examine the expressions to determine how to evaluate them. This combined with the object-oriented nature of SISC’s Expressions means that apart from the dy- namic method dispatch, no examination overhead (list-traversal, string comparison, etc) is necessary at runtime. The downside is that a lengthy expansion and compilation must occur when s- expressions are entered or loaded. Due to the complexity of the macro expander, this does incur a significant overhead. For example, to load the macro-expander, the stan- dard macro-definitions, the derived expressions and library functions, SISC must churn away for 10-15 seconds at startup4. This is an unreasonable amount of time to wait for a Scheme system to start, but is easily solved by loading a pre-expanded and compiled heap at startup. The actual conversion from Scheme expressions to SISC Expressions is not worth describing in detail except for a few small cases. Quoted expressions merely compile to the value themselves (since Values evaluate to themselves). IF expressions with con- stant tests are eliminated, selecting the correct branch at compile time. Finally, some optimization is done to eliminate unnecessary begin expressions, and unnecessary ex- pressions present in effect context. 4Depending on the platform, JVM, and hardware, of course. 4 3.2 Environments SISC has two types of environments. The first, associative environments, create the foundation for top-level, free variable references. The second, lexical environments, support lexically scoped variable references within Scheme programs. 3.2.1 Associative Environments Associative environments store a mapping between a symbol and a storage location. The storage locations are represented as an array of Values. At compile time, each free variable reference is translated into a SISC Expression which is aware of which AssociativeEnvironment the variable is referencing. In addi- tion, the expression contains an offset index into that AssociativeEnvironment’s Value array. Initially, that offset is unset. At runtime, when a free variable reference is encountered, the value is retrieved by retrieving the value present at the storage location associated with the reference. No- tice that no lookup of any symbol is required, so the operation completes in constant time, with a very small overhead (one memory indirect).