From Lisp S-Expressions to Java Source Code

UDC 004.439:00.55 From Lisp S-Expressions to Java Source Code António Menezes Leitão Instituto Superior Técnico / INESC-ID Lisboa, Portugal [email protected] Abstract. The syntax of Lisp languages is based on S-expressions, an extremely simple form of structured data representation that is nevertheless fundamental to the development of Lisp syntactic extensions. By adopting a more conventional syntax, the Java language placed itself in a difficult position in regard to user-defined syntax extensions. In spite of the many efforts to provide mechanisms for such extensions, they continue to be more difficult to use than S-expression- based ones. In this paper, we will describe the use of the S-expression syntax in a Java code generation environment. By providing an S- expression based program representation for Java source code, we are able to reuse and extend Lisp macro-expansion techniques to significantly simplify the construction of Java programs. Keywords: S-expressions, Macros, Common Lisp, Java. 1. Introduction S-expressions (Symbolic Expressions) were invented by John McCarthy in the late 50s as a notation for both programs and data in the language Lisp [16]. S-expressions are usually expressed as fully-parenthesized prefix notation (also known as Cambridge Polish notation). For example, the mathematical expression 1+2×3 is represented in S-expression notation as (+ 1 (* 2 3)). Data can also be easily represented and there are old and recent proposals for its standardized use [17] [20]. McCarthy’s original idea also suggested the use of a different, more ALGOLesque, format named M-expressions (Meta Expressions), that would be translated into S-expressions. In M-expression notation, the previous expression looks like +[1, *[2, 3]]. However, programmers started to use and appreciate the S-expression format and M-expression never caught up. Homoiconicity is one of the fundamental ideas behind the S-expression notation. A language is said to be homoiconic when the primary representation of a program source code is implemented using a primitive type of the language itself. Thus, in a homoiconic language, a program can be constructed, analyzed, and evaluated using the programming language itself. António Menezes Leitão Lisp is the best known example of a homoiconic programming languages but there are other examples such as Prolog, SNOBOL and Tcl. The ability to write programs that write programs is the hallmark of meta- programming [12]. A meta-program is written in a meta-language and its execution generates programs written in an object-language. In the case of homoiconic languages, such as Lisp, where programs and data are represented using the same S-expression notation, the meta-language and the object-language can be the same. From its inception Lisp was extensively used for meta-programming, including writing self-modifying programs. However, these programs were perceived as generally difficult to debug and with performance problems. As time went by, simpler and more standardized forms of meta-programming were developed and one, in particular, became well established: macros [19]. The word “macro” has a somewhat dubious reputation due to the problems of their use in the C programming language but that reputation is totally undeserved when we talk about Lisp. Macros in C are processed using a special program (a pre-processor) that operates as a text-replacement tool that does not understand the syntax of the language and that might create problems that are hard to debug. In Lisp, a macro describes a program that accepts program fragments (as S-expressions) and computes a new program fragment (an S-expression) that is evaluated in place of the macro call. This means that Lisp macros do not deal with the program text but with the program syntax tree instead. For performance reasons, the evaluation of the macro call and the evaluation of the new program fragment produced by that call occur at different times (called, respectively, macro-expansion time and run time) and, in most implementations, the new program fragment replaces the original macro-call so that there is only one macro-expansion for each call. Macros have been used in Lisp languages for quite a long time (at least, since 1963 [11]) and are fundamental for syntactically extending the language. In dialects such as Common Lisp [2], a significant part of the language is implemented using macros. As an example of the use of a macro, consider the following program fragment in Common Lisp that reads the first line of text contained in the given file: (with-open-file (s "/tmp/file.txt") (read-line s)) 20 ComSIS Vol. 5, No. 2, December 2008 From Lisp S-Expressions to Java Source Code The macro with-open-file is responsible for opening the file, executing the read-line operation requested and, in the end, closing the file and returning whatever was read. This behavior, however, is not implemented by the macro itself but by its macro-expansion, i.e., by the S-expression that is computed by the macro call. In fact, after macro-expansion time, what is really evaluated is the form:1 (let ((s (open "/tmp/file.txt"))) (unwind-protect (read-line s) (when (streamp s) (close s)))) This sort of code generation can be very easy to do when we combine the Lisp macro system with a template-based quasiquotation approach [5]. Quasiquotation allows us to provide the following definition for the with-open- file macro: (defmacro with-open-file ((f filename) form) `(let ((,f (open ,filename))) (unwind-protect ,form (when (streamp ,f) (close ,f))))) The idea behind quasiquotation is that it operates as a parameterized version of quotation and is generally used to describe templates for code generation where some “unquoted” parts (those proceeded by commas) will be filled in by evaluating the corresponding expression. Given the fact that macros receive their arguments unevaluated (as S-expressions) and that they must compute an S-expression as result, it is extremely tempting to use quasiquotation in macros and, in fact, they are heavily used for that purpose. The macro/quasiquotation combination is one of the best features of Lisp and has been appropriately called “the ultimate CASE tool” [3]. In many cases, macros do nothing more than construct an S-expression from a template described by quasiquotation that incorporate the arguments to the macro call. The with-open-file example presented above shows this behavior, as can be seen in the (read-line s) argument that is carried over 1We present only a simplification of the expansion that is done in most implementations of this macro. ComSIS Vol. 5, No. 2, December 2008 21 António Menezes Leitão the expansion without any changes. In these cases, the arguments are treated as completely opaque objects. In some other cases, however, it is necessary to look inside those arguments. For example, the extended loop macro provided in Common Lisp is used to express iterations and it expands into different forms depending on the presence of certain symbols in the macro arguments, such as from, in, and while, so that a (loop for i from ... to ... do ...) and a (loop while ... do ...) can produce different expansions. As another example, consider the macro that increments a place: (incf x) is the same as (setf x (+ x 1)). However, if the macro argument is more complex, e.g., in (incf (first (foo))), the naive application of the macro will generate the form (setf (first (foo)) (+ (first (foo)) 1)) that will incorrectly evaluate (foo) twice. To solve this problem it is necessary to look inside the argument in order to generate a program that avoids duplicate evaluation. In other cases, the situation is even more complex. Sophisticated syntactic extensions, such as the Series ([23]) and Iterate ([1]) packages, must extensively analyze and rearrange the source code that is passed as argument. For an even more extreme case, consider Screamer ([21]), a non- deterministic variant of Common Lisp that depends on a few macros to convert a program into continuation-passing-style and that also (incrementally) operates a whole-program analysis to distinguish deterministic and non-deterministic functions. The common theme across the previous examples is that macros, sometimes, must do a lot more than the simple template instantiation provided by quasiquotation. In spite of its long history, there are still some problems associated with Lisp macros (unintended variable capture, out-of-order evaluation, etc) and some of these problems have been solved, e.g., by the hygienic macros [9] and syntactic closures [6] that have been proposed for the Scheme dialect of Lisp. Common Lisp, however, still uses the traditional model because it is simpler and it does not entail the same problems that it does in Scheme. In the rest of this paper, for reasons that will be obvious, we will only consider the Common Lisp model. 2. The Linj Language System Software maintenance is an highly difficult task, in particular, when the software is written in one programming language but the maintenance team prefers to develop in a different language. To deal with this problem, management tends to restrict the set of “acceptable” programming languages to the most widely used ones such as Java. This imposes a difficult constraint on developers that prefer to work in less mainstream languages such as Common Lisp. Linj is a Common Lisp-like language intended to be translated into human- readable Java source code. The fundamental idea behind Linj is that it should 22 ComSIS Vol. 5, No. 2, December 2008 From Lisp S-Expressions to Java Source Code be possible to develop a program in a Common Lisp dialect but deliver it in Java just like if the program was originally written in Java.

From Lisp S-Expressions to Java Source Code

The Machine That Builds Itself: How the Strengths of Lisp Family

From JSON to JSEN Through Virtual Languages of the Creative Commons Attribution License (

The Semantics of Syntax Applying Denotational Semantics to Hygienic Macro Systems

Star TEX: the Next Generation Several Ideas on How to Modernize TEX Already Exist

Functional Programming Patterns in Scala and Clojure Write Lean Programs for the JVM

Lisp, Jazz, Aikido Three Expressions of a Single Essence

Lisp, Jazz, Aikido – Three Expressions of a Single Essence Didier Verna

SMC2011 Template

Functional Programming in Lisp

Scheme in Scheme 2

Clojure (For the Masses

LISP a Programmable Programming Language