An Abstract, Reusable, and Extensible Programming Language Design Architecture⋆
Hassan Aït-Kaci
Université Claude Bernard Lyon 1
Villeurbanne, France
[email protected]

Abstract. There are a few basic computational concepts that are at the core of all programming languages. The exact elements making up such a set of concepts determine (1) the specific nature of the computational services such a language is designed for, (2) for what users it is intended, and (3) on what devices and in what environment it is to be used. It is therefore possible to propose a set of basic building blocks, and operations thereon as combination procedures, to enable programming software by specifying desired tasks using a toolbox of generic constructs and meta-operations. Syntax specified through LALR(k) grammar technology can be enhanced with greater recognizing power thanks to a simple augmentation of yacc technology. Upon this basis, a set of implementable formal operational semantics constructs may be designed and generated (syntax and semantics) à la carte, by simple combination of the desired features. The work presented here, and the tools derived from it, may be viewed as a toolbox for generating language implementations with a desired set of features. It eases the practical automatic generation of programming languages pioneered by Peter Landin's SECD Machine. What is overviewed constitutes a practical computational algebra extending the polymorphically typed λ-Calculus with objects/classes and monoid comprehensions. This paper describes a few of the most salient parts of such a system, stressing most specifically any innovative features—formal syntax and semantics. It may be viewed as a high-level tour of a few reusable programming language design techniques prototyped in the form of a set of composable abstract machine constructs and operations.1

Keywords: Programming Language Design; Object-Oriented Programming; Denotational Semantics; Operational Semantics; λ-Calculus; Polymorphic Types; Static/Dynamic Type Checking/Inference; Declarative Collections; Monoid Comprehensions; Intermediate Language; Abstract Machines.

This article is dedicated to Peter Buneman, a teacher and a friend—for sharing the fun! With fond memories of our Penn days and those Friday afternoon seminars in his office ...

⋆ Thanks to Val Tannen for his patience, Nabil Layaïda for his comments, and the anonymous referee for catching many glitches and giving good advice in general.
1 Some of this material was presented as part of the author's keynote address at LDTA 2003 [5]. The source code is available on https://github.com/ha-k/hak-lang-syntax.

The languages people use to communicate with computers differ in their intended aptitudes towards either a particular application area, or in a particular phase of computer use (high level programming, program assembly, job scheduling, etc.). They also differ in physical appearance, and more important, in logical structure. The question arises, do the idiosyncrasies reflect basic logical properties of the situations that are being catered for? Or are they accidents of history and personal background that may be obscuring fruitful developments? This question is clearly important if we are trying to predict or influence language evolution. To answer it we must think in terms, not of languages, but of families of languages.
That is to say we must systematize their design so that a new language is a point chosen from a well-mapped space, rather than a laboriously devised construction.

PETER J. LANDIN—"The Next 700 Programming Languages" [31]

1 Introduction

1.1 Motivation—programming language design?

Today, programming languages are designed more formally than they used to be fifty years ago. This is thanks to linguistic research that has led to syntactic science (begetting parser technology) and research in the formal semantics of programming constructs (begetting compiler technology—semantics-preserving translation from human-usable surface syntax to low-level instruction-based machine language). As in the case of a natural language, a grammar is used to control the formation of sentences (programs) that will be understood (interpreted/executed) according to the language's intended (denotational/operational) semantics. Design based on formal syntax and semantics can thus be made operational.

Designing a programming language is difficult because it requires being aware of all the overwhelmingly numerous consequences of the slightest design decision that may occur anytime during the lexical or syntactic analyses and the static or dynamic semantics phases. To this, we must add the potentially high design costs of investing in defining and implementing a new language. These costs affect not only the time and effort of design and development, but also the quality of the end product—viz., the performance and reliability of the language being designed, not to mention how to justify, let alone guarantee, the correctness of the design's implementation [37].

Fortunately, there have been design tools to help in the process. So-called meta-compilers have been used to great benefit to systematize the design and guarantee a higher quality of language implementation. The "meta" part actually applies to the lexical and syntactic phases of the language design. Even then, the metasyntactic tools are often restricted to specific classes of grammars and/or parsing algorithms. Still fewer propose tools for abstract syntax. Most that do confine the abstract syntax language to some form of idiosyncratic representation of an attributed tree language with some ad hoc attribute co-dependence interpretation. Even rarer are language design systems that propose abstract and reusable components in the form of expressions of a formal typed kernel calculus. It is such a system that this work proposes; it gives an essential overview of its design principle and the sort of services it has been designed to render.

This document describes the design of an abstract, reusable, and extensible programming language architecture and its implementation in Java. What is described represents a generic basis insofar as these abstract and reusable constructs, and any well-typed compositions thereof, may be instantiated in various modular language configurations. It also offers a practical discipline for extending the framework with additional building blocks for new language features as per need. The first facet was the elaboration of Jacc,2 an advanced system for syntax-directed compiler generation extending yacc technology [28].3 A second facet was the design of a well-typed set of abstract-machine constructs complete enough to represent higher-order functional programming in the form of an object-oriented λ-Calculus, extended with monoid comprehensions [14,15,13,22].
A third facet could be the integration of logic-relational (from Logic Programming) and object-relational (from Database Programming) features, enabling LIFE technology [4,7] and/or any other CP/LP technology to cohabit. What is described here is therefore a metadesign: it is the design of a design tool. The novelty of what is described here lies both in the lexical/syntactical phase and in the typing/execution semantic phase.

The lexical and syntactic phases are innovative in many respects. In particular, they are conservative extensions considerably enhancing the conventional lex/yacc (or, similarly, flex/bison) meta-lexico-syntactical tools [28,19] with more efficient implementation algorithms [34] and greater recognizing power (viz., overloaded grammar symbols and dynamic operator properties à la Prolog; this essentially gives Jacc the recognizing power of LALR(k) grammars, for any k ≥ 1). Sections 2.1 and 2.2 give more details on that part of the system.

The interpretation is essentially the same approach as the one advocated by Landin for his Stack-Environment-Control-Dump (SECD) machine [30] and optimized by Luca Cardelli in his Functional Abstract Machine (FAM) [16].4 The abstract machine we present here is but a systematic use of Java's object-oriented tool-set to put together a modular and extensible set of building blocks for language design. It is sufficiently powerful for expressing higher-order polymorphic object-oriented functional and/or imperative programming languages. This includes declarative collection-processing based on the concept of Monoid Comprehensions as used in object-oriented databases [14,15,13,22,25].5

2 http://www.hassan-ait-kaci.net/hlt/doc/hlt/jaccdoc/000 START HERE.html
3 See Section 2.1.
4 Other formally derived abstract machines like the Categorical Abstract Machine (CAM) also led to variants of formal compilation of functional languages (e.g., Caml). This approach was also adopted for the chemical metaphor formalizing concurrent computation as chemical reaction, originally proposed by Banâtre and Le Métayer [8] and later adapted by Berry and Boudol to define their Chemical Abstract Machine (ChAM) [9]. The same also happened for Logic Programming [3].
5 That is, at the moment, our prototype Algebraic Query Language (AQL v0.00) is a functional language augmented with a calculus of comprehensions à la Fegaras-Maier [22], or à la Grust [25]. In other words, it is a complete query language, powerful enough to express most of ODMG's OQL, and thus many of its derivatives such as, e.g., XQuery [12] and

This machine was implemented and used by the author to generate several experimental 100%-Java implementations of various language prototypes. Thus, what was actually implemented in this toolkit was done following a "by need" priority order. It is not so complete as to already encompass all the necessary building blocks needed for all known styles of programming and type semantics. It is meant as an open set of tools to be extended as the needs arise. For example, there is no support yet for LP [3], nor—more generally—CLP [27]. However, as limited as it may be, it already encompasses most of the basic familiar constructs from imperative and functional programming, including declarative aggregation (so-called "comprehensions").
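To make the last two ingredients more concrete, here are two small illustrative sketches written in Java, the implementation language mentioned above. Both are only sketches: the class and method names are invented for exposition and are not the actual APIs of Jacc or of the abstract-machine toolkit.

The first sketch shows what dynamic operator properties à la Prolog amount to operationally: operators are declared at run time in a table with a precedence and a fixity, much as with Prolog's op/3 directive, and the parser consults this table instead of relying on hard-wired operator rules in the grammar.

    // A minimal, hypothetical sketch of a Prolog-style dynamic operator table.
    // This is not Jacc's actual API; it only illustrates the idea of operator
    // properties that can be (re)declared while the parser is running.
    import java.util.HashMap;
    import java.util.Map;

    public class OperatorTable {
      public enum Fixity { XFX, XFY, YFX, FY, YF }  // Prolog-like fixity codes

      public static final class Op {
        final int precedence;   // numeric binding strength, as in Prolog
        final Fixity fixity;
        Op(int precedence, Fixity fixity) {
          this.precedence = precedence;
          this.fixity = fixity;
        }
      }

      private final Map<String, Op> table = new HashMap<>();

      // Analogue of Prolog's op/3 directive: declare (or redeclare) an operator.
      public void op(int precedence, Fixity fixity, String name) {
        table.put(name, new Op(precedence, fixity));
      }

      public Op lookup(String name) { return table.get(name); }

      public static void main(String[] args) {
        OperatorTable ops = new OperatorTable();
        ops.op(500, Fixity.YFX, "+");   // left-associative addition
        ops.op(400, Fixity.YFX, "*");   // tighter-binding multiplication
        ops.op(200, Fixity.XFY, "^");   // right-associative exponentiation
        // A precedence-aware parser would call ops.lookup(token) at each
        // operator token, so operators declared mid-parse take effect on the
        // rest of the input.
        System.out.println("'+' binds at precedence " + ops.lookup("+").precedence);
      }
    }

The second sketch shows the monoid-comprehension idea in the style of Fegaras and Maier [22]: an aggregate is described by a monoid (a zero element, a merge operation, and a unit injection), and a comprehension such as sum{ x*x | x <- xs, x > 2 } is evaluated as a fold of the monoid's merge over the qualifiers.

    // A minimal sketch of monoid comprehensions as folds.  The names Monoid
    // and comprehend are illustrative, not the system's actual kernel calculus.
    import java.util.List;
    import java.util.function.BiFunction;
    import java.util.function.Function;
    import java.util.function.Predicate;

    public final class MonoidComprehension {
      // A monoid with a unit injection: (zero, merge, unit).
      record Monoid<T, E>(T zero,
                          BiFunction<T, T, T> merge,
                          Function<E, T> unit) {}

      // comprehend(m, head, filter, xs)  ~  m{ head(x) | x <- xs, filter(x) }
      static <T, E, X> T comprehend(Monoid<T, E> m,
                                    Function<X, E> head,
                                    Predicate<X> filter,
                                    List<X> generator) {
        T acc = m.zero();
        for (X x : generator)
          if (filter.test(x))
            acc = m.merge().apply(acc, m.unit().apply(head.apply(x)));
        return acc;
      }

      public static void main(String[] args) {
        Monoid<Integer, Integer> sum = new Monoid<>(0, Integer::sum, e -> e);
        // sum{ x*x | x <- [1..5], x > 2 }  ==  9 + 16 + 25  ==  50
        int result = comprehend(sum, x -> x * x, x -> x > 2, List.of(1, 2, 3, 4, 5));
        System.out.println(result);   // prints 50
      }
    }

In both cases the point is the one made throughout this paper: the construct is packaged as a small, typed, composable building block that a generated language implementation can reuse as is or extend.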