Structuring Languages As Object-Oriented Libraries
ACADEMIC DISSERTATION

to obtain the degree of doctor at the Universiteit van Amsterdam, by authority of the Rector Magnificus prof. dr. ir. K. I. J. Maex, before a committee appointed by the Doctorate Board (College voor Promoties), to be defended in public in the Agnietenkapel on Thursday, November , at . hours

by Pablo Antonio Inostroza Valdera, born in Concepción, Chile

Doctoral Committee

Supervisors:
  Prof. Dr. P. Klint, Universiteit van Amsterdam
  Prof. Dr. T. van der Storm, Rijksuniversiteit Groningen

Other members:
  Prof. Dr. J.A. Bergstra, Universiteit van Amsterdam
  Prof. Dr. B. Combemale, University of Toulouse
  Dr. C. Grelck, Universiteit van Amsterdam
  Prof. Dr. P.D. Mosses, Swansea University
  Dr. B.C.d.S. Oliveira, The University of Hong Kong
  Prof. Dr. M. de Rijke, Universiteit van Amsterdam

Faculteit der Natuurwetenschappen, Wiskunde en Informatica

The work in this thesis has been carried out at Centrum Wiskunde & Informatica (CWI) under the auspices of the research school IPA (Institute for Programming research and Algorithmics) and has been supported by NWO, in the context of Jacquard Project .. “Next Generation Auditing: Data-Assurance as a Service”.

Contents

Introduction
  Language Libraries
  Object Algebras
  Language Libraries with Object Algebras
  Origins of the Chapters
  Software Artifacts
  Dissertation Structure
Recaf: Java Dialects as Libraries
  Introduction
  Overview
  Statement Virtualization
  Full Virtualization
  Implementation of Recaf
  Case Studies
  Discussion
  Related Work
  Conclusion
Tracing Program Transformations with String Origins
  Introduction
  String Origins
  Applications of String Origins
  Implementation
  Related Work
  Conclusion
Modular Interpreters with Implicit Context Propagation
  Introduction
  Background
  Implicit Context Propagation
  Working with Lifted Interpretations
  Automating Lifting
  Case Study: Extending a DSL for State Machines
  Case Study: Modularizing Featherweight Java
  Performance Overhead of Lifting
  Related Work and Discussion
  Conclusion
JEff: Objects for Effect
  Introduction
  JEff: Programming with Objects and Effects
  Dynamic Semantics
  Type System
  Discussion & Related Work
  Conclusion
Conclusion
  Recapitulation of Research Questions
  Discussion
  Conclusion
Bibliography
Summary
Samenvatting
Acknowledgments

1 Introduction

Implementing programming languages is a non-trivial endeavor. A typical language processing pipeline consists of several, possibly interdependent, components. Thus, if an existing programming language is extended with a new construct, its tooling has to be extended accordingly. A case in point is adding an async/await mechanism to a Java-like language: all the processors of this language need to be extended to handle the new constructs. It is therefore desirable to reuse the implementation of existing processors, extending them in a modular fashion. However, this kind of language extension poses a number of challenges. For instance, a newly introduced feature can demand different context information than the one assumed by the existing processors, or it can interact with the existing features in intricate ways. These scenarios show how complex it is to achieve true modular language extensibility. In this thesis we develop techniques to reduce this complexity, contributing to making the implementation of modular languages a reality.

The results of this dissertation contribute to the idea of language-oriented programming [Dmi; FFF+; War], a vision that puts programming languages at the center of the software development practice.
Instead of using one general-purpose programming language (GPL), the idea is to represent the diversity of domains that crosscut traditional software development using specialized Domain-Specific Languages (DSLs) [Fow; vDKV]. This allows programmers to concentrate on the elements of the domain and their interactions instead of the details of their representation.

In language-oriented programming it is essential to have tools and practices that make the development of new languages less costly. In particular, in this thesis we exploit linguistic reuse [Kri] by means of better modularity. We consider linguistic reuse as the application of the principles of software reuse to the domain of language development. According to Krueger [Kru], “Software Reuse is the process of creating software systems from existing software rather than building software systems from scratch”. A canonical example of reuse is library-based development. In fact, in his seminal paper on software reuse, McIlroy presents the vision of a library of reusable components [McI]; software can thus be developed by assembling such components. A software library is a collection of interrelated functions, classes or modules that cover a particular domain or application concern (e.g., accessing the database) and is intended to be reused as is.

The main motivation of this thesis is to provide new mechanisms to implement libraries of language fragments. A language fragment is the specification of a language that might or might not be usable on its own, but can be composed with others to form increasingly complex languages. In other words, a language fragment is a module that represents the implementation of part of a language. The essential question that follows is how to define modular and extensible language fragments.

One of the keys to the success of object-oriented programming is its support for extensibility via inheritance [Ald].
Statically typed object-oriented languages such as Java or C# support statically safe incremental extensibility, which enables modular development. In this work we pose the question of how to use object-oriented techniques to implement reusable language fragments. The object algebra pattern [OC], encoded with standard object-oriented features, is a technique in that direction, as it enables incremental extensibility along two axes that fit the language development scenario: to develop a language we can both extend its syntax and add new semantic interpretations.

In this thesis we exploit the object algebra pattern to develop modular language components that form libraries of language fragments. We focus our attention on two kinds of libraries. On the one hand, we want to develop extensions of an existing GPL using object algebra-based language fragments, in the spirit of “Growing a Language” [Ste]. On the other hand, we want to assemble several object algebra-based language fragments that can help us to build different languages from scratch, as if these fragments were language LEGO blocks.

The next section provides an overview of the idea of language libraries and component-based language specifications, alongside the background on related techniques for language extensibility and language modularity. The section “Object Algebras” introduces the central technique that we make use of, while the section “Language Libraries with Object Algebras” describes how we have applied object algebras to develop language libraries, introducing our research questions and how they correspond to the chapters in this thesis. Related work that is specific to each research question is also discussed there. In “Origins of the Chapters” we list our contributions in terms of peer-reviewed publications. “Software Artifacts” presents the software artifacts produced in the context of this thesis. Finally, “Dissertation Structure” describes the structure of this dissertation.
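To make the two axes of extensibility mentioned above more concrete, the following is a minimal Java sketch of the object algebra pattern. It is an illustration only, not code taken from this thesis: the names `ExpAlg`, `MulAlg`, `Eval`, `Print`, `EvalMul`, `PrintMul` and `ObjectAlgebraDemo` are hypothetical. The syntax of a tiny expression language is a generic factory interface, each semantic interpretation is a class implementing that interface, and a syntax extension is a sub-interface whose interpretations reuse the existing ones via inheritance.

```java
// Syntax of a tiny expression language as a generic factory interface.
// (Hypothetical names throughout; a sketch of the pattern, not the thesis code.)
interface ExpAlg<E> {
    E lit(int n);
    E add(E l, E r);
}

// First interpretation: evaluation to an integer.
class Eval implements ExpAlg<Integer> {
    public Integer lit(int n) { return n; }
    public Integer add(Integer l, Integer r) { return l + r; }
}

// Axis 1, a new interpretation: pretty printing, added without touching Eval.
class Print implements ExpAlg<String> {
    public String lit(int n) { return Integer.toString(n); }
    public String add(String l, String r) { return "(" + l + " + " + r + ")"; }
}

// Axis 2, a new syntactic construct: multiplication extends the base syntax.
interface MulAlg<E> extends ExpAlg<E> {
    E mul(E l, E r);
}

// Existing interpretations are extended by inheritance; only the new case is written.
class EvalMul extends Eval implements MulAlg<Integer> {
    public Integer mul(Integer l, Integer r) { return l * r; }
}

class PrintMul extends Print implements MulAlg<String> {
    public String mul(String l, String r) { return "(" + l + " * " + r + ")"; }
}

public class ObjectAlgebraDemo {
    // A term is a generic method over the algebra; this one builds 1 + (2 * 3).
    static <E> E term(MulAlg<E> alg) {
        return alg.add(alg.lit(1), alg.mul(alg.lit(2), alg.lit(3)));
    }

    public static void main(String[] args) {
        System.out.println(term(new EvalMul()));   // prints 7
        System.out.println(term(new PrintMul()));  // prints (1 + (2 * 3))
    }
}
```

In this sketch neither `Eval` nor `Print` is modified when multiplication is added, and the same term is interpreted in two ways simply by passing a different algebra. Both extensions are checked by the Java type system, which is the kind of statically safe incremental extensibility referred to above.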
Language Libraries

The idea of component-based language development has given rise to work in different communities. In [TSC+], for example, the authors coin the concept of “Languages as Libraries” and illustrate it with the Racket language. Racket programmers can write language modules that reuse the base-level meta-infrastructure by having access to a well-defined interface to the static semantics. These modules are themselves written in Racket, supporting the argument that writing a language is equivalent to writing a library in the base language, instead of resorting to more complex approaches such as extensible compilers [NCM]. Another embodiment of the idea of language libraries is SugarJ [ERK+], a Java-like language in which libraries are just like regular Java libraries together with syntactic sugar definitions to cater for extensible syntax.

On the more formal side, Bergstra et al. propose an axiomatic algebraic calculus of modules to dissect the essence of composition [BHK]. More pragmatically, Heering and Klint consider a library of reusable semantic components as an essential element of Language Design Assistants [HK]. In [CMS+; CMT] Churchill et al. propose fundamental constructs (funcons), a fixed set of reusable components of semantic specifications defined using I-MSOS [MN], an improved, modular variant of Structural Operational Semantics. Using funcons as basic building blocks, one can specify arbitrary compositions that provide the specification of complex general-purpose languages.

In this dissertation, a language library is a library of language fragments written in an object-oriented host language. The basic mechanism for implementing the idea of language libraries is language composition. It is therefore necessary to establish some basic terminology in this regard. In algebraic terms, if we have two modular definitions for language