
Reflective Embedding of Domain-Specific Languages

Dissertation approved by the Department of Computer Science of the Technische Universität Darmstadt

in fulfillment of the requirements for the academic degree of

Doktor-Ingenieur (Dr.-Ing.)

submitted by

Dipl.-Inform. Tom Dinkelaker

born in Frankfurt am Main, Germany

Examiner: Prof. Dr.-Ing. Mira Mezini
Co-examiner: Prof. Dr.-Ing. Robert Hirschfeld
Date of submission: 26.11.2010
Date of oral examination: 10.12.2010

Year of publication: 2011

Darmstadt D17

Abstract

In recent years, there has been increasing interest in new programming languages that are specialized for certain problem domains – so-called domain-specific languages [MHS05]. A domain-specific language (DSL) provides a syntax and semantics for a particular class of problems. When writing code in a DSL, its concrete syntax enables higher end-user productivity [MHS05, KLP+08] because the syntax is closer to the problem domain than the syntax of a general-purpose language (GPL). However, implementing a new language is expensive, because language developers need to design and develop a new syntax and semantics. Because of these high initial costs, stakeholders desist from implementing a new DSL from scratch.

To lower the initial costs of developing a DSL, there is the idea of embedding a language into an existing language [Lan66, Hud96, Hud98, Kam98]. The existing language—often a general-purpose language—acts as a host language for the guest language that is embedded into the host as a library. Such an embedded language reuses the general-purpose features of the host language. Consequently, this reduces the development costs of the embedded language by reusing these features and by reusing the existing host language infrastructure [Hud98, Kam98, KLP+08], i.e., its development tools, parser, compiler, and/or virtual machine. Furthermore, new language features can be incrementally added to the embedded language by “simply” extending its library [Hud96]. Indeed, a comparative study [KLP+08] shows that embedding requires a significantly lower investment to implement a new language compared to traditional non-embedded approaches.

However, existing language embedding approaches have several important limitations with respect to the features that are usually supported by non-embedded approaches. First and foremost, most embedding approaches and host languages lack support for designing a concrete syntax for an embedded language.
Second, current approaches do not allow embedding special language semantics that the embedded language needs but that conflict with the host semantics. Third, there is no support for embedding multiple languages that have interactions in their syntax and semantics. The lack of support for these features in current embedding approaches mostly stems from their limited ability to adapt constructs in embedded languages and host languages.

To address these limitations, this thesis proposes the reflective embedding architecture, which embeds languages into reflective host languages. A reflective language is a special programming language with support for reflection, which enables programs to reason about their own structure and behavior. Specifically, there is support for introspection, which allows analyzing programs, and support for intercession, which allows transforming programs. Using reflection helps overcome the current limitations of language embedding, since it enables evolution of embedded languages by analyzing and transforming their libraries.

The reflective embedding architecture itself defines a process for embedding a language’s syntax and semantics as a set of artifacts. Every language is an encapsulated component that has a set of well-defined interfaces. The classes of the embedding implement these interfaces to define the details of the execution semantics for that language component. In addition, the architecture enables embedding of language compositions, scoping strategies, and domain-specific analyses and transformations. What is special about the architecture is that every language component implicitly has a meta level. When necessary, language developers can use this meta level to evolve embeddings, which yields a number of benefits.

First, in contrast to other embedding approaches, the reflective embedding architecture enables support for concrete syntax for embedded languages without high initial syntax costs. For an embedded language, developers can annotate its library with meta data for rewriting DSL programs in concrete syntax into a corresponding executable form. By using a special technique for agile grammar specification, called island grammars [Moo01], developers only need to specify those parts of the embedded language’s grammar that deviate from the grammar of the host language. Island grammars help to significantly reduce the initial syntax costs, while supporting the full class of context-free grammars.

Second, the meta level enables support for composition of languages with interactions in their syntax and semantics. On the one hand, the architecture provides special language combiners that analyze and compose the syntax of all embedded languages in a composition. On the other hand, there are language combiners that transform embedded languages so that they can interact at well-defined points in their evaluation logic.
By allowing interacting compositions, the architecture enables embedding special language abstractions that help end-user programmers implement more modular programs.

Third, the meta level enables embedding languages that interact with their host languages. The meta level uses reflective features to interact with the evaluation of host language constructs in order to change the host’s semantics. Only when the host semantics can be changed can developers embed sophisticated DSLs.

To review the reflective embedding architecture, this thesis conducts a qualitative and quantitative evaluation of the proposed concepts. First, to evaluate the concepts qualitatively, the evaluation reviews the reflective embedding architecture’s support for a set of desirable properties and compares it to related work. Second, to evaluate the concepts quantitatively, the evaluation compares the architecture to embedded and non-embedded approaches using benchmarks. The results show that the architecture’s flexibility comes with only moderately increased runtime costs.

Zusammenfassung (Summary)

In recent years, there has been increasing interest in new programming languages that are specialized for particular problem domains, so-called domain-specific languages (DSLs) [MHS05]. DSLs provide a special syntax and semantics for solving a particular class of problems. When programs are developed in a DSL, its concrete syntax enables higher end-user productivity [MHS05, KLP+08], since its syntax is closer to the problem domain than the syntax of a general-purpose language (GPL). However, implementing a new programming language is costly, since language developers must design and implement a syntax and semantics. Because of these high initial costs, many refrain from implementing and deploying a new DSL.

To minimize the initial costs of DSLs, there is the approach of embedding a DSL into an existing language [Lan66, Hud96, Hud98, Kam98]. Here, the syntax and semantics of the embedded language are described as a library in this language. The existing language acts as a host language for the DSL, which is embedded into the host as a guest language. Through embedding, guest languages can reuse the general-purpose constructs of the host language, which saves development costs [Hud98, Kam98, KLP+08]. In addition, embedding enables reuse of the existing language infrastructure, such as the development tools, the parser, the compiler, and the virtual runtime environment. New language constructs can be added on demand by extending the corresponding libraries [Hud96]. A comparative study [KLP+08] shows that embedding can significantly reduce the development costs of implementing new languages compared to traditional non-embedded language development approaches.
Despite these advantages, existing language embedding approaches have important limitations regarding the support of language features that are usually supported by non-embedded approaches. First and foremost, in most host languages used by existing embedding approaches, there is no support for freely designing the concrete syntax of the embedded guest language [KLP+08]. When embedding the guest language into the host language, the expressions of the guest language must conform to the syntax of the host language. Second, the language semantics a guest language can have is restricted by the predefined semantics of the host language. Third, there is no support for special compositions of multiple DSLs that have interactions in their syntax and semantics. These limitations are primarily due to the limited adaptability of constructs in embedded languages and host languages.

To address these problems, this thesis proposes the Reflective Embedding Architecture, which enables embedding guest languages into reflective host languages. Reflective languages are programming languages with special constructs for changing the structure and behavior of programs during their evaluation. Using a reflective language makes it possible to overcome the above limitations, since embedded languages can be adapted by analyzing and transforming their language libraries.

The Reflective Embedding Architecture defines a new process for embedding DSLs and describes how the syntax and semantics of a new DSL can be embedded as a set of artifacts. Every language is an encapsulated component that defines a set of well-defined interfaces. The classes of the embedding implement these interfaces to define the details of the execution semantics of the language component. Furthermore, the architecture supports embedding language compositions, scoping strategies, and domain-specific analyses and transformations. What is special about the architecture is that every language automatically has a meta level, which developers can use to adapt the embedding. Adaptations at the meta level have the following benefits:

First, the Reflective Embedding Architecture enables support for concrete syntax for embedded languages. For every embedded language, developers can annotate the corresponding library to provide meta data for rewriting DSL programs in concrete syntax into a corresponding executable form.
By using a special form of agile grammars, called island grammars [Moo01], only those parts of the embedded language’s grammar need to be specified that actually deviate from the grammar of the host language. Island grammars help to significantly reduce the initial costs compared to other embedding approaches, while full support for context-free languages is preserved.

Second, the meta level enables support for language compositions whose sub-languages have interactions in their syntax and semantics. On the one hand, the meta level makes it possible to analyze the syntax of several sub-languages and to compose them into a combined language syntax. On the other hand, it is possible to adapt the embedded sub-languages so that they can interact with each other at well-defined points within their evaluation logic. By supporting language compositions with interactions, the architecture enables embedding special language abstractions that make it easier for end users to develop modular programs.

Third, the meta level enables embedding languages that interact with their host language by using reflective mechanisms to influence the evaluation of the host’s language constructs. Only when the evaluation of host language constructs can be changed can DSLs with special modularization concepts be embedded.

To evaluate the Reflective Embedding Architecture, the concepts proposed in this thesis are reviewed qualitatively and quantitatively. On the one hand, to assess the concepts qualitatively, the support of the Reflective Embedding Architecture for desirable language properties is reviewed and compared with related work. On the other hand, a language implementation based on the Reflective Embedding Architecture is compared quantitatively with implementations of the same language that are based on embedded and non-embedded approaches from related work. The comparison shows that the Reflective Embedding Architecture is the first to enable comprehensive support for the desirable properties of embedded languages, and that this comes with only moderately increased runtime costs.


Acknowledgements

This dissertation would not have been possible without the help of many people, to whom I would like to dedicate a few lines.

First and foremost, I would like to thank my advisor Mira Mezini for always having an open mind for visions and new ideas, asking the right questions, giving feedback, and for patient discussions. I would also like to thank my co-advisor Robert Hirschfeld for valuable feedback and comments.

This work would not have been possible without the support of my colleagues at the Software Technology Group of the Technische Universität Darmstadt, who continuously gave me feedback on my papers and presentations. I would like to thank Gudrun Harris for supporting me and for always having an honest opinion. I would like to thank my colleagues Martin Monperrus, Michael Eichberg, Eric Bodden, Ralf Mitschke, Andreas Sewe, Vaidas Gasiunas, Roman Knöll, Marcel Bruch, Jan Sincheck, and Lukas Satabin. Anis Charfi, Christoph Bockisch, Slim Kallel, Vasian Cepa, Ivica Aracic, Sven Kloppenburg, Shadi Rifai, Thorsten Schäfer, Tobias Schuh, Michael Krebs, Klaus Ostermann, Michael Achenbach, and Christian Hofer are former colleagues, whom I would like to thank for the good team work.

There are two people who particularly helped me improve my skills. Yuecel Karabulut was my supervisor when I was an intern at the SAP Research Center in Palo Alto, CA, USA. My internship was very helpful for gaining insight into the special requirements of business applications, which helped motivate my subsequent work, and for which I am very thankful. Michael Haupt supervised my Diploma thesis, and even after I had completed my Diploma, he remained a mentor of mine.

Special thanks to the people around the domain-specific aspect languages (DSAL) workshop, Johan Fabry, Eric Tanter, and Jacques Noye. I would like to thank them for inviting me into the organization committee of the DSAL workshop. Without their continuous feedback, this work would not have been possible.
The AOSD-Europe Network of Excellence (EU grant no. FP6-2003-IST-2-004349) and the feasiPLe project (Federal Ministry of Education and Research, BMBF, Germany) have partially funded this work. Special thanks to Mohammed Erradi, Sebastian Oster, Karin Fetzer, Christian Wende, Henrik Lochmann, and Alisdair Johnstone, with whom I have worked in various projects.

Thanks to all students who have contributed to the implementation of the open source projects POPART and TIGERSEYE, the DSL benchmark contest, or other projects, namely Kamil Erhard, Joscha Drechsler, Oliver Rehor, Jan Stolzenburg, Olga Gusyeva, Yassine Essadraoui, Alexander Look, Philipp Zühlke, Fuest Christian, Siyu Yang, Leonid Melnyk, Eugen Fanshil, David Marx, Thorsten Peter, Kim David Hagedorn, Christopher Mann, Florian Jakob, and Hasan Ibne Akram.

In particular, I would like to thank all the people who supported me while writing this text, namely Martin Monperrus, Michael Eichberg, Eric Bodden, Benjamin Schmeling, Vaidas Gasiunas, Sebastian Oster, and Michael Haupt. I would like to thank my brother, Peter Dinkelaker, and his wife, Stephanie Dinkelaker, who convinced me to pursue a doctorate after I had finished my German Diploma. I would also like to thank my best friend Joris Hensen for his valuable feedback on this text and just for being there when I needed a helping hand.

This work is dedicated to my parents, Edith and Albrecht Dinkelaker, and my wife, Anja Kirschning, who gave me all the support, energy, and confidence I needed to complete this work.

Contents

1 Introduction
  1.1 Context of this Thesis
  1.2 Problem Statement
  1.3 Thesis
    1.3.1 State-of-the-Art in Language Embedding Approaches
    1.3.2 Core Architecture for Reflective Embedding
    1.3.3 Extensions to the Core Architecture
    1.3.4 Evaluation
  1.4 Summary of Contributions
  1.5 Organization of the Thesis

I State of the Art

2 State of the Art in Language Implementation Approaches
  2.1 Domain-Specific Languages
  2.2 Embedded Domain-Specific Languages
    2.2.1 Open Issues in Existing Embedding Approaches
      2.2.1.1 Lack of Support for Concrete Syntax
      2.2.1.2 Lack of Support for Crosscutting Composition for DSLs
      2.2.1.3 Lack of Support for Fine-Grained Composition
  2.3 Architectures for Language Implementation
    2.3.1 Front-End/Back-End Architectures
    2.3.2 Extensible Compilers
    2.3.3 Meta-Level Architectures
    2.3.4 Reflective Architectures
    2.3.5 Meta-Object Protocols

3 Desirable Properties for Language Embeddings
  3.1 Extensibility
    3.1.1 Adding New Keywords
    3.1.2 Semantic Extensions
      3.1.2.1 Conservative Semantic Extensions
      3.1.2.2 Semantic Adaptations


  3.2 Composability of Languages
    3.2.1 Composing Languages without Interactions
    3.2.2 Composing Languages with Interactions
      3.2.2.1 Syntactic Interactions
      3.2.2.2 Semantic Interactions
  3.3 Enabling Open Composition Mechanisms
    3.3.1 Open Mechanisms for Handling Syntactic Interactions
      3.3.1.1 Generic Mechanism for Conflict-Free Compositions
      3.3.1.2 Supporting Keyword Renaming
      3.3.1.3 Supporting Priority-Based Conflict Resolution
    3.3.2 Open Mechanisms for Handling Semantic Interactions
      3.3.2.1 Generic Mechanism for Crosscutting Composition of DSLs
      3.3.2.2 Supporting Composition Conflict Resolution
  3.4 Support for Concrete Syntax
    3.4.1 Converting Concrete to Abstract Syntax
    3.4.2 Supporting Prefix, Infix, Suffix, and Mixfix Operators
    3.4.3 Supporting Overriding Host Language Keywords
    3.4.4 Supporting Partial Definition of Concrete Syntax
  3.5 Enabling Pluggable Scoping
    3.5.1 Supporting Dynamic Scoping
    3.5.2 Supporting Implicit References
    3.5.3 Supporting Activation of Language Constructs
  3.6 Enabling Pluggable Analyses
    3.6.1 Syntactic Analyses
    3.6.2 Semantic Analyses
  3.7 Enabling Pluggable Transformations
    3.7.1 Static Transformations
      3.7.1.1 Syntactic Transformations
      3.7.1.2 Semantic Transformations
    3.7.2 Dynamic Transformations
  3.8 Review of the Support for the Desirable Properties in Related Work
  3.9 Summary

II Core Architecture for Reflective Embedding

4 The Reflective Embedding Approach
  4.1 The Reflective Embedding Architecture
    4.1.1 Levels of the Architecture
      4.1.1.1 Language Level
      4.1.1.2 Language Façade Level


      4.1.1.3 Language Model Level
      4.1.1.4 Language Meta Level
    4.1.2 Program Evaluation
  4.2 Key Mechanisms for Language Evolution
    4.2.1 Language Polymorphism
      4.2.1.1 Hierarchical Extensions
      4.2.1.2 Abstracting over Semantics
      4.2.1.3 Program Evaluation under Alternative Semantics
    4.2.2 Meta-Level Extensions: Semantic Adaptations using the Meta Level
      4.2.2.1 Semantic Adaptation Scopes
      4.2.2.2 Extension Points of the Architecture
      4.2.2.3 Program Evaluation under Adapted Semantics
    4.2.3 Extensible Language Combiners
      4.2.3.1 Composing Independent Languages
      4.2.3.2 Composing Dependent Languages
      4.2.3.3 Composing Languages with Custom Combiners
  4.3 Summary

5 Implementation in Groovy
  5.1 Groovy in a Nutshell
  5.2 Reflective Embedding in Groovy
    5.2.1 Language Interface Level
    5.2.2 Language Façade Level
    5.2.3 Language Model Level
    5.2.4 Language Meta Level
  5.3 Program Evaluation in Groovy
  5.4 Key Mechanisms in Groovy
    5.4.1 Language Polymorphism
    5.4.2 Meta-Level Extensions
    5.4.3 Extensible Language Combiners
      5.4.3.1 Composing Independent Languages
      5.4.3.2 Composing Dependent Languages
  5.5 Summary

III Extensions to the Core Architecture

6 Reflective Domain-Specific Languages
  6.1 Late Variability in Domain-Specific Language Semantics
    6.1.1 The FsmDSL Running Example
    6.1.2 Semantic Variability in FsmDSL


  6.2 Implementing the Primary Interface of the FsmDSL
  6.3 Growing a DSL into a Reflective DSL
    6.3.1 Reflective DSLs
    6.3.2 Implementation of Reflective DSLs
      6.3.2.1 Structural Reflection
      6.3.2.2 Behavioral Reflection
    6.3.3 On the Design Properties of Reflective DSLs
    6.3.4 Evolving from Procedural to Declarative Reflection
  6.4 Case Study: Using Semantic Adaptation Scopes to Adapt an Open DSL Implementation
    6.4.1 Possible Semantic Adaptations for the FsmDSL
    6.4.2 Dimension: Locality of Adaptations
    6.4.3 Dimension: Atomicity of Adaptations
    6.4.4 Dimension: Time of Adaptations
  6.5 Summary

7 Crosscutting Composition of Embedded Domain-Specific Languages
  7.1 Running Example: ProcessDSL—A Workflow Language
    7.1.1 Example Scenario for Crosscutting in DSLs
  7.2 Embedding ProcessDSL
  7.3 Crosscutting Composition
  7.4 Homogeneous Embedding of Aspect-Oriented Concepts
    7.4.1 Crosscutting Composition at Work
  7.5 Summary

8 Incremental Concrete Syntax for Embedded Languages
  8.1 Concrete Syntax for Homogeneous Embedded DSLs
    8.1.1 Defining a Homogeneous EDSL
    8.1.2 Enabling Concrete Syntax for EDSLs
    8.1.3 Compiling DSL Programs
    8.1.4 Enabling a New Host Language
  8.2 Implementation of the Generic Pre-Processor for Incremental Concrete Syntax
  8.3 Summary

IV Evaluation

9 Related Work
  9.1 Homogeneous Embedding
    9.1.1 Functional Languages
    9.1.2 Dynamic Languages


    9.1.3 Staging/Meta Programming Languages
    9.1.4 Typed Object-Oriented Languages
  9.2 Heterogeneous Embedding
    9.2.1 Embedded Compilers
    9.2.2 Source Transformation Languages
  9.3 Non-Embedding Approaches
    9.3.1 Extensibility
    9.3.2 Composability of Languages
    9.3.3 Enabling Open Composition Mechanisms
    9.3.4 Support for Concrete Syntax
    9.3.5 Enabling Pluggable Scoping
    9.3.6 Enabling Pluggable Analyses
    9.3.7 Enabling Pluggable Transformations

10 Case Studies: Review of the Support for the Desirable Properties
  10.1 Extensibility
    10.1.1 Adding New Keywords
    10.1.2 Semantic Extensions
      10.1.2.1 Conservative Semantic Extensions
      10.1.2.2 Semantic Adaptations
  10.2 Composability of Languages
    10.2.1 Composing Languages without Interactions
    10.2.2 Composing Languages with Interactions
      10.2.2.1 Syntactic Interactions
      10.2.2.2 Semantic Interactions
  10.3 Enabling Open Composition Mechanisms
    10.3.1 Open Mechanisms for Handling Syntactic Interactions
      10.3.1.1 Generic Mechanism for Conflict-Free Compositions
      10.3.1.2 Supporting Keyword Renaming
      10.3.1.3 Supporting Priority-Based Conflict Resolution
    10.3.2 Open Mechanisms for Handling Semantic Interactions
      10.3.2.1 Specific Mechanism for Controlling Semantic Interactions
      10.3.2.2 Generic Mechanism for Crosscutting Composition of DSLs
      10.3.2.3 Supporting Composition Conflict Resolution
  10.4 Support for Concrete Syntax
    10.4.1 Converting Concrete to Abstract Syntax
    10.4.2 Supporting Prefix, Infix, Suffix, and Mixfix Operators
    10.4.3 Supporting Overriding Host Language Keywords
    10.4.4 Supporting Partial Definition of Concrete Syntax
  10.5 Enabling Pluggable Scoping
    10.5.1 Supporting Dynamic Scoping


    10.5.2 Supporting Implicit References
    10.5.3 Supporting Activation of Language Constructs
  10.6 Enabling Pluggable Analyses
    10.6.1 Syntactic Analyses
    10.6.2 Semantic Analyses
  10.7 Enabling Pluggable Transformations
    10.7.1 Support for Static Transformations
      10.7.1.1 Syntactic Transformations
      10.7.1.2 Semantic Transformations
    10.7.2 Support for Dynamic Transformations
  10.8 Review Result
  10.9 Summary

11 Costs of Reflective Embedding
  11.1 Implementation Costs of Reflective Embedding
    11.1.1 Implementation Costs of Domain-Specific Languages
      11.1.1.1 Domain-Specific Syntax
      11.1.1.2 Domain-Specific Semantics
    11.1.2 Implementation Costs of Domain-Specific Aspect Languages
  11.2 Compilation Costs of Reflective Embedding
  11.3 Runtime Costs of Reflective Embedding
    11.3.1 Runtime Costs of Domain-Specific Languages
    11.3.2 Runtime Costs of Aspect Languages

V Conclusion

12 Conclusion
  12.1 Summary of Results
  12.2 Concluding Discussion
  12.3 Future Work

List of Figures

2.1 Example DSLs in the design space of DSLs
2.2 The focus of this thesis in the DSL design space
2.3 Concrete versus compromised syntax of an SQL query

3.1 An example SIMPLELOGO program
3.2 A FUNCTIONALLOGO program defining and using a function

4.1 The levels of the reflective embedding architecture
4.2 Levels of the architecture for SIMPLELOGO
4.3 The architecture for SIMPLELOGO with alternative semantics
4.4 Evaluation of the SIMPLELOGO program from Listing 4.1
4.5 A language extension to SIMPLELOGO
4.6 Two different semantics for EXTENDEDLOGO
4.7 The meta level for language implementations
4.8 Dimension 1 – locality of adaptations
4.9 Dimension 2 – atomicity of adaptations
4.10 Dimension 3 – time of adaptations
4.11 Overview of the language combiners
4.12 Composing two languages with the BlackBoxCombiner
4.13 Keyword dispatch with the BlackBoxCombiner
4.14 Architecture of the meta-aspect protocol

5.1 The meta-object protocol of Groovy
5.2 The language model of SIMPLELOGO

6.1 Language model of the state machine DSL
6.2 Reflective DSL architecture
6.3 Continuum of procedural and declarative reflection

7.1 Example program with tangled security concern
7.2 Crosscutting composition of EDSLs
7.3 Setup of the CCCombiner
7.4 Evaluation of the ca closure and aspect registration
7.5 Evaluation of the ecp closure


7.6 Dynamic crosscutting composition of EDSL semantics

8.1 Overview of the compilation process

10.1 Lexical scoping vs. dynamic scoping
10.2 Illustration of the position tracker analysis
10.3 Comparison of sequential and parallel execution of Listing 10.38
10.4 Parallelization through delegating function calls to delegate turtles

11.1 Benchmark results for DSL program compilation time
11.2 Benchmark results for DSL program execution time

List of Tables

3.1 Review of supported properties by related embedding approaches

10.1 Number of grammar rules that need to be specified for the concrete syntax
10.2 Number of grammar rules that need to be specified to enable the integration with the host language
10.3 Comparison with related embedding approaches

11.1 Comparison of language embeddings’ support for concrete syntax
11.2 Comparison of FSMDSL implementation costs
11.3 Comparison of the support for the Cool DSAL features
11.4 Comparison of DSAL implementation costs


List of Listings

2.1 A class and two DSAL aspects that have a conflict (from [HBA08])

3.1 A program using the shortcut commands of the CONCISELOGO dialect
3.2 A program using the repeat abstraction from the UCBLOGO dialect
3.3 A logo program with four turtles
3.4 A FUNCTIONAL program defining a function
3.5 A program that uses the ambiguous keyword define
3.6 An excerpt of a COMPLETELOGO program that uses the shortcuts of CONCISELOGO and the repeat-loop construct of UCBLOGO
3.7 A program with a tracing concern
3.8 An excerpt of a program trace
3.9 A program with a tracing aspect
3.10 The abstract syntax representation of the program from Figure 3.1a
3.11 Using a dynamically-scoped variable
3.12 A FUNCTIONALLOGO program using the special name thisTurtle
3.13 A program with a tracing aspect
3.14 A program with a semantic error
3.15 A program using a static transformation for function applications

4.1 The abstract syntax of the program from Figure 3.1a (cf. Section 3.4.1)
4.2 A program using the EXTENDEDLOGO language extension

5.1 Language interface of SIMPLELOGO
5.2 Excerpt of the class
5.3 Language façade of SIMPLELOGO
5.4 Associating a DSL program to its interpreter
5.5 Implementation of the EXTENDEDLOGO dialect
5.6 Using the extended set of commands from EXTENDEDLOGO
5.7 Excerpt of a specialized meta-object class that prevents collisions
5.8 Semantic adaptation of SIMPLELOGO with the Groovy MOP
5.9 The BlackBoxCombiner class
5.10 Excerpt of the implementation of FUNCTIONAL
5.11 Evaluating the FUNCTIONALLOGO example from Figure 5.10 with a composed interpreter


5.12 The LinearizingCombiner class

6.1 Adaptive business process ...... 139
6.2 State machine embedded in Groovy ...... 141
6.3 Reflective extension of the default interpreter for structural reflection ...... 144
6.4 A program using structural reflection ...... 145
6.5 A program using behavioral reflection ...... 147
6.6 Two implementations of handleEvent ...... 151
6.7 Two implementations of transitionSelection ...... 152
6.8 Domain type semantic adaptation ...... 152
6.9 Domain object semantic adaptation ...... 153
6.10 Method-level adaptation ...... 154
6.11 Semantic module adaptation ...... 154
6.12 Pre-execution adaptation ...... 155
6.13 Context-dependent adaptation ...... 156

7.1 An example PROCESSDSL program ...... 158
7.2 Implementation of PROCESSDSL ...... 161
7.3 Using crosscutting composition ...... 162
7.4 Excerpt of ProcessPointcutInterpreter ...... 164
7.5 The CCCombiner class ...... 165
7.6 Implementation of setMetaClass ...... 165

8.1 An embedded SQL program in Groovy ...... 173
8.2 The ideal grammar of TINYSQL ...... 174
8.3 The language interface of TINYSQL with compromised syntax ...... 174
8.4 The implementation of TINYSQL (semantics) ...... 175
8.5 The language interface of TINYSQL with concrete syntax ...... 175
8.6 An embedded SQL program in Groovy ...... 176
8.7 The interface of TINYSQL with quotation ...... 176
8.8 Island grammar for Groovy ...... 179

10.1 Implementation of shortcut commands in CONCISELOGO ...... 212
10.2 Implementation of the repeat command in UCBLOGO ...... 213
10.3 Implementation of concurrent drawing for multiple turtles in CONCURRENTLOGO ...... 215
10.4 Excerpt of the meta-level extension for collision detection ...... 216
10.5 A program excerpt that tries to compose FUNCTIONAL and STRUCTURAL with the BlackBoxCombiner ...... 219
10.6 The implementation of the CONSOLE façade that depends on FUNCTIONAL and STRUCTURAL ...... 221
10.7 Two programs using the LinearizingCombiner for a dependent composition ...... 222
10.8 A program excerpt with a context-specific syntax conflict ...... 223


10.9 A custom combiner that filters the Logo keywords inside function bodies ...... 224
10.10 A custom combiner that renames the conflicting keywords of FUNCTIONAL and STRUCTURAL ...... 225
10.11 Program using the renamed keywords of FUNCTIONAL and STRUCTURAL ...... 226
10.12 Implementation of composing the independent extensions to execute a COMPLETELOGO program ...... 227
10.13 Excerpt of a custom combiner that enforces identifiers to be unique ...... 228
10.14 Excerpt of the pointcut language implementation of AO4LOGO ...... 230
10.15 Excerpt of the advice language implementation of AO4LOGO ...... 230
10.16 Using the CCCombiner to compose the language components of AO4LOGO ...... 231
10.17 An entity bean with bean managed persistence ...... 232
10.18 An AO4SQL aspect that changes an entity bean and its TINYSQL queries ...... 233
10.19 Handling composition conflicts by defining a static order on DSALs ...... 236
10.20 An incremental extension to FUNCTIONAL for concrete syntax ...... 239
10.21 Overriding the host language keyword if ...... 240
10.22 Overriding the curly brackets delimiter of Groovy ...... 240
10.23 Implementation of pluggable scoping EnvironmentInterpreter ...... 243
10.24 Example from Listing 3.11 defining a dynamically-scoped variable using let and get ...... 243
10.25 Excerpt of the implementation of EnvironmentInterpreter ...... 246
10.26 A Logo program using the implicit reference thisTurtle ...... 246
10.27 An AO4SQL aspect that changes an entity bean and its TINYSQL queries ...... 247
10.28 Implementation of expression counting for COMPLETELOGO ...... 249
10.29 Implementation of expression counting for FUNCTIONAL ...... 249
10.30 A custom combiner for combining the expression counting analyses ...... 250
10.31 Implementation of coding convention analyzer for COMPLETELOGO ...... 251
10.32 Implementation of coding convention checking analyzer for FUNCTIONAL ...... 252
10.33 Applying the coding convention analysis ...... 253
10.34 Semantic analysis for COMPLETELOGO (abstract semantics) ...... 256
10.35 Applying the semantic analysis for Listing 3.14 ...... 256
10.36 Completed implementation of FUNCTIONAL from Listing 5.10 ...... 259
10.37 Excerpt of the OPTIMIZINGLOGO transformer façade ...... 262
10.38 Using OptimizingLogo to optimize a program with only one turtle ...... 263
10.39 Excerpt of the AdaptiveOptimizingLogo transformer façade ...... 265


1. Introduction

To cope with the increasing complexity of software systems, programming paradigms and various language features have been proposed, e.g., object-oriented languages that encapsulate data and procedures into classes. While these languages are the most frequently used in software development today, there are several situations in which they fail to implement software modularly. To address modularity problems in object-oriented software, academic research proposes new paradigms with special language concepts that extend traditional object-oriented concepts. These new concepts help developers to implement more modular object-oriented software.

Object-oriented programming languages are also used for implementing new programming languages in the form of a language infrastructure. Usually, a language infrastructure consists of several components, namely a parser, an abstract syntax tree (AST), analyses on this AST, e.g., for verification, and transformations on this AST, e.g., for compilation. In general, there are large initial costs associated with implementing the infrastructure. When languages evolve and new language features need to be added, it is important to reuse and refine parts of an existing language infrastructure. Such reuse is only possible with an adequate technology that helps to extend existing infrastructures.

1.1. Context of this Thesis

Most modern programming languages follow the paradigms of object-oriented programming with classes, single inheritance, and polymorphic calls. To handle the increasing complexity, research advocates using extensions to single inheritance: multiple inheritance [BI82, Mey88, MMP89], traits [SDNB03], and mixins [BC90] all allow composing modular classes that encode diverse facets of software. Meta-classes [Coi87], computational reflection [Smi82, Mae87], and meta-object protocols [KRB91] propose to isolate concerns as separate entities at the meta-level.
These extensions have in common that they strive for better modularity in OO software, which helps to manage its complexity. Still, even with those extensions, software suffers from modularity problems. One of the most prevalent problems with these paradigms is that there are concerns that are hard to localize because they crosscut several modules, which is why Kiczales et al. call them crosscutting concerns [KLM+97]. The code of crosscutting concerns is scattered across multiple types or classes, which makes it hard to modularize them with hierarchical modularization techniques. Additionally, the code of these concerns is tangled with the code of other concerns and with functional code, which makes it hard to maintain the concerns’ code fragments.

To separate crosscutting concerns, Kiczales et al. suggest using special modules called aspects. Aspects can quantify [FF00] over the execution in other modules and augment these

modules with new logic that implements crosscutting concerns. Aspects improve the modularity of OO software at the implementation level, since they can localize the code of a crosscutting concern in one module, while the above paradigms would require providing code in several modules. Most existing realizations of this aspect-oriented programming (AOP) paradigm extend existing OO languages with aspects; however, aspects are not exclusive to OO programming. To improve the modularity of other programming languages, the language concept of aspects would be interesting for them, too.

To support new paradigms or language concepts in an existing language, developers sometimes have to completely re-implement the language. The problem with this is that implementing a new language is expensive, because developers often lose a significant part of the investments in its ancestor language.

In recent years, there has been an increasing interest in new languages that provide special syntax and semantics for certain problem domains, so-called domain-specific languages (DSLs) [vDKV00, MHS05]. DSLs provide a concrete syntax that is closer to their problem domain than that of a general-purpose language (GPL); thus their syntax allows higher end-user productivity [MHS05, KLP+08]. They provide domain-specific (DS) constraints, which open opportunities for analyses and optimizations [vDKV00, MHS05]. Famous examples of DSLs are BNF, SQL, and HTML. Many DSLs are virtually indispensable tools for implementing software artifacts in a certain problem domain. But such DSL artifacts need to be integrated with other artifacts written in GPLs. Hence, there are increasing requests to compose DSLs with existing GPLs [ME00, BV04, TFH09], e.g., SQLj [ME00] composes SQL [DD89] and Java™ [LY99]. The increasing number of requests creates a challenge for traditional language implementation approaches that have high initial costs [BV04, Cor06, HM03].
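The quantification idea can be illustrated in plain Java with a dynamic proxy that intercepts all method executions of an object and augments them with tracing logic. This is a deliberately minimal approximation of an aspect, not a full AOP implementation; all names are illustrative.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface Account {
    void deposit(int amount);
    int balance();
}

class SimpleAccount implements Account {
    private int balance = 0;
    public void deposit(int amount) { balance += amount; }
    public int balance() { return balance; }
}

public class TracingAspect {
    static final List<String> trace = new ArrayList<>();

    // Quantifies over all method executions of the wrapped object and
    // augments each of them with tracing logic; the crosscutting concern
    // stays localized in this one place instead of being scattered.
    static <T> T withTracing(Class<T> iface, T target) {
        InvocationHandler advice = (proxy, method, args) -> {
            trace.add("before " + method.getName());
            return method.invoke(target, args);
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface}, advice));
    }

    public static void main(String[] args) {
        Account account = withTracing(Account.class, new SimpleAccount());
        account.deposit(10);
        System.out.println(account.balance() + " " + trace);
        // prints: 10 [before deposit, before balance]
    }
}
```

Without such a mechanism, every method of `SimpleAccount` would have to repeat the tracing statement, which is exactly the scattering and tangling described above.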

To manage the increasing requests for new special languages, there are techniques that aim at a smoother transition by allowing a language to grow from one version to the next [Ste99], e.g., by adding new syntax and semantics to an existing language. Those special techniques revolve around establishing modularity, extensibility, composability, and adaptability of language implementations. The research on modularity in language implementations strives for building language infrastructures in a modular way (i.e., modular parsers and compilers) for the sake of a better separation of concerns [Par72]. There are special concepts for extensible language implementation, e.g., attribute grammars [Knu68, EH07b], tree walkers, or extended visitor patterns [Par93, NCM03]. Those concepts enable language developers to extend the artifacts of an existing language infrastructure. For example, to extend a language’s syntax, there is support for extending grammars with grammar inheritance [AMH90, KRV08] and grammar imports [HHKR89, Vis97b, Par08]. Composability of multiple languages has been addressed by special extension and composition mechanisms. For example, there is support for the composition of grammars [Vis97c, For04] and of semantic language components [HM03, Aßm03, Hud96]. Adaptability of languages is addressed by special component-oriented language approaches [EH07b] and by composition of gray-box language components [dB00, DDMW00, Aßm03, EVI05, KL07b, Cle07, GWDD06]. To consolidate the constructs of several languages, there are several approaches that support composing the syntax and semantics of existing languages. Composing the syntax of several languages can lead to conflicts at the lexical level [vdBSVV02]. Composing the semantics of several languages may lead to complicated conflicts at the semantic level [KL05, KL07b, HBA08]. Although there is principled support for composition, in particular when composing semantics, there are various challenges with respect to the completeness, correctness, complexity, and implementation effort when using existing approaches for language composition [BV04, Cor06, Tra08].

An interesting idea that addresses parts of the aforementioned challenges is to embed a language into an existing language [Lan66, Hud96, Hud98, Kam98], which serves as a host for implementing the new language. This idea goes back to Landin [Lan66]. Hudak [Hud96, Hud98] proposes a concrete approach for embedding DSLs into general-purpose host languages by means of libraries written in the host language. Such an embedded language can reuse the general-purpose features of the host language. Consequently, this reduces the development costs of the embedded language by reusing the existing host language infrastructure [Hud98, Kam98, KLP+08], i.e., existing development tools, parsers, compilers, and virtual machines. New language features can be added incrementally by “simply” extending the corresponding libraries [Hud96]. More importantly, embeddings are said to be easier to compose than languages that are implemented with traditional approaches, such as pre-processors [MHS05].
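To make the embedding idea concrete, the following sketch embeds a minimal Logo-like turtle language as a plain Java library. The host language contributes variables, loops, and method dispatch for free; the Turtle API shown here is purely illustrative, not the thesis’s actual implementation.

```java
// A minimal Logo-like turtle language embedded as a plain Java library.
// No parser or compiler is written: the host language's infrastructure
// (syntax, type checker, virtual machine) is reused entirely.
class Turtle {
    double x = 0, y = 0;      // position
    double heading = 0;       // degrees, 0 = east

    Turtle forward(double steps) {
        x += steps * Math.cos(Math.toRadians(heading));
        y += steps * Math.sin(Math.toRadians(heading));
        return this;          // returning 'this' yields a fluent, DSL-like notation
    }

    Turtle right(double degrees) {
        heading -= degrees;
        return this;
    }
}

public class EmbeddedLogo {
    public static void main(String[] args) {
        Turtle t = new Turtle();
        // The host language's 'for' loop is reused as the DSL's repetition construct:
        for (int i = 0; i < 4; i++) {
            t.forward(10).right(90);   // draw a square
        }
        System.out.printf("(%.1f, %.1f)%n", t.x, t.y);  // back at the origin (up to rounding)
    }
}
```

Adding a new language feature, say a `backward` command, amounts to “simply” adding a method to the library, which is exactly the incremental growth that Hudak describes.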
Various embedding approaches propose using different host languages, namely languages that support pure functional programming [Hud98, CKS09, ALY09], dynamic programming [Pes01, TFH09, KG07, KM09, RGN10, AO10], staging and meta-programming [COST04, SCK04, Tra08], strong typing for object-oriented programming [Eva03, Fow05, Gar08, Dub06, HO10], generative programming [Kam98, EFDM03, CM07], and source code transformations [BV04, Cor06]. Using these approaches, there are existing embedded DSL implementations for various domains, such as mathematical calculations [Hud98, COST04, Dub06, HORM08, CKS09, ALY09], query languages [LM00, Cor06, Dub06], image processing [Kam98, EFDM03, SCK04], user interfaces [BV04, TFH09, KG07, Gar08], code generation [BV04, CM07, Tra08], simulations [Pes01, OSV07], and testing [FP06, AO10]. However, so far the embedding approach has been applied only for implementing languages with a rather small set of language constructs. The prospect of a light-weight but powerful language implementation approach has brought forth language embedding as a major area of research in programming language implementation.

1.2. Problem Statement

There are three key problems of current embedding approaches with respect to the syntax and semantics of embedded languages.

First and foremost, there is missing support for concrete syntax in many embedding approaches [MHS05, KLP+08]. This is because the host and the embedded language share the same homogeneous syntax. Although sharing the host syntax saves the initial costs for a special parser, with a shared syntax embedded programs have a compromised encoding. That means a program running on an embedded language has a syntactically compromised representation in comparison to the freely designed, and therefore ideal, encoding obtained when implementing the language with a tailored parser. This is particularly unfortunate since comparative studies [KLP+08] indicate that this missing support for concrete syntax significantly lowers the end user’s productivity in writing DSL code. Consequently, it is not recommended to follow the embedding approach for implementing a language that requires a special syntax [MHS05].

Second, there is missing support for special semantics in embedding approaches. When sharing the host’s semantics, the host semantics confine the semantics of the embedded language. That means end users can only express in an embedded program what is expressible in the host language. For example, when the host language’s composition mechanisms cannot express certain concerns in a modular way because fine-grained composition semantics, such as aspects, are not available, end-user programs that implement such concerns suffer from bad modularity. Consequently, when implementing a language that requires support for special semantics, such as fine-grained composition mechanisms, language developers cannot use a host language that disallows embedding such semantics, which requires using a more powerful host language.

Third, there is missing support for implementing and composing multiple languages that have interactions. There can be two forms of interactions. First, an embedded language can have syntactic and semantic conflicts with the host language. Second, multiple embedded languages can have syntactic and semantic interactions among each other. Without special mechanisms to detect and resolve such interactions, developers cannot implement or compose such embeddings.
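The first problem can be illustrated with a hypothetical query language embedded in Java: the fluent method-call encoding is executable in the host, but it is a syntactic compromise compared to the ideal concrete syntax shown in the comments. All names are illustrative.

```java
import java.util.StringJoiner;

// Hypothetical fluent encoding of a query language in plain Java.
// Ideal concrete syntax:      SELECT name FROM users WHERE age > 18
// Compromised host encoding:  select("name").from("users").where("age > 18")
class Query {
    private String columns, table, condition;

    static Query select(String columns) {
        Query q = new Query();
        q.columns = columns;
        return q;
    }
    Query from(String table) { this.table = table; return this; }
    Query where(String condition) { this.condition = condition; return this; }

    // Renders the internal representation back into the ideal syntax,
    // showing what the program "means" despite its compromised encoding.
    public String toString() {
        StringJoiner s = new StringJoiner(" ");
        s.add("SELECT").add(columns).add("FROM").add(table);
        if (condition != null) s.add("WHERE").add(condition);
        return s.toString();
    }
}

public class CompromisedSyntax {
    public static void main(String[] args) {
        Query q = Query.select("name").from("users").where("age > 18");
        System.out.println(q);  // prints: SELECT name FROM users WHERE age > 18
    }
}
```

The quotes, dots, and parentheses are host-language noise that a tailored parser would not impose on the end user.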
The aforementioned problems in current embedding approaches prevent implementing contemporary programming languages that have complicated language constructs with a special syntax and semantics. To overcome the limitations of current embedding approaches, the goal of this thesis is to propose a new embedding approach that addresses these problems. Special support for solving these open issues could open new ground for new types of languages, and therefore, it would mature the concept of embedding to compete with traditional language implementation approaches.

1.3. Thesis

The research method of this thesis is organized in four steps. First, it focuses on related work and identifies important problems in its body. Second, it proposes a novel architecture for language embedding in which every language has a designated meta-level. Third, it presents extensions to the core architecture that provide special language concepts and generic tools for embeddings. These concepts and tools help language developers to implement new languages and language compositions that have special requirements. Fourth, it evaluates the architecture’s support for addressing the identified problems and compares the resulting solutions to related work. The results of those four steps are summarized in the following subsections.


1.3.1. State-of-the-Art in Language Embedding Approaches

This thesis gives an overview of architectures used in traditional non-embedded approaches for language implementation, such as compilers, interpreters, and pre-processors. It identifies seven desirable properties that traditional non-embedded approaches claim to be important [HM03, Par10], namely (1) extensibility, (2) composability of languages, (3) special means for composition, (4) support for concrete syntax, (5) means for controlling the scoping of language constructs, as well as (6) support for analyses and (7) transformations. Since DSLs are themselves programming languages, they are also subject to the requirement to support these important properties [vDKV00, MHS05]. For each of these properties, the thesis proposes concrete scenarios that demonstrate support for the corresponding property and that can be used later on for evaluating the corresponding property. In the context of this thesis, a review of the support for the proposed properties by related work has been conducted. The review shows that existing embedding approaches provide certain support for the properties, but it also reveals that all related embedding approaches have important limitations.

1.3.2. Core Architecture for Reflective Embedding

To address the current limitations in related work, this thesis proposes to use reflective host languages for embedding and to exploit reflective features for improving the quality of embeddings. Most importantly, when using a reflective language for embedding, the embedded language implicitly yields a meta-level for the embedding. A key property of the meta-level is that embedded language constructs remain first-class objects until and during their execution. The meta-level helps language developers to expose the internals of their embeddings to other parties.
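The meta-level idea can be sketched as follows: every construct of the embedded language is dispatched through an exchangeable semantics object, so the construct’s execution remains reified and can be inspected or adapted. This minimal Java sketch uses illustrative names and is not the thesis’s actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Every construct of the embedded language is dispatched through a
// meta-interface, so its semantics can be inspected and exchanged at runtime.
interface LogoSemantics {
    void forward(TurtleState t, double steps);
}

class TurtleState { double position = 0; }

class DefaultSemantics implements LogoSemantics {
    public void forward(TurtleState t, double steps) { t.position += steps; }
}

// A gray-box refinement: it wraps the default semantics and, because each
// executed construct is reified as a first-class value, exposes a trace.
class TracingSemantics implements LogoSemantics {
    final LogoSemantics base;
    final List<String> trace = new ArrayList<>();
    TracingSemantics(LogoSemantics base) { this.base = base; }
    public void forward(TurtleState t, double steps) {
        trace.add("forward " + steps);   // reify the construct's execution
        base.forward(t, steps);
    }
}

public class ReflectiveEmbedding {
    public static void main(String[] args) {
        TurtleState t = new TurtleState();
        TracingSemantics meta = new TracingSemantics(new DefaultSemantics());
        LogoSemantics logo = meta;       // the meta-level variant, swapped in
        logo.forward(t, 10);
        logo.forward(t, 5);
        System.out.println(t.position + " " + meta.trace);
        // prints: 15.0 [forward 10.0, forward 5.0]
    }
}
```

Swapping `TracingSemantics` for `DefaultSemantics` changes the language’s execution semantics without touching any program written in the embedded language.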
With exposed internals, embedded analyses and transformations can access the internal information to reason about an embedded language, its language constructs, and objects in embedded programs.

The embedding approach is based on a disciplined architecture that defines for language developers how to embed a language’s syntax and semantics. The architecture uses reflective features for creating language embeddings and is therefore called the reflective embedding architecture. The architecture consists of a language-independent core that provides a set of accompanying key language mechanisms that facilitate implementing and composing languages. If needed, language developers can extend the core architecture with new language mechanisms that better fit special requirements.

In this architecture, every language defines a language component that has a set of well-defined class interfaces. By default, these classes are well encapsulated and implement the details of the execution semantics for that language component. Whenever necessary, language developers can break up encapsulated languages and use them as gray-box language components. Additionally, the language developer can define abstractions that allow language end users to manipulate language internals in a controlled way.

1.3.3. Extensions to the Core Architecture

To demonstrate the applicability of the core architecture, this thesis presents special embeddings and extensions on top of the reflective embedding architecture. The architecture has been

used to implement extensible languages for computer graphics [New73], workflows [WfM], and SQL queries [DD89]. In particular, this thesis elaborates on how to embed sophisticated general-purpose language constructs that help to improve modularization, such as functional and structural abstraction [AS96], scoping [AS96, Tan09], and aspects [KLM+97]. Further, the thesis presents embeddings of language prototypes for formal models, such as state machines [OMG04], feature models with dependencies [Pre97], and grammar definitions (BNF).

To enable implementations of contemporary languages with complicated language constructs that have a special syntax and semantics, the thesis presents several extensions of the core architecture with designated concepts, frameworks, and tools that enable the evolution of embeddings to support these special requirements.

To enable language constructs that have a special semantics, the architecture allows the embedding of special semantics on top of it. As an example for special semantics, this thesis elaborates how to embed aspects that can localize crosscutting concerns in DSL programs. For this, the architecture provides means for composing special DSLs in which the evaluation of a DSL program interacts with the evaluation of a program in another DSL. The reflective embedding architecture uses those means to enable crosscutting composition of aspects into DSL programs. It provides general means for domain-specific join points, as requested by [RMH+06, CNF+08, RMWG09]. First, it uses DS join points to enable an aspect-oriented version of the Logo language for computer graphics [New73]. Second, it enables aspects for workflow languages, similar to [CM04]. Third, it enables composing dynamic aspects for SQL queries.

Concerning the special syntax of DSLs, the architecture provides a generic pre-processor that enables concrete syntax.
Language developers can incrementally add meta-data to the libraries of embedded languages that specifies their concrete syntax. To make DSL programs executable, the pre-processor then uses this meta-data to transform a program in concrete syntax into an executable form in the host language syntax.

1.3.4. Evaluation

To evaluate the reflective embedding architecture, this thesis reviews the proposed concepts both qualitatively and quantitatively. The evaluation compares the architecture to related work on embedded languages. It focuses on evaluating the support that the provided language tools give to language developers and end users at the language level; therefore, evaluating the quality of integrated development environments (IDEs) for language end users is out of the scope of the evaluation.

To evaluate the concepts qualitatively, in the context of this thesis, the evaluation reviews the support of the reflective embedding architecture for the identified properties and compares its support to related work.

The results of the qualitative evaluation show the following advantages of the reflective embedding architecture. First, language developers can extend and compose languages from existing language components for execution, analysis, and transformation. Second, it reduces the effort to implement domain semantics, since one can reuse available OO libraries in their embeddings. Third, they can distribute their compiled embeddings to other developers, who can again provide extensions and abstractions on top of existing embeddings. While existing approaches have

only limited support for the desirable properties, the proposed architecture has practically no limitations for five of the seven desirable properties. Although the evaluation shows that developers can compose basic analyses and transformations of different languages, it remains unclear whether the embedding approach and related embedding approaches support complicated analyses and transformations.

To evaluate the concepts quantitatively, the evaluation compares the approach to traditional approaches and embedding approaches with benchmarks. It evaluates the implementation costs of embeddings, it compares the basic performance of programs running on an embedding to traditional language implementations, and it measures the runtime overhead when using special extensions on top of the core architecture.

The results of the quantitative evaluation show that developers can save the initial investments for DSLs compared to traditional approaches. A developer can grow a DSL without concrete syntax into a DSL with concrete syntax. With those means for flexible language implementation, the evaluation measured a moderate performance overhead compared to related approaches. When exploiting the architecture’s possibilities for embedding specific analyses and optimizations, one can significantly reduce the performance overhead due to the indirections of the meta-level. For example, when executing DSL programs in an optimized version of an embedded aspect language, the execution overhead of the aspect mechanism is less than an order of magnitude.
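The concrete-syntax mechanism described in Section 1.3.3, a pre-processor that uses keyword meta-data to transform DSL programs into host-language calls, can be sketched as follows. The keyword table and the generated method names are purely illustrative assumptions, not the thesis’s actual implementation.

```java
import java.util.Map;

public class ConcreteSyntaxPreprocessor {
    // Meta-data supplied incrementally by the language developer:
    // maps each DSL keyword (taking one argument) to a host-language call.
    static final Map<String, String> KEYWORDS =
            Map.of("forward", "turtle.forward", "right", "turtle.right");

    // Transforms concrete syntax such as "forward 10" into executable
    // host-language form such as "turtle.forward(10);".
    static String transform(String dslProgram) {
        StringBuilder out = new StringBuilder();
        for (String line : dslProgram.strip().split("\n")) {
            String[] parts = line.strip().split("\\s+", 2);
            String method = KEYWORDS.get(parts[0]);
            if (method == null)
                throw new IllegalArgumentException("unknown keyword: " + parts[0]);
            String arg = parts.length > 1 ? parts[1] : "";
            out.append(method).append('(').append(arg).append(");\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(transform("forward 10\nright 90"));
        // prints:
        // turtle.forward(10);
        // turtle.right(90);
    }
}
```

Because the keyword table is ordinary data, growing a DSL without concrete syntax into one with concrete syntax amounts to adding entries to this meta-data rather than writing a parser.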


1.4. Summary of Contributions

The contributions of this thesis can be classified w.r.t. the fields in programming language research they contribute to: (I) embedded domain-specific languages, (II) language architectures, (III) language methodologies, and (IV) benchmarks for DSLs.

(I) Contributions to the research on embedded domain-specific languages: This the- sis advances the research of embedded domain-specific languages by:

(a) Identifying desirable properties for language embedding approaches

(b) Proposing the concept of the reflective embedding architecture: This thesis extends the concept of embedded domain-specific languages by Hudak [Hud96] with a meta-level.

(c) Systematically studying reflective host languages for embedding

(II) Contributions to the research on language architectures: This thesis is inspired by existing traditional language architectures; it consolidates these architectures for new applications, and it contributes back by generalizing their applicability and specializing them for new domains.

(a) An approach to grow a language into a reflective language: This thesis proposes a new approach to systematically implement a reflective language [Smi82, Mae87]. It presents how a language developer can expose parts of an embedded language to its users, e.g., growing a DSL into a reflective DSL. With a reflective DSL, a user can implement reflective programs that reason about themselves using structural and behavioral reflection [Smi82, Mae87].

(b) Generalization of meta-protocols: This thesis generalizes the concept of meta-object protocols [KRB91] from OO languages to meta-protocols for DSLs.

(III) Contributions to the research on language methodologies: This thesis implements technologies and prototypes that contribute to language methodologies.

(a) Composition of languages with syntactic and semantic interactions

(b) Composing crosscutting concerns in domain-specific languages

(c) Controlling the scope of aspects in domain-specific languages: This thesis provides generic concepts and technology for controlling the scope and activation of crosscutting concerns in domain-specific languages. With the technology, programs can load and dynamically activate/deactivate aspects at runtime.

(IV) Benchmarking for Embedded DSLs: This thesis proposes an open-source benchmark to measure and compare the costs of different DSL approaches.


In the context of this thesis, the author has published the following papers:

Publications in Conference Proceedings:

• Tom Dinkelaker, Michael Eichberg, and Mira Mezini. Incremental Concrete Syntax for Embedded Languages. In Proceedings of the 26th ACM Symposium on Applied Computing—Technical Track on Programming Languages (PL at SAC). Taichung, Taiwan, March 21-25. ACM, NY, USA, 2011.

• Tom Dinkelaker, Michael Eichberg, and Mira Mezini. An Architecture for Composing Embedded Domain-Specific Languages. In Proceedings of the 9th ACM International Conference on Aspect-Oriented Software Development (AOSD). Rennes and Saint Malo, France, March 17-19. ACM, NY, USA, 2010.

• Slim Kallel, Anis Charfi, Tom Dinkelaker, Mira Mezini, and Mohamed Jmaiel. Specifying and Monitoring Temporal Properties in Web Services Compositions. In Proceedings of the 7th IEEE European Conference on Web Services (ECOWS). Eindhoven, The Netherlands. IEEE Computer Society Press, 2009.

• Tom Dinkelaker, Mira Mezini, and Christoph Bockisch. The Art of the Meta-Aspect Protocol. In Proceedings of the 8th ACM International Conference on Aspect-Oriented Software Development (AOSD), Charlottesville, VA, March 2-6. ACM, NY, USA, 2009.

• Anis Charfi, Tom Dinkelaker, and Mira Mezini. A Plug-in Architecture for Self-Adaptive Web Service Compositions. In Proceedings of the 7th IEEE International Conference on Web Services (ICWS). Los Angeles, CA, USA, July 6-10. IEEE Computer Society Press, 2009.

• Tom Dinkelaker, Alisdair Johnstone, Yuecel Karabulut, and Ike Nassi. Secure Scripting Based Composite Application Development: Framework, Architecture, and Implementation. In Proceedings of the 3rd ICST/IEEE International Conference on Collaborative Computing, White Plains, NY, November 12-15. IEEE Computer Society Press, 2007.

• Christoph Bockisch, Matthew Arnold, Tom Dinkelaker, and Mira Mezini. Adapting Virtual Machine Techniques for Seamless Aspect Support. In Proceedings of the 21st Annual Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Portland, OR, October 22-26. ACM, NY, USA, 2006.

• Michael Haupt, Mira Mezini, Christoph Bockisch, Tom Dinkelaker, Michael Eichberg, and Michael Krebs. An Execution Layer for Aspect-Oriented Programming Languages. In Proceedings of the International Conference on Virtual Execution Environments (VEE), Chicago, IL, June 11-12. ACM, NY, USA, 2005.


Demonstration:

• Tom Dinkelaker. Demo: A Dynamic Software Product Line Approach using Aspect Models at Runtime. In 4th Workshop on Domain-Specific Aspect Languages (DSAL), at the 9th ACM International Conference on Aspect-Oriented Software Development (AOSD). Rennes and Saint Malo, France, March 17-19. http://www.dsal.cl/2010/, 2010.

Publications in Workshop Proceedings:

• Tom Dinkelaker, Johan Fabry, Anne-Francoise Le Meur, Jacques Noye, and Eric Tanter. Proceedings of the 4th Workshop on Domain-Specific Aspect Languages (DSAL), at the 9th ACM International Conference on Aspect-Oriented Software Development (AOSD). Rennes and Saint Malo, France, March 17-19. http://www.dsal.cl/2010/, 2010.

• Tom Dinkelaker, Ralf Mitschke, Karin Fetzer, and Mira Mezini. A Dynamic Software Product Line Approach using Aspect Models at Runtime. In Proceedings of the 1st International Workshop on Composition: Objects, Aspects, Components, Services and Product Lines, at the International Conference on Aspect-Oriented Software Development, Rennes and Saint Malo, France, March 17-19. CEUR-WS.org, ISSN 1613-0073, Vol. 564, CEUR-WS.org/Vol-564/, 2010.

• Tom Dinkelaker, Martin Monperrus, and Mira Mezini. Supporting Variability with Late Semantic Adaptations of Domain-Specific Modeling Languages. In Proceedings of the 1st International Workshop on Composition: Objects, Aspects, Components, Services and Product Lines, at the International Conference on Aspect-Oriented Software Development, Rennes and Saint Malo, France, March 17-19. CEUR-WS.org, Vol. 564, CEUR-WS.org/Vol-564/, 2010.

• Tom Dinkelaker, Martin Monperrus, and Mira Mezini. Untangling Cross-Cutting Concerns in Domain-Specific Languages with Domain-Specific Join Points. In Proceedings of the 4th Workshop on Domain-Specific Aspect Languages (DSAL), at the International Conference on Aspect-Oriented Software Development, Charlottesville, VA, USA, March 2-6. ACM, NY, USA, 2009.

• Tom Dinkelaker and Mira Mezini. Dynamically linked Domain-Specific Extensions for Advice Languages. In Proceedings of the 3rd Workshop on Domain-Specific Aspect Languages (DSAL), at the International Conference on Aspect-Oriented Software Development. ACM, NY, USA, 2008.


Technical Reports:

• Tom Dinkelaker. Review of the Support for Modular Language Implementation with Embedding Approaches. TUD-CS-2010-2396, Technische Universität Darmstadt, Germany, November, 2010.

• Tom Dinkelaker, Christian Wende, and Henrik Lochmann. Implementing and Composing MDSD-Typical DSLs. TUD-CS-2009-0156, Technische Universität Darmstadt, Germany, October, 2009.

• Mira Mezini, Danilo Beuche, Ana Moreira (Editors). 1st International Workshop on Model-Driven Product Line Engineering (MDPLE’09), at the 5th European Conference on Model-Driven Architecture Foundations and Applications (ECMDA-FA). Centre for Telematics and Information Technology, University of Twente, Enschede, The Netherlands, 2009.

• Tom Dinkelaker and Michael Haupt. Inventory of Aspect-Oriented Execution Models. AOSD-Europe Deliverable D40, AOSD-Europe-TUD-4, Technische Universität Darmstadt, Germany, February 28, 2006.

1.5. Organization of the Thesis

The thesis is structured into five parts. The first part discusses the state of the art. Section 2 gives an overview of existing language embedding approaches and identifies their advantages and limitations. Then, Section 3 identifies properties that are often available in non-embedded DSL approaches but have only limited support in embedding approaches.

The second part presents the concept, beginning with the reflective embedding architecture in Section 4. Next, Section 5 elaborates on the implementation of the architecture in one example host language.

The third part presents the application of the proposed concepts. Section 6 shows how language developers can use the core architecture to implement a DSL in which a DSL program can reason about its domain and customize the language semantics. Section 7 shows how to use the core architecture for composing interacting domain-specific languages. Section 8 shows how to incrementally improve the concrete notation of programs using an embedded language.

The fourth part presents the evaluation of the proposed concepts. Section 9 compares to related work. Section 10 reviews the support for the identified properties. Section 11 presents the quantitative evaluation and compares to related work.

The fifth and last part concludes this thesis: it summarizes the thesis results and its contributions in Section 12 and discusses future work. The thesis assumes that the reader has a fluent understanding of Java [GJSB00] and a basic understanding of AspectJ [KHH+01].


Part I.

State of the Art


2. State of the Art in Language Implementation Approaches

This chapter introduces terminology and gives an overview of the state of the art relevant to this thesis. Section 2.1 discusses the importance of DSLs in general, and Section 2.2 defines the scope of this thesis on embedded DSLs. Next, Section 2.2.1 elaborates on the problems of existing embedded DSL approaches. After that, Section 2.3 reviews architectures that traditional approaches frequently employ for structuring language implementations.

[Figure: a two-dimensional design space with generic vs. special syntax on the horizontal axis and generic vs. special semantics on the vertical axis; plotted examples include SQL, Haskell/DB, COOL, COOL+Caching, AO4BPEL, BNF, FSM, RIDL, Java/RegExp, BPEL, Caching DSL, Java, Swing, and SWUL.]

Figure 2.1.: Example DSLs in the design space of DSLs

[Figure: the same design space, with the scope of this thesis marked: small-, medium-size, and prototype DSLs; non-expert language developers; no hard performance requirements; integrating DSLs into GPLs.]

Figure 2.2.: The focus of this thesis in the DSL design space


2.1. Domain-Specific Languages

There are two important surveys [vDKV00, MHS05] that define domain-specific languages and analyze their role in modern software development. In the first survey paper, “Domain-Specific Languages: An Annotated Bibliography” [vDKV00], Van Deursen, Klint, and Visser propose the following DSL definition:

“A domain-specific language (DSL) is a programming language or executable spec- ification language that offers, through appropriate notations and abstractions, ex- pressive power focused on, and usually restricted to, a particular problem do- main.” [vDKV00]

The paper mentions several advantages of DSLs. They have a special syntax that allows a developer to express a solution at a higher level of abstraction that is closer to the problem domain. DSL programs tend to be more concise and self-documenting. Thus, domain experts can more easily understand, validate, and modify DSL programs, and often even non-programmer end users can develop them. A more concise encoding of programs improves productivity, and more concise programs enhance reliability, maintainability, and portability. A DSL implementation directly embodies the domain knowledge and conserves it for reuse. Further, DSLs allow validation and optimization at the domain level. The survey also discusses disadvantages of DSLs. There are high costs for designing, implementing, and maintaining a DSL, on top of which come the costs of educating DSL end users. While there are many problem domains, the availability of DSLs is limited. For DSL developers, it is often difficult to find the proper scope for a DSL and to balance between domain-specific and general-purpose language constructs in its syntax and semantics. Due to the higher level of abstraction, DSL programs are potentially less efficient than hand-coded software.

In the second survey paper, titled “When and How to Develop Domain-Specific Languages” [MHS05], Mernik, Heering, and Sloane define DSLs as follows:

“DSLs trade generality for expressiveness in a limited domain. By providing no- tations and constructs tailored toward a particular application domain, they offer substantial gains in expressiveness and ease of use compared with GPLs for the domain in question, with corresponding gains in productivity and reduced mainte- nance costs. Also, by reducing the amount of domain and programming expertise needed, DSLs open up their application domain to a larger group of software de- velopers compared to GPLs.” [MHS05]

They discuss developing DSLs in three phases: analysis, design, and implementation. In the analysis phase, DSL developers identify the requirements, which have important implications for the DSL design. A DSL design is only successful if it meets the requirements of

the language end users. Therefore, the decisions in the design phase of a DSL should be based on those requirements.

In the design phase, according to Mernik et al., DSL design can be characterized along two orthogonal dimensions: (1) the relationship of the DSL design to designs of existing languages, and (2) the formal nature of the design. For the first dimension, they state that the easiest way to design a DSL is to use part of the design of an existing GPL or DSL, which they call language exploitation. Language developers can either (a) piggy-back the DSL design on parts of an existing language design without special restrictions, (b) design a specialization by restricting an existing language design to a certain domain, or (c) design an extension to an existing language design by adding new domain concepts to it. For the second dimension, they distinguish how formally the language developer designs a DSL. An informal design uses specifications given in natural language or a set of illustrative DSL example programs that can later be used as tests. Conversely, a formal design uses an existing semantics definition method, such as attribute grammars, rewrite rules, or abstract state machines. The decisions made in the design phase of a DSL have important implications for the costs and maintenance of its implementation.

For the implementation phase, Mernik et al. give an overview of existing approaches:

Interpreters: They interpret DSL constructs using a standard fetch-decode-execute cycle. They support DSLs with a rather dynamic nature. They often offer greater simplicity and control over the execution compared to other approaches. Yet, usually interpreters are implemented in a monolithic way and are therefore not extensible.

Compilers or Generators: These approaches rewrite high-level programs to a target language that is usually at a lower level of abstraction, e.g., assembler or machine code. Compilers often support semantic analysis, e.g., for static analysis and optimization. Similarly to interpreters, most compilers are monolithic and not extensible.

Pre-processing: DSL constructs are translated to constructs in an existing GPL. There are different forms of pre-processing. Macro processing usually expands DSL expressions from syntactic macro definitions. Source-to-source transformations rewrite DSL programs to another language, which is similar to compilers, but the transformation usually produces code in a high-level language, not machine code.

Embedding: In line with Hudak, Mernik et al. define embedding as libraries implemented in an existing GPL. For them, even GPL application libraries are a basic form of embedding. Mernik et al. state that because of the limitations of existing embedding approaches, embedding can only be used if there are no special requirements for a DSL, such as a need for concrete syntax, analysis, verification, optimization, parallelization, or transformation. Because existing embeddings are currently confined by their host languages, they do not recommend following the embedding approach if there is a need for such a special requirement.

Extensible Compilers and Interpreters: These are special compilers/interpreters that can be extended with domain-specific (DS) optimization rules and/or code generation. Most existing

extensible compilers/interpreters are for one particular language, which is in most cases a GPL, and they only allow extensions to that GPL. Usually, for a new DSL, an extensible compiler needs to be implemented from scratch. Because the initial cost for an extensible compiler is even higher than for a monolithic compiler, there are only a few extensible compilers available for DSLs [ADDH10].

Commercial Off-The-Shelf (COTS): COTS-based approaches use standard tools and/or standard syntax that is applied to a specific domain. In COTS-based approaches like UML [OMG04] or XML, there are standard mechanisms for syntactic extensions and transformation (rule-based transformation).

Hybrid approaches: These can be any combination of the above approaches, e.g., a combination of embedding and compilers [EFDM03]. However, there are only a few examples that actually combine several techniques. Mernik et al. mention two concrete DSLs, but there is no framework approach for hybrid DSL implementation.
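To make the interpreter approach from the overview above concrete, the following sketch shows a minimal, monolithic interpreter for a toy arithmetic DSL in Java. The Expr hierarchy and all names are illustrative assumptions for this thesis text, not taken from any of the systems discussed; each eval() call plays the role of one decode-execute step over the program representation.

```java
// A minimal, monolithic interpreter for a toy arithmetic DSL.
// The DSL program is represented as a tree of Expr nodes; evaluating
// the root walks the tree, which corresponds to the fetch-decode-execute
// cycle of the interpreter approach. Extending the language means adding
// a new record, i.e., the interpreter itself is not modularly extensible.
interface Expr { int eval(); }

record Num(int value) implements Expr {
    public int eval() { return value; }              // a literal evaluates to itself
}

record Add(Expr left, Expr right) implements Expr {
    public int eval() { return left.eval() + right.eval(); }
}

record Mul(Expr left, Expr right) implements Expr {
    public int eval() { return left.eval() * right.eval(); }
}
```

For example, the DSL program 1 + 2 * 3 is represented as `new Add(new Num(1), new Mul(new Num(2), new Num(3)))` and evaluated by calling eval() on the root.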

The two DSL survey papers by Van Deursen et al. and by Mernik et al. are widely considered the standard references for DSLs. Still, both definitions are very vague, as the authors themselves admit. Since the focus of this thesis is embedded DSLs, it is out of its scope to provide a new definition of DSLs in general. This thesis uses the definitions of Van Deursen et al. and of Mernik et al., and therefore inherits their vagueness. To address this vagueness, this thesis discusses different kinds of DSLs. Indeed, there is a large range in the design space of DSLs that comply with those definitions. Figure 2.1 (p. 15) gives an overview1 of DSLs and points out five different kinds of DSLs: (1) DSLs with highly specialized tools that perform analysis, verification, optimization, parallelization, or transformation, such as SQL [DD89]. SQL has a special syntax for implementing database queries, and it has special semantics with support for analysis and transformation for an optimized execution of such queries. (2) DSLs that only define syntactic sugar for domain concepts but no new semantics, such as SWUL [BV04] with syntactic sugar for Java Swing. (3) Languages that are merely defined as GPL application libraries, such as Haskell/DB [LM00] for embedding SQL queries into the Haskell programming language. (4) Embedded DSLs that encode domain objects and operations that define a DS vocabulary and execution semantics, such as BPEL [AAB+07], which defines business processes with a workflow DSL embedded into XML. And (5) DSLs that mix DS and general-purpose language constructs, such as Java/RegExp [BV04]. Unfortunately, except for embedding and pre-processing with macros, all approaches assume that the language developer is an expert who has special skills. Figure 2.2 (p. 15) illustrates the focus of this thesis with respect to the DSL design space. This thesis focuses on providing methods and technology for the DSL implementation phase.

1The following DSL examples from Figure 2.1 are from the following papers: Swing, SWUL [BV04], Haskell/DB [LM00], Java/RegExp [BV04], FSM (Finite State Machines) [DWL09], BPEL [AAB+07], AO4BPEL, COOL [Lop97], RIDL [Lop97], Caching DSL [HBA08], COOL+Caching [HBA08], BNF (Backus Naur Form), and SQL [DD89].


For a successful development of new DSLs, the proposed technologies in this thesis need to be used together with requirement analysis and design approaches for DSLs. Readers who are interested in further details on the other phases are referred to the survey paper of Mernik et al. [MHS05].

2.2. Embedded Domain-Specific Languages

The idea to embed a language into another language goes back to Landin [Lan66] and Hudak [Hud96, Hud98]. The literature often uses the term embedded domain-specific language2 (embedded DSL or EDSL for short) to refer to such a language embedding; therefore, this term is used throughout the thesis. Unfortunately, neither Landin nor Hudak gives a clear definition of what an embedded DSL is. Hudak states that an embedded DSL “is designed within an existing, higher-order and typed, programming language such as Haskell or ML”. This definition is vague and restricted to functional languages, but it roughly defines an embedded DSL as being a library in a GPL. Kamin [Kam98] defines embedding a DSL as “the process of implementing a language by defining functions in an existing host language; the host language with these added functions is the new language [...]”. Further, he states that this method can be used in any language. Therefore, Kamin’s definition of an embedded DSL is not restricted to functional languages, although he states that functions facilitate embedding. According to Kamin, embeddings do not have to be compiled and executed by the host language compiler. An embedding can translate a DSL program into a version in a target language; an external tool can then compile the generated version of the program into an executable. Although this definition of embedded DSLs is accurate, it is not used often in the literature because it does not define a precise scope. This imprecise scope also applies to non-embedded approaches (e.g.
parser generators) that do not have a clear intention to reuse the host language features for implementing a DSL.

Fowler [Fow05] defines embedded DSLs as internal DSLs. An internal DSL is an embedding that uses the internal host compiler to compile DSL programs. He considers all language implementations that use tools that are external to the host language, i.e., a compiler of another language, as external DSLs. While Fowler’s definition is close to the one of Hudak, unfortunately his internal DSL definition excludes Kamin’s embedding approach. Mernik et al. point out in [MHS05] that “with an application library, any GPL can act as a DSL”. However, the expressiveness of application libraries—i.e., their syntax and semantics—is confined by the language they are implemented in. Therefore, DSL constructs cannot always be mapped in a straightforward way to functions or objects in a library. This definition is broader than the previous definitions, as it roughly defines any library in a GPL as an embedded DSL. Such a broad definition is problematic since it does not take into account the developer’s intention when implementing a DSL. But for embedded DSLs, it is important that developers
2To refer to a language embedding, Hudak uses the term domain-specific embedded language [Hud96]. Other authors [Fow05] propose to refer to embedded languages as internal DSLs.

19 2. State of the Art in Language Implementation Approaches design them with the intention to be used as an embedded DSL, otherwise embedding cannot be expected to have an appropriate notation. Tratt [Tra08] defines two different styles of embeddings. Language embeddings can be dis- tinguished with respect to what particular language infrastructure is used to compile and execute the embedding: there are homogeneous and heterogeneous approaches. On the one hand, there are homogeneous embeddings that have a homogeneous syntax with their host, so one can mix embedded DSL expressions with the host expressions. Homogeneous embeddings have a uni- form compile- and/or runtime with their host language, which allows objects to flow between DSL and host programs back and forth. On the other hand, there are heterogeneous embeddings that have a source language (also meta language) and a target language. Because artifacts of the source and the target language are compiled by different compilers, there is no uniform compile- and/or runtime runtime, hence objects and language features cannot be shared. Tratt’s definition is useful to classify existing embedding approaches. Unfortunately, his definition does not define the process of embedding itself. Since the above DSL definitions are too vague, this thesis consolidates the existing definitions of Hudak, Kamin, Fowler, and Tratt: An embedded domain-specific language is a virtual language that is implemented as library in an existing host language. A virtual language has an implementation that reuses parts of an existing language infrastructure in order to give end users the illu- sion that they use a real language with specialized tools for the domain. The virtual language enables special concepts and abstractions for a certain problem domain on top of the reused infrastructure. 
This infrastructure can encompass tools such as parsers, compilers, interpreters, virtual machines, and developer tools, whereby the embedding can reuse the infrastructure both in a homogeneous or heterogeneous way. This definition consolidates all previous definitions, but it refines them by two important ob- jectives. First, it requires the embedding to be intended. An application library is not seen as a full-fledged embedded language, because an opportunistic usage of a library as an embed- ding is considered problematic for end users. When using a library that was not intended as an embedding, the resulting library likely does not give end users appropriate domain syntax and semantic guarantees [MHS05]. Therefore, a language developer needs to clearly intend to embed a language. Second, the illusion of the embedding can have weaknesses. There can be differences in the quality of the illusion dependent on the embedding style and host language. One can expect a slower performance for embedded programs than if they would be executed on a non-embedded implementation. Further, host tools may reveals unnecessary technical de- tails, such as technical error messages, that cannot be understood by end users [Kam98]. If the illusion of the embedding has weaknesses, it still needs to be convincing for end users. Although the definition of a virtual language helps to clarify the nature of embedded DSLs, this thesis does not use this term to replace the term “embedded DSL”. Because this term has been established in today’s literature, this thesis retains to the old terminology.


2.2.1. Open Issues in Existing Embedding Approaches

This section elaborates on the open issues with embeddings that are addressed by this thesis.

2.2.1.1. Lack of Support for Concrete Syntax

When a developer implements a program on top of the library of a language embedding, the developer must encode the DSL program in the host language’s syntax. Consequently, the program expressions do not have the exact concrete syntax that a traditional DSL would have. For example, consider Haskell/DB, one of the most advanced pure embeddings in a functional language. Leijen et al. [LM00] use Hudak’s technique and monads to homogeneously embed SQL [DD89] statements in Haskell. Haskell/DB provides interesting safety guarantees that are implicitly checked by the Haskell compiler. However, since pure embedded DSL programs must be encoded in Haskell, they have a compromised syntax, i.e., DSL programs are encoded to comply with the host syntax and therefore contain various characters that are irrelevant with respect to the problem domain. The literature often refers to those irrelevant characters as syntactic noise. In particular in the functional languages used by pure embedding, the compromised syntax differs significantly from a desirable concrete DSL syntax. According to Mernik [MHS05] and Kosar [KLP+08], such a compromised syntax makes DSL programs less efficient to encode and harder to understand. For illustration, Figure 2.3 compares the concrete and compromised syntax of an example SQL query. While on the left-hand side, with non-embedded SQL, the end user has a concrete syntax that is close to the problem domain, the right-hand side shows the equivalent SQL query encoded in the compromised syntax of Haskell/DB. Unfortunately, when using embedded DSLs that lack support for concrete syntax, end users are confused by this syntactic noise, which leads to significantly lower end-user productivity [KLP+08].

(a) Concrete syntax in non-embedded SQL:

    SELECT name, mark
    FROM Students
    WHERE name=[name] AND mark=[mark]

(b) Compromised syntax in the pure embedding of Haskell/DB (from [LM00]):

    type Student =
        Row (name :: String, mark :: Char)

    queryMark :: String -> Query Student
    queryMark = \n ->
      do { student <- table students
         ; restrict (student!name .==. Lift n)
         ; project ( name = student!name
                   , mark = student!mark )
         }

Figure 2.3.: Concrete versus compromised syntax of an SQL query


2.2.1.2. Lack of Support for Crosscutting Composition for DSLs

The separation of concerns is one of the most fundamental principles in software engineering [Par72]. Still, with most mainstream languages, crosscutting concerns cannot be localized and remain scattered and tangled over several modules. To address this, many aspect-oriented mechanisms have been provided for general-purpose languages. Such an aspect-oriented mechanism composes the crosscutting logic of aspects into all relevant modules. This process of crosscutting composition is often referred to as weaving. Nonetheless, DSLs may also suffer from scattering and tangling issues, as Charfi et al. have shown for BPEL [CM04] and Rebernak et al. for grammars [RMHP06]. Despite this, for most DSLs, no aspect-oriented mechanism is available – even for mainstream DSLs. Without support for crosscutting composition, developers cannot localize the scattering and tangling in DSL programs, which makes them hard to maintain.

With current language implementation approaches—not only embeddings but also non-embedded approaches—there is a lack of means for separating crosscutting concerns in DS base languages. The problem is that current language implementation approaches have no adequate means to integrate DS abstractions into the aspect-oriented mechanism. In particular, the lack of crosscutting composition is due to the fact that there are no special means to represent events in the execution of a DSL program and no means to change the execution of these events.

2.2.1.3. Lack of Support for Fine-Grained Composition

There are several works in non-embedded approaches that discuss sophisticated compositions of DSLs [EVI05, KL07b, HBA08]. Particularly interesting are compositions of several domain-specific aspect languages (DSALs). DSALs are special DSLs that internally make use of aspect-oriented concepts.
In contrast to general-purpose aspect languages (GPALs), DSALs provide a domain syntax that allows implementing aspects in a more declarative way. Unfortunately, existing embedding approaches lack support for composing such DSALs. Kojarski et al. [KL05, KL07b] compose Cool [LK97] and AspectJ [KHH+01]. Cool is a domain-specific aspect language for implementing coordination in OO software. Kojarski et al. found co-advising and foreign advising conflicts between Cool aspects and AspectJ aspects. For example, such conflicts can happen whenever an AspectJ aspect advises a method that is also advised by a Cool aspect. Specifically, when AspectJ and Cool co-advise a join point and the AspectJ advice is executed before the Cool advice, the inserted AspectJ advice may violate the coordination logic, which is inserted by the Cool advice, and which protects the join point activity from side effects by concurrently executed threads. Kojarski et al. designed a special weaver framework called AweSome that wraps Cool aspects around AspectJ aspects in order to solve such conflicts.

Further, Havinga et al. [HBA08] compose Java, Cool, and a DSL for caching return values of method calls (cf. Section 10.3.2.2, p. 228). Havinga et al. extended their JAMI framework to allow composition of the two DSALs. They found that there are many co-advising conflicts between Cool and their caching DSL. They found that the weaving interaction of crosscutting concerns from different domains cannot be correctly resolved at the level of a


complete aspect module, but needs an ordering mechanism at the pointcut-and-advice level. In particular, AspectJ can only order aspects but not advice; therefore, it cannot resolve such domain-specific interactions. Generally, any weaver that does not support ordering at the level of pointcut-and-advice is not applicable for DSAL composition. To solve the problem, they show that they can resolve these interactions by specializing the aspect weaver for the composition.

     1  class Document {           // A Java class
     2    List content;
     3
     4    void addLine(String line) {...}
     5    void setContent(List content) {...}
     6    List getContent() {...}
     7    long wordCount() {...}
     8  }
     9
    10  coordinator Document {     // A Cool aspect
    11    selfex addLine, setContent;
    12    mutex {addLine, setContent}; mutex {addLine, getContent}; mutex {addLine, wordCount};
    13    mutex {setContent, getContent}; mutex {setContent, wordCount};
    14  }
    15
    16  cache Document object {    // A Caching aspect
    17    memoize wordCount,
    18      invalidated by assigning content
    19      or calling addLine(java.lang.String);
    20  }

Listing 2.1: A Java class and two DSAL aspects that have a conflict (from [HBA08])

For example, consider the example application in Listing 2.1, taken from [HBA08]. There are two DSAL aspects that advise the class Document in lines 1–8. The first aspect is a Cool aspect (Listing 2.1, lines 10–14); it defines several exclusive regions at the level of methods. The keyword selfex defines that all subsequent methods are self-exclusive for a particular thread, e.g., addLine and setContent are not allowed to be invoked again as long as the corresponding method remains on the stack of the current thread. The keyword mutex defines that two or more methods are mutually exclusive for a particular thread, e.g., addLine and setContent may not be invoked in the same control flow. The second aspect is a caching aspect (Listing 2.1, lines 16–20); it defines a memoization concern that caches the return value of the Document.wordCount() method. Instead of repetitively calculating the return value, memoization looks up wordCount() in the cache as long as it is not invalidated, which happens whenever a new value is assigned to content or the method addLine(String) is called. There is an interaction between these two DSAL aspects, because both aspects co-advise the same method addLine. In [HBA08], Havinga et al. elaborate that the interactions of these two aspects lead to composition conflicts. When the caching advice and the coordination advice

are composed incorrectly, this leads to invalid cached values or violations of exclusive regions. Indeed, Havinga et al. show how developers can adapt the JAMI framework to prototype a DSAL composition that resolves the conflicts. Currently, non-embedded approaches are the only option for implementing a fine-grained composition of DSALs. However, non-embedded approaches are not as lightweight and flexible as embeddings. For every DSAL composition, first, language developers have to implement a new parser; second, they need to invasively manipulate the code of the weaver; and third, they need to recompile the language infrastructure. Because the non-embedding approaches can only adapt the language infrastructure once, the current way of composing aspect languages cannot deal with an open set of DSALs that are composed later on in the user domain. For an open and flexible composition, it would be desirable to have systematic support for composing arbitrary DSALs without the need to adapt the code of the AO mechanism. Unfortunately, such support for fine-grained composition that deals with these interactions is lacking.
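The intended ordering of the two pieces of advice can be illustrated by hand-weaving the concerns of Listing 2.1 in plain Java. This is only a sketch of the desired composition, not the AweSome or JAMI machinery; the lock merely approximates Cool’s exclusive regions, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hand-woven sketch of the two DSAL concerns from Listing 2.1. The
// coordination concern is simulated by a lock around each advised method;
// the caching concern by a memoized wordCount. The nesting order of the
// two wrappers corresponds to the pointcut-and-advice ordering problem:
// the coordination advice must wrap the caching advice, otherwise a
// concurrent addLine can invalidate the cache between lookup and return.
class Document {
    private final List<String> content = new ArrayList<>();
    private final ReentrantLock coord = new ReentrantLock(); // coordination concern
    private Long cachedWordCount = null;                     // caching concern

    void addLine(String line) {
        coord.lock();                 // coordination advice: enter exclusive region
        try {
            content.add(line);        // base behavior
            cachedWordCount = null;   // caching advice: invalidate on addLine
        } finally {
            coord.unlock();
        }
    }

    long wordCount() {
        coord.lock();                 // coordination advice wraps the cache lookup
        try {
            if (cachedWordCount == null) {   // caching advice: memoize
                long n = 0;
                for (String l : content) n += l.split("\\s+").length;
                cachedWordCount = n;
            }
            return cachedWordCount;
        } finally {
            coord.unlock();
        }
    }
}
```

Swapping the nesting (performing the cache lookup before acquiring the lock) is exactly the co-advising conflict described above: it can return a value that a concurrent addLine has already invalidated.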


2.3. Architectures for Language Implementation

Traditionally, DSLs are implemented as pre-processors, interpreters, and compilers that are not extensible. The resulting DSL implementation is rather monolithic, and developers cannot extend it without changing its internals. With such monolithic architectures, it is practically not possible to compose interpreters and compilers. To address this, the literature has proposed various language architectures to avoid monolithic language implementations. There are: (1) front-end/back-end architectures, (2) extensible language architectures, (3) meta-level architectures, (4) reflective architectures, and (5) meta-object protocols. We briefly explain these architectures because this thesis builds on the terminology, structures, and techniques they use.

2.3.1. Front-End/Back-End Architectures

Modern languages organize their infrastructures into a front-end and a back-end. Often, front-ends are implemented as compilers that compile programs into an intermediary representation, such as bytecode, which is a machine-code-like representation that is prepared for execution. Back-ends are often implemented as virtual machines (VMs) [SN05], special software systems that model an abstract machine on which intermediary program representations are executable. Splitting into front- and back-ends allows a greater reuse of the back-end. For example, Java splits into the Java compiler, which compiles Java source code into Java bytecode, which is then executed by a Java VM (JVM). Because of using bytecode, whenever a new Java compiler front-end became available, often the old JVM back-end implementation could be reused.

2.3.2. Extensible Compilers

Extensible compilers of GPLs allow defining domain-specific extensions to GPLs. Examples of extensible compilers are Polyglot [NCM03], JastAdd [EH07b], the Java syntactic extender [BP01], LISA [MLAZ00], and Silver [VWBH06, WKBS07, VWBGK08].
Most extensible compilers are not language-independent (e.g., they are restricted to Java). They have a fixed AST representation (e.g., a Java AST) that can be extended with domain-specific extensions (e.g., tuples). They allow rather small incremental extensions to the AST, but they are not adequate for replacing large parts of the AST. Most extensible compilers do not support composing ASTs of different languages. When implementing a new extensible DSL, therefore, a new extensible compiler must be implemented, which one can expect to be a rather large initial cost for implementing a DSL. There are only few extensible compilers (e.g., JastAdd [EH07b]) that use special techniques to overcome this limitation, but they require developers to be experts.

2.3.3. Meta-Level Architectures

A meta-system [Mae88, Glu91] is a computational system whose domain is another system, i.e., it integrates, controls, and processes other systems as objects. Meta-level architectures are used to implement languages themselves. In general, there is a meta-language in which another language is specified. The execution of the meta-language then creates a language implementation in the target language, which conforms to the specification. This approach is often used by parser generators.
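The generative character of such meta-level architectures can be illustrated with a toy sketch (Python, hypothetical names): a keyword specification is the meta-level input, and executing the generator yields the language implementation.

```python
# Illustrative sketch of a meta-level architecture: a specification of
# language keywords (the meta-level input) is turned into generated Python
# source for the language implementation, which is then "compiled" with
# exec. Any change to SPEC requires regenerating the whole module.
# All names here are hypothetical.

SPEC = {"double": "n * 2", "square": "n * n"}

def generate(spec):
    """Meta-level: emit target-language source from the specification."""
    lines = []
    for name, body in spec.items():
        lines.append(f"def {name}(n):\n    return {body}\n")
    return "\n".join(lines)

generated_source = generate(SPEC)
namespace = {}
exec(generated_source, namespace)   # build the generated implementation
print(namespace["square"](5))       # 25
```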


There are different flavors of meta-system architectures, such as macro systems, code generators, and meta-programming systems. Examples of systems that support meta-programming are parser generators, code generators, the C pre-processor, tree rewriting systems (e.g., ASF+SDF), and C++ templates. Many meta-programming systems only support code generation in particular target languages, such as HiveMind and Javassist for Java. In general, when extending a language implementation, a problem with meta-level architectures is that even after a small change to the specification of the language under development, the developer has to regenerate and/or recompile the complete language implementation, which is disruptive. When developers are not allowed to make extensions in the generated code, a language cannot be changed further, e.g., for customization in the user domain, once the target code has been generated.

2.3.4. Reflective Architectures

A reflective system [Smi84, Mae87] is a meta-system whose domain integrates, controls, and processes itself as objects. Smith defines a reflective system as a "computer system able to reason about itself" [Smi84] and requires that the system's self-representation be causally connected to its domain. Maes explains that a "system is said to be causally connected to its domain if the internal structures and the domain they represent are linked in such a way that if one of them changes, this leads to a corresponding effect upon the other" [Mae87]. Unfortunately, reflective architectures are widely available only for GPLs, not for DSLs.

2.3.5. Meta-Object Protocols

The concept of a meta-object protocol (MOP) [KRB91] has evolved from reflective languages. In contrast to reflective architectures, which provide rather low-level means for adapting programs, a MOP provides high-level means for adapting the OO language semantics.
There are special meta-objects that are responsible for the interpretation of objects and that developers can use to adapt the behavior of their objects. With meta-objects, developers can provide adaptations to objects in retrospect. A MOP follows the open implementation [Kic96] design principle by exposing part of the implementation strategies of the OO language implementation. In other words, a meta-object protocol is an open implementation of an object-oriented language. A system built according to the open implementation principle provides two orthogonal interfaces: the so-called primary interface exposes to applications the primary functionality of the system. The other interface, called the meta-interface, provides application-level access to the implementation strategy of the primary interface. Using the meta-interface allows users to change the implementation strategy behind the primary interface that is hidden from applications. This principle allows system designers to keep their systems open for (unanticipated) adaptations later on. In a MOP, the meta-objects provide high-level and reflective operations, called meta-methods, that interpret part of an object's semantics at the meta level (e.g., method dispatch). The MOP defines all workflows as sequences of calls that interconnect the meta-methods that participate in one extensible part of the language's semantics. Note that, instead of fixed OO language semantics, a MOP-based language implementation covers a range in the design space of language

semantics rather than representing only a single point in the spectrum of possible OO semantics. Application programmers can use the MOP to create new variant languages that meet their application-specific requirements. Hence, the distinction between language designers and users is blurred. In particular, the technology that this thesis proposes uses MOPs and is inspired by the flexibility MOPs provide for coping with adaptations of object-oriented languages. Existing work on MOPs focuses on using the open implementation principle for OO languages and special systems. However, there is more potential for using open implementations in language design and implementation. What is not addressed by existing MOPs is adapting language semantics beyond OO semantics. This thesis argues that it would be interesting to provide flexibility similar to what MOPs provide for OO languages to other languages, such as DSLs. With such flexibility, language end users could adapt the semantics of DSL constructs. What is missing is a general approach that describes how to implement a meta-protocol that allows adapting the back-ends of a DSL implementation. It would be interesting to have executable DSLs with a meta-level that allows: (i) analyzing and (ii) adapting their domain models and constructs, or (iii) adapting the execution semantics. With such flexibility, language end users could adapt strategies in a DSL implementation, which would enable evolution of DSL semantics in the user domain, just as MOPs enable adapting OO semantics.
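To make the MOP idea concrete, the following is a miniature sketch (Python, hypothetical names; real MOPs such as the CLOS MOP [KRB91] are far richer): method dispatch is routed through a replaceable meta-object, so the dispatch strategy can be adapted in retrospect without touching the base objects.

```python
# Miniature meta-object protocol: each object delegates method dispatch to
# a meta-object; replacing the meta-object adapts the dispatch strategy.
# All names are hypothetical.

class MetaObject:
    """Default dispatch strategy: plain attribute lookup and call."""
    def dispatch(self, obj, name, *args):
        return object.__getattribute__(obj, name)(*args)

class TracingMetaObject(MetaObject):
    """An adapted strategy: trace every dispatch, then defer to the default."""
    def dispatch(self, obj, name, *args):
        print(f"dispatching {name}{args}")
        return super().dispatch(obj, name, *args)

class Base:
    meta = MetaObject()
    def send(self, name, *args):
        # The extensible part of the semantics: dispatch goes through meta.
        return type(self).meta.dispatch(self, name, *args)

class Point(Base):
    def __init__(self, x):
        self.x = x
    def shifted(self, dx):
        return self.x + dx

p = Point(1)
print(p.send("shifted", 2))       # 3, via the default meta-object
Point.meta = TracingMetaObject()  # adapt the semantics at the meta level
print(p.send("shifted", 2))       # traced, then 3
```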


3. Desirable Properties for Language Embeddings

This chapter discusses desirable properties for language embeddings. The proposed set of desirable properties has either been identified by related work or been identified as part of this thesis to address open problems of language embedding approaches. For each desirable property, we discuss one or more non-trivial scenarios that exemplify the need for the corresponding property. These scenarios have been found by studying the history of an existing language—the Logo programming language. The author has selected Logo because it is a small language that encompasses both general-purpose and domain-specific abstractions. The central question we will ask for each scenario is why the corresponding desirable property should be supported by a language embedding approach, and how the property helps in implementing new languages and extending existing ones in a modular and reusable manner. Parts of this chapter have been published in the following paper:

• Tom Dinkelaker. Review of the Support for Modular Language Implementation with Embedding Approaches. TUD-CS-2010-2396, Technische Universität Darmstadt, Germany, November 2010.

In this chapter, first, we give a general overview of the Logo programming language that will be used to exemplify the properties. Then, we discuss the properties and their corresponding scenarios.

The Logo Programming Language. The Logic Oriented Graphic Oriented language [FPB+70] (Logo for short)1 is a programming language that was designed for a constructive2 teaching of programming to children and non-professional programmers through interaction with a graphical system. The Logo environment involves a robot-like turtle creature with a pen, positioned on a canvas. The turtle can be directed to move and paint on the canvas through a set of simple commands (e.g., forward, left, right, and backward). Logo has also been used to develop simulations and physical experiments, and to create multimedia presentations. The first version of Logo (BBN Logo) was developed by Papert et al. [FPB+70] as a special dialect of the LISP programming language family. There is no standard syntax for Logo,

1Details about Logo's history can be found at: http://el.media.mit.edu/Logo-foundation/logo.
2The constructivist method is inspired by the Swiss psychologist Jean Piaget.

in contrast, there are more than 245 dialects3, which differ mostly in the platforms or in the language features supported. During its evolution, the different dialects and their versions have been extended with new language features to improve the level of abstraction and to provide new functionality. Example extensions are subroutines and functions [New73], object-oriented programming features4, support for concurrently drawing turtles5, as well as support for 3D graphics6.

Consider a dialect called SIMPLELOGO that defines the most important commands of Logo. SIMPLELOGO exposes characteristics of complex DSLs: (a) domain-specific literals, (b) domain-specific operations, and (c) domain-specific abstraction operators.

Domain-Specific Literals. The dialect defines domain-specific literals to refer to basic colors, namely: black, blue, red, green, yellow, white.

Domain-Specific Operations. The dialect defines the following operations for drawing:

• forward n: Moves the turtle by n units in the direction of the current orientation.

• backward n: Moves the turtle by n units in the opposite direction.

• right a: Turns the turtle by a degrees to the right changing its current orientation.

• left a: Turns the turtle by a degrees to the left.

Domain-Specific Abstraction Operators. The SIMPLELOGO dialect defines the domain-specific abstraction operator turtle that is used to compose a sequence of commands. The command turtle(name:s,color:c){ code } creates a new turtle on the canvas. The created turtle has the name s and draws the code sequence that is provided inside the curly brackets using the given color c.
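As a rough illustration of how such literals, operations, and abstraction operators might be embedded as a plain library, consider the following Python sketch (hypothetical names and representation; the embeddings developed in this thesis use a different host language and concrete syntax):

```python
# Hypothetical embedding of SIMPLELOGO as a Python library. The turtle
# abstraction operator takes the name, the color, and a code block
# (here, a function receiving the turtle).

# Domain-specific literals for the basic colors:
black, blue, red, green, yellow, white = (
    "black", "blue", "red", "green", "yellow", "white")

class _Turtle:
    def __init__(self, name, color):
        self.name, self.color = name, color
        self.heading, self.trace = 0, []
    # Domain-specific operations:
    def forward(self, n):  self.trace.append(("line", n, self.heading))
    def backward(self, n): self.trace.append(("line", -n, self.heading))
    def right(self, a):    self.heading = (self.heading + a) % 360
    def left(self, a):     self.heading = (self.heading - a) % 360

def turtle(name, color, block):
    """Domain-specific abstraction operator: run block with a fresh turtle."""
    t = _Turtle(name, color)
    block(t)
    return t

# Drawing the red square of Figure 3.1 with this embedding:
square = turtle("Square", red, lambda t: [
    (t.forward(50), t.right(90)) for _ in range(4)])
print(len(square.trace))  # 4 line segments
```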

The left side of Figure 3.1 shows a SIMPLELOGO program. It defines a turtle named "Square", initializes the turtle with the color red, and passes to it a sequence of commands enclosed in the curly brackets. The program draws a red square whose edges have a length of 50 units, as shown on the right side of the figure.

3.1. Extensibility

When there are new requirements for a language, i.e., the language evolves over time, a language developer needs to extend the language's implementation. To cope with changing requirements,

3An overview of various Logo dialects is given by the Logo Tree Project: http://elica.net/download/papers/LogoTreeProject.pdf, last visited (26. Nov 2010).
4For example, ObjectLogo is an OO-extension of Logo: http://www.digitool.com/ol-specs.html.
5For example, concurrency support is available in StarLogo: http://education.mit.edu/starlogo/.
6For example, three-dimensional drawings are possible in Logo 3D: http://sourceforge.net/projects/logo3d/.


turtle (name:"Square", color:red){
  forward 50
  right 90
  forward 50
  right 90
  forward 50
  right 90
  forward 50
  right 90
}

(a) The source code   (b) The output (line color is red)

Figure 3.1.: An example SIMPLELOGO program

language implementation approaches should support extensibility [Ste99]. In general, a language implementation approach is said to support extensibility if it allows the developers to extend their language implementations [Hud96, EH07a]. When looking at the history of Logo, new Logo infrastructures have been implemented from scratch again and again without reusing existing implementations, despite the broad similarity in the implementations of the basic language features across the various Logo dialects. In retrospect and w.r.t. reuse of implementation efforts, it would have been desirable if these dialects had been implemented in an extensible way, reusing available language features—not implementing them from scratch. Non-embedded language approaches have proposed various techniques for language extensibility [Hud96, Par93, NCM03, EH07a]. In these approaches, a base language7 is extended with new language features that form a so-called extension to the base [NCM03]. The most important benefit is that the language features of the base language need not be re-implemented in the extended language [vDKV00, MHS05]. When extending a language, we can distinguish two kinds of extensions w.r.t. what facet of the language is extended: extensions that add new keywords to the language's syntax, and extensions that do not add new keywords but extend the language's semantics. Each kind of extension is discussed in the following.

3.1.1. Adding New Keywords

An important form of language evolution8 is to add new language keywords, or respectively language constructs, to an existing language [MHS05]. Looking at the evolution history of Logo, this language started with a very limited set of keywords [FPB+70] (similar to SIMPLELOGO). Since its early days, more keywords for richer forms of interaction have been proposed, resulting in new versions of the Logo dialect.

7The term base language should not be confused with the term host language. A language extension is related to its base language, which is extended by the extension. In contrast, an embedded language is related to its host language, which hosts the library of the embedding.
8Language evolution is the process whereby one version of a language evolves into a new version, which involves adding, refining, and removing expression types of the language's syntax and semantics.


For such continuous language evolution, an implementation approach needs to support extensibility with which developers can build incremental extensions to existing languages for adding new language constructs [KAR+93, Ste99, Vis08]. Supporting this incremental extensibility is particularly important for domain-specific languages, because they evolve more frequently than general-purpose languages [MHS05]. When new keywords are frequently added to a language, it is beneficial if the language implementation approach supports incremental extensibility [MHS05, Vis08].

Example Scenario: In the following, we discuss three extensions to SIMPLELOGO that are motivated by existing Logo dialects. While from the perspective of an end user each extension has its own motivation, from the perspective of a language developer, extending SIMPLELOGO with each of these extensions is technically not very different. Although the three extensions are admittedly similar, we discuss them here, since they reflect the fact that there are often multiple extensions to a base language, and that such distinct paths of evolution need to be consolidated later. As the first example extension, consider a Logo dialect called EXTENDEDLOGO, which extends SIMPLELOGO with keywords found in many Logo dialects. An example is the keyword setpencolor, which allows selecting the current paint color for a turtle. The keyword home moves a turtle back to the initial position at the center of the canvas. The keyword clearscreen clears all painted pictures from the canvas. After using the keyword penup, the turtle no longer paints on the canvas when moving, until pendown is used. The keyword hideturtle allows hiding the physical body of the turtle (often displayed as a triangle), and showturtle shows the physical body again. This extended set of keywords enables more sophisticated interactions in the domain of turtle graphics. As the second example extension, consider adding shortcut commands to SIMPLELOGO. Shortcut commands are aliases for the corresponding verbose commands. A dialect called CONCISELOGO extends SIMPLELOGO with various shortcut keywords, such that forward can be written as fd, backward as bd, right as rt, and left as lt. Shortcut commands allow writing programs in a more concise way. For example, although the program version with shortcut commands in Listing 3.1 paints the same square, its code is more concise than its verbose version in Figure 3.1a.
As the third example extension, consider adding new abstraction operators, such as the repeat-loop construct that has been proposed by the UCBLogo9 dialect. The operator repeat (n){body} takes two parameters: an integer n that defines how often the code sequence body should be executed, and the body itself. Using an abstraction operator allows writing programs in a more abstract way by avoiding repetitive code. For example, the program in Listing 3.2 paints the square analogously to Figure 3.1, but it is written in fewer lines. When language extensions are introduced as new versions of a language, continuous evolution of the language demands incremental extensions. For example, a language developer can incrementally extend SIMPLELOGO by adding the new keywords from the EX-

9The Berkeley Logo (UCBLogo) Homepage: http://www.eecs.berkeley.edu/~bh/logo.html.


turtle (name:"Square", color:red){
  fd 50
  rt 90
  fd 50
  rt 90
  fd 50
  rt 90
  fd 50
  rt 90
}

Listing 3.1: A program using the shortcut commands of the CONCISELOGO dialect

turtle (name:"Square", color:red){
  repeat (4) {
    forward 50
    right 90
  }
}

Listing 3.2: A program using the repeat abstraction from the UCBLOGO dialect
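In an embedded setting, such incremental extensions could be sketched as subclasses that reuse the base implementation and merely add aliases and a new abstraction operator (illustrative Python sketch; all names are hypothetical):

```python
# Incremental extension by subclassing: CONCISELOGO adds shortcut aliases,
# and a UCBLogo-style repeat is layered on top, without re-implementing
# the base commands. Hypothetical names throughout.

class SimpleLogo:
    def __init__(self): self.commands = []
    def forward(self, n): self.commands.append(("forward", n))
    def right(self, a):   self.commands.append(("right", a))

class ConciseLogo(SimpleLogo):
    # Shortcut keywords are plain aliases for the inherited commands.
    fd, rt = SimpleLogo.forward, SimpleLogo.right

class CompleteLogo(ConciseLogo):
    # A new abstraction operator on top of the reused base.
    def repeat(self, n, body):
        for _ in range(n):
            body(self)

logo = CompleteLogo()
logo.repeat(4, lambda t: (t.fd(50), t.rt(90)))
print(len(logo.commands))  # 8
```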

TENDEDLOGO, UCBLOGO, and CONCISELOGO extensions. By chaining the three extensions to extend each other, they can be integrated into one dialect, called COMPLETELOGO, that encompasses all keywords.

Requirements: To cope with continuous language evolution, a language implementation approach should support incremental extensions of syntax and semantics. It is important that incremental extensions support not only adding new keywords but also overriding existing ones. Only when there is an extension mechanism that enables language developers to precisely select which parts of existing language implementations they want to reuse can a language implementation approach minimize the developers' implementation efforts for extending languages.

3.1.2. Semantic Extensions

In contrast to adding new keywords, there is a need for semantic extensions that do not alter the syntax of a language, but only extend the language's semantics [KRB91, HORM08]. When extending a language's semantics, we can distinguish two classes of semantic extensions w.r.t. whether the existing semantics are preserved: conservative and non-conservative extensions. A conservative semantic extension does not change the meaning of existing language constructs in its base language. When the meaning of existing constructs is preserved, programs written in the base language still produce the same results when they are evaluated using the extended language. In other words, conservative extensions maintain backward compatibility. In contrast, non-conservative extensions can alter the meaning of existing constructs in their base language. Therefore, when programs written for the base language are evaluated using the non-conservative extension, the evaluation can produce different results. In that case, there is no


operational equivalence [Fel90] between the extension and its base. The next two subsections discuss conservative (Section 3.1.2.1) and non-conservative (Section 3.1.2.2) extensions.

3.1.2.1. Conservative Semantic Extensions

To support backward compatibility for end user programs, it is desirable for a language implementation approach to support semantic extensions that do not change existing semantic invariants of their base languages.

Example Scenario: Consider a conservative extension that evolves Logo from sequential to parallel execution of commands. Since traditionally Logo commands are executed sequentially, COMPLETELOGO evaluates the code in Listing 3.3 in one sequence. The program draws a picture with 4 squares (with size 50 at the x/y-positions 0/0, 100/0, 0/100, and 100/100), whereby one turtle after the other paints the commands in its body in the sequence the commands are defined.

turtle (name:"Square1", color:red){
  // stay at (x=0,y=0)
  repeat (4) { forward 50; right 90; }
}

turtle (name:"Square2", color:red){
  penup(); right 90; forward 100; left 90; pendown(); // goto (x=100,y=0)
  repeat (4) { forward 50; right 90; }
}

turtle (name:"Square3", color:red){
  penup(); forward 100; right 90; forward 100; left 90; pendown(); // goto (x=100,y=100)
  repeat (4) { forward 50; right 90; }
}

turtle (name:"Square4", color:red){
  penup(); forward 100; pendown(); // goto (x=0,y=100)
  repeat (4) { forward 50; right 90; }
}

Listing 3.3: A Logo program with four turtles

Today, concurrent versions of Logo10,11 have been proposed [Res90]. Therefore, consider an evolution of SIMPLELOGO that extends the turtle operator to execute the command sequences of all defined turtles in parallel. In this case, it is desirable that a language developer can implement a concurrent version CONCURRENTLOGO as an extension to SIMPLELOGO. While reusing the implementation of the color and drawing operations, the language developer only needs to refine the turtle abstraction operator of the SIMPLELOGO implementation, so that the enclosed command sequence is executed in a separate thread. Note that while CONCURRENT-

10The StarLogo Homepage: http://education.mit.edu/starlogo/. 11The NetLogo Homepage: http://ccl.northwestern.edu/netlogo/.


LOGO semantically extends SIMPLELOGO, this extension does not change the base language syntax, as no additional commands are defined. For example, when executing the same code from Listing 3.3 with CONCURRENTLOGO, the turtles will paint the four squares concurrently.

Requirements: When providing conservative semantic extensions, language developers want to be sure that extensions preserve all semantic invariants. E.g., when executing a program in an extension that parallelizes execution, it must be ensured that the outcome of the program does not change. In case turtles with different pen colors paint concurrently, a parallel execution must ensure that the program results in the same picture as if it had painted sequentially. It is not trivial for a CONCURRENTLOGO implementation to ensure the correct order between concurrent paint operations. Since the turtles act autonomously, it is hard to predict the actual path of a turtle, and paths of different turtles may cross each other. When paths cross, painting them in different orders can result in differences in the output. Such an incorrect parallelization violates the expected sequential painting semantics of Logo. To ensure correct semantics, it is necessary to detect, prevent, or resolve violations of the semantic invariants.

3.1.2.2. Semantic Adaptations

There are special requirements for language implementations to be open for extensions in the user domain [KRB91]. This is particularly interesting in two special cases. First, when the user's requirements for a language are not exactly known before delivering the language implementation to the user's domain. Second, when the requirements for a language are expected to change late, that is, after the language implementation has been delivered, e.g., at runtime of a program.
Since a language designer cannot foresee all possible end user requirements for such a language, its language implementation should be designed according to the open implementation principle [Kic96], which allows semantically adapting a language's implementation in the user domain by exchanging parts of its implementation strategies. To provide support for adapting languages, variability needs to be built into language implementations. We call this variability of languages in the user domain late variability. Having support for such late variability in a language has a similar motivation as having support for late variability in other software systems with changing requirements [vG00, VGBS01]. The customizability enabled through late variability in languages would enable better reuse and extensibility of language implementations in different end user domains. Late variability enables user-specific extensions to be provided even at runtime. Unfortunately, existing language embedding approaches do not address late variability in their embeddings. Existing language embedding approaches use black-box extension mechanisms of their host languages, such as functions [Hud96, HORM08], types [OSV07, HORM08, Gar08], traits [HORM08], and macros [Pes01], but these extension mechanisms do not adequately support late variability in language implementations. With them, language embeddings can be extended, but there are several problems. First, black-box extension mechanisms often do not support adapting the internals of an embedding to a new requirement [Hud96]. Therefore, language developers may have to invasively change the source code of the language implementation, leading to copy-and-paste solutions [Par08] with repetitive development efforts. For ex-

ample, when using generative programming for the embedding [Gar08] and the developer adds a new constraint to the language, the developers have to regenerate the entire source code of the language implementation. Second, in [OSV07, HORM08], it is a problem when the developers have to change the language implementation in multiple places. For example, the code that enforces a constraint in the language implementation must be repeated for the execution semantics of every syntactic abstraction (e.g., for every command), which leads to unwanted repetitive code that is scattered and tangled with the code implementing the default execution semantics. Third, there are approaches [Hud96, Gar08] that do not allow adapting the language after its implementation has been compiled. For example, it is not possible for a language developer to add or customize constraints after the language has been compiled and delivered to the end user domain. For better flexibility, it would be desirable to support late variability in a systematic way.

Example Scenario: When a language is used in a special context, there can be the requirement that a special constraint must be enforced. For example, when using CONCURRENTLOGO in a physical domain in which there are plotting robots on a canvas, we need to detect and prevent collisions of plotting robots before damage can happen. If collisions are not detected by the default language implementation, the language end user cannot use this implementation out-of-the-box. A language developer could extend a language implementation with semantics that enforce constraints. But extending a language for a single customer is problematic, since there are additional maintenance costs for each extension delivered to some customer. When single users have a special requirement, it is better to deliver an open implementation [Kic96] of the language so that a developer can customize it for specific needs.
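The open-implementation idea can be sketched as follows (Python; all names are hypothetical): the language exposes a replaceable movement strategy behind its primary interface, so an expert can later install a collision-checking strategy in the user domain without touching the language implementation itself.

```python
# Late variability via a swappable strategy: the primary interface is
# forward(), the meta-interface is the move_strategy slot. Hypothetical
# names; a one-dimensional position stands in for the canvas.

class DefaultMove:
    def move(self, turtle, n):
        turtle.pos += n

class CollisionCheckedMove(DefaultMove):
    """A user-domain customization: refuse moves onto occupied cells."""
    def __init__(self, occupied):
        self.occupied = occupied
    def move(self, turtle, n):
        if turtle.pos + n in self.occupied:
            raise RuntimeError("collision prevented")
        super().move(turtle, n)

class Turtle:
    move_strategy = DefaultMove()   # replaceable implementation strategy
    def __init__(self): self.pos = 0
    def forward(self, n):
        type(self).move_strategy.move(self, n)

t = Turtle()
t.forward(50)                       # default semantics
Turtle.move_strategy = CollisionCheckedMove(occupied={100})
try:
    t.forward(50)                   # would land on an occupied cell
except RuntimeError as e:
    print(e)                        # collision prevented
print(t.pos)                        # 50 — the unsafe move was refused
```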
Therefore, consider customizing the language implementation of CONCURRENTLOGO such that it detects and prevents collisions between turtles in concurrent programs. A collision is defined to happen whenever more than one turtle is at the same position and the turtles have their pens down to paint on the canvas. Collisions in CONCURRENTLOGO programs are likely to happen when lines of the figures cross each other. Another problem with enforcing such a domain-specific constraint is that it cannot be easily implemented as an incremental extension reusing the existing implementation, since it has to be enforced all over the language implementation. For example, before moving a turtle on the canvas, it has to be determined whether moving the turtle will result in a collision. If a potential collision is detected, it must be prevented, e.g., by deferring the move operation of the turtle until the turtle that is in the way has moved on. The problem with adding code that enforces such a constraint is that language developers must adapt all moving commands. When the logic of an extension crosscuts several parts of its base language implementation, incremental extensions cannot localize this logic; consequently, the maintenance costs increase.

Requirements: To address these problems, a language implementation approach should provide an extension mechanism that systematically enables open implementations of languages. On the one hand, a language developer needs to design a language in an open way, e.g. by al-

lowing strategies in the language implementation to be replaced. On the other hand, an expert who knows the end users' requirements needs to customize the language implementation for end users, e.g. by replacing a default strategy with a more specific one.

3.2. Composability of Languages

When there are diverse requirements by groups of end users in different application domains, this often motivates having a specialized language for each domain. For better maintainability, as motivated in the previous section, it should be possible for specialized languages to evolve independently from each other in a hierarchy of extensions. But, since application domains of independent languages often overlap, language developers want to reconcile two or more specialized languages into one [BV04, OSV07, Par08]. Unfortunately, incrementally extending one of the specialized languages with the other languages is not adequate for composing them, because the multiple extended languages share similar constructs, which leads to code duplication in the extension implementation [Par08]. To address this limit of hierarchical extensibility, several language implementation approaches have been proposed that support composability of languages [EVI05, KL05, Cle07]. For overlapping domains, the language developer can decompose languages into smaller reusable sublanguages, from which one can compose new languages. We refer to such a composed language as a composite language, and we refer to the sublanguages as the constituent languages. When composing languages, the advantage is that language developers need to develop each constituent language only once and that the implementations of constituent languages can be shared among several composite languages. But, with sharing, there can be interactions between languages, which can lead to conflicts that must be prevented.
Therefore, in the following, we discuss composing languages with and without interactions.

3.2.1. Composing Languages without Interactions

At times problem domains overlap, i.e., the same language constructs are used in several languages for different domains. In such scenarios, it is desirable to reuse those constructs of the language implementations. To make constructs reusable, a developer can partition language constructs into exclusive and shared sets, whereby the developer implements the semantics for each set of language constructs in a language component. In many cases, the language constructs do not have any syntactic and semantic interactions; hence, developers can implement the components independently.
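Such interaction-free composition can be sketched as mixing independently implemented language components into one composite (a hedged Python sketch; all names are hypothetical):

```python
# Each constituent contributes one exclusive set of constructs; the
# composite language simply combines them. Since the construct sets are
# disjoint, there are no interactions to resolve. Hypothetical names.

class DrawingComponent:
    """Constituent 1: drawing constructs."""
    def forward(self, n):
        return f"draw {n}"

class MathComponent:
    """Constituent 2: mathematical constructs."""
    def square(self, x):
        return x * x

class MathLogo(DrawingComponent, MathComponent):
    """Composite language: constructs of both constituents."""
    pass

lang = MathLogo()
print(lang.forward(lang.square(5)))  # draw 25
```

The same MathComponent could equally be mixed into a business-rules composite, which is the sharing argument made above.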

Example Scenario: Assume there is a stand-alone language for Logo with its domain of turtle graphics, and another stand-alone DSL BUSINESSRULESDSL for writing business rules. Further, there are two similar extensions to these languages (e.g., MATHLOGO and MATHBUSINESSRULESDSL) that use the same constructs for a mathematical domain. Both extensions provide pre-defined operations (e.g., sin α) and abstraction operators for defining custom functions (e.g., define and ). The mathematical domain overlaps with the other two, since mathematical operations help drawing figures in Logo and mathematical operations help writing business rules. Because of

37 3. Desirable Properties for Language Embeddings

this overlap, it would be interesting to make the shared mathematical operations of MATHLOGO likewise available in MATHBUSINESSRULESDSL. Technically, hierarchical extensibility allows extending the two languages, but even if the extensions are identical for both base languages, they need to be implemented twice, which leads to code duplication. Instead of developing multiple extensions (e.g., the two mathematical extensions, one for each base language), the language developers can implement a reusable stand-alone language (e.g., MATHDSL, a stand-alone maths language) and compose it with other stand-alone languages (e.g., Logo composed with MATHDSL, and BUSINESSRULESDSL composed with MATHDSL). Therefore, consider composing Logo with the mathematical operations from the independently developed language FUNCTIONAL – a subset of MATHDSL. We first introduce FUNCTIONAL, and then we explain its composition with Logo, called FUNCTIONALLOGO. The FUNCTIONAL language provides a keyword define12 to define functions by associating a name with a lambda expression. Further, FUNCTIONAL defines a keyword apply to call such functional abstractions by their name. Listing 3.4 shows a program in FUNCTIONAL that defines a subroutine. The command define(name:routineName) { paramList... -> body } in line 1 defines a subroutine "pow2" that takes a number x and returns its square value. E.g., in line 4, applying "pow2" to 10 returns 100.

1 define(name:"pow2") { x ->
2   x * x
3 }
4 apply("pow2",10)

Listing 3.4: A FUNCTIONAL program defining a function
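One plausible semantics for define and apply is a registry that binds names to closures; apply looks a closure up by name and calls it. The following Python sketch illustrates only the idea (the Python rendering and the names are our assumptions, not the embedding used in the listing):

```python
# A sketch of FUNCTIONAL's define/apply semantics: define binds a name
# to a closure in an environment; apply looks the closure up and calls it.
functions = {}

def define(name, body):
    functions[name] = body

def apply_function(name, *args):
    return functions[name](*args)

define("pow2", lambda x: x * x)
print(apply_function("pow2", 10))  # -> 100
```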

A composition of the above languages enables functional abstractions in Logo. This composition is natural, as many existing Logo dialects allow defining subroutines. In the resulting composite language FUNCTIONALLOGO, the end users can use both sets of abstractions—for drawing and for defining functions. As an example program in FUNCTIONALLOGO, consider the code in Figure 3.2a that uses abstractions from COMPLETELOGO and FUNCTIONAL. In line 1, the command define is used to define a function named "square" that paints a square figure with a variable edge length passed to the function as the parameter length. The function can be called using the command apply, which takes as its first parameter the function's name and a (variable-length) list of parameters passed on to the function. In the code in Figure 3.2a, the turtle paints a blue square (line 10) and a green square (line 13) using the function, as illustrated in Figure 3.2b.

12 The define keyword is inspired by Scheme [SS98].


1 define(name:"square") { length ->
2   repeat (4) {
3     forward length
4     right 90
5   }
6 }
7
8 turtle (name:"TwoSquares", color:red){
9   setpencolor blue
10  apply("square")(50)
11  right 180
12  setpencolor green
13  apply("square")(100)
14 }

(a) The source code  (b) The output (lower left square is green, upper right is blue)
Figure 3.2.: A FUNCTIONALLOGO program defining and using a function

Requirements: To compose stand-alone languages, such as COMPLETELOGO and FUNCTIONAL, a language developer needs to compose their syntax and semantics. For the sake of reusability, the language developer likely wants to reuse the existing implementations of the constituent languages for implementing the composite language.

3.2.2. Composing Languages with Interactions

When composing languages, the language developer has to compose their syntax and semantics in a correct way so that expressions have a well-defined meaning in the composed language. In case the languages' syntax or semantics are not orthogonal to each other, the languages cannot be composed straightforwardly. This is because there can be syntactic and semantic interactions, as we will discuss in the following.

3.2.2.1. Syntactic Interactions

For composing the syntax of constituent languages, the syntax of each embedded language must be integrated into the composite language in a consistent way. But when the syntax of one language is incompatible with the syntax of another language, there is a syntactic conflict. Such conflicting languages cannot be composed directly.

Example Scenario: For example, consider composing two non-orthogonal languages: FUNCTIONAL with another language STRUCTURAL that allows defining data types. Say the two languages have been developed independently, but accidentally both languages use the keyword define. While FUNCTIONAL uses define to define new behavioral abstractions, STRUCTURAL uses define to define new structural abstractions. Further, consider the program in Listing 3.5 that uses the ambiguous keyword define in lines 1 and 4 with two different intentions. There is a syntactic conflict, since the two languages' sets of keywords are not disjoint. Because of their conflicting syntax, it is not clear which semantics the keyword define refers to—does the end user in line 1 (resp. line 4) want to define a function or a data type?


1 define(name:"tuple") { x = Integer; y = Integer; }
2 def t1 = init (name:"tuple",x:3,y:5);
3 ...
4 define(name:"tupleSquare") { tuple ->
5   return init (name:"tuple", x:(tuple.x * tuple.x), y:(tuple.y * tuple.y));
6 }
7 ...

Listing 3.5: A program that uses the ambiguous keyword define

Requirements: To support composing languages, it is desirable that an implementation approach can detect, resolve, and prevent syntactic conflicts in language compositions. In case of a conflict, e.g. when two languages define the same keyword, it is not clear which language implementation is responsible for evaluating the keyword. There needs to be a mechanism that helps developers to declare a resolution of such syntactic interactions.
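A generic detection mechanism only needs the keyword sets of the constituent languages. The following Python sketch (an illustrative assumption on our part, with a hypothetical API) reports every keyword claimed by more than one language:

```python
def detect_syntactic_conflicts(*languages):
    """languages: (name, keyword set) pairs; returns keyword -> claiming languages."""
    owner, conflicts = {}, {}
    for name, keywords in languages:
        for keyword in keywords:
            if keyword in owner:
                conflicts.setdefault(keyword, {owner[keyword]}).add(name)
            else:
                owner[keyword] = name
    return conflicts

# The FUNCTIONAL/STRUCTURAL scenario from above:
conflicts = detect_syntactic_conflicts(
    ("FUNCTIONAL", {"define", "apply"}),
    ("STRUCTURAL", {"define", "init"}),
)
print(conflicts)  # reports 'define' as claimed by both languages
```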

3.2.2.2. Semantic Interactions

When composing sublanguages, their semantics need to be composed correctly, so that expressions in the composed language always have a well-defined meaning. When non-orthogonal semantics of sublanguages are composed, the evaluation of a language construct in one language may affect the evaluation of a construct in another language. In this case, we speak of a semantic interaction between the constituent languages. When composing non-orthogonal sublanguages, it is not trivial to create a composed language from existing implementations [KL05, KL07b], since interactions between the constituent languages can be unintended. If an interaction is unintended, it can lead to unintended composition semantics; in that case, we speak of a semantic conflict.

Example Scenario: For example, consider composing the three languages FUNCTIONAL, STRUCTURAL, and CONSOLE. FUNCTIONAL and STRUCTURAL both come with an environment. In the environment of FUNCTIONAL, names are bound to lambda expressions, and in the environment of STRUCTURAL, names are bound to data types. In CONSOLE, there is an expression writeln id that takes the identifier of an object and prints out a string representation of that object. In a composition of the three languages, language end users can use functional and structural abstractions to define abstract data types (ADTs), where a set of functions is defined on data types. When composing the language implementations, it must be possible for the implementation of one language in the composition to access the environments of the other language implementations.

Note that, in such a composition, there are semantic interactions, e.g. in the implementation of writeln. E.g., consider the dependency that, for implementing writeln, the CONSOLE language must access the environments of the FUNCTIONAL and STRUCTURAL languages in order to read their objects. When a language developer identifies a dependency between constituent languages, the developer may need to extend the language implementations (e.g. FUNCTIONAL and STRUCTURAL in order to access their environments). Further, there can be semantic conflicts, e.g. when a program defines a function and a type with the same name and an abstract data type definition uses that name, because it becomes unclear what the name refers to. E.g., consider the following example conflict. There is an ADT list with elements and a function sort that takes a list and returns a list with sorted elements. In a different part of the application, there is a data type sort that models a partition of sets of elements into different classes. When the program uses the expression writeln "sort", it is unclear whether the identifier "sort" refers to the sort function or to the sort type definition. When a language developer identifies such a semantic conflict between the two constituent languages FUNCTIONAL and STRUCTURAL in their composition specification, the developer cannot reuse the stand-alone implementations of the two languages directly. The conflicting constituent languages FUNCTIONAL and STRUCTURAL can still be composed by tailoring a well-defined composition of their implementations. E.g., the composite language can have the special semantics that the identifiers of all composed languages must be unique. Note that such a composition must enforce unique names in all constituent languages. The composition prevents the conflict, since it is no longer possible to define a function and a data type with the same name.
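The unique-name composition just described can be sketched as follows. The Python rendering and all names are illustrative assumptions, not the composition mechanism of this thesis: the composite keeps one environment per constituent language and rejects a binding whose name is already taken in another constituent language.

```python
# A sketch of a conflict-preventing composition: one environment per
# constituent language, with globally unique names enforced at bind time.
class CompositeEnvironment:
    def __init__(self, *language_names):
        self.environments = {name: {} for name in language_names}

    def bind(self, language, name, value):
        for other, env in self.environments.items():
            if other != language and name in env:
                raise NameError(f"'{name}' is already bound in {other}")
        self.environments[language][name] = value

    def lookup(self, name):
        for env in self.environments.values():
            if name in env:
                return env[name]
        raise NameError(name)

env = CompositeEnvironment("FUNCTIONAL", "STRUCTURAL")
env.bind("FUNCTIONAL", "sort", lambda xs: sorted(xs))
# env.bind("STRUCTURAL", "sort", "a data type")  # would raise NameError
print(env.lookup("sort")([3, 1, 2]))  # -> [1, 2, 3]
```

With unique names, a lookup such as the one performed by writeln is always unambiguous, which is exactly what resolves the sort conflict above.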

Requirements: For composing languages with semantic interactions, language embedding approaches should allow intended interactions and prevent unintended ones. For conflict cases, approaches should allow detecting, preventing, or resolving semantic conflicts that can lead to incorrect compositions with unintended execution semantics. When a composition implementation needs to enforce constraints in the constituent languages, often the composition needs to adapt the constituent language implementations.

In general, one cannot assume that language implementations that have been designed and implemented in isolation are designed with compositions in mind. When two languages are composed, there always can be subtle side effects in parts of the semantics that lead to unintended interactions. Despite this, developers can extend the implementations of all independent languages by specializing them for such a composition. Each extension needs to provide an explicit interface that allows defining how existing implementations can interact with it. When extending the constituent language implementations for a composition, each constituent language should control what parts of its context it exposes to other languages and what parts of the context of other languages it takes into account for its own semantics [KL05].

3.3. Enabling Open Composition Mechanisms

For one particular composition scenario, often its constituent languages can be composed in a pre-defined way. To facilitate implementing such scenarios, a language implementation approach should support a declarative composition of languages. For each scenario and type of interaction, a composition mechanism makes special requirements and assumptions about the kinds of languages it can compose and the interactions it can handle. Only

when all requirements and assumptions of a mechanism are met in a composition scenario can language developers use the mechanism for composing the languages. Since the set of possible scenarios is open, there can be special requirements and assumptions in new composition scenarios. Therefore, developers should be allowed to define new composition mechanisms. Since there is a cost associated with defining a new mechanism, the existing composition mechanisms should be extensible in order to foster reusing common parts of the composition logic. The following sections motivate language-composition mechanisms that help control syntactic interactions in Section 3.3.1, and semantic interactions in Section 3.3.2.

3.3.1. Open Mechanisms for Handling Syntactic Interactions

Composition mechanisms that allow controlling syntactic interactions can be classified according to how conflicts are handled. There are composition mechanisms that enforce that a composition must be conflict-free, discussed in Section 3.3.1.1. Further, there are composition mechanisms that resolve conflicts in a certain way, namely by renaming conflicting keywords, discussed in Section 3.3.1.2, and by prioritizing the expression types of the languages in a composition, discussed in Section 3.3.1.3.

3.3.1.1. Generic Mechanism for Conflict-Free Compositions

When independent languages are composed, a language implementation approach should prevent syntactic conflicts [BV04, KL07b].

Example Scenario: For example, let us compare two of the aforementioned composition scenarios. When composing SIMPLELOGO and FUNCTIONAL, the composition FUNCTIONALLOGO is syntactically correct because the two languages have disjoint sets of keywords. In contrast, when composing FUNCTIONAL and STRUCTURAL, there is a syntactic conflict because both languages use the keyword define.
Requirements: To support safe compositions of independent languages, there should be a composition mechanism that automatically detects syntactic conflicts and provides feedback to the language developer. E.g., when trying to compose FUNCTIONAL and STRUCTURAL, the mechanism should report the conflicting keyword define. To allow language developers to compose various languages in conflict-free compositions, the composition mechanism should be generic, i.e. the composition logic that detects conflicts should not be specific to particular languages.

For cases in which there are special requirements on a language composition, the generic composition mechanism should be open for extensions. When a language developer composes several languages, the developer may decide to restrict the lexical regions in which certain keywords can be used. E.g., when a developer composes FUNCTIONAL and COMPLETELOGO, the developer may declare that it is forbidden to use the COMPLETELOGO keywords in a function body. Preventing the use of COMPLETELOGO keywords in functions can be desirable because these keywords produce side effects and make functions impure. If a program uses a lexically restricted keyword in a wrong lexical region, we refer to this as a context-sensitive syntax conflict. To prevent context-sensitive syntax conflicts, the language developer must have the possibility to specialize the generic composition logic to handle those conflicts by taking into account the keywords' contexts and the constituent languages.

3.3.1.2. Supporting Keyword Renaming

When languages with syntactic conflicts are composed, a language implementation approach can resolve such conflicts by adjusting parts of the syntax for a composition, e.g. by overriding one of the conflicting expression types [Cor06].

Example Scenario: For example, for a meaningful composition of FUNCTIONAL and STRUCTURAL, the language developer can rename the define keywords. The developer can declare to rename the conflicting keywords into defun for FUNCTIONAL and make for STRUCTURAL, so that using these renamed keywords in programs is unambiguous.

Requirements: To support composing languages with syntactic conflicts, there should be a composition mechanism that enables the language developer to declaratively resolve the interactions by changing the conflicting parts of the syntax. When adjusting parts of the syntax, one has to keep in mind that the composite language may not be backward compatible with the old syntax. Existing programs are likely to stop working until their expressions' keywords have been renamed to the new syntax.

3.3.1.3. Supporting Priority-Based Conflict Resolution

Often, language extensions are implemented independently from each other, yet the language developers can still plan for composing the extensions with the base. Composing several extensions to the same base language is easier than composing stand-alone languages, because the extensions can rely on the same base-language theorems. When extensions have a common base, even when there are interactions, conflicts are less frequent. Therefore, in such compositions, it is often sufficient to resolve conflicts using a priority.

Example Scenario: For illustration, assume that UCBLOGO, CONCISELOGO, and EXTENDEDLOGO are developed as independent extensions. But in contrast to incrementally adding each extension into a chain to create COMPLETELOGO, as proposed in Section 3.1.1, assume that each extension directly extends SIMPLELOGO and we do not want to build additional incremental extensions to those extensions. In this case, it would be more economical to compose COMPLETELOGO from the three extensions CONCISELOGO, UCBLOGO, and EXTENDEDLOGO, so that we can use e.g. loop abstractions and shortcuts together in DSL programs, as shown in Listing 3.6. In such a composition, language developers likely want to reuse the existing implementations of these extensions.

Requirements: When there are multiple extensions to a shared base language, a language implementation approach should support composing the implementations of those extensions [KL07b, EH07a]. In contrast to composing stand-alone languages, interactions between extensions and their base language are easy to resolve. For example, the expression types of the base language are available in all extensions of SIMPLELOGO, e.g. forward is available in UCBLOGO, CONCISELOGO, and EXTENDEDLOGO. In conflict-free stand-alone languages, such common


1 turtle (name:"Square", color:red){
2   repeat (4) {
3     fd 50
4     rt 90
5   }
6 }
7 ...

Listing 3.6: An excerpt of a COMPLETELOGO program that uses the shortcuts of CONCISELOGO and the repeat-loop construct of UCBLOGO

keywords are disallowed for a conflict-free composition, since the keyword semantics would not be well-defined. In contrast, because the extensions have a common base, sharing the common keywords of the base language is not a conflict, since the interaction can be resolved. A possible resolution to compose languages that have syntactic interactions is to prioritize the constituent languages, which always uses the keyword semantics of the language with the highest priority. Another approach is to declare an individual priority for each expression type. By ordering the expression types, they become unambiguous again.
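Priority-based resolution can be sketched in a few lines. The Python code below is an illustrative assumption (the composition of SIMPLELOGO extensions in this thesis works differently): the constituent languages are ordered, and a shared keyword resolves to the semantics of the highest-priority language that defines it.

```python
def compose_with_priorities(ordered_languages):
    """ordered_languages: (name, {keyword: semantics}) pairs, highest priority first."""
    def resolve(keyword):
        for name, keywords in ordered_languages:
            if keyword in keywords:
                return name, keywords[keyword]
        raise NameError(keyword)
    return resolve

# Hypothetical composition of two SIMPLELOGO extensions:
resolve = compose_with_priorities([
    ("CONCISELOGO", {"fd": "move-forward", "rt": "turn-right"}),
    ("UCBLOGO",     {"repeat": "loop", "fd": "move-forward"}),
])
print(resolve("fd"))      # -> ('CONCISELOGO', 'move-forward')
print(resolve("repeat"))  # -> ('UCBLOGO', 'loop')
```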

3.3.2. Open Mechanisms for Handling Semantic Interactions

When semantics are non-orthogonal, it is not straightforward for a language developer to compose constituent languages. Therefore, it is desirable to have composition mechanisms that support the developer in such scenarios by providing common logic for handling particular kinds of semantic interactions. Semantic interactions can be generic or specific. To illustrate these two, this section motivates two important examples of such semantic interactions. In Section 3.3.2.1, the first example – crosscutting composition – is a kind of semantic interaction that is useful for combining various languages, and therefore it asks for a general mechanism to be provided by the DSL approach, which is independent of the particular languages to be composed. The second example, in Section 3.3.2.2, resolves a composition conflict between two particular languages, and therefore it asks for a specific solution, which works only for those two particular languages. For compositions of semantically interacting languages, a DSL approach should allow developers to provide both generic and specific composition mechanisms.

3.3.2.1. Generic Mechanism for Crosscutting Composition of DSLs

Existing language embedding approaches focus on composition scenarios where the use of abstractions from one domain does not affect the evaluation of abstractions from another domain. We refer to such non-interacting compositions as black-box compositions, since they compose languages using black-box abstractions. The problem with black-box compositions is that, when multiple DSLs with crosscutting concerns are composed, programs exhibit scattering and tangling symptoms, as elaborated below.


Example Scenario: Consider a black-box composition scenario in which the language semantics to be composed are orthogonal to each other. We show that, if only black-box composition is supported, programs that span several domains might exhibit scattering and tangling symptoms if the domain semantics are crosscutting. The Logo extension TRACINGLOGO allows recording the trace of turtles, which enables end users to debug Logo programs. The extension provides an operation trace() that prints out the most relevant information about the current turtle, encompassing its name, its x/y position on the canvas, and its color. Further, the extension provides another operation show str to write a string to the console.

1 define(name:"polygon") { length, edges ->
2 # show "begin of evaluation of function 'polygon'"
3 # trace()
4
5 # i = 0;
6   repeat (edges) {
7 #   show "begin of "+i+"-th iteration of repeat"
8 #   trace()
9
10 #  show "forward "+length
11    forward length
12 #  trace()
13
14 #  show "right "+360/edges
15    right (360/edges)
16 #  trace()
17
18 #  show "end of "+i+"-th iteration of repeat"
19 #  trace()
20  }
21
22 # show "end of evaluation of function 'polygon'"
23 # trace()
24 }
25
26 turtle (name:"Octagon",color:green){
27 # show "begin of evaluation of turtle 'Octagon'"
28 # trace()
29
30   polygon 8, 25
31
32 # show "end of evaluation of turtle 'Octagon'"
33 # trace()
34 }

Listing 3.7: A program with a tracing concern

The code presented in Listing 3.7 records the trace that is shown in Listing 3.8. Although the tracing concern is implemented using the domain-specific keywords, the concern is not localized at a single region in the code. As indicated by the lines highlighted with "#", the tracing logic is scattered over the function and turtle abstractions, and tangled with the functional concern of drawing the octagon.

1 begin of evaluation of turtle 'Octagon'
2 Octagon: x=0, y=0, orientation=0, color=green, ...
3 before applying function 'polygon'
4 begin of evaluation of function 'polygon'
5 Octagon: x=0, y=0, orientation=0, color=green, ...
6 begin of 1-th iteration of repeat
7 Octagon: x=0, y=0, orientation=0, color=green, ...
8 forward 25
9 Octagon: x=0, y=25, orientation=0, color=green, ...
10 right 45
11 Octagon: x=0, y=25, orientation=45, color=green, ...
12 end of 1-th iteration of repeat
13 ...
14 begin of 8-th iteration of repeat
15 ...
16 end of 8-th iteration of repeat
17 end of evaluation of function 'polygon'
18 Octagon: x=0, y=0, orientation=0, color=green, ...
19 Octagon: x=0, y=0, orientation=0, color=green, ...
20 after applying function 'polygon'
21 end of evaluation of turtle 'Octagon'
22 Octagon: x=0, y=0, orientation=0, color=green, ...

Listing 3.8: An excerpt of a program trace

Requirements: The scattering and tangling symptoms are not restricted to the above DSL; other DSLs suffer from these problems as well. Examples are workflow languages [CM04] (e.g., BPEL [AAB+07]), query languages [Alm] (e.g., SQL [DD89]), grammar specifications [RMHP06, RMH+06] (e.g., BNF or SDF2 [Vis97b]), and languages for modeling finite state machines [Zha06]. Although scattering and tangling is a general problem in DSLs, there is little research on aspect-oriented programming for DSLs.

To modularize crosscutting concerns in DSL programs, a language implementation approach needs to support compositions of DSLs in which the use of abstractions from one domain interacts with the evaluation of abstractions from another domain. In such a language composition, the evaluation of expressions in one language (i.e. its evaluation protocol) is no longer a black box for the other languages. Therefore, we refer to such a composition scenario as a crosscutting composition. For example, consider a crosscutting composition of the above DSLs by adding aspect-oriented concepts to Logo in a dialect called AO4LOGO. Listing 3.9 shows an aspect-oriented version of the program. The AO program implements the same functional concern and tracing concern as in Listing 3.7. But the AO program does not suffer from scattering and tangling, since the code that implements the tracing concern is localized between lines 1–13, and is not mixed with the functional concern between lines 15–24. Such a DSL with aspect-oriented concepts is called a domain-specific aspect language (DSAL). We will elaborate on the details of DSAL programs in Chapter 7 and Section 10.3.2.2. Enabling such crosscutting compositions of DSLs requires a composition mechanism that allows declaratively specifying where and how languages may interact. To allow language developers to reuse composition logic that is common to compositions of various dependent languages, the composition mechanism should be generic. Still, for cases in which there are special requirements, the generic mechanism should be configurable for a particular scenario that involves several semantically non-orthogonal languages.

1 aspect(name:"TracingAspect") {
2   before (papply()) { show "begin of evaluation of function '$name'"; trace(); }
3   after (papply()) { show "end of evaluation of function '$name'"; trace(); }
4
5   before (prepeat()) { show "begin of $iteration-th iteration of repeat"; trace(); }
6   after (prepeat()) { show "end of $iteration-th iteration of repeat"; trace(); }
7
8   before (pmotion()) { show "$command $args"; }
9   after (pmotion()) { trace(); }
10
11  before (pturtle()) { show "begin of evaluation of turtle '$name'"; trace(); }
12  after (pturtle()) { show "end of evaluation of turtle '$name'"; trace(); }
13 }
14
15 define(name:"polygon") { length, edges ->
16   repeat (edges) {
17     forward length
18     right (360/edges)
19   }
20 }
21
22 turtle (name:"Octagon",color:green){
23   apply("polygon")(8,25)
24 }

Listing 3.9: A program with a tracing aspect

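The mechanics behind such a crosscutting composition can be sketched as follows. The Python code is purely illustrative (the function names are our assumptions, not the AO4LOGO implementation): the evaluation of a DSL operation is opened up so that a second language, the aspect language, can register before/after advice on it.

```python
# A sketch of crosscutting composition: the evaluation protocol of one DSL
# is exposed so that an aspect language can attach before/after advice.
before_advice, after_advice = {}, {}

def before(operation_name, advice):
    before_advice.setdefault(operation_name, []).append(advice)

def after(operation_name, advice):
    after_advice.setdefault(operation_name, []).append(advice)

def evaluate(operation_name, operation, *args):
    """Evaluates a DSL operation, running any registered advice around it."""
    for advice in before_advice.get(operation_name, []):
        advice(*args)
    result = operation(*args)
    for advice in after_advice.get(operation_name, []):
        advice(*args)
    return result

log = []
before("forward", lambda n: log.append(f"forward {n}"))
after("forward", lambda n: log.append("trace()"))
evaluate("forward", lambda n: n, 50)
print(log)  # -> ['forward 50', 'trace()']
```

The tracing concern stays localized in the advice registrations, while the base program invokes only `evaluate`, mirroring how Listing 3.9 separates the aspect from the drawing code.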
3.3.2.2. Supporting Composition Conflict Resolution

When composing multiple semantically interacting languages, there can be composition conflicts that are complicated to resolve [KL07b, HBA08]. In general, since such composition conflicts must take into account the semantics of the application context, the system cannot automatically resolve such conflicts [Kni07].

Example Scenario: Consider composing languages with aspects not only from one domain but from various domains. For example, there are aspects for tracing/logging, security, caching, and persistence. It is a well-known fact [DFS02, SP06, KL07b, HBA08] that the realization of these crosscutting concerns leads to conflicts: security can interfere with logging, caching can interfere with persistence, and so forth.

Requirements: For composing semantically interacting languages, a language implementation approach needs to detect such composition conflicts. Since the system cannot resolve composition conflicts automatically, the provided composition mechanisms should be open and configurable by end users, i.e. the application developers.

3.4. Support for Concrete Syntax

Since most language embedding approaches are restricted to comply with the concrete syntax of the host language, they often do not allow defining adequate syntactic abstractions [BV04, Tra08]. This missing support for arbitrary concrete syntax in language embeddings is one of the biggest obstacles to their adoption [KLP+08, MHS05]. There are numerous problems that hinder concrete syntax in language embeddings. First, in embedding approaches that do not support concrete syntax, the end users are required to encode program expressions in abstract syntax themselves, which is tedious and which could be avoided if there were a mechanism that automatically maps programs in concrete syntax to abstract syntax. Second, an embedded language generally cannot define or refine a keyword that is already defined in the host language. Hence, embeddings cannot define new domain-specific semantics for existing host syntax. Third, tokens that use special characters, such as brackets, operator symbols, and Unicode symbols, cannot be freely used in embedded expressions. Often the special characters cannot be used for defining identifiers. Finally, while most host languages support only overriding pre-defined prefix/infix operators or defining new prefix/infix operators, overriding and defining complex operators is not supported by host languages.
In particular mixfix operators [Mos80, DN09]—a mixed form of prefix, infix, and suffix—are mostly not supported. In the following, we will elaborate scenarios that require a support for concrete syntax. 3.4.1. Converting Concrete to Abstract Syntax When the concrete syntax of a language is not directly supported in the host language, for embedding expressions of an arbitrary language, an embedding approach should allow encoding the concrete syntax of this language in abstract syntax [Fow05]. According to [Kam98, ALSU07], the abstract syntax of a language consists of two sets: ab- stract syntax types and abstract syntax operators. The type set is defined as T = {τ1, ..., τm}. The operator set Θ consists of operators Θ = {σ1, ..., σn}, whereby each operator σi is a map- ping σi : τi1 × ... × τij → τk with a unique signature. To represent basic expressions, the language developer needs to support an abstract syntax encoding for each kind of expression. Each kind of expression e : τk begins with its operator followed by its sub-expressions e1 : τi1 , ..., ej : τij whose types match the operator’s signature: e = σi(e1, ..., ej). The next paragraphs present an example of a formal encoding in abstract syntax and discuss what embedding approaches must support for such an encoding. Since an encoding in abstract

syntax is verbose, it is desirable that end users do not need to encode their programs in abstract syntax themselves.

Example Scenario: For example, consider the following specification of the abstract syntax of SIMPLELOGO, which requires that the embedding approach supports the encoding of: (1) every kind of expression in abstract syntax, (2) sequences of sub-expressions, and (3) an entire program.

First, to represent SIMPLELOGO expressions, the language developer needs to define its abstract syntax types and operators. SIMPLELOGO has the types Integer, Color, String, Command, Turtle ∈ TSimpleLogo. SIMPLELOGO has the color operators {black, ..., red} = C ⊆ ΘSimpleLogo, where ∀c ∈ C. c : ∅ → Color. It has the move operators {forward, backward, right, left} = M ⊆ ΘSimpleLogo, where ∀m ∈ M. m : Integer → Command. It has the abstraction operator turtle ∈ ΘSimpleLogo, where turtle : String × Integer × CodeBlock → Turtle. E.g., a basic expression is forward(50).

Second, for compositionality of expressions, the encoding needs to support composition operators on expressions. E.g., the developer can define a family Φ of sequence operators, of which each seqτep,τeq ∈ Φ composes two sub-expressions ep and eq that are evaluated one after the other, whereby seqτep,τeq : τep × τeq → τeq. For example, we can compose the sub-expressions e1 and e2 with e = seqτe1,τe2(e1, e2), whereby τe = τe2.

Third, for encoding an entire program, the encoding needs to support conveniently writing sequences of expressions. For convenience, the encoding defines a short form for composing expressions into sequences, namely the semicolon infix operator, which allows omitting the seq keyword and the type indices. With the semicolon operator, one can write the sequence seq(e1, e2) as e1; e2. In addition, the encoding defines another short form for chaining using a recursive sequence of sequences, writing e = seq(e1, seq(e2, seq(..., seq(en−1, en)))) as e = e1; e2; ...; en−1; en. Such a chained sequence is again an expression that has the type of the last expression in the sequence. To encode nested expressions, the encoding defines a code block as a recursive sequence of expressions plus a context that contains static information (e.g., links to the enclosing code block) and dynamic information (e.g., variable bindings). In the encoding, one can write a code block using curly brackets, e.g., {e1; e2; ...; en−1; en}. A code block is again an expression that has the type CodeBlock.

Finally, the encoding wraps all program expressions into a code block. For example, consider Listing 3.10, which encodes in abstract syntax the SIMPLELOGO example program that was presented in Figure 3.1 (p. 31).

{
  turtle("Square", red, {
    forward(50);
    right(90);
    forward(50);
    right(90);
    forward(50);
    right(90);
    forward(50);
    right(90);
  })
}

Listing 3.10: The abstract syntax representation of the program from Figure 3.1a

Requirements: To support abstract syntax for arbitrary embedded languages whose concrete syntax is specified as a context-free grammar (CFG), there must be two generic mappings. The first mapping needs to map the concrete syntax of the embedding to abstract syntax. The second mapping needs to map abstract syntax to the host language syntax to make it executable. For the first mapping, it is desirable that the mapping from concrete to abstract syntax is generic. That means it must not be specific to a concrete language embedding, but the transformation should be customizable for any language embedding. For this, the generic mapping is parameterized with the concrete syntax of the language embedding, which consists of the grammar productions of its CFG. The parameterized mapping then allows converting end-user programs given in concrete syntax to an equivalent representation in abstract syntax. Even though end users do not have to read code in abstract syntax, the encoding in abstract syntax should be close to the concrete syntax, so that one can revert the mappings from expressions in executable form to their abstract syntax and then to their concrete syntax, e.g., when we need to debug the program.

3.4.2. Supporting Prefix, Infix, Suffix, and Mixfix Operators

Most host languages used for embeddings have only restricted support for defining domain-specific operators [BV04, MHS05, Tra08]. Often it is only possible to override prefix and infix operators that are pre-defined in the host language. Still, it is desirable to support operators at every position, whether prefix, infix, suffix, or mixfix.

Example Scenario: Consider providing prefix, infix, and suffix operators on functions in FUNCTIONALLOGO and a mixfix operator used in an existing DSL:

Prefix: For example, one can define a prefix operator “not(pred)” that negates a predicate function expression pred.

Infix: Another example is to provide an infix operator to compose two functions, “f compose g”, to build a composite function h, where f : X → Y , g : Y → Z, h : X → Z, such that h(x) = g(f(x)), ∀x ∈ X.

Suffix: An example of a suffix operator is “(k)prime”, which can be used to refer to the derivative of a function k.

Mixfix: Mixfix operators [Mos80, DN09] have multiple operands whose positions are mixed with the keywords of the operator. Note that mixfix operators are of particular interest, because they are often used to define domain-specific abstraction operators. An example


for such an operator is the SELECT-statement in SQL [DD89]: “SELECT c FROM t WHERE p”, where c is a selection of columns, t is a selection of tables, and p is a predicate on the rows of the tables.
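To make the operator positions above concrete, the following is a minimal Python sketch (not taken from this thesis, which embeds its languages in Groovy; all names are illustrative). It approximates an infix compose operator via a method on a function wrapper, and a mixfix SELECT–FROM–WHERE via method chaining, the usual workaround in host languages without true mixfix support:

```python
class Fn:
    """Wrapper enabling an infix-like 'compose' on plain functions."""
    def __init__(self, f):
        self.f = f

    def __call__(self, x):
        return self.f(x)

    def compose(self, other):
        # "f compose g": apply f first, then g.
        return Fn(lambda x: other(self.f(x)))


class Select:
    """Mixfix 'SELECT c FROM t WHERE p' approximated by method chaining."""
    def __init__(self, *columns):
        self.columns = columns

    def FROM(self, rows):
        self.rows = rows
        return self

    def WHERE(self, pred):
        # Keep only the selected columns of the rows matching the predicate.
        return [{c: r[c] for c in self.columns}
                for r in self.rows if pred(r)]
```

With this sketch, `f.compose(g)` reads close to the infix form, and `Select("name").FROM(rows).WHERE(lambda r: ...)` mimics the SQL mixfix operator, although the keyword order is fixed by the chaining methods rather than by a grammar.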

Requirements: More and more languages allow the programmer to override pre-defined operators that use special characters (such as +, -, *, /, !, «), e.g., Groovy. Some languages even allow arbitrary user-defined operators that are prefix, suffix, or infix. But most languages used for embedding do not support the definition of mixfix operators. Thus, it would be desirable to also have support for mixfix operators.

Another challenge in many host languages, such as Java, Groovy, Scala, and Haskell, is that Unicode cannot be used freely, e.g., the host language parser disallows using Unicode in mixfix operators. Consider encoding the above operators in Unicode, such that “not(pred)” becomes “¬pred”, “f compose g” becomes “f ◦ g”, and “(k)prime” becomes “k′”. For mixfix, consider defining a series “Σ [1 ≤ k ≤ N] a(k)” that denotes the sum Σ_{k=1}^{N} a(k) over the finite sequence a. For more concrete syntax, it is desirable to support Unicode for DSLs [Sim95, MJ98, DN09]—ideally, at any place of a context-free grammar.

3.4.3. Supporting Overriding Host Language Keywords

It is desirable to freely use arbitrary keywords, operators, and delimiters in DSLs [BV04, BdGV06, FC09].

Example Scenario: Consider an embedding with DSL expressions that use reserved keywords of the host language:

Textual keywords: It is desirable to allow defining new conditional expression types, such as “if (...) then {...} else {...}”, even in case “if” is a reserved keyword in the host language and the existing conditional does not support the keyword “then”, like in Java, Scala, or Groovy.

Delimiter keywords: It is desirable to allow defining user-defined delimiters, such as using curly brackets to define sets “{1,3,5}”, even in case curly brackets are used in the host language to delimit code blocks.

Requirements: In general, the host language’s parser disallows using keywords in embedded DSL expressions that are also reserved keywords in the host language, because most host language parsers disallow reserved keywords to be used as identifiers. To still allow using such keywords in DSL expression types, it is required to transform DSL expressions that use those keywords into legal host language expressions.

Another challenge in many host languages is that delimiter keywords (such as brackets) cannot be overridden. Often defining new delimiters or Unicode delimiters is also disallowed by the host language parser.
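Such a transformation into legal host language expressions can be sketched as a simple pre-processing step (a Python sketch with a hypothetical renaming table, not an implementation from this thesis) that renames clashing DSL keywords into legal host identifiers before the host parser sees the program:

```python
import re

# Hypothetical renaming table: DSL keywords that clash with reserved
# keywords of the host language are mapped to legal host identifiers.
RENAMES = {"if": "if_", "then": "then_", "else": "else_"}

def rewrite_keywords(src):
    # Replace whole-word occurrences only, so that identifiers merely
    # containing a keyword (e.g., "gift") are left untouched.
    pattern = re.compile(r"\b(" + "|".join(RENAMES) + r")\b")
    return pattern.sub(lambda m: RENAMES[m.group(1)], src)
```

The embedding would then define `if_`, `then_`, and `else_` as ordinary library entities, giving end users the illusion of using the reserved keywords directly.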


3.4.4. Supporting Partial Definition of Concrete Syntax

There are special embedding approaches [BV04, Tra08, KM09, RGN10] that support concrete syntax for DSLs by using meta-programming (cf. Section 2.3.3). They use meta-programming to create meta-programs that rewrite DSL programs in concrete syntax to host syntax. For this, the meta-program parses the program’s expressions in concrete syntax, creates an AST, and rewrites the AST to equivalent expressions in the host language’s syntax.

The problem with using meta-programming for embedding is that parsing requires the language developer to specify the complete syntax of an embedding. Specifically, the developer has to specify the complete DSL syntax, the complete host language syntax, and how expression types of the two languages are integrated. However, defining a full-fledged concrete syntax for a language using a formalism can be very expensive [vdBvDK+96, Moo01, MHS05]. Providing a complete syntax definition for a large language is a tedious and error-prone task that can take several person months [vdBvDK+96, vdBSV97, Moo01]. The large costs for the concrete syntax are a competitive disadvantage of these embedding approaches compared to traditional embedding approaches [Hud96, HORM08] that do not need to specify concrete syntax.

Example Scenario: To illustrate the overhead of requiring the presence of a complete syntax definition, reconsider the syntax of FUNCTIONALLOGO. Because Logo is a simple language, in many host languages Logo can be embedded with an abstract syntax that is very close to the domain, even without using meta-programming. Compared to traditional language implementation approaches, the advantage of the embedding approach is that the language developer does not have to define a formal syntax at all. Consider that we would like to provide more concrete syntax for an embedding of functional composition. Instead of using the abstract syntax compose(...)
from Section 3.4.2, we would like to use the ◦ Unicode operator. Because such Unicode infix operators are not supported in many host languages, we need to rewrite the expression in concrete syntax to host syntax (e.g., we can rewrite expressions like “f ◦ g” into a corresponding encoding in abstract syntax such as “compose(f,g)”). But in current approaches, as soon as the concrete syntax of just one expression type is incompatible with the host syntax, the developer has to define the complete syntax of the embedding in order to use meta-programming. Requiring the complete syntax increases the initial costs for embedding the DSL. Since language developers must provide the complete syntax, they pay most of the costs of having a concrete syntax for formalizing expression types whose syntax is not even in conflict with the host syntax.

Requirements: In particular for language embedding approaches, it is desirable to minimize the costs for concrete syntax. To better control the costs of having a concrete syntax, it would be interesting if language developers did not have to define the complete concrete syntax, but only the syntax of those expression types whose concrete syntax is incompatible with the host syntax and that need to be rewritten. Further, it would be interesting if developers did not have to define a complete concrete syntax before developing the semantics, but could instead add concrete syntax incrementally at the granularity of single expression types. When there is support for partial syntax, developers can start with implementing an embedding with abstract syntax. The

developers can deliver a fully functional language prototype with abstract syntax in a shorter time, and they can add concrete syntax later on. The benefit is that end users can already use the embedding with abstract syntax. Later on, the developers can still add concrete syntax, step by step, for each expression type.

Note that for embedded languages such as SIMPLELOGO, in many cases some expression types encoded in abstract syntax just happen to match their concrete syntax exactly; only for a few expression types can the concrete syntax not be met. For the expression types that already match, it is not necessary to formalize the concrete syntax, since the syntax is already adequate. Therefore, it is beneficial if the developers do not have to define the complete concrete syntax. It is cheaper if they only need to formalize the concrete syntax in cases where the abstract syntax is not adequate and needs to be replaced by concrete syntax.
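The idea of a partial syntax definition can be illustrated by a Python sketch (illustrative only, not an approach from this thesis): only the single incompatible expression type, the ◦ infix operator, is rewritten into its abstract-syntax form, while every other expression is passed to the host parser unchanged:

```python
import re

def desugar_compose(src):
    # Only "a ◦ b" is incompatible with the host syntax; rewrite this one
    # expression type to compose(a, b) and leave everything else untouched.
    # Chained uses like "f ◦ g ◦ h" are not handled by this sketch.
    pat = re.compile(r"(\w+)\s*◦\s*(\w+)")
    return pat.sub(r"compose(\1, \2)", src)
```

Here the cost of concrete syntax is paid only for the one expression type whose syntax the host cannot express, which is exactly the granularity argued for above.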

3.5. Enabling Pluggable Scoping

In programming languages, a scope is a region of a program in which identifiers are defined and bound to values. A scoping scheme defines how such bindings of identifiers propagate through the program. There are several dimensions with respect to how bindings are propagated [Tan09]: (1) call stack propagation: whether a binding propagates through the call stack or not, (2) delayed evaluation propagation: whether it propagates into nested abstraction operators or not, and (3) activation: whether it is active in a certain time frame or not.

When embedding a language, in traditional embedding approaches, the host language controls the scoping of language constructs. However, there are several cases in which the language developer needs to control the scoping of the language constructs [LLMS00, Tan09]. Therefore, in the following, we discuss scenarios that require controlling the scope of embedded language constructs.

3.5.1. Supporting Dynamic Scoping

Language embedding approaches inherit the semantics of their host language. Thus, they inherit the scoping rules of their host to resolve identifiers in expressions, which is a fixed scoping strategy in most host languages. The two major scoping schemes in programming languages are lexical scoping (also known as static scoping) and dynamic scoping. While host languages with lexical scoping allow lexically-scoped language embeddings, dynamic scoping is not available to them, and vice versa. Often supporting lexical scoping is sufficient for an embedding approach, but only because most of today’s programming languages use lexical scoping.

However, there is a need to freely select the scoping strategy for an embedded language. For example, there are languages that use dynamically-scoped variables in lexically-scoped functions [LLMS00]. When we want to embed such a language with special scoping requirements, host languages that only support one closed scoping scheme cannot be used for the embedding. For such situations, language developers need to embed scoping schemes into a host language that differ from its native scoping scheme.


Example Scenario: Consider embedding a dynamically-scoped Logo dialect into a lexically-scoped host language. In a lexically-scoped host language, local variables are only bound within the enclosing lexical scope. That means a local variable is only visible inside the body of an abstraction operator, but it cannot be accessed from outside.

With lexical scoping, when we define a variable inside an abstraction operator, the variable binding is only available inside this abstraction operator, but it is undefined or bound to other values in abstraction operators that are not enclosed. In contrast, with dynamic scoping, as indicated in Listing 3.11, it is possible that one abstraction operator defines a variable (e.g., length in line 2) that other abstraction operators can bind to another value, even if those abstraction operators are not lexically enclosed (e.g., the body of the other turtle in line 9). For example, with dynamic scoping, the variable length (line 4) has the value 100, since line 9 overrides the variable binding of line 2 before calling the function. In contrast, with lexical scoping, the variable length (line 4) would still have the value 50, since the variable binding is resolved via the lexically-enclosing context that binds length in line 2. For dynamic scoping, it is necessary to persist the corresponding binding (of a variable to its value) defined in an abstraction operator into a global context, so that the binding is available after the control flow of such a nested structure has been exited.

1  turtle(name:"Teacher", color:red) {
2    length = 50
3    define(name:"complexShape") {
4      fd length; rt 120; fd length; rt 120; fd length; rt 120;
5    }
6  }
7
8  turtle(name:"Pupil", color:red) {
9    length = 100
10   apply("complexShape")()
11 }

Listing 3.11: Using a dynamically-scoped variable

Requirements: When a language embedding requires dynamic scoping, normally language developers select a dynamically-scoped host language for the embedding. Restricting the selection of possible host languages based on their scoping rules is problematic, since this restriction would disallow most host languages, which are lexically-scoped, from being used for the embedding. Further, when the embedded language needs both dynamic and lexical scoping, as required in [LLMS00], host languages and embedding approaches that only support one closed scoping scheme can no longer be used. Therefore, as motivated above for Logo, it would be interesting to allow embedding dynamic scoping into host languages with lexical scoping.
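How dynamic scoping can be embedded into a lexically-scoped host can be sketched as follows (a minimal Python sketch with hypothetical names, not the embedding mechanism of this thesis): bindings are pushed onto an explicit stack for the dynamic extent of an abstraction operator, so that callees resolve variables against their caller’s context, as in Listing 3.11:

```python
from contextlib import contextmanager

class DynamicScope:
    """Dynamically-scoped variables embedded in a lexically-scoped host."""
    def __init__(self):
        self._stack = [{}]

    @contextmanager
    def let(self, **bindings):
        # Push new bindings; they shadow outer ones for the dynamic extent.
        self._stack.append({**self._stack[-1], **bindings})
        try:
            yield
        finally:
            self._stack.pop()

    def __getitem__(self, name):
        return self._stack[-1][name]

dyn = DynamicScope()

def complex_shape():
    # Resolves 'length' against the *caller's* dynamic context,
    # not against the lexical scope of this definition.
    return ["fd %d" % dyn["length"], "rt 120"] * 3
```

Calling `complex_shape()` inside `dyn.let(length=50)` yields “fd 50” commands, while the same call inside `dyn.let(length=100)` yields “fd 100”, mirroring the Teacher/Pupil turtles of Listing 3.11.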


3.5.2. Supporting Implicit References

There are languages that have implicit references [KM06], which are special references that are not user-defined but implicitly available in a certain context. Frequently, GPLs provide special keywords for such implicit references that point to values depending on the context the keyword is used in. Most prominently, in OO languages there are implicit references that are used to refer to objects in the current lexical context. For example, Java defines the keyword this to refer to the enclosing object and super to refer to its super class. Similarly, aspect-oriented languages define implicit references, too; e.g., AspectJ [Asp] defines the keyword thisJoinPoint that can be used in advice to refer to the current join point. These primitive identifiers are special in that the host language defines a closed scheme for resolving them.

Example Scenario: Such implicit references can also be defined by domain-specific languages. For example, consider an extension to FUNCTIONALLOGO that provides a special keyword thisTurtle that defines an implicit reference pointing to the enclosing turtle. The keyword thisTurtle allows access to the turtle properties name and color. In Listing 3.12, in line 10, the expression “thisTurtle.name” prints out the turtle’s name by accessing the name property via the thisTurtle keyword. In contrast, in line 14, Turtle2 prints out its color blue. The code in line 14 relies on the fact that every non-locally defined identifier is implicitly resolved via thisTurtle, i.e., color is equivalent to thisTurtle.color.

1  define(name:"print") { str −>
2    // code that prints the "str" parameter
3  }
4
5  define(name:"colorToStr") { color −>
6    // returns name of the color
7  }
8
9  turtle(name:"Turtle1", color:red) {
10   apply("print")(thisTurtle.name)
11 }
12
13 turtle(name:"Turtle2", color:blue) {
14   apply("print")(apply("colorToStr")(color))
15 }

Listing 3.12: A FUNCTIONALLOGO program using the special name thisTurtle
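A resolution scheme for such an implicit reference can be sketched in Python (hypothetical names, not the embedding used in this thesis): the turtle abstraction operator maintains a stack of enclosing turtles, and thisTurtle is an object whose attribute lookups delegate to the top of that stack:

```python
class Turtle:
    def __init__(self, name, color):
        self.name, self.color = name, color

_turtle_stack = []  # innermost enclosing turtle is on top

class _ThisTurtle:
    """Resolves attribute accesses against the enclosing turtle abstraction."""
    def __getattr__(self, attr):
        return getattr(_turtle_stack[-1], attr)

thisTurtle = _ThisTurtle()

def turtle(name, color, body):
    # Establish the implicit binding for the dynamic extent of the body.
    _turtle_stack.append(Turtle(name, color))
    try:
        body()
    finally:
        _turtle_stack.pop()
```

Resolving a bare identifier such as color via thisTurtle (line 14 of Listing 3.12) would additionally require hooking into the host’s identifier resolution, which is exactly the kind of special resolution scheme discussed next.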

Requirements: To enable an implicit reference, the language first needs to establish a special binding from the keyword name to the corresponding value in the enclosing scope. Second, the language needs to resolve implicit references in a special way—different from explicitly defined identifiers, such as variables or functions. Therefore, it is desirable to support embedding a special resolution scheme for implicit references. When the embedding of a resolution scheme is


flexible enough, it can resolve references by taking into account the lexical or even the dynamic context the reference is used in.

3.5.3. Supporting Activation of Language Constructs

There are examples in which a language explicitly wants to control the activation [Tan09] of a certain language construct, such as dynamic extensions to classes and objects with aspects [PGA01] and layers [CH05]. With activation, one can explicitly control whether a certain language construct propagates or not; e.g., the developer can declare that the extensions in an aspect or a layer are only effective in a certain region of the program.

In particular, activation is used in dynamic aspect-oriented programming [PGA01] to control the activation scope in which an aspect is effective. In dynamic AOP, a program can dynamically deploy/undeploy aspects to/from the running system; e.g., this allows activating or deactivating features whose code crosscuts several modules. There is good support for dynamic AOP for general-purpose programming languages, such as [Hir01, Hir09] or Java [PGA01, Bon03, BHMO04]. But, unfortunately, dynamic AOP is not available for most DSLs. Today, dynamic AOP support must be enabled by invasively changing the code of the DSL execution infrastructure [CM04]. What is needed is a systematic language implementation approach that helps developing a dynamic AOP solution for an arbitrary DSL.
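The core of such an activation mechanism can be sketched independently of any concrete DSL (a minimal Python sketch with hypothetical names, not this thesis’s implementation): an aspect object carries a deployment flag, and woven functions consult the flag at each call:

```python
class DynamicAspect:
    """A deployable aspect; its advice runs only while the aspect is deployed."""
    def __init__(self, advice, deployed=False):
        self.advice = advice
        self.deployed = deployed

    def deploy(self):
        self.deployed = True

    def undeploy(self):
        self.deployed = False

def weave(fn, aspects):
    # Wrap a function so that the active aspects are consulted at each call.
    def wrapped(*args):
        for a in aspects:
            if a.deployed:
                a.advice(fn.__name__, args)
        return fn(*args)
    return wrapped
```

Checking the flag on every call is what causes the runtime overhead of dynamic aspects discussed in the requirements below; optimized systems avoid it by re-weaving on (un)deployment instead.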

Example Scenario: Consider an evolution scenario that extends AO4LOGO with support for dynamic AOP, called LOGODYNAMICAOP. Listing 3.13 shows a dynamic tracing aspect. It implements the tracing concern similar to Listing 3.9, but the dynamic tracing aspect (Listing 3.13, lines 3–5) is not immediately active after it has been defined. This is because the aspect DynamicTracingAspect is defined with the parameter deployed:false. Therefore, when the polygon function is applied the first time in line 12, the evaluation of the function is not traced. Next, in line 13, calling the method deploy activates the dynamic tracing aspect.13 In contrast, when the function is applied the second time, in line 14, the activated aspect traces the evaluation of the polygon function.

1  DynamicAspect dta;
2
3  dta = aspect(name:"DynamicTracingAspect", deployed:false) {
4    ... // Same aspect definition as shown in Listing 3.9.
5  }
6
7  define(name:"polygon") { length, edges −>
8    ... // Same function definition as shown in Listing 3.9.
9  }
10
11 turtle(name:"Octagon", color:green) {
12   polygon 8, 25 //evaluated without tracing
13   dta.deploy()
14   polygon 8, 25 //evaluated with tracing
15 }

Listing 3.13: A program with a tracing aspect

13 The code calls the method deploy on the reference dta that points to the aspect that was assigned in Listing 3.13, line 3.

Requirements: Note that there is the special requirement to optimize dynamic AO systems [HM04]. There can be a large overhead when executing programs with dynamic aspects. To achieve an acceptable performance with dynamic aspects, there need to be special optimizations that remove the overhead of aspects.

3.6. Enabling Pluggable Analyses

Traditional language implementation approaches support analyses, such as syntax checks, detecting redundant expressions, domain analysis, and so forth. When it is possible to inspect the code of a program, this is often called intensional analysis [COST04, ALY09]. Intensional analysis allows automatic reasoning on the program syntax and semantics. Normally, a parser makes a syntactic representation of a program available in the form of an abstract syntax tree (AST). Traversing the AST nodes allows performing syntactic analyses that analyze the syntactic structure of a program. By storing information in AST nodes or by rewriting the AST nodes into another intermediary representation, it is possible to perform semantic analyses. Further, non-embedded compiler and interpreter approaches allow combining several modular analyses.

Language embedding approaches are said to not support sophisticated analyses for DSLs [MHS05]. Because traditional embedding approaches neither have a special parser nor make the AST of the host language available, they do not support such syntactic analyses. Although there are special embedding approaches that use exceptional features of their corresponding host language (such as meta-programming) to gain access to an AST representation, implementing an analysis with these approaches is rather complex. Therefore, the literature does not recommend using current embedding approaches when domain-specific analysis or verification is needed [MHS05].

This is unfortunate, because syntactic analyses are an added value for their languages. They support end users by automating analyses that would be tedious for humans to check. Although there are some embedding approaches that allow small syntactic analyses, these approaches only support analysis of languages with a closed syntax. Once a language developer has defined such an analysis, end users can apply it to analyze their programs. But there are several limitations with evolving languages and when composing analyses. To allow language embeddings to be used for scenarios where analysis is needed, it would be desirable to support extensible and composable syntactic and semantic analyses.

3.6.1. Syntactic Analyses

We would like to support syntactic analyses that enable reasoning over programs by analyzing their syntactic representation.


Example Scenario: Consider an analysis that checks that DSL code follows coding conventions [PK08] to ensure good readability of source code. Tools for analyzing source code for coding conventions are widely available for general-purpose languages. Such analyses would also be desirable for DSLs, but they are often not supported by embedding approaches.

As an example coding convention, consider a domain-specific naming convention that requires function names to be lower case. In an embedding, violations of such rules cannot be detected by the (generic) host language parser or compiler, which is not aware of domain-specific properties. Another example is checking that a certain threshold of a source code metric is not exceeded. Consider a convention in FUNCTIONALLOGO that a function body should not contain more than, say, 10 commands. Checking such a metric helps to improve code quality through cleaner code [Mar08] that is easier to understand and maintain.
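Both checks can be sketched as a traversal over a syntactic program representation (a minimal Python sketch over a hypothetical, simplified representation of function definitions, not an analysis from this thesis):

```python
# A function definition represented as a pair: (name, list of commands).
def check_conventions(functions, max_commands=10):
    """Check the naming convention and the command-count metric."""
    violations = []
    for name, commands in functions:
        if not name.islower():
            violations.append("name '%s' is not lower case" % name)
        if len(commands) > max_commands:
            violations.append("'%s' has %d commands (max %d)"
                              % (name, len(commands), max_commands))
    return violations
```

Because each rule only inspects the syntactic representation, further conventions can be added as additional modular checks and composed into one analysis run.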

Requirements: When the syntax of a language evolves, the language developers will likely have to update all analysis implementations. Therefore, a language developer should be able to extend existing implementations of analyses instead of re-implementing them. Often multiple analyses need to be used in concert. Hence, there should be support for implementing syntactic analyses in a modular way and for building compositions of multiple analyses.

3.6.2. Semantic Analyses

Before executing a program, it is often interesting to determine important properties of the program that abstract over concrete executions of the program. For determining properties of programs, traditional language approaches use semantic analyses that evaluate a program under abstract semantics [MHS05, HORM08, Par10], without actually executing the program under its default execution semantics. Such an alternative program evaluation is interesting because it enables abstract interpretation [CC77, Cou96]. Abstract interpretation is a well-established technique that is available for general-purpose languages to check programs for certain semantic properties and to use the retrieved information to find errors in a program and to improve the program’s compilation and evaluation. Yet, for most DSLs, no tools for abstract interpretation are available.

There are various use cases for abstract interpretation of domain-specific programs. To identify possible bottlenecks in DSL programs, it is interesting to calculate the runtime costs of DSL programs [WGM09], in particular to support later optimization decisions. To help end users identify errors in their DSL programs, there are semantic analyses that perform type checking, and there are other semantic analyses that verify domain-specific constraints. A language embedding approach should enable such analyses.

Example Scenario: Often in a domain, there is a semantic contract for a language that defines what a meaningful program is. Such a contract consists of a set of domain-specific constraints [SK97, GBNT01] that every program must comply with. The violation of such a constraint is considered to be a semantic error.


Consider a semantic contract for FUNCTIONALLOGO with a domain-specific constraint that requires that parameters to the Logo commands must be positive integer values. The program in Listing 3.14 has a semantic error that needs to be detected. There is an error because the parameter to the function application in line 10 will have a negative value that will be passed to the forward command in line 4, which violates the above constraint.

1  define(name:"hexagon") { length −>
2    turtle(name:"hexagon", color:red) {
3      repeat(6) {
4        forward length
5        right 60
6      }
7    }
8  }
9  i = 100
10 apply("hexagon")(50−i)

Listing 3.14: A program with a semantic error
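An abstract interpretation for this constraint can be sketched with a simple sign domain (an illustrative Python sketch, not an analysis defined in this thesis): values are abstracted to their sign, operations are evaluated over signs, and the constraint on forward is checked against the abstract result:

```python
# Sign abstraction: every concrete value is abstracted to "pos" or "neg"
# (zero is treated as non-positive here, since forward requires a positive value).
def sign(n):
    return "pos" if n > 0 else "neg"

def abs_sub(a, b):
    # Abstract subtraction a - b over signs: pos - neg is certainly positive,
    # neg - pos certainly negative; pos - pos (as in "50 - i") is unknown,
    # so the analysis must conservatively flag it.
    if a == "pos" and b == "neg":
        return "pos"
    if a == "neg" and b == "pos":
        return "neg"
    return "unknown"

def check_forward(arg_sign):
    # Domain-specific constraint: forward's parameter must be positive.
    if arg_sign != "pos":
        return ["possible violation: forward may receive a non-positive value"]
    return []
```

For Listing 3.14, the argument 50−i is abstracted as pos − pos, which the analysis cannot prove positive, so the possible constraint violation is reported without ever executing the program; this over-approximation is characteristic of abstract interpretation.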

Requirements: To identify semantic errors in DSL programs, a language implementation approach should support abstract interpretations of domain-specific programs. Abstract interpretations can check DSL programs for possible violations of domain-specific constraints to ensure their correctness [vDKV00, CHR+03, PP08]. For determining whether a DSL program is meaningful, a pluggable semantic analysis can be used to analyze the program. Such an analysis is based on an abstraction function that evaluates a program under abstract semantics. This function needs to check whether the program meets all constraints defined in the semantic contract of its language. For the above example, to determine whether there is a semantic error in the program, instead of executing the program, an analysis only needs to take into account whether the values of the input parameters to the commands can become negative. Such semantic checks are particularly desirable because semantic errors can be hard to find in DSL programs, as they often require precise knowledge of the domain. Often, more than one semantic analysis is needed to ensure that a DSL program is correct; therefore, a language embedding approach should enable composing several analyses.

3.7. Enabling Pluggable Transformations

A transformation is a process that allows constructing a program by mapping it from one program representation to another [PS83]. A transformation allows transforming a program written in a certain language into another version of the program that is written in the same or in another language.

We can distinguish two types of transformations: static and dynamic transformations. Static transformations transform programs by taking into account only information from the static structure of a program. In contrast, dynamic transformations also take into account runtime information, and thus they can be context-specific. In the following, we elaborate on the need for such transformations.

3.7.1. Static Transformations

We can classify static transformations into two classes: syntactic transformations that transform only the syntactic representation of a program and semantic transformations that change the evaluation of a program. We discuss a scenario for both classes.

3.7.1.1. Syntactic Transformations

Often when language end users write down expressions, they find themselves typing redundant information that is clear from the context. One solution is to define syntactic sugar that allows language end users to express their intent more concisely. When language developers do not want to include a syntactic abstraction in the language itself, there is the possibility to add support for syntactic sugar to the language by defining a transformation that desugars the syntactic abstraction.

Example Scenario: Consider adding sugar for function applications to FUNCTIONALLOGO. Most Logo dialects use explicit keywords for defining functional abstractions and explicit keywords for applying a function; e.g., FUNCTIONALLOGO uses the keyword apply. Once a user has defined a function with a unique symbolic name, it is clear that this symbolic name refers to a function. Despite this, when applying a function, the user always has to use the keyword apply to call the function. For the users, it would be interesting to apply a function with the same syntax as a primitive command, because this would enable calling functions like commands.

Consider the program shown in Listing 3.15, which uses such syntactic sugar for function applications as an extension to FUNCTIONALLOGO. Instead of having to use the keyword apply when using the function "hexagon", in line 6 the program uses the function’s symbolic name that was associated with its lambda expression to call the function directly.

1  define(name:"hexagon") { length −>
2    ...
3  }
4
5  turtle(...) {
6    hexagon 50 //needs to be transformed to apply("hexagon")(50)
7  }

Listing 3.15: A program using a static transformation for function applications
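Such a desugaring step can be sketched as a source-to-source rewrite (an illustrative Python sketch with hypothetical names, not the transformation infrastructure of this thesis): every use of a user-defined function name in command position is rewritten into the explicit apply form, while primitive commands stay untouched:

```python
import re

def desugar_applications(src, defined_functions):
    """Rewrite "name arg" into 'apply("name")(arg)' for each user-defined
    function name; primitive commands like 'forward' are left untouched."""
    for name in defined_functions:
        src = re.sub(r"\b%s\s+(\S+)" % re.escape(name),
                     r'apply("%s")(\1)' % name, src)
    return src
```

The set of defined function names is the contextual information the transformation needs, which is exactly the enrichment from context discussed in the requirements below.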

Requirements: For enabling such syntactic sugar, it would be interesting if language developers could implement static transformations that transform expressions, thereby enriching them with additional information that is available from the context.

For example, for the concise function application in FUNCTIONALLOGO, the developer can extend FUNCTIONALLOGO with a static transformation that desugars expressions. The transformation rewrites a command expression with a function name (e.g., hexagon 50) to an equivalent function application expression by adding the keyword apply to the expression (e.g., apply("hexagon")(50)). Since the name of the function has been associated with a lambda expression, it can be syntactically replaced by the corresponding value of the lambda expression, which is then applied to the subsequent parameters (here 50).

In general, there can be two kinds of transformations, namely exo-transformations that have different input and output languages, and endo-transformations that have the same input and output language. For example, the above transformation is an exo-transformation, since a program with an extended syntax (e.g., with the keyword hexagon) is transformed to a program with a limited syntax (e.g., FUNCTIONALLOGO without hexagon). The purpose of the transformation determines what the right domain (source) and co-domain (target) of a transformation are.

3.7.1.2. Semantic Transformations

There are static transformations that do not only change the syntax, but can also change the execution semantics of the expressions of a program. While this potentially may give a program a different meaning, in most cases it is a desirable characteristic of a transformation that it preserves semantic invariants of the program after the transformation. Such a semantic invariant is a part of the semantic contract of a language, and programs of the language assume all invariants to always be true. In case all transformation rules guarantee that the transformed version of the program will still satisfy its initial specification, we speak of a correctness-preserving transformation [PS83].

Example Scenario: An example of a correctness-preserving semantic transformation is a domain-specific optimization that transforms DSL programs to improve their execution time.
Although most DSLs do not have hard performance requirements [EFDM03, CM07], in certain domains the performance of DSL programs is important [EFDM03, MHS05], e.g., SQL [DD89]. To meet special performance requirements, a language embedding approach should support optimizations [Hud98, BKHV03, HORM08].

Consider providing an optimized Logo dialect that combines FUNCTIONALLOGO with CONCURRENTLOGO and is called OPTIMIZINGLOGO. This dialect optimizes applications of functional abstractions by exploiting parallelization. Whenever a function that satisfies certain properties14 is applied in a turtle abstraction, the function application can often be optimized by delegating the task to another turtle, which evaluates the function in parallel and lets the delegating turtle continue executing the rest of its commands. This optimization enables turtles to delegate parts of their work and lets multiple turtles paint figures collaboratively.

When optimizing a DSL program, the meaning of the program should not change. Therefore, optimizations can be implemented as transformations that are conservative extensions of the language execution semantics. Such a transformation must guarantee that non-optimized programs are transformed to optimized programs whose behavior produces equivalent outputs in less time. For example, in case of the above optimization, FUNCTIONALLOGO programs are

14The function must not have any data-flow dependencies, all its parameters must be known, and there must be no dependency on the function result. Note that this property holds for most FUNCTIONALLOGO functions.

transformed into corresponding parallel versions that draw the same figures. Consequently, when implementing a transformation that optimizes DSL programs, no new syntax should be added to the language; only the internal execution semantics need to be refined. From the perspective of language semantics, this optimization needs to evaluate a sequential FUNCTIONALLOGO program as if it were programmed concurrently.
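To make the intended behavior of such a correctness-preserving optimization concrete, the following hedged Ruby sketch (all names are assumptions for illustration, not the thesis's code) delegates independent function applications to worker "turtles" running in threads and compares the result against a sequential evaluation: the set of drawn segments is the same, only the drawing order may differ.

```ruby
# Hypothetical sketch: parallelizing independent function applications
# while preserving the program's observable output (the drawn segments).
require "thread"

class Canvas
  def initialize
    @segments = []
    @lock     = Mutex.new
  end

  def draw(segment)
    @lock.synchronize { @segments << segment }
  end

  # order-insensitive view, used to compare program outputs
  def segments
    @segments.sort
  end
end

# Each figure stands for the command list of one function application
# that satisfies the eligibility criteria (no data-flow dependencies).
def evaluate_sequentially(canvas, figures)
  figures.each { |figure| figure.each { |seg| canvas.draw(seg) } }
end

def evaluate_optimized(canvas, figures)
  figures.map { |figure|
    # delegate the application to another "turtle" (a worker thread)
    Thread.new { figure.each { |seg| canvas.draw(seg) } }
  }.each(&:join)
end

figures = [
  [[0, 0, 0, 50], [0, 50, 50, 50]],     # segments of figure 1
  [[10, 10, 10, 60], [10, 60, 60, 60]]  # segments of figure 2
]
seq = Canvas.new
par = Canvas.new
evaluate_sequentially(seq, figures)
evaluate_optimized(par, figures)
```

Comparing the order-insensitive segment sets of both canvases illustrates the semantic invariant the transformation must preserve: equivalent output, potentially obtained in less time.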

Requirements: Such a domain-specific optimization is particularly interesting because a correctness-preserving transformation changes neither the syntax nor the observable semantics of the result. Therefore, such a transformation is transparent to the end users, and it does not impose any additional programming effort on them. In the above example, the optimizing transformation is especially appealing, since children can use the language without explicitly dealing with special abstractions for parallelization (e.g., threads). Still, the optimization transforms the program for them such that it executes in parallel.

In general, for semantic transformations there can also be exo-transformations and endo-transformations. For example, the above transformation is an endo-transformation, because its domain and co-domain (source and target) are equal.

3.7.2. Dynamic Transformations

A dynamic transformation allows the transformation of expressions in a program depending on their evaluation context. In other words, a subset of the transformation rules depends on the program context, i.e., the state of the program's objects. Such context-dependent transformations have rules that are either only applicable in a certain context, or the transformation itself is parameterized by the context.

Example Scenario: An example of a dynamic transformation of a DSL is an extension that adds an adaptive optimization strategy, i.e., an optimization whose decisions depend on the execution state of the program to be optimized.

Consider optimizing FUNCTIONALLOGO programs for execution on a central controller that transmits drawing commands to a remote fleet of a limited number of physical plotting robots. Unfortunately, straightforward optimizations like OPTIMIZINGLOGO would not be adequate for this scenario. Recall that OPTIMIZINGLOGO always parallelizes a function application into a separate thread. It assumes that it is always worth optimizing a function application and that there is an unlimited number of resources available. Yet, likely only a limited number of physical turtle robots is available. The problem is that OPTIMIZINGLOGO is non-adaptive and its decisions do not take context information into account. Therefore, it makes unrealistic assumptions about what should be optimized.

A solution to this problem is the concept of adaptive optimization [AFG+00], which optimizes programs depending on their runtime context.

Consider a version of Logo with support for adaptive optimizations, which we call ADAPTIVEOPTIMIZINGLOGO. For each function application, ADAPTIVEOPTIMIZINGLOGO decides whether it is worth optimizing the function call, e.g., to paint the function's commands in parallel. For this, it could take into account dynamic context information, such as the estimated amount

of work to draw the function, or whether there is an idle robot currently available in the fleet. In contrast to the non-adaptive optimization in OPTIMIZINGLOGO, ADAPTIVEOPTIMIZINGLOGO only optimizes a function application if it is considered worth doing so and if enough resources are available at that time.
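The adaptive decision described above could be sketched as follows. This is a hedged Ruby illustration only; the class names, the work threshold, and the fleet model are assumptions, not part of the thesis.

```ruby
# Hypothetical sketch: a context-dependent rule that decides, per
# function application, whether delegation to a robot pays off. The
# decision depends on dynamic context (estimated work, idle robots).
class RobotFleet
  def initialize(size)
    @size = size   # number of physical plotting robots
    @busy = 0
  end

  def idle_robot_available?
    @busy < @size
  end

  def acquire
    @busy += 1
  end

  def release
    @busy -= 1
  end
end

class AdaptiveOptimizer
  WORK_THRESHOLD = 100  # assumed cost below which delegation does not pay off

  def initialize(fleet)
    @fleet = fleet
  end

  # dynamic, context-dependent transformation rule
  def optimize?(estimated_work)
    estimated_work >= WORK_THRESHOLD && @fleet.idle_robot_available?
  end
end

fleet     = RobotFleet.new(2)
optimizer = AdaptiveOptimizer.new(fleet)
```

Because the rule consults the current fleet state, the same function application may be optimized at one point in the execution and evaluated sequentially at another, which is exactly what distinguishes a dynamic transformation from a static one.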

Requirements: Because there are large initial costs when using existing language approaches to implement an adaptive optimization system for a language, such systems are often only available for languages that have high performance requirements. Most adaptive optimization systems are for general-purpose languages, where a virtual machine analyzes programs for frequently called subroutines and lets a just-in-time compiler optimize those routines to speed up the execution. Only a limited set of high-performance DSLs makes use of adaptive optimizations, such as SQL [DD89], which uses adaptive optimization to optimize queries. Today, adaptive optimizing DSLs are implemented as individual solutions, but to the best of the author's knowledge there is no general approach that facilitates implementing adaptive optimizations for DSLs.

In general, for dynamic transformations there can also be exo-transformations and endo-transformations. What is particularly interesting about the ADAPTIVEOPTIMIZINGLOGO transformation is that it is an endo-transformation that can be performed on a program while it is executing. For this it is necessary that the domain and co-domain (source and target) are equal, and that transformation results remain causally connected to the source program. That means the transformation must transform the running program, which is only possible if the runtime is uniform and causally connected with the code that is processed by the transformation. To the best of the author's knowledge, there are only approaches that support such transformations for general-purpose languages, but not for DSLs. Still, such endo-transformations would be interesting for adaptive optimizations of DSLs; what is missing is a DSL approach that supports them.

3.8. Review of the Support for the Desirable Properties in Related Work

In previous work [Din10], the author of this thesis has published an extensive review of the support for the above desirable properties by the related work on embedding approaches. The review explains details of the relevant mechanisms that target the properties in existing embedding approaches. It identifies open issues and limitations of current techniques. To overcome the current shortcomings of embedding approaches, the review proposes a road map for research on embedded domain-specific languages, drawing conclusions from studying the available support for the desirable properties in related work on DSLs and GPLs.

Table 3.1 presents the detailed results of validating the support that related work provides for the desirable properties. For the top-most desirable properties (gray lines), the detailed results of the sub-properties are aggregated such that the best support available across all scenarios is taken into account; when one or more scenarios are not supported at all, the support is rated as only partial.


Table 3.1.: Review of supported properties by related embedding approaches

[Table 3.1 rates, for each desirable property of Sections 3.1 to 3.7 and its sub-properties, the support provided by the reviewed approaches; each cell indicates partial support, important limitations, or full support. The compared approaches are: Pure Embed. [Hud96], Tagless Embed. [CKS09], Unembedding [ALY09], Jargons [Pes01], EDSL Groovy [KG07], EDSL Ruby [TFH09], TwisteR [AO10], Helvetia [RGN10], Staged Interpr. [COST04], Ext. Meta-Prog. [SCK04], Converge [Tra08], Fluent [Eva03, Fow05], EMF2JDT [Gar08], Scala EDSL [OSV07], Polymorphic [HORM08], Embed. Gen. [Kam98], Embed. Compiler [EFDM03], Ruby Gen. [CM07], MetaBorg [BV04], TXL [Cor06], and Pattern Lang. π [KM09]. The rated properties are: 3.1 Extensibility (3.1.1 New Keywords; 3.1.2 Semantic Extensions: 3.1.2.1 Conservative Extensions, 3.1.2.2 Semantic Adaptations); 3.2 Composability of Languages (3.2.1 Languages without Interactions; 3.2.2 Languages with Interactions: 3.2.2.1 Syntactic Interactions, 3.2.2.2 Semantic Interactions); 3.3 Composition Mechanisms (3.3.1 Syntactic Interactions: 3.3.1.1 Conflict-Free Compositions, 3.3.1.2 Resolving with Renaming, 3.3.1.3 Resolving with Priorities; 3.3.2 Semantic Interactions: 3.3.2.1 Crosscutting Composition, 3.3.2.2 Resolving Composition Conflicts); 3.4 Concrete Syntax (3.4.1 Concrete-to-Abstract, 3.4.2 Mixfix Operators, 3.4.3 Overriding Host Keywords, 3.4.4 Partial Syntax); 3.5 Pluggable Scoping (3.5.1 Dynamic Scoping, 3.5.2 Implicit References, 3.5.3 Activation of Constructs); 3.6 Pluggable Analyses (3.6.1 Syntactic Analyses, 3.6.2 Semantic Analyses); and 3.7 Pluggable Transformations (3.7.1 Static Transformations: 3.7.1.1 Syntactic, 3.7.1.2 Semantic; 3.7.2 Dynamic Transformations).]


Since the identified properties are currently only supported by non-embedding-based approaches, there is a need for a new language embedding approach that systematically supports these properties. In particular, the following issues need to be addressed:

• For a better composability of languages, there should be support for the composition of interacting embedded languages.

• While existing composition mechanisms in the host language can be used, there is a need for extensible composition mechanisms.

• Where arbitrary concrete syntax is required, it is desirable to reduce the costs of defining the concrete syntax.

• There should be support for pluggable scoping.

• There should be support for composing analyses and transformations.

3.9. Summary

In this chapter, a set of desirable properties for language embedding approaches has been proposed. The selection of these properties has been motivated by related work, and concrete scenarios have been presented. The review results clearly indicate the current limitations of language embedding approaches, which lack important support for properties that are often supported by non-embedded approaches. Unfortunately, because of this lack of support, the adoption of embedding approaches is currently limited.

This thesis claims that a language embedding approach adequately supports language developers in their tasks only when it supports all these properties for developing and evolving languages. The identified open issues and questions regarding language embedding approaches will be addressed in the remainder of this thesis.


Part II.

Core Architecture for Reflective Embedding


4. The Reflective Embedding Approach

As outlined in the previous chapters, since languages and their implementations are continuously evolving, language developers require an incremental and precise facility for language evolution. Therefore, this chapter presents the concept of reflective embedding, which embeds languages with a meta level into a reflective host language. The meta level helps language developers to keep language implementations open for later evolution.

In the reflective embedding approach, a language developer follows the structure of a disciplined architecture — the Reflective Embedding Architecture (REA for short). The architecture prescribes how to structure the embedded library into four levels with well-defined responsibilities. When language developers embed languages, REA's structure gives them guidelines on how to embed a language, which helps them to systematically design and implement their languages in an open way.

In contrast to traditional embedding approaches [Hud96], where embeddings are created using black-box abstraction mechanisms, REA builds upon the reflective language features the host provides. Because a language developer homogeneously embeds a language as a library, the reflection feature is inherited from the host by the embedded language. Hence, developers can use the reflective features inside the language implementation and inside programs of the embedded language.

Using reflection inside a language implementation enables embedding a language that can analyze and adapt its semantics, which is encoded in its embedded library. Because the architecture's levels decouple parts of the implementation of a language, it is possible to extend, adapt, or replace levels of a language implementation for evolution.

Note that the reflective embedding approach makes only few assumptions about the host language.
The host language needs to be an object-oriented programming language that provides certain language features, namely first-class code blocks1 and meta-objects [KRB91]. The architecture builds on objects with classes, interfaces, and single inheritance. The architecture uses code blocks for encoding abstraction operators that allow grouping compound expressions. The architecture uses meta-objects for reasoning about and adapting objects. This exact set of language features is necessary to allow a uniform description of the architecture, not for technical

1A code block was defined in Section 3.4.1 (p. 48) as a piece of code. A first-class code block is a code block of which one can obtain a reference that can be passed around as a value. These code blocks are, like all objects, associated with a meta-object that executes the code block, i.e., evaluating an expression in a code block creates a call that is reflected by its meta-object.

reasons. This feature selection nevertheless allows describing the architecture for a broad set of host languages2.

What is special in REA is that languages are embedded as a set of objects and that every language has a meta level. Because REA embeds languages as objects in a reflective host language, one can reflect on languages and their constructs as first-class objects by using the meta level, without changing their base levels.

Parts of this chapter have been published in the following papers:

• Tom Dinkelaker, Michael Eichberg, and Mira Mezini. An Architecture for Composing Embedded Domain-Specific Languages. In Proceedings of the 9th ACM International Conference on Aspect-Oriented Software Development (AOSD), 2010.

• Tom Dinkelaker, Martin Monperrus, and Mira Mezini. Supporting Variability with Late Semantic Adaptations of Domain-Specific Modeling Languages. In Proceedings of the 1st International Workshop on Composition: Objects, Aspects, Components, Services and Product Lines, at the International Conference on Aspect-Oriented Software Development, 2010.

• Tom Dinkelaker, Christian Wende, and Henrik Lochmann. Implementing and Composing MDSD-Typical DSLs. Technical Report, TUD-CS-2009-0156, Technische Universität Darmstadt, Germany, October, 2009.

The remainder of the chapter elaborates on the concept of the reflective embedding approach in Section 4.1 and on its implementation in Section 5.

2Although the thesis commits to code blocks and meta-objects, these features can easily be replaced by alternative ones with an equivalent expressiveness. Prototypes [US87] can be used instead of classes. Delegation [Ste87] can replace inheritance. Code blocks can be replaced by closures [AS96]. Closures can replace objects [Dic92]. In most cases, meta-objects can be replaced by other reflection mechanisms, which sometimes support only a limited form of reflection, such as interceptors, wrappers, and aspects [KLM+97]. When replacing one of the required features in the architecture with another one, the resulting description of the architecture would be slightly different, and the author expects it to be more language-specific and more low-level. Committing to the above selection of language features enables convenient and modular embeddings of languages that are well encapsulated.


4.1. The Reflective Embedding Architecture

[Figure 4.1 shows an embedded language program and the language implementation executing on the reflective host language. The implementation comprises the language interface level (the abstract syntax, which the program complies with), the language façade level (the syntax-to-semantics binding and the evaluation protocol), the language model level (which executes the semantics), and the meta level, which analyzes and adapts the syntax-to-semantics binding, the evaluation protocol, and meta-data. The figure distinguishes static from dynamic relations between the levels.]

Figure 4.1.: The levels of the reflective embedding architecture

Figure 4.1 presents a high-level overview of the reflective embedding architecture (REA for short). As illustrated, the architecture is built on top of the reflective host language. A language implementation consists of four levels, each of which has a particular responsibility:

(1) the language interface level: defines the abstract syntax of the language as a set of expression types, which constitutes the language's syntax definition. Note that the abstract syntax should not be confused with an abstract syntax tree3, which is only one possible representation of abstract syntax. Although syntax can also be concrete, every embedded language in the second part of this thesis has an abstract syntax, so we will refer to it as the syntax for short.

(2) the language façade level: binds the expression types of the syntax to semantics, which constitutes the so-called syntax-to-semantics binding of the language. This binding describes how a syntactic construct creates and manipulates its corresponding semantic first-class representations, which are encoded in the next level.

(3) the language model level: defines a first-class representation for each language construct of the embedded language, which defines the structure and behavior of the construct’s

3We use the term abstract syntax in the sense of [Kam98, ALSU07, Fow05] to reflect that embeddings do not have a concrete syntax but reuse the host syntax to encode their expressions. Note that a language's abstract syntax is related to its abstract syntax tree representation, but it is not the same — in fact, all possible representations of a program, whether it is represented by a concrete textual syntax, by a concrete syntax tree, or by an abstract syntax tree, can be mapped to the same abstract syntax representation.


default execution semantics4. The resulting set of inter-connected classes in the language model constitutes the language's evaluation protocol.

(4) the language meta level: through which language developers can reify language constructs and provide alternative semantics for them. The meta level provides a meta-interface, which is called the language's meta-protocol. Through this interface, developers and users can reflect on the first-class representations in the language model in order to analyze and adapt them.
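As a hedged illustration of how the four levels could map onto constructs of a dynamic object-oriented host language, consider the following Ruby sketch (the thesis itself uses Groovy as the host; all class, module, and method names here are hypothetical). The meta level is shown as an adaptation of the default semantics of a single language instance without changing the base-level classes.

```ruby
# Hypothetical sketch (assumed names): a minimal mapping of REA's levels.

# (3) Language model level: first-class representation of a language
#     construct together with its default execution semantics.
class TurtleModel
  attr_reader :distance

  def initialize
    @distance = 0
  end

  def move_forward(n)
    @distance += n
  end
end

# (1) Language interface level: the abstract syntax, here simply the
#     public methods `forward` and `turtle` of the facade below.
# (2) Language facade level: binds each keyword to the model, which
#     constitutes the syntax-to-semantics binding.
class MiniLogo
  def initialize
    @this_turtle = TurtleModel.new
  end

  def forward(n)
    @this_turtle.move_forward(n)   # syntax-to-semantics binding
  end

  def turtle(&body)                # abstraction operator with a code block
    instance_eval(&body)
    @this_turtle
  end
end

# (4) Language meta level: reflecting on one language instance to adapt
#     the semantics of `forward` without touching the base-level code.
logo = MiniLogo.new
logo.singleton_class.prepend(Module.new do
  def forward(n)
    super(2 * n)                   # adapted semantics: doubled step width
  end
end)

adapted = logo.turtle { forward 10 }   # moves 20 units under the adaptation
```

A plain MiniLogo.new.turtle { forward 10 } still moves 10 units; only the reflected-upon instance is adapted, which mirrors REA's idea that the meta level analyzes and adapts languages without changing their base levels.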

Since the architecture follows Hudak's language embedding principle [Hud96], the initial investment for implementing a language is low. Language developers do not have to implement a special parser and compiler for the embedded language, because the embedded language's abstract syntax and its semantics are directly encoded in the reflective host language. The architecture organizes a language implementation into the four levels, whereby in each level the developer provides a part of the language implementation. Because developers can later extend and replace a level or a part of it, a language implementation remains extensible throughout its lifecycle.

The development process of a language is as follows. To implement an executable language, a language developer implements each of the four levels of the reflective embedding architecture. The instantiation of the architecture serves as a default evaluator for programs that are written by a language end user. Finally, the language developer can compile the evaluator's library using the host compiler and then ship its binary to the end user. Nonetheless, the shipped language remains extensible in the user domain.

4.1.1. Levels of the Architecture

The remainder of this section elaborates on the steps that are necessary to instantiate the levels of the architecture. Each level implements a part of a particular language. For each level, this section provides a general recipe for instantiating it. Further, this section exemplifies how each level is instantiated by implementing an evaluator for the SIMPLELOGO language from Section 3 (p. 29). Figure 4.2 gives an overview of the levels for SIMPLELOGO. This section explains these levels step by step in the following subsections.

4.1.1.1. Language Interface Level

The language interface level defines the abstract syntax in the form of a so-called language interface that defines the possible types of expressions of the embedded language. In the following,

4Note that while often there is a direct relation between an expression type and a language construct, these two concepts are not the same. While often there is a particular expression type at the syntax level to define or use a language construct at the semantic level, there is no requirement for a one-to-one relation. First, not every expression type must relate to one distinct language construct, e.g., delimiters and comments need not be represented at the semantic level. Second, one expression type can be mapped to many language constructs, e.g., a constraint expression. Third, a set of nested expression types may be needed to create only one language construct, e.g., a class definition is made of various sub-expressions. Despite this distinction between syntactic and semantic representations, a keyword is generally causally connected to its language construct.


[Figure 4.2 shows the language interface ISimpleLogo (with the literal methods black() through white(), the operation methods forward(n:Integer), backward(n:Integer), right(a:Integer), and left(a:Integer), and the abstraction operator method turtle(name:String, color:Integer, body:Closure)), the façade class SimpleLogo implementing it, and the model classes Turtle (with the fields name, x, y, orientation, and penColor and the methods moveForward, moveBackward, turnRight, and turnLeft) and Canvas. The façade's forward method calls moveForward on thisTurtle to bind the keyword forward to its semantics; its turtle method creates a new Turtle with name and color, uses the syntax-to-semantics binding to bind the keywords in the closure, and then calls the closure. The Turtle draws its path on the canvas. The default meta level is indicated below the levels.]

Figure 4.2.: Levels of the architecture for SIMPLELOGO

we discuss the details of the development process for defining the abstract syntax of an arbitrary language, whereby we exemplify the process for the Logo dialect SIMPLELOGO.

For each possible expression type in the embedded language, the developer defines a method with a unique signature in the language interface. The method name constitutes a keyword of the language, the types of the parameters define the categories of the sub-expressions the expression is composed of, and the return type of the method defines the category of the expression. Together, the methods define all possible expression types that are available in the embedded language.

The approach distinguishes three classes of expression types: (I) literal expressions, (II) operations, and (III) abstraction operators. The class of the expression type determines the format of the method to be defined, as described below:

(I) a method for each literal: Literals encode terminal expressions. For each literal, there is an accessor method that has the literal's keyword as its name, takes no parameters, and returns a value of a certain type. Often the value returned by such a literal method is a constant, but there are also literals whose value depends on the context the expression is used in (such as the keyword this, cf. Section 3.5.2, p. 55).

For example, Figure 4.2 shows the language interface of SIMPLELOGO, called ISimpleLogo. For every literal that represents a color in SIMPLELOGO, ISimpleLogo defines an accessor method. Each method is parameterless, has the corresponding name of the color, and returns an integer encoding its RGB representation. ISimpleLogo defines the methods black(), blue(), red(), green(), yellow(), and white(), which all return a value of type Integer.

73 4. The Reflective Embedding Approach

(II) a method for each operation: Operations encode non-terminal expressions that are linked to their sub-expressions, i.e., the operands. For each operation, there is a method that has the name of the operation's keyword and that takes a parameter for each operand. Each parameter has a type that holds the corresponding sub-expression. The number of parameters corresponds to the arity of the operation. Note that the order and the types of the parameters in the method's signature must match the operands of the expression type.

For example, as shown in Figure 4.2, the language interface ISimpleLogo defines the domain-specific operation methods forward and backward for moving the turtle. As both operations are unary, their methods take an integer parameter indicating the relative distance in units to move. Further, the interface defines the methods right and left with one integer parameter for turning the turtle in the corresponding direction by the number of degrees provided by the parameter.

(III) a special method for each abstraction operator: Abstraction operators5 are special non-terminals for defining structural or behavioral abstractions, such as turtle. Technically, abstraction operators seem similar to operations, since both are non-terminal expression types. But in contrast to an operation, an abstraction operator can establish a new lexical or dynamic scope for its nested sub-expressions.

For each abstraction operator, there is a method that has the keyword name of the operator and that takes a set of parameters. As above, the method can have parameters of any types and in any order. What is special about such an abstraction operator is that at least one parameter has the type CodeBlock6. This code block parameter contains the sub-expressions that are nested inside the abstraction operator.

For example, as shown in Figure 4.2, for the domain-specific abstraction operator turtle in SIMPLELOGO, the language interface ISimpleLogo defines the method turtle, whose first parameter is the turtle's name as a string, whose second parameter is its initial pen color as an integer, and whose last parameter is a sequence of commands in the form of a code block.

Besides defining the abstract syntax of a language embedding, the language interface may define a contract in the form of pre-conditions, post-conditions, and invariants. Language developers can derive this contract from an informal or formal language specification. It depends on the power of the host language whether language designers can provide the contract only as comments, or whether they can express the contract in the interface definition, so that the language enforces the contract on its implementations.
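To make the three classes of expression types concrete, the following hedged Ruby sketch writes down a language interface analogous to ISimpleLogo from Figure 4.2. Ruby has no interface construct, so the sketch uses a module whose methods raise NotImplementedError until a façade overrides them; the keyword names follow the figure, everything else is an illustrative assumption.

```ruby
# Hypothetical sketch: the abstract syntax of SimpleLogo as a language
# interface, grouped by the three classes of expression types.
module ISimpleLogo
  # (I) literal methods: keyword name, no parameters, return a value
  def black; raise NotImplementedError; end
  def white; raise NotImplementedError; end

  # (II) operation methods: one parameter per operand (arity 1 here)
  def forward(n);  raise NotImplementedError; end
  def backward(n); raise NotImplementedError; end
  def right(a);    raise NotImplementedError; end
  def left(a);     raise NotImplementedError; end

  # (III) abstraction operator method: at least one code-block parameter
  #       that carries the nested sub-expressions
  def turtle(name, pen_color, &body); raise NotImplementedError; end
end
```

A façade class would include (or, in a typed host, implement) this interface and override every method with a concrete syntax-to-semantics binding.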

5At times, abstraction operators are also called nested elements or control structures.
6The first-class value of a code block has the type CodeBlock.


4.1.1.2. Language Façade Level

The language façade level defines a façade class that binds the language's syntax to concrete semantics. This syntax-to-semantics binding allows preparing a DSL program for evaluation by transforming the program's textual representation into a first-class representation in the language model.

For implementing a concrete language façade, the language developer follows the specification of the language. The responsibility of the façade implementation is to bind the language's syntax to semantics by implementing its language interface. For each expression type declared by a method in the language interface, the façade implements a method with the corresponding signature. The code in each method's implementation does not directly evaluate expressions, but connects the syntactic representation of the corresponding expression type to the language model in which the concrete semantics are encoded. In other words, each method implementation binds the expression type to its corresponding concrete semantics. After all methods of the façade class have been implemented, instances of this class can be used as an evaluator for programs of the corresponding language.

Besides its public interface, the façade can define private members for internal handling and context information. It defines private methods for internal handling, e.g., for resolving names and references. It defines private fields that refer to objects in the language model. These fields are used to store the first-class representation of programs under evaluation and their context. More specifically, with fields, developers can constitute an evaluation context with static references to the program structure, a heap, a stack, as well as special scoping strategies for variable lookup and binding.

When it comes to implementing a keyword method, there are two fundamental modes of evaluation that define when to evaluate an expression: immediate or delayed evaluation.
When evaluating in immediate mode, the method implementation has the immediate side effect of evaluating the expression and returning its result, or of changing the program's runtime context. When evaluating in delayed mode, the method implementation prepares the expression for evaluation by appending its first-class representation to a language model instance from which the expression can be evaluated later on. Depending on the required language semantics, a language developer selects the right evaluation mode for each expression type, enabling opportunities for later resolutions and optimizations.

For example, Figure 4.2 shows the language façade class SimpleLogo that implements SIMPLELOGO. By implementing all methods declared in the language interface ISimpleLogo, the façade class establishes a syntax-to-semantics binding for every expression type in the language. In SimpleLogo, all expression types use the immediate evaluation mode. As an instance of an abstraction operator, Figure 4.2 sketches the pseudo code for the turtle keyword method implementation. The turtle method defines a new turtle with the name and initial pen color from the parameters. Next, the method immediately evaluates the commands that are provided in the body code-block parameter, which moves the turtle on the canvas. The implementations of the literal methods for the colors immediately return the constant RGB values of the colors. The implementations of the operation methods immediately move the enclosing turtle

on the canvas. As Figure 4.2 indicates by the pseudo code, e.g., for the forward keyword method implementation, the method implementations use the language model classes, which binds the keywords to concrete execution semantics. The method implementations for forward and backward produce the side effect of moving the turtle relative to its current position on the canvas for the distance given by the input parameter. The method implementations of the left and right operations change the orientation of the turtle relative to its current orientation by a given number of degrees. After implementing the SimpleLogo façade, instances of this class can be used as an evaluator for SIMPLELOGO programs.

The above SIMPLELOGO implementation only uses immediate evaluation, but delayed evaluation is also conceivable for this scenario. As an example of a delayed evaluation mode in SIMPLELOGO, consider two keywords plan and do. The plan keyword is a special abstraction operator that takes a sequence of other operations as a parameter. It does not evaluate the operations immediately, but places them in a queue. For example, the expression “plan { right 90; forward 50 }” plans to turn the turtle 90 degrees to the right and then move it forward for 50 steps. The operation do executes all operations that are stored in the queue. For example, the program “plan {...}; plan {...}; do;” defines two groups of planned commands. Both groups of commands will not be evaluated until the program uses the do operation. The delayed evaluation mode thus enables delaying the evaluation of the commands inside the plan abstraction operator.

4.1.1.3. Language Model Level

The language model level encodes the detailed semantics for evaluating the expressions of a language. This level enables language developers to decompose a language into a set of domain types, each of which encodes a part of the language's domain and semantics.
At the beginning, the developer analyzes the language domain for its semantics, thereby identifying the domain's underlying language constructs. It is a strategic question how the language developer should model a particular language construct as a first-class citizen in the language model. The answer to this question partially depends on whether the language construct defines structure or behavior, as elaborated below.

A structural expression type describes a part of the program structure, e.g., the structural abstraction operator turtle. When there is a structural expression type at the syntax level, the language developer usually models a new language construct in the form of a domain type. To model a construct, a language designer or developer identifies the attributes that store the properties of the language construct and its possible operations. To implement the construct in the host language, a language developer implements it using a class that defines appropriate members for the construct.

A behavioral expression type describes an action, e.g., commands such as forward and the behavioral abstraction operator repeat. When there is a behavioral expression type at the syntax level, the language developer usually models it as a domain operation that is part of some domain type.

After identifying all language constructs and implementing the complete language model, the latter serves as the evaluation protocol of the language. In this evaluation protocol, all domain

types, their attributes, and their methods contribute to defining the complete execution semantics for the language. Note that breaking down the evaluation of language constructs into small pieces and steps makes them extensible later at the meta level.

For example, Figure 4.2 shows an excerpt of the language model of SIMPLELOGO containing abstractions of the Logo domain. Note that for a domain-specific language, such as Logo, when speaking about its language model, in the following, we will refer to the language model as the domain model. Further, we refer to a class in the domain model as a domain type with corresponding domain properties and domain operations. There is a singleton domain type Canvas to which one can add Turtle objects using the domain operation add, and on which one can paint using several primitive drawing operations, such as drawLine. According to the domain semantics, the Turtle domain type has attributes for its name, its x and y position on the canvas, its orientation, and its current penColor. Further, the class provides domain operations for moving the turtle, such as moveForward, moveBackward, turnRight, and turnLeft. The implementations of these operations encode the execution semantics for painting the path of the Turtle on the Canvas object. For example, as indicated by the pseudo code for the method moveForward, this operation draws the turtle's path as a line on the Canvas object, starting from the current position of the Turtle object and extending in the direction of its current orientation.

For understanding the default semantics of such a language implementation, it is important to understand how the language façade and its model are connected. Besides binding syntax to semantics, the language façade class hides the complexity of the detailed evaluation protocol in the language model.
The bodies of the language façade methods contain constructor calls that create instances of domain types and calls to semantic operations on these types. For example, the keyword method turtle calls the constructor of Turtle to create an instance, a first-class representation of this turtle; then, the keyword method forward calls the operation moveForward of this Turtle to move it.

In general, for any language that has computable semantics, a language developer can encode such a language model that respects the language's specification. For language embeddings, the language specification is often provided merely as an informal definition. For example, for SIMPLELOGO, only an informal definition is available. However, it is also possible to use a formal specification of a language's semantics. Examples of existing formalisms are operational semantics [Plo81], denotational semantics [SS71, Ten76], and communicating sequential processes (CSP) [Hoa78]. Independently of the selected specification, a language developer must provide appropriate semantics classes in the language model. When using a formalism, a language developer must provide appropriate classes whose semantics encode the theorems and proofs of the language in the corresponding formalism. Note that although it is possible to use such a formalism as the basis of the language specification, REA does not require the language developer to do so.
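The language model level described above can be sketched in a few lines. The following Python code is a hypothetical rendering of the Turtle and Canvas domain types from Figure 4.2 (the thesis uses a different host language, and the drawing semantics here are a simplifying assumption): domain operations such as moveForward encode the detailed evaluation protocol independently of any façade.

```python
import math

class Canvas:
    """Structural domain type: turtles are added to it and paint on it."""
    def __init__(self):
        self.turtles = []
        self.lines = []

    def add(self, turtle):
        self.turtles.append(turtle)

    def draw_line(self, x1, y1, x2, y2, color):
        # Primitive drawing operation; here we just record the line.
        self.lines.append((x1, y1, x2, y2, color))


class Turtle:
    """Domain type with domain properties and domain operations."""
    def __init__(self, canvas, name, pen_color):
        self.canvas, self.name = canvas, name
        self.x = self.y = 0.0
        self.orientation = 0.0          # degrees
        self.pen_color = pen_color
        canvas.add(self)

    # Behavioral expression types become domain operations:
    def move_forward(self, n):
        rad = math.radians(self.orientation)
        nx, ny = self.x + n * math.cos(rad), self.y + n * math.sin(rad)
        # The operation's semantics: paint the turtle's path on the canvas.
        self.canvas.draw_line(self.x, self.y, nx, ny, self.pen_color)
        self.x, self.y = nx, ny

    def turn_right(self, a):
        self.orientation -= a


canvas = Canvas()
t = Turtle(canvas, "Square", 0xFF0000)
t.move_forward(50)
assert canvas.lines == [(0.0, 0.0, 50.0, 0.0, 0xFF0000)]
```

A façade's keyword methods would call these constructors and operations, exactly as the prose above describes for turtle and forward.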

4.1.1.4. Language Meta Level

When instantiating a language with traditional language implementation approaches, the resulting language implementation has one fixed syntax and semantics. In those approaches, whenever the requirements for the language syntax or semantics change, the language developers have to invasively change the code of the language implementation.

In contrast to traditional approaches, the reflective embedding architecture provides a language meta level that allows developers to adapt a particular language implementation or a set of language implementations, as indicated by the dashed arrows in Figure 4.1. The meta level enables modular gray-box extensions of language façades and their language model levels, without requiring developers to invasively change the code of these levels. For a given language implementation, a meta level can analyze the meta-data defined by language interfaces, it can adapt an existing syntax-to-semantics binding, and it can adapt parts of the evaluation protocol. Since the meta level opens up language implementations in the user domain, language developers and even educated end users can tailor a given language implementation to their domains using a custom meta level extension. Moreover, the meta level allows adapting parts of user programs and of the host language, e.g., if an embedding requires instrumentation. For instance, in the context of SIMPLELOGO, consider adding enforcement logic for domain constraints to an existing implementation of the language. The benefit of supporting several evaluation strategies is that the language can be adapted without changing the code of the language implementation.

The elements of the interface, the façade, and the model levels are connected to meta-objects in the meta level. In the architecture, each object has a meta-object link to its meta-object that takes care of its execution. By default, every object is associated with the so-called default meta-object, which executes objects under standard OO semantics.
For example, Figure 4.2 shows that initially every class of the language implementation is associated with the default meta-object MetaObject that is concerned with its execution. For adapting an object, one can associate the object with a specialized meta-object that executes the object under different semantics. For example, Figure 4.3 illustrates that there is a special meta-object ConstraintCheckingMetaObject for the domain type Turtle, which, e.g., checks that a turtle never moves out of its canvas. To enforce such a constraint on Turtle objects, the developer simply updates the corresponding meta-object link that is highlighted in bold. When evaluating a program using the language implementation with the special meta-object, the execution of the program changes, because the meta-object adapts the evaluation protocol such that constraints are checked.

Most importantly, the meta-objects enable opening up the implementations of objects residing in the other levels of a language implementation in the reflective embedding architecture. The meta-objects enable modular adaptations that allow cooperatively defining parts of the semantics. While one party of language developers may define the reusable language semantics for the customers (i.e., end users), there may be a third party of language developers who adapt those semantics using specialized meta-objects. This third party can either provide semantic adaptations that are transparent for end users, or it can provide a meta-interface with adjustable parameters for the end user and use these parameters inside the implementation. With the meta-objects, an end user can deploy the final language semantics in their problem domain, either by installing a transparent adaptation or by adjusting parameters through the meta-interface.
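The meta-object link can be sketched as follows. This is a hypothetical Python rendering (the names DefaultMetaObject and ConstraintCheckingMetaObject follow Figure 4.3, but the dispatch mechanics and the one-dimensional canvas bound are simplifying assumptions): every operation on a Turtle is routed through its meta-object, so swapping the meta-object adapts the evaluation protocol without touching the Turtle class.

```python
class DefaultMetaObject:
    """Executes operations under standard OO semantics."""
    def invoke(self, obj, method_name, *args):
        # Look up and run the actual implementation (prefixed "_").
        return getattr(type(obj), "_" + method_name)(obj, *args)


class ConstraintCheckingMetaObject(DefaultMetaObject):
    """Adaptation: refuse moves that would leave a 100-unit canvas."""
    def invoke(self, obj, method_name, *args):
        if method_name == "move_forward" and obj.x + args[0] > 100:
            raise ValueError("constraint violated: turtle leaves canvas")
        return super().invoke(obj, method_name, *args)


class Turtle:
    def __init__(self):
        self.x = 0
        self.meta_object = DefaultMetaObject()   # the meta-object link

    def move_forward(self, n):                   # public entry point
        return self.meta_object.invoke(self, "move_forward", n)

    def _move_forward(self, n):                  # actual semantics
        self.x += n


t = Turtle()
t.move_forward(60)                      # default semantics: just moves
t.meta_object = ConstraintCheckingMetaObject()   # update the link
try:
    t.move_forward(60)                  # 60 + 60 > 100: now rejected
    violated = False
except ValueError:
    violated = True
assert t.x == 60 and violated
```

Updating the meta-object link at runtime is exactly the step highlighted in bold in Figure 4.3: the class itself is untouched, only the link changes.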


[Figure 4.3 depicts the three architecture levels for SIMPLELOGO: the language interface ISimpleLogo, the façade class SimpleLogo, and the domain types Turtle and Canvas. At the meta level, the Turtle class is linked to the special ConstraintCheckingMetaObject, while the remaining classes stay linked to the DefaultMetaObject.]

Figure 4.3.: The architecture for SIMPLELOGO with alternative semantics

It is central for the architecture that one can use meta-objects to extend language façade classes. When a façade has a special meta-object, it is actually a façade with a meta level; such a façade is called a meta-façade. Having support for meta-façades is interesting because meta-façades enable implementing global strategies and context-dependent strategies [BV04, Cor06] for languages. Concrete use cases of meta-façades will be elaborated later on. In particular, Section 4.2.3.3 (p. 109) presents how meta-façades can be used to implement global analyses and transformations for composing languages.

To recapitulate, another way to understand the role of meta-objects for adapting a language is that these objects implement a meta-protocol for those languages, i.e., a meta-interface for the end user to adapt the language. From the perspective of existing work on meta-object protocols, the way REA uses meta-objects can be seen as providing specialized meta-objects that implement sub-protocols [KRB91] of a meta-object protocol. By using meta-objects for adapting language implementations, a sub-protocol's contribution is the knowledge that is cast in its meta-interface and the guarantees it provides to apply specific adaptations in a certain domain. In the following, we will discuss the practical advantages of having meta-protocols for languages using various examples, in order to evaluate the applicability of meta-protocols to various fields of language research.

4.1.2. Program Evaluation

To evaluate a program, a syntactic representation of the program's expressions is executed, which results in side effects and finally computes the output of the program.

In the basic architecture, end users encode programs in abstract syntax. For example, consider that an end user encodes the example SIMPLELOGO program shown in Listing 4.1 with abstract syntax (cf. Section 3.4.1 for the encoding). To write a new program in the embedded language,


the end user embeds the program's expressions into a code block inside a host language program. In the reflective embedding architecture, a code block's context contains static and dynamic information, as demanded in Section 3.4.1 (p. 48). The static information encompasses the lexically enclosing code blocks. The dynamic information encompasses a façade-object link and a meta-object link for this block, as well as variable bindings. Because the expressions are part of a code block that is connected to a special meta-object, they become first-class and can be evaluated as discussed in the following.

1  {
2    turtle("Square", red, {
3      forward(50);
4      right(90);
5      forward(50);
6      right(90);
7      forward(50);
8      right(90);
9      forward(50);
10     right(90);
11   })
12 }

Listing 4.1: The abstract syntax of the program from Figure 3.1a (cf. Section 3.4.1)

In the next paragraphs, we discuss the general steps for evaluating a program: compilation, setup, and expression evaluation. For each step, we elaborate on its details and exemplify it by stepping through the above example SIMPLELOGO program.

Given a default façade, a default model, and the default meta level, these three parts together provide the default execution semantics (default semantics for short) that can be used to evaluate programs. The default semantics of a language is based on: (a) a default syntax-to-semantics binding, which is given by the default language façade, (b) a default evaluation protocol, which is given by the default language model, and (c) the default language meta level, which is the host language's default meta-object. Unless special semantics are selected by language developers, the meta level implicitly executes with the default semantics, i.e., it executes the objects in all other levels using its default meta-objects.

The compilation step consists of two phases: compilation of the language and compilation of the program. First, the language developer compiles the embedded language into a binary, whereby the host compiler validates that the language façade implements all methods that the language interface declares. Second, the end user compiles the embedded program into a binary. When compiling, the host compiler inserts indirections that take care of the reification of first-class code blocks, field accesses, and method calls. In a strongly-typed language, the compiler checks that the program only uses expressions that are defined in the language interface. Note that this compilation step is optional; it can be completely omitted in case the host language is an interpreted language.


The setup step creates an instance of the language façade. The newly created façade instance is initialized with an empty interpreter environment, i.e., no expression of the program has been loaded yet. Then, the setup step prepares the context of the program code block. First, it associates the façade as the evaluator of the code block, such that the created façade instance serves as an evaluator for executing the program. Second, it uses the default meta-object, or optionally associates a special meta-object with the code block. These two entities provide a dynamic context for evaluating the code block. To uniformly bind syntax to semantics for all expressions in the program, it is the responsibility of the façade to propagate itself to the context of all nested code blocks in the program. Further, for uniform execution semantics, the façade propagates the selected meta-object to all nested code blocks and to all domain objects it creates.

The expression-evaluation step evaluates the program code block with all contained expressions. The most important participants in this step are the program code block, the (default) meta-object of code blocks, the language façade, and its model. In a nutshell, when evaluating a program's code block, for each expression encountered in the block, the code block's meta-object produces a method call and dispatches this call to the code block's language façade. This façade then passes the call through the different levels in the architecture, which yields a computation of the expression. By default, expressions are evaluated in the sequential order in which they are defined. Before evaluating an expression itself, all of its sub-expressions are evaluated from left to right, whereby the evaluation of each sub-expression produces another method call.
When evaluating an expression or sub-expression (e.g., forward 50) and producing a method call via the block's meta-object, the call uses the name of the expression's abstract syntax operator (e.g., forward), and the evaluated sub-expressions are passed to the call as parameters (e.g., 50). Next, the block's meta-object dispatches the method call to the most-specific method implementation of the language façade, which binds concrete semantics to the expression. Executing the method implementation evaluates the expression, which produces possible side effects and possibly returns a computation, as we will elaborate in an example below. For each expression, the method call can return nothing (when it has only a side effect on the language model), it can return a primitive value with the evaluation result, or it can return a reference to an object of the language model. The aggregated method calls that result from evaluating all expressions, together with their side effects and computations, accomplish the evaluation of the complete program.
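The dispatch mechanism described above can be sketched compactly. The following Python code is a hypothetical analogue (the thesis's host language provides this reification natively; here, `__getattr__` plays the role of the code block's meta-object hook, and all class names are illustrative): every unknown name used inside the block becomes a method call that the block's meta-object dispatches to the currently selected façade.

```python
class BlockMetaObject:
    """Dispatches calls produced from expressions to the current facade."""
    def __init__(self, facade):
        self.facade = facade

    def dispatch(self, name, *args):
        # Dynamic method dispatch on the facade retrieves the semantics.
        return getattr(self.facade, name)(*args)


class EvaluationContext:
    """Context a code block runs in: each expression is reified as a
    method call and routed through the block's meta-object."""
    def __init__(self, meta_object):
        self._meta = meta_object

    def __getattr__(self, name):
        return lambda *args: self._meta.dispatch(name, *args)


class TraceFacade:
    """A facade that merely records which keyword methods were called."""
    def __init__(self):
        self.trace = []

    def forward(self, n):
        self.trace.append(("forward", n))

    def right(self, a):
        self.trace.append(("right", a))


def program(ctx):                       # the program's code block
    ctx.forward(50)
    ctx.right(90)

ctx = EvaluationContext(BlockMetaObject(TraceFacade()))
program(ctx)
assert ctx._meta.facade.trace == [("forward", 50), ("right", 90)]
```

Swapping the façade held by the meta-object would change the semantics of the same program, which is the loose coupling the architecture relies on.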

For example, evaluating the SIMPLELOGO program in Listing 4.1 paints a red square on the canvas. In Figure 4.4, there are three kinds of method calls, each with its own notation. First, the figure illustrates ordinary method calls using solid lines. Second, when evaluating a code block, there is a special method call «evaluate» to its first-class value, which the figure illustrates using a dotted line. Third, when encountering an expression inside the code block, the block's meta-object produces a method call and dispatches it, which the figure illustrates using a dashed line. In the following, we explain the evaluation of the program step by step.


[Figure 4.4 is a sequence diagram. The program code block produces the calls red() (step 2.0) and turtle("Square", 0xFF0000, body) (step 3.0) to the SimpleLogo façade; the façade creates the Square:Turtle instance (step 3.1) and evaluates the body code block via «evaluate» (step 3.2). The body's expressions produce the calls forward(50) and right(90) (steps 3.2.1 to 3.2.8), which the façade forwards as moveForward(50) and turnRight(90) to the Turtle instance (steps 3.2.1.1 to 3.2.8.1).]

Figure 4.4.: Evaluation of the SIMPLELOGO program from Listing 4.1

First, step 1.0 in Figure 4.4 evaluates the program code block in Listing 4.1 (p. 80), lines 1–12. In this case, the program uses the concrete semantics provided by an instance of the language façade SimpleLogo and the default meta-object. Because the default meta-object is used for evaluating the code block that wraps the program, it uniformly delegates all method calls that it produces to the SimpleLogo façade.

In Listing 4.1, line 2 creates a new turtle abstraction (turtle(n,c){body}) with three sub-expressions (n, c, body) that are evaluated before the enclosing expression. The first sub-expression (n) evaluates to a primitive string value, namely “Square”. The second sub-expression (c) uses the domain-specific literal red, which produces a polymorphic method call (step 2.0 in Figure 4.4) to the method “red” with no parameters. To evaluate this sub-expression, the program code block's meta-object produces a call to the method red and dispatches this call to the SimpleLogo façade, which returns as its result the RGB representation of the color red as an integer, namely “0xFF0000”. The third sub-expression (body) defines the nested code block from lines 2 to 11. It creates a code block containing the command sequence, which is not yet evaluated, and it returns the first-class value of the created code block.

Then, for the turtle expression, the block's meta-object produces a call (step 3.0) to the method “turtle” with the three evaluated sub-expressions as parameters. The block's meta-object dispatches the method call to the language façade instance of SimpleLogo, executing the turtle abstraction operator implementation. First, the abstraction operator creates a new Turtle instance (step 3.1) from the language model at position (x=0, y=0) with the name “Square” and the pen color red. Second, it propagates the language façade and the meta-object to the code block. Third, it evaluates the code block (step 3.2) containing the sequence of commands.
Because the nested code block is associated with the meta-object and the façade instance, it uses them to evaluate its nested expressions.


When evaluating the nested code block, the expression forward 50 in line 3 has the sub-expression 50 that evaluates to the primitive value 50. When encountering the forward expression, the nested block's meta-object produces a call (step 3.2.1) to the method “forward”, passing to it the value 50 as a parameter. Next, the meta-object dispatches the call to the façade instance, whose forward operation implementation moves the Turtle instance through a method call Turtle.moveForward(50) (step 3.2.1.1). Similarly, line 4 uses the expression right 90 that produces a call (step 3.2.2) to the method “right” with the parameter 90. The nested block's meta-object dispatches the call to the implementation of the right operation, which manipulates the orientation of the Turtle instance using another method call Turtle.turnRight(90) (step 3.2.2.1). The other forward and right expressions in the nested code block are evaluated analogously to the above expressions (steps 3.2.3/3.2.3.1 up to 3.2.8/3.2.8.1).
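The net effect of the walkthrough can be checked with a small, self-contained sketch (hypothetical Python code under simplifying assumptions about the turtle geometry; not the thesis's host language): evaluating the four forward/right pairs of Listing 4.1 traces a closed square, so the turtle ends up back at its starting position.

```python
import math

class Turtle:
    def __init__(self):
        self.x = self.y = 0.0
        self.orientation = 0.0          # degrees

    def move_forward(self, n):
        rad = math.radians(self.orientation)
        self.x += n * math.cos(rad)
        self.y += n * math.sin(rad)

    def turn_right(self, a):
        self.orientation -= a


t = Turtle()
for _ in range(4):                      # the Square program's commands
    t.move_forward(50)
    t.turn_right(90)

# A closed square: the turtle is back at the origin (up to rounding),
# having turned through a full 360 degrees.
assert abs(t.x) < 1e-9 and abs(t.y) < 1e-9
assert t.orientation == -360.0
```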


4.2. Key Mechanisms for Language Evolution

To support language evolution, the reflective embedding architecture provides three key mechanisms: (1) language polymorphism, which enables defining different versions of semantics for a given language, (2) semantic adaptations, which enable partially tailoring a given language semantics, and (3) language combiners, which enable composing languages. Language developers can use these mechanisms to prepare language embeddings for upcoming evolutions. In the following, we discuss the conceptual details of the three key mechanisms.

4.2.1. Language Polymorphism

When developing a language for several stakeholders, there are situations in which programs that comply with a given syntax need to be executed under different semantics. Before it is possible to evaluate a program in a given syntax, the syntax must be bound to alternative semantics. When evaluating a program under such different semantics, the evaluation process treats a possible concrete semantics as a black box. This black box must comply with the language interface, but it is free to realize various execution semantics internally.

A key property of the reflective embedding approach is the strict separation between a language specification and its implementation. For a given language, the specification prescribes the syntax and semantics of its keywords, but often it does not prescribe the concrete implementation of the keywords and the possible evaluation strategies, of which different ones are conceivable. For a given language, a program is always developed with respect to the language specification, but it is not tied to any particular language implementation. This is because the architecture has a language interface that abstracts from the possible façades that implement it. The interface's abstraction establishes a loose coupling between a program and the possible evaluators of its language, and therefore, it enables support for exchanging the evaluator (i.e.
language polymorphism). To achieve this loose coupling, it is central that code blocks have a special meta-object and a link to a façade instance that realizes a dynamic syntax-to-semantics binding. A code block's meta-object evaluates every expression inside the block, given in abstract syntax, into a method call whose signature corresponds to the abstract syntax, and it delegates this call to the currently selected façade instance; the dynamic method dispatch then takes care of retrieving the right semantics by executing the façade's method with the corresponding signature. Because a code block's meta-object dispatches the method calls over a possible inheritance hierarchy of language façades, always the most specific keyword method implementation in one of the façade classes of the hierarchy is executed. By inheriting keyword methods, language polymorphism enables building hierarchical extensions.

To allow program execution under different semantics, language polymorphism allows abstracting over the semantics of a given language interface that has various implementations. To provide alternative language semantics for evaluating programs, a language developer reimplements the default execution semantics by providing an alternative language implementation of the language interface. To set up new alternative semantics for evaluating a program, it is sufficient to instantiate a different façade and to register this façade with its code block's meta-object. Through the dynamic method dispatch of the

language polymorphism mechanism, the thus established dynamic syntax-to-semantics binding, a key property of the architecture, enables multiple interpretations, with the semantics executing on the corresponding façade instance.

The following two sections elaborate on the detailed aspects of hierarchical extensions and abstracting over semantics that are enabled through language polymorphism.

4.2.1.1. Hierarchical Extensions

In the reflective embedding architecture, the syntax and semantics of a language embedding are statically extensible. Language polymorphism allows inheriting a base language's syntax, which is inspired by grammar inheritance [AMH90, KRV08, Par08], but in addition, it allows inheriting the base language's semantics. Given a base language to be extended, language developers can provide a new version of the language with syntax and semantics that extend the base language, thereby reusing parts of the base implementation. They incrementally extend the language syntax and semantics by inheriting the base's language interface, its language façade, and its language model.
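A hierarchical extension via façade inheritance can be sketched as follows. This hypothetical Python rendering anticipates the EXTENDEDLOGO example discussed below (the keyword names follow that example, while the internal state and its semantics are simplifying assumptions): the subclass façade inherits the base keywords and adds new ones, and a base-language program runs unchanged against the extension.

```python
class SimpleLogo:                       # base-language facade
    def __init__(self):
        self.pen_color = 0x000000
        self.position = 0               # simplified 1-D turtle state

    def forward(self, n):               # base keyword method
        self.position += n


class ExtendedLogo(SimpleLogo):         # extension facade
    def setpencolor(self, c):           # new keyword method
        self.pen_color = c

    def home(self):                     # new keyword method
        self.position = 0


# Language polymorphism: a base-language program works against the
# extension facade, because `forward` is inherited.
facade = ExtendedLogo()
facade.forward(50)
assert facade.position == 50
facade.setpencolor(0xFF0000)
facade.home()
assert facade.position == 0 and facade.pen_color == 0xFF0000
```

Because dispatch always selects the most specific keyword method in the façade hierarchy, a conservative subclass preserves the results of base-language programs while extending the vocabulary.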

[Figure 4.5 shows the extension across the three levels: at the interface level, IExtendedLogo extends ISimpleLogo with the new keyword methods (home, clearscreen, showturtle, hideturtle, setpencolor, penup, pendown); at the façade level, the class ExtendedLogo extends SimpleLogo; and at the model level, the domain type ExtendedTurtle extends Turtle.]

Figure 4.5.: A language extension to SIMPLELOGO

As an example, reconsider the scenario from Section 3.1.1 (p. 31) that adds the EXTENDEDLOGO keywords to SIMPLELOGO. To add the new keywords, the language developer must define only the new parts of their syntax and semantics, as shown in Figure 4.5. To define the new syntax, the language developer extends the base language interface with IExtendedLogo, which declares new keyword methods for the additional expression types, namely setpencolor, home, clearscreen, penup, pendown, hideturtle, and showturtle. To define the new semantics, the developer implements the methods of this interface in the class ExtendedLogo, a subclass of SimpleLogo. For instance, the class ExtendedLogo implements the setpencolor method. To extend the evaluation protocol, the language model needs to be extended when new language constructs are introduced. For instance, to implement the side effect of evaluating the keyword setpencolor, we need access to the pen color of the Turtle objects. Therefore, the developer creates a subclass ExtendedTurtle that extends the base class Turtle and provides a setter setPenColor for adjusting the pen color. A consequence of extending the Turtle class is that we also have to override the turtle keyword in ExtendedLogo, so that the overridden keyword method instantiates this subclass instead of the base class Turtle.

Note that, because of the indirection through the code block's meta-object, which separates a program from the façade that evaluates it, the developer can select a façade instance with the right syntax and semantics for his or her requirements. This façade can be one of several possible extensions from an inheritance chain of language extensions.

While subclasses of an extension are allowed to extend and override existing syntax and semantics, there is an important guarantee. As long as an extension is conservative (cf. Section 3.1.2.1, p. 34), object-oriented polymorphism enables a controlled extension of the execution semantics for existing programs. When using such a conservative extension to evaluate a program that has been written for the base language, the extension evaluates the program and obtains results equivalent to those it would have obtained with the base language. That means that, in this case, the code block's meta-object, implementing a default OO dispatch, preserves backward compatibility for base programs.

4.2.1.2. Abstracting over Semantics

In the reflective embedding architecture, one can customize the evaluation of programs using object-oriented composition to enable a new syntax-to-semantics binding for a language at runtime.
Given a language interface with its default semantics, a language developer can provide alternative semantics by implementing the language interface with an alternative language façade and model. To bind the syntax to its execution semantics for a program, one sets up an instance of the façade as the code block's meta-object of this particular program. Abstracting over the semantics of a program is inspired by the idea of recursively folding over HOAS proposed by Carette et al. [CKS09] and by pluggable semantics proposed by Hofer et al. [HORM08]. Nonetheless, REA extends these two existing concepts with runtime pluggability. Support for abstracting over semantics and pluggability are crucial properties in REA, since REA's other mechanisms are built on top of these properties. In particular, these two properties enable analyses and transformations in REA.

4.2.1.2.1. Analyses

Abstracting over semantics supports implementing analyses. For example, Figure 4.6 presents two different façades for EXTENDEDLOGO that both implement its language interface IExtendedLogo. The first façade implements the default execution semantics of EXTENDEDLOGO. The second façade reimplements the complete language interface IExtendedLogo in the façade class ExpressionCounterExtendedLogo, i.e., it binds the syntax to a totally different semantics—a so-called analyzer façade that implements an analysis as an abstract interpretation that counts expressions.

Figure 4.6.: Two different semantics for EXTENDEDLOGO (the default façade ExtendedLogo with the model class Turtle, and the analyzer façade ExpressionCounterExtendedLogo with the model class Metric, both implementing the language interface IExtendedLogo)

Instead of drawing the turtle on a canvas, ExpressionCounterExtendedLogo implements an alternative semantics for every keyword method declared in the IExtendedLogo interface. In the abstract-interpretation implementation, every method implementation increases an expressionCounter value. Note that, as Figure 4.6 indicates, for its abstract interpretation, ExpressionCounterExtendedLogo uses a different language model—the domain of the metric for expression counting, which is modeled as the class Metric that holds the expressionCounter as an Integer. Note that Section 10.6.1 (p. 247) presents the implementation details of the above analyzer façade.

Support for abstracting over semantics is not limited to local analyses that analyze expressions in isolation from each other, like expression counting. It is also possible to implement analyses with global strategies that abstract over the concrete expression types and that can take context information into account, similar to global strategies in source-transformation systems [BV04, Cor06]. To implement a global analysis, the developer needs to implement an analyzer façade with a meta-level—a meta-façade (cf. Section 4.1.1.4). Such an analyzer is therefore called a meta-analyzer. A meta-analyzer comes with a special meta-object that implements pattern matching on method calls and a generic strategy for how to analyze these calls. There are several example use cases of meta-analyzers. In particular, meta-analyzers make it possible to analyze method calls from DSL programs in different languages, e.g., to detect ambiguous expressions in a language composition, as will be discussed later on in Section 4.2.3.2.
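A minimal Java-flavored sketch of such an analyzer façade, with the interface reduced to two keywords; apart from the names Metric, expressionCounter, and ExpressionCounterExtendedLogo taken from the text, all details are illustrative:

```java
// Simplified language interface (the full IExtendedLogo has more keywords).
interface IExtendedLogo {
    void forward(int n);
    void right(int angle);
}

// Alternative language model: the domain of the expression-counting metric.
class Metric {
    int expressionCounter = 0;
}

// Analyzer façade: binds the same syntax to counting semantics.
class ExpressionCounterExtendedLogo implements IExtendedLogo {
    final Metric metric = new Metric();
    public void forward(int n)   { metric.expressionCounter++; }
    public void right(int angle) { metric.expressionCounter++; }
}
```

Evaluating a program against this façade performs the abstract interpretation: every keyword call only increments the counter in the Metric model, and nothing is drawn.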

4.2.1.2.2. Transformations

Default execution and analyses are not the only use cases for abstracting over semantics; language polymorphism also enables transformations. Technically, transformations are similar to analyses, except that the domain is not an abstract domain model but a concrete target language model. The target language model is subsequently used to execute the transformed program.
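The shape of such a transformation can be sketched as a façade whose "evaluation" builds a target representation. In this Java-flavored sketch the target language model is simply the emitted target text; the target instructions MOVE/TURN and all names are illustrative:

```java
import java.util.StringJoiner;

interface ILogo {
    void forward(int n);
    void right(int angle);
}

// Transformer façade: evaluating a program against it does not draw
// anything but emits a program in a (hypothetical) target language.
class CompilingLogo implements ILogo {
    final StringJoiner target = new StringJoiner("\n");  // the target model
    public void forward(int n)   { target.add("MOVE " + n); }
    public void right(int angle) { target.add("TURN " + angle); }
}
```

Running `forward(50); right(90);` against `CompilingLogo` yields the target text `MOVE 50` / `TURN 90`, which can then be pretty-printed or handed to an interpreter for the target model.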

For a given transformation, embedded DSL programs are encoded in the source language. A transformation involves the following levels in REA: it has a special façade that implements the language interface of the source language, and it has a special language model—the target language model. When the transformation is used to evaluate a program under the transformation semantics, the evaluation creates another representation of the evaluated program in the target model. The transformed target model can then be pretty-printed into a string representation or itself be executed.

In REA, transformations support both exo-transformations and endo-transformations. REA supports implementing exo-transformations as embedded compilers. In particular, Section 10.3.2.2.2 (p. 229) implements an embedded compiler for standard SQL. Further, REA supports implementing endo-transformations with a special transformer façade that defines a meta-level. Such a transformer is called a meta-transformer. A meta-transformer has a special meta-object that adapts method calls on the transformer façade and dispatches the adapted method calls back to the façade. What is special about endo-transformations is that they enable transforming programs while they are running.

A meta-transformer allows implementing strategies for a transformation. Similar to global strategies in analyses, meta-transformers enable the implementation of global transformation strategies that abstract over the concrete expression types to be transformed. Further, a meta-transformer can take context information into account for the transformation results. There are several example use cases of meta-transformers in REA. Specifically, meta-transformers make it possible to adapt method calls from DSL programs, e.g., to define a resolution strategy that disambiguates expressions in a language composition, as will be discussed later on in Section 4.2.3.2.

4.2.1.3. Program Evaluation under Alternative Semantics

To calculate the average time to draw commands in a program, we can evaluate the same program twice, once on each façade. When using the ExpressionCounterExtendedLogo façade to evaluate the program presented in Listing 4.2, the façade performs an abstract interpretation that counts 25 sub-expressions7 for the program.

With OO composition, language polymorphism enables a dynamic syntax-to-semantics binding because of the three important consequences explained below. First, a façade does not have to implement a fixed syntax-to-semantics binding itself. For example, a façade can hold references to other façades and delegate method calls to these façades to obtain a syntax-to-semantics binding from them. This allows implementing modular façades using a set of collaborating classes. Second, it is even possible to choose the concrete façade for a language at runtime, e.g., when the right façade depends on the available runtime infrastructure. Third, multiple sophisticated alternative semantics are conceivable for one particular language. With such multiple semantics, we process the same program several times, each time

7 The program in Listing 4.2 has 25 sub-expressions: 1 × turtle, 1 × red, 9 × forward, 9 × right, 1 × penup, 1 × left, 1 × pendown, 1 × black, and 1 × setpencolor.


{
  turtle("A_red_and_a_black_square", red, {
    forward(50); right(90); forward(50); right(90); forward(50); right(90); forward(50); right(90);
    penup();
    right(90); forward(100); left(90); // goto (x=100,y=0)
    pendown();
    setpencolor(black)
    forward(50); right(90); forward(50); right(90); forward(50); right(90); forward(50); right(90);
  })
}

Listing 4.2: A program using the EXTENDEDLOGO language extension

using another façade. In such a processing chain, it is possible to first analyze a program (e.g., to test a property) and then execute it only depending on the result of the analysis (e.g., only if the program fulfills a required property).

In the context of this thesis, the resulting dynamic syntax-to-semantics binding is used to implement versatile semantics for embedded languages. The architecture's support for abstracting over semantics is used to enable language compositions as well as various analyses and transformations that allow analyzing, or respectively transforming, the representation of a program under evaluation. To illustrate support for program analyses, Section 10.6.1 (p. 247) shows how to check coding conventions in DSL code, and Section 10.6.2 (p. 254) demonstrates how to check semantic invariants for DSL programs. To illustrate how abstracting over semantics enables transformations, Section 10.7 (p. 258) demonstrates optimizing embedded programs.
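The analyze-then-execute chain described above can be sketched as follows. The program is represented as a closure over the language interface so that the same program can be passed to each façade in turn; all names are illustrative, and the "property" is simply an expression-count bound:

```java
import java.util.function.Consumer;

public class ProcessingChain {
    interface ILogo { void forward(int n); }

    // Analyzer façade: counts expressions instead of drawing.
    static class Counter implements ILogo {
        int count = 0;
        public void forward(int n) { count++; }
    }

    // Execution façade: records the drawing commands.
    static class Drawer implements ILogo {
        final StringBuilder canvas = new StringBuilder();
        public void forward(int n) { canvas.append("F").append(n).append(";"); }
    }

    static String run(Consumer<ILogo> program, int maxExpressions) {
        Counter analysis = new Counter();
        program.accept(analysis);              // first pass: abstract interpretation
        if (analysis.count > maxExpressions)   // property check on the analysis result
            return "rejected";
        Drawer execution = new Drawer();
        program.accept(execution);             // second pass: actual execution
        return execution.canvas.toString();
    }
}
```

The same `program` value is evaluated under two different semantics; the second evaluation is performed only when the property established by the first one holds.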

4.2.2. Meta-Level Extensions: Semantic Adaptations using the Meta Level

The architecture supports meta-level extensions, which enable gray-box extensions of existing language implementations – without changing their code. A meta-level extension is a set of meta-objects that extends a language implementation. A meta-level extension is necessary when a language end user has a special requirement on an existing language, but it is not possible to adapt the code of the language implementation. In these situations, gray-box extensions can adapt internals without changing the code; therefore, they can address scenarios in which there are multiple conflicting requirements on one language implementation better than extensions with language polymorphism can. Gray-box extensions are necessary when a new requirement invalidates parts of the language specification and its implementation. If only one user has this requirement, it may be too much effort to maintain a separate language extension. To cope with language evolution for single users, the reflective embedding architecture lets a language developer adapt parts of a language's semantics in the user domain. We refer to such an adaptation as a semantic adaptation. To adapt the semantics of a language embedding, the reflective embedding architecture allows adapting the internals of a language implementation by using the host language's meta-object protocol. Note that an educated user who knows how to use the host language's meta-object protocol can also semantically adapt a language.


Figure 4.7.: The meta level for language implementations

For adaptations, the developer or the user often does not need to know the implementation details of the adapted language but only its interfaces.

The presence of the meta level in the architecture is fundamental to allowing semantic adaptations. The language semantics are first-class because the semantics of the language constructs are reified in meta-objects, as indicated by the abstract example in Figure 4.7. In a language implementation, every class (such as DomainClass) is associated with such a meta-object (such as x), which initially is the default meta-object of any new instance of this domain class. A meta-object dispatches the method calls received by a domain object to a concrete implementation of its corresponding operation. The links meta-object and/or impl can be changed at runtime, which is the key to allowing dynamic semantic adaptations. The setup of the link to the meta-object and of its links to the operations determines the final semantics of a language construct, because the meta-objects are responsible for handling the evaluation of each language construct. For each language construct, its final semantics is defined by dispatching each method call in the evaluation process via the meta-object link and over the operation link to the right implementation of the domain class operation. Initially, the links of the default meta-object refer to the default implementation of the corresponding domain operation as defined, e.g., in the language model. To adapt a language's semantics, a language designer can exchange the links to alternative meta-objects or to alternative operation implementations.

To semantically adapt a given language with its default semantics defined in an existing implementation, a language developer who knows the user's domain provides special meta-objects that adapt the execution of its façade and model.
Specifically, the language developer provides specialized method implementations that encode an alternative semantics for the façade and model. The new meta-objects dispatch to these specialized method implementations. E.g., by using reflection to change the classes of the language implementation, one can replace parts of the evaluation protocol, which effectively adapts the language semantics.

For example, Section 3.1.2.2 (p. 35) has mentioned that when growing Logo with support for concurrency, the language implementation additionally has to guarantee that there are no collisions between physical turtles controlled by a Logo program. A meta-level extension enables a modular enforcement of such constraints using a specialized meta-object, similar to the class ConstraintCheckingMetaObject indicated in Figure 4.2. In this scenario, the special meta-object changes the method dispatch of the language implementation to realize an alternative semantics that guarantees the absence of collisions. Instead of directly dispatching a method call to a method that moves the turtle, the special meta-object dispatches every call to an operation that moves the turtle to an alternative method implementation that first checks for possible collisions and prevents them before actually moving the turtle. Later, Section 10.1.2.2 (p. 215) elaborates the implementation of this semantic adaptation scenario in detail.

Semantic adaptations allow reusing the existing language implementation while adapting the evaluation of programs with alternative meta-objects. When keywords are encountered in the program, the keyword-method calls and calls inside the language model are then handled differently. With a meta-level extension, the alternative meta-object can dispatch calls inside the language implementation to alternative method implementations that may encode different semantics. Therefore, existing language implementations can be extended without changing existing classes; even though no façade or model is exchanged, this still results in an adapted evaluation of the program.

It can be complicated to extend a language's semantics at the meta level, since the developer needs to select what parts of the language to extend. Therefore, language developers require means to express where meta-level extensions should be effective. To meet this demand, the architecture provides semantic adaptation scopes (Section 4.2.2.1) to let a language developer selectively control adaptations on any object in the architecture. Section 4.2.2.2 discusses what extension points in the architecture a language developer can extend with these scopes.

4.2.2.1. Semantic Adaptation Scopes

When adapting objects' fields and methods at the meta level, the language developer needs to control what parts of an object are adapted by a meta-level extension. To control adaptations, the architecture proposes to precisely scope semantic adaptations along three dimensions:

1. locality of adaptations (Section 4.2.2.1.1),

2. atomicity of adaptations (Section 4.2.2.1.2), and

3. time of adaptations (Section 4.2.2.1.3).

These dimensions help the different stakeholders of a language to collaboratively provide parts of the semantics. First, they guide the language developer who designs a language in planning where extensions are possible. Second, they guide the language developer who implements the language in building an implementation that is open for extensions. Third, they guide the language developer who customizes the language for the end users, or the end user himself, in how parts of the language implementation can be extended.

4.2.2.1.1. Dimension: Locality of Adaptations

The first dimension along which language semantic adaptations are classified is the locality of adaptations. For example, the semantics of method dispatch can be extended in certain scopes. This dimension is discrete and has two values:

D1.a) global semantic adaptation (class-wide) and

D1.b) local semantic adaptation (instance local).


A global semantic adaptation affects the semantics of a certain language construct in general. For example, consider embedding an OO language by modeling OO language constructs such as classes, objects, methods, and fields. In such an OO language embedding, the language model would contain a type to model methods, say Method, and this type would have an operation, say dispatchCall(...), that encodes the method dispatch semantics. Semantic adaptations that override that method allow adapting the method dispatch semantics for every method call from a single-dispatch semantics to a multi-dispatch semantics. A multi-dispatch semantics must take into account the types of all method parameters to find the most specific method implementation. Thus, for every call to method dispatchCall(...), the adaptation can execute another method instead, say dispatchCallWithMultiDispatch(...). Because such a global semantic adaptation changes the semantics of all instances of a type, it scopes the adaptation of a language construct's semantics over all programs.

On the contrary, a local semantic adaptation affects only one particular usage of a language construct. For the previous example, since a multi-dispatch semantics takes into account not only the receiver type but also the parameter types to find the right method implementation at runtime, it imposes an overhead on every method call. Therefore, it can be advantageous to apply the semantic adaptation of the method dispatch mechanism only to the methods of those classes for which multi-dispatch is advantageous (e.g., classes that implement the visitor pattern [GHJV95]). Local semantic adaptations allow precisely controlling the scope of the effects of a semantic adaptation on parts of the language model.

Scoping adaptations can be used to precisely control semantic adaptations in other programming paradigms.
In aspect-oriented programming, either the semantics of all aspects or the semantics of only one specific aspect instance can be adapted. In a similar way, for a domain-specific language, a language construct that represents a domain concept can be adapted. To recapitulate, with global and local semantic adaptations, either the semantics of an abstraction is adapted in general or it is adapted only for one use of this abstraction.

To scope the locality of an adaptation, the architecture adjusts the meta-objects of either types or objects. Consider the abstract example in Figure 4.8, which shows how the two kinds of variability with regard to the locality dimension – domain type versus domain object – are supported in the architecture, as elaborated for each case below. The upper part shows the effects of semantic adaptations whose scope is an entire domain type; the lower part corresponds to an adaptation that is specifically scoped for a particular domain object, thus only domain objects are shown there. Both parts show the relation between domain types and domain objects and their corresponding semantics (encapsulated in a meta-object) before and after the adaptation.

In the upper left quadrant, the domain type A is bound to the default semantics represented by the meta-object x. Every new domain object that is created, e.g., a, runs under the default semantics. The semantic adaptation defines new semantics for domain type A. In the upper right quadrant, a new meta-object y is defined to represent the new semantics, and A is associated with it. Any domain object created subsequently runs under the new semantics: the domain object b is created after the adaptation, hence it is linked to the new meta-object y. Objects that were created before the semantic adaptation continue to run under the previous semantics, e.g., a is still linked to the meta-object x.

Figure 4.8.: Dimension 1 – locality of adaptations (top: (D1.a) global semantic adaptation; bottom: (D1.b) local semantic adaptation)

In the lower left quadrant, the domain objects c and d are created with the same semantics. The object-level semantic adaptation depicted here modifies the semantics of d only. The lower right quadrant shows the domain objects, meta-objects, and their relations after the semantic adaptation has taken place. While c keeps the former semantics, d uses the new semantics represented by meta-object z.

As a concrete example for using this scope of variability in the Logo domain, consider adapting the execution semantics for an optimization in the same way for every turtle in a program, in contrast to adapting only the execution semantics of one particular turtle in an individual way. With a global semantic adaptation, it is possible to adapt the execution semantics of all turtles in a program to use a standard optimization. Conversely, with a local semantic adaptation, we can restrict an adaptation to a single instance. This allows a selective optimization for an individual turtle. A selective optimization affects only the drawing operations of that particular turtle, while all other turtles execute with the standard optimization semantics.
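A minimal Java-flavored sketch of the two localities, with the meta-object reduced to a function that post-processes each movement step; all names are illustrative:

```java
import java.util.function.IntUnaryOperator;

public class Locality {
    static class Turtle {
        // Class-wide default meta-object; new instances pick it up (D1.a).
        static IntUnaryOperator defaultSemantics = n -> n;
        // Per-instance meta-object link; can be re-bound locally (D1.b).
        IntUnaryOperator metaObject = defaultSemantics;
        int distance = 0;
        void forward(int n) { distance += metaObject.applyAsInt(n); }
    }
}
```

Re-binding `Turtle.defaultSemantics` is a global adaptation: every turtle created afterwards runs under the new semantics, while earlier instances keep their old meta-object link. Re-binding `metaObject` on one turtle is a local adaptation affecting only that instance.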


4.2.2.1.2. Dimension: Atomicity of Adaptations

The second dimension of semantic adaptations is the atomicity of the adaptations that are made. The size of these adaptations ranges from parts of the semantics of a language construct to its complete semantics:

D2.a) domain-operation level: exchanges only one operation or field of a certain type

...

D2.b) module level: exchanges all operations or fields of a certain type

Indeed, changing one part of a language's semantics often requires also changing another part of the semantics. When several parts need to be adapted, these multiple "elementary" semantic adaptations must be packed into one unit of adaptation. For example, when adapting the method dispatch in an OO language, the field dispatch often must be adapted as well. In the case of adapting the language model of the OO language, we must not only adapt the implementation of Method.dispatchCall, but also the dispatch logic for fields, say by overriding the methods Field.dispatchRead and Field.dispatchWrite. To obtain altered semantics that are consistent, it is important that multiple such adaptations are applied at once.
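The difference between the two atomicity levels can be sketched with a meta-object that maps operation names to implementations; a module-level unit rebinds several of them in one step. This is an illustrative Java-flavored sketch, not the thesis implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class Atomicity {
    static class MetaObject {
        final Map<String, Supplier<String>> ops = new HashMap<>();
        String dispatch(String op) { return ops.get(op).get(); }
        // D2.a: rebind a single domain operation.
        void adaptOperation(String op, Supplier<String> impl) { ops.put(op, impl); }
        // D2.b: apply a whole unit of adaptations atomically, so that related
        // operations (e.g. call and field dispatch) stay consistent.
        void adaptModule(Map<String, Supplier<String>> unit) { ops.putAll(unit); }
    }
}
```

With `adaptModule`, the rebinding of `dispatchCall` and the field-dispatch operations happens as one unit, so no caller can observe a state where only one of the related operations has been exchanged.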

To enable adaptations with different atomicity, the architecture applies one meta-object that contains one or several adaptations as one unit. Consider the abstract example in Figure 4.9, which depicts how adaptations at different levels of atomicity are supported in the architecture. The upper part shows the most fine-grained semantic adaptation at the level of an atomic domain operation. Unlike Figure 4.8, the meta-objects are represented along with the domain operations. Doing so, we can highlight that the adaptation can separately impact a particular operation. In the upper left quadrant, the object a is attached to meta-object x; in the upper right quadrant, the same object is attached to a new meta-object y, which is the result of cloning x and binding foo to a new implementation, called fooImpl2. This way, the whole default semantics gets reused except for the re-bound domain operation(s).

In general, it is likely that a semantic adaptation affects several places in the default implementation of the semantics. Obviously, it is preferable to apply the adaptations together as a unit of semantic adaptation. In contrast to the atomic adaptation at the level of a single domain operation, in this case the changes have to be packed into a bigger variability unit. This abstract unit is depicted as the gray rectangle in the lower left part of Figure 4.9. When an adaptation happens, all changes of this adaptation unit are performed in concert. The impacted domain objects are then bound to a new meta-object, which is the result of mixing the previous meta-object and the semantic adaptation unit. In Figure 4.9, after the adaptation, the domain object b is attached to the meta-object z, which binds both foo and bar to new implementations.

As a concrete example for using the atomicity of adaptations in the Logo domain, consider adapting the execution semantics for several keywords at once. When we want to prevent the pen color from being changed, we must adapt the semantics of only one operation—the setpencolor

Figure 4.9.: Dimension 2 – atomicity of adaptations ((D2.a) domain-operation level: the new meta-object y rebinds only foo to fooImpl2; (D2.b) module level: the new meta-object z applies a semantic module that rebinds both foo and bar)

operation. We can replace the operation that implements the semantics of this keyword with an empty implementation. Conversely, when we want to prevent collisions between turtles, we need to adapt the semantics of all drawing operations. By applying adaptations to multiple domain operations in one atomic step, we can ensure that a pre-condition is checked for all drawing operations before these operations are actually executed.

4.2.2.1.3. Dimension: Time of Adaptations

The third dimension characterizes the relation between the point in time at which the semantic adaptations happen and the point in time at which DSL programs are executed:

D3.a) pre-execution semantic adaptation, and

D3.b) context-dependent semantic adaptation.

In most cases, the right semantics for a program execution can be determined beforehand and stays fixed for a complete program run. But sometimes, the selection of the right semantics depends on the execution state of a DSL program, i.e., a change in the DSL program context triggers a semantic adaptation.

To enable different times of adaptation, the architecture distinguishes when meta-objects are replaced. Consider the following two abstract examples of execution traces that illustrate the two possible points in time when adaptations may take place. Figure 4.10 shows the difference between the execution traces of a pre-execution adaptation (left) and a context-dependent adaptation (right):

--- Loading Interpreter        --- Loading Interpreter
+++ Adaptation                 %%% Program Execution
+++ Adaptation                 %%%
%%% Program Execution          +++ Adaptation
%%%                            %%%
%%%                            +++ Adaptation
%%%                            %%%
%%% End of Execution           %%% End of Execution

Figure 4.10.: Dimension 3 – time of adaptations

In both traces, the first step is to load the DSL program; the corresponding trace line is prefixed by "– – –". Then, two kinds of execution steps can occur: semantic adaptation ("+ + +") and normal DSL execution ("% % %"). The left-hand side of Figure 4.10 schematically depicts a pre-execution semantic adaptation: the semantics of the DSL changes before any domain object is created or any call to a domain operation has occurred. Note that multiple adaptations can be applied, as indicated. Then, the DSL program is evaluated until completion. In this case, the adaptation is independent of the DSL execution. The right-hand side of Figure 4.10 depicts an execution with a context-dependent semantic adaptation. Unlike the previous trace, the adaptation happens during DSL execution, depending on the concrete values of domain objects. This is symbolized by the interlacing of several regular domain operation execution and semantic adaptation steps. Such context-dependent adaptations enable semantic self-adaptation of DSL programs.

As a concrete example for using context-dependent semantic adaptation in the Logo domain, consider adapting the execution semantics of the domain operations at different times. When we know that using a static optimization strategy is always beneficial, we can apply a semantic adaptation for the optimization before executing a Logo program. Conversely, there are optimizations for which it cannot be decided in advance whether the optimization is beneficial, because it depends on the application context whether there will be an improvement when using this optimization. In such cases, the architecture allows using a context-dependent semantic adaptation to apply the optimization at runtime, once we know that this condition is met.

Context-dependent semantic adaptations deserve a closer discussion. Therefore, Section 6 (p. 135) motivates using context-dependent adaptations to enable self-adaptivity [Hor01] in DSLs, demonstrates more concrete use cases, and elaborates their implementation. Furthermore, other applications of context-dependent adaptations are conceivable, such as implementing runtime checking/verification [CR06] and dynamic activation of language constructs [PGA01, CH05, Tan09].
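A context-dependent adaptation (D3.b) can be sketched as a domain object that exchanges its own meta-object once a runtime condition is observed; the threshold and all names in this Java-flavored sketch are illustrative:

```java
import java.util.function.IntUnaryOperator;

public class SelfAdaptiveTurtle {
    IntUnaryOperator metaObject = n -> n;   // default movement semantics
    int distance = 0;

    void forward(int n) {
        distance += metaObject.applyAsInt(n);
        // Context-dependent adaptation: once the turtle has travelled far
        // enough, re-bind its meta-object so that further moves are ignored.
        if (distance > 100) metaObject = x -> 0;
    }
}
```

The adaptation is triggered during program execution by the state of the domain object itself, corresponding to the interlaced adaptation steps in the right-hand trace of Figure 4.10.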


4.2.2.2. Extension Points of the Architecture

After having discussed how dimensions allow controlling adaptations by restricting their effect to particular regions, this section gives an overview of the flexibility of the meta level that enables late adaptations in the reflective embedding architecture. The meta level enables flexible semantic adaptations; however, the line of reasoning is not that every possible semantic extension of a language should be implemented using late adaptation. Often, it is sufficient to extend a language with polymorphism, which is preferred. Rather, the argument is that late adaptation should be used whenever polymorphism does not facilitate a certain extension scenario. To understand how late adaptations are applied, we first discuss the general process, and second what parts of a language late adaptations can change.

There is a general process for how developers collaborate when adapting language implementations in the user domain. On the one hand, to control what is adaptable, the language developer who provides a language implementation exposes various extension points that can be adapted. On the other hand, to control what to adapt, the language developer who wants to customize a language implementation for the end user can specialize the language at these extension points.

There are various kinds of adaptations and possible applications that are supported by the meta level. Central to the meta level is that the structure and execution behavior of the language implementations that reside inside the architectural levels are reflected as first-class entities via the meta-objects, and therefore these first-class entities can be adapted. Since the meta-objects expose the structure of classes in the language implementation, one can add structural elements to classes or change existing elements, i.e., one can add fields and methods to those classes.
Since those meta-objects also expose the behavior of the classes in the language implementation, developers can adapt the execution of these classes, and therefore the language implementation. By intercepting method calls and field accesses on those classes, developers can manipulate calls to methods and accesses to fields. For every method call, the meta-object can manipulate the receiver, the method name, and its parameters to provide alternative execution semantics. For every field access, the meta-object can manipulate the receiver, the field name, and the value to be stored or returned. Changing a method call or field access results in different computations and therefore alternative language semantics.

For instance, to adapt SIMPLELOGO to enforce a domain-specific constraint, a special meta-object of the Turtle class can reify every call to the move operations, such as the moveForward operation; before executing such a call, it can intercept the call, check whether there would be a collision, and prevent it.

To understand what adaptations are possible in general, it is important to understand the consequence of meta-objects adapting a particular level in the architecture. At each level of the architecture, the extension points allow adapting different parts of a language implementation, which is determined by what extension points are exposed by the corresponding level. As we will elaborate in the following paragraphs, (1) at the interface level, one can adapt the abstract syntax of a language, (2) at the façade level, one can adapt its syntax-to-semantics binding, and (3) at the model level, one can adapt its evaluation protocol.
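In Java terms, such call interception can be approximated with a dynamic proxy standing in for the meta-object. The thesis realizes this with the host language's meta-object protocol rather than Java proxies; the sketch below only emulates the interception point, and all names are illustrative:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;

public class CollisionGuard {
    public interface ITurtle { void moveForward(int n); }

    // Wraps a turtle with a "meta-object" (an InvocationHandler) that
    // intercepts every move* call and records a collision check before
    // dispatching to the original implementation.
    public static ITurtle guard(ITurtle base, List<String> trace) {
        InvocationHandler metaObject = (proxy, method, args) -> {
            if (method.getName().startsWith("move"))
                trace.add("collision check before " + method.getName());
            return method.invoke(base, args);   // dispatch to the base turtle
        };
        return (ITurtle) Proxy.newProxyInstance(
            ITurtle.class.getClassLoader(), new Class<?>[]{ITurtle.class}, metaObject);
    }
}
```

The guarded turtle behaves like the original one, except that every reified move call passes through the precondition check first, which is exactly the gray-box style of adaptation described above.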

97 4. The Reflective Embedding Approach

(1) Language Interface Level: Each language interface defines an extension point for a language's abstract syntax and its expression types. With a special meta level for the interface level, a language developer can adapt groups of languages. The meta-objects can quantify over calls to several façades, and they can selectively adapt those façades that implement a certain interface or that define certain expression types. Semantic adaptations at the interface level are similar to extending the language interface using language polymorphism, but such adaptations are more flexible, since the meta-level extension need not be part of one inheritance chain of the adapted languages. While language polymorphism extends only one language, adaptation with such a meta level makes it possible to add or remove particular expression types to or from a group of languages, e.g., by adding new keyword methods to all façades that implement a given interface, or respectively by removing existing keyword methods from them.

For example, we can use this to adapt the syntax of the extensions MATHLOGO and MATHBUSINESSRULESDSL from Section 3.2.1 (p. 37) that both define common mathematical keywords of a language interface IMathLanguage (e.g., the domain operations sum, sin, define, and apply). To simplify the syntax of these two language extensions, the developer can remove keywords from them that are too complex for, or not required by, the end user (such as the keywords sum and sin). To do so, a language developer defines a new meta-object that intercepts keyword method calls and does not proceed a call if a certain condition is not met, such as the condition that an object implements a certain interface (such as IMathLanguage) or that the call has a certain signature (such as int sum(List) or int sin(int)).
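The condition-based keyword hiding just described can be sketched as a Python proxy. MathFacade and the hidden-keyword set are assumptions for this example; the thesis's embedding works via the host language's meta-object protocol instead:

```python
import math

class MathFacade:
    """Stand-in façade implementing the IMathLanguage keywords."""
    def __init__(self):
        self._fns = {}

    def sum(self, xs):
        total = 0
        for x in xs:
            total += x
        return total

    def sin(self, x):
        return math.sin(x)

    def define(self, name, fn):
        self._fns[name] = fn

    def apply(self, name):
        return self._fns[name]

class KeywordHidingMetaObject:
    """Does not proceed a keyword call whose method name is in the hidden set."""
    HIDDEN = {"sum", "sin"}

    def __init__(self, facade):
        self._facade = facade

    def __getattr__(self, name):
        if name in self.HIDDEN:
            raise AttributeError(f"keyword '{name}' is hidden from the end user")
        return getattr(self._facade, name)

lang = KeywordHidingMetaObject(MathFacade())
lang.define("double", lambda x: 2 * x)   # still available
lang.apply("double")(21)                 # → 42
# lang.sum([1, 2, 3])                    # would raise: keyword is hidden
```

Effectively, sin and sum disappear from every façade routed through this meta-object, while define and apply pass through unchanged.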
For realizing the above example, the developer defines the meta-object with the condition that the call is not proceeded if the call's signature has the method name sin or sum. Thus, effectively the meta-object hides the corresponding methods (e.g., sin and sum) from all façades, but allows all other keywords (e.g., define and apply). With respect to the adaptation scope, this example adaptation has a global locality (D1.a) since it changes all façade instances of type IMathLanguage, its atomicity is at the module level (D2.b) since it removes two methods, and its time of adaptation is pre-execution (D3.a).

While language polymorphism allows extending a language statically, a semantic adaptation even makes it possible to change the abstract syntax at runtime, similar to extensible grammars (cf. Section 2.3.2, p. 25). Developers can define strategies that extend a language's abstract syntax depending on the keywords the language defines. This is interesting if developers want to apply an adaptation but do not know the specific syntax of the language to be adapted. For example, consider adding a new expression type depending on the availability of another expression type. The developer can implement an adaptation that adds a new expression type, say cos, but only if the language defines the expression type sin. Note that this adaptation can be independent from the concrete semantics of the sin expression type. Yet, the semantics of the cos expression type can be derived from the concrete semantics of sin, e.g. cos(α) = sin(90° − α). This way, the adaptation can be applied to different languages, no matter how they implement sin. To adapt the available abstract syntax of a particular language implementation at runtime, a language developer adjusts the meta-object link of the façade classes to point to a special meta-object that adds a corresponding method to the façade for the expression type. With respect to the scope of this adaptation, the locality is global (D1.a) since it changes all façade instances of type IMathLanguage, the atomicity is operation-level (D2.a) since it adds only one method, and the time is context-dependent (D3.b) since cos is only available when sin is present.

While a hierarchical language extension is specific for a base language, a semantic adaptation can be generic for a group of languages implemented on top of the reflective embedding architecture. For example, we adapt a language's abstract syntax with a generic strategy that enables case-insensitive syntax in an otherwise case-sensitive embedding in the host language. To make the abstract syntax case-insensitive, a language developer can use a special meta-object that intercepts every method call to some façade and changes the method dispatch semantics to ignore the case in method names when looking up the most concrete method implementation that binds the syntax to semantics. With respect to the adaptation scope, the locality is global (D1.a), the atomicity is module-level (D2.b) since all expression types become case insensitive, and the time is context-dependent (D3.b) since the strategy depends on what expression types are provided by the adapted façades.

The use of the meta level enables a greater flexibility than what is available in embeddings without a meta level. In traditional embeddings, the abstract syntax of an embedded language is fixed and cannot be changed. In contrast, in the reflective embedding architecture, the abstract syntax can be extended at any time.
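A minimal sketch of the case-insensitive dispatch strategy, again as an illustrative Python stand-in (the façade and its keywords are assumed for the example; the thesis changes the dispatch semantics via the host's meta-object protocol):

```python
class LogoFacade:
    """Stand-in façade with case-sensitive keyword methods."""
    def forward(self, n):
        return f"forward {n}"

    def turnRight(self, angle):
        return f"right {angle}"

class CaseInsensitiveMetaObject:
    """Generic strategy: normalizes the method name before looking up the
    keyword method that binds syntax to semantics."""
    def __init__(self, facade):
        # index every keyword method of the façade by its lower-cased name
        self._index = {m.lower(): getattr(facade, m)
                       for m in dir(facade) if not m.startswith("_")}

    def __getattr__(self, name):
        try:
            return self._index[name.lower()]
        except KeyError:
            raise AttributeError(name)

logo = CaseInsensitiveMetaObject(LogoFacade())
logo.FORWARD(10)    # dispatches to forward(10)
logo.turnright(90)  # dispatches to turnRight(90)
```

The strategy is generic because it introspects whatever keyword methods the wrapped façade happens to provide.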

(2) Language Façade Level: Each language façade defines an extension point for the syntax-to-semantic binding of each expression type. Therefore, at the meta level, a language developer can adapt the syntax-to-semantic binding of particular façade instances. Semantic adaptations at the façade level are similar to façade extensions using language polymorphism, but such adaptations are more fine-granular (i.e., per expression type) and can be more generic (i.e., for several base languages).

While inheriting from a façade establishes a new complete syntax-to-semantic binding, adapting a façade allows more fine-granular changes to the syntax-to-semantic binding at the granularity of expression types. Developers can use this flexibility to adapt the syntax-to-semantic binding of an expression type to use another evaluation mode, e.g. changing it from immediate to delayed. For example, we can redefine the syntax-to-semantic binding of the turtle expression to delay the keyword's evaluation such that it executes drawing commands in a separate thread, as motivated in Section 3.1.2.1 (p. 34). The meta-object of the façade can intercept calls to the abstraction operator method turtle and proceed the call in a new thread. With respect to the scope of this adaptation, the locality is global (D1.a), the atomicity is operation-level (D2.a), and the time is pre-execution (D3.a).

While inheriting from a façade establishes a complete syntax-to-semantic binding for all expression types of one particular language, a semantic adaptation to a façade allows selectively changing the syntax-to-semantic binding of only one expression type, or defining a generic strategy that changes the syntax-to-semantic binding of expression types that do not need to be from the same language. For example, a generic strategy can extend the syntax-to-semantic binding


of FUNCTIONAL to allow short function applications, as motivated in Section 3.7.1.1 (p. 60). When a program calls a function directly (e.g., via the expression "square(50)" in abstract syntax) without using the apply keyword, then in FUNCTIONAL, by default, there is no syntax-to-semantic binding defined for this expression type in the façade class (e.g. there is no keyword method square(Integer)). However, the meta-object can intercept the method call to the façade (e.g., square(50)), and next, the meta-object can reuse the syntax-to-semantic binding of the apply keyword to evaluate the expression (e.g., apply("square")(50)). Since the meta-object only makes the assumption that the apply keyword is present, the same meta-object can be used to adapt not only the stand-alone language FUNCTIONAL, but also any other language that implements this keyword method, such as the FUNCTIONALLOGO extension. With respect to the scope of this adaptation, the locality is global (D1.a), the atomicity is operation-level (D2.a) since only the syntax-to-semantic binding of the apply keyword is extended, and the time is pre-execution (D3.a).
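The short-application strategy can be sketched with a missing-method trap. Only the presence of an apply keyword method is assumed; all names here are illustrative stand-ins, not the thesis's code:

```python
class FunctionalFacade:
    """Stand-in façade of FUNCTIONAL with the define and apply keywords."""
    def __init__(self):
        self._fns = {}

    def define(self, name, fn):
        self._fns[name] = fn

    def apply(self, name):
        return self._fns[name]

class ShortApplicationMetaObject:
    """Reroutes a call with no syntax-to-semantic binding, e.g. square(50),
    to the apply keyword: apply("square")(50)."""
    def __init__(self, facade):
        self._facade = facade

    def __getattr__(self, name):
        facade = self._facade
        if hasattr(facade, name):           # existing keyword: pass through
            return getattr(facade, name)
        # no binding for `name`: reuse the apply keyword's binding
        return lambda *args: facade.apply(name)(*args)

functional = ShortApplicationMetaObject(FunctionalFacade())
functional.define("square", lambda x: x * x)
functional.square(50)    # rerouted to apply("square")(50) → 2500
```

Because the trap only relies on apply being defined, the same proxy would work for any façade that offers that keyword.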

(3) Language Model Level: Adapting a language's model at the meta level adapts the language's evaluation protocol. Semantic adaptations at the model level are similar to model extensions using language polymorphism, but such adaptations are more modular and flexible.

While a model extension enables extending single domain types, a semantic adaptation in a model supports a modular adaptation of multiple parts of the evaluation protocol. For example, in the context of SIMPLELOGO, developers can adapt its language model to enforce domain constraints. Such an adaptation dynamically analyzes the sequence of calls to the operations on the first-class representation of a Turtle in the language model, and prevents the execution of an operation if a constraint would be violated, as motivated in Section 3.1.2.2 (p. 35). To enforce the aforementioned constraint, a language developer defines a special meta-object that adapts the Turtle class by overriding its methods, such that a constraint is validated to prevent collisions. The benefit of the adaptation is that the meta-object encapsulates the logic to enforce the constraint, which is therefore not scattered across several methods and classes. With respect to the scope of this adaptation, the locality is global (D1.a), since the constraint is checked on all Turtle objects, the atomicity is module-level (D2.b), since all move operations of the turtle are checked, and the time is pre-execution (D3.a).

While extending a model establishes only a new closed version of the evaluation protocol, a semantic adaptation in a model can adapt the evaluation of expressions at runtime. For example, we can use this to adapt the evaluation protocol to realize dynamic aspects, e.g. as motivated for SIMPLELOGO in Section 3.3.2.1 (p. 44). A language developer installs a special meta-object that reifies join points when executing AO4LOGO programs.
Only when aspects are deployed and a certain pointcut matches at the current join point does this meta-object execute the corresponding advice, e.g. it executes an aspect that implements the tracing concern. With respect to the scope of this adaptation, the locality is global (D1.a), since aspects are deployed globally for all types, the atomicity is module-level (D2.b), since all domain operations are traced, and the time is context-dependent (D3.b), since undeployed aspects do not affect the execution of methods in the model.
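A sketch of such join-point reification, with deliberately simplified aspect and pointcut representations (the actual AO4LOGO machinery is richer; these names are assumptions for illustration):

```python
class Turtle:
    """Simplified domain type whose operations become join points."""
    def moveForward(self, steps):
        return f"moved {steps}"

class Aspect:
    def __init__(self, pointcut, advice):
        self.pointcut = pointcut   # predicate over (operation name, args)
        self.advice = advice       # before-advice run at matching join points

class JoinPointMetaObject:
    """Reifies every operation call as a join point; only deployed aspects
    whose pointcut matches at that join point execute their advice."""
    def __init__(self, base):
        self._base = base
        self.deployed = []

    def __getattr__(self, name):
        method = getattr(self._base, name)
        def at_join_point(*args):
            for aspect in self.deployed:
                if aspect.pointcut(name, args):
                    aspect.advice(name, args)   # e.g. the tracing concern
            return method(*args)
        return at_join_point

trace = []
tracing = Aspect(pointcut=lambda name, args: name.startswith("move"),
                 advice=lambda name, args: trace.append((name, args)))
turtle = JoinPointMetaObject(Turtle())
turtle.moveForward(10)            # not traced: aspect not yet deployed
turtle.deployed.append(tracing)   # dynamic deployment
turtle.moveForward(20)            # traced
```

The context-dependence is visible here: the same call is traced or untraced depending solely on which aspects are deployed at that moment.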


4.2.2.3. Program Evaluation under Adapted Semantics

When evaluating a program under the default semantics, the default meta-objects execute the language implementation, but they do not play a significant role, since they execute the objects in the language implementation with standard OO dispatch. In contrast, when evaluating a program with special meta-objects, these meta-objects change the execution of the language implementation, resulting in an execution with adapted semantics.

At first glance, semantic adaptations with meta-objects seem similar to abstracting over semantics with language polymorphism, since both result in alternative program executions. But semantic adaptations are not restricted to black-box extensions, since they can extend language implementations with new keywords, adapt keywords, and remove existing keywords. Semantic adaptations use meta-objects for gray-box extensions. Semantic adaptations are not restricted to hierarchical extensions. They can extend existing language interfaces, and override existing fields and methods. For the above reasons, semantic adaptations allow a more flexible late binding of syntax to semantics than language polymorphism.

To set up a program for execution under alternative semantics, only minor adaptations to the three steps of the standard evaluation process from Section 4.1.2 (p. 79) are necessary.

In the compilation step, nothing needs to be changed. Nonetheless, note that after compiling the default language implementation, the language developers can compile the special meta-objects separately from the code of the default implementation.

In the setup step, there is a simple modification: not only is the code block's façade set up, but in addition, the alternative semantics is enabled by associating special meta-objects with the classes of the language façade and its model.
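The adapted setup step can be sketched as follows. The program and the evaluation step stay untouched; only the façade handed to the program is wrapped with a special meta-object (all names here are illustrative assumptions):

```python
class LogoFacade:
    """Default implementation, compiled once and never changed."""
    def forward(self, n):
        return n

class TracingMetaObject:
    """Special meta-object, compiled separately from the default implementation."""
    def __init__(self, base):
        self._base = base
        self.log = []

    def __getattr__(self, name):
        method = getattr(self._base, name)
        def intercepted(*args):
            self.log.append(name)   # adapted semantics: record keyword calls
            return method(*args)
        return intercepted

def evaluate(program, facade):
    """Expression-evaluation step: unchanged, the program only sees a façade."""
    return program(facade)

program = lambda logo: logo.forward(10)

evaluate(program, LogoFacade())            # setup with default semantics
adapted = TracingMetaObject(LogoFacade())  # setup step: associate meta-object
evaluate(program, adapted)                 # same program, adapted execution
```

Note how the adaptation is transparent to the program: the same code block runs unmodified under both setups.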
In the expression-evaluation step, no changes are necessary and the semantic adaptations are applied completely transparently from the viewpoint of the user. The special meta-objects are in place and adapt the evaluation of the program running on top of the unchanged language implementation. The special meta-objects can adapt the semantics of every language construct by dispatching method calls to alternative method implementations or fields.

For example, we can use a meta-level extension to adapt the execution semantics of SIMPLELOGO. Consider evaluating SIMPLELOGO with the ConstraintCheckingMetaObject from Figure 4.2. When a program uses the keyword forward, the corresponding keyword method calls the operation moveForward on the domain type Turtle. Instead of directly executing the operation, the ConstraintCheckingMetaObject intercepts the call and possibly executes an alternative semantics instead. Its alternative semantics first checks a set of pre-conditions, and if all pre-conditions are met, it proceeds with executing the original semantics; otherwise it reports an error. This way, the meta-object allows extending the evaluation protocol of SIMPLELOGO without changing the language's implementation.

From the perspective of the execution semantics, when executing a program under an adapted semantics using a special meta level, the meta-objects partially contribute to the evaluation of programs, i.e. they determine the final semantics. When embedding a language into a reflective

host language with a MOP, its meta-objects enable more flexible embeddings, since programs become first class and can be inspected and manipulated.

4.2.3. Extensible Language Combiners

The reflective embedding architecture enables language composition with so-called language combiners. Composing languages enables evaluating programs that mix expressions of overlapping domains. The language developer selects an appropriate language combiner based on a composition specification8 for these languages. Given a set of language implementations, language developers can use a combiner to compose the syntax and semantics defined in the languages. For example, reconsider composing FUNCTIONAL and COMPLETELOGO to execute FUNCTIONALLOGO programs, such as the one presented in Figure 3.2 (p. 39), which mixes expressions of the FUNCTIONAL and COMPLETELOGO domains (e.g., the keywords apply of FUNCTIONAL and forward of COMPLETELOGO).

Language combiners are a mechanism that enables language developers to conveniently compose languages that have been developed independently. By using a combiner, they can reuse several modular language implementations, thus saving time for implementing the composition by relying on the composition logic that the combiner provides. When language developers want to realize a language composition of several stand-alone languages, such as FUNCTIONAL and COMPLETELOGO, they likely want to reuse existing implementations of the constituent languages for the composite language. When using an appropriate combiner to compose several languages, in many cases, the implementations of the constituent languages do not need to be changed at all. The developers do not need to manipulate the code of their interfaces, façades, and models, but they can extend them when necessary or desirable.
The reflective embedding architecture comes with a set of pre-defined generic language combiners for composing languages in certain composition scenarios. A composition scenario is defined for a particular set of assumptions on the languages to compose (e.g. all languages must be conflict-free). The language combiners are generic in the sense that they can compose any set of language embeddings as long as these languages meet their assumptions. For composing particular languages, the language developer is responsible for selecting an appropriate language combiner that provides the right composition semantics to meet the requirements of the composition specification. Figure 4.11 presents the pre-defined language combiners of the reflective embedding architecture:

Independent combiners: Section 4.2.3.1 discusses the BlackBoxCombiner for composing independent languages that do not have syntactic or semantic conflicts.

8 A composition specification can be an informal or formal definition that defines how to compose the syntax and semantics of several languages. This thesis assumes an informal specification. In contrast, Kojarski et al. propose to use a formal specification in the form of denotational semantics [KL05, KL07b].


[Figure 4.11 shows a class diagram of the pre-defined language combiners: the interface LanguageCombiner, a MetaObject, is implemented by the BlackBoxCombiner (assumes that the composed languages have no syntactic and no semantic conflicts), the LinearizingCombiner (assumes that the composed languages are extensions to the same base language), the CrossCuttingCombiner (assumes that the composed languages have no syntactic conflicts but that they have semantic interactions that are defined by aspects), and the CustomCombiner with its subclass ExtendedCustomCombiner (custom assumptions). The combiners are grouped into independent, dependent, and custom combiners.]

Figure 4.11.: Overview of the language combiners

Dependent combiners: Section 4.2.3.2 discusses the LinearizingCombiner for composing dependent languages with interacting syntax and the CrossCuttingCombiner for composing languages with interacting semantics.

Custom combiners: In general, the set of possible scenarios for language compositions is not limited. Therefore, when no appropriate combiner is available, a language developer needs to implement new custom combiners, as discussed in Section 4.2.3.3.

4.2.3.1. Composing Independent Languages

The architecture provides the generic BlackBoxCombiner for composing stand-alone languages that have been implemented independently and that do not conflict. We call compositions of conflict-free languages black-box compositions because the combiner only takes into account the interfaces of the languages to be composed, it treats the language implementations as black boxes, and it composes the languages without intervening with the internals of the individual evaluation protocols. The BlackBoxCombiner composes several language implementations so that if one of the constituent languages defines an expression type, this expression type will be available in the language composition. The BlackBoxCombiner does not define a syntax and semantics itself. Instead, the combiner exploits the power of the meta-object protocol for composing the constituent languages' syntax-to-semantic bindings defined in their façades. For a meaningful composition, the BlackBoxCombiner requires that the constituent languages must not have syntactic or semantic conflicts. While the combiner detects the presence of syntactic conflicts, and it enforces the absence of such conflicts in compositions, it is the responsibility of the language developers to consult the composition specification of the languages to eliminate all doubts about possible semantic conflicts.


For example, reconsider composing the languages FUNCTIONAL and COMPLETELOGO. Since the two languages have neither syntactic nor semantic conflicts, a language developer can use the BlackBoxCombiner to combine their constituent language implementations without changing their code.

Composition Protocol: Internally, a combiner implements a language composition protocol. In the reflective embedding architecture, this protocol uses OO composition and the meta-object protocol of the host language as the means for language composition. Language combiners take advantage of the levels in the reflective embedding architecture that implicitly expose points where to compose languages. At the language interface level, the combiner analyzes the languages' interfaces to determine the syntax of the constituent languages to be composed. At the language façade level, it extracts the syntax-to-semantic bindings from the languages' façades. At the language model level, it accesses the evaluation protocols that are defined in the languages' models. At the language meta level, the combiner integrates the languages' syntax, syntax-to-semantic bindings, and evaluation protocols with its well-defined composition semantics.

Technically, a language combiner is a meta-façade. Further, because a combiner has access to the meta level, its meta-façade can gain control over the meta-objects of the languages to be composed. Having access to the languages' meta levels enables the combiner to process meta-data of the constituent languages and to adapt their evaluation protocols to a composition. A combiner is a meta-analyzer that uses the meta level to analyze the meta-data defined in the language implementations of the constituent languages. Having access to the meta level, a combiner can use meta-objects to integrate the syntax and the semantics for the language composition by combining the syntax-to-semantic bindings defined in their façades. Specifically, a combiner is a meta-transformer that intercepts, delegates, and adapts keyword method calls to integrate the evaluation protocols of the languages.
For integrating the evaluation protocols of several languages, when evaluating a program that mixes expressions of several languages resulting in keyword method calls, the combiner needs to handle all possible keyword method calls. Each call represents the evaluation of one expression, which may result in further method calls inside one of the language models. Therefore, the combiner needs to handle all method calls inside the language models.

To adjust the evaluation of a keyword-method call in a composition, the combiner uses reflection to intercept, delegate, and adapt the method call for this keyword. For example, the BlackBoxCombiner intercepts every keyword method call received by the combiner. It analyzes the intercepted call to find the corresponding façade. The BlackBoxCombiner does not transform the call, but only delegates the call to the corresponding façade, which will evaluate the keyword and return a result; this result is passed back through the BlackBoxCombiner, which finally returns it as the result of the language composition.

Specifically, the BlackBoxCombiner uses OO composition and the meta-object protocol to compose the languages. Instead of defining keyword methods itself, the BlackBoxCombiner façade references the language façades of the constituent languages to be composed. Being

a meta-object allows the BlackBoxCombiner to intercept method calls to keyword methods and delegate these calls to other façades—the façades of the constituent languages. The combiner defines a composition protocol that derives a combined syntax-to-semantic binding for the language composition. This combined syntax-to-semantic binding is then used to bind expressions in the composite language to a well-defined evaluation semantics. Note that when composing several languages, we always must ensure that the constituent languages are syntactically and semantically independent, as explained below.

When composing several languages, the BlackBoxCombiner validates that the constituent languages are syntactically independent9. By validating, the BlackBoxCombiner can assume that it can evaluate each expression type in isolation. Knowing that the expressions of the composed languages are syntactically independent, for each expression type in a black-box composition, there is always one particular well-defined method that binds the expression's syntax to its semantics.

Before using the BlackBoxCombiner to compose two syntactically independent languages, the language developer is required to validate that the constituent languages are also semantically independent10; hence, the BlackBoxCombiner can assume that it can evaluate each expression type in isolation. Knowing that the expressions of the composed languages are both syntactically and semantically independent, for each expression type in a black-box composition, there is always one particular well-defined evaluation semantics.

In detail, to determine a composed semantics for an expression type, the BlackBoxCombiner uses the introspective capabilities of the meta level to analyze the language façades of the languages to be composed in order to determine what expression types they define.
For each expression type found in one of the façades, the combiner reuses its syntax-to-semantic binding from the façade of the corresponding constituent language for the composite language. Because the combiner enforces that the expression types of the constituent languages are disjoint, the combiner can derive a well-defined evaluation semantics for each expression type. Otherwise, if more than one façade defines the same expression type, this is a syntax conflict and the combiner raises an exception. The fact that each expression type belongs to exactly one language allows the combiner to always find the right semantics for an expression type: it is always defined by the corresponding façade that defines the method with a matching signature for that expression type.
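The conflict check and keyword dispatch of the BlackBoxCombiner can be sketched as a Python reduction. The thesis realizes this with the host language's meta-object protocol; the façades here are minimal stand-ins:

```python
class Functional:
    def __init__(self):
        self._fns = {}

    def define(self, name, fn):
        self._fns[name] = fn

    def apply(self, name):
        return self._fns[name]

class CompleteLogo:
    def forward(self, n):
        return f"forward {n}"

class BlackBoxCombiner:
    """Derives a combined syntax-to-semantic binding by introspecting the
    façades; a keyword defined by more than one façade is a syntax conflict."""
    def __init__(self, *facades):
        self._binding = {}   # keyword name → façade that defines it
        for facade in facades:
            for kw in (m for m in dir(facade) if not m.startswith("_")):
                if kw in self._binding:
                    raise ValueError(f"syntax conflict on keyword '{kw}'")
                self._binding[kw] = facade

    def __getattr__(self, name):
        if name not in self._binding:
            raise AttributeError(name)
        # delegate the keyword call to the façade defining the keyword
        return getattr(self._binding[name], name)

functionalLogo = BlackBoxCombiner(Functional(), CompleteLogo())
functionalLogo.define("dash", lambda: functionalLogo.forward(10))
functionalLogo.apply("dash")()   # mixes FUNCTIONAL and COMPLETELOGO keywords
```

Constructing the combiner is also the setup step: the combiner instance then acts as the single façade the mixed program is evaluated against.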

9 Conceptually, the constituent languages are said to be syntactically independent if the sets of their grammars' productions are disjoint (cf. Section 3.2, p. 37). In the implementation, the constituent languages' implementations are said to be independent if their language interfaces and façade classes do not have methods with the same signature.
10 Conceptually, the constituent languages are said to be semantically independent if the language composition specification does not specify a semantic dependency (cf. Section 3.2, p. 37). In the implementation, the constituent languages' implementations are said to be independent if their classes do not have dependencies in their interfaces (and semantic contracts).


Program Evaluation with Composed Syntax and Semantics: The steps to evaluate a program that mixes keywords of several languages are slightly different than the steps for a program with only one language.

In the compilation step, it is worth mentioning that the host compiler allows separate compilation of the program, every constituent language, and the combiner.

In the setup step, to compose the languages, a developer creates a new instance of the BlackBoxCombiner, passing to it instances of all language façades. Figure 4.12 indicates how the BlackBoxCombiner instance is connected to the façades of the constituent languages for the example composition of FUNCTIONAL and COMPLETELOGO. Then, the BlackBoxCombiner instance is set up as the façade for the program code block.

[Figure 4.12 shows an object diagram spanning the architectural levels: at the language interface level, the interfaces IFunctional and ICompleteLogo; at the language façade level, the façade instances :Functional (with the keyword method apply(…)) and :CompleteLogo (with the keyword method forward(…)); at the language model level, the model classes CodeBlock and Turtle; and at the meta level, the combiner instance functionalLogo:BlackBoxCombiner, which references both façades.]

Figure 4.12.: Composing two languages with the BlackBoxCombiner

In the evaluation step, when encountering expressions in a program that originate from different languages, the BlackBoxCombiner determines the right evaluation semantics for each expression type. Figure 4.13 illustrates keyword dispatch in the composition protocol of the BlackBoxCombiner. Depending on the type of an expression, the combiner selects the keyword method from one of the language façades with the corresponding signature for that expression type, and uses this binding to evaluate the expression.


[Figure 4.13 shows a sequence diagram of keyword dispatch: the program's code block calls the combiner instance :BBCombiner, which was created with the façades f:Functional and c:CompleteLogo. A call to apply(…) is delegated to f; the code block of the applied function uses the forward keyword inside its body, and the resulting forward(…) call is delegated to c. The legend distinguishes ordinary method calls, calls evaluating a code block, and calls produced from an expression.]

Figure 4.13.: Keyword dispatch with the BlackBoxCombiner

4.2.3.2. Composing Dependent Languages

The architecture comes with pre-defined combiners for composing dependent languages. In the following, we discuss compositions of dependent languages and how combiners can handle syntactic and semantic interactions between them.

To achieve compositions of the semantics of dependent languages, dependent combiners use their meta level to adapt the syntax in the languages' façades and the evaluation protocols in their models. It is crucial that dependent combiners have the possibility to adapt every method call inside the evaluation protocol of each language, instead of evaluating a computation directly. On the one hand, a dependent combiner can intercept the evaluation protocol of the influencing language, e.g. to extract information from it. On the other hand, it can adapt the evaluation protocol of the influenced language to provide alternative semantics, e.g. to inject information into it. This way the dependent combiner can integrate the first-class objects of the dependent languages by manipulating their structure and behavior.

4.2.3.2.1. Syntactic Interactions

There is a LinearizingCombiner that automatically resolves syntactic interactions between languages. It assumes that there is a partial order defined on all languages to be composed. The composition protocol of the LinearizingCombiner works similarly to the BlackBoxCombiner, but does not enforce disjoint sets of expression types between the languages.
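Under the assumption that the partial order is encoded by argument position (later façades rank higher), the linearizing dispatch can be sketched like this (Python stand-ins, not the thesis's implementation):

```python
class UcbLogo:
    def forward(self, n):
        return f"ucb forward {n}"

    def arc(self, r):
        return f"arc {r}"

class ExtendedLogo:
    def forward(self, n):
        return f"extended forward {n}"

    def jump(self, n):
        return f"jump {n}"

class LinearizingCombiner:
    """Like the BlackBoxCombiner, but a keyword defined by several façades is
    no conflict: the façade with the highest order wins (mixin-like)."""
    def __init__(self, *facades):
        self._binding = {}
        for facade in facades:   # later façades override earlier bindings
            for kw in (m for m in dir(facade) if not m.startswith("_")):
                self._binding[kw] = facade

    def __getattr__(self, name):
        if name not in self._binding:
            raise AttributeError(name)
        return getattr(self._binding[name], name)

logo = LinearizingCombiner(UcbLogo(), ExtendedLogo())
logo.arc(5)        # only the first façade defines arc
logo.forward(10)   # both define forward: the highest-ordered façade wins
```

Keywords unique to one façade dispatch as in a black-box composition; only the shared ones are linearized.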

For example, reconsider Section 3.3.1.3 (p. 43), which motivated composing COMPLETELOGO from the independent extensions UCBLOGO, CONCISELOGO, and EXTENDEDLOGO. Recall that there are syntactic conflicts. When composing the languages’ keywords, the combiner uses façade instances of the extensions. Because all extensions inherit the base language keywords, there is a common ancestor problem (also known as the diamond problem): the BlackBoxCombiner cannot decide on which of the extensions to execute the inherited keywords of the base language. To compose such independent extensions, the developer needs to use the LinearizingCombiner instead.

When an expression with a certain expression type is encountered, and this expression type is declared by several languages (i.e., they declare the same expression signature for the corresponding keyword and sub-expressions), the LinearizingCombiner can resolve the syntactic interaction between the languages in the case of such independent extensions. The reason is that this is a special case: all languages extend the same base class, so there is no syntactic difference between the conflicting inherited methods. To resolve the syntactic ambiguity for such an expression, the LinearizingCombiner always uses the syntax-to-semantics binding that is defined by the language façade with the highest order. Consequently, the combiner can disambiguate the inherited expressions by defining a linearized, mixin-like composition of the languages.

In general, when programs use expression types from different dependent languages, a dependent combiner needs to compose the languages’ syntax such that interactions are possible, while preventing, detecting, and resolving conflicts. To detect and resolve syntactic interactions, language combiners can intercept and adapt keyword-method calls from the program using the meta level. Instead of dispatching the keyword-method call directly, the language combiner can forward it to an alternative method implementation.

Note that the concrete dispatch logic for preventing, detecting, and resolving syntactic interactions of expressions depends on the concrete syntactic composition scheme. Several schemes are conceivable and can be implemented on top of the architecture.

4.2.3.2.2. Semantic Interactions

The reflective embedding architecture supports composition of dependent languages with semantic interactions, called dependent compositions. When composing dependent languages, their evaluation protocols have dependencies: there are expressions for which the evaluation process cannot be computed in isolation. That means, for such an expression belonging to one of the constituent languages, the evaluation can depend on the evaluation of an expression in another language.

For example, Section 3.3.2.1 (p. 44) motivates composing COMPLETELOGO with a DSL that modularly implements tracing in the AO Logo dialect, called AO4LOGO. Recall that there is a semantic interaction, because the pointcut-and-advice in AO4LOGO must intercept join points in COMPLETELOGO, evaluate pointcuts, and may need to execute tracing logic at those points.

In REA, the CrossCuttingCombiner enables such compositions of dependent languages. In a nutshell, the CrossCuttingCombiner implements an invasive composition protocol for dependent languages with crosscutting semantic interactions. To compose several dependent languages, a language developer instantiates the combiner, passing to it instances of the dependent language façades. Further, the developer needs to declare the points where the dependent languages have interactions in their evaluation protocols, which exposes a first-class representation of the evaluation context of the language to other participants. For the composition, the combiner internally exploits the flexibility of the meta level for analyzing, intercepting, and interacting with the evaluation protocol of each language that participates in a crosscutting composition.
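To make the linearized keyword dispatch described above concrete, the following Groovy sketch illustrates the idea. Note that the class LinearizingCombinerSketch and the toy façades are hypothetical names invented for this illustration only; they are not the thesis implementation. The sketch forwards each keyword-method call to the highest-order façade that declares a matching keyword method:

```groovy
// Hypothetical sketch of linearized keyword dispatch (illustrative only).
// The combiner holds façades in increasing order; an unresolved keyword
// call is forwarded to the highest-order façade that responds to it.
class LinearizingCombinerSketch {
    List facades  // façades, lowest order first

    LinearizingCombinerSketch(List facades) { this.facades = facades }

    def methodMissing(String name, args) {
        // Search from highest to lowest order for a matching keyword method.
        def target = facades.reverse().find { it.metaClass.respondsTo(it, name) }
        if (target == null)
            throw new MissingMethodException(name, getClass(), args)
        target.invokeMethod(name, args)
    }
}

// Two toy façades that both declare the inherited keyword fd:
class BaseFacade { def fd(int n) { "base fd $n" } }
class ExtFacade extends BaseFacade { def fd(int n) { "ext fd $n" } }

def combiner = new LinearizingCombinerSketch([new BaseFacade(), new ExtFacade()])
assert combiner.fd(50) == "ext fd 50"   // highest-order façade wins
```

The sketch uses Groovy’s methodMissing hook as a stand-in for the meta-level interception discussed in the text; the actual combiners operate on the full syntax-to-semantics binding.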


While the CrossCuttingCombiner can compose the evaluation protocols at any point in the execution of a participating language, each language embedding remains well encapsulated. Adaptations are only effective for one particular composition scenario; the language implementations remain intact, because the crosscutting composition protocol only temporarily adapts the evaluation protocols at those points that have explicitly been declared for interactions.

In [DMB09, DEM10], the author of this thesis has published the implementation details of the CCCombiner. The CCCombiner is an integral part of the Pluggable and OPen Aspect Run-Time (POPART for short). Since the CrossCuttingCombiner is a central contribution of this thesis, Section 7 (p. 157) will summarize how to use POPART for crosscutting composition and its relevant implementation details. Interested readers are referred to [DMB09, DEM10].

4.2.3.3. Composing Languages with Custom Combiners

Since the pre-defined combiners only support concrete language composition scenarios, alternative combiners can be implemented as custom combiners. In the following, we give an overview of possible schemes to implement in a custom combiner. When none of the pre-defined language combiners provides an appropriate composition protocol, a language developer can provide a new language combiner: the developer may refine an existing combiner to enable new schemes for language composition in the architecture, or, alternatively, implement a combiner from scratch.

4.2.3.3.1. Extending the Open Crosscutting Combiner

In [DMB09], the author of this thesis proposes a concept for an aspect-oriented (AO) language with an integrated meta level, which is called a meta-aspect protocol (MAP).
The definition of the MAP leans on the definition of meta-object protocols [KRB91]: a meta-aspect protocol is an interface to the aspect language that gives users the ability to incrementally modify the behavior and implementation of aspect-oriented abstractions.

To allow adaptations, the MAP makes use of the open implementation principle for AO languages. At the implementation level, an open implementation is a special module that developers design with a primary interface and a meta-interface. While end users access the primary interface in order to use the module as an ordinary module, the meta-interface provides special primitives to adapt the module’s internal implementation strategies.

A MAP opens up the aspect language semantics for dynamic user adaptations. When using the meta-aspect protocol, the same set of aspects can be interpreted under different semantics. Analogous to MOPs, this flexibility is enabled by reifying important parts of the AO language semantics as first-class entities.

Figure 4.14 shows the overall architecture of an aspect runtime with a meta-aspect protocol. The MAP is built on top of the MOP. Such meta-protocols provide two kinds of interfaces to programs: a primary interface and a meta-interface. In a MOP, OO programs use the primary interface to define objects, and they use the meta-interface to adapt the semantics of objects, e.g., the method dispatch. Similarly, the MAP provides such a primary interface and a meta-interface for the aspect language.


Figure 4.14.: Architecture of the meta-aspect protocol

The CCCombiner defines the primary interface. It embeds the default semantics of AO concepts by reflective embedding, which will be elaborated in Section 7.3 (p. 160). AO programs can use the default semantics; nonetheless, programs can extend the provided AO concepts.

For enabling semantic adaptations, there is a second interface to the CCCombiner and its language model. This meta-interface allows access to the embedded AO concepts of the CCCombiner. Language developers can use the meta-interface for semantic adaptations to customize the embedding of AO concepts for specific domains. Because AO concepts are embedded in REA, their first-class nature allows extension of the embedded types and their operations that reflect the semantics of aspects.

To enable well-defined adaptations of AO semantics, the CCCombiner defines variation points in the evaluation protocol for aspects. Three main variation points have been selected to allow the semantic variations found in the literature. To enable variation of AO semantics, language developers can extend the AO concepts by refining the AO embedding. First, developers can extend the language façade of the CCCombiner, e.g., to define new pointcut designators, as will be elaborated in Section 7.4. Second, they can specialize the AO evaluation protocol by extending the types and overriding their methods in the AO language model, which allows, e.g., enabling debugging and optimizations for AO concepts, as elaborated in [DMB09]. Third, they can refine the management concern in the CCCombiner, e.g., to adapt the composition semantics for aspects in order to resolve semantic composition conflicts between aspects, as will be elaborated in Section 10.3.2.3 (p. 235).

By extending these variation points, language developers can implement domain-specific aspect languages (DSALs). A domain-specific aspect language is a special DSL that makes use of AO concepts for separation of crosscutting concerns in (DSL) programs.
For implementing DSALs, a developer can reuse and specialize the AO concepts for a particular domain, whereby the developer can reuse the common AO semantics. As will be elaborated in Section 10.3.2 (p. 227) and evaluated in Section 11.1.2 (p. 271), this significantly reduces the implementation effort.


4.2.3.3.2. Implementing a new Custom Combiner

For example, recall from Section 3.3.2.1 (p. 44) that when composing FUNCTIONAL and STRUCTURAL, there can be syntactic and semantic conflicts that need to be detected and resolved for a meaningful composition.

There is a syntactic conflict, because both languages define the keyword define. Still, a composition of the two syntactically conflicting languages is possible by defining a custom combiner that renames keywords, extending the syntax-to-semantics binding in the composition. The combiner rebinds the define keyword-method implementation in FUNCTIONAL to the new keyword defun, and it rebinds the define keyword-method implementation in STRUCTURAL to a new keyword make, such that programs that use the renamed keywords can be executed without conflicts.

There is also a semantic interaction, because both languages define a direct application, for functions and for data types, respectively. For functions, using a function name, such as funName(...), will call the function funName. For abstract data types, using a data type name, such as typeName(...), will instantiate a new value of typeName. But when there is a function and a data type with the same name, it is not clear which semantics the direct application should use. For example, when there is a function myName as well as a type myName, it is not clear what myName(...) means: a function application or a type instantiation? Still, a meaningful composition of the two semantically conflicting languages is possible by defining a custom combiner that enforces that function names and names of abstract data types must be unique.

To compose several languages with semantic interactions or conflicts, a language developer only has to implement a new combiner class that implements a custom composition protocol that composes the syntax and semantics of all constituent languages.
This custom composition protocol needs to integrate the evaluation of expressions of the composed languages to work in concert, following the correspondingly defined composition semantics. To combine the languages’ syntax-to-semantics bindings, its composition protocol composes the languages’ keyword methods defined in the language façades. To combine the languages’ computations, its composition protocol composes the languages’ evaluation protocols defined in the language-model classes. The detailed logic that needs to be provided for the composition protocol of a custom combiner depends on the concrete composition scenario, its assumptions, and its domain semantics, as well as on possible generic composition schemes, of which several are conceivable and can be implemented on top of the architecture.
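The keyword-renaming scheme described above can be sketched in Groovy as follows. Note that the façade classes and the combiner name are hypothetical, invented for this illustration; the sketch only shows the rebinding idea, not the thesis implementation of FUNCTIONAL and STRUCTURAL:

```groovy
// Hypothetical sketch: a custom combiner resolves the `define` keyword
// clash by renaming — `defun` targets the functional façade, `make`
// targets the structural façade (façade classes are illustrative).
class FunctionalFacade { def define(String name, Closure body) { "function $name" } }
class StructuralFacade { def define(String name, Map fields)   { "type $name" } }

class RenamingCombinerSketch {
    FunctionalFacade functional = new FunctionalFacade()
    StructuralFacade structural = new StructuralFacade()

    // Rebind the conflicting keyword under two fresh keyword names:
    def defun(String name, Closure body) { functional.define(name, body) }
    def make(String name, Map fields)    { structural.define(name, fields) }
}

def c = new RenamingCombinerSketch()
assert c.defun("square") { x -> x * x } == "function square"
assert c.make("point", [x: 0, y: 0]) == "type point"
```

Programs written against the combiner use the renamed keywords and therefore never trigger the original conflict.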


4.3. Summary

In this chapter, we have discussed a new architecture for embedding languages. The architecture enables open implementations of languages for which a new syntax and semantics can be provided. The architecture organizes language implementations into four levels: the language interface level, the language façade level, the language model level, and the language meta level. These levels enable decoupling parts of the language implementation and allow those parts to be adapted. To recapitulate, the key mechanisms that facilitate language evolution are:

Language Polymorphism: Language polymorphism enables hierarchical extensions. Abstracting over several possible evaluations is possible since the reflective embedding architecture strictly separates a language specification from possible implementations. Language polymorphism enables safe extensions to existing language implementations and programs: polymorphism on languages provides subtyping guarantees for the languages, as long as implementations respect inherited contracts.

Semantic Adaptation Scopes: The reflective embedding architecture uses meta-objects to partly adapt the semantics of a language implementation along different dimensions. Semantic adaptation provides more flexibility for reusing existing language implementations for end users with special requirements, but the downside of the increased flexibility is the absence of special guarantees. So far, this support has been presented in principle, but semantic adaptation of languages in particular deserves a more detailed discussion. Therefore, this thesis dedicates Chapter 6 to elaborating on how the reflective embedding architecture supports non-trivial semantic adaptations of languages.

Language Combiner: Language combiners support composing languages in different concrete scenarios. The combiners build on the flexibility of the architecture to compose languages, and they are abstractions for language composition that provide special guarantees for the composition scenarios in which they are employed. For the combiners, the presence of the meta level is fundamental to composing constituent languages independently of their concrete implementation. Analyzing parts of the syntax and semantics in language implementations is only possible because all languages are available as first-class objects. Composing languages by integrating their abstract syntax, their syntax-to-semantics bindings, and their evaluation protocols is only possible because the combiners are meta-objects that can adapt every method call between objects within a language implementation.

So far, the support for composition has been presented in principle, but the composition of dependent languages in particular deserves a more detailed discussion. Therefore, this thesis dedicates


Chapter 7 to elaborating how the reflective embedding architecture supports non-trivial compositions of dependent languages. Further, it remains to be shown how the key mechanisms can be implemented. Therefore, the next chapter presents an implementation of the reflective embedding architecture and the key mechanisms for a concrete host language. However, it remains to be shown that the language embedding approach indeed allows implementing complex languages and that the approach enables language evolution. In particular, the following issues need to be addressed:

• What language features in host languages help language developers to grow embeddings with new language constructs?

• Is it possible to embed more sophisticated kinds of language constructs?

• How well does the reflective embedding architecture support the desirable properties for language evolution that have been identified in Section 3? In particular, how can we extend and compose several languages?

The following chapters address these issues.


5. Implementation in Groovy

This chapter elaborates how to instantiate the reflective embedding architecture in Groovy. The author has selected the Groovy programming language as the host language because it provides a good realization of the host language features that the reflective embedding architecture requires. The first section introduces the required features of Groovy. The second section elaborates on how one can instantiate the architecture’s levels in Groovy. The third section elaborates on the techniques that enable Groovy to execute embedded programs. The last section describes the implementation details of the key mechanisms of the architecture.

5.1. Groovy in a Nutshell

Groovy [Gro, KG07] is a pure object-oriented programming language that seamlessly integrates with Java [GJSB00] and that compiles to Java bytecode. The syntax is close to Java, and one can call Groovy code from Java and vice versa without converting passed objects. Besides a meta-object protocol, Groovy provides attractive language features, such as class reloading and closures as first-class entities. Groovy is particularly well suited for language embedding due to its flexible syntax, its support for closures, and its meta-object protocol. In the following, we present these features.

Flexible Syntax. Groovy’s syntax is flexible, because Groovy often allows encoding the same expression in various syntactic forms. Groovy supports so-called command expressions1 that enable omitting unnecessary delimiters in programs, such as brackets that are clear from the context. When calling a method, one can omit the brackets. For example, to call the method “def println(String str)”, one can write “println "test"” instead of “println("test")”. Command expressions support named parameters that enable the definition of methods with optional parameters or that make it possible to call the same method with different sets of actual parameters. E.g., “turtle(name:"John", color:blue)” is a call to a method named turtle with the two parameters name and color and the values "John" and the number that represents the color blue. When such method calls are evaluated, Groovy stores the parameters in a Map and passes the map to the method. The corresponding method definition needs to be “def turtle(Map params){...}”. Groovy supports overriding infix operators for arbitrary classes. E.g., to override the logical operator “&”, a class must define the method and; respectively, for the operator “|”, the method or.
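These three syntactic features can be demonstrated in a short, self-contained sketch. Note that the method turtle and the class Flag below are illustrative stand-ins invented for this example; the named-parameter and operator-overriding mechanics are standard Groovy:

```groovy
// Named parameters: a method whose first parameter is a Map receives
// named arguments collected into that map automatically.
def turtle(Map params) {
    return "turtle ${params.name} with color ${params.color}"
}

def blue = 0x0000FF  // 255 — an illustrative integer color value

assert turtle(name: "John", color: blue) == "turtle John with color 255"

// Operator overriding: defining the method `and` overloads the `&` operator.
class Flag {
    boolean v
    Flag and(Flag other) { new Flag(v: v && other.v) }
}
assert !(new Flag(v: true) & new Flag(v: false)).v   // true & false -> false
```

The same call could also be written in command-expression style without brackets, which is what brings the embedded syntax close to the domain syntax.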

1Codehaus.org Homepage: Command Expression based DSL: http://docs.codehaus.org/display/ GroovyJSR/GEP+3+-+Command+Expression+based+DSL


In Groovy, there is syntactic sugar for accessing a property of a class via its getter and setter. E.g., when a class C defines the accessors getX and setX but does not declare a public field, one can directly access x on an instance c. The expression “c.x” will call the getter, returning its value. The expression “c.x = a” will call the setter, changing its value. The syntactic flexibility of Groovy often facilitates defining a language embedding with an abstract syntax that is close to the corresponding established domain syntax, because the expressions can omit unnecessary delimiters and other parts of their encoding.
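The property sugar just described can be verified in a few lines; the class C mirrors the example in the text, with an illustrative private backing field:

```groovy
// Property-access sugar: c.x routes through getX()/setX() even though
// no public field x is declared (class C follows the example in the text).
class C {
    private int hidden = 1          // illustrative backing storage
    int getX() { hidden }
    void setX(int v) { hidden = v }
}

def c = new C()
assert c.x == 1     // "c.x" calls getX()
c.x = 42            // "c.x = a" calls setX(42)
assert c.x == 42
```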

Closures. To realize first-class code blocks, the instantiation of the reflective embedding architecture uses Groovy closures. In Groovy, a closure is an object that represents a piece of code. Closures can be used as methods that take parameters and return values. E.g., “{x -> x*x}” defines a closure that takes a parameter x and returns its square. Since closures are objects, one can assign them to a corresponding variable—e.g., “Closure sq = {x -> x*x}”—and pass around the reference to the object. To execute a closure, one calls it as if it were a method, e.g., sq(2). If the last parameter of a method is a closure—e.g., “def turtle(Map map,Closure c){...}”—then one can define the closure inline and does not need to pass it as a parameter inside the round brackets of the method call; e.g., to call turtle one can write “turtle(name:"John",color:blue){ fd 50; ... }” (which is equivalent to “turtle([name:"John",color:blue],{ fd 50; ... })”). In Groovy, a closure does not completely close over its lexical context; its binding can be changed. Within a closure’s body, the variable this refers to the enclosing object—the so-called owner (e.g., to access fields of the enclosing object). Additionally, each closure has a delegate object to which the closure delegates all calls that it cannot answer itself. By default, the delegate object is the owner. In the previous example, a call to println inside the closure would be delegated to the closure’s enclosing object, i.e., its owner. The possibility to define and call closures in multiple ways and the support for named parameters facilitate a syntax for abstraction operators that is close to the corresponding established domain syntax.
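The closure-delegate mechanism, which later carries the keyword dispatch of the embeddings, can be sketched as follows. Note that the class LogoFacade and its keyword methods fd and rt are illustrative stand-ins, not the thesis façades:

```groovy
// Sketch of the closure-delegate mechanism (LogoFacade is illustrative).
// Calls that a closure cannot answer itself are forwarded to its delegate —
// this is how keyword methods inside a body reach the language façade.
class LogoFacade {
    def trace = []
    def fd(int n) { trace << "fd $n".toString() }
    def rt(int a) { trace << "rt $a".toString() }
}

def body = { fd 50; rt 90; fd 50 }    // keywords are unresolved here

def facade = new LogoFacade()
body.delegate = facade                 // bind the keywords to the façade
body.resolveStrategy = Closure.DELEGATE_FIRST
body.call()

assert facade.trace == ["fd 50", "rt 90", "fd 50"]
```

Changing the delegate after the closure has been created is exactly what makes the syntax-to-semantics binding dynamic.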

Meta-Object Protocol. To enable more flexibility for language embeddings, the instantiation of the architecture uses Groovy’s meta-object protocol. As shown in the lower half of Figure 5.1, every value is an instance of the class Object which implements the interface GroovyObject. The meta level is shown in the upper half of Figure 5.1. Method calls and field accesses on an object are handled by a corresponding meta-object, which is an instance of MetaClassImpl that implements the interface MetaObjectProtocol. The latter declares meta-methods used for interpreting an object’s behavior, e.g., invokeMethod, getProperty, setProperty, etc. When a base-level method is called on an object, the meta-method invokeMethod is called on the object’s meta-object. The default implementation of invokeMethod in MetaClassImpl dispatches the base-level method call to the most specific implementation defined in the object’s


Figure 5.1.: The meta-object protocol of Groovy

class or in one of its super classes. In a similar way, the meta-methods getProperty and setProperty dispatch field accesses to the most specific instance field2. The user can specialize MetaClassImpl and override specific meta-methods. Every class has a link to its default meta-object. When a class is instantiated, the default meta-object is used for the created instance. One can change the default meta-object defined for a class in the registry, thus changing the semantics of all objects of that class. Alternatively, one can change the meta-object of a single object by calling the method setMetaClass on it. For convenience, the Groovy MOP allows defining application-level meta-objects. In Groovy, every object can be a meta-object—the meta-object of itself. To allow an object to act as its own meta-object, its class can implement the special methods setProperty, getProperty, invokeMethod, methodMissing, and propertyMissing. The first three have the same semantics as the methods with the same signature in the default meta-object. In case the class implements those methods, the default meta-object will use these more specific implementations instead of its default implementation, but only for instances of the corresponding class. The last two methods, methodMissing and propertyMissing, have special semantics: the default meta-object calls them whenever an undefined method is called or, respectively, an undefined field is accessed. In Groovy, closures require specific meta-objects. For instance, Groovy provides a special meta-class ClosureMetaClass for closures, which are instances of the class Closure. Closures encapsulate their own evaluation context, which consists of bindings from the lexical context, i.e., instance variables of their creating object (owner) and local variables.
Hence, closures cannot use the default MetaClassImpl, which would dispatch method calls and field/variable accesses to this—the closure itself that does not define them. The special ClosureMetaClass dispatches method calls as well as field/variable accesses inside the closure to its context. Moreover, every closure can have a delegate object. If the latter is not null, method calls and field/variable accesses are dispatched to it. This allows manipulating the evaluation context of a closure after the closure is created. In Groovy, every closure has a field resolveStrategy that allows adapting

2If an accessed field is not present, by convention, the groovy MOP tries to find a corresponding getter or setter method.

how the MOP resolves variables inside the closure. There are four relevant strategies: (1) the default OWNER_FIRST prefers binding variables from the closure’s owner context and only looks up unbound variables in the delegate context; (2) DELEGATE_FIRST prefers the delegate context over the owner; (3) OWNER_ONLY only looks up variable bindings in the owner context; and (4) DELEGATE_ONLY looks up bindings only in the delegate context. The delegates and these strategies are crucial for enabling closures that do not completely close over the lexical context. The instantiation of the architecture uses, in particular, delegates as well as resolve strategies to realize the dynamic syntax-to-semantics binding mechanism. The ClosureMetaClass implements the closure-delegate mechanism and will be used for realizing the keyword-method dispatch.

5.2. Reflective Embedding in Groovy

This section describes the instantiation of the levels of the reflective embedding architecture in Groovy. When instantiating the reflective embedding architecture in a concrete host language, one must select available host language constructs to implement the conceptual elements of the abstract architecture. Moreover, it is important to fine-tune the abstract development process for the host language. To exemplify the concrete process, we discuss embedding Logo in Groovy. Along the discussion, we highlight what parts of the process have been fine-tuned for Groovy.

5.2.1. Language Interface Level

In Groovy, the language interface can be defined as an interface class with one keyword method per expression type, as described in the abstract process from Section 4.1.1.1 (p. 72), but there are minor deviations compared to the abstract process. Those deviations help to improve the syntax of the resulting embedding in Groovy, as elaborated below:

(I) a getter method for each literal: In Groovy, literals are defined slightly differently from the abstract process. Literal keyword methods are defined as getter methods that use the literal’s keyword name (in camel case) prefixed by “get”. For example, Listing 5.1 shows the Groovy code that defines the language interface ISimpleLogo of SIMPLELOGO. In lines 3–8, ISimpleLogo defines a getter method for each color literal with the corresponding color name. All getters return the integer representation of their color. With the help of Groovy’s syntactic flexibility, a program can directly use a literal keyword to read/write the property (e.g., red), and the MOP will call the corresponding accessor.

(II) a method for each operation: In Groovy, operations are defined exactly the same way as in the abstract process. Recall that these methods use the operations’ keywords and take as many parameters as their operation’s arity. For example, in lines 11–14, ISimpleLogo declares the operations, e.g. forward.


Note that, in Groovy, we can also fine-tune the definition of operations with two options: there is the possibility to define operation methods with a variable number of parameters, and methods with optional parameters. This can help to reduce the number of methods that developers need to declare.

(III) a special method for each abstraction operator: In Groovy, abstraction operators define their parameters slightly differently. For a better syntax, it is recommended that an abstraction operator take two parameters: a HashMap as its first parameter and a Closure as its last parameter. The map stores the abstraction operator’s named parameters, and the Closure represents the abstraction operator’s body. For example, in line 17, ISimpleLogo declares the abstraction operator turtle that takes the two HashMap and Closure parameters and in this case returns nothing.

1  interface ISimpleLogo {
2    /* Literals */
3    int getBlack();
4    int getBlue();
5    int getRed();
6    int getGreen();
7    int getYellow();
8    int getWhite();
9
10   /* Operations */
11   void forward(int n);
12   void backward(int n);
13   void right(int n);
14   void left(int n);
15
16   /* Abstraction Operators */
17   void turtle(HashMap params, Closure turtleBody);
18 }

Listing 5.1: Language interface of SIMPLELOGO

It is a strategic decision how to encode a concrete method signature for an abstraction operator. In most cases, one can obtain a good abstract syntax with the HashMap/Closure combination. This is because Groovy’s command expressions allow us to fine-tune the resulting abstract syntax to be close to the concrete syntax. Thanks to named parameters, developers can encode mixfix operators (cf. Section 3.4.2, p. 50) in a more or less acceptable notation. Thanks to passing closures outside the brackets, developers can nicely encode abstraction operators. Note that the presence of the language interface level in the architecture is desirable from a design perspective, but there is no obligation for such an interface to be present in Groovy. In case the language interface is present, the Groovy compiler enforces that the language implementation conforms to this syntax definition. Still, the language developer can decide to completely omit the interface, thanks to Groovy’s weak typing facilities.


5.2.2. Language Façade Level

At the language façade level, the language developer3 implements the language interface as a façade class4 that also extends the Interpreter class. An excerpt of the implementation of the Interpreter class is shown in Listing 5.2. By default, the body delegate is the façade itself (line 2). This ensures that the closure-delegate mechanism uniformly dispatches occurrences of domain-specific keywords in the bodies of abstraction operators to the same façade instance.

1 public abstract class Interpreter {
2   protected Object bodyDelegate = this;
3   ...
4 }

Listing 5.2: Excerpt of the Interpreter class

For example, to provide a language façade for SIMPLELOGO, the class SimpleLogoInterpreter extends Interpreter and implements the ISimpleLogo interface. The class is responsible for storing an evaluation context and implementing all keyword methods, as elaborated in the next two paragraphs. For the evaluation context, the class holds references to a Canvas object (line 2) and a Turtle object (line 3), which are instances of the language model classes. To provide execution semantics, the SimpleLogoInterpreter follows an immediate evaluation mode, which immediately computes expressions. In lines 5–9, its constructor creates the canvas as an instance of the class TurtleGraphicsWindow from the language model and stores this instance as one of the attributes of the interpreter, which constitutes the global context. For keywords, the class provides method implementations, as exemplified below for the three types. Each literal method that retrieves a color, shown in lines 12–17, returns the integer value of the corresponding RGB representation given by the Color class from the language model. The operation methods, shown in lines 20–23, simply call the corresponding method for that command on the turtle instance, passing the parameters to it5. The abstraction operator method, shown in lines 26–31, instantiates a new instance of the class Turtle from the language model, sets the bodyDelegate as the delegate of the closure parameter that contains a sequence of commands, and calls the closure to evaluate the command sequence. Setting the bodyDelegate is important so that keywords used inside the closure are uniformly handled by the same façade instance.

3Note that the language designer who designs the language interface and the language developer who implements it need not be the same person.
4Normally, the façade class is a Groovy class, but it can also be a Java class. When using a Java class, there are some restrictions w.r.t. the Groovy MOP. To leverage the maximum flexibility of the reflective embedding architecture, it is recommended to implement every class in Groovy.
5Since the language model classes expect Double parameters to their domain operations, Groovy auto-boxes and converts the values from int to Double.


1  public class SimpleLogoInterpreter extends Interpreter implements ISimpleLogo {
2    protected TurtleGraphicsWindow canvas;
3    protected Turtle turtle;
4
5    public SimpleLogoInterpreter() {
6      canvas = new TurtleGraphicsWindow();
7      canvas.setTitle("Logo EDSL (based on JavaLogo)");
8      canvas.show(); // Display the window
9    }
10
11   /* Literals */
12   public int getBlack()  { return Color.BLACK.value; }
13   public int getBlue()   { return Color.BLUE.value; }
14   public int getRed()    { return Color.RED.value; }
15   public int getGreen()  { return Color.GREEN.value; }
16   public int getYellow() { return Color.YELLOW.value; }
17   public int getWhite()  { return Color.WHITE.value; }
18
19   /* Operations */
20   public void forward(int n)  { turtle.moveForward(n); }
21   public void backward(int n) { turtle.moveBackward(n); }
22   public void right(int a)    { turtle.turnRight(a); }
23   public void left(int a)     { turtle.turnLeft(a); }
24
25   /* Abstraction Operators */
26   void turtle(HashMap params, Closure turtleBody) {
27     turtle = new Turtle(params.name, java.awt.Color.BLACK);
28     canvas.add(turtle);
29     turtleBody.delegate = super.bodyDelegate;
30     turtleBody.call();
31   }
32 }

Listing 5.3: Language façade of SIMPLELOGO

5.2.3. Language Model Level

At the language model level, in Groovy, the language designer implements the internals of the language semantics using ordinary Groovy classes [6]. The classes define fields and methods, and their implementations encode the details of the language semantics, i.e., the language's theorems encoded in Groovy code. Note that language developers often do not have to implement a complete model from scratch, since the model can reuse existing code from libraries available as Groovy or Java bytecode.

[6] Alternatively, Java classes can be used, but Java classes cannot be adapted by the Groovy MOP without restrictions. Therefore, for language implementations in Java, REA's meta-level is not available.

For SIMPLELOGO, Figure 5.2 shows an excerpt of the language model. The classes in the language model implement the semantics of the Logo domain. We have already discussed the general semantics of the model in Section 4.1.1.3 (p. 76). What is special in Groovy is that the implementation of the language model uses more specific host language types and that it can reuse existing code from Groovy or Java libraries.

[Figure 5.2 shows a class diagram: the model class Turtle (fields name:String, x:Double, y:Double, orientation:Double, penColor:Color; operations moveForward(n:Double), moveBackward(n:Double), turnRight(a:Double), turnLeft(a:Double)) and the class Canvas (operations drawLine(x1,y1,x2,y2:Double), add(turtle:Turtle)), reusing JFrame and JComponent from javax.swing, the Turtle implementation from org.javalogo, and Color from javax.awt.]

Figure 5.2.: The language model of SIMPLELOGO

On the one hand, to encode parts of the internals of the language, the model implementation uses the available libraries of the host language, such as the primitive Java type int, the Groovy Closure class, and the Java AWT class Color. On the other hand, for drawing turtles, the model implementation reuses classes from an existing Java-based implementation of Logo, namely the JavaLogo project [Jav02], which provides a Swing-based [ELW98] graphical user interface (GUI).

5.2.4. Language Meta Level

At the language meta-level, Groovy allows using either default or adapted semantics for a language, as elaborated below.

For default semantics, the default Groovy meta-object MetaClassImpl is used to execute the language implementation. In this case, the meta-level does not change the language implementation; hence, technically, this makes the presence of a special meta-level optional. Since there are no special meta-objects, instances of the MetaClassImpl meta-object execute the language implementation with the standard OOP semantics. Consequently, neither the abstract syntax in the language interface, nor the syntax-to-semantics binding in the façade, nor the evaluation protocol in the model execute under changed semantics.

For adapted semantics, a language developer can extend the Groovy MOP to open up a language implementation at the language meta-level. For every class in the language implementation, the Groovy MOP can add new keyword methods or override existing ones by adapting the method dispatch for these classes, or it can introduce new fields for literals. Adapting the fields and methods of the language implementation classes is possible because the Groovy MOP intercepts every method call and, respectively, every field access, as mentioned in Section 5.1. To adapt a given language, a language developer needs to implement a special meta-object by subclassing the MetaClassImpl class [7].
A subclass overrides the invokeMethod method of the super class to adapt the method dispatch, or the getProperty and setProperty methods to adapt field accesses. To apply an adaptation, the developer selects the meta-object that executes the classes of the language implementation to be adapted.

[7] Since the MetaClassImpl class in Groovy is actually a Java class, the developer has the choice to implement the meta-object in Java or Groovy.

There are several strategic decisions for the developer about what to adapt with a meta-object. Meta-objects can be rather generic, covering several classes in the language implementation, or they can be specialized for single classes or objects, such as only the façade or a single language model class. The means that help developers make these decisions deserve a more detailed discussion; therefore, Section 5.4.2 addresses this issue.

5.3. Program Evaluation in Groovy

This section discusses how programs are represented and evaluated in the reflective embedding architecture.

To encode program expressions in abstract syntax, each kind of expression type follows a specific Groovy encoding, which will call methods of the façade, as shown for the program in Listing 5.4. The encoding represents a literal expression as a field access; e.g., for the literal red, it reads the corresponding field in line 2, which will call the getter getRed(). Further, it represents an operation expression as a method call in prefix notation with its sub-expressions as the parameters; e.g., for forward in line 3, it calls the method forward(). Finally, it represents an abstraction operator as a method call, also in prefix notation, where sub-expressions are named parameters in a map and where nested sub-expressions are wrapped in a closure (cf. turtle in lines 2–6). Note that the Groovy encoding is close to the domain syntax, since the encoding allows omitting brackets and passing closure parameters concisely.

To associate a program code block with a concrete façade, the encoding wraps the whole program in a closure (logoProgram in Listing 5.4, line 1), and it sets the closure's delegate to an instance of the façade class SimpleLogoInterpreter of the language (line 8).
Recall that setting the delegate field of a closure dynamically establishes a syntax-to-semantic binding in Groovy. Next, we call the closure, i.e., we execute the SIMPLELOGO program (line 9). This execution runs in two steps: (a) create and initialize a turtle object, which is an instance of the class Turtle, and (b) execute the commands in the turtle body.

1 Closure logoProgram = {
2   turtle (name:"Square", color:red) {
3     forward 50
4     right 90
5     ... /* Implementation is continued as in Figure 3.1 (a), lines 4–10 */
6   }
7 }
8 logoProgram.delegate = new SimpleLogoInterpreter();
9 logoProgram();

Listing 5.4: Associating a DSL program with its interpreter
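The delegate-based keyword dispatch used in Listing 5.4 can be approximated in plain Java with explicit reflection. The following is a hypothetical sketch (class and method names are illustrative, not part of the dissertation's implementation): a "program" object routes each keyword call by name to whatever façade object is currently set as its delegate, much like the Groovy runtime routes unresolved calls inside a closure to the closure's delegate.

```java
import java.lang.reflect.Method;

// Hypothetical facade with two keyword methods; records calls for inspection.
class SimpleLogoFacade {
    StringBuilder log = new StringBuilder();
    public void forward(int n) { log.append("forward(" + n + ");"); }
    public void right(int a)   { log.append("right(" + a + ");"); }
}

public class DelegateDispatch {
    Object delegate; // corresponds to logoProgram.delegate in Groovy

    // Dispatch a keyword call by name to the delegate via reflection,
    // mimicking how Groovy resolves keyword calls inside a DSL closure.
    void keyword(String name, int arg) {
        try {
            Method m = delegate.getClass().getMethod(name, int.class);
            m.invoke(delegate, arg);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        DelegateDispatch program = new DelegateDispatch();
        SimpleLogoFacade facade = new SimpleLogoFacade();
        program.delegate = facade;      // bind syntax to semantics
        program.keyword("forward", 50); // like: forward 50
        program.keyword("right", 90);   // like: right 90
        System.out.println(facade.log); // forward(50);right(90);
    }
}
```

Swapping the delegate for a different façade instance changes the semantics of the same program, which is the essence of the dynamic syntax-to-semantics binding.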

Now, if during the execution of the logoProgram closure a call to a DSL keyword is encountered (here turtle), the call is dispatched to the DSL program closure's delegate, i.e., the corresponding façade instance of SimpleLogoInterpreter. The declaration at line 2 in Listing 5.4 is actually a call to turtle(HashMap params, Closure turtleBody) on the logoProgram closure [8] and hence is forwarded to its delegate—the façade instance. Thereby, the params HashMap contains the turtle's name, and turtleBody is a closure containing the body of the turtle declaration (lines 2–6, Listing 5.4). The execution of that call creates a Turtle object (line 27, Listing 5.3) and calls the closure turtleBody (line 30, Listing 5.3) after setting its delegate to the bodyDelegate (line 29, Listing 5.3) of the super class (cf. Listing 5.2, line 2), which refers to the façade instance itself by default.

The execution of turtleBody immediately evaluates the sequence of commands in the body. Thereby, every DSL keyword within the body gets dispatched to bodyDelegate, i.e., to the SimpleLogoInterpreter instance. For instance, the forward declaration in line 3 of Listing 5.4 yields a call to the method forward(...) on the façade instance. It is fundamental that in Groovy the façade sets the delegate of the closures of its abstraction operators (Listing 5.3, line 29) to the designated bodyDelegate of its super class Interpreter. Recall that in the Interpreter class (cf. Listing 5.2) the body delegate refers to the façade itself (line 2). Because abstraction operators use the bodyDelegate as the closure delegate for their enclosed closures, this ensures that occurrences of domain-specific keywords are uniformly dispatched to the same façade instance.

5.4. Key Mechanisms in Groovy

For a particular host language, the architecture's key mechanisms can be distinguished by whether they can be reused from existing mechanisms in the host language or whether a mechanism must be implemented on top of existing host language mechanisms. Depending on the quality of the language constructs and internal mechanisms in a host language, only some of the language mechanisms from the host language can be used for realizing the key mechanisms for language embeddings.
When none of the existing mechanisms in the host language is adequate for a certain evolution, a language developer can embed a new mechanism or extend an existing one.

In Groovy, we could reuse language mechanisms for implementing language polymorphism and the meta-object protocol for evolutions of embeddings. This is because languages are embedded into the host language as a library; thus, embeddings are homogeneous and do not break existing host-language constructs and tools. For enabling language polymorphism, class inheritance and OO method dispatch in the host language can be leveraged for the language embedding. Section 4.2.1 (p. 84) shows that inheritance can be used to enable extensibility for embeddings. For enabling the dynamic syntax-to-semantics binding, Section 5.4.1 shows that this binding allows customizing language semantics by replacing one façade with another. Further, reflection provides systematic support for non-invasive adaptations of language implementations, discussed in Section 4.2.2 (p. 89).

In Groovy, existing language mechanisms, such as runtime mixins and categories [KG07], do not provide adequate support for language composition, since they neither detect nor prevent conflicts and do not provide any guarantees. Therefore, the reflective embedding architecture allows embedding a new language-composition mechanism on top of Groovy's meta-object protocol.

[8] Due to Groovy's command expressions and closures.

Fundamental for embedding new mechanisms is that the Groovy MOP enables analysis and adaptation to define a meta-level for language embeddings. Because in REA languages are first-class, their syntax and semantics can be analyzed via introspection and adapted via intercession. Because program expressions are represented as first-class objects, and because every step in the evaluation of an expression constitutes a method call, the structural representation and the evaluation protocol of expressions can be adapted. For composability, the meta-level enables defining generic logic for composing languages, discussed in Section 4.2.3 (p. 102). Because in REA experts can provide sophisticated mechanisms on top of the host, users can reuse and customize these mechanisms as first-class objects, which opens up language development to non-expert programmers.

5.4.1. Language Polymorphism

The language polymorphism mechanism leverages Groovy inheritance for black-box extensions of languages. To add new keywords to an existing language embedding, a language developer can grow a language by extending the base language's interface class, its façade class, and its model classes. By inheriting from these classes, existing functionality is reused. For an extension, the developer must only specify and implement the domain semantics of the new keywords.

Consider implementing EXTENDEDLOGO, which adds more drawing commands to SIMPLELOGO, such as setpencolor and clearscreen. Listing 5.5 shows the language interface and implementation of EXTENDEDLOGO. In line 1, the language interface of the new dialect, IExtendedLogo, extends the existing base language's interface ISimpleLogo; hence, it inherits all declared methods from its super interface (e.g., forward).
In a similar way, the language façade ExtendedLogoInterpreter extends the base language's façade class SimpleLogoInterpreter. On the one hand, the language façade inherits all implemented methods from the SimpleLogoInterpreter façade. On the other hand, ExtendedLogoInterpreter adds implementations for the methods declared in IExtendedLogo. At this point, we only show the implementation of two of these methods, namely clearscreen (line 12) and setpencolor (line 14). Note that in this scenario, we did not have to extend the language model classes, since the façade class can reuse the model classes as they are provided.

 1 interface IExtendedLogo extends ISimpleLogo {
 2   void home();
 3   void clearscreen();
 4   void showturtle();
 5   void hideturtle();
 6   void setpencolor(int c);
 7   void penup();
 8   void pendown();
 9 }
10
11 class ExtendedLogoInterpreter extends SimpleLogoInterpreter implements IExtendedLogo {
12   void clearscreen() { canvas.clearScreen(); }
13   ...
14   void setpencolor(int c) { turtle.setPenColor(c); }
15 }

Listing 5.5: Implementation of the EXTENDEDLOGO dialect

Listing 5.6 demonstrates how to use an instance of ExtendedLogoInterpreter (line 1) for evaluating a program that still assumes an old version of the language (lines 3–8) and another program in the extended language (lines 12–19).

 1 ISimpleLogo evaluator = new ExtendedLogoInterpreter();
 2
 3 Closure programInOldLanguageVersions = {
 4   turtle (name:"Square", color:red) {
 5     forward 50; right 90;
 6     ...
 7   }
 8 }
 9 programInOldLanguageVersions.delegate = evaluator;
10 programInOldLanguageVersions();
11
12 Closure programWithShortcutNames = {
13   turtle (name:"Square", color:red) {
14     clearscreen();
15     forward 50; right 90;
16     setpencolor green;
17     ...
18   }
19 }
20 programWithShortcutNames.delegate = evaluator;
21 programWithShortcutNames();

Listing 5.6: Using the extended set of commands from EXTENDEDLOGO

5.4.2. Meta-Level Extensions

Semantic adaptation leverages Groovy meta-objects for adapting a given language semantics. In contrast to language polymorphism, meta-objects enable white-box extensions, that is, extensions that interfere with the internal implementation strategies of the language.

Consider implementing support for collision detection in CONCURRENTLOGO, as discussed in Section 3.1.2.2 (p. 35). With white-box extensibility, developers can elegantly encode interpretation concerns whose code crosscuts throughout the embedding, such as collision detection, by using the Groovy MOP. Specifically, for collision detection, a developer needs to implement a special meta-object that prevents collisions by intercepting façade and model-level method calls when Turtle objects paint concurrently. In the remainder, this section sketches the collision detection implementation.

Listing 5.7 shows a special meta-object class for Turtle objects. This meta-object class adapts the Turtle class's move operations (e.g., forward) to detect and prevent collisions between multiple turtles. To intercept calls to all move operations, the class implements the method invokeMethod that the Groovy MOP will execute for each call on a Turtle object. As indicated in line 7, the meta-object only adapts method calls to the move operations moveForward and moveBackward. Before executing these move operations, invokeMethod detects and prevents the collision (lines 7–9). Finally, when there is no (longer a) collision, the call proceeds via the super class of the meta-object (line 11).

 1 class CollisionPreventingTurtleMetaClass extends MetaClassImpl {
 2   CollisionPreventingTurtleMetaClass(Class theClass, ...) { super(theClass); ... }
 3
 4   Object invokeMethod(sender, object, methodName, arguments, ...) {
 5     assert (object instanceof Turtle);
 6
 7     if (methodName.equals("moveForward") || methodName.equals("moveBackward")) {
 8       // code that detects and prevents collisions ...
 9     }
10
11     return super.invokeMethod(sender, object, methodName, arguments, ...);
12   }
13 }

Listing 5.7: Excerpt of a specialized meta-object class that prevents collisions
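The interception idea behind Listing 5.7 can be approximated in plain Java with a dynamic proxy. The following is a hypothetical analogue (the dissertation's implementation uses the Groovy MOP; the interface and class names here are illustrative): every call to a turtle goes through an invocation handler that can act before the move operations proceed.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical turtle interface and a plain implementation.
interface ITurtle { void moveForward(double n); void turnRight(double a); }
class PlainTurtle implements ITurtle {
    public void moveForward(double n) {}
    public void turnRight(double a) {}
}

public class InterceptionDemo {
    static List<String> intercepted = new ArrayList<>();

    // Wrap a turtle so that every call is intercepted, similar to
    // invokeMethod in the Groovy MOP.
    static ITurtle collisionAware(ITurtle target) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Only the move operations are adapted, as in lines 7-9 of Listing 5.7.
            if (method.getName().equals("moveForward") || method.getName().equals("moveBackward")) {
                intercepted.add(method.getName());
                // ...collision detection and prevention would go here...
            }
            return method.invoke(target, args); // proceed with the original call
        };
        return (ITurtle) Proxy.newProxyInstance(ITurtle.class.getClassLoader(),
                new Class<?>[]{ITurtle.class}, handler);
    }

    public static void main(String[] args) {
        ITurtle t = collisionAware(new PlainTurtle());
        t.moveForward(50);
        t.turnRight(90); // not a move operation, so not adapted
        System.out.println(intercepted);
    }
}
```

Unlike the Groovy MOP, a Java proxy can only intercept calls made through an interface, which is one reason the dissertation recommends implementing language classes in Groovy.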

To use the semantic adaptation, a developer or user simply needs to set up all specialized meta-objects as the meta-objects of the relevant classes in the language model. As shown in Listing 5.8, line 3, by changing the metaClass field of the Turtle class, the program associates the new CollisionPreventingTurtleMetaClass meta-object with the Turtle class.

When the program closure is evaluated in Listing 5.8, line 12, the evaluation of the program closure will create new Turtle objects that are all associated with new instances of CollisionPreventingTurtleMetaClass. Therefore, all calls to these domain objects will be intercepted by the meta-object, and collisions can be prevented.

 1 ICompleteLogo evaluator = new ConcurrentLogo();
 2
 3 Turtle.metaClass = new CollisionPreventingTurtleMetaClass(Turtle.class, ...);
 4
 5 Closure multiTurtleProgram = {
 6   turtle (name:"T1", color:red)   { ... }
 7   turtle (name:"T2", color:green) { ... }
 8   turtle (name:"T3", color:blue)  { ... }
 9   ...
10 }
11 multiTurtleProgram.delegate = evaluator;
12 multiTurtleProgram();

Listing 5.8: Semantic adaptation of SIMPLELOGO with the Groovy MOP


With respect to the adaptation scope, this example adaptation has a global locality (D1.a), since it changes the behavior of all Turtle instances; its atomicity is at the module level (D2.b), since it intercepts all methods that move a Turtle; and its time of adaptation is pre-execution (D3.a). Later on, the collision detection implementation is further elaborated in Section 10.1.2.2 (p. 215).

5.4.3. Extensible Language Combiners

To compose languages, language developers can use one of the language combiners from Section 4.2.3 (p. 102). The following paragraphs show how these combiners are implemented in Groovy.

5.4.3.1. Composing Independent Languages

The architecture supports composing independent language embeddings by composing their dynamic syntax-to-semantics bindings using the Groovy MOP.

In Groovy, the composition logic for independent languages with black-box composition is implemented in the meta-façade BlackBoxCombiner shown in Listing 5.9. The BlackBoxCombiner extends the LanguageCombiner class (cf. Figure 4.11, p. 103) and is a meta-object because it implements MetaClassImpl. It references independent façades to which it dispatches uses of the expressions (i.e., domain-specific keyword method calls). The constructor checks that there is no syntactic conflict between the composed façades, using Java reflection [AGH05, FF04]. Further, to make the combiner responsible for evaluating the expressions that are nested in abstraction operators, the constructor sets the combiner as the bodyDelegate of all façades. The keyword dispatch is implemented by overriding GroovyObject's invokeMethod (lines 10–17), which the Groovy MOP will invoke for every keyword method call. The invokeMethod searches for the corresponding façade that implements the keyword method by using reflection (lines 11–13), enforces the call to be unambiguous [9] (line 15), and delegates the method call to the façade that implements the keyword method (line 16).
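The constructor's syntactic-conflict check can be sketched with plain Java introspection. The following is a hypothetical sketch (the façade classes and the helper are illustrative stand-ins, not the dissertation's code): two façades conflict if they declare keyword methods with the same name.

```java
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;

// Hypothetical facades standing in for the Logo and Functional embeddings.
class CompleteLogoFacade { public void forward(int n) {} public void right(int a) {} }
class FunctionalFacade   { public Object apply(String name) { return null; } }

public class ConflictCheck {
    // Returns the keyword names declared by both classes; an empty set
    // means the two facades can be composed without syntactic conflicts.
    static Set<String> conflictingKeywords(Class<?> a, Class<?> b) {
        Set<String> namesOfA = new HashSet<>();
        for (Method m : a.getDeclaredMethods()) namesOfA.add(m.getName());
        Set<String> conflicts = new HashSet<>();
        for (Method m : b.getDeclaredMethods())
            if (namesOfA.contains(m.getName())) conflicts.add(m.getName());
        return conflicts;
    }

    public static void main(String[] args) {
        // No overlap between the Logo and Functional keywords: composition is safe.
        System.out.println(conflictingKeywords(CompleteLogoFacade.class, FunctionalFacade.class));
    }
}
```

A full check would compare whole signatures (name plus parameter types) rather than names only; the sketch uses names for brevity.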
[9] Although the BlackBoxCombiner's constructor checks that signatures are unique over all façades, it must enforce the condition again, since new keyword methods can be added at runtime using the meta-level.

 1 class BlackBoxCombiner extends LanguageCombiner
 2     implements MetaClassImpl {
 3   Set facades;
 4   ...
 5   BlackBoxCombiner(Interpreter... facades) {
 6     // 1) uses Java introspection to validate that each signature in the facades is unique, and
 7     // 2) sets itself as the bodyDelegate of all facades
 8   }
 9   ...
10   Object invokeMethod(sender, receiver, mName, args, ...) {
11     Set respondingFacades = facades.findAll { interp ->
12       interp.metaClass.respondsTo(interp, mName)
13     };
14     if (respondingFacades.empty()) throw new MissingMethodException("Keyword $mName undefined.");
15     if (respondingFacades.size() > 1) throw new SyntaxConflictException("Keyword conflict $mName.");
16     return respondingFacades.first().invokeMethod(sender, receiver, mName, args, ...);
17   }
18 }

Listing 5.9: The BlackBoxCombiner class

For example, consider composing COMPLETELOGO and FUNCTIONAL into the composite language FUNCTIONALLOGO, as proposed in Section 3.2.1 (p. 37). We have already discussed the conceptual aspects of how to compose the two independent languages. Therefore, this section only presents the Groovy implementation of FUNCTIONAL and then demonstrates how the developer can use the BlackBoxCombiner to compose the latter with the embedding of COMPLETELOGO.

Listing 5.10 presents the embedding of FUNCTIONAL with the language interface IFunctional (lines 1–4) and its façade (lines 6–19). In line 7, there is the definedFunctions map from Strings to Closures. The define abstraction operator method (lines 9–12) creates a new function by binding the function name (from the named parameter params.name) to its body closure via the map. The apply operation method (lines 14–17) retrieves the closure of a defined function from the map. Note that this façade does not require a special language model, since functions can be readily mapped to Groovy closures. With an instance of this façade, developers can evaluate FUNCTIONAL programs.

 1 interface IFunctional {
 2   void define(HashMap params, Closure body);
 3   Closure apply(String name);
 4 }
 5
 6 class Functional extends Interpreter implements IFunctional {
 7   HashMap definedFunctions = new HashMap();
 8
 9   void define(HashMap params, Closure body) {
10     definedFunctions.put(params.name, body);
11     body.delegate = bodyDelegate;
12   }
13
14   Closure apply(String name) {
15     if (!definedFunctions.containsKey(name)) throw new UndefinedFunctionException("'$name' undefined");
16     return definedFunctions.get(name)
17   }
18   ...
19 }

Listing 5.10: Excerpt of the implementation of FUNCTIONAL
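The core of the FUNCTIONAL façade, a map from function names to function bodies, translates directly to plain Java. The following is a hypothetical sketch in which java.util.function.Function stands in for Groovy closures (names and the exception type are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical Java analogue of the Functional facade: define binds a
// name to a function body, apply looks the body up again.
public class FunctionalSketch {
    private final Map<String, Function<Integer, Integer>> definedFunctions = new HashMap<>();

    void define(String name, Function<Integer, Integer> body) {
        definedFunctions.put(name, body);
    }

    Function<Integer, Integer> apply(String name) {
        if (!definedFunctions.containsKey(name))
            throw new IllegalArgumentException("'" + name + "' undefined");
        return definedFunctions.get(name);
    }

    public static void main(String[] args) {
        FunctionalSketch f = new FunctionalSketch();
        f.define("double", n -> 2 * n);
        System.out.println(f.apply("double").apply(21)); // 42
    }
}
```

What the sketch cannot show is the delegate wiring of line 11 in Listing 5.10, which makes DSL keywords usable inside a function body; that part relies on Groovy closures.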

Listing 5.11 demonstrates the use of the BlackBoxCombiner. First, in line 17, the setup step instantiates the façades for all language embeddings used in the program. Then, in the same line, the setup step creates a BlackBoxCombiner and passes the façades to it. In the last step, we set the BlackBoxCombiner as the program closure's delegate (line 17) before executing the program (i.e., before calling the respective closure in line 18).

 1 Closure program = {
 2   define(name:"square") { length ->
 3     repeat (4) {
 4       forward length
 5       right 90
 6     }
 7   }
 8
 9   turtle (name:"TwoSquares", color:red) {
10     setpencolor blue
11     apply("square")(50)
12     right 180
13     setpencolor green
14     apply("square")(100)
15   }
16 }
17 program.delegate = new BlackBoxCombiner(new CompleteLogo(), new Functional());
18 program.call();

Listing 5.11: Evaluating the FUNCTIONALLOGO example from Figure 5.10 with a composed interpreter

5.4.3.2. Composing Dependent Languages

Developers can compose dependent language embeddings using dependent combiners, as discussed in Section 4.2.3.2 (p. 107). At this point, this section only sketches the Groovy implementation of the simplest dependent combiner—the LinearizingCombiner. Implementations of other dependent combiners and of custom combiners are presented later on.

Listing 5.12 presents the implementation of the LinearizingCombiner. Its implementation is similar to that of the BlackBoxCombiner, but there are important differences. First, the constructor of this combiner (lines 4–7) does not enforce the signatures of keyword methods to be unique. Second, its invokeMethod implementation (lines 9–16) does not throw an exception when more than one keyword method is found for the same signature. Instead, in line 15, the combiner simply invokes the first method found (the one with the highest priority) and ignores the rest. These two deviations are necessary to allow composing dependent languages, resolving syntactic conflicts by taking into account the order in which the languages are composed.

 1 class LinearizingCombiner extends LanguageCombiner {
 2   List facades;
 3   ...
 4   LinearizingCombiner(Interpreter... facades) {
 5     // 1) does not enforce unambiguous keywords
 6     // 2) sets itself as the bodyDelegate of all facades
 7   }
 8   ...
 9   Object invokeMethod(sender, receiver, mName, args, ...) {
10     Set respondingFacades = facades.findAll { interp ->
11       interp.metaClass.respondsTo(interp, mName)
12     };
13     if (respondingFacades.empty()) throw new MissingMethodException("Keyword $mName undefined.");
14
15     return respondingFacades.first().invokeMethod(sender, receiver, mName, args, ...);
16   }
17 }

Listing 5.12: The LinearizingCombiner class

5.5. Summary

This chapter has presented a proof-of-concept for the reflective embedding architecture by providing a reference implementation of the core architecture in the Groovy programming language.

The same architecture can be implemented in other languages. In the context of this thesis, the core architecture was also implemented in the Ruby language. There are minor differences between the two instantiations. Since Ruby does not use interfaces, the language interface level is not present in Ruby. Further, since Ruby provides neither an explicit meta-object protocol nor a closure-delegate mechanism, slightly different techniques needed to be used for Ruby. Still, the architecture's central key mechanisms could be implemented in Ruby. Describing the details of the Ruby implementation is out of the scope of this thesis; therefore, the interested reader is referred to the following report:

• Tom Dinkelaker, Christian Wende, and Henrik Lochmann. Implementing and Composing MDSD-Typical DSLs. TUD-CS-2009-0156, Technische Universität Darmstadt, Germany, October 2009.

However, it remains to be shown that the language-embedding approach indeed allows implementing sophisticated languages and that the approach enables language evolution. In particular, the following issues need to be addressed:

• What particular host language features help language developers to grow embeddings with new specific language constructs? In particular, it is interesting to explore the opportunities for flexibility provided by the reflective features of the host language.

• Is it possible to embed more sophisticated kinds of language constructs, such as data types or aspects?

• How can the core architecture be extended to support composition?

• In the core architecture, only embedding with an abstract syntax could be provided. How can concrete syntax be enabled for embedded languages?

To address these issues, the following chapters present several extensions to the core architecture that provide solutions for the mentioned problems.


Part III.

Extensions to the Core Architecture


6. Reflective Domain-Specific Languages

The previous chapters have presented the reflective embedding architecture's core with its systematic support for late semantic adaptations. So far, the architecture's flexibility is only available to language developers, but not to end users. Unfortunately, language developers cannot always anticipate the requirements of the end users. Therefore, this chapter motivates the need for reflective DSLs, i.e., special DSLs that have open semantics. It elaborates this by means of an example that needs to adapt the language semantics in a user domain. To enable semantic adaptations in the user domain, this chapter describes how language developers can implement a reflective DSL on top of a reflective embedding in REA. It evaluates the resulting implementation by discussing the required qualities for reflective languages, as Maes proposes them in [Mae87]. Further, it validates the use of reflective DSLs by means of a case study.

Let us analyze what variability is present in today's languages and why it is important. Recently, variability in language implementations has been handled by extensible language approaches [MHS05], such as extensible interpreters and compilers, like JastAdd [HM03, EH07b]. With extensible interpreters/compilers, an expert can handle variation at compile time of the language infrastructure, but these approaches fail to allow language end users to adapt the languages they want to use.

The requirement of supporting language adaptation by end users is addressed by applying the open implementation principle [Kic96, KLL+97] to programming languages. An open implementation is a special module that developers design with a primary interface and a meta-interface to the module. The primary interface is simply an ordinary interface to the module, which end user programs access to interact with the module. Further, the meta-interface allows end user programs to control the module's implementation strategy.
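The split between a primary interface and a meta-interface can be illustrated with a small, hypothetical Java module (the class, interface, and strategy here are illustrative, not from the open implementation literature): clients use the primary interface to store values, and the meta-interface to swap the module's sorting strategy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The implementation strategy that the meta-interface exposes.
interface Sorter { void sort(List<Integer> data); }

public class SortedStore {
    private final List<Integer> data = new ArrayList<>();
    private Sorter sorter = Collections::sort; // default strategy

    /* Primary interface: ordinary module functionality. */
    public void add(int value) { data.add(value); sorter.sort(data); }
    public List<Integer> contents() { return data; }

    /* Meta-interface: clients control the implementation strategy. */
    public void setSorter(Sorter s) { this.sorter = s; }

    public static void main(String[] args) {
        SortedStore store = new SortedStore();
        store.add(3); store.add(1); store.add(2);
        System.out.println(store.contents()); // [1, 2, 3]
        // A client adapts the module for its own context: descending order.
        store.setSorter(l -> l.sort(Collections.reverseOrder()));
        store.add(4);
        System.out.println(store.contents()); // [4, 3, 2, 1]
    }
}
```

The adaptation happens without touching the module's source, which is exactly the property that a meta-object protocol provides at the scale of a whole language.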
When programs can control a module's implementation strategy, clients themselves can adapt modules to a specific context, which gives them an advantage over traditional closed modules.

From the beginning, the open implementation principle has been used in language implementations. There are many existing open language implementations for object-oriented programming languages—the meta-object protocols (MOPs) [KRB91]. However, the open implementation principle is also interesting for other programming languages, e.g., languages that are not general-purpose or not object-oriented.

Domain-specific languages are another class of languages that could benefit from having support for adaptations. The semantics of a given DSL is subject to: (1) evolutions of the language, e.g. [ZGL05], when a new version of the language semantics becomes available; (2) language optimizations that do not change the semantics, e.g. [She04, SCK04]; and (3) a lack of consensus on the right language semantics, e.g., UML state charts [OMG04], for which various semantics need

to be supported in one language implementation. So far, this issue has only been little explored, by means of extensible DSL compilers and interpreters [vDKV00]. Mernik et al. state [MHS05] that “building DSLs [...] in an extensible way” is an open problem. Established extensibility mechanisms from general-purpose languages are frequently requested for DSLs [MHS05]; examples are aspects [RMWG09] and reflection [CHBB09]. Still, there are only a few DSLs that use aspects to improve their extensibility, such as workflow languages like BPEL [CM04, Cha08] or grammar-specification languages [SLS03, RMWG09]. Reflection is more general than aspects and would also be interesting for DSLs. Counsell et al. state that “a reflective DSL for analyzing code is long overdue” [CHBB09]. Moreover, it would be interesting if a DSL program were allowed to reason about its own domain. What is missing is a general concept for deriving DSLs with reflective features.

This chapter argues that, by following the open implementation principle in a DSL implementation, a language developer can design an adaptable DSL that supports late semantic adaptations. Late semantic adaptation means, in order of importance:

1. There are different people that define (part of) the semantics. The DSL architect defines the default semantics of the DSL early on. Later, the DSL user may adapt the default semantics in an application-specific manner.

2. The semantic adaptation occurs after the DSL architect has released the default DSL implementation.

3. There are semantic adaptations that cannot be done by configuring a DSL interpreter or compiler. For example, they cannot be implemented by static adaptation of the semantics when these adaptations depend on the execution context of a particular DSL program. Such late adaptations are similar to late binding of methods in OO programming, where late also expresses a dependency on the execution context.

To qualitatively validate the flexibility that the reflective embedding architecture provides for DSL development, this chapter presents a study that discusses how to enable a wide range of late semantic adaptations of DSLs. This chapter shows that DSLs can indeed be implemented with a plan for growth by using two important methods from programming language research. First, inspired by reflective languages, the idea is to grow a DSL into a reflective DSL, so that users can implement programs that reason about their semantics. Second, inspired by meta-object protocols, the concept embeds a DSL as an open implementation in order to provide a meta-interface to the user for adapting the DSL semantics. A particularly interesting property of the resulting DSL embeddings is that a DSL embedded into a reflective host language easily becomes a reflective language itself—a reflective DSL. With reflection for DSLs, the distinction between the role of the language developer and that of the end user is blurred.

The task of implementing an open reflective DSL is not fundamentally different from implementing open implementations and reflective languages. For a reflective DSL, a language developer needs to expose the internal structure of the DSL to the user. For an open DSL, on the one hand, a language

developer includes extension points into an open DSL implementation, at which a user can later provide extensions. On the other hand, the user can apply such extensions to the open DSL implementation to customize it for special needs. By implementing DSLs as open implementations, both—the language developer and the end user—collaboratively develop the final semantics of the language. While extension points are a very powerful tool for users, language developers can easily implement extension points on top of REA.

Note that parts of this chapter have been published in the following paper:

• Tom Dinkelaker, Martin Monperrus, and Mira Mezini. Supporting Variability with Late Semantic Adaptations of Domain-Specific Modeling Languages. In Proceedings of the 1st International Workshop on Composition: Objects, Aspects, Components, Services and Product Lines, at the International Conference on Aspect-Oriented Software Development, 2010.

The remainder of this chapter is structured as follows. Section 6.1 discusses the need for supporting late variability in language semantics by means of an example, which is used as the running example throughout the chapter. Section 6.2 discusses the implementation of a DSL with default semantics. Section 6.3 discusses how to grow any DSL into a reflective DSL. Finally, Section 6.4 demonstrates the support for semantic adaptations in the architecture.

6.1. Late Variability in Domain-Specific Language Semantics

This section elaborates the need for open implementations of DSLs.

6.1.1. The FsmDSL Running Example

As a running example, this chapter introduces FSMDSL—a DSL for defining finite state machines (FSM). State machines can be defined with the following keywords:

• fsm name {fsmDefinitionBody}: defines a new state machine. The keyword is followed by a name for the state machine and a block with one or more state definitions (in fsmDefinitionBody).

• state name {stateDefinitionBody}: defines a new state. After the keyword, there is a corresponding name and a block that defines actions and transitions to other states (in stateDefinitionBody). Inside the body of the definition of a state, the following elements can be defined: (a) a set of variables, using the keyword def; (b) a set of unconditional actions that will be executed either once when the automaton enters the state (keyword on entry) or once when the automaton leaves the state (keyword on exit); and (c) a set of transitions to other states, using the keyword when.

• on entry {action}: defines an action with a block of code that will be executed when entering this state. Each state may have zero or one on entry action.

• on exit {action}: This keyword pair defines an action with a block of code that will be executed when leaving this state. Each state may have zero or one on exit action.


• when eventName {transitionAction}: defines a transition from this state to another state. For each transition, there is an event name that guards the transition action. In the transition action, the state normally enters the next state, e.g., by executing enter nextState. When the state machine receives an event with the corresponding name defined by a transition, the state machine leaves the current state and enters the next state of the transition.

For illustration, consider the FSMDSL program in Listing 6.1, which implements a travel package booking process that combines booking several positions, such as a hotel, a flight, and a rental car. The goal of this listing is to give an intuition of what a FSMDSL program looks like, and to support the definitions and arguments that help us to state the problem. In Listing 6.1, the keyword fsm (line 1) defines a new state machine with the name TravelPackage that models two steps in a workflow. The first step is modeled as the state BookingFlight that is defined using the keyword state (line 2), and the second step of the workflow is modeled as the state BookingHotel (line 28). In Listing 6.1, the action to be performed on entrance of the BookingFlight state is a call to a flight booking Web service. In the default semantics of this FSMDSL, events are handled synchronously. When receiving the acknowledgment from the Web service some_flight_webservice, modeled by the event done in state BookingFlight, the FSM enters the BookingHotel state via the corresponding transition for this event (lines 10–12). Only when an error event is received, the corresponding transition action replaces the targetService with another synchronous service another_flight_webservice that has the same service interface, and then the action repeats the failed step by reentering it.

6.1.2. Semantic Variability in FsmDSL

A DSL implementation is said to support application-level semantic adaptations if it allows the semantics of domain types and objects to be modified from within a DSL program in an application-specific way. For illustration, similarly to [HB04], consider that the default Web service for booking flights can fail. Let us also assume that the only other partner available is another_flight_webservice1, which works asynchronously. However, the default semantics of the FSMDSL only handles synchronous Web services.
In this case, this alternative partner cannot be integrated into the synchronous state machine, because the default semantics does not support mapping the event of an asynchronous message reception to a synchronous input event of a state. The problem would be solved if the semantics of the FSMDSL could be changed from within the FSM program, more precisely by the logic responsible for the recovery transition of the BookingFlight state. When Listing 6.1 uses an open implementation of FSMDSL that supports semantic adaptations, the TravelPackage program is able (cf. Listing 6.1, lines 15–25): (1) to handle the error

1For the sake of brevity, the code that selects the alternative partner from a Web service registry is omitted.


1  fsm ’TravelPackage’ {
2    state ’BookingFlight’ {
3      def targetService = "some_flight_webservice"
4
5      on entry {
6        ... // call to "book" operation on targetService
7      }
8
9      // nominal mode
10     when ’done’ {
11       enter ’BookingHotel’
12     }
13
14     // error recovery mode
15     when ’error’ {
16       // change the target Web service: another_flight_webservice works asynchronously
17       targetService = "another_flight_webservice"
18
19       // redefine here the event handling semantics to handle asynchronous events
20       // and adopt the new semantics for this state
21       [...]
22
23       // ... and stay in the current state
24       enter ’BookingFlight’
25     } // end when
26   } // end state
27
28   state ’BookingHotel’, {
29     on entry { ... }
30   }
31 }

Listing 6.1: Adaptive business process

event in the BookingFlight state, (2) to replace the failing partner2, and (3) to adapt itself to the new Web service by changing the event reception semantics of the state BookingFlight in order to listen to asynchronous events (i.e., the asynchronous reception of the response message). Such self-adaptive programs enable autonomous computing [Hor01], where the program configures, heals, optimizes, or protects itself with respect to a well-defined adaptation goal without human interference in the running system. Meta-level architectures and reflective systems have long been proposed for self-adaptation [MC02, HB04]. Still, there is little research on enabling self-adaptive programming for DSLs, which has led to solutions proposed only for individual DSLs. Leveraging the same flexibility for other DSLs requires a systematic method for DSL implementation with support for self-adaptive programming. Note that the adaptation of the FSMDSL semantics happens dynamically while the DSL program is running and depends on the execution context of the DSL program – in this case, the

2It is assumed that the alternative partner Web service has the same interface, i.e., the same message types are exchanged. The format of a message is independent from the Web service binding [CNW01] that selects a synchronous or asynchronous style for message interaction.

occurrence of the event error. Furthermore, the adaptation is fine-grained: only the semantics of one program part is affected; it is now possible to recover from the error in state BookingFlight, but the default synchronous semantics of the other states is not modified. Existing approaches to implementing DSLs do not support application-level semantic adaptations in the way illustrated above, where the application basically redefines specific parts of the language semantics. In existing approaches, changing the semantics at runtime is only possible if the semantic adaptation had already been envisioned at design time in the original DSL implementation. However, it is not possible to envision every possible semantic adaptation a priori at design time. Even if it were possible, the resulting explicit support for adaptations would bloat the implementation of the default semantics with additional attributes and conditional logic. Such a one-size-fits-all solution hampers the design of the default semantics, which is used by most of the DSL programs. Last but not least, the DSL semantics could not be causally connected to the application state (i.e., dependent on the application state).

6.2. Implementing the Primary Interface of the FsmDSL

To create an open language implementation, a language developer simply embeds the language on top of REA, as described in Section 5. The default language implementation acts as the primary interface of the open language implementation through which programs interact with the default syntax and semantics (cf. the explanation of the primary interface and meta-interface in Section 4.2.3.3, p. 109). In the following, this section elaborates on embedding the primary interface of FSMDSL in REA, as a running example for demonstrating the concept throughout this chapter.

[Figure 6.1 is a class diagram: the façade StateMachineDSL (with the keyword methods fsm(…), state(…), when(…), enter(…), execute(…)) refers to an Fsm (receiveEvents(...)), which holds a current State and a set of States; each State (handleEvent(…)) has Transitions (fire(...)) pointing to next States, and Actions (doIt(), on_entry, on_exit, on_selection) attached to states and transitions.]

Figure 6.1.: Language model of the state machine DSL

Consider the embedding and the implementation of the default semantics of the FSMDSL for state machines. Figure 6.1 depicts the design of this DSL. The domain model consists of the classes Fsm, State, Transition, and Actions. The Fsm class maintains a set of states, refers to a current state, and implements some default semantics of state machines; e.g., the method receiveEvents(...) defines the dispatch mechanism for events received by a state machine. State instances maintain a set of outgoing transitions, may have an on_entry action, and implement the state semantics. For instance, the method handleEvent(...) defines the state's event handling mechanism. A Transition points to the next state. The fire(...) method is called whenever a transition is selected. The class Actions encapsulates a set of domain-specific actions; the semantics of an action execution is encoded in the doIt method. Finally, the language façade is implemented in the class StateMachineDSL. There is a method for each keyword in the DSL. When called, these methods instantiate domain objects. Listing 6.2 shows an excerpt of an embedded DSL program for state machines. The DSL program is contained in the closure dslPackage (lines 2–11), which is configured in line 12 to have an instance of StateMachineDSL as its evaluator façade. Line 13 triggers the evaluation of the DSL program. This will load the state machine definition (lines 3–7), which creates the domain objects in the language model that represent the textual definition, and then the program executes a sequence of events (line 10). That is, the program sends a sequence of events in a method call to a domain object (resp. myFsm and execute), given in a specific execution context ({’ok’,’error’,...}). The reception of each event triggers a cascade of method calls on the domain objects that were created when loading the state machine, which internally selects a transition and updates the current state of the machine.

1  // this closure contains the DSL program
2  def dslPackage = { // the DSL program
3    fsm ’myFsm’, {
4      state ’S1’, { ... }
5      ...
6      state ’SX’, { ... }
7    }
8
9    // executes the FsmDSL program by sending the event list
10   myFsm.execute({’ok’,’error’,...})
11 }
12 dslPackage.delegate = new de.tud.statemachine.StateMachineDSL() // default facade for state machines
13 dslPackage(); // evaluates the closure

Listing 6.2: State machine embedded in Groovy
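To make the described keyword methods and the default dispatch semantics concrete, the following is a minimal, self-contained Groovy sketch of how such a façade and domain model could look. All field names, method bodies, and helpers here are illustrative assumptions, not the thesis implementation; in particular, this sketch simplifies `when` to take the target state's name directly instead of an `enter` action, and it treats the first declared state as the initial state.

```groovy
// Hypothetical sketch of the FSMDSL domain model and facade (simplified).
class Transition {
    String event
    String nextName                        // name of the target state
}

class State {
    String name
    Map transitions = [:]                  // event name -> Transition
    Transition handleEvent(String event) { // the state's event handling mechanism
        return transitions[event]
    }
}

class Fsm {
    String name
    Map states = [:]                       // state name -> State (insertion order)
    State current
    void receiveEvents(List events) {      // default event dispatch semantics
        events.each { e ->
            Transition t = current.handleEvent(e)
            if (t != null) { current = states[t.nextName] }
        }
    }
}

class StateMachineDSL {
    Fsm currentStateMachine
    private State defining                 // the state currently being defined

    Fsm fsm(String name, Closure body) {   // keyword method: instantiates the machine
        currentStateMachine = new Fsm(name: name)
        body.delegate = this               // keywords in the body resolve to this facade
        body()
        // simplification: the first declared state becomes the initial state
        currentStateMachine.current = currentStateMachine.states.values().first()
        return currentStateMachine
    }
    State state(String name, Closure body) {
        defining = new State(name: name)
        currentStateMachine.states[name] = defining
        body.delegate = this
        body()
        return defining
    }
    // simplification: takes the target state's name instead of an `enter` action
    void when(String event, String nextStateName) {
        defining.transitions[event] = new Transition(event: event, nextName: nextStateName)
    }
}

def dsl = new StateMachineDSL()
def m = dsl.fsm('Travel') {
    state('BookingFlight') { when('done', 'BookingHotel') }
    state('BookingHotel') { }
}
m.receiveEvents(['done'])
assert m.current.name == 'BookingHotel'
```

The closure-delegate assignment inside `fsm` and `state` is the same mechanism that line 12 of Listing 6.2 applies from the outside: unresolved calls such as `state(...)` inside the block fall through to the façade.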

6.3. Growing a DSL into a Reflective DSL

This section explains how to grow an embedded DSL into a reflective DSL. It explains the language developer's perspective. First, in Section 6.3.1, we discuss the concept of reflective DSLs. After this, in Section 6.3.2, we discuss the implementation of reflective DSLs.


6.3.1. Reflective DSLs

Inspired by reflective general-purpose languages [Smi82, FW84, Mae87], this thesis defines a reflective domain-specific language as a DSL that allows a DSL program to reason about itself. In a reflective DSL, a DSL program can change its internal structure and behavior, whereby its first-class representations stay causally connected to its domain. After the language developer has embedded a DSL in REA, REA implicitly makes a meta-level available for semantic adaptation by the language developer. The meta-level builds upon the general-purpose abstractions of the general-purpose host language. Therefore, end-user programs cannot conveniently access the meta-level using domain-specific abstractions, but only with general-purpose abstractions. To bring semantic adaptations closer to the domain, this section presents how a language developer can grow any embedded DSL in REA into a reflective DSL. The development process for a reflective DSL starts from a non-reflective embedded DSL and proceeds as follows. On the one hand, the developer adds support for reification [FW84] of domain-specific concepts into the DSL. On the other hand, the developer adds support for reflection [FW84]—both introspection and intercession—into the DSL. While this section presents the language developer's perspective of designing a reflective DSL, the next section will elaborate on how end users can use reflective DSLs. To grow a DSL into a reflective DSL, the language developer needs to enable structural and behavioral reflection for selected language constructs of the DSL. As illustrated in Figure 6.2, to enable reflection, the developer needs to extend the DSL with a reflective extension. The extension provides reflective language constructs for the domain, which are defined in an optional reflective interface that acts as a meta-interface to the language implementation (cf. the explanation of the meta-interface in Section 4.2.3.3, p. 109).
The end user of the reflective DSL can use this meta-interface to implement adaptation strategies in their programs.

[Figure 6.2 is a diagram of the reflective DSL architecture: a reflective DSL program interacts with the DSL language façade (the primary interface) and the DSL model, and reasons about the DSL meta-level through the reflective interface provided by a reflective extension; changes made over the meta-interface immediately become effective through the causal connection. The whole stack is realized in REA on top of a reflective host language, with reflective references linking the program to the meta-level.]

Figure 6.2.: Reflective DSL architecture


A reflective program uses reflective constructs in order to reason about its own structure and behavior. To enable structural reflection, the developer needs to turn the DSL implementation into a system that reasons about its domain structure. The system needs to allow answering questions about its structure, i.e., it can inspect and manipulate the structure and state of its domain objects. To make semantic adaptation effective, the system's domain objects need to be causally connected [Mae87] to their domain. To enable behavioral reflection, the developer needs to turn the DSL implementation into a system that allows reasoning about its domain behavior. The system needs to allow answering questions about its behavior, i.e., it can inspect the events of its domain objects. The system also needs to allow manipulating the behavior that results from reactions to the states and events in the domain objects. The remainder of this section elaborates how a language developer can grow an arbitrary DSL into a reflective DSL. Section 6.3.2.1 and Section 6.3.2.2 demonstrate with a concrete example that, when embedding a language into a reflective host language, it is easy to obtain a reflective version of the language with only little changes to its implementation. Section 6.3.3 argues that this is easy because the reflective embedding architecture fulfills the requirements of the reflective system architecture by Maes [Mae87]. Moreover, Section 6.3.4 argues that, once a reflective DSL following Maes's architecture has been obtained, a language developer can further specialize it by evolving it into a more declarative one.

6.3.2. Implementation of Reflective DSLs

To extend a DSL with reflection, the language developer provides access to the language internals. They define special language constructs called reflective references, which are implicit references (cf. Section 3.5.2, p. 55) to meta-level entities.
A program can use a reflective reference to access the meta-level for structural and behavioral reflection.

6.3.2.1. Structural Reflection

To enable structural reflection, the language developer needs to extend the DSL to allow accessing the structural program representation. Consider extending the FSMDSL to expose the most important language constructs inside a state machine, such as its states and the state machine itself. To expose the enclosing state, the language developer only needs to slightly extend the language façade with a reflective reference thisState that retrieves the instance of the enclosing state's domain-object representation from the language model. Listing 6.3 presents the ReflectiveStateMachineDSL that extends StateMachineDSL. The sub-class exposes the enclosing state of the state machine by providing the getter getThisState() (line 14) that returns the first-class representation from the DSL language model. To expose the state machine with the reflective reference thisFsm, a corresponding getter getThisFsm() (line 16) returns the representation of the whole state machine. To access the states and the machine via their names, it is possible to expose the first-class representations via their name identifiers by overriding the meta-method propertyMissing (lines 19–22), which the Groovy MOP invokes whenever an identifier is used that is not explicitly defined as a variable. The overridden meta-method either returns (1) the state machine, if the requested property's name


identifier is equal to the name of the state machine (line 20), or (2) it tries to look up the state with the corresponding name and returns it if defined (line 21), or (3) it returns null otherwise.

1  class StateMachineDSL extends Interpreter {
2    Fsm currentStateMachine;
3    Fsm fsm(...) { ... }
4    State state(...) { ... }
5    void when(...) { ... }
6    void enter(...) { ... }
7    void on_entry(...) { ... }
8    void on_exit(...) { ... }
9    ...
10 }
11
12 // the reflective extension for state machines
13 class ReflectiveStateMachineDSL extends StateMachineDSL {
14   State getThisState() { return currentStateMachine.getCurrentState(); } // exposes the enclosing state
15
16   Fsm getThisFsm() { return currentStateMachine; } // exposes the enclosing state machine
17
18   // resolves reflective references when using fsm/state names outside the DSL program
19   public Object propertyMissing(String name) {
20     if (name.equals(currentStateMachine.getName())) { return currentStateMachine; }
21     else { return currentStateMachine.getState(name); }
22   }
23 }

Listing 6.3: Reflective extension of the default interpreter for structural reflection
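The propertyMissing hook used in lines 19–22 of Listing 6.3 can be tried in isolation in plain Groovy; the Registry class below is purely illustrative and not part of the FSMDSL implementation:

```groovy
// Groovy invokes propertyMissing(String) whenever a property read cannot be
// resolved; this is how bare identifiers such as state names can be turned
// into lookups in the language model.
class Registry {
    Map entries = ["S1": "state S1"]
    def propertyMissing(String name) {
        return entries[name]   // look up the identifier; null if undefined
    }
}

def r = new Registry()
assert r.entries instanceof Map   // a declared property: resolved normally
assert r.S1 == "state S1"         // undefined property: routed to propertyMissing
assert r.S2 == null               // unknown identifier yields null
```

The same mechanism is what lets a DSL program in Listing 6.4 write `myReflectiveMachine` or `S1` as if they were variables.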

To evaluate a reflective program, such as the one in Listing 6.4, the user only has to set up the extended façade ReflectiveStateMachineDSL as the program's closure delegate (line 24). The program can use different forms of structural reflection, as explained by the following examples. First, a reflective program can access the DSL program's structure inside entry and exit actions. For example, inside an action, the program can use the reflective reference thisState (line 5 in Listing 6.4) to obtain a reference to the enclosing state's State object and print out its string representation. Note that such obtained State objects and Fsm objects depend on the enclosing lexical context in which the reflective reference occurs. Similarly, the reflective reference thisFsm (line 5) returns a first-class representation of the enclosing state machine (e.g., “myReflectiveMachine”). Second, a reflective program can also access the structural representation of a state machine from outside the machine's definition block. Specifically, the program can directly access parts of the machine's structural representation via the identifiers defined inside the machine. For example, line 12 uses the state machine's identifier myReflectiveMachine to obtain a reference to the first-class representation of the state machine and prints out a string representation of it. Similarly, line 13 uses a state's identifier S1 to obtain a reference to a first-class representation


1  def dslPackage = {
2    // the DSL program
3    fsm ’myReflectiveMachine’, {
4      state ’S1’, {
5        on_entry { println "entering "+thisState+" in "+thisFsm; } // introspection via reflective reference
6      }
7      ...
8      state ’Sx’, { ... }
9    }
10
11   // Introspection by identifier name (resolved via propertyMissing)
12   println "My state machine: "+myReflectiveMachine.toString(); // introspects a state machine by name
13   println "Its state ’S1’ has the following transitions: "+S1.transitions; // introspects a state by name
14   ...
15   ...
16   // Intercession
17   State errState = new State("ErrorState"); // creates a new first-class DSL construct
18   myReflectiveMachine.addState(errState); // intercedes a state machine by name
19   myReflectiveMachine.states.values().each { State s ->
20     s.addTransition(new Transition(s, errState, "error"));
21   }
22   ...
23 }
24 dslPackage.delegate = new ReflectiveStateMachineDSL();
25 dslPackage();

Listing 6.4: A program using structural reflection

of the corresponding state. The program can access a domain object's attributes; for example, line 13 accesses the transitions of S1 and prints them out as a list.

Third, a reflective program can manipulate the structure of the program using intercession. For example, lines 17–18 create a new state ErrorState and add it to the above state machine. Further, lines 19–21 add a transition from every defined state to the ErrorState, when receiving an error event. Note that in contrast to a non-reflective version of this program, the benefit of the reflective version is that the error state can be added in a modular way.

With the Groovy MOP, structural reflection can also change the structure of existing classes and objects. Specifically, the user can add fields or methods. To adapt a domain type or object, the user retrieves its meta-object. To get the meta-object of a domain type, the developer uses Groovy's metaClass property, e.g., State.metaClass. To get the meta-object of a domain object, the developer accesses the metaClass property of the domain object, e.g., “S1.metaClass”. For example, the program can add a new field to an individual State object, e.g., to S1, with the following code: “S1.metaClass.counter = 0;”. After such a structural intercession, the program can access the introduced field like an ordinary field; e.g., “S1.counter++” increases the introduced counter field.
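The per-object structural intercession described above can be reproduced in plain Groovy, independently of the FSMDSL; the stand-alone State class below is illustrative:

```groovy
// Adding a field to a single instance via its meta-object (per-instance
// ExpandoMetaClass); other instances of the class are unaffected.
class State {
    String name
    State(String n) { name = n }
}

def s1 = new State("S1")
s1.metaClass.counter = 0   // structural intercession: a new field on s1 only
s1.counter++               // afterwards, it behaves like an ordinary field
assert s1.counter == 1

def s2 = new State("S2")
// s2 is unaffected: reading s2.counter throws MissingPropertyException
try { s2.counter; assert false } catch (MissingPropertyException e) { }
```

This is exactly the mechanism that line 9 of Listing 6.5 below relies on when it attaches a counter to the state S1.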


6.3.2.2. Behavioral Reflection

To enable behavioral reflection, the developer uses the Groovy MOP features to intercede with the embedded DSL semantics. The support for behavioral reflection relies on the available support for structural reflection to conveniently access the first-class representations of DSL constructs. Since closures are first-class objects in Groovy, there is indeed no difference between manipulating fields and methods. To adapt a program with behavioral reflection, the end user accesses the meta-object of a DSL construct to manipulate its domain operations. With closures, users can provide an alternative implementation for a domain operation, and with the Groovy MOP, they can replace this domain operation with the alternative one. The reuse of functionality for behavioral reflection from the host language is particularly interesting because implementing behavioral reflection for a new language from scratch is expected to be difficult [Mae87]. To adapt a domain method, the user retrieves the meta-object of its domain type or domain object. In the meta-object of a type or an object, for each domain operation (e.g., handleEvent), there is a property with the name of the domain operation that holds a link to a closure, which initially is its default implementation. Next, the user can adapt the operation by assigning the link to another closure (e.g., “State.metaClass.handleEvent = {...}”). Consider adapting a FSMDSL program to count how often an event was received by an individual state, such as in the program in Listing 6.5. Line 9 adds a counter to the State domain object with the name S1. Lines 12–16 replace the domain-operation link to handleEvent of this state so that it refers to a closure that prints out a logging statement and increases the counter.
Note that, in Groovy, to invoke a method on the enclosing object inside the closure of an alternative implementation of a domain operation, the developer needs to use the keyword delegate, which has a meaning similar to that of the this3 keyword in a method body of a class.

6.3.3. On the Design Properties of Reflective DSLs

To review the concept of reflective DSLs, the following paragraphs compare the design properties of the obtained DSL to the properties that Maes proposes in her seminal paper. In [Mae87], Maes discusses several important design properties of reflective architectures. The following discussion reviews how a reflective DSL implemented on top of the reflective embedding architecture meets these design properties. In particular, the discussion emphasizes how these properties of the embedded language are inherited from its reflective host language.

(1) disciplined split between object-level and meta-level: Maes argues that the language design should connect every object to a meta-object, but objects and meta-objects should not be confused. In reflective embedding, thanks to the MOP, every class in the language implementation has a meta-object, including the language façade and every class in its model. Because REA uses different types at the meta-level and the other levels, there is a clear split between the object

3The developer cannot use the keyword this in the body of the closure of an alternative implementation, since this refers to the enclosing object, which is, in this case, the enclosing script that adapts the domain operation, not the class or the object to be adapted.


1  def dslPackage = {
2    // the DSL program
3    fsm ’myReflectiveMachine’, {
4      state ’S1’, { ... }
5      ...
6    }
7    ...
8    // intercede structure
9    S1.metaClass.counter = 0;
10
11   // intercede behavior
12   S1.metaClass.handleEvent = { String event ->
13     Transition t = delegate.transitions.get(event);
14     println "HANDLE EVENT: receives "+event+" no.${++(S1.counter)} and takes transition "+t;
15     return t;
16   }
17   ...
18 }
19 dslPackage.delegate = new ReflectiveStateMachineDSL();
20 dslPackage();

Listing 6.5: A program using behavioral reflection
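The role of delegate inside a replacement closure, as used in line 13 of Listing 6.5, can be demonstrated in plain Groovy; the Greeter class below is purely illustrative:

```groovy
// Behavioral intercession: replacing a method on a single instance.
class Greeter {
    String name
    String greet() { return "Hello from " + name }
}

def g = new Greeter(name: "S1")
assert g.greet() == "Hello from S1"

// Inside the replacement closure, `delegate` refers to the adapted object
// (here: g), while `this` would refer to the enclosing script.
g.metaClass.greet = { -> "Adapted greeting from " + delegate.name }
assert g.greet() == "Adapted greeting from S1"

def h = new Greeter(name: "S2")   // other instances keep the default behavior
assert h.greet() == "Hello from S2"
```

Replacing `handleEvent` on the state S1 in Listing 6.5 follows the same pattern, with the closure using `delegate.transitions` to reach the adapted state's own fields.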

level and the meta-level in the implementation, namely between classes that extend Object and classes that extend MetaObjectImpl. Note that the reflective embedding architecture generalizes the proposal of Maes to whole languages. In REA, there is a first-class object for the language itself in the form of the language façade; thus, not only individual objects but the language as a whole can be manipulated. But there are also limits, since Groovy is subject to conflating levels. Recall that, in Groovy, every class may override one of its meta-methods, such as methodMissing. When an object overrides a meta-method, it actually implements meta-level activities at the base level, which in fact conflates the levels. Although conflating levels can be very convenient, in case the user/developer does not carefully implement a meta-method, there is the danger of infinite regression—an endless loop between the base-level and the meta-level logic.

(2) uniform representation: Maes argues that the design needs to be represented in a uniform way. In the reflective embedding architecture, all reflective concepts are represented as objects. On the one hand, every GPL construct is represented as an object in Groovy, such as classes and closures. On the other hand, every domain-specific language construct is represented through an object in REA. In the above example, these are Fsm objects, State objects, and Transition objects. Since the architecture follows a homogeneous approach to embedding a reflective language, the resulting reflective system, i.e., the homogeneous embedding, has no technological disruptions with the host language, its reflective architecture, and its tools, which makes the resulting language implementation uniform. For supporting reflective DSLs, it is crucial that REA does not commit to a special set of language constructs. Additionally, REA also provides uniform high-level reflective abstractions for language components, which are not present in most other reflective languages. In particular, the architecture also represents languages as a set of objects, such as the language façade instance and the objects of the language model. REA mechanisms introspect and intercede on these objects; for example, language combiners compose language components, and there is support for analyses and transformations (cf. Section 10).

(3) complete representation: Maes argues that the meta-level should contain the complete information about the base level. In the reflective embedding architecture, this is not completely the case, but there are only minor limitations due to restrictions of the Groovy MOP. From the language model of a reflective embedding, everything is accessible as an object. Still, there are a few minor restrictions, since the current version of the Groovy MOP does not provide all conceivable features; e.g., Groovy does not reify basic blocks. Specifically, REA currently does not support reflective methods [DDLM07]. Although every Groovy closure is represented as an object with a corresponding meta-object, it is not possible to access the AST inside a closure at runtime. Note, however, that Section 10.6 will demonstrate how to obtain access to the program's AST. The language developer can implement an analysis to obtain an AST-like representation of the DSL constructs used inside a closure, which are reified as a sequence of method calls using an analyzer façade. Moreover, in a reflective embedded DSL, it depends on the studiousness of the language developer to expose all information about the domain in an adequate way.
For example, the reflective FSMDSL favors simplicity over complete reflection; specifically, it does not completely reify events as first-class objects but maps them to String objects.

(4) consistent self-representation: Maes further demands that the design use a consistent self-representation, i.e., a representation whose elements represent themselves. In other words, she argues that a system should implement itself with its own means, which is why reflective systems are often implemented as meta-circular interpreters [SS78]. For example, meta-level objects encode the semantics of base-level objects, meta-level procedures encode base-level procedures, and meta-level rules encode base-level rules. In contrast to this, in REA everything is represented with OO abstractions. More specifically, DSL constructs are not implemented in the DSL itself. For example, the above example represents a state machine using objects, but it does not represent a state machine through a kind of meta state machine. Although this is not true self-representation, the representation is still consistent. Since everything can be represented as an object and those objects are themselves represented with objects, everything is indirectly represented via self-represented objects. Another perspective on this is that homogeneous embedded DSLs can be thought of as being quasi meta-circular, for two reasons. First, when homogeneously embedding a DSL into

a host language, by default, the DSL inherits the computational power of the host language. Second, when embedding a DSL into a meta-circular host language as a library, the embedded DSL does not destroy the meta-circular property of the host language. Because the host language is still a subset of a homogeneous embedded DSL, a meta-circular implementation of the host language together with the embedded library is a meta-circular implementation for the embedding. Consequently, the embedding inherits the meta-circular property from its host language. Despite this property of the reflective embedding architecture, its current instantiation in Groovy is unfortunately not meta-circular, since Groovy is not implemented as a meta-circular compiler.
However, one can argue that self-representation makes little sense for most DSLs. A problem with self-representation of a DSL is that implementing the DSL requires a Turing-complete language, which in turn requires the DSL itself to be Turing complete. With Turing-complete DSLs, one could realize self-representation through a meta-circular implementation. For instance, TeX is Turing complete [Tay92]; therefore, one can use TeX to implement TeX. But there are two reasons why self-representation is often not possible for many DSLs. First, declarative DSLs that are not Turing complete cannot implement themselves, e.g. HTML cannot be used to implement HTML (without using JavaScript [Fla02]). Second, even where a self-implementation is possible, it can be expected to cost too much effort, e.g. one could implement the BPEL workflow language [AAB+07] in itself, but BPEL abstractions are only adequate for programming-in-the-large; low-level implementations would be needed to implement BPEL's execution semantics efficiently. Consequently, self-representation cannot be considered a good design goal for reflective DSLs in general.
(5) run-time modification: Maes argues that the representation should allow run-time modification. In the reflective embedding architecture, since the Groovy MOP supports runtime modification, reflective programs can modify every class or individual object during the execution of programs. Developers can add both fields and methods, and they can adapt existing fields and methods.

6.3.4. Evolving from Procedural to Declarative Reflection

Maes distinguishes reflective systems with respect to the abstraction level of the reflective language that is used to reason about the system and to manipulate it. There is procedural and declarative reflection.
A system supports procedural reflection when the system itself exposes its own interpretation process. In other words, there is a procedural representation of the system that is visible to the user and that can be changed by its own means. For procedural reflection, the reflective language is often implemented as a meta-circular interpreter; thus the system uses the same language to implement its reflective language, which is also available to the user for implementing reflective programs. An advantage of this approach is that, for language developers, implementing a meta-circular interpreter is an easy way to fulfill causal connection [Mae87]. However, the resulting reflective language is often a rather low-level language with disturbing details [KRB91]. A problem with such a low-level language is that it may not provide an efficient notation for users.


In contrast, if a system supports declarative reflection, then the users do not have direct access to the complete procedural representation of the system. They can only use declarative statements to change the system internals. In such systems, the user rather declares constraints that describe what the system must fulfill, but not how. The declarative notation tends to be safer and more efficient for users, since it restricts possible user inputs in a controlled way. Further, the system's restricted reflection capabilities often allow its execution to be implemented more efficiently, because the restrictions make optimizations conceivable. However, for the language developer, it is more complicated to implement such a language.
In sum, there are advantages and disadvantages to both styles of reflection, and Maes mentions examples of reflective systems that can be classified between the two extremes, because they incorporate both procedural and declarative parts. According to Maes, this suggests that the distinction is rather a continuum.
What is special in the reflective embedding architecture is that language embeddings are allowed to slide on this continuum, as illustrated in Figure 6.3. For example, initially, a reflective language embedding starts as a procedural reflection (index 1), but later on, a language developer can specialize a language interface and its implementation by adding more declarative language constructs on top of the low-level reflection mechanisms. As indicated in the figure, they can provide more safety guarantees by restricting the available power of the reflective system (index 2). Later still, those declarative language constructs may allow optimizing the language implementation (index 3).

Figure 6.3.: Continuum of procedural and declarative reflection

As an example of evolving a reflective DSL to be more declarative, consider introducing a declarative abstraction that consistently adds transitions to all existing states of a machine. Specifically, the language developer can implement such a declarative abstraction by defining a new keyword method drain. With drain, users can consistently add transitions from all states to one particular state. For example, with the snippet "drain(fsm:thisFsm, to:ErrorState, when:"error")", the user adds a transition from every state in thisFsm to the ErrorState, whereby each transition is guarded by a label that requires the error event. Using the declarative drain keyword is safer than using the rather low-level reflection mechanism. When adding each transition using the low-level structural reflection mechanisms, the user could accidentally forget to change one of the states. With a more declarative reflection mechanism like the one in the above example, the provided abstractions help end users to apply reflective adaptations.
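The idea behind drain can be sketched language-neutrally. Below is a minimal Python stand-in for the Groovy embedding (all class and method names are illustrative, not REA's actual API): the loop over all states — the low-level structural adaptation — is hidden behind one declarative keyword method.

```python
# Hedged sketch: a declarative 'drain' keyword layered over low-level
# structural reflection on a toy state-machine model.

class Transition:
    def __init__(self, event, target):
        self.event, self.target = event, target

class State:
    def __init__(self, name):
        self.name = name
        self.transitions = []

class Fsm:
    def __init__(self):
        self.states = {}
    def state(self, name):
        s = State(name)
        self.states[name] = s
        return s

def drain(fsm, to, when):
    # declarative abstraction: consistently add a transition from *every*
    # state to the target state, guarded by the given event
    for s in fsm.states.values():
        s.transitions.append(Transition(when, to))

this_fsm = Fsm()
for n in ("S1", "S2", "S3"):
    this_fsm.state(n)
error_state = this_fsm.state("ErrorState")
drain(this_fsm, to=error_state, when="error")
# no state can be accidentally forgotten, unlike manual per-state adaptation
```

Because drain iterates over whatever states exist at call time, the consistency guarantee holds by construction, which is exactly the safety benefit the text attributes to the declarative keyword.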


They can use reflection without taking care of adaptation details; thus their reflective programs likely become less prone to errors.
From the above discussion, one can conclude that the reflective embedding architecture supports continuous evolution of reflection, from a general-purpose reflective DSL to a more specialized and declarative DSL that abstracts from details of reflective programming.

6.4. Case Study: Using Semantic Adaptation Scopes to Adapt an Open DSL Implementation

This section presents how an end user can customize an open DSL implementation in their programs. On the one hand, the user can reason on programs: by using the DSL's reflection mechanism, the user can introspect the program. On the other hand, the user can reason on the language semantics: by using the (generic) meta-interface, the user can intercede in the semantics of the open DSL implementation. To demonstrate applications of introspection and intercession, this section first presents semantic adaptations for the FSMDSL, in Section 6.4.1. Then, Sections 6.4.2 to 6.4.4 demonstrate how to apply these semantic adaptations to the open DSL implementation, in a controlled way, by using the three dimensions of the reflective embedding architecture (cf. Section 4.2.2, p. 89).

6.4.1. Possible Semantic Adaptations for the FsmDSL

The UML specification [OMG04] discusses several semantic adaptations for state machines. This section considers two of them.

Synchronous vs. asynchronous event handling: Listing 6.6 shows two possible implementations of State's domain operation handleEvent. The first implementation is the default one and encodes synchronous event handling; the second implementation supports asynchronous event handling.

1 // default: synchronous event handling
2 def handleEvent(Event event) { delegate.fsm.currentState = delegate.transitionSelection(event).fire(); }
3
4 // alternative: asynchronous event handling
5 def handleEvent(Event event) {
6   if (delegate.fsm.queue.isEmpty()) { delegate.fsm.currentState = delegate.transitionSelection(event).fire(); }
7   else { delegate.fsm.queue.add(event); }
8 }

Listing 6.6: Two implementations of handleEvent
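The two policies of Listing 6.6 can be sketched as swappable functions over toy domain classes. This is a hedged Python stand-in — the class names mirror the FSMDSL domain model, but the details are assumptions, not the Groovy implementation:

```python
# Hedged sketch: synchronous vs. asynchronous event handling as two
# interchangeable policy functions.
from collections import deque

class Transition:
    def __init__(self, target):
        self.target = target
    def fire(self):
        return self.target

class State:
    def __init__(self, name, table):
        self.name, self.table = name, table   # table: event -> target State
    def transition_selection(self, event):
        return Transition(self.table[event])

class Fsm:
    def __init__(self, initial):
        self.current_state = initial
        self.queue = deque()

def handle_event_sync(fsm, event):
    # default: fire the selected transition immediately
    fsm.current_state = fsm.current_state.transition_selection(event).fire()

def handle_event_async(fsm, event):
    # alternative, mirroring the listing: handle directly only if no events
    # are pending; otherwise buffer (draining is assumed to happen elsewhere)
    if not fsm.queue:
        fsm.current_state = fsm.current_state.transition_selection(event).fire()
    else:
        fsm.queue.append(event)

s2 = State("S2", {})
s1 = State("S1", {"ok": s2})
fsm = Fsm(s1)
handle_event_sync(fsm, "ok")   # fsm.current_state is now s2
```

Making each policy a free-standing function is what later allows it to be rebound per type or per object, which is the point of the adaptation dimensions discussed next.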

Deterministic vs. random transition selection: A given state of a state machine can have several transitions matching a given event, i.e. the FSMDSL is non-deterministic. In this case, a state machine implementation has to provide a transition selection policy. In REA, the semantics of transition selection is encoded in the transitionSelection method of the domain class State.


Listing 6.7 shows two possible implementations of transitionSelection. The default implementation of the transition selection policy is a deterministic one: it returns the first element of the collection of matched transitions. The alternative implementation selects a random item in the collection of matched transitions for fairer load balancing.

1 // default semantics: deterministic transition selection
2 def transitionSelection(Event event) { return delegate.transitions.findAll(event).first; }
3
4 // alternative semantics: random transition selection
5 def transitionSelection(Event event) { return delegate.transitions.findAll(event).getRandom(); }

Listing 6.7: Two implementations of transitionSelection

The following sub-sections discuss scenarios of using REA to apply alternative semantics for event handling and transition selection, thereby varying the kind of adaptation along the three dimensions discussed previously.

6.4.2. Dimension: Locality of Adaptations

Listing 6.8 and Listing 6.9 show the implementation of the locality of adaptations dimension (D1) presented in Section 4.2.2.1.1 (p. 91).

1 // this closure contains the DSL package (DSL program + adaptations)
2 def dslPackage = {
3   // the DSL program
4   fsm 'myFsm', {
5     state 'S1', { ... }
6     ...
7     state 'SX', { ... }
8   }
9
10   // adaptation of the default semantics of the domain class State by replacing the implementation of
11   // the transitionSelection method of the default meta-object associated with the State class
12   State.metaClass.transitionSelection = { event ->
13     return delegate.transitions.findAll(event).last
14   }
15
16   // executing the DSL program
17   myFsm.execute(['ok','error',...])
18 }
19 dslPackage.delegate = new ReflectiveStateMachineDSL()
20 dslPackage()

Listing 6.8: Domain type semantic adaptation

Listing 6.8 illustrates domain type semantic adaptation (D1.a). The module dslPackage (lines 2–18)—a Groovy closure—consists of three parts: (1) the declaration of a DSL program (lines 4–8), (2) a piece of meta-program that tailors the semantics of the DSL (lines 12–14), and (3) a


piece of code that starts the execution of the DSL program (line 17). Parts (1) and (3) have been explained in Section 6.2. Most importantly, the adaptation part (2) changes the transitionSelection method of the default state meta-object in line 12 to an alternative implementation—the closure in lines 12–14. As explained earlier, all instances of class State are affected by this kind of adaptation and will execute with the tailored transition selection semantics.
Listing 6.9 shows an example of semantic adaptation performed at the level of a single domain object (D1.b). The only difference to Listing 6.8 is in Listing 6.9, lines 11–13. While for domain type adaptation the program changes the meta-object of State, for domain object adaptation the program changes the meta-object of the domain object that represents the state S1. Only state S1 has the new implementation of the transitionSelection method.

1 def dslPackage = {
2   // the DSL program
3   fsm 'myFsm', {
4     state 'S1', { ... }
5     ...
6     state 'Sx', { ... }
7   }
8
9   // adaptation of the default semantics of **only instance S1** by replacing the implementation
10   // of the transitionSelection method; S1 and Sx do not have the same transitionSelection semantics
11   myFsm.S1.metaClass.transitionSelection = { event ->
12     return delegate.transitions.findAll(event).last
13   }
14
15   myFsm.execute(['ok','error',...]) // executing the DSL program
16 }
17 dslPackage.delegate = new ReflectiveStateMachineDSL()
18 dslPackage()

Listing 6.9: Domain object semantic adaptation
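The locality dimension can be sketched in Python terms. In this hedged stand-in, Groovy's `State.metaClass` assignment corresponds to rebinding a method on the class (domain type adaptation, D1.a), while adapting `myFsm.S1` corresponds to shadowing the method on one instance (domain object adaptation, D1.b); the classes and policies here are illustrative only:

```python
# Hedged sketch of the locality dimension (D1): type-level vs.
# object-level replacement of a domain operation.
import types

class State:
    def __init__(self, name, transitions):
        self.name = name
        self.transitions = transitions        # list of (event, target) pairs
    def transition_selection(self, event):
        # default semantics: deterministic, first match
        return [t for t in self.transitions if t[0] == event][0]

s1 = State("S1", [("ok", "A"), ("ok", "B")])
s2 = State("S2", [("ok", "A"), ("ok", "B")])

# D1.a: adapt the *type* -- every State instance now picks the last match
State.transition_selection = (
    lambda self, event: [t for t in self.transitions if t[0] == event][-1])

# D1.b: adapt one *object* -- only s1 reverts to the first-match policy
s1.transition_selection = types.MethodType(
    lambda self, event: [t for t in self.transitions if t[0] == event][0], s1)

assert s1.transition_selection("ok") == ("ok", "A")   # object-level policy
assert s2.transition_selection("ok") == ("ok", "B")   # type-level policy
```

As in the Groovy listings, the object-level adaptation shadows the type-level one only for the single adapted instance; all other instances keep the type-level semantics.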

6.4.3. Dimension: Atomicity of Adaptations

Listing 6.10 and Listing 6.11 illustrate the atomicity of adaptations semantic dimension (D2), presented in Section 4.2.2.1.2 (p. 94). At the one extreme of this dimension, a semantic adaptation affects a single domain method (D2.a); at the other extreme, a semantic adaptation may imply the construction of a completely new meta-object (D2.b).
Listing 6.10 is an excerpt of Listing 6.8. It focuses on the adaptation in line 6. The only element that is changed is a domain method of a domain class. By contrast, Listing 6.11 shows the creation of a semantic module and its use for tailoring the semantics of a domain class. Similarly to using classes to represent domain types, one can use a new subclass for modularizing alternative semantics for a domain type. In the example, lines 4–7 define such a subclass called TailoredState. Subclassing a domain type to create a new meta-object allows leveraging two key Groovy features used in Listing 6.11:


1 fsm 'myFsm', {
2   ...
3 }
4
5 // The developer tailors only the transition selection part of the semantics of States
6 State.metaClass.transitionSelection = { event -> return delegate.transitions.findAll(event).last; }
7
8 myFsm.execute(['ok','error',...]) // executing the DSL program

Listing 6.10: Method-level adaptation

1. The possibility to attach new semantics in the form of an alternative meta-object4 to an existing domain class, using Groovy's meta-class registry mechanism (line 11).

2. Groovy automatically creates a meta-object for each new class that is instantiated (line 11).

A meta-object for the new subclass is automatically created and stored in the class variable TailoredState.metaClass. In Listing 6.11, line 11, the adaptation part registers this meta-object also for the State class. As a consequence, the domain operation implementations of TailoredState will be used for State objects.

1 ...
2 // To package atomic adaptations into one unit,
3 // the developer uses a sub-class that modularizes the semantic adaptation for its super class
4 class TailoredState extends State {
5   def transitionSelection(event) { /* cf. variation point transitionSelection */ }
6   def handleEvent(event) { /* cf. variation point handleEvent */ }
7 }
8
9 // Next, the developer can tailor the semantics of State in one unit
10 // for both event handling and transition selection
11 InvokerHelper.metaRegistry.setMetaClass(State, TailoredState.metaClass)
12
13 myFsm.execute(['ok','error',...]) // executing the DSL program

Listing 6.11: Semantic module adaptation
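The semantic-module pattern of Listing 6.11 can be sketched in Python. In this hedged stand-in, a subclass packages adaptations of several domain operations in one unit, and an install step — a stand-in for Groovy's meta-class registry, not its real API — makes the base class use the subclass's implementations:

```python
# Hedged sketch of semantic-module adaptation: a subclass bundles several
# method adaptations; installing it swaps the semantics of the base class.

class State:
    def handle_event(self, event):            # default semantics
        return "sync:" + event
    def transition_selection(self, event):
        return "first"

class TailoredState(State):                   # the semantic module
    def handle_event(self, event):
        return "async:" + event
    def transition_selection(self, event):
        return "random"

def set_meta_class(target_cls, module_cls):
    # stand-in for InvokerHelper.metaRegistry.setMetaClass(State, ...):
    # rebind every method defined by the module onto the target class
    for name, member in vars(module_cls).items():
        if callable(member):
            setattr(target_cls, name, member)

set_meta_class(State, TailoredState)
s = State()
# existing and future State objects now run with the tailored semantics
```

The point, as in the Groovy version, is atomicity: both domain operations switch together, so no State object can ever observe a half-adapted semantics.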

6.4.4. Dimension: Time of Adaptations

Listing 6.12 demonstrates scoping adaptations along the time of adaptation dimension (D3), presented in Section 4.2.2.1.3 (p. 95).
Pre-execution adaptation (D3.a) is used to set up the desired semantics for a DSL program before running it. Pre-execution adaptation allows us to reuse a DSL program; in the running

4To preserve compatibility with existing clients, the new meta-object must obey the same object interface as the one obeyed by the old meta-object.


example, a developer can reuse a program that has been implemented for synchronous Web services to interact with asynchronous Web services. Assume that in the running example the language developer knows in advance that the synchronous some_flight_webservice has become unavailable (e.g. the service has been permanently retired) and we would like to replace it with another_flight_webservice, which employs asynchronous interaction. In order to adapt the semantics with pre-execution adaptation, first, the DSL program is declared in lines 3–9. Next, specific semantics are selected in lines 11–13 for the program that was declared above. Finally, line 15 executes the program under the selected semantics; the semantics does not change while the program is running.

1 def dslPackage = {
2   // declaration of the DSL program
3   fsm 'TravelPackage', {
4     state 'BookingFlight', {
5       on_entry { ... /* synchronous call to some_flight_webservice */ }
6       when 'done', { enter 'BookingHotel' }
7     }
8     state 'BookingHotel', { on_entry { ... } }
9   }
10
11   // pre-execution adaptation
12   TravelPackage.BookingFlight.targetService = "another_flight_webservice"
13   TravelPackage.BookingFlight.metaClass.handleEvent = { event -> /* asynchronous implementation */ }
14
15   TravelPackage.execute(['ok','error',...]) // executing the DSL program
16 }
17 dslPackage.delegate = new ReflectiveStateMachineDSL()
18 dslPackage()

Listing 6.12: Pre-execution adaptation

In contrast, context-dependent adaptation (D3.b) allows programs to reflectively adapt their semantics at runtime. In this case, the semantic adaptation code becomes part of the code of the DSL program. Consider the DSL program in Listing 6.13, which implements the running example from Listing 6.1. The adaptation logic is situated in lines 11–17. At runtime, only when DSL execution reaches lines 11–17 does the program execute the adaptation logic, which adapts the language semantics to asynchronous event handling.

6.5. Summary

This chapter has presented a study on how to implement reflective DSLs that support semantic adaptations that may be application-specific and may occur as late as during the execution of DSL programs. The proposed concept of a reflective DSL leverages two important concepts from general-purpose programming languages in the context of domain-specific languages—reflection [Smi82, FW84, Mae87] and meta-objects [KRB91]. Also, the study elaborated on an


1 def dslPackage = {
2   // declaration of the DSL program
3   fsm 'TravelPackage', {
4     state 'BookingFlight', {
5       on_entry { /* synchronous call to flight_webservice */ }
6
7       // nominal mode
8       when 'done', { enter 'BookingHotel' }
9
10      // error recovery mode
11      when 'error', {
12        targetService = "another_flight_webservice"; // changes to an asynchronous Web service
13        delegate.metaClass.handleEvent = { event -> // change the FsmDSL's internal event handling too
14          /* asynchronous implementation */
15        }
16        enter 'BookingFlight' // re-enter current state
17      } // end when
18    } // end state
19
20    state 'BookingHotel', { on_entry { ... } }
21  }
22
23  TravelPackage.execute(['ok','error','done',...]) // executing the DSL program
24 }
25 dslPackage.delegate = new ReflectiveStateMachineDSL()
26 dslPackage()

Listing 6.13: Context-dependent adaptation
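The essence of context-dependent adaptation can be sketched in a few lines of Python. In this hedged stand-in (names and policies are illustrative), the program itself, upon observing an 'error' event, rebinds the event-handling semantics of its own machine instance at runtime:

```python
# Hedged sketch of context-dependent adaptation (D3.b): the adaptation
# logic lives inside the program and fires only when execution reaches it.
import types

class Fsm:
    def __init__(self):
        self.queue = []
        self.log = []
    def handle_event(self, event):            # default: synchronous
        self.log.append("sync:" + event)
        if event == "error":
            # adaptation logic embedded in the DSL program: from here on,
            # this instance handles events asynchronously (by queueing)
            def handle_async(self, ev):
                self.queue.append(ev)
                self.log.append("queued:" + ev)
            self.handle_event = types.MethodType(handle_async, self)

fsm = Fsm()
fsm.handle_event("ok")      # handled with the default semantics
fsm.handle_event("error")   # triggers the self-adaptation
fsm.handle_event("done")    # now handled by the adapted semantics
# fsm.log == ["sync:ok", "sync:error", "queued:done"]
```

As in Listing 6.13, the semantics before and after the adaptation point differ within a single run of the same program, which is precisely what distinguishes D3.b from pre-execution adaptation.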

instantiation of an open and reflective DSL implementation in the Groovy programming language. The results of the study allow concluding that the techniques presented in this chapter provide a similar flexibility for DSLs as that provided by reflection mechanisms and MOPs for GPLs.
An interesting point raised in the chapter's discussion about continuous evolution from procedural to declarative reflection is how a language developer can build special reflective abstractions on top of existing low-level reflective mechanisms. To address this point, the following chapter will elaborate on how to enable more declarative means for adaptation, namely aspects as a partial reflective mechanism for the safe adaptation of DSL programs.

7. Crosscutting Composition of Embedded Domain-Specific Languages

Various DSALs have been proposed for different domains, such as coordination [Lop97], security [FS99], transactions [FD06], or grammars [SLS03, RMWG09]. These DSALs are implemented using pre-processors and hence inherit their limitations, especially with respect to composability. Generic approaches to implementing (domain-specific) aspect languages [MKD03, LN04, UMT04, TN05, ACH+06, HJZ07, GV07, KL07a, AET08, HBA08, HHJ+08, HHJZ09] do not support the composition of aspects from one domain into a DSL from another domain [DMM09].

As a result, (domain-specific) aspect-oriented concepts for base DSLs are implemented from scratch for every new domain and combinations thereof. For example, AO4BPEL [CM06] provides aspect-oriented (AO) support for composing crosscutting concerns (e.g., security) into BPEL [AAB+07], and SQXML AOP [Alm] implements support for modular composition of crosscutting concerns into SQL.

To address this problem, this chapter presents how to compose several DSLs with interacting semantics. As Section 3.3.2.1 (p. 44) argues, if only black-box composition of EDSLs is supported, programs that use several EDSLs may exhibit scattering and tangling symptoms if the domain semantics are crosscutting. To address this issue, in REA, it is possible to embed DSLs together with aspect-oriented concepts on top of the host's MOP. When aspects are embedded together with DSLs, the AO concepts enable composing crosscutting parts of the programs of one or several domains.

With aspects, end users can write more modular programs with better maintainability and understandability. In addition, introducing AO concepts tames the MOP. Using aspects instead of the MOP to compose DSL programs takes advantage of the fact that restricting the power of the MOP provides more guarantees [LK03, KL04]. To enable such invasive but safe composition, this section shows that, with such embedded aspect-oriented concepts, REA supports crosscutting composition of EDSLs.


Parts of this chapter have been published in the following publication:

• Tom Dinkelaker, Michael Eichberg, and Mira Mezini. An Architecture for Composing Embedded Domain-Specific Languages. In Proceedings of the 9th ACM International Conference on Aspect-Oriented Software Development (AOSD), 2010.

The remainder of this chapter is organized as follows. Sec. 7.1 presents the example scenario that we use throughout the chapter to discuss the implementation of EDSLs. Sec. 7.2 discusses the scattering and tangling problem in EDSLs. Sec. 7.3 discusses crosscutting composition.

7.1. Running Example: ProcessDSL—A Workflow Language

This section uses a simple workflow language [WfM] – called PROCESSDSL – for illustration purposes. PROCESSDSL is inspired by real-world DSLs for orchestrating Web Services [AAB+07, DJKN07]. PROCESSDSL provides the following domain-specific abstractions to define processes:

process: to define workflows.
task: to define a task as part of a workflow.
registry: to access the registry of Web services.
notify: to notify the process stakeholders.

Listing 7.1 demonstrates the use of PROCESSDSL to select a credit among a set of offers. The EasyCreditProcess consists of two sequential tasks: (1) getOffers and (2) selectOffer. The first task (lines 4–11) starts by searching for all banking services in the registry. After that, each banking service is queried for its current interest rate. The second task (lines 13–18) selects the cheapest offer and requests the credit. At the end, notify (line 17) sends a message with the credit details to all stakeholders.

1 process(name:"EasyCreditProcess") {
2   def offers = [:];
3
4   task(name:"getOffers") {
5     def services = registry.find("Banking");
6     services.each { bank ->
7       def request = bank.createRateRequest(10000,"USD");
8       def response = bank.call("getRate", request);
9       offers[bank] = response;
10    }
11  }
12
13  task(name:"selectOffer") {
14    def selectedBank = ... // bank with the cheapest offer
15    def request = selectedBank.createBorrowRequest(10000,"USD");
16    def response = selectedBank.call("borrow", request);
17    notify "Credit from ${selectedBank.name}: $response";
18  }
19 }

Listing 7.1: An example PROCESSDSL program


7.1.1. Example Scenario for Crosscutting in DSLs

To motivate the need for crosscutting composition, let us assume that we also want to support the confidential exchange of messages. One approach is to extend PROCESSDSL and to define a new DSL – SECUREPROCESSDSL – which adds the following domain-specific abstractions:

encrypt: for encrypting messages.
decrypt: for decrypting encrypted messages.
me: to access the identity that started the process.

SECUREPROCESSDSL can easily be implemented by inheriting from and refining PROCESSDSL. A class SecureProcessDSL implements the additional methods encrypt, decrypt, and getMe. The methods encrypt and decrypt are implemented using a cryptographic library, and getMe returns an object implementing the interface Identity, which declares two getter methods getPublicKey and getPrivateKey. In a secure interaction, the concepts of process and service act on behalf of identities from the security domain: a process acts on behalf of the identity that started the process, and each service holds the identity of its service provider. To connect the concepts of the two domains, the domain classes Process and ServiceProxy from the workflow domain implement the interface Identity to represent the corresponding identity.
However, inheriting from the base language and merely adding security primitives to it is a naïve approach, because the security-related code written using the extended SECUREPROCESSDSL is scattered and tangled. As shown in Figure 7.1 in lines 8–10 and 18–20 (highlighted using "#"), the code for the security concern is not localized.
The remainder of this section discusses how tangling and scattering of DSL code can be addressed by supporting more invasive compositions of DSLs. The semantics of the composition of the process and security DSLs can be expressed by the following rule: "When the process DSL is composed with the security DSL, the interpretation of the former changes to ensure that external service calls are secured." Rather than having the programmer specify the interleaving of the abstractions from different DSLs as part of the program, the reflective embedding architecture provides dedicated composition operators that specify the interleaving at the level of DSL interpretations. These operators make use of aspect-oriented concepts; therefore, they are called crosscutting combiners.

7.2. Embedding ProcessDSL

This section presents the implementation of an example embedding in which programs suffer from scattering and tangling issues. While the discussion uses the domain of workflow languages in this section, these problems are not exclusive to this domain; the same problems have also been found in various domains, as mentioned in the previous section.
Consider Listing 7.2, which presents the implementation of PROCESSDSL in the façade class ProcessInterpreter. The ProcessInterpreter extends the Interpreter class and defines a getter method


1 process(name:"SecureEasyCreditProcess") {
2   def offers = [:];
3
4   task(name:"getOffers") {
5     def services = registry.find("Banking");
6     services.each { bank ->
7       def request = bank.createRateRequest(10000,"USD");
8 #     def enc_request = encrypt(request,RSA,bank.publicKey);
9 #     def enc_response = bank.call("getRate", [enc_request]);
10 #    def response = decrypt(enc_response,RSA,me.privateKey);
11      offers[bank] = response;
12    }
13  }
14
15  task(name:"selectOffer") {
16    def selectedBank = ... // get cheapest bank from offers
17    def request = selectedBank.createBorrowRequest(10000,"USD");
18 #  def enc_request = encrypt(request,RSA,selectedBank.publicKey);
19 #  def enc_response = selectedBank.call("borrow", [enc_request]);
20 #  def response = decrypt(enc_response,RSA,me.privateKey);
21    notify "Credit from ${selectedBank.name}: $response";
22  }
23 }

Figure 7.1.: Example program with tangled security concern

Registry getRegistry(){...} for the literal registry, the method void notify(String){...} for the corresponding domain operation notify, the method void process(HashMap, Closure){...} for the domain abstraction operator process, and void task(HashMap, Closure){...} for the operator task. To execute a PROCESSDSL program, as with every embedded DSL in REA, the program's closure must use ProcessInterpreter as its delegate.

7.3. Crosscutting Composition

To reconsider the problem of crosscutting composition in DSLs, which was discussed in Section 3.3.2.1 (p. 44), let us review the SECUREPROCESSDSL program in Figure 7.1. One can observe that the code for the security concern in lines 8–10 and 18–20 (highlighted using "#") is not localized. While SECUREPROCESSDSL was defined as an extension of PROCESSDSL, the same scattering problem also arises when using black-box composition to compose PROCESSDSL with an independently defined security DSL.
The problem is that the composition of the process and security DSLs is not black-box. The composition semantics can be expressed by the following rule: "When the process DSL is composed with the security DSL, the interpretation of the former changes to ensure that external service calls are secured." In other words, the interpretations of the two languages crosscut each other.
To address the problem, in REA, the developer can use the CCCombiner, which makes use of aspect-oriented concepts to support more invasive compositions of EDSL interpretations. For crosscutting composition, domain-specific join point models and domain-specific pointcut languages play an important role.

1 public class ProcessInterpreter extends Interpreter {
2   protected Process thisProcess = null;
3
4   public Registry getRegistry() {
5     return Registry.getInstance();
6   }
7
8   public void notify(String str) {
9     String msg = "Notification (PID=$thisProcess.id)"+str;
10    // ... send out an email to stakeholders
11  }
12
13  public void process(HashMap params, Closure processBody) {
14    thisProcess = new Process(params);
15    processBody.delegate = super.bodyDelegate;
16    processBody.call();
17    thisProcess.execute();
18  }
19
20  public void task(HashMap params, Closure taskBody) {
21    taskBody.delegate = super.bodyDelegate;
22    thisProcess.addTask(params, taskBody);
23  }
24 }

Listing 7.2: Implementation of PROCESSDSL

Domain-specific join points denote points in the interpretation of an EDSL exposed for invasive composition; the set of these join points constitutes a domain-specific join point model (DS-JPM). Domain-specific pointcuts select points in the interpretation of the EDSLs (i.e., points in the DS-JPM) where invasive composition should take place; they are expressions of a domain-specific pointcut language (DS-PCL). Finally, the advice operators before, around, after, and proceed are used to determine the order in which the invasive composition of the EDSLs should happen.
For illustration, consider a DS-JPM for PROCESSDSL consisting of four kinds of join points: (1) process execution join points occur when a process is executed; their context includes the process name and the task list. (2) task execution join points occur when a task is executed; their context includes the task name and its activities. (3) service selection join points occur when the registry is consulted to select services; their context includes the selection pattern and the set of selected services. (4) service call join points occur when a remote call to a Web service is made as part of a process; their context includes the service name and the call's SOAP document. The DS-PCL for PROCESSDSL provides – among others – the pointcut designators process_execution, task_execution, service_call, and service_selection.
Listing 7.3 shows the SecureEasyCreditProcess program (cf. Listing 7.1) rewritten using crosscutting composition of PROCESSDSL and a security DSL. The program as a whole implements the same semantics as before, but this time the security concern (lines 3–9) and the workflow

161 7. Crosscutting Composition of Embedded Domain-Specific Languages

specification (line 12) are no longer tangled. Instead, AO abstractions are used to compose the two DSL programs. The aspect in lines 3–9 defines how PROCESSDSL programs are made secure. Basically, calls to external services are encrypted before being sent and are decrypted after receiving the response. The pointcut (line 4) is written in PROCESSDSL's domain-specific pointcut language (DS-PCL); it matches against corresponding domain-specific join points, selecting all external service calls in PROCESSDSL programs. The security-related functionality is expressed using the security DSL.
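The four join point kinds of this DS-JPM can be pictured as plain objects that carry their kind-specific context. The following is an illustrative Java sketch, not POPART's implementation; the class names ProcessExecJP and ServiceCallJP appear later in the text, while the field layout and the kind() method are assumptions:

```java
import java.util.Map;

// Join points as plain objects: each kind carries its own context.
abstract class JoinPoint {
    final Map<String, Object> context;
    JoinPoint(Map<String, Object> context) { this.context = context; }
    abstract String kind();
}

// Occurs when a process is executed; context: process name, task list.
class ProcessExecJP extends JoinPoint {
    ProcessExecJP(Map<String, Object> ctx) { super(ctx); }
    String kind() { return "process_execution"; }
}

// Occurs when a remote Web-service call is made; context: service name, request.
class ServiceCallJP extends JoinPoint {
    ServiceCallJP(Map<String, Object> ctx) { super(ctx); }
    String kind() { return "service_call"; }
}

class Main {
    public static void main(String[] args) {
        JoinPoint jp = new ServiceCallJP(Map.of("service", "bank", "operation", "getRate"));
        System.out.println(jp.kind() + " on " + jp.context.get("service"));
        // prints: service_call on bank
    }
}
```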

 1 // Definition of EDSL programs (any order)
 2 Closure ca = {
 3   aspect(name:"Confidentiality") {
 4     around(service_call(".*") & if_pcd { external }) {
 5       request = encrypt(request, RSA, thisJoinPoint.service.publicKey);
 6       response = proceed();
 7       return decrypt(response, RSA, thisJoinPoint.process.privateKey);
 8     }
 9   }
10 }
11 Closure ecp = {
12   process(name:"EasyCreditProcess") { /* code from Listing 7.1 */ }
13 }
14
15 // Runtime setup
16 def pI = new ProcessInterpreter();
17 def secI = new SecurityInterpreter();
18 def ppI = new ProcessPointcutInterpreter();
19 def ccc = new CCCombiner([pI, secI, ppI], [ca, ecp]); // cf. Figure 7.3
20
21 // Execution of EDSL programs
22 ca();  // deploying the aspect               // cf. Figure 7.3
23 ecp(); // executing the process with aspects // cf. Figure 7.5/7.6

Listing 7.3: Using crosscutting composition

In lines 16 to 18, the EDSL interpreters – including the interpreter for process pointcuts – are instantiated. After that, all interpreters and the DSAL programs are passed to the newly created CCCombiner. As in the case of the BlackBoxCombiner, the CCCombiner sets itself as the bodyDelegate of the interpreters (pI, secI, ppI). Further, the CCCombiner sets itself as the delegate of the closures (ca, ecp) wrapping the DSL programs. Next, the closure ca is called, which deploys the aspect; i.e., external service calls are now intercepted and the around advice is executed. This finishes the setup of the runtime environment. Finally, the ecp closure is called to execute the (secured) workflow.

In general, the order in which EDSL programs are executed is important. For example, if the ecp closure is called before the ca closure, then no en-/decryption takes place. Though the aspect's closure is registered with the CCCombiner, the aspect is not yet deployed. Hence, the


[Figure 7.2 (diagram, reproduced as text): the CCCombiner at the center offers aspect(...), before(...), after(...), around(...), proceed(...), declareJoinPoint(...), invokeMethod(...), and setMetaClass(...). It acts as meta-object of the ProcessInterpreter (with process(...), task(...)), of the SecurityInterpreter (with encrypt(...)), and of the ProcessPointcutInterpreter, a Pointcut-DSL interpreter with process_execution(...), task_execution(...), service_selection(...), service_call(...), and if_pcd(...). The ProcessPointcutInterpreter, an extension of the PointcutInterpreter framework support for crosscutting composition, quantifies over the join points of the DS-JPM of ProcessDSL.]

Figure 7.2.: Crosscutting composition of EDSLs

program first has to call the closures of the aspects before the program executes the "main" program – unless the developer wants to deploy aspects dynamically.

7.4. Homogeneous Embedding of Aspect-Oriented Concepts

The reflective embedding architecture embeds AO abstractions in the same way as any DSL abstractions. As all concepts are homogeneously embedded, semantic interactions between embedded DSL compositions can be defined, thereby reusing the existing embedded DSL components. For this, developers use POPART/REA's CCCombiner, which implements crosscutting composition and has been presented in Section 4.2.3.3 (p. 109). For crosscutting composition of EDSLs, instances of this class are the object delegates of all closures that use abstractions from invasively composed EDSLs.

The CCCombiner (center of Figure 7.2) refers to the interpreters of the EDSLs to be invasively composed (ProcessInterpreter and SecurityInterpreter in Figure 7.2) and to a DS-PCL interpreter1 (ProcessPointcutInterpreter in Figure 7.2). These interpreters are passed to it at instantiation time (cf. line 19 in Listing 7.3).

PointcutInterpreter implements a general-purpose pointcut language supporting the composition operators "&" and "|" and the higher-order pointcuts if_pcd, not, and cflow, similar to AspectJ [Asp] pointcut designators. DS-PCLs are implemented in subclasses of PointcutInterpreter. For instance, ProcessPointcutInterpreter, referred to by the CCCombiner in Figure 7.2 and shown in excerpt in Listing 7.4, extends PointcutInterpreter with methods for PROCESSDSL-specific pointcut expressions (lines 24 to 27 in Listing 7.4).

1 The combiner in Figure 7.2 references a single pointcut language interpreter, but in general it may reference a BlackBoxCombiner that composes more than one pointcut language.


 1 class ProcessPointcutInterpreter extends PointcutInterpreter {
 2
 3   void setMetaClass(CCCombiner ccc) {
 4     super.setMetaClass(ccc);
 5     ccc.declareJoinPoint("ServiceProxy", "call",
 6       { appContext, jpBody ->
 7         HashMap jpContext = [:];
 8         jpContext["service"]   = appContext.receiver;
 9         jpContext["operation"] = appContext.args[0];
10         jpContext["request"]   = appContext.args[1];
11         jpContext["external"]  = appContext.receiver.isExternal();
12         jpContext["process"]   = ccc.thisProcess; // pretended property
13         jpContext["proceedClosure"] = jpBody;
14         def jp = new ServiceCallJP(jpContext);
15         jpContext["thisJoinPoint"] = jp;
16         return jp;
17       }
18     );
19     ccc.declareJoinPoint("Process", "execute") {...}
20     ccc.declareJoinPoint("Task", "execute") {...}
21     ccc.declareJoinPoint("Registry", "find") {...}
22   }
23
24   Pointcut process_execution(String nameRegExp) {...}
25   Pointcut task_execution(String nameRegExp) {...}
26   Pointcut service_selection(String category) {...}
27   Pointcut service_call(String operationRegExp) {...}
28 }

Listing 7.4: Excerpt of ProcessPointcutInterpreter

Behind a PointcutInterpreter there are classes that model join points and pointcuts. Join points are modeled as objects that keep information about the type and context of an execution point. For example, the JPM for PROCESSDSL is modeled by the classes ProcessExecJP, TaskExecJP, ServiceSelectJP, and ServiceCallJP. Pointcuts are objects that implement a filter method which, given a join point object as a parameter, returns true if the join point matches the pointcut and false otherwise. The result of evaluating the pointcut expression “service_call(".*") & if_pcd{external}” by a ProcessPointcutInterpreter is a graph of Pointcut objects.
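Such a pointcut-object graph can be sketched as composable filters over join points. The following is an illustrative Java sketch, not POPART's actual Pointcut classes; only service_call, if_pcd, and "&" are taken from the text, and join points are simplified to maps:

```java
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Pointcuts as composable filters over join point objects.
interface Pointcut {
    boolean match(Map<String, Object> jp);
    // "&" in the DS-PCL corresponds to conjunction of two filters.
    default Pointcut and(Pointcut other) {
        return jp -> this.match(jp) && other.match(jp);
    }
}

class Main {
    // service_call(regex): matches service-call join points whose operation fits.
    static Pointcut serviceCall(String opRegExp) {
        Pattern p = Pattern.compile(opRegExp);
        return jp -> "service_call".equals(jp.get("kind"))
                  && p.matcher((String) jp.get("operation")).matches();
    }

    // if_pcd(pred): a higher-order pointcut testing a condition on the context.
    static Pointcut ifPcd(Predicate<Map<String, Object>> pred) {
        return pred::test;
    }

    public static void main(String[] args) {
        // Corresponds to: service_call(".*") & if_pcd { external }
        Pointcut pc = serviceCall(".*")
            .and(ifPcd(jp -> Boolean.TRUE.equals(jp.get("external"))));
        Map<String, Object> jp = Map.of(
            "kind", "service_call", "operation", "getRate", "external", true);
        System.out.println(pc.match(jp)); // prints: true
    }
}
```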

As already mentioned, a CCCombiner is attached as the delegate to closures enclosing code that uses invasively composed EDSLs. For instance, ccc is set to be the delegate of all closures in Listing 7.3 (lines 2 and 11). A CCCombiner delegate dispatches all process-specific and security-specific abstractions for interpretation to the respective interpreters. Yet, the latter pass control back to the CCCombiner every time the body of a domain-specific abstraction operator is entered, since the CCCombiner is uniformly set to be the delegate for all such bodies. Pointcut expressions are dispatched to the PointcutInterpreter, which returns pointcut object graphs. Finally, keywords such as aspect, around, proceed, etc. are interpreted by the CCCombiner itself.


Aspects are modeled as containers of pointcut-to-advice pairs; advice bodies are modeled by closures.

 1 class CCCombiner extends Interpreter {
 2
 3   List aspects = [];
 4   ...
 5   void aspect(HashMap params, Closure aspectBody) {...}
 6   void before(Pointcut pc, Closure advice) {...}
 7   void around(Pointcut pc, Closure advice) {...}
 8   void after(Pointcut pc, Closure advice) {...}
 9   Object proceed() {...}
10
11   Map patternToReifier = [:];
12
13   void declareJoinPoint(String className,
14       String methodName, Closure jpReifier) {
15     // Associate className and methodName
16     // pattern with jpReifier.
17     // Association is stored in patternToReifier.
18   }
19
20   Object invokeMethod(sender, receiver, mName, args, ...) {
21     // Basically, the join point is reified and all
22     // pointcuts are evaluated. If a pointcut matches,
23     // then the associated advice is executed.
24     // (cf. Figure 7.6 for details.)
25   }
26 }

Listing 7.5: The CCCombiner class
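The behavior sketched in the comments of declareJoinPoint and invokeMethod could, in a much simplified form, look as follows. This is a hypothetical Java sketch, not the actual CCCombiner: join points are modeled as maps, reifiers and advice bodies as functions, and aspects as pointcut-advice pairs:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Simplified dispatch core of a crosscutting combiner: intercepted calls are
// looked up in patternToReifier, reified into join point maps, and matched
// against the registered pointcut-advice pairs.
class Main {
    record Advice(Predicate<Map<String, Object>> pointcut,
                  Function<Map<String, Object>, Object> body) {}

    static final Map<String, Function<Object[], Map<String, Object>>> patternToReifier = new HashMap<>();
    static final List<Advice> aspects = new ArrayList<>();

    static void declareJoinPoint(String className, String methodName,
                                 Function<Object[], Map<String, Object>> jpReifier) {
        patternToReifier.put(className + "." + methodName, jpReifier);
    }

    static Object invokeMethod(String className, String methodName, Object[] args,
                               Function<Object[], Object> proceed) {
        var reifier = patternToReifier.get(className + "." + methodName);
        if (reifier == null) return proceed.apply(args); // not a declared join point
        Map<String, Object> jp = reifier.apply(args);    // reify the join point
        jp.put("proceedClosure", proceed);
        for (Advice a : aspects)
            if (a.pointcut().test(jp)) return a.body().apply(jp); // run around advice
        return proceed.apply(args);
    }

    public static void main(String[] args) {
        declareJoinPoint("ServiceProxy", "call", callArgs -> {
            Map<String, Object> jp = new HashMap<>();
            jp.put("operation", callArgs[0]);
            return jp;
        });
        aspects.add(new Advice(
            jp -> "getRate".equals(jp.get("operation")),
            jp -> "encrypted(" + jp.get("operation") + ")"));
        Object result = invokeMethod("ServiceProxy", "call",
            new Object[]{"getRate"}, a -> "plain(" + a[0] + ")");
        System.out.println(result); // prints: encrypted(getRate)
    }
}
```

Calls whose signature matches no declared join point simply proceed; matching calls are replaced by the advice result, mirroring around-advice semantics.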

 1 public class SomeInterpreter extends Interpreter {
 2   ...
 3   public void setMetaClass(MetaClass mc) {
 4     super.setMetaClass(mc);
 5     /* Concrete interpreters are responsible for
 6        setting the meta-object of all their domain classes:
 7        for (domainClass : Class) {
 8          domainClass.setMetaClass(mc);
 9        }
10     */
11   }
12 }

Listing 7.6: Implementation of setMetaClass

Now, the question remains to be answered how join point objects come into being. Reflective embedding plays a key role in reifying DS-JPMs. To enable the CCCombiner to reify join points

it is no longer sufficient to just set it as the object delegate as outlined above. Additionally, it is imperative that the CCCombiner is also set as the meta-object of all domain classes of all composed interpreters. To set itself as the meta-object of all domain classes, the CCCombiner calls the method setMetaClass on the facade class of each EDSL, i.e., the class which implements the Interpreter interface. This method then has the responsibility to call setMetaClass on all domain classes (cf. Listing 7.6). After that, the CCCombiner is the meta-object of the EDSL interpreters it composes and can intercept every method call to them.

This enables the CCCombiner to reify join points, which are made known to it by calling declareJoinPoint. This method takes three parameters. The first two parameters define a pattern for the signature of an interpreter method whose calls are join points. The third parameter is a so-called reification closure2 – a closure responsible for reifying the corresponding join point, i.e., creating the corresponding join point object, filling in its context, and returning it. The association of join point patterns to their reification closures is stored in a map (Listing 7.5, line 11). Since a pointcut language interpreter is implemented w.r.t. a specific DS-JPM, it is its responsibility to call declareJoinPoint with appropriate parameters. The CCCombiner is oblivious of any DS-JPM.

For illustration, consider how a DS-JPM is defined for PROCESSDSL. The inherited method setMetaClass is overridden in ProcessPointcutInterpreter in Listing 7.4 (lines 3 to 22) to make the process-specific JPM known to ccc (lines 5–21). For instance, the code in lines 5–17 declares an invocation of ServiceProxy.call during a process interpretation as a join point and shows its reification closure3. This closure will be executed whenever a join point of this type occurs (i.e., the corresponding call is intercepted by ccc). In lines 7–13, the closure stores the application context, which it receives as its first parameter appContext, into the jpContext HashMap4. The "proceedClosure" entry in that map stores the second parameter of the reification closure (jpBody), itself a closure encapsulating the intercepted method call. Finally, the reification closure creates a join point instance of type ServiceCallJP (line 14) and returns it.

So far, this section has presented how to declare a DS-JPM. In the following, it is shown how, in REA, the embedding of AO concepts creates join points at runtime when programs execute under the impact of a crosscutting concern. When the interpreters to be composed execute under the control of a CCCombiner, whenever a relevant point in their execution is reached, it is dispatched to the enclosing CCCombiner by calling invokeMethod(sender, receiver, methodName, args, ...) on the latter. As a result, the matching patterns are looked up and the reifier closure is retrieved and called. Next, the reified join point is matched against all pointcuts of all defined aspects and advice is eventually executed, hence dynamically composing aspects. We will elaborate on this composition in the next subsection.

2 For brevity in source code, the subsequent text uses the acronym reifier.
3 The last parameter to the declareJoinPoint call in line 5 is the closure that follows in lines 6–17.
4 Note that the thisProcess property (line 12) is a pretended property of the CCCombiner that is actually provided by ProcessInterpreter; the access to this property is delegated to the latter via the MOP.


7.4.1. Crosscutting Composition at Work

To illustrate crosscutting composition support, Figure 7.3 “emulates” the execution of the code of Listing 7.3, starting with line 16.

[Figure 7.3 (sequence diagram, reproduced as text): step 1.0: create([pI,secI,ppI],[ca,ecp]) constructs ccc:CCCombiner; steps 1.1, 1.2, 1.3: ccc calls setMetaClass(ccc) on pI:ProcessInterpreter, secI:SecurityInterpreter, and ppI:ProcessPointcutInterpreter; sub-step 1.3.1: ppI calls declareJoinPoint(class:String, method:String, jpReifier:Closure) on ccc; steps 1.4 and 1.5: ccc calls setDelegate(ccc) on the closures ca and ecp.]

Figure 7.3.: Setup of the CCCombiner

To set up the CCCombiner, the interpreters for PROCESSDSL, the security EDSL, and the DS-PCL are passed to the CCCombiner when it is created. Additionally, the EDSL programs' closures (ca, ecp) are also passed to the CCCombiner (line 19 in Listing 7.3; step 1.0 in Figure 7.3). As a result, ccc becomes the meta-object of all three interpreters (steps 1.1, 1.2, and 1.3 in Figure 7.3). As a sub-step of setting ccc to be the meta-object of the process pointcut interpreter (ppI), the corresponding DS-JPM is declared by calls to ccc.declareJoinPoint (line 5 in Listing 7.4; step 1.3.1 in Figure 7.3). Moreover, ccc is set as the delegate of the closures ca and ecp (steps 1.4 and 1.5 in Figure 7.3)5.

5 After creation of the CCCombiner, new closures can be added using its addClosure method.

[Figure 7.4 (sequence diagram, reproduced as text; legend: plain arrows are standard method calls, dashed arrows are calls using the MOP): step 2.0: "aspect"(params:Map, asBody:Closure) reaches ccc:CCCombiner, which 2.1: creates a:Aspect, 2.2: calls setDelegate(ccc) on asBody, and 2.3: calls asBody. During 2.3, the pointcut expressions are dispatched via 2.3.1: invokeMethod("service_call", regEx), 2.3.2: invokeMethod("if_pcd", ifBody), and 2.3.3: invokeMethod("&", sc_pcd, if_pcd) on ccc, and forwarded (2.3.1.1, 2.3.2.1, 2.3.3.1) to ppI:ProcessPointcutInterpreter, which creates the Pointcut objects sc_pcd (2.3.1.1.1), if_pcd (2.3.2.1.1), and and_pcd (2.3.3.1.1). Step 2.3.4: invokeMethod("around", and_pcd, advBody) leads to 2.3.4.1: setDelegate(ccc) on advBody and 2.3.4.2: addAdvice(and_pcd, advBody) on the aspect.]

Figure 7.4.: Evaluation of the ca closure and aspect registration


Next, in line 22 of Listing 7.3, the aspect is initialized by calling the closure containing its definition. When encountering the keyword aspect in line 3 of Listing 7.3, the MOP maps it to a call to the enclosing delegate, ccc. This triggers the creation of aspect a (step 2.1 in Figure 7.4) and the evaluation of its body – a Groovy closure whose execution is triggered in step 2.3, after its delegate has previously been set to ccc in step 2.2.

In the process of performing step 2.3, pointcut expressions encountered during the evaluation of the pointcut in line 4 of Listing 7.3 are mapped to corresponding denotations (Pointcut objects) in steps 2.3.1, 2.3.2, and 2.3.3, as shown in Figure 7.4. This happens in two stages. First, for each domain-specific pointcut expression in the program code, the Groovy MOP triggers invokeMethod on ccc, the delegate of the aspect closure (steps 2.3.1, 2.3.2, 2.3.3). The latter dispatches the pointcut expressions at hand to corresponding methods of the pointcut language interpreter (steps 2.3.1.1, 2.3.2.1, 2.3.3.1). At the end of step 2.3.3.1.1, a graph of pointcut objects, and_pcd, is created, together with the advice body created as the result of steps 2.3.4 and 2.3.4.16. In step 2.3.4.2 (Figure 7.4), this pair is added to the list of pointcut-advice pairs of the aspect a. This completes the setup of the interpretation infrastructure for the Confidentiality aspect.

[Figure 7.5 (sequence diagram, reproduced as text): calling ecp:Closure triggers 3.0: "process"(params, processBody:Closure) on ccc:CCCombiner, which forwards it (3) to pI:ProcessInterpreter; 3.1: thisProcess:Process is created from params, 3.2: ccc is set as the delegate of processBody, and 3.3: processBody is called (with 3.3.0: create(params)). Each "task"(params, taskBody) inside the body is dispatched via ccc (3.3.1) to pI (3.3.1.1), which sets ccc as the delegate of taskBody (3.3.1.1.1) and adds the task to the process (3.3.1.1.2: addTask(params, taskBody)). Finally, 3.4: execute() runs the process.]

Figure 7.5.: Evaluation of the ecp closure

Setting up the runtime for interpreting the process definition part of the program is similar. This is triggered when the keyword process is encountered in line 12 of Listing 7.3. This setup is represented by the steps 3.0 and 3.1 that are elaborated in Figure 7.5. When the process is initialized, denotations for all process-specific abstractions in the program code, such as process (Figure 7.5, steps 3, 3.2, 3.3) and task (step 3.3.0, steps 3.3.1, 3.3.1.1, 3.3.1.1.1, 3.3.1.1.2), are created and related in an object graph. Again, ccc is set to be the delegate of both the process

6 These steps are triggered when interpreting the keyword around.


(step 3.2) and task bodies (step 3.3.1.1.1). This is important to ensure that the execution of the process semantics is intercepted by ccc such that join points are reified as they occur and matching aspects are composed at runtime.

[Figure 7.6 (sequence diagram, reproduced as text; continuation of step 3.4 in Figure 7.5): the interpretation of bank.call("getRate", request) on bank:ServiceProxy (step 0: "call"(op:String, request:Request)) is intercepted by ccc:CCCombiner. Step 1: selectJPReifier(class:String, method:String):Closure retrieves the reifier; step 2: the jpReifier closure is called, which creates the context (2.1), exposes the receiver and arguments in jpContext:Map (2.2), and creates jp:ServiceCallJP (2.3). The aspect a:Aspect is then asked for matching advice: 3: receiveBefore(jp):List, 4: receiveAround(jp):List(advBody), and 6: receiveAfter(jp):List. Matching (4.1) walks the pointcut graph: and_pcd matches jp by matching sc_pcd (4.1.1) and if_pcd (4.1.2), the latter calling the ifBody closure (4.1.2.1), which evaluates "external" (4.1.2.1.1). The matching around advice body advBody:Closure is called (5); its expressions request (5.1, fetched from jpContext via get("request")), RSA (5.2), thisJoinPoint (5.3), and encrypt (5.4, dispatched to the security DSL) are resolved via ccc, and so on (5.5, ...).]

Figure 7.6.: Dynamic crosscutting composition of EDSL semantics

Runtime crosscutting composition of EDSL semantics is illustrated in Figure 7.6. The figure shows a snapshot of executing the EasyCreditProcess taken at the point of executing a getRate call on the bank object, which is of type ServiceProxy. The call keyword is delegated for interpretation to the process interpreter and hence intercepted by ccc, the meta-object of the interpreter. This is where the interactions shown in Figure 7.6 start.

In step 1 (Figure 7.6), ccc checks whether the intercepted call to method ServiceProxy.call is a declared join point and retrieves the corresponding reification closure, which it subsequently calls (step 2). Consequently, a join point object, jp, is created and its context is set in steps 2.1, 2.2, and 2.3. In steps 3, 4, and 6 all defined aspects, in this case aspect a, are "asked" whether they match the join point at hand with before, around, and after semantics, respectively. In the given scenario – the interpretation of bank.call – the current join point is an external service call that is matched by the pointcut of the confidentiality aspect's around advice (line 4 in Listing 7.3). Matching the pointcut is depicted by step 4.1 and its subordinate steps: the pointcut objects in the pointcut-advice associations of a are matched against the intercepted join point. When a pointcut matches, the body of the corresponding advice (advBody) is returned and subsequently called (step 5)7. In the process of executing the advice body closure, abstractions from both EDSLs are dispatched for evaluation to ccc, which delegates to the appropriate interpreters.

7 In general, it is possible that multiple pointcuts match and, hence, a list of advice bodies, which are closures, is returned.

7.5. Summary

This chapter presented how the reflective embedding architecture supports crosscutting composition of DSLs. The approach was to embed domain-specific aspect languages on top of meta-object protocols and aspect-oriented concepts as the underlying technologies. Crosscutting composition enables better modularity of programs that use several EDSLs whose domain semantics are crosscutting. To recapitulate, the key mechanisms that facilitate crosscutting compositions are:

Crosscutting combiner: The reflective architecture can build abstractions on top of the meta-object protocol to realize aspects. Compared to using a MOP for adaptation, the aspect-oriented abstractions provide additional guarantees for the composition of embedded objects.

Domain-specific join points: The embedding of aspect-oriented concepts provides special means for domain-specific join points, as requested by [RMH+06, CNF+08, RMWG09]. Domain-specific join point models and domain-specific pointcut languages define an abstraction layer over the interpretation of the embedded DSLs to compose.

With respect to crosscutting composition, several issues remain to be shown:

• This chapter has presented the crosscutting composition of two DSLs, but how can multiple languages be integrated – such as a DSL and a GPL, or more than two DSLs? Section 10.3.2.2 (p. 228) presents how to use this chapter's techniques to compose several languages with different domain-specific join point models.

• When multiple languages are integrated, there can be composition conflicts. Therefore, Section 10.3.2.3 (p. 235) presents a general mechanism to detect and resolve static composition conflicts between several aspect-oriented languages.

• Performance measurements are discussed in Section 11.3.2 (p. 276).

Note that this chapter has summarized how to use POPART for crosscutting composition and its most important implementation details. However, POPART supports not only crosscutting composition of domain-specific languages; its open implementation of the POPART aspect language can be used as a language workbench for general-purpose aspect languages as well. Due to space restrictions, the description of the AO language workbench is out of the scope of this thesis; interested readers are referred to [DMB09, DEM10].

8. Incremental Concrete Syntax for Embedded Languages

The previous chapters have presented various applications of reflective embedding for open DSLs, aspect languages, and compositions thereof. However, since in those language embeddings program expressions have an abstract syntax encoding in the host language, they all suffer from the missing support for concrete syntax. To address this, this chapter presents an extension to the reflective embedding architecture that enables support for concrete syntax in EDSLs implemented on top of the architecture.

The problem of missing support for concrete syntax is common in homogeneous embedding approaches [Hud96, Fow05, Gar08, HORM08]. While, on the one hand, embedded DSLs enable an easy integration into their host language, there is the downside that each homogeneous EDSL commits to one particular host language. Therefore, it is not possible to integrate embedded DSL programs into languages other than the host language. Thus, when the same DSL is desired in several GPLs, language developers have to repetitively embed the DSL as a homogeneous EDSL in each GPL, resulting in duplicated investment costs. These issues are major obstacles for the adoption of embedded DSL approaches [MHS05, Tra08].

At the cost of dropping homogeneity, heterogeneous embedding approaches address the problems of how to enable the DSL to use the domain's established notation/syntax and how to integrate DSL code with existing code and enable interactions. Concrete syntax is enabled by a specialized tool – in general some kind of pre-processor. It first rewrites code in concrete DSL syntax into executable code in some GPL. After that, the compiler of the GPL compiles the code as usual. A well-known example of a heterogeneous embedded DSL is SQLJ [ME00], which enables the embedding of SQL code into Java code.
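The essence of such a pre-processor can be illustrated with a toy rewriter. This is an illustrative Java sketch, not SQLJ: it rewrites an embedded SELECT statement into a hypothetical selectFrom call and leaves all other code untouched:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A toy pre-processor: rewrites an embedded "SELECT cols FROM tables"
// statement into an executable GPL method call, leaving the rest as-is.
class Main {
    static final Pattern SELECT =
        Pattern.compile("SELECT\\s+([\\w,]+)\\s+FROM\\s+([\\w,]+)");

    static String preprocess(String source) {
        Matcher m = SELECT.matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String cols   = "\"" + m.group(1).replace(",", "\",\"") + "\"";
            String tables = "\"" + m.group(2).replace(",", "\",\"") + "\"";
            m.appendReplacement(out,
                "selectFrom(new String[]{" + cols + "}, new String[]{" + tables + "})");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "List result = SELECT Name,Age FROM People;";
        System.out.println(preprocess(in));
        // prints: List result = selectFrom(new String[]{"Name","Age"}, new String[]{"People"});
    }
}
```

After this rewriting step, the GPL compiler can process the output as ordinary host-language code.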
While the example of embedded SQL uses a special pre-processor, there are generic heterogeneous embedding approaches that help language developers to implement pre-processors and code generators [Kam98, EFDM03, BV04, SCK04, COST04].

When we need both concrete syntax and integrability into different languages, there are trade-offs between homogeneous and heterogeneous approaches. The main advantage of heterogeneous embedded DSLs is that they support the domain's established syntax (called concrete syntax in the following). This is particularly important as comparative studies [KLP+08] have revealed that missing support for concrete syntax significantly lowers the end user's productivity in writing DSL code. The main obstacles of heterogeneous embedded DSLs are threefold. First, to support another host language, the pre-processor typically has to be re-implemented from scratch. Second, pre-processors are known to be hard to compose. This makes it practically impossible

to use multiple embedded DSLs in parallel. Third, the implementation of a DSL requires a formalization of its complete syntax, which is a significant effort. The problems of heterogeneous embedded DSLs are partially solved by homogeneous embedded DSLs since these are developed in the host language. In the latter case, the EDSL's implementation is just a library. Hence, a specification of the syntax and corresponding tools is not required. This helps to significantly reduce the necessary effort. Additionally, using multiple DSLs in the same program is as simple as using multiple libraries. But, by having to reuse the syntax of the host language, the DSL's concrete syntax is typically not close to the concrete syntax of the domain [BV04, Tra08]. This severely hampers the comprehensibility of such DSLs, as previously discussed.

To provide concrete syntax for DSL programs – without suffering from some of the main problems of traditional heterogeneous embedded DSLs – several approaches were proposed that use meta-programming [BV04, Tra08, KM09, RGN10]. These approaches generally suffer from the following issues. First, language developers still have to provide the complete formal syntax for each embedded DSL [BV04, Tra08, RGN10]. Second, if more than one host language is supported [BV04], the grammar of the complete syntax must be available. This specification has to be in the format used by the approach, which is generally not the case. Third, most approaches [Tra08, KM09, RGN10] support only one particular host language, which hinders the reusability of EDSL implementations. Overall, the effort necessary to develop an EDSL using these approaches is significantly higher than the effort to develop a homogeneous embedded DSL.
To address these issues, this chapter presents a generic pre-processor approach to implement embedded DSLs (EDSLs) that combines the advantages of heterogeneous embedded DSLs with those of homogeneous embedded DSLs while avoiding the respective disadvantages. That is, the pre-processor approach facilitates the development of EDSLs that provide concrete syntax, but avoids that the developer always has to specify the complete grammar of the EDSL and/or the host language. Using the pre-processor approach, the effort to implement an EDSL with concrete syntax is practically identical to the effort necessary to develop a homogeneous embedded DSL for the respective domain. Furthermore, the pre-processor approach is generic in the sense that it enables concrete syntax for any reflectively embedded DSL, it is host language independent, and it can be used to embed DSLs in various host languages.

The fundamental idea is that a language developer starts with the implementation of a homogeneous embedded DSL and then – step by step – adds meta-information regarding the concrete syntax of the domain to the DSL implementation. By using so-called island grammars, the developer only needs to specify those parts of the grammar of the host language that are relevant w.r.t. the EDSL implementation. Furthermore, only those parts of the EDSL's syntax need to be specified that are not compatible with the syntax of the host language.

Island grammars [Moo01] support parsing programs even if only a subset of the productions of the complete formal syntax of the language is specified. In general, an island grammar only completely formalizes the language constructs of interest (the islands). To match the remainder (the water), liberal productions are specified. These liberal productions, called water rules, recognize characters from expression types that are not recognized by the islands. When composing


island grammars of multiple languages, programs can be parsed that mix expressions from those languages in unexpected ways [SCD03]. Traditionally, island grammars are used in reverse engineering, because for legacy languages often no formal grammar is available. In contrast, the motivation to use island grammars in this thesis is to avoid that language developers always have to formalize the complete grammar of the embedded DSL and the host languages. Compared to the traditional use of island grammars in the context of reverse engineering – where the goal is to avoid the formalization of an outdated language – we employ them to reduce the development costs for a new language.
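The island/water idea can be sketched with a minimal scanner. This is an illustrative Java sketch under stated assumptions, not the approach's actual parser: only SELECT statements are fully specified as islands, while a liberal water rule consumes everything in between:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Island-grammar flavored scanning: the constructs of interest (islands)
// are fully specified; a liberal water rule swallows everything else.
class Main {
    static final Pattern ISLAND =
        Pattern.compile("SELECT\\s+[\\w,]+\\s+FROM\\s+[\\w,]+");

    static List<String> scan(String src) {
        List<String> parts = new ArrayList<>();
        Matcher m = ISLAND.matcher(src);
        int pos = 0;
        while (m.find()) {
            if (m.start() > pos)                 // water rule: unparsed host code
                parts.add("WATER: " + src.substring(pos, m.start()));
            parts.add("ISLAND: " + m.group());   // fully recognized DSL construct
            pos = m.end();
        }
        if (pos < src.length()) parts.add("WATER: " + src.substring(pos));
        return parts;
    }

    public static void main(String[] args) {
        scan("List r = SELECT Name FROM People; r.each { ... }")
            .forEach(System.out::println);
    }
}
```

Only the island productions need to be formalized; the host-language remainder is matched without any grammar for it, which is precisely what spares the developer a complete syntax specification.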

Parts of this chapter have been published in the following publication:

• Tom Dinkelaker, Michael Eichberg, and Mira Mezini. Incremental Concrete Syntax for Embedded Languages. In Proceedings of the 26th ACM Symposium on Applied Computing—Technical Track on Programming Languages (PL at SAC), 2011 (to appear).

The remainder of this chapter is organized as follows. Section 8.1 discusses the concept and Section 8.2 the implementation.

8.1. Concrete Syntax for Homogeneous Embedded DSLs

This section first discusses an example implementation of a homogeneous embedded DSL. Then, it elaborates how to provide concrete syntax. After that, the inner workings of the approach are presented, before the section concludes the presentation of the approach by discussing how to support further host languages.

The running example to demonstrate the approach is the embedding of SQL in the Groovy GPL. The goal is to enable writing programs like the one shown in Listing 8.1. The select statement (line 2 in Listing 8.1) is defined using SQL, while the rest of the program is written in Groovy; both languages are seamlessly integrated. The program uses SQL's concrete syntax (cf. Listing 8.2) and prints out a list of people's names and ages from the result of an embedded SQL statement.

1 println "Registered people:"
2 List result = SELECT Name,Age FROM People
3 result.each{ Map row -> println "Name: ${row.Name} Age: ${row.Age}"; }

Listing 8.1: An embedded SQL program in Groovy

8.1.1. Defining a Homogeneous EDSL

Language Interface: As in other approaches, a new homogeneous EDSL is implemented by a set of classes in the host language. These classes have to follow a small set of conventions to enable the integration with the approach.

8. Incremental Concrete Syntax for Embedded Languages

1 Statement → Query
2 Query → "SELECT" Columns "FROM" Tables
3 Columns → Id "," Columns
4 Columns → Id
5 Tables → Id "," Tables
6 Tables → Id

Listing 8.2: The ideal grammar of TINYSQL

To embed a DSL, the developer implements an EDSL as usual in REA. To allow finding the interface in the classpath, the only additional task is that the language interface must extend the DSL marker interface. Listing 8.3 shows the interface for the homogeneous EDSL TINYSQL. The interface only defines a single method to execute select statements.

1 interface ITinySQL extends DSL {
2   List selectFrom(String[] columns, String[] tables);
3 }

Listing 8.3: The language interface of TINYSQL with compromised syntax

At this step, the developer cannot take the concrete syntax into account. Given the constraints of the host language, the syntax of the homogeneous EDSL cannot be optimal in most cases. In the following, the term contaminated syntax refers to the syntax that is provided by the homogeneous EDSL.

Defining the Semantics: To implement the execution semantics of an EDSL, a language developer implements the EDSL's language interface. Since the focus of this chapter is the support for concrete syntax, it only sketches the implementation of the execution semantics for EDSLs. Listing 8.4 presents an excerpt of the class TinySQLInterpreter that implements the semantics of TINYSQL. An EDSL class extends the Interpreter class and implements the EDSL's language interface. The Interpreter class primarily defines the method to evaluate EDSL programs. Additionally, methods are defined to extract productions from EDSLs and to compose the EDSL with other EDSLs. For TINYSQL, the class implements ITinySQL. It has an attribute to hold an SQL connection and implements the selectFrom method. The method first checks that all arguments are correct, then it constructs an SQL statement and uses Groovy SQL1 to execute the statement over a JDBC connection. The method rows in line 7 finally returns the result set as a List that contains Map objects as rows.

8.1.2. Enabling Concrete Syntax for EDSLs

This section elaborates on how language developers can add support for concrete syntax for their homogeneous embedded DSLs.

1Groovy SQL: http://docs.codehaus.org/display/GROOVY/Tutorial+6+-+Groovy+SQL.


1 class TinySQLInterpreter extends Interpreter implements ITinySQL {
2   private groovy.sql.Sql sqlConnection = groovy.sql.Sql.newInstance("jdbc:...");
3   ...
4   List selectFrom(String[] columns, String[] tables) {
5     ... // check arguments to be correct
6     String query = "SELECT " + Util.implode(",", columns) + " FROM " + Util.implode(",", tables);
7     return sqlConnection.rows(query);
8   }
9 }

Listing 8.4: The implementation of TINYSQL (semantics)

Incremental Definition of Concrete Syntax: To incrementally define concrete syntax for embedded DSLs, the language developer adds annotations to the EDSL implementation classes. The annotation @DSLMethod is used to define the concrete syntax for every method that defines an expression type. The annotation has the element production that defines the right-hand side of the production rule for this method. The production rule is given as a string that defines the categories of the production rule. In the string, the categories are delimited by the whitespace escape character, the underscore: one underscore denotes a required whitespace, and two underscores denote an optional whitespace. Terminal categories simply use their ASCII representation. Non-terminal categories are defined by argument placeholders, which are prefixed by the parameter escape character (by default, "p") followed by the index of the argument in which the non-terminal will be passed to the method.

1 interface ITinySQL extends DSL {
2   @DSLMethod(production="SELECT__p0__FROM__p1")
3   List selectFrom(String[] columns, String[] tables);
4 }

Listing 8.5: The language interface of TINYSQL with concrete syntax
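The convention for production strings can be illustrated with a small sketch in plain Java. It splits a @DSLMethod production string such as "SELECT__p0__FROM__p1" into its category sequence; the class and method names (ProductionParser, categories, isParameter) are illustrative stand-ins, not TIGERSEYE's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splitting a @DSLMethod production string into its
// grammar categories, following the underscore conventions described above.
public class ProductionParser {

    // Splits e.g. "SELECT__p0__FROM__p1" into [SELECT, p0, FROM, p1];
    // single and double underscores both act as category delimiters
    // (required vs. optional whitespace).
    public static List<String> categories(String production) {
        List<String> result = new ArrayList<>();
        for (String part : production.split("_+")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    // Non-terminal categories are argument placeholders:
    // the parameter escape character "p" followed by an argument index.
    public static boolean isParameter(String category) {
        return category.matches("p\\d+");
    }

    public static void main(String[] args) {
        System.out.println(categories("SELECT__p0__FROM__p1")); // [SELECT, p0, FROM, p1]
    }
}
```

In this reading, "SELECT" and "FROM" are terminal categories, while "p0" and "p1" refer to the first and second method arguments.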

For example, Listing 8.5 shows the definition of the TINYSQL interface that defines concrete syntax. The annotation in line 2 defines a production (Statement → "SELECT" ? String[] ? "FROM" ? String[]) with four relevant categories on the right-hand side: "SELECT", String[], "FROM", and String[]. By convention, the left-hand side category of the production is Statement, which is expected to be one of the host language's categories and which identifies the places where the DSL's expressions can be embedded (cf. lines 2 and 3 in Listing 8.8).

Quotation: Given that the approach builds upon homogeneous EDSLs, the user generally does not have to quote DSL expressions. However, there are cases where a language developer may require controlling the integration between an EDSL and a host language.


One common reason is to precisely define the scope of an EDSL expression. For example, in Listing 8.6, in line 2, the EDSL expression is quoted using #sql{...}2.

1 println "Registered people:"
2 Map[] result = #sql{SELECT Name,Age FROM People}
3 result.each{ Map row -> println "Name: ${row.Name} Age: ${row.Age}"; }

Listing 8.6: An embedded SQL program in Groovy with quotation

In TIGERSEYE, quotations are only necessary in case of EDSL keywords that could be misinterpreted or in case a language developer wants to restrict the scope of identifiers. Quotations are directly supported and do not require special support. It is sufficient to define another @DSLMethod annotation that defines a production for the quotation (cf. Listing 8.7, line 3).

1 interface ITinySQLWithQuotation extends ITinySQL {
2
3   @DSLMethod(production="#sql{__p0__}")
4   Map[] quoteSqlStmt(List map);
5 }

Listing 8.7: The interface of TINYSQL with quotation

8.1.3. Compiling DSL Programs

Figure 8.1 gives an overview of the compilation process that transforms programs in concrete syntax into host syntax. This process is explained in the subsequent paragraphs.


Figure 8.1.: Overview of the compilation process

Starting the Compilation Process: The compilation process starts (Step 1) when the DSL end user saves a file in the IDE that contains expressions in concrete DSL syntax. To specify the EDSLs used in a source file, there are two options for end users. First, users can use a special

2#sql{...} is not a valid statement in the host language. When a language end user uses a quotation, the end user explicitly escapes from the host language and opens up a scope in which SQL expressions can be used.


file extension, where the last suffix of the file is dsl. E.g., the file extension of file1.sql.dsl uses sql to specify that the TINYSQL DSL is used. Second, users can add an annotation @EDSL to their code to specify the languages used in the respective code. The elements of the @EDSL annotation determine the concrete EDSLs. E.g., the annotation @EDSL(sql,expr) specifies that the sql and expr DSLs are used.

Extracting Grammars using Reflection: For every EDSL, the pre-processor looks up the EDSL interface (Step 2 a) from the configuration. Next (Step 2 b), the pre-processor uses introspection [Mae87] to look up all declared methods in the EDSL's interface. From the interface annotation, the pre-processor looks up the host language of the EDSL and loads the host grammar (Step 2 c). From meta-data provided in the method signatures and their annotations, the pre-processor extracts the EDSL grammar (Step 2 d).

Creating a Combined Grammar: Next (Step 3), the pre-processor tries to create the composite grammar by combining the host grammar with the grammars of all specified EDSLs. Before combining the grammars, the grammars are checked for possible conflicts. Currently, the concept makes use of union grammars [Cor06] to build the union of all productions defined for the host language and all EDSLs. For example, the combined grammar for parsing TINYSQL embedded in Groovy contains all productions defined in Listing 8.2 and Listing 8.8. As long as two EDSLs do not define two productions that have exactly the same terminals and non-terminals, the pre-processor can distinguish them and will call the correct method of the homogeneous embedded DSL.

Parsing into a Forest of ASTs: When parsing (Step 4) the source code, the Earley parser creates a forest of all possible ASTs of the DSL program using the combined grammar.
In the forest, every AST node is associated with the list of characters from the input stream and the corresponding production that has recognized the characters. Note that this forest contains only valid ASTs, but due to ambiguities (in particular in the presence of island grammars) there are several possible ASTs under which the DSL file can be recognized by the parser.

Selecting one AST in Concrete Syntax: The pre-processor uses disambiguation filtering [vdBSVV02] to select the AST with the well-defined execution semantics. Therefore, the pre-processor filters the forest of ASTs (Step 5) and selects only one from it (Step 5 b). To find the right AST, the pre-processor takes both the syntactic and the semantic information into account that is defined in the EDSL interface. The filtering always prefers AST nodes associated with detailed productions over AST nodes with water productions. Further, the pre-processor performs semantic checks to eliminate ASTs from the forest that are invalid. For every sub-expression passed to another expression, the filter determines the corresponding production and method in the EDSL interface via introspection (Step 5 a). This allows the filter to take into account the return type of the method that defines the semantic type of the computation returned by this sub-expression.

Transforming the AST to Host Syntax: In the transformation step (Step 6), the pre-processor applies all transformations in the order they have been selected by the end user. The transformation rewrites every AST expression in concrete syntax into its contaminated syntax.
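The grammar-extraction step (Step 2) can be sketched with plain Java reflection. The sketch below declares local stand-ins for the @DSLMethod annotation and the TINYSQL interface; it is an illustration of the introspection idea, not TIGERSEYE's actual implementation.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of Step 2: extracting an EDSL grammar from a language interface
// via introspection. The nested annotation and interface are local
// stand-ins for the corresponding TIGERSEYE types.
public class GrammarExtractor {

    @Retention(RetentionPolicy.RUNTIME)
    @interface DSLMethod {
        String production();
    }

    interface ITinySQL {
        @DSLMethod(production = "SELECT__p0__FROM__p1")
        List<Map<String, Object>> selectFrom(String[] columns, String[] tables);
    }

    // Maps each annotated method to the right-hand side of its production rule.
    public static Map<String, String> extractProductions(Class<?> dslInterface) {
        Map<String, String> productions = new LinkedHashMap<>();
        for (Method m : dslInterface.getDeclaredMethods()) {
            DSLMethod ann = m.getAnnotation(DSLMethod.class);
            if (ann != null) {
                productions.put(m.getName(), ann.production());
            }
        }
        return productions;
    }

    public static void main(String[] args) {
        System.out.println(extractProductions(ITinySQL.class));
    }
}
```

In the real pre-processor, the extracted productions would additionally be combined with the host grammar into the union grammar used for parsing (Step 3).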


Pretty Printing, Host Compilation, and Launching: After the AST has been transformed to the host language syntax (Step 7), the AST is pretty-printed into a file that only contains valid host language expressions. The pre-processor uses the unchanged host compiler to compile this file to binary form (Step 8) and launches the binary (Step 9).

8.1.4. Enabling a new Host Language

To support a new host language, a language developer has to specify the parts of the host language's syntax that are relevant when embedding DSLs. This set contains the most important productions that are needed to syntactically recognize the boundaries of basic program structures in the host language. This includes its syntax layout (white spaces and delimiters), identifiers, type definitions, brackets, variable declarations, method calls, object instantiations, block structures, and so forth. These productions are required to support embedding DSL expressions into programs written in the host language.

To ease the definition of production rules, there are special non-terminals. As discussed previously, there are the special whitespace categories (one underscore for required and two for optional whitespace). Additionally, Id is for identifiers that match the regular expression [A-Za-z0-9_+]. Further, for partial grammar definitions, developers can use the water rule $water$ that matches arbitrary characters, but only when no other production matches. For defining BNF-like custom productions, there is an API. Language developers use this API to add new production rules.

For example, in case of Groovy, the developer uses this Java API to define the productions listed in Listing 8.8. The grammar uses the water rule in line 19 to avoid having to define the complete grammar of Groovy. This allows parsing statements of the host language and of the EDSL that have not been formalized.
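A BNF-like production API of this kind can be sketched as a small fluent Java class. The Grammar and addProduction names are illustrative assumptions, not the actual TIGERSEYE API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a BNF-like production API as described above.
public class Grammar {
    private final List<String> productions = new ArrayList<>();

    // Adds a production "lhs -> rhs..." to the grammar.
    public Grammar addProduction(String lhs, String... rhs) {
        productions.add(lhs + " -> " + String.join(" ", rhs));
        return this; // fluent style, so productions can be chained
    }

    public List<String> productions() {
        return Collections.unmodifiableList(productions);
    }

    public static void main(String[] args) {
        // A few of the Groovy island-grammar rules in the style of Listing 8.8:
        Grammar groovy = new Grammar()
            .addProduction("Name", "Id", "\".\"", "Name")
            .addProduction("Name", "Id")
            .addProduction("Statement", "$water$"); // catch-all water rule
        groovy.productions().forEach(System.out::println);
    }
}
```

Representing the grammar as an ordinary object in this way is what later allows it to be inspected, extended, and composed as a first-class value.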
Compared to approaches with a complete syntax definition, the integration of the host language provides fewer guarantees, because many expressions will be parsed using the water rule. However, the language developer can always specify more productions or the complete host grammar if desired. Conversely, the partial grammar requires far fewer productions to be formalized and is sufficiently precise for most practical purposes. When developing a new embedded DSL in a host language, the developer imports the partial grammar for the EDSL.

In addition to specifying parts of the host grammar, the developer needs to provide three transformations to process ASTs of arbitrary EDSLs: The first transformation rewrites every AST node with concrete syntax to an equivalent AST node in contaminated syntax. The second transformation adds bootstrap logic into the AST that will evaluate the DSL statements. The third transformation pretty-prints the AST into host language code.

All transformations are Java classes that implement the interface ASTTransformation; they implement a special visitor pattern [GHJV95] in which the method for visiting AST nodes has access to a context object that provides context information. Each transformation has a set of assurances and requirements on the AST under transformation. Before a transformation is applied, all its requirements are checked by reading the meta-data that is attached to the AST. A transformation can add, change, and remove AST nodes and the meta-data attached to these


1  SGroovy → Program
2  Program → Statements
3  Statements → Statement ? Statements
4  Statements → Statement
5  Name → Id ? "." ? Name
6  Name → Id
7  Type → rType
8  Type → pType
9  pType → Name
10 Arguments → Arguments ? "," ? Type
11 Arguments → Type
12 Application → "(" ? Arguments ? ")"
13 Application → "(" ? ")"
14 MethodCall → Name ? Application
15 pType → MethodCall
16 ConstructorCall → "new" Name ? Application
17 pType → ConstructorCall
18 Statement → "{" ? Statements ? Statements "}"
19 Statement → $water$

Listing 8.8: Island grammar for Groovy

nodes. After a transformation is complete, the transformation stores its assurances by adding meta-data to the AST. Requirements and assurances allow the plug-in to validate the chain of transformations as defined in its configuration.

8.2. Implementation of the Generic Pre-Processor for Incremental Concrete Syntax

The proposed approach was implemented as an Eclipse plug-in called TIGERSEYE3. The generic pre-processor is a project builder that can be integrated with every project's build process. In the following, we give a brief overview of TIGERSEYE.

The developed pre-processor is host-language independent. As a proof of concept, TIGERSEYE supports two host languages: Groovy and Java. The only requirements of TIGERSEYE are that the host language compiles to the Java Virtual Machine and that it is possible to extract meta-data from an EDSL implementation, e.g., provided by means of annotations. These requirements are met by most modern languages, such as C#, Scala, and Smalltalk. Using the current version of TIGERSEYE, language developers can develop embedded DSLs (EDSLs) using Groovy or Java.

Using Java reflection [AGH05, FF04], TIGERSEYE introspects EDSLs to extract the EDSL's meta-data, such as information about the production rules and the transformations. The extracted partial grammar is made accessible as a set of first-class objects. When a program uses several EDSLs, the pre-processor first composes the EDSLs' grammars and meta-data.

3A tiger’s eye is a gemstone in which one mineral is embedded into another mineral. The plug-in’s source code will be made available after the review process.


Currently, if multiple EDSLs are used, TIGERSEYE always creates the union of the grammars of the composed languages. This produces a conflict if two or more of the EDSLs define the same production. Therefore, the current pre-processor implementation only supports conflict-free compositions. However, given that the composed languages can generally share the same keywords without generating a conflict, the current implementation suffices for most implementations. During the evaluation of TIGERSEYE, conflicts were a non-issue.

The parser implementation uses the Earley algorithm [Ear70], which can handle all types of CFGs. Instead of generating a parser from a BNF, the pre-processor's parser directly uses the grammar supplied by the language developer for the EDSL and the host language. An important advantage is that the generic parser makes the grammar available as a first-class object during the transformation process. This enables the pre-processor to extend and compose grammars of the host language and of the embedded DSLs.

The implementation of the Earley algorithm adds support for island grammars [Moo01], as outlined next. The standard Earley algorithm has three steps that are continuously repeated for every character, namely 1) scan, 2) predict, and 3) complete. To support island grammars, the scan step was extended. The new scan step first tries to accept the next character using the explicitly defined productions. Only if no production accepts the next character does the water rule recognize this character as water. Each parse step is stored as an item in a so-called Earley set in the parser's chart, in which each item is associated with the character recognized in the step. After parsing, the chart provides all information for the full forest of ASTs.

To represent ASTs and to implement term rewriting, TIGERSEYE uses ATerms [vdBK07]. ATerms unify several term formats, optimize storage through maximal subterm sharing, and allow language-neutral exchange.
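The extended scan step described above can be sketched in a few lines. The sketch radically simplifies the Earley data model (an Earley item is reduced to the terminal character it expects next); the WaterScan name and the item representation are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the extended Earley scan step: explicitly defined productions
// are tried first, and the water rule only consumes a character when no
// explicit production accepts it. In the real pre-processor, items live
// in Earley sets in the parser's chart.
public class WaterScan {

    // Returns the labels of the items advanced by scanning character c:
    // either all matching explicit items, or a single water item.
    public static List<String> scan(char c, Map<String, Character> expecting) {
        List<String> advanced = new ArrayList<>();
        for (Map.Entry<String, Character> item : expecting.entrySet()) {
            if (item.getValue() == c) {          // an explicit production accepts c
                advanced.add(item.getKey());
            }
        }
        if (advanced.isEmpty()) {                // fall back to the water rule
            advanced.add("$water$");
        }
        return advanced;
    }

    public static void main(String[] args) {
        Map<String, Character> items = new LinkedHashMap<>();
        items.put("Query -> . \"SELECT\" ...", 'S');
        System.out.println(scan('S', items)); // explicit production wins
        System.out.println(scan('x', items)); // unmatched input becomes water
    }
}
```

The essential point is the ordering: water is a fallback, so explicitly formalized islands always take precedence over the catch-all rule.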
To implement custom transformations, as extensions of the pre-processor, the language developer also has to use ATerms. The transformation steps are: 1) the pre-processor translates the Earley chart to ATerms, 2) then the pre-processor uses the ATerms' pattern matching facilities to perform transformations on the ASTs, and 3) finally the pre-processor uses the ATerm ASTs to generate the host language code.

8.3. Summary

This chapter has presented an extension to the core architecture that, by using island grammars, significantly reduces the effort necessary to enable concrete syntax for embedded DSLs. Compared to related work, the main advantage is that only parts of a DSL's concrete syntax need to be specified. This has three main implications: First, it is only necessary to specify a DSL's concrete syntax for those parts where the host language's concrete syntax does not match the domain syntax. As the evaluation will show in Section 10.4.4 (p. 240), this reduces the effort necessary for specifying the DSL's grammar. Second, the approach supports incrementally defining a DSL's concrete syntax, which helps to reduce the initial effort necessary when developing an EDSL. Third, the implementation of an embedded DSL can be, to a very large degree, host-language independent. This facilitates reuse of EDSL implementations, in particular if the host language compiles to a bytecode representation such as Java or CIL bytecode.

Part IV.

Evaluation


9. Related Work

This chapter compares related embedding approaches. Because REA is a homogeneous embedding approach, Section 9.1 discusses homogeneous approaches first. In general, heterogeneous approaches use different techniques, which are more related to non-embedded DSL approaches than to homogeneous EDSL approaches. Still, since TIGERSEYE/REA makes use of heterogeneous concepts, similarities to these approaches are discussed in Section 9.2.

9.1. Homogeneous Embedding

Homogeneous embedding approaches inherit most of their host language's features, since the embedding and its programs are seamlessly integrated with the host language, and the embedding can use the host features freely. As a consequence, when there is a special interest in reusing certain host language features for the embedding, this strongly influences the choice of the embedding approach. The literature proposes various host languages that make use of different features, namely (a) pure functional languages, (b) dynamic languages, (c) multi-stage languages, and (d) strongly-typed object-oriented languages.

If there are no hard requirements for certain language features to be available in an embedded DSL, language developers have free choice and will likely prefer an embedding approach that integrates best with the application libraries they already use. Still, in many cases, the choice of the host language features is prevalent and can even outweigh the complexity and drawbacks of a particular embedding approach. Therefore, this section first highlights special host features and the particular problems addressed by them, and finally compares the approaches to REA.

9.1.1. Functional Languages

In pure-functional host languages, there are several approaches that use different techniques and encoding styles. Still, the differences between those functional approaches are not important for a comparison with REA.
Therefore, each approach is first explained and its individual issues are discussed in place; then, all pure-functional approaches are compared together with REA.

Pure Embedding: Hudak [Hud96, Hud98] uses the pure functional host language Haskell for embedding DSLs, which he calls pure embedding. A language developer defines domain-specific constructs: domain types using algebraic data types, and domain operations using higher-order functions on these types. The major advantage of pure embedding is that language developers can rely on functional composition. To execute embedded programs, DSL programs recursively evaluate their embedded expressions, returning a typed computation. For each computation, its value is tagged with the corresponding domain type. Polymorphic functions use pattern matching on these types; therefore, functions only accept well-defined expressions. To evolve languages, developers can compose languages from modularly implemented constructs by composing monads [Mog89, Wad90, Ste94, LHJ95], if the used algebraic types are compatible. Hudak demonstrates that with his approach, developers can implement small languages, such as a mathematical language for calculations on regions. Further, Hudak implements common features found in mainstream programming languages, such as state and error handling.

Tagless Embedding: Carette et al. [CKS09] propose a pure embedding approach that addresses several problems of Hudak's pure embeddings, which they call tagless embedding. They use functional composition to build typed, embedded DSLs in the OCaml language. In contrast to using data types to tag expressions, they encode expressions in higher-order abstract syntax (HOAS) [PE88], which uses only lambda expressions. In contrast to Hudak, HOAS expressions are not tagged with type constructors. HOAS enables a homogeneous embedding of tagless program expressions. To encode execution semantics, they implement a recursive fold over the HOAS. Further, the authors discuss the transferability of their results to other typed functional host languages, such as Haskell and MetaOCaml. The main advantage of this approach is that typed expressions in the embedded language are represented as typed expressions in the host language, i.e., using the same types makes the embedding uniform. Because of this uniform type system between the embedding and the program, the host compiler can type-check and better optimize embedded programs than in the other pure embedding approaches. A key property is that the resulting type-preserving interpretations have the guarantee that compiled programs execute without type failures.
In contrast to Hudak, they use functors to bind expressions in a program to their semantics. Using functors allows them to parametrize a program with several evaluators, which allows programs to abstract over semantics, i.e., they can use different evaluators to interpret one and the same program representation under various semantics. In tagless embedding, abstracting over semantics enables analyses of programs, which they demonstrate by implementing an expression-counter analysis.

Unembedding: Atkey et al. [ALY09] address the problem that it is awkward to analyze and manipulate embedded DSL expressions encoded in HOAS. To solve this problem, they transform the HOAS of embedded expressions to de Bruijn terms, a special encoding. They call this transformation unembedding. In previous work, Atkey proved that HOAS encodings can be isomorphically mapped to de Bruijn encodings and back [Atk09]. The advantage of de Bruijn encoding over HOAS is that language developers can implement analyses more conveniently. Domain types in the embedding are defined using generalized algebraic data types (GADTs) in Haskell. They demonstrate the applicability of their approach by presenting several small embeddings, such as the untyped and typed lambda calculus, pattern matching, Boolean values, and numbers. They demonstrate a simple analysis that counts expressions in a program, and transformations between HOAS and de Bruijn terms, but domain-specific analyses and transformations are out of scope. Further, they show that their approach can enable mobile code and nested relational calculus, which permits nested queries in query languages. The downside of using GADTs is

that embeddings may suffer from exhaustive pattern matching [CKS09]. Further, they identify several problems and limitations with the current Haskell type system, which cannot prove type soundness in certain situations that lead to so-called exotic types. They address some of these problems, e.g., with type casts, which unfortunately pose an additional effort on language developers.

Comparison with REA: In general, the pure embedding approaches provide good support for black-box extensibility. Unfortunately, pure embedding mostly demonstrates extension by re-implementing general-purpose language constructs (e.g., lambda abstractions) that are already available in the host, from which it is hard to draw conclusions about their applicability for DSLs. Still, pure embeddings have the benefit that they support type-safe embeddings, and therefore, they provide implicit safety guarantees. In contrast, in REA, since the Groovy compiler does not require the developer to use types and the signatures of method calls are not type-checked, there is no implicit guarantee. For every additional guarantee, a language developer needs to implement a special analysis, e.g., the keyword analyzer that checks that a program is syntactically correct. Because with pure embedding developers create languages from closed black-box components, these components cannot be opened up and adapted by other language components. Since pure embeddings have no meta-level, they do not support semantic adaptations like REA.

Further, they support composability of independent languages, but they do not address composing languages that have syntactic or semantic interactions. Because they use functional abstractions, the language components are strongly encapsulated, which disallows composing embeddings with interactions.

None of the pure embedding approaches supports concrete syntax or adapting the host language's scoping strategies. Atkey implements a more complicated general-purpose analysis, but in contrast to REA, domain-specific analyses are out of scope. Only Atkey supports transformations, such as transforming HOAS to de Bruijn encodings and back, but none of the pure embedding approaches addresses domain-specific transformations like REA.
A rather practical limitation is that these pure embedding approaches assume the developer to be a domain expert as well as an expert in functional languages, requiring knowledge, e.g., of advanced type systems, monads, and higher-order functions. These assumptions heavily restrict the pool of people available as language developers, since only few developers in industry have both skills. In sum, because of these reasons and combinations thereof, so far there are only a few applications outside the academic community. In contrast, REA only requires the language developer to know OO programming.

To allow the same flexibility in functional host languages as REA supports, it is likely that a pure functional host language would need special extensions to open up functions, e.g., to realize aspects. But opening up black-box functions can be expected to destroy the pureness of the host language [HO07], which puts the advantages of the pure embedding approach into perspective. In pure functional languages, a key limitation is that polymorphic functions are restricted to one compilation unit, e.g., a developer can only overload a function signature within one compilation

unit. It is not possible to adapt abstractions that are encapsulated in functions. All possible adaptations of a function have to be anticipated in the function's parameters when the function's signature is defined. If an adaptation has not been anticipated, a function signature or its cases for pattern matching cannot be extended later on. To adapt the syntax of a pure embedding encoded in a module signature (or type class), it is required to open up the module's signature. To adapt the semantics encoded in a module (or type class instance), it is required to exchange the method bodies in module implementations. To adapt the models, with GADTs, the developer would need to extend the GADTs to allow constructors to have new return types, or the developer needs to inspect and manipulate universal types. There are several promising techniques in functional languages that could be interesting for enabling adaptable pure embeddings, such as first-class modules [MS02], adapting pattern matching on function signatures [LH06], e.g., from first-fit to best-fit, and open type classes [LH06]. Incorporating reflective features into pure functional languages has been studied, but it has been reported that it is unclear whether these features can preserve pureness and type-safety [WKD04].

9.1.2. Dynamic Languages

There is a long tradition of embedding domain-specific languages in dynamic languages. This section gives an overview of embedding with dynamic host languages.

Jargons: Embedding domain-specific languages has been a well-known technique in untyped functional languages, such as Scheme from the LISP family. In [Pes01], Peschanski refers to such an embedded language as a jargon. A jargon is implemented in Scheme using its macro system. Embedded programs are represented as S-expressions in abstract syntax. To define the abstract syntax, the language developer uses a meta-language, which is itself implemented as a jargon (a meta-jargon).
For each defined expression type, the jargon defines a macro whose name defines a keyword and whose parameters define its sub-expressions. To define semantics, the macro body produces Scheme code. A code sequence is implemented using a so-called schemedef, which plays a similar role as a code block. To evolve jargons, hierarchical composition of jargons is supported by one jargon explicitly importing other jargons. Other forms of composition are not discussed. The advantage of jargons is that there is no requirement for type annotations and no restriction by a type system. The disadvantage is that program executions can result in runtime errors.

In comparison to REA, using a macro system in jargons is technically quite different from using the MOP in REA. Schemedef code blocks play a similar role for encoding abstraction operators in jargons as closures do in REA. Jargons have an import mechanism that allows adding keywords to individual languages, similar to single inheritance in REA. However, jargons do not consider conservative extensions. Despite this, similarly to REA, jargons have a meta-level, the meta-jargon. But their meta-level is only for defining a meta-circular syntax; thus, developers cannot use it for semantic adaptations. Unlike REA, jargons have no special mechanisms for evolution, such as REA's support for late binding of syntax to semantics, abstracting over semantics, and semantic adaptations. In sum, unlike REA, jargons do not support sophisticated forms of language evolution.

186 9.1. Homogeneous Embedding

Ad-hoc Embedding in Ruby: Ruby [Rub, TFH09] is a fully object-oriented scripting language that frequently uses embedded DSLs in its frameworks. Ruby allows modularly defining embedded DSLs in classes. Embedded programs are Ruby scripts in abstract syntax. Language developers define the abstract syntax for expression types in a class’s method signatures, and the corresponding method implementations define the semantics. There are numerous demonstrations of practicable embedded DSLs. For example, Ruby uses a family of embedded DSLs in its popular Ruby on Rails Web framework. To evolve embedded languages in Ruby, developers can re-open the class definition of an embedded language implementation. When re-opening classes, developers can add methods for new abstract syntax and semantics even at runtime, or they can rename individual existing methods and override them with new methods. There is little research on composing independently developed languages in Ruby, although it has special features for composition, such as runtime mixins, and features for invasive adaptations via reflection. In comparison to REA, for Ruby, there are only example embeddings to guide developers, but there is no disciplined architecture like in REA. In Ruby, DSLs are rather implemented as ad-hoc embeddings, which makes embedding in Ruby rather a craft than a discipline. Still, embedded DSLs in Ruby support a similar extensibility as REA. Similarly to REA, in Ruby, there is no restriction by a type system, and developers can re-open classes at runtime, which allows unanticipated extensions like in REA. With re-opening classes, Ruby allows dynamically adding keywords, similar to REA, which uses the MOP for this. When using Ruby’s reflective features, this internally manipulates the program’s AST at runtime, which allows individual syntactic and semantic adaptations of embedded DSLs in Ruby.
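The re-opening mechanism can be sketched in a few lines of Ruby. The DSL, its class, and its keywords below are hypothetical; only the pattern of evolving an embedded language by re-opening its class is illustrated:

```ruby
# A hypothetical embedded DSL whose abstract syntax is a class's methods.
class RouteDSL
  def initialize; @routes = {}; end
  def get(path, &handler)        # keyword 'get' with a code block as semantics
    @routes["GET " + path] = handler
  end
  attr_reader :routes
end

# Later -- even at runtime -- the class is re-opened to evolve the language:
class RouteDSL
  def post(path, &handler)       # new keyword, new semantics
    @routes["POST " + path] = handler
  end
end

dsl = RouteDSL.new
dsl.get("/users")  { "list users" }
dsl.post("/users") { "create user" }
p dsl.routes.keys  # => ["GET /users", "POST /users"]
```

The extension requires no anticipation in the original class definition, which is exactly the unanticipated extensibility discussed above.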
When adding or renaming a method, this adds or renames the method in the AST, and the adaptation’s changes are immediately visible to others. Unfortunately, Ruby’s reflective features do not adequately support multiple adaptations, since new adaptations always override previous adaptations. If there are multiple adaptations to the same AST elements, this constitutes an interaction, or it may even be a semantic conflict. Ruby allows isolated reflective adaptations, but Ruby has no mechanism to handle interactions and conflicts between several adaptations; therefore, overriding adaptations can lead to erroneously adapted embeddings. In contrast to REA, in Ruby, there are no concepts like language combiners, no support for concrete syntax, scoping strategies, analyses, or transformations.

TwisteR: Achenbach et al. [AO10] present an approach for embedding aspect languages into Ruby, called TwisteR. TwisteR focuses only on implementing dynamic analyses using dynamic aspects; unlike REA, TwisteR is not an embedding approach for DSLs in general. They use this thesis’s concept of a meta-aspect protocol that was presented in Section 4.2.3.3 (p. 109) and [DMB09]. They embed special scoping strategies for aspects that have been proposed by Tanter [Tan08, Tan09] to control the binding and activation of aspects for dynamic analyses. Further, they apply a special technique to intercept execution at the basic block level, which is similar to the concept of sub-method reflection [DDLM07], but has been developed independently. On top of these techniques they implement special abstractions for dynamic analysis. The advantage is that end users can easily embed dynamic analyses for debugging aspects,

similar to [DMB09], and it enables explorative testing with non-deterministic input data. Although TwisteR aspects enable individual analyses and transformations of embedded programs, composing analyses and transformations is not supported, since TwisteR has no mechanism to handle interactions and conflicts. In contrast to REA, in TwisteR, there are no concepts like language combiners, no support for concrete syntax, or scoping strategies.

Ad-hoc Embedding in Groovy: Groovy [Gro, KG07] supports embedding through two special concepts for building heterogeneous and homogeneous embeddings. Groovy supports implementing extensible EDSLs using so-called builders. Builders support creating objects as well as implementing heterogeneous embedded compilers similar to Kamin’s approach [Kam98]. A Groovy builder must extend a special library class and add methods to it in order to define DSL syntax and semantics. Builders can be extended with single inheritance. There are no special means to compose several builders. Further, Groovy supports another concept, called a category: a special class that defines static methods that are dynamically mixed into existing classes. Methods mixed in by categories are often used to enable little DSLs. Mixing in several categories at a time allows simple compositions of DSLs. However, the language compositions resulting from mixed-in categories provide no guarantees for correctness. Similar to Ruby, embedding in Groovy is rather ad-hoc and does not follow a well-defined discipline like REA. Due to using the same host, ad-hoc Groovy embeddings could be as powerful as REA, but they are limited because they do not exhibit an architecture that can exploit the Groovy features as completely as REA. Unlike in REA, ad-hoc embeddings in Groovy provide no guarantees for language developers.
They can only compose embedded libraries in a dynamic language without special analyses or transformations. The Groovy compiler only validates a part of the embedding or embedded programs; in particular, the Groovy compiler does not validate the correctness of method bodies. Therefore, it is not possible to check whether an EDSL program invokes only well-defined methods of an embedding and whether method bodies contain calls to missing methods. With REA’s support for abstracting over semantics, there is support for analyses that validate the correctness of a program or a composition, whereby the analyses can analyze the keywords used in closures and method bodies. Further, with categories, it is not possible to compose several categories that define the same keywords while detecting and handling the resulting interactions and conflicts. Since categories are based on static methods, they cannot be inherited. Thus, there is no support for language polymorphism. There is nothing similar to REA’s BlackBoxCombiner that analyzes EDSLs for conflicts. Groovy lacks concepts like language combiners, support for concrete syntax, and scoping strategies. In Groovy, developers can use meta-programming to write AST analyses or transformations for compile-time. However, runtime analyses and transformations are not addressed.

Embedding in π: The π language [KM09] is a special host language that provides dedicated features to change the syntax and semantics at runtime. Particularly interesting is that π programs can have any syntax expressible as a CFG. The language developer defines DSL expression types in special modules called patterns. Each pattern recognizes a piece of the concrete syntax and gives

it a meaning—an interpretation in the π language. The π interpreter processes a DSL program line by line. When encountering expressions in a line, there must always be exactly one matching pattern for each expression type. Patterns can be redefined and they are lexically scoped; thus π always uses the closest enclosing pattern definition to interpret an encountered expression. In comparison, π is syntactically more declarative than REA in that it provides special syntactic constructs to define concrete syntax. Similar to using annotations in TIGERSEYE/REA, the syntactic and semantic extensibility built into π makes it particularly natural to evolve embeddings. But in contrast to TIGERSEYE/REA, developers do not have to use verbose standard Java/Groovy annotations to define a concrete syntax. In π, developers can compose expression types at the level of patterns. But there is no special module concept at the level of languages; therefore, several embedded DSLs can only be composed by copying their sources into one π file. Programs are also not modularly defined. Unfortunately, π’s exceptional language features do not allow transferring the approach to other host languages. The dynamic features of π also require that programs are executed with an interpreter; π cannot be compiled to bytecode like REA. Further, in contrast to REA, in π, there are no concepts for scoping strategies, analyses, or transformations.

Helvetia: Renggli et al. embedded DSLs into Smalltalk [RGN10]. Their approach, called Helvetia, addresses the problem of providing support for concrete syntax and improving tool support. End users can encode DSL programs either in Smalltalk syntax or in a special concrete syntax. To embed a language without special syntax, the language developer directly defines a set of Smalltalk classes whose methods and fields define expression types in abstract syntax. To embed a language with a special syntax, the language developer implements a parser in Smalltalk using a parser combinator library. To define execution semantics for special syntax, the developer uses an embedded DSL to implement transformation rules on AST nodes. After parsing a DSL program, its expressions represented as AST nodes are transformed to ordinary Smalltalk code and then compiled by the host compiler. The advantage of Helvetia is that it supports certain interesting evolution scenarios. Developers can extend a language by attaching additional parser components to an existing parser using combinators. They can define several parsers that can be used in parallel. Another benefit of choosing the Smalltalk platform is that Helvetia integrates well with the Smalltalk tools, which developers can extend for syntax highlighting of their DSLs. Helvetia’s homogeneous integration with Smalltalk allows the debugger to trace transformed code back to its textual representation in concrete DSL syntax. Unfortunately, Helvetia relies on exceptional features of Smalltalk, e.g. that a compiler component is accessible at runtime. As a consequence, the approach cannot be adopted for other host languages that do not provide these features. In comparison, Helvetia has a similar extensibility as REA, but Helvetia has a different architecture and it uses different technologies. Similar to REA, Helvetia supports implementing language components, called language boxes [RDN10].
Helvetia supports concrete syntax through parsing expression grammars that are composed using a parser combinator library. Similarly, as parser combinators are first-class objects, in REA, BNF and the Earley parser are both homogeneously embedded in Java, so new parsers can be created and manipulated as objects at runtime.
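The essence of parser combinators as first-class objects can be sketched in a few lines. This is not Helvetia's Smalltalk library nor REA's Java embedding; it is a minimal hypothetical Ruby analog that only illustrates why combinator-based grammars can be composed and extended at runtime:

```ruby
# A parser is a lambda taking an input string and returning
# [parse_result, remaining_input], or nil on failure.
def lit(s)
  ->(input) { input.start_with?(s) ? [s, input[s.length..-1]] : nil }
end

def seq(p1, p2)                      # sequence: p1, then p2 on the rest
  lambda do |input|
    r1 = p1.call(input) or return nil
    r2 = p2.call(r1[1]) or return nil
    [[r1[0], r2[0]], r2[1]]
  end
end

def alt(p1, p2)                      # ordered choice: p1, else p2
  ->(input) { p1.call(input) || p2.call(input) }
end

# Grammars are composed from smaller parsers as ordinary objects,
# so they can be recombined or extended while the program runs:
digit = ("0".."9").map { |c| lit(c) }.reduce { |a, b| alt(a, b) }
sum   = seq(digit, seq(lit("+"), digit))
p sum.call("1+2")  # => [["1", ["+", "2"]], ""]
```

Because `digit` and `sum` are plain objects, attaching a further alternative to an existing grammar is an ordinary method call rather than a re-generation step.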


But Helvetia does not support the full class of CFGs like REA. Syntactic ambiguities are resolved by implicit priorities, which can lead to erroneous compositions that remain undetected by the developers. Helvetia uses transformation rules similar to source-transformation languages, but it transforms a DSL program at load time and not at compile-time. Helvetia transforms every program once; after a program is transformed, it cannot be transformed again. In contrast, TIGERSEYE/REA’s pre-processing transforms programs from concrete to abstract syntax before compile-time. Still, when a program is transformed to abstract syntax, the abstract syntax of the program can be adapted using the meta-level, which enables runtime evolution, e.g. for dynamic aspects. Helvetia supports scoping transformation rules, e.g. to particular AST nodes or a particular context, which restricts a rule’s application. However, the scoping mechanism does not allow assigning one and the same program different semantics; therefore, it is not possible to abstract over semantics, unlike REA, whose first-class code blocks enable a dynamic syntax-to-semantics binding that can be changed at runtime. Similar to REA, Helvetia allows adapting programs of embedded languages; in particular, Helvetia uses reflection to adapt host language expressions. However, reflection is currently only used to adapt end user programs. Reflection does not adapt language implementations, as in REA. Further, in contrast to REA, Helvetia does not support scoping strategies.

9.1.3. Staging/Meta Programming Languages

(Multi-)Stage Languages: A (multi-)stage host language [SBP99, COST04] has a small set of special language constructs for constructing AST nodes within programs, combining them, and generating executable code from ASTs, whereby often a static type system guarantees that all programs they generate are correct.
In (multi-)stage host languages, developers can implement language embeddings using meta-programming in a homogeneous way, i.e. programs that generate other programs in the same language. Staging-based embedding approaches address the problem of the interpretative overhead of embedded languages, which is removed by generating code. There are several embedding approaches that use different host languages. Czarnecki et al. [COST04] compare staging in MetaOCaml, TemplateHaskell, and template meta-programming in C++. The differences between these host languages and approaches are not relevant for a comparison to REA. To embed a DSL, a language developer embeds AST nodes for each DSL expression as a data type for the DSL. Embeddings are structured in one or several stages. Each stage can rewrite, optimize, or compile expressions by introspecting AST nodes of the previous stage and reflecting changes to the next stage. For constructing part of the AST of a DSL program, (multi-)stage languages use a quasi-quotation mechanism. Quasi-quotation provides a quotation operator with which developers can embed expressions of the object language into the meta-language. For example, in TemplateHaskell, one can quote a Haskell expression in Oxford brackets [|..|], which reifies a corresponding AST representation of it. For combining expressions of different stages, often there is a special anti-quotation operator to escape inside a quoted expression. Finally, for execution, there is a splicing operator that reflects an AST back to code, i.e. it generates executable code. When using the splicing operator for a program with quoted DSL

expressions, the stages generate plain host language code from the quoted DSL expressions. Because staging allows compiling programs from the object language to the meta-language, there is no interpretative overhead. In comparison, staging has a different focus than REA. Staging approaches target expert programmers and end users with special skills. Staging uses special host language features and cannot be integrated with a host language that does not support quasi-quotation. Staging-based embeddings have a very different architecture and use very different techniques than REA. The biggest advantage of staging is that, after compilation, there are no library calls to an embedded library, since the stages generate code (inline) at compile-time [COST04, Tra08]. In contrast, in REA, because Groovy autoboxes every primitive value and every method call is a reflective method call, there is a significant overhead on the execution, which can only partially be removed by the standard optimizations of the Java VM. Another advantage of typed multi-stage embeddings is that the host’s type system can guarantee that (more or less) all generated code is well-typed [COST04]. In contrast, in REA, the Groovy host compiler does not perform enough type checks to ensure meaningful code. For each additional guarantee, REA needs an implementation of a new embedded analysis. Further, the quoting mechanism makes it easy to mix expressions in the meta-language and the object language, which makes it relatively simple for language developers to switch stages. However, the stages have a fixed order, which cannot be changed. Still, staging supports only a subset of the evolution features. So far, only very simple forms of language constructs have been embedded, which do not allow drawing conclusions about the applicability of staging approaches to DSLs. Staging-based DSLs are confined to their host language’s syntax and semantics.
The embedded AST of a DSL is hard to extend, since one cannot add a new AST node without redefining the AST’s data type. In contrast to most homogeneous EDSL approaches, developers need to implement a front-end, or respectively an AST, for the DSL, which puts the savings of this embedding approach into perspective. In contrast to REA, the ASTs of different DSLs cannot be combined, because the data types in the host language they use are not composable. Concrete DSL syntax is not supported; the abstract syntax of staging-based DSLs carries large syntactic noise for end users, template C++ having the largest. In contrast to REA, staging does not address embedding scoping strategies. Domain-specific analyses are only supported as transformations, but implementing analyses as transformations is rather awkward. It is not possible to add new information to AST nodes, unlike in TIGERSEYE/REA, where ATerms allow adding semantic information to AST nodes. TemplateHaskell supports intensional analysis that allows inspecting code of other stages, but they do not use this to extend a stage by another stage. Transformation is natural for staging-based approaches, but they only support transformations to their host languages, not to other target languages. Endo-transformations are not supported, since every stage creates a new representation of the program.

Extensional Meta-Programming: Seefried et al. [SCK04] address problems of both staging-based and heterogeneous embedding approaches. For homogeneous staging-based embeddings, they address the problem that with staging the language developer has to implement a compiler

front-end for the embedded language, i.e. the developer has to encode the AST nodes for an embedded language (cf. [COST04, SBP99]). For heterogeneous embeddings, which are discussed later in this chapter, they address the problem that the developer has to embed a new compiler back-end (cf. [Kam98, EFDM03]). To address these problems, they propose to use compile-time meta-programming for implementing embedded compilers in TemplateHaskell, which they call extensional meta-programming. An important detail is that they do not need to implement an AST in an abstract data type. Meta-programming enables them to embed optimizations in a homogeneous way, such as unboxing arithmetic expressions, aggressive inlining, and algebraic transformations. Because DSL programs are directly encoded as host expressions, no special front- or back-end needs to be implemented. To validate their approach, they have re-implemented Elliott et al.’s Pan language [EFDM03] as a homogeneous embedding. They compare the performance of their implementation with and without optimizations, and to Elliott’s heterogeneous Pan language embedding. Their implementation optimized with extensional meta-programming performs significantly better than the embedding without optimizations. But their measurements show that the original heterogeneous implementation of Pan still outperforms their optimized implementation. In comparison, extensional meta-programming does not need to implement an AST, since expressions are represented in abstract syntax, similar to REA. New expressions can be easily added. Extensional meta-programming can exploit TemplateHaskell for more type checks, similar to REA exploiting type checks in Groovy. Unfortunately, TemplateHaskell has no complete guarantee that every generated program is type safe, since type checking of template code is restricted for technical reasons.
With respect to its support for evolution, extensional meta-programming has similar problems as staging-based approaches. Composition of different meta-programs is currently not addressed, but functional composition would enable non-interacting compositions. Because extensional meta-programming uses functions to encode domain abstractions, for the same reasons as with pure embedding approaches, extensional meta-programming does not support interacting or conflicting compositions. There is no support for concrete syntax or pluggable scoping, since the approach is completely confined to its host language’s syntax and semantics. Analysis is only possible through transformation. While they address transformations at the expression level for implementing optimizations of individual expressions, global analyses and transformations are out of their scope.

Converge: Tratt [Tra08] also uses a compile-time meta-programming approach, one that supports concrete syntax, in the Converge programming language, which is a special host language for embedding. It differs from the other staging-based approaches in that language developers describe the syntax of the embedded language in a BNF-like DSL, generate a parser from this, and specify transformation rules to rewrite AST nodes to Converge code. A Converge program can use a quotation operator to embed DSL code in concrete syntax into a so-called DSL block, which reifies a corresponding AST representation that is then rewritten by the rewrite rules, which reflect the AST nodes to executable code.


In comparison, in Converge, language end users can write programs in any concrete DSL syntax expressible as a CFG. Similar to TIGERSEYE/REA, Tratt uses Earley parsers. However, the Converge approach is limited with respect to extensibility: whenever a new expression is added to an existing language, the Earley parser needs to be re-generated. In REA, since BNF and the Earley parser are both homogeneously embedded in Java, no code needs to be generated. Further, in TIGERSEYE/REA, the generic pre-processor does not rely on exclusive host language features. Since TIGERSEYE/REA only uses features that are available in most OO languages, its techniques can be transferred to other host languages. In Converge, transformation happens at compile-time, and therefore the DSL code can be expected to execute rather fast, while TIGERSEYE/REA has a runtime overhead. In Converge, there are no guarantees that the generated code is type-safe, and there is no support to analyze and transform code, as in REA.

9.1.4. Typed Object-Oriented Languages

The following object-oriented host languages allow modular and type-safe embeddings.

Fluent Interfaces: Evans [Eva03] and Fowler [Fow05] propose to embed DSLs into mainstream programming languages used in industry, such as Java. In Java, language end users can encode embedded programs in abstract syntax as ordinary Java programs that call the API of an embedded library. This API is structured in a special way, which Evans and Fowler call a fluent interface. The classes of the library define expression types of the embedded language using Java constructs. Literals are encoded as constants; domain-specific operations are encoded as method calls. For creating complex expressions, method calls can be chained together, where the return parameter of a method in the fluent interface represents the syntactic category of the next possible expression. Fowler proposed to refer to such an embedded DSL as an internal DSL, since the embedded DSL is implemented as a library, which contrasts it with external DSLs that are implemented with pre-processors or other external tools. In comparison, fluent interfaces require no special host language features, similar to REA, where often special host features can be replaced with more common host features. However, with the less powerful features of the usual hosts for fluent interfaces, it is hard to design the abstract syntax close to the domain, mostly because the Java syntax and semantics are not flexible enough to omit type annotations and delimiters. Fluent interfaces do not support sophisticated embeddings with concrete syntax, scoping, analysis, or transformation.
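Evans and Fowler describe fluent interfaces for Java, but the shape of such an API is language-independent and can be sketched just as well in Ruby (the order DSL below is hypothetical): each method returns the object whose methods are the next legal keywords, so a chain of calls spells out a sentence of the DSL.

```ruby
# Hypothetical fluent interface: the return value of each method stands
# for the syntactic category of the next possible expression.
class Order
  def initialize; @lines = []; end
  def add(qty)                     # after 'add', only 'of' makes sense
    PendingLine.new(self, @lines, qty)
  end
  attr_reader :lines
end

class PendingLine
  def initialize(order, lines, qty)
    @order, @lines, @qty = order, lines, qty
  end
  def of(item)                     # completes the line, back to Order
    @lines << [@qty, item]
    @order
  end
end

order = Order.new.add(2).of("beer").add(1).of("wine")
p order.lines  # => [[2, "beer"], [1, "wine"]]
```

In a statically typed host like Java, the return types additionally let the compiler reject chains that are not sentences of the DSL, e.g. calling `add` twice in a row.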

DSL2JDT: Garcia [Gar08] addresses the problem of reducing the effort of implementing an embedded DSL as a fluent interface using generative techniques. A language developer models the syntax of an embedded DSL as a tree-based model in Eclipse EMF [SBP+09], and from that model, a generator, called DSL2JDT, generates the Java code for a fluent interface API. In comparison, the major advantage of DSL2JDT is that a generator takes over the task of encoding the abstract syntax in a fluent interface from the language developer. In REA, language developers have to encode the language interface, façade, and model classes on their own. In DSL2JDT, developers can combine their generator with another generator for model constraints, which enables generating constraint checks for embedded expressions. Further, standard tool services, such as code completion and debugging, can be reused. The disadvantage of DSL2JDT is that language evolution is difficult. Once the embedded DSL is generated, if the language syntax evolves, a language developer must update the model, re-generate, and compile the whole language implementation again. Unlike REA, DSL2JDT does not support composition of DSLs. Because DSL2JDT is confined to Java, it does not support embedding special scoping strategies, analyses, or transformations.

Embedding in Scala: Dubochet [Dub06] and Odersky et al. [OSV07] experimented with embedded DSLs in Scala [Sca] – a statically typed language that combines features of object-oriented and functional languages. Currently, extending embedded DSLs and composing independently developed embedded DSLs is not addressed by either author. They have rather focused on demonstrating small examples of embedded DSLs, but they do not propose a disciplined approach for embedding, like REA. Language evolution, concrete syntax, scoping, analysis, and transformation remain out of their scope.

Polymorphic Embedding: Hofer et al. [HORM08] propose a disciplined DSL embedding approach for Scala. To enable multiple interpretations of programs, they apply Carette’s technique [CKS09] in the context of Scala, which they call pluggable semantics. DSL programs are encoded in abstract syntax. The syntax of a language is encoded in a set of method signatures of classes or traits. To define the semantics, a developer defines code in the method implementations. To evolve a language, a developer can use a trait that adds new expression types to the embedding. Hofer et al. also address the composition of independently developed languages. While composing independent languages is discussed, they do not address composition of languages that have interactions in syntax and semantics. Composition of semantics is based on monadic composition of computations.
In [HO10], Hofer et al. have adopted the idea of [ALY09] to use different forms of encodings to allow developers to express new analyses and transformations more simply, but none of the encodings is both extensible and composable. Unfortunately, since they do not address implicit isomorphic conversion from one encoding to another like [ALY09], developers can no longer freely choose the best encoding after they have committed to one particular encoding. In comparison to REA, embedding in Scala equally well supports adding syntax and semantics. For enabling conservative extensions, the type checker can check polymorphic extensions for possible syntax conflicts, which is only possible at composition-time in REA. Embedding in Scala is statically type-checked, but also in Scala, the generic type checker does not provide implicit domain-specific guarantees to prevent semantic conflicts or side effects. Traits allow plugging in different evaluations at compile-time, which allows abstracting over semantics like REA. But in contrast to REA, polymorphic embedding does not support plugging in new semantics at runtime. In polymorphic embedding, traits allow composing semantics similar to using the linearizing combiner in REA. However, a composition of two interacting DSLs can be incorrect, since the linearization implicitly handles homonymous member names, which is good for implementing OO applications but counter-productive for embedding. Since syntactic conflicts are implicitly resolved using a pre-defined linearization order, the Scala compiler does

not notify the language developer of possible conflicts, which can result in erroneous compositions. In contrast, in REA, such conflicts are detected by the BlackBoxCombiner and CCCombiner and reported before evaluating the program. In contrast to REA, polymorphic embeddings are confined by their host language’s syntax, semantics, and scoping mechanism. It is not possible to provide meta-level extensions or to override the semantics of host language keywords. Hofer et al. [HORM08, HO10] support either extensible or composable analyses, depending on the used program encoding. They demonstrate simple analyses, such as expression counting, but domain analysis like in REA is out of scope. They support external visitors for composable analyses of independent languages, but they explicitly exclude composing analyses of languages that have interactions. In contrast, in REA, all analyses are extensible and composable, because combiners enable handling interactions. In polymorphic embedding, only exo-transformations are supported, but it is not possible to transform a running program with an endo-transformation like in REA.

9.2. Heterogeneous Embedding

Heterogeneous embedding approaches are interesting since they try to address the weaknesses of homogeneous embedding approaches by drawing inspiration from traditional non-embedding-based language implementation approaches. Heterogeneous embedding approaches can also be distinguished w.r.t. what kind of host language is used to implement the embedding. First, there is the embedded compiler approach that embeds a DSL compiler/generator into a general-purpose language; the embedded compiler generates code in the same or another GPL.
Second, there are approaches that embed DSLs in source transformation languages. What is common to both classes is that often the host and the target language are different; therefore, they do not allow reusing the host language features within the embedding—the embedding can only use the target language features. Moreover, even if they generate code in the same language that implements the embedding, they do not have a uniform compile- and runtime between the host and the embedded language; therefore, they cannot uniformly exchange objects between host and embedded programs, like REA.

9.2.1. Embedded Compilers

Embedded compilers have similar qualities compared to REA, but the differences depend on the employed techniques and are therefore discussed individually.

Embedding Source Code Generators: In [Kam98], Kamin proposes to embed languages as program-generating languages1, which basically are embedded generators, for which DSL programs are actually specifications for generating programs in another language. In such an embedding, the embedded language and the host language are heterogeneous: they may have different syntax, they may even have different semantics, and both languages are processed by different infrastructure (i.e. compiler or interpreter). In such an embedding, a program of the embedded language uses the host language to rewrite its expressions into a target language.

1The view on program-generating languages is similar to meta-languages, where a program in one language gener- ates a program in another language (cf. Section 2.3.3, p. 25).
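Kamin's host is ML and his target is C++, but the pattern itself is independent of either pair. The following hypothetical Ruby fragment mirrors the idea: each expression type of the embedded language is a host function that returns a fragment of target-language (here C) code as a string, and the assembled result would then be handed to the target compiler.

```ruby
# Each expression type of the embedded language is a host function
# returning a fragment of target-language (C) code as a string.
def num(n);    n.to_s;          end
def add(a, b); "(#{a} + #{b})"; end
def mul(a, b); "(#{a} * #{b})"; end

# The "back-end": wraps an expression fragment into a complete C program.
def emit_main(expr)
  "#include <stdio.h>\n" \
  "int main(void) { printf(\"%d\\n\", #{expr}); return 0; }\n"
end

# An embedded program is an ordinary host expression; evaluating it
# yields C source text, which would then be fed to a C compiler:
puts emit_main(add(num(1), mul(num(2), num(3))))
```

Retargeting to a different target language only requires swapping the string-producing bodies, which is exactly the flexibility Kamin claims for embedded generators.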


Kamin uses ML as a host language and generates code in C++, which is the target language. To define new syntax, the language developer defines a new expression type as a new function in ML. To define semantics for an expression, the corresponding function generates and returns a code fragment in the form of a string. To transform a program into its executable form, the fragments of all of the program's expressions are concatenated and then compiled by the target language compiler. Kamin demonstrates that his technique can be used to implement various embedded program generators, such as FPIC, a small language for drawing pictures; a parser generator that is combined from smaller parsing components, i.e., parser combinators; and a parser generator for the LL(1) sub-class of context-free grammars. The advantage of Kamin's technique is that the execution is less bound to a specific target language, as the code of the generator can be changed to produce code in a different target language. The generated code in the target language does not suffer from interpretative overhead like in homogeneous approaches. Compared to REA and the other homogeneous approaches that reuse their host's features and compiler, embedded generators have the disadvantage that language developers face high implementation efforts, because they have to implement a new back-end in the form of an embedded generator. Moreover, heterogeneous embedded languages cannot reuse the host language features in the generated code (e.g., the host compiler's optimizations like partial evaluation); developers have to re-implement these features if a host language feature is also required in the embedding but not available in the target language. Unlike in TIGERSEYE/REA, embedded DSL programs can only have an abstract syntax. Further, for Kamin, scoping and analyses are out of scope. Embedded generators support exo-transformations but not endo-transformations.

Embedding Compilers: Elliott et al.
[EFdM00, EFDM03] extend Kamin's technique by embedding an optimizing compiler that compiles Haskell to Haskell and uses algebraic manipulation, which substitutes expressions by more optimal but semantically equivalent expressions. They address the problem that homogeneous embedded DSLs suffer from interpretative overhead. They first tried to speed up homogeneous embedded DSLs by adding custom optimizations using user-defined rewrite rules, which currently only special host compilers enable, such as the Glasgow Haskell Compiler2. Still, they found that they could not remove this interpretative overhead. When they combined multiple such rewrite rules, they experienced interactions between the rewrite rules that were too complex to be controlled. To solve these problems, they represent program expressions in abstract syntax as algebraic types and statically optimize expressions when these are constructed. For optimizing an expression, they use a smart constructor for this expression that pattern-matches on its sub-expressions to detect opportunities for optimizations, so that an optimized expression is created. They apply optimization techniques inspired by traditional non-embedding-based approaches, such as constant folding, if-floating, and expression-type-specific static rewrites for domain-specific optimizations. Finally, an embedded compiler rewrites the optimized expressions to the target language. To demonstrate the advantages of using algebraic types in embedded compilers, they

2The homepage: http://www.haskell.org/ghc/.

address an efficiency problem with a first version of the embedded compiler that repetitively rewrites common sub-expressions in a program. To avoid repetitive rewrites, they perform common sub-expression elimination (CSE), which identifies common sub-expressions in a program, shares them between the expressions, and rewrites them only once. They demonstrate the application of their technique by implementing the Pan language, a small language for image synthesis and manipulation. The major advantage is that the language developer can evolve an embedded compiler into an optimizing compiler with only a few changes made to its code. Further, they claim far better performance of programs and more efficient program generation due to CSE. Unfortunately, they do not prove this claim by evaluating the actual performance speed-up with measurements.

Rule-based Generators: Cuadrado et al. [CM07] use Ruby to embed DSLs that are rule-based generators for model-driven development, called RubyTL. DSL programs are models from which the embedding – a Ruby class – generates code, e.g., Java code. Because only a few classes need to be implemented, RubyTL is a lightweight approach to implement an ad-hoc generator. In comparison to REA, although not discussed in their paper, RubyTL would conceptually support extensibility, since rules and generators are embedded as classes, which developers could subclass. Unlike REA, embedded compilers do not use meta-levels. The embedded compiler approaches can support several embedded DSLs in parallel, but unlike REA, this does not enable programs that mix DSL keywords. The embedded compiler approaches do not support concrete syntax, scoping strategies, or domain-specific analyses and transformations, with the exception of the one and only transformation to the target language.

9.2.2. Source Transformation Languages

The source transformation language approaches have very similar properties, and therefore, after discussing each of them in isolation, this section compares them together to REA.

MetaBorg/Stratego: MetaBorg [BV04] is an embedding approach that uses a source transformation language that can rewrite a heterogeneous embedded DSL to any GPL. It is the most mature approach for heterogeneously embedding DSLs with a concrete syntax [Tra08]. DSL programs are encoded in arbitrary context-free syntax. MetaBorg uses Stratego/XT as a host language, which is a DSL for generating language infrastructure consisting of lexical patterns and rewrite rules as part of a Stratego module. To define syntax, a language developer specifies a set of lexical patterns that recognize expressions of a CFG in a program. To define semantics, the developer specifies a set of rewrite rules. The rules can rewrite expression types into AST nodes. They can analyze or transform nodes and finally rewrite AST nodes into another language. For the evaluation of programs, given the syntax of an embedded and a host language, MetaBorg generates a corresponding pre-processor for parsing and transforming EDSL programs. The pre-processing logic generated from the rules parses the program, successively rewrites and annotates the AST nodes, and finally applies an AST transformation that generates code in the target language. For extensibility in MetaBorg, every language definition module can extend other modules by importing their syntax and rewrite rules. They demonstrate their approach by implementing a DSL – called SWUL – for creating Swing applications. Programs in SWUL syntax have a more concrete encoding than using Java to create Swing UIs. Additionally, they provide an embedding of Java in Java that allows generating Java programs, and another embedding of XML in Java for generating XML documents. The advantages of using MetaBorg to implement DSLs are manifold, since MetaBorg is a very mature tool. Years of investment have equipped Stratego with many useful features for defining lexical patterns, importing grammars, priorities, quoting and unquoting, rewriting strategies, generic traversals, and advanced disambiguation with disambiguation filters. These features allow defining modular and composable syntax and semantics. MetaBorg/Stratego supports context-free grammars (CFGs) through so-called scannerless GLR parsers (SGLR parsers) [Vis97a]. Such an SGLR parser creates a forest of ASTs that initially contains all valid syntactic ambiguities of a program's parse. Later on, MetaBorg uses disambiguation filters [vdBSVV02] to remove ambiguities based on declarative disambiguation rules. In MetaBorg, embeddings are independent of the target language. Unfortunately, the fact that MetaBorg is not integrated with the target language disallows safe embeddings, since the generated parsers and pre-processors output code that may contain errors, which the target language compiler can detect only later on, and such errors are hard to trace back. Further, MetaBorg is inconvenient for incremental language evolution, since whenever a language definition changes, its complete infrastructure must be regenerated, which is disruptive.

TXL: TXL [CHHP91, Cor06] is another heterogeneous embedding approach that uses a source transformation language. It is similar to Stratego in MetaBorg; however, the TXL language provides a slightly different set of features. To define new concrete DSL syntax, a language developer specifies the grammar with BNF-like syntax rules (productions), which recognize expressions and create an AST representation. To define new semantics, the developer can encode transformation rules whose parameters accept nodes from the AST. TXL uses these rules to rewrite AST nodes to the target language. Furthermore, TXL allows defining functions that traverse the AST to extract information from it and which developers can call in their rewrite rules. To transform a DSL program, TXL parses it into an AST representation and uses the transformation rules to rewrite it into an executable form in the target language. Cordy demonstrates the applicability of TXL by implementing several languages, such as a heterogeneous embedding of SQL in the Cobol [oD65] programming language, and a little generator that transforms XML to C++ code. TXL has many advantages but also important limitations. TXL's transformation rules enable both declarative and modular language definitions. In particular, rules and functional abstractions are important, since in TXL developers can precisely restrict the scope of rules and functions. They can build hierarchies of rules that have sub-rules, whereby rules can pass parameters to their sub-rules. In TXL, grammars are implicitly free of ambiguity, since every production is prioritized by the order in which the productions are defined in the grammar. Unfortunately, TXL does not support a composable subset of CFGs, and therefore it does not fully support composition.
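The rule style that MetaBorg and TXL share—patterns over AST nodes rewritten into target-language text—can be sketched generically. The following Python sketch uses invented node shapes and is not actual Stratego or TXL syntax:

```python
# AST nodes as tagged tuples; each branch plays the role of one rewrite
# rule: it matches a node type and emits target-language text, recursing
# into the sub-nodes.
def rewrite(node):
    kind = node[0]
    if kind == "lit":
        return str(node[1])
    if kind == "plus":
        return f"({rewrite(node[1])} + {rewrite(node[2])})"
    if kind == "print":
        # Rewrite a DSL 'print' expression into a Java-like target call.
        return f"System.out.println({rewrite(node[1])});"
    raise ValueError(f"no rule matches {node!r}")

ast = ("print", ("plus", ("lit", 1), ("lit", 2)))
print(rewrite(ast))  # System.out.println((1 + 2));
```

In the real tools, such rules are declarative, pattern-ordered, and composable across modules; the sketch only conveys the rewrite-to-target-text principle.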


Comparison with REA: In comparison, MetaBorg and TXL are very mature approaches that allow embedding large DSLs with several hundred expression types. Although MetaBorg and TXL are heterogeneous approaches, REA/TIGERSEYE makes use of similar concepts and mechanisms. The module concepts of Stratego and TXL play a similar role to the façade class in REA. Implementing a rewrite rule in MetaBorg or TXL is similar to implementing a method in one of REA's façades. Implementing a global strategy in MetaBorg or a globally scoped function in TXL is similar to implementing a meta-method (e.g., invokeMethod) in a meta-façade in REA that uses a generic strategy for providing the evaluation semantics. However, implementing a global strategy in a meta-façade is more complex than implementing it in a declarative language. Despite this, REA has the advantage of polymorphism: MetaBorg and TXL do not provide late-binding semantics for extensions, and therefore they do not support conservative extensions. W.r.t. concrete syntax, only MetaBorg/Stratego supports CFGs like REA; TXL only supports the non-composable LL(*) subset. Similarly to REA, Stratego and TXL support island grammars [Moo01], but MetaBorg and TXL do not use them for the partial definition of grammars for EDSLs like TIGERSEYE/REA does. It would be interesting to investigate using island grammars for embedding in these approaches. Like the TIGERSEYE/REA approach, both MetaBorg and TXL are host-language independent, but supporting new host languages is more complicated than with TIGERSEYE/REA. To support a new host language, Stratego/XT and TXL require a complete grammar definition of the target language. However, many grammars are not available in Stratego or TXL. As we will discuss later on in Section 10.4.4 (p. 240), for example, to support Groovy in Stratego or TXL, the complete Groovy grammar would have to be (re-)written in an SDF or TXL specification that encodes approximately 950 BNF-like productions3.
To enable parsing Java and Groovy with embedded DSL syntax in REA, the developer only needs to specify 19 productions for Java and, respectively, for Groovy. In contrast to REA, MetaBorg and TXL support sophisticated analyses and transformations. In MetaBorg, Stratego supports global analyses that traverse the whole AST. In MetaBorg and TXL, language developers can use analyses to implement scoping; however, the language developers must implement the lookup tables themselves, unlike reusing the first-class environments embedded in REA. Similarly to REA, transformations can be statically plugged in by importing the rules into the corresponding module, but dynamic transformations are not supported by MetaBorg and TXL.

9.3. Non-Embedding Approaches

Although non-embedded DSL approaches are complementary to the research on language embedding, this section gives an overview of non-embedded DSL approaches and points out important relations. Non-embedded DSL approaches have a different focus and use different techniques than embedding DSL approaches. Usually, non-embedded approaches provide special concepts and

3This estimation is based on a detailed analysis of the Groovy ANTLR grammar.

techniques to realize production-strength DSL implementations, but these approaches require that the language developer is an expert who knows the approaches' complicated concepts and techniques. These approaches have a rather steep learning curve, and they have high initial development costs [KLP+08]. In contrast, most embedded approaches make use of only standard language features and target non-compiler-expert language developers who want to implement rather small languages with low initial costs. Despite the differences between embedded and non-embedded approaches, it is particularly interesting which concepts from non-embedded approaches could improve existing embedding approaches. Therefore, the remainder of this section gives links to concepts that could be interesting for future investigations, but that have been out of the scope of this thesis.

9.3.1. Extensibility

Non-embedded approaches have a broad variety of mechanisms for extensibility that could complement the extensibility mechanisms in embedded DSL approaches.

Language Implementation Approaches: Particularly interesting is the research on extensibility mechanisms for language implementation. There is a myriad of language implementation approaches with a closed set of extensibility mechanisms. There are special language implementation approaches that provide special support for syntactic and semantic extensions, which are discussed in the following. For syntactic extensions, grammar inheritance [AMH90, KRV08], importing grammars [BV04], overriding grammars [Par08], and transformation rules [Cor06] are in the same vein as REA's language polymorphism. Generally, the extensibility mechanisms in meta-languages are more mature and specialized for language implementation than the extensibility mechanisms borrowed from the host in REA. However, import/inheritance mechanisms in meta-languages do not provide established concepts like late binding. In contrast, in REA every inherited syntax needs to be fully in conformance with the base syntax, which provides sub-type guarantees for languages. For semantic extensions, there are special language implementation approaches that allow extensions to a closed set of language mechanisms. Most works focus on extending OO mechanisms. Meta-object protocols [KRB91] enable experimenting with OO semantics, but combining several extensions with meta-objects for one object is hard. In [HBA10], Havinga et al. present a model for composable composition operators that allows experimenting with inheritance mechanisms. In [AKMP10], Axelsen et al. present a meta-protocol for packages and class inheritance that is inspired by the meta-aspect protocol of REA. Compared to REA, these approaches only consider extensions to a closed set of mechanisms in one individual language. In contrast, REA is open and allows embedding various new language mechanisms with flexible strategies, such as scoping strategies and meta-aspects.
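The role of late binding for conservative extensions can be made concrete. In REA's spirit, a language façade is an ordinary class with one method per keyword, and an extension is a subclass whose overridden keyword methods are picked up automatically; the following Python sketch (class and method names invented) illustrates a conservative extension:

```python
class MathLanguage:
    # The facade: one method per DSL keyword.
    def plus(self, a, b):
        return a + b

    def evaluate(self, program):
        # The program calls back into the facade; late binding ensures
        # that overriding methods of a subclass are used automatically.
        return program(self)

class TracingMathLanguage(MathLanguage):
    # A conservative extension: 'plus' is refined (traced) but still
    # computes the same result as in the base language.
    def plus(self, a, b):
        result = super().plus(a, b)
        print(f"plus({a}, {b}) = {result}")
        return result

program = lambda lang: lang.plus(lang.plus(1, 2), 3)
print(MathLanguage().evaluate(program))         # 6
print(TracingMathLanguage().evaluate(program))  # 6, with trace output
```

Because the subclass conforms to the base façade's interface and results, existing programs keep their meaning under the extended language—the sub-type guarantee discussed above.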

Reflective Architectures: Various reflective systems have been proposed, which build upon different reflective concepts. Their implementations have subtle differences, but these are not important for a comparison.


An extensive overview of the different architectures and the techniques they use is given in [Tan04]. Smith proposed procedural reflection [Smi82] in 3-LISP, which is implemented as a continuation-passing interpreter. In [RS84], Smith et al. propose reflective procedures, which allow code at one level to execute at a higher level that is more “meta”. Friedman et al. [FW84] propose the concept of a reifier. With reifiers, one can gain access to the state of the interpreter by making a meta-level entity available at the base level, which they call reification. Conversely, one can manipulate values and pass them back to the interpreter, which they call reflection. Maes [Mae87] proposes a uniform architecture for reflective OO languages. She proposes the concept of a meta-object that interprets the semantics of base-level objects. Ferber [Fer89] has adopted Maes' concepts for class-based OO languages. Central to those reflective languages is that there is an interpretation level that executes the base language concepts, and that for this level, there is a meta-level that allows gaining explicit access to what is interpreted below. There may be additional levels that interpret the meta-levels. Theoretically, there can be an infinite tower of such meta-levels [WF88]. In comparison, in REA, as shown in Section 4.1.1.4 (p. 77), every language implementation implicitly follows REA's structure with a meta-level and causally connected entities in the language model. It is trivial to reify a first-class representation, since one can obtain a reference to an instance of the language model. It is also trivial to reflect changes made to a first-class representation back, since such changes directly manipulate the first-class representation. Similarly to many other dedicated reflective architectures, there are no special optimizations.
A few reflective architectures have special optimizations, such as Reflex [Tan06], which optimizes the composition of aspects, and CLOS [KRB91], for which compilers can remove indirections due to meta-objects.

Semantic Adaptations in Languages: Non-embedding approaches support semantic adaptations for individual languages, which are GPLs or extensions to GPLs. Adaptable programming languages allow adapting language constructs and mechanisms. They have built-in indirections in their infrastructures or program code, and developers can configure these indirections from the outside to adapt them. Adaptable languages differ in which language constructs developers can adapt and how. For syntax adaptations, adaptable grammars enable developers to adjust the parser for new expression types; however, adapting the syntax does not address adapting semantics. For semantic adaptations, developers can adapt the language infrastructure to change the evaluation process. There are various realizations, discussed in the following. Today most reflective languages are individual solutions that extend non-adaptable and non-reflective languages with reflective constructs. Only a few languages are built with support for reflection from the beginning. Because of the high investment necessary, most reflection mechanisms are only built into general-purpose languages (e.g., procedural, logic-based, and OO languages) and for general-purpose language constructs (e.g., procedures or objects). However, there is a lack of generic approaches for implementing reflection into DSLs.

Kiczales et al. [KRB91] propose using meta-object protocols (MOPs), each of which provides an abstract meta-interface to end users, allowing them to adapt the semantics of object-oriented language constructs. Adapting the semantics of domain-specific constructs is out of scope. Relying on the semantic flexibility that MOPs enable for object-oriented languages, REA enables a similar flexibility for adapting the semantics of DSLs. Bracha et al. [BU04] propose a mirror-based design for reflective OO languages that decouples reified language constructs from their implementation. In comparison, in REA, it would be possible to follow a mirror-based design for an embedded DSL. However, the Groovy MOP does not follow this design. If a language developer followed the mirror design pattern in a non-mirror-based reflective language, the developer would not gain the advantages of a mirror-based design. After all, what would be left are the additional complexity and implementation costs of mirrors. Therefore, REA does not use mirrors in Groovy.

Meta-object protocols can be distinguished with respect to when adaptations to objects are allowed: at runtime or at compile-time. The next two paragraphs elaborate on representatives of these two categories. There are several languages with a runtime MOP, which allow adapting language semantics during the execution of programs. CLOS [KRB91] is a MOP for LISP, SchemeTalk [VR09] for Scheme, Joose4 for JavaScript, and Moose5 for Perl; Groovy has its own MOP [KG07]; and dynalang6 is a generic MOP for dynamic programming languages on the JVM. In comparison, while the above languages are general-purpose languages, REA focuses on providing semantic adaptations of DSLs. Still, adapting DSL semantics in REA is in the same vein as adapting GPL and OO semantics in MOPs. Like them, REA is inspired by the use of the open implementation principle in MOPs. The meta-protocols and reflection mechanisms allow runtime adaptations like runtime MOPs.
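The flavor of a runtime MOP can be illustrated with a minimal sketch: base-level calls are routed through a replaceable meta-object, and swapping the meta-object at runtime adapts the dispatch semantics. This Python sketch is only a rough analogue of mechanisms like Groovy's invokeMethod interception; all names are invented:

```python
class MetaObject:
    """Default meta-level: interprets base-level method calls."""
    def invoke(self, obj, name, *args):
        return obj.methods[name](*args)

class TracingMetaObject(MetaObject):
    """Adapted dispatch semantics: trace every call, then proceed."""
    def invoke(self, obj, name, *args):
        print(f"-> {name}{args}")
        return super().invoke(obj, name, *args)

class DomainObject:
    def __init__(self):
        self.meta = MetaObject()  # the meta-object is a first-class value
        self.methods = {"double": lambda x: 2 * x}

    def send(self, name, *args):
        # Every call is routed through the current meta-object.
        return self.meta.invoke(self, name, *args)

obj = DomainObject()
print(obj.send("double", 21))   # 42
obj.meta = TracingMetaObject()  # adapt semantics while the program runs
print(obj.send("double", 21))   # still 42, now traced
```

The causal connection is the point: because dispatch always consults the current meta-object, replacing it immediately changes how every subsequent call is interpreted.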
In contrast to the MOPs for these languages, which have been implemented from scratch, in REA developers do not implement a MOP from scratch—instead they embed a domain-specific meta-protocol on top of the MOP of the host language. Further, there are compile-time MOPs that limit semantic adaptations to be applied before runtime, which leverages compile-time optimizations. Compile-time MOPs have been implemented as extensions for mainstream languages: there is OpenC++ [CM94] for C++; and there are OpenJava [TCKI00] and Reflex [Tan04] for Java. Compile-time MOPs allow special optimizations that remove parts of the costs of MOPs. They may even virtually eliminate all costs of indirections at compile-time, when no adaptation is required and those indirections are not needed. In comparison, REA does not limit adaptation to compile-time; the compiler of the host language is not aware of the MOP and optimizes indirections only with standard optimizations. Still, in REA, developers can use familiar mechanisms for optimizations, such as partial evaluation. In REA, performance issues are addressed as [KRB91] proposes: by implementing sub-protocols that take care of optimizations in the runtime MOP. REA implements such a sub-protocol by optimizing the execution of aspects.

4The homepage: http://code.google.com/p/joose-js/.
5The homepage: http://www.iinteractive.com/moose/.
6The dynalang homepage: http://dynalang.sourceforge.net/index.html.


9.3.2. Composability of Languages

Many language approaches only support implementations of isolated languages; hence, they do not support language composition. There are a few language implementation approaches that propose special composition mechanisms for language composition. Language composition has been most prominently discussed in the context of syntax composition. While the theory of syntax composition is mature, there is a lack of good theories and mechanisms for semantic composition. This thesis does not even attempt to give a complete overview of syntactic and semantic composition. Rather, to suggest possible paths of research, it highlights the most important composition scenarios already available in various non-embedded approaches. It is common to use some form of an import or inheritance mechanism in heterogeneous approaches that use special frameworks, meta-languages, or generators for composing languages [AMH90, HM03, KRV08, Par08]. Single inheritance (and single imports) support only language extensibility, but they are inadequate for language composition. Only approaches that support either multiple imports or multiple inheritance are adequate for language composition. In comparison to REA, language inheritance and imports hierarchically decompose a language into sub-modules, but to compose languages there always needs to be a common top-level module that explicitly composes the languages into one. Therefore, when composing languages with a meta-language, the developer must enumerate all possible points where expressions can flow from one language module to another module. In contrast to this explicit composition, although REA also supports hierarchical decomposition, it leans toward composing languages as equal peers. REA is more closely related to union grammars and agile grammars [Cor06], which use implicit rules, make assumptions and use conventions when composing languages, and provide composition operators for composing grammars.
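Equal-peer composition with eager conflict detection (cf. the BlackBoxCombiner discussed above) might be sketched as follows, reducing languages to keyword-to-handler maps; all names are invented for this illustration:

```python
def combine(*languages):
    """Compose languages given as keyword-to-handler maps; report keyword
    conflicts eagerly instead of silently shadowing one language."""
    combined = {}
    for keywords in languages:
        for name, handler in keywords.items():
            if name in combined and combined[name] is not handler:
                raise ValueError(f"keyword conflict: {name!r}")
            combined[name] = handler
    return combined

queries = {"select": lambda *cols: f"SELECT {', '.join(cols)}"}
math = {"plus": lambda a, b: a + b}

lang = combine(queries, math)  # disjoint keywords: composes fine
print(lang["plus"](2, 3))      # 5

clashing = {"plus": lambda a, b: (a, b)}
try:
    combine(math, clashing)    # same keyword, conflicting semantics
except ValueError as err:
    print(err)                 # keyword conflict: 'plus'
```

The sketch shows only the black-box case; resolving rather than rejecting a conflict would require a combiner with interaction-specific resolution logic.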
In addition to grammar composition, REA provides semantic composition operators. Beyond the related work discussed so far, there are only a few more classes of language approaches that support semantic composition, which are discussed in the remaining paragraphs of this section. Attribute grammars [Knu68] have been found to be well suited for composing semantics [HM03]. Moreover, there are source code and bytecode weavers [Aßm03]. In model-driven engineering (MDE), languages are encoded as models. When models have interactions, they need to be composed by model and meta-model weaving [Dav03]. There is no fundamental difference between MDE and DSLs in REA, except that most models use visual languages and that REA focuses on composing textual languages. When composing two interacting models, composition mechanisms similar to those used in REA need to be used. In generative programming [CE00], developers use templates to implement language components, which can be parameterized by other templates. Although developers can compose languages, all possible interactions must be explicitly defined by each language component beforehand. In virtual machines (cf. Section 2.3.1, p. 25), it is particularly interesting that developers can decompose single technical components of one language into several dimensions [SHH09]. VMs need a decomposition that does not only decompose the primary functionality for evaluation,

but they also need to compose secondary technical concerns, such as memory management. As VMs need to decompose all aspects of their functionality, support for multi-dimensional separation of concerns becomes interesting. The findings of the approaches above and the open task to embed such mechanisms are a valuable source of ideas for future work.

9.3.3. Enabling Open Composition Mechanisms

This section compares POPART/REA to related language composition mechanisms.

Embedding of Crosscutting Composition: Reflection and MOPs have been used to implement AOP technology by adapting the semantics of objects. This approach has been followed by Sullivan [Sul01], AspectS [Hir01, Hir03], Lorenz and Kojarski [LK03, KL04], Composition Filters [BA01, BA04], Tanter [Tan04], AspectLua [CBF05], Aquarium [Wam08], and GroovyAOP [KG08]. Embedding AOP on top of MOPs is in the same vein as embedding AO concepts in REA. However, none of the other MOP-based solutions comes with a well-designed meta-aspect protocol that complements the MOP with a meta-interface that allows programmers to adapt the semantics of aspects. Moreover, adapting the semantics of aspect composition both at the application level and at run-time has not been in their focus. Kojarski and Lorenz [LK03, KL04] analyze the relation of reflection and AOP; they argue that AOP is a first-class computational reflection mechanism and therefore is at the same level as reflection. In comparison, REA/POPART uses its MAP as a first-class mechanism like a MOP. Furthermore, other crosscutting concepts have been embedded into host languages. In object-oriented programming, Havinga et al. [HBA08] have embedded object-oriented composition operators into a language called Coop. The composition operators enable extensibility and composability for objects. Developers can extend composition operators to embed new inheritance mechanisms for objects, which is not in the focus of REA. Unfortunately, Coop excludes meta-level extensions. In contrast, instead of providing extensibility and composability for objects, REA's language combiners focus on extending and composing languages as objects. Still, Coop is similar to REA with respect to embedding AO composition. Unfortunately, in Coop, aspects for DSLs are out of scope. In context-oriented programming (COP), ContextL embeds layers into CLOS, ContextS into Smalltalk, and ContextR into Ruby [AHH+09].
These works are in the same vein as embedding aspects; in addition, embedded layers have good support for handling interactions [CD08], similar to handling interactions in REA/POPART. In sum, on the one hand, embedding the above paradigm-specific mechanisms into REA would be interesting. On the other hand, it would be interesting to equip those paradigms with a meta-level like the one in REA.

Aspect-Oriented Programming: To review the support for crosscutting composition, this section explores related work from AOP, which is complementary to embedding AO concepts in POPART/REA.


Most prominently, existing AO languages vary in how they handle interactions and in what base language they support, whether it is general-purpose or domain-specific, as elaborated below.

Support for Composition Conflict Resolution: There are numerous general-purpose aspect languages that allow handling interactions, which have been surveyed by Kniesel [Kni07]. Aspect interactions occur between general-purpose aspects that advise the same program. On the one hand, an interaction can be direct, i.e., it directly manifests on the same advised part, e.g., two aspects advise the same join point, resulting in a semantic conflict. On the other hand, an interaction can be indirect, i.e., it influences another part, e.g., one aspect advises the data flow in a way that interferes with another aspect. Existing aspect languages provide a closed set of mechanisms for handling aspect interactions, of which only the most important are mentioned at this point. Advice ordering mechanisms provide resolution logic for aspect interactions that changes the composition semantics of aspects. Several approaches have been proposed, such as logical meta-programming [BMDV02], special composition modules that allow specifying logic to order advice [SVJ03], as well as rules to explicitly order, include, and exclude advice [Tan04, Tan06]. Particularly interesting are two extensions that have been proposed for AspectJ. Assaf et al. [AN08] propose DynamicAspectJ, an extension to AspectJ that allows dynamic weaving and scheduling of aspects. In DynamicAspectJ, the aspect developer can reflect on parts of the aspect context. Advice can reflect on the aspects' context; e.g., at a shared join point, it is possible to reflect on the co-advising pointcut-and-advice. By using the expression thisJoinPoint.context.applicablePAs, an aspect's advice can reify a list of all co-advising pointcut-and-advice and manipulate the list, for example to exclude a certain advice. These approaches support, to a certain degree, the customization of aspect interaction semantics; however, conflict resolution is not the only semantic variation in aspect languages.
Moreover, there are two major problems with the above approaches. First, the advice ordering strategies cannot be changed dynamically based on dynamic context. Second, there is a technology break between the way aspects are specified and the way conflicts are resolved. On the contrary, POPART allows specifying aspects and their resolution logic – as well as other semantic variations – in the same language, by using either the primary interface or the meta-interface. Moreover, the MAP allows the user to define a resolution strategy for context-dependent aspect interactions. To conclude and in comparison, the open mechanism for handling aspect interactions in REA/POPART has been inspired by related work. Moreover, REA introduces custom on-line techniques using its meta-aspect protocol, which developers can use to extend language semantics with domain-specific strategies for handling interactions.

Support for Extensible AO Semantics: Extensible aspect-oriented language infrastructures [SVJ03, Tan04, ACH+06, Tan06, KL07b] allow semantics to be adapted by building a special compiler or a special runtime.

205 9. Related Work

The abc compiler has been used for defining new kinds of pointcuts [ACH+06] as extensions of AspectJ. The Aspect SandBox (ASB) supports prototyping alternative AOP semantics and implementation techniques [MKD03], e.g., new kinds of pointcut designators and alternative weaving techniques. However, providing new concepts and semantics results in excessive changes throughout the interpreter, e.g., when a new type of join point is added [UMT04]. This is because ASB does not provide clear interfaces for controlling the underlying implementation strategy. There is no meta-interface available to programmers for tailoring AO semantics at the application level. In contrast to the extensible AO solutions above, POPART allows the adaptation of semantics at run-time by providing a meta-interface that can be extended using well-known object-oriented techniques. Marot et al. [MW10] propose OARTA, a declarative extension to the abc weaver that allows dynamic weaving and scheduling of aspects. OARTA extends the AspectJ language syntax so that a developer can name a pointcut, a so-called named pointcut, which allows referring to it later on. It is possible that aspects weave on other aspects, which allows advising aspect-oriented concepts, such as the creation of aspect instances, named pointcuts, and the execution of individual advice. OARTA allows implementing functionality similar to what is possible with POPART's meta-aspects. However, the two approaches are very different in the concepts and techniques used. OARTA is implemented as an invasive extension of a static weaver. To weave on aspects, OARTA circumvents protective restrictions in the weaver. In contrast, in POPART/REA, aspects and meta-aspects are first-class objects that are homogeneously embedded into the host language. In OARTA, there are no means to analyze aspects or debug them. Unlike in POPART/REA, syntax extensions for specific domains are not addressed.
Unlike POPART/REA, which supports composing aspects into DSLs, OARTA does not allow weaving into DSLs, since it uses the abc compiler with its AspectJ-like join point model. Reflex [Tan04, Tan06], JAMI [HBA08], MetaSpin [BMN+06], and FIAL [BMH+07] are aspect-oriented runtimes that employ aspect-oriented meta-models that are open for alternative AO semantics at build-time. In their meta-models, interfaces for AO abstractions are provided that can be extended, e.g., with new pointcut designators or composition strategies. The four approaches differ in their objectives and implementation techniques, which are, however, not relevant for a comparison to POPART. In general, they do not support changing the AO semantics at run-time. Further, Tanter [Tan10] proposes to support execution levels for aspects. Execution levels address the problem of infinite regression when aspects have erroneous pointcuts that confuse the base and the meta-level. He extends aspects with the concept of execution levels following the meta-helix architecture by Chiba et al. [CKL96]. The execution levels help keep track of base-level actions. The levels help avoid confusing base actions with actions inserted by aspects, which reside one level higher. Having information about the execution level allows restricting the visibility of actions by allowing one execution level to reason only about the level below it. Restricting the visibility of actions prevents confused levels from leading to infinite regressions by

aspects advising aspects at the same level. Embedding execution levels for aspects will be addressed in future work. MetaBorg has been used to enable syntactic extensions for Reflex, in an approach called ReflexBorg [TT08]. ReflexBorg allows designing syntactic extensions to Reflex in order to implement DSALs with a concrete syntax. MetaBorg rewrites DSAL aspects in concrete syntax to Reflex syntax. In contrast to REA, however, Reflex supports only weaving on Java with an AspectJ-like JPM; thus, composing aspects into DSLs is not supported. MetaAspectJ [ZHS04] is a macro extension to AspectJ, which allows meta-programming on aspects. MetaAspectJ supports syntactic extensions to AspectJ. But every generated aspect is an AspectJ aspect and must use the AspectJ semantics. Unlike in REA, dynamic aspects and weaving on DSLs are not supported. To conclude and in comparison, the variation points in the design space of AO languages found in related work have inspired the semantic variations of REA/POPART.

Support for Composition Conflict Resolution for DSALs: There are special aspect languages that support special conflict resolution when composing domain-specific aspect languages. PluggableAOP [KL05] addresses the problem of integrating a (general-purpose) base language with a set of third-party aspect extensions that define component-based aspect mechanisms targeted at different domains and that are composed in order to provide a mechanism addressing multiple domains. They implement an interpreter-based weaver that allows plugging in aspect languages as language components. All language components are composed by chaining the plug-ins in a pipe-and-filter architecture, which composes the language components' constructs with wrapping semantics. However, they state that wrapping semantics cannot handle compositions in which the aspect mechanism must reason about the interaction of all components. In comparison, PluggableAOP only composes a general-purpose aspect language with one DSAL. The authors themselves state that their mechanisms cannot handle complex compositions. In complex compositions, the composition mechanism must reason about all composed languages in order to compose them correctly. However, in the pipes-and-filters architecture of PluggableAOP, one weaving interpreter can only reason about subsequent interpreters in the pipe. In POPART, the meta-aspect protocol controls the complete evaluation of all composed aspect languages. Thanks to the abstract aspect composition process, there is a variation point in the MAP at which developers can provide logic that can reason about interactions of any aspect, regardless of what language the aspect originates from. AweSome [KL07b] provides a plug-in architecture similar to PluggableAOP, but it follows a compiler-based approach. To compose multiple aspect languages, they propose an abstract weaving process that can control the weaving order at the level of pointcut-and-advice.
They present a prototype implementation based on the abc extensible aspect compiler for static weaving. In comparison to POPART, the abstract weaving process has inspired the abstract aspect composition process in POPART’s MAP. In addition, POPART supports runtime aspect composition and dynamic re-ordering of advice. Further, with AweSome, it is not possible to

plug in further aspect languages after the weaver has been compiled and delivered to the end user. Consequently, the set of composed languages is closed. In contrast, POPART allows language developers to provide plug-ins and domain-specific scheduling strategies in the user domain.

9.3.4. Support for Concrete Syntax

The related work on concrete syntax in non-embedded approaches is complementary and interesting for future work. Traditionally built compilers and interpreters (cf. Section 2.1, p. 16) support concrete syntax well, but they have a heterogeneous nature, which hampers composition as desired in this thesis. Still, there are special non-embedding approaches that support composing concrete syntax, which are mentioned for the sake of completeness.

Language Workbenches: Language workbenches prescribe a detailed and closed process on how exactly to implement a language's model and how to compose languages. First, SAFARI [CDF+06] is a meta-tooling framework for generating language-specific IDEs in Eclipse. Second, IMP [CFSJ07] is another meta-tooling framework for Eclipse. Third, ANTLRWorks [BP08] is a development environment for the ANTLR parser generator, which focuses on providing tool support for ANTLR grammars. Fourth, Spoofax [KKV09, KV10] generates IDE tools from languages specified in the Stratego programming language. In comparison, language workbenches have a different focus than REA/TIGERSEYE, since they target experts. TIGERSEYE allows easily reusing existing models and APIs in the language implementation, such as Java libraries or existing EMF models, as demonstrated in [DWL09]. Language workbenches have a strong focus on tool support for end users, which is out of the scope of this thesis.

Generic Syntax Representations: Embedding requires a program representation that can be easily changed, extended, and composed. For the program representations that today's language infrastructures use, this is often not the case, e.g., because ASTs are generated, or because the bytecode representation cannot be extended without extending the back-end. Still, there are some representations that have some of these three properties and thus enable a compositionality comparable to that of REA. First, there are special syntactic encodings of program expressions: attribute grammars [Knu68], S-expressions, and Church encodings [Chu40, PE88]. Second, there are ATerms – an extensible and composable AST representation in which every term (AST node) carries an open set of annotations, so that new annotations can be added to existing terms. With such annotations, different compiler phases can incrementally contribute to the AST. ATerms have means for pattern matching and term rewriting, which TIGERSEYE/REA uses to implement extensible and composable source code transformations. In comparison, when REA encodes every expression as a method call in its abstract syntax using a prefix notation, or when it maps concrete syntax to this encoding, this is in the same vein as using the prefix notation of S-expressions. Using an encoding similar to S-expressions gives confidence that arbitrary expressions can be encoded. However, there are practical differences. In contrast, REA's program representation is rather bloated, and there is a rather large overhead

since every expression is processed as a reflective method call. Therefore, future work will also study alternative representations in order to find ways to optimize the current program representation.

9.3.5. Enabling Pluggable Scoping

There is extensive research in non-embedding-based approaches, which is complementary to the embedding of scoping. The literature on scoping mechanisms for interpreters and compilers is complementary to REA. W.r.t. pluggable scoping in interpreters, there are interpreters with first-class environments for scoping language constructs, with which REA is in the same vein. First, in [IJF92, LF93], Friedman et al. adapt the scoping strategy of an interpreter that adapts the variable lookup using reflection. Second, REA is in the same vein as scoping in [AS96], which, like REA, structures environments as maps that are inter-connected with other environments. Third, Tanter [Tan09] uses first-class scoping strategies to adapt the scope of language constructs, such as variable bindings, aspects (similar to [PGA01, PAG03, AO10]), and layers (similar to [CH05, HCH08, Tan08, HPSA10]). Fourth, REA's dynamic scoping and activation of aspects in DSLs are in the same vein as dynamically scoped aspects in interpreters for DSLs, such as AO4BPEL [CM04, Cha08]. For scoping in compilers, there is the compiler construction system Eli [KW91], which systematically supports implementing pluggable scoping as a separate compiler phase. Such non-embedding-based implementations of scoping differ mostly in that the scoping implementation is completely controlled by the creator of the language, but developers cannot adapt the scoping strategies in the user domain. Whenever the scoping strategy changes, the creator has to recompile the language infrastructure, which makes user extensions impracticable.

9.3.6. Enabling Pluggable Analyses

This thesis does not attempt to summarize the literature on analyses in non-embedded approaches, which is complementary. The literature can be distinguished by the representation on which the analysis is performed, namely models, source code, ASTs, and bytecode.

Models: First, there are analyses of models, which allow sophisticated analyses at a high level of abstraction. There are numerous representations, such as finite state machine models, Petri nets, graph-based models, and domain-specific models. Further, there are various approaches to model checking, simulation, and verification. In some approaches, one can plug in multiple analyses in parallel, but it is complicated to compose multiple analyses. This work is complementary to REA; the interested reader is referred to one of the surveys [JM09].

Source Code: Second, source code analysis processes programs at their textual representation only. Simple lexical analyses can be implemented at the source code level; regular expressions are the best example of such an analysis. However, source code is almost never the right abstraction level for analyses. In particular, source code analyses do not cover semantic analyses; therefore, AST analyses are often the preferred choice.
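As a minimal illustration of such a purely lexical, regular-expression-based analysis, consider scanning source text for TODO comments. This is a hedged sketch, not part of REA; the scanner class and its comment convention are hypothetical. Note how the analysis sees only text, not program structure, which is exactly the limitation mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical lexical analysis: finds TODO comments via a regular expression.
// Being purely textual, it cannot, e.g., relate a TODO to its enclosing
// method -- that would require an AST.
public class TodoScanner {
    private static final Pattern TODO = Pattern.compile("//\\s*TODO:?\\s*(.*)");

    public static List<String> findTodos(String source) {
        List<String> todos = new ArrayList<>();
        Matcher m = TODO.matcher(source);
        while (m.find()) {
            todos.add(m.group(1).trim());
        }
        return todos;
    }

    public static void main(String[] args) {
        String src = "int f() {\n  // TODO: handle overflow\n  return 1; // TODO cleanup\n}\n";
        System.out.println(findTodos(src)); // [handle overflow, cleanup]
    }
}
```

The sketch also shows why such analyses are brittle: a TODO inside a string literal would be reported just the same, because the lexical level cannot distinguish the two.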


Abstract Syntax Trees: Third, traditionally program analysis is implemented by analyzing the AST of a program. Most prominently, the visitor pattern is used to compose several analyses of an AST. ASTs are a very good representation for analyses. But in many language architectures, the AST is only accessible at compile time; therefore, it is not possible to implement dynamic analyses. In contrast, AST-based interpreters [Rub] expose the AST so that developers can use inspection capabilities to embed dynamic analyses. Similarly, REA supports inspecting the abstract syntax at runtime, because the evaluation of each expression is mapped to a reflective method call.

Bytecode: Fourth, there is a large body of work on bytecode analyses, such as for bytecode verification and security. Bytecode is a relatively good representation for many analyses, and it allows portable analyses across different languages and platforms. However, for high-level analyses, the bytecode representation is often too low-level. In bytecode, most of the control structures have been removed, and it is hard to recover them from the bytecode.

9.3.7. Enabling Pluggable Transformations

The work on transformations in non-embedded approaches is complementary to embedding. Nonetheless, non-embedding approaches have a different focus than transformations in embeddings. Non-embedding approaches often facilitate transformations that embedding approaches can reuse from their host compiler, such as inlining optimizations or β-reduction. While embedding compile-time optimizations, such as inlining, has been studied by Seefried [SCK04], adopting dynamic optimization techniques as used by just-in-time compilers [AFG+00] has been out of scope. Despite the different focus, there are two interesting issues. First, whether in the approach the source and target languages/models are causally connected. Second, whether the approach allows interactions between transformations.
These two issues are discussed in the following.

Causally Connected Source and Target: Most non-embedded approaches support transformations, but often the source and the target language or model are not causally connected. In model-driven engineering, REA is in the same vein as executable models and causally connected models, such as models-at-runtime [NB09]. In compiler approaches, a similar mechanism is debugging and structure-preserving compilation [BHMO04].

Interactions: There are only a few transformation approaches that support modular transformations that allow interactions. In model-driven engineering, there is ongoing work to enable rule-based transformations (e.g., QVT) and to control the interaction between rules. Particularly interesting are rule-based transformation approaches in which rules compose with function composition semantics, i.e., confluence [Vis05]. In such systems, the order of transformations is not relevant. In compiler approaches, it is particularly interesting how JastAdd [HM03] combines transformations on the AST. JastAdd uses aspects to add attributes to an attribute grammar, whereby attributes are lazily calculated and automatically scheduled on demand.
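The confluence property mentioned above can be made concrete with a tiny sketch: two rewrite rules, x + 0 → x and x * 1 → x, that match disjoint patterns and therefore yield the same result regardless of application order. The term encoding and rule names below are hypothetical, chosen only for this illustration; real systems like Stratego or QVT operate on proper term structures, not strings.

```java
// Hypothetical term rewriting sketch: terms are strings in prefix notation.
// Because the two rules match disjoint patterns, they are confluent:
// applying them in either order produces the same normal form.
public class Rewriting {
    // Rule 1: x + 0 -> x, on prefix terms like "(+ x 0)"
    static String dropPlusZero(String term) { return term.replace("(+ x 0)", "x"); }

    // Rule 2: x * 1 -> x
    static String dropTimesOne(String term) { return term.replace("(* x 1)", "x"); }

    public static void main(String[] args) {
        String term = "(- (+ x 0) (* x 1))";
        String order1 = dropTimesOne(dropPlusZero(term)); // rule 1, then rule 2
        String order2 = dropPlusZero(dropTimesOne(term)); // rule 2, then rule 1
        System.out.println(order1.equals(order2)); // true
        System.out.println(order1); // (- x x)
    }
}
```

When rules overlap (e.g., two rules matching the same subterm), this order independence is lost, which is precisely when a composition mechanism must reason about rule interactions.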

10. Case Studies: Review of the Support for the Desirable Properties

This chapter qualitatively evaluates REA's support for language evolution by means of several case studies. Indeed, REA suits implementing extensions, compositions, interactions, and reflections on language embeddings. For the integration of DSLs into current software, it is important that REA's concepts can be supported by other host languages. Because REA makes few assumptions about the language developer's skills and the available host language features, REA's concepts can be applied by most developers, and its techniques can be adopted with little change in other host languages. To review the quality of the techniques that this thesis proposes, this chapter studies how well REA's core and its extensions address the desirable properties from Section 3. For each property, the review demonstrates that REA supports the desirable property by implementing its scenarios. To determine the quality of the support, the review investigates the general support for the property by the current concepts and draws conclusions.

10.1. Extensibility

This section shows how REA supports all of the proposed evolution scenarios.

10.1.1. Adding New Keywords

To recapitulate, in Section 3.1.1 (p. 31), we have motivated adding keywords to existing languages. In Section 4.2.1 (p. 84), we have already discussed the principal support for one language extension using language polymorphism. Still, it remains to be shown how well language polymorphism supports multiple incremental extensions. Therefore, in this section, we review the support for multiple language extensions. The section first presents the implementations of several extensions. Then, the support for continuous language evolution is discussed, whereby we identify limitations of hierarchical decomposition using inheritance. This section shows that when using single inheritance to evolve languages, language inheritance suffers from the same limitations as ordinary class inheritance.

Example Scenarios: Consider extending SIMPLELOGO with the three example extensions EXTENDEDLOGO, CONCISELOGO, and UCBLOGO that have been proposed in Section 3.1.1 (p. 31). Although the three extensions are admittedly similar and each extension requires the same techniques to be implemented, we still present the implementation of multiple extensions to demonstrate the limitations of language polymorphism for continuous language evolution.


As Section 5.4.1 (p. 125) has already discussed the extension for EXTENDEDLOGO in Listing 5.5, this section only sketches the two other extensions. As the second example extension, consider implementing CONCISELOGO, which adds shortcut commands such as fd, bd, rt, and lt. Listing 10.1 shows the language interface and implementation of CONCISELOGO that defines such shortcut commands. The interface IConciseLogo declares additional keyword methods with shortcut names. Each of these shortcut keyword methods has an identical signature to its verbose version, except for its shortened method name, as sketched in Listing 10.1, lines 9–11. The façade class implements the shortcut methods so that every shortcut command (e.g., fd, lines 9–11) calls its corresponding verbose command (e.g., forward, line 10), as described by the following mapping: fd→forward, bd→backward, rt→right, and lt→left.

1  interface IConciseLogo extends ISimpleLogo {
2    void fd(int n);
3    void bd(int n);
4    void rt(int a);
5    void lt(int a);
6  }
7
8  class ConciseLogoInterpreter extends SimpleLogoInterpreter implements IConciseLogo {
9    void fd(int n) {
10     forward(n);
11   }
12   ...
13 }

Listing 10.1: Implementation of shortcut commands in CONCISELOGO
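The delegation pattern of Listing 10.1 can be sketched as self-contained, runnable code. The stub classes below are hypothetical stand-ins for the actual SIMPLELOGO implementation: instead of drawing, the base façade merely records the commands it receives, so that the shortcut-to-verbose delegation becomes observable.

```java
// Hypothetical stub for the SimpleLogo facade: records commands instead of drawing.
class SimpleLogoStub {
    final StringBuilder trace = new StringBuilder();
    void forward(int n)  { trace.append("forward ").append(n).append(';'); }
    void backward(int n) { trace.append("backward ").append(n).append(';'); }
    void right(int a)    { trace.append("right ").append(a).append(';'); }
    void left(int a)     { trace.append("left ").append(a).append(';'); }
}

// ConciseLogo-style extension: each shortcut delegates to its verbose twin,
// so the semantics of the base keyword is reused unchanged.
class ConciseLogoStub extends SimpleLogoStub {
    void fd(int n) { forward(n); }
    void bd(int n) { backward(n); }
    void rt(int a) { right(a); }
    void lt(int a) { left(a); }
}

public class ConciseLogoDemo {
    public static void main(String[] args) {
        ConciseLogoStub logo = new ConciseLogoStub();
        logo.fd(50);
        logo.rt(90);
        System.out.println(logo.trace); // forward 50;right 90;
    }
}
```

The recorded trace shows that a program written against the concise syntax executes exactly the verbose keywords, which is what makes the extension conservative with respect to the base semantics.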

As the third example extension, consider implementing a subset of UCBLOGO that adds the repeat abstraction operator. Listing 10.2 shows an excerpt of the language interface and implementation of UCBLOGO. The IUCBLogo interface (lines 1–4) declares a repeat method (line 2) that takes the number of iterations as the first and a closure as the second parameter. Since there is only one parameter to the abstraction operator in addition to the closure, the developer does not need to use a HashMap for the parameters but can use a primitive parameter. Because the last parameter is a closure, repeat expressions can pass their command sequence in the closure parameter outside the brackets, such as repeat(i){...}. The language façade UCBLogoInterpreter (lines 6–14) implements IUCBLogo with the declared repeat method (lines 7–12). Inside the implementation of repeat, as for all abstraction operators, we must take care that keywords nested inside the abstraction operator's body are dispatched correctly, by setting the delegate of the closure body to the bodyDelegate. Next, the implementation of repeat calls the body n times.

Review: Language polymorphism enables reuse and incremental extension at the different levels of the architecture. At the language interface level, language developers can reuse the


1  interface IUCBLogo extends ISimpleLogo {
2    void repeat(int n, Closure body);
3    ...
4  }
5
6  class UCBLogoInterpreter extends SimpleLogoInterpreter implements IUCBLogo {
7    void repeat(int n, Closure body) {
8      body.delegate = super.bodyDelegate;
9      for (int i=0; i < n; i++) {
10       body.call();
11     }
12   }
13   ...
14 }

Listing 10.2: Implementation of the repeat command in UCBLOGO

interface of the base language, whereby they reuse parts of the abstract syntax definition, similar to what is supported by grammar inheritance [AMH90, KRV08, Par08]. But in contrast to only inheriting the syntax, language polymorphism additionally allows reusing the execution semantics. At the language façade level, developers can reuse the method implementations of the base language's façade class, whereby they reuse parts of the syntax-to-semantics binding of the base language keywords. At the language model level, developers can inherit classes of the language model, whereby they reuse parts of the evaluation protocol of the base language. With language polymorphism, REA supports late binding of syntax to its semantics. For late-bound semantics, the meta-object can override the binding for one or several keywords, façades, and languages. Hence, REA fosters the reuse of language implementations for extensions, since parts of the language implementation can be refined selectively. Further, the late binding enables safe extensions for which the language interface guarantees backward compatibility with old language versions as long as all semantic invariants are met. With the language polymorphism mechanisms and late binding of syntax and semantics, REA has full support for extensibility with new keywords and conservative extensions (adding keywords, conservative extensions: ). Further, with its meta-level, REA supports late semantic adaptations in the user domain (late semantic extensions: ).

10.1.2. Semantic Extensions

10.1.2.1. Conservative Semantic Extensions

In Section 3.1.2.1 (p. 34), we have discussed the need for conservative semantic extensions of a language. Therefore, this section reviews how language polymorphism enables conservative semantic extensions by overriding existing keywords at the façade level and domain operations at the model level.

Example Scenario: To exemplify REA's support for conservative semantic extensions, we discuss implementing CONCURRENTLOGO as proposed in Section 3.1.2.1 (p. 34). To provide support

for concurrency, a language developer must associate turtle abstractions with threads that execute their commands. For this, CONCURRENTLOGO needs to refine several COMPLETELOGO keywords. Because CONCURRENTLOGO is a conservative semantic extension, it must not extend the ICompleteLogo language interface, and it preserves compatibility. The language developer only needs to extend the façade by overriding methods. Overriding an existing method binds the given syntax to new, extended semantics. For example, in the case of CONCURRENTLOGO, the extensions rebind part of the COMPLETELOGO keywords to a concurrent execution semantics. Still, the new façade implements the original ICompleteLogo language interface. The subclass ConcurrentLogoInterpreter in Listing 10.3 adds two fields, namely turtles (line 2), which stores a list of all turtles, and threadToTurtle (line 3), which stores a mapping from a thread to the turtle that the thread is executing. Further, the subclass overrides several keyword methods. It overrides the method turtle (lines 5–14) to associate turtles with a thread that executes their commands, adds the turtle to the turtles list, and then sets the delegate of the turtleBody. However, the method does not directly call the turtleBody as in the super class. Instead, it wraps the turtleBody closure into a Thread (line 10). Next, it associates the thread with the turtle by storing this information in the threadToTurtle map. To prevent one turtle from overpainting the drawing of another turtle with a different color, before starting the thread, the method checks that all currently running turtles have the same paint color; otherwise, the current thread waits until all turtles with a different pen color have finished painting. Finally, the method starts the thread (line 13), which will call the closure.
Moreover, the subclass overrides all other drawing commands (e.g., forward), such that every command paints with the right turtle when the command is executed in a thread, as exemplified by the overridden version of forward (lines 16–20).

Review: We first review the support for one conservative extension in the example scenario and then the general support for this property. In the example scenario, we discuss which part(s) of a language implementation the developer needs to extend for a conservative extension. To implement a conservative extension, it is sufficient to inherit from the base language façade and selectively override the relevant keyword methods. For backward compatibility, it is important that developers do not extend the language interface and that not all keywords need to be redefined. For example, CONCURRENTLOGO only overrides the keywords turtle and forward, and therefore it is a conservative extension and preserves backward compatibility. To review the general support for conservative semantic extensions, we elaborate which parts of the architecture can be extended. Since a conservative extension aims to preserve the existing syntax of the language, unsurprisingly, the language interface level does not need to change. Only the façade and model levels are subject to extensions. The language façade level is central for implementing conservative extensions. The previous section has already discussed its role in fostering the reuse of parts of an embedding implementation. While language polymorphism enables reusing parts of the syntax-to-semantics binding


1  class ConcurrentLogoInterpreter extends CompleteLogoInterpreter implements ICompleteLogo {
2    List turtles;
3    Map threadToTurtle = new HashMap();
4    ...
5    void turtle(HashMap params, Closure turtleBody) {
6      turtle = new Turtle(name, new Color(color));
7      turtles.add(turtle);
8      turtleBody.delegate = super.bodyDelegate;
9      ...
10     Thread closureInThread = new Thread(turtleBody as Runnable);
11     threadToTurtle.put(turtle, closureInThread);
12     ... // check that currently running turtles have the same paint color, otherwise wait
13     closureInThread.start();
14   }
15
16   void forward(int n) {
17     Thread thread = Thread.currentThread();
18     Turtle perThreadTurtle = threadToTurtle.get(thread);
19     perThreadTurtle.moveForward(n);
20   }
21   ...
22 }

Listing 10.3: Implementation of concurrent drawing for multiple turtles in CONCURRENTLOGO

from the base implementation, inheritance also allows overriding parts of this binding. For example, the above conservative extension reuses parts of the base façade implementation, but for concurrent drawing, all move command methods need to be overridden. This way, the developer adapts the syntax-to-semantics binding of these keywords to bind to concurrent execution semantics. Still, when overriding methods, the extension must not violate the base façade class's semantic pre-conditions, post-conditions, and invariants. In contrast to façades, sometimes the language model level does not need to be modified for conservative semantic extensions at all.

10.1.2.2. Semantic Adaptations

Section 3.1.2.2 (p. 35) motivated the need for semantic adaptation of languages. To cope with changing requirements in the user domain, a developer can implement a semantic adaptation for individual users at the meta-level.

Example Scenario: Reconsider implementing the support for collision detection for CONCURRENTLOGO with the CollisionPreventingTurtleMetaClass that was already sketched in Section 5.4.2 (p. 125). Listing 10.4 elaborates the implementation of a special meta-object for the Turtle class. The class CollisionPreventingTurtleMetaClass overrides invokeMethod in order to intercept all method calls to Turtle objects. The meta-object only changes the calls to the moveForward and the moveBackward method (line 17). When one of these two methods is called, the meta-object checks for possible collisions (line 20) by calling a helper method (line 10) that returns true if executing the


commands would result in a collision. In case of a possible collision, the meta-object delays1 the execution of the move operation to prevent the collision (line 21). Otherwise, the meta-object forwards the call (line 24) to its super class (the default meta-object), which immediately executes the call.

1  class CollisionPreventingTurtleMetaClass extends MetaClassImpl {
2
3    IConcurrentLogo interpreter;
4
5    CollisionPreventingTurtleMetaClass(Class theClass, IConcurrentLogo interpreter) {
6      super(theClass);
7      this.interpreter = interpreter;
8    }
9
10   boolean isTurtleFacingACollision(Turtle turtle, List otherTurtles) { ... }
11
12   Object invokeMethod(Class sender, Object object, String methodName, Object[] originalArguments, ...) {
13     assert (object instanceof Turtle);
14     Turtle turtle = (Turtle)object;
15     Object result;
16     synchronized (interpreter) {
17       if (methodName.equals("moveForward") || methodName.equals("moveBackward")) {
18         List turtles = new LinkedList(interpreter.getTurtles());
19         turtles.remove(turtle);
20         while (isTurtleFacingACollision(turtle, turtles)) {
21           Thread.currentThread().sleep(100); ... // preventing a collision by waiting at the current position
22         }
23       }
24       result = super.invokeMethod(sender, object, methodName, originalArguments, ...);
25     }
26     return result;
27   }
28 }

Listing 10.4: Excerpt of the meta-level extension for collision detection

Review: To implement the semantic adaptation, a developer only needs to implement a special meta-object that adapts the semantics of the default implementation of CONCURRENTLOGO. Indeed, the developer does not have to modify the code of the default language implementation; neither the façade nor the model has to be changed. Because the domain type Turtle defines the methods moveForward and moveBackward as two implicit extension points inside the language model, the developer who adapts the language can override these methods. These extension points open up the language implementation by providing a well-defined interface. Using the extension mechanism, language developers can later provide new variations. The language developers deliver the resulting open implementation of the language to the end-user domain.

1 The turtle is set into a waiting position, i.e., it puts its pen up (it could additionally step aside) and waits until the turtles passing by are out of its way; then it steps back to the old position and continues drawing.

216 10.1. Extensibility

Note that the developer who provides the extension and the developer who sets up the extension can be expected to be different people. The first developer, who implements the meta-level extension, needs to have a good understanding of the Groovy MOP and needs to know the language implementation details. This role can be filled either by a language developer in the user domain or by an end user with advanced skills. The second developer, who sets up the meta-level extension, only needs to know how to replace meta-objects. Again, this can be either a language developer or an end user. To enable the new meta-object for executing a program under alternative semantics, the second developer simply needs to set up the meta-level extension before evaluating the program.

To review the general support for semantic adaptations, we elaborate what parts of the architecture can be adapted by meta-level extensions. To cope with changing requirements in the user domain, the language developers of the default language implementation need to build the implicit extension points into the language implementation. Only if a corresponding method or field has been declared in the language implementation can it be extended at the meta-level. The developer can adapt all parts of the base architecture. Specifically, the developer can use meta-objects to adapt all levels: the interface level, the façade level, and the model level. The base levels do not need to be invasively modified for a late semantic adaptation; only new code for the meta-level needs to be provided. The support for late semantic adaptation is similar to late semantic adaptation in other software systems. The meta-level extensions support planned reuse [Bos00], i.e., the extension points have been explicitly decided by the providing language developer.
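The setup performed by the second developer can be illustrated with a small sketch. The thesis realizes this with the Groovy MOP; the following plain-Java miniature (all names are hypothetical, not the POPART API) only shows the principle: a meta-link routes every base-level call through the current meta-object, so swapping that object before evaluation changes the semantics of all subsequent calls.

```java
import java.util.ArrayList;
import java.util.List;

// Heavily simplified, hypothetical stand-in for a meta-object protocol (it is NOT
// the Groovy MOP or the POPART API); it only illustrates the meta-link principle.
class MetaClassImpl {
    Object invokeMethod(Object receiver, String methodName, Object[] args) {
        // Default semantics: a real MOP would reflectively dispatch to the receiver.
        return methodName + " executed";
    }
}

// A meta-level extension: intercept the call, add behavior, proceed to the default.
class LoggingTurtleMetaClass extends MetaClassImpl {
    final List<String> log = new ArrayList<>();
    @Override
    Object invokeMethod(Object receiver, String methodName, Object[] args) {
        log.add(methodName); // adapted semantics: record every intercepted call ...
        return super.invokeMethod(receiver, methodName, args); // ... then proceed
    }
}

class Turtle {
    // The meta-link: every base-level call is routed through the current meta-object,
    // so replacing this object before evaluating a program switches the semantics.
    MetaClassImpl metaClass = new MetaClassImpl();

    Object moveForward(int steps) {
        return metaClass.invokeMethod(this, "moveForward", new Object[]{ steps });
    }
}
```

With such a meta-link, the second developer's whole job amounts to one assignment before the program runs, e.g. `turtle.metaClass = new LoggingTurtleMetaClass();`.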
When the methods and fields in the levels of the architecture have not been designed with possible extensions in mind, it may happen that the adapting developer cannot provide the right code for a concrete semantic adaptation.

Meta-level extensions enable great flexibility for language developers and end users, but one can argue that meta-level extensions are too dangerous to be used by end users. Since every class, method, and field in a language implementation can be adapted, a meta-level extension may also corrupt the existing language semantics, resulting in incorrect semantics. Unfortunately, in Groovy, there is no special mechanism available that helps with enforcing that meta-level extensions must respect possible semantic invariants. Therefore, REA needs to address the problem of missing guarantees for semantic adaptations.

To address the problem of missing guarantees for semantic adaptations, this thesis argues that one can provide generic and specific abstractions for semantic adaptations. First, a language developer can provide pre-defined meta-objects that encompass extension points themselves. Other language developers can extend these meta-objects, whereby they rely on the abstractions built into them. Effectively, in this case the language developers who extend pre-defined meta-objects are implementing a sub-protocol [KRB91] of the meta-protocol. Such a sub-protocol provides a specific abstraction of what a developer can adapt with it. Second, a language developer can use aspects instead of meta-objects to adapt the base levels in the architecture. In this case, the generic abstractions built into pointcut and

advice languages restrict what can be modified by language developers and users, which limits adaptation to well-defined join points. Third, a language developer can provide end users with a DSL for implementing high-level semantic adaptations with a domain syntax. The DSL's syntax and abstractions provide domain-specific guarantees for meta-level adaptations. When providing such abstractions on top of the architecture, there are well-defined guarantees. Those guarantees tame the meta-objects in the architecture, so that semantic adaptations cannot lead to incorrect program semantics.

While meta-level extensions provide great flexibility, there is an important limitation: there may be only one meta-level extension for a particular language instance at a time. For example, we could implement the validation of the domain-specific constraint in a meta-object in a modular way. However, in real scenarios, there could be multiple constraints to be validated for the same language and its domain objects. For example, we do not only want to prevent collisions, but we would also like to prevent turtle robots from doing overtime or running out of energy. When multiple such constraints need to be enforced on the same language constructs and on the same domain object, their enforcement logic needs to be combined. But because the meta-link can point to only one meta-object, only one meta-object (or, respectively, enforcement point) is allowed to be used for one object at a time. Unfortunately, in this setting, it is hard to implement such a meta-object in a modular way, because the enforcement logic of all constraints is tangled. There is a need for a more composable mechanism.

To overcome the issue that meta-objects are not modular, in REA, it is possible to use aspect-oriented programming for semantic adaptations.
Since REA is homogeneous and causally connected, a language developer can deploy an aspect on a language interface, façade, or model class in the same way as a user can deploy an aspect on an application-level class. Instead of implementing a tangled meta-object, in REA, the developer can implement multiple modular aspects that perform the enforcement. To enforce a constraint, each aspect defines a set of pointcut-and-advice pairs. The pointcut matches the relevant enforcement points at which the aspect adds the advice, which contains the enforcement logic. While each constraint is localized in a separate aspect module, REA composes all aspects with the base level at runtime, whereby aspects at the language level have the full support of dynamic deployment and scoping of aspects, as well as full quantification capabilities, including control-flow-sensitive predicates and if-pointcuts.
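The pointcut-and-advice mechanism described above can be miniaturized as follows. This plain-Java sketch (hypothetical names; POPART's actual aspect language is far richer, with dynamic deployment, scoping, and if-pointcuts) only shows why aspects compose where a single meta-object does not: any number of independently deployed constraint aspects are woven around the same call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Miniature pointcut-and-advice model (hypothetical names, illustration only).
class Aspect {
    final Predicate<String> pointcut; // which join points (here: method names) to match
    final Consumer<String> advice;    // enforcement logic to run before the base call
    Aspect(Predicate<String> pointcut, Consumer<String> advice) {
        this.pointcut = pointcut;
        this.advice = advice;
    }
}

class AspectWeaver {
    final List<Aspect> deployed = new ArrayList<>();
    void deploy(Aspect a) { deployed.add(a); }

    // Every keyword-method call is routed through here; each independently deployed
    // constraint aspect whose pointcut matches runs its advice, then the base
    // semantics executes -- no single tangled meta-object is needed.
    void call(String methodName, Runnable baseSemantics) {
        for (Aspect a : deployed)
            if (a.pointcut.test(methodName)) a.advice.accept(methodName);
        baseSemantics.run();
    }
}
```

In this toy model, a collision constraint and an energy constraint would each be one `Aspect` deployed separately, while the meta-object approach would have to merge both checks into one `invokeMethod` body.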

10.2. Composability of Languages

REA supports compositionality of languages because languages consist of a set of first-class objects, and there are language combiners that can reason about those first-class objects in order to compose them, thereby preventing, detecting, and resolving interactions between those objects.

10.2.1. Composing Languages without Interactions

To recapitulate, in Section 3.2.1 (p. 37), we have motivated composing languages without interactions. We have already discussed the principal support for the composition of independent languages using the BlackBoxCombiner and its implementation in Section 4.2.3 (p. 102). Therefore, this section only sketches composing multiple independent languages.

Example Scenario: For the review, reconsider composing COMPLETELOGO and FUNCTIONAL into the composite language FUNCTIONALLOGO, as has already been discussed in Section 5.4.3.1 (p. 128).

Review: When languages do not have interactions in their specifications, such as COMPLETELOGO and FUNCTIONAL, the BlackBoxCombiner can be used to compose the two languages without requiring language developers to provide additional code for the language composition. Each language component remains encapsulated (without interactions: ), which is advantageous. On the one hand, the BlackBoxCombiner fosters reuse. On the other hand, the language developer is supported when composing the languages, because unnoticed syntax conflicts are automatically detected. This encapsulation provides a limited control for interactions.

10.2.2. Composing Languages with Interactions

10.2.2.1. Syntactic Interactions

In Section 3.2.2.1 (p. 39), we have motivated composing languages with syntactic interactions. When languages have syntactic interactions, the language developer should be supported by automatically detecting and handling those interactions. Therefore, this section reviews the BlackBoxCombiner's support for dealing with syntax conflicts.

Example Scenario: Consider the program in Listing 10.5 that tries to compose the two syntactically conflicting languages FUNCTIONAL and STRUCTURAL, which both declare the ambiguous keyword define. When the developer tries to compose the two conflicting languages in line 2, the BlackBoxCombiner raises an exception that states the signatures of the conflicting keyword methods.

1 def structural = new Structural(); def functional = new Functional();
2 def combinedLanguage = new BlackBoxCombiner(functional, structural); // raises an exception
3 ...
4
5 Closure program = {
6   ...; define(...); ...
7 }
8 program.delegate = combinedLanguage;
9 program.call();

Listing 10.5: A program excerpt that tries to compose FUNCTIONAL and STRUCTURAL with the BlackBoxCombiner

Review: The BlackBoxCombiner can automatically detect and handle syntactic conflicts between constituent languages in a composition. Possible syntactic conflicts are detected and handled

at composition time, when the developer tries to compose the languages; thus, conflicts are detected before evaluating the program closure. For example, in Listing 10.5 (p. 219), the syntactic conflict is detected in line 2 before even reaching the program closure in lines 5–7. Hence, developers are protected from ambiguous language compositions even if they disregard syntax conflicts while composing languages (syntactic interactions: ).

To detect the syntax conflict before executing the program closure, it is crucial that the BlackBoxCombiner can use the meta-level of the languages to detect the conflicts. Specifically, the BlackBoxCombiner uses introspection to check whether the sets of method signatures of all façade objects passed to the BlackBoxCombiner's constructor are disjoint. Since conflicting signatures between language façades are reported, developers can easily find the syntax conflicts in the corresponding language interfaces and façade implementations.

Still, there are two limitations. First, the BlackBoxCombiner assumes that syntactic conflicts cannot be resolved and always raises an error, which ensures safe compositions but does not support resolving syntactic conflicts. Second, syntactic conflicts are not detected and reported at compile time.
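The introspection-based disjointness check can be sketched in a few lines. The following plain-Java sketch uses invented names (it is not the actual POPART BlackBoxCombiner): the constructor collects the declared keyword methods of every façade and fails as soon as two different façades declare the same keyword.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (invented names, not POPART's actual BlackBoxCombiner):
// at composition time, introspect each facade and require keyword sets to be disjoint.
class SyntaxConflictException extends RuntimeException {
    SyntaxConflictException(String msg) { super(msg); }
}

class BlackBoxCombinerSketch {
    // keyword method name -> the facade that owns it
    final Map<String, Object> keywordToFacade = new HashMap<>();

    BlackBoxCombinerSketch(Object... facades) {
        for (Object facade : facades) {
            for (Method m : facade.getClass().getDeclaredMethods()) {
                Object previous = keywordToFacade.put(m.getName(), facade);
                // Two different facades declaring the same keyword: raise at
                // composition time, before any program closure is evaluated.
                if (previous != null && previous != facade)
                    throw new SyntaxConflictException("ambiguous keyword: " + m.getName());
            }
        }
    }
}

// Tiny stand-ins for language facades (illustrative only).
class FunctionalFacade { public void define(Map<?, ?> params) {} }
class StructuralFacade { public void define(Map<?, ?> params) {} }
class LogoFacade       { public void forward(int steps) {} }
```

Composing `FunctionalFacade` with `LogoFacade` succeeds (disjoint keywords), while composing it with `StructuralFacade` raises the exception, mirroring line 2 of Listing 10.5.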

10.2.2.2. Semantic Interactions

In Section 3.2.2.2 (p. 40), we have motivated composing languages with semantic interactions. For such compositions, one needs a combiner that is more liberal than the BlackBoxCombiner; therefore, the language developer can use, e.g., the LinearizingCombiner (cf. Section 4.2.3.2, p. 107) that does not prevent interactions between languages. When composing languages with this combiner, on the one hand, the language developer needs to extend the languages that are influenced by the interaction; on the other hand, each language implementation should stay encapsulated. Therefore, this section reviews the LinearizingCombiner's support for dealing with semantic interactions, dependencies, and conflicts.

Example Scenario: Consider implementing the composition of FUNCTIONAL, STRUCTURAL, and CONSOLE that provides the operation writeln to print out string representations of either functions or data types; for this, writeln has to look up names in the correct environment, as argued in Section 3.2.2.2 (p. 40). First, we discuss the implementation of CONSOLE. After that, we discuss how to compose this implementation with other language implementations.

Listing 10.6 presents the language façade of Console, whose functionality depends on having access to the environments of the two other languages, FUNCTIONAL and STRUCTURAL. To allow the Console to access their environments, one has to pass references to their façades to one of the constructors of Console in lines 5–7, which store references to those façades. To expose access to the environments, the developer has extended both façades with a getter for retrieving the environment via the method getEnvironment(), which is declared by the extended interfaces IFunctionalEnv and IStructuralEnv, respectively. In lines 9–17, the implementation of the writeln operation looks up the object in one of the environments using the name identifier and prints its string representation. Note that the code uses Groovy's safe-navigation operator ("?.") to safely navigate when null values are present.


 1 class Console extends Interpreter {
 2   IFunctionalEnv functional;
 3   IStructuralEnv structural;
 4
 5   public Console(IFunctionalEnv functional, IStructuralEnv structural) { ... }
 6   public Console(IFunctionalEnv functional) { ... }
 7   public Console(IStructuralEnv structural) { ... }
 8
 9   public void writeln(String name) {
10     if (functional?.getEnvironment()?.get(name) != null) {
11       println "FUNCTION: " + functional.getEnvironment().get(name).toString();
12     } else if (structural?.getEnvironment()?.get(name) != null) {
13       println "TYPE: " + structural.getEnvironment().get(name).toString();
14     } else {
15       println "UNKNOWN OBJECT $name";
16     }
17   }
18 }

Listing 10.6: The implementation of the CONSOLE façade that depends on FUNCTIONAL and STRUCTURAL

Next, consider implementing the compositions of FUNCTIONAL with CONSOLE and of STRUCTURAL with CONSOLE. Listing 10.7 shows how the LinearizingCombiner can be used to compose the two combinations of the languages. When evaluating the two program fragments (lines 5–8 and lines 16–19), the LinearizingCombiner does not prevent possible interactions. Therefore, lines 7 and 18 in Listing 10.7 can print out the objects from the two composed languages, as the syntax conflict is resolved in both cases.

Review: The LinearizingCombiner enables dependent compositions by allowing possible interactions between the languages. The language developers can extend the façades of the constituent languages to allow them to interact with each other. Although meta-objects make adaptation possible, each language component remains weakly encapsulated (semantic interactions: ).

So far, there are still limitations with handling semantic conflicts in compositions using the LinearizingCombiner. Although each interaction is reflected in the interfaces of the façades, in general, there can be data flow between façades. When passing an object from one language to another, there can be subtle semantic interactions that are not reflected at the interface level and that are hard to detect. While the LinearizingCombiner prioritizes the expression types of languages to prevent syntactic conflicts, it does not prevent any semantic conflicts. Thus, they still need to be checked by the developer.

10.3. Enabling Open Composition Mechanisms

In REA, extending existing composition mechanisms is possible, since language combiners are first-class objects that can be extended and composed with each other using OO composition. When there are special requirements and assumptions in a composition scenario, language developers can easily define new language combiners in REA. They can extend existing composition


 1 DSL functional = new Functional();
 2 DSL console = new Console(functional)
 3 DSL functionalConsole = new LinearizingCombiner(functional, console);
 4
 5 Closure program = {
 6   define(name:"sort") { list -> ... }
 7   writeln("sort"); // prints "FUNCTION: name 'sort' parameters: 1 ..."
 8 }
 9 program.delegate = functionalConsole;
10 program.call();
11 ...
12 DSL structural = new Structural();
13 console = new Console(structural)
14 DSL structuralConsole = new LinearizingCombiner(structural, console);
15
16 program = {
17   define(name:"sort") { ... }
18   writeln("sort"); // prints "TYPE: name 'sort' slots: kind=String ..."
19 }
20 program.delegate = structuralConsole;
21 program.call();

Listing 10.7: Two programs using the LinearizingCombiner for a dependent composition

mechanisms and reuse common parts of the composition logic. Fortunately, the costs associated with defining a new mechanism this way are rather low compared to implementing a new composition mechanism from scratch. In the following, we discuss different conflict cases; for each of them, an open composition mechanism is provided.

10.3.1. Open Mechanisms for Handling Syntactic Interactions

When the syntax of constituent languages is non-orthogonal in a composition, the language developer can extend existing combiners for handling the syntactic interactions. In the following, we review how far existing combiners can be extended w.r.t. guaranteeing the absence of syntactic conflicts (Section 10.3.1.1), resolving conflicts by renaming conflicting keywords (Section 10.3.1.2), and defining priorities between languages (Section 10.3.1.3).

10.3.1.1. Generic Mechanism for Conflict-Free Compositions

Section 3.3.1.1 (p. 42) has motivated preventing syntactic conflicts when independent languages are composed. Section 10.2.1 has reviewed the support of the BlackBoxCombiner for composing independent languages, such as SIMPLELOGO and FUNCTIONAL. In contrast, Section 10.2.2.1 has reviewed the support of the BlackBoxCombiner for preventing erroneous compositions of syntactically conflicting languages, such as FUNCTIONAL and STRUCTURAL. It remains to be shown how the BlackBoxCombiner can be used to prevent special syntactic conflicts that are specific to a certain composition scenario, such as context-sensitive syntax conflicts.

Example Scenario: Reconsider composing FUNCTIONAL and COMPLETELOGO such that it is forbidden to use the COMPLETELOGO keywords that have side effects in a function body, as


motivated in Section 3.3.1.1 (p. 42). Listing 10.8 shows a program that has a context-sensitive syntax error, since the forward keyword is used in the body of the "square" function in line 10.

 1 define(name:"dble") { length ->
 2   return length * 2; // okay to define a function without Logo keywords
 3 }
 4
 5 turtle(name:"Triangle", color:red) {
 6   fd 100; rt 120; fd 100; rt 120; fd 100; rt 120; // okay to use Logo keywords outside of functions
 7 }
 8
 9 define(name:"square") { length ->
10   ...; forward length; ... // context-sensitive syntax error (Logo keywords not allowed in functions)
11 }

Listing 10.8: A program excerpt with a context-sensitive syntax conflict

To detect and handle such context-sensitive syntax conflicts, a language developer needs to specialize the BlackBoxCombiner so that it prevents those conflicts in a certain lexical region. Listing 10.9 presents a custom combiner that prevents using COMPLETELOGO keywords in functions by refining the define keyword in lines 6–10. The combiner's define method forwards to the original define method of the Functional façade in order to store the function in the environment. In addition, in line 8, it changes the delegate (i.e., the bodyDelegate) for the function body by wrapping it into a transformation² that filters out the keywords defined in ICompleteLogo. Further, in line 10, it adapts the resolveStrategy of the function body to Closure.DELEGATE_ONLY, so that the closure resolves contained keywords only over the filtered delegate, and not over its owner³.

Review: By inheriting from the BlackBoxCombiner, it is easy for developers to implement the custom combiner to prevent context-sensitive syntax conflicts. Because the custom combiner inherits from BlackBoxCombiner, the subclass reuses the common composition logic, which fosters reuse. In addition, the subclass prevents context-sensitive syntactic conflicts by refining only the implementation of those abstraction operations for which nested expression types need to be restricted (e.g., in the above scenario, only one abstraction operator method).

2 The KeywordFilterTransformation constructor takes two parameters. The first parameter is a façade object to be filtered. The second parameter specifies a set of language interfaces, each of which defines a set of method signatures to be filtered out. At runtime, the KeywordFilterTransformation intercepts every method call. In case the intercepted method's signature is contained in one of the specified interfaces, the transformation does not forward the call to the wrapped façade. Otherwise, the keyword method call is simply forwarded to the wrapped façade. Details of how to implement transformations are elaborated in Section 10.7.

3 Setting the resolveStrategy to Closure.DELEGATE_ONLY has the effect that keywords in the functions are not resolved over the closure's owner, which may be allowed to use COMPLETELOGO keywords.
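The filtering behavior of such a transformation can be approximated in plain Java with a dynamic proxy. This is only a hedged sketch with invented names (the thesis implements the KeywordFilterTransformation via the Groovy MOP): calls whose signature occurs in one of the filtered interfaces are swallowed, and all others are forwarded to the wrapped façade.

```java
import java.lang.reflect.Proxy;
import java.util.List;

interface ILogo { String forward(int n); String define(String name); }
interface ISideEffects { String forward(int n); } // the signatures to filter out

// Rough Java analogue (hypothetical names) of a keyword-filtering transformation.
class KeywordFilter {
    @SuppressWarnings("unchecked")
    static <T> T wrap(T facade, Class<T> facadeInterface, List<Class<?>> filtered) {
        return (T) Proxy.newProxyInstance(
            facadeInterface.getClassLoader(), new Class<?>[]{ facadeInterface },
            (proxy, method, args) -> {
                for (Class<?> itf : filtered) {
                    try {
                        itf.getMethod(method.getName(), method.getParameterTypes());
                        return null; // filtered out: do not forward the call
                    } catch (NoSuchMethodException notFiltered) {
                        // not declared in this interface; keep checking
                    }
                }
                return method.invoke(facade, args); // proceed to the wrapped facade
            });
    }
}
```

Wrapping a façade as `KeywordFilter.wrap(facade, ILogo.class, List.of(ISideEffects.class))` then lets `define` through but swallows `forward`, which is the effect the custom combiner needs inside function bodies.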


 1 public class MyContextSensitiveCombiner extends BlackBoxCombiner {
 2   IFunctional functional; ...
 3   public MyContextSensitiveCombiner(IFunctional functional, ISimpleLogo logo) {
 4     super(functional, logo); this.functional = functional; ... }
 5
 6   public void define(HashMap params, Closure body) {
 7     functional.define(params, body);
 8     body.delegate = new KeywordFilterTransformation(body.delegate, [ICompleteLogo]);
 9     body.resolveStrategy = Closure.DELEGATE_ONLY;
10   }
11 }

Listing 10.9: A custom combiner that filters the Logo keywords inside function bodies

In case a function body that contains such an error is evaluated, the keyword method will not be found in the function closure's delegate, because it is filtered out via the KeywordFilterTransformation.

Note that the custom combiner is limited to composing these two languages. For other context-sensitive syntax conflicts and different languages, in REA, the developer has the possibility to implement further custom combiners.

In general, it can be considered a design error to use the BlackBoxCombiner for interacting languages, which indeed are not black-box. Still, semantic interactions that originate from an erroneous design of language components are not detected and prevented by the BlackBoxCombiner. Therefore, one could argue that it is disadvantageous that, unlike functional embedding approaches, the BlackBoxCombiner does not give any automatic semantic guarantee that compositions are semantically conflict-free, since theoretically language components could have side effects. However, this thesis argues that, in REA, language developers can easily manage semantic interactions between the language components for several reasons. First, every semantic interaction is reflected in the classes that implement the language; thus, language developers can easily detect interactions in the source code by following the imports of the classes of language components, which is supported by IDE tools such as iSpace⁴ for Eclipse. Second, to prevent conflicts, not only language components but any interacting component should always respect the interfaces and semantic contracts of other components. Third, in POPART/REA, if there are special semantic interactions, these can still be controlled using aspects at the language level. For example, an aspect could track and prevent values from one language component flowing into another language component.
For this, a data-flow aspect [MK03] could be deployed on the language façade that taints every object returned by a keyword method with a tag for the corresponding language. When calling keyword methods, another aspect could check that only objects that hold the tag of the same language are allowed as parameters to keyword methods. Therefore, semantic conflicts can also be prevented (conflict-free: ).
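The tainting idea can be sketched as follows. This is an illustration only, with invented names; it is not the data-flow aspect of [MK03], and in POPART/REA the two checks would be expressed as advice on the language façades rather than explicit calls: values are tagged with their producing language in an identity map, and a check rejects parameters tagged by a different language.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch of language-level tainting (illustration, not [MK03]).
class LanguageTaint {
    // Identity-based: the tag follows the object itself, not equals()/hashCode().
    static final Map<Object, String> tags = new IdentityHashMap<>();

    // Would be applied by "after returning" advice on every keyword method.
    static Object taint(Object value, String language) {
        tags.put(value, language);
        return value;
    }

    // Would be applied by "before" advice on every keyword method call.
    static void checkParameters(String language, Object... args) {
        for (Object arg : args) {
            String tag = tags.get(arg);
            if (tag != null && !tag.equals(language))
                throw new IllegalStateException(
                    "value produced by " + tag + " must not flow into " + language);
        }
    }
}
```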

4The iSpace Homepage: http://ispace.stribor.de/


10.3.1.2. Supporting Keyword Renaming

Section 10.2.2.1 has reviewed the BlackBoxCombiner's support for dealing with syntax conflicts, such as the conflicting keyword define between FUNCTIONAL and STRUCTURAL. However, as motivated in Section 3.3.1.2 (p. 43), one can resolve syntactic conflicts by renaming the keywords in the constituent languages.

Example Scenario: Consider defining a custom combiner that renames the define keyword in FUNCTIONAL to defun and in STRUCTURAL to make. Listing 10.10 presents the implementation of a custom combiner that the language developer who wants to compose the two languages must provide. The combiner MyRenamingCombiner inherits from LanguageCombiner, which allows combining conflicting languages. Further, the combiner defines a restricted constructor that only accepts composing two language façade objects that implement IFunctional and IStructural, respectively. Note that since the combiner is also a façade class, it can define new keyword methods. Therefore, to rename the keywords, the combiner defines two new keyword methods. First, it defines the method defun; calls to this method will be forwarded to Functional.define. Second, it defines the method make; conversely, calls to this method will be forwarded to Structural.define. Third, it overrides the method define so that using the old define keyword will result in an error. Consequently, the custom combiner extends the syntax-to-semantic binding of LanguageCombiner with the renamed keywords.

 1 public class MyRenamingCombiner extends LanguageCombiner {
 2   IFunctional functional; IStructural structural;
 3
 4   public MyRenamingCombiner(IFunctional functional, IStructural structural) {
 5     super(functional, structural); this.functional = functional; this.structural = structural;
 6   }
 7   public void defun(HashMap params, Closure body) { functional.define(params, body); }
 8   public void make(HashMap params, Closure body) { structural.define(params, body); }
 9   public void define(HashMap params, Closure body) { throw new Exception("Invalid keyword. Use defun/make."); }
10 }

Listing 10.10: A custom combiner that renames the conflicting keywords of FUNCTIONAL and STRUCTURAL

Listing 10.11 shows a modified version of the erroneous program in Listing 3.5. Instead of define, it uses the renamed keywords defun of FUNCTIONAL and make of STRUCTURAL. The program defines a new data type tuple that has two Integer slots. It defines a function tupleSquare that maps a tuple to another tuple holding the squared values of its components. Since the program uses the renamed keywords, there is no syntax conflict in the program, because the conflicting define is never used.

Review: By inheriting from the LanguageCombiner, it is easy for the developer to implement the custom combiner. The implementation of the MyRenamingCombiner is straightforward. It inherits all meta-methods that define the default keyword method dispatch from its


 1 def structural = new Structural(); def functional = new Functional();
 2 def combinedLanguage = new MyRenamingCombiner(functional, structural);
 3
 4 Closure program = {
 5   make(name:"tuple") { x = Integer; y = Integer; }
 6   def t1 = init(name:"tuple", x:3, y:5);
 7
 8   defun(name:"tupleSquare") { tuple ->
 9     return init(name:"tuple", x:(tuple.x * tuple.x), y:(tuple.y * tuple.y));
10   }
11   return apply("tupleSquare")(t1);
12 }
13 program.delegate = combinedLanguage;
14 println program.call(); // will print tuple [x:9,y:25]

Listing 10.11: Program using the renamed keywords of FUNCTIONAL and STRUCTURAL

super class. Consequently, the developer does not have to implement special meta-methods for the MOP himself/herself. With such custom combiners, syntactic conflicts can successfully be resolved for a pre-defined set of languages.

However, one could criticize that these custom combiners have the limitation that they only work for a specific set of languages. To address this problem, this thesis argues that generic strategies for resolving syntactic conflicts can be embedded. First, REA would allow implementing a generic renaming combiner that allows a rule-based definition of renamings for the keyword methods of arbitrary languages. In this case, this combiner redefines the meta-methods (e.g., invokeMethod, getProperty, and setProperty) to intercept any method call and field access. It would have to check for possible renamings, and possibly manipulate the signature of a keyword method call before forwarding it to the right façade. Second, POPART/REA would allow deploying aspects on the language façades. These language-level aspects can introduce new keyword methods as well as intercept and manipulate keyword method calls. When multiple aspects are defined, their interaction can be controlled such that aspect-oriented extensions are applied in the right order. Because REA is uniform, the embedded mechanisms can be used to detect and prevent interactions (syntactic interactions: ).
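A generic renaming combiner of the kind suggested above could, as a rough sketch, keep a rule table from renamed keywords to (façade, original keyword) pairs and rewrite every intercepted call. The following plain-Java miniature uses invented names; in REA this interception would live in the invokeMethod meta-method rather than in an explicit dispatch method.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a *generic* renaming combiner (invented names).
class RenamingCombinerSketch {
    static final class Rule {
        final Object facade;
        final String originalName;
        Rule(Object facade, String originalName) {
            this.facade = facade;
            this.originalName = originalName;
        }
    }

    final Map<String, Rule> rules = new HashMap<>();

    void rename(String newName, Object facade, String originalName) {
        rules.put(newName, new Rule(facade, originalName));
    }

    // Stand-in for the invokeMethod meta-method: look up the rule, rewrite the
    // keyword, and forward reflectively to the facade that owns it.
    Object invokeMethod(String name, Object... args) {
        Rule rule = rules.get(name);
        if (rule == null) throw new IllegalArgumentException("Invalid keyword: " + name);
        try {
            Class<?>[] types = new Class<?>[args.length];
            for (int i = 0; i < args.length; i++) types[i] = args[i].getClass();
            Method m = rule.facade.getClass().getMethod(rule.originalName, types);
            m.setAccessible(true);
            return m.invoke(rule.facade, args);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}

// Tiny facade stand-ins whose 'define' keywords conflict (illustrative only).
class FunctionalFacadeStub { public String define(String name) { return "function " + name; } }
class StructuralFacadeStub { public String define(String name) { return "type " + name; } }
```

Two `rename` rules then reproduce the effect of MyRenamingCombiner for arbitrary façades, without writing a new combiner class per language pair.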

10.3.1.3. Supporting Priority-Based Conflict Resolution

As motivated in Section 3.3.1.3 (p. 43), since the language extensions can rely on the same base language theorems, it is often sufficient to resolve conflicts using a priority. Therefore, this section reviews the support of the LinearizingCombiner (cf. Section 5.4.3.2) for resolving syntactic conflicts by prioritizing the expression types of extensions.

Example Scenario: Reconsider composing CONCISELOGO, UCBLOGO, and EXTENDEDLOGO, the extensions of SIMPLELOGO, by composing the independent extensions using the LinearizingCombiner. One can compose the implementations of ConciseLogo, UCBLogo, and ExtendedLogo. Because all three extensions inherit from the same base language, there is a diamond inheritance problem (e.g., all three façades define the keyword turtle). Still, with the LinearizingCombiner, one


can compose the extensions. All that a language developer needs to do is pass all instances of the language extensions' façades to the LinearizingCombiner's constructor, as demonstrated by Listing 10.12 in line 8. Since the LinearizingCombiner prioritizes the façades in the order they are passed to the constructor, it will delegate keywords to the ExtendedLogo façade first, which resolves the conflicts; e.g., the keyword turtle is evaluated only by the first façade.

1 Closure program = {
2   turtle(name:"squareComplete", color:red) {
3     setpencolor(green);
4     repeat(4) {
5       fd 50
6       rt 90
7     }}}
8 program.delegate = new LinearizingCombiner(new ExtendedLogo(), new ConciseLogo(), new UCBLogo());
9 program.call();

Listing 10.12: Implementation of composing the independent extensions to execute a COMPLETELOGO program

Review: With the LinearizingCombiner, in the above example, a language developer can consolidate multiple extensions without changing existing code. However, developers can use this combiner only in simple scenarios, since it has certain limitations, which are explained next. The combiner assumes that all expression types of a language have the same priority as the expression types of the other languages. Therefore, the combiner does not allow prioritizing single expression types individually. To address this limitation, a more generic combiner that resolves keyword conflicts using individual priorities per expression type is conceivable (linearization/priorities: ).
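The priority rule applied by the LinearizingCombiner can be illustrated with a minimal sketch: façades are tried in constructor order, and the first one that declares a keyword handles it, which resolves diamond conflicts such as the shared keyword turtle. The Java classes below (Linearizer, Facade) are illustrative stand-ins, not REA's actual implementation:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of linearizing dispatch: façades are consulted in the
// order they were passed to the constructor; the first façade that declares
// the keyword wins.
public class Linearizer {
    public static class Facade {
        final String name;
        final Set<String> keywords;
        Facade(String name, Set<String> keywords) {
            this.name = name; this.keywords = keywords;
        }
    }

    // convenience factory for a façade with a fixed keyword set
    public static Facade facade(String name, String... keywords) {
        return new Facade(name, new HashSet<>(Arrays.asList(keywords)));
    }

    private final List<Facade> facades;

    public Linearizer(Facade... facades) {
        this.facades = Arrays.asList(facades);
    }

    // delegate a keyword to the highest-priority façade that declares it
    public String dispatch(String keyword) {
        for (Facade f : facades)
            if (f.keywords.contains(keyword)) return f.name;
        throw new IllegalArgumentException("unknown keyword: " + keyword);
    }
}
```

With ExtendedLogo passed first, a conflicting keyword like turtle is delegated to ExtendedLogo, while keywords declared only by later façades still reach them.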

10.3.2. Open Mechanisms for Handling Semantic Interactions

10.3.2.1. Specific Mechanism for Controlling Semantic Interactions

When the semantics of constituent languages conflict, the language developer can extend combiners to handle those semantic interactions correctly.

Example Scenario: Consider preventing conflicts for the composition of FUNCTIONAL, STRUCTURAL, and CONSOLE, such that identifiers in the composed language are always unique. Listing 10.13 presents a custom combiner that extends the LinearizingCombiner. Like the custom combiner presented in Section 10.3.1.2, this combiner resolves syntactic conflicts. Before adding a new function, in line 14, the defun method ensures that there is no data type defined with the same name. Similarly, before adding a new data type, in line 21, the make method prevents name clashes with functions. Line 25 prevents using the old keyword define.

Review: In the above composition scenario, the presented solution resolves the interaction with little effort, but it is still undesirable for three reasons. First, the enforcement logic is

227 10. Case Studies: Review of the Support for the Desirable Properties

1 public class MySemanticAdaptationCombiner extends LinearizingCombiner {
2   IFunctional functional;
3   IStructural structural;
4
5   public MySemanticAdaptationCombiner(IFunctional functional, IStructural structural, DSL console) {
6     super(functional, structural, console);
7     this.functional = functional;
8     this.structural = structural;
9   }
10
11   public void defun(HashMap params, Closure body) {
12     if (structural.getEnvironment().get(params.name) != null)
13       throw new SemanticConflictException("Function name is not unique ’$params.name’.")
14
15     functional.define(params, body);
16   }
17
18   public void make(HashMap params, Closure body) {
19     if (functional.getEnvironment().get(params.name) != null)
20       throw new SemanticConflictException("Data type name is not unique ’$params.name’.")
21
22     structural.define(params, body);
23   }
24
25   public void define(HashMap params, Closure body) { throw new Exception("Invalid keyword. Use define/make."); }
26   ...
27 }

Listing 10.13: Excerpt of a custom combiner that enforces identifiers to be unique

scattered in the methods of the custom combiner. Second, it is unclear how to prevent or handle complex and more general cases of semantic interactions. Third, defining a new custom combiner for every new conflicting semantic composition is not a scalable solution; when there is more than one semantic conflict to handle, the combiner implementation can be expected to become scattered and tangled. Although semantic conflict resolution is fully supported when using custom combiners, implementing a resolving combiner leads to high effort.

10.3.2.2. Generic Mechanism for Crosscutting Composition of DSLs

To separate crosscutting concerns in overlapping domains, REA's means for domain-specific join points allow creating semantically interacting compositions of languages, based on the AO concepts presented in Section 7. When multiple domains crosscut each other, REA allows defining multiple join point models for the composed aspect language. To enable aspects for a DSL, the developer needs to complement the DSL base language with a domain-specific aspect language (DSAL), which consists of three building blocks: (a) a domain-specific join point model (DS-JPM), (b) a domain-specific pointcut language (DS-PCL), and (c) a domain-specific advice language (DS-AdvL) [DMM09].

10.3.2.2.1. A Domain-Specific Join Point Model for Ao4Logo

Consider growing SIMPLELOGO into its aspect-oriented version AO4LOGO, whereby the language developer can completely reuse the non-AO version for the extended language. To implement a DSAL for SIMPLELOGO, the developer incrementally implements the three building blocks DS-JPM, DS-PCL, and DS-AdvL by taking into account SIMPLELOGO's domain semantics.

For the DS-JPM, the developer defines the following subclasses of JoinPoint. ForwardJoinPoint and BackwardJoinPoint both hold a reference to thisTurtle. They have an attribute steps to store the steps to move. LeftJoinPoint and RightJoinPoint also hold a reference to thisTurtle and have an attribute degrees. The four join points are contained in a package that encapsulates the DS-JPM.

For the DS-PCL, the language developer defines a modular language façade that has a set of pointcut designators as its language model. To implement the language model, the developer defines the pointcut designator classes TurtleMovingForwardPCD, TurtleMovingBackwardPCD, TurtleTurningLeftPCD, and TurtleTurningRightPCD, which are subclasses of PointcutDesignator. Further, each class defines three constructors for defining range patterns to select specific join points. The first constructor does not have any parameters; it creates a designator instance that will match any instance of the corresponding join point type, e.g., left(a), where a can be any positive integer. The second has one value parameter and creates an instance that only matches join points of the corresponding type if the parameter of the command matches the exact value, e.g., left(50). The third takes a minimum and a maximum value and creates an instance that matches join points whose parameter lies within this range.

To implement the DS-PCL, the developer defines a façade class LogoPointcutLanguage. The façade class defines corresponding operation keywords, each of which creates an instance of a pointcut designator, namely pforward, pbackward, pleft, and pright. For each pointcut designator keyword, there are three overloaded methods for passing the different range patterns to its constructor.
Further, the façade class defines more abstract designator keywords, such as pmotion, by composing the atomic designator classes. Further, in line 4 of Listing 10.14, the LogoPointcutLanguage declares the join point model for SIMPLELOGO.

For the advice language, the developer can use any other embedded DSL of REA. In the example scenario, the developer implements a special tracing DSL for Logo, which is presented in Listing 10.15. The developer implements the ITracingDSL language interface in the façade class TracingLogoDSL, which implements the keyword methods show and trace for Logo.

To compose the modular language components, the developer uses the CCCombiner, as shown in Listing 10.16, which implements the AO program from Listing 3.9 (p. 47). In line 2, the program creates an instance of the LogoPointcutInterpreter façade. Next, in line 3, the program creates an instance of the SimpleLogoInterpreter façade. To set up the composition, the façades are passed to the constructor of the CCCombiner. Internally, the CCCombiner invokes the LogoPointcutLanguage to declare the DS-JPM, which opens up the encapsulated SIMPLELOGO language component to expose its join points. After the setup, to define the tracing aspect, line 16 evaluates the aspect program (lines 7–14). Finally, to trace the program, line 29 evaluates the DSL program (lines 19–27) with the aspect.
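The range-pattern matching behind one such designator can be sketched as follows; the Java class below is a self-contained, hypothetical stand-in that reduces a ForwardJoinPoint to its steps attribute:

```java
// Hypothetical sketch of one pointcut designator of the Logo DS-PCL: the three
// constructors create patterns that match any value, an exact value, or a
// value range of the join point's "steps" attribute.
public class TurtleMovingForwardPCD {
    private final int min, max;

    public TurtleMovingForwardPCD() {                      // matches any forward(a)
        this(0, Integer.MAX_VALUE);
    }
    public TurtleMovingForwardPCD(int steps) {             // matches forward(steps) exactly
        this(steps, steps);
    }
    public TurtleMovingForwardPCD(int minSteps, int maxSteps) { // matches the range [min, max]
        this.min = minSteps; this.max = maxSteps;
    }

    // a ForwardJoinPoint carries its steps attribute; here reduced to an int
    public boolean matches(int steps) {
        return steps >= min && steps <= max;
    }
}
```

In the prototype, such designators would additionally receive the full join point object, including the thisTurtle reference.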

10.3.2.2.2. Multiple Join Point Models

To demonstrate that POPART/REA supports multiple join point models at a time, consider embedding an aspect-oriented TINYSQL dialect into an


1 class LogoPointcutLanguage extends Interpreter implements ILogoPointcutLanguage {
2   void setMetaClass(MetaClass mc) {
3     super.setMetaClass(mc); //mc is an instance of CCCombiner
4     mc.declareJoinPoint(SimpleLogoInterpreter.class, "forward", TurtleForwardJPInstrumentation.class);
5     ...
6   }
7
8   Pointcut pforward() { return new TurtleMovingForwardPCD(); }
9   Pointcut pforward(int steps) { return new TurtleMovingForwardPCD(steps); }
10   Pointcut pforward(int minSteps, int maxSteps) {
11     return new TurtleMovingForwardPCD(minSteps, maxSteps);
12   }
13   Pointcut pbackward() {...}
14   Pointcut pleft() {...}
15   Pointcut pright() {...}
16   Pointcut pmotion() { return new OrPCD( new TurtleMovingForwardPCD(),
17     new TurtleMovingBackwardPCD(), new TurtleTurningLeftPCD(), new TurtleTurningRightPCD());
18   }
19 }

Listing 10.14: Excerpt of the pointcut language implementation of AO4LOGO

1 interface ITracingDSL {
2   void show(String message);
3   void trace();
4 }
5
6 class TracingLogoDSL extends Interpreter implements ITracingDSL {
7   ...
8   void show(String message) { println message; }
9   void trace() {
10     // the façade assumes it is used in a composition with Logo exposing thisTurtle
11     println bodyDelegate.thisTurtle.toString();
12   }
13 }

Listing 10.15: Excerpt of the advice language implementation of AO4LOGO

OO language, called AO4SQL. For example, in this dialect, the entity bean in Listing 10.17 defines a business object with bean-managed persistence with embedded TINYSQL statements, comparable to Hibernate or J2EE [JBC02]. For the embedded TINYSQL dialect in Groovy, there are two JPMs—one for Groovy and one for TINYSQL. For Groovy, there are method_execution and method_call. For TINYSQL, the prototype defines a new join point model in POPART/REA with the following domain-

5 For the sake of a clearer presentation, the example code in Listing 10.17 uses concrete syntax, although the current prototype uses abstract syntax; the current prototype could be easily extended.
6 The Hibernate Homepage: http://www.hibernate.org/.


1 // set up
2 def dspcl = new LogoPointcutInterpreter();
3 def dsadvl = new SimpleLogoInterpreter();
4 def ccc = new CCCombiner([dsadvl,dspcl]);
5
6 // evaluate aspect program
7 def aoDSLProgram = {
8   aspect(name:"TracingAspect") {
9     ...
10     before (pmotion()) { show "$command $args"; }
11     after (pforward() | pbackward() | pleft() | pright()) { trace(); }
12     ...
13   }
14 }
15 aoDSLProgram.delegate = ccc;
16 aoDSLProgram();
17
18 // evaluate DSL program (with tracing aspect)
19 def logoDSLProgram = {
20   define(name:"polygon") { length, edges ->
21     ...
22   }
23
24   turtle (name:"Octagon", color:green) {
25     apply("polygon")(8,25)
26   }
27 }
28 logoDSLProgram.delegate = ccc;
29 logoDSLProgram();

Listing 10.16: Using the CCCombiner to compose the language components of AO4LOGO

specific join points (DS-JPs) that are raised on executions of the corresponding TINYSQL statements: (1) SelectJoinPoint, (2) InsertJoinPoint, (3) RemoveJoinPoint, and (4) UpdateJoinPoint.

Further, there is a declarative domain-specific pointcut language, which defines corresponding pointcut designators that match only their corresponding kinds of TINYSQL statements: (1) pSELECT, (2) pINSERT, (3) pREMOVE, and (4) pUPDATE. Similar to other domain-specific pointcut languages, the designators have corresponding parameters to select only those DS-JPs that, e.g., access certain columns. Further, there is the pFROM designator that matches only queries for a particular table. For example, consider the pointcut “pSELECT("Forename") & pFROM("Persons")” that uses the designators and parameters to quantify over TINYSQL queries, and selects only the executions of queries like SELECT "Forename", "Lastname" FROM "Persons" but not SELECT "Lastname", "Age" FROM "Persons".
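The quantification performed by these designators can be sketched with hypothetical Java stand-ins: a reified select join point carries its table and column list, and the designators are predicates over it that can be conjoined. The names SqlPointcuts, SelectJoinPoint, and the and() combinator are illustrative, not the prototype's actual classes:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of how pSELECT and pFROM could quantify over a reified
// TINYSQL select join point.
public class SqlPointcuts {
    public static class SelectJoinPoint {
        final List<String> columns;
        final String table;
        public SelectJoinPoint(String table, String... columns) {
            this.table = table;
            this.columns = Arrays.asList(columns);
        }
    }

    public interface Pointcut { boolean matches(SelectJoinPoint jp); }

    // matches select queries that access the given column
    public static Pointcut pSELECT(String column) { return jp -> jp.columns.contains(column); }
    // matches queries over the given table
    public static Pointcut pFROM(String table)   { return jp -> jp.table.equals(table); }
    // the "&" operator of the pointcut language
    public static Pointcut and(Pointcut a, Pointcut b) { return jp -> a.matches(jp) && b.matches(jp); }
}
```

With this sketch, pSELECT("Forename") & pFROM("Persons") matches the first example query from the text but not the second.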

As an example runtime schema evolution, consider changing the schema of the "Person" table. For example, the database administrator executes a data definition language (DDL) query, such


1 class PersonBean {
2   PersonKey personKey;
3   String forename;
4   String lastname;
5   int age;
6
7   PersonBean(PersonKey personKey, String forename, String lastname, int age) { ... }
8   PersonBean(Object... params) { ... }
9
10   void ormCreate() {
11     INSERT INTO Persons ("PrimaryKey","Forename","Lastname","Age")
12     VALUES (primaryKey,forename,lastname,age)
13   }
14
15   void ormRemove() { DELETE FROM ... }
16
17   PersonBean ormFindByPrimaryKey(PersonKey pkey) {
18     return new PersonBean(
19       SELECT "p.PrimaryKey", "p.Forename", "p.Lastname", "p.Age" FROM "Persons" p
20     );
21   }
22   ...
23 }

Listing 10.17: An entity bean with bean-managed persistence

as “ALTER TABLE "Persons" ADD "Job"...”. After the DDL query, the entity bean implementation is no longer valid.

Without AOP support for TINYSQL, the developer of the entity bean would have to invasively change the entity bean implementation. For example, to manage the additional field in the table, the developer has to add a new String job field to the PersonBean class, a getter and a setter. In addition, the developer has to change the code of all embedded TINYSQL queries.

To support schema evolution, in the context of this thesis, an aspect-oriented version of TINYSQL was implemented, called AO4SQL. With AO4SQL's support for aspect-oriented programming for embedded TINYSQL/Groovy, the developer can implement an aspect that updates the PersonBean definition without having to change the code of the bean implementation.

Listing 10.18 presents an example aspect for TINYSQL/Groovy. In the aspect, line 4 adds the PersonBean.job field, and line 7 adds the getter for the field. To adapt the bean-managed persistence implementation, the TINYSQL advice in lines 10–14 adapts the TINYSQL statements that are embedded into the PersonBean class. The advice reifies the query at a select-from join point (line 11). It changes the list of selected columns query.selectList in the reified query (line 12) by adding a new column selection for the column “Person.job”. Finally, the advice proceeds with the execution of the SQL query (line 13).


1 aspect(name:"BeanAOPTest"){
2
3   // inter-type declaration that adds a field "Person.job" (with default value "jobless")
4   introduce_field(is_type(PersonBean), "job", "jobless")
5
6   // inter-type declaration that adds a getter for "Person.job"
7   introduce_method(is_type(PersonBean), "getJob") { return job; }
8   ...
9   // pointcut-and-advice
10   around(pSELECT() & pFROM("Persons")){ //pointcut matches all SELECT queries
11     Query query = thisJoinPoint.query;
12     query.selectList.add(Col("job"));
13     return proceed();
14   }
15 }

Listing 10.18: An AO4SQL aspect that changes an entity bean and its TINYSQL queries

In REA, a special runtime aspect weaver for TINYSQL was implemented. To implement AO4SQL, several language components were implemented: there is a base language component for TINYSQL, and there are several language components for the aspect languages.

For the base language component, to allow weaving of aspects into TINYSQL, it was not possible to use the default homogeneous embedding of TINYSQL that was presented in Section 8 (p. 171). Instead, an alternative implementation of TINYSQL was embedded as an embedded compiler, which follows Kamin's style of embedding [Kam98] and generates standard SQL code [DD89] in concrete syntax at runtime. The implementation of the TINYSQL-to-SQL generator consists of a language façade that implements the TINYSQL language interface. The embedded compiler's language model implements an AST for standard SQL, containing expression types for select, insert, update, and delete queries.

When evaluating an embedded TINYSQL program, the embedded compiler first creates an AST from the embedded program. After the program has been completely evaluated, the complete AST is pretty-printed as standard SQL code. Next, the embedded compiler executes the generated standard SQL code using a JDBC connection to the corresponding database management system.

To support aspect weaving on TINYSQL, for the aspect language, a DS-JPM, a DS-PCL, and a DS-AdvL were implemented. The join point model captures points in the generation of the SQL code inside the embedded compiler; e.g., a SelectJoinPoint object is raised on the generation of a select statement in the generator. Parts of the generator's context, e.g., the selected columns and tables and the where expressions of a select query, are reified from the generator's context and can be altered by aspects. The pointcut language is implemented as a language façade with pointcut designators that match those join points in the generator.
The advice language makes the context at the join point available; e.g., query.selectList (in Listing 10.18, line 12) exposes a first-class representation of the selected columns in a select query.
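The interplay of the embedded compiler and the advice can be sketched with a minimal Java stand-in: a select-query AST node exposes its selectList for manipulation, and "proceeding" pretty-prints the (possibly adapted) AST as SQL. The class SelectQuery and its shape are illustrative assumptions, not the prototype's actual AST:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a Kamin-style embedded compiler node: the façade
// builds this AST node for a select query; evaluation pretty-prints it to
// standard SQL. An around advice can alter the reified selectList first.
public class SelectQuery {
    public final List<String> selectList = new ArrayList<>();
    public final String table;

    public SelectQuery(String table, String... columns) {
        this.table = table;
        selectList.addAll(Arrays.asList(columns));
    }

    // "proceed": pretty-print the (possibly adapted) AST as standard SQL
    public String toSql() {
        return "SELECT " + String.join(", ", selectList) + " FROM " + table;
    }
}
```

In this sketch, an AO4SQL advice adapting the bean corresponds to appending a column to selectList before toSql() runs.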


10.3.2.2.3. Review of Crosscutting Composition

In REA, the composition architecture fully supports crosscutting composition with pointcut-and-advice semantics similar to AspectJ. Moreover, the implementation of the CCCombiner is agnostic of the embedded DSLs to be composed. The combiner is parameterized by the embedded DSLs to compose; crosscutting combiners are in addition parameterized by a set of pointcut languages and advice languages, which are embedded DSLs.

Effectively, the CCCombiner enables a gray-box composition model for well-encapsulated embedded DSLs with crosscutting semantics. Domain-specific join point models and domain-specific pointcut languages define an abstraction layer over the interpretation of the embedded DSLs to compose. Hence, only the relevant points for the crosscutting composition are exposed. Still, the CCCombiner enables invasive composition of the embedded DSLs' evaluation protocols. For this, the CCCombiner exploits the MOP-based interception of method calls inside embedded DSL implementations: join points can be reified from any method call inside an embedded DSL interpreter that constitutes a composition point. Each join point's context can refer to any object of an embedded DSL program and to the internal state of embedded DSL interpreters. Pointcut expressions can even mix sub-expressions from different domains because the CCCombiner composes all pointcut languages that are passed to its constructor. As a result, embedded DSLs that participate in a composition are defined independently of each other, while at the same time being composable, even in an invasive way.

To validate that complex domain-specific aspect languages can be implemented with POPART/REA, in the context of this thesis, the existing domain-specific aspect language Cool [Lop97] has been re-designed and re-implemented on top of the MOP. In the context of this thesis, the following domain-specific aspect languages (DSALs) have been implemented as prototypes:

(a) Cool DSAL: The Cool DSAL [LK97], which weaves coordination concerns into OO programs.

(b) Caching DSAL: The caching DSAL [HBA08], which caches values of certain fields when reading them from their objects.

(c) Zip DSAL: A DSAL for compressing string parameters in method calls, which allows compressing all strings before they are written into a file and decompressing strings when reading from a file.


For the following DSL base languages to be advised by aspects, the corresponding DSALs were implemented without needing to change the CCCombiner:

(i) Ao4Logo: homogeneous embedding

(ii) AOP for ProcessDSL: homogeneous embedding

(iii) Ao4Sql: heterogeneous embedding

Because POPART/REA supports multiple join point models, the join point models of TINYSQL and Groovy are integrated without restriction. Aspects may define pointcuts that quantify over the join points of different languages; e.g., the pointcut “pSELECT() & cflow(method_execution("orm.*"))” would only match the embedded SQL statements inside bean-managed persistence methods. For all aspect-oriented languages that have been implemented in the context of this thesis, the default AO semantics of the CCCombiner could be reused for all crosscutting compositions without changing its code (crosscutting: ).

10.3.2.3. Supporting Composition Conflict Resolution

As discussed in Section 3.3.2.2 (p. 47), when multiple aspect languages are composed, there can be composition conflicts. In [DMB09], a generic mechanism for resolving aspect interactions is presented for POPART. While this section presents how the language developer can use the generic AO mechanism to resolve aspect interactions, it is out of scope to repeat the implementation details of the aspect interaction resolution mechanism itself; the interested reader is referred to [DMB09]. Meta-aspects allow defining aspect interaction resolution strategies that can be either application-specific or domain-specific. Technically, there is the OrderedMetaAspectManager, which allows language developers to define a domain-specific pointcut-and-advice comparator that resolves interactions between several DSALs by ordering advice (or, respectively, deleting some advice). However, it is inconvenient to implement a pointcut-and-advice comparator for every new DSAL composition. Therefore, this section presents a generic extension for the CCCombiner that can resolve any static co-advising interaction between DSALs.

Example Scenario: To validate co-advising interactions in REA, this thesis uses POPART/REA to compose the following DSALs with each other and analyze their interactions:

(a) Cool DSAL

(b) Caching DSAL

(c) Zip DSAL

To compose the three DSALs into a language called COCAZIP, developers can simply use REA's generic InductiveWrappingCCCombinerComparator, while they do not need to change the


implementation of the CCCombiner. This comparator orders the advice in the order in which their façades are passed to the constructor of the CCCombiner. Listing 10.19 presents the implementation of the InductiveWrappingCCCombinerComparator (lines 2–9), how it is set up (line 18), and how it is used to evaluate an aspect-oriented program that mixes keywords from the three DSALs (lines 25–31). To compose the DSALs and resolve their composition conflicts, it is crucial that, in line 18, the façades of the three DSALs are passed in the right order to the CCCombiner's constructor.

1 // The domain-specific extension needed for the composition of the 3 DSALs
2 class InductiveWrappingCCCombinerComparator<...> extends ApplicationSpecificComparator<...> {
3   public int compare(T to1, T to2) {
4     // determines the ordering by accessing the CCCombiner
5     // when before advice: advice of first facade (e.g., Cool) come before others (e.g., caching)
6     // when around advice: first (e.g., Cool) wraps around the others (e.g., caching wraps compression)
7     // when after advice: last facade (e.g., caching) come before others (e.g., Cool)
8   }
9 }
10 AspectManager.defaultComparator = new InductiveWrappingCCCombinerComparator<...>();
11
12 // set up
13 def coolDSL = new CoolDSL(isPerClass);
14 def cachingDSL = new CachingDSL(isPerClass);
15 def zipDSL = new ZipDSL();
16
17 // the order in which the developer passes the DSAL facades to the CCCombiner defines their advice ordering
18 def coolcachingzip = new CCCombiner(coolDSL, cachingDSL, zipDSL);
19
20 Closure cocazip = { String className, Closure aspectBody ->
21   return coolcachingzip.aspect([name:className], aspectBody);
22 }
23
24 // the composed DSAL program
25 cocazip("DBFile") {
26   selfex { write }; mutex { read; write } // two pieces of advice from Cool DSAL
27
28   memoize "read", { invalidated_by_calling "write" } // an advice from Caching DSAL
29
30   compress_args "write"; decompress_result "read" // two pieces of advice from Compression DSAL
31 }

Listing 10.19: Handling composition conflicts by defining a static order on DSALs
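The ordering rule of such a comparator for before advice can be sketched in isolation: each piece of advice is ranked by the position of its defining façade in the CCCombiner's constructor argument list. The Java class below (FacadeOrderComparator) is a hypothetical, simplified stand-in for the InductiveWrappingCCCombinerComparator, reducing advice to the name of its defining façade:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: advice is ordered by the position of its defining
// façade in the combiner's constructor argument list, so Cool advice runs
// before Caching advice, which runs before Zip advice.
public class FacadeOrderComparator implements Comparator<String> {
    private final List<String> facadeOrder;

    public FacadeOrderComparator(String... facades) {
        this.facadeOrder = Arrays.asList(facades);
    }

    // compare two pieces of advice by the priority of their defining façades;
    // unknown façades (indexOf == -1) would sort first in this simplified sketch
    public int compare(String adviceFacade1, String adviceFacade2) {
        return Integer.compare(facadeOrder.indexOf(adviceFacade1),
                               facadeOrder.indexOf(adviceFacade2));
    }
}
```

The real comparator additionally distinguishes before, around, and after advice, inverting the order for after advice as the listing's comments describe.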

Review: To validate the support for handling interactions, the DSALs on POPART/REA were tested with a test suite of concurrently executing test cases, which is supported by GroboUtils7—an extension to JUnit8. First, each DSAL was tested in isolation. For each example proposed in the thesis of Lopes [Lop97], a test was implemented for the embedded Cool DSAL. Second, POPART/REA was tested with respect to the tests that have been proposed in related

7 The GroboUtils Homepage: http://groboutils.sourceforge.net/
8 The JUnit Homepage: http://www.junit.org/.

work [HBA08]. Third, additional tests have been implemented for the Zip DSAL for compression and its interactions with the Caching DSAL. These tests give high confidence that the aspect interactions/conflicts are resolved correctly (composition-conflict resolution: ).

10.4. Support for Concrete Syntax

In TIGERSEYE/REA, concrete syntax can be rewritten only because the AST of an embedded language is reified, transformed, and reflected back at compile time. In most cases, the developer does not have to specify the complete formal syntax; instead, TIGERSEYE automatically extracts the necessary information from the embedded objects.

10.4.1. Converting Concrete to Abstract Syntax

Section 8.2 (p. 179) has already elaborated how the TIGERSEYE generic pre-processor can map arbitrary concrete syntax to abstract syntax in the host language; therefore, the remainder of this section only reviews this support.

Review: A developer only needs to add some annotations in order to enable mapping concrete to abstract syntax; the mapping itself is automatic. For making this automatic conversion possible, the generic architecture of TIGERSEYE is crucial. For the conversion of concrete syntax to abstract syntax, it is important that all steps in the conversion or transformation process are generic. In TIGERSEYE, all transformation steps make no special assumptions about the languages to be composed. Further, the generic pre-processor's transformation logic is parameterized by meta-data: the annotations. Since none of the steps in this process makes any assumption about particular languages, one can enable arbitrary languages and their programs on top of TIGERSEYE. Further, to support multiple languages in the transformation process, it is crucial that every intermediary representation is composable and that each representation builds on the previous steps. The pre-processor uses a composable representation in each step, which is elaborated next.

First, composing meta-data for a syntactic transformation is simple, since the pre-processor uses context-free grammars consisting of BNF rules that are composable.
The pre-processor extracts BNF rules from the annotations defined by language components and converts them to first-class objects of the embedded BNF dialect, which allows composing these syntax rules.

Second, the pre-processor parses a program into a generic and composable AST representation. It is important that this AST representation does not use a closed set of types for the AST nodes, but that a new kind of AST node can be introduced as an annotation of an AST node, which makes the AST extensible and independent of the concrete languages. Therefore, technically, the pre-processor uses untyped ATerms [vdBK07] to implement the AST, which are a well-established representation widely used in source code analysis and transformation tools [vdBK07, BV04].

Third, the AST transformation that rewrites the AST elements to host language syntax must be composable. An important detail is that the pre-processor does not have to perform any semantic transformations that depend on a concrete embedded language. The transformation process must

only know how to transform an AST node into a method call in the host language, which is a simple lexical rather than a semantic transformation. Since various lexical transformation steps are necessary, it is important to compose them correctly. Therefore, the pre-processor ensures the correct ordering of the lexical transformation steps by taking into account the pre- and post-conditions of each lexical transformation.

Fourth, it is crucial that method calls in the host language are composable. In Groovy, encoding compositional method calls is simple for all kinds of compositionality of expression types. For terminal expressions, the encoding uses parameterless method calls or properties, respectively. For non-terminal expressions, one can put a method call into the parameters of another method call for its sub-expressions. For nested expressions in a body, the encoding nests method calls in a Groovy closure whose delegate is set to an instance of the corresponding façade class. For compositionality of all three expression types, arbitrary method calls can be put in a sequence. In Java, encoding compositional method calls is similar to Groovy, except for nested expressions in a body. Nested expressions are encoded so that the method call always calls the method in abstract syntax directly on an instance of the corresponding façade class.

With the above support for an intermediary representation that supports compositionality of expressions, there are no practical restrictions for Groovy and Java: TIGERSEYE supports a generic transformation that transforms concrete to abstract syntax. The generic transformation is extensible, i.e., subclasses of façades can add new keywords with concrete syntax. The generic transformation is composable, i.e., the concrete syntax of several languages can be composed into one combined syntax.
The generic transformation is automatically integrated into the compilation process of embedded programs. With these characteristics, there is full support for concrete-to-abstract syntax transformation (concrete-to-abstract syntax: ).
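The purely lexical nature of the final rewriting step can be illustrated with a small sketch: a parsed keyword phrase is turned into a host-language method call in abstract syntax, and nested expressions compose by placing one rewritten call into the argument position of another. The class LexicalRewriter and its methods are hypothetical illustrations, not part of TIGERSEYE's API:

```java
// Hypothetical sketch of the lexical rewriting step: an AST node for a keyword
// phrase becomes a host-language method call; no language-specific semantics
// are involved.
public class LexicalRewriter {
    // rewrite one parsed phrase, e.g. the keyword "compose" applied to the
    // arguments ["f", "g"], into the abstract-syntax call "compose(f, g)"
    public static String toMethodCall(String keyword, String... args) {
        return keyword + "(" + String.join(", ", args) + ")";
    }

    // non-terminal expressions compose by nesting one rewritten call inside
    // the argument position of another
    public static String nest(String keyword, String innerCall) {
        return keyword + "(" + innerCall + ")";
    }
}
```

A phrase in concrete syntax such as "f compose g" would thus be rewritten to the abstract-syntax call compose(f, g) on the façade.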

10.4.2. Supporting Prefix, Infix, Suffix, and Mixfix Operators

To incrementally add support for concrete syntax, using TIGERSEYE/REA, the language developer can add annotations for concrete syntax, as described in Section 8.1.2 (p. 174), directly to the existing source code of an existing language interface or façade class with abstract syntax, or the developer can subclass the façade class and add the annotations in the incremental extension.

Example Scenario: Consider implementing the prefix, infix, suffix, and mixfix operators from Section 3.4.2 (p. 50). Listing 10.20 presents three classes that incrementally add more concrete syntax to a small example language with the four kinds of operators. First, the class AbstractOperators (lines 1–6) implements the operators in abstract syntax. Second, ConcreteOperators (lines 8–17) extends the first class to define annotations for concrete syntax. Third, UnicodeOperators (lines 19–28) incrementally extends the second class to give part of the operators with concrete syntax a Unicode syntax.

Review: To validate the support for these operators, in the context of this thesis, various embeddings with concrete syntax have been implemented on top of TIGERSEYE, such as sets with the Unicode operators ∪ and ∩, an extended BNF dialect with support for regular expressions, and many other small examples.


1 class AbstractOperators extends Interpreter {
2   boolean not(boolean pred) {...} // textual syntax for prefix operator is already okay
3   Closure compose(Closure f, Closure g) {...}
4   Formula prime(Formula k) {...}
5   void selectFromWhere(String[] columns, String[] tables, WhereClause preds) { ... }
6 }
7
8 class ConcreteOperators extends AbstractOperators {
9   @DSLMethod(production="p0_compose_p1")
10   Closure compose(Closure f, Closure g) {...}
11
12   @DSLMethod(production="(p0)prime")
13   Formula prime(Formula k) {...}
14
15   @DSLMethod(prettyName = "SELECT__p0__FROM__p1__WHERE__p2")
16   void selectFromWhere(String[] columns, String[] tables, WhereClause preds) { ... }
17 }
18
19 class UnicodeOperators extends ConcreteOperators {
20   @DSLMethod(production="¬p0")
21   boolean not(boolean pred) {...} // Unicode syntax must be declared
22
23   @DSLMethod(production="p0__◦__p1")
24   Closure compose(Closure f, Closure g) {...}
25
26   @DSLMethod(production="p0′")
27   Formula prime(Formula k) {...}
28 }

Listing 10.20: Incremental extensions to FUNCTIONAL for concrete syntax

In the aforementioned examples, the abstract syntax could always be replaced by concrete syntax (*-fix operations: ✓).
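To make the idea of the production annotations more concrete, the following Java sketch illustrates how a production string such as "SELECT__p0__FROM__p1__WHERE__p2" can be turned into a rewrite rule that maps concrete mixfix syntax to a host-language method call. This is a simplified illustration, not the actual TIGERSEYE transformation; the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turn a TIGERSEYE-style production string into a regex
// that rewrites concrete DSL syntax into a host-language method call.
public class ProductionRewriter {
    // "__" separates literals and parameters; "p0", "p1", ... are parameter slots.
    public static String rewrite(String production, String methodName, String input) {
        StringBuilder regex = new StringBuilder();
        int params = 0;
        for (String part : production.split("__")) {
            if (part.matches("p\\d+")) { regex.append("(.+?)"); params++; }
            else regex.append("\\s*").append(Pattern.quote(part)).append("\\s*");
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(input);
        if (!m.matches()) return input; // leave non-matching host code untouched
        StringBuilder call = new StringBuilder(methodName + "(");
        for (int i = 1; i <= params; i++) {
            if (i > 1) call.append(", ");
            call.append(m.group(i).trim());
        }
        return call.append(")").toString();
    }
}
```

For example, `rewrite("SELECT__p0__FROM__p1__WHERE__p2", "selectFromWhere", "SELECT name FROM persons WHERE age > 18")` yields `selectFromWhere(name, persons, age > 18)`, and the same mechanism covers infix productions such as "p0__∪__p1".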

10.4.3. Supporting Overriding Host Language Keywords

With TIGERSEYE/REA, it is possible to override host language keywords in a similar way as concrete syntax is defined.

Example Scenario: Consider implementing the following two small DSLs that define expression types that override reserved keywords in the host language. Listing 10.21 presents the ConditionalDSL class that overrides Groovy's if keyword. However, it overrides the keyword only when the DSL program uses the then and else keywords, which allows mixing DSL code and Groovy code and using different semantics in parallel without special quotations, as long as the syntax-to-semantics binding is context-free. Listing 10.22 presents the SetDSL class that overrides the curly bracket delimiters of Groovy. The annotation in line 4 defines concrete syntax for the asSet method, whereby the production can use the reserved Groovy delimiters "{" and "}". For example, the curly brackets in line 16 do not define a Groovy closure; instead, they are syntactic sugar that defines new sets.


1 @DSL(whitespaceEscape=" ")
2 class ConditionalDSL implements Interpreter {
3   @DSLMethod(production="if ( p0 ) then { p1 } else { p2 }")
4   boolean ifThenElse(boolean check, Closure thenBlock, Closure elseBlock) {
5     if (check) { thenBlock.call(); } else { elseBlock.call(); }
6     return check;
7   }
8 }

Listing 10.21: Overriding the host language keyword if

 1 @DSL(whitespaceEscape=" ")
 2 class SetDSL implements Interpreter {
 3   ...
 4   @DSLMethod(production="{ p0 }")
 5   Set asSet(MyList a) {
 6     return new HashSet(Arrays.asList(a.toArray()));
 7   }
 8
 9   @DSLMethod(production="p0 , p1")
10   private MyList multiElementedList(String head, MyList tail) {
11     return new MyList(head, tail);
12   }
13 }
14 ...
15 // DSL file using the support for sets
16 Set M = {"a", "b", "c"};

Listing 10.22: Overriding the curly bracket delimiters of Groovy

Review: TIGERSEYE's support for concrete syntax does not distinguish between reserved keywords and other literals, since the transformation is independent of the host language. It simply rewrites keywords in the concrete syntax of the embedded DSL to method calls in the host language. Note that reserved keywords are only overridden if they exactly match the declared production. Further, note that after transforming the reserved keyword to a method call, REA's meta-level and POPART's aspects could even advise the keyword. To validate overriding host language keywords, test cases were written that in particular test whether different nested structures are rewritten correctly (overriding host keywords: ✓).
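The rewriting of nested structures can be sketched as follows in plain Java. This is a deliberately simplified illustration of the idea, not the TIGERSEYE transformation: the reserved delimiters of Listing 10.22 are replaced by calls to the hypothetical keyword method asSet, and nesting is preserved because opening and closing delimiters are rewritten independently.

```java
// Hypothetical sketch: rewriting Groovy's reserved "{" and "}" delimiters
// into calls to the SetDSL keyword method asSet, including nested structures.
public class SetRewriter {
    public static String rewriteSets(String src) {
        StringBuilder out = new StringBuilder();
        for (char c : src.toCharArray()) {
            if (c == '{') out.append("asSet(");  // opening delimiter -> call
            else if (c == '}') out.append(")");  // closing delimiter -> close call
            else out.append(c);
        }
        return out.toString();
    }
}
```

For example, the nested literal `{"a", {"b"}}` is rewritten to `asSet("a", asSet("b"))`.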

10.4.4. Supporting Partial Definition of Concrete Syntax

Example Scenario: Consider embedding TINYSQL into Java and Groovy. With TIGERSEYE, as elaborated in Chapter 8, only part of the Java and Groovy syntax needed to be specified.

Review: In TIGERSEYE/REA, Table 10.2 shows that the integration of the EDSLs with new host languages also requires only negligible effort. For Java and Groovy, in both cases, it was sufficient to specify only 19 productions. As elaborated in Section 8.1.4 (p. 178), to allow parsing


the host languages with island grammars, TIGERSEYE had to define only the most important productions that recognize the host languages’ basic syntactic structure.
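The island-grammar idea behind these savings can be sketched in a few lines of Java. This is our own simplified model, not TIGERSEYE's parser: only the "islands" (DSL productions of interest) are specified, while all remaining host code is treated as "water" and passed through unanalyzed. The production prefixes below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of an island grammar: classify each input line as a
// DSL "island" or host-language "water" without a full host grammar.
public class IslandScanner {
    // Only these productions are specified; everything else is water.
    static final String[] ISLANDS = { "SELECT ", "fd ", "rt " };

    public static List<String> classify(List<String> lines) {
        List<String> tags = new ArrayList<>();
        for (String line : lines) {
            String tag = "water";
            for (String island : ISLANDS)
                if (line.trim().startsWith(island)) { tag = "island"; break; }
            tags.add(tag);
        }
        return tags;
    }
}
```

The point of the sketch is that adding a host language only requires enough "water" rules to skip over its basic syntactic structure, which is why 19 productions sufficed for both Java and Groovy.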

Table 10.1.: Number of grammar rules that need to be specified for the concrete syntax

EDSL             Complete Grammar   Island Grammar   Savings
TINYSQL          6                  1                83 %
BNF              9                  9                 0 %
State Machines   10                 2                80 %
Simple Logo      11                 1                91 %

Table 10.2.: Number of grammar rules that need to be specified to enable the integration with the host language

Host Language   Complete Grammar   Island Grammar   Savings
Java            1700               19               98.9 %
Groovy          950                19               98.0 %

Furthermore, as Table 10.2 shows, the effort that is necessary to support concrete syntax for a homogeneous embedded EDSL that can be used across different host languages is reduced. In the case of the four discussed DSLs, no additional effort was necessary to directly support concrete syntax when we embedded them also in Java (they were first embedded into Groovy). This, however, requires that we can reuse the same DSL interpreter for the new target host languages and that support for the target languages is readily available in the TIGERSEYE pre-processor. If this is the case, it is in general only necessary to adapt the EDSL's grammar to the new host language grammar. Due to using island grammars and the syntactic similarities between Groovy and Java, no adaptation was necessary (partial syntax: ✓).

10.5. Enabling Pluggable Scoping

Section 3.5 (p. 53) has motivated several scenarios that require controlling scoping in embedded languages. Therefore, this section presents how language developers can embed new scoping schemes on top of REA. In REA, the native host language concepts can be replaced by first-class concepts that are embedded. In particular, name bindings can be embedded as first-class environments, in the same vein as [AS96]. In REA, various scoping strategies for language constructs are possible, since language constructs are first-class and their scope can be controlled by embedding restrictions in the part of the evaluation protocol that executes the language construct.

10.5.1. Supporting Dynamic Scoping

Section 3.5.1 (p. 53) has discussed the problem that embedding approaches have not yet shown how to embed different scoping schemes. One important reason for this open issue is that most host languages only support one closed scoping scheme that cannot be changed.


Although Groovy is also a lexically-scoped host language, the fact that Groovy closures do not completely close over their lexical context (cf. Section 5.1) allows embedding various scoping schemes. Therefore, the language developer can embed a dynamically scoped language into Groovy. For dynamic scoping, the language developer embeds a new scoping scheme for variables, so that they are bound to a global context, not to the lexically enclosing context.

Example Scenario: Consider embedding dynamically scoped Logo into Groovy. To implement dynamic scoping (or other schemes) for various embedded languages, language developers can extend or compose the ENVIRONMENT language presented below. ENVIRONMENT provides an interpreter environment (after [AS96]) to bind names to values via two operations let(name,value) and get(name). The keyword let binds the name to the value in the current environment, and the keyword get retrieves the value for name either directly from the current environment or from a possible enclosing environment (i.e., an enclosing abstraction operator). Listing 10.23 presents the implementation of an environment interpreter in REA. It shows the IEnvironment interface (lines 1–4) and an excerpt of its implementation, the EnvironmentInterpreter façade (lines 6–24). Note that the façade has an adjustable scopingStrategy property that can be either LEXICAL_SCOPING or DYNAMIC_SCOPING (line 7) and that the developer or user can change. Most importantly, the façade holds a map from Strings to Object values (line 8), into which one can bind a name to a value using method let (line 10), and from which one can retrieve a value using method get (lines 12–17). Whenever the get method cannot find a value in the current environment, it tries to retrieve the value from an enclosing environment via method getFromEnclosingEnvironment (line 21). The getFromEnclosingEnvironment method looks up the special key "##enclosingEnv##" whose value is another environment. The methods getEnvironmentBodyDelegate (line 22) and getEnvironmentBodyResolveStrategy (line 23) are necessary to create new environments for abstraction operators.

Consider implementing the dynamic scoping example in Listing 10.24, which is motivated from Section 3.5.1 (p. 53), and which overrides the variable binding from line 3 in line 10.
To execute the example code, all the developer has to do is to instantiate a new EnvironmentInterpreter (line 15), select the dynamic scoping strategy for it (line 16), pass it to some combiner (line 17) to compose it with other languages, and use the combiner to evaluate programs. When evaluating the program with dynamic scoping, it is crucial that the methods let and get always operate on the enclosed environment, which in the case of dynamic scoping is the global environment. Dynamic scoping is the simpler scoping strategy to implement, since there is no special propagation of bindings, and every binding is available from the global environment. In contrast, the language developer can switch to lexical scoping, e.g., by replacing the selection of the strategy in Listing 10.24, line 16, when evaluating the program with lexical scoping.
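The difference between the two strategies can be sketched in plain Java (our own minimal model of the first-class environments, not REA's implementation). An environment holds its bindings and a reference to its enclosing environment, playing the role of the "##enclosingEnv##" entry; lexical lookup walks the chain established at definition time, while dynamic scoping degenerates to a single global environment where later bindings win.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of first-class interpreter environments.
public class Env {
    final Map<String, Object> bindings = new HashMap<>();
    final Env enclosing; // plays the role of the "##enclosingEnv##" entry

    public Env(Env enclosing) { this.enclosing = enclosing; }

    public void let(String name, Object value) { bindings.put(name, value); }

    public Object get(String name) {
        if (bindings.containsKey(name)) return bindings.get(name);
        if (enclosing != null) return enclosing.get(name); // enclosing lookup
        throw new RuntimeException("unbound: " + name);
    }

    public static void main(String[] args) {
        // Dynamic scoping: one global environment, later lets win, so the
        // Pupil's binding of "length" overrides the Teacher's.
        Env global = new Env(null);
        global.let("length", 50);   // Teacher
        global.let("length", 100);  // Pupil, before apply("complexShape")
        System.out.println(global.get("length")); // prints 100

        // Lexical scoping: the shape's body keeps its defining environment,
        // so it still sees the Teacher's binding.
        Env top = new Env(null);
        Env teacher = new Env(top);
        teacher.let("length", 50);
        Env shapeBody = new Env(teacher); // body evaluated in definition context
        System.out.println(shapeBody.get("length")); // prints 50
    }
}
```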

Review: To review the support for scoping, we discuss several important issues with respect to design and implementation, as well as issues related to performance and usability. First, the design of the pluggable scoping mechanism is modular, extensible, and composable. It was successfully implemented on top of existing host language mechanisms as a language


 1 interface IEnvironment {
 2   void let(String name, Object value);
 3   Object get(String name);
 4 }
 5
 6 class EnvironmentInterpreter extends Interpreter implements IEnvironment {
 7   int scopingStrategy = LEXICAL_SCOPING; // alternative: DYNAMIC_SCOPING
 8   HashMap environment = new HashMap();
 9
10   void let(String name, Object value) { environment.put(name, value); }
11
12   Object get(String name) {
13     if (environment.containsKey(name)) {
14       return environment.get(name);
15     } else {
16       return getFromEnclosingEnvironment(name);
17     }
18   }
19   ...
20   HashMap getEnvironment() { return environment; }
21   public Object getFromEnclosingEnvironment(String name) { ... }
22   protected IEnvironment getEnvironmentBodyDelegate(Closure body) { ... }
23   protected IEnvironment getEnvironmentBodyResolveStrategy(Closure body) { ... }
24 }

Listing 10.23: Implementation of the pluggable scoping EnvironmentInterpreter

 1 Closure program = {
 2   turtle(name:"Teacher", color:red) {
 3     let("length", 50);
 4     define(name:"complexShape") {
 5       fd get("length"); rt 120; fd get("length"); rt 120; fd get("length"); rt 120;
 6     }
 7   }
 8
 9   turtle(name:"Pupil", color:red) {
10     let("length", 100);
11     apply("complexShape")();
12   }
13 }
14
15 def env = new EnvironmentInterpreter();
16 env.scopingStrategy = EnvironmentInterpreter.DYNAMIC_SCOPING;
17 program.delegate = new LinearizingCombiner(env, new Functional(), new CompleteLogo());

Listing 10.24: Example from Listing 3.11 defining a dynamically-scoped variable using let and get

façade. It is configurable, because the developer can select between dynamic and lexical scoping. To extend the scoping strategy, a language developer can extend EnvironmentInterpreter.


With the embedding of pluggable scoping into Groovy, one can enable a different scoping scheme by adjusting the scopingStrategy property of the EnvironmentInterpreter. Figure 10.1 shows the differences between the environments with lexical and dynamic scoping. With lexical scoping, shown on the left-hand side, for each closure, there is a dedicated environment that is linked to enclosing environments. The indices (right of each box) indicate at which regions in the code the corresponding bindings are effective. Note that in contrast to let, which defines a local binding, the keyword define establishes a global binding; therefore, the function is defined in the top-level environment. Conversely, as shown on the right-hand side, with dynamic scoping, there is only one global environment. The indices in this case indicate at which line a new version of the global environment is established.

[Figure: (S1) lexical scoping shows a chain of environments, one per closure body (Teacher body, lines 1–6; complexShape body, lines 3–5; Pupil body, lines 8–11), each linked to its enclosing environment via the ##enclosingEnv## key; (S2) dynamic scoping shows a single global environment with a new version after each binding (initial, after line 3, after lines 3–5, after line 9).]

Figure 10.1.: Lexical scoping vs. dynamic scoping

With reflective embedding, new scoping schemes can be implemented simply by extending the EnvironmentInterpreter. When such a specialized EnvironmentInterpreter provides different keywords for different kinds of (variable) bindings, such as the keyword slet for static binding and dlet for dynamic binding, various scoping schemes are possible in one program.

Second, the implementation of the architecture's scoping mechanism was tested in isolation and in compositions with other languages.

Third, the performance is far from optimal, because every binding is dynamically resolved at runtime. Developers can use the current implementation for prototyping new scoping mechanisms. Yet, implementing efficient scoping schemes is not addressed. Since the scoping mechanism is embedded as a library and the Groovy compiler is not aware of this, it cannot optimize the embedded bindings. Currently, there are no special optimizations, and the embedded scoping mechanisms do not pass special information about bindings to the compiler. It would be interesting to apply well-known techniques of non-embedding-based approaches for optimization. For example, it is well known that in the case of lexical scoping, there is the possibility to statically resolve bindings, because every binding is known at compile time by taking into account the lexical context. But this is out of the scope of this thesis.


Fourth, the usability of the scoping mechanism has two limitations. First, end users are required to use the keywords let and get to access variables with a special scoping. This makes writing variable expressions more verbose for end users. Note that, later on in Section 10.7, this chapter discusses how the keywords let and get can be omitted by using a simple transformation and the Groovy MOP. Second, there is no way to enforce that end users actually use these keywords for dynamic scoping. Specifically, when an end user uses the Groovy keyword def to define a local variable, the so-defined variable is lexically scoped according to Groovy semantics. The Groovy MOP does not intercept access to local variables; therefore, so far it is impossible to override the scoping of local Groovy variables. Finally, embeddings are not scope-safe [Stu09]. By default, in Groovy, variables and methods in closures can be captured and can escape scopes unintentionally. There are only limited means for restricting scoping; one can adjust the default resolveStrategy (which is OWNER_FIRST, cf. Section 5.1, p. 118) of closures to DELEGATE_ONLY in order to restrict possible bindings of variables and methods to an environmental interpreter, or respectively select OWNER_ONLY to restrict possible bindings to the lexical scope.

In sum, although the current pluggable scoping mechanism is limited with respect to general scoping strategies, it fully supports embedding an alien dynamic scoping strategy into a lexically scoped host language. In addition, the current embedding of scoping is extensible by users, who can easily adapt scoping when they have special needs (dynamic-scoping: ✓).
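The lookup order implied by these resolve strategies can be sketched in plain Java as a simplified model (this is not Groovy's actual MOP; owner and delegate are modeled as plain maps):

```java
import java.util.Map;

// Sketch of Groovy's closure resolve strategies: the strategy decides whether
// a name is looked up in the delegate (the DSL façade), the owner (the
// lexical scope), or both in a fixed order.
public class Resolver {
    public enum Strategy { OWNER_FIRST, DELEGATE_FIRST, DELEGATE_ONLY, OWNER_ONLY }

    public static Object resolve(Strategy s, Map<String, Object> owner,
                                 Map<String, Object> delegate, String name) {
        switch (s) {
            case OWNER_FIRST:    return owner.containsKey(name) ? owner.get(name) : delegate.get(name);
            case DELEGATE_FIRST: return delegate.containsKey(name) ? delegate.get(name) : owner.get(name);
            case DELEGATE_ONLY:  return delegate.get(name);
            default:             return owner.get(name); // OWNER_ONLY
        }
    }
}
```

In this model, DELEGATE_ONLY restricts all bindings to the façade (names known only to the lexical scope resolve to nothing), which mirrors how the strategy can be used to confine an embedding to its environmental interpreter.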

10.5.2. Supporting Implicit References

In REA, language developers can easily define new implicit references in façades using ordinary literal keyword methods and compose these references into existing languages.

Example Scenario: Consider implementing the implicit reference thisTurtle as motivated in Section 3.5.2 (p. 55). Listing 10.25 presents the implementation of the implicit reference as part of the façade LogoImplicitReference, which depends on the evaluation context of a CompleteLogo façade passed to its constructor. The façade implements the literal keyword method getThisTurtle (line 4) that returns a first-class representation of the enclosing turtle abstraction operator. This method retrieves the enclosing turtle from the evaluation context of the CompleteLogo façade via method getTurtle, through which its language model is accessible. To directly access properties of thisTurtle, such as color instead of writing thisTurtle.color, the LogoImplicitReference façade defines other getters with the corresponding names prefixed by get. Thanks to the convention that Groovy automatically calls the corresponding getter method when it encounters an unknown property while evaluating programs, this enables implicit references.

Consider implementing the example program from Listing 3.12. Listing 10.26 only has to instantiate the LogoImplicitReference and compose it with the other façades to enable the implicit reference in COMPLETELOGO.

Review: As demonstrated, language developers can enable implicit references without much effort. Note that the literal keyword methods that implement implicit references can contain complicated resolution logic. With such logic, it is possible to implement sophisticated resolution of


1 class LogoImplicitReference extends Interpreter {
2   CompleteLogo logo;
3   public LogoImplicitReference(CompleteLogo logo) { this.logo = logo; }
4   public Turtle getThisTurtle() { return logo.getTurtle(); }
5   public Color getColor() { return getThisTurtle().getPenColor(); }
6   ...
7 }

Listing 10.25: Excerpt of the implementation of LogoImplicitReference

 1 Closure program = {
 2   define(name:"print") { str -> ... }
 3   define(name:"colorToStr") { color -> ... }
 4   turtle(name:"Turtle1", color:red) {
 5     apply("print")(thisTurtle.name)
 6   }
 7   turtle(name:"Turtle2", color:blue) {
 8     apply("print")(apply("colorToStr")(color))
 9   }
10 }
11 CompleteLogo logo = new CompleteLogo();
12 LogoImplicitReference lir = new LogoImplicitReference(logo);
13 Functional functional = new Functional();
14 program.delegate = new LinearizingCombiner(lir, functional, logo);
15 program.call();

Listing 10.26: A Logo program using the implicit reference thisTurtle
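The Groovy property-to-getter convention that makes this work can be transcribed to plain Java reflection as follows. This is only a sketch of the convention, not Groovy's MOP; the Facade class and its return value are made up for illustration.

```java
import java.lang.reflect.Method;

// Sketch: an unresolved property name such as "thisTurtle" is served by
// calling the corresponding getter getThisTurtle() on the façade.
public class ImplicitRefs {
    public static class Facade {
        public String getThisTurtle() { return "Turtle1"; }
    }

    public static Object resolveProperty(Object facade, String property) {
        // "thisTurtle" -> "getThisTurtle"
        String getter = "get" + Character.toUpperCase(property.charAt(0))
                              + property.substring(1);
        try {
            Method m = facade.getClass().getMethod(getter);
            return m.invoke(facade); // Groovy does this automatically for properties
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("no such property: " + property, e);
        }
    }
}
```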

implicit references (implicit references: ✓), e.g., from the enclosing interpreter environments, or dependent on the dynamic runtime context.

10.5.3. Supporting Activation of Language Constructs

Reconsider the schema evolution example from Section 10.3.2.2.2 (p. 229), where the bean-managed persistence logic was updated with a static aspect. Although the code of the application no longer needs to be manipulated, it is still inconvenient that the developer has to stop the application, add the aspect, and restart the application together with the aspect, which updates the bean implementation. Unfortunately, stopping the application for installing the aspect causes downtime of the bean's business application, which may result in various losses of business value, such as losing business opportunities during the downtime of the business application's services, losing clients' trust in its availability, and risking service-level agreement penalties [CDM09].

Example Scenario: Therefore, consider implementing runtime schema evolution with dynamic AOP. Runtime schema evolution is enabled through adapting the entity bean at runtime using a dynamic AO4SQL aspect. Listing 10.27 presents the implementation of a dynamic aspect. The dynamic aspect implements the same adaptation as the static aspect, but it is not initially deployed when starting the application, since the AO program declares this via the deployed:false parameter in line 3. All that is necessary to adapt the bean at runtime is to deploy the aspect in line 11, which immediately adapts the running application without downtime.

 1 // implements dynamic AO4SQL aspect
 2 DynamicAspect myAo4SqlAspect =
 3   aspect(name:"BeanAOPTest", deployed:false) {
 4     // the code from
 5     ...
 6     introduce_field(...)
 7     introduce_method(...) { ... }
 8     around(pSELECT()...) { ... }
 9   }
10 ...
11 myAo4SqlAspect.deploy(); // dynamically deploy aspect (adapt the bean class and embedded SQL)
12 ...
13 PersonBean person = PersonBean.ormFindByPrimaryKey(new PersonKey(1));
14 assert "musician".equals(person.getJob()); // access to introduced field

Listing 10.27: An AO4SQL aspect that changes an entity bean and its TINYSQL queries

Review: Although TINYSQL is a very limited subset of SQL, dynamic aspect-oriented programming for SQL becomes conceivable. For the implementation of dynamic scoping for AO4SQL aspects, it was not necessary to change the implementation of the CCCombiner in POPART/REA (activation: ✓).
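The deployment semantics used above can be sketched in plain Java (an illustrative model with made-up names, not POPART's API): an aspect is registered with deployed:false and only intercepts its join points once deploy() is called; undeploy() reverts the adaptation, all without restarting the application.

```java
import java.util.function.UnaryOperator;

// Sketch of dynamic aspect deployment: around-advice only applies to the
// simulated join point (executing a query) while the aspect is deployed.
public class DynamicAspect {
    private boolean deployed;
    private final UnaryOperator<String> around; // around-advice on a query

    public DynamicAspect(boolean deployed, UnaryOperator<String> around) {
        this.deployed = deployed;
        this.around = around;
    }

    public void deploy()   { deployed = true;  } // adapt the running application
    public void undeploy() { deployed = false; }

    // Simulated join point: executing an embedded SQL query string.
    public String execute(String query) {
        return deployed ? around.apply(query) : query;
    }
}
```

For example, an aspect that extends SELECT queries by an introduced column has no effect until deploy() is called, after which every matching query is adapted immediately.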

10.6. Enabling Pluggable Analyses

Section 3.6 (p. 56) has motivated using language embeddings in scenarios where syntactic and semantic analysis is needed, as it is supported by traditional non-embedding-based approaches. Therefore, this section discusses embedding pluggable analyses on top of the architecture. Developers can implement rich domain analyses that perform composable pluggable syntactic analyses on DSLs, which Section 10.6.1 elaborates, as well as composable pluggable semantic analyses, which Section 10.6.2 elaborates.

10.6.1. Syntactic Analyses

For a syntactic analysis, it is crucial that the meta-objects of code blocks create method calls for every encountered expression. This enables interpretation of program expressions under abstract semantics. Each syntactic analysis consists of an analyzer façade that implements a syntax-to-semantics binding that maps each expression type of the language to analyze to its abstract semantics. Further, it consists of an alternative language model that implements the abstract domain of the analysis. Optionally, it may consist of a meta-level that allows implementing generic analyses that adapt themselves to the context in which they are used. Once a syntactic analyzer is implemented, its language façade can be used to evaluate programs consisting of expressions in abstract syntax. Evaluating a program with the façade of

the syntactic analyzer is similar to traversing an AST. However, instead of visiting AST nodes, the analyzer façade receives a sequence of method calls, one for each expression. By analyzing the method calls resulting from the evaluation of the program closure, the analyzer calculates the result of the analysis. The remainder of this section demonstrates how this architecture's model allows implementing flexible syntactic analyses.

Example Scenario: Reconsider implementing a syntactic analysis that checks that abstraction operators in FUNCTIONALLOGO contain no more than 10 expressions, as motivated in Section 3.6.1 (p. 57). For this analysis, a language developer needs to implement several parts. First, there is an expression counting part that performs a modular abstract interpretation that counts the number of expressions contained inside a closure. Second, there is a checking part that performs a modular abstract interpretation that checks that an abstraction operator has no more than 10 expressions. The second part of the analysis uses the first part to determine the number of expressions. Note that the task of implementing the syntactic analysis is complicated by the fact that this analysis must be performed for a composite language; hence, it is desirable to also implement the analysis for its constituent languages in a modular way. We present the implementation of these two parts of the analysis in the next two paragraphs.

Example Scenario – Expression Counter Analysis: To implement the first analysis, the developer implements an analyzer façade to count expressions for each constituent language. Listing 10.28 and Listing 10.29 present the implementation of the analysis. There are two analyzer façades, namely ExpressionCounterCompleteLogo and ExpressionCounterFunctional. Both façades hold an expressionCount property (Listing 10.28, line 8 and Listing 10.29, line 3), which initially is 0. They implement their corresponding language interfaces, namely ExpressionCounterCompleteLogo implements abstract semantics for ICompleteLogo and ExpressionCounterFunctional implements IFunctional. Note that there is no need to define a special language model for this analysis, because the abstract domain can readily be mapped to the domain of Integers. For each expression type defined in the language interfaces, both façades provide a keyword method implementation that increases expressionCount (e.g., Listing 10.28 in lines 12, 14, 17, and Listing 10.29 in lines 6 and 12), so that when the expression type is used in a program the total number of expressions is increased. In addition to this, for each abstraction operator, the implementation calls the body in order to count its nested elements (e.g., cf. turtle in Listing 10.28, line 19 and define in Listing 10.29, line 8). The result of an analysis is accessible through the interface IExpressionCounter (cf. Listing 10.28) that each of the analyzers implements.

Next, to count expressions for both constituent languages, the developer has to combine the two modular analyzers. For this, the developer implements a custom combiner that composes the analyzers and their results. All the developer has to do is to implement the class ExpressionCounterCombiner shown in Listing 10.30, which extends LinearizingCombiner.
Note that the subclass can completely reuse the implementation of its super class; therefore, no MOP-specific code needs to be implemented. The developer only needs to implement the IExpressionCounter


 1 interface IExpressionCounter extends DSL {
 2   int getExpressionCounter();
 3   void reset();
 4 }
 5
 6 class ExpressionCounterCompleteLogo extends Interpreter
 7     implements ICompleteLogo, IExpressionCounter {
 8   protected int expressionCount = 0;
 9   public int getExpressionCounter() { return expressionCount; }
10   public void reset() { expressionCount = 0; }
11
12   public int getBlack() { expressionCount++; return 0; }
13   ...
14   public void forward(int n) { expressionCount++; }
15   ...
16   public void turtle(HashMap params, Closure body) {
17     expressionCount++;
18     body.delegate = bodyDelegate;
19     body.call();
20   }
21   ...
22 }

Listing 10.28: Implementation of expression counting for COMPLETELOGO

 1 public class ExpressionCounterFunctionalLogo extends Interpreter
 2     implements IFunctionalLogo, IExpressionCounter {
 3   protected int expressionCount = 0;
 4   ...
 5   public void define(HashMap params, Closure body) {
 6     expressionCount++; ...
 7     body.delegate = bodyDelegate;
 8     body.call();
 9   }
10
11   public Closure app(String name) {
12     expressionCount++; ...
13     return { return null }; // must return an empty closure to avoid double counting
14   }
15 }

Listing 10.29: Implementation of expression counting for FUNCTIONAL

(cf. Listing 10.28), so that the reset method resets all analyzers (line 10), and so that getExpressionCounter combines the results of all analyzers (lines 4–8). To recapitulate, the developer can implement the expression counting part of the analysis as a separate reusable analysis. Note that the analysis remains extensible through language polymorphism. Further, other analyses can use this analysis to perform a subordinate analysis in one of their steps, such as the checking part does in the coding convention scenario, as demonstrated in the following.


 1 public class ExpressionCounterCombiner extends LinearizingCombiner implements IExpressionCounter {
 2   public ExpressionCounterCombiner(IExpressionCounter... dslDefinitions) { super(dslDefinitions); }
 3
 4   public int getExpressionCounter() {
 5     return dslDefinitions.inject(0) { int partSum, IExpressionCounter ec ->
 6       return partSum + ec.getExpressionCounter()
 7     };
 8   }
 9
10   public void reset() { dslDefinitions.each { IExpressionCounter ec -> ec.reset() }; }
11 }

Listing 10.30: A custom combiner for combining the expression counting analyses

Example Scenario – Clean Code Analysis: To implement the checking part, the developer implements another analyzer that depends on the counting part. Note that there are two design choices that can be made for implementing the second part. First, the language developer can implement this analysis as a partial analysis, since only abstraction operators need to be analyzed. Second, by reusing an existing analysis, part of the implementation can be reused. Due to these two design choices, there is less implementation effort, as demonstrated below.

When implementing several partial analyses for a language, language developers can define once an empty analyzer that completely implements the language interface, and let partial analyzers extend the empty analyzer. For example, for COMPLETELOGO, a language developer can implement the empty analyzer AbstractInterpretationCompleteLogo. This analyzer façade implements the ICompleteLogo interface with empty keyword methods9. This class has to be implemented only once, and language developers can inherit from the class and inherit its methods for convenience.

Listing 10.31 presents the implementation of CleanCodeAnalysisCompleteLogo, which enforces a maximum of 10 expressions in each abstraction operator. To reuse the existing expression counter analysis, the constructor takes an instance of the corresponding analyzer (exprCounterAnalyzer10, lines 2–5). The class inherits from AbstractInterpretationCompleteLogo, so that it does not have to implement the complete language interface. The subclass defines a partial analysis by overriding part of the keyword methods. For checking the coding convention in COMPLETELOGO, the subclass only has to override the methods for the abstraction operators turtle and repeat. To enforce a maximum number of nested expressions, these two methods use the method performExpressionCounting (cf. lines 21–28), from which both methods retrieve the number of nested expressions in the variable exprCnt. Whenever exprCnt exceeds the threshold of 10 expressions, the keyword methods raise an exception (lines 10

9 To analyze expressions inside abstraction operators, their keyword methods are not completely empty, but they call their body.
10 At this point, the exprCounterAnalyzer does not commit to a specific language interface. This allows developers to later combine it with other expression counters.

250 10.6. Enabling Pluggable Analyses

and 17). Finally, the methods propagate the analysis to nested expressions by calling their super methods, which then call the abstraction operator's body (lines 11 and 18).

The logic that reuses the expression counting analysis is contained in method performExpressionCounting (lines 21–28). To count nested sub-expressions, this method performs the following steps. In line 22, it clones the body parameter. Note that it is crucial that a clone of the closure is created, so that the original delegate of the body is not altered. In line 23, it resets the counter of the exprCounterAnalyzer. In lines 24–27, this method uses the exprCounterAnalyzer to evaluate the clone of the body parameter under the corresponding abstract semantics. Note that it is crucial, in line 25, that the resolveStrategy of the clone is set to DELEGATE_FIRST (cf. Section 5.1, p. 118), which changes the dispatch priorities of the closure's meta-object to dispatch keyword method calls first to the delegate and then to the owner. This is necessary in order to define the right semantics for the keyword methods in this step, because setting the right resolveStrategy enforces that the subordinate analyzer (i.e., the expression counter analyzer) has a higher priority than the enclosing analyzer (i.e., this) for this step.
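The essence of this subordinate-analysis step can be sketched in plain Java. This is our own simplified model, not the Groovy MOP mechanics: a "body" is modeled as a replayable list of expression names, and the checker resets the subordinate counter, re-evaluates the body under the counting semantics, and reads off the result before proceeding.

```java
import java.util.List;

// Simplified sketch of the subordinate-analysis step.
public class SubordinateAnalysis {
    public static class ExpressionCounter {
        public int count;
        public void reset() { count = 0; }
        public void eval(List<String> body) { count += body.size(); } // abstract semantics
    }

    // Corresponds to performExpressionCounting: reset, re-evaluate, read result.
    public static int countExpressions(List<String> body, ExpressionCounter counter) {
        counter.reset();     // like exprCounterAnalyzer.reset()
        counter.eval(body);  // like body.call() under the counting delegate
        return counter.count;
    }

    public static void check(List<String> body, int max, ExpressionCounter counter) {
        if (countExpressions(body, counter) > max)
            throw new RuntimeException("Too many expressions: " + counter.count);
    }
}
```

The reset before each subordinate run plays the same role as line 23 of Listing 10.31: without it, counts from earlier abstraction operators would leak into the current check.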

1  public class CleanCodeAnalysisCompleteLogo extends AbstractInterpretationCompleteLogo {
2    private DSL exprCounterAnalyzer;
3    public CleanCodeAnalysisCompleteLogo(DSL exprCounterAnalyzer) {
4      this.exprCounterAnalyzer = exprCounterAnalyzer;
5    }
6
7    public void turtle(HashMap params, Closure body) {
8      def exprCnt = performExpressionCounting(body);
9      if (DEBUG) println "turtle $params.name: "+exprCnt;
10     if (exprCnt > 10) throw new AnalysisException("Too many expressions in turtle '$params.name'.");
11     super.turtle(params, body);
12   }
13
14   public void repeat(int n, Closure body) {
15     def exprCnt = performExpressionCounting(body);
16     if (DEBUG) println "repeat: "+exprCnt;
17     if (exprCnt > 10) throw new AnalysisException("Too many expressions in repeat.");
18     super.repeat(n, body);
19   }
20   ...
21   protected int performExpressionCounting(Closure originalBody) {
22     Closure body = originalBody.clone();
23     exprCounterAnalyzer.reset();
24     body.delegate = exprCounterAnalyzer;
25     body.resolveStrategy = Closure.DELEGATE_FIRST;
26     body.call();
27     return exprCounterAnalyzer.getExpressionCounter();
28   }
29 }

Listing 10.31: Implementation of the coding convention analyzer for COMPLETELOGO
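To make the role of the empty analyzer façade concrete, the following is a minimal sketch in plain Java. The three-keyword interface is a hypothetical stand-in for the thesis's ICompleteLogo, and Runnable stands in for Groovy's Closure; as in footnote 9, the abstraction operator still evaluates its body so that partial analyzers can reach nested expressions:

```java
// Hypothetical minimal cut of a Logo-like language interface (assumption).
interface IMiniLogo {
    void forward(int n);
    void right(int a);
    void repeat(int n, Runnable body);
}

// Empty analyzer: every keyword method is a no-op, except that the
// abstraction operator still calls its body (cf. footnote 9), so that
// nested expressions remain reachable for partial analyses.
class EmptyAnalyzer implements IMiniLogo {
    public void forward(int n) { /* empty abstract semantics */ }
    public void right(int a)   { /* empty abstract semantics */ }
    public void repeat(int n, Runnable body) {
        body.run(); // evaluate the body once under abstract semantics
    }
}

// A partial analyzer only overrides the keywords it is interested in.
class CountingAnalyzer extends EmptyAnalyzer {
    int moves = 0;
    @Override public void forward(int n) { moves++; }
}

public class EmptyAnalyzerDemo {
    public static void main(String[] args) {
        CountingAnalyzer a = new CountingAnalyzer();
        a.repeat(3, () -> a.forward(10)); // body reached via inherited repeat
        a.forward(5);
        System.out.println(a.moves); // 2
    }
}
```

The partial analyzer inherits the no-op semantics for all keywords it does not care about, which is exactly the convenience the thesis attributes to AbstractInterpretationCompleteLogo.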

251 10. Case Studies: Review of the Support for the Desirable Properties

Listing 10.32 presents the implementation of the coding convention checker for FUNCTIONAL. Similarly, this analyzer only has to override the define method (lines 5–9). Note that, because the two modular coding convention checkers for COMPLETELOGO and FUNCTIONAL do not have dependencies, the language developers do not have to define a special combiner for this analysis. CleanCodeAnalysisCompleteLogo and CleanCodeAnalysisFunctional can be composed using the LinearizingCombiner.
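As an illustration of the combiner idea, a linearizing combination can be approximated in plain Java with a dynamic proxy that forwards every keyword call, in order, to a list of component analyzers. All names here (IMiniLogo, the two analyzers) are hypothetical stand-ins, and the sketch assumes all components implement the same language interface; the thesis's LinearizingCombiner presumably also handles differing interfaces, which is elided:

```java
import java.lang.reflect.Proxy;

public class LinearizingCombinerSketch {
    // Hypothetical mini language interface (assumption, not from the thesis).
    interface IMiniLogo { void forward(int n); }

    static final StringBuilder TRACE = new StringBuilder();

    static class CountAnalyzer implements IMiniLogo {
        public void forward(int n) { TRACE.append("count;"); }
    }
    static class CheckAnalyzer implements IMiniLogo {
        public void forward(int n) {
            if (n < 0) throw new IllegalArgumentException("negative argument");
            TRACE.append("check;");
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T combine(Class<T> iface, Object... analyzers) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[]{iface},
            (proxy, method, args) -> {
                Object result = null;
                for (Object a : analyzers) {
                    // forward the keyword call to every analyzer, in order
                    result = method.invoke(a, args);
                }
                return result;
            });
    }

    public static void main(String[] args) {
        IMiniLogo combined = combine(IMiniLogo.class, new CountAnalyzer(), new CheckAnalyzer());
        combined.forward(10);
        System.out.println(TRACE); // count;check;
    }
}
```

Because the combined object itself satisfies the language interface, a program (or a closure's delegate) cannot tell whether it talks to one analyzer or to a composition of several.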

1  public class CleanCodeAnalysisFunctional extends Interpreter implements IFunctional {
2    ...
3    protected int performExpressionCounting(Closure originalBody) { ... }
4
5    public void define(HashMap params, Closure body) {
6      def exprCnt = performExpressionCounting(body);
7      if (exprCnt > 10) throw new AnalysisException("Too many expressions in function '$params.name'.");
8      super.define(params.name, body);
9    }
10   ...
11 }

Listing 10.32: Implementation of the coding convention checking analyzer for FUNCTIONAL

To apply the coding convention analysis, a developer only needs to compose the above modular analyses. Listing 10.33 demonstrates how to apply the complete analysis to a program. In lines 13–14, the setup step uses the ExpressionCounterCombiner to compose instances of ExpressionCounterCompleteLogo and ExpressionCounterFunctional into one composite analyzer that can be used for counting expressions. Next, the setup step uses the LinearizingCombiner to compose instances of CleanCodeAnalysisCompleteLogo and CleanCodeAnalysisFunctional, to which it passes a reference to the composite analyzer for expression counting. This composition of analyzers is used as the delegate for evaluating the program under abstract semantics. When evaluating the program in Listing 10.33, the analyzer reports a violation of the coding convention by the function dblsquare (lines 6–8).

Review: To partially answer the question raised by related work whether embedding approaches allow syntactic analyses [MHS05] and composable analyses [HORM08], the above demonstration shows that modular, extensible, and composable syntactic analyses can be implemented using REA and its support for composition. To review the current support, this section relates what REA provides for embedded languages to what is already available in non-embedded approaches. To investigate the general support, several more syntactic analyses have been implemented on top of the architecture:

Provided Keywords Analysis: A syntactic analysis that analyzes the language interface of an arbitrary reflective embedding for the keywords the language defines. Internally, this analysis uses Java reflection [AGH05, FF04] to introspect defined keywords.
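The core of the provided-keywords analysis can be sketched with plain Java reflection: the keywords a reflective embedding provides are simply the method names of its language interface. IMiniLogo is a hypothetical stand-in for an interface such as ICompleteLogo:

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

public class ProvidedKeywordsSketch {
    // Hypothetical mini language interface (assumption, not from the thesis).
    interface IMiniLogo {
        void forward(int n);
        void right(int a);
        void repeat(int n, Runnable body);
    }

    static Set<String> providedKeywords(Class<?> languageInterface) {
        Set<String> keywords = new TreeSet<>();
        for (Method m : languageInterface.getMethods()) {
            keywords.add(m.getName()); // each keyword method contributes one keyword
        }
        return keywords;
    }

    public static void main(String[] args) {
        System.out.println(providedKeywords(IMiniLogo.class)); // [forward, repeat, right]
    }
}
```

For a language composition, the same introspection can be applied to each composed façade and the resulting keyword sets merged.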


1  Closure program = {
2    turtle (name:"ThreeSquares",color:green){
3      define(name:"square") {
4        fd 50; rt 90; fd 50; rt 90; fd 50; rt 90; fd 50; rt 90;
5      }
6      define(name:"dblsquare") {
7        fd 50; rt 90; fd 50; rt 90; fd 50; rt 90; fd 50; rt 90; rt 45; fd 50; rt 90; ...
8      }
9      apply("square")();
10     apply("dblsquare")();
11   }
12 }
13 IExpressionCounter exprCntr = new ExpressionCounterCombiner(
14   new ExpressionCounterCompleteLogo(), new ExpressionCounterFunctional());
15 program.delegate = new LinearizingCombiner(
16   new CleanCodeAnalysisCompleteLogo(exprCntr), new CleanCodeAnalysisFunctional(exprCntr));
17 program.call(); // will raise an exception

Listing 10.33: Applying the coding convention analysis

Used Keywords Analysis: A syntactic analysis that analyzes an arbitrary DSL program for the keywords that the program uses. Internally, this analysis defines an analyzer as a meta-façade that implements MOP methods, such as methodMissing, in order to look up keywords.
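Groovy's methodMissing has no direct Java counterpart, but the same meta-façade idea can be sketched with a dynamic proxy that records the name of every keyword call a program makes instead of executing it. The interface and program below are hypothetical illustrations:

```java
import java.lang.reflect.Proxy;
import java.util.LinkedHashSet;
import java.util.Set;

public class UsedKeywordsSketch {
    // Hypothetical mini language interface (assumption, not from the thesis).
    interface IMiniLogo {
        void forward(int n);
        void right(int a);
    }

    @SuppressWarnings("unchecked")
    static <T> T recordingFacade(Class<T> iface, Set<String> used) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[]{iface},
            (proxy, method, args) -> {
                used.add(method.getName()); // record the keyword, do not execute it
                return null;
            });
    }

    public static void main(String[] args) {
        Set<String> used = new LinkedHashSet<>();
        IMiniLogo logo = recordingFacade(IMiniLogo.class, used);
        // the "program": a sequence of keyword calls against the façade
        logo.forward(50);
        logo.right(90);
        logo.forward(50);
        System.out.println(used); // [forward, right]
    }
}
```

As in the thesis's analyzer, the program is evaluated once under this recording semantics and the set of used keywords falls out as the analysis result.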

Syntax Checker: A syntactic analysis that checks whether the abstract syntax of a DSL program is syntactically correct for a certain language or language composition. Internally, this analysis uses the provided keywords analysis to determine all available keywords. Then, it checks with the used keywords analysis whether all keywords used in the program are defined. Note that the syntax checker provides more guarantees for programs than the default Groovy parser. The default Groovy parser cannot detect such errors, since it accepts every expression in abstract syntax that constitutes a legal method call.
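The checking step itself reduces to a set difference between the results of the two preceding analyses. A minimal sketch (keyword sets are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class SyntaxCheckerSketch {
    // Returns every keyword the program uses that no composed language defines.
    static Set<String> undefinedKeywords(Set<String> used, Set<String> provided) {
        Set<String> undefined = new HashSet<>(used);
        undefined.removeAll(provided); // whatever remains is defined nowhere
        return undefined;
    }

    public static void main(String[] args) {
        Set<String> provided = Set.of("forward", "right", "repeat", "define", "apply");
        Set<String> used = Set.of("forward", "rigth" /* typo in the program */);
        System.out.println(undefinedKeywords(used, provided)); // [rigth]
    }
}
```

A non-empty result indicates a syntax error that, as noted above, the default Groovy parser would silently accept as a pending method call.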

To review pluggable syntactic analyses, one can investigate their support for possible evolutions and compare it to related work, with respect to (1) extending analyses, (2) binding time, (3) composing analyses, and (4) generalizing analyses:

(1) Extending Analyses: From the extensibility of façades reviewed in Section 10.1, one can conclude that analyzers are extensible as well. Developing polymorphic analyzers is possible, since every expression type in abstract syntax yields a method call that the language polymorphism mechanism dispatches. Instead of dispatching to one closed analyzer façade, language polymorphism dispatches these method calls to the most concrete analyzer. The architecture's analysis model is based on polymorphic method calls, a key property that leverages the flexibility for (syntactic) analyses. Since each expression type maps onto one specific signature, the meta-objects of a program's closures can create method calls from them. Since each expression is dynamically dispatched as a method call, polymorphism provides type guarantees when extending analyzers by more specific ones.

(2) Binding Time: In REA, developers can plug analyses into their programs at any time. Because dispatch only takes into account an expression's signature, but not the type of the analyzer that receives the call, developers can dynamically plug various syntactic analyses onto the same program representation. The use of a dynamic language is crucial to enable dynamically pluggable analyses. The architecture's program representation based on abstract syntax and dynamic syntax-to-semantics binding leads to this loose coupling.

(3) Composing Analyses: Developers can compose several independent analyses (e.g., where each of them analyzes one of several independent languages). Furthermore, because method calls can be delegated to combiners, several syntactic analyses can be combined from modular analyses. The language developer can compose analyzer façades in the same way as composing the default façades with execution semantics. While it is problematic to add new expression types into an AST, in the reflective architecture new expression types can be mixed in easily using the language combiners. However, it remains unclear so far how several analyses compose at their fix points, as is usual in non-embedded approaches. Although it is possible to extend existing combiners, it is not a feasible solution for developers to implement a custom combiner with complex composition logic for each new combination of analyses.

(4) Generalizing Analyses: Although the analysis model is similar to traversing an AST, method calls in the architecture's analysis model are intercepted by the meta-level, which leverages more flexibility for syntactic analyses. While ASTs are generally bound to specific languages, thanks to the presence of the meta-level in the architecture, generic analyses can be implemented that can analyze several languages. For example, a generic analyzer can count expressions for arbitrary embedded languages; it basically implements methodMissing and propertyMissing and increases the counter whenever the MOP intercepts an arbitrary keyword method call and executes one of these methods. By implementing such a generic analyzer with a meta-level, one can intercept method calls to the façade using the MOP. Instead of executing concrete methods to calculate the analysis, the MOP can answer the calls directly via a meta-interpretation, or it can dispatch the calls to other methods.

While the above results are very encouraging, it remains unclear whether REA supports implementing complicated context-specific syntax analyses and backward analyses, which is a possible direction of future work (syntactic analyses: G).

10.6.2. Semantic Analyses

Section 3.6.2 (p. 58) has motivated evaluating programs under abstract semantics in order to determine semantic properties of programs. Therefore, this section elaborates how to implement semantic analyses as language façades that calculate semantic properties. To semantically analyze programs of a language, a language developer can define new alternative semantics from its expression types to any abstract domain.


Example Scenario: Reconsider validating a semantic contract for FUNCTIONALLOGO programs by checking a domain-specific constraint that requires that a Logo program (e.g., Listing 3.14) may use only positive integer values as parameters to Logo's move commands, as motivated in Section 3.6.2. To calculate this semantic analysis, one does not have to actually execute the program; instead, one can perform an abstract interpretation that analyzes several facets of the program. Developers can implement the analysis of each independent facet in isolation and compose those partial analyses later on. There are several facets that are relevant for this analysis: (1) it needs to execute the semantics of variables with the scoping strategy, (2) it needs to calculate arithmetics (e.g., mathematical operations like "+" and "-") to determine the runtime values of variables used as parameters, (3) it needs to analyze functions, and (4) it needs to check each move command for semantic constraints.

While certain facets of the semantic analysis are rather concrete, others are rather abstract. For this analysis, the first three facets are concrete and require no abstract semantics. Because the analysis has to calculate the results of arithmetic operations on integers and variables, it needs to calculate the scope and value of the used variables. Indeed, this section demonstrates that one can reuse concrete semantics for implementing part of the analysis, namely to analyze the semantics of variables and to calculate arithmetics. Further, it shows that one only needs to newly implement the part of the analysis that provides abstract semantics for checking the constraint, as demonstrated below.

First, a developer can completely reuse the dynamically-scoped variable semantics from Section 10.5.1. For this facet of the analysis, the developer can reuse the implementation of EnvironmentInterpreter with DYNAMIC_SCOPING.
Second, thanks to homogeneous embedding in Groovy, the developer can reuse the host language's mathematical operations for determining the values resulting from arithmetic calculations in FUNCTIONALLOGO. For this facet, the developer does not need to implement a special façade.

Third, the developer can reuse the concrete semantics of FUNCTIONAL to analyze functions. For this facet, the developer can reuse the implementation of Functional from Listing 5.10.

Fourth, the developer needs to implement an analyzer with abstract semantics for checking the constraint. Consider implementing an abstract interpretation that raises an exception whenever a negative value is passed to one of the move commands. For this, the developer implements the class PositiveValueAnalysisCompleteLogo, which inherits the empty abstract semantics from its super class AbstractInterpretationCompleteLogo (cf. Section 10.6.1). To check the constraint, the class only overrides the move operations. The overridden implementations only throw an exception in case the corresponding parameter has a negative value; otherwise, move commands are just ignored. This analyzer abstracts over the concrete integer values processed by the move operations and only interprets negative values as errors.

255 10. Case Studies: Review of the Support for the Desirable Properties

1 class PositiveValueAnalysisCompleteLogo extends AbstractInterpretationCompleteLogo {
2   public void forward(int n) { if (n < 0) throw new AnalysisException("n=$n to fd is negative."); }
3   public void backward(int n) { if (n < 0) throw new AnalysisException("n=$n to bd is negative."); }
4   public void right(int a) { if (a < 0) throw new AnalysisException("a=$a to rt is negative."); }
5   public void left(int a) { if (a < 0) throw new AnalysisException("a=$a to lt is negative."); }
6 }

Listing 10.34: Semantic analysis for COMPLETELOGO (abstract semantics)

Finally, the developer composes the four independent facets of this semantic analysis. Technically, there is no difference between composing concrete and abstract domains. As long as their domains are conflict free, the developer can use one of the existing language combiners to compose the facets of the semantic analysis. Consider Listing 10.35, which composes the analyzers to semantically analyze a program. First, in lines 13–16, the program composes a composite semantic analyzer from instances of the above partial analyzers that each analyze only their facet. When evaluating the EDSL program (lines 1–12), the composite semantic analyzer successfully detects the semantic error in line 11.

1  Closure program = {
2    define(name:"hexagon") { length ->
3      turtle (name:"hexagon",color:red){
4        repeat (6) {
5          forward length
6          right 60
7        }
8      }
9    }
10   i = 100
11   apply("hexagon")(50-i) // it detects the semantic error in this line
12 }
13 EnvironmentInterpreter env = new EnvironmentInterpreter();
14 env.scopingStrategy = EnvironmentInterpreter.DYNAMIC_SCOPING;
15 program.delegate = new LinearizingCombiner(
16   new PositiveValueAnalysisCompleteLogo(), env, new Functional());
17 program.call(); // will raise an exception

Listing 10.35: Applying the semantic analysis for Listing 3.14

Review: In the architecture, semantic analyses have the same qualities w.r.t. extensibility and composability as syntactic analyses. In the above scenario, the developer could reuse existing semantics and implement new facets of semantics as separate modules. To investigate the general support, several more semantic analyses have been implemented on top of REA:

Position Tracker Analysis: A semantic analysis for FUNCTIONALLOGO that calculates the position of a turtle after executing all COMPLETELOGO commands inside a function body. The abstract domain is the 2D vector space. Internally, without drawing on a canvas, the position tracker calculates the relative position after each drawing command in the function's body. The result of this analysis is a relative position to the current position and a relative rotation to the current orientation. With the information of this analysis, the drawing of a function can be skipped. For example, Figure 10.2 illustrates the calculations of the position tracker analysis for an example. Initially, the turtle is at position (x = −4000, y = 5000) and has an orientation of α = 135◦. It calls a function with the following commands: fd 2828; lt 90; fd 2828; rt 90; fd 2828; lt 90; fd 2828; lt 90; fd 2828; rt 10; fd 2828;. In Figure 10.2, index a presents the turtle at its initial position, index b shows the turtle at the final position after drawing the function, and index c shows the relative path of the turtle to skip the function application. To skip painting the function, the turtle must execute the commands: (i) forward 0: to go to relative position y = 0, (ii) left 90; forward 8485; right 90: to go to relative position x = 8485, and (iii) right 180: to turn to relative orientation α = −180◦ ≡ 180◦.

Figure 10.2.: Illustration of the position tracker analysis
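The abstract domain of this analysis can be sketched as a position plus an orientation that is updated per drawing command without painting anything. The sketch below assumes the usual Logo convention (0° points up, right turns are positive); the exact convention of the thesis implementation is not stated and is an assumption here:

```java
public class PositionTrackerSketch {
    double x = 0, y = 0;   // relative position in the 2D vector space
    double angle = 0;      // relative orientation in degrees

    // Each drawing command only updates the abstract state; nothing is drawn.
    void forward(double n) {
        x += n * Math.sin(Math.toRadians(angle));
        y += n * Math.cos(Math.toRadians(angle));
    }
    void right(double a) { angle = (angle + a) % 360; }
    void left(double a)  { right(-a); }

    public static void main(String[] args) {
        PositionTrackerSketch t = new PositionTrackerSketch();
        // fd 100; rt 90; fd 100 — ends 100 up, 100 right, facing right
        t.forward(100); t.right(90); t.forward(100);
        System.out.printf("dx=%.0f dy=%.0f da=%.0f%n", t.x, t.y, t.angle);
        // dx=100 dy=100 da=90
    }
}
```

Evaluating a function body against such a tracker (e.g., as a closure delegate) yields exactly the relative offset and rotation described above.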

Time Estimation Analysis: A semantic analysis for FUNCTIONALLOGO that estimates the amount of time needed to draw a complete turtle program or only a function with a sequence of commands. The abstract domain is the absolute time unit in ticks. Internally, for each FUNCTIONALLOGO command, the analysis assumes a constant number of time units to draw it. The time estimation iterates over the complete program, without drawing, and sums up the estimated time per command. The result of this analysis can be used to estimate the cost of drawing a function or a FUNCTIONALLOGO program.
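The summation at the heart of this analysis can be sketched as follows. The per-command costs mirror the constants later stated in footnote 11 (forward/backward: 2 ticks per step, left/right: 1 tick per degree); the Command record and cost table are illustrative, not the thesis's API:

```java
import java.util.List;
import java.util.Map;

public class TimeEstimationSketch {
    // Hypothetical abstract-syntax stand-in: a keyword plus its integer argument.
    record Command(String keyword, int argument) {}

    // Constant cost per unit of each command (cf. footnote 11 of the thesis).
    static final Map<String, Integer> TICKS_PER_UNIT =
            Map.of("forward", 2, "backward", 2, "left", 1, "right", 1);

    static long estimate(List<Command> program) {
        long ticks = 0;
        for (Command c : program) {
            // sum the estimated time per command, without drawing anything
            ticks += (long) TICKS_PER_UNIT.getOrDefault(c.keyword(), 0) * c.argument();
        }
        return ticks;
    }

    public static void main(String[] args) {
        List<Command> program = List.of(
                new Command("forward", 50),  // 100 ticks
                new Command("right", 90),    //  90 ticks
                new Command("forward", 50)); // 100 ticks
        System.out.println(estimate(program)); // 290
    }
}
```

In the actual analyzer, the command sequence would of course be obtained by evaluating the program against the analysis façade rather than from an explicit list.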

The above analyses demonstrate the feasibility of implementing simple semantic analyses for DSLs. For the COMPLETELOGO keywords, implementing a complete semantic analysis is not a problem, since the analyzer can map expressions straightforwardly to the abstract domain. Even repeat expressions are no big problem: since the number of iterations in COMPLETELOGO's repeat is constant and not based on a dynamic condition, the analyzer can easily map them to the abstract domain. However, for the FUNCTIONAL semantics, the above analysis implementation is not a complete and sound abstract interpretation, since the current implementation disregards recursive function calls. Loops and recursion are a general problem for such analyses that is extensively discussed in the literature [CC77, Cou96]. In the presence of loops and recursion, analyses generally need to use fix points to abstract over loops and recursive calls. Still, implementing sophisticated abstract interpretations with fix points is out of the scope of this thesis.

While the above results are very encouraging, it remains unclear whether REA supports implementing complicated semantic analyses that are composed at fix points and backward analyses, which needs to be addressed in future work (semantic analyses: G).

10.7. Enabling Pluggable Transformations

With language polymorphism and transformer façades, language developers can define custom transformations, as motivated in Section 3.7 (p. 59). Similarly to an analyzer façade, the transformer first generates objects of a language model from expressions of a program. Then, it analyzes these objects, and finally transforms the model to an output model or to target code. Compared to most other transformation approaches, what is special about transformation in REA is that original expressions and transformed expressions are first-class method calls in the host language. It is this property, namely that all expressions are polymorphic method calls, that enables extending and composing transformations with REA's mechanisms.

In REA, language developers can embed transformations as first-class objects. What is special about REA is that it supports both exo- and endo-transformations (cf. Section 3.7, p. 88). Therefore, the developer has the choice whether to endo-transform the causally connected program representation or to exo-transform the program to another target language with a textual representation that is not causally connected. In the following, this section demonstrates how language developers can implement both static and dynamic pluggable transformations.

10.7.1. Support for Static Transformations

As discussed in Section 3.7.1 (p. 60), static transformations can take into account either syntactic or semantic information of the program.

10.7.1.1. Syntactic Transformations

REA supports transforming expressions in abstract syntax, which are represented as method calls, by transforming these method calls into alternative method calls that represent different expressions in abstract syntax.

Example Scenario: Reconsider implementing the desugaring transformation from Section 3.6.2 (p. 58) that replaces user-defined command expressions in FUNCTIONALLOGO by corresponding function applications, such that the expression "hexagon 50" is mapped to the function call apply("hexagon")(50).

Consider implementing this transformation by adding a meta-level extension to FUNCTIONAL. Since the transformation for shortcut applications of functions is not only relevant for FUNCTIONALLOGO, but generally interesting for FUNCTIONAL, one can modify the Functional façade with an inline meta-level, which turns the class Functional into a transformer/meta-façade and completes the excerpt of its implementation from Listing 5.10. To transform user-defined command expressions to function calls, the meta-façade intercepts every usage of a user-defined command expression and transforms it into a function application. To intercept user-defined command expressions that cast on method calls to the meta-façade, the meta-façade overrides methodMissing (lines 8–14). Since the façade cannot define concrete methods for each user-defined command expression, it uses the methodMissing method to reify every method call to a pretended method. It analyzes the reified information of such a call in order to determine the abstract syntax of the user-defined command expression from which this call was created. It uses the methodName to look up the corresponding function (line 9). If a function with a corresponding name is defined, methodMissing applies this function using the reified arguments as parameters to the function (line 10) and returns its result. If no function is defined, methodMissing calls its super method, which by default raises an exception that an undefined method was called.

1  class Functional extends Interpreter implements IFunctional {
2    HashMap definedFunctions = new HashMap();
3
4    void define(HashMap params, Closure body) { ... /* code from Listing 5.10 */ }
5
6    Closure apply(String name) { ... /* code from Listing 5.10 */ }
7
8    Object methodMissing(String methodName, Object args) {
9      if (definedFunctions.get(methodName) != null){
10       return apply(methodName).call(*args);
11     } else {
12       return super.methodMissing(methodName, args);
13     }
14   }
15 }

Listing 10.36: Completed implementation of FUNCTIONAL from Listing 5.10

To evaluate programs, the meta-façade with the transformation is set up in the same way as a normal façade. When evaluating a program and encountering a user-defined command expression, such as hexagon 50 from Listing 5.10, the code block's meta-object produces a method call for the user-defined command to the meta-façade. Then, methodMissing loops back the transformed version of the method call to the façade itself, e.g., it calls apply("hexagon"), which returns a closure. The subsequent evaluation then applies the returned closure with the parameters from the original call, e.g., 50.

Review: To generally review the support for static transformations, it is important to investigate a potential transformation w.r.t. (1) what language it transforms to, (2) whether it abstracts over concrete expression types, (3) the locality where programs are transformed, and (4) how transformations can rewrite expression types. In the following, this section elaborates each point.

First, the architecture distinguishes a transformer w.r.t. what target language it transforms expressions to. When a transformation is an exo-transformation from one language to a completely different language, a developer needs to implement a completely new façade class and language model for the transformer. Conversely, when the transformation is an endo-transformation, the transformer can be implemented as an extension to an existing language façade, which reuses part of the language model.

Second, the architecture distinguishes a transformer w.r.t. whether it can abstract over concrete expression types. When it only transforms one concrete expression type, the developer needs to provide a concrete keyword method implementation for this expression type's signature in the transformer façade. When it transforms several expression types in a similar way, the developer needs to implement a meta-method or meta-object to realize a meta-transformer façade. Either one of the inline meta-methods can be overridden, such as invokeMethod or methodMissing, or a separate meta-object with a corresponding implementation can be provided for the façade or for a domain type in its model.

Third, the architecture distinguishes local and global transformations. Local transformations rewrite one concrete expression into another concrete expression (like a rewrite/transformation rule). Therefore, the developer can implement a local transformation as a concrete method in the transformer façade, whereby the transformer can ignore the enclosing lexical context of the expression. This method receives an original expression's method call and manipulates it into a corresponding transformed call. Conversely, global transformations need to quantify over multiple expression types and rewrite them according to a certain strategy (like traversal functions [Cor06] or strategies [Vis97b]). Therefore, the developer needs to implement a keyword method that takes its context into account, which can either be extracted from the internal state of the transformer or inspected from the stack. Global transformations on several expression types are enabled when a transformer façade implements a meta-method, such as methodMissing() or invokeMethod().
To allow abstracting over expression types in abstract syntax, these meta-methods allow intercepting several calls to the keyword methods in abstract syntax.

Fourth, in REA, every expression type in abstract syntax can be transformed via the method call it produces. A meta-transformer façade receives this method call and produces another method call, which again represents an arbitrary expression in abstract syntax. In the transformation, the meta-transformer uses reflection to transform method calls on the façade or inside its model. Using introspection, it can analyze the embedded source and target languages. Using intercession, it can manipulate the call's method name, parameters, and receiver. Finally, the possibly repeatedly manipulated expressions can be reflected back into the system. If a syntactic transformation is not feasible in REA itself, the developer can still use the TIGERSEYE/REA pre-processor to transform the program into an appropriate form (syntactic transformations: ).

10.7.1.2. Semantic Transformations

REA supports transforming a program's semantics by manipulating the receivers and parameters of method calls.

Example Scenario: Reconsider implementing OPTIMIZINGLOGO, which optimizes applications of functional abstractions by transforming them such that these expressions are executed in parallel. Since this transformation does not transform to another language, it is an endo-transformation.


To implement OPTIMIZINGLOGO, a language developer needs to implement a new parallelizing transformation that transforms function applications. Listing 10.37 presents an excerpt of the OptimizingLogo class that implements this transformation. This endo-transformer executes FUNCTIONALLOGO code and transforms the code for optimization. For optimization, the transformer needs to specialize the semantics of functions, so that functions are optimized. Therefore, in line 1, the OptimizingLogo class inherits from ConcurrentLogo in order to reuse the semantics of CONCURRENTLOGO for implementing ICompleteLogo, and in addition, it reimplements IFunctional in order to specialize the semantics of its FUNCTIONAL part.

To parallelize the semantics of functions, first and foremost, the OptimizingLogo façade overrides the method apply (Listing 10.37, lines 3–41) to transform function applications. To transform a function application, there are several tasks to accomplish in order to replace a direct function application by a parallelized one. The transformation looks up the called function (line 5) and clones it (line 6). It looks up the first-class representation of the turtle of the current thread (tTurtle in line 9). It wraps the original function into a closure (wrappedFunction in lines 12–39) that will parallelize the call.

The wrapped closure (Listing 10.37, lines 12–39) transforms a wrapped function application as follows. It delegates the task of executing the function from the original turtle (i.e., tTurtle) that calls the function to a new turtle that is called delegateTurtle (cf. lines 22–30). Internally, it uses the position tracking analysis to calculate the relative position the original turtle would have after painting the function itself. It also stores the old position of the original turtle (line 18).
It creates the delegateTurtle, which will go to the original turtle’s old position (lines 13–16), and which then will call the function for the original turtle. Knowing that the delegate will take care of calling the function, the original turtle can skip the function call by moving to the relative position obtained by the position tracking analysis (lines 32–36). Finally, instead of returning the function itself, the apply method returns the wrapped closure that will parallelize the function application when the optimize function is applied (line 40). When evaluating a program that is written with only one turtle, the OptimizingLogo can par- allelize and therefore speed up its execution. For example, Listing 10.38 shows an example program that only defines one turtle (lines 3–12), and therefore it would be executed, e.g., by ConcurrentLogo in only one thread. In contrast, when using OptimizingLogo to evaluate the same program, the transforming façade parallelizes the program to speed it up. Figure 10.3 compares the sequential to the parallel execution of the same program. While the left figure illustrates an execution with ConcurrentLogo with only one turtle (“IceFlower”), the right side illustrates executing the same program with OptimizingLogo, which create four additional turtles (“Opt#polygonX”) that paint the blossoms of the flower in parallel. Note that in this example optimization, there are several assumptions that hold for the sce- nario of controlling a fleet of physical turtle robots, as motivated in Section 10.7.2. First, in the example scenario, there are fixed time constants11 to execute a certain move operation of a

11In the simulation of optimizing the move operation of physical turtles, forward and backward take 2 time units per steps to move, and left and right take 1 time unit per degree.

261 10. Case Studies: Review of the Support for the Desirable Properties

1 class OptimizingLogo extends ConcurrentLogo implements ICompleteLogo, IFunctional { 2 § ... 3 public Closure apply(String name) { 4 if (! definedFunctions.containsKey(name)) throw ... 5 Closure function = definedFunctions.get(name); 6 def functionForWrapper = function.clone(); 7 ... 8 Thread thread = Thread.currentThread(); 9 Turtle tTurtle = threadToTurtle.get(thread); 10 11 // calculating position after drawing the turtle 12 Closure wrappedFunction = { Object... args −> 13 def positionTracker = new PositionTrackingLogo(); //calculate position after drawing the function 14 functionForWrapper.delegate = positionTracker; 15 functionForWrapper.resolveStrategy = Closure.DELEGATE_ONLY; 16 functionForWrapper(*args); // position tracking (the *−operator resolves args as parameters of the call ) 17 18 int oldX = tTurtle . position .x; int oldY = ...; int oldAngular = ...; 19 20 // create new concurrent turtle that draws the function 21 def delegateTurtle = super.turtle (name:("Opt#"+(tTurtle.name+(count++))), 22 color :( tTurtle .penColor.value)) { 23 penup(); // set up delegate turtle to current position of original turtle 24 fd oldY; // (i) bring delegate turtle in y position of original turtle 25 rt 90; fd oldX; lt 90; // ( ii ) bring delegate turtle in x position of original turtle 26 rt oldAngular; // ( iii ) bring delegate turtle in orientation of original turtle 27 pendown(); 28 29 functionForWrapper.call(*args); // (the *−operator resolves args as parameters of the call ) 30 } 31 32 penup() // set up original turtle to relative position after painting the function 33 fd positionTracker .deltaY // (i) bring delegate turtle in new y position 34 rt 90; fd positionTracker .deltaX; lt 90 // ( ii ) bring delegate turtle in new x position 35 rt positionTracker .deltaAngular // ( iii ) bring delegate turtle in new orientation 36 pendown() 37 38 return delegateTurtle; 39 } 40 return wrappedFunction; 41 } 42 ... 43 }

Listing 10.37: Excerpt of the OPTIMIZINGLOGO transformer façade

turtle that is controlled by the program. The execution of the entire program is finished after all remote turtle robots have finished their drawings. Second, it assumes that there is no data flow and no side effects between functions. Scheduling and isolating multiple threads of execution is a complicated problem that is outside the scope of this thesis.

Review: Because a transformer façade has access to the first-class representations of the program expressions, it can transform the program. The example optimization


Closure singleTurtleProgram = {
  define(name: "polygon") { int length, int edges -> ... }
  turtle(name: "IceFlower", color: blue + green) {
    edges = 4; length = 50; angle = (int) (360 / edges);
    halfAngle = (int) (angle / 2); halfLength = (int) (length / 2);
    right 45;
    repeat(edges) {
      forward length; left(90 - halfAngle); forward halfLength;              // paint peduncle
      left(90 - halfAngle); polygon(halfLength, edges); right(90 - halfAngle); // adjust for blossom
      backward halfLength; right(90 - halfAngle); right angle;
    }
  }
}
singleTurtleProgram.delegate = new OptimizingLogo();
singleTurtleProgram.call();

Listing 10.38: Using OptimizingLogo to optimize a program with only one turtle

Figure 10.3.: Comparison of sequential and parallel execution of Listing 10.38: (a) sequential execution with ConcurrentLogo; (b) parallel execution with OptimizingLogo

implemented is a very simple one, but more sophisticated transformations are conceivable. Despite this, there are practical limitations. Although only static information is accessed, in REA, transformations must actually be performed at runtime. Performing a static transformation at runtime is rather awkward, since the optimization itself has costs that may slow down the execution. From the perspective of the cost of the transformation, it is disadvantageous that the static transformation is performed dynamically. Still, in the concrete example scenario, the runtime overhead is not a problem, since the bottleneck to be optimized is greater than this overhead.

To review the support for parallelization in OPTIMIZINGLOGO, the execution times of Listing 10.38 with and without parallelization were measured. Figure 10.4 illustrates how OptimizingLogo delegates the function applications of the “IceFlower” turtle to its delegate turtles. The first line illustrates the program execution by a single turtle, as the program would be painted without transformation. The second line illustrates the paint operation by the original turtle, which delegates painting the blossoms to its delegate turtles. The third line illustrates the paint


Figure 10.4.: Parallelization through delegating function calls to delegate turtles

operations by all delegate turtles. The numbers in the lower right corner of the boxes indicate the time that was measured for each snapshot. For this experiment, there is a measured speedup of Sp = T1/Tp = 1.586, which was calculated from the sequential execution time T1 = 5.263 (rightmost figure in the upper line) and the parallel execution time Tp = 3.318 (rightmost figure in the lower line).
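The reported speedup can be reproduced from the two measured times; a trivial check using the values from the text:

```java
public class Speedup {
    public static void main(String[] args) {
        double t1 = 5.263; // sequential execution time (ConcurrentLogo)
        double tp = 3.318; // parallel execution time (OptimizingLogo)
        double sp = t1 / tp; // speedup Sp = T1 / Tp
        System.out.printf("Sp = %.3f%n", sp); // prints Sp = 1.586
    }
}
```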

REA does not target high-performance optimizations but maximum flexibility. Still, if high-performance optimizations are required by a language, the language developer can implement a transformer façade that transforms the first-class expressions and finally generates the transformed code. The developer can then compile the generated code using the Groovy compiler or another compiler, depending on the target language. In this case, the costs of the transformation need not be paid at runtime. For example, in REA, there is a generator that embeds a first-class representation of SQL expressions in Groovy and generates SQL queries. A language developer could extend this generator to write transformations on SQL queries.
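The generate-then-compile idea can be sketched as follows. This is not the thesis's actual SQL embedding but a minimal, hypothetical façade in Java (all class and field names are illustrative) that holds a first-class query representation and emits SQL text; a transformer could rewrite the representation before generation, so transformation costs are paid before, not during, execution:

```java
import java.util.List;

/* Hypothetical first-class representation of a SELECT query
   (illustrative names; the thesis's actual SQL embedding differs). */
class SelectQuery {
    final List<String> columns; final String table; final String where;
    SelectQuery(List<String> columns, String table, String where) {
        this.columns = columns; this.table = table; this.where = where;
    }
    // The generator step: turn the first-class representation into target code.
    String toSql() {
        return "SELECT " + String.join(", ", columns) + " FROM " + table
                + (where == null ? "" : " WHERE " + where);
    }
}

public class SqlGeneratorSketch {
    public static void main(String[] args) {
        SelectQuery q = new SelectQuery(List.of("name", "age"), "turtles", "age > 1");
        // A transformer façade could rewrite q here (e.g., prune unused columns)
        // before the generation step emits the final query text.
        System.out.println(q.toSql()); // SELECT name, age FROM turtles WHERE age > 1
    }
}
```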

Despite this, in the end, it is expected that the current implementation of REA in Groovy is not suitable for high-performance optimizations, in particular because there is a large execution overhead due to the indirections of the Groovy MOP (e.g., for every method call, for auto-boxing primitive values, and for arithmetic operations). If there are hard performance requirements, it would be recommended to re-instantiate REA in a host language whose compiler can remove the MOP indirections, such as CLOS.

While the above results are very encouraging, it remains unclear whether REA supports traditional semantic transformations as they are supported by compilers. Usually, such transformations need access to the basic-block level, e.g., for loop unrolling. In the current implementation of REA in Groovy, reflecting on the basic-block level is not addressed, and therefore, it needs to be addressed in future work (static transformations: partial support).


10.7.2. Support for Dynamic Transformations

Section 3.7.2 (p. 62) has motivated dynamic adaptive optimizations for DSLs. Therefore, consider implementing such adaptive optimizations using a dynamic transformation whose optimization decision depends on the runtime context.

Example Scenario: Reconsider implementing the dynamic transformation AdaptiveOptimizingLogo motivated in Section 3.7.2 (p. 62), which takes into account whether it is worth optimizing function applications. Listing 10.39 shows an excerpt of the implementation of this transformation. It overrides the apply keyword method to wrap function applications into a closure (lines 6–19). Before optimizing a function application, in line 7, the wrapping closure estimates the cost (i.e., baseCost) for drawing the wrapped function. This cost estimation needs to take into account the function’s dynamic arguments, which are obtained from the parameters to the wrapped closure args. Next, in line 8, it calculates the potential initialization costs (i.e., initCost) to bring a delegate turtle from position (x = 0, y = 0) to the position of the original turtle. After estimating the costs, in line 10, there is the optimization decision that depends on the dynamic condition, since the actual costs depend on the current position of the turtle and the parameters to the function call. The transformation only optimizes the program when the condition is met that the potential saving (of the costs to draw the function) exceeds the costs for starting a potential new turtle. After the decision, in case the decision was to optimize the corresponding function application, the optimizer calls the apply method of the superclass OptimizingLogo. Otherwise, it calls the non-optimized function.

1  class AdaptiveOptimizingLogo extends OptimizingLogo {
2    ...
3    Closure apply(String name) {
4      Closure function = functionNamesToClosure.get(name).clone();
5
6      Closure wrappedFunction = { Object... args ->
7        int baseCosts = ... // estimate time to draw function with time estimation analyses (cf. Section 10.6.2)
8        int initCosts = ... // calculate time for a delegate turtle to goto (x,y) position
9
10       if (baseCosts > initCosts) {
11         // returning the optimized function
12         Closure optimizedFunction = super.apply(name).clone();
13         return optimizedFunction.call(*args);
14       } else {
15         // returning the non-optimized function
16         return function.call(*args);
17       }
18     }
19     return wrappedFunction;
20   }
21 }

Listing 10.39: Excerpt of the AdaptiveOptimizingLogo transformer façade
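Stripped of the Logo machinery, the decision in the wrapping closure is a per-call cost comparison between drawing cost and set-up cost. A minimal sketch in Java — all names and cost values are hypothetical:

```java
import java.util.function.IntUnaryOperator;

public class AdaptiveDecision {
    // Wrap a function so that, per call, it runs the optimized variant only
    // when the estimated drawing cost exceeds the delegate's set-up cost.
    static IntUnaryOperator adaptive(IntUnaryOperator plain, IntUnaryOperator optimized,
                                     IntUnaryOperator costEstimate, int initCost) {
        return arg -> costEstimate.applyAsInt(arg) > initCost
                ? optimized.applyAsInt(arg)   // worth parallelizing
                : plain.applyAsInt(arg);      // not worth the set-up cost
    }

    public static void main(String[] args) {
        IntUnaryOperator plain = x -> x * 2, optimized = x -> x * 2; // same result, different cost
        IntUnaryOperator cost = x -> x;  // assume cost grows with the argument
        IntUnaryOperator f = adaptive(plain, optimized, cost, 10);
        System.out.println(f.applyAsInt(5));  // below threshold: plain variant, prints 10
        System.out.println(f.applyAsInt(50)); // above threshold: optimized variant, prints 100
    }
}
```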


Review: Because one can dynamically transform the first-class representations of program expressions at runtime, an adaptive optimization could be implemented using an endo-transformation that transforms a program while it executes. Indeed, the above optimization implements an on-line feedback system [AFG+00].

Note that there can also be costs associated with having such a feedback system at runtime. Still, in this scenario, the on-line feedback system [AFG+00] that calculates the optimization decision takes far less time to compute than performing the move operations of the turtles, assuming that there are always enough computing resources for the analysis performed at runtime. In contrast, parallelizing the physical moves can lead to a speedup of orders of magnitude, depending on the concrete program and the number of available robots. For example, to determine the execution times, one can perform an abstract interpretation or measure sample executions at runtime. Both analyses come with certain costs – either for executing the abstract interpretation or the overhead of collecting samples at runtime. Further, there are costs due to the transformation of the program. With an adaptive optimization, it is important to make sure that the costs of the feedback system and of performing the optimization do not actually exceed the potential savings.

While dynamic transformations are technically supported in REA, programs that use such transformations incur a constant overhead during their execution. Currently, REA does not address partial evaluation and optimization of dynamic analyses. Optimizing dynamic analyses is required for acceptable performance of programs. It therefore needs to be addressed in future work (dynamic transformations: partial support).

10.8. Review Result

Table 10.3 presents the detailed results of this review and compares them to related work.
The rightmost column shows the marks of REA’s support for the desirable properties. The results are presented in comparison with the qualities of related embedding approaches on the left-hand side. REA fully supports the properties, except for the mentioned limitations in implementing analyses and transformations.


Table 10.3.: Comparison with related embedding approaches

[The table rates each approach’s support for the desirable properties of Chapter 3 – 3.1 Extensibility (new keywords, semantic extensions), 3.2 Composability of Languages, 3.3 Composition Mechanisms, 3.4 Concrete Syntax, 3.5 Pluggable Scoping, 3.6 Pluggable Analyses, and 3.7 Pluggable Transformations (static and dynamic) – with marks for partial support, important limitations, and full support. The compared approaches are: Pure Embedding [Hud96], Tagless Embedding [CKS09], Unembedding [ALY09], Jargons [Pes01], EDSL Groovy [KG07], EDSL Ruby [TFH09], TwisteR [AO10], Pattern Lang. π [KM09], Helvetia [RGN10], Staged Interpreters [COST04], Extensible Meta-Programming [SCK04], Converge [Tra08], Fluent Interfaces [Eva03, Fow05], EMF2JDT [Gar08], Scala EDSL [OSV07], Polymorphic Embedding [HORM08], Embedded Generators [Kam98], Embedded Compilers [EFDM03], Ruby Generators [CM07], MetaBorg [BV04], TXL [Cor06], and Reflective Embedding (REA). The individual cell marks are not recoverable from this extraction.]


10.9. Summary

This section has evaluated the quality of the reflective embedding architecture’s support for language evolution. REA has excellent support for five of the seven desirable properties:

1. REA supports extensibility.

2. REA supports composability.

3. When existing extensibility or composability mechanisms are inadequate, REA allows defining new, appropriate mechanisms on top of the existing open composition mechanisms.

4. With the generic TIGERSEYE pre-processor, REA also supports incrementally and partially defined concrete syntax for embedded DSLs.

5. REA supports pluggable scoping to allow scoping semantics that differ from those of the host language.

6. Although REA supports modular, composable, and pluggable analyses for different languages, it remains to be shown whether REA supports the composition of analyses with fixpoint semantics.

7. There is support for pluggable transformations, but it remains to be shown how traditional compiler analyses can be implemented and how the runtime overhead of dynamic analyses can be improved.

With the above support for the desirable properties, REA is readily equipped for language evo- lution.

11. Costs of Reflective Embedding

This chapter measures the costs of reflective embedding and compares them with the costs of related work. Benchmarks for GPLs are broadly available for comparing mainstream languages, such as the SPECjvm2008 benchmark¹ for Java. Unfortunately, there are no sophisticated benchmarks for DSLs [KLP+08]. Even for standard DSLs that are broadly used in industry, such as the BPEL workflow language [AAB+07], there are currently no benchmarks available. Having a benchmark for DSLs would be interesting, because it would allow comparing the costs of different DSL approaches.

An open source benchmark for DSLs: To address the lack of a DSL benchmark, in the context of this thesis, an open source benchmark was implemented that compares different DSL technologies. The thesis uses this benchmark to compare related DSL implementation approaches—including approaches that are not based on embedded DSLs. The basic idea of the open source benchmark is to implement the same DSL with different technologies and then to compare the qualities of the resulting DSL implementations. To compare the costs of the different DSL technologies, the benchmark’s experiment is always to implement the same DSL² with different technologies. Instead of comparing only embedded DSL approaches, the benchmark also compares traditional DSL approaches that are popular and/or often mentioned in related work. Each selected approach is a representative of a certain category of DSL implementation technologies in related work. The benchmark compares the following categories of DSL approaches:

(1) parser generators: Bison/C [LMB92], ANTLR/Java [Par93];

(2) extensible DSLs: MontiCore/Java [KRV08];

(3) embedded DSLs: DSL2JDT/Java (homogeneous) [Gar08]; MetaBorg/Stratego (heteroge- neous) [BV04]; REA/Groovy (homogeneous); TIGERSEYE/Groovy (hybrid)

(4) commercial off-the-shelf (COTS) products: Apache SCXML³.

¹ SPECjvm2008: http://www.spec.org/jvm2008/.
² As a starting point, the benchmark uses the DSL for specifying state machines (FSMDSL) from Section 6.2. Note that, for the measurements, it is expected to be insignificant which particular DSL is selected for comparison. But for a fair benchmark, various types of DSLs with different complexities would need to be selected.
³ Apache SCXML executes a state machine definition in abstract XML syntax. More information can be found on the Apache SCXML homepage: http://commons.apache.org/scxml/.


To compare the qualities of the resulting DSL implementations, the benchmark compares various metrics:

(a) size of hand written code for DSL implementation in effective lines of code (eLOC),

(b) size of generated code for DSL implementation in eLOC,

(c) execution time of the same DSL program under various problem sizes,

(d) memory usage of the DSL program under various problem sizes,

(e) correctness of expected output,

(f) syntactic noise in DSL programs, which is visible for end users.
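Metric (c) — execution time under various problem sizes — can be collected with a small harness along the following lines. This is a sketch with hypothetical names, not the actual benchmark code (which is in the downloadable report):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntConsumer;

public class DslBenchmarkSketch {
    // Time one DSL implementation on a given problem size (metric (c)).
    static long timeMillis(IntConsumer implementation, int chunks) {
        long start = System.nanoTime();
        implementation.accept(chunks);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-ins for the real FSMDSL implementations (Bison, ANTLR, REA, ...).
        Map<String, IntConsumer> implementations = new LinkedHashMap<>();
        implementations.put("dummy-interpreter", chunks -> {
            long sum = 0;
            for (int i = 0; i < chunks * 100_000; i++) sum += i; // simulated work
        });
        for (int chunks = 1; chunks <= 10; chunks++)             // 10 chunks ~ 1500 eLOC
            for (Map.Entry<String, IntConsumer> e : implementations.entrySet())
                System.out.println(e.getKey() + " size=" + chunks
                        + " time=" + timeMillis(e.getValue(), chunks) + "ms");
    }
}
```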

It is out of the scope of this thesis to present all details of the benchmark results. Only the most important results are presented here. The complete report with all benchmark results can be downloaded from: http://www.stg.tu-darmstadt.de/research/ao4dsls.

The benchmark compares the implementation costs of the various technologies by implementing FSMDSL with them, by measuring the above metrics for each of the different implementations, and finally by comparing their metrics, which is similar to the approach taken by Kosar et al. [KLP+08]. Like Kosar et al., the benchmark of this thesis measures the eLOC metric to compare implementation costs and execution times, and it also compares implementations of the same DSL with different technologies; Kosar et al., however, use another DSL, for the specification of feature models. Unfortunately, Kosar et al.’s benchmark is neither open source nor available for download. In contrast, the benchmark proposed in this thesis allows other researchers to validate and reproduce results more easily.

Note that, in all experiments presented in this chapter, the benchmark uses the same problem setting and problem size to compare the different technologies. All quantitative benchmark measurements were executed on an Intel Pentium 4 with 2.99 GHz and 4 GB RAM, running Debian Linux, kernel version 2.6.2. Further, note that the benchmark is not completely fair to REA, because it compares REA to technologies that do not have the same flexibility for language evolution, partial definition of concrete syntax, and so on, as supported by REA. Still, the benchmark results can be used as a rough estimation of the implementation and runtime costs.

11.1. Implementation Costs of Reflective Embedding

This section evaluates the costs for implementing DSLs and for compiling and executing DSL programs.

11.1.1. Implementation Costs of Domain-Specific Languages

The discussion of the implementation costs for embedding DSLs distinguishes costs for syntax (Section 11.1.1.1) and semantics (Section 11.1.1.2).


11.1.1.1. Domain-Specific Syntax

This section compares the costs for DSL syntax. Table 11.1 shows the syntactic qualities and compares the support for partial syntax. It compares this thesis’ approaches, both with compromised syntax (REA) and with concrete syntax (TIGERSEYE), to related work. The support for partial syntax and the syntactic qualities determine the syntactic costs of the corresponding embedding approach. Only when partial syntax is supported is the cost of implementing the grammar significantly reduced, as was shown in Section 10.4.4.

Specifically, Table 11.1 compares the other embedding approaches with TIGERSEYE (from left to right) w.r.t.: (1) what host languages are supported out of the box, (2) whether the approach is host language independent, and, if so, (3) whether the grammar of the complete host language has to be available. Further, it compares (4) the embedding style, (5) whether the approach supports concrete syntax, and (6) whether a complete specification of the DSL’s grammar is required. Finally, (7) it compares the class of languages supported by the approach. It classifies TIGERSEYE as a hybrid embedding approach, since TIGERSEYE is a heterogeneous but well-integrated pre-processor for homogeneous embedded DSLs. From the table, one can see that currently TIGERSEYE is the only approach that reduces the costs for concrete syntax by employing special techniques like island grammars.

To sum up, the TIGERSEYE/REA approach reduces the costs of EDSLs with concrete syntax by building upon partial syntax definitions, and its advantage is that it leaves it up to the developers to decide when they want to incrementally add concrete syntax. Although the Stratego language and TXL have in-principle support for island grammars, they currently do not use island grammars to reduce the costs for embedding.
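The island-grammar idea that TIGERSEYE exploits — parse only the embedded DSL fragments (the "islands") and skip over the surrounding host code (the "water") — can be illustrated with a deliberately simplified, regex-based sketch. A real island grammar is a context-free grammar, and the Logo-like keywords below are merely illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IslandSketch {
    // Recognize only Logo-like DSL statements; everything else is "water".
    static final Pattern ISLAND = Pattern.compile("\\b(fd|rt|lt)\\s+\\d+");

    // Collect the DSL islands from a mixed host/DSL program text.
    static List<String> islands(String mixedProgram) {
        List<String> found = new ArrayList<>();
        Matcher m = ISLAND.matcher(mixedProgram);
        while (m.find()) found.add(m.group());
        return found;
    }

    public static void main(String[] args) {
        String program = "int x = 1; fd 50; if (x > 0) { rt 90; } log(x);";
        System.out.println(islands(program)); // [fd 50, rt 90]
    }
}
```

The point of the sketch: the host-language parts never have to be parsed, so no complete host grammar is needed — the property the table attributes to TIGERSEYE.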

11.1.1.2. Domain-Specific Semantics

Table 11.2 gives a rough estimation of the implementation effort with the different DSL approaches. The table compares the eLOC of the FSMDSL implementations with the different technologies. The benchmark counts lines of user-defined and generated code.

W.r.t. the implementation effort, only SCXML—the commercial off-the-shelf approach—has fewer eLOC than the embedding in REA. Naturally, for a COTS-based approach, virtually no code has to be implemented for a DSL that comes off the shelf; it is not surprising that re-using a DSL is cheaper than implementing it. For SCXML, only a thin wrapper needed to be implemented in order to integrate the SCXML implementation into the benchmark. REA was the second cheapest approach in terms of eLOC. Only a few more lines were added to enable concrete syntax for FSMDSL. For the embedded approaches, no code had to be generated. All other DSL technologies required more eLOC. Their DSL specifications in the respective meta-languages are very concise, but a vast amount of code is generated from them. Note that the benchmark does not compare different EDSL approaches.

11.1.2. Implementation Costs of Domain-Specific Aspect Languages

To quantitatively evaluate the implementation costs of aspect languages, in the context of this thesis the Cool domain-specific aspect language was implemented. Cool was a demonstrator


Table 11.1.: Comparison of language embeddings’ support for concrete syntax
(columns: supported host languages (1); host language independent (2); grammar size for host language (3); embedding style (4); concrete syntax (5); grammar size for embedding (6))

Pure Embedding [Hud96], Unembedding [ALY09]: Haskell; homogeneous; no concrete syntax
Tagless Embedding [CKS09]: OCaml, Haskell; homogeneous; no concrete syntax
Jargons [Pes01]: Scheme; homogeneous; no concrete syntax
EDSL Ruby [TFH09]: Ruby; homogeneous; no concrete syntax
EDSL Groovy [KG07]: Groovy; homogeneous; no concrete syntax
TwisteR [AO10]: Ruby; homogeneous; no concrete syntax
Staged Interpreters [COST04]: MetaOCaml, TemplateHaskell; homogeneous; no concrete syntax
Ext. Meta-Programming [SCK04]: TemplateHaskell; homogeneous; no concrete syntax
Fluent Interfaces [Eva03, Fow05]: Java; homogeneous; no concrete syntax
DSL2JDT [Gar08]: Java; homogeneous; no concrete syntax
Polymorphic Embedding [HORM08]: Scala; homogeneous; no concrete syntax
REA: Groovy, Ruby; homogeneous; no concrete syntax
π [KM09]: π; homogeneous; concrete syntax, complete CFG
Helvetia [RGN10]: Smalltalk; homogeneous; concrete syntax, complete PEG
Converge [Tra08]: Converge; homogeneous; concrete syntax, complete CFG
MetaBorg/Stratego [BV04]: Java, C, C++, ...; host language independent, complete host grammar required; heterogeneous; concrete syntax, complete CFG
TXL [Cor06]: C++, C#, Java, ...; host language independent, complete host grammar required; heterogeneous; concrete syntax, complete LL(*)
TIGERSEYE/REA: Java, Groovy, ...; host language independent, partial host grammar sufficient; hybrid; concrete syntax, partial CFG

Table 11.2.: Comparison of FSMDSL implementation costs

Approach:      Bison      ANTLR      MontiCore  DSL2JDT   Stratego    SCXML  REA         TigersEye
Style:         Parser     Parser     Ext.       Homogen.  Heterogen.  COTS   Homogen.    Heterogen.
               Generator  Generator  Compiler   EDSL      EDSL               Refl. EDSL  Refl. EDSL
User-defined:  353        226        190        206       226         112    136         142
Generated:     2642       1443       3816       1210      1443        0      0           0

for AOP since the beginning and is a well-established DSAL [KL07b, HBA08] to implement in order to evaluate an extensible AO implementation approach. Table 11.3 compares the support of the Cool features that Lopes proposes [Lop97] in different AO implementation approaches.

Unfortunately, the original Cool implementation [Lop97], shipped as part of the AspectJ source code weaver v0.1, is no longer available, which is why its results are in brackets. Unfortunately, AweSome [KL07b] is not available for download. JAMI [HBA08]


is the only Cool implementation available for comparison. Compared to JAMI, POPART’s Cool implementation is more complete.

Table 11.3.: Comparison of the support for the Cool DSAL features

Framework        Impl. Approach   per obj/class   self-/mutex   condition var   guard
Cool/D [Lop97]   Pre-Processor    ( )             ( ) ( )       ( )             ( )
AweSome          Ext. Compiler    ( )             ( )           ( )             ( )
JAMI             Interpreter
Popart/REA       Embedding        (partial support for one feature)
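The kind of coordination that Cool expresses declaratively — for example self-exclusion, where at most one thread at a time executes an object's coordinated methods — can be sketched in plain Java. This illustrates only the semantics of one Cool feature; it is not POPART's implementation, and all names are illustrative:

```java
import java.util.concurrent.locks.ReentrantLock;

public class SelfexSketch {
    // Cool-style "selfex" coordinator: one lock per coordinated object.
    static class Coordinator {
        private final ReentrantLock lock = new ReentrantLock();
        void enter() { lock.lock(); }   // before any coordinated method
        void exit()  { lock.unlock(); } // after it returns
    }

    // A class whose methods are coordinated; in Cool this pairing would be
    // stated declaratively in a separate coordinator aspect.
    static class Buffer {
        private final Coordinator coord = new Coordinator();
        private int value;
        void put(int v) { coord.enter(); try { value = v; } finally { coord.exit(); } }
        int take()      { coord.enter(); try { return value; } finally { coord.exit(); } }
    }

    public static void main(String[] args) {
        Buffer b = new Buffer();
        b.put(42);
        System.out.println(b.take()); // prints 42
    }
}
```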

Table 11.4 compares the eLOC of the DSALs that have been implemented; only user-written code is compared. For the Cool DSAL implementation, compared to REA, JAMI has fewer eLOC, but Cool/JAMI is not comparable to Cool/POPART, since it completely implements only one of the five Cool features. For the Caching DSAL, JAMI and REA are comparable, because both implement it completely. To implement the Caching DSAL, REA requires significantly less code than JAMI (36.0 % fewer eLOC). For the Zip DSAL, there is only an implementation in REA, with 192 eLOC.

Table 11.4.: Comparison of DSAL implementation costs

DSAL      JAMI 0.3   POPART/REA
COOL      (532)      1083
Caching   491        314
Zip       (N/A)      192

11.2. Compilation Costs of Reflective Embedding

The following section measures the costs for compiling DSL programs and compares the compilation time to traditional and related embedding approaches. The experiment compares the compilation time of DSL programs for the same DSL implemented with the different technologies. The benchmark uses a state machine DSL specification that was then implemented with each corresponding technology. The experiment uses a generated input file that contains a large state machine and input events with 10 different input sizes (1 to 10 chunks, with a maximum of approx. 1500 eLOC). The same state machine was compiled with the different implementations of the DSL, and the times for compiling the same state machine program were measured. Finally, the experiment compares the compilation times of the different technologies.

The benchmark results in Figure 11.1 show that Bison+Flex, ANTLR, Monticore, and SCXML have low compilation times. Heterogeneous embedding with MetaBorg/Stratego has a good compilation time. In contrast to this, homogeneous embeddings with


[Plot: compilation time in ms (logarithmic scale, 100 to 10,000,000) over problem size in chunks (1–10; 10 chunks = 1500 eLOC), with curves for Bison+Flex, ANTLR, Monticore, DSL2JDT, Stratego/XT, SCXML, REA, and TigersEye]

Figure 11.1.: Benchmark results for DSL program compilation time

DSL2JDT and REA have moderate compilation times. Only TIGERSEYE has a bad compilation performance.

For TIGERSEYE, the compilation time quickly increases with the size of the DSL program, and therefore it is currently not appropriate for DSL programs with large files. The low compilation performance is due to using the Earley algorithm for parsing, which has a worst-case complexity of O(n³) [Ear70], since many ambiguities are introduced by the island rules. Despite the disadvantageous results for TIGERSEYE, note that the parser’s performance has not yet been optimized, which was out of scope. Currently, the time for parsing and transforming DSL programs increases exponentially with the DSL program file size: while processing 500 effective lines of code (eLOC) takes approx. 10 seconds, TIGERSEYE needs 43 minutes to rewrite 1500 eLOC. Therefore, TIGERSEYE is only practicable for DSL programs with small files.

11.3. Runtime Costs of Reflective Embedding

The following section measures the runtime costs of DSLs and aspect languages implemented via reflective embedding and compares them to traditional and related embedding approaches.

11.3.1. Runtime Costs of Domain-Specific Languages

Although providing production-quality performance for EDSLs is outside the scope of this thesis, it is interesting to reveal the runtime costs of the current prototype. Therefore, an experiment was conducted that compares the performance of REA to related work. The experiment compares the execution times of a large DSL program for the same DSL implemented with the different technologies. The benchmark provides a state machine DSL specification that was then implemented with each corresponding technology. The experiment uses a generated input file that contains a large state machine and input events with 5 different input sizes (chunks between 0 and 500 eLOC).
The same state machine was executed with different implementations of the DSLs and the times for executing the same state

machine program were measured. Finally, the experiment compares the execution times of the different technologies.

When using a declarative language (e.g., a DSL), an input file with 500 eLOC can be considered a rather large DSL program. Still, the benchmark’s input size is very small, since there are many DSLs, e.g., HTML, that use large input files. Therefore, with the current benchmark version, one cannot draw general conclusions about DSL performance. Despite this, the results represent a rough estimation.

[Plot: execution time in ms (0 to 2500) over problem size in chunks (1–10; 10 chunks = 1500 eLOC), with curves for Bison+Flex, ANTLR, Monticore, DSL2JDT, Stratego/XT, SCXML, REA, and TigersEye]

Figure 11.2.: Benchmark results for DSL program execution time

The measurements presented in Figure 11.2 indicate that most DSL technologies have good execution times for FSMDSL programs. The parser generator approaches perform best, but they are not easily extensible. Bison is the fastest, followed by ANTLR. Although one could expect a large overhead due to XML parsing in SCXML, it performs well. MontiCore allows some extensions, and it still performs moderately well. The heterogeneous embedded approach MetaBorg/Stratego also had a good performance.

In contrast, the homogeneous embedded approaches that systematically support both extensions and composition are slower than the other DSL approaches. This is in line with the benchmark results in [KLP+08]. DSL2JDT is faster than both REA and TIGERSEYE. Although REA and TIGERSEYE are only a little slower than DSL2JDT, they are much more flexible. Due to the overhead of parsing and rewriting concrete syntax with the Earley parser, the same state machine DSL program executes far slower in TIGERSEYE than in REA with abstract syntax. Despite the presence of the meta-level, REA and TIGERSEYE execute embedded DSL programs with a moderate performance similar to the extensible DSL approach MontiCore. Despite the indirections due to the meta-objects, REA executes embedded DSL programs with a performance similar to the DSL2JDT approach, which embeds a fluent interface in Java but does not address extensibility and composability.


11.3.2. Runtime Costs of Aspect Languages This section reports on experiments about the runtime penalty of reflective embedding. For an estimation of the runtime cost, this thesis compares the performance of tangled implementations with an experiment that implemented the same program with and without crosscutting compo- sition and compared the execution time. Currently, the CCCombiner exploits partial evaluation of static parts in pointcuts (cf. [HH04, Tan04]), i.e., POPART only reifies join point contexts and evaluate the dynamic parts of pointcuts when aspects define pointcuts that potentially match a given join point. The observed overhead is up to 3 times. In sum, AO language embeddings in POPART/REA the flexibility comes at a price due to the indirections used in the meta-aspect protocol. A comprehensive treatment of performance issues of reflective embedding remains to be done in future work. It is a complicate task particularly for two reasons. First, there are no fair standard benchmarks available for aspect languages [HM04]. There are benchmarks that evaluate the runtime costs of general-purpose aspect languages [HM04], but no benchmarks are available for domain-specific aspect languages. Designing a fair benchmark for DSALs was not in the scope of this thesis. Second, for determining the runtime costs an embedding approach, it is not clear what costs are due to the host language and due to the embedding. The above performance measurements are primarily determined by the performance of the Groovy MOP implementation, only part of the overhead is due to reification of join points and to advice composition. Note that the Groovy MOP was designed for convenience and flexibility, not optimized for performance. In Groovy, the Groovy MOP as well as boxing and unboxing of primitive values has a large over- head on application execution [KG08]. 
Although Groovy code is compiled to Java bytecode, which is subject to aggressive dynamic runtime optimization in the JVM, the compiler inserts many indirections into the bytecode to enable the flexibility of the MOP. In Groovy, method calls are reflective calls for which up to several hundred ordinary Java method calls have to be executed. Groovy's caching facilities for reflective method calls effectively reduce the number of necessary method calls, but a large overhead remains even with caching. Further, Groovy has a large footprint on the call stack, because a Groovy method call leads to between 9 and 13 additional Java stack frames that remain on the stack until the Groovy method returns.
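The source of this indirection can be illustrated with a minimal sketch. This is not Groovy's actual implementation: the `MetaObject` class and its `invokeMethod` are hypothetical stand-ins showing why one MOP-routed call costs several ordinary calls (lookup, boxing, reflective invoke) instead of a single direct invocation.

```java
// Illustrative sketch (not Groovy's implementation): a dynamic call is routed
// through a meta-object that looks the method up at runtime, so one "call"
// costs a cache lookup, argument boxing, and a reflective invocation.
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class MetaObject {
    private final Object target;
    private final Map<String, Method> cache = new HashMap<>();  // keyed by name only (sketch ignores overloads)

    MetaObject(Object target) { this.target = target; }

    /** One dynamic dispatch = lookup + reflective invoke + (un)boxing. */
    Object invokeMethod(String name, Object... args) {
        try {
            Method m = cache.get(name);           // caching reduces lookups...
            if (m == null) {
                Class<?>[] types = new Class<?>[args.length];
                for (int i = 0; i < args.length; i++) types[i] = args[i].getClass();
                m = target.getClass().getMethod(name, types);
                cache.put(name, m);
            }
            return m.invoke(target, args);        // ...but arguments stay boxed
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}

public class DispatchDemo {
    public String greet(String who) { return "hello " + who; }

    public static void main(String[] args) {
        MetaObject meta = new MetaObject(new DispatchDemo());
        // A direct call is one JVM invocation; the MOP call yields the same
        // result via map lookup and java.lang.reflect, which the JIT
        // optimizes far less aggressively.
        System.out.println(meta.invokeMethod("greet", "world")); // hello world
    }
}
```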

Part V.

Conclusion


12. Conclusion

12.1. Summary of Results

The objective of this thesis was to address problems of existing embedding approaches for domain-specific languages (DSLs). Existing approaches have limited support for extensibility and composability of embedded languages, and they do not address embedding of many well-established language concepts that are supported by traditional non-embedded language implementation approaches. An investigation of related work on embedded domain-specific languages showed that each of the existing approaches supports only a subset of the traditionally supported language concepts. In particular, even the most advanced embedding approaches lack support for (1) concrete syntax, (2) crosscutting composition, and (3) fine-grained compositions. Therefore, they do not adequately support sophisticated language compositions that have interactions in the syntax and semantics of constituent parts of the language. To address these problems, this thesis has presented the reflective embedding architecture (REA), a novel language embedding approach that equips embedded languages with a meta-level architecture. The approach consolidates two well-established ideas in language research, namely embedded domain-specific languages and meta-object protocols. Hudak's idea was to embed a language into another language as a library. Kiczales et al.'s idea [KRB91] was to provide meta protocols that open up a language implementation. The reflective embedding architecture contributes to both ideas by combining them into a uniform architecture for embedding and composing various language concepts. Language developers can use the uniform architecture to reuse existing languages and to embed new language concepts, such as functions, state machines, and aspects. A language is embedded as a component that remains encapsulated and that has a well-defined interface.
The reflective embedding architecture supports language evolution in various forms:

Extensibility: The architecture supports extensibility through language polymorphism, which enables late binding for syntax and semantics. When required, developers can open up (white-/gray-box) language components to tailor them for their domains. In doing so, a language implementation is opened up to reveal only the necessary implementation details.
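Language polymorphism in the sense described above can be sketched minimally. This is a hedged illustration under the assumption that an embedded language is represented by a facade object whose keyword methods can be overridden; the class and method names (`StateMachineLanguage`, `state`) are hypothetical, not REA's actual API.

```java
// Sketch of language polymorphism: the same DSL "program" is interpreted
// under a default semantics or a late-bound, refined semantics, depending on
// which language facade it is evaluated against. All names are illustrative.
class StateMachineLanguage {
    StringBuilder log = new StringBuilder();
    // The DSL keyword `state` with its default semantics: record the state.
    void state(String name) { log.append("state:").append(name).append(";"); }
}

/** A language extension: same syntax, refined (late-bound) semantics. */
class TracingStateMachineLanguage extends StateMachineLanguage {
    @Override
    void state(String name) {
        log.append("trace;");   // extended behavior...
        super.state(name);      // ...reusing the default semantics
    }
}

public class PolymorphismDemo {
    // The same DSL program, interpretable under either semantics.
    static void program(StateMachineLanguage lang) {
        lang.state("idle");
        lang.state("busy");
    }

    public static void main(String[] args) {
        StateMachineLanguage base = new StateMachineLanguage();
        program(base);                                   // default semantics
        StateMachineLanguage traced = new TracingStateMachineLanguage();
        program(traced);                                 // extended semantics
        System.out.println(base.log);
        System.out.println(traced.log);
    }
}
```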

Compositionality: It supports the composition of languages that have interactions in their syntax and semantics. The architecture provides means for language composition in the form of so-called language combiners, which are first-class objects for composing languages. When required, developers can extend the available set of combiners to support special compositions.
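The idea of a combiner as a first-class object can be sketched as follows. This is an assumption-laden illustration, not REA's implementation: the `Language`, `UnionCombiner`, and keyword-map representation are hypothetical, and the first-come-first-served conflict resolution is just one strategy a combiner object might encode.

```java
// Hedged sketch of a language combiner: a first-class object that composes
// two language facades, forwarding each keyword to the first constituent
// that defines it. The dispatch-by-name scheme is illustrative only.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class Language {
    final Map<String, Function<String, String>> keywords = new LinkedHashMap<>();
}

class UnionCombiner extends Language {
    UnionCombiner(Language... parts) {
        // First-come-first-served resolution; a different combiner could
        // implement another conflict-resolution strategy.
        for (Language p : parts)
            for (Map.Entry<String, Function<String, String>> e : p.keywords.entrySet())
                keywords.putIfAbsent(e.getKey(), e.getValue());
    }
}

public class CombinerDemo {
    public static void main(String[] args) {
        Language query = new Language();
        query.keywords.put("select", col -> "SELECT " + col);
        Language aspects = new Language();
        aspects.keywords.put("before", pc -> "BEFORE " + pc);

        // The combined language understands the keywords of both parts.
        Language combined = new UnionCombiner(query, aspects);
        System.out.println(combined.keywords.get("select").apply("name"));
        System.out.println(combined.keywords.get("before").apply("query()"));
    }
}
```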


Pluggable Scoping: There is support for scoping language constructs with various strategies on several levels. First, at the expression level, one can scope expression types to the lexical or dynamic context. Second, at the module level, one can dynamically scope aspects. Third, at the language level, one can scope language semantics to individual programs using language polymorphism.

Pluggable Analyses: The architecture can analyze languages and programs under different semantics. In particular, it supports abstract interpretations to analyze programs before they are executed. Additionally, it supports dynamic analyses. Modular abstract interpretations of different languages can be composed.
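The idea of running one embedded program under an analyzing semantics before executing it can be sketched briefly. The interface and names (`FsmSemantics`, `CountingSemantics`) are assumptions for illustration, not REA's actual API.

```java
// Sketch of a pluggable analysis: the same embedded program is first run
// under an abstract semantics that merely records declared states (an
// analysis performed *before* the real run), then under the concrete
// semantics. All names are hypothetical.
import java.util.ArrayList;
import java.util.List;

interface FsmSemantics {
    void state(String name);
}

/** Abstract interpretation: record structure instead of executing. */
class CountingSemantics implements FsmSemantics {
    final List<String> declared = new ArrayList<>();
    public void state(String name) { declared.add(name); }
}

/** Concrete semantics: actually "enter" the states. */
class ExecutingSemantics implements FsmSemantics {
    final StringBuilder trace = new StringBuilder();
    public void state(String name) { trace.append("enter ").append(name).append("\n"); }
}

public class AnalysisDemo {
    static void program(FsmSemantics s) { s.state("idle"); s.state("busy"); }

    public static void main(String[] args) {
        CountingSemantics analysis = new CountingSemantics();
        program(analysis);                        // analyze before executing
        if (!analysis.declared.isEmpty())         // trivial pre-execution check
            program(new ExecutingSemantics());
        System.out.println(analysis.declared.size() + " states declared");
    }
}
```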

Pluggable Transformations: The approach supports two kinds of transformations. First, it supports exo-transformations from one language into another, whereby the program in the source language is used as a specification for the generated program in the target language. Second, it supports endo-transformations within one causally connected runtime model of a language, even when this model is currently under execution. Such online transformations enable dynamic adaptive optimizations.

Further, the applications of this thesis's techniques contribute new concepts and generalize existing concepts in different areas of programming language research.

Reflective Languages: The reflective embedding architecture allows growing an embedded domain-specific language into a reflective DSL, which enables DSL programs to reason about their own structure and behavior, as proposed by Maes [Mae87] for general-purpose languages.

Meta Protocols: The approach applies the concept of meta-protocols and open implementations by Kiczales et al. in the context of domain-specific languages. In the reflective embedding architecture, every embedded language can provide a dedicated interface to its programs through which an end-user program can adapt the implementation strategy of the language for special needs. In the context of this thesis, meta-protocols have been validated for DSLs [DMM10] and for aspect-oriented programming languages [DMB09]. The concepts presented in Section 4.2.3.3 and in [DMB09] have been adopted by two other groups of researchers. First, Achenbach et al. [AO10] have implemented a meta-aspect protocol in another aspect-oriented programming language; they adopted the MAP for implementing an open aspect-oriented language with a meta-level for Ruby, whose architecture is inspired by POPART. Second, Axelsen et al. [AKMP10] have designed a meta-protocol for class extensions that allows type-safe extensions to classes and whose design is inspired by the design of POPART's meta-aspect protocol.
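The core idea of such a meta-aspect protocol can be sketched in a few lines. This is a hedged illustration, not POPART's concrete interface: the names (`MatchingStrategy`, `Aspect`, `meta`) are hypothetical. The point is that the strategy by which an aspect decides whether it matches a join point is itself reified as a replaceable meta-object, so end-user programs can adapt it.

```java
// Sketch of a meta-aspect protocol: an aspect's matching behavior is a
// first-class, replaceable meta-object. All names are hypothetical.
interface MatchingStrategy {
    boolean matches(String joinPoint, String pattern);
}

class Aspect {
    String pattern;
    MatchingStrategy meta = (jp, p) -> jp.equals(p);   // default: exact match

    Aspect(String pattern) { this.pattern = pattern; }

    boolean appliesTo(String joinPoint) { return meta.matches(joinPoint, pattern); }
}

public class MetaAspectDemo {
    public static void main(String[] args) {
        Aspect logging = new Aspect("Account.withdraw");
        System.out.println(logging.appliesTo("Account.withdraw")); // true
        System.out.println(logging.appliesTo("Account.deposit"));  // false

        // A program adapts the language implementation through the protocol:
        // it switches to prefix matching without touching the aspect itself.
        logging.meta = (jp, p) -> jp.startsWith("Account.");
        System.out.println(logging.appliesTo("Account.deposit"));  // true
    }
}
```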

Domain-Specific Aspect Languages: Because all concepts proposed in this thesis are embedded in a uniform way as first-class objects, developers can use embedded concepts in


concert with new embeddings. In particular, it is possible to integrate DSL constructs with aspects. This allows implementing domain-specific aspect languages in a reusable way, in the same vein as extensible aspect languages; however, the reflective embedding architecture allows extensions that are not possible with existing extensible aspect languages. In particular, it allows composing multiple domain-specific aspect languages that have interactions or composition conflicts, thereby enabling the definition of a reusable aspect conflict resolution strategy as a successor to the concepts that have been proposed by Kosar et al. [KL07b] and Havinga et al. [HBA08].

The concepts of this thesis have been qualitatively evaluated by using them to implement various language concepts. To quantitatively evaluate the concepts, several experiments have been conducted; they indicate that the effort to implement individual language components is comparable to the effort in related language approaches. This thesis proposes to use open source benchmarks to compare the qualities of DSL implementations. Preliminary benchmark measurements of the implementation with reflective embedding indicate a moderate overhead compared to implementations with traditional and related approaches. Special optimizations were studied particularly for aspect languages. Only with partial evaluation of pointcuts could the meta-aspect protocol achieve a moderate execution overhead for object-oriented programs. There is high confidence that the performance can be further improved by using additional optimization techniques from related work. It would be possible to exploit more concepts from traditional approaches. Alternatively, the reflective embedding architecture could be instantiated in a host language, such as CLOS, that optimizes indirections of meta-objects. The code of POPART, TIGERSEYE, and the DSL benchmark can be downloaded from: http://www.stg.tu-darmstadt.de/research/ao4dsls.

12.2. Concluding Discussion

This thesis concludes with a discussion of the dialectics in the philosophies of programming languages. The last decades have seen a continuous quest for the best language constructs that a language should provide for its users. Different paradigms try to give an answer to this question: pure functional languages advocate taming side effects, concurrent languages tame parallel threads of execution, object-oriented programming tries to model the world as a set of discrete interconnected objects, and event-based programming models the world's objects as more or less independent and decoupled actors.
Thus, there is broad disagreement about what the right language constructs and paradigms are. Naturally, there is no general answer to this question, as the answer depends at least on the concrete application and problem domain. Still, asking this question is an important driver for language research. Often, criticism of existing language constructs and their paradigms has led to an initiative for a new language. The language evolutions established by such initiatives can be understood as dialectic movements in the sense of Hegel [HNP30]. For example, aspects in aspect-oriented programming raise object-oriented modules to a new, more sophisticated level of OO

programming. First, such an initiative often accepts the existing language concepts and constructs, e.g., aspects accept modules as an important means for separation ("das Abstrakte und Verständige", i.e., abstract reasoning). Second, it criticizes those concepts and constructs for their contradictions, e.g., by encapsulating functional concerns, modules scatter and tangle crosscutting concerns such as non-functional concerns ("das Dialektische oder Negativ-Vernünftige", i.e., dialectic negative reasoning). Third, it speculates about new language constructs to improve the state of the art, e.g., aspects negate the strong encapsulation of modules in order to encapsulate the crosscutting concerns ("das Spekulative und Positiv-Vernünftige", i.e., speculative positive reasoning, or the negation of the negation). Such movements for new languages or new versions of languages often came along with the implementation of a new language from scratch. However, as criticized in the introduction of this thesis, implementation from scratch contradicts the economics of language development; therefore, there is a need for language implementation approaches that can cope with various language evolution scenarios. In recent years, the merger of object-oriented and functional languages in the Scala language [OSV07] has created tremendous interest among the rank and file of programmers from various paradigms. This merger alone has created a significant impact on the collaboration of programmers from various paradigms. Still, today most languages and their paradigms remain isolated from each other, and it is not possible to easily merge their syntax and semantics. The reason is that most existing language architectures do not have access to a meta-level and therefore cannot use one to reason about languages and manipulate them.
Only heterogeneous language implementation approaches have meta-levels, but they only support evolution at compile time and disallow runtime evolution. Consider what it would be like to have language architectures embedded into today's languages. In such architectures, language researchers and developers could contribute to a common asset of language constructs and paradigms, which could co-exist in the same environment. The language end user would thereby be empowered to choose the best construct for the need at hand without changing the host language. The reflective embedding architecture can be thought of as a first step in this direction. Realizing this vision would allow a new style of developing programming languages, which can be called dialectic programming. In dialectic programming, there is no commitment to a language construct with fixed, mechanical semantics or to a particular paradigm; instead, developers are in full control of the semantics of every language concept. What is new is that developers can analyze existing constructs (the abstract reasoning), they can criticize or break with existing constructs that do not fit their needs (the negation), and they can adapt constructs to speculate about new alternative semantics for these constructs to reach their goals (the negation of the negation).


12.3. Future Work

The following challenges are possible directions for future work.

DSL Implementation: Currently, it is unclear whether TIGERSEYE/REA scales to implementing large DSLs with thousands of productions. Therefore, it would be interesting to conduct a case study that implements a large DSL. Currently, the input size of a DSL program is limited by the maximum size of a Java class file (64 KB). To avoid this limitation, it would be interesting to allow splitting a DSL file into several DSL modules.

Static Guarantees: As usual for dynamic approaches, there is a trade-off between adaptability and statically checked correctness. REA supports maximal adaptability, but it therefore mostly lacks static guarantees. DSL programmers who override parts of the default DSL semantics might violate contracts and responsibilities that are implicit in the DSL design. In particular, future work will investigate using aspects for design-by-contract for languages. The principal idea is to deploy an aspect on the language façade to enforce that extending languages do not violate any contracts. Future work will also explore explicit contracts that can be checked at runtime to ensure the semantic consistency of adaptations.

Benchmarking: It would be desirable to exploit standard optimizations for Groovy, such as those discussed in [KG08], as soon as they become available for newer versions of Groovy. JIT optimization could further reduce the execution overhead, but the Groovy JIT is only available for an older version1 of Groovy that is incompatible with the Groovy version used in this thesis. It would be interesting to use a DSL for which there is a standard benchmark and to compare with other implementation technologies. Currently, the input size of the open source benchmark is too small to allow general conclusions. It would also be interesting to benchmark different kinds of DSLs, with different grammar sizes, semantic complexity, and possibilities for optimization.
Aspect Languages: With respect to possible applications of the meta-aspect protocol, future work needs to explore the ability of the MAP to support further AO language features. With respect to the performance of the meta-aspect protocol, new concepts for meta-aspect protocols still need to be invented to minimize their overhead. This thesis has implemented certain optimistic optimizations similar to those proposed for MOPs [KRB91, Sul01].

Performance: It has been shown that partial evaluation of pointcut designators and structural designators helps to reduce the overhead of the MAP. When no aspects are present, the performance is at a moderate level, slowing down domain-specific and object-oriented applications by a factor of less than three; however, there is a large overhead when aspects and meta-aspects are present.

1In [KG08], the Groovy JIT eliminates the overhead of boxing and unboxing, and special optimizations have been implemented for aspects. However, the JIT is only available for the Groovy version 1.6-beta2 snapshot, while the reflective embedding architecture uses specific features of the Groovy MOP in version 1.7.x.


Currently, the meta-aspect protocol does not perform enough optimizations to eliminate the complete overhead when composing aspects into programs. Although the POPART code is subject to adaptive optimization by the just-in-time compiler of the Java VM, thanks to the tight integration of Groovy into Java, a large run-time overhead has been measured for the Groovy-specific features, especially for the Groovy MOP. Although the above optimizations are interesting from a research perspective and for integrating with the current Java bytecode platform, they would introduce additional complexity that could interfere with existing infrastructure, such as garbage collectors, in an unpredictable way. Further, when there are hard performance requirements, it would be more interesting to reimplement the complete meta-aspect protocol on another host language that systematically supports removing indirections at compile time, such as CLOS.

Special Extensions: It would also be interesting to investigate special optimization techniques for aspects. It would be interesting to further investigate whether the compile-time optimizations using partial reflection [Tan04] could be used for this. It could also be interesting to investigate on-demand weaving techniques [Hir03] to transform the base system at runtime in such a way that meta-aspects can still be involved: analyzing the aspects at runtime in order to identify which indirections are actually needed, and removing indirections of the meta-aspect protocol if they are currently not needed.

Bibliography

[AAB+07] Assaf Arkin, Sid Askary, Ben Bloch, et al. Web Services Business Process Execution Language 2.0, OASIS Standard. Technical report, Organization for the Advancement of Structured Information Standards (OASIS), April 2007.

[ACH+06] P. Avgustinov, A.S. Christensen, L. Hendren, S. Kuzins, J. Lhotak, O. Lhotak, O. de Moor, D. Sereni, G. Sittampalam, and J. Tibble. ABC: An Extensible AspectJ Compiler. Transactions on Aspect-Oriented Software Development, 3880:293, 2006.

[ADDH10] Toheed Aslam, Jesse Doherty, Anton Dubrau, and Laurie Hendren. AspectMat- lab: an Aspect-Oriented Scientific Programming Language. In Proceedings of AOSD’10 [JS10], 2010.

[AET08] Pavel Avgustinov, Torbjörn Ekman, and Julian Tibble. Modularity First: A Case for Mixing AOP and Attribute Grammars. In Proceedings of AOSD’08 [D’H08], pages 25–35, 2008.

[AFG+00] M. Arnold, S. Fink, D. Grove, M. Hind, and P.F. Sweeney. Adaptive Optimiza- tion in the Jalapeño JVM. In Proceedings of OOPSLA’00 [OOP00], 2000.

[AGH05] K. Arnold, J. Gosling, and D. Holmes. The Java (TM) Programming Language. Addison-Wesley Professional, 2005.

[AHH+09] Malte Appeltauer, Robert Hirschfeld, Michael Haupt, Jens Lincke, and Michael Perscheid. A Comparison of Context-Oriented Programming Languages. In International Workshop on Context-Oriented Programming. In conjunction with the 23rd European Conference on Object-Oriented Programming (ECOOP’09), COP ’09, pages 6:1–6:6, New York, NY, USA, 2009. ACM.

[AKMP10] Eyvind W. Axelsen, Stein Krogdahl, and Birger Møller-Pedersen. Controlling Dynamic Module Composition through an Extensible Meta-Level API. In Pro- ceedings of the 6th Dynamic Languages Symposium (DLS’10), pages 81–96, New York, NY, USA, 2010. ACM.

[Aks03] Mehmet Aksit, editor. Proceedings of the 2nd ACM International Conference on Aspect-Oriented Software Development (AOSD’03), Boston, Massachusetts, USA, 2003. ACM Press, New York, NY, USA.


[Alm] D. Almaer. SQL AOP becomes SQXML AOP. http://almaer.com/blog/sql-aop-becomes-sqxml-aop.

[ALSU07] A. Aho, M. Lam, R. Sethi, and J. Ullman. Compilers: Principles, Techniques and Tools (2nd Edition). Addison-Wesley, 2007.

[ALY09] Robert Atkey, Sam Lindley, and Jeremy Yallop. Unembedding Domain-Specific Languages. In Proceedings of the 2nd ACM SIGPLAN Symposium on Haskell (Haskell'09), pages 37–48, New York, NY, USA, 2009. ACM.

[AMH90] Mehmet Akşit, Rene Mostert, and Boudewijn Haverkort. Compiler Generation based on Grammar Inheritance, 1990.

[AN08] Ali Assaf and Jacques Noyé. Dynamic AspectJ. In Proceedings of the 2008 Dynamic Languages Symposium, DLS'08, pages 8:1–8:12, New York, NY, USA, 2008. ACM.

[AO10] Michael Achenbach and Klaus Ostermann. A Meta-Aspect Protocol for Developing Dynamic Analyses. In Howard Barringer, Ylies Falcone, Bernd Finkbeiner, Klaus Havelund, Insup Lee, Gordon Pace, Grigore Rosu, Oleg Sokolsky, and Nikolai Tillmann, editors, Runtime Verification, volume 6418 of Lecture Notes in Computer Science, pages 153–167. Springer Berlin / Heidelberg, 2010.

[AS96] H. Abelson and G.J. Sussman. Structure and Interpretation of Computer Programs. MIT Press, Cambridge, Mass., 1996.

[Asp] AspectJ Home Page. http://www.eclipse.org/aspectj/.

[Aßm03] U. Aßmann. Invasive Software Composition. Springer-Verlag, New York, 2003.

[Atk09] Robert Atkey. Syntax for Free: Representing Syntax with Binding Using Parametricity. In Pierre-Louis Curien, editor, Typed Lambda Calculi and Applications, volume 5608 of Lecture Notes in Computer Science, pages 35–49. Springer Berlin / Heidelberg, 2009.

[BA01] L. Bergmans and M. Aksit. Composing Crosscutting Concerns using Composition Filters. Communications of the ACM, 44(10):51–57, 2001.

[BA04] L. Bergmans and M. Aksit. Principles and Design Rationale of Composition Filters. Aspect-Oriented Software Development, pages 63–95, 2004.

[BC90] Gilad Bracha and William Cook. Mixin-based inheritance.
In Proceedings of the European Conference on Object-Oriented Programming and Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA/ECOOP’90), pages 303–311, New York, NY, USA, 1990. ACM.


[BdGV06] Martin Bravenboer, Rene de Groot, and Eelco Visser. MetaBorg in Action: Examples of Domain-Specific Language Embedding and Assimilation Using Stratego/XT. In Ralf Lämmel, João Saraiva, and Joost Visser, editors, Generative and Transformational Techniques in Software Engineering, volume 4143 of Lecture Notes in Computer Science, pages 297–311. Springer Berlin / Heidelberg, 2006.

[BHMO04] C. Bockisch, M. Haupt, M. Mezini, and K. Ostermann. Virtual Machine Support for Dynamic Join Points. In Proceedings of AOSD'04 [Lie04], pages 83–92, 2004.

[BI82] A.H. Borning and D.H.H. Ingalls. Multiple Inheritance in Smalltalk-80. In Proceedings of the National Conference on Artificial Intelligence, pages 234–237, 1982.

[BKHV03] O.S. Bagge, K.T. Kalleberg, M. Haveraaen, and E. Visser. Design of the CodeBoost Transformation System for Domain-Specific Optimisation of C++ Programs. In Third International Workshop on Source Code Analysis and Manipulation (SCAM 2003), pages 65–75, 2003.

[BMDV02] J. Brichau, K. Mens, and K. De Volder. Building Composable Aspect-Specific Languages with Logic Metaprogramming. In GPCE, pages 110–127, 2002.

[BMH+07] Christoph Bockisch, Mira Mezini, Wilke Havinga, Lodewijk Bergmans, and Kris Gybels. Reference Model Implementation. Technical Report AOSD-Europe-TUD-8, Technische Universität Darmstadt, 2007.

[BMN+06] Johan Brichau, Mira Mezini, Jacques Noyé, Wilke Havinga, Lodewijk Bergmans, Vaidas Gasiunas, Christoph Bockisch, Johan Fabry, and Theo D'Hondt. An Initial Metamodel for Aspect-Oriented Programming Languages. http://www.aosd-europe.net/deliverables/d39.pdf, 2006.

[Bon03] J. Bonér. AspectWerkz - Dynamic AOP for Java. http://codehaus.org/~jboner/papers/aosd2004_aspectwerkz.pdf, 2003.

[Bos00] J. Bosch. Design and Use of Software Architectures: Adopting and Evolving a Product-Line Approach. Addison-Wesley Professional, 2000.

[BP01] Jonathan Bachrach and Keith Playford. The Java Syntactic Extender (JSE). SIGPLAN Not., 36(11):31–42, 2001.
[BP08] J. Bovet and T. Parr. ANTLRWorks: an ANTLR Grammar Development Environment. Software: Practice and Experience, 38(12):1305–1332, 2008.

[BU04] Gilad Bracha and David Ungar. Mirrors: Design Principles for Meta-Level Facilities of Object-Oriented Programming Languages. SIGPLAN Not., 39(10):331–344, 2004.


[BV04] Martin Bravenboer and Eelco Visser. Concrete Syntax for Objects: Domain- Specific Language Embedding and Assimilation without Restrictions. In Pro- ceedings of OOPSLA’04 [VS04], pages 365–383, 2004.

[CBF05] N. Cacho, T. Batista, and F. Fernandes. AspectLua-A Dynamic AOP Approach. Journal of Universal Computer Society, 11(7):1177–1197, 2005.

[CC77] Patrick Cousot and Radhia Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL), pages 238–252, New York, NY, USA, 1977. ACM.

[CCR10] William R. Cook, Siobhán Clarke, and Martin C. Rinard, editors. Proceed- ings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Pro- gramming, Systems, Languages, and Applications (OOPSLA’10), Reno/Tahoe, Nevada, USA, 2010. ACM Press, New York, NY, USA.

[CD08] Pascal Costanza and Theo D’Hondt. Feature Descriptions for Context-oriented Programming. In 2nd International Workshop on Dynamic Software Product Lines (DSPL’08), 2008.

[CDF+06] Philippe Charles, Julian Dolby, Robert M. Fuhrer, Stanley M. Sutton, Jr., and Mandana Vaziri. SAFARI: A Meta-Tooling Framework for Generating Language-Specific IDE’s. In Proceedings of OOPSLA’06 [TC06], pages 722– 723, 2006.

[CDM09] A. Charfi, T. Dinkelaker, and M. Mezini. A Plug-in Architecture for Self- Adaptive Web Service Compositions. In Proceedings of the 2009 IEEE In- ternational Conference on Web Services, pages 35–42. IEEE Computer Society, 2009.

[CE00] K. Czarnecki and U.W. Eisenecker. Generative Programming: Methods, Tools, and Applications. ACM Press, New York, NY, USA, 2000.

[CFSJ07] P. Charles, R.M. Fuhrer, and S.M. Sutton Jr. IMP: A Meta-Tooling Plat- form for Creating Language-Specific IDEs in Eclipse. In Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE’07), pages 485–488, New York, NY, USA, 2007. ACM.

[CH05] Pascal Costanza and Robert Hirschfeld. Language Constructs for Context- Oriented Programming: An Overview of ContextL. In Proceedings of the Dy- namic languages Symposium (DLS’05), pages 1–10, New York, NY, USA, 2005. ACM.


[Cha08] Anis Charfi. Aspect-Oriented Workflow Management: Concepts, Languages, Applications. PhD thesis, Technische Universität Darmstadt, Germany, 2008.

[CHBB09] Steve Counsell, Tracy Hall, David Bowes, and Sue Black. Program Slice Metrics and Their Potential Role in DSL Design. In Proceedings of Workshop on Knowledge Industry Survival Strategy Initiative (KISS'09), 2009.

[CHHP91] James R. Cordy, Charles D. Halpern-Hamu, and Eric Promislow. TXL: A Rapid Prototyping System for Programming Language Dialects. Computer Languages, 16(1):97–107, 1991.

[CHR+03] C. Consel, H. Hamdi, L. Réveillère, L. Singaravelu, H. Yu, and C. Pu. Spidle: a DSL Approach to Specifying Streaming Applications. Generative Programming and Component Engineering, 2830:1–17, 2003.

[Chu40] A. Church. A Formulation of the Simple Theory of Types. J. of Symbolic Logic, 5(2):56–68, 1940.

[CJ03] Ron Crocker and Guy L. Steele Jr., editors. Proceedings of the 18th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'03), Anaheim, CA, USA, 2003. ACM Press, New York, NY, USA.

[CKL96] Shigeru Chiba, Gregor Kiczales, and John Lamping. Avoiding Confusion in Metacircularity: The Meta-Helix. In Kokichi Futatsugi and Satoshi Matsuoka, editors, Object Technologies for Advanced Software, volume 1049 of Lecture Notes in Computer Science, pages 157–172. Springer Berlin / Heidelberg, 1996.

[CKS09] J. Carette, O. Kiselyov, and C. Shan. Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages. Journal of Functional Programming, 19(05):509–543, 2009.

[Cle07] T. Cleenewerck. Modularizing Language Constructs: A Reflective Approach. PhD thesis, Vrije Universiteit Brussel, Belgium, 2007.

[CM94] S. Chiba and T. Masuda. A Reflective Language OpenC++ and its Application to Distributed Computing. Computer Software, 11(3):33–48, 1994.

[CM04] Anis Charfi and Mira Mezini. Aspect-Oriented Web Service Composition with AO4BPEL.
In European Conference on Web Services, 2004.

[CM06] A. Charfi and M. Mezini. Aspect-Oriented Workflow Languages. Lecture Notes in Computer Science, 4275:183, 2006.

[CM07] Jesús Sánchez Cuadrado and Jesús García Molina. Building Domain-Specific Languages for Model-Driven Development. IEEE Softw., 24(5):48–55, 2007.


[CNF+08] Thomas Cleenewerck, Jacques Noyé, Johan Fabry, Anne-Françoise Lemeur, and Éric Tanter. Summary of the Third Workshop on Domain-Specific Aspect Lan- guages. In Proceedings of the 2008 Workshop on Domain-Specific Aspect Lan- guages (DSAL’08), pages 1–5, 2008.

[CNW01] F. Curbera, W. Nagy, and S. Weerawarana. Web Services: Why and How. In Workshop on Object-Oriented Web Services, 2001.

[Coi87] Pierre Cointe. Metaclasses are First Class: The ObjVlisp Model. SIGPLAN Not., 22(12):156–162, 1987.

[Cor06] James R. Cordy. The TXL Source Transformation Language. Science of Com- puter Programming, 61(3):190 – 210, 2006. Special Issue on The Fourth Work- shop on Language Descriptions, Tools, and Applications (LDTA ’04).

[COST04] Krzysztof Czarnecki, John T. O’Donnell, Jörg Striegnitz, and Walid Taha. DSL Implementation in MetaOCaml, Template Haskell, and C++. In Christian Lengauer, Don Batory, Charles Consel, and Martin Odersky, editors, Domain- Specific Program Generation, volume 3016 of Lecture Notes in Computer Sci- ence, pages 51–72. Springer Berlin / Heidelberg, 2004.

[Cou96] Patrick Cousot. Abstract Interpretation. ACM Comput. Surv., 28(2):324–328, 1996.

[CR06] Lori A. Clarke and David S. Rosenblum. A Historical Perspective on Runtime Assertion Checking in Software Development. SIGSOFT Softw. Eng. Notes, 31(3):25–37, 2006.

[Dav03] James Davis. GME: the Generic Modeling Environment. In Companion of the 18th annual ACM SIGPLAN Conference on Object-Oriented Programming, Sys- tems, Languages, and Applications (OOPSLA’03 [CJ03]), pages 82–83, New York, NY, USA, 2003. ACM.

[dB00] Hans de Bruin. A Grey-Box Approach to Component Composition. In Krzysztof Czarnecki and Ulrich Eisenecker, editors, Generative and Component-Based Software Engineering, volume 1799 of Lecture Notes in Computer Science, pages 195–209. Springer Berlin / Heidelberg, 2000.

[DD89] C.J. Date and H. Darwen. A Guide to the SQL Standard. Addison-Wesley New York, 1989.

[DDLM07] Marcus Denker, Stéphane Ducasse, Andrian Lienhard, and Philippe Marschall. Sub-Method Reflection. Special Issue TOOLS Europe 2007, 6(9), 2007.


[DDMW00] M. D’Hondt, W. De Meuter, and R. Wuyts. Using Reflective Logic Program- ming to describe Domain Knowledge as an Aspect. Generative and Component- Based Software Engineering, 1799:16–23, 2000.

[DEM10] Tom Dinkelaker, Michael Eichberg, and Mira Mezini. An Architecture for Composing Embedded Domain-Specific Languages. In Proceedings of AOSD’10 [JS10], pages 49–60, 2010.

[DFS02] Rmi Douence, Pascal Fradet, and Mario Sdholt. A Framework for the Detection and Resolution of Aspect Interactions. In Don Batory, Charles Consel, and Walid Taha, editors, Generative Programming and Component Engineering, volume 2487 of Lecture Notes in Computer Science, pages 173–188. Springer Berlin / Heidelberg, 2002.

[D’H08] Theo D’Hondt, editor. Proceedings of the 7th ACM International Conference on Aspect-Oriented Software Development (AOSD’08), Brussels, Belgium, 2008. ACM Press, New York, NY, USA.

[Dic92] K. Dickey. Scheming with Objects. AI Expert, 7(10):24–33, 1992.

[Din10] Tom Dinkelaker. Review of the Support for Modular Language Implementation with Embedding Approaches. Technical Report TUD-CS-2010-2396, Department of Computer Science, Technische Universitaet Darmstadt, Germany, November 2010.

[DJKN07] Tom Dinkelaker, Alisdair Johnstone, Yuecel Karabulut, and Ike Nassi. Secure Scripting Based Composite Application Development: Framework, Architecture, and Implementation. In Proceedings of the International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2007), pages 85–94. IEEE, 2007.

[DMB09] Tom Dinkelaker, Mira Mezini, and Christoph Bockisch. The Art of the Meta-Aspect Protocol. In Proceedings of AOSD’08 [D’H08], pages 51–62, New York, NY, USA, 2009. ACM.

[DMM09] T. Dinkelaker, M. Monperrus, and M. Mezini. Untangling Cross-Cutting Concerns in Domain-Specific Languages with Domain-Specific Join Points. In Proceedings of the 4th Workshop on Domain-Specific Aspect Languages (DSAL), pages 1–6, New York, NY, USA, 2009. ACM.

[DMM10] Tom Dinkelaker, Martin Monperrus, and Mira Mezini. Supporting Variability with Late Semantic Adaptations of Domain-Specific Modeling Languages. CEUR-WS.org, 564, 2010.


[DN09] N.A. Danielsson and U. Norell. Parsing Mixfix Operators. In Proceedings of the 20th International Symposium on the Implementation and Application of Functional Languages (IFL’08). Springer Berlin / Heidelberg, 2009.

[Dub06] G. Dubochet. On Embedding Domain-Specific Languages with User-friendly Syntax. In ECOOP Workshop on Domain-Specific Program Development, 2006.

[DWL09] Tom Dinkelaker, Christian Wende, and Henrik Lochmann. Implementing and Composing MDSD-Typical DSLs. Technical Report TUD-CS-2009-0156, Department of Computer Science, Technische Universitaet Darmstadt, Germany, October 2009.

[Ear70] J. Earley. An Efficient Context-Free Parsing Algorithm. Communications of the ACM, 13(2):94–102, 1970.

[EFdM00] Conal Elliott, Sigbjørn Finne, and Oege de Moor. Compiling Embedded Languages. In Walid Taha, editor, Semantics, Applications, and Implementation of Program Generation, volume 1924 of Lecture Notes in Computer Science, pages 9–26. Springer Berlin / Heidelberg, 2000.

[EFDM03] C. Elliott, S. Finne, and O. De Moor. Compiling Embedded Languages. Journal of Functional Programming, 13(3):455–481, 2003.

[EH07a] Torbjörn Ekman and Görel Hedin. The JastAdd Extensible Java Compiler. In Proceedings of OOPSLA’07 [GBLJ07], pages 1–18, 2007.

[EH07b] Torbjörn Ekman and Görel Hedin. The JastAdd System – Modular Extensible Compiler Construction. Science of Computer Programming, 69(1-3):14–26, 2007. Special Issue on Experimental Software and Toolkits.

[ELW98] R. Eckstein, M. Loy, and D. Wood. Java Swing. O’Reilly & Associates, Inc., Sebastopol, CA, USA, 1998.

[Eva03] Eric Evans. Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2003.

[EVI05] Jacky Estublier, German Vega, and Anca Daniela Ionita. Composing Domain-Specific Languages for Wide-scope Software Engineering Applications. In Proceedings of MODELS/UML, 2005.

[FC09] Jose Falcon and William Cook. GEL: A Generic Extensible Language. In Walid Taha, editor, Domain-Specific Languages, volume 5658 of Lecture Notes in Computer Science, pages 58–77. Springer Berlin / Heidelberg, 2009.

[FD06] J. Fabry and T. D’Hondt. KALA: Kernel Aspect Language for Advanced Transactions. In Symposium on Applied Computing, 2006.


[Fel90] M. Felleisen. On the Expressive Power of Programming Languages. 3rd European Symposium on Programming (ESOP’90), 432:134–151, 1990.

[Fer89] J. Ferber. Computational Reflection in Class Based Object-Oriented Languages. In Proceedings of OOPSLA’89 [Mey89], pages 317–326, 1989.

[FF00] R.E. Filman and D.P. Friedman. Aspect-Oriented Programming is Quantification and Obliviousness. In Workshop on Advanced Separation of Concerns, 2000.

[FF04] Ira R. Forman and Nate Forman. Java Reflection in Action (In Action series). Manning Publications Co., Greenwich, CT, USA, 2004.

[Fla02] David Flanagan. JavaScript: The Definitive Guide, 4th Edition. O’Reilly & Associates, Inc., Sebastopol, CA, USA, 2002.

[For04] B. Ford. Parsing Expression Grammars: A Recognition-Based Syntactic Foundation. ACM SIGPLAN Notices, 39(1):111–122, 2004.

[Fow05] M. Fowler. Language Workbenches: The Killer-App for Domain Specific Languages. http://www.martinfowler.com/articles/languageWorkbench.html, 2005.

[FP06] Steve Freeman and Nat Pryce. Evolving an Embedded Domain-Specific Language in Java. In Companion to the 21st ACM SIGPLAN Symposium on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’06 [TC06]), pages 855–865, New York, NY, USA, 2006. ACM.

[FPB+70] W. Feurzeig, S. Papert, M. Bloom, R. Grant, and C. Solomon. Programming-Languages as a Conceptual Framework for Teaching Mathematics. SIGCUE Outlook, 4(2):13–17, 1970.

[FS99] P. Fradet and M. Südholt. An Aspect Language for Robust Programming. LNCS, 1743:291–292, 1999.

[FW84] Daniel P. Friedman and Mitchell Wand. Reification: Reflection without Metaphysics. In Proceedings of the 1984 ACM Symposium on LISP and Functional Programming (LFP’84), pages 348–355, New York, NY, USA, 1984. ACM.

[Gar08] Miguel Garcia. Automating the Embedding of Domain Specific Languages in Eclipse JDT. http://www.eclipse.org/articles/Article-AutomatingDSLEmbeddings/, July 2008.

[GBLJ07] Richard P. Gabriel, David F. Bacon, Cristina Videira Lopes, and Guy L. Steele Jr., editors. Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’07), Montreal, Quebec, Canada, 2007. ACM Press, New York, NY, USA.


[GBNT01] J. Gray, T. Bapty, S. Neema, and J. Tuck. Handling Crosscutting Constraints in Domain-Specific Modeling. Communications of the ACM, 44(10):87–93, 2001.

[GHJV95] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, 1995.

[GJSB00] J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java Language Specification (Second Edition). Addison-Wesley Longman Publishing Co., Inc., 2000.

[Glu91] Robert Glueck. Towards Multiple Self-Application. In Proceedings of the ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation (PEPM ’91), pages 309–320, New York, NY, USA, 1991. ACM.

[Gro] The Groovy Home Page. http://groovy.codehaus.org/.

[GV07] Iris Groher and Markus Voelter. XWeave: Models and Aspects in Concert. In Proceedings of the 10th International Workshop on Aspect-Oriented Modeling (AOM ’07), pages 35–40, New York, NY, USA, 2007. ACM.

[GWDD06] Kris Gybels, Roel Wuyts, Stéphane Ducasse, and Maja D’Hondt. Inter-Language Reflection: A Conceptual Model and its Implementation. Computer Languages, Systems & Structures, 32(2-3):109–124, 2006.

[HB04] Rachid Hamadi and Boualem Benatallah. Recovery Nets: Towards Self-Adaptive Workflow Systems. In Xiaofang Zhou, Stanley Su, Mike P. Papazoglou, Maria E. Orlowska, and Keith G. Jeffery, editors, Web Information Systems (WISE’04), volume 3306 of Lecture Notes in Computer Science, pages 439–453. Springer Berlin / Heidelberg, 2004.

[HBA08] W. Havinga, L. Bergmans, and M. Aksit. Prototyping and Composing Aspect Languages using an Aspect Interpreter Framework. In ECOOP, pages 180–206, 2008.

[HBA10] Wilke Havinga, Lodewijk Bergmans, and Mehmet Aksit. A Model for Composable Composition Operators: Expressing Object and Aspect Compositions with First-Class Operators. In Proceedings of AOSD’10 [JS10], pages 145–156, 2010.

[HCH08] Robert Hirschfeld, Pascal Costanza, and Michael Haupt. An Introduction to Context-Oriented Programming with ContextS. In Ralf Lämmel, Joost Visser, and João Saraiva, editors, Generative and Transformational Techniques in Software Engineering II, volume 5235 of Lecture Notes in Computer Science, pages 396–407. Springer Berlin / Heidelberg, 2008.


[HH04] Erik Hilsdale and Jim Hugunin. Advice Weaving in AspectJ. In Proceedings of AOSD’04 [Lie04], pages 26–35, New York, NY, USA, 2004. ACM.

[HHJ+08] J. Henriksson, F. Heidenreich, J. Johannes, S. Zschaler, and U. Aßmann. Extending Grammars and Metamodels for Reuse: the Reuseware Approach. IET Software, 2(3):165–184, 2008.

[HHJZ09] Florian Heidenreich, Jakob Henriksson, Jendrik Johannes, and Steffen Zschaler. On Language-Independent Model Modularisation. LNCS, 5560:39–82, 2009.

[HHKR89] J. Heering, P. R. H. Hendriks, P. Klint, and J. Rekers. The Syntax Definition Formalism SDF—Reference Manual. SIGPLAN Not., 24(11):43–75, 1989.

[Hir01] R. Hirschfeld. AspectS: AOP with Squeak. In Workshop on Advanced Separation of Concerns in OO Systems, 2001.

[Hir03] R. Hirschfeld. AspectS: Aspect-Oriented Programming with Squeak. In Net.ObjectDays (NODe), pages 216–232, 2003.

[Hir09] Robert Hirschfeld. AspectS - Aspect-Oriented Programming with Squeak. In Mehmet Aksit, Mira Mezini, and Rainer Unland, editors, Objects, Components, Architectures, Services, and Applications for a Networked World, volume 2591 of Lecture Notes in Computer Science, pages 216–232. Springer Berlin / Heidelberg, 2009.

[HJZ07] F. Heidenreich, J. Johannes, and S. Zschaler. Aspect Orientation for Your Language of Choice. In Workshop on Aspect-Oriented Modeling (at MoDELS), 2007.

[HM03] G. Hedin and E. Magnusson. JastAdd - An Aspect-Oriented Compiler Construction System. Science of Computer Programming, 47(1):37–58, 2003.

[HM04] M. Haupt and M. Mezini. Micro-Measurements for Dynamic Aspect-Oriented Systems. In M. Weske and P. Liggesmeyer, editors, Net.ObjectDays, volume 3263 of LNCS, 2004.

[HNP30] G.W.F. Hegel, F. Nicolin, and O. Pöggeler. Enzyklopädie der philosophischen Wissenschaften im Grundrisse. Meiner Verlag (1991), 1830.

[HO07] C. Hofer and K. Ostermann. On the Relation of Aspects and Monads. In Workshop on Foundations of Aspect-Oriented Languages, pages 27–33, 2007.

[HO10] Christian Hofer and Klaus Ostermann. Modular Domain-Specific Language Components in Scala. In Proceedings of the International Conference on Generative Programming and Component Engineering (GPCE). ACM, 2010.


[Hoa78] C.A.R. Hoare. Communicating Sequential Processes. Communications of the ACM, 21(8):666–677, 1978.

[Hor01] P. Horn. Autonomic Computing: IBM’s Perspective on the State of Information Technology. IBM TJ Watson Labs, NY, 15th October, 2001.

[HORM08] C. Hofer, K. Ostermann, T. Rendel, and A. Moors. Polymorphic Embedding of DSLs. In Generative Programming and Component Engineering (GPCE’08), pages 137–148. ACM New York, NY, USA, 2008.

[HPSA10] Robert Hirschfeld, Michael Perscheid, Christian Schubert, and Malte Appeltauer. Dynamic Contract Layers. In Proceedings of the 2010 ACM Symposium on Applied Computing (SAC’10), pages 2169–2175, New York, NY, USA, 2010. ACM.

[Hud96] P. Hudak. Building Domain-Specific Embedded Languages. ACM Computing Surveys, 28(4es):196–196, 1996.

[Hud98] Paul Hudak. Modular Domain Specific Languages and Tools. In P. Devanbu and J. Poulin, editors, Proceedings: Fifth International Conference on Software Reuse, pages 134–142. IEEE Computer Society Press, 1998.

[IJF92] John Wiseman Simmons II, Stanley Jefferson, and Daniel P. Friedman. Language Extension via First-class Interpreters. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.1396, 1992.

[Jav02] The JavaLogo Homepage. http://sourceforge.net/projects/javalogo/, April 2002.

[JBC02] Eric Jendrock, Jennifer Ball, and Debbie Carson. The J2EE Tutorial. Addison-Wesley Longman, Amsterdam, 2002.

[JM09] Ranjit Jhala and Rupak Majumdar. Software Model Checking. ACM Comput. Surv., 41(4):1–54, 2009.

[JS10] Jean-Marc Jézéquel and Mario Südholt, editors. Proceedings of the 9th ACM International Conference on Aspect-Oriented Software Development (AOSD’10). ACM Press, New York, NY, USA, 2010.

[Kam98] S.N. Kamin. Research on Domain-Specific Embedded Languages and Program Generators. Electronic Notes in Theoretical Computer Science, 14:149–168, 1998.

[KAR+93] G. Kiczales, J. M. Ashley, L. Rodriguez, A. Vahdat, and D.G. Bobrow. Metaobject Protocols: Why we want them and what else they can do. Object-Oriented Programming: The CLOS Perspective, pages 101–118, 1993.


[KG07] D. König and A. Glover. Groovy in Action. Manning, 2007.

[KG08] C. Kaewkasi and J.R. Gurd. Groovy AOP: A Dynamic AOP System for a JVM-based Language. In Workshop on Software Engineering Properties of Languages and Aspect Technologies, 2008.

[KHH+01] G. Kiczales, E. Hilsdale, J. Hugunin, M. Kersten, J. Palm, and W. G. Griswold. An Overview of AspectJ. In ECOOP, volume 2072 of LNCS, pages 327–353, 2001.

[Kic96] G. Kiczales. Beyond the Black Box: Open Implementation. IEEE Software, 13(1):8–11, 1996.

[KKV09] Lennart C. L. Kats, Karl Trygve Kalleberg, and Eelco Visser. Domain-Specific Languages for Composable Editor Plugins. In T. Ekman and J. Vinju, editors, Proceedings of the 9th Workshop on Language Descriptions, Tools, and Applications (LDTA’09), Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, March 2009.

[KL04] S. Kojarski and D.H. Lorenz. AOP as a First Class Reflective Mechanism. In Proceedings of OOPSLA’04 [VS04], pages 216–217, 2004.

[KL05] S. Kojarski and D.H. Lorenz. Pluggable AOP: Designing Aspect Mechanisms for Third-Party Composition. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 247–263. ACM, 2005.

[KL07a] S. Kojarski and D.H. Lorenz. Awesome: An Aspect Co-Weaving System for Composing Multiple Aspect-Oriented Extensions. In Proceedings of OOPSLA’07 [GBLJ07], pages 515–534, 2007.

[KL07b] Sergei Kojarski and David H. Lorenz. AweSome: an Aspect Co-Weaving System for Composing Multiple Aspect-Oriented Extensions. In Proceedings of OOPSLA’07 [GBLJ07], pages 515–534, 2007.

[KLL+97] Gregor Kiczales, John Lamping, Christina Videira Lopes, Chris Maeda, Anurag Mendhekar, and Gail Murphy. Open Implementation Design Guidelines. In Proceedings of the 19th International Conference on Software Engineering, ICSE’97, pages 481–490, New York, NY, USA, 1997. ACM.

[KLM+97] G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C. Lopes, J.-M. Loingtier, and J. Irwin. Aspect-Oriented Programming. In ECOOP, pages 220–242, 1997.

[KLP+08] T. Kosar, P.E. Martínez López, P.A. Barrientos, and M. Mernik. A Preliminary Study on Various Implementation Approaches of Domain-Specific Language. Information and Software Technology, 50(5):390–405, 2008.


[KM06] Roman Knöll and Mira Mezini. Pegasus: First Steps toward a Naturalistic Programming Language. In Companion to the 21st ACM SIGPLAN Symposium on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’06 [TC06]), pages 542–559, New York, NY, USA, 2006. ACM.

[KM09] R. Knöll and M. Mezini. π: a Pattern Language. ACM SIGPLAN Notices, 44(10):503–522, 2009.

[Kni07] G. Kniesel. Detection and Resolution of Weaving Interactions. TAOSD: Dependencies and Interactions with Aspects, LNCS, 2007. Special Issue on Aspect Dependencies and Interactions, edited by R. Chitchyan.

[Knu68] D.E. Knuth. Semantics of Context-Free Languages. Theory of Computing Systems, 2(2):127–145, 1968.

[KRB91] Gregor Kiczales, Jim des Rivières, and Daniel G. Bobrow. The Art of the Metaobject Protocol. MIT Press, Cambridge, MA, 1991.

[KRV08] Holger Krahn, Bernhard Rumpe, and Steven Völkel. MontiCore: Modular Development of Textual Domain Specific Languages. Objects, Components, Models and Patterns, 11:297–315, 2008.

[KV10] Lennart C. L. Kats and Eelco Visser. The Spoofax Language Workbench: Rules for Declarative Specification of Languages and IDEs. In Proceedings of OOPSLA’10 [CCR10], 2010.

[KW91] U. Kastens and W. M. Waite. An Abstract Data Type for Name Analysis. Acta Informatica, 28:539–558, 1991.

[Lan66] P. J. Landin. The Next 700 Programming Languages. Commun. ACM, 9(3):157–166, 1966.

[LF93] Shinn-Der Lee and Daniel P. Friedman. Quasi-Static Scoping: Sharing Variable Bindings across Multiple Lexical Scopes. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’93), pages 479–492, New York, NY, USA, 1993. ACM.

[LH06] Andres Löh and Ralf Hinze. Open Data Types and Open Functions. In Proceedings of the 8th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming (PPDP’06), pages 133–144, New York, NY, USA, 2006. ACM.

[LHJ95] Sheng Liang, Paul Hudak, and Mark Jones. Monad Transformers and Modular Interpreters. In POPL ’95: Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 333–343, New York, NY, USA, 1995. ACM.


[Lie04] Karl Lieberherr, editor. Proceedings of the 3rd ACM International Conference on Aspect-Oriented Software Development (AOSD’04), Lancaster, UK, March 2004. ACM Press, New York, NY, USA.

[LK97] C. Lopes and G. Kiczales. D: A Language Framework for Distributed Programming. Technical report, Northeastern University, Boston, MA, USA, 1997.

[LK03] D.H. Lorenz and S. Kojarski. Reflective Mechanisms in AOP Languages. Technical report, Northeastern University, 2003.

[LLMS00] Jeffrey R. Lewis, John Launchbury, Erik Meijer, and Mark B. Shields. Implicit Parameters: Dynamic Scoping with Static Types. In Proceedings of the 27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’00), pages 108–118, New York, NY, USA, 2000. ACM.

[LM00] Daan Leijen and Erik Meijer. Domain Specific Embedded Compilers. Proceedings of the 2nd Conference on Domain-Specific Languages (DSL’99), 35(1):109–122, 2000.

[LMB92] John Levine, Tony Mason, and Doug Brown. Lex & Yacc. O’Reilly, 1992. 2nd edition.

[LN04] C.V. Lopes and T.C. Ngo. The Aspect-Oriented Markup Language and its Support of Aspect Plugins. Technical Report UCI-ISR-04-8, ISR, University of California, Irvine, 2004.

[Lop97] Cristina Lopes. D: A Language Framework For Distributed Programming. PhD thesis, College of Computer Science of Northeastern University, 1997.

[LY99] T. Lindholm and F. Yellin. Java Virtual Machine Specification. Addison-Wesley, Boston, 1999.

[Mae87] P. Maes. Computational Reflection. PhD thesis, Vrije Universiteit Brussel, 1987.

[Mae88] P. Maes. Computational reflection. The Knowledge Engineering Review, 3(01):1–19, 1988.

[Mar08] Robert C. Martin. Clean Code: A Handbook of Agile Software Craftsmanship. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2008.

[MC02] Finbar McGurren and Damien Conroy. X-Adapt: An Architecture for Dynamic Systems, 2002.

[ME00] J. Melton and A. Eisenberg. Understanding SQL and Java together: a Guide to SQLJ, JDBC, and related Technologies. Morgan Kaufmann, 2000.


[Mey88] B. Meyer. Harnessing Multiple Inheritance. Journal of Object-Oriented Programming, 1(4):48–51, 1988.

[Mey89] Norman K. Meyrowitz, editor. Proceedings of the 4th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’89), SIGPLAN Notices 24(10), New Orleans, Louisiana, 1989. ACM Press, New York, NY, USA.

[MHS05] M. Mernik, J. Heering, and A.M. Sloane. When and How to Develop Domain-Specific Languages. ACM Computing Surveys (CSUR), 37(4):316–344, 2005.

[MJ98] W.S. Miles and L. Johnson. Implementing Generalized Operator Overloading. Software–Practice and Experience, 28(6):593–610, 1998.

[MK03] H. Masuhara and K. Kawauchi. Dataflow Pointcut in Aspect-Oriented Programming. In Asian Symposium on Programming Languages and Systems, 2003.

[MKD03] H. Masuhara, G. Kiczales, and C. Dutchyn. A Compilation and Optimization Model for Aspect-Oriented Programs. In Görel Hedin, editor, Compiler Construction, volume 2622 of Lecture Notes in Computer Science, pages 46–60. Springer Berlin / Heidelberg, 2003.

[MLAZ00] Marjan Mernik, Mitja Lenic, Enis Avdicausevic, and Viljem Zumer. Compiler/Interpreter Generator System LISA. Hawaii International Conference on System Sciences, 8:8059, 2000.

[MMP89] O. L. Madsen and B. Moller-Pedersen. Virtual Classes: A Powerful Mechanism in Object-Oriented Programming. In Proceedings of OOPSLA’89 [Mey89], New York, NY, USA, 1989. ACM.

[Mog89] E. Moggi. Computational Lambda-Calculus and Monads. In Proceedings of the Fourth Annual Symposium on Logic in Computer Science, pages 14–23, Piscataway, NJ, USA, 1989. IEEE Press.

[Moo01] L. Moonen. Generating Robust Parsers using Island Grammars. In Proceedings of the 8th Working Conference on Reverse Engineering, pages 13–22. IEEE Computer Society, October 2001.

[Mos80] Peter Mosses. A Constructive Approach to Compiler Correctness. In Jaco de Bakker and Jan van Leeuwen, editors, Automata, Languages and Programming, volume 85 of Lecture Notes in Computer Science, pages 449–469. Springer Berlin / Heidelberg, 1980.

[MS02] M.B. Shields and S.L. Peyton Jones. First Class Modules for Haskell. In 9th International Conference on Foundations of Object-Oriented Languages (FOOL’02), pages 28–40, January 2002.


[MW10] Antoine Marot and Roel Wuyts. Composing Aspects with Aspects. In Proceedings of AOSD’10 [JS10], pages 157–168, 2010.

[NB09] Nelly Bencomo, Gordon Blair, and Robert France, editors. 1st International Workshop on Models at Run-Time, 2009.

[NCM03] N. Nystrom, M.R. Clarkson, and A.C. Myers. Polyglot: An Extensible Compiler Framework for Java. In Proceedings of the 12th International Conference on Compiler Construction, pages 138–152, Berlin, Heidelberg, 2003. Springer-Verlag.

[New73] William M. Newman. An Informal Graphics System based on the LOGO Language. In AFIPS ’73: Proceedings of the June 4-8, 1973, National Computer Conference and Exposition, pages 651–655, New York, NY, USA, 1973. ACM.

[oD65] Department of Defense. COBOL Edition 1965. Technical Report Formnr. 0-795-689, US-Government Printing Office, 1965.

[OMG04] OMG. UML 2.0 Superstructure. Technical report, Object Management Group, 2004.

[OOP00] Proceedings of the 15th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’00), SIGPLAN Notices 35(10), Minneapolis, Minnesota, USA, 2000. ACM Press, New York, NY, USA.

[OSV07] Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala. Artima Press, Mountain View, CA, 2007.

[PAG03] Andrei Popovici, Gustavo Alonso, and Thomas Gross. Just-In-Time Aspects: Efficient Dynamic Weaving for Java. In Proceedings of AOSD’03 [Aks03], pages 100–109, New York, NY, USA, 2003. ACM.

[Par72] D.L. Parnas. On the Criteria to be used in Decomposing Systems into Modules. Communications of the ACM, 15(12):1053–1058, 1972.

[Par93] T.J. Parr. Obtaining Practical Variants of LL(k) and LR(k) for k > 1 by Splitting the Atomic k-Tuple. PhD thesis, Purdue University, 1993.

[Par08] T. Parr. The Reuse of Grammars with Embedded Semantic Actions. In Proceedings of the 16th IEEE Conference on Program Comprehension (ICPC 2008), pages 5–10, 2008.

[Par10] T. Parr. Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages. The Pragmatic Bookshelf, 2010.


[PE88] F. Pfenning and C. Elliot. Higher-Order Abstract Syntax. SIGPLAN Not., 23(7):199–208, 1988.

[Pes01] F. Peschanski. Jargons: Experimenting Composable Domain-Specific Languages. In Workshop on Scheme and Functional Programming, Firenze, Italy, 2001.

[PGA01] A. Popovici, T. Gross, and G. Alonso. Dynamic Homogenous AOP with PROSE. Technical report, Department of Computer Science, ETH Zürich, Zürich, Switzerland, March 2001.

[PK08] V. Pieterse and D. Kourie. Reflections on Coding Standards in Tertiary Computer Science Education. South African Computer Journal, 41:29–37, 2008.

[Plo81] G.D. Plotkin. A Structural Approach to Operational Semantics. Technical Report DAIMI FN-19, Computer Science Department, Aarhus University, Aarhus, Denmark, 1981.

[PP08] M. Pfeiffer and J. Pichler. A Comparison of Tool Support for Textual Domain-Specific Languages. In Proceedings of the 8th Workshop on Domain-Specific Modeling, pages 1–7, New York, NY, USA, 2008. ACM.

[Pre97] Christian Prehofer. Feature-Oriented Programming: A Fresh Look at Objects. In Mehmet Aksit and Satoshi Matsuoka, editors, ECOOP’97 Object-Oriented Programming, volume 1241 of Lecture Notes in Computer Science, pages 419–443. Springer Berlin / Heidelberg, 1997.

[PS83] H. Partsch and R. Steinbrüggen. Program Transformation Systems. ACM Comput. Surv., 15(3):199–236, 1983.

[RDN10] Lukas Renggli, Marcus Denker, and Oscar Nierstrasz. Language Boxes. In Mark van den Brand, Dragan Gasevic, and Jeff Gray, editors, Software Language Engineering, volume 5969 of Lecture Notes in Computer Science, pages 274–293. Springer Berlin / Heidelberg, 2010.

[Res90] M. Resnick. MultiLogo: A Study of Children and Concurrent Programming. Interactive Learning Environments, 1(3):153–170, 1990.

[RGN10] L. Renggli, T. Gîrba, and O. Nierstrasz. Embedding Languages without Breaking Tools. ECOOP, 6183:380–404, 2010.

[RMH+06] D. Rebernak, M. Mernik, P.R. Henriques, D. da Cruz, and M.J.V. Pereira. Specifying Languages using Aspect-Oriented Approach: AspectLISA. In 28th International Conference on Information Technology Interfaces, pages 695–700, 2006.


[RMHP06] D. Rebernak, M. Mernik, P.R. Henriques, and M.J.V. Pereira. AspectLISA: an Aspect-Oriented Compiler Construction System based on Attribute Grammars. Electronic Notes in Theoretical Computer Science, 164(2):37–53, 2006.

[RMWG09] D. Rebernak, M. Mernik, H. Wu, and J. Gray. Domain-Specific Aspect Languages for Modularising Crosscutting Concerns in Grammars. Software, IET, 3(3):184–200, June 2009.

[RS84] Jim des Rivières and Brian Cantwell Smith. The Implementation of Procedurally Reflective Languages. In Proceedings of the 1984 ACM Symposium on LISP and Functional Programming (LFP’84), pages 331–347, New York, NY, USA, 1984. ACM.

[Rub] Ruby Programming Language. http://www.ruby-lang.org/.

[SBP99] Tim Sheard, Zine-el-abidine Benaissa, and Emir Pasalic. DSL Implementation using Staging and Monads. In DSL ’99: Proceedings of the 2nd Conference on Domain-Specific Languages, pages 81–94, New York, NY, USA, 1999. ACM.

[SBP+09] Dave Steinberg, Frank Budinsky, Marcelo Paternostro, Ed Merks, Richard C. Gronback, and Mike Milinkovich. EMF: Eclipse Modeling Framework. The Eclipse Series. Addison-Wesley, Upper Saddle River, NJ, 2nd edition, 2009.

[Sca] The Scala Programming Language. http://www.scala-lang.org/index.html.

[SCD03] Nikita Synytskyy, James R. Cordy, and Thomas R. Dean. Robust Multilingual Parsing using Island Grammars. In Proc. of Conference of the Centre for Advanced Studies on Collaborative Research, pages 266–278. IBM Press, 2003.

[SCK04] Sean Seefried, Manuel Chakravarty, and Gabriele Keller. Optimising Embedded DSLs Using Template Haskell. Generative Programming and Component Engineering, 3286:201–211, 2004.

[SDNB03] Nathanael Schärli, Stéphane Ducasse, Oscar Nierstrasz, and Andrew Black. Traits: Composable Units of Behaviour. In ECOOP 2003 – Object-Oriented Programming, volume 2743 of Lecture Notes in Computer Science, pages 327–339. Springer Berlin / Heidelberg, 2003.

[She04] Tim Sheard. Languages of the Future. SIGPLAN Not., 39(12):119–132, 2004.

[SHH09] Hans Schippers, Michael Haupt, and Robert Hirschfeld. An Implementation Substrate for Languages Composing Modularized Crosscutting Concerns. In Proceedings of the 2009 ACM Symposium on Applied Computing (SAC’09), pages 1944–1951, New York, NY, USA, 2009. ACM.


[Sim95] Charles Simonyi. The Death of Computer Languages, the Birth of Intentional Programming. Technical Report MSR-TR-95-52, Microsoft Research, Redmond, WA, USA, September 1995.

[SK97] J. Sztipanovits and G. Karsai. Model-Integrated Computing. Computer, 30(4):110–111, 1997.

[SLS03] Macneil Shonle, Karl Lieberherr, and Ankit Shah. XAspects: An Extensible System for Domain Specific Aspect Languages. In Proceedings of OOPSLA’03 [CJ03], pages 28–37, 2003.

[Smi82] B.C. Smith. Reflection and Semantics in a Procedural Language. Technical Report 272, Massachusetts Institute of Technology, Laboratory for Computer Science, Cambridge, Massachusetts, USA, 1982.

[Smi84] Brian Cantwell Smith. Reflection and Semantics in LISP. In Proceedings of the 11th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL’84), pages 23–35, New York, NY, USA, 1984. ACM.

[SN05] J.E. Smith and R. Nair. Virtual Machines: Versatile Platforms for Systems and Processes. Morgan Kaufmann Pub, 2005.

[SP06] P. Shaker and D. Peters. Design-Level Detection of Interactions in Aspect-Oriented Systems. In Aspects, Dependencies and Interactions Workshop at ECOOP 2006, 2006.

[SS71] D. Scott and C. Strachey. Toward a Mathematical Semantics for Computer Languages. In Proceedings of the Symposium on Computers and Automata, volume 21, page 19. Polytechnic Press of the Polytechnic Institute of Brooklyn, Wiley-Interscience, New York, 1971.

[SS78] G. Steele and G. Sussman. The Revised Report on SCHEME, a Dialect of LISP. Technical Report Memo AIM-452, M.I.T. Artificial Intelligence Laboratory, 1978.

[SS98] Gerald Jay Sussman and Guy L. Steele. Scheme: An Interpreter for Extended Lambda Calculus. Higher-Order and Symbolic Computation, 11:405–439, 1998.

[Ste87] Lynn Andrea Stein. Delegation is Inheritance. In Proceedings of the International Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA’87), pages 138–146, New York, NY, USA, 1987. ACM.

[Ste94] Guy L. Steele. Building Interpreters by Composing Monads. In Proceedings of the Symposium on Principles of Programming Languages, pages 472–492. ACM New York, NY, USA, 1994.


[Ste99] G.L. Steele. Growing a Language. Higher-Order and Symbolic Computation, 12(3):221–236, 1999.

[Stu09] Aaron Stump. Directly Reflective Meta-Programming. Higher-Order and Symbolic Computation, 22:115–144, 2009.

[Sul01] G.T. Sullivan. Aspect-Oriented Programming using Reflection and Metaobject Protocols. Communications of the ACM, 44(10):95–97, 2001.

[SVJ03] Davy Suvée, Wim Vanderperren, and Viviane Jonckers. JAsCo: An Aspect-Oriented Approach tailored for Component based Software Development. In Proceedings of AOSD’03 [Aks03], pages 21–29, 2003.

[Tan04] É. Tanter. From Metaobject Protocols to Versatile Kernels for Aspect-Oriented Programming. PhD thesis, Université de Nantes, France, 2004.

[Tan06] E. Tanter. Aspects of Composition in the Reflex AOP Kernel. LNCS, 4089:98, 2006.

[Tan08] Éric Tanter. Expressive Scoping of Dynamically-Deployed Aspects. In Proceedings of AOSD’08 [D’H08], pages 168–179, 2008.

[Tan09] Éric Tanter. Beyond Static and Dynamic Scope. SIGPLAN Not., 44(12):3–14, 2009.

[Tan10] Éric Tanter. Execution Levels for Aspect-Oriented Programming. In Proceedings of AOSD’10 [JS10], pages 37–48, 2010.

[Tay92] P. Taylor. The Future of TEX. Proceedings of the Conference EuroTEX, 92:235–254, 1992.

[TC06] Peri L. Tarr and William R. Cook, editors. Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’06), Portland, Oregon, USA, 2006. ACM Press, New York, NY, USA.

[TCKI00] M. Tatsubori, S. Chiba, M.O. Killijian, and K. Itano. OpenJava: A Class-Based Macro System for Java. Reflection and Software Engineering, 1826, 2000.

[Ten76] R.D. Tennent. The Denotational Semantics of Programming Languages. Communications of the ACM, 19(8):437–453, 1976.

[TFH09] Dave Thomas, Chad Fowler, and Andy Hunt. Programming Ruby 1.9: The Pragmatic Programmers’ Guide. Pragmatic Bookshelf, 2009.


[TN05] Éric Tanter and Jacques Noyé. A Versatile Kernel for Multi-language AOP. Generative Programming and Component Engineering, 3676:173–188, 2005.

[Tra08] Laurence Tratt. Domain Specific Language Implementation via Compile-Time Meta-Programming. ACM Trans. Program. Lang. Syst., 30(6):1–40, 2008.

[TT08] R. Toledo and E. Tanter. A Lightweight and Extensible AspectJ Implementation. Journal of Universal Computer Science, 14(21):3517–3533, 2008.

[UMT04] N. Ubayashi, H. Masuhara, and T. Tamai. An AOP Implementation Framework for Extending Join Point Models. In ECOOP Workshop on Reflection, AOP and Meta-Data for Software Evolution, 2004.

[US87] David Ungar and Randall B. Smith. Self: The Power of Simplicity. SIGPLAN Not., 22(12):227–242, 1987.

[vdBK07] M.G.J. van den Brand and P. Klint. ATerms for Manipulation and Exchange of Structured Data: It’s all about Sharing. Information and Software Technology, 49(1):55–64, 2007.

[vdBSV97] M. van den Brand, M. Sellink, and C. Verhoef. Obtaining a Cobol Grammar from Legacy Code for Reengineering Purposes. Proceedings of the Second International Workshop on Theory and Practice of Algebraic Specifications, 1997.

[vdBSVV02] Mark van den Brand, Jeroen Scheerder, Jurgen Vinju, and Eelco Visser. Disambiguation Filters for Scannerless Generalized LR Parsers. In R. Horspool, editor, Compiler Construction, volume 2304 of Lecture Notes in Computer Science, pages 21–44. Springer Berlin / Heidelberg, 2002. 10.1007/3-540-45937-5_12.

[vdBvDK+96] M. van den Brand, A. van Deursen, P. Klint, S. Klusener, and E. van Der Meulen. Industrial Applications of ASF+SDF. Proceedings of the 5th International Conference on Algebraic Methodology and Software Technology (AMAST’96), 1101:9–18, 1996.

[vDKV00] A. van Deursen, P. Klint, and J. Visser. Domain-Specific Languages: An Annotated Bibliography. ACM SIGPLAN Notices, 35(6):26–36, 2000.

[vG00] J. van Gurp. Variability in Software Systems: The Key to Software Reuse. PhD thesis, Dept. of Software Engineering & Computer Science, Blekinge Institute of Technology, Karlskrona, Sweden, 2000.

[VGBS01] Jilles Van Gurp, Jan Bosch, and Mikael Svahnberg. On the Notion of Variability in Software Product Lines. In Proceedings of the Working IEEE/IFIP Conference on Software Architecture (WICSA’01), page 45, Washington, DC, USA, 2001. IEEE Computer Society.


[Vis97a] E. Visser. Scannerless Generalized-LR Parsing. Technical Report P9707, University of Amsterdam, The Netherlands, 1997.

[Vis97b] E. Visser. Syntax Definition for Language Prototyping. PhD thesis, Universiteit van Amsterdam, The Netherlands, 1997.

[Vis97c] Eelco Visser. A Family of Syntax Definition Formalisms. Technical Report P9706, Programming Research Group, 1997.

[Vis05] E. Visser. A Survey of Strategies in Rule-Based Program Transformation Systems. Journal of Symbolic Computation, 40(1):831–873, 2005.

[Vis08] E. Visser. WebDSL: A Case Study in Domain-Specific Language Engineering. Generative and Transformational Techniques in Software Engineering II, 5235:291–373, 2008.

[VR09] T. Verwaest and L. Renggli. Safe Reflection through Polymorphism. In Proceedings of the First International Workshop on Context-Aware Software Technology and Applications, pages 21–24. ACM, 2009.

[VS04] John M. Vlissides and Douglas C. Schmidt, editors. Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’04), Vancouver, BC, Canada, 2004. ACM Press, New York, NY, USA.

[VWBGK08] Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an Extensible Attribute Grammar System. Electron. Notes Theor. Comput. Sci., 203(2):103–116, 2008.

[VWBH06] E. Van Wyk, D. Bodin, and P. Huntington. Adding Syntax and Static Analysis to Libraries via Extensible Compilers and Language Extensions. In 2nd International Workshop on Library-Centric Software Design (LCSD’06), page 35, New York, NY, USA, 2006. ACM.

[Wad90] Philip Wadler. Comprehending Monads. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming (LFP’90), pages 61–78, 1990.

[Wam08] Dean Wampler. Aquarium: AOP in Ruby. In Proceedings of AOSD’04 [Lie04], 2008.

[WF88] Mitchell Wand and Daniel P. Friedman. The Mystery of the Tower Revealed: A Nonreflective Description of the Reflective Tower. LISP and Symbolic Computation, 1:11–38, 1988.

[WfM] The Workflow Management Coalition. http://www.wfmc.org/.


[WGM09] Hui Wu, Jeff Gray, and Marjan Mernik. Unit Testing for Domain-Specific Languages. Domain-Specific Languages, 5658:125–147, 2009.

[WKBS07] Eric Van Wyk, Lijesh Krishnan, Derek Bodin, and August Schwerdfeger. Attribute Grammar-Based Language Extensions for Java. ECOOP 2007–Object-Oriented Programming, 4609:575–599, 2007.

[WKD04] M. Wand, G. Kiczales, and C. Dutchyn. A Semantics for Advice and Dynamic Join Points in Aspect-Oriented Programming. Transactions on Programming Languages and Systems, 26(5), 2004.

[ZGL05] Jing Zhang, Jeff Gray, and Yuehua Lin. Metamodel-Driven Model Interpreter Evolution. In Proceedings of the Domain-Specific Modeling Workshop, 2005.

[Zha06] G. Zhang. Towards Aspect-Oriented State Machines. In Asian Workshop on Aspect-Oriented Software Development (AOAsia’06), 2006.

[ZHS04] David Zook, Shan Shan Huang, and Yannis Smaragdakis. Generating AspectJ Programs with Meta-AspectJ. In Gabor Karsai and Eelco Visser, editors, Generative Programming and Component Engineering, volume 3286 of Lecture Notes in Computer Science, pages 583–605. Springer Berlin / Heidelberg, 2004.

Scientific Career

Since Jun 2005: Technische Universität Darmstadt, Germany. Research Assistant in the Software Technology Group of Prof. Mira Mezini.

Apr 2007 – Sep 2007: SAP Research, Palo Alto, CA, USA. Internship Abroad.

Mar 2005: Technische Universität Darmstadt, Germany. Diploma in Computer Science.

Sep 2002 – Mar 2003: University of Birmingham, England. Study Abroad in Computer Science.

Aug 1998 – Mar 2005: Technische Universität Darmstadt, Germany. Studies in Computer Science.
