ManyDSL: One Host for All Language Needs

Piotr Danilewski
March 2017

A dissertation submitted towards the degree (Dr.-Ing.) of the Faculty of Mathematics and Computer Science of Saarland University.

Saarbrücken

Dean: Prof. Dr. Frank-Olaf Schreyer
Date of Colloquium: June 6, 2017
Examination Board:
  Chairman: Prof. Dr. Sebastian Hack
  Reviewers: Prof. Dr.-Ing. Philipp Slusallek, Prof. Dr. Wilhelm Reinhard
  Scientific Assistant: Dr. Tim Dahmen

Piotr Danilewski, [email protected]
Saarbrücken, June 6, 2017

Statement

I hereby declare that this dissertation is my own original work except where otherwise indicated. All data or concepts drawn directly or indirectly from other sources have been correctly acknowledged. This dissertation has not been submitted in its present or similar form to any other academic institution either in Germany or abroad for the award of any degree.

Saarbrücken, June 6, 2017
(Piotr Danilewski)

Declaration of Consent

Herewith I agree that my thesis will be made available through the library of the Computer Science Department.

Saarbrücken, June 6, 2017
(Piotr Danilewski)

Summary (Zusammenfassung)

Languages shape the way we think. This holds for spoken languages as well as for programming languages. As computers become ever more important in every aspect of human life, the need to express correspondingly new concepts in programming languages grows. However, for our way of thinking to evolve, programming languages must evolve as well. But what tools exist for creating and upgrading programming languages? How can developers be encouraged to define their own languages that best fit the domain they work in?

Today there are two methods. First, there are dedicated tools and parser generators that serve to create a standalone programming language from scratch. Second, one can exploit sufficiently flexible existing host languages to embed small DSLs in them. Both methods have their own limitations. On the one hand, creating a standalone programming language takes a great deal of effort, and such a language is hard to connect with other languages. On the other hand, embedded DSLs are restricted by the syntax of the host language. Moreover, once embedded DSLs have been defined, they are permanently present. There is no boundary between the embedded DSLs and the host language. Using many embedded DSLs leads to a mixture of languages with jumbled syntax, and this mixture also exhibits unexpected interactions between the languages.

This work presents an alternative solution: ManyDSL. It is a unique interpreter and compiler that draws on the strengths of these solutions and avoids their weaknesses. ManyDSL has its own LL(1) parser generator, which avoids the restrictions imposed by the host language. The grammar description is defined in the same programming language as the other parts of the program. Fragments of grammars can be parametrized, and functions can be built from these fragments. These functions can be used to create further languages. Languages are created during the interpretation process and can be used to parse subsequent fragments of the source code.

Like embedded DSLs, ManyDSL translates all languages into the host language. The host language uses continuation-passing style (CPS) with a novel, dynamic method for staging. Staging permits partial evaluation and the execution of source code in many phases. This can be used to define optimizations and "auxiliary computation", all in a functional manner, without using abstract syntax trees (ASTs).

With the help of ManyDSL, the user can build new languages with recognizable syntax. Moreover, the user can use many languages within one project. These languages have precise boundaries, and the user can switch between them. Thanks to these boundaries, the languages interact with each other in a controlled manner.

ManyDSL is the first step towards language switching in programming languages. With ManyDSL, I would like to encourage developers to create the languages that suit them best. I hope that, over time, every developer will be able to create new languages with the help of grammar libraries.

Abstract

Languages shape thoughts. This is true for human spoken languages as much as for programming languages. As computers continue to expand their dominance in almost every aspect of our lives, the need to express new concepts and domains more adequately in computer languages arises. However, to evolve our thoughts we need to evolve the languages we speak. But what tools are there to create and upgrade computer languages? How can we encourage developers to quickly define their own languages that best match the domains they work in?

Two main approaches exist today. Dedicated language tools and parser generators allow new standalone languages to be defined from scratch. Alternatively, one can "abuse" sufficiently flexible host languages to embed small domain-specific languages within them. Both approaches have their own limitations. Creating a standalone language is a major endeavor, and such languages cannot be combined easily with other languages. Embedding, on the other hand, is limited by the syntax of the host language.
Embedded languages, once defined, are always present, with no clear distinction between them and the host language. Used extensively, they lead to one humongous conglomerate of languages with confusing syntax and unexpected interactions.

In this work we present an alternative: ManyDSL. It is a unique interpreter and compiler that takes strength from both approaches while avoiding the above weaknesses. ManyDSL features its own LL(1) parser generator, breaking the limits of the syntax of the host language. The grammar description is given in the same host language as the rest of the program. Portions of the grammar can be parametrized and abstracted into functions for use in other language definitions. Languages are created on the fly during the interpretation process and may be used to parse selected fragments of the subsequent source files.

Similarly to embedded languages, ManyDSL translates all custom languages to the same host language before execution. The host language uses continuation-passing style with a novel, dynamic approach to staging. Staging allows for arbitrary partial evaluation and for executing code at different phases of the compilation process. This can be used to define domain-specific optimizations and auxiliary computation (e.g. for verification), all within an entirely functional approach, without any explicit use of abstract syntax trees and code transformations.

With the help of ManyDSL a user is able to create new languages with distinct, easily recognizable syntax. Moreover, the user is able to define and use many such languages within a single project. Languages can be switched with a well-defined boundary, enabling their interaction in a clear and controlled way.

ManyDSL is meant to be the first step towards a broader language pluralism. With it we want to encourage developers to design and use languages that best suit their needs.
We believe that over time, with the help of grammar libraries, creating new languages will become accessible to every programmer.

Contents

1 Introduction
  1.1 Motivation
  1.2 Solutions Today
  1.3 Our Work
  1.4 The Structure of this Work
2 Background
  2.1 Language Classification
    2.1.1 Language Generations
    2.1.2 Language Paradigms
  2.2 Compilation and Interpretation
    2.2.1 Compilation Phases
    2.2.2 Type Checking
  2.3 Staging
    2.3.1 Fixed Staging
    2.3.2 Textual Staging
    2.3.3 Structured Staging
    2.3.4 Dynamic Code Generation
    2.3.5 Code as a First Class Citizen
    2.3.6 Automated Staging
    2.3.7 Staging in ManyDSL
  2.4 Language Construction
    2.4.1 Parser Generators
    2.4.2 Parser Combinators
    2.4.3 Code Generation
  2.5 Language Embedding
    2.5.1 Plain Shallow Embedding
    2.5.2 Deep Embedding
    2.5.3 Shallow Embedding with Staging
  2.6 Metamorphic Languages
    2.6.1 The Importance of Syntax
    2.6.2 Macro Languages
    2.6.3 Racket
    2.6.4 Grammar Extension
    2.6.5 Grammar Replacement
    2.6.6 Metamorphism in ManyDSL
3 ManyDSL Overview
  3.1 The Main Goal
  3.2 Properties
  3.3 Design Decisions
  3.4 Separation of Concerns
