UNIVERSITÀ DEGLI STUDI DI MILANO
Computer Science Department – ADAPT-Lab
Scuola di Dottorato in Informatica
PhD in Computer Science – Ciclo XXVII
Academic Year 2013 – 2014

Edoardo Vacchi
Id. Number: R09518

Programming Languages à la Carte

Advisor: Prof. Walter Cazzola

Contents

1. Introduction 1

2. Background 7

3. Feature-Oriented Language Composition 11
   3.1. Conceptual Model ...... 12
        3.1.1. Language Components ...... 13
        3.1.2. Dependencies Between Components ...... 15
        3.1.3. Globally-Scoped Components ...... 17
        3.1.4. Composition Model ...... 18

4. Neverlang 21
   4.1. The Neverlang Framework ...... 22
        4.1.1. Defining Syntax and Semantics: Modules ...... 23
        4.1.2. Mapping Semantics onto Syntax: Slices ...... 30
        4.1.3. Combining Slices Together: Generating a Language ...... 35
   4.2. Runtime Deployment of Semantic Actions and Tree Rewriting DSL ...... 38
   4.3. Tools and Utilities ...... 39
   4.4. Implementation ...... 43
        4.4.1. Architecture ...... 44
        4.4.2. Runtime and Execution ...... 49
        4.4.3. DEXTER ...... 53

5. Case Study: Evolution of a DSL through Composition 55
   5.1. A Simple State Machine DSL ...... 56
   5.2. A Simple Imperative Language ...... 59
   5.3. Guards and Actions: Composing the DSLs ...... 61

6. Evaluation 67
   6.1. Feature-Oriented Language Implementation Across Tools ...... 67
        6.1.1. LISA ...... 68
        6.1.2. Silver ...... 70
        6.1.3. Spoofax ...... 72
        6.1.4. Xtext ...... 74
        6.1.5. Summary ...... 76


   6.2. Extending a Real-World Language: neverlang.js ...... 80
        6.2.1. Runtime Evolution for Dynamic Optimization ...... 83
   6.3. The DESK Language ...... 84
   6.4. Tracking Dependencies Through Variability Management ...... 89

7. Related Work 93
   7.1. Extensible Parser Generators ...... 96
   7.2. Variability Modeling of Language Families ...... 97

8. Conclusions 99

A. Formal Composition Model 101
   A.1. Decomposition of Syntax Definitions ...... 101
   A.2. Decomposition of Language Semantics ...... 103

B. On The Relation Between LR Goto-Graphs 107
   B.1. Goto-Graphs and Growing Grammars ...... 109
        B.1.1. Construction of ϕ and ∆V ...... 112
        B.1.2. Construction of ψ and ∆E ...... 115
        B.1.3. Construction of ΓG′ from ΓG ...... 118
   B.2. Goto-Graphs and Shrinking Grammars ...... 121

C. Variability Model Inference 123
   C.1. Tag Generation ...... 123
   C.2. Hierarchical Clustering ...... 124
   C.3. Refinement Procedure ...... 125
   C.4. Heuristics for Mining Constraints ...... 127

1 Introduction

In Nineteen Eighty-Four, the IngSoc party imposes on the population an artificial language called Newspeak, «not only to provide a medium of expression for the world-view and mental habits proper to the devotees of IngSoc, but to make all other modes of thought impossible»1. Fiction aside, a highly-debated, fascinating hypothesis in linguistics is that the language we speak shapes the way we think. Since the 1930s, the notion that different languages may influence the cognitive skills of their speakers has been associated with Edward Sapir and Benjamin Whorf, American linguists who studied how languages vary and conjectured the ways different tongues would affect their speakers. These ideas were initially met with enthusiasm and excitement, yet they were eventually undermined by a substantial lack of evidence. Recently, however, a solid body of new empirical evidence has emerged [10], showing that language influences even the most fundamental dimensions of human experience, such as space, time, causality and relationships to others, affecting even memory and learning abilities.

Programming languages are quite different from natural languages. Natural languages can be ambiguous, informal and vague; they leave room for creativity and imagination. Programming languages, on the other hand, are a form of communication between humans and machines: they are necessarily unambiguous, formal and precise. Because of this inherent rigidity, not only do these artificial languages enforce strict rules on the way a program should be written, but they also impose a mental model on the programmers. And, as programmers, we often feel that choosing one programming language over another frees our mind from cognitive burden. Edsger W. Dijkstra despised FORTRAN, «for as a vehicle of thought it is no longer adequate: it wastes our brainpower, is too risky and therefore too expensive to use» [24]; Paul Graham, Eric S. Raymond and

1. Orwell G., from the appendix “The Principles of Newspeak”.


val fixedIncomeTrade = 200 discount_bonds IBM for_client NOMURA on NYSE at 72.ccy(USD)
val equityTrade = 200 equities GOOGLE for_client NOMURA on TKY at 10000.ccy(JPY)

Listing 1.1: A trading DSL in Scala (from [38]).

many others have described learning Lisp as «an enlightenment experience» [79, 41]. But even if languages influence our thought, the influence also goes the other way. In fact, we are constantly looking for ways to extend and enrich our programming languages with new constructs and powerful abstractions, trying to close the gap between the way we are accustomed to think as human beings and the way machines have to be instructed.

Modern general purpose languages are striving to provide more abstraction to programmers; mainstream design is progressively converging towards a hybrid of object orientation and functional programming. Languages from the two communities cross-pollinate each other with features: languages that were born as purely object-oriented nowadays tend to include functional constructs. However, even though the tendency to contamination is strong, programming language implementations hardly share any code. Even close relatives, with similar syntax and semantics, are usually developed from scratch, using techniques that we would barely call modern; after all, even modular programming dates back to the 1970s [75]. Yet, programming language development is often still a top-down, monolithic activity, where extensibility of the compiler, although desirable, is only an afterthought.

Nevertheless, the rising trend of developing programming languages that target a particular application domain, to solve a particular problem of that domain, has caused a shift in the way computer languages are designed and implemented. The model of development where the software system is built from the ground up using a little language [7] has been dubbed language-oriented programming [107]. In language-oriented programming, domain-specific languages (DSLs) are developed to write programs of the application domain in a concise, problem-oriented way.
Contrasting this trend with the traditional language development process, in this model language implementation is quite a bottom-up activity, where the specification arises from the problem that the developers need to solve. As a matter of fact, domain-specific languages are a natural part of our everyday workflow, because they are designed to simplify interaction with specific software systems; at the same time, a well-crafted DSL brings interaction with software even within the reach of domain experts, who may not necessarily be professional programmers. Mathematicians may write MATLAB programs; statisticians may use the SAS programming language; a hardware engineer could write in Verilog.

Different DSLs targeting different domains may be used to implement different concerns of the same software system. Therefore, language composition, extension and reuse are highly-researched topics. For instance, in electronic automation and communication protocols, the transitions between states are usually represented through state machine languages; it is not uncommon to combine these languages with a restricted language to model actions that are performed when a transition fires. General purpose programming languages often embed other domain-specific languages to perform a number of tasks, such as querying databases (using SQL), sending marshaled data over the wire (using JSON) or pattern matching over a string (using regular expressions).

List<Person> persons = query.from(person)
    .where(
        person.firstName.eq("John"),
        person.lastName.eq("Doe"))
    .list(person);

Listing 1.2: A query using Spring Data’s Query DSL [78] in Java.

Language composition and extension can be achieved through embedding of a guest language in a host programming language, or by implementing the language as a stand-alone artifact through specific tooling. In the first case, the simplest form of embedding is quoting: the foreign programming language is typically represented as a string; this has several downsides, the most obvious of which is the lack of support for static verification by the host language tooling. A more advanced and modern form of embedding is the fluent interface [35]. A fluent interface is a particular API design style that makes syntactically valid lines of code of the host programming language read like a foreign language. Fluent APIs are often used to embed query languages within the body of a general purpose programming language (cf. Spring Data’s Query DSL [78], Listing 1.2) or to describe graphical user interfaces (cf. JavaFX’s APIs [23]). DSL embedding through fluent interfaces has become part of the idiom in many modern programming languages such as Scala, Ruby and Groovy (e.g., Listing 1.1). Because of this, embedding is generally the most convenient and widely adopted technique to implement a DSL. Recent research has also shown the benefits of employing the same technique to produce high-performance, code-generating DSLs [82]. However, this technique has its limits. First of all, the syntax of the embedded DSL is inevitably dictated by the host programming language.
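The fluent style of Listing 1.2 can be mimicked in a few lines of plain Python; the `Query` and `Field` classes and the in-memory data below are illustrative stand-ins, not any real library's API. The key trick is that `where` returns the query object itself, so chained calls read like an embedded query language:

```python
# Illustrative fluent-interface sketch, loosely modeled after the
# style of Listing 1.2; names here are hypothetical.
class Field:
    def __init__(self, name):
        self.name = name

    def eq(self, value):
        # returns a predicate over a row (a dict)
        return lambda row: row[self.name] == value

class Query:
    def __init__(self, rows):
        self._rows, self._preds = rows, []

    def where(self, *preds):
        # returning self is what makes the calls chain fluently
        self._preds.extend(preds)
        return self

    def list(self):
        return [r for r in self._rows if all(p(r) for p in self._preds)]

people = [{"firstName": "John", "lastName": "Doe"},
          {"firstName": "Jane", "lastName": "Roe"}]
first_name, last_name = Field("firstName"), Field("lastName")
print(Query(people).where(first_name.eq("John"),
                          last_name.eq("Doe")).list())
# [{'firstName': 'John', 'lastName': 'Doe'}]
```

Note that, however fluent, the chain must remain syntactically valid host-language code, which is exactly the limitation discussed in the text.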
Second, since it may not be possible to sandbox the guest language environment, final users may inadvertently step outside the boundaries of the DSL and come in direct contact with the host language.

External languages, on the other hand, are usually implemented using dedicated toolsets, but they give the language developer a much finer-grained level of control. There is no hard limit on the way the syntax is defined, and the language implementation works stand-alone, without a host. The traditional route to this kind of language development is to first implement the front-end, either by hand or through parser generators such as yacc and ANTLR [76] or, more recently, parser combinators [93, 66]. Then, the semantics of the language can be implemented using a variety of techniques, ranging from syntax-directed translation [2] and attribute grammars [58, 74] to term rewriting [104] and model-based engineering [59, 47]. The downside of traditional toolsets is that they do not focus on code reuse.

Modular language development is the research branch that investigates tools and techniques to develop languages in a componentized way. Frameworks for modular language development are toolsets that enforce a modular implementation strategy on the language developers, enabling a high degree of code reuse even in the implementation of stand-alone programming languages. Many frameworks for modular language development have been proposed over the years (e.g., [59, 63, 101, 56, 105, 98]), with different models for code reuse, usually inspired by the primitives provided by general-purpose programming languages, like inheritance [63] and superimposition [60].

An open research problem [43, 110, 60, 71, 98] is to apply feature-oriented programming to language development. Feature-oriented programming is a vision of programming in which individual features can be defined separately and then composed to build a variety of software products. In language development, feature-orientation means developing the features of a programming language in isolation, and then being able to turn an arbitrary set of said features into a consistent language implementation. Although most modular language development frameworks deal with the problem of language extensibility and componentization, the idea of realizing a programming language starting from its features is usually not a primary objective.

The Neverlang framework was originally born [16, 15, 14] to explore feature-oriented modularization in language design and implementation. In the last three years the Neverlang framework has been the object of a large design overhaul [17, 18, 99, 19, 98] that shifted the implementation from a largely static, Java-oriented, code-generating toolset to a highly-dynamic, componentized, language-agnostic framework for language processing.
Our research has been geared towards realizing techniques and tools for componentized language implementations, with the final “grand vision” of a world where general-purpose and domain-specific programming languages can be realized by composing linguistic features together, the same way we combine the pieces of a puzzle. And, just like each piece of a puzzle lives and exists on its own, each linguistic feature should be something that we can describe and implement in isolation. The ultimate goal is to maximize reuse of syntactic and semantic definitions across different language implementations, to the point where end users may even be able to generate a language implementation by picking features from a curated list. Programming languages à la carte.

Contribution Most of our experience in the feature-oriented definition of programming languages has been carried out using Neverlang. Our contribution with this work is:

1. an abstract model for feature-oriented language implementation;
2. a description of our implementation of this model in Neverlang;
3. evidence that this model can be supported by most of the existing tools for modular language implementation;
4. evidence that a native implementation of this model strengthens the benefits of a modular language implementation.

Organization Chapter 2 gives a brief overview of the background information. Chapter 3 presents the abstract model. Chapter 4 introduces the Neverlang implementation

of this model. Chapter 5 presents a full example (a state machine language). Chapter 6 is devoted to evaluating the model in a variety of contexts: the state machine language is re-implemented in other frameworks to show how the model can be reproduced; the benefits of using this model are then shown by describing the experience of extending Neverlang’s JavaScript implementation, neverlang.js; a DESK language implementation is briefly given to exemplify the expressive power of the Neverlang framework; finally, the chapter describes the experience of modeling variability in a programming language family, by automatically mining data from a collection of pre-implemented features. Chapter 7 briefly discusses related work and Chapter 8 draws the conclusions and describes future work. Appendix A is a more formal description of the Neverlang model, which puts the Neverlang implementation in relation with the conceptual model of Chapter 3. Appendix B reproduces part of the formal proofs [19] related to Neverlang’s dynamically extensible parser generator, DEXTER. Appendix C is an excerpt from [98] in which we give more detail on the variability modeling experience described in Chapter 6.


2 Background

A context-free grammar is a tuple G = ⟨Σ, N, P, S⟩, where Σ is an alphabet of terminal symbols, N is an alphabet of nonterminal symbols, P is a set of production rules and S ∈ N is the start symbol. A production rule (or simply a production, or a rule) is written as A → ω, where A ∈ N and ω ∈ (Σ ∪ N)⁺, with (Σ ∪ N)⁺ being the transitive closure of the set Σ ∪ N with respect to symbol juxtaposition. The generated language L(G) of a grammar G is the set of all the words that can be derived from it. The language of a grammar G is said to be empty if L(G) = ∅ and, conversely, non-empty when it contains at least one sentence; in other words, when there exists at least one sentence (or word, or program) that can be expressed using the language represented by G. In the following, we will assume grammars that generate non-empty languages and, for simplicity, we will make the assumption that our grammars do not contain the empty word ε.

A syntax-directed definition [2] (SDD) is a technique to implement the semantics of context-free languages in terms of their grammar. Attribute grammars [58] are a formalism introduced by Knuth to represent SDDs, associating information with a language construct by attaching attributes to the grammar symbols representing the construct. Attribute grammars specify the values of the attributes by associating semantic rules with the grammar productions. Syntax-directed translation schemes (SDTs) are sometimes described as a complementary notation to attribute grammars. A syntax-directed translation scheme is a context-free grammar with program fragments, called semantic actions, embedded within production bodies, with the purpose of translating an input program written in a given language into a target language; that is, SDTs are usually employed to implement compilers.
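The tuple view of a grammar, and the non-emptiness condition on L(G), can be made concrete with a naive fixpoint computation of the "productive" nonterminals (the Python representation below is an illustrative choice, not taken from the text):

```python
# A context-free grammar as a tuple (Sigma, N, P, S); P maps each
# nonterminal to the list of its right-hand sides (tuples of symbols).
def is_nonempty(sigma, nonterminals, productions, start):
    """True iff L(G) is non-empty: the start symbol derives at least
    one terminal word (naive fixpoint over 'productive' nonterminals)."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for a, rhss in productions.items():
            if a not in productive and any(
                    all(x in sigma or x in productive for x in rhs)
                    for rhs in rhss):
                productive.add(a)
                changed = True
    return start in productive

# Toy grammar: S -> a S b | a b, generating { a^n b^n : n >= 1 }
print(is_nonempty({"a", "b"}, {"S"},
                  {"S": [("a", "S", "b"), ("a", "b")]}, "S"))  # True
# S -> S a alone never terminates a derivation, so L(G) is empty
print(is_nonempty({"a"}, {"S"}, {"S": [("S", "a")]}, "S"))     # False
```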
Any SDT can be implemented by first building the parse tree that represents the input program, and then performing the actions in a left-to-right depth-first order, that is, during a preorder traversal [2]. Typically, SDTs are implemented during parsing, without building a parse tree. In this case, two important classes of grammars are


– L-attributed grammars: a class of attribute grammars that can be incorporated into top-down parsing.
– S-attributed grammars: a class of attribute grammars that can be incorporated into both top-down and bottom-up parsing. Any S-attributed grammar is also an L-attributed grammar.

However, L-attributed and S-attributed grammars are rather limited classes, and many interesting although simple languages cannot be defined using these translation schemes. The main benefit of implementing L-attributed and S-attributed grammars is that the evaluation order of the semantic rules is known a priori, because these classes impose constraints on the way semantic rules are defined. In fact, in attribute grammars we distinguish between synthesized attributes, expressed only in terms of the attributes of the children of a nonterminal symbol, and inherited attributes, expressed in terms of the attributes of a node’s ancestors or siblings. The S in S-attributed stands for synthesized: this class allows only synthesized attributes to be defined, and it is the class that traditional parser generators such as yacc support. In L-attributed grammars, the inherited attributes can be evaluated in one single left-to-right pass.

By relaxing the constraints on attribute evaluation, the attribute grammar formalism becomes more general, but it also leaves space for computations that may not terminate. In order to give guarantees on the evaluation of the attributes, attribute grammar implementations compute different kinds of dependency graphs [58, 74] between attributes and impose different sets of constraints; at the very least, each attribute should be well-defined: that is, for each node, an attribute should either be a constant expression or be defined in terms of other well-defined attributes on its parent or its siblings. Further constraints may be imposed to give more guarantees.
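A minimal sketch of S-attributed evaluation, assuming a toy parse tree encoded as nested tuples (an illustrative encoding, not any tool's representation): since every attribute is synthesized, a single bottom-up pass computes the `val` attribute of each node from its children.

```python
# S-attributed evaluation: all attributes are synthesized, so one
# bottom-up (postorder) pass suffices. Nodes are illustrative tuples:
# ("num", n) for E -> number, ("add"/"mul", left, right) for E op E.
def eval_val(node):
    kind = node[0]
    if kind == "num":
        return node[1]                      # leaf attribute
    left, right = eval_val(node[1]), eval_val(node[2])
    if kind == "add":
        return left + right                 # E.val = E1.val + E2.val
    if kind == "mul":
        return left * right                 # E.val = E1.val * E2.val
    raise ValueError(f"unknown node kind: {kind}")

tree = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))  # 2 + 3 * 4
print(eval_val(tree))  # 14
```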
For instance, one notable class is that of Absolutely Noncircular Grammars, which includes both L-attributed and S-attributed grammars and has been shown to be powerful enough to represent many nontrivial programming languages [74]. It is therefore advisable that an attribute grammar implementation support at least absolutely noncircular grammars. A strict implementation of an attribute grammar is usually pure: attributes should be defined in terms of other attributes, and the evaluation of such attributes should not produce side effects. This gives a great deal of flexibility to attribute grammar implementations, which may employ a number of techniques to optimize attribute evaluation, such as memoization (cf. [83]). However, many implementations allow side effects with varying degrees of control. When arbitrary, possibly side-effectful computations are allowed to take place within semantic definitions, we speak more broadly of semantic actions. In such cases, automatic caching and memoization of attributes may not be supported, but implementations may overcome this limitation by giving users more control over which attributes are evaluated at a time (as we will see in Chapter 4, this is the case for Neverlang). Syntax-directed translation through attribute definition is not the only technique to implement languages, though; languages can also be described in terms of program transformations. The Stratego [12] language implements this technique, rewriting terms that initially represent the parse tree up until the final representation of

a compiled program is reached.

In order to stress the generality of the approach, Chapter 3 describes a conceptual model of feature-oriented language definition without making explicit references to a particular model of language processing. In this model, the evaluation phases of the language are modularized in terms of language constructs, in order to represent a language implementation in terms of its constructs. In Chapter 4 we will then delve into the details of our own implementation of this model; in our case, the processing model can be modeled after SDDs, as a modular rendition of the visitor pattern.
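The term-rewriting style mentioned above can be sketched as follows (plain Python, not Stratego syntax); terms are nested tuples, and two illustrative algebraic rules are applied after the subterms have been rewritten:

```python
# Bottom-up term rewriting sketch: rewrite the subterms first, then
# try each rule on the resulting term.
def simplify(term):
    if not isinstance(term, tuple):
        return term
    term = tuple(simplify(t) for t in term)       # rewrite subterms first
    if len(term) == 3 and term[0] == "add" and term[2] == ("num", 0):
        return term[1]                            # Add(x, 0) -> x
    if len(term) == 3 and term[0] == "mul" and term[2] == ("num", 1):
        return term[1]                            # Mul(x, 1) -> x
    return term

t = ("add", ("mul", ("var", "x"), ("num", 1)), ("num", 0))
print(simplify(t))  # ('var', 'x')
```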


3 Feature-Oriented Language Composition

Frameworks for modular language implementation (e.g., [18, 100, 63, 101, 12]) make componentized development front and center, by providing facilities that simplify the implementation of a language in modules that can be shared and reused. But the modularization of a language is not merely a matter of convenience: modular software implementation has been known to be good since the dawn of computer science (e.g., [75]), for a number of reasons: among others, component isolation, which also enables work to be carried out in parallel by different teams of programmers, and modular reasoning, which makes it possible to concentrate on the implementation of one component of a system independently from the others. To a certain extent this is possible for language implementation as well, and it is very apparent in the development of domain-specific languages, where it is easier to map features of the language onto concepts of the problem domain. Our final objective is to represent a language as a collection of independent features that can be easily used in conjunction, but that can be implemented without knowledge of one another (cf. composable extensions in Van Wyk et al. [101]).

As seen in Chapter 2, the earliest literature already established that languages can be described in terms of their syntax. It has also been shown (e.g., [1, 32]) that such definitions can be logically partitioned into distinct processing or evaluation phases. During each phase the input program is successively analyzed and transformed, up to the final phase, when it is executed, in the case of an interpreter, or code is generated, in the case of a compiler. For instance, at the time of writing, Scala 2.11’s compiler scalac performs 25 compilation phases on each input program. This chapter gives an overview of the concepts behind modular language development, in order to stress the generality of the approach.


Figure 3.1.: A language implementation can be broken down over two dimensions: the dimension of syntactic constructs, and the dimension of evaluation phases. Dependencies are represented as arrows from required to provided features. Dependencies may go beyond their phase and depend on other phases.

3.1. Conceptual Model

A syntax definition of a language may be broken down with respect to language constructs. For instance, a looping construct may be defined and reasoned upon independently from a conditional branch, even though both still depend on a notion of truth value. All of these parts of a language together form a complete language implementation; thus, the modularization of both syntax and semantics is pivotal to the componentization of a language implementation. There are therefore at least two dimensions along which a language implementation can be broken down: the dimension of the processing phases, which, more broadly, includes the syntax analysis of the input program, and the dimension of the syntactic constructs (Fig. 3.1), which logically partitions a language implementation with respect to the constructs that it contains. We call a feature of a language an abstract concept or construct, together with its semantics; in some sense, then, a feature of a language sits at an intersection between the two dimensions. The implementation of a feature is what we call a language component [99, 98].


3.1.1. Language Components

Modular language implementation approaches enforce separation between processing phases and concepts. The dimension of processing phases represents the separation between linguistic concerns [103] that may crosscut or tangle, such as type-checking, code-generation, and so on. This kind of modularization suggests that componentization may be achieved by grouping together, as a self-contained bundle:

1. the syntax definition of a construct, such as the keywords that introduce it, and
2. the sections of the evaluation phases that relate to that syntax definition, implementing only the semantics that is relevant to that construct, in terms of properties of the feature.

For instance, the concept of truth value may be seen as a property of the looping-construct feature. By creating a bundle of these smaller bricks, the syntax definition and the sequence of the relevant parts of the processing phases, we obtain a language component (Fig. 3.1): a higher-level unit of composition that represents the implementation of a concrete feature of our language. These components can be shared across language implementations, and substituted to define variants of the same language: for instance, a different implementation of the semantics may be given for the same syntactic construct. For example, let us now consider a simple imperative language with a Java-like while loop construct1:

syntax: while ( ⟨loop-condition⟩ ) { ⟨loop-body⟩ }

The placeholders in angle brackets represent parts of the syntax that are not defined locally, because they represent concepts that are logically distinct, albeit related.

Definition 1. A placeholder is a part of a syntax definition that is not defined in place.

Now, a language usually has to be processed through several phases. In this case, suppose these phases are type checking, to verify the correctness of the construct, and code generation, to output compiled code. Since we made the initial assumption that it is possible to modularize phases by breaking them down with respect to the syntax definitions that pertain to a given language construct, we can componentize these phases with respect to the given looping construct:

1. The canonical syntax definition for a Java-like while loop would not include braces; it would be given in terms of a ⟨statement⟩, which might possibly be a block. For the sake of conciseness and clarity we chose to imagine that a while loop is always followed by a braced block.


type-checking: the type-of ⟨loop-condition⟩ should evaluate to a boolean value,
               otherwise raise error: "⟨loop-condition⟩: bad type"
...
code-gen:      cond := compiled-code-for ⟨loop-condition⟩
               body := compiled-code-for ⟨loop-body⟩
               compiled-code-for this := generate-object-code(cond, body)

In the example, the phases contain some bold-face words; these represent properties of the language that are being evaluated during that phase. For instance, the type-checking phase evaluates a property (in bold face in the code) named type-of. This property is bound to the placeholder named ⟨loop-condition⟩. The code-gen phase compiles the input language to object code; thus it expects to evaluate a property named compiled-code-for. The semantics of a phase can therefore be given in terms of such properties.

Definition 2. A property is a facet of a feature in an evaluation phase.

Definition 3. An evaluation phase is a definition of a linguistic concern in terms of the properties of a feature.

A bundle of the implementations of all such phases, together with the syntax definition of the while loop, yields a language component for this construct. A descriptor for one such component may be:

define component while-loop:
    use syntax while
    use phase type-checking
    use phase code-gen

Such a bundle can be shared across different language implementations with similar requirements in terms of features and processing phases, thereby maximizing reuse of the feature and minimizing code duplication. For instance, a different bundle, reusing the same syntax but with varied semantics, may be used in an alternate version of the same language to change the way the language constructs are processed and evaluated. Different bundles may still share the implementation of some phases, and new phases can be deployed by simply extending or repackaging the bundle. For instance, the language component for the interpreter of the while loop may reuse the syntax and the type-checking phase, but it would trade the code-gen phase for an evaluation phase, where the program is actually executed. Moreover, the same semantics may apply even if the introducing keywords were different: in a Pascal-like language they would be «while ⟨loop-condition⟩ do begin ⟨loop-body⟩ end».
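The bundle-and-variant idea described above can be sketched as a small data structure (illustrative Python, not Neverlang's actual notation): a component pairs a syntax definition with per-phase semantic fragments, and a variant reuses the syntax while swapping phases, e.g. trading code-gen for an evaluation phase.

```python
# Illustrative sketch of a language component: one syntax definition
# plus the per-phase semantic fragments that pertain to it.
from dataclasses import dataclass, field

@dataclass
class LanguageComponent:
    name: str
    syntax: str
    phases: dict = field(default_factory=dict)  # phase name -> fragment

    def variant(self, name, changes):
        """Derive a component that reuses this syntax; a None value in
        `changes` drops a phase, any other value adds or replaces one."""
        merged = {**self.phases, **changes}
        return LanguageComponent(
            name, self.syntax,
            {k: v for k, v in merged.items() if v is not None})

while_comp = LanguageComponent(
    "while-loop",
    "while ( <loop-condition> ) { <loop-body> }",
    {"type-checking": "<loop-condition> must have boolean type",
     "code-gen": "combine compiled condition and body"})

# An interpreter variant: same syntax and type checks, but the
# code-gen phase is traded for an evaluation phase.
while_interp = while_comp.variant(
    "while-interp", {"code-gen": None, "evaluation": "execute the loop"})
print(sorted(while_interp.phases))  # ['evaluation', 'type-checking']
```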

Definition 4. A language component is a self-contained bundle that implements a feature of a language by putting in relation a syntax definition with the related parts of a series of evaluation phases. A language component may be shared across different language implementations. Syntax definitions and evaluation phases may be shared across different language components.

Linguistic Feature: a concept or an abstract construct of the language.
Syntax Definition: definition of the syntax for a language construct. A syntax definition may be defined in terms of other syntax definitions using placeholders.
Syntactic Placeholder: a part of a syntax definition that is not defined in place. It constitutes a reference to a class of syntax definitions.
Evaluation Phase: definition of the semantics of the language with respect to a particular concern (e.g., type-checking, code-generation, etc.) in terms of properties of a feature.
Semantic Property: a facet of an evaluation phase that is bound to the implementation of a feature.
Language Component: a self-contained component that implements a feature of the language by putting in relation a syntax definition with the relevant parts of the evaluation phases. In this sense, we can say that a language component provides the implementation of a feature. The language component may also require other language features to work.
Dependency: a feature that is required by a component, and that is not defined within that same component. A dependency in a language is unsatisfied if a component requires a feature that no component provides. A dependency may be syntactic, if it is expressed within a syntax definition by placeholders, or semantic, if it is expressed in an evaluation phase by properties.
Globally-Scoped Component: a component that implements a concern that should be available to any language component.
Language Implementation: a collection of language components where every dependency is satisfied.

Table 3.1.: Summary of the informal definitions in this section.

Now that we have defined language components, we can say that a language implementation is a set of said components, and that the evaluation of a program written in this language corresponds to the ordered execution of the evaluation phases. This realizes a model of language implementation à la carte. One important detail must still be discussed to complete the description of the model, though: in order for a set of language components to qualify as a true language implementation, it is necessary that all the dependencies between these components are satisfied.

3.1.2. Dependencies Between Components

In the previous example, the while loop syntax definition includes placeholders, and the processing phases include properties. Placeholders represent, in some sense, features that the component requires but does not itself provide. Similarly, properties represent

facets of the feature that the language processor should be able to evaluate. The while component provides syntax and semantics for the looping construct, but it implicitly relies on the definition of some parts of the syntax being available (presumably in other components). The semantic phases implicitly expected some properties to be defined with respect to a given feature. For instance, the type-checking phase queried a type-of property that was expected to be defined on the feature represented by the ⟨loop-condition⟩ placeholder. The fact that a component may not contain all of the logic that is needed to process the language is the very essence of modularization: each component is defined in such a way that some concern will eventually be implemented by some other component. This implicitly introduces a notion of required and provided features in a language implementation.

Definition 5. A component requires a feature if it contains a placeholder in its syntax definition, or if it relies on the definition of a property that relates to a different feature. A component provides the feature it implements.

For instance, in the previous example the while-loop component provided an implementation for a looping construct (the while loop), and required the concepts of loop condition and loop body (Fig. 3.2). The type-checking phase required that the loop condition could be evaluated to a boolean type, and the code-gen phase required that the condition and the body were compilable down to machine code, so that the results of their compilation could be combined into the compilation of the loop construct itself. In a certain sense, the sets of provided and required properties and placeholders define the interface (in an OOP sense) of the language component. A complete language implementation should satisfy each requirement with the implementation of a feature.

Definition 6. A language implementation is a collection of language components where all the dependencies are satisfied.

Definition 7. The dependencies of a language component are satisfied in a language implementation if, for each required feature of the component, there exists a component in the language implementation that provides that feature.

It is worth noticing that more than one component may satisfy the same requirement: if this does not introduce a contradiction, that is, if no two components provide features that logically contradict each other, then the two components represent an alternative choice in the language implementation. Therefore, requiring that a complete language implementation satisfy all of its requirements does not prevent further language extensions. Again, in the case of the while loop, a language implementation must include the language components that satisfy all the requirements that the while loop places on the loop condition and loop body (e.g., the properties type-of and compiled-code-for that the component expects to be able to query).
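Definitions 6 and 7 lend themselves to an operational reading: a collection of components is a language implementation exactly when every required feature is provided by some component in the collection. The following is a minimal sketch of that check; the `Component` and `Language` classes and the feature names are ours, introduced only to illustrate the definitions, and are not part of the model or of any framework API.

```java
import java.util.*;

// Hypothetical rendition of Definitions 5-7: a component provides one
// feature and requires a set of features (placeholders and properties).
record Component(String provides, Set<String> requires) {}

class Language {
    // Definition 7: every feature required by some component must be
    // provided by some (possibly different) component in the collection.
    static boolean isComplete(List<Component> components) {
        Set<String> provided = new HashSet<>();
        for (Component c : components) provided.add(c.provides());
        return components.stream()
                         .allMatch(c -> provided.containsAll(c.requires()));
    }
}
```

For the while loop example, a collection containing a while-loop component (requiring loop-condition and loop-body) plus components providing those two features passes the check; the while-loop component alone does not.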
Of course, the loop body may be implemented by several kinds of statements (e.g., function invocation, variable assignment, variable increment, etc.): the statements are not logically in conflict with one another, although they may appear in the same position in a program written


WhileLoop --loop-condition--> Expression
WhileLoop --loop-body--> Block

Figure 3.2.: Dependencies between the syntax definitions of the language components in the while loop example.

using our language. On the other hand, different, alternative implementations of the loop condition may or may not be acceptable, depending on the language designer's choice; for instance, the C programming language expects the loop condition to be a numeric value, treating it as true if non-zero and false otherwise. Another programming language (e.g., Java) may have similar syntax but enforce the existence of a boolean type. Dependencies may occur within the same evaluation phase, or across different evaluation phases. In the first case, a language component depends on another because the implementation of an evaluation phase is distributed across different language components. In the second case, an evaluation phase depends on a value that is computed within a different evaluation phase: this is reflected in a dependency between components, because evaluation phases are distributed across them. Depending on the way phases are evaluated, dependencies across phases may also impose an ordering relation on the evaluation phases (cf. [1] on attribute grammars), because a property in a phase may only be referred to within the same phase or a subsequent phase.
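The ordering relation just mentioned (a property computed in one phase may only be read in the same phase or a later one) can be pictured as a topological ordering of phases by their cross-phase dependencies. The sketch below uses our own naming (`PhaseOrder` and the phase names are illustrative, not part of the model) and a deliberately naive algorithm:

```java
import java.util.*;

// Hypothetical sketch: if phase B reads a property computed in phase A,
// then A must be scheduled before B. We order phases so that every
// phase comes after all the phases it depends on.
class PhaseOrder {
    // deps maps each phase name to the phases whose results it reads.
    static List<String> order(Map<String, Set<String>> deps) {
        List<String> sorted = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < deps.size()) {
            boolean progressed = false;
            for (var e : deps.entrySet()) {
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    sorted.add(e.getKey());
                    done.add(e.getKey());
                    progressed = true;
                }
            }
            if (!progressed)
                throw new IllegalStateException("cyclic phase dependency");
        }
        return sorted;
    }
}
```

For instance, if a code-gen phase reads types computed by a type-checking phase, the ordering places type-checking first; a cyclic dependency is reported as an error.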

3.1.3. Globally-Scoped Components

In a language implementation, components may also need to invoke support functions (e.g., I/O, math, or graphics libraries) and ancillary data structures (e.g., symbol tables, function tables, etc.). These implement features of the language that have no direct representation in its syntax; thus a modular language framework should include a form of component that encapsulates and provides support code to other language components. Like the other language components, globally-scoped components should be easy to swap with alternate implementations, provided that the substitute component implements the same functions and structures (in OOP terms, it implements the same interface). For instance, a thread-based model of task execution could be swapped with a distributed execution model without changing the syntax of the language (cf. the Linda-Python language in [14]), by swapping the component that implements the task execution model.
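In OOP terms, the swap described above amounts to programming against an interface. The following Java sketch illustrates the idea under our own, illustrative naming (`TaskExecutor` and its implementations are not Neverlang types; a real distributed executor is stood in for by a trivial inline one):

```java
// Illustrative only: a globally-scoped "task execution" concern hidden
// behind an interface, so one execution model can be swapped for
// another without touching the language components that use it.
interface TaskExecutor {
    void submit(Runnable task);
}

// One possible implementation: thread-based execution.
class ThreadedExecutor implements TaskExecutor {
    public void submit(Runnable task) { new Thread(task).start(); }
}

// A drop-in alternative with the same interface (here, trivially,
// synchronous inline execution; a distributed model would fit the
// same slot).
class InlineExecutor implements TaskExecutor {
    public void submit(Runnable task) { task.run(); }
}
```

Language components that only ever see `TaskExecutor` are unaffected by the choice of implementation, which is exactly the substitutability required of globally-scoped components.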


3.1.4. Composition Model

Each component provides and (optionally) requires the implementation of a feature. Composition between components is therefore a consequence of satisfying the constraints that are implied by such dependencies. Let us now see which forms of language composition apply to the model that we have described so far. In order to better discuss languages and language implementation frameworks, Erdweg et al. [29] have isolated and described five forms of language composition.

– Language Extension is the property of a framework to define reusable components that may extend a base language, independently of the choice of this base language.
– Language Restriction is the property of a framework to restrict a language implementation to a subset of its features.
– Language Unification is the property of a framework to merge together the implementations of two languages with the (optional) help of glue code only.
– Extension Composition describes the ability of a framework to compose together extensions (which may only implement subsets of a language).
– Self-Extension is the property of a programming language that makes it possible to extend itself reflectively.

A language extension may be a new syntactic feature for a base programming language. For instance, Java 8 introduced lambda expressions; these were mapped onto the more general case of functional interfaces (single-member interfaces) [40, Sect. 9.8]. Language extensions are often defined in terms of desugaring towards the base language. Language restriction may be useful in education: Erdweg et al. suggest forbidding monads and type classes in a beginner's course on Haskell. In language unification, as opposed to restriction and extension, where a dominant language exists, the composed languages are combined in an «unbiased manner» [29], and the two languages can interact: one example of this kind of composition may be HTML and JavaScript.
Finally, self-extensible programming languages are those where a language may be embedded within a host language, through support from the host language itself. Lisp may be regarded as one such language. Our model supports the following forms of composition:

– Language Extension. In our model a language implementation corresponds to a collection of language components such that all their requirements are satisfied. An extension to one such language is a collection of language components that provides additional implementations that other components require (e.g., in the while example, additional implementations of the ⟨loop-body⟩ placeholder).
– Language Restriction. Erdweg et al. [29] present language restriction as a useful functionality (e.g., in education) that can be easily simulated using language extension alone, by deploying an extension to the validation phase of the language that rejects any program using “restricted” constructs. Even though the model that we present may very well implement language restriction in the same

18 3.1. Conceptual Model

way, by redefining phases of the existing components, our model supports “real” language restriction by unplugging components from the language: in fact, since a language is a set of components, in our model a restriction is a subset of the original collection where the restricted feature is not present.

– Language Unification. In general, because a language is a set of components, language unification is the union of two sets of such components, plus, if needed, glue code, that is, code that “bridges” components that otherwise would not fit together. For instance, in our example, the while loop required a ⟨loop-condition⟩. Another language (e.g., an expression language) may provide a component for a ⟨boolean-expression⟩. The names of the placeholders do not match: glue code is the code that adapts a ⟨boolean-expression⟩ to satisfy the requirements of a ⟨loop-condition⟩ in the while loop. These requirements may be purely syntactic, for instance when the placeholders have different names; but they may also be semantic, for instance when the semantic properties of ⟨boolean-expression⟩ differ from those required by the while loop component. The language framework should provide ways to adapt language components to suit these situations. The glue code might be implemented as additional components, or as directives that developers would configure.
– Extension Composition. In our model, the unit of reuse is the language component, which may implement language extensions or parts of a base language, depending on the point of view. It follows that extensions may compose. Erdweg et al. themselves note that if a framework supports language unification, then it also supports extension composition, which is the case here.
– Self-Extension. This property does not apply here, but only because it is a property of the programming language and not of the framework with which the language is being implemented.
As noted by Mernik [63], the model itself does not prevent implementing a self-extensible programming language.


4 Neverlang

In Chapter 3 we presented a general model to represent language features as language components. The objective of such a model is to describe a language in terms of its features, by modularizing the implementation over both the axis of evaluation phases and the axis of language constructs. The result is a self-contained unit of composition, called the language component, that can be reused, shared, or substituted in different language implementations. The final goal is to be able to implement a language by combining such components, making it easy to extend the language implementation by adding new features, to restrict it for DSL purposes, to vary it to accommodate different requirements in the application domain, and, in general, to evolve the language implementation over time. The problem of evolving a language implementation is well known. In a typical interpreter or compiler implementation, each construct of the language is mapped onto an abstract representation often called an abstract syntax tree (AST), over which different language processors or evaluators dispatch the execution of procedures that implement the semantics of that construct. During the visit of this tree, a language evaluator or language processor maps each of its nodes onto the semantics of the construct that the node represents, depending on its type. In the case of an interpreter the input program will then be executed, while, in the case of a compiler, the input program will be translated into a target language. As we saw in the previous sections, the semantics of a construct may be implemented as several separate phases; multiple phases make it possible to better modularize the implementation of the semantics of each construct. Nevertheless, for better modularity, even the definition of each construct should be isolated from the definition of other constructs.
However, the evolution of a language implementation involves both the dimension of constructs (which may be represented by distinct data types) and the dimension of evaluation phases (data type processors): neither functional nor object-oriented programming languages can fully address the problem. In functional programming it is easy to vary the set of phase evaluators that pattern match on the different cases of a data type (e.g., all the types of loops, or all the types of statements). On the other hand, it is harder to vary the number of cases in a data type definition. The situation is known as the expression problem, after the term coined by Philip Wadler [106]. In object-oriented programming it is easy to extend a class, defining a new subtype, but, on the other hand, it is harder to add new language processors to the data type, because an idiomatic object-oriented program would implement such processors using the interpreter pattern, where the superclass defines an abstract method eval(), implemented by the concrete subclasses. This problem can be partially addressed by implementing the so-called visitor pattern, where an extra level of indirection is introduced (through a visitor object) to make the semantics of each construct independent of the class that represents that construct; on the other hand, the visitor pattern makes it, again, easier to add new operations and harder to add new types: it is effectively the object-oriented implementation of pattern matching [37]. It has been shown that a modular implementation of the visitor and interpreter patterns [71, 70, 112] can be achieved using constructs such as traits [84] to decouple the data type representation from the logic that implements the semantics of a construct, while still retaining all the good properties of object-oriented programming, that is, the ability to extend the data type with new subtypes. Our rendition of the model in Chapter 3 can be seen as an implementation of a modular visitor pattern, which is the underlying execution model of the Neverlang framework.
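The object-oriented side of this tension can be made concrete in a few lines of Java. The following is a textbook sketch of the interpreter pattern (the node types are ours, chosen for brevity, and this is not Neverlang code): adding a new node type is one new subclass, but adding a new operation (say, type checking) forces a change to every existing subclass.

```java
// Interpreter pattern: the superclass declares an abstract eval(),
// implemented by each concrete construct. Adding a construct is easy
// (one new subclass); adding an operation is hard (edit every class).
abstract class Node {
    abstract int eval();
}

class Lit extends Node {
    final int v;
    Lit(int v) { this.v = v; }
    int eval() { return v; }
}

class Add extends Node {
    final Node l, r;
    Add(Node l, Node r) { this.l = l; this.r = r; }
    int eval() { return l.eval() + r.eval(); }
}
```

The visitor pattern flips the trade-off: operations become easy to add and node types become hard, which is exactly the dual situation described above for functional pattern matching.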

4.1. The Neverlang Framework

In Chapter 3, high-level descriptions of syntax definitions and evaluation phases were discussed. In Neverlang, the syntax of the language is given as a formal grammar, and the semantics of the language is given as a syntax-directed definition (Chapter 2), in terms of attributes attached to the nonterminals of this grammar. In Neverlang, an evaluation phase is called a role; a role implements the semantics of the language with respect to the syntax definition of the language constructs. Both roles and syntax definitions are declared inside modules. Language components (Chapter 3) are defined by a construct called a slice, which relates syntax definitions to roles imported from modules; globally-scoped components are called endemic slices; endemic slices may provide libraries or globally-accessible data structures such as symbol tables. A construct called language declares the collection of slices that composes a language and the order in which roles should be executed. The syntactic definitions generate a syntax tree, which is then visited. Each visit constitutes an evaluation phase. Contrary to a traditional visitor pattern implementation, though, Neverlang's visitor is extensible both in the dimension of processing phases and in the dimension of new language constructs. In fact, slices compose semantics from different modules, making it possible to define new roles (processing phases) for the same linguistic construct; moreover, slices can be added to, removed from, or replaced in a language implementation at any time: therefore, the language

can evolve in any direction. In the following paragraphs we will present modules, roles, and slices using the Neverlang language syntax: a domain-specific language that simplifies the implementation of these constructs in a convenient, uniform way. The Neverlang language is a DSL that compiles Neverlang source files down to Java and JVM-compatible source code. The nlgc compiler is self-hosted and will be described in Sect. 4.3. The generated source code will be described in Sect. 4.4, where the framework and its APIs are described in detail. These APIs have been designed to be easy to use even from a general-purpose JVM-supported programming language. The Neverlang language is just one of the possible front-ends to this API. For completeness, Listing 4.1 shows the full grammar of the Neverlang language (EBNF operators are used for conciseness). For a more formal description of the Neverlang framework, see Appendix A.

4.1.1. Defining Syntax and Semantics: Modules

A module is a basic container unit that groups different roles together, defined in terms of a reference syntax declaration. A module may hold any number of roles, but each module must include at least a reference syntax declaration.

Reference Syntax. The reference syntax section is the section of a module that defines the syntax of a construct (Chapter 3). The reference syntax section either defines or refers to a set of production rules of a BNF grammar. When it defines production rules, it is a bracket-delimited block that contains a list of production rules; when it refers to another syntax, it is substituted by the clause from modulename, where modulename is the name of the module that contains the list of productions being referred to. In a production, unquoted identifiers represent nonterminals and quoted identifiers represent terminals. Special syntax for patterns is also provided, to represent classes of terminals such as identifiers or numbers. In this case, instead of quotes, the traditional Perl-like syntax for regular expression literals is used. For instance, the literal /[a-z]+/ matches one or more alphabetic characters. Neverlang provides full support for Java's Pattern library. The set of production rules in the reference syntax section represents the concrete syntax of a construct that the semantic roles will be coded against. It is a reference syntax, though, because the roles that are defined in terms of this syntax are not required to always be bundled with this same syntax. The framework makes it possible to code against one reference syntax and then ship with a different concrete syntax, provided that a mapping between the two is possible. In Chapter 3 the while loop could have been defined with a Java-like syntax, using braced blocks (Listing 4.2), or with a Pascal-like syntax (Listing 4.3).
Because one syntax definition is basically isomorphic to the other, modulo the terminal symbols, they can be easily swapped: coding against the Java-like syntax really makes little difference compared to coding against the Pascal-like syntax. In this sense, Neverlang's reference syntax can be seen as a sort of «abstract syntax with defaults». For instance, the production in Listing 4.2 can be thought of as representing a tree node WhileLoop(LoopCondition, LoopBody). We will see more on the mapping between syntax and semantics later, when we describe slices.
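As noted above, pattern terminals such as /[a-z]+/ are backed by Java's Pattern library, so a regex terminal behaves like an ordinary compiled pattern. A minimal sketch (the `LexSketch` class and its helper are ours, for illustration only):

```java
import java.util.regex.Pattern;

// The regex terminal /[a-z]+/ corresponds to an ordinary compiled
// java.util.regex.Pattern; matches() requires the whole input to match,
// i.e., one or more lowercase alphabetic characters and nothing else.
class LexSketch {
    static final Pattern LOWER = Pattern.compile("[a-z]+");

    static boolean isLowerWord(String s) {
        return LOWER.matcher(s).matches();
    }
}
```

So a token such as "while" is accepted by this terminal class, while "While2" is not.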


CodeUnit        ← Unit* ;
Unit            ← Module | Slice | EndemicSlice | Language | Bundle ;
// module
Module          ← LangAnnot? "module" QualifiedId "{" ReferenceSyntax Role+ "}" ;
// module: reference syntax
ReferenceSyntax ← "reference" "syntax" ( SynFrom | SynDef ) ;
SynFrom         ← "from" QualifiedId ;
SynDef          ← "{" Provides? Requires? Production+ "}" ;
Provides        ← "provides" TaggedNonterminals ;
Requires        ← "requires" TaggedNonterminals ;
TaggedNonterminals ← "{" ( Nonterminal ":" Tag* )+ "}" ;
Production      ← Nonterminal "←" ( Nonterminal | Terminal )+ ";" ;
// module: roles
Role            ← "role" "(" Id ")" LangAnnot? "{" SemanticActionDef* "}" ;
SemanticActionDef ← ( Integer | Label ":" ) CodeSection ;
CodeSection     ← ".{" CodeBlock "}." | "@{" CodeBlock "}." ;
LangAnnot       ← "<" Id ">" ;
// slice
Slice           ← "slice" QualifiedId "{" ConcreteSyntax ModuleImport+ "}" ;
ConcreteSyntax  ← "concrete" "syntax" "from" QualifiedId ;
ModuleImport    ← "module" QualifiedId "with" "role" Id+ Mapping? ;
Mapping         ← "mapping" "{" MappingDef ( "," MappingDef )* "}" ;
MappingDef      ← Integer "↦" Integer ;
// endemic slice
EndemicSlice    ← "endemic" "slice" QualifiedId "{" Declare "}" ;
Declare         ← "declare" "{" Declaration+ "}" ;
Declaration     ← "static"? Id ( ":" LongId | ".{" CodeBlock "}." ) ;
// language
Language        ← "language" QualifiedId "{" LangSlices LangEndemic LangRoles LangRenames "}" ;
LangSlices      ← "slices" ( QualifiedId | "bundle" "(" QualifiedId ")" )+ ;
LangEndemic     ← "endemic" "slices" QualifiedId+ ;
LangRoles       ← "roles" "syntax" ( "<" Id ( LangVisitOp Id )* )? ;
LangVisitOp     ← "<" | "<" "+" | ":" ;
LangRenames     ← "rename" "{" Nonterminal "⇒" Nonterminal ( "," Nonterminal )* ";" "}" ;
// bundle
Bundle          ← "bundle" QualifiedId "{" LangSlices LangEndemic LangRenames "}" ;
// common lexemes
QualifiedId     ← Id ( "." Id )* ;
Nonterminal     ← Id ;
Tag             ← Id ;
Terminal        ← SimpleTerminal | RegexTerminal ;
Id              ← ;
SimpleTerminal  ← ;
RegexTerminal   ← ;
Integer         ← ;
CodeBlock       ← ;

Listing 4.1: Complete EBNF grammar of the Neverlang language.

24 4.1. The Neverlang Framework

module javalang.WhileLoop {
  reference syntax {
    While: WhileLoop ← "while" "(" LoopCondition ")" "{" LoopBody "}" ;
  }
  role (type-checking) {
    0 .{ // opt.: 'While:' or 'While[0]:' instead of '0'
      eval $1;
      if ( $1.type != Boolean.class ) // opt.: $While[1] instead of $1
        throw new Error("The type of LoopCondition should be a boolean value");
    }.
  }
}

Listing 4.2: reference syntax for the while statement and the type-checking. The name of the module and the left-hand nonterminal are generally not required to match.

module pascallang.WhileLoop {
  reference syntax {
    WhileLoop ← "while" LoopCondition "do" "begin" LoopBody "end" ;
  }
}

Listing 4.3: reference syntax for a Pascal-style while statement.

The reference syntax section contains a list of production rules between braces (see Listing 4.2). Each production may optionally be introduced by a label, which may be used in role definitions. Roles may also be defined in different modules, but new processing phases can still be described in terms of the same piece of syntax. In this case, the programmer should indicate that the roles in the module refer to a syntax definition that has been defined in a different module, using the reference syntax from clause. For instance, Listing 4.4 declares that the reference syntax definition is the one in module javalang.WhileLoop (Listing 4.2). Concerns about readability could be raised: with the reference syntax from clause, the syntactic definition may not be present locally in the module where the semantics is given. Nonetheless, the same could be said of any OOP language, where subclasses do not show the members that they inherit, unless these are overridden. The solution to this problem may be better tooling; we are currently working on IDE technologies that may assist users by providing visual clues about the syntax definition that has been referenced. Additionally, this section can be decorated with optional metadata about the intended meaning of the syntax, using tags. The provides and requires sections may be the first statements in a reference syntax. Each line of the section consists of a provided nonterminal (on the left-hand side of a production) or a required nonterminal (on the right-hand side of a production), followed by a list of tags. Listing 4.5 shows an


module javalang.WhileLoopCodeGen {
  reference syntax from javalang.WhileLoop
  role (code-gen) {
    0 .{
      String labelLoop = Utils.genUniqueLabelName();
      String labelExitWhile = Utils.genUniqueLabelName();
      eval $1;
      String comparatorCode = $1.code; // if*
      eval $2;
      String bodyCode = $2.code;       // body of the loop
      // output
      $0.code = '''
        ${labelLoop}:
          ${bodyCode}
          ${comparatorCode} ${labelExitWhile}
          goto ${labelLoop}
        ${labelExitWhile}: nop
      ''';
    }.
  }
}

Listing 4.4: An example code generation role, generating Java bytecode in Jasmin1 syntax.

reference syntax {
  provides {
    WhileLoop: loop, statement;
  }
  requires {
    LoopCondition: truth-value, boolean-expression, expression;
    LoopBody: statement, statement-list;
  }
  WhileLoop ← "while" "(" LoopCondition ")" "{" LoopBody "}" ;
}

Listing 4.5: provides and requires sections.

example for the while loop. A use case for tags will be discussed in Sect. 6.4.

Roles. A role section defines the part of a processing phase that pertains to the reference syntax. A processing phase is implemented as a traversal of the syntax tree that represents the input program. Each role in a module is identified by a name. The name of the role is user-defined, and names do not have a special meaning. Obviously, it is advisable to choose meaningful names and follow general conventions; type-checking phases are usually called type-checking; code-generating phases might be called compilation, or code-gen; evaluation phases that actually execute the program shall be

called evaluation, execution, and so on. As seen for SDT (Chapter 2), the semantics is specified by semantic actions: snippets of code that are executed when a node of the syntax tree is being evaluated (visited). A role is therefore a collection of semantic actions pertaining to a given reference syntax. Thus, a visit of the tree is described by the collection of all the semantic actions of a role in all the slices that constitute a language. A semantic action is represented by a code block enclosed within the delimiters .{ and }. and introduced by a number. The mapping between nodes of the tree and semantic actions is given through these numbers: each nonterminal can be referred to from a role using its ordinal position inside the reference syntax section, starting from 0. Thus, action number 0 will be executed when the 0-th nonterminal of the reference syntax is visited, action number 1 when the visit moves to the 1st nonterminal, and so on. For instance, in Listing 4.2 WhileLoop is 0, LoopCondition is 1, and LoopBody is 2; thus the action from role type-checking is attached to the root node WhileLoop2. Because of the reference/concrete syntax duality, terminals are excluded from this count: first, since terminals are leaves, it does not make sense to descend into a terminal node; second, this scheme makes it easier to remap semantic actions onto different syntactic definitions, because it is independent of the naming of the nonterminals. Inside actions, it is possible to access any other nonterminal within the same rule3 using the same numbering scheme; in this case nodes are referred to through their identifying number preceded by a dollar sign; it is possible to read and attach attributes to nonterminals using a familiar dot notation (Listings 4.2 and 4.4). The type of an attribute is defined implicitly at each use site. For instance, $0.foo = "hello" defines foo as a String attribute with value "hello".
Similarly, String foo = $0.foo; pulls a String value from the foo attribute. Invalid attribute uses (e.g., mismatched types or undefined values) will cause the system to raise an exception. Attributes that are attached to nonterminals are similar to instance fields of a class. Each nonterminal may hold as many attributes as desired, and each attribute may be of any JVM type. Attributes are implicitly defined at their first assignment and, once they have been defined, they can be referred to from any semantic action associated with any of the nonterminals in the same production. Actions are written using a JVM language. The default is Java (with some minor syntactic extensions), but programmers may opt in to a different JVM language using language annotations. Each section of a module that contains code may be annotated to switch to an alternative language; this can be done on a per-module, per-role, or even per-action basis (Listing 4.6). We will see more on how actions are compiled in Sect. 4.4.

Labels. The reason for the choice of a numbering scheme instead of a naming convention to indicate syntax definitions is mostly a matter of history. Neverlang's original implementation [16, 15] followed the same convention, which was inspired

2 In this example we assume that identifiers are collected into a symbol table during the type-checking phase; in real language implementations, a separate phase may be introduced.
3 E.g., consider the grammar A ← B; C ← D: actions 0 and 1 may refer to $0 and $1, but not to $2; etc.
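The implicitly-typed attribute mechanism described above ($0.foo = "hello", then String foo = $0.foo;) can be pictured as a per-node bag of values with a checked read. The following is a sketch of the idea only, under our own naming (`NodeSketch` is not the framework's actual implementation, which is described in Sect. 4.4):

```java
import java.util.*;

// Sketch: each syntax-tree node carries attributes of any JVM type,
// implicitly defined at first assignment; reading an undefined
// attribute raises an exception, mirroring the behavior described
// for invalid $n.attr uses.
class NodeSketch {
    private final Map<String, Object> attrs = new HashMap<>();

    // $n.attr = value
    void set(String name, Object value) { attrs.put(name, value); }

    // T t = $n.attr
    @SuppressWarnings("unchecked")
    <T> T get(String name) {
        if (!attrs.containsKey(name))
            throw new IllegalStateException("undefined attribute: " + name);
        return (T) attrs.get(name);
    }
}
```

A mismatched type would likewise surface as a ClassCastException at the use site, which is consistent with the "defined implicitly at each use site" behavior described above.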


// switches to the Scala language for the whole module
<scala> module com.example.MultiLang {
  reference syntax from javalang.WhileLoop
  role (type-checking) {
    0 .{
      eval $1
      // if we use Scala, then we could model $1.type with Either
      val t: Either[Class[_], Error] = $1.type
      t match {
        case Right(type) if type == classOf[Boolean] => ...
        case Left(err) => ... // an error occurred...
      }
    }.
    1 .{ /* switch back to Java here */ }.
  }
  // in the template language, everything is a string, unless
  // it is inside {{ ... }}; the result is attached to $0.Text
  role (code-gen)