An Agile Approach to Language Modelling and Development
Total Page:16
File Type:pdf, Size:1020Kb
ISSE manuscript No. (will be inserted by the editor) An Agile Approach to Language Modelling and Development Adrian Johnstone · Peter D. Mosses · Elizabeth Scott Received: ??? / Accepted: ??? Abstract We have developed novel techniques for 1 Introduction component-based specification of programming languages. In our approach, the semantics of each fundamental We are developing a component-based approach to language programming construct is specified independently, using an modelling. Our models are formal specifications of syntax inherently modular framework such that no reformulation and semantics. is needed when constructs are combined. A language Language designers already use formal specifications specification consists of an unrestricted context-free for syntax; we claim that formal semantics can be useful grammar for the syntax of programs, together with an and worthwhile for them too, provided that the following analysis of each language construct in terms of fundamental features are combined in a single, coherent framework: constructs. An open-ended collection of fundamental – a substantial collection of specification components that constructs is currently being developed. can be used without change; When supported by appropriate tools, our techniques – accessible formalisms and notation; allow a more agile approach to the design, modelling, – tool support for developing and checking specifications; and implementation of programming and domain-specific – tools for generating (at least prototype) implementations languages. In particular, our approach encourages lan- from tentative language specifications; guage designers to proceed incrementally, using prototype – successful case studies involving major languages; and implementations generated from specifications to test – an online repository providing access to all specifica- tentative designs. The components of our specifications tions. are independent and highly reusable, so initial language specifications can be rapidly produced, and can easily evolve Our aim is to provide such a framework. in response to changing design decisions. Our component-based approach supports agile mod- elling, so language designers may start from a rough model In this paper, we outline our approach, and relate it to of their initial design – possibly making extensive reuse the practices and principles of agile modelling. of components from previously-developed models – ex- periment with a generated prototype implementation, then Keywords Programming language models · Syntax · adjust and extend the model easily as their design evolves. Semantics · Agile methods In contrast, conventional frameworks for semantic spec- ification are inherently non-agile, because adding a new feature to the specified language may require reformulation A. Johnstone · E. Scott of the entire specification. For example, suppose we have Dept. of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX, United Kingdom specified a conventional structural operational semantics [35] for a purely functional language; extending the lan- P. D. Mosses Dept. of Computer Science, Swansea University, guage with assignable variables or concurrent processes Singleton Park, Swansea, SA2 8PP, United Kingdom requires adding further arguments to the specified transition Tel.: +44 1792 602249, Fax: +44 1792 295708 relation, and those arguments have to be added in the E-mail: [email protected] original specification. Conventional denotational semantics 2 suffers from the same problem. Agile language modelling 2 Syntax is not possible when the specification of each construct depends on which other constructs it is to be combined with. Language syntax is often specified in several parts, sepa- With few exceptions, language designers have not them- rating the lexical components or words from the phrases or selves produced formal specifications of their languages’ sentences. intended semantics. For instance, the Haskell designers Context-free grammars have been the preferred means to were familiar with the conventional modelling frameworks, specify the phrase-level syntax since the time of the Algol 60 and “resorted to denotational semantics to discuss design report. Reference manuals and standards documents use options”, but they also reported [15]: grammars which are comfortable for the explication of the semantics, but typically these are unsuitable for direct “One of our explicit goals was to produce a language implementation using near-deterministic parser generators that had a formally defined type system and seman- such as Bison and JavaCC. Most language standards ignore tics. [. ] We were inspired by our brothers and this problem, although the original Java Language Specifi- sisters in the ML community, who had shown that it cation [11] provides two grammars – one for explication and was possible to give a complete formal definition of one intended for use with parser generators – and its authors a language, and the Definition of Standard ML [24] went to some trouble to establish their equivalence. had a place of honour on our shelves. Neither the separation between lexical and phrase-level syntax, nor having two separate grammars for phrase-level Nevertheless, we never achieved this goal. [. ] we syntax, is consistent with an agile approach to language always found it a little hard to admit that a language design. They make it difficult to move rapidly to proto- as principled as Haskell aspires to be has no formal typing and to adopt incremental design because several definition. But that is the fact of the matter, and inter-related data structures have to be maintained, and it is not without its advantages. In particular, the changing one requires corresponding changes to the others. absence of a formal language definition does allow The quality of the language design may be compromised the language to evolve more easily, because the because the design of the lexical components cannot exploit costs of producing fully formal specifications of the phrase-level context. Moreover, one must ensure that any proposed change are heavy, and by themselves the grammar used to define the syntax and the grammar discourage changes.” used to specify the semantics generate the same language. Maintaining this language equivalence is not trivial: by the We claim that using our component-based approach, the cost time of the third edition of the Java Language Specification of specifying – and experimenting with – proposed changes [12] significant errors had crept into the implementation would be much lighter than with conventional frameworks. grammar. For implementation it would be far better to use In the rest of this paper, we outline our approach and the explication grammar, which is the base grammar used analyse its agility. Section 2 is concerned with the speci- by the language’s designers, than a derived grammar of fication of language syntax. It points out some drawbacks doubtful accuracy. of the conventional approach, and proposes the use of The ASF+SDF Meta-Environment [3] and Stratego/XT general context-free grammars, for which efficient parsing [42] are environments that provide some of the features algorithms have recently been developed. Section 3 intro- discussed in this section, and indeed the ASF+SDF Meta- duces the concept of language-independent fundamental Environment was used to implement the Action Environ- constructs, and illustrates how language syntax can be ment [5]. The SDF parser generator which underpins both translated to such constructs. Section 4 recalls two frame- tools uses Farshi-style GLR parsing with SLR(1) tables, and works which allow the semantics of individual constructs the user is provided with tools that allow some parse-time to be specified independently; using these frameworks, the disambiguation [4]. semantic specifications of fundamental constructs become In practice, SDF tables are monolithic and must be highly reusable components of complete language specifica- regenerated when specifications change. This can be very tions. Section 5 briefly outlines a proposed online repository time consuming. In addition, the parse-time features are not for component-based specifications of programming and formally well-founded, and parsers that employ them can domain-specific languages, and Section 6 describes the fail to correctly match the context free language suggested kinds of tool support that are needed for practical use of by the context free rules in their specifications: the language our approach. Section 7 considers the extent to which our recognised by the parser can change as disambiguation approach adheres to the practices and principles of agile specifications are added, and in fact the recognised language modelling. Section 8 reviews related approaches. Section 9 may become context sensitive. This makes it very hard to concludes with a summary of our claims. prove completeness and correctness of the resulting parsers. 3 Our approach is based on the use of GLL parsers [38] aiming to maximise their mnemonic value without bias which are fully general, efficient at parse time and, most towards any particular language family. importantly from a software developer’s perspective, require Funcos are classified according to the kind of value that no more processing time to generate than conventional they normally compute when executed: Cmd classifies com- recusive descent parsers. As a result, regenerating the source mands, which compute nothing (or some fixed null value); code for a parser typically takes less time than compil-