A C++ LANGUAGE WORKBENCH

GAVIN WOOD

Abstract. Language-orientated programming promises to elevate programmer productivity through increased abstraction capabilities. Structural programming environments provide apparatus to reduce the difficulties with syntax. The language workbench, a conceptual combination of these two approaches, is a comparatively novel approach to software development and has so far been attempted only in dynamic-dispatch, run-time-compiled languages (e.g. Java). However, it must be remembered that several fields of engineering exist, each having their own priorities. For example, within the video games industry, where large, complex and diverse projects are routinely developed but where efficiency is paramount, C++ is widely used as a development platform. I explore the possibility of a language workbench capable of a gradual transition in both skills and code from the traditional C++ development environment. This article is the design for a language workbench. It includes novel techniques, including a context-sensitive event-driven input system and a hybrid single/multiple-inherited class model, and through a prototype implementation demonstrates that it is both concise and practical for C++. I refute the hitherto implicit hypothesis that the language workbench paradigm is not applicable to the C++ language, showing that C++ can be used for creating an effective development framework usable in otherwise pure-C++ programming environments.

1. Introduction

Despite decades of research, programming computers, in general, remains difficult. Incremental improvements to the development stack (e.g. our code editors, languages, prototyping and versioning tools) have made larger and more complex systems possible to build. New programming paradigms (e.g. language-orientated programming) have allowed our code to become more conceptually powerful, reducing language-induced idioms, improving the “meaning-to-noise” ratio and allowing the programmer to focus on creativity over rules.

Some issues of development, however, e.g. syntax errors, file/directory management and refactoring, will not be fixed by better text-editors, cleverer codes or more auxiliary tools; they are implications of this traditional development stack. To solve these problems we must ultimately move past languages and directories of text files: language workbenches are one such destination.

Some industries, however, require the combination of complex, high-performance software and languages/development stacks that are well-tested, portable, industrially-supported, compatible with legacy code and for which a large developer base already exists. In this case there is little option; only C++ can fulfil these requirements.

1.1. Overview. With the lack of a C++-based language workbench, it might be tempting to conclude that an implementation would be impractical (perhaps due to the lack of language features of C++) or the eventual benefits minimal (perhaps too cumbersome or unwieldy). The present work dispels this notion. Here I outline and discuss a highly flexible and extensible language workbench based around the C++ language.

Core functionality is reduced to a minimum, providing instead flexible mechanisms allowing lingual idioms to be developed modularly. In particular, display and input are fundamentally separated; text-based input is eschewed; the program is guaranteed to be structurally correct throughout editing; and only a simple hierarchical program structure is explicitly provided for. Potentially superfluous features such as program transformations, meta-language programming and cross-references between program entities are not explicitly provided for.

In order to implement this efficiently in C++, several issues are addressed in the eventual design: a hybrid single/multiple inheritance schema is used to best express the complex interrelationship of language concepts; an extensive ‘undo’ infrastructure is used to supplant traditional serial-entry-specific mechanisms; a highly extensible display mechanism based around web-technologies is used to provide rich visualisation, interaction and computer-aided development; and a multi-modal hierarchy structure is used to facilitate techniques.

I outline design and implementation issues and present clear solutions to deliver a solution efficiently and concisely in cross-platform C++, comparing where applicable to past and present approaches. I show the prototype’s operation, demonstrating how the received features are useful. Finally, I detail future directions that the present work, and language workbenches in general, might take.

2. Background

2.1. Structural Editing. Structural editing can be seen as a move away from the text-editor/text-file/ triplet that still dominates modern programming environments. The actual idea of editing the conceptual program model rather than a string of plain text is at least 30 years old. When describing an early program-model editor, the Cornell Program Synthesizer, Teitelbaum and Reps [1981] point out that “programs are not text; they are hierarchical compositions of computational structures and should be edited, executed, and debugged in an environment that consistently acknowledges and reinforces this viewpoint”. The idea did not become popular; indeed one might ponder how many programmers even recognise the difference between the code of a program and its model.

More modern structural editors do exist and recent work into usability aspects has continued; Goldman [2004] describes a Java structural editor, JPie, used for the instruction of novice programmers. However, there is a common criticism of structural editors (e.g. discussed by Birnbaum and Goldman [2005]) which relates to the difficulty of editing a program model with serialised input: a program-model that is semantically correct, if entered naturally, requires transition states of the program model that are nonsensical. For example, assigning an integer variable i from the output of a function foo() one might like to enter i = foo before putting the final parentheses on foo to dictate the desire to invoke it rather than assign it as a functor. Ko [2004] provides a more circumspect discussion about structural-editing issues found as part of the Natural Programming project (a Java-based repertoire of structural programming tools) and finds that the user-interface of a structural code editor must be carefully designed to be not only supportive (i.e. strict) but flexible enough to facilitate the user’s manipulation of the program.

2.2. Language-Orientated Programming. Language-Orientated Programming (LOP), coined by Ward [1994] and discussed later by Dmitriev [2005], brings together such paradigms as intentional programming, model-driven architecture (MDA) and generative programming. Put simply, LOP champions a middle-out development order with an abstraction mechanism that contains syntax, grammar and vocabulary.

LOP has several indisputable advantages, as described by Simonyi et al. [2006], including greater scope for parametrisation (thereby optimising code reuse), maximising productivity through facilitating the mixing of multiple languages and providing the canvas of language grammar to better help programmers codify their thoughts. To aid in the continuous development of new languages that LOP requires, a new class of structural editors has come into existence, termed language workbenches.

2.3. Language Workbenches. Combining the currents of structural program editing and language-orientated development, language workbenches provide a development framework that has the potential to alleviate or eliminate traditional problems such as syntax errors, file-management and refactoring, while providing the programmer with a tool flexible enough to program a computer with whatever dialogue (language/paradigm) she sees fit.

A structural (imperative) programming editor is an environment for developing subroutines in terms of other co-developed subroutines. In this manner, a language workbench can be considered an environment for developing languages in terms of other co-developed languages. Put another way, a language workbench is a structural editor, where the abstract grammar (or program form) can itself be specified from within the same environment in the very same manner as one would otherwise write program code.

Several language workbenches have been proposed. Chief among them are the graph-based GReAT framework (proposed by Agrawal et al. [2003]), the Lisp-based FermaT program manipulation system (proposed by Ward and Zedan [2005]), JetBrains’s open-source Java-based Meta-Programming System (MPS; introduced by Dmitriev [2005]), and IntentSoft’s C#-based Domain Workbench (discussed by Simonyi et al. [2006]). All are implemented in imperative or run-time, dynamic-dispatch languages (incompatible with C++) and either have cumbersome input designs or fall back onto a textual language input system.

The FermaT program manipulation system and the framework it is built in, as discussed , work on the assumption that an LOP project should comprise a single stack of formally-defined languages, each more specific to the ultimate problem’s domain. Thus it becomes trivial to write the “final” program in the highest-level language. One downfall of the single-stack design is that it makes the combination of distinct languages difficult, thus hampering reuse.

The Meta-Programming System (MPS), proposed and developed by JetBrains, is a now open-source comprehensive language workbench implemented in Java. It is based around a Java-implemented editor and several conceptual languages; here language is used to define the structure, editor (i.e. I/O) and semantics of a set of concepts. In particular, a language needs to have no canonical textual representation. The languages MPS comes with include a structure (or language) language (i.e. one that is defined in terms of itself, but initially would be bootstrapped from an absolute reference implementation); several base languages, corresponding to key portions of the Java platform; several language-transformation languages (allowing operations on abstract portions of a program); and an editor language, allowing new editors to be created for the languages. The display is extensible through the editor language, but a cellular display, not dissimilar from an appropriately customised spreadsheet, is provided as standard. Being Java-based, it is ineffective for authoring C++ projects and extending the C++ language.

The C#-based Domain Workbench exists as perhaps the most comprehensive and advanced proposal for a language workbench. It is described as a programming environment centred around the Intentional Tree, an abstract data type representing, as best as possible, what it is that the author means. The tree is a description of the program in terms of a number of defs; these defs form the domain schema and are themselves defined as intent trees. The tree can be projected to the user with a number of predefined projections (assuming some cooperation is present between the projection and the defs that the tree references). The intentional tree may be edited through several “intuitive” tree-based operations (splicing, inserting children, appending on a list, &c.). Domain schemas may be composed to form hybrid schemas. A special generator module, directed by the ‘domain code’ (represented by the intentional tree), is able to generate the C# sources to be compiled with the help of the domain schema. Creating content in the intentional tree is done by entering text, which is parsed and acted upon according to the current domain schema; as such, intermediate languages must be designed very carefully to keep the parse-tree unambiguous. Being C#-based, it is ineffective for working with C++ projects or building intermediate languages on the C++ language.

From this brief review it is clear that the design and implementation of a language workbench based around C++ has not yet been addressed. Furthermore, several deficiencies (such as a reliance on a notional language’s
text representation for input) are, I argue, both unnecessary and fixable. I will detail and discuss the design of a novel language workbench.

Figure 1. The organisation of a number of components of the development system: (a) Traditional; (b) Structural.

2.4. Terminology. Terminology in the different language workbenches is fragmented; a quick lookup-table is provided in Table 1 below. Henceforth, I elect to use the MPS terminology, refining the term language to abstract language to avoid any mistaken inference that a form of syntax may be involved.

Presently           DW              MPS
abstract language   domain schema   language
concept             def/identity    concept
program(-model)     domain code     program
node                node            node

Table 1. Language workbench terms in use.

As such we may say that a program is a structure of nodes, each of which has an ‘is-a’ relation to a single concept composing the overall abstract language. Furthermore, any concept can be (recursively) defined as a program, given the appropriate abstract language.

3. The System

The open-source Martta project (http://martta.net) is the research vehicle I am using to help determine the compatibility between traditional C++-development and the language workbench paradigm.

This situation gives rise to a number of more concrete goals. The language workbench must:
• be portable to major platforms: Win32 (see Hart [2004]), Mac OS X (see Hillegass [2008]), Linux (see Peterson [1998]);
• have an initial abstract language based around the C++ language;
• support native cooperation with C++ legacy code and the /linkers;
• be natural to C++ developers in terms of code display and manipulation;
• support development through avoidance of structural errors and recognition of the existence and nature of static semantic errors;
• have an extensible display with customisable input mechanisms;
• be extensible regarding abstract and implementation language grammar (in particular, being able to utilise C++0x should require a minimal amount of change);
• be compatible with or adaptable to existing tools such as version control systems.

To maximise the extensibility, a design priority was to make the core components (i.e. those which may not be swapped out at a later stage) as minimal as possible. In particular, great effort was spent making the core as flexible as possible so that functionality for C++ language compatibility could be provided, concisely, by extensions. Furthermore, the abstract language must be well extensible without falling back on the basic LOP abstraction mechanism of simply implementing a new language in terms of the old. Consider a hypothetical progression from the C-subset of C++ to the complete C++ abstract language: ideally, the most concise C-subset implementation could be extended to become the most concise full C++ implementation; similarly, it should not be necessary to make changes to this C++ implementation in order to extend it into a concise C++0x implementation.

The system I propose can be split functionally into four distinct portions which have a set of run-time link dependencies:
• Editor: An executable GUI ‘harness’ for utilising the computer’s I/O and hosting everything else, including an API for extending its functionality.
• Concept API: The base class (called Concept) representing the required interface for defining a language concept and the class’s support mechanisms.
• A number of GUI extensions and their associated components.
• A number of concept classes, forming a primitive abstract language.

The first two portions (the editor and base concept class) are essentially fixed and constant parts of the system. They are separate to allow the base concept class to be linked to specialised concept classes without the bulk of the editor. The GUI extensions allow extension of the editor (e.g. different input mechanisms, views &c.) and help specify the user’s interface. They have a link-time dependency on the editor. Finally, the concept classes are a set of specialised base concept classes which together define the language in which the user must express their program. This set of classes is independent of the editor, though it is linked to the editor at run-time.

3.1. Editor. The editor forms a core “harness” or, as Birsan [2005] refers to it, a plug-in engine, for each of the other components. Aside from loading editor extensions, the key functionality of the editor is threefold:

Firstly, to manage programs, holding them in memory, providing access to, saving and loading them to disk
(trivially accomplished through the concept’s model navigation, RTTI-style inspection and the property specification API).

Secondly, to route basic user-interface events such as (multiple) key presses, mouse/touch gestures, menu items &c. to specific transforms on the program of actions, parametrising them according to the current display. It maintains a display of all actions that can be performed immediately in a manner similar to Cepage, described by Meyer [1988].

Thirdly, to provide execution facilities. Fundamentally, program semantics are defined through the concept classes of the program. The program’s root concept’s class defines, through an interface known to the editor, how the program can be executed. Having all semantics information in only a single place presents some difficulties for making the language easily extensible; this is solved by decentralising meaning to individual fragments of the program/abstract language as described in section 4.5.

3.2. Concept. The fundamental base of the Martta system is a class named Concept. All classes derived from Concept are known henceforth as concepts and together form the class hierarchy rooted at Concept. An abstract language may be described as a set of such classes. To allow certain simplifications in the design of the present system, a number of restrictions are made on, and facilities provided to, concepts. As they are fundamentally classes, concepts have function and data members (aka properties), e.g. the StringLiteral concept has a datum text, which stores the quoted text of the literal.

A program is defined as a strictly hierarchical set of nodes, each node being an instantiation of a particular concept. The concept of a node defines, among other things, the allowed concepts of its child nodes. In this way, an abstract language may simply and effectively define structural-correctness in much the same way a grammar is defined, prohibiting most invalid programs immediately and simplifying both the present project’s implementation and the eventual job of the user. Unfortunately, with a strictly single-inheritance model for the concept class hierarchy, this leads to difficulties in code-sharing and program cross-referencing. This was overcome with a hybrid inheritance model, detailed later.

There are some concepts, equivalents of which would not be found in a traditional language’s grammar, perhaps because they deal with aspects of a language that are traditionally implicit (e.g. its display), perhaps because they provide an abstraction that is unimportant in a purely grammatical sense. These concepts often make sense to be considered discretely (perhaps providing appropriate code and hooks), though being grammatically inert and never directly realised as nodes. These are so-called notional concepts and are problematic since they imply a structure more complex than a simple hierarchy. The ramifications of these are discussed later.

3.3. Extensions. Editor extensions (or simply extensions) allow concepts of a language to interact with the GUI editor in novel ways. They take the form of a dynamically loadable library that interfaces with the GUI to extend its functionality, and that interfaces with the concept hierarchy to provide language components to use this GUI functionality. One such example is that of the display:

The GUI editor must have at least one display; i.e. a component capable of allowing the user to navigate around a program and maintain a number of properties such as the currently focussed node. As an example design, I have implemented such a component, based around an HTML/CSS/JavaScript document, displayed by a WWW page display engine. It includes a ‘stylist’ delegate class (to allow some customisation of the appearance of the program) and a notional concept (WebViewable) for a node that can present itself for such a view.

The view displays the program simply by asking the program’s root node for an HTML representation. To form this HTML representation, the node may defer to its children recursively. If a node does not implement the interface for specifying an HTML representation (a concept-class known as WebViewable) then a basically-styled HTML dump of the node is returned as a fallback. JavaScript is used in the definition for basic interactivity and CSS for abstracting components of style. Configuring or reimplementing the stylist delegate (which handles the fallback mechanism) allows the display to be customised. In this way, the actions one performs are fundamentally separate from the visual appearance those actions invoke.

3.4. The Abstract Language. Initially, the abstract language is the set of concepts loaded into the editor prior to any program. Being C++-based, this system begins with an abstract language conceptually similar to C++. If, within her project, a programmer does author a new concept, then it augments the abstract language for all independent parts of the project (the editor implicitly compiles the class, links and dynamically loads it). In much the same way as a programmer may specify a library dependency in C++, she may specify an abstract language dependency in Martta (the editor would dynamically load and link the requested concept classes).

Two core Martta abstract languages are proposed; one approximating the grammar of the C++ language, another (based around the Concept class and its ancillaries) providing the necessary functionality for authoring new concepts. The latter includes concepts for program transformation, program fragment literals and parameters for such literals.

4. Difficulties and Design Solutions

The system outlined so far has a number of serious flaws that must be addressed for it to fulfil the requirements given.

4.1. Node Organisation. Extensibility is maximised by pushing as much functionality as possible outside the core of Martta and into the specialised concept classes. An example of this is the structure of the program; it is a simple hierarchy of nodes. In particular there is no explicit support for node cross-references (unlike e.g. MPS). In the present solution, such functionality is contained within the C++ abstract language (so that, for example, an invocation can refer to the function declaration).

4.1.1. Child Indexing. However, a hierarchy where each node may have an (unordered) set of child nodes is somewhat restrictive. With most languages being serialised, the order of nodes is typically important: in C++, for example, the argument order of a function call is generally paramount. As such, an ordered set of children for each node seems reasonable. This is better, but still not ideal; many language concepts require their nodes have not only

In addition to addressing the problem of conditional-compilation language constructs, this addresses a more serious problem of including parameters within literal program fragments¹. Consider a literal program fragment of the form T t; foo(&t); cout<

¹ Program fragments are often used in functions (e.g. defining a new concept in terms of an existing one or writing refactoring algorithms) and save the programmer having to painstakingly construct a required program fragment manually.
² Birnbaum and Goldman [2005] provide a more extensive critique.

Figure 2. A sample of the C++-like abstract language tree. Using multiple inheritance, the concept Enumeration, which is accepted as a valid type of child for nodes of, e.g., the Namespace concept, may be inherited into MemberEnumeration, whose superclass is Member, providing extensibility and code sharing. Namespace would not allow MemberEnumeration as an acceptable concept-type for its children as its strict (‘is-a’) ancestry (starting with Member) is dismissed by Namespace, but a hypothetical Class concept whose children must be Member concepts would accept it.
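The distinction drawn in this caption can be illustrated with a minimal C++ sketch. It is not Martta’s actual code: only the concept names Concept, Member, Enumeration and MemberEnumeration come from the text, while the Super alias, the isKindOf/isKind/inherits queries and the two acceptance helpers are hypothetical names chosen for illustration, and the real Enumeration is assumed to sit deeper in the hierarchy than shown here. The strict (‘is-a’) query follows only the nominated superclass chain, whereas the ‘inherits’ query is ordinary C++ (virtual, multiple) inheritance.

```cpp
#include <typeinfo>

// Hypothetical base class: every concept derives (virtually) from Concept.
struct Concept {
    virtual ~Concept() = default;

    // Strict ('is-a') ancestry: follows the nominated superclass chain only.
    virtual bool isKindOf(const std::type_info& t) const {
        return t == typeid(Concept);
    }
    template <class T> bool isKind() const { return isKindOf(typeid(T)); }

    // Plain 'inherits' relation: ordinary C++ inheritance via dynamic_cast.
    template <class T> bool inherits() const {
        return dynamic_cast<const T*>(this) != nullptr;
    }
};

struct Member : virtual Concept {
    using Super = Concept;  // nominated superclass (assumed mechanism)
    bool isKindOf(const std::type_info& t) const override {
        return t == typeid(Member) || Super::isKindOf(t);
    }
};

struct Enumeration : virtual Concept {
    using Super = Concept;  // simplified: the real ancestry is likely deeper
    bool isKindOf(const std::type_info& t) const override {
        return t == typeid(Enumeration) || Super::isKindOf(t);
    }
};

// Inherits from both, but nominates Member as its single superclass.
struct MemberEnumeration : virtual Member, virtual Enumeration {
    using Super = Member;
    bool isKindOf(const std::type_info& t) const override {
        return t == typeid(MemberEnumeration) || Super::isKindOf(t);
    }
};

// Hypothetical child-acceptance rules for two parent concepts.
inline bool namespaceAccepts(const Concept& child) {
    return child.isKind<Enumeration>();  // strict ancestry must reach Enumeration
}
inline bool classAccepts(const Concept& child) {
    return child.isKind<Member>();       // strict ancestry must reach Member
}
```

Under these assumptions a MemberEnumeration node is rejected by namespaceAccepts yet accepted by classAccepts, while inherits<Enumeration>() still succeeds, so code written against Enumeration remains reusable.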

Proper concepts are well-defined and concrete (e.g. Invocation represents a function invocation); their nodes typically have associated data or children and specify I/O behaviour. Placeholders, however, exist only to guarantee structural correctness part way through program entry (e.g. Statement represents an undefined program statement, as you might expect when you press the Enter key in a function) and as such their presence normally guarantees static-semantic error.

4.3. Concept Inheritance. There is a grave issue regarding the design of a concept as expressed so far. As previously mentioned, it is useful for concepts to be able to define structural requirements by limiting the concepts allowed for their nodes’ children to a specific set. Only concepts derived from a member of, or actually within, this set would be allowed as children; others would not. This mechanism is provided effectively by the classic object-orientated ‘is-a’ function, realised in C++ through, e.g., dynamic_cast.

There are, however, clear examples where inheriting from multiple classes is effectively a requirement. Consider the C++ concept MemberVariable. It should clearly be inherited from the Member concept (it makes sense to dictate that the children of a node of type Class should be Members). However, there is a strong argument for inheritance also from Variable; Variable would have a considerable amount of code that might be useful, may itself inherit from concepts that make sense for MemberVariable (e.g. Typed), and may be referenced by other nodes in the program, e.g. VariableReference, a concept that represents a variable in use.

There are a number of solutions to this; one is to ignore the merits of using C++ class inheritance and use a run-time mechanism for determining such attributes instead. This would however result in no compile-time checks and potentially unsafe object casts. Another is to follow the traditional route and inherit from Member but use composition for VariableReference. This is ineffective here as it loses referential transparency; the VariableReference node would not share the attributes of its owner (such as children/parent). Any references to it expecting a node within the program would be broken.

The solution I provide here is to inherit from both, but retain two different inheritance strengths. For MemberVariable, Member is named as the superclass, but it also derives from Variable. Thus there is strong fulfilment (or the ‘is-a’ relationship) and weak fulfilment (or the ‘inherits’ relationship). Weak fulfilment is equivalent to normal (multiple-)inheritance in C++, and is used for normal reference (or pointer) interface semantics. Strong fulfilment puts a single-inheritance hierarchy over a subset of the multiple-inheritance tree, and is used for determining node-ownership semantics.

Clearly, each node must have exactly one set of child nodes. With the base Concept class being defined as the class common to all types of nodes, it manages these relationships. Thus concepts, if they use multiple inheritance, must inherit virtually, safeguarding the fact that each node has stored only one set of children.

Therefore, concept-derived classes:
• are stated as being either proper, a placeholder or notional;
• must inherit at least one concept class or Concept itself;
• must always inherit from concept classes virtually;
• must nominate a single ‘superclass’ from the set of inherited concept classes;
• may be queried at run-time and compile-time on the identity of ‘superclass’, membership of the power-set of ‘superclass’ and concept type.

4.4. Editing. It is not immediately clear how the editor actually manages the editing interface to the user, while remaining extensible and modular. To address this, the present solution involves two complementary systems. The first, actions, are general program transformations. The second, editable concepts, are concepts which may enter a special “editing” mode whereby they temporarily receive all input events and may represent themselves differently.

4.4.1. Actions. Actions are program transformations, required to be reversible, involving nodes of a particular concept: user-directed structural program manipulation is done through a sequence of actions. Concepts register their repertoire of actions at link-time. Through the GUI editor, an action may have a user-configurable mapping to a concrete UI event, such as key sequences, GUI menu items, buttons, mouse gestures &c., though each action is typically provided with a sensible default (often a keystroke or sequence).

Actions are typically used for creation and placement of a node in the program, but could also be for property manipulation or to replace an existing proper node with one of another type. All actions are reversible, which serves two purposes: it allows a comprehensive “undo” mechanism which, I assert, is an important adjunct since there is no immediate equivalent of the text-editor’s backspace key. It also allows partially colliding multi-key sequences to be allowed without sacrificing timeliness of response. For example, the two input sequences ‘i+6’ and ‘i++’ both begin with the same input sequence; determinism and timeliness of response require the program model to be in an ultimately incorrect intermediate state for at least one of the expressions. In this system’s case, the user would see a program of the form i + ? following the first two keystrokes, which is acceptable for the former, but not the latter, program. A third keystroke of ‘+’, in forming a recognised stroke-sequence (‘++’), would reverse the action of the second stroke (forming a binary addition operation) and conduct the appropriate action (forming a unary increment operation). This system thus places no dependence relationship on the two operation concepts, a requirement for the abstract input system proposed.

4.4.2. Editables. Actions, as previously described, specify the primary mechanism for entering and editing the structure of a program. However, a small minority of concepts, generally those with properties, require considerably more involved editing systems than can comfortably be provided with such a simple mechanism. For this reason, the notion of Editable was created. An editable node is one that is able to enter an “editing” state (typically mapped to an otherwise unrecognised, non-control, input key press) whereby normal input and display mechanisms are delegated to an object particularly suited for such a datatype. Examples of such concepts include ReferencedValue and IntegerLiteral. Several sub-classes of Editable exist within this extension, including IntegerEditable and the more generic CompletionListEditable. The editor extension recognises nodes of these types and induces the correct action upon an attempt to edit such a node. This may be to delegate the task to the currently active display extension, or simply to invoke a GUI dialogue.

Editing is not limited to nodes with properties or even proper nodes: the placeholder node Statement may, for example, be editable. In this case, the result of the edit is not to change the properties of the node, but rather to replace the node with some other, less general, node (or nodes). A simple system of registration is employed, allowing arbitrary concepts to register potential names or keywords with such placeholders so that they may be recognised and acted upon correctly if entered. Examples of such keywords include type identifiers (whereby entry introduces a local variable declaration), variable/function identifiers (introducing a reference to that value) and C++ keywords (e.g. BoolLiteral registers the keyword true and ForStatement registers for, both of which introduce the corresponding concepts).

4.5. Program Semantics and Execution. The exact meaning of a program is defined by its actions on execution. In this case, the execution is determined solely by the node at the root of the program tree. The type of this node must be derived from a notional concept called Program. This notion defines a routine and/or a number of commands that must be executed on the system in order for the program to be executed. Notably, though other concepts in the abstract language may elect to describe their semantics in some way (such as a substitution into elementary C++ code or a structure of more basic nodes), it is ultimately the root Program node that determines the semantics of the given program.

In the case of the C++ abstract language, the steps to execution involve composing the pure C++ code for the program, compiling each module of the program, linking the resultant object code and finally running the executable. In the case of an interpreted language, e.g. Scheme, the compilation/linking step would not be needed. In principle, an abstract code-building back-end could be used to make a given abstract language not only compiler-independent but implementation-language-independent through implementing the Program concept effectively as a ‘back-end’ for each language to be supported.

4.5.1. Program Transformations. Though each node may define itself in a textual C++ representation, this, in general, is not enough for true language-orientated programming: new concepts must be able to define their semantics as a program transformation to arbitrary, more primitive, concepts. Unlike other approaches, even this mechanism is not provided in the core of the system, but rather through the notional concept Composite and the C++ abstract language’s specialised Program concept:

As a step prior to the formation of the pure C++ code for the program, the program is first transformed. A Composite notional concept handles the process and it is here that concepts of otherwise undefined semantics regarding the basic C++ Program may define their meaning. The transformation process is, for this initial design, trivial but general; a node may inherit from Composite and will be called upon to transform itself. Arbitrary code is allowed when defining the transformation. The process of transforming the program continues through the tree and repeats as long as any Composite nodes remain. Though the order of node transforms is arbitrary, nodes are free to inspect the program tree and delay their transformation or otherwise coordinate transforms with other nodes. This is a somewhat brute-force approach to program transformations, and may be built upon at a later date with more refined techniques such as those described by Bravenboer et al. [2008].

Figure 3. Martta’s self-referential nature; the arrow shows the direction of the initial bootstrap stage. (Diagram labels: Editor; Concept, C++ Language and Other Concepts libraries and sources; GCC; “Instantiated into trees”; “Creates”.)

5. Implementation

In terms of implementation, the project is C++-based, cross-platform and is open-source with an open-development model. Cross-platform, open-source technologies were also co-opted to speed development: Apple’s

WebKit was used at the primary engine for the display (utilised from the WebView GUI extension). Nokia’s Qt was used as the toolkit for the GUI. Finally, KitWare’s GCC-XML was used to analyse existing C/C++ code in order to translate them into a Martta program-fragment. Though by no means complete, Martta fast approaches the functionality of a full language workbench. The code- base of my initial implementation of this design is just over 20,000 lines of C++. This is an order of magnitude smaller than e.g. MPS (over 170,000 lines of Java), and though it would be foolish to suggest the two were di- rectly comparable, I would nonetheless take it as evidence to suggest that the present approach is effective.
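The repeat-until-no-Composite-remains process of Section 4.5.1 might be sketched as below. This is only an illustration under stated assumptions: the names Node, Composite::transformed, Increment, AddTwo, transformPass and lower are hypothetical and are not taken from Martta's code; only the idea that a node inheriting from Composite is asked to transform itself, and that passes repeat while any such node remains, comes from the text.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical program-tree node; not Martta's actual class.
struct Node {
    std::vector<std::unique_ptr<Node>> children;
    virtual ~Node() = default;
    virtual std::string code() const = 0;  // textual C++ form
};

// Hypothetical stand-in for the notional concept Composite:
// a node that can rewrite itself into more primitive nodes.
struct Composite {
    virtual ~Composite() = default;
    virtual std::unique_ptr<Node> transformed() = 0;
};

struct IntLiteral : Node {
    int value;
    explicit IntLiteral(int v) : value(v) {}
    std::string code() const override { return std::to_string(value); }
};

struct Add : Node {
    Add(std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
        children.push_back(std::move(l));
        children.push_back(std::move(r));
    }
    std::string code() const override {
        return "(" + children[0]->code() + " + " + children[1]->code() + ")";
    }
};

// Composite concept: Increment(x) means x + 1.
struct Increment : Node, Composite {
    explicit Increment(std::unique_ptr<Node> x) { children.push_back(std::move(x)); }
    std::unique_ptr<Node> transformed() override {
        return std::make_unique<Add>(std::move(children[0]),
                                     std::make_unique<IntLiteral>(1));
    }
    std::string code() const override { return "<Increment>"; }  // no direct C++ form
};

// Composite concept defined in terms of another composite.
struct AddTwo : Node, Composite {
    explicit AddTwo(std::unique_ptr<Node> x) { children.push_back(std::move(x)); }
    std::unique_ptr<Node> transformed() override {
        return std::make_unique<Increment>(
            std::make_unique<Increment>(std::move(children[0])));
    }
    std::string code() const override { return "<AddTwo>"; }
};

// One pass over the tree; returns true if any node was replaced.
bool transformPass(std::unique_ptr<Node>& n) {
    if (auto* c = dynamic_cast<Composite*>(n.get())) {
        std::unique_ptr<Node> replacement = c->transformed();
        n = std::move(replacement);  // the old composite node is destroyed here
        return true;
    }
    bool changed = false;
    for (auto& child : n->children)
        changed = transformPass(child) || changed;
    return changed;
}

// Repeat passes until no Composite nodes remain, then emit the code.
std::string lower(std::unique_ptr<Node> root) {
    while (transformPass(root)) {}
    return root->code();
}
```

Here AddTwo first becomes two nested Increment nodes, each of which is then rewritten to an Add on a later pass; the loop terminates once a pass replaces nothing.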

Figure 5. 14 stages of entry of the expression i+=:abs(2*(i+3)).
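The reversal of an already-applied action when a longer key sequence is recognised (Section 4.4.1, and the + then += entry among the stages of Figure 5) might be sketched as follows. This is a toy model under loud assumptions: the program is flattened to a list of tokens, and Action, KeyInput, insertToken, demoBindings and typeKeys are hypothetical names invented for illustration, not Martta's API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy program model: a flat token list standing in for the program tree.
using Model = std::vector<std::string>;

// A reversible action: every apply has a matching undo.
struct Action {
    std::function<void(Model&)> apply;
    std::function<void(Model&)> undo;
};

// Hypothetical helper: an action that appends one token and can remove it.
Action insertToken(const std::string& token) {
    return Action{
        [token](Model& m) { m.push_back(token); },
        [](Model& m) { m.pop_back(); }};
}

class KeyInput {
public:
    explicit KeyInput(std::map<std::string, Action> bindings)
        : bindings_(std::move(bindings)) {}

    void press(char key, Model& model) {
        // Give priority to the longer sequence formed with the pending keys.
        std::string sequence = pending_ + key;
        auto it = bindings_.find(sequence);
        if (!pending_.empty() && it != bindings_.end()) {
            // Reverse the action already applied for the shorter sequence.
            bindings_.at(pending_).undo(model);
        } else {
            // No longer binding: treat the key as the start of a new sequence.
            sequence = std::string(1, key);
            it = bindings_.find(sequence);
        }
        if (it == bindings_.end()) {
            pending_.clear();
            return;
        }
        it->second.apply(model);
        pending_ = sequence;  // always a bound sequence, so at() above is safe
    }

private:
    std::map<std::string, Action> bindings_;
    std::string pending_;  // key sequence of the most recently applied action
};

// Illustrative bindings only.
std::map<std::string, Action> demoBindings() {
    return {
        {"i", insertToken("ref(i)")},
        {"6", insertToken("literal(6)")},
        {"+", insertToken("binary+")},
        {"++", insertToken("postfix++")},
        {"+=", insertToken("assign+=")},
    };
}

// Feed a string of key strokes to a fresh editor and return the model.
Model typeKeys(const std::string& keys) {
    KeyInput input(demoBindings());
    Model model;
    for (char k : keys) input.press(k, model);
    return model;
}
```

Typing 'i+' leaves a binary addition in the model immediately, which keeps the response timely; a following '+' or '=' undoes that addition before applying the longer binding, so neither operation concept needs to know about the other.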

Figure 4. A screenshot of Martta demonstrating two views onto the same program.

5.1. Display Example. Figure 4 demonstrates some of the power of language workbenches. Here, the same simple program has two differing views. The two views are configured independently, and in this case, differently. Importantly, the program model, and thus the serialised data that can be saved to a file or versioned in a revision control system, remains unchanged.

On the left of the window is the default view for Martta programs. This uses camel-cased identifiers and background tones, rather than parentheses, to denote the expression hierarchy. On the right side is a more traditional view using same-case underscore-based identifiers and parenthesised operations to make clear any expression trees whose C++ form would require parentheses.

Modern, partially semantic-aware text-editor features such as code-folding, or the toggleable hiding of a function's body, are trivially implemented using the WebView's JavaScript engine. In this case the bodies of the two methods of the SpecialInt class are hidden in the right view, with only the cast operator's body hidden in the left. Double-clicking on the title area or pressing the {/} keys shows/hides the body.

5.2. Input Example. Figure 5 demonstrates the visual appearance of the editor through the entry of a typical expression; i+=:abs(2*(i+3)). The evolving state of the program model should be made clearer through the automatic and inevitable display of placeholder concepts.

(b) and (c) demonstrate a multi-character input sequence; in this case the operator wanted is +=, which requires first the entry of the + operator. The editor's key entry system maintains a buffer of the input key strokes and gives priority to actions bound to longer keystroke sequences; if one is found, it reverses the effects of actions that were effected for earlier subsections of the keystroke sequence before executing the better action. In this case, with the = stroke in (c), the previous step's program augmentation is undone and the action to insert a new += operation into the program is executed.

(b-e), (g), (h) and (j) demonstrate incomplete expressions, properly defined as a structurally correct, but semantically incorrect, model. The offending concept has a red dashed line and grey crossings through it. (d) is the most interesting of these, with the editor determining a type mismatch between the function value abs and the expected int value. This is corrected in the following step, where the abs value is usurped and adopted by an Invocation concept. A full explanation of the problem along with automatic solutions may be given off-screen.

Normally, expression entry, through the Operation concept, will follow normal C++ rules of precedence and associativity, changing the expression tree as is required to follow these rules. This mechanism can be altered by tagging certain concept positions: concepts at such positions appear green and parenthesised, and cannot have their program model altered implicitly. To tag a concept position in such a way, the parenthesis character is used in much the same way as in C++. (h-k) demonstrate the use of the tagging mechanism.

It is important to note that parentheses are not concepts and have no presence in the program model. In this view, the minimum number of parentheses are displayed such that the displayed code, when read as C++, is unambiguous and accurately reflects the program. It is key to note that although, to the user, the parentheses around i + 3 in (k) and in (l) appear identical (intentionally), they actually arise from completely different parts of the system. In (k) they are a temporary visual aid to denote a tagged concept position and last only as long as the expression entry. In (l) they are a display mechanism to more accurately reflect the entered program model. That, in the normal case, one disappears when the other initially appears is by design; however, either can be separately disabled or otherwise customised at will.

5.3. Proposed Directions.

5.3.1. Revision Control. Revision control has become a common feature in integrated development environments, allowing the user to add/remove files, view differences between revisions and commit changes to a repository (perhaps with a commit string). In general, structural programming environments and language workbenches have no intrinsic serialised language. Thus no file format is imposed, nor is there any need to store programs primarily in files for use with other components of the development stack. As such I would propose to remove from the editor the act of traditional file-saving, instead providing a repository-commit function.

Most programmers save their work locally much more frequently than committing to a repository (see footnote 3). Each save point will likely have some rationale behind the changes made; it might represent a bug potentially fixed, some debug information added or a (partial) feature implementation &c., and as such deserves to be a revision commit and not a blind save. Co-ordination with a lightweight tasks infrastructure (discussed next) and judicious use of tags/branches and hierarchical levels of repositories would minimise interruption to the workflow while providing a fine-grained record of changes made to the codebase.

A guarantee of versioning functionality within a structural editor gives an interactive semantic history of a codebase, making practical certain features. In terms of project management, it would allow a project manager or programming lead to get an informative overview of where in the codebase developers concentrate their efforts, which developers work closely on which parts of the codebase, and recurring collisions over shared code. This knowledge could be used to improve division of labour, scheduling and general team management.

For a structural editor, and especially a language workbench, however, revision control presents an interesting problem: that of representing a set of changes to a program without loss of generality or effectiveness of representation. Further work is needed in this area.

5.3.2. Problem, Unit-tests, Bugs and To-dos. An abstract infrastructure for the reporting and managing of tasks regarding a program would be an interesting adjunct to a language workbench such as Martta. Complete semantic awareness at the UI level allows some blurring of the distinction between a program's structural semantic correctness and a program's conceptual semantic correctness. For example, as shown above it is trivial to determine, list, mark up and, in some cases, solve, in real-time, what would be semantic compilation issues such as expression type-correctness. Some current editors offer similar functionality, albeit with some reduction in accuracy, precision and/or timeliness.

The traditional TODO:, BUG: and FIXME: notes often found in program code could easily be integrated into the infrastructure. A specific concept, in effect a semantic comment, could represent such a note. A unit-test subsystem with access to the same infrastructure would allow a slightly higher level of issue reporting, albeit less timely. A bug-tracking subsystem, perhaps with some back-end connection to an online bug-tracking database, would allow a further level of issue reporting, less timely and precise still.

This issue listing/annotation system could similarly be integrated with the revision control system above to automatically note issues fixed (or broken) on given commits.

5.3.3. Metasyntactic Variables. When prototyping code, the names of values are often of little importance. Programmers typically resort to using common placeholder variable names; among the most common are foo and bar. Martta already provides a naming system for anonymous identifiers; identifiers with an empty text label are given a name of the form anon1234567. Allowing the programmer to define and customise these placeholder names such that they come from a list of their choosing would reduce the length of such names and eliminate unnecessary key strokes.

5.3.4. Invocation Expansion. A relatively common occurrence when working with a codebase is that the programmer comes across a function invocation and wishes to see its implementation in order to better understand its behaviour and limitations. Traditionally, this would mean locating the implementation file containing the function, displaying that file at the correct point, and eventually navigating back to the file containing the invocation. Modern editors, e.g. Trolltech's Qt Creator (described by Nokia Corporation [2009]), are able to ease this process considerably by providing search and navigation functionality for language entities. However, ultimately, there exists a required mental discontinuity of thought in navigating from invocation to implementation that conceptually should not be.

The temporary expansion of (statically-dispatched) invocations to the semantically-equivalent inline code mitigates this discontinuity. It permits the programmer to selectively ignore the imposed language-abstraction of functions and their invocation/implementation duality, instead to concentrate on the meaning of the invocation in terms of the execution path. The fact that the program is stored internally and rendered to display makes a crude version of this functionality trivial. More advanced program analysis (such as determining referential transparency and solving static arithmetic) would be required to optimise the expansion's appearance.

A simple example might be the program represented by the Fibonacci sequence generator:

    uint fib(uint n) {
        if (n <= 1) return n;
        return fib(n-1)+fib(n-2);
    }

Invoking the invocation expansion functionality of the GUI over the first fib invocation would result in the display of a program represented by:

    uint fib(uint n) {
        if (n <= 1) return n;
        uint _arg = n-1;
        if (_arg <= 1)
            return _arg + fib(n-2);
        return fib(_arg-1)+fib(_arg-2) + fib(n-2);
    }

With basic program analysis techniques, _arg could be substituted for the argument, giving:

    uint fib(uint n) {
        if (n <= 1) return n;
        if ((n-1) <= 1)
            return (n-1) + fib(n-2);
        return (fib((n-1)-1)+fib((n-1)-2)) + fib(n-2);
    }

5.3.5. Auto Prototyping. Unless exceptionally foresighted, most programmers will, while implementing a routine, realise that another (usually lower-level) routine needs to be written. Assuming they do not immediately drop the task at hand and start coding the lower-level subroutine, two options are available: write the interface for the new subroutine before finishing the routine, or finish the routine (using the hypothetical new subroutine) and then write its interface. The former, since it keeps the program in a semantically valid state, is perhaps favourable, but does require the programmer to break with their task at hand for a short period.

In a language workbench it becomes largely trivial to have the editor automatically determine that a particular symbol (perhaps a variable or method) does not exist and query the user if it should be created. Furthermore, contextual information (such as the expected type and its invocation arguments) can be used to estimate a full interface to the method. Combined with invocation expansion, above, the subroutine could be coded conceptually inline, allowing all to be done without the programmer resorting to code navigation or otherwise breaking their concentration.

5.3.6. Inline Program Analysis. Program analysis, particularly in C++, is traditionally swamped with the prerequisite that the program's source code must first be turned into an accurate representation of the program's meaning (an abstract syntax tree, or some other abstract intermediate-level structure). This is a particularly difficult task; there are several commercial projects that specialise in creating proprietary software for this. Sim et al. [2002] list and evaluate several of the contenders in terms of accuracy and robustness, concluding none delivered perfect results.

If, however, the program editor stores the program in terms of its meaning, the act of program analysis bypasses this step completely. Interactivity delivers the possibility for real-time indication and solving of troublesome program fragments. Interactive expression solving and simplification within the editor becomes practical, if not trivial. The annotation of attributes of blocks of code, e.g. referential transparency, const-ness of a class method, a structural dependency overview on a block of code (program entities used, potential changes made &c.) are trivially made. Key to all of this is that the functionality is up-front in the editor and timely.

Footnote 3: though this is slowly changing as a new generation of distributed RCSs becomes commonplace. See e.g. Git, Bazaar, Arch, Mercurial.

6. Conclusion

In this introductory paper I have outlined a design of a C++-based language workbench, and discussed my cross-platform, open-source implementation of this design, Martta. In so doing I have demonstrated it not only possible, but effective, for C++ to be used as the platform-language for a language workbench, hitherto implemented only with dynamic dispatch, run-time compiled programming environments such as Java and C#. My basic implementation shows signs that the design, together with the use of modern middleware where possible, delivers a comparatively small codebase given the functionality it delivers.

The design approach I have taken differs in several respects from those already proposed. Some differences, such as the judicious usage of multiple inheritance, make implementation in other languages impractical or impossible. Other differences, such as the tiered customisability of the display and the absolutist uncoupled approach to program input and display, are not platform-language-specific. Two user-interface examples demonstrate the initial promise in this design that I have found through my own informal user-interface testing.

Finally I have discussed several further directions this design, and language workbenches in general, could be taken.

References

A. Agrawal, G. Karsai, and A. Ledeczi. An end-to-end domain-driven software development framework. In OOPSLA '03: Companion of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 8–15, New York, NY, USA, 2003. ACM. ISBN 1-58113-751-6. doi: http://doi.acm.org/10.1145/949344.949347.
B. E. Birnbaum and K. J. Goldman. Achieving flexibility in direct-manipulation programming environments by relaxing the edit-time grammar. In VLHCC '05: Proceedings of the 2005 IEEE Symposium on Visual Languages and Human-Centric Computing, pages 259–266, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2443-5. doi: http://dx.doi.org/10.1109/VLHCC.2005.15.
D. Birsan. On plug-ins and extensible architectures. Queue, 3(2):40–46, 2005. ISSN 1542-7730. doi: http://doi.acm.org/10.1145/1053331.1053345.
M. Bravenboer, K. T. Kalleberg, R. Vermaas, and E. Visser. Stratego/xt 0.17. a language and toolset for program transformation. Science of Computer Programming, 72(1-2):52–70, 2008.
Nokia Corporation. Qt creator. Technical report, Nokia Corporation, 2009. URL http://qt.nokia.com/files/pdf/qt-creator-1.3-whitepaper.
S. Dmitriev. Language oriented programming: The next programming paradigm. OnBoard, Online Magazine, Issue 2, feb 2005. URL www.onboard.jetbrains.com/is1/articles/04/10/lop/.
K. J. Goldman. An interactive environment for beginning java programmers. Science of Computer Programming, 53(1):3–24, 2004.
J. M. Hart. Windows system programming. Addison-Wesley Professional, 2004.

A. Hillegass. Cocoa programming for Mac OS X. 2008.
A. J. Ko. Designing a flexible and supportive direct-manipulation programming environment. Visual Languages - Human Centric Computing, 0:277–278, 2004. doi: http://doi.ieeecomputersociety.org/10.1109/VLHCC.2004.18.
B. Meyer. Cepage: Toward computer-aided design of software. Journal of Systems and Software, 8(5):419–429, 1988.
R. Peterson. Linux programmer's reference. 1998.
B. Shneiderman. Direct manipulation: A step beyond programming languages (abstract only). In Proceedings of the joint conference on Easier and more productive use of computer systems. (Part - II), page 143, New York, NY, USA, 1981. ACM. ISBN 0-89791-064-8. doi: http://doi.acm.org/10.1145/800276.810991.
S. E. Sim, R. C. Holt, and S. Easterbrook. On using a benchmark to evaluate c++ extractors. pages 114–123, 2002.
C. Simonyi, M. Christerson, and S. Clifford. Intentional software. SIGPLAN Not., 41(10):451–464, 2006. ISSN 0362-1340. doi: http://doi.acm.org/10.1145/1167515.1167511.
T. Teitelbaum and T. Reps. The cornell program synthesizer: a syntax-directed programming environment. Commun. ACM, 24(9):563–573, 1981. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/358746.358755.
M. Ward. Language oriented programming. Software - Concepts and Tools, (15):147–161, 1994.
M. Ward and H. Zedan. Metawsl and meta-transformations in the fermat transformation system. In COMPSAC '05: Proceedings of the 29th Annual International Computer Software and Applications Conference, pages 233–238, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2413-3. doi: http://dx.doi.org/10.1109/COMPSAC.2005.107.
Appendix A: Base language graph.

Figure 6. Some of the key language components of the base language in Martta. Red/blue/black indicate notional/placeholder/proper nodes and solid/dashed edges represent strong/weak inheritance respectively. [Graph of concept classes rooted at Concept; not reproduced here.]
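The distinction between strong and weak inheritance edges in Figure 6 could be modelled, for illustration, with a small registry of concept kinds. This is a sketch under assumptions rather than Martta's mechanism: KindInfo, KindRegistry, isA, fulfils and demoKinds are hypothetical names, and the tiny hierarchy below (with Member and Variable placed directly under Concept) is made up around the MemberVariable example from the text.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one concept kind.
struct KindInfo {
    std::string super;              // strong ("is-a") parent; empty for the root
    std::vector<std::string> weak;  // weakly fulfilled concepts
};

class KindRegistry {
public:
    void add(const std::string& name, const std::string& super,
             std::vector<std::string> weak = {}) {
        kinds_[name] = KindInfo{super, std::move(weak)};
    }

    // True if `name` reaches `target` along strong edges only.
    // Throws std::out_of_range for an unregistered kind.
    bool isA(const std::string& name, const std::string& target) const {
        for (std::string k = name; !k.empty(); k = kinds_.at(k).super)
            if (k == target) return true;
        return false;
    }

    // True if `name` reaches `target` along any mix of strong and weak edges.
    bool fulfils(const std::string& name, const std::string& target) const {
        if (name.empty()) return false;
        if (name == target) return true;
        const KindInfo& info = kinds_.at(name);
        if (fulfils(info.super, target)) return true;
        for (const auto& w : info.weak)
            if (fulfils(w, target)) return true;
        return false;
    }

private:
    std::map<std::string, KindInfo> kinds_;
};

// Illustrative hierarchy only.
KindRegistry demoKinds() {
    KindRegistry r;
    r.add("Concept", "");
    r.add("Member", "Concept");
    r.add("Variable", "Concept");
    r.add("MemberVariable", "Member", {"Variable"});  // strong: Member, weak: Variable
    return r;
}
```

With this registry, MemberVariable is a Member in the strong sense, yet code that only needs something usable as a Variable can still accept it through fulfils.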