The following paper was originally published in the Proceedings of the Conference on Domain-Specific Languages Santa Barbara, California, October 1997

KHEPERA: A System for Rapid Implementation of Domain Specific Languages

Rickard E. Faith, Lars S. Nyland, Jan F. Prins University of North Carolina, Chapel Hill

For more information about USENIX Association contact: 1. Phone: 510 528-8649 2. FAX: 510 548-5738 3. Email: [email protected] 4. WWW URL:http://www.usenix.org

Khepera: A System for Rapid Implementation of Domain Sp eci

Languages

Rickard E. Faith Lars S. Nyland Jan F. Prins

Department of Computer Science

University of North Carolina

CB 3175, Sitterson Hal l

Chapel Hil l NC 27599-3175

ffaith,nyland,[email protected]

Abstract ample a C is attractive b ecause it pro-

vides p ortabilityover a large class of architectures,

while achieving p erformance through the near uni-

The Khepera system is a to olkit for the rapid im-

versal availability of architecture-sp eci c optimizing

plementation and long-term maintenance of domain

C .

sp eci c languages DSLs. Our viewp oint is that

Yet there are some drawbacks to this approach.

DSLs are most easily implemented via source-to-

While DSLs are often simpler than general purp ose

source translation from the DSL into another lan-

programming languages, the domain-sp eci c infor-

guage and that this translation should b e based on

mation available may result in a generated program

simple , sophisticated tree-based analysis and

that can b e much larger and substantially di erent

manipulation, and source generation using pretty-

in structure than the original co de written in the

printing techniques. Khepera emphasizes the use

DSL. This can debugging very dicult: an

of familiar, pre-existing to ols and provides supp ort

exception raised on some line of an incomprehensi-

for transformation replay and debugging for the DSL

ble C program generated by the DSL pro cessor is a

pro cessor and end-user programs. In this pap er, we

long way removed in abstraction from the DSL input

presentanoverview of our approach, including im-

program.

plementation details and a short example.

Since the DSL pro cessor is comp osed with a native

high-level compiler, and do es not have to p erform

machine-co de generation or optimization, we b elieve

1 Intro duction

that there are some basic di erences between the

construction of a compiler for a general purp ose pro-

gramming language and the construction of a trans-

Domain sp eci c languages DSL can often be im-

lator for a DSL. Our view is that DSL translation is

plemented as a source-to-source translator comp osed

most simply expressed as

with a pro cessor for another language. For example,

PIC [8], a classic \little language" for typ esetting

gures, is translated into , a general-purp ose

1. simple parsing of input into an abstract syntax

typ esetting language. Language comp osition can b e

tree ast,

extended in either direction: the CHEM language

2. translation via sophisticated tree-based analysis

[1], a DSL used for drawing chemical structures, is

and manipulation, and

translated into PIC, while troff is commonly trans-

lated into PostScript.

3. output source generation using versatile pretty-

Other DSLs translate into general-purp ose high-level

printing techniques.

programming languages. For example, ControlH, a

DSL for the domain of real-time Guidance, Naviga-

We add the additional caveat that the translation

tion, and Control GN&C software, translates into

pro cess retain enough information to supp ort the in-

Ada [5]; and Risla, a DSL for nancial engineering,

verse mapping problem, i.e., given a lo cus in the out-

translates into COBOL [18].

put source, determine the tree manipulations and in-

The comp osition of a DSL pro cessor with for ex- put source elements that are resp onsible for it. This

@ @ @

@ @ @

  

2 1

`

C C B C C B C C B   

 C C  C C  C C   

H



 B  B  B

H



H

 B  B  B

H

 B  B  B H

' $ ' $

T T T

0 1

`

Final Original

Source Source

Co de Co de

&  & 

0

P P

Figure 1: Transformation Pro cess

facilitywould b e useful b oth for the DSL develop er 1.1 Goals for a DSL Implementation

to trace erroneous translation and for the DSL user

To olkit

to trace run-time errors back to the input source.

For the translation step we advo cate the use of arbi-

The implementation of a DSL translator can require

trary ast traversals and transformations. We be-

considerable overhead, b oth for the initial implemen-

lieve that this approach is simpler for source-to-

tation and as the DSL evolves. A to olkit should

source translation than the use of attribute gram-

leverage existing, familiar to ols as much as p ossi-

mars, since it decouples the ast analysis and pro-

ble. Use of such to ols takes advantage of previous

gram synthesis from the grammar of the input and

implementor knowledge and the availability of com-

output languages. Further, this approach minimizes

prehensive resources explaining these to ols which

the need for parsing \heroics", since simple gram-

may not b e widely available for a DSL to olkit.

mars, close or identical to the natural sp eci cation

of the DSL syntax, can b e used to generate an ast

A transformational mo del for DSL design ts in well

that is sp ecialized in subsequent analysis. By decou-

with these high-level goals. Consider the problem of

pling the input grammar, translation pro cess, and

translating a program, P , written in the domain sp e-

output grammar, this approach is b etter able to ac-

ci c language, L. In Figure 1, T is an ast which

0

commo date changes during the evolution of the DSL

represents P after the parsing phase, . T is the

`

syntax and semantics.

0

nal transformed ast, and P isavalid program in

0

the output language, L , constructed from T during

Throughout this pap er, we will use \ast" to refer

`

the pretty-printing phase,  . The transformation

to abstract syntax tree derived from parsing the in-

pro cess is viewed as a sequential application of var-

put le, and to any intermediate tree-based repre-

ious transformations functions,  T =T ,to

sentations derived from this original ast, even if

k +1 k k+1

the ast. The determination of which transforma-

those representations do not strictly represent an

tion function to apply next may require extensive

\abstract syntax".

analysis of the ast. Once the transformation func-

In our own work we use the DSL paradigm in the

tions are determined, however, they can b e rapidly

compilation of parallel programs. We are particu-

applied for replay or debugging.

larly interested in the translation into HPF of ir-

Within a transformational mo del, a DSL-building

regular computations expressed in the Proteus [12]

to olkit can simplify the implementation pro cess by

language, a DSL providing sp ecialized notation. Our

providing sp ecialized to ols where pre-existing to ols

observation was that wewere sp ending a disprop or-

are not already available, and to transparently inte-

tionate amount of e ort working on a custom trans-

grate supp ort for debugging within this framework.

lator implementation to incorp orate changes in Pro-

teus syntax and improvements in the translation

The Khepera system facilitates b oth the problem

scheme|thus wewere motivated to investigate gen-

of rapid DSL prototyping and the problem of long-

eral to ol supp ort for DSL translation to simplify this

term DSL maintenance through the following sp e-

pro cess.

ci c design goals:

Familiar, mo dularized parsing comp onents.

Khepera supp orts the use of familiar scanning and

parsing to ols e.g., the traditional lex and yacc,or

the newer PCCTS [11] for implementation of a DSL

Khepera library as discussed in Section 4.6 and pro cessor. Because Khepera concentrates on pro-

shown in Figure 8 and Figure 9; or the transforma- viding the \missing pieces" that help with rapid im-

tions are written using explicit calls to the Khepera plementation of DSLs, previous knowledge can be

library tree manipulation functions. In either case, utilized, thereby decreasing the slop e of the learning

low-level ho oks in the Khepera library track debug- curve necessary for the rapid implementation of a

ging information when no des or subtrees are created, DSL.

destroyed, copied, or replaced. This low-level in-

formation can b e analyzed to provide the abilityto

Familiar, exible, and ecient semantic anal-

navigate through intermediate versions of the trans-

ysis. Khepera uses the source-to-source transfor-

formed program, and the abilityto answer sp eci c

mational mo del outlined in Figure 1. This mo del

queries that supp ort the debugging of the nal trans-

uses tree-pattern matching for ast manipulation,

formed output:

analysis, and attribute calculation. For tedious

but common tasks, such as tree-pattern match-

 setting breakp oints

ing, sub-tree creation, and sub-tree replacement,

Khepera provides a little language for describ-

 determining current execution lo cation e.g., in

ing tree matches and for building trees. For un-

resp onse to a breakp oint or program exception

predictable or language-sp eci c tasks, such as at-

tribute manipulation or analysis, the Khepera little

 rep orting a pro cedure traceback

language provides an escap e to a familiar general-

purp ose C. Standard tree

 displaying values of variables

traversal algorithms are supp orted e.g., b ottom up,

top down, as well as arbitrarily complicated syntax-

These tracking and debugging capabilities are the

directed sequencing. Rapid pattern matching is pro-

sub ject of Faith's forthcoming dissertation and will

vided via data-structure maintenance, which can

b e not b e discussed in detail in this pap er. An exam-

p erform rapid pattern matches in a standard tree

ple of setting a breakp oint will b e shown in Section 4.

traversal order for many commonly-used patterns.

Familiar output mechanism. A pretty-printing

facility is provided that can output the ast in an

2 Related Work

easily readable format at any time. One strong ad-

vantage of this pretty-printer when compared with

Khepera is similar to some compiler construction

other systems is that it will always b e able to print

kits. However, these systems usually restrict the

the ast, regardless of howmuch of the transforma-

scanning and parsing to ols used [6]; sp ecify ast

tion has b een p erformed. If the ast is in the original

transformations using a low-level language, such

input format or the original output format, then the

as C [17] instead of a high-level transformation-

pretty-printed program will probably b e executable

0

oriented language; or require that the ast always

in the input language, L, or the output language, L .

conforms to a single grammar sp eci cation, making

However, if the ast b eing printed is one of the T in-

n

translation from one language to another dicult

termediate trees, then the output will use some com-

0

[4, 3, 14]. Further, some systems rely on an attribute

bination of the syntax of L and L , with a fallbackto

grammars for all ast transformations, without pro-

simple S-expressions for ast constructs which do not

viding for a more general-purp ose scheme for tree-

have well-de ned concrete syntax. While the pro-

pattern matching and replacement.

gram printed may not be executable, it do es use a

familiar syntax whichmay b e helpful for the human

Sorcerer, from the PCCTS to olkit [11], is the most

when replaying transformations while debugging.

similar, since it do es not require the use of sp eci c

scanning and parsing to ols, and since it provides a

Debugging supp ort for DSL translation.

\little language" in the style of lex and yacc with

Khepera tracks transformation application and

emb edded pro cedures written in another general-

ast mo di cations, can replay the transformation

purp ose programming language e.g., C. Sorcerer

sequence, and has supp ort for answering questions

and Khepera share abilities to describ e tree struc-

ab out which transformations were applied at which

tures, p erform syntax-directed translations, and

p oints on the ast. This is helpful when writing and

supp ort the writing of ast-based interpreters. In

debugging the DSL pro cessor, as well as when im-

contrast, Khepera also supp orts rule-based trans-

plementing a debugger for the DSL itself.

lations that do not require a complete grammar sp ec-

i cation; Khepera rules are well suited for the con-

Transformations are either written in the high-

struction of \use-def " chains, data- ow dep endency

level Khepera language and are transformed by

graphs, and other compiler-required analyses; and

Khepera into executable C with calls to the

writing pretty-printer rules in Khepera do es not

they are written in an ad ho c manner using the un- require a complete tree-grammar sp eci cation. This

derlying Khepera ast manipulation library, then allows pretty-printing to easily take place during

the debugging tracking and transformation replay grammar evolution.

supp ort will b e automatically provided.

None of the previous systems, including Sorcerer,

An overview of how the Khepera system ts into a contain built-in supp ort for \replay" of transforma-

complete DSL implementation solution is shown in tions, or for automatic and transparent tracking of

Figure 3. In the example shown in the next section, debugging information. When translating programs

we explain how the scanner and parser sp eci cations from one language to another, the \discovery" of the

are simpli ed by using calls to the Khepera library b est order for transformation application is often dif-

and will provide an example showing how other im- cult, involving considerable ast analysis. The co de

p ortant input les are sp eci ed. to p erform this analysis is often dicult to verify or

is undergoing constantchange during the implemen-

In Figure 4, the \DSL Pro cessor" from from Fig-

tation phase of a DSL. However, after the transfor-

ure 3 is expanded, showing the basic blo cks that are

mations are discovered and recorded in a ,

created from the source co de and showing how the

a much simpler program i.e., one that is easier to

DSL pro cessor is used during the compilation of a

verify could b e written that applies all of the discov-

program written in the DSL.

ered transformations in the sp eci ed order, thereby

proving, by construction, that the translation pre-

serves semantics. In this case, only the seman-

tics preserving characteristics of the transformations

4 Example

themselves must b e proven|not the co de which p er-

forms analysis and discovery. While we have not

yet implemented such a prover, wehave utilized the

A simple language translation problem based on [12]

transformation discovery and replay capabilities of

will b e used to illustrate the Khepera system. The

Khepera to implement a browser that presents in-

DSL is a subset of 90 with the addition

termediate views of the transformation pro cess, and

of a sequence comprehension construct that can b e

which can answer typical queries p osed by a debug-

used to construct nested sequences. The transla-

ger see Section 4.6.

tion problem is to remove all sequence comprehen-

sion constructs and replace them with simple data-

parallel op erations, yielding a program suitable for

compilation with a standard Fortran 90 compiler.

3 Overview of Khepera

4.1 Example DSL Syntax

The Khepera library provides low-level supp ort for:

 building an ast

The lexical elements of the DSL are:

 applying transformation rules to the ast tree

Id Num / /   + , : = in

traversal, matching, and replacement

0

source co de from the  \pretty-printing" the P

A program is describ ed by the following context-free

T ast pretty-printing is actually the  \trans-

`

grammar CFG:

formation"

program ::= statement-list

An overview of the Khepera system is shown in

statement ::= Id = expression

Figure 2. Khepera encapsulates low-level details

statement-list ::= statement

of the DSL implementation: ast manipulation,

symbol and typ e table management, and manage-

j statement-list statement

ment of line-numb er and lexical information. On a

higher level, library routines are available to supp ort

expr ::= Id

pretty-printing currently, with a small language to

j Num

describ e how to print each no de typ e in the ast,

typ e inference, and the tracking functions for debug-

j expr + expr

ging information. Further, a \little language" has

b een implemented to supp ort a high-level descrip-

j length expr 

tion of the transformation rules. If transformation

j range expr 

rules are written in the Khepera language, or if



Transformation

Debugger

Programs

Language

Interface



Tree

High-level

Pretty Printer Typ e Inference

Transformation

Routines



ast

Source Co de Memory

Low-level Low-level Data

Manipulation

Management Management

Routines Structures

Figure 2: The Khepera Transformation System

DSL Processor Intermediate Intermediate Source Processors Source

Scanner Scanner Spec. Flex/Lex Source

Parser Parser Spec. Bison/Yacc Source

Native AST Node Compiler Definitions

Transformation Pretty Source Printer Spec. DSL Processor

Khepera Pretty Transformation Printer Rules Compiler Source Khepera Lib. Type Inference Setup Other Source

Transformation

Sequencing

Figure 3: Using the Khepera Transformation System DSL Processor Executable

DSL AST Program Builder

Transformation AST Engine Special DSL Libraries

Pretty C/Fortran/Etc. C/Fortran/Etc. Printer Source Compiler

Executable from DSL

Khepera Library Program

Figure 4: Using the DSL Pro cessor

j dist expr , expr  A = range3;

B = / i in A: i + i /;

j / expr-list /

C = / i in A:

/ j in rangei: i / /

j / Id in expr : expr /

yields:

For this example, we use the array constructor nota-

tion from Fortran 90 to sp ecify literal sequences and

a similar notation to sp ecify the sequence compre-

A = / 1, 2, 3 /

hension construct. However, the sequence compre-

B = / 2, 4, 6 /

hension construct creates arbitrarily nested, irregu-

C = / / 1 /,

lar sequences. In contrast, the array constructor

/ 2, 2 /,

from Fortran 90 can only generate vectors or rect- / 3, 3, 3 / /

angular arrays.

We omit here a collection of typ e inference rules

for the language that de ne a well-typ ed program.

4.2 Example DSL Semantics

4.3 Example Translation

DSL values havetyp es drawn from D = IntjSeqD .

We de ne, 8n 2 Int ;c 2 D:

We view a program in terms of the natural ast cor-

rangen = / 1; 2;:::;n /

resp onding to the CFG of Section 4.1. In the ast,

distc; n = / c;c;:::;c /

an application of one of the four basic op erations is

written as a function application no de with the op-

eration to be applied in the name attribute and a

with lengthdistc; n = lengthrangen = n.

depth attribute that is 0. The children of the no de

For an expression, e, the sequence comprehension

are expressions for each of the arguments.

The following 3 rules can be used to eliminate all

/ i in A : ei /

sequence comprehension constructs from the ast:

yields the sequence of successivevalues of e obtained

Rule 1

when i is b ound to successivevalues in A.

/ x in e : x / ! e

For example, the sample program: 1 1 1 1

Rule 2 Provided e is an Id or Num, and e 6= x , the Khepera library ast construction routines. At

2 2 1

the level of the scanner, Khepera provides sup-

/ x in e : e /

1 1 2

p ort for source co de line number and token o set

! dist e ; length e 

2 1

tracking. This supp ort is optional, but is very help-

ful for debugging. If the implementor desires line

Rule 3

numb er and token o set tracking, the scanner must

/ x in e :

1 0

interact with Khepera in three ways: rst, each

fn app name = f;

line of source co de must be registered. In versions

depth = d;

of lex that supp ort states, providing this informa-

args = n;

tion is trivial although inecient, as show in Fig-

e ;:::;e /

1 n

ure 5. For other scanner generators, or if scanning

eciency is of great concern, other techniques can b e

! fn app name = f;

used. The routine src_line stores a copy of the line

depth = d +1;

using low-level string-handling supp ort. While the

args = n;

routines used in these examples are tailored for lex

/ x in e : e /;

1 0 1

semantics, the routines are generally wrapp er rou-

:::;

tines for lower-level Khepera functions and would,

/ x in e : e / 

1 0 n

therefore, b e easy to implement for other front-end

to ols.

The resultant ast can be written out in as For-

tran 90 with the depth attribute supplied as an ex-

Khepera also handles interpretation of line number

tra argument to the basic functions add, length,

information generated by the C prepro cessor. This

range, dist. Given an appropriate implementa-

requires a simple lex action:

tion of these basic four functions, the resultant pro-

gram sp eci es fully parallel execution of each se-

^n .* src_cpp_lineyytext, yyleng;

quence comprehension construct, regardless of the

degree of nesting and sequence sizes.

Finally,every scanner action must advance a p ointer

For example, using these rules, the program from

to the current p osition on the current line. This is

Section 4.2 would b e transformed as follows using

accomplished byhaving every action make a call to

f ::: as a shorthand for fn appname = f;::::

src_getyyleng,a minor inconvenience that can

b e encapsulated in a .

A = rangedepth=0, 3

B = adddepth=1, A, A

The pro ductions in the parser need only call

C = distdepth=1,

Khepera tree-building routines|all other work can

A,

b e reserved for later tree walking. This tends to sim-

lengthdepth=1,

plify the parser description le, and allows the imple-

rangedepth=1, A

mentor to concentrate on parsing issues during this

phase of development. A few example yacc pro duc-

tions are shown in Figure 6. The second argumentto Note that functions with depth = 0 op erate on

tre_mk is a p ointer to the optional source p osition scalar arguments, whereas functions with depth =1

information obtained during scanning. The abstract op erate on vector arguments.

representation of the constructed ast is that of an

The rules shown for this example are terminating

n-ary tree, and routines are available to walk the

and con uent. When the source language is more

1

tree using this viewp oint.

expressive and optimization b ecomes an issue, the

rules used are not necessarily terminating, hence ad-

Immediately after the parsing phase, the ast is

ditional sequencing rules must be added to control

available for printing. Without any pretty-printer

rule application [10].

description, the ast is printed as a nested S-

expression, as shown in Figure 7.

In the following sections, we shall show how

Khepera can be used to implement translations,

such as the one sp eci ed ab ove, in an ecient man-

4.5 Pretty-printing

ner.

For pretty-printing, Khepera uses a mo di cation

4.4 Parsing and ast Construction

of the algorithm presented by [9]. This algorithm

1

Physically, the tree is stored as a rotated binary tree,

The ast is constructed using a scanner and parser

although other underlying representations would also be

generator of the implementor's choice with calls to p ossible.

NL nn

...



f

.*fNLg src_lineyytext,yyleng; yyless0; BEGINOTHER;

.* src_lineyytext,yyleng; yyless0; BEGINOTHER;

g

...

fNLg BEGININITIAL;

Figure 5: Storing Lines While Scanning

opment, even b efore the complete pretty-printer de-

Statement: Identifier '=' Expression

scription is nished and debugged.

{ $$ = tre_mkN_Assign, $2.src,

For printing which requires lo cal analysis,

$1, $3, 0; }

implementor-de ned functions can b e used to return

;

pre-formatted information or to force a line break.

These functions are passed a p ointer to the current

StatementList: Statement

no de, so they have access to the complete ast from

{

the lo cus b eing printed. While the pretty-printer

$$ = tre_mkN_StatementList,

is source-language indep endent and is unaware of

tre_src$1,

the sp eci c application-de ned attributes present

$1, 0;

on the ast, the implementor-de ned functions have

}

access to all of this information. We typically use

| StatementList Statement

these functions to format typ e information or to

{

add comments to the generated source co des.

$$ = tre_append$1, $2;

}

Additional pretty-printer description syntax allows

;

line breaks to be declared as \inconsistent" or

2

\consistent" ; allows for forced line breaks; and p er-

mits indentation adjustment after breaks.

Figure 6: Building the ast While Parsing

is linear in space and time, and do es not backtrack

when printing. The implementation was straightfor-

ward, with mo di cations added to supp ort source

2

See [9] for details. Each group mayhave several places

line tracking and formatted pretty-printing. Other

where a break is p ossible. An inconsistent break will select

algorithms for pretty printing, some of which sup-

one of those p ossible places to break the line, whereas a con-

p ort a ner-grain control over the formatting, are

sistent break will select al l of these places if a break is needed

presented in [7,2,15,16].

anywhere in the group. This allows the following formatting

to b e realized assuming breaks are p ossible b efore +:

For eachnodetyp e in the ast, a short description,

using printf-like syntax, tells how to print that

no de and its children. If the no de can have several

di erent numb ers of children, several descriptions

Inconsistent

maybe present, one for eachvariation. List no des

 x = a+b+c

mayhave an unknown number of children. Multiple

+d+e+f

descriptions may b e present for multiple languages,

Consistent

with \fallback" from one language to another sp ec-

i ed at printing time so, Fortran may be printed

 x = a

for all of those no des that haveFortran-sp eci c de-

+b

scriptions, with initial fallback to unlab eled no des

+c

+d

p erhaps for C or for the original DSL, and with -

+e

nal fallback to generic S-expressions. This fallback

+f

scheme provides usable pretty-printing during devel-

4.6 The Khepera Transformation

Language

Khepera transformations are sp eci ed in a sp e-

cial \little language" that is compiled into C co de

for tree-pattern matching and replacement. A sim-

Original Program:

ple transformation rule conditionally matches a tree,

builds a new tree, and p erforms a replacement. The

A = rangedepth=0, 3

rule that implements the rst sequence comprehen-

B = / i in A : i + i /

sion elimination transformation Rule 1 from Sec-

C = / i in A :

tion 4.3 is shown in Figure 8.

/ j in rangedepth=0, i :

i / /

rule eliminate_iterator1

Initial ast with attribute values shown after the

f

slash:

match N_SequenceBuilder

N_Iterator id1:N_Identifier D:.

N_StatementList

id2:N_Identifier

N_Assign

when tre_symbolid1

N_Identifier/"A"

== tre_symbolid2

N_Call

build new with D

N_Identifier/"range"

replace with new

N_ExpressionList

g

N_Integer/3

N_Assign

N_Identifier/"B"

Figure 8: Simple Transformation Rule

N_SequenceBuilder

N_Iterator

In Figure 8, a tree pattern follows the match key-

N_Identifier/"i"

word. Tree patterns are written as S-expressions for

N_Identifier/"A"

convenience. The tree pattern in this example is

N_Add

compiled to the pattern matching co de shown in the

N_Identifier/"i"

rst part of Figure 9 co de for sections of the rule

N_Identifier/"i"

follow the comment containing that section.

N_Assign

N_Identifier/"C"

The when expression, which contains arbitrary C

N_SequenceBuilder

co de, guards the match, preventing the rest of the

N_Iterator

rule from b eing executed unless the expression evalu-

N_Identifier/"i"

ates to true. The build statement creates a new sub-

N_Identifier/"A"

tree, taking care to copy subtrees from the matched

N_SequenceBuilder

tree, since those subtrees are likely to b e deleted by

N_Iterator

a replace command.

N_Identifier/"j"

The tracking necessary for debugging and transfor-

N_Call

mation replay is p erformed at a low-level in the

N_Identifier/"range"

Khepera library. However, the Khepera lan-

N_ExpressionList

guage translator automatically adds functions with

N_Identifier/"i"

names starting with trk_ to the generated rules.

N_Identifier/"i"

These functions add high-level descriptive informa-

tion which allows ne-grain navigation during trans-

formation reply, but which is not necessary for an-

swering debugger queries.

Figure 7: Example Input and Initial ast

A more complicated Khepera rule is shown in Fig-

ure 10. This rule implements the third sequence

comprehension elimination transformation Rule 3

from Section 4.3.

The example in Figure 10 uses the children

statement to iterate over the children of the

int rule_eliminate_iterator1 int *_kh_flag, tre_Node _kh_node 

{

const char *_kh_rule = "rule_eliminate_iterator1";

Node _kh_pt;

Node this = NULL; /* sym */

Node id1 = NULL; /* sym */

Node D = NULL; /* sym */

Node id2 = NULL; /* sym */

Node new = NULL;

/* match this:N_SequenceBuilder

N_Iterator id1:N_Identifier D:. id2:N_Identifier */

_kh_pt = _kh_node;

if _kh_pt && tre_id this = _kh_pt  == N_SequenceBuilder {

_kh_pt = tre_child _kh_pt ; /* N_Node */

if _kh_pt && tre_id _kh_pt  == N_Iterator {

_kh_pt = tre_child _kh_pt ; /* N_Node */

if _kh_pt && tre_id id1 = _kh_pt  == N_Identifier {

_kh_pt = tre_right _kh_pt ;

if _kh_pt {

D = _kh_pt;

_kh_pt = tre_parent _kh_pt ;

_kh_pt = tre_right _kh_pt ;

if _kh_pt && tre_id id2 = _kh_pt  == N_Identifier {

_kh_pt = tre_parent _kh_pt ;

assert _kh_pt == _kh_node ;

/* when tre_symbolid1 == tre_symbolid2 */

if tre_symbolid1 == tre_symbolid2 {

trk_application _kh_rule, _kh_node ;

/* build new with D */

new = tre_copyD;

/* replace with new */

++*_kh_flag;

trk_work _kh_rule, _kh_node ;

tre_replace _kh_node, new ;

}

}

}

}

}

}

return 0;

}

Figure 9: Generated Tree-Pattern Matching Co de

rememb er to add tracking capabilities to his trans-

rule dp_func_call

formations, the overhead of implementing debugging

f

supp ort in a DSL pro cessor is greatly reduced.

match this:N_SequenceBuilder

The tracking algorithms asso ciate the tree b eing

iter:N_Iterator

transformed T in Figure 1, the transformation rule

i

f:N_Call

  b eing applied, and the sp eci c changes made to

fn:N_Identifier

the ast. This information can then be analyzed

plist:N_ExpressionList

to answer queries ab out the transformation pro cess.

For example, the DSL implementor mayhave iden-

build newPlist with N_ExpressionList

ti ed twointermediate asts, T and T , and may

i i+1

children plist f

ask for a summary of the changes b etween these two

match p:.

asts.

build next with N_SequenceBuilder

iter p

On a more sophisticated level, the user may iden-

do f tre_appendnewPlist, next; g

tify a no de in the DSL program and request that a

g

breakp oint be placed in the program output. An

example of this is show in Figure 11. Here, the

build call with N_Call fn newPlist

user clicked on the scalar + no de in the left win-

delete newPlist

dow. In the right window, the generated program,

do f call->prime = f->prime + 1; g

after 13 transformations have b een applied, is dis-

replace with call

played, showing that the breakp oint should be set

g

on the call to the vector add function.

At this p oint, the user could navigate backward

Figure 10: Iterator Distributing Transformation

and forward among the transformations, viewing the

Rule

particular intermediate asts whichwere involved in

transforming the original + into the call to add. The

ability to navigate among these views is unique to

N_ExpressionList no de, and uses the do state-

the Khepera system and helps the user to under-

ment as a general-purp ose escap e to C. This es-

stand how the transformations changed the original

cap e mechanism is used to build up a new list

program. This is esp ecially useful when many trans-

with the tre_append function, and to mo dify an

formations are comp osed.

implementor-de ned attribute prime.

The tracking algorithms can also b e used to under-

Khepera language features not discussed here in-

stand relationships b etween variables in the original

clude the use of a conditional if-then-else statement

and transformed programs. For example, in Fig-

in place of a when statement, the ability to break

ure 12, the user has selected an iterator variable i

out of a children lo op, and the abilityto p erform

whichwas removed from the nal transformed out-

tree traversals of matched subtree sections this is

put. In this case, b oth o ccurrences of A are marked

useful when an expression must b e examined to de-

in the nal output, showing that these vectors cor-

termine if it is indep endent of some variable under

resp ond, in some way, to the use of the scalar i in

consideration.

the original input.

In addition to the \forward" tracking, describ ed

4.7 Debugging with Khepera

here, Khepera also supp orts reverse tracking,

which can b e used to determine the current execu-

tion p oint in source terms, or to map a compile or

The Khepera library tracks changes to the ast

run-time error back to the input source.

throughout the transformation pro cess. The track-

ing is p erformed, automatically, at the lowest levels

of ast manipulation: creation, destruction, copy-

ing, and replacement of individual no des and sub-

5 Conclusion and Future Work

trees. This tracking is transparent, assuming that

the programmer always uses the Khepera ast-

In this pap er, we have presented an overview of manipulation library, either via direct calls or via

our transformation-based approach to DSL pro ces- the Khepera , to p erform

sor implementation, with emphasis on how this ap- all ast transformations. This assumption is reason-

proach provides increased ease of implementation able b ecause use of the Khepera library is required

and more exibility during the DSL lifetime when to maintain ast integrity through the transforma-

compared with more traditional compiler implemen- tion pro cess. Since the programmer do es not haveto

Figure 11: Debugging with Khepera Example 1

Figure 12: Debugging with Khepera Example 2

tation metho ds. More details on this work will b e presented in a fu-

ture pap er.

In the previous section wehave provided an overview

of the Khepera system using a small example. Another advantage of Khepera is the supp ort for

We have shown how the Khepera library sup- debugging via transformation replay. When the

p orts ast construction and pretty-printing, and transformation are applied to the ast using the

have demonstrated some of the capabilities of the Khepera library supp ort with or without using

Khepera transformation language and debugging the Khepera transformation language, then those

system. Many additional features of the Khepera transformations are tracked and can b e replayed at

system are dicult to demonstrate in a short pap er. a later time. Khepera includes supp ort for arrang-

These features include low-level supp ort for common ing the transformations in an abstract hierarchy,

compiler-related data structures such as hash tables, thereby facilitating meaningful viewing by a DSL

skip lists, string p o ols, and symbol tables and for implementor. As part of a complete debugging sys-

high-level functionality such as typ e inference and tem, Khepera also provides mappings which allow

typ e checking. The availability of these commonly- lo ci in the output source to b e mapp ed back through

used features in the Khepera library can shorten the ast transformations to the input source writ-

the time needed to implement a DSL pro cessor. ten in the DSL. These debugging capabilities are

the sub ject of Faith's forthcoming dissertation.

Further, wehave found that keeping lists of no des,

by typ e, can dramatically improve transformation

sp eed. Instead of traversing the whole ast,we tra-

verse only those no de typ es which will yield a match

6 Availability

for the current rule. However, since some trans-

formations may assume a pre-order or p ost-order

traversal of the ast, the \fast tree walk" problem

Snapshots of the Khepera library, including work-

is more dicult that simply keeping no de lists: the

ing examples similar to those discussed in this pa-

lists must b e ordered and the data structure holding

p er, are available from ftp://ftp.cs.unc.edu/-

the lists must b e up dateable during the tree traver-

pub/projects/proteus/src/.

sal this eliminates many balanced binary trees from

consideration for the underlying data structure. We

have found that an implementation based on skip

lists [13] was viable|preliminary empirical results

References

demonstrate a signi cant transformation sp eed com-

pared with pattern matching over the whole ast.

[1] Jon L. Bentley, Lynn W. Jelinski, and Brian W.

[13] William Pugh. Skip lists: a probabilistic alter- Kernighan. CHEM|a program for phototyp e-

native to balanced trees. Communications of setting chemical structure diagrams. Comput-

the ACM, 336:668{76, June 1990. ers and Chemistry, 114:281{97, 1987.

[2] Rob ert D. Cameron. An abstract pretty printer.

[14] Reasoning Systems. Refine user's guide, 25

IEEE Softw., 56:61{7, Nov. 1988.

May 1990.

[3] James R. Cordy and Ian H. Carmichael. The

[15] Lisa F. Rubin. Syntax-directed pretty

TXL programming language syntax and infor-

printing|a rst step towards a syntax-directed

mal semantics, version 7. Software Tech-

editor. IEEE Trans. on Softw. Eng., SE-

nology Lab oratory, Department of Computing

92:119{27, Mar. 1983.

and Information Science, Queen's Universityat

[16] Martin Ruckert. Conservative pretty printing.

Kingston, June 1993.

SIGPLAN Notices, 322:39{44, Feb. 1997.

[4] James R. Cordy, Charles D. Halp ern-Hamu,

[17] S. Tjiang, M. Wolf, M. Lam, K. Piep er, and

and Eric Promislow. TXL: a rapid prototyp-

J. Hennessy. Integrating Scalar Optimization

ing system for programming language dialects.

and Parallelization. Languages and Compil-

Comp. Lang., 161:97{107, Jan. 1991.

ers for Paral lel Computing Fourth Interna-

[5] Matt Englehart and Mike Jackson. ControlH:

tional Workshop Santa Clara, California, 7{9

A Fourth Generation Language for Real-time

Aug. 1991. Published as U. Banerjee, D. Gel-

GN&C Applications. Symp. on Computer-

ernter, A. Nicolau, and D. Padua, editors,

Aided Control System Design Tucson, Arizona,

Lecture Notes in Computer Science, 589:137{

Mar. 1994, Mar. 1994.

51. Springer-Verlag, 1992. An overview

of a more recent version of SUIF is avail-

[6] J. Grosch and H. Emmelmann. A tool box

able as Rob ert P. Wilson, Rob ert S. French,

for compiler construction, Compiler Generation

Christopher S. Wilson, Saman P. Amaras-

Rep ort No. 20. GMD Forschungsstelle an der

inghe, Jennifer M. Anderson, Steve W. K.

Universitat Karlsruhe, 21 Jan. 1990.

Tjiang, Shih-Wei Liao, Chau-Wen Tseng, Mary

[7] Matti O. Jokinen. A language-indep endent

W. Hall, Monica S. Lam, and John L. Hen-

prettyprinter. Softw.|Practice and Experience,

nessy, SUIF: An Infrastructure for Research on

199:839{56, Sep. 1989.

Paral lelizing and Optimizing Compilers, avail-

able at http://suif.stanford.edu/suif/-

[8] Brian W. Kernighan. PIC|a language

suif-overview/suif.html.

for typ esetting graphics. Softw.|Practice

and Experience, 12:1{21, 1982. Published

[18] Arie van Deursen and Paul Klint. Lit-

as AT&T Bell Lab oratories Murray Hill,

tle languages: little maintenance? Pro-

New Jersey Computing Science Technical

ceedings of DSL '97 First ACM SIG-

Rep ort No. 116: PIC|a graphics lan-

PLAN Workshop on Domain-Speci c Lan-

guage for typesetting user manual, available

guages Paris, France, 18 Jan. 1997. Pub-

as http://cm.bell-labs.com/cm/cs/cstr/-

lished as University of Il linois Computer Sci-

116.ps.gz.

ence Report, http://www-sal.cs.uiuc.edu/-

~kamin/dsl/:109{27, Jan. 1997.

[9] Derek C. Opp en. Prettyprinting. ACM Trans.

on Prog. Lang. and Sys., 24:465{83, Oct.

1980.

[10] Daniel William Palmer. Ecient execution of

nested data-paral lel programs. PhD thesis, pub-

lished as Technical rep ort TR97-015. University

of North Carolina at Chap el Hill, 1996.

[11] Terence John Parr. Language Translation Using

PCCTS and C++: A Reference Guide. San

Jose: Automata Publishing Co., 1997.

[12] Jan F. Prins and Daniel W. Palmer. Trans-

forming high-level data-parallel programs into

vector op erations. Proc. 4th Annual Symp.

on Princ. and Practice of Paral lel Prog. San

Diego, CA, 19{22 May 1993. Published as SIG-

PLAN Notices, 287:119{28. ACM, July 1993.