This is page 1

Printer: Opaque this

Towards Environment Re-targetable

Parser Generators

K. Kontogiannis

J. Mylop oulos

S. Wu

ABSTRACT

One of the most fundamental issues in program understanding is the is-

sue of representing the source co de at a higher level of abstraction. Even

though many researchers haveinvestigated a variety of Program Represen-

tation schemes, one particular scheme, the Abstract Syntax (AST),

is of particular interest for its simplicity, generality, and completeness of

the information it contains. In this pap er weinvestigate ways to generate

Abstract Syntax Trees that conform with a user de ned domain mo del,

can b e easily p orted to di erent CASE to ols, and can b e generated using

publicly available parser generators. The work discussed in this pap er uses

PCCTS and yacc as parser generators, and provides an integration mech-

anism with the conceptual Telos, and the CASE to ols Rigi, PBS,

TM

and Re ne .

1 Intro duction

Extracting and representing the syntactic structure of source co de is gen-

erally acknowledged to b e one of the most imp ortant activities in program

understanding. This activity is considered a prerequisite to b oth mainte-

nance and reverse engineering of legacy source co de [Till96]. Traditionally,

the activity consists of two steps, and lexical analysis. However,

unlike traditional syntax-directed approaches used by , parsing

and lexical analysis for program understanding purp oses, is carried out

with resp ect to a language sp eci c domain mo del which describ es the syn-

tactic and semantic constructs of the underlying programming .

The use of domain mo dels by parsers and parser generators is not new.

In [Arango91], [Marko94] source co de analysis techniques for legacy co de

which pro duce Abstract Syntax Trees (AST) on the basis of domain mo dels

are discussed.

In [Kotik89] the Re ne re-engineering environment allows for the de-

velopment of parsers based on domain mo dels Similarly,in[Devambu92],

[Devambu94] GENOA a language indep endent application generator is pre-

sented. GENOAcanbeinterfaced to the front-end of a language

that pro duces an attributed parse tree. This can b e achieved by writing

1. Towards Environment Re-targetable Parser Generators 2

an interface sp eci cation using a companion system to GENOA, called

GENI I;. The advantage of GENOA/GENI I is that to ols for program under-

standing, metrics, reverse engineering and other applications can b e built,

without having to write a complete parser/linker for the source language

in which the application is written.

However, most commercial to ols do not provide a programming interface

(API) and therefore the generated Abstract Syntax Trees can only accessed

and pro cessed in a limited way for any complex re-engineering or program

understanding task. Within this framework, the ob jectives of this pap er

fo cus on three directions.

The rst is to rep ort on techniques whichmake it p ossible to generate an-

notated Abstract Syntax Trees at various levels of granularity with resp ect

to the parsed source co de.

The second is to prop ose a metho dology for simplifying the pro cess of

developing a grammar and a parser for a given .

The third is to rep ort on an architecture that facilitates the exchange

of source co de related information b etween di erent software engineering

analysis to ols through the use of domain mo dels, schemata, and p ersistent

software rep ositories.

Towards these ob jectives, we exp erimented with the use of domain mo d-

els by di erent parser generator environments. The rst one is the PCCTS

[Par96], [Sta97] parser generator environment and the second is the C pub-

lic domain parser generator yacc. On-going work rep orted also in this pa-

per involves the use of XML for annotating source co de using the JavaCC

parser generator [JavaCC].

The work describ ed in this pap er was carried out within the context of

asoftware re-engineering pro ject whose ultimate ob jective is to establish

a generic and op en software re-engineering environment. The environment

should allow for di erent CASE to ols to b e integrated in terms of suitable

intermediate representations and a p ersistent rep ository of multi-faceted

information ab out a legacy system. Earlier rep orts on the pro ject include

[Buss94] and [Finni97].

This pap er is organized as follows. In section 2 we rep ort on di erent

techniques for pro ducing source co de representations tailored for program

analysis and program understanding. In section 3 we discuss the rationale

b ehind the use of the Abstract Syntax Trees (ASTs) as a representation of

source co de structure. In section 3 we outline the approachwe adopted in

order to construct customizable and domain re-targetable ASTs. Section 4

describ es how our approachworks by applying it to a small C co de frag-

ment. In section 5, we discuss advantages and limitations of our approach.

In section 6 we present alternative parsing techniques based on XML, and

in section 7 we discuss the design of an integration environmentforcom-

municating data b etween the system presented in this pap er and various

CASE to ols. Finally, section 8 o ers a brief summary of the results rep orted

in this pap er, as well as a p ersp ective for further work.

1. Towards Environment Re-targetable Parser Generators 3

2 Parsing for Program Analysis

Traditionally, parsing has b een seen as a to ol for pro ducing internal repre-

sentations of source co de with the purp ose of generating executable binaries

for a sp eci c target op erating platform. In contrast, parsing for program

understanding aims on pro ducing representations of the source co de that

can b e used for program analysis and design recovery purp oses. In this

resp ect, program representation aims on facilitating source co de analysis

that can b e applied at various levels of abstraction and detail, namely at:

 the physical level where co de artifacts are represented as tokens, syn-

tax trees, and lexemes,

 the logical level where the software is represented as a collection of

mo dules and interfaces, in the form of a program design language,

annotated tuples, aggregate data and control ow relations,

 the conceptual level where software is represented in the form of ab-

stract entities such as, ob jects, abstract data typ es, and communi-

cating pro cesses.

These representations are achieved by parsing the source co de of the sys-

tem b eing analyzed at various levels of detail and granularity. Sp eci cally,

over the past yearsanumb er of parsing to ols have b een prop osed and used

for re-do cumentation, design recovery and, architectural analysis. Overall,

the parsing technology for program understanding can b e classi ed in three

main categories.

The rst category, deals with full parsers that pro duce complete ASTs

of a given co de fragment. The second category fo cuses on scanners that

pro duce partial ASTs or emit facts related to control and data ow prop-

erties of a given source co de fragment. Finally, the third category deals with

regular expression analyzers that extract high level syntactic information

from the source co de of a given system.

Full parsers, are used for detailed source co de analysis as they provide

low level detailed information that can b e obtained from the source co de.

This includes data typ e information, linking information, conditional com-

pilation directives, and pre-pro cessed libraries and macros.

In this context, a full parser generator called DIALECT is prop osed in

[Kotik89]. The approach uses a domain mo del as well as yacc and lex to

pro duce a parser given a BNF grammar. The AST is represented in terms

of CLOS ob jects in dynamic memory and can b e accessed by LISP-likea

programming interface (API) written for the CASE to ol Re ne.

In [Datrix00], another full parser for C++ source co de is presented. The

parser emits an AST representation that is suitable for source co de analysis

and reverse engineering of systems written in C++. A domain mo del for

C++ provides the schema that describ es the structure of the generated

1. Towards Environment Re-targetable Parser Generators 4

AST. Moreover, programming interface provides access to the AST and the

ability to exp ort detailed information in the form of RSF tuples [Muller93].

Similar to the approachabove, a C/C++ parser generator that pro-

duces a fully annotated AST representation of the co de is presented in

[Devambu92]. The source co de representation can b e accessed by GEN++

for sp ecifying C++ analysis and GENOA that provides a language inde-

pendent parse tree querying framework.

Finally, another technique that is gaining p opularity due to the emer-

gence of mark-up languages is to annotate the source co de using XML.

Do cumentTyp e De nitions (DTDs) can b e automatically built by analyz-

ing the grammar of a given programming language. Parser generators such

as JavaCC can b e used to emit XML tagged co de by altering the semantic

actions in a given grammar sp eci cation. Currently, DTDs for C, C++, and

Javahave b een prop osed [Mamas00] and are available through the Con-

sortium for Software Engineering Research (CSER). Once XML annotated

source co de is parsed, AST-like representations can b e generated. These

AST-like structures are called DOM trees.

In the second category of parsing techniques, scanners are used for ex-

tracting high level information from a software system. This includes infor-

mation related to declarations, calls, fetches and stores of program variables

as well as, high level directory and source co de organizational information.

Data obtained from scanners are particularly useful for high level architec-

tural recovery of large systems and can b e readily used byvarious visual-

ization and analysis to ols.

In [Muller93] a source co de representation scheme that is based on re-

lation entity-relation tuples (RSF) is prop osed. The approachisbasedon

parsers for which their semantic actions are mo di ed and applied only to a

selected set of grammar rules to emit facts in the form of entity-relation tu-

ples. As a result information p ertaining only to sp eci c syntactic construct

(i.e. declarations, function calls) is presented and the rest of the source

co de is treated e ectively as whitespace to b e skipp ed.

Similarly, in [Holt97] a mo di ed gnu C parser allows for attributed tuples

to b e emitted. The mo di ed parser called CFX pro duces high level source

co de representations that comply with the TA syntactic form and can b e

directly used bythe PBS visualization to olkit [Finni97].

In the third category, regular expression analyzers provide high level

syntactic information of a system. Their ma jor strength is their simplicity

and availability as they can b e implemented using scripting languages such

as Perl, TCL, and Awk.

1. Towards Environment Re-targetable Parser Generators 5

3 Rationale for Using AST Representations

As programming languages b ecome more and more complex, one-pass com-

pilers which don't pro duce anyintermediate form of the compiled co de

are b ecoming very rare. This is also true in the area of program under-

standing and software re-engineering [Till96]. This has led to a varietyof

intermediate representations of the syntactic structure of source co de. For

program understanding and re-engineering purp oses, a suitable represen-

tation must b e able to representsyntactic information which p ersists and

evolves through the entire life-cycle of a software system. This life-cycle typ-

ically b egins with source co de analysis during development, and continues

throughout successive maintenance, reverse engineering and re-engineering

phases [Chiko90].

Often, such a representation limits what can b e expressed and describ ed

within it for the source co de. Hence, generating an intermediate repre-

sentation is an extremely imp ortant asp ect for program understanding. In

[Konto93], of common program representations, suchasAbstractSyntax

Trees (AST), Directed Acyclic Graphs (DAG), Prolog rules, co de and con-

cept ob jects, and control and data ow graphs are surveyed. Among these,

the AST is the representation that is most advantageous, and is most widely

used bysoftware develop ers. The reason for the p opularityoftheASTas

a program representation scheme is its ability to capture all the necessary

information for p erforming anytyp e of co de analysis, as well as the neutral

representation of the source co de it o ers with resp ect to data ow and

control ow source co de prop erties.

To justify the AST representation used here, we need to distinguish rst

between a parse tree and an AST. When parsing, a parser explicitly or im-

plicitly uses a so-called parse tree to represent the structure of the source

program [Aho86]. In Fig.1(a) such a parse tree for an addition expression,

e.g., "1 + 2" is illustrated. An AST is a compressed representation of the

parse tree where the op erators app ear as no des, and op erands of the op-

erator are the children of the no de representing that op erator. Similarly,

Fig.1(b) illustrates an AST for the same addition expression, and Fig.1(c)

illustrates an annotated AST where the tree edges b ecome annotated at-

tributes b etween no des.

The AST we adopted for our pro ject is a variation of the AST shown

ab ove, called annotated AST. An annotated AST for the PL/X If-Statement

given b elow

IF (OPTION > 0) THEN

SHOW_MENU(OPTION)

ELSE

SHOW_ERROR("Invalid option...")

is illustrated in Fig.2.

As you may notice, links of the AST are represented as attributes asso-

1. Towards Environment Re-targetable Parser Generators 6

(a) (b) (c)

Add + +

Arg1 Arg2

1 + 2 12 12

Parse Tree Annotated Abstract Syntax Tree

FIGURE 1. Parse tree and AST representation for the expression "1 + 2".

ciated with (no de) ob jects. In the If-Statement example, three attributes,

are asso ciated with the If-Statement no de, and hold the fact that there are

three comp onents to an If-Statement (condition, then-part, else-part).

The tree in Fig.2 is also annotated with a fanout attribute that denotes the

numb er of function calls that app ear in the corresp onding source co de en-

tity. The fanout attribute can b e computed comp ositionally from the leaves

to the ro ot of the AST as illustrated in Fig.2. In the subsequent discussion,

we use the term AST to denote an annotated AST.

4 The Domain Mo del Approach

In this section, we describ e a domain-mo del based technique of generat-

ing an AST. The pro cess of developing a domain mo del based parser is

illustrated in Fig.3 and consists of three phases:

1. Identi cation and mo deling of the syntactic structures of a program-

ming language in the form of ob ject classes in a domain mo del. In-

stances of the ob ject classes along with their corresp onding attributes

form the actual AST no des, and AST edges resp ectivelly,

2. Development of a grammar for the programming language byusing

the domain mo del ob ject classes as grammarnonterminal symbols,

3. Integration of semantic actions with grammar rules so that the AST

can b e built during parsing in a customizable way.

For the overall AST generation pro cess there are twolevels of abstraction.

The higher level fo cuses on the de nition of the domain mo del using a

1. Towards Environment Re-targetable Parser Generators 7

STATEMENT IF fanout 2

condition then-clause else-clause

PREDICATE STATEMENT STATEMENT GT CALL CALL

fanout fanout

argument1 argument2 callee-id caller-list 1 callee-id caller-list 1

IDENTIFIER LITERAL IDENTIFIER IDENTIFIER IDENTIFIER LITERAL REFERENCE INTEGER REFERENCE REFERENCE REFERENCE STRING

fanout fanout

defined-name integer-value defined-name 1 defined-name defined-name 1 defined-name

OPTION 0 SHOW_ OPTION SHOW_ "Invalid MENU ERROR option..."

Legend

NODE NAME = AST node

= Link from parent attribute name to child via a named attribute.

fanout = Fanout attribute containing integer

V value V.

FIGURE 2. Annotated AST for an If-Statement.

mo deling language and the sp eci cation of grammar rules using EBNF.

The lower level of abstraction fo cuses on utilities that allow for the gen-

eration of an AST in terms of a concrete programming formalism.For our

work wehavechosen to represent ASTs in terms of C++ ob jects, and the

grammar b e sp eci ed in an o -the-shelf parser generator (i.e PCCTS). Fi-

nally, the domain mo del is designed and implemented using the conceptual

language Telos [Mylo90].

As indicated in the intro duction, the domain mo del-based approachaims

at the development of parsers that are easy to develop, and can b e inte-

grated with a variety of CASE to ols. For this pap er we examine howthe

domain mo del approach can b e applied so that:

 the grammar sp eci cations b e represented in a simple way,

 the communicationbetween the parser and the lexical analyzer b e

enhanced and nally,

 the structure of the generated AST b e customized and controlled by

the user.

4.1 Developing a Domain Model

There are two basic metho ds for building a domain mo del for a given

programming language. One metho d is b ottom-up, in the sense that it

uses the nonterminals in the language grammar to obtain the classes of

the domain mo del. In this approacheachnonterminal symb ol has a close

relationship with the ob jects in a parse tree [Aho86].

The other basic metho d for building domain mo dels is top-down. In this

approach, domain mo dels are built in an incremental way where the higher

1. Towards Environment Re-targetable Parser Generators 8

Domain Grmmar Model Rules

Translation Translation

Grammar C++ Input Input Semantic Rules with Domain Model Actions & Semantic Implementation Actions Utilities

Parser Generator

Input Output Annotated Parser Abstract Syntax Tree

Source Code

FIGURE 3. The AST domain-mo del based generation pro cess.

level schemata (domain mo del entities) are sp eci ed rst, and then they

are re ned according to the syntactic and structural elements and relations

of the programming language b eing mo deled. The domain mo del schemata

provide the sp eci cation for the ob jects ASTs are generated and comp osed

with.

In this context, a domain mo del de nes a set of ob jects and the interrela-

tionships among them. Overall, the domain mo del must b e expressed in an

environmentwhich o ers mo deling capabilities and some p ersistent storage

for the classes and ob jects of the domain mo del (which represent generated

AST). For our work, we adopt Telos [Mylo90] to represent domain mo dels,

primarily b ecause of its expressiveness and p ortability. Among other fea-

tures, Telos allows an arbitrary numb er of meta-class hierarchylevels to

b e de ned. This feature is exploited in dealing with di erentversions of

the same language. Moreover, the AST generated by the parser generator

can b e p ersistently stored in a Telos rep ository or any other commercial

Data Base and exp orted to other to ols for subsequent analysis [Buss94].

Mo deling the application domain involves the discovery of ob jects classes

that represent the AST no des and their relationships which representthe

AST branches [?].

For example, in the case of the C programming language, wemay consider

the following categories of schemata (classes) as part of the C domain mo del

[Reas96a]:

Telos-Object

- Program

- Declaration-Object

-File - Function

1. Towards Environment Re-targetable Parser Generators 9

- Statement

For each language domain mo del, there is an ob ject hierarchythatcanbe

implicitly imp osed on schema entities by the structure of the programming

language itself. Along these lines, further re nement of the C language

ob ject domain hierarchyisshown b elow:

ObjectClass

...Container

...Attribute

...C-object

...... Program

...... Declaration-Object

...... Function

...... Expression

...... Identifier-Ref

...... Function-Call

...... File

...... Statement

...... If-Statement

...... Assign-Statement

The task of de ning relationships among ob jects is emanating from the

syntax of a programming language itself. This task is directly related to

identifying the edges of the AST. For example, an If-Statement domain-

mo del class as illustrated b elowmayhave three attributes. The one is the

condition attribute whichmapstoanExpression class, and the other

two are the then-part and the else-part attributes which map to a

Statement. In the domain mo del, one can sp ecify the If-Statement class

and its corresp onding attributes in Telos as follows:

SimpleClass If-Statement

in ObjectClass

isA Statement

with

attributes

condition : Expression;

then-part : Statement;

else-part : Statement;

end

This can b e read as follows: the class If-Statement is a sub class of a

Statement domain mo del class; it has three arguments two of them map

to the class Statement and one to the class Expression. A simple trans-

formation program can translate the ab ove domain mo del class de nition

to corresp onding C++ classes, twoofwhich are illustrated in the example

below:

class If-Statement: public Statement,

1. Towards Environment Re-targetable Parser Generators 10

ObjectClass

{

private:

Condition *condition;

Then-part *then-part;

Else-part *else-part;

public:

Condition *getCondition();

Then-Part *getThen-Part();

Else-Part *getElse-Part();

void *putCondition(Condition *aCond);

void *putThen-Part(Statement *aStat);

void *putElse-Part(Statement *aStat);

};

class Condition: public Attribute,

ObjectClass

{

private:

If-Statement *from;

Expression *to;

public:

C-Object *getFrom()

Container *getTo()

void *putFrom(If-Statement *aStat)

void *putTo(Expression *aExpr)

};

The instantiations of the ab ove C++ classes can b e generated at parse

time and form the no des of the AST.

4.2 De ning the Grammar

To understand how the grammar sp eci cation asso ciates with the domain

mo del, let's consider consider how a grammar for three C language con-

structs (If-Statement, Assign-Statement, Block-Statement) can b e de-

velop ed.

A domain mo del for the ab ove constructs is sp eci ed in terms of an

hierarchical schema as:

Statement

...If-Statememt

...Assign-Statement

...Block-Statement

and in Telos the constructs are sp eci ed as follows:

1. Towards Environment Re-targetable Parser Generators 11

SimpleClass Statement

in ObjectClass

isA C-Object

end

SimpleClass If-Statement

in ObjectClass

isA Statement

with

attributes

condition : Expression;

then-part : Statement;

else-part : Statement;

end

SimpleClass Assign-Statement

in ObjectClass

isA Statement

with

attributes

assign-lhs : Identifier-Ref;

assign-rhs : Expression;

end

SimpleClass Block-Statement

in ObjectClass

isA Statement

with

attributes

stat-list : sequence-of(Statement);

end

The grammar for a given language can b e built in two phases. The rst

phase is automatic and is used for emitting grammar rules that can b e

directly inferred from the domain mo del. These include rules that involve

nonterminals that corresp ond tp class-sub class hierarchies and to class-

attribute relations in the domain mo del. An example is the rule R.4 and

rules R.5-R.10 resp ectively as illustrated b elow.

The second phase is user-assisted and involves the user writing the gram-

mar for a given programming language provided that he of she uses as non-

terminals entities sp eci ed in the language domain mo del. Examples are the

rules R.1 - R.3 illustrated b elow. For example, the Assign-Statement do-

main entity, is used as a non-terminal in Rule R.1 along with its attributes

assign-lhs and assign-rhs.Attributes for which their target domain is a

sequence of entities (such as in the case of stat-list attribute), are trans-

formed into rules that involve the closure op erations * or + as illustrated

in the case of rule R.3.

1. Towards Environment Re-targetable Parser Generators 12

Given these classes, the following grammatical rules can b e sp eci ed as:

1

R.1 If-Statement : "if" "(" condition ")" "then"

the-part {["else" else-part]};

R.2 Assign-Statement : assign-lhs "=" assign-rhs;

R.3 Block-Statement : stat-list +;

while the rules the following rules are generated automatically by the do-

main mo del,

R.4 Statement : If-Statement | Assign-Statement|

Block-Statement;

R.5 condition : Expression;

R.6 then-part : Statement;

R.7 else-part : Statement;

R.8 assign-lhs : Identifier-Ref;

R.9 assign-rhs : Expression;

R.10 stat-list : Statement;

The co ordination of grammar rules and the domain mo del in order to

generate the desired AST is achieved only with a set of semantic actions

that are invoked from within the parser generator. Wedonotputany

restriction on the parser generator and therefore the same semantic actions

can b e invoked byavariety of parser generator to ols requiring only minor

or not at all, mo di cations.

De ning a parser for a programming language can b e a tedious exercise

esp ecially if the language contains context sensitive structures.

In the literature one can nd a wealth of parser generator environments

(PCCTS, Yacc, Eli, TXL, LLGen, Pgs, Cola, Lark, Genoa, and Dialect,

to mention just a few). All of these environments provide a formalism to

represent grammatical rules, and a mechanism to trigger semantic actions

when a rule is applied successfully during the parsing pro cess.

1

The grammar rules are given for clarity at a higher level that the one they

will app ear at the original PCCTS sp eci cation. An automatic translation to

PCCTS or yacc rules from the given high level rule sp eci cation is also p ossible.

In these rules, curly brackets denote optional choice while square brackets indicate sequencing.

1. Towards Environment Re-targetable Parser Generators 13

4.3 Generating the AST

In order to ful ll the task of AST generation by using the machine de-

scrib ed ab ove, we need appropriate semantic actions to b e inserted into

the grammar.

Wehave de ned three semantic actions that are asso ciated with each

rule. These are: BuildRule, BuildTerm,and BuildAttr. These actions op-

erate on two global stacks RuleStack and NodeStack. The rst stackkeeps

track the current rule applied while the second stack stores the generated

AST no des that have still to b e linked with their parentnodeduringthe

AST generation phase. The semantic actions are automatically inserted in

the grammar rules given the domain mo del sp eci cation. Only the rules for

which their left-hand size part (i.e. the head of the grammar rule) corre-

sp onds to a domain mo del class that do es not haveany sub classes need to

b e annotated by the semantic actions describ ed b elow. The reason is that

all other rules are not used to generate AST no des and are simply used to

facilitate the parsing pro cess for a given a top level non-terminal starting

grammar symb ol. These semantic actions are discussed in detail b elow:

BuildRule(Rule) : This action takes as a parameter the current rule

name (i.e. the name of the non terminal in the head of a rule). This action

is inserted as the rst semantic action to b e carried out at the right-hand-

side of a given rule. It is actually invoked b efore the rule comp onents are

tried. In other words, the action is invoked b efore a rule recognizes anything

in the input stream. More sp eci cally, this action registers the rule as the

current rule applied. Concretely, it creates an instance of the class Rule

the p ointer of which is pushed onto a RuleStack.

Once the rule succeeds (i.e. all right-hand side non-terminals or terminal

symbols have b een resolved against the source text), the reference for this

rule is p opp ed from the RuleStack indicating that this is not anymore the

current rule b eing tried. Hence, the only purp ose of the BuildRule() action

is to keep track which is the current rule b eing tried. In this resp ect, a stack

data structure allows for keeping trackofmultiple nested invo cations of the

same rule, a very frequent case that o ccurs during the parsing pro cess.

As an example consider the rule:

Assign-Statement :

assign-lhs "=" assign-rhs;

The rule ab ovehastwo nonterminals (assign-lhs, assign-rhs) and

one terminal (the token "=").

By examining the domain mo del two more rules are automatically added.

These are:

assign-lhs : Identifier-Ref;

assign-rhs : Expression;

1. Towards Environment Re-targetable Parser Generators 14

The domain mo del is also used to recognize if a non-terminal is an at-

tribute or a class. For the example domain mo del given in Section 4.2,

Assign-Statement is de ned as a class entity, while assign-lhs and assign-rhs

are de ned as attributes of the Assign-Statement class. The Assign-Statement

rule shown ab ove enhanced with the BuildRule semantic action and the

additional rules obtained by the domain mo del b ecomes:

Assign-Statement:

<< BuildRule(Assign-Statement); >>

assign-lhs "=" assign-rhs ;

assign-lhs:

<< BuildRule(assign-lhs); >>

Identifier-Ref;

assign-rhs:

<< BuildRule(assign-rhs); >>

Expression;

BuildTerm(Term) : This action takes as a parameter the name of the

non-terminal symb ol at the head of the rule. This semantic action is placed

at the end of a rule, provided that the head of the rule is a class that has no

sub classes in the domain mo del. If the head of the rule is an attribute then

the BuildAttr semantic action is applied instead. The BuildAttr(Attr)

action is discussed b elow. Examining whether or not a Term corresp onds

to a Class or an Attribute takes only a simple lo ok-up op eration on the

domain mo del sp eci cation. The BuildTerm(Term) action aims at building

the AST no de that corresp onds to the latest term (head of the rule) suc-

ceeded. This action is implemented by incorp orating a globally available

stackcalledNodeStack.

Once the rule is successfully parsed the BuildTerm(Term) semantic

action is invoked.Aninstanceofanobjectoftyp e Term is constructed and

is pushed on the TermStack indicating that this is the most recent rule

successfully tried. If the rule fails then a cleanup of the stacks o ccurs up to

the last rule that had b een successfully tried.

Augmented with b oth actions BuildRule(Rule) and BuildTerm(term),

2

our example rule lo oks like the following: :

Assign-Statement:

<< BuildRule(Assign-Statement); >>

assign-lhs "=" assign-rhs ;

<< BuildTerm(Assign-Statement); >> ;

2

Due to limitations in space for this pap er we omit the rules related to

assign-lhs and assign-rhs.

1. Towards Environment Re-targetable Parser Generators 15

Assign−Statement

assign−lhs assign−rhs

Identifier−Ref Integer

name value

a 1

FIGURE 4. AST representation for the statement a=1.

In our example the BuildTerm(Assign-Statement) action will create

an instance of the class Assign-Statement.

BuildAttr(Attr) : This action is placed at the end of a rule, provided

that the head of the rule is recognized as an Attribute in the domain

mo del and not as a class. The BuildAttr(Attr) action is used for creating

the annotated edges of the AST. The most recent attribute created bythe

pro cess b ecomes an incoming edge to the most recent created no de. Aug-

mented with the semantic actions BuildRule(Rule), BuildTerm(Term),

and BuildAttr(Attr) our example rules b ecome:

assign-lhs :

<< BuildRule(assign-lhs); >>

Identifier-Ref

<< BuildAttr(assign-lhs); >>

assign-rhs :

<< BuildRule(assign-rhs); >>

Expression

<< BuildAttr(assign-rhs); >>

For the ab ove example the AST which will b e created for the C statement

3

a= 1is illustrated in Fig.4 .

This pro cess allows for a customizable way to imp ose a structure in the

AST, which can b e used to p ort AST structures from one to ol to another

with minimal e ort. In practice, the programmer is not even required to

3

We assume a domain mo del that contains Identifier-Ref,and Integer

classes with attributes name and value resp ectively

1. Towards Environment Re-targetable Parser Generators 16

sp ecify the BuildRule(Rule), BuildTerm(Term),and BuildAttr(Attr)

actions. These can b e generated automatically by a transformation program

that takes abstract grammar rules and generates legal co de for the parser

generator.

The following sections discuss the functionality of the semantic actions

in more detail.

4.4 Semantic Actions

The three semantic actions sketched in the previous session are used in co-

ordination with two global stacks to create the Annotated Abstract Syntax

Tree.

The following algorithms describ e how the semantic actions are used in

the AST generation pro cess for a rule of the form NT : NT NT :::N T ,

Head 1 2 k

where NT is a a non-terminal. For claritywe omit terminals in our rule

i

descriptions.

Pro cedure Initialize

Step.1 RuleStack NULL

Step.2 No deStack NULL

Step.3 Exit

Pro cedure BuildTerm(NT )

head

Step.1 RulePtr CreateRuleOb ject(NT )

Head

Step.2 RulePtr.count 0

Step.3 Push(RulePtr, RuleStack)

Step.4 Exit

Pro cedure BuildAttr(NT )

head

Step.1 Pop(RuleStack)

Step.2 Top(RuleStack).count++

Step.3 Exit

Pro cedure BuildTerm(NT )

head

Step.1 Term CREATE-TERM(NT );

Head

TheNonTerm Pop(RuleStack);

1. Towards Environment Re-targetable Parser Generators 17

Step.2 IF RuleStack 6= NULL THEN

AnAttr Top(RuleStack);

CurrentAttr CREATE-ATTR(AnAttr);

/* CREATE-ATTR builds a lab eled AST edge */

/* with information obtained from CurrentAttr */

Term.IncomingAttr CurrentAttr

END IF

Step.3 FORi=1TOTheNonTerm.countDO

aNo de Top(No deStack);

/* Link the incoming attribute to aNo de */

/* with the parent of aNo de, whichisTerm */

LinkIncomingW ith (aNo de, Term)

Pop(No deStack)

END DO

Step.4 Push(Term, No deStack);

Step.5 Return(Term)

5 Advantages and Limitations

In Section 4.4, we describ ed a technique for incorp orating the semantic

actions for AST generation into a grammar. The basic di erence b etween

traditional approaches and the domain-mo del based approach is the level

where the actions are emb edded. Traditional approaches can b e thoughtas

syntax-directed while the domain-model approach as rule-directed.

In this section, wetake a lo ok at the grammars written by using the two

di erent approaches, and we discuss the advantages and limitations of the

domain mo del approach.

Aversion of grammar rules for generating an AST for a Blo ck-Statement

and an Assignment-Statement are shown b elow:

Block-Statement >

<< List* stat-list = NULL;

Statememt * Stat; int i=0;>>

(LC:LEFT-CURLY

( Statement >[Stat] SE:SEMICOLON

<< if (i==0)

{stat-list=new List(Stat); i=i+1;}

else stat-list -> append(Stat); >>)+

RC:RIGHT-CURLY)

Assign-Statement >

[Assign-Statement * Assign];

1. Towards Environment Re-targetable Parser Generators 18

<< Expression* Expr; Identifier-Ref* Id-Ref;>>

ID:ID-REF

<getText());>>

AS:ASSIGN-SIGN

Expression > [Expression]

<< $Assign=new Assign-Statement(Id-Ref,Expr);>> ;

For the cases where the programming language has context-sensitivecon-

structs the grammar ab ove can b e even more complex to design and de-

velop. With the domain-mo del approach, by using the semantic actions

discussed ab ove the sample grammar can b e rewritten and still p erform

the same task.

Block-Statement >

<< BuildRule("Block-Statement"); >>

(LC:LEFT-CURLY

( stat-list SE:SEMICOLON )+

RC:RIGHT-CURLY)

<< BuildTerm("Block-Statement";>>

Assign-Statement >

<< RuleCount("Assign");>>

assign-lhs AS:ASSIGN-SIGN assign-rhs

<< BuildTerm("Assign-Statement") >> ;

A translation pro cess takes the simpli ed grammar sp eci cation and gen-

erates PCCTS co de (ab ove) or yacc co de. The translation pro cess will gen-

erate valid PCCTS or yacc co de by the simpli ed grammar and the domain

mo del, by adding more rules, and attaching the semantic actions discussed

ab ove at the b eginning and the end of eachrule.

Therefore, in practice, the programmer needs only to de ne the following

grammar:

Block-Statement :

(LC:LEFT-CURLY

(stat-list SE:SEMICOLON)+

RC:RIGHT-CURLY)

Assign-Statement :

assign-lhs AS:ASSIGN-SIGN assign-rhs;

The domain mo del is also used to automatically generate the header les

that contain the C++ class instances which represent AST no des.

Our exp eriments suggest the following ma jor advantages for the domain

mo del approach.

First, the AST generation pro cess can b e easily applied to accommo date

di erent programming languages. Most programming languages constructs

1. Towards Environment Re-targetable Parser Generators 19

can b e mo deled using the domain-model approachandany parser generator

can b e used iwhen augmented with the prop er semantic actions.

Second, once the AST is created, it is easy to make tree analyzers and

prettyprint utilities. This is of particular imp ortance for re-engineering

pro jects aiming at the migration of programs from one language version to

another.

Third, the approach e ectively hides implementation details for generat-

ing AST. It is notable that the domain-model based grammar sp eci cation

is much more concise than that of generated by the traditional approach.

Finally,context-sensitive languages can b e sp eci ed easier as the struc-

ture of the rules is simpler. This is of particular imp ortance, since many

programs that have to b e re-engineered, or migrated are written in highly

sp ecialized, context sensitive languages.

The prop osed approach has the limitation that when used with an LL

parser generator may provide complications (i.e. complex rules) in the im-

plementation due to the left- restrictions [Aho86]. In this case the

user has to b e careful to write grammar rules, as discussed ab ove, that

are not sub ject to left-recursion. However, these restrictions do not apply

when the prop osed domain mo del approach is used with an LALR parser.

Moreover, an algorithmic pro cess for eliminating left recursion in grammar

rules has b een presented in [Aho86].

6 AlternativeTechniques

In this section we present alternative techniques for customizable domain-

driven parsing using the extensible Mark-up Language (XML) [Mamas00].

The ob jectiveistodevelop program representations based on XML that

provides information at the same level as ASTs. This alternative represen-

tation is simple but detailed enough to represent the complete syntax of

a sp eci c programming language. This mark-up language based approach

aims on developing more generic representations suitable for domains with

similar characteristics.

6.1 Annotating SourceCode Using XML

Mapping ASTs to DTDs

Given a sp eci c programming language we need to de ne a representation

in whichevery valid source co de program can b e mapp ed to. To accomplish

this mapping from ASTs to XML DOM trees we need to de ne a metho d to

map the grammar of the programming language structure to a Do cument

Typ e De nition (DTD). The mapping at this level will guaranteethatall

p ossible syntax trees de ned by the grammar (and therefore any source

co de program), can b e mapp ed to XML trees de ned by the DTD. One

1. Towards Environment Re-targetable Parser Generators 20

of the requirements when de ning new XML representations is to make

them as easy to use as p ossible. Prop osing a generic algorithm that maps

a grammar to a DTD is something that result in hard to use representa-

tions. Instead, an alternativeistoprovide general guidelines that assist in

implementing a go o d mapping from a grammar to a DTD.

Once a grammar is mapp ed to a DTD, the ASTs for a sp eci c program

can b e mapp ed to XML les. These XML les can then b e used in place

of the ASTs or the original source les for maintenance tasks.

Java Mark-up Language (JavaML)

The generation of a program representation for Java is based on a parser

generator to ol called Java Compiler Compiler or JavaCC[JavaCC] in short.

This to ol was initially develop ed by Sun Microsystems and it is the one

of the most p opular parser generators for Java. The p opularityofJavaCC

is most probably due to the grammar for Java that is shipp ed with the

to ol. The ma jorityofJava source co de parsers are built using JavaCC and

the Java 1.1 grammar. The Java 1.1 grammar was develop ed bySriram

Sankar at Sun Microsystems and a copy of this grammar can b e found in

the distribution of JavaCC.

The complete DTD that we generated based on the Java 1.1 grammar is

presented in [Mamas00] and can b e found at http://swen.uwaterlo o.ca/evan/

javaml.html.Belowwe present a small example of a Java source le and

its corresp onding JavaML representation.

Java source co de

public class Car{

int color;

public int getColor(){

return color;}

}

JavaML representation

1. Towards Environment Re-targetable Parser Generators 21

The ab ove annotated source co de can b e parsed by an XML parser such

as IBM's XML4J parser and pro duce a DOM tree that corresp onds in this

context to an Annotated Abstract Syntax Tree.

C++ Markup Language (CppML)

The generation of a representation for the C++ programming language

is based on a di erent approach than that for Java. Instead of using a

parser generator suchasJavaCC, we to ok advantage of a compiler pro d-

uct that maintains the intermediate representation of the co de and pro-

vides access to it through an API. This pro duct is the IBM VisualAge

C++[VACpp] IDE which is develop ed at the IBM Toronto lab. VisualAge

contains a rep ository which is called Co destore in which all the information

generated during the compilation pro cess is stored. Parsing, pro cessing and

co de generation information can all b e accessed using the provided APIs.

The goal b ehind the architecture of VisualAge is to allowdevelop ers to

maintain and analyze C++ source co de. Our goal in using VisualAge is

to demonstrate that a commercial pro duct can b e integrated and used

as a to ol in a more complex environment. The complete representation

for C++ that was generated using IBM's VisualAge C++ can b e found

at http://swen.uwaterlo o.ca/do cs/cp pml .html . Sample source les and their

representations are also available. The grammar that CppML was based

on, was implicitly extracted from the Co destore APIs, whichwas obtained

from the VisualAge development team and is expressive enough to repre-

sent ANSI C++ compliant source les.

7 Exp eriments

In this section we provide exp erimental results related b oth to the p er-

formance of the generated parser using the prop osed domain mo del based

approach, and to the p erformance of the architecture used for to ol integra- tion.

1. Towards Environment Re-targetable Parser Generators 22

7.1 Parser Performance

The exp erimental results presented here fo cus on two main categories namely,

parse time p erformance and, space requirements for the generated Abstract

Syntax Tree. The results were obtained byevaluating a parser develop ed for

the C programming language, using the domain mo del approach discussed

in this pap er. On-going work fo cuses on developing parsers for PL/IX,

PL/X, and PL/I.

In Table.1.1, the time and space p erformance statistics for the C parser

are illustrated. These results indicate that the parser develop ed is scal-

able with resp ect to the time required to parse source co de and build the

corresp onding Abstract Syntax Tree.

In particular, the results indicate a linear complexitybetween the number

of statements to b e parsed and the time sp ent to actually parse these

statements. The same linear relationship holds b etween the numb er of lines

of co de to b e parsed and the actual parse time. On average, the parser

parsedonaverage 44.43 statements, or 235 lines of co de (LOC) p er second.

Similarly, the space requirements for the generated Abstract Syntax Tree

indicate that the size of the tree in terms of the numb er of no des is linear

when compared to against the numb er of source co de statements and the

lines of source co de (LOC) parsed.

On-going work fo cuses on the use of XML and DOM for representing

the Abstract Syntax Trees and DTD domain mo dels have already built for

Java and C++.

7.2 Tool Integration

In developing a p ortable program representation environment, the following

imp ortant problems have to b e considered:

 Data integration is essential to ensure data exchange b etween to ols,

 Control integration enhances inter-op erability and data integrity among

di erent to ols.

The rst can b e accomplished through a common schema, while the sec-

ond can b e accomplished with a server that handles requests and resp onses

between to ols in the environment.

A solution to the data integration problem is based on a system architec-

ture in which all to ols communicate through a central software rep ository

that stores, normalizes and makes available analysis results, design con-

cepts, links to graph structures, and other control and message passing

mechanisms required for the communication and co op erativeinvo cation of

di erenttools.Suchintegration is achieved by using a lo cal workspace for

each individual parser or to ol in which sp eci c results and artifacts are

1. Towards Environment Re-targetable Parser Generators 23

LOC # of Statements Parse Time # AST Nodes

(sec) thousands

15 1 0.05 0.005

49 1 0.14 0.038

40 1 0.2 0.068

101 1 0.32 0.113

203 1 0.92 0.409

371 1 1.71 0.713

302 1 2.41 1.085

1 1 0.04 0.008

76 2 0.85 0.406

122 4 0.23 0.068

38 16 0.16 0.055

54 25 0.38 0.165

65 50 0.46 0.202

196 89 1.28 0.603

178 110 1.46 0.701

240 131 1,92 0.907

238 153 1.59 0.846

365 248 2.86 1.401

960 476 8.52 4.214

1002 728 8.07 4.125

8050 4723 92.99 39.89

16203 7486 149.1 63.53

34271 14240 277.19 114.622

TABLE 1.1. Time and space statistics for the prototyp e parser.

stored, and a translation program for transforming to ol-sp eci c software

entities into a common and compatible form for all environments entities.

The translation program generates appropriate images in the central

rep ository of ob jects shared with lo cal workspaces. For the program rep-

resentation scheme prop osed in this pap er (i.e. annotated ASTs based on

a domain mo del) this translation is a straightforward pro cess, as the AST

no des and links corresp ond to classes that can b e mapp ed from one to ol to

the other (CLOS in Re ne, C++ for PCCTS etc.) A system architecture

for suchanintegrated reverse engineering environment that uses common

program representation interchange data is illustrated in Figure 5

Communication in this distributed environmentisachieved by scripts

understo o d byeach to ol using a shared schema data mo del for representing

data from the individual lo cal workspaces of each to ol and, a message

passing mechanism for each to ol to request and resp ond to services.

A message server allows all to ols to communicate b oth with the rep osi-

tory and with each other, using the common schema. These messages form

the basis for all communication in the system.

1. Towards Environment Re-targetable Parser Generators 24

Pattern Rigi Refine Matcher

Local Local Local Workspace Workspace Workspace

Control Integration Server

Schema

Repository Data Integration Object Base

Tool

FIGURE 5. The implemented system architecture for to ol integration. Dashed

lines distinguis h computing environments, usually running on di erentmachines.

The central rep ository is resp onsible for normalizing these representa-

tions, making them available to other to ols, and linking them with the

other relevantsoftware artifacts already stored in the rep ository (e.g., do c-

umentation, architectural descriptions).

Finally, communication scripts are in the form of s-expressions. Eachs-

expression is wrapp ed in packets called network objects and sentviaTMB

to the appropriate target machine. TMB uses Unix so ckets for the commu-

nication and utilizes the TCP/IP network infrastructure.

An example s-expression for a Function called hostnames-matching is

given b elow.

This s-expression is used to represent an annotated (yet simpli ed) AST

no de corresp onding to a C Function. Annotations in this example include

call information, metrics information and, data ow information.

(Function-176 Token

(Function)

()

(((objectId)

(("#176")))

((cyclomaticComplexity)

((5.0)))

((calls)

((Function-175)

(Function-174)))

((definesLocals)

(("word")

("yylval")))

((definesGlobals)

1. Towards Environment Re-targetable Parser Generators 25

(("shell-input-line")

("shell-input-line-index")))

((functionName)

(("hostnames-matching")))))

The shared schema facilitates data integration and enco des program ar-

tifacts (i.e. the AST), as well as analysis results (e.g. the call graph, the

metrics analysis).

The schema has b een implemented in Telos and allows for multiple in-

heritance and sp ecialization of the attribute categories for eachobject.In

this way,anASTentity (i.e. File) mayhave attributes that are classi ed

as Re ne-Attributes, Rigi-Attributes,and PBS-Attributes.Theschema is

p opulated by to ols that may add, or retrievevalues for the attributes in

the ob jects to which it has access to.

Similarly,dataintegration for analysis results was achieved by designing

schema classes for every typ e of analysis a to ol is able to p erform. Each

such class is linked via attributes to the actual AST entities and this is

visible to all the other to ols that may request it.

At run-time, the user may request and select rep ository entities according

to sp eci c ob ject typ es or attribute values.

In addition to the Data Integration asp ects discussed ab ove, Control

Integration is based on designing and developing a mechanism for :

 Uniquely registering to ol sessions and corresp onding services,

 Representing requests and resp onses,

 Transferring ob ject entities to and from the rep ository,

 Performing error recovery.

At the Application layer [Stallings91] a message server is used to facilitate

inter-networking. The message server, o ers an environment to manage the

transfer of data via TCP/IP using a higher-level language to represent

source and destination p oints, pro cesses, and data. Essentially, the server

o ers an environment to access lower level UNIX communication primitives

(i.e. so ckets), in order to manage the transfer of data via TCP/IP using a

higher-level language to represent source and destination p oints, pro cesses,

and data.

Each to ol generates a stream of network objects enco ded as a stream of

s-expressions. A parser analyzes the contents of each it network ob ject and

p erforms the appropriate actions (e.g. resp ond to a request, acknowledge

the successful reception of a network object).

Tool integration statistics are discussed and in particular the relationship

between source co de size, total numb er of rep ository generated ob jects, data

retrieval p erformance as well as upload and down-load times.

1. Towards Environment Re-targetable Parser Generators 26

LOC #ofFunctions # of Files #ofObjects

44,754 658 46 3,340

32,807 705 40 1,694

27,393 632 63 1,606

13,615 235 39 1,089

943 38 3 920

TABLE 1.2. Storage Statistics (only File and Function ob ject typ es stored)

Upload Performance 120

100

80

60 Upload Time (sec) 40

20

0 0 2000 4000 6000 8000 10000 12000 14000

Number of Objects

FIGURE 6. Upload AST Performance.

Our exp eriments for time and space statistics related to the integration

architecture involved four software systems. For each system wehave mea-

sured the numb er of ob jects generated for the reduced AST as well as, the

upload and download times from the rep ository to the individual to ols.

The total numb er of ob jects generated for the reduced AST that corre-

sp ond to les, functions and declarations is illustrated in Table.1.2. These

measurements indicate that the approach of storing only the necessary

parts of the AST results in a large p otential for scalability, as ma jor in-

creases in the size of the source co de do not a ect dramatically the total

numb er of ob jects generated.

The upload times are illustrated in Fig.6. These statistics indicate a

relatively linear relation b etween the upload time and the total number of

ob jects to b e loaded in the rep ository. The total numb er of ob jects in this

exp erimentwas obtained byallowing ob jects that represent statements to

b e generated as well.

Similarly, the down-load statistics are shown in Fig.7. These statistics

indicate that down-load time dep ends on the size of the ob jects retrieved

and not on the size of the rep ository. This is an imp ortantobservation as

1. Towards Environment Re-targetable Parser Generators 27

Download Performance 18

16

14

12

10

8 Download Time (sec) 6

4

2

0 0 200 400 600 800 1000 1200

Number of Objects

FIGURE 7. Download AST Performance.

it is directly connected with the scalability of the system.

The server has b een extended to handle more complex messages and

resp ond automatically to events using a rule base and the Event, Condition,

Action paradigm [Mylo96].

Aprototyp e system that allows for the integration of the CASE to ols

Re ne, Rigi, Telos, and data obtained from domain-mo del based Abstract

Syntax Trees has b een implemented at the UniversityofWaterlo o, and the

IBM Canada, Center for Advanced Studies [Mamas00].

This architecture has also b een used to facilitate control and data inte-

gration b etween di erent to ols and pro cesses in a Co op erative Information

System Environment [Mylo96].

8 Summary

This pap er examines the use of domain mo dels for generating AST repre-

sentations of co de fragments in order to facilitate environment re-targetable

representation of the source co de as well as, data and control integration

between di erent CASE to ols.

The system is re-targetable in the sense that:

 The ASTs generated by the technique discussed in this pap er are

easily traversed, and can easily b e p orted to another to ol or stored

in an Ob ject Oriented Data Base or relational Data Base.

 The level of granularity source co de is represented parsed can b e

easily mo di ed by selectively applying the semantic actions to sp eci c

grammar rules and by ommiting them from the others. A grammar

1. Towards Environment Re-targetable Parser Generators 28

with the semantic actions applied to every rule will create a full AST

for the whole source parsed. However, if the user decides to apply

selectively the semantic actions then a "lightweight" partial AST can

b e created. This typ e of "lightweight" AST is most useful for analysis

related to the recovery of high level architectural views of the co de.

 The development and the sp eci cation of the grammar is greatly

simpli ed and the approach can b e used with most parser genera-

tors requiring minimum mo di cations to accommo date the semantic

actions.

Domain mo dels are used to infer the semantic information required by

the parsing pro cess for transforming the parsed input into an annotated

AST, and provide a common schema for di erent to ols to share. Tomake

such a common schema usable by di erent to ols, wehave adopted language-

indep endent and to ol-indep endent representations for ASTs as well as for

the domain mo del itself. Such representations constitute a foundation for

p ortability. Such p ortability is of particular imp ortance as it allows di erent

parsers to exp ort the generated AST to CASE to ols that are very p owerful

on analyzing source co de.

On-going work for this pro ject includes the control and data integration

of di erentreverse engineering to ols by using the common data interchange

formats such as XML, RSF, TA,andASTsaswell as, enhancements for

the AST tree traversal and pattern matching utilities.

9 References

[Arango91] G. Arango and R. Prieto-Diaz. Domain Analysis Concepts and

Research Directions. In Domain Analysis and Software Systems

Modeling, eds. Rub en Prieto-Diaz and Guillermo Arango, IEEE

Computer So ciety Press, 1991.

[Aho86] A. V. Aho, R. Sethi, and J.D. Ullman. Compiler Principles, Tech-

niques, and Tools. Addison-Wesley Publishing Company, Reading,

Massachusetts, 1986.

[Buss94] E. Buss, R. De Mori, W. M. Gentleman, J. Henshaw, H. Johnson,

K. Kontogiannis, E. Merlo, H. A. Muller,  J. Mylop oulos, S. Paul,

A. Prakash, M. Stanley, S. R. Tilley,J.Troster and K. Wong. In-

vestigating Reverse Engineering Technologies for the CAS Program

Understanding Pro ject. IBM Systems Journal, vol. 33 no. 3,1994.

[Chiko90] Elliot J. Chikofsky and James H. Cross. Reverse Engineering

and Design Recovery: A Taxonomy. IEEE Software,January 1990.

[Datrix00] Bell Canada Datrix Group. Abstract Semantic Graph: Refer-

ence Manual, Bel l Canada Ltd., Version 1.3 January 2000.

1. Towards Environment Re-targetable Parser Generators 29

[Devambu94] P. Devambu, D. Rosenblum, A. Wolf. \Automated construc-

tion of testing and analysis to ols, In Proceedings of 16th, Int'l Conf.

SoftwareEng.. Los Alamitos, Calif., IEEE CS Press, May1994.

[Devambu92] P. Devambu. Genoa - a language and front-end indep endent

source co de analyzer generator. In Proceedings 14th, Int'l Conf. Soft-

wareEng. Los Alamitos, Calif., IEEE CS Press, May 1992.

[Debaud95] . DeBaud, S. Rugab er. A Software Re-Engineering Metho d

using Domain Mo dels. Technical Report. Col lege of Computing, De-

orgia Institute of Technology.Atlanta, Georgia 30332-0280, 1995.

[Finni97] P. Finnigan et. al. The Software Bo okshelf. IBM Systems Jour-

nal, vol.36, No.4. September 1997.

[Holt97] R. Holt. An Intro duction to TA: the Tuple-Attribute Language.

http://plg.uwaterloo.ca/ holt/cv/papers.html. Dept. of Computer

Science, UniversityofWaterlo oi, March1997.

[JavaCC] Sun Microsystems. JavaCC:

The Parser Generator. http://www.metamamta.com/JavaCC. Sun

Microsystems, May 2000.

[Konto93] K. Kontogiannis. Program Representation and Behavioural

Matching for Lo calizing Similar Co de Fragments. In Proceedings of

CASCON'93.Toronto, Ontario, Canada, Octob er 1993.

[Kotik89] G. Kotik, L. Markosian. Automating Software Analysis and Test-

ing Using a Program Transformation System. Reasoning Systems

Inc..Palo Alto CA, 1989.

[Mamas00] E. Mamas. Design and Development of a Co de Base Manage-

ment System. M.Sc Thesis, Dept. of Electrical & Computer Engi-

neering. UniversityofWaterlo o, Waterlo o, Canada, June 2000.

[Marko94] L. Markosian. Using an Enabling Technology to Reengineer

Legacy Systems. Communications of the ACM,Vol. 37, No.5, may

1994.

[Muller93] H. Muller.  Understanding Software Systems Using Reverse En-

gineering Technology Persp ectives from the Rigi Pro ject. In Pro-

ceedings of CASCON'93.Toronto, ON. pp. 217-226, 22-24 Octob er

1993.

[Mylo90] J. Mylop oulos, A. Borgida, M. Jarke, M. Koubarakis, Telos: Rep-

resenting Knowledge Ab out Information Systems. ACM Transac-

tions on Information Systems (TOIS).Vol.8, No. 4, Octob er 1990.

1. Towards Environment Re-targetable Parser Generators 30

[Mylo96] J. Mylop oulos, A. Gal, K. Kontogiannis, M. Stanley. A Generic

Integration Architecture for Co op erative Information Systems. In

Proceedings of Co-operative Information Systems'96. Brussels, Bel-

gium, 1996.

[Par96] T. Parr. Language Translation Using PCCTS & C++ - A Refer-

ence Guide. Automata Publishing Company 1996.

[Purti89] J. Purtilo, J. Callahan. Parse-Tree Annotations. Communications

of the ACM.Vol32, Numb er 12 Decemb er 1989.

[Sta97] S. Stanch eld. A PCCTS Tutorial" at URL.

http://www.scruz.net/ thetick/pcctstut/. August 1997.

[Stallings91] W. Stallings. Data and Computer Communications. MacMil-

lan Inc. Toronto, 1991.

[Reas96a] Reasoning Systems, Inc. Dialect User's Guide. Version 2.0.CA

U.S.A., July 1990.

[Reas96b] Reasoning Systems, Inc. C-Programmer's Guide. Version 1.2.

CA U.S.A. July 1990.

[Till96] S. Tilley,D.Smith.ComingAttractions in Program Understand-

ing. Technical Report CMU/SEI-96-TR-019 ESC-TR-96-019. De-

cemb er 1996.

[VACpp] IBM Corp oration. Visual Age C++.

http://www.ibm.com/software/ad/vacpp. September 1999.