Towards Environment Re-targetable
Parser Generators
K. Kontogiannis
J. Mylopoulos
S. Wu
ABSTRACT
One of the most fundamental issues in program understanding is the issue of representing the source code at a higher level of abstraction. Even though many researchers have investigated a variety of Program Representation schemes, one particular scheme, the Abstract Syntax Tree (AST), is of particular interest for its simplicity, generality, and completeness of the information it contains. In this paper we investigate ways to generate Abstract Syntax Trees that conform with a user-defined domain model, can be easily ported to different CASE tools, and can be generated using publicly available parser generators. The work discussed in this paper uses PCCTS and yacc as parser generators, and provides an integration mechanism with the conceptual language Telos and the CASE tools Rigi, PBS, and Refine(TM).
1 Introduction
Extracting and representing the syntactic structure of source code is generally acknowledged to be one of the most important activities in program understanding. This activity is considered a prerequisite to both maintenance and reverse engineering of legacy source code [Till96]. Traditionally, the activity consists of two steps, parsing and lexical analysis. However, unlike traditional syntax-directed approaches used by compilers, parsing and lexical analysis for program understanding purposes are carried out with respect to a language-specific domain model which describes the syntactic and semantic constructs of the underlying programming language.
The use of domain models by parsers and parser generators is not new. In [Arango91], [Marko94] source code analysis techniques for legacy code which produce Abstract Syntax Trees (AST) on the basis of domain models are discussed.
In [Kotik89] the Refine re-engineering environment allows for the development of parsers based on domain models. Similarly, in [Devambu92], [Devambu94] GENOA, a language-independent application generator, is presented. GENOA can be interfaced to the front-end of a language compiler that produces an attributed parse tree. This can be achieved by writing an interface specification using a companion system to GENOA, called GENII. The advantage of GENOA/GENII is that tools for program understanding, metrics, reverse engineering and other applications can be built without having to write a complete parser/linker for the source language in which the application is written.
However, most commercial tools do not provide a programming interface (API), and therefore the generated Abstract Syntax Trees can only be accessed and processed in a limited way for any complex re-engineering or program understanding task. Within this framework, the objectives of this paper focus on three directions.
The first is to report on techniques which make it possible to generate annotated Abstract Syntax Trees at various levels of granularity with respect to the parsed source code.
The second is to propose a methodology for simplifying the process of developing a grammar and a parser for a given programming language.
The third is to report on an architecture that facilitates the exchange of source code related information between different software engineering analysis tools through the use of domain models, schemata, and persistent software repositories.
Towards these objectives, we experimented with the use of domain models by different parser generator environments. The first one is the PCCTS [Par96], [Sta97] parser generator environment and the second is the public domain C parser generator yacc. Ongoing work, also reported in this paper, involves the use of XML for annotating source code using the JavaCC parser generator [JavaCC].
The work described in this paper was carried out within the context of a software re-engineering project whose ultimate objective is to establish a generic and open software re-engineering environment. The environment should allow for different CASE tools to be integrated in terms of suitable intermediate representations and a persistent repository of multi-faceted information about a legacy system. Earlier reports on the project include [Buss94] and [Finni97].
This paper is organized as follows. In section 2 we report on different techniques for producing source code representations tailored for program analysis and program understanding. In section 3 we discuss the rationale behind the use of Abstract Syntax Trees (ASTs) as a representation of source code structure. In section 4 we outline the approach we adopted in order to construct customizable and domain re-targetable ASTs, and describe how it works by applying it to a small C code fragment. In section 5, we discuss advantages and limitations of our approach. In section 6 we present alternative parsing techniques based on XML, and in section 7 we discuss the design of an integration environment for communicating data between the system presented in this paper and various CASE tools. Finally, section 8 offers a brief summary of the results reported in this paper, as well as a perspective for further work.
2 Parsing for Program Analysis
Traditionally, parsing has been seen as a tool for producing internal representations of source code with the purpose of generating executable binaries for a specific target operating platform. In contrast, parsing for program understanding aims at producing representations of the source code that can be used for program analysis and design recovery purposes. In this respect, program representation aims at facilitating source code analysis that can be applied at various levels of abstraction and detail, namely at:
- the physical level, where code artifacts are represented as tokens, syntax trees, and lexemes,
- the logical level, where the software is represented as a collection of modules and interfaces, in the form of a program design language, annotated tuples, aggregate data and control flow relations,
- the conceptual level, where software is represented in the form of abstract entities such as objects, abstract data types, and communicating processes.
These representations are achieved by parsing the source code of the system being analyzed at various levels of detail and granularity. Specifically, over the past years a number of parsing tools have been proposed and used for re-documentation, design recovery, and architectural analysis. Overall, the parsing technology for program understanding can be classified in three main categories.
The first category deals with full parsers that produce complete ASTs of a given code fragment. The second category focuses on scanners that produce partial ASTs or emit facts related to control and data flow properties of a given source code fragment. Finally, the third category deals with regular expression analyzers that extract high level syntactic information from the source code of a given system.
Full parsers are used for detailed source code analysis as they provide low level detailed information that can be obtained from the source code. This includes data type information, linking information, conditional compilation directives, and pre-processed libraries and macros.
In this context, a full parser generator called DIALECT is proposed in [Kotik89]. The approach uses a domain model as well as yacc and lex to produce a parser given a BNF grammar. The AST is represented in terms of CLOS objects in dynamic memory and can be accessed through a LISP-like programming interface (API) written for the CASE tool Refine.
In [Datrix00], another full parser for C++ source code is presented. The parser emits an AST representation that is suitable for source code analysis and reverse engineering of systems written in C++. A domain model for C++ provides the schema that describes the structure of the generated AST. Moreover, a programming interface provides access to the AST and the ability to export detailed information in the form of RSF tuples [Muller93].
Similar to the approach above, a C/C++ parser generator that produces a fully annotated AST representation of the code is presented in [Devambu92]. The source code representation can be accessed by GEN++, for specifying C++ analysis, and by GENOA, which provides a language-independent parse tree querying framework.
Finally, another technique that is gaining popularity due to the emergence of mark-up languages is to annotate the source code using XML. Document Type Definitions (DTDs) can be automatically built by analyzing the grammar of a given programming language. Parser generators such as JavaCC can be used to emit XML tagged code by altering the semantic actions in a given grammar specification. Currently, DTDs for C, C++, and Java have been proposed [Mamas00] and are available through the Consortium for Software Engineering Research (CSER). Once XML annotated source code is parsed, AST-like representations can be generated. These AST-like structures are called DOM trees.
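To make the idea of XML tagged code concrete, the sketch below serializes a tiny syntax tree as XML. The SyntaxNode structure, the element names, and the toXml and escapeXml functions are assumptions made for this illustration only; they do not reproduce the DTDs of [Mamas00] or any JavaCC interface.

```cpp
#include <string>
#include <vector>

// Hypothetical syntax tree node: `tag` names the construct, `text` holds
// any source text carried directly by the node.
struct SyntaxNode {
    std::string tag;
    std::string text;
    std::vector<SyntaxNode> children;
};

// Escape the characters that may not appear verbatim in XML character data.
std::string escapeXml(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;
        }
    }
    return out;
}

// Wrap each construct in an element named after its tag, recursively.
std::string toXml(const SyntaxNode& node) {
    std::string out = "<" + node.tag + ">" + escapeXml(node.text);
    for (const SyntaxNode& child : node.children) {
        out += toXml(child);
    }
    return out + "</" + node.tag + ">";
}
```

In a grammar-driven setting, the equivalent of toXml would be spread over the semantic actions of the individual rules, each action emitting the opening and closing tag of its own construct.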
In the second category of parsing techniques, scanners are used for extracting high level information from a software system. This includes information related to declarations, calls, fetches and stores of program variables, as well as high level directory and source code organizational information. Data obtained from scanners are particularly useful for high level architectural recovery of large systems and can be readily used by various visualization and analysis tools.
In [Muller93] a source code representation scheme that is based on entity-relation tuples (RSF) is proposed. The approach is based on parsers whose semantic actions are modified and applied only to a selected set of grammar rules to emit facts in the form of entity-relation tuples. As a result, information pertaining only to specific syntactic constructs (e.g. declarations, function calls) is presented, and the rest of the source code is treated effectively as whitespace to be skipped.
Similarly, in [Holt97] a modified GNU C parser allows for attributed tuples to be emitted. The modified parser, called CFX, produces high level source code representations that comply with the TA syntactic form and can be directly used by the PBS visualization toolkit [Finni97].
In the third category, regular expression analyzers provide high level syntactic information of a system. Their major strength is their simplicity and availability, as they can be implemented using scripting languages such as Perl, TCL, and Awk.
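As a rough illustration of this third category, the sketch below uses a regular expression to emit call facts as simple tuples. The Fact structure, the extractCalls function, the pattern, and the keyword list are all assumptions made for this example; it does not describe how any of the cited tools works, and a pattern of this kind will both miss calls (e.g. through function pointers) and report false positives (e.g. macros).

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// A (relation, source, target) fact, loosely modeled on entity-relation tuples.
struct Fact {
    std::string relation;
    std::string source;
    std::string target;
};

// Scan the body text of one function and report every identifier that is
// directly followed by "(" as a call made by `caller`.
// Hypothetical helper: the keyword filter is deliberately incomplete.
std::vector<Fact> extractCalls(const std::string& caller, const std::string& body) {
    static const std::regex callPattern(R"(([A-Za-z_][A-Za-z0-9_]*)\s*\()");
    static const std::set<std::string> keywords = {
        "if", "while", "for", "switch", "return", "sizeof"};

    std::vector<Fact> facts;
    for (std::sregex_iterator it(body.begin(), body.end(), callPattern), end;
         it != end; ++it) {
        const std::string name = (*it)[1].str();
        if (keywords.count(name) == 0) {
            facts.push_back({"call", caller, name});
        }
    }
    return facts;
}
```

Everything that the pattern does not match is simply skipped, which is what makes this style of analysis cheap to write and tolerant of code that a full parser would reject.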
3 Rationale for Using AST Representations
As programming languages become more and more complex, one-pass compilers which don't produce any intermediate form of the compiled code are becoming very rare. This is also true in the area of program understanding and software re-engineering [Till96]. This has led to a variety of intermediate representations of the syntactic structure of source code. For program understanding and re-engineering purposes, a suitable representation must be able to represent syntactic information which persists and evolves through the entire life-cycle of a software system. This life-cycle typically begins with source code analysis during development, and continues throughout successive maintenance, reverse engineering and re-engineering phases [Chiko90].
Often, such a representation limits what can be expressed and described within it for the source code. Hence, generating an intermediate representation is an extremely important aspect of program understanding. In [Konto93], a number of common program representations, such as Abstract Syntax Trees (AST), Directed Acyclic Graphs (DAG), Prolog rules, code and concept objects, and control and data flow graphs are surveyed. Among these, the AST is the representation that is most advantageous, and is most widely used by software developers. The reason for the popularity of the AST as a program representation scheme is its ability to capture all the necessary information for performing any type of code analysis, as well as the neutral representation of the source code it offers with respect to data flow and control flow source code properties.
To justify the AST representation used here, we first need to distinguish between a parse tree and an AST. When parsing, a parser explicitly or implicitly uses a so-called parse tree to represent the structure of the source program [Aho86]. In Fig.1(a) such a parse tree for an addition expression, e.g., "1 + 2", is illustrated. An AST is a compressed representation of the parse tree where the operators appear as nodes, and the operands of an operator are the children of the node representing that operator. Fig.1(b) illustrates an AST for the same addition expression, and Fig.1(c) illustrates an annotated AST where the tree edges become annotated attributes between nodes.
The AST we adopted for our project is a variation of the AST shown above, called annotated AST. An annotated AST for the PL/X If-Statement given below
IF (OPTION > 0) THEN
SHOW_MENU(OPTION)
ELSE
SHOW_ERROR("Invalid option...")
is illustrated in Fig.2.
[Figure 1: three trees for "1 + 2". (a) Parse Tree: an Add node over the tokens 1, +, 2. (b) Abstract Syntax Tree: a + node with children 1 and 2. (c) Annotated Abstract Syntax Tree: a + node whose edges Arg1 and Arg2 lead to 1 and 2.]
FIGURE 1. Parse tree and AST representation for the expression "1 + 2".
As you may notice, links of the AST are represented as attributes associated with (node) objects. In the If-Statement example, three attributes are associated with the If-Statement node, and hold the fact that there are three components to an If-Statement (condition, then-part, else-part). The tree in Fig.2 is also annotated with a fanout attribute that denotes the number of function calls that appear in the corresponding source code entity. The fanout attribute can be computed compositionally from the leaves to the root of the AST as illustrated in Fig.2. In the subsequent discussion, we use the term AST to denote an annotated AST.
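The compositional computation of the fanout attribute can be sketched in C++ as follows. The AstNode structure, the node kind names, and the computeFanout and buildIfExample functions are assumptions made for illustration; they are not the classes generated by our tools. The sketch only shows edges stored as named attributes on a node, and a bottom-up pass that counts call nodes.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical annotated AST node: edges are stored as named attributes.
struct AstNode {
    std::string kind;  // e.g. "If-Statement", "Call", "Identifier-Ref"
    std::vector<std::pair<std::string, std::unique_ptr<AstNode>>> attributes;
    int fanout = 0;    // annotation filled in by computeFanout

    explicit AstNode(std::string k) : kind(std::move(k)) {}

    // Attach `child` under the attribute `name` and return a handle to it.
    AstNode* add(const std::string& name, std::unique_ptr<AstNode> child) {
        attributes.emplace_back(name, std::move(child));
        return attributes.back().second.get();
    }
};

// Bottom-up pass: a node's fanout is the sum of its children's fanout,
// plus one if the node itself is a function call.
int computeFanout(AstNode& node) {
    int total = (node.kind == "Call") ? 1 : 0;
    for (auto& attr : node.attributes) {
        total += computeFanout(*attr.second);
    }
    node.fanout = total;
    return total;
}

// Builds a tree for:
//   IF (OPTION > 0) THEN SHOW_MENU(OPTION) ELSE SHOW_ERROR("Invalid option...")
std::unique_ptr<AstNode> buildIfExample() {
    auto ifStmt = std::make_unique<AstNode>("If-Statement");
    AstNode* cond = ifStmt->add("condition", std::make_unique<AstNode>("GT"));
    cond->add("argument1", std::make_unique<AstNode>("Identifier-Ref"));
    cond->add("argument2", std::make_unique<AstNode>("Integer"));
    AstNode* thenCall = ifStmt->add("then-part", std::make_unique<AstNode>("Call"));
    thenCall->add("callee-id", std::make_unique<AstNode>("Identifier-Ref"));
    AstNode* elseCall = ifStmt->add("else-part", std::make_unique<AstNode>("Call"));
    elseCall->add("callee-id", std::make_unique<AstNode>("Identifier-Ref"));
    return ifStmt;
}
```

For the If-Statement above, computeFanout yields 2 at the root and 1 at each call node, which agrees with the fanout annotations of Fig.2.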
4 The Domain Model Approach
In this section, we describe a domain-model based technique for generating an AST. The process of developing a domain model based parser is illustrated in Fig.3 and consists of three phases:
1. Identification and modeling of the syntactic structures of a programming language in the form of object classes in a domain model. Instances of the object classes along with their corresponding attributes form the actual AST nodes and AST edges respectively,
2. Development of a grammar for the programming language by using the domain model object classes as grammar nonterminal symbols,
3. Integration of semantic actions with grammar rules so that the AST can be built during parsing in a customizable way.
For the overall AST generation process there are two levels of abstraction. The higher level focuses on the definition of the domain model using a modeling language and the specification of grammar rules using EBNF. The lower level of abstraction focuses on utilities that allow for the generation of an AST in terms of a concrete programming formalism. For our work we have chosen to represent ASTs in terms of C++ objects, and to specify the grammar in an off-the-shelf parser generator (i.e. PCCTS). Finally, the domain model is designed and implemented using the conceptual language Telos [Mylo90].
[Figure 2: tree diagram. The root STATEMENT IF node (fanout 2) has condition, then-clause, and else-clause attributes. The condition leads to a PREDICATE GT node whose argument1 and argument2 are the reference OPTION and the integer literal 0; each clause leads to a STATEMENT CALL node (fanout 1) with callee-id and caller-list attributes, for SHOW_MENU(OPTION) and SHOW_ERROR("Invalid option..."). Legend: node name = AST node; each link runs from parent to child via a named attribute; fanout V = fanout attribute containing the integer value V.]
FIGURE 2. Annotated AST for an If-Statement.
As indicated in the introduction, the domain model-based approach aims at the development of parsers that are easy to develop, and can be integrated with a variety of CASE tools. For this paper we examine how the domain model approach can be applied so that:
- the grammar specifications be represented in a simple way,
- the communication between the parser and the lexical analyzer be enhanced, and finally,
- the structure of the generated AST be customized and controlled by the user.
4.1 Developing a Domain Model
There are two basic methods for building a domain model for a given programming language. One method is bottom-up, in the sense that it uses the nonterminals in the language grammar to obtain the classes of the domain model. In this approach each nonterminal symbol has a close relationship with the objects in a parse tree [Aho86].
The other basic method for building domain models is top-down. In this approach, domain models are built in an incremental way where the higher level schemata (domain model entities) are specified first, and then they are refined according to the syntactic and structural elements and relations of the programming language being modeled. The domain model schemata provide the specification for the objects from which ASTs are generated and composed.
[Figure 3: flow diagram with the boxes Domain Model, Grammar Rules, C++ Domain Model Implementation, Semantic Actions & Utilities, Grammar Rules with Semantic Actions, Parser Generator, Parser, Source Code, and Annotated Abstract Syntax Tree, connected by Translation, Input, and Output arrows.]
FIGURE 3. The AST domain-model based generation process.
In this context, a domain model defines a set of objects and the interrelationships among them. Overall, the domain model must be expressed in an environment which offers modeling capabilities and some persistent storage for the classes and objects of the domain model (which represent the generated AST). For our work, we adopt Telos [Mylo90] to represent domain models, primarily because of its expressiveness and portability. Among other features, Telos allows an arbitrary number of meta-class hierarchy levels to be defined. This feature is exploited in dealing with different versions of the same language. Moreover, the AST generated by the parser generator can be persistently stored in a Telos repository or any other commercial database and exported to other tools for subsequent analysis [Buss94].
Modeling the application domain involves the discovery of object classes that represent the AST nodes and their relationships which represent the AST branches [?].
For example, in the case of the C programming language, we may consider the following categories of schemata (classes) as part of the C domain model [Reas96a]:
Telos-Object
- Program
- Declaration-Object
- File
- Function
- Statement
For each language domain model, there is an object hierarchy that can be implicitly imposed on schema entities by the structure of the programming language itself. Along these lines, further refinement of the C language object domain hierarchy is shown below:
ObjectClass
...Container
...Attribute
...C-object
...... Program
...... Declaration-Object
...... Function
...... Expression
...... Identifier-Ref
...... Function-Call
...... File
...... Statement
...... If-Statement
...... Assign-Statement
The task of defining relationships among objects emanates from the syntax of the programming language itself. This task is directly related to identifying the edges of the AST. For example, an If-Statement domain-model class as illustrated below may have three attributes. One is the condition attribute, which maps to an Expression class, and the other two are the then-part and the else-part attributes, which map to a Statement. In the domain model, one can specify the If-Statement class and its corresponding attributes in Telos as follows:
SimpleClass If-Statement
in ObjectClass
isA Statement
with
attributes
condition : Expression;
then-part : Statement;
else-part : Statement;
end
This can be read as follows: the class If-Statement is a subclass of the Statement domain model class; it has three attributes, two of which map to the class Statement and one to the class Expression. A simple transformation program can translate the above domain model class definition to corresponding C++ classes, two of which are illustrated in the example below:
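// Hyphens in Telos names become underscores, since C++ identifiers
// cannot contain '-'. ObjectClass, Attribute, C_Object, Container,
// Statement and Expression are assumed to be generated from the domain
// model in the same way and declared before these classes.
class Condition;   // attribute classes, one per Telos attribute
class Then_Part;
class Else_Part;

class If_Statement: public Statement,
                    public ObjectClass
{
private:
   Condition *condition;
   Then_Part *then_part;
   Else_Part *else_part;
public:
   // Accessors return the attribute (edge) objects.
   Condition *getCondition();
   Then_Part *getThen_Part();
   Else_Part *getElse_Part();
   // Mutators take the target of the corresponding attribute.
   void putCondition(Condition *aCond);
   void putThen_Part(Statement *aStat);
   void putElse_Part(Statement *aStat);
};

// An attribute class models one labeled AST edge: it links the node it
// starts from (an If_Statement) to the node it points to (an Expression).
class Condition: public Attribute,
                 public ObjectClass
{
private:
   If_Statement *from;
   Expression *to;
public:
   C_Object *getFrom();
   Container *getTo();
   void putFrom(If_Statement *aStat);
   void putTo(Expression *aExpr);
};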
The instantiations of the above C++ classes can be generated at parse time and form the nodes of the AST.
4.2 Defining the Grammar
To understand how the grammar specification is associated with the domain model, let's consider how a grammar for three C language constructs (If-Statement, Assign-Statement, Block-Statement) can be developed.
A domain model for the above constructs is specified in terms of a hierarchical schema as:
Statement
...If-Statement
...Assign-Statement
...Block-Statement
and in Telos the constructs are specified as follows:
SimpleClass Statement
in ObjectClass
isA C-Object
end
SimpleClass If-Statement
in ObjectClass
isA Statement
with
attributes
condition : Expression;
then-part : Statement;
else-part : Statement;
end
SimpleClass Assign-Statement
in ObjectClass
isA Statement
with
attributes
assign-lhs : Identifier-Ref;
assign-rhs : Expression;
end
SimpleClass Block-Statement
in ObjectClass
isA Statement
with
attributes
stat-list : sequence-of(Statement);
end
The grammar for a given language can be built in two phases. The first phase is automatic and is used for emitting grammar rules that can be directly inferred from the domain model. These include rules that involve nonterminals that correspond to class-subclass hierarchies and to class-attribute relations in the domain model. Examples are rule R.4 and rules R.5-R.10, respectively, as illustrated below.
The second phase is user-assisted and involves the user writing the grammar for a given programming language, provided that he or she uses as nonterminals entities specified in the language domain model. Examples are the rules R.1-R.3 illustrated below. For example, the Assign-Statement domain entity is used as a non-terminal in rule R.2 along with its attributes assign-lhs and assign-rhs. Attributes whose target domain is a sequence of entities (such as the stat-list attribute) are transformed into rules that involve the closure operations * or +, as illustrated in the case of rule R.3.
Given these classes, the following grammatical rules can be specified (see footnote 1):
R.1 If-Statement : "if" "(" condition ")" "then"
                   then-part {["else" else-part]};
R.2 Assign-Statement : assign-lhs "=" assign-rhs;
R.3 Block-Statement : stat-list +;
while the following rules are generated automatically from the domain model:
R.4 Statement : If-Statement | Assign-Statement|
Block-Statement;
R.5 condition : Expression;
R.6 then-part : Statement;
R.7 else-part : Statement;
R.8 assign-lhs : Identifier-Ref;
R.9 assign-rhs : Expression;
R.10 stat-list : Statement;
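The automatic phase can be sketched as a small generator that walks a description of the domain model. The ClassDef structure and the deriveRules and statementModel functions are hypothetical stand-ins for a Telos schema and for the translation step; the sketch emits one alternation rule per class that has subclasses and one rule per attribute, and it maps a sequence-valued attribute such as stat-list simply to its element class.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified view of one domain model class.
struct ClassDef {
    std::string name;
    std::string parent;  // isA target
    std::vector<std::pair<std::string, std::string>> attributes;  // (name, target class)
};

// Emit the grammar rules that follow directly from the model:
//   "Parent : Child1 | Child2 ...;" for every class that has subclasses,
//   "attr : Target;"                for every attribute.
std::vector<std::string> deriveRules(const std::vector<ClassDef>& model) {
    std::vector<std::string> rules;
    for (const ClassDef& cls : model) {
        std::string alternatives;
        for (const ClassDef& other : model) {
            if (other.parent == cls.name) {
                if (!alternatives.empty()) alternatives += " | ";
                alternatives += other.name;
            }
        }
        if (!alternatives.empty()) {
            rules.push_back(cls.name + " : " + alternatives + ";");
        }
    }
    for (const ClassDef& cls : model) {
        for (const auto& attr : cls.attributes) {
            rules.push_back(attr.first + " : " + attr.second + ";");
        }
    }
    return rules;
}

// The three statement classes used in this section.
std::vector<ClassDef> statementModel() {
    return {
        {"Statement", "C-Object", {}},
        {"If-Statement", "Statement",
         {{"condition", "Expression"},
          {"then-part", "Statement"},
          {"else-part", "Statement"}}},
        {"Assign-Statement", "Statement",
         {{"assign-lhs", "Identifier-Ref"}, {"assign-rhs", "Expression"}}},
        {"Block-Statement", "Statement", {{"stat-list", "Statement"}}},
    };
}
```

Applied to statementModel(), deriveRules returns seven rules that correspond to R.4 through R.10; the user-written rules R.1 to R.3 are not derivable this way because they involve concrete syntax such as "if" and "=".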
The coordination of grammar rules and the domain model in order to generate the desired AST is achieved solely through a set of semantic actions that are invoked from within the parser generator. We do not put any restriction on the parser generator, and therefore the same semantic actions can be invoked by a variety of parser generator tools, requiring only minor modifications or none at all.
Defining a parser for a programming language can be a tedious exercise, especially if the language contains context-sensitive structures.
In the literature one can find a wealth of parser generator environments (PCCTS, Yacc, Eli, TXL, LLGen, Pgs, Cola, Lark, Genoa, and Dialect, to mention just a few). All of these environments provide a formalism to represent grammatical rules, and a mechanism to trigger semantic actions when a rule is applied successfully during the parsing process.
Footnote 1: The grammar rules are given for clarity at a higher level than the one at which they appear in the original PCCTS specification. An automatic translation to PCCTS or yacc rules from the given high level rule specification is also possible. In these rules, curly brackets denote optional choice while square brackets indicate sequencing.
4.3 Generating the AST
In order to fulfill the task of AST generation by using the machinery described above, we need appropriate semantic actions to be inserted into the grammar.
We have defined three semantic actions that are associated with each rule. These are: BuildRule, BuildTerm, and BuildAttr. These actions operate on two global stacks, RuleStack and NodeStack. The first stack keeps track of the current rule applied, while the second stack stores the generated AST nodes that have still to be linked with their parent node during the AST generation phase. The semantic actions are automatically inserted in the grammar rules given the domain model specification. Only the rules whose left-hand side (i.e. the head of the grammar rule) corresponds to a domain model class that does not have any subclasses need to be annotated by the semantic actions described below. The reason is that all other rules are not used to generate AST nodes and are simply used to facilitate the parsing process for a given top level non-terminal starting grammar symbol. These semantic actions are discussed in detail below:
BuildRule(Rule) : This action takes as a parameter the current rule name (i.e. the name of the non-terminal in the head of a rule). This action is inserted as the first semantic action to be carried out at the right-hand side of a given rule. It is actually invoked before the rule components are tried. In other words, the action is invoked before a rule recognizes anything in the input stream. More specifically, this action registers the rule as the current rule applied. Concretely, it creates an instance of the class Rule, the pointer to which is pushed onto the RuleStack.
Once the rule succeeds (i.e. all right-hand side non-terminals or terminal symbols have been resolved against the source text), the reference for this rule is popped from the RuleStack, indicating that this is no longer the current rule being tried. Hence, the only purpose of the BuildRule() action is to keep track of which rule is currently being tried. In this respect, a stack data structure allows for keeping track of multiple nested invocations of the same rule, a very frequent case during the parsing process.
As an example consider the rule:
Assign-Statement :
assign-lhs "=" assign-rhs;
The rule above has two nonterminals (assign-lhs, assign-rhs) and one terminal (the token "=").
By examining the domain model two more rules are automatically added. These are:
   assign-lhs : Identifier-Ref;
   assign-rhs : Expression;
The domain model is also used to recognize whether a non-terminal is an attribute or a class. For the example domain model given in Section 4.2, Assign-Statement is defined as a class entity, while assign-lhs and assign-rhs are defined as attributes of the Assign-Statement class. The Assign-Statement rule shown above, enhanced with the BuildRule semantic action and the additional rules obtained from the domain model, becomes:
Assign-Statement:
<< BuildRule(Assign-Statement); >>
assign-lhs "=" assign-rhs ;
assign-lhs:
<< BuildRule(assign-lhs); >>
Identifier-Ref;
assign-rhs:
<< BuildRule(assign-rhs); >>
Expression;
BuildTerm(Term) : This action takes as a parameter the name of the non-terminal symbol at the head of the rule. This semantic action is placed at the end of a rule, provided that the head of the rule is a class that has no subclasses in the domain model. If the head of the rule is an attribute then the BuildAttr semantic action is applied instead. The BuildAttr(Attr) action is discussed below. Examining whether a Term corresponds to a Class or an Attribute takes only a simple look-up operation on the domain model specification. The BuildTerm(Term) action aims at building the AST node that corresponds to the latest term (head of the rule) that succeeded. This action is implemented using a globally available stack called NodeStack.
Once the rule is successfully parsed, the BuildTerm(Term) semantic action is invoked. An instance of an object of type Term is constructed and is pushed on the NodeStack, indicating that this is the most recent rule successfully tried. If the rule fails then a cleanup of the stacks occurs up to the last rule that had been successfully tried.
Augmented with both actions BuildRule(Rule) and BuildTerm(Term), our example rule looks like the following (see footnote 2):
Assign-Statement:
   << BuildRule(Assign-Statement); >>
   assign-lhs "=" assign-rhs
   << BuildTerm(Assign-Statement); >> ;
Footnote 2: Due to limitations in space for this paper we omit the rules related to assign-lhs and assign-rhs.
[Figure 4: the Assign-Statement node has an assign-lhs edge to an Identifier-Ref node (name: a) and an assign-rhs edge to an Integer node (value: 1).]
FIGURE 4. AST representation for the statement a=1.
In our example the BuildTerm(Assign-Statement) action will create an instance of the class Assign-Statement.
BuildAttr(Attr) : This action is placed at the end of a rule, provided that the head of the rule is recognized as an Attribute in the domain model and not as a class. The BuildAttr(Attr) action is used for creating the annotated edges of the AST. The most recent attribute created by the process becomes an incoming edge to the most recently created node. Augmented with the semantic actions BuildRule(Rule), BuildTerm(Term), and BuildAttr(Attr), our example rules become:
assign-lhs :
   << BuildRule(assign-lhs); >>
   Identifier-Ref
   << BuildAttr(assign-lhs); >> ;
assign-rhs :
   << BuildRule(assign-rhs); >>
   Expression
   << BuildAttr(assign-rhs); >> ;
For the above example the AST which will be created for the C statement a = 1 is illustrated in Fig.4 (see footnote 3).
This process allows for a customizable way to impose a structure on the AST, which can be used to port AST structures from one tool to another with minimal effort. In practice, the programmer is not even required to specify the BuildRule(Rule), BuildTerm(Term), and BuildAttr(Attr) actions. These can be generated automatically by a transformation program that takes abstract grammar rules and generates legal code for the parser generator.
The following section discusses the functionality of the semantic actions in more detail.
Footnote 3: We assume a domain model that contains Identifier-Ref and Integer classes with attributes name and value respectively.
4.4 Semantic Actions
The three semantic actions sketched in the previous section are used in
coordination with two global stacks to create the Annotated Abstract Syntax
Tree.
The following algorithms describe how the semantic actions are used in
the AST generation process for a rule of the form
$NT_{Head} : NT_1 \; NT_2 \cdots NT_k$, where each $NT_i$ is a
non-terminal. For clarity we omit terminals in our rule descriptions.
Procedure Initialize
Step.1 RuleStack <- NULL
Step.2 NodeStack <- NULL
Step.3 Exit
Procedure BuildRule(NT_Head)
Step.1 RulePtr <- CreateRuleObject(NT_Head)
Step.2 RulePtr.count <- 0
Step.3 Push(RulePtr, RuleStack)
Step.4 Exit
Procedure BuildAttr(NT_Head)
Step.1 Pop(RuleStack)
Step.2 Top(RuleStack).count++
Step.3 Exit
Procedure BuildTerm(NT_Head)
Step.1 Term <- CREATE-TERM(NT_Head);
       TheNonTerm <- Pop(RuleStack);
Step.2 IF RuleStack != NULL THEN
         AnAttr <- Top(RuleStack);
         CurrentAttr <- CREATE-ATTR(AnAttr);
         /* CREATE-ATTR builds a labeled AST edge */
         /* with information obtained from AnAttr */
         Term.IncomingAttr <- CurrentAttr
       END IF
Step.3 FOR i=1 TO TheNonTerm.count DO
         aNode <- Top(NodeStack);
         /* Link the incoming attribute of aNode */
         /* with the parent of aNode, which is Term */
         LinkIncomingWith(aNode, Term)
         Pop(NodeStack)
       END DO
Step.4 Push(Term, NodeStack);
Step.5 Return(Term)
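The procedures above can be sketched in executable form. The following
Python fragment is a minimal illustration and not the authors'
implementation: the class names Node and Rule, the snake_case function
names, and the value argument that attaches token text to a leaf are
assumptions introduced here. The driver replays by hand the order in which
a parser would fire the actions for a=1, and it assumes that assign-rhs
leads directly to an Integer node (the Expression rule is skipped for
brevity).

```python
class Node:
    """An AST node: an instance of a domain-model class."""
    def __init__(self, cls, value=None):
        self.cls = cls
        self.value = value          # token text for leaves (assumption)
        self.incoming_attr = None   # label of the edge from the parent
        self.children = []


class Rule:
    """Bookkeeping record pushed on the rule stack by build_rule."""
    def __init__(self, name):
        self.name = name
        self.count = 0              # number of attribute children seen


rule_stack = []
node_stack = []


def initialize():
    rule_stack.clear()
    node_stack.clear()


def build_rule(name):
    rule_stack.append(Rule(name))


def build_attr(name):
    rule_stack.pop()                # drop the attribute's own record
    rule_stack[-1].count += 1       # the enclosing rule gained one child


def build_term(name, value=None):
    term = Node(name, value)
    rule = rule_stack.pop()
    if rule_stack:                  # an enclosing rule labels the edge
        term.incoming_attr = rule_stack[-1].name
    popped = [node_stack.pop() for _ in range(rule.count)]
    term.children = popped[::-1]    # restore source order (design choice)
    node_stack.append(term)
    return term


def parse_assignment(ident, number):
    """Fire the actions in the order a parser would for `ident = number`."""
    initialize()
    build_rule("Assign-Statement")
    build_rule("assign-lhs")
    build_rule("Identifier-Ref")
    build_term("Identifier-Ref", ident)
    build_attr("assign-lhs")
    build_rule("assign-rhs")
    build_rule("Integer")
    build_term("Integer", number)
    build_attr("assign-rhs")
    return build_term("Assign-Statement")
```

Calling parse_assignment("a", "1") returns an Assign-Statement node whose
two children are reached through the assign-lhs and assign-rhs edges, as in
Fig. 4. Children come off the node stack in reverse order, so the sketch
reverses them; the procedures above do not prescribe an order.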
5 Advantages and Limitations
In Section 4.4, we described a technique for incorporating the semantic
actions for AST generation into a grammar. The basic difference between
traditional approaches and the domain-model based approach is the level
where the actions are embedded. Traditional approaches can be thought of
as syntax-directed, while the domain-model approach can be thought of as
rule-directed.
In this section, we take a look at the grammars written by using the two
different approaches, and we discuss the advantages and limitations of the
domain model approach.
A version of the grammar rules for generating an AST for a Block-Statement
and an Assignment-Statement is shown below:
Block-Statement >
<< List* stat-list = NULL;
Statement * Stat; int i=0;>>
(LC:LEFT-CURLY
( Statement >[Stat] SE:SEMICOLON
<< if (i==0)
{stat-list=new List(Stat); i=i+1;}
else stat-list -> append(Stat); >>)+
RC:RIGHT-CURLY)
Assign-Statement >
[Assign-Statement * Assign];
<< Expression* Expr; Identifier-Ref* Id-Ref;>>
ID:ID-REF
<
AS:ASSIGN-SIGN
Expression > [Expr]
<< $Assign=new Assign-Statement(Id-Ref,Expr);>> ;
For the cases where the programming language has context-sensitive
constructs, the grammar above can be even more complex to design and
develop. With the domain-model approach, by using the semantic actions
discussed above, the sample grammar can be rewritten and still perform
the same task.
Block-Statement >
<< BuildRule("Block-Statement"); >>
(LC:LEFT-CURLY
( stat-list SE:SEMICOLON )+
RC:RIGHT-CURLY)
<< BuildTerm("Block-Statement"); >>
Assign-Statement >
<< BuildRule("Assign-Statement"); >>
assign-lhs AS:ASSIGN-SIGN assign-rhs
<< BuildTerm("Assign-Statement") >> ;
A translation process takes the simplified grammar specification and
generates PCCTS code (above) or yacc code. The translation process
generates valid PCCTS or yacc code from the simplified grammar and the
domain model, by adding more rules and attaching the semantic actions
discussed above at the beginning and the end of each rule.
Therefore, in practice, the programmer needs only to define the following
grammar:
Block-Statement :
(LC:LEFT-CURLY
(stat-list SE:SEMICOLON)+
RC:RIGHT-CURLY)
Assign-Statement :
assign-lhs AS:ASSIGN-SIGN assign-rhs;
The domain model is also used to automatically generate the header files
that contain the C++ classes whose instances represent AST nodes.
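The translation step described above can be illustrated with a short
sketch. It is only an approximation under stated assumptions: the function
name translate, the representation of the domain model as two sets of
names, and the exact layout of the emitted text are hypothetical, and the
output mimics the annotated rules shown earlier rather than guaranteed
valid PCCTS or yacc input. The rule it applies is the one given in the
text: every rule opens with BuildRule, and closes with BuildTerm if its
head is a class in the domain model or with BuildAttr if its head is an
attribute.

```python
def translate(rules, classes, attributes):
    """Wrap each simplified rule with the semantic actions.

    rules      -- list of (head, body) pairs, body being the rule text
    classes    -- names the domain model declares as classes
    attributes -- names the domain model declares as attributes
    """
    annotated = []
    for head, body in rules:
        if head in classes:
            closing = "BuildTerm"
        elif head in attributes:
            closing = "BuildAttr"
        else:
            raise ValueError(f"{head!r} is not in the domain model")
        annotated.append(
            f"{head} :\n"
            f'    << BuildRule("{head}"); >>\n'
            f"    {body}\n"
            f'    << {closing}("{head}"); >> ;'
        )
    return "\n".join(annotated)


if __name__ == "__main__":
    print(translate(
        [("Assign-Statement", "assign-lhs AS:ASSIGN-SIGN assign-rhs"),
         ("assign-lhs", "Identifier-Ref"),
         ("assign-rhs", "Expression")],
        classes={"Assign-Statement"},
        attributes={"assign-lhs", "assign-rhs"},
    ))
```

Leaving a rule out of the input list leaves it without actions, which is
one way to obtain a partial AST.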
Our experiments suggest the following major advantages for the domain
model approach.
First, the AST generation process can be easily adapted to accommodate
different programming languages. Most programming language constructs
can be modeled using the domain-model approach, and any parser generator
can be used when augmented with the proper semantic actions.
Second, once the AST is created, it is easy to make tree analyzers and
pretty-print utilities. This is of particular importance for re-engineering
projects aiming at the migration of programs from one language version to
another.
Third, the approach effectively hides implementation details for
generating the AST. It is notable that the domain-model based grammar
specification is much more concise than the one required by the
traditional approach.
Finally, context-sensitive languages can be specified more easily, as the
structure of the rules is simpler. This is of particular importance, since
many programs that have to be re-engineered or migrated are written in
highly specialized, context-sensitive languages.
The proposed approach has the limitation that, when used with an LL
parser generator, it may lead to complications (i.e. complex rules) in the
implementation due to the left-recursion restrictions [Aho86]. In this
case the user has to take care to write grammar rules, as discussed above,
that are free of left recursion. However, these restrictions do not apply
when the proposed domain model approach is used with an LALR parser.
Moreover, an algorithmic process for eliminating left recursion in grammar
rules has been presented in [Aho86].
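As an illustration of the kind of rewriting involved, the sketch below
removes immediate left recursion from a single rule using the textbook
transformation: A : A x | y becomes A : y A-tail and
A-tail : x A-tail | (empty). It is a simplified sketch, not the full
procedure, since indirect left recursion through other rules is not
handled. The function name, the list-of-lists rule encoding, and the
"-tail" suffix for the new non-terminal are assumptions made here.

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """Rewrite one rule so that no alternative starts with its own head.

    alternatives -- list of alternatives, each a list of symbol names;
                    an empty list denotes the empty production.
    Returns a dict mapping rule heads to their new alternatives.
    """
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: alternatives}          # nothing to do
    if any(not rest for rest in recursive):
        raise ValueError(f"cyclic alternative {head} : {head}")
    if not others:
        raise ValueError(f"{head} has no non-recursive alternative")
    tail = head + "-tail"                    # hypothetical naming scheme
    return {
        head: [alt + [tail] for alt in others],
        tail: [rest + [tail] for rest in recursive] + [[]],
    }


if __name__ == "__main__":
    # stat-list : stat-list Statement | Statement
    print(eliminate_immediate_left_recursion(
        "stat-list", [["stat-list", "Statement"], ["Statement"]]))
```

The rewritten rules accept the same strings but change the shape of the
derivation, so the semantic actions must be attached with this in mind.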
6 Alternative Techniques
In this section we present alternative techniques for customizable
domain-driven parsing using the Extensible Markup Language (XML)
[Mamas00]. The objective is to develop program representations based on
XML that provide information at the same level as ASTs. This alternative
representation is simple but detailed enough to represent the complete
syntax of a specific programming language. This mark-up language based
approach aims at developing more generic representations suitable for
domains with similar characteristics.
6.1 Annotating Source Code Using XML
Mapping ASTs to DTDs
Given a specific programming language, we need to define a representation
to which every valid source code program can be mapped. To accomplish
this mapping from ASTs to XML DOM trees, we need to define a method to
map the grammar of the programming language to a Document Type
Definition (DTD). The mapping at this level guarantees that all possible
syntax trees defined by the grammar (and therefore any source code
program) can be mapped to XML trees defined by the DTD. One of the
requirements when defining new XML representations is to make them as
easy to use as possible. A generic algorithm that maps a grammar to a DTD
tends to result in hard-to-use representations. Instead, an alternative is
to provide general guidelines that assist in implementing a good mapping
from a grammar to a DTD.
Once a grammar is mapped to a DTD, the ASTs for a specific program
can be mapped to XML files. These XML files can then be used in place
of the ASTs or the original source files for maintenance tasks.
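To make the idea of a grammar-to-DTD mapping concrete, the sketch below
turns one grammar rule into one DTD element declaration. It is a
deliberately naive one-to-one mapping of the kind the text cautions can be
hard to use, and it is not the mapping used for JavaML or CppML. The
function names and the rule encoding are assumptions; terminals are assumed
to have been dropped from the alternatives, and empty alternatives are
rejected.

```python
def content_particle(alternative):
    """Render one alternative as a DTD sequence such as (a, b)."""
    if not alternative:
        raise ValueError("empty alternatives are not handled in this sketch")
    if len(alternative) == 1:
        return alternative[0]
    return "(" + ", ".join(alternative) + ")"


def rule_to_element(head, alternatives):
    """Map `head : alt1 | alt2 | ...` to an <!ELEMENT ...> declaration."""
    parts = [content_particle(alt) for alt in alternatives]
    if len(parts) == 1:
        model = parts[0] if parts[0].startswith("(") else f"({parts[0]})"
    else:
        model = "(" + " | ".join(parts) + ")"
    return f"<!ELEMENT {head} {model}>"


if __name__ == "__main__":
    print(rule_to_element("Assign-Statement", [["assign-lhs", "assign-rhs"]]))
    print(rule_to_element("Statement",
                          [["Assign-Statement"], ["Block-Statement"]]))
```

A sequence of non-terminals becomes a comma-separated content model and a
choice between alternatives becomes a | group, so every syntax tree of the
grammar has a corresponding valid XML tree.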
Java Mark-up Language (JavaML)
The generation of a program representation for Java is based on a parser
generator tool called Java Compiler Compiler, or JavaCC [JavaCC] for
short. This tool was initially developed by Sun Microsystems and is one
of the most popular parser generators for Java. The popularity of JavaCC
is most probably due to the grammar for Java that is shipped with the
tool. The majority of Java source code parsers are built using JavaCC and
the Java 1.1 grammar. The Java 1.1 grammar was developed by Sriram
Sankar at Sun Microsystems, and a copy of this grammar can be found in
the distribution of JavaCC.
The complete DTD that we generated based on the Java 1.1 grammar is
presented in [Mamas00] and can be found at
http://swen.uwaterloo.ca/evan/javaml.html. Below we present a small
example of a Java source file and its corresponding JavaML representation.
Java source code
public class Car{
int color;
public int getColor(){
return color;}
}
JavaML representation
The above annotated source code can be parsed by an XML parser, such
as IBM's XML4J parser, to produce a DOM tree that corresponds in this
context to an Annotated Abstract Syntax Tree.
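The step from annotated source to a DOM tree can be sketched as follows.
Two assumptions apply. First, the element and attribute names in the XML
fragment are hypothetical placeholders chosen for the Car example; they
are not taken from the JavaML DTD. Second, Python's standard
xml.dom.minidom parser stands in for XML4J. The outline function walks the
DOM tree and lists its element nodes, which play the role of AST nodes.

```python
from xml.dom.minidom import parseString

# Hypothetical annotation of the Car class; tag names are placeholders.
ANNOTATED = """<class name="Car">
  <field type="int" name="color"/>
  <method type="int" name="getColor">
    <return><name>color</name></return>
  </method>
</class>"""


def outline(node, depth=0):
    """Return one indented line per element node in the DOM tree."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        attrs = " ".join(
            f'{key}="{value}"' for key, value in sorted(node.attributes.items())
        )
        lines.append("  " * depth + node.tagName + (" " + attrs if attrs else ""))
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    document = parseString(ANNOTATED)
    print("\n".join(outline(document.documentElement)))
```

Whitespace and text nodes are skipped by outline, so only the structural
elements appear in the listing.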
C++ Markup Language (CppML)
The generation of a representation for the C++ programming language
is based on a different approach than that for Java. Instead of using a
parser generator such as JavaCC, we took advantage of a compiler product
that maintains the intermediate representation of the code and provides
access to it through an API. This product is the IBM VisualAge C++
[VACpp] IDE, which is developed at the IBM Toronto Lab. VisualAge
contains a repository, called Codestore, in which all the information
generated during the compilation process is stored. Parsing, processing
and code generation information can all be accessed using the provided
APIs. The goal behind the architecture of VisualAge is to allow developers
to maintain and analyze C++ source code. Our goal in using VisualAge is
to demonstrate that a commercial product can be integrated and used
as a tool in a more complex environment. The complete representation
for C++ that was generated using IBM's VisualAge C++ can be found
at http://swen.uwaterloo.ca/docs/cppml.html. Sample source files and
their representations are also available. The grammar that CppML was
based on was implicitly extracted from the Codestore APIs, which was
obtained from the VisualAge development team, and is expressive enough
to represent ANSI C++ compliant source files.
7 Experiments
In this section we provide experimental results related both to the
performance of the generated parser using the proposed domain model based
approach, and to the performance of the architecture used for tool
integration.
7.1 Parser Performance
The experimental results presented here focus on two main categories,
namely parse time performance and space requirements for the generated
Abstract Syntax Tree. The results were obtained by evaluating a parser
developed for the C programming language, using the domain model approach
discussed in this paper. On-going work focuses on developing parsers for
PL/IX, PL/X, and PL/I.
In Table 1.1, the time and space performance statistics for the C parser
are illustrated. These results indicate that the parser developed is
scalable with respect to the time required to parse source code and build
the corresponding Abstract Syntax Tree.
In particular, the results indicate a linear relationship between the
number of statements to be parsed and the time spent to actually parse
these statements. The same linear relationship holds between the number
of lines of code to be parsed and the actual parse time. On average, the
parser processed 44.43 statements, or 235 lines of code (LOC), per second.
Similarly, the space requirements for the generated Abstract Syntax Tree
indicate that the size of the tree, in terms of the number of nodes, is
linear when compared against the number of source code statements and the
lines of source code (LOC) parsed.
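One way to check the linearity claim against the reported figures is a
least-squares fit of parse time against lines of code. The sketch below is
only an illustration: it uses the three largest inputs of Table 1.1 (an
arbitrary selection made here), and the helper name least_squares is
hypothetical. A correlation coefficient close to 1 is consistent with a
linear relationship over that range.

```python
# (LOC, parse time in seconds) for the three largest inputs in Table 1.1.
ROWS = [(8050, 92.99), (16203, 149.1), (34271, 277.19)]


def least_squares(points):
    """Return (slope, intercept, r) for the line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5      # Pearson correlation coefficient
    return slope, intercept, r


if __name__ == "__main__":
    slope, intercept, r = least_squares(ROWS)
    print(f"seconds per LOC: {slope:.4f}  correlation: {r:.4f}")
```

The same function can be applied to the statement counts or to the node
counts to examine the space claim.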
On-going work focuses on the use of XML and DOM for representing
the Abstract Syntax Trees, and DTD domain models have already been built
for Java and C++.
7.2 Tool Integration
In developing a portable program representation environment, the following
important problems have to be considered:
• Data integration is essential to ensure data exchange between tools,
• Control integration enhances inter-operability and data integrity among
different tools.
The first can be accomplished through a common schema, while the second
can be accomplished with a server that handles requests and responses
between tools in the environment.
A solution to the data integration problem is based on a system
architecture in which all tools communicate through a central software
repository that stores, normalizes and makes available analysis results,
design concepts, links to graph structures, and other control and message
passing mechanisms required for the communication and cooperative
invocation of different tools. Such integration is achieved by using a
local workspace for each individual parser or tool in which specific
results and artifacts are stored, and a translation program for
transforming tool-specific software entities into a common and compatible
form for all environment entities.
LOC      # of Statements   Parse Time (sec)   # AST Nodes (thousands)
15       1                 0.05               0.005
49       1                 0.14               0.038
40       1                 0.2                0.068
101      1                 0.32               0.113
203      1                 0.92               0.409
371      1                 1.71               0.713
302      1                 2.41               1.085
1        1                 0.04               0.008
76       2                 0.85               0.406
122      4                 0.23               0.068
38       16                0.16               0.055
54       25                0.38               0.165
65       50                0.46               0.202
196      89                1.28               0.603
178      110               1.46               0.701
240      131               1.92               0.907
238      153               1.59               0.846
365      248               2.86               1.401
960      476               8.52               4.214
1002     728               8.07               4.125
8050     4723              92.99              39.89
16203    7486              149.1              63.53
34271    14240             277.19             114.622
TABLE 1.1. Time and space statistics for the prototype parser.
The translation program generates appropriate images in the central
repository of objects shared with local workspaces. For the program
representation scheme proposed in this paper (i.e. annotated ASTs based on
a domain model) this translation is a straightforward process, as the AST
nodes and links correspond to classes that can be mapped from one tool to
the other (CLOS in Refine, C++ for PCCTS, etc.). A system architecture
for such an integrated reverse engineering environment that uses common
program representation interchange data is illustrated in Figure 5.
Communication in this distributed environment is achieved by scripts
understood by each tool, using a shared schema data model for representing
data from the individual local workspaces of each tool, and a message
passing mechanism for each tool to request and respond to services.
A message server allows all tools to communicate both with the repository
and with each other, using the common schema. These messages form
the basis for all communication in the system.
[Figure 5 components: Pattern Matcher, Rigi, and Refine tools, each with a
Local Workspace; Control Integration Server; Schema; Repository (Data
Integration, Object Base); Tool.]
FIGURE 5. The implemented system architecture for tool integration. Dashed
lines distinguish computing environments, usually running on different
machines.
The central repository is responsible for normalizing these
representations, making them available to other tools, and linking them
with the other relevant software artifacts already stored in the
repository (e.g., documentation, architectural descriptions).
Finally, communication scripts are in the form of s-expressions. Each
s-expression is wrapped in packets called network objects and sent via TMB
to the appropriate target machine. TMB uses Unix sockets for the
communication and utilizes the TCP/IP network infrastructure.
An example s-expression for a Function called hostnames-matching is
given below.
This s-expression is used to represent an annotated (yet simplified) AST
node corresponding to a C Function. Annotations in this example include
call information, metrics information, and data flow information.
(Function-176 Token
(Function)
()
(((objectId)
(("#176")))
((cyclomaticComplexity)
((5.0)))
((calls)
((Function-175)
(Function-174)))
((definesLocals)
(("word")
("yylval")))
((definesGlobals)
(("shell-input-line")
("shell-input-line-index")))
((functionName)
(("hostnames-matching")))))
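A receiving tool has to turn such an s-expression back into a data
structure. The sketch below is a minimal reader written for illustration
and not the parser used in the system. It makes two assumptions: the fifth
element of the outer list is taken to be the list of attributes, and each
attribute is read as a pair of a one-element name list and a list of
one-element value lists, following the layout of the example above.
Quoted strings and bare atoms are both returned as Python strings.

```python
def tokenize(text):
    """Split an s-expression into parentheses, quoted strings and atoms."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            end = text.index('"', i + 1)
            tokens.append(text[i:end + 1])
            i = end + 1
        else:
            j = i
            while j < len(text) and not text[j].isspace() and text[j] not in '()"':
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


def read(tokens, pos=0):
    """Return (value, next_position) for the expression starting at pos."""
    token = tokens[pos]
    if token == "(":
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = read(tokens, pos)
            items.append(item)
        return items, pos + 1
    if token == ")":
        raise ValueError("unexpected ')'")
    if token.startswith('"'):
        return token[1:-1], pos + 1
    return token, pos + 1


def attributes(expr):
    """Collect {attribute name: [values]} from the assumed fifth element."""
    return {pair[0][0]: [value[0] for value in pair[1]] for pair in expr[4]}


SAMPLE = """(Function-176 Token (Function) ()
  (((objectId) (("#176")))
   ((cyclomaticComplexity) ((5.0)))
   ((calls) ((Function-175) (Function-174)))
   ((functionName) (("hostnames-matching")))))"""

if __name__ == "__main__":
    tree, _ = read(tokenize(SAMPLE))
    print(attributes(tree))
```

Numeric values such as 5.0 are left as strings; converting them would
require knowledge of the attribute types in the schema.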
The shared schema facilitates data integration and encodes program
artifacts (i.e. the AST), as well as analysis results (e.g. the call
graph, the metrics analysis).
The schema has been implemented in Telos and allows for multiple
inheritance and specialization of the attribute categories for each
object. In this way, an AST entity (e.g. File) may have attributes that
are classified as Refine-Attributes, Rigi-Attributes, and PBS-Attributes.
The schema is populated by tools that may add or retrieve values for the
attributes in the objects to which they have access.
Similarly, data integration for analysis results was achieved by designing
schema classes for every type of analysis a tool is able to perform. Each
such class is linked via attributes to the actual AST entities and this is
visible to all the other tools that may request it.
At run-time, the user may request and select repository entities according
to specific object types or attribute values.
In addition to the Data Integration aspects discussed above, Control
Integration is based on designing and developing a mechanism for:
• Uniquely registering tool sessions and corresponding services,
• Representing requests and responses,
• Transferring object entities to and from the repository,
• Performing error recovery.
At the Application layer [Stallings91] a message server is used to
facilitate inter-networking. Essentially, the server offers an environment
to access lower level UNIX communication primitives (i.e. sockets), in
order to manage the transfer of data via TCP/IP using a higher-level
language to represent source and destination points, processes, and data.
Each tool generates a stream of network objects encoded as a stream of
s-expressions. A parser analyzes the contents of each network object and
performs the appropriate actions (e.g. respond to a request, acknowledge
the successful reception of a network object).
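The control integration duties listed above (registering services,
representing requests and responses, and recovering from errors) can be
pictured with a small in-process stand-in. The sketch below is hypothetical
throughout: the class MessageServer, its method names, the tuple format of
replies, and the show-calls service are invented for illustration, and no
sockets or TMB are involved.

```python
class MessageServer:
    """Routes requests to services that tools have registered."""

    def __init__(self):
        self._services = {}          # (tool, service) -> handler

    def register(self, tool, service, handler):
        """Register a service once; duplicate registrations are rejected."""
        key = (tool, service)
        if key in self._services:
            raise ValueError(f"{tool}/{service} is already registered")
        self._services[key] = handler

    def request(self, tool, service, payload):
        """Return ('response', result) or ('error', message)."""
        handler = self._services.get((tool, service))
        if handler is None:
            return ("error", f"unknown service {tool}/{service}")
        try:
            return ("response", handler(payload))
        except Exception as exc:     # error recovery: report, do not crash
            return ("error", str(exc))


if __name__ == "__main__":
    server = MessageServer()
    # Hypothetical service: return the callee names in sorted order.
    server.register("Rigi", "show-calls", lambda callees: sorted(callees))
    print(server.request("Rigi", "show-calls", ["Function-175", "Function-174"]))
    print(server.request("Refine", "unknown", None))
```

In the architecture described here the payloads would be network objects
carrying s-expressions rather than Python lists.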
Tool integration statistics are discussed next, in particular the
relationship between source code size, total number of objects generated
in the repository, data retrieval performance, and upload and download
times.
LOC       # of Functions   # of Files   # of Objects
44,754    658              46           3,340
32,807    705              40           1,694
27,393    632              63           1,606
13,615    235              39           1,089
943       38               3            920
TABLE 1.2. Storage Statistics (only File and Function object types stored)
[Plot: Upload Time (sec) versus Number of Objects.]
FIGURE 6. Upload AST Performance.
Our experiments for time and space statistics related to the integration
architecture involved four software systems. For each system we have
measured the number of objects generated for the reduced AST, as well as
the upload and download times from the repository to the individual tools.
The total number of objects generated for the reduced AST that correspond
to files, functions and declarations is illustrated in Table 1.2. These
measurements indicate that the approach of storing only the necessary
parts of the AST results in a large potential for scalability, as major
increases in the size of the source code do not dramatically affect the
total number of objects generated.
The upload times are illustrated in Fig. 6. These statistics indicate a
relatively linear relation between the upload time and the total number of
objects to be loaded in the repository. The total number of objects in this
experiment was obtained by allowing objects that represent statements to
be generated as well.
Similarly, the download statistics are shown in Fig. 7. These statistics
indicate that download time depends on the size of the objects retrieved
and not on the size of the repository. This is an important observation as
it is directly connected with the scalability of the system.
[Plot: Download Time (sec) versus Number of Objects.]
FIGURE 7. Download AST Performance.
The server has been extended to handle more complex messages and
respond automatically to events using a rule base and the Event,
Condition, Action paradigm [Mylo96].
A prototype system that allows for the integration of the CASE tools
Refine, Rigi, Telos, and data obtained from domain-model based Abstract
Syntax Trees has been implemented at the University of Waterloo and the
IBM Canada Center for Advanced Studies [Mamas00].
This architecture has also been used to facilitate control and data
integration between different tools and processes in a Cooperative
Information System Environment [Mylo96].
8 Summary
This paper examines the use of domain models for generating AST
representations of code fragments in order to facilitate environment
re-targetable representation of the source code, as well as data and
control integration between different CASE tools.
The system is re-targetable in the sense that:
• The ASTs generated by the technique discussed in this paper are
easily traversed, and can easily be ported to another tool or stored
in an Object Oriented Data Base or relational Data Base.
• The level of granularity at which the parsed source code is represented
can be easily modified by selectively applying the semantic actions to
specific grammar rules and by omitting them from the others. A grammar
with the semantic actions applied to every rule will create a full AST
for the whole source parsed. However, if the user decides to apply
the semantic actions selectively, then a "lightweight" partial AST can
be created. This type of "lightweight" AST is most useful for analysis
related to the recovery of high level architectural views of the code.
• The development and the specification of the grammar is greatly
simplified, and the approach can be used with most parser generators,
requiring minimal modifications to accommodate the semantic actions.
Domain models are used to infer the semantic information required by
the parsing process for transforming the parsed input into an annotated
AST, and provide a common schema for different tools to share. To make
such a common schema usable by different tools, we have adopted
language-independent and tool-independent representations for ASTs as well
as for the domain model itself. Such representations constitute a
foundation for portability. Such portability is of particular importance
as it allows different parsers to export the generated AST to CASE tools
that are very powerful at analyzing source code.
On-going work for this project includes the control and data integration
of different reverse engineering tools by using common data interchange
formats such as XML, RSF, TA, and ASTs, as well as enhancements for
the AST tree traversal and pattern matching utilities.
9 References
[Arango91] G. Arango and R. Prieto-Diaz. Domain Analysis Concepts and
Research Directions. In Domain Analysis and Software Systems
Modeling, eds. Ruben Prieto-Diaz and Guillermo Arango, IEEE
Computer Society Press, 1991.
[Aho86] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles,
Techniques, and Tools. Addison-Wesley Publishing Company, Reading,
Massachusetts, 1986.
[Buss94] E. Buss, R. De Mori, W. M. Gentleman, J. Henshaw, H. Johnson,
K. Kontogiannis, E. Merlo, H. A. Muller, J. Mylopoulos, S. Paul,
A. Prakash, M. Stanley, S. R. Tilley, J. Troster and K. Wong.
Investigating Reverse Engineering Technologies for the CAS Program
Understanding Project. IBM Systems Journal, vol. 33, no. 3, 1994.
[Chiko90] Elliot J. Chikofsky and James H. Cross. Reverse Engineering
and Design Recovery: A Taxonomy. IEEE Software, January 1990.
[Datrix00] Bell Canada Datrix Group. Abstract Semantic Graph: Reference
Manual. Bell Canada Ltd., Version 1.3, January 2000.
[Devambu94] P. Devambu, D. Rosenblum, A. Wolf. Automated construction
of testing and analysis tools. In Proceedings of the 16th Int'l Conf.
Software Eng., Los Alamitos, Calif., IEEE CS Press, May 1994.
[Devambu92] P. Devambu. Genoa - a language and front-end independent
source code analyzer generator. In Proceedings of the 14th Int'l Conf.
Software Eng., Los Alamitos, Calif., IEEE CS Press, May 1992.
[Debaud95] DeBaud, S. Rugaber. A Software Re-Engineering Method
using Domain Models. Technical Report, College of Computing,
Georgia Institute of Technology, Atlanta, Georgia 30332-0280, 1995.
[Finni97] P. Finnigan et al. The Software Bookshelf. IBM Systems
Journal, vol. 36, no. 4, September 1997.
[Holt97] R. Holt. An Introduction to TA: the Tuple-Attribute Language.
http://plg.uwaterloo.ca/~holt/cv/papers.html. Dept. of Computer
Science, University of Waterloo, March 1997.
[JavaCC] Sun Microsystems. JavaCC: The Parser Generator.
http://www.metamamta.com/JavaCC. Sun Microsystems, May 2000.
[Konto93] K. Kontogiannis. Program Representation and Behavioural
Matching for Localizing Similar Code Fragments. In Proceedings of
CASCON'93, Toronto, Ontario, Canada, October 1993.
[Kotik89] G. Kotik, L. Markosian. Automating Software Analysis and
Testing Using a Program Transformation System. Reasoning Systems
Inc., Palo Alto, CA, 1989.
[Mamas00] E. Mamas. Design and Development of a Code Base Management
System. M.Sc. Thesis, Dept. of Electrical & Computer Engineering,
University of Waterloo, Waterloo, Canada, June 2000.
[Marko94] L. Markosian. Using an Enabling Technology to Reengineer
Legacy Systems. Communications of the ACM, vol. 37, no. 5, May 1994.
[Muller93] H. Muller. Understanding Software Systems Using Reverse
Engineering Technology Perspectives from the Rigi Project. In
Proceedings of CASCON'93, Toronto, ON, pp. 217-226, 22-24 October
1993.
[Mylo90] J. Mylopoulos, A. Borgida, M. Jarke, M. Koubarakis. Telos:
Representing Knowledge About Information Systems. ACM Transactions
on Information Systems (TOIS), vol. 8, no. 4, October 1990.
[Mylo96] J. Mylopoulos, A. Gal, K. Kontogiannis, M. Stanley. A Generic
Integration Architecture for Cooperative Information Systems. In
Proceedings of Co-operative Information Systems'96, Brussels,
Belgium, 1996.
[Par96] T. Parr. Language Translation Using PCCTS & C++ - A Reference
Guide. Automata Publishing Company, 1996.
[Purti89] J. Purtilo, J. Callahan. Parse-Tree Annotations. Communications
of the ACM, vol. 32, no. 12, December 1989.
[Sta97] S. Stanchfield. A PCCTS Tutorial.
http://www.scruz.net/~thetick/pcctstut/. August 1997.
[Stallings91] W. Stallings. Data and Computer Communications. MacMillan
Inc., Toronto, 1991.
[Reas96a] Reasoning Systems, Inc. Dialect User's Guide. Version 2.0. CA,
U.S.A., July 1990.
[Reas96b] Reasoning Systems, Inc. C-Programmer's Guide. Version 1.2.
CA, U.S.A., July 1990.
[Till96] S. Tilley, D. Smith. Coming Attractions in Program
Understanding. Technical Report CMU/SEI-96-TR-019, ESC-TR-96-019,
December 1996.
[VACpp] IBM Corporation. Visual Age C++.
http://www.ibm.com/software/ad/vacpp. September 1999.