Databases for Software Engineering
Environments
|
The Goal has not yet b een attained
1 1 2
Wolfgang Emmerich , Wilhelm Schafer and Jim Welsh
1
University of Dortmund, Informatik 10
D-44221 Dortmund, Germany
2
University of Queensland, Dept. of Computer Science,
Queensland 4072, Australia
Abstract. We argue that, despite a substantial numb er of prop osed and
existing new database systems, a suitable database system for software
developmentenvironments and esp ecially pro cess-centred environments
do es not yet exist. Wedosoby reviewing and re ning the requirements
for such systems in detail based on a numb er of examples. We then sketch
anumberofavailable and archetypical database systems and indicate
why they do not meet these requirements.
1 Intro duction
Software developmentenvironments SDEs include to ols which supp ort most of
the software life-cycle phases, i.e. construction and analysis of the corresp ond-
ing do cuments and do cumentinterdep endencies. Sophisticated integrated envi-
ronments enable the incremental, intertwined and syntax-directed development
and maintenance of these do cuments such that errors are easily traced back
through di erent do cuments and necessary changes are propagated across do c-
ument b oundaries to correct the errors c.f., for example, [11,9, 4]. Suchenvi-
3
ronments should provide multi-user supp ort , i.e. they should have exible and
adaptable mechanisms often called "design transactions" to control access by
anumb er of users to shared information. The construction of suchenvironments
and corresp onding advanced design transaction mechanisms is an area of active
research c.f., for example, [2, 20, 22]. We call this latter kind of environmenta
pro cess-centred environment PSDE, b ecause the provided multi-user supp ort
is or rather should b e based on a well-de ned development pro cess, i.e. the def-
inition of the users' resp onsibilities, their corresp onding access rights, and the
schedule of the activities to b e carried out by them.
For any kind of environment, a large numb er of ob jects and corresp onding
relations on very di erent levels of granularityhave to b e stored and retrieved,
3
As the p eople normally called software develop ers are the users of an SDE and this
pap er discusses only SDEs, we use the term user instead of develop er hereafter.
and, in case of a pro cess-centred environment, these ob jects must b e manipu-
lated under the control of an advanced transaction mechanism. An underlying
database management system is thusakey comp onent of a PSDE and if not
chosen carefully, could b ecome a ma jor p erformance b ottleneck when the PSDE
is in use.
As early as 1987, Bernstein has argued that dedicated database systems for
software engineering, sp ecialised with resp ect to functionality and implemen-
tation, are necessary [3]. He, and others [22] argued that the functionality and
eciency of existing systems in particular, relational systems do not adequately
supp ort the construction of software engineering to ols and environments.Anum-
b er of systems, some of which di er radically from standard relational technol-
ogy,have since b een describ ed in the literature and some are nowavailable as
commercial pro ducts.
In this pap er we argue that, despite the substantial numb er of these new
database systems, a suitable database system for SDEs, and esp ecially PSDEs,
do es not yet exist. We are aware of the fact that we put very stringent require-
ments on those database systems as we require that the systems 1 allowto
eciently manipulate abstract syntax graph representations of do cuments and
2 provide advanced transaction mechanisms on those graphs to enable sophisti-
cated multi-user supp ort c.f. section 3. Many existing commercial environments
or to ols resp. [13] and even PSDEs [15,22] develop ed as research prototyp es do
not yet have those stringent requirements, b ecause they handle do cuments as
monolithic blo cks. This, however, reduces signi cantly the p ossibilityofcheck-
ing and preserving inter-do cument consistency and thus provides no adequate
supp ort for incremental, intertwined development and maintenance of software
do cuments. Weeven b elieve that partly due to missing appropriate database sys-
tems, the currently available environments lack appropriate functionality with
resp ect to supp ort of evolutionary software development. Thus, this pap er adds
an imp ortant viewp oint to the discussion ab out dedicated software engineering
databases, b ecause the other pap ers known to us have either never expressed
such a stringent requirements list like Bernstein or have never discussed them
in full detail like [16].
This pap er reviews and re nes, in sections 2 and 3, Bernstein's requirements
based on our own exp eriences in building environments and to ols. Section 4 then
brie y reviews a number of available database systems, and sketches that these
requirements are not met by them. It nally,sketches the ongoing work to remedy
the situation.
2 Pro cess centred Software DevelopmentEnvironments
Architecturally, a pro cess centred software developmentenvironment consists of
the following main comp onents:
{ a pro cess engine that co ordinates the work of develop ers involved in a pro ject,
{ a set of integrated, syntax-directed to ols that allow develop ers to conve-
niently manipulate and analyse do cuments and to maintain consistency b e-
tween related do cuments of di erenttyp es, and
{ an underlying database for software engineering DBSE which is capable of
storing pro ject information and do cuments.
2.1 The Pro cess Engine
The pro cess engine executes a formal description of a software pro cess in order
to co ordinate the work of the users involved in a pro ject. In more detail this
covers the following issues.
Multi-User support The pro cess engine determines for each user participating in
a pro ject a p ersonal agenda that indicates on which do cuments he or she may
p erform particular actions. The contents of the agenda dep end on
{ the user's resp onsibilities in the pro ject and
{ the current state of the pro ject.
The invo cation of to ols that enable a user to p erform the actions contained in
an agenda is controlled by the pro cess engine via the agenda. In simple terms,
the agenda acts as a menu from which the user selects his or her next activities.
In most pro jects a numb er of users work in parallel on di erent parts of the
overall pro ject activity. This means that changes to a user's agenda are not only
necessary due to his or her own actions. They are also necessary when some other
user changes the state of a do cument on which the rst user's agenda dep ends.
As each user usually works at a p ersonal workstation, the agendas must b e
presented and up dated in a distributed fashion. Likewise, the to ols that are
called via the agenda have to access do cuments in a distributed fashion.
To ensure orderly use of do cuments in parallel development activities, the pro cess
engine must also b e able to create versions of do cuments, or sets of do cuments,
to retrieve a particular version and to merge version branches of do cuments.
Eciency The state of the pro ject changes, when a new do cumentisintro duced
or an existing do cument is deleted, when a do cument is declared to dep end on
some other do cument, when a do cument b ecomes complete or when it b ecomes
incomplete due to a change in some other do cument it dep ends on. Although
this list is incomplete, it already indicates that changes to pro ject states o ccur
frequently. All of them cause a recomputation of the agenda. As a user cannot
p erform any tasks while his or her agenda is b eing computed, the computation
must b e done eciently.
Persistence and Integrity The pro cess engine may need to b e stopp ed from time
to time. Therefore it must b e able to store the state of the pro ject p ersistently in
order to prepare a restart. Even if it is stopp ed accidentally e.g. by a hardware
or software failure, it must resume with a consistent pro ject state. Moreover,
such a failure must not result in a signi cant loss of pro ject information. Thus
we require that the pro cess engine preserves integrityofany pro ject information,
i.e. it ensures that continuation of any op eration is p ossible after any failure.
Change It has b een widely recognised that software pro cesses can not completely
b e de ned in advance [17,19]. The pro cess b eing executed needs to b e changed
\on the y" from time to time. For example, new users may participate in the
pro ject, resp onsibilities may b e rede ned, new typ es of do cuments may b e in-
tro duced or new to ols for manipulation of these do cuments may need to b e
integrated.
Reasoning capabilities As it might not always b e clear to a user why a particular
do cument is to undergo a particular action, the pro cess engine should have the
capability to explain this to the user. For instance, the pro cess engine should b e
able to answer questions from a user like \What is the state of the sp eci cation of
mo dule m1?", \Who else is involved in the pro ject?" or \What happ ens if I now
co de mo dule m1?". The latter example indicates that the reasoning capabilities
do not only explain how a particular pro ject state was reached, i.e. the past
as in classical exp ert system reasoning but that they also give insights ab out
p ossible future consequences of a particular action.
2.2 Highly integrated syntax-directed to ols
Syntax-direction The to ols contained in a PSDE are used by users to edit, anal-
yse and transform do cuments. To giveasmuch supp ort to users as p ossible, the
to ols should b e directed towards the syntax of the languages in which do cuments
are written. In particular, they should reduce the rate at which errors concerning
the context-free syntax are intro duced to do cuments.
Consistency preservation Besides dealing with errors concerning the context-free
syntax, to ols should also deal with errors concerning b oth the internal static
semantics of do cuments and inter-do cument consistency constraints. The to ols
may allow temp orary inconsistencies to b e created b oth during input and as the
result of edit op erations, since do cument creation in a way whichavoids such
temp orary inconsistencies is impractical in many cases.
To ols should also supp ort users in removing inconsistencies. In particular, follow-
on inconsistencies such as use of a non-existing imp ort, could b e removed on
demand bychange propagation, such as propagating the change that rede nes
the imp ort to all places where it is used.
Finally, to ols must allow the user to de ne additional semantic relationships
during the course of a pro ject. For instance, dep endencies b etween the source-
co de, the test plans and the technical do cumentation on the level of identi er
names and corresp onding section titles in the technical do cumentation may
only b e de ned in this way.
Persistence and Integrity Obviously, do cuments must b e stored p ersistently b e-
cause they must survive editing pro cesses. Moreover, users require to ols to op-
erate as safely as p ossible, i.e. in case of a hardware or software failure they
exp ect that the integrity of do cuments their immediate usabilityby the same
or other to ols will b e preserved and that signi cant user e ort will not b e lost.
Thus, we require the p ersistence to b e achieved as follows: An interactive session
with a to ol of the PSDE consists of a sequence of user-actions likechanging the
typ e-identi er exp orted in a mo dule or adding a parameter to a pro cedure. We
require from to ols that a user-action is p ersistent if and only if it is completed.
Moreover, user-actions must b e designed in a way that the integrity of a do cu-
ment is guaranteed whenever a user-action is completed. In case of a hardware
or software failure we require that the to ols should recover to the last completed
user-action.
Backtracking When the consequences of user actions are p ersistent, users must
have the ability to backtrack or \undo" such actions when mistakes are made.
Thus, in addition to the do cumentversions used for management purp oses in a
multi-user pro ject, users maywant to store intermediate revisions of a do cument
so that they can revert to a previous revision if their subsequent mo di cations
turn out to b e ill-chosen.
Eciency Very fast typists can typ e ab out 300 characters p er minute. In this
case, the time b etween successivekeystrokes is ab out 200 milliseconds. Thus any
resp onse time of an editor b elow 200 milliseconds is non-critical, as users are
never going to recognise them as delays.
Unlike secretaries, users do not typ e continuously. They frequently pause to
think ab out what to do next. These thinking p erio ds break the basic sequence
of user transactions into a higher-level sequence of task-oriented transactions. If
the non-trivial pro cessing that a to ol carries out is aligned with these natural
breaks in user interaction, much higher resp onse times may b e acceptable.
In other circumstances, users may accept much higher resp onse times, if they
o ccur less frequently and can b e justi ed by the complexity of the task con-
cerned [24]. No user, for instance, would reject using a compiler just b ecause it
needs more than a second to compile a source.
3 Requirements for DBSEs
3.1 Persistent Do cument Representation
What is a document in the database? The common internal representation for
syntax-directed to ols suchassyntax-directed editors, analysers, pretty-printers
and compilers is a syntax-tree of some form. In practice, this abstract syntax-tree
representation of do cuments is frequently generalised to an abstract syntax-graph
representation for reasons such as ecient execution of do cuments, consistency
preservation by to ols, and user-de ned relations within do cuments.
PROCEDURE InitWindowManagery:
VAR x:
BEGIN
WHILE
x:=Expression;
END
END InitWindowManager;
Name=’InitWindowManager’ Proc ToIdent Decl Decl Ident Name=’y’ ToPar ToFirst List Param CBV ToName List ToLast Param Ident
ToType Ident Name=’x’ ToFirst ToFirst ToDecl Decl VarDecl ToName List ToLast List ToLast VarDecl Ident
ToType Ident ToFirst ToStat ToCond Stat While Bool ToObject List ToLast Stat Expr
AllInitialiseProcs ContIf True Stat ToBody List ContIn ToFirst Name=’x’ Proc AssStat ToLHS Decl Ident
ToRHS Abstract Syntax Tree Expr ContIfFalse ToNext User−defined Relations ContIn Context−sensitive Relations ToLast Control−flow Relations ContIn AssStat
...
Fig. 1. Mo dula-2 external textual and internal graph representation
As an example of a small excerpt from an internal Mo dula-2 program repre-
sentation c.f. Fig. 1 which will b e used to explain further details in the following.
Consistency checking on a do cument can b e done by attribute evaluations
along parent/child paths in the do cument's attributed abstract syntax tree. The
evaluation paths are computed at generation-time based on attribute dep enden-
cies. This approach, however, has proven to b e quite inecienteven for static
semantic checks of a single do cument, b ecause of the long path-lengths involved.
Techniques based on the intro duction of additional, non-syntactic paths for more
direct attribute propagation have b een develop ed [14,12]. Such non-syntactic
paths are examples of context-sensitive relationships which connect syntacti-
cally disjoint parts of a do cument, and are used in b oth consistency checking
and change propagation when the do cumentischanged c.f. the " " edge
which connects declaration and use of an identi er in Fig. 1.
Most PSDEs o er test and execution of software do cuments byinterpretation of
the do cument syntax-tree. Suchinterpretation can b e achieved purely in terms of
an attributed syntax-tree itself but is very inecient. For executable do cuments,
therefore, the abstract syntax-tree representing the do cument is commonly en-
hanced by additional edges that indicate the control ow in order to enable more
ecientinterpretation c.f. the " " edges exemplifying the control owof
a while statement in Fig. 1.
As noted in section 2.2, the user may also intro duce additional user-de ned
relations b etween do cument parts for purp oses of do cumentation, traceability,
etc.. All such relationships must also b e seen as edges in the abstract syntax
graph that represents the do cument during manipulation, as illustrated by the
" " edge lab elled Al lInitialiseProcs which connects all pro cedures which
implement create/initialise op erations.
How is it stored? Due to the requirements of p ersistence and integrity, a p er-
sistent representation of each do cument under manipulation must b e up dated
as each user-action is nished. Typically a user-action a ects only a very small
p ortion of the do cument concerned, if any. Given that the representation under
manipulation is an abstract syntax-graph, however, the up date can easily b e-
come inecient if, rstly, a complex-transformation b etween the graph and its
p ersistent representation is required and, secondly, the p ersistent representation
is such that large parts of it have to b e rewritten each time, although not b eing
mo di ed. This would for instance b e the case, if we had chosen to store the
graph in a sequential op erating system le which is up dated at the end of each
user-action.
Such ineciency can b e avoided completely if the p ersistent representation takes
the form of an abstract syntax graph itself, with comp onents and up date op er-
ations that are one-to-one with those required by the to ols concerned. To allow
this approach, therefore, the DBSE must supp ort the de nition, access and incre-
mental up date of a graph structure of no des and edges with asso ciated lab elling
information. To preserve the integrity of the abstract syntax-graph, the DBSE
must supp ort atomic transactions, i.e., a transaction mechanism that allows us
to group a sequence of up date-op erations such that they are either p erformed
completely or not p erformed at all. To ensure that a to ol can recover in case of
a failure to the state of the last completed user-action, each completed DBSE
transaction must b e durable.
Inter-document relationships Context-sensitive relations and user-de ned rela-
tions b etween do cument comp onents, as discussed ab ove, contribute to the need
for an abstract syntax-graph representation of a software do cument. In practice,
however, such relations are not con ned to within individual do cuments { they
frequently exist b etween comp onents of distinct do cuments. As an example c.f. Fig. 2. ToSection Value=’WindowManager’ List Section ToFirst ToTitle Documentation List Section Title ToParagraph Value=’The Module ...’ List Paragraph ToFirst ToNext Paragraph ... List ToDoc ToLast ToNext Documentation Subgraph ......
Name=’WindowManager’ Function ToIdent Decl Module Ident Name= ’InitWindowManager’ ToExport Operation ToFirst Proc ToIdent Decl List Decl Ident ToParList Par ToFirst ToImport Ident ToFirst List ... List ... Module Design Subgraph
Name=’WindowManager’ Function ToIdent Decl Module Ident ToImpl
ToImpl Name= ToImpl ’InitWindowManager’ ToProcs Operation ToFirst Proc ToIdent Decl List Decl Ident ToParList Par ToFirst ToImport ... List ...... ToDecl ... ToNext ToStat ... Name=’ComputePosition’ Proc ToIdent Decl Decl Ident ToParList Par ToFirst List ...
ToNext ToDecl ...
Module Implementation Subgraph ToStat ...
Fig. 2. Internal representation of Inter- and Intra-do cument Relations
The gure illustrates the connection of the previously shown source-co de
graph Fig. 1 with the corresp onding parts of a design do cument and the tech-
nical do cumentation resp ectively.Thus a mo dule de nition in the design do cu-
ment has inter-do cument relationships with the implementation of that mo dule
in the corresp onding implementation do cument. Likewise, a paragraph in the
technical do cumentation may b e linked to the corresp onding formal mo dule
de nition in the design do cument, for traceability reasons.
To handle these inter-do cument relationships in a consistentway, the obvious
strategy is to view the set of do cuments making up a pro ject as a single pro ject-
wide graph. This approach, however, immediately reinforces our concern that
the cost of up dating its p ersistent representation should b e indep endent of the
overall size of the graph concerned, and requires that the DBSE must avoid
imp osing limits on the overall size of the graphs it can handle.
We note that this generalisation to a single pro ject-wide graph do es not nec-
essarily undermine the concept of a do cument as a distinguishable representa-
tion comp onent. If we distinguish b etween aggregation edges in the graph which
express syntactic relationships, and reference edges, which arise from control,
context-sensitive or user-de ned relationships, then a do cument of the pro ject is
a subgraph whose no de-set is the closure of no des reachable by aggregation edges
from a do cument no de i.e., a no de not itself reachable in this way, together
4
with all edges internal to the set . The edges not included in this way are then
necessarily the inter-do cument relationships inherent in the pro ject.
3.2 Data De nition Language/Data Manipulation Language
DDL/DML
The kinds of no des and edges required to represent a pro ject, and the attribute
information asso ciated with each, cannot b e determined by the DBSE itself. It
should b e de ned and controlled, however, by the DBSE in order to have dif-
ferent to ols sharing a well-de ned pro ject-graph. The overall structure of the
pro ject's syntax-graph should therefore b e de ned in terms of the data de ni-
tion language of the DBSE and b e established and controlled by the DBSE's
conceptual schema.
As a minimum,we require that the data de nition language can express the
di erentnodetyp es that o ccur within the graph, that it can express which edge-
typ es may start from no de typ es and to whichnodetyp es they may lead, and
that it can express which attributes are attached to no de typ es. Such basic re-
quirements are common to any graph storage.
In practice, the data de nition language should b e tailored to the syntax graphs
that the DBSE is used to store. Structures that o ccur often in syntax-graphs
are lists, sets and dictionaries of no des that contain no des of p ossibly di erent
typ es. The data de nition language should therefore o er means to express these
common aggregations as conveniently as p ossible.
As argued previously,changes to the internal syntax-graph should b ecome
incrementally p ersistent. Therefore, edit op erations p erformed by to ols on do c-
uments have to b e implemented in terms of op erations mo difying the inter-
nal syntax-graph. These op erations should b e established as part of the DBSE
schema for mainly two reasons:
Encapsulation The structure de nition of the pro ject-graph should b e encap-
sulated with op erations which preserve the graph's integrity. They then pro-
vide a well-de ned interface for accessing and mo difying the graph. In order
to enforce usage of this interface, the op erations must b ecome part of the
DBSE schema.
Performance Executing graph accessing and mo difying op erations within the
DBSE is more ecient than executing similar op erations within to ols as the
numb er of no des and edges that need to b e transferred from the DBSE to
to ols via some network communication facility is reduced signi cantly.
4
What we call a subgraph here, is comparable to the notion of a composite entity in
PACT VMCS cf. [23]
To establish graph-mo difying op erations as part of the DBSE schema, the DML
must be powerful enough to express them. This means in particular, that the
DBSE's DML must b e capable of expressing creation and deletion of no des and
edges as well as assignment of attribute values. Moreover, the DML must b e
computationally complete, as alternatives and iterations are needed in graph-
mo difying op erations for navigation purp oses.
In addition, we noted in subsection 2.1 that the user may wish to query the
pro ject state via a reasoning comp onent in the pro cess engine. In practice, such
queries may also arise from within the pro cess engine itself, and from the pro cess
engineer who maintains the pro cess governing any pro ject.
The pro cess engine needs to query the pro ject's current syntax graph in order
to extract information ab out the states of do cuments on which the engine must
base its decisions. An example for such a query could b e: select all program
mo dules from the set of mo dules for which programmer p is resp onsible that
1
are incomplete but whose sp eci cations are complete.
The pro cess engineer maywant to query the internal pro ject graph in order to
determine the pro ject state.
The queries to b e answered by the reasoning comp onent are not known in ad-
vance, they have to b e formulated in a pro cess-related query language and must
b e translated by the reasoning comp onentinto a DBSE query. Likewise the
queries the pro cess engineer will use are not known a priori, i.e. at the time the
PDSE is b eing built, so they cannot b e precompiled. Thus, the DBSE must o er
ad-ho c query facilities to b e used by the pro cess engine, and indirectly by the
user and the pro cess engineer.
3.3 Schema Up dates
In section 2.1 we noted that the pro cess b eing executed mayhavetobechanged
\on the y". The DBSE must therefore enable the de nition of new typ es of
no des and edges that are to b e included in the internal syntax graph. Existing
no des and edges must not b e a ected by this kind of schema up date.
Moreover, it is necessary that new typ es of edges can b e added to existing typ es
of no des to allow the integration of no des of newly de ned typ es into an existing
syntax graph. This implies that existing no des whose typ es are mo di ed by such
aschema up date can migrate from the old typ e to the mo di ed one.
Consider as an example, the intro duction of a new quality assurance pro ce-
dure to b e applied to all future do cuments as well as existing ones. The new
pro cedure may require that the p erson who reviews a do cument writes a review
rep ort that is attached to the do cument. Furthermore his or her name is to b e
recorded as an attribute of the reviewed do cument. In this situation, the schema
of the internal syntax graph needs to b e mo di ed: A new do cumenttyp e review
report must b e included and each reviewable do cument needs to have an addi-
tional no de to record the do cument's reviewer and an additional edge to express its relationship to the review rep ort.
3.4 Revisions and Versions
Given the overall representation of a pro ject de ned in section 3.1, the DBSE
must supp ort the creation and managementofversions of those subgraphs that
representversionable do cument sets. In particular, it must enable its clients to
derive a new version of a given subgraph, to maintain a version history for a sub-
graph, to removea version of a given subgraph, and to select a currentversion
from the version history. In doing so it must resolvebetween alternative dupli-
cation strategies, b oth within versions and for extra-version relations, preferably
in a de nable way.
Within versioned subgraphs the DBSE must resolvebetween fully lazy, fully
eager and hybrid duplication strategies for the no des and aggregation edges of
the subgraph concerned. Fully lazy duplication gives maximum sharing of com-
p onents, and hence minimum storage utilisation, but complicates the up date
pro cess during user edits. Fully eager duplication avoids all such complications,
but implies maximum storage utilisation. To meet the needs of all PSDE func-
tions, a de nable hybrid strategy b etween these twomayhave to b e supp orted
by the DBSE.
Documentation Subgraph
ToSection Value=’WindowManager’ List Section ToFirst ToTitle Documentation List Section Title ToParagraph Value=’The Module ...’ ToNext List Paragraph ToFirst Paragraph ... List ToNext ToDoc ToLast
ToDoc ......
Module Design Subgraph
Name=’WindowManager’ Vers 1.1 Function ToIdent Decl Module Ident ToPredVersion Name= ’InitWindowManager’ ToExport Operation ToFirst Proc ToIdent Decl List Decl Ident ToParList Par ToImport Ident ToFirst List List ... Name=’WindowManager’ Vers 1.0 Function ToIdent Decl Module Ident
ToExport Operation List
ToImport Ident ToFirst version−duplicated List ... version−specific
version−transparent
Fig. 3. Logical View of a Do cument under Version Control
The DBSE must also resolvebetween alternative strategies for handling b oth
intra- and inter-do cument relationships or reference edges. Within a do cument,
reference edges will normally b e treated like aggregation edges as version-
duplicated i.e., a new edge is automatically created for eachversion created.
In particular circumstances, however, such edges may b e seen as version-speci c
and hence not duplicated when new versions are created. Relations b etween
do cuments within a versioned do cument set i.e. within a con guration are
treated similarly, i.e., they may b e either version-duplicated or version-sp eci c.
Relations b etween a versioned do cument and a do cument outside the versioned
set are also sub ject to the same basic choice, but version-duplication in this leads
to a state whichwe call version-transparent since the same relation necessarily
holds b etween all versions within the versioned set and the same comp onentof
the unversioned do cument. Again a de nable hybrid strategy may b e necessary
for e ective PSDE supp ort.
Fig. 3 depicts the logical view of the mo dule design graph b eing under ver-
sion control. An additional version-sp eci c edge ToPredVers is intro duced that
makes the version history explicit by leading for each do cument no de to the do c-
ument no de of the predecessor version. An example of a version-transparent edge
is the ToDoc edge connecting a function mo dule no de with the corresp onding
section title no de in the do cumentation subgraph.
3.5 Access Rights and Adjustable Transaction Mechanisms
Requirements on the DBSE functionality which stem from the multi-user supp ort
of PSDEs, are the de nition of access rights for particular do cuments and parts
thereof and the de nition of a variety of transaction mechanisms to control and
enable parallel access to shared information bymultiple users.
In more detail, to de ne access rights, the DBSE must b e able to identify
users and arbitrary many probably nested user groups. Secondly, the DBSE
must allow to de ne and mo dify the ownership of subgraphs which could even
b e a single no de at any time. Similarly, a subgraph may b e accessed by several
groups. Thus a subgraph needs to maintain its group memb erships. Thirdly, the
DBSE must allow to de ne and mo dify access rights for a subgraph individually
for its owners and groups at any time. Finally, the DBSE must enforce that the
5
de ned access rights are resp ected by all its users .
Conventional transaction mechanisms which control parallel up dates of the
database have b een proven to b e to o restrictive to b e used in PSDEs b ecause
they could result in a rollback which deletes the e ect of a p ossibly long-lasting
human development e ort, or they could blo ck the execution of a certain activity
for days or even weeks. Both situations are untolerable.
Consequently, advanced transaction mechanisms such as split/join transac-
tions [21] or co op erating transactions [18]have b een develop ed. Their common
5
The kind of discretionary access control required here is similar to that provided
by mo dern op erating systems like e.g. UNIX. It di ers in that a subgraph can have
more than one group that may access it. Thus, we can express access rights in a more
exible way without intro ducing arti cial user groups.
characteristics is that they relax one or more prop erties of conventional trans-
action mechanisms which are atomicity, consistency preservation, isolation and
durability.For a detailed overview and critical evaluation of those mechanisms
we refer to [2].
We argue, however, that none of them is p owerful enough to b e incorp orated
as the transaction mechanism into the database of a PSDE. As already argued
in [20], only the pro cess engine which knows the current state of an ongoing
pro ject, can decide whether and when to request a lo ck for a particular subgraph
and how to react, if the lo ck is not granted. It also de nes whether a transaction
is executed in isolation or in a non-serialisable mo de.
The requirements for the transaction mechanisms of a DBSE are that it
o ers the p ossibility to de ne and invoke either a serialisable transaction or a
non-serialisable transaction which only guarantees atomicity and durability.
Atomicity and durability are needed to preserve the integrity of the pro ject
graph against hard/ or software failures as required in subsection 2.1 and 2.2
resp. Serialisability is needed, for example, when pro ject state information and
corresp onding user agendas are up dated c.f. subsection 2.1. This information
must not b e invalidated by parallel up dates as that could hinder the pro cess
engine from continuing its work or let the user p erform unnecessary work. For-
tunately, these up dates are relatively short and do not involveanyhuman inter-
action. Moreover, computation of a users' agenda only incorp orates read access
to pro ject's state information such that it may b e p erformed in an optimistic
mo de.
An example of a situation where transactions are either executed in a serial-
isable or non-serialisable mo de dep ending on the pro ject state is the following.
During the very rst development of a do cument, editing the do cument could
and should b e done in isolation until the do cument has evolved into a certain
mature state, mayb e a state where the do cument is released. During mainte-
nance phase, a released do cument should b e always consistent in itself as well as
with resp ect to other do cuments. Thus a transaction whichdoeschange propa-
gations due to error corrections should b e executed immediately,even if a ected
do cuments are currently accessed by other transactions.
3.6 Distribution
Distribution of the pro ject activities over a numb er of single-user workstations
can b e achieved in twoways. The rst is to allow for a distributed access from
the users' client workstations to a DBSE server. The second waywould b e
to distribute transparently the syntax graph itself over various DBSEs that are
lo cally accessible from user's workstations.
With the rst approach, the server would surely b ecome a p erformance b ot-
tleneck for the whole PSDE. Hence, this approach seems feasible only for small
pro jects say less then 10 users. It is, however, worth consideration, as many
pro jects are either small pro jects or can b e split into fairly indep endent sub- pro jects that are small enough.
With the second approach, the pro cess engine can arrange that those parts
of the internal syntax graph that represent a particular do cument are lo cally
6
accessible from the workstation of the resp onsible user . The to ols that op erate
on the syntax graph, however, should not need to knowanything ab out the phys-
ical distribution of the syntax graph, i.e. the distribution must b e transparent
for them. It should rather b e the resp onsibility of the DBSE to manage physical
distribution.
3.7 DBSE Administration
As the contents of the DBSE might b e the most imp ortant capital of software
houses, issues of data security are very imp ortant. This involves two asp ects:
access to the database and the p ossibility of hardware crash recoveries.
The DBSE must b e able to restrict the access to its ob jects such that non-
authorised p ersons are excluded from any access. This means that the DBSE
must b e able to identify its users. Additionally, the DBSE must enforce authen-
tication e.g. by assigning passwords to DBSE users such that it can assure that
p ersons corresp ond to DBSE users.
For purp oses of user management, the DBSE has to o er means to b e used by
a database administrator DBA in order to enter new users and groups to the
DBSE, to change user informations like passwords, to remove users from the
set of known users and groups from the set of known groups, and to change
memb ership to groups.
Moreover, data stored in the DBSE must b e protected from any hardware
failures such as disk crashes. Therefore the DBSE has to o er means for dumping
the contents of the DBSE to backup media like e.g. tap es. As the size of a pro ject
database may b e to o large to b e completely backed up daily, the DBSE must
allow for incremental backups. The DBA should not havetoshut down the
database in order to p erform these incremental backups.
3.8 Views
As prop osed in subsection 3.1, the pro ject graph may contain a lot of redundant
information { in Fig. 2 the no des Function, Module, DeclIdent, OperationList and
the edges ToIdent and ToExport are duplicated in the design and implementa-
tion subgraphs. Eliminating this duplication by sharing the aggregation subtrees
concerned has the following advantages: 1 the conceptual schema is simpli ed
2 storage of the schema and corresp onding data requires less space, and 3
consistency preservation esp ecially across do cument b oundaries b ecomes much
easier. Such sharing cannot b e contemplated, of course, if automatic consistency
preservation b etween do cuments is inappropriate.
If subtree sharing is to b e used, to ols accessing the pro ject graph need a view
mechanism like that o ered in many relational database systems, to maintain
6
Though the do cument itself is lo cally accessible, there may still b e inter-do cument
relationships in the internal syntax graph, that lead to remote no des. Function ToIdent Decl Name=’WindowManager’ Design View Module Ident Name= ’InitWindowManager’ ToExport Operation ToFirst Proc ToIdent Decl List Decl Ident Name=’y’ Par ToFirst ToName ToImport ToParList CBV Ident ... List Par ToType Ident
Function ToIdent Decl Name=’WindowManager’ Implementation View Module Ident Name= ’InitWindowManager’ ToProcs Operation ToFirst Proc ToIdent Decl List Decl Ident Name=’y’ Par ToFirst ToImport ToParList CBV ToName Ident ... List Par ToNext ToDecl ... ToType ToStat ... Ident Name=’ComputePosition’ ToLast Proc ToIdent Decl Decl Ident Name=’Pos’ ToParList Par ToFirst CBR ToName List Par Ident Name=’Position’ ToDecl ... ToType ToStat ... Ident
Function ToIdent Decl Name=’WindowManager’ Conceptual Graph Module Ident Name= ToFirst ’InitWindowManager’ ToExportOps Operation Proc ToIdent Decl List Decl Ident Name=’y’ ToParList Par ToFirst CBV ToName List Par Ident ToDecl ... ToType ToStat ... Ident Name=’ComputePosition’ ToHiddenOps Operation ToFirst Proc ToIdent Decl List Decl Ident Name=’Pos’ ToParList Par ToFirst CBR ToName List Par Ident ToImport ToDecl Name=’Position’ ...... ToType
ToStat ... Ident
Fig. 4. Reducing Redundant Information with Views
appropriate separation of to ol concerns and so allow separate to ol development
and maintenance.
Fig. 4 sketches how the redundancy which exists in Fig. 2 is reduced in a new
schema. The schema will b e accessed by the design editor and the implementa-
tion editor through two views. The gure depicts the conceptual pro ject-graph
together with the design and implementation view of this graph.
The design view of a function mo dule no de is declared to hide the ToHiddenOps
edge de ned in the conceptual schema as these op erations shall not b e seen at
the mo dule interface level. The design view also renames the ToExportedOps
edge of a conceptual function mo dule no de to ToExport. Finally, the design view
hides the ToDecl and ToStat edges de ned for pro cedure declaration no des in
the conceptual schema from each pro cedure declaration no de in the design view.
Consequently, all no de typ es reachable only from these edges in the conceptual
schema are also hidden in the design view.
The implementation view of a function mo dule declares the ToProcs edge to
lead to a no de of a virtual no de typ e which is constructed by concatenation of
the lists accessed via the ToExportedOps and ToHiddenOps edges of a function
mo dule in the conceptual schema. This allows us to have exp orted and hidden
op erations merged in one op eration list of the implementation graph.
These to ol-oriented views must b e regarded as virtual structures since they
are not actually stored in the DBSE, but up dates on views by to ols must prop-
agate automatically to the underlying pro ject graph.
Intro duction of a view mechanism, however, has non-trivial consequences for
the access control mechanisms required since users see this access control as
applying to do cument views and for the versioning p olicy which the PDSE
designer or the users may adopt. Detailed resolution of these issues is largely a
matter of PSDE design and is thus b eyond the scop e of this pap er. We note,
however, that provision of such a view mechanism is clearly a requirement for
e ective PDSE design and implementation.
4 State-of-the-Art in DBMS Technology and Further
Work
Relational DBMSs are inappropriate for storing pro ject graphs, since 1 the
data mo del can not express syntax graphs appropriately, 2 RDBMSs do not
supp ort versioning of do cument subgraphs and 3 RDBMSs can not b e used to
implement customised transaction schemes.
No structurally ob ject-oriented DBMS meets all of our requirements. Those that
are capable of eciently managing pro ject graphs, lack functionality w.r.t. views,
versioning, access rights and adjustable transaction mechanisms, and distribu-
tion like GRAS [16]. Others that o er these functionalities are unable to manage
the large collections of small ob jects as they o ccur in pro ject graphs and are
therefore inappropriate like e.g. PCTE [10] or Damokles [6]. Moreover, none of
these systems enables encapsulation of no des with op erations, whichwe regard
to b e crucial.
For this reason we fo cus on o oDBMSs like GemStone [5] or O [1] in the future.
2
As general databases, they have not b een designed to manipulate a particu-
lar granularity of ob jects. They provide p owerful schema de nition languages
which particularly enable encapsulation, inheritance and p olymorphism. Some
have b een proven to p erform fast enough even while accessing do cument sub-
graphs from a remote host [7]. A more detailed investigation and reasoning ab out
the non-appropriateness of existing RDBMSs and structurally ob ject-oriented
DBMSs is done in [8].
Currently,we are p orting the Merlin PSDE to an o oDBMS and enhancing
Merlin with syntax-directed to ols which store their do cument subgraphs in this
o oDBMS. Merlin is a research pro ject c.f. [20, 19] at the University of Dort-
mund carried out in co op eration with STZ, a lo cal software house. One result
of Merlin is the prototyp e of a PSDE based on a rule based description of the
software pro cess
The requirements discussed in this pap er provide the foundation for the ESPRIT-
I I I pro ject Go o dStep General Ob ject-Oriented Database for SofTware Engi-
neering Pro cesses whose goal is to extend the o oDBMS O to make it partic-
2
ularly suitable as a DBSE. The pro ject will improveversion management, add
view de nition capabilities and break up the transaction management to enable
implementation of customised transaction schemes with the O system.
2
Acknowledgements
We are indebted to the memb ers of the Merlin pro ject, namely G. Junkerman,
C. Luc king, O. Neumann, B. Peuschel and S. Wolf, for a lot of fruitful discussions
ab out PSDE architectures. We thank the participants of the Go o dStep pro ject
for stimulating discussions, in particular, Prof. C. Ghezzi, Prof. R. Zicari, Dr.
S. Abiteb oul, Dr. P. Armenise, Dr. A. Co en. Last, but not least, we enjoyed
working with Prof. U. Kelter, Dr. S. Dewal, F. Buddrus, D. Dong, H. Hormann,
M. Kampmann, D. Platz, M. Roschewski and L. Schop e on the exp erimental
evaluation of existing database systems.
The jointwork describ ed here was done at University of Dortmund while J. Welsh
was on study leave from University of Queensland. The authors are grateful to
the German Ministry of Research and the Australian Department of Industry
Trade and Commerce for enabling this co op eration.
References
1. F. Bancilhon, C. Delob el, and P. Kanellakis. Building an Object-Oriented Database
System: the Story of O . Morgan Kaufmann, 1992.
2
2. N. S. Barghouti and G. E. Kaiser. Concurrency Control in Advanced Database
Application s. ACM Computing Surveys, 233:269{317, 1991.
3. P. A. Bernstein. Database System Supp ort for Software Engineering. In Proc. of
th
the 9 Int. Conf. on Software Engineering, Monterey, Cal., pages 166{178, 1987.
4. P. Borras, D. Cl ement, T. Desp eyroux, J. Incerpi, G. Kahn, B. Lang, and
V. Pascual. CENTAUR: the system. ACM SIGSOFT Software Engineering Notes,
135:14{24, 1988. Pro c. of the ACM SIGSOFT/SIGPLAN Software Engineering
Symp osium on Practical Software DevelopmentEnvironments, Boston, Mass.
5. R. Bretl, D. Maier, A. Otis, J. Penney,B.Schuchardt, J. Stein, E. H. Williams,
and M. Williams. The GemStone data management system. In W. Kim and F. H.
Lo chovsky, editors, Object-Oriented Concepts, Databases and Applications, pages
283{308. Addison-Wesley, 1989.
6. K. R. Dittrich, W. Gotthard, and P.C.Lockemann. Damokles { a database system
for software engineering environments. In R. Conradi, T. M. Didriksen, and D. H.
Wanvik, editors, Proc. of an Int. Workshop on AdvancedProgramming Environ-
ments,volume 244 of Lecture Notes in Computer Science, pages 353{371. Springer,
1986.
7. W. Emmerich and M. Kampmann. The Merlin OMS Benchmark { De nition,
Implementations and Results. Technical Rep ort 65, University of Dortmund, Dept.
of Computer Science, Chair for Software Technology, 1992.
8. W. Emmerich, W. Schafer, and J. Welsh. Databases for Software Engineering En-
vironments | The Goal has not yet b een attained. Technical Rep ort 66, University
of Dortmund, Dept. of Computer Science, Chair for Software Technology, 1992.
9. G. Engels, C. Lewerentz, M. Nagl, W. Schafer, and A. Schurr. Building Integrated
Software DevelopmentEnvironments | Part 1: To ol Sp eci cation. ACM Trans-
actions on Software Engineering and Methodology, 12:135{167, 1992.
10. F. Gallo, R. Minot, and I. Thomas. The ob ject management system of PCTE as a
software engineering database management system. ACM SIGPLAN NOTICES,
221:12{15, 1987.
11. A. N. Hab ermann and D. Notkin. Gandalf: Software Development Environments.
IEEE Transactions on Software Engineering, 1212:1117{1127, 1986.
12. R. Ho over. Incremental graph evaluation. PhD thesis, Cornell University, Dept. of
Computer Science, Ithaca, NY, 1987. Technical Rep ort No. 87-836.
st
13. P. Hruschka. ProMo d { in the age 5. In Proc. of the 1 European Software
Engineering Conference, Strasb ourg, Sept. 1987.
14. G. F. Johnson and C. N. Fisher. Non-syntactic attribute ow in language based
th
editors. In Proc. of the 9 Annual ACM Symposium on Principles of Programming
Languages, pages 185{195. ACM Press, 1982.
15. G. E. Kaiser, P.H.Feiler, and S. S. Pop ovich. Intelligent assistance for software
development and maintenance. IEEE Software, pages 40{49, May 1988.
16. C. Lewerentz and A. Schurr. GRAS, a management system for graph-likedoc-
rd
uments. In Proc. of the 3 Int. Conf. on Data and Know ledge Bases. Morgan
Kaufmann, 1988.
17. N. H. Madhavji. EnvironmentEvolution: The Prism Mo del of Changes. IEEE
Transactions on Software Engineering, 185:380{392, 1992.
18. M. H. No dine, A. H. Skarra, and S. B. Zdonik. Synchronization and Recovery in
Co op erativeTransactions. In Implementing Persistent Object Bases { Principles
th
and Practice{ Proc. of the 4 Int. Workshop on Persistent Object Systems , pages
329{342, 1991.
19. B. Peuschel and W. Schafer. Concepts and Implementation of a Rule-based Pro-
th
cess Engine. In Proc. of the 14 Int. Conf. on Software Engineering, Melbourne,
Australia, pages 262{279. IEEE Computer So ciety Press, 1992.
20. B. Peuschel, W. Schafer, and S. Wolf. A Knowledge-based Software Development
Environment Supp orting Co op erativeWork. International Journal for Software
Engineering and Know ledge Engineering, 21:79{106, 1992.
21. C. Pu, G. Kaiser, and N. Hutchinson. Split transactions for op en-ended activites.
th
In Proc. of the 14 Int. Conf. on Very Large Databases, pages 26{37. Morgan
Kaufman, 1989.
22. R. N. Taylor, R. W. Selby,M.Young, F. C. Belz, L. A. Clarce, J. C. Wileden,
L. Osterweil, and A. L. Wolf. Foundations of the Arcadia Environment Architec-
ture. ACM SIGSOFT Software Engineering Notes, 135:1{13, 1988. Pro c. of the
th
4 ACM SIGSOFT Symp osium on Software DevelopmentEnvironments, Irvine,
Cal.
th
23. Ian Thomas. Tool Integration in the Pact Environment. In Proc. of the 11 Int.
Conf. on Software Engineering, Pittsburg, Penn., pages 13{22. IEEE Computer
So ciety Press, 1989.
24. J. Welsh, B. Bro om, and D. Kiong. A Design Rational for a Language-based Edi-
tor. Software| Practice and Experience, 219:923{948, 1991.
a
This article was pro cessed using the L T X macro package with LLNCS style E