DATA MODELLING IN PRACTICE
A CASE STUDY EXAMINATION
Paul Groves
A report submitted in partial fulfilment of the requirements of the degree of Master
of Commerce (Honours) to the University of New South Wales
1988

CERTIFICATION
"I hereby declare that this submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person nor material which to a substantial extent has been accepted for the award of any other degree or diploma of a University or any other institute of higher learning, except where due acknowledgement is made in the text."

ABSTRACT
Data modelling for analysis and database design is increasingly being viewed as a critical phase in the systems development process. This report is a comparative analysis of data modelling theory and practice. It investigates the nature of data and examines several data modelling methodologies.
Current international standards for the conceptual data model are reviewed and, based on these, a reference framework is defined. This framework is used to compare four contemporary data modelling theories. Field testing of three of the methods is conducted: two case studies from a commercial environment and one from an academic setting. The case studies are conducted on a descriptive research basis.
Results from the case studies confirm that data modelling represents a technique of growing importance in the systems development process. Increasing resources applied to the practice of relational databases should ensure ongoing theoretical interest and development. Although in the formative stages of implementation and use, binary data modelling was seen to have achieved notable success in enhancing communication between project participants and in increasing user participation.
As a consequence it was anticipated that system quality would improve. Limitations on the practical application of binary modelling were noted based on case study results. Several (future) empirical studies are detailed in which the quantitative and qualitative impacts of binary data modelling usage might be evaluated.
CONTENTS
Chapter 1 INTRODUCTION ...... 1-1
Chapter 2 REFERENCE FRAMEWORK ...... 2-1
Chapter 3 DATA AND THE NATURE OF REALITY...... 3-1
Chapter 4 DATA MODELS AND DESIGN ...... 4-1
    4.1 Conventional Data Models ...... 4-2
    4.2 Semantic Modelling v Semantic Data Models ...... 4-4
Chapter 5 DATABASE ARCHITECTURE ...... 5-1
    5.1 Conceptual Schema - Defined ...... 5-1
        5.1.1 Conceptual schema and the Information System ...... 5-3
        5.1.2 Content of the conceptual schema ...... 5-4
        5.1.3 Functions of the Conceptual Schema ...... 5-5
Chapter 6 DATA MODELLING ...... 6-1
Chapter 7 INFORMATION SYSTEMS LIFECYCLE ...... 7-1
Chapter 8 DATA MODELLING METHODS: FEATURE ANALYSIS ...... 8-1
    8.1 Entity Relationship Modelling ...... 8-2
        8.1.1 Concepts ...... 8-2
    8.2 Fact Based Data Analysis and Design ...... 8-5
        8.2.1 Design Process ...... 8-5
    8.3 Nijssen's Information Analysis ...... 8-8
        8.3.1 Concepts ...... 8-9
        8.3.2 NIAM Development Lifecycle ...... 8-10
        8.3.3 Information Base: NIAM Sentence Model ...... 8-11
        8.3.4 Semantics ...... 8-12
    8.4 Active and Passive Component Modelling (ACM/PCM) ...... 8-13
        8.4.1 Abstraction Modelling ...... 8-13
        8.4.2 Structural Modelling ...... 8-14
        8.4.3 Behavioural Modelling ...... 8-16
        8.4.4 ACM/PCM Design Modelling ...... 8-17
Chapter 9 DATA MODELLING METHODS: COMPARATIVE REVIEW ...... 9-1
    9.1 Lifecycle Support ...... 9-2
        9.1.1 Representation and Communicability ...... 9-4
        9.1.2 Abstraction Support ...... 9-7
        9.1.3 Documentation Support ...... 9-10
        9.1.4 User Orientation ...... 9-12
        9.1.5 Semantic Expressiveness ...... 9-14
        9.1.6 Quality Control ...... 9-16
        9.1.7 Comparative Review - Summary ...... 9-18
Chapter 10 UNIVERSITY OF NEW SOUTH WALES ...... 10-1
    10.1 Objectives ...... 10-1
    10.2 Research Method ...... 10-1
    10.3 Environment ...... 10-2
    10.4 Database Systems Development ...... 10-4
        10.4.1 Database Systems - 1984 ...... 10-5
        10.4.2 Database Systems - 1985 ...... 10-7
        10.4.3 Database Systems - 1986 ...... 10-10
    10.5 Interview Plan ...... 10-12
        10.5.1 Lecturers ...... 10-12
        10.5.2 Tutors ...... 10-15
        10.5.3 Students ...... 10-16
    10.6 Conclusion ...... 10-17
Chapter 11 AUSTRALIAN MUTUAL PROVIDENT ...... 11-1
    11.1 Objectives ...... 11-1
    11.2 Research Method ...... 11-2
    11.3 Environment ...... 11-3
        11.3.1 Hardware ...... 11-3
        11.3.2 Software History ...... 11-4
        11.3.3 Software Current ...... 11-5
    11.4 Data Modelling ...... 11-6
    11.5 Systems Lifecycle ...... 11-8
    11.6 Data Modelling Experiences ...... 11-9
        11.6.1 User experiences ...... 11-11
    11.7 Conclusion ...... 11-12
Chapter 12 DIGITAL EQUIPMENT CORPORATION ...... 12-1
    12.1 Introduction ...... 12-1
    12.2 Corporate Environment ...... 12-2
    12.3 Local Environment ...... 12-3
    12.4 Methodology Review ...... 12-4
    12.5 Systems Analysis ...... 12-5
    12.6 Modelling and Partitioning ...... 12-6
        12.6.1 Conceptual Modelling ...... 12-8
        12.6.2 Functional Modelling ...... 12-9
        12.6.3 Physical Modelling ...... 12-10
    12.7 An Inventory Application ...... 12-10
    12.8 Modelling Experiences ...... 12-12
    12.9 Conclusion ...... 12-15
Chapter 13 SUMMARY ...... 13-1
    13.1 Case Study Conclusions ...... 13-2
    13.2 Research Limitations ...... 13-4
    13.3 Future Research ...... 13-5
Appendix A SUBJECT DESCRIPTIONS ...... A-1
FIGURES
    1 ACM/PCM Design Phases ...... 8-17
    2 Candidate keys ...... 10-9
    3 Pseudo record merges ...... 10-9
    4 DMR Systems Lifecycle ...... 12-7
CHAPTER 1
INTRODUCTION
"Designing database, one of the major activities of the system development process, is a difficult,
complex, and time consuming task. Inadequate designs have presented many problems. The failure to specify clearly the organisational goals and requirements has resulted in databases of limited scope and usefulness, which are unable to adapt to change. In many cases, these problem-ridden databases
have prevented database management systems from becoming an effective data processing tool."
[Kahn 85]
The development of database management systems in the late sixties for mainframe machines heralded a new era of data processing. Organisations were given the opportunity to have centralised control of operational data. This meant the capability of sharing data instead of dedicating files to specific applications. Standards and security could be enforced across all users of the database. Integrity and redundancy would be controlled, with significant implications for data consistency. A major benefit would be the provision of data independence, the ability to insulate applications from changes in storage structure and access strategy.
The concept of database management was embraced enthusiastically with considerable development
resources devoted to the design of data models and database languages. Among the many data models
developed, Hierarchic, Network and Relational designs were the most prominent. Implementation
machines ranged from mainframes in the late sixties and seventies to the microcomputers of the early
eighties.
Technology dominated early database implementations. Database design was usually conducted as a
single phase activity, with emphasis on physical details (data structure types, access paths, indexes
etc.) rather than as a two phased activity comprising logical and physical design. A consequence of this
was that the structural (1) characteristics of the database management system were more influential during design than the structural characteristics of the data. The physical model thereby pre-empted
logical model design. This resulted in applications (and databases) with greatly reduced flexibility
(adaptivity) on account of considerably more complex behavioural properties. (2)
Lack of formal procedures led to the design exercise being perceived as something of an art form
rather than a science, which relied on the intuition and experience of the analyst.
As the applications developed in a database environment became more sophisticated, an increasingly
heavy burden of responsibility was placed on the design role of the analyst. The pressure was eased with the development of normalisation theory [Codd 72]. This provided a theoretical basis to guide file
and database design and allowed a formal approach to be developed. However, as indicated from the
introductory quote, database design continues to be a problem. Database implementation experiences
suggest that inadequate design is preventing the theoretical advantages of the database concept from being realised.
"Errors made during the design process affect the application's entire life and any decisions that are
made at different levels based on the data. This process therefore is of great importance for the
enterprise and it is necessary to pay much attention to it. This fact explains why so much work is
under development in this area." [Agosti 84]
Normalisation and relational theory aided the search for a more rigorous and formalised approach to
the database design task. A result was the development of data modelling. The term is defined here
as the process of abstraction and documentation of data characteristics [Davis 85].
This paper begins with a discussion of the need for a framework of reference in which data modelling
methodologies and methods can be compared (3). The characteristics of data, and of data models are
explored. The data modelling process is decomposed and its components considered in detail.
On this basis, a number of data models are reviewed, highlighting their essential characteristics. Conventional and semantic data models are examined and the meanings of these terms discussed. A database architecture conforming to the International Standards Organisation (ISO) model is presented. This leads to a review of four data modelling methods. Again the purpose is to identify the major features of each method. Active and Passive Component Modelling (ACM/PCM) and Entity-Relationship modelling (ER) are considered in overview as examples of semantic data modelling methodologies.
Nijssen's Information Analysis Method (NIAM) and Fact Based Data Analysis and Design (KENT) are analysed as examples of binary data modelling methodologies.
In section five a comparative feature analysis is conducted of these four methods utilising the framework of review outlined in the second section. This is followed by three case studies of data modelling
in practice.
From the previous discussion two major functions of this paper can be identified. One is to conduct a
review of selected data modelling methods from a theoretical aspect. This aims to present a relatively
balanced perspective of the major features supported. Two is to present a survey of data modelling
in practice by conducting a case study examination of two commercial organisations and a university.
It is emphasised that no attempt is being made at a qualitative assessment of the chosen methods. To
do this would require a considerably deeper analysis of each method than will be attempted in this
paper. Instead, it is hoped that by concentrating on the major features some direction can be given
towards future research into specific features of the methods.
Footnotes:
(1) Structure, as used in this paper, is a term which describes the means of representing data and data
relationships, that is, the static properties of data. For example, in the relational model, attributes
and tuples embodied in relations are the means by which the structural properties of data can be
represented. Behaviour, as used in this paper, is a term which describes the means of representing
the rules governing the changes to data and data relationships, that is, the dynamic properties of the data. For example, in the relational model, the domain, primary key and foreign key concepts could
be used to specify behavioural properties of data (insert, update and deletion rules). An instance of a
behavioural rule might then be that tuples with non-unique primary keys are not allowed (by definition
of a primary key).
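By way of illustration, this behavioural rule is enforced directly by any relational system that supports primary keys. The sketch below uses SQLite syntax purely as an illustration; the 'employee' relation and its attributes are invented for the example and do not come from the text.

```python
import sqlite3

# Minimal sketch of the footnote's behavioural rule: the system itself
# rejects a tuple whose primary key duplicates an existing one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Smith')")
try:
    # Second tuple with the same primary key value violates uniqueness.
    conn.execute("INSERT INTO employee VALUES (1, 'Jones')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The rule is thus not application code but part of the data model's definition, exactly as the footnote suggests.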
(2) The data may not possess a logical structure equivalent to the physical data model being employed.
Consider for example, the restrictions of an IMS hierarchical model. No child record type may be
owned by more than one parent record type. To model a 'treatment' record that is owned by a
'doctor' and also owned by a 'patient' requires two hierarchical data structures which are linked by a
logical pointer. This could be represented more naturally in a network type data model. Given the
constraint of an IMS environment the specification of the behavioural properties of the application
become considerably more complex.
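The restriction can be sketched in ordinary program data structures. The fragment below is a rough illustration in plain Python, not IMS itself; the record identifiers and the 'x-ray' detail are invented. It shows the treatment fact split across two hierarchies joined by a logical pointer, against the single two-owner record a network model would allow.

```python
# Under the one-parent rule, 'treatment' is physically owned by 'doctor';
# the 'patient' hierarchy can only reach it through logical pointers.
doctor_hierarchy = {"D1": {"treatments": ["T1"]}}       # physical children
patient_hierarchy = {"P1": {"treatment_ptrs": ["T1"]}}  # logical pointers only
treatments = {"T1": {"doctor": "D1", "detail": "x-ray"}}

# A network model could record the same fact once, with two owner links,
# a more natural structure for this data.
network_treatment = {"id": "T1", "owners": {"doctor": "D1", "patient": "P1"}}

# Answering "which treatments has patient P1 received?" in the
# hierarchical sketch means chasing the logical pointer chain.
received = [treatments[t]["detail"]
            for t in patient_hierarchy["P1"]["treatment_ptrs"]]
print(received)  # ['x-ray']
```

The extra indirection in the hierarchical version is precisely the added behavioural complexity the footnote describes.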
(3) The following definitions of method and methodology in a data modelling and database context will be used in this paper. A data modelling methodology is an integrated collection of methods and
techniques, which supports the complete database design process. A technique or a method in this
design context will be defined as a systematic way of performing a specific activity or subset of the design. A technique or method does not fulfil the requirements of integration and completeness that
are required of a methodology.
CHAPTER 2
REFERENCE FRAMEWORK
A number of data modelling methodologies and techniques have been developed and proposed in the literature. These methodologies and techniques are not directly comparable for a number of reasons. Firstly they cover different aspects of the data modelling process and place different emphasis on its components. Secondly, definitions and language vary considerably between models, making discussion and comparisons of concepts difficult. Both these problems are typical of an emergent discipline. It is argued [Bubenko 83 p248] that the field of information systems study is far from mature.
'Until a research framework, or paradigm, can be established, that is accepted by the majority of researchers within the field, there is little prospect of advancement of the discipline or field of research.'
The research problem inevitably impacts the practice of data management and data modelling. Proponents of alternative methods have no common language in which to discuss the relative strengths and
weaknesses of the methods in an objective manner. As a consequence of this, the choice of method
for an information system development is based on subjective criteria. This usually means the design
experience and previous method exposure of staff are paramount. To develop an objective basis of
evaluation the paradigm conflict must be addressed. This requires a systematic procedure for review.
The first step is agreement on terminology and discussion of data characteristics. Following this, a
framework will be presented which places data modelling in the broader context of the information
systems lifecycle. A database architecture and the role of the conceptual schema will be presented.
Wherever possible, this paper will adopt the language and architecture of the American National Standards Institute (ANSI) and International Standards Organisation (ISO) committees.
CHAPTER 3
DATA AND THE NATURE OF REALITY
"A message to mapmakers: highways are not painted red, rivers don't have county lines running down the middle, and you can't see contour lines on a mountain." [Kent 78]
Data is described [Davis 85 p96] as consisting of symbols which represent, describe or record reality.
But like the contour lines on a map, data symbols are clearly not reality and can never provide a complete representation of the objects and events which comprise it. For instance, my Christian name is Paul. In certain circumstances it can be used to identify me (that is, when the name is unique in a group of people). But I am not the same as the name. Whilst it has utility, the name is not reality. The
distinction between the object (a person) and the symbol (an instance of a name) is important for this
reason. How can reality be modelled? The simple answer is that it cannot be modelled in an objective
manner. Decisions about what to extract from reality and which symbols will be used to represent
it must reflect on the needs and views of the users which interact with that segment of reality. Any
structure which is developed to model reality is simply another map. It may be useful to someone,
but remains nevertheless an approximation of the underlying terrain.
This is not the whole picture; unfortunately, it gets worse. There are many views of reality and there
are many realities. People, buildings, grass and trees are part of the physical reality in which we
participate. For information systems modelling this may not be the reality of interest. The reality may
have no physical existence. It may be historical information, not part of the now reality, or, did it ever
exist? A falsified reality? It may relate to a future reality, about intended states of affairs, or it may be
a conjectured reality.
This philosophical approach can be pursued to great lengths. The purpose of introducing it here is to
demonstrate that there are no 'hard' definitions of data from which a strict mathematical formalism can be developed to guide the modelling process. What then is meant when a data model is described as a representation of reality?
It would seem that a data model, like reality, is an elusive concept. Kent concludes that there is no
'best' model. Only the interaction of data and usage determine the meaning of data and the efficiency of processing.
Despite these problems (and because of its importance) the section following continues with an attempt to 'define' the concepts which are used repeatedly in the literature and in the reviews of this paper.
From what precedes the reader should be aware that it is a difficult and imprecise exercise.
CHAPTER 4
DATA MODELS AND DESIGN
A data model is an abstract representation of data, a way of representing data and its inter-relationships at a logical and/or physical level. Graphs, tables and mathematical formulas may all be used for representation purposes. At a logical level it should support the definition of the conceptual schema and external schemas. At a physical level it should support definition of the storage structures which allow for the update and retrieval of data instances, i.e. definition of the internal schema. In performing these functions a data model acts as a tool for conducting data modelling. Data models naturally vary in the extent to which they support the data modelling process. It is important that the model be distinguished from the functions it is performing and from the 'reality' it is modelling. This section considers a classification framework and the major classes of data models.
Codd (1981) defines a data model according to three sets of characteristics:
1. Data structure types supported by the model. Examples include relations, trees and networks.
2. Operations or inferencing rules which can be applied to occurrences of the data structure types
which the model supports. An example of these rules for the relational data model is embodied
in relational algebra. This specifies operations such as join, select and project which manipulate
the data structures, relations, in pre-defined ways.
3. Integrity constraints and rules which have to be respected in the representation of the data to keep
the database in a situation of integrity and consistency. These may be expressed as insert-update-delete rules. For example, in the relational model a deletion operation on a 'parent' record (tuple)
might require that all 'children' records (tuples) are also deleted. This is referred to as a cascade
deletion [Date 86 p254].
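A cascade deletion rule of this kind can be made concrete. The sketch below uses SQLite syntax only as an illustration of the integrity rule; the 'parent' and 'child' table names are invented for the example.

```python
import sqlite3

# Minimal sketch of a cascade deletion rule (characteristic 3 above):
# deleting a parent tuple makes the system remove dependent children.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE child (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE)""")
conn.execute("INSERT INTO parent VALUES (1)")
conn.executemany("INSERT INTO child VALUES (?, 1)", [(10,), (11,)])

# Deleting the 'parent' tuple deletes its 'children' automatically.
conn.execute("DELETE FROM parent WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
print(remaining)  # 0
```

The constraint lives in the schema, not in application code, which is the point of Codd's third characteristic.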
These three characteristics have largely been used to distinguish the physical, or conventional, data models. However, data models can also be categorised into processable and non-processable, or semantic, data models. The processable models were the first to be used and were machine orientated, stressing the structural form of the data. Efficient storage and manipulation were the primary motivations. Hierarchical, network and relational data models are the best known in this category. These models are utilised in the implementation design phase. Based on structural forms, ISO categorise design in this area as data modelling.
Non-processable or semantic data models are logical models which stress the importance of modelling
the meaning of data. That is, the data model should not be restricted to modelling the structure only
of data. Structural relationships should be explicit and the behavioural properties of the data clearly
specified. Semantic data models provide support for the conceptual modelling stage (defined in
section 5.1). An objective of a successful semantic model should be to balance the essential requirement
of completeness with simplicity. Semantic models are seen as independent of the processable data
models. Care is taken to avoid the implication that these are in any way a substitute for the processable
models. Structural data forms cannot be neglected and data modelling is essential to support database
implementation. However, in the absence of a formal model stating the semantic rules and constraints, data modelling will be sub-optimal. Semantic modelling is described as information modelling by ISO.
Conventional data models and semantic data models are discussed in detail in the following sections.
4.1 Conventional Data Models
Conventional database models, namely the relational, hierarchical and network models, provide facilities for describing the logical structure of a database using trees, tables, nodes and sets. A data
manipulation language is provided for these constructs through general purpose access and update
operators. Typically the user level view of the data is provided by record structures. Using a data
definition language, a schema is specified which utilises the database model constructs. This expresses the structural definitions of the data. Behavioural properties of the data are supported by the
provision of a data manipulation language. A problem with the conventional models is the lack of semantic expressiveness. Being record orientated imposes limitations on the data structures used to model an application. Inevitably there will be loss of information when the application does not fit the structure of the chosen database model. For example, an application with a natural hierarchic structure would be the subject, school, faculty organisation of a university. If the relational model was used for implementation then the associations between these objects (entities) could only be specified implicitly (through matching on common domains) and not explicitly as with a network or hierarchic data model (pointers). In addition, semantic integrity constraints must be defined and enforced externally. However, when these constraints are embedded in application programs, data independence is compromised. In this situation the data model will only be able to specify a subset of the designer's knowledge of the application.
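The implicit nature of relational associations can be sketched as follows. SQLite syntax is used purely for illustration; the table and column names and the subject code are invented, not drawn from any real university schema. The subject-school-faculty hierarchy is recovered only by matching values drawn from common domains at query time.

```python
import sqlite3

# Sketch of the university example: the hierarchy is implicit in the
# data values, not recorded as pointers between record types.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (fname TEXT)")
conn.execute("CREATE TABLE school (sname TEXT, fname TEXT)")
conn.execute("CREATE TABLE subject (code TEXT, sname TEXT)")
conn.execute("INSERT INTO faculty VALUES ('Commerce')")
conn.execute("INSERT INTO school VALUES ('Information Systems', 'Commerce')")
conn.execute("INSERT INTO subject VALUES ('14.501', 'Information Systems')")

# Nothing but matching values in common domains links the three tables;
# a hierarchic model would record the ownership explicitly via pointers.
row = conn.execute("""
    SELECT subject.code, faculty.fname
    FROM subject JOIN school  ON subject.sname = school.sname
                 JOIN faculty ON school.fname  = faculty.fname""").fetchone()
print(row)  # ('14.501', 'Commerce')
```

The association exists only for as long as the matching values do, which is both the flexibility and the fragility noted in the text.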
A significant problem with the early database models (which motivated the development of the relational model) was the inability to distinguish the logical model from the physical model. This is evident in the correspondence between physical access paths and logical inter-record links as used in the network data model. The result is a data manipulation language which is 'navigational' in the
sense that users must traverse the structure of a database rather than specify the properties of the
data of interest. The problems of rigidity and inflexibility prevent data being easily arranged so as to
provide multiple user views of data.
The relational model aims to provide data independence through the presentation of data in a form
(tables) which is independent of the physical storage structures. For example, the mainframe IBM
relational package DB2 provides the user view with tables. At the physical level, however, data is stored in binary tree (VSAM style) data structures.
The symmetric structure of the relational model (i.e. the ability to formulate all queries in a consistent
manner) also favours data manipulation languages which are non-procedural, allowing set processing
without requiring loop and navigational coding. Relational data manipulation languages are generally derived from a relational algebra or calculus and are designed to allow highly flexible database
interaction.
In the relational model relationships among data items are formed dynamically at access time, based on the values of the data items. Whilst this allows considerable flexibility, in the absence of semantic detail (i.e. support for the domain concept) it is possible that spurious relationships between data items will be formed (an example of this follows).
4.2 Semantic Modelling v Semantic Data Models
Semantic modelling is a term used to describe the activity of representing meaning in data [Date 86 p609]. This seems a worthwhile pursuit. The more meaning that can be incorporated into the data, the better will be the model of reality. Current database systems have only a limited understanding of what the data means. As an example, take the data items age and weight. Both are numeric, and for an individual, a value of 60 for each is feasible. But the data items are very different, semantically different. The first field may have a unit of measurement specified in years, the second a unit of measurement in kilograms. Conventional data models and database management systems can represent these items but are not able to represent the meaning. A relational join on records containing these fields should be rejected outright by the system, but it is unlikely that present day database systems would do this. The interpretation of what these types of relationship represent is left to the database user (who may or may not appreciate the semantics).
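The age/weight example can be made concrete. In the sketch below (SQLite syntax; the 'person' and 'parcel' tables and their contents are invented for illustration) the join is accepted without complaint, because nothing in the model records that years and kilograms belong to incomparable domains.

```python
import sqlite3

# Both columns are merely numeric, so a join on them is syntactically
# legal even though it is semantically meaningless.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE parcel (label TEXT, weight INTEGER)")
conn.execute("INSERT INTO person VALUES ('Paul', 60)")
conn.execute("INSERT INTO parcel VALUES ('crate', 60)")

# The system happily matches age (years) against weight (kilograms);
# without domain support it cannot reject the spurious relationship.
rows = conn.execute(
    "SELECT name, label FROM person JOIN parcel ON age = weight").fetchall()
print(rows)  # [('Paul', 'crate')]
```

A system with proper domain support would refuse this comparison at compile time; here the burden of recognising the nonsense falls entirely on the user.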
A semantic data model attempts to logically structure the data in a database in a manner that captures
more of the meaning of the data than conventional database models [King 85, p115]. This is achieved
through the provision of an extended set of modelling constructs which allows the structural and
behavioural properties of the data to be defined. When data is organised with a semantic data model
the designer can structure a database by expressing application knowledge in a more natural, formal
and explicit manner.
Many of the concepts in this area are derived from research in the area of knowledge representation
undertaken by artificial intelligence researchers. In the language of this discipline, a knowledge base can
be considered as consisting of a network of objects (nodes) connected by relations (directed edges).
[King 85 p127] refers to these networks as semantic networks.
In [Brodie 83 p579] a semantic data model is defined in the following terms:
"A semantic data model is a collection of mathematically defined concepts with which to identify static and dynamic properties of real or imagined objects and to specify them using structural or behavioural means."
In this context, structure should be interpreted as "states and static properties (entities and their relationships)", whilst behaviour should be interpreted as "state transitions and dynamic properties (operations and their relationships)."
The distinction between semantic data models and conventional data models is made on the basis of their relative ability to represent both the structural and behavioural properties of objects. Typically,
conventional data models have provided only primitive operations for modelling behaviour.
From the preceding descriptions of semantic data models it should be apparent that some practical difficulties remain in distinguishing them from the conventional data models. It may well be possible to
describe a data model as a semantic model if extensive support is provided for behavioural modelling
and to describe a data model as conventional if it provides for structural modelling only. In between
these extremes however, classification will be difficult. It is emphasised therefore that the concept
of a semantic data model is a relative one. As was argued in chapter 3, 'Data and the Nature of
Reality', a data model is an imperfect representation of reality. Capturing the meaning of data and
representing it in a formal model is a formidable task which, in a general sense, can probably never
be considered complete. As a result, data models continue to be developed which provide more
extensive constructs for the expression of the structural and behavioural properties of data than has
previously been possible. ACM/PCM, discussed later in the paper, is an example of a relatively recent
data model which employs extensive semantic modelling concepts.
On the other hand, conventional data models, for example, the relational model, should not be seen
as devoid of semantic concepts. In particular, the primary and foreign key aspects of that model are
more than syntactic constructs [paraphrasing Date 86 p609]. Consequently, the term semantic data model is one which should be used with caution. A more fitting description may be an extended data model, that is, a data model which employs semantic modelling concepts.
In [Date 86] the overall approach to semantic modelling is as follows:
1. Identify a set of semantic concepts which are useful for informally discussing the real world. Such
concepts include entities, properties, associations and subtypes. It may be agreed that the real world consists of entities that possess properties and are connected in associations.
2. Devise a set of corresponding symbolic (formal) objects to represent the semantic concepts.
The extended relational model, RM/T [Codd 79] for example, introduces E-relations to represent
entities and P-relations to represent properties. These are special forms of an n-ary relation.
3. Devise a set of integrity rules to be used with these symbolic objects. RM/T provides a property
integrity rule which requires every entry in a P-relation to have a corresponding entry in an E-relation (i.e. every property must be a property of some entity).
4. Operators must be developed for manipulating the symbolic objects. RM/T provides the PROPERTY operator which can be used to join together an E-relation with all the corresponding P-relations so as to collect together all properties of a given entity.
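As a toy rendering of these ideas (plain Python, not Codd's formalism; the surrogate keys and property names are invented for the example), E-relations and P-relations can be modelled as simple tables and the PROPERTY operator as a join on the surrogate key:

```python
# E-relation: records that entities exist, identified by surrogates.
e_relation = [{"surrogate": "e1"}, {"surrogate": "e2"}]

# P-relations: one per property, each entry tied to an entity surrogate.
p_relations = {
    "age":  [{"surrogate": "e1", "value": 60}],
    "name": [{"surrogate": "e1", "value": "Paul"},
             {"surrogate": "e2", "value": "Anne"}],
}

def property_op(surrogate, e_rel, p_rels):
    """PROPERTY-style operator: gather every property of one entity."""
    # Property integrity rule: the entity must exist in the E-relation.
    assert any(e["surrogate"] == surrogate for e in e_rel)
    return {prop: row["value"]
            for prop, rows in p_rels.items()
            for row in rows if row["surrogate"] == surrogate}

print(property_op("e1", e_relation, p_relations))
# {'age': 60, 'name': 'Paul'}
```

The existence check inside the operator plays the role of RM/T's property integrity rule (paragraph three), while the join itself illustrates paragraph four.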
Paragraph two deals with the structure of a database whilst paragraph four deals with the behavioural
properties. Together, paragraphs two through four constitute an 'extended' data model. The model
is necessarily a general approach to the topic for reasons previously discussed. The example used,
RM/T, is an extension of the relational model which represents a large spectrum of semantic constructs
(structural) directly, in relational form.
In the comparative review, the three sets of characteristics outlined in [Codd 81] (at the beginning
of this chapter) will be used in conjunction with those of Date (above), to classify the underlying
data models of each method and to consider semantic expressiveness. The ANSI/SPARC database
architecture is presented in the next chapter. This is followed by a detailed discussion of the role
of the conceptual schema and the relationship between semantic data modelling and the conceptual
schema.
CHAPTER 5
DATABASE ARCHITECTURE
A schema is an abstract data model which represents a subset of the description of an information system. Different schemas are prepared corresponding to different levels of abstraction during the design process. In this paper a three level database architecture, internal, conceptual and external,
corresponding to the ANSI/SPARC DBMS model is used for evaluating data modelling methods and methodologies. This architecture is shown graphically in appendix A.
The conceptual level can be taken as a representation of the entire information content of the database
in a form which is somewhat abstracted from the way the data is physically stored. This paper
emphasizes the conceptual level.
The external level, providing a user-orientated representation of information, is visible at the
information system/environment interface. Such user views can be derived from the conceptual level.
The internal schema can be produced by mapping the conceptual schema to a virtual physical
environment. That is, the internal schema is one step removed from the physical level and does not deal
with device specific considerations but does specify representations, sequencing and access paths. It
specifies a user transparent representation of information within a physical implementation.
5.1 Conceptual Schema - Defined
The conceptual view as defined by ANSI/SPARC concentrates on the meaning of the information. In
defining the role of the conceptual schema the International Standards Organisation (ISO) [Griethuysen
85] made the following comments:
"It is the classifications, rules, etc., that are of primary interest to a systems designer designing a
database system. In analysing the universe of discourse, it is these things he will want to identify,
discuss with users and describe. In recording them he will actually create a "skeleton" description of the universe of discourse, the conceptual schema. In this way the conceptual schema describes which entities can possibly exist in the universe of discourse, that is, which entities exist, have existed, or might ever exist. In the same sense it describes what facts and happenings are possible for those entities or, if relevant, are required for them. We assume it will be held in a formal representation within the data base system."
This description tells us that a conceptual model is more than an abstracted data model based only on the structural characteristics of data. It must capture the semantics of the data such that it may be used as a communications vehicle by designers and users to discuss properties of the universe of discourse.
The universe of discourse is defined as the set of information that an information system may receive, derive, store, or distribute during its lifetime. It includes therefore not only base information given to the system but also information derived or implied by others [Ramon 83]. An example of a universe of discourse would be the enrolment of university students in degree courses, perhaps limited in scope by the interest of the university. Within this universe are data objects both abstract and real which represent the 'properties' of the universe of discourse. Specifically, students, subjects, grades and enrolment dates for example.
Continuing with the ISO definition, emphasis needs to be placed on the phrase 'formal representation'.
Whilst the concept of a conceptual model as a component of the ANSI/SPARC architecture was defined in a final report in 1978 it has been supported primarily as a structural data model with little capability of expressing semantics. Database management systems have limited understanding of what the data means. Currently the general rules and procedures mentioned in the above quote are described only in application programs. They are often described as 'validation rules' [Griethuysen 85 p3-2].
A consequence of this, is that each application altering the contents of the database requires a copy of these 'rules'. The potential for redundancy, and hence inconsistency among 'copies' is high. The problem is difficult enough in a tightly controlled data processing department. Real threats to integrity
exist when so called fourth generation enquiry and update language tools are made available to end users. The rationale for centralised standards enforced by a formal conceptual schema should be clear. To implement this, a conceptual schema language must be designed to support procedural and declarative semantic statements.
5.1.1 Conceptual schema and the Information System
An Information System is defined by ISO as consisting of the conceptual schema, an information base and an information processor. The processor acts on a stimulus from the environment to produce change in the otherwise static conceptual schema and information base. An information system, consequently, is a formal system, "being fully predictable and unable to deviate from the rules or constraints defined by the conceptual schema and information base." [Griethuysen 85 p1-6]
The information base is distinguished from the conceptual schema and is defined as:
"The description of the specific objects (entities) that in a specific instant, or period of time, are perceived to exist in the universe of discourse and their actual states of affairs that are of interest."
[Griethuysen 85 p1-4]
Both the information base and the conceptual schema are perceived as part of the conceptual level in the ISO report [Griethuysen 85, p1-3]. Furthermore, the structural characteristics of the information base should be derivable from the conceptual schema, whilst the behaviour of the information base should conform to the behavioural properties of the information system as defined in the conceptual schema. However, from the above quote, it is evident that the information base contains data instances. Therefore, it would seem that only a subset of the information base lies at the conceptual level. That is, the machine representation of the conceptual schema as embodied in the information base. Further discussion of this follows.
A diagram and description of the mapping between the conceptual schema, the information base and the universe of discourse is shown in appendix B.
In [Shoval 84] the conceptual schema is interpreted as having dual functions. The first is to define the universe of discourse in an implementation independent 'enterprise model'. As used here an enterprise model is more than a strategic data model which documents major entities and their relationships. It is a complete logical model abstracted only from the physical implementation. The second function is to control the descriptions in the information base in terms of computer orientated data structures. This implies that the conceptual schema will exist in two forms. The first form is typically represented by diagrams and restricted natural language reflecting an orientation towards analyst/user communications. It may be supported by a conceptual language which can express structural and behavioural properties in a formal manner. NIAM, through information flow diagrams, information structure diagrams and the conceptual grammar (declarative and procedural statements) is an example of this.
The second form of the conceptual schema is a machine representation. This should be derivable from the conceptual schema of the first form whether it be by manual or automated means. Not all of the conceptual schema (first form) may be representable in a database management system language. Some (particularly behavioural properties) will be supported through external procedural code.
Currently, there are no known commercial implementations of database management systems which provide anything like full support for the concept of a conceptual schema as described here. IBM's DB2 relational database catalog performs some functions of a conceptual schema at the information base level.
For the remainder of this paper a reference to the conceptual schema will mean a reference to both levels unless otherwise indicated.
5.1.2 Content of the conceptual schema
As an abstract model of an information system, the conceptual schema describes structural and behavioural aspects of data. It enforces preservation of meaning in the transformation between various data representations and defines their interpretations. But it does not provide guidelines for establishing the boundaries or scope of the information system analysis task. As a consequence, definition of the scope must be based on the judgement of the systems designer. Given that the scope of the information system can be tightly defined, the following principles (paraphrased from [Griethuysen 85, p1-8]) should be observed regarding content:
• 100% principle
This requires that all relevant structural and behavioural aspects (rules and laws) of the information
system be described in the schema.
"The information system cannot be held responsible for not meeting those described elsewhere,
including in particular, those described in application programs." [Griethuysen 85, p1-9]
This follows from the previous discussion on integrity threats when a formal conceptual schema
does not model the semantics of the data, but requires application programs, or the information
system users, to interpret the meaning of the data.
• Conceptualisation principle
Only the relevant aspects of the information system should be included, thus excluding external
and internal details of data representation i.e., excluding physical organisation and access strategy
in addition to user views of data. This principle supports data independence (physical and logical)
by isolating the user external views from the internal representations. It also supports the concept
of abstraction, a vital tool in the management of complexity. By focussing only on the conceptually
relevant details, conceptual schema design is relieved of the burden of implementation details.
Similarly at the external and internal levels the design processes will be simplified. The logic
supporting this will be recognised as similar to that which justified the development of the seven
layered communications model for Open Systems Interconnection.
5.1.3 Functions of the Conceptual Schema
A major role of the conceptual schema is to provide agreement on the representation of the universe of discourse. This allows it to be used as a focal point for human communications. It also allows different users of the common information system to take consistent internal and external views of the data to suit their varying requirements.
The fundamental roles of the conceptual schema as defined by ISO include:
1. To provide a common basis for understanding the general behaviour of the universe of discourse;
2. To define the allowed evolution and manipulation of the information about the universe of discourse;
3. To provide a basis for interpretation of external and internal syntactical forms which represent
the information about the universe of discourse;
4. To provide a basis of mappings between and among external and internal schemata.
These roles for the conceptual schema correspond to the properties of semantic data models. A conceptual schema, such as has been defined, thereby represents an example of a semantic data model. Furthermore a semantic data modelling methodology will provide the constructs from which the conceptual schema can be specified.
CHAPTER 6
DATA MODELLING
The data modelling process is concerned with the construction of a database as a component of an information system. It is a process that transforms and organises unstructured information and processing requirements concerning an application, through different intermediate representations, to a complex representation which defines schemas and functional specifications [Agosti 84 p5]. Various documents which record the intermediate representations and the semantics of the representations are produced during the process.
The data modelling process is usually divided into components which produce the intermediate representations. Appendix C shows a scheme of these components. Typically the process will include the following [Agosti 84 p7]:
1. Information requirements design. This is an interface between the analysis and design processes
and represents the mapping of analysis into design. [Davis 85 p473] discusses a contingency
approach to determine information requirements at the organisational, data base or application
level. Many design methodologies consider the requirements specification as a pre-requisite.
2. Conceptual design. This leads to the construction of the conceptual schema which is not constrained
by the information structure requirements of a specific data base management system.
The conceptual schema integrates the user (application) views into an overall conceptual view
that resolves 'view conflicts'. 'Metadata', meanings ascribed by the designer to data kept in the
database, is collected and may be managed by a data dictionary system.
3. Implementation design. This involves mapping the conceptual model to the structure of the
selected database management system be it relational, network, hierarchical or some alternative
data model. Transaction analysis is performed to establish efficient access path strategies. This
step is also referred to as internal schema design. It is still one step removed from the physical
level and assumes a virtual hardware environment. Typically this phase will be conducted under
supervision of a systems analyst or database administrator.
4. Physical design. The mapping of the internal schema to the physical storage structure is
completed. The physical space for records and indexes is defined along with page and block sizes.
This step is, naturally, highly system specific as it deals with performance tuning and optimisation.
The database management systems software may not support this phase directly. Typically
a database administrator and/or systems programmer would be responsible for this phase.
The distinction between these components is made necessary by the existence of the ANSI/SPARC multi-level database architecture. If this architecture is adopted, then it follows that the data modelling process should support each of its elements. This has resulted in the phased approach presented above.
There is no implication that these phases must be approached sequentially. Iteration and abstraction are implicitly supported. For example, at the conceptual design phase, a macro conceptual model which documents the major entities and their relationships, may be prepared as a prelude to the preparation of a detailed conceptual model, or to the preparation of detailed requirements specification. The use of iteration and abstraction corresponds with the observed practice of data modelling in many organisations.
CHAPTER 7
INFORMATION SYSTEMS LIFECYCLE
An important concept, in placing data modelling in the context of information system development, is the systems lifecycle. This defines a model of the activities comprising an information system's development and evolution. A representative model detailing the typical phases of such a lifecycle is taken from [Wasserman 83]. For the purposes of this analysis six broad phases are distinguished:
i Analysis of the system to establish a requirements specification. A description of the activities, data, information flow, relationships and problem constraints.
ii Functional specification to detail the processes to be performed by the system. External software design.
iii Design of the internal structure of the software to provide the functions previously specified resulting in a description of the system structure, the architecture of the system components, the algorithms to be used, and the logical data structures.
iv Systems test and implementation.
v Validation of the development process to ensure that it is of acceptable quality and that it is an accurate transformation from the previous phase.
vi Evolution and ongoing maintenance as a result of new requirements and/or the discovery of errors in the current version of the system.
The inclusion of Phase V, Validation does not imply that it is a single phase but rather that it is performed continuously during the development lifecycle.
An information systems design methodology ideally should support all phases of Wasserman's model.
Data modelling as a component of information systems design is usually associated with Phase III although there is often an overlap with parts of Phases I and II.
The phases of this lifecycle model will be used to classify data modelling methods and methodologies in the following chapter.
CHAPTER 8
DATA MODELLING METHODS: FEATURE ANALYSIS
A literature review in the area of data modelling and information systems design methodologies reveals a considerable number of alternatives available to the systems analyst and designer. These alternatives vary in their comprehensiveness and in the phases of the system lifecycle and database design task which are supported. In this chapter four data modelling methods are reviewed. These are:
1. Entity Relationship (ER)
2. Nijssen's Information Analysis (NIAM)
3. Fact Based Data Analysis and Design (KENT)
4. Active and Passive Component Modelling (ACM/PCM)
ER and NIAM were included in this review because they have a significant development history and user base vis-a-vis data modelling. ER modelling was one of the first proposals in the area of semantic modelling and has had a substantial influence on the developments in this area. It has a large installed user base. NIAM, reflecting its academic origins, claims a strong theoretical basis, emphasizes binary data modelling and is achieving recognition as a superior analysis and modelling method. Both techniques are featured in case study presentations. KENT was chosen because of the experience gained in its use as a teaching tool at the University of New South Wales and because it is representative of a pure binary modelling approach. A case study presentation has been included.
ACM/PCM as the most comprehensive method and with the strongest theoretical development was chosen because of the emphasis it places on semantic modelling, in particular, the behavioural aspects of an information system. As a relative newcomer no commercial implementation could be found, however the theory serves as a useful indicator of the direction in which data modelling practice may be heading.
The reviews which follow draw heavily on the material contained in the major reference paper for
each method. By necessity most of the concepts represent a summary of the respective authors' own
material. Insights, in the form of a comparative review, are contained in chapter 9.
8.1 Entity Relationship Modelling
The concept of ER modelling originated in the artificial intelligence, knowledge base research of the
seventies. The aim had been to develop a data model which could express data semantics with the
most influential work in this respect being that of [Chen 76]. ER modelling subsequently developed
as a high level modelling tool orientated towards the definition of structural data characteristics. It
utilises binary modelling concepts representing data structures through entity-relationship diagrams
and relations. An enterprise schema is produced by following a top down development strategy.
Logical design is the major function; however, the analysis phase is not explicitly supported and only
general guidelines are presented on the classification of entities, attributes and relationships. The role
of the conceptual schema is not emphasised.
8.1.1 Concepts
The entity-relationship model [Chen 76] adopts the view that the real world consists of identifiable
entities and relationships. Chen describes an entity as a 'thing' which can be distinctly identified giving
the examples of a person, company or event. A relationship is then defined as an association among
entities, for example a marriage is a relationship between two 'person' entities. The question inevitably
arises as to how a relationship can be distinguished from an entity or an attribute. Chen notes that
the distinction is in the view taken by the designer.
"We think that this is a decision which has to be made by the enterprise administrator. He should
define what are entities and what are relationships so that the distinction is suitable for his environment."
An entity set is the entity classification used in a particular environment. For instance, Employer or
Department are entity sets. These need not be mutually disjoint.
An attribute is defined as a function which maps from an entity set into a value set, or a product of value sets. Associations between entities, or relationships are defined by Chen as a mathematical relation.
There are four basic steps in designing a database using the entity-relationship model:
1. identify the entity sets and relationship sets of interest that are significant for the view of the
enterprise
2. identify semantic information in the relationship sets to determine the order of relationships, i.e. 1:1, 1:N
3. elicit attributes that establish values for the entity
4. organise data into entity/relationship relations and decide primary keys
ER adopts a three level framework corresponding to logical views of data. At level 1 is the information concerning entities and relationships. These are taken as given in the model and no reference is made to analysis support. The information about entities is distinguished from the information about relationships so as to prepare a conceptual information structure. At level 2, the information structure is presented by considering the representations of conceptual objects. Entity/Relationship relations are produced in diagrammatic and tabular form. Attributes of entities and attributes of relationships are mapped to value sets and from this mapping primary keys can be determined. An entity key is a group of attributes in which the mapping from an entity set to a value set or group of value sets is one to one but new attributes may need to be introduced to make this mapping possible. In some cases a relationship may be required to uniquely identify an entity set. Chen uses the example of employee dependents. Dependents may be identified by their names and by the employee (entity) primary key.
This is an example of a weak-entity relation. Entities not requiring a relationship for identification are regular-entity relations.
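Chen's dependent example can be made concrete with a small sketch (data invented for illustration): the dependent name alone is not a key, but the compound of the owning employee key and the dependent name is.

```python
# Hedged sketch of a weak-entity relation (data invented): DEPENDENT rows are
# identified by the owning EMPLOYEE key plus the dependent's name.

employees = [("E01", "Smith"), ("E02", "Jones")]                 # (emp_no, name)
dependents = [("E01", "Anne"), ("E01", "Tom"), ("E02", "Anne")]  # (emp_no, dep_name)

names = [dep for _, dep in dependents]
assert len(set(names)) < len(names)             # dep_name alone is not unique
assert len(set(dependents)) == len(dependents)  # (emp_no, dep_name) is unique

# Referential requirement of the weak entity: every dependent row must cite
# an existing employee key.
assert all(emp in {e for e, _ in employees} for emp, _ in dependents)
```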
Semantics are included in the entity-relationship model through the use of entity-relationship diagrams. These diagrams are produced through the addition of a relationship symbol to a network data model. They represent entities and relationships as symbols with the mappings between them
categorised as 1:1, 1:N or N:M. In this way the relationships and the roles they play are made explicit.
With the ability to represent entity and relationship relations in a tabular format it is possible to define the semantics of information retrieval requests and updating transactions.
Chen compares entity-relationship relations to the concept of relations as used in the relational data model. The latter concept of a relation is 'any grouping of domains'. To produce third normal form relations it is usually necessary to use a transformation/decomposition process. Arbitrarily grouped
relations, with the addition of semantic information concerning functional dependencies, can be transformed
into third normal form relations. It is claimed that entity-relationship relations do not require
this process. Using a top-down strategy, semantic information is applied to organise data directly into
third normal form.
Chen argues that based on the definition of an attribute used in the model, entity-relationship relations will be produced in a third normal form. It appears that a major problem with this could be the
assumption that the definition of an attribute as used by the method, will correspond with the natural
view of an attribute held by the analyst/designer. Through observation of the method in practice, it
seems that for non-trivial designs it is unlikely that entities, relationships and attributes will always be
chosen in such a manner as to conform to Chen's definitions. As the method provides no guidelines
for the selection and even the classification of data into entities and relationships it seems unreasonable
to expect that relations will be constructed directly in third normal form.
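The transformation/decomposition process that Chen claims to avoid can itself be sketched briefly (data and dependencies invented): a relation carrying the transitive dependency emp to dept to location is split into two third normal form projections, and the natural join recovers the original without spurious tuples.

```python
# Hedged sketch of decomposition to third normal form (data invented).
# EMP(emp, dept, location) carries the transitive dependency
# emp -> dept -> location, so it is split on dept.

emp = {("smith", "sales", "sydney"),
       ("jones", "sales", "sydney"),
       ("brown", "hr", "melbourne")}

emp_dept = {(e, d) for e, d, _ in emp}   # emp -> dept
dept_loc = {(d, l) for _, d, l in emp}   # dept -> location

# The decomposition is lossless: joining on dept reconstructs EMP exactly.
rejoined = {(e, d, l) for e, d in emp_dept for d2, l in dept_loc if d == d2}
assert rejoined == emp
```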
In the final section of his paper Chen shows how an entity-relationship model can be used as a basis
for the unification of different data views. A method of establishing a network model and entity set
view of data is presented.
8.2 Fact Based Data Analysis and Design
KENT is a binary modelling method for data analysis and design which utilises a simplified form of the
entity-relationship model for fact (relationship) specification. Analysis and logical record design are
the major functions; however, in some areas the method goes beyond what might be considered the bounds of logical design. The method is structured on attribute synthesis (bottom-up) design with the
output taking the form of normalised relations. There is limited semantic expressiveness (structural
only) and no support for a conceptual schema (as defined in this paper). Representation (of data) is
treated extensively.
8.2.1 Design Process
Data analysis and design under the KENT method is split into seven phases [Kent 84]. An outline of
these phases and the major tasks within them is as follows.
1. Specify the facts to be maintained, in terms of relationships among entities.
2. Generate a pseudo record for each fact.
3. Identify pseudo keys for each pseudo record.
4. Merge pseudo records having compatible keys.
5. Assign representations.
6. Consider alternative designs.
7. Name the fields and record types.
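Phases 2 to 4 of this outline can be sketched on a toy example (fact and entity names invented): each fact yields a pseudo record, and records sharing a compatible key are merged.

```python
# Hedged sketch of KENT phases 2-4 (facts invented). Each fact is recorded as
# (key entity, related thing); one pseudo record is generated per fact and
# records with the same key are then merged.

facts = [("employee", "name"),    # employee HAS name
         ("employee", "salary"),  # employee EARNS salary
         ("employee", "dept")]    # employee WORKS-IN dept

# Phases 2-3: one pseudo record per fact, keyed on the owning entity.
pseudo = [{"key": owner, "fields": {owner, value}} for owner, value in facts]

# Phase 4: merge pseudo records having compatible keys, collecting all
# single-valued facts about the same thing into one record.
merged = {}
for rec in pseudo:
    merged.setdefault(rec["key"], set()).update(rec["fields"])

# All three facts collapse into a single 'employee' record.
assert merged == {"employee": {"employee", "name", "salary", "dept"}}
```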
Fact specification is the starting point of the method; however, it is not supported by a procedure for
identification of entities or relationships. A 'fact' is defined in the method as being 'something which
connects things together'. This is in contrast to many data modelling methods which suggest that
there are two kinds of facts: those about things, that is attributes, and those that connect things, that
is relationships. With only the one 'fact' concept, the classification into one category or the other,
depending on the view adopted, is not required. This does not imply that the facts defined in phase
1 will not later be seen as the basis of attributes and relationships but that they do not need to be
classified as such by the analyst.
Whilst described as a binary relation modelling method not all facts will involve pairs of things. Some facts will involve three things and are labelled ternary. In general, any number of things may be involved; n-ary relation is the generic term. In order to qualify as an n-ary fact it must already be irreducibly decomposed. In other words, the same information must not be derivable from a combination or join of any set of binary (or n-ary) facts whilst maintaining database integrity. [Consider the ternary relation CONCERT consisting of the fields Performer, Location, Date. No binary subset(s) of this relation can be formed which will consistently provide the same information when joined].
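The bracketed CONCERT example can be demonstrated directly (data invented): projecting the ternary relation onto binary subsets and rejoining produces spurious tuples, so the ternary fact is irreducible.

```python
# Hedged demonstration that the ternary CONCERT fact is irreducible
# (data invented): its binary projections do not rejoin losslessly.

concert = {("Smith", "Town Hall", "1988-03-01"),
           ("Jones", "Town Hall", "1988-06-15")}

perf_loc = {(p, l) for p, l, _ in concert}   # (Performer, Location)
loc_date = {(l, d) for _, l, d in concert}   # (Location, Date)

# Natural join of the two projections on Location.
rejoined = {(p, l, d) for p, l in perf_loc for l2, d in loc_date if l == l2}

# Spurious tuples appear, e.g. Smith wrongly paired with the June date.
assert concert < rejoined
assert ("Smith", "Town Hall", "1988-06-15") in rejoined
```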
If we recognise that a fact may participate as a 'thing' in some other fact then it is possible to represent all facts in a binary construct. A hierarchic structure of facts can be used. The example taken from
[Kent 78 p149] is based on (P)arts, (W)arehouses and (S)uppliers in which a given part may be ordered
'allocations' a binary relationship can then be defined between allocations and suppliers S(PW). This identifies which supplier is responsible for which allocations (supplying parts to a warehouse). The
ternary relationship has thus been rendered into a high level binary form. The problem remains
that this is only one of three possible combinations which could have been chosen. Alternatives are
P(SW) and W(SP). For a relationship of degree 4 there are 15 possible permutations. How should n-ary
relations be decomposed? By implementation considerations? Clearly, an arbitrary choice or even
a decision based on the current implementation requirements should not be made when modelling
at the conceptual level. This suggests that the use of irreducible n-ary relations (as utilised in the
relational model) is a more natural and simpler means of representing relationships.
For the analyst, normalised records are a valuable design objective. Such records have minimal
redundancy and exposure to update anomalies despite possible trade offs in retrieval efficiency. As
opposed to a normalisation design process in which records are decomposed, the method follows a
'synthetic' approach in which records are constructed in normal form directly. He argues, intuitively,
that until the phase 'consideration of alternative designs' the records should be in fifth normal form.
This is because pseudo records (irreducible n-aries) are initially in fifth normal form and merging on keys (for binary records) preserves normal form.
This argument is appealing, however it holds only when the facts have been specified independent of each other and are fully decomposed. If some facts can be derived then redundancy will exist and if not fully decomposed then they will not be in fifth normal form. The method provides no formal strategy for ensuring these pre-conditions.
The determination of pseudo keys requires that the nature of the relationship be classified as one-to-one, one-to-many or many-to-many. Kent uses the concept of participation to express the relationship between binary pairs. Least participation (LP) may take the values of 0 or 1 and maximum participation
(MP) the values 1 or N. The combinations of LP and MP which are possible are:

0 * 1 - at most one
1 * 1 - exactly one
0 * N - some or many
1 * N - at least one
A candidate key can only be selected from a field with a maximum participation of 1. A minimum
participation of 0 for a key field will imply that nulls for non-key fields are accepted. When both
fields have a maximum participation of N, a compound key involving the whole of the pseudo record
is required. With n-ary relations some decision needs to be made about combinations of roles. As
previously discussed, it is possible to represent all relations in a binary form but depending on the
degree, a large number of permutations may result. For participation purposes a single permutation
must be used. There is no real basis on which the selection can be made other than the analyst
considering implementation requirements. This is undesirable at the conceptual modelling level but
unfortunately the problem is not addressed in the method.
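The participation rules above can be summarised as a small decision sketch (the encoding is invented for illustration): a field qualifies as a candidate key only when its maximum participation is 1, and a compound key over the whole pseudo record is forced when both fields have maximum participation N.

```python
# Hedged sketch of key selection from participations (encoding invented).
# Least participation (0 or 1) governs null acceptance, not key choice.

def candidate_keys(mp_a, mp_b):
    """Return candidate key(s) for a binary pseudo record with fields A, B,
    given each field's maximum participation (1 or "N")."""
    keys = []
    if mp_a == 1:        # each A value appears at most once: A is a key
        keys.append(("A",))
    if mp_b == 1:
        keys.append(("B",))
    if not keys:         # both N: compound key over the whole pseudo record
        keys.append(("A", "B"))
    return keys

assert candidate_keys(1, "N") == [("A",)]          # one-to-many
assert candidate_keys(1, 1) == [("A",), ("B",)]    # one-to-one
assert candidate_keys("N", "N") == [("A", "B")]    # many-to-many
```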
At this point there exists a record for each fact. The natural thing to do is to merge records wherever
possible. The objective of merging is to collect all single-valued facts about the same thing into one
record. A separate record is provided for each many-to-many relationship which includes all single-valued facts about the relationship. Merging can occur where entities and keys are compatible. A simple merge is conducted when entity types are the same and both contain full populations of the entity type (ie. LP = 1). Merging is possible for unequal populations (LP = 0) if the fields of the resulting pseudo record can be padded with nulls, but to make such a decision may well be beyond the limits of logical design.
Representation of entities can be deferred to late in the design process. By first designing for the facts to be maintained and assigning representation later, some complexity can be avoided. Representation is required because symbols are needed as surrogates to express real world facts in data.
Unfortunately entities rarely have a simple symbol which is unique. Symbol types are considered as character strings occurring in a data field. To distinguish one symbol type from another may require a description of its properties, for example length and numerical base, as well as an identifier. The method devotes considerable time to the discussion of representation techniques, including structured symbols (e.g. dates), derived representations, compressed representations (coding), and the quality of a representation.
Consideration of alternatives may involve changes to the fact specifications or to the assumptions which dictated the participations of pseudo records. Alternatives in merging records may be considered. The existence of this phase emphasises the incremental and recursive orientation of the method.
There are no systematic guidelines for this phase.
8.3 Nijssen's Information Analysis
NIAM [Verheijen 82] is a binary based, top-down method for data analysis and database design. It was
developed in the early seventies at a time when physical database design was the primary concern of
database design methodologies. As experience with database grew it became apparent that a means
of specifying the information content of a system was required which would be independent of its
implementation characteristics. The concept of a conceptual schema developed from this. NIAM was
designed as a method to define a conceptual schema. This is labelled information analysis by NIAM,
a term equivalent to the ISO information modelling concept. Whilst information analysis is the clear strength of NIAM, the method has been gradually expanded to include business and process analysis and to provide automated support for documentation and implementation. A graphical notation which represents both the structural and behavioural properties of an information system is used to enhance communication between analysts and users.
8.3.1 Concepts
The two major concepts of NIAM are the Conceptual Schema and Information Flow Diagrams. The
NIAM conceptual schema is based on five principles, four of which correspond to the ISO report on
Conceptual Schemas (1985).
1. The first of these says that "all traffic between a user and an information system consists of deep
structure natural language sentences."
The 'deep structure' is exhibited by the ability to transform the sentences into a variety of other
representations. The representations may be graphical, tabular or in the form of predicate calculus. Elements of a natural language sentence can be classified into lexical and non-lexical objects,
sub-types and idea and bridge types (a discussion of these terms follows).
2. The second principle holds that "there is one grammar, called conceptual schema, which completely and exclusively prescribes all the permitted transitions of the database."
The conceptual schema consists of the above mentioned sentence types and a set of constraints.
These can be fully expressed in a formal conceptual manipulation language. The language is set
orientated, allowing relational-style query support.
3. The third principle states that there is "an internal schema, which prescribes how all the permitted
states of the conceptual data base are to be transformed into a machine data base, sometimes
called physical data base."
4. The fourth principle says "there are external schemas which describe views of the data base as
can be seen by particular users or groups of users."
These views are not restricted to subsets of the conceptual schema (natural language sentences)
but might be COBOL records, CODASYL sets and records or relational tables.
5. The fifth principle, called Meta, means that the three schemas of NIAM can be considered as a
data base. This allows the data dictionary and the data base management system to be treated
as an integrated package.
An information flow is considered to be a stream of messages, which represents a communication between two partners. It therefore has an origin and a destination.
An information system may be conceived as a function which transforms information flows. Accordingly a function has the capability to transform an information flow such that the incoming flow is different from the outgoing flow.
The transformation of information flows at a system level is often complex. To manage this complexity decomposition is applied to produce a number of sub-functions. Decomposition is applied until functions result for which the transformation can be described in full and for which the information flows can be detailed. At this stage it is appropriate to express each level of decomposition in a graphical format.
The resulting diagrams are called Information Flow Diagrams. Information Flow Diagrams reveal the flows of information between functions without showing physical or control details. They consist of four primitives, each with a defined graphical symbol: a function, represented by a square; an information flow, represented by a line with an arrow; an information base, represented by an online file flowchart symbol; and the environment, represented by an oval.
From the information flow diagrams the analyst/user is in a position to define the structure of the information flows.
8.3.2 NIAM Development Lifecycle
The first phase of the development lifecycle supported by NIAM is business analysis. This involves analysis of the object system to establish a model. If it is shown that an information system would improve object system performance then the next phase is information analysis.
The first step of information analysis involves making an inventory of all functions that the information system is expected to support. These functions are then decomposed through the use of Information
Flow Diagrams (IFDs) to a level at which the individual flows and the transformations performed by the functions are clear. Each of the elementary information flows gives rise to an Information Structure Diagram (ISD). Constraints and functions are formally described and documentation support is provided by an information dictionary. The output of this phase is the conceptual grammar.
Implementation can be supported through the combination of an information dictionary and software generator. The conceptual schema can be transformed into a database schema and data manipulation programs can be generated from the conceptual manipulation language.
8.3.3 Information Base : NIAM Sentence Model
Information flows can be described in NIAM through the use of natural language (deep structure) sentences. Analysis of the structure allows identification of two classes of objects: lexical objects
(LOTS) and non-lexical objects (NOLOTS). A lexical object is a name, such as a surname, used to refer to a non-lexical object, such as a person. Hence non-lexical objects might be considered as entities, with lexical objects their representations.
Associations can also be identified. NIAM decomposes sentence types into binary associations. These may be bridge type or idea type associations. An instance of a bridge type association might be: 'the employee has employee# 2341'.
Hence a bridge type is an association between a non-lexical object (employee) and a lexical object
(employee number). This corresponds to the familiar concept of an entity and an attribute.
An instance of an idea type might be: 'the employee works for the department'.
Hence an idea type is an association between two non-lexical objects. This corresponds to the notion of a relationship.
The concepts of idea and bridge types and lexical and non-lexical objects allow a distinction to be made between things and their names. When a natural language sentence is decomposed into binary ideas and bridges, the information content of the sentence is conveyed by the ideas. Bridges enable the exchange of information through representation of non-lexical objects but they do not convey information themselves.
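The distinction can be made concrete in a few lines of code. This is a hypothetical modern illustration; NIAM itself defines no such data structures, and the class and instance names are invented:

```python
# Illustrative sketch of NIAM's object and association concepts.
# A NOLOT is a thing (entity); a LOT is a name (representation).

from dataclasses import dataclass

@dataclass(frozen=True)
class Nolot:          # non-lexical object type, e.g. 'employee'
    name: str

@dataclass(frozen=True)
class Lot:            # lexical object type, e.g. 'employee#'
    name: str

@dataclass(frozen=True)
class Idea:           # binary association between two NOLOTs
    left: Nolot
    right: Nolot
    label: str        # e.g. 'works for'

@dataclass(frozen=True)
class Bridge:         # binary association between a NOLOT and a LOT
    entity: Nolot
    name_type: Lot
    label: str        # e.g. 'has'

employee = Nolot("employee")
department = Nolot("department")
emp_no = Lot("employee#")

works_for = Idea(employee, department, "works for")   # a relationship
has_number = Bridge(employee, emp_no, "has")          # entity + attribute
```

As the text notes, only the Idea instances carry information content; the Bridge instances merely supply names for the things the ideas are about.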
8.3.4 Semantics
To complete the conceptual model it is necessary to specify the rules which describe the behaviour of the object system. That is, the semantics of the information must be expressed. NIAM uses the concepts of constraints and subtypes for this.
A constraint is part of the conceptual grammar, the purpose of which is to prevent discrepancies developing between the content of the information base and the phenomena of the object system.
Many of the constraints can be expressed graphically as part of Information Structure Diagrams. Where constraints cannot be expressed in this manner they can be expressed procedurally in the conceptual grammar. The major types of constraint include:
• identifier - these are used to define the populations of binary idea or bridge types. Populations
may be 1:1, 1:N, N:1 or N:M.
• subset - this is used to express a relationship between an idea or bridge type and another idea
or bridge type based on similar object types such that the population of one is a subset of the
population of the other.
• equality - this expresses that the population of an idea or bridge type is equal to that of another
idea or bridge type for the same objects. A simple example of this is the equality between the
idea types, 'start date' and 'end date', for the non-lexical objects, 'session' and 'date'.
• uniqueness - a combination of role occurrences from different idea or bridge types uniquely identifies a non-lexical object. For example, an 'enrolment' occurrence may be identified by a 'student' occurrence and a 'course' occurrence.
• disjoint - this asserts that the populations of two subtypes exclude each other, in the manner that the subtype 'pass students' excludes the subtype 'failed students' for the type 'students'.
• total role - this states that every object of an object type acts in a certain role. For example the
object system may indicate that an 'author' always has a 'book'. A total role constraint would
imply then that the information base will not record information on 'authors' who do not have
an associated 'book'. 'Author' has a minimum participation in the relationship of 1.
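Two of these constraints reduce to simple checks over object populations. The following is an invented illustration (NIAM expresses constraints graphically or in its conceptual grammar, not in code):

```python
# Hypothetical population checks illustrating the 'disjoint' and
# 'total role' constraints described above.

def disjoint(pop_a, pop_b):
    """Disjoint subtypes: no object may belong to both populations."""
    return not (set(pop_a) & set(pop_b))

def total_role(objects, role_players):
    """Total role: every object of the type plays the role
    (minimum participation of 1)."""
    return set(objects) <= set(role_players)

# Invented populations matching the examples in the text.
passed = {"ann", "bob"}
failed = {"carol"}
authors = {"kent", "chen"}
authors_with_books = {"kent", "chen"}
```

Here `total_role(authors, authors_with_books)` holds, mirroring the rule that the information base records no 'author' without an associated 'book'.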
8.4 Active and Passive Component Modelling (ACM/PCM)
ACM/PCM [Brodie 82] was developed in a university environment as a semantic data modelling methodology (that is, it utilises an extended data model as discussed in chapter 4 section 2) for the design and development of moderate to large size database-intensive applications. It makes extensive use of abstraction principles as a means of managing complexity and ensuring a high degree of semantic integrity. The data model utilised is the extended semantic hierarchy model (SHM+). This embodies the main concepts. Structural aspects of SHM+ were developed first, after which the database lifecycle and the role of data design was defined. This led to the development of the ACM/PCM framework. SHM+ was then extended to include behavioural concepts. Tools and techniques for support were added.
ACM/PCM places equal emphasis on the structural and behavioural aspects of data base systems.
Discrete strategies are provided for dealing with these aspects. Development proceeds in a parallel fashion resulting in a conceptual model. In ACM/PCM the conceptual model is a network of data abstractions related by the three forms of abstraction supported by the methodology.
8.4.1 Abstraction Modelling
ACM/PCM distinguishes three levels of abstraction which are similar to the ANSI/SPARC architecture. The levels are the transaction level, the conceptual level and the database level. Modelling is conducted at all levels for behavioural and structural properties and proceeds as a two-step process. The first step identifies and relates the gross structural and behavioural properties of objects.
Diagrammatic tools called action and object schemes are used.
In the second step the detailed design specifies the properties of the objects. A specification language called BETA is used for this purpose.
The transaction level is designed to meet the end user application requirements in the manner of an external schema. Structural and behavioural properties of transactions, queries and reports are specified.
The name 'Active and Passive Component Modelling' stems from the treatment of objects as data abstractions. Due to objects being highly interrelated, an operation invoked on one object may result in operations being invoked on many others. Objects may then be classified as taking a passive or an active role. An active role implies that an object can invoke operations over other objects in order to complete a transaction. For example, 'sales order' might invoke 'reduce inventory', 'customer credit' and 'inventory order'. In a passive role operations are invoked on an object as with 'reduce inventory'.
8.4.2 Structural Modelling
Structural properties are expressed in SHM+ through the use of objects and four forms of abstraction which relate objects. The forms of abstraction are:
Classification, Aggregation, Generalisation and Association.
Classification considers a collection of objects as a higher level object class. An object class is defined as a precise characterisation of all properties shared by each object in the collection. Classification is an instance-of relationship between an object class in a schema and an object in a database. The example given [Brodie 82 p44] is of object class 'employee' with properties 'employee-name', 'employee-number' and 'salary'. An instance of the object may have the values 'Paul Groves', '8020665' and
'$34,000'. In structural modelling classification allows objects to be grouped into classes which are described by common properties.
Aggregation, generalisation and association are used to express relationships between objects. Aggregation considers the part-of relationship in which a relationship between component objects is considered a higher level aggregate object. The example given [Brodie 82 p44] concerns an 'employee'
who may be considered an aggregation of the components 'employee-number', 'employee-name' and 'salary'.
Generalisation is a form of has-subtype relationship in which a relationship between two objects is considered as a higher level generic object. 'Employee' may again be considered the generic object for the objects of 'manager' and 'secretary.'
Association is a member-of relationship. Related member objects are considered as higher level set objects. The given example is of the set 'management' being an association of a set of employee members.
Composition/decomposition and generalisation/specialisation are the major tools for structural modelling. These concepts are supported by 'property inheritance'. The abstraction principles of aggregation and association support upward inheritance, in which properties of the aggregate or set are derived from the properties of components or members. Downward inheritance is supported by generalisation, in which all properties of an object are inherited by each of its category objects.
For example, all properties of 'employee' are inherited by 'secretary' or 'manager'. The category of secretary only requires those properties which distinguish it from the generic object. This might be
'job-title'.
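Downward inheritance has a direct analogue in subclassing in a modern object-oriented language. The following sketch is an illustrative analogy only (SHM+ predates such languages and is not object-oriented code), using the 'employee'/'secretary' example above:

```python
# Illustrative analogy: downward inheritance under generalisation,
# expressed as subclassing. 'Secretary' inherits all properties of
# the generic object 'Employee' and adds only what distinguishes it.

class Employee:
    def __init__(self, name, number, salary):
        self.name = name
        self.number = number
        self.salary = salary

class Secretary(Employee):
    def __init__(self, name, number, salary, job_title):
        super().__init__(name, number, salary)
        self.job_title = job_title   # the distinguishing property

s = Secretary("Paul Groves", "8020665", 34000, "secretary")
```

Every property of the generic object is available on the category object; only 'job-title' is added.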
At the conceptual level structural modelling will involve each of the abstraction forms being applied.
This results in the identification and relationship of all objects of interest. Hierarchies result from their repeated application. At the transaction level objects and their relationships are defined for the scope of the transaction. This might involve the introduction of new objects not yet incorporated in the conceptual model. Object schemes are then used to graphically represent the objects and structural relationships in a manner similar to entity-relationship diagrams.
8.4.3 Behavioural Modelling
At the conceptual level behavioural modelling involves the identification, design and specification of actions for each object. At the transaction level it involves the identification, design and specification of transactions. Gross modelling precedes detailed specification. There is, however, no requirement that gross conceptual modelling precede gross transaction modelling. Due to their interdependence, an iterative process is usually followed.
A transaction is defined as:
'An application-orientated operation which alters one or more objects.' A transaction is designed to meet specific user requirements. It comprises a number of actions. An action is defined as:
'An application-orientated operation designed for one object to ensure that all the properties of the object are satisfied.'
Actions are the only means by which an object may be altered. Before the object is altered each action will specify pre-conditions and post-conditions. Actions on other objects may be required. Semantic integrity is ensured since all constraints will be satisfied by any attempt to alter the object. The behaviour of an object will consequently be completely defined by its actions. SHM+ utilises the primitives INSERT,
UPDATE and DELETE.
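The role of pre- and post-conditions around a primitive can be sketched as follows. This is a hypothetical illustration using the 'reduce inventory' example from earlier; ACM/PCM itself specifies actions in BETA, not in executable code:

```python
# Hypothetical sketch of an ACM/PCM-style action: the only way to
# alter the object, with pre- and post-conditions checked around
# the UPDATE primitive.

class Inventory:
    def __init__(self, on_hand):
        self.on_hand = on_hand

    def reduce(self, qty):
        """Action 'reduce inventory', invoked on this (passive) object."""
        # pre-condition: enough stock to satisfy the request
        assert qty > 0 and self.on_hand >= qty, "pre-condition violated"
        before = self.on_hand
        self.on_hand -= qty                     # the UPDATE primitive
        # post-condition: stock reduced by exactly qty, never negative
        assert self.on_hand == before - qty and self.on_hand >= 0

inv = Inventory(10)
inv.reduce(3)
```

Because the conditions are checked inside the action, no alteration can leave the object in a state violating its constraints, which is the sense in which semantic integrity is ensured.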
High level composite operations based on these primitives are constructed through the use of the control abstractions sequence, choice and repetition. These have structural equivalents of aggregation, generalisation and association.
Behaviour schemes are used to graphically represent the properties of a single action or transaction.
They integrate structural and behavioural properties in one representation. As with structural design, gross behavioural properties are modelled first, followed by detailed specification. Details are specified in the BETA language, which is based on axiomatic and predicate transformer techniques.
8.4.4 ACM/PCM Design Modelling
The following table is a summary of the major steps for the logical design and specification phases of
ACM/PCM. It is reproduced from [Brodie 82] page 50.
Figure 1: ACM/PCM Design Phases

1. Conceptual modelling
   1.1 Conceptual modelling of structure
       1.1.1 Structural design - an object scheme for each object and an integrated object scheme for the entire application.
       1.1.2 Structure specification - a structure specification for each object.
   1.2 Conceptual modelling of behaviour
       1.2.1 Behaviour design - one insert, one delete and at least one update action scheme for each object.
       1.2.2 Behaviour specification - a behaviour specification for each action scheme.
   1.3 Encapsulation - one data abstraction for each object consisting of its structure and behaviour specifications.
2. Transaction modelling
   2.1 Transaction design - a transaction scheme for each identified transaction.
   2.2 Transaction specification - a transaction specification for each identified transaction.
CHAPTER 9
DATA MODELLING METHODS: COMPARATIVE REVIEW
The objective of this chapter is to conduct a comparative review of the four data modelling methods introduced in the previous chapter. Originally there had been an intention to compare their effectiveness and efficiency when applied to the development process. It should be clear from the individual analyses, however, that the methods differ markedly in their objectives. This makes a comparison of the means of achieving those objectives of little value. Accordingly, this section will concentrate on improving the classification process by highlighting the strengths and weaknesses of the respective methods. For this purpose the following taxonomy will be used:
• Lifecycle Support
• Representation and Communicability
• Abstraction Support
• Documentation
• User Orientation
• Semantic Expressiveness
• Quality Control
The taxonomy was derived from a review of the CRIS 2 conference proceedings (Comparative Review
of Information Systems) [Brandt 83], [Wasserman 83] and [Rzevski 83] and from consideration of the
major characteristics of data and data models as discussed in chapters three and four of this paper.
Each element of the taxonomy is briefly described and then followed by an analysis of the four data modelling methodologies (methods).
Data Modelling Methods: Comparative Review 9-1
9.1 Lifecycle Support
This looks at the specific phases of the systems lifecycle, and the tasks within the database design phase, which are supported by the method. The focus of this report is on data modelling; however, the extent to which the method supports adjacent analysis and implementation phases is of considerable importance due to the iterative nature of many systems development efforts and the consequent feedback and review process. The reference framework outlined in chapter 7 will be used for classification purposes.
KENT
The analysis function, phase 1 of the system lifecycle, and design of the logical data structures, phase 3, are strongly supported. It is argued [Kent 84] that the method extends beyond what might be regarded as logical design. The terms conceptual schema and internal schema design are not used by Kent and his comments are not based on the ANSI/SPARC architecture. Using this architecture however, it appears that the method is primarily directed at internal schema design due to the extensive treatment of representation. Some aspects of the conceptual schema design are supported.
"... our aim is to produce the actual record designs as they will be implemented in a database ....
What we do not deal with are the other aspects of resource and access path management." [Kent 84 p99]
Analysis of the application is supported through the fact specification phase of the method. Identification of the entities and the relationships between them, in fact form, lays the foundation for a clear understanding of the application.
ER
As outlined in [Chen 76], ER modelling is mainly concerned with phase 3 of the systems lifecycle, the design of logical data structures. It takes as given the classification of an object system into entities and relationships, hence it does not provide formal support for the analysis phase. The end result of ER
modelling is entity/relationship relations. These represent a conceptual model of the object system.
In [Date 86] it is argued that the ER model is little more than a collection of data structures and that the purpose of ER modelling is determination of structure only. The integrity and manipulative aspects
(behavioural) are not considered.
Implementation support is not provided in a formal manner; however, [Chen 76] demonstrates the operation of view derivation for the relational, network and entity-set models.
NIAM
Support is provided for the first three phases of the information systems lifecycle. That is, systems analysis for requirements specification, functional specification, and logical design. NIAM uses the term business analysis to describe the systems analysis function and information analysis to describe the functional specification phase. Abstraction modelling is used to represent information flows and functional specifications. Information analysis is the most developed phase and includes the data system design phase.
Conceptual modelling is the major function of NIAM. It results in a conceptual grammar which describes all structural and behavioural aspects of the object system. Tools are available for implementation design (i.e. the mapping of the conceptual schema to a target internal schema) [Verheijen 82].
ACM/PCM
ACM/PCM is being developed to support a six stage database lifecycle, commencing with requirements analysis and specification and proceeding through design, implementation and evolution. It is a composite of methods that apply to different phases of the lifecycle model. Not all phases are supported equally. The major reference [Brodie 82] deals only with the logical design and specification phases. Whilst it seems that these are the most developed phases, there is support evident for the implementation design and validation phase. Requirements analysis and specification appears to be undeveloped [Brodie 83].
9.1.1 Representation and Communicability
Representation refers to the way in which the method models the object system and in particular, how it presents results, for instance whether graphical or list based data models are supported. This will provide a measure of communication support between analyst/designer and user, and between analysts. Similarly, it may suggest which methods, or parts thereof, are suitable for automation, e.g. data dictionary or software generation.
KENT
This method combines list and diagrammatic representations of data but is biased towards the former.
The object system is modelled through n-ary relationships which can be presented in several formats.
Output of the method is typically in the form of relational record structures.
The first step in the method, fact specification, presents n-ary relations as a list. The facts are then presented in diagram format as pseudo records. Pseudo records consist of boxes, which represent data items or fields (generally true), and relationship links, indicated by dotted lines. This diagram notation can be utilised through all phases of the design. That is, definition of relationship links, key specification, and merging of pseudo records to create implementation records. As a diagrammatic representation the constructs are very limited and support structural relationships only.
The simplicity of the underlying concept, binary modelling (as a special case of n-ary modelling) and the simple presentation of the relations provides for good communication between analysts and users.
Similarly, communication between analysts is supported because there is little risk of ambiguity.
Automation of the design process would not be difficult. A data dictionary could be used to manage the n-ary relations and pseudo records could be represented with simple graphics. Considerable potential exists for automation of the merging process. Merging follows well defined principles and assuming that the method could be extended to model semantics of the relations it would be a valuable exercise to formalise this process.
ER
This method uses diagrams extensively (called entity-relationship diagrams) to represent the logical
structure of the object system [Date 86 p612] and as a tool for database design [Chen 76 p10]. Tables
(entity and relationship relations) are used to represent the output of the modelling process. The
relations are basically equivalent to those of the relational model but with more extensive semantic
detail. The fundamental object type is the n-ary relation.
[McFadden 85 p198] describes the ER model as augmenting the network model through the use of
a special symbol, the diamond, to explicitly model relationships. [Date 86 p612] considers the ER
approach as 'a thin layer on top of the relational model'. Both of these statements are supported. The
diagrams are identifiable as a basic network whilst the tables are relational. [Chen 76] emphasizes the
ability of the ER approach to unify views of data such that implementation data models (e.g. relational)
can be easily derived.
An entity-relationship diagram uses three simple symbols to depict an object system. A labelled box
represents an entity set. A labelled diamond between entity sets represents a relationship set, and a
labelled elipse represents a value set (attribute). Connecting arcs are used to specify the relationship
roles (i.e. 1:1, 1:N etc.) [Davis 85 p521). These diagram constructs allow specification of structure
and some degree .of semantic detail including an existence dependency and identifier dependency
[McFadden 85 p200].
With respect to communication support [Date 86 p612] comments "the popularity of entity-relationship
modelling as an approach to database design can probably be attributed more to the existence of
that diagramming technique than to any other cause." Analyst communications and analyst-user communications are well supported by the simplicity of this tool.
Automation could be provided in the form of graphical support for entity-relationship diagramming.
Data dictionary support would be useful for maintenance of relations and a formal language for the
definition of a conceptual schema could be readily incorporated.
NIAM
This utilises an extensive graphical notation for structural and behavioural aspects of an information
system. Two types of diagrams are used, Information Flow Diagrams and Information Structure
Diagrams. Output of the method is by way of a conceptual grammar.
The fundamental object type in NIAM is the binary relation which is represented by idea and bridge
types. The model is developed in graphical format based on the analysis of 'deep structure' natural
language sentences [Verheijen 82]. Information Flow Diagrams are very similar to the more widely
known data flow diagrams. The information flows included at this level are decomposed to produce
Information Structure Diagrams. These depict the structure of the data model by representing entities
(NOLOTS) by an unbroken circle, an attribute (LOT) as a broken circle and relationships (IDEA and
BRIDGE types) as labelled boxes between entities and attributes. The diagrams are used to model
semantics through the use of constraint and subtype notations.
As a communication tool the diagrams are excellent given that the analyst or user is familiar with the
conventions. The constructs are relatively simple but much more extensive than for ER or KENT. This
probably necessitates training. Once the concepts are understood the diagrams can be appreciated
for the compact yet comprehensive model they provide of the object system.
A data dictionary has been developed for use with the system and software generators based on the
conceptual grammar are available. Graphical support would be extremely valuable and not difficult
to integrate.
ACM/PCM
This method utilises a graphical notation and a specification language to model the object system.
Structural and behavioural properties are included. The graphical notation makes use of object
schemes and behavioural schemes which are used for gross design modelling. The specification
language BETA is used for detailed design [Brodie 82]. Output is a formal conceptual schema. The
underlying data model is hierarchic, which gives rise to the model's main data modelling concept - abstraction.
An object scheme graphically represents the objects and structural relationships of a database application [Brodie 82 p45]. An object scheme is described as a directed graph in which nodes are strings denoting objects and edges identify relationships between objects. A graphic notation for each form of abstraction (aggregation, generalisation and association) is provided. A behaviour scheme is an explicit graphical representation of the gross properties of a single action or transaction [Brodie 82 p47].
A behaviour scheme combines behavioral information with the structural information represented by object schemes.
ACM/PCM has 'concentrated on simplicity' [Brodie 82 p43]; however, it is apparent that to model the object system, expressing structural relationships and extensive behavioural relationships, it has been necessary to include a relatively large number of modelling concepts. Consequently comprehensiveness has been achieved at the cost of relative simplicity. Compared to the previous three methods, training would be necessary before it provided a similar level of communication support between analysts. Users would require extensive training to participate in the design process.
Automation could be provided for the detailed specification language BETA. A data dictionary is a virtual necessity because of the voluminous schema description and could be easily incorporated.
9.1.2 Abstraction Support
Abstraction is the operation of generalisation and in data modelling is represented by the ability to hierarchically decompose a system. Abstraction support allows different views of a system to be presented and is generally associated with a top-down development approach. Performance on this measure is an indicator of the potential to support the ANSI/SPARC database architecture.
KENT
The first phase of the Kent method requires specification of the facts to be maintained. The database design is then synthesised from the elementary facts. As a bottom-up design technique there is
consequently no direct support for hierarchic decomposition. Abstraction is not discussed in the documentation [Kent 84].
Facts are represented by KENT as n-ary relations. In his book 'Data and Reality', Kent demonstrates the ability to represent all facts through binary relations. An n-ary relation which is implemented through binary relations provides support for abstraction. In addition there is no requirement that the fact specification stage represent elementary facts. Accordingly, it would be possible for a designer to adopt a top-down strategy (at least in the initial phases) through the use of 'high level' or abstracted binary facts.
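The reduction of an n-ary fact to binary relations can be illustrated in modern code. The following Python sketch is purely illustrative (the relation and field names are invented, and the technique long predates the language): a ternary fact is reified as an object, linked to each of its participants by a separate binary relation.

```python
# A ternary fact "supplier S supplies part P in quantity Q"
# held as a single n-ary relation:
supplies = [("S1", "P1", 300), ("S1", "P2", 200)]

# The same information as three binary relations, each pairing a
# surrogate identifier for the reified fact with one participant:
shipment_supplier = {}
shipment_part = {}
shipment_qty = {}

for i, (s, p, q) in enumerate(supplies):
    fact_id = f"SHIP{i}"          # surrogate object for the reified fact
    shipment_supplier[fact_id] = s
    shipment_part[fact_id] = p
    shipment_qty[fact_id] = q

# The original n-ary tuples are recoverable by joining on fact_id,
# so no information is lost by the binary restatement:
rebuilt = [(shipment_supplier[f], shipment_part[f], shipment_qty[f])
           for f in shipment_supplier]
assert sorted(rebuilt) == sorted(supplies)
```

The surrogate identifier plays the role of the abstracted 'high level' object: the designer may work with the shipment as a single concept and defer its decomposition into elementary binary facts.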
It is argued [Kent 84] that the documentation of the entities and relationships from phase 1 of the method would comprise the nucleus of a conceptual model for the information system being modelled but it is not claimed that there is support for the derivation of this model. The ANSI/SPARC three level framework is acknowledged, however the method is biased towards the development of an internal model.
Modifications could be made to Kent to explicitly support abstraction and the ANSI/SPARC architecture.
ER
This follows a top-down approach, also called entity analysis, in which an entity relationship model is derived through an analysis of business processes and functions. Semantic information is progressively added until the conceptual design is complete. The design is represented by entity-relationship relations.
ER modelling is described [Date 86] as a 'thin layer on top of the basic relational model.' It aims to produce a conceptual schema with support for decomposition. Structural modelling abstraction facilities are provided. In [Chen 76] it is argued that the entity-relationship model can be used to
derive views of data for the relational, network or entity set models. Consequently it supports the basics of abstraction principles and could support the ANSI/SPARC database architecture.
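Chen's derivation of a relational view from an entity-relationship model can be sketched mechanically: each entity set and each relationship set becomes a relation, with a relationship relation built from the keys of the entities it connects plus its own attributes. The following Python fragment is an illustration only; the entity, relationship and attribute names are invented and do not come from [Chen 76].

```python
# Entity sets with their attributes (first attribute taken as the key here):
entities = {
    "EMPLOYEE": ["emp_no", "name"],
    "PROJECT":  ["proj_no", "title"],
}
keys = {"EMPLOYEE": "emp_no", "PROJECT": "proj_no"}

# A relationship set connecting two entity sets, with its own attribute:
relationships = {
    "WORKS_ON": {"connects": ["EMPLOYEE", "PROJECT"], "attrs": ["hours"]},
}

def derive_schema(entities, relationships, keys):
    """Derive a relational schema: one relation per entity set, and one
    per relationship set formed from the connected entities' keys."""
    schema = dict(entities)
    for name, rel in relationships.items():
        schema[name] = [keys[e] for e in rel["connects"]] + rel["attrs"]
    return schema

schema = derive_schema(entities, relationships, keys)
assert schema["WORKS_ON"] == ["emp_no", "proj_no", "hours"]
```

The same entity-relationship description could equally be mapped to network sets or entity set views, which is the sense in which the model sits 'on top of' the target data models.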
NIAM
Directly supports structural and behavioural abstraction through hierarchical decomposition of data flows and functions. NIAM treats an information system as a complex function which transforms information flows [Verheijen 82 p542]. A catalogue of system functions is produced which is then represented by information flow diagrams. These are decomposed until the functions and information flows can be described in detail. Strong support exists for the ANSI/SPARC database architecture.
ACM/PCM
Is based on the principle of abstraction supporting data, procedure and control elements. Abstraction is used as a key element in the management of complexity and in the specification and enforcement of semantic integrity. Of the four methods ACM/PCM provides the most extensive support for the principle, with both behavioural and structural abstraction techniques available. Structural abstraction techniques explicitly provided are:
• classification
• aggregation
• association
• generalisation
Behavioural abstraction techniques are considered in two groups, control abstractions and procedural abstractions. Under control abstractions there exist:
• sequence
• choice
• repetition
Under procedural abstractions:
• actions for conceptual modelling
• transactions for transaction modelling
Decomposition is explicit through the definition of gross and detailed phases of structural and behavioural properties. The techniques listed above are applied to approach decomposition in a step-wise manner [Brodie 82 p43]. The ANSI/SPARC database architecture is strongly supported.
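The four structural abstractions have close analogues in modern programming constructs, which may help fix the ideas. The sketch below is illustrative only; the class and attribute names are invented and the mapping to Python is an assumption, not part of ACM/PCM itself.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Person:                 # classification: the type Person classifies its instances
    name: str

@dataclass
class Employee(Person):       # generalisation: Person generalises Employee (subtype)
    salary: float

@dataclass
class Address:                # a component object used below
    city: str

@dataclass
class Company:
    head_office: Address      # aggregation: a Company aggregates an Address component
    staff: List[Employee]     # association: a set of Employees treated as one object

acme = Company(head_office=Address(city="Sydney"),
               staff=[Employee(name="Lee", salary=30000.0)])
assert isinstance(acme.staff[0], Person)   # the generalisation hierarchy holds
```

Behavioural abstraction (sequence, choice, repetition, and actions composed into transactions) corresponds in the same informal way to structured control flow and procedures.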
9.1.3 Documentation Support
This will cover two aspects: documentation of the method itself and documentation of the object system design. The first measure influences the ease of learning for analysts and users and, for a mature method, largely reflects the logical completeness of the method. The second measure looks at the traceability of the analysis and design tasks and the ability to make the decision processes visible to other users and analysts.
KENT
As a relatively recent data modelling method, the documentation has not been developed to the point of a detailed procedure. The paper [Kent 84] focuses attention on the concepts on which the method is based and on the major steps. A simple example is used to illustrate the method. In some areas the documentation (or perhaps the method) is clearly incomplete. In particular the treatment of n-ary facts requires more attention in its logical development and documentation. Currently, design experience and some intuition are required at these points in order to resolve problems. In [Kent 78] the issue of binary and n-ary relations is discussed at some depth but a clear approach for the purposes of the method is not given and the reader/designer is left to his/her own conclusions.
As regards ease of learning, the method is rated favourably despite the shortcomings of the documentation. This is largely due to the simplicity of the modelling constructs and procedures.
With respect to the documentation of the object system design, the method rates well. Complete fact listings are generated from which the database is designed. These represent the entities of the business
and the relationships between them in a form which expresses the semantics of the information system.
It should be noted that at the fact specification level it is not necessary to have pre-classified entities and relationships. In the following phases the synthesis of facts into records is highly visible. Design decisions are well documented through the process of participation and key selection. This makes for a highly traceable design process.
ER
As a mature data modelling method there is an extensive body of literature describing the application of this approach to the design task; however, based on the original paper [Chen 76] it would be hard to describe the method as highly documented. As presented, the concepts are not difficult and the design follows a structured approach. It does, nevertheless, require substantial analyst experience to utilise the method properly. For example, no guide is given as to the categorisation of entities or relationships, and yet this is a fundamental step in the method. If this is not done correctly then the design may well be inadequate for the information system requirements.
Again, due to the simplicity of the concepts, the method is not difficult for analysts to learn. For users the concepts may not be so easy to grasp because they utilise terminology with which many users would not be familiar.
As concerns the object system design, the method does not enforce documentation of all important decision processes. This reduces the ability to trace the design evolution. The analyst is called on to make design decisions, for example, entity classification, without the rationale of the decision being documented.
NIAM
Method documentation under NIAM is relatively extensive. The concepts, tools and the development phases are discussed in detail with the aid of examples [Verheijen 82]. As a mature data modelling method it appears to be logically complete, although it continues to be enhanced. As regards learning,
it requires a greater investment of time than ER or KENT because of the comprehensive coverage of the
information systems design process. It does not however rely as extensively on analyst experience as
ER modelling. Being a binary based technique the concepts are easily understood but some difficulties
may be experienced with semantic modelling constructs. Users are able to relate to the design process with minimal training.
Design documentation is of a high standard. Decision processes are traceable because of the natural
language expression of the information system facts. Transformations from fact specification through
to record design are highly structured. The method recommends use of an information dictionary.
The documentation of analysis phases is stressed.
ACM/PCM
Method documentation of ACM/PCM is complex. In particular, the description of abstraction concepts
is difficult to follow from the major reference paper [Brodie 82]. The structure of the design process
is outlined in a step by step tabular format but at a relatively high level. ACM/PCM is clearly an
extensive information systems design method and the data modelling phase is presented in the above
reference only as a subset of the full method. It utilises a large number of constructs to model the
semantics of the application which contributes greatly to the complexity. For an analyst to understand
the method and become proficient would require a considerable amount of time. On account of its
complexity it is not a method which facilitates user involvement. Documentation is being extended as
the development proceeds.
Being highly structured the method should produce suitable documentation of the design process. It
is not clear however, whether the decision processes could be easily traced.
9.1.4 User Orientation
This measure is utilised to establish the ease with which analysts can understand and become productive
in the method and the extent to which the method can be used as a communications tool between
analysts and applications users. The focus will be on the experience requirements of analysts and
application users and on the learning curve associated with the method. Results on this measure will
reflect the expected lifetime of the method and its ability to attract users.
KENT
Does not require extensive analyst experience. The primary modelling construct is the binary relation which does not present conceptual difficulties. Representation is via lists and simple diagrams which are easily mastered. These facilitate effective communications between analysts and between users.
The method caters for analysis and logical design phases only. Structural properties of data are modelled but behavioural properties are not. This makes the method relatively simple when compared to the more comprehensive approaches of NIAM and ACM/PCM. KENT follows a step-wise development strategy which, combined with the simplicity of the constructs, results in a shallow learning curve. For the modelling of structural characteristics of data it is simple and effective and for this task it would be expected to attract users. Ongoing development should ensure a growing user base.
ER
Compared with KENT, this method requires more extensive analyst experience to produce an 'effective' design. Again, the modelling constructs are binary based. Presentation is through diagrams and tables which are easy to understand and are very effective for communications. The method caters for some of the analysis phase but concentrates on logical design. It provides greater semantic expressiveness than KENT but this does not overly complicate the design. The learning curve is shallow. ER already has a substantial user base and through continued development will probably retain a great deal of support. Whilst the method is straightforward it is not as highly structured [Chen 76] as the other three methods. This requires more input from the analyst. As a consequence it is easier to produce bad designs. Experience is needed to support the intuitive design phases of the method.
NIAM
Is a comprehensive data modelling method which supports requirements analysis, functional specification and logical design. It exhibits a high degree of semantic expressiveness, capturing both structural and behavioural characteristics of an information system. It is a highly structured design method that is binary based and emphasises diagrams as tools for communication and decomposition. Analyst experience requirements are greater than for KENT but probably equivalent to that of ER. The learning curve should not be significant but, because of its comprehensiveness, would most likely be greater than for ER or KENT. Users are able to understand the concepts with minimal training and to participate in design from an early stage. The method is attracting a growing user base.
ACM/PCM
Like NIAM this is a highly structured and comprehensive data modelling technique which emphasises the role of data semantics. It is based on the concept of abstraction and utilises the extended semantic hierarchy model. Structural and behavioural characteristics of an information system are modelled.
Diagrams (schemes) facilitate communication between analysts. This method requires the greatest level of analyst experience, which reflects the concepts and language utilised, and its comprehensiveness. It does not explicitly support a user role in design. The output is a formal description of a conceptual model which is not user friendly. Training would be required and it is expected that to become proficient the learning curve would be considerable. As ACM/PCM is directed at the design of complex database intensive applications it is expected that it will not attract a large user base for environments not exhibiting these characteristics.
9.1.5 Semantic Expressiveness
This is evaluated by examining the structural and behavioural constructs of the data model. Results on this measure reflect the support given to the definition of the conceptual schema (as defined by
ISO).
KENT
This method provides the least support for data semantics. Structural constructs are provided but there is no formal mechanism for the inclusion of behavioural constructs. With respect to the former, the major structural primitive is the fact, which specifies objects and their properties (but does not distinguish between them at the specification stage). The pseudo record, a fact with participations, is a structural relationship which specifies the functional relationships between and within objects. Aggregation (the reader should refer to the review of ACM/PCM for a description of the structural and behavioural constructs used in this section) is provided through merging of pseudo records. Generalisation is supported informally as there is provision for the treatment of subtypes. There is no support for sets (i.e. association).
Structural modelling in KENT is directed at the logical record level, not the conceptual level (although conceivably it could be used to produce a conceptual model).
ER
This method was the first widely recognised attempt to model data semantics. It is more expressive than KENT. Structural concepts are supported but there is only cursory support for behavioural concepts. The major structural primitive is the attribute, which is used to characterise properties of entities and relationships [Brodie 83 p592]. Classification, entity aggregation and attribute aggregation are supported through entity and relationship relations (sets). Generalisation (subtyping) and association are not supported. Structural modelling is directed at the conceptual and logical record levels.
Behavioural concepts are not part of the formal model but could easily be introduced. [Chen 76] includes examples of the semantics of set operations and information retrieval requests.
NIAM
This methodology is specifically directed at the derivation of a conceptual schema. It provides for
extensive semantic expression of structural and behavioural features of an information system. The
major structural primitives are objects (lexical or non-lexical) and types (idea or bridge). Abstraction principles are strongly supported through the use of information structure diagrams (classification, aggregation and generalisation) and a conceptual grammar. The conceptual grammar formally describes structural and behavioural properties. Information flow diagrams are used to depict the latter.
Functional decomposition is applied through the diagrams to completely define the behaviour of the information system.
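NIAM's structural primitives can be sketched concretely. The fragment below is an informal illustration with invented object and fact names: non-lexical objects (NOLOTs) stand for things in the world, lexical objects (LOTs) are names or values, a bridge type relates a NOLOT to a LOT, and an idea type relates two NOLOTs. The check mirrors, in miniature, the kind of well-formedness rule the conceptual grammar enforces.

```python
nolots = {"Employee", "Department"}          # non-lexical object types
lots = {"EmployeeName", "DeptCode"}          # lexical object types

fact_types = [
    # (kind, first role player, second role player)
    ("bridge", "Employee", "EmployeeName"),   # Employee has EmployeeName
    ("bridge", "Department", "DeptCode"),     # Department has DeptCode
    ("idea",   "Employee", "Department"),     # Employee works in Department
]

def check_fact_types(fact_types, nolots, lots):
    """Validate that bridge types join a NOLOT to a LOT and that idea
    types join two NOLOTs, as the conceptual grammar requires."""
    for kind, a, b in fact_types:
        if kind == "bridge":
            assert a in nolots and b in lots, (kind, a, b)
        elif kind == "idea":
            assert a in nolots and b in nolots, (kind, a, b)
    return True

assert check_fact_types(fact_types, nolots, lots)
```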
ACM/PCM
This methodology (like NIAM) provides for extensive semantic expression. Behavioural and structural characteristics are modelled explicitly through diagrams (schemes) and described in a formal conceptual language (BETA). The major concept is abstraction. Structural and behavioural tools are provided which are based on this concept. Structural abstractions (classification, aggregation, generalisation and association) are directly modelled in object schemes. Behavioural properties of an object are completely defined by its actions and the gross properties of actions are depicted in behaviour schemes.
Detailed properties are defined procedurally in the specification language BETA.
9.1.6 Quality Control
This feature reflects the provision or availability of validation techniques for the method to ensure consistency and completeness of the system design. The provision of these features is considered in light of the method objectives. Design convergence (the extent to which the same model would result from the work of independent analysts), clarity of the design output, and detail resolution are considered.
KENT
The method provides no formal validation procedures to ensure that the design is consistent or complete. Binary facts reflecting structure, as opposed to behaviour (information flows), are stated in the first phase. These are synthesised into logical records. The design can be considered complete with respect to the stated facts after the final step. Consistency in this method reflects the degree of
normalisation. It aims to produce normalised records through synthesis. To validate the process the records can be checked for their degree of normalisation. This provides a check that the procedures have been followed correctly and that the facts had been stated independently. If either of these conditions does not hold then the design is unlikely to be in a fully normalised form. This is not, however, a formal part of the method.
Design convergence should be high given that facts are stated consistently. That is, assuming the output of the analysis process is constant, there will be little variation in designs. The method provides for varying levels of detail resolution. As it aims to produce logical record designs, detail resolution is relatively high.
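A normalisation check of the kind described can be mechanised using attribute closures. The sketch below is hypothetical and not part of the KENT method; the field names are invented. Each stated binary fact contributes a functional dependency, and a synthesised record is validated by confirming which attribute sets act as keys.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under the given
    functional dependencies (pairs of attribute tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Binary facts restated as FDs:
#   emp_no -> name, emp_no -> dept, dept -> location
fds = [(("emp_no",), ("name",)),
       (("emp_no",), ("dept",)),
       (("dept",), ("location",))]

all_attrs = {"emp_no", "name", "dept", "location"}

# emp_no determines every attribute, so it is a key:
assert closure({"emp_no"}, fds) == all_attrs

# dept determines location but not everything, so a single record
# keyed on emp_no that also carries location would not be fully
# normalised (a transitive dependency through dept):
assert closure({"dept"}, fds) == {"dept", "location"}
```

If the facts had not been stated independently, or the synthesis procedure had been misapplied, such a check would expose the resulting partial or transitive dependencies.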
ER
The method provides no formal validation procedures, nor are there guidelines for the classification
of entities, attributes or relationships. Accordingly the output, entity and relationship relations, is
based solely on the analyst's view of the information system. A given fact base may give rise to a variety of designs. Consequently, design divergence can be significant. A detailed data model can be
produced but typically the ER approach is used for generation of an enterprise (or business) schema.
At this level validation techniques, apart from user reviews, may be difficult to apply.
NIAM
Is a methodology for conceptual schema design. Model specification and diagram construction are
iterative. A formal mechanism for consistency and completeness checking is provided by comparing
the design with requirements specifications, and the conceptual grammar can be checked against
information structure diagrams [Brandt 83 p22]. NIAM is an analysis aid as well as a design aid. Based
on a stable set of requirements it is expected that design convergence would be relatively high (better
than ER, similar to ACM/PCM). Output, to the required level of detail, can be produced in a clear and
precise form.
ACM/PCM
Is a methodology for the conceptual schema design of complex database intensive applications. Each stage of development is verified for completeness and consistency by comparing the schemes and specifications of current representations with those of previous stages [Brandt 83 p14]. Design proceeds in two stages: gross design followed by detailed design of structural and behavioural characteristics. Iteration and decomposition are fully supported. Detail resolution is high, with final specifications close to program level as far as data structure and transactions are concerned [Brandt 83 p14]. Design convergence should be greater than for ER modelling.
9.1.7 Comparative Review - Summary
The methods differ markedly in their comprehensiveness with regard to the phases of the systems lifecycle supported and the detail within a phase. KENT and ER are the most restricted. NIAM and ACM/PCM are considerably more detailed. This makes comparisons on some of the other criteria difficult.
Each of the methods employs a variety of representation means. KENT uses lists and simple diagrams, ER uses diagrams and tables, NIAM uses diagrams and a formal conceptual grammar, whilst ACM/PCM uses simple diagrams and a specification language. KENT and ER benefit from their simplicity in representation. They provide for good communication between users and analysts. NIAM diagrams are somewhat more complex because of the number of semantic constructs supported. Nevertheless they provide a compact notation and are effective in analyst communications. ACM/PCM uses simple diagrams but emphasises the formal specification language. This provides a detailed description but is not conducive to design communications, especially when users are involved.
Automated support is provided for NIAM in the form of a data dictionary and software generator.
There are no tools for ACM/PCM or KENT but both these would benefit considerably from a data
dictionary tool. ER was not developed with automated support.
Abstraction is strongly supported by NIAM and ACM/PCM with formal procedures for decomposition.
ER is considerably more limited in this respect despite being based on a top-down development strategy. KENT as a bottom-up synthesis approach does not utilise abstraction although it could easily be incorporated into the analysis phase.
Documentation considerations were based on the major references for each method. The evaluation of the documentation is biased by the fact that for NIAM and ACM/PCM the papers were part of conference proceedings restricted by length. In addition, the variation in comprehensiveness between the method objectives is an important factor. With these factors considered, NIAM and KENT appear to be the best documented. ER is not presented as a detailed procedure and lacks a logical foundation.
ACM/PCM is orientated towards theoretical justification of the concepts with less detail on the practical approach. Documentation of the object system is most comprehensive under ACM/PCM and NIAM.
KENT, reflecting its lifecycle objectives, is less extensive. ER provides minimal documentation.
User orientation is reflected in the representation means and in the simplicity of the constructs supported. A tradeoff is apparent between simplicity and comprehensiveness. Accordingly, KENT is the simplest followed by ER, NIAM and then ACM/PCM. The same pattern is evident in the degree of semantic expressiveness.
With regard to quality control, all methods provide for iteration during the development and specification phases. Formal validation techniques are provided for ACM/PCM and NIAM, with the latter being best served. KENT could be easily expanded to incorporate a validation procedure.
CHAPTER 10
UNIVERSITY OF NEW SOUTH WALES
10.1 Objectives
The major objective of this case study was to examine the use and development of data modelling within a teaching and research environment and to ascertain the suitability of binary data modelling
as a technique for the communication of conceptual modelling concepts. The chosen environment,
the University of New South Wales, allowed an almost exclusive focus to be directed on the metrics
of communication and ease of student understanding (learning). These metrics correspond to
two of the important criteria with which a new technique is evaluated for inclusion in the teaching
program. Being an academic institution it also provided the opportunity to examine these factors free
of the financial pressures and with reduced technical and time pressures usually associated with the
corporate systems development environment. Nevertheless, by gaining an insight into these issues
some indication was given of the ease or difficulty with which data modelling procedures and techniques
could be changed in the business environment and the associated training that would be required.
To pursue these objectives the case study examined the processes which led to the introduction of
binary data modelling and then highlighted the extent to which the changes in data modelling methods
impacted student learning. In the conclusion some feedback on the theory was provided based on
the metrics outlined in chapter 9.
10.2 Research Method
The research methods employed in this case study were based on direct observation and interviews.
Direct observation stemmed from the author's involvement with database systems at the University
of New South Wales over a five year period. This commenced as a student in the database subject,
'Advanced File Design' in 1982. From 1984 through 1986 the author was heavily involved in the
tutorial workload of the restructured course 'Database Systems.' Section 10.4 of the case study which
describes the subjects is derived from the author's experiences. Documents relevant to this section
(course outlines, major assignments etc.) have been included in the appendices.
Section 10.5 is based on interviews with lecturing and tutorial staff. All lecturers associated with the subject during the period 1984 through 1986, and the major tutorial staff, are represented. The comments of several students (1986 class) have been included. The interviews were conducted using an asking strategy employing open questions. Comments were sought on the advantages, disadvantages and teaching utility of data modelling, with special emphasis on binary data modelling. Free comment was encouraged.
Limitations of this case study include the biases introduced through interview selection and recall. It was attempted to minimise the former by approaching all staff involved with the database curriculum. With students this was clearly not viable due to the numbers. The projects themselves are not representative of corporate systems in either size or technical complexity, an important consideration to be made before generalisation. Furthermore, the students may not be representative of the typical information systems employee. This effect was somewhat countered by the graduate students involved. Bearing these restrictions in mind, the focus on communication and learning nevertheless allowed some useful results to be obtained.
10.3 Environment
The Department of Information Systems, University of New South Wales, falls under the administration of the School of Accountancy within the Faculty of Commerce. It offers both graduate and undergraduate, pass and honours course majors. For the purposes of this case study the focus will be on the undergraduate degree; however, a small amount of material is taken from the graduate program.
The undergraduate subject, Database Systems 14.608, has been taught in the Department of Information Systems since 1983. Prior to this the subject was known as Advanced File Design. Database Systems forms part of an Information Systems major as a first session, third year (full time) subject. Pre-requisite subjects are Computer Information Systems 1, and Computer Information Systems 2 or
Management Information Systems Design. Current subject descriptions for each of these are contained in the appendices. At the graduate level the subject Data Management 14.992G was offered for the first time in 1986.
Teaching in Database Systems and Data Management has been structured around a 14 week session with a two hour lecture and one hour tutorial. Assessment has typically been split between course work and a final examination. Course work has varied according to the resources available but has centred on exercises with microcomputer database management packages and on conceptual file/database
design.
In 1984 the lecture content was split into a Systems and Technology stream. Michael Lawrence was
responsible for Systems and Robert Edmundson for the Technology stream. Tutor in charge was Paul
Groves. In 1985 Ross Jeffery assumed the responsibility for the Systems stream. Robert Edmundson
continued teaching the Technology stream and Patrick Thng joined Paul Groves to share the tutorial workload. In 1986 Ross Jeffery lectured a restructured course in which the Systems and Technology
streams were merged. Paul Groves and Chris Johnson tutored.
From 1984, enrolments have been stable in the 80-100 student range for Database Systems. Approximately 30 students were enrolled in the graduate subject. Tutorial sizes have been held within the
range of 15-18 students.
Practical exercises which were designed to complement the theoretical components of the course
required that a selection of database management systems software should be available for student
use. With the majority of the University's computer power concentrated in centralised mini-computers
it appeared sensible to support mini-computer database packages. Unfortunately on these machines,
availability of suitable DBMS software and the associated cost of the packages suggested that a different
strategy would be necessary. Accordingly, micro computer support was provided in the form of
Datamax CP/M machines. A network DBMS, MDBS I, was purchased. In 1984 this package and the relational package dBASE II were used for major assignments.
Increasing student numbers and a continuing heavy price bias towards micro computer hardware and software saw an IBM PC laboratory established in late 1984. Availability problems with an educational version of the network DBMS, MDBS III, resulted in dBASE II being the only package available in 1985.
By second session of 1985 it was evident that MDBS III would be available in an educational version
for the IBM environment in time for use in 1986. Major assignments in 1986 were once again conducted in both relational and network packages.
10.4 Database Systems Development
Until 1984 data modelling had assumed a low profile in data base courses. It had not been taught by reference to a single structured methodology but had drawn on concepts of entity relationship modelling and normalisation theory. These had been used to support what remained largely an intuitive design approach. Consequently, design exercises resulted in considerable difficulties being experienced by students who, in the majority of cases, had only minimal previous exposure to programming concepts and even less to practical information system design. In retrospect, understanding of normalisation theory and entity relationship modelling appeared to have been more successful for students with previous systems exposure. It was not unusual for students to complete the data base systems course without the benefit of having worked with a structured design methodology.
Design skills which developed during the course were largely the result of the practical exercises set
in the relational package Dbase II and in the network database package MDBS I. For the majority
of assignments a logical design exercise preceded the practical element. This provided meaningful
feedback on the implications of design choices because poor logical design would be expected to
cause problems to the student in the implementation phase.
The understanding of normalisation theory was boosted with the availability of a relational package.
This was because of the ease with which a normalised conceptual design could be expressed at a
physical level. That is, the logical and physical designs were usually equivalent. In comparison,
the implementation of the same logical design with a network database required restatement of the schema. The particular advantage of a relational package then was the ability to clearly demonstrate the difficulties imposed at a physical level by poor logical design.
In summary, design concepts in this period were developed by students mostly from practical experience with the exercises, and only supplemented by the teaching of normalisation theory. The intuitive top down approach dominated design exercises. The major problem with this was the internalisation of the design task: there was no pre-defined method or visible decision process. This was highlighted by the absence of documentation at the completion of the logical design process. The major design effort was then shifted to the physical level.
10.4.1 Database Systems - 1984
During late 1983 a draft copy of William Kent's work on binary modelling was brought to the attention
of Michael Lawrence. After a review by database staff the technique was adopted with considerable
enthusiasm as the standard data modelling method. It was taught for the first time in 1984.
Three design exercises were set using the method. The first of these involved purely a modelling exercise in which the basic data to be modelled, 'the facts' in Kent terminology, were provided. The exercise involved a small medical records system with the design to be completed as a one-week tutorial exercise. The draft Kent paper was referenced, but students' knowledge of the method was otherwise restricted to the lecture examples.
In the following tutorials it was evident that understanding of the concepts of participation, key identification and merging was not good. The design application had been simple enough that records
could be designed from 'inspection' without undue difficulty. Hence some students had followed a
top-down design strategy then documented a binary bottom-up approach. This avoided the issue of
coming to grips with the Kent method.
The second design exercise involved a textbook case study in which variable length record designs
were provided for an application with a hierarchical inter-record structure. Several extra fields were
to be added to those already present. This was a somewhat more difficult exercise in which the maximum and minimum participations of the binary relationships required careful consideration. Some student difficulties continued in this area and the resulting designs were often not in third normal form.
The final exercise was both an analysis and design exercise in which a video hiring application was briefly described. In tutorial discussion directions were given as to the required detail of the design and on techniques to resolve modelling problems. A large percentage of the time taken to complete the exercise was required in the fact specification phase. This was accompanied by considerable class discussion. With the concept of participation being addressed more carefully, results improved. However, problems still remained with the final designs due to the incorrect merging of pseudo records.
The design output of the final exercise was to be used as the basis for a network data model in MDBS I. Consequently, the more subtle problems with student designs were allowed to go unchecked in the hope that implementation of the design would alert students to the difficulties arising from unintentional, unnormalised designs. This was, unfortunately, only a partial success. The MDBS I exercise concentrated on database loading and enquiry operations. Maintenance, that is, change and delete transactions, were not part of the exercise and this allowed a number of logical level design problems to go unnoticed by students.
Nevertheless, improved designs soon replaced poor designs as the exercise continued and awareness of design implications grew. By the completion of the exercise many implementation designs had converged, forcing a change to the logical level designs.
In summary, at the end of session it was evident that students had gained considerably in design skills and the majority possessed a good appreciation of Kent's method. However, difficulty had been shown in understanding the concepts on which the method was based. This resulted in problems in those areas of the method which allowed considerable freedom, or in which the method was incomplete. It was found that the majority of design issues could be handled simply by the method, but some situations still required design insight. This was often beyond the experience of students who, when resorting to intuitive design, made normalisation errors. On the whole, results showed an improvement over previous years, mostly because a design model was available with which to structure the design task.
10.4.2 Database Systems - 1985
Based on the experience from teaching Kent in 1984 it was felt that a summary of the paper, "Fact Based Data Analysis and Design", would assist students' comprehension of the method and its objectives. Consequently, a six page overview with example was prepared for student distribution. The overview concentrated on the essential phases of the method and provided rules, but avoided detail and discussion of problem areas. Students were strongly recommended to obtain a copy of the original paper. As a result of this approach, students' understanding of the binary modelling process developed much faster than in the previous year.
For the major data modelling assignment it was decided that design and analysis should both play a
large part in the project. A one page description of a rock music promotions system was distributed.
This defined minimum process requirements but allowed considerable flexibility as to the comprehensiveness of the design. Project deliverables were matched to the phases outlined in the overview
paper.
Phase 1, specification of the facts, is in essence an analysis task. Following the pattern of the previous
year, tutorial discussion of the 'facts', as represented by binary relations, was intensive. Sufficient
ambiguity as to the scope of the system led to a variety of approaches differing as to the level of
detail and functionality. Whenever possible students were encouraged to see alternative views of the
problem but generally little prompting was needed. As few restrictions as possible were placed on
the system scope.
Communication between students and between tutors and students regarding the application was
generally good. The method allowed students at this point, to focus entirely on the application
without the distraction of working with the syntax and conventions of a formal data model. The primary concept required for fact specification, the binary relation, was one readily accepted by students because of its simplicity.
As a proportion of total project effort the analysis exercise (phase 1) was relatively large. It was evident at the completion of the phase that the application was well understood by most students. Concern as to the scope of the system had been raised by several students, but in general, problems had been minimal. However, what was generally not appreciated after completing this phase was the importance of establishing the facts to represent reality. Problems in later phases would be traced to incorrect or incomplete fact specification. In order to provide a uniform problem statement for the subsequent modelling phases the scope was defined in detail following the completion of phase 1.
A large share of the available tutorial time continued to be devoted to the conceptual design as it entered the second phase of specifying fact participations. Once again the work was analysis orientated because establishing participations required insights into the application and not (at least on the first pass) into design issues. Whilst the concept of participation was now familiar to many students, difficulties were encountered understanding the implications of binary relations involving a minimum participation of zero.
Identification of keys in phase 3 required little tutorial time. The rules for key identification had been stated precisely in the overview documentation and, with some theoretical justification, were quickly accepted and understood by the students. The main difficulty, though not a serious one, lay in redefining the concept of a key for those students who tried to equate the Kent concept of a key with their practical experience using indexed files (where duplicates may be acceptable). There was also a popular misconception that the key to a pseudo record needed to be practical. That is, students resisted the notion of composite keys involving the whole pseudo record. For the majority of students this phase progressed quickly due to the constraints imposed on selecting candidate keys (determined by the participations established in the previous phase).
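One plausible reading of this constraint can be sketched as follows. This is an invented illustration, not Kent's own notation: a field of a binary fact is taken to be a candidate key of the resulting pseudo record whenever its maximum participation is 1 (each of its values pairs with at most one value of the other field); if neither side qualifies, the key is the composite of both fields.

```python
def candidate_keys(left, left_part, right, right_part):
    """Candidate keys of a binary fact's pseudo record.

    left_part / right_part are (min, max) participations; '*' denotes
    an unbounded maximum.  A side with maximum participation 1 is a
    candidate key; failing that, the composite key is forced.
    """
    keys = []
    if left_part[1] == 1:
        keys.append((left,))
    if right_part[1] == 1:
        keys.append((right,))
    return keys or [(left, right)]

# The Manages fact: every department has exactly one manager (1,1);
# an employee manages at most one department (0,1).  Both single
# fields qualify as candidate keys.
print(candidate_keys("Department No.", (1, 1), "Employee No.", (0, 1)))

# A many-to-many fact, e.g. student enrolments, forces a composite key.
print(candidate_keys("Student No.", (1, "*"), "Course No.", (0, "*")))
```

This also reflects why unrestrictive participation assumptions (maximum 'N' on both sides) lead to composite keys and hence to unmergeable pseudo records, as discussed below.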
The fourth phase, merging of pseudo records, met with some real, and some imagined, difficulties.
The real difficulties stemmed from evaluating a merge in which alternative keys were available. No
clear-cut rule could be used at this point. The solution lay in considering the application requirements and/or using design intuition. For example, consider the following pseudo record:
Figure 2: Candidate keys

    Manages
    Department No. (1,1) | Employee No. (0,1)

Both Department No. and Employee No. are candidate keys.
Given the following Department and Employee records, it is possible to merge the pseudo record on
department no. of the Department record or to merge on employee no. of the Employee record.
With the latter merge the department no. field must be able to handle nulls for those employees who
are not managers.

Figure 3: Pseudo record merges

    Department
    Department No. | Dept. Name | Dept. Location

    Employee
    Employee No. | Emp. Name | Emp. Salary
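The consequence of each merge choice can be sketched as record types. The following is a hypothetical illustration only (the field names and the use of None to stand for a null are invented for the sketch, not part of the original exercises):

```python
from dataclasses import dataclass
from typing import Optional

# Alternative 1: merge the Manages pseudo record into Department on
# Department No.  Every department has exactly one manager
# (participation 1,1), so the manager field never needs a null.
@dataclass
class Department:
    department_no: int
    dept_name: str
    dept_location: str
    manager_employee_no: int

# Alternative 2: merge into Employee on Employee No.  Only some
# employees manage a department (participation 0,1), so the merged
# department field must accept nulls, represented here by None.
@dataclass
class Employee:
    employee_no: int
    emp_name: str
    emp_salary: float
    managed_department_no: Optional[int] = None

clerk = Employee(employee_no=101, emp_name="Smith", emp_salary=18000.0)
manager = Employee(employee_no=102, emp_name="Jones", emp_salary=32000.0,
                   managed_department_no=7)
```

The application requirement decides between the two: if null handling is undesirable, the first merge is preferable; if the query pattern centres on employees, the second may still be chosen despite the nulls.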
Imagined difficulties related to 'fragmented' designs. After an initial pass through the method it
was common for a significant number of pseudo records to remain unmerged. This was a direct
result of assumptions made concerning participations. Usually students avoided making restrictive
assumptions in determining participations. Hence maximum participations of 'N' were common with
composite keys resulting. This meant many pseudo records could not be merged. With iterative development, however, these assumptions could be gradually modified, reducing flexibility but allowing further merging of pseudo records. The objective here was not to assume away problems by changing participations, but to make clear the link between the degree of flexibility assumed and the resultant record design.
After the merging process the designs should have been normalised at least to third normal form, providing the technique had been followed accurately and the facts carefully specified. Accordingly the students were encouraged to check the designs using normalisation theory. Anomalies which arose were usually traced to fact specification errors. By this, it is meant that differences existed between the students' perception of the reality being modelled and how the fact was actually stated. This problem usually arises when the fact has been specified in general terms but on assignment of representation (filling in the detail) a 'new' fact is created which is different to the original intended fact.
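As a hypothetical illustration of such an anomaly (invented here, not drawn from the student projects): if an employee's location is stated directly as a fact about the employee, rather than via the two facts 'employee works in department' and 'department is at location', the merged record hides a transitive dependency and fails third normal form. A simple check of functional dependencies in sample data makes this visible:

```python
# Records produced from the mis-specified fact: the location is
# repeated for every employee of a department, because the dependency
# employee_no -> dept_no -> location has been folded into one record.
rows = [
    {"employee_no": 1, "dept_no": "D1", "location": "Sydney"},
    {"employee_no": 2, "dept_no": "D1", "location": "Sydney"},
    {"employee_no": 3, "dept_no": "D2", "location": "Perth"},
]

def determines(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in the data."""
    seen = {}
    for r in rows:
        if r[lhs] in seen and seen[r[lhs]] != r[rhs]:
            return False
        seen[r[lhs]] = r[rhs]
    return True

# dept_no -> location holds, so a non-key attribute (location) depends
# on another non-key attribute (dept_no): the record is not in 3NF.
print(determines(rows, "dept_no", "location"))   # True
```

Restating the original two facts, and merging them separately, yields a Department record carrying the location and removes the redundancy.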
The final phase in the technique, consideration of alternative designs, was handled with difficulty by students. Having arrived at a design, students were reluctant to review facts and participations which
would have generated alternatives. The idea of iterative design and progressive refinement which is
central to the technique was not widely appreciated. Students tended to be locked into a single view
of the system.
At the conclusion of the design exercise it was felt that students' grasp of the issues in data modelling was much improved over previous years and that understanding of normalisation and its implications had been greatly strengthened.
10.4.3 Database Systems - 1986
Encouraged by the success of the previous year, data modelling once again assumed a high profile in the course outline. The early weeks of lectures concentrated on data concepts and characteristics. Kent was introduced in week 5 of session, accompanied by the six page overview documentation used in the previous year. Practical work was assigned in Dbase II and MDBS III. The major conceptual design followed a different approach to that of the previous year. A full case study description of a production and marketing system was provided which ran to 25 pages. The objective was to reduce uncertainty in the analysis phase so that the major effort would be concentrated on design and understanding of the method. By providing a detailed case study the expectation was that variation between designs would be minimal due to the reduction in uncertainty.
Consequently, tutorial discussion in phases 1 and 2 was brief. It was necessary to point out that the design should cater for a base set of data from which all reports could be produced. Any fields which could be derived were to be excluded. Few questions were raised, however, regarding the case study material itself. As concerned participations, the theory was covered quickly, with illustrations drawn from case study facts in support.
Completion of phases 1 and 2 represented a design deliverable. Compared to the previous year these two phases had occupied roughly 50% less student time and 60-70% less tutorial time. The provision of detailed case study material appeared to have achieved its objective in significantly reducing the analysis effort.
Phase 3, key specification, produced similar problems to those of previous years. The task was completed easily by those students who followed the method rules without question, and by those students who understood the strict definition of a key. Students who endeavoured to use intuition alone invariably had problems. Once the concept had been clarified in the tutorial the problem vanished.
Merging brought with it the usual concerns regarding record fragmentation. Some students understood the distinction between logical and physical design phases and were happy to consider changes in representation or participation assumptions in order to produce an implementation design. Many
did not. This was shown clearly by the difficulties experienced in the final phase, consideration of
alternatives. The link between fact specification, participation, representation and the final design
did not seem to be clear. Most alternatives involved tinkering with the merging process. As with
the previous year, the concept of iterative design and progressive refinement was not widely used or
appreciated.
Normalisation checking was conducted before the final deliverable so as to verify adherence to the
method, and to provide a check on the correct statement of facts. This also served to demonstrate to students the equivalence of top down modelling through decomposition with bottom up, binary modelling.
In conclusion, it seemed that students' appreciation of data modelling concepts was good by the time the design task was complete. In comparison with the previous year fewer problems had been evident, but there was some feeling that students' exposure to problem areas in Kent had been reduced through the provision of a detailed case study. Owing to reduced uncertainty, less class discussion had been generated and consequently fewer alternatives were considered.
10.5 Interview Plan
Each of the lecturers, and one of the tutors involved in Database Systems since 1984, were asked to discuss their feelings towards the use of the Kent method as a tool for teaching data modelling.
Discussion ranged from its ease of teaching to the level of student comprehension. It was hoped a consensus on the strengths and weaknesses of the method could be identified. General comments on individual experiences were sought.
Naturally, to gain a better appreciation of the impact of the method, several students from Database
Systems and Data Management were asked for comments. Responses were sought regarding their understanding of data modelling concepts and the contribution that the Kent method had made towards this. Comments on ease of use of, and confidence with, the method were also sought. In all interviews an open question strategy was adopted.
10.5.1 Lecturers
Robert Edmundson introduced the Kent method in lectures in 1984. He was asked for comments on teaching with the method, on students' understanding and usage of the method, general observations, and specific strengths and weaknesses. With respect to teaching he made the following observations:
The Kent method was an easy and natural way of thinking about data which lent itself to easy illustration. It was a method which did not require design intuition or prior systems exposure and was therefore appropriate for a student or user environment. Accordingly it was feasible to introduce data analysis using this method at an earlier stage of an Information Systems major, probably from Computer Information Systems 2. Experience in the first session of its use had shown that for teaching purposes it was beneficial to place a different emphasis on the various phases than had been suggested in the original Kent paper. Pseudo record participation and fact specification were areas requiring more attention, whilst fact participation, included in phase 1, was less important as its relevance seemed unclear.
Regarding students' understanding and experiences with Kent:
Some students had difficulty with the method, but less difficulty than with alternative methods (entity relationship) because Kent did not require students to differentiate between an entity, an attribute and a relationship. The Kent method was able to take students from base level data through to
a record design (bottom up) via a highly visible well documented path. This provided improved
understanding of the application and an insight into the problems of data modelling as evidenced
by increased awareness of normalisation principles and in the quality and insight shown in student
questions.
On the strengths of the method:
Ease of understanding of binary concepts; no prior experience required. When used in conjunction with normalisation the method provided a powerful tool for analysis equal to its primary function as a design tool. Due to its bottom up approach, data modelling with Kent requires more effort than a top down strategy; this is beneficial due to the thoroughness of the analysis.
On problems with the method:
Handling of n-ary relations requires greater attention in the documentation. No clear guidelines are
provided.
Michael Lawrence was the first staff member of the department to be introduced to the method and co-lectured in Database Systems when it was first taught. On teaching with the method he commented that it was straightforward to explain the concepts but that initially for a student it might well be
difficult to understand, although no more so than alternative methods. He believed students to be more involved in data analysis than previously and that Kent's method, by following an incremental
design strategy, had de-mystified the design process. The use of a well defined method and a uniform theory (relational) was of considerable benefit to students. Intuitively, he felt that students' appreciation
of normalisation and of data modelling problems had been improved through working with the Kent
method.
On the strengths of the technique:
The method had sought to minimise the intellectual difficulties of modelling data through the use of a
single construct, the binary fact. In addition the development process was self documenting thereby
providing the capability of tracing all aspects of the design process from fact specification through to the record level. This was facilitated by virtue of a 'fact catalogue' with natural language descriptions produced as a product of phase 1. A comforting feeling was the ability to resolve modelling
problems or explain design errors by stepping through the detail of the method (whether this was
fact specification, participation or merging phase). It was felt that the Kent method was substantially
better than entity relationship modelling for the promotion of group discussion, particularly in the
analysis phase.
On the weaknesses:
'Correct' fact specification was seen as the basis of the method's success. If this was not done carefully
then bad designs were likely to result. This was seen to necessitate the use of normalisation, as a
check on the final design.
Ross Jeffery had taught entity relationship modelling and NIAM prior to teaching Kent in 1986. He
regarded Kent's method as very easy to teach but saw no particular problems with the alternative
methods. Before introducing the Kent method to students, concepts of entities, relationships, and attributes were taught first. Whilst it was not necessary to define entities and attributes in order to use the Kent method (due to its bottom up orientation), it was felt that these concepts helped the modeller to perceive the structure of the problem. This first view of the problem was believed to be critical to the success of all methods. It was argued that for modelling exercises a distinction should be made between the analysis and design components. For a design exercise a detailed case study should be provided so as to minimise the analysis effort.
On the strengths:
The Kent method was easy to grasp and allowed many decisions which were not relevant at the conceptual design level to be deferred to later phases of the design process. An example of this was the
representation issue.
On the weaknesses:
As a bottom up approach the method did not allow good perception of the facts unless it was supplemented by an overview of the problem. As a list based binary modelling method it was seen to
be at a disadvantage when used as a communications tool compared to graphical binary modelling
methods such as NIAM.
10.5.2 Tutors
Jamie Crowley tutored in the graduate subject Data Management in 1986. Prior to this he had no
experience with the teaching of binary modelling. Kent was used for several of the design exercises but
was not compulsory (students could select a preferred method although only Kent was supported). For
students with design experience the Kent method was felt to be long winded. These students preferred
entity relationship modelling combined with normalisation. Less experienced students found the
method to be supportive because it provided a framework for analysis and effectively illustrated the
concepts of data modelling. It was claimed that difficulties had been experienced in defining the facts
for a given application and that the method provided no support or guidelines for this activity. In
addition, non-binary relations were a source of confusion. As a communications tool the diagrammatic representation of NIAM was preferred.
10.5.3 Students
A number of students having completed Database Systems in 1986 were asked about their experiences with data modelling and the Kent method. Hock-Seang Khaw had been introduced to entity relationship modelling in Computer Information Systems 2 and used this as a benchmark for the evaluation of the Kent method. He believed that the theory was easy to understand and quite simple to learn.
However entity relationship concepts had been easier to apply in practice. After having completed the
conceptual design assignment it was felt that the method had helped in understanding normalisation.
He was confident with the method and would use it for future design problems. The major problem
he had encountered was the initial understanding of normalisation theory.
David Liebsman felt that substantial problems had existed in learning the Kent method. He had not
appreciated the reasons for the various phases of the method or where it was going, feeling that an
overview of the method had been lacking. Despite this, generation of pseudo keys and merging had not been difficult. He believed that his understanding of normalisation was good and that the method
had assisted in that respect.
Szue-Shang Chai felt that Kent's method was not difficult to understand but that it had been mostly
self learnt with little recollection from lectures. The distributed six page overview had been very
useful and was almost of as much benefit as the full Kent paper. Normalisation concepts were well
understood. A major advantage of the method lay with its systematic step driven approach.
Julian Terry, enrolled as a masters student in Data Management, had not previously been exposed to data modelling. He felt that Kent's method was a 'common sense' approach which allowed the concepts of data modelling to be quickly grasped. The method fitted naturally with relational theory and complemented normalisation concepts. A major attraction was the ability to undertake detailed application analysis which, when completed, produced a complete logical model of the data. In his experience of the design exercises it had not been necessary to have an understanding of entity, attribute, or relationship concepts in order to work with the method. Whilst acknowledging the basic bottom-up orientation of binary modelling, it was possible to use 'high level' facts (deferred representation) as a means of taking a top-down view of the application. These high level facts could subsequently be decomposed after a first pass at an overview level. Problems encountered with the method were in the area of n-ary relations. A clear and systematic approach was not apparent from the documentation. It was felt that some systems experience would have been helpful at this point.
10.6 Conclusion
The Department of Information Systems at the University of New South Wales is a teaching and research body in which major activities are undertaken in the area of database and information systems design. The case study represents a longitudinal analysis of the database curriculum spanning five years in total. It draws on anecdotal material from a variety of students (Masters and undergraduates), a variety of lecturers and tutors, and a selection of projects encompassing both design and implementation phases and several database management systems architectures. In such an environment the major opportunity was to examine the issue of communication and data modelling through the metrics of 'representation and communicability' and 'ease of learning'. Naturally some light was cast on the other metrics used in this paper, but these findings should not be over-emphasised nor generalised to other environments due to the atypical nature of the projects (small and simplified) and the purpose of the projects (pedagogic).
During the course of this case study the role of data modelling in information systems design courses
was extended to the point where it represented the foundation of systems and database design.
Accordingly, with a mission to provide students with state of the art design methodologies and techniques, continuous investigation, development and emphasis was placed on this area. In the case
study it was seen that students had been exposed to a variety of modelling techniques including ER,
NIAM and KENT. The introduction of binary data modelling as represented by NIAM and KENT was
however the major event marking the increased importance (and success) of data modelling within
the teaching program.
An important achievement observed during the course of the case study was the realisation (by staff and students) that a data model represented a 'chosen' reality and that it was therefore critical for the data modelling technique to make the design process traceable and all assumptions explicit. This required that support for documentation and support for analyst/user (student/tutor) communication be strong. Accordingly, a graphical basis of representation was seen as an important means of achieving this. Modelling with Kent was seen to have provided communication support through the logical construct of the binary relation but to have lacked in the model representation domain. This was often not critical due to the project size, but for large complex systems could be a serious disadvantage. As such, early enthusiasm that Kent was the binary modelling method was replaced by the understanding that a complementary top-down approach might also be beneficial.
As expected the phases supported by KENT were limited to analysis and design of the data model.
What was somewhat unexpected however was the strength of that support in the analysis phase. This was believed to have been directly related to its user orientated nature and support of user/analyst
communications. Modelling discussions in a group environment involving KENT had been much
more lively than those involving ER although some differences could be accounted for due to project
variance and tutorial group variance. Quasi-experimental controls could be applied to investigate this
issue further.
As mentioned, representation and communication were good using the KENT technique. This was
also in line with theoretical projections. What was not present in the case study (as in the original
paper) was a large or complex application against which 'real world' performance could be measured.
Some doubts exist as to whether representation and communication would be satisfactory in these
types of projects particularly with the low level of abstraction support provided in the method. For
small, simple systems the results were very satisfactory.
Documentation of the method was found to be less than satisfactory with students finding logical
holes in the theory. This resulted in extended tutorial discussions in several areas. Documentation
of the design was found to be extremely well supported when students had observed the method
procedures. As expected, the method was found to be highly user orientated when compared with
the approaches of NIAM or ER and students were observed to pick up the major concepts quickly.
Advantages in this area were offset by the inability to provide more than superficial semantic support.
For this reason students were often encouraged to include textual explanations of their designs.
Quality control measures were not part of the KENT method but were incorporated into the projects via normalisation and through group discussion. The necessity of doing this was anticipated from the
theory.
CHAPTER 11
AUSTRALIAN MUTUAL PROVIDENT
11.1 Objectives
This case study investigates the use and evolution of data modelling in the corporate environment.
Against a background of hardware and software changes it examines the history and development of data modelling, and the forces driving its development. An evaluation is made of the current status of data modelling within the organisation and of the degree of success achieved through its implementation. The major objective is to provide feedback for theory development based on the experience of applying binary data modelling to large, complex, corporate projects. Also of interest, in the broader world of information systems management, is observation of the resultant changes in information systems development procedures and metrics. For this case only qualitative research was conducted; in future research the measurement of changes in metrics such as
quality and productivity would be of major interest.
Australian Mutual Provident (A.M.P.) was selected as an ideal candidate for these purposes having long
been associated with database technology in financial systems applications. The company represents
a sophisticated user and developer of large commercial information systems, one which is constantly
adapting to utilise new technologies and new methodologies to meet its business objectives.
In order to provide an environmental context the following related issues were also investigated:
• details of the physical environment i.e. hardware, software, applications and personnel
• history of database usage including investigation of the strengths and weaknesses of their approach
• major issues or problems associated with database technology in general application, or with its implementation
• trends and future directions in systems technology
11.2 Research Method
Due to the descriptive nature of this research a case study approach has been followed. Data was gathered in interviews via an 'asking' strategy. So as to encourage free comment by A.M.P. staff, magnetic media was not used to record these sessions. Despite the obtrusive nature of this approach and the inherent limitations of an asking strategy, it is believed that the data gathered represents an accurate description of the environment and of the techniques utilised in the systems analysis function.
Unfortunately, it was not possible to obtain copies of standards, documentation or project material because of a corporate restricted disclosure policy.
The initial contact at A.M.P. was made through Daryl Dobe, the Application Support Services manager
(see appendix D). It was anticipated that after providing an overview of A.M.P. operations Daryl would be able to identify further contacts in the system development groups. Based on the first interview,
and with the aid of a data processing organisational chart it was possible to arrange the following
interviews:
• Brian Donelly - Manager (Assistant) Systems Engineering
• David Nash - Manager User Support Services
The material gathered in the first of these interviews related primarily to hardware and operations
details as would be expected for a section responsible for capacity planning, systems performance
monitoring, and systems engineering. Due to time constraints and corporate security restrictions it
was not possible to conduct an in-depth analysis of these aspects. Despite this, sufficient information
was collected to place subsequent interviews in the right 'environmental' context.
As manager responsible for user computing, user support (technical) and data administration, David
Nash was able to provide an overview of the systems analysis and data analysis methods employed.
Whilst adopting a mostly supervisory role at the first interview he was able to introduce two systems
analysts in the Systems Engineering section from whom much of the detailed material on data modelling was obtained. A follow-up interview was organised with one of these analysts, Mark McMillan.
In the final interview, Mark in conjunction with David Nash organised contact with a 'user', a former
N.S.W. branch manager seconded to data administration.
Through these interviews a cross section of data modelling from a management, analyst and user perspective was provided. Time and access restrictions unfortunately prevented a wider sample.
Selection bias (towards pro data modelling analysts and users) could not be controlled for, although no evidence for the existence of such bias was found. This case study also represents a one-shot study. Only one project employing binary data modelling was made available for review. With time, however, new projects will be completed, thereby offering the possibility of a multi-case longitudinal analysis. As a consequence of these restrictions it is appropriate to regard the nature of this case study as essentially exploratory.
11.3 Environment
11.3.1 Hardware
As far as Australian organisations are concerned A.M.P. has a long history of computer involvement
extending back to the 1960's. In the overview provided by Daryl Dobe it was apparent that apart
from a period in the mid to late seventies, when UNIVAC equipment was used, A.M.P. had been
an IBM 'shop'. IBM's 360 series mainframes were used from the late 60's until the early 1970's. A
UNIVAC system was in place until 1979, when, after what was described as a disaster due to software
and hardware unreliability, IBM again won the hardware tender. As with many large corporations
with a history of early computer involvement A.M.P. had, and continues to work with, a centralised
data processing system. Current thinking is tending towards the use of distributed 'data centres', but
with a centralised structure prevailing overall. It was not seen to be 'economically viable', in the words
of Brian Donnelly, to move into a distributed processing, networked environment at least, in the short
to medium term.
Based on a corporate strategy which recognised the strategic role information systems could play in
lifting corporate performance, A.M.P. began in 1979 to invest heavily in the provision of computer
resources. This expansion was accompanied by an increase in workload (measured by transaction
throughput) that averaged 45% per annum over a seven year period. Growth, now from a much higher base, was still believed to be in the region of 25% per annum. Over the same period online storage utilisation had grown from 6-7 gigabytes in 1979 to an impressive
215 gigabytes in 1986. A significant portion of this growth was attributed to IMS based systems.
Not surprisingly, this growth had generated a demand for CPU resources which far exceeded the performance improvements of a single mainframe. The response was to move to a multi-mainframe configuration represented by three of IBM's largest machines.
In an environment undergoing such rapid growth, capacity planning has become an essential activity.
At A.M.P. this is performed by the Information Systems Strategic Planning Group who undertake a system and user review to establish future requirements. Almost 250 user applications are reviewed annually to provide estimates of terminal usage, IMS and TSO transaction frequency and batch versus interactive usage. Amalgamated, these form the basis of a two and a half year projection of resource requirements.
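The growth figures quoted above imply compound rates which can be verified with a short calculation. The following sketch (in Python, purely for illustration) uses only the storage and workload figures stated in the interviews; the derived annual storage rate is the author's arithmetic:

```python
# Implied compound annual growth of online storage at A.M.P.
# Figures from the interviews: ~6.5 GB in 1979, 215 GB in 1986 (7 years).
start_gb, end_gb, years = 6.5, 215.0, 7

annual_rate = (end_gb / start_gb) ** (1 / years) - 1
print(f"implied storage growth: {annual_rate:.1%} per annum")  # roughly 65%

# Transaction workload growing at 45% p.a. over the same period
# compounds to roughly a 13-fold increase in throughput.
workload_factor = 1.45 ** years
print(f"workload multiple over {years} years: {workload_factor:.1f}x")
```

Storage growth (about 65% per annum) thus outstripped even the 45% per annum transaction growth, consistent with the capacity planning emphasis described above.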
11.3.2 Software History
Involvement in database management systems software began with the UNIVAC machine in the early seventies. DMS-1100, a CODASYL network database, was introduced. A combination of factors (unreliable hardware, unreliable software and poor design knowledge) made this experience a disaster.
Due to a high priority attached to application efficiency the 'database' design consisted of a single record. This had inevitably produced maintenance problems and forced extensive application rewrites whenever the data structure changed.
Accompanying the return to the IBM hardware world in 1979 was the hierarchical DBMS package IMS.
This provided the opportunity for a redesign of the database. Anxious to avoid a repeat of the data modelling problems that had been experienced with DMS-1100, entity relationship (ER) modelling and normalisation theory were specified as mandatory design techniques.
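The restructuring that normalisation implies can be illustrated against the 'single record' DMS-1100 design described above, a classic unnormalised structure. The following sketch is hypothetical; the record layout, names and values are invented for the example and are not taken from A.M.P. material:

```python
# Hypothetical unnormalised 'single record' design: one policy row
# carries client and agent details plus a repeating premium group.
unnormalised = {
    "policy_no": "P-1001",
    "client_name": "J. Smith",
    "client_address": "12 Pitt St",
    "agent_id": "A-07",
    "agent_name": "R. Jones",
    "premiums": [("1986-01", 120.0), ("1986-07", 120.0)],  # repeating group
}

# Normalised decomposition: each fact stored once, keyed by the
# entity it describes, with identifiers linking the relations.
clients = {"C-55": {"name": "J. Smith", "address": "12 Pitt St"}}
agents = {"A-07": {"name": "R. Jones"}}
policies = {"P-1001": {"client": "C-55", "agent": "A-07"}}
premiums = [("P-1001", "1986-01", 120.0), ("P-1001", "1986-07", 120.0)]

# A change to an agent's name now touches one row, not every policy:
agents["A-07"]["name"] = "R. Jones-Brown"
```

This is the design flexibility the single-record structure lacked: under the unnormalised layout the same change would have forced an update to every policy record, and any change to the record layout an application rewrite.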
As part of the push towards end-user computing, RAMIS, a fourth generation programming language/tool, was introduced in 1982. The objective was to utilise it within the Information Systems
group to enhance application development, and outside the group as a user tool to assist in reducing the IS project backlog.
Productivity advantages were realised, to the extent that the package investment was repaid within nine months. Usage as an end-user tool was limited however because of concurrency and integrity problems which were experienced with RAMIS.
11.3.3 Software Current
The three IBM mainframes run under the MVS XA operating system. TSO and VTAM are used for support of online processing and RACF is used to manage system security. Transaction volumes related to IMS are of the order of 105,000-135,000 per day and in the vicinity of 500,000 per day for TSO transactions. It was estimated that up to 60% of the TSO transactions reflected applications under
development.
Using the Service Level Reporter facility, throughput is monitored in an endeavour to meet service
level objectives relating to availability and response time. Stated policy was to service a TSO transaction in less than 0.25 second and to process a 'simple' transaction in less than 4 seconds for Australia, and in less than 5 seconds for New Zealand. It was believed that these objectives were being achieved at least 90% of the time, however it was readily acknowledged that the definition of a 'simple' transaction left considerable room for manipulation of the statistics.
In an analysis of information systems and strategic business objectives (functionally the responsibility
of the long range planning group) it was established that A.M.P. would need the ability to integrate
(at a systems level) functional business areas. Life Insurance, and Fire and General Insurance for
example, had developed as separate business areas and whilst both used IMS based applications
they maintained independent but isolated databases. A query, for example "What types of insurance
does customer X have with us?" could not be answered without initiating a separate enquiry on
each functional business unit (database). This had not been regarded as a disadvantage until it was
established (through strategic planning) that the Insurance industry would move towards client based
insurance packaging as opposed to product based packaging. In order to provide client 'bundling'
of insurance it was then evident that a restructuring and integration of the underlying information systems would be necessary. Unfortunately, with IMS databases this post-hoc integration presented significant technical challenges. When this was combined with the minimal flexibility expected from such integration (and the rapid changes in insurance marketing and consequently in the information systems) it was determined that another alternative was needed. Relational database was seen as a solution.
Largely in response to those integration requirements IBM's relational package, DB2, was placed under implementation review. Results from this review showed that DB2 would become a critical systems tool in the development of strategic information systems. Expectations were that most new applications would be implemented under it with the exception of time critical transaction processing systems. Where they exceeded DB2 performance limits such applications would continue to be implemented under IMS. [Due to the absence of conversion plans for current IMS applications the role
of IMS was anticipated to remain dominant in the short to medium term].
With respect to the performance impact it was anticipated that a DB2 implementation would require
25% more CPU cycles than an equivalent IMS implementation. [This figure corresponds to that quoted
by IBM for release two of the package]. The impact on online disk storage had not been quantified
but was also expected to require a substantial increase in resources. Concern was expressed over the
availability of DB2 query functionality at an end user level because it was believed that the related
throughput impact would be immense unless strict controls were enforced.
11.4 Data Modelling
Parallel to developments in the hardware and software environment were changes in the systems
analysis and data modelling methods employed. [Causation appears to flow from the introduction of
new systems software to new analysis procedures]. At the time of the arrival of the UNIVAC machine
in the early seventies, the concept of data modelling, at least at A.M.P. if not universally in commercial
installations, was non-existent. Design of file structures was a 'black art' in which the 'expertise' of the
analyst was the critical factor in successful systems design. Furthermore, machine efficiency rather
than design flexibility was an overriding concern. This situation prevailed throughout the life of the
UNIVAC despite the (troubled) introduction of a network DBMS. Data analysis was only recognised as a standard activity upon the return to an IBM environment and the installation of IMS.
In vogue at this time was the Entity Relationship (ER) modelling technique, which was adopted as a standard for the systems analysis and design phases. With this technique analysts, utilising intuition and 'observation', would select entity categories. From here a normalisation process (Codd) was applied to produce a logical design. Typical comments on the method (with the benefit of hindsight) were that it relied too much on staff expertise. This had led to 'less than optimal' results in which design errors would often only be identified after implementation. The consequences were inflexible systems and unsatisfied users. A factor believed to have played a large part in the design problems was the gap which existed between users, who understood the application (and the data with implied semantic
relationships) but little technical detail, and the analyst, who understood the technical aspects but little
of the users business knowledge.
As such the problems experienced at this time were explained as one of communication and not due
to fundamental flaws in the data modelling methods chosen. Applied rigorously, these techniques,
forming the basis of top down modelling, should result in the same data model as a bottom up
approach as represented by binary modelling. This requires the important assumption however, that
designer understanding of the application will not be significantly influenced through choice of these
alternative modelling approaches. [This is equivalent to saying that the fact base should be the same to
produce logically comparable models]. However, there is reason to doubt that this assumption holds in
practice because binary modelling and analysis as employed through NIAM, facilitates communication
in a way not provided by ER modelling and analysis.
ER modelling was used as the major modelling tool until December 1984 when NIAM became the
mandatory data modelling standard. Unlike the introduction of ER modelling, NIAM preceded a major change in systems software - the arrival of the relational database DB2. This can probably be
attributed to the ad-hoc introduction of NIAM, brought about through the efforts of a contract analyst
who had previously worked with Professor Nijssen (the developer of NIAM). A formal search for a new modelling method was never initiated; nevertheless, NIAM concepts rapidly found acceptance among Information Systems management and a pilot project was initiated to test it. Based on the success of this pilot, a major application was commenced and the systems analysis phase subsequently restructured to incorporate NIAM.
Currently A.M.P. data modelling procedures describe 14 discrete steps in the NIAM method which guide an analyst through data analysis and design. Commencing with a base set of facts describing the information system, an induction process is followed, resulting in a "syntactically and semantically
expressive" conceptual schema. The major parts of the schema are presented in graphical form and are reviewed by users during the design phases. System documentation includes the conceptual grammar, which provides a formal description of the application, and the schematic diagrams. The
following section describes the points at which NIAM has been integrated into the systems lifecycle.
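A flavour of the fact-based induction NIAM prescribes can be sketched as follows; the example facts and names are invented for illustration and are not drawn from A.M.P.'s procedures:

```python
# Hypothetical NIAM-style elementary facts: each fact is a binary
# relationship between two named objects (the 'binary construct').
# (object type, instance, role, object type, instance)
facts = [
    ("Policy", "P-1001", "is held by", "Client", "J. Smith"),
    ("Policy", "P-1002", "is held by", "Client", "J. Smith"),
    ("Policy", "P-1001", "is serviced by", "Agent", "R. Jones"),
]

# Induction step: generalise the example facts into fact types
# (object type, role, object type) - the skeleton of the schema.
fact_types = sorted({(subj_t, role, obj_t)
                     for subj_t, _, role, obj_t, _ in facts})
for ft in fact_types:
    print(ft)
```

The conceptual grammar then records constraints over these fact types (uniqueness, totality), while the schematic diagrams present the same fact types graphically for user review.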
11.5 Systems Lifecycle
The systems lifecycle at A.M.P. commences with business systems planning (BSP) conducted by the
strategic planning group. Ultimately this group is concerned with forecasting future business directions
and establishing the role data processing technology will play. The process begins with a review
of business units (up to 5 year projection) noting future requirements and potential products and
applications. A generic, or macro data model is then prepared utilising an ER modelling technique.
Based on this portfolio of potential applications, feasibility studies are conducted. Contingent on the
results, and subject to user approval and resource availability, structured analysis then commences.
SDM-70, a lifecycle development package is used to prepare systems requirements documentation
(SRD). The SRD incorporates data flow diagrams and utilises a data dictionary. It is at this stage that
data analysis and data modelling commence resulting in the development of a full logical design. User
involvement in the modelling phases is mandatory.
In the next phase, Systems Design Alternatives (SDA), physical design issues are considered. When
the desired alternative is generated System External Specifications (SES), corresponding to the physical
database design, and Systems Internal Specifications (SIS), corresponding to the programming phase,
are prepared. The development process is completed with the implementation specification.
A diagram of the database design process which corresponds to a subset of SES is reproduced in
appendix E. Binary data analysis (NIAM) is represented by the phase 'Application Data Analysis'.
Input for this phase is the generic entity relationship data model of the organisation and business
information structure detail. The business information structure detail has in turn been derived from
the Information Systems Architecture model. It comprises an analysis of the application in terms of batch, online and cyclic transactions. As well as being used for logical data modelling the structure
detail is used to map transaction type usage statistics onto the physical model to determine access
strategies and keys.
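The mapping of transaction usage statistics onto the physical model can be sketched as follows; the transaction names, attributes and volumes are invented for the example, not taken from A.M.P. project data:

```python
from collections import Counter

# Illustrative transaction usage statistics for a physical design:
# (transaction, attribute used for access, estimated daily volume)
usage = [
    ("enquire-policy", "policy_no", 40_000),
    ("enquire-policy", "client_no", 9_000),
    ("post-premium", "policy_no", 25_000),
    ("agent-report", "agent_id", 1_200),
]

# Aggregate volume per attribute: the highest-volume attributes are
# the candidates for keys and indexed access strategies.
load = Counter()
for _txn, attr, volume in usage:
    load[attr] += volume

for attr, total in load.most_common():
    print(attr, total)
```

Under these figures `policy_no` dominates the access load and would be chosen as the primary access key, with secondary strategies considered for the lower-volume attributes.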
As a result of the binary modelling process an Application Data Model in relational form is produced.
This is then rationalised to determine what will be implemented in conformity with the project scope.
A system or Implementation Data Model is thereby produced. When transaction statistics are mapped
to it, a physical data model results. It is at this point that the actual design process departs from that
depicted in the diagram. Since the introduction of binary modelling the need to conduct a second
rationalisation has been obviated. The use of the relational DBMS, DB2 has made the physical data
model directly implementable. The procedure as drawn remains current for IMS implementations.
11.6 Data Modelling Experiences
In the interview with David Nash and two staff analysts an attempt was made to identify the advantages
and disadvantages of binary data modelling. In the latter respect there was little success. Without
reservation, it was felt that binary modelling had been implemented smoothly and had been a success.
There was not seen to be a limit to its use (taken to mean that project size was not a restricting factor)
although, it was conceded that it would not replace the role of ER modelling at a corporate level (i.e.
derivation of a generic data model and strategic modelling). Interestingly, the major advantage was
seen to be that binary modelling 'de-mystified' analysis.
Whilst no direct evidence (in the form of project records) could be found to support the statement,
it was thought that the surge in user involvement had been directly related to the introduction of the
simplified data analysis process (NIAM). Analysts were viewed as using a technique to which users
could relate with minimal training. Due to the enhanced communication between these groups the gap was effectively being closed on differences in problem comprehension.
The analysts were of the opinion that the technique was much more rigorous in producing a systems
model than had been possible with 'conventional' techniques. It allowed for early problem diagnosis
and forced evaluation of the 'conventional wisdom' or assumptions. When these assumptions were
carefully considered it had often necessitated management (functional) involvement. This ultimately
led to tighter specification of user requirements. A significant benefit was seen to be the forward
planning this had forced on management (functional). In addition, owing to the self-documenting
nature of the design process the quality and integrity of the documentation had improved.
On the basis of the major application analysis results, it was felt that users had contributed up to
50% of the analysis effort and, that this could be increased. Countering this effect on DP workload
had been an increase in the effort associated with analysis, perhaps of the order of 100%. It was felt
strongly however, that the real rewards would accrue during implementation, to user satisfaction, to
reduced maintenance and to an extended system life.
In summary data modelling had achieved:
• a shift in development responsibility to user departments
• a shift in the analysis workload from DP to users (although possibly in percentage terms only)
• an increase in management involvement and thereby improved management planning
• improved documentation quality
• extended system life and reduced lifecycle costs
11.6.1 User experiences
Having established management and analyst perspectives on binary modelling it remained to confirm with users how well binary modelling had been accepted, and of the experiences relating to its use.
For this purpose, an interview was organised by David Nash with 'Steve' an experienced user who had played a key role in the development of a new 'agents commission' system. This was the first large project in which NIAM had been used.
Steve had been seconded to head office from the position of the Departmental Head of the N.S.W.
Commission Branch in October 1983. He was assigned the task of writing a user manual for the then
current commission system. At this time Steve had no prior development experience but had been
selected for the task on the basis of procedural familiarity with the system. After having accomplished this task in two months he was offered the position of user representative on the new commission
system design team.
The agent commission project began with 6 staff. In the final development phases 72 people were
involved. It was anticipated that the system would have a strategic business impact and consequently
the budget was set at a lofty $10 million. The scope was initially very wide; 'build a replacement and
automate new areas of related business'. The result of this was a business model with 'too much'
data. Subsequently, the design was cut back for implementation. It was speculated that this had been
a deliberate strategy.
In May 1984 project managers became involved and the system was split into four with a group of
technical and user staff associated with each sub-system. This initial analysis phase was conducted
with a group of two technical staff and two users. Steve remained firmly in the user camp assisting with
such tasks as report and screen design. The analysis effort was described as '12 months of continuous
meetings' most of which were conducted as brainstorming sessions. Minutes of the meetings were
logged by the technical staff.
At the commencement of the analysis phase a two day training session was conducted for analysts and users unfamiliar with binary modelling. Steve unfortunately missed this initial training and 'went
in cold' to the first analysis meetings. As a result it took a while before he felt familiar with data
modelling concepts. With usage, he found the technique easy to master and a valuable aid in systems
documentation (representation) and user analyst communications. Steve was unable to comment
on the merits of binary modelling relative to other approaches because of his limited exposure to
systems development however the concept had been 'easy to grasp'. Whilst not demonstrating the
same enthusiasm as the analysts he seemed content with the technique and believed that good results
had been achieved:
'Without it the results would not have been as good. We would not have identified the key areas as
early.'
Commenting on the project as a whole he expressed discontent over the power problems of group
interaction. He believed that a group of no greater than 5 should be used, 4 being optimal. In his view this
would be best formed by 3 user representatives and a systems analyst.
11.7 Conclusion
A.M.P. represents a mature organisation in terms of its data processing history and current state.
Its experience with database management systems dates to the mid-seventies and it has continued to
remain abreast of DBMS technology. In conjunction with expansionary policies in hardware acquisition
and in productivity based software tools A.M.P. has focused attention on the data modelling and
requirements analysis issues. From a 'back door' introduction, binary data modelling through NIAM
was adopted as a standard for the systems analysis phase.
As a response to marketting pressures which demanded integration of the business, A.M.P. embraced
relational database technology. This further reinforced the trend towards a 'data focus' in the devel
opment of information systems and naturally on the data modelling task itself.
In the case study the evolution and introduction of this method from a management, analyst and
user perspective was described. Considerable support was found in each of these groups for the
concept and practice of data modelling. Systems analysts found NIAM to be a clear and consistent
(de-mystified) method for use in the analysis and requirements specification phases. These phases were emphasised ahead of its usage in the logical design phase, notwithstanding that through iteration
the distinction between phases blurs somewhat. Nevertheless, NIAM's claim as a method which
supports the first three phases of the information systems lifecycle was clearly supported.
Enhanced user analyst communication, with the ability to fully involve users, and good systems representation tools were seen as fundamental to the success of NIAM. These advantages are in line with
predictions from the theory which touts graphical notation and the simplicity of the binary construct.
Users verified the ease of learning and contrasted their extensive involvement in the development process after NIAM's introduction to their minimal involvement under the traditional analysis approach
of which ER modelling had been a part.
From the belief (expressed by analysts and systems management) that NIAM specifications were better
developed and would result in reduced maintenance and lifecycle costs, indirect support was found
for system quality, as measured through design convergence, consistency and completeness (section
9.1.6). Quantifying gains was beyond the scope of this limited study; however, it would appear that a
measurable impact on productivity and lifecycle costs could be found, warranting closer examination
in future research.
Abstraction support and semantic expressiveness (strong theoretical advantages of NIAM) were not
mentioned by the analysts. If present, these advantages would be expected to become dominant with
growing systems complexity. A multi-project case study in this environment (with varying complexity)
would be necessary before conclusions could be drawn on this aspect.
CHAPTER 12
DIGITAL EQUIPMENT CORPORATION
12.1 Introduction
The third case study examines the development and application of data modelling in a high technology manufacturing organisation. The company, Digital Equipment Corporation (International) Kaufbeuren, located in the 'German Silicon Valley' near Munich, is a subsidiary of the multinational of the same name headquartered in Maynard, Massachusetts, U.S.A.
Digital is representative of 'leading edge' high technology corporations. Whilst predominantly a manufacturer of computer hardware systems it also develops advanced systems and applications software
in support of its business operations and objectives. As a consequence, Digital has evolved into a
sophisticated user of software development methodologies, a trend which is certain to continue as
increasing resources are directed at this segment of operations.
There are two major objectives which flow from this. The first is to examine the evolution of systems analysis and data modelling techniques at a corporate level and the second is to examine the
success of these changes (at a local site level). The expectation is that Digital, through definition of
its (information systems) business requirements will become an influential body in data modelling
theory development. This is because the feedback obtained in the process is used for the ongoing
development of the Digital systems lifecycle methodology and ultimately this exerts an influence on
theory development.
The following specific issues were investigated:
• the corporate and local business domains
• history of systems and data analysis including their evolution and the reasons underlying the
changes (at a corporate level)
• description of the systems development process and data modelling phase as a methodology
• applications of data modelling (to Kaufbeuren projects)
12.2 Corporate Environment
Digital Equipment Corporation lays claim to being the world's leading supplier of networked computer systems. It has operations in 24 countries and a workforce of 112,000. Annual sales as of April 1988 amounted to $11.3 billion and were growing at the rate of 20% per annum. Market competition is intense as evidenced by the number of new product announcements. Industry price/performance ratios, however measured, are under constant surveillance and constant pressure.
In this environment, survival and growth require the ability to research, develop and apply technology rapidly. As the corporation has grown and the computer market matured this requirement has come to embrace software equally as much as hardware. Significantly, the strategic 'system' advantage claimed by Digital is based largely on interconnectivity. This derives from the hardware and operating system architectures and related systems software.
Recognition of the importance of systems and application software, including languages, development tools, database products and third party applications, has led to greatly increased effort and resource allocation in both the procedural and technical domains of software development. The objectives for internal, but especially external applications of software have been to provide:
• Fast response to market trends (proactive) and specific customer requirements (reactive)
• Improved reliability of products
• Reduced development and maintenance costs
• Evolutionary approach to software products embracing a 'release' concept
• Enhanced communication capabilities between applications through electronic data interchange
In line with these objectives in the software domain, Digital is continually examining techniques and methodologies through which they can be realised. Software quality and development productivity have been targeted for improvement through the following measures:
• provision of automated tools in support of requirements definition, data analysis and data modelling (the tools represent a combination of internally developed and externally contracted products depending on availability and strategic requirements)
• provision of research and development funding for technical and architectural design of distributed information systems
• provision of a wide variety of tools for data management from relational database products to
fourth generation languages
• provision of extensive internal training on data analysis, data modelling and systems design
stressing the concept of data independence
• support of the Computer Aided Systems Engineering (CASE) project
12.3 Local Environment
Digital Equipment Kaufbeuren was established as Digital's first manufacturing site on the European continent in 1977. Its charter is volume production of high-end mass storage products to supply
European demand and to act as a second source for the United States and Group International markets.
Kaufbeuren has a sister plant located in Colorado Springs, Colorado, with which joint projects in storage technology are undertaken.
With a mission to be the 'European Storage Centre of Excellence' the original manufacturing operations have been supplemented by the formation of process and product engineering departments. These were introduced to provide incremental storage systems engineering capability, to improve quality,
and to assist field service operations. Since the formation of these departments the percentage of the
Kaufbeuren workforce employed in engineering functions has risen to 25%, representing 200 people.
The manufacturing processes consist of a combination of precision, yield-sensitive assembly operations conducted in a clean room environment (Head Disk Assembly) plus circuit configuration,
electronic testing and end product configuration. Workflow (completion) data, and process test data
are collected at all stages of the manufacturing process. The data is used to support scrap/rework
decisions, failure diagnosis and process engineering. Scrap/rework and failure diagnosis at manual workstations are supported by 'expert systems' programmed in the language OPS-5.
Data collection systems and process control systems are currently being integrated under a corporate sponsored project, Computer Integrated Manufacturing, CIM. Some of the systems implemented in
Kaufbeuren under this project include:
• TDC - Test Data Collection
• CAPS - Computer Aided Process Support
• ASRS - Automated Storage and Retrieval System
• MAXCIM - integrated financial, inventory control and manufacturing planning package
With the exception of MAXCIM, which is an external product maintained and enhanced by Kaufbeuren (source code supplied), all systems have been designed and developed by engineering and information systems groups within Digital. TDC was developed by Colorado Springs. CAPS is a joint
Kaufbeuren, Colorado Springs project and ASRS is exclusively a Kaufbeuren project. All of these systems have, or will have in the near future, a transaction interface available to MAXCIM under a recently commissioned Electronic Data Interchange (EDI) project. A key objective of these projects has not only been to improve productivity and control of manufacturing operations but to demonstrate to customers the application of Digital systems to the manufacturing environment.
12.4 Methodolgy review
Systems life cycle methodologies were subject to review at the corporate level by the Digital Information Systems group in 1984. At this time it was found that the existing system life cycle methodology was obsolete because it failed to provide adequate support in the following areas:
• technical architecture (representation thereof)
• data management
• data modelling
• prototyping
The Digital Standards Group was subsequently asked to develop a requirements specification against which external or internally developed life cycle methodologies could be evaluated. A systems life cycle review team was then formed with representatives of the Technical Management Committee and Data Management Committee.
During the course of the project twelve system life cycle packages were reviewed of which four were selected for presentation to the review group. From this process the life cycle package from DMR
Group Inc. was selected for field testing. The field tests involved one new application development and two replacements of existing applications developed for older hardware. The feedback from all three field tests suggested that DMR's life cycle was beneficial. Within Digital the methodology was then recommended for development use on all new projects. Rights to the package were purchased and a commitment made for the provision of training, documentation and support worldwide. In the following section an overview of the methodology is presented.
12.5 Systems Analysis
The DMR methodology is a systems development package which incorporates structured techniques with integrated data and process modelling phases. At the macro level, the methodology is not unlike
the traditional systems lifecycle model as defined by Wasserman (section 3.6). In documentation
and training a heavy emphasis is placed on 'information' engineering concepts to ensure that system
development is data orientated. The methodology explicitly supports three development approaches:
• Traditional development
• Prototyping
• Package selection
After an initial project evaluation is complete, one of these methods is selected for development, although combinations, for example of traditional and prototyping approaches, are possible. A different
set of tasks exists for each approach, but all have six phases in common (see figure 4). The primary
concepts are:
• Structured decomposition through a hierarchy of data and process models
The approach adopted by the DMR methodology to systems development (and reflected in the
lifecycle phases) conforms to the 'conventional' approach of top-down development. Decomposition is extensively used. The ISO reference model was used as a base when developing the methodology and, reflecting this, strong support has been provided for the conceptual, functional and physical levels. (The specific concepts and techniques employed are treated in depth in the following section.)
• Release orientated development
This is based on the principle of developing a system architecture and then partitioning the system functionality into releases which are progressively developed. Each release is a functioning application. This reflects a management (and marketing) strategy behind large software projects.
Release orientated development also provides implicit support for prototyping.
• Project management by deliverables
By emphasising 'deliverables' the methodology focuses on the end products of a team's effort rather than the process by which it is accomplished. The methodology represents a generic approach to systems development which makes it applicable to a wide variety of projects. Consequently, specific techniques, methods and tools, when mentioned, are not tightly coupled to the methodology. This has allowed Digital to continue to use proprietary tools and techniques and to upgrade or introduce them as they are developed, and as needed. The possibility then exists to meet highly variable development requirements whilst enforcing uniform development and control concepts. (Tools for representation of the technical architecture are supported in this way.)
12.6 Modelling and Partitioning
Information systems, in the DMR methodology, are regarded as a composition of structural and
procedural elements. Accordingly, two types of models are employed to analyse and define them:
• models of data and their interrelationships
• models of processes and their interrelationships
Figure 4: DMR Systems Lifecycle
1. Opportunity evaluation
   - define the problem
   - evaluate the appropriateness of a preliminary analysis
   - prepare a project proposal (if appropriate)
2. Preliminary Analysis
   - analyse current system
   - define system context and objectives
   - build the conceptual data model
   - build the conceptual process model
   - establish basic system concepts
   - describe external design alternatives
   - translate selected alternatives into basic systems concepts
   - build the functional process model
   - determine technical feasibility
   - perform cost/benefit analysis
3. Systems Architecture
   - complete conceptual data model
   - refine functional process model
   - define system performance criteria
   - define environment, technical standards and data processing
   - outline physical process model
4. Functional Design
   - develop implementation plan
   - build the functional data model
   - detail the functional data model
   - detail the functional process model
5. Systems Construction
   - build physical data and process models
   - prepare test environment
   - conduct functional tests
6. Implementation
   - install system
   - conduct systems tests
   - start production
   - evaluate system

Management of complexity is handled through partitioning these models and providing the conceptual, functional, and physical hierarchy as reflected in the systems lifecycle.
DMR provides for processes and data to be modelled and partitioned in different ways. Processes are defined and grouped according to their objectives. Their purpose is to perform functions or to transform data. In order to minimise overall process complexity they are partitioned or decomposed into increasingly elementary functions with (an objective of) minimal interaction.
Data are defined and grouped according to their subject or meaning. Data complexity is minimized when an object or event of interest is unambiguously defined and when a minimum of data is required to interpret it and access it. A hierarchy of data models can be built by aggregating or adding subjects at increasing levels of detail.
With processes and data being modelled by two different techniques with different structures, the boundary between the models is made distinct. Nevertheless, the models are interdependent. Each process manipulates a certain set of data and has a certain view of the objects which the data represents. Conversely, each data element is used by a variety of processes. The data must fit the view of each process and the processes must treat each data element consistently. This implies that at each stage of modelling synchronisation is required. This is a function of the system architecture.
12.6.1 Conceptual Modelling
At the conceptual level, DMR defines the processes to be carried out and the interpretation of the data.
The conceptual level reflects management strategies for operating the business independently of the way the system will function or the equipment on which it will run. Entity-relationship diagrams (as
described by Chen, 1976) are used to provide a graphic representation of the conceptual data model.
In DMR the conceptual data model is defined as:
"a representation of the objects or entities about which an information system collects, stores or
produces data; of the associations or relationships occurring among entities when the system causes
or responds to an event, and of the attributes of those entities and relationships"
The DMR/ER technique utilises stepwise decomposition. Firstly a macro model of the system is
developed showing the relationship to other systems and major entities at the organisational level.
This is the context data model. Entities of immediate interest to the developing system are then
grouped into subject data bases (domains). The subject data bases so formed may then be modelled
in more detailed entity-relationship diagrams or with binary modelling tools.
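The structure of such a conceptual model can be sketched in code. The following is a minimal illustration only: the entity, attribute and relationship names are hypothetical inventory examples invented for this sketch, not taken from the actual Kaufbeuren model.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """An object about which the system collects, stores or produces data."""
    name: str
    attributes: list

@dataclass
class Relationship:
    """An association among entities; may carry attributes of its own."""
    name: str
    participants: list
    attributes: list = field(default_factory=list)

# Entities of immediate interest, grouped into a hypothetical 'inventory' subject area
part = Entity("Part", ["part_no", "description"])
location = Entity("StorageLocation", ["bay", "level"])

# A relationship carrying its own attributes (lot number, quantity on hand)
stored_in = Relationship("StoredIn", ["Part", "StorageLocation"],
                         ["lot_no", "quantity"])

print(stored_in.participants)  # → ['Part', 'StorageLocation']
```

A more detailed subject-area model would add further entities and relationships in the same fashion, mirroring the stepwise decomposition from context model to subject data bases.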
The conceptual process model is represented graphically by data flow diagrams (as described by Yourdon). It contains only logical level detail; that is, the model depicts only relationships between data
and processes independent of the methods or tools employed to transfer data or execute processes.
As with the conceptual data model the process model is structured as a hierarchy. A context process model is first developed which is then decomposed into subsystems and functions.
DMR defines the conceptual process model as:
"a representation of the data flows describing situations or events to which the system responds, of the functions or processes that are stimulated by the data flows and produce ths system response, of the external entities of the system's environment acting as sources or links of data flows and of data stores holding the data the system needs in order to respond to events"
Both definitions are consistent with the ISO definitions.
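A data flow diagram of this kind reduces to a set of named flows between nodes. The sketch below is illustrative only; the node names are hypothetical inventory examples, not drawn from the case study's actual diagrams.

```python
# Each flow links a source node to a destination node. Node kinds follow the
# Yourdon conventions: external entity, process, or data store.
flows = [
    ("Warehouse Clerk", "Record Receipt"),       # external entity -> process
    ("Record Receipt", "Inventory Store"),       # process -> data store
    ("Inventory Store", "Report Stock Level"),   # data store -> process
    ("Report Stock Level", "Materials Planner"), # process -> external entity
]

def flows_from(node):
    """Return the destinations reachable directly from a node."""
    return [dst for src, dst in flows if src == node]

print(flows_from("Record Receipt"))  # → ['Inventory Store']
```

Because only the flows are recorded, and not the methods or tools used to move the data, the representation stays at the logical level the conceptual model requires.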
12.6.2 Functional modelling
The functional model describes the behaviour of processes, their interaction with each other and the paths they use to access the data. The functional model is also a representation of the way the system will interact with the environment. Technical details are user transparent. The functional model
equates to the external model as defined by ANSI/SPARC.
The functional process model is based on the conceptual model. Using an iterative partitioning process
(described at the conceptual level) the functional model adds detail through inclusion of organisational
and geographical structure, work methods, automation guidelines and implementation strategy. Data
flow diagrams are supplemented by narrative descriptions of the process logic.
The functional data model includes record, data element and access path detail. It is the conceptual
data model enhanced by access path information, the limitations of the DBMS available, automation
guidelines and efficiency considerations. The functional model, as defined by DMR, is consequently
navigational when non-relational data base management systems are used. Optimisation is based
on qualitative and quantitative factors. Qualitative in the sense of considering geographical distribution, recovery, and required level of data independence, and quantitative in the sense of transaction
volumes and storage considerations.
The functional data model is represented by data structure diagrams which show a record as a rectangle, and a link as an arrow. The links between two record types specify the maximum and minimum number of occurrences which can be associated with the binary relationship. These maximum and minimum occurrences are used to guide the process of record formation in a fashion similar to that of
NIAM.
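Such a diagram can be approximated as record types plus links annotated with (minimum, maximum) occurrence constraints. The record names and cardinalities below are hypothetical, chosen only to illustrate the idea.

```python
# Record types and their data elements (illustrative names)
records = {
    "Order": ["order_no", "date"],
    "OrderLine": ["line_no", "part_no", "qty"],
}

# Each link: (owner record, member record, (min, max) members per owner).
# None stands for 'many', i.e. no upper bound.
links = [("Order", "OrderLine", (1, None))]

def check_link(member_count, link):
    """Verify an owner's member count against the link's (min, max) constraint."""
    lo, hi = link[2]
    return member_count >= lo and (hi is None or member_count <= hi)

print(check_link(3, links[0]))  # an Order with 3 lines satisfies (1, many)
print(check_link(0, links[0]))  # an Order with no lines violates the minimum
```

Checking candidate record groupings against these constraints is one way the minimum and maximum occurrences can guide record formation.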
12.6.3 Physical modelling
The physical model used by DMR describes the internal processes and data structures used to build the
system. It represents the technical organisation of the system and corresponds to the ANSI/SPARC internal level. Data and process models are represented by the record layouts, environment parameters,
and program structure charts respectively. The detail required at the physical level varies depending
on the implementation environment. For a system developed in a relational database environment with a high level programming language the required detail is significantly reduced in comparison to
an environment with a network database and low level language.
12.7 An inventory application
Business planning conducted at a corporate level in the early 1980s identified the need for steady
but significant cuts in inventory levels through all stages of the manufacturing process, reductions in
product cycle times (from date of receiving a customer order to date of shipment) and a commitment to
principles of Just in Time (JIT) and Total Quality Control (TQC). These were seen as critical responses
to intensifying market competition.
One aspect of the response by Kaufbeuren was to investigate means of improved inventory control
through an Automated Storage and Retrieval System (ASRS). This required the installation of high bay
storage units for location and lot controlled pallet storage of component parts and work in process.
The objective was to reduce storage space requirements to 25% of the former level and to provide
greater inventory visibility and hence inventory control. Furthermore when linked to an Automated
Guided Vehicle (AGV) material movement system it would allow fully automated material flow on the
production floor.
The system architecture was designed by the MIS group in Kaufbeuren. Flexibility and modularity were two important criteria as the system would be required to interface with present business planning and control systems (MAXCIM) and with future material transport systems. A macro model of material flow was prepared by a cross-functional team representing material planning, process engineering
(layout), advanced manufacturing technology (physical material flow) and management information systems. Conceptual data and process models evolved from these group meetings over a 12 month period. Entity relationship modelling, binary data modelling and data flow diagrams were used as documentation and group communication tools.
Parallel to the conceptual planning work, a sub-committee was formed to address the priority requirements of the ASRS system. Using the inventory partition of the evolving conceptual model the functional specifications were developed, consisting of four major components: a MAXCIM interface, an ASRS control module, an AGV interface and the underlying hardware module at the physical level.
Physical design and development of the interfaces was undertaken by the MIS group in Kaufbeuren whilst the ASRS control module was developed, based on Kaufbeuren functional specifications, by
external contractors. Data analysis and database design phases were conducted for the first time using
data and process modelling as defined by DMR. Elapsed time for the functional modelling phase was
approximately 8 months during which several versions were generated.
At the functional level both process and data modelling were relatively complex due to the variety of
transactions possible and the integrity requirements. Transaction variety stemmed from the need to
cater for storage and control of inventory with lot, location and quality characteristics. This last factor
was particularly significant because of the need to process test engineering, quality engineering and
material rework transactions with resultant samples, returns and rejections. Data requirements were
complex for this reason but also because of the requirement to 'track' components in 'downstream'
manufacturing operations from source data.
With these characteristics the project represented a good test application for the methodology and
modelling techniques. Prior to DMR, an in-house 'methodology' had been used which had provided
guidance in project management and had also provided tools for development. Missing, however, were specific techniques for the design and analysis phases.
In the final system configuration some 10 major transaction types were documented which would
pass data through the MAXCIM/ASRS interface. A similar number of transactions in each component
system were also identified. Some 20 MAXCIM files representing 280 data elements were impacted
and 5 RDB databases in ASRS representing 34 data elements resulted. Binary data modelling was
used for analysis, documentation and design to produce a functional data model. This model was
then 'rationalised' to meet the performance requirements of a process control environment.
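At its core, binary modelling of this kind records elementary facts, each relating exactly two object types through a named role. The facts below are hypothetical illustrations in the spirit of the inventory domain, not the project's actual model.

```python
# Elementary binary facts: (object type, role, object type)
facts = [
    ("Part", "is stored in", "Location"),
    ("Part", "belongs to", "Lot"),
    ("Lot", "has quality status", "QualityStatus"),
]

def roles_of(obj):
    """All roles in which an object type participates as subject."""
    return [(role, other) for subj, role, other in facts if subj == obj]

print(roles_of("Part"))
# → [('is stored in', 'Location'), ('belongs to', 'Lot')]
```

Because each fact is irreducible, the analyst can examine and validate assumptions one fact at a time before grouping them into records, which is the 'details to generalities' character of the approach noted later in this chapter.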
12.8 Modelling experiences
The systems analyst representing MIS was assigned responsibility for development of the functional
and physical level models. As preparation, a two week course on data (binary) and process modelling
was undertaken. The analyst had no prior exposure to data modelling techniques but had process modelling experience through data flow diagrams.
The cross functional team previously mentioned met on a fortnightly basis for 2-3 hours alternating
the discussion between business analysis and systems design reviews. Data analysis training was not
extended to the users as it was felt that the concepts and techniques could be explained through usage
and under analyst guidance. The users assumed responsibility for eliciting the 'business' (functional)
data model. Design documentation was largely generated as a byproduct of these meetings.
The analyst's comments on the early business analysis sessions indicated that some problems had been experienced by the users in accepting the technique. This was believed to be because the users were familiar with 'transactions', 'processes' and 'procedures' but not with the concept of a low level data
orientated view of their business. As a result more structure was required in those first sessions to
guide the users and to prompt for 'details' of data and the relationships. In later sessions a 'free
association' approach was followed as the users became familiar with the purpose and direction of
data analysis and binary modelling. Based on this experience the project team agreed that future teams
should attend some preliminary data modelling training. Communication between team members was nevertheless at a high level. At the end of the project the Materials user commented:
"We were able to discuss the business issues and data flows in a a structured manner but free of the normal systems issues and technical jargon. This allowed us to feel comfortable with the modelling process and to develop a sense of data ownership."
That opinion was further supported by the analyst who confirmed that prior to binary data modelling
MIS had "struggled to maintain user participation" in the crucial analysis and design stages of projects.
It was theorised that ER modelling had been too complicated for the casual user and that unlike binary modelling it was perceived as systems work.
Design of the data models was conducted by the analyst outside the regular meeting times; however, as they developed, the models were subject to regular review and the process of modelling explained
such that users were made aware of the impact of their assumptions and decisions on the model. An
interesting decision was the use of Entity-Relationship diagrams for representation purposes. Justification for their use was that binary modelling had shown its strength in support of data analysis and
modelling but that a conceptual model in ER format was easier to understand.
"We wanted user involvement and a data focus. We also wanted to re-examine without prejudice our
data assumptions. A bottom-up approach achieved this, however for final representation and wider
(user) review we believed ER diagrams suited our purposes best."
General user comments on the modelling process mostly reflected satisfaction with the level of participation. This prompted a feeling of greater control. "Detailed examination of the data also forced
us to review our business practice and perhaps to see opportunities for change which had not been
identified at the outset of the project." It was felt that a role existed for the more traditional modelling
techniques (with which ER was associated) in the domain of macro (business) modelling but that especially with functional modelling benefits had been realised with the 'details to generalities' approach of
binary modelling. The major problem was seen as the tendency of binary analysis to "exceed project
boundaries" as indicated by some discussions which had gone off track during the 'free association' sessions. Tight control over scope was therefore felt important in preventing an 'all-encompassing' project with 'never ending' analysis.
In summary, the benefits attributed to the introduction of data and process modelling techniques encompassed:
• design verifiability
Project participants, including the external consultants, users and development team members
were able to examine the assumptions underlying the data modelling (data relationships as reflected in participations, for example) to determine the validity of the design. Binary data modelling provided a means of examining the 'conventional wisdom' in a critical and rigorous manner.
• enhanced problem understanding
The development team was able to use the documentation generated from the binary modelling
phase to come to a common understanding and agreement on the business issues and problems. Such agreement was essential for the coding and testing phases, particularly as the design
specifications formed the basis of the contractual agreement with the external vendors.
• management of complexity - data orientated system
Rather than define the functions first then fit (or adjust) the data model as required, the data model
was completed in conjunction with the functional model. This approach produced a simpler (better
defined) process model as reflected in the resulting manual and computer procedures.
Indirect benefits included a better working relationship between project participants (through enhanced communication) and improved project control (due to a heavier investment in analysis and design in the planning stages). For a larger project the benefits were expected to be greater in this area. Testing and implementation were also seen to have been eased due to the clear definition of
functions and responsibilities which existed.
12.9 Conclusion
Digital Equipment Corporation represents a leading edge technology firm, one which is dominant in the hardware and software realms of the minicomputer and workstation market. Strategic business requirements indicated the need for proprietary relational database systems and associated tools.
These were subsequently developed; however, in order to promote effective and efficient usage of these products, it was recognised that existing lifecycle methodologies would require change. In the lifecycle reviews which followed, process and data modelling were identified as key techniques to be supported.
The case study looked at the Digital environment, the systems lifecycle methodology review process and the use of data modelling in an inventory application. It was seen that a combination of data modelling techniques was used, ER for macro modelling and model representation purposes, and binary modelling for the data analysis and functional/conceptual modelling. Such a result is interesting but not altogether surprising. The theory would suggest that ER has strengths in conceptual modelling
and representation, with the latter being a significant factor in its use on a project with a high level of
user involvement. That ER was used for final representation purposes reflects somewhat on the actual
facilities available in the binary modelling technique itself. As the binary modelling was most similar
to Kent, lack of diagrammatic support favoured ER. The binary model was chosen nevertheless for
the detailed phases because of its ease of use, documentation support and communicability. These
features were confirmed during the project.
As with the A.M.P. case study, indirect support for quality improvements was found, attributed to the use of binary modelling. The extended analysis phase and heavier (than normal) user involvement
was believed to have improved the specification and detailing of requirements. Whether this had
translated into a better system was only confirmed in a subjective manner by project participants.
With the low level of project complexity it was difficult to verify the metrics of abstraction and semantic
expressiveness. Based on the tools used in the project, however, problems might well be experienced
in this area. Both ER, and the Kent like technique, seemed not to provide adequate support for
large complex data models. This is undoubtedly an area which requires development as expectations
indicate that significant advantages from binary modelling could accrue on such projects. A more sophisticated binary approach, perhaps that offered by NIAM, might be beneficial. The advantage offered by the DMR methodology is that such a technique would fit seamlessly into the systems development process should it be required.
CHAPTER 13
SUMMARY
In this report a comparative analysis of data modelling theory and practice has been conducted. Commencing with a justification for the significant volume of research in the area of data modelling, the report has argued for the creation of a reference framework in which competing modelling methodologies could be evaluated. Pursuant to the goal of standardisation of terminology the language of the
International Standards Organisation was adopted whenever possible. In Chapter 3 the philosophy of the nature of data and reality was discussed as a forerunner to the difficult task of integrating the diverse perspectives of data, data models and database architectures found in the literature. In Chapter 4 a classification of data models was presented followed by a summary of the conceptual schema and database model as defined by the ANSI/SPARC committee.
Based on this framework and terminology, a feature analysis was conducted of four data modelling methods. Each represented an approach to data modelling varying however, in comprehensiveness, application and philosophy. The major concepts of each model were described in Chapter 8. In
Chapter 9 a comparative analysis of the methods was conducted, using a taxonomy derived from the
Comparative Review of Information Systems conference.
Based on the findings from this analysis it was evident that most of the 'theories' represented normative positions and in order to support or reject those positions field testing was seen to be necessary. This presented a difficulty. Several of the data modelling theories (KENT, ACM/PCM and to a lesser
extent NIAM) were not widely known in development environments thereby limiting the potential of
most types of field research. Due to this 'sample' shortage a three environment case study design was
adopted with subject selection based on availability of relevant data. The major findings are presented
in the next section, followed by a discussion of the research limitations. Based on a combination of
these two sections the report concludes with a review of future research opportunities.
13.1 Case study conclusions
"The change process and the solutions introduced correlated with the sophistication of the environ ment"
In each of the three case study environments the concept of a data driven approach to systems design was found to have strong support. In the commercial environments adoption of a data driven approach was seen as a response to increasing system complexity and the need to integrate organisational data requirements in an effective manner. Digital Equipment Corporation, finding the existing lifecycle methodology inadequate to support present and anticipated requirements, commenced a controlled search for a replacement. In the requirements specification which resulted, the need to support data analysis, data modelling and the concept of data independence was emphasised. Australian
Mutual Provident, with clearly a different set of business objectives, did not conduct a formal search for the data modelling method/methodology which was introduced. In contrast to Digital, change resulted from the efforts of a contract analyst who introduced NIAM. Being less extensive than the
DMR methodology from Digital, the impact changed the process of analysis and design but left other
lifecycle phases unchanged. At the University of New South Wales change in the data modelling
method also reflected more of an ad-hoe approach (opportunity) than of a planned search. The
KENT technique, being less extensive than NIAM impacted only the analysis phase and some aspects
of design. Drawing these results togther the non-surprising conclusion is that the methodology or
method introduced should match the requirements and sophistication of the environment. Whilst the
objectives have been similar, a single concept, binary modelling for example, was seen not to have
provided the complete solution (Digital).
"Effective communication and user involvement were enhanced"
A single theme was dominant in each of the case studies, that being the role of communication between project participants. The phrases 'increased user involvement', 'demystified analysis', 'ease of learning' and 'simplicity of the concept' reflected the positive experiences with binary data modelling. In both of the commercial environments users reported increased project participation with corresponding gains in effectiveness. Through the benefit of a 'traceable design process' the belief was strongly held (by users, analysts and IS management) that better specifications and designs had resulted. In the University environment an improved understanding of data modelling and normalisation concepts (by students) was acknowledged. Here (through direct observation and participation) it was seen that significant discussion was generated regarding the 'modelled reality' and that binary modelling had facilitated this. In all environments the idea that a 'correct' data model could be 'produced' by IS seemed to have been dispelled in light of the improved user awareness of the data modelling process.
"Binary modelling has its limitations"
At A.M.P. it was seen that NIAM had been enthusiastically embraced as the data modelling standard, displacing ER modelling in the process. Nevertheless, there remained a role for ER modelling in the creation of a macro or business model. This was clear acknowledgement that NIAM and binary modelling could not be all things for all people. A bottom-up approach was an addition to the overall analysis task rather than a replacement for top-down analysis. Acceptance of a dual analysis and design approach was also made explicit in the methodology adopted by Digital. ER had a defined role in the preparation of overviews and scope definition. It was also the model of choice for representation purposes. Convergence of bottom-up and top-down analysis was stressed on consistency grounds. In the University environment such a dual-method approach was seen as less important on practical grounds due to the small project size. In theory, however, the usefulness of such an approach was stressed.
"Rush towards relational technology strengthens, implications for data modelling"
From the environment descriptions of each case study the push towards relational technology was seen to be strengthening. A.M.P., a major customer of IBM Corporation, was moving towards implementation of DB2 with the expectation that the package would be used almost exclusively for new systems. Digital, having recently announced a major version release of its relational package Rdb, followed this with an aggressive marketing push into transaction processing. This was significant because it could be interpreted as support for the relational concept irrespective of the system type (whether high performance or otherwise). Whilst continuing to support existing network database users, Digital now markets a relational approach as the primary system solution. This has played hand in hand with developments in distributed data processing. A result of this growth in relational implementations will be to place further demands on the methodologies and theories which support it. The implementation of data and process modelling, along the lines promoted by the International Standards Organisation, would be expected to become widespread. Binary data modelling, being an element of these standards, should continue to develop, thereby attracting significant research interest.
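The link between binary modelling and relational implementation can be sketched briefly. In fact-based methods such as NIAM, elementary fact types are eventually grouped into relational tables, typically by collecting the fact types keyed on the same entity type. The following Python fragment is a simplified, hypothetical illustration of that grouping step only; it is not drawn from any of the case study environments, and the fact types shown are invented.

```python
# Hypothetical elementary fact types from a binary (fact-based) analysis,
# expressed as (key entity, role, value type) triples. All names invented.
fact_types = [
    ("Employee", "has_name", "Name"),
    ("Employee", "works_in", "Department"),
    ("Employee", "earns", "Salary"),
    ("Department", "has_budget", "Amount"),
    ("Department", "located_at", "Address"),
]

def group_into_relations(fact_types):
    """Collect fact types keyed on the same entity into one relation,
    mimicking (in a much simplified form) how a binary model is mapped
    to a relational table design."""
    relations = {}
    for entity, role, value_type in fact_types:
        relations.setdefault(entity, []).append((role, value_type))
    return relations

# Print one relation scheme per key entity.
for entity, attributes in group_into_relations(fact_types).items():
    columns = ", ".join(f"{role}: {vt}" for role, vt in attributes)
    print(f"{entity}({entity}_id, {columns})")
```

A full mapping procedure would also handle many-to-many fact types, uniqueness constraints and subtypes; the sketch shows only the central grouping idea.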
13.2 Research Limitations
The three case studies presented in this report have been drawn from diverse environments: the University of N.S.W., a research and teaching facility; Australian Mutual Provident, a large insurance company; and Digital Equipment Corporation, a multi-national computer manufacturer. In addition, each of the case studies investigated the implementation and usage of a different data modelling technique. Such a heterogeneous sample restricts the validity of generalisation across these environments. This was anticipated and as such the purpose of the research has not been to draw general conclusions nor to extrapolate the results to other environments. The major purposes have been, first, to undertake exploratory/descriptive research with the objective of identifying areas where empirical research might be beneficial, and second, to provide some qualitative feedback for the ongoing theoretical development and implementation of data modelling. Such objectives were best served through seeking a cross-section of information systems environments.
Bearing these objectives in mind, the following weaknesses nevertheless exist in the case studies. Firstly, time restrictions and disclosure restrictions in the corporate environments dictated the level of detail which could be obtained and the research methods which could be employed. This resulted in a heavier than desired emphasis being placed on interviews and verbal recall, as it was often not possible to obtain documentation. In these two cases interviews were also restricted through nomination of employees by the corporation rather than through selection by the researcher. Consequently, selection bias may have served to influence the findings. Due to the reliance on verbal communication much of the material gathered was also necessarily of a subjective nature.
Secondly, data modelling theory suggests that many of the impacts (benefits) of its use will be realised over the lifetime of the information system. Some will be immediate but others will only be evident in the medium to long term. Quality improvements, for example, perhaps reflected in lower maintenance costs, would not be initially evident. In order to find evidence for these effects, longitudinal analysis at the project level was, and remains, required.
Thirdly, in the University environment the project sizes and levels of complexity could not be taken as representative of commercial (real world) projects. The impact of this was to reduce the external validity of some findings; however, it was possible to concentrate on several metrics (learning, communication and representation) for which the results could be extrapolated to other environments.
Finally, the data collection phase for the three environments extended over several years. Since this
phase has been completed it is likely that each environment has moved ahead in the application of data
modelling and systems design and in the sophistication of usage. Such changes would be expected
particularly since each environment had demonstrated a limited history of binary data modelling and
in the corporate environments a learning phase was clearly in process. As a consequence the case
studies should not be taken as current descriptions of their respective environments.
13.3 Future Research
Each of the case study environments offers rich potential for empirical research. In this section
the corporate environment and the university environments are examined and a number of research
alternatives considered.
Perhaps the most promising area for research in the corporate environments involves longitudinal analysis and the collection of project lifecycle data; specifically, data showing the percentage distributions of analysis, design and development effort against total project effort. Such data would support comparisons between projects which had utilised binary data modelling and those which had not. The data could also be used as input for one of the many parametric models developed for project estimation and control. Post-hoc analysis of the project data based on the model forecasts might then reveal if there was justification for significant change in the parameter relationships or model assumptions for binary data modelling projects (reflecting a change in the lifecycle structure). An important measure would also be the level of user involvement, perhaps calculated as a percentage of total project effort and of total analysis effort.
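As a simple illustration of the lifecycle measures proposed above, the following Python sketch computes the percentage distribution of effort across phases and the level of user involvement. The figures are invented purely for illustration; no project data of this kind was collected in the case studies.

```python
# Hypothetical project effort data (person-days); all figures invented
# purely to illustrate the proposed lifecycle measures.
phase_effort = {"analysis": 120, "design": 90, "development": 240}
user_effort_in_analysis = 30  # person-days contributed by users

total = sum(phase_effort.values())

# Percentage distribution of effort across lifecycle phases.
distribution = {phase: 100 * effort / total
                for phase, effort in phase_effort.items()}

# User involvement as a percentage of total effort and of analysis effort.
user_pct_total = 100 * user_effort_in_analysis / total
user_pct_analysis = 100 * user_effort_in_analysis / phase_effort["analysis"]

print({p: round(v, 1) for p, v in distribution.items()})
print(round(user_pct_total, 1), round(user_pct_analysis, 1))
```

Collected across many projects, such distributions would support the comparisons between binary-modelled and conventionally-modelled projects suggested in the text.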
Within the bounds of a longitudinal analysis, measures of system quality would be beneficial. Although difficult to operationalise (perhaps a surrogate such as change requests per unit period against programs or data might be used), the quality measure could then be checked for correlation with changes in the lifecycle phase distributions. This would enable predictions from the theory to be checked, namely that increased user involvement and increased analysis (at the conceptual level) will lead to improved systems quality. A multiple organisation analysis conducted within a homogeneous industry/environment would add further validity to the results from the longitudinal studies.
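The suggested correlation check can be sketched in a few lines of Python. The fragment below computes a Pearson correlation between a set of analysis-effort percentages and change-request rates; all figures are invented, and the negative relationship shown is simply the one the theory predicts, not a result.

```python
# Sketch of the proposed check: correlate a quality surrogate (change
# requests per period) with the share of project effort spent in
# analysis. All figures are invented for illustration only.
from math import sqrt

analysis_pct = [15, 20, 25, 30, 35]    # % of project effort in analysis
change_requests = [12, 10, 9, 6, 5]    # change requests per period after release

def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(analysis_pct, change_requests)
print(round(r, 2))  # a strongly negative r would support the prediction
```

In practice the sample of projects, the operationalisation of the quality surrogate, and controls for project size would all need careful design before such a coefficient could be interpreted.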
Within the University environment the possibility exists to provide greater experimental control than in the commercial environment. Consequently, for a limited range of variables for which external validity could be established, it would be possible to design an 'experiment' with small projects and small groups utilising alternative modelling techniques. Such an exercise could be accomplished by randomly dividing the students from one of the database subjects into a group instructed to use Kent and another instructed to use ER modelling. With an identical project description and deliverables, such variables as project time in phases, degree of data model normalisation achieved, data model modifications required before implementation (after completion of the conceptual model), time taken to understand the modelling technique and standard of documentation produced could
be collected. In addition a number of qualitative measures concerning the development experience,
for example, ease of use and understanding, or inter-group communication, might be collected via
questionnaire.
After a first-pass completion it should be possible to modify the requirements and then once more collect data on the previously mentioned variables. This would allow some aspects of a 'real world' environment (change) to be simulated and measured. The experimental design could be further expanded by adding a third group with instructions to complete the project without the benefit of a conceptual modelling phase.
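The random assignment step of the proposed experiment could be sketched as follows; the student identifiers, group sizes and seed are all hypothetical.

```python
import random

# Sketch of the proposed random assignment of a database class into
# treatment groups. Student identifiers and the seed are invented.
students = [f"student_{i}" for i in range(1, 31)]
techniques = ["Kent", "ER", "no conceptual model"]  # third group is the proposed extension

rng = random.Random(1988)  # fixed seed so the split is reproducible
rng.shuffle(students)

group_size = len(students) // len(techniques)
groups = {t: students[i * group_size:(i + 1) * group_size]
          for i, t in enumerate(techniques)}

for technique, members in groups.items():
    print(technique, len(members))
```

A fixed seed makes the allocation auditable, which matters if the experiment is to be repeated across semesters with comparable group compositions.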
APPENDIX A
DATABASE ARCHITECTURE
[Figure: the three-level database architecture. Users access external views through a host language plus DSL; external schemas are related to a single conceptual schema via external/conceptual mappings, and the conceptual schema to the storage structure definition (internal schema). Schemas and mappings are built and maintained by the database administrator (DBA).]

Source: [Date 86 p33]
APPENDIX B
UNIVERSE OF DISCOURSE
[Figure: the relationship between the Conceptual Schema, the Universe of Discourse and the Information Base.]
1. Classification, abstraction, generalisation, establishing rules etc. about the Universe of Discourse and recording them. This is a human process, describing a (shared) mental model of the Universe of Discourse.

2. Recording facts and happenings about the Universe of Discourse, including what entities are of interest.

Source: [Griethuysen 85]
APPENDIX C
DATA SYSTEM DESIGN
[Figure: data system design within information system design and application design. Information Requirements Design yields a Requirements Specification; Conceptual Design yields a Conceptual Schema; Implementation Design yields a DBMS (relational or other) Schema; Physical Design yields a Storage Schema. Links (interfaces) connect successive levels.]

Source: [Agosti 84]
APPENDIX D
SUBJECT DESCRIPTIONS
14.608 Database Systems
Advanced data storage concepts, including detailed study of alternative approaches to database management systems. Management information needs and database specification in a commercial environment. Detailed evaluation, with project work, of a microcomputer based management system.
Information retrieval concepts, relational query systems, security, control and audit considerations.
14.603 Computer Information Systems 2
Systems design: physical design of business systems, specifications and updating of VSAM files, man-machine dialogue procedures, top-down structured design and evolutionary design methodologies.
Introduction to communications networks. Operating systems concepts: processor, storage, device and process management, segmentation and paging systems. COBOL programming.
14.606 Management Information Systems Design
Organisational impact, information systems design methodologies, requirements elicitation, logical and physical design, implementation procedures, principles of data management, data analysis, telecommunications networks, systems design in a distributed environment, commercial programming practice, systems development case studies using spreadsheet, file management and word processing software.
14.992G Data Management
A review of data management principles including both simple and complex file designs, and the
concept of database management systems. Alternative database management systems architectures,
including network, hierarchical and relational approaches. Database query systems, including relational
algebra. Case studies and assignments embodying these principles.
APPENDIX E
A.M.P. DATA BASE DESIGN PROCESS
[Figure: the A.M.P. data base design process. A generic data model and business information analysis yield an application data model; redundant entities are removed and application boundaries for implementation set; transaction statistics are mapped onto a usage model; composite usage maps are rationalised for implementation; access keys are defined; the result is the IMS physical data base design.]
BIBLIOGRAPHY
Agosti, M., Johnson, R.G. (1984). A Framework of Reference for Database Design. DATA BASE Summer
1984, 3-9.
Brodie, M.L. (1983). Association: A Database Abstraction for Semantic Modelling in Entity-Relationship
Approach to Information Modelling and Analysis. edited by Chen, P.P.S. North-Holland, 577-601.
Brandt, I. A Comparative Study of Information Systems Design Methodologies in INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Comparative Review. edited by Olle, T.W. North-Holland, (1982), 9-35.
Brodie, M.L., Silva, E. Active and Passive Component Modelling: ACM/PCM in INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Comparative Review. edited by Olle, T.W. North-Holland, (1982), 41-91.
Brodie, M.L., Silva, E.O., Ridjanovic D. On a Framework For Information Systems Design Methodologies
In: INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Feature Analysis. edited by Olle,
T.W. North-Holland, (1983), 231-241.
Bubenko, J.A., Gustafsson, M.R., Karlsson, T. Comments on some Comparisons of Information System Design Methodologies In: INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Feature Analysis. edited by Olle, T.W. North-Holland, (1983), 243-249.
Chen, P.P.S. The Entity-Relationship Model - Toward A Unified View of Data. ACM Transactions on Database Systems, Vol. 1, No. 1, March (1976), 9-36.
Chilson, D.W., Kudlac, M.E. Database Design: A survey of Logical and Physical Design Techniques. DATA
BASE Fall 1983, 11-19.
Codd, E.F. Further Normalisation of the Data Base Relational Model. in Data Base Systems, Courant
Computer Science Symposia Series, Vol. 6. Englewood Cliffs, N.J. Prentice-Hall (1972).
Codd, E.F. Extending the Database Relational Model to Capture More Meaning. ACM TODS 4, No.4
(December 1979).
Codd, E.F. Data Models in Database Management. Proc. Workshop on Data Abstraction, Databases and
Conceptual Modelling. ACM SIGPLAN Notices 16, No. 1 (January 1981).
Date, C.J. (1986). An Introduction to Database Systems. Volume 1, Fourth Edition. Addison-Wesley,
Sydney.
Davis, G.B., Olson, M.H. (1985). Management Information Systems: Conceptual Foundations, Structure and Development. Second Edition. McGraw-Hill, Sydney.
Griethuysen van, J.J. Concepts and Terminology for the Conceptual Schema and Information Base. International Standards Organisation Document No. ISO/TC97/SC5-N695 (August 1985).
Kahn, B.K. (1985). Requirement Specification Techniques. In Principles of Database Design: Volume 1
Logical Organisations edited by: Yao, S.B. Prentice-Hall, New Jersey pp 1-65.
Kent, W. (1978). Data and Reality. North-Holland.
Kent, W. (1984). Fact-Based Data Analysis and Design. Journal of Systems and Software 4, pp99-121.
King, R., McLeod, D. (1985). Semantic Data Models. In Principles of Database Design: Volume 1
Logical Organisations edited by : Yao, S.B. Prentice-Hall, New Jersey pp 1-65.
McFadden, F.R., Hoffer, J.A. (1985). Data Base Management. Benjamin-Cummings, California.
Ramon, A.O.1. (1983) Information Derivability Analysis in Logical Information Systems. CACM, Vol. 26,
No. 11 (September '83).
Rzevski, G. On the Comparisons of Design Methodologies In : INFORMATION SYSTEMS DESIGN
METHODOLOGIES: A Feature Analysis. edited by Olle, T.W. North-Holland, (1983), 259-266.
Shoval, P. (1985) Essential Information Structure Diagrams and Database Schema Design. Information Systems Vol 10, No.4, pp417-423.
Verheijen, G.M.A., Van Bekkum, J. (1982) NIAM: An Information Analysis Method. in INFORMATION
SYSTEMS DESIGN METHODOLOGIES: A Comparative Review. edited by Olle, T.W. North-Holland,
(1982), 537-589.
Wasserman, A.I., Freeman, P., Porcella, M. Characteristics of Software Development Methodologies In :
INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Feature Analysis. edited by Olle, T.W.
North-Holland, (1983), 37-62.