DATA MODELLING IN PRACTICE

A CASE STUDY EXAMINATION

Paul Groves

A report submitted in partial fulfilment of the requirements of the degree of Master of Commerce (Honours) to the University of New South Wales

1988

CERTIFICATION

"I hereby declare that this submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person nor material which to a substantial extent has been accepted for the award of any other degree or diploma of a University or any other institute of higher learning, except where due acknowledgement is made in the text." ABSTRACT

Data modelling for analysis and database design is increasingly being viewed as a critical phase in the systems development process. This report is a comparative analysis of data modelling theory and practice. It investigates the nature of data and examines several data modelling methodologies.

Current international standards for the conceptual schema are reviewed and, based on these, a reference framework is defined. This framework is used to compare four contemporary data modelling theories. Field testing of three of the methods is conducted: two case studies from a commercial environment and one from an academic setting. The case studies are conducted on a descriptive research basis.

Results from the case studies confirm that data modelling represents a technique of growing importance in the systems development process. Increasing resources applied to the practice of relational database design should ensure ongoing theoretical interest and development. Although in the formative stages of implementation and use, binary data modelling was seen to have achieved notable success in enhancing communication between project participants and in increasing user participation.

As a consequence it was anticipated that system quality would improve. Limitations on the practical application of binary modelling were noted based on case study results. Several future empirical studies are detailed in which the quantitative and qualitative impacts of binary data modelling usage might be evaluated.

CONTENTS

Chapter 1 INTRODUCTION ...... 1-1

Chapter 2 REFERENCE FRAMEWORK ...... 2-1

Chapter 3 DATA AND THE NATURE OF REALITY...... 3-1

Chapter 4 DATA MODELS AND DESIGN ...... 4-1
4.1 Conventional Data Models ...... 4-2
4.2 Semantic Modelling v Semantic Data Models ...... 4-4

Chapter 5 DATABASE ARCHITECTURE ...... 5-1
5.1 Conceptual Schema - Defined ...... 5-1
5.1.1 Conceptual schema and the Information System ...... 5-3
5.1.2 Content of the conceptual schema ...... 5-4
5.1.3 Functions of the Conceptual Schema ...... 5-5

Chapter 6 DATA MODELLING ...... 6-1

Chapter 7 INFORMATION SYSTEMS LIFECYCLE ...... 7-1

Chapter 8 DATA MODELLING METHODS: FEATURE ANALYSIS ...... 8-1
8.1 Entity Relationship Modelling ...... 8-2
8.1.1 Concepts ...... 8-2
8.2 Fact Based Data Analysis and Design ...... 8-5
8.2.1 Design Process ...... 8-5
8.3 Nijssen's Information Analysis ...... 8-8
8.3.1 Concepts ...... 8-9
8.3.2 NIAM Development Lifecycle ...... 8-10
8.3.3 Information Base: NIAM Sentence Model ...... 8-11
8.3.4 Semantics ...... 8-12
8.4 Active and Passive Component Modelling (ACM/PCM) ...... 8-13
8.4.1 Abstraction Modelling ...... 8-13
8.4.2 Structural Modelling ...... 8-14
8.4.3 Behavioural Modelling ...... 8-16
8.4.4 ACM/PCM Design Modelling ...... 8-17

Chapter 9 DATA MODELLING METHODS: COMPARATIVE REVIEW ...... 9-1
9.1 Lifecycle Support ...... 9-2
9.1.1 Representation and Communicability ...... 9-4
9.1.2 Abstraction Support ...... 9-7
9.1.3 Documentation Support ...... 9-10
9.1.4 User Orientation ...... 9-12
9.1.5 Semantic Expressiveness ...... 9-14
9.1.6 Quality Control ...... 9-16
9.1.7 Comparative Review - Summary ...... 9-18

Chapter 10 UNIVERSITY OF NEW SOUTH WALES ...... 10-1
10.1 Objectives ...... 10-1
10.2 Research Method ...... 10-1
10.3 Environment ...... 10-2
10.4 Database Systems Development ...... 10-4
10.4.1 Database Systems - 1984 ...... 10-5
10.4.2 Database Systems - 1985 ...... 10-7
10.4.3 Database Systems - 1986 ...... 10-10
10.5 Interview Plan ...... 10-12
10.5.1 Lecturers ...... 10-12
10.5.2 Tutors ...... 10-15
10.5.3 Students ...... 10-16
10.6 Conclusion ...... 10-17

Chapter 11 AUSTRALIAN MUTUAL PROVIDENT ...... 11-1
11.1 Objectives ...... 11-1
11.2 Research Method ...... 11-2
11.3 Environment ...... 11-3
11.3.1 Hardware ...... 11-3
11.3.2 Software History ...... 11-4
11.3.3 Software Current ...... 11-5
11.4 Data Modelling ...... 11-6
11.5 Systems Lifecycle ...... 11-8
11.6 Data Modelling Experiences ...... 11-9
11.6.1 User experiences ...... 11-11
11.7 Conclusion ...... 11-12

Chapter 12 DIGITAL EQUIPMENT CORPORATION ...... 12-1
12.1 Introduction ...... 12-1
12.2 Corporate Environment ...... 12-2
12.3 Local Environment ...... 12-3
12.4 Methodology review ...... 12-4
12.5 Systems Analysis ...... 12-5
12.6 Modelling and Partitioning ...... 12-6
12.6.1 Conceptual Modelling ...... 12-8
12.6.2 Functional modelling ...... 12-9
12.6.3 Physical modelling ...... 12-10
12.7 An inventory application ...... 12-10
12.8 Modelling experiences ...... 12-12
12.9 Conclusion ...... 12-15

Chapter 13 SUMMARY ...... 13-1
13.1 Case study conclusions ...... 13-2
13.2 Research Limitations ...... 13-4
13.3 Future Research ...... 13-5

Appendix A SUBJECT DESCRIPTIONS ...... A-1

FIGURES

1 ACM/PCM Design Phases ...... 8-17
2 Candidate keys ...... 10-9
3 Pseudo record merges ...... 10-9
4 DMR Systems Lifecycle ...... 12-7

CHAPTER 1

INTRODUCTION

"Designing database, one of the major activities of the system development process, is a difficult,

complex, and time consuming task. Inadequate designs have presented many problems. The failure to specify clearly the organisational goals and requirements has resulted in databases of limited scope and usefulness, which are unable to adapt to change. In many cases, these problem-ridden databases

have prevented database management systems from becoming an effective data processing tool.~

[Kahn 85)

The development of database management systems in the late sixties for mainframe machines heralded a new era of data processing. Organisations were given the opportunity to have centralised control of operational data. This meant the capability of sharing data instead of dedicating files to specific applications. Standards and security could be enforced across all users of the database. Integrity and redundancy would be controlled with significant implications for data consistency. A major benefit would be the provision of data independence, the ability to insulate applications from changes in storage structure and access strategy.

The concept of database management was embraced enthusiastically with considerable development resources devoted to the design of data models and database languages. Among the many data models developed, Hierarchic, Network and Relational designs were the most prominent. Implementation machines ranged from mainframes in the late sixties and seventies to the microcomputers of the early eighties.

Technology dominated early database implementations. Database design was usually conducted as a single phase activity, with emphasis on physical details (types, access paths, indexes etc.) rather than as a two phased activity comprising logical and physical design. A consequence of this was that the structural (1) characteristics of the database management system were more influential during design than the structural characteristics of the data. The physical model thereby pre-empted logical model design. This resulted in applications (and databases) with greatly reduced flexibility (adaptivity) on account of considerably more complex behavioural properties. (2)

Lack of formal procedures led to the design exercise being perceived as something of an art form rather than a science, one which relied on the intuition and experience of the analyst.

As the applications developed in a database environment became more sophisticated, an increasingly heavy burden of responsibility was placed on the design role of the analyst. The pressure was eased with the development of normalisation theory [Codd 72]. This provided a theoretical basis to guide file and database design and allowed a formal approach to be developed. However, as indicated by the introductory quote, database design continues to be a problem. Database implementation experiences suggest that inadequate design is preventing the theoretical advantages of the database concept from being realised.

"Errors made during the design process affect the application's entire life and any decisions that are

made at different levels based on the data. This process therefore is of great importance for the

enterprise and it is necessary to pay much attention to it. This fact explains why so much work is

under development in this area." [Agosti 84]

Normalisation and relational theory aided the search for a more rigorous and formalised approach to the database design task. A result was the development of data modelling. The term is defined here as the process of abstraction and documentation of data characteristics [Davis 85].

This paper begins with a discussion of the need for a framework of reference in which data modelling methodologies and methods can be compared (3). The characteristics of data, and of data models, are explored. The data modelling process is decomposed and its components considered in detail.

On this base, a number of data models are reviewed highlighting the essential characteristics. Conventional and semantic data models are examined and the meanings of these terms discussed. A database architecture conforming to the International Standards Organisation (ISO) model is presented. This leads to a review of four data modelling methods. Again the purpose is to identify the major features of each method. Active and Passive Component Modelling (ACM/PCM) and Entity-Relationship modelling (ER) are considered in overview as examples of semantic data modelling methodologies. Nijssen's Information Analysis Method (NIAM) and Fact Based Data Analysis and Design (KENT) are analysed as examples of binary data modelling methodologies.

In section five a comparative feature analysis is conducted of these four methods utilising the framework of review outlined in the second section. This is followed by three case studies of data modelling in practice.

From the previous discussion two major functions of this paper can be identified. One is to conduct a review of selected data modelling methods from a theoretical aspect. This aims to present a relatively balanced perspective of the major features supported. Two is to present a survey of data modelling in practice by conducting a case study examination of two commercial organisations and a university.

It is emphasised that no attempt is being made at a qualitative assessment of the chosen methods. To do this would require a considerably deeper analysis of each method than will be attempted in this paper. Instead, it is hoped that by concentrating on the major features some direction can be given towards future research into specific features of the methods.

Footnotes:

(1) Structure, as used in this paper, is a term which describes the means of representing data and data relationships, that is, the static properties of data. For example, in the relational model, attributes and tuples embodied in relations are the means by which the structural properties of data can be represented. Behaviour, as used in this paper, is a term which describes the means of representing the rules governing the changes to data and data relationships, that is, the dynamic properties of the data. For example, in the relational model, the domain, primary key and foreign key concepts could be used to specify behavioural properties of data (insert, update and deletion rules). An instance of a behavioural rule might then be that tuples with non-unique primary keys are not allowed (by definition of a primary key).

(2) The data may not possess a logical structure equivalent to the physical data model being employed. Consider, for example, the restrictions of an IMS hierarchical model. No child record type may be owned by more than one parent record type. To model a 'treatment' record that is owned by a 'doctor' and also owned by a 'patient' requires two hierarchical data structures which are linked by a logical pointer. This could be represented more naturally in a network type data model. Given the constraint of an IMS environment the specification of the behavioural properties of the application becomes considerably more complex.

(3) The following definitions of method and methodology in a data modelling and database context will be used in this paper. A data modelling methodology is an integrated collection of methods and techniques which supports the complete database design process. A technique or a method in this design context will be defined as a systematic way of performing a specific activity or subset of the design. A technique or method does not fulfil the requirements of integration and completeness that are required of a methodology.

CHAPTER 2

REFERENCE FRAMEWORK

A number of data modelling methodologies and techniques have been developed and proposed in the literature. These methodologies and techniques are not directly comparable for a number of reasons. Firstly they cover different aspects of the data modelling process and place different emphasis on its components. Secondly, definitions and language vary considerably between models, making discussion and comparisons of concepts difficult. Both these problems are typical of an emergent discipline. It is argued [Bubenko 83 p248] that the field of information systems study is far from mature.

'Until a research framework, or paradigm, can be established, that is accepted by the majority of researchers within the field, there is little prospect of advancement of the discipline or field of research.'

The research problem inevitably impacts the practice of data management and data modelling. Proponents of alternative methods have no common language in which to discuss the relative strengths and weaknesses of the methods in an objective manner. As a consequence of this, the choice of method for an information system development is based on subjective criteria. This usually means the design experience and previous method exposure of staff are paramount. To develop an objective basis of evaluation the paradigm conflict must be addressed. This requires a systematic procedure for review.

The first step is agreement on terminology and discussion of data characteristics. Following this, a framework will be presented which places data modelling in the broader context of the information systems lifecycle. A database architecture and the role of the conceptual schema will be presented.

Wherever possible, this paper will adopt the language and architecture of the American National Standards Institute (ANSI) and the International Standards Organisation (ISO) committees.

CHAPTER 3

DATA AND THE NATURE OF REALITY

"A message to mapmakers: highways are not painted red, rivers don't have county lines running down the middle, and you can't see contour lines on a mountain." [Kent 78]

Data is described [Davis 85 p96] as consisting of symbols which represent, describe or record reality. But like the contour lines on a map, data symbols are clearly not reality and can never provide a complete representation of the objects and events which comprise it. For instance, my Christian name is Paul. In certain circumstances it can be used to identify me (that is, when the name is unique in a group of people). But I am not the same as the name. Whilst it has utility, the name is not reality. The distinction between the object (a person) and the symbol (an instance of a name) is important for this reason. How can reality be modelled? The simple answer is that it cannot be modelled in an objective manner. Decisions about what to extract from reality and which symbols will be used to represent it must reflect the needs and views of the users which interact with that segment of reality. Any structure which is developed to model reality is simply another map. It may be useful to someone, but remains nevertheless an approximation of the underlying terrain.

This is not the whole picture; unfortunately it gets worse. There are many views of reality and there are many realities. People, buildings, grass and trees are part of the physical reality in which we participate. For information systems modelling this may not be the reality of interest. The reality may have no physical existence. It may be historical information, not part of the present reality, or indeed a falsified reality that never existed. It may relate to a future reality, about intended states of affairs, or it may be a conjectured reality.

This philosophical approach can be pursued to great lengths. The purpose of introducing it here is to demonstrate that there are no 'hard' definitions of data from which a strict mathematical formalism can be developed to guide the modelling process. What then is meant when a data model is described as a representation of reality?

It would seem that a data model, like reality, is an elusive concept. Kent concludes that there is no 'best' model. Only the interaction of data and usage determines the meaning of data and the efficiency of processing.

Despite these problems (and because of its importance) the section following continues with an attempt to 'define' the concepts which are used repeatedly in the literature and in the reviews of this paper. From what precedes the reader should be aware that this is a difficult and imprecise exercise.

CHAPTER 4

DATA MODELS AND DESIGN

A data model is an abstract representation of data, a way of representing data and its inter-relationships at a logical and/or physical level. Graphs, tables and mathematical formulas may all be used for representation purposes. At a logical level it should support the definition of the conceptual schema and external schemas. At a physical level it should support definition of the storage structures which allow for the update and retrieval of data instances, i.e. definition of the internal schema. In performing these functions a data model acts as a tool for conducting data modelling. The data models naturally vary in the extent to which they support the data modelling process. It is important that the model be distinguished from the functions it is performing and from the 'reality' it is modelling. This section considers a classification framework and the major classes of data models.

Codd (1981) defines a data model according to three sets of characteristics:

1. Data structure types supported by the model. Examples include relations, trees and networks.

2. Operations or inferencing rules which can be applied to occurrences of the data structure types which the model supports. An example of these rules for the relational data model is embodied in relational algebra. This specifies operations such as join, select and project which manipulate the data structures, relations, in pre-defined ways.

3. Integrity constraints and rules which have to be respected in the representation of the data to keep the database in a situation of integrity and consistency. These may be expressed as insert-update-delete rules. For example, in the relational model a deletion operation on a 'parent' record (tuple) might require that all 'children' records (tuples) are also deleted. This is referred to as a cascade deletion [Date 86 p254] (a small sketch of such a rule follows this list).
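To make the third characteristic concrete, the following minimal sketch expresses a cascade deletion rule over two in-memory 'relations'. It is an illustration only; the relation and attribute names (departments, employees, dept_no) are invented and are not drawn from Codd or Date.

    # A minimal sketch of an integrity rule (cascade deletion) enforced over
    # simple in-memory 'relations'. All names are illustrative assumptions.

    departments = {1: {"dept_no": 1, "name": "Accounts"},
                   2: {"dept_no": 2, "name": "Research"}}

    employees = {101: {"emp_no": 101, "name": "Smith", "dept_no": 1},
                 102: {"emp_no": 102, "name": "Jones", "dept_no": 2}}

    def delete_department(dept_no):
        """Delete a parent tuple and cascade the deletion to its children."""
        if dept_no not in departments:
            raise KeyError("no such department")
        del departments[dept_no]
        # Cascade rule: children referencing the deleted parent are removed,
        # keeping the database in a consistent state.
        for emp_no in [e for e, row in employees.items() if row["dept_no"] == dept_no]:
            del employees[emp_no]

    delete_department(1)
    assert all(row["dept_no"] != 1 for row in employees.values())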

These three characteristics have largely been used to distinguish the physical, or conventional, data models. However, data models can also be categorised into processable and non-processable or semantic data models. The processable models were the first to be used and were machine orientated, stressing the structural form of the data. Efficient storage and manipulation were the primary motivations. Hierarchical, network and relational data models are the best known in this category. These models are utilised in the implementation design phase. Based on structural forms, ISO categorise design in this area as data modelling.

Non-processable or semantic data models are logical models which stress the importance of modelling the meaning of data. That is, the data model should not be restricted to modelling only the structure of data. Structural relationships should be explicit and the behavioural properties of the data clearly specified. Semantic data models provide support for the conceptual modelling stage (defined in section 5.1). An objective of a successful semantic model should be to balance the essential requirement of completeness with simplicity. Semantic models are seen as independent of the processable data models. Care is taken to avoid the implication that these are in any way a substitute for the processable models. Structural data forms cannot be neglected and data modelling is essential to support database implementation. However, in the absence of a formal model stating the semantic rules and constraints, data modelling will be sub-optimal. Semantic modelling is described as information modelling by ISO.

Conventional data models and semantic data models are discussed in detail in the following sections.

4.1 Conventional Data Models

Conventional database models, namely the relational, hierarchical and network models, provide facilities for describing the logical structure of a database using trees, tables, nodes and sets. A data manipulation language is provided for these constructs through general purpose access and update operators. Typically the user level view of the data is provided by record structures. Using a data definition language, a schema is specified which utilises these constructs. This expresses the structural definitions of the data. Behavioural properties of the data are supported by the provision of a data manipulation language. A problem with the conventional models is the lack of semantic expressiveness. Being record orientated imposes limitations on the data structures used to model an application. Inevitably there will be loss of information when the application does not fit the structure of the chosen database model. For example, an application with a natural hierarchic structure would be the subject, school, faculty organisation of a university. If the relational model was used for implementation then the associations between these objects (entities) could only be specified implicitly (through matching on common domains) and not explicitly as with a network or hierarchic data model (pointers). In addition, semantic integrity constraints must be defined and enforced externally. However, when these constraints are embedded in application programs, data independence is compromised. In this situation the data model will only be able to specify a subset of the designer's knowledge of the application.

A significant problem with the early database models (which motivated the development of the relational model) was the inability to distinguish the logical model from the physical model. This is evident in the correspondence between physical access paths and logical inter-record links as used in the network data model. The result is a data manipulation language which is 'navigational' in the sense that users must traverse the structure of a database rather than specify the properties of the data of interest. The problems of rigidity and inflexibility prevent data being easily arranged so as to provide multiple user views of data.

The relational model aims to provide data independence through the presentation of data in a form (tables) which is independent of the physical storage structures. For example, the mainframe IBM relational package DB2 provides the user view with tables. At the physical level however, data is stored in binary tree (VSAM style) data structures.

The symmetric structure of the relational model (i.e. the ability to formulate all queries in a consistent manner) also favours data manipulation languages which are non-procedural, allowing set processing without requiring loop and navigational coding. Relational data manipulation languages are generally derived from a relational algebra or calculus and are designed to allow highly flexible database interaction.

In the relational model relationships among data items are formed dynamically at access time, based on the values of the data items. Whilst this allows considerable flexibility, in the absence of semantic detail (i.e. support for the domain concept) it is possible that spurious relationships between data items will be formed (an example of this follows).

4.2 Semantic Modelling v Semantic Data Models

Semantic modelling is a term used to describe the activity of representing meaning in data [Date 86 p609]. This seems a worthwhile pursuit. The more meaning that can be incorporated into the data, the better will be the model of reality. Current database systems have only a limited understanding of what the data means. As an example, take the data items age and weight. Both are numeric, and for an individual, a value of 60 for each is feasible. But the data items are very different, semantically different. The first field may have a unit of measurement specified in years, the second a unit of measurement in kilograms. Conventional data models and database management systems can represent these items but are not able to represent the meaning. A relational join on records containing these fields should be rejected outright by the system but it is unlikely that present day database systems would do this. The interpretation of what these types of relationship represent is left to the database user (who may or may not appreciate the semantics).
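The sketch below works through the age/weight example in ordinary Python rather than a database language. The relation and attribute names are invented; the point is simply that a purely value-based join has no way of knowing that years and kilograms are incomparable.

    # A minimal sketch of the age/weight example above: a value-based join
    # with no domain (unit) information happily matches age 60 with weight 60.
    # Relation and attribute names are illustrative only.

    people  = [{"name": "Brown", "age": 60}]
    parcels = [{"parcel_id": 7, "weight": 60}]

    def join_on_value(left, right, left_attr, right_attr):
        """Join two relations wherever the attribute values are equal."""
        return [dict(l, **r) for l in left for r in right
                if l[left_attr] == r[right_attr]]

    # The join succeeds syntactically, yet the result is semantically spurious:
    # years have been compared with kilograms.
    print(join_on_value(people, parcels, "age", "weight"))
    # [{'name': 'Brown', 'age': 60, 'parcel_id': 7, 'weight': 60}]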

A semantic data model attempts to logically structure the data in a database in a manner that captures more of the meaning of the data than conventional database models [King 85, p115]. This is achieved through the provision of an extended set of modelling constructs which allows the structural and behavioural properties of the data to be defined. When data is organised with a semantic data model the designer can structure a database by expressing application knowledge in a more natural, formal and explicit manner.

Many of the concepts in this area are derived from research in the area of knowledge representation undertaken by artificial intelligence researchers. In the language of this discipline, a knowledge base can be considered as consisting of a network of objects (nodes) connected by relations (directed edges). [King 85 p127] refers to these networks as semantic networks.

In [Brodie 83 p579] a semantic data model is defined in the following terms:

"A semantic data model is a collection of mathematically defined concepts with which to identify static and dynamic properties of real or imagined objects and to specify them using structural or behavioural means."

In this context, structure should be interpreted as "states and static properties (entities and their relationships)", whilst behaviour should be interpreted as "state transitions and dynamic properties (operations and their relationships)."

The distinction between semantic data models and conventional data models is made on the basis of their relative ability to represent both the structural and behavioural properties of objects. Typically, conventional data models have provided only primitive operations for modelling behaviour.

From the preceding descriptions of semantic data models it should be apparent that some practical difficulties remain in distinguishing them from the conventional data models. It may well be possible to describe a data model as a semantic model if extensive support is provided for behavioural modelling and to describe a data model as conventional if it provides for structural modelling only. In between these extremes however, classification will be difficult. It is emphasised therefore that the concept of a semantic data model is a relative one. As was argued in chapter 3, 'Data and the Nature of Reality', a data model is an imperfect representation of reality. Capturing the meaning of data and representing it in a formal model is a formidable task which, in a general sense, can probably never be considered complete. As a result, data models continue to be developed which provide more extensive constructs for the expression of the structural and behavioural properties of data than has previously been possible. ACM/PCM, discussed later in the paper, is an example of a relatively recent data model which employs extensive semantic modelling concepts.

On the other hand, conventional data models, for example the relational model, should not be seen as devoid of semantic concepts. In particular, the primary and foreign key aspects of that model are more than syntactic constructs [paraphrasing Date 86 p609]. Consequently, the term semantic data model is one which should be used with caution. A more fitting description may be an extended data model, that is, a data model which employs semantic modelling concepts.

In [Date 86] the overall approach to semantic modelling is as follows:

1. Identify a set of semantic concepts which are useful for informally discussing the real world. Such concepts include entities, properties, associations and subtypes. It may be agreed that the real world consists of entities that possess properties and are connected in associations.

2. Devise a set of corresponding symbolic (formal) objects to represent the semantic concepts. The extended relational model, RM/T [Codd 79], for example, introduces E-relations to represent entities and P-relations to represent properties. These are special forms of an n-ary relation.

3. Devise a set of integrity rules to be used with these symbolic objects. RM/T provides a property integrity rule which requires every entry in a P-relation to have a corresponding entry in an E-relation (i.e. every property must be a property of some entity).

4. Operators must be developed for manipulating the symbolic objects. RM/T provides the PROPERTY operator which can be used to join together an E-relation with all the corresponding P-relations so as to collect together all properties of a given entity (a sketch illustrating steps two to four follows this list).
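The following sketch is a loose illustration of steps two to four, not RM/T's actual notation or operators: a set of entity surrogates stands in for an E-relation, dictionaries stand in for P-relations, and two small functions play the roles of the property integrity rule and the PROPERTY operator. All names are assumptions made for illustration.

    # E-relation: entity surrogates; P-relations: properties keyed by surrogate.
    e_student = {"s1", "s2"}
    p_name  = {"s1": "Paul", "s2": "Anne"}
    p_grade = {"s1": "HD"}

    def property_integrity_ok(e_relation, *p_relations):
        """Every entry in a P-relation must refer to an entity in the E-relation."""
        return all(key in e_relation for rel in p_relations for key in rel)

    def collect_properties(surrogate, **p_relations):
        """A PROPERTY-like operator: gather all properties of one entity."""
        return {name: rel[surrogate] for name, rel in p_relations.items()
                if surrogate in rel}

    assert property_integrity_ok(e_student, p_name, p_grade)
    print(collect_properties("s1", name=p_name, grade=p_grade))
    # {'name': 'Paul', 'grade': 'HD'}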

Paragraph two deals with the structure of a database whilst paragraph four deals with the behavioural properties. Together, paragraphs two through four constitute an 'extended' data model. The model is necessarily a general approach to the topic for reasons previously discussed. The example used, RM/T, is an extension of the relational model which represents a large spectrum of semantic constructs (structural) directly, in relational form.

In the comparative review, the three sets of characteristics outlined in [Codd 81] (at the beginning of this chapter) will be used in conjunction with those of Date (above) to classify the underlying data models of each method and to consider semantic expressiveness. The ANSI/SPARC database architecture is presented in the next chapter. This is followed by a detailed discussion of the role of the conceptual schema and the relationship between semantic data modelling and the conceptual schema.

CHAPTER 5

DATABASE ARCHITECTURE

A schema is an abstract data model which represents a subset of the description of an information system. Different schemas are prepared corresponding to different levels of abstraction during the design process. In this paper a three level database architecture, internal, conceptual and external, corresponding to the ANSI/SPARC DBMS model, is used for evaluating data modelling methods and methodologies. This architecture is shown graphically in appendix A.

The conceptual level can be taken as a representation of the entire information content of the database in a form which is somewhat abstracted from the way the data is physically stored. This paper emphasizes the conceptual level.

The external level, providing a user-orientated representation of information, is visible at the information system/environment interface. Such user views can be derived from the conceptual level.

The internal schema can be produced by mapping the conceptual schema to a virtual physical environment. That is, the internal schema is one step removed from the physical level and does not deal with device specific considerations but does specify representations, sequencing and access paths. It specifies a user transparent representation of information within a physical implementation.

5.1 Conceptual Schema - Defined

The conceptual view as defined by ANSI/SPARC concentrates on the meaning of the information. In defining the role of the conceptual schema the International Standards Organisation (ISO) [Griethuysen 85] made the following comments:

"It is the classifications, rules, etc., that are of primary interest to a systems designer designing a database system. In analysing the universe of discourse, it is these things he will want to identify, discuss with users and describe. In recording them he will actually create a "skeleton" description of the universe of discourse, the conceptual schema. In this way the conceptual schema describes which entities can possibly exist in the universe of discourse, that is, which entities exist, have existed, or might ever exist. In the same sense it describes what facts and happenings are possible for those entities or, if relevant, are required for them. We assume it will be held in a formal representation within the data base system."

This description tells us that a conceptual model is more than an abstracted data model based only on the structural characteristics of data. It must capture the semantics of the data such that it may be used as a communications vehicle by designers and users to discuss properties of the universe of discourse.

The universe of discourse is defined as the set of information that an information system may receive, derive, store, or distribute during its lifetime. It therefore includes not only base information given to the system but also information derived or implied by others [Ramon 83]. An example of a universe of discourse would be the enrolment of university students in degree courses, perhaps limited in scope by the interest of the university. Within this universe are data objects, both abstract and real, which represent the 'properties' of the universe of discourse. Specifically, students, subjects, grades and enrolment dates, for example.

Continuing with the ISO definition, emphasis needs to be placed on the phrase 'formal representation'. Whilst the concept of a conceptual model as a component of the ANSI/SPARC architecture was defined in a final report in 1978, it has been supported primarily as a structural data model with little capability of expressing semantics. Database management systems have limited understanding of what the data means. Currently the general rules and procedures mentioned in the above quote are described only in application programs. They are often described as 'validation rules' [Griethuysen 85 p3-2].

A consequence of this is that each application altering the contents of the database requires a copy of these 'rules'. The potential for redundancy, and hence inconsistency among 'copies', is high. The problem is difficult enough in a tightly controlled data processing department. Real threats to integrity exist when so called fourth generation enquiry and update language tools are made available to end-users. The rationale for centralised standards enforced by a formal conceptual schema should be clear. To implement this, a conceptual schema language must be designed to support procedural and declarative semantic statements.

5.1.1 Conceptual schema and the Information System

An Information System is defined by ISO as consisting of the conceptual schema, an information base and an information processor. The processor acts on a stimulus from the environment to produce change in the otherwise static conceptual schema and information base. An information system, consequently, is a formal system, "being fully predictable and unable to deviate from the rules or constraints defined by the conceptual schema and information base." [Griethuysen 85 p1-6]

The information base is distinguished from the conceptual schema and is defined as:

"The description of the specific objects (entities) that in a specific instant, or period of time, are perceived to exist in the universe of discourse and their actual states of affairs that are of interest." [Griethuysen 85 p1-4]

Both the information base and the conceptual schema are perceived as part of the conceptual level in the ISO report [Griethuysen 85, p1-3]. Furthermore, the structural characteristics of the information base should be derivable from the conceptual schema, whilst the behaviour of the information base should conform to the behavioural properties of the information system as defined in the conceptual schema. However, from the above quote, it is evident that the information base contains data instances. Therefore, it would seem that only a subset of the information base lies at the conceptual level. That is, the machine representation of the conceptual schema as embodied in the information base. Further discussion of this follows.

A diagram and description of the mapping between the conceptual schema, the information base and the universe of discourse is shown in appendix B.

Database Architecture 5-3 In (Shoval 84] the conceptual schema is interpreted as having dual functions. The first, is to define the universe of discourse in an implementation independent 'enterprise model'. As used here an enterprise model is more than a strategic data model which documents major entities and their re­ lationships. It is a complete logical model abstracted only from the physical implementation. The second function, is to control the descriptions in the information base in terms of computer orien­ tated data structures. This implies that the conceptual schema will exist in two forms. The first form is typically represented by diagrams and restricted natural language reflecting an orientation towards analyst/user communications. It may be supported by a conceptual language which can express struc­ tural and behavioural properties in a formal manner. NIAM, through information flow diagrams, information structure diagrams and the conceptual grammar (declarative and procedural statements) is an example of this.

The second form of the conceptual schema is a machine representation. This should be derivable from the conceptual schema of the first form, whether it be by manual or automated means. Not all of the conceptual schema (first form) may be representable in a database management system language. Some parts (particularly behavioural properties) will be supported through external procedural code.

Currently, there are no known commercial implementations of database management systems which provide anything like full support for the concept of a conceptual schema as described here. IBM's DB2 relational database catalog performs some functions of a conceptual schema at the information base level.

For the remainder of this paper a reference to the conceptual schema will mean a reference to both levels unless otherwise indicated.

5.1.2 Content of the conceptual schema

As an abstract model of an information system, the conceptual schema describes structural and behavioural aspects of data. It enforces preservation of meaning in the transformation between various data representations and defines their interpretations. But it does not provide guidelines for establishing the boundaries or scope of the information system analysis task. As a consequence, definition of the scope must be based on the judgement of the systems designer. Given that the scope of the information system can be tightly defined, the following principles (paraphrased from [Griethuysen 85, p1-8]) should be observed regarding content:

• 100% principle

This requires that all relevant structural and behavioural aspects (rules and laws) of the information system be described in the schema.

"The information system cannot be held responsible for not meeting those described elsewhere, including in particular, those described in application programs." [Griethuysen 85, p1-9]

This follows from the previous discussion on integrity threats when a formal conceptual schema does not model the semantics of the data, but requires application programs, or the information system users, to interpret the meaning of the data.

• Conceptualisation principle

Only the relevant aspects of the information system should be included, thus excluding external and internal details of data representation, i.e. excluding physical organisation and access strategy in addition to user views of data. This principle supports data independence (physical and logical) by isolating the user external views from the internal representations. It also supports the concept of abstraction, a vital tool in the management of complexity. By focussing only on the conceptually relevant details, conceptual schema design is relieved of the burden of implementation details. Similarly at the external and internal levels the design processes will be simplified. The logic supporting this will be recognised as similar to that which justified the development of the seven layered communications model for Open Systems Interconnection.

5.1.3 Functions of the Conceptual Schema

A major role of the conceptual schema is to provide agreement on the representation of the universe of discourse. This allows it to be used as a focal point for human communications. It also allows different users of the common information system to take consistent internal and external views of the data to suit their varying requirements.

Database Architecture 5-5 The fundamental roles of the conceptual schema as defined by ISO include:

1. To provide a common basis for understanding the general behaviour of the universe of discourse;

2. To define the allowed evolution and manipulation of the information about the universe of discourse;

3. To provide a basis for interpretation of external and internal syntactical forms which represent the information about the universe of discourse;

4. To provide a basis of mappings between and among external and internal schemata.

These roles for the conceptual schema correspond to the properties of semantic data models. A conceptual schema, such as has been defined, thereby represents an example of a semantic data model. Furthermore a semantic data modelling methodology will provide the constructs from which the conceptual schema can be specified.

CHAPTER 6

DATA MODELLING

The data modelling process is concerned with the construction of a database as a component of an information system. It is a process that transforms and organises unstructured information and processing requirements concerning an application, through different intermediate representations, to a complex representation which defines schemas and functional specifications [Agosti 84 p5]. Various documents which record the intermediate representations and the semantics of the representations are produced during the process.

The data modelling process is usually divided into components which produce the intermediate representations. Appendix C shows a scheme of these components. Typically the process will include the following [Agosti 84 p7]:

1. Information requirements design. This is an interface between the analysis and design processes and represents the mapping of analysis into design. [Davis 85 p473] discusses a contingency approach to determine information requirements at the organisational, data base or application level. Many design methodologies consider the requirements specification as a pre-requisite.

2. Conceptual design. This leads to the construction of the conceptual schema, which is not constrained by the information structure requirements of a specific data base management system. The conceptual schema integrates the user (application) views into an overall conceptual view that resolves 'view conflicts'. 'Metadata', meanings ascribed by the designer to data kept in the database, is collected and may be managed by a data dictionary system.

3. Implementation design. This involves mapping the conceptual model to the structure of the selected database management system, be it relational, network, hierarchical or some alternative data model. Transaction analysis is performed to establish efficient access path strategies. This step is also referred to as internal schema design. It is still one step removed from the physical level and assumes a virtual hardware environment. Typically this phase will be conducted under supervision of a systems analyst or database administrator.

4. Physical design. The mapping of the internal schema to the physical storage structure is completed. The physical space for records and indexes is defined along with page and block sizes. This step is, naturally, highly system specific as it deals with performance tuning and optimisation. The database management systems software may not support this phase directly. Typically a database administrator and/or systems programmer would be responsible for this phase.

The distinction between these components is made necessary by the existence of the ANSI/SPARC multi-level database architecture. If this architecture is adopted, then it follows that the data modelling process should support each of its elements. This has resulted in the phased approach presented above.

There is no implication that these phases must be approached sequentially. Iteration and abstraction are implicitly supported. For example, at the conceptual design phase, a macro conceptual model which documents the major entities and their relationships may be prepared as a prelude to the preparation of a detailed conceptual model, or to the preparation of detailed requirements specification. The use of iteration and abstraction corresponds with the observed practice of data modelling in many organisations.
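As an illustration of the phased approach, and of the macro conceptual model just mentioned, the sketch below records a small conceptual model of students and subjects and maps it mechanically to relation headings for implementation design. The entity names and the naive mapping rule are assumptions made for illustration; they are not drawn from any of the methods reviewed later.

    # A small conceptual model: entities with identifying and descriptive
    # attributes, plus one many-to-many relationship.
    entities = {
        "STUDENT": {"key": ["student_no"], "attributes": ["name", "degree"]},
        "SUBJECT": {"key": ["subject_code"], "attributes": ["title"]},
    }
    relationships = {
        "ENROLMENT": {"between": ["STUDENT", "SUBJECT"], "attributes": ["grade"]},
    }

    def to_relations(entities, relationships):
        """Implementation design: map each entity and each many-to-many
        relationship to a relation heading (a deliberately naive rule)."""
        relations = {name: e["key"] + e["attributes"] for name, e in entities.items()}
        for name, r in relationships.items():
            keys = [k for ent in r["between"] for k in entities[ent]["key"]]
            relations[name] = keys + r["attributes"]
        return relations

    print(to_relations(entities, relationships))
    # {'STUDENT': ['student_no', 'name', 'degree'],
    #  'SUBJECT': ['subject_code', 'title'],
    #  'ENROLMENT': ['student_no', 'subject_code', 'grade']}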

CHAPTER 7

INFORMATION SYSTEMS LIFECYCLE

An important concept in placing data modelling in the context of information system development is the systems lifecycle. This defines a model of the activities comprising an information system's development and evolution. A representative model detailing the typical phases of such a lifecycle is taken from [Wasserman 83]. For the purposes of this analysis six broad phases are distinguished:

i Analysis of the system to establish a requirements specification. A description of the activities, data, information flow, relationships and problem constraints.

ii Functional specification to detail the processes to be performed by the system. External software design.

iii Design of the internal structure of the software to provide the functions previously specified, resulting in a description of the system structure, the architecture of the system components, the algorithms to be used, and the logical data structures.

iv Systems test and implementation.

v Validation of the development process to ensure that it is of acceptable quality and that it is an accurate transformation from the previous phase.

vi Evolution and ongoing maintenance as a result of new requirements and/or the discovery of errors in the current version of the system.

The inclusion of Phase V, Validation does not imply that it is a single phase but rather that it is performed continuously during the development lifecycle.

An information systems design methodology ideally should support all phases of Wasserman's model. Data modelling, as a component of information systems design, is usually associated with Phase III although there is often an overlap with parts of Phases I and II.

The phases of this lifecycle model will be used to classify data modelling methods and methodologies in the following chapter.

CHAPTER 8

DATA MODELLING METHODS: FEATURE ANALYSIS

A literature review in the area of data modelling and information systems design methodologies reveals a considerable number of alternatives available to the systems analyst and designer. These alternatives vary in their comprehensiveness and in the phases of the system lifecycle and database design task which are supported. In this chapter four data modelling methods are reviewed. These are:

1. Entity Relationship (ER)

2. Nijssen's Information Analysis (NIAM)

3. Fact Based Data Analysis and Design (KENT)

4. Active and Passive Component Modelling (ACM/PCM)

ER and NIAM were included in this review because they have a significant development history and user base vis-a-vis data modelling. ER modelling was one of the first proposals in the area of semantic modelling and has had a substantial influence on the developments in this area. It has a large installed user base. NIAM, reflecting its academic origins, claims a strong theoretical basis, emphasizes binary data modelling and is achieving recognition as a superior analysis and modelling method. Both techniques are featured in case study presentations. KENT was chosen because of the experience gained in its use as a teaching tool at the University of New South Wales and because it is representative of a pure binary modelling approach. A case study presentation has been included.

ACM/PCM, as the most comprehensive method and with the strongest theoretical development, was chosen because of the emphasis it places on semantic modelling, in particular the behavioural aspects of an information system. As a relative newcomer, no commercial implementation could be found; however, the theory serves as a useful indicator of the direction in which data modelling practice may be heading.

The reviews which follow draw heavily on the material contained in the major reference paper for each method. By necessity most of the concepts represent a summary of the respective authors' own material. Insights, in the form of a comparative review, are contained in chapter 9.

8.1 Entity Relationship Modelling

The concept of ER modelling originated in the artificial intelligence and knowledge base research of the seventies. The aim had been to develop a data model which could express data semantics, with the most influential work in this respect being that of [Chen 76]. ER modelling subsequently developed as a high level modelling tool orientated towards the definition of structural data characteristics. It utilises binary modelling concepts, representing data structures through entity-relationship diagrams and relations. An enterprise schema is produced by following a top down development strategy. Logical design is the major function; however the analysis phase is not explicitly supported and only general guidelines are presented on the classification of entities, attributes and relationships. The role of the conceptual schema is not emphasised.

8.1.1 Concepts

The entity-relationship model [Chen 76] adopts the view that the real world consists of identifiable entities and relationships. Chen describes an entity as a 'thing' which can be distinctly identified, giving the examples of a person, company or event. A relationship is then defined as an association among entities; for example, a marriage is a relationship between two 'person' entities. The question inevitably arises as to how a relationship can be distinguished from an entity or an attribute. Chen notes that the distinction is in the view taken by the designer.

"We think that this is a decision which has to be made by the enterprise administrator. He should define what are entities and what are relationships so that the distinction is suitable for his environment."

An entity set is the entity classification used in a particular environment. For instance, Employer or Department are entity sets. These need not be mutually disjoint.

An attribute is defined as a function which maps from an entity set into a value set, or a product of value sets. Associations between entities, or relationships, are defined by Chen as a mathematical relation.

There are four basic steps in designing a database using the entity-relationship model:

1. identify the entity sets and relationship sets of interest that are significant for the view of the enterprise

2. identify semantic information in the relationship sets to determine the order of relationships, i.e. 1:1, 1:N

3. elicit attributes that establish values for the entity

4. organise data into entity/relationship relations and decide primary keys

ER adopts a three level framework corresponding to logical views of data. At level 1 is the information concerning entities and relationships. These are taken as given in the model and no reference is made to analysis support. The information about entities is distinguished from the information about relationships so as to prepare a conceptual information structure. At level 2, the information structure is presented by considering the representations of conceptual objects. Entity/Relationship relations are produced in diagrammatic and tabular form. Attributes of entities and attributes of relationships are mapped to value sets and from this mapping primary keys can be determined. An entity key is a group of attributes in which the mapping from an entity set to a value set or group of value sets is one to one but new attributes may need to be introduced to make this mapping possible. In some cases a relationship may be required to uniquely identify an entity set. Chen uses the example of employee dependents. Dependents may be identified by their names and by the employee (entity) primary key.

This is an example of a weak-entity relation. Entities not requiring a relationship for identification are regular-entity relations.
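The weak-entity idea can be sketched as follows, with invented attribute names: the EMPLOYEE relation is identified by its own key, while a DEPENDENT is identified by the owning employee's key together with the dependent's name.

    # A minimal sketch of the weak-entity example above; all names illustrative.
    employees = {("E1",): {"emp_no": "E1", "name": "Smith"}}   # regular entity

    dependents = {                                             # weak entity
        ("E1", "Anne"): {"emp_no": "E1", "dependent_name": "Anne", "relation": "child"},
    }

    def dependent_key(row):
        """A dependent is identified by the owning employee's key plus its name."""
        return (row["emp_no"], row["dependent_name"])

    new_row = {"emp_no": "E1", "dependent_name": "John", "relation": "child"}
    assert (new_row["emp_no"],) in employees, "weak entity must reference its owner"
    dependents[dependent_key(new_row)] = new_row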

Semantics are included in the entity-relationship model through the use of entity-relationship diagrams. These diagrams are produced through the addition of a relationship symbol to a network data model. They represent entities and relationships as symbols with the mappings between them categorised as 1:1, 1:N or N:M. In this way the relationships and the roles they play are made explicit.

With the ability to represent entity and relationship relations in a tabular format it is possible to define the semantics of information retrieval requests and updating transactions.

Chen compares entity-relationship relations to the concept of relations as used in the relational data model. The latter concept of a relation is 'any grouping of domains'. To produce third normal form relations it is usually necessary to use a transformation/decomposition process. Arbitrarily grouped relations, with the addition of semantic information concerning functional dependencies, can be transformed into third normal form relations. It is claimed that entity-relationship relations do not require this process. Using a top-down strategy, semantic information is applied to organise data directly into third normal form.
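A small worked example of the transformation route may help, using invented attributes: an arbitrarily grouped enrolment relation is decomposed, guided by its functional dependencies, into third normal form relations.

    # An arbitrarily grouped relation plus its functional dependencies; splitting
    # it so every non-key attribute depends only on a key. Names are illustrative.
    enrolment = ["student_no", "student_name", "subject_code", "subject_title", "grade"]
    functional_dependencies = {
        ("student_no",): ["student_name"],
        ("subject_code",): ["subject_title"],
        ("student_no", "subject_code"): ["grade"],
    }

    def decompose(fds):
        """One relation per determinant: key attributes plus the attributes that
        depend on them (sufficient for this simple, single-step example)."""
        return [list(lhs) + rhs for lhs, rhs in fds.items()]

    print(decompose(functional_dependencies))
    # [['student_no', 'student_name'],
    #  ['subject_code', 'subject_title'],
    #  ['student_no', 'subject_code', 'grade']]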

Chen argues that, based on the definition of an attribute used in the model, entity-relationship relations will be produced in third normal form. It appears that a major problem with this could be the assumption that the definition of an attribute as used by the method will correspond with the natural view of an attribute held by the analyst/designer. Through observation of the method in practice, it seems that for non-trivial designs it is unlikely that entities, relationships and attributes will always be chosen in such a manner as to conform to Chen's definitions. As the method provides no guidelines for the selection, and even the classification, of data into entities and relationships it seems unreasonable to expect that relations will be constructed directly in third normal form.

In the final section of his paper Chen shows how an entity-relationship model can be used as a basis for the unification of different data views. A method of establishing a network model and entity set view of data is presented.

8.2 Fact Based Data Analysis and Design

KENT is a binary modelling method for data analysis and design which utilises a simplified form of the entity-relationship model for fact (relationship) specification. Analysis and logical record design are the major functions; however, in some areas the method goes beyond what might be considered the bounds of logical design. The method is structured on attribute synthesis (bottom-up) design with the output taking the form of normalised relations. There is limited semantic expressiveness (structural only) and no support for a conceptual schema (as defined in this paper). Representation (of data) is treated extensively.

8.2.1 Design Process

Data analysis and design under the KENT method is split into seven phases [Kent 84]. An outline of these phases and the major tasks within them is as follows.

1. Specify the facts to be maintained, in terms of relationships among entities.

2. Generate a pseudo record for each fact.

3. Identify pseudo keys for each pseudo record.

4. Merge pseudo records having compatible keys.

5. Assign representations.

6. Consider alternative designs.

7. Name the fields and record types.
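A minimal sketch of the first two phases is given below; it assumes a simple representation of a fact as a named list of entity roles and is not Kent's own notation.

    # Phase 1: facts connecting entities; Phase 2: one pseudo record per fact,
    # with one field per role. Pseudo keys are left to phase 3.
    facts = [
        ("works_in", ["EMPLOYEE", "DEPARTMENT"]),        # binary fact
        ("concert", ["PERFORMER", "LOCATION", "DATE"]),  # irreducible ternary fact
    ]

    def pseudo_record(fact):
        name, roles = fact
        return {"record": name, "fields": list(roles), "keys": None}

    for f in facts:
        print(pseudo_record(f))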

Fact specification is the starting point of the method; however, it is not supported by a procedure for identification of entities or relationships. A 'fact' is defined in the method as being 'something which connects things together'. This is in contrast to many data modelling methods which suggest that there are two kinds of facts: those about things, that is attributes, and those that connect things, that is relationships. With only the one 'fact' concept, the classification into one category or the other, depending on the view adopted, is not required. This does not imply that the facts defined in phase 1 will not later be seen as the basis of attributes and relationships, but that they do not need to be classified as such by the analyst.

Whilst described as a binary relation modelling method, not all facts will involve pairs of things. Some facts will involve three things and are labelled ternary. In general, any number of things may be involved; n-ary relation is the generic term. In order to qualify as an n-ary fact it must be irreducible. In other words, the same information must not be derivable from a combination or join of any set of binary (or lower order n-ary) facts whilst maintaining database integrity. [Consider the ternary relation CONCERT consisting of the fields Performer, Location, Date. No binary subset(s) of this relation can be formed which will consistently provide the same information when joined.]
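The bracketed claim can be checked mechanically. The sketch below (with invented data) projects a small CONCERT population onto two binary pairs and re-joins them; the join yields tuples that were never recorded, so the decomposition is lossy.

    concert = {("Opera Co", "Sydney", "1 May"),
               ("Opera Co", "Melbourne", "8 May"),
               ("Quartet", "Sydney", "8 May")}

    pl = {(p, l) for p, l, d in concert}   # Performer-Location projection
    pd = {(p, d) for p, l, d in concert}   # Performer-Date projection
    joined = {(p, l, d) for p, l in pl for q, d in pd if p == q}

    print(joined - concert)   # spurious tuples, e.g. ('Opera Co', 'Sydney', '8 May')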

If we recognise that a fact may participate as a 'thing' in some other fact then it is possible to represent all facts in a binary construct. A hierarchic structure of facts can be used. The example taken from [Kent 78 p149] is based on (P)arts, (W)arehouses and (S)uppliers, in which a given part may be ordered from a number of suppliers for a number of warehouses. This could be represented as a ternary relationship. Considering first the binary relationship between parts and warehouses (PW), labelled 'allocations', a binary relationship can then be defined between allocations and suppliers, S(PW). This identifies which supplier is responsible for which allocations (supplying parts to a warehouse). The ternary relationship has thus been rendered into a high level binary form. The problem remains that this is only one of three possible combinations which could have been chosen; the alternatives are P(SW) and W(SP). For a relationship of degree 4 there are 15 possible permutations. How should n-ary relations be decomposed? By implementation considerations? Clearly, an arbitrary choice, or even a decision based on the current implementation requirements, should not be made when modelling at the conceptual level. This suggests that the use of irreducible n-ary relations (as utilised in the relational model) is a more natural and simpler means of representing relationships.
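One way to reproduce the counts quoted above is to treat each alternative as a distinct pairwise nesting of the roles, in which case the number of nestings of an n-ary fact is the double factorial (2n-3)!!. This reading is offered here for illustration rather than as a formula given by Kent.

    def nestings(n):
        """Number of ways to nest an n-ary fact into purely binary
        relationships: 3 * 5 * ... * (2n-3)."""
        count = 1
        for k in range(3, 2 * n - 2, 2):
            count *= k
        return count

    print(nestings(3), nestings(4))   # 3 15, matching the counts in the text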

For the analyst, normalised records are a valuable design objective. Such records have minimal redundancy and exposure to update anomalies, despite possible trade-offs in retrieval efficiency. As opposed to a normalisation design process in which records are decomposed, the method follows a 'synthetic' approach in which records are constructed in normal form directly. Kent argues that, intuitively, the records should be in fifth normal form until the 'consideration of alternative designs' phase. This is because pseudo records (irreducible n-aries) are initially in fifth normal form and merging on keys (for binary records) preserves normal form.

This argument is appealing; however, it holds only when the facts have been specified independently of each other and are fully decomposed. If some facts can be derived then redundancy will exist, and if they are not fully decomposed then they will not be in fifth normal form. The method provides no formal strategy for ensuring these pre-conditions.

The determination of pseudo keys requires that the nature of the relationship be classified as one-to-one, one-to-many or many-to-many. Kent uses the concept of participation to express the relationship between binary pairs. Least participation (LP) may take the values 0 or 1 and maximum participation (MP) the values 1 or N. The combinations of LPs and MPs which are possible are:

0 * 1 - at most one

1 * 1 - exactly one

0 * N - some or many

1 * N - at least one

A candidate key can only be selected from a field with a maximum participation of 1. A minimum participation of 0 for a key field will imply that nulls for non-key fields are accepted. When both fields have a maximum participation of N, a compound key involving the whole of the pseudo record is required. With n-ary relations some decision needs to be made about combinations of roles. As previously discussed, it is possible to represent all relations in a binary form but, depending on the degree, a large number of permutations may result. For participation purposes a single permutation must be used. There is no real basis on which the selection can be made other than the analyst considering implementation requirements. This is undesirable at the conceptual modelling level but unfortunately the problem is not addressed in the method.
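For a binary pseudo record the key rules described above reduce to a small decision procedure, sketched here with invented field names.

    def pseudo_keys(field_a, field_b, participation):
        """participation maps each field to (LP, MP), e.g. (0, '1') or (1, 'N').
        A field is a candidate key only if its MP is 1; if neither qualifies,
        the whole record forms a compound key. (The null implications of
        LP = 0 are noted in the text but not checked here.)"""
        candidates = [f for f in (field_a, field_b) if participation[f][1] == "1"]
        if not candidates:
            return [(field_a, field_b)]
        return [(f,) for f in candidates]

    print(pseudo_keys("emp_no", "desk_no", {"emp_no": (0, "1"), "desk_no": (1, "1")}))
    print(pseudo_keys("emp_no", "skill", {"emp_no": (1, "N"), "skill": (0, "N")}))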

At this point there exists a record for each fact. The natural thing to do is to merge records wherever possible. The objective of merging is to collect all single-valued facts about the same thing into one record. A separate record is provided for each many-to-many relationship which includes all single-valued facts about the relationship. Merging can occur where entities and keys are compatible. A simple merge is conducted when entity types are the same and both contain full populations of the entity type (i.e. LP = 1). Merging is possible for unequal populations (LP = 0) if the fields of the resulting pseudo record can be padded with nulls, but to make such a decision may well be beyond the limits of logical design.
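The simple-merge rule can be sketched as follows; the record structures and field names are assumptions made for illustration, and the null-padding case is ignored.

    def simple_merge(pseudo_records):
        """Collect all single-valued facts with the same key, and a full
        population (LP = 1), into one record."""
        merged = {}
        for rec in pseudo_records:
            if rec["lp"] != 1:        # unequal populations would need null padding
                continue
            merged.setdefault(rec["key"], set()).update(rec["fields"])
        return merged

    records = [
        {"key": "emp_no", "lp": 1, "fields": {"emp_no", "name"}},
        {"key": "emp_no", "lp": 1, "fields": {"emp_no", "salary"}},
    ]
    print(simple_merge(records))      # {'emp_no': {'emp_no', 'name', 'salary'}}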

Representation of entities can be deferred to late in the design process. By first designing for the facts to be maintained and later assigning representation, some complexity can be avoided. Representation is required because symbols are needed as surrogates to express real world facts in data. Unfortunately entities rarely have a simple symbol which is unique. Symbol types are considered as character strings occurring in a data field. To distinguish one symbol type from another may require a description of its properties, for example length and numerical base, as well as an identifier. The method devotes considerable time to the discussion of representation techniques, including structured symbols (e.g. dates), derived representations, compressed representations (coding), and the quality of a representation.

Consideration of alternatives may involve changes to the fact specifications or to the assumptions which dictated the participations of pseudo records. Alternatives in merging records may be considered. The existence of this phase emphasises the incremental and recursive orientation of the method. There are no systematic guidelines for this phase.

8.3 Nijssen's Information Analysis

NIAM [Verheijen 82] is a binary based, top-down method for data analysis and database design. It was developed in the early seventies at a time when physical database design was the primary concern of database design methodologies. As experience with databases grew it became apparent that a means of specifying the information content of a system was required which would be independent of its implementation characteristics. The concept of a conceptual schema developed from this. NIAM was designed as a method to define a conceptual schema. This is labelled information analysis by NIAM, a term equivalent to the ISO information modelling concept. Whilst information analysis is the clear strength of NIAM, the method has been gradually expanded to include business and process analysis and to provide automated support for documentation and implementation. A graphical notation is used to enhance communication between analysts and users which represents both the structural and behavioural properties of an information system.

8.3.1 Concepts

The two major concepts of NIAM are the Conceptual Schema and Information Flow Diagrams. The NIAM conceptual schema is based on five principles, four of which correspond to the ISO report on Conceptual Schemas (1985).

1. The first of these says that "all traffic between a user and an information system consists of deep structure natural language sentences."

The 'deep structure' is exhibited by the ability to transform the sentences into a variety of other representations. The representations may be graphical, tabular or in the form of predicate calculus. Elements of a natural language sentence can be classified into lexical and non-lexical objects, sub-types, and idea and bridge types (a discussion of these terms follows).

2. The second principle holds that "there is one grammar, called conceptual schema, which completely and exclusively prescribes all the permitted transitions of the database."

The conceptual schema consists of the above mentioned sentence types and a set of constraints. These can be fully expressed in a formal conceptual manipulation language. The language is set orientated, allowing relational style query support.

3. The third principle states that there is "an internal schema, which prescribes how all the permitted states of the conceptual data base are to be transformed into a machine data base, sometimes called physical data base."

4. The fourth principle says "there are external schemas which describe views of the data base as can be seen by particular users or groups of users."

These views are not restricted to subsets of the conceptual schema (natural language sentences) but might be COBOL records, CODASYL sets and records, or relational tables.

5. The fifth principle, called Meta, means that the three schemas of NIAM can be considered as a data base. This allows the data dictionary and the data base management system to be treated as an integrated package.

An information flow is considered to be a stream of messages, which represents a communication between two partners. It therefore has an origin and a destination.

An information system may be conceived as a function which transforms information flows. Accordingly, a function has the capability to transform an information flow such that the incoming flow is different from the outgoing flow.

The transformation of information flows at a system level is often complex. To manage this complexity decomposition is applied to produce a number of sub-functions. Decomposition is applied until functions result for which the transformation can be described in full and for which the information flows can be detailed. At this stage it is appropriate to express each level of decomposition in a graphical format.

The resulting diagrams are called Information Flow Diagrams. Information Flow Diagrams reveal the flows of information between functions without showing physical or control details. They consist of four primitives, each with a defined graphical symbol: a function, represented by a square; an information flow, represented by a line with an arrow; an information base, represented by an online file flowchart symbol; and the environment, represented by an oval.

From the information flow diagrams the analyst/user is in a position to define the structure of the information flows.
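The four primitives can be captured in a simple data structure, as in the illustrative sketch below; the function and flow names are invented and do not come from NIAM documentation.

    # The four IFD primitives: function, information flow, information base
    # and environment, each held as a plain record.
    environment = {"kind": "environment", "name": "Customer"}
    info_base = {"kind": "information base", "name": "Orders"}
    function = {"kind": "function", "name": "Accept order"}

    flows = [
        {"kind": "flow", "origin": environment["name"], "destination": function["name"]},
        {"kind": "flow", "origin": function["name"], "destination": info_base["name"]},
    ]

    for flow in flows:
        print(flow["origin"], "->", flow["destination"])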

8.3.2 NIAM Development Lifecycle

The first phase of the development lifecycle supported by NIAM is business analysis. This involves analysis of the object system to establish a model. If it is shown that an information system would improve object system performance then the next phase is information analysis.

The first step of information analysis involves making an inventory of all functions that the information system is expected to support. These functions are then decomposed through the use of Information Flow Diagrams (IFDs) to a level at which the individual flows and the transformations performed by the functions are clear. Each of the elementary information flows gives rise to an Information Structure Diagram (ISD). Constraints and functions are formally described and documentation support is provided by an information dictionary. The output of this phase is the conceptual grammar.

Implementation can be supported through the combination of an information dictionary and software generator. The conceptual schema can be transformed into a database schema and data manipulation programs can be generated from the conceptual manipulation language.

8.3.3 Information Base: NIAM Sentence Model

Information flows can be described in NIAM through the use of natural language (deep structure) sentences. Analysis of the structure allows identification of two classes of objects, lexical objects (LOTS) and non-lexical objects (NOLOTS). A lexical object can be considered a naming convention, such as a surname, for a non-lexical object, such as a person. Hence non-lexical objects might be considered as entities, with lexical objects their representations.

Associations can also be identified. NIAM decomposes sentence types into binary associations. These may be bridge type or idea type associations. An instance of a bridge type association might be: the employee has employee# 2341.

Hence a bridge type is an association between a non-lexical object (employee) and a lexical object (employee number). This corresponds to the familiar concept of an entity and an attribute.

An instance of an idea type might be: the employee works for the department.

Hence an idea type is an association between two non-lexical objects. This corresponds to the notion of a relationship.

The concept of idea and bridge types and lexical and non-lexical objects allows a distinction to be made between things and their names. When a natural language sentence is decomposed into binary ideas and bridges the information content of the sentence is conveyed by the ideas. Bridges enable the exchange of information through representation of non-lexical objects but they do not convey information themselves.
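The sentence model can be illustrated with a small sketch of the two association types; the instances are the examples used above, while the record layout is an assumption made for illustration.

    # A bridge type names a non-lexical object with a lexical object; an idea
    # type associates two non-lexical objects and carries the information.
    bridge = {"type": "bridge", "nolot": "employee", "lot": "employee#",
              "instance": ("employee", "2341")}
    idea = {"type": "idea", "nolots": ("employee", "department"),
            "role": "works for"}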

8.3.4 Semantics

To complete the conceptual model it is necessary to specify the rules which describe the behaviour of the object system. That is, the semantics of the information must be expressed. NIAM uses the concepts of constraints and subtypes for this.

A constraint is part of the conceptual grammar, the purpose of which is to prevent discrepancies developing between the content of the information base and the phenomena of the object system.

Many of the constraints can be expressed graphically as part of Information Structure Diagrams. Where constraints cannot be expressed in this manner they can be expressed procedurally in the conceptual grammar. The major types of constraint include:

• identifier - these are used to define the populations of binary idea or bridge types. Populations may be 1:1, 1:N, N:1 or N:M.

• subset - this is used to express a relationship between an idea or bridge type and another idea or bridge type based on similar object types, such that the population of one is a subset of the population of the other.

• equality - this expresses that the population of an idea or bridge type is equal to that of another idea or bridge type for the same objects. A simple example of this is the equality between the idea types, 'start date' and 'end date', for the non-lexical objects, 'session' and 'date'.

• uniqueness - a combination of role occurrences from different idea or bridge types uniquely identifies a non-lexical object. For example an 'enrolment' occurrence may be identified by a 'student' occurrence and a 'course' occurrence.

• disjoint - this asserts that the populations of two subtypes exclude each other, in the manner that the subtype 'pass students' excludes the subtype 'failed students' for the type 'students'.

• total role - this states that every object of an object type acts in a certain role. For example the object system may indicate that an 'author' always has a 'book'. A total role constraint would imply then that the information base will not record information on 'authors' who do not have an associated 'book'. 'Author' has a minimum participation in the relationship of 1.
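Two of these constraint types are easily expressed as checks over populations, as in the sketch below; the populations themselves are invented for illustration.

    def total_role(type_population, role_population):
        """Total role: every object of the type must play the role
        (every 'author' has a 'book')."""
        return type_population <= role_population

    def disjoint(subtype_a, subtype_b):
        """Disjoint: the populations of two subtypes exclude each other."""
        return not (subtype_a & subtype_b)

    authors = {"Kent", "Chen"}
    authors_with_books = {"Kent", "Chen", "Codd"}
    print(total_role(authors, authors_with_books))              # True
    print(disjoint({"pass student A"}, {"failed student B"}))   # True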

8.4 Active and Passive Component Modelling (ACM/PCM)

ACM/PCM [Brodie 82] was developed in a university environment as a semantic data modelling methodology (that is, it utilises an extended data model as discussed in chapter 4 section 2) for the design and development of moderate to large size database-intensive applications. It makes extensive use of abstraction principles as a means of managing complexity and ensuring a high degree of semantic integrity. The data model utilised is the extended semantic hierarchy model (SHM+). This embodies the main concepts. Structural aspects of SHM+ were developed first, after which the database lifecycle and the role of data design were defined. This led to the development of the ACM/PCM framework. SHM+ was then extended to include behavioural concepts. Tools and techniques for support were added.

ACM/PCM places equal emphasis on the structural and behavioural aspects of data base systems. Discrete strategies are provided for dealing with these aspects. Development proceeds in a parallel fashion resulting in a conceptual model. In ACM/PCM the conceptual model is a network of data abstractions related by the three forms of abstraction supported by the methodology.

8.4.1 Abstraction Modelling

ACM/PCM distinguishes three levels of abstraction which are similar to the ANSI/SPARC architecture. The levels are the transaction level, the conceptual level and the database level. Modelling is conducted at all levels for behavioural and structural properties and proceeds as a two step process. The first step identifies and relates the gross structural and behavioural properties of objects. Diagrammatic tools called action and object schemes are used.

In the second step the detailed design specifies the properties of the objects. A specification language called BETA is used for this purpose.

The transaction level is designed to meet the end user application requirements in the manner of an external schema. Structural and behavioural properties of transactions, queries and reports are specified.

The name 'Active and Passive Component Modelling' stems from the treatment of objects as data abstractions. Due to objects being highly interrelated, an operation invoked on one object may result in operations being invoked on many others. Objects may then be classified as taking a passive or an active role. An active role implies that an object can invoke operations over other objects in order to complete a transaction. For example, 'sales order' might invoke 'reduce inventory', 'customer credit' and 'inventory order'. In a passive role operations are invoked on an object as with 'reduce inventory'.

8.4.2 Structural Modelling

Structural properties are expressed in SHM+ through the use of objects and four forms of abstraction which relate objects. The forms of abstraction are: Classification, Aggregation, Generalisation and Association.

Classification considers a collection of objects as a higher level object class. An object class is defined as a precise characterisation of all properties shared by each object in the collection. Classification is an instance-of relationship between an object class in a schema and an object in a database. The example given [Brodie 82 p44] is of object class 'employee' with properties 'employee-name', 'employee-number' and 'salary'. An instance of the object may have the values 'Paul Groves', '8020665' and '$34,000'. In structural modelling classification allows objects to be grouped into classes which are described by common properties.

Aggregation, generalisation and association are used to express relationships between objects. Aggregation considers the part-of relationship in which a relationship between component objects is considered a higher level aggregate object. The example given [Brodie 82 p44] concerns an 'employee' who may be considered an aggregation of the components, 'employee-number', 'employee-name' and 'salary.'

Generalisation is a form of has-subtype relationship in which a relationship between two objects is considered as a higher level generic object. 'Employee' may again be considered the generic object for the objects of 'manager' and 'secretary.'

Association is a member-of relationship. Related member objects are considered as higher level set objects. The given example is of the set 'management' being an association of a set of employee members.

Composition/decomposition and generalisation/specialisation are the major tools for structural modelling. These concepts are supported by 'property inheritance'. The abstraction principles of aggregation and association support upward inheritance, in which properties of the aggregate or set are derived from the properties of components or members. Downward inheritance is supported by generalisation, in which all properties of an object are inherited by each of its category objects. For example all properties of 'employee' are inherited by 'secretary' or 'manager.' The category of secretary only requires those properties which distinguish it from the generic object. This might be 'job-title.'
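Downward inheritance under generalisation corresponds closely to subclassing, as the following sketch suggests; the classes and property names follow the example above, but the rendering in Python is an assumption rather than Brodie's notation.

    class Employee:                       # generic object
        def __init__(self, name, number, salary):
            self.name, self.number, self.salary = name, number, salary

    class Secretary(Employee):            # category object: inherits all Employee properties
        def __init__(self, name, number, salary, job_title):
            super().__init__(name, number, salary)
            self.job_title = job_title    # only the distinguishing property is added

    management = {Employee("A. Smith", 1, 40000)}   # association: a set of member objects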

At the conceptual level structural modelling will involve each of the abstraction forms being applied. This results in the identification and relationship of all objects of interest. Hierarchies result from their repeated application. At the transaction level objects and their relationships are defined for the scope of the transaction. This might involve the introduction of new objects not yet incorporated in the conceptual model. Object schemes are then used to graphically represent the objects and structural relationships in a manner similar to entity-relationship diagrams.

8.4.3 Behavioural Modelling

At the conceptual level behavioural modelling involves the identification, design and specification of actions for each object. At the transaction level it involves the identification, design and specification of transactions. Gross modelling precedes detailed specification. There is, however, no requirement that gross conceptual modelling precede gross transaction modelling. Due to their interdependence, an iterative process is usually followed.

A transaction is defined as: 'An application-orientated operation which alters one or more objects.' A transaction is designed to meet specific user requirements. It comprises a number of actions. An action is defined as: 'An application-orientated operation designed for one object to ensure that all the properties of the object are satisfied.'

Actions are the only means by which an object may be altered. Before the object is altered each action will specify pre-conditions and post-conditions. Actions on other objects may be required. Semantic integrity is ensured since all constraints will be satisfied by all attempts to alter an object. The behaviour of an object will consequently be completely defined by its actions. SHM+ utilises the primitives INSERT, UPDATE and DELETE.

High level composite operations based on these primitives are constructed through the use of the control abstractions: sequence, choice and repetition. These have structural equivalents of aggregation, generalisation and association.
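The relationship between actions and transactions can be sketched as follows. The pre- and post-conditions, object names and operations are invented for illustration; the sketch is not BETA and does not reproduce ACM/PCM notation.

    def reduce_inventory(db, part, qty):      # action on one object
        assert db["inventory"][part] >= qty   # pre-condition
        db["inventory"][part] -= qty          # UPDATE primitive
        assert db["inventory"][part] >= 0     # post-condition

    def record_sale(db, part, qty):           # action on another object
        db["sales"].append((part, qty))       # INSERT primitive

    def sales_order(db, part, qty):           # transaction: a sequence of actions
        reduce_inventory(db, part, qty)
        record_sale(db, part, qty)

    db = {"inventory": {"widget": 10}, "sales": []}
    sales_order(db, "widget", 3)
    print(db)   # {'inventory': {'widget': 7}, 'sales': [('widget', 3)]}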

Behaviour schemes are used to graphically represent the properties of a single action or transaction. They integrate structural and behavioural properties in one representation. As with structural design, gross behavioural properties are modelled first, followed by detailed specification. Details are specified in the BETA language which is based on axiomatic and predicate transformer techniques.

8.4.4 ACM/PCM Design Modelling

The following table is a summary of the major steps for the logical design and specification phases of ACM/PCM. It is reproduced from [Brodie 82] page 50.

Figure 1: ACM/PCM Design Phases

1. Conceptual modelling
   1.1 Conceptual modelling of structure
       1.1.1 Structural design - an object scheme for each object and an integrated object scheme for the entire application.
       1.1.2 Structure specification - a structure specification for each object.
   1.2 Conceptual modelling of behaviour
       1.2.1 Behaviour design - one insert, one delete and at least one update action scheme for each object.
       1.2.2 Behaviour specification - a behaviour specification for each action scheme.
   1.3 Encapsulation - one data abstraction for each object, consisting of its structure and behaviour specifications.

2. Transaction modelling
   2.1 Transaction design - a transaction scheme for each identified transaction.
   2.2 Transaction specification - a transaction specification for each identified transaction.

CHAPTER 9

DATA MODELLING METHODS: COMPARATIVE REVIEW

The objective of this chapter is to conduct a comparative review of the four data modelling methods introduced in the previous chapter. Originally there had been the intention to compare their effectiveness and efficiency when applied to the development process. It should be clear from the individual analyses, however, that the methods differ markedly in their objectives. This makes a comparison of the means of achieving those objectives of little value. Accordingly, this section will concentrate on improving the classification process by highlighting the strengths and weaknesses of the respective methods. For this purpose the following taxonomy will be used:

• Lifecycle Support

• Representation and Communicability

• Abstraction Support

• Documentation

• User Orientation

• Semantic Expressiveness

• Quality Control

The taxonomy was derived from a review of the CRIS 2 conference proceedings (Comparative Review of Information Systems) [Brandt 83], [Wasserman 83] and [Rzevski 83] and from consideration of the major characteristics of data and data models as discussed in chapters three and four of this paper. Each element of the taxonomy is briefly described and then followed by an analysis of the four data modelling methodologies (methods).

9.1 Lifecycle Support

This looks at the specific phases of the systems lifecycle, and the tasks within the database design phase, which are supported by the method. The focus of this report is on data modelling; however, the extent to which the method supports adjacent analysis and implementation phases is of considerable importance due to the iterative nature of many systems development efforts and the consequent feedback and review process. The reference framework outlined in chapter 7 will be used for classification purposes.

KENT

The analysis function, phase 1 of the system lifecycle, and design of the logical data structures, phase 3, are strongly supported. It is argued [Kent 84] that the method extends beyond what might be regarded as logical design. The terms conceptual schema and internal schema design are not used by Kent and his comments are not based on the ANSI/SPARC architecture. Using this architecture, however, it appears that the method is primarily directed at internal schema design due to the extensive treatment of representation. Some aspects of the conceptual schema design are supported.

"... our aim is to produce the actual record designs as they will be implemented in a database .... What we do not deal with are the other aspects of resource and access path management." [Kent 84 p99]

Analysis of the application is supported through the fact specification phase of the method. Identification of the entities and the relationships between them, in fact form, lays the foundation for a clear understanding of the application.

ER

As outlined in [Chen 76], ER modelling is mainly concerned with phase 3 of the systems lifecycle, the design of logical data structures. It takes as given the classification of an object system into entities and relationships, hence it does not provide formal support for the analysis phase. The end result of ER modelling is entity/relationship relations. These represent a conceptual model of the object system.

In [Date 86] it is argued that the ER model is little more than a collection of data structures and that the purpose of ER modelling is determination of structure only. The integrity and manipulative (behavioural) aspects are not considered.

Implementation support is not provided in a formal manner; however, [Chen 76] demonstrates the operation of view derivation for the relational, network and entity-set models.

NIAM

Support is provided for the first three phases of the information systems lifecycle. That is, systems analysis for requirements specification, functional specification, and logical design. NIAM uses the term business analysis to describe the systems analysis function and information analysis to describe the functional specification phase. Abstraction modelling is used to represent information flows and functional specifications. Information analysis is the most developed phase and includes the data system design phase.

Conceptual modelling is the major function of NIAM. It results in a conceptual grammar which describes all structural and behavioural aspects of the object system. Tools are available for implementation design (i.e. the mapping of the conceptual schema to a target internal schema) [Verheijen 82].

ACM/PCM

Is being developed to support a six stage database life cycle, commencing with requirements analysis and specification through design, implementation and evolution. It is a composite of methods that apply to different phases of the life-cycle model. Not all phases are supported equally. The major reference [Brodie 82] deals only with the logical design and specification phases. Whilst it seems that these are the most developed phases there is support evident for the implementation design and validation phase. Requirements analysis and specification appears to be undeveloped [Brodie 83].

9.1.1 Representation and Communicability

Representation refers to the way in which the method models the object system and, in particular, how it presents results, for instance whether graphical or list based data models are supported. This will provide a measure of communication support between analyst/designer and user, and between analysts. Similarly, it may suggest which methods, or parts thereof, are suitable for automation, e.g. data dictionary or software generation.

KENT

This method combines list and diagrammatic representations of data but is biased towards the former. The object system is modelled through n-ary relationships which can be presented in several formats. Output of the method is typically in the form of relational record structures.

The first step in the method, fact specification, presents n-ary relations as a list. The facts are then presented in diagram format as pseudo records. Pseudo records consist of boxes, which represent data items or fields (generally true), and relationship links, indicated by dotted lines. This diagram notation can be utilised through all phases of the design. That is, definition of relationship links, key specification, and merging of pseudo records to create implementation records. As a diagrammatic representation the constructs are very limited and support structural relationships only.

The simplicity of the underlying concept, binary modelling (as a special case of n-ary modelling), and the simple presentation of the relations provides for good communication between analysts and users. Similarly, communication between analysts is supported because there is little risk of ambiguity.

Automation of the design process would not be difficult. A data dictionary could be used to manage the n-ary relations and pseudo records could be represented with simple graphics. Considerable potential exists for automation of the merging process. Merging follows well defined principles and assuming that the method could be extended to model semantics of the relations it would be a valuable exercise to formalise this process.

ER

This method uses diagrams extensively (called entity-relationship diagrams) to represent the logical structure of the object system [Date 86 p612] and as a tool for database design [Chen 76 p10]. Tables (entity and relationship relations) are used to represent the output of the modelling process. The relations are basically equivalent to those of the relational model but with more extensive semantic detail. The fundamental object type is the n-ary relation.

[McFadden 85 p198] describes the ER model as augmenting the network model through the use of a special symbol, the diamond, to explicitly model relationships. [Date 86 p612] considers the ER approach as 'a thin layer on top of the relational model'. Both of these statements are supported. The diagrams are identifiable as a basic network whilst the tables are relational. [Chen 76] emphasizes the ability of the ER approach to unify views of data such that implementation data models (e.g. relational) can be easily derived.

An entity-relationship diagram uses three simple symbols to depict an object system. A labelled box represents an entity set, a labelled diamond between entity sets represents a relationship set, and a labelled ellipse represents a value set (attribute). Connecting arcs are used to specify the relationship roles (i.e. 1:1, 1:N etc.) [Davis 85 p521]. These diagram constructs allow specification of structure and some degree of semantic detail, including existence dependency and identifier dependency [McFadden 85 p200].

With respect to communication support, [Date 86 p612] comments that "the popularity of entity-relationship modelling as an approach to database design can probably be attributed more to the existence of that diagramming technique than to any other cause." Analyst communications, and analyst-user communications, are well supported by the simplicity of this tool.

Automation could be provided in the form of graphical support for entity-relationship diagramming. Data dictionary support would be useful for maintenance of relations, and a formal language for the definition of a conceptual schema could be readily incorporated.

NIAM

This utilises an extensive graphical notation for structural and behavioural aspects of an information system. Two types of diagrams are used, Information Flow Diagrams and Information Structure Diagrams. Output of the method is by way of a conceptual grammar.

The fundamental object type in NIAM is the binary relation which is represented by idea and bridge types. The model is developed in graphical format based on the analysis of 'deep structure' natural language sentences [Verheijen 82]. Information Flow Diagrams are very similar to the more widely known data flow diagrams. The information flows included at this level are decomposed to produce Information Structure Diagrams. These depict the structure of the data model by representing entities (NOLOTS) as unbroken circles, attributes (LOTS) as broken circles and relationships (IDEA and BRIDGE types) as labelled boxes between entities and attributes. The diagrams are used to model semantics through the use of constraint and subtype notations.

As a communication tool the diagrams are excellent, given that the analyst or user is familiar with the conventions. The constructs are relatively simple but much more extensive than for ER or KENT. This probably necessitates training. Once the concepts are understood the diagrams can be appreciated for the compact yet comprehensive model they provide of the object system.

A data dictionary has been developed for use with the system and software generators based on the conceptual grammar are available. Graphical support would be extremely valuable and not difficult to integrate.

ACM/PCM

This method utilises a graphical notation and a specification language to model the object system. Structural and behavioural properties are included. The graphical notation makes use of object schemes and behavioural schemes which are used for gross design modelling. The specification language BETA is used for detailed design [Brodie 82]. Output is a formal conceptual schema. The underlying data model is hierarchic, which gives rise to the model's main data modelling concept - abstraction.

An object scheme graphically represents the objects and structural relationships of a database application [Brodie 82 p45]. An object scheme is described as a directed graph in which nodes are strings denoting objects and edges identify relationships between objects. A graphic notation for each form of abstraction, aggregation, generalisation and association, is provided. A behaviour scheme is an explicit graphical representation of the gross properties of a single action or transaction [Brodie 82 p47]. A behaviour scheme combines behavioural information with the structural information represented by object schemes.

ACM/PCM has 'concentrated on simplicity' [Brodie 82 p43]; however, it is apparent that to model the object system, expressing structural relationships and extensive behavioural relationships, it has been necessary to include a relatively large number of modelling concepts. Consequently comprehensiveness has been achieved at the cost of relative simplicity. Compared to the previous three methods, training would be necessary before it provided a similar level of communication support between analysts. Users would require extensive training to participate in the design process.

Automation could be provided for the detailed specification language BETA. A data dictionary is a virtual necessity because of the voluminous schema description and could be easily incorporated.

9.1.2 Abstraction Support

Abstraction is the operation of generalisation and in data modelling is represented by the ability to hierarchically decompose a system. Abstraction support allows different views of a system to be presented and is generally associated with a top-down development approach. Performance on this measure is an indicator of the potential to support the ANSI/SPARC database architecture.

KENT

The first phase of the Kent method requires specification of the facts to be maintained. The database design is then synthesised from the elementary facts. As a bottom up design technique there is consequently no direct support for hierarchic decomposition. Abstraction is not discussed in the documentation [Kent 84].

Facts are represented by KENT as n-ary relations. In his book, 'Data and Reality', Kent demonstrates the ability to represent all facts through binary relations. An n-ary relation which is implemented through binary relations is providing support for abstraction. In addition there is no requirement that the fact specification stage represent elementary facts. Accordingly, it would be possible for a designer to adopt a top-down strategy (at least in the initial phases) through the use of 'high level' or abstracted binary facts.

It is argued [Kent 84] that the documentation of the entities and relationships from phase 1 of the method would comprise the nucleus of a conceptual model for the information system being modelled, but it is not claimed that there is support for the derivation of this model. The ANSI/SPARC three level framework is acknowledged; however, the method is biased towards the development of an internal model.

Modifications could be made to Kent to explicitly support abstraction and the ANSI/SPARC architecture.

ER

This follows a top-down approach, also called entity analysis, in which an entity-relationship model is derived through an analysis of business processes and functions. Semantic information is progressively added until the conceptual design is complete. The design is represented by entity-relationship relations.

ER modelling is described in [Date 86] as a 'thin layer on top of the basic relational model.' It aims to produce a conceptual schema with support for decomposition. Structural modelling abstraction facilities are provided. In [Chen 76] it is argued that the entity-relationship model can be used to derive views of data for the relational, network or entity set models. Consequently it supports the basics of abstraction principles and could support the ANSI/SPARC database architecture.

NIAM

Directly supports structural and behavioural abstraction through hierarchical decomposition of data flows and functions. NIAM treats an information system as a complex function which transforms information flows [Verheijen 82 p542]. A catalogue of system functions is produced which is then represented by information flow diagrams. These are decomposed until the functions and information flows can be described in detail. Strong support exists for the ANSI/SPARC database architecture.

ACM/PCM

Is based on the principle of abstraction supporting data, procedure and control elements. Abstraction is used as a key element in the management of complexity and in the specification and enforcement of semantic integrity. Of the four methods ACM/PCM provides the most extensive support for the principle, with both behavioural and structural abstraction techniques available. Structural abstraction techniques explicitly provided are:

• classification

• aggregation

• association

• generalisation

Behavioural abstraction techniques are considered in two groups, control abstractions and procedural abstractions. Under control abstractions there exist:

• sequence

• choice

• repetition

Under procedural abstractions:

• actions for conceptual modelling

• transactions for transaction modelling

Decomposition is explicit through the definition of gross and detailed phases of structural and behavioural properties. The techniques listed above are applied to approach decomposition in a step-wise manner [Brodie 82 p43]. The ANSI/SPARC database architecture is strongly supported.

9.1.3 Documentation Support

This covers two aspects: documentation of the method itself and documentation of the object system design. The first measure influences the ease of learning for analysts and users and, for a mature method, largely reflects the logical completeness of the method. The second measure looks at the traceability of the analysis and design tasks and the ability to make the decision processes visible to other users and analysts.

KENT

As a relatively recent data modelling method the documentation has not been developed to the point of a detailed procedure. The paper [Kent 84] focusses attention on the concepts on which the method is based and on the major steps. A simple example is used to illustrate the method. In some areas the documentation (or perhaps the method) is clearly incomplete. In particular the treatment of n-ary facts requires more attention in the logical development and documentation thereof. Currently design experience and some intuition are required at these points in order to resolve problems. In [Kent 78] the issue of binary and n-ary relations is discussed at some depth but a clear approach for the purposes of the method is not given and the reader/designer is left to his/her own conclusions.

As regards ease of learning, the method is rated favourably despite the shortcomings of the documentation. This is largely due to the simplicity of the modelling constructs and procedures.

With respect to the documentation of the object system design, the method rates well. Complete fact listings are generated from which the database is designed. These represent the entities of the business and the relationships between them in a form which expresses the semantics of the information system. It should be noted that at the fact specification level it is not necessary to have pre-classified entities and relationships. In following phases the synthesis of facts into records is highly visible. Design decisions are well documented by following the process of participation and key selection. This makes for a highly traceable design process.

ER

As a mature data modelling method, there is an extensive body of literature describing the application of this approach to the design task; however, based on his original paper [Chen 76], it would be hard to describe the method as highly documented. As presented, the concepts are not difficult and the design follows a structured approach. It does, nevertheless, require substantial analyst experience to utilise the method properly. For example, no guide is given as to the categorisation of entities or relationships and yet this is a fundamental step in the method. If this is not done correctly then the design may well be inadequate for the information system requirements.

Again, due to the simplicity of the concepts, the method is not difficult for analysts to learn. For users the concepts may not be so easy to grasp because they utilise terminology with which many users would not be familiar.

As concerns the object system design, the method does not enforce documentation of all important decision processes. This reduces the ability to trace the design evolution. The analyst is called on to make design decisions, for example, entity classification, without the rationale of the decision being documented.

NIAM

Method documentation under NIAM is relatively extensive. The concepts, tools and the development phases are discussed in detail with the aid of examples [Verheijen 82]. As a mature data modelling method it appears to be logically complete; however, it continues to be enhanced. As regards learning, it requires a greater investment of time than ER or KENT because of the comprehensive coverage of the information systems design process. It does not however rely as extensively on analyst experience as ER modelling. Being a binary based technique the concepts are easily understood but some difficulties may be experienced with semantic modelling constructs. Users are able to relate to the design process with minimal training.

Design documentation is of a high standard. Decision processes are traceable because of the natural language expression of the information system facts. Transformations from fact specification through to record design are highly structured. The method recommends use of an information dictionary. The documentation of analysis phases is stressed.

ACM/PCM

Method documentation of ACM/PCM is complex. In particular, the description of abstraction concepts is difficult to follow from the major reference paper [Brodie 82]. The structure of the design process is outlined in a step by step tabular format but at a relatively high level. ACM/PCM is clearly an extensive information systems design method and the data modelling phase is presented in the above reference only as a subset of the full method. It utilises a large number of constructs to model the semantics of the application which contributes greatly to the complexity. For an analyst to understand the method and become proficient would require a considerable amount of time. On account of its complexity it is not a method which facilitates user involvement. Documentation is being extended as the development proceeds.

Being highly structured, the method should produce suitable documentation of the design process. It is not clear, however, whether the decision processes could be easily traced.

9.1.4 User Orientation

This measure is utilised to establish the ease with which analysts can understand and become productive in the method, and the extent to which the method can be used as a communications tool between analysts and application users. The focus will be on the experience requirements of analysts and application users and on the learning curve associated with the method. Results on this measure will reflect the expected lifetime of the method and its ability to attract users.

KENT

Does not require extensive analyst experience. The primary modelling construct is the binary relation which does not present conceptual difficulties. Representation is via lists and simple diagrams which are easily mastered. These facilitate effective communication between analysts and with users.

The method caters for analysis and logical design phases only. Structural properties of data are modelled but behavioural properties are not. This makes the method relatively simple when compared to the more comprehensive approaches of NIAM and ACM/PCM. KENT follows a step-wise development strategy which, combined with the simplicity of the constructs, results in a shallow learning curve. For the modelling of structural characteristics of data it is simple and effective and for this task it would be expected to attract users. Ongoing development should ensure a growing user base.

ER

Compared with KENT, this method requires more extensive analyst experience to produce an 'effective' design. Again, the modelling constructs are binary based. Presentation is through diagrams and tables which are easy to understand and are very effective for communications. The method caters for some of the analysis phase but concentrates on logical design. It provides greater semantic expressiveness than KENT but this does not overly complicate the design. The learning curve is shallow. ER already has a substantial user base and through continued development will probably retain a great deal of support. Whilst the method is straightforward it is not as highly structured [Chen 76] as the other three methods. This requires more input from the analyst. As a consequence it is easier to produce bad designs. Experience is needed to support the intuitive design phases of the method.

NIAM

Is a comprehensive data modelling method which supports requirements analysis, functional specification and logical design. It exhibits a high degree of semantic expressiveness capturing both structural and behavioural characteristics of an information system. It is a highly structured design method that is binary based and emphasises diagrams as tools for communication and decomposition. Analyst experience requirements are greater than for KENT but probably equivalent to that of ER. The learning curve should not be significant but because of its comprehensiveness would most likely be greater than for ER or KENT. Users are able to understand the concepts with minimal training and to participate in design from an early stage. The method is attracting a growing user base.

ACM/PCM

Like NIAM this is a highly structured and comprehensive data modelling technique which emphasises the role of data semantics. It is based on the concept of abstraction and utilises the extended semantic hierarchy model. Structural and behavioural characteristics of an information system are modelled.

Diagrams (schemes) facilitate communication between analysts. This method requires the greatest level of analyst experience, which reflects on the concepts and language utilised, and its comprehensiveness. It does not explicitly support a user role in design. The output is a formal description of a conceptual model which is not user friendly. Training would be required and it is expected that to become proficient the learning curve would be considerable. As ACM/PCM is directed at the design of complex database intensive applications it is expected that it will not attract a large user base for environments not exhibiting these characteristics.

9.1.5 Semantic Expressiveness

This is evaluated by examining the structural and behavioural constructs of the data model. Results on this measure reflect the support given to the definition of the conceptual schema (as defined by ISO).

KENT

This method provides the least support for data semantics. Structural constructs are provided but there is no formal mechanism for the inclusion of behavioural constructs. With respect to the former, the major structural primitive is the fact, which specifies objects and their properties (but does not distinguish between them at the specification stage). The pseudo record, a fact with participations, is a structural relationship which specifies the functional relationships between and within objects. Aggregation (the reader should refer to the review of ACM/PCM for a description of the structural and behavioural constructs used in this section) is provided through merging of pseudo records. Generalisation is supported informally as there is provision for the treatment of subtypes. There is no support for sets (i.e. association).

Structural modelling in KENT is directed at the logical record level, not the conceptual level (although conceivably it could be used to produce a conceptual model).

ER

This method was the first widely recognised attempt to model data semantics. It is more expressive than KENT. Structural concepts are supported but there is only cursory support for behavioural concepts. The major structural primitive is the attribute which is used to characterise properties of entities and relationships [Brodie 83 p592]. Classification, entity aggregation and attribute aggregation are supported through entity and relationship relations (sets). Generalisation (sub-typing) and association are not supported. Structural modelling is directed at the conceptual and logical record levels.

Behavioural concepts are not part of the formal model but could easily be introduced. [Chen 76] includes examples of the semantics of set operations and information retrieval requests.

NIAM

This methodology is specifically directed at the derivation of a conceptual schema. It provides for extensive semantic expression of structural and behavioural features of an information system. The major structural primitives are objects (lexical or non-lexical) and types (idea or bridge). Abstraction principles are strongly supported, through the use of information structure diagrams (classification, aggregation and generalisation) and a conceptual grammar. The conceptual grammar formally describes structural and behavioural properties. Information flow diagrams are used to depict the latter. Functional decomposition is applied through the diagrams to completely define the behaviour of the information system.

ACM/PCM

This methodology (like NIAM) provides for extensive semantic expression. Behavioural and structural characteristics are modelled explicitly through diagrams (schemes) and described in a formal conceptual language (BETA). The major concept is abstraction. Structural and behavioural tools are provided which are based on this concept. The structural abstractions classification, aggregation, generalisation and association are directly modelled in object schemes. Behavioural properties of an object are completely defined by its actions, and the gross properties of actions are depicted in behaviour schemes. Detailed properties are defined procedurally in the specification language BETA.
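
The following is a minimal illustrative sketch only, written in Python rather than in BETA or ACM/PCM object schemes, and with hypothetical domain and class names. It shows the four structural abstraction forms in conventional object-oriented terms.

    from dataclasses import dataclass
    from typing import List

    # Classification: instances are grouped under a type (class).
    @dataclass
    class Employee:
        employee_no: int
        name: str

    # Generalisation: Manager is a subtype of Employee and inherits its properties.
    @dataclass
    class Manager(Employee):
        budget: float

    # Aggregation: a Department is composed of component properties and objects.
    @dataclass
    class Department:
        department_no: int
        name: str
        manager: Manager          # component object

    # Association: a set-valued grouping of member objects.
    @dataclass
    class ProjectTeam:
        members: List[Employee]

    # A hypothetical instance demonstrating the four forms together.
    team = ProjectTeam(members=[Employee(1, "A. Smith"),
                                Manager(2, "B. Jones", budget=50000.0)])
    dept = Department(10, "Claims", manager=team.members[1])
    print(dept, len(team.members))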

9.1.6 Quality Control

This feature reflects the availability of validation techniques within the method to ensure consistency and completeness of the system design. The provision of these features is considered in light of the method objectives. Design convergence (the extent to which the same model would result from the work of independent analysts), clarity of the design output, and detail resolution are considered.

KENT

The method provides no formal validation procedures to ensure that the design is consistent or complete. Binary facts reflecting structure, as opposed to behaviour (information flows), are stated in the first phase. These are synthesised into logical records. The design can be considered complete with respect to the stated facts after the final step. Consistency in this method reflects the degree of normalisation. It aims to produce normalised records through synthesis. To validate the process the records can be checked for their degree of normalisation. This provides a check that the procedures have been followed correctly and that the facts have been stated independently. If either of these conditions does not hold then the design is unlikely to be in a fully normalised form. This is not, however, a formal part of the method.

Design convergence should be high given that facts are stated consistently. That is, assuming the output of the analysis process is constant there will be little variation in designs. The method provides for varying levels of detail resolution. As it aims to produce logical record designs, detail resolution is relatively high.
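
By way of illustration, such a check could be mechanised along the following lines. This is a sketch only and not part of Kent's method; the record, key and functional dependencies shown are hypothetical.

    from typing import FrozenSet, List, Set, Tuple

    # A functional dependency: a set of determinant fields -> a dependent field.
    FD = Tuple[FrozenSet[str], str]

    def normalisation_problems(attributes: Set[str], key: FrozenSet[str],
                               fds: Set[FD]) -> List[str]:
        """Report simple second and third normal form violations."""
        problems = []
        non_key = attributes - key
        for determinant, dependent in fds:
            if dependent not in non_key:
                continue
            if determinant < key:            # proper subset of the key
                problems.append(f"partial dependency: {sorted(determinant)} -> {dependent}")
            elif determinant <= non_key:     # non-key fields determining a non-key field
                problems.append(f"transitive dependency: {sorted(determinant)} -> {dependent}")
        return problems

    # Hypothetical record produced by an incorrect merge of pseudo records.
    attributes = {"Order No.", "Item No.", "Quantity", "Item Description"}
    key = frozenset({"Order No.", "Item No."})
    fds = {
        (frozenset({"Order No.", "Item No."}), "Quantity"),
        (frozenset({"Item No."}), "Item Description"),  # depends on part of the key only
    }
    print(normalisation_problems(attributes, key, fds))
    # reports the partial dependency of Item Description on Item No.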

ER

The method provides no formal validation procedures, nor are there guidelines for the classification of entities, attributes or relationships. Accordingly the output, entity and relationship relations, is based solely on the analyst's view of the information system. A given fact base may give rise to a variety of designs. Consequently, design divergence can be significant. A detailed data model can be produced, but typically the ER approach is used for generation of an enterprise (or business) schema. At this level validation techniques, apart from user reviews, may be difficult to apply.

NIAM

NIAM is a methodology for conceptual schema design. Model specification and diagram construction are iterative. A formal mechanism for consistency and completeness checking is provided: the design is compared with the requirements specifications, and the conceptual grammar can be checked against the information structure diagrams [Brandt 83 p22]. NIAM is an analysis aid as well as a design aid. Based on a stable set of requirements it is expected that design convergence would be relatively high (better than ER, similar to ACM/PCM). Output, to the required level of detail, can be produced in a clear and precise form.

ACM/PCM

ACM/PCM is a methodology for the conceptual schema design of complex database intensive applications. Each stage of development is verified for completeness and consistency by comparing the schemes and specifications of the current representation with those of previous stages [Brandt 83 p14]. Design proceeds in two stages: gross design followed by detailed design of structural and behavioural characteristics. Iteration and decomposition are fully supported. Detail resolution is high, with final specifications close to program level as far as data structure and transactions are concerned [Brandt 83 p14]. Design convergence should be greater than for ER modelling.

9.1.7 Comparative Review - Summary

The methods differ markedly in their comprehensiveness with regard to the phases of the systems lifecycle supported and the detail within a phase. KENT and ER are the most restricted; NIAM and ACM/PCM are considerably more detailed. This makes comparisons on some of the other criteria difficult.

Each of the methods employs a variety of representation means. KENT uses lists and simple diagrams, ER uses diagrams and tables, NIAM uses diagrams and a formal conceptual grammar, whilst ACM/PCM uses simple diagrams and a specification language. KENT and ER benefit from their simplicity in representation. They provide for good communication between users and analysts. NIAM diagrams are somewhat more complex because of the number of semantic constructs supported. Nevertheless they provide a compact notation and are effective in analyst communications. ACM/PCM uses simple diagrams but emphasises the formal specification language. This provides a detailed description but is not conducive to design communications, especially when users are involved.

Automated support is provided for NIAM in the form of a data dictionary and software generator.

There are no tools for ACM/PCM or KENT but both these would benefit considerably from a data

dictionary tool. ER was not developed with automated support.

Abstraction is strongly supported by NIAM and ACM/PCM with formal procedures for decomposition. ER is considerably more limited in this respect despite being based on a top-down development strategy. KENT, as a bottom-up synthesis approach, does not utilise abstraction although it could easily be incorporated into the analysis phase.

Documentation considerations were based on the major references for each method. The evaluation of the documentation is biased by the fact that for NIAM and ACM/PCM the papers were part of conference proceedings restricted by length. In addition the variation in comprehensiveness between the method objectives is an important factor. With these factors considered, NIAM and KENT appear to be best documented. ER is not presented as a detailed procedure and lacks a logical foundation.

ACM/PCM is orientated towards theoretical justification of the concepts with less detail on the practical approach. Documentation of the object system is most comprehensive under ACM/PCM and NIAM.

KENT, reflecting its lifecycle objectives, is less extensive. ER provides minimal documentation.

User orientation is reflected in the representation means and in the simplicity of the constructs supported. A tradeoff is apparent between simplicity and comprehensiveness. Accordingly, KENT is the simplest, followed by ER, NIAM and then ACM/PCM. The same pattern is evident in the degree of semantic expressiveness.

With regard to quality control all methods provide for iteration during the development and specification phases. Formal validation techniques are provided for ACM/PCM and NIAM, with the latter being best served. KENT could easily be expanded to incorporate a validation procedure.

CHAPTER 10

UNIVERSITY OF NEW SOUTH WALES

10.1 Objectives

The major objective of this case study was to examine the use and development of data modelling within a teaching and research environment and to ascertain the suitability of binary data modelling as a technique for the communication of conceptual modelling concepts. The chosen environment, the University of New South Wales, allowed an almost exclusive focus to be directed on the metrics of communication and ease of student understanding (learning). These metrics correspond to two of the important criteria by which a new technique is evaluated for inclusion in the teaching program. Being an academic institution it also provided the opportunity to examine these factors free of the financial pressures, and with reduced technical and time pressures, usually associated with the corporate systems development environment. Nevertheless, by gaining an insight into these issues some indication was given of the ease or difficulty with which data modelling procedures and techniques could be changed in the business environment and the associated training that would be required.

To pursue these objectives the case study examined the processes which led to the introduction of

binary data modelling and then highlighted the extent to which the changes in data modelling methods

impacted student learning. In the conclusion some feedback on the theory was provided based on

the metrics outlined in chapter 9.

10.2 Research Method

The research methods employed in this case study were based on direct observation and interviews. Direct observation stemmed from the author's involvement with database systems at the University of New South Wales over a five year period. This commenced as a student in the database subject 'Advanced File Design' in 1982. From 1984 through 1986 the author was heavily involved in the tutorial workload of the restructured course 'Database Systems.' Section 10.4 of the case study, which describes the subjects, is derived from the author's experiences. Documents relevant to this section (course outlines, major assignments etc.) have been included in the appendices.

Section 10.5 is based on interviews with lecturing and tutorial staff. All lecturers associated with the subject during the period 1984 through 1986, and the major tutorial staff, are represented. The comments of several students (1986 class) have been included. The interviews were conducted using an asking strategy employing open questions. Comments were sought on the advantages, disadvantages and teaching utility of data modelling with special emphasis on binary data modelling. Free comment was encouraged.

Limitations of this case study include the biases introduced through interview selection and recall. An attempt was made to minimise the former by approaching all staff involved with the database curriculum. With students this was clearly not viable due to the numbers. The projects themselves are not representative of corporate systems in either size or technical complexity, an important consideration to be made before generalisation. Furthermore, the students may not be representative of the typical information systems employee. This effect was somewhat countered by the graduate students involved. Bearing these restrictions in mind, the focus on communication and learning nevertheless allowed some useful results to be obtained.

10.3 Environment

The Department of Information Systems, University of New South Wales, falls under the administration of the School of Accountancy within the Faculty of Commerce. It offers both graduate and undergraduate, pass, and honours course majors. For the purposes of this case study the focus will be on the undergraduate degree; however, a small amount of material is taken from the graduate program.

The undergraduate subject, Database Systems 14.608, has been taught in the Department of Information Systems since 1983. Prior to this the subject was known as Advanced File Design. Database Systems forms part of an Information Systems major as a first session, third year (full time) subject. Pre-requisite subjects are Computer Information Systems 1, and Computer Information Systems 2 or Management Information Systems Design. Current subject descriptions for each of these are contained in the appendices. At the graduate level the subject Data Management 14.992G was offered for the first time in 1986.

Teaching in Database Systems and Data Management has been structured around a 14 week session with a two hour lecture and one hour tutorial. Assessment has typically been split between course work and a final examination. Course work has varied according to the resources available but has centred on exercises with microcomputer database management packages and on conceptual file/database design.

In 1984 the lecture content was split into a Systems and Technology stream. Michael Lawrence was

responsible for Systems and Robert Edmundson for the Technology stream. Tutor in charge was Paul

Groves. In 1985 Ross Jeffery assumed the responsibility for the Systems stream. Robert Edmundson

continued teaching the Technology stream and Patrick Thng joined Paul Groves to share the tutorial workload. In 1986 Ross Jeffery lectured a restructured course in which the Systems and Technology

streams were merged. Paul Groves and Chris Johnson tutored.

From 1984, enrolments have been stable in the 80-100 student range for Database Systems. Approximately 30 students were enrolled in the graduate subject. Tutorial sizes have been held within the range of 15-18 students.

Practical exercises, which were designed to complement the theoretical components of the course, required that a selection of database management systems software should be available for student use. With the majority of the University's computer power concentrated in centralised mini-computers it appeared sensible to support mini-computer database packages. Unfortunately, on these machines the availability of suitable DBMS software and the associated cost of the packages suggested that a different strategy would be necessary. Accordingly, microcomputer support was provided in the form of Datamax CP/M machines. A network DBMS, MDBS I, was purchased. In 1984 this package and the relational package Dbase II were used for major assignments.

Increasing student numbers and a continuing heavy price bias towards microcomputer hardware and software saw an IBM PC laboratory established in late 1984. Availability problems with an educational version of the network DBMS, MDBS III, resulted in Dbase II being the only package available in 1985. By second session of 1985 it was evident that MDBS III would be available in an educational version for an IBM environment in time for use in 1986. Major assignments in 1986 were once again conducted in both relational and network packages.

10.4 Database Systems Development

Until 1984 data modelling had assumed a low profile in data base courses. It had not been taught by reference to a single structured methodology but had drawn on concepts of entity relationship modelling and normalisation theory. These had been used to support what remained largely an intuitive design approach. Consequently, design exercises resulted in considerable difficulties being experienced by students who, in the majority of cases, had only minimal previous exposure to programming concepts and even less to practical information system design. In retrospect, understanding of normalisation theory and entity relationship modelling appeared to have been more successful for students with previous systems exposure. It was not unusual for students to complete the data base systems course without the benefit of having worked with a structured design methodology.

Design skills which developed during the course were largely the result of the practical exercises set in the relational package Dbase II and in the network database package MDBS I. For the majority of assignments a logical design exercise preceded the practical element. This provided meaningful feedback on the implications of design choices because poor logical design would be expected to cause problems for the student in the implementation phase.

The understanding of normalisation theory was boosted with the availability of a relational package. This was because of the ease with which a normalised conceptual design could be expressed at a physical level. That is, the logical and physical designs were usually equivalent. In comparison, the implementation of the same logical design with a network database required restatement of the schema. The particular advantage of a relational package, then, was the ability to clearly demonstrate the difficulties imposed at a physical level by poor logical design.

In summary, design concepts in this period were developed by students mostly from practical experience with the exercises and only supplemented by the teaching of normalisation theory. The intuitive top-down approach dominated design exercises. The major problem with this was the internalisation of the design task: there was no pre-defined method or visible decision process. This was highlighted by the absence of documentation at the completion of the logical design process. The major design effort was then shifted to the physical level.

10.4.1 Database Systems - 1984

During late 1983 a draft copy of William Kent's work on binary modelling was brought to the attention

of Michael Lawrence. After a review by database staff the technique was adopted with considerable

enthusiasm as the standard data modelling method. It was taught for the first time in 1984.

Three design exercises were set using the method. The first of these involved purely a modelling

exercise in which the basic data to be modelled, 'the facts' using Kent terminology, were provided.

The exercise involved a small medical records system with the design to be completed as a one week tutorial exercise. The draft Kent paper was referenced but students' knowledge of the method was otherwise restricted to the lecture examples.

In the following tutorials it was evident that understanding of the concepts of participation, key identification and merging was poor. The design application had been simple enough that records could be designed by 'inspection' without undue difficulty. Hence some students had followed a top-down design strategy and then documented a binary bottom-up approach. This avoided the issue of coming to grips with the Kent method.

The second design exercise involved a textbook case study in which variable length record designs were provided for an application with a hierarchical inter-record structure. Several extra fields were to be added to those already present. This was a somewhat more difficult exercise in which the maximum and minimum participations of the binary relationships required careful consideration. Some student difficulties continued in this area and the resulting designs were often not in third normal form.

The final exercise was both an analysis and design exercise in which a video hiring application was briefly described. In tutorial discussion directions were given as to the required detail of the design and on techniques to resolve modelling problems. A large percentage of the time taken to complete the exercise was required in the fact specification phase. This was accompanied by considerable class discussion. With the concept of participation being addressed more carefully, results improved. However, problems still remained with the final designs due to the incorrect merging of pseudo records.

The design output of the final exercise was to be used as the basis for a network data model in MDBS I. Consequently, the more subtle problems with student designs were allowed to go unchecked in the hope that implementation of the design would alert students to the difficulties arising from unintentional, unnormalised designs. This was, unfortunately, only a partial success. The MDBS I exercise concentrated on database loading and enquiry operations. Maintenance, that is, change and delete transactions, was not part of the exercise and this allowed a number of logical level design problems to go unnoticed by students.

Nevertheless, improved designs soon replaced poor designs as the exercise continued and the awareness of design implications grew. By the completion of the exercise many implementation designs had converged, forcing a change to the logical level designs.

In summary, at the end of session it was evident that students had gained considerably in design skills and the majority possessed a good appreciation of Kent's method. However, difficulty had been shown in understanding the concepts on which the method was based. This resulted in problems in those areas of the method which allowed considerable freedom, or in which the method was incomplete. It was found that the majority of design issues could be handled simply by the method but some situations still required design insight. This was often beyond the experience of students who, when resorting to intuitive design, made normalisation errors. On the whole, results showed an improvement over previous years, mostly because a design model was available with which to structure the design task.

10.4.2 Database Systems - 1985

Based on the experience from teaching Kent in 1984 it was felt that a summary of the paper, "Fact Based Data Analysis and Design", would assist students' comprehension of the method and its objectives. Consequently, a six page overview with example was prepared for student distribution. The overview concentrated on the essential phases of the method and provided rules, but avoided detail and discussion of problem areas. Students were strongly recommended to obtain a copy of the original paper. As a result of this approach, students' understanding of the binary modelling process developed much faster than in the previous year.

For the major data modelling assignment it was decided that design and analysis should both play a large part in the project. A one page description of a rock music promotions system was distributed. This defined minimum process requirements but allowed considerable flexibility as to the comprehensiveness of the design. Project deliverables were matched to the phases outlined in the overview paper.

Phase 1, specification of the facts, is in essence an analysis task. Following the pattern of the previous

year, tutorial discussion of the 'facts', as represented by binary relations, was intensive. Sufficient

ambiguity as to the scope of the system led to a variety of approaches differing as to the level of

detail and functionality. Whenever possible students were encouraged to see alternative views of the

problem but generally little prompting was needed. As few restrictions as possible were placed on

the system scope.

Communication between students, and between tutors and students, regarding the application was generally good. The method allowed students at this point to focus entirely on the application without the distraction of working with the syntax and conventions of a formal data model. The primary concept required for fact specification, the binary relation, was one readily accepted by students because of its simplicity.

As a proportion of total project effort the analysis exercise (phase 1) was relatively large. It was evident at the completion of the phase that the application was well understood by most students. Concern as to the scope of the system had been raised by several students, but in general, problems had been minimal. However, what was generally not appreciated after completing this phase was the importance of establishing the facts to represent reality. Problems in later phases would be traced to incorrect or incomplete fact specification. In order to provide a uniform problem statement for the subsequent modelling phases the scope was defined in detail following the completion of phase 1.

A large share of the available tutorial time continued to be devoted to the conceptual design as it entered the second phase of specifying fact participations. Once again the work was analysis orientated because establishing participations required insights into the application and not (at least on the first pass) into design issues. Whilst the concept of participation was now familiar to many students, difficulties were encountered in understanding the implications of binary relations involving a minimum participation of zero.

Identification of keys in phase 3 required little tutorial time. The rules for key identification had been stated precisely in the overview documentation and, with some theoretical justification, were quickly accepted and understood by the students. The main difficulty, though not a serious one, lay in redefining the concept of a key for those students who tried to equate the Kent concept of a key with their practical experience using indexed files (where duplicates may be acceptable). There was also a popular misconception that the key to a pseudo record needed to be practical. That is, students resisted the notion of composite keys involving the whole pseudo record. For the majority of students this phase progressed quickly due to the constraints imposed on selecting candidate keys (determined by the participations established in the previous phase).
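
The constraint referred to can be illustrated with a short sketch (Python is used purely for illustration, and the fact and field names, drawn loosely from the promotions application, are hypothetical): a field whose maximum participation is one identifies at most one instance of the fact and is therefore a candidate key on its own; where no such field exists the key is the composite of both fields.

    from dataclasses import dataclass
    from typing import List, Union

    @dataclass
    class FieldParticipation:
        name: str
        minimum: int                  # 0 or 1 in the exercises described here
        maximum: Union[int, str]      # 1, or 'N' for many

    def candidate_keys(fact: List[FieldParticipation]) -> List[object]:
        """Candidate keys of a binary pseudo record, derived from its participations."""
        singles = [p.name for p in fact if p.maximum == 1]
        if singles:
            return singles
        # No field has a maximum participation of one: only the composite
        # of both fields identifies a fact instance.
        return [tuple(p.name for p in fact)]

    # Hypothetical 'Promotes' fact: each concert has exactly one promoter,
    # while a promoter promotes many concerts.
    promotes = [FieldParticipation("Concert No.", 1, 1),
                FieldParticipation("Promoter No.", 0, "N")]
    print(candidate_keys(promotes))      # ['Concert No.']

    # Hypothetical 'Plays At' fact with maximum participations of 'N' on both sides.
    plays_at = [FieldParticipation("Band Name", 1, "N"),
                FieldParticipation("Venue No.", 0, "N")]
    print(candidate_keys(plays_at))      # [('Band Name', 'Venue No.')]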

The fourth phase, merging of pseudo records, met with some real, and some imagined, difficulties. The real difficulties stemmed from evaluating a merge in which alternative keys were available. No clear-cut rule could be used at this point. The solution lay in considering the application requirements and/or using design intuition. For example, consider the following pseudo record:

Figure 2: Candidate keys

    Manages
    Department No.    Employee No.
       1 * 1             0 * 1

    Both department number and employee number are candidate keys.

Given the following Department and Employee records, it is possible to merge the pseudo record on department no. of the Department record or to merge on employee no. of the Employee record. With the latter merge the department no. field must be able to handle nulls for those employees who are not managers.

Figure 3: Pseudo record merges

    Department:  Department No. | Dept. Name | Dept. Location
    Employee:    Employee No.   | Emp. Name  | Emp. Salary
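
To make the alternatives concrete, the two merges can be written out as record layouts. This is an illustrative sketch only, with field names taken from the figures above.

    from dataclasses import dataclass
    from typing import Optional

    # Alternative 1: merge 'Manages' into Department on department no.
    # Every department has exactly one manager, so the field is always present.
    @dataclass
    class Department:
        department_no: int
        dept_name: str
        dept_location: str
        manager_employee_no: int

    # Alternative 2: merge 'Manages' into Employee on employee no.
    # Most employees manage no department, so the field must accept nulls.
    @dataclass
    class Employee:
        employee_no: int
        emp_name: str
        emp_salary: float
        managed_department_no: Optional[int] = None

    staff = [
        Employee(1, "A. Smith", 32000.0, managed_department_no=10),
        Employee(2, "B. Jones", 21000.0),    # not a manager: the field is null (None)
    ]
    print([e.managed_department_no for e in staff])   # [10, None]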

Imagined difficulties related to 'fragmented' designs. After an initial pass through the method it was common for a significant number of pseudo records to remain unmerged. This was a direct result of assumptions made concerning participations. Usually students avoided making restrictive assumptions in determining participations. Hence maximum participations of 'N' were common, with composite keys resulting. This meant many pseudo records could not be merged. With iterative development, however, these assumptions could be gradually modified, reducing flexibility but allowing further merging of pseudo records. The objective here was not to assume away problems by changing participations, but to make clear the link between the degree of flexibility assumed and the resultant record design.

After the merging process the designs should have been normalised at least to third normal form, provided the technique had been followed accurately and the facts carefully specified. Accordingly the students were encouraged to check the designs using normalisation theory. Anomalies which arose were usually traced to fact specification errors. By this it is meant that differences existed between the student's perception of the reality being modelled and how the fact was actually stated. This problem usually arises when the fact has been specified in general terms but on assignment of representation (filling in the detail) a 'new' fact is created which is different to the original intended fact.

The final phase in the technique, consideration of alternative designs was handled with difficulty by

students. Having arrived at a design students were reluctant to review facts and participations which

would have generated alternatives. The idea of iterative design and progressive refinement which is

central to the technique was not widely appreciated. Students tended to be locked into a single view

of the system.

At the conclusion of the design exercise it was felt that students' grasp of the issues in data modelling was much improved over previous years and that understanding of normalisation and its implications had been strengthened.

10.4.3 Database Systems - 1986

Encouraged by the success of the previous year, data modelling once again assumed a high profile in the course outline. The early weeks of lectures concentrated on data concepts and characteristics. Kent was introduced in week 5 of session, accompanied by the six page overview documentation used in the previous year. Practical work was assigned in Dbase II and MDBS III. The major conceptual design followed a different approach to that of the previous year. A full case study description of a production and marketing system was provided which ran to 25 pages. The objective was to reduce uncertainty in the analysis phase so that the major effort would be concentrated on design and understanding of the method. By providing a detailed case study the expectation was that variation between designs would be minimal due to the reduction in uncertainty.

Consequently, in phases 1 and 2 tutorial discussion was brief. It was necessary to point out that the design should cater for a base set of data from which all reports could be produced; any fields which could be derived were to be excluded. Few questions were raised, however, regarding the case study material itself. As for participations, the theory was covered quickly with supporting illustrations drawn from case study facts.

Completion of phases 1 and 2 represented a design deliverable. Compared to the previous year these two phases had occupied roughly 50% less student time and 60-70% less tutorial time. The provision of detailed case study material appeared to have achieved its objective in significantly reducing the analysis effort.

Phase 3, key specification, produced similar problems as in previous years. The task was completed easily by those students who followed the method rules without question, and by those students who understood the strict definition of a key. Students who endeavoured to use intuition alone invariably had problems. Once the concept had been clarified in the tutorial the problem vanished.

Merging brought with it the usual concerns regarding record fragmentation. Some students understood the distinction between logical and physical design phases and were happy to consider changes in representation or participation assumptions in order to produce an implementation design. Many did not. This was shown clearly by the difficulties experienced in the final phase, consideration of alternatives. The link between fact specification, participation, representation and the final design did not seem to be clear. Most alternatives involved tinkering with the merging process. As with the previous year, the concept of iterative design and progressive refinement was not widely used or appreciated.

Normalisation checking was conducted before the final deliverable so as to verify adherence to the method, and to provide a check on the correct statement of facts. This also served to demonstrate to students the equivalence of top-down modelling through decomposition with bottom-up binary modelling.

In conclusion, it seemed that students' appreciation of data modelling concepts was good by the time the design task was complete. In comparison with the previous year fewer problems had been evident, but there was some feeling that students' exposure to problem areas in Kent had been reduced through the provision of a detailed case study. Owing to reduced uncertainty, less class discussion had been generated and consequently fewer alternatives considered.

10.5 Interview Plan

Each of the lecturers, and one of the tutors, involved in Database Systems since 1984 was asked to discuss their feelings towards the use of the Kent method as a tool for teaching data modelling.

Discussion ranged from its ease of teaching to the level of student comprehension. It was hoped a consensus on the strengths and weaknesses of the method could be identified. General comments on individual experiences were sought.

Naturally, to gain a better appreciation of the impact of the method, several students from Database Systems and Data Management were asked for comments. Responses were sought regarding their understanding of data modelling concepts and the contribution that the Kent method had made towards this. Comments on ease of use, and confidence, with the method were also sought. In all interviews an open question strategy was adopted.

10.5.1 Lecturers

Robert Edmundson introduced the Kent method in lectures in 1984. He was asked for comments on teaching with the method, on students understanding and usage of the method, general observations and specific strengths and weaknesses. With respect to teaching he made the following observations:

The Kent method was an easy and natural way of thinking about data which lent itself to easy illustration. It was a method which did not require design intuition or prior systems exposure and was therefore appropriate for a student or user environment. Accordingly it was feasible to introduce data analysis using this method at an earlier stage of an Information Systems major, probably from Computer Information Systems 2. Experience in the first session of its use had shown that for teaching purposes it was beneficial to place a different emphasis on the various phases than had been suggested in the original Kent paper. Pseudo record participation and fact specification were areas requiring more attention, whilst fact participation, included in phase 1, was less important as its relevance seemed unclear.

Regarding students' understanding and experiences with Kent:

Some students had difficulty with the method, but less difficulty than with alternative methods (Entity-Relationship) because Kent did not require students to differentiate between an entity, an attribute and a relationship. The Kent method was able to take students from base level data through to a record design (bottom up) via a highly visible, well documented path. This provided improved understanding of the application and an insight into the problems of data modelling, as evidenced by increased awareness of normalisation principles and by the quality and insight shown in student questions.

On the strengths of the method:

Ease of understanding of binary concepts. No prior experience required. When used in conjunction with normalisation the method provided a powerful tool for analysis, equal to its primary function as a design tool. Due to its bottom-up approach, data modelling with Kent requires more effort than a top-down strategy, which is beneficial due to the thoroughness of the analysis.

On problems with the method:

Handling of n-ary relations requires greater attention in the documentation. No clear guidelines are

provided.

Michael Lawrence was the first staff member of the department to be introduced to the method and co-lectured in Database Systems when it was first taught. On teaching with the method he commented that it was straightforward to explain the concepts but that initially for a student it might well be difficult to understand, although no more so than alternative methods. He believed students to be more involved in data analysis than previously, and that Kent's method, by following an incremental design strategy, had de-mystified the design process. The use of a well defined method and a uniform (relational) theory was of considerable benefit to students. Intuitively, he felt that students' appreciation of normalisation and of data modelling problems had been improved through working with the Kent method.

On the strengths of the technique:

The method had sought to minimise the intellectual difficulties of modelling data through the use of a single construct, the binary fact. In addition the development process was self documenting, thereby providing the capability of tracing all aspects of the design process from fact specification through to the record level. This was facilitated by virtue of a 'fact catalogue' with natural language descriptions produced as a product of phase 1. A comforting feature was the ability to resolve modelling problems or explain design errors by stepping through the detail of the method (whether the fact specification, participation or merging phase). It was felt that the Kent method was substantially better than entity relationship modelling for the promotion of group discussion, particularly in the analysis phase.

On the weaknesses:

'Correct' fact specification was seen as the basis of the method's success. If this was not done carefully then bad designs were likely to result. This was seen to necessitate the use of normalisation as a check on the final design.

Ross Jeffery had taught entity relationship modelling and NIAM prior to teaching Kent in 1986. He regarded Kent's method as very easy to teach but saw no particular problems with the alternative methods. Before introducing the Kent method to students, concepts of entities, relationships, and attributes were taught first. Whilst it was not necessary to define entities and attributes in order to use the Kent method (due to its bottom-up orientation), it was felt that these concepts helped the modeller to perceive the structure of the problem. This first view of the problem was believed to be critical to the success of all methods. It was argued that for modelling exercises a distinction should be made between the analysis and design components. For a design exercise a detailed case study should be provided so as to minimise the analysis effort.

On the strengths:

The Kent method was easy to grasp and allowed many decisions which were not relevant at the conceptual design level to be deferred to later phases of the design process. An example of this was the representation issue.

On the weaknesses:

As a bottom-up approach the method did not allow good perception of the facts unless it was supplemented by an overview of the problem. As a list based binary modelling method it was seen to be at a disadvantage when used as a communications tool compared to graphical binary modelling methods such as NIAM.

10.5.2 Tutors

Jamie Crowley tutored in the graduate subject Data Management in 1986. Prior to this he had no experience with the teaching of binary modelling. Kent was used for several of the design exercises but was not compulsory (students could select a preferred method, although only Kent was supported). For students with design experience the Kent method was felt to be long-winded. These students preferred entity relationship modelling combined with normalisation. Less experienced students found the method to be supportive because it provided a framework for analysis and effectively illustrated the concepts of data modelling. It was claimed that difficulties had been experienced in defining the facts for a given application and that the method provided no support or guidelines for this activity. In addition, non-binary relations were a source of confusion. As a communications tool the diagrammatic representation of NIAM was preferred.

10.5.3 Students

A number of students who had completed Database Systems in 1986 were asked about their experiences with data modelling and the Kent method. Hock-Seang Khaw had been introduced to entity relationship modelling in Computer Information Systems 2 and used this as a benchmark for the evaluation of the Kent method. He believed that the theory was easy to understand and quite simple to learn. However, entity relationship concepts had been easier to apply in practice. After having completed the conceptual design assignment it was felt that the method had helped in understanding normalisation. He was confident with the method and would use it for future design problems. The major problem he had encountered was the initial understanding of normalisation theory.

David Liebsman felt that substantial problems had existed in learning the Kent method. He had not appreciated the reasons for the various phases of the method or where it was going, feeling that an overview of the method had been lacking. Despite this, generation of pseudo keys and merging had not been difficult. He believed that his understanding of normalisation was good and that the method had assisted in that respect.

Szue-Shang Chai felt that Kent's method was not difficult to understand but that it had been mostly self learnt, with little recollection from lectures. The distributed six page overview had been very useful and was almost of as much benefit as the full Kent paper. Normalisation concepts were well understood. A major advantage of the method lay with its systematic, step driven approach.

Julian Terry, enrolled as a masters student in Data Management, had not previously been exposed to data modelling. He felt that Kent's method was a 'common sense' approach which allowed the concepts of data modelling to be quickly grasped. The method fitted naturally with relational theory and complemented normalisation concepts. A major attraction was the ability to undertake detailed application analysis which, when completed, produced a complete logical model of the data. In his experience of the design exercises it had not been necessary to have an understanding of entity, attribute, or relationship concepts in order to work with the method. Whilst acknowledging the basic bottom-up orientation of binary modelling, it was possible to use 'high level' facts (deferred representation) as a means of taking a top-down view of the application. These high level facts could subsequently be decomposed after a first pass at an overview level. Problems encountered with the method were in the area of n-ary relations. A clear and systematic approach was not apparent from the documentation. It was felt that some systems experience would have been helpful at this point.

10.6 Conclusion

The Department of Information Systems at the University of New South Wales is a teaching and research body in which major activities are undertaken in the area of database and information systems design. The case study represents a longitudinal analysis of the database curriculum spanning five years in total. It draws on anecdotal material from a variety of students (Masters and undergraduates), a variety of lecturers and tutors, and a selection of projects encompassing both design and implementation phases and several database management system architectures. In such an environment the major opportunity was to examine the issue of communication and data modelling through the metrics of 'representation and communicability' and 'ease of learning'. Naturally some light was cast on the other metrics used in this paper, but these findings should not be over-emphasised nor generalised to other environments due to the atypical nature of the projects (small and simplified) and the purpose of the projects (pedagogic).

During the course of this case study the role of data modelling in information systems design courses

was extended to the point where it represented the foundation of systems and database design.

Accordingly, with a mission to provide students with state of the art design methodologies and techniques, continuous investigation, development and emphasis was placed on this area. In the case study it was seen that students had been exposed to a variety of modelling techniques including ER, NIAM and KENT. The introduction of binary data modelling, as represented by NIAM and KENT, was however the major event marking the increased importance (and success) of data modelling within the teaching program.

An important achievement observed during the course of the case study was the realisation (by staff and students) that a data model represented a 'chosen' reality and that it was therefore critical for the data modelling technique to make the design process traceable and all assumptions explicit. This required that support for documentation and support for analyst/user (student/tutor) communication be strong. Accordingly, a graphical basis of representation was seen as an important means of achieving this. Modelling with Kent was seen to have provided communication support through the logical construct of the binary relation but to have been lacking in the model representation domain. This was often not critical due to the project size, but for large complex systems it could be a serious disadvantage. As such, early enthusiasm that Kent was the binary modelling method was replaced by the understanding that a complementary top-down approach might also be beneficial.

As expected the phases supported by KENT were limited to analysis and design of the data model.

What was somewhat unexpected however was the strength of that support in the analysis phase. This was believed to have been directly related to its user orientated nature and support of user/analyst

communications. Modelling discussions in a group environment involving KENT had been much

more lively than those involving ER although some differences could be accounted for due to project

variance and tutorial group variance. Quasi-experimental controls could be applied to investigate this

issue further.

As mentioned, representation and communication were good using the KENT technique. This was

also in line with theoretical projections. What was not present in the case study (as in the original

paper) was a large or complex application against which 'real world' performance could be measured.

Some doubts exist as to whether representation and communication would be satisfactory in these

types of projects particularly with the low level of abstraction support provided in the method. For

small, simple systems the results were very satisfactory.

Documentation of the method was found to be less than satisfactory with students finding logical

holes in the theory. This resulted in extended tutorial discussions in several areas. Documentation

of the design was found to be extremely well supported when students had observed the method

10-18 University of New South Wales procedures. As expected the method was found to be highly user orientated when compared with

the approaches of NIAM or ER and students were observed to pick up the major concepts quickly.

Advantages in this area were offset by the inability to provide more than superficial semantic support.

For this reason students were often encouraged to include textual explanations of their designs.

Quality control measures were not part of the KENT method but were incorporated into the projects via normalisation and through group discussion. The necessity of doing this was anticipated from the

theory.

CHAPTER 11

AUSTRALIAN MUTUAL PROVIDENT

11.1 Objectives

This case study investigates the use and evolution of data modelling in the corporate environment.

Against a background of hardware and software changes it examines the history and development of data modelling, and the forces driving its development. An evaluation is made of the current status of data modelling within the organisation and of the degree of success achieved through its implementation. The major objective of this is to provide feedback for theory development based on the experience of applying binary data modelling to large, complex, corporate projects. Naturally of interest in the broader world of information systems management is observation of the resultant changes in information systems development procedures and metrics. For this case only qualitative research was conducted; however, in future research the measurement of changes in the metrics of quality and productivity (for example) would be of major interest.

Australian Mutual Provident (A.M.P.) was selected as an ideal candidate for these purposes having long

been associated with database technology in financial systems applications. The company represents

a sophisticated user and developer of large commercial information systems, one which is constantly

adapting to utilise new technologies and new methodologies to meet its business objectives.

In order to provide an environmental context the following related issues were also investigated :

• details of the physical environment, i.e. hardware, software, applications and personnel
• history of database usage, including investigation of the strengths and weaknesses of their approach
• major issues or problems associated with database technology in general application, or with its implementation
• trends and future directions in systems technology

11.2 Research Method

Due to the descriptive nature of this research a case study approach has been followed. Data was gathered in interviews via an 'asking' strategy. So as to encourage free comment by A.M.P. staff magnetic media was not used to record these sessions. Despite the obtrusive nature of this approach and the inherent limitations of an asking strategy it is believed that the data gathered represents an accurate description of the environment and of the techniques utilised in the systems analysis function.

Unfortunately, it was not possible to obtain copies of standards, documentation or project material because of a corporate restricted disclosure policy.

The initial contact at A.M.P. was made through Daryl Dobe, the Application Support Services manager (see appendix D). It was anticipated that, after providing an overview of A.M.P. operations, Daryl would be able to identify further contacts in the system development groups. Based on the first interview, and with the aid of a data processing organisational chart, it was possible to arrange the following interviews:

• Brian Donnelly - Manager (Assistant) Systems Engineering

• David Nash - Manager User Support Services

The material gathered in the first of these interviews related primarily to hardware and operations

details as would be expected for a section responsible for capacity planning, systems performance

monitoring, and systems engineering. Due to time constraints and corporate security restrictions it

was not possible to conduct an in-depth analysis of these aspects. Despite this, sufficient information

was collected to place subsequent interviews in the right 'environmental' context.

As the manager responsible for user computing, user support (technical) and data administration, David Nash was able to provide an overview of the systems analysis and data analysis methods employed. Whilst adopting a mostly supervisory role at the first interview, he was able to introduce two systems analysts in the Systems Engineering section from whom much of the detailed material on data modelling was obtained. A follow-up interview was organised with one of these analysts, Mark McMillan.

In the final interview, Mark, in conjunction with David Nash, organised contact with a 'user', a former N.S.W. branch manager seconded to data administration.

Through these interviews a cross section of data modelling from a management, analyst and user perspective was provided. Time and access restrictions unfortunately prevented a wider sample. Selection bias (towards pro data modelling analysts and users) could not be controlled for, although no evidence for the existence of such bias was found. This case study also represents a one-shot study. Only one project employing binary data modelling was made available for review. With time, however, new projects will be completed, thereby offering the possibility of a multi-case longitudinal analysis. As a consequence of these restrictions it is appropriate to regard the nature of this case study as essentially explorative.

11.3 Environment

11.3.1 Hardware

As far as Australian organisations are concerned A.M.P. has a long history of computer involvement extending back to the 1960s. In the overview provided by Daryl Dobe it was apparent that, apart from a period in the mid to late seventies when UNIVAC equipment was used, A.M.P. had been an IBM 'shop'. IBM's 360 series mainframes were used from the late 1960s until the early 1970s. A UNIVAC system was in place until 1979 when, after what was described as a disaster due to software and hardware unreliability, IBM again won the hardware tender. As with many large corporations with a history of early computer involvement, A.M.P. had, and continues to work with, a centralised data processing system. Current thinking is tending towards the use of distributed 'data centres', but with a centralised structure prevailing overall. In the words of Brian Donnelly, it was not seen to be 'economically viable' to move into a distributed processing, networked environment, at least in the short to medium term.

Based on a corporate strategy which recognised the strategic role information systems could play in lifting corporate performance, A.M.P. began in 1979 to invest heavily in the provision of computer resources. This expansion was accompanied by an increase in workload (measured by transaction throughput) that averaged 45% per annum over a seven year period. This rate of growth, now from a much higher base, was still believed to be in the region of 25% per annum. Over the same period online storage utilisation had grown from 6-7 gigabytes in 1979 to an impressive 215 gigabytes in 1986. A significant portion of this growth was attributed to IMS based systems. Not surprisingly, this growth had generated a demand for CPU resources which far exceeded the performance improvements of a single mainframe. The response was to move to a multi-mainframe configuration represented by three of IBM's largest machines.

In an environment undergoing such rapid growth, capacity planning has become an essential activity.

At A.M.P. this is performed by the Information Systems Strategic Planning Group who undertake a system and user review to establish future requirements. Almost 250 user applications are reviewed annually to provide estimates of terminal usage, IMS and TSO transaction frequency and batch versus interactive usage. Amalgamated, these form the basis of a two and a half year projection of resource requirements.

11.3.2 Software History

Involvement in database management systems software began with the UNIVAC machine in the early seventies, when DMS-1100, a CODASYL network database, was introduced. A combination of factors (unreliable hardware, unreliable software and poor design knowledge) made this experience a disaster. Due to the high priority attached to application efficiency the 'database' design consisted of a single record. This had inevitably produced maintenance problems and forced extensive application rewrites whenever the data structure changed.

Accompanying the return to the IBM hardware world in 1979 was the hierarchical DBMS package IMS.

This provided the opportunity for a redesign of the database. Anxious to avoid a repeat of the data modelling problems that had been experienced with DMS-1100, entity relationship (ER) modelling and normalisation theory were specified as mandatory design techniques.

As part of the push towards end-user computing, RAMIS, a fourth generation programming language/tool, was introduced in 1982. The objective was to utilise it within the Information Systems group to enhance application development, and outside the group as a user tool to assist in reducing the IS project backlog.

Productivity advantages were realised, to the extent that the package investment was repaid within nine months. Usage as an end-user tool was limited however because of concurrency and integrity problems which were experienced with RAMIS.

11.3.3 Software Current

The three IBM mainframes run under the MVS XA operating system. TSO and VTAM are used to support online processing and RACF is used to manage system security. Transaction volumes related to IMS are of the order of 105,000-135,000 per day, and in the vicinity of 500,000 per day for TSO transactions. It was estimated that up to 60% of the TSO transactions reflected applications under development.

Using the Service Level Reporter facility, throughput is monitored in an endeavour to meet service level objectives relating to availability and response time. Stated policy was to service a TSO transaction in less than 0.25 seconds and to process a 'simple' transaction in less than 4 seconds for Australia, and in less than 5 seconds for New Zealand. It was believed that these objectives were being achieved at least 90% of the time; however, it was readily acknowledged that the definition of a 'simple' transaction left considerable room for manipulation of the statistics.

In an analysis of information systems and strategic business objectives (functionally the responsibility of the long range planning group) it was established that A.M.P. would need the ability to integrate (at a systems level) functional business areas. Life Insurance and Fire and General Insurance, for example, had developed as separate business areas, and whilst both used IMS based applications they maintained independent and isolated databases. A query such as "What types of insurance does customer X have with us?" could not be answered without initiating a separate enquiry on each functional business unit (database). This had not been regarded as a disadvantage until it was established (through strategic planning) that the insurance industry would move towards client based insurance packaging as opposed to product based packaging. In order to provide client 'bundling' of insurance it was then evident that a restructuring and integration of the underlying information systems would be necessary. Unfortunately, with IMS databases this post-hoc integration presented significant technical challenges. When this was combined with the minimal flexibility expected from such integration (and the rapid changes in insurance marketing and consequently the information systems) it was determined that another alternative was needed. Relational database was seen as a solution.

Largely in response to those integration requirements, IBM's relational package DB2 was placed under implementation review. Results from this review showed that DB2 would become a critical systems tool in the development of strategic information systems. Expectations were that most new applications would be implemented under it, with the exception of time-critical transaction processing systems. Where they exceeded DB2 performance limits such applications would continue to be implemented under IMS. [Due to the absence of conversion plans for current IMS applications the role of IMS was anticipated to remain dominant in the short to medium term.]

With respect to the performance impact, it was anticipated that a DB2 implementation would require 25% more CPU cycles than an equivalent IMS implementation. [This figure corresponds to that quoted by IBM for release two of the package.] The impact on online disk storage had not been quantified but was also expected to require a substantial increase in resources. Concern was expressed over the availability of DB2 query functionality at an end-user level because it was believed that the related throughput impact would be immense unless strict controls were enforced.

11.4 Data Modelling

Parallel to developments in the hardware and software environment were changes in the systems analysis and data modelling methods employed. [Causation appears to flow from the introduction of new systems software to new analysis procedures.] At the time of the arrival of the UNIVAC machine in the early seventies, the concept of data modelling, at least at A.M.P. if not universally in commercial installations, was non-existent. Design of file structures was a 'black art' in which the 'expertise' of the analyst was the critical factor in successful systems design. Furthermore, machine efficiency rather than design flexibility was an overriding concern. This situation prevailed throughout the life of the UNIVAC despite the (troubled) introduction of a network DBMS. Data analysis was only recognised as a standard activity upon the return to an IBM environment and the installation of IMS.

In vogue at this time was the Entity Relationship (ER) modelling technique, which was adopted as a standard for the systems analysis and design phases. With this technique analysts, utilising intuition and 'observation', would select entity categories. From here a normalisation process (Codd) was applied to produce a logical design. Typical comments on the method (with the benefit of hindsight) were that it relied too heavily on staff expertise. This had led to 'less than optimal' results in which design errors would often only be identified after implementation. The consequences were inflexible systems and unsatisfied users. A factor believed to have played a large part in the design problems was the gap which existed between users, who understood the application (and the data with its implied semantic relationships) but little of the technical detail, and the analyst, who understood the technical aspects but little of the users' business knowledge.
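To make the normalisation step concrete, a minimal sketch follows in Python; the policy and agent attributes are invented for the example and are not drawn from A.M.P. project material. It shows a transitive dependency (agent details depending on the agent number rather than on the policy key) being projected out to reach third normal form.

    # Hypothetical un-normalised 'policy' relation: the agent attributes repeat
    # on every row because agent_name and agent_branch depend on agent_no,
    # not on the key policy_no (a transitive dependency).
    policies = [
        {"policy_no": "P100", "holder": "Smith", "agent_no": "A7",
         "agent_name": "Jones", "agent_branch": "Sydney"},
        {"policy_no": "P101", "holder": "Brown", "agent_no": "A7",
         "agent_name": "Jones", "agent_branch": "Sydney"},
    ]

    # Third normal form: project the transitive dependency into its own relation.
    policy = [{k: row[k] for k in ("policy_no", "holder", "agent_no")}
              for row in policies]
    agent = {row["agent_no"]: {"agent_name": row["agent_name"],
                               "agent_branch": row["agent_branch"]}
             for row in policies}

    print(policy)   # POLICY(policy_no, holder, agent_no)
    print(agent)    # AGENT(agent_no, agent_name, agent_branch)

In practice the projection would of course be applied to the full set of functional dependencies identified during analysis, not to the single dependency shown here.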

As such the problems experienced at this time were explained as ones of communication and not due to fundamental flaws in the data modelling methods chosen. Applied rigorously, these techniques, forming the basis of top-down modelling, should result in the same data model as a bottom-up approach as represented by binary modelling. This requires the important assumption, however, that designer understanding of the application will not be significantly influenced through choice of these alternative modelling approaches. [This is equivalent to saying that the fact base should be the same to produce logically comparable models.] However, there is reason to doubt that this assumption holds in practice because binary modelling and analysis, as employed through NIAM, facilitates communication in a way not provided by ER modelling and analysis.
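As an illustration of the kind of fact base binary modelling works from, the following hedged sketch (the object types, identifiers and role names are invented and are not taken from A.M.P. documentation) records elementary facts as simple sentences relating two objects through a pair of roles and verbalises them back in the form a user could review.

    # Each elementary fact relates two objects through a named pair of roles and
    # can be read back to a user as a plain sentence in either direction.
    facts = [
        ("Customer",  "C123", "holds / is held by",        "Policy",     "P100"),
        ("Policy",    "P100", "is of type / classifies",   "PolicyType", "Life"),
        ("Policy",    "P100", "is serviced by / services", "Agent",      "A7"),
    ]

    for subject_type, subject, roles, object_type, obj in facts:
        forward_role = roles.split(" / ")[0]
        print(f"{subject_type} {subject} {forward_role} {object_type} {obj}")

It is this verbalisation of one fact at a time, rather than a diagram of complete record structures, that is associated in the interviews reported later in the chapter with closing the communication gap.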

ER modelling was used as the major modelling tool until December 1984 when NIAM became the mandatory data modelling standard. Unlike the introduction of ER modelling, NIAM preceded a major change in systems software: the arrival of the relational database DB2. This can probably be attributed to the ad hoc introduction of NIAM, brought about through the efforts of a contract analyst who had previously worked with Professor Nijssen (the developer of NIAM). A formal search for a new modelling method was never initiated; nevertheless, NIAM concepts rapidly found acceptance among Information Systems management and a pilot project was initiated to test it. Based on the success of this pilot, a major application was commenced and the systems analysis phase subsequently restructured to incorporate NIAM.

Currently A.M.P. data modelling procedures describe 14 discrete steps in the NIAM method which guide an analyst through data analysis and design. Commencing with a base set of facts describing the information system, an induction process is followed, resulting in a 'syntactically and semantically expressive' conceptual schema. The major parts of the schema are presented in graphical form and are reviewed by users during the design phases. System documentation includes the conceptual grammar, which provides a formal description of the application, and the schematic diagrams. The following section describes at which points NIAM has been integrated into the systems lifecycle.

11.5 Systems Lifecycle

The systems lifecycle at A.M.P. commences with business systems planning (BSP) conducted by the strategic planning group. Ultimately this group is concerned with forecasting future business directions and establishing the role data processing technology will play. The process begins with a review of business units (up to a five year projection), noting future requirements and potential products and applications. A generic, or macro, data model is then prepared utilising an ER modelling technique. Based on this portfolio of potential applications, feasibility studies are conducted. Contingent on the results, and subject to user approval and resource availability, structured analysis then commences. SDM-70, a lifecycle development package, is used to prepare systems requirements documentation (SRD). The SRD incorporates data flow diagrams and utilises a data dictionary. It is at this stage that data analysis and data modelling commence, resulting in the development of a full logical design. User involvement in the modelling phases is mandatory.

In the next phase, Systems Design Alternatives (SDA), physical design issues are considered. When the desired alternative is generated, System External Specifications (SES), corresponding to the physical database design, and Systems Internal Specifications (SIS), corresponding to the programming phase, are prepared. The development process is completed with the implementation specification.

A diagram of the database design process which corresponds to a subset of SES is reproduced in

appendix E. Binary data analysis (NIAM) is represented by the phase 'Application Data Analysis'.

Input for this phase is the generic entity relationship data model of the organisation and the business information structure detail. The business information structure detail has in turn been derived from the Information Systems Architecture model. It comprises an analysis of the application in terms of batch, online and cyclic transactions. As well as being used for logical data modelling, the structure

detail is used to map transaction type usage statistics onto the physical model to determine access

strategies and keys.

As a result of the binary modelling process, an Application Data Model in relational form is produced.

This is then rationalised to determine what will be implemented in conformity with the project scope.

A system or Implementation Data Model is thereby produced. When transaction statistics are mapped

to it, a physical data model results. It is at this point that the actual design process departs from that

depicted in the diagram. Since the introduction of binary modelling, the need to conduct a second rationalisation has been obviated. The use of the relational DBMS DB2 has made the physical data model directly implementable. The procedure as drawn remains current for IMS implementations.
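The grouping step from elementary facts to the relational Application Data Model can be sketched as follows. This is a simplified illustration under invented fact types and a single grouping rule; NIAM's actual procedure is more elaborate. Facts that are functionally dependent on the same object type are collected into one relation, while one-to-many facts form relations of their own.

    # Simplified grouping rule: a fact that is unique per object type
    # (many object occurrences relate to one related occurrence) joins that
    # object's relation; a one-to-many fact becomes a separate relation.
    fact_types = [
        # (object_type, role, related_type, cardinality towards object_type)
        ("Policy", "has_holder",   "Customer",   "many_to_one"),
        ("Policy", "has_type",     "PolicyType", "many_to_one"),
        ("Policy", "has_coverage", "Coverage",   "one_to_many"),
    ]

    tables = {}
    for obj, role, related, cardinality in fact_types:
        if cardinality == "many_to_one":
            tables.setdefault(obj, [obj.lower() + "_id"]).append(role)
        else:
            # the 'many' side forms its own relation carrying a reference back
            tables.setdefault(related, [related.lower() + "_id"]).append(obj.lower() + "_id")

    for name, columns in tables.items():
        print(name.upper(), columns)
    # POLICY   ['policy_id', 'has_holder', 'has_type']
    # COVERAGE ['coverage_id', 'policy_id']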

11.6 Data Modelling Experiences

In the interview with David Nash and two staff analysts an attempt was made to identify the advantages

and disadvantages of binary data modelling. In the latter respect there was little success. Without

reservation, it was felt that binary modelling had been implemented smoothly and had been a success.

There was not seen to be a limit to its use (taken to mean that project size was not a restricting factor)

although it was conceded that it would not replace the role of ER modelling at a corporate level (i.e. derivation of a generic data model and strategic modelling). Interestingly, the major advantage was seen to be that binary modelling 'de-mystified' analysis.

Whilst no direct evidence (in the form of project records) could be found to support the statement, it was thought that the surge in user involvement had been directly related to the introduction of the simplified data analysis process (NIAM). Analysts were viewed as using a technique to which users could relate with minimal training. Due to the enhanced communication between these groups, the gap in problem comprehension was effectively being closed.

The analysts were of the opinion that the technique was much more rigorous in producing a systems

model than had been possible with 'conventional' techniques. It allowed for early problem diagnosis

and forced evaluation of the 'conventional wisdom' or assumptions. When these assumptions were

carefully considered it had often necessitated management (functional) involvement. This ultimately

lead to tighter specification of user requirements. A significant benefit was seen to be the forward

planning this had forced on management (functional). In addition, owing to the self-documenting

nature of the design process the quality and integrity of the documentation had improved.

On the basis of the major application analysis results, it was felt that users had contributed up to 50% of the analysis effort and that this could be increased. Countering this effect on DP workload had been an increase in the effort associated with analysis, perhaps of the order of 100%. It was felt strongly, however, that the real rewards would accrue during implementation, in user satisfaction, reduced maintenance and an extended system life.

In summary, data modelling had achieved:

• a shift in development responsibility to user departments

• a shift in the analysis workload from DP to users (although possibly in percentage terms only)

• an increase in management involvement and thereby improved management planning

• improved documentation quality

• extended system life and reduced lifecycle costs

11.6.1 User experiences

Having established management and analyst perspectives on binary modelling, it remained to confirm with users how well binary modelling had been accepted and what their experiences of its use had been. For this purpose, an interview was organised by David Nash with 'Steve', an experienced user who had played a key role in the development of a new 'agents commission' system. This was the first large project in which NIAM had been used.

Steve had been seconded to head office from the position of the Departmental Head of the N.S.W.

Commission Branch in October 1983. He was assigned the task of writing a user manual for the then

current commission system. At this time Steve had no prior development experience but had been

selected for the task on the basis of procedural familiarity with the system. After having accomplished this task in two months, he was offered the position of user representative on the new commission

system design team.

The agent commission project began with 6 staff. In the final development phases 72 people were

involved. It was anticipated that the system would have a strategic business impact and consequently

the budget was set at a lofty $10 million. The scope was initially very wide: 'build a replacement and automate new areas of related business'. The result of this was a business model with 'too much'

data. Subsequently, the design was cut back for implementation. It was speculated that this had been

a deliberate strategy.

In May 1984 project managers became involved and the system was split into four sub-systems, with a group of technical and user staff associated with each. This initial analysis phase was conducted

with a group of two technical staff and two users. Steve remained firmly in the user camp assisting with

such tasks as report and screen design. The analysis effort was described as '12 months of continuous

meetings', most of which were conducted as brainstorming sessions. Minutes of the meetings were

logged by the technical staff.

At the commencement of the analysis phase a two-day training session was conducted for analysts and users unfamiliar with binary modelling. Steve unfortunately missed this initial training and 'went in cold' to the first analysis meetings. As a result it took a while before he felt familiar with data modelling concepts. With usage, he found the technique easy to master and a valuable aid in systems documentation (representation) and user-analyst communications. Steve was unable to comment on the merits of binary modelling relative to other approaches because of his limited exposure to systems development; however, the concept had been 'easy to grasp'. Whilst not demonstrating the same enthusiasm as the analysts, he seemed content with the technique and believed that good results had been achieved:

'Without it the results would not have been as good'. 'We would not have identified the key areas as

early.'

Commenting on the project as a whole he expressed discontent over the power problems of group

interaction. He believed that a group of no more than 5 should be used, 4 being optimal. In his view this

would be best formed by 3 user representatives and a systems analyst.

11.7 Conclusion

A.M.P. represents a mature organisation in terms of its data processing history and current state.

Its experience with database management systems dates to the mid-seventies and it has continued to

remain abreast of DBMS technology. In conjunction with expansionary policies in hardware acquisition and in productivity-based software tools, A.M.P. has focused attention on the data modelling and

requirements analysis issues. From a 'back door' introduction, binary data modelling through NIAM

was adopted as a standard for the systems analysis phase.

As a response to marketing pressures which demanded integration of the business, A.M.P. embraced relational database technology. This further reinforced the trend towards a 'data focus' in the development of information systems and, naturally, on the data modelling task itself.

In the case study the evolution and introduction of this method from a management, analyst and user perspective were described. Considerable support was found in each of these groups for the concept and practice of data modelling. Systems analysts found NIAM to be a clear and consistent (de-mystified) method for use in the analysis and requirements specification phases. These phases

the distinction between phases blurs somewhat. Nevertheless, NIAM's claim as a method which

supports the first three phases of the information systems lifecycle was clearly supported.

Enhanced user-analyst communication, with the ability to fully involve users, and good systems representation tools were seen as fundamental to the success of NIAM. These advantages are in line with

predictions from the theory which touts graphical notation and the simplicity of the binary construct.

Users verified the ease of learning and contrasted their extensive involvement in the development process after NIAM's introduction to their minimal involvement under the traditional analysis approach

of which ER modelling had been a part.

From the belief (expressed by analysts and systems management) that NIAM specifications were better developed and would result in reduced maintenance and lifecycle costs, indirect support was found for system quality, as measured through design convergence, consistency and completeness (section 9.1.6). Quantifying gains was beyond the scope of this limited study; however, it would appear that a

measurable impact on productivity and lifecycle costs could be found, warranting closer examination

in future research.

Abstraction support and semantic expressiveness (strong theoretical advantages of NIAM) were not

mentioned by the analysts. If present, these advantages would be expected to become dominant with

growing systems complexity. A multi-project case study in this environment (with varying complexity)

would be necessary before conclusions could be drawn on this aspect.

CHAPTER 12

DIGITAL EQUIPMENT CORPORATION

12.1 Introduction

The third case study examines the development and application of data modelling in a high technology manufacturing organisation. The company, Digital Equipment Corporation (International) Kaufbeuren, located in the 'German Silicon Valley' near Munich, is a subsidiary of the multinational of the same name headquartered in Maynard, Massachusetts, U.S.A.

Digital is representative of 'leading edge' high technology corporations. Whilst predominantly a manufacturer of computer hardware systems, it also develops advanced systems and applications software

in support of its business operations and objectives. As a consequence, Digital has evolved into a

sophisticated user of software development methodologies, a trend which is certain to continue as

increasing resources are directed at this segment of operations.

There are two major objectives which flow from this. The first is to examine the evolution of systems analysis and data modelling techniques at a corporate level and the second is to examine the success of these changes (at a local site level). The expectation is that Digital, through definition of its (information systems) business requirements, will become an influential body in data modelling

theory development. This is because the feedback obtained in the process is used for the ongoing

development of the Digital systems lifecycle methodology and ultimately this exerts an influence on

theory development.

The following specific issues were investigated:

• the corporate and local business domains

• history of systems and data analysis including their evolution and the reasons underlying the

changes (at a corporate level)

• description of the systems development process and data modelling phase as a methodology

• applications of data modelling (to Kaufbeuren projects)

12.2 Corporate Environment

Digital Equipment Corporation lays claim to being the world's leading supplier of networked computer systems. It has operations in 24 countries and a workforce of 112,000. Annual sales as of April 1988 amounted to $11.3 billion and were growing at a rate of 20% per annum. Market competition is intense as evidenced by the number of new product announcements. Industry price/performance ratios, however measured, are under constant surveillance and constant pressure.

In this environment, survival and growth require the ability to research, develop and apply technology rapidly. As the corporation has grown and the computer market matured, this requirement has come to embrace software equally as much as hardware. Significantly, the strategic 'system' advantage claimed by Digital is based largely on interconnectivity. This derives from the hardware and operating system architectures and related systems software.

Recognition of the importance of systems and application software, including languages, development tools, database products and third party applications, has led to greatly increased effort and resource allocation in both the procedural and technical domains of software development. The objectives for internal, but especially external, applications of software have been to provide:

• Fast response to market trends (proactive) and specific customer requirements (reactive)

• Improved reliability of products

• Reduced development and maintenance costs

• Evolutionary approach to software products embracing a 'release' concept

• Enhanced communication capabilities between applications through electronic data interchange

In line with these objectives in the software domain, Digital is continually examining techniques and methodologies through which they can be realised. Software quality and development productivity have been targeted for improvement through the following measures:

• provision of automated tools in support of requirements definition, data analysis and data modelling (the tools represent a combination of internally developed and externally contracted products depending on availability and strategic requirements)

• provision of research and development funding for technical and architectural design of distributed information systems

• provision of a wide variety of tools for data management from relational database products to

fourth generation languages

• provision of extensive internal training on data analysis, data modelling and systems design

stressing the concept of data independence

• support of the Computer Aided Systems Engineering (CASE) project

12.3 Local Environment

Digital Equipment Kaufbeuren was established as Digital's first manufacturing site on the European continent in 1977. Its charter is volume production of high-end mass storage products to supply

European demand and to act as a second source for the United States and Group International markets.

Kaufbeuren has a sister plant located in Colorado Springs, Colorado, with which joint projects in storage technology are undertaken.

With a mission to be the 'European Storage Centre of Excellence', the original manufacturing operations have been supplemented by the formation of process and product engineering departments. These were introduced to provide incremental storage systems engineering capability, to improve quality, and to assist field service operations. Since the formation of these departments, the percentage of the

Kaufbeuren workforce employed in engineering functions has risen to 25%, representing 200 people.

The manufacturing processes consist of a combination of precision, yield-sensitive assembly operations conducted in a clean room environment (Head Disk Assembly) plus circuit configuration, electronic testing and end product configuration. Workflow (completion) data and process test data are collected at all stages of the manufacturing process. The data is used to support scrap/rework decisions, failure diagnosis and process engineering. Scrap/rework and failure diagnosis at manual workstations are supported by 'expert systems' programmed in the language OPS-5.

Data collection systems and process control systems are currently being integrated under a corporate-sponsored project, Computer Integrated Manufacturing (CIM). Some of the systems implemented in

Kaufbeuren under this project include:

• TDC - Test Data Collection

• CAPS - Computer Aided Process Support

• ASRS - Automated Storage and Retrieval System

• MAXCIM - integrated financial, inventory control and manufacturing planning package

With the exception of MAXCIM, which is an external product maintained and enhanced by Kaufbeuren (source code supplied), all systems have been designed and developed by engineering and information systems groups within Digital. TDC was developed by Colorado Springs. CAPS is a joint Kaufbeuren/Colorado Springs project and ASRS is exclusively a Kaufbeuren project. All of these systems have, or will have in the near future, a transaction interface available to MAXCIM under a recently commissioned Electronic Data Interchange (EDI) project. A key objective of these projects has not only been to improve productivity and control of manufacturing operations but to demonstrate to customers the application of Digital systems to the manufacturing environment.

12.4 Methodology review

Systems life cycle methodologies were subject to review at the corporate level by the Digital Information Systems group in 1984. At this time it was found that the existing system life cycle methodology was obsolete because it failed to provide adequate support in the following areas:

• technical architecture (representation thereof)

• data management

• data modelling

• prototyping

The Digital Standards Group was subsequently asked to develop a requirements specification against which external or internally developed life cycle methodologies could be evaluated. A systems life cycle review team was then formed with representatives of the Technical Management Committee and Data Management Committee.

During the course of the project twelve system life cycle packages were reviewed, of which four were selected for presentation to the review group. From this process the life cycle package from DMR Group Inc. was selected for field testing. The field tests involved one new application development and two replacements of existing applications developed for older hardware. The feedback from all three field tests suggested that DMR's life cycle was beneficial. Within Digital the methodology was then recommended for development use on all new projects. Rights to the package were purchased and a commitment made for the provision of training, documentation and support worldwide. In the following section an overview of the methodology is presented.

12.5 Systems Analysis

The DMR methodology is a systems development package which incorporates structured techniques with integrated data and process modelling phases. At the macro level, the methodology is not unlike

the traditional systems lifecycle model as defined by Wasserman (section 3.6). In documentation

and training a heavy emphasis is placed on 'information' engineering concepts to ensure that system

development is data orientated. The methodology explicitly supports three development approaches:

• Traditional development

• Prototyping

• Package selection

After an initial project evaluation is complete, one of these methods is selected for development, although combinations, for example of traditional and prototyping approaches, are possible. A different

set of tasks exists for each approach, but all have six phases in common (see figure 4). The primary

concepts are:

• Structured decomposition through a hierarchy of data and process models

The approach adopted by the DMR methodology to systems development (and reflected in the lifecycle phases) conforms to the 'conventional' approach of top-down development. Decomposition is extensively used. The ISO reference model was used as a base when developing the methodology and, reflecting this, strong support has been provided for the conceptual, functional and physical levels. [The specific concepts and techniques employed are treated in depth in the following section.]

• Release orientated development

This is based on the principle of developing a system architecture and then partitioning the system functionality into releases which are progressively developed. Each release is a functioning application. This reflects a management (and marketing) strategy behind large software projects.

Release orientated development also provides implicit support for prototyping.

• Project management by deliverables

By emphasising 'deliverables' the methodology focuses on the end products of a team's effort rather than the process by which it is accomplished. The methodology represents a generic approach to systems development which makes it applicable to a wide variety of projects. Consequently, specific techniques, methods and tools, when mentioned, are not tightly coupled to the methodology. This has allowed Digital to continue to use proprietary tools and techniques and to upgrade or introduce them as they are developed, and as needed. The possibility then exists to meet highly variable development requirements whilst enforcing uniform development and control concepts. [Tools for representation of the technical architecture are supported in this way.]

12.6 Modelling and Partitioning

Information systems, in the DMR methodology, are regarded as a composition of structural and

procedural elements. Accordingly, two types of models are employed to analyse and define them:

• models of data and their interrelationships

• models of processes and their interrelationships

Figure 4: DMR Systems Lifecycle

1. Opportunity Evaluation
   - define the problem
   - evaluate the appropriateness of a preliminary analysis
   - prepare a project proposal (if appropriate)
2. Preliminary Analysis
   - analyse current system
   - define system context and objectives
   - build the conceptual data model
   - build the conceptual process model
   - establish basic system concepts
   - describe external design alternatives
   - translate selected alternatives into basic systems concepts
   - build the functional process model
   - determine technical feasibility
   - perform cost/benefit analysis
3. Systems Architecture
   - complete conceptual data model
   - refine functional process model
   - define system performance criteria
   - define environment, technical standards and data processing
   - outline physical process model
4. Functional Design
   - develop implementation plan
   - build the functional data model
   - detail the functional data model
   - detail the functional process model
5. Systems Construction
   - build physical data and process models
   - prepare test environment
   - conduct functional tests
6. Implementation
   - install system
   - conduct systems tests
   - start production
   - evaluate system

Management of complexity is handled through partitioning these models and providing the conceptual, functional and physical hierarchy as reflected in the systems lifecycle.

DMR provides for processes and data to be modelled and partitioned in different ways. Processes are defined and grouped according to their objectives. Their purpose is to perform functions or to transform data. In order to minimise overall process complexity they are partitioned or decomposed into increasingly elementary functions with (an objective of) minimal interaction.

Data are defined and grouped according to their subject or meaning. Data complexity is minimised when an object or event of interest is unambiguously defined and when a minimum of data is required

to interpret it and access it. A hierarchy of data models can be built by aggregating or adding subjects at increasing levels of detail.

With processes and data being modelled by two different techniques with different structures, the boundary between the models is made distinct. Nevertheless, the models are interdependent. Each process manipulates a certain set of data and has a certain view of the objects which the data represents. Conversely, each data element is used by a variety of processes. The data must fit the view of each process and the processes must treat each data element consistently. This implies that at each stage of modelling synchronisation is required. This is a function of the system architecture.
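One simple way of picturing this synchronisation is a process/data usage matrix that can be checked mechanically for gaps on either side. The sketch below uses invented process and entity names; the cross-reference device itself is a common practice rather than something prescribed by the DMR documentation.

    # Process/data usage matrix: which entities each process creates, reads or
    # updates. Checking it mechanically exposes entities no process ever creates
    # and entities a process uses that are missing from the data model.
    usage = {
        "Record goods receipt": {"StockItem": "update", "Receipt": "create"},
        "Issue to production":  {"StockItem": "update", "Issue": "create"},
        "Stocktake report":     {"StockItem": "read"},
    }
    entities = {"StockItem", "Receipt", "Issue"}

    created = {e for actions in usage.values()
               for e, action in actions.items() if action == "create"}
    unknown = {e for actions in usage.values() for e in actions} - entities

    print("entities never created:", entities - created)   # flags StockItem
    print("entities outside the data model:", unknown)     # empty here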

12.6.1 Conceptual Modelling

At the conceptual level, DMR defines the processes to be carried out and the interpretation of the data.

The conceptual level reflects management strategies for operating the business independently of the way the system will function or the equipment on which it will run. Entity-relationship diagrams (as

described by Chen, 1976) are used to provide a graphic representation of the conceptual data model.

In DMR the conceptual data model is defined as:

"a representation of the objects or entities about which an information system collects, stores or

produces data; of the associations or relationships occurring among entities when the system causes

or responds to an event, and of the attributes of those entities and relationships"

The DMR/ER technique utilises stepwise decomposition. Firstly, a macro model of the system is

developed showing the relationship to other systems and major entities at the organisational level.

This is the context data model. Entities of immediate interest to the developing system are then

grouped into subject data bases (domains). The subject data bases so formed may then be modelled

in more detailed entity-relationship diagrams or with binary modelling tools.
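As a hedged illustration of what such a decomposition might contain (the entities, relationships and the 'inventory' domain are invented for the example and are not taken from Digital's models), a context-level model can be held as little more than entity, relationship and subject data base declarations.

    # A context-level model held as entity, relationship and domain declarations.
    entities = {
        "Supplier":  ["supplier_no", "name"],
        "StockItem": ["part_no", "description"],
        "WorkOrder": ["order_no", "due_date"],
    }
    relationships = [
        # (entity, relationship, entity, degree)
        ("Supplier",  "supplies", "StockItem", "many-to-many"),
        ("WorkOrder", "consumes", "StockItem", "many-to-many"),
    ]

    # A subject data base (domain) groups the entities of immediate interest to
    # the developing system; detailed ER or binary modelling then works per domain.
    subject_databases = {"inventory": ["StockItem", "WorkOrder"]}

    for name, members in subject_databases.items():
        internal = [r for r in relationships if r[0] in members and r[2] in members]
        print(name, "->", members, "with", len(internal), "internal relationship(s)")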

The conceptual process model is represented graphically by data flow diagrams (as described by Yourdon). It contains only logical level detail; that is, the model depicts only relationships between data and processes, independent of the methods or tools employed to transfer data or execute processes. As with the conceptual data model, the process model is structured as a hierarchy. A context process model is first developed which is then decomposed into subsystems and functions.

DMR defines the conceptual process model as:

"a representation of the data flows describing situations or events to which the system responds, of the functions or processes that are stimulated by the data flows and produce the system response, of the external entities of the system's environment acting as sources or sinks of data flows and of data stores holding the data the system needs in order to respond to events"

Both definitions are consistent with the ISO definitions.

12.6.2 Functional modelling

The functional model describes the behaviour of processes, their interaction with each other and the paths they use to access the data. The functional model is also a representation of the way the system will interact with the environment. Technical details are user transparent. The functional model

equates to the external model as defined by ANSI/SPARC.

The functional process model is based on the conceptual model. Using an iterative partitioning process

(described at the conceptual level) the functional model adds detail through inclusion of organisational

and geographical structure, work methods, automation guidelines and implementation strategy. Data

flow diagrams are supplemented by narrative descriptions of the process logic.

The functional data model includes record, data element and access path detail. It is the conceptual

data model enhanced by access path information, the limitations of the DBMS available, automation

guidelines and efficiency considerations. The functional model, as defined by DMR, is consequently navigational when non-relational data base management systems are used. Optimisation is based on qualitative and quantitative factors: qualitative in the sense of considering geographical distribution, recovery and the required level of data independence, and quantitative in the sense of transaction volumes and storage considerations.

Digital Equipment Corporation 12-9 The functional data model is represented by data structure diagrams which show a record as a rectan­ gle, and a link as an arrow. The links between two record types specify the maximum and minimum number of occurences which can be associated with the binary relationship. These maximum and minimum occurences are used to guide the process of record formation in a fashion similar to that of

NIAM.
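A hedged sketch of this guidance follows; the record names, occurrence figures and the single placement rule are simplifications invented for illustration rather than DMR's actual procedure. Each link records the (minimum, maximum) occurrences each record type sees of its partner, and the side that sees at most one partner carries the linking data element.

    # Each link records the (minimum, maximum) occurrences each record type sees
    # of its partner; the side that sees at most one partner carries the linking
    # data element (the 'many' side of the relationship).
    links = [
        {"Order":     (1, "N"),   # one Order is linked to 1..N OrderLines
         "OrderLine": (1, 1)},    # one OrderLine is linked to exactly one Order
        {"StockItem": (0, "N"),   # one StockItem appears on 0..N OrderLines
         "OrderLine": (1, 1)},    # one OrderLine refers to exactly one StockItem
    ]

    for link in links:
        (rec_a, occ_a), (rec_b, occ_b) = link.items()
        if occ_a[1] == 1 and occ_b[1] != 1:
            print(f"{rec_a} carries a reference to {rec_b}")
        elif occ_b[1] == 1 and occ_a[1] != 1:
            print(f"{rec_b} carries a reference to {rec_a}")
        else:
            print(f"{rec_a} / {rec_b}: resolve with an intersection record")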

12.6.3 Physical modelling

The physical model used by DMR describes the internal processes and data structures used to build the system. It represents the technical organisation of the system and corresponds to the ANSI/SPARC internal level. Data and process models are represented by the record layouts, environment parameters, and program structure charts respectively. The detail required at the physical level varies depending on the implementation environment. For a system developed in a relational database environment with a high-level programming language, the required detail is significantly reduced in comparison to an environment with a network database and a low-level language.

12.7 An inventory application

Business planning conducted at a corporate level in the early 1980s identified the need for steady

but significant cuts in inventory levels through all stages of the manufacturing process, reductions in

product cycle times (from date of receiving a customer order to date of shipment) and a commitment to

principles of Just in Time (JIT) and Total Quality Control (TQC). These were seen as critical responses

to intensifying market competition.

One aspect of the response by Kaufbeuren was to investigate means of improved inventory control

through an Automated Storage and Retrieval System (ASRS). This required the installation of high-bay storage units for location- and lot-controlled pallet storage of component parts and work in process. The objective was to reduce storage space requirements to 25% of the former level and to provide greater inventory visibility and hence inventory control. Furthermore, when linked to an Automated Guided Vehicle (AGV) material movement system, it would allow fully automated material flow on the

production floor.

The system architecture was designed by the MIS group in Kaufbeuren. Flexibility and modularity were two important criteria, as the system would be required to interface with present business planning and control systems (MAXCIM) and with future material transport systems. A macro model of material flow was prepared by a cross-functional team representing material planning, process engineering (layout), advanced manufacturing technology (physical material flow) and management information systems. Conceptual data and process models evolved from these group meetings over a 12-month period. Entity relationship modelling, binary data modelling and data flow diagrams were used as documentation and group communication tools.

Parallel to the conceptual planning work, a sub-committee was formed to address the priority requirements of the ASRS system. Using the inventory partition of the evolving conceptual model, the functional specifications were developed, consisting of four major components: a MAXCIM interface, an ASRS control module, an AGV interface and the underlying hardware module at the physical level.

Physical design and development of the interfaces was undertaken by the MIS group in Kaufbeuren whilst the ASRS control module was developed, based on Kaufbeuren functional specifications, by

external contractors. Data analysis and database design phases were conducted for the first time using

data and process modelling as defined by DMR. Elapsed time for the functional modelling phase was

approximately 8 months, during which several versions were generated.

At the functional level both process and data modelling were relatively complex due to the variety of

transactions possible and the integrity requirements. Transaction variety stemmed from the need to

cater for storage and control of inventory with lot, location and quality characteristics. This last factor

was particularly significant because of the need to process test engineering, quality engineering and

material rework transactions with resultant samples, returns and rejections. Data requirements were

complex for this reason but also because of the requirement to 'track' components in 'downstream'

manufacturing operations from source data.

With these characteristics the project represented a good test application for the methodology and

modelling techniques. Prior to DMR, an in-house 'methodology' had been used which had provided

guidance in project management and had also provided tools for development. Missing, however, were specific techniques for the design and analysis phases.

In the final system configuration some 10 major transaction types were documented which would

pass data through the MAXCIM/ASRS interface. A similar number of transactions in each component

system were also identified. Some 20 MAXCIM files representing 280 data elements were impacted

and 5 RDB databases in ASRS representing 34 data elements resulted. Binary data modelling was

used for analysis, documentation and design to produce a functional data model. This model was

then 'rationalised' to meet the performance requirements of a process control environment.

12.8 Modelling experiences

The systems analyst representing MIS was assigned responsibility for development of the functional

and physical level models. As preparation, a two-week course on data (binary) and process modelling was undertaken. The analyst had no prior exposure to data modelling techniques but had process modelling experience through data flow diagrams.

The cross-functional team previously mentioned met on a fortnightly basis for 2-3 hours, alternating the discussion between business analysis and systems design reviews. Data analysis training was not extended to the users as it was felt that the concepts and techniques could be explained through usage and under analyst guidance. The users assumed responsibility for eliciting the 'business' (functional)

data model. Design documentation was largely generated as a byproduct of these meetings.

The analyst's comments on the early business analysis sessions indicated that some problems had been experienced by the users in accepting the technique. This was believed to be because the users were familiar with 'transactions', 'processes' and 'procedures' but not with the concept of a low level data orientated view of their business. As a result, more structure was required in those first sessions to guide the users and to prompt for 'details' of data and the relationships. In later sessions a 'free association' approach was followed as the users became familiar with the purpose and direction of data analysis and binary modelling. Based on this experience the project team agreed that future teams should attend some preliminary data modelling training. Communication between team members was nevertheless at a high level. At the end of the project the Materials user commented:

"We were able to discuss the business issues and data flows in a a structured manner but free of the normal systems issues and technical jargon. This allowed us to feel comfortable with the modelling process and to develop a sense of data ownership."

That opinion was further supported by the analyst who confirmed that prior to binary data modelling

MIS had "struggled to maintain user participation" in the crucial analysis and design stages of projects.

It was theorised that ER modelling had been too complicated for the casual user and that unlike binary modelling it was perceived as systems work.

Design of the data models was conducted by the analyst outside the regular meeting times; however, as they developed, the models were subject to regular review and the process of modelling explained, such that users were made aware of the impact of their assumptions and decisions on the model. An interesting decision was the use of Entity-Relationship diagrams for representation purposes. Justification for their use was that binary modelling had shown its strength in support of data analysis and

modelling but that a conceptual model in ER format was easier to understand.

"We wanted user involvement and a data focus. We also wanted to re-examine without prejudice our

data assumptions. A bottom-up approach achieved this, however for final representation and wider

(user) review we believed ER diagrams suited our purposes best."

General user comments on the modelling process mostly reflected satisfaction with the level of participation. This prompted a feeling of greater control. "Detailed examination of the data also forced us to review our business practice and perhaps to see opportunities for change which had not been identified at the outset of the project." It was felt that a role existed for the more traditional modelling techniques (with which ER was associated) in the domain of macro (business) modelling, but that, especially with functional modelling, benefits had been realised with the 'details to generalities' approach of binary modelling. The major problem was seen as the tendency of binary analysis to "exceed project boundaries", as indicated by some discussions which had gone off track during the 'free association' sessions. Tight control over scope was therefore felt important in preventing an 'all-encompassing' project with 'never ending' analysis.

In summary, the benefits attributed to the introduction of data and process modelling techniques encompassed:

• design verifiability

Project participants, including the external consultants, users and development team members

were able to examine the assumptions underlying the data modelling (data relationships as reflected in participations, for example) to determine the validity of the design. Binary data modelling provided a means of examining the 'conventional wisdom' in a critical and rigorous manner.

• enhanced problem understanding

The development team was able to use the documentation generated from the binary modelling

phase to come to a common understanding and agreement on the business issues and problems. Such agreement was essential for the coding and testing phases, particularly as the design

specifications formed the basis of the contractual agreement with the external vendors.

• management of complexity - data orientated system

Rather than define the functions first and then fit (or adjust) the data model as required, the data model

was completed in conjunction with the functional model. This approach produced a simpler (better

defined) process model as reflected in the resulting manual and computer procedures.

Indirect benefits included a better working relationship between project participants (through enhanced communication) and improved project control (due to a heavier investment in analysis and design in the planning stages). For a larger project the benefits were expected to be greater in this area. Testing and implementation were also seen to have been eased due to the clear definition of

functions and responsibilities which existed.

12.9 Conclusion

Digital Equipment Corporation represents a leading edge technology firm, one which is dominant in the hardware and software realms of the minicomputer and workstation market. Strategic business requirements indicated the need for proprietory relational database systems and associated tools.

These were subsequently developed however in order to promote effective and efficient usage of these products it was recognised that existing lifecycle methodologies would require change. In the lifecycle reviews which followed process and data modelling were identified as key techniques to be supported.

The case study looked at the Digital environment, the systems lifecycle methodology review process and the use of data modelling in an inventory application. It was seen that a combination of data modelling techniques was used, ER for macro modelling and model representation purposes, and binary modelling for the data analysis and functional/conceptual modelling. Such a result is interesting but not altogether surprising. The theory would suggest that ER has strengths in conceptual modelling

and representation with the later being a significant factor in its use on a project with a high level of

user involvement. That ER was used for final representation purposes reflects somewhat on the actual

facilities available in the binary modelling technique itself. As the binary modelling was most similiar

to Kent, lack of diagrammatic support favoured ER. The binary model was chosen nevertheless for

the detailed phases because of its ease of use, documentation support and communicability. These

features were confirmed during the project.

As with the A.M.P. case study, indirect support for quality improvements was found, attributed to the use of binary modelling. The extended analysis phase and heavier (than normal) user involvement

was believed to have improved the specification and detailing of requirements. Whether this had

translated into a better system was only confirmed in a subjective manner by project participants.

With the low level of project complexity it was difficult to verify the metrics of abstraction and semantic expressiveness. Based on the tools used in the project, however, problems might well be experienced in this area. Both ER and the Kent-like technique seemed not to provide adequate support for large, complex data models. This is undoubtedly an area which requires development, as expectations indicate that significant advantages from binary modelling could accrue on such projects. A more sophisticated binary approach, perhaps that offered by NIAM, might be beneficial. The advantage offered by the DMR methodology is that such a technique would fit seamlessly into the systems development process should it be required.

CHAPTER 13

SUMMARY

In this report a comparative analysis of data modelling theory and practice has been conducted. Commencing with a justification for the significant volume of research in the area of data modelling, the report has argued for the creation of a reference framework in which competing modelling methodologies could be evaluated. Pursuant to the goal of standardisation of terminology, the language of the International Standards Organisation was adopted whenever possible. In Chapter 3 the philosophy of the nature of data and reality was discussed as a forerunner to the difficult task of integrating the diverse perspectives of data, data models and database architectures found in the literature. In Chapter 4 a classification of data models was presented, followed by a summary of the conceptual schema and database model as defined by the ANSI/SPARC committee.

Based on this framework and terminology, a feature analysis was conducted of four data modelling methods. Each represented an approach to data modelling, varying, however, in comprehensiveness, application and philosophy. The major concepts of each model were described in Chapter 8. In

Chapter 9 a comparative analysis of the methods was conducted, using a taxonomy derived from the

Comparative Review of Information Systems conference.

Based on the findings from this analysis it was evident that most of the 'theories' represented normative positions, and in order to support or reject those positions field testing was seen to be necessary. This presented a difficulty. Several of the data modelling theories (KENT, ACM/PCM and to a lesser extent NIAM) were not widely known in development environments, thereby limiting the potential of most types of field research. Due to this 'sample' shortage a three-environment case study design was adopted, with subject selection based on availability of relevant data. The major findings are presented

in the next section, followed by a discussion of the research limitations. Based on a combination of

these two sections the report concludes with a review of future research opportunities.

13.1 Case study conclusions

"The change process and the solutions introduced correlated with the sophistication of the environ­ ment"

In each of the three case study environments the concept of a data-driven approach to systems design was found to have strong support. In the commercial environments adoption of a data-driven approach was seen as a response to increasing system complexity and the need to integrate organisational data requirements in an effective manner. Digital Equipment Corporation, finding the existing lifecycle methodology inadequate to support present and anticipated requirements, commenced a controlled search for a replacement. In the requirements specification which resulted, the need to support data analysis, data modelling and the concept of data independence was emphasised. Australian Mutual Provident, with clearly a different set of business objectives, did not conduct a formal search for the data modelling method/methodology which was introduced. In contrast to Digital, change resulted from the efforts of a contract analyst who introduced NIAM. Being less extensive than the DMR methodology adopted by Digital, NIAM changed the process of analysis and design but left other lifecycle phases unchanged. At the University of New South Wales change in the data modelling method also reflected more of an ad hoc approach (opportunity) than a planned search. The KENT technique, being less extensive than NIAM, impacted only the analysis phase and some aspects of design. Drawing these results together, the unsurprising conclusion is that the methodology or

method introduced should match the requirements and sophistication of the environment. Whilst the

objectives have been similar, a single concept, binary modelling for example, was seen not to have

provided the complete solution (Digital).

"Effective communication and user involvement were enhanced"

A single theme was dominant in each of the case studies, that being the role of communication between project participants. The phrases 'increased user involvement', 'demystified analysis', 'ease of learning' and 'simplicity of the concept' reflected the positive experiences with binary data modelling. In both of the commercial environments users reported increased project participation with corresponding gains in effectiveness. Through the benefit of a 'traceable design process' the belief was strongly held (by users, analysts and IS management) that better specifications and designs had resulted. In the University environment an improved understanding of data modelling and normalisation concepts (by students) was acknowledged. Here (through direct observation and participation) it was seen that significant discussion was generated regarding the 'modelled reality' and that binary modelling had facilitated this. In all environments the idea that a 'correct' data model could be 'produced' by IS seemed to have been dispelled in light of the improved user awareness of the data modelling process.

"Binary modelling has its' limitations"

At A.M.P. it was seen that NIAM had been enthusiastically embraced as the data modelling standard, displacing ER modelling in the process. Nevertheless, there remained a role for ER modelling in the creation of a macro or business model. This was clear acknowledgement that NIAM and binary modelling could not be all things for all people. A bottom-up approach was an addition to the overall analysis task rather than a replacement for top-down analysis. Acceptance of a dual analysis and design approach was also made explicit in the methodology adopted by Digital. ER had a defined role in the preparation of overviews and scope definition. It was also the model of choice for representation purposes. Convergence of bottom-up and top-down analysis was stressed on consistency grounds. In the University environment such a dual-method approach was seen as less important on practical grounds due to the small project size. In theory, however, the usefulness of such an approach was stressed.

"Rush towards relational technology strengthens, implications for data modelling"

From the environment descriptions of each case study the push towards relational technology was seen to be strengthening. A.M.P., a major customer of IBM Corporation, was moving towards implementation of DB2 with the expectation that the package would be used almost exclusively for new systems. Digital, having recently announced a major version release of its relational package Rdb, followed this with an aggressive marketing push into transaction processing. This was significant because it could be interpreted as support for the relational concept irrespective of the system type (whether high performance or otherwise). Whilst continuing to support existing network database users, Digital now markets a relational approach as the primary system solution. This has gone hand in hand with developments in distributed data processing. A result of this growth in relational implementations will be to place further demands on the methodologies and theories which support it. The implementation of data and process modelling, along the lines promoted by the International Standards Organisation, would be expected to become widespread. Binary data modelling, being an element of these standards, should continue to develop, thereby attracting significant research interest.

13.2 Research Limitations

The three case studies presented in this report have been drawn from diverse environments: the University of N.S.W., a research and teaching facility; Australian Mutual Provident, a large insurance company; and Digital Equipment Corporation, a multi-national computer manufacturer. In addition, each of the case studies investigated the implementation and usage of a different data modelling technique. Such a heterogeneous sample restricts the validity of generalisation across these environments. This was anticipated, and as such the purpose of the research has not been to draw general conclusions nor to extrapolate the results to other environments. The major purposes have been, first, to undertake exploratory/descriptive research with the objective of identifying areas where empirical research might be beneficial, and second, to provide some qualitative feedback for the ongoing theoretical development and implementation of data modelling. Such objectives were best served through seeking a cross-section of information systems environments.

Bearing these objectives in mind, the following weaknesses nevertheless exist in the case studies. Firstly, time restrictions and disclosure restrictions in the corporate environments dictated the level of detail which could be obtained and the research methods which could be employed. This resulted in a heavier than desired emphasis being placed on interviews and verbal recall, as it was often not possible to obtain documentation. In these two cases interviews were also restricted through nomination of employees by the corporation rather than through selection by the researcher. Consequently, selection bias may have served to influence the findings. Due to the reliance on verbal communication, much of the material gathered was also necessarily of a subjective nature.

Secondly, data modelling theory suggests that many of the impacts (benefits) of its use will be realised over the lifetime of the information system. Some will be immediate but others will only be evident in the medium to long term. Quality improvements, for example, perhaps reflected in lower maintenance costs, would not be initially evident. In order to find evidence for these effects, longitudinal analysis at the project level is required.

Thirdly, in the University environment the project sizes and levels of complexity could not be taken as representative of commercial (real world) projects. The impact of this was to reduce the external validity of some findings; however, it was possible to concentrate on several metrics (learning, communication and representation) for which the results could be extrapolated to other environments.

Finally, the data collection phase for the three environments extended over several years. Since this phase was completed it is likely that each environment has moved ahead in the application of data modelling and systems design and in the sophistication of usage. Such changes would be expected particularly since each environment had demonstrated a limited history of binary data modelling, and in the corporate environments a learning phase was clearly in progress. As a consequence the case studies should not be taken as current descriptions of their respective environments.

13.3 Future Research

Each of the case study environments offers rich potential for empirical research. In this section the corporate environments and the university environment are examined and a number of research alternatives considered.

Perhaps the most promising area for research in the corporate environments involves longitudinal analysis and the collection of project lifecycle data, specifically data showing the percentage distributions of analysis, design and development effort against total project effort. Such data would support comparisons between projects which had utilised binary data modelling and those which had not. The data could also be used as input for one of the many parametric models developed for project estimation and control. Post-hoc analysis of the project data based on the model forecasts might then reveal whether there was justification for significant change in the parameter relationships or model assumptions for binary data modelling projects (reflecting a change in the lifecycle structure). An important measure would also be the level of user involvement, perhaps calculated as a percentage of total project effort and of total analysis effort.

Within the bounds of a longitudinal analysis, measures of system quality would be beneficial. Although difficult to operationalise (perhaps a surrogate such as change requests per unit period against programs or data might be used), the quality measure could then be checked for correlation with changes in the lifecycle phase distributions. This would enable predictions from the theory to be checked, namely that increased user involvement and increased analysis (at the conceptual level) will lead to improved systems quality. A multiple organisation analysis conducted within a homogeneous industry/environment would add further validity to the results from the longitudinal studies.
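A similarly hedged sketch of the suggested correlation check follows; the analysis-share and change-request figures are invented, and the choice of Pearson correlation is an assumption rather than a prescription from the case studies.

```python
# A hedged sketch of the suggested correlation check: a quality surrogate
# (change requests per month after release) set against the analysis-phase
# share of total project effort. All figures are invented for illustration.

from statistics import correlation  # Pearson's r; available from Python 3.10

analysis_share = [12.0, 18.5, 25.0, 31.0, 36.5]   # % of project effort in analysis
change_requests = [9.1, 7.4, 6.0, 4.2, 3.8]       # change requests per month

# A strongly negative coefficient would be consistent with the prediction that
# increased conceptual-level analysis is associated with improved system quality.
print(correlation(analysis_share, change_requests))
```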

Within the University environment the possibility exists to provide greater experimental control than in the commercial environments. Consequently, for a limited range of variables for which external validity could be established, it would be possible to design an 'experiment' with small projects and small groups utilising alternative modelling techniques. Such an exercise could be accomplished by randomly dividing the students from one of the database subjects into a group instructed to use Kent and another instructed to use ER modelling. With an identical project description and identical deliverables, variables such as project time in phases, degree of data model normalisation achieved, data model modifications required before implementation (after completion of the conceptual model), time taken to understand the modelling technique and standard of documentation produced could be collected. In addition a number of qualitative measures concerning the development experience, for example ease of use and understanding, or inter-group communication, might be collected via questionnaire.
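The random assignment step of such an experiment might be sketched as follows; the student identifiers, group sizes and variable list are illustrative assumptions only.

```python
# An illustrative sketch of the random assignment step for the proposed
# experiment. The student identifiers, group sizes and variable list are
# assumptions, not details taken from the case studies.

import random

students = [f"student_{n:02d}" for n in range(1, 31)]
random.shuffle(students)

groups = {
    "Kent (binary modelling)": students[:15],
    "ER modelling": students[15:],
}

# Variables proposed in the text, to be recorded for each group.
variables = [
    "project time in each phase",
    "degree of data model normalisation achieved",
    "model modifications required before implementation",
    "time taken to understand the modelling technique",
    "standard of documentation produced",
]

for group, members in groups.items():
    print(group, "-", len(members), "students")
```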

After a first-pass completion it should be possible to modify the requirements and then to collect data once more on the previously mentioned variables. This would allow some aspects of a 'real world' environment (change) to be simulated and measured. The experimental design could be further expanded by adding a third group with instructions to complete the project without the benefit of a conceptual modelling phase.

APPENDIX A

DATABASE ARCHITECTURE

[Figure: the three-level database architecture. Users access external views through a host language plus data sub-language (DSL); external schemas map to the conceptual schema (conceptual view), which in turn maps to the internal schema (storage structure definition). Schemas and mappings are built and maintained by the database administrator (DBA).]

Source: [Date 86 p33]

APPENDIX B

UNIVERSE OF DISCOURSE

[Figure: the relationship between the Conceptual Schema, the Universe of Discourse and the Information Base.]

1. Classification, abstraction, generalisation, establishing rules etc. about the Universe of Discourse and recording them. This is a human process, describing a (shared) mental model of the Universe of Discourse.

2. Recording facts and happenings about the Universe of Discourse, including what entities are of interest.

Source: [Griethuysen 85 /03-15]

APPENDIX C

DATA SYSTEM DESIGN

[Figure: data system design within information system design. Data system design proceeds from information requirements design (producing a requirements specification), through conceptual design (conceptual schema) and implementation design (DBMS schema, relational or otherwise), to physical design (storage schema). Components represent levels of the process; links represent interfaces between two levels.]

Source: [Agosti 84]

APPENDIX D

SUBJECT DESCRIPTIONS

14.608 Database Systems

Advanced data storage concepts, including detailed study of alternative approaches to database management systems. Management information needs and database specification in a commercial environment. Detailed evaluation, with project work, of a microcomputer based management system.

Information retrieval concepts, relational query systems, security, control and audit considerations.

14.603 Computer Information Systems 2

Systems design: physical design of business systems, specifications and updating of VSAM files, man-machine dialogue procedures, top-down structured design and evolutionary design methodologies.

Introduction to communications networks. Operating systems concepts: processor, storage, device and process management, segmentation and paging systems. COBOL programming.

14.606 Management Information Systems Design

Organisational impact, information systems design methodologies, requirements elicitation, logical and physical design, implementation procedures, principles of data management, data analysis, telecommunications networks, systems design in a distributed environment, commercial programming practice, systems development case studies using spreadsheet, file management and word processing software.

14.992G Data Management

A review of data management principles including both simple and complex file designs, and the

concept of database management systems. Alternative database management system architectures, including network, hierarchical and relational approaches. Database query systems, including relational

algebra. Case studies and assignments embodying these principles.

APPENDIX E

A.M.P. DATA BASE DESIGN PROCESS

[Figure: the A.M.P. data base design process, showing the design phases, the associated data models and other inputs, and the progression from a business data model through application and usage models to the IMS physical data base design.]

BIBLIOGRAPHY

Agosti, M., Johnson, R.G. (1984). A Framework of Reference for Database Design. DATA BASE, Summer 1984, 3-9.

Brodie, M.L. (1983). Association: A Database Abstraction for Semantic Modelling. In Entity-Relationship Approach to Information Modelling and Analysis, edited by Chen, P.P.S. North-Holland, 577-601.

Brandt, I. (1982). A Comparative Study of Information Systems Design Methodologies. In INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Comparative Review, edited by Olle, T.W. North-Holland, 9-35.

Brodie, M.L., Silva, E. (1982). Active and Passive Component Modelling: ACM/PCM. In INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Comparative Review, edited by Olle, T.W. North-Holland, 41-91.

Brodie, M.L., Silva, E.O., Ridjanovic, D. (1983). On a Framework for Information Systems Design Methodologies. In INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Feature Analysis, edited by Olle, T.W. North-Holland, 231-241.

Bubenko, J.A., Gustafsson, M.R., Karlsson, T. (1983). Comments on some Comparisons of Information System Design Methodologies. In INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Feature Analysis, edited by Olle, T.W. North-Holland, 243-249.

Chen, P.P.S. (1976). The Entity-Relationship Model: Toward a Unified View of Data. ACM Transactions on Database Systems, Vol. 1, No. 1, March 1976, 9-36.

Chilson, D.W., Kudlac, M.E. (1983). Database Design: A Survey of Logical and Physical Design Techniques. DATA BASE, Fall 1983, 11-19.

Codd, E.F. (1972). Further Normalisation of the Data Base Relational Model. In Data Base Systems, Courant Computer Science Symposia Series, Vol. 6. Prentice-Hall, Englewood Cliffs, N.J.

Codd, E.F. (1979). Extending the Database Relational Model to Capture More Meaning. ACM TODS, Vol. 4, No. 4, December 1979.

Codd, E.F. (1981). Data Models in Database Management. Proc. Workshop on Data Abstraction, Databases and Conceptual Modelling, ACM SIGPLAN Notices, Vol. 16, No. 1, January 1981.

Date, C.J. (1986). An Introduction to Database Systems, Volume 1, Fourth Edition. Addison-Wesley, Sydney.

Davis, G.B., Olson, M.H. (1985). Management Information Systems: Conceptual Foundations, Structure and Development, Second Edition. McGraw-Hill, Sydney.

Griethuysen, J.J. van (1985). Concepts and Terminology for the Conceptual Schema and Information Base. International Standards Organisation, Document No. ISO/TC97/SC5-N695, August 1985.

Kahn, B.K. (1985). Requirement Specification Techniques. In Principles of Database Design, Volume 1: Logical Organisations, edited by Yao, S.B. Prentice-Hall, New Jersey, 1-65.

Kent, W. (1978). Data and Reality. North-Holland.

Kent, W. (1984). Fact-Based Data Analysis and Design. Journal of Systems and Software, 4, 99-121.

King, R., McLeod, D. (1985). Semantic Data Models. In Principles of Database Design, Volume 1: Logical Organisations, edited by Yao, S.B. Prentice-Hall, New Jersey, 1-65.

McFadden, F.R., Hoffer, J.A. (1985). Data Base Management. Benjamin-Cummings, California.

Ramon, A.O. (1983). Information Derivability Analysis in Logical Information Systems. CACM, Vol. 26, No. 11, September 1983.

Rzevski, G. (1983). On the Comparisons of Design Methodologies. In INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Feature Analysis, edited by Olle, T.W. North-Holland, 259-266.

Shoval, P. (1985). Essential Information Structure Diagrams and Database Schema Design. Information Systems, Vol. 10, No. 4, 417-423.

Verheijen, G.M.A., Van Bekkum, J. (1982). NIAM: An Information Analysis Method. In INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Comparative Review, edited by Olle, T.W. North-Holland, 537-589.

Wasserman, A.I., Freeman, P., Porcella, M. (1983). Characteristics of Software Development Methodologies. In INFORMATION SYSTEMS DESIGN METHODOLOGIES: A Feature Analysis, edited by Olle, T.W. North-Holland, 37-62.