Semantic Database Prototypes

Published in Journal of Information Systems 3 (2) April, 1993, pp. 119-144, copyright Blackwell Scientific 1993.

Richard Baskerville
School of Management
State University of New York
Binghamton, New York 13902-6000

Keywords and Phrases: Action Research, Data Base, Data Base Design, Information, Information Systems Design, Prototypes, Prototyping, Semantics, Semantic Data Model, Systems Analysis

Abstract

This paper describes a technique for improving the semantic consensus of conceptual database designs. Semantic consensus is a condition where there is pragmatic agreement among database designers and all of the users about which aspect of reality is being represented by a particular database element, and how that representation is being coded. The technique, called semantic database prototyping, involves a prototype that has been designed and constructed purely as a consequence of the semantic data model. The purpose of the semantic database prototype is to promote direct user validation during the conceptual database design phase of information systems analysis and design. Its distinguishing characteristic is its capture of data element occurrences within the context of the database design. The research method was action research, and the project is also briefly described.

INTRODUCTION

The purpose of this paper is to describe a technique for the extension of prototyping into the task of database modelling. This semantic database prototype (SDP) technique is motivated by the need for concise communication between the user and the database designer. The approach is predicated on the notion that a prototype can serve as an abstraction level for conveying a data model. The scope of the paper includes the issue of semantics in database design, a description of the action research that led to the discovery of the technique, and a sufficiently detailed description of semantic database prototyping to permit the reader to apply the technique.
Before concluding, the paper discusses some opportunities for future research in this area.

Database design is a critical step in information systems development. After implementation, a change to any fundamental attribute, relationship or constraint of an integrated database element may require modifications to every application that concerns that element. Such changes thus result in expensive application maintenance. Consequently, the Information Engineering (IE) literature emphasizes the importance of a careful, stable data design relative to process design (Finkelstein, 1981; Martin and Finkelstein, 1981; Inmon, 1988; Martin, 1990). It follows that miscommunication between database designer and user may be profoundly ruinous to a new information system's success. A technique for improving the accuracy of this communication (such as the one described below) is of enormous importance for data-oriented system designs.

SEMANTIC THEORY IN DATABASE MODELS

The two purposes of models, in general, are epistemological and logical. On the one hand, they help us to express our understanding; on the other, they help us to infer from the abstract that which is arcane in reality (Harre, 1972). For these purposes, data models provide constructs by which we may abstract the inherent structure, operations, and constraints of data in reality (Tsichritzis and Lochovsky, 1982). Semantic theory is relevant because it concerns meaning in language, or the relationship between signs and objects in the world. These theoretical semantics of data models have historically provided a basis for discussions in this arena (Klein & Lyytinen, 1991).

Metadata Constructs Versus Capture

Research based on semantic theory has two directions.
On the one hand, metadata construct research inevitably focusses on the proper inner workings of database management software: which possible data constructs are allowed, what mechanisms can be used for presenting data, etc. Manufacturers of database packages would closely follow research into ideal metadata constructs. On the other hand, research into the process of capturing the metadata deals with approaches and methods for developing knowledge about an application domain and implementing a usable data model on whatever constructs are provided by the system. Analysts and designers of computer-based information systems would closely follow research into ideal metadata capture techniques. Semantic theory plays an important role in both of these research arenas, in much the same way as algorithm theory is important in both compiler and application program design. We provide a context for the work that follows with a brief review of each of these research streams.

Semantic Metadata Construct Theory

This work generally seeks an orthogonal taxonomy of the semantic requirements in data modelling, and originates in the concept of a semantic database model. Closely related to the relational data model, semantic models are distinguishable in that they ignore all implementation constraints. Chen's (1976) entity-relationship model is foremost among the semantic models. This research community has focussed on the encoding of the universal semantic rules needed for metadata in the semantic database model. Such semantic database research seeks to minimize database design errors by offering constructs that permit an accurate, effective model. This model (i.e., the database) reflects the most essential real-world attributes, relationships and constraints relevant to the design domain.
"Such a semantics-based database description and structuring formalism is intended to serve as a natural application modeling mechanism to capture and express the structure of the application environment in the structure of the database." -- Hammer and McLeod (1981, p. 352)

Earlier, Hammer and McLeod (1976) focussed on metadata constructs that facilitate semantic integrity. They delineated various levels of semantic integrity that database designers might achieve with semantic data models.

Ronald Stamper's (1979) continuing work in LEGOL led to a definition of semantic normal form. This database design approach goes beyond the normalization of the unintended anomalies in operations on data, and seeks to reduce the unintended anomalies that accompany operations on the metadata. More recently, this project has led to the proposal for a 'normbase': constructs that can represent entire systems of social norms (Stamper, et al., 1991). Ongoing experiments include the NORMA language, which represents a group's social knowledge as a system of semantic and norms constructs.

Semantic Metadata Capture Theory

This body of research regards the cognitive, social, and organizational processes by which designers learn, understand and capture the values of the metadata that is to be encoded by the constructs described earlier.

One segment of this literature regards the conceptual difficulties of end-user database development. Jeffrey Hoffer (1982) pioneered the study of users and metadata capture from the perspective of information center requirements. Throughout this stream of research, the semantic data models have consistently baffled and repelled end-users. Hoffer concluded that people tend to rely on data-process flow models when given a choice: it is the process that gives meaning to the data. Juhn and Naumann (1985) conducted an experimental comparison of the metadata validation task for end-users.
These users were randomly assigned a validation task for a familiar database domain in the form of one of four models (two semantic models and two relational models). The results were mixed: the semantic models imparted a better understanding of relationship and cardinality, but the relational models supported a better understanding of primary and foreign keys.

Batra, Hoffer and Bostrom (1988) studied the impact of semantic data models on the end-user metadata capture process. Again, the results are mixed: the semantic data model and the relational model were each more confusing to end-users in specific circumstances. Importantly, the average total error rate by end-users with both relational and semantic data models hovered around 45%. Batra et al. suggest that the solution is to train and support users in the metadata discovery and validation tasks.

The LEGOL project represents a different segment of metadata capture research. Rather than train users in semantic data modelling, the LEGOL team developed MEASUR, a comprehensive set of tools for the analysis, design and construction of an information system based on social norms (Stamper, Althans & Backhouse, 1988). This work accepts that "formal semantic theories fail to account for meanings that relate language to reality" (p. 69), and suggests that software engineering must find ways of overcoming the semantic misunderstandings created by designer-user differences in intentionality, culture, responsibility and commitment.

The work below is distinguished from the previous research by two characteristics. First, unlike the work in end-user database design, SDP permits end-user validation of a semantic data model by capitalizing on iterative learning, linguistic and semiotic semantics, and prototyping. Training users in semantic data modelling is not necessary. Second, unlike MEASUR, this technique is narrowly focussed on the critical designer-user communication problem.
Accordingly, SDP can be easily integrated into many existing systems development philosophies. Comprehensive business-oriented
