Heterogeneous Distributed Database Systems for Production Use

GOMER THOMAS, Bellcore, 444 Hoes Lane, Piscataway, NJ 08854
GLENN R. THOMPSON, Amoco Production Company Research Center, 4502 East 41st Street, Tulsa, Oklahoma 74135
CHIN-WAN CHUNG, Computer Science Department, GM Research Laboratories, Warren, Michigan 48090
EDWARD BARKMEYER, National Institute of Standards and Technology, Gaithersburg, Maryland 20899
FRED CARTER, Ingres Corporation, 1080 Marina Village Parkway, Alameda, California 94501
MARJORIE TEMPLETON, Data Integration, Inc., 3233 Federal Avenue, Los Angeles, California 90066
STEPHEN FOX, Xerox Custom Systems Division, 7900 Westpark Drive, McLean, Virginia 22102
BERL HARTMAN, Sybase, Inc., 6475 Christie Avenue, Emeryville, California 94608

It is increasingly important for organizations to achieve additional coordination of diverse computerized operations. To do so, it is necessary to have database systems that can operate over a distributed network and can encompass a heterogeneous mix of computers, operating systems, communications links, and local database management systems. This paper outlines approaches to various aspects of heterogeneous distributed data management and describes the characteristics and architectures of existing heterogeneous distributed database systems developed for production use. The objective is a survey of the state of the art in systems targeted for production environments as opposed to research prototypes.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems-distributed databases; H.2.4 [Database Management]: Systems; H.2.5 [Database Management]: Heterogeneous Databases

General Terms: Design

Additional Key Words and Phrases: Database integration, distributed query management, distributed transaction management, federated database, multidatabase, system architecture

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1990 ACM 0360-0300/90/0900-0237 $01.50

ACM Computing Surveys, Vol. 22, No. 3, September 1990

CONTENTS

INTRODUCTION
1. HETEROGENEOUS DISTRIBUTED DATABASE CAPABILITIES
1.1 Schema Integration
1.2 Distributed Query Management
1.3 Distributed Transaction Management
1.4 Administration
1.5 Types of Heterogeneity
2. STANDARDS ACTIVITIES
3. SOME EXISTING SYSTEMS
3.1 ADDS (Amoco Production Company, Research)
3.2 DATAPLEX (General Motors Corporation)
3.3 IMDAS (National Institute of Standards and Technology, U. Florida)
3.4 Ingres (Ingres Corporation)
3.5 Mermaid (Data Integration, Inc.)
3.6 MULTIBASE (Xerox Advanced Information Technology)
3.7 SYBASE (Sybase, Inc.)
4. SUMMARY
ACKNOWLEDGMENTS
REFERENCES

INTRODUCTION

The objectives of this survey paper are to provide insight into the kinds of heterogeneous distributed database capabilities available in off-the-shelf systems and to describe the practicalities of in-house development of such capabilities; that is, what can be achieved and what approaches are likely to be promising.

This paper takes a broad view of the terms "distributed," "heterogeneous," and "production use." A database system is considered to be distributed if it provides access to data located at multiple (local) sites in a network, even if it does not provide full facilities for schema integration, distributed query management, and/or distributed transaction management. It is considered to be heterogeneous if the local nodes have different types of computers and operating systems, even if all local databases are based on the same data model and perhaps even the same database management system. (This definition of heterogeneous is admittedly at odds with that commonly used by the research community, where the term is typically used to imply multiple data models.) It is considered to be for production use if it has been, or is being, developed for that purpose, even if it is currently only in an advanced prototype state.

Although a number of specific database systems are described or mentioned in this paper, this paper is not intended to contain a definitive list of heterogeneous distributed database systems for production use or to endorse or recommend any particular system. The information about specific systems may or may not be entirely accurate, and it is likely that other systems on the market, in production use, or under development have capabilities equaling or exceeding those mentioned here.

Section 1 discusses different types of distributed database capabilities that can be provided by different systems. Section 2 describes the status of remote database access standards, a key factor in the ease of developing heterogeneous distributed database systems. Section 3 describes the characteristics and architecture of a sampling of existing heterogeneous distributed database systems developed for production use. These descriptions are current as of late 1989, written by individuals who have been involved in the development of these systems. Section 4 gives a brief summary of the state of the art.

1. HETEROGENEOUS DISTRIBUTED DATABASE CAPABILITIES

Different types of capabilities can be provided by heterogeneous distributed database systems. They include schema integration, distributed query processing, distributed transaction management, administrative functions, and coping with different types of heterogeneity. Schema integration has to do with the way in which users can logically view the distributed data. Distributed query management deals with the analysis, optimization, and execution of queries that reference distributed data. Distributed transaction management deals with the atomicity, isolation, and durability of transactions in a distributed system. Administrative functions include

such things as authentication and authorization, defining and enforcing semantic constraints on the data, and management of data dictionaries and directories. Heterogeneity can include differences in hardware, operating systems, communications links, database management system (DBMS) vendors, and/or data models. These are all important aspects of distributed data management. In considering them, it is important to recognize there is no "ideal" set of capabilities for all environments or applications. A particular capability may be invaluable in certain situations while being totally unsuitable in others.

1.1 Schema Integration

Each local database has a local schema, describing the structure of the data in that database. Each user has a user view, describing that portion of the distributed data that is of interest to the user, possibly reorganized to meet the user's requirements. There are two general approaches to the problem of providing mappings between user views and the local schemata:

(1) All of the local schemata may be integrated into a single global schema that represents all data in the entire distributed system. All user views are derived from this global schema.

(2) Various portions of various local schemata may be integrated into multiple federated schemata. These federated schemata may correspond closely to user views, or user views may be further derived from these federated schemata.

In either approach one is faced with schema integration, the process of developing a conceptual schema that encompasses a collection of local schemata. If the local databases are based on different data models and/or use different names or representations for particular data elements, the usual first step is for each local schema to be translated into an equivalent schema in some common data model using common names and representations. Thus, in this case one is also faced with a schema translation problem.

The simplest form of schema integration is schema union, whereby a conceptual schema seen by distributed users is the disjoint union of local schemata or perhaps of subsets or views of local schemata. In the simplest form of this, a data item in the conceptual schema is identified by the name of a local database concatenated with the name of an item in the local database. More commonly, some type of aliasing capability is provided.

More sophisticated capabilities that can be provided include the following:

Replication, whereby data items in different local databases may be identified as copies of each other

Horizontal fragmentation, whereby data items in different local databases may be identified as logically belonging to the same table or entity set in the integrated schema

Vertical fragmentation, whereby data items in different local databases may be identified as logically representing the same row or entity in the integrated schema but containing different attributes for the row or entity

Data mapping, whereby data types or data values are converted for conformity with each other

As an example of data mapping, one local database might store temperatures as floating point numbers representing degrees Celsius, and another local database might store temperatures as character string representations of degrees Fahrenheit. To integrate these schemas, one might want to map degrees Fahrenheit to degrees Celsius and map character string representations of numbers to floating point representations, or vice versa.

1.2 Distributed Query Management

Distributed query management provides the ability to combine data from different local databases in a single retrieval operation (as seen by the user). This capability may or may not be provided by a distributed database system.
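The data-mapping case described in Section 1.1 (floating-point Celsius in one local database, character-string Fahrenheit in another) can be sketched as a pair of conversion routines applied by the integration layer as data is retrieved. The sketch below is illustrative only; the database names and row layout are hypothetical and are not taken from any of the systems surveyed.

```python
def fahrenheit_str_to_celsius(raw: str) -> float:
    """Map one local representation (character-string degrees Fahrenheit)
    to the conceptual schema's representation (floating-point Celsius)."""
    return (float(raw) - 32.0) * 5.0 / 9.0

def celsius_to_fahrenheit_str(temp_c: float) -> str:
    """Inverse mapping, for pushing conceptual-schema values back to a
    local database that stores Fahrenheit character strings."""
    return "%.1f" % (temp_c * 9.0 / 5.0 + 32.0)

# A schema-union style view might tag each value with its source database
# and apply the appropriate mapping as rows are retrieved.
local_rows = [("plant_db", "212.0"),   # hypothetical Fahrenheit-string source
              ("lab_db", 25.0)]        # hypothetical Celsius-float source
integrated = [fahrenheit_str_to_celsius(v) if src == "plant_db" else float(v)
              for src, v in local_rows]
```

Applying the mappings at retrieval time, as here, leaves the local databases untouched; converting in the opposite direction on update is equally possible, which is the "or vice versa" in the discussion above.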
In some systems, an application needing data from multiple local

databases would explicitly send queries to the individual databases rather than presenting a single distributed query to a distributed query manager. If distributed query management is provided, there are many possible approaches to optimizing the heterogeneous distributed queries and managing their execution. To complicate the optimization problem, there may be great differences in the speeds of the communications links, the speeds and workloads of the local processors, and the nature of the data operations available at the local sites. Depending on the system characteristics, it may be desirable to exploit parallelism in the execution of queries. Replicated and/or fragmented data also add complexity. One of the simpler approaches to query management is to move all the desired data to the query site and combine them there. More sophisticated approaches may consider a wide range of algorithms and take into account a wide range of factors in evaluating them [Chen et al. 1989].

1.3 Distributed Transaction Management

Distributed transaction management provides the ability to read and/or update data at multiple sites within a single transaction, preserving the transaction properties of atomicity, isolation, and durability [Ceri and Pelagatti 1984]. This capability may or may not be provided by a distributed database system. If it is provided, there are two aspects involved: concurrency control protocols and commit protocols.

Local concurrency control protocols ensure that the execution of local transactions, whether originating at the local database or coming from a remote site, meets the isolation standards of the local system. In most cases this means the execution is serializable; that is, the actual execution is equivalent to the serial execution of the transactions in some order [Eswaran et al. 1976]. Distributed concurrency control protocols are designed to ensure that distributed transactions are globally serializable; that is, that the serializations of the local components at all the databases are compatible with some global serialization of all the distributed transactions [Traiger et al. 1982].

Local commit protocols guarantee atomicity and durability of local transactions; that is, that either all of the actions of a local transaction complete and commit or none of them do. Distributed commit protocols guarantee global atomicity and durability; that is, that either all of the local subtransactions of a distributed transaction complete and commit or none of them do.

These distributed protocols carry an operational cost, as well as an implementation cost, since a failure at one site may leave locked data at other sites unavailable for extended periods of time. Thus, there can be advantages to not having them if they are not needed. In some situations the semantics of the databases and applications may make both distributed concurrency control and distributed commit protocols unnecessary. An example of this is a collection of independently maintained reference databases (e.g., restaurant ratings or bibliographic listings), with a capability for users to submit queries that form the union of information from the different databases in the collection. Since the databases are being independently updated and since users are typically interested in partial information, even if one of the databases is temporarily unavailable, there is no need to ensure global serializability or global atomicity. In other situations, distributed commit protocols, but not distributed concurrency control, may be needed. An example of this is a travel reservation system in which reservations for different resources are independently managed in different databases. The order in which reservations for different resources are interleaved is not critical, so distributed concurrency control is not necessary. A user would, however, often want to package a number of interdependent reservations into a single transaction and require that either they all commit or none of them do.

In other situations, both distributed concurrency control and commit protocols may be needed. An example is a banking system in which accounts at different branches are maintained in different databases. Global

atomicity is needed to ensure that funds transfers between branches are handled correctly. Global serializability is needed to ensure that audit programs run correctly, even when run concurrently with funds transfer programs.

1.4 Administration

A number of administrative functions, such as authorization of users, definition and enforcement of semantic integrity constraints on the data, and maintenance of schema information, must be performed for any database system. In a heterogeneous distributed database system, there are many choices as to the degree of centralization and transparency of these administrative functions.

1.4.1 Authorization Management

At one extreme, authorization can be completely decentralized, with each user having a user id at each local database and permissions granted by local database administrators and enforced by local database management systems. At the other extreme, authorization can be completely centralized, with each user having a system-wide user id and permissions granted on a system-wide basis by a system-wide database administrator and enforced by a system-wide distributed query manager. An intermediate possibility is to have system-wide user ids but with permissions granted and enforced at local databases.

One potential advantage of centralized permissions is that a user can be given access to certain distributed views while being denied direct access to the underlying local data. For example, one might have a table in one database that tells which employees are assigned to which division of a company and another table in another database that gives the salary of each employee. One might want certain users to have access to the average salaries in the different divisions of the company but not to the salaries of individual employees. With centralized permissions, it is straightforward to define a view containing the average salary information (obtained by taking the join of the two local tables and applying aggregation and projection operations) and give the users permission to access the view but not permission to access the underlying employee salary table. With only local permissions, the only choices are to give the user access to the employee salary table or not to. There is no mechanism for preventing access to the salary table but granting access to a view derived from a join of the salary table with a table in another database.

1.4.2 Semantic Integrity Management

Some database management systems provide capabilities for specifying and enforcing semantic integrity rules for the database instead of leaving that function to the applications programmers. These include such things as rules for allowed data values and existence dependencies. In the context of heterogeneous distributed databases, this can be done either centrally, through a global conceptual schema, or locally, through the individual local schemas. One potential advantage of the centralized approach is that distributed integrity rules, that is, rules that depend on data stored in different databases, can be enforced.

1.4.3 Location Transparency and Schema Maintenance

Location transparency is the ability to deal with distributed data without having to know where it is or even that it is distributed. There are two levels at which location transparency is relevant: the level of the user (programmer or interactive user) and the level of the database administrator.

At one extreme, users may have to specify data items by machine name or address, DBMS identifier, database name, and data item name. At the other extreme, users may only have to specify logical names of data items. For pure logical data item naming to work, there must be some kind of naming convention that guarantees uniqueness of names for all data items accessible by the user, a troublesome requirement if there are users who need access to essentially all the data of the enterprise. A compromise is

to require the user to specify a logical database name and a data item name, with the system mapping logical database names to local databases or federated schemata. This requires only system-wide uniqueness of logical database names and uniqueness of data item names within each logical database.

There is also a range of choices for how the database administrator(s) provide location transparency for the programmer or interactive user. At one extreme, the database administrator(s) may individually maintain multiple data dictionaries at multiple sites describing what data can be found where. These data dictionaries may all represent a global schema and describe all the data in the system, or they may each represent a particular federated schema and only describe data of interest to a particular group of users. At the other extreme, the system itself may make all decisions about data placement and maintain all directories automatically (although this extreme would be highly unusual in a heterogeneous system). A compromise approach is to have any changes in data placement be entered only once and to have the system automatically propagate the information throughout the system as needed.

In a system with multiple federated schemata, the data dictionaries and directories are typically physically decentralized. In a system with a single global schema, they are logically centralized but may be physically centralized or distributed. In either case, as indicated above, varying degrees of centralization and coordination are possible for their maintenance.

1.5 Types of Heterogeneity

Many people in the research community have traditionally regarded a "heterogeneous" database system as one involving multiple data models. From a practical standpoint, however, a distributed database system may involve a number of different types of heterogeneity: computer hardware, operating systems, communications links and protocols, database management systems (vendors), and/or data models. Each of these presents its own problems to the system developer. Different hardware and operating systems may use different data representations, for example, different character codes or different representations for floating point numbers. Different communications mechanisms require gateways, and these can be difficult to implement because of the different capabilities involved. For example, one protocol may allow an out-of-band interruption of a session, whereas another may not. Different DBMS vendors may use different data definition and data manipulation languages, even though they may be using the same data model. Different data models can present a difficult schema translation and query translation problem, especially if performance is important.

2. STANDARDS ACTIVITIES

One key to efficient development of heterogeneous distributed database systems is a standard language and protocol for remote data access. If all of the local sites in the system supported a common set of data operations, accessible via a standard language, it would greatly simplify managing distributed queries. If support for a "prepare to commit" state were also standardized, implementing a distributed two-phase commit protocol would be very straightforward. Moreover, if the local systems all used strict two-phase locking (locks held until commit point), as many of today's database systems do, distributed concurrency control would be implied by distributed two-phase commit [Breitbart and Silberschatz 1988]. Distributed deadlock detection would, however, be needed.

There are two major hurdles to overcome in reaching a consensus on the many technical issues involved in such standards. One is the usual problem of getting a diverse community of vendors and users to agree on a common way of doing things when they may already have investments in separate solutions. The other problem, which in this case is perhaps more serious, is that there has still been relatively little experience with distributed databases in production use. Even those organizations that have implemented them often do not

feel that they have a good understanding of precisely what functionality should be provided, let alone precisely how various things should be handled in the remote data access language. There is not even a widely accepted reference model for distributed database systems.

Nonetheless, both the International Standards Organization (ISO) and the American National Standards Institute (ANSI) are active in this area. The RDA Rapporteur Group of Working Group 3 on Database of ISO TC97/SC21 was formed in late 1985 to work on a Remote Data Access (RDA) standard, and Subcommittee X3H2.1 of Technical Committee X3H2 on Databases of the ANSI/X3 Committee on Information Processing Systems was formed in late 1988 to coordinate U.S. input into the ISO process. It is possible that an ISO Draft International Standard could be approved as early as mid-1990 and an International Standard could be approved as early as mid-1991.

3. SOME EXISTING SYSTEMS

This section presents a brief description of a sampling of systems that have been developed or are being developed for production use. Four general types of information are given for each system:

(1) Background: motivation, objectives, and history for the project or product

(2) System characteristics: capabilities or features of the system, including its positioning in Sheth and Larson's taxonomy of heterogeneous distributed database systems [Sheth and Larson 1990]

(3) System architecture: major system components and their functions

(4) Current status and future plans: including, in some cases, an estimate of the amount of work that has gone into the project or product and/or some of the major lessons learned during development

To avoid numerous footnotes, we consolidated the following trademark acknowledgements: IBM, IMS, SQL/DS, DB2, IBM PC, OS/2, VM, CMS, MVS, and SNA are trademarks of International Business Machines Corporation. DEC, VMS, and VAX are trademarks of Digital Equipment Corporation. Sun Workstation is a trademark of Sun Microsystems, Inc. Apollo Domain is a trademark of Apollo Computers, Inc. RIM is a trademark of Boeing Computer Services. FOCUS is a trademark of Information Builders, Inc. UNIX is a registered trademark of AT&T. ORACLE is a trademark of Oracle Corporation. IDM is a trademark of Sharebase, Inc. Mermaid is a trademark of UNISYS Corporation. System 2000 is a trademark of Intel Corporation. Ingres/STAR and Ingres/Gateway are trademarks of Ingres Corporation (formerly Relational Technology, Inc.). SQL Server, SQL Toolset, Open Client, and Open Server are trademarks of Sybase, Inc.

3.1 ADDS (Amoco Production Company, Research)

3.1.1 Background on ADDS

The Amoco Distributed Database System (ADDS) [Breitbart and Tieman 1985; Breitbart et al. 1986] project began in late 1983, responding to the problem of integrating databases distributed throughout the corporation. Applications were being planned that required data from multiple sources. At the time, database products did not provide effective means for accessing or managing data from diverse systems. Therefore, the ADDS project was initiated to simplify distributed data access and management within Amoco.

3.1.2 ADDS System Characteristics

ADDS provides uniform access to preexisting heterogeneous distributed databases. The ADDS system is based on the relational data model and uses an extended relational algebra query language. A subset of the ANSI SQL language standard is also supported. In the terminology of [Sheth and Larson 1990], ADDS is a tightly coupled federated system supporting multiple


federated schemata. Local database schemata are mapped into multiple federated database schemata, called Composite DataBase (CDB) definitions. The mappings are stored in the ADDS data dictionary. The data dictionary is fully replicated at all ADDS sites to expedite query processing. A CDB is usually defined for each application. Multiple applications and users may, however, share CDB definitions. Users must be authorized to access specific CDBs and relational views that are defined against the CDBs.

The CDBs support the integration of the hierarchical, relational, and network data models. Local DBMSs currently supported include IMS, SQL/DS, DB2, RIM, INGRES, and FOCUS. Semantically equivalent data items from different local databases, as well as appropriate data conversion for the data items, may be defined.

The user interface consists of an Application Program Interface (API) and an interactive interface [Lee et al. 1988]. The API consists of a set of callable procedures that provide access to the ADDS system for application programs. Programs use the API to submit queries for execution, access the schema of retrieved data, and access retrieved data on a row-by-row basis. The API provides programmers with location and DBMS transparent access to distributed databases.

The interactive interface allows terminal users to execute queries, display the results of the queries, and save the retrieved data. The interactive interface is actually an application that uses the API to provide a high-level interface for ADDS. Free-form query submission is supported for experienced users, and menu-driven query submission is supported for those less experienced with the ADDS and SQL languages. Frequently used queries may be stored in the query catalog, and cataloged queries may be selected and modified by the user before execution.

Queries submitted for execution are compiled and optimized for minimal data transmission cost. Semijoins and common subquery elimination are just two of the query optimization techniques used. A user may submit any number of queries for simultaneous execution. ADDS allows a user to "disconnect" from the execution of a query, which is important for long-running queries. A failed query is automatically restarted, without loss of intermediate results, after the cause of the failure is determined and corrected. Also, query execution may be deferred to nonprime time, thereby decreasing execution costs.

The ADDS system includes geographically distributed mainframes running the VM and MVS operating systems and Sun and Apollo workstations running the UNIX operating system. Therefore, providing a uniform network interface to these systems is important for ADDS development and maintenance. The Network Interface Facility (NIFTY) architecture [Lee et al. 1988] is an extension of the OSI Reference Model [ISO 1982] and provides a uniform and reliable interface to computer systems that use different physical communication networks. An ADDS process on one system can initiate a session with an ADDS process on another system without regard for the multitude of heterogeneous network hardware and software that is used to accomplish the session. Currently, NIFTY supports interprocess communication using SNA, ethernet (TCP/IP), and binary synchronous networks.

ADDS maintains the autonomy of the local database systems and does not require any modifications to local DBMS software. The only communication between ADDS and the local DBMSs is in the form of query submission and data retrieval.

3.1.3 ADDS System Architecture

The layered architecture of the ADDS system is illustrated in Figure 1. Global transactions are application programs composed of one or more global database queries and/or updates. A single query may reference data at several sites. The Global Transaction Interface (GTI) verifies the syntactical correctness of user queries and constructs a global execution plan. The Global Data Manager (GDM) determines the location

[Figure 1 here: global transactions enter the ADDS layer through the Global Transaction Interface, which works with the Global Data Manager and Global Transaction Manager; local transactions are issued to the local DBMSs and their databases.]

Figure 1. ADDS architecture from [Breitbart et al. 19871. of the data referenced by a global transac- DBMSs. The servers also perform data tion from the CDB definition in the Data conversion as the local data is retrieved and Dictionary. The GDM also manages all transfer the data to the GTM for further intermediate data that is received from processing. the Global Transaction Manager (GTM) A “site graph” concurrency control algo- during transaction execution. The GTM rithm [Breitbart et al. 1987,1989a; Thomp- manages the execution of the global trans- son 19871 was used in early efforts to actions and allocates servers to process support update transactions in ADDS. This global subtransactions. general algorithm guarantees the serializ- The GTM uses a two-phase server allo- able execution of global transactions with cation strategy to guarantee the atomicity permitted local transactions and the ab- of global transactions. All servers remain sence of global deadlocks [Gligor and allocated to a global transaction until the Popescu-Zeletin 19851. In a very active sys- transaction completes. This implies that all tem, however, too many global transactions local data items referenced by a global are aborted. To reduce transaction aborts transaction remain locked until the trans- and increase throughput, ADDS was mod- action completes. Local transactions that ified to use a two-phase locking algorithm, have no data items in common with global with timeouts to guarantee freedom from transactions are unaffected. The GTMs on global deadlocks. A two-phase commit pro- different network nodes negotiate for tocol is used to write the results of the server resources when it is necessary for a global transactions into the local databases. transaction submitted at one node to access Although no longer used for concurrency data on other nodes. 
The servers perform control, the site graph is an important local security verification, then translate tool for guaranteeing global database con- the subqueries into the language of the local sistency during commit and recovery
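The locking discipline described above (every lock a global transaction acquires is held until the transaction completes, and lock waits time out to break cross-site deadlocks) can be sketched as follows. This is an illustrative reconstruction, not ADDS code; all class and method names are invented.

```python
import threading

class TimeoutLockManager:
    """Sketch of two-phase locking with timeouts in the ADDS spirit:
    a global transaction acquires all its locks (growing phase), holds
    them until completion, and aborts if any acquisition times out."""

    def __init__(self):
        self._locks = {}                 # data item name -> threading.Lock
        self._guard = threading.Lock()   # protects the lock table itself

    def _lock_for(self, item):
        with self._guard:
            return self._locks.setdefault(item, threading.Lock())

    def acquire_all(self, items, timeout=2.0):
        """Growing phase: take every lock the transaction needs.
        Returns the held locks, or None if a timeout forced an abort."""
        held = []
        for item in sorted(items):       # deterministic order
            lock = self._lock_for(item)
            if not lock.acquire(timeout=timeout):
                for h in held:           # abort: release what we took
                    h.release()
                return None
            held.append(lock)
        return held

    def release_all(self, held):
        """Shrinking phase: runs only when the transaction completes."""
        for lock in held:
            lock.release()
```

A timed-out transaction simply aborts and can be retried; the coordinator never has to detect deadlock cycles that span autonomous sites, which is the point of the timeout approach.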

3.1.4 ADDS Status and Future Plans

A production version of the ADDS system that supports retrieval transactions has been deployed within Amoco. Prototype support for distributed update transactions, including concurrency control and commit/recovery management, has been integrated into a centralized version of the system; that is, a version in which all global queries are submitted at a single site. Preparations are being made for the limited deployment of the ADDS update system prototype. Future plans include (1) developing a decentralized version of the transaction management, concurrency control, and commit/recovery algorithms and (2) building tools to simplify the CDB definition process and to maintain synchronization between the CDB definitions and the local database schemata.

Some of the lessons learned during ADDS development and deployment are as follows:

• Introducing distributed data management into a large organization can be a monumental problem.
• A flexible user interface design is necessary to meet diverse and changing user requirements.
• Adequate query optimization and data security are essential for system acceptance.
• Separating the network architecture from the ADDS architecture allowed concentration on important problem areas in both components individually.

3.2 DATAPLEX (General Motors Corporation)

3.2.1 Background on DATAPLEX

Many different kinds of database management systems and file systems are used in the manufacturing industry because of the diverse data management requirements. Historically, there has been no effective means to share these heterogeneous databases. The lack of effective data sharing causes inefficient engineering and manufacturing activities and business operations. Duplicated data at different locations often results in data inconsistency.

A heterogeneous distributed database system is an effective means of sharing data in an organization with diverse data systems. DATAPLEX is a heterogeneous distributed database management system being developed by General Motors Corporation [Chung 1990]. Sections 3.2.2 and 3.2.3 describe the functions and methodologies of DATAPLEX in its target full-function form. Section 3.2.4 describes its current implementation status.

3.2.2 DATAPLEX System Characteristics

DATAPLEX allows queries and transactions to retrieve and update distributed data managed by diverse data systems such that the location of data is transparent to requestors. In this environment, different data management systems can run on different operating systems that may be connected by different communication protocols.

The relational model of data is used as the global data model. Since different data models used by unlike database systems structure data differently, the data definition for each sharable database in the heterogeneous distributed database system is transformed to an equivalent relational data definition, or conceptual schema. The conceptual schema is implemented as a set of overlapping relational schemata, one for each location. The relations at each location represent data objects that need to be accessed by users at that location. Consequently, conceptual schemata are neither centralized nor replicated. Thus, in the terminology of [Sheth and Larson 1990], DATAPLEX is a tightly coupled federated system supporting multiple federated schemata.

Use of a common data model eases the problem of providing a uniform user interface. Among several relational query languages, SQL was chosen as the uniform user interface because SQL is widely used and an ANSI standard has been developed for it. Both interactive SQL queries and embedded SQL programs are supported.

3.2.3 DATAPLEX System Architecture

The above strategies establish the architecture of DATAPLEX. Figure 2 shows DATAPLEX and other elements in a heterogeneous distributed database system. The functions of DATAPLEX are performed by 14 major modules, described here.

The Controller module schedules the invocations of the rest of the modules and handles inputs and outputs of the modules.

The User Interface and Application Interface modules provide interfaces for queries to be entered into DATAPLEX. The User Interface appears to users as a command prompter or a form-oriented query facility. The Application Interface is linked to a compiled application before executing the application.

The Distributed Database Protocol (DDBP) module provides communications between the DATAPLEX software at user locations and data locations. Different communication protocols can be used by adapting the DDBP to them.

The SQL Parser module checks syntactic errors of SQL statements. The Distributed Query Decomposer and Distributed Query Optimizer modules prepare distributed queries for execution, with the aid of the Data Dictionary Manager module. The Translator and Local DBMS Interface modules provide interfaces to the local database systems for execution of local subqueries, and the Relational Operation Processor of the user-location DATAPLEX merges the results from the local sites to provide the final query result.

The Data Dictionary Manager finds the location of the data referenced by a query and determines the type of the query. There are three different types of queries: user-location query, remote single-location query, and distributed query. The user-location query and the remote single-location query are special cases of the distributed query.

To process a distributed retrieval query, the Distributed Query Decomposer decomposes the distributed query into a set of local queries and a user-location query that merges the results from other locations. (A local query references data from a single location, which may be a remote location.) The user-location (source) DATAPLEX sends local SQL queries to data-location (target) DATAPLEXs using the DDBP. The Translator finds query translation information from a translation table that records differences of data names and data structures between the conceptual schema and the local schema. The Translator translates a local SQL query to a query (or program) in a local data manipulation language (DML) using the translation information. The distributed query decomposition method and translation scheme used by DATAPLEX are described in a previous report [Chung 1987]. The Local DBMS Interface sends the translated query to the local DBMS and obtains the local result. The local result is in a report form similar to a relation regardless of the data structure used by a local DBMS.

The Distributed Query Optimizer of the source DATAPLEX schedules an optimal data reduction plan using the statistical information from the target DATAPLEX. The data reduction plan [Chung and Irani 1986] is a sequence of semijoins that consists of local data reduction operations and data moves among computers. Upon completion of the execution of the data reduction plan, the reduced local results are sent to the source DATAPLEX. There the Relational Operation Processor of the source DATAPLEX merges the local results by processing the user-location query.

To process a distributed update query, the Distributed Query Decomposer generates a set of local retrieval queries to identify the specific data to be updated, as well as a set of update queries, one for each relation to be updated.

The Distributed Transaction Coordinator enforces two-phase locking on referenced data at the local DBMSs that are involved. Although there are a number of global deadlock detection and avoidance methods for homogeneous systems [Elmagarmid 1986], there has not been much research on this topic for heterogeneous systems. Until more research results are available, the time-out method is used initially to handle global deadlocks.
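One step of the data reduction plan described above can be sketched as follows: ship only the projected join-column values from the source site, and filter the target site's relation with them before any full rows are moved for the final join. The helper and the sample relations are hypothetical, not DATAPLEX code; a real plan orders many such steps using the sites' statistics.

```python
def semijoin_reduce(source_rows, target_rows, join_attr):
    """One semijoin of a data reduction plan: only the projected
    join-column values travel between sites, and the target relation
    is reduced to the rows that can participate in the final join."""
    shipped = {row[join_attr] for row in source_rows}   # small projection moves
    return [row for row in target_rows if row[join_attr] in shipped]

# Site A holds the parts used at one plant; Site B holds supplier records.
site_a = [{"part": "P1"}, {"part": "P7"}]
site_b = [{"part": "P1", "supplier": "S1"},
          {"part": "P2", "supplier": "S2"},
          {"part": "P7", "supplier": "S3"}]

reduced = semijoin_reduce(site_a, site_b, "part")   # only matching rows remain
```

After the reduction, only the two matching supplier rows need to travel to the source site, which is the payoff the optimizer weighs against the extra round trip.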

[Figure 2. DATAPLEX in a heterogeneous distributed database system: a user's DATAPLEX, holding the conceptual schema, exchanges SQL queries and data over communication hardware and software with a remote DATAPLEX that holds the translation table and detailed information for its local data, in front of a local DBMS and its local database.]

After the specific data to be updated is identified by processing the retrieval part, the local update queries are processed, incorporating a two-phase commit to enforce the update atomicity, and then the locks on the referenced data are released. There are also Security Manager and Error Handler modules.

All modules of DATAPLEX are independent of the local data system except for the Translator and Local DBMS Interface modules. Thus, any data system can be interfaced to DATAPLEX by developing these two modules for it. This architecture is modular and is an open architecture with which functionality and performance can be gradually increased.
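The update sequence described above (identify the target rows by a retrieval pass, then run the local update queries under two-phase commit) relies on the standard prepare/commit handshake. A minimal sketch, with invented participant objects standing in for the local DBMS interfaces:

```python
def two_phase_commit(participants):
    """Minimal two-phase commit sketch: commit everywhere only if
    every participant votes yes in the prepare phase."""
    if all(p.prepare() for p in participants):   # phase 1: unanimous yes?
        for p in participants:
            p.commit()                           # phase 2: commit everywhere
        return True
    for p in participants:
        p.abort()                                # any no: abort everywhere
    return False

class Site:
    """Toy participant standing in for one local DBMS interface."""
    def __init__(self, can_prepare=True):
        self.can_prepare, self.state = can_prepare, "idle"
    def prepare(self):
        self.state = "prepared" if self.can_prepare else "failed"
        return self.can_prepare
    def commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"
```

The sketch omits logging and crash recovery, which a real coordinator needs so that a site that prepared but never heard the decision can learn the outcome.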

3.2.4 DATAPLEX Status and Future Plans

A prototype DATAPLEX was jointly developed in 1986 with a DBMS vendor to show the feasibility of the concepts underlying the DATAPLEX approach. The prototype system interfaces an IMS hierarchical DBMS running under the MVS operating system and an INGRES relational DBMS running on a VAX computer under the VMS operating system. The features the prototype system provides to users are as follows:

• SQL queries to IMS
• Distributed SQL queries to IMS and INGRES
• Distributed SQL queries embedded in a C language program

The data types supported between IBM and DEC computers are characters, text (variable length fields), integers, floating point numbers, and packed decimal numbers. In addition, the prototype system checks whether a user is authorized to access IMS data at a segment level using the user id.

Since rapid prototyping was required to show the feasibility of the concept before developing a full-function DATAPLEX, updates of IMS data and full query optimization were not implemented in the prototype system. In addition to these restrictions, the system supports only a subset of SQL defined to have the following syntax:

SELECT list of target attributes and set functions
FROM list of relations
WHERE qualifications
ORDER BY attributes

where set functions are MAX, MIN, SUM, COUNT, AVERAGE, and the qualification contains >, >=, <, <=, =, <>, AND, OR, NOT, and parentheses.

A testbed has been established at General Motors Research Laboratories, with a test distributed database and test transactions. Users formulate requests based on the relational view of the distributed database. The location and the type of the actual database are transparent to users. The system executes the requests.

Production IMS data have been used to test the effect of the size of the database on efficiency. The data is from the Maintenance Management Information System (MMIS) running at a car assembly plant. The MMIS database contains a few hundred thousand records. This is about 1000 times bigger than the test IMS database. The results of tests using the production database show the prototype system incurs some overhead compared with access using PL/I programs. As the database size grows, the fraction of the overhead to the total processing time decreases. It was observed that the SQL-to-DL/I Translator and IMS Interface modules were the bottleneck in accessing IMS data through the prototype DATAPLEX.

Based on the success of the prototype system, General Motors Corporation has initiated the development of a full-function DATAPLEX system, interfacing IMS, DB2, and INGRES, with outside vendors. The full-function system uses full SQL, and the initial version supports distributed retrieval and single-location update. Currently, most of the initial version has been implemented, and completed components are being tested using production applications and data. Subsequent versions will provide the capabilities of distributed update, multiple copy synchronization, and support for horizontal and vertical partitioning.

3.3 IMDAS (National Institute of Standards and Technology, U. Florida)

3.3.1 IMDAS Background: Sharing Data in a Manufacturing Complex

In modern manufacturing systems, two developments are paramount:

• Industrial Automation: computer systems controlling and monitoring the physical processes
• Computer Integrated Manufacturing (CIM): direct data sharing among production control systems and the engineering and administrative systems that support them

In most industrial facilities, control, engineering, and administrative systems operate on computer systems and database systems from different manufacturers. They contain independently designed, overlapping databases, with logical and physical differences in the representation of the same real-world objects. These existing systems represent a major investment and support real production. It is not feasible to replace or significantly redesign them.

The Integrated Manufacturing Data Administration System (IMDAS) [Barkmeyer et al. 1986; Krishnamurthy et al. 1987; Su et al. 1986] was developed to support a prototype CIM environment, the NBS Automated Manufacturing Research Facility (AMRF) [Nanzetta 1984], a testbed for small-batch manufacturing automation and in-process measurement. The objective was to provide access from many systems to the many sources of manufacturing data, cooperating with existing applications on existing databases while enabling new and modified application programs to access data as needed, insulated from accidental distinctions in location, representation, and access mechanisms.

3.3.2 IMDAS System Characteristics

In the terminology of [Sheth and Larson 1990], IMDAS is a tightly coupled federated system with a single global schema. The integrating data model is the Semantic Association Model (SAM*) [Su 1985], a semantic network data model capable of representing the complex structures and relationships and many integrity constraints found in a manufacturing enterprise. A fragmentation schema maps the global model to the underlying databases, supporting both horizontal and vertical partitioning of a given object class.

Existing database systems are front-ended by IMDAS modules supporting an internal query interchange form, which is an extended algebra on generalized relations corresponding to the modeled object classes, and a corresponding data interchange form, expressed in Abstract Syntax Notation 1 [ISO 1987a, 1987b]. This common interface is readily mapped onto underlying relational and navigational databases. A library of routines supporting it minimizes the effort involved in integrating new data systems and databases.

The user program phrases queries in an SQL-like language adapted to the model. The query is passed to the IMDAS in string form, rather than precompiled, to permit access by controllers programmed in nonstandard languages. This mechanism can support an interactive interface, although none has been built yet. IMDAS supports both distributed updates (transaction management) and distributed retrievals (query management). The fragmentation schema does not currently support replication, however, which is a significant limitation of the system.

3.3.3 IMDAS System Architecture

The architecture of IMDAS is shown in Figure 3. The lowest level of the architecture comprises the data repositories (databases, files, controller memories) managed by commercial DBMSs, file systems, homegrown application-specific servers, and so on. These are the existing data systems on which the IMDAS depends. Each computer system in the enterprise has a Basic Data Server (BDAS), which provides the interface between the local repository managers and the integrated data system. It contains the front-end processes that provide the standard interfaces for the local DBMSs. The BDASs and the DBMSs are the elements that execute the data manipulations.

The Distributed Data Servers (DDASs) perform the query processing and transaction management functions. Each DDAS provides the query interface to all application programs within a cluster of computer systems that is its segment of the enterprise and logically integrates the collection of data repositories managed by the BDASs in that cluster into a corresponding segment of the global database. The DDASs manage the data manipulations.

The Master Data Server (MDAS) is needed when there is more than one DDAS. It integrates the separately managed segments into the global database and manages transactions that cross segment (i.e., DDAS) boundaries. The MDAS does not "manage" the fully distributed system; it is rather a utility used by the distributed servers to resolve the global model and provide concurrency control for transactions that involve multiple DDASs.

The IMDAS modular architecture permits several "distributed data system architectures" to be built from the same components. A system with exactly one DDAS and one BDAS is essentially a centralized system, whereas a system with one DDAS and multiple BDASs is a distributed system with centralized control. A system with multiple DDASs is a distributed system with distributed control.

An application program issues a transaction to the IMDAS in string form in the data manipulation language. The DDAS query processor in that cluster converts the transaction, expanding application-specific views into standard operations on conceptual generalized relations. If the resulting query can be executed entirely within the DDAS segment, it is passed to the DDAS transaction manager. Otherwise, it is sent to the MDAS. In either case, the query processor reports final status to the user program when the transaction is completed.
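The routing rule just described (a transaction whose data all lie within the local DDAS segment stays with that DDAS's transaction manager; anything that crosses segment boundaries goes to the MDAS) reduces to a one-line test. The fragment records here are illustrative, not IMDAS structures:

```python
def route_transaction(fragments, local_segment):
    """Decide who manages a transaction: the local DDAS transaction
    manager if every referenced fragment is in its segment, else the
    MDAS, which coordinates across DDAS segments."""
    segments = {frag["segment"] for frag in fragments}
    return "DDAS" if segments <= {local_segment} else "MDAS"

# A query touching only shop-floor data stays local; one that also
# touches the engineering segment escalates to the MDAS.
local_only = [{"segment": "shop-floor"}]
cross_segment = [{"segment": "shop-floor"}, {"segment": "engineering"}]
```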

[Figure 3. The IMDAS hierarchy: an application program submits transactions to a Distributed Data Server (DDAS), whose query processor and transaction manager coordinate with the Master Data Server (MDAS) and dispatch subqueries to Basic Data Servers (BDASs), each translating for its local DBMS.]

The DDAS transaction manager, using a fragmentation schema describing the distribution of its segment of the global model, maps the query into a set of subqueries, each of which operates on elements of the global database managed by an individual DBMS. The mapping algorithm takes into account the capabilities of the target DBMS. Operations that exceed the capabilities of the repository DBMS are routed to a sufficiently capable DBMS, with temporary generalized relations specified to hold the data to be operated on. This is also the mechanism by which information units from multiple DBMSs are integrated.

The subqueries are then dispatched to the affected BDASs, using optimistic commitment in the case of update, because many of the DBMSs have no commitment features at all. That is, individual subtransactions commit independently on the assumption that all will commit. The transaction manager uses a locking mechanism to ensure that simultaneous read/write or write/write access to the same underlying "database" is avoided. In this context a "database" is a modeled collection of information, which is a (possibly proper) subset of the data managed by a single DBMS. Transactions that "conflict" at this level are serialized by the controlling transaction manager, thus effecting distributed concurrency control.

An affected BDAS receives subqueries from the DDAS in the interchange form, specifying the operations to be performed on the local data repositories and the sources and destinations of the associated data. The BDAS converts the subquery to the form appropriate to the designated DBMS and passes it to that DBMS, converting any data involved between the DBMS form and the interchange data form, and reports completion to the DDAS. The BDAS itself accesses any referenced local data that are not managed by a DBMS (files or shared memory [Libes 1985; Mitchell and Barkmeyer 1984]), converting between the user-specified representation and the IMDAS interchange form.
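Because subtransactions commit optimistically and cannot be rolled back, the concurrency scheme above hinges on serializing, in advance, any two transactions that touch the same modeled "database" granule. A minimal sketch of that admission test, with an invented interface:

```python
class GranuleSerializer:
    """Sketch of conflict serialization at "database" granularity:
    a transaction is admitted only if none of the granules it touches
    is currently in use; otherwise it waits for the holder to finish."""
    def __init__(self):
        self._busy = set()

    def try_start(self, granules):
        """Admit the transaction, or refuse it on any granule conflict."""
        if self._busy & set(granules):
            return False          # conflict: run after the holder finishes
        self._busy |= set(granules)
        return True

    def finish(self, granules):
        """Release the transaction's granules on completion."""
        self._busy -= set(granules)
```

This is coarser than item-level locking, but it is what makes optimistic, independent commits safe: two transactions that could disagree about the same data never run at the same time.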

The BDAS also accesses required remote data by communication with the remote BDAS, moving the data directly from producer to consumer without regard to the control path. (For example, if DDAS-1 specifies that data be sent from BDAS-1 to BDAS-2, the data would go directly from BDAS-1 to BDAS-2 without going through DDAS-1.)

The MDAS is essentially a DDAS transaction manager with a fragmentation schema that describes the distribution of the global model over the DDASs, instead of the DBMSs. It accepts transactions from and reports status to the individual DDAS query processors. It sends subqueries to and receives status reports from the individual DDAS transaction managers. Since the MDAS is a clone of the DDAS transaction manager, it can be instantiated in any station that has a DDAS and thus can readily be replaced in the event of failure.

3.3.4 IMDAS Status and Future Plans

IMDAS modules currently exist for VAX computers running under the VMS operating system and for Sun Workstations (which run under the UNIX operating system) using TCP/IP networks. IMDAS interfaces currently exist for RTI/Ingres [Ingres 1986] and BCS/RIM [Boeing Computer Services 1985] systems, for the object-oriented GBASE system [Le Noan 1988], for the AMRF Geometry Modeling System [Tu and Hopp 1987], and for several file systems and shared-memory systems. The current IMDAS is just over 100,000 lines of C and Pascal code, and it represents 15-20 staff years of effort.

A front end for DBMSs using SQL and IMDAS modules for the IBM PC are in development. Modifications to IMDAS to use OSI networks for both internal and external communication and the draft Remote Data Access protocol [ISO 1989] for the user-IMDAS interface are also underway.

Experience in mapping the IMDAS semantic model to different DBMSs and application databases indicates that the use of a semantic integrating model was a wise choice, but the SAM* model itself is not sufficiently flexible. Consideration is being given to replacement of SAM* with a more complete semantic network data model into which many common information models can be translated, for example, SDM, IDEF1X, NIAM, Express, and OSAM*. This implies reworking of the query language, the internal query interpretation, and the mapping onto application databases, each of which has its own strengths and weaknesses, and all of which are appropriate joint research efforts for the 1990s.

On the other hand, the separation of paths for control and data flow at the user interface and within the IMDAS, a clear departure from conventional wisdom, has proven itself to be both effective and efficient and has yielded a number of other capabilities, coding elegancies, and design dividends.

3.4 Ingres (Ingres Corporation)

3.4.1 Background on Ingres

Ingres Corporation grew out of the INGRES project at the University of California at Berkeley, a research project on relational database technology that began in the early 1970s [Stonebraker 1986]. The company was incorporated as Relational Technology, Inc., in 1980 and changed its name to Ingres Corporation in 1989. The first commercial Ingres database management systems were delivered to customers in 1981. Ingres products currently are available for a wide range of mainframes, minicomputers, workstations, and personal computers under a wide range of operating systems.

Ingres/NET, which provides remote access from an Ingres application at one site to an Ingres database at another site, was first introduced in 1983. Ingres/STAR, which provides transparent access to distributed data, was first introduced in late 1986.

3.4.2 Ingres/STAR System Characteristics

The Ingres DBMS provides access to an Ingres database, which is a named collection of tables. Ingres front-end programs submit SQL queries to the Ingres DBMS to obtain data stored in the database.

An Ingres Gateway provides a method whereby data stored in other (i.e., non-Ingres) data managers is made to appear as if it were stored in an Ingres database and thus is made available to Ingres front-end programs.

The Ingres/STAR system allows users to access a distributed database, which is defined as a collection of tables from one or more Ingres databases. Any set of tables from any set of Ingres databases can be combined to form a new, distributed Ingres/STAR database. This includes not only databases under an Ingres DBMS but also databases accessible via an Ingres/Gateway and, in the near future, other Ingres/STAR databases. A single Ingres/STAR server may service multiple distributed databases, and multiple Ingres/STAR servers may exist in the network. Thus, in the terminology of [Sheth and Larson 1990], Ingres/STAR is a tightly coupled federated system supporting multiple federated schemata.

Access to the Ingres/STAR distributed databases is transparent in the sense that once the database has been created, the users of the database no longer need to know anything about the existence of the individual Ingres databases that make up the distributed database. Their contents are now available transparently via Ingres/STAR. Ingres/STAR appears to front-end programs just as if it were a centralized Ingres DBMS. Front-end programs function in the same manner regardless of whether the database being accessed is distributed or not, except for the restriction (in the current release) that within a single transaction only inserts/deletes/updates to data at a single site are allowed. That is, a distributed commit protocol has not yet been implemented in the current release of Ingres/STAR.

Ingres/STAR itself does not deal directly with the physical storage and retrieval of data. Instead it relies upon the Ingres/DBMS and/or Ingres/Gateway components to do this. The Ingres/STAR component communicates with these Ingres data managers (either Ingres DBMSs or Gateways) in the same manner a front-end program would. The same information is communicated. Ingres/STAR sends a query language representation of the desired work, and the data manager replies with the requested data. Figure 4 illustrates a typical configuration of users, data managers, and Ingres/STAR servers.

3.4.3 Ingres/STAR System Architecture

As noted above, the Ingres/STAR system builds a distributed database from a number of underlying component databases. In order to provide long-term storage for information about this federation, Ingres/STAR uses a local database called the Coordinator Database (CDB). The CDB holds information on each distributed database concerning

• Which databases are used in the distributed database
• The location of these databases
• The data manager associated with each database
• What tables from each database are included in the distributed database
• Naming (aliasing) information about these tables

Every Ingres DBMS, whether in a centralized or in a distributed system, uses the Internal Ingres Database Database (IIDBDB) to determine the information (sites, disks, users, etc.) necessary to access local or distributed databases. Each site contains one IIDBDB, any number of databases, and some number of servers. Information about each Ingres/STAR distributed database would appear in the IIDBDBs at sites at which queries to the distributed database originate. An Ingres/STAR server must have access to the IIDBDB containing information about each component database of each distributed database that it manages.

Figure 4 depicts a conceptual picture of the various components and databases of an Ingres/STAR system. The functions of Ingres/STAR are provided by seven major modules, described below.

[Figure 4. Configuration of Ingres/STAR system components and databases: Ingres/STAR servers communicate with DBMS servers managing Ingres databases and with a Gateway server managing a heterogeneous database.]

The General Communication Facility (GCF) provides the intercommunication among instances of Ingres/STAR, Ingres DBMSs, and Ingres/Gateways.

The Transaction Processing Facility (TPF), a feature still under development, will be responsible for maintaining Ingres/STAR's transaction system. It will monitor the transaction states of the various Ingres/DBMS, Ingres/Gateway, and Ingres/STAR partners and keep track of the state of distributed transactions. Thus, TPF will know which partners need which instructions during the prepare, commit, and abort portions of a two-phase commit transaction. In the event of an Ingres/STAR crash, it will be TPF's responsibility to resolve any outstanding transactions when Ingres/STAR is running again. TPF will maintain its own log for recovery of distributed transactions. This log will be separate from any log maintained by the DBMSs for recovery of transactions at individual databases.

The Query Evaluation Facility (QEF) manages the actual execution of queries. It sends subqueries to the other participants in a session, manipulates the returned results as required, and returns the final results to the Ingres/STAR client.

The Remote Query Facility (RQF) receives instructions from either QEF or TPF, formats the instructions, sends them to other participants in the session (Ingres/STARs, Ingres/DBMSs, or Ingres/Gateways), and returns answers to the requestor.

The Relation Description Facility (RDF) provides efficient access to catalog information by retrieving it, caching it, and managing the cache.

The Parser Facility (PSF) parses the query and passes it on to the Optimizer Facility (OPF) in parsed form. OPF plans the method of performing the query. This process is more complex than in an Ingres DBMS because it must take into account the capabilities of the various data managers involved in executing the query (since some may be gateways), the amount of data that must be moved from one site to another, the network speed(s), and the query-processing facilities available at the site of the Ingres/STAR server itself.

3.4.4 Ingres/STAR Status and Future Plans

Gateways are currently available on a production basis for RMS files and RDB databases on VAX computers under the VMS operating system and for DB2 databases on IBM (or compatible) mainframes under the MVS operating system. Gateways are soon expected to be available on a production basis for SQL/DS databases and IMS databases.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Heterogeneous Distributed Database Systems l 255

expected to be available on a production basis for SQL/DS databases and IMS databases.

A near-term future release will provide support for a two-phase commit protocol, thereby implementing the distributed transaction management capabilities and allowing distributed updates with full distributed data consistency and recovery. Support is also planned for horizontal and vertical partitioning of tables and for replicated tables.

One of the biggest challenges in designing gateways has been in understanding the capabilities required of the local data managers for participation in distributed operations. A subset of SQL that is supported by all gateways has been designed, and this common SQL is used in the Ingres/STAR product. This common subset is expected to evolve over time.

Similarly, providing a unified view of the system catalogs and data types across a variety of data managers has been a difficult job. A set of standard catalog information has been defined that is available from all gateways. This catalog information is used by Ingres/STAR to obtain information about the local databases.

3.5 Mermaid (Data Integration, Inc.)

3.5.1 Background on Mermaid

Development of the Mermaid® system began at System Development Corporation (now a part of Unisys) in 1982 [Templeton et al. 1987a, 1987b]. The motivation for the project was the requirement in the Department of Defense (DOD) for accessing and integrating data stored in autonomous databases. DOD cannot standardize on a single type of hardware or DBMS and therefore must develop the capability to operate in a permanently heterogeneous environment. After the completion of Mermaid it became clear that this requirement was not unique to DOD. In fact, any large organization may have multiple autonomous databases. In 1989, the development team left Unisys to start Data Integration, Inc., which is continuing development of Mermaid as a commercial product.

® Mermaid is a trademark of Unisys Corporation.

3.5.2 Mermaid System Characteristics

In the terminology of [Sheth and Larson 1990], Mermaid is a tightly coupled federated system supporting multiple federated schemata. In a sense, Mermaid is not a database management system but rather a front-end system that locates and integrates data that are maintained by local DBMSs. Parts of the local databases may be shared with global users.

There are two parts to presenting a single database view to the user of the federated system. First, a federated view or schema of all or parts of the component databases must be defined. Second, at run time the system needs to translate from the federated schema into the form in which the data are actually stored.

The user is able to use a single query language, SQL, to access and integrate the data from the different databases. The system automatically locates the data, opens connections to the backend DBMSs, issues queries to the DBMSs in the appropriate query language, and integrates the data from multiple sources. Integration may require translation of the data into a standard data type, translation of the units, combination or division of fields, union of horizontal fragments, join of vertical fragments, and/or encoding values.

Several levels of heterogeneity are supported:

• Hardware
• Operating system of the DBMS host
• Network connection to the DBMS host
• DBMS type and query language
• Data model: relational, network, sequential file
• Database schema

Presently the system permits retrieval across databases and updates to a single database. A read transaction may see an inconsistent state of the database, since local updates may occur in the local databases during query execution.
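The integration steps listed above can be illustrated with a small sketch. This is not Mermaid code; the relation names, attributes, and the monthly-to-annual unit rule are invented for the example.

```python
# Illustrative sketch (not Mermaid code): integrating results from two
# hypothetical local databases into one federated relation.

def integrate(fragment_east, fragment_west, salary_attrs):
    """Union two horizontal fragments, then join a vertical fragment of
    salary attributes, translating units to the federated standard."""
    # Union of horizontal fragments: same schema, disjoint sets of rows.
    employees = fragment_east + fragment_west
    # Join of a vertical fragment keyed on employee id, with unit
    # translation: one source stores monthly salary, the standard is annual.
    salary_by_id = {row["id"]: row["monthly_salary"] * 12 for row in salary_attrs}
    return [dict(e, annual_salary=salary_by_id[e["id"]]) for e in employees]

east = [{"id": 1, "name": "Ada"}]
west = [{"id": 2, "name": "Grace"}]
salaries = [{"id": 1, "monthly_salary": 5000}, {"id": 2, "monthly_salary": 6000}]
print(integrate(east, west, salaries))
```

A real front end would perform the same reshaping on data fetched from separate backend DBMSs rather than on in-memory lists.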

Mermaid minimizes the window of inconsistency by making snapshots of all relations as a first step in processing. The snapshot also reduces relations using the query's selects and projects and translates the schema into the federated schema. Updates to replicated and fragmented relations may cross databases, but the updates to all databases are not necessarily made concurrently. Processing is done interactively, although an application program interface is being developed.

3.5.3 Mermaid System Architecture

As shown in Figure 5, Mermaid has four components: the User Interface, the server, the Data Dictionary/Directory (DD/D), and the DBMS Interfaces. Most of the Mermaid software resides in a server that exists on the same network as the user workstations and DBMSs. The DBMSs may reside on the workstations or on mainframe computers. The system is designed for flexibility and modularity. All components may reside on a single computer, or each may reside on a different computer. In a large system with many users there may be several copies of the components.

At least one Mermaid server must exist to provide a platform for the code. The server contains the optimizer that plans query processing and the controller that configures the system and controls execution. The server runs on a Sun Workstation, taking advantage of its support for many network protocols.

The User Interface includes code to authenticate users, initialize the system, edit queries, maintain a query library, get help with the system, view reports, and optionally manipulate the output returned through other programs. Support is provided for both workstations and dumb terminals and for both single window and multiple window interfaces.

The current Mermaid code provides several levels of access control that make it at least as secure as a commercial DBMS. The first level of access control is a list of users with permission to execute Mermaid from the workstation where the user is logged in. Next, the DD/D is opened and permissions are checked for the specific federated database or view. These permissions must reflect the permissions granted by the administrators for the underlying databases. Mermaid then logs into the local computers and opens the underlying databases.

The query optimizer does dynamic planning. It first locates all required relations and selects one or more copies of replicated relations and some or all fragments of relations. Each local relation is reduced with selects, projects, and joins to other relations at the same site. The size of each intermediate result is returned and used to plan the next step. The optimizer considers network speeds and relative processor speeds when determining the best way to process the query. If there are large relations to be joined, it may perform semijoins between sites before moving data to an assembly site. As much processing as possible is done in parallel.

Each Mermaid component uses the RPC remote procedure call protocol above Transmission Control Protocol/Internet Protocol (TCP/IP) to communicate with the other components, with the possible exception of code residing on the DBMS host. This allows the communication to be the same whether the processes are local or remote. All messages between DBMS Interfaces go through the server, which provides protocol conversion. For example, if one DBMS site uses LU6.2 and another uses TCP/IP, the server would receive a message from the DBMS interface (DBMSI) at the first site using the LU6.2 protocol and send it to the DBMSI at the second site using the TCP/IP protocol.

The Data Dictionary/Directory is a relational database, stored in a commercial DBMS, that contains information about the databases and environment. Figure 5 shows the case of the DD/D residing on the server. The DD/D may also reside on a different computer.

The DBMSs generally reside on different computers. Mermaid has an open architecture that will support the development of interfaces to many types of DBMS. The only requirement is that the data are managed by a DBMS that provides separation of the application from the data, retrieval of data elements by name (not file offsets), selection of records, an interactive query language, and concurrency control.
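The semijoin tactic used by the dynamic planner can be sketched as follows. The relations and attribute names are hypothetical; the point is only that shipping the join-column values of a small relation costs far less than shipping a large relation whole.

```python
# Hypothetical sketch of the semijoin reduction idea: before moving a large
# relation to the assembly site, ship only the join-column values of the
# smaller relation and filter the large one where it lives.

def semijoin_reduce(large_rel, small_rel, key):
    """Reduce large_rel to the tuples whose key value appears in small_rel."""
    keys = {row[key] for row in small_rel}          # only this set crosses the network
    return [row for row in large_rel if row[key] in keys]

orders = [{"cust": c, "amt": c * 10} for c in range(1000)]   # large relation at site A
customers = [{"cust": 3}, {"cust": 7}]                        # small relation at site B
reduced = semijoin_reduce(orders, customers, "cust")
print(len(reduced))
```

After the reduction, only two of the thousand tuples need to move to the assembly site; the observed intermediate size would then feed the planner's next decision.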


Figure 5. Mermaid system architecture.

There are three generic types of DBMS interface. A Class I DBMS is the favored type. The current INGRES, ORACLE, and IDM interfaces fall into this class. The DBMSI code can be put on the same computer as the DBMS and interfaced to the DBMS through a subroutine call. This provides the most efficient operation and the best error handling. A Class II DBMS either does not support standard network protocols or does not allow the Mermaid DBMS interface code to reside on the same computer as the DBMS. An example of this would be a database machine. The schema and language translators then reside on the server or on another front-end computer. A Class III DBMS is accessed using a terminal emulator interface such as a 3270 emulator from a front-end processor. An example of this would be a DBMS that only supports an interactive terminal interface. This type of interface can be difficult to develop and is weak in error handling.

Mermaid also has the capability to retrieve data from files. A file is a typed object with a retrieval method and a display method. The user selects a set of files of interest by searching on structured fields and listing structured fields and files in the target list. The report looks like a standard report from a relational system except that files are given a symbolic name. The user enters the symbolic name of one or more files he or she wants to see. The method (process) to retrieve the file is started on the computer where the file is stored, and the method (process) to display the file is started in a window on the user's workstation. New file types can be supported by writing the retrieve and display methods.

3.5.4 Mermaid Status and Future Plans

The run-time part of Mermaid was completed at Unisys. Work on the system was started in 1982, and the system was


completely recoded from 1986 to 1987. The total level of effort was 30-40 staff years.

Data Integration, Inc., was founded in April 1989 to continue the product development. The two major missing components in the system at that time were a DBMS-independent DD/D Builder Tool and a system administration utility. These are currently being completed, and professional documentation is being written. Additional DBMS interfaces are being developed under contract.

The biggest challenge faced by the developers of Mermaid has been error handling. Mermaid runs above many layers of DBMS, operating system, and network protocol. Each layer has many error conditions that may cause errors in other layers. There is no clear definition of all errors that can occur in each layer or of how other layers respond to errors. When an error does occur, it is difficult for the Mermaid code to understand the source of the error and potential cures. It is also difficult to describe some errors to the user.

Another major problem has been coping with new releases of the underlying software. Mermaid frequently encounters problems when new releases of the DBMS, operating system, or network are installed. This poses problems for support of a heterogeneous database system because release control can be difficult in such an environment, and testing of all combinations of underlying software is generally not possible.

3.6 MULTIBASE (Xerox Advanced Information Technology¹)

¹ Formerly the Advanced Information Technology Division of Computer Corporation of America.

3.6.1 Background on MULTIBASE

MULTIBASE [Landers and Rosenberg 1982] provides a uniform, integrated interface for retrieving data from preexisting, heterogeneous, distributed databases. It was designed to allow the user to reference data in these databases with one query language over one database description (schema). By presenting a globally integrated view of information, MULTIBASE allows the user to access data in multiple databases quickly and easily. Because there is an integrated schema and a single query language, the user has to be familiar with only one uniform interface instead of numerous local system interfaces.

The MULTIBASE project was initiated in 1980 to develop a software system that would enable organizations to achieve integrated data access without replacing existing databases. The system has three major design objectives. First, it is a general system that is not designed for any specific application area. It can be used without making changes to existing databases and does not interfere with existing application programs. Second, MULTIBASE has been designed to incorporate a wide range of data sources. These data sources encompass the major classes of DBMSs (hierarchical, network, relational) as well as file systems and custom-built DBMSs. Third, MULTIBASE has been designed to minimize the cost of adding a new data source. The system is largely description driven, and its modular architecture minimizes the amount of custom software that must be developed for each new DBMS.

3.6.2 MULTIBASE System Characteristics

MULTIBASE uses a data definition and manipulation language called DAPLEX, which is based on the functional data model [Landers et al. 1984; Shipman 1981]. Because the functional model is rich enough to represent relational, hierarchical, and network database schemata directly, the need to translate these schemata and their corresponding operations into a strictly relational model has been eliminated. The system supports ad hoc query access through several interactive interfaces and an Ada application program interface.

In the terminology of [Sheth and Larson 1990], MULTIBASE is a tightly coupled federated system that provides for the definition of multiple local schemata and multiple federated schemata, or views. Local schemata describe the data available at an individual local DBMS. Views describe integrations of the data described in local schemata.
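The functional-model idea behind DAPLEX can be suggested with a tiny sketch. This is not DAPLEX syntax; the student/department schema and the function names are invented, and DAPLEX's actual notation (e.g., Name(Dept(s))) should be taken from [Shipman 1981].

```python
# Illustrative sketch (not DAPLEX): the functional data model treats both
# attributes and relationships as functions applied to entities, so
# hierarchical and network structures map to function composition rather
# than to flattened relational tables.

dept_of = {"alice": "physics", "bob": "math"}      # function Dept: Student -> Dept
head_of = {"physics": "curie", "math": "noether"}  # function Head: Dept -> Instructor

def head_of_student(student):
    """Composition Head(Dept(s)), in the spirit of DAPLEX's Name(Dept(s))."""
    return head_of[dept_of[student]]

print(head_of_student("alice"))
```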

Users can query any combination of local schemata or views, and multiple schemata or views can be referenced in a single query. From the user's perspective, views provide complete location transparency. The MULTIBASE view definition language supports horizontal and vertical fragmentation and data mapping of the data contained in the individual local databases. MULTIBASE makes no restrictions on the queries that can be processed over views. The current version of the system does not, however, support updates.

Each MULTIBASE location maintains its own directory of local schemata and views.

The MULTIBASE view mechanism is also used to resolve data incompatibilities [Katz et al. 1981] that frequently arise when separately developed and maintained databases are accessed conjointly. Incompatibilities include (a) differences in naming conventions, underlying data structures, representations, or scale, (b) missing data, and (c) conflicting data values. When defining a view, the database administrator applies knowledge of the local databases to determine what incompatibilities might arise and what rules should be used to reconcile them. The rules are included in the view definition, after which they are followed automatically by the system in generating answers to queries.

MULTIBASE performs query optimization at both the global and local levels [Dayal et al. 1981, 1982]. At the global level, the system creates query execution strategies that attempt to minimize the amount of data moved between sites and to maximize the potential for parallel processing that is inherent when multiple distributed databases are accessed. At the local level, the system attempts to minimize the amount of time needed to retrieve data from a local DBMS by taking full advantage of the local DBMS query language, physical database organization, and fast access paths.

3.6.3 MULTIBASE System Architecture

As shown in Figure 6, a MULTIBASE system consists of three major types of components: the Global Data Manager (GDM), one or more Local Database Interfaces (LDIs), and the Internal DBMS.

The GDM is the central component of MULTIBASE, providing query manipulations that require global knowledge. These operations include translation of queries expressed over views into queries expressed over individual local databases, modification of queries to compensate for operations that cannot be performed at the individual local databases, optimization, and generation of plans for the execution of the user's query. All query manipulation in the GDM is performed in an internal form of DAPLEX. The GDM is responsible for management of the global directory, including views, local schemata, authorization information, and descriptions of local DBMS capabilities.

Local Database Interfaces are responsible for receiving queries that have been generated by the GDM, translating these queries from DAPLEX into a form that is acceptable to the local DBMS, and passing these queries to the local DBMS to be processed. The LDI receives the data from the local DBMS, translates it into MULTIBASE standard format, and returns it to the GDM. A MULTIBASE system can contain multiple LDIs.

An LDI need be developed only once for each new type of local DBMS. The GDM transmits to the LDI the local schemata for the LDI's databases. The local schema contains the information required by the LDI to generate queries against the individual local databases. In general, the GDM does not have to be modified to add a new local database or DBMS. The information about the LDI and local DBMS that is required by the GDM is stored by the GDM in the directory.

The Internal DBMS is a DAPLEX DBMS that is used by the GDM for any processing that cannot be performed at an individual local DBMS. This provides MULTIBASE users with the full power of the DAPLEX query language against any local DBMS. Any operation that cannot be performed by a local DBMS is performed transparently by the Internal DBMS using data that has been retrieved by the LDI under direction of the GDM.
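The reconciliation rules an administrator folds into a view definition might look, in spirit, like the following sketch. The rule table, attribute names, and the recency-based conflict rule are all hypothetical; MULTIBASE expresses such rules declaratively in its view definition language, not in Python.

```python
# Hypothetical sketch of view-definition reconciliation rules: rename
# attributes (naming differences), rescale values (scale differences), and
# resolve conflicting values between two local sources.

RULES = {
    "rename": {"emp_nm": "name", "sal": "salary"},   # naming conventions differ
    "scale": {"salary": 1000},                        # one source stores thousands
}

def apply_view_rules(local_row, other_row=None):
    row = {RULES["rename"].get(k, k): v for k, v in local_row.items()}
    for attr, factor in RULES["scale"].items():
        if attr in row:
            row[attr] *= factor
    # Conflicting values: a simple rule preferring the more recently
    # updated source (one of many possible reconciliation policies).
    if other_row and other_row.get("updated", 0) > row.get("updated", 0):
        row.update({k: v for k, v in other_row.items() if k != "updated"})
    return row

print(apply_view_rules({"emp_nm": "Ada", "sal": 90, "updated": 1}))
```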

Figure 6. MULTIBASE component architecture.

The Internal DBMS can also store auxiliary databases that are used to support resolution of data incompatibilities. For example, auxiliary databases can contain new data that are not available in any local databases, statistics used to determine which data values should be used in case of conflict, and conversion tables that provide a means of performing data transformations that cannot be done using simple formulas.

3.6.4 MULTIBASE Status and Future Plans

A MULTIBASE prototype system has been implemented in Ada. It supports the capabilities described above except that only a portion of the global optimization design has been implemented. The GDM software is about 350,000 Ada source lines. It executes on a VAX under the VMS operating system. LDIs have been developed for five DBMSs. They provide access to ORACLE and RIM systems executing under the VMS operating system and to FOCUS, System 2000, and DMR (a hierarchical DBMS developed by the U.S. Army) systems executing under the MVS operating system. The LDIs vary in size from 7000 to 20,000 Ada source lines. To minimize the cost of adding a new DBMS to MULTIBASE, a set of LDI building blocks has been created. These building blocks implement LDI processing that is common to all LDIs and greatly reduce the amount of new software required to create an LDI.

MULTIBASE is currently being used as a component of two systems. One system is a prototype for supporting design, manufacturing, and logistics of large mechanical systems. The other system is a pilot demonstration of the utility of distributed heterogeneous data management for providing integrated access to logistics databases. Both of these efforts require MULTIBASE to interface to existing databases that are not based on the relational model.

Several lessons have been learned during this project that will provide direction for future enhancements to MULTIBASE. Two of the most important of these are the difficulties in handling local system peculiarities and the need for automated tools to support the creation and maintenance of MULTIBASE schemata.

Although MULTIBASE has proved effective at supporting most of the data management capabilities of a wide range of DBMSs, each DBMS has had its own peculiarities (i.e., special data types, operations, or optimization heuristics) that have

turned out to be difficult to support. This has pointed out the need for a more extensible framework, possibly object oriented, that would allow new capabilities to be readily accommodated within the MULTIBASE system.

Experience has indicated that automated tools are needed for administering the dictionary of a system that is integrating databases from many different organizations. Tools are needed to assist both in creating MULTIBASE schemata and in maintaining consistency as changes occur to the local databases.

3.7 SYBASE (Sybase, Inc.)

3.7.1 Background on SYBASE

Sybase, Inc., was founded in 1984 with the goal of bringing a high-performance distributed RDBMS to the market. The initial production versions of the SYBASE SQL Server and the SQL Toolset (application development tools) were shipped in June 1987, and there are currently more than 1000 customers using the product on nearly 30 different hardware platforms. Using a client/server architecture, SYBASE was designed to handle on-line, transaction-oriented applications that require high performance, continuous availability, and data integrity that cannot be circumvented.

The need to integrate a variety of client applications with multiple sources of data is a clear requirement in today's commercial market. Hence, in September 1989 Sybase introduced the Open Server, a product that extends the SYBASE distributed capabilities to heterogeneous data sources. This product complements the Open Client, a client Application Programming Interface (API) used to send SQL or Remote Procedure Calls (RPCs) to an SQL Server. Together they form the Client/Server Interfaces, the basis of the SYBASE approach to heterogeneous distributed databases.

3.7.2 SYBASE System Characteristics

There are two broad categories of distributed databases: on-line and decision support. Decision support applications tend to read, but not update, remote data. They are mostly concerned with presenting a decision maker with a unified, single-system image of data that is distributed throughout the enterprise. On-line applications, by contrast, involve remote updates and have a strict requirement to maintain local site autonomy. Given the orientation of the SYBASE system to on-line applications, SYBASE is an interoperable system, or a "loosely coupled" federated system in the terminology of Sheth and Larson [1990]. SYBASE attempts to open the architecture as widely as possible to allow any database, application, or service to be integrated into the client/server architecture in a heterogeneous environment. No global data model or schema is enforced. Rather, distributed operations can be supported via application programming or via database-oriented RPCs between SQL Servers. This provides a high degree of site autonomy. At the same time, the SQL Server provides full DBMS support at each location and "prepare to commit" support for a two-phase commit protocol to guarantee recoverability for multisite updates [Gray 1978].

SYBASE is based on the relational model and supports both interactive and programmed access to the SQL Server or any Open Server application. The basic query language is SQL. Multiple SQL statements may, however, be augmented with programming constructs such as conditional logic (if, else, while, etc.), procedure calls and parameters, and local variables. These may be combined into a single database object called a stored procedure. A procedure is an independently protected object and (as in the case of a view) can override the protection of the tables it references. Thus, it is possible to grant execution privileges to a procedure but disallow direct access to the data it references. Procedures can return rows of data and error messages, and they can return values back into programming variables in the application program.

The SQL Server also supports triggers as independent objects in the database [Date 1983]. These have the capabilities of procedures, with three important extensions:

(1) They cannot be directly executed but rather are executed as a side effect of an SQL delete, insert, or update.
(2) A trigger is an extension of the user's current SQL statement. It can roll back or modify the results of a user's transaction.
(3) Triggers can view the data being changed.
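Trigger semantics of this general kind survive in modern SQL engines, so a minimal, runnable sketch is possible using Python's built-in sqlite3 module rather than SYBASE Transact-SQL; the account schema and the overdraft rule are invented for illustration.

```python
# Sketch of a trigger firing as a side effect of an UPDATE and enforcing an
# integrity rule inside the database (sqlite3 standing in for an SQL Server).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    -- Cannot be invoked directly; runs only as a side effect of UPDATE.
    CREATE TRIGGER no_overdraft BEFORE UPDATE ON account
    WHEN NEW.balance < 0
    BEGIN
        SELECT RAISE(ABORT, 'balance may not go negative');
    END;
""")
db.execute("INSERT INTO account VALUES (1, 100)")
try:
    db.execute("UPDATE account SET balance = -50 WHERE id = 1")
except sqlite3.DatabaseError as e:
    print("rejected:", e)
print(db.execute("SELECT balance FROM account").fetchone()[0])
```

The illegal update is rolled back by the database itself, with no reliance on the application program, which is precisely the database-enforced-integrity argument made below.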

In traditional centralized database systems, users of on-line applications are not given direct update access to a database but rather communicate with an application program that protects the database from the user. This common approach can be called "application-enforced integrity." The legality of any update is determined principally by rules enforced by the application program. Application-enforced integrity is, however, a flawed approach in heterogeneous distributed databases, where the application may be written in a different department or in a different city from the DBA whose database is being updated. A better alternative in a heterogeneous distributed database is to enforce data integrity within the database itself. Under this alternative, an application at a remote site communicates directly with a database that has sufficient richness of semantics to decide by itself whether the transaction violates any integrity rules. Stored procedures and triggers provide this capability.

The SYBASE Open Server provides a consistent method of receiving SQL requests or remote procedure calls from an application based on the SYBASE SQL Toolset or an application using the SYBASE Open Client Interface and passing them to a non-SYBASE database or application. Whereas Open Client gives users the flexibility to use a variety of front-end packages or applications for accessing and updating data, the SYBASE Open Server allows access to and updating of foreign (non-SYBASE) databases and applications.

At run time an application program issues a database RPC to the distributed database system, which consists of any combination of SQL Servers or Open Servers. If the data are stored in a non-SYBASE source, the Open Server provides the necessary data type and network conversions to allow the Open Client to process the returned data.

SYBASE supports distributed updates that span multiple locations. A two-phase commit protocol, coded in the application, enforces distributed transaction control across multiple SQL Servers.

3.7.3 SYBASE System Architecture

As shown in Figure 7, the SYBASE Open Server consists of two logical components. A server network interface manages the network connection and accepts requests from client programs running Open Client or database RPCs from an SQL Server. Event-driven server utilities in the Server-Library provide the logic to ensure that client requests are passed to the appropriate User Developed Handler and are completed properly. They also ensure that the returned data are correctly formatted for the client program. This enables a SYBASE client application to request processing and exchange information with any application or data management system. The SYBASE Network Interface component of the Open Server appears to the developer and user exactly as an SQL Server interface. It

• Supports multiple connections from multiple clients or SQL Servers
• Supports multiple logical connections on a single network connection to increase network efficiency
• Shields the user from knowledge of underlying networking
• Passes returned data a record at a time in exactly the same format as an SQL Server

The Server-Library enormously simplifies the coding of distributed multiuser server applications. The utilities supplied by SYBASE provide the logic to handle basic server events such as establishing or ending client connections and starting or stopping processes. SYBASE also provides utilities to handle SQL and database RPC requests.
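The application-coded two-phase commit mentioned earlier in this section can be sketched as follows. The Server class and its prepare/commit calls are hypothetical stand-ins for Open Client connections to real SQL Servers; the sketch shows only the voting logic the application must implement.

```python
# Hedged sketch of application-coded two-phase commit across several
# servers, each offering "prepare to commit" support.
class Server:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "active"
    def prepare(self):                 # phase 1: vote on "prepare to commit"
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit
    def commit(self):                  # phase 2: make the update durable
        self.state = "committed"
    def rollback(self):
        self.state = "aborted"

def two_phase_commit(servers):
    if all(s.prepare() for s in servers):   # phase 1: every site must vote yes
        for s in servers:                   # phase 2: commit at every site
            s.commit()
        return True
    for s in servers:                       # a single "no" vote aborts all sites
        s.rollback()
    return False

print(two_phase_commit([Server("ny"), Server("tulsa")]),
      two_phase_commit([Server("ny"), Server("la", can_commit=False)]))
```

A production protocol would also log each phase so that recovery after a crash can finish or undo the transaction, which is what "prepare to commit" support in the server makes possible.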


Figure 7. SYBASE open client/server architecture.
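The dispatch between Server-Library utilities and user-developed handlers shown in Figure 7 might be sketched like this; the event names, the dispatch table, and the handler logic are invented for illustration and do not reflect the actual Server-Library API.

```python
# Hypothetical sketch of an Open-Server-style event loop: supplied utilities
# handle connection events, while RPC requests are routed to a
# user-developed handler for a foreign data source.
def handle_connect(event):
    return "connection opened for " + event["client"]

def user_rpc_handler(event):
    # A real handler would forward the RPC to a non-SYBASE DBMS and
    # reformat its reply into SQL Server row format for the client.
    return "rows for rpc " + event["proc"]

DISPATCH = {"connect": handle_connect, "rpc": user_rpc_handler}

def serve(events):
    return [DISPATCH[e["type"]](e) for e in events]

print(serve([{"type": "connect", "client": "ui1"},
             {"type": "rpc", "proc": "get_parts"}]))
```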

These utilities are designed to be extended by user-developed handlers. The handlers provide the logic to transmit the RPC or SQL request to the target application or data management system in a predefined, predictable way. The user can also define additional utilities to handle events particular to an application or environment. The combination of SYBASE-supplied and user-developed utilities provides the flexibility to develop the appropriate server environment for any application or data source. Through Server-Library the developer can

• Manage the task queue
• Pass requests and parameters to foreign applications or data management systems through the user handler
• Process responses from user handlers (normal, error, and so on)
• Collect returned data or parameters from the user handler
• Convert returned data to a format compatible with the client application

3.7.4 SYBASE Status and Future Plans

The SYBASE Open Server and Release 4.0 of the SYBASE SQL Server, which provides a server-to-server RPC capability, have been commercially available since September 1989. They are in use at numerous customer sites performing on-line applications.

The total effort involved in the development of these products is hard to measure. It would be fair to say, however, that the development of the SQL Server took more than 30 staff years, and the Open Server and RPC enhancements to the SQL Server added approximately 3 staff years. This includes engineering effort only and does not include quality assurance, documentation, and so on. Future plans for the product include moving the two-phase commit protocol into the SQL Server and extending its capabilities to include heterogeneous systems. Access to distributed SQL Servers will be made transparent by means of synonyms and a distributed data dictionary.

4. SUMMARY

As can be seen from the above system descriptions, significant progress has been made in developing heterogeneous distributed database systems for production use. It is, however, certainly not yet possible to buy a system off the shelf that will link all of the popular data models and database management systems and provide full support for schema integration, distributed query management, and distributed transaction management.


Moreover, although the architectural principles are becoming well understood for building a custom system that provides distributed query management, the effort required actually to implement such a system is high. In addition, implementing heterogeneous distributed transaction management is not yet well understood.

There are commercially available systems that encompass a wide variety of types of computers, operating systems, and networks. Gateways are being developed from these systems to various other database management systems based on different data models. Some of these commercial systems offer distributed query management and some offer distributed transaction management, but none offer both, although that is expected to change in the near future. So far these systems offer only limited schema integration capabilities, without system support for horizontal or vertical fragmentation or replicated data, although this is also expected to change in the near future.

Custom systems have been built that encompass a variety of types of computers, operating systems, networks, database management systems, and data models. Some of these systems have rather sophisticated schema integration and distributed query management capabilities. They are, however, just beginning to develop distributed transaction management.

It is important to remember that this is a very fluid landscape. This paper was written in late 1989, and there will undoubtedly be advances between that time and the time it appears in print. There will undoubtedly be even more advances in the months to follow. Existing systems will improve their capabilities, and new systems and vendors will appear.

ACKNOWLEDGMENTS

The authors wish to express their appreciation to the editors and the anonymous referees for the many helpful suggestions that significantly improved this paper.

REFERENCES

BARKMEYER, E., MITCHELL, M., MIKKILINENI, K., SU, S. Y. W., AND LAM, H. 1986. An architecture for an integrated manufacturing data administration system. NBSIR 863312, National Bureau of Standards, Gaithersburg, Md.

BOEING COMPUTER SERVICES 1985. Boeing RIM User's Manual. Version 7.0, 20492-0502, Boeing Computer Services, Seattle, Wash.

BREITBART, Y. J., AND SILBERSCHATZ, A. 1988. Multidatabase systems with a decentralized concurrency control scheme. IEEE Comput. Soc. Distrib. Proc. Tech. Comm. Newsletter 10, 2 (Nov.), 35-41.

BREITBART, Y. J., AND TIEMAN, L. R. 1985. ADDS: Heterogeneous distributed database system. In Distributed Data Sharing Systems, F. Schreiber and W. Litwin, Eds. North Holland Publishing Co., The Netherlands, pp. 7-24.

BREITBART, Y. J., OLSON, P. L., AND THOMPSON, G. R. 1986. Database integration in a distributed heterogeneous database system. In Proceedings of the International Conference on Data Engineering (Los Angeles, Calif., Feb. 5-7). IEEE, Washington, D.C., pp. 301-310.

BREITBART, Y. J., SILBERSCHATZ, A., AND THOMPSON, G. R. 1987. An update mechanism for multidatabase systems. Data Eng. 10, 3 (Sept.), 12-18.

BREITBART, Y. J., SILBERSCHATZ, A., AND THOMPSON, G. R. 1989a. Transaction management in a multidatabase environment. In Integration of Information Systems: Bridging Heterogeneous Databases, A. Gupta, Ed. IEEE Press, New York, pp. 135-143.

BREITBART, Y. J., SILBERSCHATZ, A., AND THOMPSON, G. R. 1989b. Reliable transaction management in a multidatabase system. Submitted for publication.

CERI, S., AND PELAGATTI, G. 1984. Distributed Databases: Principles and Systems. McGraw-Hill, New York.

CHEN, A. L. P., BRILL, D., TEMPLETON, M. P., AND YU, C. T. 1989. Distributed query processing in a multiple database system. IEEE J. Select. Areas Commun. 7, 3 (Apr.), 390-398.

CHUNG, C. W. 1987. DATAPLEX: A heterogeneous distributed database management system. Research Publication GMR-5973, General Motors Research Laboratories (Sept.).

CHUNG, C. W. 1990. DATAPLEX: An access to heterogeneous distributed databases. Commun. ACM 33, 1 (Jan.), 70-80. (With corrigendum in Commun. ACM 33, 4 (Apr.), p. 459.)

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Heterogeneous Distributed Database Systems 265

CHUNG, C. W., AND IRANI, K. B. 1986. An optimi- distributed databases. Tech. Rep. CCA-81-06. zation of queries in distributed database systems. Computer Corporation of America (May). J. Parall. D&rib. Comput. 3, 2 (June), 137-157. KRISHNAMURTHY, V., SW, S. Y. W., LAM, H., MITCH- DATE, C. J. 1983. An Introduction to Database Sys- ELL, M., AND BARKMEYER, E. 1987. A distrib- tems. Vol. II. Addison-Wesley, Reading, Mass. uted database architecture for an integrated DAYAL, U., GOODMAN, N., LANDERS, T., OLSEN, K., manufacturing facility. In Proceedings of the Zn- SMITH, J. AND YEDWAB, L. 1981. Local query ternational Conference on Data and Knowledge Systems for Manufacturing and Engineering optimization in MULTIBASE: A system for het- (Oct.), Computer Society Press of the IEEE, pp. erogeneous distributed databases. Tech. Rep. CCA-81-11, Computer Corporation of America 4-13. (Oct.). LANDERS, T., AND ROSENBERG, R. 1982. An over- view of MULTIBASE. In Distributed Databases, DAYAL, U., LANDERS, T., AND YEDWAB, L. H.-J. Schneider. Ed. North Holland Publishing 1982. Global optimization techniques in MUL- Company, The fietherlands, pp. 153-184. - TIBASE. Tech. Rep. CCA-82-05, Computer Cor- poration of America (Oct.). LANDERS, T., Fox, S., RIES, D., AND ROSENBERG, R. 1984. DAPLEX User’s Manual. Tech. Rep. ELMAGARMID, A. K. 1986. A survey of distributed CCA-84-01, Computer Corporation of America. deadlock detection algorithms. SZGMOD Rec. 15, LE NOAN, Y. 1988. Object-oriented programming ex- 3 (Sept.), 37-45. - ploits AI. Comput. Technol. Reu. (Apr.). ESWARAN, K. P., GRAY, J. N., LORIE, R. A., AND LEE, W. F., OLSON, P. L., THOMAS, G. F., AND TRAIGER, I. L. 1976. The notions of consistency THOMPSON, G. R. 1988. A remote user interface and predicate locks in a database system. Com- for the ADDS multidatabase system. In Proceed- mum ACM 19, 11 (Nov.), 624-633. ings of the 2nd Oklahoma Workshop on Applied GLIGOR, V. D., AND POPESCU-ZELETIN, R. Computing (Tulsa, Okla. Mar. 
18), The Univer- 1985. Concurrency control issues in distributed sity of Tulsa, pp. 194-204. heterogeneous database management systems. LIBES, D. 1985. User-level shared variables. In Pro- Distributed Data Sharing Systems. F. Schreiber ceedings of the Summer 1985 USENZX Confer- and W. Litwin. Eds. North-Holland Publishing ence (PO&and, Oregon, June), The USEfiIX Co., The Netherlands, 43-56. Association, Berkeley, Calif. GRAY, J. 1978. Notes on data base operating systems. MITCHELL, M., AND BARKMEYER, E. 1984. Data dis- In Operating Systems: An Advanced Course, R. tribution in the NBS AMRF. In Proceedings of Bayer, R. M. Graham, and G. Seegmuller, Eds. the ZPAD ZZ Conference (Denver, Colo., April), Springer-Verlag, New York, pp. 393-481. NASA Conference Publication 2301, 211-227. INCRES 1986. Zngres. Relational Technology,-. Inc, NANZETTA, P. 1984. Update: NBS research facility Alameda, Calif. 94501 (Jan.). addresses problems in set-ups for small batch manufacturing. Znd. Eng. 16, 6 (June), 68-73. Is0 1982. ISO/TC97: Draft International Standard ZSO/DZS 7489: Information Processing System.- SHETH, A., AND LARSON, J. A. 1990. Federated da- Open Systems Interconnection-Basic Reference tabases: Architectures and integration. ACM Model. International Organization for Standard- Comput. Suru. 22, 4 (Dec.). ization. SHIPMAN, D. 1981. The functional data model and Is0 1987a. International Standard IS0 8824: Znfor- the data language DAPLEX. ACM Trans. Data- mation Processing Systems-Open Systems Znter- base Syst. 6, 1 (Mar.), 140-173. connection-Specification of Abstract Syntax STONEBRAKER, M., Ed. 1986. The ZNGRES Papers: Notation One (ASN.Z). International Organiza- Anatomy of a Relational Database System. tion for Standardization. Addison-Wesley, Reading, Mass. Is0 1987b. International Standard IS0 8825: Znfor- Su, S. Y. W. 1985. Modeling integrated manufactur- mation Processing Systems-Open Systems Znter- ing data using SAM*. 
In Proceedings of GZ Fach- connection-Basic &coding kules- for Abstract tagung, Karlsruhe (Mar.). (Reprinted as Syntax Notation One (ASN.1). International Or- Datenbank-Systeme fur Buro, Technik und ganization for Standardization. Wissenschaft. Springer-Verlag, New York.) Is0 1989. Draft proposed international standard Su, S. Y. W., LAM, H., KHATIB, M., KRISHNAMUR- 9579: Information processing systems-Open THY, V., KUM~R, A., MALIK, S. MITCHELL, M., systems interconnection-Generic remote data- AND BARKMEYER. E. 1986. The architecture base access service and protocol. M.A. Corfman, and prototype implementation of an integrated Ed., unpublished document ISO/IEC JTCl SC21 manufacturing database administration system. WG3 N845. In Proceedings of Spring COMPCON. KATZ, R., GOODMAN, N., LANDERS, T., SMITH, J., TEMPLETON, M., WARD P., AND LUND, E. AND YEDWAB, L. 1981. Database integration 1987b. Pra$matics of access control in Mer- and incompatible data handling in MULTI- maid. Q. Bull. IEEE Comput. SOC. Tech. Commit- BASE: A system for integrating heterogeneous tee Data Eng. 10, 3 (Sept.), 33-38.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 266 l Thomas et al.

TEMPLETON, M., BRILL, D., CHEN, A., DAO, S., TRAIGER, I. L., GRAY, J. N., GALTIERI, C. A.. AND LUND, E., MACGREGOR, R., WARD, P. LINDSAY, B. G. 1982. Transactions and cbnsis- 1987a. Mermaid: A front-end to distributed het- tency in distributed database management sys- erogeneous databases. Proc. IEEE 75, 5 (May, tems. ACM 2’ran.s. Database Syst. 7, 3 (Sept.), Special Issue on Distributed Database Systems), 323-342. 695-708. THOMPSON, G. R. 1987. Multidatabase concurrency Tu, J. S., AND HOPP, T. H. 1987. Part geometry control. Ph.D. dissertation, Oklahoma State Uni- data in the AMRF. NBSIR 87-3551, National versity. Bureau of Standards, Gaithersburg, Md.

Received November 1988; final revision accepted June 1990.

ACM Computing Surveys, Vol. 22, No. 3, September 1990