<<

Federated Systems for Managing Distributed, Heterogeneous, and Autonomous

AMIT P. SHETH Bellcore, lJ-210, 444 Hoes Lane, Piscataway, New Jersey 08854

JAMES A. LARSON Intel Corp., HF3-02, 5200 NE Elam Young Pkwy., Hillsboro, Oregon 97124

A federated database system (FDBS) is a collection of cooperating database systems that are autonomous and possibly heterogeneous. In this paper, we define a reference architecture for management systems from system and schema viewpoints and show how various FDBS architectures can be developed. We then define a methodology for developing one of the popular architectures of an FDBS. Finally, we discuss critical issues related to developing and operating an FDBS.

Categories and Subject Descriptors: D.2.1 [Software Engineering]: Requirements/ Specifications-methodologies; D.2.10 [Software Engineering]: Design; H.0 [Information Systems]: General; H.2.0 [Database Management]: General; H.2.1 [Database Management]: Logical Design--data models, schema and subs&ma; H.2.4 [Database Management]: Systems; H.2.5 [Database Management]: Heterogeneous Databases; H.2.7 [Database Management]: General Terms: Design, Management Additional Key Words and Phrases: Access control, database administrator, database design and integration, distributed DBMS, federated database system, heterogeneous DBMS, multidatabase language, negotiation, operation transformation, query processing and optimization, reference architecture, schema integration, schema translation, system evolution methodology, system/schema/processor architecture, transaction management

INTRODUCTION tern (DBMS), and one or more databases that it manages. A federated database sys- Federated Database System tem (FDBS) is a collection of cooperating A database system (DBS) consists of soft- but autonomous component database sys- ware, called a database management sys- tems (DBSs). The component DBSs are

’ The views and conclusions in this paper are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of Bellcore, Intel Corp., or the authors’ past or present affiliations. It is the policy of Bellcore to avoid any statements of comparative analysis or evaluation of vendors’ products. Any mention of products or vendors in this document is done where necessary for the sake of scientific accuracy and precision, or for background information to a point of technology analysis, or to provide an example of a technology for illustrative purposes and should not be construed as either positive or negative commentary on that product or that vendor. Neither the inclusion of a product or a vendor in this paper nor the omission of a product or a vendor should be interpreted as indicating a position or opinion of that product or vendor on the part of the author(s) or of Bellcore. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 0 1990 ACM 0360-0300/90/0900-0183 $01.50

ACM Computing Surveys, Vol. 22, No. 3, September 1990 184 l Amit Sheth and James Larson

CONTENTS or component DBMS, can be a centralized or distributed DBMS or another FDBMS. The component DBMSs can differ in such aspects as data models, query languages, INTRODUCTION and transaction management capabilities. Federated Database System One of the significant aspects of an Characteristics of Database Systems FDBS is that a component DBS can con- Taxonomy of Multi-DBMS and Federated tinue its local operations and at the same Database Systems Scope and Organization of this Paper time participate in a federation. The inte- 1. REFERENCE ARCHITECTURE gration of component DBSs may be man- 1.1 System Components of a Reference aged either by the users of the federation Architecture or by the administrator of the FDBS 1.2 Processor Types in the Reference together with the administrators of the Architecture 1.3 Schema Types in the Reference Architecture component DBSs. The amount of integra- 2. SPECIFIC FEDERATED DATABASE tion depends on the needs of federation SYSTEM ARCHITECTURES users and desires of the administrators 2.1 Loosely Coupled and Tightly Coupled FDBSs of the component DBSs to participate in 2.2 Alternative FDBS Architectures 2.3 Allocating Processors and Schemas the federation and share their databases. to Computers The term federated database system was 2.4 Case Studies coined by Hammer and McLeod [ 19791 and 3. FEDERATED DATABASE SYSTEM Heimbigner and McLeod [1985]. Since its EVOLUTION PROCESS introduction, the term has been used for 3.1 Methodology for Developing a Federated Database System several different but related DBS archi- 4. FEDERATED DATABASE SYSTEM tectures. As explained in this Introduc- DEVELOPMENT TASKS tion, we use the term in its broader con- 4.1 Schema Translation text and include additional architectural 4.2 Access Control 4.3 Negotiation alternatives as examples of the federated 4.4 Schema Integration architecture. 5. FEDERATED DATABASE SYSTEM The concept of federation exists in many OPERATION contexts. Consider two examples from the 5.1 Query Formulation political domain-the United Nations 5.2 Command Transformation 5.3 Query Processing and Optimization (UN) and the Soviet Union. Both entities 5.4 Global Transaction Management exhibit varying levels of autonomy and 6. FUTURE RESEARCH AND UNSOLVED heterogeneity among the components (sov- PROBLEMS ereign nations and the republics, respec- ACKNOWLEDGMENTS REFERENCES tively). The autonomy and heterogeneity is BIBLIOGRAPHY greater in the UN than in the Soviet Union. GLOSSARY The power of the federation body (the Gen- APPENDIX: Features of Some eral Assembly of the UN and the central FDBS/Multi-DBMS Efforts government of the Soviet Union, respec- tively) with respect to its components in the two cases is also different. Just as peo- ple do not agree on an ideal model or the integrated to various degrees. The software utility of a federation for the political that provides controlled and coordinated bodies and the governments, the database manipulation of the component DBSs is context has no single or ideal model of called a federated database management federation. A key characteristic of a feder- system (FDBMS) (see Figure 1). ation, however, is the cooperation among Both databases and DBMSs play impor- independent systems. In terms of an FDBS, tant roles in defining the architecture of an it is reflected by controlled and sometimes FDBS. Component database refers to a da- limited integration of autonomous DBSs. tabase of a component DBS. A component The goal of this survey is to discuss the DBS can participate in more than one fed- application of the federation concept for eration. The DBMS of a component DBS, managing existing heterogeneous and au-

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 185 FDBS FDBMS

. . .

Figure 1. An FDBS and its components.

tonomous DBSs. We describe various ar- tion management strategies) has been pro- chitectural alternatives and components of posed by Elmagarmid [1987]. Such a a federated database system and explore characterization is particularly relevant to the issues related to developing and oper- the study and development of transaction ating such a system. The survey assumes management in FDBMS, an aspect of an understanding of the concepts in basic FDBS that is beyond the scope of this database management textbooks [ Ceri and paper. Pelagatti 1984; Date 1986; Elmasri and Navathe 1989; Tsichritzis and Lochovsky Distribution 19821 such as data models, the ANSI/ SPARC schema architecture, database de- Data may be distributed among multiple sign, query processing and optimization, databases. These databases may be stored transaction management, and distributed on a single computer system or on multiple database management. computer systems, co-located or geograph- ically distributed but interconnected by a Characteristics of Database Systems communication system. Data may be dis- tributed among multiple databases in dif- Systems consisting of multiple DBSs, of ferent ways. These include, in relational which FDBSs are a specific type, may be terms, vertical and horizontal database par- characterized along three orthogonal di- titions. Multiple copies of some or all of the mensions: distribution, heterogeneity, and data may be maintained. These copies need autonomy. These dimensions are discussed not be identically structured. below with an intent to classify and define Benefits of data distribution, such as in- such systems. Another characterization creased availability and reliability as well based on the dimensions of the networking as improved access times, are well known environment [single DBS, many DBSs in a [Ceri and Pelagatti 19841. In a distributed local area network (LAN), many DBSs in DBMS, distribution of data may be in- a wide area network (WAN), many net- duced; that is, the data may be deliberately works], update related functions of partic- distributed to take advantage of these ben- ipating DBSs (e.g., no update, nonatomic efits. In the case of FDBS, much of the updates, atomic updates), and the types of data distribution is due to the existence of heterogeneity (e.g., data models, transac- multiple DBSs before an FDBS is built.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 186 l Amit Sheth and James Larson

derlying data model used to define data DatabaseSystems structures and constraints. Both represen- Differencesin DBMS tation (structure and constraints) and lan- -data models guage aspects can lead to heterogeneity. (structures,constraints, query languages) -system level support l Differences in structure: Different (concurrencycontrol, , recovery) data models provide different structural SemanticHeterogeneity primitives [e.g., the information modeled C using a () in the relational OperatingSystem 0 model may be modeled as a record type -file systems m in the CODASYL model]. If the two rep- -naming, file types, operations m U resentations have the same information -transaction support n content, it is easier to deal with the dif- -interprocess communication I C ferences in the structures. For example, Hardware/System a address can be represented as an entity -instruction set t in one schema and as a composite attri- -data formats8 representation I 0 bute in another schema. If the informa- -configuration n tion content is not the same, it may be very difficult to deal with the difference. Figure 2. Types of heterogeneities. As another example, some data models (notably semantic and object-oriented models) support generalization (and property inheritance) whereas others do not.

Many types of heterogeneity are due to l Differences in constraints: Two data technological differences, for example, dif- models may support different con- ferences in hardware, system software straints. For example, the set type in a (such as operating systems), and commu- CODASYL schema may be partially nication systems. Researchers and devel- modeled as a con- opers have been working on resolving such straint in a relational schema. CODA- heterogeneities for many years. Several SYL, however, supports insertion and commercial distributed DBMSs are avail- retention constraints that are not cap- able that run in heterogeneous hardware tured by the referential integrity con- and system software environments. straint alone. Triggers (or some other The types of heterogeneities in the da- mechanism) must be used in relational tabase systems can be divided into those systems to capture such semantics. due to the differences in DBMSs and those l Differences in query languages: due to the differences in the semantics of Different languages are used to manipu- data (see Figure 2). late data represented in different data models. Even when two DBMSs support the same data model, differences in their Heterogeneities due to Differences in DBMSs query languages (e.g., QUEL and SQL) An enterprise may have multiple DBMSs. or different versions of SQL supported Different organizations within the enter- by two relational DBMSs could contrib- prise may have different requirements and ute to heterogeneity. may select different DBMSs. DBMSs Differences in the system aspects of the purchased over a period of time may be DBMSs also lead to heterogeneity. Exam- different due to changes in technology. Het- ples of system level heterogeneity include erogeneities due to differences in DBMSs differences in transaction management result from differences in data models and primitives and techniques (including differences at the system level. These are , commit protocols, described below. Each DBMS has an un- and recovery), hardware and system

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 187 software requirements, and communication ams between 61 and 75, it may not be capabilities. possible to correlate it to a score in DB2.CLASS.SCORE because both 73 and 77 would have been represented by a score Semantic Heterogeneity of 7.5. Semantic heterogeneity occurs when there Detecting semantic heterogeneity is a is a disagreement about the meaning, inter- difficult problem. Typically, DBMS sche- pretation, or intended use of the same or mas do not provide enough semantics to related data. A recent panel on semantic interpret data consistently. Heterogeneity heterogeneity [Cercone et al. 19901 showed due to differences in data models also con- that this problem is poorly understood and tributes to the difficulty in identifica- that there is not even an agreement regard- tion and resolution of semantic hetero- ing a clear definition of the problem. Two geneity. It is also difficult to decouple examples to illustrate the semantic heter- the heterogeneity due to differences in ogeneity problem follow. DBMSs from those resulting from semantic Consider an attribute MEAL-COST of heterogeneity. relation RESTAURANT in database DBl that describes the average cost of a meal per person in a restaurant without service Autonomy charge and tax. Consider an attribute by the same name (MEAL-COST) of relation The organizational entities that manage BOARDING in database DB2 that de- different DBSs are often autonomous. In scribes the average cost of a meal per per- other words, DBSs are often under separate son including service charge and tax. Let and independent control. Those who con- both attributes have the same syntactic trol a database are often willing to let others properties. Attempting to compare at- share the data only if they retain control. tributes DBl.RESTAURANTS.MEAL- Thus, it is important to understand the COST and DBS.BOARDING.MEAL- aspects of component autonomy and how COST is misleading because they are they can be addressed when a component semantically heterogeneous. Here the DBS participates in an FDBS. heterogeneity is due to differences in A component DBS participating in an the definition (i.e., in the meaning) of FDBS may exhibit several types of auton- related attributes [Litwin and Abdellatif omy. A classification discussed by Veijalai- 19861. nen and Popescu-Zeletin [ 19881 includes As a second example, consider an attri- three types of autonomy: design, commu- nication, and execution. These and an ad- bute GRADE of relation COURSE in database DBl. Let COURSE.GRADE de- ditional type of component autonomy scribe the grade of a student from the set called association autonomy are discussed of values {A, B, C, D, FJ. Consider another below. attribute SCORE of relation CLASS in da- Design autonomy refers to the ability of tabase DB2. Let SCORE denote a normal- a component DBS to choose its own design ized score on the scale of 0 to 10 derived by with respect to any matter, including first dividing the weighted score of all ex- ams on the scale of 0 to 100 in the course (a) The data being managed (i.e., the Uni- and then rounding the result to the nearest verse of Discourse), half-point. DBl.COURSE.GRADE and (b) The representation (data model, query DBB.CLASS.SCORE are semantically het- language) and the naming of the data erogeneous. Here the heterogeneity is due elements, to different precision of the data values (c) The conceptualization or semantic taken by the related attributes. For exam- interpretation of the data (which ple, if grade C in DBl.COURSE.GRADE greatly contributes to the problem of corresponds to a weighted score of all ex- semantic heterogeneity),

ACM Computing Surveys, Vol. 22, No. 3, September 1990 188 l Amit Sheth and James Larson (d) Constraints (e.g., semantic integrity and resources (i.e., the data it manages) constraints and the serializability cri- with others. This includes the ability to teria) used to manage the data, associate or disassociate itself from the fed- (e) The functionality of the system (i.e., eration and the ability of a component DBS the operations supported by system), to participate in one or more federations. (f) The association and sharing with other Association autonomy may be treated as systems (see association autonomy be- a part of the design autonomy or as an low), and autonomy in its own right. Alonso and Barbara [1989] discuss the issues that are The implementation (e.g., record and k) relevant to this type of autonomy. file structures, concurrency control A subset of the above types of autonomy algorithms). were also identified by Heimbigner and Heterogeneity in an FDBS is primarily McLeod [1985]. Du et al. [1990] use the caused by design autonomy among compo- term local autonomy for the autonomy of a nent DBSs. component DBS. They define two types of The next two types of autonomy involve local autonomy requirements: operation the DBMS of a component DBS. Commu- autonomy requirements and service auton- nication autonomy refers to the ability of omy requirements. Operation autonomy re- a component DBMS to decide whether quirements relate to the ability of a to communicate with other component component DBS to exercise control over its DBMSs. A component DBMS with com- database. These include the requirements munication autonomy is able to decide related to design and execution autonomy. when and how it responds to a request from Service autonomy requirements relate to the another component DBMS. right of each component DBS to make de- Execution autonomy refers to the ability cisions regarding the services it provides to of a component DBMS to execute local other component DBSs. These include the operations (commands or transactions sub- requirements related to association and mitted directly by a local user of the com- communication autonomy. Garcia-Molina ponent DBMS) without interference from and Kogan [1988] provide a different clas- external operations (operations submitted sification of the types of autonomy. Their by other component DBMSs or FDBMSs) classification is particularly relevant to the and to decide the order in which to execute operating system and transaction manage- external operations. Thus, an external sys- ment issues. tem (e.g., FDBMS) cannot enforce an order The need to maintain the autonomy of of execution of the commands on a com- component DBSs and the need to share ponent DBMS with execution autonomy. data often present conflicting require- Execution autonomy implies that a com- ments. In many practical environments, it ponent DBMS can abort any operation that may not be desirable to support the auton- does not meet its local constraints and that omy of component DBSs fully. Two exam- its local operations are logically unaffected ples of relaxing the component autonomy by its participation in an FDBS. Further- follow: more, the component DBMS does not need to inform an external system of the order l Association autonomy requires that each in which external operations are executed component DBS be free to associate or and the order of an external operation with disassociate itself from the federation. respect to local operations. Operationally, This would require that the FDBS be a component DBMS exercises its execution designed so that its existence and opera- autonomy by treating external operations tion are not dependent on any single in the same way as local operations. component DBS. Although this may be a Association autonomy implies that a com- desirable design goal, the FDBS may ponent DBS has the ability to decide moderate it by requiring that the entry whether and how much to share its func- or departure of a component DBS must tionality (i.e., the operations it supports) be based on an agreement between the

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 189 federation (i.e., its representative entity Different architectures and types of such as the administrator of the FDBS) FDBSs are created by different levels of and the component DBS (i.e., the admin- integration of the component DBSs and by istrator of a component DBS) and cannot different levels of global (federation) serv- be a unilateral decision of the component ices. We will use the taxonomy shown in DBS. Figure 3 to compare the architectures of l Execution autonomy allows a component various research and development efforts. DBS to decide the order in which exter- This taxonomy focuses on the autonomy nal and local operations are performed. dimension. Other taxonomies are possible Futhermore, the component DBS need by focusing on the distribution and heter- not inform the external system (e.g., ogeneity dimensions. Some recent publica- FDBS) of this order. This latter aspect tions discussing various architectures or of autonomy may, however, be relaxed by different taxonomies include Eliassen and informing the FDBS of the order of Veijalainen [ 19881, Litwin and Zeroual transaction execution (or transaction [ 19881, Ozsu and Valduriez [ 19901, and wait-for graph) to allow simpler and Ram and Chastain [ 19891. more efficient management of global MDBSs can be classified into two types transactions. based on the autonomy of the component DBSs: nonfederated database systems and federated database systems. A nonfederated Taxonomy of Multi-DBMS and Federated database system is an integration of com- Database Systems ponent DBMSs that are not autonomous. A DBS may be either centralized or distrib- It has only one level of management,2 and uted. A centralized DBS system consists of all operations are performed uniformly. In a single centralized DBMS managing a sin- contrast to a federated database system, a gle database on the same computer system. nonfederated database system does not dis- A distributed DBS consists of a single dis- tinguish local and nonlocal users. A partic- tributed DBMS managing multiple data- ular type of nonfederated database system bases. The databases may reside on a single in which all databases are fully integrated computer system or on multiple computer to provide a single global (sometimes called systems that may differ in hardware, sys- enterprise or corporate) schema can be tem software, and communication support. called a unified MDBS. It logically appears A multidatabase system (MDBS) supports to its users like a distributed DBS. operations on multiple component DBSs. A federated database system consists of Each component DBS is managed by (per- component DBSs that are autonomous yet haps a different) component DBMS. A participate in a federation to allow partial component DBS in an MDBS may be cen- and controlled sharing of their data. Asso- tralized or distributed and may reside on ciation autonomy implies that the compo- the same computer or on multiple com- nent DBSs have control over the data they puters connected by a communication sub- manage. They cooperate to allow different system. An MDBS is called a homogeneous degrees of integration. There is no central- MDBS if the DBMSs of all component ized control in a federated architecture be- DBSs are the same; otherwise it is called a cause the component DBSs (and their heterogeneous MDBS. A system that only database administrators) control access to allows periodic, nontransaction-based ex- their data. change of data among multiple DBMSs FDBS represents a compromise between (e.g., EXTRACT [Hammer and Timmer- no integration (in which users must explic- man 19891) or one that only provides access itly interface with multiple autonomous da- to multiple DBMSs one at a time (e.g., no tabases) and total integration (in which joins across two databases) is not called an MDBS. The former is a data exchange sys- * This definition may be diluted to include two levels tem; the latter is a remote DBMS interface of management, where the global level has the author- [Sheth 1987a]. ity for controlling data sharing.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 190 l Amit Sheth and James Larson Multidatabase will consist of heterogeneous component Systems DBSs. In the rest of this paper, we will use the term FDBS to describe a heterogeneous distributed DBS with autonomy of compo- nent DBSs. Nonfederated Federated FDBSs can be categorized as loosely DatabaseSystems DatabaseSystems coupled or tightly coupled based on who e.g., UNIBASE /\ manages the federation and how the com- [Brzezinskiet 784 \ ponents are integrated. An FDBS is loosely coupled if it is the user’s responsibility to create and maintain the federation and Loosely Coupled Tightly Coupled there is no control enforced by the feder- e.g., MRDSM ated system and its administrators. Other [Litwin 19851 terms used for loosely coupled FDBSs are interoperable database system [Litwin and /\ Abdellatif 19861 and multidatabase system Single Multiple [Litwin et al. 1982].3 A federation is tightly Federation Fedsrations coupled if the federation and its adminis- e.g., DDTS e.g., Mermaid trator(s) have the responsibility for creat- [Dwyer and Larson19871 [Templetonet al. 1987a] ing and maintaining the federation and Figure 3. Taxonomy of multidatabase systems. actively control the access to component DBSs. Association autonomy dictates that, in both cases, sharing of any part of a autonomy of each component DBS is sac- component database or invoking a capabil- rificed so that users can access data through ity (i.e., an operation) of a component DBS a single global interface but cannot directly is controlled by the administrator of the access a DBMS as a local user). The fed- component DBS. erated architecture is well suited for mi- A federation is built by a selective and grating a set of autonomous and stand- controlled integration of its components. alone DBSs (i.e., DBSs that are not sharing The activity of developing an FDBS results data) to a system that allows partial and in creating a federated schema upon which controlled sharing of data without affecting operations (i.e., query and/or updates) are existing applications (and hence preserving performed. A loosely coupled FDBS always significant investment in existing applica- supports multiple federated schemas. A tion software). tightly coupled FDBS may have one or To allow controlled sharing while pre- more federated schemas. A tightly coupled serving the autonomy of component DBSs FDBS is said to have single federation if it and continued execution of existing appli- allows the creation and management of cations, an FDBS supports two types of only one federated schema.* Having a single operations: local and global (or federation). This dichotomy of local and global opera- 3 The term multidatabase has been used by different people to mean different things. For example, Litwin tions is an essential feature of an FDBS. [1985] and Rusinkiewicz et al. [1989] use the term Global operations involve data access using multidatabase to mean loosely coupled FDBS (or in- the FDBMS and may involve data managed teroperable system) in our taxonomy; Ellinghaus et al. by multiple component DBSs. Component [1988] and Veijalainen and Popescu-Zeletin [1988] use it to mean client-server type of FDBS in our taxon- DBSs must grant permission to access the omy; and Dayal and Hwang [1984], Belcastro et al. data they manage. Local operations are [1988], and Breitbart and Silberschatz [1988] use it to submitted to a component DBS directly. mean tightly coupled FDBS in our taxonomy. They involve only data in that component 4 Note that a tightly coupled FDBS with a single DBS. A component DBS, however, does not federated schema is not the same as a unified MDBS but is a special case of the latter. It espouses the need to distinguish between local and global federation concepts such as autonomy of component operations. In moSt environment% the DBMS~, dichotomy of operations, and controlled FDBS will also be heterogeneous, that is, sharing that a unified MDBS does not.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 191 federated schema helps in maintaining uni- A type of FDBS architecture called the formity in semantic interpretation of the client-server architecture has been dis- integrated data. A tightly coupled FDBS is cussed by Ge et al. [ 19871 and Eliassen and said to have multiple federations if it allows Veijalainen [1988]. In such a system, there the creation and management of multiple is an explicit contract between a client and federated schemas. Having multiple feder- one or more servers for exchanging infor- ated schemas readily allows multiple inte- mation through predefined transactions. A grations of component DBSs. Constraints client-server system typically does not al- involving multiple component DBS, how- low ad hoc transactions because the server ever, may be difficult to enforce. An orga- is designed to respond to a set of predefined nization wanting to exercise tight control requests. The schema architecture of a over the data (treated as a corporate re- client-server system is usually quite simple. source) and the enforcement of constraints The schema of each server is directly (including the so-called business rules) may mapped to the schema of the client. Thus choose to allow only one federated schema. the client-server architecture can be con- The terms federated database system and sidered to be a tightly coupled one for federated database architecture were intro- FDBS with multiple federations. duced by Heimbigner and McLeod [1985] to mean “collection of components to unite loosely coupled federation in order to share Scope and Organization of this Paper and exchange information” and “an orga- Issues involved in managing an FDBS deal nization model based on equal, autonomous with distribution, heterogeneity, and au- databases, with sharing controlled by ex- tonomy. Issues related to distribution have plicit interfaces.” The multidatabase archi- been addressed in past research and devel- tecture of Litwin et al. [1982] shares many opment efforts on distributed DBMSs. We features of the above architecture. These will concentrate on the issues of autonomy definitions include what we have defined as and heterogeneity. Recent surveys on the loosely coupled FDBSs. The key FDBS related topics include Barker and Ozsu concepts, however, are autonomy of com- [1988]; Litwin and Zeroual [1988]; Ram ponents, and partial and controlled sharing and Chastain [ 19891, and Siegel [1987]. of data. These can also be supported when The remainder of this paper is organized the components are tightly coupled. Hence as follows. In Section 1 we discuss a refer- we include both loosely and tightly coupled ence architecture for DBSs. Two types of FDBSs in our definition of FDBSs. system components-processors and sche- MRDSM [Litwin 19851, OMNIBASE mas-are particularly applicable to FDBSs. [Rusinkiewicz et al. 19891, and CALIDA In Section 2 we use the processors and [Jacobson et al. 19881 are examples of schemas to define various FDBS architec- loosely coupled FDBSs. In CALIDA, fed- tures. In Section 3 we discuss the phases in erated schemas are generated by a database an FDBS evolution process. We also dis- administrator rather than users as’in other cuss a methodology for developing a tightly loosely coupled FDBSs. Users must be rel- coupled FDBS with multiple federations. atively sophisticated in other loosely cou- In Section 4 we discuss four important pled FDBSs to be able to define schemas/ tasks in developing an FDBS: schema views over multiple component DBSs. translation, access control, negotiation, and SIRIUS-DELTA [Litwin et al. 19821 and schema integration. In Section 5 we discuss DDTS [Dwyer and Larson 19871 can be four tasks relevant to operating an FDBS: categorized as tightly coupled FDBSs with query formulation, command transforma- single federation. Mermaide [Templeton tion, query processing and optimization, et al. 1987131and Multibase [Landers and and transaction management. Section 6 Rosenberg 19821 are examples of tightly summarizes and discusses issues that need coupled FDBSs with multiple federations. further research and development. The paper ends with references, a comprehen- @Mermaid is a trademark of Unisys Corporation. sive bibliography, a glossary of the terms

ACM Computing Surveys, Vol. 22, No. 3, September 1990 192 l Amit Sheth and James Larson used throughout this paper, and an appen- structure descriptions) (e.g., table defi- dix comparing some features of relevant nitions in a ), and entity prototype efforts. types and relationship types in the entity-relationship model.

1. REFERENCE ARCHITECTURE l Mappings: Mappings are functions that correlate the schema objects in one A reference architecture is necessary to schema to the schema objects in another clarify the various issues and choices within schema. a DBS. Each component of the reference architecture deals with one of the impor- These basic components can be com- tant issues of a database system, federated bined in different ways to produce different or otherwise, and allows us to ignore details data management architectures. Figure 4 irrelevant to that issue. We can concentrate illustrates the iconic symbols used for each on a small number of issues at a time by of these basic components. The reasons for analyzing a single component. A reference choosing these components are as follows: architecture provides the framework in l Most centralized, distributed, and feder- which to understand, categorize, and com- ated database systems can be expressed pare different architectural options for de- using these basic components. veloping federated database systems. Section 1.1 discusses the basic system com- l These components hide many of the ponents of a reference architecture. Section implementation details that are not 1.2 discusses various types of processors relevant to understanding the im- and the operations they perform on com- portant differences among alternate mands and data. Section 1.3 discusses a architectures. schema architecture of a reference archi- Two basic components, processors and tecture. Other reference architectures de- schemas, play especially important roles scribed in the literature include Blakey in defining various architectures. The pro- [ 19871, Gligor and Luckenbaugh [ 19841, cessors are application-independent soft- and Larson [ 19891. ware modules of a DBMS. Schemas are application-specific components that de- 1.1 System Components of a Reference fine database contents and structure. They Architecture are developed by the organizations to which the users belong. Users of a DBS include A reference architecture consists of various both persons performing ad hoc operations system components. Basic types of system and application programs. components in our reference architecture are as follows: 1.2 Processor Types in the Reference Data: Data are the basic facts and in- Architecture formation managed by a DBS. Database: A database is a repository of Data management architectures differ in data structured according to a data the types of processors present and the model. relationships among those processors. There are four types of processors, each Commands: Commands are requests performing different functions on data ma- for specific actions that are either entered nipulation commands and accessed data: by a user or generated by a processor. transforming processors, filtering proces- Processors: Processors are software sors, constructing processors, and accessing modules that manipulate commands and processors. Each of the processor types is data. discussed below. Schemas: Schemas are descriptions of data managed by one or more DBMSs. A 1.2.1 Transforming Processor schema consists of schema objects and their interrelationships. Schema objects Transforming processors translate com- are typically class definitions (or data mands from one language, called source

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 193

Component Icon (with [Onuegbe et al. 1983; Zaniolo 19791, Example) allowing a CODASYL DBS to be proc- Type essed using SQL commands.

l A program generator that translates SQL commands into equivalent COBOL pro- Processor grams allowing a file system to be proc- essed using SQL commands. Command For some command-transforming pro- cessors, there may exist companion data- transforming processors that convert data produced by the transformed commands Data <--ii-> into data compatible with the commands in the source format. For example, a data- transforming processor that is the com- panion to the above SQL-to-CODASYL Schema command-transforming processor is a table builder that accepts individual database records produced by the CODASYL DBMS and builds complete tables for display to Information the SQL user. Mapping Figure 5(a) illustrates a pair of compan- ion transforming processors. Using infor- mation from schema A, schema B, and the mappings between them, the command- transforming processor converts com- Database mands expressed using schema A’s descrip- tion into commands expressed using schema B’s description. Using the same information, the companion data- transforming processor transforms data Figure 4. Basic system components of the data man- agement reference architecture. described using schema B’s description into data described using schema A’s description. To perform these transformations, a language, to another language, called target transforming processor needs mappings be- language, or transform data from one tween the objects of each schema. The task format (source format) to another format of schema translation involves transform- (target format). Transforming processors ing a schema (schema A) describing data in provide a type of data independence called one data model into an equivalent schema data model transparency in which the data (schema B) describing the same data in a structures and commands used by one pro- different data model. This task also gener- cessor are hidden from other processors. ates the mappings that correlate the Data model transparency hides the dif- schema objects in one schema (schema B) ferences in query languages and data for- to the schema objects in another schema mats. For example, the data structures (schema A). The task of command transfor- used by one processor can be modified to mation entails using these mappings to improve overall efficiency without requiring translate commands involving the schema changes to other processors. Examples of objects of one schema (schema B) into com- command-transforming processors include mands involving the schema objects of the the following: other schema (schema A). The schema l A command transformer that trans- translation problem and the command lates SQL commands into CODASYL transformation problem are further dis- data manipulation language commands cussed in Sections 4.1 and 5.2, respectively.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 194 . Amit Sheth and James Larson

CASchema B (b)

Figure5. Transforming processors. (a) A pair of companion transforming processors. (b) An abstract transforming processor.

Mappings are associated with a trans- command-transforming processor and data forming processor in one of two ways. In converter pair. the first case, the mappings are encoded into the transforming processor’s logic, making the transforming processor specific 1.2.2 Filtering Processor to the schemas. Alternatively, the map- Filtering processors constrain the com- pings are stored in a separate data structure mands and associated data that can be and accessed by the transforming processor passed to another processor. Associated when converting commands and data. This with each filtering processor are mappings is a more general approach. It may also be that describe the constraints on commands possible to generate a transforming proces- and data. These constraints may either be sor for transforming specific commands embedded into the code of the filtering or data automatically. For example, an processor or be specified in a separate data SQL-to-COBOL program generator might structure. Examples of filtering processors generate a specific data-transforming pro- include the following: cessor, the generated COBOL program, that converts data to the required form. Syntactic constraint checker, which For the remainder of this paper we will checks commands to verify that they are illustrate a command-transforming proces- syntactically correct. sor and data converter pair as a single Semantic integrity constraint checker, transforming processor as illustrated in which performs one or more of the follow- Figure 4(b). This higher-level abstraction ing functions: (a) checks commands to enables us to hide the differences between verify that they will not violate semantic a single data-transforming processor, a sin- integrity constraints, (b) modifies com- gle command-transforming processor, or a mands in such a manner that when the

ACM Computing Surveys, Vol. 22, No. 3, September 1990 CommandFiltering I Processor

the Data Structures

(4

(b)

Figure 6. Filtering processors. (a) A pair of companion filtering processors. (b) An abstract filtering processor.

commands are interpreted, semantic in- than one way to translate an update com- tegrity constraints will automatically be mand. We do not discuss the update enforced, or (c) verifies that data pro- task in more detail because we feel that a duced by another processor does not vi- loosely coupled FDBS is not well suited to olate any semantic integrity constraint. support updates, and solving this problem l Access controller, which verifies that the in a tightly coupled FDBS is very similar user is permitted to perform the com- to solving it in a centralized or distributed mand on the indicated data or verifies DBMS [Sheth et al. 1988a]. that the user is permitted to use data produced by another processor. 1.2.3 Constructing Processor Figure 6(a) illustrates two filtering pro- Constructing processors and/or cessors, one that controls commands and replicate an operation submitted by a single one that controls data. Again, we will ab- processor into operations that are accepted stract command- and data-filtering proces- by two or more other processors. Construct- sors into a single filtering processor as ing processors also merge data produced by illustrated in Figure 6(b). several processors into a single data set for An important task that may be solved by consumption by another single processor. a filtering processor is that of view update. They can support location, distribution, This task occurs when the differences in and replication transparencies because a data structures between the view and the processor submitting a command does not schema is such that there may be more need to know the location, distribution, and

ACM Computing Surveys, Vol. 22, No. 3, September 1990 196 . Amit Sheth and James Larson

iYGzA /Data Exoressed\

(2Schema A

(b) Figure 7. Constructing processors. (a) A pair of constructing processors. (b) An abstract constructing processor. number of processors participating in pro- of companion constructing processors. Us- cessing that command. ing information from schema A, schema B, Tasks that can be handled by construct- schema C, and the mappings from schema ing processors include the following: A to schemas B and C, the command de- composer uses the commands expressed us- Schema integration: Integrating mul- ing the schema A objects to generate the tiple schemes into a single schema commands using the objects in schemas B Negotiation: Determining what proto- and C. Schema A is an integrated schema col should be used among the owners of that contains a description of all or parts various schemas to be integrated in de- of the data described by schemas B and C. termining the contents of an integrated Using the same information, the data schema merger generates data in the format of Query (command) decomposition schema A objects from data in the formats and optimization: Decomposing and of the objects in schemas B and C. optimizing a query (command) expressed Again we will abstract the command par- on an integrated schema titioner and data merger pair into a single constructing processor as illustrated in Global transaction management: Performing the concurrency and atomic- Figure 7(b). ity control 1.2.4 Accessing Processor These issues are further discussed in Sec- An accessing processor accepts commands tions 4 and 5. Figure 7(a) illustrates a pair and produces data by executing the

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l commands against a database. It may ac- cept commands from several processors and interleave the processing of those com- mands. Examples of accessing processors include the following: l A file management system that executes access procedures against stored file l A special application program that ac- cepts commands and generates data to be returned to the processor generating the Figure 8. Accessing processor. commands l A data manager of a DBMS containing data access methods those structures. It is an attempt to de- l A dictionary manager that manages ac- scribe all data of interest to an enterprise. cess to dictionary data In the context of the ANSI/X3/SPARC architecture, it is a as Figure 8 illustrates an accessing processor expressed in the data definition language that accepts data manipulation commands of a centralized DBMS. The internal and uses access methods to retrieve data schema describes physical characteristics of from the database. the logical data structures in the conceptual Issues that are addressed by accessing schema. These characteristics include in- processors include local concurrency con- formation about the placement of records trol, commitment, backup, and recovery. on physical storage devices, the placement These problems and their solutions are ex- and type of indexes and physical represen- tensively discussed in the literature for cen- tation of relationships between logical rec- tralized and distributed DBMSs. Some of ords. Much of the description in the the issues of adapting these problems to internal schema can be changed without deal with heterogeneity and autonomy in having to change the conceptual schema. the FDBSs are discussed in Section 5.4. By making changes to the description in the internal schema and making the cor- 1.3 Schema Types in the Reference responding changes to the data in the da- Architecture tabase, it is possible to change the physical In this section, we first review the standard representation without changing any appli- three-level schema architecture for central- cation program source code. Thus it is ized DBMSs. We then extend it to a five- possible to fine tune the physical represen- level architecture that addresses the tation of data and optimize the perfor- requirements of dealing with distribution, mance of the DBMS in providing database autonomy, and heterogeneity in an FDBS. access for selected applications. Most users do not require access to all of the data in a database. Thus they do not 1.3.1 ANSIISPARC Three-Level Schema require access to all of the schema objects Architecture in the conceptual schema. Each user or The ANSI/X3/SPARC Study Group on class of users may require access to only a Database Systems outlined a three-level portion of the database. The subset of the data description architecture [Tsichritzis database that may be accessed by a user or and Klug 19781. The three levels of data a class of users is described by an external description are the conceptual schema, the schema. Because different users may need internal schema, and the external schema. access to different portions of the database, A conceptual schema describes the con- each user or a class of users may require a ceptual or logical data structures (i.e., the separate external schema. schema consists of objects that provide a In terms of the above constructs, filtering conceptual- or logical-level description of processors use the information in the ex- the database) and the relationships among ternal schemas to control what data can be

ACM Computing Surveys, Vol. 22, No. 3, September 1990 198 . Amit Sheth and James Larson

Filtering Processor n

Transforming Processor

Internal

mAccessing Processor

Figure 9. System architecture of a centralized DBMS. accessed by which users. A transforming chitecture for federated systems shown in processor translates commands expressed Figure 10. A system architecture consisting using the conceptual schema objects into of both processors and schemas of an FDBS commands using the internal schema ob- is shown in Figure 11. jects. An accessing processor executes the The five-level schema architecture of an commands to retrieve data from the phys- FDBS includes the following: ical media. A system architecture consist- ing of both processors and schemas of a Local Schema: A local schema is the con- centralized DBS is shown in Figure 9. ceptual schema of a component DBS. A local schema is expressed in the native data model of the component DBMS, and hence 1.3.2 A Five-Level Schema Architecture for different local schemas may be expressed Federated Databases in different data models. The three-level schema architecture is ad- Component Schema: A component equate for describing the architecture of a schema is derived by translating local sche- centralized DBMS. It, however, is inade- mas into a data model called the canonical quate for describing the architecture of an or common data model (CDM) of the FDBS. FDBS. The three-level schema must be ex- Two reasons for defining component sche- tended to support the three dimensions of mas in a CDM are (1) they describe the a federated database system-distribution, divergent local schemas using a single rep- heterogeneity, and autonomy. Examples of resentation and (2) semantics that are extended schema architectures include a missing in a local schema can be added to four-level schema architecture in Mermaid its component schema. Thus they facilitate [Templeton et al. 1987131,five-level schema negotiation and integration tasks per- architectures in DDTS [Devor et al. 1982b] formed when developing a tightly coupled and SIRIUS-DELTA [Litwin et al. 19821, FDBS. Similarly, they facilitate negotia- and others [Blakey 1987; Ram and tion and specification of views and multi- Chastain 19891. We have adapted these database queries in a loosely coupled architectures for our five-level schema ar- FDBS.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems . 199

I Local bb b Schema

Figure 10. Five-level schema architecture of an FDBS.

I onstructinq Processor onstructing Processor onstructina Processor

Filtering Processor Filtering Processor Filtering Processor (F) (F) (Campon:nt) Transforming Processor Transforming ProcessorI)’ Figure 11. System architecture for an FDBS.

The process of schema translation from schema objects. Transforming processors a local schema to a component schema use these mappings to transform com- generates the mappings between the com- mands on a component schema into com- ponent schema objects and the local mands on the corresponding local schema.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 200 l Amit Sheth and James Larson

Such transforming processors and the com- similar to that of federated schema is rep- ponent schemas support the heterogeneity resented by the terms import schema feature of an FDBS. [Heimbigner and McLeod 19851, global schema [Landers and Rosenberg 1982J, Export Schema: Not all data of a com- global conceptual schema [Litwin et al. ponent DBS may be available to the fed- 19821, unified schema, and enterprise eration and its users. An export schema schema, although the terms other than im- represents a subset of a component schema port schemas are usually used when there that is available to the FDBS. It may in- is only one such schema in the system. clude access control information regarding its use by specific federation users. The External Schema: An external schema purpose of defining export schemas is to defines a schema for a user and/or appli- facilitate control and management of asso- cation or a class of users/applications. Rea- ciation autonomy. A filtering processor can sons for the use of external schemas are as be used to provide the access control as follows: specified in an export schema by limiting the set of allowable operations that can be l Customization: A federated schema submitted on the corresponding component can be quite large, complex, and difficult schema. Such filtering processors and the to change. An external schema can be export schemas support the autonomy fea- used to specify a subset of information in ture of an FDBS. a federated schema that is relevant to the Alternatively, the data available to the users of the external schema. They can FDBS can be defined as the transactions be changed more readily to meet chang- that can be executed by a component DBS ing users’ needs. The data model for an (e.g., [Ge et al. 1987; Heimbigner and external schema may be different than McLeod 1985; Veijalainen and Popescu- that of the federated schema. Zeletin 19881). In this paper, however, we Additional integrity constraints: will not consider that case of exporting Additional integrity constraints can also transactions. be specified in the external schema. Access control: Export schemas pro- Federated Schema: A federated schema vide access control with respect to the is an integration of multiple export sche- data managed by the component data- mas. A federated schema also includes the bases. Similarly, external schemas pro- information on data distribution that is vide access control with respect to the generated when integrating export sche- data managed by the FDBS. mas. Some systems use a separate schema called a distribution schema or an allocation A filtering process analyzes the com- schema to contain this information. A con- mands on an external schema to ensure structing processor transforms commands their conformance with access control and on the federated schema into the com- integrity constraints of the federated mands on one or more export schemas. schema. If an external schema is in a dif- Constructing processors and the federated ferent data model from that of the federated schemas support the distribution feature of schema, a transforming processor is also an FDBS. needed to transform commands on the ex- There may be multiple federated sche- ternal schema into commands on the fed- mas in an FDBS, one for each class of erated schema. federation users. A class of federation users Most existing prototype FDBSs support is a group of users and/or applications per- only one data model for all the external forming a related set of activities. For ex- schemas and one interface. ample, in a corporate environment, all Exceptions are a version of Mermaid that managers may be one class of federation supported two query language interfaces, users, and all employees and applications SQL and ARIEL, and a version of DDTS in the accounting department may be an- that supported SQL and GORDAS (a other class of federation users. A concept query language for an extended ER model).

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems 201

Future systems are likely to provide ing local schema. The additional semantics more support for multimode1 external are supplied by the FDBS developer during schemas and multiquery language interfaces the schema design, integration, and trans- [Cardenas 1987; Kim 19891. lation processes. The five-level schema architecture Besides adding to the levels in the presented above has several possible schema architecture, heterogeneity and au- redundancies. tonomy requirements may also dictate changes in the content of a schema. For Redundancy between external and example, if an FDBS has multiple hetero- federated schemas: External schemas geneous DBMSs providing different data can be considered redundant with feder- management capabilities, a component ated schemas since a federated schema schema should contain information on the could be generated for every different operations supported by a component federation user. This is the case in the DBMS. schema architecture of Heimbigner and An FDBS may be required to support McLeod [ 19851 (they use the term import local and external schemas expressed in schema rather than federated schema). In different data models. To facilitate their loosely coupled FDBSs, a user defines the design, integration, and maintenance, how- federated schema by integrating export ever, all component, export, and federated schemas. Thus there is usually no need schemas should be in the same data model. for an additional level. In tightly coupled This data model is called canonical or com- FDBSs, however, it may be desirable to mon data model (CDM). A language asso- generate a few federated schemas for ciated with the CDM is called an internal widely different classes of users and to command language. All commands on fed- customize these further by defining ex- erated, export, and component schemas are ternal schemas. Such external schemas expressed using this internal command can also provide additional access language. control. Database design and integration is a Redundancy between an external complex process involving not only the schema of a component DBS and an structure of the data stored in the databases export schema: If a component DBMS but also the semantics (i.e., the meaning supports proper access control security and use) of the data. Thus it is desirable to features for its external schemas and if use a high-level, semantic data model [Hull translating a local schema into a compo- and King 1987; Peckham and Maryanski nent schema is not required (e.g., the data 19881 for the CDM. Using concepts from model of the component DBMS is the object-oriented programming along with a same as CDM of the FDBS), then the semantic data model may also be appropri- external schemas of a component DBS ate for use as a CDM [Kaul et al. 19901. may be used as an export schema in the Although many existing FDBS prototypes five-level schema architecture (external use some form of the relational model as schemas of component DBSs are not the CDM (Appendix), we believe that fu- shown in the five-level schema architec- ture systems are more likely to use a se- ture of Figure 10). mantic data model or a combination of an Redundancy between component object-oriented model and a semantic data schemas and local schemas: When model. Most of the semantic data models component DBSs uses CDM of the will adequately meet requirements of a FDBS and have the same functionality, CDM, and the choice of a particular one is it is unnecessary to define component likely to be subjective. Because a CDM schemas. using a semantic data model may provide richer semantic constructs than the data Figure 12 shows an example in which models used to express the local schemas, some of the schema levels are not used. No the component schema may contain more external schemas are defined over Feder- semantic information than the correspond- ated Schema 2 (all of it is presented to all

ACM Computing Surveys, Vol. 22, No. 3, September 1990 202 l Amit Sheth and James Larson

Figure 12. Example FDBS schemas with missing schemas at some levels.

federation users using it). Component in one type of schema may also vary from Schema 2 is the same as the Local Schema those in another type of schema. For ex- 2 (the data model of the Component DBMS ample, a federated schema may have 2 is the same as the CDM). No export schema objects describing the capabilities schema is defined over Component Schema of the various component DBMSs in the 3 (all of it is exported to the FDBS). system, whereas no such objects exist in An important type of information asso- the local schemas. ciated with all FDBS schemas is the map- Two important features of the schema pings. These correlate schema objects at architecture are how autonomy is preserved one level with the schema objects at the and how access control is managed. These next lower level of the architecture. Thus, involve exercising control over schemas at there are mappings from each external different levels. Two types of administra- schema to the federated schema over which tive individuals are involved in developing, it is defined. Similarly, there are mappings controlling, and managing an FDBS: from each federated schema to all of the export schemas that define it. The map- l A component DBS administrator (com- pings may either be stored as a part of the ponent DBA) manages a component schema information or as distinct objects DBS. There is one component DBA5 for within the FDBS (which each component DBS. The local, com- also stores schemas). The amount of dic- ponent, and export schemas are con- tionary information needed to describe a trolled by the component DBAs of the schema object in one type of schema may respective component DBSs. A key man- be different from that needed for another agement function of a component DBA type of schema. For example, the descrip- tion of an entity type in a federated schema may include the names of the users that ’ Here a database administrator is a logical entity. In reality, multiple authorized individuals may play the can access it, whereas such information is role of a single (logical) DBA, or the same individual not stored for an entity type in a compo- may play the role of the component DBA for multiple nent schema. The types of schema objects component DBSs.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 203 is to define the export schemas that spec- mas” in Heimbigner and McLeod [1985]), ify the access rights of federation users defining a view using a set of operators to access different data in the component (e.g., defining “superviews” in Motro databases. and Buneman [1981]), or defining a view . A federation DBA defines and manages a using a query in a multidatabase lan- federated schema and the external sche- guage ([Czejdo et al. 1987; Litwin and mas related to the federated schema. Abdellatif 19861; see Section 5.1). In a There can be one federation DBA for tightly coupled FDBS, it takes the form of each federated schema or one federation schema integration ([Batini et al. 19861; see DBA for the entire FDBS. Each federa- Section 4.4). tion DBA in a tightly coupled FDBS is a A typical process of developing federated specially authorized system administra- schemas in a loosely coupled FDBS is as tor and is not a federation user. In a follows. Each federation user is the admin- loosely coupled FDBS, federated schemas istrator of his or her own federated schema. are defined and maintained by the users, First, a federation user looks at the avail- not by the system-assigned federation able set of export schemas to determine DBA. This is further discussed in Sec- which ones describe data he or she would tion 2.1. like to access. Next, the federation user defines a federated schema by importing 2. SPECIFIC FEDERATED DATABASE the export schema objects by using a user SYSTEM ARCHITECTURES interface or an application program or by defining a multidatabase language query The architecture of an FDBS is primarily that references export schema objects. The determined by which schemas are present, user is responsible for understanding the how they are arranged, and how they are semantics of the objects in the export sche- constructed. In this section, we begin by mas and resolving the DBMS and semantic discussing the loosely coupled and tightly heterogeneity. In some cases, component coupled architectures of our taxonomy in DBMS dictionaries and/or the federated additional detail. Then we discuss how sev- DBMS dictionary may be consulted for ad- eral alternate architectures can be derived ditional information. Finally, the federated from the five-level schema architecture by schema is named and stored under account inserting additional basic components, re- of the federation user who is its owner. It moving all basic components of a specific can be referenced or deleted at any time by type, and arranging the components of the that federation user. five-level schema architecture in different A typical scenario for the administration ways. We then discuss assignment of com- of a tightly coupled FDBS is as follows. For ponents to computers. Finally, we briefly simplicity, we assume single (logical) fed- discuss four case studies. eration DBA for the entire tightly coupled FDBS. Export schemas are created by ne- 2.1 Loosely Coupled and Tightly Coupled gotiation between a component DBA and FDBSs the federation DBA; the component DBA has authority or control over what is in- With the background of Section 1, we dis- cluded in the export schemas. The federa- cuss distinctions between the loosely cou- pled and tightly coupled FDBSs in more tion DBA is usually allowed to read the detail. component schemas to help determine what data are available and where they are located and then negotiate for their access. 2.1.1 Creation and Administration of Federated The federation DBA creates and controls Schemas the federated schemas. External schemas The process of creating a federated schema are created by negotiation between a fed- takes different forms. In a loosely coupled eration user (or a class of federation users) FDBS, it typically takes the form of schema and the federation DBA who has the importation (e.g., defining “import sche- authority over what is included in each

ACM Computing Surveys, Vol. 22, No. 3, September 1990 204 l Amit Sheth and James Larson

external schema. It may be possible to in- eration DBA [Litwin and Abdellatif stitute detailed and well-defined negotia- 19861. tion protocols as well as business rules (or some types of constraints) for creating, An example of multiple semantics is as modifying, and maintaining the federated follows. Suppose that there are two export schemas. schemas, each containing the entity SHOE. Based on how often the federated sche- The colors of SHOE in one component mas are created and maintained as well as schema, schemal, are brown, tan, cream, on their stability, an FDBS may be termed white, and black. The colors of SHOE in dynamic or static. Properties of a dynamic the other component schema, schema2, are FDBS are as follows: (a) A federated brown, tan, white, and black. Users defin- schema can be promptly created and ing different federated schemas may define dropped; (b) there is no predetermined pro- different mappings that are relevant to cess for controlling the creation of a feder- their applications. For example, ated schema. As described above, defining a federated schema in a loosely coupled l User1 maps cream in his federated sche- FDBS is like creating a view over the sche- mas to cream in schema1 and tan in mas of the component DBSs. Since such a schema2, federated schema may be managed on the l User2 maps cream in her federated fly (created, changed, dropped easily) by a schema to tan or cream in schema1 and user, loosely coupled FDBSs are dynamic. tan or white in schema2. A tightly coupled federation is almost al- ways static because creating a federated Proponents of the loosely coupled archi- schema is like database schema integration. tecture argue that a federated schema cre- A federated schema in a tightly coupled ated and maintained by a single federation FDBS evolves gradually and in a more con- DBA is utopian and totalitarian in nature trolled fashion. [Litwin 1987; Rusinkiewicz 19871. We feel that a loosely coupled approach may be better suited for integrating a large number 2.1.2 Case for Loosely Coupled FDBS of very autonomous read only databases A loosely coupled FDBS provides an inter- accessible over communication networks face to deal with multiple component (e.g., public databases of the types dis- DBMSs directly. A typical way to formulate cussed by Litwin and Abdellatif [ 19861). queries is to use a multidatabase language User management of federated schemas (see Section 5.1). This architecture has the means that the FDBMS can do little to - following advantages: optimize queries. In most cases, however, the users are free to use their own under- l A user can precisely specify relationships standing of the component DBSs to design and mappings among objects in the ex- a federated schema and to specify queries port schema. This is desirable when the to achieve good performance. federation DBA is unable to specify the mappings in order to integrate data in multiple databases in a manner meaning- 2.1.3 Case for Tightly Coupled FDBS ful to the user’s precise needs [Litwin The loosely coupled approach may be ill and Abdellatif 19861. suited for more traditional business or cor- l It is also possible to support multiple porate databases, where system control (via semantics since different users can im- DBAs that represent local and federation port or integrate export schemas differ- level authories) is desirable, where the users ently and maintain different mappings are naive and would find it difficult to from their federated schemas to export perform negotiation and integration them- schemas. This can be a significant advan- selves, or where location, distribution, and tage when the needs of the federation replication transparencies are desirable. users cannot be anticipated by the fed- Furthermore, in our opinion, a loosely

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 205

coupled FDBS is not suitable for update grated to develop a single federated schema. operations. Updating in a loosely coupled Sometimes an organization will insist on FDBS may degrade data integrity. When a having a single federated schema (also user of a loosely coupled FDBSs creates called enterprise schema or global concep- a federated schema using a view definition tual schema) to have a single point of con- process, view update transformations are trol for all data sharing in the organization often not determined. The users may not across the component DBS boundaries. Us- have complete information on the compo- ing a single federated schema helps in de- nent DBSs and different users may use fining uniform semantics of the data in the different semantic interpretations of the FDBS. With a single federated schema, it data managed by the component DBSs (i.e., is also easier to enforce constraints that loosely coupled FDBSs support multiple cross export schemas (and hence multiple semantic interpretations). Thus different databases) then when multiple federated users can define different federated sche- schemas are allowed. mas over the same component DBSs, and Because one federated schema is created different transformations may be chosen by integrating all export schemas and be- for the same updates submitted on different cause this federated schema supports data federated schemas. Similar problems can requirements of all federation users, it may occur in a tightly coupled FDBS with mul- become too large and hence difficult to tiple federations but can be resolved at the create and maintain. In this case, it may time of federated schema creation through become necessary to support external sche- schema integration. A federation DBA cre- mas for different federation users. ating a federated schema using a schema A tightly coupled FDBS with multiple integration process can be expected to have federations allows the tailoring of the use more complete knowledge of the compo- of the FDBS with respect to multiple nent DBSs and other federated schemas. classes of federation users with different In addition to the update transformation data access requirements. Integrations of issue, transaction management issues need the same set of schemas can lead to differ- to be addressed (see Section 5.4). ent integrated schemas if different seman- A tightly coupled FDBS provides loca- tics are used. Thus this architecture can tion, replication, and distribution transpar- support multiple semantics, but the seman- ency. This is accomplished by developing a tics are decided upon by the federation federated schema that integrates multiple DBAs when defining the federated schemas export schemas. The transparencies are and their mappings to the export schemas. managed by the mappings between the fed- A federation user can select from among erated schema and the export schemas, and multiple alternative mappings by selecting a federation user can query using a classical from among multiple federated schemas. query language against the federated When an FDBS allows updates, multiple schema with an illusion that he or she is semantics could lead to inconsistencies. For accessing a single system. A loosely coupled this reason, federation DBAs have to be system usually provides none of these very careful in developing the federated transparencies. Hence a user of a loosely schemas and their mappings to the export coupled FDBS has to be sophisticated to schemas. Updates are easier to support in find appropriate export schemas that can tightly coupled FDBSs where DBAs care- provide required data and to define map- fully define mappings than in a loosely pings between his or her federated schema coupled FDBS where the users define the and export schemas. Lack of adequate se- mappings. mantics in the component schemas make this task particularly difficult. Let us now 2.2 Alternative FDBS Architectures discuss two alternatives for tightly coupled FDBSs in more detail. In this section, we discuss how processors In a tightly coupled FDBS with a single and schemas are combined to create various federation, all export schemas are inte- FDBS architectures.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 206 . Amit Sheth and James Larson

2.2.1 A Complete Architecture of a Tightly component schemas. Mermaid [Temple- Coupled FDBS ton et al. 1987b] falls into this category.‘j An architecture of a tightly coupled FDBS, No filtering processors or export shown in Figure 11, consists of multiple schemas: All of the component schemas basic components as described below. are integrated into a single federated schema resulting in a tightly coupled sys- l Multiple export schemas and filter- tem in which component DBAs do not ing processors: Any number of exter- control what users can access. This ar- nal schemas can be defined, each with its chitecture fails to support component own filtering processor. Each external DBS autonomy fully. UNIBASE [Brze- schema supports the data requirements zinski et al. 19841 is in this category, and of a single federation user or a class of hence it is classified as a nonfederated federation users. system. l Multiple federated schemas and con- No constructing processor: The user structing processors: Any number of or programmer performs the constructing federated schemas can be defined, each process via a query or application pro- with its own constructing processor. Each gram containing references to multiple federated schema may integrate different export schemas. The programmer must export schemas (and the same export be aware of what data are available in schema may be integrated differently in each export schema and whether data are different federated schemas). replicated at multiple sites. This archi-

l Multiple export schemas and filter- tecture, classified as a loosely coupled ing processors: Multiple export sche- FDBS, fails to support location, distri- mas represent different parts of a bution, and replication transparencies. If database to be integrated into different data are copied or moved between com- federated schemas. A filtering processor ponent databases, any query or applica- associated with an export schema sup- tion using them must be modified. ports access control for the related com- In practice, two processors may be com- ponent schema. bined into a single module, or two schemas . Multiple component schemas and may be combined into a single implemen- transforming processors: Each com- tation schema. For example, a component ponent schema represents a different schema and its export schemas are fre- component database expressed in the quently combined into a single schema with CDM. Each transforming processor a single processor that performs both trans- transforms a command expressed on the formation and filtering. associated component schema into one or more commands on the corresponding 2.2.3 Architectures with Additional Basic local schema. Components There are several types of architectures 2.2.2 Architectures with Missing Basic with additional components that are exten- Components sions or variations of the basic components There are several architectures in which all of the reference architecture. Such compo- of the processors of one type and all sche- nents enhance the capabilities of an FDBS. mas of one type are missing. Several ex- Examples of such components include the amples follow. following: l No transforming processors or com- l Auxiliary schema: Some FDBSs have ponent schemas: All of the local sche- an additional schema called an auxiliary mas are described in a single data model. In other words, the FDBS does not sup- ‘Its design, however, has provisions to store model port component DBSs that use different transformation information and attach a transforming data models. Hence there is no need for processor.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 207 schema that stores the following types of information: “‘“““‘;” Schema) Data needed by federation users but not available in any of the (preexisting) component DBSs. Information needed to resolve incom- patibilities (e.g., unit translation tables, format conversion information). . Statistical information helpful in per- forming query processing and optimi- zation. Figure 13. Using an auxiliary schema to store trans- lation information needed by a constructing processor. Multibase [Landers and Rosenberg 19821 describes the first two types of information in its auxiliary schema, This, however, can limit or conflict with whereas DQS [Belcastro et al. 19881 de- the autonomy of the component DBSs. scribes the last two types of information in its auxiliary schema. Mermaid [Tem- pleton et al. 1987133 describes the third 2.2.4 Extended Federated Architectures type of information in its federated To allow a federation user to access data schema. As illustrated in Figure 13, the from systems other than the component auxiliary schema and the federated DBSs, the five-level schema architecture schema are used by constructing proces- can be extended in additional ways. sors. It is also possible to consider the auxiliary schema to be a part (or sub- l Atypical component DBMS: Instead schema) of a federated schema. of a typical centralized DBMS, a com- . Enforcing constraints among com- ponent DBMS may be a different type of ponent schemas: As illustrated in Fig- data management system such as a file ure 14, an FDBS can have a filtering server, a database machine, a distributed processor in addition to a constructing DBMS, or an FDBMS. OMNIBASE uses processor between a federated schema a distributed DBMS as one of its com- and the component schemas. The filter- ponent DBMSs [Rusinkiewicz et al. ing processor enforces constraints that 19891. Figure 15 illustrates how one span multiple component schemas. The FDBS can act as a backend for another constructing processor, as discussed be- FDBS. By making local schema A2 of fore, transforms a query into subqueries FDBS A the same as external schema B2 against the component schemas of the of FDBS B, the component DBS A2 of component DBSs. Integrity constraints FDBS A is replaced by FDBS B. may be stored in an external schema or l Replacing a component database by a federated schema. The constraints may a collection of application pro- involve data represented in multiple ex- grams: It is conceptually possible to re- port schemas. The filtering processor place some database tables by application checks and modifies each update request programs. For example, a table contain- so when data in multiple component da- ing pairs of equivalent Fahrenheit and tabases are modified, the intercomponent Celsius values can be replaced by a pro- constraints are not violated. This capa- cedure that calculates values on one scale bility is appropriate in a tightly coupled given values on the other. A collection of system in which constraints among mul- conversion procedures can be modeled tiple component databases must be en- by the federated system as a special- forced. An early description of DDTS component database. A special-access [Devor et al. 1982aJ suggested enforce- processor can be developed that accepts ment of semantic integrity constraints requests for conversion information and spanning components in this manner. invokes the appropriate procedure rather

ACM Computing Surveys, Vol. 22, No. 3, September 1990 208 l Amit Sheth and James Larson

Integrity Constraints in External/Federated Schema 1

1 Filtering processor ]

I Constructing Processor

Figure 14. Using a filtering processor to enforce constraints across export schemas.

than access a stored database. Navathe larger granularity and allocate them as et al. [1989] discuss a federated architec- desired. For example, DDTS [Dwyer and ture being developed to provide access Larson 19871 defines two types of modules: to databases as well as application Application Processor and Data Processor programs. (Figure 17). An Application Processor in- cludes a federated schema with the associ- ated constructing processor and all the 2.3 Allocating Processors and Schemas to external schemas defined over the feder- Computers ated schema with the associated filtering It is possible to allocate all processors and processors and transforming processors (if schemas to a single computer, perhaps to present). A Data Processor includes a local allow federation users to access data man- schema, a component schema, and the as- aged by multiple component DBSs on that sociated transforming processor and all ex- computer. Usually, however, different com- port schemas defined over the component ponent DBSs reside on different computers schema with associated filtering processors. connected by a communication system. Dif- An Application Processor performs the ferent allocations of the FDBS components user interface and result in different FDBS configurations. management and coordination functions Figure 16 illustrates the configuration of and is located at every site at which there a typical FDBS. A general-purpose com- are federation users. A Data Processor per- puter at site 1 supports a single component forms the data management functions re- DBS and two federation schemas for two lated to the data managed by a single different classes of federation users. Site 2 component DBS and is located at every site is a workstation that supports two export at which a component DBS is located. A schemas, each containing different data for site can have either or both of the two use by different federation users. Site 3 is modules. Mermaid [Templeton et al. a small workstation that supports a single 1987b] divides the processors and the sche- federation user and no component DBS. mas into four types of modules of smaller Site 4 is a database computer that has one granularity. component DBS but supports no federation Special communication processors can users. also be placed on each computer to enable It may be desirable to group a related set processors on two different sites to com- of processors and schemas into modules of municate with each other. Communication

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 209

FDBS A

I Component Component DBS Al DBS An

External 0)Schema B 1

FDBS B

Figure 15. FDBS B acting as a back end to FDBS A. processors are not shown in our reference federated schema called the Global Repre- architecture. They are placed between any sentation Schema, which is expressed in pair of adjacent processors that are allo- the relational data model. It has an external cated to different computers. schema called the Conceptual Schema rep- resented in the Entity-Category-Relation- 2.4 Case Studies ship (ECR) model [Elmasri et al. 19851. Users formulate requests directly against In this section we relate the terms and the Conceptual Schema in the GORDAS concepts of the reference architecture to query language [Elmasri 19811. The ECR those used in four example FDBSs. Our data model is rich in semantics (e.g., it purpose is not to survey these systems shows cardinality and operation con- [Thomas et al. 19901 but to show how the straints on an entity’s participation in re- reference architecture can be used to rep- lationships). The transforming part of the resent the architectures of various FDBSs Translation and Integrity Control proces- uniformly. This uniform representation sor is responsible for translating requests can greatly simplify the task of studying written in GORDAS on the ECR data and comparing these systems. model into the internal form of a relational 2.4.1 DOTS query language against the Global Repre- sentational Schema. The filtering part Figure 17 illustrates the original architec- of the Translation and Integrity Control ture of DDTS [Devor et al. 1982a] using processor is responsible for modifying the terminology of the reference architec- each query, so when it is processed, the ture (to the left of each colon) and the constraints specified in the Conceptual terminology used by DDTS (to the right of Schema will be enforced. For example, a each colon and in italics). It has a single GORDAS query that deletes a record will

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Site 1 Site 2 Site 3

External Schema External Schema (External Schema 1 External Schema I I L / Filtering Processor Filtering Processor I 1 Filtering, Processor 1 I I Federated Schema Federated Schema I f Federated Schema 3 I I I onstructing Processor Constructing Processor Constructing Processor Constructing Processor

\

Site 4 \

(Export Schema

I 1 Filtering Processor 1 Filtering Processor I (Component Schema 3 (Component Schema

1 TransforTing Proce: 1

I (Local (Local Schema 1 Component DBS 1 1 Component DBS 1 1 Component DBS ]

Figure 16. Typical FDBS system configuration. Federated Database Systems 211

Application Processor

F(T)

Federated: Global Representational

I Constructing: -1

-1 I . a Processor sor

Component I: Local Component n: Local Representational Representational Schema I I Transforming 1: Transforming n: Local Operations Local Operations Module I Module n I I

I Component DBMS 1: Component DBMS n: IDS.2 Wet works RAM (Relational) I I

Figure 17. DDTS architecture. be modified so that it verifies that deleting There are two separate processors in the record will not violate any semantic DDTS’s constructing processor, reflecting integrity constraints before the record is the decision to separate distributed query actually deleted. DDTS has since been ex- optimization from distributed query exe- tended to support external schemas [Dwyer cution. The Materialization and Access and Larson 19871 expressed in the rela- Planning component generates a distrib- tional data model defined over the Global uted execution strategy consisting of sets Representation Model. SQL is used to of commands, each expressed in terms query such external schemas. of one of the Local Representational

ACM Computing Surveys, Vol. 22, No. 3, September 1990 212 l Amit Sheth and James Larson

ederatedl: DAPLEX Global vleW I I ransforming: Transformer

eratedl : DAPLEX Global View 1 ressedusing local terms I Transforming: GlobalQuery Optimizer Constructing: Decomposer Transforming: Filter Constructinal Monitor

omponent 1: Component n: DAPLEXLocal DAPLEXLocal I Schema n I I Transforming: LocalOptimizer LocalOptimizer Transforming: Translator I

l*rrn I Component DBMS n: Relational I

Figure 18. Multibase architecture.

(component) schemas. The distributed ex- 2.4.2 Multibase ecution strategy can be saved for future Figure 18 illustrates the architecture of execution or can be passed immediately to Multibase, again using the terminology of the Distributed Execution Monitor. The the reference architecture and the termi- Distributed Execution Monitor is respon- nology used by Landers and Rosenberg sible for executing the distributed execu- [1982]. The federated schema is expressed tion strategy, coordinating the execution of in a functional data model called DAPLEX. the sets of commands, and returning the The Transformer modifies global queries results to the user. by inserting references to Local and Aux- The Local Operations (transforming) iliary schemas. The modified global query Module accepts a set of commands and is then processed by a series of processors transforms it into a form that can be that generates sequences of DAPLEX executed by the component DBMS. Orig- single-site queries. These processors inally there were two CODASYL com- include: ponent DBMSs. Later a third component DBMS using the relational data model was l A Global Query Optimizer that produces added. a global plan,

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 213

xternal 1: Relational I*** r ternal p: Semantic Subschema 1 Subschema p I I Transforming: Relational Transforming: AR/EL to D/L to D/L Filtering: Access File/ Filtering: Access File/ Access List Access List

I Constructing: Optimizer and Controller

Local Schema 1

Figure 19. Mermaid architecture.

A Decomposer that decomposes the l A Translator converts the DAPLEX global plan into single-site DAPLEX query to a form the component DBMS queries, can process. A Filter that reduces the decomposed An Auxiliary schema holds additional queries by removing operations from data not stored in any component DBMS them that are not supported by the cor- responding component DBMSs, and and information needed to resolve incon- sistencies. Component DBMSs supported A Monitor that controls the distributed by Multibase include both CODASYL and execution of the subqueries. relational. The Optimizer, Decomposer, and Filter may be cyclically invoked for nested global 2.4.3 Mermaid queries. Two types of transformations are performed on each DAPLEX single-site Figure 19 illustrates the architecture of query: Mermaid, again using the terminology of the reference architecture and the termi- l A Local Optimizer determines the opti- nology used by Templeton et al. [1987b]. mal query-processing strategy for the Users may formulate requests using either single-site DAPLEX queries. SQL on a relational schema or ARIEL, a

ACM Computing Surveys, Vol. 22, No. 3, September 1990 214 l Amit Sheth and James Larson

ternal 11 Federated 1: DSL Queries / External 1 l.**P I Constructing: MRDSM

mponent l/ Local 1: MRDS Schema I MRDS Schema n I Component DBMS 1: freon Component DBMS n: MRDS I MRDS n I I

database 1

Figure 20. MRDSM architecture. user-friendly query language developed at Abdellatif 19871 to facilitate formulating Systems Development Corporation (now such queries. Figure 20 illustrates the archi- Unisys), on what is called a Semantic Sub- tecture of MRDSM using the terminology schema. The federated schema is called a of the reference architecture and the ter- Global Schema and is expressed in the re- minology of Litwin [ 19851. lational model. User requests are trans- formed to the internal command language 3. FEDERATED DATABASE SYSTEM called the Distributed Intermediate Lan- EVOLUTION PROCESS guage (DIL). DIL requests are processed by a constructing processor that produces and There are two approaches to managing dis- executes a . A transforming pro- tributed data: installing a distributed cessor (one for each component DBS) DBMS or adding a layer of software above translates DIL requests into the query lan- existing DBMSs to create an FDBS system. guage of the component DBMS, interacts In the first approach, installing a distrib- with the component DBMS, and sends data uted DBMS requires (1) changes and dis- to other transforming processors (e.g., for ruption of the existing applications because joins and unions). Component DBMSs in- they do not have the dichotomy of local clude commercial relational DBMSs on versus global operations, (2) a complete Sun@ workstations and a database machine change of the organizational structure for with a minicomputer host [Thomas et al. information management because this ap- 19901. proach does not respect the autonomy of existing DBSs, and (3) replacement of the 2.4.4 MRDSM existing centralized DBMSs by a distrib- uted DBMS. MRDSM is a loosely coupled FDBS (called The federation approach offers a pre- a multidatabase system by its originators) ferred evolutionary path. It allows contin- in which a programmer/user formulates a ued operation of existing applications to request involving data from component remain unchanged, preserves most of the DBSs. The system provides a multidata- organizational structure, supports con- base language called MDSL [Litwin and trolled integration of existing databases, and facilitates incorporation of new ap- @Sun is a trademark of Sun Microsystems, Inc. plications and new databases. Although

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 215 existing applications need not be changed Both migration and extension of file sys- in an FDBS, as the old applications are tems are expensive. Many developers of the modified, the component databases may be FDBSs elect to use migration because they standardized, and redundant data (unless do not have access to the source code of the required for improving availability or ac- file system and because they want their cess time) may be removed. New applica- data maintained by a commercially avail- tions may be coded using an external able DBMS. Since existing application pro- schema defined on a federated schema. grams are affected, however, this activity is An FDBS evolves through gradual inte- very difficult and often organizationally gration of a set of interrelated DBSs. It very sensitive. evolves as the new component databases The second phase of the FDBS evolution, are added and the existing ones are modi- developing a federated database system, in- fied. This evolution process can be divided volves creating the component, export, fed- into three phases: preintegration, develop- erated, and external schemas, defining the ing a federated database system, and fed- mappings between various schemas, and erated database system operation. These implementing the associated processors. phases (or the activities within the phases) This is a critical task for the success of an need not follow serially from one phase to FDBS. A methodology that can be used to the next; each phase may be performed manage this phase is discussed in Section several times, and previous phases may be 3.1. Tasks performed in this phase are dis- revisited and their results revised. cussed in Section 4. The preintegration phase deals with the The third phase of the FDBS evolution, situation in which data reside in files and federated database system operations, in- are not managed by any DBMS yet need to volves managing and manipulating multi- be accessed by federation users. Two gen- ple integrated databases using an FDBMS. eral approaches are possible: The processors support various run-time FDBS operations (e.g., query processing l Migrate the files to a DBMS: Files and transaction management). The proces- can be migrated to a DBMS by perform- sors and the federated schemas developed ing three activities: (1) develop a com- or generated in the second phase allow the ponent schema that describes the data selective, shared, and consistent access to in the files, (2) load the DBMS with data stored in the component DBSs. An data from the files, and (3) modify exist- FDBMS provides an interface to the users ing application programs to access the and applications and may allow execution DBMS rather than access the files of ad hoc queries. Tasks performed in this directly. phase are discussed in Section 5. l Extend the file system to support DBMS-like features: By extending 3.1 Methodology for Developing a Federated the file system to support DBMS-like Database System features, a file system can be treated as a component DBMS. In this approach, The methodology discussed here extends the following activities are performed: the methodology of Sheth [1988b] for de- (1) create or generate a component veloping schemas for an FDBS to include schema that describes data in the files, processors. Developing a new FDBS pri- (2) create backup and recovery facilities marily consists of integrating existing com- in the file system if the federation users ponent databases. A bottom-up FDBS will perform updates to data in the file development process can be followed for system, and (3) create appropriate filter- this purpose. This process can also be used ing and transforming processors that will for adding a new component database to an convert commands expressed in the FDBS. FDBS’s internal language to the com- When new applications are developed us- mands that can be processed by the file ing an existing FDBS, it is necessary to system. determine whether the data requirements

ACM Computing Surveys, Vol. 22, No. 3, September 1990 216 l Amit Sheth and James Larson

of the application are supported by a fed- respective component DBSs to author- erated schema. If they are not, it is neces- ize part of their databases to be in- sary either to extend a federated schema or cluded in the FDBS based on the create a new federated schema and either negotiations with the federation DBA. to extend an existing component database Develop (or identify if one already ex- or create a new component database. This ists) the appropriate filtering proces- process is called a top-down FDBS devel- sor. opment process. It is an extension of the (3) Integrate schemas: Select a related traditional distributed database design set of export schemas to be integrated process. In practice, elements of both the and integrate them. Integration of each bottom-up and top-down processes are set of export schemas will produce one used to develop an FDBS. federated schema. Develop (or identify A data dictionary/directory (DD/D) if one already exists) a constructing often plays an important role in coordinat- processor that would transform the ing various activities by storing essential commands expressed on federated information. In addition to storing all sche- schemas into commands expressed on mas representing information about the the corresponding export schemas. data managed by the FDBS, a DD/D also This includes generating mappings stores mappings among schemas, informa- with appropriate distribution informa- tion about schemas and databases (such as tion. This step is repeated once for each statistics relevant to ), related set of export schemas and the schema-independent information (such as corresponding federated schema. tables and functions for unit/format con- (4) Define external schemas: If neces- versions or heuristics for query optimiza- sary, define external schemas for each tion) , and various types of system federation user or class of federation informations (such as capabilities of each users. Build or identify the necessary component DBMS, network addresses of filtering and transforming processors. each system hosting a component DBMS, The transforming processor performs and communication facility to be used to schema translation if the data model of communicate with a given system). the external schema is different than the CDM. 3.1.1 Bottom-Up Development Process 3.1.2 Top-Down Development Process A bottom-up FDBS development process is used to integrate several existing databases A top-down FDBS development process is to develop an FDBS. Figure 21 illustrates used when an FDBS already exists and the bottom-up process outlined below: additional user requirements (e.g., to sup- port a new application) are placed on it. (1) Translate schemas: Translate the Figure 22 illustrates the top-down process local schema of a component database outlined below: into a component schema expressed in the CDM. Generate the mappings be- (1) Define or modify external sche- tween the objects in the two schemas. mas: Collect federation user require- Develop (or identify if one already ex- ments and analyze them to define new ists) the transforming processor that external schemas or extensions to the can transform commands expressed on existing external schemas. the component schema into the com- (2) Analyze schemas: Compare relevant mands expressed on the corresponding federated schemas with the external local schema. schemas to identify parts of the exter- (2) Define export schemas: Define ex- nal schemas that are already in the port schemas from a component federated schema and hence supported schema. This step is performed by the by the FDBS. If some part of an exter- administrators (component DBAs) of nal schema is not already supported by

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 217

I Data I ’ I -Schema - - \ lntegratlon ’ &J$&z-;/i

Schema

----- Schema --- Translation- - Local Schema

Figure 21. Bottom-up FDBS developing process.

the FDBS, a federated schema will need (b) The required data are not imple- to be extended or developed to include mented in any component data- that part. We refer to this unsupported base, and a component DBA is part as a temporary schema (which is willing to place the required data discarded at the end of the integration in his or her component database. process). One or more of the compo- In this case, local, component, and nent databases will have to support the export schemas of the relevant da- temporary schema. This can be accom- tabases are modified. plished in one of three ways: (c) The required data are not imple- mented in any component data- (a) The required data exist in one or base, and no component DBA is more component database(s). In willing to place the required data this case, identify the component in his or her component databases. schemas containing the descrip- In this case, the temporary schema tion of required data and negotiate is implemented as a separate da- with their administrators to have a tabase of an existing component description of this data placed in DBMS. Alternatively, a new com- an export schema with appropriate ponent DBMS may be used that access rights. may require a new transforming

ACM Computing Surveys, Vol. 22, No. 3, September 1990 218 . Amit Sheth and James Larson

Requirements CollectIon & - -

External Schema External Schema

Physical -Database Design

Figure 22. Top-down FDBS developing process.

processor. The temporary schema control, negotiation, and schema integra- becomes a new component schema. tion. As identified in Section 3.1, these (3) Integrate schemas: Integrate the tasks are relevant to the development of temporary schema with the relevant the schemas and are affected by the data federated schema and discard the tem- requirements of the federation users. porary schema. 4.1 Schema Translation

4. FEDERATED DATABASE SYSTEM Schema translation is performed when a DEVELOPMENT TASKS schema represented in one data model (the source schema) is mapped to an equivalent Many tasks involved in developing a cen- schema represented in a different data tralized or a distributed DBS [Ozsu and model (the target schema). Schema trans- Valduriez 1990, Chapter 5; Teorey 19901 lation is needed in two situations: are also relevant to developing an FDBS and can be adapted with minor changes. In l Translating a local schema into a com- this section, we discuss four tasks that do ponent schema when the DBMS’s native not typically arise in that context but have data model is different from the CDM. particular significance for developing an l Translating a part of the federated FDBS. They are schema translation, access schema into an external schema when

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 219 the external schema is expressed in a data the source schema. These rules specify how model different than the CDM. each object in the target schema is derived from objects in the source schema. If the For example, assume that if the CDM is mapping rules do not have inverses, the an extended Entity Relationship (EER) view update problem must be solved. model, then one of the component DBMSs The following is a simplified example of to be integrated is a relational DBMS and a set of mapping rules that can be used to another is a CODASYL DBMS, and a class generate a relational schema from a of federated users wants a relational view. CODASYL schema: The following are required: (1) translation of the local schema of the relational DBMS (1) Each record type is mapped to a table into its equivalent component schema ex- with the same name. pressed in the EER model, (2) translation (2) Each field in a record type A is mapped of the local schema of the CODASYL to a of table A. DBMS into an equivalent component (3) Each record identifier in record type A schema expressed in the EER model, and is mapped to a key in table A. (3) translation of a part of the federated (4) Each set (one-to-many relationships schema in the EER model into an exter- between record type A and record type nal schema expressed in the relational B ) is mapped to a column in table B model. Two general approaches for per- that contains values from the key of forming schema translation are discussed table A. This column is called a foreign below. key. The first approach develops explicit mappings between each source and the tar- Consider the CODASYL schema in Fig- get schema. This approach is appropriate ure 23(a) (from Larson [1983a]). The when a (class of) federation user(s) requires COMPANY record type has two fields, specific data structures in the target COMPANY-NAME and CITY. The schema. The DBA must then specify map- PRODUCT record type has two fields, pings that transform the source schema to PRODUCT-NAME and COST. COM- the target schema. There are three cases: PANY-NAME is the unique record identi- fier for the COMPANY record type, and All schema objects in the target schema PRODUCT-NAME is the unique record can be derived from the schema objects identifier for the PRODUCT record type. in the source schema, and there exist There is a PRODUCES set consisting of inverses for all of these mappings. The the COMPANY record type as the owner DBA specifies the mappings and their and the PRODUCT record type as the inverses. member. The PRODUCES set maintains All schema objects in the target schema one-to-many relationship between can be derived from the schema objects EOMPANY and PRODUCT. in the source schema, but inverses may The relational schema of Figure 23(b) not exist for the mappings. If the update results when the above mapping rules are were to be allowed, the DBA must con- applied. The COMPANY table has two struct a filtering processor to solve the fields, COMPANY -NAME and CITY. The associated view update task. PRODUCT table has three fields, PROD- If the target schema contains objects that UCT-NAME, COST, and COMPANY- cannot be derived from the objects in the NAME. COMPANY-NAME is the key for source schema, the DBA must modify the the COMPANY table, and PRODUCT- target schema or construct a constructing NAME is the key for the PRODUCT table. processor that integrates schema objects The COMPANY-NAME column of from other schemas to support all of the PRODUCT table is a ; it con- objects in the target schema. tains only values from the COMPANY- NAME key field of the COMPANY table. The second approach develops mapping Zaniolo [1979] developed a tool that au- rules to generate the target schema from tomatically generates relational schemas

ACM Computing Surveys, Vol. 22, No. 3, September 1990 220 l Amit Sheth and James Larson

PRODUCT PRODUCT-NAME COST COMPANY-NAME*

(a) (b)

Figure23. Equivalent CODASYL and relational schemas. (a) CODASYL schema; (b) relational schema. * -Foreign key.

from CODASYL schemas. Elmasri et al. local schema to a component schema. This [1985] and Teorey et al. [1986] discuss is because a component DBMS is autono- transformations between extensions of the mous, and the database managed by it Entity Relation data model [Chen 19761 should not be changed as a result of the and the relational data model. Lien [1981] translation process. In other words, the describes mappings from the hierarchical translation should be reversible in the sense to the relational data model. Tsichritzis that (a) the component schema in the CDM and Lochovsky [1982] provide a summary should represent the same database repre- of these types of mappings. sented by the local schema and (b) it should In practice, the translation task may re- be possible to translate a command on a quire more than just data model translation component schema into command(s) on the because the source and the target schemas corresponding local schema. A notion may not be able to represent exactly the that may be useful in this context is same semantics, Hence schema translation that of content-preserving transformations poses two contradictory requirements: [Rosenthal and Reiner 19871. (1) Capture additional semantics during the schema translation that can later help in 4.2 Access Control the tasks of schema integration and view update, and (2) maintain only the existing An FDBS should be designed to control semantics because the local schema is not access to component databases by federa- able to support the additional semantics. tion users. The system architecture of an These requirements are discussed next. FDBS (shown in Figure 11) has filtering Using a semantic data model as the CDM processors at two levels. Each can be used can facilitate representation of additional to provide access control. The filtering pro- semantics that may be difficult or impos- cessors relating the export and the compo- sible to specify in a traditional model such nent schemas control access to component as the relational model. As an example, a DBSs. The filtering processor relating the generalization relationship with inherit- external and federated schemas controls ance between two entity types can be ex- access to the federated schemas.7 Negotia- plicitly represented in a component schema tion between the component and federation that uses a semantic data model. As an- DBAs may be necessary to reach an agree- other example, a foreign key between rela- ment on how to control the data a compo- tions Rl and R2 in a local schema expressed nent DBA wants to keep secure from some in the relational model may be explicitly of the federation users while allowing ac- represented as a relationship between the cess to other federation users. Alternately, two entities representing Rl and R2 in the a new federated schema can be established corresponding component schema in an EER model. One, however, should be care- ‘This is similar to using the view mechanism for ful about the additional semantic informa- access control security in centralized and distributed tion provided during the translation from a DBMSs [Bertino and Haas 19881.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 221 for use by a selected subgroup of the origi- maid maintains an access file on each com- nal federation. For example, suppose com- puter for this purpose. Access files control ponent databases each contain information access to Mermaid’s data dictionary and about commercial shipping and military directory. Third, Mermaid maintains an shipping. The following approaches are access list that identifies the external and possible: federated schemas each user is authorized to use. Fourth, the user must be registered Each component DBA includes schema with each computer having a component objects describing both commercial and DBS that is relevant to the user’s federated military shipping in their respective ex- schema. In addition, Mermaid provides ex- port schemas. This information is inte- tensive run-time support for access control, grated into a single federated schema. uses data encryption (for passing access The federation DBA creates two external control information over the network and schemas, one for accessing commercial for storing it in access files and access lists), shipping information and one for access- and provides an audit trail and a journal ing military shipping. In this scenario, trail for security purposes. the local DBAs trust the federation DBA One of the heterogeneities that may be to provide appropriate access controls on encountered in an FDBS is the existence each external schema. of different and incompatible mechanisms Each component DBA generates two ex- for expressing and enforcing access control port schemas, one containing the schema policies. An access control mechanism used objects describing commercial shipping by an FDBMS, such as the view mecha- and the other containing schema objects nism, may also conflict with the autonomy describing military shipping. Two feder- of the component DBMSs. A solution that ated schemas are created, one dealing uses preprocessing is discussed by Wang with commercial shipping and the other and Spooner [1987]. dealing with military shipping. In this scenario, the component DBAs control 4.3 Negotiation appropriate access on each export schema and hence is preferred to the A federation DBA manages federated sche- previous one. mas. A component DBA manages the ex- port schemas defined over the component Other access control and security issues DBS he or she manages. A federation DBA include the following: and component DBAs must reach an agree- How users are identified, grouped into ment about the contents of the export sche- classes, and named for access control se- mas and operations allowed on the export curity purposes. Templeton et al. [1987a] schemas such that federated schemas can discuss the issue of installing a new user. be defined over them to support federation How data are identified, grouped into users. The dialogue between the two admin- classes, and named for access control se- istrators to reach this agreement is curity purposes [Abbott and McCarthy called negotiation. To facilitate negotia- 1988; Templeton et al. 1987b]. tion, the administrators follow protocols governing the messages exchanged during What operations are used for controlling a negotiation. security privileges [Fagin 1978; Larson In terms of the reference schema archi- 1983b; Selinger and Wade 19761. tecture, there are two ways to perform ne- A case study of the access control fea- gotiation. First, a component DBA may tures of Mermaid [Templeton et al. 1987a] allow the federation DBA to read the com- is interesting. It uses access control at four ponent schemas he or she controls but does levels. First, a user must have an account not give any specific rights for data access. on the computer where the Mermaid user When a federation DBA determines data interface can be run. Second, the user access requirements, he or she sends a re- should be registered with Mermaid. Mer- quest to the component DBA to define an

ACM Computing Surveys, Vol. 22, No. 3, September 1990 222 l Amit Sheth and James Larson

export schema with appropriate access ing export schemas in a bottom-up FDBS rights. This request typically includes in- development process). The tasks are quite formation on the data to be accessed, the similar and are treated uniformly as types of access (retrieval or update), and schema integration in this paper. the federation users who will access the Many approaches and techniques for data. In some cases, it may also include schema integration have been reported in requirements on the component DBS such the literature. The survey paper of Batini as constraints to be satisfied (e.g., how cur- et al. [1986] discusses and compares 12 rent or consistent the data must be and methodologies for schema integration. It maximum response time), and whether the divides schema integration activities into federation DBA or user may grant others five steps: preintegration, comparison, con- some or all of the access rights for this formation, merging, and restructuring. In export schema [Larson 1983131. The com- the context of FDBS development, prein- ponent DBA may accept or refuse any part tegration activities involve translation of of the request for any reason in defining schemas into a CDM so they can be com- the export schemas. In the second alterna- pared and specification of global con- tive, all component DBAs may define the straints and naming conventions. The export schemas a priori and require latter involves activities that may be useful the access from the FDBMS to be limited in the comparison step (e.g., specifying a to the contents and access rights contained thesaurus that may be used for identifying in those export schemas. naming conflicts, homonyms, or syno- Two of the many aspects of FDBS oper- nyms). Although schema translation has ations for which negotiation protocols need been studied independently from schema to be defined are the following: integration, the two tasks are highly inter- related. Schema translation may be more l A component DBA decides to withdraw constrained when integrating existing access to a schema object or change a DBSs than during view integration because schema object in a local schema. the constraints of the existing data struc- l A federation user decides that access to a tures cannot be changed. schema object is no longer needed. The comparison step (also called schema Heimbigner and McLeod [1985] present analysis) involves two activities: (a) analyz- negotiation protocols for a multisited dis- ing and comparing the objects of the tributed dialogue among the DBAs in an schemas (and possibly databases) to be in- FDBS. Alonso and Barbara [1989] consider tegrated, including identification of naming the case in which the federated schema is conflicts (e.g., homonym and synonym de- a materialized view (they call it quasi- tection), domain (i.e., value type) conflicts, copy). In this context, they explore ways to structural differences, constraint differ- express the needs of a federation DBA pre- ences, and missing data and (b) specifying cisely, to determine the degree of sharing the interrelationships among the schema the component DBA is willing to offer, and objects. The conforming step is closely tied to estimate the cost of a specific agreement to the comparing step since it is difficult to for both the component DBS and the compare unless the related information is FDBS. represented in a similar form in different schemas. Unless the schemas are represented in 4.4 Schema integration the same model, analyzing and comparing View integration refers to integrating mul- their schema objects is extremely difficult. tiple user views into a single schema (e.g., It is important to note that comparison of federated schema development in a top- the schema objects is primarily guided by down FDBS development process). Schema their semantics. not bv their svntax. Hence integration refers to integrating (usually ex- the choice of the CDM is critical. The CDM isting) schemas into a single schema (e.g., should be semantically rich; that is, it federated schema development by integrat- should provide abstraction and constraints

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 223

so that the semantics relevant to schema E2, and El is-disjoint-and-nonintegra- integration can be presented. Thus, seman- bleewith E2. Similar relationships exist tic (or conceptual) data models are much among relationship types. Elmasri et al. preferred over traditional data models such [1986] and Sheth et al. [1988b] use the as relational, network, or hierarchical. relationships among attributes in a heuris- Dayal and Hwang [1984] suggest that the tic algorithm to identify pairs of entity concept of generalization is important for types and relationship types that may be schema integration. One reason is that sim- related by the first three types of relation- ilar or related concepts are represented at ships. The actual relationships are specified different levels of abstractions in different by the person (the DBA in a tightly coupled schemas. For example, one schema may FDBS, a user in a loosely coupled FDBS) contain attributes Office-Phone-Number integrating the schemas. Using classifica- and Home-Phone-Number, both of which tion in the CANDIDE semantic data model can be shown as specializations of the at- [Beck et al. 19891 and the relationships tribute Telephone-Number in another among attributes, Sheth and Gala [1989] schema. suggest that many of the relationships Analyzing and comparing schema objects among entity types and relationship types is followed by specifying the interrelation- can be automatically discovered without ships among the schema objects. Consider additional human input. integrating two EER schemas. In this case, In [Sheth et al. 1988b], specification of there are three types of schema objects: relationships among schema objects is used entity types, relationship types, and attri- as an input to a lattice generation or merg- butes. In the methodology discussed by ing activity to generate a single integrated Elmasri et al. [1986] as well as in several schema. Other efforts, particularly those other schema integration methodologies, that do not rely on identifying relationships relationships among attributes in the two among schema objects, as discussed above, schemas are specified first. Relationships provide a set of operators for the DBA among other object types (e.g., the entity to correlate or integrate the objects to gen- types and the relationship types) follow. erate an integrated schema [Motro and Two attributes, al and a2, may be related Buneman 19811. in one of the three ways [Larson et al. 1989; One reason why a completely automatic Sheth and Gala 19891: al is-equivalent-to schema integration process (particularly a2, al includes (or is-included-in) a2, or for discovering attribute relationships) is al is-disjoint-with a2. Determining such not possible is because it would require that relationships can be time consuming and all of the semantics of the schema be com- tedious. If each schema has 100 entity pletely specified. This is not possible be- types, and an average of five attributes per cause, among other reasons, (1) the current entity type, then 250,000 pairs of attributes semantic (or other) data models are unable must be considered (for each attribute in to capture a real-world state completely, one schema, a potential relationship with (2) it will be necessary to capture much each attribute in other schemas should be more information than is typically cap- considered). Sheth and Gala [1989] argue tured in a schema, and (3) there can be that this task cannot be automated, and multiple views and interpretations of a hence we may need to depend on heuristics real-world state; and the interpretations to identify a small number of attribute pairs change with time. Convent [1986] formally that may be potentially related by a rela- argues that integrating relational schemas tionship other than is-disjoint-with. is undecidable. Two entity types, El and E2, may be Three tools developed to perform schema related in one of five ways [Elmasri et al. integration are reported in Hayes and Ram 1986; Navathe et al. 19861: El equals E2, [ 19901, Sheth et al. [1988b], and Souza El includes (or is-included-in) E2, El [ 19861. Sheth et al. [ 1988131, for example, overlaps (or may-be-integratable-with) describe a forms-based interactive tool to E2, El is-disjoint-but-integrable-with integrate EER schemas. It accepts the

ACM Computing Surveys, Vol. 22, No. 3, September 1990 224 l Amit Sheth and James Larson definitions of the schemas to be integrated, type/scale differences. Some systems using guides the integrator through the process multidatabase languages can define multi- of defining attribute equivalencies, uses at- ple semantics over the same data (e.g., dy- tribute equivalencies to rank entity type namic attributes in Litwin and Abdellatif and relationship type pairs that may be [ 19861). Examples of multidatabase lan- related heuristically, accepts assertions guages include MDSL [Litwin and Abdel- about the relationships among the entity latif 19871, MSQL [Litwin et al. 19871, types and relationship type pairs, checks GSQL [Jacobs 19851, a relational language for their consistency, and performs the with extended abstract data types [Czejdo merging task automatically. et al. 19871, and a graphical multidatabase After schemas have been integrated, it query language [Rusinkiewicz et al. 19891. may be necessary to decide how to allocate CALIDA [Jacobsen et al. 19881) provides a the data among multiple component DBSs. menu-based interface to formulate queries This distribution design task has been stud- against multiple component DBMSs. ied extensively [Ceri et al. 19871. 5.2 Command Transformation 5. FEDERATED DATABASE SYSTEM A command transformation processor OPERATION translates commands in one language, The previous section discussed some of the called the source language (or operations in important tasks for developing an FDBS. one data model, called the source data Many tasks relevant to the operation (i.e., model) into commands in another lan- run-time system) of distributed DBMSs are guage, called the target language (or oper- also relevant to the operation of multida- ations in another data model, called the tabase systems and FDBSs. In this section, target data model). Converting between we discuss four tasks that are either specific procedural and nonprocedural languages is to an FDBS (or a multi-DBMS system) or of particular interest. are significantly different from similar tasks in a distributed DBMS. These tasks 5.2.1 Converting a Nonprocedural Language are application independent. into a Procedural Language A common example of this type of conver- 5.1 Query Formulation sion is transforming relational algebraic op- The same query languages used in cen- erations into a sequence of CODASYL tralized and distributed DBMSs can programming language statements. Con- be used for formulating queries in a sider the relational operation join that tightly coupled FDBS. This is because a causes two tables to be combined. Two lightly coupled FDBS provides location, rows, one from each table, are joined if their distribution, and replication transpar- values satisfy a specified Boolean (the join encies. Most loosely coupled FDBSs pro- condition) condition. Using the example of vide a multidatabase language to allow a Section 4.1, joining the COMPANY and federation user to access data from multiple PRODUCT tables where the COMPANY- component DBSs. A multidatabase lan- NAME of a COMPANY equals the COM- guage provides functions that are not pres- PANY-NAME of a PRODUCT results in ent in data manipulation languages used in a single table consisting of columns of both centralized and distributed DBMSs [Litwin the COMPANY and PRODUCT tables. et al. 19871. It provides a capability to Using code generation techniques, it is pos- define federated schemas as views over sible to automatically generate code con- multiple export schemas (or parts of sisting of two nested loops. The outer loop component schemas) and to formulate reads successive COMPANY records. The queries against such a view. In addition the inner loop reads successive PRODUCT rec- language deals with problems identified in ords in the PRODUCES set (owned by the discussion on schema integration, such COMPANY) and constructs rows of the as naming conflicts and data structure/ joined table.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 225

This idea can be generalized as follows. thesizing relational algebraic operations For each relational operation, R, develop from a sequence of CODASYL program- an algorithm, A, which generates equiva- ming language statements. This problem is lent CODASYL code. For each nested op- quite difficult. Demo [ 19831 suggests a data eration, Rl and R2, develop an algorithm flow analysis approach that may be used for using the corresponding generation al- for some cases. Solving the general case is gorithms Al and A2 to generate an equiv- an open research problem. Fortunately, the alent CODASYL code. For example, if a need for this type of transformation seldom query involves joining the DISTRIBUTOR arises because a procedural language is sel- table with the result of joining the COM- dom chosen for the federation level. PANY and PRODUCT tables, then gen- erate two nested loops that perform the 5.3 Query Processing and Optimization join of COMPANY and PRODUCT and place those two loops inside another loop In a loosely coupled FDBS, the FDBMS that generates the join of DISTRIBUTOR can support little or no query optimization table with the result of joining COMPANY (see Section 2.1). In a tightly coupled and PRODUCT. FDBS, the FDBMS can perform extensive Another approach involves generating all query optimization. Query processing in- possible strategies for performing nested volves converting a query against a feder- relational operations, then selecting the ated schema into several queries against best strategy. Heuristics can be used to the export schemas (and the corresponding generate only promising strategies, de- component DBSs) and executing these creasing the time to perform optimization queries. Query processing in an FDBMS is at the cost of producing a near-optimal similar to that in a distributed DBMS (see rather than optimal strategy. Rules for con- e.g., [Yu and Chang 19841 for a survey of verting one strategy into a better strategy query-processing techniques for distributed are described by Chu and Hurley [1982] DBMSs). In an FDBMS, however, the and Ceri and Pelagatti [ 19841. following additional complexities may An approach taken by some researchers be introduced due to heterogeneity and is to use an intermediate language that can autonomy: serve as an umbrella for multiple target The cost of performing an operation may languages and be able to express most, if be different in different component not all, of the operations expressed in the DBSs. Due to autonomy of a component target languages. Piatetsky-Shapiro and DBS, the cost of performing the opera- Jakobson [1987] define a language called tion in a component DBMS may not be DELPHI that combines the power of rela- known or may only be approximately tional algebra with many additional known. In addition, this cost may vary database operations, including grouping, from time to time as the local system load sorting, aggregates, and nested queries. changes. They also describe rule-based transforma- The component DBMSs may differ in tions to convert DELPHI to different tar- their abilities to perform local query get query languages such as SQL and optimizations. fourth-generation languages. In CALIDA [Jacobson et al. 19881, which is a loosely The system and database operations pro- coupled FDBS, a user’s interactions with a vided by each of the component DBMSs menu-based interface results in a query in and the FDBMSs may be different. Ex- DELPHI, which is then converted into amples of system operations include the queries in the desired target language. ability to create temporary relations and the ability to receive and reference data from other sites. Examples of database 5.2.2 Converting a Procedural Language into a Nonprocedural Language operations include the different rela- tional algebraic operations (join, selec- An example of converting a procedural lan- tion, etc.) and aggregations. Not all guage into a nonprocedural language is syn- component DBMSs may support the

ACM Computing Surveys, Vol. 22, No. 3, September 1990 226 . Amit Sheth and James Larson

same operations and aggregations (or Similar trade-offs exist in a distributed their semantics may vary, e.g., one com- DBMS, but the heterogeneity and auton- ponent DBMS may remove duplicates omy add considerably to the complexity. but another may not). Three FDBMS query optimization design approaches are as follows: Landers and Rosenberg [1982] discuss optimization problems and solutions Simple LDIs and GDM: The GDM adopted for some of the above issues in transforms the global transaction into Multibase. Mermaid has implemented and the smallest possible local transactions. tested comprehensive algorithms for query An LDI (and hence a component DBS) optimization [Chen et al. 19891 that involve receives multiple local transactions for a dynamically changing network environ- each global transaction. The LDI sends ment and different processing costs at dif- the result of each local transaction to the ferent component DBMSs. GDM. The GDM merges all the results. A query evaluation plan can be formu- Multibase takes this approach. It results lated as a program in an intermediate task in heavy workload for the GDM and sig- specification language and then executed. nificant communication between the DOL (Distributed Operation Language) GDM and the LDIs. [Rusinkiewicz et al. 19901 is an example of Medium complexity of the GDM and such a language. Its main functions include LDIs: The GDM transforms the global invocation of local and remote tasks (proc- transaction into a set of the largest pos- esses), synchronization of their execution, sible local transactions (one for each rel- data exchange between tasks (including re- evant component DBS). This reduces the formatting), and exception handling. The GDM workload and communication be- commands for the local systems can be tween the GDM and the LDIs. Fewer embedded in the DOL programs that are local transactions are sent to LDIs, and forwarded to the local systems and executed the data returned to the GDM is a result under local control. of processing a more complete local Query processing in an FDBMS can be transaction). divided into global processing performed by Complex GDM and LDIs: In this case, the Global Data Manager (GDM) (a con- GDM generates efficient programs that structing processor) of an FDBMS and Zocal involve participation of LDIs in the processing performed by the Local Data- global optimization. Partial results can base Interface (LDI) (a transforming pro- be sent to the GDM or other LDIs. LDIs cessor) associate with a component DBS. have to support additional functionalities Global query processing and optimization like sorting, removing duplicates, and relate to processing a query or transaction handling and merging temporary files. submitted by a federation user, called a DDTS and Mermaid take this approach. global transaction, and dividing it into mul- It results in a further reduction in com- tiple subtransactions, called local transac- munication and GDM workload. Rusin- tions, for the LDIs of the appropriate kiewicz and Czejdo [1987] discuss an component DBSs. Local query processing algorithm that attempts to maximize lo- and optimization relate to processing a lo- cal processing and minimize data transfer cal transaction at a single component DBS. between component DBSs. Each is discussed below. Global optimization involves evaluating Local optimization involves optimizing the following trade-offs: execution of local transaction received from the GDM. An FDBMS provides additional requirements as well as opportunities for l The amount of work done by the GDM and the complexity of the LDI, and local optimization as follows:

l The amount of communication and pro- l Query languages for the component cessing done by different component DBMSs may be different. Thus, opera- DBSs. tion transformation for different pairs of

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 227

the query language used by the FDBS FDBMS support different concurrency and the query language provided by a control mechanisms [Gligor and Popescu- component DBMS may involve different Zeletin 19861. The problem of global dead- techniques. lock detection must also be addressed l Physical database organization of each of [Logar and Sheth 19861. the component DBSs may be different. Several researchers are currently study- Similarly, criteria for access path selec- ing these problems [Alonso et al. 1987; tion for different component DBMSs Breitbart and Silberschatz 1988; Elmagar- may be different. Due to the autonomy mid and Helal 1988; Pu 19871. As argued of the component DBMS, however, the by Barker and Ozsu [1988] and Du et al. GDM or an LDI may not have enough [ 19891, however, the published solutions information on these matters. often make unrealistic and pessimistic as- sumptions, support a low level of concur- Onuegbe et al. [1983] and Dayal and rency, and/or sacrifice autonomy in order Goodman [1982] develop strategies for to obtain higher concurrency. local query optimization in DDTS and It is unlikely that a theoretically elegant Multibase, respectively. solution that provides conflict serializabil- ity without sacrificing performance (i.e., concurrency and/or response time) and 5.4 Global Transaction Management availability exists. One often accepted The Global Transaction Manager (GTM) trade-off is to limit the functionality of is responsible for maintaining database concurrency control in favor of preserving consistency while allowing concurrent up- site autonomy. Examples of this would be dates across multiple databases. Support- to allow only unsynchronized retrievals, ing global transaction management in an preclude multisite updates, or perform local environment with multiple heterogeneous updates off-line. Another approach is to and autonomous component DBSs is very devise mechanisms that specifically suit the difficult. This is underlined by the fact that limitations of a given environment and none of the prototype FDBMSs support provide the required level of consistency. global transaction management. Nowhere Eliassen and Veijalainen [1987] propose a does the autonomy of component DBMSs concept of S-Transactions (for semantic present more problems than in supporting transactions) suited for a banking environ- updates. ment consisting of a network of highly au- There are two types of transactions to be tonomous systems. It may be desirable to managed: global transactions submitted to devise solutions that do not meet the con- the FDBMS by federation users and local flict serializability criteria but that are transactions directly submitted to a com- practical and meet a desired level of consis- ponent DBMS by local users. The basic tency. Du and Elmagarmid [1989] propose problem in supporting global concurrency a weaker consistency criterion called Quasi- control is that the FDBMS does not know Serializability8 that works provided there about local transactions since a component are no value dependencies (e.g., referential DBMS is autonomous. That is, local wait- integrity constraints) across databases. for relationships are known only to the Garcia-Molina and Salem [1987] and transaction manager of the component Alonso et al. [1987] propose a concept of DBMS. Without knowledge about local as Sagas that provides semantic atomicity but well as global transactions, it is highly un- does not serialize execution of global trans- likely that efficient global concurrency con- actions. More work on weaker consistency trol can be provided. Due to the existence of local transactions, it is very difficult to recognize when the execution order differs ‘A Quasi-Serializable schedule is one in which the local transaction schedules are serializable and the from the serialization order at any site [Du global transactions are executed serially. We need to et al. 19891. Additional complications occur understand the practical significance of this and other when different component DBMSs and the proposed weaker consistency criteria better.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 228 l Amit Sheth and James Larson

criteria and a better understanding of their structured data, often called business data. significance in practical terms is needed. Recently, however, there has been a great Techniques to specify and execute the deal of activity for constructing DBMSs transactions that selectively provide ato- that manage data for the so-called nontra- micity, isolation, and durability properties ditional applications. These applications need to be further researched. Little atten- require less structured data in the forms tion has been given to the problems of fault such as text, image, graphics, and voice. tolerance, ensuring the integrity of redun- Current database research activities, par- dant data, and supporting integrity con- ticularly related to object-oriented database straints across component databases. systems, address centralized management of these types of data. We now need to investigate issues in integrating such sys- 6. FUTURE RESEARCH AND UNSOLVED tems [Sheth 1987b, 1988131. PROBLEMS The second significant extension is to We discussed the concept of federation in create information systems that not only the context of database systems. A feder- include database systems but also applica- ated database system is a collection of co- tion programs and expert systems. Such a operating but autonomous and possibly system may be called a federated knowledge heterogeneous database systems. A refer- base system. One of the main problems is ence architecture was used to study various to represent information contents, process- FDBS architectural alternatives and their ing capabilities, and semantics (including implications. A methodology for developing behavioral aspects) of programs and expert FDBSs, particularly the tightly coupled systems adequately. We need to define a FDBSs, was discussed. Finally, we dis- description that plays a role with respect to cussed important tasks that need to be an application program that is equivalent performed in order to develop and operate to the role a schema plays for a database. FDBSs. One such effort is that of a “capability Several problems need further research schema” for an application program and development. They include the [Ryan and Larson 19861. Models for fed- following: eration also need to be developed for such environments. Identifying and representing all seman- tics useful in performing various FDBS ACKNOWLEDGMENTS tasks such as schema translation and schema integration and determining con- We thank the referees, S. March, M. Rusinkiewicz, tents of schemas at various levels. G. Thomas, and M. Templeton, who have helped us Lack of software tools to aid in perform- toward the current version; their time and care is gratefully acknowledged. ing various FDBS tasks with a high de- gree of automation and an integrated REFERENCES toolset for developing, maintaining, and managing FDBSs. ABBOTT, K., AND MCCARTHY, D. 1988. Admi- Lack of adequate transaction manage- nistration and autonomy in a replication- transuarent distributed DBMS. In Proceedinm of ment algorithms that provide a specified the 14th International Conference on Very L&g; level of consistency (i.e., are correct with Data Bases (Aug.), pp. 195-205. respect to a given consistency criteria) ALONSO, R., AND BARBARA, D. 1989. Negotiating and fault tolerance at acceptable perfor- data access in federated database systems. In mance within the heterogeneity and au- Proceedings of the 5th International Conference tonomy constraints of an FDBS. on Data Engineering (Feb.), pp. 56-65. ALONSO, R., GARCIA-M• LINA, H., AND SALEM, K. How to address management and effi- 1987. Concurrency control and recovery for ciency issues related to autonomy of com- global procedures in federated database systems. ponent DBSs. In 9. Bull. IEEE-CS TC Data Em.- 10. 3 (Sept.), 5-11. The focus of the past activities in FDBSs BARKER, K., AND Ozsu, T. 1988. A survey of issues has been on databases that store more in distributed heterogeneous database systems.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 229

Tech. Rep. TR 88-9, Univ. of Alberta Edmonton, 243, Goos, G. and Hartmanis, J. Eds. Springer- Canada. Verlag, New York, pp. 141-156. BATINI, C., LENZERINI, M., AND NAVATHE, S. CZEJDO, B., RUSINKIEWICZ, M., AND EMBLEY, D. 1986. A comparative analysis of methodologies 1987. An approach to schema integration and for database schema integration. ACM Comput. query formulation in federated database systems. Suru. 18, 4 (Dec.), 323-364. In Proceedings of the 3rd International Conference BECK, H., GALA, S., AND NAVATHE, S. 1989. on Data Engineering (Feb.), pp. 477-484. Classification as a query processing technique in DATE, C. 1986. An Introduction to Database Systems, CANDIDE semantic data model. In Proceedings Vol. 1, 4th ed. Addison-Wesley, Reading, Mass. of the 5th International Conference on Data En- DAYAL, U., AND GOODMAN, N. 1982. Query optimi- gineering (Feb.), pp. 572-581. zation for CODASYL database systems. In Pro- BELCASTRO, V., ET AL. 1988. An overview of the ceedings of the ACM SZGMOD Conference, distributed query system D&S. In Proceedings of pp. 13&156. the International Conference on Extending Data DAYAL, U., AND HWANG, H. 1984. View definition Base Technolozy (Venice, Italv. Mar.). In Com- and generalization for database integration in a puter Science.--Vol. 303,’ Sp&ger-V&lag, New multidatabase system. IEEE Trans. Soft. Eng. York, pp. 170-189. SE-IO, 6 (Nov.), 628-644. BERTINO, E., AND HAAS, L. 1988. Views and security DE 1987. Special issue on federated database systems in distributed database management systems. In (mainly transaction management aspects). Proceedings of the International Conference on Q. Bull. IEEE-CS TC Data Eng. 10, 3 (Sept.). Extending Database Technology (Venice, Italy, DEMO, B. 1983. Program analysis for conversion Mar.). In Computer Science. Vol. 303, Springer- from a navigational to a specification data base Verlag, New York, pp. 155-169. interface. In Proceedings of the 9th International BLAKEY, M. 1987. Basis of a partially informed dis- Conference on Very Large Data Bases (Florence, tributed database. In Proceedings of the 13th In- Italy, Oct.), pp. 387-398. ternational Conference on Very Large Data Bases DEVOR, C., ELMASRI, R., LARSON, J., RAHIMI, S., AND (Brighton, UK, Sept.), pp. 381-388. RICHARDSON, J. 1982b. Five-schema architec- BREITBART, Y ., AND SILBERSCHATZ, A. 1988. ture extends DBMS to distributed applications. Multidatabase update issues. In Proceedings of Electron. Des. (Mar. 18), 27-32. the ACM SZGMOD Conference (June), 135-142. DEVOR, C., ELMASRI, R., AND RAHIMI, S. 1982a. The BRZEZINSKI, Z., GETTA, J., RYBNIK, J., AND STEP- design of DDTS: A testbed for reliable distributed NIEWSKI, W. 1984. UNIBASE-An integrated database management. Tech. Rep. HP-82-273: access to databases. In Proceedings of the 10th 17-38, Honeywell Computer Sciences Center, International Conference on Very Large Data Camden, Minn. Bases (Singapore, Aug.), pp. 388-395. DP 1988. Special issue on heterogeneous distributed CARDENAS, A. 1987. Heterogeneous distributed da- database systems. L. Lilien, Ed. D&rib. Process. tabase management: The HD-DBMS. In Proc. Tech. Comm. News. (Q.) 10, 2 (Nov.). ZEEE 75,5 (May), 588-600. Du, W., AND ELMAGARMID, A. 1989. Quasi serializ- CERCONE, N., MORGESTERN, M., SHETH, A., AND ability: A correctness criterion for global concur- LITWIN, W. 1990. Resolving semantic hetero- rency control in interbase. In Proceedings of the geneity. Panel at the International Conference on 15th International Conference on Very Large Data Data Engineering (Feb.). Bases (Amsterdam, Aug.), pp. 347-355. CERI, S., AND PELAGATTI, G. 1984. Distributed Du, W., ELMAGARMID, A., AND KIM, W. 1990. Databases-Principles and Systems. McGraw- Effects of local autonomy on heterogeneous dis- Hill, New York. tributed database systems. MCC Tech. Rep. CERI, S., PERNICI, B., AND WIEDERHOLD, G. 1987. ACT-OODS-EI-059-90, Microelectronics and Distributed database design methodologies. In Computer Technology Corp., Austin Tex. Proc. IEEE 75, 5 (May), 533-546. Du, W., ELMAGARMID, A., LEU, Y., AND OSTERMANN, CHEN, P. 1976. The entity-relationship model: S. 1989. Effects of local autonomy on global Toward a unified view of data. ACM Trans. Da- concurrency control in heterogeneous database tabase Syst. 1, 1 (Mar.), 9-36. systems. In Proceedings of the 2nd International CHEN, A., BRILL, D., TEMPLETON, M., AND Yu, C. Conference on Data and Knowledge Systems for 1989. Distributed query processing in Mermaid: Manufacturing and Engineering (Oct.). A frontend system for multiple databases. IEEE DWYER, P., AND LARSON, J. 1987. Some experiences J. Selected Areas Commun. 7, 3 (Apr.), 390-398. with a distributed database testbed system. In CHU, W., AND HURLEY, P. 1982. Optimal query pro- Proc. IEEE 75, 5 (May), 633-647. cessing for distributed databases systems. IEEE ELIASSEN, F. AND VEIJALAINEN, J. 1987. An S- Trans. Comput. C-31 (Sept.), 835-850. transaction definition language and execution CONVENT, B. 1986. Unsolvable problems related to mechanism. Tech. Rep. No. 275, GMD, Harden- the view integration approach. In Proceedings of bergplatz, D-1000 Berlin 12, FRG. the Znternational Conference on Database Theory ELIASSEN, F., AND VEIJALAINEN, J. 1988. A func- (Rome, Italy, Sept.). In Computer Science, Vol. tional approach to information system interoper-

ACM Computing Surveys, Vol. 22, No. 3, September 1990 230 l Amit Sheth and James Larson

ability. In Research into Networks and Distributed MIT/LCS/TM-141, Massachusetts Institute of Applications (Proceedings of the EUTECO ‘88), Technology, Cambridge, Mass. Speth, R. Ed., Elsevier Science Publishers B.V., HAMMER, K., AND TIMMERMAN, T. 1989. North-Holland, pp. 1121-1135. Automating data conversion between heteroge- ELLINGHAUS, D., HALLMANN, M., HOLTKAMP, B., neous databases. Tech. Rep. ACA-ST-046-89, AND KREPLIN, K. 1988. A multidatabase system Microelectronics and Computer Technology for transaction autonomy. In Proceedings of the Corp. International Conference on Extending Database HAYES, S., AND RAM, S. 1990. Multi-user view in- Technology (Venice, Italy, Mar.). In Computer tegration system (MUVIS): An expert system for Science, Vol. 303, Springer-Verlag, New York, view integration. In Proceedings of the 6th Znter- pp. 600-605. national Conference on Data Engineering (Feb.). ELMAGARMID, A. 1987. When will we have true het- HEIMBIGNER, D., AND MCLEOD, D. 1985. A feder- erogeneous databases? (A position statement on ated architecture for information management. ). In Proceedings of the ACM Trans. Off. Znf. Syst. 3, 3 (July), 253-278. Fall Joint Computer Conference (Dallas, Tex., Oct.), p. 746. HULL, R., AND KING, R. 1987. Semantic database ELMAGARMID, A., AND HELAL, A. 1988. Supporting modeling: Survey, applications, and research is- updates in heterogeneous distributed database sues. ACM Comput. Suru. 19,3 (Sept.), 201-260. systems. In Proceedings on the International Con- IEEE 1987. Special issue on distributed database ference on Data Engineering (Feb.). systems. Proc. IEEE 75, 5 (May). ELMASRI, R. 1981. GORDAS: A data definition, IISS 1986. The integrated information support sys- query and update language for the entity-cate- tem. Gateway 2, 2. Industrial Technology Insti- gory-relationship model of data. Tech. Rep. HR- tute (Mar.-Apr.). 81-250, Honeywell Inc., Camden, Minn. JACOBS, B. 1985. Applied Database Logic II: Hetero- ELMASRI, R., AND NAVATHE, S. 1989. Fundamentals geneous Distributed Query Processing. Prentice- of Database Systems. Benjamin/Cummings, Red- Hall, Englewood Cliffs, N.J. wood City, Calif. JACOBSON, G., PIATETSKY-SHAPIRO, G., LAFOND, C., ELMASRI, R., LARSON, J., AND NAVATHE, S. 1986. RAJINIKANTH, M., HERNANDEZ, J. 1988. Schema integration algorithms for federated da- CALIDA: A knowledge-based system for inte- tabases and logical database design. Tech. Rep., grating multiple heterogeneous databases. In Pro- Honeywell Corporate Systems Development Di- ceedings of the 3rd International Conference on vision, Camden, Minn. Data and Knowledge Bases (Jerusalem, Israel, ELMASRI, R., WEELDREYER, J., AND HEVNER, A. June), pp. 3-18. 1985. The category concept: An extension to KAUL, M., DROSTERN, K., AND NEUHOLD, E. entity-relationship model. Data and Knowledge 1990. ViewSystem: Integrating heterogeneous Engineering 1 (June). North-Holland, The Neth- information bases by object-oriented views. In erlands, pp. 75-116. Proceedings of the 6th International Conference FAGIN, R. 1978. On an authorization mechanism. on Data Engineering (Los Angeles, Calif., Feb.), ACM Trans. Database Syst. 3, 3, 310-331. pp. 2-10. GARCIA-M• LINA, H., AND KOGAN, B. 1988. Node KIM, W. 1989. Research directions for integrating autonomy in distributed systems. In Proceedings heterogeneous databases. In 1989 Workshop on of the International Symposium on Databases in Heterogeneous Databases (Chicago, Ill., Dec.). Parallel and Distributed Systems (Austin, Tex., LANDERS, T., AND ROSENBERG, R. 1982. An over- Dec.), pp. 158-166. view of Multibase. In Distributed Databases, GARCIA-M• LINA, H., AND SALEM, K. 1987. Sagas. H.-J. Schneider, Ed., North-Holland, The Neth- In Proceedings of the ACM SZGMOD Conference erlands, pp. 153-184. (May), pp. 249-259. LARSON, J. 1983a. Bridging the gap between network GE, L., JOHANNSEN, W., LAMERSDORF, W., REIN- and relational database management systems. HARDT, R., AND SCHMIDT, J. 1987. Import and Comput. (Sept.), 82-92. export of database objects in a distributed envi- ronment. In Proceedings of the ZFZP WG 10.3 LARSON, J. 1983b. Granting and revoking discre- Conference on Distributed Processing (Amster- tionary authority. Znf. Syst. 8, 4, 251-261. dam, Oct.). LARSON, J. 1989. Four reference architectures for GLIGOR, V., AND LUCKENBAUGH, G. 1984. Inter- distributed database management systems. connecting heterogeneous database management Computer, Standards and Interfaces, Vol. 8, systems. Comput. 17, 1 (Jan.), 33-43. pp. 209-221. GLIGOR, V., AND POPESCU-ZELETIN, R. 1986. LARSON, J., NAVATHE, S., AND ELMASRI, R. 1989. A Transaction management in distributed hetero- theory of attribute equivalence in databases with geneous database management systems. Znf. Syst. applications to schema integration. IEEE Trans. I I, 4,287-297. Softw. Eng. 15, 4 (Apr.), 449-463. HAMMER, M., AND MCLEOD, D. 1979. On database LIEN, Y. 1981. Hierarchical schemata for relational management system architecture. Tech. Rep. databases. ACM Trans. Database Syst. 6,48-69.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems 231

LITWIN, W. 1985. An overview of the multidatabase RAM, S., AND CHASTAIN, C. 1989. Architecture of system MRDSM. In Proceedings of the ACM Na- distributed data base systems. J. Syst. Softw. 10, tional Conference (Denver, Oct.), pp. 495-504. 2, 77-95. LITWIN, W., 1987. The future of heterogeneous da- ROSENTHAL, A., AND REINER, D. 1987. Theo- tabases. In Proceedings of the Fall Joint Computer retically sound transformations for practical da- Conference (Dallas, Tex., Oct.), pp. 751-752. tabase design. In Proceedings of the 6th Znterna- LITWIN, W., AND ABDELLATIF, A. 1986. Multi- tional Conference on Entity-Relationship database interoperability. IEEE Comput. 19, 12 Approach (New York, Nov.), pp. 97-113. (Dec.), 10-18. RUSINKIEWICZ, M. 1987. Heterogeneous databases: LITWIN, W., AND ABDELLATIF, A. 1987. An overview Towards a federation of autonomous systems. In of multidatabase manipulation language MDSL. Proceedings of the Fall Joint Computer Conference In IEEE Proc. 75, 5 (May), 621-632. (Dallas, Tex., Oct.), pp. 751-752. LITWIN, W., AND ZEROUAL, A. 1988. Advances in RUSINKIEWCIZ, M., AND CZEJDO, B. 1987. An ap- multidatabase systems. In Research into Net- proach to query processing in federated database works and Distributed Applications (Proceedings systems. In Proceedings of the 20th International of the EUTECO ‘88). Sneth. R.. Ed. Elsevier Conference on System Sciences (Hawaii, Jan.), Science Publishers’ B.V., fiorth-Holland, pp. 430-440. pp. 1137-1151. RUSINKIEWICZ, M., OSTERMANN, S., ELMAGARMID, LITWIN, W., ABDELLATIF, A., NICOL,AS, B., VIGIER, A., AND LOA, K. 1990. The distributed opera- P., AND ZEROUAL, A. 1987. MSQL: A multida- tion language for specifying multisystem appli- tabase language. Tech. Rep. 695, INRIA, BP 105, cations. In Proceedings of the 1st International 78153 Le-Chesnay, France. Conference on Systems Integration (Morristown, LITWIN, W., BOUDENANT, J., ESCULIER, C., FERRIER, N.J., Apr.). A., GLORIEUX, A., LA CHIMIA, J., KABBAJ, K., RUSINKIEWICZ, M., ELMASRI, R., CZEJDO, B., MOULINOUX, C., ROLIN, P., AND STANGRET, C. GEORAKOPOULOUS, D., KARABATIS, G., JA- 1982. SIRIUS Systems for Distributed Data MOUSSI, A., LOA, L., AND Ll, Y. 1989. Management. In Distributed Data Bases, H.-J. OMNIBASE: Design and implementation of a Schneider, Ed. North-Holland, The Netherlands, multidatabase system. In Proceedings of the 1st pp. 311-366. Annual Symposium in Parallel and Distributed LOGAR, T., AND SHETH, A. 1986. Concurrency con- Processing (Dallas, Tex., May), pp. 162-169. trol issues in heterogeneous distributed database RYAN, K., AND LARSON, J. 1986. The use of E-R management systems. Tech. Memo, Honeywell models in capability schemas. In Proceedings of Computer Sciences Center, Camden, Minn. the 5th International Conference on the Entity- MOTRO, A., AND BUNEMAN, P. 1981. Constructing Relationship Approach (Dijoin, France, Nov.). superviews. In Proceeding of the ACM SZGMOD SELINGER, P., AND WADE, B. 1976. An authorization Conference (May), pp. 54-64. mechanism for a relational database system. NAVATHE, S., ELMASRI, R., AND LARSON, J. ACM Trans. Database Syst. 1, 3, 242-255. 1986. Integrating user views in database design. SIEGEL, M. 1987. A survey on heterogeneous data- IEEE Comput. 19, 1 (Jan.), 50-62. base systems. Tech. Note 87-174.1, GTE Labo- NAVATHE, S., GALA, S., KAMATH, A., KRISHNAMUR- ratories, Waltham, Mass. THY, A., SAVASERE, A., AND WANG, W. 1989. A SHETH, A. 1987a. Heterogeneous distributed data- federated architecture for heterogeneous infor- base systems: Issues in integration. The 3rd Zn- mation systems. In the Workshop on Heteroge- ternational Conference on Data Engineering. neous Database Systems (Chicago, Ill., Dec.). IEEE Press, Washington, D.C. ONUEGBE, E., RAHIMI, S., AND HEVNER, A. SHETH, A. 1987b. When will we have true heteroge- 1983. Local query translation and optimization neous database systems? In Proceedings of the in a distributed system. In AFZPS Conference Fall Joint Computer Conference (Dallas, Tex., Proceedings, Vol. 52, National Computer Confer- Oct.), pp. 747-748. ence, AFIPS Press, pp. 229-239. SHETH, A. 1988a. Managing and integrating un- Ozsu, M., AND VALDURIEZ, P. 1990. Principles of structured and structured data: Problems of rep- Distributed Database Systems. Prentice-Hall, resentation, features, and abstraction. In Englewood Cliffs, N.J. Proceedings of the 4th International Conference PECKHAM, J., AND MARYANSKI, J. 1988. Semantic on Data Engineering. pp. 598-599. data models. ACM Comput. Suru. 20, 3 (Sept.), SHETH, A. 1988b. Building Federated Database Sys- 153-190. tems. In D&rib. Process. Tech. Comm. Newsl. 10, PIATETSKY-SHAPIRO, G., AND JAKOBSON, G. 1987. 2 (Nov.), 50-58. An intermediate database language and its rule- SHETH, A., AND GALA, S. 1989. Attribute relation- based transformation to different database lan- ships: An impediment in automating schema in- guages. Data and Knowledge Engineering 2,1-29. tegration. In the Workshop on Heterogeneous Pu, C. 1987. Superdatabases: Transactions across Database Systems (Chicago, Ill., Dec.). database boundaries. In Q. Bull. IEEE-CS TC SHETH, A., LARSON, J., AND WATKINS, E. Data Eng. 10, 3 (Sept.), 19-25. 1988a. TAILOR: A tool for updating views. In

ACM Computing Surveys, Vol. 22, No. 3, September 1990 232 l Amit Sheth and James Larson

Proceedings of the International Conference on et al. 1986, Brzezinski et al. 1984, Cardenas 1987, Extending Database Technology (Venice, Italy, Cardenas and Pirahesh 1980, Chung 1990, Deen Mar.). In Computer Science, Vol. 303, Springer- et al. 1985, Devor et al. 198213,Dwyer et al. 1986, Verlag, New York, pp. 190-213. Dwyer and Larson 1987, Ferrier and Stangret SHETH, A., LARSON, J., CORNELLIO, A., AND 1983, Furlani et al. 1983, Ge et al. 1987, Hammer NAVATHE, S. 1988b. A tool for integrating con- and McLeod 1979, Heimbigner and McLeod 1985, ceptual schemas and user views. In Proceedings IISS 1986, Jacobson et al. 1988, Landers and of 4th International Conference on Data Engi- Rosenberg 1982, Litwin 1985, Litwin and Vifier neering. pp. 176-183. 1987, Litwin et al. 1982, Rajinikanth et al. 1990, Rusinkiewicz et al. 1989, Smith et al. 1981, Spac- SOUZA, J. 1986. SIS: A schema integration system. capietra et al. 1982, Stephenson and Main 1986, In Proceedines of the BNCOD5 Conference. Stocker et al. 1984, Templeton et al. 1983, Tem- pp. 167-185. - ’ pleton et al. 1987b, Tsubaki and Hotaka 1980. TEMPLETON, M., LUND, E., AND WARD, P. 1987a. Pragmatics of access control in Mermaid. In Schema Translation: Elmasri et al. 1985, Jajodia Q. Bull. IEEE-CS TC Data Eng. 10, 3 (Sept.), and Ng 1983, Kalinichenko 1978, Katz 1980, Klug 33-38. 1981, Larson 1983a, Lien 1981, Morgestern 1981, Navathe 1980, Navathe and Cheng 1983, TEMPLETON, M., BRILL, D., CHEN, A., DAO, S., Pelagatti et al. 1978, Senko 1976, Sibley and LUND, E., MCGREGOR, R., AND WARD, P. Hardgrave 1977, Shoval and Even-Chaime 198713. Mermaid: A front-end to distributed het- 1987, Vassiliou and Lachovsky 1980, Zaniolo erogeneous databases. In Proc. IEEE 75,5 (May), 1979 (references), Zaniolo 1979, (bibliography). 695-708. Schema Integration: Batini and Lenzerini 1984, TEOREY, T. 1990. Database Modeling and Design: Batini et al. 1986, Czejdo et al. 1987, Deen et al. The Entity-Relationship Approach, Chaps. 8-9. 1987, Effelsherg and Mannino 1984, Elmasri Morgan Kaufmann, San Mateo, Calif. 1980, Elmasri et al. 1986, Hayes and Ram 1990, TEOREY, T., YANG, D., AND FRY, J. 1986. A logical Larson et al. 1989. Mannino and Effelsberg 1984, design methodology for relational databases using Navathe et al. 1986, Sheth et al. 1988b,-Sheth the extended entity-relationship model. ACM and Gala 1989, Souza 1986. Comput. Surv. 18, 2 (June), 197-222. Multidatabase Languages: Deen et al. 1987, Ja- THOMAS, G., et al. 1990. Heterogeneous distributed cobs 1985, Lamersdorf et al. 1987, Litwin and database systems for production Use. Comput. Abdellatif 1987, Litwin et al. 1988, Piatetsky- Surv. 22, 3 (Sept.), 237-266. Shapiro and Jakobson 1987, Rusinkiewicz et al. TSICHRITZIS, D., AND KLUG, A. Eds. 1978. The 1989. ANSI/XB/SPARC DBMS framework. Inf. Syst. Operation Translation and Optimization: Czejdo 3, 4. et al. 1987, Dayal 1983, Deen et al. 1987, Katz TSICHRITZIS, D., AND LOCHOVSKY, F. 1982. Data 1980, Onuegbe et al. 1983, Vassiliou and Lachov- Models, Chap. 14. Prentice-Hall, Englewood sky 1980, Zaniolo 1979 (bibliography). Cliffs, N.J. Transaction Management: Alonso et al. 1987, VEIJALAINEN, J., AND POPESCU-ZELETIN, R. 1988. Bernstein and Goodman 1981, Breibart et al. Multidatabase systems in ISO/OSI environment. 1987, Breitbart and Silberschatz 1988, DE 1987, In Standards in Information Technology and In- Du and Elmagarmid 1989, Du et al. 1989, Eliassen dustrial Control, Malagardis, N., and-Williams, and Veiialainen 1987. Elmagarmid and Du 1990. T., Eds. North-Holland. The Netherlands. , DD.__ Elmagaimid and Helal 1988, Elmagarmid and 83-97. Leu 1987, Gligor and Popescu-Zeletin 1985, Logar WANG, C., AND SPOONER, D. 1987. Access control and Sheth 1986, Pu 1987, Thomson 1987, in a heterogeneous distributed database manage- Veijalainen and Popescu-Zeletin 1986. ment system. In Proceedings of the 6th Sympo- ADIBA, M., AND DELOBEL, C. 1977. The problem of sium on Reliability in Distributed Software and the cooperation between different D.B.M.S. In Database Systems (Mar.). Architecture and Models in Data Base Manage- Yu, C., AND CHANG, C. 1984. Distributed query pro- ment Systems, Nijssen, G., Ed. North-Holland, cessing. ACM Comput. Surv. 16, 4, (Dec.), The Netherlands. 399-433. ADIBA, A., AND PROTAL, D. 1978. A cooperative sys- ZANIOLO, C. 1979. Design of relational views over tem for heterogeneous data base management network schemas. In Proceedings of the ACM systems. Znf. Syst. 3, 209-215. SIGMOD Conference, pp. 179-190. BATINI, C., AND LENZERINI, M. 1984. A methodol- ogy for data schema integration in entity- relationship model. IEEE Trans. Softw. Eng. BIBLIOGRAPHY SE-IO, 6 (Nov.). BELL, D., et al. 1987. MULTI-STAR: A multidata- Multi-DBMS and Federated Database Systems: base system for health information systems. In Adiba and Delobel 1977, Adiba and Portal 1978, Proceedings of the 7th International Conference Belcastro et al. 1988, Bell et al. 1987, Breitbart on Medical Informatic (Rome, Italy, Sept.).

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 233

BREITBART, Y., OLSON, P., AND THOMPSON, G. 1986. FURLANI, C., et al. 1983. The automated manufac- Database integration in a distributed heteroge- turing research facility of the national bureau of neous database system. In Proceedings 2nd Inter- standards. In Proceedings of the Summer Com- national Conference on Data Engineering. pp. puter Simulation Conference (Vancouver, Can- 301-310. ada, July). BREITBART, Y., SILBERSCHATZ, A., AND THOMPSON, HSIAO, D., AND KAMEL, M. 1989. Heterogeneous G. 1987. An update mechanism for multidata- databases: Proliferation, issues, and solutions. base systems. In Q. Bull. IEEE-CS TC Data Eng. IEEE Trans. Knowledge Data Eng. 1, 1,45-62. 10, 3 (Sept.), 12-18. HONG, S., AND MARYANSKI, F. 1988. Database de- CARDENAS, A., AND PIRAHESH, H. 1980. Data base sign tool generation via software reusability. In communication in a heterogeneous data base Proceedings of the COMPSAC. management system network. Inf. Syst. 5,55-79. JAJODIA, S., AND NG, P. 1983. On representation of CD 1988. Common Data model*plus. Product liter- relational structures by entity-relationship dia- ature, Control Data Corp. grams. In Entity Relationship Approach to Soft- CHUNG, C. 1990. DATAPLEX: An access to heter- ware Engineering, Davis et al., Eds. Elsevier ogeneous distributed databases. Commun. ACM Science Publishers, New York. 33, 1 (Jan.), 70-80. KALINICHENKO, L. 1978. Data model transforma- CODASYL, 1971. Database task group of CODA- tion method based on axiomatic data model ex- SYL programming language committee. Report. tension. In Proceedings of the 4th Very Large Data (Apr.). Base Conference. DAYAL, U. 1983. Processing queries over generaliza- KATZ, R. 1980. Database design and translation for tion hierarchies in a multidatabase system. In multiple data models. Ph.D. dissertation, Memo Proceedings of the 9th International Conference No. UCB/ERL M80/24, College of Eng., Univ. of on Very Large Data Bases. California, Berkeley, Calif. DEEN, S., AMIN, R., AND TAYLOR, M. 1987. Data KLUG, A. 1981. Multiple view, multiple data model integration in distributed databases. IEEE Trans. support in the CHEOPS database management Softw. Eng. SE-13, 7. system. Tech. Rep. 418, Computer Science Dept., DEEN, S., AMIN, R., OFORI-DWUMFUO, G., AND Univ. of Wisconsin-Madison, Madison, Wise. TAYLOR, M. 1985. The architecture of a gener- LAMERSDORF, W., ECKHARDT, H., EFFELSBERG, W., alized distributed data base system-PERCI*. JOHANNSEN, W., REINHARDT, K., AND SCHMIDT, Comput. J. 28, 3, 209-215. J. 1987. Database programming for distributed DWYER, P., KASRAVI, K., AND PHAM, M. 1986. A office systems. In Proceedings IEEE Comp. SOC. heterogeneous distributed database management Symp. on Office Automation (Gaithersburg, Md.). system (DDTS/RAM). Tech. Rep. CSC-86- 7:8216, Honeywell Computer Sciences Center, LARSON, J., NAVATHE, S., AND ELMASRI, R. 1989. A Camden, Minn. theory of attribute equivalence in databases with applications to schema integration. IEEE Trans. W., AND MANNINO, M. 1984. EFFELSBERG, Softw. Eng. 15, 4 (Apr.), 449-463. Attribute equivalence in global schema design for heterogeneous distributed databases. Inf. Syst. 9, LITWIN, W., AND VIGIER, P. 1987. New capabilities 3/4. of the multidatabase system MRDSM. In HLSUA Forum XLV Proceedings (New Orleans, ELMAGARMID, A., AND Du, W. 1990. A paradigm for concurrency control in heterogeneous distributed Oct.). database systems. In Proceedings of the 6th Inter- MANNINO, M., AND EFFELSBERG, W. 1984. national Conference on Data Engineering. (Feb.). Matching techniques in global schema design. In Proceedings of First International Conference on ELMAGARMID, A., AND LEU, Y. 1987. An optimistic concurrency control algorithm for heterogeneous Data Engineering (Apr.), pp. 418-425. distributed database systems. In Q. Bull. IEEE- MORGESTERN, M. 1989. A unifying approach for CS TC Data Eng. 10, 3 (Sept.), 26-32. conceptual schema to support multiple data ELMASRI, R. 1980. On the design, use, and integra- models. In Proceedings of the 2nd International tion of data models. Ph.D. dissertation, Rep. Conference on Entity-Relationship Approach STAN-CS-80-801, Computer Science Dept., (Washington, D.C., Oct.), pp. 281-299. Stanford Univ., Stanford. NAVATHE, S. 1980. An intuitive approach to nor- FERRIER, A., AND STANGRET, C. 1983. Hetero- malize network structured data. In Proceedings of geneity in the distributed database management the 6th International Conference on Very Large system SIRIUS-DELTA. In Proceedings of the Data Bases. 8th Very Large Data Base Conference (Mexico NAVATHE, S., AND CHENG, A. 1983. A methodology City). for database schema mapping from extended en- FONG, E., AND GOLDFINE, A. 1986. Data base direc- tity relationship models into the hierarchical tions: Information resource managementi Mak- model. In Entity Relationship Approach to Soft- ing it work, Executive Summary. SIGMOD ware Engineering, Davis, et al. Eds. Elsevier Sci- RECORD 15, 3 (Sept.). ence Publishers, New York.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 234 l Amit Sheth and James Larson

PELAGATTI, G., PAOLINI, P., AND BRACCHI, G. Meitner Institut, Berlin GmnH, D-1000 Berlin 1978. Mapping external views to common data 39, FRG. model. Znf Syst. 3. ZANIOLO, C. 1979. Multimode1 external schemas for RAJINIKANTH, M., JACOBSON, G., LAFOND, C., PAPP, CODASYL data base management systems. In w., AND PIATETSKY-SHAPIRO, G. 1990. Data Base Architecture, Bracchi, G., and Nijssen, Multiple database integration in CALIDA: De- G., Eds. North-Holland, The Netherlands. sign and integration. In Proceedings of the 1st International Conference on Systems Integration (Apr.). REINER, D., BROWN, G., FRIEDELL, M., LEHMAN, J., GLOSSARY MCKEE, R., RHEINGANS, P., AND ROSENTHAL, A. 1987. A Database designer’s workbench. In Entity-Relationship Approach, Spaccapietra, S., Accessing Processor: Software that ac- Ed. Elsevier Science Publishers. New York. cepts commands and produces data by pp. 347-360. executing commands against a database. SENKO, M., 1976. DIAM as a detailed example of Class of Users: A set of users performing the ANSI SPARC architecture. In Modelline in closely related tasks. Data Base Management Systems, Nijssen, G.,-Ed. Common Data Model (CDM): A data North-Holland, The Netherlands. model to which schemas of different com- SHIPMAN, D. 1981. The functional data model data language DAPLEX. ACM Trans. Database Svst. ponent DBMSs are translated for the 6, l-(Mar.), 140-173. purpose of representation in a com- SMITH, J., et al. 1981. Multibase: Integrating heter- mon format and facilitation of schema ogeneous distributed database systems. In Pro- integration. ceedings of the National Computer Conference. Component Database Administrator SPACCAPIETRA, S., DEMO, B., DILEVA, A., PARENT, (component DBA): The administrator C., CELLIS, C., AND BELFAR, K. 1982. An ap- of a component DBS (also called local proach to effective heterogeneous database co- operation. In Distributed Data Sharing Systems, DBA) who, among other things, decides van de Riet. R., and Litwin. W.. Eds. North- data access rights of local users as well Holland, The Netherlands, pp: 209-218. as (individuals and/or classes) of federa- STEPHENSON, G., AND MAIN, R. 1986. ARCHEDDA tion uses. It is the component DBA’s Prototype. Final Report. CRI, Great Britain. responsibility to manage the local STOCKER, P., et al. 1984. Proteus: A heterogeneous schema, translate it to create the com- distributed data-base project. In Databases: Role and Structure, Gray, P., and Atkinson, M., Eds. ponent schemas, and define export Cambridge University Press, New York. schemas. TEMPLETON, M., BRILL, D., CHEN, A., DAO, S., AND Component DBMS: A DBMS participat- LUND, E. 1986. Mermaid: Experiences with net- ing in a multidatabase system. A com- work operation. In Proceedings of the 2nd Znter- ponent DBMS participating in an FDBS national Conference on Data Engineering. is autonomous and allows local opera- TEMPLETON, M., BRILL, D., HWANG, A., KAMENY, I., tions as well as global (federation) oper- AND LUND, E. 1983. An overview of the mer- maid system. A frontend to heterogeneous data- ations that meet its constraints. bases. In Proceedings of EASCON 83. Component Schema: A translation of a THOMSON, G. 1987. Multidatabase concurrency local schema into an equivalent schema control. Ph.D. dissertation, Oklahoma State Uni- in the common data model. versity. Constructing Processor: Software that TSUBAKI, M., AND HOTAKA, R. 1980. Distributed partitions and/or replicates operations multidatabase environment with a supervisory data dictionary database. In Entity-Relationship produced by a single processor for exe- Approach to System Analysis and Design, Chen cution by two or more processors. Also P., Ed. North-Holland, The Netherlands. software that merges data produced by VASSILIOU, Y., AND LOCHOVSKY, F. 1980. DBMS two or more processors for consumption transaction translation. In Proceedings COMP- by another processor. SAC 80, IEEE Computer Software and Applica- tion Conference. Database Management System VEIJALAINEN, J., AND POPESCU-ZELETIN, R. 1986. (DBMS): Software that manages a col- On multi-database transactions in a cooperative, lection of structured data. Management autonomous environment. Tech. Rep., Hahn- includes providing data management

ACM Computing Surveys, Vol. 22, No. 3, September 1990 Federated Database Systems l 235

services including data access, constraint or data that can be passed to another enforcement, and consistency manage- processor. ment. Global (or External) Operation: An op- Database System (DBS): A database eration that is not submitted by a local system (DBS) consists of a DBMS that user (e.g., an operation submitted to the manages one or more databases. component DBS by the FDBMS or an- Distributed DBMS: A system that man- other component DBS). ages multiple databases. Local Schema: A schema of a component Export Schema: A subschema of a com- DBS in a native data model of the com- ponent schema specifically defined to ex- ponent DBMS. press the access constraints on a class of Local User: A user of a component DBS. federation users to access the data rep- Multidatabase System (MDBS): A sys- resented in the component schema. tem that allows operations on multiple External Schema: A subschema or a databases. view defined over a federated schema Processor: Software that performs oper- primarily for a pragmatic reason of not ations on data and commands. having to define too many federated Schema Object: A description of a data schemas or to tailor a federated schema element in a schema. For example, a for smaller groups of federation users schema expressed in the Entity Relation- than that of a federated schema. ship model has three types of schema Federated Database System (FDBS): objects: entity types, relationship types, A system that is created to provide op- and attribute definitions. An example of erations on databases managed by auton- an entity type schema object is an “Em- omous, and possibly heterogeneous, ployee” entity type. DBSs. The software that manages an Schema, Subschema, and View: A rep- FDBS is called a federated DBMS resentation of the structure (syntax), se- (FDBMS). mantics, and constraints on the use of a Federated Schema: An integration of database (or its portion) in a particular several export schemas created to repre- data model. A schema is a collection of sent data access requirements and rights schema objects. A subschema is a collec- of a class or group of federation users. tion of subsets of that schema’s objects. Federation Administrator (federation A view is any connected portion of a DBA): The administrator of a federated schema. In other words, a schema is a schema or the federated database sys- collection of views. tem who, among other things, creates Transforming Processor: Software that and maintains federated and external translates commands from one language schemas. or format to another language or format Federation User: A user of an FDBS or or translates data from one format to an application running over an FDBS. another. Filtering Processor: Software that con- User: An individual or an application us- strains operations that can be applied ing a database system.

Appendix: Features of Some FDBS/Multi-DBMS Efforts Table A.1 summarizes the choice of a common data model and the types of component DBMSs integrated into some of the experimental prototype systems. Table A.2 gives an empirical and partial list of significant or interesting features of some of these efforts. No effort is made to present all systems that exhibit FDBS features or to capture all important details of those mentioned.

ACM Computing Surveys, Vol. 22, No. 3, September 1990 236 l Amit Sheth and James Larson

Table A.l. Data Models of Various Multi-DBMS/FDBS Efforts Types of Component Effort CDM DBMSs Reference ADDS Extended Relational Hierarchical, Relational, Files [Breitbart et al. 19861 CALIDA Relational like Relational, Relational like, [Jacobson et al. 19881 Files DDTS Relational Network, Relational [Dwyer and Larson 19871 DQS Relational Hierarchical, Network, [Belcastro et al. 19881 Relational IISS E-R Variant (IDEFlx) Network, Relational [IISS 19861 Mermaid Relational (DIL) Relational (partial integration [Templeton et al. 1987b] of text) MRDSM Relational Relational [Litwin 19851 Multibase Functional (DAPLEX) Network, Relational [Landers and Rosenberg 1982 ] OMNIBASE Extended Relational Relational [Rusinkiewicz et al. 19891 SIRUS-DELTA Relational Network, Relational [Litwin et al. 19821 SCOOP E-R Hierarchical, Network, [Spaccapietra et al. 19821 Relational ’ Extended E-R is used for integrity constraint enforcement at the federated schema level and at the external schema level. The primary CDM can be said to be relational since the internal command language is based on the .

Table A.2. Significant Features Effort ADDS Tightly coupled, limited updates, in-house use of the prototype system CALIDA More like loosely coupled, intermediate language, interactive, menu- driven user interface, data dictionary editor/browser, in-house use of the prototype system DDTS Tightly coupled architecture, integrity constraint checking, local query optimization, use of local and long haul communication DQS Tightly coupled, completeness of implementation, auxiliary schema, IBM environment, attention to autonomy IISS More like tightly coupled, queries must be compiled, forms-based user interface Mermaid Tightly coupled, completeness of implementation, data dictionary, access control, global query optimization, extensions to include texts, experience with communication systems, workstation environment MRDSM Loosely coupled architecture, multidatabase language, dealing with mul- tiple semantic interpretations Multibase Tightly coupled, completeness of implementation, schema integration, global query optimization, auxiliary schema, architecture, number of component DBMSs integrated in various prototypes OMNIBASE Loosely coupled, DEC environment, query processing, distributed DBMS as a comnonent DBMS

Received November 1988; final revision accepted June 1990.

ACM Computing Surveys, Vol. 22, No. 3, September 1990