Database Integration and Federation Federated Database Systems

Database Integration and Federated Database Systems for Federation Managing Distributed, Heterogeneous, and Autonomous Databases. Erwin M. Bakker A.P. Seth, J.A. Larson DBDM 26-09 2005 ACM Computing Surveys, Vol.22, No. 3, September 1990 Federated Database Systems Federated Database Systems Database System (DBS) The term Federated Database Systems Database Management System (DBMS) was first used by Hammer and McLeod One or more Databases in 1979 and by Heimbigner and McLeod Federated Database System (FDBS) A collection of cooperating but autonomous component in 1985 database systems (integrated to various degrees) The federated database management system (FDBS) Used for different architectures provides a controlled and coordinated manipulation of the component DBSs Key characteristic: cooperation among independent systems 1 Federated Database Systems Federated Database Systems Component Database System (DBS) Can be centralized or distributed Can participate in one ore more federations Can be a FDBMS itself Can continue its local operations while participating in federation Managed by users of the federation or by the administrations of the FDBS and DBSs Characteristics of Federated Database Systems Database Systems Application of the federation concept for Characterization based on managing existing heterogeneous and Distribution Heterogeneity autonomous DBSs Data models Transaction management Various alternative architectures and Autonomy components Based on networking Development and operation of such Single, many DBSs in LAN, WAN systems Update related No updates, nonatomic, atomic 2 FDBS Data Distribution FDBS Heterogeneity Much of the data distribution is due to the existence Many Types of Heterogeneity of multiple participating DBSs Stored on single or multiple computers DBSs Co-located or geographically distributed Due to differences in DBMSs Vertical and horizontal database partitions Due to semantics of the data Availability Operating Systems Reliability Improved access times Hardware/Systems (In distributed databases the distribution of data is induced for these benefits) Communication Etc. FDBS Heterogeneity Differences in structure Different data models give different structural primitives (relations vs record type) For example: address can be an entity in one schema and a composite attribute in another Object oriented: generalization and inheritance (whereas others do not have those primitives) Differences in constraints Semantics like referential integrity constraints could be available or not (=> extra triggers would be needed to capture those semantics) Differences in Query Languages QUEL, SQL, MySQL, etc. Transaction Management Concurrency control, commit protocols, recovery, etc. 3 FDBS Semantic Heterogeneity FDBS Semantic Heterogeneity Meaning, interpretation, intended use of the same or related Meaning, interpretation, intended use of the same or related data (Difficult) data (Difficult) Example 1: Example 2: Attribute MEAL_COST of relation RESTAURANT in DB1 Attribute GRADE of relation COURSE in DB1 Average cost of a meal per person in a restaurant without Is grade of a student from the set of values [A,B,C,D,F] (C ~ service charge and tax [61,75]) Attribute MEAL_COST of relation BOARDING in DB2 Attribute SCORE of relation CLASS in DB2 Average cost of a meal per person inclusive charge and tax A normalized score on the scale from 0 to 10 rounded to 0.5 How do you compare? (7.5 ~ [73,77] DB1.RESTAURANTS.MEAL_COST with DB2.BOARDING.MEAL_COST How do values correspond? DB1.COURSE.GRADE with DB2.CLASS.GRADE FDBS Autonomy DBSs in a federation are in principle autonomous Different aspects (Popescu-Zeletin 1988) Design autonomy Communication autonomy Component DBMS can decide with who and when to communicate When and how to respond to requests Execution autonomy 4 FDBS Execution Autonomy FDBS Association Autonomy DBMS decides on priority and inference of DBS decides how much to share Functionality Local operations vs external operations Resources (data) When there is execution autonomy a FDMBS can not Services enforce an order of execution of the command on a Ability to associate or disassociate with the FDBMS component DBMS Control over local operation DBMS can abort commands that not meet local These are conflicting requirements with the need to constraints share => relaxing autonomy DBMS does not need to inform FDBMS about order 1. Entry or departure of DBS in FDBS must be based on an agreement Operationally it means treating external operations 2. Informing the FDBS about the order of execution the same way as local operations Federated Databases Systems Reference Architecture in terms of Processors and schemas FDBS Evolution process Tasks for FDBS development Schema translation Access control Negotiation Schema integration Tasks for FDBS operation: Query formulation Command transformation Query processing and optimization Transaction management 5 Reference Architecture: Components Most centralized, distributed and federated database systems can Be expressed using these basic Components. Processor Types Transforming processor Processor perform different functions on for data model transparency the data manipulation commands and Translates commands from one language accessed data: to another language Transforming processor (for data model Transforms data from one format to transparency) another format to enable Filtering processor Filtering processor Schema translation Constructing processor Needs mappings between objects of the Accessing processor schema’s 6 Filtering processor for data model transparency Constrain the commands and associated data that can be passed to another processor. Syntactic constraint checkers (commands syntactically correct) Semantic integrity constraint checker Check that the commands will not violate semantic integrity constraints (e.g. negative age, postal code, etc.) Modifies commands so that they will be semantically correct. Verifies produced data on semantically constraints Access control Constructing processor Partition and/or replicate an operation submitted by a single processor into operations that are accepted by two or more processors Also merge resulting data from several processors into a single data set for consumption by another processor Transparency on Location Distribution replication 7 Schema Types In the Accessing Processors Reference Architecture Centralized DBMS three levels (ANSI/X3/SPARC): Conceptual schema Conceptual or logical data structure description and their relationships Internal schema Physical characteristics of the logical data structures (placement of records, indexes, physical representations of relations, etc.) Chances can be done without implications on the conceptual schema External schema Subset/part of the database that may be accessed by a user or class of users is described by an external schema 8 A Five Level Schema Architecture for Federated Databases Local Schema Is the conceptual schema of a component DBS Component Schema Derived by translating local schemas into a data model A centralized DBMS in terms called the canonical or common data model of the FDBS Of our architecture. They describe divergent or local schema by using a single representation They add semantics that is missing in local schema They facilitate negotiation and integration for tightly coupled FDBS; and negotiation and specification of views and multiple database queries in loosely coupled FDBS) A Five Level Schema Architecture A Five Level Schema Architecture for Federated Databases for Federated Databases Export Schema External Schema define a schema for a user and/or A subset of a component schema thaht is available for a application, because of FDBS Customization Includes access control info (a filtering processor can be used for this) Federated schema can be large and difficult to change. External schema can be defined on a subset, using a different To facilitate control and management of association data model to meet user needs. autonomy Additional Integrity Federated Schema Can be specified, for example for data mining applications Integration of multiple export schemas Access control Info on data distribution is produced when integrating export schemas (sometimes kept in a separate distribution Export schemas provide access control to the data managed by or allocation schema) the component databases External schemas provide access control to the data managed Multiple federated schemas; one for each class of federation users, e.g., managers, employees, applications by the federated database 9 System Architecture for a Federated Database System The five level schema and the processor concept allows us to give the following system architecture for a FDBS. System Architecture for a Federated Database System The five level schema has some redundancies. Redundancies between external and federated schemas A federated schema could in principle be produced for every user, making the external schema redundant Redundancies between external schema and schema of a component DBS and an export schema If access control is correctly handle by the DBS and translation of a local schema into a component schema is not necessary When component DBS use the CDM (Common Data Model) used by the FDBS, it is not necessary to define a component schema 10 Additional Basic Components in Federated Database System Auxiliary schema

Load more