Query Languages for Relational Multidatabases

Query Languages for Relational Multidatabases

VLDBJourna~2, 153-171 (1993), Fred J. Maryanski, Editor ©VLDB Query Languages for Relational Multidatabases John Grant, Witold Litwin, Nick Roussopoulos, and Timos Sellis Received December 5, 1990; revised version received June 18, 1992; accepted November 2, 1992. Abstract. With the existence of many autonomous databases widely accessible through computer networks, users will require the capability to jointly manipu- late data in different databases. A multidatabase system provides such a capability through a multidatabase manipulation language, such as MSQL. We propose a theoretical foundation for such languages by presenting a multirelational algebra and calculus based on the relational algebra and calculus. The proposal is illus- trated by various queries on an example multidatabase. It is shown that properties of the multirelational algebra may be used for optimization and that every multire- iational algebra query can be expressed as a multirelational calculus query. The connection between the multirelational languages and MSQL, the multidatabase version of SQL, is also investigated. Key Words. Multidatabase, multirelational algebra, multirelational calculus, query optimization. 1. Introduction The great diversity of computing systems and software accessible through computer networks has created a major problem in the interoperability and integration of heterogeneous systems and components. A special case in the area of database systems, and the topic of this article, is the logical grouping and cohesive access of relational databases. A large mainframe with a database system or a database server like Compuserve may support dozens of databases. A local network can have several database servers and a database for hundreds of workstations (e.g., the ADMS system, Roussopoulos and Kang, 1986). A public database access system, such as the videotex system Teletel in France, may provide access to thousands of databases for millions of people. With the introduction of INGRES in Ultra 4.0, the OS2/DB presentation John Grant, Ph.D., is Professor, Dept. of Computer and Information Sciences, Towson State University, Towson, MD 21204; Witold Litwin, Ph.D., is Professor, Universite Paris Dauphine; PI. du Mal. de Lattre de 'Ihssigny,75775 Paris Cedex 16, France; Nick Roussopoulos, Ph.D., is Professor, and Timos Sellis, Ph.D., is Associate Professor, Dept. of Computer Science, University of Maryland; College Park, MD 20742, USA. 154 manager and other inexpensive relational systems, SQL databases are spreading out on workstations and PCs. A major consequence of this proliferation of databases is that users will usually have data in many databases and will frequently need to jointly manipulate data in different databases. A Teletel user who wishes to go out to a movie and dinner may wish to find the cinemas and restaurants that are on the same street in a cinema guide and restaurant guide database. The manager of a company may need to see the account balances that the company has at different bank branches. This manager may further wish to query all the bank branches (and their databases) simultaneously. A traveler looking for the cheapest route may need to query several airline, rail, and bus databases. Furthermore, an address change may require the updating of the corresponding databases. Such manipulation requires functions that do not exist in standard manipulation languages which are designed to manipulate data in a single database. Classical data manipulation languages deal with single relations as objects of manipulation, while the need is to allow sets of relations as the objects. Additionally, data in different databases may be presented in a different format so that corresponding relations may not be union-compatible. The differences between similar data result from different perceptions of the same real universe--local needs, competition for customers, political incompatibility, etc. It would be impossible to provide integration via a global schema because there would be no agreement on the global administrator. The example databases given in the next section, based on actual databases, show some of the differences that may occur. A system for the manipulation of data in autonomous databases is called a multidatabase system (MDBS) and the corresponding language is called a multi- database manipulation language (MML) (Litwin et al., 1982). Databases that may be manipulated together without global integration are called interoperable (Litwin and Abdellatif, 1986). Litwin et al., 1989 present the multidatabase extension of SQL, called MSQL. All the functions of SQL are by definition functions of MSQL. MSQL also contains new functions designed for a nonprocedural manipulation of data in different and nonintegrated SQL databases. The rationale is that a user's query over these databases should be expressible as a single statement. For this purpose, MSQL manipulates sets of tables, called multitables or multirelations. In particular, the result of a query may be a multitable. One way to deal with a multitable query is to decompose it into the set of relational queries it expresses and then to transform those queries to the relational algebra for optimization. Clearly, this is not necessarily the most efficient method. In particular, the techniques of Sellis (1988) for common subexpression processing are applicable. In addition to the detection of common subexpressions, the processing of useless queries should be avoided. A promising technique for dealing with multitable queries is to define an extension of the relational algebra and calculus for multitables. The system may then optimize a query both at a high level multitable formulation (first in the calculus and then in the algebra) and then perform optimization at the VLDB Journal 2 (2) Grant: Query Languages for Relational Multidatabases 155 single relational level. This is the approach we introduce here. Research related to multidatabase systems has been reported extensively in the past. In some sense, all the work in Distributed (homogeneous) Database Systems (Ceri and Pelagatti, 1984) is related. However, the work on heterogeneous databases is of more interest here; a comprehensive survey on this subject appears in Elmagarmid and Pu (1990). Most of the systems and approaches described address issues related to the interconnection of database systems that use different query languages and methods to allow exchange of data among such databases. The question of which model should be used to express queries in such systems is not addressed, however. Our research attempts to define such a model in the form of an algebra and a calculus that generalizes the standard relational algebra respectively for relational multidatabases. The structure of this article is as follows. Section 2 introduces the multidatabase model, presents the basic definitions and the example databases. Section 3 defines the multirelational algebra and uses it to solve queries on the example databases. Section 4 presents the multirelational calculus, uses it to solve some of the example queries, and reduces the multirelational algebra to the multirelational calculus. Section 5 shows the connections of the multirelational algebra and calculus with MSQL, the extension of SQL to multidatabases. Section 6 provides properties of the multirelational algebra that can be used for query optimization. Section 7 concludes this article. 2. Model and Example for Multidatabases We begin with a few definitions from relational databases. A relation schema is a relation name together with a set of attributes identified by names and drawing values from some domains. An element of a relation defined by the schema is called a tuple. A relation is a collection of tuples defined on the same schema. A database schema is a database name together with a set of relation schemas. A database is a set of relations satisfying the database schema. A query is a dynamic definition of a relation, including schema and instance, on a database. A multidatabase is a collection of databases; it may be explicitly or implicitly named. A multirelation is a set of relations dynamically defined by a multirelational query. The relations of a multirelation are usually in different databases of a multidatabase; they may have different arities, attribute names, and domain types for models of the same elements of the real universe. Database and multidatabase names may be used as prefixes to clarify the naming. A multirelational query defines a (target) muitirelation from a (source) multirelation through multirelational operations (to be defined later). A query contains some relation names, attribute names, and possibly domain value specifications (e.g., value expressions or unit and precision definitions, that are bound by the system to the actual names and value types). Some of the issues involving name equality, different units, and precision 156 will be considered in Section 6. A multirelational(data)model extends relational queries to multirelational queries. The two axioms given below relate our multirelational data model to the relational model. Axiom 1 refers to the naming of operations. Axiom 2 refers to compatibility. Axiom 1: A multirelational operation, O, transforms one or more multirelations into a multirelation. Axiom 2: For any relation, R, and relational operation on R, there is a multirelational operation with the same or appropriately modified name that provides the same result when applied to the multirelation M = {R}. By compatibility,

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    19 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us