A Language for Interoperability in Relational Multi-Database Systems*

SchemaSQL - A Language for Interoperability in Relational Multi-database Systems* Laks V. S. Lakshmanan+ Fereidoon SadriS Iyer N. Subramaniant 1 Introduction Abstract In recent years, there has been a tremendous pro- liferation of databases in the work place, dominated We provide a principled extension of SQL, called by relational database systems. An emerging need SchemaSQL , that offers the capability of uni- for sharing data and programs across the different form manipulation of data and meta-data in re- databases has motivated the need for Multi-database lational multi-database systems. We develop a systems (MDBS), sometimes also referred to as hetero- precise syntax and semantics of SchemaSQL in geneous database systems and federated database sys- a manner that extends traditional SQL syntax tems. Systems capable of operating over a distributed and semantics, and demonstrate the following. (1) network and encompassing a heterogeneous mix of SchemaSQL retains the flavour of SQL while sup- computers, operating systems, communication links, porting querying of both data and meta-data. (2) and local database systems have become highly desir- It can be used to represent data in a database able, and commercial products are slowly app’earing in a structure substantially different from origi- on the market. For surverys on MDBS, see [ACM901 nal database, in which data and meta-data may (in particular, Sheth and Larson, and Litwin, hark, be interchanged. (3) It also permits the cre- and Roussopoulos), and Hsiao [Hsi92]. ation of views whose schema is dynamically de- One of the fundamental requirements in a multi- pendent on the contents of the input instance. (4) database system is interoperabikty, which is the ability While aggregation in SQL is restricted to values to uniformly share, interpret, and manipulate informa- occurring in one column at a time, SchemaSQL tion in component databases in a MDBS. Almost all permits “horizontal” aggregation and even aggre- factors of heterogeneity in a MDBS pose challenges for gation over more general “blocks” of informa- interoperability. These factors can be classified into se- tion. (5) SchemaSQL provides a great facility mantics issues (e.g., interpreting and cross-relating infor interoperability and data/meta-data manage- formation in different local databases), syntactic issues ment in relational multi-database systems. We (e.g., heterogeneity in database schemas, data models, provide many examples to illustrate our claims. and in query processing, etc.), and systems issues (e.g., We outline an architecture for the implementa- operating systems, communication protocols, consis- tion of SchemaSQL and discuss implementation tency management, security management, etc). We algorithms based on available database technology focus on syntactic issues here. We consider the prob- that allows for powerful integration of SQL based lem of interoperability among a number of component relational DBMS. relational databases storing semantically similar information in structurally dissimilar ways. As was pointed l This work was supported by grants from the Natural Sci- out in [KLKSl], the requirements for interoperability ences and Engineering Research Council of Canada (NSERC), even in this case fall beyond the capabilities of conven- the National Science Foundation (NSF), and The University of North Carolina at Greensboro. tional languages like SQL. t Dept of Computer Science, Concordia University, Mon- Some of the key features required of a language for treal, Canada. {laks,subbu}Ocs.concordia.ca interoperability in a (relational) MDBS are the fol- $ Dept of Mathematical Sciences, University of North Car- olina, Greensboro, NC. [email protected] lowing. (1) The language must have an expressive Permission to copy without fee all or port of this material is power that is independent the schema with which a gmnted provided that the copies ore not, mode or distributed for database is structured. For instance, in most conven- direct commercial advantage, the VLDB copyright notice and tional relational languages, some queries (e.g., “find all the title of the publication and its dote appear, and notice is department names”) expressible against the database given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires o fee univ-A in Figure 2 are no longer expressible when the and/or special permission from the Endowment. information is reorganized according to the schema of, Proceedings of the 22nd VLDB Conference say, univ-B there. This is undesirable and should be Mumbai(Bombay), India, 1996 avoided. (2) To promote interoperability, the language 239 must permit the restructuring of one database to con- a variable that ranges over the (tuples of the) relation form to the schema of another. (3) The language employees (in the usual SQL jargon, these variables must be easy to use and yet sufficiently expressive. are called aliases.) The select and where clauses re- (4) The language must provide the full data manip- fer to (the extension of) attributes, where an attribute ulation and view definition capabilities, and must be is denoted as <var>. <attName>, var being a (tuple) downward compatible with SQL, in the sense that it variable declared in the from clause, and attName be- must be compatible with SQL syntax and semantics. ing the name of an attribute of the relation (extension) We impose this requirement in view of the importance over which var ranges. and popularity of SQL in the database world. (5) Fi- When no ambiguity arises, SQL permits certain ab- nally, the language must admit effective and efficient breviations. Queries of Figure l(b,c) are equivalent to implementation. In particular, it must be possible to the first one, and are the most common ways such realize it non-intrusive implementation that would re- queries are written in practice. Note that in Figures quire minimal additions to component RDBMS. l(b) and l(c), empl 0 y ees acts essentially as a tuple Ccmtcibutions: (1) We propose a language called variable. SchemaSQL which meets the above criteria, re- The SchemaSQL syntax extends that of SQL in sev- view the syntax and semantics of SQL, and develop eral directions. SchemaSQL as a principled extension of SQL. As a The federation consists of databases, with each result, for a SQL user, adapting to SchemaSQL is database containing relations. The syntax allows relatively easy. (2) We illustrate via examples the fol- to distinguish between (the components of) dif- lowing powerful features of SchemaSQL : (i) uniform ferent databases. manipulation of data and meta-data; (ii) creating re- structured views and the ability to dynamically create To permit meta-data queries and restructuring views, SchemaSQL permits the declaration of output schemas; (iii) the ability to express sophisti- other types of variables in addition to the (tuple) cated aggregate computations far beyond those expressible in conventional languages like SQL (Sections 3.1, variables permitted in SQL. 4.1). (3) We propose an implementation architecture Aggregate operations are generalized in for SchemaSQL that is designed to build on existing SchemaSQL to make horizontal and block aggre- RDBMS technology, and requires minimal additions to gations possible, in addition to the usual vertical it, while greatly enhancing its power (Section 5). We aggregation in SQL. provide an implementation algorithm for SchemaSQL, In this section we will concentrate on the first two and establish its correctness. We, also discuss novel aspects. Restructuring views and aggregation are dis- query optimization issues that arise in the context of cussed in Section 4. this implementation. (4) Finally, we propose an ex- Variable Declarations in SchemaSQL tension to SchemaSQL for systematically resolving the SchemaSQL permits the declaration of variables semantic heterogeneity problem arising in a MDBS en- that can range over any of the following five sets: (i) vironment (Section 6). names of databases in a federation; (ii) names of the In this paper we illustrate the semantics of relations in a database; (iii) names of the attributes in SchemaSQL mainly via examples. A precise seman- the scheme of a relation; (iv) tuples in a given relation tics of SchemaSQL, together with many more exam- in a database; and (v) values appearing in a column ples illustrating its powerful features can be found in corresponding to a given attribute in a relation. Vari- [LSS96b]. able declarations follow the same syntax as <range> <var> in SQL, where var is any identifier. However, 2 Syntax there are two major differences. (1) The only kind of range permitted in SQL is a set of tuples in some re- Our goal is to develop SchemaSQL as a principled lation in the database, whereas in SchemaSQL any of extension of SQL. To this end, we briefly analyze the five kinds of ranges above can be used to declare the syntax of SQL, and then develop the syntax of variables. (2) More importantly, the range specifica- SchemaSQL as a natural extension. Our discussion tion in SQL is made using a constant, i.e. an identifier below is itself a novel way of viewing the syntax and referring to a specific relation in a database. By con- gemantics of SQL, which, in our opinion, helps a better trast, the diversity of ranges possible in SchemaSQL understanding of SQL subtleties. permits range specifications to be nested, in the sense In an SQL query, the (tuple) variables are declared that it is possible to say, e.g., that X is

Load more