SQL and Management of External Data
Jan-Eike Michels, Vanja Josifovski, Krishna Kulkarni, Peter Schwarz, Kathy Zeidenstein
IBM, San Jose, CA
{janeike, vanja, krishnak, krzeide}@us.ibm.com, [email protected]

Jim Melton
Oracle, Sandy, UT 84093
[email protected]

Guest Column Introduction

In late 2000, work was completed on yet another part of the SQL standard [1], to which we introduced our readers in an earlier edition of this column [2].

Although SQL database systems manage an enormous amount of data, they certainly have no monopoly on that task. Tremendous amounts of data remain in ordinary operating system files, in network and hierarchical databases, and in other repositories. The need to query and manipulate that data alongside SQL data continues to grow. Database system vendors have developed many approaches to providing such integrated access.

In this (partly guested) article, SQL's new part, Management of External Data (SQL/MED), is explored to give readers a better notion of just how applications can use standard SQL to concurrently access their SQL data and their non-SQL data.

Jim Melton and Andrew Eisenberg

Managing External Data

The cost of writing applications that access SQL data using an SQL database management system and concurrently access non-SQL data using a different data manager continues to absorb resources unnecessarily. It is exacerbated by the necessity for application programmers to use different interfaces to access data under the control of different managers.

Responding to customer needs, Oracle, for example, offers its Open Transparent Gateway, Sybase has its OmniConnect, and IBM provides DataJoiner. Other vendors, including both major database vendors and smaller niche-market players, offer analogous products. More recently, IBM's Garlic research efforts [5] have focussed on providing an API with which third-party developers can build wrappers that allow SQL-servers to easily access a wide variety of non-SQL data sources.

SQL/MED addresses two aspects of the problem of accessing external data. The first aspect provides the ability to use the SQL interface to access non-SQL data (or even SQL data residing on a different database management system) and, if desired, to join that data with local SQL data. The application submits a single SQL query that references data from multiple sources to the SQL-server. That statement is then decomposed into fragments (or requests) that are submitted to the individual sources. The standard does not dictate how the query is decomposed, specifying only the interaction between the SQL-server and foreign-data wrapper that underlies the decomposition of the query and its subsequent execution. We will call this part of the standard the "wrapper interface"; it is described in the first half of this column.

The other aspect of the problem with external data is the management problem (as well as a retrieval problem). Huge amounts of critical data reside in file systems, including (at least from an SQL perspective) "non-traditional data", such as engineering diagrams, photographs, and other alternative media. There are often existing applications that rely on this data remaining in the file system, yet it is often advantageous to keep information about that file data in the database, because the database is easily queried. Problems with this very typical configuration include trying to keep database data and file data in sync, having to use a different interface for the files than for the database, and needing different authorization mechanisms for file data and for database data. SQL/MED addresses these problems by introducing a new data type called DATALINK. The datalinks part of the standard is described in the latter half of the column.

Accessing External Data using the Wrapper Interface

SQL is based on the relational model, so external data must be represented as relational tables if it is to fit seamlessly into the context of an SQL-server. SQL/MED introduces the notion of foreign tables to represent data stored externally to the SQL-server; for example, an automobile price list can be represented as a foreign table with columns for make, model, year, price, etc. External data sources often make available several such collections, all of which can be accessed via a single network connection, so SQL/MED introduces the concept of a foreign server that allows access to a set of foreign tables; for example, a single website may provide separate price lists for trucks, autos, motorbikes, etc., each presented as a separate table. When the SQL-server decomposes a query into fragments, each fragment is submitted to the appropriate foreign server for the foreign table referenced by that fragment. In the initial version of SQL/MED, a fragment cannot reference more than a single table. Future versions will allow query fragments referencing multiple foreign tables to be submitted to a foreign server as a single request.

It is also often true that several data sources share a common interface. In such cases, it is desirable to use a single code module to access all of these sources, each of which is represented by a foreign server. This common code module is manifested in SQL/MED by a foreign-data wrapper. Each foreign server to be accessed by a foreign-data wrapper can be characterized by configuration information, such as a host name and port number, that differentiates it from others accessed via the same wrapper. Because the kinds of information required are likely to vary from wrapper to wrapper, the standard does not specify a fixed set of configurable attributes. Instead, it introduces the concept of generic options: attribute/value pairs used by wrappers for configuration purposes. Option values are cached in the catalogs of the SQL-server; the SQL-server does not interpret them, but makes their values available to the wrapper upon request. Generic options can be associated with each of the foreign data modeling concepts introduced by SQL/MED, including foreign servers, foreign tables, and their columns. Thus, a wrapper that represents data stored in files as foreign tables can associate an option with each table that specifies the character used to delimit fields in the corresponding file.

The bulk of SQL/MED's text specifies an API (a set of functions) by which an SQL-server and a foreign-data wrapper conduct their business. The SQL/MED functions that the SQL-server has to provide are called the foreign-data wrapper interface SQL-server routines. This term denotes all those functions that must be supported by a conforming SQL-server. By contrast, the foreign-data wrapper interface wrapper routines are the SQL/MED functions that a conforming foreign-data wrapper has to provide.

[Figure 1: Components Used by SQL/MED. The SQL-server and the foreign-data wrapper are connected by the SQL/MED API; the foreign-data wrapper and the foreign server, which provides the foreign tables, are connected by an implementation-dependent API.]

Figure 1 illustrates the relationships among the primary components defined and used by SQL/MED. The SQL-server and the foreign-data wrapper communicate through the SQL/MED API, but they need not be implemented to run in a single process context; indeed, they may reside on separate computers connected by a network. Many configurations are possible, including one in which the foreign-data wrapper and the foreign server are combined into a single program. The communication between the foreign-data wrapper and the foreign server is determined solely by the authors of those components and is not addressed by SQL/MED.

The purpose of foreign tables is to support a transparent view of data, called external data, that is not stored and managed by the local SQL-server. By transparent, we mean that the user does not need to be aware of the fact that the data is not actually managed by the local SQL-server. Instead, the user can access a foreign table in a SELECT statement as though it were a regular base table or view. External data is presented to the user as ordinary SQL-data, even though it may be stored in a file system, in HTML-formatted web pages, or in some other specialized format.

Before the data of a foreign table can be accessed using SQL/MED's facilities, the SQL-server must first be informed of the foreign table's existence. This is done using the CREATE FOREIGN TABLE statement:

    CREATE FOREIGN TABLE table-name
      [ ( col-def, col-def, ... ) ]
      SERVER foreign-server-name
      [ generic-options ]

Execution of this statement creates a new schema object, a foreign table (perhaps more clearly, a foreign table descriptor), in a schema that belongs to the SQL-server. The foreign table (descriptor) identifies the foreign server that manages the data to be presented as though it were a table.

Of course, since creation of a foreign table requires a reference to a foreign server, the foreign server (that is, its representation in the metadata of the SQL-server) must first be created:

    CREATE SERVER server-name
      [ TYPE server-type ]
      [ VERSION server-version ]
      [ AUTHORIZATION auth-info ]
      FOREIGN DATA WRAPPER w-name
      [ generic-options ]

The optional TYPE and VERSION values are ...

... one or more foreign tables based on information available from the foreign server:

    IMPORT FOREIGN SCHEMA schema-name
      [ LIMIT TO ( table-name-list )
      | EXCEPT ( table-name-list ) ]
      FROM SERVER server-name
      INTO local-schema-name

This statement presumes that the foreign server recognizes the concept of a schema containing tables (or that the foreign-data wrapper simulates that concept). Execution of this statement allows importation of the table definitions (including those tables' column definitions) for every table contained in the specified foreign schema, possibly limited to certain tables or with specified tables omitted.
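As a rough illustration of how the statements above fit together, the following sketch registers a foreign server, declares one foreign table by hand, imports another, and then joins foreign data with a local table. Every identifier (file_wrapper, price_server, auto_prices, price_lists, truck_prices, local_prices, dealer_stock) and every option name is hypothetical. The OPTIONS ( name 'value', ... ) spelling of the generic-options clause is an assumption, since the syntax summaries above leave generic-options unexpanded. The sketch also assumes that a foreign-data wrapper named file_wrapper has already been registered with the SQL-server and that dealer_stock is an ordinary local base table.

```sql
-- Hypothetical names throughout; see the assumptions stated above.

-- 1. Describe the foreign server. The SQL-server does not interpret the
--    options; it stores them and hands them to the wrapper on request.
CREATE SERVER price_server
  FOREIGN DATA WRAPPER file_wrapper
  OPTIONS (host 'prices.example.com', port '8080');

-- 2. Declare one foreign table managed by that server. The column is
--    named model_year rather than year to avoid the reserved word YEAR.
CREATE FOREIGN TABLE auto_prices (
  make        VARCHAR(30),
  model       VARCHAR(30),
  model_year  INTEGER,
  price       DECIMAL(10, 2)
)
SERVER price_server
OPTIONS (delimiter ',');

-- 3. Alternatively, let the wrapper supply the table definitions.
--    Only truck_prices is imported from the foreign schema price_lists.
IMPORT FOREIGN SCHEMA price_lists
  LIMIT TO (truck_prices)
  FROM SERVER price_server
  INTO local_prices;

-- 4. Query the foreign table as though it were a base table,
--    joining it with local SQL data.
SELECT s.stock_id, p.price
FROM dealer_stock AS s
JOIN auto_prices AS p
  ON  p.make = s.make
  AND p.model = s.model
  AND p.model_year = s.model_year;
```

The final query is written exactly as if auto_prices were a local table. How the SQL-server splits it into a fragment for price_server is left to the implementation, and in the initial version of SQL/MED each such fragment references only a single foreign table.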