SQL and Management of External Data

Jan-Eike Michels Jim Melton Vanja Josifovski Oracle, Sandy, UT 84093 Krishna Kulkarni [email protected] Peter Schwarz Kathy Zeidenstein IBM, San Jose, CA {janeike, vanja, krishnak, krzeide}@us.ibm.com [email protected]

SQL/MED addresses two aspects to the problem Guest Column Introduction of accessing external data. The first aspect provides the ability to use the SQL interface to access non- In late 2000, work was completed on yet another part SQL data (or even SQL data residing on a different of the SQL standard [1], to which we introduced our management system) and, if desired, to readers in an earlier edition of this column [2]. that data with local SQL data. The application sub- Although SQL database systems manage an mits a single SQL query that references data enormous amount of data, it certainly has no monop- multiple sources to the SQL-server. That statement is oly on that task. Tremendous amounts of data remain then decomposed into fragments (or requests) that are in ordinary operating system files, in network and submitted to the individual sources. The standard hierarchical , and in other repositories. The does not dictate how the query is decomposed, speci- need to query and manipulate that data alongside fying only the interaction between the SQL-server SQL data continues to grow. Database system ven- and foreign-data wrapper that underlies the decompo- dors have developed many approaches to providing sition of the query and its subsequent execution. We such integrated access. will call this part of the standard the “wrapper inter- In this (partly guested) article, SQL’s new part, face”; it is described in the first half of this column. Management of External Data (SQL/MED), is ex- The other aspect of the problem with external plored to give readers a better notion of just how ap- data is the management problem (as well as a re- plications can use standard SQL to concurrently trieval problem). Huge amounts of critical data reside access their SQL data and their non-SQL data. in file systems, including (at least from an SQL per- Jim Melton and Andrew Eisenberg spective) “non-traditional data”, such as engineering diagrams, photographs, and other alternative media. Managing External Data There are often existing applications that rely on this The cost of writing applications that access SQL data data remaining in the file system, yet it is often ad- using an SQL database management system and con- vantageous to keep information about that file data in currently access non-SQL data using a different data the database, because the database is easily queried. manager continues to absorb resources unnecessarily. Problems with this very typical configuration include It is exacerbated by the necessity for application pro- trying to keep database data and file data in sync, grammers to use different interfaces to access data to use a different interface for the files than under the control of different managers. for the database, and having to have different Responding to customer needs, Oracle, for ex- authorization mechanisms for file data and for data- ample, offers its Open Transparent Gateway, Sybase base data. SQL/MED addresses these problems by has its OmniConnect, and IBM provides DataJoiner. introducing a new data type called DATALINK. The Other vendors, including both major database ven- datalinks part of the standard is described in the latter dors and smaller niche-market players, offer analo- half of the column. gous products. More recently, IBM’s Garlic research efforts [5] have focussed on providing an API to Accessing External Data using which third-party developers can build wrappers that the Wrapper Interface allow SQL-servers to easily access a wide variety of non-SQL data sources. SQL is based on the relational model, so external data must be represented as relational tables if it is to fit seamlessly into the context of an SQL-server. SQL/MED introduces the notion of foreign tables to represent data stored externally to the SQL-server— an automobile price list can be represented as a Foreign- foreign with columns for make, model, year, SQL- SQL/MED Data price, etc. External data sources often make available Server API Wrapper several such collections, all of which can be accessed via a single network connection, so SQL/MED introduces the concept of a foreign server that allows access to a set of foreign tables—a single website may provide separate price lists for trucks, autos, Implementation-dependent API motorbikes, etc., each presented as a separate table. When the SQL-server decomposes a query into fragments, each fragment is submitted to the appropriate foreign server for the foreign table Foreign Foreign referenced by that fragment. In the initial version of Server Tables SQL/MED, a fragment cannot reference more than a single table. Future versions will allow query fragments referencing multiple foreign tables to be Figure 1 — Components Used by SQL/MED submitted to a foreign server as a single request. It is also often true that several data sources In Figure 1, the relationships among these pri- share a common interface. In such cases, it is desir- mary components defined and used by SQL/MED are able to use a single code module to access all of these illustrated. The SQL-server and the foreign-data sources, each of which is represented by a foreign wrapper communicate through the SQL/MED API, server. This common code module is manifested in but they need not be implemented to run in a single SQL/MED by a foreign-data wrapper. Each foreign process context; indeed, they may reside on separate server to be accessed by a foreign-data wrapper can computers connected by a network. Many configura- be characterized by configuration information—such tions are possible, including one in which the foreign- as a host name and port number—that differentiates it data wrapper and the foreign server are combined from others accessed via the same wrapper. Because into a single program. The communication between the kinds of information required are likely to vary the foreign-data wrapper and the foreign server is from wrapper to wrapper, the standard does not determined solely by the authors of those components specify a fixed set of configurable attributes. Instead, and is not addressed by SQL/MED. the concept of generic options—attribute/value pairs The purpose of foreign tables is to support a used by wrappers for configuration purposes—is in- transparent on data — called external data — troduced. Option values are cached in the catalogs of that is not stored and managed by the local SQL- the SQL-server; the SQL-server does not interpret server. By transparent, we mean that the user does them, but makes their values available to the wrapper not need to be aware of the fact that the data is not upon request. Generic options can be associated with actually managed by the local SQL-server. Instead, each of the foreign data modeling concepts intro- the user can access a foreign table in a SELECT duced by SQL/MED, including foreign servers, for- statement as though it were a regular base table or eign tables, and their columns. Thus, a wrapper that view. External data is presented to the user as ordi- represents data stored in files as foreign tables can nary SQL-data, even though it may be stored in a file associate an option with each table that specifies the system, in HTML-formatted web pages, or in some character used to delimit fields in the corresponding other specialized format. file. The bulk of SQL/MED’s text specifies an API Before the data of a foreign table can be accessed — a set of functions — by which an SQL-server and using SQL/MED’s facilities, the SQL-server must a foreign-data wrapper conduct their business. first be informed of the foreign table’s existence. This The SQL/MED functions that the SQL-server is done using the CREATE FOREIGN TABLE has to provide are called the foreign-data wrapper statement: interface SQL-server routines. This term denotes all CREATE FOREIGN TABLE table-name those functions that must be supported by a con- [ ( col-def, col-def, ... ) ] forming SQL-server. By contrast, the foreign-data SERVER foreign-server-name wrapper interface wrapper routines are the [ generic-options ] SQL/MED functions that a conforming foreign-data Execution of this statement creates a new schema wrapper has to provide. object, a foreign table (perhaps more clearly, a for- eign table descriptor), in a schema that belongs to the one or more foreign tables based on information SQL-server. The foreign table (descriptor) identifies available from the foreign server: the foreign server that manages the data to be pre- IMPORT FOREIGN SCHEMA schema-name sented as though it were a table. [ LIMIT TO ( table-name-list ) Of course, since creation of a foreign table re- | EXCEPT ( table-name-list ) ] FROM SERVER server-name quires a reference to a foreign server, the foreign INTO local-schema-name server (that is, its representation in the metadata of This statement presumes that the foreign server the SQL-server) must first be created: CREATE SERVER server-name recognizes the concept of a schema containing tables [ TYPE server-type ] (or that the foreign-data wrapper simulates that con- [ VERSION server-version ] cept). Execution of this statement allows importation [ AUTHORIZATION auth-info ] of the table definitions (including those tables’ col- FOREIGN DATA WRAPPER w-name umn definitions) for every table contained in the [ generic-options ] specified foreign schema — possibly limited to cer- The optional TYPE and VERSION values are tain tables or with specified tables omitted. meaningful only to specific implementations and In many cases, foreign servers recognize the valid values are not specified by SQL/MED. The concept of ownership of the data they manage. For optional AUTHORIZATION clause allows the state- those situations, the SQL-server is given the ability to ment to specify the effective owner (from the view- map its user identifiers to those of the foreign server point of the SQL-server) of the foreign server. Unlike through the statement: virtually all other SQL objects, including foreign CREATE USER MAPPING FOR auth-id tables, foreign servers are not represented in a SERVER server-name schema belonging to the SQL-server, but are repre- [ generic-options ] sented at a higher level: the catalog (which contains SQL statements involving foreign tables man- schemas and a few other objects). aged by the specified foreign server are executed as Of course, in order to create (the descriptor that though the authorization identifier at the SQL-server represents) a foreign server, we have to name the were “really” the corresponding entity recognized by foreign-data wrapper that manages the foreign server, the foreign server. so we must first execute: Careful readers will observe that we have not yet CREATE FOREIGN DATA WRAPPER wrap-name discussed the various occurrences of generic- [ AUTHORIZATION auth-id ] options that appear in most of those statements [ LIBRARY library-name ] we’ve described. Generic options, introduced earlier LANGUAGE language-name in this paper, are specified using the syntax we’ve [ generic-options ] shown here. As with foreign servers, foreign-data wrappers’ representations are stored in an SQL-server’s cata- logs and not in a schema. Similarly, the optional Processing Queries with For- AUTHORIZATION clause allows specification of the eign-Data Wrappers nominal owner of the wrapper. Communication between an SQL-server and a for- The optional LIBRARY clause specifies a char- eign-data wrapper can occur in either of two modes: acter string literal that identifies a software library decomposition mode or pass-through mode. In pass- (using an implementation-defined syntax) containing through mode, the SQL-server transfers the query those SQL/MED routines that implement the specific string, as is, to the foreign-data wrapper. The wrapper foreign-data wrapper. (Some SQL/MED routines are and the data source are solely responsible for ana- implemented by the SQL-server, while others are lyzing and executing the query. Although implemen- implemented as part of the foreign-data wrapper.) tation of pass-through mode by a foreign-data The LANGUAGE clause specifies the name of the wrapper is optional, it is especially useful when the programming language in which the library routines foreign server is also an SQL-engine. Due to space are written (C or PL/I, for example). constraints, we do not discuss pass-through mode The sequence of statement execution must be: further in this paper. CREATE FOREIGN DATA WRAPPER ... In decomposition mode, the SQL-server breaks a CREATE FOREIGN SERVER ... query into fragments, each to be executed by a par- CREATE FOREIGN TABLE... ticular foreign server. The interaction between the Alternatively, once the foreign server has been SQL-server and the foreign-data wrapper can be di- created, it may be possible (depending entirely on the vided into two phases: capabilities of the foreign-data wrapper and the for- eign server) to execute a single statement to create • A query planning phase, in which the foreign- connection provides a context for subsequent inter- data wrapper and the SQL-server cooperatively action between the SQL-server and a particular for- produce an execution plan for the fragment. eign server. Creation of a connection does not imply • A query execution phase, in which the agreed- that the foreign-data wrapper actually connects to the upon plan is executed and foreign data is re- data source at this time; whether and when the for- turned to the SQL-server. eign-data wrapper establishes a connection to the data In both the query planning and execution phases, source is up to the wrapper implementation. To create information must be exchanged between the foreign- a connection to a foreign server, the SQL-server in- data wrapper and the SQL-server. To make con- vokes the ConnectServer() routine provided by forming implementations simpler, and to facilitate the appropriate foreign-data wrapper, supplying (via implementation in different programming language a handle) a data structure that identifies the foreign environments, SQL/MED utilizes a functional inter- server and includes the names and values of any ge- face based on handles. For example, suppose infor- neric options that were supplied when the foreign mation managed by the SQL-server is to be passed to server was defined by DDL. The SQL-server also a foreign-data wrapper. Instead of explicitly defining supplies another data structure that describes the user the layout of a data structure for this purpose, the on whose behalf the query is being executed and that standard requires the SQL-server to pass an integer contains the user mapping information described ear- handle representing such a structure to the foreign- lier. This information may be used by the foreign- data wrapper. For each type of handle that can be data wrapper to authenticate the user at the foreign exchanged, the standard specifies a set of applicable server. ConnectServer() returns the newly- functions that take a handle as a parameter and ex- established connection to the SQL-server. Once a tract a value from the corresponding data structure. connection has been created, it is not necessarily de- To simplify the description that follows, we will not stroyed after processing a single query fragment. The explicitly refer to handles. However, the reader SQL-server can preserve and reuse the connection should assume that whenever a data structure is later. passed from the SQL-server to a foreign-data wrap- Once a connection is established, the SQL-server per or vice-versa, the exchange is done by means of a invokes the foreign-data wrapper’s InitRe- handle. quest() routine, passing the query fragment to the foreign-data wrapper in the form of an SQL/MED Query Planning Phase request. A request is a data structure that abstractly During query planning, the interaction between the describes the SQL statement corresponding to the SQL-server and the foreign-data wrapper is based on query fragment, rather than an explicit representation a request/reply paradigm. The SQL-server builds a of the statement as a character string. This approach request representing the query fragment. The foreign- eliminates the need for each foreign-data wrapper to data wrapper analyzes the request and returns a reply parse SQL, an onerous task for wrappers designed for that describes that portion of the request that can be data sources that do not use SQL. The components of handled by the foreign server. The SQL-server must a request are subordinate data structures representing compensate for any part of the query fragment that the individual clauses of an SQL statement, e.g., the cannot be executed by the foreign server. SELECT clause, FROM clause, etc. The flexibility of this paradigm is essential, since Each element of the SELECT list is represented the query processing capabilities of data sources may by a value expression that describes a result column vary widely. Experience in research and industry has of the fragment. In the initial version of the standard, shown that a declarative approach to describing the each value expression must denote a simple column capabilities of data sources leads to an unmanageable name; future versions will support more complex explosion of descriptive attributes. As will be seen expressions. Each element of the FROM clause is rep- below, the full power of this paradigm is not ex- resented by a table reference that identifies a foreign ploited in the initial version of SQL/MED. However, table. The initial version of the standard limits the we believe it will be essential for accommodating the FROM clause to a single table reference. Other wide array of data sources that should be accessible clauses, such as WHERE, , etc., are also using SQL/MED. not currently supported. While future versions of Before execution planning for a query fragment SQL/MED will allow for more complex requests, the can begin, the SQL-server must identify the relevant effect of the current limitations is that only query foreign server and create a connection. Since a for- fragments of the form “SELECT eign-data wrapper may simultaneously manage mul- FROM FTN” can be described, FTN is the tiple connections to various foreign servers, the name of a foreign table and each element of refers to a column of that table. Since To initiate query execution, the SQL-server invokes all the columns in the request are needed for the the Open() routine in the foreign-data wrapper, SQL-server to produce the complete query result, the passing the execution plan as an argument. To fetch a foreign-data wrapper must be able to process the en- row of the result, the SQL-server invokes the Iter- tire request. ate() routine. Once all rows have been fetched, the However, once the standard supports more com- SQL-server invokes the Close() routine to allow plex requests, a foreign-data wrapper may be unable the foreign-data wrapper to clean-up after the execu- to process the entire request. For example, the request tion. The execution plan may be reused, for example might include a predicate that cannot be evaluated by if the query fragment represents the inner table of a the foreign server. In such a case, the foreign-data nested-loop join. When it is no longer needed, the wrapper could return only the basic data values and SQL-server invokes the FreeExecutionHan- the SQL-server could compensate by applying the dle() routine to deallocate the execution plan. predicate and filtering the result. When a foreign-data SQL/MED makes use of descriptors to exchange wrapper receives a request via the Init- data between the SQL-server and a foreign-data Request() routine, it examines the request by in- wrapper. These descriptors are adapted from those voking routines implemented by the SQL-server used in the SQL/CLI standard [3]. Descriptors are (“foreign-data wrapper interface SQL-server rou- implemented by the SQL-server, and encapsulate tines”, in the parlance of the standard). Routines are both the types and values of a row of data. Depending provided to extract table references from the FROM on the SQL-server implementation environment, the clause (GetTableRefElem(), GetTableRef- value may be stored in the descriptor itself, or the TableName()), as well as the values of generic descriptor may point to a buffer. Use of pointers is options associated with a referenced table (Get- generally more efficient, but they are difficult to sup- TableOpts()). Other routines supply information port in some languages (e.g., Java). about the columns in the SELECT list (Get- SQL/MED defines only the fields that make up SelectElem(), GetValExprColName()) and the descriptors; it does not define the actual pro- their generic option values (GetTableColOpt()). gramming language data structures that implement Once the foreign-data wrapper has analyzed the them. As with other data structures we have de- request, it constructs an SQL/MED reply. The struc- scribed, SQL/MED specifies various routines that get ture of a reply is similar to that of a request, but the and set the value of a descriptor field when given a corresponding routines for examining the reply descriptor and the field’s identity. (GetReplyTableRef(), GetReply- A descriptor consists of zero or more descriptor areas. Each area describes one column, which can be SelectElem(), etc.) are implemented by the for- an instance of any SQL data type, including types eign-data wrapper (i.e., they are “foreign-data wrap- like ROW and ARRAY as well as basic types like per interface wrapper routines”). The reply is INTEGER, VARCHAR, etc. If the implementation returned to the SQL-server from InitRequest(), programming language of the wrapper supports a along with a second data structure, an execution plan. data type that corresponds directly to the SQL data The content of this second data structure is deter- type in the descriptor (as described in the SQL/CLI mined solely by the foreign-data wrapper, and it is standard) the SQL-server must be able to convert this not interpreted by the SQL-server. Its purpose is to standard representation to any internal representation encapsulate all the information that is needed by the it requires. If no such correspondence exists (e.g., foreign-data wrapper to execute the portion of the DATE query fragment represented by the reply. As will be cannot be directly mapped into any program- described below, the SQL-server will hand the exe- ming language that SQL/MED supports), then the cution plan back to the foreign-data wrapper at the wrapper should use the canonical character string start of the query execution phase. The execution representation of the type as described in plan is the wrapper’s means of preserving informa- SQL/Foundation [4]. tion between the planning and execution phases of query processing. By contrast, the SQL-server can A Simple Example discard the reply when query planning is complete. Example 1 illustrates the principles outlined above. Example 1: Consider a table of employees Query Execution Phase stored in a Unix® text file. Each line of the file con- During the execution phase, the portion of the query tains one employee record, with fields separated by a fragment described by the reply is executed by the ‘:’. Assuming that an appropriate foreign-data wrap- foreign-data wrapper and the underlying data source. per and foreign server have already been declared, the following DDL statement could be employed to Using Datalinks declare this file as a foreign table: CREATE FOREIGN TABLE Personnel ( DATALINK is a new SQL data type that allows stor- id INTEGER, ing in an SQL column a reference to a file that is lo- last_name VARCHAR(30), cated in a file system external to the database system first_name VARCHAR(25), ...) (DBS). Datalinks extend the functionality of database SERVER myForeignServer systems to include control over external files without OPTIONS ( Filename the need to store their contents directly in the data- '/usr/joe/personnel.txt', Delimiter ':' ) ; base. Advantages of this technology include being Since the foreign-data wrapper that will be used able to use on files the robust , to access this file supports access to any file of this recovery, and authorization mechanism of the data- general type, each foreign table definition specifies base management system, while avoiding the expense the appropriate filename and delimiter using generic and application breakage caused by importing the file options. A user who would like to know how many contents into the database (such as through LOBs). people with the last name “Miller” are stored in the Management of files on multiple distinct file servers Personnel table could submit the following query allows robust centralized control over distributed (note that the user does not need to be aware that Per- resources across intranets. Datalinks are very useful sonnel is a foreign table): for any application that involves processing of infor- SELECT COUNT (last_name) mation from multiple heterogeneous sources, in- FROM Personnel cluding databases and file systems, where it is WHERE last_name = 'Miller'; required that this information be kept consistent be- The SQL-server first parses and validates this tween the different sources. query, ensuring that it is syntactically and semanti- Examples for such applications are e-commerce, cally correct and that the user has all necessary customer relationship management, supply chain privileges. Next, it examines the FROM clause and management, e-business applications, automotive discovers that it contains a reference to a foreign ta- insurance, and health insurance applications where ble. Therefore, the SQL-server establishes a connec- files such as X-rays, ECG results, vehicle damage tion to myForeignServer and formulates a request pictures, etc., need to be combined. It also includes equivalent to the SQL statement “SELECT CAD/CAM applications involving design documents last_name FROM Personnel”. The request and plans, configuration and asset management ap- does not include the predicate from the WHERE plications such as web assets (web pages, server-side clause or the aggregate function in the SELECT list, programs and templates), and the management of but future versions of the standard will allow such large volumes of scientific data residing in file sys- requests. tems. For example, inventory data can be stored in The foreign-data wrapper examines the request the database while pictures of the products reside in using the foreign-data wrapper interface SQL-server files. To avoid a picture being accidentally removed routines defined by the standard to obtain the name of or being replaced by another one that shows a differ- the referenced table, associated options, the columns ent product, datalinks are used. to be retrieved and their types, etc. In this example, A datalink value is stored in a column of type the wrapper returns a reply indicating that the request DATALINK, just as a whole number would be stored can be completely satisfied, as well as an execution in a column of type INTEGER. In addition to being plan. A simple implementation of the foreign-data stored as the value of a column, a datalink value can wrapper might use Unix shell commands to extract also be stored as the value of an attribute of a user- the requested columns from the file. In this case, the defined structured type. SQL/MED allows a variety execution plan would contain the relevant commands of options to be specified for instances of the ( e.g., cut -d: -f2 /usr/joe/personnel). DATALINK type. With these options, it can be de- The SQL-server examines the reply handle and dis- termined how strict the database controls the file. The covers that the foreign-data wrapper can completely possibilities range from no control at all (the file does handle the request. It incorporates the execution plan not even have to exist) to full control, where removal for the fragment into its overall execution plan, which of the datalink value from the database leads to a must include extra processing steps to filter and ag- deletion of the physical file. Examples 2 and 3 show gregate the result set that will be returned by the for- how a column of type DATALINK would be defined eign-data wrapper. in a CREATE TABLE statement. Example 2: The DBS does not care whether a value that will be inserted into the DATALINK col- umn references a file that really exists — references to files that do not exist will be handled appropriately 3. The datalinker reports whether or not the file by the application. All file permissions are enforced exists. as specified at the original file system. 4. If the file could be successfully linked to the CREATE TABLE houses ( database, then the database system reports a suc- id INTEGER, cessful execution to the client application; oth- name VARCHAR(30), erwise, it returns an error. address VARCHAR(100), picture DATALINK Figure 2 illustrates the relationships between an SQL- NO LINK CONTROL ); server, a datalinker, and a linked file. Example 3: The DBS requires full control over the referenced file — that is, the DBS determines SQL-server who is allowed to read the file, allows only imple- mentation-defined updates of the file’s content, and wants the file deleted when it is no longer referenced by a datalink value. Object Datalinker CREATE TABLE products ( id INTEGER, name VARCHAR(30), picture DATALINK FILE LINK CONTROL INTEGRITY ALL READ PERMISSION DB WRITE PERMISSION BLOCKED RECOVERY YES File Manager Linked File ON UNLINK DELETE ); Even though a datalink value may “look like” a character string, it is actually an encapsulated value that contains a logical reference from the database to Figure 2 — Datalink-related relationships a file stored outside the database. Consequently, As soon as the tables are populated, a user can SQL/MED defines a function named DLVALUE() execute queries against them. SQL/MED supports that generates the datalink value from a file reference, five functions that convert a datalink value, or parts given in the form of a URL. Examples 4 and 5 illus- of it, into a character string with which the user can trate the use of this function in connection with the work. For example, in order to see a picture of a INSERT statement. product or house, the user may need to obtain a file Example 4: reference for the file, then access the file directly INSERT INTO products from the file system using this file reference. The VALUES (12, 'fender', user cannot get the picture directly from the DBS, DLVALUE('file://myFileServer/ myDirectory/fender.jpg')); because it is not stored in the DBS! Suppose the user would like to see a picture of a car fender, of which Example 5: INSERT INTO houses she knows the product id. An application could exe- VALUES (12001, cute the following statement to retrieve the necessary 'villa on the hill', information: 'San Jose, CA', SELECT DLURLPATH(picture) INTO :hv DLVALUE(‘http://someServer FROM products .someCompany.com/ WHERE id = 12; file_may_not_exist/xyz.jpg’)); After successful execution of this statement, the In order to “take control” of the file, the database host variable hv contains the proper file reference, as system utilizes a component called a datalinker. The well as an access token that entitles the user to access interaction between a database system, a datalinker, the file. The application can now use the native API and a client application during an operation of the file server to access the file. If it does so, an involving a datalink value (as shown in Example 4) is OPEN() call from the application to the file server is described by the following four steps: intercepted by the datalinker. The datalinker checks 1. The client application sends the SQL INSERT the validity of the access token and, if valid, eventu- statement to the database system. ally grants access to the file. 2. The database system contacts the datalinker to One of the strengths of database systems is secu- determine whether the file exists, and, if so, in- rity. In order to extend this strength to datalinks, the forms the datalinker that the file is now con- datalinker has to work closely with the database sys- trolled by the database system. tem. For example, if an application attempts to open a file with an incorrect access token (or no access token Summary at all) and that file is referenced by a datalink value that is stored in a column declared with READ We have discussed SQL’s new part 9, SQL/MED, which provides features that enable SQL applications PERMISSION DB, then the datalinker must prohibit to integrate non-SQL data along with data stored in this access. The same holds true for all attempts to an SQL-server. This facility, like much new technol- rename or a file (and therefore render the ogy, may appear complex at first glance, but we be- stored datalink value useless). If that file is refer- lieve that it is remarkably simple for the amount of enced by a datalink value in a column declared with power it provides. INTEGRITY ALL, then the datalinker must not al- In fact, one of the primary goals of the low renaming or deleting of the file. SQL/MED architecture is to enable non-DBMS de- ON Example 6 shows the effect of the option velopers to write foreign-data wrappers for a wide UNLINK DELETE of the CREATE TABLE state- variety of non-SQL data sources in as portable and ment in Example 3 when a datalink value is deleted. rapid manner as possible. We believe that a third- Example 6: With the following DELETE state- party marketplace for such foreign-data wrappers, ment, the row will be removed from the table that portable between various vendors’ database systems, was inserted in Example 4. could arise within just a few years. DELETE FROM products WHERE id = 12; As soon as publication of SQL/MED has been Additionally, the referenced file will also be de- finalized (probably late in the first quarter of 2001), leted, even though it existed before the row was in- copies of this part of the SQL standard can be pur- serted into the table. chased both as PDF downloads and in hardcopy from the ANSI Electronic Standards Store from the NCITS SQL/MED’s Future Standards Store (both cited below). In fact, all parts The version of SQL/MED that was completed in of the SQL standard can be acquired in the same 2000 (MED:2000) is limited in several ways. Most manner. Needless to say, the PDF downloads are importantly, it provides a read-only interface to data much less expensive than the paper versions! managed by foreign servers. The next version, which will probably be published in late 2002 or early 2003, References will probably add the ability to insert data into for- [1] ISO/IEC 9075-9:2000, Information technology eign tables, such data, and delete that data. — Database languages — SQL — Part 9: Man- In addition, MED:2000 required that the foreign- agement of External Data (SQL/MED), Interna- data wrapper support only “SELECT * FROM tional Organization for Standardization, 2000 foreign-table”. The next version of SQL/MED [2] SQL Standardization: The Next Steps, Andrew will allow an SQL-server to transmit a complete Eisenberg and Jim Melton, SIGMOD Record, WHERE clause, or at least a complete predicate, to the Vol. 29, No. 1, March, 2000 foreign-data wrapper for evaluation. It may also be [3] ISO/IEC 9075-3:1999, Information technology possible to instruct the foreign-data wrapper (or the — Database languages — SQL — Part 3: Call- foreign server) to perform joins, projections, group- Level Interface (SQL/CLI), International Organi- ing operations, and various SQL functions. zation for Standardization, 1999 The expected datalink changes for the next ver- [4] ISO/IEC 9075-2:1999, Information technology sion of SQL/MED will provide usability and func- — Database languages — SQL — Part 2: Foun- tionality improvements. The most important change dation (SQL/Foundation), International Organi- being discussed is the ability to update a linked file zation for Standardization, 1999 and provide file data recovery. In the current version [5] Mary Tork Roth, Peter M. Schwarz, Don't Scrap of SQL/MED, updating a file (while the option to It, Wrap It! A Wrapper Architecture for Legacy recover lost file data is in place) requires two steps: Data Sources, VLDB 1997: 266-275 the first to unlink the file and the second to modify the file and re-link it. The ability to update in place will allow such updates in one step. This functional- Web References ity will provide the ability to use datalinked files for [1] ANSI’s Electronic Standards Store: such functions as library checkout and checkin and a http://webstore.ansi.org way to back out any uncommitted file changes and [2] NCITS’ Standards Store: restore to the previous committed version. http://www.cssinfo.com/ncits.html