<<

Accessing a Relational through an Object-Oriented Database Interface (extended abstract)

J. A. Orenstein D. N. Kamber Object Design, Inc. Credit Suisse [email protected] [email protected]

Object-oriented database systems (ODBs) are Typically, their goal is not to migrate data from designedfor use in applicationscharacterized by complex relational databasesinto object-oriented . data models, clean integration with the host ODBs and RDBs are likely to have different programming language, and a need for extremely fast performance characteristics for some time, and it is creation, traversal, and update of networks of objects. thereforeunlikely that one kind of systemcan displace Theseapplications are typically written in C or C++, and the other. Instead,the goal is to provide accessto legacy the problem of how to store the networks of objects, and databasesthrough object-oriented interfaces. update them atomically has been difficult in practice. For this reason, we designed and developed, in Relational databasesystems (RDBs) tend to be a poor fit conjunction,with the Santa Teresa Labs of IBM, the for these applications because they are designed for ObjectStoreGateway, a systemwhich providesaccess to applications with different performance requirements. relationaldatabasesthrough the ObjectStoreapplication ODBs are designedto meet theserequirements and have programming interface (API). ObjectStore queries, proven more successful in pioviding”persistence for collectionand cursoroperations are translatedinto SQL. applicationssuch as ECAD and MCAD.’ The streamsresulting from execution of the SQL Interest in ODBs has-spread be$ond the CAD query are turnedinto objects;these objects may be either communities, to areas such as finance and transient, or persistent, stored in an ObjectStore telecommunications. These applications have many database.The main design goal of the Gateway was to similarities to CAD applications. For example, in insulate application developers from SQL and the financial applications, the data structuresdescribing a relational schema,and to permit them to work entirely mutual fund’s portfolio can be quite complex, in an object-orientedparadigm. applications are written in C or C++, and fast traversal In this abstract we describe how we adapted the of the data structuresis important. ObjectStore API to the purposes of the Gateway, and These application areas often have an additional how we extended the API when there were no requirement- the need to make use of “legacy” data equivalentconcepts with no equivalentin ObjectStore. storedin relational databasesystems (RDBs). For years, We assume the reader is familiar with ObjectStore the developers of these applications have worked in LAMB9 1; OREN92].,The accompanyingpresentation non-object-oriented languages; and have had to deal will discuss the use of the Gateway in an application with the problem of turning tuple streams into the developed by Credit Suisse to. yield enhanced complex data structures manipulated by their concurrency in both worlds and requiring only a applications. Now, the developers who have started minimal coupling between the object-oriented model using object-oriented languagesand databasesystems and the ER model. would like to continue the transition by insulating themselves from the relational model and SQL. Overview of the &tqway’s L&sigh The main design goal of the Gateway was to support the developersof ObjectStoreapplications that need access to legacy data, whiIe insulating these developersfrom all aspectsof the RDB: the schemaof Permission to copy without fee all or part of thiu material is granted the RDB being accessed;the programming interface to provided that the copies are not made or distributed for direct com- mercial advantage, the VLDB copyright notice and the title of the the RDB; and SQL. Because of the need to provide publication and its date appear, and notice is given that ‘copying is by permission of the Very Large Data Bsse Endowment. To copy access to existing databases, we cannot control the otherwise, or to republish, requires a fee and/or special permission relational schema.For example,it would be convenient from the Endowment. Proceedinga of the Slat VLDB Conference to dictate the typesand values of primary and foreign Zurich, Switzerland, 1905 keys as this would simplify the export of object ids and

702 pointers. However, this approach cannot be used with be used for one-to-oneand one-to-manyrelationships). legacyRDBs. There are a few ways to represent relationships in There is little point in eliminating SQL and the ObjectStore’.SML supports all combinations of these relational schema from if the alternative is a relational and ObjectStore representations. If a completely new and unfamiliar interface. For this relational schema uses modeling techniques not reason, another important design goal was to allow supportedby SML, then the user should createan SQL developersto use the existing ObjectStore view. Over time, SML will be enhancedto supportmore and API. For example,application developers should be modelingmethodologies. able to use pointers, embedded collections, and C++ functions can be associatedwith mappings relationships as they do normally. It should not be between tables and classes,and,between columns and necessaryto introduce similar but different constructs, datamembers. Examples: nor shouldit be necessaryto work with a relational style Run the function Employee:: of schema,(i.e. no pointers, no embeddedcollections), initialize whenever an Employee dressedup as C++. We found it necessaryto balance object is generatedfrom an EMPtuple. ’ this design goal againstperformance goals. In a number of situations it was necessary to compromise the Run the function Name : : import whenever Gateway’s API to avoid introducing serious three CHARcolumns are mapped to a data performanceproblems. member of type Name, (the, three CHAR These design goals motivate the central piece of columns represent first, middle and last the Gateway’s design, the schema mapping. A schema IUUWS). mapping captures the correspondences between a Run the function Name : : export, which relational schema and an ObjectStore schema. For turns a Name into three strings, whenever a example,a schemamapping would record the fact that a Name is written back to. the relational correspondsto the two data membersin an database. ObjectStore schema that represent a relationship. A schema mapping is created using a declarative language. Once this has heen’done; Gateway Connecting to a applications can be built by identifying the schema In or&r to connect to an RDB, the user id and mapping and linking with Gateway libraries. The passwordmust be supplied. This is done via an object representingthe databasez Gateway uses the schema mapping to translate os-gw-database& empdb = ObjectStore operations into SQL, and to materialize os-w-gateway:: objects from the resulting tuple streams and status get-database("EMPDB"); indicators. (The schemamapping is also used as the empdb.-userid("henry"1; foundation for ObjectStore SQL Client T a product empdb.setgassword("eraserhead"); which provides access to ObjectStore databasesvia EMPDBis the logical name of the RDB beii accessed; SQL.1 this name was given-in SML. empdb is an object representing this database.In addition to setting the Schema mapping ’ user id and password,various RDB-specific properties There are three inputs to a schemamapping, 1) a can be controlled using a function of relational schema,2) an ObjectStore schema,and 3) a os-gw-databasethattakes keywordsandvalues. declarative~specification of the connectionsbetween the Oncethe user id and passwordhave been supplied, two schemas. All three parts are specified in SML the relational database is accessible from the (Schema Mapping .Language). The schemas are application. imported from relational and ObjectStore databases namedin the SML specification. ObjectStore functions that are The bulk, of an SML specification is in the t&slated to SQL descriptionof the connections.This part of the language -The Gateway translates ObjectStore collection is basedon the premisethat variousmodeling constructs methods (including queries) and methods into are.expressed by common“idioms” in each datamodel. SQL. When one of these methods is applied to a For example,a betweena foreign key and a is often used to represent one-to-one and one-to- many relationships; two suchjoins through a “junction ’ Either with or without the relationship wrappers: the “many” ” representa many-manyrelationship (but can also side of a relationship can be.a type-safe collection, e.g. os~Set&nploy&>, or type-unsafe, e.g. 08-s&

703 collection representing a table, a semantically of a databaseinorder to support any navigation that the equivalent set of SQL commands is generated and application might perform following the query. This is submittedto the RDB. In addition to query translation, clearly impractical. ObjectStorecursors operations are translatedinto SQL Another possibility is to retrieve just the primary cursor operations,and insertions to and removalsfrom key columns and turn these into object identities or ObjectStorecollections are translatedinto SQL INSJZRT pointers. When the application tries to dereferenceone and DELETE statements. of thesepointers, anotherquery is generatedto retrieve There is no function call required by ObjectStore the pointed-to object. As with updates,not all pointer to either expressan update to an object, or to note that dereferencesare visible to ObjectStore,(only the fast to an update has occurred. For example, the age of a a given pageis noted), so.it is impossibleto know when personp can be incrementedusing this C statement: a SQL query implementing the dereferenceshould be . p->age = p->age + 1; generated.Second, suppose that wecould intercept all Read and write sets are formed by intercepting read- dereferences.What SQL query would be generated?All and write-protect violations [LAMB9 11. This that is known is the identity andprobably the type of the mechanismis completelytransparent. However, because object to .beretrieved. This would allow us to constructa thereis no layer of ObjectStdiesoftware involved in the simple SQL query which fetchesa single tuple given its update, (except in case of the first read or write to a primary key, (assumingwe havea table that mapsobject page),it is impossiblefor the Gatewayto know when an ids to keys). This is clearly going to result in very poor updateto the RDB shouldbe issued.We themforefound performance,compared to the SQL,queriesgiven above. it necessaryto introduce a function indicating that an Using an RDB as an object server which returns one object has beenmodified. When an application invokes object with each query is not a good idea. The only this function, a SQL UPDATE statementis issued. advantage of this schemeis transparency - the application can navigate freely, and the Gateway Prefetch paths guarantees that the required objects are always Consider how this query might be translatedinto hilable. ,,I SQL2: employees[: salary B !jOOOOt] We dealt with this issue by requiring the applicationto indicate what objects to retrieve. A graph This query yields a set of Employee object ids. What of related objectsis’described using “prefetch paths”. A shouldthe targetlist of the SQL query contain?Aliteral prefetch path is simply a description of a path. It starts translationof the Objectstore query would retrieve.only on an ob#t of one classand navigatesthrough pointer- the primary key columns, i.e., enough to and set-valueddata members.ObjectStore aheady has produceobject ids. SELECT id an objtitdescribing paths, OS-index-path, used for FROM EMP naming path indexesi,They are reused by Gateway to WHEREsalary > 50000 supportprefetch ’ “’ Another possibility is to retrieve the entire Employee Object haterialization object.The SQL query would then be this: SELECT id, last-name, - The Gateway generatesobjects to representdata first-name, middle-initial, retrievedfrom an RDB. Object identity is determinedby ss-number, salary, primary keys. If the materialize&objects are all department transient, then it isclear that object identIty should be FROM EMP determined+vithinthe context of a single process.E.g., WHEREsalary > 50000 if the Employee with id 419 is retrieved, that can be a This isn’t really correct. If the application is going to distinct object from the Employeewith id 419 retrieved navigate from the Employee to a Department via bya different executionof the sameapplication, (or by a Employee::depaytmenl, (a pointer in,the different application). However, if objects are ObjectStoreschema), then the Department .@cIbetter materializedpersistently, into an Object&ore database, be present.This meansthat the query has to be extended in what contextshouldobjecf identity be determined? to retrieve matching Departments, or there needs-do The dambaseitself’couldprovide the context. That is, if be a secondquery to retrieve those Departments. In Employee 4 19 has been materialized into a particular general,a query might need to retrieve a large fraction database,then a secondattempt to+do so will’fail, but if the secondretrieval materializesan object in a second database,that will succeed. This isn’t a satisfactory 2 ‘I)ML” wtation is used fot clatiiy. The Gatcwy cimntly suppoe solution, becausean application may accessmultiple only queries submiacd through tbc hmUioa cd in*.

704 ObjectStore databases within the same transaction: It would therefore be possible for the application to see Transaction management two objects with the same value for Employee::id. Objects tore provides serializable transactions and Another reason this approach fails is that it might be two-phase . An attempt is made to coordinate desirable to have multiple materialization contexts RDB transaction boundaries with ObjectStore within the sameObjectStore database. transaction boundaries, but two-phase commit between We solved this problem by realizing that the ObjectStoreand RDBs is not yet supported. context of materialization needs to be explicitly By default, a gateway application can execute any controlled by the application. For this reason, the number of RDB transactions, and these are run using concept of a snapshothas been added to the Gateway. “cursor stability” semantics. Repeatable read is an Object materialization always takes place in the context option. Inconsistencies due to lack of serializability are of a snapshot. It’is‘possible to use different snapshots dealt with as discussedabove. during the execution of ‘an application, but there is always exactly one snapshot “in effect” at a given time. At the Gateway interface, a snapshot is represented by its name, and a snapshot is selected by specifying the Ilam& I In concrete terms, a snapshot is little morethan a References mapping from primary key to abject identity. The Gateway must deal with the situation in which objects [LAMB911 Lamb, C., Landis, G., Orenstein, J., and are retrieved multiple times and change between the Weinreb, D. retrievals. The application controls the Gatewaysactions ‘The GbjectStoreDatabase System”, Comm.ACM 34,lO by selecting one of four behaviors: 1) Use the,value in oqober 1991. , the object and ignore the tuple; 2) Update theobject with non-key values from the tuple; 3) Raise an ‘[ORENi)2] Oreqstein, J.; Haradhvala, S, Margulies, B., exceljtion; and 4) ignare the fact that ti “object has and Sakahara,D. already been materialized, ~d’m&rialize another one. “Query Processingin the ObjectstoreDatabase System”, (4) is potentially dangerous, but in apljlications where Proceedings of the ACM SIGMOD Conference, San the absenceof object identity is known to be safe, tis’ Diego; Californiia;‘June 1992. mode leads to a tieiformance improvement since the tables mapping keys to objectids do not have to be maintained.

. . , ’

705