Report on the

OBJECT-ORIENTED DATABASE WORKSHOP

Held in conjunction with the OOPSLA '88 Conference on Object-Oriented Programming Systems, Languages and Applications, 26 September 1988, San Diego, CA, U.S.A.

John Joseph, Satish Thatte, Craig Thompson, and David Wells

Information Technologies Laboratory
Texas Instruments Incorporated
P.O. Box 655474, M/S 238
Dallas, TX 75265
Net: Thatte@ti-csl

Introduction

Object-oriented database (OODB) systems combine the strengths of object-oriented programming systems and data models with those of database systems. This one-day workshop was organized and chaired by Satish Thatte of Texas Instruments. It was a sequel to a similar workshop held in conjunction with the OOPSLA '87 conference at Orlando, FL, on October 5, 1987. Based on the comments received from the attendees at the OODB Workshop/OOPSLA '87, this year's workshop was organized as a sequence of four panels: Architecture, Transactions for Cooperative Design, Schema Evolution and Version Management, and Query Processing. Researchers from three companies selling OODB products (Ontologic, Servio Logic, and Graphael), three companies with large internal OODB R&D efforts (Digital Equipment Corporation, Hewlett Packard, Texas Instruments), and two universities (Brown University and University of Massachusetts) served as panelists. Three months ahead of the workshop date, each panel member was given a set of four issues to study and was asked to prepare a brief position statement.

To encourage vigorous interactions and exchange of ideas among the participants, attendance at the workshop was limited, with participants selected on the basis of a submitted abstract describing their work related to OODBs. The workshop announcement drew a very enthusiastic response. Forty abstracts were submitted. A total of 55 people (18 panel members and 37 participants) attended the workshop.

This report is organized in six major sections. Sections 2 through 5 present panel proceedings. Section 6 concludes the report.

2 Panel on Architecture

Panelists: Patrice Anota (Graphael), Craig Damon (Ontologic), Patrick O'Brien (DEC), Tom Ryan (HP), Jacob Stein (Servio Logic), and David Wells (TI).

Issues:

• OODB subsystems: What are the major subsystems of an OODB, and how are they interconnected or layered? Candidate subsystems are: disk-resident object management, in-memory object management, an object translation service which translates objects between in-memory representation and disk-resident representation, transaction manager, version manager, and query processor.

• Seamlessness: Is it possible to develop an object-oriented data model that is a seamless extension of the host object-oriented programming language? If so, how, and what are the advantages? A seamless object-oriented data model should be contrasted with object-oriented data models that have different type systems from the type system of the object-oriented programming language.

• Performance issues: What is a good performance metric for OODBs (navigational, i.e., pointer-chasing speed, or query throughput, or query response time)? What are the major performance bottlenecks in OODB systems, and what are potential solutions? What is the potential of using parallel processors and massive amounts of main memory to improve OODB performance?

• Connectivity issues: Are stand-alone OODBs really viable? And if not, how should an OODB be connected to conventional databases?

Patrice Anota (Graphael): All modules in Graphael's G-Base are written in one language, Lisp; in that sense the G-Base design is seamless. I see two bottlenecks in achieving good performance: disk I/O speed, and the time to translate between in-memory and disk representations of objects. In G-Base, there is a difference between in-memory and disk representations of objects. The translation time dominates the disk I/O time!

Connectivity between OODBs and conventional databases is important, as so much of today's information resides in conventional databases. Graphael has connected G-Base to Oracle's relational DBMS. Graphael is pushing a federated database approach to connect OODBs to relational databases.

Craig Damon (Ontologic): Ontologic's Vbase OODB has three basic subsystems: the Abstraction Manager for message dispatching and abstraction creation, the Cache Manager, and the Transaction Manager. The cache manager deals with segments (units of clustering and disk transfers). Object representations in memory and on disk are the same, hence no translation is required. The transaction manager deals with all the concurrency and recovery issues.

Integration with the host language can be done almost seamlessly. Of course, there are some problems in trying to be seamless within an existing language. For example, if the existing language has no notion of versions, clustering, indices, and transactions, introduction of such concepts will create a seam. We do not want a dual type system, one for persistent objects and one for transient objects. It is very important to be able to deal with persistent and non-persistent objects in exactly the same way. However, this would require development of the right new language; most existing object-oriented languages, unfortunately, were not designed for that.

A good metric for an OODB depends on the type of applications. There are three good performance benchmarks for OODBs: Rick Cattell of the SUN Database Group wrote one that is oriented towards the relational model, but did allow the object model to come through; Jim Skein of SUN improved the benchmark; and Arne Berre developed a benchmark at the Graduate Center.

Performance bottlenecks in OODBs are: object-oriented abstractions, dereferencing of pointers to persistent objects, disk I/O, and inter-process communication overhead.

Patrick O'Brien (DEC): The Trellis/OWL OODB has three major components: a persistent object repository, language interfaces, and a cooperative programming environment. Seamlessness is very hard to retrofit (such is the case with C++). On the other hand, some languages (like Trellis/OWL) allow a tighter seamless integration with the repository. An important goal to achieve is seamless integration for some languages. Languages like C and Modula can call Trellis types, so they can use Trellis as their DDL and DML. Having two different type systems is a major loss.

At the top level is a simple language-independent model of objects with slots and methods. Each object has a type, but there is no concept of a type hierarchy or of an extendable type system. Compiler writers integrate to this; application programmers never see it. On disk, objects are grouped into segments of large uninterpreted byte strings. For performance, the model keeps the same structure for objects in memory and on disk and accesses them without translation. Locking at the granularity of a Trellis object has too much overhead, so the segment is the unit of concurrency control. The memory object manager provides object abstraction. For the application, pointer chasing is the primary performance metric. If the object being referenced is not in memory, it needs to be brought in with the minimum number of disk I/O's. This is done by transferring chunks of data, as it is not efficient to fault on each object. To do any of this well, one has to figure out how to cluster objects.

We have looked at several applications where stand-alone OODB's are going to be useful. CAD and CASE now usually use home-grown DBs, so they'd be quite happy to replace them. Applications like CAD do need to talk to other databases for product information, etc. To allow this, we have written a relation type in our language. I do not think that federated databases are a priority for OODBs.
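To make the segment-based faulting concrete, here is a minimal sketch (not from the workshop; all names are invented) of the scheme O'Brien describes: persistent references name a segment and an offset, and the first dereference faults the whole segment into memory, so clustering related objects in one segment pays off.

```python
# Hypothetical sketch of segment-granularity object faulting; names and APIs
# are invented and only illustrate the idea described above.

class SegmentCache:
    def __init__(self, disk):
        self.disk = disk          # segment_id -> {offset: stored object}
        self.resident = {}        # segments already faulted into memory

    def fetch(self, segment_id, offset):
        seg = self.resident.get(segment_id)
        if seg is None:
            # Fault: one disk transfer brings in the whole segment,
            # rather than faulting object by object.
            seg = self.disk[segment_id]
            self.resident[segment_id] = seg
        return seg[offset]

class Ref:
    """A persistent reference: (segment, offset); dereferencing may fault."""
    def __init__(self, cache, segment_id, offset):
        self.cache, self.segment_id, self.offset = cache, segment_id, offset

    def deref(self):
        return self.cache.fetch(self.segment_id, self.offset)

disk = {"seg1": {0: "root document", 8: "chapter 1"}}
cache = SegmentCache(disk)
r = Ref(cache, "seg1", 8)
print(r.deref())   # first touch faults seg1 as a chunk; prints "chapter 1"
```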

Tom Ryan (HP): My remarks are from a database perspective, not from an object-oriented programming language perspective. I am assuming that client and server environments are pervasive. I think there will always be a database interface of some sort; at the least, some code was written by the database company and some was written by the application developers. The heart of the system is query processing. Something that the database community has learned is to not be chasing pointers, but to state queries declaratively and let the system compile and optimize them. The bottom level is where we do buffer management, concurrency, and recovery. We have learned that it is nice to break out the schema manager. Since schema data tends to be high level and is used a lot for query processing, there needs to be special access paths through it.

Even in what appear to be stand-alone systems, eventually there is need for inter-connectivity. I see external access through the query processor, primarily to get things optimized. If the application chases pointers, typically there is an n-squared algorithm for retrieval across a network; but if the retrieval is formulated as a query, the query processor can understand where the data live using the schema and optimize.

In a traditional environment, the semantics of an operation tends to be iterative and spread out through the code; with a declarative query language, the database is free to organize the iteration the way it wants. Our contribution is OSQL, an Object-Oriented Extension to Sequel. Performance should be measured by save and retrieve of complex states by object ID, bulk operations that do simple things to all objects, and query processing rates as in the relational world.

Jacob Stein (Servio Logic): The transaction manager has to talk with the version manager, in-memory manager and disk manager. To get cooperative concurrency control, the user has to talk to the execution model; a lot of times the only way to implement this is to do the operation once when initiating the transaction and once after the transaction commits. If the transactions need to run on distributed heterogeneous networks without a virtual machine layer, there is going to be trouble because even different compilers for the same language on the same machine may not create the record fields in the same order. Layering works well during prototyping, but once performance concerns enter the picture, the distinction between layers starts to disappear.

I do not think in terms of a query language; I think in terms of an execution model, some sub-part of which might be a query language subject to traditional database optimizations. Seamlessness in that sense is possible. The goals of supporting multiple languages with as little translation as possible are at odds with providing seamlessness. Seamlessness is not possible if complete compatibility with some archaic languages is required.

The best performance metric is complete operations at the application level, but the reasons for building the object-oriented data management cannot be ignored: C runs real fast, but application development (coding) takes a long time, code is hard to maintain, and reuse of components is hardly supported. The low-level performance problem is the disk, even with clustering. If clustering is static, only one class of applications can be optimized for disk access. For a long time, I thought that since the execution model and the data language are combined, prefetching is possible. Mike Stonebraker pointed out that since a lot of computation can be done in 30 milliseconds, it is not possible to figure out what to fetch far enough ahead of time to realize any advantage.

Stand-alone OODBs are viable in the short run. Eventually, inter-connectivity and communication are necessary. Object-oriented data managers should help with inter-operability because the behavioral component provides data independence.

David Wells (TI): At the bottom, our Zeitgeist OODB is a storage server consisting of a cache and a log of anything ever written. The log supports versioning and the ability to incrementally update very large objects. In the storage server, objects are untyped bit buckets. The next level is a client-memory manager that understands a passive data model, Common Lisp or C, but does not understand inheritance, method dispatching, etc. At this level are client objects, which are graphs of storage objects where you bring in the root and get other nodes faulted in as they are touched. Object translation chases the closure of this graph and deals with the environment. During translation we ran across things which were part of the Common Lisp environment that were always there and should be linked to rather than translated. The object manager understands method dispatching, remote evaluations, and set-sending, and the notion of modifications, so it can tell the client memory manager which pieces of its state have been modified.

We don't think of the application as accessing a relational system itself. Slots of an object may be instantiated from a relational system using a faulting mechanism to execute a remote SQL query when the slot is touched.
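A rough sketch, with invented names, of the kind of append-only storage server Wells describes: every write is appended to a log and indexed by object ID, so old states remain addressable and versioning falls out of the log. This illustrates the idea only, not Zeitgeist's actual implementation.

```python
# Hypothetical append-only storage server: objects are untyped payloads,
# every write is kept, and the index per object is its version history.

class LogStore:
    def __init__(self):
        self.log = []          # list of (oid, payload) records, never overwritten
        self.index = {}        # oid -> list of log positions (version history)

    def write(self, oid, payload):
        self.index.setdefault(oid, []).append(len(self.log))
        self.log.append((oid, payload))

    def read(self, oid, version=-1):
        # version=-1 reads the latest state; older versions stay reachable.
        pos = self.index[oid][version]
        return self.log[pos][1]

store = LogStore()
store.write("obj1", b"v0 of the design")
store.write("obj1", b"v1 of the design")
print(store.read("obj1"))              # latest: b"v1 of the design"
print(store.read("obj1", version=0))   # old state still available
```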

Although we have had pretty good success with seamlessness, we believe it is not completely possible with current languages, so we're trying to minimize the seam. No language that I am familiar with lets me look at the value of a symbol over time. Also, the notion of a transaction is not part of any language.

We believe navigation dominates performance because of object composition; objects tend to be large and complex, and when you send a message to one object, you potentially need access to all of its state. At the macro level, we believe that set queries are essential, but that navigation still dominates. We need to be able to fault very quickly. We think that prefetching is going to work. We need to be able to evaluate methods in parallel. Having a large enough memory to keep translated computational forms of objects around would largely eliminate the in-bound translation problem.

Question/Answer Period - Panel on Architecture

Langworthy (Brown Univ.): People talk about clustering to aid in performance; how do you deal with different applications using the data?

Panelist: Right now we do not. We use static clustering and we recluster for access patterns. Reclustering in virtual memory when you pull things off of disk will help.

Panelist: We've looked at allowing the user to describe different groups or clusterings of objects, from a prefetching point of view. When the user wants to access, for example, a compound document, he describes it as an aggregate object that may be prefetched. If prefetching is not specified, some very high-level objects of the document will be brought in. This way, the whole document will not be brought in just to scan the titles of documents.

Subramanyam (DEC): What architectural features will aid OODB's?

Audience: There was a company called Ontogony trying to build an accelerator board for SUNs. Object-id translation would be the biggest win for a hardware accelerator.

Wells (TI): Massive memory to keep objects in their computation form to avoid translation.

Thatte (TI): There are two camps here: translation or no translation. If you don't do translation, how do you take care of object ID's in memory and on disk?

Panelist: Our model uses segments; object IDs are offsets into segments. The segments are mapped in and out. There is no translation cost for every small object.

Beech (HP): The most exciting thing I've heard so far was Craig Damon saying that it was possible to get an order of magnitude improvement in OODB performance compared to relational databases.

Damon (Ontologic): It is mostly from not having to do normalization, being able to cluster, etc. We're slightly faster on going through all the objects and looking up a field. For relational-type things, we tend to be comparable. For things like the Cattell benchmark, we tend to be much faster because it is expressed much more naturally. You can optimize differently because the procedural language is closer to what the application is trying to do than having intermediate steps like relational joins (see the OOPSLA '88 paper by Duhl and Damon).

Kern (InterAct Corp.): How quickly can two processes that use the same objects access them if they both need to modify the objects? Databases tend to cache in big pieces of data. When you try to do transaction commits, can you do it on smaller objects?

Wells (TI): The object manager allows the application to identify which pieces of objects have changed.

Thatte (TI): Hardware support to detect modification would help.

Damon (Ontologic): By not translating we pay a dereferencing cost, but then we know every piece that has been modified. It also says we have concurrency control at the sub-object level.

Panelist: We do a similar thing - we decompose large objects using a tree-like structure. If we modify a small piece of a large object and then commit, we don't write the whole thing back out. We maintain read or write sets and do as much garbage prevention as possible.

Franklin (MCC): Is anybody doing anything special with garbage collection, especially people that are interested in seamlessness?

Ryan (HP): In IRIS we don't have a garbage collection problem. We don't allow garbage. If you delete an object, it disappears, and if there is a reference to it, the reference is gone. But persistent objects do not ever really go away; they are always accessible through queries. We began with a reference counting scheme, but there's too much contention in the object directory.

Stein (Servio Logic): We strongly believe in garbage collection. I don't see how you can provide seamlessness if you have to worry about dangling references. We do generation scavenging in our caches to eliminate garbage. Right now, our garbage collection is run off-line, but we have not run into any degenerate situations yet where we would not be able to keep up in background mode.

Wells (TI): Since we keep everything ever written, we don't have garbage other than the objects from failed commits. There are some interesting problems with garbage collection in virtual memory in that low-level faulting and translation routines have to be careful not to trip over the garbage collector.

Franklin (MCC): In a seamless system, a lot of transient objects are generated. These objects should not be written to the database. It is often difficult to distinguish transient objects from persistent objects.

Wells (TI): It is; in the sense that for something to be persistent, it has to be connected to something which has a persistent ID.

Mahbod (HP): It probably makes sense to pay the translation price once, because objects are going to be residing there for a long time. One of the panelists mentioned that object identifiers are offsets into segments. That creates a problem if you delete an object and then create another object with the same object identifier.

Panelist: The applications dictate how long objects remain in memory. I still think that paying the cost of translating 40-byte objects for the millions of objects that make up a large CAD design is going to be prohibitive. The other thing is that the object ID is really the concatenation of the segment ID and offset to make up the unique identifier. When you're referencing another object within the same segment, it is pointing at an offset to find out where that object is. References outside the segment use the whole ID.
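The segment-ID-plus-offset identifier scheme the panelist describes can be illustrated with a small sketch (a hypothetical encoding, not any product's actual format): intra-segment references carry only the offset, while cross-segment references keep the whole ID.

```python
# Hypothetical encoding of segment-ID + offset object identifiers.

def make_oid(segment_id: int, offset: int) -> int:
    # whole ID: segment in the high bits, offset in the low bits
    return (segment_id << 32) | offset

def split_oid(oid: int):
    return oid >> 32, oid & 0xFFFFFFFF

def encode_ref(source_segment: int, target_oid: int):
    seg, off = split_oid(target_oid)
    if seg == source_segment:
        return ("local", off)          # short reference: offset only
    return ("global", target_oid)      # full ID for cross-segment references

a = make_oid(7, 0x10)
b = make_oid(7, 0x40)
c = make_oid(9, 0x08)
print(encode_ref(7, b))   # ('local', 64)   -- same segment, the offset suffices
print(encode_ref(7, c))   # ('global', ...) -- crosses segments, whole ID kept
```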

Damon (Ontologic): Ideally, in memory, you want virtual or even physical addresses. On disk you need representations which are not dependent on any particular processes or nodes; this is the reason for translation. Something that was not addressed earlier was the issue of whether you want to take programming languages through a natural representation of a record or pointers, etc., and when you move it onto disk, format the objects as a bunch of relations. These are two separate issues and I don't think they were quite separated properly in the discussion earlier. I think you should do everything by large UID's. Then, to speed up references, use a scheme similar to what Baker proposed for Lisp, using short local references which can be accessed fast and indirections for long references.

Stein (Servio Logic): If one reference is to be used many times, it makes sense to go to a virtual memory address.

Damon (Ontologic): Right. We can bring in a whole segment and translate everything in the segment, or we can translate on an object basis, or we can translate on a per-reference basis, on demand. The tuning of the system to make that decision probably is going to have a big impact on the performance of the system.

Shah (Calma Co): How do you preserve sharing between objects?

Wells (TI): Any storage object is reachable from a large number of roots. One can enter the graph at any storage object. When a new version of an object is written from root 1, root 2 will see the new version when it subsequently reads. This is one of the reasons why we do versioning at a low level.

Harper (Univ. of Glasgow): How seriously do you take the notion of data independence as it is understood in database management systems? To what extent are you able to factor out the logical behavior of objects from their physical representation, and what control would you give the programmer over those physical representations, indexing, clustering, replication, and so on?

O'Brien (DEC): I think we take it more seriously than other database systems because they concentrate just on data structures; we are encapsulating implementation behavior as well. So, we are really at a higher level of independence.

Melville (Columbia Data Systems): If sending a message to an object hides whether the result is physically stored or calculated, can the OODB retain the results of expensive calculations?

Damon (Ontologic): We have a certain amount of support for that. It's very much akin to loop invariance, where something has been declared unchangeable by some external agent. You have more semantics here, so you can do more optimizations.

Wells (TI): That seems to me to be more of an object-oriented programming language issue than a database issue. It would be nice if the database could provide that, but it's something which has to make it back into the language itself.

Melville (Columbia Data Systems): I think I disagree somewhat. If you look at the relational world, you begin with view definitions and you can optimize expressions based on composing the view, pre-modification, and similar techniques. An additional performance gain, recognized as absolutely necessary in some areas, is to do view materialization, and maintain the materialized view rather than reconstructing it each time.

Stein (Servio Logic): You can't separate the two. In a fairly seamless system, the programming language execution affects the view of the data. This is partially a programming language problem and partially a data management problem.

Wells (TI): In Lisp, a flavor definition is highly redundant for efficiency of method dispatch. If a definition is changed, a mechanism automatically propagates that to the redundant slots. That's something that you want in a programming language as well. It is not simply a database issue. I think that's part of the point that people have been making here; that we shouldn't think of these as being either database issues or language issues, but as object issues. Whether the object is persistent or not is irrelevant.

Moss (Univ. of Massachusetts): I don't think it is a language design issue, it's a language implementation issue. Memoizing results is not inconsistent with the semantics of a language. I've heard people say this isn't a database issue, it's a language issue. It is an implementation issue, and what we are doing in our work is striving towards integration of programming language issues and database issues.

Ryan (HP): It is more of a data independence issue. It should be transparent whether or not something has been pre-computed or needs to be recomputed on the fly. From the IRIS perspective, we do allow property values to be materialized. It is an implementation issue; it should be hidden from the user.

3 Panel on Transactions for Cooperative Design Work

Panelists: Bob Handsaker (Ontologic), Eliot Moss (Univ. of Massachusetts), Tore Risch (HP), Craig Schaffert (DEC), Jacob Stein (Servio Logic), and David Wells (TI).

Issues:

• The role and legitimacy of long and nested transactions, and object versions.

• The communication facility needed to allow inter-transaction cooperation.

• Should concurrency control be pessimistic, optimistic, or something else?

• Type-specific concurrency control to exploit the semantics of abstract data types to increase the degree of concurrency.

Bob Handsaker (Ontologic): At the very least, transactions must be available as a tool to allow work without uncontrolled interference. The ability to nest transactions to decompose the design process into little steps, and then build up those steps, should be available. Something that was not mentioned in the list of issues is that people really want check-pointing, independent of visibility. We have multiple transaction models, and an object can be instructed to run in a particular model. We really have transaction streams. One transaction model covers long design transactions with optimistic concurrency control. The other transaction model supports short transactions and pessimistic concurrency control. Transaction models can be added incrementally by adding storage managers.

Transaction conflicts must be consistent with the semantics of the operations, not their implementation. There are databases where people have missed that. For example, if two objects reside on the same physical page, there may be a transaction conflict. This is not apparent to the application builder, and it is insidious because performance tuning can kill a working system. The other thing has to do with data abstraction. If there are two queries looking at payroll records, semantically these are two read-only transactions; if the database is writing an audit trail, the semantics of the audit trail should be invisible to the user.

We have built a few things like shared dictionaries where we want to use type-specific concurrency control to get higher throughput. We've had objects in one transaction stream share a representation living in another transaction stream. They communicate changes through the shared representation, which has different visibility than the external objects. The important thing to point out is that we have used this idea of the two transaction streams for two different things.
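As an illustration of the nesting Handsaker asks for, here is a minimal sketch (invented names; concurrency control, check-pointing, and transaction streams are all out of scope) in which a nested transaction's updates become visible to its parent on commit and are simply dropped on abort.

```python
# Hypothetical nested-transaction sketch: each little design step runs as a
# child transaction that folds its writes into the parent when it commits.

class Transaction:
    def __init__(self, store, parent=None):
        self.store, self.parent, self.writes = store, parent, {}

    def begin_nested(self):
        return Transaction(self.store, parent=self)

    def read(self, key):
        t = self
        while t is not None:            # nearest enclosing write wins
            if key in t.writes:
                return t.writes[key]
            t = t.parent
        return self.store.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        if self.parent is not None:
            self.parent.writes.update(self.writes)   # fold into parent
        else:
            self.store.update(self.writes)           # top level: make durable

    def abort(self):
        self.writes.clear()

db = {"part": "rev A"}
design = Transaction(db)                 # long design transaction
step = design.begin_nested()             # one little step of the design
step.write("part", "rev B")
step.commit()                            # now visible to the design transaction
print(design.read("part"))               # 'rev B'
design.commit()
print(db["part"])                        # 'rev B'
```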

Craig Schaffert (DEC): We are building a system that performs well for the short term. We break our system into an object repository and the higher-level layers because of the way we think about transactions. In the object repository we have traditional atomic serializable transactions which appear to happen instantaneously. They are used to keep the repository consistent. We do not generalize the concept of transaction into a long-term transaction as Ontologic does. The purpose of our repository is to support building libraries of tools which allow applications to be built. A saga is an entity in the repository and is subject to manipulation of code at either the application or the library level. Transactions do not communicate. Our lowest-level transactions only affect the world by affecting the repository. We think that rather than some low-level mechanism of having one saga send messages to another (or using semaphores or something like that) we want application-specific interaction. The repository will probably need to provide some help along these lines, probably in terms of supporting notification. One thing that has been proposed is keeping read-write sets for sagas, but the applications we want to support in the short run don't seem to need that.

Our low-level transactions are short and don't conflict very often. Any sort of concurrency control should work, but we think that a variety of policies should be available. We suspect that optimistic concurrency control is going to be a good default, but for certain special-purpose things like logging, you need user-defined atomic objects. One of the reasons that transactions seem to be short and have little actual conflict is that in the application areas we have talked about, the sagas tend to use a check-in/check-out model.

To avoid object locking overhead, some sort of clustering is needed; but then there should be a way of declaring that particular objects are isolated in a locking sense, so that they won't be sorted into the same lock cluster.

Eliot Moss (Univ. of Massachusetts): I have been working with the Trellis/OWL folks at DEC for a number of years, and I am also building my own object server, called Mneme, at UMass. The basic philosophy is to build efficient read/write serializable objects at one level, so normal short transactions at a low level provide hooks for more advanced things to be built at a higher level. We see short and long transactions on different levels in the system; long and cooperative-type transactions are viewed as sequences of short or traditional transactions managed at a higher level of abstraction. To do this we need mutual exclusion and notification. Mutual exclusion allows us to lock the data structure in which to record concurrency control information. ObServer, for example, can lock something for a user, but notify the user if somebody else wants to use that object. I am leaning towards something more like a blackboard on which you can post interesting facts and be waiting for interesting patterns to occur in the blackboard and match them.

With visibility in traditional transactions, while a transaction is running, people don't see the effects; when the transaction commits, the results become visible. Visibility, undo, and scheduling should all be independently controllable.

We have developed a coordination model from work with Zdonik and Skarra. If we are trying to cooperate, we decide what the goals are, and these goals are posted into a data structure. We express the goals and pieces of work to be done, and people pick them up and execute them. To coordinate processes, record the fact that someone is taking something, and to prevent interference, use constraints.

A transaction is viewed as an evolving set of goals which is finished when the goals are all satisfied. Locking is viewed as constraints imposed on objects to prevent various interferences. A group of threads of control that cooperate is called a transaction group. This supports a transaction goal hierarchy that could co-exist with traditional, single-level or message transactions.

Tore Risch (HP): In the IRIS OODB, there is a set of functions for each object; these functions can be derived or stored. Internally we use a relational storage manager, so the kind of lock IRIS has is what is provided by the relational storage manager; this currently is pessimistic locking. Each attribute is expressed as a separate query. You can imagine very large objects, having very many attributes, so we break down large objects into separate relations. Of course, it's expensive at execution, sometimes.

For cooperative design work and inter-transaction communication, I'm thinking about using triggers or demons or something like "retrieve-always" in Postgres. We use short transactions for the long transactions and build a message-passing scheme based on triggers. A difference between triggers and normal message passing is that with message passing you have to code the cooperation between the programs directly, while with triggers you simply enable the trigger. This makes it easy to remove triggers. One shouldn't pay for these kinds of triggers when running outside the part of the program where they belong. Finally, these kinds of triggers should be clearly separate from the database definition part of your system; they really belong to the application.

Jacob Stein (Servio Logic): I would like to see the world as a hierarchy of transactions. What I like about nested transactions is that they can be overloaded for error recovery. We can encapsulate information from an aborted transaction to tell previous transactions what went wrong; but at the same time, since this is not committable information, it is discarded at the end.

There may be several versions of an object that a user is working on; all these versions are visible to him, but when he checks in at the next higher level, only the version that is checked in is visible. The intermediate versions either disappear totally, or are not visible at the level above that user. Otherwise, users clutter the world with versions that really are of no use to anyone but the particular users.

We like the idea of having optimistic and pessimistic concurrency control. Locking is appropriate for long transactions, though for particular objects there tend to be hot spots. The question of granularity for locking or checkout is a hard one. What is an object is a big question here. One can introduce abstraction above the object level; we have a notion called segments, and segments can be locked. Another interesting way of looking at locking is to think of the individual objects as gatekeepers for the large collections underneath. One of the difficulties in object-oriented systems in locking a cluster of objects is that object identity is an overused concept. Identities of objects are handed out very frequently. When the identities are handed out, others can use the identity to jump into the middle of a cluster. I would like to see some more work done on adding key references back in.

Type-specific concurrency control is useful, but it is real hard to design and implement. From a specification language, it takes a lot of work to figure out what a concurrency control mechanism is even for a very simple type; like the fact that you can do multiple additions without any conflict, as long as the total comes out right.
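Stein's example of commuting additions can be sketched as an escrow-style counter (hypothetical code, not Gemstone's mechanism): concurrent increments from different transactions never conflict with each other, and only reading the total conflicts with uncommitted increments.

```python
# Hypothetical sketch of type-specific concurrency control on a counter:
# increments commute, so transactions may add concurrently without conflict.

class ConcurrentCounter:
    def __init__(self, value=0):
        self.value = value
        self.pending = {}                 # txn id -> sum of uncommitted deltas

    def increment(self, txn, delta):
        # No conflict check needed: additions commute with each other.
        self.pending[txn] = self.pending.get(txn, 0) + delta

    def read(self, txn):
        others = [t for t in self.pending if t != txn]
        if others:
            raise RuntimeError("read conflicts with uncommitted increments")
        return self.value + self.pending.get(txn, 0)

    def commit(self, txn):
        self.value += self.pending.pop(txn, 0)

    def abort(self, txn):
        self.pending.pop(txn, None)

c = ConcurrentCounter()
c.increment("t1", 5)
c.increment("t2", 3)      # fine: commutes with t1's increment
c.commit("t1")
c.commit("t2")
print(c.read("t3"))       # 8: the total comes out right
```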

David Wells (TI): We need to be able to compose transactions, which means that we need a nesting model. We'd like to be able to go back and see old states of the database to improve incrementalism and increase concurrency. We need adjustable object granularity. Decomposing into small objects improves response time and improves concurrency. Changing the way storage objects glue together to compose large client objects also gives you data independence. This helps if we get into a conflict on a storage object, because we can chop it into even smaller storage objects and hope to avoid the conflict.

Cooperation is always vacuously possible within a single transaction outside database control. The question is what the database needs to do in order to help cooperation. We need to be able to control the modes of sharing and need to be able to resolve constraint violations. At some point, a transaction is going to attempt to commit cooperating sub-transactions which independently have produced an inconsistent result. At that point, the database ought to be able to tell the sub-transactions what the problem is about.

People talk about cooperative work as if it's different from the transaction model. Is Cooperation = Nested Transactions + Adjustable Weights + Adjustable Object Granularity? If we could make a transaction arbitrarily cheap, we could have arbitrarily fine-grained transactions. To see the result of one of these fine-grained transactions, send a message requesting to make the committed fine-grained transactions visible; that is, commit them to a higher level.

Prefetching is necessary for performance, and it does not make any sense to prefetch without getting permission to use the object; this means it should be possible to lock at the same time. Prefetches and locks should not be propagated too far in advance of actual use; otherwise somebody else may be unnecessarily blocked. When two transactions are near each other in the object graph, a warning may be propagated to declare intention to prefetch and prelock.

Question/Answer Period - Panel on Transactions for Cooperative Design Work

Subramanyam (DEC): If you are using a persistent programming language for accessing the object-oriented database, which is preferable: language constructs such as transactions or built-in types? What are the issues in each, what are the trade-offs, and which might be better for accessing these transaction facilities from conventional languages?

Panelist: Let us say that acquiring a lock and releasing it are always paired and represented in a syntactic construct. It's feasible to do most of what you want to do with just library calls; it is just a little less safe and less elegant. However, that's just in terms of how the language looks. The semantics of the language are modified in a fundamental way when you add transactions; the difference between the notion of a program that runs privately against its own memory vs. something that runs in some sort of shared environment is something that is not transparent to the language.

Audience: A related question is exactly what the semantics of the transactions have to do with the semantics of the language; like local variables and stack frames and those objects that are under transaction control. This gets back to Mr. Subramanyam's point about semantics; not whether these are represented as procedure calls, but how these are affecting things that we normally think of as being in the programming language domain and not in the database domain.

Audience: Does an abort re-initialize all the local variables? How are you supposed to ever do an abort and then do anything else with any knowledge if it erases everything?

Audience: What we need is a model of computation that has a semantics of time.

Moss (Univ. of Massachusetts): With respect to cooperation, the work that I've been undertaking with Stan Zdonik is to try to devise some sort of computational model where we can talk about what it means to be cooperating and follow whatever the rules of cooperation are. The rules say whether a given sequence of things done by the different parties is a good sequence or a bad sequence. We can come up with all these mechanisms, like all these other kinds of locks and so forth, but no one is telling us how to use them.

O'Brien (DEC): Depending on how one defines transaction in that equation David Wells wrote, I would either agree or disagree. If transaction is defined in the traditional serializable sense, nesting and adjusting the weight and granularity of locking do not affect the fundamental semantics. If visibility is isolated, and visibility control is included in the notion of transaction, then a situation where you can begin to cooperate emerges.

Audience: If visibility is separated from transactions, an uncommitted state in a transaction may be made visible to another transaction. This may lead to commit of a transaction using a state that hasn't been committed to the database yet. By allowing the visibility to be separated from the transactions, dependencies which are hard to track may be introduced.

Moss (Univ. of Massachusetts): Obviously, it's a dangerous situation to rely on uncommitted data and presumably, when you've got the information, you knew that it was uncommitted and so you shouldn't do anything that's important with it; be prepared to be told that it is aborted and do it again.

Wells (TI): The answer has to do with weight. A transaction cannot commit at a higher-level weight than the commit of anything it has seen. If I look at a light-weight result, I cannot commit at a higher weight. I can upgrade my weight when transactions I have looked at increase their weight.

O'Brien (DEC): What if I am running in one top transaction, and my friend is running another top transaction, and we need to exchange information? The problem arises again. I don't think nesting solves the problem of cooperation.

Wells (TI): Is it reasonable for designers that want to cooperate not to be cooperating under some umbrella transaction?

O'Brien (DEC): That assumes the two designers don't know in advance that their work is going to conflict in some way. I don't know if people actually work that way, where they don't know they're going to step on somebody else's toes.

Audience: There must be a way to dynamically associate one transaction with another transaction.

Waldie (DEC): Eliot Moss mentioned earlier that short transactions belong in the database and long transactions belong in a higher layer, which is configuration management, which can start transactions in the database. For controlled sharing of information, use work flow control layered on top of the database.

O'Brien (DEC): Jacob Stein talked about object ID's being overused and alluded to keys or name references to objects in locking. How would it be better?

Stein (Servio Logic): There is something wrong with identity. If I need to ask you about yourself, I may need your identity to send you a message; but I can hold onto that identity and use it for purposes not originally intended. Identity has too strong a covenant. If I have the identity of an object, that means that object will exist tomorrow. But, if I have someone's address, it doesn't mean he is going to live there tomorrow. We want something weaker than identity. In the relational model, we had keys.

Part of the problem with integrating back into object-oriented models is that we have different notions of semantics here: extensional semantics, where the class denotes the collection of its instances, and intentional semantics, where there is no automatic class maintained, and the question then becomes, what collection is the key maintained with regards to?

O'Brien (DEC): I don't understand how letting out keys helps us solve the situation. You give out your name and then next time I give you your name back you say, "I'm not there anymore, I'm gone", and I don't have a handle on it anymore.

Stein (Servio Logic): I have not worked it out completely. I just think that identity is too strong and that by somehow having a weaker or second-class pointer to objects in the system, we will have a lot more flexibility.

Moss (Univ. of Massachusetts): In some designs, object ID's actually say a little bit about the location of the objects, and one problem with handing out the ID is that it pins you down. Going back to something more like a key gives you more flexibility to move things around. Another use for keys is to allow you to delete things. If you're going to do set-oriented access to data, I agree with Jacob Stein. In the Mneme system, we group objects into segments, and it might turn out that there is a rule that disallows certain objects to have references to them from other segments or from other files. An attempt to create such a reference will be a constraint violation. There are really a couple of different issues bound together here that we as a body haven't separated out very well yet.

Audience: Tying identity to location is more of an implementation issue. I don't think one should use keys to get around that problem. Using keys where a weak binding is needed is binding by description. Nothing in object-oriented databases takes that away from us. We have to be aware of when to use that style of binding vs. the new capability that a database with referential integrity provides, namely, binding to a specific object.

Audience: There is a danger that previous work in these areas is ignored, particularly the notion of weak and strong entities in the entity-relationship model. Keys, for example, are a completely different issue from this idea of a strong entity and a weak entity. Another interesting idea in databases is the idea of an external view of things as compared with the conceptual view of things.

Audience: I have some experience in building CAD systems; I can give you some feedback. The notion of external symbolic reference works very well. A cluster is identified only by a single root, which has a name. If two different clusters need to reference each other, it is done by an external symbolic reference. It avoids the need to keep track of each tiny object that's in the cluster.

4 Panel on Schema Evolution and Version Management

Panelists: Lougi Anderson (Servio Logic), Bob Handsaker (Ontologic), John Joseph (TI), Mike Killian (DEC), Brom Mahbod (HP), and Stan Zdonik (Brown Univ.).

Issues:

• Architectural issues: Is change management a distinct layer in an OODB; can change management be decoupled from the object-oriented data model?

• Compile-time vs. run-time trade-offs: What trade-offs have to be made with respect to object representation and generated code for methods to support schema evolution?

• Primitives for change management: What are the primitive mechanisms to allow client applications to manage schema evolution and version management? Are the primitives the same for schema changes and object changes?

Lougi Anderson (Servio Logic): Gemstone, which is Servio Logic's OODB system, has no built-in version or configuration management. Gemstone is written in Smalltalk, so one can build a change management system for Gemstone by defining the appropriate classes and inheriting them into Gemstone. In a generic database system, there is no one change management model that is good for all applications. Our idea is that there will be a set of low-level primitives to support the requirements of a number of different applications. We need primitive-level support for versions and configurations in order to assure adequate performance and to support things like differential versions and incremental configurations. Also, the name service scheme is going to be impacted by the version mechanisms. Version management can support schema evolution if the schema information is in object form. There are some hard problems in the change management area; for instance, the mix of old and new values, missing values, and changing integrity constraints. We understand relatively small changes in the schema; but when do several small changes accumulate to produce a schema which should be considered a different schema? In versioning, one needs to distinguish between time-dependent data and versions of data; the versioning mechanism should understand the temporal semantics of the application. If a change management scheme employs defaults, it is necessary to version the defaults; this is especially true in the presence of branching versions.

Bob Handsaker (Ontologic): Vbase has not implemented versioning; schema evolution is supported. In our design, object versions are modeled as different states of the object. There is one object with different states; queries are done over this object, and version management is a special case of reference resolution. Reference resolution can be implemented at different places; one extreme is to implement it deep down at the storage level; an alternative is to bring versioning to a higher level in the system. Versions of an object are modeled as graphs whose nodes are versions (object states) and whose arcs are transformations. An aggregate of versions with at most one version per object is a slice, and an aggregate of transformations is a transaction. Slices help in defining contexts, which provide defaults for reference resolution. Transactions group together transformations across multiple objects; a transaction may also commit multiple versions of the same object. Schema evolution can be seen as a use of version management. When a type is versioned, something needs to be done on its instances. This updating of instances may be immediate or delayed. There are questions of defaulting the behavior of such migrated instances, taking care of inserting new slots, and verifying constraints on slots. The real problem is that in versioning an object, it goes through changes to values of its properties; in a schema evolution the migrated objects go through a format change.

Compiler optimization that impacts schema evolution may be done at various levels. A compiler generates code based on type information available to it; but when the schema evolves these assumptions are no longer valid. Vbase uses a MAKE-like dependency tracking mechanism that can detect when schema information is out of date.
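A small sketch of the version-graph vocabulary Handsaker uses above - nodes are object states, arcs are transformations, a slice picks at most one version per object, and a context resolves plain references through a slice. The class and method names are invented for illustration only.

```python
# Hypothetical version-graph sketch: per-object version history plus a
# context that resolves an object reference to one chosen version.

class VersionGraph:
    def __init__(self, oid, initial_state):
        self.oid = oid
        self.states = {0: initial_state}     # version number -> object state
        self.arcs = []                       # (from_version, to_version)

    def derive(self, from_version, new_state):
        v = max(self.states) + 1
        self.states[v] = new_state
        self.arcs.append((from_version, v))  # the arc records a transformation
        return v

class Context:
    """Resolves an oid to a state using a slice: at most one version per object."""
    def __init__(self, graphs, slice_):
        self.graphs, self.slice = graphs, slice_

    def resolve(self, oid):
        return self.graphs[oid].states[self.slice[oid]]

doc = VersionGraph("doc", "draft")
v1 = doc.derive(0, "reviewed draft")
graphs = {"doc": doc}
released = Context(graphs, {"doc": v1})   # e.g., one context per user or task
working = Context(graphs, {"doc": 0})
print(released.resolve("doc"), "/", working.resolve("doc"))
```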

John Joseph (TI): Zeitgeist currently supports linear versioning. In the Zeitgeist design, schema evolution and version management form a distinct layer. This layer is supported by a generic change management virtual machine (CMVM). The CMVM manages the applications' or the users' view of evolution. This view, which is usually a graph, is not directly available to the Zeitgeist object manager. The CMVM and the object manager query each other for information sharing. A versioned object in the database has encapsulation information which indicates whether the CMVM needs to be called to instantiate the object. In this model, one can implement different versioning schemes without affecting the database managers. We have done some work in schema evolution in Common Lisp with Flavors on TI Explorers. Schema evolution is implemented using version management. We treat the type definitions as objects and version them; in our system, instances of different versions of a type can coexist. When the schema evolves, it is not necessary to coerce all existing instances to follow this evolution. If a coercion is necessary, it is done when the object to be coerced is accessed, and it is done by creating and returning a new version. The notion of immutability of an object is fundamental in Zeitgeist. We find that most of the work done (by us and others) concentrates on the structural consistency of objects when the type definition changes (e.g., adding or deleting a slot). An equally important and more difficult area to investigate is behavioral consistency. Most applications manage a transient object base in addition to the persistent database. Issues of change management are very important in the transient world. Transactions and dependency tracking in the transient world need to be researched.

Mike Killian (DEC): Trellis/OWL has no version implementation at this time. We look at schema evolution as a change of structure and behavior of objects. Versioning is viewed as an application issue, and we must support some primitives for versioning. Schema evolution is versioning of types. In our experience, it is very difficult and very expensive to support schema evolution without versioning, especially in a compile-time environment. There are two approaches to when schema modifications take effect - "pay me now or pay me later." You either pay now in terms of a much more expensive compilation phase (all instances are updated when the schema changes), or you pay it later at run time. Schema modifications can be performed incrementally to avoid a long compilation phase. Then, how do you catch an obsolete object and transform it into a more up-to-date object? You can test that in every reference to the object, which is run-time intensive, or you could realize that every time you touch an object, you must be within an operation that is appropriate for that object, and the operation will do the conversion. In Trellis/OWL the compiler has intimate knowledge of the layout of storage in the database, so when the schema changes, the compiler is going to have to determine that and update the code. So we pay the up-front compilation cost with the benefit of a faster run time. Another aspect of updating objects is that there is no way you can fix up an object and have it remain in place all the time. When an object has to move because of size expansion, a forwarder may be used to make this transparent to the runtime.
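The "pay me later" approach that Joseph and Killian describe - leave stored instances in their old format and coerce them only when touched - can be sketched as follows; the schema numbers and the upgrader function are hypothetical.

```python
# Hypothetical lazy-coercion sketch: instances carry the schema version they
# were written under and are upgraded step by step only when accessed.

CURRENT_SCHEMA = 2

# One upgrader per schema step; here, version 2 added a 'reviewer' slot.
UPGRADERS = {
    1: lambda data: {**data, "reviewer": None},
}

class StoredInstance:
    def __init__(self, schema_version, data):
        self.schema_version = schema_version
        self.data = data

    def access(self):
        # Upgrade on first touch; instances that are never touched stay old.
        while self.schema_version < CURRENT_SCHEMA:
            self.data = UPGRADERS[self.schema_version](self.data)
            self.schema_version += 1
        return self.data

old = StoredInstance(1, {"title": "gear housing"})
print(old.access())   # {'title': 'gear housing', 'reviewer': None}
```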

Stan Zdonik (Brown Univ.): Our view is that versions are a distinct layer. Our system (Encore) is organized as follows: there is an underlying storage manager that manages the mapping of pieces of storage to the disk, transactions, and so forth. On top of that, there is the data model, which contains operations, types, properties, and things of that nature. On top of that, at the application level, we build the services, including complex objects and versions. Versioning is captured by the idea of a version set. The version set stands for all versions of an object, and the versions are properties of this conceptual object. The versions themselves have properties. These data structures interact with a referencing mechanism to support dynamic as well as hardwired reference resolutions. Lots of times there are things you might want to do to a schema definition or type definition. Representation change is trivial; reorganizing the type lattice is not too bad either. What's really crucial is the notion of type-changing transparency, that is, being able to change the interface of the type in some fundamental way and, at the same time, be able to define additional mechanisms such that programs that were written before the change do not break. I believe that converting objects is a really bad idea. If I convert all my objects to the most current schema version or type version, then what happens to all those old programs that were written with the assumption that the old version was what was current? An interesting thought is that, since storage is cheap, we may carry with the objects all sorts of information pertaining to versions and evolving types. When a function is applied, the correct version of the function for the version time will be selected and called.
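A minimal sketch of the version-set idea, assuming invented names rather than Encore's actual interfaces: a version set owns every version of one conceptual object, the versions carry their own properties, and a reference is resolved either dynamically (to the set's current default) or in a hardwired way (pinned to one member).

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of a version set layered above the data model; these are
// not Encore's actual types or interfaces.
struct Version {
    int number;
    std::map<std::string, std::string> properties;  // the versions themselves have properties
    std::string contents;
};

// A version set stands for all versions of one conceptual object.
class VersionSet {
public:
    int add(Version v) {
        int n = next_++;
        v.number = n;
        versions_.push_back(std::move(v));
        defaultVersion_ = n;                        // newest becomes the dynamic default
        return n;
    }
    // Dynamic resolution: follow the set to whatever the current default is.
    const Version& resolveDynamic() const { return get(defaultVersion_); }
    // Hardwired resolution: a reference pinned to one specific member of the set.
    const Version& resolveHardwired(int number) const { return get(number); }

private:
    const Version& get(int number) const { return versions_.at(static_cast<size_t>(number)); }
    std::vector<Version> versions_;
    int next_ = 0;
    int defaultVersion_ = -1;
};

int main() {
    VersionSet design;
    int v0 = design.add({0, {{"author", "pat"}}, "first cut"});
    design.add({0, {{"author", "chris"}}, "revised cut"});

    std::printf("dynamic   -> %s\n", design.resolveDynamic().contents.c_str());     // revised cut
    std::printf("hardwired -> %s\n", design.resolveHardwired(v0).contents.c_str()); // first cut
}
```

The point of the layering is that this machinery sits above the data model, so the storage manager and type system need not know about versions at all.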

Question/Answer Period - Panel on Schema Evolution and Version Management

O'Brien (DEC): You talked about difficulty with changing types. When you add behavior or change a type, usually you change the code that is using that type. If code is written in terms of the interfaces, a change in the type may not affect the code.

Zdonik (Brown Univ.): I am talking about changing the interface. We need to have old programs continue to run. It must be possible to have multiple semantic versions that make sense in an environment at the same time. There is also the problem of new programs dealing with old data. We do not want to convert large amounts of data to a new form. Maybe we need to provide exception handlers which apply to the data so that it is not necessary to do massive conversions.

Audience: Is the notion of context supposed to eliminate versioning?

Joseph (TI): No. Context is a way of organizing the versions so that specifying a context determines the default versions of some user objects.

Audience: How do you deal with complex objects whose sub-objects may be in different contexts?

Joseph (TI): At the time he runs the application, the user has to make sure that (if he is using a default context) a version of each of these sub-objects is in his context. This is not an issue usually, since contexts basically aggregate all objects in a particular application.

Audience: What is the panel's stand on when a type, after multiple evolutions, can no longer be considered a version of the type?

Anderson (Servio Logic): The semantics of the application may have to determine this. There are also questions of performance, maintenance of the versions, and the penalty for exception handling.

Joseph (TI): I agree with her. The system has no way of figuring this out. So at some point in time the user might say that now he has a new class.

Killian (DEC): You may be able to mutate a type indefinitely; when two people take a type and mutate it in different directions and then try to merge the mutations, there are going to be problems. We may need to put limitations in such cases.

Handsaker (Ontologic): There should be no structural limitations as to how far you can do schema evolution. The limitations should come from the application or practical considerations.

Joseph (TI): Some statistics may be gathered if you are implementing schema evolution using exception handlers. If most of your messages on objects of a type trigger those exception handlers, it may be time to think of that as having evolved into a new type.

Straw (Eastman Kodak): Maybe there is a good reason that most databases do not support versioning. Many people probably want to version things different ways and may want to add more to the versioning: configuration control and access control. One suggestion would be a model in which version control, access control, etc. are mixed into the classes. These mixins would have information pertaining to the services mixed in, for example a notify list for access control. This way we are not limiting the users.

Mahbod (HP): The reason you put versions in the database is to be able to do queries based on versions. You should not burden the database with unnecessary version information which can be handled at a higher level. IRIS has object identifiers, and so if versions are named at the user level we can make associative queries based on names. Version details are based on the application needs, but there may be fundamental things like a predecessor/successor relationship important in all versioning applications.

Killian (DEC): There should be a primitive set of tools for generating versioning systems and multiple types of versioning systems. I think one of the primitives will be to copy an object without necessarily having duplicate pages for every byte in the object. We may see, in time, object bases tuned for applications like CAD, because the object granularity is grossly different from business systems.

Wells (TI): Stan Zdonik said it is necessary for old programs to access new objects and for new programs to access old objects. It seems like that very much restricts the changes to the interface that can be made.

Zdonik (Brown Univ.): But there has to exist some mapping between the old concept and the new. If such a thing doesn't exist, then we have had it. If it does, however, you want to have a framework for embedding the meaning of that method and exception handling.

Thatte (TI): MCC has published a taxonomy of schema modifications. Do you have any feedback from the users as to what the frequent schema modifications are?

Killian (DEC): Initially, during development, adding and deleting variables; then hierarchy changes.

Joseph (TI): In our experience, the most frequent change is adding an instance slot, followed by renaming an inherited slot.

Thatte (TI): Can we use this information to optimize the schema evolution implementation?

Killian (DEC): Yes, especially in a compiler-based system.

Wells (TI): Does the change manager have to feed something to an object manager to do object materialization?

Joseph (TI): Sometimes; the object encapsulation in the database should carry this information so that the change manager is called when necessary, for example, if the object is kept in a delta form. The invocation engine needs to be able to handle it; the change manager will appear as an exception handler to this engine.

Wells (TI): We are discovering that, even with minimal version management, the version manager has to feed information to the translation mechanism. Aren't we getting into trouble on a global basis?

Killian (DEC): It depends how closely coupled your compiler and environment are with your database. I think we take the viewpoint that the environment and the compiler work very closely with the database. The compiler has to have the knowledge, not so much the database.

Bjornerstedt (Univ. of Stockholm): Our system, AVANCE, uses version management at a low level. So we can support mutable versions and efficient type evolution.

Franklin (MCC): In schema evolution, there are at least two modes of development. One is the initial stages where things are in flux, and the other is after things have settled down. Do these modes warrant different approaches?

Killian (DEC): You're right about the two phases. In our limited experience, a single approach (compilation) suffices in both phases.

Gerson (Xerox PARC): When we version schema as well as object instances, how do we go back to an instance which was of a type that has since evolved?

Joseph (TI): In our system, when an object instance is converted to an evolved type, the original instance is not destroyed. We make a new instance, which is of the new type, and have access to both of them pointing to the correct versions of the schema.

Speyer (Apollo): Is there some way the system can help with merging, or is it all up to the user?

Panelists: It is really up to the user to tell the database what is to be done or teach it how to do it. The user has to define the semantics of merge. In general, there is no one uniform semantic merge.

Langer (Object Design): The merging problem is even harder than that. One needs to worry about, e.g., in programs, semantic dependencies which may be created, altered, or destroyed by merging.

5 Panel on Query Processing

Panelists: Patrice Anota (Graphael), David Beech (HP), Jacob Stein (Servio Logic), Robert Strong (Ontologic), and Craig Thompson (TI).

Issues:

• Object-Oriented Data Model: Will there be a "standard" object-oriented data model that can be shared from different host languages? If so, how can that data model be a seamless extension of the language? (There is a tension between seamlessness and a universal data model since different languages have their own data models.)

• Behavioral Queries: How will behavioral OODB queries be expressed in the query language?

• Whither SQL++: Should SQL be extended to support object/behavior queries? Since navigation is dominant in many OODB applications, will SQL++ support navigation?

• Query Optimization: What sort of optimizations will be possible in OODBs?

Patrice Anota (Graphael): Graphael's G-base uses a proprietary object-oriented data model, chosen to be simple, simpler than Lisp flavors, CLOS, or C++. Object types have properties (referred to as binary relations) and are arranged in a single-inheritance type hierarchy. Instances are referred to by instance ids; the value of a property of an instance is typically another instance id; instances are arranged in sets, or relations. Types can be instances of meta-types and, as instances, can receive messages, a property we term self-reflexive.

We see SQL as tied to the relational model and suggest a logical query language as the interface for an OODB. Our query language is G-Logis, a Prolog-like language. Filters are queries on the binary data model. For instance, the filter "What are the French companies?" looks like: (? (country (:= *x (obj company)) (obj nation nation.name ("France")))). Filters can be indexed. Inheritance saves the application programmer from inference steps over conventional Prolog. Reordering subgoals can be used (when no side effects like cut are present), but that can require using application semantics. Occurrence use counts can save memory when data is no longer needed.

David Beech (HP): A shared object model is desirable and would combine the good ideas of the database and programming language fields. We need more than persistence; we need queries when sets of objects grow large. We want a strong semantics for objects. But migration to a universal data model will be evolutionary; no internal mechanism will force the programming language community to converge on a common data model, but a shared OODB data model might. Seamless data models are the wrong path: that approach requires a different data model for sharing for each language and provides no way to share data across languages.

Certainly we want queries that use functions or behavioral operators in project or where clauses, e.g., Select name(p) from p in Person, c in Child where adores(p, c, 'San Diego'). Also, analogous to relational operators on the data dictionary, we would want to query OODB metadata (e.g., select type with method 'adores' would return Child ...). A key issue is: do we allow behavioral operators that have side effects?

IRIS's Object SQL is upwards compatible with SQL, permitting migration of existing SQL applications to it but permitting new applications to make use of the richer OODB. Proposals for SQL-3 are already being made to ANSI to introduce ADTs, sub-relations, and multi-column keys. Now is the time, and ANSI is the likely place, to move toward some sort of object-oriented SQL. We do not distinguish between a navigational and a set-oriented database interface.

Jacob Stein (Servio Logic): A standard object data model will not happen: CLOS, C++, and Smalltalk data models will not change to have the same data model. So an impedance mismatch is inevitable. We should expect to do translations between the language data model and the database data model. All queries are behavioral; an interesting subclass is structural. The difficult behavioral queries are those with side effects, where you get different results if you send a message twice in a row or where iterating over a collection in different orders gives different results.

SQL++ is inevitable; many variants will exist. If one can do arbitrary message sends, one can probably do recursive queries. Navigation paths can probably be extracted from an SQL query. Optimization applies to methods as well as structure. The query optimizer will have to dive into the methods; that means talking to the query optimizer. Take advantage of structure and type information. If typing doesn't determine clustering, then typing doesn't help the query optimizer; in our system we can cluster for a user-specified reason. Optimization over behavioral queries with side effects is hard, but it doesn't follow that we need distinct languages for queries and programs. For instance, one can use the iterator operator or denote selection blocks with different end-markers to use the optimizer.

Robert Strong (Ontologic): Following Codd, a data model is a set of data structures, operators on those structures, and constraints. Three distinct data model problems are: 1) what can we represent in the data model; 2) how do we provide a unified view for the application programmer who needs to access several different databases (this is the issue for standards like SQL++); 3) data exchange to move data from one environment to another. There is not much commonality in the way researchers attack these problems.

In order to "Sort chameleons from other lizards," we need an operator that puts chameleons on a green or brown background to check their color. A relational system cannot do this. But the problem now is avoiding updating the database while querying it in any query, since all queries are behavioral. Actually, behavior and state are indistinguishable behind a function interface. We implemented an SQL++, complete with functions to provide chained references, function calls, etc. embedded in the SQL. But we found that users want to operate on subgraphs and do graph matching, so query optimization now deals not only with the cardinality of graphs but also with their structure.

Craig Thompson (TI): A relational database monolithically provides 1) sets, set operations, and set optimization and 2) persistence and sharing for the set data type. If we start with C++, CLOS, and data types from other languages and make them persistent (like Wisconsin Exodus, TI Zeitgeist, or PS-Algol), we can provide seamless persistence with respect to a language without inventing yet another data model (type system). If we want sets (which we often do), let's build a query engine that can operate polymorphically on sets, transient or persistent, and hook it to different languages. Since we already need a mechanism to pass parameters between languages for remote procedure calls, let's use that standard type conversion machinery (Matchmaker, Mercury, XDR) to translate persistent types between different language data models when needed.

ADTs can have behavioral operators ("Calculate a netlist" or "Fire an employee") independent of query operations on sets of ADTs: that is, we can decouple the set-oriented query language from the data model. Issues relevant for behavioral operators are: 1) do nested operations require nested transactions? 2) are schemas, methods, and ultimately programs stored in the database? and 3) can I grant a method to others? When sets are added, other issues emerge: 1) can I support queries on transient sets (the converse of seamlessness)? 2) should the user be prevented from using behavioral operators that can have side effects? and 3) how can the user be so prevented?

Different groups want different extensions to SQL: schema evolution, version management, temporal queries, cooperative response, triggers, incremental view updates, recursive queries, etc., as well as ADTs. We need an "open SQL" with more internal interfaces. If SQL++ = SQL + ADT, then we want ADTs in create statements and ADT operators in project and select clauses. We also need abstract indices and an extensible query optimizer (as in Wisconsin's Exodus, Postgres, or our Zeitgeist query module). Then there is the issue of how seamlessly we add the query engine to host languages.

There are several opportunities for optimization: the storage layer provides clustering and prefetching. Normal programming language optimization applies to behavioral operators. For sets, we can optimize in the presence of friendly (registered) operators; otherwise optimization is blocked, but we can still express the query. We want an open optimizer to support abstract indexes, semantic query optimization, cooperative response, incremental view update, incremental query reformulation, AND/OR parallelism, distributed queries, etc.
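A rough sketch of the polymorphic query-engine idea, under assumed names (none of this is the Zeitgeist or Exodus interface): the selection operator is written once against an iterable "set" and cares only that elements can be visited and tested, so the same code can run over a transient C++ container or over a collection handed back by a persistence layer with the same interface.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Illustrative sketch: a selection operator that works on any iterable "set",
// whether the elements live in memory or came from a persistent store.
// Names here are invented for the example; this is not any system's actual API.
template <typename Set, typename Predicate>
auto selectWhere(const Set& source, Predicate keep) {
    std::vector<typename Set::value_type> result;
    for (const auto& element : source)
        if (keep(element)) result.push_back(element);
    return result;
}

struct Part {                       // could be a transient C++ object or a persistent one
    std::string name;
    double weight;
};

int main() {
    std::vector<Part> parts = {{"bolt", 0.1}, {"panel", 4.2}, {"bracket", 1.3}};

    // The same engine could be handed a persistent collection with the same interface.
    auto heavy = selectWhere(parts, [](const Part& p) { return p.weight > 1.0; });
    for (const auto& p : heavy) std::printf("%s\n", p.name.c_str());
}
```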

Question/Answer Period - Panel on Query Processing

Audience: How would you use a standard relational model with ADTs? Do you mean to translate ADTs into an off-the-shelf model?

Thompson (TI): That is one approach. In Zeitgeist, our query module supports select-project-join operators but accesses tables through iterator and accessor functions. So if we make the data structures look like a table by defining appropriate accessors, they can be queried. Both transient and persistent data can be queried using this approach.
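One way to picture "defining appropriate accessors": the query module never touches the objects directly, only a row iterator and named column accessors, so an ordinary in-memory structure can be queried with the same select-project machinery. The adapter below is a hypothetical illustration, not the actual Zeitgeist query-module interface.

```cpp
#include <cstdio>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical adapter: anything that can supply rows and named column accessors
// can be treated as a table by a select-project engine.
template <typename Row>
struct TableView {
    std::function<std::vector<Row>()> rows;                                 // row iterator
    std::map<std::string, std::function<std::string(const Row&)>> column;   // column accessors
};

template <typename Row>
void selectProject(const TableView<Row>& table,
                   const std::function<bool(const Row&)>& where,
                   const std::vector<std::string>& projected) {
    for (const Row& r : table.rows()) {
        if (!where(r)) continue;
        for (const auto& col : projected) std::printf("%s ", table.column.at(col)(r).c_str());
        std::printf("\n");
    }
}

struct Employee { std::string name; std::string dept; int salary; };

int main() {
    std::vector<Employee> emps = {{"lee", "cad", 51}, {"kim", "db", 64}};

    TableView<Employee> view;
    view.rows = [&emps] { return emps; };
    view.column["name"] = [](const Employee& e) { return e.name; };
    view.column["dept"] = [](const Employee& e) { return e.dept; };

    // "select name from emps where salary > 60", phrased against the accessors only.
    selectProject<Employee>(view, [](const Employee& e) { return e.salary > 60; }, {"name"});
}
```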

Thatte (TI): This is the reverse of Postgres: impose the relational engine on top of objects after an application is built using this approach.

Audience: C users are happy to get objects in C++. Are SQL users going to be similarly happy to get SQL++? Their predominantly business applications may be satisfied with SQL.

Thompson (TI): For our applications, like CAD, persistent data types and navigation via pointers are often enough; we don't need set-oriented queries. Our query engine is an add-in that supports operations on sets when they are needed. By requiring persistent arrays or other data types to be stored in persistent sets (as in Postgres or IRIS), the CAD application would have lots of one-row, one-column relations, each containing a persistent data type.

Strong (Ontologic): SQL++ is a marketing issue, not a technical issue.

Beech (HP): SQL users who don't need SQL++ features won't use them, but those with more complex modeling needs, who now must model their problem using an ER model and then map to a relational model by hand, would have a more powerful object-oriented model and not have to make the mapping. Users would get the benefit of a migration path and also a richer data model.

Strong (Ontologic): An example of that is chained references, which eliminate the use of joins.

Stein (Servio Logic): SQL could be an interface to the outside world and not be used in the OODB implementation. How does IRIS compute projects? Does it make a new type on the fly?

Beech (HP): ... as a parameterized tuple type, though the user doesn't have to name the type.

Strong (Ontologic): Following that up, if you project some fields, does the result inherit some of the behavior of the original type?

Beech (HP): No, it is not that type anymore. Projected attributes are new types.

Stein (Servio Logic): As an example, if I have a method M(a, b) on type T(a, b, c) and I project T(a, c), I can't inherit method M. This is not a property of SQL or SQL++.

Strong (Ontologic): The problem is, SQL only returns structural relations, not first-class objects with behavior.

Beech (HP): But the attributes are still objects with their own methods, and you could make the resulting relation into a type.

Audience: Those of you with an SQL++, what kinds of object-oriented characteristics did you lose?

Strong (Ontologic): We gave up the ability to treat the result of the query as objects.

Beech (HP): We just added a new language on top of our existing system and didn't have to give up anything from the implementation.

Thompson (TI): Vbase returns copies of data, not object ids, wanting to avoid giving the user a handle on database objects, right?

Strong (Ontologic): Stated differently, I would worry about update anomalies and object size.

Handsaker (Ontologic): An implementation question: how much implementation work is shared between SQL++ and someone's similar object-oriented query language? How much of the query processor and optimizer are reusable?

Thompson (TI): We need a parser for SQL++ that translates to our query processor. If we use a preprocessor that translates from SQL++ to SQL plus foreign function calls (by analogy with C++), then we give up a lot, like query optimization.

Handsaker (Ontologic): Would you do joins in your OODB?

Strong (Ontologic): Ideally, yes, you'd do joins, projects, selects, and navigation.

Thompson (TI): We are failing to separate things. Our "query" language needs to do joins on sets, if we have sets. We don't have sets in the DML if we provide persistent C++, unless we add sets to C++.

Beech (HP): We superimpose a query language on top of ADTs: for instance, create type person ... select for each person ... Now you aren't operating on tables, you are iterating over instances of a type.

Audience: What you need is a generalized set constructor and a first-order language, which could be a generalized framework from which SQL++ or Daplex could be built.

Subramanyam (DEC): Are you suggesting type conversions between persistent data types in different languages?

Thompson (TI): I am suggesting that we use the type conversion machinery we'd need anyway for remote procedure calls. Most C applications will work on persistent C data types; the same for other languages.

Thatte (TI): The MIT Mercury system (Common System) by Liskov and CMU's Matchmaker both implement this sort of inter-language communication.

Strong (Ontologic): I don't see how to do those conversions for different data models where inheritance, etc., is different.

Thompson (TI): Nor do I think the database system can. People who want to share data across languages will have to do the conversion work themselves.

Audience: You are going in two directions: Mercury, etc., are decoupling state to avoid sharing; they pass information via call by value for RPCs.

Audience: But researchers are using RPC call-by-reference schemes.

Thatte (TI): Certainly, we don't expect behavioral operators written in one language to operate on data stored native to another language.

Audience: What we, the database community, should provide is the basic underpinnings of a data model.

Audience: If you can say, "find all the objects that ...", then you need to get at all the objects.

Thompson (TI): Not everything is a set. You can do set queries to sets and other operations on other persistent objects.

Strong (Ontologic): That's why I suggest the graph mapping approach.

Beech (HP): Even with a very flexible graph matching, you still need to get to those instances. In IRIS, functions, or navigation, would give you a way to get to objects; not all items would have to be in sets.

Stein (Servio Logic): In Servio Logic, all queries are against collections, not against types. It is too inefficient to have all instances be in a collection.

Audience: If you don't know what the cardinality of a relation is, how do you take advantage of this for query optimization?

Stein (Servio Logic): But many systems in practice don't actually use cardinality information in query optimization.

Audience: How do you do query optimization when you don't know what the operators are?

Thompson (TI): Exodus does this. In 1984, Stonebraker suggested ways of registering ADTs and operators with the database. We do this in our query processor.
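The registration idea can be pictured as below: a "friendly" operator is declared to the query processor along with whatever the optimizer needs to reason about it (here, just a toy cost and selectivity). The registry, its fields, and the operator are invented for illustration; they are not Exodus's, Postgres's, or Zeitgeist's actual interfaces.

```cpp
#include <cstdio>
#include <functional>
#include <map>
#include <string>

// Invented sketch of operator registration: a "friendly" operator declares enough
// about itself for an optimizer to reason about it; unregistered operators can
// still be run, just not optimized.
struct RegisteredOperator {
    std::function<bool(double)> apply;   // the ADT operator itself (toy signature)
    double costPerCall;                  // optimizer hints supplied at registration
    double selectivity;
};

class OperatorRegistry {
public:
    void registerOp(const std::string& name, RegisteredOperator op) { ops_[name] = std::move(op); }
    const RegisteredOperator* lookup(const std::string& name) const {
        auto it = ops_.find(name);
        return it == ops_.end() ? nullptr : &it->second;
    }
private:
    std::map<std::string, RegisteredOperator> ops_;
};

int main() {
    OperatorRegistry registry;
    registry.registerOp("inside_region", {[](double x) { return x > 0.5; }, /*cost*/ 2.0, /*sel*/ 0.1});

    if (const RegisteredOperator* op = registry.lookup("inside_region")) {
        // A real optimizer would use cost and selectivity to order predicates or pick indexes.
        std::printf("registered: cost=%.1f selectivity=%.2f, inside_region(0.7)=%d\n",
                    op->costPerCall, op->selectivity, static_cast<int>(op->apply(0.7)));
    }
}
```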

Beech (HP): Even unoptimized, users get the advantage of using the mechanism the database system provides. Even without optimization, you don't have to cross the database boundary a million times to evaluate your C program in the database.

Audience: It would be useful to introduce flat relations as one primitive type in your OODB.

Strong (Ontologic): Conceptually, we have taken the properties and operations of an object and viewed them as the fields of a tuple. You want to have an ADT called table. Now the problem is to get objects into the table. We flatten the object world for purposes of querying.

Audience: What I am saying is simpler. Just do query operations on table types; do not require all ADT instances to be in tables.

Thompson (TI): I agree. You will need all the select-project-join machinery anyway. If I define my accessors appropriately, I hope to use the same query engine on flat tables or on objects in an inheritance hierarchy.

6 Conclusion

The major differences between OODBs that were discussed in the Architecture panel concerned the data model and the object representations. With respect to the data model, two schools of thought emerged: persistent extensions of existing programming languages and new object-oriented languages designed specifically for OODBs. Movement of objects between external and computational forms is done differently by the systems represented: some systems translate between distinct representations while others use the same representation internally and externally. The former has a performance advantage during application execution while the latter has an advantage during data movement. It was not clear which of these models will dominate. A final difference concerned the use of existing storage mechanisms such as relational databases to store objects vs. the use of custom object servers.

The panel on Transactions for Cooperative Design Work agreed on the need for nested transactions as fundamental for OODBs. Concurrency is generally achieved by locking. There was little agreement about what is needed to support cooperative design work, although there were intriguing ideas related to the use of triggers, lightweight nested transactions, and goal-based cooperation. Little has actually been implemented yet. The question of specifying objects by object ID or by key was raised. It was generally agreed that OODBs should support both forms of reference and that object identity is fundamental to the concept of objects. The issue of the effect of database constructs, such as transaction abort, on transient state was raised. This issue will be resolved only by a joint programming language/database approach and is an impediment to seamlessness with respect to existing programming languages.

The panel on Schema Evolution and Version Management addressed the topic of change management from the points of view of database design, implementation, and use. Change management includes the management of the evolution of database objects and the database schema. Schema evolution is a concern in object-oriented systems because the dynamic nature of typical OODB applications calls for frequent changes in the schema. Most commercial and research systems recognize the need for version management; full-fledged implementations of versioning do not exist in any system. A majority of the panel members felt that change management needs to be thought of as a layer on top of the OODB, with the OODB providing low-level support for primitive, linear versioning. The panelists agreed that schema evolution needs to be handled using a version control system. A concern was the performance penalty incurred in the presence of extensive schema modifications.

The panel on Query Processing generally agreed that OODBs should provide set-oriented query capabilities. Some systems equate sets and classes; others allow persistent objects that cannot be queried with set machinery but are accessible through navigation. Panelists varied on the desirability of seamlessness; object-oriented extensions to SQL provide a rich common data model but do not solve the impedance mismatch between a host language and an embedded DDL; persistent languages solve this problem but require the database to support several program data models. Seamlessness with respect to a query capability implies that programming languages that do not support sets need to be extended. All the panelists described systems that permit methods on classes; however, systems differ on whether users can redefine accessor methods, whether methods can contain arbitrary code, and how to optimize user-defined methods.

The format of the workshop as four panels helped the participants engage in focused discussion on a variety of topics. There are common ideas and concepts in the systems discussed in the workshop; however, details of design and implementation differ from one system to the other. Fundamental concepts like object identity and evolution of types are still in a fuzzy state. Seamlessness was an issue of much interest. The discussions in the panel sessions indicate that there will be a lot of activity in the areas of query languages and data sharing between OODBs and conventional databases. Finally, the OODB workshop is an excellent forum for exchange of the latest ideas in OODBs, and we would like to see this forum continue in future years.

OBJECT-ORIENTED DATABASE WORKSHOP

OOPSLA '88 Conference on Object-Oriented Programming: Systems, Languages, and Applications

26 September 1988, San Diego, California, U.S.A.

Workshop Chairman: Satish Thatte (Texas Instruments)

Architecture: 8:30 AM - 10:00 AM

Patrice Anota (Graphael), Craig Damon (Ontologic), Patrick O'Brien (DEC), Tom Ryan (HP), Jacob Stein (Servio Logic), David Wells (TI)

1) OODB subsystems: What are the major subsystems of an OODB, and how are they interconnected or layered? Candidate subsystems are: disk-resident object management, in-memory object management, an object translation service which translates objects between in-memory representation and disk-resident representation, transaction manager, version manager, and query processor.

2) Seamlessness: Is it possible to develop an object-oriented data model that is a seamless extension of the host object-oriented programming language? If so, how, and what are the advantages? A seamless object-oriented data model should be contrasted with object-oriented data models that have different type systems from the type system of the object-oriented programming language.

3) Performance issues: What is a good performance metric for OODBs (navigational, i.e., pointer-chasing speed, or query throughput, or query response time)? What are the major performance bottlenecks in OODB systems, and what are potential solutions? What is the potential of using parallel processors and massive amounts of main memory to improve OODB performance?

4) Connectivity issues: Are "stand-alone" OODBs really viable? And if not, how should an OODB be connected to conventional databases?

Transactions for cooperative design work: 10:30 AM - 12:00 noon

Bob Handsaker (Ontologic), Eliot Moss (Univ. of Massachusetts), Tore Risch (HP), Craig Schaffert (DEC), Jacob Stein (Servio Logic), David Wells (TI)

1) Transaction metaphor for cooperative design work: What is the role of long transactions (sagas), nested transactions, and object versions in cooperative design work? An important issue to be considered is: Should long transactions and object versions be fundamental concepts supported by the database or layered on top of primitives supplied by the database? One side of the argument states that different design application areas (such as CASE, CAD, AI, etc.) have different needs in these areas, and so the database should not dictate policy on these issues; rather, it should provide conventional transaction mechanisms and primitives to support versioning, and have a layer developed above the database that sets the policy for the particular application area.

2) Inter-transaction communication: Should the transaction model provide facilities for inter-transaction communication for cooperative design work? What are good abstractions for inter-transaction communication (e.g., semaphores, message queues)? What are the appropriate building blocks to facilitate construction of new types of inter-transaction communication objects?

3) Concurrency control: What is your opinion or experience on various concurrency control schemes, such as pessimistic vs. optimistic? What should be the granularity for locking? Individual objects? Object clusters?

4) Experiences with type-specific concurrency control: What is your opinion or experience with user-defined atomic types? How can they be used to increase the amount of concurrency?

Schema evolution and version management: 1:30 PM - 3:00 PM

Lougie Anderson (Servio Logic), Bob Handsaker (Ontologic), John Joseph (TI), Mike Killian (DEC), Brom Mahbod (HP), Stan Zdonik (Brown University)

1) Architectural issues: Is version and configuration management capability a distinct layer in your OODB? Can schema evolution and version management be decoupled from the object-oriented data model?

2) Primitives for schema evolution and version management: What are the primitives for schema evolution and object versions? Candidate primitives are: versions, configurations, transformations, validation, baseline, contexts, branching, merging.

3) Schema evolution: Proposals for schema evolution promise to maintain structural invariants. Should user-defined side effects also be supported and, if so, how? How is version/configuration management related to schema evolution? In actual practice, is schema evolution useful when there are massive rearrangements in a design?

4) Compile-time vs. run-time trade-offs: What tradeoffs have to be made with respect to object representation and generated code for methods to support schema evolution (pay-me-now or pay-me-later kinds of issues)? For example, you can do a lot of work at compile time to avoid any run-time performance hits, but this may be very cumbersome for supporting schema evolution.

Query processing: 3:30 PM - 5:00 PM

Patrice Anota (Graphael), David Beech (HP), Jacob Stein (Servio Logic), Bob Strong (Ontologic), Craig Thompson (TI)

1) Object-oriented data model: Much of the value provided by SQL as a query language standard in the relational database world is possible only because there is a relational data model which is (nearly) universally understood. What are the characteristics of an object-oriented data model which can provide the same commonality among OODBs that SQL provides among relational databases? Will there be a "standard" object-oriented data model that can be shared from many host programming languages? What are the issues involved in defining such a standard? If such an approach is not realistic, how can information captured in an OODB be shared among applications written in different languages?

2) Behavioral queries: Since OODBs, unlike conventional databases, support behavior as well as state for the objects represented, what is the place for the behavioral query and how might it be expressed?

3) Whither SQL++?: Most applications of object-oriented systems seem to develop a navigational flavor, rather than a set-oriented flavor. What sort of query language would best support this characteristic? Is it conceivable and useful to extend SQL to develop an "SQL++" for object-oriented queries?

4) Query optimization: Notions of TYPE (as opposed to CLASS or SET) are sometimes used in object systems to allow the optimization of applications which use the database. Are these same notions sufficient to optimize object-oriented queries? What optimizations are most important?