Bulk Loading Techniques for Object Databases and an Application to Relational Data

Sihem Amer-Yahia, INRIA, BP 105, 78153 Le Chesnay, France
Sophie Cluet, INRIA, BP 105, 78153 Le Chesnay, France
Claude Delobel, University of Paris XI, LRI, 91405 Orsay, France

Abstract

We present a framework for designing, in a declarative and flexible way, efficient migration programs and an undergoing implementation of a migration tool called RelOO whose targets are any ODBC compliant system on the relational side and the O2 system on the object side. The framework consists of (i) a declarative language to specify database transformations from relations to objects, but also physical properties on the object database (clustering and sorting) and (ii) an algebra-based program rewriting technique which optimizes the migration processing time while taking into account physical properties and transaction decomposition. To achieve this, we introduce equivalences of a new kind that consider side-effects on the object database.

1 Introduction

The efficient migration of bulk data into object databases is a problem frequently encountered in practice. This occurs when applications are moved from relational to object systems, but more often, in applications relying on external data that has to be refreshed regularly. Indeed, one finds more and more object replications of data that are used by satellite Java or C++ persistent applications. For example, we know that the O2 database system from O2Technology [BDK92] is used in that manner on industrial databases supporting several gigabytes of data.

The migration problem turns out to be extremely complicated. It is not rare to find migration programs requiring hours and even days to be processed. Furthermore, efficiency is not the only aspect of the problem. As we will see, flexibility in terms of physical organization and decomposition of the migration process is at least as important. In this paper, we propose a solution to the data migration problem providing both flexibility and efficiency. The main originality of this work is an optimization technique that covers the relational-object case but is general enough to support other migration problems such as object-object, ASCII-object or ASCII-relational. In order to validate our approach, we focus on the relational to object case.

Relational and object database vendors currently provide a large set of connectivity tools and some of them have load facilities (e.g., ObjectStore DB Connect, Oracle, GemConnect, ROBIN [Exe97], Persistence [INC93], OpenODB [Pac91]). However, these tools are either not flexible, or inefficient, or too low-level. So far, the scientific community has mainly focused on modeling issues and has provided sound solutions to the problem of automatically generating object schemas from relational ones (e.g. [Leb93, FV95, Hai95, MGG95, JSZ96, DA97, RH97, Fon97, BGD97, IUT98]). Our approach is complementary and can be used as an efficient support for these previous proposals. In a manner similar to [WN95], we focus on the problem of incremental loading of large amounts of data into an object database taking into account transaction decomposition. Additionally, we propose optimization techniques and the means to support appropriate physical organization of the object database.

One major difficulty when designing a tool is to come up with the appropriate practical requirements. Obviously, every tool should provide efficiency and user-friendliness.

In order to understand what more was required from a relational-object migration system, we interviewed people from O2Technology. This led us to the conclusion that flexibility was the most important aspect of the problem. By flexibility we mean that users should have the possibility to organize the output object database according to the needs of the applications that will be supported on it. For instance, they should be able to specify some particular clustering of objects on disk. Also, users should be able to slice the migration process into pieces that are run whenever and for as long as appropriate, so that (i) relational and object systems are not blocked for long periods of time or at a time that is inappropriate (i.e., when there are other important jobs to perform) and (ii) a crash during the migration process entails minimal recovery.

In this paper, we present (i) a framework for designing, in a declarative and flexible way, efficient migration programs and (ii) an undergoing implementation of a migration tool called RelOO. The language proposed allows capturing most automatically generated mappings as proposed by, e.g., [Leb93, FV95, Hai95, MGG95, JSZ96, DA97, RH97, Fon97, BGD97, IUT98]. An important characteristic of the optimization technique is that it is not restricted to the "relations towards objects" case. It can be used to support dumping and/or reloading of any database or other kinds of migration processes (objects to objects, ASCII to objects, etc.). The RelOO system we present here is an implementation of this general framework with a given search strategy, with any ODBC compliant system on the relational side and the O2 system on the object side.

In this paper, we assume that there are no updates to the relational database during the migration process. This is a strong limitation but we believe this is an orthogonal issue that does not invalidate our approach. The declarativity of the migration specification language we propose should allow the use of the systems' log mechanism to take updates into account.

The paper is organized as follows. Section 2 explains the migration process. Section 3 introduces our language. In Section 4, we present the algebraic framework, investigate various migration parameters and give some examples of appropriate program rewriting. In Section 5, we give a short overview of the RelOO system optimizer and code generator. Section 6 concludes this paper.

2 The Migration Process

Figure 1 illustrates our view of the migration process. The user specifies the process by means of a program containing two parts that concern (1) the logical transformations from relational to object schemas and bases, and (2) the physical constraints that should be achieved. The translation specification is given to a first module that generates (3) the object schema and (4) a default algebraic representation of the migration process. (5) Optimization based on rewriting follows, taking into account the user physical constraints. (6) Then, migration programs are generated that can be parameterized by load size or processing time. (7) To achieve the migration, the user (or a batch program) launches these programs with the appropriate parameters whenever the system is available (i.e., not busy with other important processing).

Figure 1: The Migration Process

In this paper, we describe the RelOO language to specify logical translations and physical constraints, the rewriting technique, and the optimizer and code generator we implemented. We do not describe the object schema construction, which is rather obvious.

3 The Specification Language

The RelOO language is simple and declarative. It is used to express logical database transformations as well as physical database organization. The complete syntax of this language is given in [AY98].

3.1 Logical Database Transformation

Database schema transformations from relational to object systems have been largely addressed. Notably, semantical relational properties such as normal forms, keys, foreign keys, existing relationships and inclusion dependencies have been used to automatically generate a faithful object image of the relational schema. We do not address this issue here but provide a simple language that can be used to implement these approaches and that supports efficient optimization strategies. This language is defined in the same spirit as existing object views languages (e.g. [AB91, SLT91, Ber91, Run92, DdST95]). Classes (generated from one or several relations), simple and complex attributes, class hierarchies, aggregation links and ODMG-like relationships [ABD+94] can be created from any relational database. Because we focus on data migration, we illustrate only some of the language features using a simple example. Part of this example will be used in the sequel to discuss the efficiency and flexibility issues that constitute our core contribution:

    r-employee[emp:integer, lname:string, fname:string, town:string, function:string, salary:real]
    r-project[name:string, topic:string, address:string, boss:integer]
    r-emp-prj[project:string, employee:integer]

Relations r-employee and r-project provide information on, respectively, employees and projects in a research institute. In our example, they are used to construct three object classes (Employee, Researcher and Project). The underlined attributes (emp, name, and the pair project/employee) are the keys of each relation. These are required to maintain existing links between tuples in the relational database and their corresponding created objects. This is needed to be able to maintain the consistency between relational and object databases (a problem that we do not address here) but also to compute cyclic references in the object base. Relation r-emp-prj captures the many-many relationship existing between employees and projects.

Let us now consider an example of database transformations as given in the following RelOO translation specification:

    create class Employee
      from relations r-employee
      with attributes name:string is lname + fname,
                      salary:real is salary
      extent Employees
      where function ≠ 'Researcher' and town = 'Rocquencourt'

    create class Project
      from relations r-project
      with attributes name:string is name
      extent Projects
      where address = 'Rocquencourt'

    create class Researcher
      from relations r-employee inherits Employee
      extent Researchers
      where function = 'Researcher' and town = 'Rocquencourt'

    create link Project, Employee
      from relations r-employee, r-project
      with attributes manages:Project in class Employee
      where boss = emp

    create link Project, Researcher
      from relations r-emp-prj, r-employee, r-project
      with attributes staff:set(Researcher) in class Project,
                      projects:set(Project) in class Researcher
      where employee = emp and project = name

The create class statements create classes whose attributes are derived from relational ones by applying simple functions (identity, string concatenation, arithmetic functions, ...). Note that the relation r-employee has been partitioned (where clause) to form two classes (Employee and Researcher) and that only employees and projects located in "Rocquencourt" will be migrated. Class Researcher inherits from class Employee. This means that the former inherits the attribute specification of the latter (since there is no redefinition). However, their populations are specified in a disjoint manner through their respective where clauses. Finally, note that we give a name to class extents. This is required in order to support persistence by reachability. Unless specified otherwise, the extent of a class does not contain the objects of its subclasses. For instance, the name Employees will be related to all employees who live in "Rocquencourt" and are not researchers (function ≠ 'Researcher').

The two create link statements define, respectively, a reference link manages and an ODMG relationship (an attribute and its inverse) staff/projects. The first one links each employee in class Employee to the projects he/she manages (if he/she is a manager). Since researchers are employees, they inherit this link. The second one relates projects and researchers and implies a filtering of the relation r-emp-prj so as to reject employees who are not researchers and projects which are not located in "Rocquencourt". This will be illustrated in the sequel. A create link statement contains three clauses that specify, respectively, the relations involved, the object attributes that materialize the link in each class, and the join condition on the given relations. Note that there is no condition on the existence of foreign keys and inclusion dependencies. The user can construct any link he wants by giving an appropriate condition in the where clause. In the example, the aggregated attributes are of type set. Other collection types (e.g. list, bag) can also be used.

When the translation specification is processed, a schema definition is generated and compiled in the target object system. We give below the class definitions generated by the prototype we implemented on top of O2. Relationships are materialized by two apparently disconnected attributes that enable the use of logical connections between the related objects. Creation and update methods guarantee the referential integrity of the relationship. However, relying on these methods at migration time is not always a good solution in terms of efficiency since (i) method calls are expensive and (ii) they potentially involve random accesses to the store. We will see that our optimization technique proposes various ways to process relationships.

    class Employee type
      tuple(name: string,
            salary: real,
            manages: Project)

    class Researcher inherits Employee
      public type tuple(projects: set(Project))

    class Project type
      tuple(name: string,
            staff: set(Researcher))

    Employees: set(Employee);
    Researchers: set(Researcher);
    Projects: set(Project);
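
The role of the generated creation and update methods can be illustrated with a small sketch. The prototype emits O2 class definitions, so the Java classes and method names below are purely illustrative; they only show how an update method on one side of the staff/projects relationship can also maintain the inverse attribute, which is exactly what makes method-based maintenance safe but potentially expensive at migration time.

    import java.util.HashSet;
    import java.util.Set;

    // Illustrative Java counterparts of the generated Project/Researcher classes.
    class Researcher {
        String name;
        final Set<Project> projects = new HashSet<>();

        Researcher(String name) { this.name = name; }
    }

    class Project {
        String name;
        final Set<Researcher> staff = new HashSet<>();

        Project(String name) { this.name = name; }

        // Adding a researcher to the staff also registers the inverse link, so the
        // two "apparently disconnected" attributes stay mutually consistent.
        void addToStaff(Researcher r) {
            if (staff.add(r)) {
                r.projects.add(this);
            }
        }
    }

    public class RelationshipDemo {
        public static void main(String[] args) {
            Project verso = new Project("Verso");
            Researcher alice = new Researcher("Alice");
            verso.addToStaff(alice);
            System.out.println(verso.staff.contains(alice));    // true
            System.out.println(alice.projects.contains(verso)); // true
        }
    }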

3.2 Physical Database Organization

The object database could be organized physically after migration time. However, this would require very long processing. It is thus vital to integrate physical requirements in the actual migration. We consider two physical properties: clustering and ordering.

Clustering techniques are provided by most object database systems, with different approaches, and have been studied intensively (e.g., [BDK92, TN92]). The goal of a cluster is to minimize page faults by placing objects that are often accessed together as close as possible on disk. In RelOO, we do not assume any particular system clustering policy. We provide a primitive to specify clusters and a rewriting mechanism that takes this specification into account. The main idea is that the object system is in a better position to efficiently create a cluster if it is given all cluster-related objects at the same time. This is indeed true for our target system O2, but also for, e.g., ObjectStore.

The ODMG model (and most object systems) features list and array constructors that are used to access objects according to some logical order. A list (or array) of objects is usually represented on disk as a list of object identifiers. If the order of objects on disk is different from the list logical order, one risks random and expensive access to the secondary storage. Again, we do not assume any particular system implementation but rely on the fact that it is easier to place objects on disk in a given order if the migration input respects this order. As a matter of fact, our implementation on top of the O2 system guarantees a correct ordering.

Below are two RelOO statements. One specifies that only "Network" projects should be clustered with their staff attribute (not the related researcher objects, just the staff set of object identifiers). Deeper clusters can be specified. It is the case for the other cluster statement, in which projects should be ordered by name (the order is ascending by default) and each Project object should be clustered in a file with its Researcher objects (via the staff attribute):

    class Project if topic = 'Network' cluster on staff
    class Project order by name cluster on staff(Researcher)

Object sharing makes it impossible to guarantee that all database researchers will be clustered with their projects. A cluster is just an "indication" of what should be achieved. However, a good cluster specification may seriously improve the performance of object applications. The order in which cluster specifications are given is important.

Two specifications may be contradictory. This is obviously the case for the two above ones. Other contradictions are less easy to detect: e.g., specifying an order on researchers independently from the projects to which they belong may be inconsistent with the Project cluster on staff(Researcher), since it entails that researchers are created according to their membership to a project. The RelOO compiler checks the consistency of the given specifications using techniques similar to those found in [BDK92].

4 Rewriting Migration Programs

Given a translation specification, a first module generates an object schema and a default algebraic expression representing the migration process. This expression is given to the optimization module, which uses physical properties for further processing. The optimization process relies on rewriting based on algebraic equivalences. The rewriting is performed according to a search strategy and a cost model, both of which will be discussed in Section 5. In the current section, we present the algebra and the rewriting possibilities it offers.

Algebras are commonly used to optimize queries. However, queries do not have side-effects, whereas we obviously are in a context with many of them (since the goal is the creation of an object database). Our choice of an algebraic tool can thus be argued. It was motivated by the following facts: (1) the side-effect operations are not arbitrary and are always applied on the result of queries; (2) the potential for optimization relies on the rewriting of these queries. Thus, we propose a representation of the migration program mainly composed of standard algebraic operations. An operator named Map is added to apply database update operations on each element of its input set.

We now present the default program representation, explain how side-effects can be taken into account by the rewriting process, investigate the parameters that should be considered by the optimizer, and illustrate the rewriting possibilities using some examples.

4.1 Default Algebraic Representation

The algebra we use is that of [CM94]. It is well adapted to the problem at hand since it manipulates sets of tuples, the attributes of which can be atoms, objects or complex values. Thus, it captures both the object and relational models. The algebraic operators are, for the most part, standard. The main difference with the relational algebra is that attributes may be complex, thus sometimes involving the use of nested algebraic expressions. Note that this algebra also fits nested relations or complex objects. It can also be used on ASCII data files as long as there exists a mapping between this data and some database representation (relational or other). Thus, the framework we propose can indeed cover most data migration cases.

The default representation corresponds to a "naive" evaluation of data migration. (We will see in the sequel that a naive algorithm can be appropriate in some cases.) It consists of a set of algebraic expressions (also called blocks) that correspond to the translation program statements. Some of the blocks create objects, others evaluate complex attributes and relationships. The generated blocks can be executed in any order. As we will see, this is possible because of the special object creation primitive we use, which (i) keeps trace of the created objects and (ii) guarantees that we do not create two objects corresponding to the same relational key. However, to achieve rewriting, a default order (the one in which the translation statements are given) is chosen.
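
That creation primitive can be sketched as follows. This is not the actual RelOO code, only a minimal Java illustration of the behaviour described above, under the assumption of one correspondence table per (class, source relation) pair keyed on the relational key: a second invocation with an already-seen key returns the existing object instead of creating a duplicate.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Supplier;

    // Sketch of the obj() creation primitive backed by a correspondence table.
    class CorrespondenceTable<K, O> {
        private final Map<K, O> keyToObject = new HashMap<>();

        // Returns the object already associated with the key, or creates it.
        O obj(K relationalKey, Supplier<O> creator) {
            return keyToObject.computeIfAbsent(relationalKey, k -> creator.get());
        }

        int size() { return keyToObject.size(); }
    }

    public class ObjDemo {
        public static void main(String[] args) {
            CorrespondenceTable<Integer, Object> researchers = new CorrespondenceTable<>();
            Object first = researchers.obj(42, Object::new);
            Object again = researchers.obj(42, Object::new);
            System.out.println(first == again);      // true: no duplicate creation
            System.out.println(researchers.size());  // 1
        }
    }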

Let us now consider a subpart of the translation program given in Section 3 that creates two classes (Researcher and Project) and one relationship staff/projects. Figure 2 shows two blocks B_Res and B_Prj that create, respectively, the Researcher and Project objects.

Figure 2: Researchers and Projects

Both blocks are similar. The upper part of B_Res features a selection operation corresponding to the where clause of the class Researcher definition. Next comes a projection on the relational attributes that are involved in the transformation. Then, a Map operation creates one object for each input tuple and memorizes the newly created objects (attribute O). This Map uses the obj function, whose input is a relational key. In our example, this key is composed of a single attribute, emp; more can be used. The obj function has two effects. First, it maintains the correspondence between relational and object entities in a table. There is one table per couple class/relation, each containing relational keys and object identifiers. In our implementation, these tables are indexed on relational keys. Correspondence tables can grow considerably and require good storage and access policies, as found in [WN95]. We will see that we may avoid maintaining them in some cases by applying some rewriting techniques. Second, it returns an object identifier that is either created (when a key value is first encountered) or retrieved from the correspondence table (we will see some examples of this later on).

Finally, two Map operations are used to update the attributes name and salary of the Researcher objects. (For an easier presentation, we simplified the way object creation and updates are performed. For instance, object creation requires the pair class/relation involved, i.e., we write obj(emp) instead of obj_Researcher,r-employee(emp).)

Let us now consider a relationship specification, e.g. the staff/projects relationship. For each direction of a relationship, a block is created (see Figure 3).

Figure 3: The staff/projects Relationship

Let us consider B_staff. Two join operations filter the r-emp-prj relation. This avoids creating relationships between objects that should not be in the database (i.e., projects or employees that are not part of the "Rocquencourt" site) or that should not be related (i.e., projects and non-researcher employees). A projection follows to remove irrelevant attributes. Then, researchers are grouped according to their membership to a project. This is done using the Group operator. This operator has three parameters: the name of the attribute that will denote the grouped elements (g), the name(s) of the grouping attribute (name) and the attribute(s) on which each group will be projected (emp). Therefore, a set g of related emp values is associated to each distinct name value. The last operation is a Map that assigns to the staff attribute of each Project object (obj(name) retrieves the appropriate object identifier from the (Project, r-project) correspondence table) its value, a set of Researcher objects. This value is computed by applying the obj(emp) operation to each element of the set g (embedded Map operation).
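
The Group operator used in B_staff can be emulated in a few lines of Java. In the sketch below, the tuple and attribute names follow the running example and everything else is illustrative: the filtered r-emp-prj tuples are grouped on name and each group is projected on emp, producing the sets g that the final Map turns into sets of Researcher objects.

    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    // Sketch of the Group operator with its three parameters: grouped elements g,
    // grouping attribute name, projected attribute emp.
    public class GroupDemo {
        record EmpPrj(String name, int emp) {}   // filtered r-emp-prj tuple

        static Map<String, Set<Integer>> group(List<EmpPrj> tuples) {
            return tuples.stream().collect(Collectors.groupingBy(
                    EmpPrj::name,
                    Collectors.mapping(EmpPrj::emp, Collectors.toSet())));
        }

        public static void main(String[] args) {
            List<EmpPrj> input = List.of(
                    new EmpPrj("Verso", 42), new EmpPrj("Verso", 43), new EmpPrj("Rodin", 42));
            // One set g per distinct project name; each g is later mapped through obj(emp).
            System.out.println(group(input));
        }
    }
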
4.2 Taking Side-Effects Into Account

In this section, we introduce some concepts that will allow considering optimization of the migration beyond block boundaries. So far, we have presented the default migration representation as a set of distinct algebraic blocks. In that context, it is difficult to detect optimizations that would spread across two blocks, e.g., that would replace the two blocks B_Prj and B_staff by a single one both creating projects and assigning their staff attribute. To do that, we need to explicitly provide semantic connections between the various blocks. Obviously, the connection we are interested in is not the standard one through algebraic data flow, i.e., the input of one operation is the output of the previous one. There are two reasons for that: (i) we may want to discard previous results (e.g., we do not need the researchers data flow in order to create the projects) and, more importantly, (ii) rather than proving equivalences involving the data flow, we are interested in maintaining the logical database state (i.e., the object database resulting from a sequence of operations).

Thus, during the optimization process, an essential parameter is the database state that links together both algebraic operations and blocks. We call this parameter DS (for Database State). Now, each algebraic operation is considered with respect to the standard data flow and the DS parameter (see [AYCD97] for more details on the effect of each algebraic operation on DS). This is detailed for block B_Res on the left part of Figure 4, which shows how the DS parameter is altered by each operation. (On figures, we use a thin line to represent the standard data flow and a dashed line for DS.) Note that only Map operations have side effects and thus modify DS. DS is represented by a table where an entry is defined for a pair class/source relation. An entry in the table has two fields: pred and atts. The former is a predicate representing the selected tuples in the relation that are used to create the objects in the class. The latter represents the object attributes that have been assigned, along with their computed value. An empty DS, represented by the symbol ∅ (the object database has not been populated yet), is assigned to block B_Res and is not modified by the three first operations. After the Map_O:obj(emp) operation, DS has a new entry whose pred field indicates the objects that have been created (one per tuple of the r-employee relation satisfying the predicate). The second and third Map operations then update the atts field.

Figure 4: Adding the DS Parameter

The DS of a block is the DS associated with the last operation in the block. At the termination of a block, the standard algebraic data flow is discarded while DS is passed to the next block, which will modify it further. This is illustrated in the right part of Figure 4, which shows the DS obtained after each block of the default algebraic representation. DS allows us to define a whole new class of equivalences, called DS-equivalences (denoted ≡_DS), that rely primarily on side-effects. To illustrate this, consider removing the projection operation π_emp,lname,fname,salary from block B_Res. This would result in a different algebraic data flow. However, it would not modify the object database state and should thus be considered as a correct rewriting. This is what DS-equivalences are for. We will see more involved examples of DS-equivalences, including rewriting examples that spread across several blocks, in Section 4.4. For the moment, just remark that they are less constraining than the usual algebraic ones (denoted ≡). Given two arbitrary algebraic expressions e1 and e2, we have e1 ≡ e2 implies e1 ≡_DS e2, but not the opposite. Some examples of DS-equivalences can be found in [AYCD97].
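
A possible concrete reading of DS is sketched below in Java. The field names pred and atts follow the text; the rest is an assumption about one way to represent them, not the RelOO data structure. Only the Map operations touch the state, and two rewritings can then be compared by comparing the DS they produce.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Objects;
    import java.util.Set;

    // Sketch of the DS (Database State) parameter: one entry per (class, source
    // relation) pair, with a predicate describing which tuples gave rise to objects
    // and the set of object attributes assigned so far.
    public class DatabaseState {
        record Entry(String pred, Set<String> atts) {}

        private final Map<String, Entry> entries = new HashMap<>();

        // Effect of Map_O:obj(key): records which tuples were turned into objects.
        void mapCreate(String classRelation, String pred) {
            entries.put(classRelation, new Entry(pred, new HashSet<>()));
        }

        // Effect of an attribute-updating Map: extends the atts field.
        void mapAssign(String classRelation, String attribute) {
            entries.get(classRelation).atts().add(attribute);
        }

        // Two rewritings are DS-equivalent when they lead to the same final DS,
        // whatever their intermediate algebraic data flows look like.
        boolean dsEquivalent(DatabaseState other) {
            return Objects.equals(entries, other.entries);
        }

        public static void main(String[] args) {
            DatabaseState ds = new DatabaseState();
            ds.mapCreate("Researcher/r-employee",
                    "function = 'Researcher' and town = 'Rocquencourt'");
            ds.mapAssign("Researcher/r-employee", "name");
            ds.mapAssign("Researcher/r-employee", "salary");
            System.out.println(ds.entries);
        }
    }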

4.3 Parameters of the Rewriting Process

Although the ultimate goal of the rewriting phase is optimization, some constraints that influence the communication or processing time have to be taken into account. These constraints are related to load balancing (e.g., the relational site should not be overloaded) and to the physical data organization specified by the user. We discuss here the parameters that will guide the rewriting phase, i.e., the constraints and the resources we want to optimize. We also study the default representation in view of these parameters. Next, we consider some rewriting examples offering alternative tradeoffs between constraints and optimization. Section 5 shows one possible implementation of an optimizer relying on these parameters.

4.3.1 Where should we process?

The relational/object migration process we consider consists of three phases. (i) A query is evaluated by the relational system and its result is sent to the migration component, (ii) possibly some non-database processing follows there, (iii) finally transactions are launched on the object system to create the database. This is illustrated on Figure 5. Note that an alternative solution would put the migration component at the relational site. From an optimization point of view, the only difference would be in communication costs.

Figure 5: The Three Processing Phases

Given this architecture, we can find different ways to decompose a migration process. For instance, a possible and rather natural decomposition of blocks B_staff and B_projects would be as follows: Map operations are computed by the object transactions, Group operations are performed locally (in the migration component) and all other operations are performed at the relational site. Alternatively, we could perform all operations but the Map(s) in the migration component, or migrate all relations into the object system and perform all processing there. There are of course many other alternatives. The choice of a solution depends on various factors: the availability of the two database systems, the cost of each operation in the different systems, the ability of the migration component to store and manage large amounts of data, the cost and reliability of communication.

The decision to favour processing on one of the components rather than another has direct consequences on the rewriting process and an important influence on the global migration cost. Indeed, if we decide to push as many operations as possible to the relational site, the rewriting should separate non-relational (e.g., grouping containing sets) and relational operations. Alternatively, performing all processing out of the relational system may increase the global migration cost since we will not be able to use relational optimization techniques, such as existing indexes. However, this strategy may sometimes be interesting since it minimizes the load of both database systems and shortens communications.

4.3.2 When should we process?

Relational and object systems may be available at some times and unavailable at others. Thus, it is important to be able to slice the process into pieces that are run whenever and for as long as appropriate. To achieve that, the generated transaction programs are parameterized according to (i) time or (ii) size (number of created objects). The user can then launch a program by giving a fixed time or size. There are mainly two ways to decompose a program. Either load all relations into the migration component, perform all processing there and then allow the user (or a program) to launch a sequence of object transactions. As was explained earlier, this strategy does not benefit from relational optimization techniques and may imply an important cost overhead. Or generate queries that load only the relevant part of the data for a given transaction (i.e. decompose a large query into smaller ones). The advantage of this solution is that, as opposed to the previous one, it is flexible in terms of load balancing: processing can be performed at any site(s). Of course, it also has a cost that must be taken into account by the optimization process: e.g., n small joins may be less efficient than a large one.

4.3.3 Physical database organization.

As was explained in Section 3, we want to be able to specify ordering and clustering criteria in order to store objects in a manner adequate to further processing. Let us consider the following cluster statement:

    class Project cluster on staff(Researcher)

In order to implement this cluster, the object system should at least be aware of the number of researchers in a project before creating it, so as to be able to reserve the appropriate space. Even better, having all information about projects and related researchers at the same time would allow storing objects through one sequential access to the corresponding database file.

Now, let us consider the default representation. It creates researchers, then projects, before materializing the staff/projects relationship. Obviously, the system is in no position to guarantee the appropriate physical property without re-organizing its storage. Thus, the expression should be rewritten so as to consist of a merge of blocks B_Prj and B_staff first, then of block B_Res, or, even better in terms of object secondary access, of a merge of all three blocks B_Res, B_Prj and B_staff into one (see Figure 6).

As another example, the following statement specifies that projects should be split into two parts in order to allow different strategies for network or non-network related projects:

    class Project if topic = 'Network' cluster on staff

Finally, specifying an order as in:

    class Project order by name

implies the addition of a Sort operation in block B_Prj.

We will see later on how merging, splitting and ordering can be achieved (Figures 6, 7 and 8). For the present, let us just comment on the influence of these operations on the global migration cost and, accordingly, on the rewriting process. Consider once again the possibility of merging all three blocks B_Res, B_Prj and B_staff, thus creating projects, researchers and the staff/projects relationship at the same time. Again, this can be done in the migration component, independently from the database systems, or we can rely on the relational system to perform a join between the three relations and provide all relevant information (i.e., all Project and Researcher attributes). Note that by requiring the relational system to compute this large join, we may reduce the processing cost (fewer iterations on source relations, potentially faster join processing). However, note also that this will probably increase (i) the global communication cost (the large join can be much bigger than the sum of two selections plus a small join on two key attributes), (ii) the cost of the grouping operation (which will have to be performed on a larger set), (iii) the cost of local buffering, as well as (iv) the cost of a transaction decomposition.

4.3.4 Optimization.

So far, we have been mainly concerned with migration constraints and have studied their influence on the migration process. Let us now consider optimization techniques.

The migration process involves (i) communications, (ii) local buffering, and (iii) relational, local and object processing, all of which should be optimized.

The easiest and most efficient way to reduce communication time is to perform all selections and projections at the relational site and only these operations (no joins). Then, the result is loaded into the migration component, which may perform all further algebraic processing or may rely on the object system to do it. In terms of rewriting, this has few implications: the only requirement is to keep selections and projections in the upper part of the program trees (i.e. not mix these operations with the others). However, this may affect the processing time considerably (since we will not be able to rely on existing relational indexes to perform joins) and requires potentially large local buffering. The search strategy and cost model should take this into account. For instance, according to application requirements, communication costs can have a more or less important weight in the cost model.

The migration component may have less disk space than the database systems. Thus, local buffering should be taken into account by the rewriting process, either as a weighted component of the cost model (as is the case for the RelOO prototype) or as part of the search strategy. The migration component is in charge of storing the object/relational correspondence tables as well as all temporary results. If we want to maintain the consistency between an updatable relational database and its object counterpart, all correspondence tables must be maintained. However, if this is not the case, there exist ways to eliminate some of these tables. An example will illustrate this later on (see Figure 6). Another way to reduce local buffering is to avoid local processing altogether, or at least to minimize it. As we will see, this is what we did in the RelOO prototype, where the migration component performs only grouping operations on "reasonable" amounts of data.

Some operations of the migration programs are purely object operations (database updates). Others can be performed by either the object system or the migration component (grouping involving sets). Finally, some can be performed by any of the three migration actors (e.g., selections). We briefly explain the main optimization techniques applying to these different kinds of operations.

- The only way one can optimize object operations during the rewriting phase is by taking object input/output into account. As we will see, some algebraic expressions involve random accesses to the object store (see the default representation) while others allow a sequential one (see Figure 6). By considering this in the cost model, we will be able to favour one or the other.

- Access to correspondence tables can and should be minimized. This can be done by merging blocks together, as will be illustrated in the sequel.

- If no clustering is required, grouping operations can be avoided altogether and replaced by iterative invocations of set insertions at the object site. This, of course, may be more costly in terms of input/output operations. But, as we will see, we may sometimes rely on a correct ordering of the objects on disk and in the correspondence tables to minimize this cost (see Figure 8).

- The algebra we propose supports rewriting rules as found in [Ull88, CM94], to which rules relying on DS-equivalences have been added. By using the first ones, we may rewrite all algebraic operations in the usual way and also use semantic information to reduce processing cost. For instance, if we know that all "Rocquencourt" researchers belong to a "Rocquencourt" project, we can remove the join with the r-project relation from both blocks B_staff and B_projects of the default representation. If the target systems support multi-threading or parallelism, DS-equivalences can also be used to highlight interesting parallelization strategies by organizing the algebraic blocks in an appropriate manner.

4.3.5 To Summarize...

Many factors must be taken into account by the rewriting process, some of which are contradictory. A migration process may be inefficient in terms of global cost (i.e., the sum of all processing costs) and still be considered highly interesting because it minimizes communication cost, or relational/object processing, or local storage maintenance, or is easily decomposable, or favours interesting clusters, etc.

For instance, let us consider again the default algebraic expression. Its main advantages are that it reduces communication cost and facilitates transaction decomposition, since it cuts the process into small blocks. Furthermore, by memorizing the result of join operations locally (factorization), we can further reduce communication as well as processing time. Nevertheless, (i) its processing time is bad because of the many accesses to the relational/object tables and (ii) it cannot support the cluster specifications given above.

To conclude, note that, as mentioned in the beginning of this section, although the rewriting techniques are presented here for relational to object migration, the ideas go beyond that scope. In particular, one would find similar parameters to guide the optimization when migrating ASCII files to relational or object systems. This would probably imply that no operation be computed at the source site, since file systems do not perform database operations.
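
To make the weighting idea of Section 4.3.4 concrete, here is a small Java sketch of a cost model that combines the different cost components with application-specific weights. The components and the numbers are invented for illustration; they are not the statistics actually used by the RelOO prototype (those are described in Section 5.1).

    // Sketch of a weighted cost model: the global cost of a candidate rewriting
    // mixes communication, relational, local and object processing costs, and
    // local buffering, with application-specific weights.
    public class WeightedCostModel {
        // Per-candidate estimated costs (arbitrary units).
        record Estimate(double communication, double relational, double local,
                        double object, double buffering) {}

        final double wComm, wRel, wLocal, wObj, wBuf;

        WeightedCostModel(double wComm, double wRel, double wLocal, double wObj, double wBuf) {
            this.wComm = wComm; this.wRel = wRel; this.wLocal = wLocal;
            this.wObj = wObj; this.wBuf = wBuf;
        }

        double globalCost(Estimate e) {
            return wComm * e.communication() + wRel * e.relational()
                 + wLocal * e.local() + wObj * e.object() + wBuf * e.buffering();
        }

        public static void main(String[] args) {
            // An application on a slow network weighs communication heavily.
            WeightedCostModel slowNetwork = new WeightedCostModel(10, 1, 1, 1, 2);
            Estimate defaultPlan = new Estimate(5, 3, 1, 4, 1);
            Estimate mergedPlan = new Estimate(9, 4, 3, 3, 4);
            System.out.println(slowNetwork.globalCost(defaultPlan) <
                               slowNetwork.globalCost(mergedPlan)); // true
        }
    }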

4.4 Three Rewriting Examples

A detailed presentation of the rewriting rules that we use would be tedious. Instead, we illustrate them by three examples where we also introduce splitting, clustering and ordering related to user-defined physical properties.

4.4.1 Merging All Four Blocks.

The first rewriting (see Figure 6) merges all four blocks into one and creates all objects and their relationships. (i) The join operations have been factorized and transformed into outer-joins. Outer-joins guarantee that we do not lose researchers or projects that do not appear in the relationship but must nonetheless be created. If we know that all projects and researchers are related, the outer-joins can be replaced by joins (semantic equivalence). (ii) We have removed the Group operation related to the projects attribute and rely on set insertion (+= assignment) and the (Researcher, r-employee) correspondence table (accessed by obj(emp)) to update it. Researchers are created and updated in an embedded Map operation. (iii) We have grouped all Map operations into one. (iv) We have introduced a Sort operation on the project names.

Figure 6: Merging All Four Blocks

Block B_all offers five positive characteristics. (1) If we are not interested in maintaining correspondences between relational and object entities (e.g., we are migrating archives), we may remove the correspondence table related to Project objects. However, the (Researcher, r-employee) table has to be maintained. Indeed, a researcher may belong to several projects and therefore may be accessed several times by obj(emp). (2) We have reduced the number of joins and iterations to be performed on source relations. (3) We make no assumption on the process localization: e.g., we can compute the joins at the relational site, locally in the migration component or in the object system. (4) Assuming an incremental loading inside the object database, we maintain the referential integrity of the staff/projects relationship: i.e., at all times, a project referencing a researcher is referenced by the researcher. (5) The expression allows a better support of the following user specification:

    class Project order by name cluster on staff(Researcher)

The negative aspects of this rewriting are: (1) the Group operation is performed on a very large set, implying large local storage and expensive computation; (2) if we perform the joins at the relational site, the global communication cost is much higher than that of the default representation; furthermore, (3) cutting the loading from the relational to the object site (transaction decomposition) will be difficult and potentially expensive.

4.4.2 Splitting and Merging.

The second rewriting is illustrated on Figure 7. Block B_Res has been merged with the subpart of blocks B_Prj and B_staff related to "Network" projects. Thus, at the end of block B_PrjStaff, all researchers have been created, as well as all "Network" projects along with their staff attribute. The remaining parts of the B_Prj and B_staff blocks follow, and the expression ends with an unmodified block B_projects. The expression represents one possible way to take the following user cluster statement into account:

    class Project if topic = 'Network' cluster on staff

Note that we can rewrite the three last blocks in any possible way and still keep that property.

Figure 7: Splitting According to User Specifications

On the plus side, block B_PrjStaff features a smaller Group operation than the previous expression (only keys are grouped) and potentially smaller communications. Furthermore, there is no need to maintain a correspondence table for "Network"-related Project objects. On the negative side, it forces us to perform a join outside of the relational system (the join on employee = emp). Because of this, it also involves a larger buffering at the object site.
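
Point (ii) of the first rewriting (4.4.1) can be pictured with a short, self-contained sketch: while scanning the (outer-)join result, the researcher reached through the correspondence table has the current project appended to its projects value with a += insertion, so no Group over researchers is ever needed. All names and the simulated scan are illustrative, not the generated code.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Sketch of "set insertion via the (Researcher, r-employee) correspondence
    // table" replacing a Group operation.
    public class SetInsertionDemo {
        record JoinRow(String projectName, Integer emp) {}   // emp may be null (outer-join)

        public static void main(String[] args) {
            Map<Integer, Set<String>> projectsOfResearcher = new HashMap<>(); // stands for obj(emp)
            List<JoinRow> scan = List.of(
                    new JoinRow("Verso", 42), new JoinRow("Rodin", 42), new JoinRow("Verso", null));
            for (JoinRow row : scan) {
                if (row.emp() == null) continue;              // project without researcher
                projectsOfResearcher
                        .computeIfAbsent(row.emp(), e -> new HashSet<>())  // create on first key
                        .add(row.projectName());                            // the += insertion
            }
            System.out.println(projectsOfResearcher);          // researcher 42 references both projects
        }
    }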

4.4.3 Playing With Group Operations.

Our last example (see Figure 8) shows, among other interesting features, how one can push a Group operation to the relational site by adding an aggregate function. The expression consists of three blocks: the unmodified B_Prj that takes care of Project objects, B'_Res that creates researchers, and B_staff/projects that materializes the relationship between researchers and projects.

Figure 8: Introducing Relational Grouping

In block B'_Res, a Sort operation is added to order the researchers according to their key number. This Sort should also simplify the Group operation that follows, which associates to each researcher the number of projects it belongs to. Finally, researchers are created through a Map operation. Although we do not have all the information concerning the projects to which a researcher belongs, we know the size of the corresponding projects attribute. Thus, assuming the following cluster statement, we are in a position to organize the object storage in an appropriate manner:

    class Researcher order by desc emp cluster on projects

Let us now see the positive and negative aspects of this expression. As was stated above, the expression allows (i) supporting a cluster statement and (ii) pushing the Group operation to the relational site (since an aggregate function count is applied). Assuming that the relational system is more efficient than the migration component or the object system, this latter characteristic is interesting. If we want to avoid putting extra load on the relational system, this is still interesting since the grouping can be performed more efficiently (no need to create sets). Furthermore, using some rewriting, we could then buffer the result of the join and sort operations in the migration component and avoid evaluating them twice.

Unfortunately, this rewriting entails more important processing at the object site and the need to support and access many times two correspondence tables (block B_staff/projects). However, note that researchers (which should constitute the larger table) have been sorted on their key at creation time and before we materialize the relationship. Thus, we can rely on a sequential access to the table and to the object store (if it reflects the order of creation) to avoid unnecessary input/output operations.

5 The RelOO Migration Tool

The ideal migration tool should (i) work for any relational or object system and (ii) take full advantage of the rewriting possibilities we introduced in the previous section.

The first criterion is somehow hard to reach in conjunction with the second one. This is mainly due to the relative youth of object systems as compared to their relational counterparts. Whereas a protocol such as ODBC [Mic94] makes it possible to communicate in an efficient and homogeneous fashion with most relational systems, there is as yet no object equivalent. The ODMG [CBB+97] standard OQL query language does not support update primitives, and the various ODMG programming interfaces lack the declarativity necessary to abstract oneself from the target system. Although some object systems provide an ODBC interface, they cannot be viewed as a reasonable solution for generating object databases.

The second criterion requires a sophisticated architecture with a parameterized search strategy and cost model. The search strategy should make use of the various parameters introduced in Section 4 and, for instance, discard some sequences of equivalences or favour some others, given the user specifications and the target systems' possibilities (e.g., what operations they can process). The cost model should mix communication, relational and object processing costs with appropriate application-specific weight factors. Not surprisingly, this kind of architecture resembles that of a mediator (e.g., [OV91, TRV96, ACPS96]). In this paper, we present a reasonable solution relying on a fixed search strategy and a parameterized cost model. It extends a previous prototype that followed user requirements but had little optimization capacities [AY97].

We briefly present the optimizer and code generator of the RelOO system that is currently under development in the VERSO group at INRIA. RelOO targets are any ODBC compliant system on the relational side and the O2 system on the object side.

5.1 The RelOO Optimizer

The RelOO optimizer is being developed in Java. (Note that the time devoted to the optimization process is of little consequence since it is performed only once and can be seen as part of the specification process, which may take a long time to be achieved.) Its inputs are (i) the default algebraic representation given in Section 4, (ii) the user constraints on physical organization and (iii) the cost model. It works in two phases. According to the user constraints, sorting and selection operations are added. For instance, the ordering instruction given in Section 3 on class Project adds a Sort operation and the clustering instruction related to "Network" projects adds two selections (σ_topic='Network' and σ_topic≠'Network', see Figure 7). Then we apply some equivalences that merge blocks, push Map operations and combine Map and Group operations. This phase returns a new algebraic representation where all objects that should be clustered together are created, as much as possible, through a single and appropriate Map operation. Blocks and operations in this representation are partially ordered (i.e., some blocks precede others according to the order on clusters). The second phase applies all equivalences on the result of the first, respecting the partial ordering and relying on the cost model to choose the best solution. Since blocks are never un-merged (i.e., merge equivalences are applied in only one direction) and the partial ordering is not broken, the resulting expression implements the user-given physical constraints (if they are coherent).

The cost model we implemented relies on simple statistics and considers (i) the result size of the "relational" expression (for communication cost), (ii) the size and number of the relational attributes involved in sequences of join operations (to ease transaction decomposition), (iii) the size of the relational/object correspondence tables, (iv) the cost of accessing the object store and the correspondence tables (sequential versus random accesses), (v) the size required by local buffering and the computation cost of the various operations according to their execution context. For instance, by giving an infinite cost to a relational operation (i.e., Selection, Projection, Join, Sort and Group with aggregates on source relations) following a non-relational one, we guarantee that all relational processing is performed at the relational site.

5.2 The RelOO Code Generator

Once the rewriting process is over, the RelOO system constructs one transaction program per block, each program being parameterized by either time or size. Time-parameterized programs cut the object transactions according to a given duration. Since we want (i) to avoid multiple evaluations of the same query and (ii) to minimize local storage whenever possible (i.e., when no materialization is needed for factorization purposes), this kind of program requires that we construct a "reasonable" SQL query. By reasonable, we mean that it should return a number of tuples that corresponds to the time allocated to the object processing.

We do not explain the translation process from algebra to relational, local or object operations. Although rather intricate, this part is done using standard techniques. As explained above, the optimization generates blocks with three distinct parts. The upper part is relational and is translated into an SQL query. Then comes a Group operation, if any, which is translated into a local Java program. Finally, the lower part consists of Map operations that are translated into appropriate O2 code. The generated programs are then given to the RelOO migration control environment, which controls their ordered execution and maintains the necessary correspondence tables.

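
To make the three-part structure concrete, here is a hedged sketch of what a generated block for the staff relationship could look like when expressed in Java: the relational part as an SQL string, the Group as local Java code, and the object-side Map abstracted behind a callback (the actual generator emits O2 code, which is not shown in the paper, so that part is only simulated here). Every identifier is illustrative.

    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.function.Consumer;
    import java.util.stream.Collectors;

    // Sketch of the three-part structure of a generated block (Section 5.2):
    // (1) an SQL query for the relational part, (2) a local Java grouping,
    // (3) Map operations emitted as object-side updates.
    public class GeneratedBlockSketch {
        record Row(String project, int emp) {}

        // (1) Relational part (illustrative query text).
        static final String SQL =
            "SELECT p.name AS project, e.emp FROM r_emp_prj ep "
          + "JOIN r_project p ON p.name = ep.project AND p.address = 'Rocquencourt' "
          + "JOIN r_employee e ON e.emp = ep.employee AND e.function = 'Researcher'";

        // (2) Local part: the Group operation, translated to Java.
        static Map<String, Set<Integer>> groupLocally(List<Row> rows) {
            return rows.stream().collect(Collectors.groupingBy(
                    Row::project, Collectors.mapping(Row::emp, Collectors.toSet())));
        }

        // (3) Object part: one Map invocation per group, run inside an object transaction.
        static void applyMap(Map<String, Set<Integer>> groups,
                             Consumer<Map.Entry<String, Set<Integer>>> assignStaff) {
            groups.entrySet().forEach(assignStaff);
        }

        public static void main(String[] args) {
            List<Row> rows = List.of(new Row("Verso", 42), new Row("Verso", 43));
            applyMap(groupLocally(rows),
                     e -> System.out.println("staff(" + e.getKey() + ") := " + e.getValue()));
        }
    }
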
6 Conclusion

We presented a framework allowing flexible and efficient migration of legacy relational data into object databases. The framework consists of (i) a declarative language to express database transformations and physical data organization and (ii) an algebra-based optimization technique that takes into account user-given constraints.

The problem of the automatic generation of an object schema given a relational one has been largely addressed. Our work is complementary. We believe that the framework we provide can support, in an efficient manner, the various solutions that have been proposed.

We have already developed a tool [AY97] that takes into account user-given constraints and are currently implementing a complete optimization module and a migration control environment in Java on top of the O2 database system. The plan is to validate the tool on real applications while working on the parallelization of blocks, the support of relational updates and physical optimization.

Acknowledgments

We are deeply grateful to Serge Abiteboul from the Verso Group at INRIA, to Dave Maier from the Oregon Graduate Institute, to Guy Ferran and Didier Tallot from O2Technology and to Tova Milo from Tel Aviv University for many fruitful discussions.

References

[AB91] S. Abiteboul and A. Bonner. Objects and Views. In Proc. ACM SIGMOD Symp. on the Management of Data, pages 238-247, 1991.

[ABD+94] T. Atwood, D. Barry, J. Duhl, J. Eastman, G. Ferran, D. Jordan, M. Loomis, and D. Wade. The Object Database Standard: ODMG-93, Release 1.2. Morgan Kaufmann, San Francisco, 1994. http://www.odmg.org.

[ACPS96] S. Adali, K. Candan, Y. Papakonstantinou, and V. S. Subrahmanian. Query Caching and Optimization in Distributed Mediator Systems. In Proc. ACM SIGMOD Symp. on the Management of Data, pages 137-148, Montreal, Canada, June 1996.

[AY97] S. Amer-Yahia. From Relations to Objects: The RelOO Prototype. In ICDE Conference, Industrial Session, Birmingham, England, April 1997.

[AY98] S. Amer-Yahia. The Complete RelOO Syntax. France, 1998. http://www-rocq.inria.fr/verso/Sihem.Amer-Yahia.

[AYCD97] S. Amer-Yahia, S. Cluet, and C. Delobel. Rewriting with Side-Effects. Technical report, INRIA - Verso Group, France, 1997. http://www-rocq.inria.fr/verso/Sihem.Amer-Yahia.

[BDK92] F. Bancilhon, C. Delobel, and P. Kanellakis, editors. Building an Object-Oriented Database System - The Story of O2. Morgan Kaufmann, San Mateo, California, 1992.

[Ber91] E. Bertino. A View Mechanism for Object-Oriented Databases. In Proc. of Intl. Conf. on Extending Data Base Technology, pages 136-151, Vienna, March 1991.

[BGD97] A. Behm, A. Geppert, and K.R. Dittrich. On the Migration of Relational Schemas and Data to Object-Oriented Database Systems. In Proc. of Intl. Conf. on Re-Technologies in Information Systems, Klagenfurt, Austria, December 1997.

[CBB+97] R.G.G. Cattell, D. Barry, D. Bartels, M. Berler, J. Eastman, S. Gamerman, D. Jordan, A. Springer, H. Strickland, and D. Wade. The Object Database Standard: ODMG 2.0. Morgan Kaufmann, San Francisco, 1997. http://www.odmg.org.

[CM94] S. Cluet and G. Moerkotte. Classification and Optimization of Nested Queries in Object Bases. In Bases de Données Avancées, pages 331-349, 1994.

[DA97] S. Davidson and A.S. Kosky. WOL: A Language for Database Transformations and Constraints. In Proc. of Intl. Conf. on Data Engineering, pages 55-65, UK, April 1997.

[DdST95] C. Delobel, C. Souza dos Santos, and D. Tallot. Object Views of Relations. In Proc. of the 2nd Intl. Conf. on Applications of Databases (ADB '95), San Jose, California, December 1995.

[Exe97] F. Exertier. ROBIN: Generating Object-Oriented and WEB Interfaces for Relational Databases. Technical report, Bull, January 1997. White paper.

[Fon97] J. Fong. Converting Relational to Object-Oriented Databases. SIGMOD Record, 26(1), March 1997.

[FV95] C. Fahrner and G. Vossen. Transforming Relational Database Schemas into Object-Oriented Schemas according to ODMG-93. In Proc. of Intl. Conf. on Deductive and Object-Oriented Databases (DOOD), Singapore, December 1995.

[Hai95] Jean-Luc Hainaut. Transformation-based Database Engineering. In Proc. of Intl. Conf. on Very Large Data Bases, Zurich, Switzerland, 1995. Tutorial.

[INC93] Persistence Software Inc. Bridging the Gap between Relational Data and Object Oriented Development. Technical report, Persistence Software Inc., September 1993.

[IUT98] Tadashi Iijima, Shirou Udoguchi, and Motomichi Toyama. Applying Database Publishing Technology to the Migration from a Relational Database to an Object-Oriented Database. Technical report, Keio University, 1998.

[JSZ96] J. Jahnke, W. Schäfer, and A. Zündorf. A Design Environment for the Migration of Relational to Object-Oriented Database Systems. In ICSM. IEEE Computer Society, 1996.

[Leb93] Franck Lebastard. DRIVER: une couche objet virtuelle persistante pour le raisonnement sur les bases de données relationnelles. PhD thesis, Institut National des Sciences Appliquées, Lyon, France, March 1993.

[MGG95] R. Missaoui, J.M. Gagnon, and R. Godin. Mapping an Extended Entity-Relationship Schema into a Schema of Complex Objects. In Object-Oriented and Entity-Relationship Modelling (OOER), Australia, December 1995.

[Mic94] Microsoft. Open DataBase Connectivity Software Development Kit Manual, Version 2.0, 1994. http://www.ddodbc.com.

[OV91] M.T. Özsu and P. Valduriez. Principles of Distributed Database Systems. Prentice Hall, 1991.

[Pac91] Hewlett Packard. OpenODB: Facilitating Change. Technical report, Hewlett Packard, September 1991.

[RH97] S. Ramanathan and J. Hodges. Extraction of Object-Oriented Structures from Existing Relational Databases. SIGMOD Record, 26(1), March 1997.

[Run92] E. Rundensteiner. MultiView: A Methodology for Supporting Multiple Views in Object-Oriented Databases. In Proc. of Intl. Conf. on Very Large Data Bases, Vancouver, 1992.

[SLT91] M. H. Scholl, C. Laasch, and M. Tresch. Updatable Views in Object-Oriented Databases. In Proc. of Intl. Conf. on Deductive and Object-Oriented Databases (DOOD), 1991.

[TN92] M. Tsangaris and J. Naughton. On the Performance of Object Clustering Techniques. In Proc. ACM SIGMOD Symp. on the Management of Data, pages 144-153, USA, June 1992.

[TRV96] A. Tomasic, L. Raschid, and P. Valduriez. Scaling Heterogeneous Databases and the Design of Disco. In Proc. of Intl. Conf. on Distributed Computing Systems, 1996.

[Ull88] J. D. Ullman. Principles of Database and Knowledge-Base Systems. Computer Science Press, 1988.

[WN95] J. L. Wiener and J. F. Naughton. OODB Bulk Loading Revisited: The Partitioned-List Approach. In Proc. of Intl. Conf. on Very Large Data Bases, pages 30-41, Zurich, Switzerland, 1995.
