Compiling Mappings to Bridge Applications and Databases

Compiling Mappings to Bridge Applications and Databases Sergey Melnik Atul Adya Philip A. Bernstein Microsoft Research Microsoft Corporation Microsoft Research One Microsoft Way One Microsoft Way One Microsoft Way Redmond, WA 98052, U.S.A. Redmond, WA 98052, U.S.A. Redmond, WA 98052, U.S.A. [email protected] [email protected] [email protected] ABSTRACT relational (O/R) mapping scenarios where a set of objects is par- Translating data and data access operations between applications titioned across several relational tables may require transforma- and databases is a longstanding data management problem. We tions that contain outer joins, nested queries, and case statements present a novel approach to this problem, in which the relationship in order to reassemble objects from tables (we will see an example between the application data and the persistent storage is specified shortly). Implementing such transformations is difficult, especially using a declarative mapping, which is compiled into bidirectional since the data usually needs to be updatable, a common require- views that drive the data transformation engine. Expressing the ap- ment for many enterprise applications. A study referred to in [22] plication model as a view on the database is used to answer queries, found that coding and configuring O/R data access accounts for up while viewing the database in terms of the application model al- to 40% of total project effort. lows us to leverage view maintenance algorithms for update trans- Since the mid 1990’s, client-side data mapping layers have be- lation. This approach has been implemented in a commercial prod- come a popular alternative to handcoding the data access logic, fu- uct. It enables developers to interact with a relational database via eled by the growth of Internet applications. A core function of such a conceptual schema and an object-oriented programming surface. a layer is to provide an updatable view that exposes a data model We outline the implemented system and focus on the challenges of closely aligned with the application’s data model, driven by an ex- mapping compilation, which include rewriting queries under con- plicit mapping. Many commercial products (e.g., Oracle TopLink) straints and supporting non-relational constructs. and open source projects (e.g., Hibernate) have emerged to offer these capabilities. Virtually every enterprise framework provides Categories and Subject Descriptors: a client-side persistence layer (e.g., EJB in J2EE). Most packaged H.2 [Database Management], D.3 [Programming Languages] business applications, such as ERP and CRM applications, incor- General Terms: Algorithms, Performance, Design, Languages porate proprietary data access interfaces (e.g., BAPI in SAP R/3). Keywords: mapping, updateable views, query rewriting Today’s client-side mapping layers offer widely varying degrees of capability, robustness, and total cost of ownership. Typically, the mapping between the application and database artifacts is rep- 1. INTRODUCTION resented as a custom structure or schema annotations that have Developers of data-centric solutions routinely face situations in vague semantics and drive case-by-case reasoning. A scenario- which the data representation used by an application differs sub- driven implementation limits the range of supported mappings and stantially from the one used by the database. The traditional rea- often yields a fragile runtime that is difficult to extend. Few data son is the impedance mismatch between programming language access solutions leverage data transformation techniques developed abstractions and persistent storage [12]: developers want to encap- by the database community. Furthermore, building such solutions sulate business logic into objects, yet most enterprise data is stored using views, triggers, and stored procedures is problematic for a in relational database systems. A second reason is to enable data in- number of reasons. First, views containing joins or unions are usu- dependence. Even if applications and databases start with the same ally not updatable. Second, defining custom database views and data representation, they can evolve, leading to differing data rep- triggers for every application accessing mission-critical enterprise resentations that must be bridged. A third reason is independence data is rarely acceptable due to security and manageability risks. from DBMS vendors: many enterprise applications run in the mid- Moreover, SQL dialects, object-relational features, and procedural dle tier and need to support backend database systems of varying extensions vary significantly from one DBMS to the next. capabilities, which require different data representations. Thus, in In this paper we describe an approach to building a mapping- many enterprise systems the separation between application mod- driven data access layer that addresses some of these challenges. els and database models has become a design choice rather than a As a first contribution, we present a novel mapping architecture technical impediment. that provides a general-purpose mechanism for supporting updat- The data transformations required to bridge applications and able views. It enables building client-side data access layers in a databases can be very complex. Even relatively simple object-to- principled way but could also be exploited inside a database engine. The architecture is based on three main steps: Permission to make digital or hard copies of all or part of this work for • personal or classroom use is granted without fee provided that copies are Specification: Mappings are specified using a declarative not made or distributed for profit or commercial advantage and that copies language that has well-defined semantics and puts a wide bear this notice and the full citation on the first page. To copy otherwise, to range of mapping scenarios within reach of non-expert users. republish, to post on servers or to redistribute to lists, requires prior specific • Compilation: Mappings are compiled into bidirectional permission and/or a fee. SIGMOD’07, June 12–14, 2007, Beijing, China. views, called query and update views, that drive query and Copyright 2007 ACM 978-1-59593-686-8/07/0006 ...$5.00. update processing in the runtime engine. 461 1. Data manipulation "SELECT p FROM SalesPeople AS p ApplicationsUsers Tools using Entity SQL WHERE p.Hired.Year > 2000" 2. CRUD on objects foreach(Person p in aw.SalesPeople) Mapping compiler UI Tools (create/read/ if(p.Hired.Year > 2000) {...} Query Update aw.SaveChanges(); views update/delete) views API Merge views 3. Language-integrated var people = from p in aw.SalesPeople Mapping generator where p.Hired.Year > 2000 queries (LINQ) select p; Language integration Object services Query pipeline Update pipeline Figure 1: Interacting with data in the Entity Framework Entity SQL Entities ∆Entities • Unfold query views Apply view Assemble • • Push relational ops maintenance to Execution: Update translation is done using algorithms for entities bidirectional views materialized view maintenance. Query translation uses view Runtime• Optimize time Design unfolding. Expression trees Tuples ∆Tables Data providers To the best of our knowledge, this mapping approach has not been exploited previously in the research literature or in commer- Database system cial products. It raises interesting research challenges. Our second contribution addresses one of these challenges— Figure 2: System architecture formulation of the mapping compilation problem and algorithms that solve it. The algorithms that we developed are based on techniques for answering queries using views and incorporate a number EDM is an extended entity-relationship model. It distinguishes of novel aspects. One is the concept of bipartite mappings, which entity types, complex types, and primitive types. Instances of en- are mappings that can be expressed as a composition of a view and tity types, called entities, can be organized into persistent collec- an inverse view. Using bipartite mappings enables us to focus on tions called entity sets. An entity set of type T holds entities of either left or right ‘part’ of mapping constraints at a time when gen- type T or any type that derives from T . Each entity type has a key, erating query and update views. Another novel aspect is a schema which uniquely identifies an entity in the entity set. Entities and partitioning approach that allows us to rewrite the mapping con- complex values may have properties holding other complex values straints using union queries and facilitates view generation. Fur- or primitive values. Like entity types, complex types can be spe- thermore, we discuss support for object-oriented constructs, case cialized through inheritance. However, complex values can exist statement generation, bookkeeping of tuple provenance, and sim- only as part of some entity. Entities can participate in 1:1, 1:n,or plification of views under constraints. m:n associations, which essentially relate the keys of the respec- The presented approach has been implemented in a commercial tive entities. Notice that a relational schema with key and foreign product, the Microsoft ADO.NET Entity Framework [1, 11]. A key constraints can be viewed as a simple EDM schema that con- pre-release of the product is available for download. tains a distinct entity set and entity type for each relational table. This paper is structured as follows. In Section 2 we give an We exploit this property to uniformly specify and manipulate map- overview of the Entity Framework. In Section 3 we outline our pings. Henceforth we use the general term extent to refer to entity mapping approach and illustrate it using a motivating example. The sets, associations, and tables. mapping compilation problem is stated formally in Section 4. The Entity SQL is a data manipulation language based on SQL. En- implemented algorithms are presented in Section 5. Experimental tity SQL allows retrieving entities from entity sets and navigating results are in Section 6. Related work is discussed in Section 7. from an entity to a collection of entities reachable via a given asso- Section 8 is the conclusion. ciation. Path expressions can be used to ‘dot’ into complex values.

Load more