Applying Model Management to Classical Meta Data Problems
Total Page:16
File Type:pdf, Size:1020Kb
Applying Model Management to Classical Meta Data Problems Philip A. Bernstein Microsoft Research One Microsoft Way Redmond, WA 98052-6399 [email protected] Abstract design and its implementation, • mapping source makefiles into target makefiles, to Model management is a new approach to meta data drive the transformation of make scripts from one management that offers a higher level programming programming environment to another, and interface than current techniques. The main abstrac- • mapping interfaces of real-time devices to the tions are models (e.g., schemas, interface defini- interfaces required by a system management environ- tions) and mappings between models. It treats these ment to enable it to communicate with the device. abstractions as bulk objects and offers such opera- Following conventional usage, we classify these as meta tors as Match, Merge, Diff, Compose, Apply, and data management applications, because they mostly ModelGen. This paper extends earlier treatments of involve manipulating descriptions of data, rather than the these operators and applies them to three classical data itself. meta data management problems: schema integra- Today’s approach to implementing such applications is tion, schema evolution, and round-trip engineering. to translate the given models into an object-oriented representation and manipulate the models and mappings 1 Introduction in that representation. The manipulation includes Many information system problems involve the design, designing mappings between the models, generating a integration, and maintenance of complex application model from another model along with a mapping between artifacts, such as application programs, databases, web them, modifying a model or mapping, interpreting a sites, workflow scripts, formatted messages, and user mapping, and generating code from a mapping. Database interfaces. Engineers who perform this work use tools to query languages offer little help for this kind of manipulate formal descriptions, or models, of these manipulation. Therefore, most of it is programmed using artifacts, such as object diagrams, interface definitions, object-at-a-time primitives. database schemas, web site layouts, control flow We have proposed to avoid this object-at-a-time diagrams, XML schemas, and form definitions. This programming by treating models and mappings as manipulation usually involves designing transformations abstractions that can be manipulated by model-at-a-time between models, which in turn requires an explicit and mapping-at-a-time operators [6]. We believe that an representation of mappings, which describe how two implementation of these abstractions and operators, called models are related to each other. Some examples are: a model management system, could offer an order-of- • mapping between class definitions and relational magnitude improvement in programmer productivity for schemas to generate object wrappers, meta data applications. • mapping between XML schemas to drive message The approach is meant to be generic in the sense that a translation, single implementation is applicable to all of the data • mapping between data sources and a mediated schema models in the above examples. This is possible because to drive heterogeneous data integration, the same modeling concepts are used in virtually all mod- • mapping between a database schema and its next eling environments, such as UML, extended ER (EER), release to guide data migration or view evolution, and XML Schema. Thus, an implementation that uses a • mapping between an entity-relationship (ER) model representation of models that includes most of those and a SQL schema to navigate between a database concepts would be applicable to all such environments. There are many published approaches to the list of meta Permission to copy without fee all or part of this material is granted data problems above and others like them. We borrow provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the from these approaches by abstracting their algorithms into publication and its date appear, and notice is given that copying is by a small set of operators and generalizing them across permission of the Very Large Data Base Endowment. To copy otherwise, applications and, to some extent, across data models. We or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 2003 CIDR Conference thereby hope to offer a more powerful database platform map1 Given: S1, S2, map1, SW for such applications than is available today. S1 SW 1. map2 = Match(S1, S2) In a model management system, models and mappings 1. map2 2. map3 = are syntactic structures. They are expressed in a type 2. map3 Compose(map1, map2) system, but do not have additional semantics based on a 3. map4 3. < S3 ,map4> = constraint language or query language. Despite this S2 S3 Diff(S2, map3) limited expressiveness, model management operators are powerful enough to avoid most object-at-a-time pro- Figure 1 Using model management to help generate a gramming in meta data applications. And it is precisely data warehouse loading script this limited expressiveness that makes the semantics and amples to demonstrate that model management is a credi- implementation of the operators tractable. ble approach to solving problems of this type. Although Still, for a complete solution, meta data problems often this paper is not the first overview of model management, require some semantic processing, typically the manipula- it is the most complete proposal to date. Past papers pre- tion of formulas in a mathematical system, such as logic sented a short vision [5,6], an example of applying model or state machines. To cope with this, model management management to a data warehouse loading scenario [7], an offers an extension mechanism to exploit the power of an application of Merge to mediated schemas [22], and an inferencing engine for any such mathematical system. initial mathematical semantics for model management [1]. Before diving into details, we offer a short preview to We also studied the match operator [23], which has see what model management consists of and how it can developed into a separate research area. This paper offers yield programmer productivity improvements. First, we the following new contributions to the overall program: summarize the main model management operators: • The first full description of all of the model • Match – takes two models as input and returns a management operators. mapping between them • New details about two of the operators, Diff and • Compose – takes a mapping between models A and B Compose, and a new proposed operator, ModelGen. and a mapping between models B and C, and returns a • Applications of model management to three well mapping between A and C known meta data problems: schema integration, • Diff – takes a model A and mapping between A and schema evolution, and round-trip engineering. some model B, and returns the sub-model of A that We regard the latter as particularly important, since they does not participate in the mapping offer the first detailed demonstration that model manage- • ModelGen – takes a model A, and returns a new ment can help solve a wide range of meta data problems. model B based on A (typically in a different data The paper is organized as follows: Section 2 describes model than A’s) and a mapping between A and B the two main structures of model management, models • Merge – takes two models A and B and a mapping and mappings. Section 3 describes the operators on between them, and returns the union C of A and B models and mappings. Section 4 presents walkthroughs of along with mappings between C and A, and C and B. solutions to schema integration, schema evolution, and Second, to see how the operators might be used, consi- round-trip engineering. Section 5 gives a few thoughts der the following example [7]: Suppose we are given a about implementing model management. Section 6 discusses related work. Section 7 is the conclusion. mapping map1 from a data source S1 to a data warehouse SW, and want to map a second source S2 to SW, where S2 is similar to S1. See Figure 1. (We use S1, SW, and S2 to 2 Models and Mappings name both the schemas and databases.) First we call 2.1 Models Match(S1, S2) to obtain a mapping map2 between S1 and For the purposes of this paper, the exact choice of model S2, which shows where S2 is the same as S1. Second, we representation is not important. However, there are call Compose(map1, map2) to obtain a mapping map3 several technical requirements on the representation of between S2 and SW, which maps to SW those objects of S2 models, which the definitions of mappings and model that correspond to objects of S1. To map the other objects management operators depend on. of S2 to SW, we call Diff(S2, map3) to find the sub-model First, a model must contain a set of objects, each of S3 of S2 that is not mapped by map3 to SW, and map4 to which has an identity. A model needs to be a set so that identify corresponding objects of S2 and S3. We can then its content is well-defined (i.e., some objects are in the set call other operators to generate a warehouse schema for S3 while others are not). By requiring that objects have iden- and merge it into SW. The latter details are omitted, but we tity, we can define a mapping between models in terms of will see similar operator sequences later in the paper. mappings between objects or combinations of objects. The main purpose of this paper is to define the seman- Second, we want the expressiveness of the representa- tics of the operators in enough detail to make the above tion of models to be comparable to that of EER models.