A Data Warehouse Conceptual Data Model for Multidimensional Aggregation

Enrico Franconi
Dept. of Computer Science, Univ. of Manchester, Manchester M13 9PL, UK
[email protected]

Ulrike Sattler
LuFG Theoretical Computer Science, RWTH Aachen, D-52074 Aachen, Germany
[email protected]

The copyright of this paper belongs to the paper's authors. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage.
Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'99), Heidelberg, Germany, 14. - 15.6. 1999 (S. Gatziu, M. Jeusfeld, M. Staudt, Y. Vassiliou, eds.)
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-19/

Abstract

This paper presents a proposal for a Data Warehouse Conceptual Data Model (DWCDM) which allows for the description of both the relevant aggregated entities of the domain (together with their properties and their relationships with other relevant entities) and the relevant dimensions involved in building the aggregated entities. The proposed DWCDM is able to capture the database schemata expressed in an extended version of the Entity-Relationship Data Model; it is able to introduce complex descriptions of the structure of aggregated entities and multiply hierarchically organised dimensions; it is based on Description Logics, a class of formalisms for which it is possible to study the expressivity in relation to the decidability of reasoning problems and the completeness of algorithms.

1 Introduction

Data Warehouse applications, and especially OLAP applications, call for a vital extension of the expressive power and functionality of traditional conceptual modelling formalisms in order to cope with aggregation. Still, there have been few attempts [Catarci et al., 1995; Cabibbo and Torlone, 1998] to provide such an extended modelling formalism, despite the fact that (1) experience in the field of databases has proved that conceptual modelling is crucial for the design, evolution, and optimisation of a database, (2) a great variety of data warehouse systems are on the market, most of them providing some implementation of multidimensional aggregation, and (3) query optimisation with aggregated queries [Nutt et al., 1998; Cohen et al., 1999] is even more crucial for data warehouses than it is for databases, which makes semantic query optimisation using a conceptual model even more important. As a consequence of the absence of such an extended modelling formalism, a comparison of different systems or language extensions for query optimisation is difficult: a common framework in which to translate and compare these extensions is missing, and new query optimisation techniques developed for extended schema and/or query languages cannot be compared appropriately.

In order to address these questions, a formal framework must be developed that encompasses the abstract principles of the data warehouse related extensions of traditional representation formalisms. In this paper, we present some preliminary results from the research done within the "Foundations of Data Warehouse Quality" (DWQ) long term research project, funded by the European Commission (n. 22469) under the ESPRIT Programme. With respect to the global picture, the role of our research within DWQ is to study a formal framework at the conceptual level (see Figure 1). The conceptual data model we are investigating should be able to abstract and describe the entities and relations which are relevant both in the whole enterprise and in the user analysis of such information. In the following, we will refer to this formalism as the Data Warehouse Conceptual Data Model (DWCDM).

Figure 1: The role played by the Data Warehouse Conceptual Data Model with respect to the DWQ architecture. (The diagram relates the Client, Enterprise, and Source models and schemas, the Data Warehouse Store, and the Source Data Stores across the conceptual, logical, and physical levels.)

1.1 A Data Warehouse Conceptual Data Model

A DWCDM must provide means for the representation of a multidimensional conceptual view of data. More precisely, a DWCDM provides the language for defining multidimensional information within a conceptual model in the data warehouse global information base. As stated above, the model supports the conceptual design of a data warehouse, query and view management, and update propagation: it serves as a reference meta-model for deriving the inter-relations among entities, relations, and aggregations, and for providing the integrity constraints necessary to reduce the design and maintenance costs of a data warehouse. Hence a DWCDM must be expressive enough to describe both the abstract business domain concerned with the specific application (Enterprise model), just like a conceptual schema in the traditional database world, and the possible views of the enterprise information a specific user may want to analyse (Client model), with particular emphasis on the aggregated views, which are peculiar to a data warehouse architecture (see Figure 1). A multidimensional modelling object in the logical perspective (e.g., a materialised view, a query, or a cube) should always be related to some (possibly aggregated) entity in the conceptual schema.

In the following, we will briefly introduce the ideas behind a multidimensional data model (see, e.g., [Agrawal et al., 1995; Cabibbo and Torlone, 1998]) and compare it with a traditional relational data model. A more comprehensive introduction is given in the forthcoming book "Fundamentals of Data Warehousing" [Baader et al., 1999], Chapter 4 on Multidimensional Aggregation.

Relational database tables contain records (or rows). Each record consists of fields (or columns). In a normal relational database, a number of fields in each record (keys) may uniquely identify each record. In contrast, a multidimensional database contains n-dimensional arrays (sometimes called hypercubes or cubes), where each dimension has an associated hierarchy of levels of consolidated data. For instance, a spatial dimension might have a hierarchy with levels such as country, region, city, office.

Figure 2: Sales volume as a function of product, time, location. (Axes: PRODUCT with members Juice, Cola, Soap; MONTH with members Jan, Feb; REGION with members N, S, W.)

Measures (also known as variables or metrics), like Sales in the example, or budget, revenue, inventory, etc., in a multidimensional array correspond to columns in a relational database table whose values functionally depend on the values of other columns. Values within a table column correspond to values for that measure in a multidimensional array: measures associate values with points in the multidimensional world. For example, the measure of the sales of the product Cola, in the northern region, in January, is 13,000. Thus, a dimension acts as an index for identifying values within a multidimensional array. If one member of the dimension is selected, then the remaining dimensions in which a range of members (or all members) are selected define a sub-cube. If all but two dimensions have a single member selected, the remaining two dimensions define a spreadsheet (or a slice or a page). If all dimensions have a single member selected, then a single cell is defined. Dimensions offer a very concise, intuitive way of organising and selecting data for retrieval, exploration and analysis. Usual pre-defined or user-defined dimension levels (or Roll-Ups) for aggregating data in a data warehouse are: temporal (e.g., year vs. month), geographical/spatial (e.g., Rome vs. Italy), organisational (meaning the hierarchical breakdowns of an organisation, e.g., Institute vs. Department), and physical (e.g., Car vs. Engine).

A value in a single cell may represent an aggregated measure computed from more specific data at some lower level of the same dimensions. Aggregation involves computing aggregation functions, according to the attribute hierarchy within dimensions or to cross-dimensional formulas, for one or more dimensions. For example, the value 13,000 for the sales in January may have been consolidated as the sum of the disaggregated values of the weekly (or day-by-day) sales. Another example, introducing an aggregation grounded on a different dimension, is the cost of a product (e.g., a car) as being the sum of the costs of all of its components.

Figure 3: The cubes reporting the average duration of calls by dates in days and sources in point types, and by dates at the level of week days and sources at the level of customer types. (First cube: Date (Day) from 1/1/99 to 5/1/99 against Source (Point Type) with members Cell, Land Line, Direct Data, PABX. Second cube: Date (Week Day) from Mon to Sun against Source (Customer Type) with members Consumer, Business.)

In order to provide an adequate conceptualisation of multidimensional information, a DWCDM should provide the possibility of explicitly modelling the relevant aggregations and dimensions. According to a conservative point of view, a desirable DWCDM should extend some standard modelling formalism (such as Entity-Relationship) to allow for the description of both aggregated entities of the domain (together with their properties and their relationships with other relevant entities) and the dimensions in- [...]

[...] model for both the conceptual and the logical levels; our proposal is compatible with the DWCDM presented in [Calvanese et al., 1998c].

The paper is organised as follows. Section 2 informally introduces an extended ER formalism which allows for the description of the explicit structure of multidimensional aggregations; the section briefly describes the semantics of the conceptual data model in terms of a logical representation of multidimensional databases, as proposed by [Cabibbo and Torlone, 1998]. Section 3 proposes a basic modelling language, based on Description Logics, which is expressive enough to capture the Entity-Relationship Data Model.
