A Data Warehouse Conceptual for Multidimensional Aggregation

Enrico Franconi Ulrike Sattler Dept. of Computer Science LuFG Theoretical Computer Science Univ. of Manchester RWTH Aachen Manchester M13 9PL, UK D-52074 Aachen, Germany [email protected] [email protected]

the fact that (1) experiences in the field of have proved that conceptual modelling is crucial for the design, Abstract evolution, and optimisation of a , (2) a great va- riety of data warehouse system are on the market, most This paper presents a proposal for a Data Ware- of them providing some implementation of multidimen- house Conceptual Data (CDWDM) Model which sional aggregation, and (3) query optimisation with aggre- allows for the description of both the relevant ag- gated queries [Nutt et al., 1998; Cohen et al., 1999] is even gregated entities of the domain—together with more crucial for data warehouses than it is for databases— their properties and their relationships with other which makes semantic query optimisation using a concep- relevant entities—and the relevant dimensions in- tual model even more important. As a consequence of volved in building the aggregated entities. The the absence of a such an extended modelling formalism, proposed CDWDM is able to capture the database a comparison of different systems or language extensions schemata expressed in an extended version of the for query optimisation is difficult: a common framework in Entity-Relationship Data Model; it is able to in- which to translate and compare these extensions is missing, troduce complex descriptions of the structure of new query optimisation techniques developed for extended aggregated entities and multiply hierarchically or- schema and/or query languages cannot be compared appro- ganised dimensions; it is based on Description priately. Logics, a class of formalisms for which it is possi- In order to address these questions, a formal framework ble to study the expressivity in relation with decid- must be developed that encompasses the abstract principles ability of reasoning problems and completeness of of the data warehouse related extensions of traditional rep- algorithms. resentation formalisms. In this paper, we present some pre- liminary outcome from the research done within the “Foun- 1 Introduction dations of Data Warehouse Quality” (DWQ) long term re- search project, funded by the European Commission (n. Data Warehouse—and especially OLAP—applications ask 22469) under the ESPRIT Programme. With respect to for the vital extension of the expressive power and func- the global picture, the role of our research within DWQ tionality of traditional conceptual modelling formalisms in is to study a formal framework at the conceptual level (see order to cope with aggregation. Still, there have been few Figure 1). The conceptual data model we are investigat- attempts [Catarci et al., 1995; Cabibbo and Torlone, 1998] ing should be able to abstract and describe the entities and to provide such an extended modelling formalism, despite relations which are relevant both in the whole enterprise, and in the user analysis of such information. In the follow- The copyright of this paper belongs to the paper’s authors. Permission to ing, we will refer to this formalism as the Data Warehouse copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage. Conceptual Data Model (DWCDM). Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW’99) 1.1 A Data Warehouse Conceptual Data Model Heidelberg, Germany, 14. - 15.6. 1999 A DWCDM must provide means for the representation of a (S. Gatziu, M. Jeusfeld, M. Staudt, Y. Vassiliou, eds.) multidimensional conceptual view of data. More precisely, http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-19/ a DWCDM provides the language for defining multidimen-

E. Franconi, U. Sattler 13-1 Client Client REGION W Client S model schema Data N Store Juice 10 Cola 13

Data Soap Enterprise Warehouse Data model schema Warehouse Store PRODUCT

Source Source Source model schema Data Store JanFeb MONTH Conceptual Logical Physical level level level Figure 2: Sales volume as a function of product, time, lo- cation. Figure 1: The role played by the Data Warehouse Concep- tual Data Model with respect to the DWQ architecture. For instance, a spatial dimension might have a hierarchy with levels such as country, region, city, office. sional information within a in the data Measures (which are also known as variables or warehouse global information base. As stated above, the metrics)—like Sales in the example, or budget, revenue, model is of support for the conceptual design of a data inventory, etc.—in a multidimensional array correspond to warehouse, for query and view management, and for up- columns in a relational database table whose values func- date propagation: it serves as a reference meta-model for tionally depend on the values of other columns. Values deriving the inter-relations among entities, relations, aggre- within a table column correspond to values for that mea- gations, and for providing the integrity constraints neces- sure in a multidimensional array: measures associate val- sary to reduce the design and maintenance costs of a data ues with points in the multi-dimensional world. For ex- warehouse. Hence a DWCDM must be expressive enough ample, the measure of the sales of the product Cola, in to describe both the abstract business domain concerned the northern region, in January, is 13,000. Thus, a dimen- with the specific application (Enterprise model)—just like sion acts as an index for identifying values within a multi- a conceptual schema in the traditional database world—and dimensional array. If one member of the dimension is se- the possible views of the enterprise information a specific lected, then the remaining dimensions in which a range of user may want to analyse (Client model)—with particular members (or all members) are selected defines a sub-cube. emphasis on the aggregated views, which are peculiar to a If all but two dimensions have a single member selected, data warehouse architecture (see Figure 1). A multidimen- the remaining two dimensions define a spreadsheet (or a sional modelling object in the logical perspective—e.g., a slice or a page). If all dimensions have a single member materialised view, a query, or a cube—should always be selected, then a single cell is defined. Dimensions offer a related with some (possibly aggregated) entity in the con- very concise, intuitive way of organising and selecting data ceptual schema. for retrieval, exploration and analysis. Usual pre-defined In the following, we will briefly introduce the ideas be- or user-defined dimension levels (or Roll-Ups ) for aggre- hind a multidimensional data model (see, e.g., [Agrawal gating data in DW are: temporal (e.g., year vs. month), et al., 1995; Cabibbo and Torlone, 1998]) and compare geographical/spatial (e.g., Rome vs. Italy), organisational it with a traditional relational data model. A more com- (meaning the hierarchical breakdowns of your organisa- prehensive introduction has been done in the forthcoming tion, e.g., Institute vs. Department), and physical (e.g., Car book “Fundamentals of Data Warehousing” [Baader et al., vs. Engine). 1999], Chapter 4 on Multidimensional Aggregation. A value in a single cell may represent an aggre- Relational database tables contain records (or rows). gated measure computed from more specific data at some Each record consists of fields (or columns). In a normal re- lower level of the same dimensions. Aggregation in- lational database, a number of fields in each record (keys) volves computing aggregation functions—according to may uniquely identify each record. In contrast, a multidi- the attribute hierarchy within dimensions or to cross-

mensional database contains n-dimensional arrays (some- dimensional formulas—for one or more dimensions. For times called hypercubes or cubes), where each dimension example, the value 13,000 for the sales in January, may has an associated hierarchy of levels of consolidated data. have been consolidated as the sum of the disaggregated val-

E. Franconi, U. Sattler 13-2

Calls (av. duration) Date (Day)

   1/1/99 2/1/99 3/1/99 4/1/99 5/1/99 

Cell E

Land Line 1 Direct Data

Source (Point Type) PABX

Calls (av. duration) Date (Week Day)

Mon Tue Wed Thu Fri Sat Sun E

Consumer 2 Source (Customer Type) Business

Figure 3: The cubes reporting the average duration of calls by dates in days and sources in point types, and by dates at the level of week days and sources at the level of customer types. ues of the weekly (or day-by-day) sales. Another example model for both the conceptual and the logical levels; our introducing an aggregation grounded on a different dimen- proposal is compatible with the DWCDM presented in sion is the cost of a product—e.g., a car—as being the sum [Calvanese et al., 1998c]. of the costs of all of its components. The paper is organised as follows. Section 2 infor- In order to provide an adequate conceptualisation of mally introduces an extended ER formalism which allows multidimensional information, a DWCDM should provide for the description of the explicit structure of multidimen- the possibility of explicitly modelling the relevant aggre- sional aggregations; the section briefly describes the se- gations and dimensions. According to a conservative point mantics of the conceptual data model in terms of a logi- of view, a desirable DWCDM should extend some standard cal representation of multidimensional databases, as pro- modelling formalism (such as Entity-Relationship) to al- posed by [Cabibbo and Torlone, 1998]. Section 3 will low for the description of both aggregated entities of the propose a basic modelling language—based on Description domain—together with their properties and their relation- Logics—which is expressive enough to capture the Entity- ships with other relevant entities—and the dimensions in- Relationship Data Model. The core part of the paper (Sec- volved. This document is about a proposal for a Data tion 4) shows how it is possible to translate a schema ex- Warehouse Conceptual Data Model based on the Entity- pressed in the extended ER with aggregations in a suitable Relationship model where aggregations and dimensions are Description Logics theory, allowing for reasoning services first class citizens. The data model it is based on De- such as satisfiability of a schema or the computation of a scription Logics (DL), which have been proved useful for logically implied statement, such as an implicit taxonomic a logical reconstruction of the most popular conceptual link between entities. data modelling formalisms, including the (enhanced) ER model. Advantages of using Description Logics are their 2 Modelling the Structure of Aggregation high expressivity combined with desirable computational properties—such as decidability, soundness and complete- We introduce in this section an extension of the Entity- ness of deduction procedures. The devised logic has a de- Relationship Conceptual Data Model for representing the cidable reasoning problem, thus allowing for automated structure of aggregations. Thus, a conceptual schema will reasoning over the whole conceptual representation. The be able to describe abstract properties of multidimensional presented framework extends the ideas pursued in [Cal- cubes, their interrelationships, and, most notably, their vanese et al., 1998b] regarding conceptual modelling using components. A Data Warehouse Conceptual Schema may Description Logics as a data model, and the Information In- contain detailed descriptions of the structure of aggregates, tegration framework presented in [Calvanese et al., 1998a; but it may not explicitly include aggregation functions. 1998c] based on an extended Description Logics data Aggregations are first class citizens of the representation

E. Franconi, U. Sattler 13-3 duration Dest 1,1 code 1,1 Day Date Call Point type 1,1 X Source X

Mon Tue Wed Thu Fri Sat Sun Consumer Business

XX

Cell Land Line Direct Line PABX Figure 4: The Conceptual Data Warehouse Schema for the base data considered in Figure 3.

language: it is possible to describe the components of ag- in nested concepts, and (2) concrete domains like the inte- gregations, and the relationships that the properties of the gers, the non-negative integers, the rationals, and the reals. components may have with the properties of the aggrega- tion itself; it is possible to build aggregations out of other 2.1 An extended Entity-Relationship Model aggregations, i.e., it is possible for an aggregation to be explicitly composed by other aggregations. This approach As stated in [Agrawal et al., 1995], a “good” data ware- closely resembles the one pursued by [Catarci et al., 1995; house system should support user-definable multiple hier- De Giacomo and Naggar, 1996], in the sense of proposing a archies along arbitrary dimensions. In Section 1.1 we have conceptual data model in which aggregations are first-class briefly defined a dimension as an index for identifyingmea- entities intensionally described by means of their compo- sures within a multidimensional data model. In the concep- nents. tual data model, “dimension” is a synonym for a domain of an attribute (or of attributes) that is structured by a hierar- As we have pointed out, the description of an aggre- chy and/or an order. In order to support multiple hierar- gation is not going to include a specification of how val- chies, the data model must provide means for defining and ues of its attributes are computed from attribute values of structuring these hierarchies, and for arbitrary aggregation its components using aggregation functions such as min, along the hierarchies. average, or sum. While including such constructs in the A conceptual data model where both multidimensional conceptual model is obviously important, if we restrict our aggregations and multiply hierarchically organised dimen- attention to data models which are computable (in a gen- sions can be abstracted and described can be used in query eral sense), then we should be very conservative. The rea- languages and for semantic optimization in multidimen- son for this comes from an important result of the research sional data bases. In fact, in the few attempts where a cube within the DWQ project which identifies the borders for the algebra introduces the notion of multiple dimensions and possible extensions of a Data Warehouse Conceptual Data of levels within dimensions (e.g., [Cabibbo and Torlone, Model towards the explicit inclusion of aggregation func- 1997; Vassiliadis, 1998]) the Data Warehouse Conceptual tions [Baader and Sattler, 1998]. It has turned out that the Schema can serve as a reference meta-model for deriving explicit presence of aggregation functions, when viewed as the inter-relations among levels and dimensions. a means to define new attribute values for aggregated enti- Let us now consider a concrete example related to the ties, and built-in predicates in a concrete domain increases analysis of the average duration of telephone calls accord- the expressive power of the basic conceptual model in such ing to their dates and source types. The base data involved a way that all interesting inference problems may easily in the analysis is represented at the conceptual level in Fig- become undecidable. Moreover, this result is very tightly ure 4. In order to perform the analysis, the two tables of bounded: extending a very weak Conceptual Data Model Figure 3 are materialised by the OLAP tool. Each cell allowing only basic constructs with a weak form of aggre- in the bi-dimensional cube on top denotes the aggregation gation already leads to the undecidability of reasoning – composed by all the telephone calls issued at some date i.e., no terminating procedure solving the reasoning prob- (expressed as a day of the year) and originated by a partic- lem may ever exist. On the other hand, recent research ular source (expressed as the type of the calling telephone);

has shown that appropriate restrictions of the allowed ag- the date and the source are the dimensions of the cube, E

gregation functions yield decidability of these problems. while the calls are the target. In particular, cell 1 is the ag- These results concern (1) the use of aggregation functions gregation composed by all those calls issued on 3/1/99 and

E. Franconi, U. Sattler 13-4 duration Dest 1,1 code 1,1 Day Date Call Point type 1,1 Source X

Consumer Business

XX

Cell Land Line Direct Line PABX

average(duration)

Ag-1 Point Type

Figure 5: The Conceptual Data Warehouse Schema for the upper cube considered in Figure 3. E

originating from a land line phone point. It is clear that 1 points)—more conceptual entities come into play. Figure 6 may include more than one call, and it may itself have some presents the extensions required to the original schema.

properties which depend on all of its components. For ex- E

Cell 2 is the aggregation composed by all calls issued

E E

ample, 1 may have the property average(duration) on Friday from a consumer type phone. Similar to 1 ,

which denotes the average duration of all the calls issued E 2 may have the property average(duration) which on 3/1/99 and originating from a land line phone point. Of computes the average duration of all those calls. course, this property may be computed by an appropriate Thus, we need to add both a new aggregated entity and aggregation function from the property duration of the the definitions of the newly introduced levels for the di- components. mensions date and source. The new aggregated en- An adequate basic conceptual schema for this simple tity, Ag-2, aggregates calls according to the level Week multidimensional information base should include the base day and the level Customer Type of the dimensions

entities such as Call, Day,andPhone Point and rela- E

date and source respectively. Then 2 is one of the tions such as date and source. Moreover, the schema aggregations denoted by Ag-2.ThelevelWeek Day is should also include an additional aggregated entity,say obtained by aggregating days from the partitioning of the Ag-1, namely the class denoting the aggregations of calls Day entity into seven sub-entities, namely the seven days by date and source; such an aggregated entity can also of the week. The level Customer Type is obtained have attributes such as average(duration). We can by aggregating phone points from the partitioning of the also say that Ag-1 aggregates telephone calls according Point entity into the two sub-entities Consumer and to the (basic) level Day and the level Point Type of Business. Customer Type is called simple aggrega- the dimensions date and source, respectively. The en- tion, since there is no dimension involved in its definitions. tity Point Type is itself an aggregation, aggregating all Customer Type and Week Day are levels in the mul-

the specific telephone points according to their four basic tiply hierarchically organised source and date dimensions. E types. It is clear that 1 is one of the aggregations denoted We do not formally define in this paper the syntax of the by Ag-1. extended ER model. Figure 5 presents the schema in a variant of the Entity- Relationship data model. The particular way of represent- 2.2 Semantics of the extended ER Model ing aggregated entities in the figure is inspired by [Catarci et al., 1995; De Giacomo and Naggar, 1996]. The semantics of an ER schema is given in terms of le- If we also consider as part of the multidimensionalinfor- gal multidimensional database states, i.e. multidimensional mation base the aggregated view represented by the second databases which conform to the constraints imposed by the cube of Figure 3—denoting the aggregation composed by schema. We consider as a starting point the ER semantics the telephone calls issued at some day of the week and orig- introduced in [Calvanese et al., 1998b], recasted to cope inated from some source type of a different level as before with multidimensional information. For we have chosen

(aggregated now according to consumer and business type the multidimensional logical data model MD introduced

E. Franconi, U. Sattler 13-5 duration Dest 1,1 code 1,1 Day Date Call Point type 1,1 X Source X

Mon Tue Wed Thu Fri Sat Sun Consumer Business

average(duration)

Week Day Ag-2 Customer Type

Figure 6: The Conceptual Data Warehouse Schema for the lower cube of Figure 3.

by [Cabibbo and Torlone, 1998]. MD is independent of Smolka, 1991] whose extensions have been summarised in any specific implementation of multidimensional databases [Donini et al., 1996; Calvanese et al., 1999]. (ROLAP or proprietary MOLAP), thus providing an ab- The basic types of a concept language are concepts, stract and general framework for the logical representation roles,andfeatures. A concept is a description gathering of multidimensional data. In [Cabibbo and Torlone, 1998] the common properties among a collection of individuals;

it is shown how a MD can be translated from a logical point of view it is a unary predicate. Inter- into a ROLAP logical representation in the form of a “star” relationships between these individuals are represented ei- schema, and into a general MOLAP logical representation ther by means of roles (which are interpreted as binary rela- in the form of sparse multidimensional arrays. tions) or by means of features (which are interpreted as par-

MD abstracts notions such as dimension hierarchies tial functions). Both roles and features can be used to indi- and levels, fact tables, cubes, and measures. As expected, viduals to certain properties. In the following, we will con-

dimensions are organised into hierarchies of levels, cor- sider the Description Logic ALC F I [Horrocks and Sattler,

responding to the various granularity of the basic data. 1999], extending ALC with features (i.e., functional roles), Within a dimension, levels are related through roll-up func- inverse roles, role composition, and role restrictions.

tions. The central element of a MD schema is the f-table, According to the syntax rules of Figure 7, ALC F I con-

representing factual data. An f-table is the abstract logical D cepts (denoted by the letters C and ) are built out of con- representation of a multidimensional cube, and it is a func- cept names (denoted by the letter A), roles (denoted by the

tion associating symbolic coordinates (one per involved di- f; g letter R; S ), and features (denoted by the letters ); roles mension) to measures. According to the authors, a mul- are built out of role names (denoted by the letter P )and

tidimensional database state is thus an instance of a MD features are built out of feature names (denoted by the letter logical schema: it is the description of the specific f-tables p); it is worth noting that features are considered as special involved, in the form, for example, of tables describing the cases of roles. mapping from coordinates to measures. Let us now consider the formal semantics of the Thus, a particular ER diagram denotes a set of multidi-

ALC F I .Wedefinethemeaning of concepts as sets of mensional database states, i.e., the set of all possible mul- individuals—as for unary predicates—and the meaning of

tidimensional databases described as MD instances which roles as sets of pairs of individuals—as for binary predi- I

conform to the diagram itself – i.e., they are legal states. If a I

= ;   cates. Formally, an interpretation is a pair I

diagram is inconsistent, then no multidimensional database I I consisting of a set  of individuals (the domain of )and

may conform to it. I I

a function  (the interpretation function of ) mapping ev-

I I

C  R

ery concept C to a subset of , every role to a sub-

I I

3 The basic Modelling Language I

   f

set R of , and every feature to a partial function

I I I

  In this section we give a brief introduction to a basic De- f from to , such that the equations in Figure 8 are scription Logic, which will serve as the basic representa- satisfied.

tion language for our DWCDM proposal. With respect to A knowledge base, in this context, is a finite set  of ter- the formal apparatus, we will strictly follow the concept minological axioms; it can also be called a terminology or

language formalism introduced by [Schmidt-Schauß and TBox. For a concept name A, and (possibly complex) con-

E. Franconi, U. Sattler 13-6

I I

C; D ! A j A

= 

(concept name) >

I top

>j (top)

? = ;

?j bottom

I I

(bottom) I

:C  =  n C

C j not C 

: (complement)

I I I

C u D  = C \ D

u D j and C D :::

C (conjunction)

I I I

C t D  = C [ D

t D j or CD :::

C (disjunction)

I I I I

8R C j all RC

8R  = fi 2  j8j i; j  2 R  j 2 C g

. (univ. quantifier) . C .

I I I I

R C j some RC

9 . (exist. quantifier)

C  = fi 2  j9j i; j  2 R ^ j 2 C g 9R

 . .

f "j undefined f 

I I

(undefinedness) I

f " =  n dom f

: C in fC

f (selection)

I I I I

f : C  = fi 2 dom f j f i 2 C g

! P j P

R; S (role name)

1 I I I I

f j f

R  = fi; j  2    j j; i 2 R g

(feature) 

1

I I I I

j inverse R

R (inverse role)

Rj  = R \   C 

C

Rj j restrict RC

C

I I

(range restriction) I

R  S  = R  S

 S compose RS :::

R (role chain)

! p j p f; g (feature name)

Figure 8: The semantics of ALC F I .

 g compose f g ::: f (feature chain)

junctive queries over expressive semantic data models for

Figure 7: Syntax rules for the ALC F I Description Logic. information systems such as extended Entity Relationship

in the context of schema integration. We have chosen to :

limit the expressivity of the full DLR since we are looking

A = C cepts C; D, terminological axioms are of the form

for a language implementable with the current technology,

v C (concept definition), A (primitive concept definition),

but still capable to encode an interesting enhancement of

v D I C (general inclusion statement). An interpretation

the ER formalism. In particular, we have developed so-

v D C

satisfies C if and only if the interpretation of is I

I phisticated reasoning algorithms for it [Horrocks and Sat-

C D included in the interpretation of D , i.e., .Itis clear that the last kind of axiom is a generalisation of the tler, 1999] and experimented them using the current aca-

: demic implementations of expressive DLs, namely the sys-

= C first two: concept definitions of the type A —where tems FaCT [Horrocks, 1998] and iFaCT. It has been re- A is a concept name—can be reduced to the pair of ax-

cently demonstrated [Horrocks and Patel-Schneider, 1999]

A v C  C v A ioms  and . Another class of termino- that the logic we are considering here allows for the imple- logical axioms—pertaining to roles R; S —are of the form

mentation of sound and complete reasoning algorithms that

v S I R v S R . Again, an interpretation satisfies if and behave quite well both in realistic applications and system- only if the interpretation of R—which is now a set of pairs atic tests.

of individuals—is included in the interpretation of S , i.e.,

I I

S I

R . A non-empty interpretation is a model of a 

knowledge base  iff every terminological axiom of is 4 Encoding ER schemas with Aggregations  satisfied by I .If has a model, then it is satisfiable; thus, It is shown how a schema expressed in the conceptual data checking for KB satisfiability is deciding whether there is model informally introduced in the previous section can be at least one model for the knowledge base.  logically im-

expressed in an ALC F I knowledge base—whose models

 j= plies an axiom (written )if is satisfied by every

correspond with legal multidimensional database states of C model of . We say that a concept is subsumed by a

the ER diagram—allowing for reasoning services such as

  j= C v D

concept D in a knowledge base (written ) I

I satisfiability of a schema or the computation of a logically

D I  C if C for every model of . A concept is sat- implied statement.

isfiable, given a knowledge base , if there is at least one :

I In the following, we describe the translation between an

 C 6= ;  6j= C = ? model I of such that ,i.e. . Con-

cept subsumption can be reduced to concept satisfiability ER diagram and an ALC F I knowledge base.

D  C u:D  since C is subsumed by in if and only if is Definition 1 (Translation) unsatisfiable in .

An ER schema D is translated into a corresponding knowl-

edge base  where for each domain, entity, aggregation, or ALC F I was designed such that it is able to encode database schemas expressed in the most interesting Seman- relationship symbol a concept name is introduced, and for 1 tic Data Models and Object-Oriented Data Models [Hor- each attribute or ER-role symbol symbol a feature name rocks and Sattler, 1999; Calvanese et al., 1998b]. Recently, is introduced. The terminology  is defined to contain the strictly more expressive conceptual data models based on following axioms:

DLs have been considered, most notably the DLR concep- 1ER-roles are the names given to the arguments of relationships; we tual data modelling formalism. DLR was first introduced assume that a unique name is given within a relationship to each ER-role, by [Calvanese et al., 1998a] as a means for encoding con- representing a specific participation of an entity in the relationship.

E. Franconi, U. Sattler 13-7 E; F

 For each ISA link between two entities (resp. two solving reasoning problems in the ER model. The reason-

D 

relationships R; S )in , contains: ing problems we are mostly interested in are consistency of

v F R v S E (resp. ) a ER schema—which is mapped to a satisfiability problem

in the corresponding DL knowledge base—and logical im- E

 For each PARTITION of an entity into sub entities plication within a ER schema—which is mapped to a log-

F :::F D  n

1 in , contains: ical implication problem in the corresponding DL knowl-

E v F t :::t F n

1 edge base.

F v:F i 6= j j i for all

The proof is based by establishing the existence of two

F v E i

i for all

mappings from legal multidimensional database states of D



 D D

For each attribute A in with domain of an entity to models of and vice-versa. Informally speaking, the ex-

R  E (resp. of a relationship ), contains: istence of the mappings ensures that, whenever an aggrega-

tion is satisfiable in , then a non-empty mapping describ-

v A : D R v A : D E (resp. )

ing the corresponding f-table in D exists, and vice-versa.

R D n

 For each relationship in relating entities The same applies for level orderings and roll-up functions

R R

E :::E P  :::P

1 n

by means of the ER-roles , D

E E 1 n in . A more detailed sketch of the proof will be given in

contains: the full paper.

R R

R v P : E  u :::u P : E 

1 n

E E 1 n As a final remark, it should be noted that the high ex-

pressivity of DL constructs can capture an extended version

n =1  For each minimum cardinality constraint in an

R of the basic ER model, which includes not only taxonomic

D R ER-role P in relating a relationship with and

E relationships, but also arbitrary boolean constructs to repre-  entity E (total or mandatory participation), con-

tains: sent so called generalized hierarchies with disjoint unions;

1

R entity definitions by means of either necessary or sufficient

v9P  R E .

E conditions or both, and integrity constraints expressed by

D T :::T

 [ ] n For each aggregation Ag in with targets 1 , means of generalised axioms Calvanese et al., 1998b .

 contains:

9target > u8target T t ::: t T

v Let us now consider the example introduced in Sec- n Ag . . 1

tion 2. We start to (partially) formalise the schema of Fig-

 T n

For each aggregation Ag in D involving a target , ure 4, i.e., the base data. Please recall that every role name

D D

dimensions i (each one being a relationship in ) which appears in the translation of an ER schema in a De-

n L

and corresponding levels i (each one being either scription Logic knowledge base—with the exception of the

E D  i an entity i or a simple aggregation Ag in ), aggregation roles—is a functional role name.

contains:

v8target

v what : Call u when : Day Ag . DATE

D D

1 1 1

 j P 9P > u

D .

L T 1

SOURCE v what : Call u where : Point

1

D D

1 1

1 1

L 8P  j P  t t

D

v what : Call u where : Point . DEST

1

T 1 L

1

D D m

1

1 1 1

 j P 8P  u L

D .

1

1

L T

1

 u

Point v Consumer t Business

D D

n 1 n

> u 9P  j P

D

v Point u:Business

. Consumer

n

T L

n

D D

1 1

n n

8P  j P  t  t L

Business v Point u:Consumer

D .

n n

T L

n

D D

m 1

n n

n

L  8P  j P

D .

n

n

T L

n The partitioning of days into the seven day of the week is

j

L i = E

where i if the level is described by an

i translated in a similar way. E

entity i ; otherwise, if the level is described by a The aggregated entity Customer Type is the simple aggre-

j j

L = T

simple aggregation Ag i , we use its targets . i

i gation of telephone points into two categories:

Type v

Point-

9target >u

Extending the results of [Calvanese et al., 1994] to the  .

target Consumer t Business case of multidimensional databases, it can be proved that 8 . the translation is correct, in the sense that whenever a rea- soning problem has a specific solution in the ER model, The Week Day simple aggregation is obtained in a similar then the corresponding reasoning problem in the DL has a way. corresponding solution, and vice-versa. This is grounded The aggregated entity Ag-2 is defined as being an aggre-

on the fact that there is a precise correspondence between gation composed by those calls issued in some day of the  legal multidimensional databases of D and models of . week and originated by either a consumer telephone point Thus, it is possible to exploit DL reasoning procedures for or a business telephone point:

E. Franconi, U. Sattler 13-8 duration Dest 1,1 code 1,1 Day Date Call Point type 1,1 Source X

Consumer Business

XX

Mobile Call Source Cell Land Line Direct Line PABX

Figure 9: A Conceptual Data Warehouse Schema introducing the entity Mobile Call.

2 v 9target > u8target Call

Ag- . . structure of aggregated entities and multiply hierarchically

1

Ag 2 v8target 9what j where >u

- . SOURCE . organised dimensions. In order to support multiple hier-

1

8what j where Consumer t

SOURCE . archies, the data model provides means for defining and

1

8what j where Business u

SOURCE . structuring these hierarchies, and for arbitrary aggregation

1

>u 9what j when

DATE . along the hierarchies. Our future work will be devoted to a

1

8what j when Mon t

DATE . further development of the data model in order to explicitly

 t consider temporal and spatial dimensions, and a study of

1

8what j when Sun

DATE . the expressivity in relation with decidability and complex- ity of the refinement reasoning task. Recall that Ag-2 is the class of all aggregations such that . each one of them aggregates calls issued at the same day of the week and originated from the same telephone point. References Each aggregation of calls belonging to the class denoted [Agrawal et al., 1995] Agrawal, R.; Gupta, A.; and by Ag-2 includes either only consumer originated calls or Sarawagi, S. 1995. Modeling multidimensional only business originated calls. In a similar way, each aggre- databases. Technical report, IBM Almaden Research gation of Ag-2 includes either only calls issued on Mon- Center, San Jose, California. Proc. of ICDE’97. day, or only calls issued on Tuesday, etc. Thus, aggrega- tions denoted by Ag-2 may be of fourteen possible types: [Baader and Sattler, 1998] Baader, Franz and Sattler, Ul- Monday consumer, Monday business, Tuesday consumer, rike 1998. Description logics with concrete domains and Tuesday business, etc. aggregation. In Proceedings of the 13th European Con- As an example of reasoning, let us see a case with an ference on Artificial Intelligence (ECAI-98). 336–340. inconsistent aggregation. If we introduce the entity Mo- bile Call as in Figure 9, it turns out that the aggregated en- [Baader et al., 1999] Baader, Franz; Franconi, Enrico; and tity having Mobile Call as target (instead of its super entity Sattler, Ulrike 1999. MultidimensionalData Models and Call) and Business as level for the dimension Source is in- Aggregation. Springer-Verlag. chapter 4. Edited by M. consistent, i.e., the materialised cube is necessarily empty. Jarke, M. Lenzerini, Y. Vassilious and P. Vassiliadis. In fact, the translated theory in Description Logics turns out [Cabibbo and Torlone, 1997] Cabibbo, Luca and Torlone, to be unsatisfiable, since mobile calls are originated only Riccardo 1997. Querying multidimensional databases. from cell points, which are disjoint from any kind of busi- In proc. Sixth Int. Workshop on Database Programming ness phone point. Languages (DBPL-97). 253–269. 5 Conclusions [Cabibbo and Torlone, 1998] Cabibbo, Luca and Torlone, Riccardo 1998. A logical approach to multidimensional We have introduced a Data Warehouse Conceptual Data databases. In Proc. of EDBT’98. Model, extending the most interesting traditional Semantic Data Models and Object-Oriented Data Models, which al- [Calvanese et al., 1994] Calvanese, Diego; Lenzerini, lows the representation of a multidimensional conceptual Maurizio; and Nardi, Daniele 1994. A unified frame- view of data. We have seen how the proposed conceptual work for class-based representation formalisms. In data model is able to introduce complex descriptions of the Proc. of KR-94, Bonn D.

E. Franconi, U. Sattler 13-9 [Calvanese et al., 1998a] Calvanese, D.; De Giacomo, G.; [Horrocks and Sattler, 1999] Horrocks, Ian and Sattler, Lenzerini, M.; Nardi, D.; and Rosati, R. 1998a. De- Ulrike 1999. A description logic with transitive and in- scription logicframework for information integration. In verse roles and role hierarchies. Journal of Logic and Proceedings of the 6th International Conference on the Computation. To appear. Principles of Knowledge Representation and Reasoning [ ] (KR-98). Morgan Kaufmann. 2–13. Horrocks, 1998 Horrocks, I. 1998. Using an expressive description logic: FaCT or fiction? In Proc. of the 6 th [Calvanese et al., 1998b] Calvanese, D.; Lenzerini, M.; International Conference on Principles of Knowledge and Nardi, D. 1998b. Description logics for conceptual Representation and Reasoning, Trento, Italy. 636–647. . In Chomicki, Jan and Saake, G¨unter, ed- itors 1998b, Logics for Databases and Information Sys- [Nutt et al., 1998] Nutt, Werner; Sagiv, Yehoshua; and tems.Kluwer. Shurin, Sara 1998. Deciding equivalences among ag- gregate queries. In Proc. of PODS’98. 214–223. [Calvanese et al., 1998c] Calvanese, Diego; Giacomo, Giuseppe De; Lenzerini, Maurizio; Nardi, Daniele; and [Schmidt-Schauß and Smolka, 1991] Schmidt-Schauß, M. Rosati, Riccardo 1998c. Information integration: Con- and Smolka, G. 1991. Attributive concept descriptions ceptual modeling and reasoning support. In Proc. of with complements. Artificial Intelligence 48(1):1–26. the 6th Int. Conf. on Cooperative Information Systems [Vassiliadis, 1998] Vassiliadis, P. 1998. Modeling multi- (CoopIS’98). 280–291. dimensional databases, cubes and cube operations. In [Calvanese et al., 1999] Calvanese, Diego; De Giacomo, Proc. of the 10th SSDBM Conference, Capri, Italy. Giuseppe; Lenzerini, Maurizio; and Nardi, Daniele 1999. Reasoning in expressive description logics. In Robinson, Alan and Voronkov, Andrei, editors 1999, Handbook of Automated Reasoning. Elsevier Science Publishers, Amsterdam. To appear. [Catarci et al., 1995] Catarci, Tiziana; D’Angolini, Gio- vanna; and Lenzerini, Maurizio 1995. Conceptual lan- guage for statistical data modeling. Data & Knowledge Engineering (DKE) 17:93–125. [Cohen et al., 1999] Cohen, S.; Nutt, W.; and Serebrenik, A. 1999. Rewriting aggregate queries using views. In Proc. of PODS’99. To appear. [De Giacomo and Naggar, 1996] De Giacomo, G. and Naggar, P. 1996. Conceptual data model with structured objects for statistical databases. In Proceedings of the Eighth InternationalConference on StatisticalDatabase Management Systems (SSDBM’96). IEEE Computer So- ciety Press. 168–175. [Donini et al., 1996] Donini, F.; Lenzerini, M.; Nardi, D.; and Schaerf, A. 1996. Reasoning in description logics. In Brewka, G., editor 1996, Principles of Knowledge Representation and Reasoning. Studies in Logic, Lan- guage and Information, CLSI Publications. 193–238. [Franconi and Sattler, 1999] Franconi, Enrico and Sattler, Ulrike 1999. A data warehouse conceptual data model for multidimensional aggregation: a preliminary report. Journal of the Italian Association for Artificial Intelli- gence AI*IA Notizie 9–21. [Horrocks and Patel-Schneider, 1999] Horrocks, I. and Patel-Schneider, P. F. 1999. Optimising description logic subsumption. Journal of Logic and Computation. To appear.

E. Franconi, U. Sattler 13-10