YAM: A Multidimensional Conceptual Model

PhD Thesis

Alberto Abelló

Advisors: Dr. José Samos and Dr. Fèlix Saltor

Programa de Doctorat de Software

Departament de Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

To Enry,
for her unlimited support and love


Foreword

"I have no talent for making new friends, but oh such a genius for fidelity to old ones."
Peter Ibbetson

Let's say this work began several years ago, when the Spanish army gave me a whole year of vacations in the North of Africa. Leaving aside making good friends like Jorge, I had nothing to do but read books, play chess, and think about my future. Paraphrasing G. Polya, I thought: "I am not good enough for mathematics and I am too good for the army. Computer science is in between."

Thus, I arrived at the Facultat d'Informàtica de Barcelona. There, I was lucky to enjoy great classmates like Alex and Xavi, something essential to get a degree. We found some good lecturers, but I guess that the one who made me fall in love with design was Jaume Sistac.

Five years later, as I finished my undergraduate studies, I decided to try a doctorate. Fèlix Saltor gave me the opportunity of joining his research group, and the Generalitat de Catalunya gave me the grant FI which allowed me to write this thesis. I was also included in projects TIC, TICC and TICC from the Spanish Research Program PRONTIC. So, I became a member of the Departament de Llenguatges i Sistemes Informàtics, which resourced me (with special mention for the valuable work of the secretaries of the department), and I was kindly welcomed by the members of the Secció de Sistemes d'Informació.

Since then, I've been sharing an office with Elena for nearly four lovely years. From time to time, we got the visit of Marta, the other member of the research group, always supportive from Lleida. What to say about them? Just a pleasure to work together.

The time arrived to find an advisor, and I got two instead of only one. Fèlix taught me how to do quality research, and contributed his long experience. I should name a couple of important things I was not able to learn from him: to write a correct bibliography, and to drink good wine instead of coke or kalimotxo. The other advisor was José Samos. I should thank him for lots of things, like being an inexhaustible fountain of optimism, but above all for his almost infinite patience during our fruitful never-ending discussions.

The work with José was easier thanks to the Departamento de Lenguajes y Sistemas Informáticos of the Universidad de Granada, which offered me a place to work there. As a side effect, being there allowed me to meet wonderful people like Cecilia, Eladio, or Ventura, who made me feel at home during my numerous stays in Granada.

Arriving at the end, I also thank Antoni Olivé, Ernest Teniente, Juan Carlos Trujillo, Pedro Blesa, Mohand-Saïd Hacid, A Min Tjoa, and Panos Vassiliadis for revising this PhD thesis and accepting being part of the jury. I am also grateful to the anonymous reviewers of the thesis, and those anonymous referees of the different papers, who sent useful, constructive, and instructive comments.

Two more things before I finish these words. I should not forget friends here, because chat, beer, playing role games, and cycling are also important to write a thesis. And last but not least, a special acknowledgement for the women in my family: my mum, my dear aunt Angelines, and my grandmother. They brought me up.

Alberto


Contents

Introduction
General concepts
Motivation and objectives
Main contributions
Organization of the thesis
Second chapter: Multidimensional modeling and the OO paradigm
Third chapter: Multilevel schemas architecture
Fourth chapter: Elements of a multidimensional model
Fifth chapter: YAM (Yet Another Multidimensional Model)
Sixth chapter: Conclusions
Appendixes
Typographic conventions
Multidimensional modeling and the OO paradigm
Multidimensional modeling
An analysis framework
Other frameworks
A classification and description framework
Classification and description of existing multidimensional models
Research efforts at Conceptual level
Research efforts at Logical level
Research efforts at Physical level
Research efforts on Formalisms
Other work
Summary
How multidimensional analysis benefits from OO
Classification/Instantiation
Generalization/Specialization
Aggregation/Decomposition
Behavioural (Caller/Called)
Derivability
Dynamicity


Conclusions
Multilevel schemas architecture
Extending a schemas architecture for Data warehousing
An example
The parts
The whole
Operations on schemas
Drilling across semantically related Stars
Drill-across in the literature
Multi-star conceptual schemas
Inter-stellar semantic relationships
Discussion
Conclusions
Elements of a multidimensional model
Analysis dimensions
The importance of aggregation hierarchies
Semantic problems in present multidimensional modeling
How to solve them
Facts subject of analysis
Factual data in other models
Multidimensional elements unleashed
Conclusions
YAM (Yet Another Multidimensional Model)
YAM is not JAM (Just Another Multidimensional Model)
Structures
Nodes
Arcs
Inherent integrity constraints
Operations
Metaclasses
Comparison with other multidimensional models
Conclusions
Conclusions
Survey of results
Future work
Bibliography


A UML Profile for Multidimensional Modeling
A Introduction
A Summary of Profile
A Stereotypes and Notation
A MultidimensionalSchema
A Star
A Fact
A Dimension
A Cell
A SummarizedCell
A FundamentalCell
A Level
A Measure
A SummarizedMeasure
A FundamentalMeasure
A Descriptor
A Base
A Summarization
A Transitive
A NonTransitive
A SummaryParam
A CellRelation
A LevelRelation
A KindOfMeasure
A List
A Induction
A Well-Formedness Rules
A Star
A Fact
A Dimension
A Cell
A SummarizedCell
A FundamentalCell
A Level
A Measure
A FundamentalMeasure
A Descriptor
A Base
A Summarization
A SummaryParam
A CellRelation
A LevelRelation


A Induction
B Design examples with YAM
B Sales of products in a grocery chain
B Kimball's schema
B Golfarelli's version of Kimball's schema
B YAM schema
B Discussion
B Warehouse
B Original schema
B YAM schema
B Discussion
B Tickets in supermarkets
B Original schema
B YAM schema
B Discussion
B Clinical Data Warehousing
B Original schema
B YAM schema
B Discussion
B Vehicle repairs
B Original schema
B YAM schema
B Discussion
C List of publications
C Related to chapter
C Related to chapter
C Related to chapter
C Related to chapter
C Other publications
Glossary


List of Figures

Corporate Information Factory [IIS]
Multidimensional modeling
Modeling and implementation process in OLAP vs OLTP environments
Example of multidimensional schema at Upper detail level
Example of multidimensional schema at Intermediate and Lower detail levels
Database schemas at three levels
Example of Generalization/Specialization
Example of Aggregation/Decomposition
levels schemas architecture [ROSC]
Examples of Component Schemas of CDB and CDB
Example of Federated Schema
Data Warehousing schemas architecture from the Federated Schema
Example of External Multidimensional Schema
Integrated architecture for FIS and DW
Example of multidimensional schema
Multi-star diagram
ANSI/SPARC database schemas architecture [BFJ]
levels multidimensional schemas architecture
Integrated architecture for FIS, DW and multi-star schemas
Example of containment of Dimensions
Example of Generalization between Dimensions
Example of correlated Dimensions
Example of Aggregation between Dimensions
Example of Generalization between Facts
Example of Association Derivation between Fact and Dimension
Example of normalization of analysis dimensions
Types of wholes
Classical Extensional Mereology axioms
Example of analysis dimension
Example of overlapping wholes


Allowed cardinalities between Levels
Example of dimension specialization
Example of dimension aggregation
Measures grouped into cells corresponding to facts

P C b eing C fA B C D g

A A

S

P C beingC fA C g and C fB Dg

i S P

ifSP g

Example of Dimension
Graph of Cells in a Fact with two Dimensions
Specialization of a Fact based on a Cell
Specialization of a Fact by region
Diagram of a Cell with three independent analysis dimensions
Reduction of a dimensional Cube to a dimensional Cube
Example of Dimension
Graph of Cells in a Fact with two Dimensions
UML Relationships between model elements
Example of YAM schema at Upper detail level
Example of YAM schema at Intermediate detail level
Example of YAM schema at Lower detail level
Example of sharing of parts between several instances
Multidimensional operations as composition of functions
YAM metaclasses in UML notation as in [OMGb]
Extension of UML with YAM stereotypes
B Schema of the grocery chain case study [Kim]
B Schema of the grocery chain case study [GMRa]
B Upper level schema of the grocery chain case study modeled with YAM
B Intermediate level schema of the grocery chain case study modeled with YAM
B Lower level schema of the grocery chain case study modeled with YAM
B Schema of the warehouse snapshot case study [Kim]
B Schema of the warehouse delivery status case study [Kim]
B Schema of the warehouse transaction case study [Kim]
B Upper level schema of the warehouse case study modeled with YAM
B Intermediate level schema of the warehouse case study modeled with YAM
B Lower level schema of the warehouse case study modeled with YAM
B Schema of the tickets case study modeled with GOLD
B User requirements for the case study modeled with GOLD graphical notation
B Upper level schema of the tickets case study modeled with YAM
B Intermediate level schema of the tickets case study modeled with YAM
B Lower level schema of the tickets case study modeled with YAM
B Patient diagnosis case study [Ped]
B Schema of the clinical case study [Ped]




B Upper level schema of the clinical case study modeled with YAM
B Intermediate level schema of the clinical case study modeled with YAM
B Lower level schema of the clinical case study modeled with YAM
B Schema of the repairs case study [SBHD]
B Upper level schema of the repairs case study modeled with YAM
B Intermediate level schema of the repairs case study modeled with YAM
B Lower level schema of the repairs case study modeled with YAM



List of Tables

Schema constructs in the different models at Conceptual level
Schema constructs in the different models at Logical level
Schema constructs in the different models at Physical level
Schema constructs in the different Formalisms
Summary table of the different multidimensional models
Summary table of relationships between Facts and Dimensions
Summary table of the different elements in a multidimensional model
Relationships between elements at Upper detail level
Relationships between elements at Intermediate detail level
Relationships between elements at Lower detail level
YAM operations
Comparison between YAM and other multidimensional models


Introduction

Chapter

Introduction

"Where shall I begin, please your Majesty?" he asked.
"Begin at the beginning," the King said, very gravely, "and go on till you come to the end: then stop."
Lewis Carroll, Alice's Adventures in Wonderland

In this first chapter, general Data Warehousing and On-Line Analytical Processing (OLAP) concepts are defined. Afterwards, the motivation and objectives of this thesis are established. In the next section, its main contributions are briefly explained. The chapter finishes with the organization of the rest of the thesis, containing a summary of the other chapters.

General concepts

As it was defined by William Inmon in [Inm], a Data Warehouse (DW) is "a subject-oriented, integrated, non-volatile, and time variant collection of data in support of management's decisions". Other authors, like [Gar], prefer to talk about Data Warehousing and define it as a process, not a product, for assembling and managing data from various sources for the purpose of gaining a single, detailed view of part or all of a business. Whether collection of data or process, the point is that we are dealing with a huge amount of data aimed at analysis tasks, which presents challenges in its construction, management, and usage (see [Wid] and [WB] for two surveys of research issues in this field). [JLVV] contains a wide overview of the area, and [Vasb] compiles and classifies the papers published in three of the most significant database conferences (i.e. PODS, SIGMOD, and VLDB) from to related to the subject.

Figure shows the Corporate Information Factory (CIF) architecture presented in [IIS]. We can see that raw detailed data enters from the left side into the operational applications. These applications represent transactional systems that deal with day-by-day data. They could also get processed information from some external sources, if needed.

Figure: Corporate Information Factory [IIS]

All data in the operational applications is time stamped, transformed, cleansed, integrated, and finally deployed into either the DW or the Operational Data Store (ODS). An ODS is an architectural construct that is subject-oriented, integrated, volatile, current-valued, and contains only corporate detailed data, as defined in [IIB]. It is used to support the up-to-the-second collective tactical decision-making process for the enterprise, and can contain data not coming from the operational systems. The ODS can be used as an intermediate step for the load of the DW.

Based on the analysis requirements of a department or set of users, Data Marts (DM) are built. As defined in [IIS], a DM contains customized, summarized data from the DW, tailored to support the specific analytical requirements of a given business unit.

The interactive querying of the DMs is known as On-Line Analytical Processing (OLAP). OLAP products, specially conceived for departmental analysis, were presented for the first time in [CCS], where we can also find twelve evaluation rules for them. The first one of Codd's evaluation rules expresses the main characteristic of OLAP, namely multidimensionality. This characteristic is also outlined in [Pen], which defines OLAP tools as FASMI (Fast Analysis of Shared Multidimensional Information). The OLAP Council in [OLA] gives the following definition:

"OLAP is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP functionality is characterized by dynamic multidimensional analysis of consolidated enterprise data supporting end user analytical and navigational activities."

OLAP tools represent data as if these were placed in an n-dimensional space, allowing their study in terms of facts (subject of analysis) and dimensions showing the different points of view according to which the subject can be analyzed. This conception gives rise to data schemas with star shape, i.e. a subject of analysis in the middle and its analysis dimensions around it.

OLAP concepts are not completely new. As it was shown in [Sho], most of them were already used in statistical databases. Nevertheless, in the last years, the area has got the attention of the industry as well as the research community, giving rise to important advances. [DSHB] surveys the OLAP market, while [CD] gives an overview of DW and OLAP all together.

Motivation and objectives

In the last years, a lot of work has been devoted to multidimensional modeling, and several models have been proposed. However, there is neither a well accepted model nor a standard terminology yet.

Some of the existing models formalize multidimensional concepts in one way or another, and present calculus and/or algebras to operate on n-dimensional data cubes. Other models show how multidimensional data could be stored in either Relational, OO, or pure Multidimensional DBMSs. Thus, out of all this work already done, few authors paid special attention to conceptual multidimensional modeling. What is more, only a couple of them studied the applicability of the OO paradigm to this field. [TPGS] focuses on software engineering for OLAP tools rather than true data modeling, while [BTW] actually only uses OO concepts, namely the Unified Modeling Language (UML), in the definition of metaclasses.

Multidimensional concepts and relationships are really useful for analysis tasks. Nevertheless, this should not imply that other data modeling concepts should be ignored. In the last years, the OO paradigm proved to be close to the human way of thinking, an essential characteristic for a conceptual data model. Therefore, an objective of this thesis is to study the applicability of OO concepts to multidimensional conceptual modeling.

Almost all existing multidimensional models are limited to modeling isolated stars (i.e. isolated subjects of analysis). At most, some of them allow sharing analysis dimensions between different star schemas. However, it is easy to find semantic relationships (like Generalization, Association, etc.) that relate concepts in two such schemas. Being able to use data used in, or obtained from, the analysis of one subject during the analysis of another one would be a powerful tool in analysis tasks. An architecture based on different levels of schemas needs to be studied to allow that. The Data Warehouse Architecture presented in [KRRT] is semantically too poor. It just allows sharing analysis dimensions.

Moreover, besides the lack of semantic relationships, there is no agreement on the definition and properties of multidimensional concepts. All models merely impose the properties and structure of aggregation hierarchies in the analysis dimensions, and nobody has discussed or proved them.

Another important issue in multidimensional modeling is that of aggregability or summarizability. The data schemas should show how data of a given granularity can give rise to data of coarser granularity. [LS] restricts summarizability problems to aggregating along the temporal dimension. However, other analysis dimensions can also be problematic, and it is important to reflect this kind of problem in the schema, in order to warn analysts.
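As a hedged illustration of a summarizability problem outside the temporal dimension, the following minimal Python sketch rolls product sales up to categories. The data, the product-to-category hierarchy, and the function name `roll_up` are hypothetical and are not taken from any of the cited models. Because one product belongs to two categories, adding up the category totals no longer reproduces the grand total.

```python
from collections import defaultdict

# Hypothetical sales per product (the finest granularity in this sketch).
sales_by_product = {"milk": 10, "soy_milk": 4, "bread": 6}

# Hypothetical hierarchy from products to categories.
# Assumption: soy_milk belongs to two categories, so the hierarchy is not strict.
product_to_categories = {
    "milk": ["dairy"],
    "soy_milk": ["dairy", "vegan"],
    "bread": ["bakery"],
}


def roll_up(measures, parent_map):
    """Aggregate each member's measure into every one of its parents by summation."""
    totals = defaultdict(int)
    for member, value in measures.items():
        for parent in parent_map[member]:
            totals[parent] += value
    return dict(totals)


sales_by_category = roll_up(sales_by_product, product_to_categories)

# The total computed from the finest granularity ...
true_total = sum(sales_by_product.values())
# ... differs from the one computed from the coarser granularity,
# because soy_milk is counted once per category it belongs to.
total_from_categories = sum(sales_by_category.values())

print(sales_by_category)
print(true_total, total_from_categories)
```

In this sketch the category totals add up to 24 while the real total is 20. A schema that records that the hierarchy is not strict would let a tool warn the analyst before such a figure is reported.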

Main contributions

[SSSB] proposed to consider the DW as part of the database architecture. It studied different Data Warehousing architectures, and presented an integrated database architecture of schemas for Federated Information Systems (FIS) and Data Warehousing. It is well known (see [SCG]) that OO data models are well suited to be used as canonical models for FIS. Therefore, from the inclusion of Data Warehousing schemas in the FIS, it follows that OO models also have positive characteristics for Data Warehousing.

Based on that previous work, the main contributions of this thesis are:

The work in [SSSB] has been extended by studying an architecture based on different levels of schemas that facilitates the construction of the different CIF components. The characteristics of every level of schemas have been stated.

It is quite common in analysis tasks that information used in or obtained from the study of a given subject is valuable for the analysis of another subject. However, existing models do not pay enough attention to this, and only allow representing isolated star schemas. This thesis illustrates and exemplifies the usage of multi-star schemas. A variation of the three-levels ANSI/SPARC architecture is presented to facilitate it.

In the last years, several multidimensional models appeared. Each of those models uses a different nomenclature, and was conceived for a different purpose, so that their comparison becomes really difficult. A framework for their classification and comparison has been defined. Multidimensional models are classified into Conceptual, Logical, Physical, and Formalism. Moreover, their elements are characterized within three detail levels, namely Upper, Intermediate, and Lower.

The importance of aggregation hierarchies is recognized by almost all authors. Thus, most multidimensional models provide mechanisms to define them. Nevertheless, none of the authors proved or justified the characteristics of those hierarchies. In this thesis, from the assumption that those hierarchies are defined by part-whole relationships, mereology axioms have been used to demonstrate some of their properties.

Based on the structure of aggregation hierarchies and data dependencies, the structure of the facts subject of analysis has also been studied.

UML is becoming a standard language for conceptual modeling. Thus, its metaclasses have been extended in this thesis to encompass multidimensional concepts. This has given rise to YAM (Yet Another Multidimensional Model).

The usage of different OO relationships (i.e. in UML terminology, Generalization, Association, Aggregation, Derivation, and Flow) has been studied for multidimensional schemas.

A closed and complete algebra of operations on data cubes has been defined for YAM.

Integrity constraints have been defined for YAM. They focus on identification and aggregability of data. YAM provides a flexible set of mechanisms to show summarizability of the different kinds of user measures.

Organization of the thesis

This thesis has been organized into six chapters (including this one) and three appendixes. Chapters two to five contain the contributions of the thesis. A brief overview of each chapter and of the appendixes is given below.

Second chapter: Multidimensional modeling and the OO paradigm

This chapter begins by explaining some basic multidimensional concepts that will be needed to understand the rest of the thesis. The duality fact-dimension is introduced, besides the notion of data cube and the well known multidimensional operations over data cubes. Then, an original analysis framework for the classification and comparison of multidimensional models is introduced, so that related work can be clearly presented and compared. Most existing multidimensional data models are described here with regard to the analysis framework.

The last part of the chapter introduces the notion of OO dimension as explained in [Sal], in order to be used as a basis for the presentation of some basic ideas of the thesis. The usage of the Generalization/Specialization, Aggregation/Decomposition, Instantiation/Classification, Derivability (or Point of view), Dynamicity, and Behavioural OO dimensions in multidimensional modeling is briefly explained by examples.

Third chapter: Multilevel schemas architecture

Out of the four characteristics of a DW defined by W. Inmon, we can see that one of them (i.e. integrated) is also present in a FIS. This chapter presents how the levels schemas architecture for FIS of [SL], extended to levels in [ROSC], has been modified to include Data Warehousing schemas, by continuing the work done in [SSSB]. Data Warehouse Schemas, Operational Data Store Schemas, and Data Mart Schemas are placed in the architecture. The characteristics of these new schemas are analyzed. The design of the DW is presented as data-driven, versus the query-driven design of the DMs (star shape schemas).

Most multidimensional models are restricted to isolated stars. However, a semantically rich set of abstractions in the data model (like those in YAM) and an appropriate architecture of schemas can facilitate the modeling of related stars. The second part of the chapter pays attention to this issue, focusing on the Data Mart Schemas in the architecture. Useful semantic relationships that can be used to relate different stars are shown, and their usability to drill across different data cubes is discussed. Specifically, Generalization, Association, Derivation, and Flow relationships (in UML sense) are studied. Moreover, three schema levels (based on the ANSI/SPARC architecture) are defined to facilitate the management of these more complex schemas.

Fourth chapter: Elements of a multidimensional model

Multidimensionality is marked by the duality fact-dimension. That is, factual and dimensional data drive the modeling, implementation, and usage of OLAP tools. In this chapter, these kinds of data are analyzed separately.

Firstly, Dimensions are studied. In the literature, we can find different definitions and conceptions of relationships between aggregation levels. This section contends that they are part-whole relationships. Thus, mereology axioms can be used in the study of Dimensions. From a simple definition and those axioms, some properties of Dimensions are proved, addressing several problems or controversial points detected in existing multidimensional models. Moreover, the consequences of Generalization and Aggregation relationships between Dimensions are also studied.

The second half of the chapter studies Facts. Their components are defined, and their structure is analyzed with regard to that of the Dimensions and to data dependencies. A Cube is defined as a function from the cartesian product of Levels in orthogonal Dimensions to the domain of a Fact. A new operation (i.e. ChangeBase) is presented to allow the modification of the n-dimensional space where data cells are placed. The possibility of having Generalization, Association, and Derivation relationships between Facts is studied.



Fifth chapter: YAM (Yet Another Multidimensional Model)

This chapter presents a multidimensional conceptual OO model, its structures, integrity constraints, and query operations. It has been developed as an extension of UML core metaclasses to facilitate its usage, as well as to avoid the introduction of already existing general concepts. YAM allows the representation of several semantically related stars, as well as summarizability and identification constraints.

The first section outlines the main differences between this and other models (i.e. usability, Semantic Power, Semantic Relativism, and the possibility of expressing summarizability and identification constraints). Then, data structures of the model are defined and exemplified in terms of nodes and arcs of a graph. The applicability of all UML relationships is systematically studied. In the next section, the inherent integrity constraints of the model are presented. Another section is devoted to multidimensional operations over Cubes. Finally, to summarize the model, its metaclasses are presented. Each YAM metaclass has been defined as a subclass of a UML metaclass. The chapter concludes with a comparison of multidimensional data models.

Sixth chapter: Conclusions

The last chapter of the thesis contains some conclusions and future work.

Appendixes

There are three appendixes to this thesis. The first one contains the formal extension of UML (i.e. a Profile), with all YAM modeling elements as Stereotypes and the corresponding integrity constraints in OCL. Another appendix shows several multidimensional design examples of the usage of YAM. Some schemas in other models are translated to YAM, and some original design cases are also presented. Finally, a list of papers published as the result of this thesis work (sorted by chapter) is included.

Typographic conventions

Several typographic conventions have been adopted to improve the readability of the document:

Bold Face is used for terms defined along this thesis. For instance, the term Dimension, in spite of being used by other authors, has been carefully studied and defined in pages and so that it is used in exactly that sense.

Quotation indicates terms defined by other authors. If the term is considered well known, it is only quoted the first time it appears.

Times Font marks words and concepts in the figures or examples.

Italics is used for UML terms.

Moreover, UML notation as defined in [OMGb] has been used in the figures.

Multidimensional modeling and the OO paradigm

Chapter

Multidimensional modeling and the OO paradigm

"To be is to be related."
C.J. Keyser

The words On-Line Analytical Processing bring together a set of tools that use multidimensional modeling in the management of information to improve the decision making process. Lately, a lot of work has been devoted to modeling the multidimensional space. Thus, the next sections relate this thesis to other work.

Firstly, section introduces the main multidimensional concepts, like analysis dimension, facts, star, etc. Then, section presents an original framework to classify and describe multidimensional models. They are divided based on the design phase for which they seem more appropriate (i.e. Conceptual, Logical, and Physical), or, if not used in designing, classified as Formalism. Moreover, this section also explains how the elements of each model can be placed at three different detail levels (i.e. Upper, Intermediate, and Lower), so that they can be easily compared. These detail levels refer to the containment of multidimensional elements into one another (for instance, an analysis dimension is composed of different aggregation levels).

Section corresponds to the state of the art of the thesis. Existing multidimensional models are classified and described there with regard to the above-mentioned framework.

Finally, section outlines the advantages of using an OO model in multidimensional design. It is argued that multidimensional modeling is lacking in semantics, which can be obtained by using the OO paradigm. Some benefits that could be obtained by doing this are classified in six OO-Dimensions (i.e. Classification/Instantiation, Generalization/Specialization, Aggregation/Decomposition, Behavioural, Derivability, and Dynamicity), and exemplified with specific cases.

Multidimensional modeling

Along its years of existence, SQL proved to be really useful and well accepted in On-Line Transactional Processing environments. However, as time went by, due to the widespread use of computers, databases arrived at analysis tasks in the form of Data Warehousing systems. In this kind of environment, because of the huge amount of data, the complexity of queries, and the unskillfulness of users, SQL has proved not to be the best solution.

To bring data near analysts, Data Marts (DMs) appeared. They are small Data Warehouses devoted to satisfying the needs of a reduced set of users. They are customized to obtain good query performance, most of the time by means of a query-driven design. Closely related to DMs are On-Line Analytical Processing (OLAP) tools. By means of multidimensionality, this kind of tool allows non-expert users to formulate their own queries, and obtain the results interactively, without the assistance of the IT department.

Figure: Multidimensional modeling. (a) Star shape schema: Sales surrounded by Product, Time, and Place. (b) Cube metaphor: axes Product, Place, and Time.

Multidimensionality is based on the duality fact-dimensions, i.e. facts are analyzed with regard to data in the dimensions. A fact represents a subject of analysis, while its analysis dimensions show the different points of view we can use to study it. This gives rise to schemas with star shape, like the one depicted in figure a, having the abstraction representing the facts in the middle, and the analysis dimensions around it. The fact in a multidimensional schema represents the set of measurements to be analyzed (mainly numeric attributes). On the other hand, the analysis dimensions mainly contain descriptive attributes that describe the points in the space.

Frequently, the Data Cube metaphor depicted in figure b is used to explain multidimensionality. Each cell in the cube represents a unit of data (for instance, in the example above, Sales) as the intersection of a Product, Place, and Time. By defining a single position in every dimension of the analysis space, we select exactly one of those cells. In general, since we could have several (more than three) analysis dimensions, the Cube should actually be called Hypercube (from here on, the term Cube will be used loosely in this sense).
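A minimal Python sketch of the Cube metaphor follows, assuming a hypothetical Sales cube with Product, Place, and Time as its analysis dimensions. The data values and the helper name `select_cells` are illustrative assumptions, not part of any model discussed here. Fixing one position in every dimension yields exactly one cell, while fixing fewer positions yields a slice of the cube.

```python
# Analysis dimensions of the hypothetical Sales fact, in coordinate order.
DIMENSIONS = ("product", "place", "time")

# Each cell is addressed by one position per dimension; the value is the measure.
sales_cube = {
    ("milk", "Barcelona", "2001-01"): 120,
    ("milk", "Granada", "2001-01"): 80,
    ("bread", "Barcelona", "2001-01"): 45,
    ("bread", "Barcelona", "2001-02"): 50,
}


def select_cells(cube, **fixed):
    """Return the cells whose coordinates match every fixed position."""
    unknown = set(fixed) - set(DIMENSIONS)
    if unknown:
        raise KeyError(f"unknown dimensions: {sorted(unknown)}")
    # Map each fixed dimension name to its index in the coordinate tuple.
    wanted = {DIMENSIONS.index(name): pos for name, pos in fixed.items()}
    return {
        coords: value
        for coords, value in cube.items()
        if all(coords[i] == pos for i, pos in wanted.items())
    }


# One position in every dimension selects exactly one cell.
single = select_cells(sales_cube, product="milk", place="Granada", time="2001-01")

# Fixing only some of the dimensions selects a slice with several cells.
bread_slice = select_cells(sales_cube, product="bread")

print(single)
print(bread_slice)
```

With more than three entries in `DIMENSIONS` the same code addresses the cells of a hypercube, since nothing in it depends on the number of dimensions.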

The benefits of multidimensional modeling are twofold. On the one hand, it makes data schemas more understandable to final users; on the other hand, it allows the use of specific storage and access techniques that improve query performance. These benefits are obtained by simplifying the data schemas so that they only contain the essential things, i.e. a fact to be analyzed and its analysis dimensions. These schemas are close to the analysts' conception of data, and suggest a specific kind of query, so that the system can be easily customized to solve them with good response times.

Specific operations have also been defined in the multidimensional world. However, there is no agreement on a standard set of such operations. Often, the process of navigating through multidimensional data is called Slice and Dice. Just to cite some here, these are the navigation operations defined in [OLA]:

Consolidate/Aggregate/Roll-up: Multidimensional databases generally have hierarchies or formula-based relationships of data within each dimension. Consolidation involves computing all of these data relationships for one or more dimensions. While such relationships are normally summations, any type of computational relationship or formula might be defined.

Drill-down: It is a specific analytical technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down). The drilling paths may be defined by the hierarchies within dimensions or other relationships that may be dynamic within or between dimensions.

Rotate/Pivot: This operation changes the dimensional orientation of a report or page display. For example, rotating may consist of swapping the rows and columns, or moving one of the row dimensions into the column dimension, or swapping an off-spreadsheet dimension with one of the dimensions in the page display (either to become one of the new rows or columns), etc.

Selection: A selection is a process whereby a criterion is evaluated against the data or members of a dimension in order to restrict the set of data retrieved.

Another generic definition of operations over data cubes can be seen in [Gio]: Slice (which reduces the dimensionality of a cube), Dice (which selects a set of data), Roll-up (which aggregates data along the hierarchy in an analysis dimension), Drill-down (which gives more detail in a dimension by descending along its aggregation hierarchy), and Drill-across (which travels from one data cube to another). SQL syntax was also extended to support some multidimensional operations, as can be seen in [ISO].
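As a rough illustration of Slice, Dice, and Roll-up, the following sketch applies them to a small in-memory cube. It is not taken from any of the cited proposals: the dictionary representation, the sample data, the city-to-region hierarchy, and the function names are all assumptions made for the example, and summation is assumed as the aggregation function.

```python
from collections import defaultdict

DIMENSIONS = ("product", "place", "time")

# Hypothetical aggregation hierarchy in the Place dimension: city -> region.
CITY_TO_REGION = {
    "Barcelona": "Catalunya",
    "Lleida": "Catalunya",
    "Granada": "Andalucia",
}

cube = {
    ("pen", "Barcelona", "2001-01"): 120,
    ("pen", "Lleida", "2001-01"): 45,
    ("pen", "Granada", "2001-01"): 10,
    ("ink", "Barcelona", "2001-02"): 30,
}


def dice(cube, predicate):
    """Dice: keep only the cells whose coordinates satisfy the predicate."""
    return {k: v for k, v in cube.items()
            if predicate(dict(zip(DIMENSIONS, k)))}


def slice_(cube, dimension, value):
    """Slice: fix one dimension to a value and drop it from the coordinates."""
    i = DIMENSIONS.index(dimension)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def roll_up(cube, dimension, parent_of):
    """Roll-up: replace the members of a dimension by their parents and sum."""
    i = DIMENSIONS.index(dimension)
    result = defaultdict(int)
    for k, v in cube.items():
        result[k[:i] + (parent_of[k[i]],) + k[i + 1:]] += v
    return dict(result)
```

Drill-down is the inverse navigation of `roll_up`; in this representation it simply means going back to the finer cube, which therefore has to be kept.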

An essential characteristic of multidimensional analysis is the study of data summarized at different granularities. Thus, out of these operations, it is essential to remark the importance of Roll-up and Drill-down. They imply moving up and down aggregation hierarchies, which define the aggregation levels of interest for every analysis dimension.

An analysis framework

This section presents the original analysis framework that will be used in the next section to classify the large number of efforts in the area devoted to modeling the data cube. The models will be divided into four groups, based on the design phase for which they are most suitable. Moreover, different detail levels are also defined in order to compare their modeling constructs. Firstly, previous work on classifying and describing multidimensional models is briefly reviewed. Then, the framework that allows the different models to be described and classified is defined.

Other frameworks

In [BSHD], a list of requirements that a multidimensional model must meet in order to be suitable for OLAP applications is used to analyze seven models, which are chosen because they contain some kind of formalism. Among those seven we find [AGS], [GL], [CTa], [Vas], and [Leh]. Those requirements, derived from general design principles and from characteristics of OLAP applications, are the following:

- Explicit separation of cube structure and its contents
- Complex dimensions
  - Level structure
  - Member (i.e. level instance) structure
  - Formalism (mathematical construct) for level structure
  - Dimension attributes (those not defining hierarchies)
- Symmetry of measures and dimension members
- Complex measures
  - Support of structured measures
  - Support of derived measures
  - Additivity of measures
- Query formalism
  - Type of formalism (i.e. algebra or calculus)
- Ad hoc hierarchies
- User-defined aggregates

[PJ] and [Ped] present eleven requirements, found in clinical data warehousing, for multidimensional data models, and evaluate twelve pre-existing data models against them. Those presented in [AGS], [Dyr], [Kim], [GL], [CTa], [Leh], and [Vas] are among those twelve. A statistical model and a commercial system are also included. Moreover, they present a data model which does address all those requirements. The requirements are:

- Explicit hierarchies in dimensions
- Symmetric treatment of dimensions and measures
- Multiple hierarchies in each dimension (different aggregation paths)
- Support for aggregation semantics (applicability of aggregation functions)
- Non-strict hierarchies (overlapping classifications)
- Non-onto hierarchies (non-balanced trees of instances)
- Non-covering hierarchies
- Many-to-many relationships between facts and dimensions
- Handling change and time
- Handling different levels of granularity
- Handling uncertainty

[VS] and [Vasa] give yet another classification of multidimensional models. In this case, the discussion is said to be placed at logical level. Among others, it pays attention to [GL], [AGS], [CTa], [Leh], some industrial standards, and a couple of statistical models. The requirements studied in this case are:

- Representation of the multidimensional space
  - Cubes/Tables
  - Explicit/Implicit hierarchies
- Language issues
  - Character of the query language (Procedural/Declarative/Visual)
  - Support of sequences of operations
  - Naturality of the OLAP operations modeled
- Mappings offered to
  - Relations
  - Multidimensional arrays

The difference between these sets of comparison criteria and the framework proposed in this section is that the former aim to discover weaknesses in the existing models, while the latter tries to facilitate the comparison of the different works and terminologies. Each one of the three papers begins by defining a list of specific requirements for a multidimensional model, in order to evaluate all those models already existing. In this section there is no such list. Each one of the multidimensional models compiled in the next section uses its own terminology and defines a specific set of design elements. In this sense, different detail levels are used to classify the constructs of the models, in order to be able to compare them and examine the expressive power of every model.

A classification and description framework

This section introduces two sets of classification and description levels for multidimensional models. Both sets of levels are orthogonal. Thus, a model can be classified as either Conceptual, Logical, Physical, or Formalism, and contain constructs at any of the three detail levels, i.e. Upper, Intermediate, or Lower.

Design levels

As defined in [EN], a data model is a set of concepts that can be used to describe the structure of a database. In the same book we also find a categorization of data models into High-level or Conceptual, if they provide concepts that are close to the way users perceive data; Low-level or Physical, if they provide concepts that describe the details of how data is stored in the computer; and Implementation, if they provide concepts that can be understood by end users but that are not too far removed from the way data is organized within the computer.

[BCN] also describes those three groups of models. Adopting its terminology, from here on three different kinds of multidimensional data models are distinguished, based on the constructs/concepts they provide and the Data Mart design phase they help: those at Conceptual level, which are close to the user and independent of the implementation; those at Logical level, which depend on the kind of Database Management System (DBMS) used in the implementation but are still understandable by end users; and finally those at Physical level, which depend on the specific DBMS used and are conceived to describe how data is actually stored.

[Figure: Modeling and implementation process in OLAP vs OLTP environments. Left (OLTP): Ideas, then ODL or E/R, then Classes or Relations, then O-O DBMS or RDBMS. Right (OLAP): Ideas, then MDDM, then MOLAP (MD DBMS), O3LAP (O-O DBMS), or ROLAP (Relations, RDBMS).]

As shown in the left-hand side of the figure (from [UW]), in an On-Line Transactional Processing (OLTP) environment, during the first design step (at Conceptual level) we would use Object Definition Language (ODL) or Entity-Relationship (E/R) to represent user ideas; in the next step (at Logical level) we would usually use the Relational model, but we could also use Hierarchical or Network models (not depicted in the figure); and in the last step (at Physical level) the implementation would depend on a specific DBMS, e.g. Oracle, Informix, ObjectStore, etc. In a similar way, in the proposal of this thesis for an OLAP environment (right-hand side of the figure), we would have the Multidimensional Data Model (MDDM) at Conceptual level and, depending on the approach, i.e. Relational (ROLAP), Object-Oriented (O3LAP), or pure Multidimensional (MOLAP), we would use a different model at Logical level and a different DBMS for the implementation.

Besides the three mentioned above, there is another set of models, which will be referred to throughout this thesis as Formalisms, whose concepts would not be used at any database design phase, but to give a theoretical framework. Their stress is on formalizing multidimensionality rather than on database modeling. They include an algebra or calculus. In an OLTP environment, a formalism would be the Relational Algebra.

These four design levels are not ad hoc; they are based on the well-known design phases of OLTP systems. Thus, they can be used to classify any kind of data model. A fourth group of models has been added to cluster those data models that do not seem well suited for design, but emphasize the formalization of the domain.

Detail levels

In a multidimensional model, several detail levels can be distinguished. Thus, we can see a schema with coarser or more detailed elements. It is similar to showing the attributes, methods, and constraints for every class in a schema, or just showing the names of the classes. By looking at the names of the classes we get an idea of the modeled reality, but if we do not look at the more detailed information we cannot completely understand the data. Three different detail levels can be found in multidimensional modeling:

Upper: At this level we find Dimensions and Facts. The Dimensions are used to characterize the Facts, and show the viewpoints the Facts will be analyzed from. By relating a set of Dimensions to a Fact, we obtain a star-shaped schema. The possibility of navigating from one such star-shaped schema to another is usually shown by the sharing of Dimensions.

Intermediate: Dimensions and Facts are decomposed into Levels and Cells, respectively. The different Levels in a Dimension form an aggregation hierarchy. Each Cell contains data at a given Level for each Dimension its Fact is related to.

Lower: The most detailed level shows the attributes of the Levels and Cells, that is, Descriptors and Measures, respectively.

[Figure: Example of multidimensional schema at Upper detail level. The Fact (F) Transport is related to the Dimensions (D) Waste, Time, Producer, and Receiver.]

The figure above represents a multidimensional schema at Upper detail level. If we are talking about a waste transport business, we could be interested in analyzing Transport involving the transported Waste, the Time the transport takes place, the Producer it is transported from, and the Receiver it is transported to. Therefore, we would have a four-dimensional space where each point represents transport data, and is identified by a waste, a point in time, a waste producer, and a waste receiver.

[Figure: Example of multidimensional schema at Intermediate and Lower detail levels. Legend: Level (L), Cell (C), Descriptor (D), Measure (M), Association, Aggregation. Labels shown: Receiver, Area, Plant, Manager; Wastes, Families, Kind_of_waste, Danger; Time, Day, Week, Month, Year; Producer, Client, City; Transport, Shipment, Price, Volume, Admission, Fare.]

The same multidimensional schema is depicted in more detail in the figure on Intermediate and Lower detail levels. Each one of the Dimensions is further described by a hierarchy of different Levels. For instance, the Time Dimension contains Day, Week, Month, and Year Levels. Moreover, in a similar way, we could decompose the Fact. If we were interested in the profit of a transport, we would need to analyze data belonging to different kinds of data cells: the price of the shipment that we charge to our client, minus the admission fare that a processing plant charges to us. On the one hand, we can see a Shipment Cell containing data about our shipments, which depends on the lowest Level of each one of the four Dimensions. On the other hand, data about the admission of waste in a plant do not depend on our clients, nor on the Day Level of the Time Dimension (they depend on the Month Level). Therefore, Admission and Shipment are different kinds of Cells, but belong to the same Fact we want to analyze.

Finally, drawn with dotted lines, we can see constructs at Lower detail level. Some Levels have Descriptors associated; for instance, a Plant has a Manager. Besides, Cells have associated Measures; for instance, an Admission has a Fare that we want to analyze.
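The three detail levels of the waste transport example can be written down as plain data structures. The sketch below is only an illustration: the class layout is an assumption, and so is the choice of Kind_of_waste, Client, and Plant as the lowest Levels of their Dimensions, which is read from the labels of the figure rather than stated in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    """Intermediate detail level: one aggregation level of a Dimension."""
    name: str
    descriptors: list = field(default_factory=list)  # Lower detail level


@dataclass
class Dimension:
    """Upper detail level: groups the Levels of one analysis viewpoint."""
    name: str
    levels: list


@dataclass
class Cell:
    """Intermediate detail level: data at one Level per related Dimension."""
    name: str
    granularity: dict  # Dimension name -> Level name
    measures: list = field(default_factory=list)  # Lower detail level


@dataclass
class Fact:
    """Upper detail level: Cells that are analyzed together."""
    name: str
    cells: list


time = Dimension("Time", [Level("Day"), Level("Week"), Level("Month"), Level("Year")])
receiver = Dimension("Receiver", [Level("Plant", descriptors=["Manager"])])

shipment = Cell(
    "Shipment",
    {"Waste": "Kind_of_waste", "Time": "Day", "Producer": "Client", "Receiver": "Plant"},
    measures=["Price"],
)
# Admission does not depend on the client and is recorded per Month, not per Day.
admission = Cell(
    "Admission",
    {"Waste": "Kind_of_waste", "Time": "Month", "Receiver": "Plant"},
    measures=["Fare"],
)
transport = Fact("Transport", [shipment, admission])
```

Keeping the granularity inside each Cell, rather than inside the Fact, is what lets Shipment and Admission belong to the same Fact while being defined at different Levels.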

Classification and description of existing multidimensional models

This section contains the state of the art of research in the field of multidimensional modeling. The analysis framework presented in the previous section is used here to classify and describe existing multidimensional models.

Models are grouped into four different sets, based on the multidimensional database design phase they are conceived for. Some of the publications considered in those three classifications

discussed in the previous section are not included in this one because they do not fit in any of those sets, i.e. papers about statistical models, whose relevant contributions are incorporated into more recent multidimensional ones, or papers whose main subject, despite being devoted to multidimensionality, is not really multidimensional modeling. The different elements, and the relationships between them, that are provided at each one of the detail levels are studied for each and every model.

The following four subsections contain the four groups of models; moreover, a section on miscellaneous issues has been added. Each one of these four subsections contains proposals at a given design level, i.e. Conceptual, Logical, Physical, and Formalisms, respectively. Inside the subsections, models are chronologically ordered by year. The section on miscellaneous issues contains contributions to multidimensional modeling that were not classified in any of the previous ones.

At the beginning of each one of those subsections there is a table showing the constructs of each model at the corresponding level with regard to the description framework. As pointed out by [BSHD], some multidimensional models do not separate cube structure and contents. Only those concepts represented at the schema level are considered; relationships among instances are not taken into account.

A tick (√) means something is captured by the model, while a hyphen (-) means that the authors of the model either say that something is not to be modeled or just do not say how to model it. A hyphen in the column corresponding to:

- Measures (M) means that nothing can be represented in the schema about Measures. Maybe only pure numerical values are considered, without any meaningful domain.
- Descriptors (d) means that dimensional entities do not have attributes describing their instances. All the information is kept in the form of classification hierarchies, at the most.
- Relationships (Lower detail level) means that there is no way in the model to represent relationships among Measures and/or Descriptors.
- Levels (L) means that there are no explicit aggregation levels in the Dimensions.
- Cells (C) means that either the Measures are not grouped, or they are not related to a specific set of Levels but to Dimensions as a whole (usually reflected as relating the Measures to the lowest level in the classification hierarchy).
- Relationships (Intermediate detail level) means that there is no way in the model to represent relationships among Cells and/or Levels. For instance, it implies that there is no possibility of explicit dimension hierarchies.
- Facts (F) means that the different Cells cannot be grouped with the intention of relating Measures that, even though they are defined at different granularities, are used together in a given decision-making process.
- Dimension (D) means that either there is only the possibility of modeling one Level or, if it is possible to model more than one, they cannot be grouped into another construct of the model.
- Relationships (Upper detail level) means that there is no way in the model to represent relationships among Facts and/or Dimensions.

Research efforts at Conceptual level

This section collects those models that contain concepts which are closer to the user than to the actual computer implementation, i.e. those in [Leh], [CTa], [GMRb], [TP], [SBHD], [SCdMM], [TBC], [BTW], and [HLV]. These efforts try to represent how users perceive a multidimensional cube, without paying special attention to formalisms.

Column groups: Upper detail level | Intermediate detail level | Lower detail level
Columns: Author | F | D | Relationships | C | L | Relationships | M | d | Relationships

√ √ √
Linear hierarchy of Ls
Lehner: ds associated to instances of L
Ls, cube

√ √ √ Ls form a roll-up hierarchy √ √ d L
F and Ds, cube
Cabibbo, Torlone
F is a set of Ms; partial order; f: L^n -> M

Ds and C, cube
√ √ √ √ √
to-one between d and L
Aggregation hierarchy of Ls
Golfarelli et al.
Aggregability between M and D
Compatibility between Cs

√ √ √ √ √
Part-whole between F and Ds
Trujillo et al.: Classification hierarchy of Ls; Aggregability between M and D

√ √ √ √ √ √
rolls-up to between Ls
M C
Cs in F share Ls in some D
Sapia et al.
fact relates n Ls; d L

√ √ √ √ √ √
Aggregation functions between Ls
M C
Sánchez et al.: Cs in F use Ds in F
d L

Membership hierarchy of Ls
Aggregation between Ls
√ √ √ √ √
M C
Tryfona et al.
Specialization between Ls
d L
Allows M:N between C and L

Aggregability of Cs
√ √ √ √
Partially ordered set of Ls
Nguyen et al.
Ds and Cs, cube
groupby relates Cs and Ls

d L
f Ls C
√ √ √ √ √
M C
Ds, cube
Hüsemann et al.
Aggregation path between Ls
Aggregability between M and L

Notes:
Only domains over N, Z, and R are allowed.
Even though Measures are related to Level, they are not grouped into Cell.
Possibly derived.
Implicit within the structure of the rolls-up to graph.

Table: Schema constructs in the different models at Conceptual level

Lehner: Nested Multidimensional Data Model (NMDM)

This is rather a presentation-oriented model, conceived to ease navigation through data. [Leh] emphasizes the presentation of data at two different (Nested) levels, and the operations offered to the user in order to accomplish this, i.e. slicing, drill-down, roll-up, split, merge, aggregation, and other cell-oriented operators (like max, min, etc.). The existence of two levels is said to improve the power and flexibility of the whole analysis process.

One of the most interesting features of NMDM, besides the existence of two levels, is the way it qualifies Dimension instances by means of different sets of attributes, i.e. different instances in the same class might have different attributes. Thus, at the bottom of every classification hierarchy is placed a primary attribute (PA), whose instances are called dimensional elements. Those dimensional elements are the leaf nodes of a balanced tree-structured classification hierarchy. Each tree level is called a classification attribute (CA), whose instances

are classification nodes (CNs). Dimensional attributes (DAs) are associated to every CN (notice that CNs are instances in the hierarchy).

A Primary Multidimensional Object (PMO) consists of a unique cell identifier; a set of CAs and PAs (one per Dimension) denoting the granularity of the cell; a set containing one instance per CA/PA, specifying the selection criteria; an aggregation type, describing the aggregation operations applicable; and a data type (i.e. domains over N, Z, or R). In turn, a Secondary Multidimensional Object (SMO) consists of a set of CNs and a set of DAs applicable to them. Thus, a Multidimensional Object is a PMO and a set of DAs for defining the corresponding nested SMOs.

All the schema constructs in this model refer to the Dimensions. They are defined as a linear hierarchy of Levels (called classification attributes) at Intermediate detail level, and the instances of each Level have associated Descriptors (called classification nodes) at Lower detail level.

Cabibbo and Torlone: MD

Cabibbo and Torlone, in [CTb], [CTa], and [CT], qualify their model (MD) as logical. However, they say that it is independent of any specific implementation, and present a design methodology to obtain an MD schema from an E/R one. Moreover, the authors argue that MD is at a higher level of abstraction than a star schema consisting of relational tables. Therefore, it should be classified as Conceptual, even though it provides a strong formal foundation (including a calculus).

The main constructs in the model are dimension and f-table. Each dimension is organized in a hierarchy of levels, corresponding to data domains at different granularity. In turn, a level can have descriptors associated with it. The f-tables are functions from levels to measures.
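The idea of an f-table as a function from levels to measures can be sketched as follows. This is not the notation of MD: the f-table is approximated by a Python dictionary from (day, product) coordinates to a quantity, and the names `SALES`, `month_of`, and `roll_up_sales` are hypothetical.

```python
# Hypothetical f-table: (day, product) -> quantity sold.
SALES = {
    ("2001-01-05", "pen"): 3,
    ("2001-01-20", "pen"): 4,
    ("2001-02-01", "pen"): 5,
}


def month_of(day):
    """Map a member of the level day to its member of the level month."""
    return day[:7]


def roll_up_sales(f_table):
    """Derive an f-table over (month, product) by summing the finer one."""
    result = {}
    for (day, product), quantity in f_table.items():
        key = (month_of(day), product)
        result[key] = result.get(key, 0) + quantity
    return result
```

The hierarchy of levels appears here only as the function `month_of`; a fuller sketch would attach such functions to the dimension itself.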

We can clearly identify the data about Dimensions at the three different levels (dimension, levels, and descriptors). About facts, there are only measures at Lower detail level and a set of f-tables at Upper detail level. However, measures are not grouped regarding the Levels they are defined at.

Golfarelli, Maio, and Rizzi: Dimensional Fact Model

[GR], [GMRa], [GMRb], and [GR] present a graphical conceptual model (DFM) for data warehousing, besides a methodology to obtain a multidimensional schema from the operational schemas (either E/R or Relational).

Contrary to what is said for some formal models, the authors claim that it is important to clearly distinguish between dimensional and factual data. Thus, a dimensional scheme consists of a set of fact schemes, and each one of these contains a fact, measures, dimensions, and hierarchies. A fact is a focus of interest, and its attributes are measures. The dimensions are discrete attributes which determine the minimum level of granularity chosen to represent the fact. Finally, a hierarchy is a set of dimensional attributes linked by to-one relationships (i.e. 1:1 or N:1), which form a quasi-tree. Hierarchies may also include non-dimension attributes, which contain additional information that cannot be used for aggregation, but just for selection. Moreover, aggregability can also be expressed by relationships between a measure and a dimension, tagged by the allowed aggregation functions.

A special relation between two schemas is also defined, called compatibility (and strict compatibility), which indicates and restricts when a query can be formulated including measures in both schemas. Roughly, two schemas are compatible when they have at least one common dimension attribute.
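The rough notion of compatibility given above can be illustrated with a few lines of code. The sketch assumes that a fact scheme is reduced to the set of names of its dimension attributes; the scheme contents and the function name are hypothetical, and strict compatibility is not covered.

```python
def compatible(scheme_a, scheme_b):
    """True when the two fact schemes share at least one dimension attribute."""
    return bool(set(scheme_a) & set(scheme_b))


# Hypothetical fact schemes, each given by its dimension attributes.
SALES = {"date", "product", "store"}
SHIPMENTS = {"date", "product", "warehouse"}
PAYROLL = {"employee"}
```

Under this reading, a query combining measures of `SALES` and `SHIPMENTS` could be formulated over the shared attributes `date` and `product`, whereas `SALES` and `PAYROLL` share nothing to combine them on.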

Placing those constructs in the analysis framework of the previous section, we obtain Descriptors (called non-dimension attributes) and Measures at Lower detail level, grouped respectively into Levels (called dimension attributes) and Cells. The Levels form dimension hierarchies. Furthermore, Cells are related by compatibility, and Measures and Dimensions by aggregability.

Trujillo, Palomar, and Gómez: GOLD

[TP], [TPG], and [TPGS] describe an Object-Oriented conceptual model based on a subset of UML. A query notation is also presented.

A fact, represented as a basic class, is described through a set of fact attributes (either atomic or derived) representing Measures. By means of part-whole relationships, a fact is related to a set of dimensions, also represented as basic classes, that show the granularity adopted for representing facts. Those dimensions are also described by dimension attributes. A classification hierarchy is defined as a Directed Acyclic Graph of level classes rooted in the dimension class. Multiple classification hierarchies are allowed, and strictness and completeness can be made explicit. Aggregability of Measures along each analysis dimension can be represented, as well as derived Measures.

In this model, information at Lower detail level is represented in the form of Measures (called fact attributes) and Descriptors (called dimension attributes). The former are attributes of a Fact at Upper detail level, while the latter are attributes of a Level at Intermediate detail level. A Dimension is defined as a classification hierarchy of Levels.

Sapia, Blaschka, Höfling, and Dinter: Multidimensional Entity Relationship Model

[SBHD] argues that the E/R model is not suited for multidimensional conceptual modeling. Thus, a specialization is defined and its usage exemplified.

The design of this model was driven by the following ideas:

- Specialization of the E/R model
- Minimal extension of the E/R model
- Representation of the multidimensional semantics

Following those guidelines, these specializations are introduced:

- A special entity set: dimension level

- Two special relationship sets connecting dimension levels:
  - fact relationship set (n-ary)
  - rolls-up to relationship set (binary)

A rolls-up to relationship relates two dimension levels, where the second one represents a higher level of abstraction. This kind of relationship defines a Directed Acyclic Graph. Multiple hierarchies, alternative paths, and shared hierarchy levels for different analysis dimensions are allowed. A fact relates n different dimension level entities. There is no restriction on different facts being related to the same dimension level.
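The rolls-up to graph can be sketched as an adjacency list, together with a check that it is acyclic and an enumeration of the aggregation paths between two levels. The sample levels and the function names are assumptions made for the example, not part of the model definition.

```python
# Hypothetical rolls-up to graph for a Time dimension: level -> coarser levels.
ROLLS_UP_TO = {
    "day": ["week", "month"],
    "week": [],
    "month": ["year"],
    "year": [],
}


def is_acyclic(graph):
    """Depth-first search that fails as soon as a back edge is found."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return False  # back edge: the graph has a cycle
            if state == WHITE and not visit(nxt):
                return False
        colour[node] = BLACK
        return True

    return all(visit(n) for n in list(graph) if colour[n] == WHITE)


def aggregation_paths(graph, start, end):
    """List every path of rolls-up to edges from start to end."""
    if start == end:
        return [[end]]
    return [[start] + path
            for nxt in graph.get(start, [])
            for path in aggregation_paths(graph, nxt, end)]
```

Since a level may roll up to several coarser levels, the structure is a graph rather than a tree, which is what allows multiple hierarchies and alternative paths.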

Since this model is based on E/R, dimension levels and facts would have attributes, which are identified as Descriptors and Measures at Lower detail level. The dimension levels are clearly placed at Intermediate detail level, as well as facts. Finally, a Fact would correspond to what is called a multi-cube model. At this level, we also find implicit Dimensions (a hierarchy of dimension levels).

Sánchez, Cavero, de Miguel, and Martínez: IDEA

The authors claim that the aim of [SCdMM] is to present a conceptual multidimensional model that allows multidimensional databases to be designed independently of the specific product used in their implementation. Besides the model, a closed algebra is defined with the following operations: roll-up, join, destroy dimension, slice and dice, and select. A methodology and a CASE tool are also mentioned.

A multidimensional schema is defined as a non-empty set of domains, a set of domain aggregations, a set of hierarchies, and a non-empty set of fact schemas. Three different kinds of domains are distinguished, i.e. dimension domain, synthesis domain, and description domain. Furthermore, a hierarchy is a set of domain aggregations between category domains (a subclass of dimension domain), linked to shape a directed graph. A fact schema is a set of dimension attributes, a set of dimensions (i.e. a subset of that of dimension attributes), the structure of the cell, and a predicate showing the selected cells. Every cell structure is described as a list of synthesis attributes, plus an attached list of applicable synthesis functions.

Measures correspond to attributes defined on synthesis domains, while Descriptors are those attributes defined on description domains. At Intermediate detail level, we find that every Level corresponds to a dimension attribute, and Cells are called cell structure. Different dimension attributes are related by aggregation functions, giving rise to Dimensions. Each fact schema contains exactly one cell structure. However, different fact schemas are related. We could identify a Fact as a multidimensional schema containing a set of related fact schemas sharing Dimensions.

Tryfona, Busborg, and Christiansen: starER

In [TBC], firstly a set of user requirements for a data warehouse conceptual model is listed. Then, a data model based on the well-known E/R model, addressing those requirements, is defined. The requirements are:

- Represent facts and their properties. Three different kinds of properties are considered, i.e. stock, flow, and value-per-unit.
- Connect the temporal dimension to facts.
- Represent objects, and capture their properties and the associations among them. Three different kinds of associations are highlighted:
  a) Specialization/Generalization
  b) Aggregation
  c) Membership (characterized by strictness or not, and completeness or not)
- Record the associations between objects and facts.
- Distinguish dimensions and categorize them into hierarchies (dimensions are those objects connected by an association relationship to a fact).

Based on those requirements, the constructs of the model are: Fact set, which represents a set of real-world facts sharing the same characteristics or properties; Entity set, which represents a set of real-world objects with similar properties; Relationship set, which represents a set of associations of any kind out of the three aforementioned (namely Specialization/Generalization, Aggregation, and Membership) among entity sets and fact sets (any cardinality is allowed, i.e. 1:N, N:1, and N:M); and Attribute, which represents a static property of entity sets, relationship sets, or fact sets, and can be of any of the three kinds mentioned above (namely stock, flow, and value-per-unit).

Placing those constructs in the three detail levels, we can see a Dimension implicitly defined at Upper detail level as a set of related entity sets. Aggregation hierarchies in Dimensions are defined by means of Membership relationships. Those entity sets, besides fact sets, would respectively play the roles of Levels and Cells at Intermediate level. Finally, their attributes would be Measures and Descriptors at Lower level. Three different kinds of relationships are allowed between Levels at Intermediate level: Specialization, Aggregation, and Membership. Moreover, any cardinality is allowed for the relationship between a Level and a Cell.

Nguyen, Tjoa, and Wagner: conceptual multidimensional data model

The multidimensional model presented in [BTW] uses the Object-Oriented paradigm to represent its metamodel. Specifically, UML is used in a schema which models multidimensional data and metadata all together. For instance, this schema contains a class Dimension and another class MeasureValue.

The dimension members form a hierarchical domain, which partitions them into dimension levels that belong to a dimension. In turn, measures are integer or float values, grouped into cells, grouped into groupbys, where every cell conforms with a groupby

schema. Each groupby schema refers to a set of measure schemas and dimension levels, where measure schemas indicate the aggregability of measures, and dimension levels show the granularity of the measures.

Since data and metadata are defined at the same level, multidimensional schemas have a predefined structure, and Measure domains and Descriptors cannot be defined. Therefore, it can be considered that this model does not allow the representation of any kind of information at Lower detail level. However, at Intermediate level we find Levels and Cells (i.e. groupbys), and at Upper level we find Facts (as the groupby schemas associated to a cube schema) and Dimensions (called dimension schemas).

Hüsemann, Lechtenbörger, and Vossen: conceptual warehouse design

[HLV] presents a phase-oriented Data Warehouse design methodology, which systematically derives schemas in generalized multidimensional normal form.

Those schemas contain dimensions, structured in terms of one or more aggregation paths (which could be alternative or optional) that share the same terminal dimension level, and a fact, which is a set of measures determined by terminal dimension levels (measures functionally depend on dimension levels). The sets of dimension levels of different dimensions are assumed to be disjoint. Each one of those levels has a set of property attributes associated. A fact schema represents the dimensional context for a set of facts that share the same terminal dimension levels. Summarizability is also shown, by relating measures and dimension levels to a restriction level indicating the aggregation functions allowed.

This model has Measures and Descriptors at Lower detail level, which are respectively grouped into Cells and Levels at Intermediate. However, while Levels are grouped into Dimensions based on the meaningful aggregation paths, Cells are not grouped unless they share the same terminal Levels. It is important to remark that summarizability is shown at Lower detail level for each Measure.

Research efforts at Logical level

This section contains the work of those authors describing a model which is neither Conceptual nor Physical. Their constructs are clearly oriented to a given kind of DBMS. Nevertheless, they are not that far from users' conceptions. At this level we can find the following papers: Kim, BSH, MTW, GLK and MK.

Kimball: multidimensional model

Doubtless, the most prominent work at this design level is Kim. It describes the implementation of the multidimensional model on a Relational DBMS. Its explanations are not specific to any DBMS (as Microsoft SQL Server could be), nor does it discuss subjects such as the most appropriate kind of indexes, partitions of a table, or retrieval algorithms. Therefore, it should not be considered a Physical model.

Upper detail level | Intermediate detail level | Lower detail level
Author | F D Relationships | C L Relationships | M d Relationships
√ √ √ √
d D
Ds shared by Cs
Kimball FK between C and Ds
M C
√ √ √ √ √
d L
FK between C and Ds
Buzydlowski et al. Ds shared by Cs
M C FK between Ls
√ √ √
Mangisengi et al. (NR) C F M C
√ √ √ √ √
d L
Mangisengi et al. (ER)
Ds C Ls D
M F
√ √ √ √ √
d L
Pointers from C to Ds
Gopalkrishnan et al. Ds shared by Cs
Pointers between Ls M C
Ls form hierarchies
√ √ √ √ √ √
d L
Moody et al.
Cs form hierarchies
Star schemas share Ds
M C
one-to-many between Cs and Ls

They are implicitly defined by Descriptors in each Dimension.
Implicitly defined by existing hierarchies.

Table: Schema constructs in the different models at Logical level

This book presents some multidimensional design patterns and describes how they could be tackled. Some efforts have been made to improve Kimball's work: BSH and GLK show two Object-Oriented approaches.

The "star join schema" is defined as composed of a huge central fact table and a set of usually smaller "dimension tables" surrounding it. The primary key of the fact table is composed of a foreign key to each one of the primary keys of the dimension tables. The fact table contains numerical measures (usually continuously valued and additive), while dimension tables have attributes (usually textual and discrete). The dimension tables can be shared by different fact tables, giving rise to a "data warehouse bus architecture", as explained in KRRT.
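As an illustration of the star join schema just described, the following is a minimal sketch and not Kimball's own notation. The table names, attributes and the helper `star_join_sum` are hypothetical, and in-memory dictionaries stand in for relational tables.

```python
from collections import defaultdict

# Hypothetical dimension tables: primary key -> textual, discrete attributes.
product_dim = {
    1: {"name": "Espresso", "category": "Coffee"},
    2: {"name": "Green tea", "category": "Tea"},
}
store_dim = {
    10: {"city": "Barcelona", "country": "Spain"},
    11: {"city": "Lleida", "country": "Spain"},
}

# Fact table: its key combines one foreign key per dimension table,
# and each row holds numerical, additive measures.
sales_fact = {
    (1, 10): {"units": 5, "revenue": 12.5},
    (2, 10): {"units": 3, "revenue": 6.0},
    (1, 11): {"units": 2, "revenue": 5.0},
}


def star_join_sum(fact, dims, group_by, measure):
    """Join every fact row with one dimension table and sum a measure.

    dims lists the dimension tables in the same order as the parts of the
    fact key; group_by is a (position in the key, attribute name) pair.
    """
    position, attribute = group_by
    totals = defaultdict(float)
    for key, measures in fact.items():
        dimension_row = dims[position][key[position]]  # follow the foreign key
        totals[dimension_row[attribute]] += measures[measure]
    return dict(totals)


dims = [product_dim, store_dim]
units_by_category = star_join_sum(sales_fact, dims, (0, "category"), "units")
units_by_city = star_join_sum(sales_fact, dims, (1, "city"), "units")
```

Because every aggregation needs only one lookup per dimension, the sketch also hints at why a single denormalized dimension table is cheap to browse.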

The possibility of normalizing the dimension tables, obtaining a "snowflake schema", is presented as an option that should be avoided. It would make dimension hierarchies explicit. However, the saved space is irrelevant, while query performance is really worsened (a series of joins becomes necessary) and browsing dimension attribute values is more difficult.

Kimball's model does not define any explicit aggregation hierarchy or Levels, but they are implicit in the Descriptors. Moreover, the fact table represents a given Cell related to its Dimensions by foreign keys. At Lower detail we find Descriptors as well as Measures.

Buzydlowski, Song and Hassell: Object-Oriented OLAP

BSH draws the advantages of an Object-Oriented OLAP approach as opposed to ROLAP and MOLAP. It presents a direct translation from Kimball's model into the Object-Oriented paradigm. Instead of using relational tables, the usage of object classes is proposed. Only two new concepts are introduced, i.e. "dimension non-associative classes" and "dimension associative classes", in order to distinguish those analysis dimensions with and without an explicit hierarchy, respectively. Thus, its constructs are those of Kimball's model plus the possibility of making Levels explicit within a Dimension.

Mangisengi, Tjoa and Wagner: Nested Relations and Extended Relational

MTW introduces and compares two different approaches to multidimensional modeling (notice that there are two entries in the summary table for these authors). The ideas of those approaches are based on nested relations (Non-First Normal Form Relations) and the extension to the Relational model introduced in Cod.

A nested relation is a Relation whose attributes may be other Relations. By nesting Relations, we can reflect the different detail levels in the fact measurements. Therefore, we will obtain a Fact at Upper level as a Relation, different nested relations corresponding to Cells at different detail levels, and finally the Measures for each Cell.
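To make the idea of nesting concrete, here is a small sketch under stated assumptions: the relation `sales_fact`, its attribute names and the helper `unnest` are invented for illustration and do not come from MTW. The outer tuples play the role of coarse Cells, and the nested relation holds the finer ones.

```python
# Hypothetical Fact as a nested relation: every outer tuple has an attribute
# ("by_month") whose value is itself a relation of finer-grained Cells.
sales_fact = [
    {
        "year": 2001,
        "year_revenue": 30.0,  # Measure of the coarse Cell
        "by_month": [          # nested relation: Cells at a finer detail level
            {"month": "Jan", "revenue": 10.0},
            {"month": "Feb", "revenue": 20.0},
        ],
    },
    {
        "year": 2002,
        "year_revenue": 7.0,
        "by_month": [{"month": "Jan", "revenue": 7.0}],
    },
]


def unnest(relation, nested_attribute):
    """Flatten one nesting level into a First Normal Form relation."""
    flat = []
    for outer in relation:
        atomic = {k: v for k, v in outer.items() if k != nested_attribute}
        for inner in outer[nested_attribute]:
            flat.append({**atomic, **inner})
    return flat


flat_sales = unnest(sales_fact, "by_month")
```

Unnesting repeats the coarse Measure on every fine row, which is the redundancy the nested form avoids.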

On the other hand, Codd's extension to the Relational model uses concepts like object identifiers (OIDs), associations or object types. Moreover, it allows new operations like "patt", which partitions a Relation based on a given attribute. A fact relation can be modeled as an association relation with participating dimension relation types, containing OIDs of dimension tuples. Each dimension relation type could further be refined by other characteristics, making the aggregation hierarchy explicit in the same way (having OIDs as attributes).

Thus, at Upper detail level we would have the Dimensions. At Intermediate level, each Dimension contains identifiers of its Levels, which contain identifiers of finer levels, and so on. A fact relation would correspond to a Cell at this level. Finally, every Level contains Descriptors and every Cell contains Measures.

Gopalkrishnan, Li and Karlapalem: Object-Relational View

GLK also presents an Object-Oriented approach to multidimensional modeling. It describes not only a data model but also a methodology to build a Data Warehouse from Relational data sources.

A translation from Kimball's snowflake schemas to an Object-Oriented model is provided. The poor browsing performance in this kind of schemas, outlined by Kimball, is avoided here by using a Structural Join Index Hierarchy mechanism. A one-to-one mapping from Kimball's tables to object classes is defined. Foreign keys are translated to Object Identifier pointers. By these means, we obtain Dimensions as a hierarchy of Levels related by object pointers. At Intermediate detail level we also have Cells, related to Dimensions by object pointers too. Cells as well as Levels contain attributes, i.e. Measures and Descriptors respectively.

Moody and Kortink: design methodology

MK describes a methodology to develop multidimensional models from ER models. The idea behind this work is to let the multidimensional design benefit from the information already in the operational schemas. Different kinds of schemas can be obtained as a result of the different steps, i.e. "flat", "terraced", "star", "constellation", "galaxy", "snowflake" or "star cluster". All those kinds of schemas contain Relational tables and are based upon the fact-dimension duality. They are characterized by different levels of denormalization in either fact or dimension tables. Thus, for instance, one chooses whether to make Levels explicit or not by normalizing Dimensions and placing the information about aggregation levels in different tables. Different topologies are offered:

• Flat schemas contain the minimum number of fact tables. They do not have any dimension table, because they are collapsed (denormalized) into the corresponding fact table. Moreover, some fact tables are also collapsed into more detailed ones, if possible. They keep all possible joins precalculated.

• Terraced schemas contain all the fact tables without any dimension table (all of them are collapsed). These schemas only precalculate star joins, i.e. those involving a fact table and a dimension table.

• Star schemas contain fact as well as dimension tables. However, they do not make dimension hierarchies explicit, since they are collapsed into a single dimension table.

• Constellation schemas consist of a set of star schemas with hierarchically linked fact tables.

• Galaxy schemas consist of star schemas sharing dimension tables.

• Snowflake schemas are star schemas with explicit dimension hierarchies, obtained by normalization of dimension tables.

• Star cluster schemas are snowflake schemas where we collapse those dimension tables that do not have a multiple hierarchy.
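The difference between a snowflake and a star dimension in the list above is a matter of collapsing (denormalizing) level tables. The following is a minimal sketch of that step. The tables, keys and the helper `collapse` are hypothetical and are not taken from MK.

```python
# Hypothetical snowflake dimension: one table per level of the hierarchy,
# each row pointing to its parent at the next coarser level.
city = {
    1: {"city": "Barcelona", "region_id": 100},
    2: {"city": "Lleida", "region_id": 100},
    3: {"city": "Porto", "region_id": 200},
}
region = {
    100: {"region": "Catalunya", "country_id": 1000},
    200: {"region": "Norte", "country_id": 1001},
}
country = {1000: {"country": "Spain"}, 1001: {"country": "Portugal"}}


def collapse(child, parent, foreign_key):
    """Denormalize: copy the parent's attributes into every child row."""
    collapsed = {}
    for key, row in child.items():
        merged = {k: v for k, v in row.items() if k != foreign_key}
        merged.update(parent[row[foreign_key]])
        collapsed[key] = merged
    return collapsed


# Collapsing from the coarsest level downwards yields a single star
# dimension table in which the hierarchy is no longer explicit.
region_with_country = collapse(region, country, "country_id")
store_dimension = collapse(city, region_with_country, "region_id")
```

Applying `collapse` only to the tables outside a multiple hierarchy would give something in the spirit of a star cluster.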

This methodology, in addition to Descriptors and Measures at Lower detail level (being members of Levels and Cells respectively), considers constructs to relate those Levels and Cells. Different Levels can be related to form (possibly multiple) dimension hierarchies. Moreover, Cells can be related to show fact hierarchies, i.e. different levels of detail. It is not explicitly said in the methodology, but at Upper detail level we can identify a Dimension as a set of dimension tables in the same dimension hierarchy, and a Fact as a set of fact tables in the same fact hierarchy. It is also explained what to do with many-to-many relationships and subtypes, since they could be found in an ER model but cannot exist in a multidimensional one.

Research efforts at Physical level

In this section, those proposals that explain how a data cube could be implemented (i.e. stored and/or retrieved) are placed. The proposals at this level not only depend on the kind of DBMS, but also present which specific mechanisms it should implement.

At this level, only one paper about modeling was found: Dyr. It could be surprising that there is only one paper in this section. However, at this level proposals must be devoted to specific storage techniques instead of providing a true data model. Since modeling is a conceptualization by means of a given set of constructs, it is more suitable when we consider notions closer to the user. Thus, we could expect not to find any work in this section, but this one expresses how data should be stored, besides some concepts to understand it.


Upper detail level | Intermediate detail level | Lower detail level
Author | F D Relationships | C L Relationships | M d Relationships
√ √
Dyreson "finer than" between Ls

Table: Schema constructs in the different models at Physical level

Dyreson

Dyr explains how a sparse cube could be implemented in a MOLAP database by means of disjoint, complete "cubettes". An algorithm to retrieve an aggregate value from the incomplete data cube is described, besides another algorithm to remove redundant cubettes.

A "measure" is defined as a system of measurement, and a "unit" as a subset chosen from the domain of interest. Thus, a set of disjoint units chosen from the same domain forms a measure. A partial order is defined among measures based on their granularity or precision. A "cubette" is defined as containing data about a given unit at a given detail level (i.e. measure).
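The definitions of unit and measure above can be sketched with plain sets. This is only an illustration under an assumption: the granularity order is read here as "every unit of the finer measure fits inside some unit of the coarser one", which may differ in detail from the definition in Dyr. The names `is_measure`, `finer_than` and the sample measures are hypothetical.

```python
def is_measure(units):
    """A measure is a collection of pairwise disjoint units (subsets of the domain)."""
    units = list(units)
    return all(a.isdisjoint(b) for i, a in enumerate(units) for b in units[i + 1:])


def finer_than(fine, coarse):
    """Assumed reading of the order: each unit of fine lies inside a unit of coarse."""
    return all(any(unit <= bigger for bigger in coarse) for unit in fine)


# Domain of interest: the days of a week, numbered 1 to 7.
days = [frozenset({d}) for d in range(1, 8)]
week_parts = [frozenset({1, 2, 3, 4, 5}), frozenset({6, 7})]
odd_even = [frozenset({1, 3, 5, 7}), frozenset({2, 4, 6})]

days_refine_week_parts = finer_than(days, week_parts)
week_parts_vs_odd_even = (finer_than(week_parts, odd_even), finer_than(odd_even, week_parts))
```

The last pair shows why the order is only partial: neither `week_parts` nor `odd_even` refines the other.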

Levels (called "measures") and hierarchies (defined as graphs of "finer than" relationships between measures) are the only constructs provided in this framework, both at Intermediate detail level. Nothing is said about factual information.

Research efforts on Formalisms

In this section, those models mainly devoted to the definition of a multidimensional algebra and/or calculus are placed. Their stress is on the formalization of multidimensional concepts rather than on data modeling. These models do not pay too much attention to facilitating the capture of the specific user concepts. Since their focus is not on conceptualizing users' ideas, we can see in the summary table that they do not offer as many constructs as other models. However, if we took into account the expressiveness of the algebras, they might be as semantically rich as Conceptual models are. Only modeling constructs are taken into account, since studying the expressiveness of the operations is out of the scope of this work. At this level we find the following models: AGS, LW, DT, HS, GL, Vasa and Ped.

Agrawal, Gupta and Sarawagi: logical model

AGS presents one of the first multidimensional models, and probably one of the most referenced ones. In spite of its qualification as logical by the authors, since its focus is on presenting an algebra as powerful as Relational algebra, it can be considered a Formalism. The main characteristics of this model are the following:

• Symmetric treatment of factual and dimensional data, by providing conversion operations from one to another.

• A minimal, closed set of operations (i.e. "push", "pull", "destroy dimension", "restriction" and "join"), which can be directly translated to SQL.

• Support for multiple, non-explicit hierarchies along each analysis dimension.

Upper detail level | Intermediate detail level | Lower detail level
Author | F D Relationships | C L Relationships | M d Relationships
√ √
C tuple of ds
Agrawal et al.
or C boolean
n
√ √
cube D set of d Aggregation hierarchies
Li, Wang
Cubes share Ds defined at query time
√ √ √ √
Datta, Thomas set of M and D cube f D set of ds
n
√ √ √ √
f d M
Hacid, Sattler Part-whole between Ls
Aggregated concepts
√ √ √ √
Gyssens, Lakshmanan set of Ls cube f D set of ds
√ √ √ √
Vassiliadis basic cube uses Ds Ls form a lattice C tuple of Ms
F and Ds cube
Ls form a lattice
√ √ √ √
Applicability of aggregation Cubes sharing Dimensions form
Pedersen
functions per L a multidimensional object family

Due to the desired fact-dimension symmetry, everything is considered a Dimension, and the function from the cartesian product of the Dimension domains is defined on the booleans rather than on Measures.
There is not any information about Measures in the schema. A function is defined from Dimensions to a set of scalar values.
Implicit on defining a cube as containing a set of Measures.
Those attributes that are not at any Level must be in the Cell.
Dimensions are treated as Measures.

Table: Schema constructs in the different Formalisms

This model distinguishes a "cube" composed of k analysis dimensions, a function from k parameters to the booleans or to a tuple of values, and a name for each analysis dimension. It does not provide any means to make dimension hierarchies explicit. Moreover, the only way to show that there are several values in the cells of a data cube is by defining tuples. However, the model does allow us to show which tuple of values is available depending on the selected dimension values.

This approach does not offer too many conceptual elements to model a multidimensional schema. Actually, it just provides Descriptors in the form of dimension values, without any possibility of even grouping them into different Dimensions. At most, we could consider that it allows grouping Measures into tuples, giving rise to Cells.
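The cube described above, a function from k dimension values to a tuple of values, together with the symmetric treatment of factual and dimensional data, can be sketched as follows. The functions `push` and `pull` are only loosely modelled on the operations of the same name listed earlier. Their exact definitions in AGS may differ, and the sample data are hypothetical.

```python
# A cube as a partial function from k dimension values to a tuple of values.
# Coordinates missing from the dictionary stand for empty cells.
dimension_names = ("product", "month")
cube = {
    ("Espresso", "Jan"): (5,),
    ("Espresso", "Feb"): (2,),
    ("Green tea", "Jan"): (3,),
}


def push(cube, position):
    """Copy the value of one dimension into every cell tuple."""
    return {coords: values + (coords[position],) for coords, values in cube.items()}


def pull(cube, member):
    """Turn one member of the cell tuples into a new, last dimension."""
    return {
        coords + (values[member],): values[:member] + values[member + 1:]
        for coords, values in cube.items()
    }


month_as_content = push(cube, 1)   # dimensional data becomes cell content
units_as_dimension = pull(cube, 0)  # cell content becomes a dimension
```

After `pull`, the cells hold empty tuples, which play the role of the boolean "this combination exists" in this sketch.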

Li and Wang: Multidimensional Data model

In LW, its authors define a Formal Multidimensional Data (MDD) model for OLAP systems. At the center of their approach is the notion of "multidimensional cube". They also define a "Multidimensional Database" (MDDB) as a set of multidimensional cubes and a finite set of Relations.

A "multidimensional cube schema" is a set of pairs (dimension name, set of attribute names). Thus, a multidimensional cube is a multidimensional cube schema and a mapping from a combination of tuples containing the attribute values (one for each analysis dimension) to a scalar value. There is not any kind of information in the schema at Intermediate detail level, and aggregation hierarchies are not explicitly defined, but dynamically fixed at query time by means of ordering operations. However, multidimensional cubes in the same MDDB share dimension Relations. This means that if two multidimensional cubes have the same dimension name, they are using the same dimension Relation.

Besides a formalism for multidimensional cubes, they also present a "grouping algebra", which is used to query the MDDB, and a "multidimensional cube algebra", used to query a MDDB and generate views. A novel feature of the grouping algebra is that it includes order-related operations. The set of operations provided by this algebra are those of the Relational algebra plus some order-oriented operations and an aggregation operation. The multidimensional cube algebra offers six operations that are mappings from multidimensional cubes to multidimensional cubes, i.e. "add dimension", "transfer", "union", "cube aggregation", "rc-join" and "construct".

We can see the conceptual elements provided by this model as Dimensions at Upper detail level, stating an implicit relation among different multidimensional cubes (possibly sharing a Dimension), and disaggregating Dimensions into Descriptors at Lower detail level. Since the mapping function between Dimensions and Measures is defined on a scalar value without any kind of semantic domain, we could say that the proposed model does not provide any means to represent Measures.

Datta and Thomas

The model of DT resembles that of AGS. The three goals of the authors on offering their model are to:

1. Allow symmetric treatment of dimensional and factual data.

2. Separate structure and contents.

3. Provide comprehensive OLAP functionality.

The authors define a "data cube" as a set of dimensions, a set of measures, a set of attributes, and a mapping function assigning to each dimension a set of attributes. So they define neither explicit hierarchies nor the set of Measures available at each aggregation level.

By defining "cube-instances" they accomplish their second goal. A cube-instance is a data cube plus a set of values plus a mapping from the cartesian product of the dimension domains to the values. Moreover, a set of operations (i.e. "restriction", "aggregation", "cartesian product", "join", "union", "difference", "pull" and "push") is defined on cube-instances. Operations "push" and "pull" are used to accomplish the first goal.

In this case, we can clearly see elements at different detail levels. At Lower level we find Descriptors and Measures, while at Intermediate level we find the set of Dimensions as sets of Descriptors (each corresponding to exactly one Level), and the set of Measures in the data cube (implicitly, the Cell corresponding to the unique aggregation level in the data cube).

Hacid and Sattler: description logics framework

HS propose an object-centered logical framework (i.e. Description Logics) for multidimensional data models. Their aim is to facilitate comparison or evaluation of different multidimensional models, provide well-defined semantics, and allow precise definition of problems. A translation between an Extended ER diagram and Description Logics is given in FS.

By means of Description Logics, the authors represent a data cube as a relationship among cells, which keep the coordinates and measures. Every cell in a data cube must have the same structure. The functional dependency between coordinates and measures is explicitly shown. Besides data cubes, dimension hierarchies can also be modeled. A hierarchically structured dimension is a set of objects interrelated by part-whole relationships. Thus, a hierarchy is represented as a finite, partially ordered set.

Moreover, a set of operations on data cubes is also defined. In this case, those operations are "restrict", "destroy", "join", "rename", "Join" (which offers more parameters than "join"), "aggr" and "roll-up". Furthermore, a whole section is devoted to the problems of the drill-down operation.

This is a semantically powerful model. However, some multidimensional modeling mechanisms are not explicitly explained or exemplified, e.g. the participation of dimension hierarchies in the definition of a data cube, or the usage of complex concepts. At Upper detail level, Dimensions could be modeled as the set of concepts participating in classification hierarchies, although this is not explicitly said. At Intermediate level, those hierarchies are decomposed into different aggregation concepts at different levels, related by part-whole relationships. Finally, at Lower level we find Measures and Descriptors that can be aggregated into more complex concepts.

Gyssens and Lakshmanan: multidimensional database model

As some authors before, GL also define some required functionalities and drive their model to fulfill them:

• Ability to pose powerful ad hoc queries through a simple and declarative interface.

• Ability to restructure information.

• Ability to classify or group data sets.

• Ability to summarize values.

To accomplish these goals, the authors propose a Relational approach and define an "n-dimensional table schema" as a triple containing a dimension name set, an attribute set, and a function from dimension names to the attribute set, showing the attributes of each analysis dimension. From this definition, they develop an algebra based on the Relational algebra. Apart from the redefinition of classical Relational operators, the authors add other operators like "fold" and "unfold", in order to remove and add a Dimension to the schema respectively, as well as summarization and aggregation functions. Operators "fold" and "unfold" allow converting Measures into Descriptors and vice versa, since the attributes of the disappeared Dimension remain in the data cube as Measures. Therefore, the model allows a symmetric treatment of both of them. Moreover, it shows that every multidimensional table can be represented by a classical Relation and vice versa.

With regard to the modeling elements provided, we can distinguish Descriptors and Measures at Lower level. At Intermediate level, the Descriptors form Levels. If we subtract the Descriptors from the set of all attributes in the data cube, we could also consider a Cell to be implicitly defined.


Vassiliadis

Vas and Vasa present another formal model for multidimensional data, besides its mapping to ROLAP and MOLAP databases. Here, the cube algebra (demonstrated to be complete and sound) consists of just three operations, i.e. "navigate", "selection" and "split measures".

For each dimension of analysis, a set of "levels" is defined, forming a lattice bounded by "All" at the top and the detailed level at the bottom. A "dimension" consists of a set of "dimension paths", which are totally ordered lists of "dimension levels". A dimension level belongs to exactly one dimension and has an associated space of values. The dimension levels can be "monovalued" or "multivalued", depending on whether their domain is a set or a power set of the space of values.

A MDDB is defined as a set of dimensions, dimension levels and a "basic cube". A basic cube contains the data cells at the maximum level of detail. Over this, by means of the cube algebra, other cubes (which we could call views) are defined. The existence of the basic cube is justified by the impossibility of performing the drill-down operation without it.
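The role of the basic cube can be illustrated with a short sketch. It is not the algebra of Vas and Vasa: the function `navigate` below only borrows the operation's name, summation is an assumed aggregation function, and the dimension path (day, month, All) is hypothetical.

```python
from collections import defaultdict

# Hypothetical dimension path of a time dimension: day < month < All.
day_to_month = {
    "2001-01-15": "2001-01",
    "2001-01-20": "2001-01",
    "2001-02-03": "2001-02",
}
ancestor = {
    "day": lambda day: day,
    "month": lambda day: day_to_month[day],
    "All": lambda day: "All",
}

# Basic cube: the data cells at the maximum level of detail.
basic_cube = {"2001-01-15": 4, "2001-01-20": 6, "2001-02-03": 1}


def navigate(basic, level):
    """Derive a cube at the requested level by summing the basic cube."""
    derived = defaultdict(int)
    for day, value in basic.items():
        derived[ancestor[level](day)] += value
    return dict(derived)


by_month = navigate(basic_cube, "month")
total = navigate(basic_cube, "All")
by_day = navigate(basic_cube, "day")
```

Since `by_month` no longer knows how each monthly value splits into days, moving to a finer level is only possible by recomputing from `basic_cube`, which is the justification given above.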

In this model, a Dimension (at Upper detail) is composed of a lattice of Levels (at Intermediate). However, the dimension levels do not contain further details at Lower. If we look at Cells, which are not explicitly defined, we see that they are a tuple of Measures identified by the bottom levels of the different dimension lattices.

Pedersen: Extended Multidimensional Data Model

Besides a classification of multidimensional models, PJ, PJ and Ped (already referenced earlier) also present an "Extended Multidimensional Data Model" (EMDM). After the definition of the requirements (most of them refer to semantics) for the usage of a multidimensional model in a clinical context, and the verification that none of the existing models addresses all of them, this new model was defined.

EMDM provides a formalism and algebra that is closed and at least as strong as Relational algebra with aggregation functions. The operations in the algebra are "selection", "projection", "rename", "union", "difference", "identity-based join", "aggregate formation", "value-based join", "duplicate removal", "SQL-like aggregation", "star-join", "drill-down" and "roll-up". The implementation of the model using Relational databases is also explained.

An n-dimensional "fact schema" consists of a "fact type" and n "dimension types". In turn, a dimension type consists of a set of partially ordered "category types" forming a lattice. To each category type, an "aggregation type" has been associated, indicating the aggregate functions applicable at that level. The model treats dimensional and factual data symmetrically. Multiple hierarchies per analysis dimension, non-strict hierarchies, non-onto hierarchies, non-covering hierarchies, or many-to-many relations between facts and dimensions are allowed. However, there is no way to reflect such information in the schema. Instead, it is deduced from data instances. Moreover, relating values that represent the same concept along time is also possible thanks to temporal constructs.
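The association of an aggregation type to each category type can be pictured as a lookup that guards which aggregate functions may be applied. The sketch below is hypothetical throughout: the type names, the function sets and the sample categories are illustrative assumptions, not the definitions used in EMDM.

```python
# Hypothetical aggregation types, from the most to the least permissive.
ALLOWED_FUNCTIONS = {
    "additive": {"SUM", "AVG", "MIN", "MAX", "COUNT"},
    "averageable": {"AVG", "MIN", "MAX", "COUNT"},
    "countable": {"COUNT"},
}

# Hypothetical category types of a clinical schema with their aggregation type.
aggregation_type_of = {
    "Amount": "additive",
    "Age": "averageable",
    "Diagnosis": "countable",
}


def can_apply(function, category_type):
    """Check whether an aggregate function is applicable at a category type."""
    return function in ALLOWED_FUNCTIONS[aggregation_type_of[category_type]]


checks = {
    ("SUM", "Amount"): can_apply("SUM", "Amount"),
    ("SUM", "Age"): can_apply("SUM", "Age"),
    ("COUNT", "Diagnosis"): can_apply("COUNT", "Diagnosis"),
}
```

A query tool could consult such a table before offering an aggregation, so that meaningless results (for instance, a sum of ages) are never produced.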

The semantic constructs offered by the model are Dimensions and Facts at Upper detail level, and Levels at Intermediate. It cannot be considered that the model allows showing Cells. Data in the Facts can be related to any Level; however, this information cannot be shown in the schema. At Lower level, we find that Descriptors do not exist. Cells do not have attributes either; however, Dimension values are used as Measures.

Other work

There are other papers about multidimensional interfaces, multidimensional query languages, etc. (GJJ, GL, GBLP and BPT, among others) that also treat, as a minor subject, some kind of multidimensional model. These were left out of the classification because the models do not have any new or improved characteristics, nor was it the aim of the authors to present a multidimensional model.

Moreover, there is a lot of literature devoted to either ROLAP or MOLAP implementation (HRU and TS, among others). For instance, they present different kinds of indexing techniques or partition strategies. They were not included in this survey, in the section about models at physical level, because they do not model the multidimensional data, but just give useful hints to obtain good storage or query performance.

Metadata standards, either de jure like "Common Warehouse Metamodel" or de facto like "OLE DB for OLAP", have not been considered either, because they are not true data models. They do not aim to model the data cube, but to provide an interface that facilitates metadata interchange among OLAP applications.

Summary

The summary table contains the elements and relationships among them found at the schema level of each model (the meaning of the different columns is explained earlier). Notice that information about either instances or instantiation relations is not shown. As outlined in BSHD, some models do not separate cube structure and contents. In these cases, only the information contained in the schema has been taken into account. A cell containing a hyphen means the corresponding model does not provide any construct in that context, while a tick implies the model does provide some kind of construct.

It seems that Conceptual models offer the possibility of representing much more semantics than models at other levels. Indeed, Conceptual models do have to provide a rich set of semantic constructs in order to capture user ideas. In turn, Formalisms are those that offer the fewest conceptual constructs. However, notice they do offer an algebra, whose expressiveness was not considered in this work because the focus was on modeling constructs. At Physical level we find storage techniques instead of true data models. Thus, just one Physical model was reviewed. Moreover, no great variety of models was found at Logical level.

Looking at the table, we can appreciate that the more recent the models are (they are ordered chronologically within each design level), the more semantics they tend to capture. This can be interpreted as a trend to semantically enrich multidimensional models. However, having models that provide constructs at every heading does not mean they capture all possible semantics. There is neither a model encompassing the semantic constructs of the rest, nor a consensus or standard stating what should be represented in a multidimensional schema.


Authors | Model | Design Level | Upper: F D Rel | Intermediate: C L Rel | Lower: M d Rel
√ √ √ √ √
Lehner | NMDM | C
√ √ √ √ √ √ √ √
Cabibbo and Torlone | MD | CF
√ √ √ √ √ √ √
Golfarelli et al. | DFM | C
√ √ √ √ √ √ √ √
Trujillo et al. | GOLD | C
√ √ √ √ √ √ √ √ √
Sapia et al. | MERM | C
√ √ √ √ √ √ √ √ √
Sanchez et al. | IDEA | C
√ √ √ √ √ √ √
Tryfona et al. | starER | C
√ √ √ √ √ √
Nguyen et al. | | C
√ √ √ √ √ √ √ √
Hüsemann et al. | | C
√ √ √ √ √ √ √
Kimball | | L
√ √ √ √ √ √ √ √
Buzydlowski et al. | Object-Oriented OLAP | L
√ √ √ √ √
Mangisengi et al. | NR | L
√ √ √ √ √ √ √ √
Mangisengi et al. | ER | L
√ √ √ √ √ √ √
Gopalkrishnan et al. | ORV | LP
√ √ √ √ √ √ √ √ √
Moody and Kortink | | L
√ √ √
Dyreson | | P
√ √ √
Agrawal, Gupta and Sarawagi | | F
√ √ √ √
Li and Wang | MDD | F
√ √ √ √ √ √
Datta and Thomas | | F
√ √ √ √ √ √
Hacid and Sattler | | F
√ √ √ √ √ √
Gyssens and Lakshmanan | | F
√ √ √ √ √ √ √
Vassiliadis | | F
√ √ √ √ √ √
Pedersen | EMDM | F

Table: Summary table of the different multidimensional models

How multidimensional analysis benefits from OO

Probably the most important advantage of modeling the UoD (Universe of Discourse) by means of an OO model is that the result is closer to the user conception, i.e. it naturally reflects people's way of thinking. Every object or class modeled will have a correspondence with some real entity, making it quite easy to understand. We can also find other, less abstract, benefits in the OO paradigm:

Object-Oriented Software Engineering: Since the OO paradigm is widely used and well accepted in Software Engineering, it does not seem a good idea to break it by using a non-OO approach in data modeling. Moreover, an OO data model eases some specific tasks, like designing a Distributed Object System.



Non-First Normal Form (NF2): It is not mandatory to have flat, normalized entities. We can design objects containing non-atomic values. In some cases, this can be really useful, due to performance reasons or just because conceptually it is not necessary to create an unrealistic entity only to normalize the schema.

Object Identifier (OID): The existence of an OID solves the identification problem. A key is not enough to identify an entity. We must consider the case when the primary key changes and the identity of the object remains the same. In that case, an internal identifier without any real meaning is needed. It would keep the same value along the whole life of an object in the database, independently of any change in the represented entity. It is important to remember here two characteristics in the definition of W. Inmon, i.e. "non-volatile" and "time variant". Our data will evolve, and OIDs will be a useful tool for keeping them consistent.

Semantics: Expressiveness or semantic power, as it is defined in SCG, is the degree to which a model can express or represent a conception of the real world. It measures the power of the structures of the model to represent conceptual structures, and to be interpreted as such conceptual structures. The more expressive a model is, the better it represents the real world, and the more information about the data it gives to the user. An OO model is semantically richer than others, for instance ER or Relational. It is true we can enrich any of those others with OO features, but why should we do that if we can use a true OO model?
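The Object Identifier benefit listed above can be sketched in a few lines. The class `Client`, its attributes and the sample values are hypothetical. The point is only that facts refer to an internal identifier that survives a change of the real-world key.

```python
import itertools

_oid_sequence = itertools.count(1)


class Client:
    """A hypothetical entity whose real-world key may change over time."""

    def __init__(self, tax_id, name):
        self.oid = next(_oid_sequence)  # internal identifier without real meaning
        self.tax_id = tax_id            # key taken from the real world, may change
        self.name = name


clients = {}  # OID -> object
sales = []    # (client OID, amount): facts refer to the OID, not to the key

acme = Client("ES-111", "Acme")
clients[acme.oid] = acme
sales.append((acme.oid, 100.0))

# The represented entity changes its key; the identity of the object does not.
acme.tax_id = "ES-999"
sales.append((acme.oid, 50.0))

total_for_acme = sum(amount for oid, amount in sales if oid == acme.oid)
```

Had the sales referred to `tax_id` instead, the two rows would look as if they belonged to different clients.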

[Figure: Database schemas at three levels. External schema (User Model); Conceptual Schema (O-O Multidimensional Model); Database Schema (Logical Model, ROLAP/MOLAP).]

The aim of this section is just to outline how the OO paradigm could be used to help multidimensional modeling, by giving some examples. In the figure of database schemas at three levels, one can see the level where the discussion is placed. The interest is neither in the best user model nor in the best kind of database to use (either ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), HOLAP (Hybrid OLAP), or even the Object-Oriented OLAP presented in BSH could be good). The benefits of an OO data model to integrate the different multidimensional views and keep the semantics of the data at the conceptual level are highlighted (this is shown in more depth elsewhere in this thesis). If the user wishes to use a different one, it could always be translated to the desired model. The same can be said about the internal level: the usage of an OO multidimensional model does not imply we are storing the data in an OO database.

The stress is on showing the need for OO semantic concepts. Six OO-Dimensions (i.e. Classification/Instantiation, Generalization/Specialization, Aggregation/Decomposition, Behavioural, Derivability and Dynamicity) were enumerated in Sal. Each one of these OO-Dimensions adds a little semantic power to a data model. We are going to see how each one of them helps multidimensional modeling by allowing different relationships among data to be represented.

Throughout this section, "nexus" stands for any relationship, tagged or not, between two objects. Nexus are specialized for each OO-Dimension to obtain the different meanings. As can be seen in AR, nexus roughly corresponds to "Relationship" in UML terminology, a term that is not used here in order to be more general.

Classification/Instantiation

This OO-Dimension distinguishes between the occurrences and the schema. Every instance is related to at least one class in the schema by nexus in this OO-Dimension. All instances sharing some attributes and representing related concepts are grouped into a given class. In the same way, all elements in a schema (i.e. classes, nexus) representing related concepts in a data model are grouped into a metaclass. To finish the recurrence, all metaclasses can be grouped into exactly one meta-metaclass, which is an instance of itself. Of special interest in this OO-Dimension, present in all data models in one way or another, are the dynamic and multiple classification explained in MO.

Dynamic classification refers to the ability of instances to change the class they belong to. If we want to analyze sales depending on how good our clients are, and we have them classified into different classes, it is only a matter of time before we want to move a given client from one class to a different (hopefully better) one. We cannot delete the instance of Client in the database and create a new one in the new desired class, because we would get a new identity (OID) for it. That is not what we want to represent, since we did not lose a client and find a new one. It was just our consideration (classification) of a client that actually changed, and that is exactly what the data model should be able to represent. This is one case of "Slowly Changing Dimensions", where the change affects the classification of the object. The general problem is explained in section.

On the other hand, multiple classification refers to the possibility of having an instance classified in more than one class (not related by Generalization/Specialization nexus) at the same time. For instance, it is perfectly possible for a client to be a provider at the same time. Since there is no relationship between the Client and Provider classes, we need to have the same instance classified in both of them (multiply classified).

These characteristics are always desirable. Specifically, in the field of data warehousing, the words "non-volatile" and "time variant", together with the OLAP need to analyze relatively long periods of time, emphasize their importance. Dynamic and multiple classification are really interesting because of the flexibility needed to represent the large number of changes that occur over the long periods of time usually taken into account in analysis tasks.
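Dynamic and multiple classification, as described above, can be sketched in Python as follows. The class Instance, its method names and the class labels (RegularClient, GoldClient, Provider) are hypothetical; the only point is that the OID survives every change of classification.

```python
class Instance:
    """An object whose identity (oid) is independent of its classes."""

    def __init__(self, oid, *classes):
        self.oid = oid
        self.classes = set(classes)

    def classify(self, cls):
        # Multiple classification: add a class without dropping the others.
        self.classes.add(cls)

    def reclassify(self, old, new):
        # Dynamic classification: move the same instance to another class.
        self.classes.remove(old)
        self.classes.add(new)


acme = Instance(1, "RegularClient")
acme.classify("Provider")                       # client and provider at once
acme.reclassify("RegularClient", "GoldClient")  # same OID, better class
```

Deleting and recreating the object would instead produce a new OID, which is exactly the situation the text argues against.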

Generalization/Specialization

Another OO-Dimension is that of Generalization/Specialization relationships (represented in UML by means of "Generalization"). The nexus in this OO-Dimension relate two classes (or metaclasses). One of those classes has a more specific meaning than the other. The more general class is called the superclass with regard to the specific one, referred to as the subclass. As a consequence of this kind of nexus, we obtain inheritance. That is, the subclass inherits the properties and methods of its superclass (or superclasses). If a class is allowed to have more than one superclass, we gain multiple inheritance (a class inherits from all its superclasses at once). Every class will have, besides its own attributes, the attributes and relationships of each one of its superclasses. Note that this is completely different from multiple classification, where an instance is classified in multiple classes.

Figure: Example of Generalization/Specialization (Dimensions, D: Person, Product, Clerk, Time, Client; Facts, F: Sale, Cash, Credit; nexus shown: Generalization and Association)

In the figure we can see an example of a multidimensional schema. It has Sales as Fact, and Clerk, Time, Product and Client as analysis dimensions. Thus, the subject of analysis is Sales, and we want to analyze it depending on the clerk who sold, the moment the sale was made, the product sold, and the client who bought. Besides that basic information, other details are also represented by means of nexus in this OO-Dimension:

- the Sales Fact is specialized into two different Facts (i.e. Cash and Credit), depending on the kind of payment; and

- two Dimensions (i.e. Clerk and Client) are related by generalizing them into the same class (i.e. Person).

By specializing Facts, you can generate new data cubes (if they contain any different data), or at least show a criterion to select the facts involved in the analysis. In the example, if Sales had different attributes depending on the kind of payment, we would obtain three different data cubes to be analyzed, i.e. two containing the Measures specific to each kind of payment, and another one with those Measures shared by both of them. Conversely, if it had no attributes other than those common to both kinds of payment, we could still analyze the Sales depending on whether the payment was made in cash or by credit card. This could also be achieved by just adding an attribute to the facts, but it would give a slightly different tint.

With regard to relating two Dimensions, it shows a common domain between them, so that it is possible to compare their instances or restrict both Classes at the same time. In the example, analysts could formulate queries comparing instances of Client and Clerk, because the data schema shows both as subclasses of the same class (i.e. Person). Moreover, we could consider the possibility of class Person being used in a different multidimensional schema, which would become directly related to that of Sales by means of the nexus between the Dimensions. This would point out the relationship between facts, easing navigation through the data.
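The two uses of Generalization/Specialization discussed above (selecting facts by subclass and comparing instances of Dimensions that share a superclass) can be sketched in Python. The attribute names, the sample data and the function total are assumptions made for this illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


class Clerk(Person):
    """Dimension class specializing Person."""


class Client(Person):
    """Dimension class specializing Person."""


@dataclass
class Sale:
    clerk: Clerk
    client: Client
    amount: float


class CashSale(Sale):
    """Fact class specializing Sale (payment in cash)."""


class CreditSale(Sale):
    """Fact class specializing Sale (payment by credit card)."""


sales = [
    CashSale(Clerk("Ann"), Client("Bob"), 10.0),
    CreditSale(Clerk("Ann"), Client("Ann"), 25.0),
    CreditSale(Clerk("Joe"), Client("Bob"), 5.0),
]


def total(facts, kind=Sale):
    # The subclass acts as the selection criterion for the facts.
    return sum(f.amount for f in facts if isinstance(f, kind))


# Clerk and Client share the Person domain, so they can be compared.
sold_to_themselves = [s for s in sales if s.clerk.name == s.client.name]
```

Here total(sales) covers the general cube, while total(sales, CashSale) and total(sales, CreditSale) restrict the analysis to each specialization.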

Aggregation/Decomposition

By means of this OO-Dimension it is possible to build new objects as a result of the aggregation of others, which in turn can be aggregations as well. Two different kinds of nexus belonging to it can be distinguished. Based on their strength, nexus in the Aggregation/Decomposition OO-Dimension can denote:

Part-Whole: if the new object is conceived as composed of others, which are its parts. This is called a "part-whole" relationship by some authors, and implies an existence dependency between both sides of the nexus (i.e. the whole cannot exist without its parts). This is called "Aggregation" in UML terminology.

Simple aggregation: if the aggregating objects are just characteristics of the new one. They could have an existence dependency too, but it is not an implication of the existence of the nexus itself. This is called "Association" in UML terminology.

The usage of this OO-Dimension in multidimensional design is mandatory, since it helps to represent some of the most common situations, as well as others that may not be so common:

- Firstly, it helps to define the analysis dimension hierarchies by means of part-whole links. A dimension hierarchy can be defined as a lattice with the class corresponding to the maximum level of detail in the facts at the bottom, and a class representing the whole set of points in the Dimension at the top (see section for an explanation of the properties of aggregation hierarchies). In between, we have other Levels corresponding to different data granularities. For instance, if we collect data hourly, the Time Dimension would have the Hour class at the bottom, which would compose Day above it, which would give rise to Week and Month, and so forth. The lattice would be closed at the top by an All class containing exactly one instance, representing all time points in the database. These hierarchies are used to roll up the data in the database, making its granularity coarser. Moving up (e.g. rolling up from days to months) or down (e.g. drilling down from years to months) along a hierarchy, we obtain less or more detail in the data, respectively.

- On the other hand, using any kind of nexus in this OO-Dimension, we can relate either Levels or Cells to their attributes. These attributes will be used to ease the selection of the facts to be considered in a given analysis, by allowing them to be selected depending on the attribute values.

- Nexus between the Cells and the Levels in every dimension hierarchy are in this OO-Dimension as well. They could be part-whole or simple aggregation nexus but, whether denoting part-whole or not, a fact will be identified by one object at each linked analysis dimension (or more than one, if the Dimension has more than one nexus with the facts class). Thus, the nexus with the analysis dimensions will form the class key of the facts, and that is what really distinguishes them from other attributes. Sales can be identified by the product sold, the clerk who sold it, the time when it was sold, and the client who bought it. Therefore, in the figure of the Generalization/Specialization example, we associate Sales with Dimensions Clerk, Time, Product and Client.

- Finally, part-whole nexus can be found between classes of facts. By reflecting these nexus in the schema, we will also allow navigation between different stars.
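The role of part-whole hierarchies in rolling up data can be sketched in Python for the Time Dimension mentioned above. The LEVELS table, the function roll_up and the sample cells are hypothetical, and summation is assumed to be a valid aggregation for the measure.

```python
from collections import defaultdict
from datetime import datetime

# Each Level maps an hour-granularity time point to the instance of that
# Level it is part of; "All" has exactly one instance.
LEVELS = {
    "Hour": lambda t: t.strftime("%Y-%m-%d %H"),
    "Day": lambda t: t.strftime("%Y-%m-%d"),
    "Month": lambda t: t.strftime("%Y-%m"),
    "All": lambda t: "All",
}


def roll_up(cells, level):
    """Aggregate (time point, amount) cells at the requested Level."""
    totals = defaultdict(float)
    for when, amount in cells:
        totals[LEVELS[level](when)] += amount
    return dict(totals)


cells = [
    (datetime(2002, 1, 1, 9), 5.0),
    (datetime(2002, 1, 1, 10), 7.0),
    (datetime(2002, 1, 2, 9), 1.0),
    (datetime(2002, 2, 1, 9), 2.0),
]

by_day = roll_up(cells, "Day")
by_month = roll_up(cells, "Month")
overall = roll_up(cells, "All")
```

Rolling up from Day to Month reduces the number of cells, and the All level collapses the whole Dimension into a single value.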

Figure: Example of Aggregation/Decomposition (Cells, C: Trip, Flight; Levels, L: All, Agency, Day, Time, Airport, Region, Company; Airport is linked to the Cells through the Origin and Destination roles; nexus shown: Association and Aggregation)

The example in the figure depicts two classes of facts sharing some analysis dimensions and related by a part-whole nexus. The first class of facts is Flight. We are interested in analyzing each flight depending on the time it takes place, the airline company that owns the plane, and its origin and destination airports (it is related to the corresponding analysis dimensions by simple aggregation nexus). At the same time, we want to analyze the sequences of flights that give rise to whole trips sold by travel agencies. The fact that an instance of Trip is composed of a set of instances of Flight is represented by connecting Trip and Flight by means of a part-whole nexus. Trip is also connected to the corresponding analysis dimensions by simple aggregation nexus. Moreover, it is important to notice that two of those Dimensions contain more than one class, connected again by part-whole nexus. A region is composed of a set of airports, in the same way that a day is a set of time points.

In order to keep it simple and understandable, the example does not contain the nexus representing the attributes of the facts and dimension classes, which would belong to the Aggregation/Decomposition OO-Dimension as well. The four classes at the top of each one of the analysis dimension hierarchies (i.e. All) always exist and contain exactly one instance, corresponding to the whole set of instances in the lowest granularity level of the Dimension.

Behavioural (Caller/Called)

In OO, objects exchange messages. A class accepts certain kinds of messages from instances of other classes, which trigger the execution of methods (i.e. queries, updates, calculations, etc.). The nexus in this Behavioural OO-Dimension (also known as Caller/Called) show when a class is allowed to invoke a given method in another class. This concept could be identified as a kind of "Permission" in UML terminology.

As pointed out in Fir, Relational entities represent tables (purely passive containers for data) and, since they are not real objects, are independent of behaviours. The inclusion of methods in the data model helps to model the behaviour together with the data. It looks like a bad idea to have two different, separated models for statics and dynamics. Specifically, in multidimensional modeling, by associating operations with a domain we would be able to know which aggregation functions can be used on a given Measure. For instance, as explained in GMRa, we can find semi-additive attributes (those that are not additive along one or more Dimensions) or non-additive attributes (which are additive along no Dimension). Temperature should be marked as non-additive (nobody could call an additive method on it), and InventoryLevel as semi-additive, since it cannot always be added (e.g. along the Time dimension). This does not imply that other aggregation operations could not be applied to those Measures. Therefore, we need to show the applicability of every different operation.
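The idea of declaring which aggregation operations are applicable to each Measure can be sketched in Python. The class Measure, the method allows and the set of Dimensions are hypothetical; in this sketch only summation is restricted, which is a simplifying assumption.

```python
# Hypothetical set of analysis Dimensions for the example.
ALL_DIMS = frozenset({"Time", "Product", "Store"})


class Measure:
    def __init__(self, name, additive_along):
        self.name = name
        # Dimensions along which the values may be summed.
        self.additive_along = frozenset(additive_along)

    def allows(self, function, dimension):
        """Tell whether `function` may aggregate this Measure along `dimension`."""
        if function == "sum":
            return dimension in self.additive_along
        # Other operations (avg, min, max) stay applicable in this sketch.
        return True


quantity = Measure("Quantity", ALL_DIMS)                    # additive
inventory = Measure("InventoryLevel", ALL_DIMS - {"Time"})  # semi-additive
temperature = Measure("Temperature", ())                    # non-additive
```

A query tool could consult allows before offering an aggregation, so that nobody sums Temperature, or InventoryLevel along Time.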

Moreover, methods facilitate the implementation of complex aggregate functions. In an analysis environment, it is important to keep track of the way the Measures are obtained. It is not advisable to allow users to implement their own ad hoc functions: it is error-prone and leads to misunderstandings. OO concepts such as inheritance, polymorphism or encapsulation fit perfectly at this point. For instance, suppose we would like to obtain the delay of a flight, defined as the difference between the expected and real durations (actually not a complex function). A problem could arise if the expected duration of the flight were kept as a time interval. In that case, the difference could be computed by subtracting the minimum, maximum or even midpoint expected duration, which result in completely different values. It probably does not matter how the result is obtained, but we must ensure it is always calculated in the same (easy to change) way, to be able to compare the values obtained by different users or even in different sessions.
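The flight delay example can be sketched in Python by encapsulating the calculation in a single method. The attribute names and the choice of the midpoint of the expected interval are assumptions of this sketch; the text does not prescribe which value should be subtracted.

```python
from dataclasses import dataclass


@dataclass
class Flight:
    expected_min: float   # expected duration interval, in minutes
    expected_max: float
    real: float           # real duration, in minutes

    def expected(self):
        # Single, easy-to-change definition shared by every user.
        # Using the midpoint is an assumption made for this sketch.
        return (self.expected_min + self.expected_max) / 2

    def delay(self):
        return self.real - self.expected()


flight = Flight(expected_min=100, expected_max=120, real=125)
```

Because every analyst calls delay instead of writing an ad hoc formula, changing the definition of expected in one place changes it consistently for all users and sessions.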

Leaving those considerations aside, this OO-Dimension is also important for security reasons, but that is completely out of the scope of this thesis.

Derivability

Semantic Relativism of a data model is defined in SCG as the degree to which the model can accommodate not only one, but many different conceptions. It is really important because, since different persons perceive and conceive the world in different ways, the data model should be able to capture all of them. This is represented in UML by means of "Derivation" relationships.

The Derivability OO-Dimension (also known as Point of View) helps to represent the relationships between abstractions in different conceptions of the UoD. The database does not need to physically keep all those conceptions, but only their definitions and the different relationships among them. In general, it is not good to store derived data, except for performance reasons (not considered for now). What we really do need to store is that derived data exists and how it is obtained. Herein lies the importance of this OO-Dimension. Derivation mechanisms can be used to easily restructure the schemas in order to show them in the way the user wants, closer to his/her thoughts. Summing up, the Derivability OO-Dimension is used to define derived data.

Some analysts do not mind whether data are atomically stored in the database or not. In this sense, it is desirable that derived and atomic Measures are treated equally. However, others would like to know how Measures are obtained. Therefore, the definition of the derived Measures should be in the schema of the database, as Relational views are. This allows either hiding the complexity or knowing where something comes from, depending on the user needs. At the same time, as in the Behavioural OO-Dimension, this also makes it possible for groups of users to have the same definitions available.

In multidimensional modeling, it is especially important to have the powerful possibilities offered by this OO-Dimension. When a fact is being analyzed, what really matters is to be able to see it from as many points of view as possible. Therefore, it is crucial to have the mechanisms to define those different views of the data. For instance, all summarized data are related to their detail data by a nexus in this OO-Dimension. If we did not have it, we would not have any kind of summarized data. It is necessary to show such nexus at the conceptual level to understand the real meaning of data and where they come from.

Going back to the example in the Aggregation/Decomposition figure, we can see that the origin of a Trip would be derived from the origins of the Flights that compose it, by taking the first one (destination would be defined in the same way); the duration of a Trip would be a function of the duration and take-off times of the different Flights; and so on.
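The derivation of Trip attributes from its Flights can be sketched in Python with computed properties, so that only the definitions (not the derived values) are kept. The attribute names and the exact duration formula (arrival of the last flight minus take-off of the first one) are assumptions of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Flight:
    origin: str
    destination: str
    take_off: datetime
    duration: timedelta


class Trip:
    """A Trip is composed of Flights; its attributes are derived from them."""

    def __init__(self, flights):
        self.flights = sorted(flights, key=lambda f: f.take_off)

    @property
    def origin(self):
        return self.flights[0].origin          # origin of the first Flight

    @property
    def destination(self):
        return self.flights[-1].destination    # destination of the last one

    @property
    def duration(self):
        first, last = self.flights[0], self.flights[-1]
        return last.take_off + last.duration - first.take_off


trip = Trip([
    Flight("MAD", "LIS", datetime(2002, 5, 1, 10, 0), timedelta(hours=1, minutes=15)),
    Flight("BCN", "MAD", datetime(2002, 5, 1, 8, 0), timedelta(hours=1)),
])
```

An analyst who wants to know where trip.duration comes from can read its definition, while others can use it exactly like a stored Measure.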

Dynamicity

This OO-Dimension refers to changes over time. These changes can be considered at three different levels:

Object: Objects are created, deleted and also updated. Keeping the history of those updates is often referred to as "Versioning".

Class: Like the objects, the data schema can be updated too. New classes are created, old ones are deleted, and others just modified, in what is called "Schema Evolution".

Metaclass: In the same way we can modify classes, we can add new metaclasses (notice that we can neither modify nor delete them). This means having an "Extensible Data Model".

If we just wanted to represent the current reality, we would not need to consider the Dynamicity OO-Dimension. However, it is common to need past states. Therefore, changes need to be kept, and often stamped with some kind of time tag to know when they happened.

In multidimensional analysis tasks, time is an omnipresent dimension. Moreover, to make things worse, analysts frequently consider a scale of years. If we add how fast things change nowadays, we can see the importance of this OO-Dimension for multidimensional modeling. It is almost impossible to find a business that has not changed at all in the last three or five years, and those changes must be reflected in the corresponding information system. Leaving aside changes in metaclasses, we want to see the need to consider the other two kinds of changes, i.e. those in objects and classes.

The importance of user requirements makes schema evolution an important issue. When the user requirements or conceptions change, it is advisable to change the data schema accordingly. A change in a class or nexus should be shown in the schema by connecting the old and the new version with a Dynamicity nexus. By doing so, analysts can easily see the available data and the meaning of the results they are obtaining. For instance, when the definition of a derived Measure changes, the analyst is able to compare the results obtained using the new and the old definitions. Moreover, if some attribute is no longer kept, or a new one is added, the analyst can know whether it can be queried or not at a given point in time.

A special case of changes in the data is referred to in Kim as "Slowly Changing Dimensions". It arises when attributes in analysis dimension classes are modified. The old values must be kept, because the facts previous to the change are probably still related to them, while the new ones will be referred to by the facts occurring from now on. However, both instances represent the same entity in reality, and this has to be outlined by a nexus between them. Clearly, if an airport increases its number of runways, it would be incorrect to analyze the air traffic previous to the enlargement with regard to the new number of runways. Therefore, we need to have two instances of the same airport related by a Dynamicity nexus, showing that they represent the same object.
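The airport example can be sketched in Python by linking the versions of a dimension instance through a reference that plays the role of the Dynamicity nexus. The class AirportVersion, the year-based time tag and the function version_at are hypothetical simplifications; how versions are actually stored is not addressed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AirportVersion:
    name: str
    runways: int
    valid_from: int   # year in which this version became valid
    # Dynamicity nexus: link to the version this one replaces.
    previous: Optional["AirportVersion"] = None


def version_at(latest, year):
    """Follow the Dynamicity nexus back to the version valid in `year`."""
    version = latest
    while version is not None and version.valid_from > year:
        version = version.previous
    return version


old = AirportVersion("BCN", runways=2, valid_from=1990)
new = AirportVersion("BCN", runways=3, valid_from=2000, previous=old)
```

Facts from 1995 are then analyzed against version_at(new, 1995), i.e. the airport with its former number of runways, while both versions remain recognizable as the same airport.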

Studying the storage of versions of data is completely out of the scope of this thesis. The point here is to outline the importance of reflecting changes in the schema, so that they can be taken into account by analysts. This is similar to the schema evolution problem, but worsened because we need to keep the old schema.

Conclusions

In this chapter, a framework that allows us to classify and compare multidimensional models has been presented and exemplified by studying some representative models. There exist previous studies comparing different multidimensional models (see section). However, those studies intended to show their shortcomings against a given list of requirements, and models with completely different purposes were put into the same bag. In contrast, here research efforts have been classified into different levels (i.e. Conceptual, Logical, Physical and Formalisms), based on their usage in the multidimensional design process (or on whether they are not conceived for such a process). Furthermore, a framework was given to compare the terminology used by different authors for the constructs of their models.

Throughout this chapter, the issues of multidimensional modeling have been introduced. Probably because of the interest of the industry in the subject, it is being developed mainly in a commercial way. This means stressing performance and passing over semantics and conceptual modeling. Multidimensional semantics are really important because of their proximity to the inherent structure of the problem domain, but they are not the only ones to be represented. Other semantics should not be overlooked. It is not enough to have an isolated multidimensional schema reflecting how the user will access the information, leaving aside the representation of other data relationships.

The applicability of six OO-Dimensions (i.e. Classification/Instantiation, Generalization/Specialization, Aggregation/Decomposition, Behavioural, Derivability and Dynamicity) to semantically enrich multidimensional schemas has been shown by means of examples. This is a really important point since, as shown in this chapter, most authors consider isolated stars composed of a central fact table and different flat, denormalized dimension tables arranged around it, each one related to the central table by a foreign key. That is not a bad idea at all, but there is much more information about the data subject of analysis that the schema could contain, which would be really useful to the analysts (the users of the multidimensional system).

For the sake of simplicity and understandability, stars are usually represented in an isolated manner. The need to provide an overall view of the data has been stressed. Multidimensional analysis is used in decision-making processes. Therefore, the more global the view provided, the more the schema helps the users. It is really important to offer an integrated vision of the business or subject of analysis, in order to give analysts a unified set of data instead of lots of puzzle pieces. The proposal is to relate the puzzle pieces by means of nexus in the different OO-Dimensions. Thus, existing multidimensional models are not enough.


Chapter

Multilevel schemas architecture

For the fashion of Minas Tirith was such that it was built on seven levels, each delved into a hill, and about each was set a wall, and in each wall was a gate.

J.R.R. Tolkien, "The Return of the King"

In this chapter, an architecture of seven schema levels for Federated Information Systems (FIS) is related to Data Warehousing schemas, which provides a better understanding of the characteristics of every schema, as well as of the way they should be defined. Because of the confidentiality of the data used to make decisions, and the federated architecture used, data protection issues are also mentioned.

Navigation among different stars is usually overlooked in the literature. Thus, this chapter studies different kinds of conceptual relationships between stars (i.e. Derivability, Generalization/Specialization, Aggregation/Decomposition and Dynamicity), and analyzes how they fit into the schemas architecture. The aim is to ease the implementation and usage of multi-star schemas.

Firstly, section presents the integrated schemas architecture for FIS and DW. Then, section shows different semantic possibilities to relate stars, and how this affects the schemas architecture.

Extending a schemas architecture for Data warehousing

DW is a relatively new area of study. The idea behind this section is to use the advances already made in other subjects to benefit it. The presence of the integration concept in the definition of a DW given in Inm invites us to choose FIS as a first-class candidate to contribute its advances. Specifically, the location of DW schemas in an architecture for FIS is proposed, which allows them to be studied from this point of view.

Figure: Seven-level schemas architecture, ROSC (from top to bottom: User External Schemas, in the user models; External Schemas, Authorization Schemas, Federated Schema, Export Schemas and Component Schemas, in the Canonical Data Model; Native Schemas, in the native models)

The architecture of seven schema levels depicted in the figure will be used for that purpose. This architecture was presented in ROSC as an extension of that in SL, in order to separate different issues in the process of obtaining the user schemas from the federated ones. Its different schema levels (bottom-up) are:

- Native Schema is the conceptual schema of a Component Database (CDB), expressed in the native data model.

- Component Schema is the conversion of the Native Schema into the Canonical Data Model (CDM), which in this architecture is BLOOM (defined in CSG).

- Export Schema represents the part of the Component Schema that is available to a class of federated users.

- Federated Schema is the integration of multiple Export Schemas. Each Federated Schema supports exactly one semantics.

- Authorization Schema is defined to apply a Multilevel Security (MLS) policy. MLS is a Mandatory Access Control (MAC) mechanism to protect data, where access right authorizations are not used; instead, access decisions depend on security levels (organized as a partially ordered set) associated with each subject and each protected object. The security level associated with a subject is named its Clearance Level. Each one of the Authorization Schemas represents a subset of the Federated Schema, which is accessible by a class of federated users (subjects) with a certain Clearance Level. The set of data included in an Authorization Schema is classified at the same level as, or at a lower level than, the level corresponding to the Authorization Schema.

- External Schema defines a schema for a class of users and/or applications. It is still expressed in the CDM.

- User External Schema is the conversion of an External Schema to the user data model.
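How an Authorization Schema selects a subset of the Federated Schema according to a Clearance Level can be sketched in Python. The three security levels and their ordering (U below C below S), the BELOW table and the sample classification are assumptions made for this illustration; the actual set of levels is only required to be partially ordered.

```python
# Hypothetical partial order, given as direct "is below" edges: U < C < S.
BELOW = {"U": {"C"}, "C": {"S"}, "S": set()}


def dominates(high, low):
    """True if `low` is the same level as `high` or lies below it."""
    if high == low:
        return True
    return any(dominates(high, upper) for upper in BELOW[low])


def authorization_schema(federated, clearance):
    """Keep the components classified at or below the Clearance Level."""
    return {
        component: level
        for component, level in federated.items()
        if dominates(clearance, level)
    }


# Hypothetical classification of some Federated Schema components.
federated = {"f_customers": "U", "f_producers": "C", "f_risk": "S"}

visible_to_c = authorization_schema(federated, "C")
```

With a Clearance Level of C, the sketch exposes the components classified U and C and hides the one classified S.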

The idea of relating Data Warehousing and federated databases was already presented in SSSB. In this section, the architecture is studied in more depth, paying special attention to the new schemas that appear in order to achieve Data Warehousing. Section presents the schemas used to exemplify the architecture; section explains the new schema levels one by one; and in section the schema levels are presented all together. Finally, section lists the different kinds of operations over schemas necessary to achieve the whole transformation process from Native Schemas to User External Schemas.

An example

An example previously introduced in ROSC is used to illustrate the architecture. The Federated Schema is obtained from the integration of two CDBs that belong to an industrial waste transport enterprise.

CDB1 (Relational):

    shipment[P]
    sh_code  waste_code  prod_code  rec_code  TC
    0111     060105      123        321       P

    waste[P]
    waste_code  waste_name   risk[c]  precautions[R]        TC
    060105      Nitric Acid  3        Explosive; Keep cool  C
    060105      Acid         -        Explosive: Keep cool  R
    060105      Liquid       -        -                     P

    customers[P]
    customer_code  customer_type  company_name  address         TC
    123            P              C.I.Q.S.A.    Pi, 5, Reus     R
    321            R              T.R.I.S.A.    Major, 2, Vich  P

CDB2 (BLOOM):

    class receivers[L] {
      aggregation_of
        company_name: strings
        treatment_type: tr_codes
        rec_code: rec_codes
        ...
      class_key rec_code
      oc: {L,H}
    }

    class producer[H] {
      aggregation_of
        company_name: strings
        ...
        pr_code: pr_codes
      class_key pr_code
      oc: {H}
    }

    class shipment[L] {
      aggregation_of
        wa_code: wa_codes
        wa_name: strings
        risk: integers[H]
        precaution: strings[H]
        producer: producers[H]
        receiver: receivers
        sh_code: sh_codes
        ...
      class_key sh_code
      oc: {L,H}
    }

Figure: Examples of Component Schemas of CDB1 and CDB2

In the figure we can see the conceptual schema of CDB1 (and some data in it), expressed in the Relational data model, as well as the conceptual schema of CDB2, expressed in the BLOOM data model (whose syntax can be found in AORS).

The five classes of the Federated Schema shown in the figure (in the CDM, i.e. BLOOM), as well as the partially ordered set of security levels of the FIS and the classification of all components of the Federated Schema, are obtained through data schema integration and security policies integration processes, respectively. These integration processes are out of the scope of this thesis. For further information, the reader can see GSC (data schema integration) and OS (security policies integration).

    class f_customers [U] {
      alte_graliz_of customers_db1, customers_db2
        by db_discr delete_effect propagate
      alte_graliz_of f_producers [C], f_receivers
        by f_customer_type delete_effect propagate
      aggregation_of
        db_discr: {db1, db2}
        f_cu_code: cu_codes
        f_customer_type: strings
        f_telephone_number: strings
        ...
      class_key f_cu_code, db_discr
    }

    class f_producers [C] {
      alte_graliz_of producers_db1, producers_db2
        by db_discr delete_effect propagate
      alte_spaliz_of f_customers
        by f_customer_type delete_effect propagate
      aggregation_of
        db_discr: {db1, db2} [inherited]
        f_prod_code: pr_codes [inherited]
        f_customer_type: strings [inherited]
        f_telephone_number: strings [inherited]
        ...
      class_key f_prod_code, db_discr
    }

    class f_receivers [U] {
      alte_graliz_of receivers_db1, receivers_db2
        by db_discr delete_effect propagate
      alte_spaliz_of f_customers
        by f_customer_type delete_effect propagate
      aggregation_of
        db_discr: {db1, db2} [inherited]
        f_rec_code: rec_codes [inherited]
        f_customer_type: strings [inherited]
        f_telephone_number: strings [inherited]
        ...
      class_key f_rec_code, db_discr
    }

    class f_wastes [U] {
      alte_graliz_of wastes_db1, wastes_db2
        by db_discr delete_effect propagate
      aggregation_of
        db_discr: {db1, db2}
        f_wa_code: wa_codes
        f_risk: integers [S]
        f_precautions: strings [C]
        ...
      class_key f_wa_code, db_discr
    }

    class f_shipments [U] {
      alte_graliz_of shipments_db1, shipments_db2
        by db_discr delete_effect propagate
      aggregation_of
        db_discr: {db1, db2}
        f_sh_code: sh_codes
        f_waste: f_wastes
        f_producer: f_producers [C]
        f_receiver: f_receivers
        ...
      class_key f_prod_code, db_discr
    }

Figure: Example of Federated Schema
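The way a federated class generalizes the corresponding classes of both component databases, using the discriminator db_discr as part of the class key, can be sketched in Python. The function federate and the sample rows are hypothetical and ignore security classification; the real integration process is the one described in the cited works, not this sketch.

```python
def federate(sources):
    """Merge rows of several component databases into one federated class.

    `sources` maps a database name to its list of rows (dicts). The
    federated key is the pair (cu_code, db_discr), so equal codes coming
    from different databases do not collide.
    """
    federated = {}
    for db_discr, rows in sources.items():
        for row in rows:
            key = (row["cu_code"], db_discr)
            federated[key] = {**row, "db_discr": db_discr}
    return federated


# Hypothetical sample data: the same code appears in both databases.
sources = {
    "db1": [{"cu_code": "123", "company_name": "C.I.Q.S.A."}],
    "db2": [{"cu_code": "123", "company_name": "T.R.I.S.A."}],
}

f_customers = federate(sources)
```

Physically storing the result of such a merge, instead of computing it on demand, is the kind of materialization that yields the Operational Data Store discussed later in this chapter.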

The parts

In this section, each one of the schemas that help in Data Warehousing is dissected. As shown in the figure, all of them are obtained from the Federated Schema, which means that the construction of the DW does not start from scratch. Instead, the integration work is assumed to be already done.

Figure: Data Warehousing schemas architecture from the Federated Schema (schemas shown: Federated Schema, Logical ODS Schema, Data Warehouse Schema, Logical DW Schema, Authorization DW Schema, External DW Schema, External MD Schema, Logical DM Schema, User External DW Schema and User External MD Schema; data models shown: Canonical Data Model, ODS Model, DW Model, Canonical Data Model Extended To Express Time, Multidimensional Data Model, DM Model and User Models; the transformations between schemas are tagged (1) to (8))


The Operational Data Store Schema

Once we have a Federated Schema, the first thing we could do is to materialize it (see the corresponding tag in the figure). This means physically storing data to improve query response times. How that materialization is performed is completely out of the scope of this thesis. What really matters is what we obtain with that materialization, i.e. the Operational Data Store (ODS).

If we analyze the definition of an ODS, we can see that a Federated Schema also satisfies the collective integrated operational needs demanded in I IS. Concerning the characteristics of an ODS enumerated in its definition (i.e. subject-oriented, integrated, current-valued and volatile), the federated data is obviously integrated, volatile, current-valued and detailed. With regard to subject-oriented, this does not come from the integration mechanisms, but from the purpose of whoever integrates, who should choose a subject-oriented schema out of the multiple possibilities that integrate the data sources. Thus, it is always possible to obtain a subject-oriented Federated Schema with a small extra design effort.

Therefore, to obtain an ODS (if better response times are required), we just have to physically store federated data, solving problems related to CDBs (interdependencies, polyinstantiations, etc.). Its schema will be exactly one of the Federated Schemas. Notice that we could obtain many different Federated Schemas from different integration and negotiation processes. Not all these schemas can be used for the ODS. The implication goes the other way: the ODS schema is one of the possible Federated Schemas.

The Data Warehouse Schema

The second thing we could do with the Federated Schema is to define the historic storage schema needed to support the decision-making process. A decision about which data is going to be stored needs to be made. We need to choose a set of integrated data that could be interesting to analyze. Most of the literature seems to suggest the usage of star shape schemas at this point.

The main advantages of star shape schemas are their simplicity and proximity to the business analysis concepts. This makes them quite easy for the final users to understand. However, even more important is the fact that they imply a given kind of queries. Their structure is quite concrete, and allows proposing specific optimizations, access paths and storage methods. Stars are probably the best way to study some isolated facts with regard to the desired analysis dimensions. However, they are not as good at keeping the data of the whole business. In the seven levels architecture, the DW is what KRRT calls the Storage Structure, and it is only accessed to solve a small number of specific queries. As outlined in IIS, there is no homogeneous access pattern in the DW, and that is why isolated star shape schemas do not fit well.

We do not want to have little knowledge islands, but a huge, fully connected continent to travel around. Star shape schemas do not seem semantically rich enough to represent the business process all at once and accomplish that goal. The DW is used to represent the data all together. Thus, its strength does not have to be in easy querying, but in good integration and data semantics representation. Precisely because of that, the CDM of the federation could be the basis for a good data model for the DW. OO models were found to be good CDMs in SCG, so we could think of having an OO data model for the DW. Moreover, the importance of the time dimension in analysis tasks as well as in the DW (notice the presence of the words time-variant and non-volatile in its definition) suggests a temporal extension of the model.

The process to obtain the DW Schema (see the corresponding tag in the figure) should not be query-driven, but data-driven. We need to choose a Federated Schema containing the data of interest for the analysis, and represent time in it. This means a semantic enrichment of the Federated Schema along temporal dimensions, reflecting new temporal integrity constraints and security restrictions. It is not as simple as extending the keys (OIDs in our case) with an element of time. BFG contains a temporal extension of an OO data model.

As in SA, we should use two different kinds of time: Transaction Time and Valid Time, as defined in DGK. Storing the time at which data enters the system (i.e. Transaction Time) is mandatory and always possible. At least two different times could be considered in this temporal dimension. The first one would be the time when the data was introduced in the operational applications, and the other one would be the time of entrance in the DW itself.

When talking about Valid Time, we could consider two different times as well. The first one would be applied to the objects, and represented by exactly one continuous time interval. The other Valid Time, represented by a set of non-contiguous disjoint intervals, will be used to tag the relationships between objects, indicating when they are valid.

Transaction Time will always be present in the DW (because we will always be able to register at least the Transaction Time in the DW, if the CDBs do not support it), while the Valid Time will completely depend on its availability in the sources. A good data model for the DW should take both into account and ease their representation. Transaction Time could be implicit, and the Valid Time explicit.
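The two kinds of time described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the model itself: the class names TemporalObject and TemporalRelationship, the field names, and the closed-open interval convention are all hypothetical. Objects carry one continuous Valid Time interval, relationships carry a set of disjoint, non-contiguous intervals, and Transaction Time is stamped implicitly on entry into the DW.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Optional, Tuple

Interval = Tuple[date, date]  # assumed convention: closed-open [start, end)


@dataclass
class TemporalObject:
    """Object with at most one continuous Valid Time interval (explicit)."""
    oid: str
    valid: Optional[Interval] = None  # only if the sources provide it
    # Transaction Time is implicit: stamped when the object enters the DW.
    dw_entry: datetime = field(default_factory=datetime.now)
    # Entry time in the source CDB, only if the CDB supports it.
    cdb_entry: Optional[datetime] = None


@dataclass
class TemporalRelationship:
    """Relationship tagged with disjoint, non-contiguous validity intervals."""
    source: str
    target: str
    valid: List[Interval] = field(default_factory=list)

    def add_interval(self, start: date, end: date) -> None:
        if start >= end:
            raise ValueError("empty interval")
        for s, e in self.valid:
            # Reject intervals that overlap or touch an existing one.
            if start <= e and s <= end:
                raise ValueError("intervals must be disjoint and non-contiguous")
        self.valid.append((start, end))
        self.valid.sort()

    def holds_on(self, day: date) -> bool:
        return any(s <= day < e for s, e in self.valid)


# Hypothetical data: a producer licensed for two years, handling a waste
# during two separate periods.
producer = TemporalObject("producer-1", valid=(date(2000, 1, 1), date(2002, 1, 1)))
handles = TemporalRelationship("producer-1", "waste-7")
handles.add_interval(date(2000, 1, 1), date(2000, 6, 1))
handles.add_interval(date(2001, 1, 1), date(2001, 6, 1))
```

Keeping the DW entry time as a default field reflects the remark that Transaction Time can always be registered by the DW itself, whereas both Valid Time and the CDB entry time remain optional because they depend on the sources.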

In the example, for the definition of the DW Schema we should take another Federated Schema containing only the data interesting for the analysis (i.e. the f_telephone_number attribute in f_customers should not be in it, because it does not seem interesting to analyze). Once we have that schema, we must modify it to reflect the different times of interest. Classes f_producers and f_receivers could have an associated Valid Time, depending on the dates of their licenses to handle wastes. Each shipment should have a timestamp indicating the day it takes place. Moreover, every object would have a Transaction Time saying when it was introduced/modified in the DW, and maybe another one with the entrance date to the CDB.

All this does not mean that the users need to learn any new data model: later translation from the canonical model to any other user model is always possible if desired (see the corresponding tag in the figure). Notice that this architecture does not force us to use an Object-Oriented Database to perform the historic storage either. Any kind of system could be used, defining its schema as in the figure (see the corresponding tag).

Authorization DW Schemas

Authorized access in a DW is scantily studied. However, the set of data stored in a DW that is needed to support the decision-making process has to be protected from unauthorized accesses, just as in any other information system, because data helping to make decisions is probably very confidential.

The Authorization Schema in the figure helps federated databases with data protection, because each Authorization Schema defines the subset of information that a class of users/applications can access. An Authorization DW Schema helps the DW just as an Authorization Schema helps federated databases.

The process to obtain an Authorization DW Schema from the DW Schema (see the corresponding tag in the figure) takes into account the security policy of the federation itself. The mechanism should be similar to that of the Authorization Schema. Nevertheless, it is out of the scope of this thesis work.

External DW and Multidimensional Schemas

Besides reflecting the security aspects of the DW, we can also define the subsets of data of interest depending on the classes of users and/or applications (see the corresponding tags in the figure). The external schemas are expressed in the CDM. However, they can be translated to any other model (see the corresponding tag in the figure).

At this point, the strength is not in the data itself, but in the needs of the users. Here we will have a query-driven design, where what really matters is the vision users have. We could define External DW Schemas in the same data model of the DW (by the corresponding transformation in the figure). However, if the users have a multidimensional vision of the data, we will obtain star shape External Multidimensional Schemas (by another transformation in the figure). Notice that this transformation includes two different actions. On the one hand, we are deriving the desired view of data, and on the other hand, we are translating the schema to a multidimensional data model. As discussed later, this introduces a modification in the architecture to obtain processors that perform exactly one task. At this moment, this modification is not included in the discussion, because it is only important if we allow several Stars in one External Multidimensional Schema. With only one Star, translation and definition of the required multidimensional view can easily be done in one step.

[Figure content. Fact Shipment (F): Volume, Price, Distance, Duration, ... Dimension Date (D): Year, Month, Day, Day_of_week, ... Dimension Receiver (D): Company_name, Treatment_type, Customer_type, Receiver_code, ... Dimension Waste (D): Name, Code, Precautions, ... Dimension Producer (D): Company_name, Treatment_type, Custumer_type, Producer_code, ...]

Figure Example of External Multidimensional Schema

In the industrial wastes example, the External Multidimensional Schema depicted in the figure could be defined from the Authorization DW Schema. In this case, the analysts are interested in the analysis of the shipments depending on the date, the kind of waste, the producer and the receiver. That is, for them, the Fact is Shipment, and Date, Waste, Producer and Receiver are the analysis dimensions. If the same analyst needs to study more than one Fact, this schema should contain more than one Star. Sometimes this is called a Star Constellation, or Data Warehouse Bus in KRRT.
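As a rough illustration of the Star just described, the following Python sketch represents the Shipment Fact and its four analysis dimensions as plain data structures. The classes Dimension, Fact and Star are hypothetical names introduced only for this example, the attribute and measure names are taken from the figure as far as they can be read, and the second Star (Storage) is invented purely to show a schema with more than one Fact.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Dimension:
    name: str
    attributes: Tuple[str, ...]


@dataclass(frozen=True)
class Fact:
    name: str
    measures: Tuple[str, ...]


@dataclass
class Star:
    fact: Fact
    dimensions: List[Dimension]

    def shared_dimensions(self, other: "Star") -> List[str]:
        """Names of the Dimensions that both Stars have in common."""
        mine = {d.name for d in self.dimensions}
        return sorted(mine & {d.name for d in other.dimensions})


date_d = Dimension("Date", ("Year", "Month", "Day", "Day_of_week"))
waste_d = Dimension("Waste", ("Name", "Code", "Precautions"))
producer_d = Dimension("Producer", ("Company_name", "Producer_code"))
receiver_d = Dimension("Receiver", ("Company_name", "Receiver_code"))

shipment = Star(
    Fact("Shipment", ("Volume", "Price", "Distance", "Duration")),
    [date_d, waste_d, producer_d, receiver_d],
)

# Hypothetical second Fact, only to illustrate a schema with several Stars.
storage = Star(Fact("Storage", ("Volume",)), [date_d, waste_d, receiver_d])
constellation = [shipment, storage]
```

Here shipment.shared_dimensions(storage) lists the Dimensions through which both Stars could be related.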

Due to performance reasons, most of these Stars are materialized (as represented in the figure), giving rise to Data Marts built with either O3LAP, ROLAP or MOLAP techniques. However, other External Schemas (used for [...] or for solving some sporadic queries) would not need to be materialized nor multidimensional.

The whole

[Figure content. Schemas, from bottom to top: Native Schemas, Component Schemas, Export Schemas, Federated Schema and Data Warehouse Schema, Authorization Schemas and Authorization DW Schemas, External Schemas, External DW Schemas and External MD Schemas, User External Schemas, User External DW Schemas and User External MD Schemas; plus the Logical ODS Schema, Logical DW Schema and Logical DM Schema. Data models: Native Model, Canonical Data Model, Canonical Data Model Extended To Express Time, Multidimensional Data Model, ODS Model, DW Model, DM Model, User Model.]

Figure Integrated architecture for FIS and DW

In the figure above, we can see the result of merging the seven levels architecture for federated databases with the schema levels for Data Warehousing, both shown in earlier figures.

It is important to notice the location of the DW Schema. If we assume that the presence of a processor performing changes (either in data model, in semantics, or in UoD) forces the appearance of a new level, the DW should be placed in between the Federated Schema and the Authorization Schemas. However, the DW Schema is at the same level as the Federated Schema, because they are equally important. If the Export Schemas were expressed in a temporal CDM, and we had an integration processor for it, then the Federated Schema and the DW Schema would collapse into a single schema.


On the other hand, the double storage system (DW/DM) should not be avoided. The DW is data-driven designed, and will contain data that is not certain to be useful some day, which worsens performance. The DM is query-driven designed, oriented to optimize response times. Thus, what we will likely have is a temporal (or Relational) database supporting time, incrementally designed and populated as data is generated. From this huge central DW, we can define and feed smaller DMs on an as-needed basis. Notice that a methodology is not suggested, but just an architecture. Defining a methodology is absolutely out of the scope of this thesis, and the architecture does not impose one.

Operations on schemas

At this point, it is important to mention how the transformations between levels are performed in the architecture. Different kinds of operations in the CDM are necessary to perform the following functions:



Conforming operations, like those in RAO, to transform the Export Schema of one DB to a form more suitable for integration into a Federated Schema. These operations are also useful in other contexts, in particular to derive external schemas (i.e. views).

Generalizing classes from different DBs to a superclass in a Federated Schema. The schema integration process, which produces a Federated Schema from several Export Schemas, can be considered as a two-step process: first, conforming operations change the form of the Export Schemas into a common form, and then these are generalized. Discriminated generalization is preferred because of the reasons explained in GSC, in particular the support of multiple semantics as in SL, and no loss of information, because each virtual object in a Federated Schema is given a tag (i.e. discriminant) showing which CDB it comes from.

Object Identification Function (OIF), to assert when an object in one DB represents the same real world object as an object in another DB. Different users may use different OIFs, as explained in SR.

Collapsing two objects into one, using a particular OIF (GCS). If all users share the same OIF for a federated class, or if integrity constraints among the CDBs (interdependencies) must be enforced, then the collapsing operation may take place during the process of schema integration; otherwise, the derivation of each External Schema may collapse using a different OIF.

Dealing with value discrepancies, preserving all values by having multivalued attributes in Federated Schemas. External Schemas may use different options, such as giving preference to the value coming from a particular DB (shown by its discriminant), or aggregation by reduction operations (sum, average, maximum, etc.), as in SR.

Protecting security, by hiding relationships between abstractions that could reveal confidential information.

Transforming the structures of the DW data model into a multidimensional data model (OO models are preferred, as discussed above).
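Some of the operations listed above (discriminated generalization, collapsing by means of an OIF, and dealing with value discrepancies) can be sketched as follows. This is a minimal Python illustration and not the CDM operations themselves: the function names generalize, collapse and prefer are hypothetical, objects are reduced to dictionaries, and the OIF used (equality of f_wa_code across DBs) is only an assumption chosen for the example.

```python
from typing import Callable, Dict, List

Obj = Dict[str, object]


def generalize(sources: Dict[str, List[Obj]]) -> List[Obj]:
    """Discriminated generalization: tag each object with the DB it comes from."""
    return [{**obj, "db_discr": db} for db, objs in sources.items() for obj in objs]


def collapse(objs: List[Obj], oif: Callable[[Obj], object]) -> List[Dict[str, list]]:
    """Merge the objects that the OIF maps to the same real world object.

    Every attribute becomes multivalued, so no value is lost.
    """
    groups: Dict[object, Dict[str, list]] = {}
    for obj in objs:
        merged = groups.setdefault(oif(obj), {})
        for attr, value in obj.items():
            merged.setdefault(attr, []).append(value)
    return list(groups.values())


def prefer(merged: Dict[str, list], attr: str, db: str) -> object:
    """External Schema option: keep the value coming from a particular DB."""
    return merged[attr][merged["db_discr"].index(db)]


# Hypothetical contents of the two component classes of wastes.
sources = {
    "db1": [{"f_wa_code": "W1", "f_risk": 3}],
    "db2": [{"f_wa_code": "W1", "f_risk": 5}, {"f_wa_code": "W2", "f_risk": 1}],
}
f_wastes = generalize(sources)
collapsed = collapse(f_wastes, oif=lambda obj: obj["f_wa_code"])

risk_from_db2 = prefer(collapsed[0], "f_risk", "db2")  # preference by discriminant
risk_maximum = max(collapsed[0]["f_risk"])             # reduction operation
```

Because the OIF is passed as a parameter, different External Schemas could collapse the same federated class with different OIFs, as the list above suggests.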

Drilling across semantically related Stars

Quite often, different data cubes in a business model are found to be closely related, and analysts want to jump from one to another. They are usually interested in generating several reports showing different sets of data organized from the same point of view, so that they are easily comparable. For instance, it could be interesting to analyze the evolution of sales along Time and Product, and compare it with production in the same period of time for the same product. Probably, that information will be stored in different DMs, and users will need to drill across them. The aim of this section is to dig into the applicability of Drill-across by studying how a multidimensional schema could contain several related Stars, even if the data cubes are physically stored in different DMs.

The section is structured as follows. First, some work on relating different star schemas is presented. Then, a general approach to the problem is shown, together with a modification to the schemas architecture of the previous section to benefit from relationships between Stars. Next, different kinds of relationships found between Stars are exemplified. Finally, the relationships found are discussed.

Drill-across in the literature

Lately, there has been a lot of work about OLAP tools. We can find literature devoted to specific storage techniques and access mechanisms, as well as to pure multidimensional modeling. Both areas benefit from the fact-dimension duality, and restrict their studies to isolated stars (i.e. how we can store/model one fact and its surrounding analysis dimensions). Nevertheless, some authors have already pointed out the importance of drilling across different data cubes, which means navigating through data in different star shape schemas. Unfortunately, the Drill-across operation in these models is limited to the case in which some analysis dimensions are shared by the data cubes.

Kim proposes a logical model to implement star schemas on Relational databases. Each star schema contains a central fact table related by foreign keys to its corresponding dimension tables. In order to support Drill-across, Ralph Kimball contends that all constraints on Dimension attributes must evaluate to exactly the same set of Dimension instances in both schemas. This is clearly satisfied if both Dimensions are exactly the same. However, he also explains how this matching can be satisfied, for instance, if the only difference between the Dimensions is their granularity (i.e. the finest detail level they allow).

A later work, GLK, presents multi-star schemas obtained by normalization of fact tables, while PJ defines a multidimensional object family as multidimensional objects possibly with shared subdimensions (i.e. subsets of levels in the aggregation hierarchy of the analysis dimension). MK goes a little further and distinguishes three kinds of schemas with more than one star, i.e. constellation, galaxy, and star cluster. A constellation schema consists of a set of star shape schemas with hierarchically linked fact tables. A galaxy is a collection of star schemas with shared Dimensions. Finally, a star cluster is a set of star schemas sharing subdimensions. Gio defines constellation as well. In this case, it is a set of stars sharing Dimensions. The shared Dimensions must be conforming dimensions, which means their values are consistent among the stars. Even though these models allow several Stars in a schema, they do not study relationships between them.

Regarding multi-star architectures, KRRT suggests a Data Warehouse Bus Architecture, which offers wire-dimensions where facts can be plugged. Dimension tables are conformed in order to be shared by fact tables. This is a simple solution to the problem of integrating DMs, which helps to develop DMs at different times by different work teams.

Multi-star conceptual schemas

[Figure content: the Fact ProductSale (F) surrounded by its Dimensions (D) Customer, Clerk, Product, Time, Promotion and Store.]

Figure Example of multidimensional schema

A Fact is a subject of analysis. It could contain different kinds of cells (we will call them Cells), which in turn could contain different Measures that we want to analyze. Each data cell is identified by a point in each of its analysis dimensions. These points may correspond to different granularities for every Cell. For instance, in the example in the figure, we are interested in the analysis of the Fact ProductSale with regard to the product which was sold, the time when it was sold, the customer to whom it was sold, the clerk who sold it, the promotion that affects it, and the store where it was sold. ProductSale would contain different Cells if some Measures were not available at Day granularity. Moreover, other Measures might not be interesting for every Customer, but only for CustomerProfiles. Thus, there would be different kinds of cells, depending on whether Measures are available or meaningful for either Day or Month, Customer or CustomerProfile. Facts are studied in depth in a dedicated section.

A Dimension is a connected, directed graph representing a point of view on analyzing data. Every vertex in the graph corresponds to an aggregation Level, and an edge reflects that every instance at the target Level decomposes into a collection of instances of the source Level (i.e. edges reflect part-whole relationships between instances of Levels). An in-depth explanation of Dimensions is given in a dedicated section. Each Level corresponds to a granularity in the Dimension, and has attributes that allow selecting some of its instances. By selecting points in every analysis dimension, we choose the data cells of interest in our analysis. Thus, Dimensions contain those data that identify Cell instances.
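The graph structure of a Dimension can be illustrated with a short Python sketch. It is only an approximation of the definition above, under stated assumptions: the class and method names are hypothetical, Levels are reduced to their names, and the edges chosen for the example time Dimension (including Day to Week) are an assumption made for illustration.

```python
from typing import Dict, Set


class Dimension:
    """Directed graph of aggregation Levels; edges go from finer to coarser."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parents: Dict[str, Set[str]] = {}

    def add_edge(self, source: str, target: str) -> None:
        """Every instance of `target` decomposes into instances of `source`."""
        self.parents.setdefault(source, set()).add(target)
        self.parents.setdefault(target, set())

    def rolls_up_to(self, source: str, target: str) -> bool:
        """True if data at Level `source` can be aggregated up to `target`."""
        seen: Set[str] = set()
        stack = [source]
        while stack:
            level = stack.pop()
            if level == target:
                return True
            if level not in seen:
                seen.add(level)
                stack.extend(self.parents.get(level, ()))
        return False


# Hypothetical time Dimension with two aggregation paths starting at Day.
time = Dimension("CorporateTimeDimension")
time.add_edge("Hour", "Day")
time.add_edge("Day", "Month")
time.add_edge("Month", "Year")
time.add_edge("Day", "Week")
```

With these edges, Hour rolls up to Year through Day and Month, whereas Week does not roll up to Month, since no path of part-whole edges connects them.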

If users are only interested in a given Cell, they just need to access a Fact. However, sometimes they could desire to relate data in different Stars, and OLAP tools should allow it. We can understand Drill-across as reusing the same condition over the Dimensions on querying different Facts. This means that we select a subspace in a given Cube and want to view the corresponding space in a different Fact. At first glance, this can be allowed if both Stars share some Dimensions. However, it is also possible if Dimensions are not exactly the same, but there exists some semantic relationship between the Dimensions and/or Facts in the Stars.
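The simplest reading of Drill-across, reusing one condition over shared Dimensions to query two different Facts, can be sketched as follows. This Python fragment is an illustration under assumptions, not a definition of the operation: Cubes are reduced to dictionaries from points to a single measure, both Cubes are assumed to share exactly the same two Dimensions (month and product), and all names and values are hypothetical.

```python
from typing import Callable, Dict, Tuple

Point = Tuple[str, str]    # (month, product): one point per shared Dimension
Cube = Dict[Point, float]  # cell coordinates -> value of one measure


def slice_cube(cube: Cube, condition: Callable[[Point], bool]) -> Cube:
    """Select the subspace of the Cube that satisfies the condition."""
    return {point: value for point, value in cube.items() if condition(point)}


def drill_across(origin: Cube, destination: Cube,
                 condition: Callable[[Point], bool]) -> Dict[Point, Tuple[float, float]]:
    """Apply the same condition over the Dimensions to both Facts."""
    left = slice_cube(origin, condition)
    right = slice_cube(destination, condition)
    return {p: (left[p], right[p]) for p in left.keys() & right.keys()}


# Hypothetical data for the sales versus production comparison.
sales = {("2001-01", "bolt"): 120.0, ("2001-02", "bolt"): 90.0, ("2001-01", "nut"): 40.0}
production = {("2001-01", "bolt"): 150.0, ("2001-01", "nut"): 35.0}

january = drill_across(sales, production, lambda p: p[0] == "2001-01")
```

The sketch covers only the shared-Dimensions case; relating Stars whose Dimensions differ would additionally need a mapping between the points of both Dimensions.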

[Figure content: diagram of several Stars, each composed of a Fact (F) with its Cells (C) and several Dimensions (D) with their Levels (L). Legend: Level (L), Cell (C), Fact (F), Dimension (D).]

Figure Multistar diagram

Existing multidimensional models do not consider semantic domains (i.e. domains reflecting the conceptualization of values in the mind of the designer). Actually, the analysis dimensions used on drilling across need not exactly coincide in both Stars. They should just be defined on a related semantic domain. Thus, Drill-across between different Facts is performed thanks to semantic relationships between Stars, like those drawn with thick lines in the figure. Two Stars can be related if either their Facts or their Dimensions are. We can see four kinds of semantic relationships in this section, i.e. Derivation, Generalization, Association, and Flow (in UML terminology, as defined in OMGb).

Multidimensional data need an integrated access to be able to drill across through interstellar relationships. In this sense, the architecture presented in the previous section needs to be clarified. For the sake of simplicity, let us leave aside the Authorization DW Schema. If we are dealing with isolated Stars, data is selected from the DW Schema and translated to the multidimensional model, so that we obtain an External Multidimensional Schema at conceptual level. Then, this is represented at logical level and implemented in a DM.

Unfortunately, DMs do not take multi-star schemas into consideration. Moreover, to improve performance, or due to management reasons, different Stars could be implemented on different DMs. Thus, we had better use a three-level schema architecture, like that shown in the figure below, based on the ANSI/SPARC architecture. The architecture facilitates the usage of semantic relationships at conceptual level, while allowing independent Stars to be stored in different DMs. Other levels could also be added to the architecture, as in SL, if we were dealing with heterogeneous DMs and multidimensional models.

External Schema

Conceptual Schema

Internal Schema

Figure ANSI/SPARC database schemas architecture (BFJ)

External MD Schema 1 ... External MD Schema n

DW Schema Corporate MD Schema

DM Schema 1... DM Schema m

Figure Three-level multidimensional schemas architecture

The only Corporate Multidimensional Schema (CMDS) contains all data in the individual DMs, related by interstellar semantic relationships as sketched in the figure, like those offered by YAM. Thus, the CMDS would be obtained from the DW Schema by means of translation operations between the DW model and the multidimensional model. No selection would be necessary, because we can later on choose which instances are stored in the DMs, and which views we offer to the users. Problems associated with obtaining the DW Schema are out of the scope of this section. Just notice that integrating the DMs will be much harder if they were not defined from a common DW, or at least developed based on a common plan. As pointed out in IIS, building independent DMs directly from operational applications is a poor idea. In this architecture, DMs are obtained by translating the DW to a multidimensional model, and then storing the Cubes of the different Stars found in the DMs.

At the bottom level of the architecture, we have the different DM Schemas, which could be stored in either Relational OLAP (ROLAP), Multidimensional OLAP (MOLAP), Object-Oriented OLAP (O3LAP), or any other kind of multidimensional system. These DMs optimize accesses to multidimensional data, and it does not matter whether they were independently defined or built from a huge common DW, because we still have a common view of all of them at conceptual level, which ensures conformation of data and integrated access. Each DM Schema i in the figure represents the logical schema of one of the DMs, in the data model corresponding to the chosen multidimensional system.

At the top level, we can define External Multidimensional Schemas (EMDS) to cover the needs of different users or groups of users. These subsets of information would contain different interrelated Stars that users could successively visit. Furthermore, since different people may view things in a different way, these schemas offer the possibility to rename concepts, or even define some derived data that was not physically stored in the DMs. View definition mechanisms should be adapted here to a multidimensional data model.

[Figure content. Schemas, from bottom to top: Native Schemas, Component Schemas, Export Schemas, Federated Schema, Data Warehouse Schema and Corporate MD Schema, Authorization Schemas, Authorization DW Schemas and Authorization MD Schemas, External Schemas, External DW Schemas and External MD Schemas, User External Schemas, User External DW Schemas and User External MD Schemas; plus the Logical ODS Schema, Logical DW Schema and Logical DM Schemas. Data models: Native Model, Canonical Data Model, Canonical Data Model Extended To Express Time, Multidimensional Data Model, ODS Model, DW Model, DM Model, User Model.]

Figure Integrated architecture for FIS DW and multistar schemas

Thus, the integrated architecture for FIS and DW shown earlier needs to be modified in order to support the definition of a corporate schema with several Stars, so that external views can be defined from it. By doing so, every processor between two schema levels performs only one task. As shown in the figure, the Data Warehouse Schema is first translated to a multidimensional model, which should allow multi-star schemas. Later on, the desired multidimensional view would be defined from the Authorization MD Schema. On doing this, we could choose the subset of Stars, and the semantic relationships connecting them, that are of interest for a set of users.

Interstellar semantic relationships

We are interested in providing multidimensional schemas with more than one Fact. However, having several Facts in the same schema is absolutely useless if it contains isolated Stars. Facts need to be related in some way to allow Drill-across. Some multidimensional models and most OLAP tools allow this operation if two Stars share some Dimensions. However, purely sharing Dimensions does not seem enough at conceptual level, where we could find much more meaningful semantic relationships among data.

In this section, we are going to delve into the capture of semantic relationships between different Stars that allow navigation through them. Firstly, possible relationships between two Dimensions are shown; afterwards, relationships between two Facts are exemplified; and finally, how a Fact can be related to a Dimension in another Star (or vice versa) is explained.

Dimension-Dimension

KRRT and Gio, as well as some multidimensional tools, stress the importance of having conforming dimensions. In order to drill across different Stars, the corresponding schemas must share analysis dimensions, so that their instances exactly coincide. This position unnecessarily restricts the usage of Drill-across. Actually, it is just needed that the selected instances of the Dimensions of the origin Star determine instances in the Dimensions of the destination Star. Thus, the domains used in both Dimensions must be related in some way, but the Dimensions could still be absolutely different. We are going to see four kinds of OO relationships between Dimensions that allow drilling across their Stars.

Derivation. Firstly, we could find that the same concept has different names depending on the subject. Therefore, the same Dimension, with exactly the same instances, will need a different name depending on the context where we are going to use it. Moreover, this Dimension might not play the same role for different Facts: Product may be considered RawMaterial in a different context. It is not enough to say they are synonyms, like in Kim, because they could even have different attributes of interest to the users. For example, keeping or studying the benefit of raw material can be meaningless. Moreover, we could find that elements in a Dimension are different from those in another one, even though they represent the same concepts. For instance, a given subject implies that Red, Blue, and Yellow are the instances in the Color domain, while in a different case we need to distinguish different kinds of Blue, like Dark Blue or Light Blue. It is also possible to find differences in how concepts are codified (e.g. letters or numbers).

Sometimes, some Dimension instances in a Star are only considered grouped in another Star, because of lack of interest in the individuals, confidentiality issues, or space problems to keep information at maximum detail. In Kim, a Dimension whose finer granularity is not of interest is called a Demographic minidimension. Thus, we could use the same Dimension in two star schemas at different aggregation levels. For instance, one of them could keep data by hour, while the other does it by day. Clearly, all we have to do to drill across them is roll data at Hour up to Day. In this way, both Cells will be defined over the same kind of dimensional instances. Hence, they will be comparable.
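The Hour to Day case can be sketched in a few lines of Python. The fragment is only an illustration under assumptions: the measure is assumed to be additive (so that rolling up means summing), hourly instances are assumed to be encoded as strings such as '2001-03-05T14', and the function name and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def roll_up_to_day(by_hour: Dict[str, float]) -> Dict[str, float]:
    """Sum hourly cells (keys like '2001-03-05T14') into daily cells."""
    by_day: Dict[str, float] = defaultdict(float)
    for hour, value in by_hour.items():
        by_day[hour.split("T")[0]] += value
    return dict(by_day)


# One Star keeps its data by hour, the other one by day (hypothetical values).
hourly_cells = {"2001-03-05T09": 2.0, "2001-03-05T10": 3.0, "2001-03-06T09": 1.0}
daily_cells = {"2001-03-05": 4.0, "2001-03-06": 1.5}

rolled = roll_up_to_day(hourly_cells)

# After the roll-up, both sets of cells are defined over Day instances,
# so they can be paired and compared.
comparable: Dict[str, Tuple[float, float]] = {
    day: (rolled[day], daily_cells[day]) for day in rolled.keys() & daily_cells.keys()
}
```

A non-additive measure (an average, for example) would need a different aggregation function, which is why additivity is stated here as an explicit assumption.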

If two Dimensions coincide in one of their aggregation levels, both can be conceived again as derived from a common, more general Dimension which contains their hierarchies. The figure shows how the Time Dimension is included in a more general Dimension, CorporateTimeDimension. Time provides neither Hour nor Week Aggregation Levels, because they are not of interest for the ProductSale Star where it is used.

Time

Hour Day Month Year

Week

CorporateTimeDimension

Figure Example of containment of Dimensions

In terms of OO, considering Derivability or Point-of-view would allow us to reflect that two Dimensions are both derived from a common concept, in spite of the fact that they look different, like in the mentioned examples. Thus, we find that two Dimensions can be related by Derivation, and users can drill across from one Fact to another through it. The Dimension is not shared, because it appears different in each Star. However, by means of this kind of relationship, we can offer users the desired view, while they are still able to drill across.

Generalization. We can also find relationships between analysis dimensions along Generalization/Specialization (also known as Superclass/Subclass). Dimensions of different Stars could be related by Generalization, so that Drill-across would be allowed. For instance, Customer and Clerk are both subclasses of People. Therefore, we could travel from a Star with information about Customer to another one with information about Clerk, if the sets of instances of both Dimensions are not disjoint. Moreover, they will have in common all those attributes in the superclass, and maybe some aggregation levels.

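[Figure content: the People Dimension, with Levels Person, AgeGroup, SaleRole and All, and the Clerk Dimension, with Levels Clerk and All, obtained by specializing People at {SaleRol="Clerk"}. Legend: Dimension specialization, Level specialization, Aggregation.]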

Figure Example of Generalization between Dimensions

As outlined in Gio, by using superclasses and subclasses in star schemas we gain better understanding. The consequences of two Dimensions being related by specialization are studied in a dedicated section. There, we can see that, as exemplified in the figure, we should speak of specializing a Dimension at a Level, rather than just specializing a Dimension. If we specialize the People Dimension at the SaleRole Level (solid arrow) to get the Clerk Dimension, this specialization contains a Level (i.e. Clerk) with instances corresponding to people acting as clerk, and another one with only one instance representing the set of all clerks (i.e. All). Dashed arrows show that a Level is a specialization of another one. By a similar specialization of People, we could obtain the Customer Dimension.

In the example, the AgeGroup aggregation level is not of interest in the Clerk Dimension. Notice that, if it were, it would not be a specialization of the homonym Level in the People Dimension, since its instances would represent different sets of people. For instance, not everybody between twenty and thirty years old is a clerk. Therefore, the instance corresponding to this age group in the People Dimension would represent more people than the instance representing the same age group in Clerk.

Sharing Dimensions would not be enough in this case either. Clerk has more attributes and far fewer instances than Customer. Therefore, sharing People Dimension in two Stars would generate lots of undesirable null values. Nevertheless, we can still drill across if instances of Clerk can also be instances of Customer.
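The drill across through a Generalization can be pictured with a minimal sketch. It assumes that both specializations identify people with the same keys; the identifiers, the two dictionaries of cells, and the function name `drill_across` are hypothetical and only illustrate the idea.

```python
# Hypothetical instance identifiers; not taken from the thesis.
customers = {"p1", "p2", "p3", "p4"}   # instances of Customer Dimension
clerks = {"p3", "p4", "p5"}            # instances of Clerk Dimension

# Cells of two Facts, keyed by the People instance they refer to.
purchases_by_customer = {"p1": 120.0, "p3": 40.0, "p4": 15.0}
sales_by_clerk = {"p3": 900.0, "p4": 450.0, "p5": 700.0}


def drill_across(selected, target_instances, target_cells):
    """Carry a selection made in one Star over to another Star.

    Only instances that belong to both specializations can be followed.
    """
    shared = selected & target_instances
    return {p: target_cells[p] for p in shared if p in target_cells}


# Select customers who bought for less than 100, then look at them as clerks.
selected_customers = {p for p, amount in purchases_by_customer.items() if amount < 100}
result = drill_across(selected_customers, clerks, sales_by_clerk)
```

If the two sets of instances were disjoint, the intersection would be empty and the navigation would return nothing, which matches the condition stated above.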

Association: It is possible to have associated analysis dimensions as well. The domain of a Dimension could be used as an attribute domain in another Dimension. Selected instances in a Dimension would allow us to identify instances in the other, so that it is possible to drill across the corresponding Facts. Clerks are usually assigned to stores. Thus, Clerk would be associated with Store Dimension (maybe multivalued). This is not what is called an outrigger table in Kim. That is at Logical level and refers to normalization. In this case, it is not normalizing at all, but showing that two different analysis dimensions are semantically related.

We can also find stronger associations between analysis dimensions, if we join more than one to give rise to another. This is not a simple association, because if we remove one of the aggregated Dimensions, we lose the aggregate one. For example, Color Dimension could be used to define ColoredProduct. Dissociating Colors from ColoredProduct means we do not have colored products any more.

Two dimensions (3/9 meaningful points):

            Blue   Red   Yellow
Cars                X
Trucks       X
Tractors                   X

One dimension (3/3 meaningful points):

Red Cars           X
Blue Trucks        X
Yellow Tractors    X

Figure Example of correlated Dimensions

If ColoredProduct were represented as two separate Dimensions (i.e. Color and Product), all combinations of color-product would be allowed. Sometimes this could be the case, but other times products are only available in a reduced set of colors, if both analysis dimensions are correlated, as exemplified in figure. Therefore, it is much better for the designer to reduce the analysis space to only the meaningful values, by modeling all of them in just one Dimension. In the figure, we have six meaningless values out of nine possibilities. Thus, it is better to model it as only one Dimension where all three values are meaningful.
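The count of meaningful points can be reproduced with a short sketch. The three available combinations are the ones marked in the figure; the variable names are illustrative only.

```python
from itertools import product

colors = ["Blue", "Red", "Yellow"]
products = ["Cars", "Trucks", "Tractors"]
# Combinations that actually occur, as marked in the figure.
available = {("Red", "Cars"), ("Blue", "Trucks"), ("Yellow", "Tractors")}

# Two independent Dimensions span the whole Cartesian product.
two_dim_space = list(product(colors, products))
meaningless = [point for point in two_dim_space if point not in available]

# One combined Dimension holds only the meaningful combinations.
colored_product = sorted(f"{color} {prod}" for color, prod in available)
```

With two Dimensions the analysis space has nine points, six of them meaningless; the single ColoredProduct Dimension has exactly three instances.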

[Figure: Product, Colors and ColoredProduct Dimensions related by Aggregation; the Levels shown include Product, Kind, Family, Color, Range, ColoredProduct and All]

Figure Example of Aggregation between Dimensions

Whether it is modeled as two independent Dimensions or only one will depend on the distribution of the cells related to the Dimension instances, and on the user point of view. Thus, we can find Color and Product in a Star and ColoredProduct in another one, and a user should be able to navigate from one to another through Aggregation relationships, as shown in figure. Relating Dimensions by Aggregation also has consequences in the aggregation hierarchies. Figure shows how the hierarchy of the aggregate Dimension contains the subgraphs of those hierarchies in the Dimensions we are aggregating, as it is explained in section. However, it is important to notice that the elements at homonymous Levels in different Dimensions, in this case, do not coincide. For instance, Color Level in Colors Dimension contains elements representing colors. Nevertheless, Color Level in ColoredProduct Dimension contains elements representing sets of colored products grouped by color.

Again, this offers navigation possibilities that just sharing Dimensions does not offer. A psychological study could include Color Dimension. ColoredProduct cannot be shared with the Star corresponding to that study, because it was about colors and did not consider products at all. However, analysts will probably be interested in navigating from one Star to the other. Thus, navigation should be allowed through the Association between Color and ColoredProduct Dimensions.

Flow: Because of the long periods of interest in analysis tasks, and how fast business changes nowadays, it is expected that analysis dimensions in our multidimensional schema evolve. Due to the importance of time, it is not acceptable to throw away old Dimensions. Old data would still be stored following the old schema, while we are currently using a new one.

As our business grows, it could become international, so that a new Level Country will appear in the Store Dimension. Attributes could also appear or disappear in any Dimension, as the information systems and the enterprise evolve. We should not study data with regard to those attributes, because their values at the time data was collected are unknown. Therefore, old Dimensions should be kept as they were, but related to the corresponding new Dimensions by Flow relationships. This will show the user which dimensional data can be used in each analysis, depending on the period of time he/she is interested in. As studied in EK, functions can be offered to the users to estimate values of dimension attributes in the periods of time when they are unknown. Anyway, the schema should reflect the difference, to let users know whether the values are real or estimated.

Sharing is not possible in this case. It would mean studying old data with regard to new attributes, or vice versa. In any case, it could generate absolutely wrong results.

Fact-Fact

Points in the multidimensional space are always identified by their analysis dimensions. However, using those Dimensions is not always necessary to select a set of points. Having functions from points in a space to points in another one is another possibility. Therefore, if we identify a set of cells in a Fact, we will be able to select the corresponding set in a related Fact. This means that we can also use relationships between Facts to navigate. The relationships between the structures of two related Facts are studied in section.

Derivation: Measures in one Fact could be obtained by applying some operation to Measures in other Facts. For instance, on analyzing efficiency of employees, some Measures could be obtained by operating on the benefits of some products sold, the best sales, sales involving relevant products, etc. Most Dimensions will likely be shared by both Stars (i.e. EmployeeEfficiency and ProductSale). However, we could also travel between them due to the fact that data in some cells are obtained by processing other cells. We could navigate from data in EmployeeEfficiency to the data in ProductSale used in their calculation. This does not correspond to Drill-down, because both Facts represent different subjects, and selection of Cell instances is not performed by means of aggregation hierarchies.
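Navigation along a Derivation between Facts could be sketched as follows. The sketch assumes that the derivation simply sums benefits per employee and records which source cells were used; the cell layout and the function names are hypothetical, since the thesis leaves implementation out of scope.

```python
from collections import defaultdict

# Hypothetical ProductSale cells: (sale id, employee, benefit).
product_sales = [
    ("s1", "ann", 10.0),
    ("s2", "ann", 30.0),
    ("s3", "bob", 5.0),
]


def derive_employee_efficiency(sales):
    """Derive one cell per employee and remember which sales produced it."""
    total = defaultdict(float)
    lineage = defaultdict(list)
    for sale_id, employee, benefit in sales:
        total[employee] += benefit
        lineage[employee].append(sale_id)
    return dict(total), dict(lineage)


efficiency, derived_from = derive_employee_efficiency(product_sales)


def navigate_to_sources(employee):
    """Follow the Derivation relationship back to the ProductSale cells."""
    wanted = set(derived_from.get(employee, []))
    return [cell for cell in product_sales if cell[0] in wanted]
```

The selection of source cells relies on the recorded derivation, not on an aggregation hierarchy, which is why this navigation is not a Drill-down.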

Association: A Fact in a Star can be associated with Facts in another Star. For instance, a Deal is composed of several individual ProductSale. Notice that Measures of Deal are not necessarily obtained from those of ProductSale (for instance, discount in the deal). Thus, if we are studying a set of sales, it can be interesting to see data corresponding to the deals in which they were done. Coincidences or differences in Dimensions do not matter. We should be able to travel from a Star to another one just because of the Association relationship between the Facts.

Generalization: Some Facts do not have exactly the same Measures nor associated Dimensions, but are still closely related. For instance, ProductSale can be seen as a specialization of Contract. Since a sale is a kind of contract, it will have its specific Measures and Dimensions. In turn, as shown in figure, ProductSale could be specialized into CashSale or CreditSale, depending on how it is paid. We will have different information for each of the specializations (for example, number of credit card). Analysis dimensions are inherited from the superclass, but others could be added (like Bank). Users should be allowed to navigate through different Stars just because their Cells are specializations one of another. Usually, they will also share most analysis dimensions, but sharing them is not needed in this case to drill across, since Fact domains are subsets one of the other.

[Figure: ProductSale Fact (F) with Dimensions (D) Customer, Clerk, Product, Time, Promotion and Store, specialized into CreditSale Fact, which adds Bank Dimension]

Figure Example of Generalization between Facts

Flow: New Measures could appear (possibly replacing others), precision of the measurements could be improved (or even worsened), new interesting analysis dimensions could be found (besides or instead of the already existing ones), and so on. Even if the subject stays, how information is captured can evolve. Data sources, measurement instruments, or calculation algorithms are probably going to change, and these changes should be reflected in our model by means of Flow relationships between Facts. All this is not reflected by just relating our Facts to Time Dimension, since we actually have different Cell structures. One day we start recording discount checks in ProductSale, hence we need to keep both incomes (i.e. cash and discount checks). From this day on, we should have different Stars containing data about the same kind of facts before and after the acceptance of the checks, because the Cell structure changed. An analyst would be able to relate those data by means of Flow relationships between Facts. At query time, these relationships would allow the system to provide conversion functions between different versions of data, like in EK, or just show a warning to the analysts. However, implementation issues are completely out of the scope of this thesis.
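Although implementation is out of scope, a conversion function along a Flow relationship might look like the sketch below. Everything in it is an assumption made for illustration: the two cell classes, the rule that old cells had no discount checks, and the flag telling the analyst that a value was not actually measured.

```python
from dataclasses import dataclass


@dataclass
class SaleCellV1:
    """Cell structure before discount checks were recorded (hypothetical)."""
    cash: float


@dataclass
class SaleCellV2:
    """Cell structure after discount checks were recorded (hypothetical)."""
    cash: float
    discount_checks: float


def convert_v1_to_v2(old):
    """Convert an old cell to the new structure along the Flow relationship.

    Assumes no discount checks existed before the change, so the new
    Measure is filled with 0.0. The returned flag marks the value as
    estimated rather than measured.
    """
    return SaleCellV2(cash=old.cash, discount_checks=0.0), True


def total_income(old_cells, new_cells):
    """Total income over both versions, after converting the old cells."""
    converted = [convert_v1_to_v2(cell)[0] for cell in old_cells]
    return sum(c.cash + c.discount_checks for c in converted + list(new_cells))
```

Returning the flag next to the converted cell is one way to let a tool show a warning instead of silently mixing both versions.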

None of these relationships can be reflected by sharing Dimensions. Instead, they show correspondences among factual information, which is also important for navigation. We can go from a Star to another, independently of Dimensions, if cells in the first determine cells in the other.

Fact-Dimension

The last possibility to navigate through different Stars is that the Fact in one of them is used as Dimension in the other, or vice versa. This should not be properly regarded as Drill-across, since we do not want to analyze data in two Stars from the same point of view. Rather, we are using results of querying a Star to query a different one. For instance, some people could be interested in the analysis of promotions. Thus, the promotions selected by studying Promotion Fact can be used as Dimension to study ProductSale. A Fact is not conceived to be used as Dimension, so it will not exactly coincide in both star schemas.

[Figure: ProductSale Fact (F) with Dimensions (D) Customer, Clerk, Product, Time, Store and the derived /Promotion Dimension; Promotion Fact (F) is related to them by Association and Derivation]

Figure Example of Association Derivation between Fact and Dimension

Derivation: A Dimension can be obtained by deriving it from a Fact. The name can be changed, some attributes added or removed, others recalculated, some instances selected, etc. in order to adapt it to its new usage. Facts usually have many more instances than Dimensions. Nevertheless, by grouping them we could obtain coarser aggregation levels of interest. Measures are usually numerical, while attributes in Dimensions are usually descriptive. Therefore, Derivation (exemplified in figure) will probably imply a change in the kind of attributes. Promotion Fact could have a numerical attribute benefits, while Promotion Dimension would have an enumerated derived attribute success instead. Thus, users can drill across through the relationship between a Fact and a Dimension, even if they do not coincide, but there exists a Derivation between them.
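The change from a numerical Measure to an enumerated attribute can be illustrated with a small sketch. The attribute names benefits and success come from the text; the threshold, the two enumeration values, and the sample data are assumptions introduced only for the example.

```python
from enum import Enum


class Success(Enum):
    """Hypothetical enumeration for the derived attribute success."""
    LOW = "low"
    HIGH = "high"


# Hypothetical cells of Promotion Fact: promotion name -> benefits.
promotion_fact = {"summer": 1200.0, "winter": 150.0, "spring": 800.0}


def derive_promotion_dimension(fact, threshold):
    """Derive Dimension instances, replacing benefits by an enumerated success."""
    return {
        name: Success.HIGH if benefits >= threshold else Success.LOW
        for name, benefits in fact.items()
    }


promotion_dimension = derive_promotion_dimension(promotion_fact, threshold=500.0)
# Instances of the derived Dimension that could be used to analyze ProductSale.
successful = sorted(n for n, s in promotion_dimension.items() if s is Success.HIGH)
```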

Association: Instances of a Fact could also be associated with those of a Dimension, or vice versa. For instance, some products can be affected by promotions. Thus, Product Dimension could have an attribute defined on domain Promotion Fact (i.e. association arrow in figure). Notice the difference between that and relating the Promotion to another Fact (i.e. deriving a Dimension and using it to analyze ProductSale). The latter would mean that a sale was performed during a promotion, while the former would show all promotions that have been applied to a kind of product.

Discussion

Table sums up the kinds of Relationships found of interest between different Stars. What could attract attention is that neither Generalization nor Flow relationships between a Fact and a Dimension were considered. This is because a temporal transformation can convert neither Facts into Dimensions nor vice versa, and they are different enough not to be related by Generalization. If we want to obtain one from the other, we must derive it. Nevertheless, factual information cannot be derived from dimensional data. That is because Facts represent measurements, while Dimensions show given information.

Relationships     D-D   F-F   F-D   D-F
Derivation        yes   yes   yes   no
Generalization    yes   yes   no    no
Association       yes   yes   yes   yes
Flow              yes   yes   no    no

Table Summary table of relationships between Facts and Dimensions

Drill-across implies the usage of the analysis framework that we are using for a given Fact on analyzing a different one. That is, to study different data at the same granularity and constrained by the same conditions over the analysis dimensions. Other authors restrict that to Stars that share Dimensions. However, we have seen in previous section that there are four relationships between Dimensions that would also allow it. If two Dimensions are related by Generalization or Flow, they will be different. However, there will be a one-to-one relationship between their instances, so that Drill-across can be performed. If the Dimensions are related by Derivation or Association, instances do not coincide, but an instance of a Dimension determines instances in the other.

Actually, it is not necessary for the Dimensions in the destination Star to be related to those in the origin. It could be that selected cells in the latter determine a set of cells in the former. This is the case if both Facts are related in some way. Thus, we just need to substitute Measures of one cell by those of its counterpart in the other Fact. In this way, we are also able to study data in two different Stars whose Facts are related.

Relationships between a Fact and a Dimension do not allow proper Drill-across. However, if selected cells determine a set of points in a Dimension, these can be used in the analysis of another Fact.

Kim points out that it rarely makes sense to restrict simultaneously two Dimensions in the same Star by the same condition. Likely, it is senseless to apply the same constraint to two sets of instances over different semantic domains. However, Kimball also explains how an analysis dimension can play different roles in the same Star (for instance, People acting as Clerk or Customer in the example). If so, it could be constrained for both roles at once. Furthermore, if we had two semantically related Dimensions in the same Star, both could also be constrained at the same time, because they would be type-compatible in one way or another. For example, we could study clerks that are our customers. Therefore, the relationships found in last section should not only be considered to Drill-across, but also for operations that involve isolated star schemas (i.e. Dice). A condition could be simultaneously applied to several Dimensions defined over the same semantic domain (like Clerk and Customer). Moreover, instances of a Dimension can be selected by selecting instances of a related one.
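A Dice that applies one condition to two roles of the same semantic domain could be sketched as follows. The cells, the ages, and the function names are hypothetical; the only point is that the same predicate is type-compatible with both Clerk and Customer because both are People.

```python
# Hypothetical ProductSale cells; Clerk and Customer are roles of People.
sales = [
    {"clerk": "p3", "customer": "p1", "amount": 20.0},
    {"clerk": "p3", "customer": "p4", "amount": 35.0},
    {"clerk": "p5", "customer": "p3", "amount": 50.0},
]
people_age = {"p1": 45, "p3": 28, "p4": 24, "p5": 61}


def under_thirty(person):
    """Condition defined once over the People semantic domain."""
    return people_age[person] < 30


def dice(cells, condition, roles):
    """Keep the cells whose instance satisfies the condition in every given role."""
    return [cell for cell in cells if all(condition(cell[role]) for role in roles)]


both_young = dice(sales, under_thirty, roles=("clerk", "customer"))
```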

Conclusions

The first part of this chapter paid special attention to Data Warehousing schemas architecture and their conceptual design. The knowledge in the FIS field has been used. Doing this allowed us to consider the integration work as already done. Besides, it invited us to consider some problems regarding data schemas and data protection from a different point of view.

By locating the different Data Warehousing schemas in an architecture for FIS, an integrated architecture that comprises both areas has been obtained. The Data Warehousing terminology used by other authors was placed in that architecture. The characteristics of the different schemas, as well as the functions they realize, have also been emphasized. The DW Schema has been presented as the result of a data-driven design, while the DM Schemas result from a query-driven design.

Along the second half of the chapter, relationships between star schemas have been described. How different Stars can be related by Derivation, Generalization, Association, or even Flow relationships (in UML terminology) has been explained. The usage of those relationships between analysis dimensions and the different kinds of cells to navigate (or Drill-across) between Stars has been shown. Moreover, these relationships could also be used in other multidimensional operations like Dice (multidimensional operations are explained in section).

These relationships are not only useful for analysts, but also for designers. It has been exemplified that semantic relationships between Dimensions have nice consequences for aggregation hierarchies inside Dimensions, since two related Dimensions contain related aggregation hierarchies. The same holds for different Cells in related Facts. Therefore, relationships between Stars can also be used to drive designers' work, so that they can detect inconsistencies and errors. Conformed Stars would also drive users to the usage of data in a uniform way.

Elements of a multidimensional model

Chapter

Elements of a multidimensional model

You mentioned your name as if I should recognize it, but beyond the obvious facts that you are a bachelor, a solicitor, a freemason, and an asthmatic, I know nothing whatever about you.

Sherlock Holmes, The Norwood Builder

In this chapter, the different elements of a multidimensional model are studied in order to know things about Facts and Dimensions. From some basic definitions and general concepts, their characteristics are deduced.

Multidimensional information can be shown at different aggregation levels (often called granularities) for each analysis dimension. Thus, in the first half of this chapter (i.e. section), the benefits of understanding the relationships between aggregation levels as part-whole relationships, and how it helps to address some semantic problems, are outlined. Moreover, the consequences of the incorporation of other Object-Oriented constructs in the hierarchies of analysis dimensions are analyzed.

In the second half of this chapter (i.e. section), the meaning of Facts and the dependencies in multidimensional data are studied. This study is used to find relationships between data cubes in an Object-Oriented framework.

Analysis dimensions

This section is devoted to investigating problems regarding the representation of analysis dimensions and their aggregation hierarchies at conceptual level. The stress is on how to solve those problems by showing aggregation semantics and navigation paths along the analysis dimensions. The importance of semantically rich relationships and their usage in conceptual modeling is outlined in Sto. A first approach to how multidimensional modeling could benefit from O-O semantics has already been shown in section.

Most of the models mentioned in section provide some way to represent aggregation hierarchies. Nevertheless, those papers treat the semantics of conceptual modeling constructs rather superficially, often just pointing to a general idea.

Section discusses benefits of making analysis dimensions explicit. Then, from some well-identified semantic problems enumerated in section, the usage of certain modeling abstractions to solve them is studied. These problems are addressed from an O-O point of view in section. Specifically, the usage of Association, Aggregation, and Generalization relationships is analyzed.

The importance of aggregation hierarchies

[Figure: Sales fact table (TimeId, StoreId, CustomerId, ProductId, ColorId, ClerkId, Amount) with dimension tables Time, Store, Customer, ColoredProduct and Clerk, shown twice: (a) Star schema, (b) Snowflake schema, where the dimension tables are normalized into further tables such as Day, Month, Year, City, State, ZipCode, Color, Product, Kind and Family]

Figure Example of normalization of analysis dimensions

Kim, as well as other authors like Gio, argue that snowflaking dimension tables, which implies a higher level of normalization, as shown in figure, is a serious mistake except for a reduced set of specific cases. From their point of view, even though it saves some negligible storage space, it intimidates users by unnecessarily complicating the schema, and slows down most forms of browsing among dimensional attributes (joins are slower and less intuitive than selections). That normalization would also make aggregation hierarchies explicit. These hierarchies would show how Measures can be summarized (known as roll-up) or decomposed (known as drill-down). Nevertheless, they argue that the hierarchies are necessary neither to roll-up nor to drill-down, since these are implicit in attribute values.

However, some people disagree with those ideas (see PJ or LAW, for instance), and contend that aggregation hierarchies should be explicit, since they provide a basis for defining aggregate data and show navigation paths in analysis tasks. HS presents a description logics model which describes aggregation hierarchies as partially ordered sets, with the part-whole relationship being their strict order. In TBC, a multidimensional model which allows the usage of specialization, aggregation, and membership relationships is proposed. The authors claim that Dimensions are usually governed by associations of type membership, forming hierarchies that specify granularities. TPG also used Object-Oriented concepts to model Dimensions. Specifically, Associations define a directed acyclic graph between aggregation levels, and Generalization represents categorization of aggregation levels, allowing additional features of the subtypes to be defined. There are also some papers specifically related to aggregation hierarchies in analysis dimensions, like JLS and PR.

Actually, the context makes the difference. If we are at a logical or physical design phase, as in Kim, it is possible to obtain better performance or understandability by denormalizing some tables. However, at a conceptual level, we must represent aggregation paths besides their different semantics. If this puts obstacles in the way of non-expert users understanding schemas, the user interface can hide as much information as necessary to make it understandable to a given user. Performance problems of the system will be addressed at further design phases (i.e. logical and physical).

As already stated in literature, it is important to separate conceptual and physical components. Logical or physical models are semantically poorer than conceptual ones. Conceptual models are very important, because they give the user much more information about the modeled reality, and are closer to his/her way of thinking. This is especially necessary in analysis tasks, because of the unpredictable nature of user queries in these environments. This kind of user cannot be restricted to a small set of predefined queries. Indeed, they need to generate their own queries, most of the time based on metadata. Thus, it is essential for a conceptual model to provide means to show aggregation hierarchies and as much semantics as possible. For instance, showing that two analysis dimensions are specializations of another one means that their instances (for example, customers and clerks) can be compared.

Semantic problems in present multidimensional modeling

This section outlines some problems found in existing multidimensional models. Some of them were already identified in SR, Leh, and PJ. Even though SR can be considered as out of place, most of the problems it identifies in statistical modeling are also applicable in the multidimensional context. The problems related to modeling Dimensions are grouped into five sections.

Aggregation levels graph

At first glance, one could think that aggregation levels graphs are quite simple. Data about stores is aggregated based on the city they belong to, data about cities is aggregated based on the state they belong to, and so on. Although the aggregation hierarchy looks linear and simple, it suffices to look at the ColoredProduct Dimension to find that products can be aggregated either by color or kind. We can see other examples of multiple aggregation paths in Tho.

Some OLAP tools impose the constraint that an aggregation graph must be connected, and show parent-child relationships between attributes. LAW imposes the existence of a common top aggregation level, called All, defining a lattice of aggregation levels for every analysis dimension, and identifies relationships between Levels as functional dependencies. PJ also identifies multiple aggregation paths in the same Dimension and presents the different aggregation levels forming a lattice, being related by "greater than" relationships (meaning logical containment of the elements at one level into those at the other). It could also be the case that our information sources feeding the DW collect data at Month and Week Level, but not at Day Level. Therefore, we could define a common aggregation top, but not a common bottom for both aggregation paths.
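The Month and Week case can be checked on a small graph of aggregation levels. The sketch below assumes a Time Dimension where Month rolls up to Year and both Year and Week roll up to All; this concrete graph and the names `rolls_up_to` and `reachable` are hypothetical.

```python
# Hypothetical Time Dimension fed at Month and Week Level, but not at Day Level.
# Each Level maps to the Levels it directly aggregates into.
rolls_up_to = {
    "Month": {"Year"},
    "Week": {"All"},
    "Year": {"All"},
    "All": set(),
}


def reachable(level):
    """Levels reachable from `level` by rolling up, including itself."""
    seen, stack = {level}, [level]
    while stack:
        for parent in rolls_up_to[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


levels = set(rolls_up_to)
# A common top is reachable from every Level.
common_top = [lvl for lvl in levels if all(lvl in reachable(other) for other in levels)]
# A common bottom reaches every Level.
common_bottom = [lvl for lvl in levels if reachable(lvl) == levels]
```

Here All is the only common top and no common bottom exists, so the graph is not a lattice in the strict sense.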

There is no justification in literature of the structure of aggregation levels in an analysis dimension, and the relationships among them, being a lattice, semi-lattice, or just a directed graph. It is necessary to find a widely accepted definition of analysis dimensions. This is the first step to state its structure and properties.

Relationship cardinalities

Almost all the related research argues that aggregation hierarchies are formed by to-one relationships. It means that an element at a given aggregation level is related to exactly one element of the next aggregation level in the hierarchy. A store corresponds to exactly one city, a city, in turn, to exactly one state, and so on. As pointed out in LAW, this provides nice aggregability properties.
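Why to-one relationships aggregate nicely can be seen in a short sketch. The stores, cities, and amounts are hypothetical; the second mapping deliberately breaks the to-one constraint to show the effect on the grand total.

```python
from collections import defaultdict

# Hypothetical data: every store belongs to exactly one city (to-one).
city_of = {"store1": "Barcelona", "store2": "Barcelona", "store3": "Lleida"}
sales_by_store = {"store1": 10.0, "store2": 5.0, "store3": 7.0}


def roll_up(measures, parents_of):
    """Sum measures into the parent elements given by `parents_of`."""
    totals = defaultdict(float)
    for child, value in measures.items():
        for parent in parents_of(child):
            totals[parent] += value
    return dict(totals)


by_city = roll_up(sales_by_store, lambda store: [city_of[store]])

# With a to-many relationship (store3 assigned to two cities) the grand
# total is no longer preserved, because store3 is counted twice.
cities_of = {
    "store1": ["Barcelona"],
    "store2": ["Barcelona"],
    "store3": ["Lleida", "Barcelona"],
}
by_city_to_many = roll_up(sales_by_store, lambda store: cities_of[store])
```

Under the to-one mapping the city totals add up to the store total; under the to-many mapping they exceed it.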

However, we can find examples where hierarchies are not defined by to-one relationships in SR, Kim, and Tho. PJ also presents examples where the dimension hierarchies, besides possibly being to-many, can be non-covering. In general, the most common and computationally comfortable cardinalities are 1..N and 1..1 (meaning minimum..maximum cardinalities at lower/higher aggregation levels).

A difficulty slightly related to this is that of having different path lengths between instances at two aggregation levels in the dimension hierarchy. An instance a at aggregation level $L_1$ is part of b at aggregation level $L_2$, which in turn is part of c at aggregation level $L_3$. However, there can be another instance e at aggregation level $L_1$ that is directly part of d at aggregation level $L_3$. This is identified by PJ as non-onto hierarchies.



In general, we could find sixteen different cardinalities for relationships between two aggregation levels (i.e. two, 0 or 1 for minimum and 1 or N for maximum, raised to the power of four), most of them presenting summarizability problems. Thus, it is needed to clearly identify meaningless cardinalities, to avoid misunderstandings on designing, as well as the meaningful ones, to strive to solve any problems they generate.
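The count of sixteen can be checked by enumeration. The sketch assumes the four binary choices are the minimum (0 or 1) and maximum (1 or N) cardinality at the lower Level and at the higher Level; the string notation is only illustrative.

```python
from itertools import product

# One binary choice per cardinality bound: minimum and maximum at the
# lower Level, minimum and maximum at the higher Level.
minimums = ("0", "1")
maximums = ("1", "N")

cardinalities = [
    (f"{lo_min}..{lo_max}", f"{hi_min}..{hi_max}")
    for lo_min, lo_max, hi_min, hi_max in product(minimums, maximums, minimums, maximums)
]
```

The list has 2^4 = 16 entries, among them the pair ("1..N", "1..1") for a covering to-one hierarchy.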

Heterogeneous aggregation levels

SR detects a problem referred to as non-homogeneous statistical objects. This means having objects at the same aggregation level that have different attributes. For example, instances in People Dimension will have different attributes, and will be classified into different categories, whether they act in a sale as clerk or customer.


In Leh, this is solved by defining the attributes at instance level. However, as pointed out by some authors (see BSHD), explicit separation of cube structure and its contents is a desirable model feature. In this sense, attaching specific attributes to every instance does not seem a good solution. LAW also tackles the problem and proposes to solve it by means of attributes with null values (showing that a given attribute is not applicable), and restricting the usage of these attributes to selection of instances (forbidding grouping by them). The solution in BHL is much more elegant. It proposes to define different Relations for every set of instances sharing the same attributes.

Still, it is not enough to solve the problem at logical level by means of Relations. Modeling the concepts so that more semantics are captured is also important. TBC and TPGS propose to specialize the aggregation levels. Nevertheless, they do not study the consequences of such semantic relationship in aggregation hierarchies.

Reuse of analysis dimensions

Multidimensional data cubes are conceived in an isolated manner. However, when we use them, we want to navigate from a kind of fact to another one (known as drill-across). This means we are analyzing data in a Fact from a given point of view, and want to view data in another Fact from the same point of view. Thus, Facts need to have equivalent points of view (i.e. Dimensions). Moreover, we can also find the same Dimension playing different roles in a Star. For instance, in a sale, People Dimension plays two different roles (i.e. Clerk and Customer).

Most multidimensional models ignore drill-across. If it is considered (like in Kim), this operation is restricted to the case that both Stars have common dimension tables. As exemplified in SBHD, two Stars could also use the same analysis dimension at different aggregation levels, still allowing drill-across.

Multidimensional analysis and research is usually restricted to one Fact. Representing inter-dimension relationships would allow more powerful analysis by relating data in different Stars. The more semantically rich these relationships are, the better for the analysts.

Correlated analysis dimensions

In general, analysis dimensions are usually independent. Thus, the point of view chosen at one of them does not restrict the possible values available at others. However, we can find some cases where there exist meaningless combinations of dimension values (they are correlated). For instance, it may be that not all products are on sale everywhere. Depending on the product characteristics, it is sold in a store or not. Some other examples of this situation can be found in Kim (referring to the problem as many-to-many relationships). If values in two analysis dimensions are correlated, we could choose to keep both in the same dimension table.

There is no multidimensional conceptual model able to capture this kind of relationship. However, it is needed to capture, at conceptual level, the possibility of combining different Dimensions to give rise to a new one. Representing both Dimensions together at logical or physical level would depend on the number of meaningful combinations with regard to the number of elements of the correlated Dimensions.

How to solve them

Relationships between aggregation levels should be interpreted as part-whole (also known as composition) relationships. This allows us to use Classical Extensional Mereology (CEM) axioms, and other concepts in GP, to address problems stated in previous section.

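[Figure: three types of wholes W with their parts P: MASS (homogeneous), made of QUANTITIES; COLLECTION (uniform), made of ELEMENTS all related to the whole by the same relation r; COMPLEX (heterogeneous), made of COMPONENTS P1 to P5 related to the whole by different relations r1 to r5]

Figure Types of wholes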

As depicted in figure, we find three different domain-independent kinds of part-whole relations induced by the compositional structure of the whole (i.e. Mass, Collection, or Complex). If there is no compositional structure, the whole is considered homogeneous (for example, an amount of rice). If we take into consideration different elements, it is understood as a collection, having a uniform compositional structure (for example, a convoy of trucks). If we see different parts playing different roles, we have a complex with a heterogeneous compositional structure (for example, the pieces in an engine). Mass, Collection, and Complex represent extreme cases on a scale leading from a total lack of compositional structure to wholes with complex internal organization. Different people could conceive a composed element at different points of that scale.

The main objective of defining relationships between different instances in an analysis
dimension is to show how to apply aggregation functions (i.e. sum, min, max, avg, etc.). Since
these functions consider instances as equals (playing the same role in the aggregation), those
relationships should be conceived as collections. From here on, part-whole relationships between
aggregation levels in an analysis dimension should be understood as forming collections.

In the case of collections, GP considers that the axiomatic system of CEM (stated in the
figure of CEM axioms, and also explained in AFGP) seems to be ideally suited, except for one
axiom. In our case, that axiom also suits perfectly, since a user can always be interested in
considering a given set of elements as a whole in order to apply an aggregation function.
Semantically, the extensionality axiom is not true, since the same collection of elements could
compose different wholes (i.e. two clubs at a given point in time can have the same set of
members). However, for the purpose of applying aggregation functions, both collections would
give the same result. Thus, we would not be talking about clubs, but just about sets of members,
which would be the same individual.

Elements of a multidimensional model

EXISTS: If A is part of B, both A and B exist.

ANTISYMMETRY: If A is part of B, B is not part of A.

TRANSITIVITY: If A is part of B and B is part of C, then A is part of C.

SUPPLEMENTATION: If A is a proper part of B, then another individual C exists which is the
missing part from B.

EXTENSIONALITY: A and B have the same parts if and only if A and B are the same individual.

SUM: There always exists the individual composed by any two individuals of the theory.

Figure: Classical Extensional Mereology axioms
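The axioms can be illustrated with a minimal sketch in which a collection is modeled as the set of its elementary instances, so that part-of becomes set inclusion. This encoding, the function names, and the club members are assumptions made only for illustration; they are not taken from GP or from the model itself.

```python
# Illustrative assumption: a collection is the frozenset of its elementary
# instances, so "is part of" is modeled as set inclusion.

def is_part_of(a: frozenset, b: frozenset) -> bool:
    return a <= b

def is_proper_part_of(a: frozenset, b: frozenset) -> bool:
    return a < b

def supplement(a: frozenset, b: frozenset) -> frozenset:
    """SUPPLEMENTATION: the missing part of b once its proper part a is removed."""
    if not is_proper_part_of(a, b):
        raise ValueError("a must be a proper part of b")
    return b - a

def mereological_sum(a: frozenset, b: frozenset) -> frozenset:
    """SUM: the individual composed by a and b."""
    return a | b

# Two clubs with the same members (hypothetical data).
chess_club = frozenset({"Ann", "Bob"})
hiking_club = frozenset({"Bob", "Ann"})
staff = frozenset({"Ann", "Bob", "Eve"})

# EXTENSIONALITY: same parts, hence the same individual for aggregation purposes.
same_individual = chess_club == hiking_club

missing = supplement(chess_club, staff)
whole = mereological_sum(chess_club, missing)
```

Under this encoding the two clubs collapse into one set of members, which is exactly the reading needed when only aggregation results matter.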

GP also explains that there might be more than one way to decompose the same whole,
i.e. some objects could be understood as collections of different kinds of elements (for instance,
a year being a collection of either trimesters or four-month periods).

Relationships inside an analysis dimension

Some models, like CTa and GMRb, already stated that Dimensions contain different
Levels, which represent domains at different granularities. Those granularities show how
elements are grouped to apply aggregation functions. Thus, relationships are defined among
elements at different aggregation levels, standing for composition.

Along the Aggregation/Decomposition OO Dimension we find different kinds of relationships,
based on their strength. Those that do not stand for composition (or part-whole) relationships
are Associations. In this kind of relationship, an instance is related to another just to show a
property of the second one. Every instance in an analysis dimension will be related to some
instances because those are its parts, and to other instances because those simply
show its properties.

It is essential to distinguish both kinds of relationships in a multidimensional model, since
they allow us to understand what was intended when defining a given schema. Part-whole
relationships show how different elements are grouped together in a Dimension, while
Associations indicate which characteristics are available to select instances.
Thus, roll-up and drill-down operations will be performed along part-whole relationships,
while selection (known as slice-dice) will be performed by means of Association relationships.

A minimum definition that everybody could agree on, in order to deduce some controversial
properties of an analysis dimension using the CEM axiomatic system, is introduced here. Firstly,
when referring to aggregation levels in multidimensional analysis, there is a misuse of language
in saying, for instance, "A city decomposes into stores". The real meaning is easily inferred, but
it is important to keep in mind that it should be said "A set of stores in a city decomposes into
stores".

An analysis dimension can be defined as follows:

Definition: A Dimension is a connected, directed graph representing a point of view on
analyzing data. Every vertex in the graph corresponds to an aggregation level, and an edge
reflects that every instance at target Level decomposes into a collection of instances of source
Level (i.e. edges reflect part-whole relationships between instances of Levels).

[Figure: ColoredProducts Dimension, with Levels ColoredProduct, Color, Kind, and Family.]

Figure: Example of analysis dimension

In OO terminology, Levels would be classes and their instances would be objects. The figure
shows an example of Dimension. It contains a graph with four aggregation levels (i.e.
ColoredProduct, Color, Kind, and Family) and three edges, showing that families of products
can be decomposed into different kinds of products, and these into colored products, which can
also be grouped by color. To avoid identification problems pointed out by some authors, we can
assume that instances of the Levels have unique OIDs.

From the definition above and the CEM axioms, some properties can be deduced with regard to
analysis dimensions.

Property: A Dimension does not contain cycles.

Proof: Let us suppose that a cycle exists in the dimension graph. By successively applying
the transitivity axiom to any instance A of a Level forming the cycle, we would obtain that
there exists another instance B of another Level forming the cycle, so that A is part of B and
B is part of A. This contradicts the antisymmetry axiom; hence a cycle cannot exist in the graph
of a Dimension.

Property: For every Dimension, there exists a unique aggregation level (Atomic) which
contains elementary (i.e. that cannot be broken down) instances. Notice that elementary instances
could be unknown in a given database.

Proof: By the previous property, there is at least one Level whose instances do not have parts.
If there were more than one of those Atomic Levels, since a Dimension is connected, and by the
sum axiom, there would exist an instance E conceived as a composition of elementary instances at
each one of the Atomic Levels. By the extensionality axiom, all those collections of elementary
instances composing E must be the same collection of elements. Therefore, there exists only one
Atomic Level.
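Both properties can be checked mechanically on a dimension graph. The following is a minimal sketch, assuming that a Dimension is stored as a dictionary from each source Level to the set of its target Levels; this layout and the helper names are illustrative assumptions, and the edges are those of the ColoredProducts example.

```python
from graphlib import CycleError, TopologicalSorter

# Edges go from source Level (parts) to target Level (wholes).
colored_products = {
    "ColoredProduct": {"Color", "Kind"},
    "Kind": {"Family"},
    "Color": set(),
    "Family": set(),
}

def is_acyclic(edges: dict[str, set[str]]) -> bool:
    """Return True when the Level graph admits a topological order."""
    try:
        # Edge direction is irrelevant for cycle detection, so the mapping
        # can be handed to TopologicalSorter as it is.
        tuple(TopologicalSorter(edges).static_order())
        return True
    except CycleError:
        return False

def atomic_levels(edges: dict[str, set[str]]) -> set[str]:
    """Levels that are the target of no edge, i.e. whose instances have no parts."""
    targets = set().union(*edges.values())
    return set(edges) - targets
```

A well-formed Dimension should pass `is_acyclic` and yield exactly one Level from `atomic_levels`.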


Property: For every Dimension, there might exist a level (All) containing instances
composed by all elementary instances in the Dimension. If this level exists: (a) its instances are
not collected by instances at any other aggregation level; (b) this aggregation level has exactly
one instance; and (c) it is unique in the Dimension.

Proof: By successively applying the sum axiom, we can construct an instance E composed by all
elementary instances in the Dimension. (a) If E were a proper part of another instance E', by
the supplementation axiom there would be an elementary instance that is not in E, which
contradicts the way E was built. Therefore, E is an instance of a Level whose instances are not
part of any other instance in the Dimension. (b) If this Level contained two instances, both
containing all elementary instances, by the extensionality axiom they would be the same
instance. (c) This Level is unique, since if there were another Level whose instances collect all
elementary instances, they would be the same instance we already have in the All level, by the
extensionality axiom.

Property: Those Levels whose instances are not collected by instances of any Level (i.e.
they are not source of edges in the dimension graph) can be connected with an edge to Level
All.

Proof: The instance of Level All can be decomposed into instances at any Level covering
the Atomic Level. If there is a Level not covering the Atomic Level, a collection can be added to
it, by the sum axiom, collecting every elementary instance missing.

Property: Every instance of a Level that is not Atomic has at least one part.

Proof: An instance without parts is elementary, and all elementary instances are at the Atomic
Level, by the property on the Atomic Level.

Property: Every instance of a Level that is not Atomic might have more than one part.

Proof: If the part-of relationship between two instances is a proper-part-of, by the
supplementation axiom the collection will have more than one part.

[Figure: instances at Levels Product (Ferrero Rocher, Kinder Surprise, Rubik's cube), Kind (Candies, Toys), and Family (Gifts).]

Figure: Example of overlapping wholes

Property: An element might be part of several collections at the same time.

Proof: There is no mereological axiom forbidding the sharing of elements among several
collections, even though disjointness is a necessary condition to ensure summarizability, as
shown in LS. Allowing this case is not a conceptual but a computational problem, addressed as
such in PJ. If, as depicted in the figure, a given product at Level Product is allowed to belong
to two different kinds of products at the same Level Kind, some derived attributes of instances of
Level Family (which are composed by elements at Level Kind) must be calculated from elements at
Level Product, for example $card(Gifts) \neq card(Candies) + card(Toys)$.
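The cardinality example can be reproduced with plain sets. In this sketch the membership of each product is an assumption read off the figure (Kinder Surprise is taken to be both a candy and a toy); the variable names are illustrative.

```python
# Assumed memberships at Level Kind; Kinder Surprise belongs to both kinds.
candies = {"Ferrero Rocher", "Kinder Surprise"}
toys = {"Kinder Surprise", "Rubik's cube"}

# The Family instance Gifts collects the kinds Candies and Toys.
gifts = candies | toys

# Adding up Kind-level counts double-counts the shared product ...
naive_count = len(candies) + len(toys)
# ... whereas counting from Level Product gives the right cardinality.
correct_count = len(gifts)
```

The difference between the two counts is the number of products shared by both kinds, which is why the derived attribute has to be computed from Level Product.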

Property: If Level All exists in the Dimension, the graph is a lattice, and collections in
each Level are disjoint, then, for every Level S, every instance in it is part of a collection at
each and every other Level T being target of edges leaving source Level S.

Proof: A lattice with the All Level at top implies, by the CEM axioms, that every elementary
instance is collected in at least one instance of any other Level. By imposing that collections
in a Level are disjoint, we obtain that every element in S must be collected in exactly one
collection in T. If elements were not disjoint, there could be an instance of S overlapping
several collections in T, so that it would not be completely contained in any of them.

With regard to the problems stated earlier regarding the graph of aggregation levels,
from the definition and the properties on acyclicity and on the Atomic Level we ensure that, in
general, the aggregation levels in a Dimension form a semi-lattice. Moreover, the properties on
Level All show that an All Level can always be defined in order to obtain a lattice. The problems
about relationship cardinalities are explained by the other properties. The two properties on the
number of parts imply that the relationships between two Levels will involve 1..N parts for every
whole. The property on shared elements explains that a part could participate in more than one
whole, or not. The last property shows that, if we have a lattice with Level All and parts do not
participate in more than one whole, there is a whole for every part (i.e. we have cardinality
1..1-1..N). If the same part can participate in more than one whole at the same Level, we cannot
guarantee that there is a whole for every part, even if All exists in the Dimension (we have
cardinality 0..N-1..N). In any case, the sum axiom shows that the needed instances could be
obtained to have 1..N wholes for every part, so that we have 1..N-1..N.

[Figure: three Part-Whole relationships in which every whole has 1..N parts and every part belongs to 0..N, 1..1, or 1..N wholes.]

Figure: Allowed cardinalities between Levels

The figure summarizes the allowed cardinalities in an aggregation hierarchy. There are two
possibilities, both with at least one part for every whole. The most common case is that we find
exactly one whole for every part. However, it is also possible that a given part belongs to several
wholes. In this case, if we find parts that do not participate in any whole, wholes can always
be built so that every part participates in at least one.
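As a sketch of how these cases could be told apart, the function below classifies the number of wholes per part for one pair of Levels, and a second one builds the missing whole for orphan parts. The mapping layout, the function names, and the name "Others" for the added whole are assumptions made for illustration only.

```python
def classify(parts, wholes_of):
    """Return the observed cardinality of wholes per part."""
    counts = [len(wholes_of.get(p, ())) for p in parts]
    if all(c == 1 for c in counts):
        return "1..1"
    if all(c >= 1 for c in counts):
        return "1..N"
    return "0..N"

def complete(parts, wholes_of, extra_whole="Others"):
    """Add a hypothetical extra whole that collects every orphan part."""
    completed = {p: set(wholes_of.get(p, ())) for p in parts}
    for p in parts:
        if not completed[p]:
            completed[p].add(extra_whole)
    return completed

products = ["Ferrero Rocher", "Kinder Surprise", "Rubik's cube"]
kinds_of = {
    "Ferrero Rocher": {"Candies"},
    "Kinder Surprise": {"Candies", "Toys"},
}

before = classify(products, kinds_of)
after = classify(products, complete(products, kinds_of))
```

Here the uncollected product makes the relationship 0..N, and adding one whole for it turns the relationship into 1..N.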


Relationships between analysis dimensions

It is not enough to show relationships inside a Dimension or Level. It is also important to
analyze relationships between elements of analysis dimensions in different star schemas, or even
in the same one. In this section we are going to consider two kinds of semantic relationships
(i.e. Generalization and Aggregation).

Generalization: The usage of Generalization relationships between aggregation levels is
proposed in TBC and TPG. Doubtless, Generalization is an essential relationship
to be shown in multidimensional schemas. Nevertheless, isolated aggregation levels cannot
be specialized to show more specific meanings. They must be considered inside a
Dimension.

Property: In general, a Level and its specialization cannot belong to the same
Dimension.

Proof: Let us assume that both a Level L and its specialization L_S are in the same
Dimension. In order to define a lattice with Level All, since in this case L_S must cover the
Atomic Level, we could be forced to have some instances in L_S. Those instances we are
forced to have in L_S could not fulfill the specialization criterion. Therefore, it is not always
possible to have both Levels in the same Dimension.

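[Figure: People Dimension (Levels Person, AgeGroup, SaleRole, and All) specialized, with criterion {SaleRol="Clerk"}, into Clerk Dimension (Levels Clerk and Clerk All). The legend distinguishes Dimension specialization, Level specialization, and Aggregation.]

Figure: Example of dimension specialization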

The figure shows an example where the People Dimension is specialized at the SaleRole Level
(solid arrow) to have a Clerk Dimension. This specialization contains a Level with
all people acting as clerk, and another one with only one element (which is also an
instance of SaleRole) representing the set of all clerks. Dotted arrows show that a Level
is a specialization of another one (the instance of ClerkAll is that one of SaleRole
fulfilling the specialization criterion SaleRole="Clerk"). The AgeGroup Level is not of
interest in the Clerk Dimension. Notice that, if it were, it would not be a specialization of
the homonym Level in the People Dimension, since its instances would be different (they
would collect fewer people).

Generalizing the example, if D_S is the specialized Dimension of D at Level L, D_S
contains at least the Level L_S (specialization of L) and a specialization of every Level
in D containing parts of instances of L_S. These specialized Levels contain exactly those
instances of the corresponding Level of D being part of any collection in L_S. Besides
those mandatory Levels in D_S, it is also possible that D_S contains other Levels that are
not a specialization of any Level in D (with elements not in D).

All instances of a Level have common properties, since it represents a given class of
objects able to play the same role in a collection. By specializing a Dimension, we are
able to show attributes common only to a subset of instances, besides their specific
part-whole relationships, which solves the problems presented earlier as heterogeneous
aggregation levels. Association as well as Aggregation relationships are inherited along
specializations. Therefore, it also addresses the reuse of analysis dimensions. It is
possible to drill-across from one Star to another not only when both share Dimensions,
but also when the Dimensions of one are a specialization of those in the other.

Semantics are not only useful for users; they can also improve query performance.
In the example, Clerk and Customer are specializations of the same class (i.e. People).
When comparing instances in those Dimensions, the specialization being disjoint means
they will always be different. Just knowing whether it is covering or not would allow us to
obtain thresholds of aggregation results. The specialization being covering and disjoint
also suggests parallel computing.

Aggregation: Another interesting relationship to be shown is that of elementary instances in
a Dimension being aggregated into elementary instances in another Dimension. This
means expressing Aggregation relationships between Dimensions.

Property: If elementary instances in a Dimension D are part of elementary
instances in Dimension D_A, the graph of D will be a subgraph of D_A. Notice that instances
in D will not be those in D_A, but part of them.

Proof: Elementary instances in D_A can be grouped so that the same elementary
instance in D is part of every element in each collection. By the sum axiom, these collections
can become instances in D_A. Then, instances in D_A can be grouped by the same criteria used
on grouping elements in D.

[Figure: Product Dimension (Levels Product, Kind, Family, and All) and Colors Dimension (Levels Color, Range, and All) related by Aggregation to the ColoredProduct Dimension, which contains both graphs over a common atomic Level.]

Figure: Example of dimension aggregation

Besides having Sales by ColoredProduct, we could obtain data in another star schema by
Color or ProductKind. Instances in these Dimensions would be aggregated to show the
kind of product sold and the color of that product. As depicted in the figure, the composed
Dimension would contain at least the graph of each one of the parts (joining All levels)
plus a common Atomic. However, notice that, for example, instances of Colors.Color
and ColoredProducts.Color do not coincide. While the former represent colors, the
latter represent groups of products grouped by colors.

By means of Aggregation relationships between Dimensions, we address the problem
presented earlier as correlated analysis dimensions. Two Dimensions aggregated
to generate a new one mean that there is a relationship between them that should be
considered at design and query time.

Facts: subject of analysis

The aim of this section is to clarify some concepts about facts and how they should be modeled
(this was already done for dimensional data in the previous section). Functional Dependencies
(FDs) were successfully used in developing Relational theory. Thus, how they could also be
used to explain multidimensionality is going to be shown here. This does not mean pleading
for ROLAP as opposed to MOLAP tools. The discussion is placed at the conceptual level, and it
is independent of any kind of underlying system.

By better understanding multidimensionality and how it should be modeled, we can obtain
several benefits. Firstly, it will help in designing multidimensional schemas (as normal forms
do for relational ones). Secondly, users will also benefit from it, since querying will be easier
and more understandable. Finally, storage and retrieval systems could also be improved if
knowledge about the real meaning of data is improved.

The meaning of multidimensionality has not been unambiguously stated in the past. This
section is not going to rediscover multidimensionality, but just clarify and justify some points.
The first subsection mentions some multidimensional data models that contribute in one way or
another to modeling factual data. Then, the second subsection explains multidimensional concepts,
placing them at different detail levels with regard to n-dimensional spaces and FDs between them.
It also exemplifies some relationships between Facts and Cubes.

Factual data in other models

A lot of work has been devoted to multidimensional modeling. Out of all papers devoted to this
subject, some pay more attention than others to modeling facts at the conceptual level. GMRb
defines a fact schema as a set of measures related to dimension attributes. In SBHD,
facts are specializations of relationships (in the E/R sense). In CTa, a fact is defined as
a function over the cartesian product of the domains of its analysis dimensions. HS defines a
cube as an object which is associated to cells of similar form.

Kim states that the fact table has a composite primary key made up of the foreign
keys to its dimension tables. Gio agrees on that, and emphasizes that records in the fact
table represent points in the multidimensional space. BPT defines aggregation hierarchies
in terms of FDs between sets of attributes. LAW contains a proposal of normal forms for
multidimensional modeling based on weak functional dependencies. It states that there is
a functional dependency from analysis dimensions to summary attributes (i.e. Measures).
A schema in multidimensional normal form means that analysis dimensions are orthogonal to
each other, and summary attributes are fully functionally determined by the set of terminal
category attributes (i.e. atomic aggregation levels).
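The functional dependency from analysis dimensions to Measures can be tested on sample data. The sketch below is only an illustration under assumed names: rows are dictionaries, and `violates_fd`, the column names, and the figures are all hypothetical.

```python
def violates_fd(rows, dimension_keys, measures):
    """Return the dimension coordinates that map to more than one measure tuple."""
    seen = {}
    conflicts = set()
    for row in rows:
        point = tuple(row[k] for k in dimension_keys)
        values = tuple(row[m] for m in measures)
        # setdefault stores the first measure tuple seen for this point.
        if seen.setdefault(point, values) != values:
            conflicts.add(point)
    return conflicts

sales = [
    {"month": "Jan", "store": "BCN", "revenue": 100},
    {"month": "Jan", "store": "BCN", "revenue": 100},
    {"month": "Feb", "store": "BCN", "revenue": 80},
    {"month": "Feb", "store": "BCN", "revenue": 95},
]

conflicts = violates_fd(sales, ["month", "store"], ["revenue"])
```

An empty result means that, in the sample, every point of the multidimensional space determines its Measures; any returned point breaks the dependency.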

FDs in the context of multidimensional databases need much more attention. A wide
theoretical study of dependencies is in Tha. For a more application-oriented explanation
of dependencies, EN contains two chapters devoted to dependencies and normal forms in
Relational databases, and how they help in designing.

Multidimensional elements unleashed

The DW contains lots of Measures that analysts want to understand and compare. Studying them
all together would be almost impossible. In this section we are going to see how these data can be
successively grouped at different detail levels to ease their management. We will have Measures
grouped into cells of different Classes that can be seen as n-dimensional Cubes, which will
be grouped based on the kind of fact (i.e. Fact) they represent.

[Figure: a cloud of measurements, shown loose and grouped into cells.]

Figure: Measures grouped into cells corresponding to facts


Measures and cells

Usually, for the same kind of fact (subject of analysis at Lower detail level) we have several
Measures. For instance, for a Sale we could keep cost, revenue, amount of product, etc.
Thus, when a cloud of measurements must be faced, those corresponding to the same fact are
always grouped in the mind of analysts.

Definition: A cell contains a (possibly empty) set of measurements, and represents a given
fact.

The figure sketches this by drawing several measurements. Those that correspond to the
same fact are inside a cell, which represents the fact. One of these cells (i.e. an instance of a
kind of fact) contains all measurements we have about what was sold to John Doe last Monday
in Barcelona (i.e. how many items we sold him and how much we charged).

Nevertheless, grouping measurements of the same fact is not enough to be able to make
decisions. Several facts can be grouped, and this gives rise to more complex facts. Algebraically,
the set of cells $C$ representing all possible facts in the DW forms a commutative semigroup
with union, $(C, \cup)$, where $x \cup y$ means cells x and y are grouped into a new complex
cell. Notice that Measures are not considered in the discussion. This deals with cells that could
have attributes or not, so that summarization functions are not taken into account by now.
$(C, \cup)$ fulfills the following properties:

Closed: $\forall x, y \in C,\ x \cup y \in C$

Commutative: $\forall x, y \in C,\ x \cup y = y \cup x$

Associative: $\forall x, y, z \in C,\ (x \cup y) \cup z = x \cup (y \cup z)$

Neutral element: $\forall x \in C,\ x \cup \emptyset = x$

[Figure: the elements of $P(C_A)$, i.e. all unions of the atomic cells A, B, C, and D.]

Figure: $P(C_A)$, being $C_A = \{A, B, C, D\}$

If we call $C_A$ the set of all cells representing atomic facts (i.e. those that cannot be
decomposed), and we allow the union of any kind of cell, $P(C_A)$ should be considered, which
contains $2^{Card(C_A)}$ cells (the figure shows a set with four atomic cells).

Fortunately, what analysts really want to study is only a subset of $P(C_A)$. This subset
is defined by the different kinds of facts. We do not need to consider $P(C_A)$, but at most
$\bigcup_i P(C_i)$, being every $C_i$ the set of all atomic cells of a given kind of fact, so that
$\bigcup_i C_i = C_A$.

[Figure: atomic cells A, B, C, and D divided between kind of fact S and kind of fact P.]

Figure: $\bigcup_{i \in \{S,P\}} P(C_i)$, being $C_S = \{A, C\}$ and $C_P = \{B, D\}$

For example, if cells A and D in the figure are of the Sales kind of fact (S), while cells C and B
are of the Productions kind of fact (P),
$\bigcup_{i \in \{S,P\}} P(C_i) = \{A, B, C, D, AD, BC\}$. Thus, in this
case, analysts would not be interested in sixteen cells, but only in six.
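The counting can be reproduced with a short sketch that follows the assignment given in the text (A and D as Sales, B and C as Productions). The `powerset` helper and the variable names are assumptions for illustration; note that the sixteen elements of the full power set include the empty union, while the six meaningful cells are the non-empty ones.

```python
from itertools import chain, combinations

def powerset(cells):
    """All subsets of a set of atomic cells, each one as a frozenset."""
    items = sorted(cells)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

atomic = {"A", "B", "C", "D"}
by_kind = {"S": {"A", "D"}, "P": {"B", "C"}}

all_cells = powerset(atomic)

# Union of the power sets of each kind of fact.
restricted = set().union(*(powerset(c) for c in by_kind.values()))

# Keep only non-empty unions, which are the cells analysts may look at.
meaningful = {c for c in restricted if c}
```

Restricting unions to cells of the same kind of fact removes every mixed combination, such as AB or ACD.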

Analysis dimensions and aggregation levels

The facts only gain meaning when analysis dimensions identify them. If we subtract dimensional
information from them, only mute numbers remain. Talking about sales is senseless if you do
not know who sold what, when, to whom, etc. Thus, cells are usually grouped to give rise to
more complex cells, which contain derived measurements. However, most combinations of
cells do not give rise to meaningful more complex cells. Grouping must be done based on analysis
dimensions (for example, we should not group data regarding months with those regarding
years). In the previous section we have already seen the semantics and structure of Dimensions,
which show the different points of view analysts use to study facts. Each Dimension contains a
graph indicating how the facts can be aggregated along the analysis dimension (see the definition
of Dimension given earlier). In this case, those cells that are identified by instances of the
same Level are grouped to obtain a more complex cell identified by an instance of a Level above
that.

[Figure: Time Dimension with Levels Month, Trimester, Four-month, Year, and All.]

Figure: Example of Dimension

The figure shows an example of Dimension which contains five Levels (i.e. Month,
Trimester, Fourmonth, Year, and All). Every instance of the Month Level represents a month,
which can be aggregated in two different ways to obtain either trimesters or four-month periods.
Both kinds of instances (i.e. Trimester or Fourmonth) can be grouped to obtain years. Finally,
at the top we have the All Level, with exactly one instance representing the group of all months
in the Dimension.
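The two aggregation paths of this Dimension can be sketched as follows. The monthly figures are made up, a single year is assumed, and `roll_up` is a hypothetical helper that groups consecutive months into periods of a given length.

```python
# Made-up monthly figures for a single year: month number -> value.
monthly_sales = {month: 10 * month for month in range(1, 13)}

def roll_up(values, period_length):
    """Sum monthly values into consecutive periods of period_length months."""
    totals = {}
    for month, value in values.items():
        period = (month - 1) // period_length + 1
        totals[period] = totals.get(period, 0) + value
    return totals

trimesters = roll_up(monthly_sales, 3)     # Month -> Trimester
four_months = roll_up(monthly_sales, 4)    # Month -> Fourmonth

# Both paths lead to the same instance of Year.
year_from_trimesters = sum(trimesters.values())
year_from_four_months = sum(four_months.values())
```

Since both intermediate Levels partition the same months, the total at Year Level does not depend on the path followed.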


Classes of cells

We can associate every atomic cell to an instance of a Level in each of its analysis dimensions,
showing the meaning of its measurements. If all cells in a complex cell are associated with
instances in a given Level, and there exists an instance of another Level exactly composed by
those instances, we can associate the complex cell with that instance. For example, if all cells
composing another one are at level Month, and correspond to exactly those months in a trimester,
the complex cell is associated to an instance of the Trimester Level.

Definition: A Cell (i.e. Class of cells) contains those cells representing the same kind of
fact and being associated with instances of the same Level for each of the Dimensions we use
to analyze it.

For example, all cells representing sales during a given month, in a given store, by a given
customer form a Class. Instances of this Class differ in one or more of the instances of the
Dimensions they are associated to (i.e. the month it was sold, the store where it was sold, or
the customer who bought it). Two cells regarding the same kind of fact and the same instance
in every Dimension will be in the same Cell. Thus, Dimension instances identify cells in a
Class.

If we allowed to compare or group any set of cells, we would find that there exist
$2^{2^{Card(C_A)}}$ possible sets of cells in $P(P(C_A))$. Thus, Cells are defined to ease the
study of this huge amount of sets of cells. Only Measures in cells of the same Class can be
compared or treated together, because they represent exactly the same kind of information (i.e.
Fact) at the same granularity (i.e. Level). What is more, analysts are not interested in all cells
in $P(C_A)$, but only in those corresponding to Facts at a Level in each of the Dimensions. They
are only interested in subsets of every $P(C_i)$ determined by aggregation hierarchies in analysis
dimensions. Aggregation hierarchies in the Dimensions restrict the union of cells to those of
the same Class. For example, a cell associated to an instance of Month cannot be grouped with
another cell at Year Level to give rise to a more complex cell.

Facts

Only cells of the same Class can be grouped to obtain a coarser cell. Thus, instances of a
Cell are obtained by union of cells in another Cell. This is always done following aggregation
paths in the analysis dimensions. The Cells generated by grouping cells in another Cell always
regard the same kind of fact. Thus, we can group Cells into Facts (at Upper detail level).

Definition: A Fact is a connected, directed graph representing a subject of analysis. Every
vertex in the graph corresponds to a Cell, and an edge reflects that every instance at target
Cell decomposes into a collection of instances of source Cell (i.e. edges reflect part-whole
relationships between instances of Cells).

The figure shows an example of the structure of a Fact with two orthogonal Dimensions:
Time (already depicted in the earlier example of Dimension) and Geographic (composed by City,
Region, and All Levels). We can see that there is a Cell in the Fact for every combination of
Levels in the Dimensions. Having two orthogonal Dimensions with 5 and 3 Levels respectively means
that the Fact will have 15 Cells. These Cells and the part-whole relationships between them
form a lattice. All atomic cells are in the Cell at the bottom, while the Cell at top contains
only one cell, which is the union of all atomic cells.

[Figure: lattice of Cells in a Fact, obtained as the cartesian product of the Levels of Time (M, T, F, Y, A) and Geographic (C, R, A); edges denote Aggregation.]

Figure: Graph of Cells in a Fact with two Dimensions
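The graph of Cells can be derived from the two dimension graphs. The following is a minimal sketch, assuming each Dimension is stored as a dictionary from source Level to target Levels (an illustrative layout); a Cell is then a pair of Levels, and an edge of the Fact moves one step up in exactly one Dimension.

```python
from itertools import product

time_edges = {
    "Month": {"Trimester", "Fourmonth"},
    "Trimester": {"Year"},
    "Fourmonth": {"Year"},
    "Year": {"All"},
    "All": set(),
}
geo_edges = {"City": {"Region"}, "Region": {"All"}, "All": set()}

# One Cell per combination of Levels.
cells = set(product(time_edges, geo_edges))

# Aggregation edges between Cells: climb one Level in a single Dimension.
fact_edges = set()
for t, g in cells:
    for t_up in time_edges[t]:
        fact_edges.add(((t, g), (t_up, g)))
    for g_up in geo_edges[g]:
        fact_edges.add(((t, g), (t, g_up)))

sources = {src for src, _ in fact_edges}
targets = {dst for _, dst in fact_edges}
bottom = cells - targets   # Cell holding the atomic cells
top = cells - sources      # Cell holding the union of all atomic cells
```

With these two Dimensions the sketch yields a single bottom Cell, (Month, City), and a single top Cell, (All, All), as expected in a lattice.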

A Cell may contain any kind of data. It is usually numerical, because we always know
how to summarize numerical data (i.e. sum, avg, etc.). However, we just need a set of
aggregation operations for a non-numerical data type to be able to keep it in cells. For instance,
we could aggregate character strings by set-union. Therefore, we can also have descriptive
attributes in Cells. Aggregation operations for boolean Measures would be count, and,
or, etc. Anyway, if the data types of the Measures have an order, we can always aggregate
by calculating the median. Thus, Measures in Cells could always be aggregated to obtain the
Measures in Cells with more complex instances, except if the summarization function is not
transitive, or the aggregation level is not a valid source due to any reason (as explained
earlier). Different aggregation functions (i.e. sum, average, minimum, etc.) could be used
to obtain different Measures in a complex Cell.
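A sketch of such a set of aggregation operations, one per kind of Measure, is given below. The operation names and the sample values are assumptions for illustration; `median_low` is used because it returns one of the original values, so it also works for ordered non-numerical data.

```python
from statistics import median_low

AGGREGATORS = {
    "sum": sum,                                        # numerical Measures
    "set_union": lambda values: set().union(*values),  # descriptive Measures
    "or": any,                                         # boolean Measures
    "and": all,
    "count": lambda values: sum(1 for v in values if v),
    "median": median_low,                              # any ordered data type
}

def aggregate(operation, values):
    """Aggregate the Measures of the cells composing a more complex cell."""
    return AGGREGATORS[operation](list(values))

names = aggregate("set_union", [{"Ann"}, {"Bob"}, {"Ann", "Eve"}])
any_returned = aggregate("or", [True, False, True])
returned = aggregate("count", [True, False, True])
typical_grade = aggregate("median", ["B", "A", "C", "B"])
```

Each complex Cell can apply a different entry of the table to obtain each of its Measures.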

Some Cells could contain Measures that are not obtained by aggregation of those from
other Cells. For example, some data could be collected yearly, so that cells at Month Level
cannot contain it. Moreover, Gio distinguishes between analytical and non-analytical data.
Sometimes we are interested in analyzing data at a given aggregation level and ignore atomic
data. However, even though we might not collect Measures at the lowest level of granularity
due to either availability, performance, or legal reasons (i.e. personal data tend to be private),
we could be interested in keeping some information about instances at that level (for example,
names of people in the census). Thus, we have cases where Measures in a Cell are not present
for coarser or more detailed Levels.


If we know the Dimensions that define the Cells in a Fact, we know which are those Cells
and how they are related. Thus, it could be inferred that it is not necessary to show them in
multidimensional modeling. However, this is not true. As stated above, some Cells could have
specific Measures, or others could be especially important to be shown to users. As some derived
attributes are shown in a conceptual schema for the sake of completeness and clearness, so some
Cells with complex cells should also be shown in a multidimensional schema. Most of those
cells will be calculated on the fly, but others could be physically stored to improve performance,
or just to keep specific Measures.

If we only have to-one relationships between Levels, every one defines a partition of atomic
cells (those in the Cell at bottom). Each cell composes exactly one more complex cell in every
Cell above its own. Nevertheless, we cannot assume that, because having to-many relationships
in the aggregation hierarchies implies we do not obtain partitions of the Atomic Class. In
the worst case, analysts could be interested in all $P(C_i)$. Thus, to-many relationships in the
aggregation hierarchies generate semantic as well as computational problems on calculating
derived Measures for complex cells, but this does not mean they should be forbidden in a
multidimensional model.

Now we are going to see how different Facts can be conceptually related. In the following
paragraphs, it is shown how some Object-Oriented relationships (i.e. Generalization,
Aggregation, and Derivation) between cells are represented as relationships between Cells and
Facts.

Generalization: As it was previously said, cells in a given Cell could have Measures that
cells in other Cells do not have. For instance, if our company (which may be the result of
a merger of preexisting smaller companies) is organized by autonomous regions, it could
be that the information systems in one of these regions collect data that those in other
regions do not. Thus, we will specialize the corresponding Cell depending on the region.

[Figure: Cells All and Regions, with specializations AllNorth, AllSouth, AllEast, and AllWest.]

Figure: Specialization of a Fact based on a Cell

Specialization of Cells is due to the specialization of the kind of fact they are representing. Specializing means dividing the instances of a superclass into different subclasses. Notice that the sets of atomic cells in each of these subclasses (i.e. the sets of north, east, west, and south cells) are in $\mathcal{P}(C_i)$, being $C_i$ the cells in the superclass. Therefore, if they are meaningful for analysts, as depicted in the figure, there will be a Cell in the Fact so that each subclass in the specialization corresponds to an instance of the Cell, i.e. the Cell will have four instances, one per region. Thus, we should rather specialize a Fact based on a Cell.

Figure: Specialization of a Fact by region (panels: Cells in a Fact; Cells in the specialization Region=South; legend: Specialization, Aggregation, Cell)

To specialize a Fact, we have to choose the appropriate Cell that contains the cells corresponding to the desired subclasses. Then, it is specialized into Cells with exactly one instance, each of which will be the Cell at the top of the lattice of Cells in the new, more specific Facts. The example in the figure shows the same Fact as before, which we now want to specialize by region. Thus, we take the Cell containing data by regions and specialize it in one Cell with one cell. This gives rise to a new Fact whose graph of Cells is a subgraph of that of the superclass, namely the lattice having the subclass at the top. Notice that the Geographic Dimension in the Fact specialization is a specialization of the Geographic Dimension in the original Fact, i.e. their All Levels do not coincide.

Aggregation: We can also find that different cells are aggregated to obtain a cell about another subject (a different kind of fact). For instance, a deal is composed of several individual sales. Notice that the Measures of Deal are not necessarily obtained from those of Sale (for example, the discount in the deal). In this case, we do not group cells along any analysis dimension. It does not generate coarser cells in the same Fact, but cells in another Fact. There may or may not be coincidences in analysis dimensions. Depending on it, the Cells lattice will have a more or less similar form.

The usefulness of this kind of relationships between Cells is twofold. On one hand, it allows complex Facts to be defined from simpler ones, which will improve the understandability of data. On the other hand, two Facts can be related so that navigation between them is possible. If we are studying a set of sales, it can be interesting to see data corresponding to the deals in which they were done. Coincidences or differences in Dimensions do not matter. We should be able to travel from a Class to another one just because of the Aggregation relationship between the Facts.

Derivation: Another possibility is that Measures in a cell are obtained by processing Measures in cells about a different kind of fact. If this is the case, we say that there is a Derivation relationship between both Cells (and, by extension, between both Facts). For example, on analyzing the efficiency of employees, some Measures could be obtained by processing the benefits of some products sold, the best sales, sales involving relevant products, etc. Derivation relationships can also be used to hide information, or to change names or units of Measures. Most Dimensions will likely be shared by both Cells. However, the Cells are related because of relationships between cells, not because of the Dimensions. This does not correspond to part-whole relationships in the lattice of a Fact, because both Classes represent different subjects, and the grouping of cells is not performed by means of aggregation hierarchies but by conditions over the Measures themselves.

Gio defines a degenerate fact as a Measure recorded in the intersection table of a many-to-many relationship between Facts. It could be seen as data in a Cell being related to two different Cells by one-to-many relationships. Thus, we could also see it as two Facts acting as Dimensions of another Fact. Therefore, the duality Fact-Dimensions only exists if we look at an isolated multidimensional schema. Looking at all multidimensional schemas together means that what is considered a Fact by one analyst could be considered a Dimension by another one, or vice versa.

The structure of Cells in a Fact (a lattice) exactly coincides with that of Levels in a Dimension. Not only structure, but meaning coincides as well. In both cases, there is an Atomic Class at the bottom, whose instances are successively aggregated into instances of other Classes until we obtain an instance of the top Class, which contains all atomic instances. Both Facts and Dimensions contain a graph of part-whole relationships between Classes. The difference is that the aggregation graph of a Dimension depends on its own semantics, while the aggregation graph of a Fact depends on the aggregation hierarchies of its analysis dimensions. Thus, we could consider a Dimension as a dimensional self-qualified Fact. All we need to obtain a Dimension from a Fact is to express it in the appropriate base, as explained in the next section.

Cubes

LAW explains that the analysis dimensions of a summary attribute should be orthogonal. This means that there are no dependencies between them. However, having no dependencies between any pair of analysis dimensions of a set of cells could be a really strong constraint. Actually, what really matters is just having no dependencies between the dimensions of the space we are using for a given study. This means dependencies should be forbidden between the dimensions used on visualizing/storing data (cubes).

Therefore, it is important to know the valid n-dimensional spaces that can be used on analyzing a given kind of multidimensional data, namely a Cell. All possible combinations of instances in the Levels defining such spaces must be possible. This may be stated as multivalued dependencies with the empty set in the left-hand side (degenerated dependency) for every pair of Levels. Being $\mathcal{L}$ the set of Levels used to visualize/store a Cell:

$\forall L_i, L_j \in \mathcal{L}$ and $i \neq j$: $\emptyset \twoheadrightarrow L_i \mid L_j$

Figure: Diagram of a Cell with three independent analysis dimensions

Degenerated dependencies for every pair of Levels mean we are talking about the cartesian product of all of them. Since we also have that Levels identify the cells in a Class, that cartesian product fully functionally determines the cells, i.e. $L_1 \times \dots \times L_n \rightarrow C_{class}$. Thus, a Cell (either atomic or complex) determined by analysis dimensions could be drawn as those in the figure, forming an n-dimensional data cube. Notice that we could have alternative keys, i.e. a Cell could be organized in different n-dimensional data cubes.

Definition: A Cube is an injective function from an n-dimensional finite space (defined by the cartesian product of n functionally independent Levels $\{L_1, \dots, L_n\}$) to the set of instances of a Cell ($C_c$).

$c: L_1 \times \dots \times L_n \rightarrow C_c$, injective

Being a function means a Cube is not allowed to have holes. Any combination of Dimension instances must be valid (i.e. related to a cell). However, missing cells should be allowed if they mean that the fact is unknown, or that it could have happened but it did not. To avoid these holes, this can be represented as a boolean Measure per cell, meaning whether the corresponding fact happened or not (a null value in this boolean Measure means we do not know if it happened). What must be forbidden is a sparse Cube because of inapplicable combinations in the cartesian product, since it means we have dependencies between Dimensions, which is a bad conceptual design.

On the other hand, a Cube needs to be injective in order to allow spaces that do not contain all instances of a Cell. This means we will be able to visualize/store only a subset of the Class.

In general, different Cells are determined by cartesian products of different Levels. However, it could also be that the same set of Levels determines two different Cubes for different kinds of facts, for example Sales and Purchases in our business, both being analyzed by Month, Region, and Product. That is, Dimensions can be freely reused for different Cubes.
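The two conditions in the definition of a Cube (being a function over the whole cartesian product, and being injective) can be checked mechanically. The following is only a minimal sketch under simplifying assumptions: Levels are plain lists of instances, cells are opaque identifiers, and the function name `is_cube` as well as the sample data are hypothetical.

```python
from itertools import product

def is_cube(levels, mapping):
    """Check that `mapping` is a total, injective function on the product of `levels`.

    levels  -- list of lists, the instances of each Level
    mapping -- dict from coordinate tuples to cell identifiers
    """
    space = set(product(*levels))
    # Being a function on the whole space: no holes, no foreign coordinates.
    if set(mapping) != space:
        return False
    # Injectivity: two different points never hold the same cell.
    cells = list(mapping.values())
    return len(cells) == len(set(cells))

months = ["Jan", "Feb"]
regions = ["North", "South"]

# One cell per combination of Month and Region.
sales = {(m, r): f"sale-{m}-{r}" for m, r in product(months, regions)}

# A hole: one combination is not related to any cell.
with_hole = dict(sales)
del with_hole[("Feb", "South")]

# Two points sharing the same cell break injectivity.
not_injective = dict(sales)
not_injective[("Feb", "South")] = sales[("Jan", "North")]

print(is_cube([months, regions], sales))
print(is_cube([months, regions], with_hole))
print(is_cube([months, regions], not_injective))
```

In this sketch a fact that did not happen would still need a cell of its own, for example one carrying the boolean Measure described above, so that the mapping stays total.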

Base changes: Steinitz's theorem regarding vector spaces states that if $\{e_1, \dots, e_n\}$ are a base for a space and $\{v_1, \dots, v_m\}$ are linearly independent, we can change m elements in the base by the $v_i$ and it will still be a base. Since Cubes are nothing else than finite spaces, we can also find that two Cubes are related by a base change (a Dimensions change, in our case). Both Cubes contain the same cells, but just place them in a space defined by different analysis dimensions. Thus, the Dimensions in one of them must functionally determine the ones in the other.

Figure: Reduction of a dimensional Cube to a dimensional Cube

If Levels $\{L_1, \dots, L_n\}$ determine a Cube and there exists a set of functionally independent Levels $\{L'_1, \dots, L'_m\}$ so that $L'_1 \times \dots \times L'_m \rightarrow L_1 \times \dots \times L_n$, we can change the Levels that define the space of the Cube. If $L_i \rightarrow L_j \times L_k$, dimensionality can be reduced by replacing $L_j$ and $L_k$ by $L_i$, as sketched in the figure. Some authors propose to join two correlated analysis dimensions in order to avoid meaningless combinations. This is not the case. The number of cells is exactly the same. They are just placed in another way. As the dimensionality of the Cube can be decreased, it can also be increased if $L_j \times L_k \rightarrow L_i$. All these base changes between Cubes can be seen as an application of the transitive property of FDs between Levels.
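A base change can be pictured as re-keying the same cells with different coordinates. The sketch below is an illustration only: the Levels Month, DayOfMonth and Date, the cell identifiers, and the helper `change_base` are all hypothetical. It reduces a 2-dimensional space to a 1-dimensional one, and shows that a set of Levels which does not identify the cells is rejected as a base.

```python
def change_base(cube, new_coordinates):
    """Place the cells of `cube` in the space given by `new_coordinates`.

    cube            -- dict from old coordinate tuples to cells
    new_coordinates -- function mapping an old coordinate tuple to a new one
    Raises ValueError if two cells would land on the same new point, i.e. the
    new Levels do not identify the cells.
    """
    rebased = {}
    for old_point, cell in cube.items():
        new_point = new_coordinates(old_point)
        if new_point in rebased:
            raise ValueError(f"{new_point} does not identify a single cell")
        rebased[new_point] = cell
    return rebased

# Hypothetical 2-dimensional Cube: (Month, DayOfMonth) -> cell.
by_month_and_day = {
    ("Jan", 1): "c1", ("Jan", 2): "c2",
    ("Feb", 1): "c3", ("Feb", 2): "c4",
}

# Hypothetical Level Date, which functionally determines Month and DayOfMonth.
def to_date(point):
    month, day = point
    return (f"{month}-{day:02d}",)

by_date = change_base(by_month_and_day, to_date)
print(len(by_date) == len(by_month_and_day))  # the number of cells is unchanged

# Month alone does not determine the cells, so it is not a valid base.
try:
    change_base(by_month_and_day, lambda point: (point[0],))
    month_is_base = True
except ValueError:
    month_is_base = False
print(month_is_base)
```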

As a special case, a surrogate generated by a sequence is always a base for the dimensional space. However, it can be considered a degenerate case, since it is meaningless for analysts and implies the loss of all the benefits of multidimensionality. Nevertheless, as mentioned in the previous section, it is important in order to convert a Fact into a Dimension. In Gio, the Dimension of a dimensional space is called a shadow dimension, which has a one-to-one relationship with the fact table. Since there are not two cells associated to the same instance in the Dimension, it is not going to be used either to restrict or to group. However, the information could be attached to a given report by way of awareness concept.

Another problem already discussed by some authors is how Measures can be transformed into analysis dimensions for their own Fact. For example, LAW stated that using a Measure as a Dimension means a change in the schema. In this framework, we have just a base change in the space. Whether numerical or descriptive, if a set of attributes fully functionally determines the cells in a Cube, they can be used as analysis dimensions. Thus, Measures could also be used as analysis dimensions if they allow cells to be identified.

Conclusions

There is some controversy about whether aggregation hierarchies must be implicit or explicit. This chapter shows that, at conceptual level, it is essential to make explicit the aggregation hierarchies and as much information as possible about analysis dimensions. That information will help the user to understand data and pose ad hoc queries. Users will be able to classify and group data sets in an appropriate manner.

Some problems on explicitly modeling aggregation hierarchies have been identified and addressed by providing part-whole semantics to relationships between aggregation levels and considering mereology axioms. Thus, an analysis dimension is defined as a connected, directed graph of aggregation levels, and for each one of the problems some mereological properties were inferred to solve it. This is the first work deducing properties of analysis dimensions instead of just imposing them. As a result of this study, we can see that non-onto and non-covering hierarchies, as presented in Ped, should not be allowed. This is mainly due to considering that all instances of a Class must have the same structure. If we find that it is absolutely necessary to have different structures for different instances of the same Class, we can obtain it by specializing the Class.

Not only part-whole, but other kinds of relationships were found interesting for analysis dimensions, i.e. Generalization and Association. It was also shown how different Dimensions can be related, and the consequences that these relationships have in aggregation hierarchies.

Detail level    Subject of analysis                      Analysis dimensions
Lower           Measures                                 Descriptors
Intermediate    Cells (representing a Class of cells)    Levels
Upper           Facts (representing a kind of facts)     Dimensions

Table: Summary table of the different elements in a multidimensional model

Once dimensional data has been analyzed, the second half of the chapter aimed to help in clarifying what multidimensionality means. N-dimensional spaces and functional dependencies were used to explain what measures, cells, cubes, and facts exactly are, which will help in designing as well as querying multidimensional data.

As summarized in the table, we can distinguish three different detail levels. At Lower detail level, we have Measures, which are the Attributes of the cells. Then, we can group cells into different Classes that can be drawn as n-dimensional Cubes at Intermediate detail level, thanks to the fact that the different analysis dimensions defining a Cube are functionally independent. Finally, at Upper level, several Cells representing the same kind of fact at different aggregation levels are grouped into a Fact. The parallelism between the structure of analysis dimensions and factual data has been outlined.



YAM Yet Another Multidimensional Model

Chapter



YAM: Yet Another Multidimensional Model

Blooming yam

Don't hurry, don't worry. You are only here for a short visit. So be sure to stop and smell the flowers.

In the New York Times

Several papers have appeared in the last years regarding multidimensional modeling. However, few of them place the discussion at a conceptual level. Moreover, most of them focus on the representation of isolated star schemas, i.e. the representation of only one kind of facts surrounded by its analysis dimensions. In spite of the fact that the dominant trend in data modeling is the Object-Oriented (OO) paradigm, only a couple of proposals on OO multidimensional modeling exist (TP and BTW). These proposals use the Unified Modeling Language (UML) standard, defined in OMGb, in some way, but none of them proposes an extension of it to include multidimensionality. Only the Common Warehouse Metamodel (CWM) standard, defined in OMGa, extends UML metaclasses to represent some multidimensional concepts. However, it is too general and not conceived as a conceptual model.

The next section explains the main contributions of this multidimensional model. Then, the subsequent sections present its structures, inherent integrity constraints, and operations, respectively. A further section shows the metaclasses of the model and their relationships with UML metaclasses. Finally, YAM is compared with other multidimensional models against several items, most of them already introduced by other authors.

YAM is not JAM (Just Another Multidimensional Model)

As stated in AHV, a database model provides the means for specifying particular data structures, for constraining the data sets associated with these structures, and for manipulating the data. It is also explained there that, as Relations are the data structures of the Relational model, so graphs are the structures of OO models. A precise, easily understandable semantics for graphs in this OO model is provided by defining YAM structures as an extension of a widely accepted modeling language, i.e. UML (each and every YAM metaclass is a subclass of a UML metaclass). There are some multidimensional models that use UML notation, but none of them extends its concepts for multidimensional purposes. By using UML as a base for the definition of its structures, YAM is built on solid, well-accepted foundations and avoids the definition and exemplification of basic concepts. It makes it unnecessary to explain what Classes, Attributes, etc. are.

The main goal of multidimensionality is to help non-expert users to query data. Therefore, the data structures of a multidimensional model should show how data can be accessed, driving users in their understanding. They should keep as much information as possible, but the resulting schema must be easily understandable by final users. Thus, the different modeling elements in YAM have been defined at three levels (i.e. Upper, Intermediate, and Lower), so that they are successively decomposed to give the desired detail.

Expressiveness, or Semantic Power, as it is defined in SCG, is the degree to which a model can express or represent a conception of the real world. It measures the power of the elements of the model to represent conceptual structures, and to be interpreted as such conceptual structures. The more expressive a model is, the better it represents the real world, and the more information about the data it gives to the user. As outlined in FBSV, due to the presence of multidimensional aggregation, data warehouse and especially OLAP applications ask for the vital extension of the expressive power and functionality of traditional conceptual modeling formalisms. Therefore, this is crucial for conceptual multidimensional models like YAM, since they are used to represent user ideas. Different kinds of nodes and arcs in the graphs will be defined to improve the Expressiveness of the model. The applicability of the different kinds of relationships supported by UML has been systematically studied.

Another important point for a data model is its Semantic Relativism. It is defined in SCG as the degree to which the model can accommodate not only one, but many different conceptions. Since different persons perceive and conceive the world in different ways, the semantic relativism of a data model is really important to be able to capture all those conceptions. The information kept in the DW should be shown to users in the form they expect to see it, independently of how it was previously conceived or is actually stored. Therefore, YAM also provides mechanisms (derivation relationships at different detail levels) to model the same data from different points of view.



YAM also pays special attention to showing how data can be classified and grouped in a manner appropriate for subsequent summarization. Summarized data can be reflected in the schema, as well as the ways to obtain it. For instance, this information can be used at later design phases to decide materialization.



Therefore, the main advantages of YAM are its expressiveness and semantic relativism, besides the flexibility offered in the definition of summarization constraints (it generalizes the work in LS). Moreover, the separate study of the characteristics of analysis dimensions and factual data in earlier sections ensures that it is defined on solid foundations. That study mainly impacts the definitions. In a later section, YAM is compared with other models to show its advantages and disadvantages. There, its contributions can be clearly seen regarding specific items.

Structures

In this section, the structures in the model (i.e. nodes and arcs) are defined.

Nodes

Multidimensional models are based on the duality fact-dimensions. Intuitively, a fact represents data subject of analysis, and dimensions show different points of view we can use in analysis tasks. The facts represent measurements (in a general sense), while dimensions represent given information we already have before taking the measurements (on the understanding that they can always be modified). As previous work for the definition of this model, dimensions and facts were separately studied in the previous chapter. The reader is referred to it for a specific, deeper explanation of each of both kinds of data. Now, the definition of the different nodes found in a multidimensional OO schema is given.

Definition: A Level represents the set of instances of the same granularity in an analysis dimension. It is a specialization of the Class UML metaclass.

Definition: A Descriptor is an attribute of a Level, used to select its instances. It is a specialization of the Attribute UML metaclass.

Definition: A Dimension is a connected, directed graph representing a point of view on analyzing data. Every vertex in the graph corresponds to a Level, and an edge reflects that every instance of the target Level decomposes into a collection of instances of the source Level (i.e. edges reflect part-whole relationships between instances of Levels). It is a specialization of the Classifier UML metaclass.

Notice that the acyclicity of Dimension hierarchies does not need to be part of their definition. It can be proved from mereology axioms.

Figure: Example of Dimension (Customer Dimension with Levels Customer, AgeGroup, Bonanza, and All, related by Aggregation)

The figure shows an example of Dimension. It contains four Levels: Customer, AgeGroup, Bonanza, and All. Every instance of the Customer Level represents a customer, which can be aggregated in two different ways to obtain either age or goodness groups of customers. At the top we have the All Level, with exactly one instance representing the group of all customers in the Dimension. The structure and properties of the graphs of the Dimensions were carefully explained earlier. Just note here that it forms a lattice and, due to the transitive property of part-whole relationships, some arcs are redundant, so that they do not need to be made explicit (for instance, Customer being aggregated into All).
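The remark on redundant arcs can be made concrete with a small sketch. It represents the Customer Dimension of the example as a set of (source Level, target Level) arcs; the helper names are hypothetical, and the approach assumes an acyclic graph, as the text says can be proved from mereology axioms. An arc is reported as redundant when its target can still be reached from its source without it.

```python
def reachable(arcs, start):
    """Levels reachable from `start` by following one or more aggregation arcs."""
    seen, pending = set(), [start]
    while pending:
        level = pending.pop()
        for source, target in arcs:
            if source == level and target not in seen:
                seen.add(target)
                pending.append(target)
    return seen

def is_acyclic(arcs):
    """True when no Level can reach itself."""
    levels = {level for arc in arcs for level in arc}
    return all(level not in reachable(arcs, level) for level in levels)

def redundant_arcs(arcs):
    """Arcs whose target is still reachable from their source without them."""
    return {(s, t) for (s, t) in arcs if t in reachable(arcs - {(s, t)}, s)}

# Arcs go from the finer (source) Level to the coarser (target) Level.
customer_dimension = {
    ("Customer", "AgeGroup"),
    ("Customer", "Bonanza"),
    ("AgeGroup", "All"),
    ("Bonanza", "All"),
    ("Customer", "All"),   # implied by transitivity
}

print(is_acyclic(customer_dimension))
print(redundant_arcs(customer_dimension))
```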

Definition: A Cell represents the set of instances of a given kind of fact measured at the same granularity for each of its analysis dimensions. It is a specialization of the Class UML metaclass.

Definition: A Measure is an attribute of a Cell representing measured data to be analyzed. Thus, each instance of Cell contains a (possibly empty) set of measurements. It is a specialization of the Attribute UML metaclass.

Definition: A Fact is a connected, directed graph representing a subject of analysis. Every vertex in the graph corresponds to a Cell, and an edge reflects that every instance of the target Cell decomposes into a collection of instances of the source Cell (i.e. edges reflect part-whole relationships between instances of Cells). It is a specialization of the Classifier UML metaclass.

The figure shows an example of the structure of a Fact with two orthogonal Dimensions: Customer (already depicted in the previous figure) and Clerk (composed of the Clerk, Team, and All Levels). We can see that there is a Cell in the Fact for every combination of Levels in the Dimensions. Thus, a Fact contains all data regarding the same subject at any granularity. Having two independent Dimensions with four and three Levels respectively means that the Fact will have twelve different Cells. These Cells, and the part-whole relationships between them, form a lattice, as was already explained earlier. It is not necessary to represent all those Cells in the schema. Cells just containing derived data are optional, and should only be made explicit to emphasize the importance of summarized data at a given aggregation level.
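The Cells of such a Fact can be enumerated as the cartesian product of the Levels of its Dimensions. The sketch below uses the Levels named in the text (four for Customer, three for Clerk); the function names and the representation of a Cell as a tuple of Levels are assumptions made only for illustration. An aggregation arc between two Cells is generated whenever exactly one coordinate moves one Level up in its own Dimension.

```python
from itertools import product

customer_levels = ["Customer", "AgeGroup", "Bonanza", "All"]
clerk_levels = ["Clerk", "Team", "All"]

# Non-redundant aggregation arcs of each Dimension (finer Level, coarser Level).
customer_arcs = {("Customer", "AgeGroup"), ("Customer", "Bonanza"),
                 ("AgeGroup", "All"), ("Bonanza", "All")}
clerk_arcs = {("Clerk", "Team"), ("Team", "All")}

def fact_cells(*dimensions):
    """One Cell per combination of Levels, one Level taken from each Dimension."""
    return list(product(*dimensions))

def fact_arcs(cells, arcs_per_dimension):
    """Arcs between Cells: exactly one coordinate moves one Level up."""
    arcs = set()
    for cell in cells:
        for position, dimension_arcs in enumerate(arcs_per_dimension):
            for source, target in dimension_arcs:
                if cell[position] == source:
                    coarser = cell[:position] + (target,) + cell[position + 1:]
                    arcs.add((cell, coarser))
    return arcs

cells = fact_cells(customer_levels, clerk_levels)
arcs = fact_arcs(cells, [customer_arcs, clerk_arcs])

print(len(cells))   # number of Cells in the Fact
print(len(arcs))    # number of part-whole arcs between those Cells
```

The atomic Cell is ("Customer", "Clerk") and the top of the lattice is ("All", "All"); every other Cell in this sketch only contains derived data.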

These six kinds of nodes (i.e. Fact, Dimension, Cell, Level, Measure, and Descriptor) are grouped in three pairs. At Intermediate level there are Cells and Levels. Looking at Lower detail, we see Measures and Descriptors. Moreover, at this level we also define KindOfMeasure to show that several Measures in different Cells correspond to the same measured concept at different aggregation levels. Finally, at Upper detail level we have Facts and Dimensions (one Fact and the Dimensions associated to it compose a Star).

Figure: Graph of Cells in a Fact with two Dimensions

Definition: A Star is a modeling element composed of one Fact and several Dimensions that can be used to analyze it. It is a specialization of the Package UML metaclass.
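The definitions above can be summarized as a small class hierarchy. This is only a sketch: the UML metaclasses are reduced to empty stand-ins (in the UML metamodel Attribute and Package are not direct subclasses of ModelElement, which is simplified here), `UMLClass` is a hypothetical name chosen to avoid clashing with the Python keyword, and `DETAIL_LEVELS` is an illustrative helper that is not part of the model.

```python
# Simplified stand-ins for the UML metaclasses named in the definitions.
class ModelElement:
    """Root of the simplified UML metaclass hierarchy."""

class Classifier(ModelElement):
    """UML Classifier."""

class UMLClass(Classifier):
    """UML metaclass Class."""

class Attribute(ModelElement):
    """UML Attribute (intermediate metaclasses omitted)."""

class Package(ModelElement):
    """UML Package (intermediate metaclasses omitted)."""

# Multidimensional metaclasses, each one specializing a UML metaclass.
class Level(UMLClass):
    """Instances of the same granularity in an analysis dimension."""

class Cell(UMLClass):
    """Instances of a kind of fact measured at the same granularity."""

class Descriptor(Attribute):
    """Attribute of a Level, used to select its instances."""

class Measure(Attribute):
    """Attribute of a Cell, holding measured data."""

class Dimension(Classifier):
    """Connected, directed graph of Levels."""

class Fact(Classifier):
    """Connected, directed graph of Cells."""

class Star(Package):
    """One Fact together with the Dimensions used to analyze it."""

# The three pairs of nodes, by detail level.
DETAIL_LEVELS = {
    "Upper": (Fact, Dimension),
    "Intermediate": (Cell, Level),
    "Lower": (Measure, Descriptor),
}

for name, (factual, dimensional) in DETAIL_LEVELS.items():
    print(name, factual.__name__, dimensional.__name__)
```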

Arcs

Figure: UML Relationships between model elements (Relationship, Flow, Association, Generalization, and Derivation, relating ModelElement, Classifier, and GeneralizableElement, all from Core)

Once the nodes have been defined, in this section we are going to see the different kinds of arcs we could find between them. UML provides four different kinds of relationships: Generalization, Flow, Association, and Dependency. As depicted in the figure, Generalization relationships relate two GeneralizableElements, one with a more specific meaning than the other (Classifiers and Associations are GeneralizableElements). Flow relationships relate two elements in the model, so that both represent different versions of the same thing. Association, as defined in the UML specification, defines a semantic relationship between Classifiers. By means of a stereotype of AssociationEnd, UML allows a stronger type of Association to be used, i.e. Aggregation, where one classifier represents parts of the other (i.e. it shows part-whole relationships). Finally, UML allows different kinds of Dependency relationships between ModelElements to be represented, like Binding, Usage, Permission, or Abstraction. We are not going to consider the first three, because they are rather used on application modeling and YAM is just a data model. Moreover, for the same reason, out of the different stereotypes of Abstraction we are only going to use Derivation. Derivability (also known as Point of View) helps to represent the relationships between model elements in different conceptions of the UoD.

The usability of these relationships between concepts was briefly explained and exemplified earlier. Here, we are systematically going to see how they can be used to relate multidimensional constructs at every detail level. For every pair of constructs at each detail level, it will be shown whether they can be related by a given kind of Relationship or not. Moreover, if two constructs can be related, it will also be shown whether they must belong to the same construct at the level above or not (i.e. intra or inter relationships, respectively).

Upper detail level

Fact-Fact  Fact-Dimension  Dimension-Fact  Dimension-Dimension
Generalization  Inter  Intra/Inter
Association  Intra/Inter  Intra/Inter  Intra/Inter  Intra/Inter
Aggregation  Intra/Inter  Intra/Inter
Flow  Inter  Intra/Inter
Derivation  Inter  Intra/Inter  Intra/Inter

Table: Relationships between elements at Upper detail level

The table shows the different relationships we can find at this detail level. Since a Star only contains one Fact, in order to have two related Facts, they must belong to different Stars. Therefore, relationships between Facts will always be interstellar, except for reflexive Associations and Aggregations. However, we can have interstellar as well as intrastellar relationships between two Dimensions, because a Star contains several Dimensions, which can be related.
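The intra/inter distinction at this level follows from Star membership alone, which a short sketch can show. The `Star` record, the `scope` function, and the membership of each Star below are hypothetical simplifications; only the element names are taken from the running example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Star:
    name: str
    fact: str               # a Star contains exactly one Fact
    dimensions: frozenset   # and the Dimensions used to analyze it

def scope(stars, source, target):
    """Return 'intrastellar' if some Star contains both ends, else 'interstellar'."""
    for star in stars:
        members = {star.fact} | star.dimensions
        if source in members and target in members:
            return "intrastellar"
    return "interstellar"

# Hypothetical membership, loosely following the running example.
stars = [
    Star("Sales", "ProductSale", frozenset({"Customer", "Clerk", "Store"})),
    Star("Deals", "Deal", frozenset({"Customer"})),
]

print(scope(stars, "ProductSale", "Deal"))         # two different Facts
print(scope(stars, "Clerk", "Store"))              # two Dimensions of one Star
print(scope(stars, "ProductSale", "ProductSale"))  # reflexive relationship
```

Since each Star holds a single Fact, two distinct Facts can never share a Star in this sketch, so only a reflexive relationship on a Fact comes out as intrastellar.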

The figure shows examples of most relationships at this level. Firstly, corresponding to the upper-left corner of the table, we see that two Facts can be related by Generalization, i.e. ProductSale and CreditSale. We will have different information for the more specific Fact (for example, the number of the credit card). Thus, analysis dimensions are inherited from the more general Fact, but others could be added, like Bank. ProductSale and Production are related by Association to show the correspondences between produced and sold items. We can also find Aggregation relationships between Facts. A Fact in a Star can be composed of Facts in another Star. For instance, a Deal is composed of several individual ProductSales. Notice that it is not always possible to calculate all measurements of Deal from those of ProductSale (for instance, the discount in the deal). Data sources, measurement instruments, or calculation algorithms are probably going to change, and these changes should be reflected in our model by means of Flow relationships between Facts. All these changes are not reflected by just relating our Facts to the Time Dimension, since we actually have different Cells. On December 28th of 1992 we started recording discount checks in ProductSale, so that we kept both incomes (i.e. cash and discount checks). From that day on, we have different Facts containing the same kind of data before and after the acceptance of the checks (i.e. OldProductSale and ProductSale). Finally, two Facts could also be related by Derivation relationships, to show that they are the same concept from different points of view.

Figure: Example of YAM schema at Upper detail level (legend: Dimension (D), Fact (F), derived element (/), Generalization, Association, Aggregation, Derivation, Flow; the Flow relationships are labeled 28-12-92)

In the upper-right corner of the table, we can see that there exist Generalization relationships between Dimensions. For instance, the People Dimension generalizes the Clerk and Customer ones. Notice that if we suppose that all people are customers, both related Dimensions would belong to the same Star. It is also possible to have analysis dimensions related by Association. Thus, Clerk is associated with the Store Dimension to show that clerks are assigned to stores. We can also find stronger associations between analysis dimensions if we join more than one to give rise to another. For example, the People Dimension is used to define Club by means of an Aggregation relationship. Every instance of Club is composed of a set of people. Several years ago, when our local business grew, the Store Dimension was changed to reflect the new Level Region. At conceptual level, those changes are represented by a Flow relationship between OldStore and Store. Derivations allow us to state that there are different views of the same Dimension. We could find that the same concept has different names depending on the subject we are in. Thus, a Dimension could be used in different Stars. For example, Product is considered RawMaterial in a different context. Therefore, the same Dimension, with exactly the same instances, needs a different name depending on the context. These Dimensions could even have different aggregation hierarchies or attributes of interest to the users. For example, studying the raw material grouped by profit margin can be meaningless.

The middle columns in the table show how a Fact can be related to a Dimension, and vice versa. Firstly, we see that a Fact is related to its analysis dimensions by means of Association relationships. Moreover, Dimensions can also be associated to Facts in another Star, as shown in the example, where the Promotion Fact is associated to the Product Dimension in the Sales Star. A Dimension can be obtained by deriving it from a Fact. The name can be changed, some aggregation levels added or removed, others modified, some instances selected, etc., in order to adapt it to its new usage. In our example, some people are interested in the analysis of promotions. Thus, the promotions selected by studying the Promotion Fact can be used as a Dimension to study ProductSale. Notice the difference between deriving a Dimension and associating it to a Fact in another Star. The former allows us to study the sales performed during a promotion, while the latter shows all promotions that have been applied to a kind of product. That Derivation between a Fact and a Dimension is usually an interstellar relationship (i.e. from a Fact we derive a Dimension to analyze another Fact). However, we could also use information derived from a Fact to analyze the same Fact. It is also important to say that a Fact cannot be derived from a Dimension, because Facts represent measurements, so that they cannot be found a priori in the form of a Dimension. The rest of the relationships (i.e. Generalization, Aggregation, and Flow) cannot be found between a Fact and a Dimension, nor vice versa. All three imply obtaining a new element based on a preexisting one, and the difference between Fact and Dimension is so important that obtaining one from the other should be restricted to derivation mechanisms. For instance, a Fact cannot eventually become a Dimension.

Intermediate detail level

Cell-Cell   Cell-Level   Level-Cell   Level-Level

Generalization   Inter   Inter

Association   Intra/Inter   Inter   Inter   Intra/Inter

Aggregation   Intra/Inter   Intra/Inter

Flow   Inter   Inter

Derivation   Inter   Inter   Inter

Table: Relationships between elements at Intermediate detail level

Table shows the relationships we can find at this level. Most of them are exemplified in figure . Our company, resulting from the merger of preexisting smaller companies, is organized in autonomous regions. Thus, the information systems in one of these regions collect data that those in other regions do not, so we specialize our Cells (i.e. AtomicSale) depending on the region. This specialization is due to the specialization of the kind of fact they are representing. Therefore, we can see in the upper-left corner of the table that two Cells can be related by Generalization, but they must belong to different Facts (i.e. it is an interfactual relationship). Cells in different Facts can be associated, for instance each Cell representing a sale with its corresponding Cell representing the production of what was sold. Moreover, we can also have Association relationships between Cells in the same Fact; for instance,



YAM Yet Another Multidimensional Model

computers are associated to those other products that are plugged to them. In general, we only have intrafactual Aggregation relationships, which correspond to those relationships between Levels and are not necessary in the schema. However, we could also find that different Cells are aggregated to obtain a Cell about a different kind of fact, when both Facts are also related (like ProductSale and Deal). In this case, we do not group Cells along any analysis dimension, i.e. it does not generate coarser Cells in the same Fact, but Cells in another Fact (i.e. AtomicDeal). If a new Measure appeared for a kind of fact, we would obtain a new Cell related to the old one by means of a Flow. Both would represent the same concept. However, they would belong to different versions of the same Fact (it is an interfactual relationship). Derivation relationships can be used to hide information, or to change names or Measures in the Cells, giving rise to new Facts.

[Figure: schema diagram with Cells (C) such as AtomicSale, AtomicSaleInSouthRegion, AtomicProd, AtomicDeal and /DailySalesByStore, and Levels (L) such as Clerk, Team, Store, SalePoint, Region, Minute, Hour, Day, Month, Product, Kind, Family, Promotion, Customer, AgeGroup and All, linked by Generalization, Association, Aggregation and Flow relationships. Bases shown: {Clerk, Minute, Product}, {SalePoint, Minute, Product}, {Customer, Minute, Product} and {Product, Store, Day}.]

Figure: Example of YAM schema at Intermediate detail level

The rightmost column shows that we could also find Generalization relationships between two Levels. As in the case of Cells, it must be an interdimensional relationship, because two Levels cannot be related at the same time by Generalization and part-whole relationships. Associations between Levels can be intra- as well as interdimensional. The Level representing clerks is associated with other clerks (his/her relatives) in the same Dimension, and with stores in another Dimension. Intradimensional Aggregations define the graph of the Dimension. However, we could also find interdimensional Aggregations between Levels, if two Dimensions are so related. When the company was restructured and the regional division changed, the aggregation level showing it also changed. Both new and old Levels are related by means of a Flow: although they represent the same concept, they belong to different versions of the same Dimension. Finally, as for any other concept, a Level could be derived from another one to show it from a different point of view.

All relationships in the central columns must be interstructure, because Cells and Levels always belong to different structures (i.e. Facts and Dimensions, respectively). As for relationships at Upper detail level, a Cell cannot be converted into a Level, nor vice versa, by means of Generalization, Aggregation, or Flow. It must always be done using derivation mechanisms. Moreover, for the same reason that a Fact cannot be derived from a Dimension, a Cell cannot be derived from a Level. Nevertheless, if a Dimension is derived from a Fact, its Levels are also derived from the Cells of the Fact. Associations exist between Cells and Levels, or vice versa, showing the granularity of the Cells.

Lower detail level

              Measure-Meas.   Measure-Descr.   Descriptor-Meas.   Descriptor-Descr.
Flow          Inter           -                -                  Inter
Derivation    Intra/Inter     Inter            Inter              Intra/Inter

Table: Relationships between elements at Lower detail level

Elements at this level are neither Classifiers nor GeneralizableElements, but just Attributes. Therefore, as it is shown in table , they can only be related by those relationships between ModelElements, i.e. Derivation and Flow.

If a change affects a Measure or Descriptor, they will belong to new versions of their Cell and Level, respectively. Thus, Flow relationships are in both cases interstructure. Moreover, evolution alone cannot convert a Measure into a Descriptor, nor vice versa.

[Figure: schema diagram with Cells (C) AtomicSale (income: Income, /revenue: Revenue, /incomeAverage: IncomeAverage) and DailySalesByStore (/incomePerPerson: IncomePerPerson), and Levels (L) Kind, Customer (/bonanza) and Store (zipCode, influenceArea, population). Annotations: /revenue is derived as {income − Production.cost}; Kind is an invalid source for Income and Revenue; IncomePerPerson is induced by {Store -> sum(Income)/Population} and {Time, Product -> sum(IncomePerPerson)}; IncomeAverage is NonTransitive and induced by {Time, Product, Store, Promotion, Customer, Clerk -> avg(Income)}.]

Figure: Example of YAM schema at Lower detail level

It is always possible to define derived Measures from other Measures in the same Cell, as well as Descriptors from other Descriptors in the same Level. Moreover, in both cases, supplier Attributes could also be in other Classes. Measures in a Cell could be obtained by applying some operation to Measures in other Cells. For instance, looking at Lower detail level elements in figure , we see that measurements of revenue in AtomicSale are obtained by subtracting cost in Production from income. What is more, a Descriptor can be obtained from some Measures (for example, the goodness of a customer from the income of his/her purchases) or vice versa (for example, impact of sales, obtained by dividing incomes by the population of the influence area of the store). This figure does not show arcs between Cells and Levels, because they are at Intermediate detail level.
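As a minimal sketch of such derivations, the following Python fragment recomputes two of the derived attributes mentioned above over a handful of made-up instances. The dictionaries, the field names and all the figures are assumptions introduced only for illustration; they are not part of the YAM schema.

```python
# Hypothetical instances; names loosely follow the example schema.
production_cost = {"candy": 2.0, "toy": 5.0}   # Measure "cost" in Production
store_population = {"s1": 4, "s2": 2}          # Descriptor "population" of Store

atomic_sales = [
    {"store": "s1", "product": "candy", "income": 3.0},
    {"store": "s1", "product": "toy", "income": 9.0},
    {"store": "s2", "product": "candy", "income": 3.5},
]


def revenue(sale):
    """Derived Measure: income minus a Measure taken from another Cell."""
    return sale["income"] - production_cost[sale["product"]]


def income_per_person(store):
    """Derived from Measures (incomes) and a Descriptor (population)."""
    total = sum(s["income"] for s in atomic_sales if s["store"] == store)
    return total / store_population[store]


revenues = [revenue(sale) for sale in atomic_sales]
```

Here `revenue` has its supplier Attribute in a different Cell, whereas `income_per_person` mixes Measures with a Descriptor.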

Inherent integrity constraints

The metaclasses of the model define constraints on multidimensional schemas, but constraints should also be defined on their instances. This section addresses that kind of constraints, paying special attention to two important aspects in multidimensional modeling, namely placement of data in an n-dimensional space and summarizability of data.

The main contribution of multidimensionality is the placement of data in an n-dimensional space. This improves the understanding of those data and allows the implementation of specific storage techniques. It is important that the n dimensions of the space (i.e. Cube) are orthogonal. If not, i.e. if a Dimension determines others, the visualization of data will be unnecessarily complicated (we are showing more information than is needed) and it will be more difficult for users to understand it; moreover, storage mechanisms are affected as well, because they do not take into account that several combinations of dimension values are impossible, possibly resulting in a waste of space. This does not mean that all Dimensions in a Star must be orthogonal. Nevertheless, those defining Cubes, which are used for visualization as well as storage purposes, should be, or at least the user should know whether they are.

A Cell instance is related to one object (or set of objects, if it is an Association with upper bound multiplicity greater than one) at each associated analysis dimension, and those objects or sets of objects completely identify it. Thus, regarding placement of data in n-dimensional spaces, we could say that the set of Levels a Cell is associated with forms a superkey (in Relational terms) of that Cell. In YAM, every minimal set of Levels being a superkey (i.e. a key in the Relational model) of a Cell is called a Base. When one of these Bases, which define spaces of orthogonal Dimensions, is associated to a Cell, we obtain a Cube. For instance, AtomicSale in figure can be associated with points in the three-dimensional space defined by Levels Clerk, Minute, and Product, so that AtomicSale is fully functionally determined by those three Levels (a Base of the space).

Definition: A Cube is an injective function from an n-dimensional finite space (defined by the cartesian product of n functionally independent Levels $\{L_1, \ldots, L_n\}$) to the set of instances of a Cell ($C_c$):

$$c : L_1 \times \cdots \times L_n \to C_c, \quad \text{injective}$$

If the Levels were not functionally independent (i.e. they did not form a Base), we would use more Dimensions than strictly needed to represent the data, and would generate empty, meaningless zones in the space.
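The notion of Base can be illustrated with a small Python sketch that searches, over a sample of instances, for the minimal sets of Levels that identify every cell. The sample rows, the function names and the brute-force search are assumptions made for illustration. Moreover, a check over instances can only refute a candidate Base; the actual functional dependencies are a property of the schema, not of a particular sample.

```python
from itertools import combinations

# Hypothetical AtomicSale instances: the Level instance each cell is
# associated with in every analysis dimension.
SALES = [
    {"Clerk": "ann", "SalePoint": "p1", "Minute": "09:00", "Product": "candy"},
    {"Clerk": "ann", "SalePoint": "p1", "Minute": "09:00", "Product": "toy"},
    {"Clerk": "bob", "SalePoint": "p2", "Minute": "09:00", "Product": "candy"},
    {"Clerk": "ann", "SalePoint": "p1", "Minute": "09:05", "Product": "candy"},
]


def is_superkey(rows, levels):
    """True if no two rows share the same coordinates on `levels`."""
    seen = set()
    for row in rows:
        point = tuple(row[level] for level in levels)
        if point in seen:
            return False
        seen.add(point)
    return True


def bases(rows, levels):
    """Return every minimal set of Levels that identifies each row."""
    found = []
    for size in range(1, len(levels) + 1):
        for candidate in combinations(levels, size):
            if any(set(base) <= set(candidate) for base in found):
                continue  # contains a smaller Base, hence not minimal
            if is_superkey(rows, candidate):
                found.append(candidate)
    return found


def as_cube(rows, base):
    """Build the Cube as a mapping from points of the Base to cells."""
    return {tuple(row[level] for level in base): row for row in rows}


FOUND = bases(SALES, ["Clerk", "SalePoint", "Minute", "Product"])
```

With this sample, `FOUND` holds the two three-Level Bases ("Clerk", "Minute", "Product") and ("SalePoint", "Minute", "Product"); the four-Level set is a superkey but not a Base, since it contains a smaller one.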

Another interesting group of constraints to deal with is that related to summarization anomalies, and how to solve or prevent them. In multidimensional modeling, it is essential to know how a given kind of measure must be aggregated to obtain it at a coarser granularity. LS identifies three necessary (intuitively also sufficient) conditions for summarizability:

1. Disjointness: the subsets of objects to be aggregated must be disjoint.

2. Completeness: the union of subsets must constitute the entire set.

3. Compatibility: category attribute (i.e. Level), summary attribute (i.e. KindOfMeasure), and statistical function (i.e. Summarization) must be compatible.

The first two conditions are absolutely dependent on constraints over cardinalities in the part-whole relationships in the Dimensions, because these define the grouping categories. Therefore, let us briefly talk also about this third group of integrity constraints of the model.

[Figure: instances at Levels Product (Ferrero Rocher, Kinder Surprise, Rubik's cube), Kind (Candies, Toys) and Family (Gifts); Kinder Surprise is a part of both Candies and Toys.]

Figure: Example of sharing of parts between several instances

To avoid those anomalies on summarizing data, some models forbid "to-many" relationships in the aggregation hierarchies. This means that instances of a Part Level can only belong to one Whole. Nevertheless, there is no mereological axiom forbidding the sharing of parts among several wholes. As exemplified in figure , a given product (Kinder Surprise) at Level Product belongs to two different kinds of products at the same Level Kind (i.e. Candies and Toys). We argue that this case should not be ignored by a multidimensional model. Therefore, non-strict hierarchies are allowed in the Dimensions, and they need to be taken into account to decide summarizability of Measures.
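The effect of a non-strict hierarchy on summarizability can be sketched in a few lines of Python. The part-whole links follow the example of the shared product; the income figures and the helper names are assumptions introduced only for illustration.

```python
from collections import defaultdict

# Part-whole links from Level Product to Level Kind; one part has two wholes.
PRODUCT_TO_KINDS = {
    "Ferrero Rocher": ["Candies"],
    "Kinder Surprise": ["Candies", "Toys"],
    "Rubik's cube": ["Toys"],
}
INCOME = {"Ferrero Rocher": 10, "Kinder Surprise": 4, "Rubik's cube": 6}  # made up


def groups(part_to_wholes):
    """Invert the links: whole -> set of its parts."""
    result = defaultdict(set)
    for part, wholes in part_to_wholes.items():
        for whole in wholes:
            result[whole].add(part)
    return dict(result)


def is_disjoint(grouping):
    """Disjointness: no part appears in two groups."""
    seen = set()
    for parts in grouping.values():
        if seen & parts:
            return False
        seen |= parts
    return True


def is_complete(grouping, all_parts):
    """Completeness: the union of the groups is the entire set."""
    covered = set().union(*grouping.values()) if grouping else set()
    return covered == set(all_parts)


by_kind = groups(PRODUCT_TO_KINDS)
income_by_kind = {k: sum(INCOME[p] for p in parts) for k, parts in by_kind.items()}
```

In this sample the grouping is complete but not disjoint, so adding up `income_by_kind` yields 24 although the products only account for 20: the shared product is counted twice.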

The other problem on cardinalities is that of non-onto and non-covering hierarchies, as presented in Ped. That is, having different part-whole structures for instances at the same Level is allowed. For example, if we had a state-city like Monaco in a Geographic linear Dimension with Levels City, State, and All, we could generate both situations. If we consider that Monaco is a city, we have a non-covering hierarchy (we are skipping State Level). On the other hand, if it is considered a state, we obtain a non-onto hierarchy (we have different path lengths from the root to the leaves, depending on the instances). In this case, YAM proposes the usage of what some authors call Dummy Values to guarantee the existence of at least one part for every whole in the hierarchy. These values are not dummy at all. Monaco being a state-city does not mean it is either a state or a city, but a state and a city at the same time. Thus, both instances will represent city and state facets of the same entity.



Therefore, in YAM, cardinalities in aggregation hierarchies are parts for every whole and wholes for every part, on the understanding that Dimension instances can always be defined so that there are wholes for every part. The interested reader can refer to section for a deeper explanation of these cardinalities.

Going back to the group of constraints regarding summarizability, in the model there are three different elements to deal with that problem, all exemplified in figure . These elements allow representing summarizability conditions in a more flexible way than just distinguishing additive, semi-additive, and non-additive Measures. Firstly, some Levels are an InvalidSource for the calculation of a given KindOfMeasure (for example, Kind is an invalid source for Income and Revenue). This means that measurements at an aggregation level cannot be used to obtain data at higher aggregation levels, and we must go to finer granularities (maybe to the Atomic Level) to obtain the source data for the calculation. This can be due to the fact that the instances of that Level are not disjoint or not complete (i.e. the first two summarizability conditions mentioned above). Whether a Level is an invalid source cannot be deduced just from the cardinalities of its associations; it also depends on the KindOfMeasure. For instance, if a Measure is obtained as the minimum of a set of measurements, it does not matter whether the source sets of instances are disjoint or not. In some cases, double counting could even be desirable.

Moreover, the Induce Association shows the summarization that must be performed on aggregating a given KindOfMeasure along a Dimension. This constraint regards the third condition mentioned above. Along a given analysis dimension we can use one summarization operation, while along a different analysis dimension we use a different function. For instance, we aggregate IncomePerPerson along Time and Product by means of sum, while along Store it needs to be recalculated from Incomes. Incompatibilities are not always associated to the Time Dimension. Furthermore, instances of Induction could be partially ordered, if necessary, to show that operations are not commutative and must be performed in a given order, as pointed out in Tho. For example, sums along a Dimension must be performed before averages along another one, so that we aggregate up to the desired Level in a Dimension, and then we aggregate along the other.
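A short Python sketch shows why the order matters. It assumes a sparse sample in which one store made no sale on the second day; the data and the helper `aggregate` are hypothetical and only serve to compare both orders of evaluation.

```python
from statistics import mean

# Hypothetical incomes per (store, day); store "B" has no sale on day 2.
INCOME = {("A", 1): 10, ("A", 2): 20, ("B", 1): 30}


def aggregate(cells, keep, func):
    """Group cell values by the coordinate at index `keep` and apply `func`.

    keep = 0 keeps the store (aggregating along days);
    keep = 1 keeps the day (aggregating along stores).
    """
    grouped = {}
    for coords, value in cells.items():
        grouped.setdefault(coords[keep], []).append(value)
    return {key: func(values) for key, values in grouped.items()}


# Sum along Time first, then average along Store: mean of 30 and 30.
sum_then_avg = mean(list(aggregate(INCOME, 0, sum).values()))

# Average along Store first, then sum along Time: 20 + 20.
avg_then_sum = sum(aggregate(INCOME, 1, mean).values())
```

Both results (30 and 40) come from the same three cells, so a schema has to state which order is the intended one.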

Finally, another point to take into account, usually overlooked in other models, is that of transitivity. If a summarization operation is not transitive, we cannot use precalculated aggregates at a given Level to obtain those at higher levels. Going to the atomic source is mandatory; for instance, we should not compute the average of averages if we want to obtain the average of raw data.
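The difference between a transitive and a non-transitive summarization can be checked with a few made-up values. The following Python fragment is only a sketch; the sample data and the variable names are assumptions.

```python
from statistics import mean

# Hypothetical atomic measurements, grouped by day.
SALES = {"Mon": [10, 20, 30], "Tue": [100]}
raw = [value for values in SALES.values() for value in values]

daily_sum = {day: sum(values) for day, values in SALES.items()}
daily_avg = {day: mean(values) for day, values in SALES.items()}

# sum is transitive: precalculated daily sums can be reused.
total_from_daily = sum(daily_sum.values())
total_from_raw = sum(raw)

# avg is not: the average of daily averages differs from the raw average.
avg_of_avgs = mean(list(daily_avg.values()))
avg_from_raw = mean(raw)
```

Here both totals are 160, whereas the average of averages is 60 and the average of the raw data is 40.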

Operations

The multidimensional model is just a query model, i.e. it does not need operations for update, since this is not directly performed by final users. YAM operations focus on identifying and uniformly manipulating sets of data, namely Cubes. In a Cube, data are identified by their properties. Thus, these operations are separated from the physical storage of the data. Moreover, they are not presentation oriented like those in Tes.

Detail level    Subject of analysis    Point of view
Upper           Drill-across           ChangeBase
Intermediate    -                      Roll-up
Lower           Projection             Dice

Table: YAM operations

As everything in a multidimensional model, operations are also marked by the duality fact-dimensions. Table shows the operations in two columns. The first one contains those operations having effect on the subject of analysis (i.e. Fact, Cell, and Measure). They select the part of the schema we want to see. In the other column, there are those operations affecting the point of view we will use in the analysis (i.e. Dimension, Level, and Descriptor). They allow reorganizing the data, modifying their granularity, and focusing on a specific subset by selecting the instances we want to see.

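[Figure: diagram relating the input Cube $c_i : L^i_1 \times \cdots \times L^i_k \times \cdots \times L^i_m \to C^i_c$ and the output Cube $c_o : L^o_1 \times \cdots \times L^o_j \times \cdots \times L^o_n \to C^o_c$ by means of the families of functions f (between the spaces), h (between individual Levels) and g (between the Cells).]

Figure: Multidimensional operations as composition of functions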

In the sense of AHV, these operations are conceptually a procedural language, because queries are specified by a sequence of operations that construct the answer. We generally say that a query is from (or over) its input schema to its output schema. Thus, there exists an input m-dimensional Cube ($c_i$), and we want to obtain an output n-dimensional Cube ($c_o$). Since we defined a Cube as a function (see definition ), operations must transform a function into another function. Operations in the first column work on the image of the function (i.e. Cell), while operations in the second column change its domain (i.e. Base). As depicted in figure , we have three families of functions (i.e. f, g, and h) that can be used to transform a Cube. Obtaining $c_o$ from $c_i$ can be seen as mathematical composition of functions ($c_o = \psi \circ c_i \circ \varphi$), with $\psi$ and $\varphi$ belonging to the families of functions g and f, respectively. Firstly, consider ChangeBase: given $c_i$ and a function $\varphi$ belonging to a family of functions f between the finite spaces defined by the cartesian product of Levels of each Cube, we obtain a new Cube $c_o = c_i \circ \varphi$. Drill-across, in contrast, does change the Cell. Thus, it works in the opposite way, in the sense that it needs a Cube $c_i$ and a function $\psi$ belonging to a family of functions g, from a Cell to another Cell, to obtain the new Cube $c_o = \psi \circ c_i$.

Unfortunately, it is not possible to define all operations in such a way. Roll-up changes the space as well as the Cell. Thus, obtaining it as a composition of functions is not possible, because a coordinate in the space of $c_o$ corresponds to several points in $c_i$. Therefore, there is no function whose composition with $c_i$ yields $c_o$. Neither can it be defined as a homomorphism like those in FM, because the problem is not the conversion of a set of instances into one instance (which is always performed by union), but deciding which is the set of instances to be converted (defined by a function of family h).

Drill-across: This operation changes the image set of the Cube by means of an injective function $\psi$ of the family g (relationships in section can be used for this purpose). The space remains exactly the same, only the cells placed in it change. This function relates instances of a Fact to instances of another one.

$$\psi : C^i_c \to C^o_c, \quad \text{injective}$$

$$c_o(x) = \text{Drill-across}_{\psi}(c_i) = \psi(c_i(x))$$

Projection: This just selects a subset of Measures from those available in the selected Cell. Since it works at the attribute level, it is absolutely equivalent to the homonymous operation in Relational algebra.

$$c_o(x) = \text{Projection}_{m_1, \ldots, m_k}(c_i) = c_i(x)[m_1, \ldots, m_k]$$

ChangeBase: This operation reallocates exactly the same Cell in a new space. It changes the domain set of the Cube by means of an injective function $\varphi$ of the family f (i.e. it relates points in an n-dimensional finite space to points in an m-dimensional finite space). Thus, it actually modifies the analysis dimensions used.

$$\varphi : L^o_1 \times \cdots \times L^o_n \to L^i_1 \times \cdots \times L^i_m, \quad \text{injective}$$

$$c_o(x) = \text{ChangeBase}_{\varphi}(c_i) = c_i(\varphi(x))$$

Roll-up: It groups cells in the Cube based on an aggregation hierarchy. This operation modifies the granularity of data by means of an exhaustive function $\varphi$ of the family h (i.e. it relates instances of two Levels in the same Dimension, corresponding to a part-whole relationship). It reduces the number of cells, but not the number of Dimensions.

$$c_o(x) = \text{Roll-up}_{\varphi}(c_i) = \bigcup_{\varphi(y) = x} c_i(y)$$

Dice: By means of a predicate P over Descriptors, this operation allows choosing the subset of points of interest out of the whole n-dimensional space. Like Projection, it is absolutely equivalent to an operation of Relational algebra. In this case, the operation is Selection.

$$c_o(x) = \text{Dice}_{P}(c_i) = \begin{cases} c_i(x) & \text{if } P(x) \\ \text{undef} & \text{if } \neg P(x) \end{cases}$$

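With this set of operations we can derive Slice, which reduces the dimensionality of the original Cube by fixing a point in a Dimension. This is obtained by means of Dice and ChangeBase operations:

$$c_o(x) = \text{Slice}_{L_i = k}(c_i) = \text{ChangeBase}_{L_1 \times \cdots \times L_{i-1} \times L_{i+1} \times \cdots \times L_n}(\text{Dice}_{L_i = k}(c_i))$$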
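The operations above can be sketched in Python by representing a Cube as a dictionary from points of its Base to cells. This is only an illustrative encoding under our own assumptions: the sample data, the function names and the choice of returning the grouped cells as a list in `roll_up` (without applying any summarization) are not part of the model.

```python
# A Cube as a dict: point of the Base (tuple) -> cell (dict of Measures).
# Hypothetical instances over the Base (Clerk, Day, Product).
CUBE = {
    ("ann", "2001-08-13", "candy"): {"income": 3.0, "units": 1},
    ("bob", "2001-08-13", "toy"): {"income": 9.0, "units": 2},
    ("ann", "2001-08-14", "candy"): {"income": 6.0, "units": 2},
}


def drill_across(cube, psi):
    """Same space; every cell is replaced by its image through psi."""
    return {x: psi(cell) for x, cell in cube.items()}


def projection(cube, measures):
    """Keep only the chosen Measures of every cell."""
    return {x: {m: cell[m] for m in measures} for x, cell in cube.items()}


def change_base(cube, phi, new_points):
    """Place the same cells in a new space: c_o(x) = c_i(phi(x))."""
    return {x: cube[phi(x)] for x in new_points if phi(x) in cube}


def roll_up(cube, phi):
    """Group the cells c_i(y) under the coarser point phi(y)."""
    rolled = {}
    for y, cell in cube.items():
        rolled.setdefault(phi(y), []).append(cell)
    return rolled


def dice(cube, predicate):
    """Keep the points satisfying the predicate; the rest are undefined."""
    return {x: cell for x, cell in cube.items() if predicate(x)}


def slice_cube(cube, axis, value):
    """Slice: Dice on one coordinate, then ChangeBase to drop it."""
    diced = dice(cube, lambda x: x[axis] == value)
    new_points = [x[:axis] + x[axis + 1:] for x in diced]
    return change_base(diced, lambda x: x[:axis] + (value,) + x[axis:], new_points)


by_month = roll_up(CUBE, lambda y: (y[0], y[1][:7], y[2]))  # Day -> Month
ann_only = slice_cube(CUBE, 0, "ann")
incomes = projection(CUBE, ["income"])
```

In this encoding `slice_cube` is literally the composition of `dice` and `change_base`, which mirrors the derivation of Slice given above.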

Looking at the empty cell in table , it is clear that there is another operation missing, which would allow selecting the Cell we want to query in the same way we choose Measures or Facts. However, the specific Cell we analyze cannot be selected by itself, but is absolutely determined by the selected aggregation levels in every Dimension. Moreover, Drill-down (i.e. the inverse of Roll-up) is not defined either because, as argued in HS, we can only apply it if we previously performed a Roll-up and did not lose the correspondences between cells. This can be expressed as an undo of Roll-up or, if we do not want to keep track of results, by means of views over the atomic data, as in Vas. Drill-down would be really useful to Dice at a higher level of aggregation than the result data.

If we want to know the production cost of every product sold under a given promotion, by month and plant, we should perform the following operations over our AtomicSale schema: Dice to select promotion A; Drill-across to Production Fact; Projection to see just the desired Measure (cost); Roll-up to obtain data at Month Level (notice that the summarization operation is not made explicit, because a YAM schema shows how a given KindOfMeasure must be summarized along each Dimension); and finally ChangeBase to choose the appropriate n-dimensional space to place data.

$$\text{ChangeBase}_{Month \times Plant \times Product}(\text{Roll-up}_{Month}(\text{Projection}_{cost}(\text{Drill-across}_{Production}(\text{Dice}_{Promotion = A}(AtomicSale)))))$$

Property: The cube algebra composed by these operations is closed, i.e. they operate on Cubes and the result of all operations is always a Cube.

Proof: Being closed seems clear for ChangeBase and Drill-across (since composition of functions is always a function, and all functions in these operations are injective), as well as for Projection and Selection, i.e. Dice (since the former only removes attributes from the image, and the latter removes points from the domain of the function). Therefore, all these operations result in an injective function from a cartesian product of Levels to a Cell. In the case of Roll-up, let $\varphi$ be the function of family h it uses. $\varphi$ being exhaustive implies that the multidimensional operation defines a function over a Cell (if there is at least one y for every x so that $\varphi(y) = x$, then $c_o$ will be defined for every x). Moreover, $\varphi$ being a function means the result is injective (if y has only one image, then $c_i(y)$ will only belong to one $c_o(x)$, resulting in different images for every x).

Property: The cube algebra composed by these operations is complete, i.e. any valid Cube can be computed as the combination of a finite set of operations.

Proof: Being complete is also true, since if there is an FD between two Cubes in the closure of FDs, there is a sequence of operations that allows obtaining one from the other. We can change the left hand side of the function defining a Cube (i.e. the cartesian product of Levels) in two ways: the domain by means of ChangeBase, and its elements by means of Dice. As can be seen in definition , the right hand side of that function (a Cell) is defined by two characteristics: a subject, which can be changed by Drill-across, and an aggregation level, which can be changed by Roll-up. Attributes inside the Class can be selected by means of Projection.

Property: The cube algebra composed by these operations is minimal, i.e. none can be expressed in terms of others, nor can any be dropped without affecting their functionality, and the operations are atomic, i.e. each operation performs exactly one task.

Proof: This can be easily inferred from the explanation of each operation above and table . Each operation works inside only one detail level. Moreover, they work either on factual or dimensional data. Roll-up could be thought of as working on both sides; however, it really operates only on factual data, based on dimensional data.

We could also compare this set of operations with those three in Vasa (i.e. Navigate, Selection, and Split measure). Selection and Split measure are absolutely equivalent to Dice and Projection, respectively. Regarding Navigate, it could be obtained by means of Roll-up and its corresponding undo (Navigate always operates on a base cube, so that atomic data can be used to Drill-down). Drill-across and ChangeBase have no counterpart. They work on semantic relationships between different Stars, and functional dependencies between Levels in different Dimensions associated to a Fact, and were not treated as first class citizens in any other multidimensional model before.

These operations allow building Cubes on solid mathematical foundations. Semantic relationships in the multidimensional schema define functions between Classes. By composing those functions appropriately, we can obtain the desired vision of data. If we want to analyze instances of a given Class in the space defined by the cartesian product of a set of Classes, all we have to do is find the appropriate composition of functions. If that chain of functions exists, we can analyze data in the desired way.

Thus, properties of mathematical functions can be applied. For instance:

Similar to operations between functions ($(f \;\mathrm{op}\; g)(x) = f(x) \;\mathrm{op}\; g(x)$), we can also define operations between Cubes, if both are defined over the same domain (n-dimensional space):

$$(c_1 \;\mathrm{op}\; c_2)(x) = c_1(x) \;\mathrm{op}\; c_2(x)$$

If the operation is defined over the image of the Cubes, it is defined over Cubes. Thus, Union and Intersection of Cubes can be easily defined, as it is defined for cells.

Two functions over different domains can define a function over the union of the domains. This means that Cubes defined over subclasses can give rise to a broader Cube over a superclass. For instance, if predicates $P_1, \ldots, P_n$ define a specialization:

$$c_o(x) = \begin{cases} c_1(x) & \text{if } P_1(x) \\ \;\vdots \\ c_n(x) & \text{if } P_n(x) \end{cases}$$

This invites parallelizing the calculation of the cube $c_o$:

$$(f \;\mathrm{op}\; c_o)(x) = \begin{cases} (f \;\mathrm{op}\; c_1)(x) & \text{if } P_1(x) \\ \;\vdots \\ (f \;\mathrm{op}\; c_n)(x) & \text{if } P_n(x) \end{cases}$$

If the result of operations is not a function, it is not a Cube. Therefore, by validating whether a sequence of relationships is a function, we can validate the existence of Cubes.
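These properties can be tried out with a small Python sketch in which a Cube is, again, a dictionary from points to cells. The function names, the sample Cubes and the use of plain numbers as cells are assumptions made for illustration.

```python
import operator


def pointwise(op, c1, c2):
    """(c1 op c2)(x) = c1(x) op c2(x), for Cubes over the same domain."""
    if c1.keys() != c2.keys():
        raise ValueError("both Cubes must be defined over the same space")
    return {x: op(c1[x], c2[x]) for x in c1}


def union_of_domains(*cubes):
    """Merge Cubes defined over pairwise disjoint subsets of one space."""
    merged = {}
    for cube in cubes:
        clash = merged.keys() & cube.keys()
        if clash:
            # Two images for one point: the result would not be a function.
            raise ValueError(f"points defined twice: {sorted(clash)}")
        merged.update(cube)
    return merged


def map_cells(f, cube):
    """Apply f to the image of the Cube, point by point."""
    return {x: f(cell) for x, cell in cube.items()}


# Hypothetical Cubes over (Store, Month) with a single numeric Measure.
income = {("s1", "jan"): 10, ("s2", "jan"): 7}
cost = {("s1", "jan"): 4, ("s2", "jan"): 5}
margin = pointwise(operator.sub, income, cost)

south = {("s1", "jan"): 10}
north = {("n1", "jan"): 8}
whole = union_of_domains(south, north)


def double(value):
    return 2 * value


piecewise = union_of_domains(map_cells(double, south), map_cells(double, north))
```

Computing `double` over each sub-Cube and merging the results gives the same Cube as computing it over `whole`, which is what makes the per-specialization calculation parallelizable.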

Metaclasses

[Figure: UML class diagram of the YAM metaclasses. Upper Level: MultidimensionalSchema, Star, Fact, Dimension. Intermediate Level: Cube, Base, Cell (FundamentalCell, /SummarizedCell), /CellRelation, Level, LevelRelation. Lower Level: Measure (FundamentalMeasure, /SummarizedMeasure), Descriptor, KindOfMeasure, Summarization (Transitive, NonTransitive), SummaryParam and List, with the InvalidSource and Induce {partially ordered} associations. A leading "/" marks a derived element.]

Figure: YAM metaclasses in UML notation, as in OMGb

As it was defined in Inm, a DW is a subject-oriented set of data. When analysts want to study a given subject, they want to see together all data regarding it. Thus, a subject-oriented model, where all Classes related to a subject are shown together in the multidimensional schema, is proposed. For this purpose, YAM uses the Upper detail level which, as depicted in figure , shows that a Star is composed by one Fact and several Dimensions. Subject-oriented does not imply subject-isolated. Therefore, relationships between different Stars will exist, as it was shown in section .

At the Intermediate detail level, we can see that Dimensions are composed by Levels related by LevelRelations representing part-whole relationships. Hence, a Dimension is a lattice stating how measured data can be aggregated. On the other hand, we see that a Fact is composed by a set of Cells. Each of those Cells is defined at an aggregation level for each of the analysis dimensions of its Fact. If there is a Level $l_1$ whose elements are obtained by grouping those of another Level $l_2$ at which a Cell $c_2$ is defined, then we have another Cell $c_1$ related to $l_1$, whose instances are composed by those of $c_2$. Cells $c_1$ and $c_2$ are related by a CellRelation, which corresponds to the LevelRelation between $l_1$ and $l_2$. A set of functionally independent Levels forms a Base, and the pair Base-Cell (where the Base fully determines instances of the Cell) is a Cube.

Some data must be physically stored, while other data will or could be derived. In the same way, some model elements must be made explicit in the schema, while others (for instance CellRelation) can be derived. In this sense, those Cells that need to be made explicit (i.e. FundamentalCells) are distinguished from those that do not (i.e. SummarizedCells) because all data they contain can be derived.

At Lower detail level, we can see information regarding the attributes of the concepts we are representing. The Levels contain Descriptors and the Cells contain Measures. SummarizedCells only contain data that can be derived (i.e. SummarizedMeasures). They are shown in the schema to outline the importance of the Cell (they are first class candidates to be precalculated). On the other hand, FundamentalCells can contain derived and not derived data (they must be physically stored). SummarizedMeasures are obtained from other Measures, while FundamentalMeasures are not. Notice that it is possible to obtain one Measure from more than one supplier (for instance, to be able to weigh an average).

Every Dimension induces a Summarization over a given KindOfMeasure. In general, SummarizedMeasures are obtained by sum of others. However, this is not always the case (product, minimum, maximum, average, or any other operation could be used). It depends on the KindOfMeasure and the Dimension along which we are summarizing (LS studies the influence of the temporal dimension on three different kinds of attributes). Thus, when we want to obtain a SummarizedMeasure in a Cell $c_1$ from a Measure in another Cell $c_2$, the Summarization performed is that induced by the Dimension that contains the LevelRelation to which the CellRelation between $c_1$ and $c_2$ corresponds.

Summarizations over a KindOfMeasure are partially ordered, to state that some must be performed before others. Moreover, some data at an aggregation level could be an invalid source to summarize some KindOfMeasures, which is also captured in a YAM schema. A summarization operation being non-transitive implies that any summarization that uses it must be done from the atomic data.
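A rough idea of how these lower-level metaclasses interact can be given with Python dataclasses. The class and attribute names follow the metaclasses, but the method `can_reuse`, the dictionary-based encoding of the Induce association and the sample instances are assumptions of this sketch; the partial order among Summarizations is left out.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class Summarization:
    name: str
    func: Callable
    transitive: bool


@dataclass
class KindOfMeasure:
    name: str
    # Induce: Dimension name -> Summarization used along that Dimension.
    induced: Dict[str, Summarization] = field(default_factory=dict)
    # InvalidSource: names of Levels that cannot feed coarser aggregates.
    invalid_sources: Set[str] = field(default_factory=set)

    def can_reuse(self, dimension: str, source_level: str) -> bool:
        """May data stored at `source_level` be summarized further?

        Hypothetical helper: it requires a transitive Summarization along
        the Dimension and a source Level not marked as InvalidSource.
        """
        summarization = self.induced[dimension]
        return summarization.transitive and source_level not in self.invalid_sources


SUM = Summarization("sum", sum, transitive=True)
AVG = Summarization("avg", mean, transitive=False)

income = KindOfMeasure("Income", {"Time": SUM, "Product": SUM}, {"Kind"})
income_average = KindOfMeasure("IncomeAverage", {"Time": AVG, "Product": AVG})
```

With these instances, Income stored at Day can be summed further along Time, whereas neither Income stored at Kind nor any precalculated IncomeAverage can be reused, so the atomic data are needed.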

Figure shows how all these multidimensional concepts perfectly fit into UML. A Star is a Package that contains a subject of analysis. Facts and Dimensions are Classifiers containing Classes (i.e. Cells and Levels, respectively). Finally, Measure and Descriptor are just Attributes of the Classes. All other elements in YAM have also been placed as Stereotypes of some UML concept (the formal definition of these Stereotypes is in appendix A). Maybe the most relevant ones are CellRelation and LevelRelation, which are Aggregations. Moreover, a Base is just a Constraint stating that a set of functionally independent Levels fully determine instances of a Cell.

[Figure: UML metamodel elements (ModelElement, Parameter, Attribute, Operation, Namespace, GeneralizableElement, Relationship, Constraint, Classifier, Package, Association, Class, DataType, Model) with the YAM stereotypes attached to them: SummaryParam, Measure, Descriptor, Summarization, Base, Fact, Dimension, Star, CellRelation, LevelRelation, Level, Cell, KindOfMeasure, List and MultidimensionalSchema.]

Figure: Extension of UML with YAM stereotypes

This shows that multidimensional modeling is just a specialization of general data modeling. We could roughly say that all we are doing is splitting elements in the model based on whether they refer to factual or dimensional data. It can be seen that some specific concepts are defined, besides properties and constraints of the new structures. LST claims that ER provides the complete functionality and support necessary for OLAP applications. Here we can see that UML also provides such support. However, it is well known that the more specific the Classes in a schema are, the better they represent reality. In the same way, the more specific our data model is, the better it will represent reality. Therefore, in multidimensional modeling it is important to show Facts, Dimensions, Cells, Levels, Measures, and Descriptors, instead of just Classes, Classifiers, and Attributes.

Comparison with other multidimensional models

Some OO multidimensional models have already been defined, and some of them used UML syntax to do it. However, this is the first extension of UML for multidimensional modeling. As previously said, CWM does extend UML. Nevertheless, it is not a multidimensional data model, but a metadata standard for data warehousing.

In [BSHD], a list of requirements for a multidimensional model to be suitable for OLAP was derived from general design principles and from characteristics of OLAP applications. [Ped] also presents eleven requirements for multidimensional data models, found in clinical data warehousing. [Vasa] gave yet another classification of logical cube models, which are not considered here because YAM is at conceptual level. These comparisons are



YAM Yet Another Multidimensional Model

Metamodel Structures Constraints Operations
Reference
p p p p p
Kim NL Rel p p p Headers p
p p p p p
LW Maths Rel p p A Relations
p p p p
AGS Maths p p p A Cubes
p p p p p
HS Maths DL p p p A Cubes
p p p p p p p
GL Maths Rel AC Cubes
p p p p p p p
DT Maths p A Cubes
p p p p p p p
CTa Maths AC Cubes
p p p
Leh NL A Cubes
p p p p p p
GMRb NL p QL Cubes p
p p p p p p p p p
TPGS NL OO p p p p p Cubes
p p p p p p
SBHD ER ER p
p p p p p p p
TBC NL ER p p p p
p p p p p p
BTW UML Cubes
p p
Vasa Maths A Cubes
p p p p p p p p p p p
Ped Maths p p A Cubes
p p p p p p p p p p
TKS NL
p p p p p p p p p p p p p p p p p
YAM UML UML p A Cubes p

Tick = Supported in the model
Hyphen = Not supported or not explained how to support it
p = Partially supported
NL = Natural Language
Rel = Relational
DL = Description Logics
ER = Entity-Relationship
UML = Unified Modeling Language
OO = Object-Oriented paradigm
QL = Query Language
A = Algebra
C = Calculus

Table: Comparison between YAM and other multidimensional models

reviewed elsewhere in this thesis. In the next pages, the items used in the comparison of models summarized in the table (most of them taken from those papers) are briefly explained.

Language used to define the model. This column shows the language mainly used by every multidimensional model to express its metaschema.

Extended framework. Some models redefine or extend concepts in other, more general models or design frameworks, which is reflected in this column. Although [TP] uses UML notation, it does not extend UML, because neither stereotypes, properties, nor constraints (i.e. the extension mechanisms of UML) are used in defining the multidimensional model.

Explicit separation of structure and contents (from [BSHD]). The data structure should be represented in the schema, while the contents should correspond to instances.

Explicit aggregation hierarchies (from [BSHD] and [Ped]). The model should show how data can be successively aggregated along analysis dimensions.

Multiple hierarchies in each Dimension (from [Ped]). Although aggregation hierarchies can be linear, most dimensions show multiple aggregation paths, so this should also be allowed.

Dimension attributes (from [BSHD]). Showing other characteristics of the analysis dimensions that do not define hierarchies should also be possible.

Measures sets (from [BSHD]). This refers to the possibility of defining complex Cell structures, grouping more than one Measure related to the same Fact. Support provided by [AGS] is considered partial because, although it allows managing tuples of measurements, they do not have any extra meaning as a whole.

Measures at different levels of granularity. Measurements could be taken at different aggregation levels. If so, Measures belonging to the same Fact, or even showing the same kind of measure, should be related in some way. [Ped] proposes a comparison item slightly similar to this. However, it is stated as having exactly the same kind of measure being measured at different aggregation levels, so that sometimes it should be stored in one Cell and other times in a different one. It would be solved in YAM by specializing the Cells depending on whether the Measure is derived or not.

Descriptions and measurements are treated symmetrically (from [BSHD] and [Ped]). The data model should allow Facts to be treated as Dimensions and vice versa. YAM allows the usage of measurements as descriptors for other measurements by means of derivation mechanisms.

Multistar schemas. Users should not be restricted to an isolated subject. They need to see several Facts in one schema. Sharing Dimensions, as in [Kim], is not enough, since richer semantic relationships can be used.

Generalization relationships. Generalizations should be shown between Dimensions and Facts, either in the same Star or in different Stars, as well as between Levels and Cells.

Association relationships. Representing Associations should be allowed between Dimensions and Facts, either in the same Star or in different Stars, as well as between Levels and Cells.

Change and time (from [Ped]). Although the business being reflected in the schema changes, it should be possible to compare data over time.

Derived elements (from [BSHD]). The definition of concepts by means of other concepts should be part of the schema.

Imprecision (from [Ped]). The problem of representing and querying imprecise data has not been tackled in YAM.

Non-onto hierarchies (from [Ped]). That is, hierarchies with paths of different lengths from the root to the leaves should be represented. YAM does not fulfill this point, because every object in an aggregation level must have the same structure (i.e. the Class structure). Thus, it is not possible that some instances of a Class can be divided into parts while others cannot. If so, the Class should be specialized in some way.

Non-covering hierarchies (from [Ped]). That is, hierarchies where there exist relationships between instances of Levels that are not directly related. This does not need to be supported in this model because, if those relationships really exist, they should be explicitly represented in the schema by a part-whole relationship between the corresponding Levels.




Many-to-many relationships between two Levels (from [Ped]). Some models just mention the possibility of having this kind of relationships (i.e. [AGS], [HS], and [DT]).

Many-to-many relationships between Fact and Dimension (from [Ped]). There is no constraint forbidding this in YAM. Just as these relationships are allowed in UML, so they are in YAM. However, we can always see it as the fact being related to one set of elements in the Dimension, so that we obtain a to-one relationship with a new Dimension of sets of elements. [SRME] analyzes different implementations of these relationships.

Additivity semantics (from [BSHD] and [Ped]). Multidimensional models should show how a concept is obtained (if it can be) at coarser granularities, and which aggregation functions can be applied to a given Measure in order to obtain the same KindOfMeasure at higher aggregation levels. Some multidimensional models, like [GMRb] or [TP], show possible functions that can be applied to a Measure. However, they do not show the specific operation that keeps the meaning of the measurement at coarser aggregation levels.

Identification of facts. The model should show how the different data subject of analysis can be identified by means of dimensional data. Most models just show the aggregation levels at which data are taken, but they do not show the functional dependency that fully determines the measurements. [Vasa] mentions that the data set in a cube is a set of tuples that contains a primary key. However, it is not reflected by his model in any way.

Mathematical construct used for the operations (from [BSHD]). This column shows the mathematical formalism used in the models to define the operations over data.

Elements over which operations are defined.

Queries using ad hoc hierarchies not included in the schema (from [BSHD]). In order to roll data up, a function showing the correspondence between Levels is necessary. If that function is not in the schema, where is it? YAM allows defining specific star schemas for every user profile. Thus, ad hoc hierarchies for ad hoc queries can be defined there.

User defined aggregation functions (from [BSHD]). As any operation can be defined in a UML schema, YAM supports it.

Drill-across. Some models allow drilling across if the Stars share analysis dimensions. However, we can find semantic relationships that also allow it.
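The item above on many-to-many relationships between Fact and Dimension argues that such a relationship can always be seen as a to-one relationship with a new Dimension of sets. This can be sketched as follows, under the assumption that the relationship is available as (fact, member) pairs. The data and the function name `to_set_dimension` are hypothetical, and this is only one possible reading, not one of the implementations analyzed in [SRME].

```python
# Hypothetical many-to-many data: each fact is related to several members
# of the same Dimension.
fact_members = [
    ("visit1", "flu"),
    ("visit1", "asthma"),
    ("visit2", "flu"),
    ("visit3", "asthma"),
    ("visit3", "flu"),
]

def to_set_dimension(pairs):
    """Map each fact to exactly one element of a Dimension of sets."""
    grouped = {}
    for fact, member in pairs:
        grouped.setdefault(fact, set()).add(member)
    # frozenset is hashable, so each set can act as a Dimension instance.
    return {fact: frozenset(members) for fact, members in grouped.items()}

fact_to_set = to_set_dimension(fact_members)

# Instances of the new Dimension: the distinct sets of members.
set_dimension = set(fact_to_set.values())
print(len(fact_to_set), len(set_dimension))
```

Every fact now points to a single instance of the new Dimension, and facts related to the same members share that instance.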

Conclusions

In the last years, a lot of work has been devoted to OLAP technology in general, and multidimensional modeling in particular. However, there is no well accepted model yet. Moreover, in spite of the acceptance of the OO paradigm, only a couple of efforts take it into account for conceptual modeling.



In this chapter, YAM, a multidimensional conceptual model which allows the usage of semantic OO relationships between different Stars, has been presented. This model has been defined as an extension of UML to make it much more understandable and avoid its definition from scratch. As a side effect, this shows that multidimensional modeling is just a special case of data modeling.

Structures in the model have been defined by means of metaclasses, which are specializations of UML metaclasses. Thus, possible relationships among multidimensional elements have been systematically studied in terms of UML relationships among its elements, so that they allow showing semantically rich multistar schemas. The inherent integrity constraints of the model pay special attention to identification of data and summarizability, providing much more flexibility than those of previous multidimensional models.

It could be argued that all those semantic relationships and integrity constraints make YAM too complex or even cumbersome. However, we could find CASE tools to ease the designer's work. As shown in appendix A, standard extension mechanisms of UML have been used to define YAM constructs. Therefore, any CASE tool following the UML standard could easily be adapted for multidimensional design.

Finally, a set of well-known multidimensional operations has been explained in this framework by means of functions. Understanding operations over Cubes as operations over mathematical functions would allow applying work done in that field to multidimensional query processing. This set has been shown as a closed and complete algebra for Cubes. Specifically, an operation to change the Dimensions of a Cube (i.e. ChangeBase) has been defined. Thus, by having candidate Bases and this operation, the most appropriate representation of data can be selected in every situation.
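The view of a Cube as a function can be sketched with a dictionary from coordinates to measurements. The function below is only an illustration of the idea behind ChangeBase, assuming that the correspondence between the two candidate Bases is given explicitly. It is not the formal definition of the operation, and all names and data are hypothetical.

```python
def change_base(cube, mapping):
    """Re-index `cube` (coordinates -> value) through `mapping`.

    `mapping` relates each coordinate tuple in the old Base to a coordinate
    tuple in the new one. It must be injective, so that the new coordinates
    still identify every cell.
    """
    new_cube = {}
    for coords, value in cube.items():
        new_coords = mapping[coords]
        if new_coords in new_cube:
            raise ValueError("new coordinates do not identify the cells")
        new_cube[new_coords] = value
    return new_cube

# Hypothetical Cube identified by order number.
cube = {("o1",): 100, ("o2",): 250}

# Alternative candidate Base: (customer, date) also identifies each cell.
order_to_customer_date = {
    ("o1",): ("alice", "day1"),
    ("o2",): ("bob", "day1"),
}

rebased = change_base(cube, order_to_customer_date)
print(rebased)
```

The measurements are untouched. Only the Dimensions used to reach them change, which is what allows choosing the most convenient representation.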


Chapter

Conclusions

... before I could come to any conclusion it occurred to me that my speech or my silence, indeed any action of mine, would be a mere futility. What did it matter what anyone knew or ignored? What did it matter who was manager? One gets sometimes such a flash of insight. The essentials of this affair lay deep under the surface, beyond my reach, and beyond my power of meddling.

Joseph Conrad

The aim of this last chapter is to outline the main contributions of this thesis. Moreover, some research lines continuing this work are indicated in the Future work section.

Survey of results



The main contribution of this thesis is YAM. However, other results were obtained on the way to it that should also be stressed here.

Firstly, the different Data Warehousing schemas were placed in the framework of Federated Information Systems (FIS). This allowed studying them from a different point of view and understanding their usefulness much better. Leaving aside the importance of performance, and paying special attention to conceptual design, a seven-layer architecture of conceptual schemas was proposed to integrate the DW in a FIS. From that architecture of schemas, and the benefits of OO data models as canonical models for federations, bloomed the idea of using OO concepts in designing the DW. Out of the different problems that arose in that context, it was decided to delve into the improvement of multidimensional data models.

Multidimensionality as such was not born in the research community, but as a response of tool vendors to the demand of analysts. Thus, there was not a strong mathematical foundation for it, like that of Relational databases. Concepts were not clearly stated, and most efforts were devoted to improving performance and presentation. In the last years, it captured the attention of researchers, and data models have appeared without a standard, or even well accepted, nomenclature. Therefore, another important milestone in this work was to define a framework that allowed classifying and comparing all previous work. Six different elements were identified in multidimensional modeling literature. A few models used all of them, others only a subset. However, all constructs in these models fit into the classification grid of six holes presented in this thesis.

The six elements in the multidimensional models were clearly divided by the fact-dimension duality. Thus, they have been carefully studied separately. Firstly, some authors already pointed out that dimensional relationships between aggregation levels were part-whole relationships. However, most papers just referred to them as roll-up relationships. Anyway, properties of aggregation hierarchies were always imposed, never deduced. Thus, this thesis shows how they can be demonstrated from mereology axioms. Regarding factual data, they used to be considered as pure numbers that can be operated in some way. However, it is also shown in this thesis that we can consider them as a commutative semigroup with union, which allows defining classes based on subjects and aggregation levels, and which can be placed in n-dimensional spaces defined by functional dependencies from dimensional data.

All that study was done with the OO paradigm in mind. The ultimate goal was to benefit from its semantic relationships to improve multidimensional modeling. Thus, the first problem was that most OLAP tools consider semantically poor, isolated star-shaped schemas. To solve this, an architecture of schemas was proposed, so that we could have semantically related stars while they can still be easily implemented.

Finally, semantic relationships between multidimensional elements were studied. For this purpose, the relationship constructs offered by the UML standard were analyzed. The usefulness of each and every relationship was exemplified, and their consequences in the data structures shown.

Data structures, integrity constraints, and operations had to be defined in order to have a true data model. For the structures, it was chosen to extend UML by means of the mechanisms it offers (i.e. Stereotypes). Regarding integrity constraints, due to the importance of aggregation, they pay special attention to it. Moreover, another forgotten point was also considered here: identification of multidimensional data. Lastly, since data cubes were found to be functions, a closed and complete algebra of multidimensional operations was defined in terms of mathematical functions.

Future work

This thesis work can be continued following several different research lines. It can be related to other areas like database security, temporal issues, query optimization, translation to the logical/physical level, and methodologies, or it can just keep on studying modeling problems at conceptual level.

[Oli] has studied the problems of integrating the security layers of different component databases in a federation. The architecture of seven levels of schemas shows that this work can also be used for Data Warehousing. Nevertheless, its application is not automatic. New problems arise, mainly due to materialization of data (for instance, polyinstantiation).


As already said, the schema of the DW has characteristics of a bitemporal database. Thus, it should be studied how this area can benefit from advances in temporal databases. Moreover, schema evolution, as well as data evolution, in the analysis dimensions should also be studied in detail.

A semantically rich schema is really useful to help users understand data. However, it would be only half used if not taken into account for query optimization. Semantic optimization should be considered, especially for drilling across, as a future subject of study. Moreover, the mathematical theory of functions could also be used in query optimization, because YAM operations work on functions.

An essential issue not tackled in this thesis is the study of a methodology for schema definition. Patterns should be detected in the data-oriented DW schema in order to be translated in some way to the query-oriented DM schema. Multidimensional structures should be identified and captured from a non-dimensional schema. Moreover, the definition of multidimensional views should also be studied in order to support symmetric usage of factual and dimensional data, as well as ad hoc hierarchies.

Once the multidimensional conceptual schema has been defined, it is necessary to implement it. It is well known that several options are available at this point. The implementation can be done on a Relational DBMS, an OO DBMS, or a pure multidimensional DBMS. Thus, either the implementation of YAM on a MOLAP tool or its translation to Relational or ODMG standards should be taken into consideration. Performance issues could be considered at this point (for instance, the sparsity of cubes).

These two tasks, more closely related to modeling issues (i.e. defining a methodology and implementing multidimensional schemas on a given kind of DBMS), would be closely related to the implementation of a CASE tool. Two possibilities can be considered at this point. Firstly, any CASE tool following the UML standard could be extended to support YAM constructs. The other option would be to build a new tool from scratch, which would allow the implementation of more specific features. Both strategies look promising.

Finally, several conceptual multidimensional modeling problems are still open. As can be read in [AFGP], Aggregation and Association relationships are closely related. Namely, there are some properties that the whole inherits from its parts (e.g. being defective), others that the parts inherit from the whole they are part of (e.g. location), and some properties in the parts which are systematically related to properties of the whole (e.g. weight of parts being less than weight of the whole). The implications on aggregability, as well as the inheritance of properties between parts and wholes, are another interesting research line to follow.

Bibliography


[AFGP] Alessandro Artale, Enrico Franconi, Nicola Guarino, and Luca Pazzi. Part-Whole relations in Object-centered systems: an overview. Data and Knowledge Engineering (DKE).

[AGS] Rakesh Agrawal, Ashish Gupta, and Sunita Sarawagi. Modeling Multidimensional Databases. In Proceedings of the International Conference on Data Engineering (ICDE). IEEE Computer Society.

[AHV] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley.

[AORS] Alberto Abelló, Marta Oliva, Elena Rodríguez, and Fèlix Saltor. The syntax of BLOOM schemas. Technical Report LSIR, Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya.

[AR] Alberto Abelló and Elena Rodríguez. Describing BLOOM with regard to UML Semantics. In Proceedings of the V Jornadas de Ingeniería del Software y Bases de Datos (JISBD). Gráficas Andrés Martín.

[BCN] Carlo Batini, Stefano Ceri, and Shamkant B. Navathe. Conceptual Database Design: an Entity-Relationship Approach. Benjamin/Cummings.

[BFG] Elisa Bertino, Elena Ferrari, and Giovanna Guerrini. T Chimera: A Temporal Object-Oriented Data Model. Theory and Practice of Object Systems.

[BFJ] Thomas Burns, Elizabeth N. Fong, David Jefferson, Richard Knox, Leo Mark, Christopher Reedy, Louis Reich, Nick Roussopoulos, and Walter Truszkowski. Reference Model for DBMS Standardization, Database Architecture Framework Task Group (DAFTG) of the ANSI/X3/SPARC Database System Study Group. SIGMOD Record.

[BHL] Andreas Bauer, Wolfgang Hümmer, and Wolfgang Lehner. An Alternative Relational OLAP Modeling Approach. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery (DaWaK), LNCS. Springer.


[BPT] Elena Baralis, Stefano Paraboschi, and Ernest Teniente. Materialized views selection in a multidimensional database. In Proceedings of the International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann.

[BSH] Jan W. Buzydlowski, Il-Yeol Song, and Lewis Hassell. A Framework for Object-Oriented On-line Analytical Processing. In Proceedings of the International Workshop on Data Warehousing and OLAP (DOLAP). ACM.

[BSHD] Markus Blaschka, Carsten Sapia, Gabriele Höfling, and Barbara Dinter. Finding your way through multidimensional data models. In Proceedings of the International Workshop on Database and Expert Systems Applications (DEXA). IEEE Computer Society.

[BTW] Nguyen Thanh Binh, A Min Tjoa, and Roland R. Wagner. An Object Oriented Multidimensional Data Model for OLAP. In Proceedings of the International Conference on Web-Age Information Management (WAIM), LNCS. Springer.

[CCS] E. F. Codd, S. B. Codd, and C. T. Salley. Providing OLAP to user-analysts: An IT mandate. Technical report, E. F. Codd & Associates.

[CD] Surajit Chaudhuri and Umeshwar Dayal. An overview of data warehousing and OLAP technology. SIGMOD Record.

[Cod] E. F. Codd. Extending the relational model to capture more meaning. ACM Transactions on Database Systems.

[CSG] Malu Castellanos, Fèlix Saltor, and Manolo García-Solaco. A Canonical Model for the Interoperability among Object-Oriented and Relational Databases. In Distributed Object Management: Proceedings of the International Workshop on Distributed Object Management. Morgan Kaufmann.

[CT] Luca Cabibbo and Riccardo Torlone. Querying Multidimensional Databases. In Proceedings of the International Workshop on Database Programming Languages (DBPL). Springer.

[CTa] Luca Cabibbo and Riccardo Torlone. A Logical Approach to Multidimensional Databases. In Advances in Database Technology (EDBT), LNCS. Springer.

[CTb] Luca Cabibbo and Riccardo Torlone. From a Procedural to a Visual Query Language for OLAP. In Proceedings of the International Conference on Scientific and Statistical Database Management (SSDBM). IEEE Computer Society.




[DGK] Curtis E. Dyreson, Fabio Grandi, Wolfgang Käfer, Nick Kline, Nikos Lorentzos, Yannis Mitsopoulos, Angelo Montanari, Daniel Nonen, Elisa Peressi, Barbara Pernici, John F. Roddick, Nandlal L. Sarda, Maria Rita Scalas, Arie Segev, Richard T. Snodgrass, Mike D. Soo, Abdullah Tansel, Paolo Tiberio, and Gio Wiederhold. A consensus glossary of temporal database concepts. SIGMOD Record.

[DSHB] Barbara Dinter, Carsten Sapia, Gabriele Höfling, and Markus Blaschka. The OLAP Market: State of the Art and Research Issues. Journal of Computer Science and Information Management.

[DT] Anindya Datta and Helen Thomas. A Conceptual Model and an algebra for On-Line Analytical Processing in Data Warehouses. In Proceedings of the Workshop on Information Technologies and Systems (WITS).

[Dyr] Curtis E. Dyreson. Information retrieval from an incomplete data cube. In Proceedings of the International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann.

[EK] Johann Eder and Christian Koncilia. Changes of Dimension Data in Temporal Data Warehouses. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery (DaWaK), LNCS. Springer.

[EN] Ramez Elmasri and Shamkant B. Navathe. Fundamentals of Database Systems. Benjamin Cummings, third edition.

[FBSV] Enrico Franconi, Franz Baader, Ulrike Sattler, and Panos Vassiliadis. Fundamentals of Data Warehousing, chapter Multidimensional Data Models and Aggregation. Springer-Verlag. Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, and Panos Vassiliadis, editors.

[Fir] Joseph M. Firestone. Object-Oriented Data Warehousing. Technical report, Executive Information Systems, Inc. White Paper No. Five.

[FM] Leonidas Fegaras and David Maier. Towards an Effective Calculus for Object Query Languages. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM Press.

[FS] Enrico Franconi and Ulrike Sattler. A data warehouse conceptual data model for multidimensional aggregation. In Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW). CEUR-WS (http://www.ceur-ws.org).

[Gar] Stephen R. Gardner. Building the data warehouse. Communications of the ACM.


[GBLP] Jim Gray, Adam Bosworth, Andrew Layman, and Hamid Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. In Proceedings of the International Conference on Data Engineering (ICDE). IEEE Computer Society.

[GCS] Manolo García-Solaco, Malu Castellanos, and Fèlix Saltor. A Semantic-Discriminated Approach to Integration of Federated Databases. In Proceedings of the International Conference on Cooperative Information Systems (CoopIS). University of Toronto.

[Gio] William A. Giovinazzo. Object-Oriented Data Warehouse Design. Prentice Hall.

[GJJ] Michael Gebhardt, Matthias Jarke, and Stephan Jacobs. A Toolkit for Negotiation Support Interfaces to Multi-Dimensional Data. SIGMOD Record.

[GL] Marc Gyssens and Laks V. S. Lakshmanan. A Foundation for Multidimensional Databases. In Proceedings of the International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers.

[GL] Frederic Gingras and Laks V. S. Lakshmanan. nD-SQL: A multidimensional language for interoperability and OLAP. In Proceedings of the International Conference on Very Large Data Bases (VLDB).

[GLK] Vivekanand Gopalkrishnan, Qing Li, and Kamalakar Karlapalem. Star/Snow-Flake Schema Driven Object-Relational Data Warehouse Design and Query Processing Strategies. In Proceedings of the International Workshop on Data Warehousing and Knowledge Discovery (DaWaK), LNCS. Springer.

[GMRa] Matteo Golfarelli, Dario Maio, and Stefano Rizzi. Conceptual Design of Data Warehousing from E/R Schemes. In Proceedings of the Hawaii International Conference on System Sciences. IEEE Computer Society.

[GMRb] Matteo Golfarelli, Dario Maio, and Stefano Rizzi. The Dimensional Fact Model: a Conceptual Model for Data Warehouses. International Journal of Cooperative Information Systems.

[GP] Peter Gerstl and Simone Pribbenow. Midwinters, end games and body parts: A classification of part-whole relations. International Journal of Human-Computer Studies.

[GR] Matteo Golfarelli and Stefano Rizzi. A Methodological Framework for Data Warehouse Design. In Proceedings of the International Workshop on Data Warehousing and OLAP (DOLAP). ACM.


[GR] Matteo Golfarelli and Stefano Rizzi. Designing the data warehouse: key steps and crucial issues. Journal of Computer Science and Information Management.

[GSC] Manolo García-Solaco, Fèlix Saltor, and Malu Castellanos. A Structure Based Schema Integration Methodology. In Proceedings of the International Conference on Data Engineering (ICDE). IEEE Computer Society.

[HLV] Bodo Hüsemann, Jens Lechtenbörger, and Gottfried Vossen. Conceptual Data Warehouse Design. In Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW). CEUR-WS (http://www.ceur-ws.org).

[HRU] Venky Harinarayan, Anand Rajaraman, and Jeffrey D. Ullman. Implementing data cubes efficiently. SIGMOD Record.

[HS] Mohand-Saïd Hacid and Ulrike Sattler. An Object-Centered Multidimensional Data Model with Hierarchically Structured Dimensions. In Proceedings of the IEEE Knowledge and Data Engineering Exchange Workshop (KDEX). IEEE Computer Society.

[IIB] William H. Inmon, Claudia Imhoff, and Greg Battas. Building the operational data store. John Wiley & Sons.

[IIS] William H. Inmon, Claudia Imhoff, and Ryan Sousa. Corporate Information Factory. John Wiley & Sons.

[Inm] William H. Inmon. Building the Data Warehouse. John Wiley & Sons, second edition.

[ISO] ISO. ISO/IEC: Information technology, Database languages, SQL. International Organization for Standardization.

[JLS] Hosagrahar V. Jagadish, Laks V. S. Lakshmanan, and Divesh Srivastava. What can Hierarchies do for Data Warehouses? In Proceedings of the International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann.

[JLVV] Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, and Panos Vassiliadis, editors. Fundamentals of Data Warehousing. Springer-Verlag.

[Kim] Ralph Kimball. The Data Warehouse toolkit. John Wiley & Sons.

[KRRT] Ralph Kimball, Laura Reeves, Margy Ross, and Warren Thornthwaite. The Data Warehouse lifecycle toolkit. John Wiley & Sons.


[LAW] Wolfgang Lehner, Jens Albrecht, and Hartmut Wedekind. Normal Forms for Multidimensional Databases. In Proceedings of the International Conference on Statistical and Scientific Database Management (SSDBM). IEEE Computer Society.

[Leh] Wolfgang Lehner. Modeling Large Scale OLAP Scenarios. In Advances in Database Technology (EDBT), LNCS. Springer.

[LS] Hans-J. Lenz and Arie Shoshani. Summarizability in OLAP and Statistical Data Bases. In Proceedings of the International Conference on Scientific and Statistical Database Management (SSDBM). IEEE Computer Society.

[LST] Jana Lewerenz, Klaus-Dieter Schewe, and Bernhard Thalheim. Modelling Data Warehouses and OLAP Applications by Means of Dialogue Objects. In Proceedings of the International Conference on Conceptual Modeling (ER), LNCS. Springer.

[LW] Chang Li and X. Sean Wang. A data model for supporting on-line analytical processing. In Proceedings of the International Conference on Information and Knowledge Management (CIKM).

[MK] Daniel L. Moody and Mark A. R. Kortink. From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design. In Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW). CEUR-WS (http://www.ceur-ws.org).

[MO] James Martin and James Odell. Object-Oriented Methods: Pragmatic Considerations. Prentice-Hall.

[MTW] Oscar Mangisengi, A Min Tjoa, and Roland R. Wagner. Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts. In Proceedings of the International Database Conference on Heterogeneous and Internet Databases (IDC).

[OLA] OLAP Council. OLAP and OLAP Server Definitions. Available at the URL http://www.olapcouncil.org/research/glossaryly.htm.

[Oli] Marta Oliva. Integració dels Criteris de Seguretat per Realitzar el Control d'Accés en un Sistema Federat de Bases de Dades Heterogènies. PhD thesis, Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya.

[OMGa] OMG. Common Warehouse Metamodel. February.

[OMGb] OMG. Unified Modeling Language Specification. September.


[OS] Marta Oliva and Fèlix Saltor. Integrating Multilevel Security Policies in Multilevel Federated Database Systems. In Data and Applications Security: Developments and Directions, Proceedings of the IFIP Working Conference in Database and Applications Security (DBSec). Kluwer Academic Publishers.

[Ped] Torben B. Pedersen. Aspects of Data Modeling and Query Processing for Complex Multidimensional Data. PhD thesis, Faculty of Engineering and Science, Aalborg University.

[Pen] Nigel Pendse. The OLAP Report: What is OLAP? Available at the URL http://www.olapreport.com/fasmi.html.

[PJ] Torben B. Pedersen and Christian S. Jensen. Research Issues in Clinical Data Warehousing. In Proceedings of the International Conference on Statistical and Scientific Database Management (SSDBM). IEEE Computer Society.

[PJ] Torben B. Pedersen and Christian S. Jensen. Multidimensional Data Modeling for Complex Data. In Proceedings of the International Conference on Data Engineering (ICDE). IEEE Computer Society.

[PR] Elaheh Pourabbas and Maurizio Rafanelli. Characterization of Hierarchies and Some Operators in OLAP Environment. In Proceedings of the International Workshop on Data Warehousing and OLAP (DOLAP). ACM.

[RAO] Elena Rodríguez, Alberto Abelló, Marta Oliva, Fèlix Saltor, Cecilia Delgado, Eladio Garví, and José Samos. On Operations along the Generalization/Specialization Dimension. In Proceedings of the International Workshop on Engineering Federated Information Systems (EFIS). IOS Press.

[ROSC] Elena Rodríguez, Marta Oliva, Fèlix Saltor, and Benet Campderrich. On Schema and Functional Architectures for Multilevel Secure and Multiuser Model Federated DB Systems. In Proceedings of the International Workshop on Engineering Federated Database Systems (EFDBS), held in conjunction with CAiSE.

[SA] Richard Snodgrass and Ilsoo Ahn. Temporal Databases. IEEE Computer.

[Sal] Fèlix Saltor. Panorama Informático, chapter Semántica de datos. Federación Española de Sociedades de Informática (FESI).

[SBHD] Carsten Sapia, Markus Blaschka, Gabriele Höfling, and Barbara Dinter. Extending the E/R Model for the Multidimensional Paradigm. In Proceedings of the International Workshop on Data Warehouse and Data Mining (DWDM), in conjunction with ER, LNCS. Springer.


SCdMM Adolfo Sanchez, Jose María Cavero, Adoracion de Miguel, and Paloma Martínez. IDEA: A Conceptual Multidimensional Data Model and Some Methodological Implications. In Proceedings of the VI Congreso Internacional de Investigacion en Ciencias Computacionales (CIICC), pages. Instituto Tecnologico de Cancun.

SCG Felix Saltor, Malu Castellanos, and Manolo García-Solaco. Suitability of Data Models as Canonical Models for Federated DBs. SIGMOD Record.

Sho Arie Shoshani. OLAP and Statistical Databases: Similarities and Differences. In Proceedings of the th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), pages. ACM Press.

SL Amit P. Sheth and James A. Larson. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Computing Surveys.

SR Arie Shoshani and Maurizio Rafanelli. A Model for Representing Statistical Objects. In Proceedings of the rd International Conference on Information Systems and Management of Data (COMAD). McGraw-Hill.

SR Felix Saltor and Elena Rodríguez. On Intelligent Access to Heterogeneous Information. In Proceedings of the th International Workshop on Knowledge Representation meets DataBases (KRDB). CEUR-WS.

SRME Il-Yeol Song, William Rowen, Carl Medsker, and Edward Ewen. An Analysis of Many-to-Many Relationships Between Fact and Dimension Tables in Dimensional Modeling. In Proceedings of the rd International Workshop on Design and Management of Data Warehouses (DMDW). CEUR-WS (http://www.ceur-ws.org).

SSSB Jose Samos, Felix Saltor, Jaume Sistac, and Agustí Bardes. Database Architecture for Data Warehousing: An Evolutionary Approach. In Proceedings of th International Conference on Database and Expert Systems Applications (DEXA), volume of LNCS, pages. Springer.

Sto Veda C. Storey. Understanding semantic relationships. VLDB Journal: Very Large Data Bases, October.

TBC Nectaria Tryfona, Frank Busborg, and Jens G. Borch Christiansen. starER: A conceptual model for data warehouse design. In Proceedings of the nd International Workshop on Data Warehousing and OLAP (DOLAP), pages. ACM.

Tes Olivier Teste. Towards Conceptual Multidimensional Design in Decision Support Systems. In Proceedings of the th East-European Conference on Advances in Databases and Information Systems (ADBIS), pages.


Tha Bernhard Thalheim. Dependencies in Relational Databases. B.G. Teubner.

Tho Erik Thomsen. OLAP Solutions. John Wiley & Sons.

TKS Aris Tsois, Nikos Karayannidis, and Timos Sellis. MAC: Conceptual Data Modeling for OLAP. In Proceedings of the rd International Workshop on Design and Management of Data Warehouses (DMDW). CEUR-WS (http://www.ceur-ws.org).

TP Juan Carlos Trujillo and Manuel Palomar. An Object-Oriented Approach to Multidimensional Database Conceptual Modeling. In Proceedings of the st International Workshop on Data Warehousing and OLAP (DOLAP), pages. ACM.

TPG Juan Carlos Trujillo, Manuel Palomar, and Jaime Gomez. Applying Object-Oriented Conceptual Modeling Techniques to the Design of Multidimensional Databases and OLAP applications. In Proceedings of the st International Conference on Web-Age Information Management (WAIM), volume of LNCS, pages. Springer.

TPGS Juan Carlos Trujillo, Manuel Palomar, Jaime Gomez, and Il-Yeol Song. Designing Data Warehouses with OO Conceptual Models. IEEE Computer.

Tru Juan Carlos Trujillo. El modelo GOLD: un modelo conceptual orientado a objetos para el diseno de aplicaciones. PhD thesis, Departamento de Lenguajes y Sistemas Informaticos, Universidad de Alicante.

TS Dimitri Theodoratos and Timos K. Sellis. Data Warehouse Configuration. In Proceedings of the rd International Conference on Very Large Data Bases (VLDB), pages. Morgan Kaufmann.

UW Jeffrey D. Ullman and Jennifer Widom. A First Course in Database Systems. Prentice-Hall.

Vas Panos Vassiliadis. Modeling Multidimensional Databases, Cubes and Cube operations. In Proceedings of the th International Conference on Scientific and Statistical Database Management (SSDBM), pages. IEEE Computer Society.

Vasa Panos Vassiliadis. Data Warehouse Modeling and Quality Issues. PhD thesis, Department of Electrical and Computer Engineering, National Technical University of Athens.

Vasb Panos Vassiliadis. Gulliver in the land of data warehousing: practical experiences and observations of a researcher. In Proceedings of the nd International Workshop on Design and Management of Data Warehouses (DMDW). CEUR-WS (http://www.ceur-ws.org).


VS Panos Vassiliadis and Timos Sellis. A Survey of Logical Models for OLAP Databases. SIGMOD Record.

WB Ming-Chuan Wu and Alejandro P. Buchmann. Research issues in data warehousing. In Datenbanksysteme in Büro, Technik und Wissenschaft (BTW), Informatik Aktuell, pages. Springer.

Wid Jennifer Widom. Research problems in data warehousing. In Proceedings of the th International Conference on Information and Knowledge Management (CIKM), pages. ACM.

Appendix A

UML Profile for Multidimensional Modeling

The nice thing about standards is that there are so many of them to choose from.

Andrew S. Tanenbaum

This appendix contains a UML Profile for multidimensional modeling. It is the formalization of YAM using UML standard notation (in OMGb). Thus, UML is customized for a multidimensional domain.

The appendix first introduces the profile and then summarizes, in a table, the stereotypes defined in it. Next, it shows the definition of each Stereotype following UML notation. Finally, it lists the integrity constraints in OCL.

A Introduction

This Profile aims to facilitate the modeling of multidimensional data. Basically, it defines two Stereotypes for some metaclasses in the UML core. Thus, it allows factual and dimensional data to be represented.

Moreover, besides those Stereotypes, others are also defined to facilitate the representation of specific multidimensional constraints regarding summarizability and identification of data. Some Stereotypes are also specialized to establish the difference between basic and derived elements in the model.

A Summary of Profile

The Stereotypes that are defined by this Profile are summarized in the following table:

Stereotype               Base Class
MultidimensionalSchema   Model
Star                     Package
Fact                     Classifier
Dimension                Classifier
Cell                     Class
SummarizedCell           Class
FundamentalCell          Class
Level                    Class
Measure                  Attribute
SummarizedMeasure        Attribute
FundamentalMeasure       Attribute
Descriptor               Attribute
Base                     Constraint
Summarization            Operation
Transitive               Operation
NonTransitive            Operation
SummaryParam             Parameter
CellRelation             Association
LevelRelation            Association
KindOfMeasure            DataType
List                     DataType
Induction                ModelElement

A Stereotypes and Notation

Multidimensionally modeled data consist of several interrelated Stars. Each Star is composed of one Fact and several Dimensions. On the one hand, Facts are composed of Cells, and these are composed of Measures. On the other hand, Dimensions are composed of Levels, and these are composed of Descriptors. In addition, there are Stereotypes to show how data can be aggregated and identified.
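The containment structure described above can be sketched with plain data classes. This is only an illustrative reading of the paragraph, not part of the profile: the Python class names mirror the Stereotypes, while the field names and the sample values (taken loosely from the grocery chain example of Appendix B) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    name: str
    descriptors: list[str] = field(default_factory=list)


@dataclass
class Dimension:
    name: str
    levels: list[Level] = field(default_factory=list)


@dataclass
class Cell:
    name: str
    measures: list[str] = field(default_factory=list)


@dataclass
class Fact:
    name: str
    cells: list[Cell] = field(default_factory=list)


@dataclass
class Star:
    name: str
    fact: Fact                      # exactly one Fact per Star
    dimensions: list[Dimension] = field(default_factory=list)


# Hypothetical sample instance (names are illustrative only).
day = Level("Day", ["dayOfWeek", "season"])
time = Dimension("Time", [day])
atomic_sale = Cell("AtomicSale", ["quantitySold", "dollarRevenue"])
sales = Star("Sales", Fact("Sale", [atomic_sale]), [time])
```

Aggregation and identification constraints (Base, Summarization, Induction) are deliberately left out of this sketch.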

A MultidimensionalSchema

Stereotype:  MultidimensionalSchema
Base Class:  Model
Parent:      NA
Tags:        stars
Constraints: NA
Description: Specifies a schema showing multidimensional data composed of different Stars.

A Model stereotyped as MultidimensionalSchema is the notation used for a MultidimensionalSchema.

Tag:          stars
Stereotype:   MultidimensionalSchema
Type:         stereotype Star
Multiplicity:
Description:  Shows the different star-shape schemas in the multidimensional schema.


A Star

Stereotype:  Star
Base Class:  Package
Parent:      NA
Tags:        fact, dimensions
Constraints: The Fact is associated to every Dimension.
Description: Specifies a star-shape schema. It represents data regarding only one subject of analysis.

The notation used for a Star is a Package stereotyped as Star.

Tag:          fact
Stereotype:   Star
Type:         stereotype Fact
Multiplicity:
Description:  Shows the subject of analysis.

Tag:          dimensions
Stereotype:   Star
Type:         stereotype Dimension
Multiplicity:
Description:  Shows the different points of view analysts could use on the study of the subject of analysis.

A Fact

Stereotype:  Fact
Base Class:  Classifier
Parent:      NA
Tags:        cells
Constraints: The Cells are related by CellRelations. A Fact can neither be a specialization of a Dimension, nor be obtained by evolution of a Dimension, nor be aggregated into a Dimension. A Fact cannot be derived from a Dimension. Moreover, its Cells cannot be related by specialization, evolution, nor derivation.
Description: Specifies a subject of analysis.

The notation used for a Fact is a box with an F in an upper corner.

Tag:          cells
Stereotype:   Fact
Type:         stereotype Cell
Multiplicity:
Description:  Shows the different granularities of the data subject of analysis.

A Dimension

Stereotype:  Dimension
Base Class:  Classifier
Parent:      NA
Tags:        levels
Constraints: The Levels are related by means of LevelRelations. A Dimension can neither be a specialization of a Fact, nor be obtained by evolution of a Fact, nor be aggregated into a Fact. Moreover, its Levels cannot be related by specialization, evolution, nor derivation.
Description: Specifies a point of view on analyzing a given subject of analysis.

The notation used for a Dimension is a box with a D in an upper corner.

Tag:          levels
Stereotype:   Dimension
Type:         stereotype Level
Multiplicity:
Description:  Shows the different granularities that can be used in the analysis along the analysis dimension.

A Cell

Stereotype:  Cell
Base Class:  Class
Parent:      NA
Tags:        bases
Constraints: Can only be associated to Measure Attributes. Every Cell belongs to exactly one Fact. A Cell can neither be a specialization of a Level, nor an evolution of a Level, nor be aggregated into a Level, nor be derived from a Level. Moreover, its Measures cannot be related by evolution.
Description: Specifies a subject of analysis at a given granularity.

The notation used for a Cell is a Class with a C in an upper corner.

Tag:          bases
Stereotype:   Cell
Type:         stereotype Base
Multiplicity:
Description:  Shows the different spaces that can be used to place the instances of the Cell.

A SummarizedCell

Stereotype:  SummarizedCell
Base Class:  Class
Parent:      Cell
Tags:        NA
Constraints: It must be possible to derive all its Measures.
Description: Specifies Cells whose data can be derived.

The notation used for a SummarizedCell is a Cell with a derivation bar in front of its name.

A FundamentalCell

Stereotype:  FundamentalCell
Base Class:  Class
Parent:      Cell
Tags:        NA
Constraints: It must be associated to some Measures that cannot be derived.
Description: Specifies Cells whose data cannot be derived.

The notation used for a FundamentalCell is that of a Cell. No special notation is necessary, since the specialization into fundamental and summarized Cells is alternative. Thus, a given Cell is marked if it is a SummarizedCell, and not marked otherwise.

A Level

Stereotype:  Level
Base Class:  Class
Parent:      NA
Tags:        NA
Constraints: Can only be associated to Descriptor Attributes. Every Level belongs to exactly one Dimension. A Level can neither be a specialization of a Cell, nor an evolution of a Cell, nor be aggregated into a Cell. Moreover, its Descriptors cannot be related by evolution.
Description: Specifies a point of view at a given granularity.

The notation used for a Level is a Class with an L in an upper corner.

A Measure

Stereotype:  Measure
Base Class:  Attribute
Parent:      NA
Tags:        type
Constraints: It must be associated to a Cell and cannot be obtained by evolution of a Descriptor.
Description: Specifies Attributes of the subject of analysis.

The notation used for a Measure is that of an Attribute. No special notation is necessary, since this kind of element will always be associated to Cells, and Measures are the only Attributes that can be associated to Cells.

Tag:          type
Stereotype:   Measure
Type:         stereotype KindOfMeasure
Multiplicity:
Description:  Specifies the kind of measure the Measure is.


A SummarizedMeasure

Stereotype:  SummarizedMeasure
Base Class:  Attribute
Parent:      Measure
Tags:        from
Constraints: NA
Description: Specifies a Measure that can be derived.

The notation used for a SummarizedMeasure is that of a Measure with a bar in front of its name. Moreover, a Comment can be attached to it showing the formula used in the calculation.

Tag:          from
Stereotype:   SummarizedMeasure
Type:         stereotype Measure
Multiplicity:
Description:  Shows the Measures used in the calculation of a SummarizedMeasure.

A FundamentalMeasure

Stereotype:  FundamentalMeasure
Base Class:  Attribute
Parent:      Measure
Tags:        NA
Constraints: It can only be associated to FundamentalCells.
Description: Specifies a Measure that cannot be derived.

The notation used for a FundamentalMeasure is that of a Measure. No special notation is necessary, since the specialization into fundamental and summarized Measures is alternative. Thus, a given Measure is marked if it is a SummarizedMeasure, and not marked otherwise.

A Descriptor

Stereotype:  Descriptor
Base Class:  Attribute
Parent:      NA
Tags:        NA
Constraints: It must be associated to a Level and cannot be obtained by evolution of a Measure.
Description: Specifies Attributes of the different points of view.

The notation used for a Descriptor is that of an Attribute. No special notation is necessary, since this kind of element will always be associated to Levels, and Descriptors are the only Attributes that can be associated to Levels.

A Base

Stereotype:  Base
Base Class:  Constraint
Parent:      NA
Tags:        components
Constraints: The different components must be functionally independent.
Description: Specifies a finite space defined by the cartesian product of a given set of Levels.

Similar to constraints of Stereotypes, this kind of Constraint is also drawn in the box of the corresponding Cell, stereotyped as Base.

Tag:          components
Stereotype:   Base
Type:         stereotype Level
Multiplicity:
Description:  Shows the set of Levels that compose the Base.

A Summarization

Stereotype:  Summarization
Base Class:  Operation
Parent:      NA
Tags:        NA
Constraints: Its parameters are SummaryParam.
Description: Specifies the operation applied to a given kind of measure on aggregating along a given analysis dimension.

A Summarization is represented by means of a String.

A Transitive

Stereotype:  Transitive
Base Class:  Operation
Parent:      Summarization
Tags:        NA
Constraints: NA
Description: Specifies that a given Summarization is transitive.

The notation used for a Transitive is that of a Summarization. No special notation is necessary, since the specialization into transitive and not transitive is alternative. Thus, a given Summarization is marked if it is NonTransitive, and not marked otherwise.

A NonTransitive

Stereotype:  NonTransitive
Base Class:  Operation
Parent:      Summarization
Tags:        NA
Constraints: NA
Description: Specifies that a given Summarization is not transitive.

A Comment can be attached to the Summarization showing it is not transitive.
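The difference between the two Stereotypes can be illustrated with a small numeric sketch. It assumes that a Summarization is transitive when applying it to already summarized groups gives the same result as applying it directly to the finest data; this reading, the variable names, and the sample values are assumptions made only for this illustration.

```python
from statistics import mean

# Hypothetical daily sales grouped by month (values are illustrative only).
daily = {"Jan": [10, 20, 30], "Feb": [40]}
all_values = [v for month in daily.values() for v in month]

# sum behaves transitively: summing the monthly sums equals summing the days.
sum_of_sums = sum(sum(month) for month in daily.values())
direct_sum = sum(all_values)

# mean does not: the mean of the monthly means differs from the mean of the days.
mean_of_means = mean([mean(month) for month in daily.values()])
direct_mean = mean(all_values)

print(sum_of_sums, direct_sum)      # both totals agree
print(mean_of_means, direct_mean)   # the two averages differ
```

Under this reading, a summarization such as the average would be marked NonTransitive, so that aggregated Cells are not used as its source.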

A SummaryParam

Stereotype:  SummaryParam
Base Class:  Parameter
Parent:      NA
Tags:        NA
Constraints: Are always part of a Summarization. Their type is a List.
Description: Specifies the parameters of a Summarization.

The notation used for a SummaryParam is that of a Parameter. No special notation is necessary, since these are always part of a Summarization, and only SummaryParam can be part of it. Since their type is always a List, it can just be noted by the type of the List.

A CellRelation

Stereotype:  CellRelation
Base Class:  Association
Parent:      NA
Tags:        correspond
Constraints: It is associated to two Cells. Both Cells must belong to the same Fact. One of its AssociationEnd must be an Aggregation.
Description: Specifies relationships between Cells in a Fact.

The notation used for a CellRelation is that of an Aggregation. No special notation is necessary, since it is the only kind of aggregation allowed between Cells inside a Fact.

Tag:          correspond
Stereotype:   CellRelation
Type:         stereotype LevelRelation
Multiplicity:
Description:  Specifies the relationship between Levels that generates the relationship between Cells.

A LevelRelation

Stereotype:  LevelRelation
Base Class:  Association
Parent:      NA
Tags:        NA
Constraints: It is associated to two Levels. Both Levels must belong to the same Dimension. One of its AssociationEnd must be an Aggregation.
Description: Specifies relationships between Levels in a Dimension.

The notation used for a LevelRelation is that of an Aggregation. No special notation is necessary, since it is the only kind of aggregation allowed between Levels inside a Dimension.

A KindOfMeasure

Stereotype:  KindOfMeasure
Base Class:  DataType
Parent:      NA
Tags:        invalidSource
Constraints: NA
Description: Specifies a kind of Measure.

A DataType stereotyped as KindOfMeasure is the notation used for a KindOfMeasure.

Tag:          invalidSource
Stereotype:   KindOfMeasure
Type:         stereotype Level
Multiplicity:
Description:  Specifies the different aggregation levels that cannot be used as source for the calculation of a given kind of Measure.

A List

Stereotype:  List
Base Class:  Classifier
Parent:      NA
Tags:        type
Constraints: NA
Description: Specifies a list of elements of a type.

The notation used for a List is a Classifier stereotyped as List.

Tag:          type
Stereotype:   List
Type:         Classifier
Multiplicity:
Description:  Specifies the type of the elements in the list.

A Induction

Stereotype:  Induction
Base Class:  ModelElement
Parent:      NA
Tags:        inductor, subject, induced
Constraints: Each Dimension only induces one Summarization on a KindOfMeasure.
Description: Specifies that a given summarization function is induced by aggregations along an analysis dimension over all Measures of a given kind.

The notation used for an Induction is a String added to the box of the KindOfMeasure, stating the list of Dimensions and the Summarization that these induce.

Tag:          inductor
Stereotype:   Induction
Type:         stereotype Dimension
Multiplicity:
Description:  Specifies a set of analysis dimensions that induce a given Summarization.

Tag:          subject
Stereotype:   Induction
Type:         stereotype KindOfMeasure
Multiplicity:
Description:  Specifies a KindOfMeasure subject of summarization.

Tag:          induced
Stereotype:   Induction
Type:         stereotype Summarization
Multiplicity:
Description:  Specifies the Summarization induced on a kind of Measure when aggregating along some analysis dimensions.

A Well-Formedness Rules

Similar to the rules, expressed in OCL, that are specified for every UML element, this section contains such rules for the multidimensional stereotypes.

A Star

The Fact is associated to each and every Dimension.

context Star inv:
  self.dimensions->forAll(d | d.oppositeAssociationEnds.association->includes(self.fact))
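As an informal illustration of the Star rule, the following sketch checks that a Fact is linked to every Dimension. It is not derived from the OCL above: the encoding of Associations as unordered pairs of names, the function name, and the sample data are all assumptions.

```python
def fact_reaches_every_dimension(fact, dimensions, associations):
    """Return True if every dimension is linked to the fact.

    associations is a set of frozenset pairs of element names, an assumed
    and simplified encoding of UML Associations.
    """
    return all(frozenset((fact, d)) in associations for d in dimensions)


# Hypothetical Star loosely based on the grocery chain example.
links = {frozenset(("Sale", d)) for d in ("Time", "Store", "Product")}
ok = fact_reaches_every_dimension("Sale", ["Time", "Store", "Product"], links)
broken = fact_reaches_every_dimension("Sale", ["Time", "Promotion"], links)
```

Here `ok` holds because all three Dimensions are linked to the Fact, while `broken` fails because no Association reaches Promotion.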

A Fact

The Cells that form a Fact are connected by means of CellRelations.

context Fact inv:
  self.cells->forAll(c:Cell | c.connectedSubsetOfCells = self.cells)

A Fact cannot be a specialization of a Dimension.

context Fact inv:
  self.generalization.parent->forAll(p | not p.oclIsKindOf(Dimension))

A Fact cannot be obtained by evolution of a Dimension.

context Fact inv:
  self.targetFlow.source->forAll(p | not p.oclIsKindOf(Dimension))

A Fact cannot be aggregated into a Dimension.

context Fact inv:
  self.allOppositeAssociationEnds->select(aggregation = #aggregate).participant
    ->forAll(p | not p.oclIsKindOf(Dimension))

A Fact cannot be derived from a Dimension.

context Fact inv:
  self.clientDependency->select(oclIsKindOf(derive)).supplier
    ->forAll(p | not p.oclIsKindOf(Dimension))

Cells in a Fact cannot be related by specialization.

context Fact inv:
  self.cells->forAll(c | c.specialization.child->intersection(self.cells)->isEmpty())

Cells in a Fact cannot be related by evolution.

context Fact inv:
  self.cells->forAll(c | c.targetFlow.source->intersection(self.cells)->isEmpty())

Cells in a Fact cannot be related by derivation.

context Fact inv:
  self.cells->forAll(c | c.clientDependency->select(oclIsKindOf(derive)).supplier
    ->intersection(self.cells)->isEmpty())

A Dimension

The Levels that form a Dimension are connected by means of LevelRelations.

context Dimension inv:
  self.levels->forAll(l:Level | l.connectedSubsetOfLevels = self.levels)

A Dimension cannot be a specialization of a Fact.

context Dimension inv:
  self.generalization.parent->forAll(p | not p.oclIsKindOf(Fact))

A Dimension cannot be obtained by evolution of a Fact.

context Dimension inv:
  self.targetFlow.source->forAll(p | not p.oclIsKindOf(Fact))

A Dimension cannot be aggregated into a Fact.

context Dimension inv:
  self.allOppositeAssociationEnds->select(aggregation = #aggregate).participant
    ->forAll(p | not p.oclIsKindOf(Fact))

Levels in a Dimension cannot be related by specialization.

context Dimension inv:
  self.levels->forAll(l | l.specialization.child->intersection(self.levels)->isEmpty())

Levels in a Dimension cannot be related by evolution.

context Dimension inv:
  self.levels->forAll(l | l.targetFlow.source->intersection(self.levels)->isEmpty())

Levels in a Dimension cannot be related by derivation.

context Dimension inv:
  self.levels->forAll(l | l.clientDependency->select(oclIsKindOf(derive)).supplier
    ->intersection(self.levels)->isEmpty())

A Cell

A Cell can only be associated to Measure Attributes.

context Cell inv:
  self.feature->forAll(s | s.oclIsKindOf(Attribute) implies
    s.oclIsKindOf(Measure))

A Cell belongs to exactly one Fact.

context Cell inv:
  Fact.allInstances->select(f | f.cells->exists(self))->size = 1

A Cell cannot be a specialization of a Level.

context Cell inv:
  self.generalization.parent->forAll(p | not p.oclIsKindOf(Level))

A Cell cannot be obtained by evolution of a Level.

context Cell inv:
  self.targetFlow.source->forAll(p | not p.oclIsKindOf(Level))

A Cell cannot be aggregated into a Level.

context Cell inv:
  self.allOppositeAssociationEnds->select(aggregation = #aggregate).participant
    ->forAll(p | not p.oclIsKindOf(Level))

A Cell cannot be derived from a Level.

context Cell inv:
  self.clientDependency->select(oclIsKindOf(derive)).supplier
    ->forAll(p | not p.oclIsKindOf(Level))

Measures in a Cell cannot be related by evolution.

context Cell inv:
  self.feature->select(oclIsKindOf(Measure))->forAll(c | c.targetFlow.source
    ->intersection(self.feature->select(oclIsKindOf(Measure)))->isEmpty())
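The rule that a Cell belongs to exactly one Fact can be checked informally as follows. This is a minimal sketch under an assumed encoding (a dictionary from Fact names to sets of Cell names); the function names and the sample Facts, including the deliberately shared MonthlySale Cell, are hypothetical.

```python
def owners(cell, facts):
    """Return the names of the facts whose cell set contains the given cell."""
    return [name for name, cells in facts.items() if cell in cells]


def belongs_to_exactly_one_fact(cell, facts):
    """Informal counterpart of the rule: the cell has exactly one owning fact."""
    return len(owners(cell, facts)) == 1


# Hypothetical facts; MonthlySale is wrongly shared by two of them.
facts = {
    "Sale": {"AtomicSale", "MonthlySale"},
    "Stock": {"AtomicStock", "MonthlySale"},
}
```

With this data, AtomicSale satisfies the rule, whereas MonthlySale (two owners) and any Cell missing from every Fact (no owner) violate it.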


Additional operations

The operation connectedSubsetOfCells results in the set of all Cells connected to a given one by means of CellRelations.

connectedSubsetOfCells : Set(Cell)
connectedSubsetOfCells = self->union(self.oppositeAssociationEnds->select(
  association.oclIsTypeOf(CellRelation)).participant.connectedSubsetOfCells)
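A possible executable reading of this operation is a reachability search. The sketch below is an assumption-laden illustration: CellRelations are encoded as undirected pairs of Cell names, and a visited set is added so that the search terminates even when the relations form cycles, a detail the recursive definition leaves implicit.

```python
from collections import deque


def connected_subset(start, relations):
    """Return all cells reachable from start through CellRelations.

    relations is an iterable of (cell, cell) pairs, treated as undirected.
    The visited set guarantees termination on cyclic relations.
    """
    neighbours = {}
    for a, b in relations:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def fact_is_connected(cells, relations):
    """Every cell of the fact must reach exactly the cells of that fact."""
    return all(connected_subset(c, relations) == set(cells) for c in cells)


# Hypothetical cells of one Fact, linked in a chain.
relations = [("AtomicSale", "MonthlySale"), ("MonthlySale", "YearlySale")]
cells = ["AtomicSale", "MonthlySale", "YearlySale"]
```

The same search applies unchanged to Levels and LevelRelations.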

A SummarizedCell

All its Measures must be summarizable.

context SummarizedCell inv:
  self.feature->forAll(s | s.oclIsKindOf(Attribute) implies
    s.oclIsKindOf(SummarizedMeasure))

A FundamentalCell

It must be associated to some Measures that are not derived.

context FundamentalCell inv:
  self.feature->exists(s | s.oclIsKindOf(FundamentalMeasure))
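The two rules on SummarizedCell and FundamentalCell can be contrasted with a small sketch. The encoding (a dictionary from measure name to a flag telling whether the measure is derived), the function names, and the sample Cells are assumptions made for illustration only.

```python
def is_valid_summarized_cell(measures):
    """All measures of a SummarizedCell must be derived (summarized)."""
    return all(measures.values())


def is_valid_fundamental_cell(measures):
    """A FundamentalCell needs at least one measure that is not derived."""
    return any(not derived for derived in measures.values())


# Hypothetical cells: measure name -> True if the measure is derived.
atomic_sale = {"quantitySold": False, "grossProfit": True}
monthly_sale = {"quantitySold": True, "grossProfit": True}
```

A FundamentalCell may therefore still carry derived Measures, as long as at least one of its Measures is fundamental.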

A Level

A Level can only be associated to Descriptor Attributes.

context Level inv:
  self.feature->forAll(s | s.oclIsKindOf(Attribute) implies
    s.oclIsKindOf(Descriptor))

A Level belongs to exactly one Dimension.

context Level inv:
  Dimension.allInstances->select(d | d.levels->exists(self))->size = 1

A Level cannot be a specialization of a Cell.

context Level inv:
  self.generalization.parent->forAll(p | not p.oclIsKindOf(Cell))

A Level cannot be obtained by evolution of a Cell.

context Level inv:
  self.targetFlow.source->forAll(p | not p.oclIsKindOf(Cell))

A Level cannot be aggregated into a Cell.

context Level inv:
  self.allOppositeAssociationEnds->select(aggregation = #aggregate).participant
    ->forAll(p | not p.oclIsKindOf(Cell))

Descriptors in a Level cannot be related by evolution.

context Level inv:
  self.feature->select(oclIsKindOf(Descriptor))->forAll(c | c.targetFlow.source
    ->intersection(self.feature->select(oclIsKindOf(Descriptor)))->isEmpty())

Additional operations

The operation connectedSubsetOfLevels results in the set of all Levels connected to a given one by means of LevelRelations.

connectedSubsetOfLevels : Set(Level)
connectedSubsetOfLevels = self->union(self.oppositeAssociationEnds->select(
  association.oclIsTypeOf(LevelRelation)).participant.connectedSubsetOfLevels)

A Measure

A Measure can only be associated to a Cell.

context Measure inv:
  self.owner.oclIsKindOf(Cell)

A Measure cannot be obtained by evolution of a Descriptor.

context Measure inv:
  self.targetFlow.source->forAll(p | not p.oclIsKindOf(Descriptor))

A FundamentalMeasure

A FundamentalMeasure can only be associated to a FundamentalCell.

context FundamentalMeasure inv:
  self.owner.oclIsKindOf(FundamentalCell)


A Descriptor

A Descriptor can only be associated to a Level.

context Descriptor inv:
  self.owner.oclIsKindOf(Level)

A Descriptor cannot be obtained by evolution of a Measure.

context Descriptor inv:
  self.targetFlow.source->forAll(p | not p.oclIsKindOf(Measure))

A Base

The different components of a Base must be functionally independent.

context Base inv:

selfcomp onentsforalll j selfcomp onentsforalll j l l implies

   

l j l

 

b eing l j l a degenerated dep endency as explained in page

 

A Summarization

The parameters of a Summarization are SummaryParam.

context Summarization inv:
  self.parameter->forAll(oclIsKindOf(SummaryParam))

A SummaryParam

A SummaryParam is always part of a Summarization.

context SummaryParam inv:
  self.BehaviouralFeature.oclIsKindOf(Summarization)

The type of a SummaryParam is a List.

context SummaryParam inv:
  self.type.oclIsKindOf(List)

A CellRelation

A CellRelation associates two Cells.

context CellRelation inv:
  self.connection->forAll(participant.oclIsKindOf(Cell))

Both Cells must belong to the same Fact.

context CellRelation inv:
  Fact.allInstances->forAll(f | f.cells->intersection(self.allConnections.participant)
    ->notEmpty() implies f.cells->includes(self.allConnections.participant))

One of its AssociationEnd must be an Aggregation.

context CellRelation inv:
  self.allConnections->select(aggregation = #aggregate)->size = 1
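The three CellRelation rules can be gathered into one informal checker. The sketch assumes that "one of its AssociationEnd" means exactly one aggregate end, and it encodes each end as a pair of Cell name and aggregation kind; the function name, the encoding, and the sample Cells are hypothetical.

```python
def check_cell_relation(ends, fact_of):
    """Return the list of rules violated by a candidate CellRelation.

    ends: list of (cell_name, aggregation_kind) pairs, one per AssociationEnd.
    fact_of: dict mapping each known cell name to the name of its fact.
    """
    problems = []
    if len(ends) != 2 or any(cell not in fact_of for cell, _ in ends):
        problems.append("must associate two Cells")
    elif len({fact_of[cell] for cell, _ in ends}) != 1:
        problems.append("both Cells must belong to the same Fact")
    # Assumption: exactly one of the two ends is marked as an aggregate.
    if sum(1 for _, kind in ends if kind == "aggregate") != 1:
        problems.append("exactly one end must be an aggregation")
    return problems


# Hypothetical cells and their owning facts.
fact_of = {"AtomicSale": "Sale", "MonthlySale": "Sale", "AtomicStock": "Stock"}
good = check_cell_relation([("MonthlySale", "aggregate"), ("AtomicSale", "none")], fact_of)
bad = check_cell_relation([("MonthlySale", "none"), ("AtomicStock", "none")], fact_of)
```

Returning the violated rules, rather than a single boolean, makes it easier to report which constraint a model breaks.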

A LevelRelation

A LevelRelation associates two Levels.

context LevelRelation inv:
  self.connection->forAll(participant.oclIsKindOf(Level))

Both Levels must belong to the same Dimension.

context LevelRelation inv:
  Dimension.allInstances->forAll(d | d.levels->intersection(self.allConnections.participant)
    ->notEmpty() implies d.levels->includes(self.allConnections.participant))

One of its AssociationEnd must be an Aggregation.

context LevelRelation inv:
  self.allConnections->select(aggregation = #aggregate)->size = 1

A Induction

Every Dimension can only induce one Summarization on a KindOfMeasure.

context Induction inv:
  Induction.allInstances->forAll(i | i.inductor->intersection(self.inductor)->notEmpty()
    and i.subject = self.subject implies i.induced = self.induced)
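The Induction rule can be read as a pairwise consistency check. The following sketch assumes each Induction is encoded as a triple of inducing Dimensions, kind of measure, and induced summarization; the function name and the third, conflicting sample entry are hypothetical, while the first two entries follow the Quantity and Counter annotations of the grocery chain example.

```python
def conflicting_inductions(inductions):
    """Return index pairs of Inductions that break the rule.

    inductions: list of (dimensions, kind_of_measure, summarization).
    Two entries conflict when they share a dimension and a kind of measure
    but induce different summarizations.
    """
    conflicts = []
    for i, (dims_a, kind_a, summ_a) in enumerate(inductions):
        for j in range(i + 1, len(inductions)):
            dims_b, kind_b, summ_b = inductions[j]
            if set(dims_a) & set(dims_b) and kind_a == kind_b and summ_a != summ_b:
                conflicts.append((i, j))
    return conflicts


consistent = [
    ({"Time", "Promotion", "Store", "Product", "PackageType"}, "Quantity", "sum"),
    ({"Time", "Promotion", "Store"}, "Counter", "sum"),
]
# Hypothetical extra entry: Time would induce both sum and avg on Quantity.
inconsistent = consistent + [({"Time"}, "Quantity", "avg")]
```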



Appendix B

Design examples with YAM

Few things are harder to put up with than the annoyance of a good example.

Mark Twain, Pudd'nhead Wilson's Calendar

This appendix exemplifies the usage of YAM on modeling a database. Cases of study used by other authors are modeled here with YAM. The aim is to exemplify the usage of YAM while comparing it with other contributions in the area.

B Sales of products in a grocery chain

The grocery chain example has been used by several authors, like Kim or GMRa, to exemplify their work. Different nuances are introduced by each author. In this case, a chain of supermarkets is modeled, so that each supermarket is divided into departments that offer different kinds of products. The supermarkets are spread over different states. We want to analyze what, where, and on which day the products are sold.

B Kimball's schema

Figure B represents the case of study at logical level, as in Kim. It shows a central fact table related to its dimension tables by foreign keys. The dimension tables do not make aggregation hierarchies explicit, but contain a list of attributes. Aggregability constraints are not present in the schema either.

In this case, the Promotion Dimension is of special interest to analyze the impact of different offers. Moreover, the finest granularity chosen has been the items sold by promotion, by store, by day.

Store Dimension: store_key, store_name, store_number, stor_street_address, store_city, store_county, store_state, store_zip, sales_district, sales_region, store_phone, store_FAX, floor_plan_type, photo_processing_type, finance_service_type, first_opened_date, last_remodel_date, store_sqft, grocery_sqft, meat_sqft

Time Dimension: time_key, day_of_week, day_number_in_month, day_number_overall, week_number_in_year, week_number_overall, month, month_number_overall, quarter, fiscal_period, holiday_flag, weekday_flag, last_day_in_month_flag, season, event

Sales Fact: time_key, product_key, store_key, promotion_key, dollar_sales, unit_sales, dollar_cost, customer_count

Product Dimension: product_key, SKU_description, SKU_number, package_size, brand, subcategory, category, department, package_type, diet_type, weight, weight_unit_of_measure, units_per_retail_case, units_pershiping_case, cases_per_pallet, shelf_width, shelf_height, shelf_depth

Promotion Dimension: promotion_key, promotion_name, price_reduction_type, ad_type, display_type, coupon_type, ad_media_name, display_provider, promo_cost, promo_begin_date, promo_end_date

Figure B: Schema of the grocery chain case study (Kim)

B Golfarelli's version of Kimball's schema

In figure B, the same schema is represented at conceptual level by M. Golfarelli. Circles represent dimension attributes (i.e. Levels), while non-dimension attributes (i.e. Descriptors) are represented by lines. Arcs represent to-one relationships. The non-additivity of the number of customers along the Product Dimension is shown by a dashed line. Moreover, a dash crossing an arc indicates optionality in the relationship.

[Figure: fact SALE (qty sold, revenue, no. of customers) surrounded by the attribute hierarchies of product, store, date, and promotion]

Figure B: Schema of the grocery chain case study (GMRa)






B YAM schema

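[Figure: Star Sales, with Fact Sale and Dimensions Time, Promotion, Store, Product, and PackageType; Product and PackageType are related by the PackedIn Association, and Promotion combines the Dimensions Ad, PriceReduction, DisplayType, and Coupon. The legend distinguishes Dimension (D), Fact (F), derived element (/), Generalization, Association, Aggregation, Derivation, and Flow.]

Figure B: Upper level schema of the grocery chain case study modeled with YAM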

Firstly, we can see in figure B the grocery chain schema at Upper detail level, modeled with YAM. In contrast with figure B, this schema contains five Dimensions. This fact reflects the independence between the product and the package type. Any product might be packed in any kind of package. Therefore, we could aggregate independently along both hierarchies. Another interesting point arises when looking at the Promotion Dimension. We can see that a promotion is the combination of four different concepts: Advertisement, PriceReduction, DisplayType, and Coupon. In this case, we are interested in the study of combinations of those promotion mechanisms. However, other analysts could be interested in the influence of those concepts separately. Therefore, the corresponding analysis dimensions could be used in other Stars.

[Figure: Sales Star at Intermediate detail level. Cell AtomicSale has Base {Product, Promotion, Day, Store}. Levels shown: Day, Week, Month, Quarter, FiscalPeriod (Time); Product, PackageSize, Brand, Subcategory, Category, Department (Product); Store, ZipCode, County, State, SalesDistrict, SalesRegion (Store); Promotion, Ad, PriceReduction, Display, Coupon (Promotion); PackageType, related to Product by PackedIn; every Dimension has an All Level]

Figure B Intermediate level schema of the grocery chain case study modeled with YAM

In figure B the details of the Dimensions are shown. Since Promotion is composed of four Dimensions, the aggregation hierarchies of those Dimensions are alternative aggregation paths in Promotion. For the sake of simplicity, those Dimensions have not been depicted at Intermediate detail level.

At this level, we also show that, even though there are five Dimensions, four of them are enough to identify a sale: a product determines a package type.

Day (Level): dayOfWeek, dayNumberInMonth, dayNumberOverall, holidayFlag, weekdayFlag, lastDayInMonthFlag, season, event

AtomicSale (Cell): quantitySold: Quantity; dollarRevenue: Revenue; dollarCost: Cost; customerCount: Counter; /grossProfit: Profit {dollarRevenue−dollarCost}; /grossMargin: Margin {grossProfit/dollarRevenue}

Quantity (kind of measure): {Time, Promotion, Store, Product, PackageType −> sum(Quantity)}

Counter (kind of measure): {Time, Promotion, Store −> sum(Count)}

Legend: Cell (C), Level (L), Derivation /attName {supliers}

Figure B Lower level schema of the grocery chain case study modeled with YAM

Finally, at Lower detail level we can see the attributes of the different Classes. For the sake of simplicity, only the attributes of Day Level have been depicted. Other dimensional attributes in figure B would be translated in a similar way. Notice that Dimension keys do not appear in figure B. Since we are dealing with an OO data model, OIDs are assumed. Thus, in order to represent a foreign key, drawing the corresponding Association between Classes is enough.

Much more interesting is the information regarding factual attributes. Firstly, we are able to show which attributes are basic and which are derived, besides the corresponding derivation formula. Moreover, summarizability can be made explicit as well. We can see that Quantity Measures will be summarizable along any analysis dimension. Nevertheless, Counter Measures will not be summarizable along Product nor PackageType Dimensions.
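The derivation formulas and summarizability constraints just described can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the dictionary layout, the function names `derive` and `can_sum_along`, and the sample figures are hypothetical and not part of YAM; only the attribute names and the two aggregation constraints are taken from the Lower level schema.

```python
# Illustrative sketch only: data layout and function names are assumptions.

# Dimensions along which each kind of measure may be summed (from the schema).
SUMMARIZABLE = {
    "Quantity": {"Time", "Promotion", "Store", "Product", "PackageType"},
    "Counter": {"Time", "Promotion", "Store"},
}


def can_sum_along(kind_of_measure, dimension):
    """Return True if sum is a valid aggregation for this kind along this dimension."""
    return dimension in SUMMARIZABLE[kind_of_measure]


def derive(sale):
    """Return a copy of a sale record with grossProfit and grossMargin added."""
    profit = sale["dollarRevenue"] - sale["dollarCost"]
    margin = profit / sale["dollarRevenue"]
    return {**sale, "grossProfit": profit, "grossMargin": margin}


# Hypothetical instance of AtomicSale.
sale = derive({"quantitySold": 10, "dollarRevenue": 50.0,
               "dollarCost": 30.0, "customerCount": 4})
```

With such a specification, a query tool could refuse to add customerCount along Product while still allowing it along Store.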

B Discussion

Maybe at logical level, depending on the DBMS and applications we use for the implementation, making this information explicit would be a serious mistake. However, at conceptual level, representing users' aggregation intentions is critical, as can be seen in Golfarelli's version as well as in YAM. Moreover, by depicting these hierarchies we are able to outline the relationships between hierarchies of related Dimensions (for instance, Promotion Dimension in our case).

Additivity is not reflected in Kimball's schemas as it is in Golfarelli's version. This can easily be reflected in YAM as a particular case of aggregability where we apply the sum function.

Golfarelli's schemas offer the possibility of depicting optionality of attributes or dimensional relationships as well, which can also be easily shown in YAM by means of standard UML mechanisms (i.e. cardinalities).




B Warehouse

Another interesting example in Kim is this one about warehouse inventories. It is originally used to exemplify semi-additivity of Measures (i.e. those that cannot be summarized by means of the sum function along some dimension). The stocks are not additive along the temporal dimension, because they represent snapshots of a level.

Three different schemas are explored. The first one measures the inventory levels every day and places them in separate records. The second schema contains one record for each delivery to the warehouse, which registers the disposition of all the items until they have left the warehouse. The third and last data schema records every change of the status of delivered products.

B Original schema

Snapshot Fact: time_key, product_key, warehouse_key, quantity_on_hand, quantity_shipped, value_at_cost, value_at_LSP

Dimension tables (attributes not shown): Time Dimension, Warehouse Dimension, Product Dimension

Figure B Schema of the warehouse snapshot case study Kim

Figure B only shows three dimension tables, namely Warehouse, Time, and Product. The most interesting Measure is quantity_on_hand. It records the stock of a given product in the warehouse. The other Measures allow obtaining more elaborate derived Measures, like number of turns, days' supply, or gross margin return on inventory, which are not reflected in the schema.

Delivery Status Fact: time_key, product_key, warehouse_key, vendor_key, PO_number, PO_line_number, first_received_date, last_received_date, first_inspect_date, first_auth_to_sale_date, first_shipment_date, last_shipment_date, last_return_date, qty_received, qty_inspected, qty_returned_to_vend, qty_placed_in_inv, qty_auth_to_sell, qty_picked, qty_boxed, qty_shipped, qty_returned_by_cust, qty_returned_to_inv, qty_damaged, qty_lost, qty_written_off, unit_cost, orig_selling_price, last_selling_price, avg_selling_price

Dimension tables (attributes not shown): Time Dimension, Warehouse Dimension, Product Dimension, Vendor Dimension

Figure B Schema of the warehouse delivery status case study Kim

The second possibility is reflected in figure B. It assumes we are able to distinguish the different items, so that we know which was supplied by each vendor. Thus, every time we obtain a new shipment, it is recorded and tracked until we sell it. Dates and quantities are registered for each step.

Transaction Fact: time_key, product_key, warehouse_key, transaction_key, PO_number, amount

Dimension tables (attributes not shown): Time Dimension, Warehouse Dimension, Product Dimension, Transaction Dimension

Figure B Schema of the warehouse transaction case study Kim

The last inventory schema is drawn in figure B. It records every transaction in the warehouse. Four analysis dimensions are proposed, i.e. Warehouse, Time, Product, and Transaction. Transaction Dimension has one instance for every kind of transaction. The only Measure in the fact table is amount.



B YAM schema

All three schemas regard the same subject: warehouse inventory. Therefore, the analysts will probably want to see them together, or navigate from one to another. With YAM this is easy, because they belong to the same Fact, hence the same Star.

[Figure: Inventory Star with Fact Inventory and Dimensions Warehouse, Product, Vendor, TransactionKind, Time and Order; the Associations "Placed with", "For" and "On" relate Order to Vendor, Warehouse and Time]

Figure B Upper level schema of the warehouse case study modeled with YAM

At Upper detail level, in figure B, we have the only Star. It contains Inventory Fact and six Dimensions, namely Warehouse, Vendor, Order, Time, TransactionKind, and Product. Moreover, we observe that an order is placed with a vendor, for a given warehouse, at a given moment.

In figure B we have the same schema at Intermediate detail level. To avoid complicating the figure unnecessarily, aggregation hierarchies and Associations between Levels have not been depicted. Thus, we can appreciate that there are three Cells of interest, corresponding to the three different schemas in Kim. The most detailed one is Transaction, which can be analyzed based on Levels Minute, TransactionKind, Warehouse, Order, and Product, which identify it (it is assumed that at any time we can distinguish the order by means of which the different items were obtained). Vendor, Warehouse, and Day fully determine Order, so that they can substitute it in the Base.

[Figure: Inventory Star at Intermediate detail level. Levels: Warehouse, Product, Day, Minute, Vendor, Order and TransactionKind. Cells and their Bases: Snapshot {Product, Day, Warehouse}; DeliveryStatus {Product, Vendor, Day, Warehouse} or {Product, Order}; Transaction {Product, TransactionKind, Minute, Order} or {Product, TransactionKind, Minute, Day, Vendor, Warehouse}]

Figure B Intermediate level schema of the warehouse case study modeled with YAM

If we aggregate appropriately (adding or subtracting, depending on the kind of transaction) instances of that Cell along Time and Warehouse Dimensions, we obtain snapshots of inventory at the desired granularity (Day and Warehouse in this case). Therefore, if Snapshot did not contain Measures regarding costs, it could be considered a derived Cell. It can be studied along Warehouse, Product, and Time Dimensions.

Finally, if we aggregated Transaction instances based on the order, we would obtain instances of DeliveryStatus Cell. Its instances are identified by Order and Product, or by Product, Vendor, Day, and Warehouse. In this case, this Cell does contain some Measures that cannot be obtained from those in the Atomic Cell. Therefore, it cannot be considered as derived. Storing its instances is not optional, but mandatory.
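How snapshots could be computed from transactions by adding or subtracting amounts can be sketched as follows. This is a minimal sketch under assumptions: the transaction kinds `receive` and `ship`, the record layout, the integer days and the function name `snapshots` are hypothetical; only the idea of a signed sum depending on the kind of transaction comes from the text.

```python
from collections import defaultdict

# Hypothetical transaction kinds: True if the kind puts items into the warehouse.
PUTS = {"receive": True, "ship": False}


def snapshots(transactions):
    """Return end-of-day stock per (product, warehouse, day) from signed amounts."""
    # Net movement per (product, warehouse, day).
    delta = defaultdict(int)
    for t in transactions:
        sign = 1 if PUTS[t["kind"]] else -1
        delta[(t["product"], t["warehouse"], t["day"])] += sign * t["amount"]
    # A running total over the sorted days gives the stock at the end of each day.
    running = defaultdict(int)
    stock = {}
    for product, warehouse, day in sorted(delta):
        running[(product, warehouse)] += delta[(product, warehouse, day)]
        stock[(product, warehouse, day)] = running[(product, warehouse)]
    return stock


# Hypothetical sample data.
transactions = [
    {"product": "p", "warehouse": "w", "day": 1, "kind": "receive", "amount": 10},
    {"product": "p", "warehouse": "w", "day": 1, "kind": "ship", "amount": 3},
    {"product": "p", "warehouse": "w", "day": 2, "kind": "ship", "amount": 2},
]
result = snapshots(transactions)
```

Note that this sketch only produces a snapshot for days on which some transaction took place, and it cannot recover cost Measures, which is why Snapshot is not fully derivable in the schema discussed here.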

At Lower detail level we can see the attributes of every Class. Derivation formulas of the different Attributes can be made explicit, as well as aggregability of the different KindOfMeasures. We can see that stocks can be added along Product and Warehouse Dimensions. However, to obtain stocks at coarser granularities along Time, avg should be performed. Another possibility is to obtain the stock of any unit of time as the stock at the upper bound of the period. This is used in the derivation of some Measures.
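The two ways of aggregating stocks along Time can be contrasted in a short Python sketch. The sample data and the function names `total_per_day` and `over_period` are assumptions made for illustration; the rules themselves (sum along Product and Warehouse, then avg or the last value along Time) are those stated above.

```python
from statistics import mean

# Hypothetical daily snapshots: (product, warehouse, day) -> quantity on hand.
stock = {
    ("p1", "w1", 1): 10, ("p1", "w1", 2): 20,
    ("p2", "w1", 1): 5, ("p2", "w1", 2): 7,
}


def total_per_day(stock):
    """Sum along Product and Warehouse, keeping Day."""
    totals = {}
    for (_product, _warehouse, day), quantity in stock.items():
        totals[day] = totals.get(day, 0) + quantity
    return totals


def over_period(per_day, how):
    """Aggregate along Time with avg, or take the value at the end of the period."""
    if how == "avg":
        return mean(per_day.values())
    if how == "last":
        return per_day[max(per_day)]
    raise ValueError("stock cannot be aggregated along Time with " + how)


per_day = total_per_day(stock)
average_stock = over_period(per_day, "avg")
final_stock = over_period(per_day, "last")
```

Rejecting any other function along Time is one possible way to enforce the semi-additivity of stocks.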

DeliveryStatus (Cell): first_received_date: Date; last_received_date: Date; first_inspect_date: Date; first_auth_to_sale_date: Date; first_shipment_date: Date; last_shipment_date: Date; last_return_date: Date; qty_received: Quantity; qty_inspected: Quantity; qty_returned_to_vend: Quantity; qty_placed_in_inv: Quantity; qty_auth_to_sell: Quantity; qty_picked: Quantity; qty_boxed: Quantity; qty_shipped: Quantity; qty_returned_by_cust: Quantity; qty_returned_to_inv: Quantity; qty_damaged: Quantity; qty_lost: Quantity; qty_written_off: Quantity; unit_cost: Quantity; orig_selling_price: Price; last_selling_price: Price; avg_selling_price: Price

Snapshot (Cell): value_at_cost: Cost; value_at_LSP: LSP; /quantity_on_hand: Stock; /final_quantity_on_hand: LastStock; /quantity_shipped: Quantity; /avg_quantity_shipped: AvgQuantity; /#OfTurns: Turns {quantity_shipped/quantity_on_hand}; /daysSupply: Supply {final_quantity_on_hand/avg_quantity_shipped}

Transaction (Cell): amount: Amount

Stock (kind of measure): 1: {Product, Warehouse −> sum(Stock)}; 2: {Time −> avg(Stock)}

LastStock (kind of measure): 1: {Product, Warehouse −> sum(Stock)}; 2: {Time −> last(Stock)}

Amount (kind of measure): 1: {Product, Warehouse, Time, Order, Vendor −> sum(Stock)}; 2: {TransactionKind −> if put(t) then add(Amount) else subtract(Amount)}

Legend: Cell (C), Level (L), Derivation /attName {supliers}

Figure B Lower level schema of the warehouse case study modeled with YAM

B Discussion

In this case study we can appreciate the advantages of a semantically rich multidimensional model. If we have Kimball's schemas, we could say that they share some dimension tables, so that we can drill across. However, by means of YAM we can place all data in the same schema and represent the different relationships we find among them. Transaction and Snapshot not only share Dimensions; Snapshot is the aggregation of Transaction, and can be computed from it.

Moreover, the specific AVG TIME SUM operation proposed by Kimball to aggregate stocks is not necessary any more. The problem can be solved in a more general way, showing how data is aggregated along each Dimension. Thus, the aggregation operation will aggregate any kind of data based on its specification.

PO_number is referred to in Kim as a degenerate dimension (i.e. a dimension key without a corresponding dimension table), because it does not contain any attribute. In this case, with YAM it is just another Class (i.e. Order) associated to the Fact, acting as Dimension. It could have Attributes or not, and give rise to a Relational table in a ROLAP system or not. We are still at conceptual level.

Finally, we could imagine yet another possibility besides those three schemas in figures B, B, and B. We could specialize Transaction depending on the kind of transaction. This would give rise to a different Cell for every kind of transaction. These Cells would generate new Stars, and could contain specific Measures or use other Dimensions in the analysis. Moreover, once we have defined the more general schema, we could define different Stars offering the appropriate views (i.e. the same three independent schemas we found in Kim).




B Tickets in supermarkets

Tru proposes as case study a modification of the grocery chain proposed in Kim. The main difference is that the study focuses on tickets and ticket lines, which adds some difficulty to the schema.

The Client analysis dimension substitutes Promotion, specialization hierarchies of products are considered, and derivation and summarization information is also modeled into the schema. Moreover, some many-to-many relationships appear in this case, like that between products and tickets (the same product can be found in several ticket lines), or between sales districts and communities.

B Original schema

[Figure: GOLD class diagram. Facts class Products_Sales, with {OID} ticket_number, {OID} row_number, quantity, price, /quantity_sold, /prod_price, /total_price, /stock and /#clients. Dimension classes with {dag} hierarchies: Time (Month, Trimester, Year, Season), Product (TradeMark), Store (City, SalesDistrict, Province, Community) and Client. Product is specialized by Group into Cleaning and Food, by Family into Eat and Drink, and by Kind into Frozen, Fresh, Soft and Alcoholic. Derivation rules: {quantity_sold = sum(quantity)}, {prod_price = quantity*price}, {total_price = sum(prod_price)}, {#clients = count()}. Constraints: {#clients is not summarizable along Product Dimension}, {stock is {AVG,MIN,MAX} along Time Dimension}, {Completeness}]

Figure B Schema of the tickets case study modeled with GOLD

Figure B shows the case study expressed in GOLD (for the sake of simplicity, the attributes of every class have not been depicted in the figure). Of special interest in this schema are the specialization hierarchy of Product and its many-to-many relationship with the Product_Sales facts class. It is also important to notice that Client and Store share part of their aggregation hierarchies. Moreover, several constraints and derivation formulas are depicted around the facts class.

Besides the information in the data schema, user requirements are also represented in the GOLD model. For instance, in figure B we can see four of such requirements which, as shown in TPGS, could be directly translated to standard Object Query Language (OQL) syntax.

CC_1: Measures: SUM(quantity). Slice: Time.Year = "1999". Dice: Store.Community. OLAP operations: (none shown)

CC_2: Measures: SUM(quantity). Slice: Store.community = "Comunidad Valenciana", Product.Group = "Food". Dice: Store.Province, Store.City, Product.Family, Product.Type. OLAP operations: (none shown)

CC_3: Measures: SUM(quantity). Slice: Time.Year = "1999". Dice: Store.Community, Client. OLAP operations: (none shown)

CC_4: Measures: SUM(quantity). Slice: (none). Dice: Store.Community. OLAP operations: (none shown)

Figure B User requirements for the case study modeled with GOLD graphical notation

CC_1: Quantity sold per product during the year 1999, grouped by the community where they were sold.

CC_2: Quantity of food sold in the Comunidad Valenciana, aggregated by family and kind of product, and by the province and city where the store is placed.

CC_3: Quantity of product sold during 1999, grouped by clients and community.

CC_4: Quantity of product sold, grouped by community.



B YAM schema

[Figure: Tickets Star with Fact TicketLine and Dimensions Time, Client, Store, Ticket, Line and Product; Client and Store are defined using a Geographic Dimension; Product is specialized by Group into Cleaning and Food, by Family into Eat and Drink, and by Kind into Frozen, Fresh, Soft and Alcoholic]

Figure B Upper level schema of the tickets case study modeled with YAM



In figure B we can see information about this case study modeled with YAM at Upper detail level. Three points are of interest here. Firstly, we can see that geographic information has been used in defining both Client and Store Dimensions. Moreover, the specialization of Product is also shown here. Finally, we can see that two Dimensions have been added. If information is stored per line per ticket, it is supposed that these will be analysis dimensions of our data. If not, why is the information stored in the Fact at that granularity? Thus, both Dimensions are shown at this detail level.

[Figure: Tickets Star at Intermediate detail level, with cardinalities on every aggregation. Cells and their Bases: TicketLine {Time, Store, Ticket, Line}; ProductPerTicket {Time, Store, Ticket, Product}; Ticket {Time, Store, Ticket}; derived Cells /SalesPerCommunity {StoresByCommunity} and /SalesPerCommunityPerYear {StoresByCommunity, Year}. Levels shown: Client, ClientsByCity, ClientsByProvince, ClientsByCommunity; Store, StoresByCity, StoresByProvince, StoresByCommunity, SalesDistrict; City, Province, Community; Time, Month, Trimester, Year, Season; Product, TradeMark; Ticket; Line; and an All Level per Dimension]

Figure B Intermediate level schema of the tickets case study modeled with YAM

At Intermediate detail level, in figure B, things get much more complicated. Firstly, we can see that the aggregation hierarchy of every Dimension is depicted with the corresponding cardinalities. Notice that in this case Client and Store do not share their hierarchies. In spite of that, Levels in both Dimensions are related to the same classes, which would belong to a Geographic Dimension. Instances of Client Dimension represent clients or sets of clients. Instances of Store represent stores or sets of stores. Instances of Geographic Dimension represent geographic areas. Therefore, Levels in either Client or Store can be associated to the same geographic areas, but they cannot be geographic areas.

Another difference with figure B is that the cardinality of the Association between Product and TicketLine is one-to-many. The cardinality between Ticket and Product is many-to-many. Notice that this information is represented in YAM with different Cells related by Composition. It shows that several TicketLines compose a ProductPerTicket, and several ProductPerTickets compose a Ticket. The Associations between these secondary Cells and the corresponding aggregation levels have not been depicted, to keep the schema as clear as possible.

Finally, user requirements can be outlined here by depicting the desired SummarizedCells. In this case, SalesPerCommunityPerYear and SalesPerCommunity are shown to exemplify it. Besides showing these derived Classes in the schema, it is also possible to represent the user requirements in more detail by means of YAM query algebra.

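CC_1: From the class SalesPerCommunityPerYear we select year 1999 and project only the quantity attribute. Then we change the base to see it in a unidimensional space of communities.

$$\mathrm{ChangeBase}_{\mathit{StoresByCommunity}}\big(\mathrm{Projection}_{\mathit{quantity}}\big(\mathrm{Selection}_{\mathit{Year} = 1999}(\mathit{SalesPerCommunityPerYear})\big)\big)$$

CC_2: For each kind of product we would have its own Star. Thus, from the corresponding Cell we select the tickets sold in Comunidad Valenciana. Afterwards, we roll them up to StoresByCity and All Level in the corresponding subclass of Product. Then the desired attribute is chosen, and finally data are placed in a dimensional space defined by cities and kinds of products.

$$
\begin{aligned}
\mathrm{ChangeBase}_{\mathit{StoresByCity} \times \{\mathit{Frozen}, \mathit{Fresh}, \mathit{Soft}, \mathit{Alcoholic}\}}\big(\mathrm{Projection}_{\mathit{quantity}}\big(
& \mathrm{RollUp}_{\mathit{StoresByCity}}(\mathrm{Selection}_{\mathit{StoresByCommunity} = \mathit{ComunidadValenciana}}(\mathit{FrozenPerTicket})) \\
\cup\; & \mathrm{RollUp}_{\mathit{StoresByCity}}(\mathrm{Selection}_{\mathit{StoresByCommunity} = \mathit{ComunidadValenciana}}(\mathit{FreshPerTicket})) \\
\cup\; & \mathrm{RollUp}_{\mathit{StoresByCity}}(\mathrm{Selection}_{\mathit{StoresByCommunity} = \mathit{ComunidadValenciana}}(\mathit{SoftPerTicket})) \\
\cup\; & \mathrm{RollUp}_{\mathit{StoresByCity}}(\mathrm{Selection}_{\mathit{StoresByCommunity} = \mathit{ComunidadValenciana}}(\mathit{AlcoholicPerTicket}))\big)\big)
\end{aligned}
$$

CC_3: Selected tickets of 1999 are rolled up to Clients and StoresByCommunity. Then the quantity Measure is projected and placed in a dimensional space defined by Client and StoresByCommunity.

$$\mathrm{ChangeBase}_{\mathit{Client} \times \mathit{StoresByCommunity}}\big(\mathrm{Projection}_{\mathit{quantity}}\big(\mathrm{RollUp}_{\mathit{Clients}, \mathit{StoresByCommunity}}\big(\mathrm{Selection}_{\mathit{Year} = 1999}(\mathit{Ticket})\big)\big)\big)$$

CC_4: Instances of SalesPerCommunity Cell are placed in a unidimensional space defined by StoresByCommunity.

$$\mathrm{ChangeBase}_{\mathit{StoresByCommunity}}\big(\mathrm{Projection}_{\mathit{quantity}}(\mathit{SalesPerCommunity})\big)$$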
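The first requirement (select a year, project one attribute, change the base) can be mimicked with plain Python dictionaries. This is only a sketch of the idea, not the definition of the YAM operations: the functions `select`, `project` and `change_base`, the representation of a Cell as a dictionary keyed by its Base, and the sample figures are all assumptions.

```python
# Hypothetical instances of SalesPerCommunityPerYear: (community, year) -> measures.
sales_per_community_per_year = {
    ("Comunidad Valenciana", 1999): {"quantity": 120, "total_price": 300.0},
    ("Comunidad Valenciana", 1998): {"quantity": 90, "total_price": 210.0},
    ("Catalunya", 1999): {"quantity": 200, "total_price": 480.0},
}


def select(cell, predicate):
    """Keep the instances whose Base coordinates satisfy the predicate."""
    return {base: m for base, m in cell.items() if predicate(base)}


def project(cell, attribute):
    """Keep a single measure of every instance."""
    return {base: {attribute: m[attribute]} for base, m in cell.items()}


def change_base(cell, positions):
    """Re-identify instances by a subset of the old coordinates.

    Fails if the kept coordinates no longer identify every instance.
    """
    result = {}
    for base, m in cell.items():
        new_base = tuple(base[i] for i in positions)
        if new_base in result:
            raise ValueError("new Base does not identify the instances")
        result[new_base] = m
    return result


cc1 = change_base(
    project(select(sales_per_community_per_year, lambda b: b[1] == 1999), "quantity"),
    positions=(0,),
)
```

The check inside `change_base` reflects why the selection on the year has to come first: without it, a community alone would not identify an instance.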

Time (Level): date, day, dayNumber, totalNumber, holidayFlag, event

TicketLine (Cell): quantity: Quantity; price: Price

ProductSale (Cell): /quantity_sold: Quantity {sum(TicketLine.quantity)}; /prod_price: Price {sum(TicketLine.quantity*TicketLine.price)}; /stock: Stock

Ticket (Cell): /total_price: Price {sum(ProductSale.prod_price)}; #clients: Counter

Stock (kind of measure): 1: {Line −> min(Stock)}; 2: {Ticket, Time −> avg(Stock)}; 3: {Store −> sum(Stock)}

Counter (kind of measure): {Time, Store, Client −> sum(Counter)}

Legend: Cell (C), Level (L), Derivation /attName {supliers}

Figure B Lower level schema of the tickets case study modeled with YAM

Finally, figure B shows the more detailed elements in the schema. Time Level shows its Descriptors, and we see that a Counter can only be added along Time, Store, and Client. Moreover, it is shown that Stock measurements must be summarized by means of min along Line, avg along Ticket and Time, and sum along Store, in that order. The Measures in the different Cells are also shown in the figure, besides the corresponding formula for the derived ones.
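The ordered summarization of Stock (min along Line, then avg along Ticket and Time, then sum along Store) can be expressed as a generic, specification-driven aggregation. The function `aggregate`, the list-based specification and the sample measurements are assumptions made for this sketch; only the order of the three steps comes from the schema.

```python
from statistics import mean

# Ordered specification for Stock, following the Lower level schema:
# first min along Line, then avg along Ticket and Time, finally sum along Store.
STOCK_SPEC = [({"Line"}, min), ({"Ticket", "Time"}, mean), ({"Store"}, sum)]


def aggregate(rows, dims, spec):
    """Collapse dimensions step by step, in the order given by spec.

    rows maps coordinate tuples (one value per name in dims) to a measure.
    """
    for collapsed, func in spec:
        groups = {}
        for coords, value in rows.items():
            # Drop the coordinates of the dimensions collapsed in this step.
            key = tuple(c for d, c in zip(dims, coords) if d not in collapsed)
            groups.setdefault(key, []).append(value)
        rows = {key: func(values) for key, values in groups.items()}
        dims = [d for d in dims if d not in collapsed]
    return rows


# Hypothetical stock measurements identified by (Line, Ticket, Time, Store).
DIMS = ["Line", "Ticket", "Time", "Store"]
measurements = {
    (1, "t1", "d1", "s1"): 8, (2, "t1", "d1", "s1"): 6,
    (1, "t2", "d1", "s1"): 4,
    (1, "t3", "d1", "s2"): 10,
}
total = aggregate(measurements, DIMS, STOCK_SPEC)
```

Changing the order of the steps in `STOCK_SPEC` would in general give a different result, which is why the schema numbers them.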




B Discussion



Both GOLD and YAM use UML notation. Thus, it is quite simple to appreciate some differences. Firstly, we can see that relationships between Levels are simply considered Associations in GOLD, and Compositions in YAM. This means that instances of Levels in YAM show sets of elements, while in GOLD they represent elements that identify grouping characteristics.

From that difference in the conception of the structure of Dimensions comes the difference in the sharing of hierarchies. If elements in the Dimensions represent sets of instances, it is not possible that two different Dimensions share a Level, because they represent different concepts, even if the grouping characteristic is the same. Therefore, with YAM it must be represented as a derivation from a common Dimension used in the definition of both hierarchies. However, since GOLD classes represent grouping characteristics, they can be freely shared between Dimensions.

Another point regarding aggregation hierarchies is that of specialization hierarchies. GOLD understands specializations as aggregation paths, so that aggregation is also allowed by subclasses. In YAM, aggregation is strictly represented by aggregation hierarchies. Nevertheless, to facilitate it, a Level could be defined so that its instances correspond to the subclasses. This Level can be used in the definition of the sub-Dimensions.



There are two concepts explicit in GOLD and implicit in YAM. Firstly, aggregation hierarchies being a DAG comes from the definition of Dimension and mereological axioms. Moreover, completeness is also assumed in YAM if Level All exists in the hierarchy. It also follows from mereological axioms that aggregation hierarchies can always be defined so that they are complete.



OIDs are not made explicit in YAM. They are considered as meaningless identifiers. The fact that several attributes identify instances is only of interest for Cells, and it is shown in the form of several Levels forming a Base.



On the other hand, in YAM relationships between Levels and Cells are Associations, while they are Aggregations in GOLD. Thus, GOLD considers that a fact class is composed of its analysis dimensions. However, in YAM Dimensions are simply used for the identification of Cell instances. Instances of Levels are not necessarily part of the cells.



An important advantage of YAM comes from the possibility of defining several Cells inside the same Star. It allows normalizing the Facts, so that attributes are fully functionally determined by the analysis dimensions. For instance, we can see in figure B that row_number would not fully functionally determine total_price.

Finally, another important difference between both models is how aggregability is understood. While GOLD shows possible ways of aggregating Measures, YAM shows how a KindOfMeasure must be aggregated along the Dimensions to obtain exactly the same KindOfMeasure at coarser granularity.

B Clinical Data Warehousing

A clinical case study is used in Ped to motivate and exemplify his work. In this context, clinical data about patients is used to address quality management and medical research issues. More specifically, a diabetes treatment domain is modeled.

Figure B Patient diagnosis case study Ped

As shown in figure B, the system registers for each patient the blood sugar level (shown by HbA1c%), diagnosis, and place of residence. The aim is to study the variation of blood sugar levels among diagnoses, and the frequency of diagnoses per area. Age is a derived attribute (indicated by means of parentheses), and Precision shows how precise the value of HbA1c% is. It admits three different values, i.e. precise, imprecise, and inapplicable.

Diagnosis represents a condition that a physician identifies in a patient. Every patient could have one or more diagnoses, and the time interval of validity of the diagnosis is also stored. The Type of a diagnosis indicates whether it is considered primary or secondary. There is only one primary diagnosis per patient.

The different kinds of diagnosis (i.e. Low-level, Family, and Group) show the different precision in diagnosing. The most precise is a Low-level diagnosis, and the least precise is the Group diagnosis. The diagnosis hierarchy is non-strict (i.e. an element can be member of several collections at higher levels). The hierarchy evolves over time (new diseases are added and old ones are reclassified), and is not onto (i.e. some families are not divided into Low-level diagnoses). Moreover, regarding the addresses hierarchy, not every Address is located in a City; it is non-covering.

A typical query in this domain is the average HbA1c% grouped by Low-level diagnosis.




B Original schema

Figure B Schema of the clinical case study Ped

In Ped everything that characterizes the fact type is considered to be dimensional. Therefore, if patient is considered the fact, we obtain a schema like that in figure B. Notice that even Measures (i.e. HbA1c%) are considered dimensional.



B YAM schema

[Figure: Patient Star with Fact Patient and Dimensions /DateOfDiagnosis and /DateOfBirth (both derived from Date), /Age, Diagnosis, Name, Residence and /HbA1c%; Associations BornOn and LivesAt; Flow relationships from OldDiagnosis to Diagnosis and from OldPatient to Patient; Patient is specialized into Applicable and Inapplicable, and Applicable into Precise and Imprecise, each in its own Star]

Figure B Upper level schema of the clinical case study modeled with YAM



Figure B shows the same schema as figure B, modeled with YAM at Upper detail level. Here we can see the Facts and Dimensions of interest in the domain. Notice that Patient has been specialized to show whether the long-term blood sugar level (i.e. HbA1c%) has been measured or not and, if measured, whether a precise or imprecise method was used. Possible changes in the diagnosis hierarchy have been represented by a Flow relationship between the old and new versions of the Dimension and Fact. Associations between Name and DateOfBirth and Residence are also shown.

Regarding derived information, both temporal Dimensions come from a more general temporal Dimension. Moreover, HbA1c% Dimension derives from the measurements of sugar level, if done. As in the original schema, Age is also derived from the DateOfBirth Dimension.

[Figure: Patient Star at Intermediate detail level. Cell AtomicPatient has Base {DiagnosisDay, Name}; Cell Diagnosis has Base {LLDiagnosis}; Star HbA1c%App contains Cell AtomicApplicable and Level /HbA1c%. Levels shown: /DiagnosisDay and /DayOfBirth, each with /Week, /Month, /Quarter, /Year, /Decade and /All; /Age, FiveYearGroup, TenYearGroup; LLDiagnosis, DiagnosisFamily, DiagnosisGroup; Name; Address, Area, County; Associations BornOn and LivesAt]

Figure B Intermediate level schema of the clinical case study modeled with YAM

In figure B the same schema at Intermediate detail level is shown. The same information at Upper level is now depicted with regard to Levels and Cells. Moreover, the Base of the Cells is also stated. AtomicPatient is identified by DiagnosisDay and a Patient, while Diagnosis is identified by LLDiagnosis. Diagnosis Dimension is defined as non-strict.

For the sake of simplicity, some information has not been reflected at this level. For instance, the figure should include the common, more general Dimension for DiagnosisDay and DayOfBirth, as well as the temporal evolution of Diagnosis. Moreover, the schema has been centered on Patient Star. Thus, HbA1c%App and other Stars have been neglected. For example, how Cells in HbA1c%App are related to Levels in the different Dimensions has not been depicted. Diagnosis Cell has been made explicit in the diagram to outline the importance of measuring the average HbA1c% at that Level.

[Figure: Lower detail level of the Patient Star. Levels DiagnosisGroup, DiagnosisFamily and Age, the latter with /age {DateOfDiagnosis.date−DateOfBirth.date}. Cells AtomicApplicable (hbA1c%: HbA1c%), AtomicPatient (counter: HbA1c%Counter) and Diagnosis (hbA1c%Avegare: HbA1c%Average). Kinds of measure: HbA1c%Counter {Date, Age, Diagnosis, Residence, Name, HbA1c%−>sum(HbA1c%Counter)}; HbA1c%, NonTransitive, {Date, Age, Diagnosis, Residence, Name−>Avg(HbA1c%)}; HbA1c%Average, NonTransitive, {Diagnosis−>Avg(HbA1c%)}; "Invalid source for" constraints are also annotated. Legend: Cell (C), Level (L), Derivation /attName {supliers}]

Figure B Lower level schema of the clinical case study modeled with YAM

Finally, figure B depicts the most detailed schema. Basically, it shows the Attributes of the most representative Classes. We can see the Attributes of AtomicPatient (how age is derived), AtomicApplicable, and Diagnosis, and the way to summarize the different kinds of measures. Non-transitivity of summarizations has also been made explicit.

B Discussion

The first thing that attracts attention is that figure B shows six analysis dimensions, while figure B shows seven. This is due to the temporal mechanisms offered by Pedersen's model. These mechanisms allow having the temporal dimension implicit in the schema. Nevertheless, YAM does not offer such mechanisms, so that Date Dimension needs to be made explicit. If we consider that the temporal dimension is always present in analysis tasks, and it is well known, we can omit it. However, if we want to reflect the importance of that analysis dimension, and be able to define specific elements for it in every schema or view, it is much better to make it explicit. Evolution of Diagnosis hierarchy can also be made explicit in a YAM schema.

Another important point is that Residence Dimension becomes linear when modeled with YAM. This is because YAM does not allow non-covering hierarchies. Having instances skipping aggregation levels is not necessary. If some addresses do not belong to any city, all we need to do is define rural areas together with the urban ones, so that they cover the set of addresses.
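The idea of turning a non-covering hierarchy into a covering one can be sketched in Python. The sample addresses, the naming scheme for rural areas and the function `make_covering` are assumptions for illustration; the Level names Address, Area and County follow the Intermediate level schema.

```python
# Hypothetical addresses: some are located in a city, others only in a county.
addresses = [
    {"address": "a1", "city": "Barcelona", "county": "Barcelones"},
    {"address": "a2", "city": None, "county": "Bages"},
    {"address": "a3", "city": "Manresa", "county": "Bages"},
]


def make_covering(addresses):
    """Assign every address without a city to a rural area of its county."""
    covered = []
    for a in addresses:
        # Urban areas keep the city name; the rest get a synthetic rural area.
        area = a["city"] if a["city"] is not None else "Rural " + a["county"]
        covered.append({"address": a["address"], "area": area,
                        "county": a["county"]})
    return covered


# Every address now rolls up Address -> Area -> County without skipping a level.
covered = make_covering(addresses)
```

After this step every Address has exactly one Area, so the hierarchy is linear and no instance skips a Level.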

Regarding HbA1c% Dimension and the Precision Attribute, figure B does not reflect reality. Grouping instances in different steps depending on the precision of measurements is not enough. Analysts should be able to study precise and imprecise measurements separately or not, at will, and be able to know whether a given measurement is precise or not. Moreover, by defining subclasses of Patient we show how an attribute is present in some instances and absent from others.



Derivation is a first-class concept when modeling with YAM. While figure B does not even show that age comes from a derived Attribute, with YAM we can make explicit how it is obtained from other Attributes. Moreover, two Dimensions deriving from a common one can also be shown, if appropriate.

Ped distinguishes just three types of aggregate functions. One of those types is associated
with every Level in the Dimensions. However, this information is not included in the schema.
YAM allows showing the specific properties of every aggregation, for instance transitivity,
aggregability along a given analysis dimension, or the proper source level for the
aggregation.
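Why transitivity is worth recording per aggregation can be seen in a small Python sketch. The daily values below are invented for illustration and do not come from the case study; the sketch only shows that a sum can be re-aggregated from intermediate results while an average cannot.

```python
from statistics import mean

# Hypothetical measurements grouped by day (two on the first day, one on the second).
daily = {"day 1": [10.0, 20.0], "day 2": [30.0]}

# Sum is transitive: summing the daily sums gives the overall sum.
sum_of_sums = sum(sum(values) for values in daily.values())
total = sum(x for values in daily.values() for x in values)

# Average is not transitive: averaging the daily averages weights each day
# equally, regardless of how many measurements it contains.
avg_per_day = {day: mean(values) for day, values in daily.items()}
avg_of_avgs = mean(list(avg_per_day.values()))
true_avg = mean([x for values in daily.values() for x in values])
```

Here the average of averages differs from the average over the atomic data, so the proper source level for an average is the most detailed one.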



Another interesting point of YAM is that it provides mechanisms to reflect the importance
of measures at different aggregation levels. For instance, in this case figure B does not
reflect the importance of the average of HbAC at LLDiagnosis. Figure B, in contrast, shows
that there exists a Cell identified by LLDiagnosis, and figure B zooms into that Cell to show
that it contains an HbACAverage Attribute. More details about this and other aggregations are
also reflected at Lower detail level.

Finally, whether a diagnosis is primary or secondary is shown neither in figure B nor in
the YAM schemas. However, it can easily be modeled by means of Generalization/Specialization.



Design examples with YAM

B Vehicle repairs

Another interesting case study is that presented in BSHD and SBHD. It is a real-world
project with an industrial partner, in which a car manufacturer wants to analyze the repairs
of its vehicles to improve its products, define new warranty policies, and assess the quality
of the garages. Thus, the manufacturer is interested in analyzing vehicle repairs based on the
specific vehicle, the garage where it is repaired, the day of the repair, and the customer.
Several measures are of interest, namely wages, part costs, total costs, duration of the
repair, and number of persons involved.

B Original schema

[Figure: MER diagram with the facts vehicle repair (costs (part), costs (wages), costs (total),
# of persons, duration) and vehicle sales (price); aggregation levels day, month, year;
vehicle, vehicle model, brand; garage, type of garage; customer (age, income); geogr. region,
country; and the specialization (Isa) of vehicle (Length, Width, Height, Color, Horse power)
into Car (#_Seats, Gear_type) and Truck (Loading_area, Loading_capacity).]

Figure B: Schema of the repairs case study (SBHD)

Figure B presents the data schema as in SBHD. It is expressed in Multidimensional
ER (MER), an extension of the ER model. We can see that several Facts (a specialization
of Relationship) are allowed in the same schema, i.e. vehicle repair and vehicle sales in
this case. Moreover, aggregation hierarchies are made explicit, and they can be shared by
different Dimensions, like customer and garage. Attributes that describe instances of a
Dimension (a specialization of Entity) but do not define aggregation hierarchies are also
allowed. Specialization of concepts, like vehicle into Car and Truck, is also exemplified and
justified by the existence of specific attributes.

Specific queries of interest to be posed on this schema are:

Give me the average total repair costs of a vehicle per month for garages in Bavaria, by
type of garage, during the year

Give me the five vehicle types that had the highest average part cost per repair in the
year



B YAM schema

[Figure: two Stars, VehicleSales and VehicleRepairs, with the Facts VehicleSale and
VehicleRepair and the Dimensions Time, /MonthTime, Vehicle (generalizing Car and Truck),
Garage, Customer and Geographic. Legend: Dimension (D), Fact (F), Derived element (/),
Generalization, Association, Aggregation, Derivation, Flow.]

Figure B: Upper level schema of the repairs case study modeled with YAM



Figure B shows the same schema at Upper detail level with YAM. Here the two different
subjects of analysis are separated into two Stars, i.e. VehicleSales and VehicleRepairs.
The fact that both use the Time Dimension is shown by means of a Derivation relationship
(from the more general we derive the more specific). Moreover, at this level we can also
observe that geographic information was used in some way in the definition of the Garage and
Customer Dimensions. Finally, the specialization of Vehicle is also shown.

[Figure: the VehicleRepairs Star at Intermediate level. Cell VehicleRepair with Base
{Day, Garage, Vehicle}. Levels: Day, Month, Year, All; Garage, TypeOfGarage, GarageByRegion,
GarageByCountry, All; Customer, CustomerByRegion, CustomerByCountry, All; Vehicle, Model,
Brand, All (also for Car and Truck); Region, Country. Legend: Level (L), Cell (C), Derived
element (/), Generalization, Association, Aggregation, Derivation, Flow.]

Figure B: Intermediate level schema of the repairs case study modeled with YAM

At Intermediate level, as shown in figure B, we can see the details of Dimensions
and Facts. Firstly, it is shown that Levels of specialized Dimensions (i.e. Vehicle) are also
specialized into Levels of the corresponding Dimensions (i.e. Car and Truck). Moreover,
geographic Levels in Customer and Garage are associated with Levels that should belong to a
Geographic Dimension. Finally, it is also important to point out that the Base of the only
Cell is composed of just three Levels, i.e. Day, Garage, and Vehicle. Customer could be
used in analysis tasks, but it is determined by the other three Levels. Therefore, it would be
a waste of space to add Customer as a further dimension of the space used to store instances
of VehicleRepair.
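The dependency described above (Customer being determined by Day, Garage and Vehicle) can be checked on sample data with a short Python sketch. The `determines` helper and the repair rows are hypothetical and only illustrate the idea of a functional dependency; they are not taken from the case study.

```python
def determines(rows, lhs, rhs):
    """Return True if the attributes in lhs functionally determine rhs in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # setdefault stores the first rhs value seen for this key;
        # any later, different value violates the dependency.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical repair instances.
repairs = [
    {"day": "d1", "garage": "g1", "vehicle": "v1", "customer": "c1"},
    {"day": "d2", "garage": "g1", "vehicle": "v1", "customer": "c1"},
    {"day": "d1", "garage": "g2", "vehicle": "v2", "customer": "c2"},
]

customer_is_determined = determines(repairs, ("day", "garage", "vehicle"), "customer")
```

When the check holds, the three Levels are enough to identify each instance, so Customer does not need to be part of the Base.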

[Figure: attributes at Lower detail level. Level Vehicle: length, width, height, color,
horsePower. Level Car: numberOfSeats, gearType. Level Truck: loadingCapacity, loadingArea.
Level Customer: age, income. Cell VehicleRepair: costsPart: Cost, costsWage: Cost,
/costsTotal: TotalCost {costsPart+costsWage}, #ofPersons: Counter, duration: Duration.
Cell TotalAverage: avgCostsTotal: Average, NonTransitive,
{Time, Vehicle, Customer, Garage -> avg(TotalCost)}. Cell VehicleSales: price: Price.
Legend: Cell (C), Level (L), Derivation (/attName {suppliers}).]

Figure B: Lower level schema of the repairs case study modeled with YAM

At the most detailed level, depicted in figure B, we see the Attributes. Note that the
derivation of Attributes is made explicit here.
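As an illustration of an explicit derivation, the derived Measure costsTotal, which the schema defines as the sum of costsPart and costsWage, could be mirrored in code as a computed property. This is only a sketch in Python: the class layout and the snake_case names are assumptions, and YAM itself expresses the derivation in UML notation rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleRepair:
    """One repair instance with its two stored cost Measures."""
    costs_part: float
    costs_wage: float

    @property
    def costs_total(self) -> float:
        # Derived Measure: never stored, always computed from its suppliers.
        return self.costs_part + self.costs_wage


repair = VehicleRepair(costs_part=120.0, costs_wage=80.0)
```

Keeping the derived value as a property rather than a stored field reflects that its definition is part of the schema.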

Regarding the example queries, the first one would be solved by selecting the repairs of
the given year in Bavaria, rolling them up to Month and TypeOfGarage, projecting the
costsTotal Measure, and placing the data in a dimensional space defined by Month and
TypeOfGarage.

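$$
\mathrm{ChangeBase}_{Month \times TypeOfGarage}\Big(
\mathrm{Projection}_{avgCostsTotal}\big(
\mathrm{RollUp}_{Month,\,TypeOfGarage}(
\mathrm{Selection}_{year = \ldots \ \mathrm{AND}\ region = \text{Bavaria}}(VehicleRepair)
)\big)\Big)
$$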
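The steps of the first query (select, roll up, average, place by Month and TypeOfGarage) can be sketched over plain Python records. Everything in the sketch is assumed for illustration: the row layout, the field names, the sample values, and the year 2000, which stands in for the year missing from the query text. Each row is already labeled with its month and garage type, so the roll-up reduces to grouping, and the average is taken per repair as a simplification.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical repair rows, already labeled with their Month and TypeOfGarage.
repairs = [
    {"year": 2000, "month": "2000-01", "region": "Bavaria", "garage_type": "dealer", "costs_total": 100.0},
    {"year": 2000, "month": "2000-01", "region": "Bavaria", "garage_type": "dealer", "costs_total": 200.0},
    {"year": 2000, "month": "2000-01", "region": "Bavaria", "garage_type": "independent", "costs_total": 50.0},
    {"year": 2000, "month": "2000-01", "region": "Saxony", "garage_type": "dealer", "costs_total": 999.0},
    {"year": 1999, "month": "1999-12", "region": "Bavaria", "garage_type": "dealer", "costs_total": 70.0},
]


def avg_costs_by_month_and_type(rows, year, region):
    """Select by year and region, group by (month, garage type), average total costs."""
    groups = defaultdict(list)
    for r in rows:
        if r["year"] == year and r["region"] == region:  # selection
            groups[(r["month"], r["garage_type"])].append(r["costs_total"])  # roll-up
    # The resulting keys span the Month x TypeOfGarage space.
    return {key: mean(values) for key, values in groups.items()}


result = avg_costs_by_month_and_type(repairs, 2000, "Bavaria")
```

The averages are computed directly from the atomic repairs, in line with the non-transitivity of the average noted in the schema.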

The second query asks for vehicle types, which does not correspond to the factual data
in this schema. To be able to solve it, Vehicle should be considered a Fact, and average part
repair costs should be a derived attribute of its associated Dimension.

B Discussion

First of all, we see that, as in section B, several Levels can be shared between Dimensions.
In SBHD it is pointed out that, in spite of their being modeled together, the schema still
contains two different Dimensions. Thus, at conceptual level no redundant modeling of the
shared Levels is necessary, and at later phases of design this can be used to avoid
redundancies, storing the Levels shared by both Dimensions only once.

BSHD explicitly explains the importance of having the definition of derived Measures
as part of the schema. However, SBHD specifies that this cannot be included in the model
because, like ER, it is only able to reflect the static structure of the application domain.
In YAM there is no problem in showing such information by means of UML notation in Static
Structure Diagrams.

BSHD also states that the computation of aggregation functions might not be
semantically meaningful for all Measures, and that this should be expressible in the
conceptual model. Nevertheless, it is not shown in figure B. It is shown at Lower detail
level when modeling with YAM.

Finally, another interesting issue not reflected with MER is the dependencies between
Dimensions. YAM allows showing that, in this example, only three Dimensions are necessary
to identify the facts.

Appendix C

List of publications

Every paper published in a respectable journal should have a preface by the
author stating why he is publishing the article and what value he sees in it. I have
no hope that this practice will ever be adopted.

Morris Kline

This appendix contains the publications that this thesis work generated, besides those that
in one way or another also influenced it. Those that are closer to the subject are classified
by chapter. Section C contains those papers written during the elaboration of this
thesis that cannot be regarded as proper thesis work.

C Related to chapter

Alberto Abello, Jose Samos, and Felix Saltor. A Framework for the Classification and
Description of Multidimensional Data Models. © Springer-Verlag. In Proceedings of the
International Conference on Database and Expert Systems Applications (DEXA),
Munich, Germany, September. Lecture Notes in Computer Science, Springer.

The words On-Line Analytical Processing bring together a set of tools that
use multidimensional modeling in the management of information to improve
the decision making process. Lately, a lot of work has been devoted to modeling
the multidimensional space. The aim of this paper is twofold. On one hand, it
compiles and classifies some of that work with regard to the design phase they
are used in. On the other hand, it allows comparing the different terminology
used by each author, by placing all the terms in a common framework.

Alberto Abello, Jose Samos, and Felix Saltor. A Data Warehouse Multidimensional
Data Models Classification. Technical Report LSI, Departamento de Lenguajes y
Sistemas Informaticos, Universidad de Granada, December.

Extended version of the previous paper.

Alberto Abello, Jose Samos, and Felix Saltor. Benefits of an Object-Oriented
Multidimensional Data Model. © Springer-Verlag. In Proceedings of the Objects and Databases
International Symposium, in the European Conference on Object-Oriented Programming
(ECOOP), Sophia Antipolis and Cannes, France, June. Lecture Notes in Computer Science,
Springer.

In this paper, we try to outline the goodness of using an O-O model on
designing multidimensional Data Marts. We argue that multidimensional modeling
is lacking in semantics, which can be obtained by using the O-O paradigm.
Some benefits that could be obtained by doing this are classified in six
O-O Dimensions (i.e. Classification/Instantiation, Generalization/Specialization,
Aggregation/Decomposition, Behavioural, Derivability, and Dynamicity), and
exemplified with specific cases.

C Related to chapter

Jose Samos, Alberto Abello, Marta Oliva, Elena Rodriguez, Felix Saltor, Jaume Sistac,
Francisco Araque, Cecilia Delgado, Eladio Garvi, and Emilia Ruiz. Sistema Cooperativo
para la Integracion de Fuentes Heterogeneas de Informacion y Almacenes de Datos. In
Novatica, Nov-Dec. Asociacion de Tecnicos de Informatica (ATI). In Spanish.

This work presents our proposal for the creation of a prototype of cooperative
systems for the integration of heterogeneous information sources and data
warehouses, which is at the core of our research. The general goal is to provide
a software layer that allows the cooperation among several information sources
interconnected by means of a communication network. Each source has its own
services for answering the queries that its users perform on its data and,
additionally, wants to offer some users the opportunity of accessing the whole
set of data in a uniform manner (integrated access), either in real time or by
means of the data warehouse.

Alberto Abello, Marta Oliva, Jose Samos, and Felix Saltor. Information System
Architecture for Data Warehousing from a Federation. In Proceedings of the International
Workshop on Engineering Federated Information Systems (EFIS), Dublin, Ireland,
June. IOS Press.

This paper is devoted to Data Warehousing architecture and its
data schemas. We relate a federated databases architecture to Data Warehouse
schemas, which allows us to provide better understanding of the characteristics
of every schema, as well as the way they should be defined. Because of the
confidentiality of data used to make decisions, and the federated architecture
used, we also pay attention to data protection.

Alberto Abello, Marta Oliva, Jose Samos, and Felix Saltor. Information System
Architecture for Secure Data Warehousing. Technical Report LSIR, Departament de
Llenguatges i Sistemes Informatics, Universitat Politecnica de Catalunya, April.

Extended version of the previous paper.

Felix Saltor, Marta Oliva, Alberto Abello, and Jose Samos. Building Secure Data
Warehouse Schemas from Federated Information Systems. In Proceedings of the International
CODATA Conference on Data and Information for the Coming Knowledge Millennium
(CODATA), Baveno, Italy, October. Extended abstract.

There are similarities between architectures for Federated Information
Systems and architectures for Data Warehousing. In the context of an integrated
architecture for both Federated Information Systems and Data Warehousing,
we discuss how additional schema levels provide security, and operations to
convert from one level to the next.

Alberto Abello, Jose Samos, and Felix Saltor. Multi-star conceptual schemas for OLAP
systems. Technical Report LSIR of the Departament de Llenguatges i Sistemes
Informatics, Universitat Politecnica de Catalunya.

OLAP tools divide concepts based on whether they are used as analysis
dimensions or are the fact subject of analysis, which gives rise to star shape
schemas. Operations are always provided to navigate inside such star schemas.
However, the navigation among different stars tends to be forgotten. This paper
studies different kinds of conceptual relationships between stars (i.e.
Derivability, Generalization/Specialization, Aggregation/Decomposition, and
Temporal), and proposes a level schemas architecture to ease the implementation
and usage of multi-star schemas.

C Related to chapter

Alberto Abello, Jose Samos, and Felix Saltor. Understanding Analysis Dimensions in
a Multidimensional Object-Oriented Model. In Proceedings of the International
Workshop on Design and Management of Data Warehouses (DMDW), Interlaken,
Switzerland, June. SwissLife.

OLAP defines a set of data warehousing query tools characterized by
providing a multidimensional view of data. Information can be shown at different
aggregation levels (often called granularities) for each dimension. In this paper,
we try to outline the benefits of understanding the relationships between those
aggregation levels as Part-Whole relationships, and how it helps to address some
semantic problems. Moreover, we propose the usage of other Object-Oriented
constructs to keep as much semantics as possible in analysis dimensions.

Alberto Abello, Jose Samos, and Felix Saltor. Understanding Facts in a Multidimensional
Object-Oriented Model. © ACM. In Proceedings of the International Workshop on
Data Warehousing and OLAP (DOLAP), Atlanta, USA, November. ACM Press.

On-Line Analytical Processing tools are used to extract information from
the Data Warehouse in order to help in the decision making process. These
tools are based on multidimensional concepts, i.e. facts and dimensions. In this
paper we study the meaning of facts, and the dependencies in multidimensional
data. This study is used to find relationships between cubes (in an
Object-Oriented framework) and explain navigation operations.

C Related to chapter



Alberto Abello, Jose Samos, and Felix Saltor. YAM (Yet Another Multidimensional
Model): An extension of UML. Technical Report LSIR of the Departament de
Llenguatges i Sistemes Informatics, Universitat Politecnica de Catalunya.

This paper presents a multidimensional conceptual Object-Oriented model,
its structures, integrity constraints and query operations. It has been developed
as an extension of UML core metaclasses to facilitate its usage, as well as
to avoid the introduction of completely new concepts. YAM allows the
representation of several semantically related stars, as well as summarizability
and identification constraints.

C Other publications

Alberto Abello, Francisco Araque, Jose Samos, and Felix Saltor. Bases de Datos
Federadas, Almacenes de Datos y Analisis Multidimensional. In Taller de Almacenes de Datos
y Tecnologia OLAP, dentro de las VI Jornadas de Ingenieria del Software y Bases de
Datos (JISBD), Almagro, Spain, November. In Spanish.

These pages present our work in the BLOOM project of federated databases
regarding data warehousing and multidimensional analysis.

Elena Rodriguez, Alberto Abello, Marta Oliva, Felix Saltor, Cecilia Delgado, Eladio Garvi,
and Jose Samos. On Operations along the Generalization/Specialization Dimension. In
Proceedings of the International Workshop on Engineering Federated Information
Systems (EFIS), Berlin, Germany, October. Infix.

The need to derive a database schema from one or more existing schemas
arises in Federated Database Systems as well as in other contexts. Operations
used for this purpose include conforming operations, which change the
form of a schema. In this paper we present a systematic approach to establish
a set of primitive conforming operations that operate along the
Generalization/Specialization dimension in the context of Object-Oriented schemas.

Elena Rodriguez, Alberto Abello, and Marta Oliva. Resumen del Simposium en Objetos
y Bases de Datos del ECOOP. In Taller de Bases de Datos Orientadas a Objetos,
dentro de las V Jornadas de Ingenieria del Software y Bases de Datos (JISBD),
Valladolid, Spain, November. In Spanish.

The aim of this contribution is just to popularize the results of the
Symposium on Objects and Databases, held in June in Sophia Antipolis, France,
in conjunction with the European Conference on Object-Oriented Programming
(ECOOP). This event continued the short tradition established
the year before in Lisbon, Portugal, where the first Workshop on
Object-Oriented Databases was held.

Alberto Abello and Elena Rodriguez. Describing BLOOM with regard to UML
Semantics. In Proceedings of the V Jornadas de Ingenieria del Software y Bases de Datos
(JISBD), Valladolid, Spain, November. Graficas Andres Martin.

In this paper we describe the BLOOM metaclasses with regard to the
Unified Modeling Language (UML) semantics. We concentrate essentially on the
Generalization/Specialization and Aggregation/Decomposition dimensions,
because they are used to guide the integration process BLOOM was intended for.
Here we focus on conceptual data modeling constructs that UML offers.
Although UML provides many more abstractions than BLOOM, we will show
that BLOOM still has some abstractions that UML does not. For some of
these abstractions, we will sketch how UML can be extended to deal with the
semantics that BLOOM adds.

Alberto Abello, Marta Oliva, Elena Rodriguez, and Felix Saltor. The syntax of BLOOM
schemas. Technical Report LSIR, Departament de Llenguatges i Sistemes
Informatics, Universitat Politecnica de Catalunya, July.

The BLOOM (BarceLona Object Oriented Model) data model was
developed to be the Canonical Data Model (CDM) of a Federated Database
Management System prototype. Its design satisfies the features that a data
model should have to be suitable as a CDM. The initial version of the model
(BLOOM) has evolved into the present version, BLOOM. This report
specifies the syntax of the schema definition language of BLOOM. In our model,
a schema is a set of classes, related through two dimensions: the
generalization/specialization dimension, and the aggregation/decomposition dimension.
BLOOM supports several features in each of these dimensions, through their
corresponding metaclasses.

Even if users are supposed to define and modify schemas in an interactive
way, using a Graphical User Interface, a linear schema definition language is
clearly needed. Syntax diagrams are used in this report to specify the language;
an alternative using grammar productions appears as Appendix A. A possible
graphical notation is given in Appendix B. A comprehensive running example
illustrates the model, the language and its syntax, and the graphical notation.

Alberto Abello, Marta Oliva, Elena Rodriguez, and Felix Saltor. The BLOOM model
revisited: An evolution proposal (poster session). In Workshop Reader of the European
Conference on Object-Oriented Programming (ECOOP), Lisbon, June.
Springer-Verlag, Lecture Notes in Computer Science.

The growing need to share information among several autonomous and
heterogeneous data sources has become an active research area. A possible solution
is providing integrated access through a Federated Information System (FIS).
In order to provide integrated access, it is necessary to overcome semantic
heterogeneities and represent related concepts. This is accomplished through an
integration process in which a Canonical Data Model (CDM) plays a central
role.

Once the desirable characteristics of a suitable CDM had been argued, the BLOOM
model (BarceLona Object Oriented Model) was progressively defined. Recently,
we have revised the BLOOM model, giving rise to BLOOM. We discuss the
reasons for the change and the main innovations that BLOOM includes.

Alberto Abello. CORBA: A middleware for an heterogeneous cooperative system.
Technical Report LSIR, Departament de Llenguatges i Sistemes Informatics, Universitat
Politecnica de Catalunya, May.

Two kinds of heterogeneities interfere with the integration of different
information sources: those in systems and those in semantics. They generate
different problems and require different solutions. This paper tries to
separate them by proposing the usage of a distinct tool for each one (i.e. CORBA
and BLOOM, respectively), and analyzing how they could collaborate. CORBA
offers lots of ways to deal with distributed objects and their potential needs,
while BLOOM takes care of the semantic heterogeneities. Therefore, it seems
promising to handle the system heterogeneities by wrapping the components of
the BLOOM execution architecture into CORBA objects.

Alberto Abello and Felix Saltor. Implementation of the BLOOM data model on
ObjectStore. Technical Report LSIT, Departament de Llenguatges i Sistemes Informatics,
Universitat Politecnica de Catalunya, May.

BLOOM is a semantically enriched Object-Oriented data model. It offers
extra semantic abstractions to better represent the real world. Those
abstractions are not implemented in any commercial product. This paper explains
how all of them could be simulated with a software layer on an Object-Oriented
database management system. Concretely, it proved to work on ObjectStore.

Glossary


Association: As explained in OMGb, defines a semantic relationship between Classifiers.
The instances of an Association are a set of tuples relating instances of the Classifiers.

Aggregation: As explained in OMGb, a kind of Association relationship in which one end
is part of the other.

Attribute: As explained in OMGb, a named slot within a Classifier that describes a range
of values that instances of the Classifier may hold.

BLOOM: BarceLona Object-Oriented Model. It was conceived as a semantically rich OO
model to be used to overcome semantic heterogeneities in the integration process of a
FIS.

Canonical Data Model: Common model used to overcome the heterogeneities in the different
data models of the CDBs in a federation.

CASE: Computer Aided Software Engineering.

CDB: See Component Database.

CDM: See Canonical Data Model.

Cell: (i.e. class of cells) contains those cells representing the same kind of fact and being
associated with instances of the same Level for each of the Dimensions we use to
analyze it.

CIF: See Corporate Information Factory.

Class: As defined in OMGb, a description of a set of objects that share some Attributes,
Operations, Methods, Relationships, and semantics.

Classifier: As defined in OMGb, an element that describes behavioral and structural
features; it comes in several specific forms, including Class, DataType, Interface,
Component, and others.

CMDS: See Corporate Multidimensional Schema.

Common Warehouse Metamodel: As explained in OMGa, a metadata standard whose
purpose is to enable easy interchange of warehouse and business intelligence metadata
between warehouse tools, warehouse platforms and warehouse metadata repositories in
distributed heterogeneous environments.

Component Database: Each database participating in a FIS.

Conceptual model: Data model close to the way users perceive data and independent of the
implementation.

Corporate Information Factory: Data Warehousing architecture defined in IIS.

Corporate Multidimensional Schema: Intermediate level of a levels architecture for the
management of multidimensional data.

Cube: An injective function from an n-dimensional finite space (defined by the cartesian
product of n functionally independent Levels) to the set of instances of a Cell.

CWM: See Common Warehouse Metamodel.

Data Mart: As defined in IIS, a collection of data tailored to the Decision Support
Systems processing needs of a particular department.

Data Warehouse: As defined in Inm, an integrated, subject-oriented, historic, and
nonvolatile set of data in support of the decision making process.

Data Warehousing: As defined in Gar, a process (not a product) for assembling
and managing data from various sources for the purpose of gaining a single detailed view
of part or all of a business.

Data Cube: A metaphor that represents how analysts conceive data.

DB: Database.

DBMS: Database Management System.

Derivation: As defined in OMGb, a kind of relationship which specifies that the client may
be computed from the supplier.

Descriptor: An attribute of a Level, used to select its instances.

Dimension: A connected, directed graph representing a point of view on analyzing data. Every
vertex in the graph corresponds to an aggregation level, and an edge reflects that every
instance at target Level decomposes into a collection of instances of source Level (i.e.
edges reflect part-whole relationships between instances of Levels).

DM: See Data Mart.

DW: See Data Warehouse.

ER: Entity-Relationship.

Expressiveness: As defined in SCG, the degree to which a model can express or
represent a conception of the real world.

Fact: A connected, directed graph representing a subject of analysis. Every vertex in the
graph corresponds to a Cell, and an edge reflects that every instance at target Cell
decomposes into a collection of instances of source Cell (i.e. edges reflect part-whole
relationships between instances of Cells).

FD: Functional Dependency.

FIS: See Federated Information System.

Federated Information System: A collection of cooperating but autonomous component
systems.

Flow: As explained in OMGb, a relationship between two versions of an object.

Generalization: As explained in OMGb, a taxonomic relationship between a more general
element and a more specific element.

Hypercube: See Data Cube.

Intermediate: Detail level that contains Classes (i.e. Cells and Levels).

Key: As defined in AHV, a minimal superkey.

Level: Represents the set of instances of the same granularity in an analysis dimension.

Logical model: A data model providing concepts that can be understood by end users but
that are not too far removed from the way data is organized within the computer.

Lower: Detail level that contains Attributes (i.e. Measures and Descriptors).

Measure: An attribute of a Cell representing measured data to be analyzed.

Measurement: Act of measuring. Each instance of Measure.

Mereology: The science that studies part-whole relationships.

MOLAP: See Multidimensional OLAP.

Multidimensional OLAP: Pure multidimensional DBMS.

Nexus: Any kind of semantic relationship between two objects.

O3LAP: See Object-Oriented OLAP.

Object Constraint Language: A formal language to express side-effect-free constraints,
defined in OMGb.

Object-Oriented OLAP: Object-Oriented DBMS adapted for OLAP.

Object Query Language: Query language of the ODMG (Object Data Management Group)
data model.

OCL: See Object Constraint Language.

ODS: See Operational Data Store.

OLTP: On-Line Transactional Processing.

OLAP: See On-Line Analytical Processing.

On-Line Analytical Processing: As defined in OLA, a category of software technology
that enables analysts, managers and executives to gain insight into data through fast,
consistent, interactive access to a wide variety of possible views of information that has
been transformed from raw data to reflect the real dimensionality of the enterprise as
understood by the user.

OO: Object-Oriented.

OO-Dimension: Each one of the six dimensions of the OO paradigm identified in Sal.

Operational Data Store: As defined in IIB, an architectural construct that is
subject-oriented, integrated, volatile, current-valued, and contains only corporate
detailed data.

OQL: See Object Query Language.

Physical model: A data model tightly coupled to the specific DBMS used, and conceived to
describe how data is actually stored.

ROLAP: See Relational OLAP.

Relational OLAP: Relational DBMS adapted for OLAP.

Semantic Domain: Domain reflecting the conceptualization of values in the mind of the
designer.

Semantic Power: See Expressiveness.

Semantic Relativism: As defined in SCG, the degree to which a model can accommodate
not only one, but many different conceptions.

Slice and Dice: As defined in OLA, the user-initiated process of navigating by calling
for page displays interactively, through the specification of slices via rotations and
drill down/up.

SQL: Structured Query Language.

Star: A modeling element composed of a Fact and several Dimensions that can be used to
analyze it.

Superkey: As defined in AHV, a subset of the attributes of a Relation that functionally
determines it.

Transaction Time: As defined in DGK, the time when a fact is current in a database
and may be retrieved.

UML: See Unified Modeling Language.

Unified Modeling Language: As said in OMGb, provides a consistent language for
specifying, visualizing, constructing, and documenting the artifacts of software systems.

UoD: Universe of Discourse.

Upper: Detail level that contains Classifiers (i.e. Facts and Dimensions).

Valid Time: As defined in DGK, the time when a fact is true in the modeled reality.