Detailed Table of Contents

Preface ...... xiii

Section 1 Conceptual Design and Ontology-Based Integration

Chapter 1 From User Requirements to Conceptual Design in Warehouse Design: A Survey...... 1 Matteo Golfarelli, DEIS - , Italy

This chapter surveys conceptual design and user requirement analysis in data warehouse environments and shows their importance in guaranteeing the success of business intelligence projects.

Chapter 2 Data Extraction, Transformation and Integration Guided by an Ontology ...... 17 Chantal Reynaud, Université -Sud, CNRS (LRI) & INRIA (Saclay – Île-de-), France Nathalie Pernelle, Université Paris-Sud, CNRS (LRI) & INRIA (Saclay – Île-de-France), France Marie-Christine Rousset, LIG – Laboratoire d’Informatique de , France Brigitte Safar, Université Paris-Sud, CNRS (LRI) & INRIA (Saclay – Île-de-France), France Fatiha Saïs, Université Paris-Sud, CNRS (LRI) & INRIA (Saclay – Île-de-France), France

This chapter focuses on the problem of integrating heterogeneous XML information sources into a data warehouse with data defined. The main characteristic of this integration is that it is guided by an ontology. The authors present an approach supporting the acquisition of data from a set of external sources available for an application of interest, including data extraction, data transformation, and data integration or reconciliation. The integration middleware that the authors propose extracts data from external XML sources that are relevant according to an RDFS+ ontology, transforms the returned XML data into RDF facts that conform to the ontology, and reconciles the RDF data in order to resolve possible redundancies.

Chapter 3 X-WACoDa: An XML-Based Approach for Warehousing and Analyzing Complex Data ...... 38 Hadj Mahboubi, Université de (ERIC Lyon 2), France Jean-Christian Ralaivao, Université de Lyon (ERIC Lyon 2), France Sabine Loudcher, Université de Lyon (ERIC Lyon 2), France Omar Boussaïd, Université de Lyon (ERIC Lyon 2), France Fadila Bentayeb, Université de Lyon (ERIC Lyon 2), France Jérôme Darmont, Université de Lyon (ERIC Lyon 2), France

This chapter proposes a unified XML warehouse reference model that synthesizes and enhances related work, and fits into a global XML warehousing and analysis approach the authors have developed. The approach is validated by a software platform based on this model, as well as by a case study that illustrates its usage.

Chapter 4 Designing Data Marts from XML and Relational Data Sources ...... 55 Yasser Hachaichi, Mir@cl Laboratory, Faculté des Sciences Economiques et de Gestion, Tunisia Jamel Feki, Mir@cl Laboratory, Faculté des Sciences Economiques et de Gestion, Tunisia Hanene Ben-Abdallah, Mir@cl Laboratory, Faculté des Sciences Economiques et de Gestion, Tunisia

This chapter presents a bottom-up, data-driven method for designing data marts from two types of sources: a relational database and XML documents compliant with a given DTD. The method has three automatic steps (data source pretreatment, relation classification, and data mart schema construction) and one manual step for data mart schema adaptation. The method is illustrated using an e-ticket DTD used by an online broker and a relational database modeling a hotel room booking system.

Chapter 5 Ontology-Based Integration of Heterogeneous, Incomplete and Imprecise Data Dedicated to a Decision Support System for Food Safety ...... 81 Patrice Buche, INRA, France Sandrine Contenot, INRA, France Lydie Soler, INRA, France Juliette Dibie-Barthélemy, AgroParisTech, France David Doussot, AgroParisTech, France Gaelle Hignette, AgroParisTech, France Liliana Ibanescu, AgroParisTech, France

This chapter presents an application in the field of food safety using an ontology-based data integration approach. This approach makes it possible to homogenize data sources that are heterogeneous in terms of structure and vocabulary. It is carried out within the framework of the Semantic Web, an international initiative that proposes annotating data sources using ontologies in order to manage them more efficiently.

Chapter 6 On Modeling and Analysis of Multidimensional Geographic Databases ...... 96 Sandro Bimonte, LIRIS (Laboratoire d’InfoRmatique en Images et Systèmes d’information), France

This chapter presents a panorama of spatial OLAP (SOLAP) models and an analytical review of research SOLAP tools. It describes a Web-based system, GeWOlap. GeWOlap is an integrated OLAP-GIS solution implementing drill and cut spatio-multidimensional operators, and it supports some new spatio-multidimensional operators that dynamically change the structure of the spatial hypercube by means of spatial analysis operators.

Section 2 Physical Design and Self Tuning

Chapter 7 View Selection and Materialization ...... 114 Zohra Bellahsene, LIRMM-CNRS/Université 2, France

This chapter presents the problem of materialized view selection and presents algorithms for a dynamic environment. A tool called MATUN, built from the ground up to facilitate different view materialization strategies using the proposed algorithms, is also presented.

Chapter 8 ChunkSim: A Tool and Analysis of Performance and Availability Balancing ...... 131 Pedro Furtado, , Portugal

This chapter proposes ChunkSim, an event-based simulator for the analysis of load and availability balancing in chunk-wise parallel data warehouses. The chapter first discusses how a shared-nothing machine can store and process a data warehouse chunk-wise using an efficient on-demand processing approach. Then, it presents the different parameters used by ChunkSim. Finally, it presents the data allocation and replication alternatives that ChunkSim implements and the analyses that ChunkSim is currently able to run on performance and availability features.

Chapter 9 QoS-Oriented Grid-Enabled Data Warehouses ...... 150 Rogério Luís de Carvalho Costa, University of Coimbra, Portugal Pedro Furtado, University of Coimbra, Portugal

This chapter presents QoS-oriented scheduling and distributed data placement strategies for the Grid-based warehouse. It discusses the use of a physically distributed database, in which tables are both partitioned and replicated across sites. The use of fact table partitioning and replication is particularly relevant, as Grid users’ queries may follow geographically related access patterns.

Section 3 Evolution and Maintenance Management

Chapter 10 Data Warehouse Maintenance, Evolution and Versioning ...... 171 Johann Eder, University of Klagenfurt, Austria Karl Wiggisser, University of Klagenfurt, Austria

This chapter focuses on the problem of maintenance in the data warehouse domain and provides illustrative examples that motivate the need for data warehouse maintenance. It presents some basic terms and definitions to establish a common understanding and introduces the different aspects of data warehouse maintenance. Several approaches addressing the problem are presented and classified by their capabilities.

Chapter 11 Construction and Maintenance of Heterogeneous Data Warehouses ...... 189 M. Badri, Crip5 Université Paris Descartes, France and Lipn Université Paris Nord, France F. Boufarès, Lipn Université Paris Nord, France S. Hamdoun, Lipn Université Paris Nord, France V. Heiwy, Crip5 Université Paris Descartes, France K. Lellahi, Lipn Université Paris Nord, France

This chapter proposes a formal framework that deals with the problem of integrating heterogeneous data sources from various categories: structured, semi-structured, and unstructured data. This approach is based on the definition of an integration environment.

Section 4 Exploitation of Data Warehouse

Chapter 12 On Querying Data and Metadata in Multiversion Data Warehouse ...... 206 Wojciech Leja, Poznań University of Technology, Poland Robert Wrembel, Poznań University of Technology, Poland Robert Ziembicki, Poznań University of Technology, Poland

This chapter presents MVDWQL, a query language for the multiversion data warehouse that allows users to: (1) query multiple data warehouse versions that differ with respect to their schemas, (2) augment query results with metadata describing changes made to the queried data warehouse versions, and (3) explicitly query metadata on the history of data warehouse changes and visualize the results. Two types of queries on metadata are supported, namely: (1) queries searching for data warehouse versions that include an indicated data warehouse object and (2) queries retrieving the history of the evolution of an indicated data warehouse object.

Chapter 13 Ontology Query Languages for Ontology-Based Databases: A Survey ...... 227 Stéphane Jean, LISI/ENSMA and University of , France Yamine Aït Ameur, LISI/ENSMA and , France Guy Pierra, LISI/ENSMA and University of Poitiers, France

This chapter surveys ontology query languages dedicated to ontology-based databases. To compare these languages, a set of requirements is defined for an architecture that extends the traditional ANSI/SPARC architecture with conceptual and ontological levels. Two Semantic Web query languages, SPARQL and RQL, are discussed. Two database-oriented ontology query languages, the Oracle extension to SQL and OntoQL, are then presented in light of these requirements.

Chapter 14 Ontology-Based Database Approach for Handling Preferences ...... 248 Dilek Tapucu, IYTE, Izmir, Turkey Gayo Diallo, LISI/ENSMA, Poitiers, France Yamine Ait Ameur, LISI/ENSMA, Poitiers, France Murat Osman Ünalir, Ege University, Izmir, Turkey

This chapter studies the problem of query personalisation in the context of ontology-based databases, where the OntoDB model is used to validate the proposed work. A preference model is proposed. It is composed of several types of preferences that are usually addressed separately in the literature.

Chapter 15 Security in Data Warehouses ...... 272 Edgar R. Weippl, Secure Business Austria, Austria

This chapter addresses security, an important issue in data warehouses. It describes the traditional security models: mandatory access control (MAC), driven mainly by military requirements, and role-based access control (RBAC), the access control model commonly used in commercial databases. Some issues concerning statistical databases are also discussed.

Compilation of References ...... 280

About the Contributors ...... 306

Index ...... 315