Industrial-Strength Schema Matching

Philip A. Bernstein, Sergey Melnik, Michalis Petropoulos, Christoph Quix
Microsoft Research, Redmond, WA, U.S.A.
{philbe, melnik}@microsoft.com, [email protected], [email protected]

Abstract

Schema matching identifies elements of two given schemas that correspond to each other. Although there are many algorithms for schema matching, little has been written about building a system that can be used in practice. We describe our initial experience building such a system, a customizable schema matcher called Protoplasm.

1 Schema Matching

Most systems-integration work requires creating mappings between models, such as database schemas, message formats, interface definitions, and user-interface forms. As a design activity, schema mapping is similar to database design in that it requires digging deeply into the semantics of schemas. This is usually quite time consuming, not only in level-of-effort but also in elapsed time, because the process of teasing out semantics is slow. It can benefit from the development of improved tools, but it is unlikely such tools will provide a silver bullet that automates all of the work. For example, it is hard to imagine how the best tool could eliminate the need for a designer to read documentation, ask developers and end-users how they use the data, or review test runs to check for unexpected results. In a sense, the problem is AI-complete, that is, as hard as reproducing human intelligence.

The best commercially-available model mapping tools we know of are basically graphical programming tools. That is, they allow one to specify a schema mapping as a directed graph where nodes are simple data transformations and edges are data flows. Such tools help specify a whole mapping between two messages, a data warehouse loading script, or a database query. While such graphical programming is an improvement over typing code, no database design intelligence is being offered. Despite the limited expectations expressed in the previous paragraph, we should certainly be able to offer some intelligence: automated help, in addition to attractive graphics.

There are two steps to automating the creation of mappings between schemas: schema matching and query discovery. Schema matching identifies elements that correspond to each other, but does not explain how they correspond. For example, it might say that FirstName and LastName in one schema are related to Name in the other, but not say that concatenating the former yields the latter. Query discovery picks up where schema matching leaves off. Given the correspondences, it obtains queries to translate instances of the source schema into instances of the target, e.g., using query analysis and data mining [5].

This paper focuses on schema matching. There are many algorithms to solve it [8]. They exploit name similarity, thesauri, schema structure, instances, value distribution of instances, past mappings, constraints, cluster analysis of a schema corpus, and similarity to standard schemas. All of these algorithms have merit. So what we need is a toolset that incorporates them in an integrated package. This is the subject of this paper.

The published work on schema matching is mostly about algorithms, not systems. This algorithm work is helpful, offering new ways to produce mappings that previous algorithms were unable to find. However, all of the published algorithms are fragile: they often need manual tuning, such as setting thresholds, providing a thesaurus, or being trained on examples; even after tuning, it is easy to find schemas that the algorithms do not map correctly; and many of them do not scale to large schemas.

COMA is the first work to address engineering issues of a schema matching system [1]. Its architecture offers multiple schema-level matchers and a fixed process to combine their results. Matchers exploit linguistic, data-type, and structural information, plus previous matches, to produce similarity matrices forming a cube. The cube is aggregated to a matrix. Then particular similarity values are selected as good match candidates, which are combined to a single value. This process is executed for whole schemas or for two schema elements, and is repeated after the user provides feedback. Experimentation showed that finding good combinations of matchers depends on characteristics of the schemas being matched, demonstrating that customizability of the combination process is crucial.
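To make the combination step concrete, the following sketch illustrates the cube-and-aggregate idea on the FirstName/LastName versus Name example above. The two toy matchers, the mean aggregation, and the 0.5 selection threshold are illustrative assumptions only; neither COMA nor Protoplasm is implemented this way.

```python
# Minimal sketch of COMA-style matcher combination: each matcher produces a
# similarity matrix, the matrices are stacked into a cube, the cube is
# aggregated into one matrix, and candidate correspondences are selected.
# The matchers and threshold are invented for illustration.
import os
import numpy as np

def name_matcher(src_names, tgt_names):
    """Toy linguistic matcher: length of the common prefix, normalized."""
    sim = np.zeros((len(src_names), len(tgt_names)))
    for i, s in enumerate(src_names):
        for j, t in enumerate(tgt_names):
            common = len(os.path.commonprefix([s.lower(), t.lower()]))
            sim[i, j] = common / max(len(s), len(t))
    return sim

def type_matcher(src_types, tgt_types):
    """Toy data-type matcher: identical declared types score high, others low."""
    return np.array([[1.0 if s == t else 0.3 for t in tgt_types]
                     for s in src_types])

src_names, src_types = ["FirstName", "LastName"], ["varchar", "varchar"]
tgt_names, tgt_types = ["Name", "Phone"], ["varchar", "int"]

# One similarity matrix per matcher, stacked into a cube...
cube = np.stack([name_matcher(src_names, tgt_names),
                 type_matcher(src_types, tgt_types)])
# ...aggregated into a single matrix (here simply the mean over matchers)...
combined = cube.mean(axis=0)
# ...from which values above a threshold are selected as match candidates.
for i, j in zip(*np.where(combined >= 0.5)):
    print(f"{src_names[i]} <-> {tgt_names[j]} (score {combined[i, j]:.2f})")
```

Run on this tiny example, the sketch proposes FirstName and LastName as candidates for Name but says nothing about concatenation, which is exactly where schema matching stops and query discovery would take over.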
Inspired by COMA's approach, we felt it was important to push further toward building an industrial-strength schema matcher, one that avoids fragility problems, is customizable for use in practical applications, and extends the range of matching solutions being offered.

We believe fragility is inherent in the problem. To mitigate it, we need a system that can exploit the best algorithms and is customizable by a human designer. Designers use all of the techniques found in specific schema matching algorithms: name similarity, thesauri, common schema structure, overlapping instances, common value distribution, re-use of past mappings, constraints, similarity to standard schemas, and common-sense reasoning. A system should use all of these techniques too.

Customizability is needed for several reasons. First, a user can improve a schema matching tool by selecting particular techniques and combining them in a way that works best for the types of schemas being matched. For example, if a version ID is prepended to element names in evolving schemas, then a user may want to delete the ID before applying any matcher. The degree of such customizability depends on the user's sophistication. An end-user who is matching two schemas wants a limited range of options to choose from. By contrast, a sophisticated user, such as an application vendor, wants to customize the tool to work well when matching its schemas to those of other parties. A second reason for customizability is to control an algorithm's scalability, e.g., by trading off response time for the quality of the result. A third reason is extensibility, meaning that new techniques can be easily added to the tool. This helps sophisticated users, and researchers who want to experiment with new or modified algorithms.

We therefore embarked on a project to build such a customizable schema matcher, one that could be used in commercial settings. This paper describes our early experience in developing this tool, called Protoplasm (a PROTOtype PLAtform for Schema Matching). In addition to arguing for the importance of research on industrial-strength schema matching, we offer two main contributions: (i) a new architecture for schema matching, which includes two new internal schema representations, interfaces for operators that comprise a matching algorithm, and a flexible approach to combining operators; and (ii) experience in tuning the prototype for scalability and using its customization features to add another algorithm. These are covered in Sections 2 and 3 respectively. We close in Section 4 with a discussion of future work.

2 Architecture

Figure 1 presents the three-layer architecture of Protoplasm. The bottom layer consists of two supporting structures. The Schema Matching Model Graph (SMM Graph) is the internal representation of input schemas: a rooted node- and edge-labeled directed graph. The Similarity Matrix holds correspondences between two SMM graphs. The middle layer defines a set of Operator Interfaces that declare the inputs and outputs of generic schema-matching tasks such as import, transformation and export of schemas, creation and manipulation of similarity matrices, and match. The top layer consists of Strategy Scripts that implement match algorithms consisting of interconnected operators. Each script combines a set of operator implementations by passing SMM graphs and similarity matrices from one to another. Ideally, strategies would be designed using a Graphical Interface that generates the strategy script from its graphical representation.

[Figure 1: Protoplasm Architecture. Components: Graphical Strategy Interface, Strategy Scripts, Operator Interfaces with Protoplasm-provided and custom implementations, Strategy Execution Engine, Consuming Application, Schemas, SMM Graphs, Similarity Matrices, Auxiliary Info.]

The Strategy Execution Engine takes as input a strategy script, the Protoplasm-provided and custom operator implementations, and SMM graphs and similarity matrices. During the execution, the engine accesses schema repositories and other auxiliary information, e.g., a thesaurus or glossary, used by implementations of operators in the strategy script. The execution of the script results in a similarity matrix exported in a format compatible with an external Consuming Application.
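The paper does not give the concrete signatures of the Operator Interfaces or the syntax of strategy scripts, so the following is only a hypothetical sketch of how operators that pass SMM graphs and similarity matrices to one another might be typed and composed. The names SMMGraph, SimilarityMatrix, run_strategy, and the operator type aliases are all invented for illustration.

```python
# Hypothetical sketch of Protoplasm-style operators and a tiny "strategy";
# the interfaces and names are invented, not the real Protoplasm API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SMMGraph:
    """Rooted, node- and edge-labeled directed graph (greatly simplified)."""
    nodes: dict[int, str] = field(default_factory=dict)               # id -> label
    edges: list[tuple[int, int, str]] = field(default_factory=list)   # (src, dst, label)
    root: int = 1

@dataclass
class SimilarityMatrix:
    """Similarity values between node pairs of two SMM graphs."""
    values: dict[tuple[int, int], float] = field(default_factory=dict)

# Operator "interfaces": each operator consumes and produces SMM graphs
# and/or similarity matrices, so a strategy can chain them freely.
ImportOp = Callable[[str], SMMGraph]              # e.g. load SQL DDL or XSD
TransformOp = Callable[[SMMGraph], SMMGraph]      # e.g. strip version IDs from names
MatchOp = Callable[[SMMGraph, SMMGraph], SimilarityMatrix]
CombineOp = Callable[[list[SimilarityMatrix]], SimilarityMatrix]

def run_strategy(src_file: str, tgt_file: str,
                 importer: ImportOp, prepare: TransformOp,
                 matchers: list[MatchOp], combine: CombineOp) -> SimilarityMatrix:
    """A tiny 'strategy script': import, transform, match, combine."""
    src = prepare(importer(src_file))
    tgt = prepare(importer(tgt_file))
    return combine([m(src, tgt) for m in matchers])
```

Under this kind of design, a custom operator implementation, say a domain-specific name matcher or a transformation that deletes version IDs, plugs in simply by satisfying one of the call signatures, which is the sort of extensibility the customizability discussion above argues for.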
2.1 Representing Models

The input schemas can be in any meta-model, e.g., SQL DDL, ODMG or XML Schema Definition (XSD) Language. Since no existing meta-model is expressive enough to subsume all others, Protoplasm defines SMM Graphs as a meta-model-independent representation. An SMM graph is a rooted, node- and edge-labeled directed graph, where each node and edge has a unique identifier. Figure 2 shows an example SMM Graph describing the SQL DDL in the shaded box that defines a relational table. The root (i.e., id=1) is indicated by a thick circle. The graph identifier (i.e., id=0) and label are in boldface above the root. The database, table and column names are denoted by white circles, typing information by light gray circles, and constraints (e.g., primary key) by dark gray circles. SMM is a variation of the Object Exchange Model (OEM) [7], a data model defined for semistructured data, thus inheriting its simplicity and self-describing nature.

Protoplasm builds on widely-used XML technologies to implement SMM graphs. An XML syntax is defined to describe and store SMM graphs in secondary memory (e.g., disk), an extended XML parser is employed to load them into main memory, a variation of DOM is used to enable programmatic navigation and editing, and an XPath engine is modified to execute simple queries against SMM graphs. The XML syntax describing SMM
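The paper's actual XML syntax for SMM graphs is not reproduced in this excerpt, so the following is only a hypothetical sketch of how a relational table like the one in Figure 2 might be serialized and then loaded for navigation. The smmgraph, node, and edge vocabulary, and the example database, are invented.

```python
# Hypothetical XML serialization of a small SMM graph for a relational table;
# the element and attribute names are invented, not Protoplasm's real syntax.
import xml.etree.ElementTree as ET

SMM_XML = """
<smmgraph id="0" label="SampleDB-schema">
  <node id="1" label="Database"/>   <!-- root -->
  <node id="2" label="Customer"/>   <!-- table name -->
  <node id="3" label="CustId"/>     <!-- column name -->
  <node id="4" label="int"/>        <!-- typing information -->
  <node id="5" label="PrimaryKey"/> <!-- constraint -->
  <edge id="e1" from="1" to="2" label="table"/>
  <edge id="e2" from="2" to="3" label="column"/>
  <edge id="e3" from="3" to="4" label="type"/>
  <edge id="e4" from="3" to="5" label="constraint"/>
</smmgraph>
"""

# Load the graph into main memory and navigate it the way a DOM/XPath layer
# over SMM graphs might.
root = ET.fromstring(SMM_XML)
labels = {n.get("id"): n.get("label") for n in root.findall("node")}
for e in root.findall("edge"):
    print(f'{labels[e.get("from")]} --{e.get("label")}--> {labels[e.get("to")]}')
```

A simple query layer, for instance selecting every node reachable over column edges, could then be expressed against this representation, which is the role the modified XPath engine plays in Protoplasm.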