Emergent Relational Schemas for RDF

Emergent Relational Schemas for RDF

Minh Duc Pham

Committee:
prof.dr. Frank van Harmelen
prof.dr. Martin Kersten
prof.dr. Josep Lluis Larriba Pey
prof.dr. Thomas Neumann
dr. Jacopo Urbani

The research reported in this thesis has been partially carried out at CWI, the Dutch National Research Laboratory for Mathematics and Computer Science, within the theme Database Architectures, and as part of the continuous research and development of the MonetDB open-source database management system. It has been carried out under the auspices of SIKS, the Dutch Research School for Information and Knowledge Systems (SIKS Dissertation Series No. 2018-19).

The cover was designed by the author. Photo by Leo Rivas on Unsplash. The printing and binding of this dissertation was carried out by Ipskamp Printing. ISBN 978-94-028-1110-0.

VRIJE UNIVERSITEIT

Emergent Relational Schemas for RDF

ACADEMIC DISSERTATION to obtain the degree of Doctor at the Vrije Universiteit Amsterdam, by authority of the rector magnificus prof.dr. V. Subramaniam, to be defended in public before the examination committee of the Faculty of Science on Thursday, 6 September 2018, at 15.45, in the aula of the university, De Boelelaan 1105, by Minh Duc Pham, born in Bac Ninh, Vietnam.

promotor: prof.dr. P.A. Boncz
copromotor: prof.dr. S. Manegold

To my parents, for their boundless love; to my beloved wife, for all the hardship and sweetness; and to my son, an endless source of joy ...

Abstract

The main semantic web data model, RDF, has been gaining significant traction in domains such as the life sciences and publishing, and has become the unrivaled standard behind the vision of global data standardization and interoperability over the web. This data model gives users the flexibility to represent and evolve data without first needing a schema, so that the global RDF graph (the semantic web) can be extended by everyone in a grass-roots, pay-as-you-go way. However, as identified in this thesis, this flexibility, which de-emphasizes the need for a schema and the notion of structure in RDF data, poses a number of data management problems for systems that handle large amounts of RDF data. Specifically, it leads to (i) query plans with excessive join complexity, which are difficult to optimize, (ii) low data locality, which blocks the use of advanced relational physical storage optimizations such as clustered indexing and data partitioning, and (iii) a lack of schema insight, which makes it harder for end users to write SPARQL queries that return non-empty results.

This thesis addresses all three problems. We uncover and exploit the fact that real RDF data, while not as regularly structured as relational data, still has the great majority of its triples conforming to regular patterns. Recognizing this structural information allows RDF stores to become both more efficient and easier to use. An important take-away from this thesis is that the notion of "schema" is understood differently in the semantic web than in databases. In the semantic web, "schema" refers to ontologies and vocabularies, which describe entities in terms of their properties and relationships in a generic manner that is valuable across many different application contexts and datasets. In databases, "schema" means the specific structure of the data stored in a single database.
We argue that both notions of schema are valuable. Semantic schemas could be a valuable addition to relational databases, making the semantics of a table (the entity it may represent) and of its columns and relationships explicit; this can facilitate data integration. Relational schemas are valuable for semantic web data: RDF stores can better organize data on disk and in memory, SPARQL engines can optimize better, and SPARQL users can better understand the nature of an RDF dataset. This thesis concentrates on these latter points. Concretely, we propose novel techniques to automatically derive a so-called emergent relational schema from an RDF dataset: a compact and precise relational schema with high triple coverage and short, human-readable labels. Beyond using the derived emergent relational schema to convey the structure of an RDF dataset to users and help them understand it better, we exploit this emergent schema inside the RDF system itself (in storage, optimization, and execution) to make RDF stores more efficient. In particular, the emergent relational schema makes RDF storage more compact and faster to access, and reduces both the number of (self-)joins needed in SPARQL query execution and the complexity of query optimization, yielding significant performance improvements in RDF systems. This approach opens a promising direction for developing efficient RDF stores that can bring RDF-based systems on par with relational systems in terms of performance, without losing any of the flexibility offered by the RDF model.

Besides the contributions on developing high-performance RDF stores using the automatically derived emergent relational schema, this thesis also provides insights and materials for evaluating the performance and technical challenges of RDF/graph systems. In particular, we developed a scalable graph data generator that can produce synthetic RDF/graph data with the skewed data distributions and plausible structural correlations of a real social network. By leveraging parallelism through the Hadoop/MapReduce paradigm, this data generator can produce a social network structure with billions of user profiles, enriched with interests/tags, posts, and comments, on a cluster of commodity hardware. The generated data exhibits realistic value correlations (e.g., names vs. countries), structural correlations (e.g., friendships vs. location), and statistical distributions (e.g., power laws) akin to those of a real social network such as Facebook. Furthermore, the data generator has been extended to become a core ingredient of an RDF/graph benchmark, the LDBC Social Network Benchmark (SNB), which is designed to evaluate technical challenges and solutions in RDF/graph systems.

Samenvatting (Summary)

The semantic web data model, RDF, is increasingly used in domains such as the life sciences and publishing, and has grown into the standard for worldwide data standardization and interoperability. RDF offers users the flexibility to represent and evolve data without requiring a schema, so that the worldwide RDF graph (the "semantic web") can be extended by everyone on their own initiative.
This flexibility brings a number of problems for systems that manage large amounts of RDF data, because it de-emphasizes the need for a schema and the notion of structure in the RDF data. First, the absence of schema information increases the complexity of query optimization, so that in practice RDF database systems can examine only a much smaller part of the search space, and worse, and thus much slower, query plans are found. Second, low data locality prevents the use of advanced physical storage optimizations for relational databases, such as clustered indexing and data partitioning. Finally, the lack of schema insight makes it difficult for end users to write good SPARQL queries. This dissertation addresses each of these three problems. We uncover and exploit the fact that real RDF datasets are to a fairly high degree tabularly structured. Automatically recognizing such structure makes it possible to make RDF storage more efficient and more user-friendly. An important observation of this dissertation is that the notion of "schema" carries a different meaning in the semantic web than in databases. In the semantic web, "schema" refers to ontologies and vocabularies that are used to describe concepts in a generic manner, so that those concepts are (re)usable in many situations and applications. In databases, "schema" refers to something quite different, namely the specific structure of the data in a single dataset. We argue that both meanings of schema are valuable. Semantic schemas could be a valuable addition to relational databases: the semantics of a table (the entity it may represent) and of its columns and relationships are made explicit. This can ease the integration of data from different databases. Relational schemas are also valuable for semantic web data: the storage of RDF data on disk or in memory can be organized better with them, RDF databases can perform better optimizations, and users can better understand which attributes are actually present in an RDF dataset. This dissertation proposes new techniques to automatically derive a so-called "emergent" relational schema from an RDF dataset. The result is a compact and precise relational schema in which tables, columns, and relationships receive short, human-readable names. This emergent relational schema is not only useful for helping people better understand the structure of RDF data …
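To make the schema-detection idea above concrete, here is a minimal sketch, not the thesis's actual algorithm, of its core first step: grouping subjects by their characteristic set (the set of properties they carry), so that each frequent set becomes a candidate relational table and each property a candidate column. The toy triples and all names are invented for illustration.

```python
from collections import defaultdict

def characteristic_sets(triples):
    """Group subjects by the set of properties they carry.

    triples: iterable of (subject, predicate, object) tuples.
    Returns {frozenset(properties): [subjects...]}; each key is a
    candidate "emergent table", each property a candidate column.
    """
    props_per_subject = defaultdict(set)
    for s, p, _ in triples:
        props_per_subject[s].add(p)

    tables = defaultdict(list)
    for s, props in props_per_subject.items():
        tables[frozenset(props)].append(s)
    return tables

# Toy data: two "person-like" subjects share a pattern, one does not.
triples = [
    (":alice", "foaf:name", '"Alice"'),
    (":alice", "foaf:age", "42"),
    (":bob", "foaf:name", '"Bob"'),
    (":bob", "foaf:age", "37"),
    (":cwi", "rdfs:label", '"CWI"'),
]
for props, subjects in characteristic_sets(triples).items():
    print(sorted(props), "->", subjects)
```

The techniques in the thesis go much further (merging similar property sets, choosing short human-readable labels, and maximizing triple coverage), but the grouping above is the structural signal they build on.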
Recommended publications
  • Mapping Spatiotemporal Data to RDF: A SPARQL Endpoint for Brussels
    International Journal of Geo-Information article. Alejandro Vaisman (Instituto Tecnológico de Buenos Aires, Buenos Aires 1424, Argentina; corresponding author, [email protected], tel. +54-11-3457-4864) and Kevin Chentout (Sopra Banking Software, Avenue de Tevuren 226, B-1150 Brussels, Belgium). Received: 20 June 2019; accepted: 7 August 2019; published: 10 August 2019.
    Abstract: This paper describes how a platform for publishing and querying linked open data for the Brussels Capital region in Belgium is built. Data are provided as relational tables or XML documents and are mapped into the RDF data model using R2RML, a standard language for defining customized mappings from relational databases to RDF datasets. In this work, the data are spatiotemporal in nature; therefore, R2RML must be adapted to produce spatiotemporal Linked Open Data. The data generated in this way are used to populate a SPARQL endpoint, where queries are submitted and the results can be displayed on a map. This endpoint is implemented using Strabon, a spatiotemporal RDF triple store built by extending the RDF store Sesame. The first part of the paper describes how R2RML is adapted to produce spatial RDF data and to support XML data sources. These techniques are then used to map data about cultural events and public transport in Brussels into RDF (a minimal sketch of the mapping idea follows this entry). Spatial data are stored as stRDF triples, the format required by Strabon. In addition, the endpoint is enriched with external data obtained from the Linked Open Data Cloud, from sites like DBpedia, Geonames, and LinkedGeoData, to provide context for analysis.
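The following is a minimal sketch of what an R2RML triples map conceptually produces: one row of a relational table turned into RDF triples. It is not the paper's actual mapping; the table, columns, and URI scheme are invented assumptions, and rdflib is used only as a convenient way to build the graph.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/brussels/")

# One row of a hypothetical cultural_event(id, name, start_date) table.
row = {"id": 7, "name": "Jazz at Flagey", "start_date": "2019-06-20"}

g = Graph()
subj = EX[f"event/{row['id']}"]                # like an rr:subjectMap template
g.add((subj, RDF.type, EX.CulturalEvent))      # like rr:class
g.add((subj, EX.name, Literal(row["name"])))   # like a rr:predicateObjectMap
g.add((subj, EX.startDate,
       Literal(row["start_date"], datatype=XSD.date)))

print(g.serialize(format="turtle"))
```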
  • SQL Server Column Store Indexes
    SQL Server Column Store Indexes. Per-Åke Larson, Cipri Clinciu, Eric N. Hanson, Artem Oks, Susan L. Price, Srikumar Rangarajan, Aleksandras Surna, Qingqing Zhou (Microsoft). {palarson, ciprianc, ehans, artemoks, susanpr, srikumar, asurna, qizhou}@microsoft.com
    Abstract: The SQL Server 11 release (code named "Denali") introduces a new data warehouse query acceleration feature based on a new index type called a column store index. The new index type, combined with new query operators that process batches of rows, greatly improves data warehouse query performance: in some cases by hundreds of times, and routinely with a tenfold speedup for a broad range of decision support queries. Column store indexes are fully integrated with the rest of the system, including query processing and optimization. This paper gives an overview of the design and implementation of column store indexes, including enhancements to query processing and query optimization that take full advantage of the new indexes.
    SQL Server column store indexes are "pure" column stores, not a hybrid, because they store all data for different columns on separate pages. This improves I/O scan performance and makes more efficient use of memory. SQL Server is the first major database product to support a pure column store index. Others have claimed that it is impossible to fully incorporate pure column store technology into an established database product with a broad market. We're happy to prove them wrong! To improve performance of typical data warehousing queries, all a user needs to do is build a column store index on the fact tables in the data warehouse (a sketch follows this entry). It may also be beneficial to build column store indexes on extremely large dimension tables (say more than
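Here is a minimal sketch, under stated assumptions, of the one step the paper says a user needs: building a columnstore index on a fact table. The connection string, database, and the dbo.FactSales table with its columns are invented; pyodbc is used merely to issue the T-SQL from Python.

```python
import pyodbc

# Assumptions: a reachable SQL Server instance, a "dw" database, and a
# fact table dbo.FactSales with the columns named below.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=dw;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Build a nonclustered columnstore index on the fact table, as the paper
# recommends for typical data-warehouse workloads.
cur.execute("""
    CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactSales
    ON dbo.FactSales (DateKey, ProductKey, StoreKey, SalesAmount);
""")
conn.commit()
```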
  • When Relational-Based Applications Go to NoSQL Databases: A Survey
    Information article. Geomar A. Schreiner (Departamento de Informática e Estatística, Federal University of Santa Catarina, 88040-900 Florianópolis, SC, Brazil; corresponding author, [email protected]), Denio Duarte (Campus Chapecó, Federal University of Fronteira Sul, 89815-899 Chapecó, SC, Brazil), and Ronaldo dos Santos Mello (Federal University of Santa Catarina). Received: 22 May 2019; accepted: 12 July 2019; published: 16 July 2019.
    Abstract: Several data-centric applications today produce and manipulate large volumes of data, the so-called Big Data. Traditional databases, in particular relational databases, are not suitable for Big Data management. As a consequence, approaches have been proposed that allow the definition and manipulation of large relational data sets stored in NoSQL databases through an SQL interface, focusing on scalability and availability. This paper presents a comparative analysis of these approaches based on an architectural classification that organizes them according to their system architectures. Our motivation is that wrapping is a relevant strategy for relational-based applications that intend to move relational data to NoSQL databases (usually maintained in the cloud); a minimal sketch of the wrapping idea follows this entry. We also claim that this research area has some open issues, given that most approaches deal with only a subset of SQL operations or support only specific target NoSQL databases. Our intention with this survey is, therefore, to contribute to the state of the art in this research area and also to provide a basis for choosing or even designing a relational-to-NoSQL data wrapping solution. Keywords: big data; data interoperability; NoSQL databases; relational-to-NoSQL mapping.
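Below is a minimal sketch of the wrapping strategy the survey classifies: accept a tiny SQL-like query and translate it to a MongoDB call. The one-pattern grammar, database, collection, and field names are all invented for illustration; real wrappers cover far more of SQL.

```python
import re
from pymongo import MongoClient

def run_sql_over_mongo(db, sql):
    # Handles only: SELECT cols FROM coll [WHERE field = 'value']
    m = re.match(r"SELECT (.+) FROM (\w+)(?: WHERE (\w+) = '(.+)')?$", sql)
    cols, coll, field, value = m.groups()
    filt = {field: value} if field else {}
    proj = None if cols.strip() == "*" else {c.strip(): 1 for c in cols.split(",")}
    return list(db[coll].find(filt, proj))

# Hypothetical "shop" database with a "products" collection.
db = MongoClient("mongodb://localhost:27017")["shop"]
print(run_sql_over_mongo(db, "SELECT name, price FROM products WHERE brand = 'acme'"))
```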
  • Data Dictionary
    A data dictionary is a file that helps to define the organization of a particular database. It acts as a description of the data objects or items in a model and is used for the benefit of the programmer or other people who may need to access it. A data dictionary does not contain the actual data from the database; it contains only information describing how to manage the data, which is called metadata (a minimal sketch follows this entry). Building a data dictionary provides the ability to know the kind of field, where it is located in a database, what it means, and so on. It typically consists of a table with multiple columns that describe relationships as well as labels for data. A data dictionary often contains the following information about fields: default values; constraint information; definitions (for example, functions, sequences, etc.); the amount of space allocated for the object/field; auditing information.
    What is the data dictionary used for? (1) It can be used as a read-only reference to obtain information about the database. (2) It can be of use when developing programs that use a data model. (3) It acts as a way to describe data in "real-world" terms.
    Why is a data dictionary needed? One of the main reasons is to provide better accuracy, organization, and reliability in regards to data management and user/administrator understanding and training. Benefits of using a data dictionary:
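A minimal sketch of a data dictionary as described above: entries hold metadata about fields, never the data itself. The fields, tables, and values are invented examples.

```python
from dataclasses import dataclass

@dataclass
class DictionaryEntry:
    field: str         # field/column name
    table: str         # where it is located in the database
    dtype: str         # kind of field
    default: object    # default value
    constraint: str    # constraint information
    description: str   # "real-world" meaning

data_dictionary = [
    DictionaryEntry("cust_id", "customers", "INTEGER", None,
                    "PRIMARY KEY", "Unique customer identifier"),
    DictionaryEntry("created_at", "customers", "TIMESTAMP", "now()",
                    "NOT NULL", "When the customer record was added"),
]

# Used as a read-only reference about the database:
for e in data_dictionary:
    print(f"{e.table}.{e.field}: {e.dtype} -- {e.description}")
```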
  • Open Web Ontobud: An Open Source RDF4J Frontend
    Open Web Ontobud: An Open Source RDF4J Frontend. Francisco José Moreira Oliveira (University of Minho, Braga, Portugal; [email protected]) and José Carlos Ramalho (Department of Informatics, University of Minho, Braga, Portugal; [email protected]).
    Abstract: Nowadays, we deal with increasing volumes of data. A few years ago, data was isolated, which did not allow communication or sharing between datasets. We now live in a world where everything is connected, and our data mimics this. The focus of data models has changed from a square structure like the relational model to a model centered on relations. Knowledge graphs are the new paradigm for representing and managing this new kind of information structure, and along with it a new kind of database has emerged to support the new needs: graph databases! Although there is increasing interest in this field, only a few native solutions are available. Most of these are commercial, and the open source ones have poor interfaces, which keeps them at a distance from end users. In this article, we introduce Ontobud and discuss its design and development. It is a web application that intends to improve the interface for one of the most interesting frameworks in this area: RDF4J, a Java framework for storing and managing RDF triples. Open Web Ontobud is an open source RDF4J web frontend, created to reduce the gap between end users and the RDF4J backend. We have created a web interface that enables users with basic knowledge of OWL and SPARQL to explore ontologies and extract information from them. A sketch of the underlying RDF4J REST interface follows this entry.
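Since Ontobud fronts an RDF4J server, here is a minimal sketch of the kind of call such a frontend makes: submitting a SPARQL query to RDF4J's REST interface, which follows the SPARQL 1.1 protocol. The server URL and the repository id ("demo") are assumptions for illustration.

```python
import requests

RDF4J = "http://localhost:8080/rdf4j-server"  # assumed server location
REPO = "demo"                                  # assumed repository id

# Count instances per class; a typical "explore the ontology" query.
query = "SELECT ?class (COUNT(?s) AS ?n) WHERE { ?s a ?class } GROUP BY ?class"

resp = requests.get(
    f"{RDF4J}/repositories/{REPO}",
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["class"]["value"], row["n"]["value"])
```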
  • A Performance Study of RDF Stores for Linked Sensor Data
    Semantic Web 1 (0) 1–5, IOS Press. Hoan Nguyen Mau Quoc (Insight Centre for Data Analytics, National University of Ireland Galway, Ireland), Martin Serrano (Insight Centre for Data Analytics, National University of Ireland Galway, Ireland), Han Nguyen Mau (Information Technology Department, Hue University, Viet Nam), John G. Breslin (Confirm Centre for Smart Manufacturing and Insight Centre for Data Analytics, National University of Ireland Galway, Ireland), and Danh Le Phuoc (Open Distributed Systems, Technical University of Berlin, Germany).
    Abstract: The ever-increasing amount of Internet of Things (IoT) data emanating from sensor and mobile devices is creating new capabilities and unprecedented economic opportunity for individuals, organisations and states.
  • USER GUIDE Optum Clinformatics™ Data Mart Database
    A user guide for the Optum® Clinformatics™ Data Mart database. Contents: 1. Requesting Data (Eligibility; Forms; Contact Us); 2. What You Will Need (SAS Software; VPN); 3. Abstracts, Manuscripts, Theses, and Dissertations (Referencing Optum Data; Optum Review); 4. Data User Set-Up and Access Information (Server Log-In After Initial Set-Up; Server Access; Establishing a Connection to Enterprise Guide; Instructions to Add SAS EG to the Cleared Firewall List; How to Proceed After Connection); 5. Best Practices for Data Use (Saving Programs and Back-Up Procedures; Recommended Coding Practices); 6. Appendix. Version date: 27-Feb-17.
    The Optum® Clinformatics™ Data Mart is an administrative health claims database from a large national insurer, made available by the University of Rhode Island College of Pharmacy. The statistically de-identified data include medical and pharmacy claims, as well as laboratory results, from 2010 through 2013, for over 22 million commercial enrollees.
    1. Requesting Data. The following is a brief outline of the process for gaining access to the data. Eligibility: you must be an employee or student at the University of Rhode Island conducting unfunded or URI internally funded projects. Data will be made available to the following users: (1) faculty and their research teams for projects with IRB approval; (2) students with a thesis/dissertation proposal approved by the IRB and the Graduate School (the access request form, see link below, must be signed by the Major Professor).
  • Storage, Indexing, Query Processing, and Benchmarking in Centralized and Distributed RDF Engines: A Survey
    Preprints (www.preprints.org), not peer-reviewed, posted 23 May 2020, doi:10.20944/preprints202005.0360.v1. Storage, Indexing, Query Processing, and Benchmarking in Centralized and Distributed RDF Engines: A Survey. Waqas Ali (Department of Computer Science and Engineering, School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, Shanghai, China; [email protected]), Muhammad Saleem (Agile Knowledge and Semantic Web (AKWS), University of Leipzig, Leipzig, Germany; [email protected]), Bin Yao (Department of Computer Science and Engineering, SEIEE, Shanghai Jiao Tong University, Shanghai, China; [email protected]), and Axel-Cyrille Ngonga Ngomo (University of Paderborn, Paderborn, Germany; [email protected]).
    Abstract: The recent advancements of the Semantic Web and Linked Data have changed the working of the traditional web. The Resource Description Framework (RDF) format has seen huge adoption for saving web-based data. This massive adoption has paved the way for the development of various centralized and distributed RDF processing engines. These engines employ different mechanisms to implement key components of query processing, such as data storage, indexing, language support, and query execution. All these components govern how queries are executed and can have a substantial effect on query runtime. For example, storing RDF data in various ways significantly affects both the storage space required and the query runtime performance. The type of indexing approach used in RDF engines is key for fast data lookup (a minimal sketch of triple-pattern indexing follows this entry). The type of the underlying querying language (e.g., SPARQL or SQL) used for query execution is a key optimization component of RDF storage solutions.
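The following is a minimal sketch of the triple-pattern indexing idea: keep several sorted permutations of (subject, predicate, object) so that any bound prefix can be answered with a range scan. The toy data and plain Python lists are illustrative; real engines use compressed, disk-resident structures.

```python
from bisect import bisect_left, bisect_right

def permute(triple, order):
    s, p, o = triple
    m = {"s": s, "p": p, "o": o}
    return tuple(m[c] for c in order)

def build_index(triples, order):
    # One sorted permutation, e.g. "spo", "pos", or "osp".
    return sorted(permute(t, order) for t in triples)

def prefix_scan(index, prefix):
    # All entries whose leading components equal `prefix`.
    lo = bisect_left(index, prefix)
    hi = bisect_right(index, prefix + (chr(0x10FFFF),) * (3 - len(prefix)))
    return index[lo:hi]

triples = [
    (":alice", "foaf:knows", ":bob"),
    (":alice", "foaf:name", '"Alice"'),
    (":bob", "foaf:name", '"Bob"'),
]
indexes = {o: build_index(triples, o) for o in ("spo", "pos", "osp")}

# Pattern (?s, foaf:name, ?o) is answered by a prefix scan on the POS index.
print(prefix_scan(indexes["pos"], ("foaf:name",)))
```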
  • Activant Prophet 21
    Activant Prophet 21: Understanding Prophet 21 Databases.
    This class is designed for Prophet 21 users who are responsible for report writing, system administrators, and operations managers. It is helpful to be familiar with SQL Query Analyzer and how to write basic SQL statements.
    Objectives: explain the difference between databases, tables, and columns; extract data from different areas of the system; discuss basic SQL statements using Query Analyzer; use Prophet 21 to gather SQL information; utilize the Data Dictionary. This course will NOT cover basic Prophet 21 functionality or Seagate's Crystal Reports.
    Definitions. Column: contains a single piece of information (a field). Record (row): one complete set of columns. Table: a collection of rows. View: a virtual table created for easier data extraction. Database: a collection of information organized for easy selection of data. SQL Query Analyzer: a graphical user interface used to extract data.
    SQL Query Analyzer is accessed through Microsoft SQL Server. Other SQL Server tools include Enterprise Manager, which performs administrative functions such as backing up and restoring databases and maintaining SQL logins, and Profiler, which runs traces of activity in your system.
    Basic SQL commands: sp_help; sp_help <table_name>; select * from <table_name>; select <field_name> from <table_name> (a sketch running these from Python follows this entry).
    Most common reporting areas: address and contact tables, order processing, inventory, purchasing, accounts receivable, accounts payable, and production orders. The address and contact tables are used in conjunction with other tables; these tables hold every address/contact in the
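As an illustration only, here is how the basic commands listed above could be issued from Python over ODBC rather than from Query Analyzer. The connection details and the "address" table name are assumptions about a Prophet 21 database, not documented specifics.

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=p21server;"
    "DATABASE=P21;UID=report_user;PWD=secret"
)
cur = conn.cursor()

# sp_help <table_name>: describe one table's columns and constraints.
cur.execute("sp_help 'address'")
print([col[0] for col in cur.description])

# select * from <table_name>: pull rows for a report.
cur.execute("SELECT * FROM address")
for row in cur.fetchmany(5):
    print(row)
```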
  • Uniform Data Access Platform for SQL and NoSQL Database Systems
    Information Systems 69 (2017) 93–105, available at ScienceDirect (www.elsevier.com/locate/is). Ágnes Vathy-Fogarassy and Tamás Hugyák (University of Pannonia, Department of Computer Science and Systems Technology, P.O. Box 158, Veszprém, H-8201 Hungary). Received 8 August 2016; revised 1 March 2017; accepted 18 April 2017; available online 4 May 2017. Keywords: uniform data access; relational database management systems; NoSQL database management systems; MongoDB; data integration; JSON. ©2017 Elsevier Ltd. All rights reserved.
    Abstract: Integration of data stored in heterogeneous database systems is a very challenging task and may hide several difficulties. As NoSQL databases grow in popularity, integration of different NoSQL systems and interoperability of NoSQL systems with SQL databases become an increasingly important issue. In this paper, we propose a novel data integration methodology to query data individually from different relational and NoSQL database systems. The suggested solution does not support joins and aggregates across data sources; it only collects data from different separated database management systems according to the filtering options and migrates them (a minimal sketch follows this entry). The proposed method is based on a metamodel approach and covers the structural, semantic and syntactic heterogeneities of the source systems. To demonstrate the applicability of the proposed methodology, we developed a web-based application, which convincingly confirms the usefulness of the novel method.
    1. Introduction. … solution to retrieve data from heterogeneous source systems and to deliver them to the user.
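Here is a minimal sketch of that central restriction: the same filter is pushed to each source separately (one relational, one NoSQL) and the results are merged client-side, with no cross-source joins or aggregates. Database names, collections, and fields are invented, and sqlite3 stands in for an arbitrary relational source.

```python
import sqlite3
from pymongo import MongoClient

def fetch_customers(city):
    results = []

    # Relational source: push the filter down as a WHERE clause.
    rel = sqlite3.connect("crm.db")
    for name, email in rel.execute(
        "SELECT name, email FROM customers WHERE city = ?", (city,)
    ):
        results.append({"name": name, "email": email, "source": "sqlite"})

    # NoSQL source: the same filter, expressed as a MongoDB query document.
    mongo = MongoClient("mongodb://localhost:27017")["crm"]
    for doc in mongo.customers.find({"city": city}, {"name": 1, "email": 1}):
        results.append({"name": doc["name"], "email": doc.get("email"),
                        "source": "mongodb"})

    # Merge client-side; no join or aggregate spans the two sources.
    return results

print(fetch_customers("Veszprém"))
```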
  • Product Master Data Management Reference Guide
    USAID Global Health Supply Chain Program, Procurement and Supply Management. Product Master Data Management Reference Guide, Version 1.0, February 2020.
    The USAID Global Health Supply Chain Program-Procurement and Supply Management (GHSC-PSM) project is funded under USAID Contract No. AID-OAA-I-15-0004. GHSC-PSM connects technical solutions and proven commercial processes to promote efficient and cost-effective health supply chains worldwide. Our goal is to ensure uninterrupted supplies of health commodities to save lives and create a healthier future for all. The project purchases and delivers health commodities, offers comprehensive technical assistance to strengthen national supply chain systems, and provides global supply chain leadership. GHSC-PSM is implemented by Chemonics International, in collaboration with Arbola Inc., Axios International Inc., IDA Foundation, IBM, IntraHealth International, Kuehne + Nagel Inc., McKinsey & Company, Panagora Group, Population Services International, SGS Nederland B.V., and University Research Co., LLC. To learn more, visit ghsupplychain.org.
    DISCLAIMER: The views expressed in this publication do not necessarily reflect the views of the U.S. Agency for International Development or the U.S. government.
    Contents: Acronyms; Executive Summary; Background
  • Master Data Management: Dos & Don'ts
    Master Data Management: dos & don'ts. Keesjan van Unen, Ad de Goeij, Sander Swartjes, and Ard van der Staaij. Compact, 2012/2, international edition.
    Master Data Management (MDM) is high on the agenda for many organizations. At Board level, too, people are now fully aware that master data requires specific attention. This increasing attention is related to the concern for the overall quality of data. By enhancing the quality of master data, disruptions to business operations can be limited, and efficiency is therefore increased. The availability of reliable management information will also increase. However, reaching this goal is not an entirely smooth process. MDM is not rocket science, but it does have specific problems that must be addressed in the right way. This article describes three of these attention areas, including several dos and don'ts.
    C.J.W.A. van Unen is a senior manager at KPMG Management Consulting, IT Advisory ([email protected]). A.S.M. de Goeij RE is a manager at KPMG Management Consulting, IT Advisory ([email protected]). S. Swartjes is an advisor at KPMG Risk Consulting, IT Advisory ([email protected]). A.J. van der Staaij is an advisor at KPMG Risk Consulting, IT Advisory ([email protected]).
    1. Treat master data as an asset. Introduction: Every organization has to deal with master data, and seeks ways to manage the quality of this specific group of data in an effective and efficient manner. Do not go for gold immediately: Although it may be very tempting, you should not go for the "ideal model" when (re)designing the MDM organization