ATHABASCA UNIVERSITY
Applying Fuzzy Logic for Data Governance BY XiaoHai Lu
A project submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in INFORMATION SYSTEMS
Athabasca, Alberta November, 2014
© XiaoHai Lu, 2014
DEDICATION
This essay is dedicated to my supportive wife Winnie and my boys Andrew and Michale.

ABSTRACT
Every day, as we browse the internet, we consume big data from the various search engines and social networks that we visit. Like individuals, enterprises also confront a vast stream of information from individuals, communities, corporations, and governments. Vast volumes of information, long retention cycles, and high-velocity decision-making have the potential to derail the usefulness of information and do more damage than good to enterprises. The axiom 'better data means better decisions' becomes critical. Without solid data governance in place, data can be inaccurate and unfit for use.
This essay will describe the history and future of data governance. It will also explain the current process of data governance before demonstrating a prototype of a data governance application in the banking industry.
Data governance processes such as matching and linking related records require mathematical support in the decision-making process. Fuzzy logic, an approach to computing based on varying degrees of truth, was found to be a good solution to this issue. As such, this essay successfully applies fuzzy logic to improve these processes, reduce human intervention, and improve the data quality of data governance processes.
ACKNOWLEDGMENTS
I thank all who were involved in the support and review process of this essay. Without their support, the essay could not have been satisfactorily completed.
Thanks go to all those who provided their insightful and constructive comments, in particular, to Professor Richard Huntrods of Athabasca University, who provided priceless suggestions and feedback on my essay.
Table of Contents
DEDICATION...... 2
ABSTRACT...... 3
ACKNOWLEDGMENTS...... 4
CHAPTER 1 – INTRODUCTION...... 7
Data Governance: The History...... 7
Data Governance: The Current Literature on the Topic...... 8
Data Governance: The Future...... 9
CHAPTER 2 – DATA GOVERNANCE PROCESS...... 11
Data Governance Process...... 11
CHAPTER 3 – ISSUES, CHALLENGES AND TRENDS...... 43
The Potential Overlay Task...... 43
Match Duplicate Suspects to Create a New Master Record...... 44
Link Related Records from Multiple Sources...... 45
CHAPTER 4 – FUZZY LOGIC...... 48
Traditional Logic...... 48
Fuzzy Logic History...... 51
The Basic Concept of Fuzzy Logic...... 52
A Fuzzy Implementation...... 52
Brief Discussion...... 57
CHAPTER 5 – CONCLUSIONS...... 57
References...... 58
List of Figures
Figure 1: Data Governance Process...... 11
Figure 2: MDM Process...... 20
Figure 3: MDM Initial Load Process...... 24
Figure 4: MDM Delta Load Process...... 26
Figure 5: Quality Stage Initial Load Process...... 29
Figure 6: Quality Stage Delta Load Process...... 29
Figure 7: Case 5...... 43
Figure 8: Case 3...... 45
Figure 9: Case 2...... 46
Figure 10: Cases...... 47
Figure 11: Training Set...... 49
Figure 12: Traditional Decision Tree...... 51
Figure 13: Fuzzy MF...... 52
Figure 14: Traditional Decision Tree...... 55
Figure 15: Decision Matrix...... 56
CHAPTER 1 – INTRODUCTION
Data Governance: The History
Data governance is an emerging discipline with an ever-evolving definition. The discipline embodies a convergence of data quality, data management, data policies, business process management, and risk management surrounding the handling of data in an organization. [1] The central point of this definition of data governance is related to data quality.
From the point of view of businesses, data governance needs to be able to provide qualified information. The data governance process is the practice of transforming data into qualified information, which can be used by businesses. Incidentally, the concept of data governance has been around since the beginning of relational databases. Data is stored across referenced tables, and businesses can retrieve information by joining the data through cross-referencing those tables. With the growth of information technology, databases have gradually become a central part of information systems. In order to insert qualified data into databases, data governance extends from databases into a set of processes, defined as extracting, transforming, and loading (ETL), in order to provide databases with clean, accurate, and prompt data feeds. New terms such as metadata, data source, target, and staging are emerging with the ETL approach. There are numerous ETL tools available on the market, such as Informatica and Ab Initio. However, the motivation for ETL comes from an information technology (IT) perspective and focuses on IT techniques. In 2004, IBM started to introduce data governance as a discipline for treating data as an enterprise asset. [3] As a financial asset, data has to be treated like other financial assets, just as one would treat plant and equipment. A data inventory is required for enterprises with existing data, in much
the same way as inventories are needed for physical assets. Preventing unauthorized changes to critical data should also be considered, since this can affect the integrity of financial reporting, as well as the quality and reliability of daily business decisions. [3] Protecting sensitive data and intellectual property from both internal and external threats is another element that falls under data governance. Since data is a business asset, the question of how to maximize its value is also under the umbrella of data governance.
Data Governance: The current literature on the topic
As an emerging form of technology, data governance has been mainly supported by business vendors rather than academic research. For example, performing a query on the subject
“data governance” on the ACM Digital Library (Association for Computing Machinery) yields only 2824 results (queried on Aug 27, 2014). In contrast, the same query performed on Google yields 36,200,000 results (queried on Aug 27, 2014). The technologies pushed by business vendors share common challenges, such as broad fundamental concepts whose aspects are emphasized differently by each vendor. For example, Oracle does not buy into the unified processes introduced in IBM's white paper. In addition, the concepts and practices of data governance are still shadowed by their predecessors, such as ETL, data warehousing, and ERP products. “MDM is effectively Data
Warehousing branded with ERP market rhetoric and contains an added repository of 'master data'. We see MDM as another attempt at data integration due to the failure of previous Data
Warehousing, ERP and ERPII/BI initiatives.” [17] Although many companies prefer specialized
MDM solutions, the three main players in the MDM market are IBM, Oracle and SAP.
Data Governance: The Future
Data governance is constantly evolving and morphing into new forms. This evolution has produced a next generation of data that is beginning to enter companies. Unlike traditional data, next-generation data will be part of companies' daily routines. For example, when we make a cellphone call, relationship data (which includes the callers' names, phone numbers, and locations) is collected. Likewise, transactional data (which includes the time and duration of the call) is collected as well. Such big data is not limited to mobile data; it includes GPS coordinates, location-awareness data, and social interactions on sites such as LinkedIn and Facebook. The way that next-generation data is captured through the cloud will definitely change the way we deal with traditional data. It is one thing to be flooded with big data; it is another to make sense of it and then act on it, or make recommendations for a human or another system to act on it. [6] Big data by itself is merely unstructured data; we have to analyze the data in order to understand it. MDM and data governance processes will make this analysis more efficient. Through data governance's identity resolution, we can have a single view of an entire company's data. With data governance, we will not be drowned by next-generation big data; rather, we can understand its relationships and react to it quickly.
Big data and the cloud, which generate and deliver real-time data, will require us to react in real time, while next-generation data governance will help us understand and react to real-time data.
In addition, unlike traditional data, big data may be owned by a number of brokers or a third party. The next-generation data governance process should also have the ability to accept
different protocols.
CHAPTER 2 – DATA GOVERNANCE PROCESS
Data Governance Process
Below is a diagram detailing IBM's data governance process: [6]
Figure 1: Data Governance Process
Note. Adapted from “The IBM Data Governance Unified Process” by Sunil Soares, 2010, p. 8. Copyright 2010 by MC Press Online, LLC. Adapted with permission.
1) Define the business problem
The main reason for the failure of data governance programs is that they do not identify a tangible business problem. It is imperative that the organization defines the initial scope of the
data governance program around a specific business problem, such as a failed audit, a data breach, or the need for improved data quality for risk-management purposes. Once the data governance program begins to tackle the identified business problems, it will receive support from the business functions to extend its scope to additional areas.
2) Obtain executive sponsorship
It is important to establish sponsorship from key IT and business executives for the data governance program. The best way to obtain this sponsorship is to establish value in terms of a business case and quick hits. For example, the business case might be focused on householding and name-matching in order to improve the quality of data to support a customer-centricity program.
3) Conduct a maturity assessment
Every organization needs to conduct an assessment of its data governance maturity, preferably on an annual basis. The IBM Data Governance Council has developed a maturity model based on 11 categories (discussed in Chapter 5), such as Data Risk Management and
Compliance, Value Creation, and Stewardship. The data governance organization needs to assess the company’s current level of maturity (current state) and the desired future level of maturity (future state). The company's future state is usually projected at a time frame spanning 12 to 18 months ahead. This duration must be long enough to produce results.
However, at the same time, it must be short enough to ensure the continued buy-in from key stakeholders.
4) Build a road map
The data governance organization needs to develop a roadmap to bridge the gap between the current state and the desired future state for the eleven categories of data governance maturity. For example, the data governance organization might review the maturity gap for stewardship and determine that the enterprise needs to appoint data stewards who will focus on targeted subject areas such as the customer, vendor, and product. The data governance program also needs to include quick hit areas where the initiative can drive near-term business value.
5) Establish an organizational blueprint
The data governance organization needs to build a charter to govern its operations, and to ensure that it has enough authority to act as a tiebreaker in critical situations. Data governance organizations operate best in a three-tier format. The top tier is the data governance council, which consists of the key functional business leaders who rely on data as an enterprise asset. The middle tier is the data governance working group, which consists of middle managers. The final tier consists of the data stewardship community, which is responsible for the quality of the data on a day-to-day basis.
6) Build a data dictionary
The effective management of business terms can help ensure that the same descriptive
language applies throughout the organization. A data dictionary or business glossary is a repository with definitions of key terms. It is used to gain consistency and agreement between the technical and business sides of an organization. For example, what is the definition of a
“customer”? Is a customer someone who has made a purchase, or someone who is considering a purchase? Is a former employee still categorized as an “employee”? Are the terms “partner” and “reseller” synonymous? These questions can be answered by building a common data dictionary. Once implemented, the data dictionary can span the organization to ensure that business terms are tied via metadata to technical terms and that the organization has a single, common understanding.
7) Understand data
Someone once said, “You cannot govern what you do not first understand.” Few applications stand alone today. Rather, they are made up of systems, and “systems of systems,” with applications and databases across the enterprise, yet integrated, or at least interrelated. The relational database model worsens the situation through the fragmentation of business entities for storage. How, then, is everything related? The data governance team needs to discover the critical data relationships across the enterprise. Data discovery may include simple and hard-to-find relationships, as well as the locations of sensitive data within the enterprise’s IT systems.
8) Create a metadata repository
Metadata is data that describes other data. It is information regarding the characteristics of any data artifact, such as its technical name, business name,
location, perceived importance, and relationships to other data artifacts in the enterprise. The data governance program will generate a lot of business metadata from the data dictionary and a lot of technical metadata during the discovery phase. This metadata needs to be stored in a repository so that it can be shared and leveraged across multiple projects.
9) Define metrics
Data governance needs to have robust metrics to measure and track progress. The data governance team must recognize that when something is measured, performance improves.
As a result, the data governance team must pick a few key performance indicators (KPIs) to measure the ongoing performance of the program. For example, a bank will want to assess the overall credit exposure by industry. In that case, the data governance program might select a percentage of null Standard Industry Classification (SIC) codes as a KPI, to track the quality of risk management information.
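The null-SIC-code KPI described above can be computed with a short script. The sketch below is only an illustration; the record layout and the field name "sic_code" are assumptions, not details of any actual bank system.

```python
# Sketch of the null-SIC-code KPI, assuming customer records are
# simple dicts with a (hypothetical) "sic_code" field.
def null_sic_percentage(records):
    """Return the percentage of records whose SIC code is missing or empty."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not r.get("sic_code"))
    return 100.0 * missing / len(records)

customers = [
    {"name": "Acme Ltd", "sic_code": "6021"},
    {"name": "Widget Co", "sic_code": None},
    {"name": "Example Inc", "sic_code": ""},
]
# Two of the three illustrative records lack a SIC code.
print(null_sic_percentage(customers))
```

Tracked over time, a falling percentage gives the data governance team a concrete, reportable measure of improving risk-management data quality.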
10) Govern master data
The most valuable information within an enterprise (critical data about customers, products, materials, vendors, and accounts) is commonly known as master data. Despite its importance, master data is often replicated and scattered across business processes, systems, and applications throughout the enterprise. Governing master data is an ongoing practice whereby business leaders define the principles, policies, processes, business rules, and metrics for achieving business objectives by managing the quality of their master data.
Challenges regarding master data tend to bedevil most organizations, but it is not always easy to get the right level of business sponsorship to fix the root cause of the issues. As a
result, it is important to justify an investment in a master data initiative. For example, consider an organization such as a bank, which sends multiple pieces of mail to the same household.
The bank can establish a quick return on investments by cleansing its customer data to create a single view of the “household.” The bottom line is that the vast majority of data governance programs deal with issues around data stewardship, data quality, master data, and compliance.
11) Govern analytics
Enterprises have invested huge sums of money to build data warehouses to gain competitive insight. However, these investments have not always yielded results. As a consequence, businesses are increasingly scrutinizing their investments. We define the “analytics governance” track as the setting of policies and procedures to better align business users with the investments in analytic infrastructure. Data governance organizations need to ask the following questions:
❏ How many users do we have for our data, by business area?
❏ How many reports do we create, by business area?
❏ Do the users derive value from these reports?
❏ How many report executions do we have per month?
❏ How long does it take to produce a new report?
❏ What is the cost of producing a new report?
❏ Can we train the users to produce their own reports?
Many organizations will want to set up a Business Intelligence Competency Centre (BICC) to educate users, increase business intelligence, and develop reports.
12) Manage security and privacy
Data governance leaders, especially those who report to the chief information security officer, often have to deal with issues around data security and privacy.
Some of the common data security and privacy challenges include:
❏ Where is our sensitive data?
❏ Has the organization masked its sensitive data in non-production environments (for example, in development, testing, and training) to comply with privacy regulations?
❏ Are database audit controls in place to prevent privileged users, such as DBAs, from accessing private data, such as employees' salaries and customer lists?
13) Govern the information lifecycle
Unstructured content makes up more than 80 percent of the data within the typical enterprise. As organizations move from data governance to information governance, they start to consider the governance of this unstructured content.
The lifecycle of information starts with data creation and ends with its removal from production. Data governance organizations have to deal with the following issues regarding the lifecycle of information:
❏ What is our policy regarding digitizing paper documents?
❏ What is our records management policy for paper documents, electronic documents, and email? (In other words, which documents do
we maintain as records and for how long?)
❏ How do we archive structured data to reduce storage costs and improve performance?
❏ How do we bring structured and unstructured data together under a common framework of policies and management?
14) Measure the results
Data governance organizations must ensure continuous improvement by constantly monitoring metrics. In step nine, the data governance team sets up the metrics. In this step, the data governance team reports to senior stakeholders on the progress of those metrics from IT and the business.
Data Governance Business Application
Today, banking systems establish and maintain line of business (LoB) specific customer views with associated accounts and product holdings, either in product systems or in LoB-specific Customer Information Files (CIFs). Thus, the customer, account, and product relationship information resides in siloed applications. This limits the ability to understand the customer holistically (across LoBs) and does not provide an enterprise view of the customer.
The Master Data Management (MDM) initiative enables a complete 360-degree operational view of customers across the bank (the enterprise goal). At the target state, the key capabilities of
MDM are to:
Provide consistent and accurate data about essential business entities derived from a single trusted source
Uniquely identify a customer and all the associated relationships/holdings with the bank, based on the customer's privacy preferences
To achieve the target state objective, the MDM solution will integrate/interface between the numerous LoB specific applications, consolidate the data, and create a single golden master record.
Below is a typical data governance business (Master Data Management) application diagram:
Figure 2: MDM Process
The solution overview diagram clearly depicts the various sub-systems in the solution. At a high level, the entire solution is classified into the following layers:
Presentation Layer
OCIF Sub-system
Data Integration and Quality Layer
Application Layer
Database Layer
Presentation Layer
The presentation layer of the solution comprises the user interface applications. The following user interface applications are included:
Reporting User Interface
Data Stewardship User Interface
Business Administration User Interface
The Reporting user interface will generate business and stewardship reports on the data available in MDM. The Data Stewardship user interface will provide various options for working with customer information, along with searching for and handling duplicate or potentially duplicate customers. The Data Administration user interface will manage reference data and other metadata in the MDM database.
OCIF Sub-System
The OCIF is an existing authoritative operational source of customer information used by multiple systems. This sub-system is presently considered a ‘Book of Record’ in the enterprise. The key objective of this system is to create and maintain standardized and consistent customer information across systems, reducing potential duplicate customers and significantly improving customer data integrity so that it can be treated as a single source of truth. In the current solution context, this system is considered the only source system from which customer information will be loaded into the MDM database. Based on the solution overview diagram, there will be two approaches to data synchronization between the systems. These are:
The initial load – the entire content of the database
The delta load – the difference in content between the last day and the current day
To populate data into MDM from OCIF, an OCIF component/utility is required to extract the necessary data. This new component will be responsible for providing extracts on a daily basis, which will be the input for downstream sub-systems to transform and load into MDM, thus synchronizing the two systems.
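The distinction between the initial load and the delta load can be made concrete with a small sketch. The snippet below is illustrative only: it assumes each daily extract can be represented as a dictionary keyed by a (hypothetical) customer ID, and it classifies today's records against yesterday's full extract.

```python
# Illustrative delta-load computation: classify today's extract against
# yesterday's. Record keys and field names are hypothetical.
def compute_delta(previous, current):
    """Split the current extract into adds, updates, and deletes."""
    adds = {k: v for k, v in current.items() if k not in previous}
    updates = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    deletes = {k: v for k, v in previous.items() if k not in current}
    return adds, updates, deletes

yesterday = {"C1": {"name": "Ann Lee"}, "C2": {"name": "Bo Chan"}}
today = {"C1": {"name": "Ann Lee-Smith"}, "C3": {"name": "Dara Oz"}}
adds, updates, deletes = compute_delta(yesterday, today)
```

An initial load simply treats every record as an add; thereafter, only the much smaller delta needs to flow through the daily synchronization.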
Information Integration Layer
Information Integration Layer – DataStage
The information integration layer is a key component that is responsible for integrating OCIF and the MDM server application. The data format provided by OCIF is not compatible with the
MDM server and hence is not directly consumable. DataStage, as part of the integration layer, is responsible for transforming OCIF extracts into an MDM-specific format.
The key objectives of this layer are to:
Read extracts provided by the source system
Transform the extract into the required format based on the synchronization mechanism
Transform the extract file into the format required by the data quality component for standardization during the initial load
Transform reference values into MDM-specific codes depending on the source system reference value
Load the transformed data into a database/file
The IIS DataStage component is responsible for reading extracts from the source system, transforming them into SIF format and pushing data into MDM in two different ways:
Directly into the database during the initial load
Writing into files (ExSIF) for the delta load
The following sections detail the approaches to be followed in the ETL layer.
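Since records move between these components as pipe-delimited SIF files, a minimal parsing sketch may help fix ideas. The layout below (record type, sub-type, then payload fields) is an assumption for illustration; the actual SIF field positions are defined by the MDM product.

```python
# A minimal sketch of parsing one pipe-delimited SIF record, assuming a
# hypothetical layout: record type, sub-type, then the payload fields.
def parse_sif_record(line):
    fields = line.rstrip("\n").split("|")
    return {"record_type": fields[0],
            "sub_type": fields[1],
            "payload": fields[2:]}

rec = parse_sif_record("PERSON|NAME|William|Chen")
```

A real extract job would dispatch each parsed record to the appropriate downstream job based on its record type and sub-type, as described in the steps below.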
The diagram below describes the high level steps to be performed in DataStage during the initial data load.
Figure 3: MDM Initial Load Process
1. A custom DataStage extract job will be developed to read extract files from the ETL receiving zone and parse each record, based on record type and sub-type, into individual records of the SIF format, which is a pipe-delimited standard interface file.
2. Validation jobs will be responsible for data standardization. They will also perform SIN validation and phone number validation. Any failed record information will be logged into an error log file through error handling jobs.
3. An ETL job will be invoked to populate a separate file for standardization, which will be
used by QualityStage. The above steps will generate the SIF files for consumption by the BIL jobs.
4. The BIL import job imports the SIF file for processing.
5. A validation job validates the code column value and invokes error handling framework jobs in the case of failure. In such scenarios, the records that are the source of the issue are dropped from the requested SIF file. Based on the initial load strategy, the dropping of records is minimized so that MDM remains synchronized with the source system to the highest degree.
6. The party referential integrity validation job ensures every party has either a valid
PersonName or OrgName record and also verifies that a valid party record exists for
the “Provided By” Source System Key (SSK).
7. The BIL consists of one job for each Record Type or Sub-Type (RT/ST) that performs key assignment and database loading. For example, the Contact key assignment job assigns CONT_ID, PERSON_ID, ORG_ID and CONTEQUIV_ID to CONTACT, PERSON, ORG and CONTEQUIV records respectively, and inserts them into the MDM database. Before loading the records into MDM, an MDM Involved Party ID will be generated within the ETL jobs. At a high level, the new MDM Involved Party ID will be 18 characters long, where the first 2 characters indicate the version of the BIL and the last 16 characters are a random number.
8. The data quality error consolidation process reads the data quality error files created
during the import SIF, validation, and referential integrity validation phases and drops
any records associated with the records in the error file.
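The Involved Party ID format described in step 7 (a 2-character BIL version prefix followed by a 16-digit random number) can be sketched as follows. The prefix value "01" and the function name are assumptions for illustration; the actual versioning scheme belongs to the BIL.

```python
import secrets

# Hypothetical sketch of generating an 18-character MDM Involved Party ID:
# a 2-character BIL version prefix plus 16 random digits, per the format
# described in step 7 above. The "01" prefix is an assumed example value.
def generate_involved_party_id(bil_version="01"):
    assert len(bil_version) == 2
    random_part = "".join(str(secrets.randbelow(10)) for _ in range(16))
    return bil_version + random_part

pid = generate_involved_party_id()
```

A production implementation would also need to guard against the (rare) possibility of collisions among the random suffixes, for example with a uniqueness constraint on the target column.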
The diagram below describes the high level steps to be performed in DataStage during the delta data load.
Figure 4: MDM Delta Load Process
1. The custom DataStage extract jobs from the initial load process will be re-used to read extract files from the ETL receiving zone and to parse each record based on record type or sub-type.
2. Data validation jobs are responsible for CII data standardization. They will also perform SIN validation and phone number validation. Any failed records will be logged into the
log file through an error handling mechanism. The above two steps essentially
generate the SIF files for consumption of the BIL asset.
3. The DataStage import job imports the SIF file for processing.
4. Applicable business transformation rules are invoked using DataStage transformation
jobs which are responsible for generating extended SIF files for MDM to consume.
Errors are logged using DataStage's out-of-the-box error handling mechanism for further
analysis and action.
Data Quality Management – QualityStage
The master data hub solution is about providing complete, accurate, standardized information about the customers stored in the MDM system. Even though OCIF maintains its own data quality, customer attributes need further standardization before they are stored in MDM, as MDM will be the single version of truth for customer data throughout the enterprise. The
QualityStage component is primarily responsible for data standardization, the improvement of overall quality of the data asset of the enterprise, and identification of duplicate/potentially duplicate customers. The current solution places QualityStage with the following objectives:
Standardize name and address related attributes
Validate and correct customers' addresses with the Canada post address
repository implemented through SERP
Perform probabilistic matching to identify potential duplicate customers
The IIS QualityStage component is responsible for maintaining data quality stored in MDM.
The key objectives of QualityStage are:
Name and address standardization
Identifying duplicate/potentially duplicate customers
Matching
The diagram below describes the high level steps to be performed during the initial load.
Figure 5: Quality Stage Initial Load Process
The diagram below describes the high level steps to be performed during the delta load.
Figure 6: Quality Stage Delta Load Process
Individual Customer Name Standardization
This standardization procedure will receive an individual name from MDM and process it through the MNNAME rule set. The MNNAME rule set will parse the individual name into separate name elements and create an analysis value, or phonetic representation value, for the first and last name of the individual.
For example:
If an individual by the name of “Mr William Chen” was passed to the individual standardization procedure, this would be the standardization result.
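The phonetic representation that MNNAME produces follows the same general idea as classic phonetic coding schemes. The MNNAME rules themselves are part of the QualityStage product and are not reproduced here; as an illustration of the concept only, the sketch below implements standard Soundex, which maps similar-sounding names to the same short code.

```python
def soundex(name):
    """Standard Soundex phonetic code; illustrates the idea behind a
    phonetic representation value. MNNAME's actual rules differ."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    first = name[0].upper()
    digits = []
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels reset prev, so repeats across vowels are kept
    return (first + "".join(digits) + "000")[:4]

# "Chen" and the misspelling "Chan" receive related codes, so a later
# matching pass can still compare them as candidates.
print(soundex("William"), soundex("Chen"))
```

Because both the original and a slightly misspelled name collapse to the same (or a nearby) code, matching on the phonetic value tolerates data-entry variation that exact string comparison would miss.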
Organizational Customer Name Standardization
This standardization procedure will receive an organization name from MDM and process the organization name through the MNNAME rule set. The MNNAME rule set will parse the organization name into separate word elements and create an analysis value or phonetic representation value for word1 and word2 of the organization name.
For Example:
If the organization name of “Bank of Example” was passed to this organization standardization procedure, this would be the standardization result.
The important thing to note is that the original name fed into QualityStage from MDM will be passed back to MDM. QualityStage does not change or enhance the organization name in any way. QualityStage parses the name into smaller elements for matching purposes only.
MDM will receive the original name, the phonetic representation of organization name, and the standardized name.
Address Standardization
This standardization procedure will receive an address from MDM and process the address through the MDMCAADDR and MDMCAAREA rule sets. The MDMCAADDR rule set will parse the address into separate address elements and create an analysis value or phonetic representation value for the street name. The MDMCAAREA rule set will parse the city, province, and postal code into separate address elements and create an analysis value or phonetic representation value for the city name.
For example:
If the address of “123 Maple Street Unit 5” was passed to this address standardization procedure, this would be the standardization result.
Matching
In order to maintain data quality, adding or updating a customer will trigger the matching process. Individual and organizational customers are processed by different match specifications in QualityStage, each consisting of blocking parameters and scoring specifications for the different passes. The MDM service provides QualityStage (QS) with a set of candidates by searching the MDM database through the blocking parameters of each pass. The QS matching process compares and scores each candidate and returns the match result to MDM. In order to implement the match specification and respond to MDM requests, an ISD job and shared containers are created for the interface.
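The blocking-and-scoring flow described above can be sketched briefly in Python. The field names, weights, and threshold below are assumptions for illustration, not the production match specification:

```python
# Sketch of one blocking-and-scoring match pass: candidates sharing a
# blocking key are fetched, then scored field by field against a threshold.
def blocking_key(rec):
    # Block on postal code plus the first letter of the last name (assumed keys).
    return (rec["postal_code"], rec["last_name"][:1])

def score(a, b, weights={"last_name": 4, "first_name": 3, "street": 3}):
    # Sum the weights of the fields that agree exactly (assumed weights).
    return sum(w for f, w in weights.items() if a[f] == b[f])

def match(incoming, database, threshold=7):
    candidates = [r for r in database if blocking_key(r) == blocking_key(incoming)]
    return [(r, score(incoming, r)) for r in candidates
            if score(incoming, r) >= threshold]

db = [{"last_name": "SMITH", "first_name": "JOHN",
       "street": "123 MAIN ST", "postal_code": "X1X1X1"}]
new = {"last_name": "SMITH", "first_name": "JOHN",
       "street": "123 MAIN ST", "postal_code": "X1X1X1"}
print(match(new, db))  # one candidate, score 10
```

Blocking keeps the comparison set small (only records in the same block are scored), which is why the MDM service searches the database through blocking parameters before QualityStage scores each candidate.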
Application Layer
The MDM server is the application component of the solution that interfaces with the data source where the master data is stored. It is also responsible for providing various features for managing and maintaining master data, keeping the data source a single version of the truth. The application is responsible for:
- Interfacing with the master data source through various protocols
- Managing master data through the exposed interfaces with other sub-systems and external sources
- Controlling access in terms of data visibility and enhancing data security
- Identifying and providing information on a potential candidate list of duplicate customers, to assist QualityStage in determining detailed information on customer duplication, and storing it in the data source
- Merging two or more customers to enforce the MDM data source as a single view of the customer and a single version of the truth
- Providing a user interface to merge and maintain customer information where duplicates are potential and not guaranteed
- Providing a user interface to manage and configure metadata
Database Layer
The database layer in the solution is responsible for storing all the master data. It also stores the history data, audit data, and meta-data required for the MDM application to execute. During the initial load, the database is populated directly by the information integration layer. Once the initial population is successfully completed, daily extracts from the source systems will be loaded into the MDM database through the MDM batch framework and maintenance services.
Apart from business data, the database layer also contains the meta-data required by the MDM application. This meta-data is another key set of information: it is configured for the MDM application and controls its behaviour and functionality.
Data Quality Management in Detail
For example, if we have the following input file:
File Name: B2B Personal Cardholders (Profile: B2BPC)

ID | Name | Address | Phone Number | Party Type
1 | John Smith | 123 Main Street, Toronto, Ontario, Canada X1X1X1 | 416-549-7061 |
2 | ABC Limited | Alberta, Y2Y2Y2 | 7061 |
3 | Jane Smith | Ontario, Canada X1X1X1 | |
4 | A | | |
There are several baseline data quality requirements that must be followed in order to maintain data quality:

Requirement: Name Formatting and Standardization
If a free-form name, i.e. one that arrives unparsed as a single string, is received as an input, the MDM matching solution should parse or tokenize the name into the common format required for processing. E.g., "John Smith" may need to be tokenized into First Name = John and Last Name = Smith.

Requirement: Address Formatting and Standardization
If a free-form address, i.e. one that arrives unparsed as a single string, is received as an input, the MDM matching solution should parse or tokenize the address into the common format required for processing, for both Canadian and US addresses. E.g., "123 Main Street" may need to be tokenized into Street Number = 123 and Street Name = Main Street. If the country code or name is missing in the incoming files, the Canadian address standardization rules will be applied as a default.

Requirement: Address Validation and Correction
All addresses received in the input files should be validated and corrected based on checks with Canada Post. In the case of an address correction, the address as provided by Canada Post will be applied.

Requirement: Phone Number Formatting and Standardization
If a free-form phone number, i.e. one that arrives unparsed as a single string, is received as an input, the MDM matching solution should parse or tokenize the phone number into the common format required for processing. E.g., "416-549-7061" may need to be tokenized into Area Code = 416 and Number = 549-7061.

Requirement: Name Patterns
1. The MDM matching solution should develop data processing rules to handle the following patterns that may occur in the 'name' fields.
For individuals, the connectors that identify such patterns are:
a. Space And Space (e.g., John And Jane Smith)
b. Space and Space (e.g., John and Jane Smith)
c. & (e.g., John&Jane Smith)
d. Space & Space (e.g., John & Jane Smith)
e. / (e.g., John/Jane Smith)
f. Space / Space (e.g., John / Jane Smith)
g. \ (e.g., John\Jane Smith)
h. Space \ Space (e.g., John \ Jane Smith)
For organizations, the connectors that identify such patterns are:
i. / (e.g., John/ABC Limited)
j. Space / Space (e.g., John / ABC Limited)
k. \ (e.g., John\ABC Limited)
l. Space \ Space (e.g., John \ ABC Limited)
2. If a name pattern contains the 'And', 'and' or '&' connector, the following requirements should be developed:
a. A lookup against the organization name directory should be performed.
b. If the name pattern matches an organization name from the directory, the record should not be split into discrete records.
c. If the name pattern does not match an organization name from the directory, the record should be split into discrete records.
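The 'And'/'and'/'&' rule can be sketched as follows. The directory contents and the shared-family-name assumption are illustrative only, not the production rule set:

```python
import re

# Hypothetical sketch: look the raw name up in an organization directory
# first; only split it into discrete person records when it is not a
# known organization name.
ORG_DIRECTORY = {"JOHNSON AND JOHNSON"}  # assumed directory contents

def split_name(raw: str) -> list:
    if raw.upper() in ORG_DIRECTORY:
        return [raw]                      # keep the organization intact
    parts = re.split(r"\s+(?:and|And)\s+|\s*&\s*", raw)
    if len(parts) == 2:
        first, rest = parts
        last = rest.split()[-1]           # assume a shared family name
        return [f"{first} {last}", rest]
    return [raw]

print(split_name("John and Jane Smith"))   # ['John Smith', 'Jane Smith']
print(split_name("Johnson and Johnson"))   # ['Johnson and Johnson']
```

The directory lookup comes first precisely so that legitimate organization names containing "and" are never broken into bogus person records.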
Matching Process Requirements:

Req. ID 2.3.1 – FR Rules – Overview
1. The MDM matching rules should be designed and developed to match incoming records across all input files, i.e., match all input files with each other.
2. The MDM matching rules should be designed and developed to match the incoming records with all records stored in the MDM.

Req. ID 2.3.2 – FR Rules – List
The following matching rules should be designed and developed in the MDM environment:
1. Rule 1: Individual Matching – Individual Full Name and Full Address
2. Rule 2: Organizational Matching – Organization Name and Full Address
3. Rule 3: Household Matching – Individual Last Name and Full Address
4. Rule 4: Address Matching – Full Address
5. Rule 5: Phone Number Matching – Full Phone Number
NOTE: Each of the above matching rules should generate independent match IDs/keys.

Req. ID 2.3.3 – FR Rules – Data Elements
1. Full Name – When the matching rules are based on Full Names, the following discrete data elements should be used:
a. First Name (a.k.a. Given Name)
b. Last Name
c. Name Suffix
d. Organization Name (as applicable)
2. Full Address – When the matching rules are based on Full Address, the following discrete data elements should be used:
a. Apartment / Unit Number
b. Street Number
c. Street Name
d. Street Type
e. City
f. Province
g. Postal Code
h. Country
i. Non-Civic Address Info (as applicable)
3. Full Phone Number – When the matching rules are based on Full Phone Number, the following discrete data elements should be used:
a. Country Code
b. Area Code
c. Number
NOTE: The Individual Matching Process uses 'Residential Primary' or equivalent addresses only, while the Organizational Matching Process uses 'Business Primary' or equivalent addresses only.
NOTE: Phonetic representations of first name, last name, and street name are used by the current MDM matching process.

Req. ID 2.3.4 – FR Rules – Guidelines
1. The corrected postal address should be used by the MDM matching process.
2. Each record from each input file should undergo each of the rules stated above.
NOTE: For example, a record identified as 'Individual' should undergo the Organizational match rule as well.
3. Wherever applicable, the Match IDs/keys generated by the individual matching rules should be cross-referenced in the output files. E.g., a record could have an Individual Match Key of 123 and a Household Match Key of 456.
4. A separate match ID/key should be generated for records within the MDM that do not have a match with records in the input files.

Req. ID 2.3.5 – FR Rules – Weights, Thresholds and Categories
1. The MDM matching solution should be designed and developed for 'looser' matching rules.
2. The weights and thresholds currently assigned in the MDM environment should be used as a starting point for the design and development.
3. The match categories currently identified in the MDM environment should be used as a starting point for the design.

Req. ID 2.3.6 – FR Rules – Error Condition
1. In case an incoming record cannot be processed by the MDM matching solution, it should be highlighted in the output file.
2. A description of the reason why the record could not undergo the matching process should be included in the output file.
NOTE: These error descriptions should be as provided by the MDM matching solution, with no new requirements.
Output file:
The output file carries each input record forward together with the processing results, in the following columns: Input File Name, Input Profile ID, Input Name, Input Address, Input Phone Number, Party Type, Sequence Number, Split Name Indicator, Address Validation or Correction Indicator, Address Validation or Correction Description, Corrected Address or Address from MDM, and Match Process.
CHAPTER 3 – ISSUES, CHALLENGES AND TRENDS
Unfortunately, during the MDM matching process, there are still processes that need human intervention, such as the following tasks:
The Potential Overlay Task:
A potential overlay occurs when a record is updated with information that is radically different from the data already in the record. For example, consider the situation illustrated below:
party id | Family Name | Given Name | Gender | Date of Birth | Address | Phone | Last Modified
### | LEWIS | JANE | F | 26-Jun-71 | 100 Kumar Avenue, Markham, Ontario, Canada A2B2C2 | 416-549-7070 | 08/24/06
### | XIANG | LINDA | F | 13-Jan-78 | 456 King Avenue, Calgary, Alberta Y2Y2Y2 | 416-549-7070 | 02/28/98

Figure 7: Case 5
The data steward will mark the record as a potential overlay because the ID field in both records is the same. However, when we look closely at these two records, we can see that Linda Xiang and Jane Lewis are clearly not the same person. The ID 388293023980000000 was created on Feb 28, 1998 and belongs to Linda Xiang. Somehow, on Aug 24, 2006, the record was updated, and it now appears to belong to a woman named Jane Lewis. This may have been caused by a common typographical data-entry mistake: Linda Xiang's record was open on the screen when the customer service representative started typing, not realizing that he or she was typing over someone else's data.
There are also some situations in which this scenario would be perfectly valid. In cases of events such as marriage, divorce, a move, or phone-number change, a person's data would change significantly enough to flag a potential overlay task by a data steward application.
Data mining and fuzzy logic can be used to resolve such potential overlay tasks automatically.
Match Duplicate Suspects to Create a New Master Record:
As part of a data warehouse solution, data governance will match records from multiple lines of business (LOB). There are situations where customers from multiple LOBs may have similar names, addresses, and telephone numbers, and may have fields that are blank or not available (N/A). For instance:
party id | Family Name | Given Name | Date of Birth | Address | Phone
### | VERKIN | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333
### | VERKI | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333

Figure 8: Case 3
The two records above have the same Given Name, Date of Birth, Address, and Phone Number. However, the Family Name is slightly different. Are these two records the same customer? Data governance applications currently available on the market will stop here and wait for human intervention to decide. Through the application of data mining and fuzzy logic, we would be able to identify such cases without human intervention and generate a single customer profile with the best data from all sources.
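Case 3 can be sketched as follows. Python's standard difflib similarity stands in here for a full fuzzy treatment, and the 0.85 threshold and field names are assumptions:

```python
from difflib import SequenceMatcher

# Sketch of resolving Case 3 automatically: when the other fields agree
# exactly, a high string similarity between the family names ('VERKIN'
# vs 'VERKI') is treated as a typo rather than a different customer.
def same_customer(r1, r2, threshold=0.85):
    exact = all(r1[f] == r2[f] for f in ("given", "dob", "address", "phone"))
    sim = SequenceMatcher(None, r1["family"], r2["family"]).ratio()
    return exact and sim >= threshold

a = {"family": "VERKIN", "given": "SMITH", "dob": "5-Aug-60",
     "address": "987 Village Ave, Toronto, Ontario, Canada T2T1C1",
     "phone": "416-222-3333"}
b = dict(a, family="VERKI")
print(same_customer(a, b))  # True
```

With a rule like this, the VERKIN/VERKI pair is merged automatically instead of being parked for a data steward.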
Link Related Records from Multiple Sources:
With overlays, the task verifies the existing records in the system. With duplicate suspects, the task gets rid of extra records. This task, in contrast, links records between systems. The current data steward applications available on the market may not be able to link such records automatically because the records do not have enough data in common, as in the example below:
party id | Name | Date of Birth | Address | Phone | Source
### | GUGGENHEIM REAL ESTATE LLC | 10-May-98 | 123 Main Street, Toronto, Ontario, Canada X1X1X1 | 647-123-4567 | Market
### | GUGGENHEIM REAL ESTATE LLC | 10-May-98 | 123 Main St Unit 10, Toronto, Ontario, Canada X1X1X1 | 647-123-2352 | Auto

Figure 9: Case 2
When one first looks at these two records, they appear to be the same company. However, a closer look reveals some differences. First, the addresses are different: “Unit 10” appears in only one of the records' address fields. Second, the phone numbers are different: one is “647-123-4567” and the other is “647-123-2352”.
Data mining and fuzzy logic can automatically verify that these two records are the same company and link them together.
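A minimal sketch of such linking, assuming two simple normalization rules (dropping the unit number and abbreviating the street type; a real solution would use the standardized QualityStage output instead):

```python
import re

# Link Case 2's two records: normalize the address, then link records
# whose name and normalized address agree even when the phone differs.
def normalize_address(addr: str) -> str:
    addr = addr.upper()
    addr = re.sub(r"\bUNIT\s+\w+\b", "", addr)   # drop the unit number
    addr = addr.replace("STREET", "ST")          # unify the street type
    return re.sub(r"[\s,]+", " ", addr).strip()  # collapse separators

def linkable(r1, r2):
    return (r1["name"] == r2["name"]
            and normalize_address(r1["address"]) == normalize_address(r2["address"]))

a = {"name": "GUGGENHEIM REAL ESTATE LLC",
     "address": "123 Main Street, Toronto, Ontario, Canada X1X1X1"}
b = {"name": "GUGGENHEIM REAL ESTATE LLC",
     "address": "123 Main St Unit 10, Toronto, Ontario, Canada X1X1X1"}
print(linkable(a, b))  # True
```

Once the superficial differences are normalized away, the two records reduce to the same key and can be linked without human review.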
CHAPTER 4 – FUZZY LOGIC
Traditional Logic:
Now let us suppose that we generate the following training set based on the Data Steward application output, covering the potential overlay task, duplicate suspects, and related records from multiple sources:
party id | Family Name | Given Name | Date of Birth | Address | Phone | Source | Class

Case 1 (Class N):
### | VERKIN | YOUSSOU | 5-Aug-60 | 10 Main Street, Markham, Ontario, Canada X2Y1X1 | 915-123-4213 | Mortgage
### | VERKIN | JANE | 5-Aug-60 | 10 Main Street, Markham, Ontario, Canada X2Y1X1 | 915-123-4213 | Auto

Case 2 (Class Y):
### | GUGGENHEIM REAL ESTATE LLC | 10-May-98 | 123 Main Street, Toronto, Ontario, Canada X1X1X1 | 647-123-4567 | Market
### | GUGGENHEIM REAL ESTATE LLC | 10-May-98 | 123 Main St Unit 10, Toronto, Ontario, Canada X1X1X1 | 647-123-2352 | Auto

Case 3 (Class Y):
### | VERKIN | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333 | Life
### | V. | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333 | Auto

Case 4 (Class Y):
### | CREATIVE LEADERSHIM GROUM LTD | 5-Aug-60 | 456 King Avenue, Calgary, Alberta Y2Y2Y2 | 416-549-7070 |
### | CREATIVE LEADERSHIM GROUM LTD | 5-Aug-60 | 456 King Avenue, Calgary, Alberta Y2Y2Y2 | 416-549-7070 |

Case 5 (Class N):
### | LEWIS | JANE | 26-Jun-71 | 100 Kumar Avenue, Markham, Ontario, Canada A2B2C2 | 416-549-7070 | 24/08/2006
### | XIANG | LINDA | 13-Jan-78 | 456 King Avenue, Calgary, Alberta Y2Y2Y2 | 416-549-7070 | 28/02/1998

Figure 10: Cases
Traditional logic is the idea that the outcome can only be either true or false, 1 or 0, right or wrong. This form of logic dates back to ancient Greece and is perfectly adequate for answering simple questions in single dimensions. For example, if A is 1 and B is 0, what is A AND B? It can be extended, as is done in Boolean algebra, to more complex questions, as long as all the parts can be described using the same restricted alphabet of two symbols. Such logic is a deductive way of understanding consequences and is a highly valuable intellectual technique.[12]
If we use the above traditional logic, we will get the following training set:
Case | Family Name | Given Name | Date of Birth | Address | Phone | Class
1 | T | F | T | T | T | N
2 | T | T | T | F | F | Y
3 | F | T | T | T | T | Y
4 | T | T | T | T | T | Y
5 | F | F | F | F | T | N

Figure 11: Training Set
Applying the information gain measure to the above training set, we obtain the gain for each attribute:
Info(D) = −Σ_{i=1..m} p_i · log2(p_i)

Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) · Info(D_j)

Gain(A) = Info(D) − Info_A(D)

For the training set:

Info(D) = −(2/5)·log2(2/5) − (3/5)·log2(3/5) = 0.97

Info_FamilyName(D) = (3/5)·[−(2/3)·log2(2/3) − (1/3)·log2(1/3)] + (2/5)·[−(1/2)·log2(1/2) − (1/2)·log2(1/2)] = 0.95

Info_GivenName(D) = (3/5)·[−(3/3)·log2(3/3) − (0/3)·log2(0/3)] + (2/5)·[−(2/2)·log2(2/2) − (0/2)·log2(0/2)] = 0

Info_DateofBirth(D) = (4/5)·[−(3/4)·log2(3/4) − (1/4)·log2(1/4)] + (1/5)·[−(1/1)·log2(1/1) − (0/1)·log2(0/1)] = 0.65

Info_Address(D) = (3/5)·[−(2/3)·log2(2/3) − (1/3)·log2(1/3)] + (2/5)·[−(1/2)·log2(1/2) − (1/2)·log2(1/2)] = 0.95

Info_Phone(D) = (4/5)·[−(2/4)·log2(2/4) − (2/4)·log2(2/4)] + (1/5)·[−(0/1)·log2(0/1) − (1/1)·log2(1/1)] = 0.8

(where 0·log2(0) is taken as 0)

Hence, the gain in information from such a partitioning would be:

Gain(FamilyName) = Info(D) − Info_FamilyName(D) = 0.97 − 0.95 = 0.02
Gain(GivenName) = Info(D) − Info_GivenName(D) = 0.97 − 0 = 0.97
Gain(DateofBirth) = Info(D) − Info_DateofBirth(D) = 0.97 − 0.65 = 0.32
Gain(Address) = Info(D) − Info_Address(D) = 0.97 − 0.95 = 0.02
Gain(Phone) = Info(D) − Info_Phone(D) = 0.97 − 0.8 = 0.17
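These gains can be checked with a short script over the five training cases (note that, by this computation, Address partitions the classes exactly as FamilyName does, so its gain is also 0.02):

```python
from math import log2

# Recompute the information gains from the T/F training set (Figure 11).
cases = [  # (FamilyName, GivenName, DateOfBirth, Address, Phone, Class)
    ("T", "F", "T", "T", "T", "N"),
    ("T", "T", "T", "F", "F", "Y"),
    ("F", "T", "T", "T", "T", "Y"),
    ("T", "T", "T", "T", "T", "Y"),
    ("F", "F", "F", "F", "T", "N"),
]

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def gain(attr_index):
    labels = [c[-1] for c in cases]
    info_a = sum(len(part) / len(cases) * entropy(part)
                 for v in ("T", "F")
                 for part in [[c[-1] for c in cases if c[attr_index] == v]]
                 if part)
    return entropy(labels) - info_a

for name, i in [("FamilyName", 0), ("GivenName", 1), ("DateOfBirth", 2),
                ("Address", 3), ("Phone", 4)]:
    print(f"Gain({name}) = {gain(i):.2f}")
# GivenName has the highest gain (0.97), so it becomes the split attribute.
```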
Since GivenName has the highest information gain among the attributes, it is selected as the splitting attribute. So we get the following decision tree:
Figure 12: Traditional Decision Tree
One of the issues with the above decision tree is the uncertainty of the attributes. For example, is the name “John Smith” the same as “J. Smith”? The above model provides only two states for each attribute: either the Given Name is the same or it is not. I will now illustrate how to tackle the uncertainty associated with the description of knowledge by using fuzzy logic.
Fuzzy Logic History
The term "fuzzy logic" was introduced with the 1965 proposal of the fuzzy set theory by Lotfi
A. Zadeh .[2][3] Fuzzy logic has been applied to many fields, from control theory to artificial intelligence. Fuzzy logic however had been studied since the 1920s as infinite-valued logic notably by Łukasiewicz and Tarski.[4]
The Basic Concept of Fuzzy Logic
Fuzzy mathematics forms a branch of mathematics related to fuzzy set theory and fuzzy logic. It started in 1965 after the publication of Lotfi Asker Zadeh's seminal work Fuzzy Sets.[1] A fuzzy subset A of a set X is a function A: X → L, where L is the interval [0, 1]. This function is also called a membership function. A membership function is a generalization of a characteristic function, or indicator function, of a subset defined for L = {0, 1}. More generally, one can use a complete lattice L in the definition of a fuzzy subset A.[9]
A Fuzzy Implementation:
For each input and output variable selected, I define two or more membership functions (MF), with a qualitative category for each one, for example true or false. The shape of these functions can be diverse, but I will work with a triangle, which needs three points to define one MF of one variable. Below is the triangle for the variable GivenName:
(The figure plots membership degree y against x over the points x0, x1, x2, x3, x4: a triangular MF 'true' peaking at 1, and a trapezoidal MF 'false'.)
Figure 13: Fuzzy MF
If we take GivenName as the variable, 'true' as the triangle, and 'false' as the trapezoid (see the figure above):
– the MF 'true' is defined by three points (x0, x1, x2), where x0 is any negative value;
– the MF 'false' is defined by four points (x1, x2, x3, x4), where x4 is any positive value greater than x3; this means that 'false' is 1 from x2 to infinity.
We have the following MF for Given Name:
y_true(x; x0, x1, x2) = max( min( (x − x0)/(x1 − x0), (x2 − x)/(x2 − x1) ), 0 )   (triangle)

y_false(x; x1, x2, x3, x4) = max( min( (x − x1)/(x2 − x1), 1, (x4 − x)/(x4 − x3) ), 0 )   (trapezoid)
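These two membership functions can be written directly in code. The concrete points x0 = −1, x3 = 15, and a large finite x4 are assumptions standing in for "any negative value" and infinity:

```python
# Triangle and trapezoid membership functions for the GivenName variable.
def mf_true(x, x0=-1.0, x1=0.0, x2=5.0):
    """Triangular MF: rises from x0 to a peak of 1 at x1, falls to 0 at x2."""
    return max(min((x - x0) / (x1 - x0), (x2 - x) / (x2 - x1)), 0.0)

def mf_false(x, x1=0.0, x2=5.0, x3=15.0, x4=1e9):
    """Trapezoidal MF: 0 at x1, reaching 1 at x2 (x4 is effectively infinite)."""
    return max(min((x - x1) / (x2 - x1), 1.0, (x4 - x) / (x4 - x3)), 0.0)

print(mf_true(3))   # 0.4
print(mf_false(3))  # 0.6
```

An input of 3 thus belongs to 'true' with degree 0.4 and to 'false' with degree 0.6, which is the partial membership that two-valued logic cannot express.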
For the Given Name variable, I use the Levenshtein distance to calculate the value of x:
If the distance is 0, the two values are identical; if the distance is between 0 and 5, the two values are somewhat similar. If the distance is greater than 5, then the two values are not the same at all.
After the above specification, we have the fuzzified real value for GivenName. For example, for “kitten” and “sitting”, with distance 3, we get:

y_true = max( min( (3 − (−∞))/(0 − (−∞)), (5 − 3)/(5 − 0) ), 0 ) = 0.4

y_false = max( min( (3 − 0)/(5 − 0), 1, (∞ − 3)/(∞ − 15) ), 0 ) = 0.6
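The same example can be run end to end: a standard Levenshtein distance followed by fuzzification with x1 = 0 and x2 = 5 (the 'false' degree is taken as the complement here, which agrees with the trapezoid for this distance):

```python
# Fuzzify GivenName via Levenshtein distance ('kitten' / 'sitting' -> 3).
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzify(a: str, b: str):
    d = levenshtein(a, b)
    y_true = max(min(1.0, (5 - d) / 5), 0.0)  # triangle with x1 = 0, x2 = 5
    return y_true, 1.0 - y_true

print(levenshtein("kitten", "sitting"))  # 3
print(fuzzify("kitten", "sitting"))      # (0.4, 0.6)
```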
Decision Tree definition:
Now let's reconsider the decision tree we introduced before:
Figure 14: Traditional Decision Tree
For this simple case, we have the following rule based on the decision tree above:
IF GivenName is equal (T), THEN two records are equal.
Next is to compute the degree of membership to the MF (true, false) of the output (the THEN part). Once a variable such as GivenName is fuzzified, it takes a value between 0 and 1, indicating the degree of membership to a given MF of the specific variable. The degrees of membership of the input variables have to be combined to get the degree of membership of the output. For a single input variable, as in the rule specified above, we can for example have the fuzzy rules shown below:
IF GivenName is equal (T), THEN two records are equal;
IF GivenName is not equal (F), THEN two records are not equal;
According to these rules, if we suppose that the degree of membership of GivenName to the MF 'false' is 0.6, then the degree to which the two records are not equal is 0.6, too.
In case we have more than one input variable, the degree of membership for the output value will be the minimum value of the degree of membership for the different inputs. For example, suppose we have two input variables (GivenName X and Family Name Y) and the decision matrix below:
GivenName \ FamilyName | equal | not equal
equal | equal | not equal
not equal | not equal | not equal

Figure 15: Decision Matrix
If we calculate the attributes as having the following fuzzified values:
y_GivenName(equal) = 0.8
y_FamilyName(not equal) = 0.9
Then we have the following rule satisfied:
IF GivenName is equal (degree of 0.8) and FamilyName is not equal (degree of 0.9) THEN the two records are not equal (degree of 0.8).
y_GivenName(not equal) = 0.8
y_FamilyName(equal) = 0.2
The following rule would also be satisfied:
IF GivenName is not equal (degree of 0.8) and FamilyName is equal (degree of 0.2) THEN the two records are not equal (degree of 0.2)
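The min-combination described above can be encoded with the Figure 15 decision matrix as a lookup table (a sketch, not a full inference engine):

```python
# Decision matrix from Figure 15: (GivenName state, FamilyName state)
# -> output MF, fired at the minimum of the input membership degrees.
RULES = {
    ("equal", "equal"): "equal",
    ("equal", "not equal"): "not equal",
    ("not equal", "equal"): "not equal",
    ("not equal", "not equal"): "not equal",
}

def fire_rule(given_state, given_deg, family_state, family_deg):
    outcome = RULES[(given_state, family_state)]
    return outcome, min(given_deg, family_deg)

print(fire_rule("equal", 0.8, "not equal", 0.9))  # ('not equal', 0.8)
print(fire_rule("not equal", 0.8, "equal", 0.2))  # ('not equal', 0.2)
```

Taking the minimum of the antecedent degrees is the standard fuzzy AND used in the two worked rules above.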
Brief Discussion:
By applying fuzzy logic to the data governance process, we obtain a more accurate decision tree, which enhances the decision-making process. In the above example, the traditional decision tree model must treat FamilyName and GivenName as simply different whenever they differ even slightly, and the conclusion may then be drawn that the two records belong to different persons. When we apply fuzzy logic, however, we may say that the FamilyNames are not equal only to some extent (say 20% not equal) and that the GivenNames are somewhat equal (say to degree 0.3). In that case, the records would be considered to belong to the same person based on the fuzzified logic, and a more accurate result is obtained.
CHAPTER 5 - CONCLUSIONS
In this essay, the history of data governance was discussed, along with the current literature and the future of the process. The data governance process itself was then explained, and it was found that the central concern of data governance is data quality. In order to improve the data quality of the master data repository, fuzzy logic was applied to the data governance process. With data governance constantly evolving, there is a standing requirement to guarantee its quality. Applying fuzzy logic helps to improve the quality of data governance: it improves not only the data quality process but also the automation of that process.
REFERENCES
1. Data Governance (November 7, 2013). In Wikipedia, the free encyclopedia. Retrieved December 5, 2013, from http://en.wikipedia.org/wiki/Data_governance
2. A Brief History of Data Quality (March 25, 2009). Data Governance Insider: Covering the world of big data and data governance. Retrieved from http://data-governance.blogspot.ca/2009/03/brief-history-of-data-quality.html
3. Nigel Turner (November 15, 2013). Kindling the Flames: The Future of Data Governance. Retrieved December 11, 2013, from http://smartdatacollective.com/dat-mai/167531/kindling-flames-future-data-governance
4. Rick Sherman (2011). A Must to Avoid: Worst Practices in Enterprise Data Governance. Retrieved from http://searchdatamanagement.techtarget.com/feature/A-must-to-avoid-Worst-practices-in-enterprise-data-governance
5. Marketing Data Governance in the Era of "Big Data". Retrieved from http://www.kbmg.com/wp-content/uploads/2013/07/Winterberry-Group-White-Paper-Marketing-Data-Governance-July-2013.pdf
6. Sunil Soares (September 2010). The IBM Data Governance Unified Process. Ketchum, USA: MC Press Online, LLC.
7. Julie Langenkamp-Muenkel (October 2013). MDM and Next-Generation Data Sources. Information Management.
8. Huey-Li Chen, Long-Hui Chen and Chien-Yu Huang (2009). Fuzzy Goal Programming Approach to Solve the Equipments-Purchasing Problem of an FMC. International Journal of Industrial Engineering, 16(4), 270-281.
9. Fuzzy Mathematics (November 28, 2013). In Wikipedia, the free encyclopedia. Retrieved February 2, 2014, from http://en.wikipedia.org/wiki/Fuzzy_mathematics
10. A Fuzzy Implementation. Retrieved November 15, 2014, from http://apps.ensic.inpl-nancy.fr/benchmarkWWTP/RiskAnalysis/RiskWeb/RiskModule_070423_fichiers/Fuzzy_implementation_070423.pdf
11. Risk Analysis (April 2007). Retrieved November 15, 2014, from http://apps.ensic.inpl-nancy.fr/benchmarkWWTP/RiskAnalysis/RiskWeb/RiskModule_070423_fichiers/
12. Fuzzy Multidimensional Logic (March 2004). Retrieved February 18, 2014, from http://www.calresco.org/lucas/fuzzy.htm
13. Levenshtein Distance (February 2014). Retrieved February 19, 2014, from http://en.wikipedia.org/wiki/Levenshtein_distance
14. Adler (March 2012). Big Data Governance Maturity. Retrieved February 23, 2014, from https://www.ibm.com/developerworks/community/blogs/adler/entry/big_data_governance_maturity?lang=en
15. DataFlux Data Management. The Intersection of Big Data, Data Governance and MDM. Retrieved February 23, 2014, from http://digital.info-mgmt.com/info-mgmt/DataFlux_SAS2012#pg1
16. Sammon, D. and Adam, F. (2010). Making Sense of the Master Data Management (MDM) Concept: Old Wine in New Bottles or New Wine in Old Bottles? Proceedings of the 2010 Conference on Bridging the Socio-technical Gap in Decision Support Systems: Challenges for the Next Decade, pp. 175-186.