ATHABASCA UNIVERSITY

Applying Fuzzy Logic for Data Governance

BY XiaoHai Lu

A project submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in INFORMATION SYSTEMS

Athabasca, Alberta November, 2014

© XiaoHai Lu, 2014 

DEDICATION

This essay is dedicated to my supportive wife Winnie and my boys Andrew and Michael.

ABSTRACT

Every day, as we browse the internet, we consume big data from the various search engines and social networks that we visit. Like individuals, enterprises also confront a vast stream of information from individuals, communities, corporations, and governments. With vast volumes of information, long retention cycles, and high-velocity decision-making, there is the potential to derail the usefulness of information and do more damage than good to enterprises. The axiom 'better data means better decisions' becomes critical. Without solid data governance in place, data can be inaccurate and unfit for use.

This essay will describe the history and future of data governance. It will also explain the current process of data governance before demonstrating a prototype of a data governance application in the banking industry.

Data governance processes such as matching and linking related records require mathematical support in the decision-making process. Fuzzy logic, an approach to computing based on varying degrees of truth rather than strict true/false values, was found to be a good solution to this issue. As such, this essay successfully applies fuzzy logic to overcome these issues, reduce human intervention, and improve the data quality of data governance processes.

ACKNOWLEDGMENTS

I thank all who were involved in the support and review process of this essay. Without their support, the essay could not have been satisfactorily completed.

Thanks go to all those who provided their insightful and constructive comments, in particular, to professor Richard Huntrods of Athabasca University who provided priceless suggestions and feedback on my essay.


Table of Contents

DEDICATION
ABSTRACT
ACKNOWLEDGMENTS
CHAPTER 1 – INTRODUCTION
    Data Governance: The History
    Data Governance: The current literature on the topic
    Data Governance: The Future
CHAPTER 2 – DATA GOVERNANCE PROCESS
    Data Governance Process
CHAPTER 3 – ISSUES, CHALLENGES AND TRENDS
    The Potential Overlay Task
    Match Duplicate Suspects to Create a New Master Record
    Link Related Records from Multiple Sources
CHAPTER 4 – FUZZY LOGIC
    Traditional Logic
    Fuzzy Logic History
    The Basic Concept of Fuzzy Logic
    A Fuzzy Implementation
    Brief Discussion
CHAPTER 5 – CONCLUSIONS
References


List of Figures

Figure 1: Data Governance Process
Figure 2: MDM Process
Figure 3: MDM Initial Load Process
Figure 4: MDM Delta Load Process
Figure 5: Quality Stage Initial Load Process
Figure 6: Quality Stage Delta Load Process
Figure 7: Case 5
Figure 8: Case 3
Figure 9: Case 2
Figure 10: Cases
Figure 11: Training Set
Figure 12: Traditional Decision Tree
Figure 13: Fuzzy MF
Figure 14: Traditional Decision Tree
Figure 15: Decision Matrix


CHAPTER 1 – INTRODUCTION

Data Governance: The History

Data governance is an emerging discipline with an ever-evolving definition. The discipline embodies a convergence of data quality, data management, data policies, business process management, and risk management surrounding the handling of data in an organization.1 The central point of this definition of data governance is data quality.

From the point of view of businesses, data governance needs to be able to provide qualified information. The data governance process is the practice of transforming data into qualified information, which can be used by businesses. Incidentally, the concept of data governance has been around since the beginning of relational databases. Data is stored across referenced tables, and businesses can retrieve information by joining the data through cross-referencing those tables. With the growth of information technology, databases have gradually become a central part of information systems. In order to insert qualified data into databases, data governance extended from databases into a set of extract, transform, and load (ETL) processes that provide databases with clean, accurate, and prompt data feeds. New terms such as metadata, data source, target, and staging emerged with the ETL approach. There are numerous ETL tools available on the market, such as Informatica and Ab Initio. However, the motivation for ETL comes from an information technology (IT) perspective and focuses on IT techniques. In 2004, IBM started to introduce data governance as a discipline for treating data as an enterprise asset.3 As a financial asset, data has to be treated like other financial assets, such as plant and equipment. A data inventory is required for enterprises with existing data, in much the same way as inventories are needed for physical assets. Preventing unauthorized changes to critical data should also be considered, since such changes can affect the integrity of financial reporting, as well as the quality and reliability of daily business decisions.3 Protecting sensitive data and intellectual property from both internal and external threats is another element that falls under data governance. Since data is a business asset, the question of how to maximize its value is also under the umbrella of data governance.

Data Governance: The current literature on the topic

As an emerging form of technology, data governance has been mainly supported by business vendors rather than academia. For example, performing a query on the subject “data governance” on the ACM Digital Library (Association for Computing Machinery) yields only 2,824 results (queried on Aug 27, 2014). In contrast, when the same query is performed on Google, 36,200,000 results are yielded (queried on Aug 27, 2014). The solutions pushed by business vendors share common challenges, such as having broad fundamental concepts whose aspects are emphasized differently by each vendor. For example, Oracle does not buy into the unified process introduced by the IBM white paper. In addition, the concepts and practices of data governance are still shadowed by their predecessors, such as ETL, data warehouse, and ERP products. “MDM is effectively Data Warehousing branded with ERP market rhetoric and contains an added repository of 'master data'. We see MDM as another attempt at data integration due to the failure of previous Data Warehousing, ERP and ERPII/BI initiatives.”17 Although many companies prefer specialized MDM solutions, the three main players in the MDM market are IBM, Oracle, and SAP.


Data Governance: The Future

Data governance is constantly evolving and morphing into new forms. This evolution has produced a next generation of data that is beginning to enter companies. Unlike traditional data, next-generation data will be part of companies' daily routines.

For example, when we make a cellphone call, the relationship data (which includes the callers' names, phone numbers, and locations) will have been collected. Likewise, the transactional data (which includes the time of the call and the duration of the call) will have been collected as well. Such kinds of big data are not limited to mobile data; they include GPS coordinates, location-awareness data, and social interactions on networks such as LinkedIn and Facebook. The way that next-generation data is captured through the cloud will definitely change the way we deal with traditional data. It's one thing to be flooded with big data; it's another thing to be able to make sense of it and then be able to act on it or make recommendations for a human or another system to act on it.6 Big data by itself is merely unstructured data, and we have to analyze the data in order to understand it. MDM and data governance processes will make this analysis more efficient. Through data governance's identity resolution, we can have a single view of an entire company's data. With data governance, we will not drown in next-generation big data; rather, we can understand its relationships and react to it quickly.

Big data and the cloud, which generate and deliver real-time data, will require us to react in real time, while next-generation data governance will help us understand and react to real-time data.

In addition, unlike traditional data, big data may be owned by a number of brokers or third parties. The next-generation data governance process should also have the ability to accept different protocols.


CHAPTER 2 – DATA GOVERNANCE PROCESS

Data Governance Process

Below is a diagram detailing the process of data governance by IBM: 6

Figure 1: Data Governance Process

Note. Adapted from The IBM Data Governance Unified Process by Sunil Soares, 2010, p. 8. Copyright 2010 by MC Press Online, LLC. Adapted with permission.

1) Define the business problem

The main reason for the failure of data governance programs is that they do not identify a tangible business problem. It is imperative that the organization defines the initial scope of the data governance program around a specific business problem, such as a failed audit, a data breach, or the need for improved data quality for risk-management purposes. Once the data governance program begins to tackle the identified business problems, it will receive support from the business functions to extend its scope to additional areas.

2) Obtain executive sponsorship

It is important to establish sponsorship from key IT and business executives for the data governance program. The best way to obtain this sponsorship is to establish value in terms of a business case and quick hits. For example, the business case might be focused on householding and name-matching in order to improve the quality of data to support a customer-centricity program.

3) Conduct a maturity assessment

Every organization needs to conduct an assessment of its data governance maturity, preferably on an annual basis. The IBM Data Governance Council has developed a maturity model based on 11 categories (discussed in Chapter 5), such as Data Risk Management and

Compliance, Value Creation, and Stewardship. The data governance organization needs to assess the company’s current level of maturity (current state) and the desired future level of maturity (future state). The company's future state is usually projected at a time frame spanning 12 to 18 months ahead. This duration must be long enough to produce results.

However, at the same time, it must be short enough to ensure the continued buy-in from key stakeholders.


4) Build a road map

The data governance organization needs to develop a roadmap to bridge the gap between the current state and the desired future state for the eleven categories of data governance maturity. For example, the data governance organization might review the maturity gap for stewardship and determine that the enterprise needs to appoint data stewards who will focus on targeted subject areas such as the customer, vendor, and product. The data governance program also needs to include quick hit areas where the initiative can drive near-term business value.

5) Establish an organizational blueprint

The data governance organization needs to build a charter to govern its operations, and to ensure that it has enough authority to act as a tiebreaker in critical situations. Data governance organizations operate best in a three-tier format. The top tier is the data governance council, which consists of the key functional business leaders who rely on data as an enterprise asset. The middle tier is the data governance working group, which consists of middle managers. The final tier consists of the data stewardship community, which is responsible for the quality of the data on a day-to-day basis.

6) Build a data dictionary

The effective management of business terms can help ensure that the same descriptive language applies throughout the organization. A data dictionary or business glossary is a repository with definitions of key terms. It is used to gain consistency and agreement between the technical and business sides of an organization. For example, what is the definition of a “customer”? Is a customer someone who has made a purchase, or someone who is considering a purchase? Is a former employee still categorized as an “employee”? Are the terms “partner” and “reseller” synonymous? These questions can be answered by building a common data dictionary. Once implemented, the data dictionary can span the organization to ensure that business terms are tied via metadata to technical terms and that the organization has a single, common understanding.
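As a rough illustration of such a repository (the terms, definitions, and technical mappings below are hypothetical, not taken from any real glossary), a minimal data dictionary might look like:

```python
from dataclasses import dataclass

@dataclass
class GlossaryTerm:
    # A business term with its agreed definition and the technical
    # metadata (table/column rule) it maps to; all names are illustrative.
    name: str
    definition: str
    technical_mapping: str

# A tiny dictionary resolving the "customer" ambiguity discussed above.
data_dictionary = {
    "customer": GlossaryTerm(
        name="customer",
        definition="A party who has completed at least one purchase.",
        technical_mapping="CRM.PARTY.PARTY_ID where PURCHASE_COUNT > 0",
    ),
    "prospect": GlossaryTerm(
        name="prospect",
        definition="A party considering a purchase with no completed order.",
        technical_mapping="CRM.PARTY.PARTY_ID where PURCHASE_COUNT = 0",
    ),
}

def define(term: str) -> str:
    """Return the agreed definition, or flag the term as ungoverned."""
    entry = data_dictionary.get(term.lower())
    return entry.definition if entry else f"'{term}' is not yet governed."
```

The point of the sketch is that every consumer of the term, technical or business, resolves it through one shared repository rather than a private interpretation.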

7) Understand data

Someone once said, “You cannot govern what you do not first understand.” Few applications stand alone today. Rather, they are made up of systems, and “systems of systems,” with applications and databases spread across the enterprise, yet integrated, or at least interrelated. The relational model worsens the situation through the fragmentation of business entities for storage. How, then, is everything related? The data governance team needs to discover the critical data relationships across the enterprise. Data discovery may include simple and hard-to-find relationships, as well as the locations of sensitive data within the enterprise's IT systems.

8) Create a metadata repository

Metadata is data that describes other data: information regarding the characteristics of any data artifact, such as its technical name, business name, location, perceived importance, and relationships to other data artifacts in the enterprise. The data governance program will generate a lot of business metadata from the data dictionary and a lot of technical metadata during the discovery phase. This metadata needs to be stored in a repository so that it can be shared and leveraged across multiple projects.

9) Define metrics

Data governance needs to have robust metrics to measure and track progress. The data governance team must recognize that when something is measured, performance improves.

As a result, the data governance team must pick a few key performance indicators (KPIs) to measure the ongoing performance of the program. For example, a bank will want to assess the overall credit exposure by industry. In that case, the data governance program might select a percentage of null Standard Industry Classification (SIC) codes as a KPI, to track the quality of risk management information.
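The null-SIC KPI described above can be sketched as a simple computation; the record layout and field name below are hypothetical assumptions for illustration:

```python
def null_sic_kpi(records):
    """Return the percentage of records whose Standard Industry
    Classification (SIC) code is missing (None or empty)."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not r.get("sic_code"))
    return 100.0 * missing / len(records)

# Illustrative customer records: two of the four lack a SIC code,
# so the KPI reports 50% missing.
customers = [
    {"name": "Acme Ltd", "sic_code": "6021"},
    {"name": "Widget Co", "sic_code": None},
    {"name": "Example Corp", "sic_code": ""},
    {"name": "Sample Inc", "sic_code": "7372"},
]
```

Tracking this percentage over time gives the program the measurable, reportable indicator that step 14 later relies on.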

10) Govern master data

The most valuable information within an enterprise, which is critical data about customers, products, materials, vendors, and accounts, is commonly known as master data. Despite its importance, master data is often replicated and scattered across business processes, systems, and applications throughout the enterprise. Governing master data is an ongoing practice, whereby business leaders define the principles, policies, processes, business rules, and metrics for achieving business objectives, by managing the quality of their master data.

Challenges regarding master data tend to bedevil most organizations, but it is not always easy to get the right level of business sponsorship to fix the root cause of the issues. As a result, it is important to justify an investment in a master data initiative. For example, consider an organization such as a bank, which sends multiple pieces of mail to the same household.

The bank can establish a quick return on investment by cleansing its customer data to create a single view of the “household.” The bottom line is that the vast majority of data governance programs deal with issues around data stewardship, data quality, master data, and compliance.
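A minimal illustration of the householding idea above, grouping customers that share a normalized mailing address; the record layout and the deliberately naive normalization rule (case and whitespace only) are assumptions for the sketch:

```python
from collections import defaultdict

def normalize_address(addr: str) -> str:
    # Naive normalization: uppercase and collapse runs of whitespace.
    return " ".join(addr.upper().split())

def household_groups(customers):
    """Group (name, address) records by their normalized address."""
    groups = defaultdict(list)
    for name, addr in customers:
        groups[normalize_address(addr)].append(name)
    return dict(groups)

# Two of the three records resolve to the same household, so the
# bank would send two mailings instead of three.
mailing_list = [
    ("Alice Smith", "123 Maple St, Unit 5"),
    ("Bob Smith", "123  maple st,  unit 5"),
    ("Carol Jones", "9 Oak Ave"),
]
```

Production householding would of course use address standardization and probabilistic matching rather than string normalization, but the payoff is the same: one view per household.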

11) Govern analytics

Enterprises have invested huge sums of money to build data warehouses to gain competitive insight. However, these investments have not always yielded results. As a consequence, businesses are increasingly scrutinizing their investments. We define the “analytics governance” track as the setting of policies and procedures to better align business users with the investments in analytic infrastructure. Data governance organizations need to ask the following questions:

❏ How many users do we have for our data, by business area?

❏ How many reports do we create, by business area?

❏ Do the users derive value from these reports?

❏ How many report executions do we have per month?

❏ How long does it take to produce a new report?

❏ What is the cost of producing a new report?

❏ Can we train the users to produce their own reports?

Many organizations will want to set up a Business Intelligence Competency Centre (BICC) to educate users, increase business intelligence, and develop reports.


12) Manage security and privacy

Data governance leaders, especially those who report to the chief information security officer, often have to deal with issues around data security and privacy.

Some of the common data security and privacy challenges include:

❏ Where is our sensitive data?

❏ Has the organization masked its sensitive data in non-production environments (for example, in development, testing, and training) to comply with privacy regulations?

❏ Are database audit controls in place to prevent privileged users, such as DBAs, from accessing private data, such as employees' salaries and customer lists?

13) Govern the information lifecycle

Unstructured content makes up more than 80 percent of the data within the typical enterprise. As organizations move from data governance to information governance, they start to consider the governance of this unstructured content.

The lifecycle of information starts with data creation and ends with its removal from production. Data governance organizations have to deal with the following issues regarding the lifecycle of information:

❏ What is our policy regarding digitizing paper documents?

❏ What is our records management policy for paper documents, electronic documents, and email? (In other words, which documents do we maintain as records, and for how long?)

❏ How do we archive structured data to reduce storage costs and improve performance?

❏ How do we bring structured and unstructured data together under a common framework of policies and management?

14) Measure the results

Data governance organizations must ensure continuous improvement by constantly monitoring metrics. In step nine, the data governance team sets up the metrics. In this step, the data governance team reports to senior stakeholders on the progress of those metrics from IT and the business.

Data Governance Business Application

Today, banking systems establish and maintain line of business (LoB) specific customer views with associated accounts and product holdings – either in product systems or in LoB-specific Customer Information Files (CIFs). Thus, the customer, account, and product relationship information resides in siloed applications. This limits the ability to understand the customer holistically (across LoBs) and does not provide an enterprise view of the customer.

The Master Data Management (MDM) initiative enables a complete 360-degree operational view of customers across the bank (enterprise goal). At the target state, the key capabilities of MDM are to:

• Provide consistent and accurate data about essential business entities derived from a single trusted source
• Uniquely identify a customer and all the associated relationships/holdings with the bank, based on the customer's privacy preferences

To achieve the target state objective, the MDM solution will integrate/interface between the numerous LoB-specific applications, consolidate the data, and create a single golden master record.


Below is a typical data governance business (Master Data Management) application diagram:

Figure 2: MDM Process


The solution overview diagram clearly depicts the various sub-systems in the solution. At a high level, the entire solution is classified into the following layers:

• Presentation Layer
• OCIF Sub-system
• Data Integration and Quality Layer
• Application Layer
• Database Layer

Presentation Layer

The presentation layer of the solution essentially implies user interface applications. The following user interface applications are included:

• Reporting User Interface
• Data Stewardship User Interface
• Business Administration User Interface

The Reporting user interface will generate business and stewardship reports on the data available in MDM; the Data Stewardship user interface will provide various options for operating on customer information, along with searching for and handling duplicate or potentially duplicate customers; and the Data Administration user interface will manage reference data and other metadata in the MDM database.


OCIF Sub-System

The OCIF is an existing authoritative operational source of customer information used by multiple systems. This sub-system is presently considered a ‘Book of Record’ in the enterprise. The key objective of this system is creating and maintaining standardized and consistent customer information across the systems, reducing potential duplicate customers, and improving customer data integrity significantly so that it can be treated as a single source of truth. In the current solution context, this system is considered the only source system from which customer information will be loaded into the MDM database. Based on the solution overview diagram, there will be two approaches to data synchronization between the systems. These are:

• The initial load – the entire content of the database
• The delta load – the difference in content between the last day and the current day

To populate data into MDM from OCIF, an OCIF component/utility is required, which will extract the required data. The new component to be developed will be responsible for providing extracts on a daily basis, which will be the input for downstream sub-systems to transform and load into MDM and thus synchronize the two systems.
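The two synchronization approaches can be sketched as follows. The record keys and structures are illustrative assumptions, and deletions are ignored to keep the sketch minimal:

```python
def initial_load(today: dict) -> dict:
    """Initial load: take the entire content of the source extract."""
    return dict(today)

def delta_load(yesterday: dict, today: dict) -> dict:
    """Delta load: keep only records that are new or changed between
    the last day's extract and the current day's extract."""
    return {
        key: row
        for key, row in today.items()
        if yesterday.get(key) != row
    }

# Illustrative extracts keyed by a customer identifier: C2 changed
# and C3 is new, so only those two flow through the delta load.
yesterday = {"C1": {"name": "Alice"}, "C2": {"name": "Bob"}}
today = {"C1": {"name": "Alice"}, "C2": {"name": "Bobby"}, "C3": {"name": "Carol"}}
```

The design point is simply that the delta path moves far fewer records per day than re-running the full load, at the cost of comparing against the previous extract.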

Information Integration Layer

Information Integration Layer – Data Stage

The information integration layer is a key component that is responsible for integrating OCIF and the MDM server application. The data format provided by OCIF is not compatible with the MDM server, and hence it is not directly consumable. DataStage, being part of the integration layer, is responsible for transforming OCIF extracts into an MDM-specific format.

The key objectives of this layer are to:

• Read extracts provided by the source system
• Transform the extract into the required format based on a synchronization mechanism
• Transform the extract file into the format required by the data quality component for standardization during the initial load
• Transform reference values to MDM-specific codes depending on the source system reference value
• Load the transformed data into a database/file

The IIS DataStage component is responsible for reading extracts from the source system, transforming them into SIF format, and pushing data into MDM in two different ways:

• Directly into the database during the initial load
• Writing into files (ExSIF) for the delta load
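Since the SIF is a pipe-delimited standard interface file, parsing one record can be sketched as below. The actual record layout is not specified in this document, so the field names and sample values are assumptions:

```python
def parse_sif_record(line: str, field_names):
    """Split one pipe-delimited SIF line into a field-name -> value
    mapping, positionally, after stripping the trailing newline."""
    values = line.rstrip("\n").split("|")
    return dict(zip(field_names, values))

# Hypothetical four-field layout for a contact record.
fields = ["record_type", "cont_id", "last_name", "first_name"]
record = parse_sif_record("CONTACT|100001|CHEN|WILLIAM", fields)
```

In the real pipeline this mapping is driven by the record type and sub-type parsed in step 1 below, with a distinct layout per type.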

The following sections detail the approaches to be followed in the ETL layer.


The diagram below describes the high level steps to be performed in DataStage during the initial data load.

Figure 3: MDM Initial Load Process

1. A custom DataStage extract job will be developed to read extract files from the ETL receiving zone and parse each record, based on the record type and sub-type, into individual records of the SIF format, which is a pipe-delimited standard interface file.

2. Validation jobs will be responsible for data standardization. They will also perform the SIN validation and phone number validation. Any failed record information will be logged into an error log file through error handling jobs.

3. An ETL job will be invoked to populate a separate file for standardization, which will be used by QualityStage. The above steps generate the SIF files for consumption by the BIL jobs.

4. The BIL import job imports the SIF file for processing.

5. A validation job validates the code column value and invokes error handling framework jobs in the case of failure. In such scenarios, the records that are the source of the issue are dropped from the requested SIF file. Based on the strategy of the initial load, the dropping of records is minimized to synchronize MDM with the source system to the highest degree.

6. The party referential integrity validation job ensures every party has either a valid PersonName or OrgName record, and also verifies that a valid party record exists for the “Provided By” Source System Key (SSK).

7. The BIL consists of one job for each Record Type or Sub-Type (RT/ST) that performs key assignment and database loading. For example, the Contact key assignment job assigns CONT_ID, PERSON_ID, ORG_ID, and CONTEQUIV_ID to CONTACT, PERSON, ORG, and CONTEQUIV records respectively and inserts them into the MDM database. Before loading the records into MDM, an MDM Involved Party ID will be generated within the ETL jobs. At a high level, the new MDM Involved Party ID will be 18 characters long, where the first 2 characters will carry the version of the BIL and the last 16 characters will be a random number.

8. The data quality error consolidation process reads the data quality error files created during the import SIF, validation, and referential integrity validation phases, and drops any records associated with the records in the error file.
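The Involved Party ID rule in step 7 (18 characters: a 2-character BIL version followed by a 16-digit random number) can be sketched as below; this is an illustrative reading of the rule, not the actual ETL job logic:

```python
import random

def new_involved_party_id(bil_version: str) -> str:
    """Generate an 18-character MDM Involved Party ID: the first 2
    characters carry the BIL version and the last 16 characters are
    a zero-padded random number, as described in step 7."""
    if len(bil_version) != 2:
        raise ValueError("BIL version must be exactly 2 characters")
    suffix = f"{random.randrange(10**16):016d}"  # 16 random digits
    return bil_version + suffix
```

Zero-padding matters here: without it, random numbers below 10^15 would yield IDs shorter than 18 characters.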


The diagram below describes the high level steps to be performed in DataStage during the delta data load.

Figure 4: MDM Delta Load Process

1. The custom DataStage extract jobs from the initial load process will be re-used to read extract files from the ETL receiving zone and to parse each record based on the record type or sub-type.

2. Data validation jobs are responsible for CII data standardization. They will also perform SIN validation and phone number validation. Any failed records will be logged into the log file through an error handling mechanism. The above two steps essentially generate the SIF files for consumption by the BIL asset.

3. The DataStage import job imports the SIF file for processing.

4. Applicable business transformation rules are invoked using DataStage transformation jobs, which are responsible for generating extended SIF files for MDM to consume. Errors are logged using DataStage's out-of-box error handling mechanism for further analysis and action.

Data Quality Management – Quality Stage

The master data hub solution is about providing complete, accurate, standardized information about the customers stored in the MDM system. Even though OCIF maintains its own data quality, customer attributes need further standardization before they are stored in MDM, as MDM will be the single version of truth on customer data throughout the enterprise. The QualityStage component is primarily responsible for data standardization, the improvement of the overall quality of the enterprise's data asset, and the identification of duplicate/potentially duplicate customers. The current solution places QualityStage with the following objectives:

• Standardize name- and address-related attributes
• Validate and correct customers' addresses against the Canada Post address repository, implemented through SERP
• Perform probabilistic matching to identify potential duplicate customers

The IIS QualityStage component is responsible for maintaining data quality stored in MDM.


The key objectives of QualityStage are:

• Name and address standardization
• Identifying duplicate/potentially duplicate customers
• Matching


The diagram below describes the high level steps to be performed during the initial load.


Figure 5: Quality Stage Initial Load Process

The diagram below describes the high level steps to be performed during the delta load.


Figure 6: Quality Stage Delta Load Process

Individual Customer Name Standardization

This standardization procedure will receive an individual name from MDM before processing the individual name through the MNNAME rule set. The MNNAME rule set will parse the individual name into separate name elements and create an analysis value or phonetic representation value for the first and last name of the individual.


For example:

If an individual by the name of “Mr William Chen” was passed to the individual standardization procedure, this would be the standardization result.
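The MNNAME rule set itself is proprietary, so its exact output is not reproduced here; but the idea of a phonetic representation value can be illustrated with the classic Soundex algorithm, used here purely as a stand-in for the actual rule set:

```python
def soundex(name: str) -> str:
    """Classic Soundex: keep the first letter, encode the remaining
    consonants by sound class, skip repeats of the same class, drop
    vowels, then truncate/zero-pad to 4 characters."""
    codes = {ch: str(d)
             for d, letters in enumerate(
                 ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1)
             for ch in letters}
    name = name.upper()
    encoded = [name[0]]
    prev = codes.get(name[0])
    for ch in name[1:]:
        code = codes.get(ch)
        if code is not None and code != prev:
            encoded.append(code)
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = code
    return "".join(encoded)[:4].ljust(4, "0")
```

Two names that sound alike produce the same code (for example, "Chen" encodes to C500), which is exactly why a phonetic value is useful as a matching key even when spellings differ.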

Organizational Customer Name Standardization

This standardization procedure will receive an organization name from MDM and process the organization name through the MNNAME rule set. The MNNAME rule set will parse the organization name into separate word elements and create an analysis value or phonetic representation value for word1 and word2 of the organization name.

For example:

If the organization name of “Bank of Example” was passed to this organization standardization procedure, this would be the standardization result.

The important thing to note is that the original name fed into QualityStage from MDM will be passed back to MDM. QualityStage does not change or enhance the organization name in any way; it parses the name into smaller elements for matching purposes only. MDM will receive the original name, the phonetic representation of the organization name, and the standardized name.

Address Standardization

This standardization procedure will receive an address from MDM and process the address through the MDMCAADDR and MDMCAAREA rule sets. The MDMCAADDR rule set will parse the address name into separate address elements and create an analysis value or phonetic representation value for the street name. The MDMCAAREA rule set will parse the city, province, and postal code into separate address elements and create an analysis value or phonetic representation value for the city name.

For example:

If the address of “123 Maple Street Unit 5 ” was passed to this address standardization procedure, this would be the standardization result.

Matching

In order to maintain data quality, adding and updating a customer will trigger the matching process.

Individual and organizational customers will be processed by different match specifications in QualityStage, which consist of blocking parameters and scoring specifications for different passes. The MDM service will provide QualityStage (QS) with a set of candidates by searching the MDM database using the blocking parameters for each pass. The QS matching process will compare and score each candidate and return the match result to MDM.

In order to implement the match specification and respond to MDM requests, an ISD job and shared containers are created for the interface.
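The blocking-and-scoring flow described above can be sketched as follows; the blocking key, the field weights, and the sample records are illustrative assumptions, not the actual match specification:

```python
def blocking_key(rec):
    # one hypothetical blocking pass: first letter of last name + postal code
    return (rec["last"][:1].upper(), rec["postal"])

def score(a, b, weights=(("first", 30), ("last", 40), ("postal", 30))):
    # sum agreement weights over the compared fields
    return sum(w for f, w in weights if a[f].upper() == b[f].upper())

incoming = {"first": "WILLIAM", "last": "CHEN", "postal": "X1X1X1"}
mdm_records = [
    {"first": "BILL", "last": "CHEN", "postal": "X1X1X1"},
    {"first": "WILLIAM", "last": "CHEN", "postal": "X1X1X1"},
    {"first": "WEI", "last": "CHAN", "postal": "Y2Y2Y2"},
]

# MDM supplies only the candidates that share the blocking key;
# QS then scores each candidate against the incoming record
candidates = [r for r in mdm_records if blocking_key(r) == blocking_key(incoming)]
scores = [score(incoming, r) for r in candidates]
```

Blocking keeps the comparison set small; the scores are then checked against the match thresholds discussed later in the requirements.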


Application Layer

The MDM server is the application component of the solution that interfaces with the data source where the master data is stored. It is also responsible for providing various features for managing and maintaining master data, keeping the data source the single version of truth. The application is responsible for:

 Interfacing with the master data source through various protocols

 Managing master data through the exposed interfaces with other sub-systems/external sources

 Controlling access in terms of data visibility and enhancing data security

 Identifying and providing information on potential candidate lists of duplicate customers, to assist QualityStage in determining detailed information on customer duplication, and storing it in the data source

 Merging two or more customers to enforce the MDM data source as a single view of the customer and a single version of truth

 Providing a user interface to merge and maintain customer information when duplicates are potential and not guaranteed

 Providing a user interface to manage and configure metadata

Database Layer

The database layer in the solution is responsible for storing all the master data. It also stores the history data, audit data, and metadata required for the MDM application to execute. During the initial load, the database is populated directly by the information integration layer. Once the initial population is successfully completed, daily extracts from source systems will be loaded into the MDM database through the MDM batch framework and maintenance services.

Apart from business data, the database layer also contains the metadata required by the MDM application. Metadata is another key set of information that is configured for the MDM application and controls its behaviour and functionality.

Data Quality Management in Detail

For example, if we have the following input file:

File Name | Profile | ID | Name | Address | Phone Number | Party Type
B2B Personal Cardholders | B2BPC | 1 | John Smith | 123 Main Street, Toronto, Ontario, Canada X1X1X1 | 416-549-7061 |
B2B Personal Cardholders | B2BPC | 2 | ABC Limited | 456 King Avenue, Calgary, Alberta, Y2Y2Y2 | 416-549-7061 |
B2B Personal Cardholders | B2BPC | 3 | John and Jane Smith | 123 Main Street, Toronto, Ontario, Canada X1X1X1 | |
B2B Personal Cardholders | B2BPC | 4 | A | | |

There are several baseline data quality requirements that need to be followed in order to maintain data quality:

Name Formatting and Standardization
1. If a free-form name (i.e., unparsed as a single string) is received as an input, the MDM matching solution should parse or tokenize the name into the common format required for processing. E.g., "John Smith" may need to be tokenized into First Name = John and Last Name = Smith.

Address Formatting and Standardization
1. If a free-form address (i.e., unparsed as a single string) is received as an input, the MDM matching solution should parse or tokenize the address into the common format required for processing, for both Canadian and US addresses. E.g., "123 Main Street" may need to be tokenized into Street Number = 123 and Street Name = Main Street. If the country code/name is missing in the incoming files, the Canadian address standardization rules will be applied as a default.

Address Validation and Correction
1. All addresses received in the input files should be validated and corrected based on checks with Canada Post. In case of an address correction, the address as provided by Canada Post will be applied.

Phone Number Formatting and Standardization
1. If a free-form phone number (i.e., unparsed as a single string) is received as an input, the MDM matching solution should parse or tokenize the phone number into the common format required for processing. E.g., 416-549-7061 may need to be tokenized into Area Code = 416 and Number = 549-7061.

Name Patterns
1. The MDM matching solution should develop data processing rules to handle the following patterns that may occur in the 'name' fields.

For individuals, the connectors that will identify such patterns are:
a. Space And Space (e.g., John And Jane Smith)
b. Space and Space (e.g., John and Jane Smith)
c. & (e.g., John&Jane Smith)
d. Space & Space (e.g., John & Jane Smith)
e. / (e.g., John/Jane Smith)
f. Space / Space (e.g., John / Jane Smith)
g. \ (e.g., John\Jane Smith)
h. Space \ Space (e.g., John \ Jane Smith)

For organizations, the connectors that will identify such patterns are:
i. / (e.g., John/ABC Limited)
j. Space / Space (e.g., John / ABC Limited)
k. \ (e.g., John\ABC Limited)
l. Space \ Space (e.g., John \ ABC Limited)

2. If a name pattern has one of the 'And', 'and' or '&' connectors, the following requirements should be developed:
a. A lookup against the organization name directory should be performed.
b. If the name pattern matches an organization name from the directory, the record should not be split into discrete records.
c. If the name pattern does not match an organization name from the directory, the record should be split into discrete records.
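The connector handling and directory lookup above can be sketched as follows; the directory contents and helper names are hypothetical:

```python
import re

# hypothetical organization-name directory used for the lookup in step 2
ORG_DIRECTORY = {"JOHNSON AND JOHNSON"}

# the connectors listed above: ' And ', ' and ', '&', ' & ', '/', ' / ', '\', ' \ '
CONNECTORS = re.compile(r"\s+and\s+|\s*&\s*|\s*/\s*|\s*\\\s*", re.IGNORECASE)

def split_name(name):
    # keep directory organizations whole; otherwise split into discrete records
    if name.upper() in ORG_DIRECTORY:
        return [name]
    return [part for part in CONNECTORS.split(name) if part]

split_name("John & Jane Smith")    # -> ['John', 'Jane Smith']
split_name("Johnson and Johnson")  # -> kept whole via the directory lookup
```

The directory lookup prevents legitimate organization names containing "and" or "&" from being split into spurious individual records.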

Matching Process Requirement:

2.3.1 FR Rules – Overview
1. The MDM matching rules should be designed and developed to match incoming records across all input files, i.e., match all input files with each other.
2. The MDM matching rules should be designed and developed to match the incoming records with all records stored in the MDM.

2.3.2 FR Rules – List
The following matching rules should be designed and developed in the MDM environment:
1. Rule 1: Individual Matching – Individual Full Name and Full Address
2. Rule 2: Organizational Matching – Organization Name and Full Address
3. Rule 3: Household Matching – Individual Last Name and Full Address
4. Rule 4: Address Matching – Full Address
5. Rule 5: Phone Number Matching – Full Phone Number
NOTE: Each of the above matching rules should generate independent match IDs/keys.

2.3.3 FR Rules – Data Elements
1. Full Name – When the matching rules are based on Full Names, the following discrete data elements should be used:
a. First Name a.k.a. Given Name
b. Last Name
c. Name Suffix
d. Organization Name (as applicable)
2. Full Address – When the matching rules are based on Full Address, the following discrete data elements should be used:
a. Apartment/Unit Number
b. Street Number
c. Street Name
d. Street Type
e. City
f. Province
g. Postal Code
h. Country
i. Non-Civic Address Info (as applicable)
3. Full Phone Number – When the matching rules are based on Full Phone Number, the following discrete data elements should be used:
a. Country Code
b. Area Code
c. Number
NOTE: The Individual Matching Process uses 'Residential Primary' or equivalent addresses only, while the Organizational Matching Process uses 'Business Primary' or equivalent addresses only.
NOTE: Phonetic representations of first name, last name, and street name are used by the current MDM matching process.

2.3.4 FR Rules – Guidelines
1. The corrected postal address should be used by the MDM matching process.
2. Each record from each input file should undergo each of the rules stated above.
NOTE: For example, a record identified as 'Individual' should undergo the Organizational match rule as well.
3. Wherever applicable, the match IDs/keys as generated by the individual matching rules should be cross-referenced in the output files. E.g., a record could have an Individual Match Key of 123 and a Household Match Key of 456.
4. A separate match ID/key should be generated for records within the MDM that do not have a match with records in the input files.

2.3.5 FR Rules – Weights, Thresholds and Categories
1. The MDM matching solution should be designed and developed for 'looser' matching rules.
2. The weights and thresholds that are currently assigned in the MDM environment should be used as a starting point for the design and development.
3. The match categories that are currently identified in the MDM environment should be used as a starting point for the design.

2.3.6 FR Rules – Error Condition
1. In case an incoming record could not be processed by the MDM matching solution, it should be highlighted in the output file.
2. A description of the reason why the record could not undergo the matching process should be included in the output file.
NOTE: These error descriptions should be as provided by the MDM matching solution, with no new requirements.

Output file:

For each input record, the output file repeats the input fields (file name, profile, ID, name, address, phone number, party type) and adds a sequence number, a split-name value, an address validation or correction indicator with its description, the corrected address (or the address from MDM), and the match/process result. In this example, the addresses of records 1 and 2 are corrected against Canada Post (record 2's incorrect postal code is corrected) and both records are processed; record 3, "John and Jane Smith", is split into two discrete output records, each validated and processed; record 4, "A", is flagged for insufficient (or blank) address information; and an MDM record at 789 Poplar Road, Ottawa, Ontario that has no match in the input files receives its own sequence number and match key.


CHAPTER 3 – ISSUES, CHALLENGES AND TRENDS

Unfortunately, during the MDM matching process, there are still processes that need human intervention, such as the following tasks:

The Potential Overlay Task:

A potential overlay occurs when a record is updated with information that is radically different from the data already in the record. For example, consider the situation illustrated below:

Case | party id | Family Name | Given Name | Gender | Date of Birth | Address | phone | last modify
5 | ### | LEWIS | JANE | F | 26-Jun-71 | 100 Kumar Avenue, Markham, Ontario, Canada A2B2C2 | 416-549-7070 | 08/24/06
5 | ### | XIANG | LINDA | F | 13-Jan-78 | 456 King Avenue, Calgary, Alberta, Y2Y2Y2 | 416-549-7070 | 02/28/98
Figure 7: Case 5

The data steward will mark the record as a potential overlay because the ID field in both records is the same. However, when we look closely at these two records, we can see that Linda Xiang and Jane Lewis are clearly not the same person. The ID 388293023980000000 was created on Feb 28, 1998 and belongs to Linda Xiang. Somehow, on Aug 24, 2006, the record was updated so that it now appears to belong to a woman named Jane Lewis. This may have been caused by a common typographical data entry mistake in which Linda Xiang's record was open on the screen when the customer service representative started typing, not realizing that he or she was typing over someone else's data.

There are also situations in which this scenario would be perfectly valid. In the case of events such as a marriage, divorce, move, or phone-number change, a person's data can change significantly enough to be flagged as a potential overlay by the data steward application.

Using data mining and fuzzy logic, potential overlay tasks can be resolved automatically.

Match Duplicate Suspects to Create a New Master Record:

As a solution for data warehouse applications, data governance will match records from multiple lines of business (LOBs). There are situations where customers from multiple LOBs may have similar names, addresses, and telephone numbers, and may have fields that are blank or not available (N/A). For instance, see below:


Case | party id | Family Name | Given Name | Date of Birth | Address | phone
3 | ### | VERKIN | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333
3 | ### | VERKI | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333
Figure 8: Case 3

The two records above have the same Given Name, Date of Birth, Address, and Phone Number.

However, the Family Name is slightly different. Are these two records the same customer?

Data governance applications currently available on the market will stop here and wait for human intervention to decide. Through the application of data mining and fuzzy logic, we would be able to identify such cases without human intervention and generate a single customer profile with the best data from all sources.

Link Related Records from Multiple Sources:

With overlays, the task verifies the existing records in the system. With duplicate suspects, the task gets rid of extra records. This task, by contrast, links records between systems. The data steward applications currently available on the market may not be able to link such records automatically because the records do not have enough data in common, as in the example below:

Case | party id | Name | Date of Birth | Address | phone | Source
2 | ### | GUGGENHEIM REAL ESTATE LLC | 10-May-98 | 123 Main Street, Toronto, Ontario, Canada X1X1X1 | 647-123-4567 | Market
2 | ### | GUGGENHEIM REAL ESTATE LLC | 10-May-98 | 123 Main St Unit 10, Toronto, Ontario, Canada X1X1X1 | 647-123-2352 | Auto
Figure 9: Case 2

For the above example, at first glance these two records appear to be the same company. However, if one looks closely, one can see some differences. First, the addresses are different: "Unit 10" appears in only one of the records' address fields. Second, the phone numbers are different: one is "647-123-4567" and the other is "647-123-2352".

Data mining and fuzzy logic can automatically verify that these two records are the same company and link them together.


CHAPTER 4 – FUZZY LOGIC

Traditional Logic:

Now let's suppose that we generate the following training set based on the data steward application output, covering the potential overlay task, duplicate suspects, and related records from multiple sources. We would have:

Case | party id | Family Name | Given Name | Date of Birth | Address | phone | Source | Class
1 | ### | VERKIN | YOUSSOU | 5-Aug-60 | 10 Main Street, Markham, Ontario, Canada X2Y1X1 | 915-123-4213 | Mortgage | N
1 | ### | VERKIN | JANE | 5-Aug-60 | 10 Main Street, Markham, Ontario, Canada X2Y1X1 | 915-123-4213 | Auto |
2 | ### | GUGGENHEIM REAL ESTATE LLC | | 10-May-98 | 123 Main Street, Toronto, Ontario, Canada X1X1X1 | 647-123-4567 | Market | Y
2 | ### | GUGGENHEIM REAL ESTATE LLC | | 10-May-98 | 123 Main St Unit 10, Toronto, Ontario, Canada X1X1X1 | 647-123-2352 | Auto |
3 | ### | VERKIN | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333 | Life | Y
3 | ### | V. | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333 | Auto |
4 | ### | CREATIVE LEADERSHIM GROUM LTD | | 5-Aug-60 | 456 King Avenue, Calgary, Alberta, Y2Y2Y2 | 416-549-7070 | | Y
4 | ### | CREATIVE LEADERSHIM GROUM LTD | | 5-Aug-60 | 456 King Avenue, Calgary, Alberta, Y2Y2Y2 | 416-549-7070 | |
5 | ### | LEWIS | JANE | 26-Jun-71 | 100 Kumar Avenue, Markham, Ontario, Canada A2B2C2 | 416-549-7070 | 24/08/2006 | N
5 | ### | XIANG | LINDA | 13-Jan-78 | 456 King Avenue, Calgary, Alberta, Y2Y2Y2 | 416-549-7070 | 28/02/1998 |
Figure 10: Cases


Traditional logic is the idea that an outcome can only be either true or false, 1 or 0, right or wrong. This form of logic dates back to ancient Greece and is perfectly adequate for answering simple questions in single dimensions. For example, if A is 1 and B is 0, what is A AND B? It can be extended to more complex questions, as long as all the parts can be described using the same restricted alphabet of two symbols. Such logic is a deductive way of understanding consequences and is a highly valuable intellectual technique.[12]

If we use the above traditional logic, we will get the following training set:

case | Family Name | Given Name | Date of Birth | Address | phone | class
1 | T | F | T | T | T | N
2 | T | T | T | F | F | Y
3 | F | T | T | T | T | Y
4 | T | T | T | T | T | Y
5 | F | F | F | F | T | N
Figure 11: Training Set

Applying the information gain measure to the above training set, we get the information gain for each attribute:


Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, Info(D_j)

Gain(A) = Info(D) - Info_A(D)

Info(D) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.97

Info_{FamilyName}(D) = \frac{3}{5}\left(-\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3}\right) + \frac{2}{5}\left(-\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2}\right) = 0.95

Info_{GivenName}(D) = \frac{3}{5}\left(-\frac{3}{3}\log_2\frac{3}{3} - \frac{0}{3}\log_2\frac{0}{3}\right) + \frac{2}{5}\left(-\frac{2}{2}\log_2\frac{2}{2} - \frac{0}{2}\log_2\frac{0}{2}\right) = 0

Info_{DateofBirth}(D) = \frac{4}{5}\left(-\frac{3}{4}\log_2\frac{3}{4} - \frac{1}{4}\log_2\frac{1}{4}\right) + \frac{1}{5}\left(-\frac{1}{1}\log_2\frac{1}{1} - \frac{0}{1}\log_2\frac{0}{1}\right) = 0.65

Info_{Address}(D) = \frac{3}{5}\left(-\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3}\right) + \frac{2}{5}\left(-\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2}\right) = 0.95

Info_{Phone}(D) = \frac{4}{5}\left(-\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4}\right) + \frac{1}{5}\left(-\frac{0}{1}\log_2\frac{0}{1} - \frac{1}{1}\log_2\frac{1}{1}\right) = 0.8

Hence, the gain in information from such a partitioning would be:

Gain(FamilyName) = Info(D) - Info_{FamilyName}(D) = 0.97 - 0.95 = 0.02

Gain(GivenName) = Info(D) - Info_{GivenName}(D) = 0.97 - 0 = 0.97

Gain(DateofBirth) = Info(D) - Info_{DateofBirth}(D) = 0.97 - 0.65 = 0.32

Gain(Address) = Info(D) - Info_{Address}(D) = 0.97 - 0.95 = 0.02

Gain(Phone) = Info(D) - Info_{Phone}(D) = 0.97 - 0.8 = 0.17
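The entropy and gain figures can be reproduced with a short script over the Figure 11 training set:

```python
import math
from collections import Counter

def entropy(labels):
    # Info(D) = -sum(p_i * log2(p_i)) over the class proportions
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# Figure 11 training set: (FamilyName, GivenName, DateOfBirth, Address, Phone, class)
rows = [
    ("T", "F", "T", "T", "T", "N"),
    ("T", "T", "T", "F", "F", "Y"),
    ("F", "T", "T", "T", "T", "Y"),
    ("T", "T", "T", "T", "T", "Y"),
    ("F", "F", "F", "F", "T", "N"),
]
attrs = ["FamilyName", "GivenName", "DateOfBirth", "Address", "Phone"]
info_d = entropy([r[-1] for r in rows])  # about 0.97

def gain(i):
    # Info_A(D): class entropy within each attribute value, weighted by |D_j|/|D|
    info_a = 0.0
    for v in {r[i] for r in rows}:
        part = [r[-1] for r in rows if r[i] == v]
        info_a += len(part) / len(rows) * entropy(part)
    return info_d - info_a

gains = {a: round(gain(i), 2) for i, a in enumerate(attrs)}
# GivenName has the highest gain, so it becomes the splitting attribute
```

Rounded to two decimals, the script yields the same gains as the derivation above, with GivenName the clear winner.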


Since GivenName has the highest information gain among the attributes, it is selected as the splitting attribute. So we get the following decision tree:

Figure 12: Traditional Decision Tree

One issue with the above decision tree is the crispness of the attributes. For example, is the name "John Smith" the same as "J. Smith"? The above model provides only two states for each attribute: either the Given Name is the same or it is not. I will illustrate here how to tackle the uncertainty associated with the description of knowledge by using fuzzy logic.


Fuzzy Logic History

The term "fuzzy logic" was introduced with the 1965 proposal of fuzzy set theory by Lotfi A. Zadeh.[2][3] Fuzzy logic has since been applied to many fields, from control theory to artificial intelligence. Fuzzy logic, however, had been studied since the 1920s as infinite-valued logic, notably by Łukasiewicz and Tarski.[4]

The Basic Concept of Fuzzy Logic

Fuzzy mathematics forms a branch of mathematics related to fuzzy set theory and fuzzy logic. It started in 1965 after the publication of Lotfi Asker Zadeh's seminal work Fuzzy Sets.[1] A fuzzy subset A of a set X is a function A: X → L, where L is the interval [0,1]. This function is also called a membership function. A membership function is a generalization of the characteristic (indicator) function of a subset defined for L = {0,1}. More generally, one can use a complete lattice L in the definition of a fuzzy subset A.[9]

A Fuzzy Implementation:

For each input and output selected, I define two or more membership functions (MFs). There is a qualitative category for each one, for example true or false. The shape of these functions can vary, but I will work with a triangle, which needs three points to define one MF of one variable. Below is the triangle for the variable GivenName:

52 Applying Fuzzy Logic for Data Governance

Figure 13: Fuzzy MF – the triangular 'true' MF and the trapezoidal 'false' MF, plotted (membership y from 0 to 1) over the points x0, x1, x2, x3, x4

If we take GivenName as a variable, 'true' as the triangle, and 'false' as the trapezoid (see the figure above):

– the MF 'true' will be defined by three points (x0, x1, x2), where x0 is any negative value;

– the MF 'false' will be defined by four points (x1, x2, x3, x4), where x4 is any positive value > x3. Taking x4 to infinity, this means that 'false' will stay at 1 from x2 onward.

We have the following MFs for GivenName:

y_{true}(x; x_0, x_1, x_2) = \max\left(\min\left(\frac{x - x_0}{x_1 - x_0}, \frac{x_2 - x}{x_2 - x_1}\right), 0\right) \quad \text{(triangle)}

y_{false}(x; x_1, x_2, x_3, x_4) = \max\left(\min\left(\frac{x - x_1}{x_2 - x_1}, 1, \frac{x_4 - x}{x_4 - x_3}\right), 0\right) \quad \text{(trapezoid)}

For the Given Name variable, I use the Levenshtein distance to calculate the value of x:


If the Levenshtein distance is 0, the two values are identical. If the distance is between 0 and 5, then the two values are somewhat similar. If the distance is greater than 5, then the two values are not the same at all.

After the above specification, we have the fuzzified real value for GivenName. For example, for "kitten" and "sitting", with distance 3, we get:

y_{true} = \max\left(\min\left(\frac{3 - (-\infty)}{0 - (-\infty)}, \frac{5 - 3}{5 - 0}\right), 0\right) = \max(\min(1, 0.4), 0) = 0.4

y_{false} = \max\left(\min\left(\frac{3 - 0}{5 - 0}, 1, \frac{\infty - 3}{\infty - 15}\right), 0\right) = \max(\min(0.6, 1, 1), 0) = 0.6
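A small sketch that computes the distance and both membership degrees; the cut-off points 0 and 5 are the ones chosen above, and the infinite shoulders of the two MFs reduce to the clamping in the code:

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance, computed row by row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def mf_true(x, x1=0.0, x2=5.0):
    # triangular 'true' MF; the left shoulder at -infinity reduces to a clamp at 1
    return max(min(1.0, (x2 - x) / (x2 - x1)), 0.0)

def mf_false(x, x1=0.0, x2=5.0):
    # trapezoidal 'false' MF; the right shoulder at +infinity reduces to a clamp at 1
    return max(min((x - x1) / (x2 - x1), 1.0), 0.0)

d = levenshtein("kitten", "sitting")  # distance 3
mf_true(d), mf_false(d)               # 0.4 and 0.6, matching the hand calculation
```

Note that the two degrees are complementary here only because the MFs were chosen that way; in general a fuzzy partition need not sum to 1.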

Decision Tree definition:

Now let's reconsider the decision tree we introduced before:

Figure 14: Traditional Decision Tree

For this simple case, we have the following rule based on the decision tree above:

IF GivenName is equal (T), THEN two records are equal.

Next is to compute the degree of membership in the MFs (true, false) of the output (the THEN part). Once a variable such as GivenName is fuzzified, it takes a value between 0 and 1, indicating the degree of membership in a given MF of the specific variable. The degrees of membership of the input variables have to be combined to get the degree of membership of the output. For a single input variable, as in the rule specified above, we can for example have the rules shown below:

IF GivenName is equal (T), THEN two records are equal;

IF GivenName is not equal (F), THEN two records are not equal;

According to these rules, if we suppose that the degree of membership of GivenName in the MF 'false' is 0.6, then the degree to which the two records are not equal is 0.6 as well.

In case we have more than one input variable, the degree of membership for the output value will be the minimum value of the degree of membership for the different inputs. For example, suppose we have two input variables (GivenName X and Family Name Y) and the decision matrix below:

                     | FamilyName equal | FamilyName not equal
GivenName equal      | equal            | not equal
GivenName not equal  | not equal        | not equal
Figure 15: Decision Matrix

If the attributes have the following fuzzified values:


y_{GivenName}^{equal} = 0.8

y_{FamilyName}^{not\,equal} = 0.9

Then we have the following rule satisfied:

IF GivenName is equal (degree of 0.8) and FamilyName is not equal (degree of 0.9) THEN the two records are not equal (degree of 0.8).

y_{GivenName}^{not\,equal} = 0.8

y_{FamilyName}^{equal} = 0.2

The following rule would also be satisfied:

IF GivenName is not equal (degree of 0.8) and FamilyName is equal (degree of 0.2) THEN the two records are not equal (degree of 0.2)
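The min/max combination over the Figure 15 decision matrix can be sketched as follows; the dictionary layout is my own, and the membership degrees not stated in the text are filled in as assumptions:

```python
def evaluate(given, family):
    # antecedents combine with min; rules sharing a consequent combine with max
    equal = min(given["equal"], family["equal"])
    not_equal = max(
        min(given["equal"], family["not_equal"]),
        min(given["not_equal"], family["equal"]),
        min(given["not_equal"], family["not_equal"]),
    )
    return {"equal": equal, "not_equal": not_equal}

# fuzzified degrees from the first example above; the missing complements
# (family 'equal' = 0.1, given 'not_equal' = 0.2) are assumed for illustration
given = {"equal": 0.8, "not_equal": 0.2}
family = {"equal": 0.1, "not_equal": 0.9}
evaluate(given, family)  # 'not_equal' fires at degree 0.8, as in the rule above
```

Only one cell of the matrix concludes 'equal', so records are declared equal only when both names agree; every other combination pushes the output toward 'not equal'.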

Brief Discussion:

By applying fuzzy logic to the data governance process, we can get a more accurate decision tree, which enhances the decision-making process. In the example above, the traditional decision tree model must decide whether FamilyName and GivenName are the same even when they differ only slightly; if they are judged different, it may conclude that the two records belong to different persons. When we apply fuzzy logic, however, we may say that the FamilyName values are not equal to some extent (say, 20% not equal) and that the GivenName values are somewhat equal at a degree of 0.3. In that case, the records would be considered to belong to the same person under the fuzzified logic.

Therefore, a more accurate result is gained.


CHAPTER 5 - CONCLUSIONS

In this essay, the history of data governance was discussed, as well as the current literature and the future of this process. The data governance process itself was then explained, and it was found that the central concern of data governance is data quality. In order to improve the data quality of the master data repository, fuzzy logic was applied to the data governance process. With data governance constantly evolving, there is a standing requirement to guarantee its quality. Applying fuzzy logic helps to improve not only the data quality process but also the automation of the process.


REFERENCES

1. Data Governance (November 7, 2013). In Wikipedia, the free encyclopedia. Retrieved December 5, 2013, from http://en.wikipedia.org/wiki/Data_governance
2. A Brief History of Data Quality (March 25, 2009). Data Governance Insider: Covering the world of big data and data governance. Retrieved from http://data-governance.blogspot.ca/2009/03/brief-history-of-data-quality.html
3. Nigel Turner (November 15, 2013). Kindling the Flames: The Future of Data Governance. Retrieved December 11, 2013, from http://smartdatacollective.com/dat-mai/167531/kindling-flames-future-data-governance
4. Rick Sherman (2011). A Must to Avoid: Worst Practices in Enterprise Data Governance. Retrieved from http://searchdatamanagement.techtarget.com/feature/A-must-to-avoid-Worst-practices-in-enterprise-data-governance
5. Marketing Data Governance in the Era of "Big Data". Retrieved from http://www.kbmg.com/wp-content/uploads/2013/07/Winterberry-Group-White-Paper-Marketing-Data-Governance-July-2013.pdf
6. Sunil Soares (September 2010). The IBM Data Governance Unified Process. Ketchum, USA: MC Press Online, LLC.
7. Julie Langenkamp-Muenkel (October 2013). MDM and Next-Generation Data Sources. Information Management.
8. Huey-Li Chen, Long-Hui Chen and Chien-Yu Huang (2009). Fuzzy Goal Programming Approach to Solve the Equipment-Purchasing Problem of an FMC. International Journal of Industrial Engineering, 16(4), 270-281.
9. Fuzzy Mathematics (November 28, 2013). In Wikipedia, the free encyclopedia. Retrieved February 2, 2014, from http://en.wikipedia.org/wiki/Fuzzy_mathematics
10. A Fuzzy Implementation. Retrieved November 15, 2014, from http://apps.ensic.inpl-nancy.fr/benchmarkWWTP/RiskAnalysis/RiskWeb/RiskModule_070423_fichiers/Fuzzy_implementation_070423.pdf
11. Risk Analysis (April 2007). Retrieved November 15, 2014, from http://apps.ensic.inpl-nancy.fr/benchmarkWWTP/RiskAnalysis/RiskWeb/RiskModule_070423_fichiers/
12. Fuzzy Multidimensional Logic (March 2004). Retrieved February 18, 2014, from http://www.calresco.org/lucas/fuzzy.htm
13. Levenshtein Distance (February 2014). Retrieved February 19, 2014, from http://en.wikipedia.org/wiki/Levenshtein_distance
14. Adler. Big Data Governance Maturity (March 2012). Retrieved February 23, 2014, from https://www.ibm.com/developerworks/community/blogs/adler/entry/big_data_governance_maturity?lang=en
15. DataFlux Data Management. The Intersection of Big Data, Data Governance and MDM. Retrieved February 23, 2014, from http://digital.info-mgmt.com/info-mgmt/DataFlux_SAS2012#pg1
16. Sammon, D. and Adam, F. "Making Sense of the Master Data Management (MDM) Concept: Old Wine in New Bottles or New Wine in Old Bottles?" Proceedings of the 2010 Conference on Bridging the Socio-technical Gap in Decision Support Systems: Challenges for the Next Decade, pp. 175-186.
