ATHABASCA UNIVERSITY

Applying Fuzzy Logic for Data Governance

BY XiaoHai Lu

A project submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in INFORMATION SYSTEMS

Athabasca, Alberta November, 2014

© XiaoHai Lu, 2014 

DEDICATION

This essay is dedicated to my supportive wife Winnie and my boys Andrew and Michael.

ABSTRACT

Every day, as we browse the internet, we consume big data from the various search engines and social networks that we visit. Like individuals, enterprises also confront a vast stream of information from individuals, communities, corporations, and governments. With vast volumes of information, long retention cycles, and high-velocity decision-making, there is the potential to derail the usefulness of information and do more damage than good to enterprises. The axiom 'better data means better decisions' becomes critical. Without solid data governance in place, data can be inaccurate and unfit for use.

This essay will describe the history and future of data governance. It will also explain the current process of data governance before demonstrating a prototype of a data governance application in the banking industry.

Data governance processes such as matching and linking related records require mathematical support in the decision-making process. Fuzzy logic, an approach to computing based on varying degrees of truth rather than strict true/false values, was found to be a good solution to this issue. As such, this essay successfully applies fuzzy logic to overcome these issues, reduce human intervention, and improve the data quality of data governance processes.

ACKNOWLEDGMENTS

I thank all who were involved in the support and review process of this essay. Without their support, the essay could not have been satisfactorily completed.

Thanks go to all those who provided their insightful and constructive comments, in particular, to professor Richard Huntrods of Athabasca University who provided priceless suggestions and feedback on my essay.


Table of Contents

DEDICATION
ABSTRACT
ACKNOWLEDGMENTS
CHAPTER 1 – INTRODUCTION
    Data Governance: The History
    Data Governance: The current literature on the topic
    Data Governance: The Future
CHAPTER 2 – DATA GOVERNANCE PROCESS
    Data Governance Process
CHAPTER 3 – ISSUES, CHALLENGES AND TRENDS
    The Potential Overlay Task
    Match Duplicate Suspects to Create a New Master Record
    Link Related Records from Multiple Sources
CHAPTER 4 – FUZZY LOGIC
    Traditional Logic
    Fuzzy Logic History
    The Basic Concept of Fuzzy Logic
    A Fuzzy Implementation
    Brief Discussion
CHAPTER 5 – CONCLUSIONS
References


List of Figures

Figure 1: Data Governance Process
Figure 2: MDM Process
Figure 3: MDM Initial Load Process
Figure 4: MDM Delta Load Process
Figure 5: Quality Stage Initial Load Process
Figure 6: Quality Stage Delta Load Process
Figure 7: Case 5
Figure 8: Case 3
Figure 9: Case 2
Figure 10: Cases
Figure 11: Training Set
Figure 12: Traditional Decision Tree
Figure 13: Fuzzy MF
Figure 14: Traditional Decision Tree
Figure 15: Decision Matrix


CHAPTER 1 – INTRODUCTION

Data Governance: The History

Data governance is an emerging discipline with an ever-evolving definition. The discipline embodies a convergence of data quality, data management, data policies, business process management, and risk management surrounding the handling of data in an organization.1 The central point of this definition of data governance is data quality.

From the point of view of businesses, data governance needs to be able to provide qualified information. The data governance process is the practice of transforming data into qualified information, which can be used by businesses. Incidentally, the concept of data governance has been around since the beginning of relational databases. Data is stored across referenced tables, and businesses can retrieve information by joining the data through cross-referencing those tables. With the growth of information technology, databases have gradually become a central part of information systems. In order to insert qualified data into databases, data governance extended from databases into a set of extract, transform, and load (ETL) processes that provide databases with clean, accurate, and prompt data feeds. New terms such as metadata, data source, target, and staging emerged with the ETL approach. There are numerous ETL tools available on the market, such as Informatica and Ab Initio. However, the motivation for ETL comes from an information technology (IT) perspective and focuses on IT techniques. In 2004, IBM started to introduce data governance as a discipline for treating data as an enterprise asset.3 As a financial asset, data has to be treated like other financial assets, such as plant and equipment. A data inventory is required for enterprises with existing data, in much the same way as inventories are needed for physical assets. Preventing unauthorized changes to critical data should also be considered, since such changes can affect the integrity of financial reporting, as well as the quality and reliability of daily business decisions.3 Protecting sensitive data and intellectual property from both internal and external threats is another element that falls under data governance. Since data is a business asset, the question of how to maximize its value is also under the umbrella of data governance.

Data Governance: The current literature on the topic

As an emerging form of technology, data governance has been mainly supported by business vendors rather than academia. For example, performing a query on the subject “data governance” on the ACM Digital Library (Association for Computing Machinery) yields only 2,824 results (queried on Aug 27, 2014). In contrast, when the same query is performed on Google, 36,200,000 results are yielded (queried on Aug 27, 2014). The solutions pushed by business vendors share common challenges, such as having broad fundamental concepts whose aspects are emphasized differently by each vendor. For example, Oracle does not buy into the unified process introduced by the IBM white paper. In addition, the concepts and practices of data governance are still shadowed by their predecessors, such as ETL, data warehouse, and ERP products. “MDM is effectively Data Warehousing branded with ERP market rhetoric and contains an added repository of 'master data'. We see MDM as another attempt at data integration due to the failure of previous Data Warehousing, ERP and ERPII/BI initiatives.”17 Although many companies prefer specialized MDM solutions, the three main players in the MDM market are IBM, Oracle, and SAP.


Data Governance: The Future

Data governance is constantly evolving and morphing into new forms. This evolution has produced a next generation of data that is beginning to enter companies. Unlike traditional data, next-generation data will be part of companies' daily routines.

For example, when we make a cellphone call, the relationship data (which includes the callers' names, phone numbers, and locations) will have been collected. Likewise, the transactional data (which includes the time of the call and the duration of the call) will have been collected as well. Such kinds of big data are not limited to mobile data; they include GPS coordinates, location-awareness data, and social interactions on networks such as LinkedIn and Facebook. The way that next-generation data is captured through the cloud will definitely change the way we deal with traditional data. It's one thing to be flooded with big data; it's another thing to be able to make sense of it and then be able to act on it or make recommendations for a human or another system to act on it.6 Big data by itself is merely unstructured data, and we have to analyze the data in order to understand it. MDM and data governance processes will make this analysis more efficient. Through data governance's identity resolution, we can have a single view of an entire company's data. With data governance, we will not drown in next-generation big data; rather, we can understand its relationships and react to it quickly.

Big data and the cloud, which generate and deliver real-time data, will require us to react in real time, while next-generation data governance will help us understand and react to real-time data.

In addition, unlike traditional data, big data may be owned by a number of brokers or third parties. The next-generation data governance process should also have the ability to accept different protocols.


CHAPTER 2 – DATA GOVERNANCE PROCESS

Data Governance Process

Below is a diagram detailing the process of data governance by IBM: 6

Figure 1: Data Governance Process

Note. Adapted from The IBM Data Governance Unified Process by Sunil Soares, 2010, p. 8. Copyright 2010 by MC Press Online, LLC. Adapted with permission.

1) Define the business problem

The main reason for the failure of data governance programs is that they do not identify a tangible business problem. It is imperative that the organization defines the initial scope of the data governance program around a specific business problem, such as a failed audit, a data breach, or the need for improved data quality for risk-management purposes. Once the data governance program begins to tackle the identified business problems, it will receive support from the business functions to extend its scope to additional areas.

2) Obtain executive sponsorship

It is important to establish sponsorship from key IT and business executives for the data governance program. The best way to obtain this sponsorship is to establish value in terms of a business case and quick hits. For example, the business case might be focused on householding and name-matching in order to improve the quality of data to support a customer-centricity program.

3) Conduct a maturity assessment

Every organization needs to conduct an assessment of its data governance maturity, preferably on an annual basis. The IBM Data Governance Council has developed a maturity model based on 11 categories (discussed in Chapter 5), such as Data Risk Management and

Compliance, Value Creation, and Stewardship. The data governance organization needs to assess the company’s current level of maturity (current state) and the desired future level of maturity (future state). The company's future state is usually projected at a time frame spanning 12 to 18 months ahead. This duration must be long enough to produce results.

However, at the same time, it must be short enough to ensure the continued buy-in from key stakeholders.


4) Build a road map

The data governance organization needs to develop a roadmap to bridge the gap between the current state and the desired future state for the eleven categories of data governance maturity. For example, the data governance organization might review the maturity gap for stewardship and determine that the enterprise needs to appoint data stewards who will focus on targeted subject areas such as the customer, vendor, and product. The data governance program also needs to include quick hit areas where the initiative can drive near-term business value.

5) Establish an organizational blueprint

The data governance organization needs to build a charter to govern its operations, and to ensure that it has enough authority to act as a tiebreaker in critical situations. Data governance organizations operate best in a three-tier format. The top tier is the data governance council, which consists of the key functional business leaders who rely on data as an enterprise asset. The middle tier is the data governance working group, which consists of middle managers. The final tier consists of the data stewardship community, which is responsible for the quality of the data on a day-to-day basis.

6) Build a data dictionary

The effective management of business terms can help ensure that the same descriptive language applies throughout the organization. A data dictionary or business glossary is a repository with definitions of key terms. It is used to gain consistency and agreement between the technical and business sides of an organization. For example, what is the definition of a “customer”? Is a customer someone who has made a purchase, or someone who is considering a purchase? Is a former employee still categorized as an “employee”? Are the terms “partner” and “reseller” synonymous? These questions can be answered by building a common data dictionary. Once implemented, the data dictionary can span the organization to ensure that business terms are tied via metadata to technical terms and that the organization has a single, common understanding.
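As a rough illustration of such a repository (the terms, definitions, and technical mappings below are hypothetical, not taken from any real glossary), a minimal data dictionary might look like:

```python
from dataclasses import dataclass

@dataclass
class GlossaryTerm:
    # A business term with its agreed definition and the technical
    # metadata (table/column rule) it maps to; all names are illustrative.
    name: str
    definition: str
    technical_mapping: str

# A tiny dictionary resolving the "customer" ambiguity discussed above.
data_dictionary = {
    "customer": GlossaryTerm(
        name="customer",
        definition="A party who has completed at least one purchase.",
        technical_mapping="CRM.PARTY.PARTY_ID where PURCHASE_COUNT > 0",
    ),
    "prospect": GlossaryTerm(
        name="prospect",
        definition="A party considering a purchase with no completed order.",
        technical_mapping="CRM.PARTY.PARTY_ID where PURCHASE_COUNT = 0",
    ),
}

def define(term: str) -> str:
    """Return the agreed definition, or flag the term as ungoverned."""
    entry = data_dictionary.get(term.lower())
    return entry.definition if entry else f"'{term}' is not yet governed."
```

The point of the sketch is that every consumer of the term, technical or business, resolves it through one shared repository rather than a private interpretation.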

7) Understand data

Someone once said, “You cannot govern what you do not first understand.” Few applications stand alone today. Rather, they are made up of systems, and “systems of systems,” with applications and databases spread across the enterprise, yet integrated, or at least interrelated. The relational model worsens the situation through the fragmentation of business entities for storage. How, then, is everything related? The data governance team needs to discover the critical data relationships across the enterprise. Data discovery may include simple and hard-to-find relationships, as well as the locations of sensitive data within the enterprise's IT systems.

8) Create a metadata repository

Metadata is data that describes other data: information regarding the characteristics of any data artifact, such as its technical name, business name, location, perceived importance, and relationships to other data artifacts in the enterprise. The data governance program will generate a lot of business metadata from the data dictionary and a lot of technical metadata during the discovery phase. This metadata needs to be stored in a repository so that it can be shared and leveraged across multiple projects.

9) Define metrics

Data governance needs to have robust metrics to measure and track progress. The data governance team must recognize that when something is measured, performance improves.

As a result, the data governance team must pick a few key performance indicators (KPIs) to measure the ongoing performance of the program. For example, a bank will want to assess the overall credit exposure by industry. In that case, the data governance program might select a percentage of null Standard Industry Classification (SIC) codes as a KPI, to track the quality of risk management information.
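The null-SIC KPI described above can be sketched as a simple computation; the record layout and field name below are hypothetical assumptions for illustration:

```python
def null_sic_kpi(records):
    """Return the percentage of records whose Standard Industry
    Classification (SIC) code is missing (None or empty)."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not r.get("sic_code"))
    return 100.0 * missing / len(records)

# Illustrative customer records: two of the four lack a SIC code,
# so the KPI reports 50% missing.
customers = [
    {"name": "Acme Ltd", "sic_code": "6021"},
    {"name": "Widget Co", "sic_code": None},
    {"name": "Example Corp", "sic_code": ""},
    {"name": "Sample Inc", "sic_code": "7372"},
]
```

Tracking this percentage over time gives the program the measurable, reportable indicator that step 14 later relies on.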

10) Govern master data

The most valuable information within an enterprise, which is critical data about customers, products, materials, vendors, and accounts, is commonly known as master data. Despite its importance, master data is often replicated and scattered across business processes, systems, and applications throughout the enterprise. Governing master data is an ongoing practice, whereby business leaders define the principles, policies, processes, business rules, and metrics for achieving business objectives, by managing the quality of their master data.

Challenges regarding master data tend to bedevil most organizations, but it is not always easy to get the right level of business sponsorship to fix the root cause of the issues. As a result, it is important to justify an investment in a master data initiative. For example, consider an organization such as a bank, which sends multiple pieces of mail to the same household.

The bank can establish a quick return on investment by cleansing its customer data to create a single view of the “household.” The bottom line is that the vast majority of data governance programs deal with issues around data stewardship, data quality, master data, and compliance.
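A minimal illustration of the householding idea above, grouping customers that share a normalized mailing address; the record layout and the deliberately naive normalization rule (case and whitespace only) are assumptions for the sketch:

```python
from collections import defaultdict

def normalize_address(addr: str) -> str:
    # Naive normalization: uppercase and collapse runs of whitespace.
    return " ".join(addr.upper().split())

def household_groups(customers):
    """Group (name, address) records by their normalized address."""
    groups = defaultdict(list)
    for name, addr in customers:
        groups[normalize_address(addr)].append(name)
    return dict(groups)

# Two of the three records resolve to the same household, so the
# bank would send two mailings instead of three.
mailing_list = [
    ("Alice Smith", "123 Maple St, Unit 5"),
    ("Bob Smith", "123  maple st,  unit 5"),
    ("Carol Jones", "9 Oak Ave"),
]
```

Production householding would of course use address standardization and probabilistic matching rather than string normalization, but the payoff is the same: one view per household.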

11) Govern analytics

Enterprises have invested huge sums of money to build data warehouses to gain competitive insight. However, these investments have not always yielded results. As a consequence, businesses are increasingly scrutinizing their investments. We define the “analytics governance” track as the setting of policies and procedures to better align business users with the investments in analytic infrastructure. Data governance organizations need to ask the following questions:

❏ How many users do we have for our data, by business area?

❏ How many reports do we create, by business area?

❏ Do the users derive value from these reports?

❏ How many report executions do we have per month?

❏ How long does it take to produce a new report?

❏ What is the cost of producing a new report?

❏ Can we train the users to produce their own reports?

Many organizations will want to set up a Business Intelligence Competency Centre (BICC) to educate users, increase business intelligence, and develop reports.


12) Manage security and privacy

Data governance leaders, especially those who report to the chief information security officer, often have to deal with issues around data security and privacy.

Some of the common data security and privacy challenges include:

❏ Where is our sensitive data?

❏ Has the organization masked its sensitive data in non-production environments (for example, in development, testing, and training) to comply with privacy regulations?

❏ Are database audit controls in place to prevent privileged users, such as DBAs, from accessing private data, such as employees' salaries and customer lists?

13) Govern the information lifecycle

Unstructured content makes up more than 80 percent of the data within the typical enterprise. As organizations move from data governance to information governance, they start to consider the governance of this unstructured content.

The lifecycle of information starts with data creation and ends with its removal from production. Data governance organizations have to deal with the following issues regarding the lifecycle of information:

❏ What is our policy regarding digitizing paper documents?

❏ What is our records management policy for paper documents, electronic documents, and email? (In other words, which documents do we maintain as records, and for how long?)

❏ How do we archive structured data to reduce storage costs and improve performance?

❏ How do we bring structured and unstructured data together under a common framework of policies and management?

14) Measure the results

Data governance organizations must ensure continuous improvement by constantly monitoring metrics. In step nine, the data governance team sets up the metrics. In this step, the data governance team reports to senior stakeholders on the progress of those metrics from IT and the business.

Data Governance Business Application

Today, banking systems establish and maintain line of business (LoB) specific customer views with associated accounts and product holdings – either in product systems or in LoB-specific Customer Information Files (CIFs). Thus, the customer, account, and product relationship information resides in siloed applications. This limits the ability to understand the customer holistically (across LoBs) and does not provide an enterprise view of the customer.

The Master Data Management (MDM) initiative enables a complete 360-degree operational view of customers across the bank (enterprise goal). At the target state, the key capabilities of MDM are to:

• Provide consistent and accurate data about essential business entities derived from a single trusted source
• Uniquely identify a customer and all the associated relationships/holdings with the bank, based on the customer's privacy preferences

To achieve the target state objective, the MDM solution will integrate/interface between the numerous LoB-specific applications, consolidate the data, and create a single golden master record.


Below is a typical data governance business (Master Data Management) application diagram:

Figure 2: MDM Process


The solution overview diagram clearly depicts the various sub-systems in the solution. At a high level, the entire solution is classified into the following layers:

• Presentation Layer
• OCIF Sub-system
• Data Integration and Quality Layer
• Application Layer
• Database Layer

Presentation Layer

The presentation layer of the solution essentially implies user interface applications. The following user interface applications are included:

• Reporting User Interface
• Data Stewardship User Interface
• Business Administration User Interface

The Reporting user interface will generate business and stewardship reports on the data available in MDM; the Data Stewardship user interface will provide various options for operating on customer information, along with searching for and handling duplicate or potentially duplicate customers; and the Data Administration user interface will manage reference data and other metadata in the MDM database.


OCIF Sub-System

The OCIF is an existing authoritative operational source of customer information used by multiple systems. This sub-system is presently considered a ‘Book of Record’ in the enterprise. The key objective of this system is creating and maintaining standardized and consistent customer information across the systems, reducing potential duplicate customers, and improving customer data integrity significantly so that it can be treated as a single source of truth. In the current solution context, this system is considered the only source system from which customer information will be loaded into the MDM database. Based on the solution overview diagram, there will be two approaches to data synchronization between the systems. These are:

• The initial load – the entire content of the database
• The delta load – the difference in content between the last day and the current day

To populate data into MDM from OCIF, an OCIF component/utility is required, which will extract the required data. The new component to be developed will be responsible for providing extracts on a daily basis, which will be the input for downstream sub-systems to transform and load into MDM and thus synchronize the two systems.
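The two synchronization approaches can be sketched as follows. The record keys and structures are illustrative assumptions, and deletions are ignored to keep the sketch minimal:

```python
def initial_load(today: dict) -> dict:
    """Initial load: take the entire content of the source extract."""
    return dict(today)

def delta_load(yesterday: dict, today: dict) -> dict:
    """Delta load: keep only records that are new or changed between
    the last day's extract and the current day's extract."""
    return {
        key: row
        for key, row in today.items()
        if yesterday.get(key) != row
    }

# Illustrative extracts keyed by a customer identifier: C2 changed
# and C3 is new, so only those two flow through the delta load.
yesterday = {"C1": {"name": "Alice"}, "C2": {"name": "Bob"}}
today = {"C1": {"name": "Alice"}, "C2": {"name": "Bobby"}, "C3": {"name": "Carol"}}
```

The design point is simply that the delta path moves far fewer records per day than re-running the full load, at the cost of comparing against the previous extract.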

Information Integration Layer

Information Integration Layer – Data Stage

The information integration layer is a key component that is responsible for integrating OCIF and the MDM server application. The data format provided by OCIF is not compatible with the MDM server, and hence it is not directly consumable. DataStage, being part of the integration layer, is responsible for transforming OCIF extracts into an MDM-specific format.

The key objectives of this layer are to:

• Read extracts provided by the source system
• Transform the extract into the required format based on a synchronization mechanism
• Transform the extract file into the format required by the data quality component for standardization during the initial load
• Transform reference values to MDM-specific codes depending on the source system reference value
• Load the transformed data into a database/file

The IIS DataStage component is responsible for reading extracts from the source system, transforming them into SIF format, and pushing data into MDM in two different ways:

• Directly into the database during the initial load
• Writing into files (ExSIF) for the delta load
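Since the SIF is a pipe-delimited standard interface file, parsing one record can be sketched as below. The actual record layout is not specified in this document, so the field names and sample values are assumptions:

```python
def parse_sif_record(line: str, field_names):
    """Split one pipe-delimited SIF line into a field-name -> value
    mapping, positionally, after stripping the trailing newline."""
    values = line.rstrip("\n").split("|")
    return dict(zip(field_names, values))

# Hypothetical four-field layout for a contact record.
fields = ["record_type", "cont_id", "last_name", "first_name"]
record = parse_sif_record("CONTACT|100001|CHEN|WILLIAM", fields)
```

In the real pipeline this mapping is driven by the record type and sub-type parsed in step 1 below, with a distinct layout per type.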

The following sections detail the approaches to be followed in the ETL layer.


The diagram below describes the high level steps to be performed in DataStage during the initial data load.

Figure 3: MDM Initial Load Process

1. A custom DataStage extract job will be developed to read extract files from the ETL receiving zone and parse each record, based on the record type and sub-type, into individual records of the SIF format, which is a pipe-delimited standard interface file.

2. Validation jobs will be responsible for data standardization. They will also perform the SIN validation and phone number validation. Any failed record information will be logged into an error log file through error handling jobs.

3. An ETL job will be invoked to populate a separate file for standardization, which will be used by QualityStage. The above steps generate the SIF files for consumption by the BIL jobs.

4. The BIL import job imports the SIF file for processing.

5. A validation job validates the code column value and invokes error handling framework jobs in the case of failure. In such scenarios, the records that are the source of the issue are dropped from the requested SIF file. Based on the strategy of the initial load, the dropping of records is minimized to synchronize MDM with the source system to the highest degree.

6. The party referential integrity validation job ensures every party has either a valid PersonName or OrgName record, and also verifies that a valid party record exists for the “Provided By” Source System Key (SSK).

7. The BIL consists of one job for each Record Type or Sub-Type (RT/ST) that performs key assignment and database loading. For example, the Contact key assignment job assigns CONT_ID, PERSON_ID, ORG_ID, and CONTEQUIV_ID to CONTACT, PERSON, ORG, and CONTEQUIV records respectively and inserts them into the MDM database. Before loading the records into MDM, an MDM Involved Party ID will be generated within the ETL jobs. At a high level, the new MDM Involved Party ID will be 18 characters long, where the first 2 characters will carry the version of the BIL and the last 16 characters will be a random number.

8. The data quality error consolidation process reads the data quality error files created during the import SIF, validation, and referential integrity validation phases, and drops any records associated with the records in the error file.
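The Involved Party ID rule in step 7 (18 characters: a 2-character BIL version followed by a 16-digit random number) can be sketched as below; this is an illustrative reading of the rule, not the actual ETL job logic:

```python
import random

def new_involved_party_id(bil_version: str) -> str:
    """Generate an 18-character MDM Involved Party ID: the first 2
    characters carry the BIL version and the last 16 characters are
    a zero-padded random number, as described in step 7."""
    if len(bil_version) != 2:
        raise ValueError("BIL version must be exactly 2 characters")
    suffix = f"{random.randrange(10**16):016d}"  # 16 random digits
    return bil_version + suffix
```

Zero-padding matters here: without it, random numbers below 10^15 would yield IDs shorter than 18 characters.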


The diagram below describes the high level steps to be performed in DataStage during the delta data load.

Figure 4: MDM Delta Load Process

1. The custom DataStage extract jobs from the initial load process will be re-used to read extract files from the ETL receiving zone and to parse each record based on the record type or sub-type.

2. Data validation jobs are responsible for CII data standardization. They will also perform SIN validation and phone number validation. Any failed records will be logged into the log file through an error handling mechanism. The above two steps essentially generate the SIF files for consumption by the BIL asset.

3. The DataStage import job imports the SIF file for processing.

4. Applicable business transformation rules are invoked using DataStage transformation jobs, which are responsible for generating extended SIF files for MDM to consume. Errors are logged using DataStage's out-of-box error handling mechanism for further analysis and action.

Data Quality Management – Quality Stage

The master data hub solution is about providing complete, accurate, standardized information about the customers stored in the MDM system. Even though OCIF maintains its own data quality, customer attributes need further standardization before they are stored in MDM, as MDM will be the single version of truth on customer data throughout the enterprise. The QualityStage component is primarily responsible for data standardization, the improvement of the overall quality of the enterprise's data asset, and the identification of duplicate/potentially duplicate customers. The current solution places QualityStage with the following objectives:

• Standardize name- and address-related attributes
• Validate and correct customers' addresses against the Canada Post address repository, implemented through SERP
• Perform probabilistic matching to identify potential duplicate customers

The IIS QualityStage component is responsible for maintaining data quality stored in MDM.


The key objectives of QualityStage are:

• Name and address standardization
• Identifying duplicate/potentially duplicate customers
• Matching


The diagram below describes the high level steps to be performed during the initial load.


Figure 5: Quality Stage Initial Load Process

The diagram below describes the high level steps to be performed during the delta load.


Figure 6: Quality Stage Delta Load Process

Individual Customer Name Standardization

This standardization procedure will receive an individual name from MDM before processing the individual name through the MNNAME rule set. The MNNAME rule set will parse the individual name into separate name elements and create an analysis value or phonetic representation value for the first and last name of the individual.


For example:

If an individual by the name of “Mr William Chen” was passed to the individual standardization procedure, this would be the standardization result.
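The MNNAME rule set itself is proprietary, so its exact output is not reproduced here; but the idea of a phonetic representation value can be illustrated with the classic Soundex algorithm, used here purely as a stand-in for the actual rule set:

```python
def soundex(name: str) -> str:
    """Classic Soundex: keep the first letter, encode the remaining
    consonants by sound class, skip repeats of the same class, drop
    vowels, then truncate/zero-pad to 4 characters."""
    codes = {ch: str(d)
             for d, letters in enumerate(
                 ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1)
             for ch in letters}
    name = name.upper()
    encoded = [name[0]]
    prev = codes.get(name[0])
    for ch in name[1:]:
        code = codes.get(ch)
        if code is not None and code != prev:
            encoded.append(code)
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = code
    return "".join(encoded)[:4].ljust(4, "0")
```

Two names that sound alike produce the same code (for example, "Chen" encodes to C500), which is exactly why a phonetic value is useful as a matching key even when spellings differ.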

Organizational Customer Name Standardization

This standardization procedure will receive an organization name from MDM and process the organization name through the MNNAME rule set. The MNNAME rule set will parse the organization name into separate word elements and create an analysis value or phonetic representation value for word1 and word2 of the organization name.

For example:

If the organization name of “Bank of Example” was passed to this organization standardization procedure, this would be the standardization result.

The important thing to note is that the original name fed into QualityStage from MDM will be passed back to MDM. QualityStage does not change or enhance the organization name in any way; it parses the name into smaller elements for matching purposes only. MDM will receive the original name, the phonetic representation of the organization name, and the standardized name.

Address Standardization

This standardization procedure will receive an address from MDM and process the address through the MDMCAADDR and MDMCAAREA rule sets. The MDMCAADDR rule set will parse the address name into separate address elements and create an analysis value or phonetic representation value for the street name. The MDMCAAREA rule set will parse the city, province, and postal code into separate address elements and create an analysis value or phonetic representation value for the city name.

For example:

If the address of “123 Maple Street Unit 5 ” was passed to this address standardization procedure, this would be the standardization result.

Matching

In order to maintain data quality, adding and updating a customer will trigger the matching process.

Individual and organizational customers will be processed by different match specifications in QualityStage, which consist of blocking parameters and scoring specifications for different passes. The MDM service will provide QualityStage (QS) with a set of candidates by searching the MDM database using the blocking parameters for each pass. The QS matching process will compare and score each candidate and return the match result to MDM.

In order to implement the match specification and respond to MDM requests, an ISD job and shared containers are created for the interface.
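The blocking-and-scoring flow described above can be sketched as follows; the blocking key, the field weights, and the sample records are illustrative assumptions, not the actual match specification:

```python
def blocking_key(rec):
    # one hypothetical blocking pass: first letter of last name + postal code
    return (rec["last"][:1].upper(), rec["postal"])

def score(a, b, weights=(("first", 30), ("last", 40), ("postal", 30))):
    # sum agreement weights over the compared fields
    return sum(w for f, w in weights if a[f].upper() == b[f].upper())

incoming = {"first": "WILLIAM", "last": "CHEN", "postal": "X1X1X1"}
mdm_records = [
    {"first": "BILL", "last": "CHEN", "postal": "X1X1X1"},
    {"first": "WILLIAM", "last": "CHEN", "postal": "X1X1X1"},
    {"first": "WEI", "last": "CHAN", "postal": "Y2Y2Y2"},
]

# MDM supplies only the candidates that share the blocking key;
# QS then scores each candidate against the incoming record
candidates = [r for r in mdm_records if blocking_key(r) == blocking_key(incoming)]
scores = [score(incoming, r) for r in candidates]
```

Blocking keeps the comparison set small; the scores are then checked against the match thresholds discussed later in the requirements.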


Application Layer

The MDM server is the application component of the solution that interfaces with the data source where the master data is stored. It is also responsible for providing various features for managing and maintaining master data, keeping the data source the single version of truth. The application is responsible for:

 Interfacing with the master data source through various protocols

 Managing master data through the exposed interfaces with other sub-systems/external sources

 Controlling access in terms of data visibility and enhancing data security

 Identifying and providing information on potential candidate lists of duplicate customers, to assist QualityStage in determining detailed information on customer duplication, and storing it in the data source

 Merging two or more customers to enforce the MDM data source as a single view of the customer and a single version of truth

 Providing a user interface to merge and maintain customer information when duplicates are potential and not guaranteed

 Providing a user interface to manage and configure metadata

Database Layer

The database layer in the solution is responsible for storing all the master data. It also stores the history data, audit data, and metadata required for the MDM application to execute. During the initial load, the database is populated directly by the information integration layer. Once the initial population is successfully completed, daily extracts from source systems will be loaded into the MDM database through the MDM batch framework and maintenance services.

Apart from business data, the database layer also contains the metadata required by the MDM application. Metadata is another key set of information that is configured for the MDM application and controls its behaviour and functionality.

Data Quality Management in Detail

For example, if we have the following input file:

File Name | Profile | ID | Name | Address | Phone Number | Party Type
B2B Personal Cardholders | B2BPC | 1 | John Smith | 123 Main Street, Toronto, Ontario, Canada X1X1X1 | 416-549-7061 |
B2B Personal Cardholders | B2BPC | 2 | ABC Limited | 456 King Avenue, Calgary, Alberta, Y2Y2Y2 | 416-549-7061 |
B2B Personal Cardholders | B2BPC | 3 | John and Jane Smith | 123 Main Street, Toronto, Ontario, Canada X1X1X1 | |
B2B Personal Cardholders | B2BPC | 4 | A | | |

There are several baseline data quality requirements that need to be followed in order to maintain data quality:

Name Formatting and Standardization
1. If a free-form name (i.e., unparsed as a single string) is received as an input, the MDM matching solution should parse or tokenize the name into the common format required for processing. E.g., "John Smith" may need to be tokenized into First Name = John and Last Name = Smith.

Address Formatting and Standardization
1. If a free-form address (i.e., unparsed as a single string) is received as an input, the MDM matching solution should parse or tokenize the address into the common format required for processing, for both Canadian and US addresses. E.g., "123 Main Street" may need to be tokenized into Street Number = 123 and Street Name = Main Street. If the country code/name is missing in the incoming files, the Canadian address standardization rules will be applied as a default.

Address Validation and Correction
1. All addresses received in the input files should be validated and corrected based on checks with Canada Post. In case of an address correction, the address as provided by Canada Post will be applied.

Phone Number Formatting and Standardization
1. If a free-form phone number (i.e., unparsed as a single string) is received as an input, the MDM matching solution should parse or tokenize the phone number into the common format required for processing. E.g., 416-549-7061 may need to be tokenized into Area Code = 416 and Number = 549-7061.

Name Patterns
1. The MDM matching solution should develop data processing rules to handle the following patterns that may occur in the 'name' fields.

For individuals, the connectors that will identify such patterns are:
a. Space And Space (e.g., John And Jane Smith)
b. Space and Space (e.g., John and Jane Smith)
c. & (e.g., John&Jane Smith)
d. Space & Space (e.g., John & Jane Smith)
e. / (e.g., John/Jane Smith)
f. Space / Space (e.g., John / Jane Smith)
g. \ (e.g., John\Jane Smith)
h. Space \ Space (e.g., John \ Jane Smith)

For organizations, the connectors that will identify such patterns are:
i. / (e.g., John/ABC Limited)
j. Space / Space (e.g., John / ABC Limited)
k. \ (e.g., John\ABC Limited)
l. Space \ Space (e.g., John \ ABC Limited)

2. If a name pattern has one of the 'And', 'and' or '&' connectors, the following requirements should be developed:
a. A lookup against the organization name directory should be performed.
b. If the name pattern matches an organization name from the directory, the record should not be split into discrete records.
c. If the name pattern does not match an organization name from the directory, the record should be split into discrete records.
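The connector handling and directory lookup above can be sketched as follows; the directory contents and helper names are hypothetical:

```python
import re

# hypothetical organization-name directory used for the lookup in step 2
ORG_DIRECTORY = {"JOHNSON AND JOHNSON"}

# the connectors listed above: ' And ', ' and ', '&', ' & ', '/', ' / ', '\', ' \ '
CONNECTORS = re.compile(r"\s+and\s+|\s*&\s*|\s*/\s*|\s*\\\s*", re.IGNORECASE)

def split_name(name):
    # keep directory organizations whole; otherwise split into discrete records
    if name.upper() in ORG_DIRECTORY:
        return [name]
    return [part for part in CONNECTORS.split(name) if part]

split_name("John & Jane Smith")    # -> ['John', 'Jane Smith']
split_name("Johnson and Johnson")  # -> kept whole via the directory lookup
```

The directory lookup prevents legitimate organization names containing "and" or "&" from being split into spurious individual records.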

Matching Process Requirement:

2.3.1 FR Rules – Overview
1. The MDM matching rules should be designed and developed to match incoming records across all input files, i.e., match all input files with each other.
2. The MDM matching rules should be designed and developed to match the incoming records with all records stored in the MDM.

2.3.2 FR Rules – List
The following matching rules should be designed and developed in the MDM environment:
1. Rule 1: Individual Matching – Individual Full Name and Full Address
2. Rule 2: Organizational Matching – Organization Name and Full Address
3. Rule 3: Household Matching – Individual Last Name and Full Address
4. Rule 4: Address Matching – Full Address
5. Rule 5: Phone Number Matching – Full Phone Number
NOTE: Each of the above matching rules should generate independent match IDs/keys.

2.3.3 FR Rules – Data Elements
1. Full Name – When the matching rules are based on Full Names, the following discrete data elements should be used:
a. First Name a.k.a. Given Name
b. Last Name
c. Name Suffix
d. Organization Name (as applicable)
2. Full Address – When the matching rules are based on Full Address, the following discrete data elements should be used:
a. Apartment/Unit Number
b. Street Number
c. Street Name
d. Street Type
e. City
f. Province
g. Postal Code
h. Country
i. Non-Civic Address Info (as applicable)
3. Full Phone Number – When the matching rules are based on Full Phone Number, the following discrete data elements should be used:
a. Country Code
b. Area Code
c. Number
NOTE: The Individual Matching Process uses 'Residential Primary' or equivalent addresses only, while the Organizational Matching Process uses 'Business Primary' or equivalent addresses only.
NOTE: Phonetic representations of first name, last name, and street name are used by the current MDM matching process.

2.3.4 FR Rules – Guidelines
1. The corrected postal address should be used by the MDM matching process.
2. Each record from each input file should undergo each of the rules stated above.
NOTE: For example, a record identified as 'Individual' should undergo the Organizational match rule as well.
3. Wherever applicable, the match IDs/keys as generated by the individual matching rules should be cross-referenced in the output files. E.g., a record could have an Individual Match Key of 123 and a Household Match Key of 456.
4. A separate match ID/key should be generated for records within the MDM that do not have a match with records in the input files.

2.3.5 FR Rules – Weights, Thresholds and Categories
1. The MDM matching solution should be designed and developed for 'looser' matching rules.
2. The weights and thresholds that are currently assigned in the MDM environment should be used as a starting point for the design and development.
3. The match categories that are currently identified in the MDM environment should be used as a starting point for the design.

2.3.6 FR Rules – Error Condition
1. In case an incoming record could not be processed by the MDM matching solution, it should be highlighted in the output file.
2. A description of the reason why the record could not undergo the matching process should be included in the output file.
NOTE: These error descriptions should be as provided by the MDM matching solution, with no new requirements.

Output file:

For each input record, the output file repeats the input fields (file name, profile, ID, name, address, phone number, party type) and adds a sequence number, a split-name value, an address validation or correction indicator with its description, the corrected address (or the address from MDM), and the match/process result. In this example, the addresses of records 1 and 2 are corrected against Canada Post (record 2's incorrect postal code is corrected) and both records are processed; record 3, "John and Jane Smith", is split into two discrete output records, each validated and processed; record 4, "A", is flagged for insufficient (or blank) address information; and an MDM record at 789 Poplar Road, Ottawa, Ontario that has no match in the input files receives its own sequence number and match key.


CHAPTER 3 – ISSUES, CHALLENGES AND TRENDS

Unfortunately, during the MDM matching process, there are still processes that need human intervention, such as the following tasks:

The Potential Overlay Task:

A potential overlay occurs when a record is updated with information that is radically different from the data already in the record. For example, consider the situation illustrated below:

Case | party id | Family Name | Given Name | Gender | Date of Birth | Address | phone | last modify
5 | ### | LEWIS | JANE | F | 26-Jun-71 | 100 Kumar Avenue, Markham, Ontario, Canada A2B2C2 | 416-549-7070 | 08/24/06
5 | ### | XIANG | LINDA | F | 13-Jan-78 | 456 King Avenue, Calgary, Alberta, Y2Y2Y2 | 416-549-7070 | 02/28/98
Figure 7: Case 5

The data steward will mark the record as a potential overlay because the ID field in both records is the same. However, when we look closely at these two records, we can see that Linda Xiang and Jane Lewis are clearly not the same person. The ID 388293023980000000 was created on Feb 28, 1998 and belongs to Linda Xiang. Somehow, on Aug 24, 2006, the record was updated so that it now appears to belong to a woman named Jane Lewis. This may have been caused by a common typographical data entry mistake in which Linda Xiang's record was open on the screen when the customer service representative started typing, not realizing that he or she was typing over someone else's data.

There are also situations in which this scenario would be perfectly valid. In the case of events such as a marriage, divorce, move, or phone-number change, a person's data can change significantly enough to be flagged as a potential overlay by the data steward application.

Using data mining and fuzzy logic, potential overlay tasks can be resolved automatically.

Match Duplicate Suspects to Create a New Master Record:

As a solution for data warehouse applications, data governance will match records from multiple lines of business (LOBs). There are situations where customers from multiple LOBs may have similar names, addresses, and telephone numbers, and may have fields that are blank or not available (N/A). For instance, see below:


Case | party id | Family Name | Given Name | Date of Birth | Address | phone
3 | ### | VERKIN | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333
3 | ### | VERKI | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333
Figure 8: Case 3

The two records above have the same Given Name, Date of Birth, Address, and Phone Number.

However, the Family Name is slightly different. Are these two records the same customer?

Data governance applications currently available on the market will stop here and wait for human intervention to decide. Through the application of data mining and fuzzy logic, we would be able to identify such cases without human intervention and generate a single customer profile with the best data from all sources.

Link Related Records from Multiple Sources:

With overlays, the task verifies the existing records in the system. With duplicate suspects, the task gets rid of extra records. This task, by contrast, links records between systems. The data steward applications currently available on the market may not be able to link such records automatically because the records do not have enough data in common, as in the example below:

Case | party id | Name | Date of Birth | Address | phone | Source
2 | ### | GUGGENHEIM REAL ESTATE LLC | 10-May-98 | 123 Main Street, Toronto, Ontario, Canada X1X1X1 | 647-123-4567 | Market
2 | ### | GUGGENHEIM REAL ESTATE LLC | 10-May-98 | 123 Main St Unit 10, Toronto, Ontario, Canada X1X1X1 | 647-123-2352 | Auto
Figure 9: Case 2

For the above example, at first glance these two records appear to be the same company. However, if one looks closely, one can see some differences. First, the addresses are different: "Unit 10" appears in only one of the records' address fields. Second, the phone numbers are different: one is "647-123-4567" and the other is "647-123-2352".

Data mining and fuzzy logic can automatically verify that these two records are the same company and link them together.


CHAPTER 4 – FUZZY LOGIC

Traditional Logic:

Now let's suppose that we generate the following training set based on the data steward application output, covering the potential overlay task, duplicate suspects, and related records from multiple sources. We would have:

Case | party id | Family Name | Given Name | Date of Birth | Address | phone | Source | Class
1 | ### | VERKIN | YOUSSOU | 5-Aug-60 | 10 Main Street, Markham, Ontario, Canada X2Y1X1 | 915-123-4213 | Mortgage | N
1 | ### | VERKIN | JANE | 5-Aug-60 | 10 Main Street, Markham, Ontario, Canada X2Y1X1 | 915-123-4213 | Auto |
2 | ### | GUGGENHEIM REAL ESTATE LLC | | 10-May-98 | 123 Main Street, Toronto, Ontario, Canada X1X1X1 | 647-123-4567 | Market | Y
2 | ### | GUGGENHEIM REAL ESTATE LLC | | 10-May-98 | 123 Main St Unit 10, Toronto, Ontario, Canada X1X1X1 | 647-123-2352 | Auto |
3 | ### | VERKIN | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333 | Life | Y
3 | ### | V. | SMITH | 5-Aug-60 | 987 Village Ave, Toronto, Ontario, Canada T2T1C1 | 416-222-3333 | Auto |
4 | ### | CREATIVE LEADERSHIM GROUM LTD | | 5-Aug-60 | 456 King Avenue, Calgary, Alberta, Y2Y2Y2 | 416-549-7070 | | Y
4 | ### | CREATIVE LEADERSHIM GROUM LTD | | 5-Aug-60 | 456 King Avenue, Calgary, Alberta, Y2Y2Y2 | 416-549-7070 | |
5 | ### | LEWIS | JANE | 26-Jun-71 | 100 Kumar Avenue, Markham, Ontario, Canada A2B2C2 | 416-549-7070 | 24/08/2006 | N
5 | ### | XIANG | LINDA | 13-Jan-78 | 456 King Avenue, Calgary, Alberta, Y2Y2Y2 | 416-549-7070 | 28/02/1998 |
Figure 10: Cases


Traditional logic is the idea that an outcome can only be either true or false, 1 or 0, right or wrong. This form of logic dates back to ancient Greece and is perfectly adequate for answering simple questions in single dimensions. For example, if A is 1 and B is 0, what is A AND B? It can be extended to more complex questions, as long as all the parts can be described using the same restricted alphabet of two symbols. Such logic is a deductive way of understanding consequences and is a highly valuable intellectual technique.[12]

If we use the above traditional logic, we will get the following training set:

case | Family Name | Given Name | Date of Birth | Address | phone | class
1 | T | F | T | T | T | N
2 | T | T | T | F | F | Y
3 | F | T | T | T | T | Y
4 | T | T | T | T | T | Y
5 | F | F | F | F | T | N
Figure 11: Training Set

Applying the information gain measure to the above training set, we get the information gain for each attribute:


Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, Info(D_j)

Gain(A) = Info(D) - Info_A(D)

Info(D) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.97

Info_{FamilyName}(D) = \frac{3}{5}\left(-\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3}\right) + \frac{2}{5}\left(-\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2}\right) = 0.95

Info_{GivenName}(D) = \frac{3}{5}\left(-\frac{3}{3}\log_2\frac{3}{3} - \frac{0}{3}\log_2\frac{0}{3}\right) + \frac{2}{5}\left(-\frac{2}{2}\log_2\frac{2}{2} - \frac{0}{2}\log_2\frac{0}{2}\right) = 0

Info_{DateofBirth}(D) = \frac{4}{5}\left(-\frac{3}{4}\log_2\frac{3}{4} - \frac{1}{4}\log_2\frac{1}{4}\right) + \frac{1}{5}\left(-\frac{1}{1}\log_2\frac{1}{1} - \frac{0}{1}\log_2\frac{0}{1}\right) = 0.65

Info_{Address}(D) = \frac{3}{5}\left(-\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3}\right) + \frac{2}{5}\left(-\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2}\right) = 0.95

Info_{Phone}(D) = \frac{4}{5}\left(-\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4}\right) + \frac{1}{5}\left(-\frac{0}{1}\log_2\frac{0}{1} - \frac{1}{1}\log_2\frac{1}{1}\right) = 0.8

Hence, the gain in information from such a partitioning would be:

Gain(FamilyName) = Info(D) - Info_{FamilyName}(D) = 0.97 - 0.95 = 0.02

Gain(GivenName) = Info(D) - Info_{GivenName}(D) = 0.97 - 0 = 0.97

Gain(DateofBirth) = Info(D) - Info_{DateofBirth}(D) = 0.97 - 0.65 = 0.32

Gain(Address) = Info(D) - Info_{Address}(D) = 0.97 - 0.95 = 0.02

Gain(Phone) = Info(D) - Info_{Phone}(D) = 0.97 - 0.8 = 0.17
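The entropy and gain figures can be reproduced with a short script over the Figure 11 training set:

```python
import math
from collections import Counter

def entropy(labels):
    # Info(D) = -sum(p_i * log2(p_i)) over the class proportions
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# Figure 11 training set: (FamilyName, GivenName, DateOfBirth, Address, Phone, class)
rows = [
    ("T", "F", "T", "T", "T", "N"),
    ("T", "T", "T", "F", "F", "Y"),
    ("F", "T", "T", "T", "T", "Y"),
    ("T", "T", "T", "T", "T", "Y"),
    ("F", "F", "F", "F", "T", "N"),
]
attrs = ["FamilyName", "GivenName", "DateOfBirth", "Address", "Phone"]
info_d = entropy([r[-1] for r in rows])  # about 0.97

def gain(i):
    # Info_A(D): class entropy within each attribute value, weighted by |D_j|/|D|
    info_a = 0.0
    for v in {r[i] for r in rows}:
        part = [r[-1] for r in rows if r[i] == v]
        info_a += len(part) / len(rows) * entropy(part)
    return info_d - info_a

gains = {a: round(gain(i), 2) for i, a in enumerate(attrs)}
# GivenName has the highest gain, so it becomes the splitting attribute
```

Rounded to two decimals, the script yields the same gains as the derivation above, with GivenName the clear winner.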


Since GivenName has the highest information gain among the attributes, it is selected as the splitting attribute. So we get the following decision tree:

Figure 12: Traditional Decision Tree

One issue with the above decision tree is the crispness of the attributes. For example, is the name "John Smith" the same as "J. Smith"? The above model provides only two states for each attribute: either the Given Name is the same or it is not. I will illustrate here how to tackle the uncertainty associated with the description of knowledge by using fuzzy logic.


Fuzzy Logic History

The term "fuzzy logic" was introduced with the 1965 proposal of fuzzy set theory by Lotfi A. Zadeh.[2][3] Fuzzy logic has since been applied to many fields, from control theory to artificial intelligence. Fuzzy logic, however, had been studied since the 1920s as infinite-valued logic, notably by Łukasiewicz and Tarski.[4]

The Basic Concept of Fuzzy Logic

Fuzzy mathematics forms a branch of mathematics related to fuzzy set theory and fuzzy logic. It started in 1965 after the publication of Lotfi Asker Zadeh's seminal work Fuzzy Sets.[1] A fuzzy subset A of a set X is a function A: X → L, where L is the interval [0,1]. This function is also called a membership function. A membership function is a generalization of the characteristic (indicator) function of a subset defined for L = {0,1}. More generally, one can use a complete lattice L in the definition of a fuzzy subset A.[9]

A Fuzzy Implementation:

For each input and output selected, I define two or more membership functions (MFs). There is a qualitative category for each one, for example true or false. The shape of these functions can vary, but I will work with a triangle, which needs three points to define one MF of one variable. Below is the triangle for the variable GivenName:

52 Applying Fuzzy Logic for Data Governance

Figure 13: Fuzzy MF – the triangular 'true' MF and the trapezoidal 'false' MF, plotted (membership y from 0 to 1) over the points x0, x1, x2, x3, x4

If we take GivenName as a variable, 'true' as the triangle, and 'false' as the trapezoid (see the figure above):

– the MF 'true' will be defined by three points (x0, x1, x2), where x0 is any negative value;

– the MF 'false' will be defined by four points (x1, x2, x3, x4), where x4 is any positive value > x3. Taking x4 to infinity, this means that 'false' will stay at 1 from x2 onward.

We have the following MFs for GivenName:

y_{true}(x; x_0, x_1, x_2) = \max\left(\min\left(\frac{x - x_0}{x_1 - x_0}, \frac{x_2 - x}{x_2 - x_1}\right), 0\right) \quad \text{(triangle)}

y_{false}(x; x_1, x_2, x_3, x_4) = \max\left(\min\left(\frac{x - x_1}{x_2 - x_1}, 1, \frac{x_4 - x}{x_4 - x_3}\right), 0\right) \quad \text{(trapezoid)}

For the Given Name variable, I use the Levenshtein distance to calculate the value of x:


If the Levenshtein distance is 0, the two values are identical. If the distance is between 0 and 5, then the two values are somewhat similar. If the distance is greater than 5, then the two values are not the same at all.

After the above specification, we have the fuzzified real value for GivenName. For example, for "kitten" and "sitting", with distance 3, we get:

y_{true} = \max\left(\min\left(\frac{3 - (-\infty)}{0 - (-\infty)}, \frac{5 - 3}{5 - 0}\right), 0\right) = \max(\min(1, 0.4), 0) = 0.4

y_{false} = \max\left(\min\left(\frac{3 - 0}{5 - 0}, 1, \frac{\infty - 3}{\infty - 15}\right), 0\right) = \max(\min(0.6, 1, 1), 0) = 0.6
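A small sketch that computes the distance and both membership degrees; the cut-off points 0 and 5 are the ones chosen above, and the infinite shoulders of the two MFs reduce to the clamping in the code:

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance, computed row by row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def mf_true(x, x1=0.0, x2=5.0):
    # triangular 'true' MF; the left shoulder at -infinity reduces to a clamp at 1
    return max(min(1.0, (x2 - x) / (x2 - x1)), 0.0)

def mf_false(x, x1=0.0, x2=5.0):
    # trapezoidal 'false' MF; the right shoulder at +infinity reduces to a clamp at 1
    return max(min((x - x1) / (x2 - x1), 1.0), 0.0)

d = levenshtein("kitten", "sitting")  # distance 3
mf_true(d), mf_false(d)               # 0.4 and 0.6, matching the hand calculation
```

Note that the two degrees are complementary here only because the MFs were chosen that way; in general a fuzzy partition need not sum to 1.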

Decision Tree definition:

Now let's reconsider the decision tree we introduced before:

Figure 14: Traditional Decision Tree

For this simple case, we have the following rule based on the decision tree above:

IF GivenName is equal (T), THEN two records are equal.

Next is to compute the degree of membership in the MFs (true, false) of the output (the THEN part). Once a variable such as GivenName is fuzzified, it takes a value between 0 and 1, indicating the degree of membership in a given MF of the specific variable. The degrees of membership of the input variables have to be combined to get the degree of membership of the output. For a single input variable, as in the rule specified above, we can for example have the rules shown below:

IF GivenName is equal (T), THEN two records are equal;

IF GivenName is not equal (F), THEN two records are not equal;

According to these rules, if we suppose that the degree of membership of GivenName in the MF 'false' is 0.6, then the degree to which the two records are not equal is 0.6 as well.

In case we have more than one input variable, the degree of membership for the output value will be the minimum value of the degree of membership for the different inputs. For example, suppose we have two input variables (GivenName X and Family Name Y) and the decision matrix below:

                     | FamilyName equal | FamilyName not equal
GivenName equal      | equal            | not equal
GivenName not equal  | not equal        | not equal
Figure 15: Decision Matrix

If the attributes have the following fuzzified values:


y_{GivenName}^{equal} = 0.8

y_{FamilyName}^{not\,equal} = 0.9

Then we have the following rule satisfied:

IF GivenName is equal (degree of 0.8) and FamilyName is not equal (degree of 0.9) THEN the two records are not equal (degree of 0.8).

y_{GivenName}^{not\,equal} = 0.8

y_{FamilyName}^{equal} = 0.2

The following rule would also be satisfied:

IF GivenName is not equal (degree of 0.8) and FamilyName is equal (degree of 0.2) THEN the two records are not equal (degree of 0.2)
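The min/max combination over the Figure 15 decision matrix can be sketched as follows; the dictionary layout is my own, and the membership degrees not stated in the text are filled in as assumptions:

```python
def evaluate(given, family):
    # antecedents combine with min; rules sharing a consequent combine with max
    equal = min(given["equal"], family["equal"])
    not_equal = max(
        min(given["equal"], family["not_equal"]),
        min(given["not_equal"], family["equal"]),
        min(given["not_equal"], family["not_equal"]),
    )
    return {"equal": equal, "not_equal": not_equal}

# fuzzified degrees from the first example above; the missing complements
# (family 'equal' = 0.1, given 'not_equal' = 0.2) are assumed for illustration
given = {"equal": 0.8, "not_equal": 0.2}
family = {"equal": 0.1, "not_equal": 0.9}
evaluate(given, family)  # 'not_equal' fires at degree 0.8, as in the rule above
```

Only one cell of the matrix concludes 'equal', so records are declared equal only when both names agree; every other combination pushes the output toward 'not equal'.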

Brief Discussion:

By applying fuzzy logic to the data governance process, we can get a more accurate decision tree, which enhances the decision-making process. In the example above, the traditional decision tree model must decide whether FamilyName and GivenName are the same even when they differ only slightly; if they are judged different, it may conclude that the two records belong to different persons. When we apply fuzzy logic, however, we may say that the FamilyName values are not equal to some extent (say, 20% not equal) and that the GivenName values are somewhat equal at a degree of 0.3. In that case, the records would be considered to belong to the same person under the fuzzified logic.

Therefore, a more accurate result is gained.


CHAPTER 5 - CONCLUSIONS

In this essay, the history of data governance was discussed, as well as the current literature and the future of this process. The data governance process itself was then explained, and it was found that the central concern of data governance is data quality. In order to improve the data quality of the master data repository, fuzzy logic was applied to the data governance process. With data governance constantly evolving, there is a standing requirement to guarantee its quality. Applying fuzzy logic helps to improve not only the data quality process but also the automation of the process.


REFERENCES

1. Data Governance (November 7, 2013). In Wikipedia, the free encyclopedia. Retrieved December 5, 2013, from http://en.wikipedia.org/wiki/Data_governance
2. A Brief History of Data Quality (March 25, 2009). Data Governance Insider: Covering the world of big data and data governance. Retrieved from http://data-governance.blogspot.ca/2009/03/brief-history-of-data-quality.html
3. Nigel Turner (November 15, 2013). Kindling the Flames: The Future of Data Governance. Retrieved December 11, 2013, from http://smartdatacollective.com/dat-mai/167531/kindling-flames-future-data-governance
4. Rick Sherman (2011). A Must to Avoid: Worst Practices in Enterprise Data Governance. Retrieved from http://searchdatamanagement.techtarget.com/feature/A-must-to-avoid-Worst-practices-in-enterprise-data-governance
5. Marketing Data Governance in the Era of "Big Data". Retrieved from http://www.kbmg.com/wp-content/uploads/2013/07/Winterberry-Group-White-Paper-Marketing-Data-Governance-July-2013.pdf
6. Sunil Soares (September 2010). The IBM Data Governance Unified Process. Ketchum, USA: MC Press Online, LLC.
7. Julie Langenkamp-Muenkel (October 2013). MDM and Next-Generation Data Sources. Information Management.
8. Huey-Li Chen, Long-Hui Chen and Chien-Yu Huang (2009). Fuzzy Goal Programming Approach to Solve the Equipment-Purchasing Problem of an FMC. International Journal of Industrial Engineering, 16(4), 270-281.
9. Fuzzy Mathematics (November 28, 2013). In Wikipedia, the free encyclopedia. Retrieved February 2, 2014, from http://en.wikipedia.org/wiki/Fuzzy_mathematics
10. A Fuzzy Implementation. Retrieved November 15, 2014, from http://apps.ensic.inpl-nancy.fr/benchmarkWWTP/RiskAnalysis/RiskWeb/RiskModule_070423_fichiers/Fuzzy_implementation_070423.pdf
11. Risk Analysis (April 2007). Retrieved November 15, 2014, from http://apps.ensic.inpl-nancy.fr/benchmarkWWTP/RiskAnalysis/RiskWeb/RiskModule_070423_fichiers/
12. Fuzzy Multidimensional Logic (March 2004). Retrieved February 18, 2014, from http://www.calresco.org/lucas/fuzzy.htm
13. Levenshtein Distance (February 2014). Retrieved February 19, 2014, from http://en.wikipedia.org/wiki/Levenshtein_distance
14. Adler. Big Data Governance Maturity (March 2012). Retrieved February 23, 2014, from https://www.ibm.com/developerworks/community/blogs/adler/entry/big_data_governance_maturity?lang=en
15. DataFlux Data Management. The Intersection of Big Data, Data Governance and MDM. Retrieved February 23, 2014, from http://digital.info-mgmt.com/info-mgmt/DataFlux_SAS2012#pg1
16. Sammon, D. and Adam, F. "Making Sense of the Master Data Management (MDM) Concept: Old Wine in New Bottles or New Wine in Old Bottles?" Proceedings of the 2010 Conference on Bridging the Socio-technical Gap in Decision Support Systems: Challenges for the Next Decade, pp. 175-186.
