ATHABASCA UNIVERSITY Applying Fuzzy Logic for Data Governance


BY XiaoHai Lu

A project submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in INFORMATION SYSTEMS

Athabasca, Alberta
November, 2014

© XiaoHai Lu, 2014

DEDICATION

This essay is dedicated to my supportive wife Winnie and my boys Andrew and Michale.

ABSTRACT

Every day, as we browse the internet, we consume big data from the various search engines and social networks that we visit. Like individuals, enterprises also confront a vast stream of information from individuals, communities, corporations, and governments. Vast volumes of information, long retention cycles, and high-velocity decision-making have the potential to derail the usefulness of information and do more harm than good to enterprises. The axiom 'better data means better decisions' becomes critical. Without solid data governance in place, data can be inaccurate and unfit for use. This essay describes the history and future of data governance. It also explains the current process of data governance before demonstrating a prototype of a data governance application in the banking industry. Data governance processes such as matching and linking related records require mathematical support in the decision-making process. Fuzzy logic, an approach to computing based on varying degrees of truth, was found to be a good solution to this issue. As such, this essay applies fuzzy logic to improve these processes, reduce human intervention, and improve data quality.

ACKNOWLEDGMENTS

I thank all who were involved in the support and review of this essay. Without their support, the essay could not have been satisfactorily completed.
Thanks go to all those who provided their insightful and constructive comments, in particular to Professor Richard Huntrods of Athabasca University, who provided priceless suggestions and feedback on my essay.

Table of Contents

DEDICATION
ABSTRACT
ACKNOWLEDGMENTS
CHAPTER 1 – INTRODUCTION
    Data Governance: The History
    Data Governance: The current literature on the topic
    Data Governance: The Future
CHAPTER 2 – DATA GOVERNANCE PROCESS
    Data Governance Process
CHAPTER 3 – ISSUES, CHALLENGES AND TRENDS
    The Potential Overlay Task
    Match Duplicate Suspects to Create a New Master Record
    Link Related Records from Multiple Sources
CHAPTER 4 – FUZZY LOGIC
    Traditional Logic
    Fuzzy Logic History
    The Basic Concept of Fuzzy Logic
    A Fuzzy Implementation
    Brief Discussion
CHAPTER 5 – CONCLUSIONS
References

List of Figures

Figure 1: Data Governance Process
Figure 2: MDM Process
Figure 3: MDM Initial Load Process
Figure 4: MDM Delta Load Process
Figure 5: Quality Stage Initial Load Process
Figure 6: Quality Stage Delta Load Process
Figure 7: Case 5
Figure 8: Case 3
Figure 9: Case 2
Figure 10: Cases
Figure 11: Training Set
Figure 12: Traditional Decision Tree
Figure 13: Fuzzy MF
Figure 14: Traditional Decision Tree
Figure 15: Decision Matrix

CHAPTER 1 – INTRODUCTION

Data Governance: The History

Data governance is an emerging discipline with an ever-evolving definition.
The discipline embodies a convergence of data quality, data management, data policies, business process management, and risk management surrounding the handling of data in an organization [1]. The central point of this definition is data quality. From a business point of view, data governance needs to provide qualified information: the data governance process is the practice of transforming data into qualified information that businesses can use.

The concept of data governance has been around since the beginning of relational databases. Data is stored across referenced tables, and businesses retrieve information by joining the data through cross-referencing those tables. With the growth of information technology, databases are gradually becoming a central part of information systems. In order to insert qualified data into databases, data governance was extended from databases into a set of processes, defined as extracting, transforming, and loading (ETL), that provide databases with clean, accurate, and prompt data feeds. New terms such as metadata, data source, target, and staging emerged with the ETL approach. Numerous ETL tools are available on the market, such as Informatica and Ab Initio. However, the motivation for ETL comes from an information technology (IT) perspective and focuses on IT techniques.

In 2004, IBM started to introduce data governance as a discipline for treating data as an enterprise asset [3]. As a financial asset, data has to be treated like other financial assets, just as one would treat plant and equipment. A data inventory is required for enterprises with existing data, in much the same way as inventories are needed for physical assets. Preventing unauthorized changes to critical data should also be considered, since such changes can affect the integrity of financial reporting as well as the quality and reliability of daily business decisions [3]. Protecting sensitive data and intellectual property from both internal and external threats is another element that falls under data governance. Since