New Artifacts for the Knowledge Discovery via Data Analytics (KDDA) Process
Virginia Commonwealth University
VCU Scholars Compass
Theses and Dissertations, Graduate School
2014

NEW ARTIFACTS FOR THE KNOWLEDGE DISCOVERY VIA DATA ANALYTICS (KDDA) PROCESS
Yan Li

Part of the Management Information Systems Commons, and the Management Sciences and Quantitative Methods Commons

© The Author
Downloaded from https://scholarscompass.vcu.edu/etd/3609

This Dissertation is brought to you for free and open access by the Graduate School at VCU Scholars Compass. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of VCU Scholars Compass.

© Yan Li 2014
All Rights Reserved

NEW ARTIFACTS FOR THE KNOWLEDGE DISCOVERY VIA DATA ANALYTICS (KDDA) PROCESS

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Virginia Commonwealth University.

by
YAN LI
Master of Science in Information Systems, Virginia Commonwealth University, 2009
Bachelor of Science in Chemical Physics, University of Science and Technology of China, 1999

Director: KWEKU-MUATA OSEI-BRYSON
PROFESSOR, INFORMATION SYSTEMS

Virginia Commonwealth University
Richmond, Virginia
December 2014

Acknowledgement

My journey in the VCU Information Systems PhD program is a long, eventful, and fruitful one. During this journey, I have oriented my career in the direction that integrates research, teaching, and practice in the realm of information science. This dissertation is the result of my intellectual curiosity for data analytics and my passion for designing and building things. It is also the result of many experiences I have encountered at VCU from many extraordinary individuals. I am using this opportunity to express my gratitude to these individuals who supported me throughout the PhD process.

First and foremost, I wish to express my deepest gratitude to my dissertation advisor, Dr. Kweku-Muata Osei-Bryson. The inception of this dissertation started six and a half years ago when I took Data Mining, Databases, and Data Warehousing courses from him in the master's program. He has supported me not only by intellectually preparing me with a sound technical background, but also academically and emotionally through the bumpy road to finish this dissertation. He has been a true mentor in shaping who I am today, down to the emails he forwarded to me about academic job openings.

I would like to thank my wonderful dissertation committee: Dr. Jose Dula, Dr. Richard Redmond, Dr. Manoj A. Thomas, and Dr. Heinz Roland Weistroffer. Dr. Dula's input helped me restructure my research statement, which has been well received by others. Dr. Redmond stopped me the day before Christmas in 2008 to ask about my application to the PhD program, which resulted in my staying at VCU. He has given me tremendous help in making many critical decisions. The first time I met Dr. Thomas, I thought he was a random engineering student who was stealing EMBA food. Later, sharing the same passion for emergent technologies and design science, he became my teacher, colleague, and friend. Many of the ideas in this dissertation were the results of our lengthy discussions and brainstorming. I must also thank Dr. Weistroffer for being always supportive as a PhD advisor, and for not letting me quit two years ago.

A very special thanks goes out to Mr. Joe Cipolla, whom I truly cherish as a mentor in both academia and industry. His feedback and input are always invaluable. I would also like to thank Dr. Wynne, Dr. Kasper, and Dr. Andrews, among other VCU School of Business faculty who have helped and taught me immensely. I want to thank my peers in the PhD program, especially Yurita Yakimin Abdul Talib, who was with me through all the ups and downs.

I would also like to thank my family for the support they provided me through the process, and without whose love, encouragement and editing assistance, I would not have finished this dissertation. Finally, I want to dedicate this dissertation to my daughter, Lianna. She arrived in the middle of my PhD program and has never stopped amazing me and reminding me to appreciate everything in my life.

TABLE OF CONTENTS

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ACRONYMS
ABSTRACT

CHAPTER 1 INTRODUCTION
  1.1. BACKGROUND
  1.2. DEFINITIONS
    1.2.1. Knowledge Discovery and Data Mining
    1.2.2. Business Intelligence and Analytics
    1.2.3. Different Types of Users
  1.3. KNOWLEDGE DISCOVERY PROCESS MODELS
  1.4. RESEARCH MOTIVATION
    1.4.1. Lack of Decision Support for Business Understanding Phase
    1.4.2. Need for an Integrated Knowledge Repository
    1.4.3. Lack of Decision Support for Data Quality Verification
    1.4.4. Missing Model Maintenance and Reuse
  1.5. RESEARCH OBJECTIVE AND SCOPE
  1.6. SIGNIFICANCE OF THE RESEARCH
  1.7. OUTLINE

CHAPTER 2 LITERATURE REVIEW
  2.1. KNOWLEDGE DISCOVERY PROCESSES AND PROCESS MODELS
    2.1.1. CRISP-DM
    2.1.2. IKDDM
    2.1.3. Limitations in KDDM Process Models
  2.2. DECISION SUPPORT IN KNOWLEDGE DISCOVERY PROCESS
    2.2.1. Ontology-Based Decision Support for Knowledge Discovery
    2.2.2. Case-based Reasoning Decision Support for Knowledge Discovery
    2.2.3. Workflow Management Approach
  2.3. MODEL MANAGEMENT
  2.4. DATA QUALITY
    2.4.1. DQ Dimensions
    2.4.2. Information Quality and Software Quality
    2.4.3. Data Quality Assessment and Quality Factors
    2.4.4. Data Quality Management Methodology
  2.5. TEXT MINING AS A SPECIAL CASE OF KNOWLEDGE DISCOVERY
  2.6. MULTIPLE CRITERIA DECISION ANALYSIS
    2.6.1. MCDA Methods
    2.6.2. MCDA Software
    2.6.3. MCDA Software Selection

CHAPTER 3 RESEARCH METHODOLOGY
  3.1. THE SCIENCE OF DESIGN