Developing a Data Quality Scorecard That Measures Data Quality in a Data Warehouse
Total Page:16
File Type:pdf, Size:1020Kb
Developing a Data Quality Scorecard that Measures Data Quality in a Data Warehouse A thesis submitted for the degree of Doctor of Philosophy By Aderibigbe Grillo College of Engineering, Design and Physical Sciences Brunel University November 2018 ABSTRACT The main purpose of this thesis is to develop a data quality scorecard (DQS) that aligns the data quality needs of the Data warehouse stakeholder group with selected data quality dimensions. To comprehend the research domain, a general and systematic literature review (SLR) was carried out, after which the research scope was established. Using Design Science Research (DSR) as the methodology to structure the research, three iterations were carried out to achieve the research aim highlighted in this thesis. In the first iteration, as DSR was used as a paradigm, the artefact was build from the results of the general and systematic literature review conduct. A data quality scorecard (DQS) was conceptualised. The result of the SLR and the recommendations for designing an effective scorecard provided the input for the development of the DQS. Using a System Usability Scale (SUS), to validate the usability of the DQS, the results of the first iteration suggest that the DW stakeholders found the DQS useful. The second iteration was conducted to further evaluate the DQS through a run through in the FMCG domain and then conducting a semi-structured interview. The thematic analysis of the semi-structured interviews demonstrated that the stakeholder's participants‘ found the DQS to be transparent; an additional reporting tool; Integrates; easy to use; consistent; and increases confidence in the data. However, the timeliness data dimension was found to be redundant, necessitating a modification to the DQS. The third iteration was conducted with similar steps as the second iteration but with the modified DQS in the oil and gas domain. The results from the third iteration suggest that DQS is a useful tool that is easy to use on a daily basis. The research contributes to theory by demonstrating a novel approach to DQS design This was achieved by ensuring the design of the DQS aligns with the data quality concern areas of the DW stakeholders and the data quality dimensions. Further, this research lay a good foundation for the future by establishing a DQS model that can be used as a base for further development. 2 This thesis is dedicated to the Grillo Family 3 Table of Contents Table of Contents .................................................................................................................................. 4 List of Figures ........................................................................................................................................ 7 List of tables......................................................................................................................................... 10 Acknowledgement ............................................................................................................................... 11 Chapter 1: Introduction ..................................................................................................................... 12 1.1 Overview .............................................................................................................................. 12 1.2 Research background ......................................................................................................... 12 1.3 The Research Problem........................................................................................................ 18 1.4 Research Aim and Objective .............................................................................................. 19 1.5 Research Methodology ....................................................................................................... 20 1.6 Thesis Layout ...................................................................................................................... 23 Chapter 2: Research Design ............................................................................................................... 26 2.1 Introduction ............................................................................................................................... 26 2.2 Design Science Research (DSR) Paradigm ............................................................................. 26 2.3 Research Methods and Techniques ......................................................................................... 36 2.4 Practical Application of DSR in this Research ....................................................................... 41 2.4.1 First DSR Iteration Cycle .................................................................................................. 42 2.4.2 Second DSR Iteration Cycle .............................................................................................. 45 2.4.3 Third DSR Iteration Cycle ................................................................................................ 47 2.5 Summary .................................................................................................................................... 49 Chapter 3: Data warehouse, Data Quality Dimensions and Scorecards Literature ..................... 50 3.1 Introduction ............................................................................................................................... 50 3.2 The Data Warehouse Domain .................................................................................................. 50 3.2.1 Data Quality ....................................................................................................................... 52 3.2.2 Dimensions of Data Quality .............................................................................................. 54 3.2.3 Data Quality Model Foundations ..................................................................................... 58 3.2.4 Data Quality in Data warehouses ..................................................................................... 60 3.2.5 Data Quality Tools ............................................................................................................. 70 3.2.6 State of the art of Data quality .......................................................................................... 72 3.3 Stakeholders and Data Quality Goals ..................................................................................... 78 3.3.1 Stakeholder Data Quality Perception............................................................................... 88 4 3.3.2 Stakeholder Data quality Concern Areas ........................................................................ 91 3.4 Designing an effective Data Quality Scorecard – DQS .......................................................... 95 3.5 Existing Scorecard Design –Systematic Literature Review .................................................. 97 3.5.1 Systematic Literature Review Analysis .......................................................................... 100 3.5.2 Limitations of Existing DQS ........................................................................................... 103 3.6 Summary .................................................................................................................................. 104 Chapter 4: DQS Model Development and Validation – Iteration I .............................................. 105 4.1 Overview .................................................................................................................................. 105 4.2 DQS Model Development ....................................................................................................... 106 4.3 Scorecard Mechanics – Electronic and Web-centric DQS Development ........................... 111 4.3.1 Walkthrough of the Web-centric DQS ........................................................................... 114 4.4 DQS Validation ....................................................................................................................... 116 4.4.1 Data Collection Techniques ............................................................................................. 116 4.4.2 SUS Questionnaire Design ............................................................................................... 117 4.4.3 Participants ....................................................................................................................... 118 4.4.4 Procedure .......................................................................................................................... 119 4.4.5 SUS Evaluation Results ................................................................................................... 119 4.4.6 Analysis of results............................................................................................................. 121 4.5 Summary .................................................................................................................................. 122 Chapter 5: DQS Evaluation-Iteration II ......................................................................................... 124 5.1 Introduction ............................................................................................................................. 124 5.2 About Brewing Company Ltd ................................................................................................ 124 5.2.1 The Data Quality Problem at Brewery Ltd