Data Profiling Model for Assessing the Quality Traits of Master Data Management
Total Page:16
File Type:pdf, Size:1020Kb
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-8 Issue-6, March 2020 Data Profiling Model for Assessing the Quality Traits of Master Data Management Dilbag Singh, Dupinder Kaur Abstract: Enterprise Resource Planning (ERP) and Business Thus it is crucial to addresses these critical data entities Intelligence (BI) system demand progressive rules for (golden record) also known as master data, which is a single maintaining the valuable information about customers, truth version of record among different copies that belongs products, suppliers and vendors as data captured through to same person having different values. Master data is the different sources may not be of high quality due to human complete record of a human, like the data is about errors, in many cases. The problem encounters when this information is accessible across multiple systems, within same customers, patients, suppliers, partners, employees, and organization. Providing adequacy to this scattered data is a top other critical entities. For this purpose, a high quality and agenda for any organization as maintaining the data is consistent copy of master data becomes a primary task for complicated, as having high quality data. Master Data any organization. The processes and systems which Management (MDM) provides a solution to these problems by maintain this data are known as Master Data Management maintaining “a single reference of truth” with authoritative [5]. source of master data (Customer, products, employees etc). Master Data Management (MDM) is a holistic framework Master Data Management (MDM) is a highlighted concern now consisting of processes, tools and technologies which a day as valid data is the demand for strategic, tactical and provides coordinated master data in the enterprise. When operational steering of every organization. The lane to MDM initiates with the quality of data which demands for discovery of data is collected from multiple, conflicting sources of master data, profiling and analysis. As inadequacy of data may information, MDM initiates by determining, collecting and leads to adverse effects such as wrong decision, loss of time, bad transforming(cleaning and standardizing )the data thus results and unnecessary risk. Thus there is a need to deal with ultimately providing repair to that data [12]. Quality of master data and quality of this specific data in a successful and master data is determined by the extent to which master data efficient manner. For ensuring this purpose, an approach is is satisfied to defined quality traits. Data Quality is defined proposed in this paper. The research focuses on development of as the process of making data fit for use and meeting the a Model for Data Profiling to assess the level of Quality Traits requirements of user [3][4]. Only qualitative data can for Master Data Management. Results are shown by executing provide useful, resilient, functional and applicable results. It the defined steps on TALEND tool over collected dataset. Thus, level of quality traits processes directly correlates with an allows a profound understanding of the research data that is organization’s ability to make the proper decisions and better always up-to-date. The completeness, uniqueness, outcomes. correctness and timeliness of the data are crucial for successful operational processes [5]. Keywords: Data Quality, Data Profiling, Master Data Data profiling is a technique of exploring data from existing Management, Quality Traits. information sources and collecting informative summaries about that data. The intention is to discover data quality as a I. INTRODUCTION component of a large project by determining the accuracy, In the changing dimensions of business, major constituent of completeness, and consistency of data. This initial analysis any organization’s day-to-day business is the data that is ensures at an early stage if the correct data is available for used in functional, operational and progressive activities. use thus anomalies can be handled subsequently. Thus the The quantity and complexity of storing this data becomes objective of this research is to define and categorize the hard with the growth of organizations. If this data is problems related to quality of data and to provide a incorrect, wrong or out of date, the organization may suffer systematic approach for assessing the level of quality traits from delays, bad results or financial losses. As data is in Master Data Management. scattered and stored in different types of locations, it A. Data Quality Traits becomes difficult to provide data consistency, accuracy, uniqueness, completeness etc. There must be proper Data Quality trait is a measure for qualitative or quantitative organization and management with massive growth of data analysis in order to determine the quality of data [4].The [2]. For example if a company creates a service which major issue is that data is initially taken either from single provides communication through XML messages that have a source or multiple sources. The scattered data over different single view of customer data. But if the record of same locations make the data unfit for use as it leads to quality customer is stored in four databases with two different issues such as duplicity of information inconsistency and addresses, it becomes a tedious task to provide service to incompleteness [6]. that customer. Revised Manuscript Received on February 10, 2020. * Correspondence Author Dr. Dilbag Singh: Computer Science and Applications, Chaudhary Devi Lal University, Sirsa (HR), India. Email: [email protected] Dupinder Kaur*: Computer Science and Applications, Chaudhary Devi Lal University, Sirsa (HR), India. Email: [email protected] Published By: Retrieval Number: F7307038620/2020©BEIESP Blue Eyes Intelligence Engineering DOI:10.35940/ijrte.F7307.038620 446 & Sciences Publication Data Profiling Model for Assessing the Quality Traits of Master Data Management Table- I: Data Quality Traits Otmane Azeroual (2017) stated that effective adequacy level of the data quality can be achieved at various stages of any S Quality Definition Example research information system. In this respect, improvement of . Traits N data quality of RIS is targeted. It suggested a quality o workflow procedure and a use case in order to better Assessment of The address of a person has evaluate the data quality in RIS [5]. 1 equivalent data the same value and format as Consistency values across in person’s ID card as that multiple stored within the database. III. RESEARCH METHODOLOGY systems. A research is about establishing facts and reaching new Measurement of Student’s first name and last 2 Completeness presence of name are mandatory; the conclusions by systematic study of a system. Research is non-blank record is incomplete if both conducted by taking some objectives in mind. The desired values. fields are not provided. goals of an organization cannot be achieved, if their Degree to Data entered into different business processes are irregular, inefficient and do not meet 3 which data data formats as DD/MM/YY Accuracy matches with and MM/DD/YY may affect their client needs. Thus the ultimate solution is to adopt “real-life” accuracy of data. MDM strategies to resolve the problem of inconsistent data. objects. Thus the objective of this research is to access the levels of Same data is Student id must be unique. If data quality traits and to provide a model for data profiling 4 Redundancy stored at two students have same Id no, that an organization follows to achieve an effective different fields it will make redundant or tables. information. adequacy level of data. In this study, Quantitative and Domain Defined valid In “AGE” column, only Qualitative research methodologies are applied which 5 Integrity set of rules for integer value is allowed provide insight information to know the extent of data Constraint attributes. rather than character, string consistency, accuracy and redundancy. The research is or time. performed on TOSDQ (TALEND Open Studio for Data Table I describes the necessary traits for adequate data Quality) software. It is an ETL (Extract, Transform and quality in creating master data. It is suggested that these Load) tool that provides a software solution for data traits and standards should be adopted by data quality preparation, data quality, data integration and data professionals as these designed methods are useful for Management. accessing and maintaining the level of data quality [6]. IV. IMPACT OF DATA QUALITY ON MASTER II. LITERATURE REVIEW DATA It presents the review of work done by various researchers If master data is not consistent and accurate, organizations that provides a useful insight for improving data quality and will be less reactive to new systematic compliance which managing the master data management. This work done is ultimately affects the reporting processes. The loss due to very helpful in understanding what has been done in the poor data is mostly in financial terms [10]. Gartner field of analyzing data quality issues. conducted a research on the set of organizations and found Alex Gamero (2019) focused on processes of manual data that the annual cost due to dirty data was on average of cleaning and the problem of information quality in $13.3 million dollars, in 2014 [8]. Thus, in the process of microfinance sector. MDM is proposed as the basis for data source handling, data quality is of utmost importance. interaction between different segments of data governance The quality of data is subjective term which varies as per the and infrastructure levels. The proposed model comprises of perspective of the system that means the data which is of four sub microfinance organization levels including 8 good quality for one system might not be good for another phases. Among these phases only 2 are supportable, 40 used system. The requirement varies as per the needs of for evaluation criteria and 5 levels are kept for maturity[1]. organizations. Inadequate data quality may leads to Fajar Gumelar (2018) gave an advice to the organization to followings problems: [8] upgrade the master data maturity at different levels.