Data Mining This Book Is a Part of the Course by Jaipur National University, Jaipur
Data Mining

This book is a part of the course by Jaipur National University, Jaipur. This book contains the course content for Data Mining.

JNU, Jaipur
First Edition 2013

The content in the book is copyright of JNU. All rights reserved. No part of the content may in any form or by any electronic, mechanical, photocopying, recording, or any other means be reproduced, stored in a retrieval system or be broadcast or transmitted without the prior permission of the publisher.

JNU makes reasonable endeavours to ensure content is current and accurate. JNU reserves the right to alter the content whenever the need arises, and to vary it at any time without prior notice.

Index
I. Content
II. List of Figures
III. List of Tables
IV. Abbreviations
V. Case Study
VI. Bibliography
VII. Self Assessment Answers

Book at a Glance

I/JNU OLE

Contents

Chapter I
Data Warehouse – Need, Planning and Architecture
Aim
Objectives
Learning outcome
1.1 Introduction
1.2 Need for Data Warehousing
1.3 Basic Elements of Data Warehousing
1.4 Project Planning and Management
1.5 Architecture and Infrastructure
    1.5.1 Infrastructure
    1.5.2 Metadata
    1.5.3 Metadata Components
Summary
References
Recommended Reading
Self Assessment

Chapter II
Data Design and Data Representation
Aim
Objectives
Learning outcome
2.1 Introduction
2.2 Design Decision
2.3 Use of CASE Tools
2.4 Star Schema
    2.4.1 Review of a Simple STAR Schema
    2.4.2 Star Schema Keys
2.5 Dimensional Modelling
    2.5.1 E-R Modelling versus Dimensional Modelling
2.6 Data Extraction
    2.6.1 Source Identification
    2.6.2 Data Extraction Techniques
    2.6.3 Data in Operational Systems
2.7 Data Transformation
    2.7.1 Major Transformation Types
    2.7.2 Data Integration and Consolidation
    2.7.3 Implementing Transformation
2.8 Data Loading
2.9 Data Quality
2.10 Information Access and Delivery
2.11 Matching Information to Classes of Users OLAP in Data Warehouse
    2.11.1 Information from the Data Warehouse
    2.11.2 Information Potential
Summary
References
Recommended Reading
Self Assessment

Chapter III
Data Mining
Aim
Objectives
Learning outcome
3.1 Introduction
3.2 Crucial Concepts of Data Mining
    3.2.1 Bagging (Voting, Averaging)