Building a Data Mining Model Using Data Warehouse and OLAP Cubes IST 734 SS Chung
Total Page:16
File Type:pdf, Size:1020Kb
Cleveland State University Building a Data Mining Model using Data Warehouse and OLAP Cubes IST 734 SS Chung 14 Sunnie S Chung IST 734 Build a Data Mining Model using Data Warehouse and OLAP cubes Contents 1. Abstract: ................................................................................................................................................ 3 2. Introduction: .......................................................................................................................................... 3 3. Adventure works database: ................................................................................................................... 4 4. Getting familiar with Sql Server Analysis Services (SSAS) tools and various datamining algorithms 5 4.1. Microsoft Association Algorithm .................................................................................................. 6 4.2. Microsoft Clustering Algorithm .................................................................................................... 9 4.3. Microsoft Time Series Algorithm ................................................................................................ 12 4.4. Microsoft Decision Trees Algorithm .......................................................................................... 16 5. Star schema: ........................................................................................................................................ 19 5.1. Fact Tables and Dimension Tables .................................................................................................. 19 5.2. Steps in Star Schema Design: .......................................................................................................... 19 6. Cube AND MDX: ............................................................................................................................... 19 6.1. Cube: ........................................................................................................................................... 19 6.2. Dimensions: ................................................................................................................................ 20 6.3. Measure & Measure Groups ....................................................................................................... 20 6.4. Steps in Building and Deploying a cube: .................................................................................... 20 7. Quering On Cubes Using MDX-Examples ......................................................................................... 26 7.1. Queries: ....................................................................................................................................... 26 7.2. Screenshots: ................................................................................................................................ 27 8. DataMining and DMX ....................................................................................................................... 32 8.1. Business Scenario1: .................................................................................................................... 32 8.2. Business Scenario2: .................................................................................................................... 36 Sunnie S Chung IST 734 8.3. Business Scenario3: .................................................................................................................... 38 9. Conclusion: .......................................................................................................................................... 41 10. References: ..................................................................................................................................... 41 1. Abstract: The purpose of this project is learn the application of various database and datamining concpets and its application in current business using Adventure works database and various database tools.We begin by designing a star schema and building a DataWarehouse OLAP cube, for Sales Analysis using Sql Server Analysis Services (SSAS) and query the cube using MultiDimensional eXpressions language(MDX) to find the current business trends.Next we create data mining structure using Data Mining Extensions(DMX) and find the hidden pattern and predict things that are helpful in improving the business and its growth. Some of the data mining techniques that will be considering to accomplish the above tasks are Microsoft Association Mining, Cluster analysis, Time series and Naïve- Bayes Algorithm. 2. Introduction: A data warehouse is a centralized repository that stores data from multiple information sources and transforms them into a common, multidimensional data model for efficient querying and analysis.OLAP and Data Mining are two complementary technologies for Business Intelligence.Online Analytical Processing (OLAP) is a technology that is used to organize large business databases and support business intelligence. OLAP is a database technology that has been optimized for querying and reporting, instead of processing transactions OLAP databases are divided into one or more cubes, and each cube is organized and designed by a cube administrator to fit the way that you retrieve and analyze data. OLAP is used for decision-support systems to analyze aggregated information for sales, finance, budget, and many other types of applications. OLAP is about aggregating measures based on dimensionhierarchies and storing these precalculated aggregations in a special data structure. With the help of preaggregations and special indexes, you can query aggregated data and get decision-support query results back in real time OLAP provides us with a very good view of what is happening, but can not predict what will happen in the future or why it is happening.This part is done by datamining.Data Mining is a combination of discovering techniques + prediction techniques. Sunnie S Chung IST 734 The sequence of steps that will be followed in this project is 1. Understand the Adventure works database which will be used in this project(fully understand the transactional data available in the database). 2. Getting familiar with Sql Server Analysis Services (SSAS) tools and various datamining algorithms such as Clustering,Association,Time Series etc in the SSAS tool 3. The next step will be to come up with a list of questions: what questions need to be answered, what metrics will help business managers monitor and grow the business. 4. Based on the business questions that need to be answered, data staging layer in a star schema format will be designed. 5. Then the cube will be built to extract data from the Star schema staging layer and we perform our data mining on the cube. 6. Perform Datamining in the Adventure works database to find hidden patterns and information using DMX and MDX. 3. Adventure works database: In this section we will try to understand the Adventureworks database that will be used as a part of the project.We try to understand scope of the business its various components and products etc. 3.1. Business Overview: Adventure Works Cycles is a large multinational bicycle manufacturer, with headquarters located in Bothell, Washington. The company has approximately 300 employees, 29 of which are sales representatives. The primary distribution channel for Adventure Works Cycles through the retail stores of their resellers. These resellers are located in Australia, Canada, France, Germany, the United Kingdom, and the United States. Adventure Works Cycles also sells to individual customers worldwide by means of the Internet. Adventure Works Cycles has five major product offerings: Bikes – Three primary bike product lines: Mountain, Road, and Touring. Accessories – Examples include helmets and water bottles. Clothing – Examples include jerseys and biking shorts. Components – Examples include bottom brackets and frames. Services – Examples include premium service and standard service. Sunnie S Chung IST 734 The version of Adventure works database that will be used in this project is Adventure works 2012 4. Getting familiar with SQL Server Analysis Services (SSAS) tools and various data mining algorithms In this section we will learn and understand some data mining algorithms and its applications that will be used as a part of project later. Sunnie S Chung IST 734 4.1. Microsoft Association Algorithm The Microsoft Association algorithm is an association algorithm provided by Analysis Services that is useful for recommendation engines. A recommendation engine recommends products to customers based on items they have already bought, or in which they have indicated an interest. The Microsoft Association algorithm is also useful for market basket analysis. Association models are built on datasets that contain identifiers both for individual cases and for the items that the cases contain. A group of items in a case is called an itemset . An association model consists of a series of itemsets and the rules that describe how those items are grouped together within the cases. The rules that the algorithm identifies can be used to predict a customer's likely future purchases, based on the items that already exist in the customer's shopping cart. Sunnie S Chung IST 734 Sunnie S Chung IST 734 Sunnie S Chung IST 734 4.2. Microsoft Clustering Algorithm The Microsoft Clustering algorithm is a segmentation algorithm provided by Analysis Services. The algorithm uses iterative techniques to