Introduction to Data Mining

Rafal Lukawiecki, Strategic Consultant, Project Botticelli Ltd
[email protected]

Objectives

• Overview Data Mining
• Introduce typical applications and scenarios
• Explain some DM concepts
• Review wider product platform

This seminar is partly based on “Data Mining” book by ZhaoHui Tang and Jamie MacLennan, and also on Jamie’s presentations. Thank you to Jamie and to Donald Farmer for helping me in preparing this session. Thank you to Roni Karassik for a slide. Thank you to Mike Tsalidis, Olga Londer, and Marin Bezic for all the support. Thank you to Maciej Pilecki for assistance with demos.

The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.

© 2007 Project Botticelli Ltd & Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.

Before We Dive In...

• To help me select the most suitable examples and demonstrations I would like to ask you about your background
• Who do you identify yourself with:
  • IT Professional,
  • Professional,
  • Software/System Developer?

The Essence of Data Mining as Part of

Business Intelligence
Improving Business Insight

“A broad category of applications and technologies for gathering, storing, analyzing, sharing and providing access to data to help enterprise users make better business decisions.” – Gartner

Relationships And Acronyms...

Data Mining (DM)

Knowledge Discovery in Databases (KDD)

Business Intelligence (BI)

Data Mining

• Technologies for analysis of data and discovery of (very) hidden patterns
• Fairly young (<20 years old) but clever algorithms developed through database research
• Uses a combination of statistics, probability analysis and database technologies

What does Data Mining Do?

• Explores Your Data
• Finds Patterns
• Performs Predictions

DM and BI

• BI is geared at an end user, such as a business owner, knowledge worker etc.
• DM is an IT technology generally geared towards a more advanced user – today

• By the way: who is qualified to use DM today?

DM Past and Present

• Traditional approaches from Microsoft’s competitors are for DM experts: “White-coat PhD statisticians”
• DM tools also fairly expensive

• Microsoft’s “full” approach is designed for those with some database skills
  • Tools similar to T-SQL and Management Studio
  • DM built into Microsoft SQL Server 2005 and 2008 at no extra cost
• DM “easy” is geared at any Excel-aware user

DM Enables Predictive Analysis

[Figure: chart plotting Role of Software (Passive to Proactive) against Business Insight (Presentation, Exploration, Discovery). In ascending order: Canned reporting, Ad-hoc reporting, Interactive OLAP, Predictive Analysis (Data mining).]

Application and Scenarios

Value of Predictive Analysis
Typical Applications

Predictive Analysis:
• Seek Profitable Customers
• Understand Customer Needs
• Anticipate Customer Churn
• Predict Sales & Inventory
• Build Effective Marketing Campaigns
• Detect and Prevent Fraud
• Correct Data During ETL

Data Mining Process
CRISP-DM

[Figure: the CRISP-DM cycle around the Data: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment. The diagram is annotated “Doing Data Mining” and “Putting Data Mining to Work”. Source: www.crisp-dm.org]

Customer Profitability

• Typically, you will:
  1. Segment or classify customers in a relevant way
     • Clustering
  2. Find a relationship between profit and customer characteristics
     • Decision Tree
  3. Understand customer preferences
     • Association Rules
  4. Study customer behaviour
     • Sequence Clustering
• And:
  1. Predict profitability of potential new customers
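The segmentation step can be illustrated with a minimal sketch. This is plain k-means on a single attribute, written from scratch; it is not the Microsoft Clustering algorithm, and the `annual_spend` figures and starting centroids are made-up assumptions for illustration only.

```python
from statistics import mean

def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's k-means on one numeric attribute, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: assignments will not change any more
        centroids = new_centroids
    return centroids, clusters

# Hypothetical annual spend per customer.
annual_spend = [120, 150, 130, 900, 950, 1000, 140, 980]
centroids, segments = kmeans_1d(annual_spend, centroids=[100, 1000])
print(centroids)  # one centre per segment
print(segments)   # the customers' spend values grouped by segment
```

With real customer data you would cluster on several attributes at once and let the tool choose the number of segments; the loop above only shows the assign-then-update idea.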

Predict Sales and Inventory

• You may:
  1. Structure the sales or inventory data as a time series
     • Perhaps from a
  2. Forecast future sales and needs
     • Time Series or Decision Trees with Regression
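As a rough illustration of forecasting from a time series, the sketch below fits a straight-line trend by ordinary least squares and extends it forward. This is a deliberately simple stand-in, not the Microsoft Time Series algorithm; the `monthly_units` series is invented for the example.

```python
def fit_linear_trend(series):
    """Ordinary least squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope_num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope_den = sum((t - t_mean) ** 2 for t in range(n))
    slope = slope_num / slope_den
    intercept = y_mean - slope * t_mean
    return intercept, slope

def forecast(series, steps):
    """Extend the fitted trend line by the requested number of future periods."""
    intercept, slope = fit_linear_trend(series)
    n = len(series)
    return [intercept + slope * (n + k) for k in range(steps)]

# Hypothetical units sold per month, oldest first.
monthly_units = [100, 110, 120, 130, 140, 150]
next_quarter = forecast(monthly_units, steps=3)
print(next_quarter)
```

A trend line ignores seasonality and recent shocks, which is exactly what dedicated time series algorithms are designed to capture.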

Build Effective Marketing Campaigns

• You would:
  1. Segment your existing customers
     • Clustering and Decision Trees
  2. Study what makes them respond to your campaigns
     • Decision Tree, Naive Bayes, Clustering, Neural Network
  3. Experiment with a campaign by focusing it
     • Lift Charts
  4. Run the campaign
     • Predict recipients
  5. Review your strategy as you get response
     • Update your models
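The numbers behind a lift chart can be sketched as follows. Customers are ranked by a model's predicted score, and for each cumulative slice of the ranking we compare its response rate with the overall rate. The scores and responses are invented, and the function name `cumulative_lift` is only illustrative; it is not part of any product API.

```python
def cumulative_lift(scores, responded, buckets=5):
    """Cumulative lift per bucket after ranking customers by descending score."""
    ranked = [flag for _, flag in sorted(zip(scores, responded),
                                         key=lambda pair: pair[0], reverse=True)]
    total = len(ranked)
    overall_rate = sum(ranked) / total  # assumes at least one responder
    lifts = []
    for b in range(1, buckets + 1):
        cutoff = max(1, round(total * b / buckets))
        targeted = ranked[:cutoff]
        # Lift = response rate among the targeted slice / overall response rate.
        lifts.append((sum(targeted) / len(targeted)) / overall_rate)
    return lifts

# Hypothetical model scores and actual campaign responses (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
lifts = cumulative_lift(scores, responded)
print(lifts)
```

A lift well above 1 in the first buckets suggests that mailing only the top-scored customers would reach most responders at a fraction of the cost; the last bucket always has a lift of 1 because it covers everyone.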

Detect and Prevent Fraud

• You could:
  1. Build a risk model for existing customers or transactions
     • Decision Trees, Clustering, Neural Networks, and often Logistic Regression
  2. Assess risk of a new transaction
     • Predict risk and its probability using the model
• Or
  1. Model transaction sequences
     • Sequence Clustering
  2. Find unusual ones (outliers)
     • Mine the mining model – neural networks, trees, clustering
  3. Assess new events as they happen
     • Predicting by means of the metamodel
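To give a feel for what "finding unusual ones" means, here is a minimal sketch that flags outlying transaction amounts with a modified z-score based on the median absolute deviation. This is a simple statistical stand-in rather than the model-mining approach listed above; the 0.6745 constant and the 3.5 threshold are a commonly used rule of thumb, and the amounts are invented.

```python
from statistics import median

def find_outliers(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold."""
    centre = median(amounts)
    deviations = [abs(a - centre) for a in amounts]
    mad = median(deviations)  # median absolute deviation: a robust measure of spread
    if mad == 0:
        return []  # no spread to compare against, so nothing is flagged
    return [a for a in amounts if 0.6745 * abs(a - centre) / mad > threshold]

# Hypothetical transaction amounts for one customer.
amounts = [20, 22, 19, 21, 23, 20, 500]
suspicious = find_outliers(amounts)
print(suspicious)
```

A real fraud model would look at many attributes and at sequences of events, not a single amount, but the principle of scoring how far an event sits from normal behaviour is the same.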

New Opportunity: Intelligent Applications

• Examples of Intelligent Applications:
  • Input Validation, based on previously accepted data, not on fixed rules
  • Business Process Validation – early detection of failure
  • Adaptive User Interface based on past behaviour
• Also known as Predictive Programming

• Learn more by downloading “Build More Intelligent Applications using Data Mining” from www.microsoft.com/technetspotlight
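Input validation based on previously accepted data can be sketched very simply: instead of a fixed list of allowed values, the application checks how common a value was among earlier accepted entries. The class name `LearnedValidator`, the `min_share` parameter and the sample history are all hypothetical; a real intelligent application would query a mining model instead of counting values.

```python
from collections import Counter

class LearnedValidator:
    """Flags inputs that were rare or unseen among previously accepted values."""

    def __init__(self, accepted_values, min_share=0.02):
        self.counts = Counter(accepted_values)
        self.total = len(accepted_values)
        self.min_share = min_share  # assumed cut-off; tune it for your data

    def share(self, value):
        """Fraction of previously accepted entries that had this value."""
        return self.counts[value] / self.total if self.total else 0.0

    def is_plausible(self, value):
        return self.share(value) >= self.min_share

# Hypothetical country codes accepted by a form in the past.
history = ["IE"] * 60 + ["UK"] * 35 + ["PL"] * 5
validator = LearnedValidator(history)
print(validator.is_plausible("PL"))  # seen before, often enough
print(validator.is_plausible("XX"))  # never seen: ask the user to confirm
```

The validator should prompt for confirmation rather than reject outright, since a rare value is unusual, not necessarily wrong.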

Data Mining Products

Microsoft DM Competitors
All trademarks respectfully implicitly acknowledged

• SAS, largest market share of DM, specialised product for traditional experts
• SPSS (Clementine), strength in statistical analysis
• IBM (Intelligent Miner) tied to DB2, interoperates with Microsoft through PMML
• Oracle (10g), supports Java APIs
• Angoss (KnowledgeSTUDIO), result visualisation, works with SQL Server
• KXEN, supports OLAP and Excel
• CRM space: Unica, ThinkAnalytics, Portrait, Epiphany, Fair Isaac

SQL Server
We Need More Than Just Database Engine

Integrate
• Data acquisition and integration from multiple sources
• … and synthesis using Data Mining

Analyze
• Knowledge and pattern detection through Data Mining
• Data enrichment with logic rules and hierarchical views

Report
• Data presentation and distribution
• Publishing of Data Mining results

DM Technologies in SQL Server 2005

• Strong, patented algorithms from Microsoft Research labs
• Interoperability
  • PMML (Predictive Model Markup Language) for SAS, SPSS, IBM and Oracle
• Multiple tools:
  • Business Intelligence Development Studio (BIDS)
  • for Excel (and more)
• DMX and OLE DB for Data Mining
• XML for Analysis (XMLA)

What is New in SQL Server 2008?
Data Mining Enhancements

• Enhanced Mining Structures
  • Easier to prepare and test your models
  • Models allow for cross-validation
  • Filtering
• Algorithm Updates
  • Improved Time Series algorithm combining best of ARIMA and ARTXP
  • “What-If” analysis
• Microsoft Data Mining Framework
  • Supplements CRISP-DM
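For readers new to cross-validation, the general idea can be sketched as below: the rows are split into k folds, and each fold takes a turn as the test set while the model is trained on the rest. This shows the generic technique only, not how SQL Server 2008 implements it; the function name `k_fold_indices` is an illustrative assumption.

```python
def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists; every row lands in exactly one test fold."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, test in enumerate(folds):
        # Training rows are all rows from the other folds.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(10, 3))
for train, test in splits:
    print("train on", train, "test on", test)
```

Averaging a model's accuracy over the k test folds gives a steadier estimate than a single train/test split, at the cost of training the model k times.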

DM Add-Ins for Microsoft Office 2007

Define Data

Identify Task

Get Results

Demo
1. Using Data Mining Add-in Table Tools for Microsoft Excel 2007

Server Mining Architecture

[Figure: client tools (BIDS, Excel, Visio, SSMS, SSRS, your app) deploy to and connect with the Analysis Services Server through OLE DB/ADOMD/XMLA/AMO. The server hosts the Mining Model, which uses a Data Mining Algorithm and a Data Source.]

Conclusions

ABS-CBN Interactive (ABSi)
Subsidiary of the largest integrated media and entertainment company in the Philippines
Wireless Services Firm Doubles Response Rates with SQL Server 2005 Data Mining

Challenge
• Selling custom ring tones and other downloadable content for mobile phone users requires staying in tune with the market.
• Searching transactional data for hints on what to offer users in cross-selling value-added mobile services took days and didn’t provide customer-specific recommendations.

Solution
• ABSi deployed Microsoft® SQL Server™ 2005 to use its data mining feature to determine product recommendations.

Benefit
• More accurate and personalized service recommendations to customers
• Doubling response rates from marketing campaigns
• Ad hoc reporting in minutes, not days
• Eight times faster data mining process
• Faster data mining prediction

“Our management is very impressed that we could double our response rate through our SQL Server 2005 data mining … managers of other services ask us to provide the same magic for them—which is what we will do with the full project rollout” - Grace Cunanan, Technical Specialist, ABS-CBN Interactive

Clalit Health Services
Data Mining Helps Clalit Preserve Health and Save Lives
Provides health care for 3.7 million insured members, representing about 60 percent of Israel’s population

Challenge
• Identify which members would most benefit from proactive intervention to prevent health deterioration

Solution
• Use sociodemographic and medical records to generate a predictive score, identifying elder members with highest risk for health deterioration
• Once identified, physicians can try to involve these patients in proactive treatment plans to prevent health deterioration

Benefit
• A chance to preserve life and enhance life quality
• Reduced health care costs
• Tightly integrated solution

“Providing physicians with a list of patients that the data mining model predicts are at risk of health deterioration over the next year, gives them the opportunity to intervene, and prevent what has been predicted.” - Mazal Tuchler, Data Warehouse Manager, Clalit Health Services

More Data Mining Customers

.8 TB SS2005 DW for Ring-Tone Marketing
Uses Relational, OLAP and Data Mining

3 TB end-to-end BI
Oracle competitive win

End-to-end DW on SQL Server, including OLAP
Extensive use of Data Mining Decision Trees

1.2 TB, 20 billion records
Large Brazilian Grocery Chain

.8 TB DW at main TV network in Italy
Increased viewership by understanding trends

.5 TB DW at US Cable company
End-to-end BI, Analysis and Reporting

Summary

• Data Mining is a powerful technology still undiscovered by many IT and database professionals
• Turns data into intelligence
• SQL Server 2005 and 2008 Analysis Services have been created with you in mind

• Let’s mine for valuable gems of knowledge in our databases!

© 2008 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.
