Oracle® Data Mining Concepts

19c

E97867-03

November 2020

Copyright © 2005, 2020, Oracle and/or its affiliates.

Primary Author: Sarika Surampudi

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial computer software documentation" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, reproduction, duplication, release, display, disclosure, modification, preparation of derivative works, and/or adaptation of i) Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs), ii) Oracle computer documentation and/or iii) other Oracle data, is subject to the rights and limitations specified in the license contained in the applicable contract. The terms governing the U.S. Government’s use of Oracle cloud services are defined by the applicable contract for such services. No other rights are granted to the U.S. Government.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Inside are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Epyc, and the AMD logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.

Contents

Preface Audience xiii Documentation Accessibility xiii Related Documentation xiii Conventions xiv

Changes in This Release for Oracle Data Mining Concepts Guide Changes in Oracle Data Mining 19c xvi

Part I Introductions

1 What Is Data Mining? 1.1 What Is Data Mining? 1-1 1.1.1 Automatic Discovery 1-1 1.1.2 Prediction 1-2 1.1.3 Grouping 1-2 1.1.4 Actionable Information 1-2 1.1.5 Data Mining and Statistics 1-2 1.1.6 Data Mining and OLAP 1-3 1.1.7 Data Mining and Data Warehousing 1-3 1.2 What Can Data Mining Do and Not Do? 1-3 1.2.1 Asking the Right Questions 1-4 1.2.2 Understanding Your Data 1-4 1.3 The Data Mining Process 1-4 1.3.1 Problem Definition 1-5 1.3.2 Data Gathering, Preparation, and Feature Engineering 1-5 1.3.3 Model Building and Evaluation 1-6 1.3.4 Knowledge Deployment 1-6

2 Introduction to Oracle Data Mining 2.1 About Oracle Data Mining 2-1 2.2 Data Mining in the Database Kernel 2-1 2.3 Data Mining in Oracle Exadata 2-2 2.4 About Partitioned Model 2-3 2.5 Interfaces to Oracle Data Mining 2-3 2.5.1 PL/SQL API 2-3 2.5.2 SQL Functions 2-4 2.5.3 Oracle Data Miner 2-5 2.5.4 Predictive Analytics 2-5 2.6 Overview of Database Analytics 2-6

3 Oracle Data Mining Basics 3.1 Mining Functions 3-1 3.1.1 Supervised Data Mining 3-1 3.1.1.1 Supervised Learning: Testing 3-2 3.1.1.2 Supervised Learning: Scoring 3-2 3.1.2 Unsupervised Data Mining 3-2 3.1.2.1 Unsupervised Learning: Scoring 3-3 3.2 Algorithms 3-3 3.2.1 Oracle Data Mining Supervised Algorithms 3-4 3.2.2 Oracle Data Mining Unsupervised Algorithms 3-5 3.3 Data Preparation 3-6 3.3.1 Oracle Data Mining Simplifies Data Preparation 3-6 3.3.2 Case Data 3-7 3.3.2.1 Nested Data 3-7 3.3.3 Text Data 3-7 3.4 In-Database Scoring 3-8 3.4.1 Parallel Execution and Ease of Administration 3-8 3.4.2 SQL Functions for Model Apply and Dynamic Scoring 3-8

Part II Mining Functions

4 Regression 4.1 About Regression 4-1 4.1.1 How Does Regression Work? 4-1 4.1.1.1 Linear Regression 4-2 4.1.1.2 Multivariate Linear Regression 4-3

4.1.1.3 Regression Coefficients 4-3 4.1.1.4 Nonlinear Regression 4-3 4.1.1.5 Multivariate Nonlinear Regression 4-4 4.1.1.6 Confidence Bounds 4-4 4.2 Testing a Regression Model 4-4 4.2.1 Regression Statistics 4-4 4.2.1.1 Root Mean Squared Error 4-4 4.2.1.2 Mean Absolute Error 4-5 4.3 Regression Algorithms 4-5

5 Classification 5.1 About Classification 5-1 5.2 Testing a Classification Model 5-2 5.2.1 Confusion Matrix 5-2 5.2.2 Lift 5-3 5.2.2.1 Lift Statistics 5-3 5.2.3 Receiver Operating Characteristic (ROC) 5-4 5.2.3.1 The ROC Curve 5-4 5.2.3.2 Area Under the Curve 5-5 5.2.3.3 ROC and Model Bias 5-5 5.2.3.4 ROC Statistics 5-5 5.3 Biasing a Classification Model 5-6 5.3.1 Costs 5-6 5.3.1.1 Costs Versus Accuracy 5-6 5.3.1.2 Positive and Negative Classes 5-6 5.3.1.3 Assigning Costs and Benefits 5-7 5.3.2 Priors and Class Weights 5-8 5.4 Classification Algorithms 5-8

6 Anomaly Detection 6.1 About Anomaly Detection 6-1 6.1.1 One-Class Classification 6-1 6.1.2 Anomaly Detection for Single-Class Data 6-2 6.1.3 Anomaly Detection for Finding Outliers 6-2 6.2 Anomaly Detection Algorithm 6-3

7 Clustering 7.1 About Clustering 7-1 7.1.1 How are Clusters Computed? 7-1

7.1.2 Scoring New Data 7-2 7.1.3 Hierarchical Clustering 7-2 7.1.3.1 Rules 7-2 7.1.3.2 Support and Confidence 7-2 7.2 Evaluating a Clustering Model 7-2 7.3 Clustering Algorithms 7-2

8 Association 8.1 About Association 8-1 8.1.1 Association Rules 8-1 8.1.2 Market-Basket Analysis 8-1 8.1.3 Association Rules and eCommerce 8-2 8.2 Transactional Data 8-2 8.3 Association Algorithm 8-3

9 Feature Selection and Extraction 9.1 Finding the Best Attributes 9-1 9.2 About Feature Selection and Attribute Importance 9-2 9.2.1 Attribute Importance and Scoring 9-2 9.3 About Feature Extraction 9-2 9.3.1 Feature Extraction and Scoring 9-3 9.4 Algorithms for Attribute Importance and Feature Extraction 9-3

10 Time Series 10.1 About Time Series 10-1 10.2 Choosing a Time Series Model 10-1 10.3 Time Series Statistics 10-2 10.3.1 Conditional Log-Likelihood 10-2 10.3.2 Mean Square Error (MSE) and Other Error Measures 10-3 10.3.3 Irregular Time Series 10-4 10.3.4 Build Apply 10-4 10.4 Time Series Algorithm 10-4

Part III Algorithms

11 Apriori 11.1 About Apriori 11-1 11.2 Association Rules and Frequent Itemsets 11-2 11.2.1 Antecedent and Consequent 11-2 11.2.2 Confidence 11-2 11.3 Data Preparation for Apriori 11-2 11.3.1 Native Transactional Data and Star Schemas 11-2 11.3.2 Items and Collections 11-2 11.3.3 Sparse Data 11-3 11.3.4 Improved Sampling 11-3 11.3.4.1 Sampling Implementation 11-4 11.4 Calculating Association Rules 11-4 11.4.1 Itemsets 11-4 11.4.2 Frequent Itemsets 11-5 11.4.3 Example: Calculating Rules from Frequent Itemsets 11-6 11.4.4 Aggregates 11-7 11.4.5 Example: Calculating Aggregates 11-8 11.4.6 Including and Excluding Rules 11-8 11.4.7 Performance Impact for Aggregates 11-9 11.5 Evaluating Association Rules 11-9 11.5.1 Support 11-9 11.5.2 Minimum Support Count 11-9 11.5.3 Confidence 11-10 11.5.4 Reverse Confidence 11-10 11.5.5 Lift 11-10

12 CUR Matrix Decomposition 12.1 About CUR Matrix Decomposition 12-1 12.2 Singular Vectors 12-1 12.3 Statistical Leverage Score 12-2 12.4 Column (Attribute) Selection and Row Selection 12-2 12.5 CUR Matrix Decomposition Algorithm Configuration 12-3

13 Decision Tree 13.1 About Decision Tree 13-1 13.1.1 Decision Tree Rules 13-1 13.1.1.1 Confidence and Support 13-2 13.1.2 Advantages of Decision Trees 13-3 13.1.3 XML for Decision Tree Models 13-3

13.2 Growing a Decision Tree 13-3 13.2.1 Splitting 13-4 13.2.2 Cost Matrix 13-5 13.2.3 Preventing Over-Fitting 13-5 13.3 Tuning the Decision Tree Algorithm 13-5 13.4 Data Preparation for Decision Tree 13-6

14 Expectation Maximization 14.1 About Expectation Maximization 14-1 14.1.1 Expectation Step and Maximization Step 14-1 14.1.2 Probability Density Estimation 14-1 14.2 Algorithm Enhancements 14-2 14.2.1 Scalability 14-2 14.2.2 High Dimensionality 14-3 14.2.3 Number of Components 14-3 14.2.4 Parameter Initialization 14-3 14.2.5 From Components to Clusters 14-3 14.3 Configuring the Algorithm 14-4 14.4 Data Preparation for Expectation Maximization 14-4

15 Explicit Semantic Analysis 15.1 About Explicit Semantic Analysis 15-1 15.1.1 Scoring with ESA 15-2 15.1.2 Scoring Large ESA Models 15-2 15.2 ESA for Text Mining 15-2 15.3 Data Preparation for ESA 15-3 15.4 Terminologies in Explicit Semantic Analysis 15-3

16 Exponential Smoothing 16.1 About Exponential Smoothing 16-1 16.1.1 Exponential Smoothing Models 16-1 16.1.2 Simple Exponential Smoothing 16-2 16.1.3 Models with Trend but No Seasonality 16-2 16.1.4 Models with Seasonality but No Trend 16-2 16.1.5 Models with Trend and Seasonality 16-3 16.1.6 Prediction Intervals 16-3 16.2 Data Preparation for Exponential Smoothing Models 16-3 16.2.1 Input Data 16-4 16.2.2 Accumulation 16-4

16.2.3 Missing Value 16-4 16.2.4 Prediction 16-5 16.2.5 Parallelism by Partition 16-5

17 Generalized Linear Models 17.1 About Generalized Linear Models 17-1 17.2 GLM in Oracle Data Mining 17-2 17.2.1 Interpretability and Transparency 17-2 17.2.2 Wide Data 17-2 17.2.3 Confidence Bounds 17-2 17.2.4 Ridge Regression 17-3 17.2.4.1 Configuring Ridge Regression 17-3 17.2.4.2 Ridge and Confidence Bounds 17-4 17.2.4.3 Ridge and Data Preparation 17-4 17.3 Scalable Feature Selection 17-4 17.3.1 Feature Selection 17-4 17.3.1.1 Configuring Feature Selection 17-4 17.3.1.2 Feature Selection and Ridge Regression 17-5 17.3.2 Feature Generation 17-5 17.3.2.1 Configuring Feature Generation 17-5 17.4 Tuning and Diagnostics for GLM 17-5 17.4.1 Build Settings 17-5 17.4.2 Diagnostics 17-6 17.4.2.1 Coefficient Statistics 17-6 17.4.2.2 Global Model Statistics 17-6 17.4.2.3 Row Diagnostics 17-7 17.5 GLM Solvers 17-7 17.6 Data Preparation for GLM 17-7 17.6.1 Data Preparation for Linear Regression 17-8 17.6.2 Data Preparation for Logistic Regression 17-8 17.6.3 Missing Values 17-9 17.7 Linear Regression 17-9 17.7.1 Coefficient Statistics for Linear Regression 17-10 17.7.2 Global Model Statistics for Linear Regression 17-10 17.7.3 Row Diagnostics for Linear Regression 17-11 17.8 Logistic Regression 17-11 17.8.1 Reference Class 17-11 17.8.2 Class Weights 17-11 17.8.3 Coefficient Statistics for Logistic Regression 17-11 17.8.4 Global Model Statistics for Logistic Regression 17-12

17.8.5 Row Diagnostics for Logistic Regression 17-12

18 k-Means 18.1 About k-Means 18-1 18.1.1 Oracle Data Mining Enhanced k-Means 18-1 18.1.2 Centroid 18-1 18.2 k-Means Algorithm Configuration 18-2 18.3 Data Preparation for k-Means 18-2

19 Minimum Description Length 19.1 About MDL 19-1 19.1.1 Compression and Entropy 19-1 19.1.1.1 Values of a Random Variable: Statistical Distribution 19-2 19.1.1.2 Values of a Random Variable: Significant Predictors 19-2 19.1.1.3 Total Entropy 19-2 19.1.2 Model Size 19-2 19.1.3 Model Selection 19-2 19.1.4 The MDL Metric 19-3 19.2 Data Preparation for MDL 19-3

20 Naive Bayes 20.1 About Naive Bayes 20-1 20.1.1 Advantages of Naive Bayes 20-3 20.2 Tuning a Naive Bayes Model 20-3 20.3 Data Preparation for Naive Bayes 20-3

21 Neural Network 21.1 About Neural Network 21-1 21.1.1 Neuron and activation function 21-1 21.1.2 Loss or Cost function 21-2 21.1.3 Forward-Backward Propagation 21-2 21.1.4 Optimization Solver 21-2 21.1.5 Regularization 21-2 21.1.6 Convergence Check 21-3 21.1.7 LBFGS_SCALE_HESSIAN 21-3 21.1.8 NNET_HELDASIDE_MAX_FAIL 21-3 21.2 Data Preparation for Neural Network 21-3 21.3 Neural Network Algorithm Configuration 21-4

21.4 Scoring with Neural Network 21-5

22 Non-Negative Matrix Factorization 22.1 About NMF 22-1 22.1.1 Matrix Factorization 22-1 22.1.2 Scoring with NMF 22-2 22.1.3 Text Mining with NMF 22-2 22.2 Tuning the NMF Algorithm 22-2 22.3 Data Preparation for NMF 22-3

23 O-Cluster 23.1 About O-Cluster 23-1 23.1.1 Partitioning Strategy 23-1 23.1.1.1 Partitioning Numerical Attributes 23-2 23.1.1.2 Partitioning Categorical Attributes 23-2 23.1.2 Active Sampling 23-2 23.1.3 Process Flow 23-2 23.1.4 Scoring 23-3 23.2 Tuning the O-Cluster Algorithm 23-3 23.3 Data Preparation for O-Cluster 23-3 23.3.1 User-Specified Data Preparation for O-Cluster 23-4

24 R Extensibility 24.1 Oracle Data Mining with R Extensibility 24-1 24.2 Scoring with R 24-2 24.3 About Algorithm Meta Data Registration 24-2

25 Random Forest 25.1 About Random Forest 25-1 25.2 Building a Random Forest 25-1 25.3 Scoring with Random Forest 25-2

26 Singular Value Decomposition 26.1 About Singular Value Decomposition 26-1 26.1.1 Matrix Manipulation 26-1 26.1.2 Low Rank Decomposition 26-2 26.1.3 Scalability 26-2

26.2 Configuring the Algorithm 26-3 26.2.1 Model Size 26-3 26.2.2 Performance 26-3 26.2.3 PCA scoring 26-3 26.3 Data Preparation for SVD 26-4

27 Support Vector Machines 27.1 About Support Vector Machines 27-1 27.1.1 Advantages of SVM 27-1 27.1.2 Advantages of SVM in Oracle Data Mining 27-2 27.1.2.1 Usability 27-2 27.1.2.2 Scalability 27-2 27.1.3 Kernel-Based Learning 27-2 27.2 Tuning an SVM Model 27-3 27.3 Data Preparation for SVM 27-3 27.3.1 Normalization 27-4 27.3.2 SVM and Automatic Data Preparation 27-4 27.4 SVM Classification 27-4 27.4.1 Class Weights 27-4 27.5 One-Class SVM 27-5 27.6 SVM Regression 27-5

Glossary

Index

Preface

This manual describes Oracle Data Mining, a comprehensive, state-of-the-art data mining capability within Oracle Database. This manual presents the concepts that underlie the procedural information that is presented in Oracle Data Mining User’s Guide.

The preface contains these topics:
• Audience
• Documentation Accessibility
• Related Documentation
• Conventions

Audience

Oracle Data Mining Concepts is intended for anyone who wants to learn about Oracle Data Mining.

Documentation Accessibility

For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.

Access to Oracle Support

Oracle customers that have purchased support have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.

Related Documentation

The related documentation for the Oracle Advanced Analytics (OAA) product includes the following manuals and web page.

Oracle Data Mining, a component of Oracle Advanced Analytics, is documented on the Data Warehousing page of the Oracle Database online documentation library. The following manuals document Oracle Data Mining:
• Oracle Data Mining Concepts (this manual)
• Oracle Data Mining User’s Guide


• Oracle Data Mining API Reference

Note:

The Oracle Data Mining API Reference combines key passages from the Data Mining manuals with related reference documentation in Oracle Database PL/SQL Packages and Types Reference, and Oracle Database Reference.

• Oracle Database PL/SQL Packages and Types Reference
• Oracle Database Reference
• Oracle Database SQL Language Reference

Related Topics
• Oracle Database Data Warehousing

Oracle Data Mining Resources on the Oracle Technology Network

Resources for Oracle Data Mining. The Oracle Data Mining page on the Oracle Technology Network (OTN) provides a wealth of information, including white papers, demonstrations, blogs, discussion forums, and Oracle By Example tutorials.

Oracle Data Miner, the graphical user interface to Oracle Data Mining, is an extension to Oracle SQL Developer. Instructions for downloading SQL Developer and installing the Data Miner repository are available on the Oracle Technology Network.

Related Topics
• Oracle Data Mining
• Oracle Data Miner

Application Development and Database Administration Documentation

Refer to the documentation to assist you in developing database applications and in administering Oracle Database.
• Oracle Database Concepts
• Oracle Database Administrator’s Guide
• Oracle Database Development Guide
• Oracle Database Performance Tuning Guide
• Oracle Database VLDB and Partitioning Guide

Conventions

The following text conventions are used in this document:

xiv Preface

• boldface: Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.
• italic: Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.
• monospace: Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.


Changes in This Release for Oracle Data Mining Concepts Guide

Changes in this release for Oracle Data Mining Concepts.

Changes in Oracle Data Mining 19c

The following changes are documented in Oracle Data Mining Concepts for 19c.

Deprecated Features

The following features are deprecated in this release, and may be desupported in a future release. See Oracle Database Upgrade Guide for a complete list of deprecated features in this release.
• The *GET_MODEL_DETAILS functions are deprecated and are replaced with Model Detail Views. See Oracle Data Mining User’s Guide.

Desupported Features

See Oracle Database Upgrade Guide for a complete list of desupported features in this release.

Part I Introductions

Introduces Oracle Data Mining. Provides a high-level overview for those who are new to data mining technology.
• What Is Data Mining?
• Introduction to Oracle Data Mining
• Oracle Data Mining Basics

1 What Is Data Mining?

Orientation to data mining technology.
• What Is Data Mining?
• What Can Data Mining Do and Not Do?
• The Data Mining Process

Note:

Information about data mining is widely available. No matter what your level of expertise, you can find helpful books and articles on data mining.

Related Topics
• http://en.wikipedia.org/wiki/Data_mining

1.1 What Is Data Mining?

Learn about Data Mining.

Data mining is a technique that discovers previously unknown relationships in data. Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. Data mining uses sophisticated mathematical algorithms to segment the data and to predict the likelihood of future events based on past events. Data mining is also known as Knowledge Discovery in Data (KDD).

The key properties of data mining are:
• Automatic discovery of patterns
• Prediction of likely outcomes
• Creation of actionable information
• Focus on large data sets and databases

Data mining can answer questions that cannot be addressed through simple query and reporting techniques.

1.1.1 Automatic Discovery

Data mining is performed by a model that uses an algorithm to act on a set of data. Data mining models can be used to mine the data on which they are built, but most types of models are generalizable to new data. The process of applying a model to new data is known as scoring.


1.1.2 Prediction

Many forms of data mining are predictive. For example, a model can predict income based on education and other demographic factors. Predictions have an associated probability (How likely is this prediction to be true?). Prediction probabilities are also known as confidence (How confident can I be of this prediction?).

Some forms of predictive data mining generate rules, which are conditions that imply a given outcome. For example, a rule can specify that a person who has a bachelor's degree and lives in a certain neighborhood is likely to have an income greater than the regional average. Rules have an associated support (What percentage of the population satisfies the rule?).

1.1.3 Grouping

Other forms of data mining identify natural groupings in the data. For example, a model might identify the segment of the population that has an income within a specified range, that has a good driving record, and that leases a new car on a yearly basis.

1.1.4 Actionable Information

Data mining can derive actionable information from large volumes of data. For example, a town planner might use a model that predicts income based on demographics to develop a plan for low-income housing. A car leasing agency might use a model that identifies customer segments to design a promotion targeting high-value customers.

1.1.5 Data Mining and Statistics

There is a great deal of overlap between data mining and statistics. In fact most of the techniques used in data mining can be placed in a statistical framework. However, data mining techniques are not the same as traditional statistical techniques. Statistical models usually make strong assumptions about the data and, based on those assumptions, they make strong statements about the results. However, if the assumptions are flawed, the validity of the model becomes questionable. By contrast, the methods used in data mining typically make weak assumptions about the data. As a result, data mining cannot generally make such strong statements about the results. Yet data mining can produce very good results regardless of the data. Traditional statistical methods, in general, require a great deal of user interaction in order to validate the correctness of a model. As a result, statistical methods can be difficult to automate. Statistical methods rely on testing hypotheses or finding correlations based on smaller, representative samples of a larger population. Less user interaction and less knowledge of the data is required for data mining. The user does not need to massage the data to guarantee that a method is valid for a given data set. Data mining techniques are easier to automate than traditional statistical techniques.


1.1.6 Data Mining and OLAP

On-Line Analytical Processing (OLAP) can be defined as fast analysis of multidimensional data. OLAP and data mining are different but complementary activities. OLAP supports activities such as data summarization, cost allocation, time series analysis, and what-if analysis. However, most OLAP systems do not have inductive inference capabilities beyond the support for time-series forecast. Inductive inference, the process of reaching a general conclusion from specific examples, is a characteristic of data mining. Inductive inference is also known as computational learning.

OLAP systems provide a multidimensional view of the data, including full support for hierarchies. This view of the data is a natural way to analyze businesses and organizations.

Data mining and OLAP can be integrated in a number of ways. OLAP can be used to analyze data mining results at different levels of granularity. Data mining can help you construct more interesting and useful cubes. For example, the results of predictive data mining can be added as custom measures to a cube. Such measures can provide information such as "likely to default" or "likely to buy" for each customer. OLAP processing can then aggregate and summarize the probabilities.

1.1.7 Data Mining and Data Warehousing

Data can be mined whether it is stored in flat files, spreadsheets, database tables, or some other storage format. The important criterion for the data is not the storage format, but its applicability to the problem to be solved.

Proper data cleansing and preparation are very important for data mining, and a data warehouse can facilitate these activities. However, a data warehouse is of no use if it does not contain the data you need to solve your problem.

1.2 What Can Data Mining Do and Not Do?

Data mining is a powerful tool that can help you find patterns and relationships within your data. But data mining does not work by itself. It does not eliminate the need to know your business, to understand your data, or to understand analytical methods. Data mining discovers hidden information in your data, but it cannot tell you the value of the information to your organization. You might already be aware of important patterns as a result of working with your data over time. Data mining can confirm or qualify such empirical observations in addition to finding new patterns that are not immediately discernible through simple observation. It is important to remember that the predictive relationships discovered through data mining are not causal relationships. For example, data mining might determine that males with incomes between $50,000 and $65,000 who subscribe to certain magazines are likely to buy a given product. You can use this information to help you develop a marketing strategy. However, you must not assume that the population identified through data mining buys the product because they belong to this population.


Data mining yields probabilities, not exact answers. It is important to keep in mind that rare events can happen; they just do not happen very often.

1.2.1 Asking the Right Questions

Data mining does not automatically discover information without guidance. The patterns you find through data mining are very different depending on how you formulate the problem. To obtain meaningful results, you must learn how to ask the right questions. For example, rather than trying to learn how to "improve the response to a direct mail solicitation," you might try to find the characteristics of people who have responded to your solicitations in the past.

1.2.2 Understanding Your Data

To ensure meaningful data mining results, you must understand your data. Data mining algorithms are often sensitive to specific characteristics of the data: outliers (data values that are very different from the typical values in your database), irrelevant columns, columns that vary together (such as age and date of birth), data coding, and data that you choose to include or exclude.

Oracle Data Mining can automatically perform much of the data preparation required by the algorithm. But some of the data preparation is typically specific to the domain or the data mining problem. At any rate, you need to understand the data that was used to build the model to properly interpret the results when the model is applied.

1.3 The Data Mining Process

The following figure illustrates the phases, and the iterative nature, of a data mining project. The process flow shows that a data mining project does not stop when a particular solution is deployed. The results of data mining trigger new business questions, which in turn can be used to develop more focused models.


Figure 1-1 The Data Mining Process

The figure shows an iterative flow through four phases: Problem Definition; Data Gathering and Preparation (data access, data sampling, data transformation); Model Building and Evaluation (create model, test model, evaluate and interpret model); and Knowledge Deployment (model apply, custom reports, external applications).

1.3.1 Problem Definition

This initial phase of a data mining project focuses on understanding the project objectives and requirements. Once you have specified the problem from a business perspective, you can formulate it as a data mining problem and develop a preliminary implementation plan. For example, your business problem might be: "How can I sell more of my product to customers?" You might translate this into a data mining problem such as: "Which customers are most likely to purchase the product?" A model that predicts who is most likely to purchase the product must be built on data that describes the customers who have purchased the product in the past. Before building the model, you must assemble the data that is likely to contain relationships between customers who have purchased the product and customers who have not purchased the product. Customer attributes might include age, number of children, years of residence, owners/renters, and so on. 1.3.2 Data Gathering, Preparation, and Feature Engineering

Understand how to gather data, prepare data, and engineer features to solve business problems.

The data understanding phase involves data collection and exploration. As you take a closer look at the data, you can determine how well it addresses the business problem. You may decide to remove some of the data or add additional data. This is also the time to identify data quality problems and to scan for patterns in the data.

The data preparation phase covers all the tasks involved in creating the table or view that you use to build the model. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Tasks can include column selection and the creation of views, as well as data cleansing and transformation. For example, you can transform a DATE_OF_BIRTH column to AGE; you can insert the median income in cases where the INCOME column is null.

Additionally, you can add new computed attributes in an effort to tease information closer to the surface of the data; this is often called feature engineering. For example, rather than using the purchase amount, you can create a new attribute: "Number of Times Purchase Amount Exceeds $500 in a 12 Month Time Period." Customers who frequently make large purchases may also be related to customers who respond or don't respond to an offer.

Thoughtful data preparation and creating new "engineered features" that capture domain knowledge can significantly improve the information that is discovered through data mining. Enabling the data analyst to perform these data assembly, data preparation, data transformation, and feature engineering tasks inside Oracle Database is a significant distinction for Oracle.
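For illustration, an engineered attribute such as the one described above can be computed with a simple SQL view. This is only a sketch of the technique; the table and column names are hypothetical and are not part of the sample schema used elsewhere in this manual.

-- Engineered feature: number of purchases over $500 in the last 12 months
CREATE OR REPLACE VIEW cust_purchase_features_v AS
  SELECT cust_id,
         COUNT(CASE WHEN purchase_amt > 500 THEN 1 END) AS large_purchase_cnt
    FROM sales_history
   WHERE purchase_date >= ADD_MONTHS(SYSDATE, -12)
   GROUP BY cust_id;

Such a view can then be joined to the other customer attributes to form the table or view on which a model is built.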

Note:

Oracle Data Mining supports Automatic Data Preparation (ADP), which greatly simplifies the process of data preparation.

Related Topics
• Data Preparation

1.3.3 Model Building and Evaluation

In this phase, you select and apply various modeling techniques and calibrate the parameters to optimal values. If the algorithm requires data transformations, then you need to step back to the previous phase to implement them. In preliminary model building, it often makes sense to work with a reduced set of data since the final data set might contain thousands or millions of rows.

At this stage of the project, it is time to evaluate how well the model satisfies the originally-stated business goal (phase 1). If the model is supposed to predict customers who are likely to purchase a product, then does it sufficiently differentiate between the two classes? Is there sufficient lift? Are the trade-offs shown in the confusion matrix acceptable? Can the model be improved by adding text data? Should transactional data such as purchases (market-basket data) be included? Should costs associated with false positives or false negatives be incorporated into the model?

1.3.4 Knowledge Deployment

Knowledge deployment is the use of data mining within a target environment. In the deployment phase, insight and actionable information can be derived from data. Deployment can involve scoring (the application of models to new data), the extraction of model details (for example the rules of a decision tree), or the integration of data mining models within applications, data warehouse infrastructure, or query and reporting tools.


Because Oracle Data Mining builds and applies data mining models inside Oracle Database, the results are immediately available. BI reporting tools and dashboards can easily display the results of data mining. Additionally, Oracle Data Mining supports scoring in real time: Data can be mined and the results returned within a single database transaction. For example, a sales representative can run a model that predicts the likelihood of fraud within the context of an online sales transaction.
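As a sketch of such real-time scoring (the model name fraud_model, the table, and the bind variable here are hypothetical, not part of the sample schema), a single query inside the transaction might return the fraud probability for the current order:

SELECT PREDICTION_PROBABILITY(fraud_model, 1 USING t.*) AS fraud_probability
  FROM pending_transactions t
  WHERE t.txn_id = :current_txn_id;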

2 Introduction to Oracle Data Mining

Introduces Oracle Data Mining to perform a variety of mining tasks.

• About Oracle Data Mining
• Data Mining in the Database Kernel
• Oracle Data Mining with R Extensibility
• Data Mining in Oracle Exadata
• About Partitioned Model
• Interfaces to Oracle Data Mining
• Overview of Database Analytics

2.1 About Oracle Data Mining

Understand the uses of Oracle Data Mining and learn about different mining techniques.

Oracle Data Mining provides a powerful, state-of-the-art data mining capability within Oracle Database. You can use Oracle Data Mining to build and deploy predictive and descriptive data mining applications, to add intelligent capabilities to existing applications, and to generate predictive queries for data exploration.

Oracle Data Mining offers a comprehensive set of in-database algorithms for performing a variety of mining tasks, such as classification, regression, anomaly detection, feature extraction, clustering, and market basket analysis. The algorithms can work on standard case data, transactional data, star schemas, and text and other forms of unstructured data. Oracle Data Mining is uniquely suited to the mining of very large data sets.

Oracle Data Mining is one of the two components of the Oracle Advanced Analytics Option of Oracle Database Enterprise Edition. The other component is Oracle R Enterprise, which integrates R, the open-source statistical environment, with Oracle Database. Together, Oracle Data Mining and Oracle R Enterprise constitute a comprehensive advanced analytics platform for big data analytics.

Related Topics
• Oracle R Enterprise Documentation Library

2.2 Data Mining in the Database Kernel

Learn about the implementation of Data Mining.

Oracle Data Mining is implemented in the Oracle Database kernel. Data Mining models are first class database objects. Oracle Data Mining processes use built-in features of Oracle Database to maximize scalability and make efficient use of system resources.

Data mining within Oracle Database offers many advantages:
• No Data Movement: Some data mining products require that the data be exported from a corporate database and converted to a specialized format for mining. With Oracle Data Mining, no data movement or conversion is needed. This makes the entire mining process less complex, time-consuming, and error-prone, and it allows for the mining of very large data sets.
• Security: Your data is protected by the extensive security mechanisms of Oracle Database. Moreover, specific database privileges are needed for different data mining activities. Only users with the appropriate privileges can define, manipulate, or apply mining model objects.
• Data Preparation and Administration: Most data must be cleansed, filtered, normalized, sampled, and transformed in various ways before it can be mined. Up to 80% of the effort in a data mining project is often devoted to data preparation. Oracle Data Mining can automatically manage key steps in the data preparation process. Additionally, Oracle Database provides extensive administrative tools for preparing and managing data.
• Ease of Data Refresh: Mining processes within Oracle Database have ready access to refreshed data. Oracle Data Mining can easily deliver mining results based on current data, thereby maximizing its timeliness and relevance.
• Oracle Database Analytics: Oracle Database offers many features for advanced analytics and business intelligence. Oracle Data Mining can easily be integrated with other analytical features of the database, such as statistical analysis and OLAP.
• Oracle Technology Stack: You can take advantage of all aspects of Oracle's technology stack to integrate data mining within a larger framework for business intelligence or scientific inquiry.
• Domain Environment: Data mining models have to be built, tested, validated, managed, and deployed in their appropriate application domain environments. Data mining results may need to be post-processed as part of domain specific computations (for example, calculating estimated risks and response probabilities) and then stored into permanent repositories or data warehouses. With Oracle Data Mining, the pre- and post-mining activities can all be accomplished within the same environment.
• Application Programming Interfaces: The PL/SQL API and SQL language operators provide direct access to Oracle Data Mining functionality in Oracle Database.

Related Topics
• Overview of Database Analytics

2.3 Data Mining in Oracle Exadata

Understand scoring in Oracle Exadata. Scoring refers to the process of applying a data mining model to data to generate predictions. The scoring process may require significant system resources. Vast amounts of data may be involved, and algorithmic processing may be very complex.


With Oracle Data Mining, scoring can be off-loaded to intelligent Oracle Exadata Storage Servers where processing is extremely performant. Oracle Exadata Storage Servers combine Oracle's smart storage software and Oracle's industry-standard Sun hardware to deliver the industry's highest database storage performance. For more information about Oracle Exadata, visit the Oracle Technology Network.

Related Topics
• http://www.oracle.com/us/products/database/exadata/index.htm

2.4 About Partitioned Model

Introduces the partitioned model used to organize and represent multiple models.

Oracle Data Mining supports the building of a persistent Oracle Data Mining partitioned model. A partitioned model organizes and represents multiple models as partitions in a single model entity, enabling a user to easily build and manage models tailored to independent slices of data. Persistent means that the partitioned model has an on-disk representation. The product manages the organization of the partitioned model and simplifies the process of scoring the partitioned model. You must include the partition columns as part of the USING clause when scoring.

The partition names, key values, and the structure of the partitioned model are visible in the ALL_MINING_MODEL_PARTITIONS view.
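The following sketch assumes a hypothetical partitioned model named mort_model whose partition key is the column REGION, and a hypothetical apply table; it shows how the partition metadata can be inspected and how the partition column is included when scoring.

-- Inspect the partitions of the model
SELECT * FROM ALL_MINING_MODEL_PARTITIONS
  WHERE model_name = 'MORT_MODEL';

-- Include the partition column (REGION) in the USING clause so that each
-- row is scored by the partition that matches its region
SELECT cust_id,
       PREDICTION(mort_model USING region, income, age) AS predicted_target
  FROM customer_apply_data;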

Related Topics
• Oracle Database Reference
• Oracle Data Mining User’s Guide

2.5 Interfaces to Oracle Data Mining

The programmatic interfaces to Oracle Data Mining are PL/SQL for building and maintaining models and a family of SQL functions for scoring. Oracle Data Mining also supports a graphical user interface, which is implemented as an extension to Oracle SQL Developer. Oracle Predictive Analytics, a set of simplified data mining routines, is built on top of Oracle Data Mining and is implemented as a PL/SQL package. 2.5.1 PL/SQL API

The Oracle Data Mining PL/SQL API is implemented in the DBMS_DATA_MINING PL/SQL package, which contains routines for building, testing, and maintaining data mining models. A batch apply operation is also included in this package. The following example shows part of a simple PL/SQL script for creating an SVM classification model called SVMC_SH_Clas_sample. The model build uses weights, specified in a weights table, and settings, specified in a settings table. The weights influence the weighting of target classes. The settings override default behavior. The model uses Automatic Data Preparation (prep_auto_on setting). The model is trained on the data in mining_data_build_v.


Example 2-1 Creating a Classification Model

-- CREATE AND POPULATE A CLASS WEIGHTS TABLE
CREATE TABLE svmc_sh_sample_class_wt (
  target_value NUMBER,
  class_weight NUMBER);
INSERT INTO svmc_sh_sample_class_wt VALUES (0,0.35);
INSERT INTO svmc_sh_sample_class_wt VALUES (1,0.65);
COMMIT;

-- CREATE AND POPULATE A SETTINGS TABLE
CREATE TABLE svmc_sh_sample_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));
BEGIN
  INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES
    (dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines);
  INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES
    (dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_linear);
  INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES
    (dbms_data_mining.clas_weights_table_name, 'svmc_sh_sample_class_wt');
  INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES
    (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);
END;
/

-- CREATE THE MODEL
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'SVMC_SH_Clas_sample',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'mining_data_build_v',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 'svmc_sh_sample_settings');
END;
/

The Data Mining SQL functions perform prediction, clustering, and feature extraction. The functions score data by applying a mining model object or by executing an analytic clause that performs dynamic scoring.

The following example shows a query that applies the classification model svmc_sh_clas_sample to the data in the view mining_data_apply_v. The query returns the average age of customers who are likely to use an affinity card. The results are broken out by gender.

Example 2-2 The PREDICTION Function

SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
  FROM mining_data_apply_v
  WHERE PREDICTION(svmc_sh_clas_sample USING *) = 1
  GROUP BY cust_gender
  ORDER BY cust_gender;

C        CNT    AVG_AGE
- ---------- ----------
F         59         41
M        409         45
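Dynamic scoring, mentioned above, uses the analytic form of the SQL functions instead of a pre-built model. The following is only a sketch, assuming the same apply view: CLUSTER_ID with an analytic clause builds a transient two-cluster model over the rows returned by the query and assigns each row to one of the clusters.

SELECT cust_id,
       CLUSTER_ID(INTO 2 USING *) OVER () AS cluster_assignment
  FROM mining_data_apply_v;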

Related Topics
• In-Database Scoring

2.5.3 Oracle Data Miner

Oracle Data Miner is a graphical interface to Oracle Data Mining. Oracle Data Miner is an extension to Oracle SQL Developer, which is available for download free of charge on the Oracle Technology Network. Oracle Data Miner uses a work flow paradigm to capture, document, and automate the process of building, evaluating, and applying data mining models. Within a work flow, you can specify data transformations, build and evaluate multiple models, and score multiple data sets. You can then save work flows and share them with other users.

Figure 2-1 An Oracle Data Miner Workflow

The workflow shown connects the data sources CUST_INSUR_LTV and NEW_CUST_INSUR_LTV to nodes for exploring and cleansing the data, building response models, identifying likely customers, and clustering customers into segments.

For information about Oracle Data Miner, including installation instructions, visit Oracle Technology Network.

Related Topics
• Oracle Data Miner

2.5.4 Predictive Analytics

Predictive analytics is a technology that captures data mining processes in simple routines.


Sometimes called "one-click data mining," predictive analytics simplifies and automates the data mining process. Predictive analytics uses data mining technology, but knowledge of data mining is not needed to use predictive analytics. You can use predictive analytics simply by specifying an operation to perform on your data. You do not need to create or use mining models or understand the mining functions and algorithms summarized in "Oracle Data Mining Basics ". Oracle Data Mining predictive analytics operations are described in the following table:

Table 2-1 Oracle Predictive Analytics Operations

• EXPLAIN: Explains how individual predictors (columns) affect the variation of values in a target column.
• PREDICT: For each case (row), predicts the values in a target column.
• PROFILE: Creates a set of rules for cases (rows) that imply the same target value.
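For example, the EXPLAIN operation can be invoked through the DBMS_PREDICTIVE_ANALYTICS PL/SQL package (introduced below). The following is a sketch only: it uses the sample view referenced earlier in this chapter, and the result table name is arbitrary; the procedure creates that table and writes the ranked predictor information to it.

BEGIN
  -- Rank the columns of mining_data_build_v by their influence on affinity_card
  DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(
    data_table_name     => 'mining_data_build_v',
    explain_column_name => 'affinity_card',
    result_table_name   => 'pa_explain_result');
END;
/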

The Oracle predictive analytics operations are implemented in the DBMS_PREDICTIVE_ANALYTICS PL/SQL package. They are also available in Oracle Data Miner.

Related Topics
• Oracle Data Mining Basics: Understand the basic concepts of Oracle Data Mining.

2.6 Overview of Database Analytics

Oracle Database supports an array of native analytical features that are independent of the Oracle Advanced Analytics Option. Since all these features are part of a common server it is possible to combine them efficiently. The results of analytical processing can be integrated with Oracle Business Intelligence Suite Enterprise Edition and other BI tools and applications. The possibilities for combining different analytics are virtually limitless. Example 2-3 shows data mining and text processing within a single SQL query. The query selects all customers who have a high propensity to attrite (> 80% chance), are valuable customers (customer value rating > 90), and have had a recent conversation with customer services regarding a Checking Plus account. The propensity to attrite information is computed using a Data Mining model called tree_model. The query uses the Oracle Text CONTAINS operator to search call center notes for references to Checking Plus accounts. Some of the native analytics supported by Oracle Database are described in the following table:


Table 2-2 Oracle Database Native Analytics

• Complex data transformations: Data transformation is a key aspect of analytical applications and ETL (extract, transform, and load). You can use SQL expressions to implement data transformations, or you can use the DBMS_DATA_MINING_TRANSFORM package. DBMS_DATA_MINING_TRANSFORM is a flexible data transformation package that includes a variety of missing value and outlier treatments, as well as binning and normalization capabilities. (Documented in Oracle Database PL/SQL Packages and Types Reference)
• Statistical functions: Oracle Database provides a long list of SQL statistical functions with support for: hypothesis testing (such as t-test, F-test), correlation computation (such as Pearson correlation), cross-tab statistics, and descriptive statistics (such as median and mode). The DBMS_STAT_FUNCS package adds distribution fitting procedures and a summary procedure that returns descriptive statistics for a column. (Documented in Oracle Database SQL Language Reference and Oracle Database PL/SQL Packages and Types Reference)
• Window and analytic SQL functions: Oracle Database supports analytic and windowing functions for computing cumulative, moving, and centered aggregates. With windowing aggregate functions, you can calculate moving and cumulative versions of SUM, AVERAGE, COUNT, MAX, MIN, and many more functions. (Documented in Oracle Database Data Warehousing Guide)
• Linear algebra: The UTL_NLA package exposes a subset of the popular BLAS and LAPACK (Version 3.0) libraries for operations on vectors and matrices represented as VARRAYs. This package includes procedures to solve systems of linear equations, invert matrices, and compute eigenvalues and eigenvectors. (Documented in Oracle Database PL/SQL Packages and Types Reference)
• OLAP: Oracle OLAP supports multidimensional analysis and can be used to improve performance of multidimensional queries. Oracle OLAP provides functionality previously found only in specialized OLAP databases. Moving beyond drill-downs and roll-ups, Oracle OLAP also supports time-series analysis, modeling, and forecasting. (Documented in Oracle OLAP User's Guide)
• Spatial analytics: Oracle Spatial provides advanced spatial features to support high-end GIS and LBS solutions. Oracle Spatial's analysis and mining capabilities include functions for binning, detection of regional patterns, spatial correlation, colocation mining, and spatial clustering. Oracle Spatial also includes support for topology and network data models and analytics. The topology data model of Oracle Spatial allows one to work with data about nodes, edges, and faces in a topology. It includes network analysis functions for computing shortest path, minimum cost spanning tree, nearest-neighbors analysis, traveling salesman problem, among others. (Documented in Oracle Spatial and Graph Developer's Guide)
• Text Mining: Oracle Text uses standard SQL to index, search, and analyze text and documents stored in the Oracle database, in files, and on the web. Oracle Text also supports automatic classification and clustering of document collections. Many of the analytical features of Oracle Text are layered on top of Oracle Data Mining functionality. (Documented in Oracle Text Application Developer's Guide)
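As a small illustration of the windowing functions listed in Table 2-2, the following query computes a three-month moving average of sales; the table and column names are hypothetical.

SELECT month_id,
       AVG(sales_amt) OVER (ORDER BY month_id
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3m
  FROM monthly_sales;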


Example 2-3 SQL Query Combining Oracle Data Mining and Oracle Text

SELECT A.cust_name, A.contact_info
  FROM customers A
  WHERE PREDICTION_PROBABILITY(tree_model, 'attrite' USING A.*) > 0.8
    AND A.cust_value > 90
    AND A.cust_id IN
        (SELECT B.cust_id
           FROM call_center B
           WHERE B.call_date BETWEEN '01-Jan-2005' AND '30-Jun-2005'
             AND CONTAINS(B.notes, 'Checking Plus', 1) > 0);

3 Oracle Data Mining Basics

Understand the basic concepts of Oracle Data Mining.

• Mining Functions
• Algorithms
• Data Preparation
• In-Database Scoring

3.1 Mining Functions

Introduces the concept of data mining functions.

A basic understanding of data mining functions and algorithms is required for using Oracle Data Mining. Each data mining function specifies a class of problems that can be modeled and solved. Data mining functions fall generally into two categories: supervised and unsupervised. Notions of supervised and unsupervised learning are derived from the science of machine learning, which has been called a sub-area of artificial intelligence.

Artificial intelligence refers to the implementation and study of systems that exhibit autonomous intelligence or behavior of their own. Machine learning deals with techniques that enable devices to learn from their own performance and modify their own functioning. Data mining applies machine learning concepts to data.

Related Topics
• Algorithms

3.1.1 Supervised Data Mining

Supervised learning is also known as directed learning. The learning process is directed by a previously known dependent attribute or target. Directed data mining attempts to explain the behavior of the target as a function of a set of independent attributes or predictors. Supervised learning generally results in predictive models. This is in contrast to unsupervised learning where the goal is pattern detection. The building of a supervised model involves training, a process whereby the software analyzes many cases where the target value is already known. In the training process, the model "learns" the logic for making the prediction. For example, a model that seeks to identify the customers who are likely to respond to a promotion must be trained by analyzing the characteristics of many customers who are known to have responded or not responded to a promotion in the past.


3.1.1.1 Supervised Learning: Testing

Separate data sets are required for building (training) and testing some predictive models. The build data (training data) and test data must have the same column structure. Typically, one large table or view is split into two data sets: one for building the model, and the other for testing the model. The process of applying the model to test data helps to determine whether the model, built on one chosen sample, is generalizable to other data. In particular, it helps to avoid the phenomenon of overfitting, which can occur when the logic of the model fits the build data too well and therefore has little predictive power. 3.1.1.2 Supervised Learning: Scoring
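One common way to perform such a split inside the database is to hash the case identifier, as sketched below. The source table, the view names, and the 60/40 proportions are hypothetical.

-- Roughly 60% of cases for building, the remaining 40% for testing
CREATE OR REPLACE VIEW demo_build_v AS
  SELECT * FROM customer_data WHERE ORA_HASH(cust_id, 99, 0) < 60;

CREATE OR REPLACE VIEW demo_test_v AS
  SELECT * FROM customer_data WHERE ORA_HASH(cust_id, 99, 0) >= 60;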

Apply data, also called scoring data, is the actual population to which a model is applied. For example, you might build a model that identifies the characteristics of customers who frequently buy a certain product. To obtain a list of customers who shop at a certain store and are likely to buy a related product, you might apply the model to the customer data for that store. In this case, the store customer data is the scoring data. Most supervised learning can be applied to a population of interest. The principal supervised mining techniques, Classification and Regression, can both be used for scoring. Oracle Data Mining does not support the scoring operation for Attribute Importance, another supervised function. Models of this type are built on a population of interest to obtain information about that population; they cannot be applied to separate data. An attribute importance model returns and ranks the attributes that are most important in predicting a target value. Oracle Data Mining supports the supervised data mining functions described in the following table:

Table 3-1 Oracle Data Mining Supervised Functions

• Attribute Importance: Identifies the attributes that are most important in predicting a target attribute. Sample problem: Given customer response to an affinity card program, find the most significant predictors.
• Classification: Assigns items to discrete classes and predicts the class to which an item belongs. Sample problem: Given demographic data about a set of customers, predict customer response to an affinity card program.
• Regression: Approximates and forecasts continuous values. Sample problem: Given demographic and purchasing data about a set of customers, predict customers' age.

3.1.2 Unsupervised Data Mining

Unsupervised learning is non-directed. There is no distinction between dependent and independent attributes. There is no previously-known result to guide the algorithm in building the model.


Unsupervised learning can be used for descriptive purposes. It can also be used to make predictions.

3.1.2.1 Unsupervised Learning: Scoring

Introduces unsupervised learning, supported scoring operations, and unsupervised Oracle Data Mining functions. Although unsupervised data mining does not specify a target, most unsupervised learning can be applied to a population of interest. For example, clustering models use descriptive data mining techniques, but they can be applied to classify cases according to their cluster assignments. Anomaly detection, although unsupervised, is typically used to predict whether a data point is typical among a set of cases. Oracle Data Mining supports the scoring operation for Clustering and Feature Extraction, both unsupervised mining functions. Oracle Data Mining does not support the scoring operation for Association Rules, another unsupervised function. Association models are built on a population of interest to obtain information about that population; they cannot be applied to separate data. An association model returns rules that explain how items or events are associated with each other. The association rules are returned with statistics that can be used to rank them according to their probability. Oracle Data Mining supports the unsupervised functions described in the following table:

Table 3-2 Oracle Data Mining Unsupervised Functions

• Anomaly Detection: Identifies items (outliers) that do not satisfy the characteristics of "normal" data. Sample problem: Given demographic data about a set of customers, identify customer purchasing behavior that is significantly different from the norm.
• Association Rules: Finds items that tend to co-occur in the data and specifies the rules that govern their co-occurrence. Sample problem: Find the items that tend to be purchased together and specify their relationship.
• Clustering: Finds natural groupings in the data. Sample problem: Segment demographic data into clusters and rank the probability that an individual belongs to a given cluster.
• Feature Extraction: Creates new attributes (features) using linear combinations of the original attributes. Sample problem: Given demographic data about a set of customers, group the attributes into general characteristics of the customers.
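For example, a clustering model can be applied with the CLUSTER_ID and CLUSTER_PROBABILITY SQL functions to return each case's cluster assignment and the probability of that assignment. The model name below is hypothetical; the apply view is the sample view used in the preceding chapter.

SELECT cust_id,
       CLUSTER_ID(km_sh_clus_sample USING *)          AS cluster_id,
       CLUSTER_PROBABILITY(km_sh_clus_sample USING *) AS cluster_probability
  FROM mining_data_apply_v;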

Related Topics
• Mining Functions
Part II provides basic conceptual information about the mining functions that Oracle Data Mining supports.
• In-Database Scoring

3.2 Algorithms

An algorithm is a mathematical procedure for solving a specific kind of problem. Oracle Data Mining supports at least one algorithm for each data mining function.


For some functions, you can choose among several algorithms. For example, Oracle Data Mining supports six classification algorithms. Each data mining model is produced by a specific algorithm. Some data mining problems can best be solved by using more than one algorithm. This necessitates the development of more than one model. For example, you might first use a feature extraction model to create an optimized set of predictors, then a classification model to make a prediction on the results.

3.2.1 Oracle Data Mining Supervised Algorithms

Oracle Data Mining supports the supervised data mining algorithms described in the following table. The algorithm abbreviations are used throughout this manual.

Table 3-3 Oracle Data Mining Algorithms for Supervised Functions

Algorithm: Decision Tree
Function: Classification
Description: Decision trees extract predictive information in the form of human-understandable rules. The rules are if-then-else expressions; they explain the decisions that lead to the prediction.

Algorithm: Explicit Semantic Analysis
Function: Classification
Description: Explicit Semantic Analysis (ESA) is designed to make predictions for text data. This algorithm can address use cases with hundreds of thousands of classes. In Oracle Database 12c Release 2, ESA was introduced as a Feature Extraction algorithm.

Algorithm: Exponential Smoothing
Function: Time Series
Description: Exponential Smoothing (ESM) provides forecasts for time series data. Forecasts are made for each time period within a user-specified forecast window. ESM provides a total of 14 different time series models, including all the most popular estimates of trend and seasonal effects. Choice of model is controlled by user settings. ESM provides confidence bounds on its forecasts.

Algorithm: Generalized Linear Models
Function: Classification and Regression
Description: Generalized Linear Models (GLM) implement logistic regression for classification of binary targets and linear regression for continuous targets. GLM classification supports confidence bounds for prediction probabilities. GLM regression supports confidence bounds for predictions.

Algorithm: Minimum Description Length
Function: Attribute Importance
Description: Minimum Description Length (MDL) is an information theoretic model selection principle. MDL assumes that the simplest, most compact representation of the data is the best and most probable explanation of the data.

Algorithm: Naive Bayes
Function: Classification
Description: Naive Bayes makes predictions using Bayes' Theorem, which derives the probability of a prediction from the underlying evidence, as observed in the data.

Algorithm: Neural Network
Function: Classification and Regression
Description: A Neural Network is an algorithm inspired by biological neural networks; it is used to estimate or approximate functions that depend on a large number of generally unknown inputs. Neural Network is designed for Classification and Regression.

Algorithm: Random Forest
Function: Classification
Description: Random Forest is a powerful machine learning algorithm. The Random Forest algorithm builds a number of decision tree models and predicts using the ensemble of trees.


Table 3-3 (Cont.) Oracle Data Mining Algorithms for Supervised Functions

Algorithm: Support Vector Machines
Function: Classification and Regression
Description: Distinct versions of Support Vector Machines (SVM) use different kernel functions to handle different types of data sets. Linear and Gaussian (nonlinear) kernels are supported. SVM classification attempts to separate the target classes with the widest possible margin. SVM regression tries to find a continuous function such that the maximum number of data points lie within an epsilon-wide tube around it.

3.2.2 Oracle Data Mining Unsupervised Algorithms

Learn about unsupervised algorithms that Oracle Data Mining supports. Oracle Data Mining supports the unsupervised data mining algorithms described in the following table. The algorithm abbreviations are used throughout this manual.

Table 3-4 Oracle Data Mining Algorithms for Unsupervised Functions

Algorithm: Apriori
Function: Association
Description: Apriori performs market basket analysis by identifying co-occurring items (frequent itemsets) within a set. Apriori finds rules with support greater than a specified minimum support and confidence greater than a specified minimum confidence.

Algorithm: CUR matrix decomposition
Function: Attribute Importance
Description: CUR matrix decomposition is an alternative to Support Vector Machines (SVM) and Principal Component Analysis (PCA) and an important tool for exploratory data analysis. This algorithm performs analytical processing and singles out important columns and rows.

Algorithm: Expectation Maximization
Function: Clustering
Description: Expectation Maximization (EM) is a density estimation algorithm that performs probabilistic clustering. In density estimation, the goal is to construct a density function that captures how a given population is distributed. The density estimate is based on observed data that represents a sample of the population. Oracle Data Mining supports probabilistic clustering, data frequency estimates, and other applications of Expectation Maximization.

Algorithm: Explicit Semantic Analysis
Function: Feature Extraction
Description: Explicit Semantic Analysis (ESA) uses an existing knowledge base as features. An attribute vector represents each feature or concept. ESA creates a reverse index that maps every attribute to the knowledge base concepts or the concept-attribute association vector value.

Algorithm: k-Means
Function: Clustering
Description: k-Means is a distance-based clustering algorithm that partitions the data into a predetermined number of clusters. Each cluster has a centroid (center of gravity). Cases (individuals within the population) that are in a cluster are close to the centroid. Oracle Data Mining supports an enhanced version of k-Means. It goes beyond the classical implementation by defining a hierarchical parent-child relationship of clusters.


Table 3-4 (Cont.) Oracle Data Mining Algorithms for Unsupervised Functions

Algorithm: Non-Negative Matrix Factorization
Function: Feature Extraction
Description: Non-Negative Matrix Factorization (NMF) generates new attributes using linear combinations of the original attributes. The coefficients of the linear combinations are non-negative. During model apply, an NMF model maps the original data into the new set of attributes (features) discovered by the model.

Algorithm: One Class Support Vector Machines
Function: Anomaly Detection
Description: One-Class SVM builds a profile of one class. When the model is applied, it identifies cases that are somehow different from that profile. This allows for the detection of rare cases that are not necessarily related to each other.

Algorithm: Orthogonal Partitioning Clustering
Function: Clustering
Description: Orthogonal Partitioning Clustering (O-Cluster) creates a hierarchical, grid-based clustering model. The algorithm creates clusters that define dense areas in the attribute space. A sensitivity parameter defines the baseline density level.

Algorithm: Singular Value Decomposition and Principal Component Analysis
Function: Feature Extraction
Description: Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) are orthogonal linear transformations that are optimal at capturing the underlying variance of the data. This property is extremely useful for reducing the dimensionality of high-dimensional data and for supporting meaningful data visualization. In addition to dimensionality reduction, SVD and PCA have a number of other important applications, such as data de-noising (smoothing), data compression, matrix inversion, and solving a system of linear equations.

Related Topics
• Algorithms
Part III provides basic conceptual information about the algorithms supported by Oracle Data Mining. There is at least one algorithm for each of the mining functions.

3.3 Data Preparation

The quality of a model depends to a large extent on the quality of the data used to build (train) it. Much of the time spent in any given data mining project is devoted to data preparation. The data must be carefully inspected, cleansed, and transformed, and algorithm-appropriate data preparation methods must be applied. The process of data preparation is further complicated by the fact that any data to which a model is applied, whether for testing or for scoring, must undergo the same transformations as the data used to train the model. 3.3.1 Oracle Data Mining Simplifies Data Preparation

Oracle Data Mining offers several features that significantly simplify the process of data preparation:
• Embedded data preparation: The transformations used in training the model are embedded in the model and automatically executed whenever the model is applied to new data. If you specify transformations for the model, you only have to specify them once.


• Automatic Data Preparation (ADP): Oracle Data Mining supports an automated data preparation mode. When ADP is active, Oracle Data Mining automatically performs the data transformations required by the algorithm. The transformation instructions are embedded in the model along with any user-specified transformation instructions. (A build sketch that enables ADP follows this list.)
• Automatic management of missing values and sparse data: Oracle Data Mining uses consistent methodology across mining algorithms to handle sparsity and missing values.
• Transparency: Oracle Data Mining provides model details, which are a view of the attributes that are internal to the model. This insight into the inner details of the model is possible because of reverse transformations, which map the transformed attribute values to a form that can be interpreted by a user. Where possible, attribute values are reversed to the original column values. Reverse transformations are also applied to the target of a supervised model, thus the results of scoring are in the same units as the units of the original target.
• Tools for custom data preparation: Oracle Data Mining provides many common transformation routines in the DBMS_DATA_MINING_TRANSFORM PL/SQL package. You can use these routines, or develop your own routines in SQL, or both. The SQL language is well suited for implementing transformations in the database. You can use custom transformation instructions along with ADP or instead of ADP.
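The following is a minimal sketch of enabling ADP when building a model with the DBMS_DATA_MINING PL/SQL package. The settings table and model name are hypothetical, and the build view and column names are reused from the examples in this guide; only the documented PREP_AUTO and ALGO_NAME settings and the CREATE_MODEL procedure are assumed from the API.

-- Hypothetical settings table; PREP_AUTO = ON turns on Automatic Data Preparation
CREATE TABLE nb_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

INSERT INTO nb_settings VALUES ('ALGO_NAME', 'ALGO_NAIVE_BAYES');
INSERT INTO nb_settings VALUES ('PREP_AUTO', 'ON');
COMMIT;

-- Build a classification model; the ADP transformations are embedded in the model
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'nb_card_sample',        -- hypothetical model name
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'mining_data_build_v',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 'nb_settings');
END;
/

Because the transformation instructions are embedded in the model, any query that later applies nb_card_sample automatically repeats the same transformations on the scoring data.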

3.3.2 Case Data

Most data mining algorithms act on single-record case data, where the information for each case is stored in a separate row. The data attributes for the cases are stored in the columns. When the data is organized in transactions, the data for one case (one transaction) is stored in many rows. An example of transactional data is market basket data. With the single exception of Association Rules, which can operate on native transactional data, Oracle Data Mining algorithms require single-record case organization.

3.3.2.1 Nested Data

Oracle Data Mining supports attributes in nested columns. A transactional table can be cast as a nested column and included in a table of single-record case data. Similarly, star schemas can be cast as nested columns. With nested data transformations, Oracle Data Mining can effectively mine data originating from multiple sources and configurations.

3.3.3 Text Data

Prepare and transform unstructured text data for data mining. Oracle Data Mining interprets CLOB columns and long VARCHAR2 columns automatically as unstructured text. Additionally, you can specify columns of short VARCHAR2, CHAR, BLOB, and BFILE as unstructured text. Unstructured text includes data items such as web pages, document libraries, Power Point presentations, product specifications, emails, comment fields in reports, and call center notes. Oracle Data Mining uses Oracle Text utilities and term weighting strategies to transform unstructured text for mining. In text transformation, text terms are extracted and given numeric values in a text index. The text transformation process is


configurable for the model and for individual attributes. Once transformed, the text can be mined with a data mining algorithm.

Related Topics
• Preparing the Data
• Transforming the Data
• Mining Unstructured Text

3.4 In-Database Scoring

Scoring is the application of a data mining algorithm to new data. In traditional data mining, models are built using specialized software on a remote system and deployed to another system for scoring. This is a cumbersome, error-prone process open to security violations and difficulties in data synchronization. With Oracle Data Mining, scoring is easy and secure. The scoring engine and the data both reside within the database. Scoring is an extension to the SQL language, so the results of mining can easily be incorporated into applications and reporting systems. 3.4.1 Parallel Execution and Ease of Administration

All Oracle Data Mining scoring routines support parallel execution for scoring large data sets. In-database scoring provides performance advantages: parallel execution significantly reduces the time required for executing complex queries and scoring large data sets. In-database mining also minimizes the IT effort needed to support data mining initiatives. Using standard database techniques, models can easily be refreshed (re-created) on more recent data and redeployed. The deployment is immediate since the scoring query remains the same; only the underlying model is replaced in the database. A sketch of a parallel scoring query follows.

Related Topics
• Oracle Database VLDB and Partitioning Guide
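As an illustration, the following query is a sketch that reuses the svmc_sh_clas_sample model and the mining_data_apply_v view from the examples later in this chapter; the degree of parallelism of 4 is an arbitrary choice.

-- Score all rows in parallel; the PARALLEL hint requests a degree of parallelism of 4
SELECT /*+ PARALLEL(4) */
       cust_id,
       PREDICTION(svmc_sh_clas_sample USING *)             pred_card,
       PREDICTION_PROBABILITY(svmc_sh_clas_sample USING *) pred_prob
  FROM mining_data_apply_v;

When the model is refreshed, this query does not change; the new model is simply picked up the next time the query runs.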

3.4.2 SQL Functions for Model Apply and Dynamic Scoring

In Oracle Data Mining, scoring is performed by SQL language functions. The functions perform prediction, clustering, and feature extraction, and they can be invoked in two different ways: by applying a mining model object (Example 3-1), or by executing an analytic clause that computes the mining analysis dynamically and applies it to the data (Example 3-2). Dynamic scoring, which eliminates the need for a model, can supplement, or even replace, the more traditional data mining methodology described in "The Data Mining Process". In Example 3-1, the PREDICTION_PROBABILITY function applies the model svmc_sh_clas_sample, created in Example 2-1, to score the data in mining_data_apply_v. The function returns the ten customers in Italy who are most likely to use an affinity card.


In Example 3-2, the functions PREDICTION and PREDICTION_PROBABILITY use the analytic syntax (the OVER () clause) to dynamically score the data in mining_data_build_v. The query returns the customers who currently do not have an affinity card, along with the probability that they would use one.

Example 3-1 Applying a Mining Model to Score Data

SELECT cust_id FROM
  (SELECT cust_id,
          rank() OVER (ORDER BY PREDICTION_PROBABILITY(svmc_sh_clas_sample, 1 USING *)
                       DESC, cust_id) rnk
     FROM mining_data_apply_v
    WHERE country_name = 'Italy')
 WHERE rnk <= 10
 ORDER BY rnk;

   CUST_ID
----------
    101445
    100179
    100662
    100733
    100554
    100081
    100344
    100324
    100185
    101345

Example 3-2 Executing an Analytic Function to Score Data

SELECT cust_id, pred_prob FROM
  (SELECT cust_id, affinity_card,
          PREDICTION(FOR TO_CHAR(affinity_card) USING *) OVER () pred_card,
          PREDICTION_PROBABILITY(FOR TO_CHAR(affinity_card), 1 USING *) OVER () pred_prob
     FROM mining_data_build_v)
 WHERE affinity_card = 0
   AND pred_card = 1
 ORDER BY pred_prob DESC;

   CUST_ID  PRED_PROB
----------  ---------
    102434        .96
    102365        .96
    102330        .96
    101733        .95
    102615        .94
    102686        .94
    102749        .93
     .  .  .
    101656        .51

Related Topics
• Oracle Database SQL Language Reference
• Oracle Data Mining User’s Guide

Part II Mining Functions

Part II provides basic conceptual information about the mining functions that Oracle Data Mining supports. Mining functions represent a class of mining problems that can be solved using data mining algorithms. Part II contains these chapters:
• Regression
• Classification
• Anomaly Detection
• Clustering
• Association
• Feature Selection and Extraction
• Time Series

Note:

The term mining function has no relationship to a SQL language function.

Related Topics
• Algorithms
Part III provides basic conceptual information about the algorithms supported by Oracle Data Mining. There is at least one algorithm for each of the mining functions.
• Oracle Database SQL Language Reference

4 Regression

Learn how to predict a continuous numerical target through Regression - the supervised mining function.

• About Regression
• Testing a Regression Model
• Regression Algorithms

Related Topics
• Oracle Data Mining Basics
Understand the basic concepts of Oracle Data Mining.

4.1 About Regression

Regression is a data mining function that predicts numeric values along a continuum. Profit, sales, mortgage rates, house values, square footage, temperature, or distance can be predicted using Regression techniques. For example, a Regression model can be used to predict the value of a house based on location, number of rooms, lot size, and other factors. A Regression task begins with a data set in which the target values are known. For example, a Regression model that predicts house values can be developed based on observed data for many houses over a period of time. In addition to the value, the data can track the age of the house, square footage, number of rooms, taxes, school district, proximity to shopping centers, and so on. House value can be the target, the other attributes are the predictors, and the data for each house constitutes a case. In the model build (training) process, a Regression algorithm estimates the value of the target as a function of the predictors for each case in the build data. These relationships between predictors and target are summarized in a model, which can then be applied to a different data set in which the target values are unknown. Regression models are tested by computing various statistics that measure the difference between the predicted values and the expected values. The historical data for a Regression project is typically divided into two data sets: one for building the model, the other for testing the model. Regression modeling has many applications in trend analysis, business planning, marketing, financial forecasting, time series prediction, biomedical and drug response modeling, and environmental modeling. 4.1.1 How Does Regression Work?

You do not need to understand the mathematics used in regression analysis to develop and use quality regression models for data mining. However, it is helpful to understand a few basic concepts.


Regression analysis seeks to determine the values of parameters for a function that cause the function to best fit a set of data observations that you provide. The following equation expresses these relationships in symbols. It shows that regression is the process of estimating the value of a continuous target (y) as a function (F) of one or more predictors (x1 , x2 , ..., xn), a set of parameters (θ1 , θ2 , ..., θn), and a measure of error (e).

y = F(x,θ) + e

The predictors can be understood as independent variables and the target as a dependent variable. The error, also called the residual, is the difference between the expected and predicted value of the dependent variable. The regression parameters are also known as regression coefficients. The process of training a regression model involves finding the parameter values that minimize a measure of the error, for example, the sum of squared errors. There are different families of regression functions and different ways of measuring the error.
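As a compact restatement of the preceding paragraph, the least-squares training objective for a model of the form y = F(x,θ) + e can be sketched as follows (the summation index i runs over the n cases in the build data; this notation is illustrative and is not taken from elsewhere in this chapter):

\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \left( y_i - F(x_i, \theta) \right)^2

Other error measures, such as the sum of absolute errors, lead to different families of regression techniques.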

4.1.1.1 Linear Regression

A linear regression technique can be used if the relationship between the predictors and the target can be approximated with a straight line. Regression with a single predictor is the easiest to visualize. Simple linear regression with a single predictor is shown in the following figure:

Figure 4-1 Linear Regression With a Single Predictor

(The figure plots the target Y on the vertical axis against the predictor X on the horizontal axis. Data points are scattered around a fitted straight line, and the vertical distance from each point to the line is labeled as error.)

Linear regression with a single predictor can be expressed with the following equation.

y = θ2x + θ1 + e

The regression parameters in simple linear regression are:


• The slope of the line (θ2) — how much the target y changes for a unit change in the predictor x

• The y intercept (θ1) — the value of y where the line crosses the y axis (x = 0)

4.1.1.2 Multivariate Linear Regression

The term multivariate linear regression refers to linear regression with two or more predictors (x1, x2, …, xn). When multiple predictors are used, the regression line cannot be visualized in two-dimensional space. However, the line can be computed simply by expanding the equation for single-predictor linear regression to include the parameters for each of the predictors.

y = θ1 + θ2x1 + θ3x2 + ... + θn+1xn + e

4.1.1.3 Regression Coefficients

In multivariate linear regression, the regression parameters are often referred to as coefficients. When you build a multivariate linear regression model, the algorithm computes a coefficient for each of the predictors used by the model. The coefficient is a measure of the impact of the predictor x on the target y. Numerous statistics are available for analyzing the regression coefficients to evaluate how well the regression line fits the data. 4.1.1.4 Nonlinear Regression

Often the relationship between x and y cannot be approximated with a straight line. In this case, a nonlinear regression technique can be used. Alternatively, the data can be preprocessed to make the relationship linear. Nonlinear regression models define y as a function of x using an equation that is more complicated than the linear regression equation. In the following figure, x and y have a nonlinear relationship.

Figure 4-2 Nonlinear Regression With a Single Predictor

(The figure plots the target Y on the vertical axis against the predictor X on the horizontal axis. Data points follow a curved pattern around a fitted nonlinear curve, and the vertical distance from each point to the curve is labeled as error.)


4.1.1.5 Multivariate Nonlinear Regression

The term multivariate nonlinear regression refers to nonlinear regression with two or more predictors (x1, x2, …, xn). When multiple predictors are used, the nonlinear relationship cannot be visualized in two-dimensional space.

4.1.1.6 Confidence Bounds

A Regression model predicts a numeric target value for each case in the scoring data. In addition to the predictions, some Regression algorithms can identify confidence bounds, which are the upper and lower boundaries of an interval in which the predicted value is likely to lie. When a model is built to make predictions with a given confidence, the confidence interval is produced along with the predictions. For example, a model predicts the value of a house to be $500,000 with a 95% confidence that the value is between $475,000 and $525,000.
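As an illustration, the PREDICTION_BOUNDS SQL function returns such an interval for GLM models. The following sketch assumes a GLM Regression model named glm_house_value and a scoring view house_apply_data, both hypothetical; the 0.95 argument requests 95% confidence bounds.

-- Predict the house value and return 95% confidence bounds for each prediction
SELECT house_id,
       PREDICTION(glm_house_value USING *)                    predicted_value,
       PREDICTION_BOUNDS(glm_house_value, 0.95 USING *).LOWER lower_bound,
       PREDICTION_BOUNDS(glm_house_value, 0.95 USING *).UPPER upper_bound
  FROM house_apply_data
 ORDER BY house_id;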

4.2 Testing a Regression Model

A regression model is tested by applying it to test data with known target values and comparing the predicted values with the known values. The test data must be compatible with the data used to build the model and must be prepared in the same way that the build data was prepared. Typically the build data and test data come from the same historical data set. A percentage of the records is used to build the model; the remaining records are used to test the model. Test metrics are used to assess how accurately the model predicts these known values. If the model performs well and meets the business requirements, it can then be applied to new data to predict the future.

4.2.1 Regression Statistics

The Root Mean Squared Error and the Mean Absolute Error are commonly used statistics for evaluating the overall quality of a regression model. Different statistics may also be available depending on the regression methods used by the algorithm.

4.2.1.1 Root Mean Squared Error

The Root Mean Squared Error (RMSE) is the square root of the average squared distance of a data point from the fitted line. This SQL expression calculates the RMSE.

SQRT(AVG((predicted_value - actual_value) * (predicted_value - actual_value)))

This formula shows the RMSE in mathematical symbols. The large sigma character represents summation; j represents the current case, and n represents the number of cases scored.


Figure 4-3 Root Mean Squared Error

RMSE = sqrt( (1/n) * Σ_{j=1}^{n} ( y_j - ŷ_j )^2 )

4.2.1.2 Mean Absolute Error

The Mean Absolute Error (MAE) is the average of the absolute value of the residuals (error). The MAE is very similar to the RMSE but is less sensitive to large errors. This SQL expression calculates the MAE.

AVG(ABS(predicted_value - actual_value))

This formula shows the MAE in mathematical symbols. The large sigma character represents summation; j represents the current case, and n represents the number of cases scored.

Figure 4-4 Mean Absolute Error

MAE = (1/n) * Σ_{j=1}^{n} | y_j - ŷ_j |
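Putting the two SQL expressions together, a test query might look like the following sketch, which assumes a hypothetical table regr_test_results containing an actual_value and a predicted_value column for each scored test case:

-- Compute RMSE and MAE over a table of test predictions
SELECT SQRT(AVG((predicted_value - actual_value) *
                (predicted_value - actual_value))) rmse,
       AVG(ABS(predicted_value - actual_value))    mae
  FROM regr_test_results;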

4.3 Regression Algorithms

Oracle Data Mining supports three algorithms for Regression: Generalized Linear Models (GLM), Neural Network (NN), and Support Vector Machines (SVM). Generalized Linear Models (GLM) and Support Vector Machines (SVM) algorithms are particularly suited for mining data sets that have very high dimensionality (many attributes), including transactional and unstructured data.
• Generalized Linear Models (GLM)
GLM is a popular statistical technique for linear modeling. Oracle Data Mining implements GLM for Regression and for binary classification. GLM provides extensive coefficient statistics and model statistics, as well as row diagnostics. GLM also supports confidence bounds.
• Neural Network
Neural networks are powerful algorithms that can learn arbitrary nonlinear regression functions.
• Support Vector Machines (SVM)
SVM is a powerful, state-of-the-art algorithm for linear and nonlinear Regression. Oracle Data Mining implements SVM for Regression, classification, and anomaly


detection. SVM Regression supports two kernels: the Gaussian kernel for nonlinear Regression, and the linear kernel for Linear Regression. (A build sketch for an SVM Regression model follows this list.)
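The following is a minimal sketch of building an SVM Regression model with the DBMS_DATA_MINING PL/SQL package. The model name, settings table, build table, and column names are hypothetical; the ALGO_NAME, SVMS_KERNEL_FUNCTION, and PREP_AUTO settings are the documented ones.

-- Hypothetical settings table selecting SVM with a linear kernel
CREATE TABLE svmr_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

INSERT INTO svmr_settings VALUES ('ALGO_NAME', 'ALGO_SUPPORT_VECTOR_MACHINES');
INSERT INTO svmr_settings VALUES ('SVMS_KERNEL_FUNCTION', 'SVMS_LINEAR');
INSERT INTO svmr_settings VALUES ('PREP_AUTO', 'ON');
COMMIT;

-- Build the Regression model
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'svmr_house_value',   -- hypothetical model name
    mining_function     => DBMS_DATA_MINING.REGRESSION,
    data_table_name     => 'house_build_data',   -- hypothetical build table
    case_id_column_name => 'house_id',
    target_column_name  => 'house_value',
    settings_table_name => 'svmr_settings');
END;
/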

Note:

Oracle Data Mining uses linear kernel SVM as the default Regression algorithm.

Related Topics
• Generalized Linear Models
Learn how to use Generalized Linear Models (GLM) statistical technique for Linear modeling.
• Support Vector Machines
Learn how to use Support Vector Machines, a powerful algorithm based on statistical learning theory.
• Neural Network
Learn about Neural Network for Regression and Classification mining functions.

5 Classification

Learn how to predict a categorical target through Classification - the supervised mining function.

• About Classification
• Testing a Classification Model
• Biasing a Classification Model
• Classification Algorithms

Related Topics
• Oracle Data Mining Basics
Understand the basic concepts of Oracle Data Mining.

5.1 About Classification

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model can be used to identify loan applicants as low, medium, or high credit risks. A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk can be developed based on observed data for many loan applicants over a period of time. In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating is the target, the other attributes are the predictors, and the data for each customer constitutes a case. Classifications are discrete and do not imply order. Continuous, floating-point values indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm. The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating. In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown. Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model; the other for testing the model.


Applying a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value also predicts the probability of each classification for each customer. Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

5.2 Testing a Classification Model

A classification model is tested by applying it to test data with known target values and comparing the predicted values with the known values. The test data must be compatible with the data used to build the model and must be prepared in the same way that the build data was prepared. Typically the build data and test data come from the same historical data set. A percentage of the records is used to build the model; the remaining records are used to test the model. Test metrics are used to assess how accurately the model predicts the known values. If the model performs well and meets the business requirements, it can then be applied to new data to predict the future. 5.2.1 Confusion Matrix

A confusion matrix displays the number of correct and incorrect predictions made by the model compared with the actual classifications in the test data. The matrix is n-by-n, where n is the number of classes. The following figure shows a confusion matrix for a binary classification model. The rows present the number of actual classifications in the test data. The columns present the number of predicted classifications made by the model.

Figure 5-1 Confusion Matrix for a Binary Classification Model

                             PREDICTED CLASS
                      affinity_card = 1   affinity_card = 0
ACTUAL  affinity_card = 1      516                25
CLASS   affinity_card = 0       10               725

In this example, the model correctly predicted the positive class for affinity_card 516 times and incorrectly predicted it 25 times. The model correctly predicted the negative class for affinity_card 725 times and incorrectly predicted it 10 times. The following can be computed from this confusion matrix:


• The model made 1241 correct predictions (516 + 725).
• The model made 35 incorrect predictions (25 + 10).
• There are 1276 total scored cases (516 + 25 + 10 + 725).
• The error rate is 35/1276 = 0.0274.
• The overall accuracy rate is 1241/1276 = 0.9725.
A query sketch for tabulating these counts from scored test data follows.
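The following is a minimal sketch, assuming a hypothetical table clas_test_results that holds the known target (affinity_card) and the model's prediction (pred_card) for each test case; each row of the result corresponds to one cell of the confusion matrix.

-- Tabulate actual versus predicted classes
SELECT affinity_card actual_class,
       pred_card     predicted_class,
       COUNT(*)      case_count
  FROM clas_test_results
 GROUP BY affinity_card, pred_card
 ORDER BY actual_class, predicted_class;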

5.2.2 Lift

Lift measures the degree to which the predictions of a classification model are better than randomly-generated predictions. Lift applies to binary classification only, and it requires the designation of a positive class. If the model itself does not have a binary target, you can compute lift by designating one class as positive and combining all the other classes together as one negative class.

Numerous statistics can be calculated to support the notion of lift. Basically, lift can be understood as a ratio of two percentages: the percentage of correct positive classifications made by the model to the percentage of actual positive classifications in the test data. For example, if 40% of the customers in a marketing survey have responded favorably (the positive classification) to a promotional campaign in the past and the model accurately predicts 75% of them, the lift is obtained by dividing .75 by .40. The resulting lift is 1.875.

Lift is computed against quantiles that each contain the same number of cases. The data is divided into quantiles after it is scored. It is ranked by probability of the positive class from highest to lowest, so that the highest concentration of positive predictions is in the top quantiles. A typical number of quantiles is 10.

Lift is commonly used to measure the performance of response models in marketing applications. The purpose of a response model is to identify segments of the population with potentially high concentrations of positive responders to a marketing campaign. Lift reveals how much of the population must be solicited to obtain the highest percentage of potential responders.

Related Topics
• Positive and Negative Classes
Discusses the importance of positive and negative classes in a confusion matrix.

5.2.2.1 Lift Statistics

Learn the different Lift statistics that Oracle Data Mining can compute. Oracle Data Mining computes the following lift statistics:
• Probability threshold for a quantile n is the minimum probability for the positive target to be included in this quantile or any preceding quantiles (quantiles n-1, n-2,..., 1). If a cost matrix is used, a cost threshold is reported instead. The cost threshold is the maximum cost for the positive target to be included in this quantile or any of the preceding quantiles.
• Cumulative gain is the ratio of the cumulative number of positive targets to the total number of positive targets.


• Target density of a quantile is the number of true positive instances in that quantile divided by the total number of instances in the quantile.
• Cumulative target density for quantile n is the target density computed over the first n quantiles.
• Quantile lift is the ratio of the target density for the quantile to the target density over all the test data.
• Cumulative percentage of records for a quantile is the percentage of all cases represented by the first n quantiles, starting at the end that is most confidently positive, up to and including the given quantile.
• Cumulative number of targets for quantile n is the number of true positive instances in the first n quantiles.
• Cumulative number of nontargets is the number of actually negative instances in the first n quantiles.
• Cumulative lift for a quantile is the ratio of the cumulative target density to the target density over all the test data.

Related Topics
• Costs

5.2.3 Receiver Operating Characteristic (ROC)

ROC is a metric for comparing predicted and actual target values in a classification model. ROC, like Lift, applies to Binary Classification and requires the designation of a positive class. You can use ROC to gain insight into the decision-making ability of the model. How likely is the model to accurately predict the negative or the positive class? ROC measures the impact of changes in the probability threshold. The probability threshold is the decision point used by the model for classification. The default probability threshold for binary classification is 0.5. When the probability of a prediction is 50% or more, the model predicts that class. When the probability is less than 50%, the other class is predicted. (In multiclass classification, the predicted class is the one predicted with the highest probability.) Related Topics • Positive and Negative Classes Discusses the importance of positive and negative classes in a confusion matrix. 5.2.3.1 The ROC Curve

ROC can be plotted as a curve on an X-Y axis. The false positive rate is placed on the X axis. The true positive rate is placed on the Y axis. The top left corner is the optimal location on an ROC graph, indicating a high true positive rate and a low false positive rate.


5.2.3.2 Area Under the Curve

The area under the ROC curve (AUC) measures the discriminating ability of a binary classification model. The larger the AUC, the higher the likelihood that an actual positive case is assigned a higher probability of being positive than an actual negative case. The AUC measure is especially useful for data sets with an unbalanced target distribution (one target class dominates the other).

5.2.3.3 ROC and Model Bias

The ROC curve for a model represents all the possible combinations of values in its confusion matrix. Changes in the probability threshold affect the predictions made by the model. For instance, if the threshold for predicting the positive class is changed from 0.5 to 0.6, then fewer positive predictions are made. This affects the distribution of values in the confusion matrix: the number of true and false positives and true and false negatives differ. You can use ROC to find the probability thresholds that yield the highest overall accuracy or the highest per-class accuracy. For example, if it is important to you to accurately predict the positive class, but you don't care about prediction errors for the negative class, then you can lower the threshold for the positive class. This can bias the model in favor of the positive class. A cost matrix is a convenient mechanism for changing the probability thresholds for model scoring. Related Topics • Costs 5.2.3.4 ROC Statistics

Oracle Data Mining computes the following ROC statistics:
• Probability threshold: The minimum predicted positive class probability resulting in a positive class prediction. Different threshold values result in different hit rates and different false alarm rates.
• True negatives: Negative cases in the test data with predicted probabilities strictly less than the probability threshold (correctly predicted).
• True positives: Positive cases in the test data with predicted probabilities greater than or equal to the probability threshold (correctly predicted).
• False negatives: Positive cases in the test data with predicted probabilities strictly less than the probability threshold (incorrectly predicted).
• False positives: Negative cases in the test data with predicted probabilities greater than or equal to the probability threshold (incorrectly predicted).
• True positive fraction: Hit rate. (true positives/(true positives + false negatives))
• False positive fraction: False alarm rate. (false positives/(false positives + true negatives))
A query sketch for computing these counts at a given threshold follows.
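As an illustration, the counts behind these statistics could be derived with a query like the following sketch, assuming a hypothetical table clas_test_results with the actual target (affinity_card) and the predicted positive-class probability (prob_1), evaluated at a probability threshold of 0.5:

-- Count true/false positives and negatives at a 0.5 probability threshold
SELECT SUM(CASE WHEN prob_1 >= 0.5 AND affinity_card = 1 THEN 1 ELSE 0 END) true_positives,
       SUM(CASE WHEN prob_1 >= 0.5 AND affinity_card = 0 THEN 1 ELSE 0 END) false_positives,
       SUM(CASE WHEN prob_1 <  0.5 AND affinity_card = 0 THEN 1 ELSE 0 END) true_negatives,
       SUM(CASE WHEN prob_1 <  0.5 AND affinity_card = 1 THEN 1 ELSE 0 END) false_negatives
  FROM clas_test_results;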


5.3 Biasing a Classification Model

Costs, prior probabilities, and class weights are methods for biasing classification models.

5.3.1 Costs

A cost matrix is a mechanism for influencing the decision making of a model. A cost matrix can cause the model to minimize costly misclassifications. It can also cause the model to maximize beneficial accurate classifications. For example, if a model classifies a customer with poor credit as low risk, this error is costly. A cost matrix can bias the model to avoid this type of error. The cost matrix can also be used to bias the model in favor of the correct classification of customers who have the worst credit history. ROC is a useful metric for evaluating how a model behaves with different probability thresholds. You can use ROC to help you find optimal costs for a given classifier given different usage scenarios. You can use this information to create cost matrices to influence the deployment of the model. 5.3.1.1 Costs Versus Accuracy

Compares the cost matrix and the confusion matrix as tools for evaluating model quality in terms of costs and accuracy. Like a confusion matrix, a cost matrix is an n-by-n matrix, where n is the number of classes. Both confusion matrices and cost matrices include each possible combination of actual and predicted results based on a given set of test data. A confusion matrix is used to measure accuracy, the ratio of correct predictions to the total number of predictions. A cost matrix is used to specify the relative importance of accuracy for different predictions. In most business applications, it is important to consider costs in addition to accuracy when evaluating model quality.

Related Topics
• Confusion Matrix

5.3.1.2 Positive and Negative Classes

Discusses the importance of positive and negative classes in a confusion matrix. The positive class is the class that you care the most about. Designation of a positive class is required for computing Lift and ROC. In the confusion matrix, in the following figure, the value 1 is designated as the positive class. This means that the creator of the model has determined that it is more important to accurately predict customers who increase spending with an affinity card (affinity_card=1) than to accurately predict non-responders (affinity_card=0). If you give affinity cards to some customers who are not likely to use them, there is little loss to the company since the cost of the cards is low. However, if you overlook the customers who are likely to respond, you miss the opportunity to increase your revenue.


Figure 5-2 Positive and Negative Predictions

                             PREDICTED CLASS
                      affinity_card = 1       affinity_card = 0
ACTUAL  affinity_card = 1   516 (true positive)    25 (false negative)
CLASS   affinity_card = 0    10 (false positive)  725 (true negative)

The true and false positive rates in this confusion matrix are:
• False positive rate — 10/(10 + 725) = .01
• True positive rate — 516/(516 + 25) = .95

Related Topics
• Lift
Lift measures the degree to which the predictions of a classification model are better than randomly-generated predictions.
• Receiver Operating Characteristic (ROC)
ROC is a metric for comparing predicted and actual target values in a classification model.

5.3.1.3 Assigning Costs and Benefits

In a cost matrix, positive numbers (costs) can be used to influence negative outcomes. Since negative costs are interpreted as benefits, negative numbers (benefits) can be used to influence positive outcomes. Suppose you have calculated that it costs your business $1500 when you do not give an affinity card to a customer who can increase spending. Using the model with the confusion matrix shown in Figure 5-2, each false negative (misclassification of a responder) costs $1500. Misclassifying a non-responder is less expensive to your business. You estimate that each false positive (misclassification of a non-responder) only costs $300. You want to keep these costs in mind when you design a promotion campaign. You estimate that it costs $10 to include a customer in the promotion. For this reason, you associate a benefit of $10 with each true negative prediction, because you can simply eliminate those customers from your promotion. Each customer that you eliminate represents a savings of $10. In your cost matrix, you specify this benefit as -10, a negative cost. The following figure shows how you would represent these costs and benefits in a cost matrix:


Figure 5-3 Cost Matrix Representing Costs and Benefits

                             PREDICTED
                      affinity_card = 1   affinity_card = 0
ACTUAL  affinity_card = 1       0               1500
        affinity_card = 0     300                -10

With Oracle Data Mining you can specify costs to influence the scoring of any classification model. Decision Tree models can also use a cost matrix to influence the model build. A sketch of attaching a cost matrix to a model and scoring with it follows.
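The following is a minimal sketch that represents the cost matrix of Figure 5-3 in a table and attaches it to a model. The affinity_costs table name is hypothetical; svmc_sh_clas_sample and mining_data_apply_v are reused from the examples in this guide; the ADD_COST_MATRIX procedure and the COST MODEL clause of the PREDICTION function are the documented mechanisms, but treat this as an illustration rather than a complete recipe.

-- Hypothetical cost matrix table matching Figure 5-3
CREATE TABLE affinity_costs (
  actual_target_value     NUMBER,
  predicted_target_value  NUMBER,
  cost                    NUMBER);

INSERT INTO affinity_costs VALUES (1, 1,    0);
INSERT INTO affinity_costs VALUES (1, 0, 1500);
INSERT INTO affinity_costs VALUES (0, 1,  300);
INSERT INTO affinity_costs VALUES (0, 0,  -10);
COMMIT;

-- Attach the cost matrix to the model so that scoring can take costs into account
BEGIN
  DBMS_DATA_MINING.ADD_COST_MATRIX('svmc_sh_clas_sample', 'affinity_costs');
END;
/

-- Predict the class with the lowest expected cost rather than the highest probability
SELECT cust_id
  FROM mining_data_apply_v
 WHERE PREDICTION(svmc_sh_clas_sample COST MODEL USING *) = 1;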

5.3.2 Priors and Class Weights

Learn about Priors and Class Weights in a Classification model to produce a useful result. With Bayesian models, you can specify Prior probabilities to offset differences in distribution between the build data and the real population (scoring data). With other forms of Classification, you are able to specify Class Weights, which have the same biasing effect as priors.

In many problems, one target value dominates in frequency. For example, the positive responses for a telephone marketing campaign may be 2% or less, and the occurrence of fraud in credit card transactions may be less than 1%. A classification model built on historic data of this type cannot observe enough of the rare class to be able to distinguish the characteristics of the two classes; the result can be a model that, when applied to new data, predicts the frequent class for every case. While such a model can be highly accurate, it is not very useful. This illustrates that it is not a good idea to rely solely on accuracy when judging the quality of a Classification model.

To correct for unrealistic distributions in the training data, you can specify priors for the model build process. Other approaches to compensating for data distribution issues include stratified sampling and anomaly detection.

Related Topics
• Anomaly Detection
Learn how to detect rare cases in the data through Anomaly Detection - an unsupervised function.

5.4 Classification Algorithms

Learn different Classification algorithms used in Oracle Data Mining.


Oracle Data Mining provides the following algorithms for classification:
• Decision Tree
Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.
• Explicit Semantic Analysis
Explicit Semantic Analysis (ESA) is designed to make predictions for text data. This algorithm can address use cases with hundreds of thousands of classes.
• Naive Bayes
Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.
• Generalized Linear Models (GLM)
GLM is a popular statistical technique for linear modeling. Oracle Data Mining implements GLM for binary classification and for regression. GLM provides extensive coefficient statistics and model statistics, as well as row diagnostics. GLM also supports confidence bounds.
• Random Forest
Random Forest is a powerful and popular machine learning algorithm that brings significant performance and scalability benefits.
• Support Vector Machines (SVM)
SVM is a powerful, state-of-the-art algorithm based on linear and nonlinear regression. Oracle Data Mining implements SVM for binary and multiclass classification.

Note:

Oracle Data Mining uses Naive Bayes as the default classification algorithm.

Related Topics
• Decision Tree
Learn how to use the Decision Tree algorithm. Decision Tree is one of the Classification algorithms that Oracle Data Mining supports.
• Explicit Semantic Analysis
Learn how to use Explicit Semantic Analysis (ESA) as an unsupervised algorithm for the Feature Extraction function and as a supervised algorithm for Classification.
• Naive Bayes
Learn how to use the Naive Bayes Classification algorithm that Oracle Data Mining supports.
• Generalized Linear Models
Learn how to use Generalized Linear Models (GLM) statistical technique for Linear modeling.
• Random Forest
Learn how to use Random Forest as a classification algorithm.


• Support Vector Machines Learn how to use Support Vector Machines, a powerful algorithm based on statistical learning theory.

6 Anomaly Detection

Learn how to detect rare cases in the data through Anomaly Detection - an unsupervised function.

• About Anomaly Detection • Anomaly Detection Algorithm

Related Topics • Oracle Data Mining Basics Understand the basic concepts of Oracle Data Mining.

See Also:

• Campos, M.M., Milenova, B.L., Yarmus, J.S., "Creation and Deployment of Data Mining-Based Intrusion Detection Systems in Oracle Database 10g" Oracle Data Mining

6.1 About Anomaly Detection

The goal of anomaly detection is to identify cases that are unusual within data that is seemingly homogeneous. Anomaly detection is an important tool for detecting fraud, network intrusion, and other rare events that can have great significance but are hard to find. Anomaly detection can be used to solve problems like the following:
• A law enforcement agency compiles data about illegal activities, but nothing about legitimate activities. How can a suspicious activity be flagged? The law enforcement data is all of one class. There are no counter-examples.
• An insurance agency processes millions of insurance claims, knowing that a very small number are fraudulent. How can the fraudulent claims be identified? The claims data contains very few counter-examples. They are outliers.

6.1.1 One-Class Classification

Learn about Anomaly Detection as one-class Classification in training data. Anomaly detection is a form of Classification. Anomaly detection is implemented as one-class Classification, because only one class is represented in the training data. An anomaly detection model predicts whether a data point is typical for a given


distribution or not. An atypical data point can be either an outlier or an example of a previously unseen class. Normally, a Classification model must be trained on data that includes both examples and counter-examples for each class so that the model can learn to distinguish between them. For example, a model that predicts the side effects of a medication must be trained on data that includes a wide range of responses to the medication. A one-class classifier develops a profile that generally describes a typical case in the training data. Deviation from the profile is identified as an anomaly. One-class classifiers are sometimes referred to as positive security models, because they seek to identify "good" behaviors and assume that all other behaviors are bad.

Note:

Solving a one-class classification problem can be difficult. The accuracy of one-class classifiers cannot usually match the accuracy of standard classifiers built with meaningful counter-examples. The goal of anomaly detection is to provide some useful information where no information was previously attainable. However, if there are enough of the "rare" cases that stratified sampling can produce a training set with enough counter-examples for a standard classification model, then that is generally a better solution.

Related Topics
• About Classification

6.1.2 Anomaly Detection for Single-Class Data

In single-class data, all the cases have the same classification. Counter-examples, instances of another class, are hard to specify or expensive to collect. For instance, in text document classification, it is easy to classify a document under a given topic. However, the universe of documents outside of this topic can be very large and diverse. Thus, it is not feasible to specify other types of documents as counter-examples. Anomaly detection can be used to find unusual instances of a particular type of document.

6.1.3 Anomaly Detection for Finding Outliers

Outliers are cases that are unusual because they fall outside the distribution that is considered normal for the data. For example, census data shows a median household income of $70,000 and a mean household income of $80,000, but one or two households have an income of $200,000. These cases can probably be identified as outliers. The distance from the center of a normal distribution indicates how typical a given point is with respect to the distribution of the data. Each case can be ranked according to the probability that it is either typical or atypical.


The presence of outliers can have a deleterious effect on many forms of data mining. You can use Anomaly Detection to identify outliers before mining the data.

6.2 Anomaly Detection Algorithm

Learn about One-Class Support Vector Machines (SVM) for Anomaly Detection. Oracle Data Mining supports One-Class SVM for Anomaly Detection. When used for Anomaly Detection, SVM classification does not use a target. A build-and-apply sketch follows the related topics.

Related Topics
• One-Class SVM
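The following is a rough sketch of building and applying a One-Class SVM anomaly detection model. The model, settings table, build table, and column names are hypothetical; in the DBMS_DATA_MINING API, an anomaly detection model is created as a classification model with a NULL target, and cases predicted as 0 are the anomalies.

-- Hypothetical settings table selecting One-Class SVM
CREATE TABLE ocsvm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

INSERT INTO ocsvm_settings VALUES ('ALGO_NAME', 'ALGO_SUPPORT_VECTOR_MACHINES');
INSERT INTO ocsvm_settings VALUES ('PREP_AUTO', 'ON');
COMMIT;

-- A NULL target turns the classification build into anomaly detection
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'ocsvm_claims',    -- hypothetical model name
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'claims_data',     -- hypothetical build table
    case_id_column_name => 'claim_id',
    target_column_name  => NULL,
    settings_table_name => 'ocsvm_settings');
END;
/

-- Rank the most anomalous claims
SELECT claim_id,
       PREDICTION_PROBABILITY(ocsvm_claims, 0 USING *) prob_anomaly
  FROM claims_data
 WHERE PREDICTION(ocsvm_claims USING *) = 0
 ORDER BY prob_anomaly DESC;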

7 Clustering

Learn how to discover natural groupings in the data through Clustering - the unsupervised mining function.

• About Clustering
• Evaluating a Clustering Model
• Clustering Algorithms

Related Topics
• Oracle Data Mining Basics
Understand the basic concepts of Oracle Data Mining.

7.1 About Clustering

Clustering analysis finds clusters of data objects that are similar to one another. The members of a cluster are more like each other than they are like members of other clusters. Different clusters can have members in common. The goal of clustering analysis is to find high-quality clusters such that the inter-cluster similarity is low and the intra-cluster similarity is high. Clustering, like classification, is used to segment the data. Unlike classification, clustering models segment data into groups that were not previously defined. Classification models segment data by assigning it to previously-defined classes, which are specified in a target. Clustering models do not use a target. Clustering is useful for exploring data. You can use Clustering algorithms to find natural groupings when there are many cases and no obvious groupings. Clustering can serve as a useful data-preprocessing step to identify homogeneous groups on which you can build supervised models. You can also use Clustering for Anomaly Detection. Once you segment the data into clusters, you find that some cases do not fit well into any clusters. These cases are anomalies or outliers. 7.1.1 How are Clusters Computed?

There are several different approaches to the computation of clusters. Oracle Data Mining supports the following methods:
• Density-based: This type of clustering finds the underlying distribution of the data and estimates how areas of high density in the data correspond to peaks in the distribution. High-density areas are interpreted as clusters. Density-based cluster estimation is probabilistic.
• Distance-based: This type of clustering uses a distance metric to determine similarity between data objects. The distance metric measures the distance


between actual cases in the cluster and the prototypical case for the cluster. The prototypical case is known as the centroid.
• Grid-based: This type of clustering divides the input space into hyper-rectangular cells and identifies adjacent high-density cells to form clusters.

7.1.2 Scoring New Data

Although clustering is an unsupervised mining function, Oracle Data Mining supports the scoring operation for clustering. New data is scored probabilistically, as in the following sketch.
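For example, assuming a clustering model named km_sh_clus_sample (the name is illustrative) built on the customer data, the CLUSTER_ID and CLUSTER_PROBABILITY SQL functions return the most probable cluster for each new case along with the associated probability:

-- Assign each customer to its most probable cluster
SELECT cust_id,
       CLUSTER_ID(km_sh_clus_sample USING *)          clus,
       CLUSTER_PROBABILITY(km_sh_clus_sample USING *) clus_prob
  FROM mining_data_apply_v
 ORDER BY clus, clus_prob DESC;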

7.1.3 Hierarchical Clustering

The clustering algorithms supported by Oracle Data Mining perform hierarchical clustering. The leaf clusters are the final clusters generated by the algorithm. Clusters higher up in the hierarchy are intermediate clusters.

7.1.3.1 Rules

Rules describe the data in each cluster. A rule is a conditional statement that captures the logic used to split a parent cluster into child clusters. A rule describes the conditions for a case to be assigned with some probability to a cluster.

7.1.3.2 Support and Confidence

Support and confidence are metrics that describe the relationships between clustering rules and cases. Support is the percentage of cases for which the rule holds. Confidence is the probability that a case described by this rule is actually assigned to the cluster.

7.2 Evaluating a Clustering Model

Since known classes are not used in clustering, the interpretation of clusters can present difficulties. How do you know if the clusters can reliably be used for business decision making? Oracle Data Mining clustering models support a high degree of model transparency. You can evaluate the model by examining information generated by the clustering algorithm: for example, the centroid of a distance-based cluster. Moreover, because the clustering process is hierarchical, you can evaluate the rules and other information related to each cluster's position in the hierarchy. 7.3 Clustering Algorithms

Learn about the different Clustering algorithms used in Oracle Data Mining. Oracle Data Mining supports these Clustering algorithms:
• Expectation Maximization
Expectation Maximization is a probabilistic, density-estimation Clustering algorithm.
• k-Means


k-Means is a distance-based Clustering algorithm. Oracle Data Mining supports an enhanced version of k-Means.
• Orthogonal Partitioning Clustering (O-Cluster)
O-Cluster is a proprietary, grid-based Clustering algorithm.

See Also:

Campos, M.M., Milenova, B.L., "O-Cluster: Scalable Clustering of Large High Dimensional Data Sets", Oracle Data Mining Technologies, 10 Van De Graaff Drive, Burlington, MA 01803.

The main characteristics of the three algorithms are compared in the following table.

Table 7-1 Clustering Algorithms Compared

Feature | k-Means | O-Cluster | Expectation Maximization
Clustering methodology | Distance-based | Grid-based | Distribution-based
Number of cases | Handles data sets of any size | More appropriate for data sets that have more than 500 cases; handles large tables through active sampling | Handles data sets of any size
Number of attributes | More appropriate for data sets with a low number of attributes | More appropriate for data sets with a high number of attributes | Appropriate for data sets with many or few attributes
Number of clusters | User-specified | Automatically determined | Automatically determined
Hierarchical clustering | Yes | Yes | Yes
Probabilistic cluster assignment | Yes | Yes | Yes

Note:

Oracle Data Mining uses k-Means as the default Clustering algorithm.

Related Topics
• Oracle Data Mining
• Expectation Maximization: Learn how to use the Expectation Maximization Clustering algorithm.
• k-Means: Learn how to use the enhanced k-Means Clustering algorithm that Oracle Data Mining supports.
• O-Cluster: Learn how to use Orthogonal Partitioning Clustering (O-Cluster), an Oracle-proprietary Clustering algorithm.

8 Association

Learn how to discover Association Rules through Association - an unsupervised mining function.

• About Association
• Transactional Data
• Association Algorithm

Related Topics
• Oracle Data Mining Basics: Understand the basic concepts of Oracle Data Mining.

8.1 About Association

Association is a data mining function that discovers the probability of the co-occurrence of items in a collection. The relationships between co-occurring items are expressed as Association Rules.

8.1.1 Association Rules

The results of an Association model are the rules that identify patterns of association within the data. Oracle Data Mining does not support the scoring operation for association modeling. Association Rules are characterized by the following measures:

• Support: How often do these items occur together in the data?
• Confidence: How frequently does the consequent occur in transactions that contain the antecedent?
• Value: How much business value is connected to item associations?

8.1.2 Market-Basket Analysis

Association rules are often used to analyze sales transactions. For example, it might be noted that customers who buy cereal at the grocery store often buy milk at the same time. In fact, association analysis finds that 85% of the checkout sessions that include cereal also include milk. This relationship can be formulated as the following rule:

Cereal implies milk with 85% confidence

This application of association modeling is called market-basket analysis. It is valuable for direct marketing, sales promotions, and for discovering business trends. Market-basket analysis can also be used effectively for store layout, catalog design, and cross-sell.


8.1.3 Association Rules and eCommerce

Learn about application of Association Rules in other domains. Association modeling has important applications in other domains as well. For example, in e-commerce applications, Association Rules may be used for Web page personalization. An association model might find that a user who visits pages A and B is 70% likely to also visit page C in the same session. Based on this rule, a dynamic link can be created for users who are likely to be interested in page C. The association rule is expressed as follows:

A and B imply C with 70% confidence

Related Topics
• Confidence: The confidence of a rule indicates the probability of both the antecedent and the consequent appearing in the same transaction.

8.2 Transactional Data

Learn about transactional data, also known as market-basket data. Unlike other data mining functions, Association is transaction-based. In transaction processing, a case includes a collection of items such as the contents of a market basket at the checkout counter. The collection of items in the transaction is an attribute of the transaction. Other attributes might be a timestamp or user ID associated with the transaction. Transactional data, also known as market-basket data, is said to be in multi-record case format because a set of records (rows) constitute a case. For example, in the following figure, case 11 is made up of three rows while cases 12 and 13 are each made up of four rows.

Figure 8-1 Transactional Data

TRANS_ID (case ID)  ITEM_ID (attribute 1)  OPER_ID (attribute 2)
------------------  ---------------------  ---------------------
11                  B                      m5203
11                  D                      m5203
11                  E                      m5203
12                  A                      m5203
12                  B                      m5203
12                  C                      m5203
12                  E                      m5203
13                  B                      q5597
13                  C                      q5597
13                  D                      q5597
13                  E                      q5597

Non-transactional data is said to be in single-record case format because a single record (row) constitutes a case. In Oracle Data Mining, association models can be built using either transactional or non-transactional (two-dimensional) data formats. If the data is non-transactional, it is possible to transform it to a nested column to make it transactional before performing association mining activities. Transactional format is the usual format, but the Association Rules model also accepts two-dimensional input format. For non-transactional input format, each distinct combination of the content in all columns other than the case ID column is treated as a unique item.

Related Topics
• Oracle Data Mining User’s Guide
• Data Preparation for Apriori

8.3 Association Algorithm

Oracle Data Mining uses the Apriori algorithm to calculate association rules for items in frequent itemsets.
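As an illustration only, the following sketch builds an Apriori association model on transactional data through DBMS_DATA_MINING. The table, column, and model names are hypothetical, and the setting names shown (ALGO_APRIORI_ASSOCIATION_RULES, ASSO_MIN_SUPPORT, ASSO_MIN_CONFIDENCE, ODMS_ITEM_ID_COLUMN_NAME) should be verified against Oracle Database PL/SQL Packages and Types Reference for your release.

-- Hypothetical settings table; thresholds chosen only for illustration.
INSERT INTO assoc_settings (setting_name, setting_value)
  VALUES ('ALGO_NAME', 'ALGO_APRIORI_ASSOCIATION_RULES');
INSERT INTO assoc_settings (setting_name, setting_value)
  VALUES ('ASSO_MIN_SUPPORT', '0.1');
INSERT INTO assoc_settings (setting_name, setting_value)
  VALUES ('ASSO_MIN_CONFIDENCE', '0.6');
INSERT INTO assoc_settings (setting_name, setting_value)
  VALUES ('ODMS_ITEM_ID_COLUMN_NAME', 'ITEM_ID');

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'market_basket_rules',
    mining_function     => dbms_data_mining.association,
    data_table_name     => 'sales_trans',
    case_id_column_name => 'trans_id',
    settings_table_name => 'assoc_settings');
END;
/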

9 Feature Selection and Extraction

Learn how to perform Feature Selection, Feature Extraction, and Attribute Importance. Oracle Data Mining supports attribute importance as a supervised mining function and feature extraction as an unsupervised mining function.

• Finding the Best Attributes
• About Feature Selection and Attribute Importance
• About Feature Extraction

Related Topics
• Oracle Data Mining Basics: Understand the basic concepts of Oracle Data Mining.

9.1 Finding the Best Attributes

Sometimes too much information can reduce the effectiveness of data mining. Some of the columns of data attributes assembled for building and testing a model do not contribute meaningful information to the model. Some actually detract from the quality and accuracy of the model. For example, suppose you want to collect a great deal of data about a given population because you want to predict the likelihood of a certain illness within this group. Some of this information, perhaps much of it, has little or no effect on susceptibility to the illness. It is possible that attributes such as the number of cars per household have no effect whatsoever. Irrelevant attributes add noise to the data and affect model accuracy. Noise increases the size of the model and the time and system resources needed for model building and scoring. Data sets with many attributes can contain groups of attributes that are correlated. These attributes actually measure the same underlying feature. Their presence together in the build data can skew the logic of the algorithm and affect the accuracy of the model. Wide data (many attributes) generally presents processing challenges for data mining algorithms. Model attributes are the dimensions of the processing space used by the algorithm. The higher the dimensionality of the processing space, the higher the computation cost involved in algorithmic processing. To minimize the effects of noise, correlation, and high dimensionality, some form of dimension reduction is sometimes a desirable preprocessing step for data mining. Feature selection and extraction are two approaches to dimension reduction.

• Feature selection: Selecting the most relevant attributes
• Feature extraction: Combining attributes into a new reduced set of features


9.2 About Feature Selection and Attribute Importance

Finding the most significant predictors is the goal of some data mining projects. For example, a model might seek to find the principal characteristics of clients who pose a high credit risk. Oracle Data Mining supports the Attribute Importance mining function, which ranks attributes according to their importance in predicting a target. Attribute importance does not actually perform feature selection since all the predictors are retained in the model. In true feature selection, the attributes that are ranked below a given threshold of importance are removed from the model. Feature selection is useful as a preprocessing step to improve computational efficiency in predictive modeling. Oracle Data Mining implements feature selection for optimization within the Decision Tree algorithm and within Naive Bayes when Automatic Data Preparation (ADP) is enabled. Generalized Linear Model (GLM) can be configured to perform feature selection as a preprocessing step. 9.2.1 Attribute Importance and Scoring

Oracle Data Mining does not support the scoring operation for attribute importance. The results of attribute importance are the attributes of the build data ranked according to their predictive influence. The ranking and the measure of importance can be used in selecting training data for classification models. 9.3 About Feature Extraction

Feature Extraction is an attribute reduction process. Unlike feature selection, which selects and retains the most significant attributes, Feature Extraction actually transforms the attributes. The transformed attributes, or features, are linear combinations of the original attributes. The Feature Extraction process results in a much smaller and richer set of attributes. The maximum number of features can be user-specified or determined by the algorithm. By default, the algorithm determines it. Models built on extracted features can be of higher quality, because fewer and more meaningful attributes describe the data. Feature Extraction projects a data set with higher dimensionality onto a smaller number of dimensions. As such it is useful for data visualization, since a complex data set can be effectively visualized when it is reduced to two or three dimensions. Some applications of Feature Extraction are latent semantic analysis, data compression, data decomposition and projection, and pattern recognition. Feature Extraction can also be used to enhance the speed and effectiveness of supervised learning. Feature Extraction can be used to extract the themes of a document collection, where documents are represented by a set of key words and their frequencies. Each theme (feature) is represented by a combination of keywords. The documents in the collection can then be expressed in terms of the discovered themes.


9.3.1 Feature Extraction and Scoring

Oracle Data Mining supports the scoring operation for feature extraction. As an unsupervised mining function, feature extraction does not involve a target. When applied, a feature extraction model transforms the input into a set of features.
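For example, the SQL scoring functions FEATURE_ID and FEATURE_VALUE can apply a feature extraction model. The following is a minimal sketch only; the model name nmf_features, the table docs_apply, and the feature number 3 are hypothetical.

-- Return the dominant feature for each case and the value of feature 3
-- (model, table, and column names are hypothetical).
SELECT doc_id,
       FEATURE_ID(nmf_features USING *)       AS top_feature,
       FEATURE_VALUE(nmf_features, 3 USING *) AS feature_3_value
FROM   docs_apply;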

9.4 Algorithms for Attribute Importance and Feature Extraction

Understand the algorithms used for Attribute Importance and Feature Extraction.

Oracle Data Mining supports the following algorithms for Attribute Importance:
• Minimum Description Length
• CUR matrix decomposition

Oracle Data Mining supports these feature extraction algorithms:
• Explicit Semantic Analysis (ESA)
• Non-Negative Matrix Factorization (NMF)
• Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)

Note:

Oracle Data Mining uses NMF as the default feature extraction algorithm.

Related Topics
• CUR Matrix Decomposition: Learn how to use the CUR decomposition based algorithm for attribute importance.
• Explicit Semantic Analysis: Learn how to use Explicit Semantic Analysis (ESA) as an unsupervised algorithm for the Feature Extraction function and as a supervised algorithm for Classification.
• Minimum Description Length: Learn how to use Minimum Description Length, the supervised technique for calculating Attribute Importance.
• Non-Negative Matrix Factorization: Learn how to use Non-Negative Matrix Factorization (NMF), the unsupervised algorithm that Oracle Data Mining uses for Feature Extraction.
• Singular Value Decomposition: Learn how to use Singular Value Decomposition, an unsupervised algorithm for Feature Extraction.

10 Time Series

Learn about Time Series as an Oracle Data Mining Regression function.

• About Time Series
• Choosing a Time Series Model
• Time Series Statistics
• Time Series Algorithm

10.1 About Time Series

Time Series is a new data mining function that forecasts target values based solely on a known history of target values. It is a specialized form of Regression, known in the literature as auto-regressive modeling. The input to time series analysis is a sequence of target values. A case id column specifies the order of the sequence. The case id can be of type NUMBER or a date type (date, datetime, timestamp with timezone, or timestamp with local timezone). Regardless of case id type, the user can request that the model include trend, seasonal effects, or both in its forecast computation. When the case id is a date type, the user must specify a time interval (for example, month) over which the target values are to be aggregated, along with an aggregation procedure (for example, sum). Aggregation is performed by the algorithm prior to constructing the model. The time series model provides estimates of the target value for each step of a time window that can include up to 30 steps beyond the historical data. Like other Regression models, Time Series models compute various statistics that measure the goodness of fit to historical data. Forecasting is a critical component of business and governmental decision making. It has applications at the strategic, tactical, and operational levels. Applications of forecasting include the following:

• Projecting return on investment, including growth and the strategic effect of innovations
• Addressing tactical issues such as projecting costs, inventory requirements, and customer satisfaction
• Setting operational targets and predicting quality and conformance with standards

Related Topics
• Regression: Learn how to predict a continuous numerical target through Regression - the supervised mining function.

10.2 Choosing a Time Series Model

Learn how to select a Time Series model.


Time Series data may contain patterns that can affect predictive accuracy. For example, during a period of economic growth, there may be an upward trend in sales. Sales may increase in specific seasons (bathing suits in summer). To accommodate such series, it can be useful to choose a model that incorporates trend, seasonal effects, or both. Trend can be difficult to estimate when you must represent trend by a single constant. For example, if there is a growth rate of 10%, then after 7 steps, the value doubles. Local growth rates, appropriate to a few time steps, can easily approach such levels, but thereafter drop. Damped trend models can more accurately represent such data by reducing cumulative trend effects. Damped trend models can better represent variability in trend effects over the historical data. Damped trend models are a good choice when the data have significant, but variable, trend. Since modeling attempts to reduce error, how error is measured can affect model predictions. For example, data that exhibit a wide range of values may be better represented by error as a fraction of level. An error of a few hundred feet in the measurement of the height of a mountain may be equivalent to an error of an inch or two in the measurement of the height of a child. Errors that are measured relative to value are called multiplicative errors. Errors that are the same across values are called additive errors. If there are multiplicative effects in the model, then the error type is multiplicative. If there are no explicit multiplicative effects, the error type is left to user specification. The type need not be the same across individual effects. For example, trend can be additive while seasonality is multiplicative. This particular mixed type effect combination defines the popular Holt-Winters model.

Note:

Multiplicative error is not an appropriate choice for data that contain zeros or negative values. Thus, when the data contains such values, it is best not to choose a model with multiplicative effects or to set error type to be multiplicative.

10.3 Time Series Statistics

Learn to evaluate model quality by applying commonly used statistics. As with other Regression functions, there are commonly used statistics for evaluating the overall model quality. An expert user can also specify one of these figures of merit as the criterion that the model build process optimizes. Choosing an optimization criterion is not required because model-specific defaults are available.

10.3.1 Conditional Log-Likelihood

Log-likelihood is a figure of merit often used as an optimization criterion for models that provide probability estimates for predictions. The probability estimates depend on the values of the model's parameters. The model's probability estimates for the actual values in the training data then yield an estimate of the likelihood of the parameter values. Parameter values that yield high probabilities for the observed target values have high likelihood, and therefore indicate a good model. The calculation of log-likelihood depends on the form of the model.


Conditional log-likelihood breaks the parameters into two groups. One group is assumed to be correct and the other is assumed the source of any errors. Conditional log-likelihood is the log-likelihood of the latter group conditioned on the former group. For example, Exponential Smoothing models (ESMs) make an estimate of the initial model state. The conditional log-likelihood of an ESM is conditional on that initial model state (assumed to be correct). The ESM conditional log-likelihood is as follows:
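(The original equation is an image that is not reproduced here. As a hedged sketch, following the general innovations state space formulation used in the exponential smoothing literature rather than the exact Oracle expression, the conditional log-likelihood has the form:)

$$\log L \;=\; -\frac{n}{2}\,\log\!\left(\frac{1}{n}\sum_{t=1}^{n} e_t^{2}\right)\;-\;\sum_{t=1}^{n}\log\bigl|k(x_{t-1})\bigr| \;+\; \text{constant}$$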

where e_t is the error at time t and k(x_{t-1}) is 1 for ESM models with additive errors and is the estimated level at the previous time step in models with multiplicative error.

10.3.2 Mean Square Error (MSE) and Other Error Measures

Another time series figure of merit, which can also be used as an optimization criterion, is Mean Square Error (MSE). The mean square error is computed as:
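(The original equation is an image that is not reproduced here; the standard definition, with n the number of observations and e_t the error defined below, is:)

$$\mathrm{MSE} \;=\; \frac{1}{n}\sum_{t=1}^{n} e_t^{2}$$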

where the error at time t, e_t, is the difference between the actual value and the model's one-step-ahead forecast at time t for models with additive error; for models with multiplicative error, it is that difference divided by the one-step-ahead forecast.

Note:

These "forecasts" are for over periods already observed and part of the input time series.

Since time series models can forecast for each of multiple steps ahead, the error associated with such multi-step forecasts can also be measured. Average Mean Square Error (AMSE), another figure of merit, does exactly that. For each period in the input time series, it computes a multi-step forecast, computes the error of those forecasts, and averages the errors. AMSE computes the individual errors exactly as MSE does, taking into account the error type (additive or multiplicative). The number of steps, k, is determined by the user (the default is 3). The formula is as follows:
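(The original equation is an image that is not reproduced here. As a hedged sketch of the general form, averaging the k squared multi-step errors made from each period t; the exact indexing in the Oracle implementation may differ:)

$$\mathrm{AMSE} \;=\; \frac{1}{n}\sum_{t=1}^{n}\left(\frac{1}{k}\sum_{j=1}^{k} e_{t+j-1}^{2}\right)$$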


Other figures of merit related to MSE include the Root Mean Square Error (RMSE), which is the square root of MSE, and the Mean Absolute Error (MAE), which is the average of the absolute values of the errors.

10.3.3 Irregular Time Series

Irregular time series are time series data where the time intervals between observed values are not equally spaced. One common practice is for the time intervals between adjacent steps to be equally spaced. However, it is not always convenient or realistic to force such spacing on time series. Irregular time series do not make the assumption that time series are equally spaced, but instead use the case id’s date and time values to compute the intervals between observed values. Models are constructed directly on the observed values with their observed spacing. Oracle time series analysis handles irregular time series. 10.3.4 Build Apply

Learn about the build and apply operations of the Time Series function. Many of the Oracle Data Mining functions have separate build and apply operations, because you can construct and potentially apply a model to many different sets of input data. However, time series input consists of the target value history only. Thus, there is only one set of appropriate input data. When new data arrive, good practice dictates that a new model be built. Since the model is only intended to be used once, the model statistics and forecasts are produced during model build and are available through the model views.
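As a sketch of such a single build that also produces the forecast, the table sales_history, its columns sale_date and amount, the model name sales_forecast, and the settings table esm_settings are hypothetical; the setting and constant names should be verified against Oracle Database PL/SQL Packages and Types Reference for your release.

-- Select Exponential Smoothing and a monthly aggregation interval
-- (setting names are assumptions; verify them in the reference).
INSERT INTO esm_settings (setting_name, setting_value)
  VALUES ('ALGO_NAME', 'ALGO_EXPONENTIAL_SMOOTHING');
INSERT INTO esm_settings (setting_name, setting_value)
  VALUES ('EXSM_INTERVAL', 'EXSM_INTERVAL_MONTH');

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'sales_forecast',
    mining_function     => dbms_data_mining.time_series,
    data_table_name     => 'sales_history',
    case_id_column_name => 'sale_date',
    target_column_name  => 'amount',
    settings_table_name => 'esm_settings');
END;
/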

10.4 Time Series Algorithm

Oracle Data Mining uses the Exponential Smoothing algorithm to forecast from time series data.

Related Topics
• Exponential Smoothing: Learn about Exponential Smoothing.

Part III Algorithms

Part III provides basic conceptual information about the algorithms supported by Oracle Data Mining. There is at least one algorithm for each of the mining functions. Part III contains these chapters:

• Apriori
• CUR Matrix Decomposition
• Decision Tree
• Expectation Maximization
• Explicit Semantic Analysis
• Exponential Smoothing
• Generalized Linear Models
• k-Means
• Minimum Description Length
• Naive Bayes
• Neural Network
• Non-Negative Matrix Factorization
• O-Cluster
• R Extensibility
• Random Forest
• Singular Value Decomposition
• Support Vector Machines

Related Topics
• Mining Functions: Part II provides basic conceptual information about the mining functions that Oracle Data Mining supports.

11 Apriori

Learn how to calculate Association Rules using Apriori algorithm.

• About Apriori
• Association Rules and Frequent Itemsets
• Data Preparation for Apriori
• Calculating Association Rules
• Evaluating Association Rules

Related Topics
• Association: Learn how to discover Association Rules through Association - an unsupervised mining function.

11.1 About Apriori

Learn about Apriori. An association mining problem can be decomposed into the following subproblems:

• Find all combinations of items in a set of transactions that occur with a specified minimum frequency. These combinations are called frequent itemsets.
• Calculate rules that express the probable co-occurrence of items within frequent itemsets.

Apriori calculates the probability of an item being present in a frequent itemset, given that another item or items is present. Association rule mining is not recommended for finding associations involving rare events in problem domains with a large number of items. Apriori discovers patterns with frequencies above the minimum support threshold. Therefore, to find associations involving rare events, the algorithm must run with very low minimum support values. However, doing so potentially explodes the number of enumerated itemsets, especially in cases with a large number of items. This increases the execution time significantly. Classification or Anomaly Detection is more suitable for discovering rare events when the data has a high number of attributes. The build process for Apriori supports parallel execution.

Related Topics
• Example: Calculating Rules from Frequent Itemsets: An example of calculating rules from frequent itemsets.
• Oracle Database VLDB and Partitioning Guide


11.2 Association Rules and Frequent Itemsets

The Apriori algorithm calculates rules that express probabilistic relationships between items in frequent itemsets. For example, a rule derived from frequent itemsets containing A, B, and C might state that if A and B are included in a transaction, then C is likely to also be included. An association rule states that an item or group of items implies the presence of another item with some probability. Unlike decision tree rules, which predict a target, association rules simply express correlation. 11.2.1 Antecedent and Consequent

The IF component of an association rule is known as the antecedent. The THEN component is known as the consequent. The antecedent and the consequent are disjoint; they have no items in common. Oracle Data Mining supports association rules that have one or more items in the antecedent and a single item in the consequent. 11.2.2 Confidence

Rules have an associated confidence, which is the conditional probability that the consequent occurs given the occurrence of the antecedent. You can specify the minimum confidence for rules. 11.3 Data Preparation for Apriori

Association models are designed to use transactional data. In transactional data, there is a one-to-many relationship between the case identifier and the values for each case. Each case ID/value pair is specified in a separate record (row). 11.3.1 Native Transactional Data and Star Schemas

Learn about storage format of transactional data. Transactional data may be stored in native transactional format, with a non-unique case ID column and a values column, or it may be stored in some other configuration, such as a star schema. If the data is not stored in native transactional format, it must be transformed to a nested column for processing by the Apriori algorithm. Related Topics • Transactional Data Learn about transactional data, also known as market-basket data. • Oracle Data Mining User’s Guide 11.3.2 Items and Collections

In transactional data, a collection of items is associated with each case. The collection theoretically includes all possible members of the collection. For example, all products can theoretically be purchased in a single market-basket transaction. However, in actuality, only a tiny subset of all possible items are present in a given transaction; the items in the market-basket represent only a small fraction of the items available for sale in the store.

11.3.3 Sparse Data

Learn about missing items through sparsity. Missing items in a collection indicate sparsity. Missing items may be present with a null value, or they may simply be missing. Nulls in transactional data are assumed to represent values that are known but not present in the transaction. For example, three items out of hundreds of possible items might be purchased in a single transaction. The items that were not purchased are known but not present in the transaction. Oracle Data Mining assumes sparsity in transactional data. The Apriori algorithm is optimized for processing sparse data.

Note:

Apriori is not affected by Automatic Data Preparation.

Related Topics
• Oracle Data Mining User’s Guide

11.3.4 Improved Sampling

Association Rules (AR) can use a sample size with a performance guarantee, based on the work of Riondato and Upfal. The AR algorithm computes the sample size from the following inputs:

• d-index of the dataset
• Absolute error ε
• Confidence level γ

The d-index is defined as the maximum integer d such that the dataset contains at least d transactions of length at least d. It is an upper bound on the Vapnik-Chervonenkis (VC) dimension. The AR algorithm computes the d-index of the dataset by scanning the length of all transactions in the dataset. Users specify the absolute error ε and confidence level γ parameters. A large d-index, small AR support, small ε, or large γ can cause a large sample size. The sample size theoretically guarantees that the absolute error of both the support and confidence of the approximated AR (from sampling) is less than ε compared to the exact AR, with probability (or confidence level) at least γ. In this document this sample size is called the AR-specific sample size.


11.3.4.1 Sampling Implementation

The sample size is only computed when users turn on the sampling (ODMS_SAMPLING is set as ODMS_SAMPLING_ENABLE) and do not specify the sample size (ODMS_SAMPLE_SIZE is unspecified).

Usage Notes

1. If ODMS_SAMPLING is unspecified or set to ODMS_SAMPLING_DISABLE, sampling is not performed for AR and the exact AR is obtained.
2. If ODMS_SAMPLING is set to ODMS_SAMPLING_ENABLE and ODMS_SAMPLE_SIZE is specified as a positive integer, the user-specified sample size (ODMS_SAMPLE_SIZE) is used. The sampling is performed in the general data preparation stage before the AR algorithm. The AR-specific sample size is not computed. The approximated AR is obtained.
3. If ODMS_SAMPLING is set to ODMS_SAMPLING_ENABLE and ODMS_SAMPLE_SIZE is not specified, the AR-specific sample size is computed and then sampling is performed in the AR algorithm (see the sketch after this list). The approximated AR is obtained.
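A sketch of case 3 above, using a hypothetical settings table ar_settings and the setting names already described in this section:

-- Turn sampling on; because ODMS_SAMPLE_SIZE is not set, the algorithm
-- computes the AR-specific sample size itself.
INSERT INTO ar_settings (setting_name, setting_value)
  VALUES ('ODMS_SAMPLING', 'ODMS_SAMPLING_ENABLE');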

Note:

If the computed AR-specific sample size is larger than or equal to the total transaction size in the dataset, the sampling is not performed and the exact AR is obtained.

If you do not have a good idea of an appropriate sample size for AR, it is suggested to leave ODMS_SAMPLE_SIZE unspecified, specify proper values only for the sampling parameters, and let the AR algorithm compute the suitable AR-specific sample size.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference

11.4 Calculating Association Rules

The first step in association analysis is the enumeration of itemsets. An itemset is any combination of two or more items in a transaction. 11.4.1 Itemsets

Learn about itemsets. The maximum number of items in an itemset is user-specified. If the maximum is two, then all the item pairs are counted. If the maximum is greater than two, then all the item pairs, all the item triples, and all the item combinations up to the specified maximum are counted. The following table shows the itemsets derived from the transactions shown in the following example, assuming that the maximum number of items in an itemset is set to 3.


Table 11-1 Itemsets

Transaction | Itemsets
11 | (B,D) (B,E) (D,E) (B,D,E)
12 | (A,B) (A,C) (A,E) (B,C) (B,E) (C,E) (A,B,C) (A,B,E) (A,C,E) (B,C,E)
13 | (B,C) (B,D) (B,E) (C,D) (C,E) (D,E) (B,C,D) (B,C,E) (B,D,E) (C,D,E)

Example 11-1 Sample Transactional Data

TRANS_ID  ITEM_ID
--------  -------
11        B
11        D
11        E
12        A
12        B
12        C
12        E
13        B
13        C
13        D
13        E

11.4.2 Frequent Itemsets

Learn about Frequent Itemsets and Support. Association rules are calculated from itemsets. If rules are generated from all possible itemsets, there can be a very high number of rules and the rules may not be very meaningful. Also, the model can take a long time to build. Typically it is desirable to only generate rules from itemsets that are well-represented in the data. Frequent itemsets are those that occur with a minimum frequency specified by the user. The minimum frequent itemset Support is a user-specified percentage that limits the number of itemsets used for association rules. An itemset must appear in at least this percentage of all the transactions if it is to be used as a basis for rules. The following table shows the itemsets from Table 11-1 that are frequent itemsets with support > 66%.

Table 11-2 Frequent Itemsets

Frequent Itemset | Transactions | Support
(B,C) | 2 of 3 | 67%
(B,D) | 2 of 3 | 67%
(B,E) | 3 of 3 | 100%
(C,E) | 2 of 3 | 67%
(D,E) | 2 of 3 | 67%
(B,C,E) | 2 of 3 | 67%
(B,D,E) | 2 of 3 | 67%


Related Topics
• Apriori: Learn how to calculate Association Rules using the Apriori algorithm.

11.4.3 Example: Calculating Rules from Frequent Itemsets

An example of calculating rules from frequent itemsets. The following tables show the itemsets and frequent itemsets that were calculated in "Association". The frequent itemsets are the itemsets that occur with a minimum support of 67%; at least 2 of the 3 transactions must include the itemset.

Table 11-3 Itemsets

Transaction | Itemsets
11 | (B,D) (B,E) (D,E) (B,D,E)
12 | (A,B) (A,C) (A,E) (B,C) (B,E) (C,E) (A,B,C) (A,B,E) (A,C,E) (B,C,E)
13 | (B,C) (B,D) (B,E) (C,D) (C,E) (D,E) (B,C,D) (B,C,E) (B,D,E) (C,D,E)

Table 11-4 Frequent Itemsets with Minimum Support 67%

Itemset | Transactions | Support
(B,C) | 12 and 13 | 67%
(B,D) | 11 and 13 | 67%
(B,E) | 11, 12, and 13 | 100%
(C,E) | 12 and 13 | 67%
(D,E) | 11 and 13 | 67%
(B,C,E) | 12 and 13 | 67%
(B,D,E) | 11 and 13 | 67%

A rule expresses a conditional probability. Confidence in a rule is calculated by dividing the probability of the items occurring together by the probability of the occurrence of the antecedent. For example, if B (antecedent) is present, what is the chance that C (consequent) is also present? What is the confidence for the rule "IF B, THEN C"? As shown in Table 11-3:

• All 3 transactions include B (3/3 or 100%)
• Only 2 transactions include both B and C (2/3 or 67%)
• Therefore, the confidence of the rule "IF B, THEN C" is 67/100 or 67%.

The following table shows the rules that can be derived from the frequent itemsets in Table 11-4.


Table 11-5 Frequent Itemsets and Rules

Frequent Itemset | Rule | prob(antecedent and consequent) / prob(antecedent) | Confidence
(B,C) | If B then C | 67/100 | 67%
(B,C) | If C then B | 67/67 | 100%
(B,D) | If B then D | 67/100 | 67%
(B,D) | If D then B | 67/67 | 100%
(B,E) | If B then E | 100/100 | 100%
(B,E) | If E then B | 100/100 | 100%
(C,E) | If C then E | 67/67 | 100%
(C,E) | If E then C | 67/100 | 67%
(D,E) | If D then E | 67/67 | 100%
(D,E) | If E then D | 67/100 | 67%
(B,C,E) | If B and C then E | 67/67 | 100%
(B,C,E) | If B and E then C | 67/100 | 67%
(B,C,E) | If C and E then B | 67/67 | 100%
(B,D,E) | If B and D then E | 67/67 | 100%
(B,D,E) | If B and E then D | 67/100 | 67%
(B,D,E) | If D and E then B | 67/67 | 100%

If the minimum confidence is 70%, ten rules are generated for these frequent itemsets. If the minimum confidence is 60%, sixteen rules are generated.

Tip:

Increase the minimum confidence if you want to decrease the build time for the model and generate fewer rules.

Related Topics
• Association: Learn how to discover Association Rules through Association - an unsupervised mining function.

11.4.4 Aggregates

Aggregates refer to the quantities associated with each item that the user opts for Association Rules Model to aggregate.


There can be more than one aggregate. For example, the user can specify the model to aggregate both profit and quantity.

11.4.5 Example: Calculating Aggregates

The following example shows the concept of Aggregates.

Calculating Aggregates for Grocery Store Data

Assume a grocery store has the following data:

Table 11-6 Grocery Store Data

Customer | Item A | Item B | Item C | Item D
Customer 1 | Buys (Profit $5.00) | Buys (Profit $3.20) | Buys (Profit $12.00) | NA
Customer 2 | Buys (Profit $4.00) | NA | Buys (Profit $4.20) | NA
Customer 3 | Buys (Profit $3.00) | Buys (Profit $10.00) | Buys (Profit $14.00) | Buys (Profit $8.00)
Customer 4 | Buys (Profit $2.00) | NA | NA | Buys (Profit $1.00)

The basket of each customer can be viewed as a transaction. The manager of the store is interested not only in the existence of certain association rules, but also in the aggregated profit if such rules exist. In this example, one of the association rules can be (A, B)=>C for customer 1 and customer 3. Together with this rule, the store manager may want to know the following:

• The total profit of item A appearing in this rule
• The total profit of item B appearing in this rule
• The total profit for consequent C appearing in this rule
• The total profit of all items appearing in the rule

For this rule, the profit for item A is $5.00 + $3.00 = $8.00, for item B the profit is $3.20 + $10.00 = $13.20, for consequent C the profit is $12.00 + $14.00 = $26.00, and for the antecedent itemset (A, B) the profit is $8.00 + $13.20 = $21.20. For the whole rule, the profit is $21.20 + $26.00 = $47.20.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference

11.4.6 Including and Excluding Rules

Explains including rules and excluding rules used in Association. Including rules enables a user to provide a list of items such that at least one item from the list must appear in the rules that are returned. Excluding rules enables a user to provide a list of items such that no item from the list can appear in the rules that are returned.


Note:

Since each association rule includes both antecedent and consequent, a set of including or excluding rules can be specified for antecedent while another set of including or excluding rules can be specified for consequent. Including or excluding rules can also be defined for the association rule.

Related Topics
• Oracle Data Mining User’s Guide
• Oracle Database PL/SQL Packages and Types Reference

11.4.7 Performance Impact for Aggregates

Computing aggregates requires more memory and longer execution time. For each item, the user may supply several columns to aggregate. It requires more memory to buffer the extra data and more time to compute the aggregate values.

11.5 Evaluating Association Rules

Minimum support and confidence are used to influence the build of an association model. Support and confidence are also the primary metrics for evaluating the quality of the rules generated by the model. Additionally, Oracle Data Mining supports lift for association rules. These statistical measures can be used to rank the rules and hence the usefulness of the predictions.

11.5.1 Support

The support of a rule indicates how frequently the items in the rule occur together. For example, cereal and milk might appear together in 40% of the transactions. If so, the following rules each have a support of 40%:

cereal implies milk
milk implies cereal

Support is the ratio of transactions that include all the items in the antecedent and consequent to the number of total transactions. Support can be expressed in probability notation as follows:

support(A implies B) = P(A, B)

11.5.2 Minimum Support Count

Minimum support count defines the minimum threshold, in transactions, that each rule must satisfy. When the number of transactions is unknown, the support percentage threshold parameter can be tricky to set appropriately. For this reason, support can also be expressed as a count of transactions, with the greater of the two thresholds being used to filter out infrequent itemsets. The default is 1, indicating that this criterion is not applied.


Related Topics
• Association Rules
• Oracle Data Mining User’s Guide
• Frequent Itemsets: Learn about Frequent Itemsets and Support.

11.5.3 Confidence

The confidence of a rule indicates the probability of both the antecedent and the consequent appearing in the same transaction. Confidence is the conditional probability of the consequent given the antecedent. For example, cereal appears in 50 transactions; 40 of the 50 might also include milk. The rule confidence is:

cereal implies milk with 80% confidence

Confidence is the ratio of the rule support to the number of transactions that include the antecedent. Confidence can be expressed in probability notation as follows.

confidence (A implies B) = P(B | A), which is equal to P(A, B) / P(A)

Related Topics
• Confidence
• Frequent Itemsets: Learn about Frequent Itemsets and Support.

11.5.4 Reverse Confidence

The Reverse Confidence of a rule is defined as the number of transactions in which the rule occurs divided by the number of transactions in which the consequent occurs. Reverse Confidence eliminates rules that occur because the consequent is frequent. The default is 0.

Related Topics
• Confidence
• Example: Calculating Rules from Frequent Itemsets: An example of calculating rules from frequent itemsets.
• Oracle Data Mining User’s Guide
• Oracle Database PL/SQL Packages and Types Reference

11.5.5 Lift

Both support and confidence must be used to determine if a rule is valid. However, there are times when both of these measures may be high, and yet still produce a rule that is not useful. For example:


Convenience store customers who buy orange juice also buy milk with a 75% confidence. The combination of milk and orange juice has a support of 30%.

This at first sounds like an excellent rule, and in most cases, it would be. It has high confidence and high support. However, what if convenience store customers in general buy milk 90% of the time? In that case, orange juice customers are actually less likely to buy milk than customers in general. A third measure is needed to evaluate the quality of the rule. Lift indicates the strength of a rule over the random co-occurrence of the antecedent and the consequent, given their individual support. It provides information about the improvement, the increase in probability of the consequent given the antecedent. Lift is defined as follows.

(Rule Support) /(Support(Antecedent) * Support(Consequent))

This can also be defined as the confidence of the combination of items divided by the support of the consequent. So in our milk example, assuming that 40% of the customers buy orange juice, the improvement would be:

30% / (40% * 90%) which is 0.83 – an improvement of less than 1. Any rule with an improvement of less than 1 does not indicate a real cross-selling opportunity, no matter how high its support and confidence, because it actually offers less ability to predict a purchase than does random chance.

Tip:

Decrease the maximum rule length if you want to decrease the build time for the model and generate simpler rules.

Tip:

Increase the minimum support if you want to decrease the build time for the model and generate fewer rules.

12 CUR Matrix Decomposition

Learn how to use the CUR decomposition based algorithm for attribute importance.

• About CUR Matrix Decomposition
• Singular Vectors
• Statistical Leverage Score
• Column (Attribute) Selection and Row Selection
• CUR Matrix Decomposition Algorithm Configuration

12.1 About CUR Matrix Decomposition

CUR matrix decomposition is a low-rank matrix decomposition algorithm that is explicitly expressed in a small number of actual columns and/or actual rows of the data matrix. CUR matrix decomposition was developed as an alternative to Singular Value Decomposition (SVD) and Principal Component Analysis (PCA). CUR matrix decomposition selects columns and rows that exhibit high statistical leverage or large influence from the data matrix. By implementing the CUR matrix decomposition algorithm, a small number of the most important attributes and/or rows can be identified from the original data matrix. Therefore, CUR matrix decomposition is an important tool for exploratory data analysis. CUR matrix decomposition can be applied to a variety of areas and facilitates Regression, Classification, and Clustering.

Related Topics
• Data Preparation for SVD: Learn about preparing the data for Singular Value Decomposition (SVD).

12.2 Singular Vectors

Singular Value Decomposition (SVD) is the first step in CUR matrix decomposition. SVD returns left and right singular vectors for calculating column and row leverage scores. Perform SVD on the following matrix:

A ∈ R^(m×n)

The matrix is factorized as follows:

A = UΣV^T

where U = [u1 u2...um] and V = [v1 v2...vn] are orthogonal matrices.

Σ is a diagonal m × n matrix with non-negative real numbers σ_1,...,σ_ρ on the diagonal, where ρ = min{m,n} and σ_ξ is the ξth singular value of A.


Let u_ξ and v_ξ be the ξth left and right singular vectors of A. The jth column of A can then be approximated by the top k singular vectors and corresponding singular values as:
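(The original equation is an image that is not reproduced here; written out from the definitions above, the truncated expansion of the jth column is:)

$$A_j \;\approx\; \sum_{\xi=1}^{k} \sigma_{\xi}\, v_j^{\xi}\, u_{\xi}$$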

where v_j^ξ is the jth coordinate of the ξth right singular vector.

12.3 Statistical Leverage Score

The statistical leverage scores represent the column (or attribute) and row importance. The normalized statistical leverage scores for all columns are computed from the top k right singular vectors as follows:
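(The original equation is an image that is not reproduced here; the normalized column leverage score, as commonly defined for CUR decomposition, is:)

$$\pi_j \;=\; \frac{1}{k}\sum_{\xi=1}^{k}\bigl(v_j^{\xi}\bigr)^{2}$$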

where k is called the rank parameter and j = 1,...,n. Given that π_j ≥ 0 and the scores sum to 1 over all j, these scores form a probability distribution over the n columns. Similarly, the normalized statistical leverage scores for all rows are computed from the top k left singular vectors as:
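(The original equation is an image that is not reproduced here; by analogy with the column scores:)

$$\pi_i \;=\; \frac{1}{k}\sum_{\xi=1}^{k}\bigl(u_i^{\xi}\bigr)^{2}$$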

where i = 1,...,m.

12.4 Column (Attribute) Selection and Row Selection

Column (Attribute) selection and Row selection is the final stage in CUR matrix decomposition.

• Attribute selection: Selects attributes with high leverage scores and reports their names, scores (as importance), and ranks (by importance).
• Row selection: Selects rows with high leverage scores and reports their names, scores (as importance), and ranks (by importance).


1. CUR matrix decomposition first selects the jth column (or attribute) of A with probability p_j = min{1, cπ_j} for all j ∈ {1,...,n}.
2. If users enable row selection, it selects the ith row of A with probability p_i = min{1, rπ_i} for all i ∈ {1,...,m}.
3. It reports the name (or ID) and leverage score (as importance) for all selected attributes (if row importance is disabled) or for all selected attributes and rows (if row importance is enabled).

Here c is the approximate (or expected) number of columns that users want to select, and r is the approximate (or expected) number of rows that users want to select. To realize column and row selection, you need to calculate the probability of selecting each column and row. Calculate the probability for each column as follows:

p_j = min{1, cπ_j}

Calculate the probability for each row as follows:

p_i = min{1, rπ_i}

A column or row is selected if the probability is greater than some threshold.

12.5 CUR Matrix Decomposition Algorithm Configuration

Learn about configuring the CUR Matrix Decomposition algorithm.

Example 12-1 Example

This example shows how to build a model that uses the CUR Matrix Decomposition algorithm. When the settings table is created and populated with CUR Matrix Decomposition related settings, insert a row in the settings table to specify the algorithm.

INSERT INTO SETTINGS_TABLE (setting_name, setting_value)
  VALUES ('ALGO_NAME', 'ALGO_CUR_DECOMPOSITION');

Build the model as follows:

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'model-name',
    mining_function     => dbms_data_mining.attribute_importance,
    data_table_name     => 'test_table',
    case_id_column_name => 'id',
    settings_table_name => 'settings_table');
END;
/


Row Selection

To use this feature, insert a row in the settings table to specify that row importance is enabled:

INSERT INTO SETTINGS_TABLE (setting_name, setting_value)
  VALUES ('CURS_ROW_IMPORTANCE', 'CURS_ROW_IMP_ENABLE');

Note:

The row selection is performed only when users specify that row importance is enabled and the CASE_ID column is present.

Related Topics • Oracle Database PL/SQL Packages and Types Reference

13 Decision Tree

Learn how to use the Decision Tree algorithm. Decision Tree is one of the Classification algorithms that Oracle Data Mining supports.

• About Decision Tree
• Growing a Decision Tree
• Tuning the Decision Tree Algorithm
• Data Preparation for Decision Tree

Related Topics
• Classification: Learn how to predict a categorical target through Classification - the supervised mining function.

13.1 About Decision Tree

The Decision Tree algorithm, like Naive Bayes, is based on conditional probabilities. Unlike Naive Bayes, decision trees generate rules. A rule is a conditional statement that can be understood by humans and used within a database to identify a set of records. In some applications of data mining, the reason for predicting one outcome or another may not be important in evaluating the overall quality of a model. In others, the ability to explain the reason for a decision can be crucial. For example, a marketing professional requires complete descriptions of customer segments to launch a successful marketing campaign. The Decision Tree algorithm is ideal for this type of application. Use Decision Tree rules to validate models. If the rules make sense to a subject matter expert, then this validates the model.

13.1.1 Decision Tree Rules

Introduces Decision Tree rules. Oracle Data Mining supports several algorithms that provide rules. In addition to decision trees, clustering algorithms provide rules that describe the conditions shared by the members of a cluster, and association rules provide rules that describe associations between attributes. Rules provide model transparency, a window on the inner workings of the model. Rules show the basis for the model's predictions. Oracle Data Mining supports a high level of model transparency. While some algorithms provide rules, all algorithms provide model details. You can examine model details to determine how the algorithm handles the attributes internally, including transformations and reverse transformations. Transparency is discussed in the context of data preparation and in the context of model building in Oracle Data Mining User’s Guide. The following figure shows a rule generated by a Decision Tree model. This rule comes from a decision tree that predicts the probability that customers increase spending if given a loyalty card. A target value of 0 means not likely to increase spending; 1 means likely to increase spending.

Figure 13-1 Sample Decision Tree Rule

The rule shown in the figure represents the conditional statement:

IF (current residence > 3.5 and has college degree and is single) THEN predicted target value = 0

This rule is a full rule. A surrogate rule is a related attribute that can be used at apply time if the attribute needed for the split is missing.

Related Topics
• Understanding Reverse Transformations
• Model Detail Views for Decision Tree
• Clustering: Learn how to discover natural groupings in the data through Clustering - the unsupervised mining function.
• Association: Learn how to discover Association Rules through Association - an unsupervised mining function.

13.1.1.1 Confidence and Support

Confidence and support are properties of rules. These statistical measures can be used to rank the rules and hence the predictions. Support: The number of records in the training data set that satisfy the rule. Confidence: The likelihood of the predicted outcome, given that the rule has been satisfied. For example, consider a list of 1000 customers (1000 cases). Out of all the customers, 100 satisfy a given rule. Of these 100, 75 are likely to increase spending, and 25 are not likely to increase spending. The support of the rule is 100/1000 (10%). The confidence of the prediction (likely to increase spending) for the cases that satisfy the rule is 75/100 (75%).

13.1.2 Advantages of Decision Trees

Learn about the advantages of Decision Tree. The Decision Tree algorithm produces accurate and interpretable models with relatively little user intervention. The algorithm can be used for both binary and multiclass classification problems. The algorithm is fast, both at build time and apply time. The build process for Decision Tree supports parallel execution. (Scoring supports parallel execution irrespective of the algorithm.) Decision Tree scoring is especially fast. The tree structure, created in the model build, is used for a series of simple tests (typically 2-7). Each test is based on a single predictor. It is a membership test: either IN or NOT IN a list of values (categorical predictor); or LESS THAN or EQUAL TO some value (numeric predictor).

Related Topics
• Oracle Database VLDB and Partitioning Guide

13.1.3 XML for Decision Tree Models

Learn about generating XML representation of Decision Tree models. You can generate XML representing a Decision Tree model; the generated XML satisfies the definition specified in the Data Mining Group Predictive Model Markup Language (PMML) version 2.1 specification.

Related Topics
• http://www.dmg.org

13.2 Growing a Decision Tree

Predicting a target value by a sequence of questions to form or grow a Decision Tree. A Decision Tree predicts a target value by asking a sequence of questions. At a given stage in the sequence, the question that is asked depends upon the answers to the previous questions. The goal is to ask questions that, taken together, uniquely identify specific target values. Graphically, this process forms a tree structure.


Figure 13-2 Sample Decision Tree

(The figure shows a nine-node tree: the root node 0 splits on marital status into nodes 1 and 3; subsequent splits are on education and residence, and each leaf node shows its score and probability.)

The figure is a Decision Tree with nine nodes (and nine corresponding rules). The target attribute is binary: 1 if the customer increases spending, 0 if the customer does not increase spending. The first split in the tree is based on the CUST_MARITAL_STATUS attribute. The root of the tree (node 0) is split into nodes 1 and 3. Married customers are in node 1; single customers are in node 3. The rule associated with node 1 is:

Node 1 recordCount=712, 0 Count=382, 1 Count=330
CUST_MARITAL_STATUS isIn "Married", surrogate: HOUSEHOLD_SIZE isIn "3", "4-5"

Node 1 has 712 records (cases). In all 712 cases, the CUST_MARITAL_STATUS attribute indicates that the customer is married. Of these, 382 have a target of 0 (not likely to increase spending), and 330 have a target of 1 (likely to increase spending).

13.2.1 Splitting

During the training process, the Decision Tree algorithm must repeatedly find the most efficient way to split a set of cases (records) into two child nodes. Oracle Data Mining offers two homogeneity metrics, gini and entropy, for calculating the splits. The default metric is gini. Homogeneity metrics assess the quality of alternative split conditions and select the one that results in the most homogeneous child nodes. Homogeneity is also called purity; it refers to the degree to which the resulting child nodes are made up of cases with the same target value. The objective is to maximize the purity in the child nodes. For example, if the target can be either yes or no (does or does not increase spending), the objective is to produce nodes where most of the cases either increase spending or most of the cases do not increase spending.


13.2.2 Cost Matrix

Learn about Cost Matrix for Decision Tree. All classification algorithms, including Decision Tree, support a cost-benefit matrix at apply time. You can use the same cost matrix for building and scoring a Decision Tree model, or you can specify a different cost/benefit matrix for scoring.

Related Topics
• Costs
• Priors and Class Weights: Learn about Priors and Class Weights in a Classification model to produce a useful result.

13.2.3 Preventing Over-Fitting

In principle, Decision Tree algorithms can grow each branch of the tree just deeply enough to perfectly classify the training examples. While this is sometimes a reasonable strategy, in fact it can lead to difficulties when there is noise in the data, or when the number of training examples is too small to produce a representative sample of the true target function. In either of these cases, this simple algorithm can produce trees that over-fit the training examples. Over-fit is a condition where a model is able to accurately predict the data used to create the model, but does poorly on new data presented to it. To prevent over-fitting, Oracle Data Mining supports automatic pruning and configurable limit conditions that control tree growth. Limit conditions prevent further splits once the conditions have been satisfied. Pruning removes branches that have insignificant predictive power.

13.3 Tuning the Decision Tree Algorithm

Fine tune the Decision Tree algorithm with various parameters. The Decision Tree algorithm is implemented with reasonable defaults for splitting and termination criteria. However, several build settings are available for fine tuning. You can specify a homogeneity metric for finding the optimal split condition for a tree. The default metric is gini. The entropy metric is also available. Settings for controlling the growth of the tree are also available. You can specify the maximum depth of the tree, the minimum number of cases required in a child node, and the minimum number of cases required in a node in order for a further split to be possible. The training data attributes are binned as part of the algorithm's data preparation. You can alter the number of bins used by the binning step. There is a trade-off between the number of bins used and the time required for the build. A settings sketch follows the related topics below.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference
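As a sketch only, the following settings-table rows change the homogeneity metric and cap the tree depth; the table name dt_settings is hypothetical, and the setting names and values (TREE_IMPURITY_METRIC, TREE_IMPURITY_ENTROPY, TREE_TERM_MAX_DEPTH) should be verified against Oracle Database PL/SQL Packages and Types Reference for your release.

-- Use entropy instead of the default gini metric, and limit tree depth to 7.
INSERT INTO dt_settings (setting_name, setting_value)
  VALUES ('TREE_IMPURITY_METRIC', 'TREE_IMPURITY_ENTROPY');
INSERT INTO dt_settings (setting_name, setting_value)
  VALUES ('TREE_TERM_MAX_DEPTH', '7');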


13.4 Data Preparation for Decision Tree

Learn how to prepare data for Decision Tree. The Decision Tree algorithm manages its own data preparation internally. It does not require pretreatment of the data. Decision Tree is not affected by Automatic Data Preparation. Related Topics • Preparing the Data • Transforming the Data

14 Expectation Maximization

Learn how to use the Expectation Maximization Clustering algorithm.

• About Expectation Maximization
• Algorithm Enhancements
• Configuring the Algorithm
• Data Preparation for Expectation Maximization
Related Topics
• Clustering
Learn how to discover natural groupings in the data through Clustering - the unsupervised mining function.

14.1 About Expectation Maximization

Expectation Maximization (EM) estimation of mixture models is a popular probability density estimation technique that is used in a variety of applications. Oracle Data Mining uses EM to implement a distribution-based clustering algorithm (EM-clustering).

14.1.1 Expectation Step and Maximization Step

Expectation Maximization is an iterative method. It starts with an initial parameter guess. The parameter values are used to compute the likelihood of the current model. This is the Expectation step. The parameter values are then recomputed to maximize the likelihood. This is the Maximization step. The new parameter estimates are used to compute a new expectation and then they are optimized again to maximize the likelihood. This iterative process continues until model convergence.

14.1.2 Probability Density Estimation

In density estimation, the goal is to construct a density function that captures how a given population is distributed. In probability density estimation, the density estimate is based on observed data that represents a sample of the population. Areas of high data density in the model correspond to the peaks of the underlying distribution. Density-based clustering is conceptually different from distance-based clustering (for example, k-Means), where emphasis is placed on minimizing intra-cluster distances and maximizing inter-cluster distances. Due to its probabilistic nature, density-based clustering can compute reliable probabilities in cluster assignment. It can also handle missing values automatically.
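As an illustration of the density estimate and the two iteration steps described above, the following is the standard textbook formulation for a mixture of K Gaussian components; it uses generic notation and is not Oracle's specific parameterization. The mixture density, the Expectation step (responsibilities), and the Maximization step (parameter updates) are:

$$f(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

$$\gamma_{ik} = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)} \quad\text{(Expectation step)}$$

$$\pi_k = \frac{1}{N}\sum_{i=1}^{N}\gamma_{ik}, \qquad \mu_k = \frac{\sum_i \gamma_{ik}\,x_i}{\sum_i \gamma_{ik}}, \qquad \Sigma_k = \frac{\sum_i \gamma_{ik}\,(x_i-\mu_k)(x_i-\mu_k)^{\mathsf T}}{\sum_i \gamma_{ik}} \quad\text{(Maximization step)}$$

The two steps are repeated until the log likelihood $\sum_i \ln f(x_i)$ stops improving, which is the convergence condition referred to above.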


14.2 Algorithm Enhancements

Although Expectation Maximization (EM) is well established as a distribution-based clustering algorithm, it presents some challenges in its standard form. The Oracle Data Mining implementation includes significant enhancements, such as scalable processing of large volumes of data and automatic parameter initialization. The strategies that Oracle Data Mining uses to address the inherent limitations of EM clustering are described further in this section.

Note:

The EM abbreviation is used here to refer to EM-clustering.

Limitations of Standard Expectation Maximization:
• Scalability: EM has linear scalability with the number of records and attributes. The number of iterations to convergence tends to increase with growing data size (both rows and columns). EM convergence can be slow for complex problems and can place a significant load on computational resources.
• High dimensionality: EM has limited capacity for modeling high dimensional (wide) data. The presence of many attributes slows down model convergence, and the algorithm becomes less able to distinguish between meaningful attributes and noise. The algorithm is thus compromised in its ability to find correlations.
• Number of components: EM typically requires the user to specify the number of components. In most cases, this is not information that the user can know in advance.
• Parameter initialization: The choice of appropriate initial parameter values can have a significant effect on the quality of the model. Initialization strategies that have been used for EM have generally been computationally expensive.
• From components to clusters: EM model components are often treated as clusters. This approach can be misleading since cohesive clusters are often modeled by multiple components. Clusters that have a complex shape need to be modeled by multiple components.

14.2.1 Scalability

The Oracle Data Mining implementation of Expectation Maximization (EM) uses database parallel processing to achieve excellent scalability. EM computations naturally lend themselves to row parallel processing, and the partial results are easily aggregated. The parallel implementation efficiently distributes the computationally intensive work across slave processes and then combines the partial results to produce the final solution.
Related Topics
• Oracle Database VLDB and Partitioning Guide


14.2.2 High Dimensionality

The Oracle Data Mining implementation of Expectation Maximization (EM) can efficiently process high-dimensional data with thousands of attributes. This is achieved through a two-fold process:
• The data space of single-column (not nested) attributes is analyzed for pair-wise correlations. Only attributes that are significantly correlated with other attributes are included in the EM mixture model. The algorithm can also be configured to restrict the dimensionality to the M most correlated attributes.
• High-dimensional (nested) numerical data that measures events of similar type is projected into a set of low-dimensional features that are modeled by EM. Some examples of high-dimensional, numerical data are: text, recommendations, gene expressions, and market basket data.

14.2.3 Number of Components

Typical implementations of Expectation Maximization (EM) require the user to specify the number of model components. This is problematic because users do not generally know the correct number of components. Choosing too many or too few components can lead to over-fitting or under-fitting, respectively. When model search is enabled, the number of EM components is automatically determined. The algorithm uses a held-aside sample to determine the correct number of components, except in the cases of very small data sets when Bayesian Information Criterion (BIC) regularization is used.
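For reference, the Bayesian Information Criterion has the standard generic form below; this definition is supplied for illustration and is not specific to the Oracle implementation.

$$\mathrm{BIC} = -2\ln\hat{L} + p\ln n$$

where $\hat{L}$ is the maximized likelihood of the model, $p$ is the number of free parameters, and $n$ is the number of observations. Lower values are better; the $p\ln n$ term penalizes each added component, which is how BIC regularization discourages an excessive number of components.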

14.2.4 Parameter Initialization

Choosing appropriate initial parameter values can have a significant effect on the quality of the solution. Expectation Maximization (EM) is not guaranteed to converge to the global maximum of the likelihood function but may instead converge to a local maximum. Therefore different initial parameter values can lead to different model parameters and different model quality. In the process of model search, the EM model is grown independently. As new components are added, their parameters are initialized to areas with poor distribution fit.

14.2.5 From Components to Clusters

Expectation Maximization (EM) model components are often treated as clusters. However, this approach can be misleading. Cohesive clusters are often modeled by multiple components. The shape of the probability density function used in EM effectively predetermines the shape of the identified clusters. For example, Gaussian density functions can identify single peak symmetric clusters. Clusters of more complex shape need to be modeled by multiple components. Ideally, high density areas of arbitrary shape must be interpreted as single clusters. To accomplish this, the Oracle Data Mining implementation of EM builds a component hierarchy that is based on the overlap of the individual components' distributions. Oracle Data Mining EM uses agglomerative hierarchical clustering. Component distribution overlap is measured using the Bhattacharyya distance function.


Choosing an appropriate cutoff level in the hierarchy automatically determines the number of high-level clusters. The Oracle Data Mining implementation of EM produces an assignment of the model components to high-level clusters. Statistics like means, variances, modes, histograms, and rules additionally describe the high-level clusters. The algorithm can be configured to either produce clustering assignments at the component level or at the cluster level.

14.3 Configuring the Algorithm

Configure Expectation Maximization (EM). In Oracle Data Mining, Expectation Maximization (EM) can effectively model very large data sets (both rows and columns) without requiring the user to supply initialization parameters or specify the number of model components. While the algorithm offers reasonable defaults, it also offers flexibility. The following list describes some of the configurable aspects of EM:
• Whether or not independent non-nested column attributes are included in the model. The choice is system-determined by default.
• Whether to use Bernoulli or Gaussian distribution for numerical attributes. By default, the algorithm chooses the most appropriate distribution, and individual attributes may use different distributions. When the distribution is user-specified, it is used for all numerical attributes.
• Whether the convergence criterion is based on a held-aside data set or on Bayesian Information Criterion (BIC). The convergence criterion is system-determined by default.
• The percentage improvement in the value of the log likelihood function that is required to add a new component to the model. The default percentage is 0.001.
• Whether to define clusters as individual components or groups of components. Clusters are associated to groups of components by default.
• The maximum number of components in the model. If model search is enabled, the algorithm determines the number of components based on improvements in the likelihood function or based on regularization (BIC), up to the specified maximum.
• Whether the linkage function for the agglomerative clustering step uses the nearest distance within the branch (single linkage), the average distance within the branch (average linkage), or the maximum distance within the branch (complete linkage). By default the algorithm uses single linkage.
Related Topics
• DBMS_DATA_MINING - Global Settings
• DBMS_DATA_MINING - Algorithm Settings: Expectation Maximization

14.4 Data Preparation for Expectation Maximization

Learn how to prepare data for Expectation Maximization (EM). If you use Automatic Data Preparation (ADP), you do not need to specify additional data preparation for Expectation Maximization.


ADP normalizes numerical attributes (in non-nested columns) when they are modeled with Gaussian distributions. ADP applies a topN binning transformation to categorical attributes. Missing value treatment is not needed since Oracle Data Mining algorithms handle missing values automatically. The Expectation Maximization algorithm replaces missing values with the mean in single-column numerical attributes that are modeled with Gaussian distributions. In other single-column attributes (categoricals and numericals modeled with Bernoulli distributions), NULLs are not replaced; they are treated as a distinct value with its own frequency count. In nested columns, missing values are treated as zeros.
Related Topics
• Oracle Data Mining User’s Guide

15 Explicit Semantic Analysis

Learn how to use Explicit Semantic Analysis (ESA) as an unsupervised algorithm for the Feature Extraction function and as a supervised algorithm for Classification.
• About Explicit Semantic Analysis
• ESA for Text Mining
• Data Preparation for ESA
Related Topics
• Feature Selection and Extraction
Learn how to perform Feature Selection, Feature Extraction, and Attribute Importance.

15.1 About Explicit Semantic Analysis

In Oracle Database 12c Release 2, Explicit Semantic Analysis (ESA) was introduced as an unsupervised algorithm used by Oracle Data Mining for Feature Extraction. Starting from Oracle Database 18c, ESA is enhanced as a supervised algorithm for Classification. As a Feature Extraction algorithm, ESA does not discover latent features but instead uses explicit features represented in an existing knowledge base. As a Feature Extraction algorithm, ESA is mainly used for calculating semantic similarity of text documents and for explicit topic modeling. As a Classification algorithm, ESA is primarily used for categorizing text documents. Both the Feature Extraction and Classification versions of ESA can be applied to numeric and categorical input data as well. The input to ESA is a set of attribute vectors. Every attribute vector is associated with a concept. The concept is a feature in the case of Feature Extraction or a target class in the case of Classification. For Feature Extraction, only one attribute vector may be associated with any feature. For Classification, the training set may contain multiple attribute vectors associated with any given target class. These rows related to one target class are aggregated into one by the ESA algorithm. The output of ESA is a sparse attribute-concept matrix that contains the most important attribute-concept associations. The strength of the association is captured by the weight value of each attribute-concept pair. The attribute-concept matrix is stored as a reverse index that lists the most important concepts for each attribute.

Note:

For Feature Extraction, the ESA algorithm does not project the original feature space and does not reduce its dimensionality. The ESA algorithm filters out features that have a limited or uninformative set of attributes.


The scope of Classification tasks that ESA handles is different from that of Classification algorithms such as Naive Bayes and Support Vector Machines. ESA can perform large scale Classification with the number of distinct classes up to hundreds of thousands. Such large scale classification requires gigantic training data sets in which some classes have a significant number of training samples, whereas others are sparsely represented in the training data set.

15.1.1 Scoring with ESA

Learn to score with Explicit Semantic Analysis (ESA). A typical Feature Extraction application of ESA is to identify the most relevant features of a given input and score their relevance. Scoring an ESA model produces data projections in the concept feature space. If an ESA model is built from an arbitrary collection of documents, then each one is treated as a feature. It is then easy to identify the most relevant documents in the collection. The feature extraction functions are: FEATURE_DETAILS, FEATURE_ID, FEATURE_SET, FEATURE_VALUE, and FEATURE_COMPARE.

A typical Classification application of ESA is to predict classes of a given document and estimate the probabilities of the predictions. As a Classification algorithm, ESA implements the following scoring functions: PREDICTION, PREDICTION_PROBABILITY, PREDICTION_SET, PREDICTION_DETAILS, PREDICTION_COST.
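The queries below sketch how these scoring functions might be applied to ESA models; the model names (esa_wiki_mod, esa_doc_class), the documents table, and its columns are hypothetical.

-- Feature Extraction: most relevant explicit feature (concept) for each document,
-- with the strength of the association. Names are hypothetical.
SELECT doc_id,
       FEATURE_ID(esa_wiki_mod USING text)    AS top_concept,
       FEATURE_VALUE(esa_wiki_mod USING text) AS strength
FROM   documents;

-- Classification: predicted class and its probability for each document.
SELECT doc_id,
       PREDICTION(esa_doc_class USING text)             AS predicted_class,
       PREDICTION_PROBABILITY(esa_doc_class USING text) AS probability
FROM   documents;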

Related Topics • Oracle Data Mining User’s Guide • Oracle Database SQL Language Reference 15.1.2 Scoring Large ESA Models

Building an Explicit Semantic Analysis (ESA) model on a large collection of text documents can result in a model with many features or titles. The model information for scoring is loaded into the System Global Area (SGA) as a shared (shared pool size) library cache object. Different SQL predictive queries can reference this object. When the model size is large, you must set the SGA size in the database large enough to accommodate large objects. If the SGA is too small, the model may need to be reloaded every time it is referenced, which is likely to lead to performance degradation.

Learn how Explicit Semantic Analysis (ESA) can be used for Text mining. Explicit knowledge often exists in text form. Multiple knowledge bases are available as collections of text documents. These knowledge bases can be generic, for example, Wikipedia, or domain-specific. Data preparation transforms the text into vectors that capture attribute-concept associations. ESA is able to quantify semantic relatedness of documents even if they do not have any words in common. The function FEATURE_COMPARE can be used to compute semantic relatedness.
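For example, a query of the following form can compare two pieces of text under an ESA model; the model name and the sample strings are hypothetical, and larger returned values indicate closer semantic relatedness.

SELECT FEATURE_COMPARE(esa_wiki_mod
         USING 'Users rated the film highly' text
         AND USING 'The movie received excellent reviews' text) AS similarity
FROM   DUAL;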

Related Topics • Oracle Database SQL Language Reference


15.3 Data Preparation for ESA

Automatic Data Preparation normalizes input vectors to a unit length for Explicit Semantic Analysis (ESA). When there are missing values in columns with simple data types (not nested), ESA replaces missing categorical values with the mode and missing numerical values with the mean. When there are missing values in nested columns, ESA interprets them as sparse. The algorithm replaces sparse numeric data with zeros and sparse categorical data with zero vectors. The Oracle Data Mining data preparation transforms the input text into a vector of real numbers. These numbers represent the importance of the respective words in the text.

15.4 Terminologies in Explicit Semantic Analysis

Discusses the terms associated with Explicit Semantic Analysis (ESA).

Multi-target Classification
The training items in these large scale classifications belong to several classes. The goal of classification in such cases is to detect possible multiple target classes for one item. This kind of classification is called multi-target classification. The target column for ESA-based classification is extended. Collections are allowed as target column values. The collection type for the target in ESA-based classification is ORA_MINING_VARCHAR2_NT.

Large-scale classification
Large-scale classification applies to ontologies that contain gigantic numbers of categories, usually ranging into the tens or hundreds of thousands. This large-scale classification also requires gigantic training data sets which are usually unbalanced, that is, some classes may have a significant number of training samples whereas others may be sparsely represented in the training data set. Large-scale classification normally results in multiple target class assignments for a given test case.

Topic modeling
Topic modeling refers to the derivation of the most important topics of a document. Topic modeling can be explicit or latent. Explicit topic modeling results in the selection of the most relevant topics from a pre-defined set, for a given document. Explicit topics have names and can be verbalized. Latent topic modeling identifies a set of latent topics characteristic of a collection of documents. A subset of these latent topics is associated with every document under examination. Latent topics do not have verbal descriptions or meaningful interpretation.
Related Topics
• Oracle Database PL/SQL Packages and Types Reference

16 Exponential Smoothing

Learn about Exponential Smoothing.
• About Exponential Smoothing
• Data Preparation for Exponential Smoothing Models

16.1 About Exponential Smoothing

Exponential Smoothing methods are widely used for forecasting. These methods have been used in forecasting for over half a century and have applications at the strategic, tactical, and operational levels. For example, at a strategic level, forecasting is used for projecting return on investment, growth, and the effect of innovations. At a tactical level, forecasting is used for projecting costs, inventory requirements, and customer satisfaction. At an operational level, forecasting is used for setting targets and predicting quality and conformance with standards. In its simplest form, Exponential Smoothing is a moving average method with a single parameter which models an exponentially decreasing effect of past levels on future values. With a variety of extensions, Exponential Smoothing covers a broader class of models than competitors, such as the Box-Jenkins auto-regressive integrated moving average (ARIMA) approach. Oracle Data Mining implements Exponential Smoothing using a state-of-the-art state space method that incorporates a single source of error (SSOE) assumption, which provides theoretical and performance advantages. Exponential Smoothing is extended to the following:
• A matrix of models that mix and match error type (additive or multiplicative), trend (additive, multiplicative, or none), and seasonality (additive, multiplicative, or none)
• Models with damped trends
• Models that directly handle irregular time series and time series with missing values

Note:

For more information, see Ord, J.K., et al, Time Series Forecasting: The Case for the Single Source of Error State Space Approach, Working Paper, Department of Econometrics and Business Statistics, Monash University, VIC 3800, Australia, April 2, 2005.

16.1.1 Exponential Smoothing Models

Exponential Smoothing models are a broad class of forecasting models that are intuitive, flexible, and extensible.


Members of this class include simple, single parameter models that predict the future as a linear combination of a previous level and a current shock. Extensions can include parameters for linear or non-linear trend, trend damping, simple or complex seasonality, related series, various forms of non-linearity in the forecasting equations, and handling of irregular time series. Exponential Smoothing assumes that a series extends infinitely into the past, but that the influence of the past on the future decays smoothly and exponentially fast. The smooth rate of decay is expressed by one or more smoothing constants. The smoothing constants are parameters that the model estimates. The assumption is made practical for modeling real world data by using an equivalent recursive formulation that is only expressed in terms of an estimate of the current level based on prior history and a shock to that estimate dependent on current conditions only. The procedure requires an estimate for the time period just prior to the first observation, which encapsulates all prior history. This initial estimate is an additional model parameter whose value is estimated by the modeling procedure. Components of ESM, such as trend and seasonality extensions, can have an additive or multiplicative form. The simpler additive models assume that shock, trend, and seasonality are linear effects within the recursive formulation.

16.1.2 Simple Exponential Smoothing

Simple Exponential Smoothing assumes the data fluctuates around a stationary mean, with no trend or seasonal pattern. In the simple exponential smoothing model, each forecast (smoothed value) is computed as the weighted average of the previous observations, where the weights decrease exponentially depending on the value of the smoothing constant α. Values of the smoothing constant α near one put almost all weight on the most recent observations. Values of α near zero allow the distant past observations to have a large influence.

16.1.3 Models with Trend but No Seasonality

The preferred form of additive (linear) trend is sometimes called Holt’s method or double exponential smoothing. Models with trend add a smoothing parameter γ and optionally a damping parameter φ. The damping parameter smoothly dampens the influence of past linear trend on future estimates of level, often improving accuracy.

16.1.4 Models with Seasonality but No Trend

When the time series average does not change over time (stationary), but is subject to seasonal fluctuations, the appropriate model has seasonal parameters but no trend. Seasonal fluctuations are assumed to balance out over periods of length m, where m is the number of seasons. For example, m=4 might be used when the input data are aggregated quarterly. For models with additive errors, the seasonal parameters must sum to zero. For models with multiplicative errors, the product of seasonal parameters must be one.
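Using the notation of the preceding subsections (α for the level smoothing constant, γ for the trend smoothing constant, and φ for the damping parameter), the recursions behind the simple and trend models can be sketched as follows. This is the familiar textbook form, shown for illustration; it is not the single source of error state space formulation that Oracle Data Mining actually implements.

Simple exponential smoothing, with level $\ell_t$, observation $y_t$, and h-step forecast $\hat{y}_{t+h}$:

$$\ell_t = \alpha\,y_t + (1-\alpha)\,\ell_{t-1}, \qquad \hat{y}_{t+h} = \ell_t$$

Damped additive (Holt) trend, with trend component $b_t$:

$$\ell_t = \alpha\,y_t + (1-\alpha)\,(\ell_{t-1} + \phi\,b_{t-1}), \qquad b_t = \gamma\,(\ell_t - \ell_{t-1}) + (1-\gamma)\,\phi\,b_{t-1}, \qquad \hat{y}_{t+h} = \ell_t + (\phi + \phi^2 + \cdots + \phi^h)\,b_t$$

Setting φ = 1 recovers the undamped form of Holt’s method.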


16.1.5 Models with Trend and Seasonality

Holt and Winters introduced both trend and seasonality in the Exponential Smoothing Model (ESM). The original model, also known as Holt-Winters or triple exponential smoothing, considered an additive trend and multiplicative seasonality. Extensions include models with various combinations of additive and multiplicative trend, seasonality and error, with and without trend damping.

16.1.6 Prediction Intervals

To compute prediction intervals, Exponential Smoothing Models (ESM) are divided into three classes. The simplest class is the class of linear models, which includes, among others, simple ESM, Holt’s method, and additive Holt-Winters. Class 2 models (multiplicative error, additive components) make an approximate correction for violations of the Normality assumption. Class 3 models use a simple simulation approach to calculate prediction intervals.

16.2 Data Preparation for Exponential Smoothing Models

Learn about preparing the data for Exponential Smoothing Models. To build an ESM model, you must supply the following:
• Input data
• An aggregation level and method, if the case id is a date type
• Partitioning column, if the data are partitioned
In addition, for greater control over the build process, the user may optionally specify model build parameters, all of which have defaults:
• Model
• Error type
• Optimization criterion
• Forecast Window
• Confidence level for forecast bounds
• Missing value handling
• Whether the input series is evenly spaced
A sketch of how some of these choices can be supplied as build settings follows this list.
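The following is a minimal sketch only. The data set, model name, and column names are hypothetical; the EXSM_* setting names are those discussed in the topics that follow, and the exact value constants (for example, the interval value name) are assumptions that should be confirmed in Oracle Database PL/SQL Packages and Types Reference.

-- Hypothetical monthly forecast of a sales amount.
CREATE TABLE esm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

INSERT INTO esm_settings VALUES ('ALGO_NAME',            'ALGO_EXPONENTIAL_SMOOTHING');
INSERT INTO esm_settings VALUES ('EXSM_MODEL',           'EXSM_SIMPLE');           -- model type
INSERT INTO esm_settings VALUES ('EXSM_INTERVAL',        'EXSM_INTERVAL_MONTH');   -- accumulation spacing (assumed value name)
INSERT INTO esm_settings VALUES ('EXSM_ACCUMULATE',      'EXSM_ACCU_MAX');         -- aggregation method
INSERT INTO esm_settings VALUES ('EXSM_PREDICTION_STEP', '4');                     -- forecast window: 4 intervals
COMMIT;

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'ESM_SALES',
    mining_function     => 'TIME_SERIES',
    data_table_name     => 'sales_history',   -- hypothetical
    case_id_column_name => 'sale_date',       -- date type, so accumulation applies
    target_column_name  => 'amount',
    settings_table_name => 'esm_settings');
END;
/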

Related Topics • Oracle Data Mining User’s Guide

See Also:

The Exponential Smoothing Model settings are described in Oracle Database PL/SQL Packages and Types Reference.


16.2.1 Input Data

Time Series analysis requires ordered input data. Hence, each data row must consist of an [index, value] pair, where the index specifies the ordering. When the CREATE_MODEL procedure is used to initiate an Exponential Smoothing (ESM) model build, the CASE_ID_COLUMN_NAME specifies the column used to compute the indices of the input and the TARGET_COLUMN_NAME specifies the column used to compute the observed time series values. The time column can be an Oracle number or a date type (date, timestamp, timestamp with time zone, or timestamp with local time zone). The input time series are sorted according to the values of CASE_ID (time label). The case id column cannot contain missing values. The value column can contain missing values indicated as NULL. ESM also supports partitioned models and in such cases, the input table contains an extra column specifying the partition. All [index, value] pairs with the same partition ID form one complete time series. Exponential Smoothing constructs models for each partition independently, although all models use the same model settings. Certain properties of the data can result in a warning message or cause settings to be ignored. If the user specifies a model with either multiplicative trend, multiplicative seasonality, or both, and the data contains values Yt <= 0, then the model type is set to the default. If the series contains fewer values than the number of user-specified seasons, then the seasonality specifications are ignored with a warning.

16.2.2 Accumulation

For Exponential Smoothing algorithms, the accumulation procedure is applied when the column is a date type (date, datetime, timestamp, timestamp with timezone, or timestamp with local timezone).

The case id can be a NUMBER column whose sort index represents the position of the value in the time series sequence of values. The case id column can also be a date type. A date type is accumulated in accordance with a user-specified accumulation window. Regardless of type, the case id is used to transform the column into an equally spaced time series. No accumulation is applied for a case id of type NUMBER. As an example, consider a time series about promotion events. The time column contains the date of each event, and the dates can be unequally spaced. The user must specify the spacing interval, which is the spacing of the accumulated or transformed equally spaced time series. In the example, if the user specifies the interval to be month, then an equally spaced time series with profit for each calendar month is generated from the original time series. Setting EXSM_INTERVAL is used to specify the spacing interval. The user must also specify a value for EXSM_ACCUMULATE, for example, EXSM_ACCU_MAX, in which case the equally spaced monthly series would contain the maximum profit over all events that month as the observed time series value.

16.2.3 Missing Value

Input time series can contain missing values. A NULL entry in the target column indicates a missing value. When the time column is of the type datetime, the accumulation procedure can also introduce missing values. The setting EXSM_SETMISSING can be used to specify how to handle missing values. The special value EXSM_MISS_AUTO indicates that, if the series contains missing values it is to be treated as an irregular time series.


Note:

The missing value handling setting must be compatible with the model setting; otherwise, an error is thrown.

16.2.4 Prediction

Exponential Smoothing Models (ESM) can be applied to make predictions by specifying the prediction window. Setting EXSM_PREDICTION_STEP can be used to specify the prediction window. The prediction window is expressed in terms of number of intervals (setting EXSM_INTERVAL) when the time column is of the type datetime. If the time column is a number, then the prediction window is the number of steps to forecast. Regardless of whether the time series is regular or irregular, EXSM_PREDICTION_STEP specifies the prediction window.

16.2.5 Parallelism by Partition

Oracle Advanced Analytics supports parallelism by partition. For example, a user can choose PRODUCT_ID as one partition column and can generate forecasts for different products in a model build. Although a distinct smoothing model is built for each partition, all partitions share the same model settings. For example, if setting EXSM_MODEL is set to EXSM_SIMPLE, all partition models will be simple exponential smoothing models. Time series from different partitions can be distributed to different processes and processed in parallel. The model for each time series is built serially.

17 Generalized Linear Models

Learn how to use Generalized Linear Models (GLM), a statistical technique for linear modeling. Oracle Data Mining supports GLM for Regression and Binary Classification.
• About Generalized Linear Models
• GLM in Oracle Data Mining
• Scalable Feature Selection
• Tuning and Diagnostics for GLM
• GLM Solvers
• Data Preparation for GLM
• Linear Regression
• Logistic Regression
Related Topics
• Regression
Learn how to predict a continuous numerical target through Regression - the supervised mining function.
• Classification
Learn how to predict a categorical target through Classification - the supervised mining function.

17.1 About Generalized Linear Models

Introduces Generalized Linear Models (GLM). GLMs include and extend the class of linear models. Linear models make a set of restrictive assumptions, most importantly, that the target (dependent variable y) is normally distributed conditioned on the value of predictors with a constant variance regardless of the predicted response value. The advantages of linear models and their restrictions include computational simplicity, an interpretable model form, and the ability to compute certain diagnostic information about the quality of the fit. Generalized linear models relax these restrictions, which are often violated in practice. For example, binary (yes/no or 0/1) responses do not have the same variance across classes. Furthermore, the sum of terms in a linear model typically can have very large ranges encompassing very negative and very positive values. For the binary response example, we would like the response to be a probability in the range [0,1]. Generalized linear models accommodate responses that violate the linear model assumptions through two mechanisms: a link function and a variance function. The link function transforms the target range to potentially -infinity to +infinity so that the simple form of linear models can be maintained.


The variance function expresses the variance as a function of the predicted response, thereby accommodating responses with non-constant variances (such as the binary responses). Oracle Data Mining includes two of the most popular members of the GLM family of models with their most popular link and variance functions:
• Linear regression with the identity link and variance function equal to the constant 1 (constant variance over the range of response values).
• Logistic regression with the logit link and binomial variance functions.
Related Topics
• Linear Regression
• Logistic Regression
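In standard GLM notation (given here for illustration rather than quoted from the Oracle documentation), with expected response μ, the two members listed above use:

$$\text{Linear regression:}\quad g(\mu) = \mu, \qquad V(\mu) = 1$$

$$\text{Logistic regression:}\quad g(\mu) = \ln\!\frac{\mu}{1-\mu}, \qquad V(\mu) = \mu(1-\mu)$$

so that for logistic regression the predicted probability is recovered from the linear predictor $x^{\mathsf T}\beta$ as $\mu = 1 / (1 + e^{-x^{\mathsf T}\beta})$.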

17.2 GLM in Oracle Data Mining

Generalized Linear Models (GLM) is a parametric modeling technique. Parametric models make assumptions about the distribution of the data. When the assumptions are met, parametric models can be more efficient than non-parametric models. The challenge in developing models of this type involves assessing the extent to which the assumptions are met. For this reason, quality diagnostics are key to developing quality parametric models.

Learn how to interpret models and understand data transparency through model details and global details. Oracle Data Mining Generalized Linear Models (GLM) are easy to interpret. Each model build generates many statistics and diagnostics. Transparency is also a key feature: model details describe key characteristics of the coefficients, and global details provide high-level statistics. Related Topics • Tuning and Diagnostics for GLM 17.2.2 Wide Data

Oracle Data Mining Generalized Linear Model (GLM) is uniquely suited for handling wide data. The algorithm can build and score quality models that use a virtually limitless number of predictors (attributes). The only constraints are those imposed by system resources. 17.2.3 Confidence Bounds

Predict confidence bounds through Generalized Linear Models (GLM). GLM have the ability to predict confidence bounds. In addition to predicting a best estimate and a probability (Classification only) for each row, GLM identifies an interval wherein the prediction (Regression) or probability (Classification) lies.


The width of the interval depends upon the precision of the model and a user-specified confidence level. The confidence level is a measure of how sure the model is that the true value lies within a confidence interval computed by the model. A popular choice for confidence level is 95%. For example, a model might predict that an employee's income is $125K, and that you can be 95% sure that it lies between $90K and $160K. Oracle Data Mining supports 95% confidence by default, but that value can be configured.

Note:

Confidence bounds are returned with the coefficient statistics. You can also use the PREDICTION_BOUNDS SQL function to obtain the confidence bounds of a model prediction.
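A query of the following form sketches how the bounds can be retrieved for a GLM regression model; the model name, table, and columns are hypothetical, and the optional second argument overrides the default 95% confidence level.

-- Per-row prediction with 98% confidence bounds; names are hypothetical.
SELECT cust_id,
       PREDICTION(glm_income_mod USING *)                    AS predicted_income,
       PREDICTION_BOUNDS(glm_income_mod, 0.98 USING *).LOWER AS lower_bound,
       PREDICTION_BOUNDS(glm_income_mod, 0.98 USING *).UPPER AS upper_bound
FROM   customers_apply;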

Related Topics • Oracle Database SQL Language Reference 17.2.4 Ridge Regression

Understand the use of Ridge regression for singularity (exact multicollinearity) in data. The best regression models are those in which the predictors correlate highly with the target, but there is very little correlation between the predictors themselves. Multicollinearity is the term used to describe multivariate regression with correlated predictors. Ridge regression is a technique that compensates for multicollinearity. Oracle Data Mining supports ridge regression for both Regression and Classification mining functions. The algorithm automatically uses ridge if it detects singularity (exact multicollinearity) in the data. Information about singularity is returned in the global model details. Related Topics • Global Model Statistics for Linear Regression • Global Model Statistics for Logistic Regression 17.2.4.1 Configuring Ridge Regression

Configure Ridge Regression through build settings. You can choose to explicitly enable ridge regression by specifying a build setting for the model. If you explicitly enable ridge, you can use the system-generated ridge parameter or you can supply your own. If ridge is used automatically, the ridge parameter is also calculated automatically. The configuration choices are summarized as follows: • Whether or not to override the automatic choice made by the algorithm regarding ridge regression


• The value of the ridge parameter, used only if you specifically enable ridge regression. Related Topics • Oracle Database SQL Language Reference 17.2.4.2 Ridge and Confidence Bounds

Models built with Ridge Regression do not support confidence bounds. Related Topics • Confidence Bounds Predict confidence bounds through Generalized Linear Models (GLM). 17.2.4.3 Ridge and Data Preparation

Learn about preparing data for Ridge Regression. When Ridge Regression is enabled, different data preparation is likely to produce different results in terms of model coefficients and diagnostics. Oracle recommends that you enable Automatic Data Preparation for Generalized Linear Models, especially when Ridge Regression is used. Related Topics • Data Preparation for GLM Learn about preparing data for Generalized Linear Models (GLM). 17.3 Scalable Feature Selection

Oracle Data Mining supports a highly scalable and automated version of feature selection and generation for Generalized Linear Models. This capability can enhance the performance of the algorithm and improve accuracy and interpretability. Feature selection and generation are available for both Linear Regression and binary Logistic Regression. 17.3.1 Feature Selection

Feature selection is the process of choosing the terms to be included in the model. The fewer terms in the model, the easier it is for human beings to interpret its meaning. In addition, some columns may not be relevant to the value that the model is trying to predict. Removing such columns can enhance model accuracy. 17.3.1.1 Configuring Feature Selection

Feature selection is a build setting for Generalized Linear Models. It is not enabled by default. When configured for feature selection, the algorithm automatically determines appropriate default behavior, but the following configuration options are available:
• The feature selection criterion can be AIC, SBIC, RIC, or α-investing. When the feature selection criterion is α-investing, feature acceptance can be either strict or relaxed.
• The maximum number of features can be specified.


• Features can be pruned in the final model. Pruning is based on t-statistics for linear regression or Wald statistics for logistic regression.

17.3.1.2 Feature Selection and Ridge Regression

Feature selection and ridge regression are mutually exclusive. When feature selection is enabled, the algorithm cannot use ridge.

Note:

If you configure the model to use both feature selection and ridge regression, then you get an error.

17.3.2 Feature Generation

Feature generation is the process of adding transformations of terms into the model. Feature generation enhances the power of models to fit more complex relationships between target and predictors. 17.3.2.1 Configuring Feature Generation

Learn about configuring Feature Generation. Feature generation is only possible when feature selection is enabled. Feature generation is a build setting. By default, feature generation is not enabled. The feature generation method can be either quadratic or cubic. By default, the algorithm chooses the appropriate method. You can also explicitly specify the feature generation method. The following options for feature selection also affect feature generation: • Maximum number of features • Model pruning Related Topics • Oracle Database PL/SQL Packages and Types Reference 17.4 Tuning and Diagnostics for GLM

The process of developing a Generalized Linear Model typically involves a number of model builds. Each build generates many statistics that you can evaluate to determine the quality of your model. Depending on these diagnostics, you may want to try changing the model settings or making other modifications. 17.4.1 Build Settings

Specify the build settings for Generalized Linear Models (GLM). Additional build settings are available to:


• Control the use of ridge regression. • Specify the handling of missing values in the training data. • Specify the target value to be used as a reference in a logistic regression model. Related Topics • Ridge Regression Understand the use of Ridge regression for singularity (exact multicollinearity) in data. • Data Preparation for GLM Learn about preparing data for Generalized Linear Models (GLM). • Logistic Regression • Oracle Database PL/SQL Packages and Types Reference 17.4.2 Diagnostics

Generalized Linear Models generate many metrics to help you evaluate the quality of the model. 17.4.2.1 Coefficient Statistics

Learn about coefficient statistics for Linear and Logistic Regression. The same set of statistics is returned for both linear and logistic regression, but statistics that do not apply to the mining function are returned as NULL. Coefficient statistics are returned by the Model Detail Views for Generalized Linear Model.
Related Topics
• Coefficient Statistics for Linear Regression
• Coefficient Statistics for Logistic Regression
• Oracle Data Mining User’s Guide

17.4.2.2 Global Model Statistics

Learn about high-level statistics describing the model. Separate high-level statistics describing the model as a whole are returned for linear and logistic regression. When ridge regression is enabled, fewer global details are returned. Global statistics are returned by the Model Detail Views for Generalized Linear Model.
Related Topics
• Global Model Statistics for Linear Regression
• Global Model Statistics for Logistic Regression
• Ridge Regression
Understand the use of Ridge regression for singularity (exact multicollinearity) in data.
• Oracle Data Mining User’s Guide


17.4.2.3 Row Diagnostics

Generate row statistics by configuring Generalized Linear Models (GLM). You can configure GLM to generate per-row statistics by specifying the name of a diagnostics table in the build setting GLMS_DIAGNOSTICS_TABLE_NAME.

GLM requires a case ID to generate row diagnostics. If you provide the name of a diagnostic table but the data does not include a case ID column, an exception is raised.
Related Topics
• Row Diagnostics for Linear Regression
• Row Diagnostics for Logistic Regression

17.5 GLM Solvers

Learn about the different solvers for Generalized Linear Models (GLM). The GLM algorithm supports four different solvers: Cholesky, QR, Stochastic Gradient Descent (SGD), and Alternating Direction Method of Multipliers (ADMM) (on top of L-BFGS). The Cholesky and QR solvers employ classical decomposition approaches. The Cholesky solver is faster than the QR solver but less stable numerically. The QR solver handles rank-deficient problems better without the help of regularization. The SGD and ADMM (on top of L-BFGS) solvers are best suited for large scale data. The SGD solver employs the stochastic gradient descent optimization algorithm, while ADMM (on top of L-BFGS) uses the Broyden-Fletcher-Goldfarb-Shanno optimization algorithm within an Alternating Direction Method of Multipliers framework. The SGD solver is fast but is sensitive to parameters and requires suitably scaled data to achieve good convergence. The L-BFGS algorithm solves unconstrained optimization problems and is more stable and robust than SGD. Also, L-BFGS is used in conjunction with ADMM, which results in an efficient distributed optimization approach with low communication cost.
Related Topics
• DBMS_DATA_MINING - Algorithm Settings: Neural Network
• DBMS_DATA_MINING — Algorithm Settings: Generalized Linear Models
• DBMS_DATA_MINING — Algorithm Settings: ADMM
• DBMS_DATA_MINING — Algorithm Settings: LBFGS

17.6 Data Preparation for GLM

Learn about preparing data for Generalized Linear Models (GLM). Automatic Data Preparation (ADP) implements suitable data transformations for both linear and logistic regression.


Note:

Oracle recommends that you use Automatic Data Preparation with GLM.

Related Topics • Oracle Data Mining User’s Guide 17.6.1 Data Preparation for Linear Regression

Learn about Automatic Data Preparation (ADP) for the Generalized Linear Model (GLM). When Automatic Data Preparation (ADP) is enabled, the algorithm chooses a transformation based on input data properties and other settings. The transformation can include one or more of the following for numerical data: subtracting the mean, scaling by the standard deviation, or performing a correlation transformation (Neter et al., 1990). If the correlation transformation is applied to numeric data, it is also applied to categorical attributes. Prior to standardization, categorical attributes are exploded into N-1 columns where N is the attribute cardinality. The most frequent value (mode) is omitted during the explosion transformation. In the case of highest frequency ties, the attribute values are sorted alpha-numerically in ascending order, and the first value on the list is omitted during the explosion. This explosion transformation occurs whether or not ADP is enabled. In the case of high cardinality categorical attributes, the described transformations (explosion followed by standardization) can increase the build data size because the resulting data representation is dense. To reduce memory, disk space, and processing requirements, use an alternative approach. Under these circumstances, the VIF statistic must be used with caution.

Related Topics • Ridge and Data Preparation Learn about preparing data for Ridge Regression. • Oracle Data Mining User’s Guide

See Also:

• Neter, J., Wasserman, W., and Kutner, M.H., "Applied Statistical Models", Richard D. Irwin, Inc., Burr Ridge, IL, 1990.

17.6.2 Data Preparation for Logistic Regression

Categorical attributes are exploded into N-1 columns where N is the attribute cardinality. The most frequent value (mode) is omitted during the explosion transformation. In the case of highest frequency ties, the attribute values are sorted alpha-numerically in ascending order and the first value on the list is omitted during the explosion.


This explosion transformation occurs whether or not Automatic Data Preparation (ADP) is enabled. When ADP is enabled, numerical attributes are scaled by the standard deviation. This measure of variability is computed as the standard deviation per attribute with respect to the origin (not the mean) (Marquardt, 1980).

See Also:

Marquardt, D.W., "A Critique of Some Ridge Regression Methods: Comment", Journal of the American Statistical Association, Vol. 75, No. 369 , 1980, pp. 87-91.

17.6.3 Missing Values

When building or applying a model, Oracle Data Mining automatically replaces missing values of numerical attributes with the mean and missing values of categorical attributes with the mode. You can configure Generalized Linear Models to override the default treatment of missing values. With the ODMS_MISSING_VALUE_TREATMENT setting, you can cause the algorithm to delete rows in the training data that have missing values instead of replacing them with the mean or the mode. However, when the model is applied, Oracle Data Mining performs the usual mean/mode missing value replacement. As a result, it is possible that the statistics generated from scoring do not match the statistics generated from building the model. If you want to delete rows with missing values when scoring the model, you must perform the transformation explicitly. To make build and apply statistics match, you must remove the rows with NULLs from the scoring data before performing the apply operation. You can do this by creating a view.

CREATE VIEW viewname AS
SELECT *
FROM   tablename
WHERE  column_name1 IS NOT NULL
AND    column_name2 IS NOT NULL
AND    column_name3 IS NOT NULL
.....

Note:

In Oracle Data Mining, missing values in nested data indicate sparsity, not values missing at random. The value ODMS_MISSING_VALUE_DELETE_ROW is only valid for tables without nested columns. If this value is used with nested data, an exception is raised.

17.7 Linear Regression

Linear regression is the Generalized Linear Models’ Regression algorithm supported by Oracle Data Mining. The algorithm assumes no target transformation and constant variance over the range of target values.


17.7.1 Coefficient Statistics for Linear Regression

Generalized Linear Model Regression models generate the following coefficient statistics: • Linear coefficient estimate • Standard error of the coefficient estimate • t-value of the coefficient estimate • Probability of the t-value • Variance Inflation Factor (VIF) • Standardized estimate of the coefficient • Lower and upper confidence bounds of the coefficient 17.7.2 Global Model Statistics for Linear Regression

Generalized Linear Model Regression models generate the following statistics that describe the model as a whole:
• Model degrees of freedom
• Model sum of squares
• Model mean square
• Model F statistic
• Model F value probability
• Error degrees of freedom
• Error sum of squares
• Error mean square
• Corrected total degrees of freedom
• Corrected total sum of squares
• Root mean square error
• Dependent mean
• Coefficient of variation
• R-Square
• Adjusted R-Square
• Akaike's information criterion
• Schwarz's Bayesian information criterion
• Estimated mean square error of the prediction
• Hocking Sp statistic
• JP statistic (the final prediction error)
• Number of parameters (the number of coefficients, including the intercept)
• Number of rows


• Whether or not the model converged
• Whether or not a covariance matrix was computed

17.7.3 Row Diagnostics for Linear Regression

For Linear Regression, the diagnostics table has the columns described in the following table. All the columns are NUMBER, except the CASE_ID column, which preserves the type from the training data.

Table 17-1 Diagnostics Table for GLM Regression Models

Column                  Description
CASE_ID                 Value of the case ID column
TARGET_VALUE            Value of the target column
PREDICTED_VALUE         Value predicted by the model for the target
HAT                     Value of the diagonal element of the hat matrix
RESIDUAL                Measure of error
STD_ERR_RESIDUAL        Standard error of the residual
STUDENTIZED_RESIDUAL    Studentized residual
PRED_RES                Predicted residual
COOKS_D                 Cook's D influence statistic

17.8 Logistic Regression

Binary Logistic Regression is the Generalized Linear Model Classification algorithm supported by Oracle Data Mining. The algorithm uses the logit link function and the binomial variance function. 17.8.1 Reference Class

You can use the build setting GLMS_REFERENCE_CLASS_NAME to specify the target value to be used as a reference in a binary logistic regression model. Probabilities are produced for the other (non-reference) class. By default, the algorithm chooses the value with the highest prevalence. If there are ties, the attributes are sorted alpha-numerically in an ascending order.

17.8.2 Class Weights

You can use the build setting CLAS_WEIGHTS_TABLE_NAME to specify the name of a class weights table. Class weights influence the weighting of target classes during the model build. 17.8.3 Coefficient Statistics for Logistic Regression

Generalized Linear Model Classification models generate the following coefficient statistics: • Name of the predictor


• Coefficient estimate • Standard error of the coefficient estimate • Wald chi-square value of the coefficient estimate • Probability of the Wald chi-square value • Standardized estimate of the coefficient • Lower and upper confidence bounds of the coefficient • Exponentiated coefficient • Exponentiated coefficient for the upper and lower confidence bounds of the coefficient 17.8.4 Global Model Statistics for Logistic Regression

Generalized Linear Model Classification models generate the following statistics that describe the model as a whole:
• Akaike's criterion for the fit of the intercept only model
• Akaike's criterion for the fit of the intercept and the covariates (predictors) model
• Schwarz's criterion for the fit of the intercept only model
• Schwarz's criterion for the fit of the intercept and the covariates (predictors) model
• -2 log likelihood of the intercept only model
• -2 log likelihood of the model
• Likelihood ratio degrees of freedom
• Likelihood ratio chi-square probability value
• Pseudo R-square Cox and Snell
• Pseudo R-square Nagelkerke
• Dependent mean
• Percent of correct predictions
• Percent of incorrect predictions
• Percent of ties (probability for two cases is the same)
• Number of parameters (the number of coefficients, including the intercept)
• Number of rows
• Whether or not the model converged
• Whether or not a covariance matrix was computed

17.8.5 Row Diagnostics for Logistic Regression

For Logistic Regression, the diagnostics table has the columns described in the following table. All the columns are NUMBER, except the CASE_ID and TARGET_VALUE columns, which preserve the type from the training data.


Table 17-2 Row Diagnostics Table for Logistic Regression

Column                  Description
CASE_ID                 Value of the case ID column
TARGET_VALUE            Value of the target value
TARGET_VALUE_PROB       Probability associated with the target value
HAT                     Value of the diagonal element of the hat matrix
WORKING_RESIDUAL        Residual with respect to the adjusted dependent variable
PEARSON_RESIDUAL        The raw residual scaled by the estimated standard deviation of the target
DEVIANCE_RESIDUAL       Contribution to the overall goodness of fit of the model
C                       Confidence interval displacement diagnostic
CBAR                    Confidence interval displacement diagnostic
DIFDEV                  Change in the deviance due to deleting an individual observation
DIFCHISQ                Change in the Pearson chi-square

18 k-Means

Learn how to use the enhanced k-Means Clustering algorithm that Oracle Data Mining supports.

• About k-Means
• k-Means Algorithm Configuration
• Data Preparation for k-Means
Related Topics
• Clustering
Learn how to discover natural groupings in the data through Clustering - the unsupervised mining function.

18.1 About k-Means

The k-Means algorithm is a distance-based clustering algorithm that partitions the data into a specified number of clusters. Distance-based algorithms rely on a distance function to measure the similarity between cases. Cases are assigned to the nearest cluster according to the distance function used. 18.1.1 Oracle Data Mining Enhanced k-Means

Oracle Data Mining implements an enhanced version of the k-Means algorithm with the following features:
• Distance function: The algorithm supports Euclidean and Cosine distance functions. The default is Euclidean.
• Scalable Parallel Model build: The algorithm uses a very efficient method of initialization based on Bahmani, Bahman, et al. "Scalable k-means++." Proceedings of the VLDB Endowment 5.7 (2012): 622-633.
• Cluster properties: For each cluster, the algorithm returns the centroid, a histogram for each attribute, and a rule describing the hyperbox that encloses the majority of the data assigned to the cluster. The centroid reports the mode for categorical attributes and the mean and variance for numerical attributes.
This approach to k-Means avoids the need for building multiple k-Means models and provides clustering results that are consistently superior to the traditional k-Means.

18.1.2 Centroid

The centroid represents the most typical case in a cluster.


For example, in a data set of customer ages and incomes, the centroid of each cluster would be a customer of average age and average income in that cluster. The centroid is a prototype. It does not necessarily describe any given case assigned to the cluster. The attribute values for the centroid are the mean of the numerical attributes and the mode of the categorical attributes.
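In generic notation (not specific to the Oracle implementation), with centroids μ_1, ..., μ_K, each case x is assigned to the cluster whose centroid is nearest under the chosen distance function, and each centroid of numerical data is the mean of its assigned cases:

$$c(x) = \arg\min_{k}\, d(x, \mu_k), \qquad d_{\text{Euclidean}}(x, \mu) = \lVert x - \mu \rVert_2, \qquad d_{\text{cosine}}(x, \mu) = 1 - \frac{x \cdot \mu}{\lVert x \rVert\, \lVert \mu \rVert}$$

$$\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x \in C_k} x$$

For categorical attributes, the centroid reports the mode rather than the mean, as described above.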

18.2 k-Means Algorithm Configuration

Learn about configuring the k-Means algorithm. The Oracle Data Mining enhanced k-Means algorithm supports several build-time settings. All the settings have default values. There is no reason to override the defaults unless you want to influence the behavior of the algorithm in some specific way. You can configure k-Means by specifying the following considerations:
• Number of clusters
• Distance Function. The default distance function is Euclidean.
Related Topics
• Oracle Database PL/SQL Packages and Types Reference

18.3 Data Preparation for k-Means

18.3 Data Preparation for k-Means

Learn about preparing data for the k-Means algorithm. Normalization is typically required by the k-Means algorithm. Automatic Data Preparation performs normalization for k-Means. If you do not use ADP, you must normalize numeric attributes before creating or applying the model. When there are missing values in columns with simple data types (not nested), k-Means interprets them as missing at random. The algorithm replaces missing categorical values with the mode and missing numerical values with the mean. When there are missing values in nested columns, k-Means interprets them as sparse. The algorithm replaces sparse numerical data with zeros and sparse categorical data with zero vectors.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference
• Preparing the Data
• Transforming the Data

19 Minimum Description Length

Learn how to use Minimum Description Length, the supervised technique for calculating Attribute Importance.

• About MDL
• Data Preparation for MDL

Related Topics
• About Feature Selection and Attribute Importance

19.1 About MDL

Introduces the Minimum Description Length (MDL) algorithm. MDL is an information theoretic model selection principle. It is an important concept in information theory (the study of the quantification of information) and in learning theory (the study of the capacity for generalization based on empirical data). MDL assumes that the simplest, most compact representation of the data is the best and most probable explanation of the data. The MDL principle is used to build Oracle Data Mining attribute importance models. The build process for attribute importance supports parallel execution.

Related Topics
• Oracle Database VLDB and Partitioning Guide
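As a minimal sketch of how this is used in practice, the following hypothetical example builds an attribute importance model. The algorithm constant ALGO_AI_MDL and the ATTRIBUTE_IMPORTANCE mining function follow the DBMS_DATA_MINING conventions used elsewhere in this guide, but verify them in Oracle Database PL/SQL Packages and Types Reference; the table and column names are illustrative.

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('ALGO_NAME', 'ALGO_AI_MDL');

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name            => 'ai_model',
    mining_function       => dbms_data_mining.attribute_importance,
    data_table_name       => 'test_table',
    case_id_column_name   => 'case_id',
    target_column_name    => 'test_target',
    settings_table_name   => 'settings_table');
END;
/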

19.1.1 Compression and Entropy

Data compression is the process of encoding information using fewer bits than the original representation uses. The MDL Principle is based on the notion that the shortest description of the data is the most probable. In typical instantiations of this principle, a model is used to compress the data by reducing the uncertainty (entropy) as discussed below. The description of the data includes a description of the model and the data as described by the model. Entropy is a measure of uncertainty. It quantifies the uncertainty in a random variable as the information required to specify its value. Information in this sense is defined as the number of yes/no questions known as bits (encoded as 0 or 1) that must be answered for a complete specification. Thus, the information depends upon the number of values that the variable can assume. For example, if the variable represents the sex of an individual, then the number of possible values is two: female and male. If the variable represents the salary of individuals expressed in whole dollar amounts, then the values can be in the range $0-$10B, or billions of unique values. Clearly it takes more information to specify an exact salary than to specify an individual's sex.


19.1.1.1 Values of a Random Variable: Statistical Distribution

Information (the number of bits) depends on the statistical distribution of the values of the variable as well as the number of values of the variable. If we are judicious in the choice of Yes/No questions, then the amount of information for salary specification cannot be as much as it first appears. Most people do not have billion dollar salaries. If most people have salaries in the range $32000-$64000, then most of the time it requires only 15 questions to discover their salary, rather than the 30 required if every salary from $0-$1000000000 were equally likely. In the earlier example of the sex of an individual, if the person is known to be pregnant, then the sex is known to be female. There is no uncertainty, no Yes/No questions need be asked. The entropy is 0.

19.1.1.2 Values of a Random Variable: Significant Predictors

Suppose that for some random variable there is a predictor that, when its values are known, reduces the uncertainty of the random variable. For example, knowing whether a person is pregnant or not reduces the uncertainty of the random variable sex-of-individual. This predictor seems like a valuable feature to include in a model. How about name? Imagine that if you knew the name of the person, you would also know the person's sex. If so, the name predictor would seemingly reduce the uncertainty to zero. However, if names are unique, then what was gained? Is the person named Sally? Is the person named George? We would have as many Yes/No predictors in the name model as there are people. Therefore, specifying the name model would require as many bits as specifying the sex of each person.

19.1.1.3 Total Entropy

For a random variable X, the total entropy is defined as the sum, over all of its values, of minus the probability of the value multiplied by the log to the base 2 of that probability. This can be shown to be the variable's most efficient encoding.
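Written out as a formula, this is the standard Shannon entropy:

\[ H(X) = -\sum_{x} P(X = x)\,\log_2 P(X = x) \]

where the sum runs over the values x that the variable can assume.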

19.1.2 Model Size

Minimum Description Length (MDL) takes into consideration the size of the model as well as the reduction in uncertainty due to using the model. Both model size and entropy are measured in bits. For our purposes, both numeric and categorical predictors are binned. Thus the size of each single predictor model is the number of predictor bins. The uncertainty is reduced to the within-bin target distribution.

19.1.3 Model Selection

Minimum Description Length (MDL) considers each attribute as a simple predictive model of the target class. Model selection refers to the process of comparing and ranking the single-predictor models. MDL uses a communication model for solving the model selection problem. In the communication model there is a sender, a receiver, and data to be transmitted. The single-predictor models are compared and ranked with respect to the MDL metric, which is the relative compression in bits. MDL penalizes model complexity to avoid over-fitting. It is a principled approach that takes into account the complexity of the predictors (as models) to make the comparisons fair.


19.1.4 The MDL Metric

Attribute importance uses a two-part code as the metric for transmitting each unit of data. The first part (preamble) transmits the model. The parameters of the model are the target probabilities associated with each value of the prediction.

For a target with j values and a predictor with k values, with n_i (i = 1,..., k) rows per predictor value, there are C_i possible conditional probabilities, where C_i is the combination of j-1 things taken n_i-1 at a time. The size of the preamble in bits can be shown to be Sum(log2(C_i)), where the sum is taken over the k predictor values. Computations like this represent the penalties associated with each single-predictor model. The second part of the code transmits the target values using the model. It is well known that the most compact encoding of a sequence is the encoding that best matches the probability of the symbols (target class values). Thus, the model that assigns the highest probability to the sequence has the smallest target class value transmission cost. In bits, this cost is -Sum(log2(p_i)), where the p_i are the predicted probabilities for row i associated with the model. The predictor rank is the position in the list of associated description lengths, smallest first.
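Putting the two parts together, the description length that ranks a single-predictor model can be summarized as follows (a restatement of the preceding text, not an additional formula from the source):

\[ \text{description length} = \sum_{i=1}^{k} \log_2 C_i \;-\; \sum_{r} \log_2 p_r \]

where the first sum is the preamble (model) cost over the k predictor values and the second sum is the cost of transmitting the target values of the rows r given the model. The predictor with the smallest description length ranks first.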

19.2 Data Preparation for MDL

Learn about preparing data for Minimum Description Length (MDL). Automatic Data Preparation performs supervised binning for MDL. Supervised binning uses decision trees to create the optimal bin boundaries. Both categorical and numerical attributes are binned. MDL handles missing values naturally as missing at random. The algorithm replaces sparse numerical data with zeros and sparse categorical data with zero vectors. Missing values in nested columns are interpreted as sparse. Missing values in columns with simple data types are interpreted as missing at random. If you choose to manage your own data preparation, keep in mind that MDL usually benefits from binning. However, the discriminating power of an attribute importance model can be significantly reduced when there are outliers in the data and external equal-width binning is used. This technique can cause most of the data to concentrate in a few bins (a single bin in extreme cases). In this case, quantile binning is a better solution.

Related Topics
• Preparing the Data
• Transforming the Data

20 Naive Bayes

Learn how to use the Naive Bayes classification algorithm that Oracle Data Mining supports.

• About Naive Bayes
• Tuning a Naive Bayes Model
• Data Preparation for Naive Bayes

Related Topics
• Classification
  Learn how to predict a categorical target through Classification - the supervised mining function.

20.1 About Naive Bayes

Learn about the Naive Bayes algorithm. The Naive Bayes algorithm is based on conditional probabilities. It uses Bayes' theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data. Bayes' theorem finds the probability of an event occurring given the probability of another event that has already occurred. If B represents the dependent event and A represents the prior event, Bayes' theorem can be stated as follows.

Note:

Prob(B given A) = Prob(A and B)/Prob(A)

To calculate the probability of B given A, the algorithm counts the number of cases where A and B occur together and divides it by the number of cases where A occurs alone.

Example 20-1 Use Bayes' Theorem to Predict an Increase in Spending

Suppose you want to determine the likelihood that a customer under 21 increases spending. In this case, the prior condition (A) is "under 21," and the dependent condition (B) is "increase spending."

If there are 100 customers in the training data and 25 of them are customers under 21 who have increased spending, then:

Prob(A and B) = 25%

If 75 of the 100 customers are under 21, then:


Prob(A) = 75%

Bayes' theorem predicts that 33% of customers under 21 are likely to increase spending (25/75). The cases where both conditions occur together are referred to as pairwise. In Example 20-1, 25% of all cases are pairwise. The cases where only the prior event occurs are referred to as singleton. In Example 20-1, 75% of all cases are singleton. A visual representation of the conditional relationships used in Bayes' theorem is shown in the following figure.

Figure 20-1 Conditional Probabilities in Bayes' Theorem

The figure shows overlapping sets A and B with the following probabilities:

P(A) = 3/4
P(B) = 2/4
P(A and B) = P(AB) = 1/4
P(A | B) = P(AB) / P(B) = (1/4) / (2/4) = 1/2
P(B | A) = P(AB) / P(A) = (1/4) / (3/4) = 1/3

For purposes of illustration, Example 20-1 and Figure 20-1 show a dependent event based on a single independent event. In reality, the Naive Bayes algorithm must usually take many independent events into account. In Example 20-1, factors such as income, education, gender, and store location might be considered in addition to age. Naive Bayes makes the assumption that each predictor is conditionally independent of the others. For a given target value, the distribution of each predictor is independent of the other predictors. In practice, this assumption of independence, even when violated, does not degrade the model's predictive accuracy significantly, and makes the difference between a fast, computationally feasible algorithm and an intractable one. Sometimes the distribution of a given predictor is clearly not representative of the larger population. For example, there might be only a few customers under 21 in the training data, but in fact there are many customers in this age group in the wider customer base. To compensate for this, you can specify prior probabilities when training the model.

Related Topics
• Priors and Class Weights
  Learn about Priors and Class Weights in a Classification model to produce a useful result.


20.1.1 Advantages of Naive Bayes

Learn about the advantages of Naive Bayes. The Naive Bayes algorithm affords fast, highly scalable model building and scoring. It scales linearly with the number of predictors and rows. The build process for Naive Bayes supports parallel execution. (Scoring supports parallel execution irrespective of the algorithm.) Naive Bayes can be used for both binary and multiclass classification problems.

Related Topics
• Oracle Database VLDB and Partitioning Guide

20.2 Tuning a Naive Bayes Model

Introduces the probability thresholds for pairwise and singleton occurrences. Naive Bayes calculates a probability by dividing the percentage of pairwise occurrences by the percentage of singleton occurrences. If these percentages are very small for a given predictor, they probably do not contribute to the effectiveness of the model. Occurrences below a certain threshold can usually be ignored. The following build settings are available for adjusting the probability thresholds. You can specify (a settings sketch follows the related topics):
• The minimum percentage of pairwise occurrences required for including a predictor in the model.
• The minimum percentage of singleton occurrences required for including a predictor in the model.
The default thresholds work well for most models, so you need not adjust these settings.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference
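As a hedged sketch, the thresholds can be supplied through the settings table in the same way as the other algorithm settings in this guide. The setting names NABS_PAIRWISE_THRESHOLD and NABS_SINGLETON_THRESHOLD and the example values are assumptions; confirm them in Oracle Database PL/SQL Packages and Types Reference.

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('ALGO_NAME', 'ALGO_NAIVE_BAYES');
-- Ignore predictors whose pairwise or singleton occurrence rates
-- fall below the (illustrative) thresholds.
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('NABS_PAIRWISE_THRESHOLD', '0.01');
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('NABS_SINGLETON_THRESHOLD', '0.01');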

20.3 Data Preparation for Naive Bayes

Learn about preparing the data for Naive Bayes. Automatic Data Preparation performs supervised binning for Naive Bayes. Supervised binning uses decision trees to create the optimal bin boundaries. Both categorical and numeric attributes are binned. Naive Bayes handles missing values naturally as missing at random. The algorithm replaces sparse numerical data with zeros and sparse categorical data with zero vectors. Missing values in nested columns are interpreted as sparse. Missing values in columns with simple data types are interpreted as missing at random. If you choose to manage your own data preparation, keep in mind that Naive Bayes usually requires binning. Naive Bayes relies on counting techniques to calculate probabilities. Columns must be binned to reduce the cardinality as appropriate.


Numerical data can be binned into ranges of values (for example, low, medium, and high), and categorical data can be binned into meta-classes (for example, regions instead of cities). Equi-width binning is not recommended, since outliers cause most of the data to concentrate in a few bins, sometimes a single bin. As a result, the discriminating power of the algorithm is significantly reduced.

Related Topics
• Preparing the Data
• Transforming the Data

21 Neural Network

Learn about Neural Network for Regression and Classification mining functions.

• About Neural Network
• Data Preparation for Neural Network
• Neural Network Algorithm Configuration
• Scoring with Neural Network

21.1 About Neural Network

Neural Network in Oracle Data Mining is designed for mining functions like Classification and Regression. In machine learning, an artificial neural network is an algorithm inspired by biological neural networks and is used to estimate or approximate functions that depend on a large number of generally unknown inputs. An artificial neural network is composed of a large number of interconnected neurons which exchange messages between each other to solve specific problems. The network learns from examples and tunes the weights of the connections among the neurons during the learning process. Neural Network is capable of solving a wide variety of tasks such as computer vision, speech recognition, and various complex business problems.

Related Topics
• Regression
  Learn how to predict a continuous numerical target through Regression - the supervised mining function.
• Classification
  Learn how to predict a categorical target through Classification - the supervised mining function.

21.1.1 Neuron and activation function

Neurons are the building blocks of a Neural Network. A neuron takes one or more inputs, each with a different weight, and produces an output that depends on the inputs. The output is obtained by summing the weighted inputs of the neuron and feeding the sum into the activation function. A sigmoid function is the most common choice of activation function, but other non-linear functions, piecewise linear functions, or step functions are also used. The following are some examples of activation functions:
• Logistic Sigmoid function
• Linear function
• Tanh function


• Arctan function
• Bipolar sigmoid function

21.1.2 Loss or Cost function

A loss function or cost function is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event. An optimization problem seeks to minimize a loss function. The form of the loss function is chosen based on the nature of the problem and mathematical needs. The following loss functions are used for different scenarios:
• Binary classification: cross entropy function.
• Multi-class classification: softmax function.
• Regression: squared error function.

21.1.3 Forward-Backward Propagation

Understand forward-backward propagation. Forward propagation computes the loss function value by computing a weighted sum of the previous layer's neuron values and applying activation functions. Backward propagation calculates the gradient of the loss function with respect to all the weights in the network. The weights are initialized with a set of random numbers uniformly distributed within a region specified by the user (by setting weight boundaries), or within a region defined by the number of nodes in the adjacent layers (data driven). The gradients are fed to an optimization method, which in turn uses them to update the weights in an attempt to minimize the loss function.

21.1.4 Optimization Solver

Understand the optimization solver. An optimization solver is used to search for the optimal solution of the loss function, that is, the extreme value (maximum or minimum) of the loss (cost) function. Oracle Data Mining implements Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) together with line search. L-BFGS is a Quasi-Newton method. This method uses rank-one updates specified by gradient evaluations to approximate the Hessian matrix. The method needs only a limited amount of memory. L-BFGS is used to find the descent direction and line search is used to find the appropriate step size. The number of historical copies kept in the L-BFGS solver is defined by LBFGS_HISTORY_DEPTH. When the number of iterations is smaller than the history depth, the Hessian computed by L-BFGS is accurate. When the number of iterations is larger than the history depth, the Hessian computed by L-BFGS is an approximation. Therefore, the history depth should be neither too small nor too large; otherwise, the computation can become too slow. Typically, the value is between 3 and 10.

21.1.5 Regularization

Understand regularization.


Regularization refers to a process of introducing additional information to solve an ill-posed problem or to prevent over-fitting. An ill-posed problem or over-fitting can occur when a statistical model describes random error or noise instead of the underlying relationship. Typical regularization techniques include L1-norm regularization, L2-norm regularization, and held-aside. Held-aside is usually used for large training data sets, whereas L1-norm regularization and L2-norm regularization are mostly used for small training data sets.

21.1.6 Convergence Check

The convergence check determines whether the optimal solution has been reached and whether the iterations of the optimization have come to an end.

In the L-BFGS solver, the convergence criteria include the maximum number of iterations, the infinity norm of the gradient, and the relative error tolerance. For held-aside regularization, the convergence criteria check the loss function value of the test data set, as well as the best model learned so far. The training is terminated when the model becomes worse for a specific number of iterations (specified by NNET_HELDASIDE_MAX_FAIL), or the loss function is close to zero, or the relative error on the test data is less than the tolerance.

21.1.7 LBFGS_SCALE_HESSIAN

Defines LBFGS_SCALE_HESSIAN.

It specifies how to set the initial approximation of the inverse Hessian at the beginning of each iteration. If the value is set to LBFGS_SCALE_HESSIAN_ENABLE, then the initial inverse Hessian is approximated with Oren-Luenberger scaling. If it is set to LBFGS_SCALE_HESSIAN_DISABLE, then the identity matrix is used as the approximation of the inverse Hessian at the beginning of each iteration.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference

21.1.8 NNET_HELDASIDE_MAX_FAIL

Defines NNET_HELDASIDE_MAX_FAIL.

Validation data (held-aside) is used to stop training early if the network performance on the validation data fails to improve or remains the same for NNET_HELDASIDE_MAX_FAIL epochs in a row.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference

21.2 Data Preparation for Neural Network

Learn about preparing data for Neural Network. The algorithm automatically "explodes" categorical data into a set of binary attributes, one per category value. Oracle Data Mining algorithms automatically handle missing values and therefore, missing value treatment is not necessary.


The algorithm automatically replaces missing categorical values with the mode and missing numerical values with the mean. Neural Network requires the normalization of numeric input. The algorithm uses z-score normalization. The normalization occurs only for two-dimensional numeric columns (not nested). Normalization places the values of numeric attributes on the same scale and prevents attributes with a large original scale from biasing the solution. Neural Network scales the numeric values in nested columns by the maximum absolute value seen in the corresponding columns.

Related Topics
• Preparing the Data
• Transforming the Data

21.3 Neural Network Algorithm Configuration

Learn about configuring Neural Network algorithm.

Specify Nodes Per Layer

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('NNET_NODES_PER_LAYER', '2,3');

Specify Activation Functions Per Layer

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('NNET_ACTIVATIONS', '''NNET_ACTIVATIONS_TANH'', ''NNET_ACTIVATIONS_LOG_SIG''');

Example 21-1 Example

In this example you learn how to build a Neural Network model. After the settings table is created and populated, insert a row in the settings table to specify the algorithm.

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('ALGO_NAME', 'ALGO_NEURAL_NETWORK');

Build the model as follows:

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name            => 'model-name',
    mining_function       => dbms_data_mining.classification,  -- or dbms_data_mining.regression
    data_table_name       => 'test_table',
    case_id_column_name   => 'case_id',
    target_column_name    => 'test_target',
    settings_table_name   => 'settings_table');
END;
/


21.4 Scoring with Neural Network

Learn to score with Neural Network. Scoring with Neural Network is the same as any other Classification or Regression algorithm. The following functions are supported: PREDICTION, PREDICTION_PROBABILITY, PREDICTION_COST, PREDICTION_SET, and PREDICTION_DETAILS.
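For instance, a minimal scoring query might look like the following; the model name nn_model and the table mining_data_apply are hypothetical.

-- Return the predicted class and its probability for each case.
SELECT case_id,
       PREDICTION(nn_model USING *)             AS predicted_target,
       PREDICTION_PROBABILITY(nn_model USING *) AS probability
FROM   mining_data_apply;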

Related Topics
• Oracle Database SQL Language Reference

22 Non-Negative Matrix Factorization

Learn how to use Non-Negative Matrix Factorization (NMF), the unsupervised algorithm that Oracle Data Mining uses for Feature Extraction.

• About NMF
• Tuning the NMF Algorithm
• Data Preparation for NMF

Related Topics
• Feature Selection and Extraction
  Learn how to perform Feature Selection, Feature Extraction, and Attribute Importance.

See Also:

Paper "Learning the Parts of Objects by Non-Negative Matrix Factorization" by D. D. Lee and H. S. Seung in Nature (401, pages 788-791, 1999)

22.1 About NMF

Non-Negative Matrix Factorization is a state-of-the-art feature extraction algorithm. NMF is useful when there are many attributes and the attributes are ambiguous or have weak predictability. By combining attributes, NMF can produce meaningful patterns, topics, or themes. Each feature created by NMF is a linear combination of the original attribute set. Each feature has a set of coefficients, which are a measure of the weight of each attribute on the feature. There is a separate coefficient for each numerical attribute and for each distinct value of each categorical attribute. The coefficients are all non-negative.

22.1.1 Matrix Factorization

Non-Negative Matrix Factorization uses techniques from multivariate analysis and linear algebra. It decomposes the data, as a matrix M, into the product of two lower-rank matrices W and H. The sub-matrix W contains the NMF basis; the sub-matrix H contains the associated coefficients (weights). The algorithm iteratively modifies the values of W and H so that their product approaches M. The technique preserves much of the structure of the original data and guarantees that both basis and weights are non-negative. The algorithm terminates when the approximation error converges or a specified number of iterations is reached.


The NMF algorithm must be initialized with a seed to indicate the starting point for the iterations. Because of the high dimensionality of the processing space and the fact that there is no global minimization algorithm, the appropriate initialization can be critical in obtaining meaningful results. Oracle Data Mining uses a random seed that initializes the values of W and H based on a uniform distribution. This approach works well in most cases.

22.1.2 Scoring with NMF

Learn about scoring with Non-Negative Matrix Factorization (NMF). NMF can be used as a pre-processing step for dimensionality reduction in Classification, Regression, Clustering, and other mining tasks. Scoring an NMF model produces data projections in the new feature space. The magnitude of a projection indicates how strongly a record maps to a feature. The SQL scoring functions for feature extraction support NMF models. When the functions are invoked with the analytical syntax, the functions build and apply a transient NMF model. The feature extraction functions are: FEATURE_DETAILS, FEATURE_ID, FEATURE_SET, and FEATURE_VALUE.
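As an illustration (the model and table names are hypothetical), the following query returns, for each row, the feature the row maps to most strongly and the magnitude of that mapping:

-- Top feature and its projection value for each scored row.
SELECT case_id,
       FEATURE_ID(nmf_model USING *)    AS best_feature,
       FEATURE_VALUE(nmf_model USING *) AS feature_value
FROM   mining_data_apply;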

Related Topics
• Oracle Data Mining User’s Guide

22.1.3 Text Mining with NMF

Learn about mining text with Non-Negative Matrix Factorization (NMF). NMF is especially well-suited for text mining. In a text document, the same word can occur in different places with different meanings. For example, "hike" can be applied to the outdoors or to interest rates. By combining attributes, NMF introduces context, which is essential for explanatory power:
• "hike" + "mountain" -> "outdoor sports"
• "hike" + "interest" -> "interest rates"

Related Topics
• Oracle Data Mining User’s Guide

22.2 Tuning the NMF Algorithm

Learn about configuring parameters for Non-Negative Matrix Factorization (NMF). Oracle Data Mining supports five configurable parameters for NMF. All of them have default values which are appropriate for most applications of the algorithm. The NMF settings are listed below; a settings-table sketch follows the list.
• Number of features. By default, the number of features is determined by the algorithm.
• Convergence tolerance. The default is .05.
• Number of iterations. The default is 50.
• Random seed. The default is -1.


• Non-negative scoring. You can specify whether negative numbers must be allowed in scoring results. By default they are allowed.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference
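The following minimal sketch shows how these parameters can be set through the settings table. The setting names (for example, FEAT_NUM_FEATURES, NMFS_CONV_TOLERANCE, NMFS_NUM_ITERATIONS) and the example values are assumptions based on the DBMS_DATA_MINING naming conventions; confirm them in Oracle Database PL/SQL Packages and Types Reference.

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('ALGO_NAME', 'ALGO_NONNEGATIVE_MATRIX_FACTOR');
-- Request 10 features instead of letting the algorithm decide.
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('FEAT_NUM_FEATURES', '10');
-- Tighten the convergence tolerance and allow more iterations.
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('NMFS_CONV_TOLERANCE', '0.01');
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('NMFS_NUM_ITERATIONS', '100');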

22.3 Data Preparation for NMF

Learn about preparing the data for Non-Negative Matrix Factorization (NMF). Automatic Data Preparation normalizes numerical attributes for NMF. When there are missing values in columns with simple data types (not nested), NMF interprets them as missing at random. The algorithm replaces missing categorical values with the mode and missing numerical values with the mean. When there are missing values in nested columns, NMF interprets them as sparse. The algorithm replaces sparse numerical data with zeros and sparse categorical data with zero vectors. If you choose to manage your own data preparation, keep in mind that outliers can significantly impact NMF. Use a clipping transformation before binning or normalizing. NMF typically benefits from normalization. However, outliers with min-max normalization cause poor matrix factorization. To improve the matrix factorization, you need to decrease the error tolerance. This in turn leads to longer build times.

Related Topics
• Preparing the Data
• Transforming the Data

23 O-Cluster

Learn how to use Orthogonal Partitioning Clustering (O-Cluster), an Oracle-proprietary Clustering algorithm.

• About O-Cluster
• Tuning the O-Cluster Algorithm
• Data Preparation for O-Cluster

Related Topics
• Clustering
  Learn how to discover natural groupings in the data through Clustering - the unsupervised mining function.

See Also:

Campos, M.M., Milenova, B.L., "Clustering Large Databases with Numeric and Nominal Values Using Orthogonal Projections", Oracle Data Mining Technologies, Oracle Corporation.

23.1 About O-Cluster

O-Cluster is a fast, scalable grid-based clustering algorithm well-suited for mining large, high-dimensional data sets. The algorithm can produce high quality clusters without relying on user-defined parameters. The objective of O-Cluster is to identify areas of high density in the data and separate the dense areas into clusters. It uses axis-parallel uni-dimensional (orthogonal) data projections to identify the areas of density. The algorithm looks for splitting points that result in distinct clusters that do not overlap and are balanced in size. O-Cluster operates recursively by creating a binary tree hierarchy. The number of leaf clusters is determined automatically. The algorithm can be configured to limit the maximum number of clusters.

23.1.1 Partitioning Strategy

Partitioning strategy refers to the process of discovering areas of density in the attribute histograms. The process differs for numerical and categorical data. When both are present in the data, the algorithm performs the searches separately and then compares the results.


In choosing a partition, the algorithm balances two objectives: finding well separated clusters, and creating clusters that are balanced in size. The following paragraphs detail how partitions for numerical and categorical attributes are identified.

23.1.1.1 Partitioning Numerical Attributes

To find the best valid cutting plane, O-Cluster searches the attribute histograms for bins of low density (valleys) between bins of high density (peaks). O-Cluster attempts to find a pair of peaks with a valley between them where the difference between the peak and valley histogram counts is statistically significant. A sensitivity level parameter specifies the lowest density that may be considered a peak. Sensitivity is an optional parameter for numeric data. It may be used to filter the splitting point candidates.

23.1.1.2 Partitioning Categorical Attributes

Categorical values do not have an intrinsic order associated with them. Therefore it is impossible to apply the notion of histogram peaks and valleys that is used to partition numerical values. Instead the counts of individual values form a histogram. Bins with large counts are interpreted as regions with high density. The clustering objective is to separate these high-density areas and effectively decrease the entropy (randomness) of the data. O-Cluster identifies the histogram with highest entropy along the individual projections. Entropy is measured as the number of bins above sensitivity level. O-Cluster places the two largest bins into separate partitions, thereby creating a splitting predicate. The remainder of the bins are assigned randomly to the two resulting partitions.

23.1.2 Active Sampling

The O-Cluster algorithm operates on a data buffer of a limited size. It uses an active sampling mechanism to handle data sets that do not fit into memory. After processing an initial random sample, O-Cluster identifies cases that are of no further interest. Such cases belong to frozen partitions where further splitting is highly unlikely. These cases are replaced with examples from ambiguous regions where further information (additional cases) is needed to find good splitting planes and continue partitioning. A partition is considered ambiguous if a valid split can only be found at a lower confidence level. Cases associated with frozen partitions are marked for deletion from the buffer. They are replaced with cases belonging to ambiguous partitions. The histograms of the ambiguous partitions are updated and splitting points are reevaluated.

23.1.3 Process Flow

The O-Cluster algorithm evaluates possible splitting points for all projections in a partition, selects the best one, and splits the data into two new partitions. The algorithm proceeds by searching for good cutting planes inside the newly created partitions. Thus, O-Cluster creates a binary tree structure that divides the input space into rectangular regions with no overlaps or gaps. The main processing stages are:


1. Load the buffer. Assign all cases from the initial buffer to a single active root partition.
2. Compute histograms along the orthogonal uni-dimensional projections for each active partition.
3. Find the best splitting points for active partitions.
4. Flag ambiguous and frozen partitions.
5. When a valid separator exists, split the active partition into two new active partitions and start over at step 2.
6. Reload the buffer after all recursive partitioning on the current buffer is completed. Continue loading the buffer until either the buffer is filled again, or the end of the data set is reached, or until the number of cases is equal to the data buffer size.

Note:

O-Cluster requires at most one pass through the data.

23.1.4 Scoring

The clusters discovered by O-Cluster are used to generate a Bayesian probability model that can be used to score new data. The generated probability model is a mixture model where the mixture components are represented by a product of independent normal distributions for numerical attributes and multinomial distributions for categorical attributes.

23.2 Tuning the O-Cluster Algorithm

Learn about configuring build settings for O-Cluster. The O-Cluster algorithm supports two build-time settings. Both settings have default values. There is no reason to override the defaults unless you want to influence the behavior of the algorithm in some specific way. You can configure O-Cluster by specifying any of the following (a settings sketch follows the related topics):
• Buffer size: Size of the processing buffer.
• Sensitivity factor: A fraction that specifies the peak density required for separating a new cluster.

Related Topics
• Active Sampling
• Partitioning Strategy
• Oracle Database PL/SQL Packages and Types Reference
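A minimal settings-table sketch follows. The setting names OCLT_MAX_BUFFER and OCLT_SENSITIVITY and the example values are assumptions based on the DBMS_DATA_MINING naming conventions and should be verified in Oracle Database PL/SQL Packages and Types Reference.

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('ALGO_NAME', 'ALGO_O_CLUSTER');
-- Enlarge the processing buffer and raise the sensitivity factor (illustrative values).
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('OCLT_MAX_BUFFER', '100000');
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('OCLT_SENSITIVITY', '0.7');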

23.3 Data Preparation for O-Cluster

Learn about preparing the data for O-Cluster.


Automatic Data Preparation bins numerical attributes for O-Cluster. It uses a specialized form of equi-width binning that computes the number of bins per attribute automatically. Numerical columns with all nulls or a single value are removed. O-Cluster handles missing values naturally as missing at random.

Note:

O-Cluster does not support nested columns, sparse data, or unstructured text.

Related Topics
• Preparing the Data
• Transforming the Data

23.3.1 User-Specified Data Preparation for O-Cluster

Learn about user-specified data preparation for O-Cluster. Keep the following in mind if you choose to prepare the data for O-Cluster:
• O-Cluster does not necessarily use all the input data when it builds a model. It reads the data in batches (the default batch size is 50000). It only reads another batch if it believes, based on statistical tests, that uncovered clusters can still exist.
• Binary attributes must be declared as categorical.
• Automatic equi-width binning is highly recommended. The bin identifiers are expected to be positive consecutive integers starting at 1.
• The presence of outliers can significantly impact clustering algorithms. Use a clipping transformation before binning or normalizing. Outliers with equi-width binning can prevent O-Cluster from detecting clusters. As a result, the whole population appears to fall within a single cluster.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference

24 R Extensibility

Learn how analytics models can be built and scored in R with ease. R extensible algorithms are enhanced to support and register additional algorithms for SQL users and graphical user interface users.

• Oracle Data Mining with R Extensibility
• Scoring with R
• About Algorithm Meta Data Registration

24.1 Oracle Data Mining with R Extensibility

Learn how you can use Oracle Data Mining to build, score, and view Oracle Data Mining models as well as R models. The Oracle Data Mining framework is enhanced by extending the data mining algorithm set with algorithms from the open source R ecosystem. Oracle Data Mining is implemented in the Oracle Database kernel. The mining models are database schema objects. With the extensibility enhancement, the data mining framework can build, score, and view both Oracle Data Mining models and R models.

Registration of R scripts

The R engine on the database server executes the R scripts to build, score, and view R models. These R scripts must be registered with the database beforehand by a privileged user with the rqAdmin role. You must first install Oracle R Enterprise to register the R scripts.

Functions of Oracle Data Mining with R Model

The following functions are supported for an R model:
• The Oracle Data Mining DBMS_DATA_MINING package is enhanced to support R models. For example, CREATE_MODEL and DROP_MODEL.
• MODEL VIEW to get the R model details about a single model and a partitioned model.
• Oracle Data Mining SQL functions are enhanced to operate with the R model functions. For example, PREDICTION and CLUSTER_ID.

R model extensibility supports the following data mining functions:
• Association
• Attribute Importance
• Regression
• Classification
• Clustering
• Feature Extraction


24.2 Scoring with R

Learn how to build and score with an R mining model.

For more information, see Oracle Data Mining User’s Guide 24.3 About Algorithm Meta Data Registration

Algorithm Meta Data Registration allows for a uniform and consistent approach of registering new algorithm functions and their settings. You can add new R-based algorithms through the registration process. The new algorithms appear as available within Oracle R Enterprise and within the appropriate mining functions. Based on the registration meta data, the settings page is dynamically rendered. The advantages are as follows:
• Manage R-based algorithms more easily
• Specify R-based algorithms for model build easily
• Clean individual properties in JSON structure
• Share R-based algorithms across users

Algorithm meta data registration extends the mining model capability of Oracle Data Mining.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference
• FETCH_JSON_SCHEMA Procedure
• REGISTER_ALGORITHM Procedure
• JSON Schema for R Extensible Algorithm

25 Random Forest

Learn how to use Random Forest as a classification algorithm.

• About Random Forest
• Building a Random Forest
• Scoring with Random Forest

Related Topics
• Feature Selection and Extraction
  Learn how to perform Feature Selection, Feature Extraction, and Attribute Importance.

25.1 About Random Forest

Random Forest is a classification algorithm used by Oracle Data Mining. The algorithm builds an ensemble (also called a forest) of trees. The algorithm builds a number of decision tree models and predicts using the ensemble. An individual decision tree is built by choosing a random sample from the training data set as the input. At each node of the tree, only a random sample of predictors is chosen for computing the split point. This introduces variation in the data used by the different trees in the forest. The parameters RFOR_SAMPLING_RATIO and RFOR_MTRY are used to specify the sample size and number of predictors chosen at each node. Users can use ODMS_RANDOM_SEED to set the random seed value before running the algorithm.

Related Topics
• Decision Tree
  Learn how to use the Decision Tree algorithm. Decision Tree is one of the Classification algorithms that Oracle Data Mining supports.
• Splitting
• Data Preparation for Decision Tree
  Learn how to prepare data for Decision Tree.

25.2 Building a Random Forest

The Random Forest is built upon existing infrastructure and Application Programming Interfaces (APIs) of Oracle Data Mining. The model is built by specifying parameters in the existing APIs. The scoring is performed using the same SQL queries and APIs as the existing Classification algorithms. Oracle Data Mining implements a variant of the classical Random Forest algorithm. This implementation supports big data sets. The implementation of the algorithm differs in the following ways:


• Oracle Data Mining does not support bagging and instead provides sampling without replacement.
• Users have the ability to specify the depth of the tree. Trees are not built to maximum depth.

Example 25-1 Example

In this example you learn how to build a Random Forest model. After the settings table is created and populated, insert a row in the settings table to specify the algorithm and the variant.

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('ALGO_NAME', 'ALGO_RANDOM_FOREST');
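Optionally, you can also influence the sampling and the number of predictors chosen at each node through the settings named earlier in this chapter. The values below are purely illustrative, and the exact value formats for these settings should be verified in Oracle Database PL/SQL Packages and Types Reference.

-- Use 60% of the training rows per tree, 5 predictors per node,
-- and a fixed random seed for reproducible builds (illustrative values).
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('RFOR_SAMPLING_RATIO', '0.6');
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('RFOR_MTRY', '5');
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('ODMS_RANDOM_SEED', '1234');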

Build the model as follows:

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name            => 'model-name',
    mining_function       => dbms_data_mining.classification,
    data_table_name       => 'test_table',
    case_id_column_name   => '',
    target_column_name    => 'test_target',
    settings_table_name   => 'settings_table');
END;
/

25.3 Scoring with Random Forest

Learn to score with Random Forest. Scoring with Random Forest is the same as any other Classification algorithm. The following functions are supported: PREDICTION, PREDICTION_PROBABILITY, PREDICTION_COST, PREDICTION_SET, and PREDICTION_DETAILS.

Related Topics
• Oracle Database SQL Language Reference

26 Singular Value Decomposition

Learn how to use Singular Value Decomposition, an unsupervised algorithm for Feature Extraction.

• About Singular Value Decomposition
• Configuring the Algorithm
• Data Preparation for SVD

Related Topics
• Feature Selection and Extraction
  Learn how to perform Feature Selection, Feature Extraction, and Attribute Importance.

26.1 About Singular Value Decomposition

Singular Value Decomposition (SVD) and the closely-related Principal Component Analysis (PCA) are well established feature extraction methods that have a wide range of applications. Oracle Data Mining implements SVD as a feature extraction algorithm and PCA as a special scoring method for SVD models. SVD and PCA are orthogonal linear transformations that are optimal at capturing the underlying variance of the data. This property is very useful for reducing the dimensionality of high-dimensional data and for supporting meaningful data visualization. SVD and PCA have a number of important applications in addition to dimensionality reduction. These include matrix inversion, data compression, and the imputation of unknown data values.

26.1.1 Matrix Manipulation

Singular Value Decomposition (SVD) is a factorization method that decomposes a rectangular matrix X into the product of three matrices:

Figure 26-1 Matrix Manipulation

X = USV'

where:
• The U matrix consists of a set of 'left' orthonormal bases.
• The S matrix is a diagonal matrix.
• The V matrix consists of a set of 'right' orthonormal bases.


The values in S are called singular values. They are non-negative, and their magnitudes indicate the importance of the corresponding bases (components). The singular values reflect the amount of data variance captured by the bases. The first basis (the one with largest singular value) lies in the direction of the greatest data variance. The second basis captures the orthogonal direction with the second greatest variance, and so on. SVD essentially performs a coordinate rotation that aligns the transformed axes with the directions of maximum variance in the data. This is a useful procedure under the assumption that the observed data has a high signal-to-noise ratio and that a large variance corresponds to interesting data content while a lower variance corresponds to noise. SVD makes the assumption that the underlying data is Gaussian distributed and can be well described in terms of means and covariances.

26.1.2 Low Rank Decomposition

To reduce dimensionality, Singular Value Decomposition (SVD) keeps lower-order bases (the ones with the largest singular values) and ignores higher-order bases (the ones with the smallest singular values). The rationale behind this strategy is that the low-order bases retain the characteristics of the data that contribute most to its variance and are likely to capture the most important aspects of the data. Given a data set X (nxm), where n is the number of rows and m is the number of attributes, a low-rank SVD uses only k components (k <= min(m, n)). In typical implementations of SVD, the value of k requires a visual inspection of the ranked singular values associated with the individual components. In Oracle Data Mining, SVD automatically estimates the cutoff point, which corresponds to a significant drop in the explained variance. SVD produces two sets of orthonormal bases (U and V). Either of these bases can be used as a new coordinate system. In Oracle Data Mining SVD, V is the new coordinate system, and U represents the projection of X in this coordinate system. The algorithm computes the projection of new data as follows:

Figure 26-2 Computing Projection of New Data

X̃ = X Vk Sk^(-1)

where X̃ (n x k) is the projected data in the reduced data space, defined by the first k components, and Vk and Sk define the reduced component set.

26.1.3 Scalability

In Oracle Data Mining, Singular Value Decomposition (SVD) can process data sets with millions of rows and thousands of attributes. Oracle Data Mining automatically recommends an appropriate number of features, based on the data, for dimensionality reduction. SVD has linear scalability with the number of rows and cubic scalability with the number of attributes when a full decomposition is computed. A low-rank decomposition is typically linear with the number of rows and linear with the number of columns. The scalability with the reduced rank depends on how the rank compares to the number of rows and columns. It can be linear when the rank is significantly smaller or cubic when it is on the same scale.

26.2 Configuring the Algorithm

Learn about configuring Singular Value Decomposition (SVD). Several options are available for configuring the SVD algorithm. Among them are settings to control model size and performance, and whether to score with SVD projections or Principal Component Analysis (PCA) projections.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference

26.2.1 Model Size

The U matrix in Singular Value Decomposition has as many rows as the number of rows in the build data. To avoid creating a large model, the U matrix persists only when an algorithm-specific setting is enabled. By default the U matrix does not persist.

26.2.2 Performance

Singular Value Decomposition can use approximate computations to improve performance. Approximation may be appropriate for data sets with many columns. An approximate low-rank decomposition provides good solutions at a reasonable computational cost. The quality of the approximation is dependent on the characteristics of the data.

26.2.3 PCA scoring

Learn about configuring Singular Value Decomposition (SVD) to perform Principal Component Analysis (PCA) projections. SVD models can be configured to perform PCA projections. PCA is closely related to SVD. PCA computes a set of orthonormal bases (principal components) that are ranked by their corresponding explained variance. The main difference between SVD and PCA is that the PCA projection is not scaled by the singular values. The PCA projection to the new coordinate system is given by:

Figure 26-3 PCA Projection Calculation

X̃ = X Vk

where X̃ (n x k) is the projected data in the reduced data space, defined by the first k components, and Vk defines the reduced component set.
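A minimal sketch of requesting PCA-style projections at build time is shown below. The setting names SVDS_SCORING_MODE and SVDS_SCORING_PCA are assumptions based on the DBMS_DATA_MINING naming conventions; confirm them in Oracle Database PL/SQL Packages and Types Reference.

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('ALGO_NAME', 'ALGO_SINGULAR_VALUE_DECOMP');
-- Score with PCA projections instead of SVD projections.
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('SVDS_SCORING_MODE', 'SVDS_SCORING_PCA');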


Related Topics
• Oracle Database PL/SQL Packages and Types Reference

26.3 Data Preparation for SVD

Learn about preparing the data for Singular Value Decomposition (SVD). Oracle Data Mining implements SVD for numerical data and categorical data. When the build data is scored with SVD, Automatic Data Preparation does nothing. When the build data is scored with Principal Component Analysis (PCA), Automatic Data Preparation shifts the numerical data by mean. Missing value treatment is not needed, because Oracle Data Mining algorithms handle missing values automatically. SVD replaces numerical missing values with the mean and categorical missing values with the mode. For sparse data (missing values in nested columns), SVD replaces missing values with zeros.

Related Topics
• Preparing the Data
• Transforming the Data

27 Support Vector Machines

Learn how to use Support Vector Machines, a powerful algorithm based on statistical learning theory. Oracle Data Mining implements Support Vector Machines for Classification, Regression, and Anomaly Detection.

• About Support Vector Machines
• Tuning an SVM Model
• Data Preparation for SVM
• SVM Classification
• One-Class SVM
• SVM Regression

Related Topics
• Regression
  Learn how to predict a continuous numerical target through Regression - the supervised mining function.
• Anomaly Detection
  Learn how to detect rare cases in the data through Anomaly Detection - an unsupervised function.
• Oracle Data Mining

See Also:

Milenova, B.L., Yarmus, J.S., Campos, M.M., "Support Vector Machines in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines", Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005.

27.1 About Support Vector Machines

Support Vector Machine (SVM) is a powerful, state-of-the-art algorithm with strong theoretical foundations based on the Vapnik-Chervonenkis theory. SVM has strong regularization properties. Regularization refers to the generalization of the model to new data. 27.1.1 Advantages of SVM

Oracle Data Mining SVM implementation includes two types of solvers, an Interior Point Method (IPM) solver and a Sub-Gradient Descent (SGD) solver. The IPM solver provides stable and accurate solutions; however, it may not be able to handle data of high dimensionality. For high-dimensional and/or large data, for example, text, ratings, and so on, the SGD solver is a better choice. Both solvers have highly scalable parallel implementations and can handle large volumes of data.

27.1.2 Advantages of SVM in Oracle Data Mining

Oracle Data Mining has its own proprietary implementation of Support Vector Machines (SVM), which exploits the many benefits of the algorithm while compensating for some of the limitations inherent in the SVM framework. Oracle Data Mining SVM provides the scalability and usability that are needed in a production quality data mining system.

27.1.2.1 Usability

Explains usability for Support Vector Machines (SVM) in Oracle Data Mining. Usability is a major enhancement, because SVM has often been viewed as a tool for experts. The algorithm typically requires data preparation, tuning, and optimization. Oracle Data Mining minimizes these requirements. You do not need to be an expert to build a quality SVM model in Oracle Data Mining. For example:
• Data preparation is not required in most cases.
• Default tuning parameters are generally adequate.

Related Topics
• Data Preparation for SVM
• Tuning an SVM Model
  Learn about configuring settings for Support Vector Machines (SVM).

27.1.2.2 Scalability

Learn how to scale the data for Support Vector Machines (SVM). When dealing with very large data sets, sampling is often required. However, sampling is not required with Oracle Data Mining SVM, because the algorithm itself uses stratified sampling to reduce the size of the training data as needed. Oracle Data Mining SVM is highly optimized. It builds a model incrementally by optimizing small working sets toward a global solution. The model is trained until convergence on the current working set, then the model adapts to the new data. The process continues iteratively until the convergence conditions are met. The Gaussian kernel uses caching techniques to manage the working sets.

Related Topics
• Kernel-Based Learning
  Learn about kernel-based functions to transform the input data for Support Vector Machines (SVM).

27.1.3 Kernel-Based Learning

Learn about kernel-based functions to transform the input data for Support Vector Machines (SVM).


SVM is a kernel-based algorithm. A kernel is a function that transforms the input data to a high-dimensional space where the problem is solved. Kernel functions can be linear or nonlinear. Oracle Data Mining supports linear and Gaussian (nonlinear) kernels. In Oracle Data Mining, the linear kernel function reduces to a linear equation on the original attributes in the training data. A linear kernel works well when there are many attributes in the training data. The Gaussian kernel transforms each case in the training data to a point in an n-dimensional space, where n is the number of cases. The algorithm attempts to separate the points into subsets with homogeneous target values. The Gaussian kernel uses nonlinear separators, but within the kernel space it constructs a linear equation.
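A hedged sketch of choosing the kernel at build time follows. The setting name SVMS_KERNEL_FUNCTION and the values SVMS_LINEAR and SVMS_GAUSSIAN follow the DBMS_DATA_MINING conventions but should be confirmed in Oracle Database PL/SQL Packages and Types Reference.

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('ALGO_NAME', 'ALGO_SUPPORT_VECTOR_MACHINES');
-- Request the Gaussian (nonlinear) kernel; use SVMS_LINEAR for the linear kernel.
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
('SVMS_KERNEL_FUNCTION', 'SVMS_GAUSSIAN');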

Note:

Active Learning is not relevant in Oracle Database 12c Release 2 and later. A setting similar to Active Learning is ODMS_SAMPLING.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference

27.2 Tuning an SVM Model

Learn about configuring settings for Support Vector Machines (SVM). SVM has built-in mechanisms that automatically choose appropriate settings based on the data. You may need to override the system-determined settings for some domains. Settings pertain to regression, classification, and anomaly detection unless otherwise specified.

Related Topics
• Oracle Database PL/SQL Packages and Types Reference

27.3 Data Preparation for SVM

The SVM algorithm operates natively on numeric attributes. SVM uses z-score normalization on numeric attributes. The normalization occurs only for two-dimensional numeric columns (not nested). The algorithm automatically "explodes" categorical data into a set of binary attributes, typically one per category value. For example, a character column for marital status with values married or single is transformed to two numeric attributes: married and single. The new attributes can have the value 1 (true) or 0 (false).

When there are missing values in columns with simple data types (not nested), SVM interprets them as missing at random. The algorithm automatically replaces missing categorical values with the mode and missing numerical values with the mean.


When there are missing values in the nested columns, SVM interprets them as sparse. The algorithm automatically replaces sparse numerical data with zeros and sparse categorical data with zero vectors.

27.3.1 Normalization

Support Vector Machines require the normalization of numeric input. Normalization places the values of numeric attributes on the same scale and prevents attributes with a large original scale from biasing the solution. Normalization also minimizes the likelihood of overflows and underflows.

27.3.2 SVM and Automatic Data Preparation

Learn about treating and transforming data manually or through Automatic Data Preparation (ADP) for Support Vector Machines (SVM). The SVM algorithm automatically handles missing value treatment and the transformation of categorical data, but normalization and outlier detection must be handled by Automatic Data Preparation (ADP) or prepared manually. ADP performs min-max normalization for SVM.

Note:

Oracle recommends that you use Automatic Data Preparation with SVM. The transformations performed by ADP are appropriate for most models.
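Enabling ADP is a single setting. The line below is a sketch that reuses the hypothetical svm_settings table; with PREP_AUTO set to ON, normalization is generated automatically and no manual transformation is required.

-- Let Automatic Data Preparation prepare the data for the algorithm.
INSERT INTO svm_settings VALUES ('PREP_AUTO', 'ON');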

Related Topics
• Oracle Data Mining User’s Guide

27.4 SVM Classification

Support Vector Machines (SVM) Classification is based on the concept of decision planes that define decision boundaries. A decision plane separates a set of objects having different class memberships. SVM finds the vectors ("support vectors") that define the separators giving the widest separation of classes. SVM classification supports binary, multiclass, and multitarget classification. Multitarget classification allows multiple class labels to be associated with a single row. The target type is a collection of type ORA_MINING_VARCHAR2_NT.
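The following sketch builds a binary SVM classifier and scores it with the SQL PREDICTION and PREDICTION_PROBABILITY functions. The object names (svm_settings, mining_data_build_v, mining_data_apply_v, cust_id, affinity_card) are illustrative assumptions.

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL (
      model_name          => 'svm_clas_model',
      mining_function     => 'CLASSIFICATION',
      data_table_name     => 'mining_data_build_v',
      case_id_column_name => 'cust_id',
      target_column_name  => 'affinity_card',
      settings_table_name => 'svm_settings');
END;
/

-- Score new data: the predicted class and its probability for each case.
SELECT cust_id,
       PREDICTION(svm_clas_model USING *)             AS predicted_class,
       PREDICTION_PROBABILITY(svm_clas_model USING *) AS probability
FROM   mining_data_apply_v;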

Related Topics
• Oracle Database PL/SQL Packages and Types Reference

27.4.1 Class Weights

Learn when to apply class weights in Support Vector Machines (SVM). In SVM classification, weights are a biasing mechanism for specifying the relative importance of target values (classes).


SVM models are automatically initialized to achieve the best average prediction across all classes. However, if the training data does not represent a realistic distribution, you can bias the model to compensate for class values that are under-represented. If you increase the weight for a class, then the percent of correct predictions for that class should increase.
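A sketch of supplying class weights through a weights table referenced by the CLAS_WEIGHTS_TABLE_NAME setting. The table name, the target_value/class_weight column names, and the weight values are assumptions to be checked against Oracle Database PL/SQL Packages and Types Reference.

-- Hypothetical weights table that biases the model toward the rare class 1.
CREATE TABLE svm_class_weights (
  target_value NUMBER,
  class_weight NUMBER);

BEGIN
  INSERT INTO svm_class_weights VALUES (0, 0.3);
  INSERT INTO svm_class_weights VALUES (1, 0.7);
  INSERT INTO svm_settings VALUES ('CLAS_WEIGHTS_TABLE_NAME', 'svm_class_weights');
  COMMIT;
END;
/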

Related Topics
• Priors and Class Weights
Learn about Priors and Class Weights in a Classification model to produce a useful result.

27.5 One-Class SVM

Oracle Data Mining uses Support Vector Machines (SVM) as the one-class classifier for anomaly detection. When SVM is used for anomaly detection, it has the classification mining function but no target. One-class SVM models, when applied, produce a prediction and a probability for each case in the scoring data. If the prediction is 1, the case is considered typical. If the prediction is 0, the case is considered anomalous. This behavior reflects the fact that the model is trained with normal data. You can specify the percentage of the data that you expect to be anomalous with the SVMS_OUTLIER_RATE build setting. If you have some knowledge that the number of "suspicious" cases is a certain percentage of your population, then you can set the outlier rate to that percentage. The model approximately identifies that many "rare" cases when applied to the general population.
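A minimal sketch of an anomaly detection build and apply: the mining function is classification, the target is NULL, and SVMS_OUTLIER_RATE expresses the expected fraction of anomalies. The table, column, and model names are illustrative.

CREATE TABLE svm_anomaly_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO svm_anomaly_settings VALUES ('ALGO_NAME', 'ALGO_SUPPORT_VECTOR_MACHINES');
  -- Expect roughly 5% of the population to be anomalous.
  INSERT INTO svm_anomaly_settings VALUES ('SVMS_OUTLIER_RATE', '0.05');
  COMMIT;

  DBMS_DATA_MINING.CREATE_MODEL (
      model_name          => 'svm_anomaly_model',
      mining_function     => 'CLASSIFICATION',
      data_table_name     => 'customer_profiles',
      case_id_column_name => 'cust_id',
      target_column_name  => NULL,               -- no target for one-class SVM
      settings_table_name => 'svm_anomaly_settings');
END;
/

-- Cases predicted as 0 are flagged as anomalous.
SELECT cust_id
FROM   customer_profiles
WHERE  PREDICTION(svm_anomaly_model USING *) = 0;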

27.6 SVM Regression

Learn how SVM uses an epsilon-insensitive loss function to solve regression problems. SVM regression tries to find a continuous function such that the maximum number of data points lie within the epsilon-wide insensitivity tube. Predictions falling within epsilon distance of the true target value are not interpreted as errors. The epsilon factor is a regularization setting for SVM regression. It balances the margin of error with model robustness to achieve the best generalization to new data.

Related Topics
• Tuning an SVM Model
Learn about configuring settings for Support Vector Machines (SVM).
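A sketch of building an SVM regression model with an explicit epsilon value; SVMS_EPSILON can normally be left for the algorithm to estimate, and the object names and the value shown are illustrative assumptions.

CREATE TABLE svm_reg_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO svm_reg_settings VALUES ('ALGO_NAME', 'ALGO_SUPPORT_VECTOR_MACHINES');
  -- Width of the epsilon-insensitivity tube; illustrative value.
  INSERT INTO svm_reg_settings VALUES ('SVMS_EPSILON', '0.1');
  COMMIT;

  DBMS_DATA_MINING.CREATE_MODEL (
      model_name          => 'svm_reg_model',
      mining_function     => 'REGRESSION',
      data_table_name     => 'house_sales',
      case_id_column_name => 'sale_id',
      target_column_name  => 'sale_price',
      settings_table_name => 'svm_reg_settings');
END;
/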

Glossary

ADP See Automatic Data Preparation.

aggregation The process of consolidating data values into a smaller number of values. For example, sales data collected on a daily basis can be totaled to the week level.

algorithm A sequence of steps for solving a problem. See data mining algorithm. The Oracle Data Mining API supports the following algorithms: MDL, Apriori, Decision Tree, k-Means, Naive Bayes, GLM, O-Cluster, Support Vector Machines, Expectation Maximization, and Singular Value Decomposition.

algorithm settings The settings that specify algorithm-specific behavior for model building.

anomaly detection The detection of outliers or atypical cases. Oracle Data Mining implements Anomaly Detection as one-class SVM.

apply The data mining operation that scores data. Scoring is the process of applying a model to new data to predict results.

Apriori The algorithm that uses frequent itemsets to calculate associations.

association A machine learning technique that identifies relationships among items.

association rules A mining function that captures co-occurrence of items among transactions. A typical rule is an implication of the form A -> B, which means that the presence of itemset A implies the presence of itemset B with certain support and confidence. The support of the rule is the ratio of the number of transactions where the itemsets A and B are present to the total number of transactions. The confidence of the rule is the ratio of the number of transactions where the itemsets A and B are present to the number of transactions where itemset A is present. Oracle Data Mining uses the Apriori algorithm for association models.

attribute An attribute is a predictor in a predictive model or an item of descriptive information in a descriptive model. Data attributes are the columns of data that are used to build a model. Data attributes undergo transformations so that they can be used as categoricals or numericals by the model. Categoricals and numericals are model attributes. See also target.

attribute importance A mining function that provides a measure of the importance of an attribute in predicting a specified target. The measure of different attributes of a training data table enables users to select the attributes that are found to be most relevant to a mining model. A smaller set of attributes results in a faster model build; the resulting model could be more accurate. Oracle Data Mining uses the Minimum Description Length to discover important attributes. Sometimes referred to as feature selection or key fields.

Automatic Data Preparation Mining models can be created with Automatic Data Preparation (ADP), which transforms the build data according to the requirements of the algorithm and embeds the transformation instructions in the model. The embedded transformations are executed whenever the model is applied to new data.

bagging Bootstrap aggregating: the technique of combining models trained independently on bootstrap samples of the data.

binning See discretization.

build data Data used to build (train) a model. Also called training data.

case All the data collected about a specific transaction or related set of values. A data set is a collection of cases. Cases are also called records or examples. In the simplest situation, a case corresponds to a row in a table.

case table A table or view in single-record case format. All the data for each case is contained in a single row. The case table may include a case ID column that holds a unique identifier for each row. Mining data must be presented as a case table.

categorical attribute An attribute whose values correspond to discrete categories. For example, state is a categorical attribute with discrete values (CA, NY, MA). Categorical attributes are either non-ordered (nominal) like state or gender, or ordered (ordinal) such as high, medium, or low temperatures.

centroid See cluster centroid.

classification A mining function for predicting categorical target values for new records using a model built from records with known target values. Oracle Data Mining supports the following algorithms for classification: Naive Bayes, Decision Tree, Generalized Linear Models, and Support Vector Machines.

clipping See trimming.

cluster centroid The vector that encodes, for each attribute, either the mean (if the attribute is numerical) or the mode (if the attribute is categorical) of the cases in the training data assigned to a cluster. A cluster centroid is often referred to as "the centroid."

clustering A mining function for finding naturally occurring groupings in data. More precisely, given a set of data points, each having a set of attributes, and a similarity measure among them, clustering is the process of grouping the data points into different clusters such that data points in the same cluster are more similar to one another and data points in different clusters are less similar to one another. Oracle Data Mining supports three algorithms for clustering: k-Means, Orthogonal Partitioning Clustering, and Expectation Maximization.

confusion matrix Measures the correctness of predictions made by a model from a test task. The row indexes of a confusion matrix correspond to actual values observed and provided in the test data. The column indexes correspond to predicted values produced by applying the model to the test data. For any pair of actual/predicted indexes, the value indicates the number of records classified in that pairing.

When predicted value equals actual value, the model produces correct predictions. All other entries indicate errors.

cost matrix An n by n table that defines the cost associated with a prediction versus the actual value. A cost matrix is typically used in classification models, where n is the number of distinct values in the target, and the columns and rows are labeled with target values. The rows are the actual values; the columns are the predicted values.

counterexample Negative instance of a target. Counterexamples are required for classification models, except for one-class Support Vector Machines.

data mining Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also known as Knowledge Discovery in Data (KDD).

A data mining model implements a data mining algorithm to solve a given type of problem for a given set of data.

data mining algorithm A specific technique or procedure for producing a data mining model. An algorithm uses a specific data representation and a specific mining function.

The algorithms supported by Oracle Data Mining are Naive Bayes, Support Vector Machines, Generalized Linear Model, and Decision Tree for classification; Support Vector Machines and Generalized Linear Model for regression; k-Means, O-Cluster and Expectation Maximization for clustering; Minimum Description Length for attribute importance; Non-Negative Matrix Factorization and Singular Value Decomposition for feature extraction; Apriori for associations, and one-class Support Vector Machines for anomaly detection.

data mining server The component of Oracle Database that implements the data mining engine and persistent metadata repository. You must connect to a data mining server before performing data mining tasks.

data set In general, a collection of data. A data set is a collection of cases.

descriptive model A descriptive model helps in understanding underlying processes or behavior. For example, an association model may describe consumer buying patterns. See also mining model.

discretization Discretization (binning) groups related values together under a single value (or bin). This reduces the number of distinct values in a column. Fewer bins result in models that build faster. Many Oracle Data Mining algorithms (for example NB) may benefit from input data that is discretized prior to model building, testing, computing lift, and applying (scoring). Different algorithms may require different types of binning. Oracle Data Mining supports supervised binning, top N frequency binning for categorical attributes and equi-width binning and quantile binning for numerical attributes.

distance-based (clustering algorithm) Distance-based algorithms rely on a distance metric (function) to measure the similarity between data points. Data points are assigned to the nearest cluster according to the distance metric used.

Decision Tree A decision tree is a representation of a classification system or supervised model. The tree is structured as a sequence of questions; the answers to the questions trace a path down the tree to a leaf, which yields the prediction.

Decision trees are a way of representing a series of questions that lead to a class or value. The top node of a decision tree is called the root node; terminal nodes are called leaf nodes. Decision trees are grown through an iterative splitting of data into discrete groups, where the goal is to maximize the distance between groups at each split.

An important characteristic of the decision tree models is that they are transparent; that is, there are rules that explain the classification.


See also rule.

equi-width binning Equi-width binning determines bins for numerical attributes by dividing the range of values into a specified number of bins of equal size.

Expectation Maximization Expectation Maximization is a probabilistic clustering algorithm that creates a density model of the data. The density model allows for an improved approach to combining data originating in different domains (for example, sales transactions and customer demographics, or structured data and text or other unstructured data).

explode For a categorical attribute, replace a multi-value categorical column with several binary categorical columns. To explode the attribute, create a new binary column for each distinct value that the attribute takes on. In the new columns, 1 indicates that the value of the attribute takes on the value of the column; 0, that it does not. For example, suppose that a categorical attribute takes on the values {1, 2, 3}. To explode this attribute, create three new columns, col_1, col_2, and col_3. If the attribute takes on the value 1, the value in col_1 is 1; the values in the other two columns are 0.

feature A combination of attributes in the data that is of special interest and that captures important characteristics of the data. See feature extraction.

See also text feature.

feature extraction Creates a new set of features by decomposing the original data. Feature extraction lets you describe the data with a number of features that is usually far smaller than the number of original attributes. See also Non-Negative Matrix Factorization and Singular Value Decomposition.

Generalized Linear Model A statistical technique for linear modeling. Generalized linear models (GLM) include and extend the class of simple linear models. Oracle Data Mining supports logistic regression for GLM classification and linear regression for GLM regression.


GLM See Generalized Linear Model.

k-Means A distance-based clustering algorithm that partitions the data into a predetermined number of clusters (provided there are enough distinct cases). Distance-based algorithms rely on a distance metric (function) to measure the similarity between data points. Data points are assigned to the nearest cluster according to the distance metric used. Oracle Data Mining provides an enhanced version of k-Means.

lift A measure of how much better prediction results are using a model than could be obtained by chance. For example, suppose that 2% of the customers mailed a catalog make a purchase; suppose also that when you use a model to select catalog recipients, 10% make a purchase. Then the lift for the model is 10/2 or 5. Lift may also be used as a measure to compare different data mining models. Since lift is computed using a data table with actual outcomes, lift compares how well a model performs with respect to this data on predicted outcomes. Lift indicates how well the model improved the predictions over a random selection given actual results. Lift allows a user to infer how a model performs on new data.

lineage The sequence of transformations performed on a data set during the data preparation phase of the model build process.

linear regression The GLM regression algorithm supported by Oracle Data Mining.

logistic regression The GLM classification algorithm supported by Oracle Data Mining.

MDL See Minimum Description Length.

min-max normalization Normalizes numerical attributes using this transformation:

x_new = (x_old-min) / (max-min)


Minimum Description Length Given a sample of data and an effective enumeration of the appropriate alternative theories to explain the data, the best theory is the one that minimizes the sum of

• The length, in bits, of the description of the theory
• The length, in bits, of the data when encoded with the help of the theory

The Minimum Description Length principle is used to select the attributes that most influence target value discrimination in attribute importance.

mining function A major subdomain of data mining that shares common high level characteristics. The Oracle Data Mining API supports the following mining functions: classification, regression, attribute importance, feature extraction, clustering, and anomaly detection.

mining model A first-class schema object that specifies a data mining model in Oracle Database.

missing value A data value that is missing at random. The value could be missing because it is unavailable, unknown, or because it was lost. Oracle Data Mining interprets missing values in columns with simple data types (not nested) as missing at random. Oracle Data Mining interprets missing values in nested columns as sparsity.

Data mining algorithms vary in the way they treat missing values. There are several typical ways to treat them: ignore them, omit any records containing missing values, replace missing values with the mode or mean, or infer missing values from existing values. See also sparse data.

model A model uses an algorithm to implement a given mining function. A model can be a supervised model or an unsupervised model. A model can be used for direct inspection, for example, to examine the rules produced from an association model, or to score data (predict an outcome). In Oracle Database, data mining models are implemented as mining model schema objects.

multi-record case Each case in the data table is stored in multiple rows. Also known as transactional data. See also single-record case.

Naive Bayes An algorithm for classification that is based on Bayes's theorem. Naive Bayes makes the assumption that each attribute is conditionally independent of the others: given a particular value of the target, the distribution of each predictor is independent of the other predictors.

nested data Oracle Data Mining supports transactional data in nested columns of name/value pairs. Multidimensional data that expresses a one-to-many relationship can be loaded into a nested column and mined along with single-record case data in a case table.

NMF See Non-Negative Matrix Factorization.

Non-Negative Matrix Factorization A feature extraction algorithm that decomposes multivariate data by creating a user-defined number of features, which results in a reduced representation of the original data.

normalization Normalization consists of transforming numerical values into a specific range, such as [–1.0,1.0] or [0.0,1.0] such that x_new = (x_old-shift)/scale. Normalization applies only to numerical attributes. Oracle Data Mining provides transformations that perform min-max normalization, scale normalization, and z-score normalization.

numerical attribute An attribute whose values are numbers. The numeric value can be either an integer or a real number. Numerical attribute values can be manipulated as continuous values. See also categorical attribute.

O-Cluster See Orthogonal Partitioning Clustering.

one-class Support Vector Machines The version of Support Vector Machines used to solve anomaly detection problems. The algorithm performs classification without a target.

Orthogonal Partitioning Clustering An Oracle proprietary clustering algorithm that creates a hierarchical grid-based clustering model, that is, it creates axis-parallel (orthogonal) partitions in the input attribute space. The algorithm operates recursively. The resulting hierarchical structure represents an irregular grid that tessellates the attribute space into clusters.

outlier A data value that does not come from the typical population of data; that is, an extreme value. In a normal distribution, outliers are typically at least three standard deviations from the mean.

positive target value In binary classification problems, you may designate one of the two classes (target values) as positive, the other as negative. When Oracle Data Mining computes a model's lift, it calculates the density of positive target values among a set of test instances for which the model predicts positive values with a given degree of confidence.

predictive model A predictive model is an equation or set of rules that makes it possible to predict an unseen or unmeasured value (the dependent variable or output) from other, known values (independent variables or input). The form of the equation or rules is suggested by mining data collected from the process under study. Some training or estimation technique is used to estimate the parameters of the equation or rules. A predictive model is a supervised model.

predictor An attribute used as input to a supervised algorithm to build a model.

prepared data Data that is suitable for model building using a specified algorithm. Data preparation often accounts for much of the time spent in a data mining project. Automatic Data Preparation greatly simplifies model development and deployment by automatically preparing the data for the algorithm.

Principal Component Analysis Principal Component Analysis is implemented as a special scoring method for the Singular Value Decomposition algorithm.

prior probabilities The set of prior probabilities specifies the distribution of examples of the various classes in the original source data. Also referred to as priors, these could be different from the distribution observed in the data set provided for model build.

priors See prior probabilities.

quantile binning A numerical attribute is divided into bins such that each bin contains approximately the same number of cases.

random sample A sample in which every element of the data set has an equal chance of being selected.

recode Literally "change or rearrange the code." Recoding can be useful in preparing data according to the requirements of a given business problem, for example:

• Missing values treatment: Missing values may be indicated by something other than NULL, such as "0000" or "9999" or "NA" or some other string. One way to treat the missing value is to recode, for example, "0000" to NULL. Then the Oracle Data Mining algorithms and the database recognize the value as missing.
• Change data type of variable: For example, change "Y" or "Yes" to 1 and "N" or "No" to 0.
• Establish a cutoff value: For example, recode all incomes less than $20,000 to the same value.
• Group items: For example, group individual US states into regions. The "New England region" might consist of ME, VT, NH, MA, CT, and RI; to implement this, recode the six states to, say, NE (for New England).

record See case.

regression A data mining function for predicting continuous target values for new records using a model built from records with known target values. Oracle Data Mining supports linear regression (GLM) and Support Vector Machines algorithms for regression.

rule An expression of the general form if X, then Y. An output of certain algorithms, such as clustering, association, and decision tree. The predicate X may be a compound predicate.

sample See random sample.

scale normalization Normalize numerical attributes using this transformation:

x_new = (x_old - 0) / (max(abs(max),abs(min)))

schema A collection of objects in an Oracle database, including logical structures such as tables, views, sequences, stored procedures, synonyms, indexes, clusters, and database links. A schema is associated with a specific database user.

score Scoring data means applying a data mining model to data to generate predictions.

settings See algorithm settings.

single-record case Each case in the data table is stored in one row. Contrast with multi-record case.

Singular Value Decomposition A feature extraction algorithm that uses orthogonal linear projections to capture the underlying variance of the data. Singular Value Decomposition scales well to very large data sizes (both rows and attributes), and has a powerful data compression capability.

SVD See Singular Value Decomposition.

sparse data Data for which only a small fraction of the attributes are non-zero or non-null in any given case. Market basket data and unstructured text data are typically sparse. Oracle Data Mining interprets nested data as sparse. See also missing value.

split Divide a data set into several disjoint subsets. For example, in a classification problem, a data set is often divided into a training data set and a test data set.

stratified sample Divide the data set into disjoint subsets (strata) and then take a random sample from each of the subsets. This technique is used when the distribution of target values is skewed greatly. For example, response to a marketing campaign may have a positive target value 1% of the time or less. A stratified sample provides the data mining algorithms with enough positive examples to learn the factors that differentiate positive from negative target values. See also random sample.

supervised binning A form of intelligent binning wherein bin boundaries are derived from important characteristics of the data. Supervised binning builds a single-predictor decision tree to find the interesting bin boundaries with respect to a target. Supervised binning can be used for numerical or categorical attributes.

supervised learning See supervised model.

supervised model A data mining model that is built using a known dependent variable, also referred to as the target. Classification and regression techniques are examples of supervised mining. See unsupervised model. Also referred to as predictive model.

Support Vector Machines An algorithm that uses machine learning theory to maximize predictive accuracy while automatically avoiding over-fit to the data. Support Vector Machines can make predictions with sparse data, that is, in domains that have a large number of predictor columns and relatively few rows, as is the case with bioinformatics data. Support Vector Machines can be used for classification, regression, and anomaly detection.

SVM See Support Vector Machines.

target In supervised learning, the identified attribute that is to be predicted. Sometimes called target value or target attribute. See also attribute.

text feature A combination of words that captures important attributes of a document or class of documents. Text features are usually keywords, frequencies of words, or other document-derived features. A document typically contains a large number of words and a much smaller number of features.

text mining Conventional data mining done using text features. Text features are usually keywords, frequencies of words, or other document-derived features. Once you derive text features, you mine them just as you would any other data. Both Oracle Data Mining and Oracle Text support text mining.

top N frequency binning This type of binning bins categorical attributes. The bin definition for each attribute is computed based on the occurrence frequency of values that are computed from the data. The user specifies a particular number of bins, say N. Each of the bins bin_1,..., bin_N corresponds to the values with top frequencies. The bin bin_N+1 corresponds to all remaining values.

training data See build data.

transactional data The data for one case is contained in several rows. An example is market basket data, in which a case represents one basket that contains multiple items. Oracle Data Mining supports transactional data in nested columns of attribute name/value pairs. See also nested data, multi-record case, and single-record case.

transformation A function applied to data resulting in a new representation of the data. For example, discretization and normalization are transformations on data.

trimming A technique for minimizing the impact of outliers. Trimming removes values in the tails of a distribution in the sense that trimmed values are ignored in further computations. Trimming is achieved by setting the tails to NULL.

unstructured data Images, audio, video, geospatial mapping data, and documents or text data are collectively known as unstructured data. Oracle Data Mining supports the mining of unstructured text data.

unsupervised learning See unsupervised model.

unsupervised model A data mining model built without the guidance (supervision) of a known, correct result. In supervised learning, this correct result is provided in the target attribute. Unsupervised learning has no such target attribute. Clustering and association are examples of unsupervised mining functions. See supervised model.

winsorizing A technique for minimizing the impact of outliers. Winsorizing involves setting the tail values of a particular attribute to some specified value. For example, for a 90% Winsorization, the bottom 5% of values are set equal to the minimum value in the 6th percentile, while the upper 5% are set equal to the maximum value in the 95th percentile.

z-score normalization Normalize numerical attributes using this transformation:

x_new = (x_old-mean) / standard_deviation
