Oracle Machine Learning
& Advanced Analytics
Data Management Platforms
Move the Algorithms; Not the Data!
Charlie Berger, MS Engineering, MBA
Sr. Director Product Management,
Machine Learning, AI and Cognitive Analytics [email protected]
www.twitter.com/CharlieDataMine
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
2
Machine Learning, Data Mining, Predictive Analytics
Automatically sift through large amounts of data to find
hidden patterns, discover new insights and make predictions
A1 A2 A3 A4 A5 A6 A7
• Identify most important factor (Attribute Importance)
• Predict customer behavior (Classification)
• Predict or estimate a value (Regression)
• Find profiles of targeted people or items (Decision Trees)
• Segment a population (Clustering)
• Find fraudulent or “rare events” (Anomaly Detection) • Determine co-occurring items in a “baskets” (Associations)
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Oracle’s Data Management and Machine Learning
Market Observations:
• Machine learning, predictive analytics & “AI” now must-have requirements
• Separate islands for data management and data science just don’t work
• Enterprises whose data science teams most rapidly extract insights and
predictions win
Conclusions:
• Must “operationalize” ML insights and predictions throughout enterprise
• Multilingual Machine Learning: SQL, R, Python, Workflow UI, Notebooks, Embed ML in Appls,
• Evolving towards combined Data Management + Machine Learning
environment that can essentially to manage and “think” about data
- Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
- 4
Oracle’s Data Management and Machine Learning
Architectural Strategy
• In-Database proprietary implementations of machine learning algorithms • Leverage strengths of the Database and adds new ML tech
– Counting, conditional probabilities, sort, rank, partition, group-by, collections, etc. – Parallel execution, bitmap indexes, partitioning, aggregations, recursion w/in parallel infrastructure, IEEE float, frequent itemsets, Automatic Data Preparation (ADP), Text processing, etc.
• Focus on intelligent ML defaults, simplification & automation to enable applications
– ADP, xforms, binning, missing values, Prediction_Details, Predictive_Queries, Model_views
• Machine learning models built via PL/SQL script; scored via SQL functions (1st class DB objects)
select cust_id
from customers
True power evident when scoring
where region = ‘US’
and prediction_probability(churnmod, ‘Y’ using *) > 0.8;
models using SQL functions, e.g.
– “Smart scan” ML model scoring “push down”; Supports OLTP and ATPC environments
• Machine Learning and Advanced Analytics are peer to rest of Oracle Data Mgmt features
– Security, Back-up, Encryption, Scalability, “Big Data” eco-system, BDA, Big Data SQL, Cloud, Spark, etc. – Best ML & analytical development and deployment platform
- Copyright © 2016 Oracle and/or its affiliates. All rights reserved.
- 5
Oracle’s Machine Learning & Advanced Analytics
Fastest Way to Deliver Enterprise-wide Predictive Analytics
Traditional ML
Oracle’s in-DB Machine Learning
Major Benefits
Data Import Data Mining
. Data remains in Database & Hadoop
. Model building and scoring occur in-database
. Use R packages with data-parallel invocations
Model “Scoring”
Data Prep. &
Transformation
avings
. Leverage investment in Oracle IT
. Eliminate data duplication
Data Mining
Model Building
. Eliminate separate analytical servers
Data Prep &
Transformation
. Deliver enterprise-wide applications
. GUI for ML/Predictive Analytics & code gen
. R interface leverages database as HPC engine
Model “Scoring”
Embedded Data Prep Model Building
Data Extraction
Data Preparation
Secs, Mins or Hours
Hours, Days or Weeks
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Oracle’s Machine Learning & Adv. Analytics Algorithms
- CLASSIFICATION
- REGRESSION
- FEATURE EXTRACTION
- – Naïve Bayes
- – Linear Model
- – Principal Comp Analysis (PCA)
– Non-negative Matrix Factorization – Singular Value Decomposition (SVD) – Explicit Semantic Analysis (ESA)
– Logistic Regression (GLM) – Decision Tree
– Generalized Linear Model – Support Vector Machine (SVM) – Stepwise Linear regression
– Neural Network
– Random Forest
– Neural Network
TEXT MINING SUPPORT
– Support Vector Machine – Explicit Semantic Analysis
– LASSO
– Algorithms support text type – Tokenization and theme extraction – Explicit Semantic Analysis (ESA) for
document similarity
A1 A2 A3 A4 A5 A6 A7
ATTRIBUTE IMPORTANCE
CLUSTERING
– Minimum Description Length
– Principal Comp Analysis (PCA)
– Unsupervised Pair-wise KL Div – CUR decomposition for row & AI
– Hierarchical K-Means
– Hierarchical O-Cluster – Expectation Maximization (EM)
STATISTICAL FUNCTIONS
– Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.
- ANOMALY DETECTION
- ASSOCIATION RULES
- – One-Class SVM
- – A priori/ market basket
- TIME SERIES
- PREDICTIVE QUERIES
- R PACKAGES
– State of the art forecasting using
Exponential Smoothing.
- – Predict, cluster, detect, features
- – CRAN R Algorithm Packages
through Embedded R Execution
– Spark MLlib algorithm integration
SQL ANALYTICS
– Includes all popular models e.g. Holt-Winters with trends,
seasons, irregularity, missing data
– SQL Windows, SQL Patterns,
SQL Aggregates
EXPORTABLE ML MODELS
• OAA (Oracle Data Mining + Oracle R Enterprise) and ORAAH combined
– REST APIs for deployment
• OAA includes support for Partitioned Models, Transactional, Unstructured, Geo-spatial, Graph data. etc,
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Multiple Data Scientist User Roles Supported
Oracle’s Machine Learning/Advanced Analytics
Additional relevant data
Historical or Current Data to
be “scored” for predictions and “engineered features”
Oracle Database 12c
- Historical data
- Assembled
- Predictions & Insights
historical data
Sensor data, Text, unstructured data, transactional data, spatial data, etc.
- DBAs
- Application Developers
- OML Notebook Users
- R, Python Users, Data Scientists
- Data Analysts, Citizen Data Scientists
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
OAA Model Build and Real-time SQL Apply Prediction
Simple SQL Syntax
ML Model Build (PL/SQL)
begin dbms_data_mining.create_model('BUY_INSUR1', 'CLASSIFICATION',
'CUST_INSUR_LTV', 'CUST_ID', 'BUY_INSURANCE', CUST_INSUR_LTV_SET'); end;
/
Model Apply (SQL query)
Select prediction_probability(BUY_INSUR1, 'Yes'
USING 3500 as bank_funds, 825 as checking_amount, 400 as credit_balance, 22 as age, 'Married' as marital_status, 93 as
MONEY_MONTLY_OVERDRAWN, 1 as house_ownership)
from dual;
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Fraud Prediction Demo
Automated In-DB Analytical Methodology
drop table CLAIMS_SET; exec dbms_data_mining.drop_model('CLAIMSMODEL');
create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES'); insert into CLAIMS_SET values ('PREP_AUTO','ON'); commit;
begin dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',
'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
end; /
Automated Monthly “Application”! Just
-- Top 5 most suspicious fraud policy holder claims select * from (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud, rank() over (order by prob_fraud desc) rnk from
(select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud from CLAIMS
add:
Create
View CLAIMS2_30 As Select * from CLAIMS2 Where mydate > SYSDATE – 30 where PASTNUMBEROFCLAIMS in ('2to4', 'morethan4'))) where rnk <= 5 order by percent_fraud desc;
Time measure: set timing on;
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
ML Model Deployment for Real-Time Scoring
Real-Time Scoring, Predictions and Recommendations
• On-the-fly, single record apply with new data (e.g. from call center)
Oracle Cloud
Select prediction_probability(CLAS_DT_1_15, 'Yes'
USING 7800 as bank_funds, 125 as checking_amount, 20 as credit_balance, 55 as age, 'Married' as marital_status, 250 as MONEY_MONTLY_OVERDRAWN, 1 as house_ownership)
from dual;
Social Media
Likelihood to respond:
Call Center
Branch
Office
Get Advice
R
Mobile
Web
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Manage and Analyze All Your Data
Data Scientists, R Users, Citizen Data Scientists
Architecturally,
Many Options
and Flexibility
SQL / R
Boil down the Data Lake
“Engineered Features”
Big Data SQL / R
– Derived attributes that reflect domain
knowledge—key to
best models e.g.:
• Counts
Object
Store
• Totals • Changes
over time
- Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
- 12
OAA Algorithm Performance
- Algorithm
- Time (sec.)
- Number of Attributes
SVM SGD Solver GLM Classification SVM IPM Solver
GLM Regression
K-Means
83s
180s 811s
59s
900
9000
900
900
- 168s
- 9000
• In-memory 640 million rows, Airline On-Time dataset • SPARC M8-2 hardware • Courtesy of the SPARC Performance Team
Copyright © 2018 Oracle and/or its affiliates. All rights reserved. |
OAA Algorithm Scalability: Rows and Parallelism
Random Forest
200 180
160
140 120 100
80
60 40 20
0
- 100
- 150
- 200
- 250
- 300
- 350
- 400
- 450
- 500
- 550
Row Count (M records)
• X6-2 hardware • Benchmarks from Oracle Labs, Systems
Research Group
- Copyright © 2018 Oracle and/or its affiliates. All rights reserved. |
- 14
Operationalizing and Embedding Analytics for Action
How long does it take to put a defined model into operational use?
?
?
Length of time to put a model into production.
?
Based on 141 respondents who
stated they are doing this today
- Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
- 15
The Core Ingredients of Good Machine Learning
- Domain Knowledge + Data Machine Learning Algorithms
- Insights, Predictions
Machine Learning Algorithms
A1 A2 A3 A4 A5
A1 A2 A3 A4 A5 A6 A7
A6 A7
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Deployment!!
Most Important Factor in Machine Learning?
Machine Learning Algorithms
A “Thinking”
Database
A1 A2 A3 A4 A5 A6 A7
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Introducing:
Oracle Machine Learning SQL Notebook
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Introducing Oracle Autonomous Data Warehouse Cloud
Value Proposition
- Easy
- Fast
- Elastic
• Provision a data warehouse in as little
- • Up to 14x performance advantage than
- • Only Pay for What you Use with user
defined sizing, on-demand scaling & idle shut-off
as 15-seconds
• Automated management of database
Redshift1
• High concurrency supports multi-user
administration
• Simple Load and Go with Automated
access and workloads
• Based on Exadata for extreme
• Independent scaling of compute and
storage
Tuning
- performance
- • Instant scaling with zero downtime
• Dedicated cloud-ready migration tools including Redshift
- Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
- 19
Confidential – Oracle
Oracle Autonomous Data Warehouse Cloud Key Features
Highly Elastic
High-Performance Queries
Independently scale compute and storage, without having to overpay for fixed blocks of resources
and Concurrent Workloads
Optimized query performance with
preconfigured resource profiles for different types of users
Oracle Machine Learning
Oracle SQL
Built-in Web-Based SQL ML Tool
Apache Zeppelin Oracle Machine Learning
notebooks ready to run ML from browser
Autonomous DW Cloud is compatible with all business analytics tools that support
Oracle Database
Database migration utility
Self Driving
Dedicated cloud-ready migration tools
for easy migration from Amazon
Fully automated database for self-tuning patching and upgrading itself while the system is running
Redshift, SQL Server and other databases
Cloud-Based Data Loading
Fast, scalable data-loading from Oracle Object Store, AWS S3, or on-premises
Enterprise Grade Security
Data is encrypted by default in the cloud, as well as in transit and at rest
- Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
- 20
Confidential – Oracle
ADWC + Oracle Machine Learning Notebooks
Developer
Analytics
Autonomous Data Warehouse Cloud
Tools
Data Warehouse Services
(EDWs, DW, departmental marts and sandboxes)
- Service Management
- Built-in Access Tools
Oracle Exadata Cloud Service
Oracle Analytics Cloud
Oracle SQL Developer
Oracle Database Cloud Service
Service Console
Oracle Machine Learning
Express Cloud Service
Data
Integration
Services
Oracle Data Integration
Platform Cloud
Autonomous Database Cloud
3rd Party DI on
Oracle Cloud Compute
Oracle Object Storage Cloud
Flat Files and Staging
3rd Party DI On-premises
- Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
- 21
Oracle Machine Learning
Machine Learning Notebook for Autonomous Data Warehouse Cloud
Key Features
• Collaborative UI for data scientists
– Packaged with Autonomous Data
Warehouse Cloud (V1)
– Easy access to shared notebooks, templates, permissions, scheduler, etc.
– SQL ML algorithms API (V1)
– Supports deployment of ML analytics
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning
Machine Learning Notebook for Autonomous Data Warehouse Cloud
Key Features
• Collaborative UI for data scientists
– Packaged with Autonomous Data
Warehouse Cloud (V1)
– Easy access to shared notebooks, templates, permissions, scheduler, etc.
– SQL ML algorithms API (V1)
– Supports deployment of ML analytics
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Oracle’s Machine Learning & Adv. Analytics Algorithms
- CLASSIFICATION
- REGRESSION
- FEATURE EXTRACTION
- – Naïve Bayes
- – Linear Model
- – Principal Comp Analysis (PCA)
– Non-negative Matrix Factorization – Singular Value Decomposition (SVD) – Explicit Semantic Analysis (ESA)
– Logistic Regression (GLM) – Decision Tree
– Generalized Linear Model – Support Vector Machine (SVM) – Stepwise Linear regression
– Neural Network
– Random Forest
– Neural Network
STATISTICAL FUNCTIONS
– Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.
– Support Vector Machine – Explicit Semantic Analysis
A1 A2 A3 A4 A5 A6 A7
ATTRIBUTE IMPORTANCE
– Minimum Description Length
– Principal Comp Analysis (PCA)
– Unsupervised Pair-wise KL Div
CLUSTERING
– Hierarchical K-Means
– Hierarchical O-Cluster – Expectation Maximization (EM)
ASSOCIATION RULES
ANOMALY DETECTION
– A priori/ market basket
– One-Class SVM