Introducing Oracle Machine Learning for Python
Mark Hornick - Senior Director, Data Science and Machine Learning at Oracle
www.twitter.com/MarkHornick

Future and past TechCasts:

Submit a topic to share at https://analyticsanddatasummit.org/techcasts/

Analytics & Data Oracle User Community
Same great technical content…new name!
www.andouc.org

Save the Date
TechCast Days - Winter Session
January 26-28, 2021
Watch our website & social media channels for more details

Share your knowledge, expertise and ideas! Submit your presentation by going to our website and clicking on “TechCasts”

Oracle Machine Learning for Python
Introduction

Mark Hornick

Senior Director, Data Science and Machine Learning

November 19, 2020

Copyright © 2020 Oracle and/or its affiliates.

Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.

What is Python?

An interpreted, object-oriented, high level, general purpose programming language
Designed for rapid application development and scripting to connect existing components
Created in the late 1980s and first released in 1991
Open source: https://www.python.org
World-wide usage
• Widely taught in Universities
• Many Data Scientists know and use Python
Thousands of open source packages to enhance productivity

Traditional Python and Data Source Interaction

Ways to reach the data:
• Read/Write files using built-in tool capabilities (flat files: extract / export from the data source, read into Python, then export and load back)
• Data source connectivity packages, e.g., cx_Oracle

Issues:
• Access latency
• Paradigm shift: Python → Data Access Language → Python
• Memory limitation – data size, in-memory processing
• Single threaded
• Issues for backup, recovery, security
• Ad hoc production deployment (e.g., ad hoc cron job)
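The flat-file round trip described above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the data source, and the `flights` and `mean_delay` tables and their values are made up. It shows where the data leaves the database, where it sits in client memory, and where results are loaded back.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Stand-in data source (assumption): in-memory SQLite instead of a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (dest TEXT, delay REAL)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("SFO", 10.0), ("SFO", 20.0), ("BOS", 5.0)],
)

with tempfile.TemporaryDirectory() as tmp:
    flat_file = Path(tmp) / "flights.csv"

    # 1. Extract / export: every row leaves the data source.
    with flat_file.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["dest", "delay"])
        writer.writerows(conn.execute("SELECT dest, delay FROM flights"))

    # 2. Read: the whole file is held in client memory.
    with flat_file.open(newline="") as fh:
        rows = list(csv.DictReader(fh))

# 3. Process in a single Python thread.
delays = {}
for row in rows:
    delays.setdefault(row["dest"], []).append(float(row["delay"]))
mean_delay = {dest: sum(v) / len(v) for dest, v in delays.items()}

# 4. Load: results are written back to the data source.
conn.execute("CREATE TABLE mean_delay (dest TEXT, delay REAL)")
conn.executemany("INSERT INTO mean_delay VALUES (?, ?)", mean_delay.items())
loaded = dict(conn.execute("SELECT dest, delay FROM mean_delay"))
print(loaded)
```

Each numbered step is a place where latency, memory limits, or ad hoc scripting can creep in as the data grows.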

Oracle Machine Learning

• OML4SQL: SQL API
• OML4R: R API
• OML4Py*: Python API
• OML4Spark: R API on Big Data
• OML Notebooks: with Apache Zeppelin on Autonomous Database
• Oracle Data Miner: Oracle SQL Developer extension
• OML AutoML UI*: Code-free AutoML interface on Autonomous Database
• OML Services*: Model Deployment and Management, Cognitive Text

* Coming soon

Oracle Machine Learning Notebooks
Autonomous Database as a Data Science Platform

Collaborative UI
• Based on Apache Zeppelin
• Supports data scientists, data analysts, application developers, DBAs with SQL and Python
• Easy notebook sharing
• Permissions, versioning, and scheduling of notebooks
Included with Autonomous Database
• Automatically provisioned and managed
• In-database algorithms and analytics functions
• Explore and prepare, build and evaluate models, score data, deploy solutions

Oracle Machine Learning for Python
Supported in Oracle Autonomous Database with OML Notebooks

Use Oracle Database as HPC environment
• Explore, transform, and analyze data faster and at scale
Use in-database parallelized and distributed ML algorithms
• Build more models on more data, and score large volume data – faster
• Use in-database algorithms from OML4SQL via natural Python API
• Increased productivity from automatic data preparation, partitioned models, and integrated text mining capabilities
Execute Python scripts and manage Python objects in-database
• Collaborate: hand-off data science products from data scientist to developers easily
• Run user-defined functions in data-parallel, task-parallel, and non-parallel fashion
• Return structured and image results in Python and REST API
New automatic machine learning (AutoML) and model explainability (MLX)
• Enhance data scientist productivity and enable non-experts to use and benefit from machine learning
• Algorithm selection, feature selection, hyperparameter tuning, model selection
• Model-agnostic identification of important features that impact model predictions

[Diagram labels: OML Notebooks, REST Interface, OML4Py]

Transparency Layer
In-database performance – indexes, query optimization, parallelism, partitioning

Leverages proxy objects for database data: oml.DataFrame
• # Create table from Pandas DataFrame data
    DATA = oml.create(data, table = 'BOSTON')
• # Get proxy object to DB table boston
    DATA = oml.sync(table = 'BOSTON')
Uses familiar Python syntax to manipulate database data
Overloads Python functions translating functionality to SQL

    DATA.shape
    DATA.head()
    DATA.describe()
    DATA.std()
    DATA.skew()
    TRAIN, TEST = DATA.split()
    TRAIN.shape
    TEST.shape
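To make the proxy-object idea concrete, here is a toy sketch, not OML4Py's implementation. The `TableProxy` class is hypothetical, SQLite stands in for Oracle Database, and the `BOSTON` columns and values are made up. The object stores only a table name, and each Python call is translated into SQL that the database executes.

```python
import sqlite3


class TableProxy:
    """Toy proxy (hypothetical): holds a table name, never the table's rows."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    @property
    def shape(self):
        # The row count is computed by the database, not by fetching rows.
        n_rows = self.conn.execute(
            f"SELECT COUNT(*) FROM {self.table}").fetchone()[0]
        n_cols = len(self.conn.execute(
            f"PRAGMA table_info({self.table})").fetchall())
        return (n_rows, n_cols)

    def head(self, n=5):
        # Only the first n rows are returned to Python.
        return self.conn.execute(
            f"SELECT * FROM {self.table} LIMIT ?", (n,)).fetchall()

    def mean(self, column):
        # The aggregate runs in the database; one number comes back.
        return self.conn.execute(
            f"SELECT AVG({column}) FROM {self.table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOSTON (MEDV REAL, RM REAL)")
conn.executemany("INSERT INTO BOSTON VALUES (?, ?)",
                 [(20.0, 6.0), (30.0, 6.5), (40.0, 7.0)])

DATA = TableProxy(conn, "BOSTON")
print(DATA.shape, DATA.head(2), DATA.mean("MEDV"))
```

The caller writes ordinary Python, while the volume of data moved to the client stays small regardless of table size.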

In-database scalable aggregation
Example using the crosstab function

    ONTIME_S = oml.sync(table="ONTIME_S")
    res = ONTIME_S.crosstab('DEST')
    type(res)
    res.head()

Source data is a DataFrame, ONTIME_S, which is an Oracle Database table
The crosstab() function is overloaded to accept OML DataFrame objects and transparently generates SQL for scalable processing in Oracle Database:

    select DEST, count(*) from ONTIME_S group by DEST

Returns an 'oml.core.frame.DataFrame' object

[Diagram labels: OML Notebooks, OML4Py, Oracle Autonomous Database, In-db stats, User tables]
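The SQL that the slide associates with crosstab('DEST') can be tried on its own. The sketch below runs that query against an in-memory SQLite table as a stand-in for Oracle Database. The rows and the `ARRDELAY` column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ONTIME_S (DEST TEXT, ARRDELAY REAL)")
conn.executemany(
    "INSERT INTO ONTIME_S VALUES (?, ?)",
    [("SFO", 3.0), ("BOS", 12.0), ("SFO", -4.0), ("ORD", 0.0), ("SFO", 7.0)],
)

# The query from the slide: the grouping and counting happen in the database,
# and only one row per destination is returned to Python.
sql = "select DEST, count(*) from ONTIME_S group by DEST"
counts = dict(conn.execute(sql))
print(counts)
```

The result has one entry per distinct destination, so its size depends on the number of groups rather than the number of rows in the table.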

OML4Py 1.0
Machine Learning in-database algorithms

Classification
• Decision Tree
• Naïve Bayes
• Generalized Linear Model
• Support Vector Machine
• Random Forest
• Neural Network

Regression
• Generalized Linear Model
• Neural Network
• Support Vector Machine

Clustering
• Hierarchical k-Means
• Expectation Maximization

Attribute Importance
• Minimum Description Length

Anomaly Detection
• 1 Class Support Vector Machine

Association Rules
• Apriori – Association Rules

Feature Extraction
• Singular Value Decomposition
• Explicit Semantic Analysis
• Principal Component Analysis via SVD

Supports automatic data preparation, partitioned model ensembles, integrated text mining

Scalable in-database algorithms
Example using Support Vector Machine

    import oml  # needed for oml.sync below
    from oml import svm

    # create proxy object
    ONTIME_S = oml.sync(table='ONTIME_S')

    # define model object
    settings = {'svms_outlier_rate' : 0.01}
    svm_mod = svm('anomaly_detection',
                  svms_kernel_function = 'dbms_data_mining.svms_linear',
                  **settings)

    # build anomaly detection model
    svm_mod = svm_mod.fit(x=ONTIME_S, y=None)

    # view model object
    svm_mod

[Diagram labels: OML Notebooks, OML4Py, Oracle Autonomous Database, User tables]

Use matplotlib visualization with in-database model results
Example using OML Notebooks with in-database clustering model build and score

Drop existing model

Build k-Means model

Score using model

Embedded Python Execution
Example of parallel partitioned data flow using third party package
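The partition-then-apply data flow named in this slide's title can be pictured with a local, plain-Python analogy. This is a sketch under stated assumptions, not how the database does it. `group_apply_local` is a hypothetical helper, a thread pool stands in for the Python engines the database spawns, a hand-written least-squares fit stands in for the third-party package, and the rows are made up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Made-up rows: (SPECIES, PETAL_WIDTH, PETAL_LENGTH).
rows = [
    ("setosa", 0.2, 1.4), ("setosa", 0.4, 1.8), ("setosa", 0.3, 1.6),
    ("virginica", 1.8, 5.5), ("virginica", 2.2, 6.3), ("virginica", 2.0, 5.9),
]


def build_lm(part):
    """Fit PETAL_LENGTH = intercept + slope * PETAL_WIDTH by least squares."""
    xs = [r[1] for r in part]
    ys = [r[2] for r in part]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": my - slope * mx}


def group_apply_local(data, key, func, parallel=1):
    """Hypothetical helper: split rows by key, then run func on each partition."""
    parts = defaultdict(list)
    for row in data:
        parts[key(row)].append(row)
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        results = pool.map(func, parts.values())
        return dict(zip(parts.keys(), results))


mods = group_apply_local(rows, key=lambda r: r[0], func=build_lm, parallel=2)
for species, model in mods.items():
    print(species, model)
```

The result is one model per partition value, keyed by that value, which is the same shape of output the slide's example retrieves with mods.pull().items().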

    # user-defined function using sklearn
    def build_lm(dat):
        from sklearn import linear_model
        lm = linear_model.LinearRegression()
        X = dat[['PETAL_WIDTH']]
        y = dat[['PETAL_LENGTH']]
        lm.fit(X, y)
        return lm

    # select column(s) for partitioning data
    index = oml.DataFrame(IRIS['SPECIES'])

    # invoke function in parallel on IRIS table
    mods = oml.group_apply(IRIS, index,
                           func=build_lm,
                           parallel=2)
    mods.pull().items()

[Diagram labels: REST Interface, OML Notebooks, OML4Py, Oracle Autonomous Database, User tables; the database spawns Python engines, each with OML4Py]

REST Interface for Embedded Python Execution
py_scripts for executing user-defined functions (Python “scripts”)

/oml/tenants/////py-scripts/v1///

URL components:
• Cloud service within ADB URL
• Name of tenant
• Customer pluggable database name
• Endpoint: do-eval, table-apply, group-apply, index-apply, row-apply
• Name of script in repository

Example synchronous invocation from cURL

    $ curl -X POST \
        --header "Authorization: Bearer ${token}" \
        --header 'Content-Type: application/json' \
        --header 'Accept: application/json' \
        -d '{"graphicsFlag":true, "service":"MEDIUM"}' \
        "/oml/tenants/MYTENANT/databases/MYADW/api/py-scripts/v1/RandomRedDots/do-eval"

Asynchronous invocation also available
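The same synchronous call can be assembled from Python using only the standard library. The sketch below builds the request without sending it. `BASE_URL` is a hypothetical placeholder, because the slide's URL omits the service host, and `build_do_eval_request` is an illustrative helper name rather than part of any Oracle client.

```python
import json
import urllib.request

# Hypothetical placeholder: the slide does not show the service host.
BASE_URL = "https://oml-host.example.invalid"
PATH = ("/oml/tenants/MYTENANT/databases/MYADW/api/py-scripts/v1/"
        "RandomRedDots/do-eval")


def build_do_eval_request(token, graphics_flag=True, service="MEDIUM"):
    """Build (but do not send) the POST request shown in the cURL example."""
    body = json.dumps({"graphicsFlag": graphics_flag, "service": service})
    return urllib.request.Request(
        BASE_URL + PATH,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


req = build_do_eval_request("my-token")
print(req.get_method(), req.full_url)
# Sending is left out on purpose; urllib.request.urlopen(req) needs a real endpoint.
```

Keeping request construction separate from sending makes the headers and JSON body easy to inspect before pointing the code at a real service.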

AutoML – new with OML4Py
Increase data scientist productivity – reduce overall compute time

[Diagram: Data Table → Auto Algorithm Selection (much faster than exhaustive search) → Auto Feature Selection (de-noise data and reduce # of features) → Auto Model Tuning (significant accuracy improvement) → ML Model]

Auto Algorithm Selection
– Identify in-database algorithm that achieves highest model quality
– Find best algorithm faster than with exhaustive search

Auto Feature Selection
– Reduce # of features by identifying most predictive
– Improve performance and accuracy

Auto Model Tuning
– Automatic tuning of algorithm hyperparameters
– Avoid manual or exhaustive search techniques

Enables non-expert users to leverage Machine Learning
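As a rough picture of what "identifying the most predictive features" can mean, the sketch below ranks features by the absolute Pearson correlation with the target and keeps the top k. This is a simple filter method chosen for illustration, not the technique OML4Py's AutoML uses. The helper names and the data are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def select_features(features, target, k):
    """Keep the k feature names most strongly correlated with the target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Made-up data: two informative columns and one noisy column.
features = {
    "signal": [1.0, 2.0, 3.0, 4.0, 5.0],
    "inverse": [5.0, 4.1, 2.9, 2.0, 1.2],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8]

top = select_features(features, target, k=2)
print(top)
```

Dropping the weakly related column leaves fewer inputs for the model build that follows, which is the motivation the slide gives for automatic feature selection.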

Demo

Summary – OML4Py

Python access to Oracle Machine Learning in Autonomous Database
• Scalable data exploration, preparation, and analysis
• Scalable in-database machine learning
• Automation for greater data scientist productivity and non-expert use

Extends Python for enterprise use
• In-database performance and scalability
• Platform for application integration
• Simplified production deployment of data science solutions

Helpful Links

ORACLE AUTONOMOUS CLOUD – ALWAYS FREE TIER https://cloud.oracle.com/tryit

ORACLE MACHINE LEARNING ON OTN https://www.oracle.com/machine-learning

OML TUTORIALS
Interactive tour: https://docs.oracle.com/en/cloud/paas/autonomous-database/oml-tour
Basic getting started: https://docs.oracle.com/en/cloud/paas/autonomous-data-warehouse-cloud/omlug/get-started-oracle-machine-learning.html

OML OFFICE HOURS https://asktom.oracle.com/pls/apex/asktom.search?office=6801#sessionss

ORACLE ANALYTICS CLOUD https://www.oracle.com/solutions/business-analytics/data-visualization/examples.html

Thank You

Mark Hornick
@MarkHornick