Introducing Oracle Machine Learning for Python
Mark Hornick, Senior Director, Data Science and Machine Learning at Oracle
[email protected], www.twitter.com/MarkHornick
Future and past TechCasts:
Submit a topic to share at https://analyticsanddatasummit.org/techcasts/
Analytics & Data Oracle User Community
Same great technical content…new name! www.andouc.org
Save the Date: TechCast Days, Winter Session, January 26-28, 2021
Watch our website & social media channels for more details
Share your knowledge, expertise and ideas! Submit your presentation by going to our website and clicking on “TechCasts”
Oracle Machine Learning for Python
Introduction
Mark Hornick
Senior Director, Data Science and Machine Learning
November 19, 2020
Copyright © 2020 Oracle and/or its affiliates.

Safe Harbor
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.

What is Python?
An interpreted, object-oriented, high-level, general-purpose programming language
Designed for rapid application development and scripting to connect existing components
Created in the late 1980s and first released in 1991
Open source: https://www.python.org
World-wide usage
• Widely taught in universities
• Many data scientists know and use Python
Thousands of open source packages to enhance productivity

Traditional Python and Data Source Interaction
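Two ways of moving data between Python and a data source:
• Read/write flat files using built-in tool capabilities (extract/export from the data source, then read into Python; export from Python, then load into the data source)
• Data source connectivity packages, e.g., cx_Oracle
[Diagram labels: Flat Files, Data Source, Deployment, Ad hoc cron job]

Issues with this approach:
• Access latency
• Paradigm shift: Python → Data Access Language → Python
• Memory limitation – data size, in-memory processing
• Single threaded
• Issues for backup, recovery, security
• Ad hoc production deployment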

Oracle Machine Learning
• OML4SQL: SQL API
• OML Notebooks: with Apache Zeppelin on Autonomous Database
• OML4R: R API
• Oracle Data Miner: Oracle SQL Developer extension
• OML4Py*: Python API
• OML4Spark: R API on Big Data
• OML AutoML UI*: Code-free AutoML interface on Autonomous Database
• OML Services*: Model Deployment and Management, Cognitive Text
* Coming soon

Oracle Machine Learning Notebooks
Autonomous Database as a Data Science Platform
Collaborative UI
• Based on Apache Zeppelin
• Supports data scientists, data analysts, application developers, DBAs with SQL and Python
• Easy notebook sharing
• Permissions, versioning, and scheduling of notebooks
Included with Autonomous Database
• Automatically provisioned and managed
• In-database algorithms and analytics functions
• Explore and prepare, build and evaluate models, score data, deploy solutions

Oracle Machine Learning for Python
Supported in Oracle Autonomous Database with OML Notebooks
Use Oracle Database as HPC environment
• Explore, transform, and analyze data faster and at scale
Use in-database parallelized and distributed ML algorithms
• Build more models on more data, and score large volume data – faster
• Use in-database algorithms from OML4SQL via natural Python API
• Increased productivity from automatic data preparation, partitioned models, and integrated text mining capabilities
Execute Python scripts and manage Python objects in-database
• Collaborate: hand-off data science products from data scientist to developers easily
• Run user-defined functions in data-parallel, task-parallel, and non-parallel fashion
• Return structured and image results in Python and REST API
New automatic machine learning (AutoML) and model explainability (MLX)
• Enhance data scientist productivity and enable non-experts to use and benefit from machine learning
• Algorithm selection, feature selection, hyperparameter tuning, model selection
• Model-agnostic identification of important features that impact model predictions
[Diagram labels: OML Notebooks, REST Interface, OML4Py]

Transparency Layer
In-database performance – indexes, query optimization, parallelism, partitioning
Leverages proxy objects for database data: oml.DataFrame
• # Create table from Pandas DataFrame data
  DATA = oml.create(data, table = 'BOSTON')
• # Get proxy object to DB table boston
  DATA = oml.sync(table = 'BOSTON')
Uses familiar Python syntax to manipulate database data
Overloads Python functions translating functionality to SQL

    DATA.shape
    DATA.head()
    DATA.describe()
    DATA.std()
    DATA.skew()
    TRAIN, TEST = DATA.split()
    TRAIN.shape
    TEST.shape
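
The proxy-object idea can be illustrated outside Oracle with a small sketch. This is not how oml.DataFrame is implemented; it assumes an in-memory SQLite database as a stand-in, a hypothetical TableProxy class, and a few made-up rows. The point is only that shape, head() and an aggregate can each be answered by a SQL statement, so the full table never has to be loaded into Python memory.

```python
import sqlite3


class TableProxy:
    """Hypothetical stand-in for a database proxy object (not oml.DataFrame)."""

    def __init__(self, conn, table):
        self.conn = conn
        # Trusted identifier in this sketch; never interpolate user input.
        self.table = table

    @property
    def shape(self):
        # The row count is computed by the database, not by fetching rows.
        (n_rows,) = self.conn.execute(
            f"SELECT COUNT(*) FROM {self.table}").fetchone()
        # LIMIT 0 returns no rows but still exposes the column metadata.
        cursor = self.conn.execute(f"SELECT * FROM {self.table} LIMIT 0")
        return (n_rows, len(cursor.description))

    def head(self, n=5):
        # Only n rows cross the connection.
        return self.conn.execute(
            f"SELECT * FROM {self.table} LIMIT ?", (n,)).fetchall()

    def mean(self, column):
        # The Python method call is translated into a SQL aggregate.
        (value,) = self.conn.execute(
            f"SELECT AVG({column}) FROM {self.table}").fetchone()
        return value


# Made-up toy rows; the column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOSTON (RM REAL, MEDV REAL)")
conn.executemany("INSERT INTO BOSTON VALUES (?, ?)",
                 [(6.5, 24.0), (6.4, 21.6), (7.1, 34.7), (6.9, 33.4)])

DATA = TableProxy(conn, "BOSTON")
print(DATA.shape)
print(DATA.head(2))
print(DATA.mean("MEDV"))
```

Each attribute access issues one small query, which is the trade the transparency layer makes: more round trips, but no bulk data movement to the client.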

In-database scalable aggregation
Example using the crosstab function
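    import oml

    # proxy object for the ONTIME_S table
    ONTIME_S = oml.sync(table="ONTIME_S")
    # count rows per destination; the work is done in the database
    res = ONTIME_S.crosstab('DEST')
    type(res)
    res.head()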
Source data is a DataFrame, ONTIME_S, which is an Oracle Database table
crosstab() function overloaded to accept OML DataFrame objects and transparently generates SQL for scalable processing in Oracle Database:
    select DEST, count(*) from ONTIME_S group by DEST
Returns an ‘oml.core.frame.DataFrame’ object
[Diagram labels: OML Notebooks, OML4Py, Oracle Autonomous Database, In-db stats, User tables]
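
To make the benefit of the generated SQL concrete, the sketch below runs the same group-by statement against a toy table and compares it with counting on the client. It assumes SQLite in place of Oracle Database, and the rows and the ARRDELAY column are invented for illustration.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ONTIME_S (DEST TEXT, ARRDELAY INTEGER)")
# Made-up rows for illustration only.
rows = [("SFO", 5), ("BOS", 12), ("SFO", -3),
        ("JFK", 40), ("SFO", 0), ("BOS", 7)]
conn.executemany("INSERT INTO ONTIME_S VALUES (?, ?)", rows)

# In-database: one result row per distinct DEST crosses the connection.
in_db = dict(conn.execute(
    "select DEST, count(*) from ONTIME_S group by DEST"))

# Client-side: every row is fetched, then counted in Python memory.
client_side = Counter(
    dest for (dest,) in conn.execute("select DEST from ONTIME_S"))

print(in_db)
print(in_db == dict(client_side))
```

Both routes give the same counts; the difference is how much data leaves the database, which grows with the table in the client-side case and with the number of distinct destinations in the in-database case.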

OML4Py 1.0
Machine Learning in-database algorithms
Classification
• Decision Tree
• Naïve Bayes
• Generalized Linear Model
• Support Vector Machine
• Random Forest
• Neural Network
Regression
• Generalized Linear Model
• Neural Network
• Support Vector Machine
Clustering
• Expectation Maximization
• Hierarchical k-Means
Attribute Importance
• Minimum Description Length
Anomaly Detection
• 1 Class Support Vector Machine
Association Rules
• Apriori – Association Rules
Feature Extraction
• Singular Value Decomposition
• Explicit Semantic Analysis
• Principal Component Analysis via SVD
Supports automatic data preparation, partitioned model ensembles, integrated text mining

Scalable in-database algorithms
Example using Support Vector Machine

    from oml import svm
    import oml

    # create proxy object
    ONTIME_S = oml.sync(table='ONTIME_S')
    # define model object
    settings = {'svms_outlier_rate' : 0.01}
    svm_mod = svm('anomaly_detection',
                  svms_kernel_function = 'dbms_data_mining.svms_linear',
                  **settings)
    # build anomaly detection model (no target column, so y=None)
    svm_mod = svm_mod.fit(x=ONTIME_S, y=None)
    # view model object
    svm_mod

[Diagram labels: OML Notebooks, OML4Py, Oracle Autonomous Database, User tables]

Use matplotlib visualization with in-database model results
Example using OML Notebooks with in-database clustering model build and score
[Notebook screenshot steps: Drop existing model; Build k-Means model; Score using model]

Embedded Python Execution
Example of parallel partitioned data flow using third party package
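
Before the OML4Py version, it may help to see the partition-then-apply pattern in plain Python. The sketch below is an assumption-laden stand-in, not oml.group_apply: the group_apply helper here is hypothetical, it uses threads in one process where the slide describes database-spawned Python engines, the rows are synthetic rather than real iris measurements, and a hand-written least-squares fit replaces sklearn so the block has no third-party dependencies.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Synthetic rows (not real iris data); each group lies exactly on a line.
IRIS = [
    {"SPECIES": "A", "PETAL_WIDTH": 1.0, "PETAL_LENGTH": 3.0},
    {"SPECIES": "A", "PETAL_WIDTH": 2.0, "PETAL_LENGTH": 5.0},
    {"SPECIES": "A", "PETAL_WIDTH": 3.0, "PETAL_LENGTH": 7.0},
    {"SPECIES": "B", "PETAL_WIDTH": 1.0, "PETAL_LENGTH": 1.0},
    {"SPECIES": "B", "PETAL_WIDTH": 2.0, "PETAL_LENGTH": 1.5},
    {"SPECIES": "B", "PETAL_WIDTH": 3.0, "PETAL_LENGTH": 2.0},
]


def build_lm(dat):
    """Least-squares fit of PETAL_LENGTH on PETAL_WIDTH for one partition."""
    xs = [row["PETAL_WIDTH"] for row in dat]
    ys = [row["PETAL_LENGTH"] for row in dat]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def group_apply(data, index, func, parallel=2):
    """Hypothetical helper: split rows by the index column, apply func per group."""
    groups = defaultdict(list)
    for row in data:
        groups[row[index]].append(row)
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # One task per partition; results come back in partition order.
        return dict(zip(groups, pool.map(func, groups.values())))


mods = group_apply(IRIS, "SPECIES", func=build_lm, parallel=2)
for species, model in mods.items():
    print(species, model)
```

The result is one model per partition value, keyed by that value, which is the same shape of result the slide retrieves with mods.pull().items().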
    # user-defined function using sklearn
    def build_lm(dat):
        from sklearn import linear_model
        lm = linear_model.LinearRegression()
        X = dat[['PETAL_WIDTH']]
        y = dat[['PETAL_LENGTH']]
        lm.fit(X, y)
        return lm

    # select column(s) for partitioning data
    index = oml.DataFrame(IRIS['SPECIES'])
    # invoke function in parallel on IRIS table
    mods = oml.group_apply(IRIS, index,
                           func=build_lm,
                           parallel=2)
    mods.pull().items()

[Diagram labels: REST Interface, OML Notebooks, OML4Py, Oracle Autonomous Database, User tables, spawns, Python Engine]

REST Interface for Embedded Python Execution
py_scripts for executing user-defined functions (Python “scripts”)
[Diagram: annotated REST URL; its labeled parts are listed below]
• Cloud service within ADB URL
• Name of customer tenant
• Pluggable database name
• One of: do-eval, table-apply, group-apply, index-apply, row-apply
• Name of script in repository
Example synchronous invocation from cURL

    $ curl -X POST --header "Authorization: Bearer ${token}" \
        --header 'Content-Type: application/json' \
        --header 'Accept: application/json' \
        -d '{"graphicsFlag":true, "service":"MEDIUM"}' \
        "<request URL truncated in the source>"
Asynchronous invocation also available
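
The same request can be assembled from Python. The sketch below only builds the request object and does not send it. The headers and JSON body are taken from the cURL example; the URL is a deliberately non-resolvable placeholder because the real endpoint is not given in the slides, and build_request is a hypothetical helper name.

```python
import json
import urllib.request

# Hypothetical placeholder; the real endpoint URL is not given in the slides.
SCRIPT_URL = "https://example.invalid/placeholder-endpoint"


def build_request(url, token, graphics_flag=True, service="MEDIUM"):
    """Assemble a POST request mirroring the cURL example (nothing is sent)."""
    body = json.dumps(
        {"graphicsFlag": graphics_flag, "service": service}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


req = build_request(SCRIPT_URL, token="dummy-token")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending would be urllib.request.urlopen(req); it is not executed here.
```

Keeping request construction separate from sending makes it easy to inspect the headers and payload before pointing the code at a real service.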

AutoML – new with OML4Py
Increase data scientist productivity – reduce overall compute time
[Diagram: Data Table → Auto Algorithm Selection → Auto Feature Selection → Auto Model Tuning → ML Model]

Auto Algorithm Selection (much faster than exhaustive search)
– Identify in-database algorithm that achieves highest model quality
– Find best algorithm faster than with exhaustive search
Auto Feature Selection (de-noise data and reduce # of features)
– Reduce # of features by identifying most predictive
– Improve performance and accuracy
Auto Model Tuning (significant accuracy improvement)
– Automatic tuning of algorithm hyperparameters
– Avoid manual or exhaustive search techniques
Enables non-expert users to leverage Machine Learning
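
To make the terms concrete, the sketch below shows what algorithm selection and hyperparameter tuning mean in their simplest form. It is explicitly the exhaustive search that the slide says AutoML avoids, not OML4Py's method: every (algorithm, hyperparameter) pair is scored by leave-one-out accuracy on a made-up one-dimensional data set, and the best pair is kept. The two candidate algorithms and all names are assumptions chosen to keep the example dependency-free.

```python
from collections import Counter

# Made-up 1-D data set: the label is 1 when x >= 5.
DATA = [(x, int(x >= 5)) for x in range(10)]


def majority(train, x, _param):
    """Baseline: always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def knn(train, x, k):
    """k-nearest neighbours on a single numeric feature."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(predict, param):
    """Leave-one-out accuracy for one (algorithm, hyperparameter) pair."""
    hits = 0
    for i, (x, label) in enumerate(DATA):
        train = DATA[:i] + DATA[i + 1:]
        hits += predict(train, x, param) == label
    return hits / len(DATA)


# Candidate algorithms with the hyperparameter values to try for each.
CANDIDATES = {"majority": (majority, [None]), "knn": (knn, [1, 3, 5])}

# Exhaustive search: score every pair, then keep the best one.
scores = {
    (name, param): loo_accuracy(fn, param)
    for name, (fn, params) in CANDIDATES.items()
    for param in params
}
best = max(scores, key=scores.get)
print(best, scores[best])
```

The cost of this loop grows with the number of algorithms times the number of hyperparameter settings times the number of folds, which is the compute time the slide's automated selection and tuning aim to cut.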

Demo

Summary – OML4Py
Python access to Oracle Machine Learning in Autonomous Database
• Scalable data exploration, preparation, and analysis
• Scalable in-database machine learning
• Automation for greater data scientist productivity and non-expert use
Extends Python for enterprise use
• In-database performance and scalability
• Platform for application integration
• Simplified production deployment of data science solutions

Helpful Links
ORACLE AUTONOMOUS CLOUD – ALWAYS FREE TIER https://cloud.oracle.com/tryit
ORACLE MACHINE LEARNING ON OTN https://www.oracle.com/machine-learning
OML TUTORIALS
Interactive tour: https://docs.oracle.com/en/cloud/paas/autonomous-database/oml-tour
Basic getting started: https://docs.oracle.com/en/cloud/paas/autonomous-data-warehouse-cloud/omlug/get-started-oracle-machine-learning.html
OML OFFICE HOURS https://asktom.oracle.com/pls/apex/asktom.search?office=6801#sessionss
ORACLE ANALYTICS CLOUD https://www.oracle.com/solutions/business-analytics/data-visualization/examples.html
Thank You
Mark Hornick [email protected] @MarkHornick