GPU OPEN ANALYTICS INITIATIVE END-TO-END ACCELERATED ANALYTICS

Brad Rees, Ph.D. - Senior Solution Architect - NVIDIA

GTC DC, November 2017

The AI Computing Company

AGENDA – TWO PARTS
Discuss Analysis from the Perspective of Data Science

• Part 1
  • Big Data and Spark
  • GPU Barriers
• Part 2
  • GOAI

“Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data …” - Wikipedia

Better Exploration ∝ Better Science
Faster Analytics yield better Exploration

Fail Fast Needs to be Embraced

"I have not failed. I've just found 10,000 ways that won't work." - Thomas A. Edison

the Big Data Catalyst
The Glue that Binds Big Data

• Spark has become synonymous with Hadoop and Big Data
• It’s the interface/API for big data app-to-app communication
• The processing layer for big data and leading ML framework

SPARK IS NOT ENOUGH
We Want More Efficiency and Speed

• Common issue is speed at scale

• Scaling out to get the necessary speed for mission critical workloads is prohibitively expensive

• Clients want core ML on GPU: Commercial, Government, HPC

We need a GPU-equivalent to Spark … But there are some Barriers

GPU ADOPTION BARRIERS

Concerns:
• Too Hard to Integrate GPUs
• Not suited for Data Science

• Too much data movement
• Too many makeshift data formats
• No inter-GPU communication
• No Python API for data manipulation
• No all inclusive Machine Learning Library

DATA MOVEMENT AND TRANSFORMATION
The bane of productivity / performance

• Too much time spent moving data
• Data movement and conversion hinder any performance gains
• No Inter-GPU Communication

DATA FORMATS

Parquet, CSV, GML, Pandas, Avro, HDFS, XML, NumPy, JSON, Pickle, ProtoBuf, CSC, CSR, COO

Plain Text vs Binary
Compressed vs Uncompressed
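To make the "Plain Text vs Binary" and "Compressed vs Uncompressed" contrast concrete, here is a minimal CPU-only sketch using just the Python standard library. The sample column is hypothetical, and the sketch only illustrates the general point that every format needs its own encode and decode step. It is not a benchmark of any GOAI component.

```python
import json
import pickle
import struct
import zlib

# Hypothetical sample column: 10,000 repetitive float readings.
values = [float(i % 100) for i in range(10_000)]

# Plain-text encodings: every number is spelled out as characters.
csv_bytes = "\n".join(repr(v) for v in values).encode("utf-8")
json_bytes = json.dumps(values).encode("utf-8")

# Binary encodings: fixed-width 8-byte doubles, or Python's pickle format.
packed_bytes = struct.pack(f"<{len(values)}d", *values)
pickle_bytes = pickle.dumps(values, protocol=pickle.HIGHEST_PROTOCOL)

# Compressed variant of the fixed-width binary encoding.
compressed_bytes = zlib.compress(packed_bytes)

sizes = {
    "csv text": len(csv_bytes),
    "json text": len(json_bytes),
    "packed doubles": len(packed_bytes),
    "pickle": len(pickle_bytes),
    "zlib(packed doubles)": len(compressed_bytes),
}
for name, size in sizes.items():
    print(f"{name:>22}: {size} bytes")

# Each format needs its own decode step before the data is usable again.
from_csv = [float(tok) for tok in csv_bytes.decode("utf-8").split("\n")]
from_json = json.loads(json_bytes)
from_packed = list(
    struct.unpack(f"<{len(values)}d", zlib.decompress(compressed_bytes))
)
from_pickle = pickle.loads(pickle_bytes)
```

All four decoders return the same column, but an application that speaks only one of these formats still has to pay for a conversion whenever it receives another.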

* Not a complete list

ARE THE GPU BARRIERS TOO GREAT?
Is there any hope?

☹️ Data movement
☹️ Data formats
☹️ Inter-GPU communication
☹️ No Python API for data manipulation
☹️ No all inclusive Machine Learning Library

GPU OPEN ANALYTICS INITIATIVE
Luckily others were also thinking about the problems

• Formed in March at Strata SJ; Launched at GTC in May
• Goal: GOAI seeks to foster open collaboration between GPU analytics projects and products to enable data scientists to efficiently combine the best tools for their workflows.

ACCELERATED ANALYTICS ECOSYSTEM
Prior State (pre-March 2017)

● Fragmented with too many holes
● Still too reliant on CPU for moving data between applications
● 80-90% of data science is accelerated analytics, not deep learning yet

INTERACTION: Graphistry, Jupyter NB, MapD Immerse

PROCESSING AND ANALYTICS:
• Data Manipulation: MapD / BlazingDB (“SQL”), Anaconda (Dask “Python”)
• NV Graph
• * Fast Data (Streaming)

IN GPU MEMORY DATA STRUCTURE: Many Columnar Data Frames (everyone has their own makeshift data frame)

STORAGE: MapD (GPU RAM), BlazingDB (Disk)

Key: Open Source, Free to Use, Closed Source

* Primarily x86 w/ some GPU acceleration

ACCELERATED ANALYTICS ECOSYSTEM
Post-March 2017

INTERACTION: Graphistry, Jupyter NB, MapD Immerse

PROCESSING AND ANALYTICS:
• Data Manipulation: MapD / BlazingDB (“SQL”), Anaconda (Dask “Python”), H2O (Data.Table)
• NV Graph
• Fast Data (Streaming)
• H2O.ai (GPU MLlib)

IN GPU MEMORY DATA STRUCTURE: Standard Columnar Data Frame (Open Sourced/Free to Use from MapD)

STORAGE: MapD (GPU RAM), BlazingDB (Disk), MapD + BlazingDB (System Memory)

Key: Open Source, Free to Use, Closed Source

LEARNING FROM APACHE ARROW
Interoperability

Big Data ecosystem facing similar issues

Major push in the big data world to remove the bottlenecks of copying and converting data between systems

Apache Arrow™

• Enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations

• Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.

• The Arrow memory format supports zero-copy reads for lightning-fast data access without serialization overhead.

THE GPU DATA FRAME
First GOAI Project

✓ Data movement
✓ Data formats
✓ Inter-GPU communication
✓ Python API
✓ Machine Learning Library
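The columnar layout and zero-copy reads credited to Apache Arrow above can be illustrated on the CPU with nothing but the Python standard library. This is only a sketch of the idea, assuming hypothetical sample data. It does not use Arrow or the GPU Data Frame, and the names `ids`, `fares`, and `tail` are made up for illustration.

```python
from array import array

# Row-oriented records (hypothetical sample data): (id, fare).
rows = [(1, 10.5), (2, 20.25), (3, 30.0), (4, 40.75)]

# Columnar layout: one contiguous, typed buffer per column.
ids = array("q", (r[0] for r in rows))      # 64-bit signed integers
fares = array("d", (r[1] for r in rows))    # 64-bit floats

# memoryview exposes the column's buffer without copying it.
fares_view = memoryview(fares)
tail = fares_view[2:]                       # slicing a memoryview is also zero-copy

# A write through the original column is visible through the view,
# which shows that both names refer to the same memory.
fares[3] = 99.0
print(tail.tolist())                        # [30.0, 99.0]

# A consumer can scan the column directly from the shared buffer.
print(sum(fares_view) / len(fares_view))
```

Because `tail` is a view rather than a copy, handing it to another consumer costs no serialization. That is the property a shared in-memory format aims to provide across applications.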

So … what does this get me?

SEAMLESS CALLS BETWEEN APPLICATIONS

What does GOAI get me? Big improvement for Data Science

• Load data into MapD
• Call an H2O ML algorithm
• All via Anaconda Python
• Within a Jupyter Notebook

Demos available on the GOAI GitHub

SIMPLE DATA CONVERSION

Convert from Pandas and NumPy
Several Examples Available on GOAI GitHub

GOAL OF GOAI
Better Adoption with Better Usability and TCO

Hadoop Processing, Reading from disk
• Pipeline: HDFS Read → SQL Query → HDFS Write → HDFS Read → ETL → HDFS Write → HDFS Read → Train

Spark In-Memory Processing
• Pipeline: HDFS Read → SQL Query → ETL → ML Train
• Large TCO benefit: 25-100x Improvement over Hadoop
• Less code, Language flexible
• Large Adoption
• Primarily In-Memory

GPU + Spark In-Memory Processing
• Pipeline: HDFS Read → GPU Read → SQL Query → CPU Write → GPU Read → ETL → CPU Write → GPU Read → ML Train
• Small TCO benefit: 5-10x Improvement over Spark
• More code, Language rigid
• Small Adoption
• Substantially on GPU

End-to-End GPU Processing (GOAI)
• Pipeline: Arrow Read → SQL Query → ETL → ML Train
• Large TCO benefit: 25-100x Improvement over Spark
• Same code, Language flexible
• Large Adoption?
• Primarily on GPU

INITIAL LIBRARIES
GPU Data Frame
github.com/gpuopenanalytics

• libgdf: library of helper functions:
  • Copying GDF metadata block to the host and parsing it to a host-side struct
  • Importing/exporting via CUDA IPC
  • CUDA kernels to perform element-wise math operations on GDF columns.
  • CUDA sort, join, and reduction operations on GDFs.

• pygdf: Python library for manipulating GDFs
  • Creating GDFs from NumPy arrays and Pandas DataFrames
  • Performing math operations on columns
  • Import/export via CUDA IPC
  • Sort, join, reductions
  • JIT compilation of group by and filter kernels using Numba

• dask_gdf: Extension for Dask to work with distributed GDFs.
  • Same operations as pygdf, but working on GDFs chunked onto different GPUs and different servers.

ABOUT
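The operations listed for libgdf, pygdf, and dask_gdf (element-wise math, sort, group-by reductions, and the same work over chunked frames) can be sketched on the CPU in plain Python. This is a stand-in to show what the operations mean, not the API of any of those libraries. Every function name and the sample frame below are hypothetical, and threads take the place of separate GPUs or servers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# A "frame" here is just a dict of equal-length column lists (hypothetical data).
frame = {
    "key":   [2, 1, 2, 3, 1, 3, 2, 1],
    "value": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
}

def add_columns(a, b):
    """Element-wise math on two columns."""
    return [x + y for x, y in zip(a, b)]

def sort_by(frame, column):
    """Reorder every column by the values of one column (stable sort)."""
    order = sorted(range(len(frame[column])), key=frame[column].__getitem__)
    return {name: [col[i] for i in order] for name, col in frame.items()}

def group_sum(keys, values):
    """Group-by reduction: sum of values per key."""
    totals = defaultdict(float)
    for k, v in zip(keys, values):
        totals[k] += v
    return dict(totals)

def split_rows(frame, n_chunks):
    """Split a frame into row chunks, one per (imaginary) device."""
    n = len(next(iter(frame.values())))
    step = -(-n // n_chunks)  # ceiling division
    return [
        {name: col[i:i + step] for name, col in frame.items()}
        for i in range(0, n, step)
    ]

def distributed_group_sum(frame, n_chunks):
    """Run group_sum on each chunk in parallel, then merge the partial sums."""
    chunks = split_rows(frame, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(
            pool.map(lambda c: group_sum(c["key"], c["value"]), chunks)
        )
    merged = defaultdict(float)
    for partial in partials:
        for k, v in partial.items():
            merged[k] += v
    return dict(merged)

doubled = add_columns(frame["value"], frame["value"])
sorted_frame = sort_by(frame, "key")
single = group_sum(frame["key"], frame["value"])
chunked = distributed_group_sum(frame, n_chunks=3)

print(sorted_frame["key"])
print(single == chunked)
```

The chunked version gives the same answer as the single-frame version because a sum can be computed per chunk and merged afterwards. That is the pattern the dask_gdf bullet describes for frames spread over several GPUs.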

• ~8.5x speedup on half a DGX to produce a robust GLM via 10-fold cross-validation vs an 8 node Spark cluster
• ~100x speedup using MapD on half a DGX to analyze census data vs a 20 node Spark cluster
• Python on GPU... Numba and Pandas
• ~5X faster than Redshift to utilize full disk storage and system memory
• >50x speedup in performing pagerank on a graph on half a DGX vs an 8 node Spark cluster
• ~100x more cyber security data interactively visualized using an intuitive layout algorithm on a single GPU as a connected graph

MapD
GPU-accelerated analytics platform

Consists of MapD Core database and MapD Immerse

MapD Core database is an in-GPU-memory, columnar, open-source, GPU-accelerated, SQL database.

MapD Enterprise brings distributed and high availability modes, GPU-accelerated backend rendering, Kerberos/LDAP security, and ODBC/JDBC.

MapD Immerse is a visual analytics platform on top of the MapD Core database that allows data scientists and analysts to interactively explore large datasets.

1.1 BILLION TAXI RIDES BENCHMARK

• GPU Memory based databases 8x to 15x faster than CPU in-memory databases such as Redshift.
• 100x to 485x faster than Spark on 11-servers
• Open Source core DBMS
• Free Community Edition

[Bar chart: Time in Milliseconds for Query 1 to Query 4 on MapD DGX-1, Kinetica DGX-1, Redshift 6-node, and Spark 11-node. Off-axis bar labels for Queries 1 to 4: 10190, 8134, 19624, 85942. Remaining bar labels: 2970, 2250, 1560, 1250, 1209, 795, 596, 518, 372, 150, 80, 21.]

Source: MapD Benchmarks on DGX-1 from internal NVIDIA testing following guidelines of @marklit82 Mark Litwintschik’s blogs: Redshift, 6-node ds2.8xlarge cluster & Spark 2.1, 11 x m3.xlarge cluster w/ HDFS

BlazingDB
GPU-accelerated petabyte scale data warehouse

Consists of BlazingDB database

BlazingDB database is a disk-based, columnar, GPU-accelerated SQL database.

BlazingDB has distributed and high availability modes, JDBC, and Python/C# APIs.

BlazingDB offers a Community Edition that can be downloaded for free and has an Enterprise Edition that you can launch today on AWS.

BlazingDB
high performance SQL on petabyte scale

Blazing speedup

BlazingDB SQL is built on a columnar relational data model.
Enterprise grade security through Spring Security
BlazingDB distributes both data and computation to multiple instances, for more data, or faster query speeds
• https://blazingdb.com/

Anaconda Python
Open-source focused, GPU-accelerated data science platform

Contains Anaconda Accelerate, Numba, and Dask

Anaconda Accelerate provides access to libraries optimized for performance on NVIDIA GPUs such as CUDA Sorting and cuBLAS.

Numba is a compiler for Python functions that generates native code for GPU hardware.
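As a minimal sketch of what compiling a Python function with Numba looks like, the example below uses Numba's `njit` decorator on the CPU target. GPU kernels go through Numba's separate CUDA target, which is not shown here. The function name `degree2_feature_sum` is hypothetical, and the `ImportError` fallback is an assumption added so the sketch still runs as plain Python where Numba is not installed.

```python
try:
    from numba import njit      # Numba's JIT decorator (CPU target)
except ImportError:             # assumption: fall back to plain Python without Numba
    def njit(func):
        return func

@njit
def degree2_feature_sum(n):
    """Sum x + x*x over x = 0 .. n-1 using an explicit loop."""
    total = 0.0
    for x in range(n):
        total += x + x * x
    return total

print(degree2_feature_sum(10))  # 330.0
```

Explicit numeric loops like this one are the kind of code a JIT compiler can turn into native machine code on the first call. The decorated function is called exactly like the original.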

Dask is a parallel computing library for analytic computing in Python. It enables distributed computing in pure Python and integrates with Anaconda Accelerate and Numba.

NUMBA PERFORMANCE
How Fast

Jeremy Howard

Deep learning researcher & educator.

Founder: fast.ai
Faculty: USF & Singularity University
Previously - CEO: Enlitic, President: Kaggle, CEO: Fastmail

Rewrote PolynomialFeatures from scikit-learn in Numba. Got a 40x speedup in only 12 lines of code.

H2O.ai
Open-source GPU-accelerated machine learning platform

Contains H2O.ai platform

H2O.ai has a working implementation of GPU-accelerated generalized linear modeling.

H2O.ai is working to GPU-accelerate additional machine learning algorithms such as random forests, gradient boosting machines, and clustering.

H2O.ai is working on porting data.table, a columnar data frame library, along with the world's fastest implementation of the sort algorithm to NVIDIA GPUs.

MACHINE LEARNING LIBRARY
H2O4GPU Roadmap

Graphistry
GPU-accelerated graph visualization engine

Consists of Graphistry graph visualization engine

Graphistry uses GPUs in the backend for layout calculation and machine learning.

Graphistry uses GPUs in the frontend for rendering the visualization in a web browser.

Graphistry allows a user to interactively visualize orders of magnitude more data than traditional solutions in an intuitive way.

Different Graphs, Different Questions

• IR: Killchain Analysis
• Hunting: Daily Anomalies
• SecOps: Shadow IT Use
• Threat Intel: Botnet Analysis
• Ops/NOC: Outage Root Cause
• Fraud: Tracking Embezzlers

Gunrock
Open-source GPU-accelerated graph analytics library

Consists of Gunrock graph analytics library

Gunrock has multi-GPU implementations of graph algorithms such as PageRank, Breadth First Search, Single Source Shortest Path, etc.
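For readers unfamiliar with the algorithms named above, here is a single-threaded CPU reference sketch of Breadth First Search in plain Python. It is not Gunrock's API and says nothing about its multi-GPU implementation. The function name `bfs_levels` and the sample graph are hypothetical.

```python
from collections import deque

def bfs_levels(adjacency, source):
    """Return the hop distance from source to every reachable vertex."""
    depth = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adjacency.get(u, ()):
            if v not in depth:          # first visit fixes the shortest hop count
                depth[v] = depth[u] + 1
                frontier.append(v)
    return depth

# Hypothetical directed graph as adjacency lists; vertex 5 cannot be reached from 0.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [], 5: [0]}

print(bfs_levels(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

The traversal expands one frontier of vertices at a time. That per-frontier work is the part a parallel graph library can spread across many threads.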

Gunrock has a high-level API in C that is accessible from Python.

JOIN THE REVOLUTION
Everyone Can Help!

• GPU Open Analytics Initiative: http://gpuopenanalytics.com/ (@Gpuoai)
• Apache Arrow: https://arrow.apache.org/ (@ApacheArrow)
• Apache Parquet: https://parquet.apache.org/ (@ApacheParquet)

Integrations, feedback, documentation support, pull requests, new issues, or donations welcomed!

GOAI PARTNER SESSION LINE-UP AT GTC DC 2017

Session #, Topic, Speaker, Time and Room:
• DC7213: World's Fastest Machine Learning With GPUs. Jon Mckinney - Senior Developer, H2O.ai. Wednesday 11/1, 2:00pm, Hemisphere A
• DC7212: Interpretable AI: Not Just For Regulators. Patrick Hall - Director of Data Science, H2O.ai. Wednesday 11/1, 2:30pm, Hemisphere A
• DC7189: The Impact of GPUs in Geovisualization for Government. Todd Mostak - CEO & Founder, MapD. Wednesday 11/1, 5:00pm, Polaris
• DC7133: Scaling Event Data Investigations with GPU Visual Graph Analytics. Leo Meyerovich - CEO, Graphistry, Inc. Thursday 11/2, 2:00pm, Hemisphere B
• DC7111: Accelerating Cyber Threat Detection with GPUs. Josh Patterson - NVIDIA. Thursday 11/2, 4:30pm, Atrium Hall

NVIDIA DEEP LEARNING INSTITUTE
Training available as online self-paced labs and instructor-led workshops

Take self-paced labs at www.nvidia.com/dlilabs

Find or request an instructor-led workshop at www.nvidia.com/dli

Educators: download the Teaching Kit at developer.nvidia.com/teaching-kit and contact [email protected] for info on the University Ambassador Program

Fundamentals, Autonomous Vehicles, Healthcare, Media & Entertainment, Machine Vision - IVA, Finance, …and more

http://gpuopenanalytics.com/

Thank You!