GPU Open Analytics Initiative: End-to-End Accelerated Analytics

Brad Rees, Ph.D. - Senior Solution Architect - NVIDIA
GTC DC, November 2017
The AI Computing Company

AGENDA – TWO PARTS
Discuss Analysis from the Perspective of Data Science
• Part 1
  • Big Data and Spark
  • GPU Barriers
• Part 2
  • GOAI

“Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data …” - Wikipedia

Better Exploration ∝ Better Science
Faster analytics yields better exploration

Fail Fast Needs to be Embraced
“I have not failed. I've just found 10,000 ways that won't work.” - Thomas A. Edison

Spark: the Big Data Catalyst
The Glue that Binds Big Data
• Spark has become synonymous with Hadoop and Big Data
• It’s the interface/API for big data app-to-app communication
• The processing layer for big data and the leading ML framework

SPARK IS NOT ENOUGH
We Want More Efficiency and Speed
• Common issue is speed at scale
• Scaling out to get the necessary speed for mission-critical workloads is prohibitively expensive
• Clients want core ML on GPU
Commercial, Government, HPC
We need a GPU equivalent to Spark … but there are some barriers

GPU ADOPTION BARRIERS
• Too much data movement
• Too many makeshift data formats
• No inter-GPU communication
• No Python API for data manipulation
• No all-inclusive Machine Learning Library
Concerns:
• Too hard to integrate GPUs
• Not suited for Data Science

DATA MOVEMENT AND TRANSFORMATION
The bane of productivity / performance
• Too much time spent moving data
• Data movement and conversion hinder any performance gains
• No inter-GPU communication

DATA FORMATS
Parquet, CSV, GML, Pandas, Avro, HDFS, XML, NumPy, JSON, Pickle, ProtoBuf, CSC, CSR, COO *
Plain Text vs Binary
Compressed vs Uncompressed
* Not a complete list

ARE THE GPU BARRIERS TOO GREAT?
Is there any hope?
☹️ Data movement
☹️ Data formats
☹️ Inter-GPU communication
☹️ No Python API for data manipulation
☹️ No all-inclusive Machine Learning Library

GPU OPEN ANALYTICS INITIATIVE
Luckily others were also thinking about the problems
• Formed in March at Strata SJ; launched at GTC in May
• Goal: GOAI seeks to foster open collaboration between GPU analytics projects and products to enable data scientists to efficiently combine the best tools for their workflows.

ACCELERATED ANALYTICS ECOSYSTEM
Prior State (pre-March 2017)
● Fragmented with too many holes
● Still too reliant on CPU for moving data between applications
● 80-90% of data science is accelerated analytics, not deep learning yet
Diagram layers:
• INTERACTION: Graphistry, Jupyter NB, MapD Immerse
• PROCESSING AND ANALYTICS IN GPU MEMORY (Data Manipulation): MapD and BlazingDB (“SQL”), Anaconda * (Dask, “Python”), NV Graph, Fast Data (Streaming)
• DATA STRUCTURE: Many Columnar Data Frames (everyone has their own makeshift data frame)
• STORAGE: MapD (GPU RAM), BlazingDB (Disk)
Key: Open Source, Free to Use, Closed Source
* Primarily x86 w/ some GPU acceleration

ACCELERATED ANALYTICS ECOSYSTEM
Post-March 2017
Diagram layers:
• INTERACTION: Graphistry, Jupyter NB, MapD Immerse
• PROCESSING AND ANALYTICS IN GPU MEMORY (Data Manipulation): MapD and BlazingDB (“SQL”), Anaconda (Dask, “Python”), H2O (Data.Table, “R”), NV Graph, Fast Data (Streaming), H2O.ai (GPU MLlib)
• DATA STRUCTURE: Standard Columnar Data Frame (Open Sourced/Free to Use from MapD)
• STORAGE: MapD (GPU RAM), BlazingDB (Disk), MapD + BlazingDB (System Memory)
Key: Open Source, Free to Use, Closed Source

LEARNING FROM APACHE ARROW
Interoperability
Big Data ecosystem facing similar issues
Major push in the big data world to remove the bottlenecks of copying and converting data between systems
Apache Arrow™
• Enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations
• Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.
• The Arrow memory format supports zero-copy reads for lightning-fast data access without serialization overhead.

THE GPU DATA FRAME
First GOAI Project
✓ Data movement
✓ Data formats
✓ Inter-GPU communication
✓ Python API
✓ Machine Learning Library
So … what does this get me?

SEAMLESS CALLS BETWEEN APPLICATIONS
What does GOAI get me? Big improvement for Data Science
• Load data into MapD
• Call an H2O ML algorithm
• All via Anaconda Python
• Within a Jupyter Notebook
Demos available on the GOAI GitHub

SIMPLE DATA CONVERSION
Convert from Pandas and NumPy
Several examples available on the GOAI GitHub

GOAL OF GOAI
Better Adoption with Better Usability and TCO

Hadoop Processing, Reading from Disk
• Pipeline: HDFS read → Query → HDFS write → HDFS read → ETL → HDFS write → HDFS read → Train

Spark In-Memory Processing
• 25-100x improvement over Hadoop
• Less code
• Language flexible
• Primarily in-memory
• Large TCO benefit
• Large adoption
• Pipeline: HDFS read → Query → ETL → ML Train

GPU + Spark In-Memory Processing
• 5-10x improvement over Spark
• More code
• Language rigid
• Substantially on GPU
• Small TCO benefit
• Small adoption
• Pipeline: HDFS read → GPU read → Query → CPU write → GPU read → ETL → CPU write → GPU read → ML Train

End-to-End GPU Processing (GOAI)
• 25-100x improvement over Spark
• Same code
• Language flexible
• Large TCO benefit
• Large adoption?
• Data path: Arrow, with SQL, ETL, and ML stages
• Read, Query, Train: primarily on GPU

INITIAL LIBRARIES
GPU Data Frame
github.com/gpuopenanalytics
• libgdf: C library of helper functions:
  • Copying the GDF metadata block to the host and parsing it to a host-side struct
  • Importing/exporting via CUDA IPC
  • CUDA kernels to perform element-wise math operations on GDF columns
  • CUDA sort, join, and reduction operations on GDFs
• pygdf: Python library for manipulating GDFs
  • Creating GDFs from NumPy arrays and Pandas DataFrames
  • Performing math operations on columns
  • Import/export via CUDA IPC
  • Sort, join, reductions
  • JIT compilation of group-by and filter kernels using Numba
• dask_gdf: Extension for Dask to work with distributed GDFs
  • Same operations as pygdf, but working on GDFs chunked onto different GPUs and different servers

ABOUT
• ~8.5x speedup on half a DGX to produce a robust GLM via 10-fold cross-validation vs an 8 node Spark cluster
• ~100x speedup using MapD on half a DGX to analyze census data vs a 20 node Spark cluster
• Python on GPU... Numba and Pandas
• ~5x faster than Redshift to utilize full disk storage and system memory
• >50x speedup in performing PageRank on a graph on half a DGX vs an 8 node Spark cluster
• ~100x more cyber security data interactively visualized using an intuitive layout algorithm on a single GPU as a connected graph

MapD
GPU-accelerated analytics platform
Consists of MapD Core database and MapD Immerse
MapD Core database is an in-GPU-memory, columnar, open-source, GPU-accelerated SQL database.
MapD Enterprise brings distributed and high availability modes, GPU-accelerated backend rendering, Kerberos/LDAP security, and ODBC/JDBC.
MapD Immerse is a visual analytics platform on top of the MapD Core database that allows data scientists and analysts to interactively explore large datasets.

1.1 BILLION TAXI RIDES BENCHMARK
• GPU memory based databases are 8x to 15x faster than CPU in-memory databases such as Redshift
• 100x to 485x faster than Spark on 11 servers
• Open Source core DBMS
• Free Community Edition
[Bar chart: time in milliseconds for Query 1 through Query 4 on MapD DGX-1, Kinetica DGX-1, Redshift 6-node, and Spark 11-node]
Source: MapD Benchmarks on DGX-1 from internal NVIDIA testing following guidelines of Mark Litwintschik’s (@marklit82) blogs: Redshift, 6-node ds2.8xlarge cluster & Spark 2.1, 11 x m3.xlarge cluster w/ HDFS

BlazingDB
GPU-accelerated petabyte scale data warehouse
Consists of BlazingDB database
BlazingDB database is a disk-based, columnar, GPU-accelerated SQL database.
BlazingDB has distributed and high availability modes, JDBC, and Python/C# APIs.
BlazingDB offers a Community Edition that can be downloaded for free and has an Enterprise Edition that you can launch today on AWS.

BlazingDB: high performance SQL on petabyte scale
• Blazing speedup
• BlazingDB SQL is built on a columnar relational data model.
• Enterprise grade security through Spring Security
• BlazingDB distributes both data and computation to multiple instances, for more data or faster query speeds
• https://blazingdb.com/

Anaconda Python
Open-source focused, GPU-accelerated data science platform
Contains Anaconda Accelerate, Numba, and Dask
Anaconda Accelerate provides access to libraries optimized for performance on NVIDIA GPUs such as CUDA Sorting and cuBLAS.
Numba is a compiler for Python functions that generates native code for GPU hardware.
Dask is a parallel computing library for analytic computing in Python. It enables distributed computing in pure Python and integrates with Anaconda Accelerate and Numba.

NUMBA PERFORMANCE
How Fast
Jeremy Howard
Deep learning researcher & educator.
Founder: fast.ai
Faculty: USF & Singularity University
Previously: CEO, Enlitic; President, Kaggle; CEO, Fastmail
Rewrote the PolynomialFeatures from scikit-learn in Numba.
Got a 40x speedup in only 12 lines of code.

H2O.ai
Open-source GPU-accelerated machine learning platform
Contains H2O.ai platform
H2O.ai has a working implementation of GPU-accelerated generalized linear modeling.
H2O.ai is working to GPU-accelerate additional machine learning algorithms such as random forests, gradient boosting machines, and clustering.
H2O.ai is working on porting data.table, a columnar data frame library, along with the world's fastest implementation of the sort algorithm to NVIDIA GPUs.

MACHINE LEARNING LIBRARY
H2O4GPU Roadmap

Graphistry
GPU-accelerated graph visualization engine
Consists of Graphistry graph visualization engine
Graphistry uses GPUs in the backend for layout calculation and machine learning.
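
ILLUSTRATION: COLUMNAR LAYOUT AND ZERO-COPY READS
The columnar data frame and zero-copy ideas from the Apache Arrow and GPU Data Frame slides can be sketched on the CPU with only the Python standard library. This is not Arrow, libgdf, or pygdf code; the column names `ids` and `fares` and the sample values are hypothetical and chosen only for illustration. The sketch stores each column in one contiguous typed buffer and uses `memoryview` to share that buffer without copying it.

```python
from array import array

# Row-oriented layout: one Python tuple per record (hypothetical sample data).
rows = [(1, 2.5), (2, 4.0), (3, 1.5), (4, 8.0)]

# Column-oriented layout: one contiguous typed buffer per column.
ids = array("q", (r[0] for r in rows))    # 64-bit signed integers
fares = array("d", (r[1] for r in rows))  # 64-bit floats

# A reduction over one column only walks that column's buffer.
total_fare = sum(fares)

# memoryview exposes the column's bytes without copying them;
# slicing the view yields another view onto the same memory.
fare_view = memoryview(fares)
tail = fare_view[2:]

# A write through the original buffer is visible through the view,
# which demonstrates that no copy was made.
fares[3] = 10.0

print(total_fare)     # sum computed before the write above
print(tail.tolist())  # last two fares as seen through the shared view
```

A consumer that receives `tail` can read the data in place, which is the property the GPU Data Frame aims for when applications hand columns to each other instead of converting between formats.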
