In-Memory Performance, Durability of Disk

© 2018 GridGain Systems, Inc. Apache Ignite and Apache Spark

Where Fast Data Meets the IoT

Akmal Chaudhri GridGain Systems

Agenda

• IoT Demands on Software
• IoT Software Stack
• Device OS/RTOS
• Data Collection and Enrichment
• NewSQL
• Application APIs
• Demo

IoT Demands on Software

Real-time Processing

SQL, Geo-Spatial

Analytics (BI, ML)

High-Availability

Simple Scalability

IoT Software Stack

Application APIs

NewSQL Database

Data Collection and Enrichment

Device OS/Real-Time OS

Apache IoT Software Stack

Application APIs

NewSQL Database

Data Collection and Enrichment

Device OS/Real-Time OS

Apache Mynewt

• Open Source RTOS
• Cortex M, MIPS
• Bluetooth, Wi-Fi, TCP/IP
• Secured Bootloader
• Remote Firmware Upgrade

Data Collection and Enrichment

DURABLE MEMORY

Ignite Cluster

Apache Ignite Database, Caching and Processing Platform

Financial Services, Telco, Travel & Logistics, E-Commerce, Pharma & Healthcare, IoT

SQL, Key/Value, Transactions, Compute, Services, Streaming, ML

Memory-Centric Storage

Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)
Third-Party Persistence (RDBMS, HDFS, NoSQL)

Ignite and Spark Integration

Spark Application

[Diagram: three Spark Workers, each running Spark jobs, over an in-memory shared RDD or DataFrame layer backed by three GridGain Nodes; Yarn, Mesos, HDFS]

• Share state and data among Spark jobs
• Boost DataFrame and SQL performance
• No data movement
• SQL on top of RDDs
• In-place query execution

SQL Queries Execution Flow

Ignite Node (Canada): Toronto, Montreal, Ottawa, Calgary

Ignite Node (India): Mumbai, New Delhi

1. Initial Query
2. Query execution over local data
3. Reduce multiple results into one
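The three steps of the query flow can be illustrated with a small, self-contained simulation. This is only a sketch in plain Python, not Ignite's API: the `NODES` layout, the node names, and the functions `map_on_node`, `reduce_results`, and `run_query` are hypothetical, and the query (count cities per country) is chosen purely for illustration.

```python
from collections import Counter

# Hypothetical partitioning that mirrors the slide: each node owns its local rows.
NODES = {
    "node-1": [("Toronto", "Canada"), ("Montreal", "Canada"),
               ("Ottawa", "Canada"), ("Calgary", "Canada")],
    "node-2": [("Mumbai", "India"), ("New Delhi", "India")],
}


def map_on_node(rows):
    """Step 2: execute the query over one node's local data only."""
    return Counter(country for _city, country in rows)


def reduce_results(partials):
    """Step 3: merge the per-node partial results into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def run_query(nodes):
    """Step 1: send the initial query to every node, then reduce the answers."""
    partials = [map_on_node(rows) for rows in nodes.values()]
    return reduce_results(partials)


if __name__ == "__main__":
    print(run_query(NODES))
```

Each node only ever touches its own rows; the caller sees nothing but the small partial results it has to merge.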

Comparing Ignite and Spark

Ignite
• Distributed memory-centric database
• Fully fledged compute platform: SQL, transactions, key-value, collocated processing, ML/DL
• OLAP and OLTP

Spark
• Ingests data from HDFS or another storage
• Streaming and compute engine
• Inclined towards OLAP and focused on MR payloads

Ignite and Spark Together

Ignite is a memory-centric store for Spark

• No data movement from Ignite to Spark
• In-place query execution
• Boost DataFrame and SQL performance
• Share state and data among Spark jobs
• Faster data and streaming analytics
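To make the "no data movement" and "in-place query execution" points concrete, the toy comparison below counts how many rows travel to the caller under each approach. It is a plain-Python sketch under stated assumptions, not the Ignite or Spark API: the `NODES` layout and the functions `pull_then_filter` and `filter_in_place` are hypothetical names introduced only for this illustration.

```python
# Toy rows; the node layout is a hypothetical stand-in for a cluster.
NODES = {
    "node-1": [("Toronto", "Canada"), ("Montreal", "Canada"),
               ("Ottawa", "Canada"), ("Calgary", "Canada")],
    "node-2": [("Mumbai", "India"), ("New Delhi", "India")],
}


def pull_then_filter(nodes, predicate):
    """Move every row to the caller first, then filter on the caller's side."""
    moved = [row for rows in nodes.values() for row in rows]
    result = [row for row in moved if predicate(row)]
    return result, len(moved)


def filter_in_place(nodes, predicate):
    """Filter on each node; only the matching rows travel to the caller."""
    result = []
    for rows in nodes.values():
        result.extend(row for row in rows if predicate(row))
    return result, len(result)


if __name__ == "__main__":
    def is_india(row):
        return row[1] == "India"

    print(pull_then_filter(NODES, is_india))
    print(filter_in_place(NODES, is_india))
```

Both functions return the same rows; the second value in each result is the number of rows that had to be moved to produce them.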

DEMO

Any Questions?

Thank you for joining us. Follow the conversation. http://ignite.apache.org

#apacheignite
