How fast is fast enough? SAP HANA in-memory technologies for Big Data
Dmitry Shepelyavy, Platform Business Area Head, SAP CIS
Oct 08, 2014

How to turn new signals into business value?
New signals: structured data | customer data | point of sale | click stream | location-based data | machine data | smart meter | RFID | automobiles | mobile | social network | text data (e.g. "IMHO, it's great!")
Business value: brand sentiment | predictive maintenance | network optimization | mobile asset tracking | personalized care | product recommendation | propensity to churn | real-time demand/supply forecast | 360° customer view | insider threats | risk mitigation | real-time fraud detection
© 2014 SAP AG or an SAP affiliate company. All rights reserved.

SAP HANA Platform for Big Data
Applications: HANA Apps, StartUp & ISV Apps | DWH & Datamarts | SAP Business Suite on HANA | Accelerators & RDS | Any Apps
REAL-TIME APPLICATIONS: Consumer Engagement | Sense & Respond | Planning & Optimization
REAL-TIME ANALYTICS: Operational Analytics | Big Data Analytics | Predictive, Spatial & Text
SAP HANA Platform: Extended Application Services | Administration | Processing Engine | Database Services | Development | Application Function Libraries & Data Models | Integration Services
Deployment: On-Premise | Hybrid | On-Demand

Data Processing Simplified & Optimized with SAP HANA
• Fully ACID-compliant, in-memory, columnar, massively parallel processing database platform
• Open interfaces: SQL, ODBC, JDBC, MDX, JSON, XML, …
• In-memory stored procedures and data virtualization with smart data access
• Integrated data processing for end-to-end analytic processing

SAP HANA Platform:
Application Services
Processing Engine: Event Processing | Planning | Calculation | Predictive | Text Mining | Rules | Search | Graph | Machine Learning | Spatial/GIS | Time Series
Database Services: OLTP + OLAP | In-Memory | SIMD | MPP | CPU Cache Aware | Shared Nothing
Integration Services: Ingest
Administration | Deployment Service: OnDemand | Hybrid | OnPremise

Performance:
Scan: 5 billion integers/sec/core
12.5 million aggregates/sec/core
Ingest: 1.5 million records/sec/node
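As a back-of-the-envelope reading of these figures, the sketch below converts the quoted scan and ingest rates into idealized wall-clock times. It assumes perfectly linear scaling across cores and nodes and borrows the 8 CPU × 15 core node size from the architecture slide; the helper names are hypothetical, and real throughput depends on data types, compression and query shape, so the results are best read as optimistic bounds rather than predictions.

```python
# Rates quoted on the slide, treated here as constants.
SCAN_INTS_PER_SEC_PER_CORE = 5_000_000_000
INGEST_RECORDS_PER_SEC_PER_NODE = 1_500_000

# Assumption: one node with 8 CPUs x 15 cores, as quoted on the architecture slide.
CORES_PER_NODE = 8 * 15


def scan_seconds(values: int, cores: int = CORES_PER_NODE) -> float:
    """Idealized time to scan `values` integers, assuming perfect linear scaling across cores."""
    return values / (SCAN_INTS_PER_SEC_PER_CORE * cores)


def ingest_seconds(records: int, nodes: int = 1) -> float:
    """Idealized time to load `records`, assuming ingest scales linearly with node count."""
    return records / (INGEST_RECORDS_PER_SEC_PER_NODE * nodes)


if __name__ == "__main__":
    print(f"1.2 trillion integers on one node: {scan_seconds(1_200_000_000_000):.1f} s")
    print(f"1 billion records ingested on 4 nodes: {ingest_seconds(1_000_000_000, 4):.0f} s")
```

Under these assumptions a single 120-core node scans 600 billion integers per second, which is why a full scan of a 1.2-trillion-value column works out to about 2 seconds.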
SAP HANA Software & Hardware Architecture
64-bit address space: 6 TB in current servers; dramatic decline in price/performance
Multi-core architecture: 8 CPUs × 15 cores per node; massively parallel scaling with many CPU blades
(Diagram: eight CPUs per node, each with its own L3 cache)
Memory: Apps + DB in memory | no aggregate tables | in-database algorithms | row + columnar | OLTP + OLAP | compression
Storage (SSD, HDD): logging and backup

Today: Transact | Analyze | Accelerate (complex, duplicate, inconsistent)
• Several copies of data
• Different data models
• Inherent data latency
SAP HANA: an in-memory database that combines OLTP, OLAP and HW acceleration (easy to deploy, real-time, simplified experience)
• Transactions + analysis in memory, with acceleration
• Eliminate unnecessary complexity & latency
• Less hardware to manage
• Accelerate through simplification + in-memory
Create new possibilities
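The architecture slide lists columnar storage and compression side by side. One widely used columnar compression technique is dictionary encoding: each distinct value is stored once, and the column itself becomes a vector of small integer IDs that can be scanned quickly. The sketch below is a simplified, pure-Python illustration of that idea under our own assumptions (a sorted dictionary and plain Python lists for the ID vector, with hypothetical function names); SAP HANA's actual storage format is considerably more elaborate.

```python
from bisect import bisect_left


def dictionary_encode(column: list[str]) -> tuple[list[str], list[int]]:
    """Split a column into a sorted dictionary of distinct values and a list of integer value IDs."""
    dictionary = sorted(set(column))
    # The position in the sorted dictionary serves as the compact value ID.
    ids = [bisect_left(dictionary, value) for value in column]
    return dictionary, ids


def decode(dictionary: list[str], ids: list[int]) -> list[str]:
    """Rebuild the original column from the dictionary and the value IDs."""
    return [dictionary[i] for i in ids]


def count_equal(dictionary: list[str], ids: list[int], value: str) -> int:
    """Count rows equal to `value` by scanning integer IDs instead of comparing strings."""
    pos = bisect_left(dictionary, value)
    if pos == len(dictionary) or dictionary[pos] != value:
        return 0  # value is not in the dictionary, so no row can match
    return sum(1 for i in ids if i == pos)


if __name__ == "__main__":
    city = ["Zurich", "Moscow", "Zurich", "Berlin", "Moscow", "Zurich"]
    dictionary, ids = dictionary_encode(city)
    print(dictionary)  # ['Berlin', 'Moscow', 'Zurich']
    print(ids)         # [2, 1, 2, 0, 1, 2]
    print(count_equal(dictionary, ids, "Zurich"))  # 3
```

Keeping the dictionary sorted means a predicate on the original value can be translated once into an ID lookup, after which the scan touches only small integers.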
In-Memory Computing – More than a Database
Move data-intensive operations to the in-memory computing layer.
• Traditional applications execute many data-intensive operations in the application layer.
• High-performance apps delegate data-intensive operations to the in-memory computing layer.
In-memory computing imperatives:
• Avoid movement of detailed data
• Calculate first, then move results
• Eliminate unnecessary process steps
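The contrast between the two patterns can be sketched in a few lines. The example uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in (it is not SAP HANA, and the sales table is hypothetical): the first function pulls every detail row into the application layer and aggregates there, while the second pushes the GROUP BY down so that only one row per region leaves the database.

```python
import sqlite3

# Hypothetical sales table; SQLite's in-memory mode stands in for an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120.0), ("APJ", 80.0), ("EMEA", 30.0), ("AMER", 200.0), ("APJ", 20.0)],
)


def totals_in_app_layer(conn: sqlite3.Connection) -> dict[str, float]:
    """Traditional pattern: move every detail row to the application, then aggregate there."""
    totals: dict[str, float] = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(conn: sqlite3.Connection) -> dict[str, float]:
    """'Calculate first, then move results': the database aggregates, one row per region comes back."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows.fetchall())


if __name__ == "__main__":
    print(totals_in_app_layer(conn))  # five detail rows crossed the boundary
    print(totals_in_database(conn))   # three aggregated rows crossed the boundary
```

Both functions return the same totals; the difference is how much detailed data has to move, which is exactly the imperative the slide names.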
Remove Latency

SAP HANA - Simplifying Business Intelligence and Analytics
The Big Data Challenge
ACQUIRE | STORE | PROCESS & ANALYZE | ACT
REAL TIME | REAL RESULTS
SAP: Big Data, Real-time, with Real Results
Different new types of data technologies
• Key-Value Stores: return a chunk of data using a hash key (e.g. Cassandra, HBase, SimpleDB, Voldemort)
• Document Stores: store hierarchical documents rather than rows (e.g. Couchbase, CouchDB, MongoDB, …)
• Hadoop: HDFS, Hive, HBase, Pig, Mahout
• Graph Stores: relationships between data nodes are first-class citizens (e.g. Neo4j, Giraph, GraphBase, GraphLab, Infinite Graph)
• NewSQL Databases: high performance by skipping recovery, latching, locking and buffer pools (e.g. VoltDB, Starcounter)
• Cloud Solutions: e.g. Amazon SimpleDB, DynamoDB, Redshift
Example document:
    {
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }
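To make the contrast between hierarchical documents and rows concrete, the following sketch takes the example document and (a) keeps it whole under a single key, as a key-value or document store would, and (b) flattens it into a hypothetical two-table relational layout, with person and phone rows linked by an invented person_id. It uses only Python's standard json module and does not model any particular product's API; the to_rows helper is illustrative.

```python
import json

# The example document from the slide, kept as one hierarchical unit.
DOCUMENT = json.loads("""
{
  "firstName": "John", "lastName": "Smith", "age": 25,
  "address": {"streetAddress": "21 2nd Street", "city": "New York",
              "state": "NY", "postalCode": "10021"},
  "phoneNumber": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "fax", "number": "646 555-4567"}
  ]
}
""")


def to_rows(doc: dict, person_id: int) -> tuple[tuple, list[tuple]]:
    """Flatten one document into a 'person' row and child 'phone' rows keyed by person_id."""
    addr = doc["address"]
    person = (person_id, doc["firstName"], doc["lastName"], doc["age"],
              addr["streetAddress"], addr["city"], addr["state"], addr["postalCode"])
    phones = [(person_id, p["type"], p["number"]) for p in doc["phoneNumber"]]
    return person, phones


if __name__ == "__main__":
    # Key-value / document view: one lookup by key returns the whole nested structure.
    store = {"person:1": DOCUMENT}
    print(store["person:1"]["address"]["city"])  # New York
    # Relational view: the same data split across two tables, re-joined on person_id.
    person, phones = to_rows(DOCUMENT, 1)
    print(person)
    print(phones)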
Complexity of IT landscape
Point optimization is not enough to meet the new frontiers of real-time business
IMPACT ON BUSINESS: Slow Response Times | Usability Challenges | Lack of Adaptability
Real-time business scenarios: Product Recommendation | Predictive Maintenance | Network Optimization | Insider Threats | Fraud Detection
Planning | Order Processing | Operational Reporting | Real-time Risk & Fraud | Trend Analysis | Sentiment Analytics | Predictive Analytics | Pattern Recognition | Location Intelligence
Predict | Monitor | Communicate | Analyze | Summarize | Aggregate | ETL | Staging | Clean | Data Quality
Collect | Transact: Transactional Data | Sensors | Mobile | Archives | Social & Text Data | Geo-Spatial Data | Datastore | Warehouse
IMPACT ON IT: High Latency | Complexity | High Cost of Solutions
SAP HANA Platform – More than just a database
Applications: SAP Business Suite and BW (ABAP App Server) | Any Apps (Any App Server)
Open Connectivity: SQL | MDX | R | JSON
SAP HANA Platform:
• Extended Application Services: App Server | UI Integration Services | Web Server; supports any device
• Processing Engine: OLTP | OLAP | Search | Text Analysis | Predictive | Events | Spatial | Rules | Planning | Graph
• Database Services
• Application Development
• Application Function Libraries & Data Models: Predictive Analysis Libraries | Business Function Libraries | Data Models & Stored Procedures
• Integration Services: Data Virtualization | Replication | ETL/ELT | Mobile Synch | Streaming; Process Orchestration / Process Integration
• Unified Administration | Life-cycle Management | Security
Deployment: On-Premise | Hybrid | On-Demand

The SAP HANA platform converges database, data processing and application platform capabilities and provides libraries for predictive, planning, text, spatial, and business analytics so businesses can operate in real time.

SAP Event Stream Processor
INPUT STREAMS/EVENTS: Event Data | Sensors | Message Bus | Applications
SAP Event Stream Processor (ESP), authored in Studio, combined with business data in SAP HANA
OUTPUT STREAMS/EVENTS: Alerts | Dashboards | Applications | Analytics
• Integrate events & history
• Extreme performance & scalability
• Output to applications, dashboards, devices, messaging platforms
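The slide does not show how an ESP continuous query is written, so the following is only a conceptual sketch of the kind of logic such a query expresses: a time-based sliding window over incoming events, with an alert when the window average crosses a threshold. The Event and SlidingWindowAlert names, the window length and the threshold are hypothetical; this is plain Python, not ESP's own authoring language or API, and it assumes events arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Event:
    timestamp: float  # seconds
    sensor: str
    value: float


class SlidingWindowAlert:
    """Hypothetical continuous query: average over the last `window` seconds, alert above `threshold`."""

    def __init__(self, window: float, threshold: float) -> None:
        self.window = window  # assumed > 0
        self.threshold = threshold
        self.events: Deque[Event] = deque()
        self.total = 0.0

    def on_event(self, event: Event) -> Optional[str]:
        # Add the new event to the window and to the running sum.
        self.events.append(event)
        self.total += event.value
        # Evict events that fell out of the time window; with window > 0 the newest event always stays.
        while self.events[0].timestamp <= event.timestamp - self.window:
            self.total -= self.events.popleft().value
        average = self.total / len(self.events)
        if average > self.threshold:
            return f"ALERT {event.sensor}: average {average:.1f} over last {self.window:g} s"
        return None


if __name__ == "__main__":
    query = SlidingWindowAlert(window=10.0, threshold=50.0)
    for e in [Event(0.0, "pump-1", 40.0), Event(5.0, "pump-1", 70.0), Event(16.0, "pump-1", 30.0)]:
        print(e.timestamp, query.on_event(e))
```

Keeping a running sum means each event costs only an append plus a few evictions, rather than recomputing the average over the whole window, which is the incremental style that stream processors rely on.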
SAP HANA - Spatial Engine
Applications: Mobility | Visualization | Analytics | HTML5 | GIS Applications
SAP HANA Spatial Processing: Business Data + Spatial Data + Real-time Data
• Geo-Services: Geocoding, Base maps
• Geo-Content: Political Boundaries, POIs, Roads
• Columnar Spatial Processing
• Calc Model / Views: Joins, Views
• Spatial Functions: Area, Distance, Within
• Spatial Data Types: Points, Lines, Polygons
Data: Transaction Data | Unstructured Data | Location Data | Machine Data

• Real-time Spatial Processing: high-performance algorithms analyze massive amounts of spatial data in real time.
• Spatial Analytics Optimization: the columnar storage architecture eliminates the need to create spatial indexes, tessellation, or other optimization techniques.
• Spatial Data Types & Functions: store, process, manipulate, share and retrieve spatial data directly in the database.
• Geo-content & Services: open integration of maps, geo-content and geospatial services for seamless application development and deployment.
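The three spatial function families named above (Area, Distance, Within) can be illustrated with textbook planar geometry. This sketch assumes simple polygons in a flat x/y coordinate system and uses the shoelace formula and ray casting; the function names are hypothetical, and the code is not how SAP HANA implements its spatial functions. It ignores spatial reference systems, geodesic distance and many edge cases a database engine has to handle.

```python
import math

Point = tuple[float, float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two points in a planar coordinate system."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def area(polygon: list[Point]) -> float:
    """Polygon area via the shoelace formula (vertices in order, polygon implicitly closed)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def within(p: Point, polygon: list[Point]) -> bool:
    """Ray-casting point-in-polygon test; points exactly on an edge may go either way."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    site = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # hypothetical 4 x 3 rectangle
    print(area(site), distance((0.0, 0.0), (3.0, 4.0)), within((1.0, 1.0), site))
```

A query such as "customers within a sales region" reduces to repeated `within` tests, which is why running them next to the columnar business data avoids moving coordinates to the application.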
SAP HANA - Text Engine
Predictive Analytics with SAP HANA
SAP HANA (main memory):
• PAL algorithms: KNN classification | Regression | K-means | C4.5 decision tree | ABC classification | Association analysis (market basket) | Weighted score tables
• SQLScript and an optimized query plan
• Text analysis, spatial data and unstructured data as inputs
• R scripts executed in the R engine
• HANA Studio/AFM, apps & tools
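To show what one of these in-database algorithms computes, here is a minimal, pure-Python version of K-means (Lloyd's algorithm) on one-dimensional data. The function name, the naive "first k points" initialization and the 1-D restriction are our own simplifications; the code does not reflect how PAL is called or implemented, where the point is to run such algorithms next to the data instead of exporting it.

```python
def kmeans_1d(points: list[float], k: int, max_iter: int = 100) -> list[float]:
    """Lloyd's algorithm on 1-D data; initial centroids are simply the first k points."""
    centroids = list(points[:k])
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters: list[list[float]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    # Hypothetical order sizes forming two obvious groups.
    print(kmeans_1d([1, 2, 3, 10, 11, 12], k=2))
```

Each iteration is a full pass over the data, so the cost of moving detail rows out of the database grows with every iteration; that is the practical argument for in-database execution.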
• Accelerate predictive analysis and scoring with in-database algorithms delivered out of the box, and adapt the models frequently.
• Execute R commands as part of the overall query plan by transferring intermediate DB tables directly to R as vector-oriented data structures.
• Run predictive analytics across multiple data types and sources (e.g. unstructured text, geospatial, Hadoop).

SAP HANA Smart Data Access
• Transactions + analytics over heterogeneous data sources: SAP HANA, Hadoop (Hive), Teradata, SAP IQ, Oracle, SQL Server
• Leverage remote compute engines
• Single development environment
• SDK for adding support for additional data sources (custom adapters)
• Query monitoring and statistics
• Performance and query optimization
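Smart Data Access is presented here as a way to query heterogeneous sources while leveraging remote compute engines. The following sketch imitates the core idea, filter pushdown plus a local join, with two separate SQLite in-memory databases standing in for a remote system and for SAP HANA. The tables, the remote_scan and customers_who_visited helpers and the staging temp table are hypothetical illustrations; real SDA exposes remote tables as virtual tables, and its optimizer decides what to push down.

```python
import sqlite3

# Hypothetical remote source (a stand-in for e.g. a Hadoop/Hive or Teradata system).
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE clicks (customer_id INTEGER, page TEXT)")
remote.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [(1, "home"), (1, "cart"), (2, "home"), (3, "cart")],
)

# Local in-memory database holding business data.
local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
local.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cid")])


def remote_scan(page: str) -> list[tuple[int]]:
    """Filter pushdown: the WHERE clause runs at the remote source, so only matching IDs travel."""
    return remote.execute(
        "SELECT DISTINCT customer_id FROM clicks WHERE page = ?", (page,)
    ).fetchall()


def customers_who_visited(page: str) -> list[str]:
    """Join the already filtered remote rows with local data to produce one result."""
    local.execute("CREATE TEMP TABLE IF NOT EXISTS v_clicks (customer_id INTEGER)")
    local.execute("DELETE FROM v_clicks")  # refresh the staged remote result
    local.executemany("INSERT INTO v_clicks VALUES (?)", remote_scan(page))
    rows = local.execute(
        "SELECT c.name FROM customers c JOIN v_clicks v ON c.id = v.customer_id ORDER BY c.name"
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(customers_who_visited("cart"))
```

Pushing the filter to the remote side keeps the transfer proportional to the answer rather than to the remote table, which is the same "calculate first, then move results" principle applied across systems.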
“If I had asked people what they wanted, they would have said faster horses.”
Henry Ford