The Most Powerful In-memory Analytic Database

Introduction @ Sphinx IT in Vienna 25.11.2016

© 2016 EXASOL AG Our history – inventing the world's fastest in-memory database

• Early 1990s: Research (University of Erlangen-Nürnberg)
• 2000: Company foundation
• 2006: Successful pilot: customer Karstadt-Quelle uses EXASolution in production
• 2008: Record in the TPC-H benchmark ("Oracle dethroned")
• 2010: Most successful vendor of analytical database systems in Germany (BARC)
• 2012: Inclusion in Gartner's "Magic Quadrant for Data Management Systems"
• 2014: Successful global expansion, 400+ customers across 12 countries; 100 TB TPC-H benchmark

Great recognition in the market

2016

What Gartner says about EXASOL

“EXASOL is a prime example of what Gartner considers to be the future of DBMS” Source: Gartner

Why would you be looking for a new database?

• Performance/pricing issues with the existing DWH
• New requests for agile or predictive analytics
• Changing platforms or regulatory issues
• Growing data sources

King: leading interactive entertainment company

• Analyzes customer behavior
• Optimizes game revenues
• Lots and lots of data
• 100 million daily active users
• 1 billion game plays per day
• 10 billion events per day

EXASOL database: 200TB

Zalando: Rising star of e-commerce

• Online fashion retailer with 14m+ customers across Europe
• 150,000 products available online
• EXASOL complements the DWH to enable fast analytics and reporting
• The database is used to optimize stock availability, the returns process and targeted marketing

EXASOL database: 15TB

Adidas: CRM

• Several projects in different regions (Europe, USA, Russia)
• Agile BI: flexible reporting functionality for quick projects
• BW on HANA: "Cruise liner"
• EXASOL: "Speedboat"
• Powerful analytical CRM solution
• Customer DNA for individual B2C communication

EXASOL database: 6TB

Wooga

. Let the customer speak

Technical Introduction

The Most Powerful In-memory Analytic Database

Cold, Warm and Hot Data Strategy

Figure: data volume (GByte, TByte, PByte) versus performance, from cold data to hot data, positioning Hadoop, traditional RDBMS and BI-tool data engines (PRIME/iCubes).

The most powerful engine for your analytics

Database administration

Conventional:
• Partition pruning
• Materialized views
• Bitmap indexes
• Bitmap join indexes
• Index-organized tables
• DIMENSION objects
• Current statistics
• Histograms
• Table compression
• CACHE
• PCTFREE
• ...

EXASOL:
• Optional: distribution keys
• Optional: replication border
• Automatic table analyzer
• Automatic index generation
• Strong cost-based query optimizer
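The two optional knobs in the EXASOL column, distribution keys and the replication border, can be illustrated with a small Python model. This is a minimal sketch, assuming rows are assigned to nodes by hashing the distribution-key column and that tables below a row-count threshold are copied to every node. The node count, the threshold, the CRC32 hash and all function names are hypothetical and do not describe EXASOL's internal algorithm. The point it shows is that when two tables share a distribution key, an equi-join on that key can run node by node without moving rows.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4              # hypothetical cluster size
REPLICATION_BORDER = 100   # hypothetical: tables with fewer rows are replicated

def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-key value to a node with a stable hash (CRC32)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def place_table(rows, key_column, num_nodes=NUM_NODES, border=REPLICATION_BORDER):
    """Return {node: rows}: replicate small tables, hash-distribute large ones."""
    placement = defaultdict(list)
    if len(rows) < border:
        for node in range(num_nodes):      # below the border: a full copy per node
            placement[node].extend(rows)
    else:
        for row in rows:                   # equal keys always land on the same node
            placement[node_for(row[key_column], num_nodes)].append(row)
    return placement

def local_join(left, right, key, num_nodes=NUM_NODES):
    """Equi-join each node's partitions independently, without moving rows."""
    result = []
    for node in range(num_nodes):
        index = defaultdict(list)
        for row in left[node]:
            index[row[key]].append(row)
        for row in right[node]:
            result.extend({**match, **row} for match in index.get(row[key], []))
    return result

orders = [{"order_id": i, "amount": 10 * i} for i in range(200)]
items = [{"order_id": i % 200, "sku": 1000 + i} for i in range(500)]
by_order = local_join(place_table(orders, "order_id"), place_table(items, "order_id"), "order_id")
by_sku = local_join(place_table(orders, "order_id"), place_table(items, "sku"), "order_id")
print(len(by_order), len(by_sku))   # all 500 matches vs. only accidentally co-located ones
```

Distributing both tables on `order_id` finds every match locally; distributing `items` on `sku` instead loses most matches unless rows are shipped between nodes, which is the network cost a well-chosen distribution key avoids.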

EXASOL MPP database architecture

What is EXASOL?

• A column-store, massively parallel processing (MPP), in-memory analytic database
• Modern software designed for analytics
• Runs on standard x86 hardware
• Uses standard SQL (with optional extensions)
• Suitable for any scale of data and any number of users
• Mature, proven and very cost-effective
• Quick to implement and easy to operate

The World's Fastest Analytic Database
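A column store keeps each column's values together, so a query reads only the columns it needs, and runs of similar values compress well. The Python sketch below pivots rows into columns and applies two generic encodings, run-length and dictionary encoding. It illustrates column-store compression in general; it is not EXASOL's "innovative compression", which the slides do not describe, and the sample table is made up.

```python
from itertools import groupby

def to_columns(rows, column_names):
    """Pivot row tuples into one list per column (column-store layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}

def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]

def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]

def dictionary_encode(values):
    """Replace each value by a small integer code; return (dictionary, codes)."""
    codes_by_value, codes = {}, []
    for value in values:
        codes.append(codes_by_value.setdefault(value, len(codes_by_value)))
    return list(codes_by_value), codes

rows = [
    ("AT", 2016, 10), ("AT", 2016, 20), ("AT", 2016, 5),
    ("DE", 2016, 7), ("DE", 2016, 1),
]
columns = to_columns(rows, ["country", "year", "amount"])
print(run_length_encode(columns["country"]))   # [('AT', 3), ('DE', 2)]
print(run_length_encode(columns["year"]))      # [(2016, 5)]
print(sum(columns["amount"]))                  # 43: only one column is read
```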

EXASOL – fast, flexible, cost effective

High performance: MPP • column store • innovative compression
Highly scalable: hundreds of nodes • thousands of users • terabytes of data
Easy integration: open connectivity (ODBC, JDBC, .NET, MDX) • fully ACID compliant • native connectors (SAP, Hadoop & Oracle)
Highly powerful analytics: ANSI SQL • geospatial • R, Python & Lua • MapReduce
Automated: auto data distribution • auto data compression • auto tuning
Easy to use: no special schemas • no indexing • no tuning • minimal maintenance
Fast to deploy: commodity hardware • bare metal • virtual machine • cloud • appliance

EXASOL as Analytic Offload/High Performance Sidecar

BI / application layer: near-realtime / ad hoc / interactive • standard BI / reporting • advanced analytics

Database layer: OLTP / legacy RDBMS • DWH (legacy)

Integration layer: ETL / replication

Data sources:
• Structured data: ERP, CRM, custom apps, legacy
• Polystructured data: sensor, mail, social media, log files, ...

Just move your performance-critical workloads to EXASOL as an analytic offload.

EXASOL as Analytic One-Stop-Shop

BI / application layer: near-realtime / ad hoc / interactive • standard BI / reporting • advanced analytics

Database layer: OLTP / legacy RDBMS

Integration layer: ETL / replication

Data sources:
• Structured data: ERP, CRM, custom apps, legacy
• Polystructured data: sensor, mail, social media, log files, ...

Move workloads to EXASOL incrementally until your legacy DB can be dropped

EXASOL as an integral High Performance Layer

BI / application layer: business analytics • visualization, dashboarding • custom applications

Business-critical performance layer

Data layer:
• Legacy DWH/OLTP DB, RDBMS …
• Hadoop: HDFS, Hive, HBase, MapReduce, Spark, Spark SQL, Impala, Stinger
• NoSQL: graph, document, key value, search …

Data sources:
• Structured data: ERP, CRM, applications …
• Polystructured data: sensor, mail, social media, log files, ...

EXASOL: Your high performance data hub to accelerate your business without risk

EXASOL v6.0 – The Logical

Current schedule: Q1/2017. Test now!

A look back: from V4.0 to V6.0

• 4.0 (2011): TPC-H leadership
• 4.1 (2012): UDFs (R, Python, Lua)
• 4.2 (2013): Resource management & query cache…
• 5.0 (2014): Expanded TPC-H leadership; connectivity improvements & features (client/server encryption); improved in-DB analytics: Java, Skyline …
• 6.0 Standard Edition: resize, join, insert, backup, merge …; EXASOL automation, S3, HDFS …
• 6.0 Advanced Edition: data virtualization & flexible import; pluggable script languages; dynamic return types, BucketFS…

EXASOL V6.0

The Logical Data Warehouse

Characteristics:
1. The DWH relies on more than one physical database.
2. A heterogeneous set of data sources, each of which contains a fragment of the data that end users need for business intelligence, reporting and analytics applications.
3. It presents itself as a single data source: this is the key technical concept.

Example sources (figure): customer master data, Salesforce, customer history, customer transactions.

The logical data warehouse is a system architecture that (just) pretends that all the data is stored in one big database.

Today's Standard DWH Approach: ETL-based replication (Transparent ecosystem integration framework)

Situation / traditional approach: data from business-critical data sources is either completely or partially replicated into the DWH via ETL.

In particular, in environments with

• Big Data infrastructures (e.g. Hadoop)
• NoSQL systems (e.g. MongoDB)
• Cloud data sources (Salesforce, Google BigQuery)
• High requirements in terms of data freshness

this approach is often suboptimal.

Disadvantages of ETL (transformation & replication):
• High redundancy (identical data in different systems)
• Maintenance of ETL tools and ETL jobs
• Long integration cycles for new data sources
• Data in the DWH is outdated as soon as it is loaded

Logical Data Warehouse with Virtual Schemas (Transparent ecosystem integration framework. Part 1: Virtual Schemas)

Solution: Virtual Schemas
• Only the metadata of virtually connected data sources is visible.
• Whether virtual or "physical": fully transparent from the application's perspective.
• Access to these virtual schemas is dynamically forwarded to the connected data sources (1). Data is transferred on demand.
• If required, the data can be physically replicated into the DWH on demand, without the need for additional ETL tools (2).
• Coexistence with ETL

Advantages:
• Agile access to the most recent information
• No/reduced redundancy
• Fewer ETL jobs
• No wasted disk space
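The two access paths, forwarding on demand (1) and replicating on demand (2), can be modelled in a few lines of Python. This is a conceptual sketch only. `ListSourceAdapter`, `VirtualSchema`, `pushdown` and `materialize` are hypothetical names, not EXASOL's adapter API, and the Python predicate stands in for a filter that a real adapter would translate into the source's own query language. The model shows that the schema stores only metadata, that queries reach the source until the data is materialized, and that materialization reuses the same adapter instead of a separate ETL job.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]

class ListSourceAdapter:
    """Hypothetical adapter for one remote table, backed here by a Python list."""

    def __init__(self, table_name: str, rows: List[Row]):
        self.table_name = table_name
        self._rows = rows
        self.round_trips = 0  # how often the "remote" source was contacted

    def metadata(self) -> Dict[str, List[str]]:
        # Only table and column names are exposed, no data.
        columns = list(self._rows[0]) if self._rows else []
        return {self.table_name: columns}

    def pushdown(self, columns: List[str], predicate: Predicate) -> List[Row]:
        # Filter and projection run "at the source"; only matching values travel.
        self.round_trips += 1
        return [{c: row[c] for c in columns} for row in self._rows if predicate(row)]


class VirtualSchema:
    """Hypothetical virtual schema: stores metadata, forwards queries on demand."""

    def __init__(self, adapter: ListSourceAdapter):
        self.adapter = adapter
        self.tables = adapter.metadata()
        self.local_copies: Dict[str, List[Row]] = {}

    def select(self, table: str, columns: List[str],
               predicate: Predicate = lambda row: True) -> List[Row]:
        if table not in self.tables:
            raise KeyError(table)
        if table in self.local_copies:
            # Physically replicated earlier: answer locally, no remote access.
            rows = self.local_copies[table]
            return [{c: row[c] for c in columns} for row in rows if predicate(row)]
        # Virtual access (1): forward to the source, transfer data on demand.
        return self.adapter.pushdown(columns, predicate)

    def materialize(self, table: str) -> None:
        # Replication on demand (2), reusing the same adapter instead of an ETL tool.
        self.local_copies[table] = self.adapter.pushdown(self.tables[table], lambda row: True)
```

For example, `VirtualSchema(ListSourceAdapter("CUSTOMERS", rows)).select("CUSTOMERS", ["name"], lambda r: r["country"] == "AT")` contacts the source once and returns only the matching names. After `materialize("CUSTOMERS")`, later queries are answered from the local copy.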

Advanced Import Capabilities & Common Framework (Transparent ecosystem integration framework. Part 2: Import & Extensibility)

EXASOL 6.0 also provides new, flexible import capabilities.

Common framework for virtualized access and standard import:
• Well documented, easy to use and open source (GitHub)
• A newly added data source adapter is available for both virtualized access and import
• Customers and partners can easily implement or customize any data source adapter on demand

EXASOL V6.0

V6.0 - Universal/pluggable language support

EXASOL 6 offers a framework to integrate any analytical programming language. You are no longer limited to the languages provided by EXASOL out of the box. Just package the programming language of your choice, or the language used within your company, deploy it to your EXASOL database and use it for in-database analytics.

Examples: Python 2.x, Python 3.x, Julia

Details:
• Supported UDF languages for analytics are encapsulated in isolated, managed containers for secure programming and optimal resource management.
• Customers and partners are able to:
  - modify containers (update, extend)
  - create new language containers based on the provided tools and documentation
  - use different versions of one language in parallel inside one database instance

Use cases when doing analytics with EXASOL

UDFs and programming languages
1. Make additional libraries available to UDFs
2. Use different versions of the same library in different UDFs without conflicts
3. Use multiple versions of one programming language, together with specific libraries, in parallel

Analytical models
4. Store analytical models that need to be accessed by UDFs or other functionality (e.g. large models generated by R)

Other binary objects
5. Manage binary objects that have to be available on every node (e.g. digital certificates)

EXASOL V6.0

Advanced Automation

Status quo:
• Increasingly automated environments
• Based on virtualization and cloud usage
• Limited (human) resources
• Requests for shorter provisioning times
• Requests for reproducible processes

Consequence: manual installation, configuration and operation are increasingly undesirable.

EXASOL 6 completes the XML-RPC-based interface (API). Almost all installation and configuration tasks can now be automated without human intervention.
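XML-RPC clients exist for most languages. Python ships one in its standard library. The slide does not name EXASOL's XML-RPC endpoints or methods, so the sketch below runs against a local stand-in server. `FakeClusterAPI`, `listDatabases`, `getDatabaseState` and `startDatabase` are hypothetical names, not EXASOL's API. The sketch shows the pattern the slide describes: an idempotent automation step that a provisioning script can run repeatedly with no human in the loop.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

class FakeClusterAPI:
    """Local stand-in for a management API; all method names are hypothetical."""

    def __init__(self):
        self._databases = {"exa_db1": "stopped"}

    def listDatabases(self):
        return sorted(self._databases)

    def getDatabaseState(self, name):
        return self._databases[name]

    def startDatabase(self, name):
        self._databases[name] = "running"
        return True

def start_fake_server():
    """Serve FakeClusterAPI on a free local port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
    server.register_instance(FakeClusterAPI())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def ensure_all_running(url):
    """Idempotent automation step: start every database that is not running."""
    proxy = xmlrpc.client.ServerProxy(url)
    for name in proxy.listDatabases():
        if proxy.getDatabaseState(name) != "running":
            proxy.startDatabase(name)
    return {name: proxy.getDatabaseState(name) for name in proxy.listDatabases()}

if __name__ == "__main__":
    server = start_fake_server()
    host, port = server.server_address[:2]
    print(ensure_all_running(f"http://{host}:{port}/"))   # {'exa_db1': 'running'}
    server.shutdown()
    server.server_close()
```

Because the step checks state before acting, running it twice leaves the system unchanged. That property makes API-driven configuration reproducible.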

Connectivity: WebSocket API, native Python driver ...

Motivation:
• JSON, a text-based format, has become the de facto interchange format for web services.
• Many new platforms rely solely on JSON and lack appropriate JDBC/ODBC driver support.

EXASOL 6 offers a JSON-based WebSocket API alongside the existing JDBC, ODBC and .NET drivers. Using this API, virtually any platform can be connected to EXASOL, either through a simple wrapper-like driver or through a direct WebSocket connection.

A native lightweight Python driver will be included in the release. Drivers for further languages will follow!
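A driver for a JSON-based WebSocket API is essentially a thin layer that serializes commands and decodes responses. The Python sketch below covers only that JSON layer. The transport (any WebSocket client library) and the login handshake are left out; as we understand the published API, login involves an RSA-encrypted password. The field names used here (`command`, `sqlText`, `status`, `exception`, `responseData`, `results`, `resultType`, `resultSet`, `data`) reflect our understanding of EXASOL's WebSocket API description and should be verified against it. The canned response is invented for illustration.

```python
import json

def execute_request(sql_text):
    """Serialize an 'execute' command frame to send over the WebSocket."""
    return json.dumps({"command": "execute", "sqlText": sql_text})

def parse_response(frame):
    """Decode a response frame and return its responseData, raising on errors."""
    message = json.loads(frame)
    if message.get("status") != "ok":
        error = message.get("exception", {})
        raise RuntimeError(f"{error.get('sqlCode', '?')}: {error.get('text', 'unknown error')}")
    return message.get("responseData", {})

def rows_from_result(response_data):
    """Turn the first result set's column-wise 'data' arrays into row tuples."""
    result = response_data["results"][0]
    if result.get("resultType") != "resultSet":
        return []
    return list(zip(*result["resultSet"].get("data", [])))

# Invented response frame for a two-column, two-row result set.
canned = json.dumps({
    "status": "ok",
    "responseData": {"results": [{
        "resultType": "resultSet",
        "resultSet": {"numColumns": 2, "numRows": 2, "data": [[1, 2], ["a", "b"]]},
    }]},
})
print(execute_request("SELECT id, name FROM t"))
print(rows_from_result(parse_response(canned)))   # [(1, 'a'), (2, 'b')]
```

Everything here is plain JSON handling. That is why, as the slide says, platforms without JDBC/ODBC support can connect with a small wrapper.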

Further new connectivity features:
• HDFS support for backup/restore
• AWS S3 support (also for compatible storage systems like Ceph):
  - Backup/restore
  - Import/export
• …

EXASOL V6.0

Lots of Performance Improvements

Data movement
• Faster DISTRIBUTE BY
  - MERGE
  - ETL
  - Cluster resizing
• Consistent hashing
  - Accelerated cluster enlargement
• EXAStorage performance improvements
  - Fast backup & restore
  - Accelerated recovery
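Consistent hashing speeds up cluster enlargement because adding a node relocates only the keys that the new node takes over. Plain "hash modulo N" placement, by contrast, reshuffles most rows. The Python sketch below is a generic consistent-hash ring with virtual nodes, offered to illustrate the technique. The slide does not describe EXASOL's implementation, and the node names, the 64 virtual nodes per node and the BLAKE2b hash are our assumptions.

```python
import bisect
import hashlib

def stable_hash(value):
    """64-bit hash that is stable across processes (unlike Python's hash())."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

class ConsistentHashRing:
    """Generic consistent-hash ring with virtual nodes; node names must be strings."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring point at or after the key's hash, wrapping around at the end.
        index = bisect.bisect(self._ring, (stable_hash(key), "")) % len(self._ring)
        return self._ring[index][1]

def moved_fraction(keys, before, after):
    """Share of keys whose node changes between two placement functions."""
    keys = list(keys)
    return sum(before(k) != after(k) for k in keys) / len(keys)

keys = range(10_000)
ring4 = ConsistentHashRing([f"n{i}" for i in range(4)])
ring5 = ConsistentHashRing([f"n{i}" for i in range(5)])
print("consistent hashing:", moved_fraction(keys, ring4.node_for, ring5.node_for))
print("hash modulo N:", moved_fraction(keys, lambda k: stable_hash(k) % 4,
                                       lambda k: stable_hash(k) % 5))
```

When a fifth node is added, the ring moves roughly one fifth of the keys, all of them to the new node. Modulo placement moves about 80% of them, because `h % 4 == h % 5` holds for only 4 of every 20 residues.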


Query and data manipulation
• Improved MERGE
  - Accelerated merge of a small table into a large table
• Join improvements
  - Nested loop
  - Full outer joins
• Hybrid table storage (column-oriented blocks plus a row-oriented tail block)
  - Fast single INSERT
  - Fast DELETE

Misc
• Approximate COUNT DISTINCT
• Faster connection establishment
• Faster metadata retrieval
• …
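The slide does not say how EXASOL implements approximate COUNT DISTINCT. A standard technique for this task is HyperLogLog, which estimates the number of distinct values from a small, fixed array of registers instead of a hash set of all values. The Python sketch below is a textbook HyperLogLog with a small-range correction, offered only to illustrate the idea. The precision `p=12`, the BLAKE2b hash and the class name are our choices, not EXASOL's.

```python
import hashlib
import math

class HyperLogLog:
    """Textbook HyperLogLog distinct-count estimator (generic sketch)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                                 # number of registers (4096 for p=12)
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant, valid for m >= 128

    def add(self, value):
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")               # 64-bit hash
        index = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)           # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1    # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting on the empty registers.
            return self.m * math.log(self.m / zeros)
        return raw

hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i % 50_000}")   # 100,000 rows, 50,000 distinct values
print(round(hll.count()))            # close to 50,000; typical error is about 1.6% for p=12
```

The estimator keeps 4,096 small integers regardless of input size, and adding a value that was already seen cannot change any register. That fixed memory footprint is the reason an approximate COUNT DISTINCT can be much cheaper than an exact one.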

Hardware Performance: CPU Intel® Xeon® E7-8800/4800 v4 vs. E7-8800/4800 v3 Product Family

Baseline configuration (EXASOL 6.0): 4 Intel® Xeon® processors E7-8890 v3 (Haswell-EX), 2.5 GHz, 18 physical cores/socket, 512 GB DDR3/1333 registered DIMM
New configuration (EXASOL 6.0): 4 Intel® Xeon® processors E7-8890 v4 (Broadwell-EX), 2.2 GHz, 24 physical cores/socket, 512 GB DDR3/1333 registered DIMM

Custom TPC-H workload results (May 27, 2016):

                 Xeon E7 v3   Xeon E7 v4   Perf. gain
Power              191013       259784        36%
Throughput         444852       683186        53%
Overall perf.      291501       421285        44%

Increased performance with the 4-socket Intel® Xeon® processor E7-8890 v4: a 1.44x performance improvement over the previous Xeon generation in the (custom) TPC-H benchmark. Many-core optimization provides optimal intra-node scalability. The EX platform CPU upgrade yields a performance gain, more concurrent users and lower TCO.
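The "Overall" row is consistent with the TPC-H composite metric, which the TPC-H specification defines as the geometric mean of the Power and Throughput metrics. The short Python check below recomputes the composites and the relative gains from the figures in the table. It adds no new measurements.

```python
import math

# (Xeon E7 v3 baseline, Xeon E7 v4 new) values copied from the table above
RESULTS = {
    "Power": (191013, 259784),
    "Throughput": (444852, 683186),
    "Overall": (291501, 421285),
}

def composite(power, throughput):
    """TPC-H composite metric: geometric mean of Power and Throughput."""
    return math.sqrt(power * throughput)

def gain(old, new):
    """Relative improvement of new over old (0.36 means +36%)."""
    return new / old - 1

for label, (old, new) in RESULTS.items():
    print(f"{label}: {new / old:.3f}x (+{100 * gain(old, new):.1f}%)")

for i, cpu in enumerate(("Xeon E7 v3", "Xeon E7 v4")):
    recomputed = composite(RESULTS["Power"][i], RESULTS["Throughput"][i])
    print(f"{cpu}: recomputed {recomputed:.1f} vs. reported {RESULTS['Overall'][i]}")
```

The recomputed gains are 36.0%, 53.6% and 44.5%, so the table's percentages are truncated rather than rounded (53.6% appears as 53%). The recomputed composites match the reported overall figures to within one QphH.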

EXASOL: Try it for yourself

Real-world DWH sizing

• Administered data pool: 2.5 TB
• High overhead for most traditional DBMS systems (index structures, materialized views, pre-aggregated cubes)
• Raw data: 56%, 1.6 TB
• Compressed data: 554 GB (+ 45 GB indices); 28%, 8.2%. Compression is high for denormalized star schemas and flat tables, lower for normalized snowflake schemas.
• Hot (vs. cold) data: 150 GB, 9.3%
• Memory: 25%, e.g. SAP HANA: 400-600 GB

Transcending the limits of disk-centric systems

Actual "in-memory" performance boosts come from CPUs and parallelized, vectorized algorithms rather than from excessive memory capacity.

High Availability in 4+1 config (redundancy level 2)

Higher redundancy levels are possible, allowing more than one node to fail simultaneously.

Failover timeline:
• Node failure
• +2 s: EXACluster OS recognizes the failure and stops the affected database(s)
• +2 s: EXACluster OS uses a hot standby node as the new active node and restarts the database(s)
• Database(s) are up and running
• +30 min: recreation of redundancies finished; full cluster performance
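A small model helps explain why redundancy level 2 survives one node failure and why higher levels are needed for simultaneous failures. The Python sketch below is a simplified model, not EXASOL's actual EXAStorage layout. The assumptions are stated in the code: "segment" is our term, each segment is stored on its own node plus the next `redundancy - 1` nodes, and the standby simply takes over the failed node's position.

```python
ACTIVE_NODES = ["n1", "n2", "n3", "n4"]   # 4 active nodes
STANDBY_NODE = "n5"                        # +1 hot standby

def build_layout(active, redundancy=2):
    """Hypothetical layout: segment i lives on node i and the next redundancy-1 nodes."""
    n = len(active)
    return {f"seg{i}": [active[(i + k) % n] for k in range(redundancy)] for i in range(n)}

def lost_segments(layout, failed):
    """Segments for which every copy sits on a failed node."""
    return sorted(seg for seg, nodes in layout.items() if all(node in failed for node in nodes))

def promote_standby(active, failed_node, standby):
    """The hot standby takes over the failed node's position in the active set."""
    return [standby if node == failed_node else node for node in active]

layout = build_layout(ACTIVE_NODES, redundancy=2)
print(lost_segments(layout, {"n2"}))                                # []
print(lost_segments(layout, {"n2", "n3"}))                          # ['seg1']
print(lost_segments(build_layout(ACTIVE_NODES, 3), {"n2", "n3"}))   # []
print(promote_standby(ACTIVE_NODES, "n2", STANDBY_NODE))            # ['n1', 'n5', 'n3', 'n4']
```

With two copies, any single failure leaves a surviving copy that the promoted standby can read while redundancy is rebuilt. Two adjacent failures can lose a segment, which a third copy prevents.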
