EXASOL Business Whitepaper
The world’s fastest analytic database

Contents

Introduction
What is EXASOL?
Value proposition
Deployment benefits
Conclusion

www.exasol.com

Introduction

EXASOL was founded in Nuremberg, Germany with a single purpose: to engineer the world’s fastest database for analytics, with no limits on data volumes. For more than a decade, the company has focused exclusively on delivering ultra-fast, massively scalable analytic performance.

The company’s flagship product is a high-performance, in-memory, MPP database designed specifically for analytics. In 2011, EXASOL set a new performance record in the TPC-H decision support benchmark for clustered systems, and in 2014 it extended these performance and price/performance records. EXASOL remains in the number one position for all volumes of data, from 100GB right up to 100TB.

EXASOL supports its customers from offices in Germany, the UK, the USA and Brazil and through partners across Europe, Israel and Japan. Over 300 organisations are improving their operational efficiency and/or delivering excellent customer service using EXASOL. The company’s customers span many market sectors including Digital Media, Retail, Communications, Financial Services, Manufacturing and Research.

Designed from the ground up using state-of-the-art software techniques and principles, EXASOL runs on low-cost, commodity x86 hardware; scales from tens of gigabytes to hundreds of terabytes of data; is quick to implement; and delivers extreme performance without cost and complexity.

As an analytics engine, either stand-alone or integrated with Hadoop, EXASOL is being used in a wide range of Big Data use cases, including accelerating standard reporting, running multi-user ad-hoc analytics, and performing complex modelling using predictive in-database analytics.

EXASOL delivers the following key capabilities:

• In-memory technology
  Innovative in-memory algorithms enable large amounts of data to be processed in main memory for dramatically faster access times.

• Column-based storage and compression
  Columnar storage and compression reduce the number of I/O operations and the amount of data needed for processing in main memory, which accelerates performance.

• Massively Parallel Processing
  EXASOL was developed as a parallel system based on a shared nothing architecture. Queries are distributed across all nodes in a cluster using optimised, parallel algorithms that process data locally in each node’s main memory.

• High user concurrency
  Thousands of users can simultaneously access and analyse large amounts of data without compromising query performance.

• Scalability
  Linear scalability lets you extend your system and increase performance by adding nodes.

• Tuning-free database
  Intelligent algorithms continuously monitor usage and perform self-tuning, optimizing system performance and minimizing administrative work.

• Industry-standard interfaces
  Easily connect to your existing SQL-based BI and data integration tools via ODBC, JDBC, MDX and ADO.NET.

• Advanced Analytics
  User Defined Functions (UDFs) allow in-database advanced analytics to be run easily using R, Python, Lua and Java.

• Big Data
  MapReduce processing capabilities and Hadoop integration services enable you to perform high-speed analytics against structured and unstructured data, so you get new insights from Big Data faster and more easily.
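The Massively Parallel Processing capability listed above can be illustrated with a minimal sketch. It is not EXASOL code: the names (`node_for`, `distribute`, `local_sum`, `parallel_sum`), the four-node cluster size and the use of threads to stand in for servers are all assumptions made for illustration. It shows only the general shared nothing idea: rows are spread across nodes by hashing a key, each node aggregates the rows it holds, and the partial results are merged.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size, for illustration only


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick a node by hashing the distribution key (stable across runs)."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes: int = NUM_NODES):
    """Split rows into per-node partitions; no partition is shared between nodes."""
    partitions = [[] for _ in range(num_nodes)]
    for customer, amount in rows:
        partitions[node_for(customer, num_nodes)].append((customer, amount))
    return partitions


def local_sum(partition):
    """Work done on one node: aggregate only the rows stored locally."""
    totals = defaultdict(float)
    for customer, amount in partition:
        totals[customer] += amount
    return dict(totals)


def parallel_sum(rows, num_nodes: int = NUM_NODES):
    """Run the local aggregation on every partition, then merge the results."""
    partitions = distribute(rows, num_nodes)
    # Threads stand in for separate servers in this sketch.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_sum, partitions))
    merged = {}
    for partial in partials:
        # Each key lives on exactly one node, so partial results never overlap.
        merged.update(partial)
    return merged


rows = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0)]
totals = parallel_sum(rows)
```

Because every row with the same key lands on the same node, the merge step is a plain union of the partial results. Adding nodes in this model shrinks each partition, which is the intuition behind scale-out.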

2 What is EXASOL?

EXASOL is a column-store, in-memory, Massively Parallel Processing (MPP) RDBMS for data warehousing and analytics applications. Unlike other products, which have been built around old and inefficient legacy architectures and therefore have inherent inefficiencies, operational complexity and performance challenges, EXASOL has been designed and developed from the ground up to deliver phenomenal performance and ease of use.

With a shared nothing architecture to ensure high resilience, EXASOL runs on low-cost commodity x86 servers and can scale from a single node to hundreds of nodes to support thousands of concurrent users and hundreds of terabytes of data.

Core architecture

The logical architecture for EXASOL is shown in Figure 1. The database is managed via EXAOperation, an easy-to-use, highly featured web-based GUI, and flexibility in deployment and configuration is provided through its own cluster and storage management software, EXAClusterOS and EXAStorage. EXASOL supports ANSI standard 2008 SQL (including all the analytic functions) and much of the commonly used Oracle SQL dialect. The inclusion of the Oracle dialect is of particular benefit when migrating Oracle applications, as it significantly reduces or even negates the need for code re-factoring.

Resilience and redundancy are provided by deploying additional “hot standby” servers in the cluster. If any individual operational server fails, EXASOL automatically brings one of the “hot standby” servers into operation, and the cluster continues to operate whilst the standby server is being re-built. The defective server can be removed and replaced (becoming a “hot standby” server) without having to take EXASOL out of operation.

Figure 1. EXASOL Logical Architecture
(Diagram layers, top to bottom: EXASOL; EXAStorage and EXAOperation; EXAClusterOS; CentOS/Linux on each server.)
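The hot standby behaviour described in this section can be pictured with a small toy model. This is a sketch under stated assumptions, not EXASOL's mechanism: the `Cluster` class, its methods and the server names are hypothetical, and the model ignores data re-building entirely. It only tracks which servers are active and which are waiting as standbys.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a cluster with active servers and hot standbys."""
    active: list
    standby: list = field(default_factory=list)

    def fail(self, server: str) -> str:
        """Swap a failed active server for a hot standby; return the standby's name."""
        if server not in self.active:
            raise ValueError(f"{server} is not an active server")
        if not self.standby:
            raise RuntimeError("no hot standby available")
        replacement = self.standby.pop(0)
        # The standby takes over the failed server's slot in the cluster.
        self.active[self.active.index(server)] = replacement
        return replacement

    def add_replacement(self, server: str) -> None:
        """A repaired or new server rejoins the cluster as a hot standby."""
        self.standby.append(server)


cluster = Cluster(active=["n1", "n2", "n3"], standby=["n4"])
promoted = cluster.fail("n2")       # n4 takes over from n2
cluster.add_replacement("n2-new")   # the replacement hardware becomes the new standby
```

The number of active servers stays constant through the failure, which is why the cluster can keep serving queries while the replacement is prepared.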

EXASOL has a high degree of automation built into its design to ensure that it delivers high performance with minimal need for expensive DBA resource to operate it. Some of the key areas of automation are:

• Automatic distribution of data evenly across all servers in the cluster;
• Automatic duplication of data across servers to ensure data integrity in the event of a server failure;
• Automatic selection of innovative compression algorithms that are data type specific and optimized for in-memory processing. These algorithms also work independently on each node to ensure optimal performance;
• Automatic compression of data at column level with identical images stored both in memory and on disk to optimize performance;
• Automatic monitoring and logging of system resources (RAM, Disk, CPU) to aid in capacity planning;
• Automatic disk reclaim to minimize house-keeping activities.

The decreasing cost of RAM has encouraged a number of vendors to develop in-memory options for their existing database products. It is important to note that the EXASOL database has been architected and designed from the outset as an in-memory database. This is not an “add-on” feature and, unlike a number of competing products, EXASOL does not need to store the entire database in memory. For optimum performance, sufficient memory is required to service the query workload presented. As with persistent storage on disk, compression is also helpful in this respect. This affords considerable flexibility and advantages in terms of balancing memory, servers, performance and cost.

Since EXASOL’s raw performance is derived from in-memory processing, EXASOL has developed a highly intelligent, cost-based query optimiser, based on self-learning algorithms, that manages what data is kept in memory. An important feature of the optimiser is the automatic creation and management of join indices in memory, to maximise the benefits of in-memory processing. In real-world deployments the optimiser adapts over time to the workload presented, in order to deliver optimum performance from the database resources available. Workload management at the system level is therefore significantly automated, reducing the need for skilled and expensive DBA resource.

In addition, as part of the workload management, EXASOL automatically monitors and logs its resource utilization. As workloads increase (i.e. more data, more users, more complex queries) and the performance of EXASOL starts to degrade, the system monitoring information is available to help determine how much more memory to allocate to the database in each server (scale up) or, if necessary, how many new nodes (servers) need to be added to the cluster (scale out) to maintain performance levels.

Scaling up by allocating more memory to the database in each server is a straightforward command executed through the EXAOperation GUI. Adding new servers is also a straightforward process. Once the new hardware is connected to the cluster, data is automatically re-distributed across current and new nodes as a background process, and users can continue running queries whilst this is on-going.

In specific use-cases, where workload is highly diverse and/or user priorities vary significantly at different times of day, EXASOL provides workload management through user prioritisation. This facilitates manual configuration of the priorities of users or groups of users (roles) by weighting resource allocation and, if required, defining a schedule.

Other features

EXASOL supports standard interfaces for integrating upstream (Data Integration) and downstream (BI) tools. The standard interfaces that ship with the software include ODBC, JDBC, and ADO.NET. The database has already been tested and is working in production environments with BI tools and Data Integration tools from all the leading vendors including: Informatica, Talend, Pentaho, Tableau, Business Objects, Cognos and Microstrategy.

Optional interfaces, including MDX (the fast OLAP connector), a native connector for Oracle databases and a native connector for SAP R3 systems, are also available.

An SQL pre-processor, which facilitates “code once, use many times” translation of proprietary SQL to ANSI standard-compliant SQL, is also included as standard. This is another differentiating feature that significantly reduces the time and risk associated with migrating existing SQL applications.

EXASOL includes a Java fast loader. Integration of this fast loader with Data Integration products is straightforward. A key feature is the ability to process compressed files, for example in gzip format, which enables even faster load speeds.

As organisations use analytics more widely to support their day-to-day business operations, the ability to regularly update and refresh data in operational data warehouses has become a key requirement. EXASOL is able to handle near real-time data feeds and to process single row inserts in parallel with more traditional (batch-load) use-cases.

EXASOL can, optionally, be extended with EXAPowerlytics, a comprehensive suite of analytical functions including scalar, aggregate and geospatial functions. The EXAPowerlytics product also enables the user to write User Defined Functions (UDFs) using the Lua, Python, Java or R languages to support in-database analytics.

EXAPowerlytics also supports the integration of Hadoop clusters, the implementation of map-reduce algorithms and the processing of unstructured data. External UDFs (E-UDFs) provide an open framework for integration in any language that supports the ZeroMQ libraries.

A unique new feature is Skyline, an extension to SQL supporting preference analytics, the next generation of analytics. Preference analytics addresses the fundamental problems of traditional data mining approaches, where the ever-increasing volumes of data and multiplicity of variables mean that sorting, filtering and weighting lead to sub-optimal analyses; for example, data has to be discarded due to system constraints, and subjective decisions are made regarding significance to simplify algorithms. Investment fund selection is a good example, where establishing an objective analysis of daily risk/return across thousands of funds is far from straightforward.

3 Value proposition

EXASOL delivers high performance analytics on a system that is easy to use, highly scalable, fast to implement and extremely cost effective.

Implementing EXASOL does not require you to replace your existing systems. It can be implemented alongside existing infrastructure as a complementary solution, to provide the high performance analytics that your current systems cannot deliver. This complementary approach enables you to continue to leverage your existing investments and prove the value that high performance analytics will deliver to your business without impacting existing operations. If appropriate, you can then plan a phased migration of existing analytics applications to EXASOL over time.

The EXAPowerlytics product opens up numerous possibilities for improving or creating new analytical business capabilities. These fall into two broad categories:

1. Extension and integration of platform-bound applications, where the volume of data that can be processed is restricted by the application architecture. This is often the case for products like MATLAB and SAS; and

2. The creation of new High Performance Computing (HPC) applications that are only feasible with MPP technology like EXASOL and EXAPowerlytics. This is characterised by the need for large datasets to be processed rapidly utilising complex algorithms.
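One analysis of this kind that the whitepaper names is the Skyline preference query, illustrated by investment fund selection on risk and return. The sketch below shows the textbook skyline (Pareto-optimal set) definition in plain Python. It is not EXASOL's SQL syntax or implementation, and the `Fund` type, the sample figures and the function names are invented for illustration.

```python
from typing import NamedTuple


class Fund(NamedTuple):
    name: str
    risk: float  # lower is better
    ret: float   # higher is better


def dominates(a: Fund, b: Fund) -> bool:
    """a dominates b if it is at least as good on both criteria and better on one."""
    at_least_as_good = a.risk <= b.risk and a.ret >= b.ret
    strictly_better = a.risk < b.risk or a.ret > b.ret
    return at_least_as_good and strictly_better


def skyline(funds):
    """Return the funds that no other fund dominates (the Pareto-optimal set)."""
    return [f for f in funds if not any(dominates(g, f) for g in funds)]


# Made-up sample data, purely illustrative.
funds = [
    Fund("A", risk=0.10, ret=0.04),
    Fund("B", risk=0.20, ret=0.07),
    Fund("C", risk=0.25, ret=0.06),  # dominated by B: more risk, less return
    Fund("D", risk=0.30, ret=0.09),
    Fund("E", risk=0.15, ret=0.03),  # dominated by A
]
best = [f.name for f in skyline(funds)]
```

No weights or cut-off thresholds are needed: a fund is kept unless some other fund is objectively better on every criterion. This pairwise comparison is quadratic in the number of funds, so it is a sketch of the idea and not something to run across very large data sets.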

EXAPowerlytics provides an open framework for integration and application development. Integration can be via the standard SQL interface (encapsulating non-SQL code), or utilising another programming language (via E-UDFs). The open approach allows a business to plan and architect solutions that utilise technology investment optimally. This is in contrast to other vendor approaches that often provide closed point solutions for integration, for example by tightly integrating a particular Hadoop distribution.

EXASOL’s price/performance advantage is clearly demonstrated by the independent Transaction Processing Performance Council TPC-H benchmark, where EXASOL holds the number one position, by a significant margin over other solutions, for both raw performance and price-performance on data volumes ranging from 300GB through to 100TB.

The graph in Figure 2 illustrates the performance advantage of EXASOL at all scale factors. The detailed TPC-H results can be found on the Transaction Processing Performance Council website at http://www.tpc.org/tpch/results/tpch_perf_results.asp

Figure 2: EXASOL performance advantage
(Chart: TPC-H Performance at all Scale Factors. Vertical axis: Performance (QphH), 0 to 11,000,000. Horizontal axis: TPC-H Scale Factor, 100GB, 300GB, 1TB, 3TB, 10TB, 100TB. Series: EXASOL, 1st place; other databases, 2nd, 3rd and 4th place.)

4 Deployment benefits

EXASOL offers considerable flexibility in deployment, enabling highly efficient utilisation of both hardware and software licensing:

• EXASOL requires a lower specification of (commodity) hardware to deliver a given workload;

• EXASOL licensing is based on RAM allocated to the application. There are no end-user limits, and the storage of infrequently used or entirely new, but as yet unexplored, data sets is not penalised;

• EXASOL performance is dependent on the relationship between database size, workload and RAM allocated. (Note: It is not necessary to have sufficient RAM to hold the entire database.) If the workload can be run entirely in RAM then performance will be optimal. However, performance with less RAM (and lower licensing cost) typically still meets SLA requirements;

• Increasing workload demands can be addressed flexibly by increasing RAM allocation per server (scale-up) and/or deploying additional servers (scale-out);

• Several databases can be run on a single hardware cluster, which is ideal for supporting multi-tenant applications;

• EXASOL supports much of the commonly used Oracle SQL syntax, facilitating rapid, low-impact migration in Oracle environments;

• EXASOL includes an SQL pre-processor, facilitating rapid re-factoring of application queries without impacting the applications themselves;

• Open integration of other technology platforms and programming languages can be achieved using EXAPowerlytics.

5 Conclusion

Whether you need to accelerate existing reporting and analytics, run predictive analytics, make business intelligence and analytics available to large numbers of users or deliver a “Big Data” solution to your business, EXASOL is the platform that enables you to do this quickly, easily and cost effectively.

If you would like to see for yourself the advantages that EXASOL can deliver please go to http://www.exasol.com/en/test-drive/ where you can register to try EXASOL free of charge.

Or if you would like to discuss your requirements with one of our experts please get in touch at [email protected]

About this data sheet: Information listed here may change after the data sheet has gone to print (February 2015). EXASOL is a registered trademark. All trademarks named are protected and the property of their respective owner. © 2015, EXASOL AG | All rights reserved
