Research Report: IBM DB2 BLU Acceleration vs. SAP HANA vs. Oracle Exadata

Executive Summary

The problem: how to analyze vast amounts of data (Big Data) most efficiently.

The solution is threefold:

1. Find ways to organize and compress data such that large amounts of data take up less space – and find a way to read compressed data to speed query completion;

2. Use efficient algorithms to accelerate the speed of Big Data analysis; and,

3. Select a system environment that provides balanced resource utilization such that CPU power, memory, input/output (I/O), networks and storage all work together in a balanced fashion in order to generate query results as expeditiously as possible.

Three companies – IBM, SAP, and Oracle – all build software environments designed to accelerate Big Data analysis. There are, however, very significant differences in how each vendor organizes/queries data – and in related system designs:

• IBM’s approach uses an innovative new technique known as “DB2 BLU Acceleration”. Using a columnar approach, “BLU” quickly whittles down the size of a Big Data set to isolate relevant data, in effect speed-reading large databases (this approach enables BLU to achieve a 10-50X performance advantage over traditional row-based approaches). IBM’s approach also features database compression, the ability to read compressed data in memory, and a balanced system design;

• SAP’s HANA relies on placing large amounts of columnar data in main memory where the whole database can be analyzed in real time. HANA also compresses data by up to 20X (but needs to decompress it to enable query processing). We like HANA, but we question whether system resource utilization is well-balanced; and,

• Oracle’s own Website describes Exadata Database Machine as “combining massive memory and low-cost disks to deliver the highest performance and petabyte scalability at the lowest cost”. To us, Exadata is a highly-tuned Oracle real application cluster (RAC) packaged as an appliance with storage that uses the Oracle database along with in-database advanced analytics. This offering does not exploit columnar data; does not read compressed data; and its compression facilities lag IBM’s DB2.

A closer look at each vendor’s Big Data analytics offerings shows major differences in how data is cached and compressed; how workloads are managed; how memory is used – and in system balance/optimization. As we examined each vendor’s offerings from these perspectives, we found that IBM’s DB2 with BLU Acceleration has strong advantages over SAP HANA and Oracle Exadata – particularly when it comes to balanced system design and performance. In this Research Report, Clabby Analytics discusses in greater depth how each of these systems differs.

The Big Data Marketplace

Due to advances in parallel processing; due to continual reductions in storage costs; due to the simplification of data management software; and due to lower cost, more powerful and flexible business analytics software, it has now become more affordable than ever before to run analytics applications against large volumes of enterprise data. For years enterprises have been capturing valuable, usable data — but this data has been too expensive to mine (analyze). Now, with reduced systems, storage and software costs, enterprises are finding that they can achieve a clear return-on-investment by analyzing more and more of the structured and unstructured data that they have long been able to capture.

A CEO study entitled “Leading Through Connections” shows that enterprises now realize the strategic value of business analytics. Enterprises are cleansing their data, consolidating and parallelizing their databases, and building integrated infrastructures. As evidence of the strategic importance of Big Data analysis, also consider a study published in the MIT Sloan Management Review, which concluded that enterprises that use Big Data analytics are twice as likely to outperform their competitors. This study also found that there has been a 60% increase in the use of business analytics over the past few years.

The use of analytics is growing — and it is growing because enterprises now see the strategic value of analyzing all sorts of data. By moving to Big Data analytics, enterprises are better able to respond to new opportunities and to react more quickly to competitive threats. Enterprises that have embraced Big Data analytics are finding that they can better service customers, better manage risk, spot trends — and thus improve customer relationships, reduce risk, and exploit new business opportunities.

Organizing and Working with Big Data

The goal of business analytics is to help people make more informed decisions — thus leading to better business outcomes. In order to make more informed decisions, enterprises need to:

• Organize, integrate, and govern their structured as well as unstructured data;
• Address data growth (scale) through data and compute parallelism; and,
• Find ways to cost effectively manage and store large, complex data sets.

To achieve data consistency and operational efficiency, multiple, siloed data warehouses need to be consolidated and parallelized in order to create one version of the truth (a single, common database). Further, Quality-of-Service (QoS) requirements for reliability, availability, and security — as well as scalability and performance — need to be addressed.

Once the data has been cleansed and federated, enterprises need to figure out how to work with that data. Entire databases can be placed into memory, or they can be dynamically cached (placed in the storage subsystem, then accessed as needed). Data can be tiered such that the most important data can be located on fast disks in close proximity to the processors such that it can be analyzed most expeditiously. Data can be compressed such that it takes up less space (saving on storage and memory costs). Data can be organized into columns in order to improve analytics performance over traditional row-based data. Some vendors can even read this compressed data, producing results more quickly than competitors.
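The row-versus-column distinction above can be sketched in a few lines. This is an illustrative model (hypothetical data, not any vendor's actual storage format): a row store must touch whole records even when a query needs one attribute, while a column store pivots the same data so a scan reads only the column of interest.

```python
# Illustrative sketch: row layout vs. columnar layout for an analytic scan.
# Hypothetical data; not any vendor's actual storage format.

rows = [
    {"id": 1, "region": "EU", "revenue": 120.0, "units": 3},
    {"id": 2, "region": "US", "revenue": 80.0,  "units": 1},
    {"id": 3, "region": "EU", "revenue": 200.0, "units": 5},
]

# Row store: summing one column still walks every whole record.
total_row_store = sum(r["revenue"] for r in rows)

# Column store: the same data pivoted into one list per column, so a
# scan of "revenue" touches only that column's values.
columns = {key: [r[key] for r in rows] for key in rows[0]}
total_column_store = sum(columns["revenue"])

assert total_row_store == total_column_store == 400.0
```

The results are identical; the difference is how much data each layout must read to produce them, which is where columnar engines claim their scan-speed advantage.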

September, 2013 © 2013 Clabby Analytics Page 2 IBM DB2 BLU Acceleration vs. SAP HANA vs. Oracle Exadata

Big Differences Start to Show Up When Analyzing How Data Is Organized and How the System Design Supports Big Data Analytics Processing

Some of the comparison points that we look for when comparing and contrasting IBM’s, SAP’s, and Oracle’s approaches include:

• What is the data structure? Is it row data or columnar – or both?
• Is the data cached or is it all held in memory?
• How is compression handled?
• What are the system design characteristics? Where/how is the data processed? What is being done to streamline CPU, memory and I/O interaction?
• What are the deployment characteristics?

As Figure 1 shows, there are big differences in how IBM, SAP, and Oracle express data (rows vs. columnar); how memory management/caching is handled; how compression is handled; and how data is processed by the system (resource utilization).

Figure 1 – How IBM, SAP, and Oracle Organize Big Data

Source: Clabby Analytics – September, 2013

The way we see it, organizing data into columns has big performance advantages as compared with processing row-based data. We like in-memory databases like SAP’s HANA, especially for real-time processing, but we have some resource utilization issues with this design (so clever caching is more appealing to us). We see the ability to analyze compressed data as a huge competitive differentiator for IBM because it speeds the time it takes to achieve results. From a systems design perspective, we like to see the balanced use of all resources – and we like to see processors exploit SIMD instructions (single instruction, multiple data) in order to more efficiently process data in parallel. As for deployment (and operations), it can be argued that IBM’s approach is less complex (load-and-go simplicity) and more flexible (more control over how to configure/build a Big Data server) as compared with its SAP and Oracle competitors.

A Closer Look at IBM’s DB2 BLU Acceleration Solution

First, it is important to note that IBM tunes its DB2 BLU Acceleration for its own hardware – just as Oracle does for its own hardware. SAP’s HANA, on the other hand, has been designed to run on the commodity hardware offered by many vendors.

As we looked at IBM’s BLU Acceleration data structure we found that it can be used with row- or column-based data. In column mode, it has been reported to be 10-50X faster than traditional row-based relational databases. From a memory management perspective, IBM’s DB2 with BLU Acceleration is designed to use in-memory columnar processing that maintains in-memory performance while dynamically moving unused data to/from storage as needed. As for compression, DB2 has long had a compression advantage over other major database competitors (a few years ago we wrote about how some Oracle customers were able to save 40% of their storage costs by taking advantage of DB2 compression). But, in addition to compression efficiency advantages, BLU Acceleration is able to read compressed data (no decompression necessary) while also employing data skipping algorithms to speed-read compressed databases. BLU Acceleration also takes advantage of processor-level parallel vector processing to exploit multi-core and SIMD (single instruction, multiple data) parallelism – SIMD instructions improve parallel performance, helping to produce query results faster than systems that do not exploit them.
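The data-skipping idea mentioned above can be illustrated with a schematic analogue: keep a little min/max metadata per storage block so that blocks that cannot possibly satisfy a predicate are never read at all. The structures below are hypothetical (BLU's real synopsis metadata is internal to DB2); the sketch only shows the principle.

```python
# Schematic analogue of data skipping: per-block min/max synopses let
# a scan bypass blocks that cannot match the predicate.
# Hypothetical structures; not DB2 BLU's actual internal format.

blocks = [
    [3, 7, 9],        # block 0: values 3..9
    [15, 18, 22],     # block 1: values 15..22
    [41, 44, 60],     # block 2: values 41..60
]
synopses = [(min(b), max(b)) for b in blocks]

def scan_greater_than(threshold):
    """Return matching values, skipping blocks whose max <= threshold."""
    hits, blocks_read = [], 0
    for block, (lo, hi) in zip(blocks, synopses):
        if hi <= threshold:      # the whole block is irrelevant: skip it
            continue
        blocks_read += 1
        hits.extend(v for v in block if v > threshold)
    return hits, blocks_read

hits, blocks_read = scan_greater_than(40)
assert hits == [41, 44, 60] and blocks_read == 1  # two of three blocks skipped
```

The payoff is proportional to how selective the predicate is: the more blocks whose min/max range falls outside it, the less data the scan ever touches.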

DB2 with BLU Acceleration is NOT an in-memory database processing environment (like SAP’s HANA) – instead it uses dynamic memory management caching techniques to off-load some data to near-proximity fast storage. DB2 compression can reduce the size of Big Data databases more efficiently than Oracle or SAP. BLU Acceleration can read compressed data in memory without having to decompress it (IBM’s BLU Acceleration uses advanced data-skipping techniques; neither SAP’s HANA nor Oracle’s Exadata reads compressed data in memory). We see this ability to read compressed data in memory as a huge advantage for IBM’s BLU when it comes to the speed of query completion.

Also noteworthy, from a systems design perspective, IBM’s DB2 with BLU Acceleration can be deployed on IBM POWER-based Power Systems as well as x86-based System x servers. Because POWER processors can execute twice as many threads as their Intel counterparts, it is reasonable to expect Power Systems to be able to significantly outperform x86-based SAP and Oracle counterparts when running the same query.

Finally, note that IBM’s DB2 BLU Acceleration takes advantage of a balanced system design where CPU, memory, and I/O all work together in an optimized fashion. As we examined how IBM’s DB2 BLU Acceleration works with underlying processors and subsystems, we observed that underlying subsystems are optimized for:


• In-memory processing because: 1) the most useful data is placed in memory (the data stays compressed so more data can be placed in memory while data in storage is scan-friendly from a caching perspective); 2) less data is placed in memory (as a result of the use of columnar data, late materialization, and data skipping techniques); and, 3) memory latency is optimized for scans, joins and aggregation.

• High CPU performance – thanks to the use of SIMD instructions that speed scans, joins, grouping and arithmetic performance; and thanks to core-friendly parallelism.

• I/O optimization – because the system design places less stress on the I/O subsystem (because there is less data to read thanks to columnar processing and the ability to read compressed data in memory). When data is retrieved from cache, it is easier to read because it has been packaged as scan-friendly. And, finally, specialized columnar prefetching algorithms also speed up cache calls.
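The first bullet's claim that data "stays compressed" in memory deserves a concrete illustration. One common way a columnar engine can evaluate predicates without decompressing each row is dictionary encoding: translate the predicate's constant into the code space once, then compare small integer codes. The encoding below is a hypothetical sketch of that general technique, not BLU's actual (unpublished) format.

```python
# Sketch of evaluating a predicate directly on dictionary-encoded
# (compressed) column values, without per-row decompression.
# Hypothetical encoding; the actual DB2 BLU format is not public.

column = ["EU", "US", "EU", "APAC", "US", "EU"]

# Dictionary compression: each distinct value is stored once; the
# column itself becomes a list of small integer codes.
dictionary = {v: i for i, v in enumerate(dict.fromkeys(column))}
codes = [dictionary[v] for v in column]          # [0, 1, 0, 2, 1, 0]

# The predicate "region == 'EU'" is translated once into code space,
# then evaluated against the compressed codes directly.
target = dictionary["EU"]
matching_positions = [i for i, c in enumerate(codes) if c == target]

assert matching_positions == [0, 2, 5]
```

Because the comparison runs over fixed-width integer codes rather than variable-length strings, this style of scan is also a natural fit for the SIMD instructions discussed above: many codes can be compared per instruction.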

IBM’s system design is a great example of how to build a well-balanced environment that places the most important data in memory while making cached data easy to retrieve. We like this dynamic data caching approach better than an in-memory database approach largely because it accommodates database scalability better. With in-memory databases, to deal with the size of ever-growing Big Data databases, either more systems need to be purchased or data needs to be dropped out of memory in favor of new data. Dynamic caching provides more flexibility when dealing with data sets that are larger than main memory can hold. We also like the way the processor is fed a steady stream of data to process, as well as the use of SIMD instructions to improve parallel processing performance. We also like the way the I/O subsystem is organized such that cache calls are fewer and farther between, and when cache is called, prefetch algorithms speed data acquisition. Finally, we note that IBM’s DB2 environment offers advanced workload management facilities – whereas its SAP HANA competitor does not.

A Closer Look at SAP’s HANA

SAP’s HANA environment is an “in-memory” processing environment. With HANA, ALL data is placed in main memory for fast processing. (An excellent overview of how this architecture processes queries can be found here).

The emphasis in the HANA design is to converge online analytical processing (OLAP) and online transaction processing (OLTP) into one columnar store in order to eliminate latency (reads/writes to disk) and thus speed real-time decision making. The design advantage in using an in-memory data store is that cache/disk latency is eliminated, and OLAP and OLTP activities can take place in parallel in real time. In contrast, IBM’s DB2 with BLU Acceleration would need to access row-based tables stored on disk to perform certain operational queries – while trend and historical data would probably reside in an in-memory data mart (having to go to two different places in order to gather data could make BLU less suitable for real-time analysis as compared with using main memory exclusively).

Enterprises that need to converge OLAP and OLTP into one common environment for real time decision making, therefore, would likely be well served by adopting the HANA approach.

The big questions to be answered with respect to how HANA handles large in-memory databases – given that the size of memory is finite – are “how many concurrent users can be supported?” and “how does the system perform as query complexity increases?”. SAP’s own HANA Memory Usage guide indirectly raises these same questions when it states that “the amount of extra memory will depend on the size of the tables (larger tables will create larger intermediate result-tables in operations like joins), but even more on the expected workload in terms of the concurrency and complexity of the analytical queries (each concurrent query needs its own workspace)”. To us, this means that as query complexity increases, performance will slow down – and, because each query needs its own workspace, the number of concurrent workloads may need to be decreased in order to meet service level requirements for performance.

The other important elements we examine when evaluating Big Data processing environments include compression efficiency and the ability to analyze compressed data. We have not yet seen a published HANA vs. BLU compression comparison, but we did see an IBM Labs test showing that, for the same 220 GB of raw, uncompressed fact data, IBM beat HANA compression by 13%. Further, when it comes to analyzing compressed data, HANA needs to decompress data before reading it – which should result in substantial performance advantages for IBM’s DB2 with BLU Acceleration.

In short, we think SAP’s HANA has been designed as a real-time OLTP/OLAP environment. If the database can fit into memory – and if the queries are simple – this is a good architecture for processing real-time operational OLTP/OLAP workloads. We do have reservations, however, about the demands that intermediate and complex queries place on the system, and about the level of concurrency that can be supported accordingly.

Oracle’s Exadata Database Machine Environment

Oracle’s Exadata environment is a highly tuned x86-based real application cluster environment (packaged as an appliance) that has been designed to process large Oracle databases. It does not use columnar processing other than as a compression technique; and it is not an in-memory-only system solution (like SAP’s HANA). The way it deals with large volumes of data is similar to IBM’s BLU approach in that it places hot data in main memory and caches the overflow.

The beauty of this environment is that it features a tuned and optimized Oracle database (which is important because some enterprises have standardized on Oracle as their primary database), an operating system, servers, and storage along with analytics tools and utilities, all as an integrated solution. It offers good performance and can scale readily. It can be used to perform all types of analytics ranging from scan-intensive applications to highly concurrent transactional processing. Finally, it offers solid workload management facilities. Oracle customers love Exadata because it is a prepackaged data appliance that is straightforward to deploy and provides an immediate performance boost when compared to standard Oracle database performance on various commercial servers.

Our big issue with Oracle’s Exadata Database Machine is that it appears to be a jack-of-all-trades – designed to handle a variety of analytics workloads (we don’t see this design as being optimized for any specific analytics workload). In contrast, consider IBM’s PureData Systems – systems that have been engineered to process specific analytics workloads. For instance, IBM’s PureData System for Analytics is a high performance, scalable, massively parallel appliance that has been tuned and optimized to perform analytics on large volumes of historical data. IBM’s PureData System for Hadoop simplifies Hadoop deployment and accelerates Hadoop performance. PureData System for Operational Analytics is optimized for operational analytics and has been designed to handle 1000+ concurrent operational queries. And IBM’s PureData System for Transactions has been designed specifically to process high volumes of transactions. IBM builds appliances that are tuned and optimized to handle specific analytics workloads optimally.

As for comparing Oracle’s Exadata Database Machine compression with IBM’s DB2 BLU Acceleration compression, we have already mentioned that Oracle customers who had switched to IBM’s DB2 saw compression rates greatly improve. We suspect that IBM’s compression efficiency is probably about 20% better than Oracle’s compression efficiency – saving IBM DB2 customers from having to purchase a lot of additional storage hardware.

We also stated earlier that Oracle’s Exadata does not read compressed data – requiring that data be decompressed before it is acted upon. Decompression can happen at the storage tier (as part of a smart scan) or at the database tier – but regardless of where it occurs it has a performance impact.

As for Oracle’s Exadata Database Machine’s failure to exploit columnar processing, this too has an impact on database processing performance. Because Oracle’s Exadata is not based on a columnar database design, it does not have the ability to read just one column for many rows – so it places all columns for a given row into a “compression unit”. During decompression, rows are reconstructed out of the compression unit. Decompressed rows are returned to the database if Smart Scan is used, but if it isn’t used then the entire compression unit is returned to buffer cache. Compression, on the other hand, occurs on the database tier only (never on storage cells). It is reasonable to assume that having to constantly compress/decompress data will negatively impact analytics performance.
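The cost described above can be made concrete with a toy contrast between a row-oriented compression unit and per-column compression. The structures below are hypothetical stand-ins (using generic `zlib` compression, not Oracle's actual on-disk format): in the row-oriented case, reading one column forces the whole unit, every column of every row, to be decompressed first.

```python
# Toy contrast: a row-oriented "compression unit" (all columns packed
# and compressed together) vs. a separately compressed single column.
# Hypothetical structures; not Oracle's actual storage format.
import json
import zlib

rows = [{"id": i, "region": "EU" if i % 2 else "US", "revenue": 10.0 * i}
        for i in range(1, 5)]

# Row-style unit: whole rows compressed as one blob, so reading any
# single column means decompressing and reconstructing every column.
compression_unit = zlib.compress(json.dumps(rows).encode())

def read_revenue_from_unit(unit):
    full_rows = json.loads(zlib.decompress(unit))   # all columns come back
    return [r["revenue"] for r in full_rows]        # most of it is discarded

# Columnar layout: each column compressed on its own, so one column
# can be decompressed independently of the rest.
revenue_blob = zlib.compress(json.dumps([r["revenue"] for r in rows]).encode())
revenue_only = json.loads(zlib.decompress(revenue_blob))

assert read_revenue_from_unit(compression_unit) == revenue_only
```

Both paths return the same values; the row-oriented path simply does strictly more decompression and reconstruction work per column read, which is the performance concern raised above.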

From a system design perspective, IBM has a further advantage over Oracle’s x86-based Exadata Database Machine in that IBM can offer its DB2 BLU Acceleration on POWER-based systems. Due to faster threading and the ability to support SIMD instructions, it stands to reason that IBM Power Systems can process more work on fewer cores than Oracle’s Exadata (using less hardware may result in price advantages when selecting an IBM BLU Acceleration solution).

As for the “complexity to manage” aspect that we described in Figure 1, consider how Exadata handles query parallelism. The Oracle database as implemented on Exadata allocates parallel execution processes on a first-come-first-served basis until the maximum number of parallel processes is achieved. When the system load is low, queries are allocated the maximum number of parallel execution processes, thus improving performance. But when the load is high, Oracle downgrades the number of execution processes allocated to queries – and/or forces queries to wait in queues. Downgrading the number of parallel execution processes while queuing others degrades query performance as fewer resources are made available to execute query requests. Further, downgraded queries remain downgraded until the query is finished. To us, this is an example of why this environment is difficult to manage.
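The first-come-first-served behavior described above can be modeled in a few lines. This is a deliberately simplified toy allocator (the pool size, degree cap, and policy details are illustrative assumptions, not Oracle's actual parameters): early queries drain the process pool, and late arrivals are downgraded or left to queue.

```python
# Toy model of first-come-first-served parallel-execution allocation:
# early queries get their full parallel degree; under load, later
# queries are downgraded or must queue. Numbers are illustrative
# assumptions, not Oracle's actual configuration or policy.

TOTAL_PROCESSES = 16   # size of the shared parallel-execution pool
MAX_DEGREE = 8         # per-query cap on parallel processes

def allocate(requested_degrees):
    """Grant each query up to MAX_DEGREE processes until the pool runs dry."""
    available = TOTAL_PROCESSES
    grants = []
    for want in requested_degrees:
        want = min(want, MAX_DEGREE)
        granted = min(want, available)
        available -= granted
        # granted == 0 models a query forced to wait in a queue; any
        # downgrade (granted < want) persists until the query finishes.
        grants.append(granted)
    return grants

# Three queries each asking for degree 8: the third gets nothing.
assert allocate([8, 8, 8]) == [8, 8, 0]
# Under lighter load, every query receives its full requested degree.
assert allocate([4, 4]) == [4, 4]
```

The management difficulty follows from the model: the degree a query actually runs at depends on arrival order and instantaneous load, so the same report can perform very differently from one run to the next.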


Finally, it is important to note that due to processor advantages (faster processors, SIMD instruction execution, the ability to process more threads using fewer cores) and balanced I/O and memory management, we suspect that IBM’s POWER-based Power Systems with BLU Acceleration should be able to outperform Oracle’s Exadata by 1.5X to 2.5X depending on the complexity of the reports being executed.

Summary Observations

The bottom line in comparing these three architectures is that there are some commonalities in the ways that each vendor approaches Big Data analytics – but there are also some very distinct differences. These differences manifest themselves in database query processing speed; the number of concurrent users that can be supported; the effect that query complexity can have on system performance; system efficiency/optimization; and manageability.

The key take-aways from this report should be as follows:

• IBM’s DB2 with BLU Acceleration and SAP’s HANA use a columnar approach to set up data for processing (IBM can also use rows). Columnar processing can be much faster than row processing (Oracle uses a row-only processing approach);

• IBM’s DB2 with BLU Acceleration has a distinct advantage over SAP’s HANA and Oracle’s Exadata in that it can analyze compressed data in memory. This means that more data can be read more quickly;

• SAP’s HANA can be excellent at processing large volumes of data in-memory for combined OLAP/OLTP environments where real-time results are required. We suspect, however, that as query complexity or volume (concurrent users) increases, performance will slow down significantly as each query contends for resources;

• Oracle’s Exadata Database Machine has been well received by the Oracle community because it offers good out-of-the-box performance and scales well. We see this appliance, however, as a general purpose analytics processor – not tuned for specific analytics workloads. Other shortcomings are its row-based orientation as well as its inability to read compressed data in memory. We think that given IBM’s system design advantages (faster processors, more threads, SIMD, balanced I/O handling, solid memory management), IBM POWER-based Power Systems should be able to easily outperform Exadata systems when processing intermediate and complex workloads.

When choosing a Big Data processing environment, each approach has its merits. But, from our perspective, IBM’s DB2 BLU Acceleration has several design advantages that should lead to consistently higher performance when querying Big Data databases.

Clabby Analytics
http://www.clabbyanalytics.com
Telephone: 001 (207) 846-6662
© 2013 Clabby Analytics. All rights reserved. September, 2013

Clabby Analytics is an independent technology research and analysis organization. Unlike many other research firms, we advocate certain positions — and encourage our readers to find counter opinions — then balance both points-of-view in order to decide on a course of action. Other research and analysis conducted by Clabby Analytics can be found at: www.ClabbyAnalytics.com.