Research Report: IBM DB2 BLU Acceleration vs. SAP HANA vs. Oracle Exadata

Executive Summary

The problem: how to analyze vast amounts of data (Big Data) most efficiently. The solution is threefold:

1. Find ways to organize and compress data so that large amounts of data take up less space – and find a way to read compressed data directly to speed query completion;
2. Use efficient algorithms to accelerate the speed of Big Data analysis; and,
3. Select a system environment that provides balanced resource utilization, such that CPU power, memory, input/output (I/O), networks and storage all work together in a balanced fashion to generate query results as expeditiously as possible.

Three companies – IBM, SAP, and Oracle – all build software environments designed to accelerate Big Data analysis. There are, however, very significant differences in how each vendor organizes and queries data – and in the related system designs:

- IBM's approach uses an innovative new technique known as "DB2 BLU Acceleration". Using a columnar approach, "BLU" quickly whittles down the size of a Big Data database to isolate relevant data, in effect speed-reading large databases (this approach enables BLU to achieve a 10-50X performance advantage over traditional row-based approaches). IBM's approach also features database compression, the ability to read compressed data in memory, and a balanced system design;
- SAP's HANA relies on placing large amounts of columnar data in main memory, where the whole database can be analyzed in real time. HANA also compresses data by up to 20X (but needs to decompress it to enable query processing). We like HANA, but we question whether its system resource utilization is well balanced; and,
- Oracle's own Website describes the Exadata Database Machine as "combining massive memory and low-cost disks to deliver the highest performance and petabyte scalability at the lowest cost".
To us, Exadata is a highly-tuned Oracle Real Application Cluster (RAC) packaged as an appliance with storage that uses the Oracle database along with in-database advanced analytics. This offering does not exploit columnar data; does not read compressed data; and its compression facilities lag IBM's DB2.

A closer look at each vendor's Big Data analytics offerings shows major differences in how data is cached and compressed; how workloads are managed; how memory is used – and in system balance/optimization. As we examined each vendor's offerings from these perspectives, we found that IBM's DB2 with BLU Acceleration has strong advantages over SAP HANA and Oracle Exadata – particularly when it comes to balanced system design and performance. In this Research Report, Clabby Analytics discusses in greater depth how each of these systems differs.

The Big Data Marketplace

Due to advances in parallel processing, continual reductions in storage costs, the simplification of data management software, and lower-cost, more powerful and flexible business analytics software, it has become more affordable than ever before to run analytics applications against large volumes of enterprise data. For years enterprises have been capturing valuable, usable data – but this data has been too expensive to mine (analyze). Now, with reduced systems, storage and software costs, enterprises are finding that they can achieve a clear return on investment by analyzing more and more of the structured and unstructured data that they have long been able to capture.

A CEO study entitled "Leading Through Connections" shows that enterprises now realize the strategic value of business analytics. Enterprises are cleansing their data, consolidating and parallelizing their databases, and building integrated infrastructures.
As further evidence of the strategic importance of Big Data analysis, consider a study published in the MIT Sloan Management Review which concludes that enterprises that use Big Data analytics are twice as likely to outperform their competitors. The same study found a 60% increase in the use of business analytics over the past few years. The use of analytics is growing – and it is growing because enterprises now see the strategic value of analyzing all sorts of data. By moving to Big Data analytics, enterprises are better able to respond to new opportunities as well as to create, or respond more quickly to, competitive threats. Enterprises that have embraced Big Data analytics are finding that they can better serve customers, better manage risk, and spot trends – and thus improve customer relationships, reduce risk, and exploit new business opportunities.

Organizing and Working with Big Data

The goal of business analytics is to help people make more informed decisions – thus leading to better business outcomes. In order to make more informed decisions, enterprises need to:

- Organize, integrate, and govern their structured as well as unstructured data;
- Address data growth (scale) through data and compute parallelism; and,
- Find ways to cost-effectively manage and store large, complex data sets.

To achieve data consistency and operational efficiency, multiple siloed data warehouses need to be consolidated and parallelized in order to create one version of the truth (a single, common database view). Further, Quality-of-Service (QoS) requirements for reliability, availability, and security – as well as scalability and performance – need to be addressed.

Once the data has been cleansed and federated, enterprises need to figure out how to work with that data. Entire databases can be placed into memory, or they can be dynamically cached (placed in the storage subsystem, then accessed as needed).
Data can be tiered such that the most important data is located on fast disks in close proximity to the processors, where it can be analyzed most expeditiously. Data can be compressed so that it takes up less space (saving on storage and memory costs). Data can be organized into columns in order to improve analytics performance over traditional row-based layouts. Some vendors can even read this compressed data directly, producing results more quickly than competitors.

Big Differences Start to Show Up When Analyzing How Data Is Organized and How the System Design Supports Big Data Analytics Processing

Some of the comparison points that we look for when comparing and contrasting IBM's, SAP's, and Oracle's approaches include:

- What is the data structure? Is it row data or columnar – or both?
- Is the data cached, or is it all held in memory?
- How is compression handled?
- What are the system design characteristics? Where and how is the data processed? What is being done to streamline CPU, memory and I/O interaction?
- What are the deployment characteristics?

As Figure 1 shows, there are big differences in how IBM, SAP, and Oracle express data (rows vs. columnar); how memory management/caching is handled; how compression is handled; and how data is processed by the system (resource utilization).

Figure 1 – How IBM, SAP, and Oracle Organize Big Data (Source: Clabby Analytics – September, 2013)

The way we see it, organizing data into columns has big performance advantages as compared with processing row-based data. We like in-memory databases like SAP's HANA, especially for real-time processing, but we have some resource utilization issues with this design (so clever caching is more appealing to us). We see the ability to analyze compressed data as a huge competitive differentiator for IBM because it speeds the time it takes to achieve results. From a systems design perspective, we like to see the balanced use of all resources – and we like to see processors exploit SIMD (single instruction, multiple data) instructions in order to process data in parallel more efficiently. As for deployment (and operations), it can be argued that IBM's approach is less complex (load-and-go simplicity) and more flexible (more control over how to configure/build a Big Data server) as compared with its SAP and Oracle competitors.

A Closer Look at IBM's DB2 BLU Acceleration Solution

First, it is important to note that IBM tunes DB2 with BLU Acceleration for its own hardware – just as Oracle does for its own hardware. SAP's HANA, on the other hand, has been designed to run on the commodity (x86) hardware offered by many vendors.

As we looked at IBM's BLU Acceleration data structure, we found that it can be used with row- or column-based data. In column mode, it has been reported to be 10-50X faster than traditional row-based relational databases. From a memory management perspective, IBM's DB2 with BLU Acceleration is designed to use in-memory columnar processing that maintains in-memory performance while dynamically moving unused data to and from storage as needed. As for compression, DB2 has long had a compression advantage over other major database competitors (a few years ago we wrote about how some Oracle customers were able to save 40% of their storage costs by taking advantage of DB2 compression). But, in addition to compression efficiency advantages, BLU Acceleration is able to read compressed data (no decompression necessary) while also employing data-skipping algorithms to speed-read compressed databases.
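The two ideas described here – evaluating queries on compressed data and skipping data that cannot match – can be sketched schematically. The sketch below is our own minimal illustration, not IBM's implementation: it dictionary-encodes a column (so an equality predicate can be checked against small integer codes, with no decompression of the stored values) and keeps per-page minimum/maximum synopses so that pages which cannot contain a match are skipped entirely. All names (`compress`, `count_equal`, the tiny page size) are illustrative assumptions.

```python
# Schematic sketch (not IBM's implementation) of two techniques described
# in the text: querying dictionary-compressed column data without
# decompressing it, and "data skipping" via per-page min/max synopses.

PAGE_SIZE = 4  # rows per page; unrealistically small, for illustration

def compress(column):
    """Dictionary-encode a column: each value becomes a small integer code."""
    dictionary = sorted(set(column))
    code = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code[v] for v in column]

def build_synopsis(codes):
    """Record per-page min/max of the encoded values, used to skip pages."""
    pages = [codes[i:i + PAGE_SIZE] for i in range(0, len(codes), PAGE_SIZE)]
    return [(min(p), max(p)) for p in pages]

def count_equal(dictionary, codes, synopsis, value):
    """Count rows equal to `value`, scanning codes and skipping pages."""
    if value not in dictionary:
        return 0
    target = dictionary.index(value)  # translate the predicate once
    total = 0
    for page_no, (lo, hi) in enumerate(synopsis):
        if target < lo or target > hi:
            continue  # data skipping: this page cannot contain the value
        page = codes[page_no * PAGE_SIZE:(page_no + 1) * PAGE_SIZE]
        total += sum(1 for c in page if c == target)  # compare codes, not values
    return total

region = ["EU", "EU", "EU", "EU", "US", "APAC", "US", "US"]
dictionary, codes = compress(region)
synopsis = build_synopsis(codes)
print(count_equal(dictionary, codes, synopsis, "US"))  # → 3
```

Note that the scan never rebuilds the original strings: the predicate is translated into code space once, after which every comparison is an integer comparison, and whole pages are eliminated by consulting only their min/max synopsis.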
BLU Acceleration also takes advantage of processor-level parallel vector processing to exploit multi-core and SIMD (single instruction, multiple data) parallelism; SIMD instructions improve parallel performance, helping to produce query results faster than systems that do not exploit SIMD. DB2 with BLU Acceleration is NOT an in-memory database processing environment (like SAP's HANA) – instead, it uses dynamic memory management and caching techniques to off-load some data to fast storage in close proximity to the processors.
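To convey why SIMD-style parallelism speeds scans, here is a pure-Python "SIMD within a register" (SWAR) sketch: eight one-byte codes are packed into a single 64-bit word, and all eight are tested for equality against a target with a handful of word-wide operations instead of eight separate comparisons. This is our own illustration of the general technique, not DB2's code; real engines use hardware vector instructions, and the `pack`/`match_lanes` names are assumptions for this sketch.

```python
# SWAR (SIMD within a register) sketch of a vectorized equality scan.
# Illustrative only: real columnar engines use CPU vector instructions.

LANES = 8                      # eight one-byte codes per 64-bit "register"
ONES  = 0x0101010101010101     # 0x01 replicated into every lane
LOWS  = 0x7F7F7F7F7F7F7F7F     # 0x7F replicated into every lane
HIGHS = 0x8080808080808080     # high bit of every lane
MASK  = 0xFFFFFFFFFFFFFFFF     # confine Python's big ints to 64 bits

def pack(codes):
    """Pack eight byte-sized codes into one 64-bit integer (lane 0 lowest)."""
    assert len(codes) == LANES and all(0 <= c < 256 for c in codes)
    word = 0
    for c in reversed(codes):
        word = (word << 8) | c
    return word

def match_lanes(word, target):
    """Return the lane indices equal to `target`, using word-wide ops only."""
    x = word ^ (target * ONES)  # matching lanes become 0x00
    # Exact SWAR zero-byte detector: high bit set in each all-zero lane.
    zero = ~(((x & LOWS) + LOWS) | x | LOWS) & HIGHS & MASK
    return [i for i in range(LANES) if (zero >> (8 * i + 7)) & 1]

word = pack([3, 7, 3, 1, 0, 3, 9, 3])
print(match_lanes(word, 3))  # → [0, 2, 5, 7]
```

The point of the sketch is the shape of the work: one XOR, one add, and a few masks resolve eight comparisons at once, which is the same leverage (per register, per instruction) that hardware SIMD gives a columnar scan over densely packed encoded values.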