A Comparison of Leading Database Storage Engines in Support Of
East Tennessee State University
Digital Commons @ East Tennessee State University
Electronic Theses and Dissertations
Student Works

5-2013

A Comparison of Leading Database Storage Engines in Support of Online Analytical Processing in an Open Source Environment

Gabriel Tocci
East Tennessee State University

Part of the Databases and Information Systems Commons

Recommended Citation
Tocci, Gabriel, "A Comparison of Leading Database Storage Engines in Support of Online Analytical Processing in an Open Source Environment" (2013). Electronic Theses and Dissertations. Paper 1111. https://dc.etsu.edu/etd/1111

This Thesis - Open Access is brought to you for free and open access by the Student Works at Digital Commons @ East Tennessee State University. It has been accepted for inclusion in Electronic Theses and Dissertations by an authorized administrator of Digital Commons @ East Tennessee State University.

A Comparison of Leading Database Storage Engines in Support of Online Analytical Processing in an Open Source Environment

_____________________

A thesis
presented to
the faculty of the Department of Computer and Information Science
East Tennessee State University

In partial fulfillment
of the requirements for the degree
Master of Science in Computer and Information Science

_____________________

by
Gabriel Tocci
May 2013

_____________________

Dr. Ronald Zucker, Chair
Dr. Don Bailes
Dr. Tony Pittarese

Keywords: Online Analytical Processing, Open Source, MyISAM, InnoDB

ABSTRACT

A Comparison of Leading Database Storage Engines in Support of Online Analytical Processing in an Open Source Environment

by
Gabriel Tocci

Online Analytical Processing (OLAP) has become the de facto data analysis technology used in modern decision support systems. It has experienced tremendous growth and is among the top priorities for enterprises.
Open source systems have become an effective alternative to proprietary systems in terms of cost and function. The purpose of the study was to investigate the performance of two leading database storage engines in an open source OLAP environment. Despite recent upgrades in performance features for the InnoDB database engine, the MyISAM database engine is shown to outperform the InnoDB database engine under a standard benchmark. This result was demonstrated in tests that included concurrent user sessions as well as asynchronous user sessions, using data sets ranging from 6 GB to 12 GB. Although MyISAM outperformed InnoDB in all tests performed, InnoDB provides ACID-compliant transaction technologies that are beneficial in a hybrid OLAP/OLTP system.

CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES

Chapter
1. INTRODUCTION
2. BACKGROUND
    2.1 The Open Source Software Development Model
    2.2 Open Source Database Storage Engines
        2.2.1 Storage Engine History
        2.2.2 MySQL
            2.2.2.1 MyISAM
            2.2.2.2 InnoDB
    2.3 Online Analytical Processing
        2.3.1 Analytical and Transactional Processing Incompatibilities
        2.3.2 Benefit to End Users
        2.3.3 Multidimensional Data Model
        2.3.4 OLAP Operations
        2.3.5 OLAP Engine Architectures
            2.3.5.1 ROLAP
            2.3.5.2 MOLAP
            2.3.5.3 HOLAP
        2.3.6 Data Homogenization
    2.4 Open Source Online Analytical Processing Engines
        2.4.1 Mondrian
        2.4.2 Palo
        2.4.3 Comparison
    2.5 Industry Standards
        2.5.1 Java Database Connectivity
        2.5.2 Multidimensional Expressions
        2.5.3 Extensible Markup Language for Analysis
    2.6 Online Analytical Processing Performance Benchmarks
        2.6.1 Analytical Processing Benchmark
        2.6.2 TPC-DS
    2.7 Summary
3. EXPERIMENTAL METHODS
    3.1 Motivation
    3.2 Open Source Compliance
    3.3 Feature Comparison
    3.4 Online Analytical Processing Performance Benchmark
        3.4.1 Benchmark Overview
        3.4.2 Data Model
        3.4.3 Database Population
        3.4.4 Data Analysis
        3.4.5 Query Implementation
        3.4.6 Performance Statistics
    3.5 Server Configuration
4. RESULTS
    4.1 Feature Summary
    4.2 TPC-DS Benchmark Results
        4.2.1 TPC-DS Power Test Statistics
        4.2.2 TPC-DS Throughput Test Statistics
    4.3 Scaled Data Size Summary
5. ANALYSIS
    5.1 Qualitative Analysis