A Comparison of Leading Database Storage Engines in Support Of

Total Page:16

File Type:pdf, Size:1020Kb

A Comparison of Leading Database Storage Engines in Support Of East Tennessee State University Digital Commons @ East Tennessee State University Electronic Theses and Dissertations Student Works 5-2013 A Comparison of Leading Database Storage Engines in Support of Online Analytical Processing in an Open Source Environment Gabriel Tocci East Tennessee State University Follow this and additional works at: https://dc.etsu.edu/etd Part of the Databases and Information Systems Commons Recommended Citation Tocci, Gabriel, "A Comparison of Leading Database Storage Engines in Support of Online Analytical Processing in an Open Source Environment" (2013). Electronic Theses and Dissertations. Paper 1111. https://dc.etsu.edu/etd/1111 This Thesis - Open Access is brought to you for free and open access by the Student Works at Digital Commons @ East Tennessee State University. It has been accepted for inclusion in Electronic Theses and Dissertations by an authorized administrator of Digital Commons @ East Tennessee State University. For more information, please contact [email protected]. A Comparison of Leading Database Storage Engines in Support of Online Analytical Processing in an Open Source Environment _____________________ A thesis presented to the faculty of the Department of Computer and Information Science East Tennessee State University In partial fulfillment of the requirements for the degree Master of Science in Computer and Information Science _____________________ by Gabriel Tocci May 2013 _____________________ Dr. Ronald Zucker, Chair Dr. Don Bailes Dr. Tony Pittarese Keywords: Online Analytical Processing, Open Source, MyISAM, InnoDB ABSTRACT A Comparison of Leading Database Storage Engines in Support of Online Analytical Processing in an Open Source Environment by Gabriel Tocci Online Analytical Processing (OLAP) has become the de facto data analysis technology used in modern decision support systems. It has experienced tremendous growth, and is among the top priorities for enterprises. Open source systems have become an effective alternative to proprietary systems in terms of cost and function. The purpose of the study was to investigate the performance of two leading database storage engines in an open source OLAP environment. Despite recent upgrades in performance features for the InnoDB database engine, the MyISAM database engine is shown to outperform the InnoDB database engine under a standard benchmark. This result was demonstrated in tests that included concurrent user sessions as well as asynchronous user sessions using data sets ranging from 6GB to 12GB. Although MyISAM outperformed InnoDB in all test performed, InnoDB provides ACID compliant transaction technologies are beneficial in a hybrid OLAP/OLTP system. 2 CONTENTS Page ABSTRACT .................................................................................................................................... 2 LIST OF TABLES .......................................................................................................................... 7 LIST OF FIGURES ........................................................................................................................ 8 Chapter 1. INTRODUCTION .................................................................................................................... 10 2. BACKGROUND ...................................................................................................................... 14 2.1 The Open Source Software Development Model ............................................................... 14 2.2 Open Source Database Storage Engines ............................................................................. 17 2.2.1 Storage Engine History................................................................................................. 17 2.2.2 MySql ........................................................................................................................... 18 2.2.2.1 MyISAM ............................................................................................................... 19 2.2.2.2 InnoDB .................................................................................................................. 19 2.3 Online Analytical Processing .............................................................................................. 20 2.3.1 Analytical and Transactional Processing Incompatibilities ......................................... 20 2.3.2 Benefit to End Users ..................................................................................................... 22 2.3.3 Multidimensional Data Model ...................................................................................... 23 2.3.4 OLAP Operations ......................................................................................................... 24 2.3.5 OLAP Engine Architectures ......................................................................................... 25 3 2.3.5.1 ROLAP ................................................................................................................. 25 2.3.5.2 MOLAP................................................................................................................. 28 2.3.5.3 HOLAP ................................................................................................................. 28 2.3.6 Data Homogenization .............................................................................................. 28 2.4 Open Source Online Analytical Processing Engines .......................................................... 29 2.4.1 Mondrian ...................................................................................................................... 29 2.4.2. Palo .............................................................................................................................. 30 2.4.3. Comparison.................................................................................................................. 30 2.5 Industry Standards ............................................................................................................... 30 2.5.1 Java Database Connectivity .......................................................................................... 30 2.5.2 Multidimensional Expressions ..................................................................................... 31 2.4.3 Extensible Markup Language for Analysis .................................................................. 31 2.6 Online Analytical Processing Performance Benchmarks.................................................... 32 2.6.1 Analytical Processing Benchmark ................................................................................ 32 2.6.2 TPC-DS ........................................................................................................................ 33 2.7 Summary ............................................................................................................................. 34 3. EXPERIMENTAL METHODS................................................................................................ 35 3.1 Motivation ........................................................................................................................... 35 3.2 Open Source Compliance .................................................................................................... 36 3.3 Feature Comparison ............................................................................................................ 37 4 3.4 Online Analytical Processing Performance Benchmark ..................................................... 37 3.4.1 Benchmark Overview ................................................................................................... 37 3.4.2 Data Model ................................................................................................................... 37 3.4.3 Database Population ..................................................................................................... 38 3.4.4 Data Analysis................................................................................................................ 39 3.4.5 Query Implementation .................................................................................................. 40 3.4.6. Performance Statistics ................................................................................................. 42 3.5 Server Configuration ....................................................................................................... 43 4. RESULTS ................................................................................................................................. 45 4.1 Feature Summary ................................................................................................................ 45 4.2 TPC-DS Benchmark Results ............................................................................................... 46 4.2.1 TPC-DS Power Test Statistics ...................................................................................... 46 4.2.2 TPC-DS Throughput Test Statistics ............................................................................. 56 4.3 Scaled Data Size Summary ................................................................................................. 65 5. ANALYSIS ............................................................................................................................... 74 5.1 Qualitative Analysis ...........................................................................................................
Recommended publications
  • Mariadb Presentation
    THE VALUE OF OPEN SOURCE MICHAEL ”MONTY” WIDENIUS Entrepreneur, MariaDB Hacker, MariaDB CTO MariaDB Corporation AB 2019-09-25 Seoul 11 Reasons Open Source is Better than Closed Source ● Using open standards (no lock in into proprietary standards) ● Resource friendly; OSS software tend to work on old hardware ● Lower cost; Usually 1/10 of closed source software ● No cost for testing the full software ● Better documentation and more troubleshooting resources ● Better support, in many cases directly from the developers ● Better security, auditability (no trap doors and more eye balls) ● Better quality; Developed together with users ● Better customizability; You can also participate in development ● No vendor lock in; More than one vendor can give support ● When using open source, you take charge of your own future Note that using open source does not mean that you have to become a software producer! OPEN SOURCE, THE GOOD AND THE BAD ● Open source is a better way to develop software ● More developers ● More spread ● Better code (in many cases) ● Works good for projects that can freely used by a lot of companies in their production or products. ● It's very hard to create a profitable company developing an open source project. ● Not enough money to pay developers. ● Hard to get money and investors for most projects (except for infrastructure projects like libraries or daemon services). OPEN SOURCE IS NATURAL OR WHY OPEN SOURCE WORKS ● You use open source because it's less expensive (and re-usable) ● You solve your own problems and get free help and development efforts from others while doing it.
    [Show full text]
  • GIS Features in Mariadb and Mysql What Has Happened in Recent Years?
    GIS features in MariaDB and MySQL What has happened in recent years? Hartmut Holzgraefe Principal Support Engineer at MariaDB Inc. [email protected] August 20, 2016 Hartmut Holzgraefe (MariaDB Inc.) GIS features in MariaDB and MySQL August 20, 2016 1 / 35 Overview 1 GIS Introduction 2 MySQL GIS History 3 Other Open Source GIS Databases 4 Performance 5 The End ... Hartmut Holzgraefe (MariaDB Inc.) GIS features in MariaDB and MySQL August 20, 2016 2 / 35 GIS Introduction 1 GIS Introduction Examples 2 MySQL GIS History 3 Other Open Source GIS Databases 4 Performance 5 The End ... Hartmut Holzgraefe (MariaDB Inc.) GIS features in MariaDB and MySQL August 20, 2016 3 / 35 GIS Data Types Geospatial Information System (GIS) data types describe geometries in a (usually) two-dimensional space. There are several different geometric subtypes: Simple types: POINT, LINESTRING, POLYGON, GEOMETRY Collection types: MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, GEOMETRYCOLLECTION Hartmut Holzgraefe (MariaDB Inc.) GIS features in MariaDB and MySQL August 20, 2016 4 / 35 Spatial Properties Spatial properties of a geometry can be: Coordinates Length Area Is Closed Bounding Rectangle ... Hartmut Holzgraefe (MariaDB Inc.) GIS features in MariaDB and MySQL August 20, 2016 5 / 35 Spatial Relationships The most important spatial relationships between two geometries: Hartmut Holzgraefe (MariaDB Inc.) GIS features in MariaDB and MySQL August 20, 2016 6 / 35 Examples 1 GIS Introduction Examples 2 MySQL GIS History 3 Other Open Source GIS Databases 4 Performance 5 The
    [Show full text]
  • Mariadb Subscription Services Agreement
    MARIADB CORPORATION MariaDB Subscription Services MariaDB Subscription customers have access to technical support services including Problem Resolution Support, Engineering Support, Consultative Support, Remote Login Support, and Telephone Support for the MariaDB platform (MariaDB server, MariaDB TX for transactions, MariaDB AX for analytics, MariaDB MaxScale, and related products like storage engines) via the Customer Support Portal. Each designated technical contact will receive a Customer Support Portal login (based on the associated email address) that can be used to report new support issues, monitor ongoing issues, or review historical issues. Information regarding making changes to technical contacts can be found in the "Welcome Letter" provided after signup, and is also available in the “Contact Us” section of the Customer Support Portal. If you have issues initially logging into the Customer Support Portal, you will be prompted to email [email protected] for further assistance. If Remote DBA services are purchased, an on-boarding call is scheduled to gather the necessary information for the MariaDB Remote DBA team to remotely access supported products. Information about the architecture, operating systems, database server versions, backup schedules, etc will also be documented during this call. Once the required information has been collected, monitoring software will be installed and setup to alert MariaDB Corporation. Certain alerts such as server availability, replication health, and others will be configured to open issues automatically in the Customer Support Portal. All services are delivered in English. MariaDB Corporation will use reasonable efforts to provide technical support in languages other than English using MariaDB Corporation’s available personnel via voice calls and in-person meetings, but may not have such resources available at all or at the time of the support request.
    [Show full text]
  • Index Selection for In-Memory Databases
    International Journal of Computer Sciences &&& Engineering Open Access Research Paper Volume-3, Issue-7 E-ISSN: 2347-2693 Index Selection for in-Memory Databases Pratham L. Bajaj 1* and Archana Ghotkar 2 1*,2 Dept. of Computer Engineering, Pune University, India, www.ijcseonline.org Received: Jun/09/2015 Revised: Jun/28/2015 Accepted: July/18/2015 Published: July/30/ 2015 Abstract —Index recommendation is an active research area in query performance tuning and optimization. Designing efficient indexes is paramount to achieving good database and application performance. In association with database engine, index recommendation technique need to adopt for optimal results. Different searching methods used for Index Selection Problem (ISP) on various databases and resemble knapsack problem and traveling salesman. The query optimizer reliably chooses the most effective indexes in the vast majority of cases. Loss function calculated for every column and probability is assign to column. Experimental results presented to evidence of our contributions. Keywords— Query Performance, NPH Analysis, Index Selection Problem provides rules for index selections, data structures and I. INTRODUCTION materialized views. Physical ordering of tuples may vary from index table. In cluster indexing, actual physical Relational Database System is very complex and it is design get change. If particular column is hits frequently continuously evolving from last thirty years. To fetch data then clustering index drastically improves performance. from millions of dataset is time consuming job and new In in-memory databases, memory is divided among searching technologies need to develop. Query databases and indexes. Different data structures are used performance tuning and optimization was an area of for indexes like hash, tree, bitmap, etc.
    [Show full text]
  • Amazon Aurora Mysql Database Administrator's Handbook
    Amazon Aurora MySQL Database Administrator’s Handbook Connection Management March 2019 Notices Customers are responsible for making their own independent assessment of the information in this document. This document: (a) is for informational purposes only, (b) represents current AWS product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers. © 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved. Contents Introduction .......................................................................................................................... 1 DNS Endpoints .................................................................................................................... 2 Connection Handling in Aurora MySQL and MySQL ......................................................... 3 Common Misconceptions .................................................................................................... 5 Best Practices ...................................................................................................................... 6 Using Smart Drivers ........................................................................................................
    [Show full text]
  • Mysql Cluster – Evaluation and Tests, OCTOBER 2, 2012 1 Mysql Cluster – Evaluation and Tests
    MySQL Cluster – Evaluation and Tests, OCTOBER 2, 2012 1 MySQL Cluster – Evaluation and Tests Michael Raith (B.Sc.), Master-Student F Abstract Websites or web applications, whether they represent shopping systems, on demand services or a social networks, have something in common: data must be stored somewhere and somehow. This job can be achieved by various solutions with very different performance characteristics, e.g. based on simple data files, databases or high performance RAM storage solutions. For today’s popular web applications it is important to handle database operations in a minimum amount of time, because they are struggling with a vast increase in visitors and user generated data. Therefore, a major requirement for modern database application is to handle huge data (also called “big data”) in a short amount of time and to provide high availability for that data. A very popular database application in the open source community is MySQL, which was originally developed by a swedisch company called MySQL AB and is now maintenanced by Oracle. MySQL is shipped in a bundle with the Apache web server and therefore has a large distribution. This database is easily installed, maintained and administrated. By default MySQL is shipped with the MyISAM storage engine, which has good performance on read requests, but a poor one on massive parallel write requests. With appropriate tuning of various database settings, special architecture setups (replication, partitioning, etc.) or other storage engines, MySQL can be turned into a fast database application. For example Wikipedia uses MySQL for their backend data storage. In the lecture “Ultra Large Scale Systems” and “System Engineering” teached by Walter Kriha at Media University Stuttgart, the question “Can a MySQL database application handle more then 3000 database requests per second?” came up some time.
    [Show full text]
  • Innodb 1.1 for Mysql 5.5 User's Guide Innodb 1.1 for Mysql 5.5 User's Guide
    InnoDB 1.1 for MySQL 5.5 User's Guide InnoDB 1.1 for MySQL 5.5 User's Guide Abstract This is the User's Guide for the InnoDB storage engine 1.1 for MySQL 5.5. Beginning with MySQL version 5.1, it is possible to swap out one version of the InnoDB storage engine and use another (the “plugin”). This manual documents the latest InnoDB plugin, version 1.1, which works with MySQL 5.5 and features cutting-edge improvements in performance and scalability. This User's Guide documents the procedures and features that are specific to the InnoDB storage engine 1.1 for MySQL 5.5. It supplements the general InnoDB information in the MySQL Reference Manual. Because InnoDB 1.1 is integrated with MySQL 5.5, it is generally available (GA) and production-ready. WARNING: Because the InnoDB storage engine 1.0 and above introduces a new file format, restrictions apply to the use of a database created with the InnoDB storage engine 1.0 and above, with earlier versions of InnoDB, when using mysqldump or MySQL replication and if you use the older InnoDB Hot Backup product rather than the newer MySQL Enterprise Backup product. See Section 1.4, “Compatibility Considerations for Downgrade and Backup”. For legal information, see the Legal Notices. Document generated on: 2014-01-30 (revision: 37565) Table of Contents Preface and Legal Notices .................................................................................................................. v 1 Introduction to InnoDB 1.1 ............................................................................................................... 1 1.1 Features of the InnoDB Storage Engine ................................................................................ 1 1.2 Obtaining and Installing the InnoDB Storage Engine ............................................................... 3 1.3 Viewing the InnoDB Storage Engine Version Number ............................................................
    [Show full text]
  • Connecting to SAP HANA with Microsoft Excel 2007 Pivottables and ODBO
    Connecting to SAP HANA with Microsoft Excel 2007 PivotTables and ODBO Applies to: ODBO, MDX Query Language, Microsoft Excel, Microsoft Excel 2007, PivotTables for connecting to SAP HANA. Summary This paper provides an overview of SAP HANA MDX. It also includes step-by-step instructions for you to access and analyze SAP HANA data with Microsoft Excel PivotTables. Authors: Amyn Rajan, Darryl Eckstein, George Chow, King Long Tse and Roman Moehl Company: SAP; Simba Technologies Inc. Created on: 2 December 2011 Author Bio About Simba Technologies Simba Technologies Incorporated (http://www.simba.com) builds development tools that make it easier to connect disparate data analysis products to each other via standards such as, ODBC, JDBC, OLE DB, OLE DB for OLAP (ODBO), and XML for Analysis (XMLA). Independent software vendors that want to extend their proprietary architectures to include advanced analysis capabilities look to Simba for strategic data connectivity solutions. Customers use Simba to leverage and extend their proprietary data through high performance, robust, fully customized, standards-based data access solutions that bring out the strengths of their optimized data stores. Through standards-based tools, Simba solves complex connectivity challenges, enabling customers to focus on their core businesses. About SAP SAP helps companies of all sizes and industries run better. From back office to boardroom, warehouse to storefront, desktop to mobile device, SAP empowers people and organizations to work together more efficiently and use business insight more effectively to stay ahead of the competition. We do this by extending the availability of software across on-premise installations, on-demand deployments, and mobile devices.
    [Show full text]
  • Mysql Table Doesn T Exist in Engine
    Mysql Table Doesn T Exist In Engine expensive:resumedIs Chris explicable majestically she outnumber or unlaboriousand prematurely, glowingly when and she short-lists pokes retread her some her gunmetals. ruffian Hirohito denationalise snools pleadingly? finically. Varicose Stanwood Sloane is Some basic functions to tinker with lowercase table name of the uploaded file size of lines in table mysql doesn exits If school has faced such a meadow before, please help herself with develop you fixed it. Hi actually am facing the trial error MySQL error 1932 Table '' doesn't exist in engine All of him sudden when cold try with access my module. This website uses cookies to insist you get the aggregate experience after our website. Incorrect file in my database corruption can not. File or work on your help others failed database is randomly generated unique identifiers, so i may require it work and different database and reload the. MDY file over a external network. Cookies usually has not contain information that allow your track broke down. You want to aws on how can a data is no, and easy to. Can be used for anybody managing a missing. Please double its capital letter promote your MySQL table broke and. Discuss all view Extensions that their available for download. Xampp-mysql Table doesn't exist the engine 1932 I have faced same high and sorted using below would Go to MySQL config file my file at Cxamppmysql. Mysqlfetchassoc supplied argument is nice a valid MySQL result. Config table in the. Column count of table in my problem persists.
    [Show full text]
  • Mysql Backup and Recovery Abstract
    MySQL Backup and Recovery Abstract This is the MySQL Backup and Recovery extract from the MySQL 8.0 Reference Manual. For legal information, see the Legal Notices. For help with using MySQL, please visit the MySQL Forums, where you can discuss your issues with other MySQL users. Document generated on: 2021-10-01 (revision: 70946) Table of Contents Preface and Legal Notices .................................................................................................................. v 1 Backup and Recovery ..................................................................................................................... 1 1.1 Backup and Recovery Types ................................................................................................ 2 1.2 Database Backup Methods ................................................................................................... 5 1.3 Example Backup and Recovery Strategy ............................................................................... 7 1.3.1 Establishing a Backup Policy ..................................................................................... 8 1.3.2 Using Backups for Recovery .................................................................................... 10 1.3.3 Backup Strategy Summary ....................................................................................... 10 1.4 Using mysqldump for Backups ............................................................................................ 10 1.4.1 Dumping Data in SQL Format with mysqldump ........................................................
    [Show full text]
  • In Mysql/Mariadb?
    T h e O W A S P F o u n d a t i o n h t t p : / / w w w . o w a s p . o r g O W A S P E U T o u r B u c h a Do you r e s“GRANT ALL PRIVILEGES” t ... in MySQL/MariaDB? 2 0 1 DevOps Engineer 3 Gabriel PREDA [email protected] @eRadical Co pyr igh t © Th e O W AS P Fo un dat ion Per mi ssi on is gr ant ed to co py, dis tri bu te an d/ or mo dif y thi s do cu me nt un de r the ter ms of the O W AS P Lic en se. 2 DevOps = new BORG DevOps Engineer ??? ● Development – Web Applications (“Certified MySQL Associate”, “Zend Certified Engineer”) – Real Time Analytics ● Operations – MySQL DBA (15+ instances) – Sysadmin (<25 virtual & physical servers) 3 My MySQL● Over 15 MariaDB / TokuDBMariaDB(s) instances ● Statistics in MariaDB – < 1TB from Oct 2012 – < 12G raw data daily – < 12,000,000 events processed daily – < 90,000,000 rows added daily BigData? NO!!! ● I can copy all of that to my laptop ● “Working data set” - less than 1G & less than 7,500,000 rows 4 MySQL History ● 1983 – first version of MySQL created by Monty Wideniuns ● 1994 – MySQL is released OpenSource ● 2004 Oct – MySQL 4.1 GA ● 2005 Oct – InnoDB (Innobase) is bought by Oracle – Black Friday ● 2008 Ian – MySQL AB is bought by Sun (1bn $) ● 2008 Nov – MySQL 5.1 GA ● 2009 Apr – Sun is bought by Oracle (7,4 bn $) ● 2010 Dec – MySQL 5.5 GA ● 2012 Apr – MariaDB 5.5 GA ● 2013 Feb – MySQL 5.6 – first version made by Oracle ● 2013 Feb – MySQL will be replaced by MariaDB in Fedora & OpenSuSE * Max Mether – SkySQL “MySQL and MariaDB: Past, Present and Future” 5 Where are we NOW()? Drizzle MySQL TokuDB (Oracle) (Tokutek) Percona Server (Percona) MariaDB (Monty Program, Brighthouse MariaDB Foundation) (Infobright) Replication: ● Asynchronous InfiniDB ● Semi-synchronous (Calpont) ● Galera Synchronous (Codership) ● Tungsten Replication (Continuent) 6 Elementary..
    [Show full text]
  • ODBC Access to SAP BW
    Back to the Future with RelationalCube – ODBC Access to SAP BW Back to the Future with RelationalCube – ODBC Access to SAP BW Applies To: ODBC / JDBC / SQL Access to SAP BW. Summary SAP R/3 revolutionized ERP in the late 20th century and Business Information Warehouse (“BW”) looks to extend SAP’s leading position in these early days of the 21st century. As SAP is a leading technology company, BW is furnished with modern APIs (OLE DB for OLAP and XML for Analysis) and the usual “legacy” API (RFC BAPI). What is curiously absent from this list is the industry standard ODBC API (or its cousin JDBC). This paper presents a SAP compliant method to retrieve data from SAP BW via ODBC / JDBC / SQL. Authors: Amyn Rajan and George Chow Company: Simba Technologies Inc. Created on: 22 August 2007 Author Bio About Simba Technologies Simba Technologies Incorporated (http://www.simba.com) builds development tools that make it easier to connect disparate data analysis products to each other via standards like ODBC, JDBC, OLE DB, OLE DB for OLAP (ODBO), and XML for Analysis (XMLA). Independent software vendors that want to extend their proprietary architectures to include advanced analysis capabilities look to Simba for strategic data connectivity solutions. Customers use Simba to leverage and extend their proprietary data through high performance, robust, fully customized, standards-based data access solutions that bring out the strengths of their optimized data stores. Through standards-based tools, Simba solves complex connectivity challenges, enabling customers to focus on their core businesses. About the Authors Amyn Rajan is President and CEO of Simba Technologies.
    [Show full text]