SQALPEL: A Database Performance Platform (DEMO PAPER)


M.L. Kersten (CWI, Amsterdam)
P. Koutsourakis (MonetDB Solutions, Amsterdam)
S. Manegold (CWI, Amsterdam)
Y. Zhang (MonetDB Solutions, Amsterdam)

ABSTRACT

Despite their popularity, database benchmarks only highlight a small fraction of the capabilities of any given DBMS. They often do not highlight problematic components encountered in real-life database applications or provide hints for further research and engineering. To alleviate this problem we coined discriminative performance benchmarking as the way to go. It aids in exploring a larger query search space to find performance outliers and their underlying cause. The approach is based on deriving a domain specific language from a sample complex query to identify and execute a query workload.

The demo illustrates sqalpel, a complete platform to collect, manage and selectively disseminate performance facts, that enables repeatability studies and economy of scale by sharing performance experiences.

This article is published under a Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits distribution and reproduction in any medium as well as allowing derivative works, provided that you attribute the original work to the author(s) and CIDR 2019. 9th Biennial Conference on Innovative Data Systems Research (CIDR '19), January 13-16, 2019, Asilomar, California, USA.

1. INTRODUCTION

Standard benchmarks have long been a focal point in business to aid customers to make an informed decision about a product's expected performance. The Transaction Processing Council (TPC) currently supports several benchmarks ranging from OLTP to IoT. An overview of their active set is shown in Table 1. Surprisingly, the number of publicly accessible results remains extremely low. Just a few vendors go through the rigorous process to obtain results for a publication.

  benchmark        reports  systems reported
  TPC-C            368      Oracle, IBM DB2, MS SQLserver, Sybase, SymfoWARE
  TPC-DI           0
  TPC-DS           1        Intel
  TPC-E            77       MS SQLserver
  TPC-H ≤ SF-300   252      MS SQLserver, Oracle, EXASOL, Actian Vector 5.0,
                            Sybase, IBM DB2, Informix, Teradata, Paraccel
  TPC-H SF-1000    4        MS SQLserver
  TPC-H SF-3000    6        MS SQLserver, Actian Vector 5.0
  TPC-H SF-10000   9        MS SQLserver
  TPC-H SF-30000   1        MS SQLserver
  TPC-VMS          0
  TPCx-BB          4        Cloudera
  TPCx-HCI         0
  TPCx-HS          0
  TPCx-IoT         1        Hbase

Table 1: TPC benchmarks (http://www.tpc.org/)

In a product development setting, benchmarks provide a yardstick for regression testing. Each new DBMS version or deployment on a new hardware platform ideally shows a better performance. Open-source systems are in that respect not different from commercial products. Performance stability and quality assurance over releases are as critical.

State of affairs. In database research TPC-C and TPC-H are also commonly used to illustrate technical innovation, partly because their description is easy to follow and the data generators easy to deploy. However, after almost four decades of database performance assessments the state of affairs can be summarized as follows: 1) standard benchmarks are good in underpinning technical innovations, 2) standard benchmarks are hardly representative for real-life workloads [11], and 3) most benchmarking results are kept private unless it is a success story. sqalpel addresses 2) using a new way of looking at performance benchmarking and 3) by providing a public repository to collect, administer, and share performance reports.

Discriminative performance benchmarking. In a nutshell, we believe that database performance evaluation should step away from a predefined and frozen query set. For, consider two systems A and B, which may be different altogether or merely two versions of the same system. System B may be considered an overall better system, beating system A on all benchmarked TPC-H queries. This does not mean that no queries can be handled more efficiently by A. These queries might simply not be part of the benchmark suite. Or the improvement is just obtained in the restricted cases covered by the benchmark.

Therefore, the key questions to consider are "what queries perform relatively better on A?" and "what queries run relatively better on B?" Such queries give clues on the side-effects of new features or identify performance cliffs. We coined the term discriminative benchmark queries [6] and the goal is to find them for any pair of systems quickly.

Performance repositories. The second hurdle of database benchmarking is the lack of easy reporting and sharing of knowledge on experiments conducted, something akin to a software repository for sharing code. A hurdle to include product results is the "DeWitt clause" in most end-user license agreements, which seemingly prohibits publishing benchmark results for study and research, commentary, and professional advice, and which is, in our laymen's opinion, certainly not intended by the Copyright Laws for sharing consumer research information.

Wouldn't we save an awful lot of energy if experimental design and performance data were shared more widely to steer technical innovations? Wouldn't we reduce the cost of running proof-of-concept projects by sharing experiences within larger teams or the database community at large?

Contributions. In the remainder of this paper we introduce sqalpel, a SaaS solution to develop and archive performance projects. It steps away from fixed benchmark sets by taking queries from the application and turning them into a grammar as a description of a much larger query space. The system explores this space using a guided random walk to find the discriminative queries. It leads to the following contributions:

• We extend the state of the art in grammar-based database performance evaluation.

• We provide a full-fledged database performance repository to share information easily and publicly.

• We bootstrap the platform with a sizable number of OLAP cases and products.

sqalpel takes comparative performance assessment to a new level. It is inspired by a tradition in grammar-based testing of software [7]. It assumes that the systems being compared understand more-or-less the same SQL dialect and that a large collection of queries can conveniently be described by a grammar. Minor differences in syntax are easily accommodated using dialect sections for the lexical tokens in the grammar specification.

A complete software stack is needed to manage performance projects and to deal with sharing potentially confidential information. We are standing on the shoulders of GitHub in the way they provide such a service. This includes the overall organization of projects and their legal structure to make a performance platform valuable to the research community.

Outline. In the remainder of this paper we focus on the design of sqalpel. Section 2 outlines related work. In Section 3 we give an overview of the performance benchmark generation. Section 4 addresses access control and legalities. Section 5 summarizes the demo being presented.

2. BACKGROUND

Grammar-based testing has a long history in software engineering, in particular in compiler validation, but it has also remained a niche in database system testing. ... These approaches can be considered static and labor intensive, as they require the test engineer to provide weights and hints up front.

Another track pursued is based on genetic algorithms. A good example is the open-source project SQLsmith, which provides a tool to generate random SQL queries by directly reading the database schema from the target system. It has been reported to find a series of serious errors, but often using very long runs. Unlike randomized testing and genetic processing, sqalpel guides the system through the search space by morphing the queries in a stepwise fashion.

Unlike work on compiler technology [8], grammar-based experimentation in the database arena is hindered by the relatively high cost of running a single experiment. Some preliminary work has focused on generating test data with enhanced context-free grammars [5] or based on user-defined constraints on the intermediate results [1]. A seminal work is [10], where massive stochastic testing of several SQL database systems was undertaken to improve their correctness.

A mature recent framework to consider performance analysis of OLTP workloads is described in [3]. It integrates a handful of performance benchmarks and provides visual analytic tools to assess the impact of concurrent workloads. It relies on the JDBC interface to study mostly multi-user interaction with the target systems. It is a pity that the over 150 project forks of the OLTPbenchmark project reported on GitHub have not yet resulted in a similar number of publicly accessible reports to increase community insight.

The urgent need for a platform to share performance data about products can be illustrated further with the recently started EU project DataBench, which covers a broad area of performance benchmarking, and the finished EU project Hobbit, which focused on benchmarking RDF stores.

3. SQALPEL PERFORMANCE SPACE

In this section we briefly describe our approach to find discriminative queries based on user-supplied sample queries from their envisioned application setting. We focus on the specification of experiments and running them against target systems. Further details can be found in [6].

3.1 Query space grammar

The key to any sqalpel performance project is a domain specific language G to specify a query (sub)space. All sentences in the derived language, i.e., L(G), are candidate experiments to be run against the target system(s). Figure 1 illustrates a query space grammar with seven rules. Each grammar rule is identified by a name and contains a number of alternatives, i.e., free-format sentences with embedded references to other rules using an EBNF-like encoding. The sqalpel syntax is designed to be a concise, readable description of the query space.
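To make the rule-and-alternatives idea concrete, here is a minimal sketch in Python that enumerates every sentence of a toy query-space grammar. The encoding (a dict of rules, with `$name` marking a reference to another rule) and the SQL fragments are our own illustrative assumptions; sqalpel's actual specification syntax differs.

```python
from itertools import product

# A toy query-space grammar: each rule maps to a list of alternatives, and an
# alternative is a sequence of tokens, where a token starting with '$'
# references another rule. Rule names and SQL fragments are made up here.
GRAMMAR = {
    "query":  [["SELECT", "$proj", "FROM lineitem", "$filter"]],
    "proj":   [["l_orderkey"], ["l_orderkey, l_quantity"]],
    "filter": [[""], ["WHERE", "$pred"]],
    "pred":   [["l_quantity > 10"], ["l_discount < 0.05"]],
}

def sentences(rule="query", grammar=GRAMMAR):
    """Enumerate all sentences of L(G) for a finite, acyclic grammar."""
    result = []
    for alternative in grammar[rule]:
        # Expand each token: rule references recurse, literals stay as-is.
        expansions = [sentences(tok[1:], grammar) if tok.startswith("$") else [tok]
                      for tok in alternative]
        for combo in product(*expansions):
            result.append(" ".join(t for t in combo if t).strip())
    return result

candidates = sentences()
print(len(candidates))  # 2 projections x (empty filter + 2 predicates) -> 6
```

Each enumerated sentence is one candidate experiment to run against the target systems; for realistic grammars the space is far too large to enumerate exhaustively, which is why sqalpel samples it with a guided random walk instead.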
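The guided random walk that morphs queries stepwise can be sketched as a hill climb over grammar choices: change one rule choice at a time and keep the variant whose performance gap between the two systems grows. A real run would execute each candidate on both target systems; the stub `gap` cost model below, the rule names, and the step budget are all made-up stand-ins.

```python
import random

def morph(choices, n_alts, rng):
    """Change a single rule choice, yielding a neighboring query."""
    neighbor = dict(choices)
    rule = rng.choice(sorted(n_alts))
    neighbor[rule] = rng.randrange(n_alts[rule])
    return neighbor

def guided_walk(n_alts, gap, steps=200, seed=0):
    """Greedy walk: keep a morphed candidate only if it is more discriminative."""
    rng = random.Random(seed)
    best = {rule: 0 for rule in n_alts}  # seed query: first alternative everywhere
    best_gap = gap(best)
    for _ in range(steps):
        candidate = morph(best, n_alts, rng)
        g = gap(candidate)               # stand-in for timing the query on A and B
        if g > best_gap:
            best, best_gap = candidate, g
    return best, best_gap

# Toy space: three rules with a few alternatives each; the gap peaks at choices
# proj=2, pred=1, group=0, mimicking a hidden performance cliff in one system.
n_alts = {"proj": 3, "pred": 2, "group": 3}
gap = lambda c: 10 - abs(c["proj"] - 2) - abs(c["pred"] - 1) - c["group"]
best, g = guided_walk(n_alts, gap)
print(best, g)
```

The greedy acceptance rule means the walk never regresses below the seed query's gap; the paper's actual guidance strategy is only summarized here, with details in reference [6].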
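Once candidate queries have been timed on a pair of systems, flagging the discriminative ones, those that run relatively better on A versus on B, is straightforward. The sketch below uses invented runtimes and an arbitrary 1.5x noise threshold; it is not sqalpel's actual selection criterion.

```python
# Measured runtimes in seconds per candidate query on two systems (made up).
times_a = {"q1": 1.20, "q2": 0.30, "q3": 2.50, "q4": 0.90}
times_b = {"q1": 0.60, "q2": 0.45, "q3": 2.40, "q4": 0.25}

def discriminative(a, b, threshold=1.5):
    """Split queries by which system is faster by at least `threshold` times."""
    favor_a = [q for q in a if b[q] / a[q] >= threshold]  # A clearly faster
    favor_b = [q for q in a if a[q] / b[q] >= threshold]  # B clearly faster
    return favor_a, favor_b

fa, fb = discriminative(times_a, times_b)
print(fa, fb)  # ['q2'] ['q1', 'q4']  -- q3 is within noise on both systems
```

Queries like q2 here are exactly the clues the paper is after: even when B wins on most of the workload, the outliers that favor A point at side-effects of new features or performance cliffs.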