SQALPEL: A database performance platform (demo paper)

M.L. Kersten (CWI, Amsterdam)
P. Koutsourakis (MonetDB Solutions, Amsterdam)
S. Manegold (CWI, Amsterdam)
Y. Zhang (MonetDB Solutions, Amsterdam)

ABSTRACT
Despite their popularity, database benchmarks highlight only a small fraction of the capabilities of any given DBMS. They often do not expose problematic components encountered in real-life database applications or provide hints for further research and engineering.
To alleviate this problem we coined discriminative performance benchmarking as the way forward. It aids in exploring a larger query search space to find performance outliers and their underlying cause. The approach is based on deriving a domain specific language from a sample complex query to identify and execute a query workload.
The demo illustrates sqalpel, a complete platform to collect, manage, and selectively disseminate performance facts. It enables repeatability studies and economy of scale through sharing of performance experiences.

benchmark         reports   systems reported
TPC-C             368       Oracle, IBM DB2, MS SQLserver, Sybase, SymfoWARE
TPC-DI            0
TPC-DS            1         Intel
TPC-E             77        MS SQLserver
TPC-H ≤ SF-300    252       MS SQLserver, Oracle, EXASOL, Actian Vector 5.0, Sybase, IBM DB2, Informix, Teradata, Paraccel
TPC-H SF-1000     4         MS SQLserver
TPC-H SF-3000     6         MS SQLserver, Actian Vector 5.0
TPC-H SF-10000    9         MS SQLserver
TPC-H SF-30000    1         MS SQLserver
TPC-VMS           0
TPCx-BB           4         Cloudera
TPCx-HCI          0
TPCx-HS           0
TPCx-IoT          1         HBase
Table 1: TPC benchmarks (http://www.tpc.org/)

1. INTRODUCTION
Standard benchmarks have long been a focal point in business to aid customers in making an informed decision about a product's expected performance. The Transaction Processing Performance Council (TPC) currently supports several benchmarks ranging from OLTP to IoT. An overview of their active set is shown in Table 1. Surprisingly, the number of publicly accessible results remains extremely low. Just a few vendors go through the rigorous process to obtain results for publication.
In a product development setting, benchmarks provide a yardstick for regression testing. Each new DBMS version or deployment on a new hardware platform ideally shows better performance. Open-source systems are in that respect no different from commercial products. Performance stability and quality assurance over releases are just as critical.
State of affairs. In database research, TPC-C and TPC-H are also commonly used to illustrate technical innovation, partly because their descriptions are easy to follow and the data generators easy to deploy. However, after almost four decades of database performance assessments, the state of affairs can be summarized as follows: 1) standard benchmarks are good at underpinning technical innovations, 2) standard benchmarks are hardly representative of real-life workloads [11], and 3) most benchmarking results are kept private unless they represent a success story.
sqalpel addresses 2) with a new way of looking at performance benchmarking and 3) by providing a public repository to collect, administer, and share performance reports.
Discriminative performance benchmarking. In a nutshell, we believe that database performance evaluation should step away from a predefined and frozen query set. Consider two systems A and B, which may be different altogether or merely two versions of the same system. System B may be considered an overall better system, beating system A on all benchmarked TPC-H queries. This does not mean that there are no queries that A handles more efficiently; they might simply not be part of the benchmark suite, or the improvement may be obtained only in the restricted cases covered by the benchmark. Therefore, the key questions to consider are "what queries perform relatively better on A?" and "what queries run relatively better on B?" Such queries give clues on the side-effects of new features or identify performance cliffs. We coined the term discriminative benchmark queries [6], and the goal is to find them for any pair of systems quickly.
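To make the notion concrete, the sketch below ranks queries by their relative speed on two systems and keeps those that differ by more than a chosen factor. The timing dictionaries, the threshold, and the query names are purely illustrative assumptions; sqalpel obtains such numbers by actually running the generated workload against both targets.

    # Minimal sketch: identify discriminative queries from per-query timings.
    # All names and numbers below are hypothetical examples, not sqalpel output.

    def discriminative(times_a, times_b, threshold=2.0):
        """Return queries that run at least `threshold` times faster on one system."""
        better_on_a, better_on_b = [], []
        for q in times_a.keys() & times_b.keys():
            ratio = times_b[q] / times_a[q]          # > 1 means A is faster
            if ratio >= threshold:
                better_on_a.append((q, ratio))
            elif ratio <= 1.0 / threshold:
                better_on_b.append((q, 1.0 / ratio))
        # Sort by how strongly each query discriminates between the two systems.
        return (sorted(better_on_a, key=lambda x: -x[1]),
                sorted(better_on_b, key=lambda x: -x[1]))

    # Hypothetical timings in seconds for three generated queries.
    times_a = {"q1": 0.8, "q2": 3.1, "q3": 0.20}
    times_b = {"q1": 2.4, "q2": 3.0, "q3": 0.05}
    print(discriminative(times_a, times_b))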
Performance repositories. The second hurdle of database benchmarking is the lack of easy reporting and sharing of knowledge on the experiments conducted, something akin to a software repository for sharing code. A further hurdle to including product results is the "DeWitt" clause in most end-user license agreements, which seemingly prohibits publishing benchmark results for study and research, commentary, and professional advice, and which is, in our layman's opinion, certainly not what the Copyright Laws intend for sharing consumer research information.
Wouldn't we save an awful lot of energy if experimental designs and performance data were shared more widely to steer technical innovations? Wouldn't we reduce the cost of running proof-of-concept projects by sharing experiences within larger teams or the database community at large?
Contributions. In the remainder of this paper we introduce sqalpel, a SaaS solution to develop and archive performance projects. It steps away from fixed benchmark sets by taking queries from the application and turning them into a grammar that describes a much larger query space. The system explores this space using a guided random walk to find the discriminative queries. This leads to the following contributions:
• We extend the state of the art in grammar-based database performance evaluation.
• We provide a full-fledged database performance repository to share information easily and publicly.
• We bootstrap the platform with a sizable number of OLAP cases and products.
sqalpel takes comparative performance assessment to a new level. It is inspired by a tradition in grammar-based testing of software [7]. It assumes that the systems being compared understand more-or-less the same SQL dialect and that a large collection of queries can conveniently be described by a grammar. Minor differences in syntax are easily accommodated using dialect sections for the lexical tokens in the grammar specification.
A complete software stack is needed to manage performance projects and to deal with sharing potentially confidential information. We stand on the shoulders of GitHub in the way they provide such a service. This includes the overall organization of projects and their legal structure, to make a performance platform valuable to the research community.
Outline. In the remainder of this paper we focus on the design of sqalpel. Section 2 outlines related work. In Section 3 we give an overview of the performance benchmark generation. Section 4 addresses access control and legalities. Section 5 summarizes the demo being presented.

2. BACKGROUND
Grammar-based testing has a long history in software engineering, in particular in compiler validation, but it has remained a niche in database system testing. [...] These approaches can be considered static and labor intensive, as they require the test engineer to provide weights and hints up front.
Another track pursued is based on genetic algorithms. A good example is the open-source project SQLsmith, which provides a tool to generate random SQL queries by directly reading the database schema from the target system. It has been reported to find a series of serious errors, but often only after very long runs. Unlike randomized testing and genetic processing, sqalpel guides the system through the search space by morphing the queries in a stepwise fashion.
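As a rough illustration of such stepwise morphing (not the actual sqalpel algorithm, which is described in [6]), the sketch below performs a guided walk: starting from a seed query, it repeatedly applies one small mutation and keeps the variant that discriminates best according to a user-supplied score. The morph operators and the scoring function are hypothetical placeholders.

    import random

    # Sketch of a guided walk over a query space. The morph operators and the
    # score function are hypothetical stand-ins, not sqalpel's own strategy.
    def walk(seed_query, morphs, score, steps=20):
        """Greedily follow single-step query mutations that improve the score."""
        current = seed_query
        best = (seed_query, score(seed_query))
        for _ in range(steps):
            candidate = random.choice(morphs)(current)   # one stepwise change
            s = score(candidate)                         # e.g. timing gap measured on both systems
            if s > best[1]:
                best = (candidate, s)
                current = candidate
        return best

    # Toy usage with string-level morphs and a trivial score;
    # sqalpel morphs at the grammar level and scores by measured run times.
    toy_morphs = [lambda q: q + " LIMIT 10",
                  lambda q: q.replace("avg", "sum")]
    print(walk("SELECT avg(l_quantity) FROM lineitem", toy_morphs, score=len))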
Unlike work on compiler technology [8], grammar-based experimentation in the database arena is hindered by the relatively high cost of running a single experiment. Some preliminary work has focused on generating test data with enhanced context-free grammars [5] or based on user-defined constraints on intermediate results [1]. A seminal work is [10], where massive stochastic testing of several SQL database systems was undertaken to improve their correctness.
A mature, recent framework for the performance analysis of OLTP workloads is described in [3]. It integrates a handful of performance benchmarks and provides visual analytic tools to assess the impact of concurrent workloads. It relies on the JDBC interface to study mostly multi-user interaction with the target systems. It is a pity that the over 150 forks of the OLTPbenchmark project reported on GitHub have not yet resulted in a similar number of publicly accessible reports to increase community insight.
The urgent need for a platform to share performance data about products is further illustrated by the recently started EU project DataBench, which covers a broad area of performance benchmarking, and the finished EU project Hobbit, which focused on benchmarking RDF stores.

3. SQALPEL PERFORMANCE SPACE
In this section we briefly describe our approach to finding discriminative queries based on user-supplied sample queries from their envisioned application setting. We focus on the specification of experiments and on running them against target systems. Further details can be found in [6].

3.1 Query space grammar
The key to any sqalpel performance project is a domain specific language G to specify a query (sub) space. All sentences in the derived language, i.e., L(G), are candidate experiments to be run against the target system(s). Figure 1 illustrates a query space grammar with seven rules. Each grammar rule is identified by a name and contains a number of alternatives, i.e., free-format sentences with embedded references to other rules using an EBNF-like encoding. The sqalpel syntax is designed to be a concise, readable description of the query space.
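To convey the flavor of such a query space grammar (the concrete sqalpel notation is the one shown in Figure 1; the rules below are invented for illustration), a grammar can be viewed as a set of named rules, each listing alternatives that may reference other rules; every complete derivation is one candidate experiment in L(G).

    import random
    import re

    # Hypothetical query-space grammar: each rule name maps to its alternatives,
    # and {name} marks a reference to another rule. This mimics the EBNF-like
    # structure described above; it is not the actual sqalpel grammar syntax.
    GRAMMAR = {
        "query":  ["SELECT {proj} FROM lineitem WHERE {pred}"],
        "proj":   ["l_orderkey", "l_orderkey, {agg}"],
        "agg":    ["sum(l_extendedprice)", "count(*)"],
        "pred":   ["{simple}", "{simple} AND {simple}"],
        "simple": ["l_quantity < 24", "l_discount BETWEEN 0.05 AND 0.07"],
    }

    def derive(symbol="query"):
        """Randomly expand one sentence of L(G), i.e., one candidate experiment."""
        alternative = random.choice(GRAMMAR[symbol])
        # Replace every {rule} reference by a recursively derived sub-sentence.
        return re.sub(r"\{(\w+)\}", lambda m: derive(m.group(1)), alternative)

    print(derive())   # e.g. SELECT l_orderkey, count(*) FROM lineitem WHERE l_quantity < 24

Even this toy grammar spans 3 x 6 = 18 distinct queries; a grammar derived from a single complex application query quickly describes thousands of candidate experiments, which is why sqalpel explores the space with a guided walk rather than exhaustive enumeration.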