
Testing SQL-compliance of current DBMSs

Elias Spanos

Master of Science
School of Informatics
University of Edinburgh
2017

Abstract

Database Management Systems (DBMSs) are widely used in fields ranging from online banking to web systems and other online services, and they are crucial to the day-to-day management of data. The correct and efficient operation of such systems is an important factor when choosing DBMS software. In the past few decades, the amount of data has increased exponentially, causing a rapid increase in the demand for systems that can store, organise and manipulate data. With these requirements in mind, different vendors have implemented their own DBMSs. However, because these DBMSs were implemented with significant differences, a Standard was proposed to provide a common way of using such systems. Even though the Standard has been adopted by most DBMS vendors, some differences still exist, probably because of the difficulty of interpreting and implementing every part of the Standard, and also because of performance considerations. Hence, there is a need to expose such differences between DBMSs in order to evaluate their conformance to the common Standard. In particular, this project investigates five popular DBMS implementations and evaluates their conformance to the SQL Standard. A crucial question investigated in this project is whether DBMSs have implemented the SQL Standard in the same way. SQL should guarantee that identical SQL code always returns identical answers when it is evaluated on the same database, independently of which DBMS it runs on. The aim of this project is to implement a random query generator and a comparison tool for investigating and highlighting the differences that may exist among current DBMSs.
Further, it aims to provide a detailed explanation, with reference to the SQL Standard, of the potential differences and to explain how they might affect the transition between current DBMSs.

Acknowledgements

I would like to thank my supervisors, Paolo Guagliardo and Leonid Libkin, who were always willing to advise me and to help me overcome any difficulty. In addition, I want to thank my family, who are always by my side.

Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Elias Spanos)

Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Thesis Structure
2 Background
  2.1 The SQL Standard
  2.2 The SQL Language
  2.3 Commands of SQL
    2.3.1 Missing values
  2.4 SQL Standard issues
  2.5 The Database Management Systems
3 Methodology
  3.1 Methodology
4 Implementation
  4.1 Internal Implementation
    4.1.1 Random Query Generator Tool
      4.1.1.1 Configuration file
    4.1.2 Comparison Tool
    4.1.3 Random data generator tool
5 Experimental Evaluation
  5.1 The experiment Setup
    5.1.1 Performance Evaluation
  5.2 Experiment Results
6 Conclusion
  6.1 Conclusion
  6.2 Summary of the findings
  6.3 Suggestions for future work
A Source Code
B Compilation & Execution Instructions
Bibliography

List of Figures

1.1 Abstract DBMS architecture
2.1 Popularity of modern DBMSs
3.1 High level overview of the framework
4.1 Complete architecture of the framework
4.2 Abstract Architecture of Comparison Tool
4.3 Method of importing csv files
5.1 Performance Evaluation

List of Tables

2.1 Aggregation Commands
2.2 Complex Conditions
2.3 SET Commands
2.4 Data Types
2.5 String Commands
2.6 OR Truth Table
2.7 AND Truth Table
2.8 NOT Truth Table
5.1 Statistics of Generated queries
5.2 Difference #1
5.3 Difference #2
5.4 Difference #3
5.5 Difference #4
5.6 Difference #5
5.7 Difference #6
5.8 Difference #7
5.9 Difference #8
5.10 Difference #9
5.11 Difference #10
5.12 Difference #11
5.13 Difference #12
5.14 Difference #13
5.15 Difference #14
5.16 Difference #15
5.17 Difference #16
5.18 Difference #17
5.19 Difference #18
5.20 Difference #19
5.21 Difference #20
6.1 Summarised Results
6.2 Summarised Results

Chapter 1

Introduction

Database Management Systems (DBMSs) have thrived since their first appearance, and they remain the dominant means of storing various kinds of information. Specifically, DBMSs are used extensively in many fields and by almost all companies, as they provide a relatively easy way of performing various operations on data, such as insertion, deletion and modification [15, 21, 22]. For this reason, applications can be implemented efficiently and reliably without handling low-level issues, such as concurrent and efficient access to data, which are taken care of by the DBMS. Hence, the primary role of a DBMS is not only to store data but also to provide a common interface for manipulating it. Figure 1.1 shows the structure of a modern DBMS from a high-level point of view.

The relational model (RM) is the mainstream model for managing data and is by far the most popular model among current DBMSs. It was proposed by Edgar F. Codd in 1969 [6, 7] and it revolutionised the area of data management due to its simplicity. A database that uses the relational model is known as a relational database. According to the relational model, data is stored in a database as tables, and each table is composed of rows and columns. In RM terminology, each row of a table is known as a tuple and each column as an attribute (or characteristic).

In the early stages, each database system had its own interface, and migrating applications from one system to another was a slow and difficult task, since an almost complete rewrite of the code would be required. However, as these systems were promising from their first appearance, a standardised language was unavoidable.
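As a minimal illustration of the relational vocabulary introduced above (table, tuple, attribute), the following sketch uses Python's built-in sqlite3 module. The employee table, its columns and its rows are hypothetical and chosen only for illustration, and SQLite is used purely for convenience; it is not claimed to be one of the DBMSs evaluated in this thesis.

```python
import sqlite3


def build_example_table():
    """Create a small in-memory table and return (attribute names, tuples).

    The table name, columns and rows are hypothetical illustration data.
    """
    conn = sqlite3.connect(":memory:")
    # A relational table: each column is an attribute.
    conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT)")
    # Each inserted row is a tuple of the relation.
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Ada", "R&D"), (2, "Edgar", "R&D"), (3, "Grace", "Ops")],
    )
    # ORDER BY makes the order of the returned rows deterministic.
    cursor = conn.execute("SELECT id, name, dept FROM employee ORDER BY id")
    # cursor.description holds one entry per result column; item 0 is its name.
    attributes = [column[0] for column in cursor.description]
    tuples = cursor.fetchall()
    conn.close()
    return attributes, tuples


attributes, tuples = build_example_table()
print(attributes)
print(tuples)
```

The query itself is written in SQL, so in principle the same statement could be sent to another relational DBMS; only the connection code is specific to SQLite.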
Structured Query Language (SQL) has become the standardised language for querying and managing data and has rapidly become the most widely used DBMS language [4]. Comparable languages have emerged over the years; however, SQL has remained the dominant language because it is easy to learn. SQL users and programmers benefit from learning a single language that is used by essentially all modern DBMSs, and they can write SQL code that, with minor changes, can run on any database system. By contrast, a significant disadvantage of other programming languages is that each has its own benefits and uses, and programs written in them cannot always run on every system.

Figure 1.1: Abstract DBMS architecture

Such systems have been extensively studied and tested for their correctness and efficiency. Software testing is an essential approach for evaluating the quality of such systems [5]. However, implementing software that can assess complex systems usually requires a sophisticated, large-scale implementation. Nevertheless, automating the testing process is the only way to test such complex systems systematically and effectively. Since DBMSs are the most popular systems for storing and manipulating data, many studies have been conducted, focusing mainly on evaluating their performance. For example, TPC-H is a popular benchmark for evaluating vendors' DBMS implementations [11]. However, this benchmark is designed for analysing performance, and it generates a relatively small number of SQL queries.

1.1 Motivation

The SQL Standard was established a long time ago, and it aims to provide a common interface among modern DBMSs [2, 12]. As each vendor provides its own DBMS implementation,