Automata Theory Based Approach to the Join Ordering Problem in Relational Database Systems

Miguel Rodríguez¹, Daladier Jabba¹, Elias Niño¹, Carlos Ardila¹ and Yi-Cheng Tu²
¹ Department of Systems Engineering, Universidad del Norte, KM5 via Puerto Colombia, Barranquilla, Colombia
² Department of Computer Science and Engineering, University of South Florida, Tampa, U.S.A.

Keywords: Automata Theory, Query Optimization, Join Ordering Problem.

Abstract: The join query optimization problem has been widely addressed in relational database management systems (RDBMS). The problem consists of finding a join order that minimizes the time required to execute a query. Many strategies have been implemented to solve this problem, including deterministic algorithms, randomized algorithms, meta-heuristic algorithms and hybrid approaches. Such methodologies depend heavily on the correct configuration of various input parameters. In this paper, a meta-heuristic approach based on automata theory is adapted to solve the join ordering problem. The proposed method requires a single input parameter, which makes it easier to use than the methods previously described in the literature. The algorithm was embedded into PostgreSQL and compared with its genetic competitor using the most recent TPC-DS benchmark. Experimental results support the proposed method, which achieves up to 30% faster response time than GEQO on several queries.

Rodríguez M., Jabba D., Niño E., Ardila C. and Tu Y. Automata Theory based Approach to the Join Ordering Problem in Relational Database Systems. DOI: 10.5220/0004433802570265. In Proceedings of the 2nd International Conference on Data Technologies and Applications (DATA-2013), pages 257-265. ISBN: 978-989-8565-67-9. Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)

1 INTRODUCTION

Since the early days of RDBMS, the problem of finding a join order that minimizes the execution time of a query has been studied. (Chaudhuri, 1998) defined large join queries as relational algebra queries with N join operations involving N+1 relations, where N is greater than or equal to 10. Subsequently, the Large Join Query Optimization Problem (LJQOP) was formally stated as finding a Query Execution Plan (QEP) with minimum cost for a large join query.

The LJQOP has been widely addressed and many methods have been developed to solve it. Randomized algorithms such as iterative improvement and simulated annealing, evolutionary algorithms such as genetic algorithms, and meta-heuristics such as ant colony optimization are some common strategies used to solve the problem.

The solution space of an LJQOP consists of all query trees that answer the query. Three types of query trees can result from the solution space: left deep, bushy and right deep. An extended discussion of the types of query trees is given in (Ioannidis and Kang, 1991).

The construction of an LJQOP solution space is theoretically possible for a small number of relations. When N increases substantially, finding the optimal join order is considered an NP-hard problem and thus deterministic algorithms cannot find a solution easily.

Systems holding workloads from applications such as decision support systems and business intelligence require the ability to join more than 10 relations easily. In this paper, a meta-heuristic approach based on automata theory, which has been used effectively to solve the Traveling Salesman Problem (TSP), is presented and its application to the join ordering problem is discussed. Finally, the automata-based query optimizer proposed in this work is tested using the most recent decision support benchmark, TPC-DS.

The remainder of this work is organized as follows. Previous work on solving the LJQOP is discussed in section two. The proposed methodology is explained in section three. The experimental design and setup used to test the algorithm are described in section four. A discussion of the results obtained by the algorithm and a comparative analysis between the proposed method and the PostgreSQL genetic optimizer module are shown in section five. Finally, conclusions and future work are presented.

2 RELATED WORK

The join ordering problem has been approached in different ways over the years. A literature review presented in (Steinbrunn et al., 1997) provides detailed information on different approaches to the solution of the problem and classifies them in four groups. The first one corresponds to deterministic algorithms such as dynamic programming and the minimum selectivity algorithm. The second group, randomized algorithms, includes simulated annealing, iterative improvement, two-phase optimization and random sampling. The third group consists of genetic algorithms, which encode the solutions and then use selection, crossover and mutation algorithms. Finally, the fourth group is composed of hybrid methods.

Three of the most popular approximate solutions to the join ordering problem are simulated annealing, genetic algorithms and ant colony optimization.

2.1 Simulated Annealing

The annealing process in physics consists of obtaining low-energy states of a heated solid. Simulated annealing takes advantage of the Metropolis algorithm, which is used to study equilibrium properties in the microscopic analysis of solids. Specifically, the Metropolis algorithm generates a sequence of states for a solid object. Given an element in state i with energy E_i, a new state j with energy E_j is produced. If the difference between the energies is below zero, the new state is automatically accepted; otherwise its acceptance depends on a probability based on the temperature T the system is exposed to and a physical constant known as the Boltzmann constant k. Similarly, the simulated annealing algorithm constructs solutions to combinatorial problems by linking solution-generation alternatives and an acceptance criterion. The states of the system can be matched to solutions of the combinatorial problem, and in the same way the cost function of the optimization problem can be seen as the energy of the annealing system. The simulated annealing algorithm therefore starts by exposing the system to high temperatures, and thus accepts solutions that do not improve on previous solutions. By means of a cooling factor, the temperature is lowered until it reaches zero, where solutions that do not improve on their predecessors are no longer accepted.

The calculation of the acceptance probability of the simulated annealing algorithm is adopted from the Metropolis algorithm and corresponds to the following equation.

$$
P(\text{accept } j) =
\begin{cases}
1, & \text{if } E_j - E_i < 0 \\
\exp\left(-\dfrac{E_j - E_i}{k\,T}\right), & \text{otherwise}
\end{cases}
\tag{2.1}
$$

Different implementations of the simulated annealing algorithm have been used to solve the join ordering problem using different cooling schemas, initial solutions, and solution generation mechanisms.

The implementation in (Ioannidis and Wong, 1987) proposed the use of simulated annealing to solve the recursive query optimization problem. The initial state was chosen using semi-naïve evaluation methods and the initial temperature was chosen as twice the cost of the initial state. The termination criterion of the algorithm is composed of two parts: the temperature must be below 1 and the final state must remain the same for four consecutive stages. The generation mechanism is based on a transition probability matrix $R : S \times S \rightarrow [0,1]$, where each neighbor of the current state has the same probability of being chosen as the next state.

$$
R(s, s') =
\begin{cases}
\dfrac{1}{|N(s)|}, & \text{if } s' \in N(s) \\
0, & \text{otherwise}
\end{cases}
\tag{2.2}
$$

Finally, the authors suggest the use of two different cooling schedules in their implementation. They propose the following equation to control the temperature of the system.

$$
T_{new} = \alpha(T_{old}) \cdot T_{old}
\tag{2.3}
$$

The function $\alpha$ returns values between 0 and 1. The first strategy proposed consists of keeping $\alpha$ at a constant value of 0.95 and the second one consists of modifying the value of $\alpha$ according to Table 1.

Table 1: Factor to reduce temperature.

T_0/T ≤    α
2          0.80
4          0.85
8          0.90
∞          0.95

A second approach to query optimization by simulated annealing is proposed in (Swami and Gupta, 1988), where two implementations of simulated annealing are compared to several other algorithms including perturbation walk, quasi-random sampling, local optimization and iterative improvement. The proposed simulated annealing implementation uses an interesting generation mechanism that combines two different strategies. The first strategy is swapping, which consists of selecting two positions in the vector and interchanging their values. The second strategy is 3-cycle, which consists of randomly selecting 3 elements of the current state and shifting them one position to the right in a circle. To select which strategy generates the new solution at a given iteration, a variable in [0, 1] represents the frequency of swap selection, and its complement (one minus that value) represents the frequency of 3-cycle selection.

[...] mutation operation is essential for the genetic algorithm to work properly, it must be used carefully.

Genetic algorithms have also been used to solve the query optimization problem as an alternative to randomized algorithms. The genetic algorithm implemented by the authors of (Bennett et al., 1991) is the first known genetic algorithm used to approach the query optimization problem. The authors adapted a genetic algorithm used to solve the assembly line balancing problem, focusing on finding an appropriate encoding schema and
