
Translating and Optimizing SQL Queries Having Aggregates

Günter von Bültzingsloewen
Forschungszentrum Informatik an der Universität Karlsruhe
Haid-und-Neu-Str. 10-14, D-7500 Karlsruhe, West Germany

Abstract

In this paper, we give a precise definition of the semantics of SQL queries having aggregate functions, identify the problems associated with the optimization of such queries and give some solutions. The semantics of SQL queries is defined by translating them into expressions of an extended relational calculus (extensions are necessary for a correct treatment of aggregate functions and null values). The discussion of the optimization problems is based on a new transformation of a relational calculus expression into relational algebra. By investigating the transformation of aggregate functions we are able to identify two major problems: correct integration of the values of aggregate functions applied to empty relations and unnecessary computation of unneeded function values.

To solve these problems we propose to interpret an aggregate function applied to a calculus expression with some free variables as a function on the attributes of these variables that are referenced in the expression. Doing so, we are able to develop several new processing strategies that should be considered by an optimizer.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 13th VLDB Conference, Brighton 1987

1 Introduction

The SQL query language has become the standard relational manipulation language, as is reflected by ongoing standardization activities [ANSI85]. Aggregate functions like sum, average, minimum etc. are an important feature of SQL. However, the processing of queries with aggregate functions is not well understood. A formal definition of the semantics of SQL queries having aggregates is still lacking, as is a unified operator tree model covering this class of queries [Kies86]. This paper is an attempt to solve these problems.

In order to give a formal definition of the semantics of SQL queries, we have to translate them into a formal language which is at least as powerful as SQL. Two well known relational query languages that have a sound theoretical foundation are relational algebra and relational calculus. As SQL is more closely related to relational calculus, we define the semantics of SQL by translation into calculus. However, relational calculus has to be extended in order to deal with aggregate functions and null values. The optimization of SQL queries is thus reduced to the optimization of relational calculus queries, which makes the following results also applicable to other query languages based on relational calculus (e.g. QUEL).

To develop an operator tree model for SQL and extended relational calculus, an algebraic query representation has to be found; the basic problem here is to identify a suitable set of algebraic operators. A possible choice is relational algebra with aggregate functions developed by [Klug82], with an extension to cover null values. To study the problem whether this is a good choice, we present a translation of relational calculus queries into relational algebra. Investigating the translation of aggregate functions, we are able to identify two major problems of the representation in relational algebra: correct integration of the values of aggregate functions applied to empty relations and duplicated computation of certain function values.

To solve these problems we propose to interpret an aggregate function applied to a calculus expression with some free variables as a function F on the attributes of these variables that are referenced in the expression, and to include selection using such a function into relational algebra. We develop a general processing strategy using this approach and present several special instances of the general strategy that are intended to approximate the minimal representation of F (i.e. each function value is represented only once for a set of arguments).

The major contributions of our work are threefold: first, the extension of relational algebra and relational calculus to cover a significant subset of SQL. Second, the translations from SQL into calculus and from calculus into algebra, showing that the extended versions of relational algebra and relational calculus have the same expressive power. Third, the development of a new general processing strategy for aggregate functions, which may form the foundation of an operator tree model for the class of queries considered.

While this work was primarily motivated by [Kies85] who illustrated the semantic problems associated with efficient processing of queries having aggregates, several other papers are relevant to our discussion. [Kim82, CeGo85, LeVi85] are concerned with the transformation of an SQL query into some kind of normal form which is more suitable for further optimization. [Kim82] describes a translation from SQL into SQL, transforming one complex query into several simpler ones. [LeVi85] proposes a translation from SQL into QUEL and [CeGo85] gives a translation from SQL into relational algebra having aggregate functions. All have in common that they treat only subsets of the SQL query language. Subqueries may contain either no GROUP BY - HAVING clause [Kim82, CeGo85] or no referential variables [LeVi85] (i.e. variables declared in an outer query block). The SQL feature of null values and query evaluation using three-valued logic is generally ignored. Furthermore, arithmetic operators and control of duplicate elimination are omitted.

While we agree that the last restriction is justified for technical simplification and does not constrain the applicability of the transformations, we claim that to ignore three-valued logic leads to severe semantic problems. This is the case even if attributes may have only nonnull values, as the application of an aggregate function like average, sum, min and max to an empty relation gives null as a result [ANSI85]. Consequently we include the treatment of null values and three-valued logic in our discussion. Furthermore we pose no restrictions on subqueries. However, for technical simplification we also omit arithmetic operators and control of duplicate elimination.

Besides their limitations, the above-mentioned approaches cannot be used directly for two reasons: First, as has been pointed out by [Kies85], the transformations of [Kim82, LeVi85] are incorrect in general: The application of the aggregate function count to an empty relation is not treated appropriately. Second, [CeGo85] translates an SQL query directly into relational algebra. Therefore the application of optimization techniques developed for relational calculus like [JaKo83] is impossible and the problems associated with the processing of aggregate functions cannot be identified as easily.

In [Klug82] a precise definition of relational algebra and relational calculus query languages having aggregate functions is given and the expressive power of the two languages is proven to be equivalent. However, the case that an aggregate function may have null as a result is not considered and the proof of equivalence is quite complex, making it difficult to discuss the processing of aggregate functions.

Though not directly applicable, we can make use of some of the techniques proposed earlier. The definition of extended relational algebra and relational calculus given in section 2 is based on [Klug82]. The translation of an SQL query into extended relational calculus (section 3) makes use of the preprocessing step [...]

A schema is a sequence <R_1, ..., R_N> of relation schemes. An instance I of schema <R_1, ..., R_N> is a sequence <r_1, ..., r_N>, where for each i = 1, ..., N, r_i is a relation over scheme R_i. Throughout this paper, one fixed schema <R_1, ..., R_N> is assumed.

An aggregate function f ∈ Agg is a function f: R → D, where R is the set of all relations. To translate SQL queries into relational calculus we will need aggregate functions of the form f = agg(A_i), where agg ∈ {sum, min, max, avg, count} and A_i ∈ N is an attribute name. For example, the function sum(A_i) determines the sum of the values of attribute A_i when applied to a relation. Aggregate functions f = agg(A_i) with agg ≠ count yield ω as result, if the relation they are applied to is empty or does not have an attribute with name A_i. An aggregate function with agg = count has 0 as result when applied to an empty relation.

2.1 Relational Calculus

The definition of relational calculus given in this section is based on [Klug82]. However, the following extensions are necessary in order to be as expressive as the subset of SQL we consider, i.e. in order to include the treatment of null values and predicate evaluation using three-valued logic:

• First, the set Θ of comparison operators θ has to be extended: Θ = {=, ≠, <, ≤, >, ≥, ≡}. The special comparison operator ≡ yields TRUE if the values compared are identical in the usual sense or both ω, and FALSE otherwise. The other operators are evaluated to UNKNOWN if at least one of the values compared equals ω, TRUE if the comparison is TRUE in the usual sense and FALSE otherwise.

• Second, we need two new logical connectives ⊤ and ⊥ which map the truth value UNKNOWN into TRUE and FALSE respectively.

Calculus expressions are defined recursively using several classes of objects: variables, terms, formulas, range formulas and alphas. Thereof only the closed alphas (alphas containing no free variables) correspond to calculus expressions. The definition starts with atomic alphas, i.e. relations defined in the schema. Free variables, bound variables and closed objects are defined in the usual way.