Translating and Optimizing SQL Queries Having Aggregates
Giinter von Biiltzingsloewen
Forschungszentrum Informatik an der Universitiit Karlsruhe Haid-und-Neu-Str. 10-14, D-7500 Karlsruhe, West Germany
Abstract is at least ae powerful as SQL. Two well known relational query languages that have a sound theoretical foundation are relational In this paper, we give a precise definition of the semantics of algebra and relational calculus. As SQL is more closely related to SQL queries having aggregate functions, identify the problems relational calculus, we define the semantics of SQL by translation associated with the optimisation of such queries and give some into calculus. However, relational calculus has to be extended in solutions. The semantics of SQL queries is defined by translating order to deal with aggregate functions and null values. The op them into expressions of an extended relational calculus (exten- timiation of SQL queries is thus reduced to the optimiaation of sions are necessary for a correct treatment of aggregate functions relational calculus queries, which makes the following results also and null values). The discussion of the optimization problems is applicable to other query languages based on relational calculus based on a new transformation of a relational calculus expres- (e.g. QUEL). sion into relational algebra. By investigating the transformation To develop an operator tree model for SQL and extended of aggregate functions we are able to identify two major prob- relational calculus, an algebraic query representation has to be lems: correct integration of the values of aggregate functions found; the basic problem here is to identify a suitable set of al- applied to empty relations and unnecessary computation of un- gebraic operators. A possible choice is relational algebra with needed function values. To solve these problems we propose to aggregate functions developed by [Klug82], with an extension to interpret an aggregate function applied to a calculus expression cover null values. To rtudy the problem whether thii is a good with some free variables as a function on the attributes of these choice, we present a translation of relational calculus queries variables that are referenced in the expression. Doing so, we are into relational algebra. Investigating the translation of aggre- able to develop several new processing strategies that should be gate functions, we are able to identify two major problems of considered by an optimiser. the representation in relational algebra: correct integration of the values of aggregate functions applied to empty relations and duplicated computation of certain function values. 1 Introduction To solve these problems we propose to interpret an aggregate function applied to a calculus expression with some free variables The SQL query language has become the standard relational as a function F on the attributes of these variables that are ref- manipulation language, as is reflected by ongoing standardisa- erenced in the expression, and to include selection using such tion activities [ANSI85]. An important feature of SQL are ag- a function into relational algebra. We develop a general pm gregate functions like sum, average, minimum etc. However, the cessing strategy using this approach and present several special processing of queries with aggregate functions is not well un- instances of the general strategy that are intended to approxi- derstood. A formal definition of the semantics of SQL queries mate the minimal representation of F (i.e. each function value having aggregates is still lacking, as is a unified operator tree is represented only once for a ret of arguments). model covering this class of queries [Kies 861. This paper is an The major contributions of our work are threefold: first, the attempt to solve these problems. extension of relational algebra and relational calculus to cover In order to give a formal definition of the semantics of SQL a significant subset of SQL. Second, the translations from SQL queries, we have to translate them into a formal language which into calculus and from calculus into algebra, showing that the extended versions of relational algebra and relational calculus have the same expressive power. Third, the development of a new general processing strategy for aggregate functions, which may form the foundation of an operator tree model for the class Permission to copy without fee all or part of this material is of queries considered. granted provided that the copies are not made or distributed for While this work was primarily motivated by [K&85] who direct commercial advantage. the VLDB copyright notice and the illustrated the semantic problems associated with efficient prc+ cessing of queries having aggregates, several other papers are title of the publication and its date appear, and notice is given that relevant to our discussion. [Kim82,CeGo85, Levi851 are con- copying is by permission of the Very Large Data Base Bndow- cerned with the transformation of an SQL query into some kind ment To copy otherwise, or to republish, requires a fee and/or spe- of normal form which is more suitable for further optimiiation. cial permission from the Endowment. [Kim821 describes a translation from SQL into SQL, transform-
Proceedings of the 13th VLDB Conference, Brighton 1987 235 ing one complex query into several simpler ones. [Levi851 pro- A schema is a sequence
236 Proceedings of the 13th VLDB Conference, Brighton 1987 r Terms T: I the X-values and the associated function value for each group (o q,a) E IQ denotes concatenation). It can be general&d such that several 4,P) = c functions are applied to each group (&x,(~~,.J~,,). vi[A](Z,P) = B(ui)[Al f(a)(Z, B) = f(4Zl a)) 2.2 Relational Algebra Formulas F: +(I, a) E {~(TRuE), O(FALSE), ~(UNKNOWN)) The definition of the set E of relational algebra expressions over our fixed schema and the value e(1) of an expression on ) = tl(z,a)etz(z,a) tl& I,B an instance I of the schema is given in table 2. The following ($1 v $2)(1, B) = m4+l(Zl Bh +a(Z, 81 notations are used in the definition: c denotes an element of fi. ($1 A $2W, /3) = mid+l(Z, B), +2(Z, PII e, cl and e2 denote relational algebra expressions. A and B are (-w, a) = 1 - w, B) attribute names, X and Y sets of attribute names of equal rise. W)(Z, PI = l@l P)J (T+)(Z, P) = I’@, 81 Relational Algebra Expressions: e(Z) E R ((%(ui))+)(Z,P) = m=40,m=m,u,~) (/ItI,BI~ilTl)) L&ml cz=c ((Vri(Vi))+)(Z,8) = min{l,minm,(m) clr(~~BI~iPI)~ Base Relation &(Z) = ri Range Formulas RF: Projection TX(e)(Z) = {T[X] : T E e(Z)} ri(ui)(Z,/3) E {l(TRUE),O(FALSE)} Union (el U es)(Z) = ei(Z) Ue2(Z) ail V V CYik Vi I, @ Diflerence (el \ e2)(Z) = cl(Z) \ e2(l) = 1, ifp(Vi) E ai,(Z,/3) U...UC+(I,P) Product (el x ez)(Z) = cl(Z) x e2U) 0, otherwise &atriction uAeB(e)(Z) = {T E e(Z) : T[A]BT[B] = 1) Alphas A: Restriction’ uLes(e)(Z) = {T E e(Z) : T[A]BT[B] E (1, i}} 4,P) E R Aggregate kx,~ (e)(Z)= {WI 0Y : T E e(l) A &(Z,/3) = ri formation &x.j>(a)(Z,B) = {Wlou: TEa(Z,8)A lel n e2)(z)re; W’ E 4’) : W4 = Vl~~ Intersection (r) n e2(l) y = f{T’ E a(Z, a) : T’[X] G T[X]}} Selection aAec(e)(Z) = {T E e(Z) : T[A]gc] = 1) ((h,..., L) : n(u1) ,...,*m(4:O)(Z,B) Selection’ u>o,(e)(Z) = {T E e(Z) : T[A]Bc E (1, $}} = {(h(Z,B’),..., L(Z, a’)) : w, a’) = 1 A Join (elW’B]eN) = oAe&l x 4(Z) B’= B[Vl/Tl,... ,h/Tm] A ri(Vi)(Z,B') = 11 d Table 2: Relational Algebra Expressions Table 1: Semantics of Calculus Objects
In order to be at least as expressive as SQL, relational al- Alphas: Alphas correspond to relations. The set A of alphas is de- gebra also has to be extended to include the h&dling of null fined as follows. If & is in the schema, then & EA (atomic values. This is accomplished by introducing modified versions alphas). If tl,..., t ,, are terms not containing aggregate u’ of restriction and selection, which retain tuples for which the functions, tl(V1), . . . r,(v,) are range formulas and (/, is a corresponding predicate yields UNKNOWN. Thii extension cor- formula, then responds to the introduction of I and T as logical connectives into relational calculus. In fact, instead of defining new operators we could aa well introduce the logical connectives in the restric- (t 1,. .., tn) : rl(v1) ,..., rm(v,) : (/, E A. tion/ selection predicate. Selection, intersection and join are or This definition is recursive, baaed on range formulas that can be defined in terms of the other of the above operators. consist of atomic alphas. If a is a closed alpha, feAgg and Xcattrs(a), then 3 Translating SQL Queries into Rela- d
In contrast to [Klug82], we do not allow aggregate functions As we are primarily interested in the translation and opti- in the target list of an alpha. Instead calculus includes the al- misation of queries having aggregate functions, we consider only gebraic aggregate formation operator 4. The reason for these a relevant subset of SQL, which is still more powerful than the modifications are that translation of an SQL query into calculus subsets covered in earlier papers [CeGo85,LeVi85,Kim82]. The as well as translation of a calculus expression into algebra become only restrictions are that attribute domains are restricted to the easier. However, the expressive power remains the same. set fi of natural numbers including w, terms are restricted to The semantics of calculus objects is defined in table 1. Ob- contain no arithmetic operators and duplicates may not be re- jects of relational calculus are interpreted with respect to a sche- tained in the result of a query or subquery (yhich actually does ma instance I and a mapping /3 (called a valuation) which asso- not restrict the formulation of subqnery predicates). The effect ciates with each variable vi EV a tuple a(vi) = 2’. B[vi/T] is of these restrictions is a technical simplification of the transla- defined to be the valuation, which yields T when applied to Vi tion into calculus and algebra, the extension to complete SQL and equals a(v) when applied to another variable v. being straightforward. The aggregate formation operator d
Proceedings of the 13th VLDB Conference, Brighton 1987 237 Sub Query SQ Predicate Calculus Formula
SELECT f EXISTS SQ: TRUE FROM n(vdr...,rn(vn) WHERE P, t 0 SOME SQ: tf?f’(* : ri(vi), . . ., r,(v,) : Pw)
SELECT ( 1 EXISTS SQ: ( 3rl (VI ), . . ., 3r,(v,))(lP,) FROM 4;($ ,...,r,(v,)f WHERE P, t6’SOMESQ: (3ri(vi),..., 3rn(vn))(lPw A th)
SELECT t. EXISTS SQ: ( 31 ( vl ) , . . . , 3r,(vn)) (I(Pw A PA)) FROM ;(vl),...,r,(v,) WHERE t B SOME SQ (3ri(vi), . . . , 3rn(vn)) (I(P, A PA) A tet:) GROUP BY vi:!& 1] t**-,Vir(AirI HAVING Ph ao= (* : rl(v~),...,r,(v~) : Pw[Vi/4] A vi1 [A;,] E vi, [Ail] A . A Vik[Ai,] 2 4,[&,] ) P; = Ph(f/f’(oo), f E ph], t: = blf/f’(aO), f E t.1, Table 3: ‘Ikanslation of Snbquery Predicates
SQL Query Calculus Expression
SELECT fi,...,fl ,...,r,(vJ (fi,. ..,f;)( * : n(w), . . .,m(v,) : Pw)
SELECT h,..., t1 (# f) ,...,rJh) (ti ,..., ti) : rl(v1) ,..., rn(v,) : P,
SELECT tl,...,tj FROM n (~1) ,...,rn(vn) (&...,ti) : a(v) : Pi WHERE GROUP BY $[A. (1] ,***gvi,[Ai,] CI = (&(A,~ ,...,+),(f; ,...,jg>( * : rl(vl),...,rn(vn) : Pw 1); HAVING Ph (G,..., t;,ph) = (tl,***, t1,Ph)[fi/V[k + ;]9Vij[AijI/V[jll; (fir..., fm aggregate functions in tl, . . . , tl, Ph) Table 4: Query ‘Ikanslation
SQL Query Calculus Expression
SELECT sum(vl l]), vi 3 FROM WI) (VIZ], v[l]) : a(v) : v[3] > 0 WHERE P, GROUP BY vi[3] Q = 4( * : Rl(vl) : Pw ) HAVING maz( v1 [4]) > 0
P, = -7 ( v1[2] = SOME SELECT P,,, = - (3Rs(vl)) (I min(t)(ao) > ~(31 A ~112) = maz(l)(ao)) FROM zz’ll) GROUP BY &I a0 = ( * : R&I;) : v: [3] > 431 A y[2] q v;[2] ) HAVING min(v2[3]) > vl[3] )
Example 1: l’kanslation of SQL Queries
Proceedings of the 13th VLDB Conference, Brighton 1987 238 l Associate with each relation in a FROM clause a unique The calculus expression resulting in the translation of an SQL variable and extend each attribute reference to contain the query gives the same result as the query when applied to the variable it refers to. same schema instance I. As the structure of a calculus expree- sion directly relects the structure of the original SQL query, l Transform each predicate into one of the three basic pred- the translation given in Table 4 should be self-explanatory. The icates. For instance, an *in predicate* can be reduced alpha cz built in the translation of the third query type is the to a cquantif ied predicates. result of evaluating FROM, WRERE and GROUP BY clause of the SQL query to which the aggregate functioue contained in To translate an SQL query into a calculus expression, the qn- the HAVING clause and the SELECT clause are applied. At- tactical differences between SQL and calculus (subquery predi- tribute references and aggregate functions contained in the latter cates, GROUP BY - HAVING clause and position of aggregate constructs have to be substituted such that they reference the functions) must be resolved. Thii is accomplished with two kinds corresponding attributes of o (In example 1, eum(vr[l]), ur[3] is of translations, i.e. translation of subquery predicates and tram+ substituted by u[2], u]l]). lation of the query (see Tables 3,4). In SQL, aggregate functions are represented in the form j = agg(ui[&]), where vi is bound to a relation ri in the FROM 4 Equivalence of Extended Relational clause of an SQL query or subquery (FROM ri(ur), . . . , r,,(u,)). j is applied to the result of the evaluation of the FROM and Algebra and Relational Calculus WHERE clauses of the query or subquery, which is a subset of In order to show that our extended relational algebra and r = tl x ... x rn, or to partitions of this subset (in case the relational calculus have the same expressive power, we have to query or subquery contains a GROUP BY clause). Hence j is prove the following two propositions: translated into j’ = ogg(Ai), where Ai names the attribute of r that corresponds to the attribute Ai of ri (In tables 3,4 this is (1) For every algebraic expression e EE there is a closed alpha indicated by j and j’). Q EA with e(Z) = a(Z) for all schema instances 1. The translation of eubquery predicates has the following prop erty: The calculus formula yields the same truth value as the , (2) For every closed alpha a EA there is an algebraic expres- original SQL predicate when applied to the same schema in- sion e EE with a(Z) = e(Z) for all schema instances I. stance I and the same valuation ,9 for the free variables of the Proposition (1) can be proven in the usual way (e.g. [Klug82]). calculus formula resp. SQL predicate. In the translation, three Hence we will prove proposition (2) only. It has been proven types of subqueries are distinguished: before by [Klug82] for query evaluation using two-valued logic without null values. His approach is to produce for all terms, l The first type yields exactly one value as the result of an formulas and alphas an equivalent algebraic expression. How- aggregate function. Hence the caxiatr predicate> is al- ever, his proof is unnecessarily complicated, as it is sufficient to ways TRUE and the
tl, t2 are not of the form f(a); 7’ equals 7 except that terms of the form u[A] are mapped on the corresponding attribute of e,
Table 5: ‘Lkanslation rules for the extended selection operator
Lemma 1: Let a = (tr,. . .,t,,) : rl(u:), . . . , rm(&) : cl, be an alpha. Then
u7,tOfb)(d(1) = ~~ttrs(e)“71t)eA,(ej)(l) and u&j(a)kW) = ~~ttrs(e)u~(t)eA,(ej)(Z) with ej = 4.httr+),f >(e‘4 U (e\ kr.te) (e a)) x f(e) and (e a) = “(attrs(e),,l(tl),...,~~(t,))u~l,~(e x e,, x t. x e,,), where 7’ equale 7 for the variables free in a and associates with $[A] the corresponding attribute in e,, X X er, and A, names the attribute of ej that contains values of the aggregate function. Proof: For T E e(r) let &- be the following valuation: &(Vi)[A] := T[7(vi[A])], for each free variable ui in a. Note that with this definition 4,/h-) = ~(71(t1)....,71(tn))u~~,~({T} x er, x .. x er,)(Z) and(e a)(Z) = ~(attr~(e),7’(t,),....71(t,))u?l.~(e x er, x . . . x er,N) = UTEe(l)@‘lx 4, P-r) Now
v(Z) = Lttr+j,j>(e~ a)(Z) U (44 \ rattr+)(e. a)(l)) x Ml = ~
Calculus Query Relational Algebra Expression
(ulll]) i RI(Q) : VI(~) = f(a) ~(7(~~111))~,,~~121=j(~)(R1) = ql)u2=&)
,z = vzll] : R~(Y) : ~1[3l~v2[31 ej = ~~~1,2.3~,j~~(1.2,3,~~(v,[11))~~~,~1[s~ev,1s~(R1 x R2) U (RI \ R(1,2,s)u?l,v,131eVt1s1(Rlx R2)) x f(e)
= ~<(1,2,s).r>91,2,3,r)Oges(R1 x R2) U (RI \ ~(w,s)~se6(Rlx Ra)) x f(e)
Relations: Rr(1,2,3), R41,2,3); 7(Vl [i]) = i, 7’ 01 i =iand7’mi =i+3
Example 2: Translation of a calculus query into algebra
240 Proceedings of the 13th VLDB Conference, Brighton 1987 For the following steps assume that for any closed alpha a Nested iteration is in general inefficient, as has been shown contained in the one currently translated a corresponding by [Kim 821. Algebraic processing has the disadvantage that algebraic expression e, has been constructed. computation of ef leads to duplication of work if ] {f(a(Z, a)) : (1) T E e(Z)} ] c ] e(Z) ], as then several function values are com- puted more than once. Specially, to compute {To f(fl) : T E (2) TO a range ri = a;, V . . V oib corresponds , e(Z)Aa(Z, /3r) = 0) is wasted, as f is constant. It is however nec- e,, = e,,l U U e,+. essary to compute this set if we want to stay inside relational al- gebra. What we really would like to do in this case is to find some (3) To a = ((tl,...,tn) : rl(ul), . . . , rm(vm) : $1 corm- kind of minimal representation of the aggregate function com- sponds bined with an efficient method to obtain the value corresponding ea= ?r(7(t,),...,7(t,))u?.~(erl x x erJr to a tuple T E e(Z). To state this problem more formally, we where 7 associates with each term v;[A] the corresponding define for an arbitrary but fixed expression u7,tej(a)(e)(Z) the attribute of e,, x x e,,. following function F, which computes f(a(Z,&)) for T E I@, where k is the number of attributes of e(Z): In order to translate an extended selection operation u7,e(e) into a regular algebraic expression, we have to break down the F : fi” -+ A, F(T) = f(a(Z, &)). operation into parts, thereby reducing the complexity of the se- lection predicate $. Most of the translation rules necessary to In the minima; representation of F each function value would achieve this reduction are quite simple. These rules are given in be represented exactly once. The basic idea is to approximate the table 5. (As (T) maps UNKNOWN to TRUE, a selection with minimal representation by grouping tuples with identical func- a predicate (T$) yields exactly the tuples where J, is evaluated tion values together and representing the function value only to TRUE or UNKNOWN. A similar rule holds for (I).) once for every group. Following this approach we can present the Note that the translation of a negation (7) transforms u into following general processing strategy for aggregate functions: u’ and vice versa. This has been a source of inconsistency in earlier approaches that did not include u’. For example, the General Processing Strategy: translation of an all subquery in [CeGo85] is performed by means of a negation. Therefore it is incorrect, if the predicate in the (1) Define an equivalence relation I on a set WHERE clause of the subquery contains an aggregate function Rz{TEe(Z): a(Z,/Sr)#B} that may return the null value as a result. The only nontrivial rule is concerned with the translation of such that an extended selection where the selection predicate contains an T ET’ + F(T) = F(T’) aggregate function (i.e. an expression 07,te,(al(e)). The idea behind this rule is as follows: We construct an algebraic expres- The corresponding equivalence classes are sion el representing {To f(o(Z,/?r)) : T E e(Z)}, i.e. to each f’ = {T’ E R : T I T’}, T E R. tuple T E e(Z) the corresponding function value is attached, and obtain the result by applying selection and projection to “f ap (4 Represent A& = {~+‘(T):TER} propriately. in the form of a relation mF with a set Ah of attributes Basically, cl is constructed by applying an aggregate for- mation operator to an expression (e a) representing the union corresponding to p, and an attribute AF corresponding to UTEefl)W x a(Z,kr), i.e. each tuple T E e(Z) is concate- F(T). nated with the tuples of a(Z,&). However, this represents only Use rnj to compute c~~,~s~t~l(e)(Z) in one of the following {Toj(a(Z, &-)) : T E e(Z)Aa(Z, /IT) # B}, as for empty relations (3) ways: a(Z, /?r) no tuple is contained in the union. Hence, ej will con- tain a second part representing (Tof(0) : T E e(Z) A(z(Z, /3~) = (3.1) Generally, we can compute F using mF and integrate 0). Formally the rule is presented in lemma 1. The translation selection using the function F into relational algebra: of a calculus query containing an aggregate function into algebra is illustrated in example 2. if T E rA&(mF) otherwise 5 Processing Strategies for Aggregate Functions
The purpose of this section is to develop a general strategy %,W(a)k)(z) = o,(t)eF(attr,(e))(e)(z). for the processing of aggregate functions, i.e. of expressions of That is, we compute the result by repeatedly looking the form u7,tej(p)(4(Z) ( we adopt the notation used in lemma up the value of in mp. 1) and to present several special strategies following the general F approach. Two obvious processing strategies are the following: (3.2) If f(0) = UNKNOWN, i.e. if f #COUNT, we can use mF directly to compute the desimd result, as all l Nested Iteration: Determine a(Z, &) for each T E e(Z), necessary function values are represented in mp: apply the aggregate function and test the selection predi- cate.
. Algebraic Processing: Determine {To f(a(Z, &)) : T E %.tej(o)te)fz) = ~ottrE(e)u7(t)eAru=~~=)=Ak(dz) xmF) e(Z)} as in lemma 1 and apply selection and projection. proceedings of the 13th VLDB Conference, Brighton 1987 241 According to this general strategy, relational algebra has tcs (1) R = {TcNk: V441 E ~(7’(ti,).....7’(t,~)) be extended to allow equivalence classes and selection using a u7~,tif(erlx . . . x kJ(l)l, function, where the mechanism to identify a certain equivalence T e T’ w T[f’ree(a)] = T’[free(a)]. clam depends upon the special equivalence relation chosen. We are now able to define special processing strategies by (2) mF = d<(l ....,kk).f>~~Vlt,l ).....7'(t,,).7'(tl),...,~'(t-)) specifying the choices made in steps (1) and (2). In the remain- u,~.d+, x . - - x +J(Z)- der of this section, we will present several strategies with the Note that R and mF are defined without using e. Con- corresponding choices. sequently mF can be determined independent of e. The correctness of this strategy follows from the fact, that R > Strategy I: Miniial representation of F {T E e(Z) : a(Z,fi) # 0) because a(Z,&) # @only for In the minimal representation of F each function value is tuples T[free(a)] matched by some (ti,, . . ., ti,). For the represented exactly once: case f(g) = UNKNOWN, strategy 3 has been presented by (Kim82). (1) R = U’ E 4l) : +,Bt) # 01, TIT’ - F(T) = F(T’). (Rx.) If V equals ‘ =‘, we can compute the query of example 2 in the following way: (2) It is an open problem to determine the minimal representa- tion rn~ corresponding to the equivalence relation defined R = {T E Ns : T[3] E qs,(Rz)}, in (1). Therefore the following strategies intend to approx- T n T’ - T[3] = T’]3], imate the minimal representation. mf = ~ If r(e) = UNKNOWN, we can use ml directly: gpi-values and performing a binary search. u7,va12)+x)(R1) = ~(1,2,3)~2&%~4(Rl X ml). Strategy 4 can be generalised to deal with arbitrary conjunc- tions and disjunction of range predicates uij [&]6&, . In or& Strategy 3: Conjunctive equality predicate to accomplish this, GP has to be defined as the cross product of Ifthepredicate$inaistl, =~‘A~i~[Ai~]=ti~r\...h sets GPj, determined as above for each individual range predi- v<~[A~,] = ti,, where Jl’,til, . . , , ti, do not contain any VU& cate. ables free in a, we can formulate the following strategy Strategies 2-4 should be considered by an optimiser to pro- (here 7’ associates with the variables vi,. . . , v& of a the cess queries having aggregates more efficiently. Whether one of corresponding attributes of (e,, x . . . x e,,)): the strategies 3,4 is superior to strategy 2 depends on the rise 242 proceedings of the 13th VLDB Conference, Brighton 1987 of mF (the number of function values that are represented) and References on the complexity of the computation that is necessary to deter- mine mp. The optimizer should contain corresponding decision [ANSI851 ANSC X3H2, (draft proposed) American National Standard procedures. Database Language SQL. Washington, Feb. 1985 (CeGo85] S. Ceri, G. Gottlob: Tranrlating SQL into Relational Alga bra: Optimization, Semantics and Equivalence of SQL Queries. IEEE 6 Summary and Future Work Trans. S.E., April 1985, pp. 324-345 [Codd72) E.F. Codd:’ Relational completeness of data base rablan- To recapitulate, we have extended relational algebra and re- guages. In Data Base System, R. Rustin, Rd., Prentice Hall, Engle- lational calculus to cover a significant subset of SQL. We gave a wood Cliffs, NJ., 1972 translation from SQL into relational calculus, thus obtaining a [JaKo83] M. Jarke, J.Koch: Range Nesting: A Fast Method to Evlrl- uate Quantified Queries. Proc. ACM SIGMOD 1983, San Jose, May formal definition of the semantics of SQL. We presented a trans- 1983, pp. 198-208 lation from relational calculus into relational algebra, which was [JaKo84] M. Jarke, J. Koch: Query Optimisation in Databara Syr- a good foundation to study the processing of aggregate functions. terns. ACM Computing Surveys, June 1984, pp. 111-152 Finally, we developed a new processing strategy for aggregate [Jarke85] M. Jarke: Common Subexpresrion Isolation in Multiple functions. Query Optimization. In Query Processing in Databaw Syrtemr, W. Our results can be applied in the following areaz, thus forming Kim, D. Reiner, D. Batory, Eds. Springer, New York 1985, pp. 191- a foundation for more work in these directions: 205 [Kies85) W. Kiessling: On Semantic Raefa and Eficient Processing of l In [ANSI85], the semantics of an SQL query is defined by Correlation Queries with Aggregates. Proc. VLDB 1985, pp. 241-250 nested iteration evaluation: A subquery is completely eval- [Kim 821 W. Kim: On Optimizing an SQGlike Nested Query. ACM uated for each tuple of the outer query block. This has been TODS, Sept. 1982, pp. 443-469 directly implemented in System R for example [Seli79]. [Klug 821 A. Klug: Equivalence of Relational Algebra and Relational However, the direct implementation of this feature is in Calculus Query Languagea Having Aggregate Functions. Journal of the ACM, Vol. 29, No.3, July 1982, pp. 699-717 general inefficient [Kim 821. Therefore defining the seman- [LsViSS] C. Le Viet: Translation and Compatibility of SQL and QUEL tics of an SQL query by translation into relational algebra Queries. Journ. Inf. Proc., Vol. 8, No. 1, 1985, pp. 1-15 and relational calculus opens up new optimization oppor- [Maie83] D. Maier: The Theory of Relational Databases. Pitman, tunities: London, 1983 [Seli79] P.G Selinger et. al.: Accers Path Selection in a Relational - The optimizer can investigate the whole query and is Database System. Proc. ACM SIGMOD 1979, Boston, May 1979, pp no longer constrained to look at one subquery at a 23-34 time. - The optimizer can use the broad body of knowledge Grammar for the SQL Subset developed for the optimization of relational calculus and relational algebra queries (see [ JaKo85] for a sur- vey and further literature). < query> : := 8NUcT Proceedings of the 13th VLDB Conference, Brighton 1987 243