Arxiv:1208.0084V1 [Cs.DB] 1 Aug 2012 Functional Dependencies

FundamentaAB CD EFdeF DependencieB Jaroslaw SAlBCDEaF Parke GodfreyF Jarek GryA York UnBversBEyF CompuEer SCBenCe & ngBneerBngF ToronEoF Canada and IBM CenEer for AdvanCed SEudBesF ToronEoF Canada '(sAlBCDEF godfreyF (arek)*Cse.yorku.Ca ABSTRACT query1s author could not leave DEABteB out of the group,by 2 Dependencies have played a significant role in database design for even if he realizes it would be better to 2 because it is stated in the many years. They have also been shown to be useful in query select. The query optimizer could, however, use an index scan optimization. In this paper, we discuss dependencies between to have the tuple stream in yeAB, Fonth order to accomplish the lexicographically ordered sets of tuples. We introduce formally gBoEp by on yeAB, DEABteB, Fonth, AD it recognizes that the concept of order dependency and present a set of axioms yeAB, Fonth and yeAB, DEABteB, Fonth offer the same (inference rules) for them. We show how query rewrites based on partition. This is done by query optimizers today 2 given the these axioms can be used for query optimization. We present functional dependency /D) information that Fonth 3 several interesting theorems that can be derived using the DEABteB is available to the optimizer 2 by rewrite. inference rules. We prove that functional dependencies are subsumed by order dependencies and that our set of axioms for /or the query above, the rewrite might still not be applied, since order dependencies is sound and complete. the query specifies the answers to be ordered by yeAB, quarter, Fonth. The /D that Fonth 3 DEABteB is noF 1. ABTRCDECTACB logically sufficient to eliminate DEABteB from the order,by, as it Consider the following SQL query (in Example 1). was to eliminate it from the group,by. Since a query plan must guarantee the order,by, it li-ely will include a sort operator for EXAMPLE 1. yeAB, DEABteB, Fonth, after all. select D.yeABC D.DEABteBC D.FonthC sEF(S.sAles) As totAl To see that the /D does not suffice to eliminate DEABteB from fBoF DAtes DC SAles S the order,by, imagine the values for DEABteB were the strings wheBe D.dAte_id = S.dAte_id DArCF, Cecond, FhArd, and DourFh. Data would be lexicographically And D.yeAB between 2001 And 200 ordered as DArCF, DourFh, Cecond, then FhArd4 .f course, we intend gBoEp by D.yeABC D.quarterC D.Fonth that values of DEABteB are, say, 1, 2, 3, and 7, so the data would oBdeB by D.yeABC D.quarterC D.Fonth order naturally as by date. It is unfortunate, then, that DEABteB is, in fact, redundant in this query) in the order,by also, but that In the schema, DAtes is a dABenCAon table with a row per day, the optimizer does not have the means to eliminate it. and SAles is a very large DEcF table recording all individual sales. Each has a surrogate,valued column dAte_id, which is the What is missing is the semantic information that Fonth orderC primary -ey for DAtes. In the DAtes dimension table, each row DEABteB, which is more than 8ust that Fonth DuncFAonElly describes a given day with explicit columns as yeAB, DEABteB, deFerBAneC DEABteB. This states that as values rACe from one Fonth, and dAy that describe the natural date values and tuple to another on Fonth, they must rACe, or stay the same, from additional columns that qualify that day, such as whether it is a the one tuple to the other on DEABteB that is, the values do not wee-end day or holiday). descend from the one tuple to the other on DEABteB). These have been called order dependencAeC .Ds), in contrast to Assume we have a tree index for DAtes on yeAB, Fonth, dAy. arXiv:1208.0084v1 [cs.DB] 1 Aug 2012 functional dependencies. .ur ob8ective is to bring reasoning about This index cannot help in a query plan, however, to accomplish order dependencies into the query optimizer. A query plan for the the group,by because DEABteB intercedes. .f course, DEABteB query above could then eliminate DEABteB from boFh the order, is logically redundant here, as Fonth which follows it in the by and the group,by clauses, and the index on yeAB, Fonth, group,by) functionally determines DEABteB. /irst quarter dAy might then provide for an efficient plan with no need for a encompasses the months of 0anuary, /ebruary, and March, second sort operator. quarter, the months of April, May, and 0une, and so forth.) The The notion of order dependencies can be greatly generalized, and the potential use of them in query optimization shown to be vast. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are The relationships between ordered sets have been explored in the not made or distributed for profit or commercial advantage and that copies past and several different notions of order have been considered. bear this notice and the full citation on the first page. To copy otherwise, to In this wor-, we consider 8ust lexAcogrEphAcEl ordering of tuples, republish, to post on servers or to redistribute to lists, requires prior specific as by the order,by operator in S$L, because this is the notion of permission and/or a fee. Articles from this volume were invited to present order used in S$L and within query optimization for tuple their results at The 38th International Conference on Very Large Data Bases, streams. August 27th - 31st 2012, Istanbul, Turkey. Proceedings of the VLDB Endowment, Vol. 5, No. 11 Copyright 2012 VLDB Endowment 2150-8097/12/07... $ 10.00. 1220 The contribution of this paper is to present an ExAoBEFAzEFAon for could be changed to BulFA-CeFC easily, with no consequences to our order dependencies, analogous to Armstrong1s axiomatization for axiomatization. /Ds 91:. This provides a formal framewor- for reasoning about .Ds. There are two reasons for one to pursue an axiomatization. Ta le 1. Botational conventions. 1. The axioms provide AnCAghF into how dependencies RelEFAonC behave 2 and patterns for how dependencies logically A capital letter in bold italics represents a relEFAonA R. follow from others 2 that are not easily evident A small letter in bold represent a relEFAonEl AnCFEnce reasoning from first principles. E FEble)A r. 2. A Cound and coBpleFe axiomatization is the first We use capital letters to represent single EFFrAbuFeCA necessary step to designing an efficient AnDerence A,B,C. TupleC are mar-ed with small letters in italicsA C, F. procedure. SeFC .ur axioms for .Ds help us explore beneficial query rewrites. Calligraphic letters stand for CeFC oD EFFrAbuFeA F, , . We show how they can be cast as a new type of integrity We use proximity for unAon of setsA F is shorthand constraint to be used in query optimization. We derive theorems for F. Li-ewise, AF or FA, where F is a set of based on our axioms, which illustrate surprising inferences and attributes and A a single attribute, stands for F {A. equivalences over .Ds, and which can provide for powerful query rewrites. While .Ds for databases have been explored before, we Also F denotes the pro8ection of the tuple F on the present the first general axiomatization for them. We prove the attributes of F, while is the shorthand for . CoundneCC of the axioms. We demonstrate that Armstrong1s LACFC axiomatization for /Ds is subsumed within our axiomatization for .Ds. In this sense, .Ds can be thought of as a generalization of Bold letters stand for lACFC of attributesA , , . Note list /Ds.) We then prove the coBpleFeneCC of the set of axioms. X could be the empty list, 9:. Wor-ing with .Ds is more involved than with /Ds because the We use square brac-ets to denote a listA 9A, B, C:. The order of the attributes matters. Thus, we must wor- with lACFC of notation 9A B T: denotes that A is the heEd of the list, and attributes instead of with CeFC. This necessarily complicates our T is the FEAl of the list, the remaining list with the first axioms 2 compared with Armstrong1s axioms for /Ds 2 and the element removed. proofs of our theorems. Proximity is used for concatenation of lists of attributesA is shorthand for . Li-ewise, A and A stands CFtline. In Section 2, we present .Ds) formally. We provide respectively for [A] and [A], where is list of bac-ground, our notational conventions, and definitions for .Ds attributes and A is a single attribute. AB denotes [A, B]. Section 2.1). We show from where .Ds in databases naturally denotes some other permutation of elements of list . arise Section 2.2). We demonstrate a number of effective ways .Ds may be used in query optimization Section 2.3). We discuss DE/INITI.N 1. operator ) Let X be a list of attributes, C and F be a query optimization technique with .Ds that we have two tuples in relation instance r. .perator is defined as followsA implemented as a prototype in IBM DB2 918:, and our ongoing wor- with these techniques. In Section 3, we introduce the where X $ 9A B T: axiomatization for .Ds Section 3.1), and we prove the soundness if ! ) of the axioms Section 3.2). We derive a collection of theorems or if " ) and T C 9: or # #)) using our axioms 2 which we use in the proof of completeness 2 which illustrate the utility of our axioms Section 3.3). In Section In this paper, we assume ECcendAng Asc) order in the 7, we prove the completeness of the axiomatization.

Load more