11. Table Operations – Implementation

Realization of DBS
Theo Härder, www.haerder.de
Realization of Database Systems – SS 2011, © 2011 AG DBIS

Goals
- Systematic development of relational processing concepts for a single table or for several tables
- Realization of plan operators

Main references:
- Theo Härder, Erhard Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung, Springer, 2001, Chapter 11.
- Goetz Graefe: Query Evaluation Techniques for Large Databases, ACM Computing Surveys 25:2, June 1993, pp. 73-170.

Table Operations – Implementation

Operations of the relational algebra
- Unary operations: π, σ
- Binary operations: ⋈, ×, ÷, ∩, ∪, –

SQL queries contain logical expressions which can be mapped to the operations of the relational algebra. They are further transformed into access plans. So-called plan operators implement these logical operations.

Plan operators on a single table
- Selection

Operators across several tables

Join algorithms
- Nested-loops join, sort-merge join
- Hash join (classic hashing, simple hash join, hybrid hash join)
- Exploitation of type-crossing access paths
- Distributed join algorithms

Further binary operations (set operations)

Plan Operators on a Single Table

Selection – general ways of evaluation
• Direct access via a given TID, via a hash method, or via a one- or multi-dimensional index structure
• Sequential search in a table
• Search via an index structure (index table, bitlist)
• Selection using several pointer lists where more than a single index structure can be exploited
• Search via a multi-dimensional index structure

Projection
is typically performed in combination with sorting, selection, or join

Modification
• Updates are set-oriented in SQL, but restricted to a single table
• INSERT, DELETE and UPDATE are directly mapped to the corresponding operations of the storage structures
• "Automatic" execution of maintenance operations
  - to update access paths,
  - to guarantee clustering and reorganization etc.
• Provisions for logging and recovery etc.

Plan Operators for the Selection

Use of scan operators
• Definition of start and stop conditions
• Definition of simple search arguments

Plan operators
1. Table scan (relation scan)
   - Always possible
   - SCAN operator implements the selection operation
2. Index scan
   - Selection of the most cost-effective index
   - Specification of the search range (start and stop conditions)
3. k-d scan
   - Evaluation of multi-dimensional search criteria
   - Use of differing evaluation directions by navigation
4. TID algorithm
   - Evaluation of all "usable" index structures
   - Location of TID lists of variable lengths
   - Boolean combination of the lists
   - Access to the records according to the hit list (result list)

Further plan operators in combination with selection
• Sorting
• Grouping (see sort operator)
• Special operators, e.g., in data-warehouse applications for grouping and aggregation (CUBE operator)

Operators Across Several Tables

SQL allows complex queries across k tables
• One-variable expressions: describe conditions for the selection of elements from a table
• Two-variable expressions: describe conditions for the combination of elements from two tables
• Typically, k-variable expressions are decomposed into one- and two-variable expressions and evaluated by corresponding plan operators

Plan operators across several tables
• General ways for the evaluation:
  - Nested iteration
    for each element of the outer table T_o: traversal of the inner table T_i
    • O(N_o · N_i + N_o)
    • important application: nested-loops join
  - Merge method
    iterating traversals through T_1, T_2
    • O(N_1 + N_2)
    • additional sort costs, if necessary
    • important application: merge join
  - Hashing
    Partitioning of the inner table T_i and partition-wise loading into a hash table HT in memory. "Probing" by the outer table T_o or its partitions, respectively, using HT: O(p · N_o + N_i)

Operators Across Several Tables (2)

n-way joins
• Decomposition into n-1 two-way joins
• Number of possible join sequences is dependent on the join attributes chosen
• At most n! different sequences possible
• Use of pipelining techniques
• Optimal evaluation sequence dependent on
  - Plan operators
  - "Fitting" sort orders for join attributes
  - Size of operands etc.

Some join sequences using two-way joins (n=5)
[Figure: three join trees over T1, ..., T5, each producing a result: left-deep tree, bushy tree, right-deep tree]

Analogous procedure in the case of set operations
Practicality test (Guy Lohman test for join techniques) for the decomposition into two-way joins: Does a new technique apply to joining three inputs without interrupting data flow between the join operators?

Plan Operators for the Join

Join
• Record-type-spanning operation: usually very expensive
• Frequent use: important optimization candidate
• Typical application: equi-join
• General Θ-join infrequent

The implementation of the join operation can process, at the same time, selections (and projections) on the participating tables R and S

    SELECT * FROM R, S
    WHERE R.JA Θ S.JA AND P_R AND P_S

• JA: join attribute
• P_R and P_S: predicates defined on selection attributes (SA) of R and S

Possible access paths
• Scans over R and S (always)
• Scans over I_R(JA), I_S(JA) (if present): deliver sort sequence according to JA
• Scans over I_R(SA), I_S(SA) (if present): if necessary, fast selection for P_R and P_S
• Scans over other index structures (if present): if necessary, faster location of all records

Nested-Loops Join

Assumptions
• Records in R and S are not ordered according to join attributes
• Index structures I_R(JA) and I_S(JA) do not exist

Algorithm for Θ-join
    Scan over S,
      for each record s, if P_S:
        scan over R,
          for each record r, if P_R AND (r.JA Θ s.JA):
            execute join, i.e., write combined record (r, s) into the result set.

Complexity: O(N*M)

Nested-loops join using index access
    Scan over S,
      for each record s, if P_S:
        determine via access to I_R(JA) all TIDs for records satisfying r.JA = s.JA,
        for each TID:
          fetch record r, if P_R:
            write combined record (r, s) into the result set.

Nested-block join
    Scan over S,
      for each page (or set of contiguous pages) of S:
        scan over R,
          for each page (or set of contiguous pages) of R:
            for each record s of the S-page, if P_S:
              for each record r of the R-page, if P_R AND (r.JA Θ s.JA):
                write combined record (r, s) into the result set.

Sort-Merge Join

Algorithm consists of 2 phases
• Phase 1: Sorting of R and S w.r.t. R(JA) and S(JA) (if not already sorted); in doing so, early elimination of records not needed (P_R, P_S)
• Phase 2: Iterating scans over the sorted R- and S-records, where the join is performed in case of r.JA = s.JA

Complexity: O(N log N)

Special case
If either I_R(JA) and I_S(JA) or GAPS over R(JA) and S(JA) (join index) is present: exploitation of index structures on join attributes
    Iterating scans over I_R(JA) and I_S(JA):
      for every two keys taken from I_R(JA) and I_S(JA), if r.JA = s.JA:
        fetch the records using the related TIDs,
        if P_R and P_S: write combined record (r, s) into the result set

Hash Join

Simplest case (classic hashing)
• Step 1: Partitioned read of the (smaller) table R and construction of a hash table using h_H(r(JA)) w.r.t. values of R(JA) of partitions R_i (1 ≤ i ≤ p): each partition fits into the available memory and each record satisfies P_R
• Step 2: Probing for records of S using P_S; if successful, execution of join
• Step 3: Repeat steps 1 and 2 until R is exhausted

Construction of hash tables and probing
Scan over R; building hash tables H_i (1 ≤ i ≤ p) one at a time in memory
[Figure: each partition of R is loaded into a hash table H_1, ..., H_p in turn; for each hash table, a scan over S probes it]

Complexity: O(p · N)

Special case
R fits into memory: one partition (p = 1); a single scan over S is sufficient!
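
The nested-loops, sort-merge, and classic hash join described above can be compared in a minimal Python sketch. Several things here are assumptions made for illustration and are not taken from the slides: records are plain dictionaries, only the equi-join case (Θ is "=") is covered, the predicates P_R and P_S are passed as optional functions, and the partitions of R are formed with Python's built-in hash() modulo p. All function and parameter names are hypothetical.

```python
from collections import defaultdict


def always(_record):
    """Default predicate: every record qualifies."""
    return True


def nested_loops_join(R, S, ja, p_r=always, p_s=always):
    """Scan over S; for each qualifying s, scan over R: O(N*M) comparisons."""
    result = []
    for s in S:
        if not p_s(s):
            continue
        for r in R:
            if p_r(r) and r[ja] == s[ja]:
                result.append((r, s))
    return result


def sort_merge_join(R, S, ja, p_r=always, p_s=always):
    """Phase 1: filter and sort both inputs on JA. Phase 2: merge the sorted runs."""
    rs = sorted((r for r in R if p_r(r)), key=lambda r: r[ja])
    ss = sorted((s for s in S if p_s(s)), key=lambda s: s[ja])
    result = []
    i = j = 0
    while i < len(rs) and j < len(ss):
        if rs[i][ja] < ss[j][ja]:
            i += 1
        elif rs[i][ja] > ss[j][ja]:
            j += 1
        else:
            # Equal keys: find the group of duplicates on both sides
            # and emit every combination.
            key = rs[i][ja]
            i_end = i
            while i_end < len(rs) and rs[i_end][ja] == key:
                i_end += 1
            j_end = j
            while j_end < len(ss) and ss[j_end][ja] == key:
                j_end += 1
            for r in rs[i:i_end]:
                for s in ss[j:j_end]:
                    result.append((r, s))
            i, j = i_end, j_end
    return result


def classic_hash_join(R, S, ja, p=1, p_r=always, p_s=always):
    """Build one in-memory hash table per partition of R; scan S once per partition."""
    result = []
    for i in range(p):
        # Step 1: load partition i of R into a hash table keyed by JA.
        table = defaultdict(list)
        for r in R:
            if p_r(r) and hash(r[ja]) % p == i:
                table[r[ja]].append(r)
        # Step 2: probe the hash table with every qualifying record of S.
        for s in S:
            if p_s(s):
                for r in table.get(s[ja], ()):
                    result.append((r, s))
    return result
```

With p = 1 the hash join reads S only once, which corresponds to the special case in which R fits into memory. The three functions return the same set of record pairs, although the order of the pairs may differ.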
Hash Join (2)

Partitioning of R with h_p(r(JA))
[Figure: distribution of #records per JA value (JA from 0 to 100), mapped by h_p(r(JA)) to #records per JA' value (JA' from 0 to 1); the JA' range is split at 0.33 and 0.66 into the partitions R_1, R_2, R_3]

Hash Join (3)

Partitioning
• Partitioning of R into subsets R_1, R_2, ..., R_p: a record r of R is in R_i, if h(r) is in H_i
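
The partitioning rule above (r is in R_i if h(r) is in H_i) can be illustrated with a small Python sketch. The concrete choices are assumptions made for illustration only: h_p is a multiplicative hash that maps an integer JA value to a JA' value in [0, 1), and H_i is taken to be the i-th of p equal-width sub-ranges of [0, 1), which for p = 3 gives the boundaries 0.33 and 0.66 shown in the figure. The slides do not specify the hash function actually used.

```python
def h_p(value):
    """Hypothetical partitioning hash: maps an integer JA value to JA' in [0, 1)."""
    # Multiplicative hashing with a fixed odd constant (an assumed choice).
    return ((value * 2654435761) % 2**32) / 2**32


def partition(R, ja, p):
    """Split R into p partitions; record r goes to the partition whose range contains h_p(r[ja])."""
    parts = [[] for _ in range(p)]
    for r in R:
        # H_i is assumed to be the half-open range [i/p, (i+1)/p), with i counted from 0.
        i = min(int(h_p(r[ja]) * p), p - 1)
        parts[i].append(r)
    return parts
```

Because the partition depends only on the JA value, all records of R with the same join attribute value end up in the same partition, so each partition can be joined independently of the others. How evenly the partitions are filled depends on the hash function and on the data.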
