11. Table Operations – Implementation

Theo Härder, www.haerder.de
Realization of Database Systems – SS 2011, © 2011 AG DBIS

Goals
- Systematic development of relational processing concepts for a single table or for several tables
- Realization of plan operators

Main references:
- Theo Härder, Erhard Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung, Springer, 2001, Chapter 11.
- Goetz Graefe: Query Evaluation Techniques for Large Databases, ACM Computing Surveys 25:2, June 1993, pp. 73-170.

Table Operations – Implementation

Operations of the relational algebra
- Unary operations: π, σ
- Binary operations: ⋈, ×, ÷, ∪, ∩, –

SQL queries contain logical expressions that can be mapped to the operations of the relational algebra. These are further transformed into access plans. So-called plan operators implement these logical operations.

Plan operators on a single table
- Selection

Operators across several tables

Join algorithms
- Nested-loops join, sort-merge join
- Hash join (classic hashing, simple hash join, hybrid hash join)
- Exploitation of type-crossing access paths
- Distributed join algorithms

Further binary operations (set operations)

Plan Operators on a Single Table

Selection – general ways of evaluation
• Direct access via a given TID, via a hash method, or via a one- or
multi-dimensional index structure
• Sequential search in a table
• Search via an index structure (index table, bitlist)
• Selection using several pointer lists where more than a single index structure can be exploited
• Search via a multi-dimensional index structure

Projection
is typically performed in combination with sorting, selection, or join

Modification
• Updates are set-oriented in SQL, but restricted to a single table
• INSERT, DELETE and UPDATE are directly mapped to the corresponding operations of the storage structures
• “Automatic” execution of maintenance operations
  - to update access paths,
  - to guarantee clustering and reorganization, etc.
• Provisions for logging and recovery, etc.

Plan Operators for the Selection

Use of scan operators
• Definition of start and stop conditions
• Definition of simple search arguments

Plan operators
1. Table scan (relation scan)
   - Always possible
   - SCAN operator implements the selection operation
2. Index scan
   - Selection of the most cost-effective index
   - Specification of the search range (start and stop conditions)
3. k-d scan
   - Evaluation of multi-dimensional search criteria
   - Use of differing evaluation directions by navigation
4. TID algorithm
   - Evaluation of all “usable” index structures
   - Location of TID lists of variable lengths
   - Boolean connection of the lists
   - Access to the records according to the hit list (result list)

Further plan operators in combination with selection
• Sorting
• Grouping (see sort operator)
• Special operators, e.g.,
in data-warehouse applications for grouping and aggregation (CUBE operator)

Operators Across Several Tables

SQL allows complex queries across k tables
• One-variable expressions: describe conditions for the selection of elements from a table
• Two-variable expressions: describe conditions for the combination of elements from two tables
• Typically, k-variable expressions are decomposed into one- and two-variable expressions and evaluated by corresponding plan operators

Plan operators across several tables
• General ways for the evaluation:
  - Nested iteration
    for each element of the outer table To: traversal of the inner table Ti
    • O(No · Ni + No)
    • important application: nested-loops join
  - Merge method
    iterating traversals through T1, T2
    • O(N1 + N2)
    • additional sort costs, if necessary
    • important application: merge join
  - Hashing
    Partitioning of the inner table Ti and partition-wise loading into a hash table HT in memory. “Probing” by the outer table To or its partitions using HT: O(p · No + Ni)

Operators Across Several Tables (2)

n-way joins
• Decomposition into n-1 two-way joins (see footnote 2)
• Number of possible join sequences depends on the join attributes chosen
• At most n! different sequences possible
• Use of pipelining techniques
• Optimal evaluation sequence depends on
  - Plan operators
  - “Fitting” sort orders for join attributes
  - Size of operands, etc.

Some join sequences using two-way joins (n = 5)
[Figure: three join trees over T1, ..., T5, each delivering a result: left-deep tree, bushy tree, right-deep tree]

Analogous procedure in the case of set operations

Footnote 2:
Practicality test (Guy Lohman test for join techniques): Does a new technique apply to joining three inputs without interrupting data flow between the join operators?

Plan Operators for the Join

Join
• Record-type-spanning operation: usually very expensive
• Frequent use: important optimization candidate
• Typical application: equi-join
• General Θ-join is infrequent

The implementation of the join operation can process, at the same time, selections (and projections) on the participating tables R and S

    SELECT * FROM R, S
    WHERE R.JA Θ S.JA AND PR AND PS

• JA: join attribute
• PR and PS: predicates defined on selection attributes (SA) of R and S

Possible access paths
• Scans over R and S (always)
• Scans over IR(JA), IS(JA) (if present): deliver sort sequence according to JA
• Scans over IR(SA), IS(SA) (if present): if necessary, fast selection for PR and PS
• Scans over other index structures (if present): if necessary, faster location of all records

Nested-Loops Join

Assumptions
• Records in R and S are not ordered according to the join attributes
• Index structures IR(JA) and IS(JA) do not exist

Algorithm for Θ-join
    Scan over S,
    for each record s, if PS:
        scan over R,
        for each record r, if PR AND (r.JA Θ s.JA):
            execute join, i.e., write combined record (r, s) into the result set.

Complexity: O(N*M)

Nested-loops join using index access
    Scan over S,
    for each record s, if PS:
        determine via access to IR(JA) all TIDs for records satisfying r.JA = s.JA,
        for each TID:
            fetch record r, if PR:
                write combined record (r, s) into the result set.

Nested-block join
    Scan over S,
    for each page (or
    set of contiguous pages) of S:
        scan over R,
        for each page (or set of contiguous pages) of R:
            for each record s of the S-page, if PS:
                for each record r of the R-page, if PR AND (r.JA Θ s.JA):
                    write combined record (r, s) into the result set.

Sort-Merge Join

Algorithm consists of 2 phases
• Phase 1: Sorting of R and S w.r.t. R(JA) and S(JA) (if not already present); in doing so, early elimination of records not needed (PR, PS)
• Phase 2: Iterating scans over the sorted R- and S-records, where the join is performed in case of r.JA = s.JA

Complexity: O(N log N)

Special case
If either IR(JA) and IS(JA) or GAPS over R(JA) and S(JA) (join index) are present: exploitation of index structures on join attributes
    Iterating scans over IR(JA) and IS(JA):
    for each pair of keys from IR(JA) and IS(JA), if r.JA = s.JA:
        fetch the records using the related TIDs,
        if PR and PS:
            write combined record (r, s) into the result set

Hash Join

Simplest case (classic hashing)
• Step 1: Partitioned read of the (smaller) table R and construction of a hash table using hH(r(JA)) w.r.t. the values of R(JA) for partitions Ri (1 ≤ i ≤ p): each partition fits into the available memory and each record satisfies PR
• Step 2: Probing for records of S using PS; if successful, execution of the join
• Step 3: Repeat steps 1 and 2 until R is exhausted

Construction of hash tables and probing
Scan over R; building hash tables Hi (1 ≤ i ≤ p) one at a time in memory
[Figure: for each i = 1, ..., p, hash table Hi is built from R, followed by a scan over S that probes Hi]

Complexity: O(p · N)

Special case
R fits into memory: one partition (p = 1), a single scan over S is sufficient!
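
The three steps of classic hashing can be sketched in Python as follows. This is a minimal sketch, not the implementation from the lecture: the function name classic_hash_join, the parameter memory_limit (the assumed number of R-records that fit into memory) and the tuple layout of the records are hypothetical. The sketch covers equi-joins only and forms the partitions Ri simply in scan order.

```python
from collections import defaultdict


def classic_hash_join(R, S, ja_r, ja_s, memory_limit,
                      p_r=lambda r: True, p_s=lambda s: True):
    """Equi-join of R and S; returns (result pairs, number of scans over S).

    ja_r / ja_s extract the join attribute, p_r / p_s are the selection
    predicates PR and PS. memory_limit is an assumed capacity in records.
    S must be re-iterable (e.g. a list), since it is scanned once per partition.
    """
    result = []
    scans_over_s = 0
    table = defaultdict(list)   # in-memory hash table Hi for the current partition
    loaded = 0

    def probe():
        # Step 2: one scan over S, probing the current hash table.
        nonlocal scans_over_s
        scans_over_s += 1
        for s in S:
            if p_s(s):
                for r in table.get(ja_s(s), ()):
                    result.append((r, s))

    # Step 1: read R partition by partition, keeping only records satisfying PR.
    for r in R:
        if not p_r(r):
            continue
        table[ja_r(r)].append(r)
        loaded += 1
        if loaded == memory_limit:   # partition is full: probe, then start the next one
            probe()
            table.clear()
            loaded = 0
    if loaded:                       # Step 3: last, partially filled partition
        probe()
    return result, scans_over_s


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (3, "c"), (2, "d")]
    S = [(2, "x"), (3, "y"), (4, "z")]
    key = lambda t: t[0]
    pairs, scans = classic_hash_join(R, S, key, key, memory_limit=2)
    print(scans, sorted(pairs))
```

With memory_limit=2 the four R-records form p = 2 partitions, so S is scanned twice; with a limit of at least 4, p = 1 and a single scan over S suffices, which matches the special case noted above.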
Hash Join (2)

Partitioning of R with hp(r(JA))
[Figure: two histograms. Top: #records per JA value, JA ranging from 0 to 100. Bottom: #records per JA' value after applying hp(r(JA)), JA' ranging from 0 to 1, split at 0.33 and 0.66 into the partitions R1, R2, R3]

Hash Join (3)

Partitioning
• Partitioning of R into subsets R1, R2, ..., Rp: a record r of R is in Ri if h(r) is in Hi
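
The partitioning rule can be illustrated with a small Python sketch. Everything named here is an assumption for illustration: h_p is a hypothetical stand-in for the partitioning hash function (SHA-256 scaled to the interval [0, 1]), and the ranges Hi are assumed to be p equal-width intervals, as the cut points 0.33 and 0.66 suggest for p = 3.

```python
import hashlib


def h_p(ja_value):
    """Hypothetical h_p: map a join-attribute value to a number in [0, 1]."""
    digest = hashlib.sha256(repr(ja_value).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def partition(R, ja, p):
    """Split R into p subsets; record r goes to Ri if h_p(ja(r)) falls into range Hi."""
    parts = [[] for _ in range(p)]
    for r in R:
        # Equal-width ranges over [0, 1]; clamp so that a value of 1.0 stays in Rp.
        i = min(int(h_p(ja(r)) * p), p - 1)
        parts[i].append(r)
    return parts


if __name__ == "__main__":
    R = [(k, f"r{k}") for k in range(100)]
    for i, part in enumerate(partition(R, lambda r: r[0], 3), start=1):
        print(f"R{i}: {len(part)} records")
```

Because the partition depends only on the join-attribute value, records with equal JA values always land in the same subset, whatever their original distribution over the JA range.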
