Query Optimization

Carnegie Mellon Univ.
Dept. of Computer Science
15-415/615 - DB Applications
C. Faloutsos – A. Pavlo
Lecture#15: Query Optimization

Today’s Class
• History & Background
• Relational Algebra Equivalences
• Plan Cost Estimation
• Plan Enumeration

Query Optimization
• Remember that SQL is declarative.
  – User tells the DBMS what answer they want, not how to get the answer.
• There can be a big difference in performance based on which plan is used:
  – See last week: 5.7 days vs. 45 seconds

1970s – Relational Model
• Ted Codd saw the maintenance overhead for IMS/Codasyl.
• Proposed a database abstraction based on relations:
  – Store database in simple data structures.
  – Access it through a high-level language.
  – Physical storage left up to implementation.

IBM System R
• Skunkworks project at IBM Research in San Jose to implement Codd’s ideas.
• Had to figure out by themselves all of the things that we are discussing in this course.
• IBM never commercialized System R.

IBM System R
• First implementation of a query optimizer.
• People argued that the DBMS could never choose a query plan better than what a human could write.
• A lot of the concepts from System R’s optimizer are still used today.

Today’s Class
• History & Background
• Relational Algebra Equivalences
• Plan Cost Estimation
• Plan Enumeration
• Nested Sub-queries

Relational Algebra Equivalences
• A query can be expressed in different ways.
• The optimizer considers variations and chooses the one with the lowest cost.
• How do we know whether two queries are equivalent?

Relational Algebra Equivalences
• Two relational algebra expressions are equivalent if they generate the same set of tuples.

Predicate Pushdown
SELECT name, cid
  FROM student, enrolled
 WHERE student.sid = enrolled.sid
   AND enrolled.grade = 'A'
[Figure: plan tree before and after pushing σ grade='A' below the join student ⨝ sid=sid enrolled]

Relational Algebra Equivalences
For the same query:
$\pi_{name,cid}(\sigma_{grade='A'}(student \bowtie enrolled)) = \pi_{name,cid}(student \bowtie \sigma_{grade='A'}(enrolled))$

Relational Algebra Equivalences
• Selections:
  – Perform them early
  – Break a complex predicate, and push down:
    $\sigma_{p_1 \wedge p_2 \wedge \cdots \wedge p_n}(R) = \sigma_{p_1}(\sigma_{p_2}(\cdots \sigma_{p_n}(R) \cdots))$
• Simplify a complex predicate
  – (X=Y AND Y=3) → X=3 AND Y=3

Relational Algebra Equivalences
• Projections:
  – Perform them early
    • Smaller tuples
    • Fewer tuples (if duplicates are eliminated)
  – Project out all attributes except the ones requested or required (e.g., joining attr.)
• This is not important for a column store…

Projection Pushdown
Same query as above, now pushing projections down as well:
[Figure: plan tree after projection pushdown, $\pi_{name,cid}(\pi_{sid,name}(student) \bowtie_{sid=sid} \pi_{sid,cid}(\sigma_{grade='A'}(enrolled)))$]

Relational Algebra Equivalences
• Joins:
  – Commutative, associative
    $R \bowtie S = S \bowtie R$
    $(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$
• Q: How many different orderings are there for an n-way join?
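
The pushdown equivalences above can be checked on a tiny example. The sketch below is a minimal Python model that assumes set semantics: a relation is a set of rows, and each row is a frozenset of (attribute, value) pairs, so "generate the same set of tuples" becomes a plain == comparison. The helper names `rel`, `select`, `project`, and `join` and all sample rows are hypothetical and not part of the lecture. It evaluates the original plan, the plan with σ grade='A' pushed below the join, and the plan with projections pushed down too, then checks predicate splitting and join commutativity.

```python
from collections.abc import Callable

# Hypothetical toy model: a relation is a set of rows; each row is a frozenset
# of (attribute, value) pairs. Set semantics makes equivalence a == check and
# makes projection eliminate duplicates automatically.


def rel(*rows: dict) -> set:
    """Build a relation from dicts such as {"sid": 1, "name": "alice"}."""
    return {frozenset(row.items()) for row in rows}


def select(pred: Callable[[dict], bool], r: set) -> set:
    """sigma_pred(r): keep the rows that satisfy pred."""
    return {t for t in r if pred(dict(t))}


def project(attrs: tuple, r: set) -> set:
    """pi_attrs(r): keep only the listed attributes (duplicates collapse)."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def join(r: set, s: set, attr: str) -> set:
    """Equi-join r and s on one shared attribute, using nested loops."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if dt[attr] == du[attr]:
                out.add(frozenset({**dt, **du}.items()))
    return out


# Hypothetical sample data (not from the lecture).
student = rel({"sid": 1, "name": "alice"}, {"sid": 2, "name": "bob"},
              {"sid": 3, "name": "carol"})
enrolled = rel({"sid": 1, "cid": "15-415", "grade": "A"},
               {"sid": 1, "cid": "15-721", "grade": "B"},
               {"sid": 2, "cid": "15-415", "grade": "A"},
               {"sid": 3, "cid": "15-445", "grade": "C"})


def is_a(t: dict) -> bool:
    return t["grade"] == "A"


def in_415(t: dict) -> bool:
    return t["cid"] == "15-415"


# pi_{name,cid}(sigma_{grade='A'}(student JOIN enrolled))
original = project(("name", "cid"), select(is_a, join(student, enrolled, "sid")))
# Predicate pushdown: filter enrolled before the join.
pushed = project(("name", "cid"), join(student, select(is_a, enrolled), "sid"))
# Projection pushdown too: keep only the join attribute plus needed columns.
both = project(("name", "cid"),
               join(project(("sid", "name"), student),
                    project(("sid", "cid"), select(is_a, enrolled)), "sid"))
assert original == pushed == both

# Splitting a conjunctive predicate: sigma_{p1 AND p2}(R) = sigma_{p1}(sigma_{p2}(R)).
assert select(lambda t: is_a(t) and in_415(t), enrolled) == select(is_a, select(in_415, enrolled))

# Join commutativity (attribute order does not matter in this model).
assert join(student, enrolled, "sid") == join(enrolled, student, "sid")
```

The toy model only checks that the results match; on real tables the pushed-down plans pay off because the join processes fewer and narrower tuples.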

Relational Algebra Equivalences
• Joins: How many different orderings are there for an n-way join?
• A: Catalan number ~ $4^n$
  – Exhaustive enumeration: too slow.
• We’ll see in a second how an optimizer limits the search space...

Today’s Class
• History & Background
• Relational Algebra Equivalences
• Plan Cost Estimation
• Plan Enumeration

Cost Estimation
• How long will a query take?
  – CPU: Small cost; tough to estimate
  – Disk: # of block transfers
  – Memory: Amount of DRAM used
  – Network: # of messages
• How many tuples will be read/written?
• What statistics do we need to keep?

Cost Estimation – Statistics
• For each relation R we keep:
  – $N_R$ → # tuples
  – $S_R$ → size of tuple in bytes
  – $V(A,R)$ → # of distinct values of attribute ‘A’
[Figure: relation R drawn as tuples #1, #2, #3, …, #$N_R$, each $S_R$ bytes wide]

Derivable Statistics
• $F_R$ → max # records/block
• $B_R$ → # blocks
• $SC(A,R)$ → selection cardinality: avg # of records with A = given value
[Figure: relation R stored as blocks #1, #2, #3, …, #$B_R$, each holding up to $F_R$ records of $S_R$ bytes]

Derivable Statistics
• $SC(A,R)$ → selection cardinality: avg # of records with A = given value, $SC(A,R) = N_R / V(A,R)$
• Note that this assumes data uniformity
  – 10,000 students, 10 colleges – how many students in SCS?

Additional Statistics
• For index i:
  – $F_i$ → average fanout (~50-100)
  – $HT_i$ → # levels of index i (~2-3), $HT_i \approx \log(\#entries) / \log(F_i)$
  – $LB_i$ → # blocks at leaf level

Statistics
• Where do we store them?
• How often do we update them?
• Manual invocations:
  – Postgres/SQLite: ANALYZE
  – MySQL: ANALYZE TABLE

Selection Statistics
• We saw simple predicates (name=“Kayne”)
• How about more complex predicates, like
  – salary > 10000
  – age=30 AND jobTitle=“Costermonger”
• What is their selectivity?

Selections – Complex Predicates
• Selectivity sel(P) of predicate P = fraction of tuples that qualify
• Formula depends on type of predicate.
  – Equality
  – Range
  – Negation
  – Conjunction
  – Disjunction

Selections – Complex Predicates
• Assume that V(rating, sailors) has 5 distinct values (0–4) and $N_R$ = 5
• Equality Predicate: A=constant
  – $sel(A = constant) = SC(P) / N_R = 1 / V(A,R)$
  – Example: sel(rating=‘2’) = 1/5
[Figure: histogram of count per rating value 0–4, with V(rating,R) = 5 and SC(rating=‘2’) = 1]
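
The join-ordering answer and the derivable statistics can be made concrete with a few lines of Python. This is a sketch under stated assumptions: counting join trees as Catalan(n - 1) tree shapes times n! placements of the relations at the leaves is one common convention (the Catalan factor is what grows like ~4^n), the 4096-byte block and 100-byte tuple are hypothetical, and every function name is illustrative. Under the uniformity assumption, SC(A,R) = N_R / V(A,R), so the SCS question works out to 10,000 / 10 = 1,000 students.

```python
import math

# Join orderings: Catalan(n - 1) counts binary tree shapes with n leaves;
# multiplying by n! counts every placement of the n relations at the leaves
# (so R JOIN S and S JOIN R are counted as different orderings).


def catalan(n: int) -> int:
    """n-th Catalan number, C(2n, n) / (n + 1)."""
    return math.comb(2 * n, n) // (n + 1)


def join_trees(n: int) -> int:
    """Number of binary join trees over n distinct relations."""
    return catalan(n - 1) * math.factorial(n)


# Derivable statistics (hypothetical helper names).
def records_per_block(block_bytes: int, s_r: int) -> int:
    """F_R: max # records per block, assuming records never span blocks."""
    return block_bytes // s_r


def num_blocks(n_r: int, f_r: int) -> int:
    """B_R: # blocks needed for N_R tuples at F_R tuples per block (ceiling)."""
    return (n_r + f_r - 1) // f_r


def selection_cardinality(n_r: int, v_a_r: int) -> float:
    """SC(A,R) = N_R / V(A,R), assuming values of A are spread uniformly."""
    return n_r / v_a_r


def sel_equality(n_r: int, v_a_r: int) -> float:
    """sel(A = constant) = SC(A,R) / N_R, i.e. the fraction that qualifies."""
    return selection_cardinality(n_r, v_a_r) / n_r


def index_height(entries: int, fanout: int) -> int:
    """HT_i: smallest h with fanout**h >= entries (assumes fanout > 1),
    i.e. log(#entries) / log(F_i) rounded up, computed without floats."""
    height, reach = 0, 1
    while reach < entries:
        reach *= fanout
        height += 1
    return height


print(join_trees(3), selection_cardinality(10_000, 10), index_height(1_000_000, 100))
# prints: 12 1000.0 3
print(records_per_block(4096, 100), num_blocks(10_000, 40), sel_equality(5, 5))
# prints: 40 250 0.2
```

The index height is computed with integer arithmetic because taking the ceiling of a floating-point log ratio can be off by one when the ratio is an exact integer.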

Selections – Complex Predicates
• Range Query:
  – $sel(A > a) = (A_{max} - a) / (A_{max} - A_{min})$
  – Example: sel(rating >= ‘2’) = (4 – 2) / (4 – 0) = 1/2, with rating_min = 0 and rating_max = 4
[Figure: histogram of count per rating value 0–4]

Selections – Complex Predicates
• Negation Query
  – $sel(\neg P) = 1 - sel(P)$
  – Example: sel(rating != ‘2’) = 1 – (1/5) = 4/5
• Observation: selectivity ≈ probability
[Figure: histogram of count per rating value 0–4; the four tuples with rating != ‘2’ qualify]

Selections – Complex Predicates
• Conjunction:
  – sel(rating = ‘2’ AND name LIKE ‘C%’)
  – $sel(P_1 \wedge P_2) = sel(P_1) \cdot sel(P_2)$
  – INDEPENDENCE ASSUMPTION
[Figure: Venn diagram of P1 and P2]

Selections – Complex Predicates
• Disjunction:
  – sel(rating = ‘2’ OR name LIKE ‘C%’)
  – $sel(P_1 \vee P_2) = sel(P_1) + sel(P_2) - sel(P_1 \wedge P_2) = sel(P_1) + sel(P_2) - sel(P_1) \cdot sel(P_2)$
  – INDEPENDENCE ASSUMPTION, again
[Figure: Venn diagram of P1 and P2]

Selections – Complex Predicates
• Disjunction, in general:
  – $sel(P_1 \vee P_2 \vee \cdots \vee P_n) = 1 - (1 - sel(P_1)) \cdot (1 - sel(P_2)) \cdots (1 - sel(P_n))$

Joins
• Q: Given a join of R and S, what is the range of possible result sizes in # of tuples?

Result Size Estimation for Joins
• General case: $R_{cols} \cap S_{cols} = \{A\}$ where A is not a key for either table.
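
The selectivity formulas for equality, range, negation, conjunction, and disjunction predicates translate directly into code. The sketch below keeps the slides' uniformity and independence assumptions; the function names are hypothetical, and the 0.5 estimate for name LIKE 'C%' is invented purely for illustration. It also compares each estimate with the true fraction in the five-sailor example (N_R = 5 and V(rating) = 5 over 0–4 means exactly one sailor per rating), which shows that the range formula is only an approximation: it treats rating as continuous, so it estimates sel(rating >= '2') as 1/2 while 3 of the 5 sailors actually qualify.

```python
import math

# Selectivity estimators (hypothetical names). All assume uniformly
# distributed values; AND/OR additionally assume independent predicates.


def sel_eq(v_a_r: int) -> float:
    """sel(A = constant) = SC(A,R) / N_R = 1 / V(A,R)."""
    return 1.0 / v_a_r


def sel_gt(a: float, a_min: float, a_max: float) -> float:
    """sel(A > a) = (A_max - a) / (A_max - A_min), clamped to [0, 1]."""
    if a_max == a_min:
        # Degenerate domain (an added assumption, not on the slides):
        # every tuple has A = a_min, so A > a holds for all tuples or none.
        return 1.0 if a_min > a else 0.0
    return min(1.0, max(0.0, (a_max - a) / (a_max - a_min)))


def sel_not(s: float) -> float:
    """sel(not P) = 1 - sel(P)."""
    return 1.0 - s


def sel_and(*sels: float) -> float:
    """sel(P1 AND ... AND Pn) = product of sel(Pi), by independence."""
    return math.prod(sels)


def sel_or(*sels: float) -> float:
    """sel(P1 OR ... OR Pn) = 1 - product of (1 - sel(Pi)), by independence."""
    return 1.0 - math.prod(1.0 - s for s in sels)


# Five sailors, V(rating, sailors) = 5 over 0..4: one sailor per rating value.
ratings = [0, 1, 2, 3, 4]


def actual(pred) -> float:
    """True fraction of sailors whose rating satisfies pred."""
    return sum(1 for r in ratings if pred(r)) / len(ratings)


sel_rating_2 = sel_eq(5)   # 1/5
sel_like_c = 0.5           # hypothetical estimate for name LIKE 'C%'
print(sel_rating_2, actual(lambda r: r == 2))           # 0.2 0.2
print(sel_gt(2, 0, 4), actual(lambda r: r >= 2))        # 0.5 0.6
print(sel_not(sel_rating_2), actual(lambda r: r != 2))  # 0.8 0.8
print(sel_and(sel_rating_2, sel_like_c), sel_or(sel_rating_2, sel_like_c))  # 0.1 0.6
```

For two predicates, `sel_or` reduces to sel(P1) + sel(P2) - sel(P1) · sel(P2), matching the two-predicate disjunction formula; the n-ary form avoids writing out the inclusion-exclusion terms.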
