Query Optimization
Total Page:16
File Type:pdf, Size:1020Kb
CMU SCS CMU SCS Today’s Class Carnegie Mellon Univ. • History & Background Dept. of Computer Science • Relational Algebra Equivalences 15-415/615 - DB Applications • Plan Cost Estimation • Plan Enumeration C. Faloutsos – A. Pavlo Lecture#15: Query Optimization Faloutsos/Pavlo CMU SCS 15-415/615 2 CMU SCS CMU SCS Query Optimization 1970s – Relational Model • Remember that SQL is declarative. • Ted Codd saw the maintenance – User tells the DBMS what answer they want, overhead for IMS/Codasyl. not how to get the answer. • Proposed database abstraction based • There can be a big difference in on relations: Codd performance based on plan is used: – Store database in simple data structures. – See last week: 5.7 days vs. 45 seconds – Access it through high-level language. – Physical storage left up to implementation. Faloutsos/Pavlo CMU SCS 15-415/615 3 Faloutsos/Pavlo CMU SCS 15-415/615 4 CMU SCS CMU SCS IBM System R IBM System R • Skunkworks project at IBM Research in • First implementation of a query optimizer. San Jose to implement Codd’s ideas. • People argued that the DBMS could never • Had to figure out all of the things that we choose a query plan better than what a are discussing in this course themselves. human could write. • IBM never commercialized System R. • A lot of the concepts from System R’s optimizer are still used today. Faloutsos/Pavlo CMU SCS 15-415/615 5 Faloutsos/Pavlo CMU SCS 15-415/615 6 CMU SCS CMU SCS Today’s Class Relational Algebra Equivalences • History & Background • A query can be expressed in different • Relational Algebra Equivalences ways. • Plan Cost Estimation • The optimizer considers variations and • Plan Enumeration choose the one with the lowest cost. • Nested Sub-queries • How do we know whether two queries are equivalent? Faloutsos/Pavlo CMU SCS 15-415/615 7 Faloutsos/Pavlo CMU SCS 15-415/615 8 CMU SCS CMU SCS Relational Algebra Equivalences Predicate Pushdown SELECT name, cid • Two relational algebra expressions are FROM student, enrolled equivalent if they generate the same set of WHERE student.sid = enrolled.sid tuples. AND enrolled.grade = ‘A’ π name, cid π name, cid sid=sid σ grade=‘A’ sid=sid σ grade=‘A’ ⨝ student ⨝ enrolled student enrolled Faloutsos/Pavlo CMU SCS 15-415/615 9 Faloutsos/Pavlo CMU SCS 15-415/615 10 CMU SCS CMU SCS Relational Algebra Equivalences Relational Algebra Equivalences SELECT name, cid FROM student, enrolled • Selections: WHERE student.sid = – Perform them early enrolled.sid AND enrolled.grade = ‘A’ – Break a complex predicate, and push down σp1 p2 …pn(R) = σp1(σp2(σ…pn(R))…) πname, cid(σgrade=‘A’(student enrolled)) ∧ ∧ • Simplify a complex predicate = ⋈ – (X=Y AND Y=3) → X=3 AND Y=3 π student σ (enrolled name, cid( ( grade=‘A’ ))) Faloutsos/Pavlo CMU⋈ SCS 15-415/615 11 Faloutsos/Pavlo CMU SCS 15-415/615 12 CMU SCS CMU SCS Relational Algebra Equivalences Projection Pushdown SELECT name, cid • Projections: FROM student, enrolled – Perform them early WHERE student.sid = enrolled.sid • Smaller tuples AND enrolled.grade = ‘A’ • Fewer tuples (if duplicates are eliminated) – Project out all attributes except the ones π name, cid π name, cid requested or required (e.g., joining attr.) sid=sid σ grade=‘A’ sid=sid σ grade=‘A’ • This is not important for a column store… ⨝ student ⨝ enrolled student enrolled Faloutsos/Pavlo CMU SCS 15-415/615 13 Faloutsos/Pavlo CMU SCS 15-415/615 14 CMU SCS CMU SCS Projection Pushdown Relational Algebra Equivalences SELECT name, cid FROM student, enrolled • Joins: WHERE student.sid = – Commutative, associative enrolled.sid AND enrolled.grade = ‘A’ R S = S R π name, cid π name, cid (R S) T = R (S T) ⋈ ⋈ σ grade=‘A’ sid=sid • Q: How many⋈ different⋈ orderings⋈ ⋈ are there sid=sid π sid, cid π sid, name for an n-way join? ⨝ σ grade=‘A’ student ⨝ enrolled student enrolled Faloutsos/Pavlo CMU SCS 15-415/615 14 Faloutsos/Pavlo CMU SCS 15-415/615 15 CMU SCS CMU SCS Relational Algebra Equivalences Today’s Class • Joins: How many different orderings are • History & Background there for an n-way join? • Relational Algebra Equivalences • A: Catalan number ~ 4n • Plan Cost Estimation – Exhaustive enumeration: too slow. • Plan Enumeration • We’ll see in a second how an optimizer limits the search space... Faloutsos/Pavlo CMU SCS 15-415/615 16 Faloutsos/Pavlo CMU SCS 15-415/615 17 CMU SCS CMU SCS Cost Estimation Cost Estimation – Statistics SR • How long will a query take? • For each relation R we keep: #1 #2 – CPU: Small cost; tough to estimate – NR → # tuples #3 – Disk: # of block transfers – SR → size of tuple in bytes – Memory: Amount of DRAM used – V(A,R) → # of distinct values – Network: # of messages of attribute ‘A’ • How many tuples will be read/written? … • What statistics do we need to keep? #NR Faloutsos/Pavlo CMU SCS 15-415/615 18 Faloutsos/Pavlo CMU SCS 15-415/615 19 CMU SCS CMU SCS Derivable Statistics Derivable Statistics SR • FR → max# records/block • SC(A,R) → Selection Cardinality #1 avg# of records with A=given • BR → # blocks FR → NR / V(A,R) • SC(A,R) → selection cardinality #2 • Note that this assumes data uniformity avg# of records with A=given #3 – 10,000 students, 10 colleges – how many … students in SCS? #BR Faloutsos/Pavlo CMU SCS 15-415/615 20 Faloutsos/Pavlo CMU SCS 15-415/615 21 CMU SCS CMU SCS Additional Statistics Statistics • For index i: • Where do we store them? HT – Fi → average fanout (~50-100) i • How often do we update them? – HTi → # levels of index i (~2-3) ~ log(#entries)/log(F ) i • Manual invocations: – LBi # → blocks at leaf level – Postgres/SQLite: ANALYZE – MySQL: ANALYZE TABLE Faloutsos/Pavlo CMU SCS 15-415/615 22 Faloutsos/Pavlo CMU SCS 15-415/615 23 CMU SCS CMU SCS Selection Statistics Selections – Complex Predicates • We saw simple predicates (name=“Kayne”) • Selectivity sel(P) of predicate P: • How about more complex predicates, like == fraction of tuples that qualify – salary > 10000 • Formula depends on type of predicate. – age=30 AND jobTitle=“Costermonger” – Equality • What is their selectivity? – Range – Negation – Conjunction – Disjunction Faloutsos/Pavlo CMU SCS 15-415/615 24 Faloutsos/Pavlo CMU SCS 15-415/615 25 CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Selectivity sel(P) of predicate P: • Assume that V(rating, sailors) has 5 == fraction of tuples that qualify distinct values (0–4) and NR = 5 • Formula depends on type of predicate. • Equality Predicate: A=constant – Equality – sel(A=constant) = SC(P) / V(A,R) – Range – Example: sel(rating=‘2’) = – Negation – Conjunction – Disjunction Faloutsos/Pavlo CMU SCS 15-415/615 25 26 CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Assume that V(rating, sailors) has 5 • Assume that V(rating, sailors) has 5 distinct values (0–4) and NR = 5 distinct values (0–4) and NR = 5 • Equality Predicate: A=constant • Equality Predicate: A=constant – sel(A=constant) = SC(P) / V(A,R) – sel(A=constant) = SC(P) / V(A,R) – Example: sel(rating=‘2’) = – Example: sel(rating=‘2’) = count count V(rating,R)=5 0 1 2 3 4 0 1 2 3 4 26 26 rating rating CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Assume that V(rating, sailors) has 5 • Assume that V(rating, sailors) has 5 distinct values (0–4) and NR = 5 distinct values (0–4) and NR = 5 • Equality Predicate: A=constant • Equality Predicate: A=constant – sel(A=constant) = SC(P) / V(A,R) – sel(A=constant) = SC(P) / V(A,R) – Example: sel(rating=‘2’) = – Example: sel(rating=‘2’) = 1/5 SC(rating=‘2’)=1 SC(rating=‘2’)=1 count V(rating,R)=5 count V(rating,R)=5 0 1 2 3 4 0 1 2 3 4 26 26 rating rating CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Range Query: • Range Query: – sel(A>a) = (Amax – a) / (Amax – Amin) – sel(A>a) = (Amax – a) / (Amax – Amin) – Example: sel(rating >= ‘2’) – Example: sel(rating >= ‘2’) = (4 – 2) / (4 – 0) = 1/2 ratingmin = 0 ratingmax = 4 count count 0 1 2 3 4 0 1 2 3 4 27 27 rating rating CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Negation Query • Negation Query – sel(not P) = 1 – sel(P) – sel(not P) = 1 – sel(P) – Example: sel(rating != ‘2’) – Example: sel(rating != ‘2’) SC(rating=‘2’)=1 count count 0 1 2 3 4 0 1 2 3 4 28 28 rating rating CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Negation Query • Negation Query – sel(not P) = 1 – sel(P) – sel(not P) = 1 – sel(P) – Example: sel(rating != ‘2’) – Example: sel(rating != ‘2’) = 1 – (1/5) = 4/5 • Observation: selectivity ≈ probability SC(rating!=‘2’)=2 SC(rating!=‘2’)=2 SC(rating!=‘2’)=2 SC(rating!=‘2’)=2 count count 0 1 2 3 4 0 1 2 3 4 28 28 rating rating CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Conjunction: • Disjunction: – sel(rating = ‘2’ AND name LIKE ‘C%’) – sel(rating = ‘2’ OR name LIKE ‘C%’) – sel(P P ) – sel(P1 P2) = sel(P1) · sel(P2) 1 2 = sel(P ) + sel(P ) – sel(P P ) – INDEPENDENCE ASSUMPTION 1 2 1 2 ⋀ = sel⋁(P1) + sel(P2) – sel(P1) · sel(P2) – INDEPENDENCE ASSUMPTION,⋁ again P1 P2 Faloutsos/Pavlo CMU SCS 15-415/615 29 Faloutsos/Pavlo CMU SCS 15-415/615 30 CMU SCS CMU SCS Selections – Complex Predicates Joins • Disjunction, in general: • Q: Given a join of R and S, what is the – sel(P1 OR P2 OR … Pn) = range of possible result sizes in #of tuples? – 1 - (1- sel(P1) ) · (1 - sel(P2) ) · … (1 - sel(Pn)) P1 P2 Faloutsos/Pavlo CMU SCS 15-415/615 31 Faloutsos/Pavlo CMU SCS 15-415/615 32 CMU SCS CMU SCS Result Size Estimation for Joins Result Size Estimation for Joins • General case: Rcols Scols = {A} where A • General case: Rcols Scols = {A} where A is not a key for either table.