Logical Plan 1 Example Query

Review: Query Evaluation Steps SQL query CSE 444: Database Internals Parse & Rewrite Query Logical Select Logical Plan Query plan optimization Lecture 10 Select Physical Plan Query Optimization (part 1) Physical plan Query Execution Magda Balazinska - CSE 444, Spring 2012 1 Disk 2 What We Already Know… Example Query: Logical Plan 1 Supplier(sno,sname,scity,sstate) π Part(pno,pname,psize,pcolor) sname Supply(sno,pno,price) For each SQL query…. σ sscity=‘Seattle’ ∧sstate=‘WA’ ∧ pno=2 SELECT S.sname FROM Supplier S, Supply U WHERE S.scity='Seattle' AND S.sstate='WA’ AND S.sno = U.sno sno = sno AND U.pno = 2 There exist many logical query plan… Supplier Supply Magda Balazinska - CSE 444, Spring 2012 3 Magda Balazinska - CSE 444, Spring 2012 4 Example Query: Logical Plan 2 What We Also Know • For each logical plan… π sname • There exist many physical plans sno = sno σ sscity=‘Seattle’ ∧sstate=‘WA’ σ pno=2 Supplier Supply Magda Balazinska - CSE 444, Spring 2012 5 Magda Balazinska - CSE 444, Spring 2012 6 1 Example Query: Physical Plan 1 Example Query: Physical Plan 2 (On the fly) π sname (On the fly) π sname (On the fly) (On the fly) σ scity=‘Seattle’ ∧sstate=‘WA’ ∧ pno=2 σ scity=‘Seattle’ ∧sstate=‘WA’ ∧ pno=2 (Nested loop) (Index nested loop) sno = sno sno = sno Supplier Supply Supplier Supply (File scan) (File scan) (File scan) (Index scan) Magda Balazinska - CSE 444, Spring 2012 7 Magda Balazinska - CSE 444, Spring 2012 8 Query Optimizer Overview Today • Input: A logical query plan Estimating the cost of a query plan • Output: A good physical query plan • Basic query optimization algorithm – Enumerate alternative plans (logical and physical) – Compute estimated cost of each plan • Compute number of I/Os • Optionally take into account other resources – Choose plan with lowest cost – This is called cost-based optimization Magda Balazinska - CSE 444, Spring 2012 9 Magda Balazinska - CSE 444, Spring 2012 10 Estimating Cost of a Query Plan Access Path • We already know how to • Access path: a way to retrieve tuples from a table – Compute the cost of different operations – A file scan • In terms of number IOs – An index plus a matching selection condition • Normally should also consider CPU and network • We still need to • Index matches selection condition if it can be used – Compute cost of retrieving tuples from disk to retrieve just tuples that satisfy the condition • We can use different access paths – Example: Supplier(sid,sname,scity,sstate) • Queries have more sophisticated predicates than equality – B+-tree index on (scity,sstate) – Compute cost of a complete plan • matches scity=‘Seattle’ • does not match sid=3, does not match sstate=‘WA’ Magda Balazinska - CSE 444, Spring 2012 11 Magda Balazinska - CSE 444, Spring 2012 12 2 Access Path Selection Access Path Selectivity • Supplier(sid,sname,scity,sstate) • Access path selectivity is the number of pages retrieved if we use this access path • Selection condition: sid > 300 ∧ scity=‘Seattle’ – Most selective retrieves fewest pages • As we saw earlier, for equality predicates • Indexes: B+-tree on sid and B+-tree on scity – Selection on equality: σa=v(R) – V(R, a) = # of distinct values of attribute a – 1/V(R,a) is thus the reduction factor • Which access path should we use? – Clustered index on a: cost B(R)/V(R,a) – Unclustered index on a: cost T(R)/V(R,a) • We should pick the most selective access path – (we are ignoring I/O cost of index pages for simplicity) Magda Balazinska - CSE 444, Spring 2012 13 Magda Balazinska - CSE 444, Spring 2012 14 Selectivity for Range Predicates Back to Our Example • Selection condition: sid > 300 ∧ scity=‘Seattle’ Selection on range: σa>v(R) – Index I1: B+-tree on sid clustered – Index I2: B+-tree on scity unclustered • How to compute the selectivity? • Let’s assume • Assume values are uniformly distributed – V(Supplier,scity) = 20 • Reduction factor X – Max(Supplier, sid) = 1000, Min(Supplier,sid)=1 • X = (Max(R,a) - v) / (Max(R,a) - Min(R,a)) – B(Supplier) = 100, T(Supplier) = 1000 • Clustered index on a: cost B(R)*X • Cost I1: B(R) * (Max-v)/(Max-Min) = 100*700/999 ≈ 70 • Unclustered index on a: cost T(R)*X • Cost I2: T(R) * 1/V(Supplier,scity) = 1000/20 = 50 Magda Balazinska - CSE 444, Spring 2012 15 Magda Balazinska - CSE 444, Spring 2012 16 Selectivity with Back to Estimating Multiple Conditions Cost of a Query Plan What if we have an index on multiple attributes? • We already know how to • Ex: selection σa=v1 ∧ b= v2(R) and index on <a,b> – Compute the cost of different operations (last week) – Compute cost of retrieving tuples from disk with How to compute the selectivity? different access paths • Assume attributes are independent • X = 1 / (V(R,a) * V(R,b)) • We still need to – Compute cost of a complete plan • Clustered index on <a,b>: cost B(R)*X • Unclustered index on <a,b>: cost T(R)*X Magda Balazinska - CSE 444, Spring 2012 17 Magda Balazinska - CSE 444, Spring 2012 18 3 Computing the Cost of a Plan Projection: Cardinality of Result • Estimate cardinality in a bottom-up fashion • Output cardinality same as input cardinality – Cardinality is the size of a relation (nb of tuples) – Same number of input tuples – Compute size of all intermediate relations in plan – But tuples are smaller! • Estimate cost by using the estimated cardinalities • We learned how to compute the cost last week • But how do we compute cardinalities? Magda Balazinska - CSE 444, Spring 2012 19 Magda Balazinska - CSE 444, Spring 2012 20 Selection: Cardinality of Result Join: Cardinality of Result • Multiply input cardinality by reduction factor • For joins R ⋈ S – Similar to estimating access path selectivity – In fact, we also say operator/predicate selectivity – Take product of cardinalities of relations R and S – Condition is a = c /* value selection on R */ – Apply reduction factors for each term in join condition • Selectivity = 1/V(R,a) – Condition is a < v /* range selection on R */ – Terms are of the form: column1 = column2 • Selectivity = (v - Min(R, a))/(Max(R,a) - Min(R,a)) – Reduction: 1/ ( MAX( V(R,column1), V(S,column2)) – Multiple conditions: assume independence • Assumes each value in smaller set has a matching value • Use product of the reduction factors for the terms in the larger set (more on next slide) • Condition is a=v1 ∧ b= v2 • Selectivity = 1 / (V(R,a) * V(R,b)) Magda Balazinska - CSE 444, Spring 2012 21 Magda Balazinska - CSE 444, Spring 2012 22 Assumptions Selectivity of R ⨝A=B S • Containment of values: if V(R,A) <= V(S,B), then Assume V(R,A) <= V(S,B) the set of A values of R is included in the set of • Each tuple t in R joins with T(S)/V(S,B) tuple(s) in S B values of S – Note: this indeed holds when A is a foreign key in R, • Hence T(R ⨝A=B S) = T(R) T(S) / V(S,B) and B is a key in S In general: T(R ⨝A=B S) = T(R) T(S) / max(V(R,A),V(S,B)) • Preservation of values: for any other attribute C, V(R ⨝A=B S, C) = V(R, C) (or V(S, C)) Magda Balazinska - CSE 444, Spring 2012 23 Magda Balazinska - CSE 444, Spring 2012 24 4 T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 11 T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10 Complete Example V(Supply,pno) = 2,500 Physical Query Plan 1 SELECT sname Supplier(sid, sname, scity, sstate) (On the fly) π sname Selection and project on-the-fly FROM Supplier x, Supply y Supply(sid, pno, quantity) -> No additional cost. WHERE x.sid = y.sid (On the fly) • Some statistics and y.pno = 2 – T(Supplier) = 1000 records and x.scity = ‘Seattle’ σ scity=‘Seattle’ ∧sstate=‘WA’ ∧ pno=2 and x.sstate = ‘WA’ – T(Supply) = 10,000 records Total cost of plan is thus cost of join: – B(Supplier) = 100 pages = B(Supplier)+B(Supplier)*B(Supply)/(M-1) – B(Supply) = 100 pages (Nested loop) sno = sno = 100 + 100 * 100/10 – V(Supplier,scity) = 20, V(Suppliers,state) = 10 = 1,100 I/Os – V(Supply,pno) = 2,500 – Both relations are clustered Supplier Supply • M = 11 (File scan) (File scan) Magda Balazinska - CSE 444, Spring 2012 25 Magda Balazinska - CSE 444, Spring 2012 26 T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 11 M = 11 T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10 V(Supply,pno) = 2,500 Plan 2 with Different Numbers Physical Query Plan 2 Total cost Total cost (On the fly) π sname (d) What if we had: π sname (d) = 100 + 100 * 1/20 * 1/10 (a) = 10000 + 50 (a) 10K pages of Supplier + 100 + 100 * 1/2500 (b) + 10000 + 4 (b) 10K pages of Supply + 2 (c) + 3*50 + 4 (c) (Sort-merge join) (c) (Sort-merge join) (c) sno = sno + 0 (d) sno = sno + 0 (d) (Scan Total cost ≈ 204 I/Os (Scan Total cost ≈ 20,208 I/Os write to T1) (Scan write to T1) (Scan (a) σ write to T2) (a) σ write to T2) scity=‘Seattle’ ∧sstate=‘WA’ (b) σ pno=2 scity=‘Seattle’ ∧sstate=‘WA’ (b) σ pno=2 Need to do a two- Supplier Supply Supplier Supply pass sort algorithm (File scan) (File scan) (File scan) (File scan) Magda Balazinska - CSE 444, Spring 2012 27 Magda Balazinska - CSE 444, Spring 2012 28 T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 11 T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10 V(Supply,pno) = 2,500 Simplifications Physical Query Plan 3 (On the fly) d Total cost ( ) π sname • In the previous examples, we assumed that all = 1 (a) (On the fly) + 4 (b) index pages were in memory (c) σ scity=‘Seattle’ ∧sstate=‘WA’ + 0 (c) + 0 (d) • When this is not the case, we need to add the Total cost ≈ 5 I/Os (b) cost of fetching index pages from disk sno = sno (Index nested loop) (Use hash index) 4 tuples (a) σ pno=2 Supply Supplier (Hash index on pno ) (Hash index on sno) 29 Magda

Logical Plan 1 Example Query

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support